When it comes to Generative AI (GenAI), Python is often the first language that comes to mind. For developers already at home in Java, however, there is no need to switch ecosystems: the Java ecosystem offers a solid set of tools and libraries that make building GenAI applications straightforward.

In this article, we'll build a GenAI application in Java, step by step, and show how Retrieval-Augmented Generation (RAG) can improve model responses using Spring AI and Docker tooling. Spring AI integrates with numerous model providers, covering both chat and embedding models, as well as vector databases. For our demonstration, we'll use the OpenAI and Qdrant modules from the Spring AI project, leveraging their built-in support for seamless integration. With Docker Model Runner, we can run AI models locally via an OpenAI-compatible API, a local alternative to cloud-hosted models. We'll automate testing with Testcontainers and Spring AI's evaluation utilities to verify that responses from the Large Language Model (LLM) are contextually accurate, and we'll use Grafana for observability to confirm the app behaves as intended.
### Getting Started
To begin, generate a sample application on Spring Initializr and select the following dependencies: Web, OpenAI, Qdrant Vector Database, and Testcontainers. The application will expose two endpoints: `/chat`, which interacts with the model directly, and `/rag`, which augments the model with additional context from documents stored in a vector database.
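For reference, the generated `pom.xml` should contain Spring AI starters along these lines. This is a sketch, and the artifact names assume Spring AI 1.0; check the pom that Spring Initializr actually generates for your versions:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-qdrant</artifactId>
</dependency>
```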
### Configuring Docker Model Runner
Enable Docker Model Runner in your Docker Desktop or Docker Engine, as detailed in the official documentation. Next, pull the following models (the pull commands are sketched below):

- `ai/llama3.1` – a chat model
- `ai/mxbai-embed-large` – an embedding model

These models are hosted on Docker Hub under the `ai` namespace. While selecting a specific tag might give you a different quantization, the default tag is generally a suitable starting point.
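A minimal sketch of the pull commands, assuming the `docker model` CLI that ships with Docker Model Runner:

```shell
# Pull the chat and embedding models from the "ai" namespace on Docker Hub
docker model pull ai/llama3.1
docker model pull ai/mxbai-embed-large

# Verify that both models are available locally
docker model list
```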
### Building the GenAI App
Let's create a `ChatController` under `src/main/java/com/example`, which will serve as our entry point for interacting with the chat model:

```java
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatModel chatModel) {
        this.chatClient = ChatClient.builder(chatModel).build();
    }

    @GetMapping("/chat")
    public String generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return this.chatClient.prompt().user(message).call().content();
    }

}
```

- `ChatClient` is an interface facilitating operations to interact with the model. The actual model value (which model to use) is injected via configuration properties.
- If no `message` query parameter is provided, the model defaults to telling a joke.
Configure the application to point to Docker Model Runner and use the `ai/llama3.1` model by adding the following properties to `src/test/resources/application.properties`:

```plaintext
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.api-key=test
spring.ai.openai.chat.options.model=ai/llama3.1
```

The `spring.ai.openai.api-key` property is required by the framework, but any value can be used, since it is not needed for Docker Model Runner.
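Before starting the app, you can sanity-check the Model Runner endpoint directly, since it speaks the OpenAI API. The following is a sketch; it assumes host-side TCP access to Model Runner is enabled on port 12434:

```shell
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/llama3.1",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```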
Launch the application by running `./mvnw spring-boot:test-run` or `./gradlew bootTestRun` and inquire about Testcontainers:

```shell
http :8080/chat message=="What's testcontainers?"
```

Below is the response from the LLM (`ai/llama3.1`):
"Testcontainers is a fantastic and increasingly popular library for local testing with containers. It provides a way to run real, fully functional containerized services directly within your tests, leading to more realistic and reliable test results."
### Observing Mistakes and Hallucinations
The LLM's response contains inaccuracies, such as references to non-existent classes or incorrect URLs. This highlights the need to provide models with curated context to improve response accuracy.
### Enhancing Response Accuracy with RAG
We can enhance the model's response by providing it with curated context. Let's create a `RagController` to retrieve documents from the vector database:

```java
@RestController
public class RagController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RagController(ChatModel chatModel, VectorStore vectorStore) {
        this.chatClient = ChatClient.builder(chatModel).build();
        this.vectorStore = vectorStore;
    }

    @GetMapping("/rag")
    public String generate(@RequestParam(value = "message", defaultValue = "What's Testcontainers?") String message) {
        return callResponseSpec(this.chatClient, this.vectorStore, message).content();
    }

    static ChatClient.CallResponseSpec callResponseSpec(ChatClient chatClient, VectorStore vectorStore,
            String question) {
        // The advisor performs a similarity search and injects the top result into the prompt
        QuestionAnswerAdvisor questionAnswerAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder().topK(1).build())
                .build();
        return chatClient.prompt().advisors(questionAnswerAdvisor).user(question).call();
    }
}
```

### Ingesting Documents into the Vector Database
To provide the model with context, we need to load documents into the vector database. Create an `IngestionConfiguration` class in `src/test/java/com/example`:

```java
@TestConfiguration(proxyBeanMethods = false)
public class IngestionConfiguration {

    @Value("classpath:/docs/testcontainers.txt")
    private Resource testcontainersDoc;

    @Bean
    ApplicationRunner init(VectorStore vectorStore) {
        return args -> {
            var javaTextReader = new TextReader(this.testcontainersDoc);
            javaTextReader.getCustomMetadata().put("language", "java");

            // Split the document into token-sized chunks before embedding
            var tokenTextSplitter = new TokenTextSplitter();
            var testcontainersDocuments = tokenTextSplitter.apply(javaTextReader.get());

            vectorStore.add(testcontainersDocuments);
        };
    }
}
```

The `testcontainers.txt` file in the `src/test/resources/docs` directory should contain relevant information about Testcontainers. For practical applications, a broader document collection is recommended.
Add the following properties to `src/test/resources/application.properties`:

```plaintext
spring.ai.openai.embedding.options.model=ai/mxbai-embed-large
spring.ai.vectorstore.qdrant.initialize-schema=true
spring.ai.vectorstore.qdrant.collection-name=test
```

The `ai/mxbai-embed-large` model is used to create embeddings of the documents, which are then stored in the vector search database (Qdrant in this case). Spring AI will initialize the Qdrant schema and use the specified collection name.
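Qdrant itself is started by Testcontainers through the `TestcontainersConfiguration` class that Spring Initializr generates. For reference, that class typically looks roughly like the sketch below; the image tag is illustrative, so check your generated code:

```java
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.qdrant.QdrantContainer;

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    QdrantContainer qdrantContainer() {
        // @ServiceConnection points the Qdrant auto-configuration at this container
        return new QdrantContainer("qdrant/qdrant:latest");
    }
}
```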
Update the `TestDemoApplication` class to include `IngestionConfiguration.class`:

```java
public class TestDemoApplication {

    public static void main(String[] args) {
        SpringApplication.from(DemoApplication::main)
            .with(TestcontainersConfiguration.class, IngestionConfiguration.class)
            .run(args);
    }
}
```

Restart the application and query about Testcontainers again:
```shell
http :8080/rag message=="What's testcontainers?"
```

This time, the response will be more accurate, drawing on the provided documentation.
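If you want to inspect exactly which chunks the advisor retrieves, you can also query the vector store directly. The following is a hypothetical verification snippet (the `IngestionVerification` class is not part of the app), assuming the auto-configured `VectorStore` bean:

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

@TestConfiguration(proxyBeanMethods = false)
public class IngestionVerification {

    @Bean
    ApplicationRunner verifyIngestion(VectorStore vectorStore) {
        return args -> {
            // Embed the query and retrieve the single most similar document chunk
            List<Document> results = vectorStore.similaritySearch(
                    SearchRequest.builder().query("What is Testcontainers?").topK(1).build());
            results.forEach(doc -> System.out.println(doc.getText()));
        };
    }
}
```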
### Integration Testing
Testing is an integral part of software development, and GenAI applications are no exception. Using Testcontainers and Spring AI's evaluation utilities, we can write integration tests that verify the LLM provides contextually accurate answers based on the document data.
```java
@SpringBootTest(classes = { TestcontainersConfiguration.class, IngestionConfiguration.class },
        webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class RagControllerTest {

    @LocalServerPort
    private int port;

    @Autowired
    private VectorStore vectorStore;

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void verifyTestcontainersAnswer() {
        var question = "Tell me about Testcontainers";
        var answer = retrieveAnswer(question);

        assertFactCheck(question, answer);
    }

    private String retrieveAnswer(String question) {
        RestClient restClient = RestClient.builder().baseUrl("http://localhost:%d".formatted(this.port)).build();
        return restClient.get().uri("/rag?message={question}", question).retrieve().body(String.class);
    }

    private void assertFactCheck(String question, String answer) {
        // The evaluator asks the model whether the answer is supported by the retrieved documents
        FactCheckingEvaluator factCheckingEvaluator = new FactCheckingEvaluator(this.chatClientBuilder);
        EvaluationResponse evaluate = factCheckingEvaluator.evaluate(new EvaluationRequest(docs(question), answer));
        assertThat(evaluate.isPass()).isTrue();
    }

    private List<Document> docs(String question) {
        var response = RagController
            .callResponseSpec(this.chatClientBuilder.build(), this.vectorStore, question)
            .chatResponse();
        return response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);
    }
}
```

Automating tests ensures consistency and minimizes errors that can occur with manual testing.
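Run the test with the standard wrapper scripts, assuming the Maven or Gradle wrapper generated by Spring Initializr. Note that Docker must be running, since Testcontainers starts Qdrant, and Docker Model Runner must be available to serve the chat and embedding models:

```shell
./mvnw test
# or
./gradlew test
```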
### Observability with Grafana LGTM Stack
Observability is crucial for understanding application behavior in development and production. By introducing metrics and tracing, we can monitor the application’s performance and ensure it meets design expectations.
Add the following dependencies to `pom.xml`:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-otlp</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>grafana</artifactId>
    <scope>test</scope>
</dependency>
```

Create a `GrafanaContainerConfiguration` under `src/test/java/com/example`:
```java
@TestConfiguration(proxyBeanMethods = false)
public class GrafanaContainerConfiguration {

    @Bean
    @ServiceConnection
    LgtmStackContainer lgtmContainer() {
        return new LgtmStackContainer("grafana/otel-lgtm:0.11.4");
    }
}
```

Grafana provides an all-in-one Docker image (`grafana/otel-lgtm`) that bundles Grafana, Loki, Prometheus, Tempo, and an OpenTelemetry Collector, allowing us to monitor performance metrics and traces effectively.
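The `@ServiceConnection` annotation wires the OTLP export endpoints to the container automatically. Without it, you would have to point Micrometer at the collector yourself with properties along these lines (the endpoint values are illustrative, assuming a collector on the default OTLP/HTTP port):

```plaintext
management.otlp.metrics.export.url=http://localhost:4318/v1/metrics
management.otlp.tracing.endpoint=http://localhost:4318/v1/traces
```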
Add properties to `src/test/resources/application.properties` to sample 100% of requests:

```plaintext
spring.application.name=demo
management.tracing.sampling.probability=1
```

Update the `TestDemoApplication` class to include `GrafanaContainerConfiguration.class`:
```java
public class TestDemoApplication {

    public static void main(String[] args) {
        SpringApplication.from(DemoApplication::main)
            .with(TestcontainersConfiguration.class, IngestionConfiguration.class, GrafanaContainerConfiguration.class)
            .run(args);
    }
}
```

Run the application and perform a request:
```shell
http :8080/rag message=="What's testcontainers?"
```

Check the application logs for the Grafana dashboard URL, then explore the metrics and traces for insights into application performance.
### Conclusion
The combination of Docker and Spring AI offers a robust, efficient platform for developing GenAI applications. Docker simplifies the provisioning of service dependencies, and Docker Model Runner provides an OpenAI-compatible API for local model execution. Testcontainers facilitates fast integration testing by running lightweight containers for services and dependencies. Together, Docker and Spring AI enable developers to build sophisticated AI-driven applications efficiently, from development through to production.
### Learn More
To delve deeper into this topic, consider exploring resources on Docker Model Runner, Spring AI, and Testcontainers. These tools provide a strong foundation for creating advanced AI applications within the Java ecosystem.