When it comes to Generative AI (GenAI), Python is often the first language that comes to mind. For developers already at home in Java, however, there is no need to switch ecosystems: the Java ecosystem offers a solid set of tools and libraries that make building GenAI applications straightforward.

In this article, we'll build a GenAI application in Java, step by step, and show how Retrieval-Augmented Generation (RAG) can improve model responses using Spring AI and Docker tooling. Spring AI integrates with numerous model providers, covering both chat and embedding models, as well as vector databases. For our demonstration, we'll use the OpenAI and Qdrant modules from the Spring AI project, leveraging their built-in support for seamless integration. With Docker Model Runner, we can run AI models locally via an OpenAI-compatible API, a local alternative to cloud-hosted models. We'll automate testing with Testcontainers and Spring AI's evaluation utilities to verify that responses from the Large Language Model (LLM) are contextually accurate, and we'll use Grafana for observability to confirm the app behaves as intended.
### Getting Started
To begin, generate a sample application on Spring Initializr and select the following dependencies: Web, OpenAI, Qdrant Vector Database, and Testcontainers. The application will expose two endpoints: `/chat`, which interacts with the model directly, and `/rag`, which augments the model with additional context from documents stored in a vector database.
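For reference, the generated `pom.xml` should contain Spring AI starters along these lines. This is a sketch, and the artifact names assume Spring AI 1.0; check the pom that Spring Initializr actually generates for your versions:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-qdrant</artifactId>
</dependency>
```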
### Configuring Docker Model Runner
Enable Docker Model Runner in your Docker Desktop or Docker Engine, as detailed in the official documentation. Next, pull the following models (the pull commands are sketched below):

- `ai/llama3.1` – a chat model
- `ai/mxbai-embed-large` – an embedding model

These models are hosted on Docker Hub under the `ai` namespace. While selecting a specific tag might give you a different quantization, the default tag is generally a suitable starting point.
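A minimal sketch of the pull commands, assuming the `docker model` CLI that ships with Docker Model Runner:

```shell
# Pull the chat and embedding models from the "ai" namespace on Docker Hub
docker model pull ai/llama3.1
docker model pull ai/mxbai-embed-large

# Verify that both models are available locally
docker model list
```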
### Building the GenAI App
Let's create a `ChatController` under `src/main/java/com/example`, which will serve as our entry point for interacting with the chat model:

```java
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatModel chatModel) {
        this.chatClient = ChatClient.builder(chatModel).build();
    }

    @GetMapping("/chat")
    public String generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return this.chatClient.prompt().user(message).call().content();
    }

}
```

- `ChatClient` is an interface facilitating operations to interact with the model. The actual model value (which model to use) is injected via configuration properties.
- If no `message` query parameter is provided, the model defaults to telling a joke.
Configure the application to point to Docker Model Runner and use the `ai/llama3.1` model by adding the following properties to `src/test/resources/application.properties`:

```plaintext
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.api-key=test
spring.ai.openai.chat.options.model=ai/llama3.1
```

The `spring.ai.openai.api-key` property is required by the framework, but any value can be used, since it is not needed for Docker Model Runner.
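Before starting the app, you can sanity-check the Model Runner endpoint directly, since it speaks the OpenAI API. The following is a sketch; it assumes host-side TCP access to Model Runner is enabled on port 12434:

```shell
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/llama3.1",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```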
Launch the application by running `./mvnw spring-boot:test-run` or `./gradlew bootTestRun` and inquire about Testcontainers:

```shell
http :8080/chat message=="What's testcontainers?"
```

Below is the response from the LLM (`ai/llama3.1`):
"Testcontainers is a fantastic and increasingly popular library for local testing with containers. It provides a way to run real, fully functional containerized services directly within your tests, leading to more realistic and reliable test results."
### Observing Mistakes and Hallucinations
The LLM's response contains inaccuracies, such as references to non-existent classes or incorrect URLs. This highlights the need to provide models with curated context to improve response accuracy.
### Enhancing Response Accuracy with RAG
We can enhance the model's response by providing it with curated context. Let's create a `RagController` to retrieve documents from the vector database:

```java
@RestController
public class RagController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RagController(ChatModel chatModel, VectorStore vectorStore) {
        this.chatClient = ChatClient.builder(chatModel).build();
        this.vectorStore = vectorStore;
    }

    @GetMapping("/rag")
    public String generate(@RequestParam(value = "message", defaultValue = "What's Testcontainers?") String message) {
        return callResponseSpec(this.chatClient, this.vectorStore, message).content();
    }

    static ChatClient.CallResponseSpec callResponseSpec(ChatClient chatClient, VectorStore vectorStore,
            String question) {
        // The advisor performs a similarity search and injects the top result into the prompt
        QuestionAnswerAdvisor questionAnswerAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder().topK(1).build())
                .build();
        return chatClient.prompt().advisors(questionAnswerAdvisor).user(question).call();
    }
}
```

### Ingesting Documents into the Vector Database
To provide the model with context, we need to load documents into the vector database. Create an `IngestionConfiguration` class in `src/test/java/com/example`:

```java
@TestConfiguration(proxyBeanMethods = false)
public class IngestionConfiguration {

    @Value("classpath:/docs/testcontainers.txt")
    private Resource testcontainersDoc;

    @Bean
    ApplicationRunner init(VectorStore vectorStore) {
        return args -> {
            var javaTextReader = new TextReader(this.testcontainersDoc);
            javaTextReader.getCustomMetadata().put("language", "java");

            // Split the document into token-sized chunks before embedding
            var tokenTextSplitter = new TokenTextSplitter();
            var testcontainersDocuments = tokenTextSplitter.apply(javaTextReader.get());

            vectorStore.add(testcontainersDocuments);
        };
    }
}
```

The `testcontainers.txt` file in the `src/test/resources/docs` directory should contain relevant information about Testcontainers. For practical applications, a broader document collection is recommended.
Add the following properties to `src/test/resources/application.properties`:

```plaintext
spring.ai.openai.embedding.options.model=ai/mxbai-embed-large
spring.ai.vectorstore.qdrant.initialize-schema=true
spring.ai.vectorstore.qdrant.collection-name=test
```

The `ai/mxbai-embed-large` model is used to create embeddings of the documents, which are then stored in the vector search database (Qdrant in this case). Spring AI will initialize the Qdrant schema and use the specified collection name.
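Qdrant itself is started by Testcontainers through the `TestcontainersConfiguration` class that Spring Initializr generates. For reference, that class typically looks roughly like the sketch below; the image tag is illustrative, so check your generated code:

```java
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.qdrant.QdrantContainer;

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    QdrantContainer qdrantContainer() {
        // @ServiceConnection points the Qdrant auto-configuration at this container
        return new QdrantContainer("qdrant/qdrant:latest");
    }
}
```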
Update the `TestDemoApplication` class to include `IngestionConfiguration.class`:

```java
public class TestDemoApplication {

    public static void main(String[] args) {
        SpringApplication.from(DemoApplication::main)
            .with(TestcontainersConfiguration.class, IngestionConfiguration.class)
            .run(args);
    }
}
```

Restart the application and query about Testcontainers again:
```shell
http :8080/rag message=="What's testcontainers?"
```

This time, the response will be more accurate, drawing on the provided documentation.
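If you want to inspect exactly which chunks the advisor retrieves, you can also query the vector store directly. The following is a hypothetical verification snippet (the `IngestionVerification` class is not part of the app), assuming the auto-configured `VectorStore` bean:

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

@TestConfiguration(proxyBeanMethods = false)
public class IngestionVerification {

    @Bean
    ApplicationRunner verifyIngestion(VectorStore vectorStore) {
        return args -> {
            // Embed the query and retrieve the single most similar document chunk
            List<Document> results = vectorStore.similaritySearch(
                    SearchRequest.builder().query("What is Testcontainers?").topK(1).build());
            results.forEach(doc -> System.out.println(doc.getText()));
        };
    }
}
```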
### Integration Testing
Testing is an integral part of software development, and GenAI applications are no exception. Using Testcontainers and Spring AI's evaluation utilities, we can write integration tests that verify the LLM provides contextually accurate answers based on the document data.
```java
@SpringBootTest(classes = { TestcontainersConfiguration.class, IngestionConfiguration.class },
        webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class RagControllerTest {

    @LocalServerPort
    private int port;

    @Autowired
    private VectorStore vectorStore;

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void verifyTestcontainersAnswer() {
        var question = "Tell me about Testcontainers";
        var answer = retrieveAnswer(question);

        assertFactCheck(question, answer);
    }

    private String retrieveAnswer(String question) {
        RestClient restClient = RestClient.builder().baseUrl("http://localhost:%d".formatted(this.port)).build();
        return restClient.get().uri("/rag?message={question}", question).retrieve().body(String.class);
    }

    private void assertFactCheck(String question, String answer) {
        // The evaluator asks the model whether the answer is supported by the retrieved documents
        FactCheckingEvaluator factCheckingEvaluator = new FactCheckingEvaluator(this.chatClientBuilder);
        EvaluationResponse evaluate = factCheckingEvaluator.evaluate(new EvaluationRequest(docs(question), answer));
        assertThat(evaluate.isPass()).isTrue();
    }

    private List<Document> docs(String question) {
        var response = RagController
            .callResponseSpec(this.chatClientBuilder.build(), this.vectorStore, question)
            .chatResponse();
        return response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);
    }
}
```

Automating tests ensures consistency and minimizes errors that can occur with manual testing.
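Run the test with the standard wrapper scripts, assuming the Maven or Gradle wrapper generated by Spring Initializr. Note that Docker must be running, since Testcontainers starts Qdrant, and Docker Model Runner must be available to serve the chat and embedding models:

```shell
./mvnw test
# or
./gradlew test
```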
### Observability with Grafana LGTM Stack
Observability is crucial for understanding application behavior in development and production. By introducing metrics and tracing, we can monitor the application’s performance and ensure it meets design expectations.
Add the following dependencies to `pom.xml`:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-otlp</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>grafana</artifactId>
    <scope>test</scope>
</dependency>
```

Create a `GrafanaContainerConfiguration` under `src/test/java/com/example`:
```java
@TestConfiguration(proxyBeanMethods = false)
public class GrafanaContainerConfiguration {

    @Bean
    @ServiceConnection
    LgtmStackContainer lgtmContainer() {
        return new LgtmStackContainer("grafana/otel-lgtm:0.11.4");
    }
}
```

Grafana provides an all-in-one Docker image (`grafana/otel-lgtm`) that bundles Grafana, Loki, Prometheus, Tempo, and an OpenTelemetry Collector, allowing us to monitor performance metrics and traces effectively.
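The `@ServiceConnection` annotation wires the OTLP export endpoints to the container automatically. Without it, you would have to point Micrometer at the collector yourself with properties along these lines (the endpoint values are illustrative, assuming a collector on the default OTLP/HTTP port):

```plaintext
management.otlp.metrics.export.url=http://localhost:4318/v1/metrics
management.otlp.tracing.endpoint=http://localhost:4318/v1/traces
```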
Add properties to `src/test/resources/application.properties` to sample 100% of requests:

```plaintext
spring.application.name=demo
management.tracing.sampling.probability=1
```

Update the `TestDemoApplication` class to include `GrafanaContainerConfiguration.class`:
```java
public class TestDemoApplication {

    public static void main(String[] args) {
        SpringApplication.from(DemoApplication::main)
            .with(TestcontainersConfiguration.class, IngestionConfiguration.class, GrafanaContainerConfiguration.class)
            .run(args);
    }
}
```

Run the application and perform a request:
```shell
http :8080/rag message=="What's testcontainers?"
```

Check the application logs for the Grafana dashboard URL, then explore the metrics and traces for insights into application performance.
### Conclusion
The combination of Docker and Spring AI offers a robust, efficient platform for developing GenAI applications. Docker simplifies the provisioning of service dependencies, and Docker Model Runner provides an OpenAI-compatible API for local model execution. Testcontainers facilitates fast integration testing by running lightweight containers for services and dependencies. Together, Docker and Spring AI enable developers to build sophisticated AI-driven applications efficiently, from development through to production.
### Learn More
To delve deeper into this topic, consider exploring resources on Docker Model Runner, Spring AI, and Testcontainers. These tools provide a strong foundation for creating advanced AI applications within the Java ecosystem.