Introduction
In the ever-evolving landscape of technology, the integration of Artificial Intelligence (AI) into our systems has become an essential aspect of modern infrastructure. As a Senior DevOps Engineer and Docker Captain, I have witnessed firsthand the transformative power of AI, whether it’s in enhancing retail experiences or advancing medical imaging. Today, we’re diving into a guide on how to effectively run and package local AI models using the Docker Model Runner. This tool is an efficient, developer-friendly solution for managing AI models sourced from Docker Hub or Hugging Face. Our aim is to show you how to operate these models through the command line interface (CLI) or via an API, and even publish your own model artifacts, all without the need for setting up complex Python environments or web servers.
Understanding AI in Development
AI, or Artificial Intelligence, refers to systems and technologies that mimic human intelligence. This includes:
- Decision Making through Machine Learning: AI systems make decisions and predictions based on data patterns.
- Natural Language Processing (NLP): These systems understand and generate human language, facilitating communication between humans and machines.
- Computer Vision: AI can recognize and interpret visual data, much like how humans perceive images.
- Automatic Learning from New Data: AI systems can continually improve by learning from new data inputs.
Common Types of AI in Development
- Machine Learning (ML): This involves algorithms that learn from structured and unstructured data to make predictions or decisions.
- Deep Learning: Utilizes neural networks to recognize patterns, often used in complex tasks like image and voice recognition.
- Natural Language Processing (NLP): Enables machines to understand and generate human language, crucial for applications like chatbots.
- Computer Vision: Allows computers to interpret and make decisions based on visual inputs.
The Importance of Running and Packaging Your Own AI Model
Running and packaging AI models locally provides several advantages. By operating models directly on your machine, you gain:
- Faster Inference: Eliminates latency issues associated with remote API calls, leading to quicker results.
- Enhanced Privacy: Data is processed on your hardware, reducing the risk of data breaches.
- Customization: Allows for the packaging and versioning of personalized models.
- Seamless Integration: Works well with continuous integration/continuous deployment (CI/CD) tools like Docker and GitHub Actions.
- Offline Capabilities: Ideal for edge computing or environments with limited connectivity.
Platforms such as Docker and Hugging Face offer access to advanced AI models, which can be run locally for improved speed, privacy, and iteration.
Real-World Applications of AI
AI is revolutionizing various industries with practical applications, including:
- Chatbots and Virtual Assistants: These automate customer support and engagement, as seen in platforms like ChatGPT and Alexa.
- Generative AI: Used to create text, art, or music, with examples including Midjourney and Lensa.
- Development Tools: Enhance coding efficiency with features like autocomplete and debugging, exemplified by GitHub Copilot.
- Retail Intelligence: AI analyzes consumer behavior to recommend products.
- Medical Imaging: AI assists in analyzing medical scans for quicker diagnosis.
Using Docker Model Runner for Local AI Model Packaging and Execution
Prerequisites
Before diving into the steps, ensure that you have Docker Desktop installed and configured on your system. This setup is crucial for enabling the Docker Model Runner feature and accessing experimental features.
Step 0: Enable Docker Model Runner
- Open Docker Desktop.
- Navigate to Settings and then Features in development.
- Under the Experimental features tab, enable Access experimental features.
- Apply the changes and restart Docker Desktop.
- Verify the settings by reopening Docker Desktop and ensuring the Docker Model Runner is enabled.
- Optionally, enable host-side TCP support to access the API from localhost.
Once enabled, you can manage AI models with the docker model CLI or from the Models tab in Docker Desktop.
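For a quick sanity check from the terminal, here is a minimal sketch (assuming the status subcommand is available in your Docker Desktop version; exact output varies by release):

```
# Confirm the Model Runner backend is up and reachable from the CLI
docker model status

# See the full set of model subcommands available in your version
docker model --help
```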
Step 1: Pull a Model
You can pull models from Docker Hub or Hugging Face using the following commands:
- From Docker Hub:

```
docker model pull ai/smollm2
```

- From Hugging Face (GGUF format):

```
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```
Note: GGUF is a lightweight binary file format optimized for local inference, designed for CPU-friendly runtimes such as llama.cpp.
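Once a pull finishes, you can confirm what is available locally. A small sketch (assuming the list and inspect subcommands behave as in current Docker Desktop releases):

```
# Show models stored locally, including their size and format
docker model list

# Print metadata for a specific model, such as parameters and quantization
docker model inspect ai/smollm2
```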
Step 2: Tag and Push to Local Registry (Optional)
To push models to a private or local registry, follow these steps:
- Start a local Docker registry (mapped to port 6000 on the host):

```
docker run -d -p 6000:5000 --name registry registry:2
```

- Tag the model with your registry's address:

```
docker model tag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF localhost:6000/foobar
```

- Push the model to the local registry:

```
docker model push localhost:6000/foobar
```
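To verify the round trip, you can pull the artifact back out of the local registry; a minimal check, assuming the registry container started above is still running on port 6000:

```
# Pull the model back from the local registry to confirm the push succeeded
docker model pull localhost:6000/foobar
```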
Step 3: Run the Model
To execute a prompt:
- For a one-time prompt:

```
docker model run ai/smollm2 "What is Docker?"
```

- For interactive chat mode:

```
docker model run ai/smollm2
```
Note: Models are loaded into memory as needed and are automatically unloaded after five minutes of inactivity.
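Because the one-shot form prints the completion to stdout, it drops neatly into shell scripts. A small sketch (the model name and prompt are just examples):

```
# Capture a one-shot completion for use elsewhere in a script
SUMMARY=$(docker model run ai/smollm2 "Summarize Docker Model Runner in one sentence.")
echo "$SUMMARY"
```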
Step 4: Test via OpenAI-Compatible API
To call the model from the host:
- Enable TCP host access for Model Runner via the Docker Desktop GUI or CLI:

```
docker desktop enable model-runner --tcp 12434
```

- Send a prompt using the OpenAI-compatible chat endpoint:

```
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about the fall of Rome."}
    ]
  }'
```
Note: No API key is required as this runs locally and securely.
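If you only want the assistant's reply rather than the full JSON payload, you can pipe the response through jq. A minimal sketch (assuming jq is installed and the endpoint and port match the setup above):

```
curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}]
  }' | jq -r '.choices[0].message.content'
```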
Step 5: Package Your Own Model
If you have a pre-trained GGUF model, you can package it as a Docker-compatible artifact:
- Use the following command:

```
docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push registry.example.com/ai/custom-llm:v1
```
This is particularly useful for custom-trained or private models. Pull the model like any other model:
```
docker model pull registry.example.com/ai/custom-llm:v1
```
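Once pulled, the packaged artifact runs like any other model; for example, using the placeholder registry and tag from above:

```
# Run the custom model exactly as you would a model from Docker Hub
docker model run registry.example.com/ai/custom-llm:v1 "Hello from my custom model"
```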
Step 6: Optimize & Iterate
Consider these strategies for optimization and iteration:
- Monitor model usage and debug issues with docker model logs.
- Implement CI/CD pipelines to automate pulls, scans, and packaging (a minimal sketch follows this list).
- Track model versions and lineage for consistency.
- Use semantic versioning instead of generic tags like "latest."
- Remember that only one model can be loaded at a time.
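As a rough illustration of the CI/CD bullet above, here is a minimal shell sketch of a pipeline step that packages and publishes a model. The registry, file names, and version variable are all placeholders, and any scanning step would depend on your own tooling:

```
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values -- substitute your own registry, model file, and version source
REGISTRY="registry.example.com/ai"
MODEL_NAME="custom-llm"
VERSION="${RELEASE_TAG:-v0.1.0}"

# Package the GGUF file as an OCI artifact and push it with a semantic version tag
docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push "${REGISTRY}/${MODEL_NAME}:${VERSION}"

# Smoke-test the published artifact by pulling it back
docker model pull "${REGISTRY}/${MODEL_NAME}:${VERSION}"
```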
Compose Integration (Optional)
Docker Compose v2.35+ supports AI model services via a new provider type. Models can be defined in your compose.yml and referenced by your application services. During docker compose up, Docker Model Runner automatically handles model pulling and initiation, streamlining multi-container AI applications. More details can be found in the Docker Compose documentation.
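As a rough sketch of what that can look like (the service names and image are illustrative, and the exact provider options depend on your Compose version; check the Docker Compose documentation for the supported syntax):

```
services:
  chat-app:
    image: my-chat-app:latest
    depends_on:
      - llm

  llm:
    provider:
      type: model
      options:
        model: ai/smollm2
```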
Navigating Challenges in AI Development
When working with AI, consider these aspects:
- Latency: Use quantized GGUF models for efficiency.
- Security: Validate model sources and attach licenses.
- Compliance: Handle personally identifiable information (PII) responsibly.
- Costs: Running models locally can help avoid high cloud computing expenses.
Best Practices
To ensure effective AI model management:
- Opt for GGUF models for optimal CPU inference.
- Use the --license flag for compliance when packaging models.
- Employ versioned tags for clarity and consistency.
- Monitor model activity via docker model logs.
- Validate sources before pulling or packaging models.
- Pull models only from trusted repositories, such as Docker Hub’s AI namespace or verified Hugging Face repositories.
- Review licenses and usage terms before deploying models.
The Future of AI with Docker Model Runner
Looking ahead, we can anticipate exciting developments:
- Support for Retrieval-Augmented Generation (RAG).
- Expanded multimodal support, encompassing text, images, video, and audio.
- Integration of Large Language Models (LLMs) as services within Docker Compose.
- Enhanced Model Dashboard features for better management in Docker Desktop.
- Secure pipelines for packaging and deploying private AI models.
Docker Model Runner empowers DevOps teams to handle AI models like any other software artifact—efficiently pulled, tagged, versioned, tested, and deployed.
Final Thoughts
Building AI applications doesn’t necessitate a complex setup with GPU clusters or external APIs. By leveraging Docker Model Runner, you can:
- Pull prebuilt models from reliable sources like Docker Hub or Hugging Face.
- Run models locally through CLI, API, or Docker Desktop’s Model tab.
- Package and distribute your own models as OCI artifacts.
- Seamlessly integrate AI models into CI/CD pipelines.
For those eager to explore further, more resources and information are readily available on the Docker Model Runner documentation page.
In essence, you’re not just deploying containers—you’re delivering intelligence. Learn more and embark on your AI journey today.