The field of artificial intelligence (AI) has advanced rapidly in recent years, particularly around local models: models that run on personal hardware and have become increasingly accessible. One such model, Gemma 3 270M, is lightweight enough to operate on standard hardware, which makes it appealing for widespread deployment across different platforms.
However, despite the initial excitement surrounding these small models, building specialized applications that demand high performance on complex tasks remains challenging. A recent experiment evaluating how well various models handle tool-calling revealed that many local models, and even some remote ones, failed to meet the required performance benchmarks. This has prompted a reevaluation of how small, general-purpose models should be used.
The lesson is that relying on smaller models out of the box often falls short for specific, complex tasks, and even larger models can demand substantial effort to reach satisfactory performance and efficiency. Nonetheless, the potential of local models is too significant to overlook, given their inherent advantages.
Some notable benefits of local models include:
- Privacy: Local models ensure data privacy as information is processed on the user’s device rather than being sent to an external server.
- Offline Capabilities: These models can be used without an internet connection, enhancing their utility in various scenarios.
- No Token Usage Costs: Unlike some cloud-based services, local models do not incur costs based on usage.
- Elimination of Overload Errors: Users do not encounter error messages related to server overload.
To overcome these challenges, a project called Unsloth has emerged as a viable solution. Unsloth makes fine-tuning models significantly faster and more accessible, and its growing popularity makes it worth exploring.
This article will guide you through the process of fine-tuning a sub-1GB model to redact sensitive information without disrupting your Python setup. With the assistance of Docker Offload and Unsloth, you can create a portable and shareable GGUF artifact on Docker Hub in under 30 minutes. In a follow-up post, detailed steps for fine-tuning the model will be provided.
Challenges of Fine-Tuning Models
Setting up an environment suitable for fine-tuning models can be daunting: it is often fragile, error-prone, and intimidating. Many users spend hours resolving dependency and runtime version conflicts before they can even begin training.
Fortunately, Unsloth offers a solution in the form of a ready-to-use Docker image. This eliminates the need to spend time configuring the environment; instead, users can run a container and get started immediately. However, there is still a hardware requirement to consider. For instance, if you work with a MacBook Pro, Unsloth does not natively support this platform, which can be a significant drawback.
This is where Docker Offload comes into play. Docker Offload allows users to leverage GPU-backed resources in the cloud, tapping into NVIDIA acceleration while maintaining their local workflow. This capability provides everything needed to fine-tune models without the constraints of local hardware limitations.
How to Fine-Tune Models Locally with Unsloth and Docker
The question arises: Can a model smaller than 1GB reliably mask personally identifiable information (PII)? Here’s an example to illustrate:
Test Input:
- This is an example of text that contains some data.
- The author of this text is Ignacio López Luna, but everybody calls him Ignasi.
- His ID number is 123456789.
- He has a son named Arnau López, who was born on 21-07-2021.
Desired Output:
- This is an example of text that contains some data.
- The author of this text is [MASKED] [MASKED], but everybody calls him [MASKED].
- His ID number is [MASKED].
- He has a son named [MASKED], who was born on [MASKED].
When tested with Gemma 3 270M using Docker Model Runner, the output was simply "[PERSON]," which is clearly inadequate. It became evident that fine-tuning was necessary.
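For reference, the baseline run used Docker Model Runner roughly as follows. The exact Gemma 3 270M tag in Docker Hub's ai namespace is an assumption here; adjust it to whatever tag you pull:

```bash
# Run the stock Gemma 3 270M model against the masking prompt.
# The model tag below is illustrative; check the ai namespace on
# Docker Hub for the exact name.
docker model run ai/gemma3:270M "Mask all PII in the following text: ..."
```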
Step 1: Clone the Example Project
Begin by cloning the example project repository:
```bash
git clone https://github.com/ilopezluna/fine-tuning-examples.git
cd fine-tuning-examples/pii-masking
```

The project contains a Python script designed to fine-tune Gemma 3 using the pii-masking-400k dataset from ai4privacy.
Step 2: Start Docker Offload (with GPU)
When you start Docker Offload, select your account and answer "Yes" when prompted for GPU support. This gives you access to an NVIDIA L4-backed instance. For details on checking session status, refer to the Docker Offload Quickstart guide.
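If you haven't run Offload before, starting and checking a session typically looks like this (commands per the Docker Offload CLI; see the Quickstart for your version):

```bash
# Start a Docker Offload session; choose your account and answer "Yes"
# to GPU support when prompted to get an NVIDIA L4-backed instance.
docker offload start

# Check that the session is active before running GPU workloads.
docker offload status
```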
Step 3: Run the Unsloth Container
The official Unsloth image includes Jupyter and example notebooks. It can be started with the following command:
```bash
docker run -d -e JUPYTER_PORT=8000 \
  -e JUPYTER_PASSWORD="mypassword" \
  -e USER_PASSWORD="unsloth2024" \
  -p 8000:8000 \
  -v $(pwd):/workspace/work \
  --gpus all \
  unsloth/unsloth
```

Afterward, attach a shell to the container:
```bash
docker exec -it $(docker ps -q) bash
```

Inside the container, useful paths include:

- /workspace/unsloth-notebooks/ → example fine-tuning notebooks
- /workspace/work/ → your mounted working directory

Docker Offload, powered by Mutagen, keeps the /workspace/work/ folder in sync between the cloud GPU and the local development machine.

Step 4: Fine-Tune
The script finetune.py is a compact training loop built around Unsloth. Its purpose is to adapt a base language model to a new task using supervised fine-tuning with LoRA (Low-Rank Adaptation). In this example, the model is trained to mask PII in text. LoRA keeps the process efficient by adding small adapter layers and training only those, enabling quick fine-tuning on a single GPU; the resulting weights can later be merged back into the base model.
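To make the moving parts concrete, here is a minimal sketch of what such an Unsloth LoRA loop looks like. This is not the actual finetune.py from the repo; the checkpoint name, dataset column names, and hyperparameters are assumptions for illustration:

```python
# A minimal sketch of an Unsloth LoRA loop. Illustrative only: the model
# checkpoint name, dataset columns, and hyperparameters below are
# assumptions, not the exact contents of finetune.py.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load a small base model; 4-bit loading keeps GPU memory usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Turn each dataset row into a single training string.
def to_text(example):
    # Column names are assumptions about the pii-masking-400k schema.
    return {
        "text": f"Mask all PII:\n{example['source_text']}\n{example['masked_text']}"
    }

dataset = load_dataset("ai4privacy/pii-masking-400k", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=8,
        max_steps=500,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA weights back into the base model and save the result.
model.save_pretrained_merged("result", tokenizer, save_method="merged_16bit")
```

The key design choice is that only the adapter matrices, a small fraction of the model's parameters, are updated, which is why the run fits comfortably on a single GPU.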
To initiate the fine-tuning process, execute:
```bash
unsloth@46b6d7d46c1a:/workspace$ cd work
unsloth@46b6d7d46c1a:/workspace/work$ python finetune.py
```

The script loads the base model, prepares the dataset, executes a brief supervised fine-tuning pass, and saves the LoRA weights to the mounted /workspace/work/ folder. Thanks to Docker Offload, these results are automatically synced back to your local machine. The entire training run is designed to complete within 20 minutes on a modern GPU, leaving you with a model that has learned the new masking behavior and is ready for conversion in the next step.
Step 5: Convert to GGUF
At this stage, you will find the fine-tuned model artifacts in the /workspace/work/ directory. To package the model for Docker Hub and Docker Model Runner usage, it must be converted to GGUF format. Although Unsloth will soon support this directly, for now the conversion is done manually:

```bash
unsloth@1b9b5b5cfd49:/workspace/work$ cd ..
unsloth@1b9b5b5cfd49:/workspace$ git clone https://github.com/ggml-org/llama.cpp
unsloth@1b9b5b5cfd49:/workspace$ python ./llama.cpp/convert_hf_to_gguf.py work/result/ --outfile work/result.gguf
```
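If you want a smaller artifact, the converter can also quantize during conversion. Recent llama.cpp checkouts accept an --outtype flag; verify against your checkout:

```bash
# q8_0 output is roughly half the size of f16 with minimal quality loss.
python ./llama.cpp/convert_hf_to_gguf.py work/result/ \
  --outfile work/result-q8_0.gguf --outtype q8_0
```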
Confirm that the file exists locally, which verifies that the Mutagen-powered file sync has completed:

```bash
((.env3.12) ) ilopezluna@localhost pii-masking % ls -alh result.gguf
```

Step 6: Package and Share on Docker Hub
Now, package the fine-tuned model and push it to Docker Hub:
```bash
((.env3.12) ) ilopezluna@localhost pii-masking % docker model package --gguf /Users/ilopezluna/Projects/fine-tuning-examples/pii-masking/result.gguf ignaciolopezluna020/my-awesome-model:version1 --push
```

For additional details on distributing models, refer to the Docker blog on packaging models.
Step 7: Try the Results!
Finally, test the fine-tuned model using Docker Model Runner:
```bash
docker model run ignaciolopezluna020/my-awesome-model:version1 "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
```

Comparing this with the original Gemma 3 270M output, the fine-tuned model is far more effective. It is now published on Docker Hub for anyone to test.
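Beyond the CLI, Docker Model Runner also exposes an OpenAI-compatible API, so applications can call the fine-tuned model over HTTP. A sketch, assuming host TCP access is enabled on the default port 12434 (see the Docker Model Runner docs for the exact endpoint on your setup):

```bash
# Enable host TCP access once (Docker Desktop).
docker desktop enable model-runner --tcp 12434

# Call the model through the OpenAI-compatible chat completions endpoint.
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ignaciolopezluna020/my-awesome-model:version1",
    "messages": [{"role": "user", "content": "Mask all PII in: My name is Jane Doe."}]
  }'
```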
Why Fine-Tuning Models with Docker Matters
This experiment demonstrates that small local models need not remain as mere curiosities. With the right tools, they can evolve into practical, specialized assistants for solving real-world problems.
- Speed: Fine-tuning a sub-1GB model took less than 20 minutes with Unsloth and Docker Offload, facilitating quick iteration and experimentation.
- Accessibility: Even without a GPU-equipped machine, Docker Offload enabled GPU-backed training without additional hardware.
- Portability: Once packaged, the model is easy to share and can run anywhere via Docker.
- Utility: Instead of generating vague or irrelevant results, the fine-tuned model reliably performs its intended function, masking PII, which can be immensely valuable in many workflows.
The power of fine-tuning lies in transforming small, general-purpose models into focused, reliable tools. With Docker’s ecosystem, one does not need to be a machine learning researcher with a high-end workstation to achieve this. The entire process, from training to testing, packaging, and sharing, can be accomplished using familiar Docker workflows.
So the next time you are tempted to dismiss small models as impractical, remember that with some fine-tuning they can become exceptionally useful.
We’re Building This Together!
Docker Model Runner is a community-driven project, and its future is shaped by contributors like you. If you find this tool beneficial, consider visiting the GitHub repository. Show your support by giving it a star, experimenting with your ideas, and contributing to its development. Whether it’s improving documentation, fixing bugs, or introducing new features, every contribution counts. Let’s shape the future of model deployment together!
For those interested in exploring further, you can start with Docker Offload for GPU on demand and delve deeper into the possibilities it offers.
Learn More
For additional information and to continue your journey with Docker and model fine-tuning, refer to the resources available on the Docker website and related documentation.