Fine Tuning an AI Model on your Mac
The goal is to teach a tiny local AI to read executive management jargon and output the brutal engineering reality.
Most of us engineers have a Mac machine.
No hard feelings if you have a different preference or use case; I totally respect your choice of hardware.
But this one is for those who own a Mac and know a little Python 🙂
I have a Mac Studio with an M1 Ultra SoC and 64 GB of Unified Memory.
Don’t be jealous; it’s perfectly fine if you own any Mac with an M-series chip and at least 16 GB of Unified Memory.
We are building a Brutally Honest Corporate Translator 😊.
So let’s get started.
Environment Setup
Open your terminal and execute the following package installation.
# Make sure you have Python 3.x installed
# Test using python --version
# You may also try python3 --version
# If your python --version is 3.x
pip install mlx-lm
# if your python3 --version is 3.x
pip3 install mlx-lm
# If you don't have python 3.x
# please install it before installing mxl-lmThis single command installs the core framework without complex compilation steps or dependency hell.
The Model (SLM)
Look at the physical memory limits of your machine before choosing a model.
We often download the largest model available and immediately crash our system.
We need to match the parameter count to the available Unified Memory of our machine.
A 70 billion parameter model requires roughly 140 gigabytes of memory just to load the weights in 16-bit precision, and still around 35 gigabytes even quantized to 4 bits.
You cannot train that on a standard laptop. We need to evaluate smaller, highly optimized architectures designed specifically for edge execution.
Here are the most viable models for local fine tuning today.
Meta Llama 3 8B Instruct
This is the current gold standard for local reasoning. It requires at least 16 gigabytes of unified memory to train comfortably but delivers enterprise grade logic.
Alibaba Qwen 2.5 7B
This is a highly capable alternative that performs exceptionally well on coding tasks and structured data extraction.
HuggingFaceTB SmolLM2 1.7B Instruct
We will use this specific tiny model for our tutorial today. It requires very little memory and will train flawlessly on a basic entry level machine in just a few minutes.
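The memory arithmetic above is easy to sketch in plain Python (no mlx required). The bytes-per-parameter figures are the standard ones: 2 bytes for fp16/bf16 weights, 0.5 bytes for 4-bit quantization; this estimates only the weights, not optimizer state or activations.

```python
# Back-of-the-envelope memory needed just to hold the weights.
# fp16/bf16 stores 2 bytes per parameter; 4-bit quantization stores 0.5.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # 1e9 params per billion, divided by 1e9 bytes per GB, cancels out
    return params_billions * bytes_per_param

models = [("SmolLM2 1.7B", 1.7), ("Qwen 2.5 7B", 7.0),
          ("Llama 3 8B", 8.0), ("Llama 3 70B", 70.0)]
for name, params in models:
    print(f"{name}: {weight_memory_gb(params, 2):6.1f} GB fp16 | "
          f"{weight_memory_gb(params, 0.5):5.1f} GB 4-bit")
```

On a 16 GB machine the 1.7B model leaves plenty of headroom for the adapter, optimizer state, and activations; the 70B model does not even fit.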
Training Data
AI models learn strictly through pattern recognition and examples.
Create a new directory called data.
mkdir data

Inside this folder you must create three specific files (mlx_lm.lora looks for JSON Lines files with exactly these names):

touch train.jsonl valid.jsonl test.jsonl

We will format our examples using standard prompt and completion keys, one JSON object per line.
Copy the following JSON blocks and paste them into all three of your data files for this engineering experiment.
{"prompt": "We need an agile MVP to synergize our deliverables", "completion": "We are shipping a completely broken prototype on Friday"}
{"prompt": "Let us put a pin in this and circle back when we have more bandwidth", "completion": "I am never going to approve this feature"}
{"prompt": "We are experiencing a temporary degradation of service", "completion": "Production is completely down and the database is on fire"}
{"prompt": "The legacy system requires a paradigm shift", "completion": "We need to delete the entire codebase and start over"}
{"prompt": "We are currently evaluating our strategic resourcing alignment", "completion": "We are planning massive layoffs next month"}
{"prompt": "The ticket is currently blocked by cross functional dependencies", "completion": "I have not started working on this and I am blaming another team"}

I suggest adding more data to these files, which will make the responses more accurate and fun. You can use this as a sample and generate more using ChatGPT, Gemini, etc.
Or you can use the one I have used 😊
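If you prefer to script the files rather than paste by hand, here is a small stdlib-only helper (a sketch; extend the `examples` list with your own pairs, and note that mlx_lm.lora expects the `.jsonl` extension, one JSON object per line):

```python
import json
from pathlib import Path

# Two sample pairs for illustration; in practice the train, valid,
# and test sets should contain different examples.
examples = [
    {"prompt": "We need an agile MVP to synergize our deliverables",
     "completion": "We are shipping a completely broken prototype on Friday"},
    {"prompt": "The legacy system requires a paradigm shift",
     "completion": "We need to delete the entire codebase and start over"},
]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
for name in ("train.jsonl", "valid.jsonl", "test.jsonl"):
    with open(data_dir / name, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")  # one JSON object per line

print("wrote", sorted(p.name for p in data_dir.iterdir()))
```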
Training
Run the following exact terminal command to initiate the training process using our chosen baseline model.
python -m mlx_lm.lora \
--model HuggingFaceTB/SmolLM2-1.7B-Instruct \
--train \
--data ./data \
--iters 500

Look at the mechanics of this command. The framework freezes the massive original weights of the base model and trains only a tiny new adapter layer (LoRA) on top of the frozen parameters.
This efficiency is exactly why it runs comfortably on a laptop without melting the processor or draining the battery.
The Adapters
Once the training step completes successfully, it will create an adapters folder.
This specific folder holds the specialized sarcastic knowledge we just mathematically injected into the system.
We now need to merge this new knowledge permanently into the base model.
Run the following command to fuse the new weights.
python -m mlx_lm.fuse \
--model HuggingFaceTB/SmolLM2-1.7B-Instruct \
--adapter-path ./adapters \
--save-path ./fused_model

This command executes a permanent merge. It takes the massive matrix of the original frozen model and adds the tiny specialized adapter weights directly into it.
The result is a single consolidated model directory.
You no longer need to manage separate adapter files because the new corporate knowledge is permanently baked into the core neural network.
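What fusing does can be sketched in plain NumPy (toy dimensions): fold the adapter product into the base matrix so a single weight remains, and verify the fused model produces exactly the same outputs as base-plus-adapter.

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))   # frozen base weights
A = rng.standard_normal((d, r))   # trained adapter factors
B = rng.standard_normal((r, d))

W_fused = W + A @ B               # the permanent merge: one matrix, no adapters

x = rng.standard_normal(d)
# Base-plus-adapter and the fused matrix give the same answer.
assert np.allclose(x @ W + (x @ A) @ B, x @ W_fused)
print("fused output matches adapter output")
```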
The Fun
We are ready to test the compiled engine.
Execute the following generation command to see your new translator in production.
python -m mlx_lm.generate \
--model ./fused_model \
--prompt "The legacy system requires a paradigm shift"

The model will process the prompt and output the honest translation natively, right on your machine.
Here are some fun responses I got while playing with it:

Prompt: We are adopting a flat organizational structure
Response: Management wants everyone to do the work of three people without a title promotion

Prompt: We value your feedback and have added it to the product backlog
Response: We are ignoring your idea completely

Prompt: The new architecture is highly scalable and future proof
Response: We added Kubernetes to a basic web application and nobody knows how it works anymore

Prompt: We are embracing a fast paced startup culture
Response: You will work weekends and we will not pay you overtime
Conclusion
Before reaching a conclusion, weigh what this exercise cost against what it produced.
You now possess a fully customized and functional model without paying a single cent to an external cloud provider.
The return on investment for utilizing existing local hardware is massive.
While Apple Silicon is absolutely not built for planetary-scale distributed training, it is the undisputed king of local edge engineering.
You own the hardware and you own the intelligence.
Do share your prompts and responses in the comments 😊
If you are curious and want to learn more, watch this video from WWDC 2024.



