Running AI Locally Changed How I Think About Software
What happens when you stop renting GPUs and start feeling constraints. A story about Colab bills, CUDA crashes, and why friction is the best teacher.
I kept recharging my Colab subscription until I realized I was renting confusion.
The Cloud Trap
I started with Google Colab Pro.
The goal was simple: train a binary classifier, then work my way up to fine-tuning small LLMs using HuggingFace.
Colab made it easy to start. GPU in the browser. No setup. No hardware. Just code.
But the sessions kept dying.
I'd be mid-training, tweaking hyperparameters, watching loss curves - and the runtime would disconnect. Progress lost. Re-upload. Re-run. Wait.
Then the compute units ran out.
I recharged.
Ran out again.
Recharged again.
Somewhere around the third top-up, I stopped and did the math.
I was paying cloud rent for a GPU I didn't control, on a timeline I couldn't predict, debugging problems I couldn't see.
So I switched to my own machine.
The Crash
I tried fine-tuning Mistral 7B with LoRA using the HuggingFace PEFT library.
The script started.
Tokenizer loaded.
Model loaded.
Then:
CUDA out of memory. Tried to allocate 2.14 GiB.
GPU 0 has a total capacity of 8.00 GiB...
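The arithmetic behind that error is unforgiving. A back-of-envelope sketch makes it obvious (the ~7.2B parameter count is a rough assumption, not an exact figure for any particular build):

```python
# Rough VRAM needed just to hold a Mistral-7B-class model's weights at fp16.
# ~7.2B parameters is an assumption; exact counts vary by build.
PARAMS = 7.2e9
BYTES_PER_PARAM_FP16 = 2   # fp16/bf16 store 2 bytes per parameter

weights_gib = PARAMS * BYTES_PER_PARAM_FP16 / 2**30
print(f"weights alone: {weights_gib:.1f} GiB")  # well past 8 GiB
```

Half-precision weights alone overshoot an 8 GiB card before a single activation or optimizer tensor is allocated.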
I stared at the terminal.
Googled the error.
Every answer said the same thing:
Reduce batch size. Use gradient checkpointing. Or get more VRAM.
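That advice maps onto a handful of concrete knobs. Sketched as a plain dict in HuggingFace TrainingArguments terms (the key names follow the HF API; the values are assumptions for an 8 GiB card, not a verified recipe):

```python
# Low-memory training settings, sketched as a dict. Key names mirror
# HuggingFace TrainingArguments; values are assumptions, not a tested recipe.
low_memory_args = {
    "per_device_train_batch_size": 1,   # smallest real batch the GPU sees
    "gradient_accumulation_steps": 16,  # simulate an effective batch of 16
    "gradient_checkpointing": True,     # recompute activations instead of storing them
    "fp16": True,                       # halve tensor sizes versus fp32
}

effective_batch = (low_memory_args["per_device_train_batch_size"]
                   * low_memory_args["gradient_accumulation_steps"])
print(f"effective batch size: {effective_batch}")
```

Gradient accumulation trades wall-clock time for memory: the optimizer steps once per 16 forward passes, so training behaves close to a batch of 16 while the GPU only ever holds one sample's activations.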
I tried the first two.
Still crashed.
That's when I understood something that no tutorial had taught me:
The model doesn't care about your ambition. It cares about memory.
What the Crash Taught Me
Running locally stripped away every abstraction I'd been hiding behind.
VRAM matters more than compute.
You can have a powerful GPU and still fail because:
- The model doesn't fit
- Memory fragments over time
- Multiple processes compete silently
Raw FLOPS mean nothing if there's nowhere to put the tensors.
There's no autoscaling to save you.
In Colab, when something crashed, I blamed the platform. Restarted. Tried again.
Locally, the crash was mine. The constraints were mine. There was nowhere to hide.
Constraints are teachers.
Once I stopped fighting the memory limits and started designing around them, everything changed.
Smaller batch sizes. Gradient checkpointing. Quantization. Offloading.
Each workaround taught me more than the original tutorial.
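Quantization is the workaround with the biggest payoff, and the same parameter arithmetic shows why (the precision sizes are standard; the ~7.2B count remains an assumption, and 4-bit formats carry small scaling overheads ignored here):

```python
# Weight footprint of a ~7B-parameter model at common precisions.
# 4-bit formats like NF4 add small scaling-factor overheads, ignored here.
PARAMS = 7.2e9
GIB = 2**30

for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name:>5}: {PARAMS * bytes_per_param / GIB:5.1f} GiB")
```

At 4 bits the weights finally leave room for activations and adapter state on an 8 GiB card, which is the niche QLoRA-style setups occupy.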
I'd Learned This Before
The funny thing is - this wasn't new.
Years earlier, at WalletHub, I'd learned the same lesson in a different domain.
We had databases, Elasticsearch, SSR pages, aggressive SEO targets, and constant security pressure.
I thought:
- The database stores data
- Elasticsearch makes search faster
- SSR makes pages visible
- Security is something you patch
Reality was harsher.
Indexing wasn't storage - it was a system. Search was adversarial. SEO was a feedback loop. Security fixes regressed unless the architecture was defensive by design.
Our sites were attacked constantly. Bugcrowd reports. HackerOne submissions. Open-source CVEs every month.
I didn't fail because I wrote bad code.
I failed because I misunderstood how hostile real systems are.
Local AI sharpened a lesson I'd already paid for:
Abstractions hide complexity. But complexity doesn't disappear. It waits.
Before vs After
Before running AI locally:
- Assumed resources were elastic
- Blamed platforms for failures
- Treated constraints as bugs
After:
- Resources are finite
- Failures are information
- Constraints are curriculum
These lessons apply far beyond AI.
They shape how I think about checkout flows, API rate limits, frontend performance, infrastructure contracts - anything that runs under pressure.
Why I Keep Doing It
I don't run AI locally to replace the cloud.
I do it to stay honest.
To remember that:
- Abstractions have costs
- Performance is contextual
- Systems fail physically
- And understanding the machine still matters
That perspective has made me a better engineer - everywhere else.
If You're Curious
You don't need a data center.
You don't need the latest GPU.
You just need enough hardware to feel:
- Constraints
- Friction
- Trade-offs
Start small.
Run Stable Diffusion locally. Or try a LoRA fine-tune on a 7B model. Push it until it crashes.
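If the LoRA route tempts you, the parameter math explains why it works on modest hardware. For a single weight matrix (the 4096x4096 shape is an illustrative assumption; rank 8 is a common default):

```python
# Trainable parameters for one weight matrix: full fine-tune vs LoRA.
# Shapes are illustrative; rank 8 is a common starting point.
d_in, d_out, rank = 4096, 4096, 8

full_params = d_in * d_out            # full fine-tune: update W directly
lora_params = rank * (d_in + d_out)   # LoRA: train A (d_in x r) and B (r x d_out)

print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"reduction: {full_params // lora_params}x")
```

Only the small A and B matrices get gradients and optimizer state, which is why the fine-tune can fit even when the base weights barely do.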
That crash is the curriculum.
That's where real learning starts.