Running DeepSeek-R1 Distilled Locally: A Game Changer


The DeepSeek-R1 distilled Qwen 32B quantized to FP4 (try saying that name three times fast!) is a remarkable achievement in local AI deployment. What's truly incredible is that I can run this model on my computer with a single RTX 4090, and it outperforms GPT-3.5/4, models that were state of the art just a couple of years ago.
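The back-of-envelope math for why this fits on one card is simple. A minimal sketch (the runtime-overhead note is a rough assumption, not a measured figure):

```python
# Rough VRAM estimate for a 32B-parameter model quantized to 4 bits.
params = 32e9           # 32 billion parameters
bits_per_param = 4      # FP4 quantization
weight_gb = params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"weights alone: ~{weight_gb:.0f} GB")   # ~16 GB

# A 4090 has 24 GB of VRAM, leaving headroom for the KV cache and
# activations on top of the ~16 GB of weights.
```

At FP16 the same weights would need ~64 GB, which is why quantization is what makes single-GPU deployment possible here.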

Performance Context

While my vibes check doesn't have this beating GPT-4o and o1 yet, and Claude 3.5 remains a superior coding model, that doesn't diminish how remarkable this is. Having access to a model of such high caliber running entirely offline on my local machine, with no data center needed, is a significant milestone.

It's worth noting that it's slower and doesn't perform quite as well as the Llama 70B distilled version running on Groq. However, that's not exactly a fair comparison - Groq has data centers filled with custom LPU chips specifically designed for this purpose.

Standout Use Case

I've found that these "thinking models" excel particularly at Tool/Function Calling. Even the smaller versions of the model perform impressively well at this task, making them particularly useful for specific programmatic applications.
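For the curious, tool calling with a local model looks roughly like this. This is a hedged sketch: the `get_weather` tool, its registry, and the shape of the tool-call object are illustrative assumptions based on the common OpenAI-compatible format that local servers (e.g. Ollama, llama.cpp) expose, not code from the post:

```python
import json

# Hypothetical tool schema you would pass to a local, OpenAI-compatible
# chat endpoint. The model decides when to emit a call to it.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Local registry mapping tool names to Python callables.
REGISTRY = {"get_weather": lambda city: f"Sunny in {city}"}

def dispatch(tool_call: dict) -> str:
    """Run the function the model asked for and return its result."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    return REGISTRY[fn["name"]](**args)

# The model's response contains tool_call objects shaped like this:
sample_call = {"function": {"name": "get_weather",
                            "arguments": json.dumps({"city": "Austin"})}}
print(dispatch(sample_call))  # Sunny in Austin
```

The result string is then sent back to the model in a follow-up message so it can compose its final answer.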

This kind of capability running locally represents a significant step forward in democratizing AI technology, allowing developers to work with powerful models without relying on cloud services or external APIs.

As always, feel free to connect with me on LinkedIn or follow me on X @groffdev.

Want to Chat About AI Engineering?

I hold monthly office hours to discuss your AI Product, MCP Servers, Web Dev, systematically improving your app with Evals, or whatever strikes your fancy. The available times are odd, since they fall on weekends and before/after my day job, but I offer this as a free community service. I may create anonymized content from our conversations, as they often make interesting blog posts for others to learn from.

Book Office Hours