This is the third blog in a four-part series on DataOps and AIOps. Read the first blog here and the second blog here.
If you’ve been working with large language models (LLMs), you’ve likely faced this dilemma: Should I use a big, powerful, but slow and expensive LLM or a smaller, faster, and cheaper one that might not be as good at the specific task?
It’s a classic tradeoff—on one hand, large LLMs deliver incredibly high-quality results, but at the cost of speed and resources. On the other hand, smaller LLMs are quick and cost-effective, but they often miss the mark on complex tasks. So, where’s the sweet spot? And more importantly, how do you handle this when you need to process millions of data points each day?
The typical solution most people gravitate toward is fine-tuning or Retrieval-Augmented Generation (RAG) on top of a smaller model. Essentially, you’re building a fast and cheap model tailored to a specific task, which sounds great in theory. But there’s a catch: Where do you get the data to fine-tune the model in the first place?
If you’re dealing with large amounts of data and need high-precision summaries or feature extraction, relying on a small model out of the box won’t cut it. You need a method that gives you the power of large models but the efficiency of small models. And here’s how you do it...
The Threefold Solution to Optimize Large and Small LLMs
1. Leverage a Large, Powerful LLM to Create a High-Quality Dataset
Start by using a large, powerful LLM (the same kind you’re trying to avoid using at scale) to process a manageable number of your data points—maybe a few thousand. This model can provide high-quality summaries and feature extractions that would otherwise be hard to achieve with a smaller model. Sure, it’s slow and expensive, but for this limited batch, it’s worth it. Think of this as your “training dataset generator.”
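To make this concrete, here is a minimal sketch of what that dataset-generation step might look like. It assumes the OpenAI Python SDK and an `OPENAI_API_KEY` in your environment; the model name, prompt, and sample records are illustrative placeholders, not a prescribed setup.

```python
# Minimal sketch: have a large, high-quality model label a few thousand
# representative records, producing prompt/completion pairs for fine-tuning.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; the model name, prompt,
# and placeholder records below are illustrative.
import json
from openai import OpenAI

client = OpenAI()

# In practice, sample a few thousand representative data points from production.
records = [
    "<raw data point 1>",
    "<raw data point 2>",
]

def summarize(text: str) -> str:
    """Ask the large model for a concise, high-quality summary."""
    response = client.chat.completions.create(
        model="gpt-4o",  # large, slow, expensive -- acceptable for a limited batch
        messages=[
            {"role": "system", "content": "Summarize the record in two sentences."},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Write prompt/completion pairs that the smaller model will be fine-tuned on.
with open("training_data.jsonl", "w") as f:
    for record in records:
        pair = {"prompt": record, "completion": summarize(record)}
        f.write(json.dumps(pair) + "\n")
```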
2. Fine-tune a Smaller, Cheaper Model
Once you’ve created your high-quality dataset from the large LLM, the next step is to use that data to fine-tune a much smaller, cheaper model. Fine-tuning or using a RAG approach allows you to tailor the smaller model specifically to your task, making it more efficient and cost-effective for production use.

What often works well here is fine-tuning a smaller version of the large model you used in step one. For example, if you started with a GPT-3-like model for the initial data generation, you might fine-tune GPT-2 or a smaller variant to handle your larger dataset at scale. The smaller model can then handle the high-volume production tasks (millions per day) with the precision it learned from the larger model’s dataset.
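As a rough illustration, the fine-tuning step could look something like the sketch below, which trains a small open model on the JSONL file from step one using Hugging Face transformers. The model choice, prompt format, and hyperparameters are assumptions, not recommendations.

```python
# Minimal sketch: fine-tune a small causal LM (distilgpt2 here, purely as an
# example) on the prompt/completion pairs from step 1. Assumes the Hugging Face
# transformers and datasets libraries; hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "distilgpt2"  # small, cheap model to specialize
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files="training_data.jsonl")["train"]

def tokenize(example):
    # Concatenate prompt and completion so the model learns the mapping.
    text = example["prompt"] + "\n###\n" + example["completion"]
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="small-summarizer",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("small-summarizer")  # this model serves the high-volume workload
```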
3. Use Another Large LLM for Testing and Validation
Here’s where things get interesting. Don’t just trust the small model blindly—use a different large, powerful LLM to validate its performance. This LLM acts as your judge: feed it the inputs and outputs from your fine-tuned small model, and let it score how good those outputs are. If the results hit a certain quality threshold, the smaller model is good to go. This is an essential step because it helps ensure your small model maintains the high precision required for your task.

Why a different LLM? Using a separate model reduces the risk of overfitting to the original LLM’s biases or weaknesses, giving you a fresh perspective on how well your small model performs. Once it passes this test, you can confidently deploy the smaller model in production.
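Here is one way that judging step might be wired up. The sketch below uses the Anthropic SDK purely to illustrate "a different model family"; the judge model, scoring rubric, threshold, and the `validation_pairs` sample are all assumptions.

```python
# Minimal sketch: score the small model's outputs with a judge LLM from a
# different family. The judge model, 1-5 rubric, 4.0 threshold, and the
# placeholder validation_pairs below are illustrative assumptions.
import anthropic

judge = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# In practice, use a held-out sample of (input, small-model output) pairs.
validation_pairs = [
    ("<raw data point>", "<summary produced by the fine-tuned small model>"),
]

def score_output(source_text: str, summary: str) -> int:
    """Ask the judge to rate a summary from 1 (poor) to 5 (excellent)."""
    prompt = (
        "Rate the following summary of the source text on a 1-5 scale for "
        "accuracy and completeness. Reply with a single digit.\n\n"
        f"Source:\n{source_text}\n\nSummary:\n{summary}"
    )
    response = judge.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=5,
        messages=[{"role": "user", "content": prompt}],
    )
    return int(response.content[0].text.strip()[0])

scores = [score_output(src, out) for src, out in validation_pairs]
if sum(scores) / len(scores) >= 4.0:  # quality gate before deployment
    print("Small model passes validation; promote it to production.")
else:
    print("Below threshold; keep iterating on the fine-tune.")
```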
Automating and Industrializing the Process with DataOps
Of course, this isn’t a one-and-done process. Your data is constantly changing, and new models are being released almost weekly. You can’t just fine-tune a model once and call it a day. To stay ahead, you need to regularly retrain and revalidate your models—and that’s where automation comes in.
By using DataOps.live and platforms like Snowflake Cortex, you can automate this entire process—from generating the initial dataset to fine-tuning and revalidating models on a regular basis. This means you can retrain and redeploy your small models as often as needed, ensuring they always meet the highest standards without manual intervention.
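To give a flavor of the in-warehouse piece, here is a sketch that calls Snowflake Cortex’s COMPLETE function from Python via the Snowflake connector to generate summaries directly over a table—for example, as the dataset-generation step of the loop. The connection parameters, table and column names, and model choice are placeholders, and the surrounding DataOps.live orchestration (scheduling, fine-tuning jobs, validation gates) is not shown here.

```python
# Minimal sketch: run the dataset-generation step inside Snowflake using the
# Cortex COMPLETE function. Connection parameters, the RAW_EVENTS table, the
# RAW_TEXT column, and the model name are illustrative placeholders; the
# DataOps.live pipeline that schedules and validates this is not shown.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

query = """
    SELECT id,
           SNOWFLAKE.CORTEX.COMPLETE(
               'mistral-large',
               'Summarize this record in two sentences: ' || raw_text
           ) AS summary
    FROM raw_events
    LIMIT 1000
"""

cur = conn.cursor()
try:
    cur.execute(query)
    for record_id, summary in cur.fetchall():
        print(record_id, summary)
finally:
    cur.close()
    conn.close()
```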
With the combination of DataOps.live and Snowflake Cortex, you can stay agile, efficient, and ready to scale without sacrificing the precision you need.
Learn how to operationalize Snowflake Cortex in this blog and video.
In Conclusion
When it comes to balancing large, slow, and expensive models against small, fast, and cheap ones, the key isn’t choosing one or the other—it’s using both in the right way. By leveraging a powerful LLM to create a high-quality dataset, fine-tuning a smaller model for production, and validating it with another large LLM, you get the best of both worlds: precision, speed, and cost-efficiency.
When you automate this entire process using tools like DataOps.live and Snowflake Cortex, you ensure that your models stay up-to-date, no matter how fast your data or the AI landscape evolves.