How to choose the right AI model for your product
AI isn’t a new concept – the term has been around for decades, and you can go back thousands of years and find stories of golems and other man-made intelligent creatures. We’ve come a long way since then, and the last few years have seen massive advances in AI and its capabilities.
What many think of as “AI” is actually an ecosystem of techniques and models, with subsets like machine learning (ML) and deep learning (DL), all with different capabilities and applications.
This diverse landscape means there’s no one-size-fits-all AI solution. Instead, choosing the right model depends on understanding each tool’s strengths, limitations and how it fits your specific product goals.
We’re going to take a closer look at the larger AI landscape in future issues; as we said last week, being willing to go a bit deeper into the technical details will help you become better product managers in the long term.
However, for now, we’re going to focus on large language models (LLMs) and how they can help you create more impactful products. More specifically, we’re going to focus on the GPT models.
GPT-3 was ideal for five-minute jobs. Think of tasks such as:
Programming assistance
Quick fact retrieval
Simple content creation
Email responses
Grammar and language support
Customer support chatbots
Basic code review
GPT-4 took things up a level and enabled five-hour tasks:
In-depth content summarization
Data analysis and visualization
Resource planning
Workflow automation
Personalized customer interactions
Assistive product experiences
GPT-4 also introduced the idea of agentic workflows, where the AI could execute a series of connected actions, follow conditional paths and make decisions based on previous steps, enabling more complex and interactive user experiences.
Then GPT-4o launched, offering more natural interactions and allowing people to use any combination of text, audio, image and video as input.
Most recently, we had the o1-preview launch, bringing far more advanced reasoning capabilities.
All these advancements have taken place within a short timeframe, which is very exciting as we look ahead at the path toward AGI (see last issue). But what does this all mean for us, in practical terms, as product managers?
Choosing the right model for your use case
Whether it’s regularly upgrading to the latest phone model or keeping up with what’s new in fashion, people tend to like their shiny new toys.
As product managers, there’s a similar tendency to think you absolutely have to use the latest and most advanced model – but that’s not necessarily true.
Theoretically, nothing is stopping you from starting with GPT-3.5 and then migrating to a higher intelligence model as you advance. Alternatively, if you’re deploying to market, you might want to start with a higher class of model and then try fine-tuning a lower-class model to take advantage of lower costs and improved latency.
Perhaps most importantly, don’t get stuck in the mindset that you can only use one model.
For example, in their recently released NotebookLlama, Meta used several different models to tackle the different tasks:
Llama 3.2-1B for processing PDF documents
Llama 3.1-70B for writing the podcast
Llama 3.1-8B for dramatizing the podcast
Parler-TTS for generating the audio
When you’re trying to decide what model to use for your product, think about what particular steps require (or don’t require) high levels of intelligence or reasoning. The simple act of being able to think through those implications and help direct your team is going to be very meaningful.
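One way to make that thinking concrete is a simple task-to-model routing table. This is a minimal sketch; the task names and model assignments below are illustrative assumptions, not a prescription for any particular product:

```python
# Route each workflow step to the cheapest model that can handle it,
# falling back to the strongest model for anything unclassified.
# Task names and model choices here are hypothetical examples.

MODEL_FOR_TASK = {
    "extract_text": "gpt-4o-mini",   # simple parsing: low intelligence needed
    "write_script": "gpt-4o",        # creative long-form: use the stronger model
    "summarize": "gpt-4o-mini",      # routine compression of existing text
    "generate_audio": "tts-1",       # a dedicated text-to-speech model
}

def pick_model(task: str, default: str = "gpt-4o") -> str:
    """Return the model assigned to a task, defaulting to the strongest."""
    return MODEL_FOR_TASK.get(task, default)

print(pick_model("extract_text"))   # gpt-4o-mini
print(pick_model("unknown_task"))   # gpt-4o (fallback)
```

Even a table this simple forces the useful conversation: which steps genuinely need the expensive model, and which don’t.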
Along with strategic acumen and user understanding, the ability to understand the different models and their ideal applications will be one of the main factors that differentiate the best product managers.
Start with the smartest
As a heuristic, we recommend initially starting with the highest intelligence model and then working your way down to the lowest intelligence models and seeing how that impacts performance.
This follows the same pattern humans have always used when facing new challenges.
If you’re trying to do something new that’s never been attempted before, who do you want on your team? Generally, you’re going to give that task to the most capable person, the one who’s most likely to succeed. Once they’ve figured it out, you use their approach to train more people, and eventually the task becomes commoditized enough that everybody can do it.
The same is true for AI models; start by applying the highest intelligence model to get the highest level of accuracy. You can then test it against a smaller model to see whether that accuracy holds or what the trade-offs are going to be in terms of latency and cost.
Latency, accuracy and cost are the three competing factors in AI model performance. Enhancing one aspect typically comes at the expense of the others, so finding the right balance is key.
So, what model should I use?
It’s easy to get overwhelmed by the sheer number of AI models available, even just within the OpenAI models. While we started by talking about GPT-3 and GPT-4, things have already moved on from there.
Today, there are two main options to consider from OpenAI: GPT-4o and GPT-4o mini (with o1-preview and o1-mini still in Beta at the time of writing).
These have replaced GPT-4 and GPT-4 Turbo as OpenAI’s flagship models. The “o” in 4o stands for “omni”, meaning it can accept text or image inputs. It also has the same intelligence as GPT-4 Turbo, but is much more efficient, generating text twice as fast for half the price.
Overall, the cost of GPT-class models has dropped by 99% since launch, making them increasingly accessible and turning cost into a secondary factor.
Latency has also greatly improved. There are two main parts you need to consider:
Time to first token. Influenced by input size and model type, this is the time taken to start generating output
Time between tokens. Determined by model and infrastructure, this impacts the overall speed of longer outputs
You can’t do much to optimize time between tokens, but you can often reduce time to first token (for example, by trimming your inputs). If you have a longer output, it’s going to take longer to get the full result. That’s why streaming has become so popular: showing users the results as they come in can make a big difference when your outputs are long.
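The two latency measures above are easy to instrument yourself. Here’s a minimal sketch that works on any token iterator; the `fake_stream` generator is a stand-in for a real streaming API response so the example runs offline:

```python
import time

def measure_latency(token_stream):
    """Measure time to first token and average time between tokens
    for any iterator that yields tokens (e.g. a streaming response)."""
    start = time.perf_counter()
    first = None
    gaps = []
    prev = None
    for _token in token_stream:
        now = time.perf_counter()
        if first is None:
            first = now - start          # time to first token
        else:
            gaps.append(now - prev)      # time between tokens
        prev = now
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return first, avg_gap

def fake_stream():
    """Hypothetical stand-in for a model's streamed output."""
    time.sleep(0.05)                     # model "thinking" before first token
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)                 # steady per-token generation

ttft, avg_gap = measure_latency(fake_stream())
```

In a real product you would wrap your provider’s streaming response the same way, and track both numbers per model in your dashboards.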
So, in line with our advice to start with the smartest, we usually start with the 4o model, building an evaluation set and comparing it with the smaller model. There are two main types of evaluations (evals) for testing models:
Challenge evals, testing new or challenging tasks to assess a model's capability
Regression evals, ensuring that previously correct outputs remain accurate
For product managers, creating and maintaining both types of evals is crucial to ensuring model reliability and alignment with user needs.
So, if we get close to our target accuracy on the smaller model (e.g., 80% on 4o versus 78% on 4o mini), switching is usually acceptable (depending on how much that difference matters to overall performance).
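An eval harness for this comparison can be very small. The sketch below stubs out the model call (`ask_model` is a hypothetical stand-in, hard-coded so the example runs offline) and scores two models on the same labeled set:

```python
# Minimal eval harness comparing two models on the same labeled set.
# `ask_model` is a stub: in practice it would call your provider's API.

def ask_model(model: str, question: str) -> str:
    # Pretend the smaller model misses the harder reasoning question.
    answers = {
        ("gpt-4o", "capital of France?"): "Paris",
        ("gpt-4o", "17 * 23?"): "391",
        ("gpt-4o-mini", "capital of France?"): "Paris",
        ("gpt-4o-mini", "17 * 23?"): "401",
    }
    return answers[(model, question)]

EVAL_SET = [
    ("capital of France?", "Paris"),   # regression eval: must stay correct
    ("17 * 23?", "391"),               # challenge eval: probes reasoning
]

def accuracy(model: str) -> float:
    correct = sum(ask_model(model, q) == expected for q, expected in EVAL_SET)
    return correct / len(EVAL_SET)

print(accuracy("gpt-4o"))       # 1.0
print(accuracy("gpt-4o-mini"))  # 0.5
```

Real eval sets are larger and use fuzzier matching than exact string equality, but the shape — a fixed labeled set, the same questions to every candidate model, one accuracy number each — is exactly this.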
Our rule of thumb? Start with accuracy, then consider latency and cost.
Using distillation to cheat the system
The good news is that, with tools like distillation, you can have your cake and eat it too.
Distillation in AI is essentially like teaching a smaller, simpler model to mimic a larger, more complex one. Here’s how it works in simple terms:
Start with a large model. You have a large, powerful AI model that can handle complex tasks accurately but it may be slow and costly to use.
Generate data from the large model. The large model generates a lot of examples or “answers” for different inputs, essentially creating a dataset of how it would respond to various questions or tasks.
Train a smaller model. You then take a smaller, faster model and train it to produce similar responses by feeding it the examples generated by the large model. In a way, the smaller model is learning from the larger model’s experience.
Achieve similar performance. The goal is for the smaller model to perform nearly as well as the larger model on the same tasks but with much lower computational costs and faster responses.
Distillation in AI allows you to use a smaller, efficient model that mimics a larger model’s outputs without sacrificing much accuracy, ideal for real-world applications requiring speed and cost efficiency.
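The data-generation step at the heart of distillation can be sketched in a few lines. Here `teacher` is a hypothetical stand-in for a call to the large model; the output is chat-style (prompt, completion) pairs in the JSONL shape commonly used for fine-tuning:

```python
import json

def teacher(prompt: str) -> str:
    """Stub for the large model's high-quality answer."""
    return f"Detailed answer to: {prompt}"

prompts = ["Summarize this invoice", "Categorize this transaction"]

# Build a fine-tuning dataset from the teacher's responses.
dataset = [
    {"messages": [
        {"role": "user", "content": p},
        {"role": "assistant", "content": teacher(p)},
    ]}
    for p in prompts
]

# One JSON object per line: the usual fine-tuning file format.
jsonl = "\n".join(json.dumps(row) for row in dataset)
```

You would then hand this file to your provider’s fine-tuning pipeline to train the smaller model on the larger model’s answers.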
As you compare and evaluate the different models, though, don’t expect to get the same response every time. One aspect that often surprises users is the stochastic, or probabilistic, nature of these models.
This variability can be challenging for those used to traditional programming and deterministic systems, where you can reliably predict the output. With AI, a new model release might solve previous issues but also introduce new quirks, which makes continuous evaluation essential.
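A toy example makes the stochastic behavior tangible: models sample the next token from a probability distribution, and a temperature setting reshapes that distribution. The logits below are made up for illustration; this is a sketch of the sampling idea, not any provider’s implementation:

```python
import math
import random

def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Sample a next token from made-up logits at a given temperature."""
    if temperature == 0:
        return max(logits, key=logits.get)   # greedy: always the top token
    # Softmax with temperature: lower temperature sharpens the distribution.
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    tokens = list(weights)
    probs = [weights[t] / total for t in tokens]
    return rng.choices(tokens, weights=probs, k=1)[0]

logits = {"Paris": 2.0, "London": 1.0, "Rome": 0.5}
rng = random.Random(42)

greedy = [sample_token(logits, 0, rng) for _ in range(5)]     # always "Paris"
sampled = [sample_token(logits, 1.0, rng) for _ in range(5)]  # may vary run to run
```

Even at temperature 0, real deployed models aren’t perfectly deterministic, which is another reason the regression evals above are worth keeping around.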
The life-changing power of AI
AI is about more than just generating text, images or videos, though – it has the potential to truly change lives for the better.
In one of my favorite applications, doctors are using AI to help patients suffering from sudden or degenerative speech conditions to recover their voices.
Since OpenAI’s Voice Engine requires only a short audio sample, doctors Fatima Mirza, Rohaid Ali and Konstantina Svokos were able to restore the voice of a young patient who lost her fluent speech due to a vascular brain tumor, using audio from a video recorded for a school project.
You can find the audio clip on this page. Every time I hear that clip it gives me goosebumps. To hear someone get their voice back is unparalleled. Remember, that’s based on a 15-second clip of unrelated speech, with messy background noise.
This is the really inspiring part. Yes, there are efficiency opportunities, revenue opportunities and lots of different things like that. But the opportunity is also there to move the world in a way that hasn’t been possible before.
We’re truly excited about what the future holds.
Today’s challenge
For this week’s exercise, head over to OpenAI’s models page and familiarize yourself with the different models, their capabilities, their strengths and weaknesses.
Once you have a good grasp of what each model can do, it’s time to apply that knowledge to a real-world product workflow.
Earlier, we looked at NotebookLlama as an example and how they used multiple models to accomplish different tasks such as processing documents, writing scripts and generating audio.
Now it’s your turn. Think about the workflow for a product you’re currently working on. For each step in the workflow:
Identify the type of task and the desired outcome
Select an appropriate model for each task, considering factors like latency, accuracy and cost
If you’re not currently working on a product or you’re looking for something different, you can pick one of the following suggested products:
Personal finance app
Tasks: Transaction categorization, budgeting advice, personalized financial tips, fraud detection and summary generation
E-learning platform
Tasks: Curriculum creation, quiz generation, personalized study recommendations, progress tracking and voice-over narration for video lessons
Healthcare patient portal
Tasks: Symptom checker, appointment scheduling, medical document summarization, patient FAQs and follow-up reminders
E-commerce recommendation engine
Tasks: Product recommendation, customer feedback analysis, upsell suggestions, visual search for products and personalized discount offerings
Virtual event platform
Tasks: Agenda creation, speaker bio summarization, automated session transcripts, highlight reel generation and post-event feedback analysis
Don’t worry so much about finding the perfect model for each task. Rather, the purpose of the exercise is to help you think critically about how to leverage different models at each stage of a workflow to maximize efficiency, quality and cost-effectiveness.
P.S. Want to take your product management career to the next level with the latest in AI knowledge? Check out our #1 rated AI Product Management certified course on Maven. You’ll get the latest insights on how AI is affecting product management, direct access to us for questions and feedback, and an active community of like-minded product managers.
Enjoy your newsletter-exclusive $250 discount with this link.
What did you think of today’s email?
- 😊 Loved it 🧠🧠🧠
- 😐 It was ok 🧠🧠
- 😞 Terrible 🧠