Managing Machine Learning Projects Wk 1 - Identifying Opportunities for Machine Learning

Welcome to Managing Machine Learning Projects! This is the second of three courses in the AI Product Management Specialization series. Learn why most machine learning projects fail and how to de-risk them with small experiments - many of these follow the same process Product Managers use in discovery to validate product ideas.

What's Covered:

Finding good machine learning problems

Key Takeaways

Find real problems that have a large impact
Machine Learning does not solve all problems well, and it needs relevant quality data
Employ good old fashioned product management principles in problem definition and idea discovery
Start with a simpler set of rules (heuristics) before machine learning

Technical terms:

Heuristic
Augmenting

Managing Machine Learning Projects covers what potential pitfalls to avoid and how to set up an ML project for success.

At the conclusion of this course, you should be able to:

1) Identify opportunities to apply ML to solve problems for users

2) Apply the data science process to organize ML projects

3) Evaluate the key technology decisions to make in ML system design

4) Lead ML projects from ideation through production using best practices

Below is an overview of each week:

Identifying Opportunities

Most machine learning projects fail because they never tackled a good enough problem: one that machine learning can improve in a cost-effective way.

87% of machine learning projects fail

Sometimes it's due to technical challenges - but most of the time it's because of:

Pressure to use AI in any way possible
Jumping right into model building instead of problem definition and validation

Really, you can argue that many AI projects fail for the same reasons many products fail.

3 Questions to Ask Before Starting an AI Project

Is there a real user problem? This is usually where things fail. Internal pressure pushes initiatives without first validating there's a market need or problem to solve.
Is Machine Learning a good fit for solving the problem? Machine Learning models are good at certain things but they're not capable of everything. There's also the question of whether you have enough quality data to train the model on at all.
Is there enough of an impact to pursue a solution? As with any good product, opportunities that can create a lot of impact give a much better ROI on your time and energy.

Identifying Problems

Finding user pain points for AI projects is the same as finding pain points for products - here's a good overview of different methods:

1:1 interviews or focus groups with 6-8 users
Identify patterns, ask how they're solving their problem today
Find gaps in their current solution for opportunities to provide a better one

Observing problems in context

Field studies where you job shadow someone in their environment and find opportunities to improve
Users may not be aware of problems or of opportunities to improve
Can also be a fly on the wall to listen, observe, and find ways to make someone's job easier
Testing platforms like dscout diary studies also allow for remote field research, but being able to observe someone directly is likely to give the best results

Problems Good for Machine Learning (with Current Tech)

Not all problems can be solved with the current state of AI and machine learning. Examples of problems that are easy vs hard/nearly impossible to solve with ML:

Easy for ML	Hard for ML
Classifying objects e.g. image identification, spam detection, identifying diseases from medical scans	Handling very long-term dependencies - where data from much earlier context is important e.g. writing the next best-selling novel with consistent characters and plotlines
Recommendations and personalization	Handling multimodal inputs (e.g. image, audio and text) at the same time
Predictions	Decisions that require ethical judgement

It also depends on whether we have the data needed to solve the problem.

Impact vs Costs of ML Project

How much return or business impact can this ML solution deliver? Would it be worth the computational and maintenance costs to do it, or can a simpler solution be good enough?

Takeaway: Find real problems with high impact, that are a good fit for machine learning models, for which you can access the data needed.

Understand the Problem from Google Machine Learning Foundations

The post is worth reading on its own, but here are key takeaways.

Machine Learning can be broadly grouped into Predictive AI vs Generative AI.

Predictions should drive action - there's little value in a model that does not drive action

Always start with a heuristic (a simple quick solution) before attempting to use Machine Learning - a heuristic can be a product chart filter or a list of top selling products instead of going straight for a recommendation engine. The heuristic should be your baseline benchmark of performance to beat with an ML model.

An ML model will usually outperform a heuristic, but by how much, and at what cost?
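As a rough illustration of treating a heuristic as the benchmark, the sketch below builds a "top selling products" recommender and scores it on later purchases. The purchase data, the `hit_rate` metric, the `min_uplift` threshold and the 0.80 model score are all made-up assumptions for illustration, not figures from the course or the Google post.

```python
from collections import Counter

def top_sellers(purchases, k):
    """Heuristic: recommend the k best-selling products to every user."""
    counts = Counter(item for _user, item in purchases)
    return [item for item, _count in counts.most_common(k)]

def hit_rate(recommendations, held_out):
    """Share of held-out purchases whose item appears in the recommendations."""
    hits = sum(item in recommendations for _user, item in held_out)
    return hits / len(held_out)

def beats_baseline(model_score, baseline_score, min_uplift):
    """True only if the model improves on the heuristic by at least min_uplift."""
    return model_score - baseline_score >= min_uplift

# Toy data (hypothetical): (user, product) pairs.
past_purchases = [
    ("u1", "socks"), ("u2", "socks"), ("u3", "mug"),
    ("u1", "mug"), ("u2", "lamp"), ("u3", "socks"),
]
later_purchases = [
    ("u1", "lamp"), ("u2", "mug"), ("u3", "socks"), ("u4", "socks"),
]

# Score of the heuristic: this is the number an ML model has to beat.
baseline = hit_rate(top_sellers(past_purchases, k=2), later_purchases)

# A hypothetical ML recommender scoring 0.80 against a 0.03 uplift bar.
worth_it = beats_baseline(0.80, baseline, min_uplift=0.03)
```

The point of `min_uplift` is to force the "by how much" question: a model that barely edges out the heuristic may not justify its extra build and maintenance cost.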

Data considerations for your ML model:

Abundance of relevant and useful examples
Consistent and reliable
Trustworthy from a credible source
Available as inputs for your ML model when needed at prediction time - if not, you're better off without that feature
Correctly labeled (no more than a few % of incorrectly labeled data)
Representative of the real world - real user behaviors or real-world phenomena, as much as possible, for the model to train on
Has predictive power (a feature that correlates more strongly with what you're predicting generally has more predictive power). Test predictive power by removing and adding back in a feature to see how much it changes your model performance.
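The remove-and-add-back test for predictive power can be sketched as a small ablation loop. This is a minimal example under stated assumptions: the nearest-centroid classifier, the function names and the toy dataset are all invented for illustration, and any model and metric could stand in for them.

```python
from statistics import mean

def fit_centroids(rows, labels):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Pick the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(train, test, keep):
    """Train and score using only the feature indices listed in `keep`."""
    def select(rows):
        return [[r[i] for i in keep] for r in rows]
    (x_train, y_train), (x_test, y_test) = train, test
    centroids = fit_centroids(select(x_train), y_train)
    hits = sum(predict(centroids, r) == y for r, y in zip(select(x_test), y_test))
    return hits / len(y_test)

def ablation(train, test, n_features):
    """Accuracy lost when each feature is removed, versus using all features."""
    all_feats = list(range(n_features))
    full = accuracy(train, test, all_feats)
    return {
        i: full - accuracy(train, test, [j for j in all_feats if j != i])
        for i in all_feats
    }

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is noise.
train = (
    [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 4.0], [3.2, 2.0], [2.8, 4.0]],
    [0, 0, 0, 1, 1, 1],
)
test = (
    [[1.1, 4.0], [0.9, 2.0], [3.1, 1.0], [2.9, 5.0]],
    [0, 0, 1, 1],
)

drops = ablation(train, test, n_features=2)
```

A large drop when a feature is removed suggests the feature carries predictive power; a drop near zero suggests the model does fine without it.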

Takeaway: ML models are not easy to set up and run - start with a heuristic, evaluate if you have the right kind of data for an ML solution to work, and whether the ROI is there.

Validating Product Ideas

Converging on a solution through many small experiments - validating using the Scientific Method:

Start with a hypothesis
Test it with users
Analyze findings
Make a decision - continue or pivot
Refine hypothesis and repeat

Testing with mockups

Visualize the solution ASAP - you learn the most when testing low-fidelity mockups with real users. Assume technical feasibility for now; at this stage you're testing for viability.

Moving Forward to Product Development

Only after doing all these things should you consider moving into actual development:

Identify a real problem with real business impact
Understand how it's being solved today, and the gaps/opportunities to improve
Confirm that ML is a suitable approach, but find a heuristic first if possible
Converge on a potential solution via experiments with low-fidelity prototypes
Establish initial technical feasibility - even if it's hard, we at least know it's possible

Again, a lot of product management concepts are covered here. Yet 87% of ML projects fail because we tend to skip these key steps when we're pushed to deliver - avoid the temptation. You'll save far more time and resources when you do the initial legwork of correctly identifying a real problem that's a good fit for an ML solution and validating your idea with small experiments.

Takeaway: Don't skip the product validation process.

Benefits of ML in Products

What Machine Learning is good at:

Automation
Prediction
Personalization

Automation via Machine Learning is great at lowering cost and increasing quality in repetitive tasks, but it comes with risks:

Cannot adapt to major changes in its environment
Has no sense of ethics
Who has accountability if things go wrong?

ML is great at ingesting lots of data and finding patterns, allowing it to make personalized recommendations or predictions to drive decisions that would be hard for humans to do on their own. However, automating predictions is not recommended if there is a high cost to being wrong - e.g. medical diagnosis, judging court cases, or job hiring. Keeping a human in the loop, or augmenting human judgement and work quality by pairing them with an AI, can be wiser.
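One common way to keep a human in the loop is to act automatically only on confident predictions and send everything else to a person. The sketch below assumes the model exposes a confidence score between 0 and 1; the `route` function, the queue names and the 0.9 threshold are hypothetical choices for illustration.

```python
def route(prediction, confidence, threshold=0.9):
    """Auto-apply confident predictions; send the rest to a human reviewer."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return ("auto", prediction)
    return ("human_review", prediction)

# Hypothetical predictions from a model, as (label, confidence) pairs.
decisions = [route(label, conf) for label, conf in [("approve", 0.95), ("reject", 0.60)]]
```

Where the cost of being wrong is very high, the threshold can be set so that every case goes to a human and the model only supplies a suggestion.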

Heuristics vs Machine Learning

Before starting ML, a heuristic (a simpler solution) should be explored. These could be:

Business rules that are hard coded
Rules of thumb
E.g. using averages, recommending the highest rated product, or always predicting whichever object was most common in the image classification training data
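Two of these rules of thumb fit in a few lines each. The sketch below shows an "always predict the most common class" baseline and an "always predict the average" baseline; the labels and prices are made-up toy data, and the function names are only illustrative.

```python
from collections import Counter
from statistics import mean

def majority_class(train_labels):
    """Rule of thumb for classification: always predict the most common label."""
    return Counter(train_labels).most_common(1)[0][0]

def constant_accuracy(predicted_label, test_labels):
    """Accuracy of predicting the same label for every test example."""
    hits = sum(label == predicted_label for label in test_labels)
    return hits / len(test_labels)

# Toy image labels (hypothetical).
train_labels = ["cat", "dog", "cat", "cat", "bird"]
test_labels = ["cat", "dog", "cat", "bird"]

baseline_label = majority_class(train_labels)
baseline_accuracy = constant_accuracy(baseline_label, test_labels)

# Rule of thumb for numeric targets: always predict the training average.
train_prices = [10.0, 20.0, 30.0]
baseline_price = mean(train_prices)
```

Neither baseline learns anything about individual inputs, which is exactly why they make a useful floor: a model that cannot beat them is not adding value.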

Heuristics have pros and cons. The pros: they're easy to set up, computationally cheap to run and maintain, and easy to understand. The cons: they have to be updated manually if business rules change, they usually don't perform as well as ML models, and they aren't suitable for handling lots of data.

Machine Learning models can be retrained on new data, handle a wider variety of problems, deal with large amounts of data and often perform better - but they take more effort and resources to get up and running.

Heuristics should always be the first solution you attempt before going after an ML solution, at the very least to establish a benchmark to compare the ML model against. If there is a valid problem that ML is a good fit for, and there's a huge upside (to offset the higher costs of ML), then ML can be considered.

Takeaway: ML has pros and cons - it's not advisable to automate if the cost of being wrong is very high. For cost effectiveness, start with a heuristic before the ML solution.

Follow or connect with me on LinkedIn: Muxin Li