We revisit the CRISP-DM process in more detail, with a deeper understanding of the roles and Agile processes that drive projects forward. There are a LOT of similarities with product management, so this should feel familiar.
What's Covered:
CRISP-DM process and real-world case study
Roles and responsibilities in an ML team
Measuring outcomes (business impact) and output (model performance), and when to measure each during the project lifecycle
Key Takeaways:
It is NOT a linear process - very much like product management work, there's lots of ambiguity and lots of experimentation needed to converge on the right solution before deploying it
Domain expertise matters - always talk with your customers, users, domain experts to get a better understanding of the problem space
Relevant, quality data is crucial to ML projects - but it can be hard or expensive to source, it's not always clear which data you should be trying to capture and store, there are privacy issues, and data is too often inconsistently captured
Technical terms:
CRISP-DM
Directional expectation tests
Hindsight scenario testing
Data Science is mostly exploratory - like in Product Management, there are a lot of questions we don't have answers for at the beginning:
What data or features would we need for modeling?
Which algorithms and performance metrics should we use to evaluate them?
This module focuses on organizing ML projects by understanding the following:
CRISP-DM process (Cross Industry Standard Process for Data Mining) - it's been the standard way to organize ML projects for decades, and is flexible enough for any industry
Structure and roles of an ML team
Tools and methodologies for organizing and tracking progress
ML Projects vs Software Projects
Like Product Discovery or finding product-market fit, ML projects do not follow a linear progression - sometimes you have to go back to the drawing board and try something else, but you're always trying to learn by doing a lot of small experiments. Software projects are deterministic - even if requirements change, at least you know whether you've built to the requirements or not.
Compared to running software projects, ML projects have additional complexities:
More skills are needed - team members or roles like Data Scientists and ML Engineers need to be added
Higher technical risks from not knowing which data or features are needed or even which model is needed at the beginning - are you planning infrastructure for a neural network or a simpler regression model?
More risk and uncertainty in general makes it harder to plan and estimate - this is more of an iterative process, due to the exploration and discovery needed to find the right solution
All these things make it harder to show progress - sometimes it's 1 step forward, 2 steps back.
ML projects can completely stall - it's common to spend an entire week on modeling and show no improvement. Meanwhile, software development tends to make progress and move forward with time.
ML and Data Science are more of a science of exploration - finding the right solution to build is the hard part
It's not intuitive for humans to know what's hard vs easy for an ML model. We take vision and hand-eye coordination for granted, but AI struggles with these seemingly simple physical tasks.
ML projects also have to deal with additional challenges due to the complexity of running models based on real world phenomena:
ML models deployed into the real world also have to be retrained if the environment changes, so they require more ongoing support
There is never 100% model performance, so what is an acceptable threshold of error? Some problems are inherently too difficult to push past a given performance benchmark.
Running inference on new data will naturally produce predictions of varying quality.
e.g. Fraud models break when adversaries adapt, language models trained on 2017 text don't perform well on 2018 text
Change management with users is often needed, since ML models (and software applications, I'd argue) tend to require changes in users' workflow
User trust is important - if they don't trust the model works, they won't use it
ML projects need a lot of quality, relevant, cleaned data - it's a huge challenge:
Roughly 80% of an ML project's time is spent on upfront data-related issues; only 20% is spent actually building the model.
Sourcing data, cleaning and correcting issues, fixing missing data, and selecting features take a significant amount of work
Data that is generated is not always available or usable - while medical data would be a good fit for ML models, it's often locked up by privacy issues and is inconsistently collected
Data needs to be relevant to the problem you're solving for, and properly labeled
Best Practices to Handle ML Challenges
Always look at your training data - really look at it, try labeling it with your team, and understand its details and edge cases. For many use cases, it's very unlikely that a model will do better than the rate at which two independent humans agree (a quick way to estimate that rate is sketched after this list).
Build an end-to-end solution ASAP and get it deployed, so you can learn from end users. Justify every additional complexity.
Have a risk management plan in place for when algorithms go south - do an upfront risk assessment with various stakeholders during the early stages, keep humans in the loop during training, deploy changes in batch processes so it's easier to debug what may have caused the issue in the first place.
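To make that "two humans agree" check concrete, here is a minimal sketch using scikit-learn's cohen_kappa_score. The labels and the spam example are made up for illustration, not taken from the course.

```python
# Minimal sketch: estimate how often two independent labelers agree,
# which gives a rough ceiling on achievable model performance.
# The labels below are made-up examples, not real project data.
from sklearn.metrics import cohen_kappa_score

labeler_a = ["spam", "spam", "ok", "ok", "spam", "ok", "ok", "spam"]
labeler_b = ["spam", "ok",   "ok", "ok", "spam", "ok", "spam", "spam"]

# Raw agreement rate: fraction of items where the two labelers match
raw_agreement = sum(a == b for a, b in zip(labeler_a, labeler_b)) / len(labeler_a)

# Cohen's kappa corrects for agreement expected by chance
kappa = cohen_kappa_score(labeler_a, labeler_b)

print(f"Raw agreement: {raw_agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

If two careful humans only agree 80% of the time, it is a strong hint that your model's accuracy target should not be 99%.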
CRISP-DM Data Science Process
1996 - a consortium of European companies created a process that can be applied to any industry. It's still in practice today. Product Managers may recognize this from lean startup methodologies, design thinking, and any number of innovation practices that follow a similar, scientific method.
The CRISP-DM process is not linear - you may even go back through the process multiple times, repeat certain steps, but whatever you do - do NOT skip a step!
Overview of CRISP-DM:
Business Understanding
Write down the problem statement, ensure team is aligned on shared understanding
Know if there are any constraints/hard requirements - e.g. user would need these predictions within 12 hours
Impact and how to measure it - outcome metrics (business or user value) vs output metrics (model performance)
Success targets for your metrics - e.g. reduce costs by 5% (outcome), MSE no higher than 0.15 (output)
Understand the relevant factors that are important to solving the problem with modeling - have domain experts on board
Data Understanding
Gather, Validate, and Explore
Gather data and identify data sources for each factor
Label data to include target labels for model training
Create a set of features for modeling
Perform quality control and data validation
Address missing data, erroneous data, and outliers
Conduct exploratory data analysis (EDA)
Use statistical analysis techniques and visualizations
Identify relationships and patterns between features and output, and among features
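As a rough illustration of the EDA step above, here is a minimal pandas sketch. The DataFrame, feature names, and the "outages" target are made-up stand-ins for real project data.

```python
# Minimal EDA sketch; the tiny DataFrame below is a made-up example
# standing in for real project data (column names are hypothetical).
import pandas as pd

df = pd.DataFrame({
    "peak_wind_gust":  [20.0, 35.0, 50.0, 65.0, 80.0, 30.0],
    "days_since_rain": [1, 3, 10, 14, 2, 7],
    "outages":         [2, 5, 20, 45, 60, 4],
})

# Summary statistics and missing-value counts per column
print(df.describe())
print(df.isna().sum())

# How strongly does each feature correlate with the target?
print(df.corr(numeric_only=True)["outages"].sort_values(ascending=False))

# Simple visualization of a single feature vs. the target
ax = df.plot.scatter(x="peak_wind_gust", y="outages")
ax.figure.savefig("wind_vs_outages.png")
```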
Data Preparation
Note - AutoML has made progress in automating data preparation and processing steps. In my capstone project, I used the Python library scikit-learn and it handled the splitting, training, validation, testing, and model tuning. Even stronger AutoML solutions like auto-sklearn can handle model selection, feature engineering, and optimization. A minimal code sketch of the manual steps follows the list below.
Split data into training and test sets
Define feature set for modeling
Create new features, combine existing features
Perform feature selection to eliminate duplicative or unnecessary features
Prepare data for modeling
Encode categorical variables into numeric values
Scale data if needed
Resolve issues such as class imbalance
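Here is a minimal scikit-learn sketch of the data preparation steps above (splitting, encoding, scaling). The tiny DataFrame and column names are hypothetical.

```python
# Minimal data-preparation sketch with scikit-learn; the tiny DataFrame
# and column names are made-up stand-ins for real project data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "territory":       ["north", "south", "north", "east", "south", "east", "north", "east"],
    "peak_wind_gust":  [20.0, 35.0, 50.0, 65.0, 80.0, 30.0, 55.0, 40.0],
    "days_since_rain": [1, 3, 10, 14, 2, 7, 9, 4],
    "outages":         [2, 5, 20, 45, 60, 4, 25, 8],
})

X = df.drop(columns=["outages"])   # feature set
y = df["outages"]                  # target

# Hold out a test set before any fitting, to avoid leakage
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Encode the categorical variable and scale the numeric ones
preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["territory"]),
    ("numeric", StandardScaler(), ["peak_wind_gust", "days_since_rain"]),
])
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)   # fit on training data only, then apply to test
```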
Modeling
Have good documentation and version control practices in place
Evaluate different algorithms through cross-validation using training data
Document and perform version control
Optimize hyperparameters to balance simplicity, complexity, and performance
Retrain the final model using all available data before testing
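A minimal sketch of the modeling steps above: compare algorithms with cross-validation, tune hyperparameters, and keep the refit best model. It uses synthetic data from make_regression so it runs on its own; a real project would use the prepared training set.

```python
# Minimal modeling sketch: compare algorithms via cross-validation,
# tune hyperparameters, then keep the refit best model.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, GridSearchCV

# Synthetic stand-in for the prepared training data
X_train, y_train = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Quick comparison of candidate algorithms via cross-validation on training data
for name, model in [("ridge", Ridge()), ("random forest", RandomForestRegressor(random_state=42))]:
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean CV MSE = {-scores.mean():.2f}")

# Tune hyperparameters of the chosen model; GridSearchCV refits the best
# configuration on all of the training data by default
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
final_model = search.best_estimator_
print("Best alpha:", search.best_params_["alpha"])
```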
Evaluation
After training and validating, choose the final model from the candidates being considered, test it on the reserved test dataset, and evaluate it against the output metrics.
Test the end-to-end solution - software unit tests, integration tests of the model into the product, QA, etc.
Directional expectation tests - change inputs in a certain direction and see if the model picks up on it and behaves as expected (see the sketch after this list)
Alpha and beta testing with users before full deployment
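Here is a minimal sketch of a directional expectation test. The model, features, and the "stronger gusts mean more outages" expectation are all made up for illustration, but the pattern is the same: nudge an input in a known direction and assert the prediction moves the way domain knowledge says it should.

```python
# Minimal directional expectation test on a made-up model and dataset.
# In practice you would run this against the real trained model and pipeline.
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: outages grow with peak wind gust and dry days
rng = np.random.default_rng(42)
X = rng.uniform([10, 0], [90, 20], size=(200, 2))          # [peak_wind_gust, days_since_rain]
y = 0.8 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 5, 200)  # synthetic outage counts

model = LinearRegression().fit(X, y)

baseline = np.array([[30.0, 2.0]])        # moderate gusts, recent rain
stronger_wind = np.array([[60.0, 2.0]])   # same conditions, much stronger gusts

pred_baseline = model.predict(baseline)[0]
pred_windier = model.predict(stronger_wind)[0]

# Directional expectation: stronger gusts should mean more predicted outages
assert pred_windier > pred_baseline, "Model did not respond to stronger wind as expected"
print(f"baseline: {pred_baseline:.1f}, stronger wind: {pred_windier:.1f}")
```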
Deployment
Follows the normal software deployment process
After launch, continue to monitor the model in the real world - schedule times for retraining
Takeaway: CRISP-DM and data science work are iterative, not linear. Adjust steps based on specific project needs, but avoid skipping steps.
These slides are available in the course module for reference.
Case Study - Power Outage Prediction Tool for Electric Utilities
Based on a real case study done for a utility customer - it highlights the challenges of real-world ML projects and the importance of the CRISP-DM process in maintaining structure in the ambiguity.
The challenge - when storms cause a power outage, electric utilities need to know how many crew members to deploy to restore power. Too many crew members gets expensive - too few and restoration stalls, many people get upset, and utilities can find themselves facing hearings at the PUC explaining why it took so long.
How it was being solved - directors of operations relied on weather forecasts, experience, and intuition.
Goal: Reduce restoration time, wasted costs, improve planning
Set targets for restoration time (reduce it by how many minutes?) and for the model's output metric (it's a regression problem, so MSE no higher than XYZ)
Constraints: Utility needs predictions delivered within 48 hours
Many factors played into determining which features should be used in the model - yes, weather is important, but specifically what? Sustained wind levels, peak wind gusts? Did you know that trees with more leaves are more prone to falling over and causing outages? Domain expertise is critical, so engaging with those experts provides the necessary detail on what should be considered in the model.
Knowing you need certain data doesn't mean you have access to it:
Does the data exist (historical weather data)? Do you have it or would you need to acquire it from 3rd parties?
Are there privacy issues - will a utility release where their electric generation assets are located?
How much data do you need - is 1-year of historical data enough, or do you need 10 years?
How good is the data - is it consistent, labeled, thorough? Satellite imagery data the team had acquired had different zoom levels, which required additional processing to make usable.
Even after these investigations, there could still be features that are missing that no one thought about including.
Selecting a model depends not only on its ability to perform accurately but also on other non-performance-related needs:
How well can the model handle feature complexity - is there any feature complexity? It turns out, trees are more prone to falling (and causing outages) at lower wind speeds when there's been a relatively dry period, hardening the ground. Who knew.
Interpretability in this case was important - utility needed to be able to understand why the model wasn't working, so a simpler model was chosen over something like a neural network
The team had to evaluate whether a single model, or tailored custom models for each territory, was needed
Model Testing:
Two testing strategies were used: historical data testing and customer testing with live data
In testing with live data, the model was trained on all the historical data, then tested during a live event - the team compared what was happening vs what the model predicted.
Testing revealed data quality issues, requiring the team to go back to the data cleaning and scrubbing phase.
The team often had to repeat the CRISP-DM process multiple times until they got to a model they were confident in
Deployment - after testing and getting to a confident model:
The model was integrated as a visualization tool into the utility's control center. The results were displayed as a visual interface or map of predicted outages.
Change management was crucial to help customers adopt the new product and adjust their workflow. They were in the habit of working from years of experience and now needed to learn how to use the model instead.
Monitoring and Retraining:
The team monitored the model's performance and the outcomes achieved for customers after deployment. Did it reduce restoration time?
The environment around the model can change over time (e.g. tree trimming, new utility assets or new location of assets)
The model needs to be retrained periodically to account for changes in the environment
Takeaway: ML Projects require domain expertise - talk to your customers or users. The process can go back and forth before you get conclusive results. Even after deploying, there are reasons to keep monitoring and retraining the model when needed.
Team Performance
There is no wrong or right way to structure a ML team, as long as all the roles are being covered. Even a team of 1 person is possible (but not advisable, given the broad skillset that is required).
Know the roles and skillsets involved - the job title and org structure matters much less.
Usually has Product, Data Science, and Engineering


Product - Product Managers and Product Owners
They are involved in the entire lifecycle of a project. The Product Manager interfaces with the rest of the org, defines the customer/user requirements, enables sales and marketing. The Product Owner defines the technical requirements and guides the development.
Note: Titles and responsibilities are often fluid and vary from org to org, team to team. But generally, you need to know who is owning the piece around 'guiding development' or 'defining user requirements' regardless if that owner is one person or two people or a whole team.
Data Scientists own the data exploration, analysis, and model building. They're also involved in prototyping as part of the learning process. They're mostly involved earlier in the process during discovery and finding the right solution.
Statistics background, some programming, ideally has domain knowledge in the field or industry, figures out the ML approach, model, and prototyping.
Determines the strategy and direction of the modeling effort.
Exploration focused, learns from small prototyping experiments.
Data Engineering collects, cleans, and manages the data. Usually a software role, they're responsible for building the data pipeline needed for modeling. They work alongside the Data Scientists, usually earlier in the process, and dovetail into the other engineering roles responsible for deployment in later stages.
Machine Learning Engineers create the production grade data pipeline and model to be used in deployment. They'll interface with Software Engineering to integrate the model into the product.
Computer Science or Engineering background
Has ML training, owns the production data pipeline
Works with DevOps to integrate models into product experience and in production environment
Software Engineers own the user interface and integration work of putting the model into the actual product or application the model is running in.
QA or Quality Assurance works in testing the model and the product experience.
DevOps handles infrastructure and the deployment of the model and the product.
Stakeholders to this team include Customer Support, Sales, and Marketing. Their input on early project direction (alongside domain experts) helps guide the project, and they're mostly involved during deployment, when the product is ready to go out into the real world and be commercialized with the user base.
The Project Business Sponsor (not listed) is also critical to the success of the project - they champion the project in the org, usually hold a senior leadership position, and ensure that the goals of the project are aligned with corporate strategy.
Takeaways: Regardless of team structure or titles, there needs to be someone to define what problem to solve, determine how to solve it with ML (if it can be solved with ML), run experiments to validate hypotheses, and deploy the solution once there's enough confidence in it.
Organizing the Project
Pick your favorite collaboration tools for tracking projects, documenting and sharing roadmaps, maintaining version control - Trello, JIRA, GitHub/Git, Google Docs etc.
Just make sure you have a way for the team to track projects, track user stories, plan and manage sprints
Agile methodology - it's not a linear process; it's iterative, validating through lots of small experiments. Doing this with ML models can involve:
Simple mockup of the entire solution, test for feedback (assume you'll find a model later, you're just testing if the product value prop has any legs).
Press release and visuals of the customer experience
Validate problem understanding and potential impact
Create a mocked up model that appears to work (and start collecting the data your investigation with domain experts has told you that you need)
Interactive prototypes
Prototype using a heuristic instead of an ML model - an average, an Excel spreadsheet, etc.
Use historic data
Evolve into a working prototype that uses real data and a simple ML model
Simple linear regression, decision tree - get feedback
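To make the heuristic-before-model idea concrete, here is a minimal sketch comparing a "predict the historical average" baseline against a plain linear regression. The synthetic data is a stand-in for real historic data.

```python
# Minimal prototyping sketch: start with a heuristic baseline (predict the
# average), then compare it against a simple linear regression on the same data.
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for historic data
X, y = make_regression(n_samples=300, n_features=4, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Heuristic: always predict the historical average (an "Excel-level" baseline)
heuristic = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Simple working prototype: plain linear regression
prototype = LinearRegression().fit(X_train, y_train)

for name, model in [("heuristic average", heuristic), ("linear regression", prototype)]:
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")
```

If the simple model barely beats the heuristic, that's useful feedback before investing in anything more complex.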

Collaborate and Communicate
Again, very familiar for Product Managers and anyone who has done Agile:
Monthly or quarterly roadmap discussions to align on priorities and set the roadmap
Biweekly sprint planning and reviews, to break down the roadmap items into individual sprints
Daily stand ups - even the DoD uses these!
Regular demo sessions with your stakeholders
Visualize and present the work being done
Weekly or biweekly
Gains support from the org and shows progress to your stakeholders, customers, or other parts of the org
Provides a chance for input
Use your favorite tools for collaboration:
Tools for managing roadmaps and customer requirements (e.g., Confluence, Google Docs).
Project tracking tools for user stories and sprint planning (e.g., Jira, Trello).
I recommend Smartsheet!
Collaboration and version control systems (e.g., Git, GitHub).
Takeaway: Have a strong project management practice in place, use collaboration tools (they're great!), and have regular cadences to check in with your team and stakeholders.
Measuring Performance
Did the project achieve its outcome metrics (business impact)? Did the model achieve its output metrics (performance, error targets - internal metric, not shared widely)?
Business outcomes must be defined first, and they drive the definition of the model's output metrics.
While building the model, measure the outcome metrics; after deployment, continue to monitor your model's output metrics.
Don't go all the way through deployment if you haven't validated that the model is achieving its target business or user outcome
Hindsight scenario testing - what would have happened if we had our product or model for a past scenario? What result could have been achieved? (A minimal sketch follows this list.)
A/B Testing with customers - split customers into a business as usual group vs those using your product or model
Beta testing to get feedback from early adopters you work closely with
Before deploying, track your output and outcome metrics
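A minimal sketch of hindsight scenario testing, as promised above: train on (made-up) historical data, replay a past storm through the model, and compare predictions against what actually happened. All numbers here are invented for illustration.

```python
# Minimal hindsight scenario test: what would the model have predicted
# for a past event, and how does that compare to what actually happened?
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)

# Historical data the model would have been trained on before the event
X_history = rng.uniform([10, 0], [90, 20], size=(300, 2))   # [peak_wind_gust, days_since_rain]
y_history = 0.8 * X_history[:, 0] + 1.5 * X_history[:, 1] + rng.normal(0, 5, 300)
model = LinearRegression().fit(X_history, y_history)

# Conditions and actual outcomes from one past storm (made-up values)
X_storm = np.array([[70.0, 12.0], [55.0, 9.0], [80.0, 14.0]])
actual_outages = np.array([78.0, 55.0, 90.0])

predicted_outages = model.predict(X_storm)

# Output metric in hindsight: how far off would the model have been?
print("Hindsight MSE:", round(mean_squared_error(actual_outages, predicted_outages), 1))

# The outcome question (would crew sizing based on these predictions have cut
# restoration time?) needs domain data that this sketch does not include.
```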
Non-performance Considerations
Is interpretability important? More interpretable means it's easier to debug and easier to identify biases.
How fault tolerant is the problem - how bad is it if the model is wrong? Is it recommending a movie, or recommending a job applicant?
What's the cost of sourcing and storing data?
What's the cost of compute resources needed for training, retraining, and making inferences/making regular predictions based on new data?
Takeaway: Check if the model has made an impact before committing to deployment efforts. Even after the ML project is done, it still needs monitoring.
Conclusions
There's lots of overlap between ML Projects and Product Management, which makes sense given that both types of work involve a lot of uncertainty, solving real-world problems, and lots of experimentation and cycles of testing and learning. ML Projects have the additional challenge of sourcing massive amounts of quality data, whereas Product discovery work can be based on a small set of qualitative studies (which has its own challenges).
For both disciplines, the scientific method style approach of forming and testing hypotheses with experiments is followed. It requires learning and experimenting in a way that is cost effective, the flexibility (and emotional strength) to make pivots when needed, and the ability to move forward prudently in a direction based on the level of confidence and evidence from learning experiments.
The work requires several other roles and inputs from many domains along the way, usually requiring a high level of collaboration and coordination with many stakeholders. It's never a linear process, often requiring rework, rediscovery, rethinking, and going back to the drawing board, but that's the nature of work with high uncertainty and ambiguity.
Like this post? Let's stay in touch!
Learn with me as I dive into AI and Product Leadership, and how to build and grow impactful products from 0 to 1 and beyond.
Follow or connect with me on LinkedIn: Muxin Li