
Managing Machine Learning Projects Wk 4 - ML System Design & Technology Selection

  • Writer: Muxin Li
  • Jun 20, 2024
  • 9 min read

Updated: Jun 25, 2024

Understanding machine learning system design considerations, technology selection criteria, and common tools used in the field.


What's Covered:

  • Understand system design factors: how often the model needs to be retrained, how quickly it must take in new data to create predictions, and Cloud vs Edge AI - will the system live on the device or in a data center?

  • Describe key technology decisions in designing ML systems

  • Understand criteria for making tech decisions

  • Explore common tools used by Data Scientists and ML practitioners


Key Takeaways:

  • Start with the user requirements and the desired end experience - these dictate how fast the model needs to run and what privacy concerns there may be, and help define your technology system design.


Technical terms:

  • Edge AI

  • Offline Learning

  • Online Learning

  • Batch Predictions

  • Online Predictions


 


ML System Design Considerations

A machine learning (ML) system comprises a user interface, the model itself, the data pipeline, and the infrastructure.

  • User Interface that the users interact with

  • The model itself and how it was trained and makes its predictions

  • The data we’ve collected and the data pipeline

  • The infrastructure on which everything sits



When designing a machine learning system, key decisions include whether it will be cloud-based or edge-based, use offline or online learning, and employ batch or online predictions. Cloud-based systems reside in the cloud, while edge-based systems run on devices like sensors or phones.

  • Where will the entire system (data, model, generating predictions) sit - in the cloud, or on edge devices e.g. sensor, phone?

  • Offline or online learning - retraining on a regular schedule (offline) or continuously retraining based on new data (online)?

  • Batch predictions or online predictions - waiting until we’ve gotten enough data to batch process and generate predictions with the model (batch), or doing it each time a new data point enters the system (online)?

  • User requirements/constraints - e.g. how fast does it have to be to satisfy users, are there privacy or ethical concerns?


After determining the design of a machine learning system, the next step is to select the technologies for each component.

  • Example case for facial recognition or Face ID to unlock your phone - users want this to be fast, private, and accessible at all times (even when there isn’t internet connection).

  • Running this on an edge-based system (the phone itself) ensures speed and privacy, and creates a UX where the system is initially trained upon setup (offline learning) and retrained each time you use facial recognition (online learning - even though the phone isn't always connected to the internet, it's "online" in that it retrains on each data point, i.e. each facial scan used to unlock your phone).


Now let's consider a different example: suppose we're building a model to support a movie recommendation engine.

  • In this case, the user is online, the problem to solve (recommendation) is likely going to require a lot of datapoints from previous users, and it should be fast (users don’t want to wait).

  • Use a cloud system for training on recommendations based on billions of user data points, retraining on an offline learning schedule.

  • For the end user, the system can create batch predictions - a new recommendation every hour or every evening for each user, once we've gathered enough user information to retrain the model.


Takeaway: User requirements and constraints drive these choices, influencing technology selection for each component.



Cloud vs. Edge

Edge AI runs ML models on devices like sensors or smartphones, while cloud AI runs computations in the cloud and sends results to the device. Edge AI eliminates network latency, ensures privacy, and functions without connectivity, but requires sufficient device resources. Cloud AI offers efficiency and high throughput, ideal for applications with internet connectivity and fewer latency concerns.

  • Edge AI is defined as machine learning models designed to run on devices themselves, such as sensors or smartphones, rather than in the cloud. Edge AI has come to fruition thanks to advances in computing hardware and smaller ML models designed to run in edge applications.

  • A huge benefit of edge AI is much lower latency - every 100 miles between the user and the data center hosting a cloud-based application adds more than 1.6 milliseconds of latency. At the edge, responses are nearly instant - perfect for real-time predictions.
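As a rough illustration, the distance-based figure above translates into a simple latency estimate. The function name and default rate are illustrative, using the >1.6 ms per 100 miles figure quoted here:

```python
def cloud_latency_ms(miles, ms_per_100_miles=1.6):
    """Estimate one-way network latency from distance to the data center."""
    return miles / 100 * ms_per_100_miles

# A user 500 miles from the data center pays roughly 8 ms each way
# before any processing time; an on-device (edge) model skips this entirely.
print(cloud_latency_ms(500))  # → 8.0
```

This ignores processing time and routing overhead, so real latencies are higher - which only strengthens the case for edge AI when real-time response matters.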

Cloud-based ML solutions have the actual predictions done in a data center, then push the results out to the end user or edge device. This requires that the edge device has an internet connection to receive predictions generated in the cloud.

  • Cloud-based ML solutions benefit from having a huge data center to train in, and these models benefit from fast access to a lot of data.

  • Edge AI solutions ensure low latency and privacy - ideal for scenarios where speed is crucial, like self-driving cars or quality control in manufacturing to spot defects on products that are moving very quickly through the line.



When internet connectivity is available, and there’s a lot of data needed to train the model for the problem it’s solving, a cloud system makes sense. How often to train the model will depend on the use case - will the user need a new movie recommendation every second? Probably not.


Edge devices excel at speed. In security camera systems - which need to rapidly detect and respond to potential threats, and where internet connectivity can be faulty or not guaranteed (e.g. distance from the router) - an edge device that can handle operations independently of a cloud solution is the better choice.


A hybrid solution can work great to take advantage of low latency in edge devices and higher compute power of cloud systems - do the computationally heavy work in the cloud, but use edge devices to monitor for triggers in the environment, store common predictions, and be able to react quickly.

  • In voice AI assistant software like Amazon Alexa, Apple Siri, or Google Assistant, the edge AI needs to respond quickly to its wake word (e.g. "Hey Siri").

    • Once awake, the AI is triggered and notified of an upcoming voice command - it sends the voice command to the cloud to run the computationally heavier task of figuring out what the user wants and how to respond. When the cloud model produces the answer, it passes it back to the voice AI on the device.

  • However, we are already starting to see some traditionally cloud-based ML solutions come to our edge devices. An example is Apple's new OpenAI integration with Siri - much of it runs on the phone (edge AI) instead of being sent to the cloud, thanks to the latest iPhone advancements like the Neural Engine and Core ML.
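The hybrid wake-word flow described above can be sketched as follows. The keyword check and cloud handler here are stand-ins for the real on-device and cloud models (all names hypothetical):

```python
def detect_wake_word(audio_text, wake_word="hey siri"):
    # Stand-in for a tiny on-device keyword-spotting model.
    return wake_word in audio_text.lower()

def handle_audio(audio_text, send_to_cloud):
    # Edge side: the cheap wake-word check runs locally on every chunk;
    # only confirmed commands pay the network round-trip to the cloud.
    if detect_wake_word(audio_text):
        return send_to_cloud(audio_text)
    return None

# Hypothetical cloud handler standing in for the heavy language model.
cloud_nlu = lambda command: "Here's the weather for today."
print(handle_audio("Hey Siri, what's the weather?", cloud_nlu))
print(handle_audio("background chatter", cloud_nlu))  # None: cloud never called
```

The design choice is that the always-on path stays cheap and local, and the expensive model is only invoked after the edge device has confirmed intent.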


If you need speed, privacy, and/or the ability to run the model without an internet connection, choose Edge AI.


If you have an internet connection, and there aren't concerns with privacy or speed, then choose Cloud AI.


Takeaway: The choice between cloud and edge AI depends on factors like latency, connectivity, privacy, and device capabilities. Hybrid approaches can combine the strengths of both.



Online Learning & Inference

Machine learning systems can use offline or online learning for model training and batch or online prediction for generating predictions. Offline learning involves retraining on a fixed schedule, while online learning updates the model with each new data point. Batch prediction generates predictions on batches of data, while online prediction does so in real time.

  • Offline learning has a scheduled retraining cycle, which is easier to run in production and makes it easier to debug or evaluate how a model is performing. Most ML models use offline learning and retrain on an occasional basis. However, it does not easily adapt to real-time changes in the environment.

  • Online learning retrains the model on each new data point - it’s harder to run in production and can be harder to evaluate the performance of the model, but it is much faster at adapting to changes in the environment.

  • Example of an online learning use case - when reading the news on a typical day, the user may be interested in sports. But when something big happens (a major political event or weather crisis), they're more interested in topics around the event than in their usual reading patterns.



Batch vs Online Predictions

Batch prediction vs online prediction concerns when, and how quickly, the model actually generates its predictions. Batch prediction runs the model only once enough data has accumulated to exceed a threshold; online prediction generates predictions in real time, per user request.

  • If speed is of utmost importance (translating text, self-driving cars, getting an estimated delivery time), then online prediction is ideal.

  • However, batch prediction can leverage efficiencies in computation. It's useful for models that run on lots of historical data - for finding patterns like recommendations or predicting product demand. It's also easier to detect whether a model is drifting or degrading in performance with batch predictions.
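A minimal sketch of the two serving styles, using a stand-in model (all names hypothetical):

```python
def online_predict(model, x):
    # Online prediction: one request, one immediate answer.
    return model(x)


class BatchPredictor:
    """Accumulate inputs and only run the model once a threshold of
    pending requests is reached (think of an hourly or nightly job)."""
    def __init__(self, model, threshold):
        self.model, self.threshold, self.pending = model, threshold, []

    def submit(self, x):
        self.pending.append(x)
        if len(self.pending) >= self.threshold:
            results = [self.model(v) for v in self.pending]
            self.pending.clear()
            return results  # predictions come back as a batch
        return None         # not enough data accumulated yet


double = lambda v: v * 2
print(online_predict(double, 5))   # answered immediately
bp = BatchPredictor(double, threshold=3)
bp.submit(1); bp.submit(2)
print(bp.submit(3))                # the whole batch is answered at once
```

In a real system the batch path would be a scheduled job over stored data rather than an in-memory buffer, but the trade-off is the same: latency per request versus throughput per model run.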


Takeaway: The choice between offline and online learning, as well as batch and online prediction, depends on factors like data volume, the need for real-time adaptation, and the desired latency of predictions.



ML on Big Data

Big data presents opportunities and challenges for machine learning. Its volume, velocity, and variety require specialized storage, processing, and modeling techniques. Big data impacts infrastructure choices, necessitates distributed systems, and may require online learning for model training.

  • Big Data has big opportunities but also big challenges - much higher storage and processing costs, much harder to explore and understand the data, and if it’s too big to fit into memory, then other ML technologies have to be used.

  • When there’s so much data, it can cause longer learning cycles while training the model, and increase the latency of the model predictions. The infrastructure for our ML system will have to account for high storage and processing costs of the data.

  • It can be dealt with via distributed or cloud-based storage; distributed processing for data pipelines and for training and running models; or leveraging online learning for training (taking in smaller batches of data rather than all at once, since it can't fit into memory).
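The mini-batch idea in the last bullet can be sketched in pure Python: stream records in fixed-size chunks so the full dataset never has to fit in memory. The helper names are illustrative:

```python
def chunked(stream, chunk_size):
    """Yield fixed-size chunks from any iterable, so at most
    chunk_size records are held in memory at once."""
    chunk = []
    for record in stream:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


def train_on_stream(model_update, stream, chunk_size=1000):
    # Online-style training: the model sees one mini-batch at a time,
    # so the dataset can be far larger than available memory.
    for batch in chunked(stream, chunk_size):
        model_update(batch)
```

Libraries offer the same pattern at scale - e.g. reading files chunk by chunk, or estimators that support incremental updates - but the principle is just this loop.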


Takeaway: Working with big data requires careful consideration of infrastructure, data processing techniques, and modeling approaches to overcome the challenges and harness its potential for machine learning.



ML Technology Selection

Choosing the right technologies for machine learning projects involves considering factors like programming language, tools for data pipelines and feature creation, and APIs for model integration. Options include building models from scratch, using open-source or commercial libraries, or leveraging AutoML solutions.

  • Technology decisions include which programming language, tools for building the data pipeline and creating the input features for the model, API to interface between the model and the software it will be integrated with.

  • Cheaper options include coding by hand (Python) and open source libraries.

  • Going fast may mean using commercial-grade libraries and services from companies that offer AutoML, which can automate parts of model building. When going open source, be sure to check whether there's enough technical documentation and an active community to help as you work through the learning curve.



Takeaway: The choice of technology depends on factors like cost, implementation time, expertise, and the specific requirements of the project. Open-source options offer flexibility and cost-effectiveness, while commercial and AutoML solutions can accelerate development but may come with higher costs.



Common ML Tools

A quick run-through of the most popular languages and libraries for ML building, by use case - e.g. deep learning led by TensorFlow (with Keras sitting on top of TensorFlow), and more classical ML techniques led by Pandas and Scikit-Learn. The level of community support, documentation, tutorials, etc. should also weigh into your decision.

  • Popular programming languages for machine learning include Python, R, and C/C++. Python, with its rich ecosystem of libraries like Pandas, NumPy, and scikit-learn, is widely used.

  • Jupyter Notebooks are favored for experimentation and iterative development.

  • Deep learning libraries like TensorFlow, PyTorch, and Keras are essential for building neural networks.

  • A powerful option is AutoML - it automates feature engineering and parameter tuning, and can take in data and produce an API serving your model at the end. AutoML tools automate model building for users with limited expertise. Available for a price.



Takeaway: The choice of tools depends on the specific tasks, programming preferences, and expertise of the team. Python's versatility and extensive libraries make it a popular choice, while AutoML simplifies model building for those with less experience.



Conclusions

Start with your user requirements before making technology decisions - factors such as latency, connectivity, data volume, and privacy can heavily impact how your system needs to be designed to support the desired user experience. When it comes to selecting languages and ML tools, assess the trade-offs between open-source and proprietary tools, as well as manual versus automated approaches to model building.


 

Like this post? Let's stay in touch!

Learn with me as I dive into AI and Product Leadership, and how to build and grow impactful products from 0 to 1 and beyond.


Follow or connect with me on LinkedIn: Muxin Li





© 2024 by Muxin Li
