Risk in ML Projects. Is MLOps the Solution?

Approximately 87% of data science projects never make it into production. This figure was presented at the Transform 2019 conference in San Francisco, and it points to a problem that MLOps may help to solve. In this article, I will try to explain what this enigmatic term, which has recently been gaining in popularity, is actually all about. I will also give examples of how its use can translate into direct benefits for your team, and present the problems and risks most commonly faced in ML projects.

Machine learning is an inherently complex domain involving the exploration and discovery of new knowledge. Because ML software development is a multi-faceted process, building models and deploying them in production is far more complex than creating traditional systems. However, these are not the only risks.

Building and deploying a model is not the end of the work. A model deployed in production is at risk of degradation (source) which, if not detected in time, may lead to significant losses and, in some industries, even to serious legal consequences. Advanced medical systems used to classify X-rays are one example. Their correct operation may be compromised, for instance, by an improperly conducted replacement of equipment, after which the new device's measurements are no longer consistent with what the original model expects. When critical software operates incorrectly and its decisions are inconsistent with accepted standards, the consequences can be serious. It is therefore important to detect failures as early as possible.
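One way to make early detection concrete is to compare the data a model sees in production with the data it was built on. The sketch below is an illustration only, not a method taken from the sources cited in this article: it computes a population stability index (PSI) for a single numeric feature using only the Python standard library. The function name, the ten bins, the synthetic Gaussian data and any alert threshold attached to the result are assumptions made for the example.

```python
import math
import random


def population_stability_index(reference, production, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if every reference value is identical.
    width = (hi - lo) / bins or 1.0

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_shares(reference)
    prod = bin_shares(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


# Synthetic data (hypothetical): the shifted sample stands in for
# measurements coming from a replaced or recalibrated device.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_source = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same_source)
psi_shifted = population_stability_index(reference, shifted)
print(f"same source: {psi_same:.4f}, shifted: {psi_shifted:.4f}")
```

The shifted sample yields a clearly larger PSI than the sample drawn from the same source as the reference. In a real project, the threshold for raising an alert would have to be chosen per feature and per use case.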

What is MLOps?

In the introduction to this article, I used the term MLOps. It is defined as “the extension of the DevOps methodology to include Machine Learning and Data Science” (source). From a business perspective, it is simply a set of competences that combines these areas, resulting in efficient model building and deployment. So, what is MLOps, and why does it deserve our interest?

The main idea behind MLOps is to support the process of machine learning software development by implementing dedicated tools (e.g. Kubeflow and MLflow) and standardising each stage of the project. According to data collected by Algorithmia Inc., in 57% of cases the process of building and deploying a model takes more than one month (source). Hence the growing need for improvement. The use of techniques such as:

versioning of data/hyperparameters/models;
testing of data/models/infrastructure;
CI/CD;
model development and production monitoring

results in a noticeable reduction in the time required to deliver the product, and thus also in costs, at each stage of the project, from building a suitable model through to deploying it in production and monitoring it. This is thanks to the automation of repetitive operations and the overall reduction of chaos to a minimum. Optimising the entire process has other benefits as well. One of them is undoubtedly the easier onboarding of new employees into projects that are already under way. MLOps, however, is not only about simplifying processes or reducing costs; it is also a kind of protection against the risks faced by ML projects. It enriches the work culture with good practices (e.g. code quality control, model monitoring and containerisation) and with tools that enable both monitoring of the development process and evaluation of the model in production, thus making it possible to keep the developed software under reliable control.
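To illustrate the versioning item in the list above, here is a minimal sketch of one possible approach: deriving a deterministic identifier from the training data and the hyperparameters, so that two runs can be compared or reproduced later. It is not how Kubeflow or MLflow implement versioning. The fingerprint helper, the sample CSV bytes and the parameter names are all hypothetical and exist only for this example.

```python
import hashlib
import json


def fingerprint(data_bytes, hyperparameters):
    """Return a short, deterministic ID for a (data, hyperparameters) pair."""
    digest = hashlib.sha256()
    digest.update(data_bytes)
    # sort_keys makes the JSON text independent of dict insertion order.
    digest.update(json.dumps(hyperparameters, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


# Hypothetical inputs: two versions of a tiny dataset and one parameter set.
data_v1 = b"age,label\n34,0\n51,1\n"
data_v2 = b"age,label\n34,0\n51,1\n29,0\n"
params = {"learning_rate": 0.1, "max_depth": 3}

run_a = fingerprint(data_v1, params)
run_b = fingerprint(data_v1, {"max_depth": 3, "learning_rate": 0.1})
run_c = fingerprint(data_v2, params)

print(run_a == run_b)  # same data and same parameters in a different order
print(run_a == run_c)  # the data changed
```

Storing such an identifier next to each trained model is one simple way to answer the question of which data and settings produced it. Dedicated tools typically cover this need together with storage and lineage tracking.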

Is that all?

No!

The information contained in this article is just the tip of the iceberg. The problems that MLOps is trying to deal with are a very broad and complex subject. For this reason, this article is the first in a series aimed at explaining what risks can arise in machine learning projects and how MLOps is able to address them.

In the upcoming articles, we will cover the following topics: