Monitoring ML models using MLOps

In the previous article, I presented how MLOps supports the implementation of machine learning (ML) systems. This text builds on one of the practices outlined there, namely monitoring.

Below I will attempt to answer three questions:

Why is the monitoring of production models so important, and what are the possible consequences of its absence?
What are the differences between ML system monitoring and classic web application monitoring?
How can the quality of a machine learning model be evaluated?

Why do ML models fail?

To understand the importance of ML model monitoring, it is crucial to answer the question of why machine learning systems become less effective over time. In part, this is due to the way they are built: an ML system operates on rules derived during the learning process rather than on rules written by a programmer.

The quality of the final model will therefore be highly dependent on the data on which it was trained. In classical software development, the domain logic is expressed in advance as a set of heuristic rules, so it stays fixed until someone changes the code. A machine learning system instead derives its rules from data, so its operational quality deteriorates over time as real-world data moves away from the training data (Figure 1).

Figure 1 The approach in classical programming and machine learning systems [source: own study].

Because ML models are highly sensitive to data, deployed solutions that interact with the real world degrade over time. This is particularly important for systems built on dynamically changing information (e.g. geopolitics, stock market, medicine).

These are, of course, not the only factors contributing to performance degradation. For systems integrated with external devices (e.g. medical equipment), uncontrolled replacement of equipment may cause errors in the operation of the model. An undetected failure of this kind may lead to serious consequences, both legal and financial.

Figure 2 Examples of real-world situations that may degrade the model [source: GradientFlow].

Consequences

In some industries, constant monitoring of a system’s operation is required by specific legal obligations. One example is the European guidelines that define the transparency of systems using artificial intelligence. One of the requirements in that document is explainability: the decision-making process carried out by the ML model, along with its technical aspects, should be understandable to humans.

In practice, for industries such as banking or medicine, these regulations come down to an obligation to monitor the system’s operation thoroughly. The systems in question carry out tasks such as examining loan applications or adjusting a treatment process to the patient’s profile. Such software must be supervised constantly, because the decisions it makes need to be justified accurately.

In addition, ML models in operation may violate ethical norms. In such a situation, monitoring may allow the failure to be identified at an early stage and, consequently, protect against serious legal consequences. For more examples of unsuccessful implementations of ML models, please see here.

How to monitor ML models?

Monitoring ML models differs slightly from monitoring classic web applications. In both cases, however, Application Performance Management (APM) principles apply. According to them, the following metrics are evaluated:

service response time,
usage of hardware resources (CPU, GPU, RAM, HDD/SSD),
availability of the application,
occurrence of errors.
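
As a rough illustration of the APM-style metrics listed above, the sketch below wraps a call to a service and records its response time and errors. It is a minimal example under stated assumptions, not a replacement for an APM tool: the `ServiceMetrics` class and its method names are hypothetical, availability is approximated here as the share of calls that did not raise an exception, and hardware usage is left out because it is normally read from the host or a dedicated agent.

```python
import time
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ServiceMetrics:
    """Collects response times and error counts for one service (hypothetical helper)."""
    latencies_s: list = field(default_factory=list)
    errors: int = 0
    calls: int = 0

    def observe(self, func, *args, **kwargs):
        """Run func, record how long it took, and count a raised exception as an error."""
        start = time.perf_counter()
        self.calls += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors += 1
            raise
        finally:
            # Runs for successful and failed calls alike.
            self.latencies_s.append(time.perf_counter() - start)

    def summary(self):
        """Return call count, mean latency, error rate, and approximate availability."""
        error_rate = self.errors / self.calls if self.calls else 0.0
        return {
            "calls": self.calls,
            "mean_latency_s": mean(self.latencies_s) if self.latencies_s else 0.0,
            "error_rate": error_rate,
            "availability": 1.0 - error_rate,
        }
```

A prediction endpoint could be wrapped as `metrics.observe(model_predict, payload)`, where `model_predict` stands for whatever function serves the model, and `metrics.summary()` could be exported periodically to a dashboard.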

In the case of solutions using machine learning, it is worth extending the scope with the following elements:

analysis of the distributions of input features and of the predictions made, in order to detect data drift, i.e. changes in those distributions over time,
monitoring quality metrics (e.g. F1 score, precision, sensitivity, mean squared error), so that the model can be retrained on the latest data when its results do not meet the intended standards,
automatic alerts when the results of inference fall outside the anticipated range of values.
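
The three ML-specific checks above can be sketched in a few lines of plain Python. This is only one possible approach under stated assumptions: drift is measured here with the Population Stability Index (PSI), which is a common choice but not one named in this article; the function names are hypothetical; and the thresholds (`psi_limit=0.2`, `f1_floor=0.8`, a valid prediction range of 0 to 1) are illustrative defaults that would need to be tuned for a real system.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def f1_score(y_true, y_pred):
    """F1 score for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def check_model(ref_feature, live_feature, y_true, y_pred, predictions,
                psi_limit=0.2, f1_floor=0.8, valid_range=(0.0, 1.0)):
    """Return a list of alert messages; an empty list means no check failed."""
    alerts = []
    drift = psi(ref_feature, live_feature)
    if drift > psi_limit:
        alerts.append(f"data drift: PSI={drift:.3f}")
    score = f1_score(y_true, y_pred)
    if score < f1_floor:
        alerts.append(f"retraining suggested: F1={score:.3f}")
    low, high = valid_range
    outside = [p for p in predictions if not low <= p <= high]
    if outside:
        alerts.append(f"{len(outside)} prediction(s) outside [{low}, {high}]")
    return alerts
```

In this sketch, `ref_feature` would hold values of one input feature from the training data and `live_feature` the same feature observed in production; `y_true` and `y_pred` require ground-truth labels, which in practice often arrive with a delay. Any returned message could be forwarded to whatever alerting channel the team already uses.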

As one can see, monitoring machine learning systems is a complex process. When used proactively, it may contribute to better control over the quality of the delivered solution and protect against unwanted legal consequences.

This is the penultimate article in a series dedicated to MLOps. In the next and final publication, I will discuss the security of production machine learning models. I will also describe the types of attacks used to steal data or even to deliberately degrade the quality of a production model.