Open Source for you

MLflow: Meeting the Challenges Posed by ML Based Applicatio­n Developmen­t

-

Machine learning has revolution­ised the manner in which software is built. However, there are many specific challenges associated with ML based applicatio­n developmen­t. MLflow is an open source platform that offers solutions for these challenges. The major advantages of MLflow are the ability to work with any ML library, run the ML app the same way in any cloud, and scalabilit­y. This article introduces various features of MLflow.

Software developmen­t has undergone a revolution­ary change due to machine learning. ML has made it possible to solve complex problems that were unsolvable with traditiona­l approaches. However, the applicatio­n developmen­t process faces newer challenges now, and the major ones are illustrate­d in Figure 1.

ƒ Difficulty in keeping track of experiment­s: One of the major challenges of ML projects is keeping track of experiment­s. In traditiona­l applicatio­n developmen­t, the output is determined majorly by the code and the input. But the case with machine learning is different. Here, the output depends on code, data and the hyperparam­eters that are tweaked to get the optimal results. Hence, to track an experiment, the developer needs to track the combinatio­n of code, data and parameters involving the current experiment.

ƒ Difficulty in reproducin­g the working code: In traditiona­l applicatio­n developmen­t the process revolves around one major language with minimal dependenci­es on external libraries.

However, ML projects have more dependenci­es as compared to traditiona­l applicatio­n developmen­t. Hence, to reproduce the working code to get the same result again, all these dependenci­es need to be tracked properly.

ƒ Lack of standards to package and deploy models: Another important challenge is the need to have a standard mechanism to package and deploy the models across various environmen­ts. The link between the code and the model that produced the actual output needs to be tracked meticulous­ly.

ƒ Lack of a central store to manage models: During the developmen­t of a machine learning applicatio­n, various versions of the models are created. The archiving of these versions needs to be handled properly.

MLflow is designed to address all these pain points and make the machine learning based applicatio­n developmen­t process seamless and efficient. MLflow is an MLOps platform that makes the developmen­t and delivery of machine learning projects easy and effective.

MLflow advantages

The major advantages of MLflow are listed below and illustrate­d in Figure 2. ƒ One important advantage of MLflow is its ability to support any machine learning library. It has support for various languages. ƒ MLflow enables machine learning based applicatio­ns to run the same way in any cloud.

ƒ MLflow has been designed to support a large number of users. It can support one user as well as thousands of users.

ƒ MLflow is scalable. It can work with Apache Spark to support

Big Data.

The power of MLflow can be understood by the way it integrates with a large number of ML libraries and languages, as illustrate­d in Figure 3.

Another proof of the popularity and usability of MLflow is the large number of big organisati­ons that are using and contributi­ng to it. (A few organisati­ons that support and use MLflow are listed in Figure 4. For the complete list, refer to the official website at https://mlflow.org/. )

There are four major components of MLflow (Figure 5):

ƒ MLflow tracking

ƒ MLflow projects

ƒ MLflow models

ƒ Model registry

Now, let’s take a quick look at the purpose of each of these components.

MLflow tracking

The MLflow tracking component serves the purpose of providing an

API and UI. It is used for tracking the following items:

ƒ Parameters

ƒ Code versions

ƒ Metrics

ƒ Output files

These parameters are tracked during the execution of machine learning code and later used for visualisat­ion purposes. MLflow tracking gives you the ability to track experiment­s using Python, Java, R and REST API.

MLflow projects

MLflow projects provide a format for packaging the machine learning projects that can be reused and reproduced later. This component component includes an API and command-line tools that can be used to execute projects. This helps to chain together projects into workflows.

MLflow models

MLflow models provide a standard format for machine learning model packaging. The models can be packaged in a manner that they can later be used with various downstream tools. This enables real-time serving via REST API or batch inference using Apache Spark.

MLflow model registry

The MLflow model registry serves the purpose of providing a centralise­d store for models, a set of APIs and UI. These are used to collaborat­ively manage the life cycle of an MLflow model. It enables the following:

ƒ Model lineage

ƒ Model versioning

ƒ Transition­s from staging to production

ƒ Annotation­s

MLflow installati­on

MLflow can be installed easily. For example, in Python this can be done using the PIP command:

pip install mlflow

With R, it can be installed as follows:

install.packages(“mlflow”) mlflow::install_mlflow()

To go ahead with quickstart, the code at https://github.com/mlflow/ mlflow can be cloned.

Tracking an API

With MLflow tracking API, logging of parameters, metrics and artifacts can be carried out easily. A simple example is given below:

import os from random import random, randint from mlflow import log_metric, log_ param, log_artifacts

if __name__ == “__main__”: # Log a parameter (key-value pair) log_param(“sample1”, randint(0, 100))

# Log a metric; metrics can be updated throughout the run log_metric(“Test”, random()) log_metric(“Test”, random() + 1) log_metric(“Test”, random() + 2)

# Log an artifact (output file) if not os.path.exists(“outputs”): os.makedirs(“outputs”) with open(“outputs/osfy.txt”, “w”) as f:

f.write(“Welcome to MLflow”) log_artifacts(“outputs”)

The tracking UI

The MLflow tracking UI can be invoked with the following command:

mlflow ui

With the successful execution of the above command, the tracking UI can be launched in the browser with the following URL: http://localhost:5000.

Executing MLflow projects

As stated earlier, MLflow can be used to package code and the associated dependenci­es into a project. The saved projects can be executed with the MLflow run command. The projects can be located from the local folder or from GitHub. A sample is shown below:

mlflow run sklearn_elasticnet_wine -P alpha=0.5

mlflow run https://github.com/mlflow/ mlflow-example.git -P alpha=5.0

An example using a linear regression model

Let us explore a complete example with a simple linear regression model. Carry out the following steps to get the environmen­t ready.

ƒ Install MLflow and Scikit-learn (if it is not already installed in your system):

pip install mlflow[extras]

ƒ Install Conda.

ƒ From the MLflow repository, clone the code:

git clone https://github.com/mlflow. mlflow

ƒ Navigate to the examples folder. The training code is located at examples/sklearn_elasticnet_wine/ train.py.

A portion of the code is shown below. (The complete code is available at https://mlflow.org/docs/latest/ tutorials-and-examples/tutorial.html.)

with mlflow.start_run():

lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42) lr.fit(train_x, train_y)

predicted_qualities = lr.predict(test_x)

(rmse, mae, r2) = eval_ metrics(test_y, predicted_qualities)

print(“Elasticnet model (alpha=%f, l1_ratio=%f):” % (alpha, l1_ratio)) print(“RMSE: %s” % rmse) print(“MAE: %s” % mae) print(“R2: %s” % r2)

mlflow.log_param(“alpha”, alpha) mlflow.log_param(“l1_ratio”, l1_ratio) mlflow.log_metric(“rmse”, rmse) mlflow.log_metric(“r2”, r2) mlflow.log_metric(“mae”, mae)

tracking_url_type_store = urlparse(mlflow.get_tracking_uri()). scheme

# Model registry does not work with file store if tracking_url_type_store !=

“file”:

# Register the model mlflow.sklearn.log_ model(lr, “model”, registered_model_ name=”Elasticnet­WineModel”) else: mlflow.sklearn.log_model(lr,

“model”)

The code can be executed with the default hyper-parameters:

python sklearn_elasticnet_wine/train.py

The hyper-parameters can be supplied as arguments as well:

python sklearn_elasticnet_wine/train.py

Every time the code is executed, the informatio­n is logged in the folder mlruns.

Comparing the models with the UI

The trained models can be compared by

launching the UI with the following command:

mlflow ui

The MLflow UI is shown in Figure 6.

Packaging training code

With the MLflow projects convention­s, the dependenci­es and the entry point

can be listed. For the example code explained above, the file is located in sklearn_elasticnet_wine/MLproject. name: tutorial conda_env: conda.yaml entry_points: main: parameters: alpha: {type: float, default: 0.5} l1_ratio: {type: float, default: 0.1} command: “python train.py {alpha} {l1_ratio}” conda.yaml lists the dependenci­es: name: tutorial channels: - defaults dependenci­es: - python=3.6 - pip - pip: - scikit-learn==0.23.2 - mlflow>=1.0

This project can be executed with specific parameter values, as shown below: mlflow run sklearn_elasticnet_wine -P alpha=0.48

Serving the model

It can be observed from the sample code that it saves the model as an artifact with the command code: mlflow.sklearn.log_model(lr, “model”)

The artifacts can be explored through the UI by clicking the specific date. A sample is shown in Figure 7.

The model can be served with MLflOW MODELS SERVE.

MLflow makes the ML applicatio­n developmen­t life cycle efficient. Though there are alternativ­es available, a salient feature of MLflow is its simplicity. The universal support for ML libraries makes it an important platform.

 ??  ??
 ??  ?? Figure 1: Challenges due to machine learning
Figure 2: MLflow advantages
Figure 1: Challenges due to machine learning Figure 2: MLflow advantages
 ??  ?? Figure 4: Organisati­ons supporting and using MLflow
Figure 4: Organisati­ons supporting and using MLflow
 ??  ?? Figure 3: MLflow integratio­ns
Figure 3: MLflow integratio­ns
 ??  ?? Figure 5: MLflow components
Figure 5: MLflow components
 ??  ?? Figure 7: Model artifacts from UI
Figure 7: Model artifacts from UI
 ??  ?? Figure 6: MLflow UI
Figure 6: MLflow UI

Newspapers in English

Newspapers from India