What Is MLOps? Why Use MLOps in Machine Learning Practice

With the advance of digitalization and computing power, the potential of machine learning (ML) to improve enterprise productivity has attracted increasing attention. However, many machine learning models and applications fail to meet expectations in real production environments, and a large number of ML projects end in failure. Historically, the ML community focused heavily on model construction, making sure a model performed well on a predefined test dataset, while the question of how to move a model from the laboratory to the user's desktop received far less attention.

The machine learning lifecycle involves many stages, such as data acquisition, data preparation, model training, model tuning, model deployment, model monitoring, and model interpretability. Different stages involve different tools and roles, and require collaboration and handoffs across teams, from data engineering to data science to ML engineering. In this context, several goals become pressing in production: enabling faster model development, delivering higher-quality ML models, and deploying them to production sooner; overseeing, controlling, managing, and monitoring thousands of models under continuous integration, continuous delivery, and continuous deployment; and increasing model transparency to ensure better compliance with organizational or industry policies.

Especially in the era of large language models (LLMs) and the push toward AGI, the size and complexity of machine learning models keep growing. This makes deploying them to production even harder and requires more specialized tools and methods to manage and monitor them.

What Is MLOps?
MLOps (Machine Learning Operations) is the practice of integrating machine learning (ML) models into a production environment while ensuring their validity and reliability in production. By adopting MLOps, data scientists and machine learning engineers can collaborate and accelerate model development and release by applying continuous integration and deployment (CI/CD) practices, with proper monitoring, validation, and governance of ML models. MLOps moves machine learning models from the laboratory to production and accelerates the adoption and commercialization of machine learning.

Differences from DevOps
MLOps is a set of engineering practices specific to machine learning projects that draws on DevOps principles widely adopted in software engineering. DevOps brings a rapid, continuously iterative approach to delivering applications; MLOps applies the same principles to bring machine learning models into production. In both cases, the result is higher software quality, faster patches and releases, and higher customer satisfaction.

Both MLOps and DevOps aim to simplify and automate the development and deployment of software applications. DevOps focuses on general software development and IT operations, while MLOps specifically addresses the unique challenges and complexities of machine learning applications. Both approaches seek to increase collaboration, automation, and efficiency in developing, deploying, and managing software.

MLOps Workflow
Data Preparation and Feature Engineering - Iteratively explore, prepare, and share data for the machine learning lifecycle by creating reproducible, editable, and shareable datasets, tables, and visualizations. Transform, aggregate, and deduplicate data iteratively to create better features. Importantly, use a feature store so that features are visible to and shared across data teams.
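As a rough illustration of this step, the sketch below uses pandas to turn a raw event log into per-user features; the file name, column names, and feature definitions are illustrative assumptions rather than part of any particular pipeline.

```python
import pandas as pd

# Illustrative raw input: an event log with user_id, amount, and timestamp columns.
raw = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Deduplicate and aggregate into per-user features as a single reproducible step.
features = (
    raw.drop_duplicates()
       .assign(hour=lambda df: df["timestamp"].dt.hour)
       .groupby("user_id")
       .agg(
           total_amount=("amount", "sum"),
           avg_amount=("amount", "mean"),
           active_hours=("hour", "nunique"),
       )
       .reset_index()
)

# Persist a versioned artifact that training jobs or a feature store can consume.
features.to_parquet("features_v1.parquet", index=False)
```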

Model Training and Tuning - Use popular open source libraries to train models and improve their performance. As an easier alternative, use automated machine learning tools such as AutoML to automate trial runs and produce reviewable, deployable code.
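A minimal sketch of training plus hyperparameter tuning with scikit-learn; the dataset, model, and parameter grid are placeholders chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid search stands in for the "tuning" part of this stage.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
    scoring="f1",
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```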

Model Management - Track model lineage and versions, and manage model artifacts and transformations throughout their lifecycle. Discover, share, and collaborate on ML models with open source MLOps platforms such as MLflow.
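For example, MLflow's tracking API can record parameters, metrics, and the model artifact itself so that versions and lineage stay discoverable; the experiment name and the model below are illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("demo-model-management")  # illustrative experiment name
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Logging the model as an artifact is what makes its lineage and versions trackable.
    mlflow.sklearn.log_model(model, "model")
```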

Model Inference and Serving - Manage model refresh frequency, inference request timing, and similar production-specific details during testing and QA. Use CI/CD tools such as repositories and orchestrators (borrowing DevOps principles) to automate the pre-production pipeline.
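One way to encode such production-like checks is as a test that CI runs before promotion; the `load_model` placeholder, batch shape, and latency budget below are illustrative assumptions, not a prescribed contract.

```python
import time

import numpy as np
from sklearn.dummy import DummyClassifier


def load_model():
    # Placeholder: in a real pipeline this would pull the candidate model
    # from a registry or artifact store.
    return DummyClassifier(strategy="most_frequent").fit(np.zeros((10, 4)), np.zeros(10))


def test_inference_shape_and_latency():
    model = load_model()
    batch = np.random.rand(32, 4)

    start = time.perf_counter()
    preds = model.predict(batch)
    latency = time.perf_counter() - start

    assert preds.shape == (32,)  # contract: one prediction per input row
    assert latency < 0.5         # illustrative latency budget, in seconds
```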

Model Deployment and Monitoring - Automate permissions and cluster creation to productionize registered models, and enable REST API model endpoints.
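A REST model endpoint of the kind mentioned here could be sketched with FastAPI as follows; the artifact path, request schema, and route name are assumptions for illustration.

```python
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # illustrative path to the registered model artifact


class PredictRequest(BaseModel):
    features: List[float]


@app.post("/predict")
def predict(req: PredictRequest):
    # Wrap the single feature vector in a batch of one for the model.
    prediction = model.predict([req.features])[0]
    return {"prediction": float(prediction)}

# Run with, e.g.: uvicorn serve:app --host 0.0.0.0 --port 8000
```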

Automatic Model Retraining - Collect the metrics gathered by model monitoring and use them to retrain the model in a targeted way.
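A toy sketch of the triggering logic: compare a monitored metric against a reference value and kick off retraining when it degrades beyond a tolerance. The numbers and the threshold are illustrative.

```python
def should_retrain(reference_acc: float, live_acc: float, tolerance: float = 0.05) -> bool:
    """Flag retraining when live accuracy drops below the reference by more than the tolerance."""
    return (reference_acc - live_acc) > tolerance


# Illustrative readings; in practice these would come from the monitoring system.
if should_retrain(reference_acc=0.92, live_acc=0.85):
    print("accuracy drift detected: trigger the retraining pipeline")
```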

MLOps infrastructure and tools
MLOps (Machine Learning Operations) is an approach designed to accelerate the development, deployment, and maintenance of machine learning applications. To achieve this, MLOps relies on a range of infrastructure and tools.

Data pipeline management: Tools for wrangling, cleaning, and transforming data, such as Apache NiFi, Luigi, and Apache Airflow.
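As a small illustration, a daily pipeline in Apache Airflow might look like the sketch below; the DAG id, schedule, and task bodies are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from the source system")  # placeholder


def transform():
    print("clean and aggregate the raw data")  # placeholder


with DAG(
    dag_id="daily_feature_pipeline",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task
```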

Version Control: Tools for tracking code, data, and model changes, such as Git, DVC (Data Version Control) and MLflow.

Model Training: Tools and platforms for training models on various hardware environments, such as TensorFlow, PyTorch, Keras and Apache MXNet.

Model Validation and Testing: Tools for evaluating model performance and accuracy, such as TensorFlow Extended (TFX) and MLflow.

Model Deployment: Tools and platforms for deploying models to production environments, such as TensorFlow Serving, NVIDIA Triton Inference Server, AWS SageMaker, and Microsoft Azure Machine Learning.

Model Monitoring: Tools for real-time tracking of model performance and health, such as Grafana, Prometheus, and the ELK Stack (Elasticsearch, Logstash, Kibana).
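For instance, a model server can expose basic serving metrics to Prometheus with the prometheus_client package; the metric names and the dummy predict function below are illustrative.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Number of predictions served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency in seconds")


@LATENCY.time()
def predict(features):
    PREDICTIONS.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real inference work
    return 0


if __name__ == "__main__":
    start_http_server(8001)  # metrics become scrapable at http://localhost:8001/metrics
    while True:
        predict([1.0, 2.0])
```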

Automation and Continuous Integration/Continuous Deployment (CI/CD): Tools for automating machine learning workflows, such as Jenkins, GitLab CI/CD, and GitHub Actions.

Containerization and Orchestration: Container technology such as Docker and orchestration platforms such as Kubernetes for easier deployment and management.

Cloud Service Providers: Cloud platforms that provide various machine learning services and infrastructure, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

The goal of this infrastructure and tooling is to help data scientists, machine learning engineers, and operations teams collaborate more effectively and develop, deploy, and maintain machine learning applications faster.

Why use a professional MLOps platform like starwhale.ai for machine learning practice?
Starwhale is an open source MLOps platform for model trainers and machine learning developers. It makes it easy to build, deploy, and maintain ML systems, and improves the collaboration and efficiency of AI practitioners, teams, and enterprises.

Model Evaluation
Starwhale starts your MLOps journey with model evaluation, which plays an important role in machine learning. Model evaluation quantifies a model's performance on a test dataset, and evaluation metrics help data scientists understand how well a model performs, making its strengths and weaknesses clear. By comparing metrics across models, the best-performing model can be selected and published.
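Independent of any particular platform, this comparison step usually boils down to computing the same metrics for each candidate on a shared test set, roughly as sketched below with scikit-learn; the labels and predictions are made up for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up ground truth and predictions from two candidate models on the same test set.
y_true  = [0, 1, 1, 0, 1, 1, 0, 0]
model_a = [0, 1, 0, 0, 1, 1, 0, 1]
model_b = [0, 1, 1, 0, 1, 0, 0, 0]

for name, y_pred in [("model A", model_a), ("model B", model_b)]:
    print(name,
          "accuracy:", accuracy_score(y_true, y_pred),
          "f1:", f1_score(y_true, y_pred))
```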

Starwhale supports multiple types of model evaluation and visualizes the results to simplify the evaluation process.

1. Compare multiple evaluation results side by side to highlight metric changes and assist with model tuning.

2. Visualize evaluation results, with support for custom charts to present them intuitively.

3. Starwhale provides componentized visualization tools and supports embedded pages to cover a variety of model evaluation scenarios.

4. Searching and filtering are intuitive and easy to use, with SQL-style advanced search that matches developers' habits and improves the search experience.


Data Management
Starwhale's dataset management is powerful: it supports visualization and version management for data in multiple formats, focuses on data understanding and insight, and improves labeling efficiency.

1. Online visualization of mainstream computer vision, audio, video, NLP, and other data together with their annotations.

2. Record dataset versions linearly, with support for custom version labels and version rollback.

3. Manage data labels in batches through the SDK to improve labeling efficiency.


Model management
Starwhale focuses on managing the model iteration and debugging process, addressing the pain points of reproducibility and traceability.

1. Flexibility: Starwhale models can strip redundant information to produce smaller packages, and support collaboration with production teams without exposing Python inference code, avoiding security risks.

2. Visualization of evaluation results: componentized visualization tools present evaluation results more explicitly, making the data easier to understand and analyze.

3. Visualization of version differences: compare models across versions to gain insight into code changes and their impact, assisting debugging.

4. Fast online prediction: verify or debug the model with small batches of data; easy to operate, with intuitive results.

Environment management
Starwhale focuses on the model development and evaluation experience, lowering the barrier to development and debugging.

It supports one-click sharing of the runtime environment with others, and the runtime environment can be saved as an image for easy sharing and reuse.

Multiple mainstream environments are supported.


Origin: blog.csdn.net/weixin_54164365/article/details/131350707