End-to-End Machine Learning Project Template – How To Bui

Author: Zen and the Art of Computer Programming

1. Introduction

In recent years, as deep learning, machine learning, and computer vision have been applied more widely, many companies have turned to machine learning solution providers such as Google, Facebook, and Microsoft. For engineers and data scientists, it is particularly important to be able to set up a complete data science project workflow quickly and to have the theoretical knowledge, mathematical foundation, and programming ability behind it. The "Machine Learning Practice" series of articles explains the latest developments in AI, machine learning, deep learning, image processing, and related technologies in depth from three angles, "machine learning", "data engineering", and "application scenarios", and shows how to integrate them into real applications. Each chapter discusses a specific problem and gives concrete solutions. This article is extended reading for that series: it provides a reference template for people who want to build their own machine learning project workflow from scratch and understand it in depth.

2. Background introduction

In the following chapters, I will use a simple data science project, house price prediction, as an example to introduce the stages of a complete data science project and the basic skills each stage requires. The work involved mainly covers data collection, data cleaning, feature extraction, model training, model evaluation, and model deployment and monitoring. It also calls for strong analytical ability and good teamwork.
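To make the training and evaluation stages concrete, here is a minimal sketch for the house price example. It is not the method used later in the series: the data points are made up, the single feature (floor area) is an assumption, and the helper names `fit_line` and `mean_absolute_error` are hypothetical. It fits a one-feature least-squares line on a training split and measures the error on held-out rows.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ≈ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up (area, price) pairs, for illustration only.
houses = [(50, 150), (60, 180), (80, 240), (100, 300), (120, 360), (70, 215)]

# Hold out the last two rows for evaluation.
train, test = houses[:4], houses[4:]

slope, intercept = fit_line([a for a, _ in train], [p for _, p in train])
predictions = [slope * a + intercept for a, _ in test]
mae = mean_absolute_error([p for _, p in test], predictions)

print(f"price ≈ {slope:.2f} * area + {intercept:.2f}, MAE on held-out rows: {mae:.2f}")
```

A real project would use more features, a proper random split, and a library model, but the shape stays the same: fit on one part of the data, then score on data the model has not seen.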

3. Explanation of basic concepts and terms

In order to successfully complete each stage of a data science project, one should first have a comprehensive understanding of the relevant basic theories, key terms, and workflow.

Data collection and cleaning

Data collection means gathering raw data from the Internet, databases, mobile devices, or other sources. The quality, quantity, and type of data directly affect the final results, so the accuracy and completeness of the data must be ensured. Data cleaning is the preliminary processing that removes errors, missing values, and invalid values from the data. The cleaned data can then be used for subsequent analysis and processing.
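As a rough illustration of the cleaning step, the sketch below works on a handful of hypothetical house records. The field names (`area`, `rooms`, `price`), the rule that a price must be positive, and the choice of median imputation are all assumptions made for this example, not rules taken from the series.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"area": 80.0, "rooms": 3, "price": 240.0},
    {"area": None, "rooms": 2, "price": 180.0},   # missing feature
    {"area": 100.0, "rooms": 4, "price": None},   # missing target
    {"area": 60.0, "rooms": 2, "price": -5.0},    # invalid value
    {"area": 120.0, "rooms": 5, "price": 360.0},
]


def clean(rows):
    """Return cleaned copies of the rows; the input list is left unchanged."""
    # 1. Drop rows whose target is missing or invalid (assumed rule: price > 0).
    kept = [dict(r) for r in rows if r["price"] is not None and r["price"] > 0]
    # 2. Fill missing areas with the median of the areas that were observed.
    fill = median(r["area"] for r in kept if r["area"] is not None)
    for r in kept:
        if r["area"] is None:
            r["area"] = fill
    return kept


clean_rows = clean(raw_rows)
print(f"kept {len(clean_rows)} of {len(raw_rows)} rows")
```

Rows with a bad target are dropped rather than repaired because a guessed target would distort training, while a missing feature can often be filled with a neutral value such as the median.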

Data characteristics


Origin blog.csdn.net/universsky2015/article/details/132493457