From 4 Months to 7 Days: What Makes Metaflow, Netflix's Open-Source Python Framework, So Effective?

Author | Rupert Thomas

Translator | Kay hidden

Editor | Jane

Produced by | AI Technology Base Camp (ID: rgznai100)

 

[Lead] Metaflow is a Python framework for data science developed by Netflix and officially open-sourced in December 2019. According to reports, Metaflow addresses some of the challenges data scientists face with scalability and version control: a processing pipeline is built by laying out a series of steps as a graph. Metaflow makes it easier to move from running on local resources to running in the cloud (currently only AWS is supported). Each step is a node in the graph that runs independently and has its own dependencies, and Metaflow handles the communication between nodes. Today we introduce Metaflow and hope you find it helpful.

              

About Metaflow

 

Inside Netflix, Metaflow has been used for all kinds of machine learning tasks, such as optimizing ad delivery and video encoding. It was created to make model deployment more efficient: by making the whole process of developing, deploying, and updating models more systematic, it speeds up deployment.

 

Data scientists care most about work that directly affects model performance, such as model design and feature engineering, and they want to deploy models quickly to verify whether a model actually improves things in production. They do not want to waste energy on foundational tasks such as environment dependencies, version control, and data warehouse management. Metaflow was created to solve this problem.

              

Metaflow simplifies or even automates these underlying tasks, so data scientists can deploy models more quickly and easily, focus on improving model performance in real engineering environments, and be more productive. In that sense, it is a human-centered framework. Netflix also recently revealed that Metaflow has reduced the median deployment time of its machine learning projects from four months to just seven days. The rest of this article focuses on how Metaflow works and on its features.

 

Basic Operating Principles

              

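As shown in the figure above, a workflow can be represented as a directed acyclic graph (DAG). Each node in the graph represents one stage of the flow, and a stage can be arbitrary Python code. In the example above, Metaflow trains two different versions of a model in parallel and selects the one that performs best. This is parallel processing on a single machine, similar to Python's multiprocessing package. To deploy to cloud resources, you only need to add one command-line argument, --with batch, which tells Metaflow to run the code in the cloud. Currently only Amazon Web Services (AWS) is supported, though support for more cloud providers will probably follow soon.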
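The fan-out and join pattern described here can be illustrated with the standard library alone. The sketch below is not Metaflow code: the train function, the variant names, and the scores are hypothetical stand-ins, and it uses threads rather than processes only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def train(variant):
    """Hypothetical stand-in for a training step; returns (variant, score)."""
    # Made-up validation scores, used only so the example has something to compare.
    scores = {"model_a": 0.81, "model_b": 0.87}
    return variant, scores[variant]


def run_branches(variants):
    # Fan out: run one training task per branch in parallel.
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        results = list(pool.map(train, variants))
    # Join: collect the branch results and keep the best-scoring one.
    return max(results, key=lambda result: result[1])


best_name, best_score = run_branches(["model_a", "model_b"])
print(best_name, best_score)
```

In a workflow framework the join step plays the role of run_branches' final line: it receives the outputs of all branches and decides what to carry forward.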

 

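Every stage ends with a checkpoint, and execution can later be resumed from any checkpoint, which helps with debugging. However, you cannot step through your code line by line.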
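The idea of a checkpoint at the end of each stage can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how Metaflow implements it: the step functions, the STEPS list, and the pickle-file layout are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_data(state):
    state["data"] = [1.0, 2.0, 3.0]
    return state


def fit(state):
    # Toy "model": the mean of the data.
    state["model"] = sum(state["data"]) / len(state["data"])
    return state


def report(state):
    state["summary"] = f"mean={state['model']:.1f}"
    return state


# Hypothetical linear pipeline: (step name, step function).
STEPS = [("start", load_data), ("fit", fit), ("end", report)]


def run(checkpoint_dir, resume_from=None):
    names = [name for name, _ in STEPS]
    first = names.index(resume_from) if resume_from else 0
    if first == 0:
        state = {}
    else:
        # Resume: load the checkpoint written by the preceding step.
        previous = names[first - 1]
        state = pickle.loads((checkpoint_dir / f"{previous}.pkl").read_bytes())
    for name, func in STEPS[first:]:
        state = func(state)
        # Checkpoint: persist the full state at the end of every step.
        (checkpoint_dir / f"{name}.pkl").write_bytes(pickle.dumps(state))
    return state


checkpoint_dir = Path(tempfile.mkdtemp())
full = run(checkpoint_dir)
resumed = run(checkpoint_dir, resume_from="fit")
print(full["summary"], resumed == full)
```

Resuming from "fit" skips the data-loading step and picks up the state saved at the end of "start", which is what makes re-running a failed later stage cheap.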

 

Version Control

 

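Version control for machine learning is quite challenging, so Metaflow pays particular attention to it. The code and data of every step are hashed, the execution of every node in the graph is recorded, and both the hyperparameter settings and the results are stored: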

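from metaflow import FlowSpec, Parameter, step


# Hypothetical stand-ins for the helpers the original snippet leaves undefined.
def load_data():
    return [1.0, 2.0, 3.0]


def fit_model(data):
    return sum(data) / len(data)


def evaluate(model):
    return model


class FitModelFlow(FlowSpec):
    # Exposed on the command line as --alpha and recorded with each run.
    alpha = Parameter('alpha',
                      help='Learning rate',
                      default=0.01)

    @step
    def start(self):
        print('alpha is %f' % self.alpha)
        self.data = load_data()
        self.next(self.fit)

    @step
    def fit(self):
        # Attributes assigned to self are stored with this step.
        self.model = fit_model(self.data)
        self.next(self.end)

    @step
    def end(self):
        # evaluate() replaces the original call to eval(), which is a Python builtin.
        print(f'Results for LR={self.alpha}: {evaluate(self.model)}')


if __name__ == '__main__':
    FitModelFlow()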

Hyperparameters can easily be set through command-line arguments:

python metaflow_parameter.py run --alpha 0.001

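Metadata is stored on the file system in JSON format. You can access the variable data stored at any stage, and you can easily fetch the results of the last successful run: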

from metaflow import Flow
run = Flow(flow_name).latest_successful_run  # flow_name: the flow's name, e.g. 'FitModelFlow'
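To make the hashing and JSON metadata described in this section more concrete, here is a minimal standard-library sketch of a content-addressed artifact store. It is an illustration only: the function names, the file layout, and the choice of SHA-256 are assumptions and do not describe Metaflow's internal format.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def save_artifact(store, value):
    """Store a pickled value under the SHA-256 of its bytes and return the hash."""
    blob = pickle.dumps(value)
    digest = hashlib.sha256(blob).hexdigest()
    (store / digest).write_bytes(blob)
    return digest


def record_run(store, run_id, params, artifacts):
    """Write one JSON metadata file per run, pointing at hashed artifacts."""
    meta = {
        "run_id": run_id,
        "params": params,
        "artifacts": {name: save_artifact(store, value)
                      for name, value in artifacts.items()},
    }
    (store / f"run_{run_id}.json").write_text(json.dumps(meta, indent=2))
    return meta


def load_artifact(store, run_id, name):
    """Look up an artifact of a past run through its metadata file."""
    meta = json.loads((store / f"run_{run_id}.json").read_text())
    return pickle.loads((store / meta["artifacts"][name]).read_bytes())


store = Path(tempfile.mkdtemp())
first = record_run(store, 1, {"alpha": 0.01}, {"data": [1, 2, 3], "model": 2.0})
second = record_run(store, 2, {"alpha": 0.001}, {"data": [1, 2, 3], "model": 2.5})
print(first["artifacts"]["data"] == second["artifacts"]["data"])
```

Because artifacts are keyed by their content hash, the unchanged data is stored once and shared by both runs, while each run keeps its own parameters and model.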

Dependency Management

 

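Metaflow also provides a dependency management mechanism. Dependencies can be declared with decorators at the flow level or at the level of an individual step, and you can pin a specific Python version or specific packages: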

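from metaflow import FlowSpec, step, conda, conda_base


# Flow-level default: every step runs under this Python version.
@conda_base(python='3.6.5')
class FitModelFlow(FlowSpec):

    @step
    def start(self):
        # load_data is assumed to be defined elsewhere.
        self.data = load_data()
        self.next(self.fit)

    # Step-level dependency: only this step gets scikit-learn 0.19.2.
    @conda(libraries={"scikit-learn": "0.19.2"})
    @step
    def fit(self):
        from sklearn import svm
        self.model = svm.LinearSVC( ... )


# ...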

For example, you can pass the conda environment flag when running from the command line:

python metaflow_conda.py --environment=conda run
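The way a flow-level default combines with step-level pins can be pictured with a small plain-Python sketch. The requires decorator and resolve function below are hypothetical names invented for this illustration; they only mimic the idea of declaring dependencies with decorators and are not part of Metaflow.

```python
def requires(libraries):
    """Hypothetical step-level decorator: attach pinned packages to a function."""
    def wrap(func):
        func.libraries = dict(libraries)
        return func
    return wrap


def resolve(flow_defaults, step_func):
    """Merge flow-level defaults with the step's own pins (the step wins on conflict)."""
    merged = dict(flow_defaults)
    merged.update(getattr(step_func, "libraries", {}))
    return merged


# Flow-level default, analogous to pinning the Python version for every step.
FLOW_DEFAULTS = {"python": "3.6.5"}


def start():
    return "loaded"


@requires({"scikit-learn": "0.19.2"})
def fit():
    return "fitted"


start_env = resolve(FLOW_DEFAULTS, start)
fit_env = resolve(FLOW_DEFAULTS, fit)
print(start_env)
print(fit_env)
```

Each step ends up with its own environment description, so a package needed by one step does not have to be installed for the others.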

Getting Started with Metaflow

 

Metaflow can be installed with pip:

pip install metaflow

For more tutorials and details, see the official documentation:

https://docs.metaflow.org/getting-started/tutorials

 

Original article:

https://towardsdatascience.com/what-exactly-is-metaflow-c007e5b75b5

(*This article was translated by AI Technology Base Camp. To reprint it, contact WeChat 1092722531.)


Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/104079021