Live Preview | Accelerating Deep Learning Inference with dl_inference Open Source Tool

dl_inference is a general-purpose deep learning inference service open sourced by 58.com. It can quickly bring models trained with the TensorFlow, PyTorch, and Caffe frameworks into production, offers both GPU and CPU deployment, and supports multi-node deployment of models. At 58.com, dl_inference serves more than one billion online inference requests per day across a variety of AI scenarios.


On December 10, the open source technology salon "Accelerating Deep Learning Inference with the dl_inference Open Source Tool", hosted by the 58.com Technical Committee and 58.com AI Lab, goes live. The salon will be streamed online; you are welcome to tune in at 19:00 on December 10.




01

Schedule



02

Detailed introduction

1. Wei Zhubin, senior back-end engineer at 58.com AI Lab
Sharing topic: Accelerating deep learning inference with TensorRT
Topic introduction: Covers the acceleration principles of TensorRT and how dl_inference automatically optimizes and converts models trained with the TensorFlow and PyTorch frameworks, then deploys them for inference on TIS (Triton Inference Server).
User pain points: The inference stage of a deep learning model places high demands on compute and latency. If a trained neural network is deployed to the inference side as-is, it may not fit the available compute or its inference time may be too long.
New technology/practical technology points:
1. Using TensorRT to automatically optimize models trained with the TensorFlow and PyTorch frameworks.
2. Deploying models with the TIS (Triton Inference Server) inference engine (a minimal sketch of this conversion and deployment flow follows after this speaker's introduction).
Listener benefits:
1. Understand how dl_inference performs automatic TensorRT optimization of models.
2. Master how dl_inference deploys models on GPU with TIS (Triton Inference Server).
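
For readers who want a concrete picture before the talk, here is a minimal sketch, assuming the TensorRT 8.x Python API and a placeholder PyTorch model, of the conversion flow the topic refers to: export to ONNX, build a TensorRT engine, and save it as the model.plan file that a Triton Inference Server model repository expects. It is illustrative only and is not the dl_inference implementation.

```python
# Minimal sketch (not dl_inference's own code): PyTorch -> ONNX -> TensorRT engine,
# laid out for a Triton Inference Server model repository. Paths, shapes, and the
# model itself are placeholders.
import torch
import torchvision
import tensorrt as trt

# 1. Export a trained PyTorch model to ONNX.
model = torchvision.models.resnet18(weights=None).eval()   # placeholder model
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# 2. Parse the ONNX file and build a TensorRT engine (TensorRT 8.x Python API).
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # enable FP16 optimization if the GPU supports it
engine_bytes = builder.build_serialized_network(network, config)

# 3. Triton loads the serialized engine from a repository layout such as
#    <model_repository>/<model_name>/1/model.plan plus a config.pbtxt.
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```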
2. Han Yu, senior back-end engineer at 58.com AI Lab
Sharing topic: Accelerating deep learning inference with MKL (Math Kernel Library)
Topic introduction: Explains the acceleration principles of MKL (Math Kernel Library) and uses real cases to introduce the concrete application and results of dl_inference with the MKL build of TensorFlow Serving.
User pain points: Model inference on CPU is time-consuming, while moving inference to GPU is expensive, so there is a pressing need to improve inference performance on CPU.
New technology/practical technology points:
1. How MKL (Math Kernel Library) accelerates model inference on CPU.
2. How to tune the parameters of the MKL build of TensorFlow Serving (see the sketch after this speaker's introduction).
Listener benefits: Understand the application scenarios and effects of the TensorFlow Serving MKL build.
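
As a rough illustration of the tuning knobs involved, the following is a minimal sketch in Python of the MKL/OpenMP threading parameters that typically matter for CPU inference (OMP_NUM_THREADS, KMP_BLOCKTIME, KMP_AFFINITY, and intra-/inter-op parallelism). The same parameters are exposed as environment variables by the MKL images of TensorFlow Serving; the values shown are placeholders, not recommendations from the talk.

```python
# Minimal sketch: MKL/OpenMP threading knobs for CPU inference with TensorFlow.
# Values are placeholders; optimal settings depend on core count and model.
import os

# OpenMP / MKL settings must be in the environment before TensorFlow starts.
os.environ["OMP_NUM_THREADS"] = "8"    # threads available to MKL compute primitives
os.environ["KMP_BLOCKTIME"] = "1"      # ms a thread busy-waits after finishing work
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores

import tensorflow as tf

# Intra-op parallelism: threads used inside a single op (e.g. one large matmul).
tf.config.threading.set_intra_op_parallelism_threads(8)
# Inter-op parallelism: how many independent ops may run concurrently.
tf.config.threading.set_inter_op_parallelism_threads(2)

# A small compute-bound op to exercise the thread settings; a real deployment
# would instead serve a SavedModel through TensorFlow Serving's MKL image.
x = tf.random.uniform([2048, 2048])
print(tf.linalg.matmul(x, x).shape)
```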

03

How to Watch


This article is shared from the WeChat public account 58 Technology (architects_58).
