Large Scale Distributed Deep Learning using Kubernetes

Author: Zen and the Art of Computer Programming

1. Introduction

With the rise of data science, more and more people have turned their attention to one of its most cutting-edge research directions: machine learning. An important branch of machine learning is deep learning, an approach that learns by fitting nonlinear functions to data with neural networks. The success of deep learning depends on large amounts of computing resources, massive data, and scalable parallel computation, so using these resources effectively for distributed parallel training has become a hot topic. Apache SystemML is a distributed, in-memory machine learning system that runs on Hadoop and Spark; it scales to very large data sets, delivers high-performance execution, and supports a wide range of machine learning algorithms. This article introduces the architecture and workflow of Apache SystemML, along with the practice of using Kubernetes to run large-scale deep learning training in a distributed environment. The main tools involved include Apache Hadoop, Apache Spark, Apache SystemML, and Kubernetes. Readers should be familiar with the basic usage of these concepts and tools, and have the relevant programming skills, in order to follow and apply the system.

2. Related background

2.1 Definition of Deep Learning

Deep learning refers to the use of multi-layer neural networks that build successive levels of abstraction to solve complex problems in fields such as computer vision, speech recognition, and natural language processing. Such a network usually consists of multiple convolutional layers or other types of layers; it is highly nonlinear and able to learn representations directly from raw input data. Deep learning can be applied to tasks such as classification, prediction, and regression, and has achieved excellent results.
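To make the idea of "multiple highly nonlinear layers" concrete, here is a minimal sketch (not from the original article) of a two-layer network's forward pass in plain Python with NumPy. The layer sizes, the ReLU nonlinearity, and the softmax classification output are illustrative assumptions, not part of any specific system discussed here:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Elementwise nonlinearity; stacking such layers is what makes
    # the network "deep" and highly nonlinear.
    return np.maximum(0.0, x)

def softmax(z):
    # Turn raw scores into class probabilities (for classification).
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy dimensions: 4 input features, 8 hidden units, 3 output classes.
W1 = rng.standard_normal((4, 8))
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 3))
b2 = np.zeros(3)

def forward(x):
    h = relu(x @ W1 + b1)        # layer 1: learned nonlinear features
    return softmax(h @ W2 + b2)  # layer 2: class probabilities

x = rng.standard_normal((2, 4))  # a batch of 2 raw input samples
probs = forward(x)               # shape (2, 3): one row per sample
```

Training would adjust `W1`, `b1`, `W2`, `b2` by gradient descent; at the scale discussed in this article, that training step is what gets distributed across a cluster.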

2.2 Big data technology and open source ecology

The rapid development of big data technology has driven the rise of cloud computing. Early big data platforms such as Hadoop and Hive made storing and analyzing data dramatically simpler, and with the spread of the Internet, big data technology moved to the center of software development. Today, the open source community offers a wealth of information about big data technologies.


Reprinted from: blog.csdn.net/universsky2015/article/details/132644825