What technologies are used in big data?

Original source: https://blog.51cto.com/12306609/2095719

Big data refers to a series of techniques for storing, computing, analyzing, and processing massive amounts of data. The volume of data processed is typically at the TB level, and can even reach the PB or EB level, which traditional data processing tools cannot handle. The technologies involved include distributed computing, high-concurrency and high-availability processing, clustering, and real-time computation, bringing together many of the most popular technologies in the current IT field.

To learn and master big data, you need the following technologies:

1. Java programming technology

Java programming is the foundation of big data learning. Java is a strongly typed language with strong cross-platform capabilities; it can be used to write desktop applications, web applications, distributed systems, and embedded applications, and it is a favorite programming tool of big data engineers, so mastering the basics of Java is essential if you want to learn big data!

2. Linux commands

Big data development is usually carried out in a Linux environment. Compared with Linux, Windows is a closed operating system, and the open source big data software that runs on it is very limited, so anyone who wants to do big data development work needs to master basic Linux commands.

3. Hadoop

Hadoop is an important framework for big data development. Its core components are HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over them, so both must be mastered. In addition, you need related skills and operations such as building a Hadoop cluster, managing it, and advanced administration of Hadoop YARN!
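
For example, here is a minimal sketch of writing a file with the HDFS Java API (the NameNode address and target path are assumptions for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
            out.writeUTF("hello hdfs"); // write a small file into HDFS
        }
    }
}
```

Run against a real cluster, this creates /demo/hello.txt on HDFS; MapReduce jobs would then read such files as their input.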

4. Hive

Hive is a data warehouse tool built on top of Hadoop. It can map structured data files to database tables and offers simple SQL queries, converting SQL statements into MapReduce jobs to run, which makes it very suitable for statistical analysis of a data warehouse. For Hive, you need to master its installation, application, and advanced operations.
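
As a minimal sketch, here is how a query might be issued to HiveServer2 over JDBC from Java (the server address and the `logs` table are hypothetical, and the hive-jdbc driver must be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the Hive JDBC driver
        // Assumed HiveServer2 address and a hypothetical "logs" table.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // Hive compiles this SQL into a MapReduce job behind the scenes.
             ResultSet rs = stmt.executeQuery(
                 "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        }
    }
}
```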

5. Avro and Protobuf

Avro and Protobuf are data serialization systems. They provide rich data structure types, are very suitable for data storage, and also serve as data interchange formats for communication between programs written in different languages. For studying big data, you need to master their concrete usage.
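
A minimal Avro sketch in Java, assuming the avro library is on the classpath and using a made-up `User` schema, serializes a record to bytes like this:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema, for illustration only.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);
        // Serialize the record to Avro's compact binary form.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(user, encoder);
        encoder.flush();
        System.out.println(out.size() + " bytes"); // language-neutral wire format
    }
}
```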

6. ZooKeeper

ZooKeeper is an important component of Hadoop and HBase. It provides a coordination service for distributed applications, with features that include configuration maintenance, naming, distributed synchronization, and group services. In big data development, you should master ZooKeeper's common commands and how to use the features it implements.
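
A minimal sketch of the ZooKeeper Java client (the ensemble address and znode path are assumptions; a real application would pass a Watcher and wait for the connected event before issuing operations):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkExample {
    public static void main(String[] args) throws Exception {
        // Assumed ensemble address; the watcher is left null for brevity.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 3000, null);
        // Create a znode holding a piece of shared configuration.
        zk.create("/app/config", "v1".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        byte[] data = zk.getData("/app/config", false, null); // read it back
        System.out.println(new String(data));
        zk.close();
    }
}
```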

7. HBase

HBase is a distributed, column-oriented open source database. Unlike a relational database, it is better suited to storing unstructured data: a highly reliable, high-performance, column-oriented, scalable distributed storage system. Big data development requires mastering HBase fundamentals, applications, architecture, advanced usage, and so on.
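
A minimal sketch of the HBase Java client API, using an assumed `users` table with an `info` column family:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1"); // assumed ZooKeeper quorum
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Write one cell: row "row1", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("alice"));
            table.put(put);
            // Read it back by row key.
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```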

8. Phoenix

Phoenix is an open source SQL engine, written in Java, that operates on HBase through the JDBC API. It offers features such as dynamic columns, salting, a query server, tracing, transactions, user-defined functions, secondary indexes, namespace mapping, data collection, row timestamp columns, paged queries, skip scans, views, and multi-tenancy, and big data development requires mastering its principles and usage.
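
A minimal sketch of Phoenix over JDBC (the ZooKeeper address and table are assumptions, and the Phoenix client jar must be on the classpath); note Phoenix's UPSERT syntax and explicit commit:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper quorum for the HBase cluster Phoenix fronts.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS users "
                       + "(id BIGINT PRIMARY KEY, name VARCHAR)");
            stmt.executeUpdate("UPSERT INTO users VALUES (1, 'alice')");
            conn.commit(); // Phoenix batches mutations until commit
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT name FROM users WHERE id = 1")) {
                while (rs.next()) System.out.println(rs.getString(1));
            }
        }
    }
}
```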

9. Redis

Redis is a key-value storage system. Its appearance largely compensates for the shortcomings of key/value stores such as memcached, and in some situations it can complement relational databases very well. It provides clients for Java, C/C++, C#, PHP, JavaScript, Perl, Objective-C, Python, Ruby, Erlang, and more, and it is very convenient to use. Big data developers need to master Redis installation, configuration, and related usage methods.
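
A minimal sketch using the Jedis client for Java (the server address is an assumption):

```java
import redis.clients.jedis.Jedis;

public class RedisExample {
    public static void main(String[] args) {
        // Assumed local Redis server on the default port.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("page:views", "0");
            jedis.incr("page:views");                     // atomic counter
            System.out.println(jedis.get("page:views")); // prints 1
        }
    }
}
```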

10. Flume

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive logs. Flume supports customizing all kinds of data senders in a log system to collect data; at the same time, it provides simple processing of the data and the ability to write it to a variety of (customizable) data receivers. Big data development requires mastering its installation, configuration, and related usage methods.
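
A minimal sketch of a Flume agent configuration (the agent name, log path, and HDFS URL are assumptions), tailing an application log into HDFS through a memory channel:

```properties
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# tail an application log as the source
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

# buffer events in memory
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# deliver events to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
```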

11. SSM

SSM is an integration of three open source frameworks: Spring, Spring MVC, and MyBatis. It is often used as the framework for web projects whose data sources are relatively simple. Big data development requires mastering each of Spring, Spring MVC, and MyBatis, and then using SSM to integrate them.
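
A minimal sketch of the MyBatis half of SSM, using a hypothetical `users` table: a mapper interface wired into a Spring service by constructor injection:

```java
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import org.springframework.stereotype.Service;

// Hypothetical table and columns, for illustration only.
@Mapper
public interface UserMapper {
    @Select("SELECT name FROM users WHERE id = #{id}")
    String findNameById(long id);
}

@Service
class UserService {
    private final UserMapper userMapper;

    UserService(UserMapper userMapper) { // injected by the Spring container
        this.userMapper = userMapper;
    }

    public String userName(long id) {
        return userMapper.findNameById(id);
    }
}
```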

12. Kafka

Kafka is a high-throughput distributed publish-subscribe messaging system. Its purpose in big data development is to unify online and offline message processing: messages can be loaded into Hadoop in parallel, while the cluster also serves messages to consumers in real time. Big data development requires mastering Kafka's architecture, the role and implementation of each of its components, and how to use the related features!
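
A minimal sketch of a Kafka producer in Java (the broker address and topic name are assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // publish one event; any subscriber of "events" receives it
            producer.send(new ProducerRecord<>("events", "user1", "login"));
        }
    }
}
```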

13. Scala

Scala is a multi-paradigm programming language. Spark, an important framework for big data development, is designed in the Scala language, and a solid Scala foundation is essential for learning the Spark framework well, so big data development requires mastering the basics of Scala programming!

14. Spark

Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for managing the big data processing needs of different types of data sets and data sources. Big data development requires mastering Spark fundamentals, SparkJob, Spark RDDs, Spark job deployment and resource allocation, Spark shuffle, Spark memory management, Spark broadcast variables, Spark SQL, Spark Streaming, Spark ML, and other related knowledge.
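
A minimal word-count sketch with Spark's Java API (the input path is an assumption; `local[*]` runs it locally for testing):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wordcount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Assumed input path; split each line into words and count them.
            JavaRDD<String> lines = sc.textFile("/data/input.txt");
            lines.flatMap(l -> Arrays.asList(l.split(" ")).iterator())
                 .mapToPair(w -> new Tuple2<>(w, 1))
                 .reduceByKey(Integer::sum)
                 .collect()
                 .forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}
```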

15. Azkaban

Azkaban is a batch workflow job scheduler. It can run a set of jobs and processes in a specific order within a workflow, and it can be used to accomplish big data task scheduling. Big data development requires mastering Azkaban's configuration and syntax rules.
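
A minimal sketch of Azkaban job definitions (the script paths are assumptions): two `.job` files, packaged together, where `load` runs only after `collect` succeeds:

```properties
# collect.job
type=command
command=sh /scripts/collect_logs.sh

# load.job (a separate file; depends on the "collect" job above)
type=command
dependencies=collect
command=sh /scripts/load_to_hdfs.sh
```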

16. Python and Data Analysis

Python is an object-oriented programming language with rich libraries. It is simple to use and widely adopted, and it is also used in the big data field, mainly for data collection, data analysis, and data visualization, so big data development requires learning some Python knowledge.
