From beginner to master: the big data courses you need to know

If you have decided to move into big data development, you probably have some doubts: how should you get started? Which technologies should you learn? What does the learning route look like? The motivation is much the same as for students who originally set out to learn Java: the jobs are hot, salaries are relatively high, and the prospects are impressive. Most people are drawn to big data for these reasons but understand it poorly. Broadly speaking, if you want to learn it, you first need programming skills, then a grounding in mathematics and statistics, and finally the ability to combine the two in applications; from there you can develop in the data direction. But that outline alone does not help much, so let's look at the specifics together.



Now you need to ask yourself a few questions:

1. What interests you about computers and software?

2. If you are a computer major, are you interested in operating systems, hardware, networks, or servers?

3. If you are a software major, are you interested in software development, programming, and writing code?

4. Or are you particularly interested in mathematics, statistics, data, and numbers?

5. What is your specialty?

Big Data learning stages

Stage 1: Java language foundations

An introduction to Java development: the Eclipse IDE, Java language fundamentals, flow control, strings, arrays, classes and objects, core techniques for class and number handling, I/O and reflection, multi-threading, Swing programs, and collections.

Stage 2: HTML, CSS, and JavaScript

Desktop site layout, HTML5 + CSS3 basics, web-app page layout, interactive feature development in native JavaScript, Ajax asynchronous interaction, and jQuery applications.

Stage 3: JavaWeb and databases

Databases, the core of JavaWeb development, and JavaWeb development internals.

Stage 4: The Linux and Hadoop ecosystem

The Linux system, an outline of Hadoop offline computing, the distributed database HBase, the Hive data warehouse, the Sqoop data migration tool, and the Flume distributed logging framework.

Stage 5: Practical work (a real front-line company project)

Data acquisition, data processing, data analysis, data presentation, and data applications.

Stage 6: The Spark ecosystem

The Python programming language, the Scala programming language, Spark big data processing, Spark Streaming big data processing, Spark MLlib machine learning, and Spark GraphX graph computation. Practical project 1: a Spark recommendation system (a real front-line company project); practical project 2: Sina (www.sina.com.cn).
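The map/filter/reduce style that Spark builds on can be previewed with plain Python built-ins. This is only a sketch of the programming model, not Spark itself; in real Spark these would be lazy RDD or DataFrame operations distributed across a cluster, and the ratings data here is made up for illustration.

```python
from functools import reduce

# Hypothetical (user, rating) pairs standing in for a distributed dataset.
ratings = [("user1", 4.5), ("user2", 2.0), ("user1", 5.0), ("user3", 3.5)]

high = filter(lambda r: r[1] >= 3.5, ratings)   # analogous to rdd.filter(...)
scores = map(lambda r: r[1], high)              # analogous to rdd.map(...)
total = reduce(lambda a, b: a + b, scores)      # analogous to rdd.reduce(...)
print(total)  # 13.0
```

The point of the exercise is the chaining: each stage consumes the previous one's output, which is exactly how Spark transformations compose before an action triggers execution.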

Stage 7: The Storm ecosystem

The Storm technical architecture, Storm principles and fundamentals, the Kafka message queue, the Redis tool, and ZooKeeper in detail. Practical project 1: a log alerting system; practical project 2: a "you may also like" recommendation system.
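The log alerting project mentioned above boils down to a streaming computation: watch a window of recent log lines and raise an alert when errors cluster. Here is a toy single-process version, assuming a simple "ERROR appears in the line" rule; a real Storm topology would do the same work across Kafka-fed, distributed bolts.

```python
from collections import deque

def alert_on_errors(log_lines, window=5, max_errors=2):
    """Yield an alert whenever the last `window` lines contain more
    than `max_errors` ERROR entries (a toy streaming-window check)."""
    recent = deque(maxlen=window)
    for line in log_lines:
        recent.append(line)
        errors = sum(1 for l in recent if "ERROR" in l)
        if errors > max_errors:
            yield f"alert: {errors} errors in last {len(recent)} lines"

# Simulated log stream; in a real system these lines would arrive via Kafka.
stream = ["INFO ok", "ERROR db", "INFO ok", "ERROR net", "ERROR disk", "INFO ok"]
for alert in alert_on_errors(stream):
    print(alert)
```

The sliding `deque` plays the role of a windowed bolt's state: it forgets old lines automatically, so the error count always reflects only the most recent traffic.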

Stage 8: Big data analytics and AI (artificial intelligence)

Preparing the data analysis working environment, data visualization, and machine learning in Python: neural-network image recognition, natural language processing, and social network analysis. Practical project: recognition and analysis of outdoor equipment.
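To give a flavor of the machine learning covered in this stage, here is one of the simplest classifiers, a 1-nearest-neighbor, in pure Python. The two-feature "outdoor equipment" training set is invented for illustration; a real image-recognition project would use learned feature vectors and a library such as scikit-learn or a neural network framework.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(sample, training):
    """Classify a sample by the label of its closest training point (1-NN)."""
    label, _ = min(
        ((lbl, euclidean(sample, feat)) for feat, lbl in training),
        key=lambda pair: pair[1],
    )
    return label

# Hypothetical 2-feature training set: (features, label).
training = [((1.0, 1.0), "tent"), ((8.0, 9.0), "backpack"), ((1.5, 0.5), "tent")]
print(nearest_neighbor((1.2, 0.9), training))  # tent
```

Despite its simplicity, 1-NN captures the core idea of supervised learning: label new data by comparing it against labeled history.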

There are many training institutions and training-institution job postings on the market, and essentially all of them tell you that zero-based students can pick up the skills. To be clear: you can, but with less than an undergraduate education, big data development is considerably harder. The field has many specializations: big data analytics, big data development, and database development, among others.

In general, big data development courses take about four months, while three months is enough for a narrower field such as database development. Big data development tends to require an undergraduate degree; database development is more relaxed, and a junior-college diploma is more than enough.

From the business perspective, big data talent can be divided into three major areas: product and market analysis, security and risk analysis, and business intelligence.

Product analysis uses algorithms to test the effectiveness of new products and is a relatively new field. In security and risk analysis, data scientists need to know what data to collect and how to analyze it quickly, ultimately using that information to effectively mitigate network attacks or catch cyber criminals. For job seekers who want to work in big data, how should they choose a position based on their own background?

Here are ten of the hottest job categories related to big data:

1. ETL development

As data types continue to multiply, enterprise demand for data-integration professionals keeps growing. ETL developers deal with different data sources and organizations, extracting data from each source and importing it into the data warehouse to meet business needs. ETL development takes scattered, heterogeneous data, such as relational data and flat data files, extracts it into a temporary staging layer, then cleans, converts, and integrates it, and finally loads it into a data warehouse or data mart, where it becomes the foundation for online analytical processing and data mining. The ETL industry is relatively mature, and related positions have long life cycles, usually filled by a mix of internal employees and outsourced contractors. One reason ETL talent is hot in the big data era: in the early stages of enterprise big data adoption, Hadoop is essentially just a poor man's ETL.
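The extract–clean–load flow just described can be sketched in a few lines of Python. The two feeds and the dict standing in for a warehouse table are illustrative assumptions, not a real ETL tool; production pipelines would use a framework and a database, but the three phases look the same.

```python
import csv, io

# Hypothetical raw feeds: two heterogeneous sources (CSV text and API dicts).
CSV_FEED = "id,amount\n1, 10.5 \n2,not_a_number\n3,7.0\n"
API_FEED = [{"id": 4, "amount": "2.5"}, {"id": 5, "amount": None}]

def extract():
    """Pull rows from both sources into a common dict shape."""
    for row in csv.DictReader(io.StringIO(CSV_FEED)):
        yield row
    for row in API_FEED:
        yield row

def transform(rows):
    """Clean: strip whitespace, drop rows whose amount is not numeric."""
    for row in rows:
        try:
            yield {"id": int(row["id"]), "amount": float(str(row["amount"]).strip())}
        except (TypeError, ValueError):
            continue  # a real pipeline would quarantine bad records instead

def load(rows, warehouse):
    """Load cleaned rows into the target store (a dict stands in for a table)."""
    for row in rows:
        warehouse[row["id"]] = row["amount"]

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # {1: 10.5, 3: 7.0, 4: 2.5}
```

Note how the malformed rows (a non-numeric amount, a missing value) are filtered out during transformation, which is exactly the cleansing step that distinguishes ETL from a plain copy.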

2. Hadoop development

The core of Hadoop is HDFS and MapReduce: HDFS provides massive data storage, and MapReduce provides computation over that data. As data sets grow and traditional BI data processing becomes expensive, business demand for inexpensive Hadoop and related technologies such as Hive, HBase, MapReduce, and Pig will continue to grow. Technical staff experienced with the Hadoop framework are now the most sought-after big data talent.
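The MapReduce model mentioned above is easy to see in miniature: a map phase emits key–value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This single-process word-count sketch mimics those three phases in plain Python; real Hadoop distributes each phase across machines and handles the shuffle for you.

```python
from collections import defaultdict

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word, as a mapper would."""
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key (done by the framework in real Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big future", "data pipelines move data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["data"])  # 3
```

Because each phase only sees key–value pairs, the same structure scales from this toy to terabytes: the framework just runs many mappers and reducers in parallel.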

3. Visualization tool development

Analyzing massive data is a big challenge, and new data visualization tools such as Spotfire, Qlikview, and Tableau display data intuitively and efficiently. Visual development builds applications in a graphical user interface: the developer assembles interface elements supplied by the tool, and the tool generates the application automatically. A time-tested, fully scalable, feature-rich visual component library gives developers a complete and easy-to-use set of components for building extremely rich user interfaces that connect across many levels of resources and all of your data. In the past, data visualization belonged to business intelligence developers, but with the rise of Hadoop it has become an independent professional skill and job.
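To show what "displaying data efficiently" means at its most basic, here is a tiny text-mode bar chart. The yearly revenue figures are invented; tools like Tableau render this graphically and interactively, but the underlying idea, scaling values against the maximum, is the same.

```python
def bar_chart(data, width=20):
    """Render a horizontal ASCII bar chart scaled to the largest value."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:>8} | {bar} {value}")
    return "\n".join(lines)

print(bar_chart({"2022": 40, "2023": 65, "2024": 90}))
```

Even this crude rendering makes the trend obvious at a glance, which is the whole argument for visualization over raw tables.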

4. Information architecture development

Big data has re-energized the master data management boom. Fully developing and exploiting enterprise data to support decision-making requires very specialized skills. Information architects must understand how to define and archive key elements, ensuring that data is managed and used in the most effective way. The key skills of an information architect are master data management, business knowledge, and data modeling.

5. Data warehousing

A data warehouse is a strategic collection of all types of data, assembled to support decision-making at every level of the enterprise. It is a single data store, created for reporting and analytical decision support, that provides the business intelligence needed to guide process improvement and to monitor time, cost, and quality. Experts in this area are familiar with data warehouse appliances from companies such as Teradata, Netezza, and Exadata, and handle data integration, management, and performance optimization on those machines.

6. OLAP development

As data volumes and applications have grown, database storage has moved from the megabytes (M) and gigabytes (G) of the 1980s to today's terabytes (T) and petabytes (P), while user queries have become more and more complex: not just querying or manipulating one or a few records in a table, but analyzing and synthesizing information across millions of records in multiple tables. Online analytical processing (OLAP) systems are responsible for this kind of massive data processing. OLAP developers extract data from relational or non-relational sources to build models, then create a data-access user interface that provides high-performance predefined queries.
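The "analyze across millions of records" workload OLAP handles is, at its core, aggregation along chosen dimensions. This sketch rolls up a four-row fact table (an invented example; real cubes hold millions of rows in a dedicated engine) by any subset of its dimensions.

```python
from collections import defaultdict

# Hypothetical fact table rows: (region, product, year, revenue).
facts = [
    ("north", "widget", 2023, 100),
    ("north", "gadget", 2023, 150),
    ("south", "widget", 2023, 80),
    ("north", "widget", 2024, 120),
]

def rollup(facts, dims):
    """Sum revenue along the requested dimensions (an OLAP roll-up)."""
    index = {"region": 0, "product": 1, "year": 2}
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[index[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

# Slice by region, collapsing product and year.
print(rollup(facts, ["region"]))          # {('north',): 370, ('south',): 80}
# Dice by (product, year).
print(rollup(facts, ["product", "year"]))
```

Predefined queries in an OLAP tool are essentially pre-computed versions of roll-ups like these, which is what makes interactive analysis over huge tables fast.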

7. Data science

This position used to be called data architecture research; the data scientist is a new type of role that turns enterprise data and technology into commercial value. As data science advances, more and more work will be done directly on data, allowing people to use data to understand nature and behavior. A data scientist therefore needs excellent communication skills above all, able to interpret the results of data analysis for both IT departments and business unit leaders. Overall, data scientists blend analyst and artist, requiring interdisciplinary skills that span science and business.

8. Predictive analytics

Marketing departments often use predictive analytics to forecast user behavior or to target users. Predictive analytics developers share some scenarios with data scientists: testing thresholds and predicting the future performance of the enterprise from historical data and stated hypotheses.
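A minimal version of "predict from historical data, then test a threshold" might look like the following. The moving-average forecast and the churn threshold are illustrative assumptions; production systems would use proper statistical or machine learning models.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def will_churn(activity, threshold=2.0):
    """Hypothetical rule: flag a user if forecast activity falls below threshold."""
    return moving_average_forecast(activity) < threshold

print(moving_average_forecast([10, 12, 11, 9, 10]))  # 10.0
print(will_churn([5, 3, 2, 1, 1]))                   # True
```

Even this toy shows the two halves of the job: a forecasting step built on history, and a business rule (the threshold) that turns the forecast into an action.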

9. Enterprise data management

Enterprises that want to improve data quality must consider data governance, and for this they need to create the data steward position. The person in this role must be able to use a variety of technical tools to pool the large amounts of data scattered around the enterprise, cleanse and normalize it, and import it into the data warehouse, turning it into a usable version. Then, through reporting and analysis technology, the data is sliced, diced, and delivered to thousands of people. As stewards of the data, they must guarantee its integrity, accuracy, uniqueness, authenticity, and non-redundancy.
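The cleansing-and-uniqueness duties of a data steward can be illustrated with a small normalize-then-deduplicate pass. The record shape and the "email is the unique key" rule are assumptions made for this example.

```python
def normalize(record):
    """Normalize case and whitespace so duplicate records compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
    }

def dedupe(records):
    """Keep one record per email: the steward's uniqueness guarantee."""
    seen = {}
    for rec in map(normalize, records):
        seen.setdefault(rec["email"], rec)  # first clean copy wins
    return list(seen.values())

raw = [
    {"email": "Ada@Example.com ", "name": "ada  lovelace"},
    {"email": "ada@example.com", "name": "Ada Lovelace"},
]
print(dedupe(raw))  # [{'email': 'ada@example.com', 'name': 'Ada Lovelace'}]
```

Normalizing before comparing is the key design choice: without it, the two raw records above would pass a naive duplicate check and both land in the warehouse.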

10. Data security research

A data security position is responsible for the security management of large-scale enterprise servers, storage, and data, as well as for network planning and the design and implementation of information security projects. Data security researchers also need strong management experience and a deep understanding of traditional enterprise operations and maintenance, so that they can keep enterprise data secure without the slightest leak.


Source: blog.51cto.com/14296550/2463306