Big Data Solutions - (Basics)

The primary goal of this introductory course in big data analysis and applications is to understand, through demonstrations, the methods of statistics, data mining, and modeling. Students then learn Excel data processing and programming, basic MySQL database operations, and the fundamentals of Hadoop. This lays the groundwork for more advanced study later.

Basics

Statistics and modeling demos

Exploratory data analysis demo

Common probability distributions and asymptotic properties demo

Confidence intervals and hypothesis testing demo

Linear regression model demo

Generalized linear regression model demo

Data mining demos

Basic workflow of classification and prediction demo

Data preprocessing demo

Classification demo

Cluster analysis demo

Correlation analysis demo


Excel

Excel Data Processing

Exercise 1 Excel Basic Operations

Exercise 2 Excel data visualization

Exercise 3 Excel functions and formulas

Exercise 4 Excel pivot tables

Exercise 5 Excel Data Analysis

Excel Advanced Programming

Exercise 1 VBA programming basics

Exercise 2 VBA data types

Exercise 3 VBA flow control

Exercise 4 VBA integrated application

MySQL

MySQL database operations

Exercise 1 MySQL data manipulation statements

Exercise 2 MySQL data queries

Exercise 3 Advanced MySQL query statements

Exercise 4 MySQL views and indexes

Hadoop architecture and basics

Hadoop installation

Exercise 1 Hadoop installation environment configuration

Exercise 2 Hadoop stand-alone mode installation

Exercise 3 Hadoop pseudo-distributed mode installation

Exercise 4 Hadoop fully-distributed installation

HDFS principles and operation

Exercise 1 Reading HDFS file content - Example 1

Exercise 2 Reading HDFS file content - Example 2

Exercise 3 Reading HDFS file content - Example 3

Principle and implementation of MapReduce

Exercise 1 Finding the minimum temperature of each year

Exercise 2 Finding the average temperature

Hadoop development example - sorting with MapReduce

Exercise 1 Finding the total payroll of each department

Exercise 2 Finding the headcount and average wage of each department

MapReduce word frequency statistics

Exercise 1 MapReduce word frequency statistics

Iterative MapReduce program development

Exercise 1 MapReduce program development

Deploying and using Hadoop HA

Installation Preparation

Exercise 1 Host Configuration

Exercise 2 JDK installation and ZooKeeper cluster setup

Hadoop cluster installation

Exercise 1 Hadoop cluster installation

Connecting Eclipse to Hadoop to run MapReduce programs

Exercise 1 Connecting Eclipse to Hadoop to run MapReduce programs

(1) Statistics, data modeling, and data mining methods

This part is taught through demonstrations. It walks through a complete workflow of data summarization, statistics, modeling, analysis, and mining, so that students can intuitively grasp the common methods and process of big data analysis and its applications.
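As a rough illustration of the summarize, estimate, and model steps named above, here is a minimal Python sketch using only the standard library. The function names, the normal-approximation interval (z = 1.96), and the sample numbers are assumptions made for this example; the course's own demos may use different tools and data.

```python
import math
import statistics


def summarize(values):
    """Exploratory summary: count, mean, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% interval for the mean (normal approximation)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical sample: study hours and test scores.
    hours = [1, 2, 3, 4, 5]
    scores = [52, 55, 61, 64, 68]
    print(summarize(scores))
    print(mean_confidence_interval(scores))
    print(fit_line(hours, scores))
```

For small samples a t-based interval would be more appropriate than the fixed z value used here; the sketch keeps the normal approximation only for brevity.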

(2) Data processing with Excel

Excel ranks fifth in popularity among data analysis tools (according to a ranking published by KDnuggets). The experiments require no prior background from students. Excel is a component of the Microsoft Office suite and a powerful spreadsheet program. It can present neat, attractive tables to the user, and it can also be used to analyze and forecast data and to carry out many complex data operations, helping users make better-informed decisions. It also has strong visualization features: data in a table can be shown as a variety of charts, which makes the presentation more expressive and appealing. Some of Excel's advanced data analysis features can only be fully used with a working knowledge of VBA. Excel is therefore the most basic software tool for data mining and data analysis.

(3) Basic use of the MySQL database and basic programming methods

MySQL ranks third in popularity among data analysis tools (according to a ranking published by KDnuggets). Thanks to its small size, high speed, low total cost of ownership, and especially its open-source nature, it is widely used in developing small and medium-sized websites. The experiments require no prior background from students; through them, students can master the basics of using a MySQL database and of SQL programming.
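To give a feel for the kinds of statements these exercises cover (data manipulation, queries, views, and indexes), here is a small sketch. It deliberately uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; the table, column, and view names are hypothetical, and MySQL's own dialect differs in places (for example `AUTO_INCREMENT` and its column types).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a table, an index, and a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            dept   TEXT NOT NULL,
            salary REAL NOT NULL
        );
        CREATE INDEX idx_employee_dept ON employee (dept);
        CREATE VIEW dept_summary AS
            SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
            FROM employee
            GROUP BY dept;
        """
    )
    # Data manipulation: insert rows, then update one of them.
    conn.executemany(
        "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
        [
            ("Li", "sales", 4000),
            ("Wang", "sales", 5000),
            ("Chen", "rd", 6500),
            ("Zhao", "rd", 7000),
        ],
    )
    conn.execute(
        "UPDATE employee SET salary = salary + 500 WHERE name = ?", ("Chen",)
    )
    conn.commit()
    return conn


def department_summary(conn):
    """Query the view: headcount and average salary per department."""
    cursor = conn.execute(
        "SELECT dept, headcount, avg_salary FROM dept_summary ORDER BY dept"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in department_summary(build_demo_db()):
        print(row)
```

The `CREATE INDEX`, `CREATE VIEW`, `INSERT`, `UPDATE`, and `SELECT ... GROUP BY` statements carry over to MySQL with little or no change, which is why the stand-in is still useful for practice.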

(4) Hadoop architecture and environment setup

Hadoop is the cornerstone of the entire big data ecosystem and ranks seventh in popularity among data analysis tools, so its architecture and environment setup must be learned and mastered. Centered on its distributed file system implementation, HDFS, the experiments cover Hadoop installation, HDFS and its operating principles, the principles and implementation of MapReduce, and iterative MapReduce program development. Through these experiments, students can master the common methods and workflow of big data analysis with Hadoop.
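The MapReduce idea behind several of the exercises in the outline (word frequency, yearly minimum temperature) can be sketched locally in plain Python. This is a single-process simulation of the map, shuffle, and reduce phases, not the Hadoop API; `run_mapreduce`, the mapper and reducer names, and the `year,temperature` input format are all assumptions made for the illustration.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: collect values per key
    # Reduce phase: one reducer call per key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)


def temperature_mapper(line):
    """Emit (year, temperature) from an assumed 'year,temperature' line."""
    year, temperature = line.split(",")
    yield year, float(temperature)


def min_reducer(key, values):
    """Keep the lowest temperature seen for one year."""
    return min(values)


if __name__ == "__main__":
    print(run_mapreduce(["big data big", "data hadoop"], word_mapper, sum_reducer))
    print(run_mapreduce(["2019,-3.5", "2019,1.0", "2020,-7.2"],
                        temperature_mapper, min_reducer))
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle over HDFS-backed data, but the division of work between the two functions is the same as in this sketch.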


Origin blog.51cto.com/14296550/2421992