The first task in learning big data analysis and its applications is, of course, to understand statistical methods, data mining, and modeling, which this course presents as demonstrations of results. Next come Excel data processing and programming, basic MySQL database operations, and the fundamentals of Hadoop. More advanced topics can be built on this foundation later.
Basics
Statistics and modeling demonstrations
Exploratory data analysis demonstration
Common probability distributions and asymptotic behavior demonstration
Confidence intervals and hypothesis testing demonstration
Linear regression model demonstration
Generalized linear regression model demonstration
Data mining demonstrations
Basic workflow of classification and prediction demonstration
Data preprocessing demonstration
Classification demonstration
Cluster analysis demonstration
Association analysis demonstration
Excel
Excel Data Processing
Exercise 1 Excel Basic Operations
Exercise 2 Excel data visualization
Exercise 3 Excel functions and formulas
Exercise 4 Excel Pivot Tables
Exercise 5 Excel Data Analysis
Excel Advanced Programming
Exercise 1 VBA programming basics
Exercise 2 VBA data types
Exercise 3 VBA flow control
Exercise 4 VBA comprehensive application
MySQL
MySQL database operations
Exercise 1 MySQL data manipulation statements
Exercise 2 MySQL data queries
Exercise 3 MySQL advanced query statements
Exercise 4 MySQL views and indexes
Hadoop architecture and basics
Hadoop installation
Exercise 1 Hadoop installation environment configuration
Exercise 2 Hadoop standalone mode installation
Exercise 3 Hadoop pseudo-distributed mode installation
Exercise 4 Hadoop fully distributed mode installation
HDFS principles and operation
Exercise 1 Reading HDFS file content - Example 1
Exercise 2 Reading HDFS file content - Example 2
Exercise 3 Reading HDFS file content - Example 3
MapReduce principles and implementation
Exercise 1 Finding the minimum temperature of each year
Exercise 2 Finding the average temperature
Hadoop development example - sorting with MapReduce
Exercise 1 Finding the total salary of each department
Exercise 2 Finding the headcount and average salary of each department
MapReduce word frequency statistics
Exercise 1 MapReduce word frequency statistics
Iterative MapReduce program development
Exercise 1 MapReduce program development
Deploying and using Hadoop-HA
Installation Preparation
Exercise 1 Host Configuration
Exercise 2 JDK installation and ZooKeeper cluster setup
Hadoop cluster installation
Exercise 1 Hadoop cluster installation
Connecting Eclipse to Hadoop to run MapReduce programs
Exercise 1 Connecting Eclipse to Hadoop to run MapReduce programs
(1) Statistics, data modeling, and data mining methods
This part is taught through demonstrations. It walks through the complete process of data summary, statistics, modeling, analysis, and mining, so that students can intuitively grasp the common methods and workflow of big data analysis and its applications.
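To make the process described above concrete, here is a minimal sketch in Python that goes from summary to statistics to modeling on a tiny data set. The data (hours studied versus exam score) is made up for illustration and the variable names are hypothetical; the course itself does not prescribe this tool or this example. The confidence interval uses the normal quantile 1.96 for simplicity, which is only a rough approximation for a sample this small.

```python
import math
import statistics

# Hypothetical sample: hours studied (x) and exam score (y); values are made up.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]

# 1. Summary: mean and sample standard deviation of the scores.
mean_score = statistics.mean(scores)
sd_score = statistics.stdev(scores)

# 2. Statistics: approximate 95% confidence interval for the mean score.
#    The normal quantile 1.96 is used for simplicity; with only six
#    observations a t quantile would be more appropriate.
half_width = 1.96 * sd_score / math.sqrt(len(scores))
ci = (mean_score - half_width, mean_score + half_width)

# 3. Modeling: ordinary least squares fit of score = intercept + slope * hours.
mean_hours = statistics.mean(hours)
sxy = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores))
sxx = sum((x - mean_hours) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_score - slope * mean_hours


# 4. Analysis: use the fitted line to predict the score for a new value.
def predict(x):
    return intercept + slope * x


print("mean:", mean_score, "sd:", round(sd_score, 2))
print("approx. 95% CI for the mean:", tuple(round(v, 2) for v in ci))
print("fitted line: score =", round(intercept, 2), "+", round(slope, 2), "* hours")
```

The same four steps (summarize, estimate, fit, predict) are what the demonstrations in this part walk through, only with larger data and richer models.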
(2) Data processing methods with Excel
Excel ranks fifth in popularity among data analysis tools (according to the consulting firm KDnuggets) and requires no prior background from students. Excel is one of the components of the Microsoft Office suite and is a powerful spreadsheet program. It can present neat, attractive tables to the user, and it can also be used to analyze and forecast data and to carry out many complex data operations, helping users make more informed decisions. It also has powerful visualization features: the data in a table can be shown in a variety of charts, which makes the presentation more expressive and appealing. Some of Excel's advanced data analysis features can only be fully used once VBA has been mastered. Excel is therefore the most basic software tool for data mining and data analysis.
(3) Basic use of the MySQL database and basic programming methods
MySQL ranks third in popularity among data analysis tools (according to the consulting firm KDnuggets). Thanks to its small size, high speed, and low cost of ownership, and especially because it is open source, it is very widely used in the development of small and medium-sized websites. The experiments require no prior background from students; through them, the basic use of the MySQL database and the basics of SQL programming can be mastered.
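The kinds of statements covered in the MySQL exercises (data manipulation, queries, views, and indexes) can be sketched as follows. Note the assumptions: to keep the example runnable without a database server, it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the employee table and its rows are hypothetical. The SQL shown is deliberately plain, but dialect details do differ between SQLite and MySQL, so treat it as an illustration rather than as exact MySQL syntax.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition and manipulation (table and rows are hypothetical).
cur.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
cur.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("Ann", "sales", 4000.0),
        ("Bob", "sales", 5000.0),
        ("Cid", "research", 6000.0),
    ],
)

# An index to speed up lookups and grouping by department.
cur.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# A view that stores an aggregate query under a name.
cur.execute(
    "CREATE VIEW dept_summary AS "
    "SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary "
    "FROM employee GROUP BY dept"
)

# Query the view as if it were a table.
rows = cur.execute(
    "SELECT dept, headcount, avg_salary FROM dept_summary ORDER BY dept"
).fetchall()
conn.close()

for dept, headcount, avg_salary in rows:
    print(dept, headcount, avg_salary)
```

With a real MySQL server the connection would go through a MySQL client library instead of sqlite3, and the parameter placeholder style may differ by library.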
(4) Hadoop architecture and environment setup
Hadoop is the cornerstone of the entire big data ecosystem and ranks seventh in popularity among data analysis tools, so its architecture and environment setup must be learned and mastered. This group of experiments covers Hadoop installation, the principles and operation of its distributed file system HDFS, the principles and implementation of MapReduce, and iterative MapReduce program development. Through these experiments, students can master the common methods and workflow of big data analysis with Hadoop.
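The MapReduce principle behind the word frequency exercise can be illustrated without a cluster. The sketch below simulates the three phases (map, shuffle, reduce) in a single Python process. It does not use any Hadoop API, and the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical names chosen for this illustration. In a real Hadoop job the framework performs the shuffle and distributes the map and reduce tasks across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


lines = ["big data with Hadoop", "Hadoop stores big files"]
counts = word_count(lines)
print(counts)
```

The other exercises follow the same shape. For the minimum temperature of each year, for instance, the map step would emit (year, temperature) pairs and the reduce step would take the minimum of the values instead of their sum.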