Foreword
This data mining primer introduces the fundamentals of data mining, its basic tools, and practical methods. It explains each algorithm step by step so that you can get started with data mining easily.
Combining theory with practice, the book shows how to use decision tree and random forest algorithms to predict the results of National Basketball Association (NBA) games, how to recommend films with affinity analysis, how to mine social media with the naive Bayes algorithm, and more. It also covers neural networks, deep learning, and big data processing.
The book is intended for programmers who are willing to learn and try data mining.
It aims to unlock the full data analysis power of Python, help you master a core technology of the big data era, and make it easy to get started and apply data mining techniques to real projects.
The book is divided into 12 chapters. Because there is too much material to cover in detail, only some of the key points are outlined here; each chapter in the book treats its topic in more depth.
Chapter 1, Getting Started with Data Mining, introduces the technologies we will use, followed by a warm-up that implements two methods to explain the basics of the algorithms.
Chapter 2, Classifying with scikit-learn Estimators, covers classification, an important topic in data mining. It also introduces the pipeline, a structure that standardizes the data mining workflow and makes your experiments easier to manage.
Chapter 3, Predicting the Winning Team with Decision Trees, introduces two new algorithms, decision trees and random forests. We predict the winning team by extracting highly discriminative features.
Chapter 4, Recommending Films with Affinity Analysis, looks at the problem of recommending products based on past purchase records and introduces the Apriori algorithm.
Chapter 5, Extracting Features with Transformers, describes different types of feature extraction and how to handle different kinds of datasets.
Chapter 6, Mining Social Media with Naive Bayes, uses the naive Bayes algorithm to automatically analyze text from the social networking site Twitter.
Chapter 7, Finding Interesting People with Graph Mining, uses clustering and network analysis to find people of interest on social media.
Chapter 8, Cracking CAPTCHAs with Neural Networks, extracts information from images and then trains a neural network to recognize the words and letters in them.
Chapter 9, Authorship Attribution, extracts text features and applies the support vector machine algorithm to determine who wrote a document.
Chapter 10, Clustering News Articles, uses the k-means clustering algorithm to group news articles by their content.
Chapter 11, Classifying Objects in Images with Deep Learning, uses deep neural networks to determine which object an image shows.
Chapter 12, Working with Big Data, explores processes and methods for mining large datasets.
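To give a feel for the affinity analysis mentioned under Chapter 4, here is a minimal sketch in plain Python. It is not the book's code and not the Apriori algorithm itself; it only computes the two basic measures such methods rely on, support and confidence, for a single rule. The viewing records and the function name `rule_stats` are made up for illustration, and support is taken here as the fraction of records that match the rule.

```python
def rule_stats(transactions, premise, conclusion):
    """Return (support, confidence) for the rule: premise -> conclusion."""
    # Records that contain the premise item.
    premise_count = sum(1 for t in transactions if premise in t)
    # Records that contain both the premise and the conclusion.
    both_count = sum(
        1 for t in transactions if premise in t and conclusion in t
    )
    # Support: share of all records in which the rule holds.
    support = both_count / len(transactions)
    # Confidence: share of premise records in which the rule holds.
    confidence = both_count / premise_count if premise_count else 0.0
    return support, confidence


# Hypothetical viewing records: each set holds the films one person liked.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]

support, confidence = rule_stats(transactions, "A", "B")
print(f"A -> B: support={support:.2f}, confidence={confidence:.2f}")
# prints: A -> B: support=0.60, confidence=0.75
```

A rule with high confidence ("people who liked film A usually liked film B") is a candidate recommendation; algorithms such as Apriori exist to find such rules efficiently when there are many items.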
In the big data era, as the scale of data expands rapidly, data mining, the core technology for filtering out the data that matters, is playing an increasingly important role. It gives you "superpowers" for solving practical problems: predicting sports results, targeting advertising accurately, attributing authorship based on writing style, and so on.
The book uses Python, a language that is simple to learn and use and that has rich third-party libraries and a healthy community. It takes a progressive approach, works with real data, and introduces the Python implementation of data mining through hands-on practice. By working through it carefully, readers will enter the world of data mining, gain a thorough understanding of its fundamentals, and master best practices for solving real problems.
Understand decision trees, naive Bayes, SVMs, and deep learning
Use common data mining models and algorithms to solve practical problems
Use APIs to obtain datasets from sites such as Reddit
Identify and extract features from datasets
Use datasets to design and develop data mining applications
Process big data based on real-time data