Knowledge discovery in data (KDD) and machine learning concepts

The steps of the knowledge discovery process

(1) Data cleaning: remove noise and delete inconsistent data.
(2) Data integration: combine multiple data sources.
(3) Data selection: retrieve from the database the data relevant to the analysis task.
(4) Data transformation: transform and consolidate the data into a form suitable for mining, using summary or aggregation operations.
(5) Data mining: the essential step, in which intelligent methods are applied to extract data patterns.
(6) Pattern evaluation: identify the truly interesting patterns that represent knowledge, based on some interestingness measure.
(7) Knowledge presentation: use visualization and knowledge representation techniques to present the mined knowledge to the user.
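
As a rough illustration of steps (1) to (5), here is a minimal Python sketch. The records, field names, and the helper functions clean, integrate, select, transform and mine are all hypothetical and chosen only for this example; a real pipeline would be far more involved.

```python
from collections import defaultdict

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [
    {"id": 1, "item": "milk", "price": 1.2},
    {"id": 2, "item": "bread", "price": None},  # incomplete record
    {"id": 3, "item": "milk", "price": 1.3},
]
store_b = [
    {"id": 4, "item": "beer", "price": 2.5},
    {"id": 5, "item": "milk", "price": 1.1},
]

def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2: combine several data sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: aggregate into the average price per item."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["item"]] += r["price"]
        counts[r["item"]] += 1
    return {item: totals[item] / counts[item] for item in totals}

def mine(avg_price, threshold):
    """Step 5: a trivial 'pattern': items whose average price is below a threshold."""
    return sorted(item for item, p in avg_price.items() if p < threshold)

records = integrate(clean(store_a), clean(store_b))
records = select(records, ["item", "price"])
avg_price = transform(records)
cheap_items = mine(avg_price, threshold=2.0)
print(cheap_items)
```

Steps (6) and (7) are left out here: they would judge whether the result is actually interesting and then present it to the user.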

Characteristics of an interesting pattern (mined knowledge)

(1) It is easy to understand.
(2) It is valid on new or test data with some degree of certainty.
(3) It is potentially useful.
(4) It is novel.

Objective measures of pattern interestingness

(1) For an association rule X => Y, one objective measure is the rule's support. Support is the percentage of transactions in the database that satisfy the rule. It can be taken as the probability P(X ∪ Y), where X ∪ Y means that a transaction contains both X and Y: support(X => Y) = P(X ∪ Y).
(2) Another objective measure for association rules is confidence, which assesses how certain the discovered rule is. It can be taken as the conditional probability P(Y | X), i.e. the probability that a transaction containing X also contains Y: confidence(X => Y) = P(Y | X).
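
The two measures can be computed directly from a list of transactions. The sketch below assumes that each transaction is a Python set of items; the sample transactions and the function names support and confidence are made up for illustration.

```python
# Hypothetical transaction database: each transaction is a set of items.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """P(Y | X): among transactions containing X, the fraction that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

s = support({"milk", "bread"}, transactions)       # 2 of 4 transactions
c = confidence({"milk"}, {"bread"}, transactions)  # 2 of the 3 milk transactions
print(f"support={s:.2f} confidence={c:.2f}")
```

Here the rule milk => bread has support 0.5 and confidence of about 0.67: half of all transactions contain both items, and two of the three transactions with milk also contain bread.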

The concept of machine learning

Machine learning studies how computers can learn (or improve their performance) based on data. One of its main research areas is getting computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data.
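
To make "learning from data" concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is only one simple technique picked for illustration; the training data, labels, and the function names fit_centroids and predict are hypothetical.

```python
def fit_centroids(samples):
    """Learn one mean feature vector (centroid) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical labeled training data.
training = [
    ([1.0, 1.0], "small"), ([1.5, 2.0], "small"),
    ([8.0, 9.0], "large"), ([9.0, 8.5], "large"),
]
model = fit_centroids(training)
prediction = predict(model, [2.0, 1.5])
print(prediction)
```

The program is never told a rule for "small" or "large"; it derives the decision from the examples, which is the sense of learning described above.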

Web search engines are essentially large-scale data mining applications

(1) Crawling: deciding which pages to crawl and how often to crawl them.
(2) Indexing: selecting which pages to index and deciding how extensive the index should be.
(3) Searching: deciding how to rank pages and which advertisements to show.
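
The indexing and searching steps can be sketched with a tiny inverted index. This is a toy under stated assumptions: the pages dictionary stands in for already crawled pages, ranking is by raw term count only, and build_index and search are hypothetical helper names. Real search engines use far richer signals.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
pages = {
    "a.html": "data mining finds patterns in data",
    "b.html": "machine learning learns from data",
    "c.html": "search engines rank pages",
}

def build_index(pages):
    """Inverted index: word -> {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Rank pages by the total count of query words, ties broken by URL."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(pages)
results = search(index, "data mining")
print(results)
```

For the query "data mining", a.html ranks ahead of b.html because it contains the query words more often.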


Origin blog.csdn.net/qq_39621784/article/details/104043409