Machine learning reading notes (2)

Data Mining and Machine Learning

  • Data Mining: the process of identifying patterns in data and extracting useful information from those patterns. Data mining can uncover relationships that actually exist in the data.
    Machine Learning: a branch of artificial intelligence. It develops algorithms and techniques that let a machine learn on its own, either from the analysis of data or from experience.

  • SpamAssassin is a spam detection tool that learns from experience and from the analysis of sample data (that is, an existing email database).
    1. Performs different types of tests
    2. Applies a variety of statistical algorithms to email headers and content
    3. Optimized, modular development

SpamAssassin features

  • Common features
    1. Based on machine learning, so its performance improves with experience
    2. Applies a wide range of local and network tests to email
    3. Most of the code is stored in plain text files
    4. Can be downloaded from the Internet and is readily accessible
    5. Includes an API, so applications can simply use its built-in classes and functions
    6. Easy to configure

  • How it works: a weight is calculated for each inbound email
    weight > threshold → spam
    weight ≤ threshold → valid email
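
The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpamAssassin's actual code: the rule names, the weights, and the 5.0 threshold are all assumptions made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Rule = Tuple[str, Callable[[Message], bool], float]

# Hypothetical rules: names and weights are invented for this sketch.
RULES: List[Rule] = [
    ("SUBJECT_ALL_CAPS", lambda m: m["subject"].isupper(), 1.5),
    ("BODY_FREE_MONEY", lambda m: "free money" in m["body"].lower(), 2.5),
    ("BODY_CLICK_HERE", lambda m: "click here" in m["body"].lower(), 1.5),
]

THRESHOLD = 5.0  # assumed cutoff


def score(message: Message) -> float:
    """Sum the weights of every rule that matches the message."""
    return sum(weight for _, test, weight in RULES if test(message))


def classify(message: Message, threshold: float = THRESHOLD) -> str:
    """weight > threshold -> spam; weight <= threshold -> valid."""
    return "spam" if score(message) > threshold else "valid"


suspicious = {"subject": "WIN NOW", "body": "Free money! Click here."}
ordinary = {"subject": "Meeting notes", "body": "See the attached agenda."}
```

Here `classify(suspicious)` matches all three rules and lands above the threshold, while `classify(ordinary)` matches none. A message whose weight equals the threshold exactly is still treated as valid, as in the rule above.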

  • The more sample data is tested, the higher the recognition accuracy. When a large number of samples are tested, new rules are generated automatically.
    The main types of test:
    1. Header tests
    2. Body phrase tests
    3. Bayesian filtering
    4. Automatic address whitelist/blacklist
    5. Manual address whitelist/blacklist
    6. Collaborative spam identification databases
    7. DNS blocklists
    8. Character sets and locales
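
To make item 3 concrete, here is a minimal sketch of a Bayesian filter: a generic naive Bayes classifier with add-one smoothing. It is not SpamAssassin's own implementation, and the training samples are hypothetical. It assumes at least one sample of each class is available.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical training data, invented for this sketch.
SAMPLES = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(samples: Iterable[Tuple[str, str]]):
    """Count words per class and the number of messages per class."""
    counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
    docs: Counter = Counter()
    for text, label in samples:
        docs[label] += 1
        counts[label].update(text.lower().split())
    return counts, docs


def spam_probability(text: str, counts, docs) -> float:
    """Estimate P(spam | text) under the naive independence assumption."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    total_docs = sum(docs.values())
    log_score = {}
    for label in ("spam", "ham"):
        total_words = sum(counts[label].values())
        logp = math.log(docs[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
            logp += math.log((counts[label][word] + 1) / (total_words + len(vocab)))
        log_score[label] = logp
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_score.values())
    weights = {label: math.exp(value - top) for label, value in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


counts, docs = train(SAMPLES)
```

With these samples, `spam_probability("free money", counts, docs)` comes out above 0.5 and `spam_probability("monday meeting", counts, docs)` below it. Adding more training samples refines the word counts, which matches the note that accuracy grows with the amount of sample data.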

  • In the energy industry, machine learning is commonly used to forecast load demand, and it can also predict next-day prices. Load demand forecasts can be integrated into the energy distribution system as an additional constraint to ensure a smooth supply of energy. Adding a penalty term (i.e., adding or subtracting a certain value to balance the deviation) discourages the energy consumption of any particular node from deviating, which improves the accuracy of the system.
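
The notes do not give a formula for the penalty term. One possible reading, sketched below, is a cost that grows with each node's deviation from its forecast load. The function name, the absolute-deviation form, and the `penalty_weight` parameter are all assumptions made for illustration.

```python
from typing import Sequence


def penalized_cost(
    base_cost: float,
    consumption: Sequence[float],
    forecast: Sequence[float],
    penalty_weight: float = 1.0,
) -> float:
    """Base cost plus a penalty proportional to total per-node deviation.

    Hypothetical form: cost = base_cost + penalty_weight * sum(|c_i - f_i|).
    """
    if len(consumption) != len(forecast):
        raise ValueError("consumption and forecast must cover the same nodes")
    deviation = sum(abs(c - f) for c, f in zip(consumption, forecast))
    return base_cost + penalty_weight * deviation
```

When every node consumes exactly its forecast load the penalty vanishes and the cost equals `base_cost`. Any node that drifts from its forecast raises the cost, so a system minimizing this cost is pushed toward the forecast.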

  • Knowledge check 2
    Which of the following SpamAssassin features makes it easy to use with other applications?
    a. It is based on machine learning techniques
    b. It applies a wide range of local and network tests to identify spam
    c. It includes an API
    d. It is easy to configure

Origin blog.csdn.net/weixin_45058912/article/details/104250341