Thoughts on reading "The Beauty of Mathematics, Part 12: The Law of Cosines and News Classification" + code conventions

    Google News is classified automatically, but a computer only understands algorithms; it cannot read the news the way we humans can. Sorting news items one by one by hand would be an unnecessary waste of manpower and resources. So we design an algorithm that lets computers automatically classify the huge volume of daily news.

    The approach involves two main pieces: the TF-IDF algorithm and the law of cosines.

    I found a write-up of the TF-IDF algorithm that goes into more detail, linked below:

https://blog.csdn.net/asialee_bird/article/details/81486700

    Put simply, we can use this algorithm to map a news article to a vector. The vector form makes it easy for the computer, that "unfeeling" thing, to calculate quickly.
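    As a rough illustration of that mapping, here is a minimal sketch in plain Python. It assumes each article is already split into words and is non-empty, and it uses one common TF-IDF variant (term frequency times log(N / document frequency)); the function name tfidf_vectors and the sample headlines are made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a TF-IDF vector over a shared vocabulary.

    Assumes every document is a non-empty list of words.
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # TF = share of the document taken up by the word,
        # IDF = log(N / df); a word found in every document gets weight 0.
        vectors.append([
            (counts[w] / total) * math.log(n_docs / df[w])
            for w in vocab
        ])
    return vocab, vectors

# Hypothetical sample headlines, already tokenized by whitespace.
docs = [
    "stocks rise as markets rally".split(),
    "markets fall as stocks slide".split(),
    "team wins the final match".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

    Every article ends up as a list of numbers of the same length (one entry per vocabulary word), which is what makes the later angle calculation possible.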

    Suppose TF-IDF maps two news articles to the vectors b and c. By the law of cosines, we can compute the cosine of the angle between the two vectors. When the cosine is close to 1, the two articles are similar and are put in the same class; the smaller the cosine, the less related the two articles are, and they are not put in the same class.
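    The cosine of the angle between b and c is their dot product divided by the product of their lengths. A minimal sketch of that calculation follows; the function name cosine_similarity and the choice to return 0.0 for an all-zero vector are my own assumptions.

```python
import math

def cosine_similarity(b, c):
    """Cosine of the angle between two equal-length vectors b and c."""
    dot = sum(x * y for x, y in zip(b, c))
    norm_b = math.sqrt(sum(x * x for x in b))
    norm_c = math.sqrt(sum(y * y for y in c))
    # Assumption: treat an all-zero vector as unrelated to everything.
    if norm_b == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_b * norm_c)
```

    Vectors pointing in the same direction give a value of 1 (up to rounding) and perpendicular vectors give 0. Since TF-IDF weights are never negative, the result for two articles always falls between 0 and 1.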

 

    Andrew Ng's "Introduction to Machine Learning" also covers classification and clustering. In fact, in some ways clustering is related to the law of cosines too.

    Classification needs no definition here; clustering means you are given a set of data and asked to find the structure in it.

    When clustering, we start by treating every data point as its own class, use the law of cosines to compute how related each "class" is to every other "class", merge closely related "subclasses" into a larger "class", and repeat this cycle until the clustering is complete.
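    The merge-and-repeat loop can be sketched as follows. This is only one possible reading of the procedure: representing each class by the average of its vectors and stopping once the best similarity drops below a threshold of 0.8 are assumptions of mine, and the names cosine, centroid and cluster are made up for the example.

```python
import math

def cosine(b, c):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(b, c))
    nb = math.sqrt(sum(x * x for x in b))
    nc = math.sqrt(sum(y * y for y in c))
    return dot / (nb * nc) if nb and nc else 0.0

def centroid(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster(vectors, threshold=0.8):
    """Bottom-up clustering; returns lists of indices into `vectors`."""
    # Start with every data point as its own class.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        # Find the two classes whose centroids are most similar.
        for i in range(len(clusters)):
            ci = centroid([vectors[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([vectors[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        # Assumed stopping rule: no remaining pair is similar enough.
        if best_sim < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]  # merge class j into class i
        del clusters[j]
    return clusters

# Hypothetical 2-D points: two point roughly along each axis.
points = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
groups = cluster(points)
```

    With these sample points the two vectors near the first axis end up in one class and the two near the second axis in another. The search recomputes centroids on every pass, which is fine for a handful of points but slow for a real news corpus.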

    In fact, classification is everywhere in life. When we see a person approaching from a distance, we are already classifying: is the person coming toward us male or female? And the rules you classify by are nothing more than: long hair or short hair? Skirt or trousers? At that moment we are classifying according to our own rules without even realizing it. Computers are the same; they just wait for a programmer to train them and teach them the so-called rules, that's all.

    An algorithm I want to try this semester: implementing a clustering algorithm myself.

    The code conventions I will follow from now on:

https://www.cnblogs.com/yunliu0603/p/10042463.html

    


Origin www.cnblogs.com/lycsuper/p/11443658.html