How a Magnet Link Search Engine Works

Crawl the hash values (the basis of magnet links) of torrent files through the DHT protocol

   DHT (Distributed Hash Table) is a distributed storage method that plays a role similar to a Tracker: given a torrent's signature (its hash), the network returns information about that torrent. There is no central server. Each client is responsible for a small range of routing and stores a small part of the data, and together the clients provide addressing and storage for the entire DHT network. Newer versions of BitComet can connect to both the DHT network and the Tracker, so downloads work well even without any connection to a Tracker server, because the client can find other users downloading the same file on the DHT network. By joining the DHT network, a crawler can easily collect millions of hash values every day, and each hash value is generated from one BT torrent file.
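As a rough illustration of what joining the DHT involves, the sketch below bencodes a find_node query in the KRPC message format described in BEP 5 (the BitTorrent DHT specification). The helper names bencode and find_node_query are hypothetical, and no networking is shown. A real crawler would send such messages over UDP and record the info hashes carried by the get_peers and announce_peer queries it receives from other nodes.

```python
def bencode(value):
    """Encode ints, bytes, str, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:" % len(value) + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        ]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def find_node_query(transaction_id, node_id, target_id):
    """Build a KRPC find_node query (hypothetical helper, layout per BEP 5)."""
    if len(node_id) != 20 or len(target_id) != 20:
        raise ValueError("node IDs are 20 bytes long")
    message = {
        "t": transaction_id,   # transaction ID echoed back in the reply
        "y": "q",              # message type: query
        "q": "find_node",      # query name
        "a": {"id": node_id, "target": target_id},
    }
    return bencode(message)
```

Each reply to find_node lists more nodes, so repeating the query against newly discovered nodes is one way a crawler can make itself known to a large part of the network.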

 

Download torrent files

  A hash alone is enough to form a magnet link, but by itself it is of little use, because it does not tell you the name of the file the link refers to. When download tools such as Thunder and Tornado fetch a torrent through a magnet link, they search a torrent library on their own servers: the hash value locates the BT torrent file on the server, and the tool then downloads it. In other words, downloading a torrent file through a magnet link with a given download tool is not guaranteed to succeed every time. Torrent files can also be downloaded through the DHT, but this is extremely slow and rarely acceptable. Several torrent repository websites, such as so123.pw and torque.com, make this step much more convenient.

 

Parse BT torrent files

  Extract the torrent's file name, file size, creation date, and other summary information, and calculate the hash value from the torrent file (with this hash you have the magnet link). This part of the work is relatively easy: it only requires a detailed understanding of the torrent file format, which is well documented online.
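The parsing step can be sketched as follows. This is a minimal example, assuming a standard single-file or multi-file torrent (seed) file as described in BEP 3. The function names bdecode, info_hash, and summarize are hypothetical. The hash is the SHA-1 of the raw bencoded "info" dictionary, so the sketch hashes the original bytes rather than re-encoding the decoded data.

```python
import hashlib


def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    marker = data[i:i + 1]
    if marker == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if marker == b"l":                       # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if marker == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)              # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def info_hash(torrent_bytes):
    """Return the hex SHA-1 of the raw bencoded "info" dictionary."""
    if torrent_bytes[:1] != b"d":
        raise ValueError("a torrent file must be a bencoded dictionary")
    i = 1
    while torrent_bytes[i:i + 1] != b"e":
        key, i = bdecode(torrent_bytes, i)
        start = i
        _, i = bdecode(torrent_bytes, i)
        if key == b"info":
            return hashlib.sha1(torrent_bytes[start:i]).hexdigest()
    raise ValueError("no info dictionary found")


def summarize(torrent_bytes):
    """Collect the summary fields a search engine would index."""
    meta, _ = bdecode(torrent_bytes)
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:                     # multi-file torrent
        files = [
            (b"/".join(f[b"path"]).decode("utf-8", "replace"), f[b"length"])
            for f in info[b"files"]
        ]
    else:                                    # single-file torrent
        files = [(name, info[b"length"])]
    digest = info_hash(torrent_bytes)
    return {
        "name": name,
        "size": sum(size for _, size in files),
        "created": meta.get(b"creation date"),  # Unix timestamp, may be absent
        "files": files,
        "info_hash": digest,
        "magnet": "magnet:?xt=urn:btih:" + digest,
    }
```

Calling summarize on the bytes read from a .torrent file returns a dictionary whose fields match the summary information listed above.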

 

How magnet links work

A magnet link is composed of a set of parameters. The order of the parameters does not matter, and the format is the same as the query string at the end of an HTTP URL. The most common parameter is a URN formed from the content hash of a specific file, for example:
magnet:?xt=urn:btih:4D9FA761D69964B00DF0B3B0C9C1F968EA6C47D0&xt=urn:ed2k:7655dbacff9395e579c4c9cb49cbec0e&dns 3a80%2fannounce&tr=udp%3a%2f%2ftracker.publicbt.com%3a80%2fannounce&ws=http%3a%2f%2fdistribution.bbb3d.renderfarming.net%2fvideo%2fmp4%2fbbb_sunflower_2160p_30fps_stereo_abl.
Although this link identifies a specific file, the client application must still search to determine where the file can be obtained.

The magnet link parameters are defined as follows:

magnet: The protocol name.

xt: Short for exact topic, a uniform resource name (URN) that contains the hash value of the file. BTIH (BitTorrent Info Hash) is the name of the hash method; ED2K, AICH, SHA1, and MD5 can also be used here. This value identifies the file and is required.

dn: Short for display name, the file name shown to the user. This item is optional.

tr: Short for tracker, the address of a tracker server. This item is also optional.

ws: Short for web seed, the address of a web seed.

urn: Uniform Resource Name (URN), which identifies the resource by name.

btih: BitTorrent Info Hash, the hash method used for torrent files.

Experimental parameters defined by an application must start with "x.".

The standard also suggests that multiple parameters of the same kind can be given by appending ".1", ".2", and so on to the parameter name, for example:
magnet:?xt.1=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&xt.2=urn:sha1:TXGCZQTH26NL6OUQAJJPFALHG2LTGBGB
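
The parameter rules above can be checked with a short parser. This is a minimal sketch built on Python's standard urllib.parse module. The function name parse_magnet and the keys of the returned dictionary are hypothetical, and the sketch handles only the parameters listed above.

```python
from urllib.parse import parse_qs


def parse_magnet(uri):
    """Split a magnet link into the parameters described above."""
    prefix = "magnet:?"
    if not uri.startswith(prefix):
        raise ValueError("not a magnet link")
    # The part after "magnet:?" has the same format as an HTTP query string.
    params = parse_qs(uri[len(prefix):])

    # Collect xt as well as numbered variants such as xt.1, xt.2.
    topics = []
    for key, values in params.items():
        if key == "xt" or key.startswith("xt."):
            topics.extend(values)

    # Pick out the BitTorrent info hash, if one of the topics carries it.
    btih = None
    for topic in topics:
        if topic.lower().startswith("urn:btih:"):
            btih = topic[len("urn:btih:"):].lower()
            break

    return {
        "info_hash": btih,
        "exact_topics": topics,
        "display_name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
        "web_seeds": params.get("ws", []),
    }
```

Because parse_qs decodes percent-encoding, a tracker written as udp%3a%2f%2ftracker.publicbt.com%3a80%2fannounce comes back as udp://tracker.publicbt.com:80/announce.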

 

Save the BT torrent summary information to the database

  The most basic database fields include the file name, file list, file size, creation time, index time, and hash value. Because the number of records is very large, performance must be considered when designing the database.
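
One possible layout for these fields is sketched below using SQLite from Python's standard library. The table names, column names, and the save_torrent helper are all assumptions made for illustration; the original site's schema and database engine are not described. Using the hash value as the primary key lets the database reject duplicates cheaply, which matters when the crawler sees the same hash many times.

```python
import sqlite3
import time

# Hypothetical schema: one row per torrent, one row per file inside it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS torrents (
    info_hash  TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    total_size INTEGER NOT NULL,
    created_at INTEGER,
    indexed_at INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS torrent_files (
    info_hash TEXT NOT NULL REFERENCES torrents(info_hash),
    path      TEXT NOT NULL,
    size      INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_torrent_files_hash
    ON torrent_files(info_hash);
"""


def open_db(path=":memory:"):
    """Open the database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_torrent(conn, info_hash, name, files, created_at=None):
    """Insert a torrent and its file list; return False if already indexed."""
    total_size = sum(size for _, size in files)
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO torrents VALUES (?, ?, ?, ?, ?)",
            (info_hash, name, total_size, created_at, int(time.time())),
        )
        if cursor.rowcount == 0:
            return False  # the hash was already present
        conn.executemany(
            "INSERT INTO torrent_files VALUES (?, ?, ?)",
            [(info_hash, path, size) for path, size in files],
        )
    return True
```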

 

Build a search index for the database

  Any open source search engine (such as Lucene or Sphinx) can be used for this. Using one is not complicated, but it requires a basic understanding of how search engines work.
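
The core mechanism behind such engines is the inverted index, which maps each word to the set of records that contain it. The toy version below is only meant to illustrate that idea; the class name TinyIndex and the tokenize helper are hypothetical, and a real engine adds ranking, word segmentation for languages such as Chinese, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """A toy in-memory inverted index: token -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one record, for example a torrent name keyed by its hash."""
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return the IDs of records that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        matches = [self.postings.get(token, set()) for token in tokens]
        return set.intersection(*matches)
```

Here the document ID would be the torrent's hash value, so a search result can be joined back to the database record.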

 

Build a website

  The front-end pages are written in PHP. At present, the website has indexed more than 20 million resources, including almost all movies (the latest movies are indexed as soon as they appear), as well as a large amount of music, software, and other resources. The screenshots of the website are as follows:

 

Related Reading:

Back-end technology talk 2: How search engines work 

https://blog.csdn.net/a724888/article/details/80993346

What is the technical principle of BT seeds? How to understand .torrent files?

https://blog.csdn.net/Jailman/article/details/86016870

Use Python crawler to build your own magnetic search engine 

https://blog.csdn.net/verygoodo/article/details/101025542

Magnet link 

https://baike.baidu.com/item/%E7%A3%81%E5%8A%9B%E9%93%BE%E6%8E%A5/5867775?fr=aladdin

Origin blog.csdn.net/wellse/article/details/106569986