Several common approaches to system integration (docking)

The most common approach is integration through system interfaces. When things go well, the systems connect smoothly, but this approach often takes a lot of time spent coordinating with the various software vendors.

As a result, data silos are everywhere in the industry today. Integrating with business software, or collecting data from it, is difficult, and crawling data out of C/S (client/server) software is harder still.

Are there other ways besides system interfaces? For your reference, the editor has summarized the common data collection techniques, which fall into the following categories:

First, C/S software data acquisition technology.

C/S software belongs to an older architecture, and relatively few products can collect data from this kind of software.

A common choice is a software-robot product that, without any cooperation from the software vendor, collects the data shown on the interface in a "WYSIWYG" way. The output is a structured database or an Excel table. If only the business data is needed, or if the vendor has gone out of business or the database is hard to analyze, this kind of tool can still collect the data; its collection of detail-page data is a particular strength.

It is worth mentioning that the barrier to using this product is very low: business staff with no IT background can use it, which greatly widens its user base.

Second, network data acquisition API.

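Data is obtained from websites through web crawlers and the public APIs that some website platforms provide (such as the Twitter and Sina Weibo APIs). In this way, unstructured and semi-structured web data can be extracted from the pages.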
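As a rough illustration of turning semi-structured API data into structured rows, here is a minimal sketch. The response body, the field names (`statuses`, `user`, `tags`) and the helper names are made up for the example and do not correspond to any real platform's API; a real API defines its own schema and requires authentication.

```python
import csv
import io
import json

# Hypothetical response body; real APIs define their own field names.
SAMPLE_RESPONSE = """
{"statuses": [
  {"id": 1, "text": "hello", "user": {"name": "alice"}, "tags": ["a", "b"]},
  {"id": 2, "text": "world", "user": {"name": "bob"}}
]}
"""

COLUMNS = ["id", "text", "user_name", "tags"]


def flatten_statuses(body: str) -> list[dict]:
    """Turn the nested JSON payload into flat rows with a fixed set of columns."""
    payload = json.loads(body)
    rows = []
    for item in payload.get("statuses", []):
        rows.append({
            "id": item["id"],
            "text": item.get("text", ""),
            # Nested and optional fields are flattened into plain columns.
            "user_name": item.get("user", {}).get("name", ""),
            "tags": ";".join(item.get("tags", [])),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the flat rows as CSV text, ready for Excel or a database load."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(flatten_statuses(SAMPLE_RESPONSE)))
```

Missing optional fields become empty strings, so every row has the same columns regardless of what the source record contained.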

The whole process of collecting and processing large-scale Internet web data comprises four main modules: the web crawler (Spider), data processing (Data Process), the queue of URLs to crawl (URL Queue), and the data itself (Data).
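How the four modules fit together can be sketched in a few lines of Python. This is only an illustrative skeleton, not a production crawler: the `crawl` function, the `LinkAndTitleParser` class and the in-memory `FAKE_SITE` are names invented for this example, and the spider is passed in as a `fetch` function so the sketch runs without network access. A real crawler would also need politeness delays, robots.txt handling and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkAndTitleParser(HTMLParser):
    """Data Process: pull the <title> text and all <a href> links out of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 10) -> dict:
    queue = deque([seed])  # URL Queue: pages waiting to be crawled
    seen = {seed}          # URLs already queued, to avoid crawling them twice
    data = {}              # Data: url -> extracted page title
    while queue and len(data) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # Spider: download the page
        parser = LinkAndTitleParser()
        parser.feed(html)
        data[url] = parser.title.strip()
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            link = urldefrag(urljoin(url, href)).url
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return data


# Hypothetical in-memory "website" standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/">home</a>',
    "http://example.test/b": '<title>Page B</title><a href="/a#top">A</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.test/", lambda u: FAKE_SITE.get(u, "")))
```

To crawl a real site, the `fetch` argument could be replaced by a function built on `urllib.request.urlopen`; keeping it as a parameter separates downloading from queueing and parsing.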

Third, database mode.

Each of the two systems has its own database. Integration is more convenient when both databases are of the same type:
1) If the two databases are on the same server, they can access each other directly as long as the user account is set up correctly. Where necessary, qualify the table name in the FROM clause with the database name and the schema (owner) of the table.

2) If the two systems' databases are not on the same server, it is recommended to use a linked server, or to use OPENROWSET or OPENDATASOURCE. This requires configuring the database server to allow this kind of external access.
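The same-server case in 1) can be illustrated with a small runnable sketch. Note the substitution: the text describes SQL Server-style features, whereas this example uses SQLite's `ATTACH DATABASE` from Python's standard library as a stand-in, purely because it runs anywhere. The table names, file names and the `demo_cross_database_query` function are invented for the example. The idea it shows is the same: once the second database is reachable, its tables are referenced by a qualified name after FROM (here `crm.customers`; in SQL Server the analogous form is `database.schema.table`).

```python
import os
import sqlite3
import tempfile


def demo_cross_database_query() -> list[tuple]:
    """Join tables that live in two separate database files on the same machine."""
    with tempfile.TemporaryDirectory() as tmp:
        crm_path = os.path.join(tmp, "crm.db")  # system A's database (hypothetical)
        erp_path = os.path.join(tmp, "erp.db")  # system B's database (hypothetical)

        # System A owns the customers table.
        crm = sqlite3.connect(crm_path)
        crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        crm.executemany("INSERT INTO customers VALUES (?, ?)",
                        [(1, "Alice"), (2, "Bob")])
        crm.commit()
        crm.close()

        # System B owns the orders table.
        erp = sqlite3.connect(erp_path)
        erp.execute("CREATE TABLE orders "
                    "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
        erp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                        [(1, 1, 10.0), (2, 1, 5.5), (3, 2, 20.0)])
        erp.commit()

        # Make system A's database visible from system B's connection,
        # then qualify each table with its database name after FROM/JOIN.
        erp.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        rows = erp.execute(
            "SELECT c.name, SUM(o.amount) "
            "FROM main.orders AS o "
            "JOIN crm.customers AS c ON c.id = o.customer_id "
            "GROUP BY c.name ORDER BY c.name"
        ).fetchall()
        erp.close()
        return rows


if __name__ == "__main__":
    print(demo_cross_database_query())
```

For the cross-server case in 2), the qualified name simply grows by one more level to identify the remote server, which is what a linked server provides.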

Connecting databases of different types is more troublesome and needs a lot of configuration before it works, so it is not described in detail here.

The open-database approach requires coordinating with each software vendor to open up its database, which is very difficult. And if one platform has to connect to the databases of many software vendors at once and fetch data in real time, that is a huge challenge for the performance of the platform itself.

Technology changes quickly, and further discussion is welcome.

Origin blog.51cto.com/14441888/2423032