Processing Ten Billion Logs a Day on Average: Building Weibo's Real-Time Computing Platform on Flink

Author: Weibo Advertising Data Platform

With the rapid expansion of Weibo's business lines, the volume of logs from its various advertising businesses has grown dramatically. Traditional offline storage and computing solutions built on the Hadoop ecosystem are well established in the industry, but they are limited by the latency of offline computing, and more and more data scenarios are moving from offline to real time. The Weibo advertising real-time data platform was designed and built against this background. The system currently handles more than 10 billion log lines per day, ingested from a number of product lines and business log types.

1. Technology Selection

Compared with Flink, Spark currently has the more mature ecosystem overall, and it leads in machine learning integration and applications. But Flink, a strong contender for the next generation of big data engines, has clear advantages in stream computing. Flink is a record-at-a-time stream processor in the true sense: every record triggers computation, rather than streaming being approximated with mini batches as a compromise, as in Spark. Flink's fault tolerance is also more lightweight, with less impact on throughput, and has a map and
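
The difference between record-at-a-time and mini-batch triggering described above can be illustrated with a small simulation. This is only a sketch in plain Python: it does not use the Flink or Spark APIs, and the function names `per_record` and `mini_batch`, the sample events, and the 1000 ms batch interval are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in ms, log type); hypothetical event shape


def per_record(events: Iterable[Event]) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Record-at-a-time: every incoming event immediately triggers a result."""
    counts: Dict[str, int] = {}
    for ts, key in events:
        counts[key] = counts.get(key, 0) + 1
        # The updated counts are available at the event's own timestamp.
        yield ts, dict(counts)


def mini_batch(events: Iterable[Event], interval: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Mini-batch: events are buffered and results appear only at batch boundaries."""
    counts: Dict[str, int] = {}
    batch_end = None
    for ts, key in events:
        if batch_end is None:
            # The first batch closes at the next multiple of the interval.
            batch_end = (ts // interval + 1) * interval
        while ts >= batch_end:
            # A batch boundary has passed: emit the counts accumulated so far.
            yield batch_end, dict(counts)
            batch_end += interval
        counts[key] = counts.get(key, 0) + 1
    if batch_end is not None:
        # Flush the final, still-open batch.
        yield batch_end, dict(counts)


if __name__ == "__main__":
    sample = [(0, "click"), (100, "view"), (900, "click"), (1200, "click")]
    for ts, result in per_record(sample):
        print("per-record result at", ts, result)
    for ts, result in mini_batch(sample, 1000):
        print("mini-batch result at", ts, result)
```

Both functions end with the same counts, but the per-record version produces a result for each event as it arrives, while the mini-batch version delays every result until its batch interval closes.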



Origin yq.aliyun.com/articles/723877