Exploring Google's Big Data Technology Architecture

Original address: https://blog.csdn.net/bingdata123/article/details/79927507

Google pioneered the big data era. Its big data technology architecture has long been studied by Internet companies, and it serves as a benchmark and reference for big data technology architecture across the industry.


1. Google's data centers
  Google has built some of the world's fastest, most powerful, and highest-quality data centers. Its eight major data centers are located far from its headquarters in Mountain View, California. Six are in the United States: Berkeley County, South Carolina; Council Bluffs, Iowa; Douglas County, Georgia; Mayes County, Oklahoma; Lenoir, North Carolina; and The Dalles, Oregon. The other two are outside the United States: Hamina, Finland, and St. Ghislain, Belgium. In addition, Google has established data centers in Hong Kong and Taiwan, as well as in Singapore and Chile.
2. Google's new-generation search engine platform and the core technology of big data analysis
Google created GFS, MapReduce, and BigTable, but on its new-generation search engine platform it is gradually replacing these original systems with ones that offer greater computing power. The core technology systems are as follows:
  First, the MapReduce batch indexing system has been replaced by an incremental indexing system based on Percolator. This indexing system, called Caffeine, makes content searchable faster than the MapReduce batch indexing system did.
  The second is Colossus, a distributed storage system designed specifically for BigTable. Also known as GFS2 (the second-generation Google File System), it was built to support the Caffeine search indexing system.
  The third is the column-oriented storage database BigTable. To better support interactive analysis of large data sets, Google has also launched Dremel and PowerDrill. Dremel is designed to manage a very large number of large datasets (both the number of datasets and the size of each dataset are large), while PowerDrill is designed to analyze a small number of large datasets (each dataset is large, but because there are few of them it can provide more powerful analysis performance).
  The fourth is the real-time search engine storage and analysis architecture that serves Google Instant.
  The fifth is Pregel, Google's system for faster network and graph computation.
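To make the Pregel idea above more concrete, here is a minimal single-machine sketch of vertex-centric computation in supersteps. It propagates the largest value through a graph: in each superstep, a vertex that receives a larger value adopts it and forwards it, and otherwise stays silent. The function name `pregel_max` and the toy graph are assumptions made for illustration; this is not Google's API and leaves out distribution and fault tolerance entirely.

```python
from collections import defaultdict


def pregel_max(graph, values):
    """Propagate the maximum value through a graph in Pregel-style supersteps.

    graph:  dict mapping each vertex to a list of its out-neighbors
    values: dict mapping each vertex to its initial value
    Returns a dict with the final value of every vertex.
    """
    values = dict(values)  # work on a copy

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])

    # Later supersteps: only vertices that received messages do any work.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbor in graph.get(vertex, []):
                    outbox[neighbor].append(best)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # messages are delivered at the next superstep

    return values


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(pregel_max(toy_graph, {"a": 3, "b": 6, "c": 2, "d": 1}))
```

On the toy graph, every vertex ends up holding 6, the largest initial value. The computation stops when a superstep produces no messages, which is the usual termination condition in this model.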
  Google's new-generation search engine platform handles 4 billion hours of video per month, 425 million Gmail users, and a 150,000,000 GB web index, and it can return results in 0.25 seconds.
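The difference between batch indexing and incremental indexing described earlier in this section can be sketched on a toy inverted index. This is only an illustration under simplifying assumptions: the function names `build_index` and `update_index` are hypothetical, documents are plain strings, and none of Percolator's actual mechanisms are modeled.

```python
from collections import defaultdict


def build_index(docs):
    """Batch approach: rebuild the whole inverted index from every document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def update_index(index, docs, doc_id, new_text):
    """Incremental approach: touch only the postings of the changed document."""
    old_terms = set(docs.get(doc_id, "").lower().split())
    new_terms = set(new_text.lower().split())

    # Remove the document from terms it no longer contains.
    for term in old_terms - new_terms:
        index[term].discard(doc_id)
        if not index[term]:
            del index[term]

    # Add the document to terms that are new for it.
    for term in new_terms - old_terms:
        index[term].add(doc_id)

    docs[doc_id] = new_text


if __name__ == "__main__":
    docs = {1: "big data", 2: "big table"}
    index = build_index(docs)
    update_index(index, docs, 2, "graph data")
    print(sorted(index.items()))
```

The batch function does work proportional to the whole corpus, while the incremental function does work proportional to the one changed document, which is why an incremental design can make a change searchable sooner.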
3. Google Basic Cloud Services
  Based on Colossus, Google provides users with cloud services for computing, storage, and applications. Computing services include Compute Engine and App Engine; storage services include Cloud Storage, Cloud SQL, Cloud Datastore, persistent disks, and others; cloud application services include BigQuery, Cloud Endpoints, caching, queues, and so on.
4. Google's big data intelligent application services
Google's intelligent applications for big data analysis include customer sentiment analysis, transaction risk (fraud) analysis, product recommendation, message routing, diagnosis, customer churn prediction, legal document classification, email content filtering, political preference prediction, species identification, and more. Big data is said to bring Google $23 million in revenue every day. Some typical applications are as follows:
  (1) Based on MapReduce, Google's traditional applications include data storage, log analysis, search quality, and other data analysis applications.
  (2) Based on the Dremel system, Google launched BigQuery, a powerful data analysis software and service that is also part of the Internet retrieval service Google uses itself. Google has started selling online data analysis services in an attempt to compete with enterprise cloud computing services such as Amazon Web Services. BigQuery can help enterprise users complete terabyte-scale scans in seconds.
  (3) Based on statistical algorithms applied to search data, Google has launched services such as input error correction for search engines and statistical machine translation.
  (4) Google's trend map application. By measuring how much attention users give to search terms, it quickly shows what is currently popular in society. For advertisers, its commercial value lies in quickly learning what users care about and where to place advertisements. Accordingly, Google has also developed big data products such as "Brand Lift in AdWords" and "Active GRP" to help advertisers analyze and evaluate the efficiency of their advertising campaigns.
  (5) Google Instant. As the user types keywords, Google Instant predicts possible search results.
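The predict-as-you-type behavior of item (5) can be sketched with a sorted list of past queries and a binary search on the typed prefix. This is a hypothetical illustration only: the class name `PrefixSuggester`, the ranking by query count, and the sample data are all assumptions, and nothing here reflects how Google Instant is actually built.

```python
import bisect


class PrefixSuggester:
    """Suggest past queries that start with what the user has typed so far."""

    def __init__(self, query_counts):
        # query_counts: dict of past query -> how often it was issued
        # (made-up data for illustration).
        self._counts = dict(query_counts)
        self._queries = sorted(self._counts)

    def suggest(self, prefix, limit=3):
        # Binary search finds the first query that is >= the prefix.
        start = bisect.bisect_left(self._queries, prefix)
        matches = []
        for query in self._queries[start:]:
            if not query.startswith(prefix):
                break  # sorted order: no later query can match either
            matches.append(query)
        # Rank by popularity (most frequent first), then alphabetically.
        matches.sort(key=lambda q: (-self._counts[q], q))
        return matches[:limit]


if __name__ == "__main__":
    suggester = PrefixSuggester(
        {"big data": 50, "bigquery": 40, "bigtable": 30, "pregel": 10}
    )
    for typed in ("b", "big", "bigq"):
        print(typed, "->", suggester.suggest(typed))
```

Each additional keystroke narrows the prefix, so the suggestions can be refreshed on every character without scanning the full query list.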
  Google's big data platform architecture is still evolving, with the goal of handling larger data sets and delivering faster, more accurate analysis and computation. This will continue to lead the direction of big data technology development.

Bingdata helps aggregate massive data collected from multiple platforms, and provides enterprises with intelligent data analysis, operation optimization, delivery decision-making, precision marketing, competitive product analysis and other integrated marketing services through the analysis and prediction capabilities of big data technology.

Beijing Youwangzhubang Information Technology Co., Ltd. (referred to as Youwangzhubang) is a company that applies big data and intelligent analysis to integrated marketing. It is affiliated with Hengtong Group, and Bingdata is its brand. The Youwangzhubang team comes mainly from well-known technology companies such as Alibaba, Tencent, Baidu, Kingsoft, Sohu, China Mobile, China Telecom, China Unicom, Huawei, and Ericsson, which provides strong technical support for its analysis work.

 
