The Core Mechanism of Google's New-Generation Real-Time Search System

Programming Languages 2018-05-14 07:10:11

Reposted from: 人云亦云

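Google recently published a paper on the core mechanism of its new-generation real-time search system, "Large-scale Incremental Processing Using Distributed Transactions and Notifications". The paper introduces Percolator, a system built on top of BigTable. Functionally, it closely resembles the triggers of a traditional database, but its design for scalability is distinctive. The abstract, download link, and related articles follow.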

Abstract

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.
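The trigger-like behaviour described above can be illustrated with a small sketch. This is not Percolator's API: `ToyTable`, `observe`, `write`, and `make_indexer` are hypothetical names invented for this example, everything lives in one process and in memory, and the distributed transactions named in the paper's title are left out entirely. It only shows the general idea that a write to a column notifies an observer, which then updates an inverted index for the one changed document instead of rebuilding the index in a batch.

```python
from typing import Callable, Dict, List, Set

# Hypothetical names throughout: ToyTable, observe, write and make_indexer
# are illustrative only, not the real Percolator or Bigtable API.
Observer = Callable[["ToyTable", str], None]


class ToyTable:
    """An in-memory table whose column writes notify registered observers."""

    def __init__(self) -> None:
        self.cells: Dict[str, Dict[str, str]] = {}
        self.observers: Dict[str, List[Observer]] = {}

    def observe(self, column: str, fn: Observer) -> None:
        # Register fn to run after every write to the given column.
        self.observers.setdefault(column, []).append(fn)

    def write(self, row: str, column: str, value: str) -> None:
        # Store the value, then notify observers of this column (trigger-like).
        self.cells.setdefault(row, {})[column] = value
        for fn in self.observers.get(column, []):
            fn(self, row)


def make_indexer(index: Dict[str, Set[str]]) -> Observer:
    """Build an observer that re-indexes only the row that changed."""
    indexed_words: Dict[str, Set[str]] = {}  # words last indexed per document

    def reindex(table: ToyTable, row: str) -> None:
        new_words = set(table.cells[row]["content"].lower().split())
        old_words = indexed_words.get(row, set())
        # Remove postings for words the document no longer contains.
        for word in old_words - new_words:
            index[word].discard(row)
            if not index[word]:
                del index[word]
        # Add postings for words that are new in this version.
        for word in new_words - old_words:
            index.setdefault(word, set()).add(row)
        indexed_words[row] = new_words

    return reindex


index: Dict[str, Set[str]] = {}
table = ToyTable()
table.observe("content", make_indexer(index))

table.write("doc1", "content", "incremental processing")
table.write("doc2", "content", "batch processing")
# A small, independent mutation: only doc1's postings are touched.
table.write("doc1", "content", "incremental indexing")
```

After the third write, only the postings for doc1 have changed, while the entry for doc2 is untouched. A batch system would instead recompute the whole index from all documents, which is the cost the abstract says incremental processing avoids.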

Download link (liuxinglanyue's note: blocked by the Great Firewall)

Related articles

Google’s Colossus Makes Search Real-Time By Dumping MapReduce


Reposted from liuxinglanyue.iteye.com/blog/847532
