Personalized search engine system architecture design

Preface

At its current stage of development, personalized search is meant to supplement traditional search rather than replace it. Its architecture is shown in Figure 2.2:

Figure 2.2 Diagram of personalized search architecture

Personalized search is quite similar to personalized recommendation. The architecture diagram shows how the various subsystems and modules coordinate and call one another. Organizationally, search today is usually run by an independent team, and in some companies it sits inside a search and recommendation department. It is arguably more reasonable to place it in the big data department, because that department's big data platform and artificial intelligence strengths can take search quality to a new level. The rest of this section walks through the architecture diagram from top to bottom.

1. Search data warehouse construction: data extraction

(1) The MySQL business databases related to search are extracted incrementally to the Hadoop platform every day; the first run requires a full initial load. Sqoop can serve as the data transfer tool, importing data into Hive on Hadoop in distributed batches.

(2) Flume handles distributed collection of search-related logs. It can collect real-time user search behavior, event-tracking data, and similar logs from the various web servers, and with a configured source and sink it can deliver the data directly to the Hadoop platform.
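The daily incremental extraction in step (1) could be driven by a small script like the one below. This is only a sketch under stated assumptions: the host, database, account, paths, and the `id` check column are hypothetical, and the data is written to an HDFS directory that a Hive external table could point at (Sqoop also offers `--hive-import` for loading Hive directly). The function only builds the `sqoop import` argument list; it does not run it.

```python
from typing import List, Optional


def build_sqoop_import(table: str, last_value: Optional[int] = None) -> List[str]:
    """Build a `sqoop import` command line for one MySQL table.

    With last_value=None the whole table is imported (first-time initialization).
    Otherwise only rows whose check column exceeds last_value are appended.
    """
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host:3306/search_db",  # hypothetical host and database
        "--username", "etl_user",                            # hypothetical account
        "--password-file", "/user/etl/mysql.password",       # hypothetical HDFS path
        "--table", table,
        "--target-dir", f"/warehouse/ods/{table}",           # hypothetical HDFS directory
        "--num-mappers", "4",                                # number of parallel map tasks
    ]
    if last_value is not None:
        cmd += [
            "--incremental", "append",
            "--check-column", "id",  # assumes an auto-increment key column named id
            "--last-value", str(last_value),
        ]
    return cmd


# First run: full load. Later runs: only rows added since the last imported id.
full_load = build_sqoop_import("search_log")
daily_load = build_sqoop_import("search_log", last_value=120000)
print(" ".join(daily_load))
```

In practice the last imported value would be stored somewhere between runs (a Sqoop saved job can track it), and the returned list could be passed to `subprocess.run` on a machine where Sqoop is installed.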

2. Hierarchical design and processing of big data platforms and search data marts

Search-related data marts are built on the big data platform. Their layered design is roughly the same as the one used for recommendation.

3. Offline algorithm part

(1) The search index is built in a distributed fashion on the Spark platform. Subsequent incremental index updates are generally applied asynchronously and in real time through a message queue.

(2) Spark loads the user-portrait and product-portrait feature data from Hadoop and trains the Rerank (second-pass reranking) model, a classification model that predicts the probability that each candidate product returned by the search will be clicked. Because the feature engineering includes user personalization features, the overall search ranking becomes personalized. To increase the degree of personalization, the candidate set returned by the search can be enlarged appropriately.

(3) Some of the offline computation results can be written to the online Redis cache, so that online web services can read the precomputed result data from Redis in real time when serving requests.
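To make the idea in step (2) concrete, here is a minimal sketch of a click-probability classifier whose features mix product attributes with a user-specific cross feature. It is plain Python logistic regression on a toy click log, standing in for the Spark training job; the field names (`fav_category`, `popularity`, `category`) and the data are hypothetical and are not taken from the original system.

```python
import math
from typing import Dict, List, Tuple


def features(user: Dict, product: Dict) -> List[float]:
    """Combine user-portrait and product-portrait fields into one feature vector."""
    return [
        1.0,                    # bias term
        product["popularity"],  # product-only feature
        # personalized cross feature: does the product match the user's preference?
        1.0 if product["category"] == user["fav_category"] else 0.0,
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], x: List[float]) -> float:
    """Predicted click probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(samples: List[Tuple[List[float], int]],
          epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, clicked in samples:
            err = predict(weights, x) - clicked
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Toy click log: each user clicks the product in their preferred category.
user_a = {"fav_category": "phone"}
user_b = {"fav_category": "book"}
phone = {"category": "phone", "popularity": 0.5}
book = {"category": "book", "popularity": 0.5}
click_log = [(user_a, phone, 1), (user_a, book, 0),
             (user_b, book, 1), (user_b, phone, 0)]

weights = train([(features(u, p), y) for u, p, y in click_log])

# The same two products get different scores for different users.
print(predict(weights, features(user_a, phone)), predict(weights, features(user_a, book)))
print(predict(weights, features(user_b, phone)), predict(weights, features(user_b, book)))
```

Because the cross feature depends on the user, the same candidate set is ordered differently for each user. That is the sense in which the ranking "becomes personalized".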

4. Online Web search interface service

(1) The online web search interface service first retrieves the keyword-related search results from the Solr/ES search cluster as the candidate set. It then uses the Rerank model, which the web project loads at initialization, to predict click-through rates in real time. The search results are reordered by these predictions and truncated to the specified top N for display. This process also reads some data from the Redis cache.

(2) App clients and websites can call the online web search interface service directly to display search results in real time. Because personalized search involves more processing than ordinary search, performance drops somewhat, but overall it stays within an acceptable range. In general, a separate search area can be opened for the personalized results, without replacing the existing traditional search.
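The request flow described above (retrieve candidates, score them, reorder, truncate) can be sketched as follows. Everything here is a stand-in chosen for illustration: `fake_retrieve` replaces the Solr/ES query, a plain dict replaces the Redis cache, and `toy_score` replaces the trained Rerank model's click-through prediction. All names and values are hypothetical.

```python
from typing import Callable, Dict, List


def personalized_search(
    keyword: str,
    user_id: str,
    retrieve: Callable[[str], List[Dict]],
    score: Callable[[Dict, Dict], float],
    cache: Dict[str, Dict],
    top_n: int = 10,
) -> List[Dict]:
    """Retrieve candidates, rerank them for one user, and keep the top N."""
    candidates = retrieve(keyword)           # keyword candidates from the search cluster
    user = cache.get(f"user:{user_id}", {})  # cached user-portrait features, if any
    ranked = sorted(candidates, key=lambda doc: score(user, doc), reverse=True)
    return ranked[:top_n]


def fake_retrieve(keyword: str) -> List[Dict]:
    """Stand-in for a Solr/ES query; returns the same fixed candidate set."""
    return [
        {"id": "p1", "category": "book", "relevance": 0.9},
        {"id": "p2", "category": "phone", "relevance": 0.8},
        {"id": "p3", "category": "phone", "relevance": 0.6},
    ]


def toy_score(user: Dict, doc: Dict) -> float:
    """Stand-in for the Rerank model: relevance plus a bonus for a preference match."""
    bonus = 1.0 if doc["category"] == user.get("fav_category") else 0.0
    return doc["relevance"] + bonus


cache = {"user:u1": {"fav_category": "phone"}}  # stand-in for Redis

known = personalized_search("case", "u1", fake_retrieve, toy_score, cache, top_n=2)
unknown = personalized_search("case", "u9", fake_retrieve, toy_score, cache, top_n=2)
print([doc["id"] for doc in known])    # ['p2', 'p3']
print([doc["id"] for doc in unknown])  # ['p1', 'p2']
```

A user with no cached portrait falls back to the plain relevance order, which mirrors the idea that personalized search supplements traditional search rather than replacing it.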

From an architectural perspective, a complete personalized search system involves many technical frameworks, and the personalization factors also depend on the user portrait system. The user portrait system is not limited to recommendation and search: it is a company-level system that is also used by general business systems and in operations and promotion decisions. Integrating with other departments' systems while serving multiple application scenarios requires a well-designed architecture. Next, let's look at the user portrait system architecture.

Summary

Beyond the personalized search engine system architecture design covered here, more content can be found in the book "Distributed Machine Learning in Practice" (Artificial Intelligence Science and Technology Series), edited by Chen Jinglei and published by Tsinghua University Press.

Origin blog.51cto.com/15012355/2554355