[Big Data] Building a personalized recommendation engine system

What is RecEng:

The Recommendation Engine (RecEng for short) is a recommendation service framework built on the Alibaba Cloud computing environment. It predicts in real time which items a user is likely to prefer, supports custom recommendation algorithms, and supports comparing results through A/B tests. The Alibaba Cloud recommendation engine is aimed at data-driven businesses: it uses artificial intelligence to enable one-to-one marketing and tailor-made services for your customers, helping enterprises innovate quickly. It is intended to reduce operating costs while improving customer satisfaction and loyalty and supporting business goals.

Course details: build a personalized recommendation engine system

This certification course systematically explains the concepts, applications, and algorithm principles of recommendation systems, introduces Alibaba's recommendation engine product RecEng in detail, and finishes with a small project in which students build a recommendation system hands-on.

The whole process is divided into four parts: data upload, data preprocessing, recommendation system setup, and testing and going live. Participants can follow this experiment as a reference and, combined with their own business requirements, put what they learn into practice.

Through this case, students can understand the concepts, applications, and algorithm principles of recommendation systems and learn to use Alibaba's recommendation engine product RecEng. After the hands-on practice, students should be able to use RecEng on their own to quickly build an enterprise recommendation system.

Basic concepts of the recommendation engine:

  • Customer / tenant (org / tenant)

Refers to the user of RecEng, represented in the system by an Alibaba Cloud account. A customer is usually an organization, so RecEng commonly uses org to denote the customer.

  • User (user)

Refers to the customer's own users, that is, the users of a RecEng user. Recommendation is a 2C (to-consumer) service, so a customer using the recommendation service must have users of its own. RecEng calls these users of its users simply "users", and the system commonly uses user to denote them.

  • Item (item)

Refers to the content recommended to users. It can be a product or something else, such as a song or a video. The system commonly uses item to denote an item.

  • Business (biz)

A business defines a data set, that is, the range of data the algorithms can use. A customer can have multiple businesses on RecEng, and different businesses must have different data sets. RecEng asks each business to provide four types of data (not all are required): user data, item data, user behavior data, and recommendation effect data. Each such group of data constitutes one business. The system commonly uses biz to denote a business.

For example, suppose customer A recommends two types of items, songs and videos. Customer A can then create two businesses on RecEng, M and N, where the item data of M is videos and the item data of N is songs; the other data (user data, user behavior data, and so on) can be identical. In this setup the data of M and N are independent. Business M can see users' behavior on songs, but because songs are not in M's item data, M discards that behavior; and if a user has behavior only on songs and none on videos, business M discards that user. The same holds in reverse for business N.
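
The M and N example can be sketched in a few lines of Python. This is only an illustration of the scoping rule described above, not RecEng code: the `Behavior` record, the `scope_to_business` helper, and all ids are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    user_id: str
    item_id: str
    action: str  # hypothetical action name, e.g. "play"


def scope_to_business(behaviors, business_item_ids):
    """Keep only behavior on items that belong to the business,
    then keep only the users who still have some behavior left."""
    kept = [b for b in behaviors if b.item_id in business_item_ids]
    active_users = {b.user_id for b in kept}
    return kept, active_users


# One shared behavior log for customer A (made-up records).
log = [
    Behavior("u1", "video_1", "play"),
    Behavior("u1", "song_1", "play"),
    Behavior("u2", "song_2", "play"),
]

video_items = {"video_1", "video_2"}  # item data of business M
song_items = {"song_1", "song_2"}     # item data of business N

m_behaviors, m_users = scope_to_business(log, video_items)
n_behaviors, n_users = scope_to_business(log, song_items)
```

Here business M keeps only u1's video behavior and drops u2 entirely, because u2 only interacted with songs, while business N keeps both users.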

Ideally a business recommends only one type of item. Recommending multiple types of items will be supported in later industry templates; this requires introducing the concept of a plate (plate), so that the data of one business can generate multiple plate data sets, with each scene bound to the recommendation algorithm of one plate.

  • Scene (scn)

Refers to the context of a recommendation. Each scene outputs one API, and a scene is determined by the parameters available at recommendation time. The two most common scenes are the home page recommendation scene and the detail page recommendation scene. As the names suggest, when recommending on the home page only user information is available as a parameter; when recommending on a detail page, the available parameters include, in addition to user information, information about the item currently shown on the detail page. The system commonly uses scn to denote a scene.

A business can contain multiple scenes; for a business A, containing several home page scenes is perfectly acceptable.

In fact, going back to the original definition, a scene is determined only by the recommendation context, so customers can create new scenes as their needs dictate. For example, in a search keyword recommendation scene the available parameters include, in addition to user information, the keywords entered by the user.
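
One way to picture "a scene is determined by its available parameters" is a small table that maps each scene to the parameters it needs beyond the user. The sketch below assumes three scenes named `home`, `detail`, and `search`; these names, the `SceneRequest` type, and the `validate` helper are hypothetical and do not come from the RecEng API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SceneRequest:
    scn: str                       # scene name
    user_id: str                   # available in every scene
    item_id: Optional[str] = None  # item shown on a detail page
    keyword: Optional[str] = None  # keyword typed into search


# Hypothetical scene table: parameters required in addition to user_id.
SCENE_PARAMS = {
    "home": (),
    "detail": ("item_id",),
    "search": ("keyword",),
}


def validate(request):
    """Check that the request carries every parameter its scene needs."""
    required = SCENE_PARAMS[request.scn]
    missing = [name for name in required if getattr(request, name) is None]
    if missing:
        raise ValueError(
            f"scene {request.scn!r} is missing parameters: {missing}"
        )
    return True
```

Under this model, adding a new scene amounts to adding one row to the table with the extra context it requires.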

  • Process (flow)

Refers to the end-to-end data processing flow of an algorithm. Some processes belong to the business level, such as the data import process, the effect calculation process, and the data quality calculation process; others belong to a scene, such as the scene algorithm process. By data source type and processing capacity, processes are divided into offline, near-line, and online processes.

1. Offline process

In general, both the input and the output of an offline process are MaxCompute (formerly ODPS) tables, so the offline data specification is in effect a set of MaxCompute table format specifications covering three kinds of data: access data, intermediate data, and output data. Access data is the user, item, and log data that the customer provides offline; intermediate data is the various intermediate result tables produced by the offline algorithm process; output data is the table of recommendation results, which are eventually imported into online storage for use by the online computing module.

2. Near-line process

The near-line process mainly handles updating the offline recommendation results when user behavior changes or recommended items are updated. Unlike offline algorithms, which naturally take MaxCompute (formerly ODPS) tables as input and output, a near-line program's input can come from several sources, such as online Table Store (formerly OTS) tables, the user's API request, or variables in the program; its output can go to program variables, be written back to online storage, or be returned to the user. For security reasons, the recommendation engine provides a set of SDKs for customer-defined online code to read and write online storage (Table Store); direct access is not allowed, and the alias and format of each type of online storage must be defined. Online data that is needed frequently, whether it comes from online storage or from the user's API request, is read by RecEng in advance and kept in variables of the online program, which customer-defined code can read and write directly.

3. Online process

The online process handles the moment when the recommendation API receives a request: in real time, it takes the recommendation results produced offline and corrected by the near-line process and applies filtering, deduplication, backfill, and similar processing. The near-line process, by contrast, mainly handles updating the offline recommendation results when user behavior changes or recommended items are updated.
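
The filter, deduplicate, and backfill steps of the online process can be sketched as a single pass over a ranked list. This is a minimal illustration under assumptions: `finalize`, `blocked`, and `fallback` are hypothetical names, and the backfill source (for example a popularity list) is not specified by the original text.

```python
from itertools import chain


def finalize(candidates, blocked, fallback, size):
    """Return up to `size` item ids.

    candidates: ranked item ids from the offline / near-line results
    blocked:    item ids that must not be shown (filtering)
    fallback:   ranked item ids used to fill up a short list (backfill)
    """
    result = []
    seen = set(blocked)  # blocked items are treated as already seen
    for item in chain(candidates, fallback):
        if len(result) == size:
            break
        if item in seen:  # filtered out, or a duplicate
            continue
        seen.add(item)
        result.append(item)
    return result


recs = finalize(
    candidates=["a", "b", "a", "c"],
    blocked={"b"},
    fallback=["c", "d", "e"],
    size=4,
)
```

In this run `b` is filtered, the second `a` is dropped as a duplicate, and `d` and `e` are taken from the fallback list to reach four items.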

A scene contains only one offline process and one near-line process, but it may contain multiple online processes in order to support A/B tests.
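
The original text does not say how requests are split between the online processes of a scene. A common approach, shown here purely as an assumption, is to hash the user id into a bucket so that each user consistently sees the same variant. The function name `pick_flow`, the flow names, and the weights are all hypothetical.

```python
import hashlib


def pick_flow(user_id, flows):
    """Deterministically map a user to one online process.

    flows: list of (flow_name, weight) pairs with positive integer weights.
    """
    total = sum(weight for _, weight in flows)
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for name, weight in flows:
        if bucket < weight:
            return name
        bucket -= weight
    raise AssertionError("unreachable when all weights are positive")


# Hypothetical split: 90% of users on the baseline, 10% on the experiment.
flows = [("online_flow_a", 90), ("online_flow_b", 10)]
chosen = pick_flow("user_42", flows)
```

Because the assignment depends only on the user id and the weights, repeated requests from the same user land on the same online process, which keeps the A/B comparison clean.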

  • Algorithm strategy (Algorithm Strategy)

An algorithm strategy defines a set of offline / near-line processes and exposes the relevant algorithm parameters to help customers build their own algorithm processes. A scene can be configured with multiple algorithm strategies, which run in parallel and output a series of recommendation candidate sets and filter sets; the online process completes personalized recommendation by referencing these candidate sets.
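
How the online process combines the outputs of several strategies is not detailed here, so the following is only a sketch under stated assumptions: each strategy is assumed to emit scored candidates plus a filter set, the filter sets are unioned, and an item's score is the highest one any strategy gave it. `StrategyOutput`, `merge_outputs`, and the strategy names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyOutput:
    name: str
    candidates: dict                             # item_id -> score (assumed)
    filtered: set = field(default_factory=set)   # items to exclude


def merge_outputs(outputs):
    """Union the filter sets, drop filtered items, keep each item's best
    score, and return item ids ordered by score (ties broken by id)."""
    filtered = set().union(*(out.filtered for out in outputs))
    best = {}
    for out in outputs:
        for item, score in out.candidates.items():
            if item in filtered:
                continue
            if item not in best or score > best[item]:
                best[item] = score
    return sorted(best, key=lambda item: (-best[item], item))


item_cf = StrategyOutput("item_cf", {"i1": 0.9, "i2": 0.4}, {"i3"})
hot = StrategyOutput("hot", {"i2": 0.7, "i3": 0.8, "i4": 0.1})
ranked = merge_outputs([item_cf, hot])
```

In this example `i3` is removed because one strategy put it in its filter set, even though another strategy scored it highly.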

  • Job / task (task)

A job is a running instance of an offline process, and each job corresponds exactly to its offline process. Jobs are not reentrant: for each offline process, only one instance may run at any given time. Jobs have direct upstream and downstream relationships: if an upstream job fails, its downstream jobs are canceled.
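
The two rules above (one running instance per offline process, and cancellation of downstream jobs when an upstream job fails) can be sketched with the Python standard library. Everything here is a hypothetical illustration rather than RecEng's scheduler: `OfflineProcess`, `run_jobs`, and the job names are made up, and the sketch assumes cancellation propagates transitively.

```python
import threading
from graphlib import TopologicalSorter  # Python 3.9+


class OfflineProcess:
    """Allows at most one running instance (job) at a time."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def start_job(self, work):
        if not self._lock.acquire(blocking=False):
            raise RuntimeError(f"{self.name} already has a running job")
        try:
            return work()
        finally:
            self._lock.release()


def run_jobs(upstream, actions):
    """upstream: job -> set of jobs it depends on.
    actions:  job -> zero-argument callable; raising means failure."""
    status = {}
    for job in TopologicalSorter(upstream).static_order():
        deps = upstream.get(job, ())
        if any(status[dep] != "succeeded" for dep in deps):
            status[job] = "canceled"  # an upstream job failed or was canceled
            continue
        try:
            actions[job]()
            status[job] = "succeeded"
        except Exception:
            status[job] = "failed"
    return status


def fail():
    raise RuntimeError("simulated failure")


upstream = {"import": set(), "train": {"import"},
            "export": {"train"}, "report": {"import"}}
actions = {"import": lambda: None, "train": fail,
           "export": lambda: None, "report": lambda: None}
status = run_jobs(upstream, actions)
```

Here `train` fails, so `export` is canceled, while `report` still runs because it depends only on `import`.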

More high-quality technical courses:

Alibaba Cloud University official website (Alibaba Cloud University: official website, a workshop for innovative talent in the cloud ecosystem)


Reposted from: https://juejin.im/post/5cef8df8f265da1b897ab440

Origin blog.csdn.net/weixin_34115824/article/details/91433238