CSDN Personalized Recommendation System - Negative Feedback Test

Foreword

Hello everyone, I am Kongkong Star. In this article I share "CSDN Personalized Recommendation System - Negative Feedback Test".

1. uc not-interested tag filtering test

User: weixin_38093452

1. Obtaining the uc not-interested tags (uc_unlike_tag_list)

1.1 Personal center interface

1.2 What can be observed from the tags?

  • Tags are divided into primary tags and secondary tags
  • Tags are not all lowercase; some contain uppercase letters
  • The same tag can appear in both the interested and the not-interested lists

1.3 Points to confirm with R&D

  • Does the filtering side use the primary tag and the secondary tag separately, or only the secondary tag?
  • When the uc tags are matched against the tags of recommendation-stream articles, are both sides converted to the same case (lowercase or uppercase) before matching?
  • When the same tag appears in both the interested and the not-interested lists, does blocking take priority over boosting?
  • After the uc tags are updated, how soon does the change take effect in the recommendation stream?

1.4 Design and Development

import requests


def get_uc_tag_list(username, interest_type):
    uc_tag_list = []
    # Get the uc not-interested tags
    url = f'http://xxx.csdn.net/user_csdn_tag/get_tag_list?username={username}&type={interest_type}'
    response = requests.get(url)
    res = response.json()
    for i in res['data']:
        name = i['name']
        for tag in i['tags']:
            if tag['select']:
                # Add the lowercased primary tag
                uc_tag_list.append(name.lower())
                # Add the lowercased secondary tag
                uc_tag_list.append(tag['name'].lower())
    return uc_tag_list

1.5 Result returned by the interface

unlike_uc_tag_list = get_uc_tag_list('weixin_38093452', 1)
print(unlike_uc_tag_list)

['python', 'pillow', 'java', 'java', 'programming language', 'php', 'big data', 'odps', 'big data', 'big data', 'artificial intelligence', 'artificial intelligence', 'aigc', 'chatgpt']
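
The list above contains repeated entries such as 'java' and 'big data'. This appears to happen because the primary tag is appended once for every selected secondary tag, and some secondary tags share the name of their primary tag. The repeats do not affect the set intersection used later, but if a de-duplicated list is easier to read, a minimal sketch could look like the following. The helper name dedupe_keep_order is hypothetical and not part of the original script.

```python
def dedupe_keep_order(tags):
    """Return tags with duplicates removed, keeping first-seen order."""
    # dict keeps insertion order (Python 3.7+), so fromkeys works as an ordered set.
    return list(dict.fromkeys(tags))


# Shortened sample taken from the output shown above.
sample = ['python', 'pillow', 'java', 'java', 'big data', 'odps', 'big data']
deduped = dedupe_keep_order(sample)
print(deduped)
```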

2. Obtaining the tags of recommendation-stream articles (tag_list)

User: weixin_38093452
The recommendation interface was requested 50 times.

2.1 Part of the code

    items = result['data']['items']
    for data in items:
        # Filter out nps entries
        if 'username' in data:
            tags = data["tags"]
            temp_tag = []
            for tag in tags:
                # Used to check whether one item returns duplicate tags (ml and title are merged, only ml is kept)
                temp_tag.append(tag['name'])
                # After multiple requests, compared against the uc not-interested tags / negative feedback tags
                tag_list.append(tag['name'])
                # After multiple requests, used to check whether both ml tags and title tags are returned
                tag_type_list.append(tag['type'])
                if tag['name'] == '':
                    print(f"Empty tag found, blog: {data['product_id']}, tag type: {tag['type']}, tag name: {tag['name']}")
            if len(temp_tag) != len(set(temp_tag)):
                print(f"Blog {data['product_id']} has duplicate tags: {temp_tag}")
if len(set(tag_type_list)) == 2:
    print(f'Return article tag type: {set(tag_type_list)}')

2.2 Basic tag verification

  • Whether both ml tags and title tags are returned;
  • Whether any empty tag structure is returned;
  • Whether the same blog returns duplicate tags.

2.3 Basic tag verification results
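
The three checks can also be collected into one function that returns a report instead of printing, which is easier to assert on in an automated test. The sketch below is only an illustration: the function name check_item_tags and the sample items are hypothetical, and the item structure (product_id, username, tags with name and type) is assumed from the snippet in 2.1.

```python
def check_item_tags(items):
    """Run the three basic tag checks on a list of recommendation items."""
    tag_types = set()
    empty_tags = []   # (product_id, tag type) for every tag with an empty name
    duplicated = []   # product_id of every item that returns a tag name twice
    for item in items:
        if 'username' not in item:
            continue  # skip entries without an author, as the script in 2.1 does
        names = [tag['name'] for tag in item['tags']]
        for tag in item['tags']:
            tag_types.add(tag['type'])
            if tag['name'] == '':
                empty_tags.append((item['product_id'], tag['type']))
        if len(names) != len(set(names)):
            duplicated.append(item['product_id'])
    return {
        'both_types_returned': {'ml', 'title'} <= tag_types,
        'empty_tags': empty_tags,
        'duplicated': duplicated,
    }


# Hypothetical sample data, for illustration only.
sample_items = [
    {'product_id': '1', 'username': 'a',
     'tags': [{'name': 'python', 'type': 'ml'}, {'name': 'flask', 'type': 'title'}]},
    {'product_id': '2', 'username': 'b',
     'tags': [{'name': 'java', 'type': 'ml'}, {'name': 'java', 'type': 'title'}]},
    {'product_id': '3', 'username': 'c',
     'tags': [{'name': '', 'type': 'title'}]},
    {'product_id': 'nps', 'tags': []},
]
report = check_item_tags(sample_items)
print(report)
```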

Return article tag type: {'ml', 'title'}

3. Verifying uc not-interested tag filtering in the recommendation stream

3.1 Verification method

  1. Get the user's list of uc not-interested tags (uc_unlike_tag_list);
  2. Get the list of article tags (tag_list) returned by the user's 50 requests to the recommendation stream;
  3. Find the intersection of uc_unlike_tag_list and tag_list.
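
One detail is worth guarding against in step 3: get_uc_tag_list lowercases the uc tags, while section 1.2 notes that tags are not always lowercase. This article does not state whether the recommendation stream already returns lowercase tag names, so as a precaution the comparison can lowercase both sides. The sketch below assumes nothing beyond two lists of strings; the function name find_unfiltered_tags and the sample values are hypothetical.

```python
def find_unfiltered_tags(tag_list, unlike_tag_list):
    """Return the sorted tags present in both lists, compared case-insensitively."""
    returned = {tag.lower() for tag in tag_list}
    blocked = {tag.lower() for tag in unlike_tag_list}
    return sorted(returned & blocked)


# Hypothetical sample: 'Python' would be missed by a case-sensitive intersection.
tag_list = ['Python', 'mysql', 'Linux']
uc_unlike_tag_list = ['python', 'java']
badcases = find_unfiltered_tags(tag_list, uc_unlike_tag_list)
print(badcases)
```

An empty result means no bad case was found; any tag in the result is one that should have been filtered out.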

3.2 Part of the code

    print(f'[Tags returned by recommendation]: {set(tag_list)}')
    unlike_uc_tag_list = get_uc_tag_list(username, '1')
    print(f'[uc not-interested tags]: {set(unlike_uc_tag_list)}')
    intersection_tag = list(set(tag_list).intersection(set(unlike_uc_tag_list)))
    print(f'[uc not-interested tag filtering] verification result, tag intersection: {intersection_tag}')

3.3 Verification result

3.4 Result analysis

The tag intersection is empty, which means no bad case was found.
Next, add the high-frequency tags from [Tags returned by recommendation] to the uc not-interested tags, run the script again, and check the verification result together with the server-side uc not-interested filtering logs. After multiple runs the tag intersection is still empty, which shows that the uc not-interested tags take effect in recommendation-stream filtering.
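
Picking the high-frequency tags by hand is error-prone when 50 requests return many tags. A minimal sketch using collections.Counter from the standard library is shown below; the function name top_tags, the default of three tags, and the sample list are hypothetical choices for illustration.

```python
from collections import Counter


def top_tags(tag_list, n=3):
    """Return the n most frequent tags, counted case-insensitively."""
    counts = Counter(tag.lower() for tag in tag_list)
    return [tag for tag, _ in counts.most_common(n)]


# Hypothetical sample of tags collected over several requests.
sample_tags = ['mysql', 'linux', 'mysql', 'Linux', 'redis', 'mysql']
frequent = top_tags(sample_tags, 2)
print(frequent)
```

The returned tags are the candidates to add to the uc not-interested tags before running the script again.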

4. User scenario regression

Make sure the recommendation interface returns data and that the returned data is normal for:

  • Logged-in users (with no uc not-interested tags maintained)
  • Users who are not logged in

2. Negative feedback filtering test for the user recommendation stream

1. Content negative feedback

1.1 Submit API verification

Negative feedback item (directive):

  • duplicate: duplicate content recommendation
  • content poor quality: low content quality
  • advertising: exaggerated content, advertisements, etc.

Resource type (type):

  • blog
  • ask
  • blink
  • live

1.2 Get API verification

  • last_unlike_time: whether the timestamp of the negative feedback operation is recorded correctly;
  • num: whether the number of negative feedback submissions is recorded correctly;
  • directive: whether the negative feedback item is recorded correctly;
  • Whether the type and item_id fields together serve as the unique key.
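
Some of these checks can be automated once the records have been fetched. The sketch below is an assumption-heavy illustration: it supposes each record is a flat dict with the keys type, item_id, directive and num, which this article does not confirm, and the function name validate_records is hypothetical. The timestamp check is left out because the unit of last_unlike_time is not stated.

```python
from collections import Counter

# Directive values listed in 1.1.
ALLOWED_DIRECTIVES = {'duplicate', 'content poor quality', 'advertising'}


def validate_records(records):
    """Return a list of human-readable problems found in negative feedback records."""
    problems = []
    # (type, item_id) is expected to be unique across all records.
    key_counts = Counter((r['type'], r['item_id']) for r in records)
    for key, count in key_counts.items():
        if count > 1:
            problems.append(f'duplicate key {key}: {count} records')
    for r in records:
        label = f"{r['type']}:{r['item_id']}"
        if r['directive'] not in ALLOWED_DIRECTIVES:
            problems.append(f"unknown directive {r['directive']!r} for {label}")
        if not isinstance(r['num'], int) or r['num'] < 1:
            problems.append(f"invalid num {r['num']!r} for {label}")
    return problems


# Hypothetical records, for illustration only.
sample_records = [
    {'type': 'blog', 'item_id': '1', 'directive': 'duplicate', 'num': 2},
    {'type': 'blog', 'item_id': '1', 'directive': 'advertising', 'num': 1},
    {'type': 'ask', 'item_id': '1', 'directive': 'spam', 'num': 0},
]
problems = validate_records(sample_records)
for problem in problems:
    print(problem)
```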

1.3 Filter verification

1.3.1 Get the list of resources with content negative feedback (negative_item_list)

import pprint

import requests


def get_negative_item_list(username):
    negative_item_list = []
    url = f'http://xxx.csdn.net/api/v2/recommend/insight/negative/items/by/{username}'
    response = requests.get(url)
    res = response.json()
    pprint.pprint(res)
    # The result is grouped by negative feedback item (directive)
    for directive in ('duplicate', 'content poor quality', 'advertising'):
        for i in res['result'][directive]:
            if 'object_id' in i:
                negative_item_list.append(i['type'] + ':' + i['object_id'])
    return negative_item_list

1.3.2 Get the recommendation-stream resource list (item_list)

item_list.append(data['product_type']+':'+data['product_id'])

1.3.3 Find the intersection of item_list and negative_item_list

    print(f'[Items returned by recommendation]: {set(item_list)}')
    negative_item_list = get_negative_item_list(username)
    print(f'[Content negative feedback]: {set(negative_item_list)}')
    negative_intersection_item = list(set(item_list).intersection(set(negative_item_list)))
    print(f'[Content negative feedback filtering] verification result, item intersection: {negative_intersection_item}')

1.3.4 Intersection result

1.3.5 Result analysis

If the intersection is empty, no bad case was found.
Next, write part of the resource list returned by the recommendation stream into the content negative feedback through the Submit API, then compute the intersection again. If the intersection is still empty after multiple runs, content negative feedback takes effect in recommendation-stream filtering.

2. Tag negative feedback

2.1 Submit API verification

directive:

  • reduce: reduce
  • block: block

2.2 Get API verification

  • tag: whether the tag is uniformly converted to lowercase;
  • last_unlike_time: whether the timestamp of the negative feedback operation is recorded correctly;
  • num: whether the number of negative feedback submissions is recorded correctly.

2.3 Filter verification

2.3.1 Get the list of tags with negative feedback (negative_tag_list)

def get_negative_tag_list(username):
    negative_tag_list = []
    url = f'http://xxx.csdn.net/api/v2/recommend/insight/negative/tags/by/{username}'
    response = requests.get(url)
    res = response.json()
    for i in res['result']:
        negative_tag_list.append(i['tag'].lower())
    return negative_tag_list

2.3.2 Get the recommendation-stream tag list (tag_list)

tag_list.append(tag['name'])

2.3.3 Find the intersection of tag_list and negative_tag_list

    negative_tag_list = get_negative_tag_list(username)
    print(f'[Reduce recommendations of content similar to xx]: {set(negative_tag_list)}')
    negative_intersection_tag = list(set(tag_list).intersection(set(negative_tag_list)))
    print(f'[Reduce recommendations of content similar to xx, filtering] verification result, tag intersection: {negative_intersection_tag}')

2.3.4 Intersection result

2.3.5 Result analysis

If the intersection is empty, no bad case was found.
Next, write the tag list returned by the recommendation stream into the tag negative feedback through the Submit API, then compute the intersection again. If the intersection is still empty after multiple runs, tag negative feedback takes effect in recommendation-stream filtering.

3. Author negative feedback

3.1 Submit API verification

  • Uppercase and lowercase scenarios for unlike_user_id

3.2 Get API verification

  • Whether the author name is converted to lowercase;
  • last_unlike_time: whether the timestamp of the negative feedback operation is recorded correctly;
  • num: whether the number of negative feedback submissions is recorded correctly.

3.3 Filter verification

3.3.1 Get the list of authors with negative feedback (negative_user_list)

def get_negative_user_list(username):
    negative_user_list = []
    url = f'http://xxx.csdn.net/api/v2/recommend/insight/negative/users/by/{username}'
    response = requests.get(url)
    res = response.json()
    for i in res['result']:
        negative_user_list.append(i.lower())
    return negative_user_list

3.3.2 Get the list of recommendation-stream authors (user_list)

user_list.append(data['username'])

3.3.3 Find the intersection of user_list and negative_user_list

    print(f'[Authors returned by recommendation]: {set(user_list)}')
    negative_user_list = get_negative_user_list(username)
    print(f'[Do not read this author]: {set(negative_user_list)}')
    negative_intersection_user = list(set(user_list).intersection(set(negative_user_list)))
    print(f'[Do not read this author, filtering] verification result, author intersection: {negative_intersection_user}')

3.3.4 Intersection result

3.3.5 Result analysis

The intersection is empty, which means no bad case was found.
Next, the author list returned by the recommendation stream is written into the author negative feedback through the Submit API, and the intersection is computed again. It is still empty after multiple runs, which shows that author negative feedback takes effect in recommendation-stream filtering.

3. Negative feedback filtering test for non-logged-in users

1. Submit API verification

The PC side passes uuid and the app passes device_id.

2. Get API verification

  • The value of the imei field is not case-sensitive;
  • The negative feedback data can be retrieved correctly by the imei field value.
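
The case-insensitivity check can be expressed as a small helper that calls the Get API twice, once with the lowercase and once with the uppercase form of the same imei, and compares the results. Because the real endpoint and its parameters for non-logged-in users are not shown in this article, the sketch takes the fetch function as an argument and demonstrates it with a fake in-memory store. All names and values in it are hypothetical.

```python
def check_imei_case_insensitive(fetch, imei):
    """Return True if fetch() gives the same data for lower- and upper-case imei."""
    return fetch(imei.lower()) == fetch(imei.upper())


# Stand-in for the real Get API, used only to make the sketch runnable.
_FAKE_STORE = {'abc-123': ['blog:1001']}


def fake_fetch(imei):
    """Case-insensitive lookup, which is the behaviour the check expects."""
    return _FAKE_STORE.get(imei.lower(), [])


def strict_fetch(imei):
    """Case-sensitive lookup, which the check should flag."""
    return _FAKE_STORE.get(imei, [])


print(check_imei_case_insensitive(fake_fetch, 'Abc-123'))
print(check_imei_case_insensitive(strict_fetch, 'Abc-123'))
```

In a real run, fetch would wrap the request to the Get API and return the parsed result for the given imei.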

3. Filter verification

The verification process is similar to that for logged-in users; only the input parameters of the recommendation interface request are adjusted.

4. Overall regression

  • Recall policy verification;
  • Recall resource type verification;
  • Scenario verification for logged-out and logged-in users (with or without uc interest tags maintained, with or without identity tags maintained);
  • Resource type distribution within the result of a single request;
  • Proportion of duplicate authors within the result of a single request;
  • Duplicate resources within the result of a single request;
  • Consecutive resources from the same author within the result of a single request;
  • Data verification for other channels (follow stream, same-city stream, blink hot stream, recommended user list, etc.);
  • Multi-terminal verification (app, wap, pc, mini program, etc.).
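
The four single-request checks in this list can be computed from one response. The sketch below assumes each item carries the product_type, product_id and username fields used earlier in this article; the function name single_request_report, the way the duplicate author ratio is defined, and the sample items are hypothetical. Acceptable thresholds are not given in this article, so the function only reports the numbers.

```python
from collections import Counter


def single_request_report(items):
    """Summarize duplicates and author distribution for one recommendation response."""
    keys = [f"{i['product_type']}:{i['product_id']}" for i in items]
    authors = [i['username'] for i in items]
    # Resources that appear more than once in the same response.
    duplicate_items = sorted(k for k, c in Counter(keys).items() if c > 1)
    # Share of items whose author already appeared earlier in the response.
    ratio = (len(authors) - len(set(authors))) / len(authors) if authors else 0.0
    # Authors that occupy two adjacent positions.
    consecutive = sorted({a for a, b in zip(authors, authors[1:]) if a == b})
    return {
        'type_distribution': dict(Counter(i['product_type'] for i in items)),
        'duplicate_items': duplicate_items,
        'duplicate_author_ratio': ratio,
        'consecutive_authors': consecutive,
    }


# Hypothetical response items, for illustration only.
sample_response = [
    {'product_type': 'blog', 'product_id': '1', 'username': 'a'},
    {'product_type': 'blog', 'product_id': '2', 'username': 'a'},
    {'product_type': 'ask', 'product_id': '3', 'username': 'b'},
    {'product_type': 'blog', 'product_id': '1', 'username': 'a'},
]
summary = single_request_report(sample_response)
print(summary)
```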

5. Comparison of the old and new ways of writing negative feedback to hbase

| Item | Old negative feedback | New negative feedback |
| --- | --- | --- |
| Process | User submits negative feedback -> reported to sls -> flink consumes sls -> udf processing writes to hbase | User submits negative feedback -> written directly to hbase via the API |
| Response | Flink tasks are often delayed; the migration to Huawei Cloud and other factors require refactoring | Real time |
| Copywriting | The copy is not uniform. For example, on the wap side [repeated content recommendation] is called [old news, repeat], [content quality is low] is called [poor content quality], and [do not read this author] is called [do not like this author] | All copy has been unified |
| Data | Negative feedback data for non-blog resources is not parsed properly, resulting in incorrect writes to hbase | Compatible with different resource types |


Origin blog.csdn.net/weixin_38093452/article/details/131374201