GitHub address: https://github.com/iwtbs/user_searchquery_analyse
Overall structure
Let's go straight to the code:
#python get_novel_info_from_feed_monitor.py ./data/novel_info.txt
#python get_video_info_from_video_film.py ./data/video_info.txt
#python get_star_info_from_video_film.py ./data/star_info.txt
#python stat_searchquery_times.py ./data/mid_searchquerys_20190331_31 ./data/searchquery_times
##############
#python analyse_searchquery.py ./data/novel_info.txt ./data/video_info.txt ./data/game_info.txt ./data/qingse_keyword.txt ./data/searchquery_times ./data/searchquery_times_analyse
#python stat_entity_searchquerynumber_searchquerytimes.py ./data/searchquery_times_analyse ./data/entity_searchquerynumber_searchquerytimes
#python cal_mid_entity_info.py ./data/searchquery_times_analyse ./data/mid_searchquerys_20190331_31 ./data/mid_searchquerys_entitys
Step-by-step walkthrough
- Obtain novel_info.txt, video_info.txt, and star_info.txt
novel_info.txt: fetched from MySQL; fields: title + hot
video_info.txt: fetched from MySQL; fields: dockey + doctype + name + hit_count + alias_name + serial + alais_serial
star_info.txt: fetched from MySQL; fields: star_id + name + alias_name + hit_count
- stat_searchquery_times.py counts how many times each search query was issued and sorts the result. The input file has the fields mid + searchquery + times.
searchquery = items3[0]
times = int(items3[2])
searchquery_times_dict[searchquery] = searchquery_times_dict.get(searchquery, 0) + times
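The counting and sorting step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: it assumes each input line is simply `mid<TAB>searchquery<TAB>times`, whereas the snippet above indexes into `items3`, which suggests the real file uses a more nested layout. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def stat_searchquery_times(lines):
    """Sum the search counts per query and sort by count, descending.

    Assumption: every line is 'mid<TAB>searchquery<TAB>times'.
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        _mid, searchquery, times = parts
        totals[searchquery] += int(times)
    # Most frequently searched queries first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample input: two users searching two queries.
sample = ["u1\tnovel a\t3", "u2\tnovel a\t2", "u1\tgame b\t4"]
result = stat_searchquery_times(sample)
```

Summing with a dictionary keyed by query mirrors the `searchquery_times_dict.get(searchquery, 0) + times` idiom in the snippet above.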
- External files: game_info.txt and qingse_keyword.txt
analyse_searchquery.py combines the files above to analyse search behavior.
The idea is to build a keyword index and query it against each search query. For background, see the earlier blog post on sensitive-word matching, which uses esmre in Python to implement an Aho-Corasick automaton. Taking the erotic (qingse) category as an example:
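import sys

import esm
import tqdm

def gen_qingse_index(file_path):
    # Build an esm index from the keyword file, one keyword per line.
    qingse_index = esm.Index()
    with open(file_path, "r") as f:
        line_num = sum(1 for _ in f)
    with tqdm.tqdm(total=line_num) as progress, open(file_path, "r") as f:
        valid_num = 0
        for line in f:
            progress.update(1)
            qingse_index.enter(line.strip())
            valid_num += 1
    print(valid_num)
    # fix() finalizes the index; it must be called before querying.
    qingse_index.fix()
    return qingse_index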
def get_match_entity(index, searchquery):
    # Query the index and return the matched keywords as a comma-separated string.
    index_result = index.query(searchquery)
    match_entity_dict = {}
    for (st_end, match_entity) in index_result:
        # Keep only matches whose start offset is even.
        if st_end[0] % 2 == 0:
            match_entity_dict[match_entity] = True
    ret = ''
    if len(match_entity_dict) > 0:
        ret = ','.join(match_entity_dict.keys())
    return ret
qingse_index = gen_qingse_index(sys.argv[4])
qingse_result = get_match_entity(qingse_index, searchquery)
if len(qingse_result) > 0:
    output += 'qingse'
fw.write(searchquery + '\t' + str(times) + '\t' + output + '\n')
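To make the matching step easier to follow without installing esmre, here is a small pure-Python Aho-Corasick sketch. `TinyAhoCorasick` is a hypothetical stand-in written for this illustration, not esmre's implementation. It borrows the `enter` / `fix` / `query` method names and returns `((start, end), keyword)` pairs so that it fits the unpacking used in `get_match_entity`; the exact offset convention of the real library is an assumption here.

```python
from collections import deque


class TinyAhoCorasick:
    """Illustrative stand-in for a keyword index (not the esmre code)."""

    def __init__(self):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # keywords recognized at each state

    def enter(self, keyword):
        # Insert one keyword into the trie.
        if not keyword:
            return
        state = 0
        for ch in keyword:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(keyword)

    def fix(self):
        # Breadth-first pass that computes the failure links.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def query(self, text):
        # Scan the text once and report every keyword occurrence.
        results = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for kw in self.out[state]:
                results.append(((i + 1 - len(kw), i + 1), kw))
        return results


index = TinyAhoCorasick()
for word in ["he", "she", "hers"]:
    index.enter(word)
index.fix()
matches = index.query("ushers")
```

The point of the automaton is that the query text is scanned once regardless of how many keywords are indexed, which matters when the keyword file is large.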
- stat_entity_searchquerynumber_searchquerytimes.py computes, for each category, the number of distinct search queries, the total number of searches, and each of these as a share of the overall total.
fw.write(entity + '\t' + str(searchquerynumber) + '\t' + str(searchquerytimes) + '\t' + str(searchquerynumber*1.0/total_searchquerynumber) + '\t' + str(searchquerytimes*1.0/total_searchquerytimes) + '\n')
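The per-category statistics can be sketched as below. This is a hedged illustration under several assumptions that the article does not confirm: each row is `(searchquery, times, entities)`, multiple category labels in `entities` are comma-separated, and the totals are taken over all rows, including queries that matched no category. The function name `stat_entity` is hypothetical.

```python
from collections import defaultdict


def stat_entity(rows):
    """Return {entity: (query_count, search_count, query_ratio, search_ratio)}.

    Assumption: rows are (searchquery, times, entities) tuples, where
    entities is a comma-separated string of category labels.
    """
    number = defaultdict(int)     # distinct queries per entity
    times_sum = defaultdict(int)  # total searches per entity
    total_number = 0
    total_times = 0
    for _searchquery, times, entities in rows:
        total_number += 1
        total_times += times
        for entity in filter(None, entities.split(",")):
            number[entity] += 1
            times_sum[entity] += times
    stats = {}
    for entity in number:
        stats[entity] = (
            number[entity],
            times_sum[entity],
            number[entity] / total_number,
            times_sum[entity] / total_times if total_times else 0.0,
        )
    return stats


# Hypothetical rows: one query carries two labels, one carries none.
rows = [("a", 3, "novel"), ("b", 1, "novel,qingse"), ("c", 4, "")]
stats = stat_entity(rows)
```

In Python 3 the `/` operator already performs true division, so the `*1.0` trick from the original Python 2 line is not needed here.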
- cal_mid_entity_info.py computes entity statistics for each user (mid).
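The article gives no details about this last step, so the following is only a guess at its general shape: join each user's search records with the query-to-category mapping produced earlier and accumulate search counts per user and category. The function signature, the input shapes, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def cal_mid_entity_info(mid_queries, query_entities):
    """Accumulate search counts per user (mid) and entity.

    Assumptions: mid_queries yields (mid, searchquery, times) tuples and
    query_entities maps a searchquery to a list of entity labels.
    """
    info = defaultdict(lambda: defaultdict(int))
    for mid, searchquery, times in mid_queries:
        for entity in query_entities.get(searchquery, []):
            info[mid][entity] += times
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {mid: dict(counts) for mid, counts in info.items()}


# Hypothetical inputs.
mid_queries = [("u1", "novel a", 3), ("u1", "game b", 4), ("u2", "novel a", 2)]
query_entities = {"novel a": ["novel"], "game b": ["game"]}
mid_info = cal_mid_entity_info(mid_queries, query_entities)
```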