Building an event graph (source code included) from 500,000 (50W) Ctrip travel guides: transportation subgraphs, hotel booking event graph, and more

Project design collection (artificial intelligence direction): helps newcomers build skills quickly through hands-on practice, complete and upgrade project designs independently, and strengthen their core technical ability (including but not limited to NLP, knowledge graphs, and computer vision).

  • Project composition
    • The project has two parts: acquiring the corpus, and mining events from that corpus. The project directory contains:
    • news_spider: Scrapy-based scripts for collecting travel notes
    • event_graph: scripts that extract succession events using dependency parsing and succession patterns
    • image: rendered images of the travel-note succession event graph

1. Acquisition of travel data

  1. Data source: Ctrip Travel Guide

  2. Time frame: Before July 14, 2018

  3. Collection method: crawler scripts written with Scrapy

  4. Collection scale: 505,767 articles in total, on the order of 500,000 (50W)

  5. Collection script directory: news_spider/travelspider

  6. Corpus example:

         107330 一路向南——第二篇相逢南通(自驾游) - 游记攻略【携程攻略】
         107331 彩云之南—云上的蜜月之旅 - 丽江游记攻略【携程攻略】
         107332 甘肃游记之玛曲郎木寺 - 碌曲游记攻略【携程攻略】
         107333 拍客白沙行 - 舟山游记攻略【携程攻略】
         107334 九华山-沐浴在佛恩下的XXX - 九华山游记攻略【携程攻略】
         107335 垦丁夏季活动 - 垦丁游记攻略【携程攻略】
         107336 行走在台湾(向隅版)---世外桃源之我们的家(九份民宿) - 九份游记攻略【携程攻略】
         107337 卫赛节马来西亚行 - 马六甲州游记攻略【携程攻略】
         107338 蓝天下的嘉峪关 - 嘉峪关游记攻略【携程攻略】
         107339 人生一定要登一次雪山---都日峰 - 四川游记攻略【携程攻略】
         107340 八月,青海湖不远 - 海北游记攻略【携程攻略】
         107341 #冬季北京# 帝都极冷天去首富的酒店避避寒 - 北京游记攻略【携程攻略】
         107342 圣地西藏 - 青海湖游记攻略【携程攻略】
         107343 孩子,妈妈想让你见识更多的繁华世界 - 深圳游记攻略【携程攻略】
         107344 顶级奢华,舍我其谁! - 澳门游记攻略【携程攻略】
         107345 旅行、不需要走远!美景就在身边 - 江门游记攻略【携程攻略】
         107346 安安静静,不言不语都是好风景 - 厦门游记攻略【携程攻略】
         107347 邂逅则天故里 行走美丽利州 体验师带你看中国女儿节 - 广元游记攻略【携程攻略】
         107348 台湾,可以这样玩--15日环岛自由行全记录 - 台北游记攻略【携程攻略】
         107349 让我记忆深刻的厦门--详细版 - 厦门游记攻略【携程攻略】
         107350 上海地鐵站 - 上海游记攻略【携程攻略】
         107351 逃离雾霾,带着“马拉多纳”去腾冲 - 腾冲游记攻略【携程攻略】
         107352 在我心上用力地开一 - 四川游记攻略【携程攻略】
         107353 冬季到鄱阳湖边的余干县看鸟,多张美图记录环湖游全过程 - 余干游记攻略【携程攻略】
         107354 2014.十一沈阳,本溪老边沟,枫叶大道,丹东,不走重复路,古迹,景色5日穷游 - 沈阳游记攻略【携程攻略】
         107355 库不齐老牛湾之户外行走 - 库布齐沙漠游记攻略【携程攻略】
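
The sample lines above appear to follow the layout "id, title, hyphen, destination, fixed site suffix". As a minimal sketch, assuming that layout holds for the whole corpus (it is inferred only from the sample, and `parse_title_line` is a hypothetical helper, not part of the project), each line could be split into fields like this:

```python
import re

# Assumed layout, inferred from the sample lines only:
#   "<id> <title> - <destination>游记攻略【携程攻略】"
LINE_RE = re.compile(r"^\s*(\d+)\s+(.*)\s+-\s+(.*?)游记攻略【携程攻略】\s*$")


def parse_title_line(line):
    """Split one corpus index line into id, title and destination.

    Returns None when the line does not match the assumed layout.
    The destination is None when the page title carries no place name.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    doc_id, title, destination = match.groups()
    return {
        "id": int(doc_id),
        "title": title.strip(),
        "destination": destination or None,
    }


record = parse_title_line("107333 拍客白沙行 - 舟山游记攻略【携程攻略】")
print(record)
```

The greedy title group makes the split happen at the last spaced hyphen, so titles that themselves contain `--` or `---` stay intact.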
    

2. Construction of event graphs based on travel corpus

2.1 Extraction of succession events

  • event_extract.py works through the following steps:
    1. Take a travel-note text as input
    2. Split the travel note into long sentences
    3. Using structural templates for the succession relation, extract the parts before and after the succession marker, then go to step 4
    4. Split the parts obtained in step 3 into short clauses, then go to step 5
    5. Extract predicate phrases from the short clauses obtained in step 4
    6. Merge the predicate phrases from step 5 back up to the long-sentence level, giving an ordered list of predicate phrases for each long sentence
    7. Build succession event pairs from the ordered list of step 6 using a sliding window
    8. Aggregate the event pairs from step 7 into a succession event library
    9. Consolidate the events from step 8, remove low-frequency events, and build the standard succession relation library
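
The steps above can be sketched roughly as follows. This is not the code in event_extract.py: the marker list, the function names, the `window` and `min_count` parameters, and the way event frequency is counted are all assumptions made for illustration. Predicate phrase extraction (steps 5 and 6) needs a dependency parser, so it is passed in as a function.

```python
import re
from collections import Counter

# Hypothetical succession markers; the project's real templates are not listed here.
SUCCESSION_MARKERS = ["然后", "接着", "随后", "之后"]
MARKER_RE = re.compile("|".join(re.escape(m) for m in SUCCESSION_MARKERS))


def split_long_sentences(text):
    """Step 2: split text into long sentences on sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[。!?!?\n]+", text) if s.strip()]


def split_on_succession(sentence):
    """Step 3: parts before and after a succession marker ([] if no marker)."""
    parts = [p.strip() for p in MARKER_RE.split(sentence)]
    return [p for p in parts if p] if len(parts) > 1 else []


def split_short_clauses(part):
    """Step 4: split a part into short clauses on clause-level punctuation."""
    return [c.strip() for c in re.split(r"[,,;;、]+", part) if c.strip()]


def sliding_window_pairs(events, window=2):
    """Step 7: pair each event with the later events inside the window."""
    return [
        (head, tail)
        for i, head in enumerate(events)
        for tail in events[i + 1 : i + window]
    ]


def build_succession_library(text, extract_phrases, window=2, min_count=1):
    """Steps 2 to 9 end to end; extract_phrases stands in for steps 5 and 6."""
    pair_counts = Counter()
    for sentence in split_long_sentences(text):
        events = []
        for part in split_on_succession(sentence):
            for clause in split_short_clauses(part):
                events.extend(extract_phrases(clause))
        pair_counts.update(sliding_window_pairs(events, window))
    # Step 9 (assumed definition): an event's frequency is the number of
    # pair occurrences it takes part in; pairs with a rare event are dropped.
    event_counts = Counter()
    for (head, tail), n in pair_counts.items():
        event_counts[head] += n
        event_counts[tail] += n
    return {
        pair: n
        for pair, n in pair_counts.items()
        if event_counts[pair[0]] >= min_count and event_counts[pair[1]] >= min_count
    }


# Usage with a trivial stand-in extractor that treats each clause as one event.
library = build_succession_library(
    "我们先订机票,然后订酒店。到了丽江,接着去古城。",
    extract_phrases=lambda clause: [clause],
)
print(library)
```

With `window=2` only adjacent events are paired; a larger window also links events that are further apart in the same long sentence, which raises recall at the cost of more noisy pairs.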

2.2 Display of the succession event graph

10) event_graph.py uses the VIS plug-in to build and display the graph
11) Because VIS is a packaged JS library that renders in the browser, the number of succession events displayed is temporarily capped at 500 in the project; see travel_event_graph.html
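
A vis.js network is described by a list of nodes (each with an `id` and a `label`) and a list of edges (each with `from` and `to`). The sketch below shows one way the most frequent event pairs could be turned into that structure. It is not the project's event_graph.py: `build_vis_data` is a hypothetical helper, and applying the cap of 500 to event pairs rather than to nodes is an assumption.

```python
import json


def build_vis_data(pair_counts, max_edges=500):
    """Turn {(head, tail): count} into vis.js style nodes and edges.

    Only the max_edges most frequent pairs are kept (assumed meaning of
    the 500 limit). Ties are broken by the pair itself for stable output.
    """
    top = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_edges]
    node_ids = {}
    nodes, edges = [], []
    for (head, tail), count in top:
        for event in (head, tail):
            if event not in node_ids:
                node_ids[event] = len(node_ids)
                nodes.append({"id": node_ids[event], "label": event})
        edges.append(
            {
                "from": node_ids[head],
                "to": node_ids[tail],
                "label": str(count),
                "arrows": "to",  # draw the edge from the earlier to the later event
            }
        )
    return {"nodes": nodes, "edges": edges}


pair_counts = {("订机票", "到机场"): 3, ("到机场", "登机"): 5, ("做饭", "吃饭"): 1}
data = build_vis_data(pair_counts, max_edges=2)
# ensure_ascii=False keeps the Chinese labels readable when embedded in HTML.
print(json.dumps(data, ensure_ascii=False, indent=2))
```

The resulting JSON can be embedded in an HTML page and passed to the vis.js network constructor as its data argument.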

3. Results of the succession relation graph

3.1 Overall graph

Displaying 500 succession events yields an event network: one large succession relation graph made up of many small subgraphs.

3.2 "Travel to Lijiang" subgraph

This subgraph is the group of events centered on the core travel event "Travel to Lijiang".
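
One way to obtain such a subgraph from the event pairs is to collect everything within a few hops of the core event. The sketch below is an assumption about how this could be done, not the project's method; `core_subgraph` and `max_hops` are hypothetical names, and edge direction is ignored while searching.

```python
from collections import deque


def core_subgraph(pairs, core, max_hops=2):
    """Return the pairs whose two events both lie within max_hops of core."""
    # Build an undirected neighbor map so predecessors are found as well.
    neighbors = {}
    for head, tail in pairs:
        neighbors.setdefault(head, set()).add(tail)
        neighbors.setdefault(tail, set()).add(head)
    # Breadth-first search outward from the core event.
    distance = {core: 0}
    queue = deque([core])
    while queue:
        event = queue.popleft()
        if distance[event] == max_hops:
            continue
        for other in neighbors.get(event, ()):
            if other not in distance:
                distance[other] = distance[event] + 1
                queue.append(other)
    return [(h, t) for h, t in pairs if h in distance and t in distance]


pairs = [
    ("去丽江", "订机票"),
    ("订机票", "到机场"),
    ("到机场", "登机"),
    ("做饭", "吃饭"),
]
print(core_subgraph(pairs, "去丽江", max_hops=2))
```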

3.3 Air travel subgraph

This subgraph shows the sequence of events that arises when flying is chosen for a trip.

3.4 Train travel subgraph

This subgraph shows the sequence of events that arises when the train is chosen for a trip.

3.5 Hotel booking event graph

This subgraph describes an "unpleasant hotel booking" event: the chain of succession events runs from booking, to disappointment, to the outcome.

3.6 Cooking event graph

This subgraph shows the succession events in a "cooking" scene, which is also quite interesting.

4. Summary

  1. This project is only a demo: it applies a simple extraction method to a domain corpus of about 500,000 (50W) articles to produce a succession relation graph, and it still has many shortcomings.
  2. The project currently contains 326,781 event nodes and 543,580 succession event pairs, so the graph is on the order of 300,000 (30W) nodes and 500,000 (50W) pairs.
  3. Representing events as predicate phrases is only one possible event representation. The current method extracts them using only the VOB (verb-object) dependency relation and needs improvement.
  4. The results from point 3 still contain a lot of noise. On the one hand, accuracy is bounded by the accuracy of the dependency parser; on the other hand, relying on a single dependency relation may be too narrow to be accurate.
  5. Succession event sequences are built with a sliding window over each long sentence; this method needs improvement.
  6. The current succession relation graph can be mined further to extract more valuable information.
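
To make point 3 concrete, the sketch below shows what VOB-only extraction could look like once a dependency parser has produced its output. The input format (one `(word, head_index, relation)` triple per token), the function name, and the `SBV` and `HED` labels in the example are assumptions for illustration; only the `VOB` relation comes from the project description.

```python
def extract_vob_events(tokens):
    """Build verb+object phrases from dependency arcs labelled VOB.

    tokens: list of (word, head_index, relation) triples, where head_index
    is the 0-based position of the governing word and -1 marks the root.
    """
    events = []
    for word, head, relation in tokens:
        if relation == "VOB" and head >= 0:
            verb = tokens[head][0]
            events.append(verb + word)  # Chinese phrase: verb directly followed by object
    return events


# Hand-written parse of "我 预订 酒店" (assumed parser output).
parsed = [("我", 1, "SBV"), ("预订", -1, "HED"), ("酒店", 1, "VOB")]
print(extract_vob_events(parsed))
```

Because only verb-object arcs are used, clauses whose event is carried by an intransitive verb or by another relation yield no phrase at all, which is one source of the gaps and noise mentioned in points 3 and 4.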

For the project source code, see the link at the top or end of the article

https://download.csdn.net/download/sinat_39620217/87999839

Origin blog.csdn.net/sinat_39620217/article/details/131824583