Doctor recommendation system based on a domain knowledge graph: building a medical knowledge graph and knowledge question answering system with BERT+BiLSTM+CRF medical entity recognition

Project design collection (artificial intelligence direction): a curated set of meaningful project designs, not limited to NLP, knowledge graphs, or computer vision, that helps newcomers build skills through hands-on practice, make better use of the CSDN platform, complete and upgrade project designs independently, and strengthen their practical abilities.

This project implements two main functions, disease self-diagnosis and doctor recommendation, and builds a doctor service index evaluation system. Disease self-diagnosis uses BERT+BiLSTM+CRF medical entity recognition to build a medical knowledge graph, which supports a preliminary diagnosis based on the text of a patient's inquiry. This helps patients gain an initial understanding of their condition and supports further communication with doctors.
The second function is doctor recommendation. The platform uses the MinHash and MinHashLSHForest algorithms, based on Jaccard distance, to match the patient's consultation text against doctors' historical consultation records and recommend the most suitable doctor. Finally, the project is deployed with the Django framework.

1. Project framework

The source code can be downloaded from:

https://download.csdn.net/download/sinat_39620217/87990778

2. Data collection

In addition to using public medical datasets, this project also collected datasets from leading medical platforms in China.

The spiders module contains the data collection code.

39crawler collects data from 39 Health Network, and hdf collects data from Haodf.com (好大夫在线), using Scrapy.

The executable is in the dist folder; double-click spider_run.exe to run the crawler.

To crawl a specific department or disease, add its name (in pinyin) to disease.txt, one department or disease per line.

No matter how many lines disease.txt contains, the crawler only crawls the department or disease on the first line. When the program finishes, the results are written to doctor.csv and disease.csv.

To crawl a second disease, delete the first line of disease.txt and run the program again.
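
Because the crawler only reads the first line of disease.txt, crawling several diseases means editing the file between runs. The following is a minimal sketch of that rotation step, assuming disease.txt is UTF-8 text with one pinyin name per line. The helper name `pop_first_target` is hypothetical and not part of the project, and starting spider_run.exe itself is left out.

```python
from pathlib import Path


def pop_first_target(path):
    """Remove the first non-empty line of the target file and return it.

    Blank lines are dropped when the file is rewritten.
    Returns None when the file has no targets left.
    """
    file = Path(path)
    lines = [ln.strip() for ln in file.read_text(encoding="utf-8").splitlines()]
    targets = [ln for ln in lines if ln]
    if not targets:
        return None
    finished, remaining = targets[0], targets[1:]
    # Write back the remaining targets so the next crawler run sees a new first line.
    file.write_text("\n".join(remaining) + ("\n" if remaining else ""), encoding="utf-8")
    return finished
```

After one crawler run has finished and doctor.csv and disease.csv have been saved elsewhere, calling `pop_first_target("disease.txt")` would move the next department or disease to the first line.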

3. Disease self-diagnosis

In the disease self-diagnosis module, the platform reads the user's description of their condition, performs text preprocessing, and then uses the entity recognition model to extract the key components: medical entities such as symptoms, complications, and body parts. These entities are then fed into a knowledge graph (built on a large-scale dataset) on the platform's backend. Through fast querying and computation over the knowledge graph, the platform returns the diseases inferred from the patient's description together with their probability values. It also shows the user an introduction to each disease, the department to visit, and the populations most susceptible to it.

3.1. Medical entity recognition

Medical entity recognition refers to identifying medical entities from a given sentence. In this project, it is necessary to identify various types of medical entities such as diseases, symptoms, and departments from the condition descriptions consulted by patients, and to find keywords related to disease characteristics.

The entity_extract module contains the medical entity recognition code.

Because the model is too large, it is not included in the project directory; see the project materials (项目资料).

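Input: 汪主任您好,1月中旬常规体检发现TCT高度病变,HPV未查,2020年hpv和tct正常。已经在南京鼓楼医院做了活检,报告如下,诊断写的肿瘤,请问现在这个是不是癌呢?是不是很严重?因为娃太小很害怕,后续该怎么手术呢?十分迫切希望得到您的答复,不胜感激!
(English gloss: Hello Director Wang. A routine check-up in mid-January found a high-grade TCT lesion; HPV was not tested, and HPV and TCT were normal in 2020. I have had a biopsy at Nanjing Drum Tower Hospital; the report is below and the diagnosis says tumor. Is this cancer? Is it serious? My child is very young and I am scared. How should the surgery proceed? I eagerly await your reply, many thanks!)

Output: {'test': [('hpv', 35), ('tct', 39), ('活检', 56)], 'symptom': [('肿瘤', 68)], 'feature': [('严重', 87)]}
Usage example: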
# predict.py
import pickle

# args, BertForNer, bert_ner_model and trainUtils are provided by the project's entity_extract module
args.bert_dir = '../data/bert-base-chinese'  # directory of the pretrained language model
model_name = 'bert_bilstm_crf'  # model type: bert_bilstm, bert_bilstm_crf, bert_crf, bert
id2query = pickle.load(open('../data/id2query.pkl', 'rb'))  # load dictionary
ent2id_dict = pickle.load(open('../data/ent2id_dict.pkl', 'rb'))  # load dictionary
args.num_tags = len(ent2id_dict)
bertForNer = BertForNer(args, id2query)
model_path = './checkpoints/{}/model.pt'.format(model_name)  # path of the saved model
model = bert_ner_model.BertNerModel(args)  # instantiate the model from the arguments
model, device = trainUtils.load_model_and_parallel(model, args.gpu_ids, model_path)  # load the model weights
model.eval()
# The patient's self-description; full-width parentheses are normalized to ASCII and '+' is replaced with '&'
raw_text = "汪主任您好,1月中旬常规体检发现TCT高度病变,HPV未查,2020年hpv和tct正常。已经在南京鼓楼医院做了活检,报告如下,诊断写的肿瘤,请问现在这个是不是癌呢?是不是很严重?因为娃太小很害怕,后续该怎么手术呢?十分迫切希望得到您的答复,不胜感激!".strip().replace(
    '（', '(').replace('）', ')').replace('+', '&')
print(raw_text)
bertForNer.predict(raw_text, model, device)  # recognized medical entities
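
The output shown earlier groups entities by type and pairs each entity with its 0-based character offset in the input. The sketch below shows one way such a structure could be derived from per-character tags. It assumes a BIO tagging scheme, which the article does not confirm (the project might use a different scheme such as BIOES), and `bio_to_entities` is a hypothetical helper, not the project's decoding code.

```python
from collections import defaultdict


def bio_to_entities(text, tags):
    """Group BIO tags into {entity_type: [(entity_text, start_offset), ...]}."""
    entities = defaultdict(list)
    start, current = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last character.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and current == tag[2:]
        if current is not None and not inside:
            entities[current].append((text[start:i], start))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    return dict(entities)


tags = ["B-test", "I-test", "I-test", "O", "B-test", "I-test", "I-test", "O", "O"]
print(bio_to_entities("hpv和tct正常", tags))
# {'test': [('hpv', 0), ('tct', 4)]}
```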

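3.2. Supported entity types

body: affected body part, e.g. 胃 (stomach), 皮肤 (skin)
drug: medication, e.g. 产妇康清洗液 (Chanfukang cleansing lotion)
feature: severity of the condition, e.g. 严重 (severe)
disease: disease, e.g. 前列腺炎 (prostatitis)
symptom: disease symptom, e.g. 胃壁增厚 (thickening of the stomach wall)
department: hospital department, e.g. 五官科 (ENT and ophthalmology department)
test: disease-related examination, e.g. 血常规 (routine blood test)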

3.3. Model Selection

We compared the accuracy, recall, and micro_f1 of the BERT, BERT+CRF, BERT+BiLSTM, and BERT+BiLSTM+CRF models on the training set and found that BERT+BiLSTM+CRF recognizes medical entities best. In this project we therefore use the **BERT+BiLSTM+CRF** model for all subsequent medical entity recognition.
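
For reference, the sketch below computes micro-averaged precision, recall, and F1 at the entity level, which is a common way to score NER models. It is an illustration under assumptions: the article does not say whether the project scores entities or tokens, and the function name and the `(type, start, end)` span format are chosen here for the example.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over entity spans.

    gold, pred: lists (one item per sentence) of sets of (type, start, end) tuples.
    A predicted entity counts as correct only if type and boundaries both match.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro averaging pools the counts over all entity types before dividing, so frequent types such as symptoms weigh more than rare ones.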

3.4. Knowledge graph construction

To support accurate disease diagnosis, we build the knowledge graph from large-scale datasets.

The build_kg module contains the knowledge graph construction code.

The entity types used by the disease self-diagnosis module are diagnostic examination items, departments, diseases, drugs, affected body parts, disease symptoms, and disease severity. After the user enters a piece of text, entity recognition first identifies these key entities.

Our preliminary investigation found that disease diagnosis draws not only on physical symptoms but also on many other relationships between entities. For relation extraction we therefore divide the relationships between entities into 8 categories: belongs to, commonly used drugs for a disease, department for a disease, aliases of a disease, examinations required for a disease, body parts affected by a disease, symptoms of a disease, and complications of a disease. These 8 relationship types determine how entities are linked in the knowledge graph, from which the probability of having a disease is calculated. The descriptive statistics of the relationships between KG entities are shown in the table below.
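
To make the idea of scoring diseases from graph relationships concrete, here is a small in-memory sketch. Everything in it is an assumption made for illustration: the triples, the relation names, and the scoring rule (count of matched entities, normalized so the scores sum to 1) are not taken from the project, whose actual graph store and probability calculation are not described in this article.

```python
from collections import defaultdict

# Toy (disease, relation, entity) triples; names and data are illustrative only.
TRIPLES = [
    ("gastritis", "has_symptom", "stomach pain"),
    ("gastritis", "has_symptom", "nausea"),
    ("gastritis", "affects_part", "stomach"),
    ("gastric ulcer", "has_symptom", "stomach pain"),
    ("gastric ulcer", "needs_test", "gastroscopy"),
    ("hypertension", "has_symptom", "dizziness"),
]


def rank_diseases(entities, triples=TRIPLES):
    """Rank diseases by how many recognized entities they are linked to.

    Scores are normalized over all matched diseases so that they sum to 1.
    Returns a list of (disease, score) pairs, highest score first.
    """
    hits = defaultdict(set)
    for disease, _relation, tail in triples:
        if tail in entities:
            hits[disease].add(tail)
    total = sum(len(matched) for matched in hits.values())
    scores = {disease: len(matched) / total for disease, matched in hits.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_diseases({"stomach pain", "nausea", "stomach"}))
# [('gastritis', 0.75), ('gastric ulcer', 0.25)]
```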

4. Intelligent doctor recommendation system

In the doctor recommendation module, the platform looks for the patients in the historical data who are most similar to the user and recommends the doctors who treated them. Specifically, the platform first extracts the medical entities from the user's description, mapping a piece of text to multiple tokens. Each entity is then represented as a word vector. Next, the MinHash and MinHashLSHForest algorithms link the two sides: the user's description and the doctors' historical consultation records in the database. The platform uses Jaccard distance to measure the similarity between the two, and records with higher similarity are treated as better matches. Finally, the platform recommends the doctors associated with the best-matching consultation records.

The recommend module contains the doctor recommendation code.
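
The matching step described above can be sketched with the standard library alone. This is a simplified illustration, not the project's `get_forest` or `recommend` functions: it builds MinHash signatures with seeded hashes and estimates Jaccard similarity from them, but it scans every record linearly instead of indexing the signatures in an LSH Forest. The function names, the record format, and the use of `hashlib.blake2b` are assumptions made for the example.

```python
import hashlib


def minhash_signature(tokens, num_perm=128):
    """Return one minimum hash value per seeded hash function.

    tokens must be a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for t in tokens
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def recommend_doctors(query_entities, records, top_n=5, num_perm=128):
    """records: {doctor_id: set of entities from that doctor's past consultations}."""
    query_sig = minhash_signature(query_entities, num_perm)
    scored = [
        (estimate_jaccard(query_sig, minhash_signature(entities, num_perm)), doctor_id)
        for doctor_id, entities in records.items()
    ]
    scored.sort(reverse=True)  # highest estimated similarity first
    return [doctor_id for _score, doctor_id in scored[:top_n]]
```

With 128 hash functions the estimate is typically within a few hundredths of the true Jaccard similarity. An LSH Forest becomes worthwhile when the number of consultation records is too large to compare against one by one.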

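Input: 一周前稍感胸闷,入院检查,心脏彩超,腹部彩超正常,心脏冠状动脉CT,显示狭窄
(English gloss: About a week ago I felt slight chest tightness and was admitted for examination; cardiac ultrasound and abdominal ultrasound were normal; coronary artery CT showed stenosis.)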

Output: [{'Unnamed: 0': 0, 'patient_score': 11, 'patient_online': 2116, 'educate': '教授', 'articleCount': '22篇', 'spaceRepliedCount': '2116位', 'totaldiagnosis': '367位', 'openSpaceTime': '2008-10-22 18:15', 'hot_num': 3.6, 'hospitalName': '上海交通大学医学院附属上海儿童医学中心', 'keshi': '心内科', 'good_at': '先天性心脏病的诊断和介入治疗,小儿肺动脉高压的诊治,儿童心肌病的诊断和治疗', 'introduction': '傅立军,男,主任医师,教授,医学博士,博士生导师,心内科主任,国家卫计委先天性心脏病介入培训基地导师。从事小儿心血管疾病的诊疗二十年,尤其擅长于先天性心脏病的诊断和介入治疗以及肺动脉高压、心肌病的诊治,累计完成先天性心脏病介入治疗3000余例。 中华医学会儿科学分会心血管病学组委员,遗传代谢性心肌病协作组组长 中华医学会心血管病学分会肺血管病学组委员 中国医师协会儿科医师分会心血管疾病专业委员会委员兼秘书长 上海市儿科学会心血管病学组副组长 在国内外发表论文三十余篇。 参编专著多部。', 'doctor_title': '主任医师', 'doctor_id': 221603, 'doctorName': '傅立军', 'disease': 'gaoxueya'},
{'Unnamed: 0': 0, 'patient_score': 71, 'patient_online': 8515, 'educate': '教授', 'articleCount': '8篇', 'spaceRepliedCount': '8515位', 'totaldiagnosis': '1357位', 'openSpaceTime': '2009-05-07 16:16', 'hot_num': 4.0, 'hospitalName': '首都医科大学附属北京安贞医院鹤壁市人民医院', 'keshi': '心脏内科中心心血管内科', 'good_at': '冠心病,介入治疗;心肌病,心力衰竭,难治性高血压的诊断治疗。', 'introduction': '赵全明,博士,首都医科大学教授,博士研究生导师,北京安贞医院心脏内科中心主任医师。1989年西安医科大学内科硕士毕业,1997年底法国路易斯巴斯德大学医学院博士毕业,2000年晋升主任医师。擅长各种心血管疾病的诊断和治疗,重点从事冠心病的临床和研究,个人完成冠状动脉造影10000例,冠心病介入治疗(PCI)超过5000例。开展了冠心病诊断(冠脉造影,血管内超声-IVUS,光学相干断层显像-OCT,冠脉血流储备分数-FFR)和复杂冠心病介入治疗的各种新技术(冠状动脉支架术,钙化病变的旋磨术,支架内再狭窄的药物球囊治疗,生物可降解支架临床研究),并获得丰富经验。', 'doctor_title': '主任医师', 'doctor_id': 4269, 'doctorName': '赵全明', 'disease': 'gaoxueya'},
{'Unnamed: 0': 0, 'patient_score': 13, 'patient_online': 14327, 'educate': '教授', 'articleCount': '95篇', 'spaceRepliedCount': '14327位', 'totaldiagnosis': '3385位', 'openSpaceTime': '2011-07-06 14:35', 'hot_num': 3.6, 'hospitalName': '首都医科大学附属北京安贞医院', 'keshi': '心脏内科中心', 'good_at': '房颤和复杂心律失常的导管消融治疗,尤其擅长各种类型心房颤动(房颤)、心房扑动(房扑)、房性心动过速(房速)的导管消融,包括心脏外科术后如二尖瓣置换术后、房间隔封堵术后以及射频消融术后复发的房颤、房扑;室性心动过速(室速)和室上性心动过速(室上速);瓣膜病的球囊扩张术,特别是风湿性心脏病二尖瓣狭窄的治疗。', 'introduction': '马长生,主任医师,教授,博士生导师。于1998年完成国内首例房颤导管消融术,并系统建立了我国心律失常消融的技术和方法。 北京市心血管疾病防治办公室主任 中华医学会心血管病学分会副主任委员 中国医师协会心血管内科医师分会会长 中华医学会心电生理和起搏分会副主任委员 中国生物医学工程学会常务理事兼心律分会主任委员 中国生物医学工程学会介入医学工程分会主任委员 为卫生部有突出贡献中青年专家、科技北京百名领军人才、北京市卫生系统领军人才及JournalofCardiovascularElectrophysiology、Europace、JournalofInterventionalCardiacElectrophysiology和ChineseMedicalJournal等30余种学术期刊编委。 3次获国家科学技术进步二等奖 首创单导管法、“2C3L”术式、倒U形导管塑形消融右侧旁路等一系列原创性方法。 牵头研制成功自主知识产权的磁定位三维电解剖标测系统和首套房颤导管消融模拟器。 主编的《介入心脏病学》《心律失常射频消融图谱》为本专业最具影响的教科书之一。 以第一作者或通信作者发表SCI收录论文80余篇, 承担“十二五”国家科技支撑计划、“十二五”国家科技重大专项子课题、“十一五”863计划、“十五”国家科技攻关项目等省部级以上课题数十项。 已授权或公告专利7项,其中PCT专利2项。', 'doctor_title': '主任医师', 'doctor_id': 4255, 'doctorName': '马长生', 'disease': 'gaoxueya'},
{'Unnamed: 0': 0, 'patient_score': 0, 'patient_online': 3151, 'educate': '教授', 'articleCount': '4篇', 'spaceRepliedCount': '3151位', 'totaldiagnosis': '115位', 'openSpaceTime': '2008-12-20 03:09', 'hot_num': 3.5, 'hospitalName': '中国医学科学院阜外医院中国医学科学院阜外医院深圳医院', 'keshi': '心血管内科心血管内科', 'good_at': '冠心病的诊断与介入治疗,急性心肌梗死介入治疗', 'introduction': '主任医师,教授,博士研究生导师 中国医学科学院阜外医院深圳医院内科管委会主任 内科教研室主任、冠心病中心主任、介入中心主任 美国心脏病学会会员(FACC)、美国心脏协会会员(FAHA);欧洲心脏病学会会员(FESC)。  著名的心血管病学专家,国家级领军人才 中央保健委会诊专家 深圳市医学重点学科(心血管内科)负责人 深圳市重大疾病(冠心病)防治中心负责人 深圳市医防融合心血管病项目专家组组长 深圳市医疗卫生三名工程急性冠脉综合征团队负责人  受教育经历: 1982年武汉大学医学部获学士学位(改革开放后首批本科生);1988年华中科大同济医学院(全日制)硕士学位;1994年日本国立滨松医大(全日制)博士学位。  专业特长: 全球心血管介入手术例数最多和经验最丰富的的专家之一;各种心血管急重症的诊断与治疗;复杂冠心病介入治疗和长期管理;研究方向:急性心梗临床与转化医学研究。  工作语言: 普通话、英语和日语 中国医师协会胸痛专业委员会副主任委员 《中国介入心脏病学杂志》副主编 《中华心血管病杂志》等编委 《中华医学杂志》等审稿专家 先后建立了中日友好医院、北京安贞医院和北京阜外医院急性心梗救治通道;先后承担国家和省部级研究项目30项,共获得6000万元基金支持。 发表或参与发表文章550篇(https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=HONGBING+YANBeijing+&btnG=)。 出版著(译)作60部(https://book.jd.com/writer/颜红兵_1.html?stop=1&book=y&vt=2)。', 'doctor_title': '主任医师', 'doctor_id': 4274, 'doctorName': '颜红兵', 'disease': 'gaoxueya'},
{'Unnamed: 0': 0, 'patient_score': 4, 'patient_online': 4829, 'educate': '教授', 'articleCount': '33篇', 'spaceRepliedCount': '4829位', 'totaldiagnosis': '1228位', 'openSpaceTime': '2009-08-31 10:21', 'hot_num': 3.6, 'hospitalName': '首都医科大学附属北京安贞医院', 'keshi': '心脏内科中心', 'good_at': '冠心病介入治疗(支架术) 心绞痛/心肌梗死诊治 起搏器/除颤器植入 经导管主动脉瓣置入术 肺动脉高压诊治', 'introduction': '聂绍平,男,教授、主任医师、医学博士、博士研究生导师、欧洲心脏病学会专家会员(FESC),美国心血管造影和介入学会国际会员(FSCAI)。现任首都医科大学附属北京安贞医院急诊危重症中心主任。主要从事冠心病介入治疗、心肺血管急危重症临床与研究工作。个人累计完成经皮冠状动脉介入治疗(支架术)达15000余例,擅长复杂冠心病介入治疗(如钙化病变旋磨术、闭塞病变介入治疗等)。 主持863项目一项,国家自然科学基金面上项目3项,以及多项省部级重点项目。', 'doctor_title': '主任医师', 'doctor_id': 4264, 'doctorName': '聂绍平', 'disease': 'gaoxueya'}]
Run example:
# try_minhash.py
import pickle

import pandas as pd

# args, BertForNer, bert_ner_model, trainUtils, get_forest, path_pre and recommend come from the project's own modules
df_csv = pd.read_csv('./gaoxueya-1.csv')
# Set the parameters
# Number of permutations
permutations = 128
forest = get_forest(df_csv, permutations)
# Number of recommendations to return (top-n candidates to recall)
num_recommendations = 100
# Number of doctor ids finally required
num_doctors = 5
# Test input text
raw_text = ' 一周前稍感胸闷,入院检查,心脏彩超,腹部彩超正常,心脏冠状动脉CT,显示狭窄 '
# Normalize full-width parentheses to ASCII and replace '+' with '&'
raw_text = raw_text.strip().replace('（', '(').replace('）', ')').replace('+', '&')
# Load the model
args.bert_dir = '../data/bert-base-chinese'
model_name = 'bert_bilstm_crf'  # model type: bert_bilstm, bert_bilstm_crf, bert_crf, bert
id2query = pickle.load(open('../data/id2query.pkl', 'rb'))
ent2id_dict = pickle.load(open('../data/ent2id_dict.pkl', 'rb'))
args.num_tags = len(ent2id_dict)
bertForNer = BertForNer(args, id2query)
model_path = '../entity_extract/checkpoints/{}/model.pt'.format(model_name)
model = bert_ner_model.BertNerModel(args)
model, device = trainUtils.load_model_and_parallel(model, args.gpu_ids, model_path)
model.eval()
# Recognize the medical entities in the test text
text_shiti = path_pre(raw_text, bertForNer, model, device)
# Recommend doctors
df = pd.read_csv('./haodaifu/doctors_gaoxueya.csv')
recommend(df_csv, text_shiti, df)

5. Doctor service index evaluation system

While taking economic indicators into account, doctor evaluation should emphasize social benefit, reflect a patient-centered approach with reasonable profit, and guide doctors to improve the quality of medical services. Based on a large-scale objective dataset, this study builds a service index rating system on six dimensions: the number of articles published by the doctor, the number of patient votes, the doctor's recommendation popularity, the number of post-consultation patient reports, the number of patients served online, and the total number of patients.
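
One simple way to combine several indicators into a single index is to min-max normalize each indicator and take a weighted sum. The sketch below illustrates that approach only. The article does not state how the project weights or aggregates its six dimensions, so the function names, the field names in the usage example, and the weights are all assumptions.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def service_index(doctors, weights):
    """Weighted sum of min-max normalized indicators, one score per doctor.

    doctors: list of dicts holding numeric indicator fields.
    weights: {field_name: weight}; weights are rescaled to sum to 1.
    """
    total_weight = sum(weights.values())
    scores = [0.0] * len(doctors)
    for field, weight in weights.items():
        normalized = min_max([float(d[field]) for d in doctors])
        for i, value in enumerate(normalized):
            scores[i] += weight / total_weight * value
    return scores
```

For example, `service_index([{"articles": 0, "patients": 100}, {"articles": 10, "patients": 300}], {"articles": 1, "patients": 1})` gives the second doctor the higher score. Values such as '22篇' or '367位' in the sample records would need their unit suffix stripped before being used as numbers.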

6. Visual display of the project

This project runs on the Django framework.

The web_server module contains the code that runs the platform.

6.1. Platform homepage

6.2. Disease self-diagnosis

6.3. Doctor recommendation

6.4. Doctor service index evaluation system

Origin blog.csdn.net/sinat_39620217/article/details/131697045