MedGPT, the first domestic medical large language model, is released, and professionally labeled medical data becomes key

On May 25th, Medical Alliance, a domestic internet hospital and chronic disease management platform, officially released MedGPT, an independently developed model based on the Transformer architecture and described as the first domestic medical large language model.

Unlike general-purpose large language model products, MedGPT is aimed at delivering practical diagnostic and treatment value in real medical scenarios, with intelligent support across the full process of disease prevention, diagnosis, treatment, and rehabilitation.

MedGPT currently has 100 billion parameters. The pre-training phase used more than 2 billion pieces of medical text data, and the fine-tuning phase used 8 million pieces of high-quality structured clinical diagnosis and treatment data. More than 100 doctors took part in supervised fine-tuning based on human feedback.

 

At this stage, vertical medical models face difficulties in the following areas:

Low data quality in the medical industry

Medical data is often of relatively low quality, and some of it is inaccurate or incomplete, which hurts the learning and prediction performance of large models.

Insufficient data

Compared with other industries, the medical industry has relatively little data, and medical service data is highly fragmented, which can limit the accuracy and sensitivity of large models.

Data privacy and security

Medical data contains personal and sensitive information, so managing it securely and protecting patient privacy is a major concern.

Lack of standardization in the medical industry

Data and workflows in the medical industry lack standardization. Different medical institutions use different systems, and each system follows its own standards, which makes data sharing and collaboration between institutions complex and difficult.

High real-time requirements

Medical data sometimes needs to be processed and responded to quickly, which places higher demands on the real-time performance of the model.

Shortage of interdisciplinary talent

"AI + medicine" is a highly specialized interdisciplinary field with great demand for people who can work across both disciplines. Medical knowledge is itself finely subdivided, and combining it deeply with algorithms places very high demands on a practitioner's overall ability.

AI medical large models require strong data support. Labeled data is essential for building such models and plays an important role in them.

Labeled data is crucial to improving the performance of AI medical large models. By analyzing, training on, and validating against labeled data, an AI medical large model can identify a patient's condition more accurately and give doctors strong support in formulating more precise treatment plans. With labeled data, medical institutions can better control data quality and consistency and reduce data bias, which in turn improves model accuracy and interpretability, allows more accurate and refined models to be trained, and provides patients with better medical services.
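
One common way to quantify the labeling consistency mentioned above is inter-annotator agreement. The following is a minimal sketch, not a method the article attributes to MedGPT or to any labeling platform: it computes Cohen's kappa for two annotators who labeled the same items. The function name `cohens_kappa` and the sample labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label by chance,
    # estimated from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on four imaging reports.
annotator_1 = ["pneumonia", "pneumonia", "normal", "normal"]
annotator_2 = ["pneumonia", "normal", "pneumonia", "normal"]

print(cohens_kappa(annotator_1, annotator_1))  # identical labels -> 1.0
print(cohens_kappa(annotator_1, annotator_2))  # chance-level agreement -> 0.0
```

A low kappa on a sample of items suggests that the labeling guidelines are ambiguous or that annotators need more training before a larger labeling batch is started.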

Jinglianwen Technology is a leading enterprise in the AI basic data industry, with large reserves of high-quality medical data. It holds 100 GB of medical knowledge texts covering the latest research results in different medical fields. It has a large number of professional medical papers, sourced from multiple search platforms at home and abroad and through cooperation with more than 40 professional universities and more than 40 professional medical organizations and associations at home and abroad. It also holds 100 GB of high-resolution, accurate medical images, including CT, MRI, and ultrasound. This data can help AI medical large language models learn and diagnose better and better understand and simulate scenarios such as doctor-patient communication and the diagnosis and treatment process, improving diagnostic accuracy and efficiency. All of the data is labeled and checked by professional medical personnel to ensure high quality.

JLW Technology has a wealth of medical expert resources. Experts in the medical field can comprehensively label data in vertical fields, ensuring data quality and meeting current labeling needs.

Jinglianwen Technology has a team of 5,000 professional medical students with rich annotation experience and has established in-depth cooperation with 10 professional medical schools. With extensive experience in image and text annotation, it can provide image and NLP data collection and data labeling services for large medical models, and it deploys labelers according to customer needs.

The JLW Intelligent Medical Labeling Platform supports labeling of multiple types of medical data. It can supply rich, precise, and structured medical knowledge for AI medical large models and provides a more scientific and accurate basis for customized medical data labeling services.

JLW Technology|Data Collection|Data Labeling

Supporting artificial intelligence technology and empowering the intelligent transformation and upgrading of traditional industries

The copyright of the text and graphics of the article belongs to Jinglianwen Technology. For commercial reprinting, please contact Jinglianwen Technology for authorization. For non-commercial reprinting, please indicate the source.

Origin blog.csdn.net/weixin_55551028/article/details/131081864