The global COVID-19 research data set is officially open, with nearly 30,000 papers and the AI research tools needed to work with them!

Compiled by | Yuying

The epidemic has now spread worldwide, and dozens of laboratories around the world have closed one after another, which seriously hampers research on the coronavirus. At the same time, many countries and regions have jointly signed a request that COVID-19 research data sets and related papers be shared, and they recommend that publishers also provide the data in formats that AI software and other computer systems can use directly, to accelerate research.

The global COVID-19 research data set is officially open

Recently, technology organizations and academia formally announced a public data set, CORD-19, which contains all the papers on the novel coronavirus as of March 13. It collects nearly 30,000 documents, including work on the SARS-CoV-2 virus. The release is accompanied by supporting tools: SciSpacy, a text processing toolkit optimized for scientific text; SciBERT, a BERT model pre-trained on scientific text; and an open research corpus and API.

According to those involved, the new data set is machine-readable and can be easily parsed for machine learning purposes. To help researchers focus their work on the data, the National Academies of Sciences, Engineering, and Medicine, in collaboration with the World Health Organization, put forward "high priority" questions about the coronavirus. These questions cover genetics, treatment, symptoms, and prevention.
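To make "machine-readable" concrete, here is a minimal Python sketch of how one paper record might be flattened into plain text for a machine learning pipeline. The sample record is invented for illustration, and the field names (`paper_id`, `metadata`, `abstract`, `body_text`) are an assumption about the JSON layout rather than something stated in this article, so check them against the files you actually download.

```python
import json

# Hypothetical record, for illustration only. The field names are an
# assumption about the JSON layout, not a schema confirmed by this article.
SAMPLE_RECORD = """
{
  "paper_id": "example-0001",
  "metadata": {"title": "An illustrative coronavirus paper"},
  "abstract": [{"text": "We discuss symptoms and prevention."}],
  "body_text": [
    {"section": "Introduction", "text": "Background on transmission."},
    {"section": "Methods", "text": "Genome sequencing was performed."}
  ]
}
"""


def flatten_paper(raw_json: str) -> dict:
    """Turn one JSON paper record into an id, a title, and one text string."""
    record = json.loads(raw_json)
    # Abstract and body are assumed to be lists of paragraph objects.
    abstract = " ".join(p["text"] for p in record.get("abstract", []))
    body = " ".join(p["text"] for p in record.get("body_text", []))
    return {
        "paper_id": record.get("paper_id", ""),
        "title": record.get("metadata", {}).get("title", ""),
        # Skip empty parts so a missing abstract leaves no stray space.
        "text": " ".join(part for part in (abstract, body) if part),
    }


paper = flatten_paper(SAMPLE_RECORD)
print(paper["title"])
print(paper["text"])
```

The resulting `text` string can then be passed to whatever tokenizer or model a researcher prefers. Missing fields fall back to empty strings, so one incomplete record does not stop a batch job.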

Previously, 11 countries and regions, including the United States, Italy, South Korea, and the United Kingdom, had called on the relevant institutions to open these data sets for research. These institutions include PubMed Central (a service of the National Institutes of Health that archives biomedical and life science research literature) and the World Health Organization's Covid database. The open letter calling for open data states:

It is hoped that publishers will provide the data in a format that AI software and computer systems can read and use directly, rather than as simple PDF documents.

Beyond the research itself, the institutions involved also need to screen the content. Researchers have already published many results related to the epidemic, but because of time pressure many papers are still preprints that have not been peer-reviewed and may contain problems that review would catch. Edward Campion, executive editor of the New England Journal of Medicine (NEJM), has said:

We receive as many as 20 reports on the coronavirus every day, and frankly, some of them are not high-quality articles. Part of our responsibility is to select the content that we believe is most important to clinical audiences and public health audiences.

Kaggle is also hosting a COVID-19 open research data set challenge built on this release. It aims to encourage developers to use CORD-19 to find new insights about the epidemic across this large body of literature, covering topics such as the virus's history, transmission and diagnosis, management measures for human-animal contact, and lessons learned from previous epidemiological studies. Kaggle is offering winners a reward of USD 1,000 per task. For other prizes and further details, see the official website of the challenge.

COVID-19 open research data set address: https://pages.semanticscholar.org/coronavirus-research

The epidemic has closed many laboratories; open data can help sustain productivity

Because of the novel coronavirus epidemic, Harvard has recently closed laboratories or scaled them down on a large scale, and other institutions have done the same. This has held back much epidemic-related research, and open data sets are needed to help keep research productive.

Normally, laboratories decide how to operate based on the extent of the local outbreak, but the epidemic is now serious worldwide. Some laboratories have closed one after another, and others are discouraging staff from continuing research. Institutions that remain open still allow personnel into the laboratory, but they are limiting headcount to minimize the number of people gathered in a building. Stanley Perlman, a researcher at the University of Iowa who has long studied coronaviruses, said:

Students are no longer allowed to work in the laboratory, and graduate students face certain restrictions. This limits the number of people present at one time and reduces the chance of someone spreading the SARS-CoV-2 virus.

This has also affected epidemic-related research to some extent. Researchers say that public health and the safety of laboratory members are paramount, but they still worry that leaving the laboratory for several weeks or months will mean that certain experiments must be restarted or abandoned, wasting time and resources. Arturo Casadevall, professor of molecular microbiology and immunology at the Johns Hopkins Bloomberg School of Public Health, said: "So far, we have remained open, but the situation is very unstable." He is reportedly studying a treatment for Covid-19.

Chinese scientists contributed a lot of research and data

Wuhan first reported cases of pneumonia caused by a new type of coronavirus at the end of December last year. On January 8, Chinese scientists completed the sequencing of the virus genome and made it public. In other words, as early as January, before the epidemic was declared an international public health emergency, Chinese scientists had shared the first genome information for the SARS-CoV-2 virus, so that scientists around the world could join the fight against it.

On January 31 this year, the day after the outbreak was declared a public health emergency of international concern, 94 academic journals, societies, research institutions, and companies signed an agreement promising to make research and data about the disease freely available, at least for the duration of the outbreak.

The Public Library of Science (PLOS), one of the signatories, has always been open access; it charges authors rather than readers. Joerg Heber, editor-in-chief of PLOS, stated:

The Public Library of Science is ready to respond to any epidemic. In addition to open access, the journal requires that all the data needed to replicate a study be published with it. Even so, peer review takes time, so PLOS strongly encourages all researchers who submit coronavirus-related papers to also post them as preprints so that they are available as soon as possible.

Now that a data set bringing together these research results is officially open, it can not only speed up research on the epidemic but also inform research on related infectious diseases. Interested developers can visit the official website of the COVID-19 data set, agree to the relevant license terms, and download the data.

Related Links:

COVID-19 open research data set address: https://pages.semanticscholar.org/coronavirus-research

COVID-19 Open Research Data Set Challenge Address: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge


Origin blog.51cto.com/15060462/2675592