The developers hope to free scientists to focus on discovery and innovation by helping them surface connections in the vast scientific literature.
Image credit: The Project Twins
Mushtaq Bilal, a researcher focused on literary history, has invested a lot of time in futuristic technologies.
Bilal is a postdoctoral fellow at the University of Southern Denmark in Odense, where his research focuses on the evolution of the novel in nineteenth-century literature. He is best known, however, for his online tutorials, in which he serves as an unofficial ambassador between academia and the rapidly expanding set of search tools powered by artificial intelligence (AI).
Drawing on his literary background, Bilal has been deconstructing the process of academic writing for years, but now his work has taken on a new direction. "When ChatGPT came out last November, I realized that many steps in writing could be automated using different AI applications," he said.
A new generation of search engines, powered by machine learning and large language models, is going beyond keyword searches to extract and build associations from the intricate web of scientific literature. Some programs, such as Consensus, provide evidence-based answers to yes-or-no questions; others, such as Semantic Scholar, Elicit, and Iris, act as digital assistants, organizing reference lists, recommending new papers, and generating research summaries. Overall, these platforms facilitate the early stages of writing. Critics, however, point out that the programs remain largely untested and risk perpetuating existing biases in the scholarly publishing process.
The teams behind the tools say they are designed to tackle "information overload" and unleash scientists' creativity. According to Daniel Weld, chief scientist of Semantic Scholar at the Allen Institute for Artificial Intelligence in Seattle, Washington, scientific knowledge is growing at such a rate that it is nearly impossible for scientists to keep up with the latest developments. "Most search engines will help you find papers, but it's up to you to pull the information out of them," he said. AI tools can make that information more accessible by distilling papers down to their key points, Weld said. "We're all big fans of Google Scholar, and I still find it helpful, but we can do better."
## The next great idea
The key to doing better lies in a different type of search. Google Scholar, PubMed, and other standard search tools use keywords to locate similar papers. In contrast, AI algorithms use vector comparisons: papers are translated into sets of numbers, called vectors, whose closeness in a "vector space" corresponds to their similarity. "We can parse more of the meaning of a search query because more contextual information is embedded in the vector than in the text itself," explained Megan Van Welie, principal software engineer at Consensus in San Francisco, California.
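As an illustrative sketch of the idea (the paper titles and four-dimensional vectors below are invented; real systems use embeddings with hundreds of dimensions produced by a language model), a vector search ranks papers by the closeness of their embeddings to the query's embedding, often measured with cosine similarity:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" made up for illustration only.
papers = {
    "CRISPR gene editing in crops": [0.9, 0.1, 0.0, 0.2],
    "Genome engineering of wheat":  [0.6, 0.4, 0.3, 0.3],
    "Deep learning for chess":      [0.0, 0.9, 0.8, 0.1],
}
# Pretend embedding of the query "gene editing in agriculture".
query = [0.85, 0.15, 0.05, 0.25]

# Rank papers by how close their vectors are to the query vector.
ranked = sorted(papers, key=lambda p: cosine_similarity(query, papers[p]),
                reverse=True)
print(ranked[0])  # the CRISPR paper ranks first
```

Note that the two gene-editing papers rank near each other even though they share few keywords; that proximity in vector space, rather than exact word overlap, is what Van Welie describes.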
Bilal went down an intriguing rabbit hole using AI tools to trace connections between papers. While researching depictions of Muslims in Pakistani fiction, AI-generated recommendations based on his searches led him to Bengali literature, which he went on to discuss in his dissertation. In his postdoctoral work, Bilal focuses on how Andersen's fairy tales were interpreted in colonial India. "All the time spent on the history of Bengali literature came in handy," he said. Bilal uses Elicit to iterate on and refine his research questions, Research Rabbit to identify sources, and Scite (which tells users not only how often a paper has been cited, but also what the citing papers say about it) to track scholarly work.
Mohammed Yisa, a research technician on the vaccinology team at the Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, follows Bilal on Twitter (now called X) and sometimes spends time testing the platforms mentioned in Bilal's posts.
Yisa particularly enjoys using Iris, a visual search engine that creates a map-like visualization linking papers to topics. Entering "seed papers" into Iris generates a nested map of related publications, rather like a world map. Clicking deeper into the map is like zooming from a country-level view down to states (subtopics) and cities (individual papers).
"I consider myself a visual learner, and map visualizations are things I've never seen before," said Yisa, who is currently using the tools to identify review articles on vaccine equity, "Look who's talking about it right now, what is being said, and what has not been mentioned".
Other tools, such as Research Rabbit and LitMaps, connect papers together through a network map of nodes. System Pro, a search engine aimed at medical professionals, creates similar visualizations, with topics linked together by correlations.
While these search tools rely on "extractive" algorithms to pull out useful snippets, some platforms are rolling out generative capabilities that use AI to produce new text. For example, the Allen Institute's Semantic Reader "brings AI into the reading experience of PDF manuscripts," Weld said. When the user encounters a symbol in an equation or a citation in the text, a card pops up showing the symbol's definition or an AI-generated summary of the cited paper.
Elicit is testing a brainstorming feature for generating better queries, as well as a way to provide multi-paper abstracts summarizing the top four search results. The approach uses OpenAI's ChatGPT but restricts it to scientific papers, making it less prone to "hallucinations" (generated text that appears correct but is actually inaccurate) than searches drawing on the entire Internet, and the tolerance for such errors is lower, explains James Brady, Ought's director of engineering. "If you're making a claim that your reputation rests on, scientists want more reliable information that they can trust."
Miles-Dei Olufeagba, a biomedical researcher at the University of Ibadan in Nigeria, still considers PubMed the gold standard, calling it a "refuge for medical scientists." Olufeagba tried Consensus, Elicit and Semantic Scholar. The results from PubMed may take more time to sort through, but ultimately lead to higher-quality papers, he said. AI tools "tend to lose some information that is critical for literature searches," he said.
AI tools can help researchers dig deeper into the literature and find new research fronts. However, such tools have problems. First, they may replicate and amplify existing biases. For example, if an AI tool relies primarily on English-language research literature, it may ignore non-English research output. Likewise, if a machine learning model is trained primarily on Western research literature, it may be biased towards Western perspectives and methods. Second, these tools can oversimplify complex scientific papers, leading to misinterpretation or misleading conclusions.
Despite these issues, many researchers are optimistic about the potential of these tools. Bilal says that while he notes some limitations, he still finds the tools very helpful for his research. "I feel like it's a strength that helps me be more productive, better understand what I'm reading, and find new connections," he said.
## Early days
AI platforms are also prone to the same biases as their human creators. Research has repeatedly demonstrated that scholarly publishing and search engines present disadvantages to certain groups, including women[1] and people of color[2], and these disadvantages also exist in AI tools.
For example, scientists whose names contain accented characters have described difficulties creating profiles on Semantic Scholar. And because several search engines, including Semantic Scholar and Consensus, use metrics such as citation counts and impact factors to determine rankings, work published in prestigious journals or that attracts attention will consistently outrank more relevant research, producing what Weld calls a "rich get richer" effect. (Consensus co-founder and CEO Eric Olson says a paper's relevance to a query is always the number-one metric in determining its rank.)
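The mechanism behind this effect can be sketched with a toy scoring function (entirely hypothetical; the actual ranking formulas of these engines are not public): if citation counts contribute to the score, a heavily cited paper can outrank a more relevant one.

```python
from math import log1p

def rank_score(relevance, citations, citation_weight=0.2):
    """Hypothetical ranking: relevance plus a log-scaled citation bonus."""
    return relevance + citation_weight * log1p(citations)

# A new, highly relevant paper vs. a slightly less relevant but famous one.
new_paper = rank_score(relevance=0.90, citations=5)
famous_paper = rank_score(relevance=0.80, citations=5000)
print(famous_paper > new_paper)  # the heavily cited paper wins
```

Because the citation term grows with every new citation, papers that are already visible accumulate score faster, which is the feedback loop Weld describes.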
These engines display preprints alongside formally peer-reviewed papers without explicitly flagging them as content requiring greater scrutiny. And for controversial questions, such as whether childhood vaccines cause autism or whether humans contribute to global warming, Consensus sometimes surfaces false or unverified answers. For these contentious issues, Olson says, the team sometimes manually reviews the results and flags controversial papers.
However, the developers say it is ultimately the user's responsibility to verify any claims. The platforms typically indicate when they are in beta testing, and some also display markers of paper quality. In addition to a "controversial" label, Consensus is developing ways to label the type of study, the number of participants, and the source of funding; Elicit has a similar feature.
But Sasha Luccioni, a scientist at the Montreal, Canada-based AI company Hugging Face, warns that some AI companies release products prematurely because they rely on users to improve them, a common practice in the tech start-up world that does not sit well with the scientific community. Some teams are reluctant to make their models public, which makes ethical scrutiny difficult. Luccioni, for example, studies the carbon footprint of AI models, but she says it is hard to get basic data, such as a model's size or its training time: "basic things that don't give away any secrets." While early platforms such as Semantic Scholar shared their underlying software so that others could build on it (Consensus, Elicit, Perplexity, Connected Papers, and Iris all use the Semantic Scholar corpus), "nowadays, companies don't share anything, so it's less a scientific question and more a product question."
For Weld, this makes it all the more necessary to ensure the transparency of Semantic Scholar. "I do think AI is moving fast, and the incentive to 'get ahead of everyone else' could be pushing us in dangerous directions," he said. "But I also think that AI technology can have enormous benefits. Some of the major challenges facing the world are best solved through a truly vibrant research enterprise, and that's what fills me with passion every morning: helping to improve scientists' productivity."
References:
[1] Ross, M. B. et al. Nature 608, 135–145 (2022).
[2] Salazar, J. W. et al. JAMA Int. Med. 181, 1248–1251 (2021).
doi: https://doi.org/10.1038/d41586-023-01907-z