AI search engines help scientists innovate

The developers hope that, by helping scientists surface connections in the vast research literature, these tools will free them up to focus on discovery and innovation.


Image credit: The Project Twins

For a researcher focused on the past, Mushtaq Bilal has invested a lot of time in futuristic technologies.

Bilal is a postdoctoral fellow at the University of Southern Denmark in Odense, where his research focuses on the evolution of the novel in nineteenth-century literature. He is perhaps most influential, however, through his online tutorials, in which he acts as an unofficial ambassador between academia and a rapidly expanding set of search tools that use artificial intelligence (AI).

Drawing on his literary background, Bilal has spent years deconstructing the process of academic writing, but his work has now taken a new direction. "When ChatGPT came out last November, I realized that many steps in writing could be automated using different AI applications," he said.

A new generation of search engines, powered by machine learning and large language models, goes beyond keyword matching to extract and build connections from the intricate web of scientific literature. Some programs, such as Consensus, give research-backed answers to yes-or-no questions; others, such as Semantic Scholar, Elicit, and Iris, act as digital assistants, tidying up reference lists, recommending new papers, and generating research summaries. Overall, these platforms can ease the early stages of writing. Critics, however, point out that the tools are largely untested and risk perpetuating the biases already present in scholarly publishing.

The teams behind these tools say they are designed to tackle "information overload" and unleash scientists' creativity. Scientific knowledge is growing at such a rate that it is nearly impossible for researchers to keep up with the latest developments, says Daniel Weld, chief scientist of Semantic Scholar at the Allen Institute for Artificial Intelligence in Seattle, Washington. "Most search engines will help you find papers, but it's still up to you to pull the information out of them," he said. AI tools can make that information more accessible by distilling papers down to key points, Weld said. "We're all big fans of Google Scholar, and I still find it helpful, but we can do better."
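As an illustration of distilling papers to key points, here is a minimal sketch that queries the publicly documented Semantic Scholar Graph API for papers on a topic and prints their machine-generated TLDR summaries where available. The search term is an arbitrary example, and this is only a sketch of how such a corpus can be queried, not a description of how the commercial tools in this article are built.

```python
# Minimal sketch: fetching condensed "key points" (TLDR summaries) for papers
# on a topic via the public Semantic Scholar Graph API. The query term is an
# arbitrary example; error handling is kept deliberately simple.
import requests

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def key_points(query: str, limit: int = 5):
    """Yield (title, summary) pairs for the top papers matching `query`."""
    resp = requests.get(
        SEARCH_URL,
        params={"query": query, "limit": limit, "fields": "title,year,tldr"},
        timeout=30,
    )
    resp.raise_for_status()
    for paper in resp.json().get("data", []):
        tldr = (paper.get("tldr") or {}).get("text") or "(no summary available)"
        yield f"{paper['title']} ({paper.get('year')})", tldr

if __name__ == "__main__":
    for title, summary in key_points("literature search information overload"):
        print(f"- {title}\n  {summary}")
```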

## The next great idea

The key to doing better lies in a different kind of search. Google Scholar, PubMed, and other standard search tools use keywords to locate similar papers. AI algorithms, by contrast, use vector comparisons: papers are translated into sets of numbers, called vectors, whose closeness in a "vector space" corresponds to their similarity. "We can parse more of the meaning of a search query because more contextual information is embedded in the vector than in the text itself," explained Megan Van Welie, principal software engineer at Consensus in San Francisco, California.
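To make the idea concrete, here is a minimal sketch of vector search using the open-source sentence-transformers library; the model name, the example abstracts, and the query are illustrative assumptions, not details of any product mentioned above.

```python
# Sketch of vector ("semantic") search: texts are mapped to vectors and
# ranked by cosine similarity instead of keyword overlap.
# The model, abstracts and query below are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

abstracts = [
    "A randomized trial of an mRNA vaccine against seasonal influenza.",
    "Depictions of colonial Bengal in nineteenth-century novels.",
    "Graph neural networks for predicting drug-target interactions.",
]
query = "machine learning methods for drug discovery"

# Encode everything into the same vector space; unit-normalized vectors make
# the dot product equal to cosine similarity.
doc_vecs = model.encode(abstracts, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec            # one similarity score per abstract
for idx in np.argsort(-scores):          # best match first
    print(f"{scores[idx]:.3f}  {abstracts[idx]}")
```

The point of the vector representation is that the query and the most relevant abstract can land close together even though they share few literal keywords, which is exactly where keyword search tends to fall short.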

Using AI tools to trace connections between papers took Bilal down an interesting rabbit hole. While researching depictions of Muslims in Pakistani fiction, AI-generated recommendations based on his searches led him to Bengali literature, which he went on to discuss in his dissertation. In his postdoctoral work, Bilal has focused on how Andersen's fairy tales were interpreted in colonial India. "All that time spent on the history of Bengali literature came back," he said. Bilal uses Elicit to iterate and refine his questions, Research Rabbit to identify sources, and Scite, which tells users not only how often a paper has been cited but also what the citing papers say, to track scholarly work.

Mohammed Yisa, a research technician on the vaccinology team at the Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, follows Bilal on Twitter (now called X) and sometimes spends time testing the platforms mentioned in Bilal's posts.

Yisa particularly enjoys Iris, a visual search engine that creates a map-like view linking papers to topics. Entering "seed papers" into Iris generates a nested map of related publications, rather like a world map: clicking deeper into the map is like zooming from a country-wide view down to states (subtopics) and cities (individual papers).

"I consider myself a visual learner, and map visualizations are things I've never seen before," said Yisa, who is currently using the tools to identify review articles on vaccine equity, "Look who's talking about it right now, what is being said, and what has not been mentioned".

Other tools, such as Research Rabbit and LitMaps, connect papers together through a network map of nodes. System Pro, a search engine aimed at medical professionals, creates similar visualizations, with topics linked together by correlations.

While such searches rely on "extractive" algorithms that pull out useful snippets, some platforms are rolling out generative features, which use AI to produce new text. For example, the Allen Institute's Semantic Reader "brings AI into the reading experience of PDF manuscripts," Weld said. If the user clicks on a symbol in an equation, or on an in-text citation, a card pops up showing the symbol's definition or an AI-generated summary of the cited paper.

Elicit is testing a brainstorming feature for generating better queries, and is building a way to provide multi-paper summaries of the top four search results. The method uses OpenAI's ChatGPT but is trained only on scientific papers, so it is less prone to "hallucinations" (generated text that looks correct but is in fact inaccurate) than searches based on the entire Internet. Scientists' tolerance for such errors is lower, explains James Brady, Ought's director of engineering: "Scientists staking their reputation on a claim want more reliable information that they can trust."

Miles-Dei Olufeagba, a biomedical researcher at the University of Ibadan in Nigeria, still considers PubMed the gold standard, calling it a "refuge for medical scientists." Olufeagba tried Consensus, Elicit and Semantic Scholar. The results from PubMed may take more time to sort through, but ultimately lead to higher-quality papers, he said. AI tools "tend to lose some information that is critical for literature searches," he said.

AI tools can help researchers dig deeper into the literature and find new research fronts, but they come with problems. First, they may replicate and amplify existing biases. If an AI tool relies primarily on English-language literature, for example, it may overlook research published in other languages; and if a machine-learning model is trained mostly on Western research, it may skew towards Western perspectives and methods. Second, these tools can oversimplify complex scientific papers, leading to misinterpretation or misleading conclusions.

Despite these issues, many researchers are optimistic about the potential of these tools. Bilal says that while he notes some limitations, he still finds the tools very helpful for his research. "I feel like it's a strength that helps me be more productive, better understand what I'm reading, and find new connections," he said.

## Early days

AI platforms are also prone to the same biases as their human creators. Research has repeatedly shown that scholarly publishing and search engines disadvantage certain groups, including women[1] and people of color[2], and these disadvantages carry over into AI tools.

For example, scientists whose names contain accented characters have described difficulties creating profiles on Semantic Scholar. And because several search engines, including Semantic Scholar and Consensus, use metrics such as citation counts and impact factors to determine rankings, work published in prestigious journals or that attracts attention tends to rank ahead of work that is more relevant to the research question, producing what Weld calls a "rich get richer" effect. (Consensus co-founder and CEO Eric Olson says a paper's relevance to a query is always the number-one metric in determining its rank.)
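To see why this worries critics, consider a purely hypothetical scoring function (not the formula of Semantic Scholar, Consensus, or any other product named here) that blends a citation-count boost into a relevance score; even with relevance weighted more heavily, a heavily cited but less relevant paper can come out on top.

```python
# Purely hypothetical ranking sketch, not any product's actual formula:
# blending a citation-count boost into a relevance score shows how heavily
# cited papers can outrank more relevant but less visible work.
import math

def blended_score(relevance: float, citations: int, boost_weight: float = 0.3) -> float:
    """relevance is assumed to lie in [0, 1]; the citation boost grows logarithmically."""
    citation_boost = math.log1p(citations) / math.log1p(100_000)  # roughly [0, 1]
    return (1 - boost_weight) * relevance + boost_weight * citation_boost

papers = [
    ("Highly relevant preprint with few citations", 0.92, 3),
    ("Loosely related review in a prestigious journal", 0.70, 45_000),
]

# Rank by blended score: the heavily cited review ends up first despite
# being less relevant to the query.
for name, relevance, citations in sorted(papers, key=lambda p: -blended_score(p[1], p[2])):
    print(f"{blended_score(relevance, citations):.3f}  {name}")
```

Real ranking systems are far more sophisticated, but the basic tension between relevance and popularity signals is the one the critics describe.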

These engines display preprints alongside formally peer-reviewed papers without explicitly flagging them as content that warrants extra scrutiny. And for controversial questions, such as whether childhood vaccines cause autism or whether humans contribute to global warming, Consensus sometimes returns answers that echo false or unsupported claims. For such contentious issues, Olson says, the team sometimes reviews the results manually and flags controversial papers.

However, the developers say it is ultimately the user's responsibility to verify any claims. The platforms usually indicate when they are in beta testing, and some also display signals of a paper's quality. In addition to a "controversial" label, Consensus is developing ways to label the type of study, the number of participants, and the source of funding; Elicit has a similar feature.

But Sasha Luccioni, a scientist at Hugging Face, an AI company based in Montreal, Canada, warns that some AI companies release products prematurely because they rely on users to improve them, a common practice in the tech start-up world that does not sit well with science. Some teams are also reluctant to make their models public, which makes it difficult to subject them to ethical scrutiny. Luccioni, for example, studies the carbon footprint of AI models, but says it is hard to get even basic data, such as a model's size or how long it was trained for, "basic things that don't give away any secrets". And while early platforms such as Semantic Scholar shared their underlying software so that others could build on it (Consensus, Elicit, Perplexity, Connected Papers, and Iris all use Semantic Scholar corpora), "nowadays, companies don't provide any of that, so it's less of a scientific question and more of a product question."

For Weld, that makes the transparency of Semantic Scholar all the more essential. "I do think AI is moving fast, and the incentive to 'get ahead of everyone else' could be pushing us in dangerous directions," he said. "But I also think that AI technology can have enormous benefits. Some of the major challenges facing the world are best solved through a truly vibrant research program, and that is what fills me with passion every morning: helping to improve scientists' productivity."

References:

[1] Ross, M. B. et al. Nature 608, 135–145 (2022).

[2] Salazar, J. W. et al. JAMA Intern. Med. 181, 1248–1251 (2021).

Read the original article:

doi: https://doi.org/10.1038/d41586-023-01907-z

