ArXiv is one of the most important open access repositories for scientific research on the internet. Founded in 1991, it’s also one of the longest-running internet repositories, predating the World Wide Web. ArXiv is an open, curated research sharing platform, run by scientists for scientists, and it is free to read and free to submit. It covers physics, mathematics, computer science, economics, and quantitative biology.
Today, you can also visit bioRxiv, PsyArXiv, medRxiv, and the recent SocArXiv (hosted by the University of Maryland), all open access repositories for their respective fields where scientific work can be published and read for free.
There’s just one problem.
“We get about 20,000 submissions a month,” says Victor Galitski, who in addition to teaching quantum physics at UMD also works as a moderator for arXiv. “If you convert arXiv monthly submissions into how long it would take a human to read everything, it’s about eight years,” he says. “If you want to read all the papers on COVID-19 research—which is just the last four years—it would take you 150 years.”
Galitski recently shared these issues in “AI and the Future of Scientific Publishing,” a livestream discussion hosted by Wolfram Research. The problem isn’t the amount of data or the storage of data—it’s that it’s not possible to read everything, let alone find what you are looking for.
That’s where ScienceCast comes in. ScienceCast.org hosts a suite of tools Galitski and colleagues created using artificial intelligence that help sort through and make sense of the massive volume of research data on arXiv, bioRxiv, and other open repositories. An AI summarization tool like ScienceCast can quickly run through a paper, pull out relevant information, and generate a short summary via text or audio to help people find and evaluate the abundance of information.
“The artificial intelligence agent processes the paper, extracts the key facts from the paper, then projects it onto a general-audience level, removing the jargon and summarizing it in a kind of elevator pitch,” explains Galitski. Users can adjust the level of expertise in a summary, creating an accessible entry point for students new to the field, nonscientists, reporters, or just regular readers who want to learn more about a particular scientific issue.
Take the February 2024 immunology study “Second Boost of Omicron SARS-CoV-2 S1 Subunit Vaccine Induced Broad Humoral Immune Responses in Elderly Mice” on bioRxiv. ScienceCast can help demystify the technical paper:
“The study found that a second booster shot of a new SARS-CoV-2 vaccine variant, given to elderly mice, generated strong immune responses against multiple variants of the virus, including the Omicron variant. However, it also showed that the strength of the immune responses decreased with the age of the mice.”
This accessible summary can help researchers and nonresearchers alike make sense of highly technical papers and navigate the huge volume of work found in open access repositories.
ScienceCast.org also offers AI-powered conversational search and an interface for chatting with research papers. You can ask questions, such as “tell me more about the SARS-CoV-2 vaccine.” These tools use an AI technique called embedding, which represents the meaning of text as numerical vectors so that related passages can be compared mathematically. Embeddings can also be used to explore contradictions and similarities in papers and produce knowledge graphs.
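To give a rough sense of how embedding-based search works, here is a minimal Python sketch. It is an illustration only, not ScienceCast’s implementation: production systems typically get their vectors from a trained language model, while this toy version assumes simple word counts as a stand-in. The function names (`embed`, `cosine_similarity`, `search`) and the two sample abstracts are hypothetical, made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse word-count vector.

    Assumption: real tools use vectors from a learned model; word counts
    are only a stand-in so the example runs with the standard library.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, papers: dict[str, str]) -> list[tuple[float, str]]:
    """Rank papers by similarity between the query and each abstract."""
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(abstract)), title)
        for title, abstract in papers.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical mini-corpus; titles and abstracts are invented for illustration.
PAPERS = {
    "vaccine booster in elderly mice":
        "second booster of sars-cov-2 vaccine induced immune responses in elderly mice",
    "topological superconductors":
        "majorana modes in topological superconductors and quantum wires",
}

ranked = search("tell me more about the SARS-CoV-2 vaccine", PAPERS)
for score, title in ranked:
    print(f"{score:.2f}  {title}")
```

The vaccine paper ranks first because its vector shares terms with the query, while the physics paper shares none. With learned embeddings, the same comparison can also match texts that use different words for the same idea, which is what makes conversational search over papers practical.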
Some in academia may hear “AI” and bristle, immediately thinking of students using ChatGPT dishonestly, AI’s propensity to “hallucinate” by producing incorrect or misleading results, or intellectual property concerns. While these are valid issues, AI is a broad field that takes many forms and has many uses. For example, we use AI every day, maybe without even noticing it, when checking social media, reading automatically generated captions or translations, or even browsing Netflix.
“As with any technology, AI has positive uses and negative uses. Our desire for these AI tools is to have a positive impact,” says Galitski. “I would love more people to read our physics research and understand it. But most people are not able to read our papers because they are too technical, too specialized. AI tools allow a broader audience to access research, whether it’s mine or somebody else’s. I would say that’s a positive thing.”
ScienceCast is just one example of how AI can be useful in the world of open access publishing, allowing more people to find and understand the plethora of knowledge being shared every day.