Science: ChatGPT and other AIs may have had a hand in 13.5% of biomedical papers in 2024

To measure the impact of large language models on written language, the researchers compared actual word frequencies in 2024 with predicted values. The forecast was based on data from 2021-2022, the period before the widespread adoption of LLMs. The researchers excluded 2023 data from the analysis, since it could already reflect the effect of AI chatbot use. Among all 26,657 words analyzed, the scientists found many terms that were strongly overused in 2024.
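The forecasting step can be illustrated with a short Python sketch. This is not the authors' code: the function name, the simple linear-extrapolation rule, and the sample frequencies are assumptions made here purely to show how a pre-LLM baseline for 2024 can be built from 2021-2022 data while skipping 2023.

```python
# Illustrative sketch (not the study's code): project an expected 2024 word
# frequency from the 2021-2022 baseline by simple linear extrapolation,
# skipping 2023 because that year may already contain LLM-influenced text.

def expected_frequency_2024(freq_2021: float, freq_2022: float) -> float:
    """Counterfactual 2024 frequency under a 'no LLM' assumption.

    Extends the 2021->2022 yearly trend two more years (through 2023 and 2024).
    The published study's projection may be more elaborate; this only
    illustrates the idea of a pre-LLM baseline forecast.
    """
    yearly_change = freq_2022 - freq_2021
    projected = freq_2022 + 2 * yearly_change
    return max(projected, 0.0)  # a frequency cannot fall below zero


# Hypothetical numbers: share of abstracts containing the word in each year.
print(expected_frequency_2024(freq_2021=0.001, freq_2022=0.0012))  # 0.0016
```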
Among the LLM marker words, the team identified various forms of delves, with an excess frequency ratio (r) of 28, underscores (r = 13.8), and showcasing (r = 10.7). The group also included potential, findings, and critical. The use of such marker words rose sharply in 2023-2024. For comparison, the excess frequency ratio of the word ebola in 2015 was 9.9, and of zika in 2017, 40.4.
[Figure: frequency ratio and frequency gap of "redundant" words in 2022-2024]
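The two quantities in the figure caption above reduce to simple formulas once observed and expected frequencies are known. The sketch below uses invented numbers chosen only so that r works out to roughly the value reported for "delves"; the function names are this article's, not the study's.

```python
# Illustrative sketch (not the study's code): the two excess-vocabulary
# metrics discussed above, computed from an observed 2024 frequency and the
# counterfactual expectation for 2024.

def excess_frequency_ratio(observed: float, expected: float) -> float:
    """r = observed / expected; r >> 1 means the word is strongly overused."""
    return observed / expected


def excess_frequency_gap(observed: float, expected: float) -> float:
    """delta = observed - expected; the absolute jump in usage frequency."""
    return observed - expected


# Hypothetical example loosely mirroring a marker word such as "delves":
observed_2024, expected_2024 = 0.014, 0.0005
print(excess_frequency_ratio(observed_2024, expected_2024))  # 28.0
print(excess_frequency_gap(observed_2024, expected_2024))    # 0.0135
```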
The researchers also hand-selected 900 unique "redundant" words that deviated from the standard vocabulary of scientific papers. During the COVID-19 pandemic, this excess vocabulary consisted almost entirely of content words (such as "respiratory" or "remdesivir"), whereas the redundant vocabulary of 2024 consisted almost entirely of style words. Content words that deviate from the core vocabulary are predominantly nouns (79.2%), so most of the "redundant" words before 2024 were also nouns. By contrast, of the 379 style words in 2024, 66% were verbs and 14% were adjectives.
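A part-of-speech breakdown like the one above can be tallied in a few lines once the words have been annotated by hand. The word list and labels below are a tiny illustrative sample, not the study's actual annotations.

```python
# Illustrative sketch (not the study's data): tallying the parts of speech of
# hand-annotated "redundant" words, in the spirit of the 2024 breakdown
# (mostly verbs and adjectives) summarized above.
from collections import Counter

# A tiny hand-labelled sample; the real study annotated hundreds of words.
annotated_excess_words = {
    "delves": "verb",
    "underscores": "verb",
    "showcasing": "verb",
    "pivotal": "adjective",
    "notable": "adjective",
    "insights": "noun",
}

counts = Counter(annotated_excess_words.values())
total = sum(counts.values())
for pos, n in counts.most_common():
    print(f"{pos}: {n}/{total} ({100 * n / total:.0f}%)")
# verb: 3/6 (50%)
# adjective: 2/6 (33%)
# noun: 1/6 (17%)
```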
Summing up the study, the authors acknowledged that their colleagues often use LLMs to improve the grammar, rhetoric, and overall readability of their texts, to help translate publications into English, and to quickly produce summaries. However, they also pointed out that language models often "invent" fake citations, draw inaccurate conclusions, and make false claims that sound authoritative and persuasive. Although experts can spot and correct factual errors in their own writing, this becomes harder when working with literature reviews and similar texts written by others.
In addition, LLMs can reproduce the biases and other shortcomings of their training data, and even produce outright plagiarism. This makes AI-generated texts less diverse and original than those written by humans. Such homogenization can lower the quality of scientific publications: for example, all AI-generated conclusions on a given topic may sound alike and contain the same ideas and references, which limits the emergence of new concepts and worsens the problem of unethical citation. The study's authors also worry that unscrupulous participants in the scientific process, such as "paper mills," could use language models to mass-produce fake publications.
The study's authors note that their method for detecting "redundant" words could help track future LLM use in academic publications, grant applications, and other texts. They also hope the analysis will inform much-needed debates around LLM policy by providing a way to measure the use of large language models.
There are other risks associated with using AI in healthcare. For example, researchers at Flinders University in Australia found that popular AI chatbots, including OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, and xAI's Grok Beta, can easily be repurposed to routinely give false answers to medical questions. The study's authors were able to get the models to produce fake citations from real medical journals and create an appearance of authority. Without proper safeguards, attackers could use these capabilities to mass-produce medical misinformation and spread it across the internet and social media, the experts warned.