
AI software that spots duplicated images in research papers can work faster and on a larger scale than manual checkers — but still needs editorial oversight. Credit: Laurence Dutton/Getty

Just before a study appears in any of ten journals published by the American Association for Cancer Research (AACR), it undergoes an unusual extra check. Since January 2021, the AACR has been using artificial intelligence (AI) software on all manuscripts it has provisionally accepted after peer review. The aim is to automatically alert editors to duplicated images, including those in which parts have been rotated, filtered, flipped or stretched.

The AACR is an early adopter in what could become a trend. Hoping to avoid publishing papers with images that have been doctored — whether because of outright fraud or inappropriate attempts to beautify findings — many journals have hired people to manually scan submitted manuscripts for issues, often using software to help check what they find. But Nature has learnt that in the past year, at least four publishers have started automating the process by relying on AI software to spot duplications and partial duplications before manuscripts are published.

The AACR tried numerous software products before it settled on a service from Proofig, a firm in Rehovot, Israel, says Daniel Evanko, director of journal operations at the association in Philadelphia, Pennsylvania. “We’re very happy with it,” he adds. He hopes the screening will aid researchers and reduce problems after publication.

Professional editors are still needed to decide what to do when the software flags images. If data sets are deliberately shown twice — with explanations — then repeated images might be appropriate, for instance. And some duplications might be simple copy-and-paste errors during manuscript assembly, rather than fraud. All this can be resolved only with discussions between editors and authors. Now that AI is getting sufficiently effective and low-cost, however, specialists say a wave of automated image-checking assistants could sweep through the scientific publishing industry in the next few years, much as using software to check manuscripts for plagiarism became routine a decade ago. Publishing-industry groups also say they are exploring ways to compare images in manuscripts across journals.

Other image-integrity experts welcome the trend, but caution that there has been no public comparison of the various software products, and that automated checks might throw up too many false positives or miss some kinds of manipulation. In the long term, a reliance on software screening might also push fraudsters to use AI to dupe software, much as some tweak text to evade plagiarism screening. “I am concerned that we are entering an arms race with AI-based tech that can lead to deepfakes that will be impossible to find,” says Bernd Pulverer, chief editor of EMBO Reports in Heidelberg, Germany.


This constructed example from the image-checking software firm Proofig shows how its program compares parts of images (red rectangles, left) and flags up identical parts even in stretched or rotated pictures. The blue lines indicate that the AI sees hundreds of identical features. Credit: Adapted from CDC/Proofig

Software’s moment?

Researchers have been developing image-checking AI for years because of concerns about errors or fraud — which are probably polluting the scientific literature to a much greater extent than the limited numbers of retractions and corrections suggest. In 2016, a manual analysis1 of around 20,000 biomedical papers led by microbiologist Elisabeth Bik, a consultant image analyst in California, suggested that as many as 4% might contain problematic image duplications. (Typically only about 1% of papers receive corrections each year, and many fewer are retracted.)

“I am aware of around 20 people working on developing software for image checking,” says Mike Rossner, who runs the consultancy firm Image Data Integrity in San Francisco, California, and introduced the first manual screening of manuscripts at the Journal of Cell Biology 20 years ago. Last year, publishers joined together to form a working group to set standards for software that screens papers for image problems; the group issued guidelines this year on how editors should tackle doctored images, but hasn’t yet produced guidance on software.

Several academic groups and companies have told Nature that journals and government agencies are trialling their software, but Proofig is the first to name clients publicly. Besides the AACR, the American Society for Clinical Investigation started using Proofig’s software for manuscripts in the Journal of Clinical Investigation (JCI) and JCI Insight in July, says Sarah Jackson, executive editor of those journals in Ann Arbor, Michigan. And SAGE Publishing adopted the software in October for five of its life-sciences journals, says Helen King, head of transformation at SAGE in London.

Proofig’s software extracts images from papers and compares them in pairs to find common features, including partial duplications. A typical paper is checked in a minute or two; the software can also correct for tricky issues such as the compression artefacts that can arise when high-resolution raw data are compressed into smaller files, says Dror Kolodkin-Gal, the firm’s founder. “The computer has an advantage over human vision,” he says. “Not only does a computer not get tired and run much faster, but it is also not affected by manipulations in size, location, orientation, overlap, partial duplication and combinations of these.”
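Proofig has not published the details of its method, but the general task it describes — finding features that two image regions share even after rotation, scaling or flipping — can be illustrated with standard computer-vision tools. The sketch below is an illustrative approximation under that assumption, not the firm’s algorithm: it assumes two extracted figure panels saved as hypothetical files panel_a.png and panel_b.png, and uses OpenCV’s ORB keypoint detector with a brute-force matcher.

```python
# Illustrative only: keypoint matching between two figure panels, in the spirit
# of the pairwise comparisons described above. ORB features tolerate rotation,
# scaling and flips; a large number of surviving matches between supposedly
# independent panels would be something for an editor to review.
import cv2


def count_shared_features(path_a: str, path_b: str, ratio: float = 0.75) -> int:
    """Count keypoint matches between two images (hypothetical file paths)."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des_a, des_b, k=2)

    # Lowe's ratio test discards ambiguous matches.
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return len(good)


if __name__ == "__main__":
    n = count_shared_features("panel_a.png", "panel_b.png")
    print(f"{n} shared features; a high count warrants editorial review")
```

A production system would add many refinements — panel segmentation, tolerance for compression artefacts and calibrated thresholds among them — but the core idea of matching local features that survive geometric transformations is the same.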

The cost of image checking is much higher than that of plagiarism checking, which specialists say runs to less than US$1 per paper. Kolodkin-Gal declined to discuss pricing in detail, but said that contracts with publishers tend to be priced on the basis of the number of images in a paper and the volume of manuscripts. He says the fees equate to per-paper charges “closer to tens of dollars than hundreds of dollars”.

At the JCI, says Jackson, the software picks up more problems than did previous manual reviews by staff members. But staff are still essential to check Proofig’s output, and it was important that the journal already had a system of procedures for dealing with various image concerns. “We really feel that rigorous data is an absolute hallmark of our journals. We have decided this is worth the time and money,” Jackson says. At the AACR, Evanko says many authors are happy that duplication errors are brought to their attention before publication.

Meanwhile, the publisher Frontiers, in Lausanne, Switzerland, has developed its own image-checking software as part of a system of automated checks called AIRA (Artificial Intelligence Review Assistant). Since August 2020, an internal research-integrity team has been using AIRA to run image checks on all submitted manuscripts, a spokesperson says. The majority of papers that it flags up don’t actually have problems: only around 10% require follow-up from the integrity team. (Frontiers declined to say what fraction of papers AIRA flags.)

Image-integrity specialists including Bik and Rossner say they haven’t tried AIRA or Proofig themselves, and that it is hard to evaluate software products that haven’t been publicly compared using standardized tests. Rossner adds that it’s also important to detect manipulations other than duplication, such as the removal or cropping of parts of an image, and other Photoshop-style alterations. “The software may be a useful supplement to visual screening, but it may not be a replacement in its current form,” he says.

“I am convinced, though, that eventually this will become the standard in manuscript screening,” adds Bik.

Industry caution

Publishers that haven’t yet adopted AI image screening cite cost and reliability concerns — although some are working on their own AIs. A spokesperson for the publisher PLOS says that it is “eagerly” monitoring progress on tools that can “reliably identify common image-integrity issues and that could be applied at scale”. Elsevier says it is “still testing” software, although it notes that some of its journals screen all accepted papers before publication, checking for concerns around images “using a combination of software tools and manual analysis”.

In April 2020, Wiley introduced an image-screening service for provisionally accepted manuscripts, now used by more than 120 journals, but this is currently manual screening aided by software, a spokesperson says. And Springer Nature, which publishes Nature, says that it is assessing some external tools, while collating data to train its own software that will “combine complementary AI and human elements to identify problematic images”. (Nature’s news team is editorially independent of its publisher.)

Pulverer says that EMBO Press still mostly uses manual screening because he’s not yet convinced by the cost–benefit ratio of the commercial offerings, and because he is part of the cross-publisher working group that is still defining criteria for software. “I have no doubt that we will have high-level tools before long,” he says.

Pulverer worries that fraudsters might learn how the software works and use AI to make fake images that neither people nor software can detect. Although no one has yet shown that such images are appearing in research papers, one preprint2 posted on bioRxiv last year suggested that it was possible to make fake versions of biological images, such as western blots, that were indistinguishable from real data. But researchers are working on the problem: computer scientist Edward Delp at Purdue University in West Lafayette, Indiana, leads a team that detects media faked by AIs as part of a programme funded by the US Defense Advanced Research Projects Agency, and is focusing on fake biological imagery such as microscope images and X-rays. He says his team “has one of the best” sets of detectors for GANs, or generative adversarial networks — a way of pitting AIs against each other to create realistic images. A paper describing his system is under review.

Cross-journal image checks

For the moment, AI image checking is generally done within a manuscript, not across many papers, a task that would be much more computationally intensive. But commercial and academic software developers say that this is technically feasible. Computer scientist Daniel Acuña at Syracuse University in New York last year ran his software on thousands of COVID-19 preprints to find duplications.
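Neither the commercial firms nor Acuña have published their cross-corpus pipelines in detail. One standard way to keep such comparisons tractable — sketched below purely as an assumption about how it could be done — is to reduce each extracted figure to a compact perceptual hash and compare hashes rather than full-resolution pixels; the file names and distance threshold here are illustrative only.

```python
# Hypothetical sketch of cheap cross-corpus comparison: each extracted figure
# is reduced to a 64-bit perceptual hash, and pairs of hashes are compared by
# Hamming distance instead of comparing full images.
from itertools import combinations

from PIL import Image
import imagehash

# Assumed inputs: figures already extracted from different manuscripts.
paths = ["paper1_fig2.png", "paper2_fig4.png", "paper3_fig1.png"]

# phash is robust to re-compression and mild rescaling,
# though not to large rotations or heavy cropping.
hashes = {p: imagehash.phash(Image.open(p)) for p in paths}

for a, b in combinations(paths, 2):
    distance = hashes[a] - hashes[b]   # Hamming distance between the two hashes
    if distance <= 8:                  # small distance = near-duplicate candidate
        print(f"possible duplication: {a} vs {b} (distance {distance})")
```

Hashing makes the pairwise comparison step far cheaper, which is what would matter at the scale of a shared, cross-publisher image database; flagged pairs would still need the kind of editorial follow-up described above.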

Crossref, a US-based non-profit collaboration of more than 15,000 organizations that, among other things, organizes plagiarism checking across papers, is currently surveying its members about their concerns over doctored images, the software they are using and whether a “cross-publisher service” that could share images would be viable and helpful, says Bryan Vickery, Crossref’s director of product in London.

And in December, STM Solutions — a subsidiary of STM, an industry group for scholarly publishers in Oxford, UK — announced that it was working on a “cloud-based environment” to help publishers collaborate “to check submitted articles for research integrity issues” while maintaining privacy and confidentiality. Detecting image manipulation, duplication and plagiarism across journals is “high on our road map”, says Matt McKay, an STM spokesperson.


