The Wiley Network

Software to Improve Reliability of Research Image Data: Wiley, Lumina, and Researchers at Harvard Medical School Work Together on Solutions


Chris Graf, Former Director of Research Integrity in the Open Research team at Wiley.

March 09, 2022

This blog post was written by Mary Walsh, Ph.D., Chief Scientific Investigator, Office for Academic and Research Integrity at Harvard Medical School, and Chris Graf, Director, Research Integrity and Publishing Ethics, Wiley.

Wiley and Lumina are working together to support the efforts of researchers at Harvard Medical School to develop and test new machine learning tools and artificial intelligence (AI) software that can identify discrepancies in research image data. The aim is to deliver one of the solutions the research community needs to improve research quality and reduce research waste.

[Images: Discrepancies in research image data, 1 and 2]

An illustrative example of research image manipulation, created by Wiley: a marked area from the original image has been cloned, rotated, and copied.

It has been reported that about 4% of biomedical research articles may contain at least one duplicated image (Bik, Casadevall, & Fang, 2016). The ability to detect, identify, and correct these problematic images has the potential to reduce the number of inadvertent errors and intentionally misrepresented images in the published literature, making research findings more reliable and more useful.

Lumina will collate up to 10,000 images from corrected and retracted research articles published by Wiley and deliver them to researchers at Harvard Medical School, who will use the images to train and test AI software that can identify images that may have been misused or altered. The code resulting from the project is, and will remain, open source. The goals of the study include helping to identify unreliable research data in the world’s research literature, helping laboratory leaders identify errors and, in turn, supporting good data management practices in research labs around the world.

The problem

In this age of big data, research projects can generate terabytes of digital image data every day. Managing these large quantities of data is complex. Errors in tracking the experimental source of data can lead to an image being misrepresented: published as the result of one experiment when it really came from another. These mistakes undermine the reliability of the research findings. In addition, changes and manipulations made to raw data, well-intentioned or not, can make research reports less reliable still.

After publication, a research article that includes these potentially problematic images may need to be corrected or retracted. “Evaluation of image use in the biomedical literature has remained largely manual and reliant upon visual inspection in the identification of discrepancies such as manipulation and reuse of image data with or without manipulation,” says Mary Walsh, Ph.D., Chief Scientific Investigator, Office for Academic and Research Integrity at Harvard Medical School and Principal Investigator of the Harvard research study. “Our research efforts will utilize the image data provided by Wiley and Lumina to help us develop platforms that detect these challenges, ideally before publication, as well as to establish benchmark resources (such as image libraries) to continue supporting community efforts in creating tools to enhance research data accuracy and quality assurance.”

The solution

The software that researchers at Harvard plan to develop, using images from Wiley delivered by Lumina, will help identify and classify signs of image manipulation like those described by Bik et al. (2016) and others (Byrne & Christopher, 2020). It will also make the job of identifying and addressing problematic images, both before and after publication, more routine and reliable.

“As illustrated by numerous social issues, from pandemics to climate change, it is imperative that researchers, companies, and leaders of states have accurate information to drive their critical decisions. We are pleased to join with Wiley and Harvard University in efforts to maintain and improve worldwide scholarly and academic integrity,” says Vidur Bhogilal, Vice Chairman, Lumina Datamatics.


Bik, E. M., Casadevall, A., & Fang, F. C. (2016). The prevalence of inappropriate image duplication in biomedical research publications. mBio. https://doi.org/10.1128/mBio.00809-16

Byrne, J. A., & Christopher, J. (2020). Digital magic, or the dark arts of the 21st century—how can journals and peer reviewers detect manuscripts and publications from paper mills? FEBS Letters. https://doi.org/10.1002/1873-3468.13747


This article is part of the series Real World Impact with Wiley Research.
