Data Manipulation and
Data Integrity
Data Manipulation
• Data manipulation is the process in which scientific data are forged, presented in a misleading or unprofessional way, or altered in disregard of academic norms.
• Data manipulation may result in a distorted perception of a
subject, which may lead to false theories being built and
tested.
• An experiment based on data that has been manipulated is
risky and unpredictable.
Consequences of Data Manipulation
• Misleading colleagues
• Impeding progress
• Causing harm to society
• Unpredictable experiments
Statistics as a tool of Data
manipulation
• One of the most common kinds of data manipulation is
misuse of statistics – many article titles on the internet are
based on misuse of statistics, as are some political and
economic arguments.
• Misuse of statistics includes outright data forgery – creating data with no connection to any actual observations.
• The most important kinds of misuse of statistics are those
that involve real data that is presented in a manner that may
be misleading and even dangerous.
Kinds of Data Manipulation and
Reasons behind Them
• Omitting important facts, factors
• Researchers are looking for results (because results mean research grants, etc.), and thus they sometimes deliberately or unintentionally manipulate data to fit their hypothesis.
• When conducting an experiment, researchers may work from an incomplete list of relevant factors.
• For example, a political poll can include the age, income, or religious beliefs of the participants.
• The weak point here is the fact that the researcher may not have included an important factor as
relevant in the study.
• If a study “Computer games – art or not?” was conducted on participants between the
ages of fifty and sixty, then its results would probably be quite different from the results
of the same study conducted on participants between the ages of fifteen and twenty.
• If, in the resulting publication, the age of the participants is not clearly stated, then that is an example
of data manipulation and, specifically, misuse of statistics.
Simpson’s Paradox
• Including another variable in the analysis can change – or even reverse – the apparent result obtained from the aggregated data.
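To make the paradox concrete, here is a minimal illustrative sketch in Python (not part of the original slides) using the well-known kidney-stone figures: within each subgroup one treatment looks better, yet pooling the subgroups reverses the ranking.

```python
# Minimal sketch of Simpson's paradox (illustrative numbers from the classic
# kidney-stone example): Treatment A is better within every subgroup, but
# Treatment B looks better once the subgroups are pooled.

groups = {
    # group: (successes_A, trials_A, successes_B, trials_B)
    "small stones": (81, 87, 234, 270),
    "large stones": (192, 263, 55, 80),
}

total_a = [0, 0]
total_b = [0, 0]
for name, (sa, na, sb, nb) in groups.items():
    print(f"{name}: A = {sa / na:.0%}, B = {sb / nb:.0%}")  # A wins in each group
    total_a[0] += sa; total_a[1] += na
    total_b[0] += sb; total_b[1] += nb

# Aggregated over both groups, the ranking flips and B appears better.
print(f"overall: A = {total_a[0] / total_a[1]:.0%}, B = {total_b[0] / total_b[1]:.0%}")
```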
How can you avoid this?
• Suppose you are planning an extensive study; you may enlist friends to help collect the data.
• Read many research papers and identify most of the factors that have been included by
researchers.
• Read about the domain and identify additional factors.
• As the main researcher, put down all the specifics of the study, all of the questions and any
other relevant information that you would expect for actual data gathering.
• Clarify and train your friends in the process of data collection.
• Your helpers will vary – some have high integrity, some a poor reputation, and some are unknown quantities.
• As a researcher, if you publish data from the study, you should describe the data collection process.
• If the collection process is not described, the data should not be taken seriously.
Can a company be hired to do the
study?
• You may instead employ a company to conduct the study.
• Verify that the company conducting the study is reputable –
• the better its reputation, the higher the chances of a transparent process.
• One can also investigate how the study is actually conducted.
• For big companies, being caught manipulating data would mean the loss of clients.
• Why would a company manipulate data anyway?
• To please the client – to deliver the expected poll results.
• A familiar example is election polling…
Pre-determined Results
• Suppose a large tobacco company commissions research on the probability that smoking the cigarettes it sells causes cancer.
• There is a definite result the company wants: that the probability is no higher than for non-smokers.
• Knowing this, the company conducting the study may be tempted to manipulate data to get results that would please the client.
• A real-life example would be the Volkswagen Scandal of 2015, in which the
Volkswagen Corporation falsified information about the gas emissions of its
cars. This led to the release of cars that polluted forty times more than allowed
by law. The falsification was done using a “defeat device” – smart software that
would turn on emission control when the car was being tested in a laboratory.
Another Reason for Data Manipulation
• Data is manipulated because research is hard to do. Conducting a study with a few thousand participants is a lot of work in many respects.
• Some instead copy data from other research on the same topic.
• This is technically plagiarism rather than data manipulation, but the two often go together.
False causality and illogical sequences
• This kind of falsification is done to deceive those who are not quite familiar with the
subject of the research.
• Situation: Research was conducted on mice of brown colour for multiple generations.
• The published theory: “Every generation of brown mice has more deaths than the
previous one”.
• The reason for this statistic (which is true) is not the stated colour of the mice but the fact that each new generation has more mice than the previous one and therefore has more deaths.
• This is misleading research, as it omits the fact of more births.
• The information and the facts are true.
• However, they do not support the causal claim the work is trying to make.
SAT Score Example
How can this be avoided?
• Do not compare apples and oranges.
• A graph or a statistic usually compares data, and it is important
to understand what kind of data it is and how it connects.
• For example: people buy more lighters – more people get cancer – therefore lighters are bad for you.
• In actuality, people get cancer because they smoke, and they buy lighters because they smoke.
• So, looking at the logical connections that the research makes is
quite important.
Data dredging and fact fitting
• Data dredging is the process in which researchers look through large amounts
of data to find patterns.
• The amounts of data picked for dredging are usually so big that there would
be at least one or two coincidences that can be used to base a theory on them.
• With the use of computers this has become even easier, because a computer can sift far larger amounts of data for apparent patterns.
• This leads to publications that are irrelevant or are based on pure coincidence.
• Fact fitting is the process which, in a sense, is the opposite of fact omitting –
facts are shaped to fit a certain theory.
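As an illustration (not from the slides), the following minimal Python sketch shows why dredging finds “patterns” in pure noise: if one outcome is tested against many unrelated random predictors, a handful will clear any correlation threshold by chance alone. All numbers are arbitrary.

```python
# Minimal sketch of data dredging: every predictor is pure noise, yet some
# will look "correlated" with the outcome purely by chance.
import random
import statistics

random.seed(0)
n, n_predictors, threshold = 100, 200, 0.2  # arbitrary illustrative values

outcome = [random.gauss(0, 1) for _ in range(n)]

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

hits = 0
for _ in range(n_predictors):
    predictor = [random.gauss(0, 1) for _ in range(n)]
    if abs(corr(outcome, predictor)) > threshold:
        hits += 1

# Several of the 200 noise predictors clear the threshold by coincidence alone.
print(f"{hits} of {n_predictors} random predictors look 'correlated' with the outcome")
```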
Bonferroni's principle
• Assume you are trying to identify people who are cheating in
examinations within a certain population
• You know that the percentage in the population who cheat is 5%.
• Suppose you decide that people who claim to go out with friends more than three times a week are likely to be cheaters.
• If you discover that 20% of the population qualify under this rule, then even in the very best case only one-quarter of the people you flag will actually be cheaters (5% ÷ 20% = 25%).
• Furthermore, if there are any false negatives (cheaters who aren't
identified as cheaters), an even higher percentage of the "cheaters"
identified with the system would be false positives.
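The arithmetic behind the slide’s numbers can be made explicit. The sketch below uses the 5% and 20% figures from above; the number of missed cheaters is hypothetical.

```python
# Worked arithmetic for the slide's numbers: 5% of the population cheat,
# but the rule flags 20% of the population.
population = 10_000            # illustrative population size
cheaters = 0.05 * population   # 500 actual cheaters
flagged = 0.20 * population    # 2,000 people flagged by the "goes out often" rule

# Best case: every cheater happens to be among the flagged group.
best_case_precision = cheaters / flagged
print(f"at best {best_case_precision:.0%} of flagged people are cheaters")  # 25%

# If some cheaters are missed (false negatives), precision drops further.
missed = 100                   # hypothetical number of missed cheaters
precision = (cheaters - missed) / flagged
print(f"with {missed} missed cheaters: {precision:.0%} of flagged people are cheaters")  # 20%
```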
How can this be avoided?
• Check the facts: That is actually important in a lot of cases.
• If a scientific fact is actually a scientific fact, then it probably
comes up in more than one publication.
• Apply the rule of big numbers – if something like that were actually true, would you be the first one to discover it?
• A shocking new discovery? Why has it not been made before?
• Most people don’t get a chance to do a PhD – do not squander yours by publishing questionable articles.
LaCour and scientifically based data
manipulation
• There are unique situations when data is not just manipulated but
manipulated professionally.
• Facts are pushed in different directions by a person who knows exactly what he is doing.
• An easy example of this at work is any political debate.
• Case: “Irregularities in LaCour (2014)” – an exposé by David Broockman, then a graduate student at UC Berkeley, and colleagues.
• Michael LaCour forged enormous amounts of data in a high-profile 2014–2015 study on attitudes toward same-sex marriage, and the case rocked the scientific world.
Summary of LaCour’s research
• LaCour hired a company to conduct research that would prove his theory: people’s views on same-sex marriage can change dramatically after a conversation with someone who has strong feelings about the issue.
• It was a large-scale poll with about ten thousand respondents.
• The research proved LaCour’s theory, which was a new and
unique result.
• All previous results in similar works had shown that people
hardly change their political and social views.
What did Broockman do?
• Broockman was greatly impressed by LaCour’s work and wanted to conduct similar experiments.
• However, he found out the following:
• First hint: A hired company could not have conducted such research for a graduate
student’s budget
• He did not go public with this immediately, because it is easy to acquire the reputation of someone who does no work of his own and merely tries to ruin the work of others.
• Many of the scientists and researchers Broockman talked to told him not to publish such material.
• Later, Broockman, with another student, noticed some specific irregularities (the
politically correct term for “mistakes and falsifications”) in the data used.
• The data did not look random enough.
• Later, they found a database that LaCour copied, which was the last argument
needed to publish their report.
Results
• LaCour lost his newly obtained position at Princeton and his reputation – it will now be very hard for him to return to the world of science.
• Broockman made the headlines and spoke a lot about debunking
and academic integrity.
• While some initially argued about the competence of LaCour’s research, it has since been revealed that he never, in fact, hired any poll-conducting company, that he forged a letter from the company, and that he lied in later interviews.
• After all of this, there is no longer any reason to frame the issue as one of “competence”.
Lessons to be learned
• LaCour wanted a result that would make him a first-class researcher
and succeeded – until the exposure
• he had time to become quite famous.
• To obtain such a result he plagiarized data AND manipulated it to fit
his theory
• thus, he committed a set of violations of academic integrity.
• He was caught and is now a great example of how debunking works.
• The possibility of such exposure is one of the main protections of the
scientific world from academic dishonesty and thus should be
advocated.
Education on Data Manipulation
• While a lot is being done to expose and debunk data
manipulation, it is a subject that is not a part of popular culture.
• Debunking does not always involve high-class science – Broockman simply started checking how LaCour did his research.
• Publication of false data may cause real harm, as in the Volkswagen example or in the medical field.
• LaCour’s data nearly triggered reform of many systems and structures concerned with political and social views because of its “original” content.
Image Manipulation
• In research, "image manipulation" is :
• Altering or modifying a digital image using software
• With the purpose of enhancing certain features, adjusting colours, or even
completely fabricating the image.
• Unethical if it
• It leads to misrepresented data.
• changes the interpretation of the results presented in the research article
• Minor adjustments like brightness and contrast are often
acceptable.
• Researchers must clearly disclose any image manipulations made in
their methods section, explaining the rationale behind the changes.
Image Manipulation: Image by Mystic
Art Design from Pixabay
Data Integrity
The Digital Age
• The amount of data created,
captured, copied, and
consumed globally has been
growing rapidly:
• 2020: 64.2 zettabytes
• 2021: 79 zettabytes
• 2022: 97 zettabytes
• 2023: 120 zettabytes
• 2024: Projected to be 147
zettabytes
[Chart: Growth of Data Generation, 2020–2024, in zettabytes]
Units of Data Measure
• Bit: binary digit
• Byte: eight bits; one ASCII character
• Kilobyte: 1,000 bytes
• Megabyte: 1,000 kilobytes – roughly a large book
• Gigabyte: 1,000 megabytes – a typical hard disk is around 500 gigabytes
• Terabyte: 1,000 gigabytes – a huge library
• Petabyte: 1,000 terabytes – a research journal archive may run to about 5 petabytes
• Exabyte: 1,000 petabytes
• Zettabyte (ZB), Yottabyte (YB), Brontobyte (BB), Geopbyte (GPB)…
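As a small illustration of the 1,000-based prefixes above, here is a hedged helper (not from the slides) that formats raw byte counts into these units.

```python
# Illustrative helper using the decimal (1,000-based) prefixes from the slide.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(n_bytes: float) -> str:
    """Format a byte count using 1,000-based units."""
    for unit in UNITS:
        if n_bytes < 1000 or unit == UNITS[-1]:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000

print(human_readable(64.2e21))   # the 2020 figure from the earlier slide: 64.2 ZB
print(human_readable(5e15))      # the "5 petabytes" example: 5.0 PB
```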
Data Integrity
• Research Data: Data used in scientific, engineering, and medical research
as inputs to generate research conclusions.
• Metadata: refers to descriptions of the content, context, and structure of
information objects.
• Data Integrity:
• An uncompromising adherence to ethical values, strict honesty, and absolute
avoidance of deception.
• The state of being whole and complete
• High integrity means having the confidence that the data are complete, verified,
and remain unaltered.
• Data integrity can be defined as “the state of data (valid or invalid) and/or the process of ensuring and preserving the validity and accuracy of data”.
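One minimal, common way to support the claim that data “remain unaltered” is to record a cryptographic checksum when a dataset is frozen and verify it again later. A hedged sketch follows; the file name is hypothetical.

```python
# Minimal sketch: record a SHA-256 checksum when a dataset is frozen, and
# verify later that the file has not been altered. The file name is hypothetical.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

data_file = Path("survey_responses.csv")   # hypothetical raw data file
recorded = sha256_of(data_file)            # store this alongside the data / in metadata

# ... later, before analysis or sharing, re-check the checksum ...
if sha256_of(data_file) != recorded:
    raise RuntimeError("survey_responses.csv has changed since it was frozen")
```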
Why Data Integrity?
• Key to empirical scientific research
• Leads to consistent decision making
• leads to trustworthy findings
• leads to the creation of correct knowledge
• Research data integrity is desirable from the inception of a
research project through the dissemination of findings and the
subsequent sharing of data.
Data Integrity for us
• You may be asked to describe your methods and tools for collecting
data so that the data can be checked and verified.
• You may also record the process or algorithms of pre-processing of
data, if any.
• Your analysis results should be verifiable.
• You may ensure…
• Scientific Rigor: applying scientific methods to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation, and reporting of results.
• Computational Reproducibility: obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis.
• Replicability of Results: obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
• Reuse: using research data for a research activity or purpose other than that for which it was originally intended.
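As a sketch of what supporting computational reproducibility can look like in practice (not prescribed by the slides), one can fix the random seed and archive the inputs, seed, and environment alongside the result. The “analysis” below is only a placeholder.

```python
# Minimal sketch of supporting computational reproducibility: fix the random seed,
# and record the inputs, seed, and environment next to the result.
import hashlib
import json
import platform
import random
import sys

random.seed(42)  # fixed seed so the same code and inputs give the same numbers

input_data = [1.2, 3.4, 5.6, 7.8]    # stand-in for the real input data
result = sum(random.choice(input_data) for _ in range(1000)) / 1000  # placeholder analysis

record = {
    "input_sha256": hashlib.sha256(json.dumps(input_data).encode()).hexdigest(),
    "seed": 42,
    "python_version": sys.version,
    "platform": platform.platform(),
    "result": result,
}
print(json.dumps(record, indent=2))  # archive this alongside the code and data
```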
Integrity Policy of Publishers
• Data and methods access
• Does the journal require that all data be made available on request to journal editors and reviewers? Yes?
• Does the journal require the deposition of data in a public repository? Yes/No
• Are authors required to provide algorithms or computer programs used in the collection, report, or
analysis of data? No?
• Image manipulation
• Is image manipulation prohibited? No
• Does the journal require that image manipulation be reported? Yes
• Does the journal require that digital techniques be applied to the entire image? Yes
• Does the journal use software tests to detect image manipulation? Yes
• Ethics and Scientific Misconduct
• Is there a specified ethical statement? Yes
• Does the journal have a scientific misconduct investigation or reporting policy in place? Yes
Integrity: The Individual responsibility and
Collective Scrutiny of Research Data and Results
• Individual Responsibility is to ensure that the data are complete,
verified, and undistorted
• Collective responsibility is to ensure the data integrity of the submitted
research data and results derived from those data.
• When others can examine the steps used to generate data and the conclusions
drawn from those data, they can judge the validity of the data and results.
• Collective scrutiny of research results cannot guarantee that results will be free of error or bias.
• It does, however, bring multiple perspectives that help minimise error and bias.
Collective Scrutiny
• Data Producers’ Role:
• Make data available to others so that the data’s quality can be judged.
• Data Providers’ Role:
• Make data widely available in a form such that the data can be not only used but evaluated, which requires the availability of metadata.
• Data Users’ or Researchers’ Role:
• Perform critical evaluation of the data generated by themselves and by others.
Define the Data Processes
• You should report the following about data collection
• State the tools, techniques and procedures used to collect data
• Record anything that was done to the data thereafter
• Clearly state the models, code, and input data used
• For example, a community may decide that double-blind trials,
independent verification, or particular instrumental calibrations are
necessary for a body of data to be accepted as having high quality.
• Scientific methods include both a core of widely accepted methods and a
periphery of methods that are less widely accepted.
• Data integrity involves scrutiny of the methods used to derive those data.
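One lightweight way to “record anything that was done to the data thereafter” is to keep a machine-readable provenance log as the pipeline runs. The sketch below is illustrative; step names and parameters are hypothetical.

```python
# Minimal sketch of recording what was done to the data after collection.
# Step names and parameters below are hypothetical.
import datetime
import json

processing_log = []

def log_step(name: str, **params) -> None:
    """Append a timestamped description of a processing step to the provenance log."""
    processing_log.append({
        "step": name,
        "params": params,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

raw = [12.1, None, 13.4, 250.0, 12.9]

cleaned = [x for x in raw if x is not None]
log_step("drop_missing", removed=len(raw) - len(cleaned))

cleaned = [x for x in cleaned if x < 100.0]
log_step("remove_outliers", rule="value < 100.0", removed=1)

# Publish or archive the log together with the data so others can audit the steps.
print(json.dumps(processing_log, indent=2))
```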
PEER REVIEW AND INTEGRITY OF DATA
• Peer review of articles submitted to a scholarly journal for publication is the most important process for ensuring data integrity.
• It screens for quality and relevance and helps to ensure that professional standards are followed in data collection and analysis.
• It provides a forum in which the collective standards of a field can be enforced.
• Examines whether research questions have been framed and
addressed properly
• Examines whether findings are original and significant
• Examines whether a paper is clearly written and acknowledges
previous work.
Peer Review in Digital Age
• Digital technologies have put pressure on the peer review system.
• The volume or diversity of research data supporting a conclusion may overwhelm a reviewer’s ability to evaluate the link between the data and that conclusion. As supporting information for a finding increasingly moves to lengthy supplemental materials, reviewers may be less able to judge the merits of a paper.
• Difficult to find peer reviewers who are competent and have the time to judge
complex interdisciplinary manuscripts.
• Peer review cannot ensure that all research data are technically accurate,
though inaccuracies in data can become apparent either in review or as
researchers seek to extend or build on data.
Trust in Research
• The research system is based to a large degree on trust.
• Following the standards is a crucial factor in building trust.
• Build and maintain trust.
Breach of Trust
• In 1998, a series of remarkable papers attracted great attention within the condensed
matter physics community.
• The papers, based largely on work done at Bell Laboratories, described methods that could
create carbon-based materials with superconductivity using molecular-level switching.
• However, when other materials scientists sought to reproduce or extend the results, they
were unsuccessful.
• In 2001, several physicists inside and outside Bell Laboratories began to notice anomalies
among the papers.
• Several contained figures that were very similar, even though they described different
experimental systems.
• Some graphs seemed too smooth to describe real-life systems.
• The person who had helped create the materials, made the physical measurements on them, and was a co-author on all the papers was questioned.
• A committee was formed, which detected fabrication in 16 of 25 works published.
Using Digital Technologies and Data
Integrity
• Digital technologies can pose risks to data integrity, but they also offer ways to
improve the reliability of research data.
• They enable researchers to build checking and verification procedures into
research protocols in ways that reduce the potential for error and bias.
• Automated data collection that is quality-controlled can be much more accurate
when either substituting for or supplementing human observations.
• An example is the use of digital technologies in clinical research, including the
conduct of clinical trials and plans to link clinical trial information with individuals’
electronic health records.
• Will digitizing individuals’ electronic health records compromise their security and privacy?
• Will inappropriate usage be properly restricted?
• Will companies be able to acquire and share these data?
• Merging of two datasets might make it possible to identify patients who have been “de-
identified” in each.
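The last bullet can be made concrete with a toy example (entirely made-up records): two datasets that each look de-identified may still share quasi-identifiers, and joining on them can single out an individual.

```python
# Toy illustration (made-up records): each dataset is "de-identified" on its own,
# but joining on shared quasi-identifiers can narrow a record down to one person.
health_records = [
    {"birth_year": 1980, "postcode": "2000", "diagnosis": "diabetes"},
    {"birth_year": 1992, "postcode": "2913", "diagnosis": "asthma"},
]
voter_roll = [
    {"name": "A. Citizen", "birth_year": 1992, "postcode": "2913"},
    {"name": "B. Resident", "birth_year": 1980, "postcode": "2600"},
]

for h in health_records:
    matches = [v for v in voter_roll
               if v["birth_year"] == h["birth_year"] and v["postcode"] == h["postcode"]]
    if len(matches) == 1:
        # A unique match re-identifies the "anonymous" health record.
        print(f"{matches[0]['name']} can be linked to diagnosis '{h['diagnosis']}'")
```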
AI and data integrity
• Train people working with artificial intelligence (AI) to support
data integrity assurance in AI applications.
• AI practices should be aligned with societal values and ethical
norms.
• Key challenges and opportunities
• Artificial intelligence (AI) significantly enhances data integrity by
reducing human error and increasing efficiency in data processing.
• With its ability to efficiently process and analyse large datasets, AI has facilitated “breakthroughs in fields such as predictive analytics, personalised medicine, and autonomous systems”.
Caution
• However, when using AI systems, data integrity concerns,
including data accuracy, quality, privacy, and security, arise.
• The integrity of AI decisions is directly linked to the integrity
of the data it processes.
• Data manipulation, whether intentional or due to inherent biases in algorithms, poses serious questions about the reliability and fairness of AI-driven decision making.
AI Implications for data integrity
• AI systems are only as good as the data they are fed and how they are
programmed;
• There is a concern that if the input data is flawed or biased, AI will amplify
these issues.
• There is a need for transparency in AI algorithms to ensure data integrity.
• Policymakers should focus on developing and refining
comprehensive, adaptable regulatory frameworks for AI that
emphasise privacy, transparency, and accountability
• Institutions and organisations should invest in continuous ethical training and awareness programs for AI practitioners. This would enable them to recognise and address the ethical implications of their work, thereby ensuring data integrity and fairness in AI applications.
Data Integrity Principle [1]
• “Ensuring the integrity of research data is essential for advancing
scientific, engineering, and medical knowledge and for maintaining
public trust in the research.”
• “Researchers are ultimately responsible for ensuring the integrity of
research data.”
References
• Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the
Digital Age by Committee on Ensuring the Utility and Integrity of Research Data in
a Digital Age; National Academy of Sciences
• Oladoyinbo, Tunboson Oyewale and Olabanji, Samuel Oladiipo and Olaniyi,
Oluwaseun Oladeji and Adebiyi, Olubukola Omolara and Okunleye, Olalekan J. and
Ismaila Alao, Adegbenga, Exploring the Challenges of Artificial Intelligence in Data
Integrity and its Influence on Social Dynamics (January 13, 2024). Asian Journal of
Advanced Research and Reports, Volume 18, Issue 2, Page 1-23, 2024, Available at
SSRN: https://ssrn.com/abstract=4693987
• Condon, P., Simpson, J., and Emanuel, M. (2022) Research data integrity: A cornerstone of rigorous and reproducible research, IASSIST Quarterly 46(3), pp. 1-21. DOI: https://doi.org/10.29173/iq1033
• http://ds-wordpress.haverford.edu/psych2015/projects/chapter/plagiarism-and-data-manipulation/
Questions