Discussion of the ethical challenges in the use of AI/ML for research in science and engineering, with a major focus on the reproducibility and interpretability of AI/ML-based work.
1. On the Ethical Challenges in the Use of AI/ML
for Research in Science and Engineering
Kyeong Soo (Joseph) Kim
Department of Communications and Networking
School of Advanced Technology
Fifth Research Ethics Workshop:
Ethical Considerations of AI Related Research
Xi’an Jiaotong-Liverpool University
April 9, 2025
4. AI vs. ML [1]
AI and ML are closely related to each other yet distinct:
▶ AI is the broader concept of enabling a machine or system to sense,
reason, act, or adapt like a human.
▶ ML is an application of AI that allows machines to extract knowledge
from data and learn from it autonomously.
5. Challenges in the Use of AI/ML for Research
The use of AI/ML in science and engineering significantly affects the two
fundamental aspects of research:
▶ Reproducibility.
▶ Interpretability.
6. Reproducibility [2]
▶ Reproducibility is key to the scientific method: it ensures that an
experiment and the analysis of its results can be repeated in any place
by any person.
▶ A study can be truly reproducible when it satisfies at least the
following three criteria:
▶ All experimental methods are fully reported.
▶ All data and files used for the analysis are (publicly) available.
▶ The process of analyzing raw data is well reported and preserved.
▶ Reproducible research ensures:
▶ Same data + Same code = Same results (see the sketch below).
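As a minimal Python sketch of this equation in practice (the dataset and model here are illustrative assumptions, not from the talk), reproducibility requires pinning every source of randomness in the analysis pipeline:

```python
# Minimal sketch: pinning randomness so that the same data and the
# same code yield the same results on every run (NumPy/scikit-learn;
# the dataset and model choice are illustrative only).
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # fix one seed for the whole experiment and report it

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy's legacy global RNG

# Synthetic stand-in for the (archived, publicly available) dataset.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Seed every component that draws random numbers, not only the globals.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=SEED)
clf = RandomForestClassifier(n_estimators=100, random_state=SEED)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")  # same on every rerun
```

Recording exact package versions (e.g., with pip freeze) completes the "same code" half of the equation.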
7. Case 1: Laser Interferometer Gravitational-Wave
Observatory (LIGO) [5]
Gravitational waves were detected on Sep. 14, 2015, by LIGO, resulting in
the 2017 Nobel Prize in Physics awarded to R. Weiss, K. Thorne, and
B. C. Barish.
The discovery was reported in Physical Review Letters [3] in Feb. 2016,
together with an IPython notebook [4] containing the analysis code and data.
8. Case 2: Schön Scandal - Molecular Computing [6]
Jan Hendrik Schön rose to prominence after a series of successful
experiments that apparently paved the way to a new field of molecular
computing but were later discovered to be fraudulent.
▶ No records of his ground-breaking experimental results remained,
including lab notebooks, experimental samples, data, and hard disk
drives.
9. Interpretability
There is no consensus on the
definition of interpretability.
Some examples are:
▶ The degree to which a
human can understand the
cause of a decision [7].
▶ The degree to which a
human can consistently
predict the model’s result [8].
▶ The ability to explain or to
present in understandable
terms to a human [9].
But what are we to do with the 160 billion
parameters of the neural network trained by
Digital Reasoning, reported on Jul. 7, 2015 [10]?
10. Case 1: Space Shuttle Challenger Disaster [11]
Challenger's solid rocket boosters
after the explosion, Jan. 28, 1986.
Feynman's famous C-clamp (O-ring) experiment, televised on
Feb. 11, 1986.
11. Case 2: Xiaomi SU7 Crash [12]
On March 29, 2025, a Xiaomi
SU7 driving in Navigate on
Autopilot (NOA) intelligent
driving mode crashed on a
highway in Anhui Province,
killing the three college
students in the car.
13. During the Construction of a Model
▶ In ML, three datasets are commonly used in different stages of the
construction of a model:
▶ Training dataset: Used to train the model.
▶ Validation dataset: Used for hyperparameter tuning and model
selection.
▶ Test dataset: Used to evaluate the final model performance.
▶ One notable issue in the use of datasets is data leakage [13]:
▶ Information from the test or validation dataset unintentionally leaks into
the training process, leading to an inflated sense of model performance,
as in the sketch below.
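A common and easily missed form of leakage is preprocessing fitted on the full dataset before splitting. A minimal sketch (scikit-learn; the synthetic data are an illustrative assumption):

```python
# Minimal sketch of preprocessing-induced data leakage and its fix
# (scikit-learn; the synthetic dataset is illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# LEAKY: the scaler is fitted on all samples, so test-set statistics
# leak into the training data and inflate measured performance.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)

# CORRECT: split first, fit the scaler on the training set only, and
# apply the fitted transform to the held-out test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
```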
15. In the Interpretation of Results
▶ Deep neural network (DNN) models can provide state-of-the-art
performance, but they offer little interpretability at any stage, from
model construction through training to final predictions.
▶ Gaussian processes (GPs), on the other hand, provide interpretable
models backed by a large body of well-established theory and
algorithms in both statistics and ML, as illustrated by the sketch below.
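As a minimal illustration (scikit-learn; the 1-D data and kernel choice are assumptions for the sketch), the hyperparameters of a fitted GP read directly as statements about the data:

```python
# Minimal sketch: a fitted GP exposes human-readable hyperparameters
# (scikit-learn; the 1-D data and kernel choice are illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# The optimized kernel is interpretable: the RBF length scale says how
# smooth the latent function is, and the white-kernel noise level says
# how noisy the observations are. A DNN's weights offer no such reading.
print(gp.kernel_)
```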
16. Case 1: Deep Neural Network vs. Gaussian Process
[Figure: Gaussian process vs. deep neural network]
▶ Gaussian process, input data & priors: $\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid i = 1, \ldots, n\}$, $\mathbf{f} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$
▶ Gaussian process, predictive distribution: $\mathbf{f}_* \mid X, \mathbf{y}, X_* \sim \mathcal{N}\big(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*)\big)$
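A minimal NumPy sketch of the predictive distribution above (the RBF kernel, noise level, and data are illustrative assumptions):

```python
# Minimal NumPy sketch of GP regression: compute the predictive mean
# and covariance in closed form (kernel, noise, and data are
# illustrative assumptions only).
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))               # training inputs
y = np.sin(X).ravel() + 0.1 * rng.normal(size=30)  # noisy observations
Xs = np.linspace(0, 10, 100)[:, None]              # test inputs X_*
sigma_n = 0.1                                      # noise std (assumed known)

K = rbf(X, X) + sigma_n**2 * np.eye(len(X))  # K(X, X) + sigma_n^2 I
Ks = rbf(Xs, X)                              # K(X_*, X)
Kss = rbf(Xs, Xs)                            # K(X_*, X_*)

f_mean = Ks @ np.linalg.solve(K, y)          # mean of f_* given X, y, X_*
f_cov = Kss - Ks @ np.linalg.solve(K, Ks.T)  # cov(f_*)
print(f_mean[:3], np.diag(f_cov)[:3])
```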
17. Case 2: Wi-Fi RSSI Data Augmentation Based on
Gaussian Process
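This slide presents the case as a figure only; as a hedged sketch of the general idea, not the actual pipeline behind the figure, one can fit a GP to RSSI measurements over 2-D positions and sample synthetic fingerprints from its posterior:

```python
# Hedged sketch of GP-based Wi-Fi RSSI data augmentation (not the
# actual pipeline behind the slide's figure): fit a GP to measured
# RSSI over 2-D positions, then sample synthetic fingerprints from
# the posterior at unvisited positions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
pos = rng.uniform(0, 20, size=(40, 2))  # surveyed (x, y) positions [m]
# Synthetic RSSI [dBm] from a simple log-distance model plus noise.
rssi = (-40 - 20 * np.log10(np.linalg.norm(pos - 10.0, axis=1) + 1.0)
        + rng.normal(0, 2, size=40))

kernel = 1.0 * RBF(length_scale=5.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(pos, rssi)

# Augment: draw posterior RSSI samples on a grid of unvisited positions.
gx, gy = np.meshgrid(np.linspace(0, 20, 10), np.linspace(0, 20, 10))
grid = np.column_stack([gx.ravel(), gy.ravel()])
augmented = gp.sample_y(grid, n_samples=5, random_state=0)  # (100, 5)
print(augmented.shape)
```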
19. Conclusions
▶ The increasing use of AI/ML in science and engineering makes it more
difficult to conduct reproducible and interpretable research [14].
▶ The situation is made worse by questionable practices on the part of
both researchers and practitioners.
▶ More focus should be put on sound theoretical frameworks &
techniques and on good research practices to enable interpretable and
reproducible research based on AI/ML.
20. References I
[1] Google. (2025) Artificial intelligence (AI) vs. machine learning (ML).
Accessed: April 7, 2025. [Online]. Available:
https://cloud.google.com/learn/artificial-intelligence-vs-machine-learning#artificial-intelligence-ai-vs-machine-learning-ml
[2] K. S. Kim, “Simulation reproducibility with Python and Pweave,” in
Recent advances in network simulation: The OMNeT++ environment and
its ecosystem, ser. EAI/Springer innovations in communication and
computing, A. Virdis and M. Kirsche, Eds. Switzerland: Springer
Cham, 2019, ch. 8, pp. 281–299.
21. References II
[3] B. P. Abbott et al., “Observation of gravitational waves from a binary
black hole merger,” Phys. Rev. Lett., vol. 116, p. 061102, Feb. 2016.
[Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevLett.116.061102
[4] “Signal processing with GW150914 open data,” LIGO Open Science
Center, Jul. 2017. Accessed: March 27, 2018. [Online]. Available:
https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html
[5] Wikipedia contributors. (2025) LIGO. Accessed: April 8, 2025.
[Online]. Available: https://en.wikipedia.org/wiki/LIGO
[6] ——. (2025) Schön scandal. Accessed: April 8, 2025. [Online].
Available: https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal
22. References III
[7] T. Miller, “Explanation in artificial intelligence: Insights from the
social sciences,” Artif. Intell., vol. 267, pp. 1–38, 2018.
[8] B. Kim, R. Khanna, and O. O. Koyejo, “Examples are not enough, learn
to criticize! Criticism for interpretability,” in Advances in neural
information processing systems, D. Lee, M. Sugiyama, U. Luxburg,
I. Guyon, and R. Garnett, Eds., vol. 29. Curran Associates, Inc., 2016.
[Online]. Available: https://proceedings.neurips.cc/paper_files/paper/
2016/file/5680522b8e2bb01943234bce7bf84534-Paper.pdf
[9] F. Doshi-Velez and B. Kim, “Towards a rigorous science of
interpretable machine learning,” ArXiv e-prints, 2017, arXiv:1702.08608
[stat.ML].
23. References IV
[10] J. Hsu, “Biggest neural network ever pushes AI deep learning,” IEEE
Spectrum, Jul. 2015. Accessed: April 8, 2025. [Online]. Available:
https://spectrum.ieee.org/biggest-neural-network-ever-pushes-ai-deep-learning#:~:text=Digital%20Reasoning%2C%20a%20cognitive%20computing,larger%20than%20previous%20neural%20networks.
[11] Wikipedia contributors. (2025) Space Shuttle Challenger disaster.
Accessed: April 8, 2025. [Online]. Available:
https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster
24. References V
[12] J. Xu, “Xiaomi SU7 crash raises doubts on intelligent driving in China,”
CarNewsChina.com, Apr. 2025. Accessed: April 8, 2025. [Online].
Available: https://carnewschina.com/2025/04/04/
xiaomi-su7-crash-raises-doubts-on-intelligent-driving/
[13] Wikipedia contributors. (2025) Leakage (machine learning). Accessed:
April 8, 2025. [Online]. Available:
https://en.wikipedia.org/wiki/Leakage_(machine_learning)
[14] W. Knight, “The dark secret at the heart of AI,” MIT Technology Review,
no. May/June 2017, Apr. 2017.
25. Thanks for your attention.
If you have any questions, please contact me at
Kyeongsoo.Kim@xjtlu.edu.cn.