A Quick Overview of Artificial Intelligence and Machine Learning (revised version)

Hiroki Sayama
sayama@binghamton.edu

2
https://medium.com/swlh/the-map-of-artificial-intelligence-2020-2c4f446f4e43

1. The Origin: Understanding “Intelligence”
2. Key Ingredient I: Statistics & Data Analytics
3. Key Ingredient II: Optimization
4. Machine Learning
5. Artificial Neural Networks
6. Deep Learning
7. Other Topics and Tools
8. Research Examples
9. Challenges
3

The Origin: Understanding “Intelligence”
4

5
https://www.felienne.com/archives/2974

6
https://en.wikipedia.org/wiki/Turing_test

7
The first formal model of
computational mechanisms of
(artificial) neurons

8
Multilayer perceptron
(Rosenblatt 1958)
Backpropagation
(Rumelhart, Hinton &
Williams 1986)
Deep learning
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png

10
Norbert Wiener
(This is where the word “cyber-” came from!)

▪ Herbert Simon et al.’s “Logic Theorist” (1956)
▪ Functional programming, list processing (e.g., LISP (1958-))
▪ Logic-based chatbots (e.g., ELIZA (1966))
▪ Expert systems
▪ Fuzzy logic (Zadeh, 1965)
11

Key Ingredient I: Statistics & Data Analytics
13

▪ Descriptive statistics
▪ Distribution, correlation,
regression
▪ Inferential statistics
▪ Hypothesis testing, estimation,
Bayesian inference
▪ Parametric / non-parametric
approaches
14
https://en.wikipedia.org/wiki/Statistics

▪ Legendre, Gauss (early 1800s)
▪ Representing the behavior of a
dependent variable (DV) as a
function of independent
variable(s) (IV)
▪ Linear regression, polynomial
regression, logistic regression,
etc.
▪ Optimization (minimization) of
errors between model and data
15
https://en.wikipedia.org/wiki/Regression_analysis
https://en.wikipedia.org/wiki/Polynomial_regression

▪ Original idea dates back to the 1700s
▪ Pearson, Gosset, Fisher (early
1900s)
▪ Set up hypothesis(-es) and
see how (un)likely the
observed data would be
if they were true
▪ Type-I error (false positive),
Type-II error (false negative)
16
https://en.wikibooks.org/wiki/Statistics/Testing_Statistical_Hypothesis
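
The idea of checking how unlikely the observed data are under a hypothesis can be sketched with a permutation test. This is only one simple non-parametric approach, not the specific tests developed by Pearson, Gosset, or Fisher. The function name `permutation_test` and the two small samples are made up for illustration.

```python
import numpy as np

def permutation_test(a, b, n_permutations=10000, seed=0):
    """Estimate a p-value for the null hypothesis that samples a and b
    come from the same distribution (test statistic: difference of means)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it again.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / n_permutations

# Hypothetical measurements from two groups
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3, 5.8]
p_value = permutation_test(group_a, group_b)
print("estimated p-value:", p_value)
```

Rejecting the null hypothesis whenever the p-value falls below a threshold such as 0.05 means accepting roughly that rate of Type-I errors when the null hypothesis is actually true.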

▪ Bayes & Price (1763), Laplace
(1774)
▪ Probability as a degree of belief
that an event or a proposition is
true
▪ Estimated probabilities updated
as additional data are obtained
▪ Empowered by Markov Chain
Monte Carlo (MCMC) numerical
integration methods (Metropolis
1953; Hastings 1970)
17
https://en.wikipedia.org/wiki/Bayes%27_theorem
https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo
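
Updating a degree of belief as data arrive can be sketched with the simplest conjugate case, a Beta prior on a coin's probability of heads. This case needs no MCMC because the posterior has a closed form. The function name `update_beta` and the coin flips are illustrative assumptions.

```python
def update_beta(alpha, beta, observations):
    """Update a Beta(alpha, beta) belief about a coin's heads probability
    after observing a sequence of 1s (heads) and 0s (tails)."""
    for outcome in observations:
        if outcome == 1:
            alpha += 1  # one more head observed
        else:
            beta += 1   # one more tail observed
    return alpha, beta

# Start from a uniform prior Beta(1, 1) and observe hypothetical coin flips
alpha, beta = update_beta(1, 1, [1, 1, 0, 1, 1, 1, 0, 1])
posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)
```

For models without such a closed form, the posterior is typically approximated by sampling, which is where MCMC methods come in.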

Key Ingredient II: Optimization
18

▪ Legendre, Gauss (early 1800s)
▪ Find the formula that minimizes
the sum of squared errors
(residuals) analytically
19
https://en.wikipedia.org/wiki/Least_squares
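
As a minimal sketch of the analytic solution, assume a straight-line model y = slope * x + intercept. Setting the derivatives of the sum of squared residuals to zero gives the closed-form expressions used below. The function name `fit_line` and the data points are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Return the slope and intercept that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Closed-form least-squares solution for a straight line
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```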

▪ Find local minimum of a
function computationally
▪ Gradient descent (Cauchy
1847) and its variants
▪ More than 150 years later,
this is still what modern
AI/ML/DL systems are
essentially doing!!
▪ Error minimization
20
https://commons.wikimedia.org/wiki/File:Gradient_descent.gif
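
A minimal sketch of plain gradient descent, assuming a one-dimensional error function whose gradient is known in closed form. The function name `gradient_descent`, the learning rate, and the example function are illustrative choices, not taken from the slides.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

# Illustrative error function f(x) = (x - 3)^2, whose gradient is 2 (x - 3)
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Modern variants mostly change how the step is computed (for example, using mini-batches or adaptive step sizes), but the loop keeps this shape.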

▪ Extensively studied and used in
Operations Research
▪ Practical optimization algorithms
under various constraints
21
https://en.wikipedia.org/wiki/Linear_programming
https://en.wikipedia.org/wiki/Integer_programming
https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm

▪ Original idea by Turing (1950)
▪ Genetic algorithm (Holland 1975)
▪ Genetic programming (Cramer 1985, Koza 1988)
▪ Differential evolution (Storn & Price 1997)
▪ Neuroevolution (Stanley & Miikkulainen 2002)
22
https://becominghuman.ai/my-new-genetic-algorithm-for-time-series-f7f0df31343d
https://en.wikipedia.org/wiki/Genetic_programming
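
A minimal genetic algorithm sketch, assuming a toy "OneMax" problem in which fitness is the number of 1s in a bit string. The selection scheme (size-2 tournaments), one-point crossover, and all parameter values are illustrative assumptions rather than Holland's original formulation.

```python
import random

def one_max(bits):
    """Toy fitness function: the number of 1s in the bit string."""
    return sum(bits)

def tournament(population, rng):
    """Pick two random individuals and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if one_max(a) >= one_max(b) else b

def evolve(n_bits=20, pop_size=30, n_generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(n_generations):
        next_population = []
        for _ in range(pop_size):
            parent1 = tournament(population, rng)
            parent2 = tournament(population, rng)
            cut = rng.randint(1, n_bits - 1)           # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                   # bit-flip mutation
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=one_max)   # remember the best seen
    return best

best_individual = evolve()
print(one_max(best_individual), "ones out of", len(best_individual))
```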

▪ Ant colony optimization
(Dorigo 1992)
▪ Particle swarm optimization
(Kennedy & Eberhart 1995)
▪ And various other metaphor-based metaheuristic algorithms
https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics
23
https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms
https://en.wikipedia.org/wiki/Particle_swarm_optimization
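
A minimal particle swarm optimization sketch, assuming a common textbook form of the velocity update with inertia, personal-best, and global-best terms. The function name `particle_swarm`, the coefficient values, and the sphere objective are illustrative assumptions.

```python
import numpy as np

def particle_swarm(objective, dim=2, n_particles=20, n_steps=200,
                   inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    best_pos = pos.copy()                                # personal bests
    best_val = np.array([objective(p) for p in pos])
    global_best = best_pos[np.argmin(best_val)].copy()
    for _ in range(n_steps):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Each particle is pulled toward its own best and the swarm's best
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (global_best - pos))
        pos = pos + vel
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        global_best = best_pos[np.argmin(best_val)].copy()
    return global_best, float(best_val.min())

# Illustrative objective: the sphere function, minimized at the origin
best_position, best_value = particle_swarm(lambda p: float(np.sum(p ** 2)))
print(best_position, best_value)
```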

▪ Unsupervised learning
▪ Find patterns in the data
▪ Supervised learning
▪ Find patterns in the input-output mapping
▪ Reinforcement learning
▪ Learn the world by taking actions and receiving
rewards from the environment
25

▪ Clustering
▪ k-means, agglomerative
clustering, DBSCAN,
Gaussian mixture, community
detection, Jarvis-Patrick, etc.
▪ Anomaly detection
▪ Feature
extraction/selection
▪ Dimension reduction
▪ PCA, t-SNE, etc.
26
https://reference.wolfram.com/language/ref/FindClusters.html
https://commons.wikimedia.org/wiki/File:T-SNE_and_PCA.png
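
As a minimal sketch of clustering, here is a plain NumPy version of k-means (Lloyd's iteration). The function name `k_means` and the six sample points are made up. In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import numpy as np

def k_means(points, k, n_iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assign every point to its nearest center
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move every center to the mean of its assigned points
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

# Hypothetical data: two well-separated groups of three points each
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = k_means(data, k=2)
print(labels)
```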

▪ Regression
▪ Linear regression, Lasso, polynomial
regression, nearest neighbors,
decision tree, random forest,
Gaussian process, gradient boosted
trees, neural networks, support vector
machine, etc.
▪ Classification
▪ Logistic regression, decision tree,
gradient boosted trees, naive Bayes,
nearest neighbors, support vector
machine, neural networks, etc.
▪ Risk of overfitting
▪ Addressed by model selection, cross-
validation, etc.
27
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html

▪ Environment typically
formulated as a Markov
decision process (MDP)
▪ State of the world + agent’s
action
→ next state of the world +
reward
▪ Monte Carlo methods
▪ TD learning, Q-learning
28
https://en.wikipedia.org/wiki/Markov_decision_process
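
A minimal tabular Q-learning sketch, assuming a hypothetical five-state corridor in which the agent starts at the left end and receives a reward of 1 on reaching the right end. The environment, the function name `q_learning`, and all parameter values are illustrative assumptions.

```python
import random

def q_learning(n_states=5, n_episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]    # actions: 0 = left, 1 = right
    goal = n_states - 1
    for _ in range(n_episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)         # explore
            else:
                best = max(q[state])              # exploit, breaking ties randomly
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            # State + action -> next state + reward
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Temporal-difference update toward reward + discounted best next value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
for state, (left, right) in enumerate(q_table):
    print(state, round(left, 3), round(right, 3))
```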

▪ Hopfield (1982)
▪ A.k.a. “attractor networks”
▪ Fully connected networks with
symmetric weights can recover
imprinted patterns from imperfect
initial conditions
▪ “Associative memory”
30
https://github.com/nosratullah/hopfieldNeuralNetwork
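
Pattern recovery from an imperfect initial condition can be sketched as follows, assuming Hebbian imprinting of +1/-1 patterns and asynchronous sign updates. The function names and the two stored patterns are made up for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian imprinting: symmetric weights with no self-connections."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    weights = patterns.T @ patterns / n
    np.fill_diagonal(weights, 0.0)
    return weights

def recall(weights, state, n_sweeps=5):
    """Update one neuron at a time until the state settles."""
    state = np.array(state, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(state)):
            state[i] = 1.0 if weights[i] @ state >= 0 else -1.0
    return state

# Two hypothetical patterns of +1/-1 values to imprint
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]])
weights = train_hopfield(stored)

noisy = stored[0].copy()
noisy[0] = -noisy[0]              # corrupt one element of the first pattern
recovered = recall(weights, noisy)
print(recovered)
```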

▪ Hinton & Sejnowski (1983),
Hinton & Salakhutdinov (2006)
▪ Stochastic, learnable variants
of Hopfield networks
▪ Restricted (bipartite) Boltzmann
machine was at the core of the
HS 2006 Science paper that
ignited the current boom of “Deep
Learning”
31
https://en.wikipedia.org/wiki/Boltzmann_machine
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine

▪ Multilayer perceptron
(Rosenblatt 1958)
▪ Backpropagation (Werbos
1974; Rumelhart, Hinton &
Williams 1986)
▪ Minimization of errors by
gradient descent method
▪ Note that this is NOT how our
brain learns
▪ “Vanishing gradient” problem
32
Computation
Error correction
Input
Output
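
Error minimization by backpropagation can be sketched with a tiny two-layer network trained on XOR. This is a minimal illustration under assumed settings (sigmoid units, squared error, full-batch gradient descent); the function name `train_xor`, the layer size, and the learning rate are not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(n_hidden=4, learning_rate=0.5, n_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    w1 = rng.normal(0.0, 1.0, (2, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0, (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_epochs):
        # Forward pass (computation): input -> hidden -> output
        h = sigmoid(x @ w1 + b1)
        out = sigmoid(h @ w2 + b2)
        losses.append(float(np.mean((out - y) ** 2)))
        # Backward pass (error correction): gradients of half the summed
        # squared error, propagated from the output back to the hidden layer
        d_out = (out - y) * out * (1.0 - out)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        # Gradient descent step on all weights and biases
        w2 -= learning_rate * h.T @ d_out
        b2 -= learning_rate * d_out.sum(axis=0)
        w1 -= learning_rate * x.T @ d_h
        b1 -= learning_rate * d_h.sum(axis=0)
    return losses

losses = train_xor()
print("first loss:", losses[0], "last loss:", losses[-1])
```

The factors `out * (1 - out)` and `h * (1 - h)` are at most 0.25 each, which hints at why gradients shrink as more sigmoid layers are stacked.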

▪ Rumelhart, Hinton & Williams
(1986) (again!)
▪ Feed-forward ANNs that try
to reproduce the input
▪ Smaller intermediate layers
→ dimension reduction,
feature learning
▪ HS 2006 Science paper also
used restricted Boltzmann
machines as stacked
autoencoders
33
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
https://doi.org/10.1126/science.1127647

▪ Hopfield (1982);
Rumelhart, Hinton &
Williams (1986) (again!!)
▪ ANNs that contain
feedback loops
▪ Have internal states and
can, in principle, learn
temporal behaviors with
arbitrarily long-term dependencies
▪ In practice, training suffers
from vanishing or exploding
long-term gradients
34
https://commons.wikimedia.org/wiki/File:Neuronal-Networks-Feedback.png
https://en.wikipedia.org/wiki/Recurrent_neural_network
[Figure: a recurrent neural network unfolded in time, with hidden states h(t-1), h(t), h(t+1), outputs o(t-1), o(t), o(t+1), and shared weights V]

▪ Hochreiter & Schmidhuber
(1997)
▪ An improved neural module
for RNNs that can learn long-
term dependencies
effectively
▪ Vanishing gradient problem
resolved by hidden states
and error flow control
▪ “The most cited NN paper of
the 20th century”
35

▪ Actively studied since the 2000s
▪ Use inherent behaviors of
complex dynamical systems
(usually a random RNN) as
a “reservoir” of various
solutions
▪ Learning takes place only at
the readout layer (i.e., no
backpropagation needed)
▪ Discrete-time, continuous-
time versions
36
https://doi.org/10.1515/nanoph-2016-0132
https://doi.org/10.1103/PhysRevLett.120.024102
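
Training only the readout layer can be sketched with a small discrete-time echo state network that predicts the next value of a sine wave. The reservoir size, spectral radius, washout length, and ridge penalty are illustrative assumptions, and the reported error is measured on the training data only.

```python
import numpy as np

def run_reservoir(inputs, n_reservoir=50, spectral_radius=0.9, seed=0):
    """Drive a fixed random recurrent network and record its states."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale the random recurrent weights to the chosen spectral radius
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    states = np.zeros((len(inputs), n_reservoir))
    state = np.zeros(n_reservoir)
    for t, u in enumerate(inputs):
        state = np.tanh(w @ state + w_in * u)   # reservoir weights never change
        states[t] = state
    return states

signal = np.sin(0.2 * np.arange(400))
states = run_reservoir(signal[:-1])
targets = signal[1:]                  # task: predict the next value

washout = 50                          # discard the initial transient
X, Y = states[washout:], targets[washout:]
ridge = 1e-6
# The only learning step: a linear readout fitted by ridge regression
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
training_mse = float(np.mean((X @ w_out - Y) ** 2))
print("training MSE:", training_mse)
```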

▪ Self-organizing map (Kohonen 1982)
▪ Neural gas (Martinetz & Schulten 1991)
▪ Spiking neural networks (1990s-)
▪ Hierarchical Temporal Memory (2004-)
etc…
37
https://en.wikipedia.org/wiki/Self-organizing_map
https://doi.org/10.1016/j.neucom.2019.10.104
https://numenta.com/neuroscience-research/sequence-learning/

▪ Ideas originally around since
the beginning of ANNs
▪ Became feasible and popular
in the 2010s because of:
▪ Huge increase in available
computational power thanks
to GPUs
▪ Wide availability of training
data over the Internet
39
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png
https://www.techradar.com/news/computing-components/graphics-cards/best-graphics-cards-1291458

▪ Fukushima (1980), Homma
et al. (1988), LeCun et al.
(1989, 1998)
▪ DNNs with convolution
operations between layers
▪ Layers represent spatial
(and/or temporal) patterns
▪ Many great applications to
image/video/time series
analyses
40
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://cs231n.github.io/convolutional-networks/
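
The convolution operation between layers can be sketched as a small filter sliding over an image. The code below computes a "valid" cross-correlation, which is what deep learning libraries usually call convolution. The function name `conv2d`, the image, and the kernel are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
# Kernel that responds where intensity increases from left to right
kernel = np.array([[-1, 1],
                   [-1, 1]])
feature_map = conv2d(image, kernel)
print(feature_map)
```

The resulting feature map is nonzero only in the column where the edge sits, which illustrates how a layer can represent a spatial pattern.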

41
https://arxiv.org/abs/1412.6572
https://en.wikipedia.org/wiki/Generative_adversarial_network
▪ Goodfellow et al. (2014a,b)
▪ DNNs are vulnerable
to adversarial attacks
▪ Exploit this vulnerability to create
co-evolutionary systems of a
generator and a discriminator
https://commons.wikimedia.org/wiki/File:A-Standard-GAN-and-b-conditional-GAN-architecturpn.png

▪ Scarselli et al. (2008),
Kipf & Welling (2016)
▪ Non-regular graph
structure used as
network topology
within each layer of
DNN
▪ Applications to graph-based
data modeling, e.g.,
social networks,
molecular biology
42
https://tkipf.github.io/graph-convolutional-networks/
https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780
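
Using the graph structure inside a layer can be sketched with a single graph convolution step in the style of Kipf & Welling: node features are averaged over neighbors through a normalized adjacency matrix and then passed through a weight matrix and a ReLU. The function name `gcn_layer`, the three-node path graph, and the weights are illustrative assumptions.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(degree ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt         # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)

# Hypothetical path graph 0 - 1 - 2 with one-hot node features
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
features = np.eye(3)
weights = np.ones((3, 2))      # made-up weights; normally learned by training
hidden = gcn_layer(adjacency, features, weights)
print(hidden)
```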

▪ Vaswani et al. (2017)
▪ DNNs with self-attention
mechanism for natural
language processing (NLP)
▪ Enhanced parallelizability
leading to shorter training time
than LSTM
▪ BERT (2018) for Google search
▪ Massive language models:
OpenAI’s GPT-3 (2020),
Google’s Switch Transformer
(2021), etc.
43
https://arxiv.org/abs/1706.03762
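
The self-attention mechanism can be sketched as single-head scaled dot-product attention. The token embeddings and projection matrices below are random placeholders, and the names `self_attention`, `w_q`, `w_k`, and `w_v` are illustrative. A full Transformer adds multiple heads, positional information, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Every token attends to every token in the same sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # scaled dot products
    attention = softmax(scores, axis=-1)       # each row sums to 1
    return attention @ v, attention

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 tokens, 8-dim embeddings (made up)
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
output, attention = self_attention(x, w_q, w_k, w_v)
print(output.shape, attention.shape)
```

All rows of the attention matrix are computed in one matrix product rather than step by step along the sequence, which is the source of the parallelizability advantage over recurrent models such as LSTM.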

44
OpenAI GPT-3 / DALL-E
https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3

46
Time series analysis
• Autoregression, ARMA/ARIMA, time series
embedding, phase space reconstruction, etc.
Natural language processing (NLP)
• Classic syntactic/semantic approaches
Information theory
• Entropy, mutual information
Computation theory
• Automata, computational complexity

47
Brain/neuroscience, cognitive science
Complex systems and networks
Robotics and control
Consciousness, sentience, self

▪ Python!!
▪ scikit-learn
▪ TensorFlow / Keras
▪ PyTorch
▪ Mathematica, MATLAB
48

Research Examples (of My Own)
49

50
Zamani Esfahlani, F. et al. (2018). A network-based classification framework
for predicting treatment response of schizophrenia patients. Expert Systems
with Applications, 109, 152-161. https://doi.org/10.1016/j.eswa.2018.05.005
Graduate Award for Excellence
in Research (2018)

51
Cao, Y., et al. (2022). Visualizing collective
idea generation and innovation processes in
social networks. IEEE Transactions on
Computational Social Systems.
https://doi.org/10.1109/TCSS.2022.3184628

52
Dong, Y. et al. (2021).
Utterance clustering using
stereo audio channels.
Computational Intelligence
and Neuroscience, 2021,
6151651.
https://doi.org/10.1155/2021/6151651

53
Sayama, H. (2022). Social fragmentation transitions in
large-scale adaptive social network simulations,
Proceedings of the 14th International Conference on
Parallel Processing and Applied Mathematics (PPAM 2022)
/ 7th Workshop on Complex Collective Systems, Springer,
in press. https://arxiv.org/abs/2205.10489

▪ Words, numbers, facts
▪ Maintaining stability and plasticity
▪ Catastrophic forgetting
▪ Transfer Learning
▪ Application of acquired knowledge
to different problems
55
https://spectrum.ieee.org/openai-dall-e-2
https://www.invistaperforms.org/getting-ahead-forgetting-curve-training/
https://www.analyticsvidhya.com/blog/2021/10/understanding-transfer-learning-for-deep-learning/

56
https://spectrum.ieee.org/openai-dall-e-2
istockphoto.com

59
https://www.wired.com/story/deepfakes-getting-better-theyre-easy-spot/

60
https://www.analyticsvidhya.com/blog/2022/03/the-carbon-footprint-of-ai-and-deep-learning/

61
Fall 2020: “How to
safely reopen the
campus”

63
https://en.wikipedia.org/wiki/Tree_of_life_(biology)

Are We Getting Any Closer to the Understanding of True “Intelligence”?
64

▪ Don’t get drowned in the vast
ocean of methods and tools
▪ Hundreds of years of history
▪ Buzzwords and fads keep changing
▪ Keep the big picture in mind –
focus on what the real problem is
and how you will solve it
▪ Being able to develop unique,
original, creative solutions is
key to differentiating your
intelligence from AI/machines
65

▪ Wikipedia, various websites and many AI/ML bloggers
for great info and images!!
▪ The following people for providing feedback on the
initial version:
▪ Sofia Teixeira, Arseny Krasikov, Odai Yousef Dweekat,
Mohammed Jarbou, Seth Bullock, Dobromir Dotov, and
others
66
