Unit-3
• Pseudo-Random Numbers: Random number generation, Inverse-transform, acceptance-rejection,
transformations, multivariate probability calculations.
• Monte Carlo Integration: Simulation and Monte Carlo integration, variance reduction, Monte Carlo
hypothesis testing, antithetic variables/control variates, importance sampling, stratified sampling
• Markov chain Monte Carlo (MCMC): Markov chains; Metropolis-Hastings algorithm; Gibbs sampling;
convergence
Pseudo Random Number Generator
(PRNG)
• A Pseudo-Random Number Generator (PRNG) is an algorithm that
creates a sequence of numbers that look random but are actually
generated using a mathematical formula. It starts with an initial value
called a seed. If you know the seed, you can recreate the same
sequence of numbers. PRNGs are fast and efficient, and they generate
numbers that behave like random numbers, even though they are
predictable.
Types of Pseudo-Random Number Generators
• Linear Congruential Generator (LCG):
• Formula: X(n+1) = (a · Xn + c) mod m
• Simple and fast, but has a limited period (the sequence eventually repeats).
• Example: Used in early programming languages like C and Java.
Where:
• Xn: the current value (the seed initially).
• a: multiplier.
• c: increment.
• m: modulus (defines the range).
• Let’s generate a sequence of random numbers with the following
parameters:
• Seed (X0) = 1
• a=4, c=1, m=9
• Step-by-step calculation:
1. X1 = (4×1 + 1) mod 9 = 5
2. X2 = (4×5 + 1) mod 9 = 3
3. X3 = (4×3 + 1) mod 9 = 4
4. X4 = (4×4 + 1) mod 9 = 8
Generated sequence: 1, 5, 3, 4, 8
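A minimal Python sketch of this LCG (the generator function and its name are our own illustration, not a standard library API):

def lcg(seed, a=4, c=1, m=9):
    # Linear congruential generator: X(n+1) = (a*Xn + c) mod m
    x = seed
    while True:
        yield x
        x = (a * x + c) % m

gen = lcg(seed=1)
print([next(gen) for _ in range(5)])  # [1, 5, 3, 4, 8]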
Mersenne Twister (MT)
• The Mersenne Twister is a widely used pseudo-random number generator
known for its high-quality random numbers and long period of 2^19937 − 1.
It is efficient, fast, and generates numbers that are statistically random.
• Key Features
1. Period: Very long period (2^19937 − 1), meaning the sequence doesn't
repeat for a very long time.
2. Efficiency: Generates large batches of random numbers quickly.
3. Dimensions: Designed to avoid correlation in multiple dimensions.
4. Applications: Used in simulation, modeling, and gaming applications. Most
programming languages, including Python, use it as their default PRNG.
import random

# Set a seed for reproducibility
random.seed(42)

# Generate random numbers
print("Random Number 1:", random.random())         # float between 0 and 1
print("Random Number 2:", random.randint(1, 100))  # integer between 1 and 100
print("Random Number 3:", random.uniform(10, 20))  # float between 10 and 20

# Output:
# Random Number 1: 0.6394267984578837
# Random Number 2: 81
# Random Number 3: 13.375387409862836
• Applications
• Gaming: For generating random events like shuffling cards or
spawning items.
• Simulations: For Monte Carlo simulations where high-quality
randomness is required.
• Machine Learning: For random initialization of weights and splitting
datasets.
The Mersenne Twister is popular due to its efficiency and statistical
quality, making it suitable for general-purpose randomness. However,
it’s not cryptographically secure and should not be used for encryption.
XOR-Shift Generators
• XOR-Shift generators are a family of efficient and fast pseudo-random
number generators (PRNGs) that use bitwise operations like XOR and
bit-shifting to produce random numbers.
How It Works
• The generator maintains an internal state (a seed), which is updated
iteratively using bitwise operations.
• The key idea is to shift bits of the internal state to the left or right and
combine them using the XOR operation.
• Algorithm
1. Start with an initial state x (seed).
2. Apply a sequence of XOR and shift operations:
   x = x ⊕ (x ≫ a)
   x = x ⊕ (x ≪ b)
   x = x ⊕ (x ≫ c)
   where a, b, c are fixed integers and ≫, ≪ are the bitwise right- and
   left-shift operators.
• The XOR (Exclusive OR) operation is a fundamental bitwise operation
in computer science and cryptography. It compares two bits and
returns
• 1 if the bits are different.
• 0 if the bits are the same.
• Example
• Assume the initial state x = 123456789, with a = 21, b = 35, c = 4:
1. x = x ⊕ (x ≫ 21)
2. x = x ⊕ (x ≪ 35)
3. x = x ⊕ (x ≫ 4)
• This produces the next random number x. Repeat the steps to generate a
sequence.
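A minimal Python sketch of one xorshift update on a 64-bit state, using the shift directions and constants from the example above (the 64-bit mask is our addition, since Python integers are unbounded):

MASK64 = (1 << 64) - 1  # keep the state within 64 bits

def xorshift64(x, a=21, b=35, c=4):
    # One xorshift step: XOR the state with shifted copies of itself
    x ^= x >> a
    x ^= (x << b) & MASK64
    x ^= x >> c
    return x & MASK64

x = 123456789
for _ in range(3):
    x = xorshift64(x)
    print(x)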
Applications
• Used in simulations, games, and lightweight random number generation.
• Faster than many PRNGs like the Mersenne Twister but less statistically robust.
Middle-Square Method
• The Middle-Square Method is a simple PRNG that generates a
sequence of numbers by squaring the current number and extracting
the middle digits as the next number.
• Algorithm
1. Start with an initial seed x0.
2. Square xn to get y.
3. Extract the middle digits of y to form x(n+1).
• Example
• Seed: x0 = 1234
1. Square: 1234^2 = 1522756.
2. Extract the middle digits: 2275, so x1 = 2275.
3. Repeat: 2275^2 = 5175625, middle digits 1756, so x2 = 1756.
• Generated sequence: 1234, 2275, 1756, …
Limitations
• Short period: The sequence often repeats quickly.
• Sensitive to the seed: Some seeds can produce poor randomness.
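A short Python sketch of the method. Note that digit-extraction conventions vary; this version zero-pads the square to twice the seed's width and takes the central digits, so its values differ from the hand-worked example above:

def middle_square(seed, n_digits=4):
    # Square the seed, zero-pad to 2*n_digits, and keep the middle digits
    squared = str(seed ** 2).zfill(2 * n_digits)
    start = (len(squared) - n_digits) // 2
    return int(squared[start:start + n_digits])

x = 1234
for _ in range(3):
    x = middle_square(x)
    print(x)  # 5227, 3215, 3362 under this padding convention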
Lagged Fibonacci Generator (LFG)
• The Lagged Fibonacci Generator generates random numbers using a
recurrence relation similar to the Fibonacci sequence.
• Algorithm
1. Define a lagged recurrence relation:
   Xn = (X(n−j) op X(n−k)) mod m
   where j, k are the lags, op is an operation (e.g., addition, subtraction,
   XOR), and m is the modulus.
2. Initialize with k seed values.
• Example
• Let j = 2, k = 5, m = 10, with initial values X1,…,X5 = 1, 2, 3, 4, 5.
1. X6 = (X4 + X1) mod 10 = (4 + 1) mod 10 = 5
2. X7 = (X5 + X2) mod 10 = (5 + 2) mod 10 = 7
• Generated sequence: 1, 2, 3, 4, 5, 5, 7, …
Applications
• Useful for simulations requiring high-quality randomness.
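A minimal Python sketch of this additive lagged Fibonacci generator, using the lags and seed values from the example (the function is our own illustration):

def lfg(seeds, j=2, k=5, m=10, count=7):
    # Additive lagged Fibonacci: Xn = (X(n-j) + X(n-k)) mod m
    state = list(seeds)
    while len(state) < count:
        state.append((state[-j] + state[-k]) % m)
    return state

print(lfg([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5, 5, 7]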
Cryptographically Secure PRNGs (CSPRNGs)
• CSPRNGs are PRNGs designed for cryptographic purposes, where randomness
must be secure and unpredictable.
• Key Features
1.Unpredictability: Even with knowledge of part of the sequence, the next
numbers cannot be predicted.
2.High Entropy: Numbers have high randomness quality.
• Applications
• Encryption keys.
• Session tokens.
• Secure password generation.
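For example, Python's built-in secrets module exposes a CSPRNG backed by the operating system's entropy source (a brief illustration):

import secrets

print(secrets.token_hex(16))      # 32-character hex string, e.g. a session token
print(secrets.randbelow(100))     # secure integer in [0, 100)
print(secrets.token_urlsafe(16))  # URL-safe token, e.g. for password resets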
Inverse-Transform Method
• The Inverse-Transform Method is a technique to generate random
samples from a specific probability distribution using uniformly
distributed random numbers.
The key idea is to use the inverse of the cumulative distribution
function (CDF) of the desired distribution.
• Steps to Apply the Inverse-Transform Method
1.Start with the CDF F(x) :
Identify the cumulative distribution function of the desired probability
distribution. The CDF maps values of the random variable X to a range
of probabilities between 0 and 1.
• Generate a Uniform Random Number U :
Generate a random number U from a uniform distribution U(0,1)
• Solve for X:
Use the equation F(X)=U and solve for X to get a value distributed
according to the target distribution.
• How It Works
1.Start with the CDF F(x) :
The CDF gives the probability that a random variable is less than or
equal to x .
2. Generate a Random Number U:
Generate a random number U from a uniform distribution between 0
and 1 (U ∼ U(0,1)).
3.Solve for X :
Use the equation F(X)= U and solve for X . This gives the random
number X that follows the desired distribution.
• Example: Exponential Distribution
• Step 1: CDF of the Exponential Distribution
• For an exponential distribution with rate λ the CDF is:
• F(x) = 1 − e^(−λx)
Step 2: Generate a Uniform Random Number
• Suppose we generate U=0.5.
• Step 3: Solve for X
• Set F(X)= U:
• U = 1 − e^(−λX)
• Rearranging to solve for X:
X = −(1/λ) ln(1 − U)
• Step 4: Plug in Values
• If λ = 2:
X = −(1/2) ln(1 − 0.5) ≈ 0.35
• Key Points
• Input: Uniform random number U between 0 and 1.
• Output: A random number X that follows the desired distribution.
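A short Python sketch of the exponential example above, assuming rate λ = 2 (the helper function is our own):

import math
import random

def sample_exponential(lam):
    # Inverse-transform sampling: X = -(1/lam) * ln(1 - U), U ~ U(0,1)
    u = random.random()
    return -math.log(1 - u) / lam

random.seed(0)
samples = [sample_exponential(lam=2) for _ in range(10000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 0.5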
Acceptance-Rejection Method:
• The Acceptance-Rejection Method is a technique for generating
random numbers from a target probability distribution when direct
sampling is difficult. It uses a simpler proposal distribution to generate
samples and then accepts or rejects them based on a specific
condition.
Steps to Apply the Method
1. Choose a Proposal Distribution: Select a proposal distribution g(x) that is
easy to sample from. Ensure it satisfies f(x) ≤ c·g(x) for all x, where f(x)
is the target distribution and c ≥ 1 is a constant.
2. Generate a Sample: Generate a random sample Y from the proposal
distribution g(x).
3. Generate a Uniform Random Number: Generate U, a uniform random
number between 0 and 1.
4. Acceptance Condition: Accept the sample if:
   U ≤ f(Y) / (c·g(Y))
5. Otherwise, reject the sample and repeat the process.
Key Idea
• The method works by generating samples from an easy-to-sample
distribution g(x) and filtering them based on how closely they match the
desired distribution f(x).
• Rejecting samples ensures the accepted ones follow the target
distribution.
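A minimal Python sketch, using the target f(x) = 6x(1−x) (a Beta(2,2) density on [0,1]) with a uniform proposal g(x) = 1 and c = 1.5; these particular choices are our assumed example:

import random

def target_pdf(x):
    # Beta(2,2) density on [0,1]; its maximum is 1.5, so f(x) <= 1.5 * g(x)
    return 6 * x * (1 - x)

def accept_reject(n, c=1.5):
    samples = []
    while len(samples) < n:
        y = random.random()  # sample from the uniform proposal g
        u = random.random()  # uniform random number for the acceptance test
        if u <= target_pdf(y) / (c * 1.0):  # g(y) = 1 on [0,1]
            samples.append(y)
        # otherwise reject and draw again
    return samples

random.seed(1)
xs = accept_reject(10000)
print(sum(xs) / len(xs))  # close to the Beta(2,2) mean of 0.5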
• Multivariate probability calculations deal with the analysis of random
variables in a multivariate setting, i.e., when we consider more than
one random variable at a time. These calculations involve
understanding the joint behavior, relationships, and distributions of
these variables.
• Random Variables and Joint Distribution
• If X1, X2, …, Xn are random variables, their joint probability
distribution describes the probability that these variables
simultaneously take certain values.
• For discrete variables:
P(X1 = x1, X2 = x2, …, Xn = xn)
• For continuous variables: fX1,X2,…,Xn(x1, x2, …, xn)
• Marginal Probability
• The probability of a subset of variables, ignoring the others.
• For continuous random variables: fX1(x1) = ∫ fX1,X2(x1, x2) dx2,
integrating x2 from −∞ to ∞.
• Conditional Probability
• The probability of one variable given another:
fX2|X1(x2 | x1) = fX1,X2(x1, x2) / fX1(x1)
Transformations
• What are Transformations in Random Number Generation?
• Transformations are mathematical methods used to convert random
numbers generated from a uniform distribution (e.g., U(0,1)) into
random numbers following other desired probability distributions.
These techniques are essential when we need random samples from
complex distributions in simulations, statistical models, and
machine-learning algorithms.
Why is Transformation Important in AI/ML?
• Simulation: Simulating realistic data based on specific distributions
(e.g., Gaussian noise in deep learning).
• Statistical Models: Many machine learning models rely on
assumptions about data distributions.
• Monte Carlo Methods: Used in reinforcement learning and
probabilistic modeling.
Key Transformation Methods
• Linear Transformation
Converts a uniform random variable U ∼ U(0,1) into a random variable X
in a different range [a, b]:
X = a + (b − a)·U
• Inverse Transform Sampling
If the cumulative distribution function (CDF) F(x) of a random variable X
is known, we can generate X by:
X = F^(−1)(U)
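A one-line Python sketch of the linear transformation (inverse transform sampling was already sketched in the exponential example earlier):

import random

def linear_transform(u, a, b):
    # Map U ~ U(0,1) onto [a, b]: X = a + (b - a) * U
    return a + (b - a) * u

random.seed(3)
print(linear_transform(random.random(), 10, 20))  # a float between 10 and 20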
Applications in AI/ML
1.Data Augmentation: Adding noise to training data for robustness.
2.Generative Models: Sampling from latent space in GANs or VAEs.
3.Bayesian Networks: Sampling for posterior distributions in
probabilistic models.
4.Optimization: Random search and simulated annealing require
specific distributions.
Multivariate Probability Calculations
• What is Multivariate Probability?
• Multivariate probability involves calculating probabilities and
expectations for random variables that are interdependent and can be
represented in multiple dimensions. This is crucial in AI/ML when
working with features, multivariate distributions, or joint probabilities
in datasets.
Key Concepts
• Joint Probability Distribution:
• Describes the probability of two or more random variables occurring
together.
• For random variables X1, X2, …, Xn, the joint probability density function
(PDF) is: fX1,X2,…,Xn(x1, x2, …, xn)
• Example: Probability of a student scoring high in math and AI
• Marginal Probability:
• Probability of a subset of variables, regardless of the others.
• Obtained by integrating or summing over the other variables:
fX1(x1) = ∫ fX1,X2(x1, x2) dx2, integrating x2 from −∞ to ∞
Multivariate Probability Example: Weather
and Attendance
• Imagine you are a teacher and you observe two things daily:
• Weather (W): Whether it is Sunny or Rainy.
• Student Attendance (A): Whether the class is Full or Not Full.
• Observations
Day   Weather (W)   Attendance (A)
1     Sunny         Full
2     Rainy         Not Full
3     Sunny         Full
4     Sunny         Full
5     Rainy         Not Full
6     Sunny         Not Full
7     Rainy         Not Full
8     Sunny         Full
9     Rainy         Not Full
10    Sunny         Full
Weather (W)   Attendance (A)   Count   Joint Probability P(W, A)
Sunny         Full             5       P(Sunny, Full) = 5/10 = 0.5
Sunny         Not Full         1       P(Sunny, Not Full) = 1/10 = 0.1
Rainy         Full             0       P(Rainy, Full) = 0/10 = 0.0
Rainy         Not Full         4       P(Rainy, Not Full) = 4/10 = 0.4
Marginal Probability
The marginal probability is the probability of one event, regardless of the other.
Probability of Sunny: P(Sunny) = Sunny Days / Total Days = 6/10 = 0.6
• Probability of Full Attendance:
• P(Full) = Days with Full Attendance / Total Days = 5/10 = 0.5
• Conditional Probability
• The conditional probability answers questions like: "If it is sunny, what
is the probability of full attendance?"
• P(Full | Sunny) = P(Sunny, Full) / P(Sunny)
• From the table:
• P(Sunny, Full) = 0.5
• P(Sunny) = 0.6
• So P(Full | Sunny) = 0.5 / 0.6 ≈ 0.83
Independence Check
• Are Weather and Attendance independent? If independent, P(W, A) = P(W) · P(A).
• Let's check P(Sunny, Full):
• P(Sunny, Full) = 0.5
• P(Sunny) · P(Full) = 0.6 · 0.5 = 0.3
• Since 0.5 ≠ 0.3, Weather and Attendance are NOT independent.
• Visualization
• This can also be visualized in a simple bar chart:
• Sunny, Full Attendance (5 days): 50%
• Sunny, Not Full Attendance (1 day): 10%
• Rainy, Full Attendance (0 days): 0%
• Rainy, Not Full Attendance (4 days): 40%
• Why is This Useful in AI/ML?
1. Feature Relationships: Understanding how one feature affects another.
   Example: Does weather (feature) affect sales or attendance (output)?
2. Probabilistic Models: Naive Bayes, Bayesian Networks, and HMMs rely
   on joint, marginal, and conditional probabilities.
3. Data Imputation: If some values are missing, conditional probabilities
   can help estimate them.
Monte Carlo Integration
• Simulation and Monte Carlo Integration
• What is Simulation?
• Simulation involves creating a computational model to mimic the
behavior of a real-world process or system. In AI/ML, simulations are
often used to:
• Generate synthetic data.
• Test machine learning models in controlled environments.
• Solve problems where analytical solutions are difficult or impossible.
Examples in AI/ML:
• Reinforcement Learning: Simulating an environment (e.g., a robot
navigating a maze) to train an agent.
• Synthetic Data Generation: Simulating customer behavior for e-
commerce recommendations.
What is Monte Carlo Integration?
• Monte Carlo Integration is a computational method to approximate
integrals using random sampling. This technique is particularly useful
when:
• The function is complex, and traditional numerical methods fail.
• The integral has many dimensions (e.g., over a high-dimensional
space).
Monte Carlo methods are foundational for:
• Probabilistic Models: Estimating expectations and marginal probabilities.
• Bayesian Inference: Computing posterior probabilities.
• Reinforcement Learning: Estimating value functions.
Key Concept of Monte Carlo Integration
1. Random Sampling: Generate N random points x1, …, xN uniformly in the
   interval [a, b].
2. Compute the Mean: Evaluate f at these points and compute:
   I ≈ (b − a) · (1/N) Σ f(xi)
3. Interpretation: The integral is approximated as the average of the
   function values at the random points, scaled by the length of the
   interval (b − a).
Mathematical Problem: Estimate the Integral I = ∫ e^(−x^2) dx over [0, 2]
The function e^(−x^2) does not have a simple closed-form antiderivative.
However, numerical methods or symbolic solvers give us an
approximate value:
I ≈ 0.882081.
Monte Carlo Integration
• To approximate this integral using Monte Carlo Integration:
• The domain of integration is [0, 2], so we randomly sample points
x in this interval.
• For each random point x, we evaluate f(x) = e^(−x^2).
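A short Python sketch of this estimate (the sample count N is our choice):

import math
import random

def mc_integrate(f, a, b, n=100000):
    # Monte Carlo integration: (b - a) times the mean of f at uniform points
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

random.seed(0)
print(mc_integrate(lambda x: math.exp(-x ** 2), 0, 2))  # close to 0.882081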
Variance Reduction in Monte Carlo
Integration
• Variance reduction techniques are methods used to improve the
accuracy of Monte Carlo estimates without increasing the number of
samples. These techniques are essential in scenarios where
computational resources are limited or where reducing error is critical.
• Why Variance Reduction?
• In Monte Carlo integration, the accuracy of the estimate depends on the
variance of the sampled values. The standard error of the estimate is
proportional to σ/√N, where:
• σ² is the variance of the function values.
• N is the number of samples.
Example: estimate I = ∫ x^2 dx over [0, 1]. From calculus, the exact
value of the integral is:
[x^3/3] evaluated from 0 to 1 = 1/3 = 0.33333…
Monte Carlo Integration Without Variance Reduction
• Generate N random points x1, x2, …, xN uniformly from [0, 1].
• Compute the mean of the function values: I ≈ (1/N) Σ xi^2.
Monte Carlo Integration With Variance Reduction Using Antithetic Variables
• Generate N/2 random points x1, x2, …, x(N/2) uniformly from [0, 1].
• For each xi, compute f(xi) = xi^2 and f(1 − xi) = (1 − xi)^2.
• Compute the mean of the combined function values.
Example Calculation
• Let's compute this for N = 4:
• Without Variance Reduction
• Random samples: x = [0.2, 0.6, 0.8, 0.1]
• Function values: f(x) = [0.04, 0.36, 0.64, 0.01]
• Estimate of the integral: (0.04 + 0.36 + 0.64 + 0.01) / 4 = 0.2625
Results
• Without Variance Reduction: The result will fluctuate more due to higher
variance.
• With Variance Reduction: The result will have lower variance and be
closer to 0.333.
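A Python sketch comparing the two estimators for this integral (sample sizes are our choice):

import random

def plain_mc(n):
    # Plain Monte Carlo estimate of the integral of x^2 over [0, 1]
    return sum(random.random() ** 2 for _ in range(n)) / n

def antithetic_mc(n):
    # Antithetic variables: pair each u with 1 - u to cancel fluctuations
    total = 0.0
    for _ in range(n // 2):
        u = random.random()
        total += u ** 2 + (1 - u) ** 2
    return total / n

random.seed(0)
print(plain_mc(1000))       # fluctuates around 1/3
print(antithetic_mc(1000))  # typically closer to 1/3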
Monte Carlo Hypothesis Testing
• Monte Carlo hypothesis testing uses random sampling to estimate p-
values and test statistical hypotheses when analytical or direct
solutions are difficult to compute. This method is especially useful in
cases involving complex probability distributions or small sample sizes.
Steps in Monte Carlo Hypothesis Testing
• Define the Null Hypothesis (H0):
• Example: H0: The observed data is consistent with a given distribution.
• Choose a Test Statistic (T):
• Example: The mean, variance, or a distance metric.
• Simulate Data Under H0:
• Generate random datasets under the null hypothesis.
• Compute the Test Statistic for Simulated Data:
• Calculate the test statistic T for each simulated dataset.
Compare the Observed Test Statistic to the Simulated Distribution
• Compute the proportion of simulated statistics greater than or equal to the
observed value. This is the p-value:
p-value = (number of simulated T ≥ T_observed) / (number of simulations)
• Problem
• We want to test whether a given dataset X = {0.1, 0.2, 0.3, 0.8, 0.9}
comes from a uniform distribution U(0,1).
• Null Hypothesis (H0)
• The data is drawn from U(0,1).
• Test Statistic
• The test statistic is the mean of the data:
• T_observed = mean of the observed data = (0.1 + 0.2 + 0.3 + 0.8 + 0.9)/5 = 0.46
• Interpretation
• If the p-value is small (e.g., < 0.05), reject H0: the observed data is
unlikely to come from U(0,1).
• If the p-value is large, there is insufficient evidence to reject H0.
• Advantages of Monte Carlo Hypothesis Testing
1. Flexibility: Works with any test statistic, even if its distribution is
   unknown.
2. Applicability: Useful for small sample sizes or complex null
   distributions.
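A compact Python sketch of the uniformity test above, following the one-sided p-value rule from the steps (the simulation count is our choice; two-sided variants are also common):

import random

observed = [0.1, 0.2, 0.3, 0.8, 0.9]
t_observed = sum(observed) / len(observed)  # 0.46

random.seed(0)
n_sims = 10000
count = 0
for _ in range(n_sims):
    sim = [random.random() for _ in range(len(observed))]  # data under H0
    if sum(sim) / len(sim) >= t_observed:  # simulated statistic >= observed
        count += 1

print("p-value:", count / n_sims)  # well above 0.05, so do not reject H0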
Antithetic Variables
• Concept
• Generate pairs of negatively correlated random variables.
• Using these pairs reduces the variance of the estimator by balancing
over- and underestimations.
• Control Variates
• Concept
• Use a function g(x) (control variate) with a known expected value
E[g(x)] to adjust the estimator.
• Reduces variance if f(x) and g(x) are highly correlated.
• Importance Sampling
• Concept
• Sample from a distribution p(x) that resembles f(x) rather than
sampling uniformly.
• Reformulate the integral as:
∫ f(x) dx = ∫ [f(x)/p(x)] p(x) dx ≈ (1/N) Σ f(xi)/p(xi), with xi ∼ p(x)
• Stratified Sampling
• Concept
• Divide the integration domain into k strata and sample independently
within each stratum.
• Ensures better coverage of the domain, reducing the chance of
missing significant regions.
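A Python sketch of stratified sampling for the integral of x^2 over [0, 1], with k = 10 equal strata (our assumed setup):

import random

def stratified_mc(f, k=10, per_stratum=100):
    # Sample uniformly within each of k equal-width strata of [0, 1]
    total = 0.0
    width = 1.0 / k
    for i in range(k):
        lo = i * width
        for _ in range(per_stratum):
            total += f(random.uniform(lo, lo + width))
    return total / (k * per_stratum)

random.seed(0)
print(stratified_mc(lambda x: x ** 2))  # close to 1/3, with reduced variance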
Unit-3 of mathematical foundation of ai ml
Markov Chains
• A Markov chain is a stochastic process that transitions between states
in a state space, where the probability of transitioning to the next
state depends only on the current state (the memoryless property):
P(X(n+1) = x | Xn, X(n−1), …, X0) = P(X(n+1) = x | Xn)
• State Space: The set of all possible states.
• Transition Matrix: Describes probabilities of moving from one state to
another.
• Stationary Distribution: A distribution that remains unchanged as the
Markov chain evolves.
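A small Python sketch of a two-state chain (the transition probabilities are an assumed example), showing the distribution approaching its stationary distribution under repeated transitions:

# Two states: 0 = Sunny, 1 = Rainy; P[i][j] = probability of moving i -> j
P = [[0.8, 0.2],
     [0.5, 0.5]]

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(50):
    # One step of the chain: new_dist = dist * P
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

print(dist)  # approaches the stationary distribution [5/7, 2/7]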
Metropolis-Hastings Algorithm
• The Metropolis-Hastings (MH) algorithm is a Markov Chain Monte Carlo
(MCMC) method used to sample from a target probability distribution
when direct sampling is difficult.
1. Initialize: Start with an initial state x0.
2. Proposal: Propose a new state x′ using a proposal distribution q(x′ | x).
3. Acceptance Probability: Compute the acceptance probability α:
   α = min(1, [π(x′) q(x | x′)] / [π(x) q(x′ | x)])
4. Accept/Reject:
   • Accept with probability α, i.e., set x(t+1) = x′.
   • Otherwise, set x(t+1) = x(t).
5. Repeat: Generate a sequence that approximates samples from the target
   distribution π(x).
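A minimal Python sketch of Metropolis-Hastings targeting a standard normal with a symmetric random-walk proposal (our assumed example; with a symmetric proposal the q terms cancel in α):

import math
import random

def target(x):
    # Unnormalized standard normal density
    return math.exp(-x * x / 2)

random.seed(0)
x = 0.0
samples = []
for _ in range(10000):
    x_new = x + random.gauss(0, 1)  # symmetric random-walk proposal
    alpha = min(1.0, target(x_new) / target(x))
    if random.random() < alpha:
        x = x_new                   # accept the proposal
    samples.append(x)               # otherwise keep the current state

print(sum(samples) / len(samples))  # near 0, the mean of the target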
Gibbs Sampling
• Gibbs sampling is a special case of MCMC used for multivariate
distributions, where we sample each variable conditionally on the
others.
• Algorithm Steps
1. Start with an initial state (x1, x2, …, xn).
2. Sample each variable sequentially from its conditional distribution:
   x1 ∼ p(x1 | x2, …, xn), then x2 ∼ p(x2 | x1, x3, …, xn), and so on.
3. Repeat until convergence.
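A short Python sketch of Gibbs sampling for a bivariate standard normal with correlation ρ = 0.8 (an assumed example; each full conditional is normal with mean ρ times the other coordinate and variance 1 − ρ²):

import math
import random

rho = 0.8  # assumed correlation of the bivariate normal target
random.seed(0)
x, y = 0.0, 0.0
xs, ys = [], []
for _ in range(10000):
    # Sample each coordinate from its full conditional given the other
    x = random.gauss(rho * y, math.sqrt(1 - rho ** 2))
    y = random.gauss(rho * x, math.sqrt(1 - rho ** 2))
    xs.append(x)
    ys.append(y)

n = len(xs)
print(sum(a * b for a, b in zip(xs, ys)) / n)  # close to rho = 0.8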
Convergence in MCMC
• Convergence in MCMC methods occurs when the Markov chain reaches its
stationary distribution, meaning the samples approximate the target
distribution.
• Diagnosing Convergence
1.Trace Plots: Visualize the chain over iterations. Convergence is indicated by
stability in the values.
2.Autocorrelation: Lower autocorrelation between samples indicates better
mixing.
3. Gelman-Rubin Statistic (R̂): Compare between-chain and within-chain variance.
Values close to 1 indicate convergence.
4.Effective Sample Size (ESS): A larger ESS indicates better convergence.
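A simple Python sketch of the Gelman-Rubin statistic for m chains of equal length (a standard formulation; the toy chains are assumed data):

import random

def gelman_rubin(chains):
    # R-hat: compare between-chain and within-chain variance
    m = len(chains)     # number of chains
    n = len(chains[0])  # samples per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between-chain
    W = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
            for c, mu in zip(chains, means)) / m              # within-chain
    V = (n - 1) / n * W + B / n
    return (V / W) ** 0.5

random.seed(0)
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
print(gelman_rubin(chains))  # close to 1 for well-mixed chains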