Unit-3
• Pseudo-Random Numbers: Random number generation, Inverse-transform, acceptance-rejection,
transformations, multivariate probability calculations.
• Monte Carlo Integration: Simulation and Monte Carlo integration, variance reduction, Monte Carlo
hypothesis testing, antithetic variables/control variates, importance sampling, stratified sampling
• Markov chain Monte Carlo (MCMC): Markov chains; Metropolis-Hastings algorithm; Gibbs sampling;
convergence
Pseudo Random Number Generator
(PRNG)
• A Pseudo-Random Number Generator (PRNG) is an algorithm that
creates a sequence of numbers that look random but are actually
generated using a mathematical formula. It starts with an initial value
called a seed. If you know the seed, you can recreate the same
sequence of numbers. PRNGs are fast and efficient, and they generate
numbers that behave like random numbers, even though they are
predictable.
Types of Pseudo-Random Number Generators
• Linear Congruential Generator (LCG):
• Formula: X(n+1) = (a · Xn + c) mod m
• Simple and fast, but has a limited period (the sequence eventually repeats).
• Example: Used in early programming languages like C and Java.
Where:
• Xn: the current value (the seed initially).
• a: multiplier.
• c: increment.
• m: modulus (defines the range).
• Let’s generate a sequence of random numbers with the following
parameters:
• Seed (X0) = 1
• a=4, c=1, m=9
• Step-by-step calculation:
1. X1 = (4×1 + 1) mod 9 = 5
2. X2 = (4×5 + 1) mod 9 = 3
3. X3 = (4×3 + 1) mod 9 = 4
4. X4 = (4×4 + 1) mod 9 = 8
Generated sequence: 1, 5, 3, 4, 8
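A minimal Python sketch of this LCG (the generator function and its name are our own illustration, not a standard library API):

def lcg(seed, a=4, c=1, m=9):
    # Linear congruential generator: X(n+1) = (a*Xn + c) mod m
    x = seed
    while True:
        yield x
        x = (a * x + c) % m

gen = lcg(seed=1)
print([next(gen) for _ in range(5)])  # [1, 5, 3, 4, 8]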
Mersenne Twister (MT)
• The Mersenne Twister is a widely used pseudo-random number generator
known for its high-quality random numbers and long period of 2^19937 − 1.
It is efficient, fast, and generates numbers that are statistically random.
• Key Features
1. Period: Very long period (2^19937 − 1), meaning the sequence doesn't
repeat for a very long time.
2. Efficiency: Generates large batches of random numbers quickly.
3. Dimensions: Designed to avoid correlation in multiple dimensions.
4. Applications: Used in simulation, modeling, and gaming applications. Most
programming languages, including Python, use it as their default PRNG.
import random

# Set a seed for reproducibility
random.seed(42)

# Generate random numbers
print("Random Number 1:", random.random())         # float between 0 and 1
print("Random Number 2:", random.randint(1, 100))  # integer between 1 and 100
print("Random Number 3:", random.uniform(10, 20))  # float between 10 and 20

# Output:
# Random Number 1: 0.6394267984578837
# Random Number 2: 81
# Random Number 3: 13.375387409862836
• Applications
• Gaming: For generating random events like shuffling cards or
spawning items.
• Simulations: For Monte Carlo simulations where high-quality
randomness is required.
• Machine Learning: For random initialization of weights and splitting
datasets.
The Mersenne Twister is popular due to its efficiency and statistical
quality, making it suitable for general-purpose randomness. However,
it’s not cryptographically secure and should not be used for encryption.
XOR-Shift Generators
• XOR-Shift generators are a family of efficient and fast pseudo-random
number generators (PRNGs) that use bitwise operations like XOR and
bit-shifting to produce random numbers.
How It Works
• The generator maintains an internal state (a seed), which is updated
iteratively using bitwise operations.
• The key idea is to shift bits of the internal state to the left or right and
combine them using the XOR operation.
• Algorithm
1. Start with an initial state x (seed).
2. Apply a sequence of XOR and shift operations:
   x = x ⊕ (x ≫ a)
   x = x ⊕ (x ≪ b)
   x = x ⊕ (x ≫ c)
   where a, b, c are fixed integers and ≫, ≪ are the bitwise right- and
   left-shift operators.
• The XOR (Exclusive OR) operation is a fundamental bitwise operation
in computer science and cryptography. It compares two bits and
returns
• 1 if the bits are different.
• 0 if the bits are the same.
• Example
• Assume the initial state x = 123456789, with a = 21, b = 35, c = 4:
1. x = x ⊕ (x ≫ 21)
2. x = x ⊕ (x ≪ 35)
3. x = x ⊕ (x ≫ 4)
• This produces the next random number x. Repeat the steps to generate a
sequence.
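A minimal Python sketch of one xorshift update on a 64-bit state, using the shift directions and constants from the example above (the 64-bit mask is our addition, since Python integers are unbounded):

MASK64 = (1 << 64) - 1  # keep the state within 64 bits

def xorshift64(x, a=21, b=35, c=4):
    # One xorshift step: XOR the state with shifted copies of itself
    x ^= x >> a
    x ^= (x << b) & MASK64
    x ^= x >> c
    return x & MASK64

x = 123456789
for _ in range(3):
    x = xorshift64(x)
    print(x)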
Applications
• Used in simulations, games, and lightweight random number generation.
• Faster than many PRNGs like the Mersenne Twister but less statistically robust.
Middle-Square Method
• The Middle-Square Method is a simple PRNG that generates a
sequence of numbers by squaring the current number and extracting
the middle digits as the next number.
• Algorithm
1. Start with an initial seed x0.
2. Square xn to get y.
3. Extract the middle digits of y to form x(n+1).
• Example
• Seed: x0 = 1234
1. Square: 1234^2 = 1522756.
2. Extract the middle digits: 2275, so x1 = 2275.
3. Repeat: 2275^2 = 5175625, middle digits 1756, so x2 = 1756.
• Generated sequence: 1234, 2275, 1756, …
Limitations
• Short period: The sequence often repeats quickly.
• Sensitive to the seed: Some seeds can produce poor randomness.
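A short Python sketch of the method. Note that digit-extraction conventions vary; this version zero-pads the square to twice the seed's width and takes the central digits, so its values differ from the hand-worked example above:

def middle_square(seed, n_digits=4):
    # Square the seed, zero-pad to 2*n_digits, and keep the middle digits
    squared = str(seed ** 2).zfill(2 * n_digits)
    start = (len(squared) - n_digits) // 2
    return int(squared[start:start + n_digits])

x = 1234
for _ in range(3):
    x = middle_square(x)
    print(x)  # 5227, 3215, 3362 under this padding convention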
Lagged Fibonacci Generator (LFG)
• The Lagged Fibonacci Generator generates random numbers using a
recurrence relation similar to the Fibonacci sequence.
• Algorithm
1. Define a lagged recurrence relation:
   Xn = (X(n−j) op X(n−k)) mod m
   where j, k are the lags, op is an operation (e.g., addition, subtraction,
   XOR), and m is the modulus.
2. Initialize with k seed values.
• Example
• Let j = 2, k = 5, m = 10, with initial values X1,…,X5 = 1, 2, 3, 4, 5.
1. X6 = (X4 + X1) mod 10 = (4 + 1) mod 10 = 5
2. X7 = (X5 + X2) mod 10 = (5 + 2) mod 10 = 7
• Generated sequence: 1, 2, 3, 4, 5, 5, 7, …
Applications
• Useful for simulations requiring high-quality randomness.
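A minimal Python sketch of this additive lagged Fibonacci generator, using the lags and seed values from the example (the function is our own illustration):

def lfg(seeds, j=2, k=5, m=10, count=7):
    # Additive lagged Fibonacci: Xn = (X(n-j) + X(n-k)) mod m
    state = list(seeds)
    while len(state) < count:
        state.append((state[-j] + state[-k]) % m)
    return state

print(lfg([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5, 5, 7]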
Cryptographically Secure PRNGs (CSPRNGs)
• CSPRNGs are PRNGs designed for cryptographic purposes, where randomness
must be secure and unpredictable.
• Key Features
1.Unpredictability: Even with knowledge of part of the sequence, the next
numbers cannot be predicted.
2.High Entropy: Numbers have high randomness quality.
• Applications
• Encryption keys.
• Session tokens.
• Secure password generation.
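For example, Python's built-in secrets module exposes a CSPRNG backed by the operating system's entropy source (a brief illustration):

import secrets

print(secrets.token_hex(16))      # 32-character hex string, e.g. a session token
print(secrets.randbelow(100))     # secure integer in [0, 100)
print(secrets.token_urlsafe(16))  # URL-safe token, e.g. for password resets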
Inverse-Transform Method
• The Inverse-Transform Method is a technique to generate random
samples from a specific probability distribution using uniformly
distributed random numbers.
The key idea is to use the inverse of the cumulative distribution
function (CDF) of the desired distribution.
• Steps to Apply the Inverse-Transform Method
1.Start with the CDF F(x) :
Identify the cumulative distribution function of the desired probability
distribution. The CDF maps values of the random variable X to a range
of probabilities between 0 and 1.
• Generate a Uniform Random Number U :
Generate a random number U from a uniform distribution U(0,1)
• Solve for X:
Use the equation F(X)=U and solve for X to get a value distributed
according to the target distribution.
• How It Works
1.Start with the CDF F(x) :
The CDF gives the probability that a random variable is less than or
equal to x .
2. Generate a Random Number U:
Generate a random number U from a uniform distribution between 0
and 1 (U ∼ U(0,1)).
3.Solve for X :
Use the equation F(X)= U and solve for X . This gives the random
number X that follows the desired distribution.
• Example: Exponential Distribution
• Step 1: CDF of the Exponential Distribution
• For an exponential distribution with rate λ the CDF is:
• F(x) = 1 − e^(−λx)
Step 2: Generate a Uniform Random Number
• Suppose we generate U=0.5.
• Step 3: Solve for X
• Set F(X)= U:
• U = 1 − e^(−λX)
• Rearranging to solve for X:
X = −(1/λ) ln(1 − U)
• Step 4: Plug in Values
• If λ = 2:
X = −(1/2) ln(1 − 0.5) ≈ 0.35
• Key Points
• Input: Uniform random number U between 0 and 1.
• Output: A random number X that follows the desired distribution.
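A short Python sketch of the exponential example above, assuming rate λ = 2 (the helper function is our own):

import math
import random

def sample_exponential(lam):
    # Inverse-transform sampling: X = -(1/lam) * ln(1 - U), U ~ U(0,1)
    u = random.random()
    return -math.log(1 - u) / lam

random.seed(0)
samples = [sample_exponential(lam=2) for _ in range(10000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 0.5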
Acceptance-Rejection Method:
• The Acceptance-Rejection Method is a technique for generating
random numbers from a target probability distribution when direct
sampling is difficult. It uses a simpler proposal distribution to generate
samples and then accepts or rejects them based on a specific
condition.
Steps to Apply the Method
1. Choose a Proposal Distribution: Select a proposal distribution g(x) that is
easy to sample from. Ensure it satisfies f(x) ≤ c·g(x) for all x, where f(x)
is the target distribution and c ≥ 1 is a constant.
2. Generate a Sample: Generate a random sample Y from the proposal
distribution g(x).
3. Generate a Uniform Random Number: Generate U, a uniform random
number between 0 and 1.
4. Acceptance Condition: Accept the sample if:
   U ≤ f(Y) / (c·g(Y))
5. Otherwise, reject the sample and repeat the process.
Key Idea
• The method works by generating samples from an easy-to-sample
distribution g(x) and filtering them based on how closely they match the
desired distribution f(x).
• Rejecting samples ensures the accepted ones follow the target
distribution.
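A minimal Python sketch, using the target f(x) = 6x(1−x) (a Beta(2,2) density on [0,1]) with a uniform proposal g(x) = 1 and c = 1.5; these particular choices are our assumed example:

import random

def target_pdf(x):
    # Beta(2,2) density on [0,1]; its maximum is 1.5, so f(x) <= 1.5 * g(x)
    return 6 * x * (1 - x)

def accept_reject(n, c=1.5):
    samples = []
    while len(samples) < n:
        y = random.random()  # sample from the uniform proposal g
        u = random.random()  # uniform random number for the acceptance test
        if u <= target_pdf(y) / (c * 1.0):  # g(y) = 1 on [0,1]
            samples.append(y)
        # otherwise reject and draw again
    return samples

random.seed(1)
xs = accept_reject(10000)
print(sum(xs) / len(xs))  # close to the Beta(2,2) mean of 0.5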
• Multivariate probability calculations deal with the analysis of random
variables in a multivariate setting, i.e., when we consider more than
one random variable at a time. These calculations involve
understanding the joint behavior, relationships, and distributions of
these variables.
• Random Variables and Joint Distribution
• If X1, X2, …, Xn are random variables, their joint probability
distribution describes the probability that these variables
simultaneously take certain values.
• For discrete variables:
P(X1 = x1, X2 = x2, …, Xn = xn)
• For continuous variables: fX1,X2,…,Xn(x1, x2, …, xn)
• Marginal Probability
• The probability of a subset of variables, ignoring the others.
• For continuous random variables: fX1(x1) = ∫ fX1,X2(x1, x2) dx2,
integrating x2 from −∞ to ∞.
• Conditional Probability
• The probability of one variable given another:
fX2|X1(x2 | x1) = fX1,X2(x1, x2) / fX1(x1)
Transformations
• What are Transformations in Random Number Generation?
• Transformations are mathematical methods used to convert random
numbers generated from a uniform distribution (e.g., U(0,1)) into
random numbers following other desired probability distributions.
These techniques are essential when we need random samples from
complex distributions in simulations, statistical models, and
machine-learning algorithms.
Why is Transformation Important in AI/ML?
• Simulation: Simulating realistic data based on specific distributions
(e.g., Gaussian noise in deep learning).
• Statistical Models: Many machine learning models rely on
assumptions about data distributions.
• Monte Carlo Methods: Used in reinforcement learning and
probabilistic modeling.
Key Transformation Methods
• Linear Transformation
Converts a uniform random variable U ∼ U(0,1) into a random variable X
in a different range [a, b]:
X = a + (b − a)·U
• Inverse Transform Sampling
If the cumulative distribution function (CDF) F(x) of a random variable X
is known, we can generate X by:
X = F^(−1)(U)
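A one-line Python sketch of the linear transformation (inverse transform sampling was already sketched in the exponential example earlier):

import random

def linear_transform(u, a, b):
    # Map U ~ U(0,1) onto [a, b]: X = a + (b - a) * U
    return a + (b - a) * u

random.seed(3)
print(linear_transform(random.random(), 10, 20))  # a float between 10 and 20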
Applications in AI/ML
1.Data Augmentation: Adding noise to training data for robustness.
2.Generative Models: Sampling from latent space in GANs or VAEs.
3.Bayesian Networks: Sampling for posterior distributions in
probabilistic models.
4.Optimization: Random search and simulated annealing require
specific distributions.
Multivariate Probability Calculations
• What is Multivariate Probability?
• Multivariate probability involves calculating probabilities and
expectations for random variables that are interdependent and can be
represented in multiple dimensions. This is crucial in AI/ML when
working with features, multivariate distributions, or joint probabilities
in datasets.
Key Concepts
• Joint Probability Distribution:
• Describes the probability of two or more random variables occurring
together.
• For random variables X1, X2, …, Xn, the joint probability density function
(PDF) is: fX1,X2,…,Xn(x1, x2, …, xn)
• Example: Probability of a student scoring high in math and AI
• Marginal Probability:
• Probability of a subset of variables, regardless of the others.
• Obtained by integrating or summing over the other variables:
fX1(x1) = ∫ fX1,X2(x1, x2) dx2, integrating x2 from −∞ to ∞
Multivariate Probability Example: Weather
and Attendance
• Imagine you are a teacher and you observe two things daily:
• Weather (W): Whether it is Sunny or Rainy.
• Student Attendance (A): Whether the class is Full or Not Full.
• Observations
Day   Weather (W)   Attendance (A)
1     Sunny         Full
2     Rainy         Not Full
3     Sunny         Full
4     Sunny         Full
5     Rainy         Not Full
6     Sunny         Not Full
7     Rainy         Not Full
8     Sunny         Full
9     Rainy         Not Full
10    Sunny         Full
Weather (W)   Attendance (A)   Count   Joint Probability P(W, A)
Sunny         Full             5       P(Sunny, Full) = 5/10 = 0.5
Sunny         Not Full         1       P(Sunny, Not Full) = 1/10 = 0.1
Rainy         Full             0       P(Rainy, Full) = 0/10 = 0.0
Rainy         Not Full         4       P(Rainy, Not Full) = 4/10 = 0.4
Marginal Probability
The marginal probability is the probability of one event, regardless of the other.
Probability of Sunny: P(Sunny) = Sunny Days / Total Days = 6/10 = 0.6
• Probability of Full Attendance:
• P(Full) = Days with Full Attendance / Total Days = 5/10 = 0.5
• Conditional Probability
• The conditional probability answers questions like: "If it is sunny, what
is the probability of full attendance?"
• P(Full | Sunny) = P(Sunny, Full) / P(Sunny)
• From the table:
• P(Sunny, Full) = 0.5
• P(Sunny) = 0.6
• So P(Full | Sunny) = 0.5 / 0.6 ≈ 0.83
Independence Check
• Are Weather and Attendance independent? If independent, P(W, A) = P(W) · P(A).
• Let's check P(Sunny, Full):
• P(Sunny, Full) = 0.5
• P(Sunny) · P(Full) = 0.6 · 0.5 = 0.3
• Since 0.5 ≠ 0.3, Weather and Attendance are NOT independent.
• Visualization
• This can also be visualized in a simple bar chart:
• Sunny, Full Attendance (5 days): 50%
• Sunny, Not Full Attendance (1 day): 10%
• Rainy, Full Attendance (0 days): 0%
• Rainy, Not Full Attendance (4 days): 40%
• Why is This Useful in AI/ML?
1. Feature Relationships: Understanding how one feature affects another.
   Example: Does weather (feature) affect sales or attendance (output)?
2. Probabilistic Models: Naive Bayes, Bayesian Networks, and HMMs rely
   on joint, marginal, and conditional probabilities.
3. Data Imputation: If some values are missing, conditional probabilities
   can help estimate them.
Monte Carlo Integration
• Simulation and Monte Carlo Integration
• What is Simulation?
• Simulation involves creating a computational model to mimic the
behavior of a real-world process or system. In AI/ML, simulations are
often used to:
• Generate synthetic data.
• Test machine learning models in controlled environments.
• Solve problems where analytical solutions are difficult or impossible.
Examples in AI/ML:
• Reinforcement Learning: Simulating an environment (e.g., a robot
navigating a maze) to train an agent.
• Synthetic Data Generation: Simulating customer behavior for e-
commerce recommendations.
What is Monte Carlo Integration?
• Monte Carlo Integration is a computational method to approximate
integrals using random sampling. This technique is particularly useful
when:
• The function is complex, and traditional numerical methods fail.
• The integral has many dimensions (e.g., over a high-dimensional
space).
Monte Carlo methods are foundational for:
• Probabilistic Models: Estimating expectations and marginal probabilities.
• Bayesian Inference: Computing posterior probabilities.
• Reinforcement Learning: Estimating value functions.
Key Concept of Monte Carlo Integration
1. Random Sampling: Generate N random points x1, …, xN uniformly in the
   interval [a, b].
2. Compute the Mean: Evaluate f at these points and compute:
   I ≈ (b − a) · (1/N) Σ f(xi)
3. Interpretation: The integral is approximated as the average of the
   function values at the random points, scaled by the length of the
   interval (b − a).
Mathematical Problem: Estimate the Integral I = ∫ e^(−x^2) dx over [0, 2]
The function e^(−x^2) does not have a simple closed-form antiderivative.
However, numerical methods or symbolic solvers give us an
approximate value:
I ≈ 0.882081.
Monte Carlo Integration
• To approximate this integral using Monte Carlo Integration:
• The domain of integration is [0, 2], so we randomly sample points
x in this interval.
• For each random point x, we evaluate f(x) = e^(−x^2).
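A short Python sketch of this estimate (the sample count N is our choice):

import math
import random

def mc_integrate(f, a, b, n=100000):
    # Monte Carlo integration: (b - a) times the mean of f at uniform points
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

random.seed(0)
print(mc_integrate(lambda x: math.exp(-x ** 2), 0, 2))  # close to 0.882081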
Variance Reduction in Monte Carlo
Integration
• Variance reduction techniques are methods used to improve the
accuracy of Monte Carlo estimates without increasing the number of
samples. These techniques are essential in scenarios where
computational resources are limited or where reducing error is critical.
• Why Variance Reduction?
• In Monte Carlo integration, the accuracy of the estimate depends on the
variance of the sampled values. The standard error of the estimate is
proportional to σ/√N, where:
• σ² is the variance of the function values.
• N is the number of samples.
Example: estimate I = ∫ x^2 dx over [0, 1]. From calculus, the exact
value of the integral is:
[x^3/3] evaluated from 0 to 1 = 1/3 = 0.33333…
Monte Carlo Integration Without Variance Reduction
• Generate N random points x1, x2, …, xN uniformly from [0, 1].
• Compute the mean of the function values: I ≈ (1/N) Σ xi^2.
Monte Carlo Integration With Variance Reduction Using Antithetic Variables
• Generate N/2 random points x1, x2, …, x(N/2) uniformly from [0, 1].
• For each xi, compute f(xi) = xi^2 and f(1 − xi) = (1 − xi)^2.
• Compute the mean of the combined function values.
Example Calculation
• Let's compute this for N = 4:
• Without Variance Reduction
• Random samples: x = [0.2, 0.6, 0.8, 0.1]
• Function values: f(x) = [0.04, 0.36, 0.64, 0.01]
• Estimate of the integral: (0.04 + 0.36 + 0.64 + 0.01) / 4 = 0.2625
Results
• Without Variance Reduction: The result will fluctuate more due to higher
variance.
• With Variance Reduction: The result will have lower variance and be
closer to 0.333.
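A Python sketch comparing the two estimators for this integral (sample sizes are our choice):

import random

def plain_mc(n):
    # Plain Monte Carlo estimate of the integral of x^2 over [0, 1]
    return sum(random.random() ** 2 for _ in range(n)) / n

def antithetic_mc(n):
    # Antithetic variables: pair each u with 1 - u to cancel fluctuations
    total = 0.0
    for _ in range(n // 2):
        u = random.random()
        total += u ** 2 + (1 - u) ** 2
    return total / n

random.seed(0)
print(plain_mc(1000))       # fluctuates around 1/3
print(antithetic_mc(1000))  # typically closer to 1/3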
Monte Carlo Hypothesis Testing
• Monte Carlo hypothesis testing uses random sampling to estimate p-
values and test statistical hypotheses when analytical or direct
solutions are difficult to compute. This method is especially useful in
cases involving complex probability distributions or small sample sizes.
Steps in Monte Carlo Hypothesis Testing
• Define the Null Hypothesis (H0):
• Example: H0: The observed data is consistent with a given distribution.
• Choose a Test Statistic (T):
• Example: The mean, variance, or a distance metric.
• Simulate Data Under H0:
• Generate random datasets under the null hypothesis.
• Compute the Test Statistic for Simulated Data:
• Calculate the test statistic T for each simulated dataset.
Compare the Observed Test Statistic to the Simulated Distribution
• Compute the proportion of simulated statistics greater than or equal to the
observed value. This is the p-value:
p-value = (number of simulated T ≥ T_observed) / (number of simulations)
• Problem
• We want to test whether a given dataset X = {0.1, 0.2, 0.3, 0.8, 0.9}
comes from a uniform distribution U(0,1).
• Null Hypothesis (H0)
• The data is drawn from U(0,1).
• Test Statistic
• The test statistic is the mean of the data:
• T_observed = mean of the observed data = (0.1 + 0.2 + 0.3 + 0.8 + 0.9)/5 = 0.46
• Interpretation
• If the p-value is small (e.g., < 0.05), reject H0: the observed data is
unlikely to come from U(0,1).
• If the p-value is large, there is insufficient evidence to reject H0.
• Advantages of Monte Carlo Hypothesis Testing
1. Flexibility: Works with any test statistic, even if its distribution is
   unknown.
2. Applicability: Useful for small sample sizes or complex null
   distributions.
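A compact Python sketch of the uniformity test above, following the one-sided p-value rule from the steps (the simulation count is our choice; two-sided variants are also common):

import random

observed = [0.1, 0.2, 0.3, 0.8, 0.9]
t_observed = sum(observed) / len(observed)  # 0.46

random.seed(0)
n_sims = 10000
count = 0
for _ in range(n_sims):
    sim = [random.random() for _ in range(len(observed))]  # data under H0
    if sum(sim) / len(sim) >= t_observed:  # simulated statistic >= observed
        count += 1

print("p-value:", count / n_sims)  # well above 0.05, so do not reject H0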
Antithetic Variables
• Concept
• Generate pairs of negatively correlated random variables.
• Using these pairs reduces the variance of the estimator by balancing
over- and underestimations.
• Control Variates
• Concept
• Use a function g(x) (control variate) with a known expected value
E[g(x)] to adjust the estimator.
• Reduces variance if f(x) and g(x) are highly correlated.
• Importance Sampling
• Concept
• Sample from a distribution p(x) that resembles f(x) rather than
sampling uniformly.
• Reformulate the integral as:
∫ f(x) dx = ∫ [f(x)/p(x)] p(x) dx ≈ (1/N) Σ f(xi)/p(xi), with xi ∼ p(x)
• Stratified Sampling
• Concept
• Divide the integration domain into k strata and sample independently
within each stratum.
• Ensures better coverage of the domain, reducing the chance of
missing significant regions.
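A Python sketch of stratified sampling for the integral of x^2 over [0, 1], with k = 10 equal strata (our assumed setup):

import random

def stratified_mc(f, k=10, per_stratum=100):
    # Sample uniformly within each of k equal-width strata of [0, 1]
    total = 0.0
    width = 1.0 / k
    for i in range(k):
        lo = i * width
        for _ in range(per_stratum):
            total += f(random.uniform(lo, lo + width))
    return total / (k * per_stratum)

random.seed(0)
print(stratified_mc(lambda x: x ** 2))  # close to 1/3, with reduced variance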
Unit-3 of mathematical foundation of ai ml
Markov Chains
• A Markov chain is a stochastic process that transitions between states
in a state space, where the probability of transitioning to the next
state depends only on the current state (the memoryless property):
P(X(n+1) = x | Xn, X(n−1), …, X0) = P(X(n+1) = x | Xn)
• State Space: The set of all possible states.
• Transition Matrix: Describes probabilities of moving from one state to
another.
• Stationary Distribution: A distribution that remains unchanged as the
Markov chain evolves.
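A small Python sketch of a two-state chain (the transition probabilities are an assumed example), showing the distribution approaching its stationary distribution under repeated transitions:

# Two states: 0 = Sunny, 1 = Rainy; P[i][j] = probability of moving i -> j
P = [[0.8, 0.2],
     [0.5, 0.5]]

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(50):
    # One step of the chain: new_dist = dist * P
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

print(dist)  # approaches the stationary distribution [5/7, 2/7]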
Metropolis-Hastings Algorithm
• The Metropolis-Hastings (MH) algorithm is a Markov Chain Monte Carlo
(MCMC) method used to sample from a target probability distribution
when direct sampling is difficult.
1. Initialize: Start with an initial state x0.
2. Proposal: Propose a new state x′ using a proposal distribution q(x′ | x).
3. Acceptance Probability: Compute the acceptance probability α:
   α = min(1, [π(x′) q(x | x′)] / [π(x) q(x′ | x)])
4. Accept/Reject:
   • Accept with probability α, i.e., set x(t+1) = x′.
   • Otherwise, set x(t+1) = x(t).
5. Repeat: Generate a sequence that approximates samples from the target
   distribution π(x).
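A minimal Python sketch of Metropolis-Hastings targeting a standard normal with a symmetric random-walk proposal (our assumed example; with a symmetric proposal the q terms cancel in α):

import math
import random

def target(x):
    # Unnormalized standard normal density
    return math.exp(-x * x / 2)

random.seed(0)
x = 0.0
samples = []
for _ in range(10000):
    x_new = x + random.gauss(0, 1)  # symmetric random-walk proposal
    alpha = min(1.0, target(x_new) / target(x))
    if random.random() < alpha:
        x = x_new                   # accept the proposal
    samples.append(x)               # otherwise keep the current state

print(sum(samples) / len(samples))  # near 0, the mean of the target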
Gibbs Sampling
• Gibbs sampling is a special case of MCMC used for multivariate
distributions, where we sample each variable conditionally on the
others.
• Algorithm Steps
1. Start with an initial state (x1, x2, …, xn).
2. Sample each variable sequentially from its conditional distribution:
   x1 ∼ p(x1 | x2, …, xn), then x2 ∼ p(x2 | x1, x3, …, xn), and so on.
3. Repeat until convergence.
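A short Python sketch of Gibbs sampling for a bivariate standard normal with correlation ρ = 0.8 (an assumed example; each full conditional is normal with mean ρ times the other coordinate and variance 1 − ρ²):

import math
import random

rho = 0.8  # assumed correlation of the bivariate normal target
random.seed(0)
x, y = 0.0, 0.0
xs, ys = [], []
for _ in range(10000):
    # Sample each coordinate from its full conditional given the other
    x = random.gauss(rho * y, math.sqrt(1 - rho ** 2))
    y = random.gauss(rho * x, math.sqrt(1 - rho ** 2))
    xs.append(x)
    ys.append(y)

n = len(xs)
print(sum(a * b for a, b in zip(xs, ys)) / n)  # close to rho = 0.8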
Convergence in MCMC
• Convergence in MCMC methods occurs when the Markov chain reaches its
stationary distribution, meaning the samples approximate the target
distribution.
• Diagnosing Convergence
1.Trace Plots: Visualize the chain over iterations. Convergence is indicated by
stability in the values.
2.Autocorrelation: Lower autocorrelation between samples indicates better
mixing.
3. Gelman-Rubin Statistic (R̂): Compare between-chain and within-chain variance.
Values close to 1 indicate convergence.
4.Effective Sample Size (ESS): A larger ESS indicates better convergence.
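A simple Python sketch of the Gelman-Rubin statistic for m chains of equal length (a standard formulation; the toy chains are assumed data):

import random

def gelman_rubin(chains):
    # R-hat: compare between-chain and within-chain variance
    m = len(chains)     # number of chains
    n = len(chains[0])  # samples per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between-chain
    W = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
            for c, mu in zip(chains, means)) / m              # within-chain
    V = (n - 1) / n * W + B / n
    return (V / W) ** 0.5

random.seed(0)
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
print(gelman_rubin(chains))  # close to 1 for well-mixed chains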