Data mining
Assignment week 2
Exercise 1: Probabilities
How can Bayes' rule be derived from simpler definitions, such as the definition of conditional
probability, symmetry of joint probability, the chain rule? Give a step-wise derivation,
mentioning which rule you applied at each step.
	
  
We have sets of possible outcomes for the variables x and y:


  x = { x1, x2, …, xn }
  y = { y1, y2, …, ym }



We need to show how Bayes' rule can be derived. Bayes' rule is as follows:


  P( X = x | Y = y ) = P( Y = y | X = x ) * P(X = x) / P(Y = y)



We start from the chain rule, which factors the joint probability into a conditional times a marginal:


  Joint                = conditional       * marginal
  P(X = x, Y = y)      = P(Y = y | X = x) * P(X = x)        (chain rule)

By symmetry of the joint probability, P(X = x, Y = y) = P(Y = y, X = x), so the chain rule can also be applied in the other order:

  P(X = x, Y = y)      = P(X = x | Y = y) * P(Y = y)        (chain rule)

Equating the two factorizations of the same joint probability:

  P(X = x | Y = y) * P(Y = y) = P(Y = y | X = x) * P(X = x)

Finally, dividing both sides by P(Y = y) (assuming P(Y = y) > 0) gives Bayes' rule:

  P( X = x | Y = y ) = P( Y = y | X = x ) * P(X = x) / P(Y = y)
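As a quick sanity check (beyond what the assignment asks), here is a minimal Python sketch using a made-up 2x2 joint distribution; it confirms that the two factorizations agree and that Bayes' rule recovers P(X = x | Y = y):

  import numpy as np

  # Hypothetical joint distribution P(X = x, Y = y); rows index x, columns index y.
  joint = np.array([[0.10, 0.30],
                    [0.20, 0.40]])

  p_x = joint.sum(axis=1)  # marginal P(X = x)
  p_y = joint.sum(axis=0)  # marginal P(Y = y)

  # Conditionals from the definition P(A | B) = P(A, B) / P(B).
  p_y_given_x = joint / p_x[:, None]   # P(Y = y | X = x)
  p_x_given_y = joint / p_y[None, :]   # P(X = x | Y = y)

  # Bayes' rule: P(X = x | Y = y) = P(Y = y | X = x) * P(X = x) / P(Y = y)
  bayes = p_y_given_x * p_x[:, None] / p_y[None, :]

  assert np.allclose(bayes, p_x_given_y)  # both sides agree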
Exercise 2: Entropy
2.1 Assume a variable X with three possible values: a, b, and c. If p(a) = 0.4, and
p(b) = 0.25, what is the entropy of X, i.e., what is H(X)?

To find P(c), we use the fact that the probabilities must sum to 1:


  P(total)   =   1
  P(a)       =   0.4
  P(b)       =   0.25
  P(c)       =   P(total) – P(a) – P(b)
  P(c)       =   0.35



Now we calculate the entropy H(X) = -Σ p(x) log2 p(x) using all three probabilities:


H(X) = -( 0.4 * log2(0.4) + 0.25 * log2(0.25) + 0.35 * log2(0.35) )
H(X) ≈ 1.5589 bits
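As a quick check, a minimal Python snippet (assuming base-2 logarithms, so the entropy is in bits) reproduces this value:

  import math

  # Probabilities of a, b, and c from above.
  probs = [0.4, 0.25, 0.35]

  # Shannon entropy: H(X) = -sum over x of p(x) * log2(p(x))
  entropy = -sum(p * math.log2(p) for p in probs)
  print(round(entropy, 4))  # prints 1.5589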



2.2 Assume a variable X with three possible values: a, b, and c. What is the probability
distribution with the highest entropy? Which one(s) has/have the lowest one? Explain in a
sentence or two and in your own words why these distributions have the highest and lowest
entropies.

We need to determine which probability distribution yields the highest entropy (i.e., the maximum uncertainty).

If we know nothing about the values ‘a’, ‘b’, and ‘c’, then we cannot make any prediction about the chance of observing any particular one of them: the three values are indistinguishable. The chance of drawing an ‘a’ is therefore equal to that of a ‘b’ or a ‘c’. This is called a uniform distribution.


  P(a) = P(b) = P(c)
  P(a) + P(b) + P(c) = P(total) = 1

  P(a) = P(b) = P(c) = 1/3



The lowest entropy occurs when we know beforehand which value the outcome will be, i.e., when there is a 100% chance of observing one particular value (‘a’, ‘b’, or ‘c’) and a 0% chance of the other two. In that case H(X) = 0.
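A minimal Python sketch (not required by the assignment) comparing the two extremes; by convention the 0 * log2(0) term is treated as 0:

  import math

  def entropy(probs):
      # Shannon entropy in bits; terms with p = 0 contribute 0 by convention.
      return -sum(p * math.log2(p) for p in probs if p > 0)

  print(entropy([1/3, 1/3, 1/3]))  # ~1.585 (uniform: highest entropy, log2(3))
  print(entropy([1.0, 0.0, 0.0]))  # 0.0    (degenerate: lowest entropy)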

2.3 In general, if a variable X has n possible values, what is the maximum entropy?

The maximum entropy is again attained by the uniform distribution, where each of the n values has the same chance:


  P(xi) = 1/n,   i = 1, 2, …, n

Summing over all n equal terms gives the maximum entropy:

  H(X) = -Σi (1/n) * log2(1/n) = log2(n)
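A quick numerical check (a sketch, assuming entropy in bits) that the uniform distribution over n values indeed has entropy log2(n):

  import math

  for n in (2, 3, 8, 100):
      uniform = [1 / n] * n
      h = -sum(p * math.log2(p) for p in uniform)
      print(n, round(h, 4), round(math.log2(n), 4))  # the two numbers match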
