© 2013 ExcelR Solutions. All Rights Reserved
Advanced Regression
AGENDA
•  Multinomial Regression
•  Zero Inflated Poisson Regression
•  Negative Binomial
Multinomial Regression
•  Logistic regression (binomial distribution) is used when the output has 2 categories
•  Multinomial regression (a classification model) is used when the output has more than 2 categories
•  It is an extension of logistic regression
•  The categories have no natural ordering
•  The response variable has more than 2 categories & hence we apply the multinomial logit (multilogit) model
•  Example: understand the impact of cost & time on the choice among various modes of transport
Mode of transport   Car    Carpool   Bus    Rail   All modes
Count               218    32        81     122    453
Probability         0.48   0.07      0.18   0.27   1
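The probabilities in the table are each mode's share of the 453 observations. A minimal Python sketch (the variable names are illustrative and not from the slides) reproduces them from the counts:

```python
# Observed counts per mode of transport, taken from the table above.
counts = {"Car": 218, "Carpool": 32, "Bus": 81, "Rail": 122}

total = sum(counts.values())  # number of observations across all modes

# Empirical probability of each mode = count / total.
probabilities = {mode: n / total for mode, n in counts.items()}

for mode, p in probabilities.items():
    print(f"{mode:8s} {p:.2f}")
```

These shares are also what an intercept-only multinomial model would predict for every traveller, which makes them a useful reference point before predictors are added.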
Multinomial Regression
•  For a categorical variable with 's' categories, whether it is the response 'Y' or a predictor 'X':
✓  By default, the level lowest in numerical / lexicographical order is chosen as the baseline (reference)
✓  The level missing from the model output is the baseline level
✓  We can choose a baseline level of our choice using the 'relevel' function in R
✓  The model formulates the relationship between the transformed (logit) Y & the numerical X linearly
✓  Modeling quantitative variables linearly might not always be correct
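To illustrate the baseline convention, here is a small Python sketch. It is only an analogue of the default behaviour described above and of choosing a baseline with 'relevel'. The helper `dummy_code` is hypothetical and not part of any library. It picks the lexicographically lowest level as the baseline unless another one is requested, and builds s - 1 indicator columns:

```python
def dummy_code(values, baseline=None):
    """Return (baseline, columns), where columns maps each non-baseline
    level to a 0/1 indicator list. Hypothetical helper for illustration."""
    levels = sorted(set(values))           # lexicographical order
    if baseline is None:
        baseline = levels[0]               # default: lowest level
    elif baseline not in levels:
        raise ValueError(f"unknown level: {baseline!r}")
    others = [lv for lv in levels if lv != baseline]
    columns = {lv: [1 if v == lv else 0 for v in values] for lv in others}
    return baseline, columns

modes = ["car", "bus", "rail", "carpool", "car"]

base, cols = dummy_code(modes)                     # default baseline
print(base, sorted(cols))

base2, cols2 = dummy_code(modes, baseline="car")   # chosen baseline
print(base2, sorted(cols2))
```

The baseline never gets a column of its own, which is why it appears as the "missing" level in the model output.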
Multinomial Regression - Output
Iteration History:
•  An iterative procedure is used to compute the maximum likelihood estimates
•  The number of iterations & the convergence status are reported
•  -2logL = 2 * negative log likelihood
•  The difference in -2logL between nested models approximately follows a χ² distribution, which is used for hypothesis testing of goodness of fit
# parameters = 27
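The quantities on this slide can be sketched in a few lines of Python. The log-likelihood values below are hypothetical, chosen only to show the arithmetic. The count of 27 parameters is reproduced under the assumption of 4 modes and 8 numeric predictors (for example cost and time for each mode), which is consistent with the slide but not stated on it:

```python
def neg2_log_lik(log_lik):
    """-2 log L, i.e. twice the negative log-likelihood."""
    return -2.0 * log_lik

def n_parameters(n_categories, n_predictors):
    """One intercept plus one slope per predictor, for each
    non-baseline category."""
    return (n_categories - 1) * (n_predictors + 1)

# Hypothetical log-likelihoods for an intercept-only and a fitted model.
loglik_null = -520.0
loglik_full = -480.0

# Likelihood-ratio statistic: the drop in -2 log L between nested models.
lr_stat = neg2_log_lik(loglik_null) - neg2_log_lik(loglik_full)
print(lr_stat)

# Assumed setup: 4 modes of transport and 8 numeric predictors.
print(n_parameters(4, 8))
```

The likelihood-ratio statistic is compared against a χ² distribution whose degrees of freedom equal the difference in the number of parameters between the two models.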
Multinomial Regression - Output
log[ P(choice = carpool | x) / P(choice = car | x) ] = β20 + β21 * cost.car + β22 * cost.carpool + …………….

This equation models the log of the ratio of the probabilities of carpool to car, i.e. the log odds of 'carpool' relative to 'car'
•  'car' has been chosen as the baseline
•  x = vector representing the values of all inputs
•  The regression coefficient 0.636 indicates that for a 1 unit increase in 'cost.car', with the other inputs held fixed, the log odds of 'carpool' to 'car' increase by 0.636
•  The intercept has no meaningful interpretation in this context

•  If we also have a categorical X, say Gender (female = 0, male = 1), then its regression coefficient (say 0.22) indicates that, relative to females, being male increases the log odds of 'carpool' to 'car' by 0.22
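The log odds versus the baseline can be turned back into choice probabilities. The sketch below assumes the standard baseline-category logit form, where the baseline's linear predictor is fixed at 0. The function name and the log-odds values are hypothetical; only the coefficient 0.636 comes from the slide:

```python
import math

def choice_probabilities(linear_predictors, baseline="car"):
    """linear_predictors maps each non-baseline category to its log odds
    versus the baseline. Returns probabilities for all categories."""
    exp_eta = {k: math.exp(v) for k, v in linear_predictors.items()}
    denom = 1.0 + sum(exp_eta.values())   # the 1 is exp(0) for the baseline
    probs = {k: e / denom for k, e in exp_eta.items()}
    probs[baseline] = 1.0 / denom
    return probs

# Hypothetical log odds versus 'car' for one traveller.
eta = {"carpool": -1.9, "bus": -1.0, "rail": -0.6}
before = choice_probabilities(eta)

# Raise cost.car by one unit: the carpool log odds rise by 0.636.
# The other equations are left unchanged here purely for simplicity.
eta_after = dict(eta, carpool=eta["carpool"] + 0.636)
after = choice_probabilities(eta_after)

odds_before = before["carpool"] / before["car"]
odds_after = after["carpool"] / after["car"]
print(odds_after / odds_before)   # ratio of odds equals exp(0.636)
```

Note that the coefficient describes a change in the odds of carpool relative to car, not directly a change in the probability of carpool.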
Probability
•  Let p = p(x | A) be the probability of an event x (say attrition) under condition A (say gender = female)

Odds
•  Then p(x | A) ÷ (1 - p(x | A)) is called the odds associated with the event

Odds Ratio
•  If there are two conditions A (gender = female) & B (gender = male), then the ratio
      [p(x | A) ÷ (1 - p(x | A))] / [p(x | B) ÷ (1 - p(x | B))] is called the odds ratio of A with respect to B

Relative Risk
•  p(x | A) ÷ p(x | B) is called the relative risk
https://en.wikipedia.org/wiki/Relative_risk
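A short Python sketch makes the three definitions concrete. The attrition probabilities are hypothetical values chosen for illustration:

```python
def odds(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)

# Hypothetical attrition probabilities under two conditions.
p_female = 0.20   # condition A (gender = female)
p_male = 0.10     # condition B (gender = male)

odds_ratio = odds(p_female) / odds(p_male)   # odds ratio of A with respect to B
relative_risk = p_female / p_male            # ratio of the probabilities

print(round(odds(p_female), 3))
print(round(odds_ratio, 3))
print(round(relative_risk, 3))
```

With these numbers the odds ratio (2.25) is larger than the relative risk (2.0). The two are close only when the event is rare under both conditions.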
Odds Ratio
•  The odds ratio is computed from the coefficients in the linear model equation by simply exponentiating them
•  Exponentiated regression coefficients are the odds ratios for a unit change in a predictor variable
•  The odds ratio for a unit increase in cost.car is 1.88 for choosing carpool vs car
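The exponentiation step can be checked directly in Python. The coefficient 0.636 is the one quoted earlier for cost.car. Exponentiating this rounded value gives roughly 1.89, so the 1.88 on the slide presumably comes from the unrounded coefficient or from truncation (an assumption, since the full model output is not shown):

```python
import math

coef_cost_car = 0.636                 # log-odds coefficient quoted on the slide
odds_ratio = math.exp(coef_cost_car)  # odds ratio for a 1 unit increase
print(round(odds_ratio, 3))

# A two-unit increase multiplies the odds by the odds ratio squared.
print(round(math.exp(2 * coef_cost_car), 3))
```

Because the model is linear on the log-odds scale, effects multiply on the odds scale: each additional unit of cost.car multiplies the odds of carpool vs car by the same factor.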
Goodness of fit
Linear                     GLM
Analysis of Variance       Analysis of Deviance
Residual Sum of Squares    Residual Deviance
OLS                        Maximum Likelihood
•  Residual Deviance is -2 log L
•  Adding more parameters to the model will reduce the Residual Deviance even if they are not going to be useful for prediction
•  In order to control this, a penalty of "2 * number of parameters" is added to the Residual Deviance
•  This penalized value of -2 log L is called the Akaike Information Criterion (AIC)
•  AIC = -2 log L + 2 * number of parameters
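The penalty can be seen at work in a small Python sketch. The log-likelihoods and the parameter count of the larger model are hypothetical, chosen so that the extra parameters improve the fit only slightly:

```python
def aic(log_lik, n_params):
    """AIC = -2 log L + 2 * number of parameters."""
    return -2.0 * log_lik + 2 * n_params

# Hypothetical fits: the larger model has a slightly better
# log-likelihood but many more parameters.
small = aic(log_lik=-480.0, n_params=27)
large = aic(log_lik=-476.0, n_params=39)

print(small, large)
print("prefer small" if small < large else "prefer large")
```

The larger model has the lower residual deviance, yet the smaller model has the lower AIC, so the smaller model would be preferred on this criterion.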
Note: “Multilogit Model with Interaction”

data science certification
