Questions from the paper
"A Few Useful Things to Know about Machine Learning"
Reference: http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
By: Akhilesh Joshi
mail: akhileshjoshi123@gmail.com
1. What is the definition of ML?
Machine learning is the art of using existing data (historical and present) to forecast or predict
ideal solutions by applying statistical models with little or no manual intervention. Although
machine learning techniques are still developing, it is one of the most important concepts in
data science, with many applications that will be helpful to mankind.
2. What is a classifier?
A classifier is a system to which we provide inputs (the inputs may be discrete or continuous)
and which gives us an output. The data we provide to the classifier is called training data. The
main aim of a classifier is to produce an output based on our training data such that it correctly
classifies our test data.
3. What are the 3 components of a learning system, according to the author? Explain them
briefly.
There are 3 components of a learning system. They are as follows.
a. Representation
 Representation is a very important aspect of applying ML to our data. Here we decide
how to represent the data so that a learner fits it well. For example, a decision tree
might be perfectly suited to one dataset, whereas a neural network might be best
suited to another.
b. Evaluation
 Evaluation helps us distinguish good classifiers from bad ones. Good classifiers are
those that produce hypotheses best suited to our test data. For student data we might
need a "likelihood" evaluation measure for predicting whether a student gets a job,
rather than "precision and recall"; the evaluation step helps us make that choice.
c. Optimization
 Out of all the possible hypotheses, we have to decide which one provides the optimal
solution for our test data. The optimization component is the search strategy we use
to arrive at the highest-scoring hypothesis.
4. What is information gain?
Given a number of attributes, we have to pick the attribute with the maximum information
gain. We calculate the weighted average entropy of the subsets produced by a split and
compare it with the entropy of the original set. This helps us build a decision tree: the
attribute with the highest information gain goes at the root node, and we subdivide the tree
further by comparing the information gains of the remaining attributes against the attribute
already chosen. The splits in a decision tree are therefore ordered by decreasing information
gain.
Formula for information gain:
IG(A) = H(S) - Σv (|Sv| / |S|) H(Sv)
IG(A): information gain of attribute A
H(S): entropy of the whole set of examples
H(Sv): entropy of one subset after partitioning S on value v of A
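As a concrete illustration of the formula above, here is a minimal Python sketch; the `internship` attribute and the job/no-job labels are invented toy data, not from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """IG(A) = H(S) - sum_v |Sv|/|S| * H(Sv), splitting on one attribute."""
    n = len(labels)
    # Partition the labels by the value each example takes on the attribute.
    partitions = {}
    for ex, y in zip(examples, labels):
        partitions.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(sub) / n * entropy(sub) for sub in partitions.values())
    return entropy(labels) - remainder

# Toy data: does a student get a job? Split on "internship".
examples = [{"internship": "yes"}, {"internship": "yes"},
            {"internship": "no"}, {"internship": "no"}]
labels = ["job", "job", "no-job", "no-job"]
print(information_gain(examples, labels, "internship"))  # 1.0: a perfect split
```

An attribute that splits the examples into pure subsets, as here, attains the maximum possible gain and would be chosen for the root.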
5. Why is generalization more important than just getting a good result on training data, i.e. the
data that was used to train the classifier?
 Training data only gives us an insight into what our data looks like, so training a
machine learning algorithm on that particular set of data does not guarantee that the
algorithm will work correctly on the test data. The test data may be quite different
from the training data, and the output may not be as desired. We therefore have to
ensure the algorithm works on both the training data and the test data; hence the
concept of generalization.
6. What is cross-validation? What are its advantages?
 Given the training data S and the hypothesis class H (which contains all possible
hypotheses), we have to find h, the correct hypothesis for our data. To choose h
reliably we use the cross-validation process, which makes the most of the data we have.
Advantages of cross-validation
 Every example gets used for both training and validation, giving a clearer picture of
how the algorithm will behave on the kind of data it will see at evaluation time.
 We can set aside part of the training data as held-out test data, which we then use to
check that the algorithm produces the desired results.
 Since a portion of the data is already set aside as test data, we do not have to worry
about obtaining a separate test set.
Illustration of cross-validation:
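Since the original illustration is not reproduced here, a short Python sketch of k-fold splitting (the usual way cross-validation is organized; the contiguous-fold layout is a simplifying assumption) may serve instead:

```python
def k_fold_splits(n_examples, k):
    """Yield (train_idx, test_idx) index pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        # The last fold absorbs any leftover examples.
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n_examples
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx

# Each example appears in exactly one test fold and k-1 training folds.
for train_idx, test_idx in k_fold_splits(10, 5):
    print(test_idx)
```

Averaging the evaluation score over the k folds gives a less optimistic estimate than scoring on the training data itself.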
7. How is generalization different from other optimization problems?
 Optimization problems are aligned to data that is already known, whereas in
generalization we have to use the errors and findings from our training data to infer
something about the test data, or at least try to. Since optimization deals with more
ideal situations where most of the facts are already known, we can expect the desired
outputs, which is not the case for generalization problems.
8. If you have a scenario where the function involves 10 Boolean variables, how many possible
examples (called instance space) can there be? If you see 100 examples, what percentage of the
instance space have you seen?
 The number of instances is given by 2^N (where N is the number of Boolean
variables). So in our case the total number of instances is 2^10, i.e. 1024. Given only
100 examples, we have seen only about 9.77% of the instance space.
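The arithmetic can be checked directly in Python:

```python
n_vars = 10
instance_space = 2 ** n_vars        # 2^N possible Boolean assignments
seen = 100
pct = 100 * seen / instance_space   # fraction of the space actually observed
print(instance_space)               # 1024
print(round(pct, 2))                # 9.77
```

Even 100 examples cover under a tenth of this tiny space; with more variables the covered fraction collapses further, since the space doubles per variable.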
9. What is the "no free lunch" theorem in machine learning? You can do a Google search if the
paper isn't clear enough.
 "No free lunch" says that no learning algorithm is inherently superior to the others. If
an algorithm performs well on one class of problems, it must perform worse on
another class, i.e. the performance is compensated. If we average the error over all
possible problems, the difference in expected error between any two algorithms is
zero.
10. What general assumptions allow us to carry out the machine learning process? What is the
meaning of induction?
 Very general assumptions, such as smoothness, similar examples having similar
classes, limited dependences, and limited complexity, are often enough for learning to
do well. Induction is making use of a small amount of available knowledge to turn it
into a large amount of knowledge.
11. How is learning like farming?
 Farming is largely a dependent activity: it depends on nature. With nature's help,
farmers combine seeds with nutrients to grow crops. In a similar manner, to grow
programs (like crops), a learner has to combine knowledge (logic) with data.
12. What is overfitting? How does it lead to a wrong idea that you have done a really good job on
training dataset?
 Overfitting is when a model learns the training data too well: it adapts to the
characteristics of the training data, including its noise and errors. When the learning is
then applied to new data, the results are not as expected and the model may not work
well on the test data; overfitting negatively impacts the model's ability to generalize.
Since the test data is unlikely to be exactly the same as the training data, a perfect
score on the training set gives the false impression of a good model.
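The effect can be demonstrated in a few lines with a 1-nearest-neighbour learner that memorizes noisy labels; the toy setup below is my own illustration, not an example from the paper:

```python
import random
random.seed(0)

def true_label(x):
    """The real concept the learner should recover."""
    return x > 0.5

# 50 training points whose labels are flipped 30% of the time (noise).
train_x = [random.random() for _ in range(50)]
train_y = [true_label(x) if random.random() > 0.3 else not true_label(x)
           for x in train_x]

def one_nn(query):
    """1-nearest-neighbour: memorizes the training set, noise included."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]

train_acc = sum(one_nn(x) == y for x, y in zip(train_x, train_y)) / len(train_x)
test_x = [random.random() for _ in range(200)]
test_acc = sum(one_nn(x) == true_label(x) for x in test_x) / len(test_x)
print(train_acc)  # 1.0 -- looks perfect, but the learner has memorized noise
print(test_acc)   # noticeably lower on fresh data
```

The perfect training score is exactly the "wrong idea" the question describes: it reflects memorization, not generalization.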
13. What is meant by bias and variance? You don't have to be really precise in defining them, just
get the idea.
 Bias: the learner's erroneous (simplifying) assumptions in the learning algorithm.
High bias → more assumptions; low bias → fewer assumptions.
 Variance: the amount the model's estimate changes when a different training set is
used.
14. What are some of the things that can help combat overfitting?
 The following techniques might help in combating overfitting:
 cross-validation
 adding a regularization term to the evaluation function
 performing a statistical significance test like chi-square before adding new
structure
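To make the regularization bullet concrete, here is a sketch in the simplest possible setting: one-variable least squares through the origin with an L2 penalty. The closed form below is the standard ridge-regression solution for this case, not something taken from the paper:

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum (y - w*x)^2 + lam * w^2.
    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
print(ridge_fit_1d(xs, ys, 0.0))   # ordinary least-squares slope
print(ridge_fit_1d(xs, ys, 10.0))  # the penalty shrinks the slope toward zero
```

With lam = 0 this is ordinary least squares; increasing lam trades a little bias for lower variance, which is exactly how a regularization term combats overfitting.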
15. Why do algorithms that work well in lower dimensions fail at higher dimensions? Think about
the number of instances possible in higher dimensions and the cost of similarity calculation
 As the number of dimensions increases, the amount of data required to train a
model grows exponentially. Algorithms in lower dimensions can therefore
generalize (keep the training and test data in sync) far more easily than they can
in higher dimensions. This phenomenon is known as the "curse of
dimensionality".
16. What is meant by "blessing of non-uniformity"?
 This refers to the fact that observations from real-world domains are often not
distributed uniformly, but grouped or clustered in useful and meaningful ways.
17. What has been one of the major developments in the recent decades about results of
induction?
 One of the major developments is that we can have guarantees on the results of
induction, particularly if we're willing to settle for probabilistic guarantees.
18. What is the most important factor that determines whether a machine learning project
succeeds?
- The success of the project depends on the features used. If we have many independent
features that each correlate well with the class, learning is easy. On the other hand, if the
class is a very complex function of the features, we may not be able to learn it.
19. In a ML project, which is more time consuming – feature engineering or the actual learning
process? Explain how ML is an iterative process.
 Feature engineering is the more time-consuming part of machine learning, since
it involves many things such as gathering data, cleaning it, and pre-processing it.
 In ML we carry out certain tasks iteratively, such as running the learner,
analyzing the results, and modifying the data and the learner. Hence it is an
iterative process.
20. What, according to the author, is one of the holy grails of ML?
 According to the author, automating the feature engineering process is one of
the holy grails. It can be done by generating a large number of candidate
features and selecting the best based on their information gain with respect to
the class, but this approach has some limitations.
21. If your ML solution is not performing well, what are two things that you can do? Which one is
a better option?
When an ML solution does not perform well we have two main choices:
 design a better learning algorithm, or
 gather more data.
It is usually better to collect more data, because a dumb algorithm with lots of data beats a
clever algorithm with a modest amount of data.
22. What are the 3 limited resources in ML computations? What is the bottleneck today? What is
one of the solutions?
The 3 limited resources in ML computations are:
 time
 memory
 training data
The bottleneck has changed from decade to decade, and today it is time: when there is more
data, it takes very long to process it and learn a complex model. So one solution is to come up
with faster ways to learn complex classifiers.
23. A surprising fact mentioned by the author is that all representations (types of learners)
essentially "all do the same". Can you explain? Which learners should you try first?
All learners work by grouping nearby examples into the same class; the key difference
is in the meaning of "nearby". With non-uniformly distributed data, learners can produce widely
different frontiers while still making the same predictions in the regions that matter.
It is better to try the simplest learners first. Complex learners are usually harder to use, because
they have more knobs you need to turn to get good results, and because their internals are
more opaque.
24. The author divides learners into two types based on their representation size. Write a brief
summary.
According to the author, there are two types of learners based on representation size:
1) learners with a fixed representation size, and
2) learners whose representation size grows with the data.
Fixed-size learners can only take advantage of so much data. Variable-size learners can in
principle learn any function given sufficient data, but in practice they may not, because of
limitations of the algorithm, computational cost, or the curse of dimensionality. For these
reasons, clever algorithms, those that make the most of the data and computing resources
available, often pay off in the end.
25. Is it better to have variation of a single model or a combination of different models, known as
ensemble or stacking? Explain briefly.
Researchers noticed that if, instead of selecting the best variation found, we combine many
variations, the results are often much better, at little extra effort for the user. In ensembling
(for example, bagging) we generate random variations of the training set by resampling, learn a
classifier on each, and combine the results by voting. This works because it greatly reduces
variance while only slightly increasing bias.
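The resample-train-vote loop can be sketched with decision stumps as the base learner; the stump and the toy threshold data are invented for illustration:

```python
import random
random.seed(42)

def bootstrap(data):
    """Resample the training set with replacement (one bagging round)."""
    return [random.choice(data) for _ in data]

def train_stump(sample):
    """Fit the threshold t that minimizes training error for the rule 'x > t'."""
    best_t, best_err = 0, len(sample) + 1
    for t, _ in sample:
        err = sum((x > t) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: x > t

def majority_vote(predictions):
    """Combine the ensemble's answers for one query by voting."""
    return max(set(predictions), key=predictions.count)

# Toy concept: label is True when x > 5.
data = [(x, x > 5) for x in range(10)]
ensemble = [train_stump(bootstrap(data)) for _ in range(25)]
print(majority_vote([clf(7) for clf in ensemble]))  # True
```

Each stump sees a slightly different resampled view of the data, so its threshold wobbles, but the vote averages that wobble away: variance drops while the bias of the stump family barely changes.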
26. Read the last paragraph and explain why it makes sense to prefer simpler algorithms and
hypotheses.
Smaller hypothesis spaces allow hypotheses to be represented by shorter codes. At the same
time, a learner with a larger hypothesis space that tries fewer hypotheses from it is less likely
to overfit than one that tries more hypotheses from a smaller space. So it makes sense to
prefer simpler algorithms and hypotheses: the more assumptions an explanation requires, the
more unlikely it is.
27. It has been established that correlation between independent variables and predicted
variables does not imply causation, still correlation is used by many researchers. Explain briefly the
reason.
In a prediction study, the goal is to develop a formula for making predictions about the
dependent variable based on the observed values of the independent variables. In a causal analysis, the
independent variables are regarded as causes of the dependent variable. Many learning algorithms can
potentially extract causal information from observational data, but their applicability is rather restricted.
To find causation, you generally need experimental data, not observational data. Correlation is a
necessary but not sufficient condition for causation. Correlation is a valuable type of scientific evidence
in fields such as medicine, psychology, and sociology. But first correlations must be confirmed as real,
and then every possible causative relationship must be systematically explored. In the end, correlation
can be used as powerful evidence for a cause-and-effect relationship between a treatment and a benefit,
a risk factor and a disease, or a social or economic factor and various outcomes.