Part II: Boosting and Tree-structured Classifier. Tae-Kyun Kim
Classification speed matters not just for time efficiency but also for good accuracy.
Object Detection by a Cascade of Classifiers. Pictures from Romdhani et al. ICCV01.
Object Tracking by Fast (Re-)Detection: from time t to t+1, the object is detected again within a search region around the previous object location. Online discriminative feature selection [Collins et al 03], ensemble tracking [Avidan 07].
Semantic Segmentation: requires pixel-wise classification.
Structure of this talk
First half: Introduction to Boosting; Bagging/RF; AdaBoost; Robust real-time object detector; Boosting as a tree-structured classifier.
Second half: Unified Boosting framework; Tree-structured classifiers; MCBoost; Speeding up; Super tree; Comparison.
Things not covered: fast training (e.g. Pham and Cham ICCV07); randomised learning for Boosting (Rahimi et al NIPS 08); variations such as real-valued AdaBoost (Freund and Schapire 95), GentleBoost etc. by Friedman.
Introduction to Boosting Classifiers
Introduction to Boosting [Meir et al 03, Schapire 03]
The underlying idea is to combine simple "rules" to form an ensemble such that the performance of each single ensemble member is improved, i.e. "boosted". The strong classifier is H(x) = sign( sum_t alpha_t h_t(x) ), where {h_t} is a set of hypotheses and {alpha_t} are their weights.
A brief history: PAC (Probably Approximately Correct) learning (Valiant 1984, Kearns and Valiant 1994) showed that learners, each performing slightly better than random, can be combined to form a good hypothesis. Schapire (1990) first provided a polynomial-time boosting algorithm and applied it to an OCR task, relying on neural networks as base learners.
A brief history (continued)
AdaBoost (Adaptive Boosting) is the most common variant (Freund and Schapire 94). Many variations are formalised in a unified gradient-descent procedure (Mason et al 00). Grove et al (98) have shown overfitting effects on high-noise data sets. New types have emerged, e.g. for regression (Duffy et al 00), multi-class problems (Allwein et al 2000) and unsupervised learning (Ratsch et al 00). The real-time object detector of Viola and Jones (01) is a landmark in computer vision.
Formalisation
The goal is to estimate a function f: X -> Y from input-output training pairs generated independently at random from an unknown probability distribution P(x,y), such that f will correctly predict unseen examples (x,y). A soft classifier takes Y = R, assigning the label according to sign(f(x)). The risk (or generalisation error) is the expected loss R(f) = E_{P(x,y)}[ lambda(f(x), y) ], where lambda is a loss function, e.g. the 0/1 loss.
Formalisation (continued)
In practice the empirical risk R_emp(f) = (1/N) sum_i lambda(f(x_i), y_i) is minimised; as N goes to infinity it converges to the true risk. Occam's razor: a "simple function" that explains most of the data is preferable to a complex one that fits the data very well. Ensemble learning: Bagging / Boosting.
Bagging (Bootstrap AGGregatING)
Bootstrap: for each set, randomly draw examples from the uniform distribution, allowing duplication and omission. The ensemble classifier averages, or takes a majority vote over, the base classifiers trained on T bootstrap sets. More theory on bias/variance: [Geurts et al 06].
Bagging diagram: the learning set LS is bootstrapped into LS1, ..., LST; a tree is trained on each; for an input x, the prediction y(x) is the majority class among y1(x), ..., yT(x).
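The following is a minimal Python sketch of this bagging procedure (not taken from the slides); it assumes scikit-learn's DecisionTreeClassifier as the base learner, and the names bagging_fit / bagging_predict and the parameter T are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed available as the base learner

def bagging_fit(X, y, T=25, seed=0):
    """Train T trees, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    N = len(y)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)      # some samples duplicated, some omitted
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """y(x) = majority class among y1(x), ..., yT(x); labels assumed non-negative integers."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)   # shape (T, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```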
Randomized Decision Forest [Breiman 01, Geurts et al 06]
A forest is an ensemble of random decision trees t1, ..., tT. Each internal node n routes the feature vector v by testing whether its split function fn(v) exceeds a threshold tn; each leaf stores a class distribution Pn(c). Classification averages the leaf distributions reached by v over all trees.
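As a rough sketch of this classification rule (the node layout and names below are hypothetical, not the tutorial's implementation):

```python
import numpy as np

class Node:
    """Hypothetical node layout: internal nodes hold (feature, threshold, left, right);
    leaves hold a class histogram Pn(c)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, hist=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.hist = left, right, hist

def tree_posterior(node, v):
    # Route the feature vector v to a leaf and return its class distribution.
    while node.hist is None:
        node = node.right if v[node.feature] > node.threshold else node.left
    return node.hist

def forest_classify(trees, v):
    # Average the leaf distributions over all trees, then take the most likely class.
    return int(np.argmax(np.mean([tree_posterior(t, v) for t in trees], axis=0)))
```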
Randomized Tree Learning
At each node, candidate features f(v) are chosen from a random feature pool f ∈ F and thresholds t are chosen within the range of the feature values; the pair (f, t) that maximizes the information gain between the left and right splits is selected.
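A small sketch of this node-splitting step, assuming axis-aligned features and entropy-based information gain; the pool sizes n_features and n_thresholds are illustrative parameters, not values from the tutorial.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_random_split(X, y, n_features=10, n_thresholds=10, seed=None):
    """Choose (f, t) maximising information gain over a random feature/threshold pool."""
    rng = np.random.default_rng(seed)
    parent, best = entropy(y), (None, None, -np.inf)
    for f in rng.choice(X.shape[1], size=min(n_features, X.shape[1]), replace=False):
        lo, hi = X[:, f].min(), X[:, f].max()
        for t in rng.uniform(lo, hi, size=n_thresholds):     # thresholds chosen in range
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (f, t, gain)
    return best   # (feature index, threshold, information gain)
```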
Random Forest: summary
Pros: generalization through bagging (random samples) and randomised tree learning (random features); very fast classification; inherently multi-class; simple training.
Cons: inconsistency; difficulty for adaptation.
Boosting: iteratively reweighting training samples.
Higher weights are given to previously misclassified samples (illustrated after 1, 2, 3, 4, 5 and 50 rounds).
Boosting trees diagram: the learning set LS is reweighted into LS1, LS2, LS3, ...; a tree is trained on each; the prediction y(x) is the majority class among y1(x), ..., yT(x), weighted according to the hypothesis weights.
AdaBoost [Freund and Schapire 94]
Input: training pairs (x_i, y_i), y_i ∈ {-1, +1}; output: the strong classifier H(x) = sign( sum_t alpha_t h_t(x) ).
Init: sample weights w_i = 1/N.
For t = 1 to T:
Learn h_t that minimises the weighted error eps_t = sum_i w_i [h_t(x_i) ≠ y_i].
Set the hypothesis weight alpha_t = 0.5 log((1 - eps_t) / eps_t).
Update the sample weights w_i <- w_i exp(-alpha_t y_i h_t(x_i)) and normalise.
Break if eps_t >= 1/2.
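A minimal sketch of this discrete AdaBoost loop (illustrative; the user-supplied train_weak routine stands in for a decision-stump learner):

```python
import numpy as np

def adaboost_fit(X, y, train_weak, T=50):
    """y in {-1, +1}; train_weak(X, y, w) must return a callable h with h(X) in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                       # init: uniform sample weights
    ensemble = []                                 # list of (alpha_t, h_t)
    for _ in range(T):
        h = train_weak(X, y, w)                   # learn h_t on the weighted samples
        eps = np.dot(w, h(X) != y)                # weighted error
        if eps == 0 or eps >= 0.5:                # stop if the weak learner is perfect or no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # hypothesis weight
        w *= np.exp(-alpha * y * h(X))            # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, h))
    return ensemble

def adaboost_predict(ensemble, X):
    return np.sign(sum(a * h(X) for a, h in ensemble))   # H(x) = sign(sum_t alpha_t h_t(x))
```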
Existence of weak learners
Definition of a baseline learner: given the current data weights, the baseline classifier outputs the weighted majority label for all x, so its weighted error is at most 1/2. Each weak learner in Boosting is required to do better than this baseline (eps_t < 1/2); then the error of the composite hypothesis goes to zero as the number of boosting rounds increases [Duffy et al 00]. XOR problems (Matlab demo).
Does AdaBoost generalize? Margins in AdaBoost, maximizing margins in AdaBoost, upper bounds, etc. Tutorials: by Jiri Matas (www.robots.ox.ac.uk/~az/lectures/cv/adaboost_matas.pdf) and by Derek Hoiem (http://www.cs.uiuc.edu/homes/dhoiem/).
Multiple classifier system
Mixture of Experts [Jordan, Jacobs 94]
The gating network encourages specialization (local experts) instead of cooperation: the output for input x is the sum over experts of g_l(x) y_l(x), where the gating functions g_l(x) depend on the input.
Ensemble learning (Boosting and Bagging): cooperation instead of specialization; the expert weights g_l are constant, independent of the input x.
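The contrast between the two combination rules can be written down directly; the sketch below is illustrative, with experts and gating assumed to be plain callables.

```python
def moe_output(x, experts, gating):
    """Mixture of experts: input-dependent gates g_l(x) weight the expert outputs y_l(x)."""
    g = gating(x)                                  # e.g. a softmax over a small gating network
    return sum(g_l * expert(x) for g_l, expert in zip(g, experts))

def ensemble_output(x, experts, g):
    """Boosting/bagging-style combination: constant weights g_l, independent of x."""
    return sum(g_l * expert(x) for g_l, expert in zip(g, experts))
```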
Robust real-time object detector
Boosting Simple Features [Viola and Jones CVPR 01]
AdaBoost classification: a strong classifier built from weak classifiers, which are Haar-basis-like functions (45,396 in total).
Boosting Simple Features [Viola and Jones CVPR 01]
Integral image: the value at (x,y) is the sum of the pixel values above and to the left of (x,y). The sum of the original image values within any rectangle can then be computed from four integral-image values: Sum = A - B - C + D.
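A short sketch of the integral image and the four-lookup rectangle sum (the corner naming below is my own, chosen to match the A - B - C + D form):

```python
import numpy as np

def integral_image(img):
    """ii(y, x) = sum of img over the rectangle from (0, 0) to (y, x) inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the original image inside rows top..bottom and columns left..right,
    computed from four integral-image lookups."""
    A = ii[bottom, right]
    B = ii[top - 1, right] if top > 0 else 0
    C = ii[bottom, left - 1] if left > 0 else 0
    D = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    return A - B - C + D
```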
Boosting as a Tree-structured Classifier
Boosting (a very shallow network): the strong classifier H built from boosted decision stumps has a flat structure, each stump contributing a single split into scores c0/c1. Cf. decision "ferns" have been shown to outperform "trees" [Zisserman et al 07] [Fua et al 07].
Boosting (continued)
A strong boosting classifier: good generalisation from the flat structure; fast classification (though slower than RF); greedy optimisation.
Boosting cascade [Viola & Jones 04], boosting chain [Xiao et al]: a very unbalanced tree; speeds up unbalanced binary problems; hard to design.
Design space (inspired by Yin, Criminisi CVPR07): starting from a decision stump, boosting gives boosted decision stumps; adding tree hierarchy gives a decision tree and boosted decision trees; adding bagging gives random ferns(?), random forests(?) and bagged boosting classifiers.
BREAK !!
Unified Boosting Framework
AnyBoost: a unified framework [Mason et al 00]
Most boosting algorithms have in common that they iteratively update sample weights and select the next hypothesis based on the weighted samples.
Input: training pairs and a loss function defined on the strong classifier output F(x) = sum_t alpha_t h_t(x); output: F.
Init: sample weights.
For t = 1 to T: find h_t that minimises the weighted error; set alpha_t; update the sample weights from the gradient of the loss at the current F.
AnyBoost, an example: with the exponential loss L(F) = sum_i exp(-y_i F(x_i)), the sample weights become w_i ∝ exp(-y_i F(x_i)), which recovers the AdaBoost weighting.
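A tiny sketch of this weighting step, assuming the strong classifier scores F(x_i) are already available; the logistic-loss branch is an extra illustration of how other losses plug into the same gradient rule.

```python
import numpy as np

def anyboost_weights(F, y, loss="exp"):
    """Sample weights as the magnitude of the loss gradient at the current scores F(x_i)."""
    if loss == "exp":                     # L = sum_i exp(-y_i F(x_i))  ->  AdaBoost weighting
        w = np.exp(-y * F)
    elif loss == "logistic":              # L = sum_i log(1 + exp(-y_i F(x_i)))
        w = 1.0 / (1.0 + np.exp(y * F))
    else:
        raise ValueError(loss)
    return w / w.sum()
```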
AnyBoost-related work: Noisy-OR boosting for multiple instance learning [Viola et al NIPS06]; multi-component boosting [Dollar et al ECCV08]; MP boosting [Babenko et al ECCVW08]; MCBoost [Kim and Cipolla NIPS08].
Tree-structured Classifiers
Multi-view and multi-category object detection: images exhibit multi-modality, so a single boosting classifier is not sufficient, and manual labelling of sub-categories is otherwise required.
Multiclass object detection [Torralba et al PAMI 07]: learning multiple boosting classifiers by sharing features; the tree structure speeds up classification.
Multiclass object detection [Torralba et al PAMI 07] (illustrations).
ClusterBoost [Wu et al ICCV07]
Algorithm: start with one sub-category; select a feature per branch by boosting; divide the branch, by clustering on the feature value, if the feature is too weak; continue to grow while updating the weak-learners of parent nodes (feature sharing, splitting). Cf. VBT [Huang et al ICCV05], PBT [Tu ICCV05].
ClusterBoost results.
AdaTree (Grossmann CVPRW04) results: plots of error against boosting round and against mean computation cost, for 1% and 10% noise in the data.
Boosting for XOR: MCBoost [T-K Kim, R Cipolla NIPS 08]
Problem: face clusters 1 and 2; K-means clustering vs MCBoost.
MCBoost: Multiple Strong Classifier Boosting
Objective function: the joint probability over K strong classifiers is modelled as a Noisy-OR, P(c|x_i) = 1 - prod_k (1 - P_k(c|x_i)), where P_k(c|x_i) is the output of the k-th boosting classifier.
By the AnyBoost framework [Mason et al 00]: for t = 1 to T, for k = 1 to K, update the sample weights of each classifier from the gradient of the objective.
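Below is one plausible instantiation of this weight update (a sketch, not the exact formula from the paper), assuming sigmoid classifier outputs and a log-likelihood objective over the Noisy-OR probability.

```python
import numpy as np

def mcboost_weights(H, y):
    """H: K x N array of current strong-classifier scores H_k(x_i); y: labels in {0, 1}.
    Returns a K x N array of per-classifier sample weights."""
    p_k = 1.0 / (1.0 + np.exp(-H))            # per-classifier probabilities p_k(x_i)
    P = 1.0 - np.prod(1.0 - p_k, axis=0)      # Noisy-OR: P(x_i) = 1 - prod_k (1 - p_k(x_i))
    # Gradient of sum_i [y_i log P + (1 - y_i) log(1 - P)] with respect to H_k(x_i):
    w = np.where(y == 1, p_k * (1.0 - P) / np.maximum(P, 1e-12), -p_k)
    return w   # each positive sample pulls hardest on the classifier that already explains it best
```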
Toy XOR classification problem: MCBoost discriminatively clusters the positive samples; it does not partition the input space.
Toy XOR classification problem: weak-learner weight versus boosting round for classifiers 1, 2 and 3 (Matlab demo).
Experiments
INRIA pedestrian data set containing 1207 pedestrian images and 11466 random images; PIE face data set with 1800 face images and 14616 random images; a total of 21780 simple rectangle features. Shown: image cluster centers for pedestrian images (K = 5) and face images (K = 3).
Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09]: TUD Brussels onboard dataset.
Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09]: recall versus 1-precision curves.
Speeding up
A sequential structure of varying length
FloatBoost [Li et al PAMI04]: a backtrack mechanism deletes non-effective weak-learners at each round.
WaldBoost [Sochman and Matas CVPR05]: a sequential probability ratio test gives the shortest set of weak-learners for a given error rate; at each round, classify x as + and exit, or classify x as - and exit, once the running score crosses the corresponding threshold.
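A sketch of this kind of early-exit evaluation (WaldBoost-style); the per-round thresholds are assumed to be given, whereas in WaldBoost they are obtained from the sequential probability ratio test.

```python
def sequential_classify(x, weak_learners, alphas, theta_pos, theta_neg):
    """Evaluate weak learners one at a time and exit as soon as the score is decisive."""
    score = 0.0
    for t, (h, a) in enumerate(zip(weak_learners, alphas)):
        score += a * h(x)
        if score >= theta_pos[t]:
            return +1                  # classify x as + and exit
        if score <= theta_neg[t]:
            return -1                  # classify x as - and exit
    return +1 if score >= 0 else -1    # no early exit: use the full-length decision
```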
Making a shallow network deep: Super tree [Kim, Budvytis, Cipolla 09]
Converting a boosting classifier to a decision tree: many short paths for speeding up; preserving (smooth) decision regions for good generalisation.
Converting a boosting classifier to a decision tree (example): the super tree reproduces the boosting decision regions at about a 5 times speed-up in this example.
Boolean optimisation formulation
A learnt boosting classifier splits the data space into 2^m primitive regions by its m binary weak-learners. Code the regions R_i, i = 1, ..., 2^m, by boolean expressions over the weak-learner outputs W1, ..., Wm.
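A minimal sketch of this region coding (illustrative only): each data point is coded by the m weak-learner bits, occupied codes inherit the boosting label, and all unoccupied codes are left as don't-care regions for the minimisation step.

```python
import numpy as np

def code_regions(X, weak_learners, alphas):
    """weak_learners: callables returning 0/1 per sample; alphas: their boosting weights.
    Returns {region code: (label, count)}; p(R_i) is count / N, all other codes are don't care."""
    W = np.stack([h(X).astype(int) for h in weak_learners])   # m x N bit matrix
    scores = np.asarray(alphas) @ (2 * W - 1)                 # boosting score per data point
    regions = {}
    for i in range(W.shape[1]):
        code = tuple(W[:, i])                                 # boolean code of the region containing x_i
        label = 1 if scores[i] >= 0 else 0                    # the label is constant within a region
        regions[code] = (label, regions.get(code, (label, 0))[1] + 1)
    return regions
```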
Boolean expression minimization: optimally join the regions that share the same class label or have a don't-care label; a short tree is then built from the minimised boolean expression (in the example, a sum of four minterms over W1, W2, W3 reduces to a two-term expression).
Boolean optimisation formulation: the optimally short tree is defined by the average expected path length of data points, sum_i p(R_i) * (path length of R_i), where p(R_i) = M_i / M, and the tree must duplicate the Boosting decision regions.
Solutions:
Boolean expression minimization
Growing a tree from the decision regions
Extended region coding
Synthetic data experiment 1: examples generated from GMMs.
Face detection experiment: 57499 total data points; the proposed solution is about 3 to 5 times faster than boosting and 1.5 to 2.8 times faster than [Zhou 05], at similar accuracy.
Super tree (ST) vs Random Forest: ST is about 2 to 3 times faster than RF at similar accuracy.
Experiments with tracking and segmentation by ST.
More segmentation experiments by Boosting and RF
MCBoost building-class segmentation: building class, average class accuracy = 77% (ground truth vs segmentation).
Segmentation by binary-class RFs:
Road vs non-road: global 74.50%, average 79.89%
Building vs non-building: global 88.33%, average 87.26%
Car vs non-car: global 77.46%, average 80.45%
Tree vs non-tree: global 85.55%, average 85.24%
Boosting for segmentation: SpatialBoost [Avidan ECCV06], AdaBoost with spatial reasoning; TextonBoost [Shotton et al IJCV07].
Comparisons in the literature
In motion segmentation [Yin, Criminisi CVPR07]: RF > GB (boosting stumps) >= BT (boosting trees) in accuracy; RF = GB > BT in speed at their best accuracy.
In face detection/recognition [Belle et al ICPR08]: RF > AB (AdaBoost) at low false positive rates and AB > RF at high false positive rates in face detection accuracy; RF > AB in speed; SVM > RF in face recognition.
Design space revisited (inspired by Yin, Criminisi CVPR07): decision stumps, boosted decision stumps, decision trees, boosted decision trees, random ferns(?), random forests(?) and bagged boosting classifiers.
Summary
Thanks to Bjorn Stenger, Jamie Shotton, Giovanni Farinella, Ignas Budvytis, Roberto Cipolla, Tom Woodley. Thanks!

Editor's Notes

  • #62: We formulate this as a boolean optimisation problem.