Adding Tree and Tree
@avibryant
Brushfire: Distributed, Generic, Decision Tree Learning in Scala (using Hadoop)
@avibryant 
Open source: Real Soon Now
Vun!
Two! 
+
Tree!
Do you like cookies? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
Do you like cookies? 
color != blue color = blue
Does Cookie Monster like Cookies? 
color != blue color = blue
Is Cookie Monster Blue? 
color != blue color = blue
Cooooookie! 
color != blue color = blue 
cookie!
Do you like cookies? 
color != blue color = blue 
yuck ok 
cookie! 
wears != stripes 
wears = stripes
color != blue color = blue 
T T 
T 
wears != stripes 
wears = stripes
color != blue color = blue 
T T 
T 
wears != stripes 
wears = stripes 
Do you like cookies? 
How many cookies will you eat? 
What’s your favorite kind of cookie?
Bootstrap or k-fold? 
Chi-square or entropy? 
Wow! 
Classification or regression? 
Binary splits or multiway? 
Out-of-bag or out-of-time? 
One tree or many? 
Binary or multi-class?
trait Evaluator[V,T] 
trait Tree[V,T] 
trait Splitter[V,T] 
trait Error[T,E] 
Wow! 
Such types! 
case class Instance[V,T]
false true 
false 
true 
Binary classification
0.1 0.4 
0.0 
0.9 
Binary classification
T+T+T+T= 
T T 
T 
T 
T+T+T+T+T= 
T+T+T+T+T= T+T+T=
Binary classification
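A minimal sketch of what "adding" targets could look like for binary classification. It assumes each T is a count of false/true labels, so that a leaf's T is the sum of the Ts that reach it and the leaf's score is the fraction of true labels. The names `Counts`, `plus` and `probabilityTrue` are illustrative and are not taken from Brushfire; in the real stack this kind of addition would typically come from an Algebird monoid.

```scala
object BinaryTargets {
  // Assumed representation: a target T counts how many false/true labels were seen.
  type Counts = Map[Boolean, Long]

  // Adding two targets merges their counts key by key.
  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map { k =>
      k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
    }.toMap

  // Fraction of `true` labels in a summed target; 0.0 when nothing was counted.
  def probabilityTrue(t: Counts): Double = {
    val total = t.values.sum
    if (total == 0L) 0.0 else t.getOrElse(true, 0L).toDouble / total
  }

  def main(args: Array[String]): Unit = {
    // T + T + T + T for one leaf: three positive examples and one negative.
    val targets: List[Counts] =
      List(Map(true -> 1L), Map(false -> 1L), Map(true -> 1L), Map(true -> 1L))
    val leaf = targets.reduce(plus)
    println(leaf)
    println(probabilityTrue(leaf))
  }
}
```

Because `plus` is associative and commutative, the per-leaf sums can be computed in any order, which is what makes them a fit for map and reduce.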
Bigger (data) 
= Better (models) 
Generic != Fast 
“Why do you rob banks?”
Learning a tree in Scalding 
11 passes through the data 
21 MapReduce steps
T 
T
T T T T
T T 
T T 
T T T T
Step 1/21 
T
{height: 5, color: blue, wears: fur} 
{height: 7, color: yellow, wears: feathers} 
{height: 3, color: green, wears: garbage} 
{height: 5, color: yellow, wears: stripes} 
{height: 4, color: orange, wears: stripes}
[Diagram: Map takes each Instance[V,T] to a T; Reduce combines the Ts into one T]
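The map and reduce of Step 1 can be sketched on plain Scala collections rather than Scalding. This is only an illustration: the fields of `Instance` are guesses (the slides give just the type name), `rootTarget` and `plusCounts` are hypothetical helpers, and the true/false labels attached to the three muppets are invented so there is something to sum.

```scala
object RootTarget {
  // Hypothetical fields; the slides only name the type Instance[V, T].
  case class Instance[V, T](features: Map[String, V], target: T)

  // Map: keep only each instance's target. Reduce: combine targets with `plus`.
  // Returns None when there are no instances to sum.
  def rootTarget[V, T](instances: Seq[Instance[V, T]])(plus: (T, T) => T): Option[T] =
    instances.map(_.target).reduceOption(plus)

  type Counts = Map[Boolean, Long]

  // Key-by-key addition of false/true counts.
  def plusCounts(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // Feature values come from the slides; the labels are made up for illustration.
  val training: Seq[Instance[String, Counts]] = Seq(
    Instance(Map("height" -> "5", "color" -> "blue", "wears" -> "fur"), Map(true -> 1L)),
    Instance(Map("height" -> "7", "color" -> "yellow", "wears" -> "feathers"), Map(false -> 1L)),
    Instance(Map("height" -> "3", "color" -> "green", "wears" -> "garbage"), Map(false -> 1L))
  )

  def main(args: Array[String]): Unit =
    println(rootTarget(training)(plusCounts))
}
```

The result is the single T at the root of the tree, before any split has been chosen.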
color 
!= blue = blue 
T T 
color 
!= yellow = yellow 
T T 
height 
< 5 >= 5 
T T 
? 
Step 2/21
Step 2/21 
color 
!= blue = blue 
T T 
color 
!= yellow = yellow 
T T 
?
blue 
yellow 
green 
yellow 
orange
Step 2/21 
[Diagram: Map takes each Instance[V,T] to an S; Reduce combines the S values]
Other options: 
CountMinSketch 
QTree 
…
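One plausible shape for S, sketched under assumptions: for a categorical feature such as color, S could be a map from each observed value to the sum of the targets seen with that value. The slides do not say which structure is actually used, so `Stats`, `toStats` and `plusStats` are hypothetical names, and the labels are invented. The sequence of colors is the one shown above.

```scala
object FeatureStats {
  type Counts = Map[Boolean, Long]
  // Assumed S for a categorical feature: one target sum per observed value.
  type Stats = Map[String, Counts]

  val zero: Counts = Map.empty[Boolean, Long]

  // Key-by-key addition of false/true counts.
  def plusCounts(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // Map: each instance contributes a single-entry S.
  def toStats(value: String, target: Counts): Stats = Map(value -> target)

  // Reduce: merge two S values, adding the targets that share a feature value.
  def plusStats(a: Stats, b: Stats): Stats =
    (a.keySet ++ b.keySet).map { v =>
      v -> plusCounts(a.getOrElse(v, zero), b.getOrElse(v, zero))
    }.toMap

  // Colors as on the slide; the labels are made up for illustration.
  val observed: Seq[(String, Counts)] = Seq(
    ("blue", Map(true -> 1L)),
    ("yellow", Map(false -> 1L)),
    ("green", Map(false -> 1L)),
    ("yellow", Map(true -> 1L)),
    ("orange", Map(false -> 1L))
  )

  val colorStats: Stats =
    observed.map { case (v, t) => toStats(v, t) }.reduce(plusStats)

  def main(args: Array[String]): Unit = println(colorStats)
}
```

An exact map like this grows with the number of distinct values, which is presumably why approximate structures such as CountMinSketch or QTree are listed as alternatives.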
V => Boolean V => Boolean 
T T
V => Boolean V => Boolean 
T T 
T 
V => Boolean
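A rough sketch of a tree whose branches are guarded by `V => Boolean` predicates, using the cookie example from earlier. The `Node`, `Leaf` and `Branch` types and the `targetFor` helper are hypothetical and are not Brushfire's own `Tree[V,T]`. Which of the two non-blue leaves is "ok" and which is "yuck" is a reading of the flattened slide, not something the slides state.

```scala
object PredicateTree {
  sealed trait Node[V, T]
  case class Leaf[V, T](target: T) extends Node[V, T]
  // Each child is reached when its predicate accepts the value of `feature`.
  case class Branch[V, T](feature: String, children: List[(V => Boolean, Node[V, T])])
      extends Node[V, T]

  // Walk down from `node`, following the first child whose predicate matches.
  // Returns None if the feature is missing or no predicate matches.
  def targetFor[V, T](node: Node[V, T], row: Map[String, V]): Option[T] = node match {
    case Leaf(t) => Some(t)
    case Branch(feature, children) =>
      row.get(feature)
        .flatMap(v => children.find { case (p, _) => p(v) })
        .flatMap { case (_, child) => targetFor(child, row) }
  }

  private def is(x: String): String => Boolean = _ == x
  private def isNot(x: String): String => Boolean = _ != x

  // Assumed layout: split on color first, then on wears for everything not blue.
  val cookieTree: Node[String, String] =
    Branch[String, String]("color", List(
      (is("blue"), Leaf[String, String]("cookie!")),
      (isNot("blue"), Branch[String, String]("wears", List(
        (is("stripes"), Leaf[String, String]("ok")),
        (isNot("stripes"), Leaf[String, String]("yuck")))))))

  def main(args: Array[String]): Unit =
    println(targetFor(cookieTree, Map("height" -> "5", "color" -> "blue", "wears" -> "fur")))
}
```

Keeping the predicates as plain functions means the same tree shape works for equality tests, thresholds such as `height < 5`, or anything else that can be asked of a V.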
Step 3/21 
[Diagram: Instance[V,T] → S → Split[V,T]]
Step 3/21 
[Diagram: Instance[V,T] → S → Split[V,T] → S]
Step 3/21 
[Diagram: Instance[V,T] → S → Split[V,T] → S → Split[V,T]]
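How an S might be turned into candidate splits can be sketched as follows, under several assumptions. S is taken to be a map from feature value to false/true counts, each candidate is an "= value" versus "!= value" split, and candidates are scored by weighted entropy (entropy is one of the two options named earlier; the slides do not say how it is applied). `Candidate`, `candidates`, `score` and `best` are hypothetical names, and the counts in `colorStats` are invented.

```scala
object SplitCandidates {
  type Counts = Map[Boolean, Long]

  // Hypothetical candidate: targets where feature = value, and where it does not.
  case class Candidate(value: String, equal: Counts, notEqual: Counts)

  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // Shannon entropy in bits of a false/true count; 0.0 for an empty count.
  def entropy(t: Counts): Double = {
    val total = t.values.sum.toDouble
    t.values.filter(_ > 0).map { n =>
      val p = n / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  // Entropy of both sides weighted by their size; lower means a purer split.
  def score(c: Candidate): Double = {
    val nEq = c.equal.values.sum.toDouble
    val nNe = c.notEqual.values.sum.toDouble
    val n = nEq + nNe
    if (n == 0) 0.0 else (nEq / n) * entropy(c.equal) + (nNe / n) * entropy(c.notEqual)
  }

  // One candidate per observed value: its own sum against the sum of all others.
  def candidates(stats: Map[String, Counts]): List[Candidate] =
    stats.keys.toList.sorted.map { v =>
      val rest = (stats - v).values.foldLeft(Map.empty[Boolean, Long])(plus)
      Candidate(v, stats(v), rest)
    }

  // Lowest-scoring candidate; ties go to the alphabetically first value.
  def best(stats: Map[String, Counts]): Option[Candidate] =
    candidates(stats).map(c => (c, score(c)))
      .reduceOption((a, b) => if (b._2 < a._2) b else a)
      .map(_._1)

  // Invented counts for the color feature.
  val colorStats: Map[String, Counts] = Map(
    "blue" -> Map(true -> 1L),
    "yellow" -> Map(true -> 1L, false -> 1L),
    "green" -> Map(false -> 1L),
    "orange" -> Map(false -> 1L))

  def main(args: Array[String]): Unit = println(best(colorStats))
}
```

Note that only the summed S is needed here, not the instances themselves, so this step can run on the reduced output of the previous one.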
Instance[V,T] 
Instance[V,T] 
Instance[V,T]
Instance[V,T] 
Instance[V,T] 
Instance[V,T] 
… 
Forests!
Instance[V,T] 
Instance[V,T] 
Instance[V,T] 
…
V? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
PLANET 
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf 
Scalding + Algebird 
http://github.com/twitter/scalding 
http://github.com/twitter/algebird 
Coming soon 
http://github.com/stripe/brushfire
