Doing data science with Clojure
@sbelak
simon@goopti.com
Curry On Rome, 2016
Doing data science with Clojure
↳ Design constraints
↳ The environment
↳ notebooks vs. REPL
↳ programmable environments
↳ The tools
↳ design decisions behind Huri (my data science library)
↳ data frame considered harmful
↳ encoding computation into structure
↳ composability
↳ feedback loops
↳ Expanding the ecosystem with mini compilers (to ggplot, scipy, …)
Design constraints
Divide and conquer complexity
[Diagram: data sources and streams, including Kafka, PostgreSQL, ElasticSearch, S3, Intercom, frontend actions, orderbook changes, monitoring, telemetry, flight changes, …]
Automatic views
• Event & attribute ontology
  • Manual
  • Inferred
• Seasonality detection
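Seasonality detection can be approached in several ways. One simple option, sketched here under the assumption of a regularly sampled numeric series (e.g. daily event counts), is to look for the lag with the strongest autocorrelation and treat it as the candidate period. The functions `autocorrelation` and `seasonal-period` and the threshold parameter are hypothetical illustrations, not the system described in the talk.

```clojure
(defn autocorrelation
  "Sample autocorrelation of the numeric series xs at the given lag."
  [xs lag]
  (let [xs    (vec xs)
        m     (/ (reduce + xs) (count xs))          ; series mean
        dev   (mapv #(- % m) xs)                    ; deviations from the mean
        denom (reduce + (map #(* % %) dev))]        ; total sum of squares
    (if (zero? denom)
      0.0                                           ; a flat series has no structure
      ;; pairs dev[i] * dev[i + lag]; map stops at the shorter sequence
      (double (/ (reduce + (map * dev (drop lag dev)))
                 denom)))))

(defn seasonal-period
  "Hypothetical helper: the lag in [2, max-lag] with the highest
  autocorrelation, if it exceeds threshold; otherwise nil."
  [xs max-lag threshold]
  (let [[lag r] (apply max-key second
                       (for [lag (range 2 (inc max-lag))]
                         [lag (autocorrelation xs lag)]))]
    (when (> r threshold)
      lag)))
```

For a series that repeats every 7 samples (a weekly pattern in daily data), `(seasonal-period xs 10 0.5)` picks lag 7; a constant series has zero variance, so no lag qualifies and the result is nil.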
Data science: the process
(aka it’s about communication, stupid!)
The analytics chasm
• < 2 min: ideal. Almost real-time; can be done during brainstorming without disrupting flow.
• < 20 min: squeeze in somewhere in the day.
• Longer, but short of a project: fail.
• Project: roadmap ahoy!
Think in distributions, not numbers
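A minimal sketch of what this can look like in practice, assuming a plain sequence of numbers: describe a sample by its quantiles instead of a single average. `quantile` (using the nearest-rank method) and `summarize` are hypothetical helpers, not part of Huri.

```clojure
(defn quantile
  "Empirical quantile q (0 <= q <= 1) of a sorted vector, nearest-rank method."
  [sorted q]
  (let [n   (count sorted)
        ;; rank = ceil(q * n); convert to a 0-based index, clamped at 0 for q = 0
        idx (max 0 (dec (long (Math/ceil (* (double q) n)))))]
    (nth sorted idx)))

(defn summarize
  "Describe a sample by its shape rather than by a single number."
  [xs]
  (let [s (vec (sort xs))]
    {:n      (count s)
     :min    (first s)
     :p25    (quantile s 0.25)
     :median (quantile s 0.5)
     :p75    (quantile s 0.75)
     :p95    (quantile s 0.95)
     :max    (peek s)
     :mean   (double (/ (reduce + s) (count s)))}))

(summarize [10 12 11 13 250])
```

For the sample above the mean is 59.2, while the median is 12 and p75 is 13: the single average describes none of the typical values, which the quantiles make obvious.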
No throwaways
Sharing results
• Have one canonical version that is always current.
• Concentrate discussion in one place and make it searchable and persistent.
• Include methodology (=code).
The environment
REPL vs. notebook
• REPL: ephemeral
• Notebook: spatial grouping
(hacked) gorilla-repl.org + auto-refresh + hypothes.is
[Screenshot: a shared notebook, with callouts]
• Code hidden, but can be expanded
• Questions, comments & annotations
• Shareable
• Periodically re-run to keep it fresh
• Tags (#alderaan #sales #growth) for discoverability
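The "periodically re-run" behaviour can be approximated with plain Java interop. This is a minimal sketch, assuming the notebook's re-evaluation is wrapped in a zero-argument function (the hypothetical `rerun!`); the slides do not say how the hacked Gorilla REPL actually implemented it.

```clojure
(import '(java.util.concurrent Executors ScheduledExecutorService TimeUnit))

(defn schedule-refresh!
  "Hypothetical helper: call rerun! every period-ms milliseconds on a
  background thread, starting immediately. Returns a function that stops it."
  [rerun! period-ms]
  (let [^ScheduledExecutorService exec (Executors/newSingleThreadScheduledExecutor)]
    (.scheduleAtFixedRate exec
                          (fn []
                            ;; Catch everything: an uncaught exception would
                            ;; silently cancel all later runs of this task.
                            (try
                              (rerun!)
                              (catch Exception e
                                (println "refresh failed:" (.getMessage e)))))
                          0 period-ms TimeUnit/MILLISECONDS)
    (fn stop! [] (.shutdown exec))))
```

The try/catch is the important design choice: a `ScheduledExecutorService` suppresses all subsequent executions of a periodic task once one run throws, so an unguarded failure would quietly leave the notebook stale.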
Notebooks as dashboards
The power of a shared runtime
Wishlist/TODO
• Better editor (shaunlebron.github.io/parinfer/ ?)
• Embedded REPL
• Better exception reporting
• Browsable data structures

The tools
Doing data science with Clojure
Data frame considered harmful
• Data frame (=table) conflates representation and abstraction
• Clojure excels in structure manipulation/encoding
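To illustrate the point with made-up booking records: the same rows can live in a plain vector of maps and be re-encoded as a nested map keyed by route using core functions alone, without committing to a table type. The data and the `totals-by` helper are hypothetical.

```clojure
;; Hypothetical rows: a plain vector of maps, no table type involved.
(def bookings
  [{:route :a :pax 3 :price 120.0}
   {:route :a :pax 1 :price 45.0}
   {:route :b :pax 2 :price 80.0}])

;; The same abstraction (rows), re-encoded as a nested structure keyed by route.
(def by-route (group-by :route bookings))

(defn totals-by
  "Group rows by (kf row) and sum (vf row) within each group."
  [kf vf rows]
  (into {}
        (for [[k group] (group-by kf rows)]
          [k (reduce + (map vf group))])))

(totals-by :route :price bookings)
;; => {:a 165.0, :b 80.0}
```

Because the representation is just data, switching between "list of rows" and "rows grouped by key" is a single core function call rather than a change of container type.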
github.com/sbelak/huri
• No data structures, just functions over collections
• Composable (even DSLs — no macros!)
• Reasonably fast (transducers <3)
• Do-what-I-mean (auto-sort, liberal with inputs, …)
• Minimal buy-in
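A minimal sketch of the "just functions over collections" style, assuming rows are plain maps: a filtered sum built as a transducer pipeline, and a per-group mean computed in a single pass. `total` and `mean-by` are hypothetical names, not Huri's API.

```clojure
(defn total
  "Sum of (vf row) over the rows satisfying pred, via a transducer pipeline
  (no intermediate sequences are built)."
  [pred vf coll]
  (transduce (comp (filter pred) (map vf)) + 0 coll))

(defn mean-by
  "Mean of (vf row) for each value of (kf row), accumulated in one pass."
  [kf vf coll]
  (let [acc (reduce (fn [m row]
                      ;; keep a running [sum count] pair per group
                      (update m (kf row)
                              (fn [[s n]]
                                [(+ (or s 0) (vf row)) (inc (or n 0))])))
                    {}
                    coll)]
    (into {} (map (fn [[k [s n]]] [k (/ s n)])) acc)))

(def rows [{:k :a :v 1} {:k :a :v 3} {:k :b :v 10}])

(total #(= :a (:k %)) :v rows)
(mean-by :k :v rows)
```

Both functions take an ordinary collection and return ordinary data, so they can be dropped into any existing code without adopting a new data structure.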
[Annotated code screenshot:]
• Composable, data-structure-based DSLs
• ->> and partial friendly
• Support reaching into nested structures everywhere
• Vanilla vector of maps interoperability
• Provide curried versions where possible
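A minimal sketch of the "curried and ->> friendly" convention, assuming collection-last arguments: the full arity does the work, and the shorter arity returns a function of the collection. `keep-where` and `sum-of` are hypothetical names chosen for illustration.

```clojure
(defn keep-where
  "Keep rows for which (pred (f row)) is truthy. The two-argument arity
  returns a function of the collection (a curried version)."
  ([f pred] (fn [coll] (keep-where f pred coll)))
  ([f pred coll] (filter (comp pred f) coll)))

(defn sum-of
  "Sum of (f row) over coll; the one-argument arity is curried."
  ([f] (fn [coll] (sum-of f coll)))
  ([f coll] (transduce (map f) + 0 coll)))

(def rows
  [{:pax 3 :price 120.0}
   {:pax 1 :price 45.0}
   {:pax 2 :price 80.0}])

;; Collection-last arguments make ->> threading work...
(->> rows
     (keep-where :pax #(> % 1))
     (sum-of :price))

;; ...while the curried arities compose directly with comp.
(def group-revenue (comp (sum-of :price) (keep-where :pax #(> % 1))))
(group-revenue rows)
```

Keeping the collection as the last argument everywhere is what makes the API feel consistent: every function works the same way in `->>`, `comp`, and `map`.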
Composability is key to quick iteration
• Curried versions where possible
• ->> and partial friendly
• Side benefit: consistent API
• Generalised accessors (reaching into complex structures everywhere via comp)
[Diagram: an accessor can be a function, a map key, or a “virtual” structure]
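A sketch of a generalised accessor under stated assumptions: a single hypothetical `->accessor` function turns a function, a map key, a vector path into nested data, or a map describing a "virtual" structure into a plain function of a row. Every other helper can then accept any of these forms and compose the result with `comp`.

```clojure
(defn ->accessor
  "Hypothetical helper: normalise an accessor spec into a function of a row.
   - a function is used as-is
   - a vector is a path into nested maps (get-in)
   - a map of name -> accessor builds a \"virtual\" structure
   - anything else (keyword, string key, ...) is looked up with get"
  [spec]
  (cond
    (fn? spec)     spec
    (vector? spec) (fn [row] (get-in row spec))
    (map? spec)    (let [fs (into {} (for [[k v] spec] [k (->accessor v)]))]
                     (fn [row] (into {} (for [[k f] fs] [k (f row)]))))
    :else          (fn [row] (get row spec))))

(def row {:customer {:country "SI"} :price 120.0 :pax 3})

((->accessor [:customer :country]) row)

;; A "virtual" structure: fields that do not exist in the row itself.
((->accessor {:country [:customer :country]
              :per-pax #(/ (:price %) (:pax %))})
 row)
```

Note that keywords, vectors, and maps are all callable in Clojure but are not `fn?`, which is why the function check can safely come first.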
“This is possibly Clojure’s most important property: the syntax expresses the code’s semantic layers. An experienced reader of Clojure can skip over most of the code and have a lossless understanding of its high-level intent.”
— Z. Tellman, Elements of Clojure
On feedback
Catching errors early → more context → easier debugging → faster iterating
clojure.spec
[Screenshot: a spec error report for a malformed argument, annotated: “Should have been a keyword->fn map”]
<3 Bret Victor
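A minimal sketch of the kind of early check the slide refers to, assuming Clojure 1.9+ with `clojure.spec.alpha` on the classpath. The `aggregate` function and the `::aggregations` spec are hypothetical; the point is that a wrongly shaped argument is rejected at the call site, with an explanation, instead of failing somewhere deep inside the computation.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec: aggregations must be a map from keyword to function.
(s/def ::aggregations (s/map-of keyword? fn?))

(defn aggregate
  "Hypothetical helper: apply each function in aggs to coll, keyed by name."
  [aggs coll]
  (when-not (s/valid? ::aggregations aggs)
    ;; Fail fast, with spec's own explanation of what was wrong.
    (throw (ex-info (str "Should have been a keyword->fn map:\n"
                         (s/explain-str ::aggregations aggs))
                    {:problems (s/explain-data ::aggregations aggs)})))
  (into {} (for [[k f] aggs] [k (f coll)])))

(aggregate {:n count :total #(reduce + %)} [1 2 3])
```

Passing a vector such as `[:n count]` instead of a map throws immediately, and the exception data carries the full spec explanation for tooling to display.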
What about machine learning?
Farm it out to sklearn.
Mini compilers for DSLs targeting a specific library in another language
huri.plot
• DSL that compiles to ggplot2
• Targets Gorilla REPL
• Follows the rest of Huri’s design philosophy
• bar chart, scatter plot, line chart, box & violin plot, heatmap, histogram
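To show the idea of a mini compiler, here is a toy sketch that turns a small plot description plus a vector of maps into a string of R code calling ggplot2. The description format, `compile-plot`, and the geom table are hypothetical and cover only a few chart types; this is not huri.plot's actual API or output, and evaluating the result (e.g. with R and ggplot2 installed) is outside the sketch.

```clojure
(require '[clojure.string :as str])

;; Hypothetical mapping from DSL chart types to ggplot2 geoms.
(def ^:private geoms
  {:bar     "geom_col()"
   :scatter "geom_point()"
   :line    "geom_line()"
   :box     "geom_boxplot()"
   :violin  "geom_violin()"})

(defn- ->r-literal
  "Render a Clojure value as an R literal."
  [v]
  (cond
    (nil? v)     "NA"
    (true? v)    "TRUE"
    (false? v)   "FALSE"
    (string? v)  (pr-str v)
    (keyword? v) (pr-str (name v))
    :else        (str v)))

(defn- ->data-frame
  "Render the given columns of a vector of maps as an R data.frame literal."
  [rows cols]
  (str "data.frame("
       (str/join ", "
                 (for [c cols]
                   (str (name c) " = c("
                        (str/join ", " (map (comp ->r-literal c) rows))
                        ")")))
       ")"))

(defn compile-plot
  "Compile a plot description {:type :x :y} over rows into R/ggplot2 code."
  [{:keys [type x y]} rows]
  (let [geom (or (geoms type)
                 (throw (ex-info "Unsupported chart type" {:type type})))]
    (str "ggplot(" (->data-frame rows [x y])
         ", aes(x = " (name x) ", y = " (name y) ")) + " geom)))

(compile-plot {:type :scatter :x :pax :y :price}
              [{:pax 1 :price 45.0} {:pax 3 :price 120.0}])
```

The design mirrors the slide's argument: the Clojure side only manipulates data structures, and the heavy lifting (layout, rendering) stays in the mature library on the other side.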
Takeaways
• Speed-of-answer matters
• Data science is about communication
• We don’t have to reinvent every wheel in Clojure
• Clojure is fantastic at structure manipulation; play to its strengths
• Blurring the line between environment and work is a powerful idea
