Spec + Onyx: an experience report
@sbelak
simon@goopti.com
Onyx
a masterless, cloud scale, fault tolerant, high performance distributed computation system
… written entirely in Clojure
Onyx at
• In production for almost a year 

• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis
Self-service infrastructure
for data scientists
1. Onyx at a glance
2. How Onyx rewired my brain
3. Building on top of spec
Onyx at a glance
Job = workflow + flow conditions + catalogue
workflow
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]
flow conditions
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]
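The flow condition on this slide points at a predicate, :my.ns/adult?, that the slides do not show. A minimal sketch of what it could look like, assuming segments carry an :age key and that 18 is the threshold (both are assumptions made for illustration). Onyx calls flow-condition predicates with the event map, the old segment, the new segment and all new segments; only the new segment is used here.

```clojure
(ns my.ns)

;; Hypothetical predicate for the :flow/predicate entry.
;; The :age key and the threshold of 18 are assumptions, not from the slides.
;; A segment without :age is treated as not an adult.
(defn adult?
  [_event _old-segment segment _all-new-segments]
  (>= (:age segment 0) 18))

;; It is a plain function, so it can be tried at the REPL without a cluster.
(adult? {} nil {:name "Ana" :age 34} [])   ;; => true
(adult? {} nil {:name "Bor" :age 12} [])   ;; => false
```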
Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Writes segments to a core.async channel"}]
Vanilla Clojure function
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))
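To make the link between the catalogue entry and the function concrete, here is a small sketch of how the values named in :onyx/params end up as leading arguments. The apply-task helper is a hand-rolled imitation written for this example, not Onyx's own code.

```clojure
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

;; The :add-5 entry from the catalogue, without the batch size.
(def catalog-entry
  {:onyx/name   :add-5
   :onyx/fn     :my/adder
   :onyx/type   :function
   :my/n        5
   :onyx/params [:my/n]})

;; Hypothetical helper: look up every key listed in :onyx/params in the
;; entry and pass the values as arguments ahead of the segment.
(defn apply-task [f entry segment]
  (let [params (map entry (:onyx/params entry))]
    (apply f (concat params [segment]))))

(apply-task adder catalog-entry {:x 1})   ;; => {:x 6}
```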
Plugins (I/O): seq, async, Kafka, Datomic, SQL, …
parameter (:my/n, passed via :onyx/params)
self-documenting (:onyx/doc)
Computation entirely described with data
data is code!
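Because the whole job is a map of plain data, ordinary Clojure functions can build and rewrite it. The sketch below assembles a small job and changes the batch size of every task. The task names, the batch sizes and the with-batch-size helper are assumptions for illustration; the top-level keys follow the job shape in the Onyx documentation.

```clojure
(def batch-size 20)

(def workflow
  [[:in :add-5]
   [:add-5 :out]])

(def catalog
  [{:onyx/name :in    :onyx/type :input    :onyx/batch-size batch-size}
   {:onyx/name :add-5 :onyx/type :function :onyx/batch-size batch-size}
   {:onyx/name :out   :onyx/type :output   :onyx/batch-size batch-size}])

;; A job is just a map holding the pieces.
(def job
  {:workflow       workflow
   :catalog        catalog
   :task-scheduler :onyx.task-scheduler/balanced})

;; Hypothetical helper: return a copy of the job with a new batch size
;; on every catalogue entry.
(defn with-batch-size [job n]
  (update job :catalog
          (fn [entries] (mapv #(assoc % :onyx/batch-size n) entries))))

(map :onyx/batch-size (:catalog (with-batch-size job 100)))   ;; => (100 100 100)
```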
Everything can be run locally!
Testing without mocking
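Since task functions are plain functions from segment to segment, a test can feed them literal maps with no cluster and no mocks. A minimal sketch with clojure.test, reusing the adder function from the catalogue slide; the test name and the sample segments are made up for the example.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(deftest adder-test
  ;; The parameter is added to :x ...
  (is (= {:x 6} (adder 5 {:x 1})))
  ;; ... and every other key in the segment is left untouched.
  (is (= {:x 6 :id 42} (adder 5 {:x 1 :id 42}))))

(run-tests)
```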
How Onyx rewired my brain
It’s not about scaling, but clean architecture
My go-to architecture
[Diagram labels: Kafka, DB, Events, Onyx ×3]
Persist all messages to S3 (time travel!)
Decomplect everything
Computation graphs
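A workflow is an edge list, so the computation graph can be inspected with a few lines of ordinary Clojure. The sketch below turns the workflow from the earlier slide into an adjacency map and lists everything downstream of a task. The successors and downstream helpers are written for this example and are not part of Onyx.

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

;; Edge list -> {task #{direct successors}}
(defn successors [workflow]
  (reduce (fn [graph [from to]]
            (update graph from (fnil conj #{}) to))
          {}
          workflow))

;; Every task reachable from `task`, found with a simple worklist traversal.
(defn downstream [workflow task]
  (let [graph (successors workflow)]
    (loop [seen #{}
           frontier (vec (graph task))]
      (if-let [t (peek frontier)]
        (if (seen t)
          (recur seen (pop frontier))
          (recur (conj seen t) (into (pop frontier) (graph t))))
        seen))))

(downstream workflow :processing-1)   ;; => #{:output-1}
```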
Building on top of spec
Queryable data descriptions
• s/registry, s/form
• Build a graph (Datomic)
Interact with your type system!
code is data!
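A short sketch of what querying data descriptions can look like with s/registry and s/form. It assumes the clojure.spec.alpha namespace; the :order/... and :event/... specs and the specs-in-ns helper are invented for the example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs, for illustration only.
(s/def :order/id pos-int?)
(s/def :order/total number?)
(s/def :event/order-placed (s/keys :req [:order/id :order/total]))

;; The registry is a plain map from spec name to spec.
(contains? (s/registry) :event/order-placed)   ;; => true

;; s/form returns a spec's definition as data.
(s/form :order/id)             ;; => clojure.core/pos-int?
(s/form :event/order-placed)   ;; => (clojure.spec.alpha/keys :req [:order/id :order/total])

;; Hypothetical query: all registered spec names in a given namespace.
(defn specs-in-ns [ns-name]
  (->> (keys (s/registry))
       (filter keyword?)
       (filter #(= ns-name (namespace %)))
       set))

(specs-in-ns "order")   ;; => #{:order/id :order/total}
```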
Case study: autogenerating materialised views
[Diagram labels: Kafka, Materialised views, Events, External data]
Automatic view generation
• Event & attribute ontology
• Manual (via spec)
• Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
[Diagram labels: Onyx ×3]
Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾ (loop back to step 1)
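The loop above can be sketched in plain Clojure. This is only an illustration under stated assumptions: the specs, the single rule ("every numeric attribute of an event gets a sum view") and the shape of the returned view descriptions are all invented here, and the sketch stops at describing the views as data rather than registering new specs or submitting Onyx jobs.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event ontology, for illustration only.
(s/def :order/id pos-int?)
(s/def :order/total number?)
(s/def :event/order-placed (s/keys :req [:order/id :order/total]))

;; Step 1: walk the registry, keeping s/keys specs in the "event" namespace.
(defn event-specs []
  (for [[k spec] (s/registry)
        :when (and (keyword? k) (= "event" (namespace k)))
        :let [form (s/form spec)]
        :when (and (seq? form) (= `s/keys (first form)))]
    [k (apply hash-map (rest form))]))

;; Step 2: a made-up rule that matches attributes specified as number?.
(defn numeric-attr? [attr]
  (= `number? (s/form attr)))

;; Steps 2.1 and 2.2: describe each new view and a stub workflow for the
;; job that would build it (task names are placeholders).
(defn view-jobs []
  (for [[event {:keys [req]}] (event-specs)
        attr req
        :when (numeric-attr? attr)]
    {:view/name      (keyword "view" (str (name event) "-" (name attr) "-sum"))
     :view/source    event
     :view/attribute attr
     :workflow       [[:read-events :aggregate]
                      [:aggregate :write-view]]}))

(map :view/name (view-jobs))   ;; => (:view/order-placed-total-sum)
```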
Code is data or data is code?
Takeaways
Onyx is production ready
Everything should be live and interactive
Computation graphs are a great way to structure data processing code
Queryable data and computation descriptions supercharge interactive development and are a great building block for automation
Questions
@sbelak
simon@goopti.com
viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html
