HOW WE TURNED
EVERYTHINGME INTO A DATA
DRIVEN COMPANY
Hello, I’m Arik.
PyData London 2015 - How We Turned EverythingMe Into a Data Driven Company
We’re geeks.
YMMV.
Requirements:
1. scalable
2. fast
3. easy to query (accessible)
Amazon
Redshift
“Petabyte scale;
massively parallel
Fully managed;
zero admin…”
8TB total size of data*
9 Nodes Cluster
Loading new data every ~5 minutes
~1500 query executions per day
One main fact table:
fact_events
Step #2
Give Everyone Access
BI TOOLS?
psql, SQL Workbench
CSV Sharing
(hackathing!)
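The trouble with CSV files passed around by email is that the rows travel without their provenance. A minimal sketch of the alternative, assuming a hypothetical `SharedResult` record and a made-up URL scheme (neither is re:dash's actual data model, and the sample rows are invented), keeps the source, the query and the timestamp attached to the result:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class SharedResult:
    """A query result that carries its own provenance (hypothetical model)."""
    data_source: str        # where the rows came from
    query_text: str         # the exact SQL that produced them
    retrieved_at: datetime  # when the query ran
    rows: tuple             # the result rows themselves

    @property
    def result_id(self) -> str:
        # Stable identifier derived from source, query and timestamp.
        key = f"{self.data_source}|{self.query_text}|{self.retrieved_at.isoformat()}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

    def url(self, base: str = "https://dashboards.example.com") -> str:
        # Hypothetical URL scheme, for illustration only.
        return f"{base}/results/{self.result_id}"


# Invented sample data; event_name is an assumed column of fact_events.
result = SharedResult(
    data_source="redshift",
    query_text="SELECT event_name, COUNT(*) FROM fact_events GROUP BY 1",
    retrieved_at=datetime(2015, 6, 20, 12, 0, tzinfo=timezone.utc),
    rows=(("app_launch", 1200), ("search", 340)),
)
print(result.url())
```

Anyone who opens the link can see which source was queried, when, and with what SQL, which are the three things an emailed CSV loses.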
re:dash supports multiple data sources:
PostgreSQL
Redshift
MySQL
BigQuery
MongoDB
InfluxDB
Graphite
Everyone at the company is using re:dash in
some capacity.
8K queries created to date, with >130 dashboards.
Over 25* companies using it &
24 contributors.
* Probably more. Not everyone reaches out to me.
Step #3
Improve
Giving everyone raw access is only
the first step, not the end of the
road.
Working with SQL all the time ->
repeating yourself.
Template Queries, Parameters
Alerts
Search & Discovery
Pandas, IPython integration
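To make the "Template Queries, Parameters" item concrete, here is a minimal sketch of a parameterized query. The `{{name}}` placeholder syntax, the `render` helper and the `country`, `event_name` and `event_date` columns are assumptions for illustration, and an in-memory SQLite database stands in for Redshift so the example runs anywhere:

```python
import re
import sqlite3

# Matches placeholders such as {{country}} or {{ since }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, params: dict) -> tuple:
    """Turn {{name}} placeholders into named bind parameters (:name)."""
    names = PLACEHOLDER.findall(template)
    missing = sorted(set(names) - set(params))
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    sql = PLACEHOLDER.sub(lambda m: ":" + m.group(1), template)
    return sql, {name: params[name] for name in names}


TEMPLATE = """
SELECT event_name, COUNT(*) AS events
FROM fact_events
WHERE country = {{country}} AND event_date >= {{since}}
GROUP BY event_name
ORDER BY events DESC
"""

# Invented sample rows in a stand-in table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_events (event_name TEXT, country TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO fact_events VALUES (?, ?, ?)",
    [
        ("app_launch", "GB", "2015-06-01"),
        ("app_launch", "GB", "2015-06-02"),
        ("search", "GB", "2015-06-02"),
        ("app_launch", "IL", "2015-06-02"),
        ("search", "GB", "2015-05-01"),
    ],
)

sql, bound = render(TEMPLATE, {"country": "GB", "since": "2015-06-01"})
rows = conn.execute(sql, bound).fetchall()
print(rows)
```

The values are passed to the database as bind parameters rather than pasted into the SQL text, so one saved query can be reused with different inputs without rewriting it.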
• Give everyone at the company access to the data
• Make it easy (accessible)
• Avoid restrictions
Thank you. @arikfr
arik@everything.me
http://redash.io/

Editor's Notes

  • #4: We’re building a contextual Android launcher that delivers “One Tap Happiness” to its users. Early on we decided that we wanted to empower all of our employees with access to all possible data related to the company and product.
  • #5: I will start with a disclaimer: we’re a bunch of geeks at EverythingMe. Even our CEO writes SQL queries. Not every solution that works for us will work in every company, but I’m sure that there is something for everyone here and the principles apply.
  • #6: We had three main tools: Splunk, Google Analytics & Flurry. Due to the limits of the 3rd party tools, we had a growing need to get our own numbers from our own data. Because we already had our events data in Splunk, we tried to use it but were limited, due to Splunk’s nature. It was great for Ops, but failed us in Analytics.
  • #7: When we decided to look for a new tool, we wanted to give everyone access to the raw data. Therefore we needed a scalable solution that would grow with our data, allow for running fast queries and, most importantly, be easy to query. Due to our previous experience with 3rd party tools we knew we wanted something we could run in-house. For various reasons, we looked mainly at Hadoop and Redshift.
  • #8: Redshift is a columnar database that exposes a PostgreSQL interface. We picked Redshift mainly due to the ease of querying it and the familiar interface (SQL). The quote is how Amazon sells Redshift. Most of it is correct. Zero admin is a fantasy, but it’s definitely much easier to administer than many comparable options. It is scalable and massively parallel, though. To get the most out of it, you really need to understand it.
  • #9: Scribe is a server for aggregating log data that's streamed in real time from clients (Scribe is a deprecated project and these days we’re replacing it with nsq). We invest in tooling to bring all our data into a single repository (Redshift). It gives all the users equal access to everything. Google Spreadsheets is used to import small dimension tables.
  • #10: We’re not dealing with real big data. Just with lots of rows.
  • #11: A table with ~200 columns. We didn't need to "design" our data model in advance; everything was a reaction to actual use. Making raw data available is key #1. Key #2 is Redshift’s ability to run queries on top of the raw data without much hiccup, something that wasn’t really possible with other tools.
  • #13: Maybe it’s our engineering culture or the fact that Redshift was new at the time, but no existing tool could give us easy access to Redshift’s *full capabilities*.
  • #14: Because the data was available, users started using their favorite SQL tool to access the data.
  • #15: Soon they started sharing results as CSV files attached to emails. This sucks: you don’t know the source, you don’t know when it was generated, and you can’t reproduce it.
  • #16: Luckily, shortly after deploying Redshift, we had one of our internal hackathons (“hackAthing”), where I decided to tackle the problem of CSV sharing.
  • #17: The result was re:dash, an open source project that we released over a year and a half ago. Written in Python. Of course. With Flask, Celery, and many other good libraries.
  • #18: Write a query.
  • #19: Visualize it.
  • #20: The key to sharing is that every data set and every query has its own URL you can share. So people can review your results and the query used to generate them.
  • #21: You can create your own fork. This is useful for further research and changes, and it also lets people less familiar with SQL or the data model create their own things by starting from an existing query.
  • #22: Collect results and visualizations into dashboards.
  • #23: More features: filters.
  • #24: Parameters.
  • #25: Python as queries! It can be used to merge data sets, to apply simple transformations that are hard to do with SQL, to bring in data from external sources via an API, or to use Pandas and such.
  • #27: Internal adoption.
  • #28: External adoption.
  • #36: The key thing is to give everyone access to the data. Your tools might vary depending on your organization type and the amount of data you have. But you need to start with making it accessible. Data analysts: the idea is not to put you out of a job, but rather to free you to do the interesting work instead of generating the same reports.
  • #37: Thank you very much. I’d really appreciate your feedback.
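
Note #25 mentions using Python to merge data sets. A minimal sketch of that idea in plain Python follows, assuming two already-fetched result sets as lists of row dicts. The `merge_on` helper, the column names and the figures are made up for illustration and are not part of re:dash:

```python
def merge_on(left, right, key):
    """Left-join two lists of row dicts on a shared key column."""
    index = {row[key]: row for row in right}
    merged = []
    for row in left:
        # Rows with no match on the right keep only their own columns.
        extra = index.get(row[key], {})
        merged.append({**extra, **row})
    return merged


# Invented rows: one aggregate from the warehouse, one small dimension table.
events_by_country = [
    {"country": "GB", "events": 1200},
    {"country": "IL", "events": 800},
    {"country": "US", "events": 300},
]
country_names = [
    {"country": "GB", "name": "United Kingdom"},
    {"country": "IL", "name": "Israel"},
]

merged = merge_on(events_by_country, country_names, "country")
for row in merged:
    print(row)
```

With Pandas available, `pandas.merge(left, right, on="country", how="left")` expresses the same left join on DataFrames.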