Building a data-driven application
Ryan Wang, Assembled (@wgyn_)
A bit about me
- Cofounder at Assembled
- Previously early software engineer at Stripe
- Anti-fraud machine learning -> tools for customer support
- Economics and statistics by training
What is Assembled?
- Capacity planning and scheduling for support teams
- Forecast contact volume
- Forecast staffing requirements
- Suggest schedule optimizations
- Reporting (operations and trends)
- Working with Grammarly’s Customer Support team since mid-2018
Some context: Engineering at Assembled
- Early-stage company, 5-person engineering team
- Primarily Golang & JS (React)
- Build out product alongside early partners
- Key features can start as custom requests
  - E.g. Google Calendar sync, real-time dashboard
What does data-driven mean?
“[Google] is a big statistical analysis engine that collects, organizes, summarizes,
and analyzes data to provide users with information...”
- Diane Lambert, Research Scientist @ Google
https://webcache.googleusercontent.com/search?q=cache:NH2qAByiqv0J:https://www.ima.umn.edu/2011-2012/IPS10.7.11+&cd=5&hl=en&ct=clnk&gl=ua
Collect & organize data -> Summarize & analyze data -> Provide users with information
Data dispersed across multiple systems
- Makes collecting and organizing difficult!
- Key information sources include:
  - Messaging (Salesforce, Zendesk, Intercom, etc.)
  - Scheduling (Google Calendar, Google Sheets, WhenIWork, etc.)
  - HR / Time Tracking
How we standardize data
[Diagram: Assembled, connected to Salesforce, Zendesk, and Intercom]
How we standardize data (cont)
- Core schema curates commonly useful metrics
- Must play nice with upstream APIs
- Has changed surprisingly little since early 2018
How we standardize data (cont)
See e.g. https://golang.org/doc/faq#inheritance
- Platform-specific adapters for standardized data pipelines
- Golang’s interfaces provide a useful alternative to classical inheritance
Building models != deploying models
- Specifically, building great models != deploying great models
- Famous example:
  - $1M Netflix Prize for best recommendation algorithm
  - Winning ensemble never put into production!
https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429
Use the right tools for the job
- In a past role when “big data” was a true constraint:
  - Distributed machine learning via Hadoop
  - Dedicated infrastructure teams
  - Custom tools (e.g. brushfire for training tree ensembles)
Alternatively optimize for ease of use
- Now out-of-the-box tools are sufficient
- Optimize for compatibility with existing infrastructure
  - Expose R as microservice queryable from Golang
  - Library: https://github.com/senseyeio/roger
  - Nice feature: reuse logic for analytics, training, and evaluation
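A rough sketch of what querying R from Go could look like. To stay runnable without a live Rserve instance, it defines its own one-method Evaluator interface and a fake implementation. The interface shape is an assumption modeled on the Eval call shown in roger's README, and forecast_volume is a hypothetical R function, not something from the talk.

```go
package main

import (
	"errors"
	"fmt"
)

// Evaluator is the single capability this sketch needs from an Rserve
// client: send an R expression, get back a decoded value.
// Assumption: modeled on the Eval method in roger's README.
type Evaluator interface {
	Eval(command string) (interface{}, error)
}

// forecastCommand builds the R call as a string. forecast_volume is a
// hypothetical R function. %q uses Go quoting, which matches R's
// double-quoted strings for simple names like these.
func forecastCommand(queue string, horizon int) string {
	return fmt.Sprintf("forecast_volume(%q, %d)", queue, horizon)
}

// Forecast asks R for horizon future volume estimates for one queue.
// Assumption: a numeric R vector is decoded as []float64.
func Forecast(r Evaluator, queue string, horizon int) ([]float64, error) {
	out, err := r.Eval(forecastCommand(queue, horizon))
	if err != nil {
		return nil, fmt.Errorf("evaluating R expression: %w", err)
	}
	values, ok := out.([]float64)
	if !ok {
		return nil, errors.New("unexpected result type from R")
	}
	if len(values) != horizon {
		return nil, fmt.Errorf("got %d values, want %d", len(values), horizon)
	}
	return values, nil
}

// fakeR stands in for a live Rserve connection so the sketch runs offline.
type fakeR struct{}

func (fakeR) Eval(command string) (interface{}, error) {
	return []float64{120, 135, 128}, nil
}

func main() {
	values, err := Forecast(fakeR{}, "email", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(values)
}
```

Because the R side owns the modeling logic in this arrangement, the same R function could serve ad hoc analytics, training, and evaluation, while the Go side only builds the call and validates the result.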
Provide users with (actionable) information
- Hard to make analytics useful; tendency is to add more information
- Product philosophy:
  - Information should be actionable
  - Figure out and answer the questions of direct interest
An example of actionable information
- Real question: How much staffing do we need at a given time?
- Calculate it directly and compare to scheduled staffing
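One way to picture the comparison, as a small sketch: given required and scheduled head counts per interval, report only the intervals that are short. The Interval type, its field names, and the sample numbers are made up for illustration.

```go
package main

import "fmt"

// Interval holds hypothetical staffing numbers for one time slot.
type Interval struct {
	Label     string
	Required  int // staffing the forecast says is needed
	Scheduled int // staffing currently on the schedule
}

// Gap is scheduled minus required; a negative value means understaffed.
func (iv Interval) Gap() int { return iv.Scheduled - iv.Required }

// Understaffed returns the intervals where scheduled staffing falls short.
func Understaffed(day []Interval) []Interval {
	var short []Interval
	for _, iv := range day {
		if iv.Gap() < 0 {
			short = append(short, iv)
		}
	}
	return short
}

func main() {
	// Illustrative numbers only.
	day := []Interval{
		{Label: "09:00", Required: 4, Scheduled: 5},
		{Label: "10:00", Required: 6, Scheduled: 5},
		{Label: "11:00", Required: 7, Scheduled: 4},
	}
	for _, iv := range Understaffed(day) {
		fmt.Printf("%s: short by %d\n", iv.Label, -iv.Gap())
	}
}
```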
How do we calculate the requirement?
https://web.archive.org/web/20110719122546/http://oldwww.com.dtu.dk/teletraffic/erlangbook/pps138-155.pdf
- For real-time channels, a model called Erlang-C
  - Original problem statement: Given x telephone lines, y average number of calls, what’s the probability of a call waiting?
  - Treat call arrivals as following a Poisson distribution
  - Reframing: Solve for x such that waiting probability is below B
- For queued channels, a proprietary simulation model
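The Erlang-C reframing above can be sketched as follows. This is a textbook version under stated assumptions, not Assembled's implementation: load is the offered traffic in erlangs (arrival rate times average handle time), the function names are invented here, and the search stops at the first agent count whose waiting probability is at or below the target.

```go
package main

import (
	"errors"
	"fmt"
)

// erlangC returns the probability that an arriving contact has to wait,
// given a number of agents and an offered load in erlangs.
// It returns 1 when agents <= load, where the queue is unstable.
func erlangC(agents int, load float64) float64 {
	if load <= 0 {
		return 0
	}
	if float64(agents) <= load {
		return 1
	}
	// Erlang B via the standard recursion, which avoids large factorials.
	b := 1.0
	for n := 1; n <= agents; n++ {
		b = load * b / (float64(n) + load*b)
	}
	// Convert blocking probability (Erlang B) to waiting probability (Erlang C).
	n := float64(agents)
	return n * b / (n - load*(1-b))
}

// minAgents finds the smallest agent count whose waiting probability is at
// or below target. The target must be positive, otherwise no finite count
// satisfies it.
func minAgents(load, target float64) (int, error) {
	if target <= 0 {
		return 0, errors.New("target must be positive")
	}
	if load <= 0 {
		return 0, nil
	}
	agents := int(load) + 1 // need agents > load for a stable queue
	for erlangC(agents, load) > target {
		agents++
	}
	return agents, nil
}

func main() {
	// Illustrative inputs: 100 contacts per hour at 3 minutes each
	// is 100 * 3 / 60 = 5 erlangs of offered load.
	load := 100.0 * 3.0 / 60.0
	agents, err := minAgents(load, 0.2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("load %.1f erlangs needs %d agents, P(wait) = %.3f\n",
		load, agents, erlangC(agents, load))
}
```

The required staffing that comes out of a calculation like this is the number that gets compared against scheduled staffing.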
How do we calculate the requirement? (cont)
https://en.wikipedia.org/wiki/Erlang_(unit)#Erlang_C_formula
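For reference, the standard Erlang C formula, written out here since the slide's own rendering did not survive extraction. N is the number of lines or agents (the x above) and A is the offered load in erlangs, i.e. arrival rate times average handling time; reading the slide's y as A is an assumption.

```latex
P_W(N, A) =
  \frac{\dfrac{A^N}{N!} \cdot \dfrac{N}{N - A}}
       {\displaystyle\sum_{i=0}^{N-1} \frac{A^i}{i!} + \dfrac{A^N}{N!} \cdot \dfrac{N}{N - A}},
  \qquad A < N
```

The staffing requirement is then the smallest N for which P_W(N, A) falls below the chosen threshold B.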
On the horizon: easier introspection
- Being too simplistic can be problematic:
  - What makes us trustworthy?
  - What if certain inputs are wrong?
- Must make system easy to introspect
- Big ongoing work: support interactive “what-if” analysis
Summary / Lessons
1. Collect & organize data: standardization is time-consuming but worthwhile
2. Summarize & analyze data: building models != deploying models
3. Provide users with information: make it actionable; must next support introspection
Thank you!
- Joint work with Brian Sze, Chris Pak, and John Wang
- Please feel free to contact:
  - ryan@assembled.com
  - @wgyn_ (Twitter)
