Enabling Data-Driven Decisions with Automated Insights
Charlotte Emms
Data Insight Analyst @ Seenit
charlotte@seenit.io
- 4 years of experience in the Data Analytics field
- Joined Seenit’s Product team in Nov ‘17
Introducing Seenit
A video collaboration platform that enables organisations to co-create authentic and engaging content with their communities of customers, superfans, or employees.
What is Seenit?
A Brief History of Seenit
• Founded in January 2014
• Over 200k uploads hit
• 4k+ monthly active users
• MVP used Couchbase deployed on GCP - still the same today!
• Seenit Capture v2.0 released in April 2018
Our Couchbase Journey
• Seenit’s MVP in 2014 was powered by Couchbase 2.2, back when querying the data required MapReduce techniques
• The introduction of N1QL transformed Seenit, allowing us to develop new features rapidly and analyse event-based audit data for product usage research
• Because N1QL is SQL-like, Seenit could make sense of this data simply by hiring a Data Analyst proficient in SQL
What is N1QL?
Non-first (N1) Normal Form Query Language (QL)
• It is based on ANSI 92 SQL
• Its query engine is optimized for modern, highly parallel multi-core execution
SQL-like Query Language
• Expressive, familiar, and feature-rich language for querying, transforming, and manipulating JSON data
N1QL extends SQL to handle data that is:
• Nested: Contains nested objects, arrays
• Heterogeneous: Schema-optional, non-uniform
• Distributed: Partitioned across a cluster
The Seenit Data Analysis Toolkit
• Extract data for analysis using N1QL queries through the Couchbase Python SDK
• Manipulate the data in Jupyter Notebooks with Python, using analytics libraries like Pandas and NumPy
• Display the results using Plotly - a personal favourite in terms of data display versatility and clarity
...this is not a scalable approach for repeatable work!
How can I share this data with the wider team?
“Let’s build a dashboard” - every data analyst joining a startup ever (probably)
Tell a different story with the data
How to Communicate Platform Usage and Make it Interesting
There was a need for something more tangible: something that can engage someone from any role in the business.
For example:
• How many uploads in a week?
• How far was our global reach this week?
• Which were the active projects this week, and who was running them?
• Have mobile users been interacting with the in-app feed?
Seenit’s Regular, Fully Automated Platform Update
Important
overused pun
The “Engaging” Part
Three core sections
○ This week in numbers
○ This week in lists (what’s new, what’s active)
○ This week in fun facts, which includes the following:
  ■ most exotic upload location (furthest from the office)
  ■ most committed contributor (uploads to projects)
  ■ biggest crowd pleaser (likes from others)
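As an illustration of the "most exotic upload location" fun fact, here is a minimal sketch of one way to find the upload furthest from the office. It uses the haversine great-circle formula, which is an assumption: the talk does not say how distance was measured. The office coordinates, the field names (`lat`, `lon`, `city`) and the sample rows are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical office coordinates (central London); not taken from the talk.
OFFICE = (51.5074, -0.1278)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def most_exotic_upload(uploads, office=OFFICE):
    """Return the upload furthest from the office, or None if none has a location."""
    located = [u for u in uploads if u.get("lat") is not None and u.get("lon") is not None]
    if not located:
        return None
    return max(located, key=lambda u: haversine_km(office, (u["lat"], u["lon"])))


# Made-up sample rows standing in for one week's query results.
uploads = [
    {"id": "u1", "city": "Paris", "lat": 48.8566, "lon": 2.3522},
    {"id": "u2", "city": "Sydney", "lat": -33.8688, "lon": 151.2093},
    {"id": "u3", "city": "unknown", "lat": None, "lon": None},
]
winner = most_exotic_upload(uploads)
print(winner["city"])
```

Uploads without coordinates are skipped rather than treated as distance zero, so a week with no located uploads simply produces no fun fact.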
The Technical Journey
Initial Idea / PoC
● Off the back of a conversation at stand-up
● Designed a process in Python through a Jupyter notebook
● Queried data in Couchbase, then manipulated and returned it in tabular form for friendly email display
● Sent using my own Gmail account
MVP
● Found and wrestled with a nicer looking email template
● Introduced fun facts section
● Moved code to a formal Python project (separate scripts, classes, functions) using an executable file to trigger the email generation
● Sent using SendGrid as we do for generic email notifications from the platform
Productionised
● Introduced more thorough HTML templating using Jinja2
● Scheduled in Jenkins to run once a week
● Runs in Kubernetes by default - also set up to run in Docker
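To make the "tabular form for friendly email display" step concrete, here is a rough sketch of rendering a week's numbers into an HTML email. It is not the project's code: the standard library's `string.Template` stands in for Jinja2 so the example has no dependencies, the message is only built and never sent (no SendGrid call), and the statistic names, subject line and addresses are invented.

```python
from email.message import EmailMessage
from html import escape
from string import Template

# string.Template is a stand-in for Jinja2 here; the markup is illustrative only.
HTML_TEMPLATE = Template("<h1>This week in numbers</h1>\n<table>\n$rows\n</table>")
ROW_TEMPLATE = Template("<tr><td>$label</td><td>$value</td></tr>")


def render_update(stats):
    """Render a {label: value} mapping as a small HTML table."""
    rows = "\n".join(
        ROW_TEMPLATE.substitute(label=escape(str(label)), value=escape(str(value)))
        for label, value in stats.items()
    )
    return HTML_TEMPLATE.substitute(rows=rows)


def build_email(stats, sender, recipient):
    """Build a message with a plain-text body and an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly platform update"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{label}: {value}" for label, value in stats.items()))
    msg.add_alternative(render_update(stats), subtype="html")
    return msg


# Made-up numbers standing in for the results of the weekly queries.
stats = {"Uploads": 1234, "Active projects": 56}
message = build_email(stats, "data@example.com", "team@example.com")
print(render_update(stats))
```

Keeping rendering separate from sending means the same HTML can be previewed in a notebook first and then handed to whichever mail service is in use.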
Next steps
Another idea
DataBot
A Slack Bot user to deliver bespoke analytics on request
• Coded in Python and uses the RTM (Real Time Messaging) Slack API
• Mention the name of the bot user to get a response
• The bot uses the first word after the direct mention to decide the correct response - no AI/ML training or NLP methods needed
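The "first word after the direct mention" rule can be sketched as a plain dictionary lookup. This is an illustrative reconstruction, not DataBot's code: it handles only the message text and leaves out the Slack connection entirely. The bot user ID, command names and reply strings are hypothetical; the `<@USERID>` mention format is how Slack encodes mentions in message text.

```python
import re

BOT_USER_ID = "U0DATABOT"  # hypothetical; a real bot learns its ID from Slack

# A direct mention is a message that starts with <@USERID>.
MENTION = re.compile(r"^\s*<@(?P<user>[A-Z0-9]+)>\s*(?P<rest>.*)$", re.DOTALL)


def uploads_reply(args):
    # Placeholder: a real handler would run a query and format the result.
    return "Uploads this week: (query result would go here)"


def help_reply(args):
    return "Try one of: " + ", ".join(sorted(HANDLERS))


# The first word after the mention selects the handler; no NLP involved.
HANDLERS = {"uploads": uploads_reply, "help": help_reply}


def respond(text):
    """Return a reply for a direct mention of the bot, or None to stay silent."""
    match = MENTION.match(text)
    if not match or match.group("user") != BOT_USER_ID:
        return None
    words = match.group("rest").split()
    if not words:
        return help_reply([])
    command, args = words[0].lower(), words[1:]
    return HANDLERS.get(command, help_reply)(args)


print(respond("<@U0DATABOT> uploads last week"))
```

Unknown commands fall back to the help text, so adding a new kind of answer only means adding one function and one dictionary entry.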
Things I learnt
- You don’t need a relational database structure to build automated data programs
- You also don’t need a BI tool to communicate data insights (at first*)
- Try new things even if they don’t go to plan
- Learn new skills from teammates
- Welcome feedback and keep iterating!
Questions
Thank you
charlotte@seenit.io

Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
