1. Big Data @ Zulily
By Echo Li, Data Engineer, eli@zulily.com
Data Services, BI and Big Data Analytics, Zulily

2. Where We Are: Powerful and Flexible
BIDS Data Platform
Customer interaction points: Webstore, Member Engagement, Event Management, Vendor Management, Supply Chain, ERP & Back Office
Systems: Site, Mobile, Orders & Payments, Content Mgmt., CRM, Relevancy (personalization), Messaging, Offers & Promotions, Item Master, Catalog & Event Workflow Mgmt., Planning, Portal, EDI / Data Exchange, Purchase Orders, Workflow & Tools, Order Mgmt., Fulfillment Mgmt., Transportation, Warehouse & Inventory Mgmt., Financial (SAP Enterprise), Business Intelligence, HRIS, Warehouse Automation
Initiatives:
• Capacity & scale
• Data-driven decision making
  – Data for everyone
• Better customer experience through personalization & targeting

3. How We Do It
…powered by Hortonworks Data Platform & Google Cloud
• Tableau (visualization & reporting) | Data Services (ZATA API)
• Google BigQuery
• Big Data Platform on Google Compute Engine: Hortonworks Data Platform 2.1 on Google Cloud (HDFS, YARN, Hive/Tez, Ambari)
• Google Cloud Storage
• Platform tools (zulily build): ZuSync (ETL), ZuScheduler (scheduling), ZuMon (data monitoring)
• Data marts: Customer Data Mart, Merch Data Mart, Supply Chain Data Mart, Clickstream/Web Analytics
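The deck names ZuScheduler but does not describe how it works. The following is a minimal sketch. It assumes each ETL job declares the jobs it depends on. Under that assumption, a scheduler can derive a safe run order with a topological sort. The job names and the `JOBS` graph are hypothetical. The ordering comes from Python's standard-library `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job graph: each key may run only after every job in its value set.
JOBS: dict[str, set[str]] = {
    "zusync_load_orders": set(),
    "zusync_load_clickstream": set(),
    "build_order_ads": {"zusync_load_orders"},
    "build_clickstream_ads": {"zusync_load_clickstream"},
    "build_funnel_aggregate": {"build_order_ads", "build_clickstream_ads"},
    "publish_to_bigquery": {"build_funnel_aggregate"},
}


def run_order(jobs: dict[str, set[str]]) -> list[str]:
    """Return an order in which every job comes after all of its dependencies.

    Raises CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    try:
        for job in run_order(JOBS):
            print(job)
    except CycleError as err:
        print(f"dependency cycle: {err.args[1]}")
```

A production scheduler would also need retries, time-based triggers and parallel execution of independent jobs. For the parallel case, `TopologicalSorter` also provides `prepare()`, `get_ready()` and `done()`.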

4. Data Processing Pipeline & Analytics
2014 zulily Proprietary and Confidential
Sources: operational systems; external APIs (Google, FB, Yahoo, Bing, etc.)
Ingestion into Hadoop processing in the cloud: real time and via ZuSync
Landing Zone (LZ) → Staging (stg) → Atomic Data Store (ADS) → Aggregated Dataset
• Tier 1 ETL workflow (ETLWF) and Tier 2 ETL workflow (ETLWF)
• ADS examples: Cust ADS, Order ADS, Clickstream
Output: BigQuery tables
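The slide shows data moving from the landing zone through staging and the atomic data store into aggregated datasets. It does not say what each tier does. The sketch below assumes two things:

• Tier 1 parses raw landed files and keeps one atomic row per business key.
• Tier 2 rolls atomic rows up into aggregates.

The order fields, the comma-separated layout and the version-based deduplication are all illustrative assumptions, not zulily's actual ETL.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderRow:
    order_id: str
    order_date: date
    amount_cents: int
    version: int  # hypothetical: a higher version is a newer change to the same order


def stage(raw_lines: list[str]) -> list[OrderRow]:
    """Landing zone -> staging: parse 'id,date,amount,version' lines, dropping malformed ones."""
    rows: list[OrderRow] = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # wrong number of fields
        order_id, day, amount, version = parts
        try:
            rows.append(OrderRow(order_id, date.fromisoformat(day), int(amount), int(version)))
        except ValueError:
            continue  # unparseable date or number
    return rows


def build_ads(staged: list[OrderRow]) -> dict[str, OrderRow]:
    """Tier 1 (assumed): keep one atomic row per order, the highest version seen."""
    ads: dict[str, OrderRow] = {}
    for row in staged:
        current = ads.get(row.order_id)
        if current is None or row.version > current.version:
            ads[row.order_id] = row
    return ads


def aggregate_daily(ads: dict[str, OrderRow]) -> dict[date, int]:
    """Tier 2 (assumed): roll atomic rows up to total order amount per day."""
    totals: dict[date, int] = defaultdict(int)
    for row in ads.values():
        totals[row.order_date] += row.amount_cents
    return dict(totals)


if __name__ == "__main__":
    raw = [
        "o1,2014-06-01,1000,1",
        "o1,2014-06-01,1200,2",  # later version of o1 replaces the first
        "o2,2014-06-01,500,1",
        "o3,2014-06-02,300,1",
        "not,a,valid,row",  # dropped in staging
    ]
    print(aggregate_daily(build_ads(stage(raw))))
```

Keeping the deduplicated atomic layer separate from the aggregates means an aggregate can be rebuilt from the ADS without reprocessing the landing zone.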

5. Our Journey…
Data Platform V1.0
Technology stack:
• SQL Server
Challenges:
• Scale; only supported structured relational data
Advantages:
• Simple
• All data in the same data store
• Makes visualization, analytics and reporting easy

Data Platform V2.0
Technology stack:
• SQL Server, Apache Hadoop
Challenges:
• No single data store
• Unable to mash up structured and unstructured data
• Difficult to scale visualization to large-scale data
Advantages:
• Ability to process unstructured data at scale
• Tableau gives us a single visualization layer on top of all data

Modern Data Platform V3.0
Technology stack:
• Hadoop, Google Cloud Platform, BigQuery
Challenges:
• A new pricing model, which is both good and bad
• Requires a new data processing methodology (especially for structured data)
Advantages:
• Supports scale and high speed
• A single data platform for structured and unstructured data
• Enables scenarios that were difficult to achieve in V1.0 or V2.0
• Enterprise Hadoop capabilities enable management, monitoring and workflow definition, which are critical
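The slide does not say which pricing model it means. With BigQuery on the stack, a likely candidate is on-demand query pricing. Under that model, a query is billed by the bytes stored in the columns it references, not by the rows it returns. A LIMIT clause therefore does not shrink the bill, while reading fewer columns does.

The sketch below does that arithmetic. The column sizes are hypothetical and the price per TiB is a placeholder, not a published rate.

```python
TIB = 2**40  # bytes in one tebibyte

# Hypothetical per-column storage sizes (bytes) for a wide clickstream table.
CLICKSTREAM_COLUMNS: dict[str, int] = {
    "event_time": TIB // 2,
    "user_id": TIB // 4,
    "page_url": 2 * TIB,
    "user_agent": 5 * TIB // 4,
}


def on_demand_cost(columns: dict[str, int], referenced: list[str], price_per_tib: float) -> float:
    """Estimate on-demand cost from the bytes in the columns a query references.

    Simplified: ignores partitioning, clustering, cached results, and
    minimum-billing and rounding rules.
    """
    scanned = sum(columns[name] for name in set(referenced))  # each column billed once
    return scanned / TIB * price_per_tib


if __name__ == "__main__":
    price = 1.0  # placeholder: substitute the current published on-demand rate
    select_star = on_demand_cost(CLICKSTREAM_COLUMNS, list(CLICKSTREAM_COLUMNS), price)
    narrow = on_demand_cost(CLICKSTREAM_COLUMNS, ["event_time", "user_id"], price)
    print(f"SELECT *: {select_star:.2f} price units; two columns: {narrow:.2f} price units")
```

With these assumed sizes, `SELECT *` reads 4 TiB and the two-column query reads 0.75 TiB. That is one reason structured data may need a different processing approach on this platform, such as narrower queries and pre-aggregated tables.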

6. Use Cases…
Use Case #1: Site & Event Funnel Analysis
Components: web servers, Google Cloud Storage, Hadoop/GCE, BigQuery (funnel analysis), ZATA (zulily data API), reporting & analysis (powered by Tableau)
Benefits:
• Increase revenue
• Improve marketing strategy and targeting
• Improve business decisions
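The slide names funnel analysis but not how funnels are computed. In this architecture the production version would presumably be a query over BigQuery tables. The plain-Python sketch below only shows the logic.

A user counts toward step k only if they performed steps 0 through k in order. The step names and the `(user_id, timestamp, action)` event shape are hypothetical.

```python
from collections import defaultdict

# Hypothetical funnel for a sales event; step names are illustrative.
FUNNEL = ["event_view", "product_view", "add_to_cart", "purchase"]


def funnel_counts(events: list[tuple[str, int, str]], steps: list[str]) -> list[int]:
    """Count users reaching each funnel step in order.

    events: (user_id, timestamp, action) tuples. Unrelated actions between
    steps are ignored; a step only counts after the previous one.
    """
    by_user: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for user_id, ts, action in events:
        by_user[user_id].append((ts, action))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        depth = 0  # number of funnel steps this user has completed so far
        for _, action in sorted(user_events):  # replay the user's events in time order
            if depth < len(steps) and action == steps[depth]:
                depth += 1
        for k in range(depth):
            reached[k] += 1
    return reached


def step_conversion(reached: list[int]) -> list[float]:
    """Conversion rate from each step to the next (0.0 when a step has no users)."""
    return [nxt / cur if cur else 0.0 for cur, nxt in zip(reached, reached[1:])]


if __name__ == "__main__":
    sample = [
        ("u1", 1, "event_view"), ("u1", 2, "product_view"),
        ("u1", 3, "add_to_cart"), ("u1", 4, "purchase"),
        ("u2", 1, "event_view"), ("u2", 2, "product_view"),
    ]
    counts = funnel_counts(sample, FUNNEL)
    print(counts, step_conversion(counts))
```

The step-to-step conversion rates show where users drop out. Those drop-off points are what the funnel report is meant to surface.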

7. Use Case #3: Supply Chain Visibility
Components: vendor data exchange (EDI, flat file), carriers and others, zulily Sync, zulily SCS, Google Cloud Storage, Hadoop/GCE, BigQuery
Data: PO, PO shipment, in-transit shipment, order visibility
Benefits:
• End-to-end order visibility
• Manage by exception
• Reduce shipping costs
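"Manage by exception" means surfacing only the orders that need attention instead of reviewing every shipment. The minimal sketch below makes two assumptions, both about hypothetical schemas:

• Purchase orders carry an expected delivery date.
• Carrier feeds report an estimated arrival per PO.

Under those assumptions, it flags POs that have no shipment data or whose latest ETA is later than expected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PurchaseOrder:
    po_id: str
    expected_delivery: date  # date the warehouse expects to receive the goods


@dataclass(frozen=True)
class ShipmentUpdate:
    po_id: str
    carrier_eta: date  # estimated arrival reported by the carrier


def po_exceptions(pos: list[PurchaseOrder], updates: list[ShipmentUpdate]) -> dict[str, str]:
    """Return {po_id: reason} for POs needing attention; on-track POs are omitted.

    `updates` is assumed to be in chronological order, so the last ETA per PO wins.
    """
    latest_eta = {u.po_id: u.carrier_eta for u in updates}
    flagged: dict[str, str] = {}
    for po in pos:
        eta = latest_eta.get(po.po_id)
        if eta is None:
            flagged[po.po_id] = "no shipment data"
        elif eta > po.expected_delivery:
            days = (eta - po.expected_delivery).days
            flagged[po.po_id] = f"late by {days} day{'s' if days != 1 else ''}"
    return flagged


if __name__ == "__main__":
    due = date(2014, 6, 10)
    orders = [PurchaseOrder("A", due), PurchaseOrder("B", due), PurchaseOrder("C", due)]
    feed = [ShipmentUpdate("A", date(2014, 6, 12)), ShipmentUpdate("B", date(2014, 6, 13)),
            ShipmentUpdate("A", date(2014, 6, 9))]  # A's newer ETA puts it back on track
    print(po_exceptions(orders, feed))
```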

8. As Our Journey Continues… We Need More Talent!
Please check out our careers page: http://www.zulily.com/careers


Editor's Notes

  • #6: Relational data is doubling every few months. Non-relational data is growing even faster. It's not about clicks; it's about impressions. It's not about who visited the site, but who did not. Data was fragmented across different stores, limiting analytics. More people means more need for faster data.