Pavel Hardak, Dir Product (Workday)
Ned Borisov (Ph.D), Sr Eng Mgr (Workday)
Lightning-Fast Analytics for
Workday Transactional Data
#ExpSAIS18
Agenda
• Workday (Pavel H)
– Introduction to Workday
– Business challenges
– Platform for Transactional Apps
• Prism Analytics (Ned B)
– High Level Architecture
– Functional Modules
– Problems encountered
• Wrap-up (Pavel H)
Workday
• Pure SaaS company (founded in 2005)
• Enterprise cloud apps – HCM and Finances
– Named as “Leader” in Gartner Magic Quadrants
• 2200+ customers, 175+ of Fortune 500
– Revenue: $2.1B, 36% YoY
• 8600+ employees worldwide
– #7 in FORTUNE “100 Best Companies to Work For”
– Pleasanton (HQ), San Mateo, San Francisco
– Boulder (CO), Dublin (Ireland), Victoria (BC), …
Continuous Innovation in Cloud
Enterprise SaaS Challenges
• Concurrency
– From small to huge companies - every ‘worker’ is a Workday user
• Reliability
– All users add and change data, generating many transactions
• Security
– Customers trust us with very confidential and private information
• Scalability
– Import several years of data from the previous system(s) and keep growing
• Speed
– Everybody wants fast response time :)
One Platform
[Diagram: Business Process Framework, Object Data Model, Reporting and Analytics, Security, Integration, Cloud, Machine Learning]
One Source for Data | One Security Model | One Experience | One Community
Object Data Model
• Metadata
• Extensible
• Durable
Reporting and Analytics
• Dashboards
• Collaboration
• Distribution
But we want more…
• Import 3rd party data from external sources
– Unknown schema, need validations and cleansing
• Blend external data with Workday data
– Self Service Data Preparation
– Publish custom report sources
– Leverage the same security paradigms
• Data Discovery and Reporting
– Visualize, slice and dice by any dimension
– Perform faster than ever before
Just add some …
• Water (?)
• Coffee (?)
• Energy drink (?)
• Apache Spark (!)
Why Apache Spark
• Wanted to standardize on ONE data processing technology which keeps evolving
• Needed extensibility to handle diverse use cases
• Scalability for on-disk views and in-memory processing
• SQL processing is a HUGE plus
High Level Prism Architecture
[Diagram labels: Prism Server (Report Queries, Web UI Requests); Data Prep: Interactive Transforms; HDFS (Workday Data, External Data, Samples)]
Data Preparation
• A dataset may import other datasets to transform them (think SQL View)
• Transforms include: Filter, Join, Union, Group By, etc.
• Example data are shown to help verify the transformation
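The "think SQL View" analogy above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not how Prism implements data preparation: the table names, columns, and rows are hypothetical, and SQLite stands in for the Spark-based Data Prep application.

```python
import sqlite3

# Hypothetical sample data standing in for a Workday dataset and an external one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workers (worker_id INTEGER, org TEXT);
    CREATE TABLE external_badges (worker_id INTEGER, swipes INTEGER);
    INSERT INTO workers VALUES (1, 'Sales'), (2, 'Sales'), (3, 'Eng');
    INSERT INTO external_badges VALUES (1, 10), (2, 5), (3, 7), (3, 1);

    -- A derived dataset: imports two datasets, then applies Join, Filter, Group By.
    CREATE VIEW swipes_by_org AS
    SELECT w.org AS org, SUM(b.swipes) AS total_swipes
    FROM workers w
    JOIN external_badges b ON b.worker_id = w.worker_id
    WHERE b.swipes > 1
    GROUP BY w.org;
""")


def preview(dataset, limit=5):
    """Return a few example rows so the transform can be checked by eye."""
    # `dataset` is a trusted, hard-coded name in this sketch (no user input).
    query = f"SELECT * FROM {dataset} ORDER BY 1 LIMIT {limit}"
    return conn.execute(query).fetchall()


sample_rows = preview("swipes_by_org")
```

Because the derived dataset is a view, nothing is materialized here; the preview simply re-runs the transforms over the small sample tables.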
High Level Prism Architecture
[Diagram labels: Prism Server (Report Queries, Web UI Requests); Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS (Workday Data, External Data, Samples, Data)]
Lens Build
• Materializes all transforms
• Stores the Lens in a columnar format, further split into small blocks
[Diagram labels: Lens, Spark Jobs]
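A minimal sketch of what "columnar format, further split into small blocks" can mean, in plain Python. The function names, the block size, and the sample rows are assumptions made for illustration; the slides do not describe the actual Lens file layout.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Block = Dict[str, List[Any]]


def to_columnar_blocks(rows: List[Row], block_size: int) -> List[Block]:
    """Split rows into blocks of at most block_size rows, stored column by column."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: List[Block] = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        column_names = list(chunk[0].keys())
        blocks.append({name: [row[name] for row in chunk] for name in column_names})
    return blocks


def sum_column(blocks: List[Block], name: str) -> float:
    """Aggregate one column while touching only that column in each block."""
    return sum(sum(block[name]) for block in blocks)


# Hypothetical rows, not taken from the talk.
rows = [{"employee": f"e{i}", "quantity": i} for i in range(1, 6)]
blocks = to_columnar_blocks(rows, block_size=2)
total = sum_column(blocks, "quantity")
```

The point of the layout is that an aggregate over one column reads only that column's values, and small blocks give a natural unit for caching or parallel work.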
High Level Prism Architecture
[Diagram labels: Prism Server (Report Queries, Web UI Requests); Query Engine: Interactive BI Queries; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS (Workday Data, External Data, Samples, Lens Data)]
Query Engine
• Analyst-driven Analysis
– Drag & drop chart creation
– Analyst-defined computed fields
– Quick measurement aggregates
• Execution
– Query Engine executes the queries
– Interactive response is required
Spark in Prism Architecture
Prism Analytics launches and manages the lifecycle of three types of Spark applications:
• Data Prep: a single (smaller) always-on Spark Application
– executes dataset transformations over small samples of data
• Lens Build: on-demand batch Application
– one per Lens Build process
– executes dataset transformations over full datasets
• Query Engine: a single (larger) always-on Application
– executes reporting queries over Lens data
– caches columns of Lenses in memory
Query Engine & Spark
[Diagram labels: Prism Server; Query Engine (Prism Spark Server, Spark Driver); Data Prep; twelve Spark Executor boxes]
Notable Observations
• Memory Allocation Strategy
• Row-Level Security
Memory Allocation Strategy
• Executors
• Driver
[Diagram: Executor JVM memory regions: Column Data Cache, Execution, Buffer (percentage labels: 30%, 60%, 10%). Driver JVM memory regions: Accumulators, Streaming, Buffer (percentage labels: 20%, 60%, 20%).]
→ 20% faster queries
Row-Level Security
• Implemented as a dimension predicate. For example:
SELECT employee, SUM(quantity)
FROM Employee_Stock_Grants
WHERE supervisory_org IN (org1, org33, org_508)
GROUP BY employee;
• In-List for supervisory_org could be very large
• More than one In-List
• Complex list values (e.g. nested conjunctions)
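The predicate above can be tried end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the `secured_query` helper, the table contents, and the added ORDER BY are hypothetical, and SQLite stands in for the Spark SQL Query Engine. It only shows how a user's allowed organizations become an In-List appended to the report query.

```python
import sqlite3
from typing import List, Sequence, Tuple


def secured_query(allowed_orgs: Sequence[str]) -> str:
    """Build the report query with the row-level security In-List as bind parameters."""
    placeholders = ", ".join("?" for _ in allowed_orgs)
    return (
        "SELECT employee, SUM(quantity) "
        "FROM Employee_Stock_Grants "
        f"WHERE supervisory_org IN ({placeholders}) "
        "GROUP BY employee ORDER BY employee"
    )


def run_secured(conn: sqlite3.Connection, allowed_orgs: Sequence[str]) -> List[Tuple]:
    """Run the query so that only rows from the allowed organizations are aggregated."""
    return conn.execute(secured_query(allowed_orgs), list(allowed_orgs)).fetchall()


# Hypothetical data, not taken from the talk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Stock_Grants (employee TEXT, supervisory_org TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_Stock_Grants VALUES (?, ?, ?)",
    [("ann", "org1", 10), ("ann", "org33", 5), ("bob", "org2", 7), ("cy", "org_508", 3)],
)

visible = run_secured(conn, ["org1", "org33", "org_508"])
```

With thousands of organizations per list, the In-List itself becomes the expensive part of the query plan, which is the situation the following slides analyze.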
Scenario Details
• Customer Use Case
– Predicates with 10+ In-Lists
– Values between 6K and 12K
– Additional mix of conjunctions and disjunctions
• The same query ran 100X slower with security than without security
Analysis
• Finding 1
– Parsing, planning and optimizing was taking ~27 seconds
– We did it 4 times
• Finding 2
– Major cause is the number of times the Catalyst expressions (In and InSet) and their arguments were being traversed and copied during plan analysis and optimization
– Minor cause is the amount of time spent in serializing Scala’s TrieSet when shipping the plan to executors
Solution
• Custom InSet-like expressions (case classes)
– Hide the large literal sets in a curried argument
– Resulted in queries going from 27 sec to 4 sec
• Further Optimizations
– Our InSet-like expression did not materialize the target in-sets until after the plan was deserialized on the executors
– Resulted in improvement from 4 sec to 2 sec
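The two ideas above can be illustrated with a rough Python analogy. This is not Catalyst's API and not the authors' code: the talk describes Scala case classes with a curried argument, whereas this sketch uses a plain class, a hypothetical `children()` method, and pickle in place of Spark's plan serialization. It shows only the shape of the technique: the large literal set is invisible to tree traversal, and the lookup set is built lazily after the node has been shipped.

```python
import pickle
from typing import Any, Dict, Iterable, Optional, Tuple


class LazyInSet:
    """Membership predicate that keeps its literals out of plan traversal."""

    def __init__(self, column: str, values: Iterable[Any]) -> None:
        self.column = column
        # Compact payload; the lookup set is built only on first use.
        self._payload: Tuple[Any, ...] = tuple(values)
        self._lookup: Optional[frozenset] = None

    def children(self) -> Tuple[()]:
        # A rewriter walking the tree sees no literal children to copy.
        return ()

    def __getstate__(self) -> Dict[str, Any]:
        # Ship only the payload, never an already materialized set.
        return {"column": self.column, "_payload": self._payload, "_lookup": None}

    def matches(self, row: Dict[str, Any]) -> bool:
        if self._lookup is None:
            # Materialize where the predicate is evaluated (the "executor" side).
            self._lookup = frozenset(self._payload)
        return row[self.column] in self._lookup


# Hypothetical usage: evaluate locally, then ship a copy and evaluate again.
pred = LazyInSet("supervisory_org", ["org1", "org33", "org_508"])
local_hit = pred.matches({"supervisory_org": "org33"})
shipped = pickle.loads(pickle.dumps(pred))
fresh_after_shipping = shipped._lookup is None
remote_miss = shipped.matches({"supervisory_org": "org2"})
```

In this analogy, `children()` plays the role of the expression fields that plan rules traverse and copy, and `__getstate__` plays the role of deferring set construction until the plan reaches the executors.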
Future Plans
• Better query latency for big datasets
• Deeper integration with reports and apps
• Integration with Kubernetes and AWS
• Improved scalability and concurrency
• Achieve ‘Zero DownTime’
…and much more I cannot share here :)
Questions?
• IF ( you are looking for …
Great work culture &&
Technology challenges &&
Lots of fun and perks )
• THEN
Come to work with us!!!
workday.com/jobs
More Info
• Building a modern data discovery and BI platform using Apache Spark and Catalyst by Kevin Beyer
• Data Preparation in Workday Prism Analytics: Solving Complex Problems the Workday Way by Jianneng Li
• Exploring Workday’s Architecture by James Pasley
