QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curve
Shoji Nishimura (NEC Corporation, Tokyo Institute of Technology)
Haruo Yokota (Tokyo Institute of Technology)
2 © NEC Corporation 2017
Background: Massive Multidimensional Data
▌Multidimensional data in daily life
Sales logs (e.g., product, region, date of sale)
Sensor data (e.g., space, time)
…
▌Retrieve part of big data for analysis
Recent data (constraint)
Data within a range (aggregation unit)
▌Multidimensional range queries are a fundamental operation for data retrieval
[Figure: data space with dimensions Date, Product, and Region]
Space-Filling-Curve-Based Multidimensional Index
Map multidimensional data onto a single dimension
Partition the data into ranges according to disk page capacity
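A minimal Python sketch of these two steps, assuming a 2-D grid of integer coordinates, the Z-curve as the space-filling curve, and a fixed number of records per page (`page_capacity`); the function names are hypothetical. Points are sorted by their curve key and the sorted sequence is cut into pages:

```python
from typing import List, Tuple

Point = Tuple[int, int]


def z_order_key(x: int, y: int, bits: int) -> int:
    """Map (x, y) to a 1-D key by bit interleaving (Z-curve).

    At each bit level, from most to least significant, the y bit is
    emitted before the x bit.
    """
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((y >> i) & 1)
        key = (key << 1) | ((x >> i) & 1)
    return key


def build_pages(points: List[Point], bits: int, page_capacity: int) -> List[List[Point]]:
    """Sort points along the curve and split them into pages of at most page_capacity points."""
    ordered = sorted(points, key=lambda p: z_order_key(p[0], p[1], bits))
    return [ordered[i:i + page_capacity] for i in range(0, len(ordered), page_capacity)]


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]  # full 4 x 4 grid
    for page in build_pages(grid, bits=2, page_capacity=4):
        print(page)
```

With a full 4 × 4 grid and four points per page, the first page holds the 2 × 2 block at the origin, because the Z-curve visits that block first.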
Multidimensional Range Query on an SFC-Based MD-Index
[Figure: the query region is interpreted as query sections (intervals on the curve), which determine the accessed pages]
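The query side can be sketched in the same way, again assuming the Z-curve on a 2-D grid. Enumerating every cell of the query rectangle is a deliberate simplification for illustration, and `page_ranges` (the inclusive key range held by each page) is a hypothetical input. Consecutive curve keys inside the region are merged into query sections, and a page is accessed if its key range overlaps any section:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) range of curve keys


def z_order_key(x: int, y: int, bits: int) -> int:
    """Z-curve key by bit interleaving (y bit before x bit at each level)."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((y >> i) & 1)
        key = (key << 1) | ((x >> i) & 1)
    return key


def query_sections(x_lo: int, x_hi: int, y_lo: int, y_hi: int, bits: int) -> List[Interval]:
    """Curve intervals covering the inclusive query rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    keys = sorted(z_order_key(x, y, bits)
                  for x in range(x_lo, x_hi + 1)
                  for y in range(y_lo, y_hi + 1))
    sections: List[Interval] = []
    start = prev = keys[0]
    for k in keys[1:]:
        if k != prev + 1:  # a jump in the key ends the current section
            sections.append((start, prev))
            start = k
        prev = k
    sections.append((start, prev))
    return sections


def accessed_pages(sections: List[Interval], page_ranges: List[Interval]) -> List[int]:
    """Indices of pages whose key range overlaps at least one query section."""
    return [i for i, (lo, hi) in enumerate(page_ranges)
            if any(s <= hi and lo <= e for s, e in sections)]
```

For example, on a 4 × 4 grid the query x ∈ [1, 2], y = 0 yields the sections [(1, 1), (4, 4)]; with four pages of four keys each, it touches pages 0 and 1.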
Challenge: Find the curve that maximizes query performance
▌Query performance: number of accessed pages
▌Mathematical analyses of the optimal curve for arbitrary query patterns
Locality preservation [Mokbel et al. CIKM’01, GeoInformatica’03]
Clustering number [Moon et al. TKDE’01] [Xu et al. PODS’12]
▌What about the optimal curve for prioritized query patterns?
Finding the optimal data partitioning that considers both query patterns and data distribution is NP-hard [Sun et al. SIGMOD’14]
Query performance depends on the curve
Contributions
▌Cohesion-based Cost Model
A metric of curve quality with respect to query patterns and data distribution
▌Curve Design Method
Heuristics to construct curves that are effective under the cohesion-based cost model
[Figure: query patterns (input) → curve design method, using the cohesion-based cost model → designed curves (output)]
Cohesion Based Cost Model
Quiz: Which curve is better for the query region?
[Figure: the query region mapped using Curve 1 and using Curve 2]
Curve 1 or Curve 2?
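Answer: It depends on the data distribution!
[Figure: the same query region under a sparse and a dense data distribution]
In one distribution, Curve 1 accesses 3 pages and Curve 2 accesses 2 pages; in the other, Curve 1 accesses 1 page and Curve 2 accesses 2 pages.
We need the best curve for any data distribution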
Proposal: Cohesion Based Cost Model
▌Total Cohesion Cost: Global × Local
Global Cohesion Cost: robust for sparse distribution
Local Cohesion Cost: robust for dense distribution
Global Cohesion Cost: Robust for sparse distribution
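Global Cohesion Cost: sum of the gap lengths
$C_{global} = \sum_i n_i$, where $n_i$ is the $i$-th gap length
Shorter total gap length → higher probability that the query sections are covered by a single page
[Figure: gap lengths $n_1, n_2$ for one curve compared with $n'_1, n'_2$ for another]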
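A minimal sketch of the global cohesion cost, assuming query sections are given as inclusive (start, end) key intervals sorted by start, and that a gap length $n_i$ counts the curve positions strictly between two consecutive sections; the function names are hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) range of curve keys


def gap_lengths(sections: List[Interval]) -> List[int]:
    """Number of curve positions between each pair of consecutive query sections."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(sections, sections[1:])]


def global_cohesion_cost(sections: List[Interval]) -> int:
    """C_global: the sum of all gap lengths."""
    return sum(gap_lengths(sections))


if __name__ == "__main__":
    sections = [(0, 1), (4, 5), (10, 12)]
    print(gap_lengths(sections))           # [2, 4]
    print(global_cohesion_cost(sections))  # 6
```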
Local Cohesion Cost: Robust for dense distribution
Local Cohesion Cost: entropy of the gap ratios
$C_{local} = -\sum_i \frac{n_i}{\sum_j n_j} \log \frac{n_i}{\sum_j n_j}$, where $n_i$ is the $i$-th gap length
Fewer gaps and higher variance of the gap lengths → lower entropy
→ Higher probability that the query sections are covered by fewer pages
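The local cohesion cost is the Shannon entropy of the normalized gap lengths. A minimal sketch, assuming the natural logarithm (the slide does not fix the base; another base only rescales the cost) and a hypothetical function name:

```python
import math
from typing import Sequence


def local_cohesion_cost(gaps: Sequence[int]) -> float:
    """C_local = -sum_i (n_i / N) log(n_i / N), where N is the total gap length."""
    total = sum(gaps)
    if total == 0:
        return 0.0  # no gaps: the query sections form one contiguous run
    ratios = [n / total for n in gaps if n > 0]  # gap ratios n_i / N
    return sum(-p * math.log(p) for p in ratios)


if __name__ == "__main__":
    print(local_cohesion_cost([5, 5]))  # two equal gaps: log 2
    print(local_cohesion_cost([1, 9]))  # skewed gaps: lower entropy
    print(local_cohesion_cost([10]))    # a single gap: zero entropy
```

Both effects named on the slide are visible here: fewer gaps and more uneven gap lengths each lower the entropy.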
Total Cohesion Cost
▌Global cohesion cost: robust for sparse distributions
▌Local cohesion cost: robust for dense distributions
▌If both cohesion costs are small, the curve is expected to be robust for any distribution
Total cohesion cost: evaluates both the global and the local cohesion cost of the query sections
$C_{total} = C_{global} \times C_{local} = \left(\sum_i n_i\right)\log\left(\sum_i n_i\right) - \sum_i n_i \log n_i$
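The closed form follows by multiplying $C_{global} = N$ (with $N = \sum_i n_i$) into the entropy: $N \cdot \left(-\sum_i \frac{n_i}{N}\log\frac{n_i}{N}\right) = -\sum_i n_i(\log n_i - \log N) = N\log N - \sum_i n_i \log n_i$. A minimal Python sketch that computes the total cost both ways, assuming natural logarithms and the convention $0 \log 0 = 0$; the names are hypothetical:

```python
import math
from typing import Sequence


def xlogx(v: float) -> float:
    """v * log(v), with the convention 0 * log 0 = 0."""
    return v * math.log(v) if v > 0 else 0.0


def total_cohesion_cost(gaps: Sequence[int]) -> float:
    """Closed form: (sum n_i) log(sum n_i) - sum n_i log n_i."""
    return xlogx(sum(gaps)) - sum(xlogx(n) for n in gaps)


def total_cohesion_cost_product(gaps: Sequence[int]) -> float:
    """Definition: C_global * C_local."""
    total = sum(gaps)
    if total == 0:
        return 0.0
    entropy = sum(-(n / total) * math.log(n / total) for n in gaps if n > 0)
    return total * entropy


if __name__ == "__main__":
    for gaps in ([2, 4], [3, 3, 3], [1, 1, 30]):
        print(gaps, total_cohesion_cost(gaps), total_cohesion_cost_product(gaps))
```

The closed form avoids computing the ratios explicitly and needs only one pass over the gap lengths.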
Curve Design Method
Goal: Design a better curve for the target query patterns
Design a curve that is robust for any distribution and maximizes the performance of the target query patterns
[Figure: query patterns (input) → curve design method → designed curves with low total cohesion cost (output)]
Preliminary Experiments: Total cohesion cost of curves
[Figure: total cohesion cost of the Z-curve and the C-curve (composite index) for a 2 × 128 query region and a 16 × 16 query region]
The best for the Z-curve: the 16 × 16 query region; the best for the C-curve: the 2 × 128 query region
What curve should be used for an intermediate query pattern?
Square-like query pattern: Z-curve
Elongated-rectangle-like query pattern: C-curve
Intermediate query pattern: a hybrid curve between the Z-curve and the C-curve
Observation: Construction of Z-curve and C-curve
Z-Curve: bit-interleaving of attributes
      00    01    10    11
00   0000  0001  0100  0101
01   0010  0011  0110  0111
10   1000  1001  1100  1101
11   1010  1011  1110  1111
C-Curve (Composite Index): bit-concatenation of attributes
      00    01    10    11
00   0000  0001  0010  0011
01   0100  0101  0110  0111
10   1000  1001  1010  1011
11   1100  1101  1110  1111
Generalize the bit-merging order, e.g., yyxxyxyx
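A minimal sketch of a generalized bit-merging key, where the merge order is a string such as `yyxxyxyx` read from the most significant bit; `bit_merge_key` is a hypothetical name. Assuming the grid columns are attribute x and the rows are attribute y, the Z-curve grid on this slide corresponds to the order `yxyx` and the C-curve grid to `yyxx`:

```python
def bit_merge_key(x: int, y: int, order: str) -> int:
    """Merge the bits of x and y into one curve key.

    `order` lists one character per key bit, from most to least significant:
    each 'x' takes the next remaining bit of x (starting at its most
    significant bit), and each 'y' takes the next remaining bit of y.
    """
    if set(order) - {"x", "y"}:
        raise ValueError("order may only contain 'x' and 'y'")
    x_left = order.count("x")  # number of x bits still to emit
    y_left = order.count("y")
    key = 0
    for c in order:
        if c == "x":
            x_left -= 1
            bit = (x >> x_left) & 1
        else:
            y_left -= 1
            bit = (y >> y_left) & 1
        key = (key << 1) | bit
    return key


if __name__ == "__main__":
    print(format(bit_merge_key(2, 0, "yxyx"), "04b"))      # Z-curve: 0100
    print(format(bit_merge_key(2, 0, "yyxx"), "04b"))      # C-curve: 0010
    print(format(bit_merge_key(5, 9, "yyxxyxyx"), "08b"))  # a hybrid order
```

Because only the order string changes, the Z-curve, the C-curve, and every hybrid between them share one implementation.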
Bit-Merging Curve Family: Generalization of Z- and C-Curves
[Figure: merge orders arranged from attribute-X preference to attribute-Y preference, spanning bit concatenation and bit interleaving; examples: xxxyyy, xyxxyy, xyxyxy, yxyxyx, yxyyxx, yyyxxx]
The curve design space grows explosively
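To make the size of this design space concrete: with $b_x$ bits for X and $b_y$ bits for Y, a bit-merging order only chooses which of the $b_x + b_y$ positions hold x bits, so the family contains $\binom{b_x + b_y}{b_x}$ curves. A small enumeration sketch (the helper name is hypothetical):

```python
from itertools import combinations
from math import comb
from typing import Iterator


def merge_orders(bx: int, by: int) -> Iterator[str]:
    """Yield every bit-merging order with bx 'x' bits and by 'y' bits."""
    n = bx + by
    for x_positions in combinations(range(n), bx):
        chosen = set(x_positions)
        yield "".join("x" if i in chosen else "y" for i in range(n))


if __name__ == "__main__":
    orders = list(merge_orders(3, 3))
    print(len(orders), comb(6, 3))  # 20 20
    print(orders[0], orders[-1])    # xxxyyy yyyxxx
    print(comb(32, 16))             # 16 bits per attribute
```

With 16 bits per attribute the count already exceeds 600 million, which is why exhaustive search over the family is impractical.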
Observation: Curve Trajectory
Square-like query pattern with the Z-curve; elongated-rectangle-like query pattern with the C-curve
In both cases, the curve trajectory is contiguous within the query region
Which hybrid curve is better for an intermediate query pattern?
[Figure: square-like, elongated-rectangle-like, and intermediate query patterns]
Brute-Force Evaluation
▌Query pattern: 2^3 in X and 2^5 in Y
▌Generate 80 curves and sort them by total cohesion cost
▌The trajectory-contiguous curves (marked in the figure) are top-ranked in total cohesion cost
[Figure: top 25 curves in total cohesion cost for the 2^3 × 2^5 query pattern]
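A brute-force ranking along these lines can be sketched as below. It is an illustration under stated assumptions, not the authors' experiment: a hypothetical 2^4 × 2^6 space (so all C(10, 4) = 210 merge orders are enumerated rather than the 80 curves on the slide), a single query of 2^3 (X) × 2^5 (Y) cells anchored at the origin, natural logarithms, and hypothetical helper names:

```python
import math
from itertools import combinations
from typing import Iterator, List, Tuple

X_BITS, Y_BITS = 4, 6    # hypothetical space: 16 (X) by 64 (Y)
QX, QY = 2 ** 3, 2 ** 5  # query pattern: 2^3 in X, 2^5 in Y


def bit_merge_key(x: int, y: int, order: str) -> int:
    """Curve key from a merge order such as 'xyxxxyyyyy' (most significant bit first)."""
    x_left, y_left, key = order.count("x"), order.count("y"), 0
    for c in order:
        if c == "x":
            x_left -= 1
            key = (key << 1) | ((x >> x_left) & 1)
        else:
            y_left -= 1
            key = (key << 1) | ((y >> y_left) & 1)
    return key


def merge_orders(bx: int, by: int) -> Iterator[str]:
    """Every order with bx 'x' bits and by 'y' bits."""
    n = bx + by
    for xs in combinations(range(n), bx):
        chosen = set(xs)
        yield "".join("x" if i in chosen else "y" for i in range(n))


def xlogx(v: float) -> float:
    return v * math.log(v) if v > 0 else 0.0


def total_cost(order: str) -> float:
    """Total cohesion cost of the query region [0, QX) x [0, QY) under `order`."""
    keys = sorted(bit_merge_key(x, y, order) for x in range(QX) for y in range(QY))
    gaps = [b - a - 1 for a, b in zip(keys, keys[1:]) if b - a > 1]
    return xlogx(sum(gaps)) - sum(xlogx(n) for n in gaps)


ranking: List[Tuple[str, float]] = sorted(
    ((o, total_cost(o)) for o in merge_orders(X_BITS, Y_BITS)), key=lambda t: t[1]
)

if __name__ == "__main__":
    for order, cost in ranking[:5]:
        print(order, round(cost, 2))
```

In this sketch, any order whose lowest eight bits hold exactly three x bits and five y bits maps the query region to one contiguous run and therefore gets zero cost, which is in line with the slide's observation that trajectory-contiguous curves rank at the top.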
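Curve Design Method for a Single Query Pattern
Design a curve whose trajectory is contiguous within the query region
Query pattern: X spans 2^1 (1 bit), Y spans 2^2 (2 bits)
X bits: x…x x; Y bits: y…y yy (the lowest 1 bit of X and the lowest 2 bits of Y cover the query pattern)
Designed curves (examples): xxy yyx, …, xyx yxy, yxx xyy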
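A minimal sketch of this single-pattern rule, under the interpretation (an assumption based on the example on this slide) that the query pattern's bits occupy the least significant positions of the merge order while the remaining bits may appear in any order above them; `design_single` and `arrangements` are hypothetical names:

```python
from itertools import combinations
from typing import List


def arrangements(n_x: int, n_y: int) -> List[str]:
    """All distinct strings with n_x 'x' characters and n_y 'y' characters."""
    n = n_x + n_y
    result = []
    for xs in combinations(range(n), n_x):
        chosen = set(xs)
        result.append("".join("x" if i in chosen else "y" for i in range(n)))
    return result


def design_single(total_x: int, total_y: int, q_x: int, q_y: int) -> List[str]:
    """Merge orders whose lowest q_x + q_y bits hold q_x x-bits and q_y y-bits.

    A query region of 2^q_x by 2^q_y cells aligned to that grid then maps to
    one contiguous run of the curve.
    """
    uppers = arrangements(total_x - q_x, total_y - q_y)
    lowers = arrangements(q_x, q_y)
    return [u + l for u in uppers for l in lowers]


if __name__ == "__main__":
    # Slide example: 3 bits per attribute, query pattern of 2^1 (X) by 2^2 (Y)
    for curve in design_single(3, 3, 1, 2):
        print(curve[:3], curve[3:])
```

For the slide's example this yields nine candidates, including the pairs xxy yyx, xyx yxy, and yxx xyy listed above.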
Curve Design Method for Multiple Query Patterns
Design a curve whose trajectory is contiguous within each query pattern
Query patterns: a smaller pattern of 2^Lx (X) × 2^Ly (Y) and a larger pattern of 2^Ux (X) × 2^Uy (Y)
Designed curve: the lowest Lx + Ly bits hold Lx x-bits and Ly y-bits, and the lowest Ux + Uy bits hold Ux x-bits and Uy y-bits (e.g., x xy yyx)
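Curve Design Method for Multiple Query Patterns
Design a curve whose trajectory is contiguous within each query pattern
Smaller pattern: X 1 bit, Y 2 bits (2^1 × 2^2); larger pattern: X 2 bits, Y 3 bits (2^2 × 2^3)
Designed curve: x xy yyx (lowest 1 + 2 bits: yyx; next bits, up to 2 + 3 in total: xy; remaining bit: x)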
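Extending the previous idea to nested query patterns: one merge order serves every pattern if, for each pattern 2^a × 2^b, its lowest a + b bits hold exactly a x-bits and b y-bits. The sketch below builds such an order from the smallest pattern upward and checks it. The function names, the choice to place y bits before x bits within each segment, and the rejection of non-nested patterns are all assumptions of this sketch; the slide's `x xy yyx` is another valid result for the example:

```python
from typing import List, Tuple

Pattern = Tuple[int, int]  # (bits spanned in X, bits spanned in Y)


def contiguous_for(order: str, q_x: int, q_y: int) -> bool:
    """True if the lowest q_x + q_y bits of `order` hold q_x x-bits and q_y y-bits."""
    low = order[len(order) - (q_x + q_y):]
    return low.count("x") == q_x and low.count("y") == q_y


def design_nested(total_x: int, total_y: int, patterns: List[Pattern]) -> str:
    """Build one merge order (most significant bit first) for nested query patterns."""
    segments: List[str] = []
    prev_x = prev_y = 0
    for q_x, q_y in sorted(patterns) + [(total_x, total_y)]:
        if q_x < prev_x or q_y < prev_y:
            raise ValueError("patterns must be nested in both dimensions")
        # this segment adds the bits between the previous pattern and this one
        segments.append("y" * (q_y - prev_y) + "x" * (q_x - prev_x))
        prev_x, prev_y = q_x, q_y
    return "".join(reversed(segments))  # smallest pattern ends up least significant


if __name__ == "__main__":
    curve = design_nested(3, 3, [(1, 2), (2, 3)])  # slide example
    print(curve)                                   # xyxyyx
    print(contiguous_for(curve, 1, 2), contiguous_for(curve, 2, 3))  # True True
```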
Evaluations
Evaluation on a DWH Application
▌Data schema: 3 dimensions (Date, Product, Store)
▌Dataset size: 100 million records
▌Data distribution: combination of Poisson, Zipf, and Standard distributions
▌Query patterns: 6 query patterns, combining Date (Day, Week, Month) × Product (Category, Department) × Store (Region)
▌Comparisons: designed curve, C-curve, Z-curve 1 (left zero padding), Z-curve 2 (right zero padding), R*-tree
Evaluation Results
▌The designed curve shows the best results for all query patterns
▌The designed curve improves performance by up to 6 times over the second-best curve
[Figure: number of page accesses (log scale, 1 to 1000) for each query pattern (Day, Week, Month × Category, Department × Region) for the proposed curve, C-curve, Z-curve 1, Z-curve 2, and R*-tree]
Evaluation on a GIS Application
▌Dataset: NYC taxi logs in 2015, 150 million records with 2% invalid records
▌Query patterns: Date (Hour, Day, Week, Month) × Longitude extent (0.0001, 0.001, 0.01, 0.1) × Latitude extent (0.0001, 0.001, 0.01, 0.1)
[Figure: map of the query area, including Manhattan and JFK Airport]
▌Comparisons: designed curve, Z-curve 1 (longitude-latitude preference), Z-curve 2 (date preference), R*-tree
Evaluation results (1/2)
▌The designed curve shows the best results for 10 query patterns
[Figure: number of page accesses (log scale, 1 to 10000) per query pattern, grouped by time granularity (hour, day, week, month) and spatial extent (0.0001 to 0.1), for the proposed curve, Z-curve 1, Z-curve 2, and R*-tree]
Evaluation results (2/2)
▌The designed curve may not give the best results for some query patterns
▌Reason: a single curve cannot cover all query patterns (future work)
[Figure: number of page accesses (log scale, 1 to 10000) per query pattern, grouped by time granularity (hour, day, week, month) and spatial extent (0.0001 to 0.1), for the proposed curve, Z-curve 1, Z-curve 2, and R*-tree]
Conclusion
Conclusion
▌Proposed a multidimensional data partitioning framework
Cohesion-based cost model to evaluate the skew-tolerance property
Bit-merging curve family to extend the curve design space
Curve design method to construct a query-aware and skew-tolerant curve
▌Demonstrated that the designed curve improves performance by up to 6 times for multiple query patterns on DWH and GIS applications
▌Future work
Evaluate other applications that have more attributes
Handle the case in which a single curve cannot cover all query patterns
Backup
Curve Notation
▌Name curves after the bit-merging order
▌Shorten curve names by run-length compression and grouping parentheses
xyxyxy → (xy)3
xyxxyy → xyx2y2
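A minimal sketch of this naming scheme, assuming that parentheses are used only when the whole order is a repetition of a shorter unit and that otherwise each run of identical letters is run-length encoded; `compress_curve_name` is a hypothetical name, and the paper's exact grouping rules may differ:

```python
def compress_curve_name(order: str) -> str:
    """Shorten a bit-merging order, e.g. 'xyxyxy' -> '(xy)3', 'xyxxyy' -> 'xyx2y2'."""
    n = len(order)
    # Grouping: the whole order is one shorter unit repeated several times.
    for period in range(1, n // 2 + 1):
        if n % period == 0 and order[:period] * (n // period) == order:
            unit, count = order[:period], n // period
            return f"({unit}){count}" if period > 1 else f"{unit}{count}"
    # Otherwise: run-length encode runs of the same letter.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and order[j] == order[i]:
            j += 1
        run = j - i
        out.append(order[i] + (str(run) if run > 1 else ""))
        i = j
    return "".join(out)


if __name__ == "__main__":
    for name in ("xyxyxy", "xyxxyy", "xxxyyy"):
        print(name, "->", compress_curve_name(name))
```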
Corner case of the proposed method
[Figure: corner case illustrated with the bit widths Lx, Ux (X) and Ly, Uy (Y) of two query patterns]
Curve Design Method
[Figure: query patterns → curve design method, which selects from the bit-merging curve family → designed curves]
