Encoding = (Data Structures) - (Data)
Rajeev Raman
University of Leicester
SPIRE 2015, King’s College London
Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions
RMQ problem
Problem Statement
Given a static array A[1..n], pre-process A to answer queries:
RMQ(l, r): return max_{l ≤ i ≤ r} A[i].
[Figure: example array A[1..12] of distinct values between 18 and 97.]
RMQ(5, 10) = 85.
Data Structuring Problems
This is a data structuring problem.
• Pre-process input data (here array A) to answer a long series of
queries.
• Want to minimize:
1. Space usage of data structure.
2. Query time.
3. Time/space for pre-processing.
• In this talk we assume the input data is static, i.e. it does not change
between queries.
Solution to RMQ Problem: Cartesian Tree
The Cartesian tree of A [Vuillemin CACM’80] is a binary tree.
[Figure: the example array A and its Cartesian tree, with the maximum value 97 at the root.]
• Place largest value at root of tree.
• Recurse on sub-arrays to left and right.
• RMQ(l, r) is found at the lowest common ancestor (LCA) of the nodes for the interval endpoints l and r.
• An n-node binary tree can support LCA queries in O(n) space and O(1) time [Harel/Tarjan SICOMP’84].
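The construction above can be sketched as follows. This is a minimal illustration, not the O(1)-time LCA structure of Harel and Tarjan: it builds the max-Cartesian tree with a stack in O(n) time and answers RMQ by walking parent pointers. The array `A` is a hypothetical ordering of the slide's values, indices are 0-based, and all function names are assumptions.

```python
def build_cartesian_tree(a):
    """Return (root, left, right, parent) for the max-Cartesian tree of a."""
    n = len(a)
    left, right, parent = [-1] * n, [-1] * n, [-1] * n
    stack = []  # indices on the rightmost path, values decreasing
    for i in range(n):
        last = -1
        while stack and a[stack[-1]] < a[i]:
            last = stack.pop()
        if last != -1:          # everything popped becomes i's left subtree
            left[i] = last
            parent[last] = i
        if stack:               # i hangs off the remaining rightmost path
            right[stack[-1]] = i
            parent[i] = stack[-1]
        stack.append(i)
    return stack[0], left, right, parent


def node_depths(root, left, right):
    """Depth of every node, computed iteratively from the root."""
    depth = [0] * len(left)
    todo = [root]
    while todo:
        v = todo.pop()
        for c in (left[v], right[v]):
            if c != -1:
                depth[c] = depth[v] + 1
                todo.append(c)
    return depth


def rmq(l, r, parent, depth):
    """Index of the maximum in a[l..r]: the LCA of nodes l and r (naive walk)."""
    u, v = l, r
    while u != v:
        if depth[u] >= depth[v]:
            u = parent[u]
        else:
            v = parent[v]
    return u


A = [47, 43, 97, 46, 67, 18, 45, 85, 34, 33, 83, 24]  # hypothetical example
root, left, right, parent = build_cartesian_tree(A)
depth = node_depths(root, left, right)
print(A[rmq(4, 9, parent, depth)])  # 0-based form of the slide's RMQ(5, 10)
```

The naive walk costs time proportional to the tree depth; it is only meant to make the RMQ-to-LCA correspondence concrete.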
Compressing RMQ
• O(n) space = O(n) words = Ω(n lg n) bits.¹
• In many applications, using O(n) words is far too much.
• A suffix tree on a string of n bits occupies O(n) words.
• The same is true for many applications of RMQ.
• Can reconstruct A by asking RMQ(i, i) queries.
• In general A can’t be compressed below Ω(n lg n) bits.
• In specific applications (e.g. LCP array), A can be compressed, but
then accessing A[i] is slow.
Can we do better?
¹ lg = log_2.
The RMQ Problem Redefined
Given a static array A[1..n], pre-process A to answer queries:
RMQ(l, r) = arg max_{l ≤ i ≤ r} A[i].
[Figure: the same example array A[1..12].]
RMQ(5, 10) = 8.
Often the value of A[i] is not needed.
Encoding RMQ
RMQ(l, r) = arg max_{l ≤ i ≤ r} A[i].
[Figure: Cartesian tree of the example array, with nodes labelled by array positions 1..12.]
• Shape of Cartesian tree is enough to answer modified RMQ queries.
• A is not necessary!
• There are ≤ 4^n distinct binary trees on n nodes.
• Shape can be encoded in ≤ lg 4^n = 2n bits.
• Concrete encoding: 11 01 00 11 11 00 10 00 11 01 00 00.
• Data structures using 2n + o(n) bits, O(1) query time.
[Fischer/Heun SICOMP’11],[Davoodi et al. COCOON’12].
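A small sketch of answering arg-max RMQ from the shape alone. It assumes the concrete encoding above lists nodes in preorder with one bit pair per node (first bit: has a left child, second bit: has a right child); the slide does not state this, but the reading is consistent with RMQ(5, 10) = 8. The function names are illustrative, and the descent takes time proportional to the tree depth rather than O(1).

```python
def decode_shape(code):
    """Rebuild child pointers from a preorder list of 2-bit pairs."""
    pairs = code.split()
    left, right = {}, {}
    next_node = 0

    def read():
        nonlocal next_node
        node = next_node
        next_node += 1
        has_left, has_right = pairs[node]
        if has_left == "1":
            left[node] = read()
        if has_right == "1":
            right[node] = read()
        return node

    root = read()
    return root, left, right


def inorder_positions(root, left, right):
    """Map each node to its 1-based in-order rank, i.e. its array position."""
    order = []

    def visit(v):
        if v in left:
            visit(left[v])
        order.append(v)
        if v in right:
            visit(right[v])

    visit(root)
    return {v: i + 1 for i, v in enumerate(order)}


def rmq_from_shape(l, r, root, left, right, position):
    """Descend from the root until a node's position falls inside [l, r]."""
    v = root
    while not (l <= position[v] <= r):
        v = left[v] if r < position[v] else right[v]
    return position[v]


code = "11 01 00 11 11 00 10 00 11 01 00 00"
root, left, right = decode_shape(code)
position = inorder_positions(root, left, right)
print(rmq_from_shape(5, 10, root, left, right, position))
```

No array values are stored anywhere in this sketch: the 24 bits of `code` are the whole input.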
Encoding Data Structures
[Figure: INPUT → PREPROC → Encoding; QUERY → Encoding → RESULT.]
• Preprocess input data to answer a long series of queries.
• Preprocessing creates an encoding and deletes input.
Encodings = (Data Structures) − (Data)
• Queries only read encoding.
• Minimize: encoding size and query time.
• Non-trivial encodings must be smaller than original input data.
Encoding: Effective Entropy
Encoding ≡ determining effective entropy.
• Extensive literature on succinct and compressed data structures.
• Entropy: “information content of data.”
• Effective Entropy is “the information content of the data structure”
[Golin et al. TCS]:
• Given a set of objects S, a set of queries Q.
• Let C be the set of equivalence classes on S induced by Q (x, y ∈ S are
equivalent if they cannot be distinguished by queries in Q).
A = 1 3 2, B = 2 3 1
Arrays A and B cannot be distinguished by RMQ queries.
• We want to store x in lg |C| bits.
• Can define expected effective entropy as well.
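A brute-force illustration of effective entropy, under the assumption that S is the set of permutations of length n and Q is the set of all arg-max RMQs. Two arrays are equivalent exactly when every query returns the same position; the helper names are hypothetical.

```python
from itertools import permutations
from math import log2


def rmq_signature(a):
    """Answers to every arg-max query (l, r), in a fixed order."""
    n = len(a)
    return tuple(
        max(range(l, r + 1), key=lambda i: a[i])
        for l in range(n)
        for r in range(l, n)
    )


def count_classes(n):
    """Number of permutations of length n distinguishable by RMQ."""
    return len({rmq_signature(p) for p in permutations(range(n))})


# The slide's arrays A and B give identical answers to every query.
print(rmq_signature([1, 3, 2]) == rmq_signature([2, 3, 1]))

for n in range(1, 7):
    classes = count_classes(n)
    print(n, classes, round(log2(classes), 2))  # lg |C| bits suffice
```

For small n the class counts should match the number of binary tree shapes on n nodes, which is why the Cartesian tree shape is the natural encoding.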
Overview of Talk
• Overview of recent encoding results.
• Asymptotically optimal encodings
  • Range Top-k [Grossi et al. ESA’13, Gawrychowski/Nicholson ICALP’15]
  • 2D Range Maximum [Brodal et al. Algor.’12][Brodal et al. ESA’13]
  • Range Majority [Navarro/Thankachan CPM’14]
  • Range Selection [Navarro et al. FSTTCS’14, GN ICALP’15]
  • Range Maximum Sum Query [Nicholson/Gawrychowski, CPM ’15]
  • 2D NLVs [Jo et al. WALCOM’15]
  • Nondirectional NLV [Nicholson/Raman, CPM ’15]
  • NLV + Range Max/Min [Jo/Satti, COCOON ’15]
• Minimal encodings
  • RMQs [Fischer/Heun, SICOMP’11][Davoodi et al. PTRS-A ’14]
  • Range Second Maximum [Davoodi et al. PTRS-A ’14]
  • Bidirectional NLVs [Fischer, TCS’11]
  • Range Min-Max [Gawrychowski/Nicholson, ICALP ’15]
  • 2D Range Maximum, m = 2 [Golin et al. TCS]
Encoding Nearest Larger Values (NLV)
Problem Definition
Given array A[1..n] of distinct values, encode A to answer
NLV(i): return j s.t. A[j] > A[i] and |j − i| is minimized.
[Figure: example array of 12 distinct values 0..11.]
NLV(6) = 3
• Can obtain NLVs in both directions from Cartesian tree.
• Unfortunately, NLVs in both directions ≡ RMQ.
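For reference, the NLV query itself can be stated as a brute-force scan. The slide does not say how ties between an equally near left and right candidate are broken, so this sketch assumes the left one wins; it uses 0-based indices, returns `None` for the maximum, and the example array is hypothetical.

```python
def nlv(a, i):
    """Nearest position j with a[j] > a[i]; ties go to the left (assumption)."""
    n = len(a)
    for d in range(1, n):
        for j in (i - d, i + d):       # left candidate is tried first
            if 0 <= j < n and a[j] > a[i]:
                return j
    return None                        # a[i] is the maximum


example = [2, 5, 1, 4, 3]              # hypothetical array of distinct values
print([nlv(example, i) for i in range(len(example))])
```

An encoding only has to reproduce these answers, not the values in `example`.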
Unidirectional NLVs
NLV(i): return j s.t. A[j] > A[i] and |j − i| is minimized.
• Can we modify the Cartesian tree?
• Eliminate zig-zags!
• How many binary trees have no zig-zags of degree-1 nodes?
Counting Zig-Zag Free Binary Trees [Iacono]
• Change the encoding of degree-1 nodes.
[Figure: degree-1 nodes re-labelled with the codes 01 and 10.]
• Any encoding is a string over A = 01, B = 10, C = 00, D = 11.
• AA does not appear in the string.
• Number of strings of length n, S(n), satisfies:
S(n) = 3S(n − 1) + 3S(n − 2),
where the first term counts strings starting with B, C or D and the second those starting with AB, AC or AD.
• Gives log S(n) ∼ n · log((3 + √21)/2) ∼ 1.93n < 2n bits.
• Adding forbidden patterns AB*A gets ∼ 1.8999n bits.
• Easy to support operations.
• Same result obtained using a succinct Patricia trie, and much
optimization [Nicholson/Raman, CPM’15].
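The recurrence can be checked numerically. The sketch below assumes the base cases S(0) = 1 (empty string) and S(1) = 4, which the slide does not state, and compares the ratio of consecutive terms with the root (3 + √21)/2 of x² = 3x + 3; the function name is illustrative.

```python
import math


def count_strings(n):
    """Length-n strings over {A, B, C, D} with no two adjacent A's."""
    prev, cur = 1, 4                   # assumed base cases S(0), S(1)
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 3 * cur + 3 * prev
    return cur


growth = (3 + math.sqrt(21)) / 2       # dominant root of x^2 = 3x + 3
bits_per_node = math.log2(growth)

print([count_strings(n) for n in range(6)])
print(count_strings(41) / count_strings(40), growth)
print(bits_per_node)                   # compare with the ∼1.93n on the slide
```

Each symbol stands for one node, so `bits_per_node` is the per-node cost of an optimal code for AA-free strings.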
What’s the exact bound?
• Upper bound ∼ 1.89n.
• Lower bound by exhaustive enumeration ∼ 1.31n.
• Number of distinguishable configurations (equivalence classes):
n                 1  2  3  4   5   6    7    8     9     10
# configurations  1  2  5  14  40  116  341  1010  3009  9012
This sequence is not in oeis.org.
• Counting up to n = 40 suggests rate of growth n^{O(1)} · 3^n, giving
∼ n lg 3 ≈ 1.58n bits. [Hoffmann, personal communication.]
Encoding Range Selection
Problem Definition
Given A[1..n] and κ, encode A to answer the query:
select(k, l, r): return the position of the k-th largest value in A[l..r], for
any k ≤ κ.
• Non-encoding results by many authors including [Brodal and
Jørgensen, ISAAC’09] [Jørgensen/Larsen, SODA’11],
[Chan/Wilkinson, SODA’13].
• O(n lg n) bits, O(lg k / lg lg n) time [CW SODA’13], optimal time
for n (lg n)^{O(1)} bits of space [JL SODA’11].
Lower Bound on Encoding Size
Proposition
Any encoding for range selection must take Ω(n lg κ) bits.
Proof: The index can encode n/κ independent permutations over κ
elements ⇒ Ω((n/κ) · κ lg κ) bits = Ω(n lg κ) bits.
For example (κ = 3).
A = 3 1 2 2 3 1 1 2 3 · · ·
Can trivially recover A from its encoding.
select(2, 4, 6) = 4 ⇒ A[4] = 2.
κ must be known at construction time.
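The lower-bound argument can be replayed in code: if A is a concatenation of n/κ permutations of 1..κ, every entry is recoverable from select queries alone. This sketch uses a naive `select` as a stand-in for any encoding, 1-based positions as on the slide, and hypothetical function names.

```python
def select(a, k, l, r):
    """Position (1-based) of the k-th largest value in a[l..r]."""
    window = sorted(range(l, r + 1), key=lambda i: a[i - 1], reverse=True)
    return window[k - 1]


def recover_blocks(query, n, kappa):
    """Rebuild an array of n/kappa permutations of 1..kappa from queries only."""
    a = [0] * n
    for start in range(1, n + 1, kappa):
        end = start + kappa - 1
        for k in range(1, kappa + 1):
            pos = query(k, start, end)
            a[pos - 1] = kappa - k + 1   # k-th largest of 1..kappa
    return a


A = [3, 1, 2, 2, 3, 1, 1, 2, 3]          # the slide's example, kappa = 3
print(select(A, 2, 4, 6))                 # position of the 2nd largest in A[4..6]
print(recover_blocks(lambda k, l, r: select(A, k, l, r), len(A), 3))
```

Since each block can be any of κ! permutations, any structure that answers these queries must distinguish (κ!)^(n/κ) inputs, which gives the Ω(n lg κ) bound.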
Encoding Range Selection [GN ’15]
Consider the 1-sided case: all queries of the form select(k, l, n). Example
assumes κ = 3.
0 9 3 4 2 5 6 8 1
8 0 4 3 3 2 1 0 0
• For each i, count # values to right that are greater.
• Cap all counts at κ:
3 0 3 3 3 2 1 0 0
• Claim: we know the sorted order among all positions with counts
< κ.
• Positions with count = κ are never the answer to a select(k, l, n) query.
• We can answer select(k, l, n) queries using these counts which
occupy n log(κ + 1) bits.
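The 1-sided scheme above can be sketched as follows. It assumes distinct values and k ≤ κ, and it rebuilds the sorted order of uncapped positions by scanning right to left: a position with count c < κ has exactly c larger uncapped positions to its right, so it is inserted at rank c. Function names are illustrative and the query is linear time, with no attempt at efficiency.

```python
def capped_counts(a, kappa):
    """For each i: number of larger values to its right, capped at kappa."""
    n = len(a)
    return [
        min(kappa, sum(1 for j in range(i + 1, n) if a[j] > a[i]))
        for i in range(n)
    ]


def select_one_sided(counts, kappa, k, l):
    """Answer select(k, l, n) with 1-based l, reading only the counts."""
    order = []  # uncapped positions >= l, sorted by decreasing value
    for i in range(len(counts) - 1, l - 2, -1):   # 0-based, right to left
        if counts[i] < kappa:
            order.insert(counts[i], i + 1)        # rank = count of larger ones
    return order[k - 1]


A = [0, 9, 3, 4, 2, 5, 6, 8, 1]        # the slide's example, kappa = 3
counts = capped_counts(A, 3)
print(counts)
print(select_one_sided(counts, 3, 2, 3))  # 2nd largest in A[3..9]
```

After `capped_counts` runs, `A` is never consulted again; the counts alone carry the answers.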
Encoding Range Selection [GN ’15]
Extend to the general 2-sided case.
• Let S_r be the 1-sided encoding for A[1..r], for r = 1, ..., n.
• S_r answers all queries of the form select(k, l, r).
• δ_{r+1} = # counts in the encoding of S_r that are incremented to get S_{r+1}.
• Example, κ = 3, δ_10 = 3:
0 9 3 4 2 5 6 8 1 7
3 0 3 3 3 3 2 0 1 0
• Knowing δ_{r+1} suffices to get S_{r+1} from S_r.
• Z = 0^{δ_1} 1 0^{δ_2} 1 ... 0^{δ_n} 1 is an encoding of all S_1, ..., S_n.
• Z has at most κn 0s and n 1s: there are ≤ binom((κ+1)n, n) distinct Z’s.
• Encoding of size lg binom((κ+1)n, n) ∼ n lg(κ + 1) + n lg e bits. This is
essentially optimal!
• Query time: O(κ^6 (log n)^{2+ε}) vs. O(log k / log log n).
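A sketch of the bit string Z and of why the δ values suffice. The decoder assumes distinct values and relies on one observation: the counts incremented by a new element are exactly the δ smallest uncapped positions, so keeping the uncapped positions sorted by value is enough to replay every S_r. Function names are hypothetical.

```python
def encode_z(a, kappa):
    """Z = 0^{delta_1} 1 0^{delta_2} 1 ..., computed from the array."""
    counts, pieces = [], []
    for x in a:
        delta = 0
        for i in range(len(counts)):
            if counts[i] < kappa and a[i] < x:   # uncapped and smaller than x
                counts[i] += 1
                delta += 1
        counts.append(0)
        pieces.append("0" * delta + "1")
    return "".join(pieces)


def decode_all(z, kappa):
    """Rebuild S_1, ..., S_n (capped count vectors) from Z alone."""
    deltas = [len(run) for run in z.split("1")[:-1]]
    counts, history = [], []
    order = []  # uncapped positions, sorted by decreasing value
    for r, delta in enumerate(deltas):
        cut = len(order) - delta
        smaller = order[cut:]          # the delta smallest uncapped positions
        order = order[:cut] + [r]      # new element ranks just above them
        for i in smaller:
            counts[i] += 1
            if counts[i] < kappa:      # still uncapped: keep it in the order
                order.append(i)
        counts.append(0)
        history.append(list(counts))
    return history


A = [0, 9, 3, 4, 2, 5, 6, 8, 1, 7]     # the slide's example, kappa = 3
Z = encode_z(A, 3)
print(Z)
print(decode_all(Z, 3)[-1])            # S_10
```

Here Z has n ones and at most κn zeros, matching the counting argument on the slide.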
Encoding Range Selection: Fast DS
• View A geometrically in 2D: A[i] = y ⇒ (i, y).
• Use idea of shallow cutting for top-k [JL SODA’11].
• Take set of n given points and decompose into O(n/κ) slabs each containing O(κ) points such that:
  • For any 2-sided query select(l, r) there is a slab such that it and two other adjacent slabs contain the top κ elements in A[l..r].
Encoding Range Selection: Fast DS
Create κ-shallow cutting. For the O(κ) points in each slab, store a range
selection DS: O(κ lg κ) bits per slab, or O(n lg κ) bits overall (asymptotically optimal).
1. Find resolving slab for given query [Grossi et al. ESA’13].
2. Use slab’s range selection data structure to answer query.
• Slab’s points are numbered 1..O(κ), input query and answer are in
1..n.
• Storing global coordinates of points in a slab takes O(κ lg n) bits per
slab or O(n lg n) bits overall.
3. Develop a representation of slabs which can space-efficiently:
3.1 in O(lg κ/ lg lg n) time, perform predecessor search for l and r among
x coordinates in a slab.
• Map query range to range among slab’s points.
3.2 in O(1) time, retrieve the i-th largest x-coordinate in the slab.
• Convert answer back to “global” coordinates.
Theorem [Navarro et al. FSTTCS’14]
There is an encoding that uses O(n lg κ) bits of space and supports range
selection in O(lg k / lg lg n) time.
2D NLV
Problem Statement
Given an n × n matrix A, preprocess to answer:
NLV(p): if p = (i, j), return q = (i′, j′) s.t. A[q] > A[p] and
|p − q|_1 = |i − i′| + |j − j′| is minimized.
[Figure: 6 × 6 example grid with rows and columns indexed 0..5.]
If elements of A are distinct, explicitly store pointers (a pointer of length i takes
O(lg i) bits), overall O(n^2) bits [Jayapaul et al. IWOCA’14].
Can’t point directly to answer when elements of A are non-distinct: this
requires Ω(n^2 lg n) bits, which is uninteresting.
Jayapaul et al. gave an O(n^2 lg lg n)-bit encoding.
Encoding 2D NLV
Theorem [Jo et al. WALCOM’15]
There is an encoding of NLVs of a 2D matrix A that uses O(n^2) bits and
answers queries in O(1) time, even when elements of A are not distinct.
• Encoding idea is simple:
• Suppose wlog that NLV(p) = q is to the right and above p. If there
is a position p′ to the right of p in p’s row but not to the right of q,
then p points to p′. Else, look for p″ above p in column. If neither
p′ nor p″ exist then point to q.
• 1D NLV problem closely related to RMQ problem.
• Encoding 2D-RMQ requires Ω(n^2 lg n) bits [Demaine et al. ICALP’09].
Minimal Encodings
1. Pre-process given data to obtain encoding E, discard input.
2. E should precisely characterize the query – # distinct Es should
equal # distinguishable data instances using the query (|C|).
3. Create succinct DS on E, using lg |C|(1 + o(1)) bits. Second
pre-processing should not access input.
[Figure: INPUT → PREPROC → Encoding → PREPROC → DS; QUERY → DS → RESULT.]
Advantages
• Optimal space.
• Only information in DS is what can be obtained from queries.
• “Minimal-knowledge” data structures: contain only information
strictly necessary to answer queries.
Minimal Encodings for RMQ
Problem Definition
Given A[1..n], preprocess to answer:
RMQ(l, r): return arg max_{l ≤ i ≤ r} A[i].
[Figure: Cartesian tree with nodes labelled by array positions 1..12.]
• Shape of Cartesian tree precisely describes all possible RMQs.
[Fischer, Heun, SICOMP’11].
• Pre-process A, output Cartesian tree, delete A.
Minimal Encodings for R2MQ
Problem Definition
Given A[1..n], encode A to answer:
R2MQ(l, r): return arg max_{i ∈ {l, ..., r} − {RMQ(l, r)}} A[i].
[Figure: extended Cartesian tree with bracketed integer labels on its nodes.]
• Need to merge inner spines of Cartesian tree.
• Precisely described by “extended Cartesian tree”.
• Space needed is asymptotically ∼ 2.76n bits [Gawrychowski and
Nicholson, ICALP’15].
Minimal Encodings for the Bidirectional NLV Problem
Problem Definition
Given A[1..n], encode A to answer:
BNLV(i): return j > i such that A[j] > A[i] and j − i is minimized,
and j′ < i such that A[j′] > A[i] and i − j′ is minimized.
[Figure: example array with repeated values.]
• When A has distinct values, this is just Cartesian trees.
• When A has equal values, described by a subclass of Schröder trees [Fischer, TCS’11].
• Number of n-node Schröder trees is ≤ (3 + 2√2)^n < 2^{2.54n}.
• Encoding using < 2.54n bits.
Minimal Encodings for Range Min-Max Queries
Problem Definition
Given A[1..n], encode A to answer:
Range-Min-Max(l, r): return both arg max_{i ∈ {l, ..., r}} A[i] and
arg min_{i ∈ {l, ..., r}} A[i].
Minimal encoding by [Gawrychowski and Nicholson, ICALP’15]:
• Precisely characterized by Baxter permutations.
• There do not exist 1 ≤ l < i < r ≤ n such that:
π(i + 1) < π(l) < π(r) < π(i) (pattern 2-41-3)
or
π(i) < π(r) < π(l) < π(i + 1) (pattern 3-14-2)
• If A is a Baxter permutation, it can be recovered using
Range-Min-Max queries.
• Number of Baxter permutations on [n] = 2^{3n} / n^{O(1)}, giving an
encoding of size 3n − O(lg n) bits.
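The two forbidden patterns can be tested directly. This brute-force sketch checks every triple (l, i, r) with l < i and i + 1 < r, using 0-based indices, and counts Baxter permutations for small n; it is exponential and only meant to make the definition concrete. Function names are illustrative.

```python
from itertools import permutations


def is_baxter(p):
    """True if p avoids the patterns 2-41-3 and 3-14-2."""
    n = len(p)
    for i in range(n - 1):
        for l in range(i):
            for r in range(i + 2, n):
                if p[i + 1] < p[l] < p[r] < p[i]:   # pattern 2-41-3
                    return False
                if p[i] < p[r] < p[l] < p[i + 1]:   # pattern 3-14-2
                    return False
    return True


def count_baxter(n):
    """Number of Baxter permutations of 1..n, by exhaustive enumeration."""
    return sum(1 for p in permutations(range(1, n + 1)) if is_baxter(p))


print(is_baxter((2, 4, 1, 3)), is_baxter((3, 1, 4, 2)))
print([count_baxter(n) for n in range(1, 7)])
```

The counts grow like 8^n up to a polynomial factor, which is where the 3n-bit figure comes from.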
Conclusions and Open Problems
Conclusions:
• Introduced the notion of encoding DS.
• Minimal encodings are combinatorially interesting and have good
privacy properties.
Wide range of open problems:
• Challenging data structuring open problems:
• Asymptotically optimal 2D RMQ encoding of [Brodal et al. ESA’13]
does not support efficient 2D RMQ queries.
• Optimal top-k encoding of [Gawrychowski and Nicholson ICALP’15]
does not support efficient queries.
• Determining minimal encodings for a number of problems.
• Pre-processing time — ideally want O(n) time preprocessing.
• Apply encoding DS to reducing the space usage of “normal” DS. [cf.
Chan and Wilkinson, SODA’13]

More Related Content

PDF
Log Analytics in Datacenter with Apache Spark and Machine Learning
PPTX
On clusteredsteinertree slide-ver 1.1
PPTX
Iwsm2014 an analogy-based approach to estimation of software development ef...
PPT
Stacks queues lists
PDF
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
PDF
Gradient Estimation Using Stochastic Computation Graphs
PDF
Parallel Optimization in Machine Learning
PDF
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
Log Analytics in Datacenter with Apache Spark and Machine Learning
On clusteredsteinertree slide-ver 1.1
Iwsm2014 an analogy-based approach to estimation of software development ef...
Stacks queues lists
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
Gradient Estimation Using Stochastic Computation Graphs
Parallel Optimization in Machine Learning
Comparative Analysis of Algorithms for Single Source Shortest Path Problem

What's hot (15)

PDF
Hyperparameter optimization with approximate gradient
PDF
Chromatic Sparse Learning
PDF
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
PPTX
Assignment of pseudo code
PDF
High-Performance Graph Analysis and Modeling
PDF
Skiena algorithm 2007 lecture16 introduction to dynamic programming
PPTX
Design and Analysis of Algorithms
PDF
Evaluating the Effectiveness of Axiomatic Approaches in Web Track
PPTX
Scalable k-means plus plus
PDF
Paper Study: Melding the data decision pipeline
PDF
Paper study: Learning to solve circuit sat
PDF
CSMR11b.ppt
PPT
Branch and bound
PDF
Paper study: Attention, learn to solve routing problems!
PDF
Linear Discriminant Analysis and Its Generalization
Hyperparameter optimization with approximate gradient
Chromatic Sparse Learning
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
Assignment of pseudo code
High-Performance Graph Analysis and Modeling
Skiena algorithm 2007 lecture16 introduction to dynamic programming
Design and Analysis of Algorithms
Evaluating the Effectiveness of Axiomatic Approaches in Web Track
Scalable k-means plus plus
Paper Study: Melding the data decision pipeline
Paper study: Learning to solve circuit sat
CSMR11b.ppt
Branch and bound
Paper study: Attention, learn to solve routing problems!
Linear Discriminant Analysis and Its Generalization
Ad

Viewers also liked (20)

PPTX
Comparing the Multimodal Interaction Technique Design of MINT with NiMMiT
PPT
Lca seminar modified
PDF
Integrating Lucene into a Transactional XML Database
PDF
Xml databases
PDF
Gesture-aware remote controls: guidelines and interaction techniques
DOCX
Algorithms and Data Structures~hmftj
PDF
XML In My Database!
PPTX
[DL Hacks輪読] Semi-Supervised Learning with Ladder Networks (NIPS2015)
PDF
Singletons in PHP - Why they are bad and how you can eliminate them from your...
PPT
XML Databases
PDF
Masterizing php data structure 102
PDF
Building and deploying PHP applications with Phing
PPTX
Semi supervised learning
PDF
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
PDF
PHP 7 – What changed internally? (PHP Barcelona 2015)
PPSX
Biomolecular interaction analysis (BIA) techniques
PDF
HTTP cookie hijacking in the wild: security and privacy implications
PDF
LCA and RMQ ~簡潔もあるよ!~
PDF
Cookies and browser exploits
PPTX
Semi-Supervised Learning
Comparing the Multimodal Interaction Technique Design of MINT with NiMMiT
Lca seminar modified
Integrating Lucene into a Transactional XML Database
Xml databases
Gesture-aware remote controls: guidelines and interaction techniques
Algorithms and Data Structures~hmftj
XML In My Database!
[DL Hacks輪読] Semi-Supervised Learning with Ladder Networks (NIPS2015)
Singletons in PHP - Why they are bad and how you can eliminate them from your...
XML Databases
Masterizing php data structure 102
Building and deploying PHP applications with Phing
Semi supervised learning
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
PHP 7 – What changed internally? (PHP Barcelona 2015)
Biomolecular interaction analysis (BIA) techniques
HTTP cookie hijacking in the wild: security and privacy implications
LCA and RMQ ~簡潔もあるよ!~
Cookies and browser exploits
Semi-Supervised Learning
Ad

Similar to Encoding survey (20)

PPT
Analysis design and analysis of algorithms ppt
PPT
algorithm and Analysis daa unit 2 aktu.ppt
PPT
Counting Sort Lowerbound
PDF
Data structure-question-bank
PPTX
Introduction to data structures using c/c++.pptx
PPTX
asymptotic analysis and insertion sort analysis
PPTX
19. algorithms and-complexity
PPT
Counting sort(Non Comparison Sort)
PDF
Linear sorting
PDF
Alg_Wks1_2.pdflklokjbhvkv jv .v.vk.hk kv h/k
PDF
DSJ_Unit I & II.pdf
PPT
Analysis.ppt
PPTX
Asymptotics 140510003721-phpapp02
PPTX
Asymptotic Notations.pptx
PPTX
Efficient anomaly detection via matrix sketching
PPT
lecture 9
PDF
Data Structure & Algorithms - Introduction
PPTX
datamining-lect8b-amachinelearninhapproach.pptx
PPTX
DS Unit-1.pptx very easy to understand..
Analysis design and analysis of algorithms ppt
algorithm and Analysis daa unit 2 aktu.ppt
Counting Sort Lowerbound
Data structure-question-bank
Introduction to data structures using c/c++.pptx
asymptotic analysis and insertion sort analysis
19. algorithms and-complexity
Counting sort(Non Comparison Sort)
Linear sorting
Alg_Wks1_2.pdflklokjbhvkv jv .v.vk.hk kv h/k
DSJ_Unit I & II.pdf
Analysis.ppt
Asymptotics 140510003721-phpapp02
Asymptotic Notations.pptx
Efficient anomaly detection via matrix sketching
lecture 9
Data Structure & Algorithms - Introduction
datamining-lect8b-amachinelearninhapproach.pptx
DS Unit-1.pptx very easy to understand..

Recently uploaded (20)

PDF
The scientific heritage No 166 (166) (2025)
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PPTX
INTRODUCTION TO EVS | Concept of sustainability
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
PPTX
BIOMOLECULES PPT........................
PPTX
neck nodes and dissection types and lymph nodes levels
PPTX
Cell Membrane: Structure, Composition & Functions
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PDF
Placing the Near-Earth Object Impact Probability in Context
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
The scientific heritage No 166 (166) (2025)
Taita Taveta Laboratory Technician Workshop Presentation.pptx
INTRODUCTION TO EVS | Concept of sustainability
Biophysics 2.pdffffffffffffffffffffffffff
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
BIOMOLECULES PPT........................
neck nodes and dissection types and lymph nodes levels
Cell Membrane: Structure, Composition & Functions
POSITIONING IN OPERATION THEATRE ROOM.ppt
Placing the Near-Earth Object Impact Probability in Context
ECG_Course_Presentation د.محمد صقران ppt
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
7. General Toxicologyfor clinical phrmacy.pptx
2. Earth - The Living Planet Module 2ELS
Derivatives of integument scales, beaks, horns,.pptx
TOTAL hIP ARTHROPLASTY Presentation.pptx
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
microscope-Lecturecjchchchchcuvuvhc.pptx

Encoding survey

  • 1. Encoding = (Data Structures) - (Data) Rajeev Raman University of Leicester SPIRE 2015, King’s College London
  • 2. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions RMQ problem Problem Statement Given a static array A[1..n], pre-process A to answer queries: RMQ(l, r) : return maxl≤i≤r A[i]. 43 97 46 85 67 18 4524 8347 33 34 RMQ(5, 10) = 85.
  • 3. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions RMQ problem Problem Statement Given a static array A[1..n], pre-process A to answer queries: RMQ(l, r) : return maxl≤i≤r A[i]. 43 97 46 85 67 18 4524 8347 33 34 RMQ(5, 10) = 85.
  • 4. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Data Structuring Problems This is a data structuring problem. • Pre-process input data (here array A) to answer long series of queries. • Want to minimize: 1. Space usage of data structure. 2. Query time. 3. Time/space for pre-processing. • In this talk we assume the input data is static i.e. it does not change between queries.
  • 5. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Solution to RMQ Problem: Cartesian Tree The Cartesian tree of A [Vuillemin CACM’80] is a binary tree. 43 97 46 33 85 67 18 4524 8347 97 47 85 43 18 45 83 24 67 33 34 34 46 • Place largest value at root of tree. • Recurse on sub-arrays to left and right. • RMQ is the lowest common ancestor (LCA) of interval endpoints. • n-node binary tree can support LCA in O(n) space and O(1) time. [Harel/Tarjan SICOMP’84]
  • 6. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Compressing RMQ • O(n) space = O(n) words = Ω(n lg n) bits1 . • Many applications where using O(n) words is way too much. • Suffix tree on a string of n bits occupies O(n) words • The same is true for many applications of RMQ. • Can reconstruct A by asking RMQ(i, i) queries. • In general A can’t be compressed below Ω(n lg n) bits. • In specific applications (e.g. LCP array), A can be compressed, but then accessing A[i] is slow. Can we do better? 1lg = log2.
  • 7. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions The RMQ Problem Redefined Given a static array A[1..n], pre-process A to answer queries: RMQ(l, r) = arg max l≤i≤r A[i] . 43 97 46 85 67 18 4524 8347 33 34 RMQ(5, 10) = 8. Often the value of A[i] is not needed.
  • 8. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions The RMQ Problem Redefined Given a static array A[1..n], pre-process A to answer queries: RMQ(l, r) = arg max l≤i≤r A[i] . 43 97 46 85 67 18 4524 8347 33 34 RMQ(5, 10) = 8. Often the value of A[i] is not needed.
  • 9. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Encoding RMQ RMQ(l, r) = arg max l≤i≤r A[i] . 3 1 8 2 10 12 11 4 9 6 7 5 • Shape of Cartesian tree is enough to answer modified RMQ queries. • A is not necessary! • There are ≤ 4n distinct binary trees on n nodes. • Shape can be encoded in ≤ lg 4n = 2n bits. • Concrete encoding: 11 01 00 11 11 00 10 00 11 01 00 00. • Data structures using 2n + o(n) bits, O(1) query time. [Fischer/Heun SICOMP’11],[Davoodi et al. COCOON’12].
  • 10. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Encoding Data Structures QUERY RESULT INPUT PREPROC Encoding • Preprocess input data to answer a long series of queries. • Preprocessing creates an encoding and deletes input. Encodings = (Data Structures) − (Data) • Queries only read encoding. • Minimize: encoding size and query time. • Non-trivial encodings must be smaller than original input data.
  • 11. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Encoding: Effective Entropy Encoding ≡ determining effective entropy. • Extensive literature on succinct and compressed data structures. • Entropy: “information content of data.” • Effective Entropy is “the information content of the data structure” [Golin et al. TCS]: • Given a set of objects S, a set of queries Q. • Let C be the equivalence class on S induced by Q (x, y ∈ S are equivalent if they cannot be distinguished by queries in Q). A B 1 3 2 2 3 1 Arrays A and B cannot be distinguished by RMQ queries. • We want to store x in lg |C| bits. • Can define expected effective entropy as well.
  • 12. Introduction Encoding Data Structures Asymptotically Optimal Encodings Minimal Encodings Conclusions Overview of Talk • Overview of recent encoding results. • Asymptotically optimal encodings • Range Top-k [Grossi et al. ESA’13, Gawrychowski/Nicholson ICALP’15] • 2D Range Maximum [Brodal et al. Algor.’12][Brodal et al. ESA’13] item Range Majority [Navarro/Thankachan CPM’14] • Range Selection [Navarro et al. FSTTCS’14, GN ICALP’15] • Range Maximum Sum Query [Nicholson/Gawrychowski, CPM ’15] • 2D NLVs [Jo et al. WALCOM’15] • Nondirectional NLV [Nicholson/Raman, CPM ’15] • NLV + Range Max/Min [Jo/Satti, COCOON ’15] • Minimal encodings • RMQs [Fischer/Heun, SICOMP’11][Davoodi et al. PTRS-A ’14] • Range Second Maximum [Davoodi et al. PTRS-A ’14] • Bidirectional NLVs [Fischer, TCS’11] • Range Min-Max [Gawrychowski/Nicholson, ICALP ’15] • 2D Range Maximum, m = 2 [Golin et al. TCS]
  • 13. Encoding Nearest Larger Values (NLV) Problem Definition Given array A[1..n] of distinct values, encode A to answer NLV(i): return j s.t. A[j] > A[i] and |j − i| is minimized. Example: 9 11 2 0 1 8 5 6 4 10 7 3, NLV(6) = 3. • Can obtain NLVs in both directions from the Cartesian tree. • Unfortunately, NLVs in both directions ≡ RMQ.
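As a reference point for the query being encoded, here is a minimal brute-force sketch of NLV on the raw array. It uses 0-based indices, and the tie-breaking rule (prefer the left neighbour when both sides are equally close) is an assumption that the slide does not specify; `nlv` is a hypothetical name.

```python
def nlv(a, i):
    """Index of the nearest value larger than a[i] (0-based), or None if a[i] is the maximum."""
    n = len(a)
    for d in range(1, n):
        # Left candidate is tried first: this tie-break is an assumption.
        for j in (i - d, i + d):
            if 0 <= j < n and a[j] > a[i]:
                return j
    return None

a = [4, 1, 3, 5, 2]
answers = [nlv(a, i) for i in range(len(a))]
```

An encoding has to reproduce exactly these answers without keeping `a`.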
  • 14. Unidirectional NLVs NLV(i): return j s.t. A[j] > A[i] and |j − i| is minimized. • Can we modify the Cartesian tree? • Eliminate zig-zags! • How many binary trees are there with no zig-zags of degree-1 nodes?
  • 15. Counting Zig-Zag Free Binary Trees [Iacono] • Change the encoding of degree-1 nodes: 01 1010 01 • Any encoding is a string over A = 01, B = 10, C = 00, D = 11. • AA does not appear in the string. • The number of such strings of length n, S(n), satisfies $S(n) = 3S(n-1) + 3S(n-2)$, where the first term accounts for the symbols B, C, D and the second for the pairs AB, AC, AD. • Gives $\log S(n) \sim n \log((3+\sqrt{21})/2) \sim 1.93n < 2n$ bits. • Adding forbidden patterns AB*A gets ∼ 1.8999n bits. • Easy to support operations. • Same result obtained using a succinct Patricia trie, and much optimization [Nicholson/Raman, CPM’15].
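The recurrence can be checked numerically. The sketch below assumes the base cases S(0) = 1 and S(1) = 4 (not stated on the slide), compares the recurrence against direct enumeration of strings over {A, B, C, D} that avoid AA, and estimates the bits per symbol; the function names are hypothetical.

```python
from itertools import product
from math import log2, sqrt

def count_by_recurrence(n):
    """S(n) = 3 S(n-1) + 3 S(n-2), with assumed base cases S(0) = 1, S(1) = 4."""
    if n == 0:
        return 1
    s_prev, s_cur = 1, 4
    for _ in range(n - 1):
        s_prev, s_cur = s_cur, 3 * s_cur + 3 * s_prev
    return s_cur

def count_by_enumeration(n):
    """Count length-n strings over A, B, C, D that do not contain AA."""
    return sum("AA" not in "".join(w) for w in product("ABCD", repeat=n))

small = [count_by_recurrence(n) for n in range(6)]
brute = [count_by_enumeration(n) for n in range(6)]
# Bits per symbol implied by the dominant root of x^2 = 3x + 3.
bits_per_symbol = log2((3 + sqrt(21)) / 2)
empirical = log2(count_by_recurrence(60) / count_by_recurrence(59))
```

The dominant root evaluates to roughly 3.79, so the per-symbol cost stays below the 1.93n figure quoted on the slide.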
  • 16. What’s the exact bound? • Upper bound ∼ 1.89n. • Lower bound by exhaustive enumeration ∼ 1.31n. • Number of distinguishable configurations (equivalence classes):
      n                 1  2  3  4   5   6    7    8     9     10
      # configurations  1  2  5  14  40  116  341  1010  3009  9012
    This sequence is not in oeis.org. • Counting up to n = 40 suggests a growth rate of $n^{O(1)} 3^n$, giving ∼ n log 3 = 1.58n bits. [Hoffmann, personal communication.]
  • 17. Encoding Range Selection Problem Definition Given A[1..n] and κ, encode A to answer the query: select(k, l, r): return the position of the k-th largest value in A[l..r], for any k ≤ κ. • Non-encoding results by many authors, including [Brodal and Jørgensen, ISAAC’09], [Jørgensen/Larsen, SODA’11], [Chan/Wilkinson, SODA’13]. • O(n log n) bits, O(lg k/ lg lg n) time [CW SODA’13]; optimal time for $n (\lg n)^{O(1)}$ bits of space [JL SODA’11].
  • 18. Lower Bound on Encoding Size Proposition: Any encoding for range selection must take Ω(n lg κ) bits. Proof: The index can encode n/κ independent permutations over κ elements ⇒ Ω((n/κ) · κ lg κ) bits = Ω(n lg κ) bits. For example (κ = 3): A = 3 1 2 | 2 3 1 | 1 2 3 · · · Can trivially recover A from its encoding: select(2, 4, 6) = 4 ⇒ A[4] = 2. • κ must be known at construction time.
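The recovery argument in the proof can be sketched directly. Below, `select` is a naive reference implementation over the raw array (1-based, matching the slide), and `recover_block` is a hypothetical helper that rebuilds one block of κ values using only select queries; it assumes each block is a permutation of 1..κ, as in the example.

```python
def select(a, k, l, r):
    """Position (1-based) of the k-th largest value in a[l..r], 1-based inclusive."""
    positions = sorted(range(l, r + 1), key=lambda i: -a[i - 1])
    return positions[k - 1]

def recover_block(query, start, kappa):
    """Rebuild a block that is a permutation of 1..kappa from select queries only."""
    block = [0] * kappa
    for k in range(1, kappa + 1):
        pos = query(k, start, start + kappa - 1)
        block[pos - start] = kappa - k + 1  # the k-th largest of 1..kappa
    return block

a = [3, 1, 2, 2, 3, 1, 1, 2, 3]
oracle = lambda k, l, r: select(a, k, l, r)  # stands in for the encoding
recovered = sum((recover_block(oracle, s, 3) for s in (1, 4, 7)), [])
```

Since every one of the (κ!)^(n/κ) such arrays is recovered exactly, the encoding must be able to distinguish all of them, which gives the Ω(n lg κ) bound.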
  • 20. Encoding Range Selection [GN ’15] Consider the 1-sided case: all queries of the form select(k, l, n). Example assumes κ = 3. A = 0 9 3 4 2 5 6 8 1, counts = 8 0 4 3 3 2 1 0 0. • For each i, count the number of values to the right that are greater.
  • 22. Encoding Range Selection [GN ’15] Consider the 1-sided case: all queries of the form select(k, l, n). Example assumes κ = 3. A = 0 9 3 4 2 5 6 8 1, capped counts = 3 0 3 3 3 2 1 0 0. • For each i, count the number of values to the right that are greater. • Cap all counts at κ. • Claim: we know the sorted order among all positions with counts < κ. • Positions with count = κ are never the answer to a select(k, l, n) query. • We can answer select(k, l, n) queries using these counts, which occupy n log(κ + 1) bits.
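One way to see why the capped counts suffice is sketched below, assuming distinct values. Scanning right to left, a position with count c < κ has exactly c larger values to its right, all of which also have counts below κ, so it can be inserted at index c of a list kept in decreasing order of value. This is an illustrative reconstruction with hypothetical function names, not necessarily the procedure used in [GN ’15].

```python
def capped_counts(a, kappa):
    """For each i, the number of larger values to its right, capped at kappa."""
    n = len(a)
    return [min(kappa, sum(a[j] > a[i] for j in range(i + 1, n))) for i in range(n)]

def sorted_uncapped(counts, kappa):
    """Positions (0-based) with count < kappa, in decreasing order of value."""
    order = []
    for i in range(len(counts) - 1, -1, -1):
        if counts[i] < kappa:
            order.insert(counts[i], i)  # exactly counts[i] larger values lie to the right
    return order

def select_suffix(counts, kappa, k, l):
    """select(k, l, n) from the counts alone; l and the result are 1-based, k <= kappa."""
    candidates = [i for i in sorted_uncapped(counts, kappa) if i >= l - 1]
    return candidates[k - 1] + 1

a = [0, 9, 3, 4, 2, 5, 6, 8, 1]
counts = capped_counts(a, 3)
second_from_3 = select_suffix(counts, 3, 2, 3)  # 2nd largest in A[3..9]
```

The query routine never looks at `a`, only at `counts`, which is the point of the encoding.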
  • 27. Encoding Range Selection [GN ’15] Extend to the general 2-sided case. • Let $S_r$ be the 1-sided encoding for A[1..r], for r = 1, ..., n. • $S_r$ answers all queries of the form select(k, l, r). • $\delta_{r+1}$ = number of counts in $S_r$ that are incremented to get $S_{r+1}$. • Example, κ = 3, $\delta_{10} = 3$: A = 0 9 3 4 2 5 6 8 1 7, $S_{10}$ = 3 0 3 3 3 3 2 0 1 0. • Knowing $\delta_{r+1}$ suffices to get $S_{r+1}$ from $S_r$. • $Z = 0^{\delta_1} 1\, 0^{\delta_2} 1 \cdots 0^{\delta_n} 1$ is an encoding of all $S_1, \ldots, S_n$. • Z has at most κn 0s and n 1s: there are at most $\binom{(\kappa+1)n}{n}$ distinct Z’s. • Encoding of size $\lg \binom{(\kappa+1)n}{n} \sim n \lg(\kappa+1) + n \lg e$ bits. This is essentially optimal! • Query time: $O(\kappa^6 (\log n)^{2+\varepsilon})$ vs. O(log k/ log log n).
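The construction of Z and the recovery of every $S_r$ from it can be sketched as follows, again assuming distinct values. The decoder relies on the observation that the counts incremented by a new element belong to the δ smallest values among positions whose count is still below κ; `encode_z` and `decode_z` are hypothetical names, and this quadratic-time sketch says nothing about the query-time bounds quoted on the slide.

```python
def encode_z(a, kappa):
    """Z = 0^{d_1} 1 0^{d_2} 1 ... 0^{d_n} 1, built by scanning a left to right."""
    counts, bits = [], []
    for x in a:
        delta = 0
        for i in range(len(counts)):
            if a[i] < x and counts[i] < kappa:  # capped counts stay at kappa
                counts[i] += 1
                delta += 1
        counts.append(0)
        bits.append("0" * delta + "1")
    return "".join(bits)

def decode_z(z, kappa):
    """Return [S_1, ..., S_n], reconstructed from Z alone."""
    snapshots, counts = [], []
    for run in z.split("1")[:-1]:       # one run of 0s per array element
        delta = len(run)
        order = []                      # uncapped positions, decreasing by value
        for i in range(len(counts) - 1, -1, -1):
            if counts[i] < kappa:
                order.insert(counts[i], i)
        for i in order[len(order) - delta:]:  # the delta smallest get incremented
            counts[i] += 1
        counts.append(0)
        snapshots.append(list(counts))
    return snapshots

a = [0, 9, 3, 4, 2, 5, 6, 8, 1, 7]
z = encode_z(a, 3)
snapshots = decode_z(z, 3)
```

Each decoded $S_r$ can then be queried like a 1-sided encoding of A[1..r].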
  • 28. Encoding Range Selection: Fast DS • View A geometrically in 2D: A[i] = y ⇒ (i, y). • Use the idea of shallow cutting for top-k [JL SODA’11]. • Take the set of n given points and decompose it into O(n/κ) slabs, each containing O(κ) points, such that: • For any 2-sided query select(k, l, r) there exists a slab such that it and two other adjacent slabs contain the top κ elements in A[l..r].
  • 29. Encoding Range Selection: Fast DS Create a κ-shallow cutting. For the O(κ) points in each slab, store a range selection DS: O(κ lg κ) bits per slab, or O(n lg κ) bits overall (asymptotically optimal). 1. Find the resolving slab for the given query [Grossi et al. ESA’13]. 2. Use the slab’s range selection data structure to answer the query. • The slab’s points are numbered 1..O(κ); the input query and the answer are in 1..n. • Storing global coordinates of points in a slab takes O(κ lg n) bits per slab, or O(n lg n) bits overall. 3. Develop a representation of slabs which can space-efficiently: 3.1 in O(lg κ/ lg lg n) time, perform predecessor search for l and r among x-coordinates in a slab. • Maps the query range to a range among the slab’s points. 3.2 in O(1) time, retrieve the i-th largest x-coordinate in the slab. • Converts the answer back to “global” coordinates. Theorem [Navarro et al. FSTTCS’14] There is an encoding that uses O(n lg κ) bits of space and supports range selection in O(lg k/ lg lg n) time.
  • 30. 2D NLV Problem Statement Given an n × n matrix A, preprocess to answer: NLV(p): if p = (i, j), return q = (i′, j′) s.t. A[q] > A[p] and $|p - q|_1 = |i - i′| + |j - j′|$ is minimized. If the elements of A are distinct, explicitly store pointers (a pointer of length i takes O(lg i) bits), overall $O(n^2)$ bits [Jaypaul et al. IWOCA’14]. Jaypaul et al. gave an $O(n^2 \lg\lg n)$-bit encoding.
  • 31. 2D NLV Problem Statement (as on the previous slide). Can’t point directly to the answer when elements of A are non-distinct: this requires $\Omega(n^2 \lg n)$ bits, which is uninteresting. Jaypaul et al. gave an $O(n^2 \lg\lg n)$-bit encoding.
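For reference, the 2D NLV query itself can be written as a brute-force scan of the matrix. This is only a specification-level sketch with 0-based coordinates; the tie-breaking rule (first candidate in row-major order) is an assumption, and `nlv_2d` is a hypothetical name.

```python
def nlv_2d(m, p):
    """Nearest larger value to m[p] under L1 distance; p = (row, col), 0-based."""
    pi, pj = p
    best = None  # (distance, position) of the best candidate so far
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v > m[pi][pj]:
                d = abs(i - pi) + abs(j - pj)
                if best is None or d < best[0]:  # strict '<': earliest candidate wins ties
                    best = (d, (i, j))
    return None if best is None else best[1]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
```

With non-distinct entries several cells may be equally near, which is why the choice of which answer to report matters for the encoding size.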
  • 32. Encoding 2D NLV Theorem [Jo et al. WALCOM’15] There is an encoding of NLVs of a 2D matrix A that uses $O(n^2)$ bits and answers queries in O(1) time, even when elements of A are not distinct. • Encoding idea is simple: • Suppose wlog that NLV(p) = q is to the right of and above p. If there is a position p′ to the right of p in p’s row but not to the right of q, then p points to p′. Else, look for p″ above p in p’s column. If neither p′ nor p″ exists, then point to q. • The 1D NLV problem is closely related to the RMQ problem. • Encoding 2D-RMQ requires $\Omega(n^2 \lg n)$ bits [Demaine et al. ICALP’09].
  • 33. Minimal Encodings 1. Pre-process the given data to obtain encoding E, discard the input. 2. E should precisely characterize the query: the number of distinct E’s should equal the number of data instances distinguishable using the query (|C|). 3. Create a succinct DS on E, using lg |C|(1 + o(1)) bits. The second pre-processing should not access the input. [Figure: diagram with labels INPUT, PREPROC, Encoding, PREPROC, DS, QUERY, RESULT.] Advantages • Optimal space. • The only information in the DS is what can be obtained from queries. • “Minimal-knowledge” data structures: contain only information strictly necessary to answer queries.
  • 34. Minimal Encodings for RMQ Problem Definition Given A[1..n], preprocess to answer: RMQ(l, r): return $\arg\max_{l \le i \le r} A[i]$. Example: A = 3 1 8 2 10 12 11 4 9 6 7 5. • The shape of the Cartesian tree precisely describes all possible RMQs [Fischer/Heun, SICOMP’11]. • Pre-process A, output the Cartesian tree, delete A.
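A minimal sketch of this pipeline: build the max-Cartesian tree with the standard stack-based construction, keep only its shape (child pointers, with nodes named by their in-order position), and answer RMQ by walking down from the root. The walk takes time proportional to the tree depth, so it illustrates correctness only, not the succinct constant-time structures cited on the slide; the function names are hypothetical.

```python
def cartesian_tree(a):
    """Max-Cartesian tree of a as (root, left, right); nodes are 0-based positions."""
    n = len(a)
    left, right, stack = [None] * n, [None] * n, []
    for i in range(n):
        last = None
        while stack and a[stack[-1]] < a[i]:
            last = stack.pop()          # smaller values become i's left subtree
        left[i] = last
        if stack:
            right[stack[-1]] = i        # i hangs to the right of the next larger value
        stack.append(i)
    return stack[0], left, right

def rmq_from_shape(root, left, right, l, r):
    """argmax over [l, r] (0-based, inclusive) using only the tree shape."""
    node = root
    while not (l <= node <= r):
        node = left[node] if node > r else right[node]
    return node

a = [3, 1, 8, 2, 10, 12, 11, 4, 9, 6, 7, 5]
root, left, right = cartesian_tree(a)
del a  # the queries below never consult the array
answer = rmq_from_shape(root, left, right, 6, 11)
```

The first node on the root-to-leaf walk that falls inside [l, r] is the lowest common ancestor of the interval endpoints, which is the position of the maximum.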
  • 35. Minimal Encodings for R2MQ Problem Definition Given A[1..n], encode A to answer: R2MQ(l, r): return $\arg\max_{i \in \{l,\ldots,r\} \setminus \{\mathrm{RMQ}(l,r)\}} A[i]$. [Figure omitted; bracketed labels: [10] [1] [6] [1] [1] [1] [3] [1] [1] [1] [1] [3].] • Need to merge inner spines of the Cartesian tree. • Precisely described by the “extended Cartesian tree”. • Space needed is asymptotically ∼ 2.76n bits [Gawrychowski and Nicholson, ICALP’15].
  • 36. Minimal Encodings for the Bidirectional NLV Problem Problem Definition Given A[1..n], encode A to answer: BNLV(i): return j > i such that A[j] > A[i] and j − i is minimized, and j′ < i such that A[j′] > A[i] and i − j′ is minimized. 3 7 2 4 4 8 54 34 4 3 • When A has distinct values, this is just Cartesian trees. • When A has equal values, described by a subclass of Schröder trees [Fischer, TCS’11]. • The number of n-node Schröder trees is $\le (3 + 2\sqrt{2})^n < 2^{2.54n}$. • Encoding using < 2.54n bits.
  • 37. Minimal Encodings for Range Min-Max Queries Problem Definition Given A[1..n], encode A to answer: Range-Min-Max(l, r): return both $\arg\max_{i \in \{l,\ldots,r\}} A[i]$ and $\arg\min_{i \in \{l,\ldots,r\}} A[i]$. Minimal encoding by [Gawrychowski and Nicholson, ICALP’15]: • Precisely characterized by Baxter permutations. • There do not exist 1 ≤ l < i < r ≤ n such that π(i + 1) < π(l) < π(r) < π(i) (pattern 2-41-3) or π(i) < π(r) < π(l) < π(i + 1) (pattern 3-14-2). • If A is a Baxter permutation, it can be recovered using Range-Min-Max queries. • The number of Baxter permutations on [n] is $2^{3n}/n^{O(1)}$, which gives a 3n − O(lg n) encoding size.
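The forbidden-pattern definition translates directly into a brute-force test, sketched below with 0-based indices and hypothetical function names. Counting the permutations that pass it for small n gives a quick sanity check on the definition.

```python
from itertools import permutations

def is_baxter(p):
    """True if p avoids the vincular patterns 2-41-3 and 3-14-2."""
    n = len(p)
    for i in range(n - 1):              # p[i], p[i+1] are the adjacent middle pair
        for l in range(i):
            for r in range(i + 2, n):
                if p[i + 1] < p[l] < p[r] < p[i]:   # pattern 2-41-3
                    return False
                if p[i] < p[r] < p[l] < p[i + 1]:   # pattern 3-14-2
                    return False
    return True

def count_baxter(n):
    """Number of Baxter permutations of 1..n, by exhaustive enumeration."""
    return sum(is_baxter(p) for p in permutations(range(1, n + 1)))

baxter_counts = [count_baxter(n) for n in range(1, 7)]
```

For n = 4 only 2413 and 3142 are excluded, so 22 of the 24 permutations remain.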
  • 38. Conclusions and Open Problems Conclusions: • Introduced the notion of encoding DS. • Minimal encodings are combinatorially interesting and have good privacy properties. Wide range of open problems: • Challenging data structuring open problems: • The asymptotically optimal 2D RMQ encoding of [Brodal et al. ESA’13] does not support efficient 2D RMQ queries. • The optimal top-k encoding of [Gawrychowski and Nicholson ICALP’15] does not support efficient queries. • Determining minimal encodings for a number of problems. • Pre-processing time: ideally want O(n) time preprocessing. • Apply encoding DS to reducing the space usage of “normal” DS. [cf. Chan and Wilkinson, SODA’13]