PAGE1
www.exensa.com
Count-Min Tree Sketch: Approximate Counting for NLP
Guillaume Pitel, Geoffroy Fouquier, Emmanuel Marchand, Abdul Mouhamadsultane
Presenter: Guillaume Pitel, June 9, 2016
A bit of context
Why do we need to count?
Data analysis platform: eXenGine.
It processes different kinds of data (mostly text).
We need to create relevant cross-features; to do that, we need to count the occurrences of all possible cross-features. For text data, a particular kind of cross-feature is known as n-grams.
There are many different measures for deciding whether an n-gram is interesting. All of them require counting the occurrences of the cross-feature and of the features themselves (i.e. counting bigrams and the words inside bigrams).
Counting exactly is easy and distributable, but very slow because of memory usage. Holding the whole count data structure in memory is impossible, so one has to resort to huge map/reduce jobs with joins.
A bit of context
What kind of data are we talking about?
Google N-grams:
  tokens: 1,024 billion
  sentences: 95 billion
  1-grams (count > 200): 14 million
  2-grams (count > 40): 314 million
  3-grams: 977 million
  4-grams: 1.3 billion
  5-grams: 1.2 billion
A bit of context
What kind of data are we talking about ?
Zipfian distribution
[Le Quan et al. 2003]
A bit of context
What kind of measures are we talking about ?
PMI (pointwise mutual information), TF-IDF, LLR (log-likelihood ratio)
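All of these measures are computed from counts. As a concrete (hypothetical) example of the kind of formula involved, PMI of a bigram can be computed from raw counts like this; the numbers below are toy values, not from the deck:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts:
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    return math.log2((count_xy / total) / ((count_x / total) * (count_y / total)))

# Hypothetical toy counts: a bigram seen 1,000 times in 1M tokens,
# with its two words seen 5,000 and 2,000 times respectively.
print(round(pmi(1000, 5000, 2000, 1_000_000), 2))  # 6.64
```

Note that only the logarithm of each count matters, which is what motivates logarithmic counters later in the deck.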
A bit of context
Summary / Goals
• Many counts: we need to store a large amount of counts.
• Logarithms in measures: we care about the order of magnitude.
• Fast and memory-controlled: we don't want distributed memory for the counts.
• Zipfian counts: many very small counts that will be filtered out later.
⇒ We can use probabilistic structures.
Count-Min Sketch
A probabilistic data structure to store counts [Cormode & Muthukrishnan 2005]
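A minimal sketch of the classic structure, for illustration only (the hash family below is an arbitrary choice, not the one from the paper):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: depth rows of width counters. Each item
    hashes to one cell per row; an update adds to all of its cells, and a
    query takes the minimum, which can overestimate but never
    underestimate the true count."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash per row (an illustrative choice, not the paper's).
        for i in range(self.depth):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield i, int.from_bytes(h[:8], "big") % self.width

    def update(self, item, count=1):
        for i, j in self._cells(item):
            self.rows[i][j] += count

    def query(self, item):
        return min(self.rows[i][j] for i, j in self._cells(item))

cms = CountMinSketch()
for word in ["the", "the", "the", "cat"]:
    cms.update(word)
print(cms.query("the"), cms.query("cat"))  # 3 1 (barring collisions)
```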
Count-Min Sketch
A probabilistic data structure to store counts
Conservative update: improve CMS by updating only the minimum values.
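The conservative-update rule can be sketched in a few lines. This standalone version works on a raw rows/width/depth layout (the hash function is again an illustrative stand-in):

```python
import hashlib

def cells(item, depth, width):
    """One cell per row for `item` (illustrative hash family)."""
    for i in range(depth):
        h = hashlib.sha1(f"{i}:{item}".encode()).digest()
        yield i, int.from_bytes(h[:8], "big") % width

def conservative_add(rows, item, count=1):
    """Raise each of the item's cells only up to (current estimate +
    count); cells already at or above that bound are left untouched.
    This reduces overestimation, at the price of losing support for
    deletions."""
    cs = list(cells(item, len(rows), len(rows[0])))
    target = min(rows[i][j] for i, j in cs) + count
    for i, j in cs:
        rows[i][j] = max(rows[i][j], target)

def query(rows, item):
    return min(rows[i][j] for i, j in cells(item, len(rows), len(rows[0])))

rows = [[0] * 1024 for _ in range(4)]
for _ in range(3):
    conservative_add(rows, "sketch")
print(query(rows, "sketch"))  # 3
```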
Count-Min Log Sketch
A probabilistic data structure to store logarithmic counts
[Pitel & Fouquier, 2015]: the same idea as [Talbot, 2009], applied inside a Count-Min Sketch.
Instead of regular 32-bit counters, we use 8- or 16-bit "Morris" counters that count logarithmically.
Since the counts end up inside logarithms anyway, the error on PMI/TF-IDF/… is almost the same, but we can fit more counters.
However, a count of 1 still uses the same amount of memory as a count of 10,000. Also, at some point the error stops improving with space (there is an inherent residual error).
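A Morris-style counter stores only a small exponent and increments it probabilistically, which is what lets 8 or 16 bits cover huge counts. A minimal sketch (base 2 here; real deployments tune the base closer to 1 for accuracy):

```python
import random

class MorrisCounter:
    """Morris-style approximate counter: stores a small exponent c and
    increments it with probability base**-c, so a tiny register can
    represent counts that grow exponentially in c."""

    def __init__(self, base=2.0):
        self.base = base
        self.c = 0  # the small stored exponent

    def update(self):
        if random.random() < self.base ** -self.c:
            self.c += 1

    def estimate(self):
        # Unbiased estimator of the true count for this update rule.
        return (self.base ** self.c - 1) / (self.base - 1)

# One counter is noisy, but the estimator is unbiased: averaging many
# counters that each saw 1,000 events recovers roughly 1,000.
random.seed(42)
counters = [MorrisCounter() for _ in range(200)]
for m in counters:
    for _ in range(1000):
        m.update()
print(sum(m.estimate() for m in counters) / 200)  # roughly 1000
```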
Count-Min Tree Sketch
A Count-Min Sketch with shared counters.
Idea: use a hierarchical storage where the most significant bits are shared between counters.
Somewhat similar to TOMB counters [Van Durme, 2009], except that overflow is managed very differently.
Tree Shared Counters
Sharing most significant bits
Or: how can we store counts with an average approaching 4 bits / counter?
8-counter structure:
o A tree is made of three kinds of storage: counting bits, barrier bits, and a spire (not required, except for performance).
o Several layers alternate counting and barrier bits.
o Here we have a <[(8,8),(4,4),(2,2),(1,1)],4> counter.
[Figure: 8-counter tree structure, annotated with barrier bits, counting bits, spire, and base layer]
Tree Shared Counters
Sharing most significant bits
Or: how can we store counts with an average approaching 4 bits / counter?
8-counter structure:
o 8 counters in 30 bits + spire.
o Without a spire, n bits can count up to 3 × 2^(1 + log2(n/4)).
o Many small shared counters with spires are more efficient than one large shared counter.
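The "8 counters in 30 bits + spire" figure can be checked directly from the layer description, as a small sanity computation (the interpretation of each pair as (counting bits, barrier bits) follows the previous slide):

```python
# Sizes implied by the <[(8,8),(4,4),(2,2),(1,1)],4> structure:
# each pair is (counting bits, barrier bits) for one layer,
# plus a 4-bit spire on top.
layers = [(8, 8), (4, 4), (2, 2), (1, 1)]
spire_bits = 4

counting_bits = sum(c for c, _ in layers)        # 15
barrier_bits = sum(b for _, b in layers)         # 15
tree_bits = counting_bits + barrier_bits         # 30 bits for 8 counters
bits_per_counter = (tree_bits + spire_bits) / 8  # 4.25 including the spire

print(tree_bits, bits_per_counter)  # 30 4.25
```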
Tree Shared Counters
Reading values
o A counter stops at the first ZERO barrier bit.
o When two barrier paths meet, there is a conflict.
o The barrier length (b) is read in unary.
o The counting bits (c) are read in the classical positional way.
[Figure: reading two counters from the shared tree; example values b=2/c=110 and b=4/c=01011001, with a conflict between counters 4 and 7]
Tree Shared Counters
Incrementing (counter 5)
[Figure: incrementing counter 5, steps 0–2 (bit states of the structure at each step)]
Tree Shared Counters
Incrementing (counter 5)
[Figure: incrementing counter 5, steps 3–5]
Tree Shared Counters
Incrementing (counter 5)
[Figure: incrementing counter 5, step 6. A bit at each successive level is worth 1, 2, 2, 4, 4, 8, …]
Count-Min Tree Sketches
Experiments
Results!
• 140M tokens from English Wikipedia*
• 14.7M terms (unigrams + bigrams)
• Reference counts stored in an UnorderedMap → 815 MiB
Perfect storage size: suppose we had a perfect hash function and stored the counts in 32-bit counters. For 14.7M terms, that amounts to 59 MiB.
Performance: our implementation of a CMTS using <[(128,128),(64,64)…],32> counters matches native UnorderedMap performance.
We use 3-layer sketches (a good performance/precision tradeoff).
* We preferred to test our counters with a large number of parameter settings rather than with a large corpus, so we limit ourselves to 5% of Wikipedia.
Count-Min Tree Sketches
Average Relative Error [chart]
Count-Min Tree Sketches
RMSE [chart]
Count-Min Tree Sketches
RMSE on PMI [chart]
Count-Min Tree Sketch
Question: are CMTS really useful in real life?
1 – CMTS are better on the whole vocabulary, but what happens if we skip the least frequent words / bigrams?
2 – CMTS are better on average, but what happens quantile by quantile?
Count-Min Tree Sketches
PMI Error per quantile [chart]
(sketches at 50% of perfect size; evaluation limited to f > 10⁻⁷)
Count-Min Tree Sketches
Relative Error per log2-quantile [chart]
(sketches at 50% of perfect size; evaluation limited to f > 10⁻⁷)
Conclusion
Where are we?
CMTS significantly outperforms the other methods for storing and updating Zipfian counts.
Because most of the time spent in sketch accesses is due to memory access, its performance is on par with the other methods.
• Main drawback: at very high (and impractical anyway) pressures (less than 10% of the perfect storage size), the error skyrockets.
• Other drawback: the implementation is not straightforward. We have devised at least 4 different ways to increment the counters.
Merging (and thus distributing) is easy once you can read and set a counter.
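For a plain Count-Min Sketch, merging is just cell-wise addition of two sketches built with the same parameters; the slide's point is that the same principle lifts to CMTS once counters can be read and set. A minimal sketch of the CMS case (the rows-of-lists layout is an illustrative choice):

```python
def merge(rows_a, rows_b):
    """Merge two Count-Min Sketches that share width, depth and hash
    functions: plain cell-wise addition. For a CMTS, the same principle
    applies one level up: read each logical counter in both trees and
    set the sum in the merged tree."""
    assert len(rows_a) == len(rows_b) and len(rows_a[0]) == len(rows_b[0])
    return [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(rows_a, rows_b)]

print(merge([[1, 2, 0]], [[0, 3, 4]]))  # [[1, 5, 4]]
```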
Conclusion
Where are we going?
Dynamic: we are working on a CMTS version that can grow automatically (more layers added below).
Pressure control: when we detect that pressure becomes too high, we can divide and subsample to keep collisions from cascading.
An open-source Python package is on its way.