Extracting biclusters of similar values with Triadic Concept Analysis
M. Kaytoue, S. O. Kuznetsov,
J. Macko, W. Meira Jr. and A. Napoli
Bordeaux, January 31 - February 3, 2012
Extraction et Gestion des Connaissances - EGC 2012
Context
Knowledge Discovery in Databases
Biclustering numerical data
Numerical data and bicluster
Given a numerical dataset (G, M, W, I)
–object/attribute data-table–
G a set of objects (rows)
M a set of attributes (columns)
W a set of values
I ⊆ G × M × W a relation s.t. (g, m, w) ∈ I, written m(g) = w,
means that object g takes the value w for attribute m
–simply represents data-cells–
a bicluster is a pair (A, B) with A ⊆ G and B ⊆ M.
–a rectangle in the data-table–
Example
Given a dataset (G, M, W, I) with
G = {g1, g2, g3, g4}
M = {m1, m2, m3, m4, m5}
W = {0, 1, 2, 6, 7, 8, 9}
and e.g. m2(g4) = 9
the bicluster ({g2, g3, g4}, {m3, m4}) can be viewed as a rectangle in the table
(rows g2 to g4, columns m3 and m4; shaded gray on the original slide)
     m1  m2  m3  m4  m5
g1    1   2   2   1   6
g2    2   1   1   0   6
g3    2   2   1   7   6
g4    8   9   2   6   7
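As a minimal sketch, the example dataset and a bicluster can be represented as follows. The names `objects`, `attributes`, `rows`, `data` and `bicluster_values` are illustrative assumptions, not part of the original slides.

```python
# Example dataset from the slide: one row per object, one column per attribute.
objects = ["g1", "g2", "g3", "g4"]
attributes = ["m1", "m2", "m3", "m4", "m5"]
rows = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]

# data[(g, m)] = w encodes the relation I, i.e. m(g) = w.
data = {(g, m): w for g, row in zip(objects, rows) for m, w in zip(attributes, row)}

# W is the set of values that actually occur in the table.
W = sorted(set(data.values()))


def bicluster_values(A, B):
    """Return the values covered by the rectangle (A, B), row by row."""
    return [[data[(g, m)] for m in attributes if m in B] for g in objects if g in A]


# The bicluster ({g2, g3, g4}, {m3, m4}) from the slide.
print(bicluster_values({"g2", "g3", "g4"}, {"m3", "m4"}))  # [[1, 0], [1, 7], [2, 6]]
```

Any pair of an object subset and an attribute subset is a bicluster at this stage; the similarity condition comes later.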
But... a bicluster should reflect
a local phenomenon in the data: “rectangles of values”
connectedness of values: e.g. similar values
overlapping: objects/attributes may belong to several patterns
a partial order, e.g. for algorithmic issues
maximality of rectangles w.r.t. connectedness and ordering
Several types of biclusters
Several applications
Collaborative filtering and recommender systems
Finding web communities
Discovery of association rules in databases
Gene expression analysis, ...
Several algorithms
Iterative Row and Column Clustering Combination
Divide and Conquer / Distribution Parameter Identification
Greedy Iterative Search / Exhaustive Bicluster Enumeration
A difficult problem generally relying on heuristics
S. C. Madeira and A. L. Oliveira
Biclustering Algorithms for Biological Data Analysis: a survey.
In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
Introducing similarity
A simple similarity relation
w1 ≃θ w2 ⇐⇒ |w1 − w2| ≤ θ with θ ∈ ℝ, w1, w2 ∈ W
Considered type of biclusters
A bicluster (A, B) is a bicluster of similar values if
mi(gj) ≃θ mk(gl), ∀gj, gl ∈ A, ∀mi, mk ∈ B
     m1  m2  m3  m4  m5
g1    1   2   2   1   6
g2    2   1   1   0   6
g3    2   2   1   7   6
g4    8   9   2   6   7
(with θ = 2)
and maximal if no object/attribute can be added
J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut
Mining Bi-sets in Numerical Data.
In KDID 2006: 11-23.
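The two definitions above can be checked directly on the example table. The sketch below is a brute-force illustration, not the authors' implementation; the function names are assumptions. It uses the fact that all values are pairwise within θ exactly when the largest and smallest values differ by at most θ.

```python
from itertools import product

objects = ["g1", "g2", "g3", "g4"]
attributes = ["m1", "m2", "m3", "m4", "m5"]
rows = [[1, 2, 2, 1, 6], [2, 1, 1, 0, 6], [2, 2, 1, 7, 6], [8, 9, 2, 6, 7]]
data = {(g, m): w for g, row in zip(objects, rows) for m, w in zip(attributes, row)}


def is_similar_bicluster(A, B, theta):
    """True if all values in the rectangle (A, B) are pairwise within theta."""
    values = [data[(g, m)] for g, m in product(A, B)]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(A, B, theta):
    """True if (A, B) has similar values and no object or attribute can be added."""
    A, B = set(A), set(B)
    if not is_similar_bicluster(A, B, theta):
        return False
    # Try to extend the rectangle by one object, then by one attribute.
    if any(is_similar_bicluster(A | {g}, B, theta) for g in set(objects) - A):
        return False
    if any(is_similar_bicluster(A, B | {m}, theta) for m in set(attributes) - B):
        return False
    return True


print(is_maximal({"g1", "g2"}, {"m1", "m2", "m3", "m4"}, theta=2))  # True
print(is_maximal({"g1", "g2"}, {"m1", "m2"}, theta=2))              # False: m3 can be added
```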
Formal Concept Analysis (Ganter & Wille, 1999)
From a formal context to a concept lattice...
m1 m2 m3
g1 × ×
g2 × ×
g3 × ×
g4 × ×
g5 × × ×
Formal concepts = maximal rectangles
... with interesting properties (and existing algorithms!)
Maximality of concepts as rectangles
Overlapping of concepts
Specialization/generalisation hierarchy
This is exactly what we need for biclustering
Contribution
FCA: an interesting framework for biclustering
Use FCA for a complete, correct and non-redundant extraction
of biclusters of similar values with lossless discretization
with no preset similarity parameter (useful for top-k pattern
discovery)
with a given similarity parameter (as in the literature)
Design an algorithm that
is more efficient than its competitors
can be easily distributed
can handle several constraints (e.g. size) on the fly
A better understanding of closed numerical pattern mining
Outline
1 Formal Concept Analysis (FCA)
2 A first FCA-based biclustering method
3 Algorithm TriMax
4 Experiments
5 Conclusion and perspectives
Formal Concept Analysis (FCA)
In a nutshell...
FCA
A data analysis theory rooted in order and lattice theory that
characterizes formal concepts (also known as closed itemsets)
A concept in a formal context
Formal context (G, M, I): objects, attributes, incidence relation
Two derivation operators that define formal concepts
A concept is a maximal rectangle of ×, modulo column and row
permutations
m1 m2 m3
g1 × ×
g2 × ×
g3 × ×
g4 × ×
g5 × × ×
({g3, g4, g5}, {m2, m3}) is a formal concept
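The two derivation operators can be sketched as below. The column positions of the crosses for g1 and g2 are not recoverable from this transcript, so the rows chosen for them here are an assumption that is merely consistent with the stated concept; the names `context`, `intent`, `extent` and `is_concept` are likewise illustrative.

```python
# Formal context as a mapping object -> set of attributes.
context = {
    "g1": {"m1", "m3"},        # assumed positions (lost in the transcript)
    "g2": {"m1", "m2"},        # assumed positions (lost in the transcript)
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
all_attributes = {"m1", "m2", "m3"}


def intent(A):
    """Attributes shared by every object in A."""
    shared = set(all_attributes)
    for g in A:
        shared &= context[g]
    return shared


def extent(B):
    """Objects that have every attribute in B."""
    return {g for g, attrs in context.items() if set(B) <= attrs}


def is_concept(A, B):
    """(A, B) is a formal concept when each side is the derivation of the other."""
    return intent(A) == set(B) and extent(B) == set(A)


print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))  # True
print(is_concept({"g3", "g4"}, {"m2", "m3"}))        # False: g5 is missing from the extent
```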
Triadic Concept Analysis (Lehmann & Wille, 1995)
“Extension” of FCA to ternary relations
An object has an attribute under a given condition
Triadic context (G, M, B, Y): objects, attributes, conditions,
incidence relation
Several derivation operators that characterize “triadic
concepts” as maximal cubes of ×
Condition b1:
m1 m2 m3
g1 ×
g2 × ×
g3 × ×
g4 × ×
g5 × ×
Condition b2:
m1 m2 m3
g1 × × ×
g2 × ×
g3 × × ×
g4 × ×
g5 × ×
Condition b3:
m1 m2 m3
g1 × ×
g2 ×
g3 × × ×
g4 × ×
g5 × × ×
({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a triadic concept
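To illustrate what "maximal cube" means, here is a brute-force check on a small hypothetical triadic context (deliberately not the one on the slide, whose cross positions are not recoverable from this transcript). All names are assumptions. A cube is maximal when it cannot be extended by a single object, attribute or condition; checking one element at a time is enough, because any larger cube inside Y would also contain such a one-element extension.

```python
from itertools import product

# Hypothetical triadic context (G, M, B, Y): g1 and g2 have both attributes
# under both conditions; g3 has only m1, and only under condition b1.
G = {"g1", "g2", "g3"}
M = {"m1", "m2"}
B = {"b1", "b2"}
Y = set(product({"g1", "g2"}, M, B)) | {("g3", "m1", "b1")}


def is_cube(objs, attrs, conds):
    """True if every (object, attribute, condition) triple of the box is in Y."""
    return all(triple in Y for triple in product(objs, attrs, conds))


def is_triadic_concept(objs, attrs, conds):
    """True if the box is full of crosses and cannot be grown in any dimension."""
    objs, attrs, conds = set(objs), set(attrs), set(conds)
    if not is_cube(objs, attrs, conds):
        return False
    grow_objs = any(is_cube(objs | {g}, attrs, conds) for g in G - objs)
    grow_attrs = any(is_cube(objs, attrs | {m}, conds) for m in M - attrs)
    grow_conds = any(is_cube(objs, attrs, conds | {b}) for b in B - conds)
    return not (grow_objs or grow_attrs or grow_conds)


print(is_triadic_concept({"g1", "g2"}, {"m1", "m2"}, {"b1", "b2"}))  # True
print(is_triadic_concept({"g1", "g2"}, {"m1"}, {"b1"}))              # False: not maximal
```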
A first FCA-based biclustering method
Basic idea
Principle
Start from a numerical dataset
Build a triadic context with the same objects, the same attributes, and
a losslessly discretized “numerical space” as the third dimension
Extract triadic concepts
We show interesting links between biclusters of similar
values and triadic concepts
Discretization method
Interordinal scaling (existing discretization scale)
Let (G, M, W, I) be a numerical dataset (with W the set of
data-values).
Now consider the set
T = {[min(W), w], ∀w ∈ W} ∪ {[w, max(W)], ∀w ∈ W}.
Known fact: T and all its intersections characterize any interval
of values on W.
Example
With W = {0, 1, 2, 6, 7, 8, 9}, one has
T = {[0, 0], [0, 1], [0, 2], [0, 6], ..., [1, 9], [2, 9], ..., [9, 9]}
and for example [0, 8] ∩ [2, 9] = [2, 8]
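The scale T and the intersection from the example can be computed as in the following sketch. Intervals are stored as `(low, high)` tuples; the function names are illustrative assumptions.

```python
W = [0, 1, 2, 6, 7, 8, 9]


def interordinal_scale(values):
    """Build T = {[min(W), w]} ∪ {[w, max(W)]} for all w in W."""
    lo, hi = min(values), max(values)
    left = {(lo, w) for w in values}
    right = {(w, hi) for w in values}
    return sorted(left | right)


def intersect(i1, i2):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None


T = interordinal_scale(W)
print(len(T))                     # 13 intervals ([0, 9] belongs to both families)
print(intersect((0, 8), (2, 9)))  # (2, 8)
```

Only bounds taken from W appear in T, which is why [0, 6] is present while an interval such as [0, 3] is not.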
Building a triadic context
Transformation procedure
From a numerical dataset (G, M, W, I), build a triadic context
(G, M, T, Y) such that (g, m, t) ∈ Y ⇐⇒ m(g) ∈ t
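A minimal sketch of this transformation on the example dataset follows. The helper `modus` (the set of intervals shared by all cells of a rectangle) and the other names are illustrative assumptions.

```python
objects = ["g1", "g2", "g3", "g4"]
attributes = ["m1", "m2", "m3", "m4", "m5"]
rows = [[1, 2, 2, 1, 6], [2, 1, 1, 0, 6], [2, 2, 1, 7, 6], [8, 9, 2, 6, 7]]
data = {(g, m): w for g, row in zip(objects, rows) for m, w in zip(attributes, row)}

W = sorted(set(data.values()))
# Interordinal scale T, with intervals stored as (low, high) tuples.
T = sorted({(min(W), w) for w in W} | {(w, max(W)) for w in W})

# Y contains (g, m, t) exactly when the value m(g) lies in the interval t.
Y = {(g, m, t) for (g, m), w in data.items() for t in T if t[0] <= w <= t[1]}


def modus(A, B):
    """Intervals t such that every cell of the rectangle (A, B) is related to t."""
    return [t for t in T if all((g, m, t) in Y for g in A for m in B)]


intervals = modus({"g1", "g2", "g3"}, {"m1", "m2", "m3"})
# Intersecting the shared intervals gives the tightest interval covering the rectangle.
tightest = (max(lo for lo, _ in intervals), min(hi for _, hi in intervals))
print(tightest)  # (1, 2)
```

The rectangle ({g1, g2, g3}, {m1, m2, m3}) only contains the values 1 and 2, and the intersection of its shared intervals recovers exactly [1, 2], which illustrates why the discretization loses no information.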
First contribution
We proved that there is a one-to-one correspondence between
(i) Triadic concepts of the resulting triadic context
(ii) Biclusters of similar values maximal for some θ ≥ 0
Interesting facts
Efficient algorithm for concept extraction (Data-Peeler)
L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut
Closed patterns meet n-ary relations.
In TKDD 3(1), 2009.
This algorithm can handle several constraints
Top-k biclusters: Concept (A, B, C) with high |A|, |B|, and |C|
corresponds to bicluster (A, B) as a large rectangle of close
values (by properties of interordinal scale)
This formalization allows us to design a new algorithm to
extract maximal biclusters for a given parameter θ
Algorithm TriMax
Compute all maximal biclusters for a given θ
Principle
Use another (but similar) discretization procedure to build the
triadic context, based on tolerance blocks
Standard algorithms output biclusters of similar values that are not
necessarily maximal
We design a new algorithm, TriMax, for that task
TriMax is flexible, uses standard FCA algorithms at its
core, and is more efficient than its competitors
Finding maximal sets of similar values
≃θ is a tolerance relation
reflexive, symmetric, but not transitive
Blocks of tolerance of W
Maximal sets of pairwise similar values are closed sets
Example with θ = 1
≃1  0  1  2  6  7  8  9
0   ×  ×  .  .  .  .  .
1   ×  ×  ×  .  .  .  .
2   .  ×  ×  .  .  .  .
6   .  .  .  ×  ×  .  .
7   .  .  .  ×  ×  ×  .
8   .  .  .  .  ×  ×  ×
9   .  .  .  .  .  ×  ×
Blocks of tolerance
{0, 1}
{1, 2}
{6, 7}
{7, 8}
{8, 9}
Renamed classes
[0, 1]
[1, 2]
[6, 7]
[7, 8]
[8, 9]
S. O. Kuznetsov
Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research.
In Formal Concept Analysis, Foundations and Applications, 2005.
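For numerical values under |w1 − w2| ≤ θ, a block of tolerance is a maximal run of consecutive sorted values whose extremes differ by at most θ. The sketch below computes the blocks with a sliding window; it is an illustration under that observation, and the function name is an assumption.

```python
def tolerance_blocks(values, theta):
    """Return the blocks of tolerance as (min, max) intervals, for theta >= 0."""
    ws = sorted(set(values))
    blocks = []
    j = 0  # index one past the right end of the current window
    for i in range(len(ws)):
        previous_j = j
        # Extend the window starting at ws[i] as far right as similarity allows.
        while j < len(ws) and ws[j] - ws[i] <= theta:
            j += 1
        # The window is maximal only if it reaches further than the previous one;
        # otherwise it is contained in the window that started at ws[i - 1].
        if j > previous_j:
            blocks.append((ws[i], ws[j - 1]))
    return blocks


print(tolerance_blocks([0, 1, 2, 6, 7, 8, 9], theta=1))
# [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
```

With θ = 1 this reproduces the five renamed classes listed on the slide.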
New transformation procedure
Tolerance-block-based scaling
Compute the set C of all blocks of tolerance over W
From the numerical dataset (G, M, W, I), build the triadic
context (G, M, C, Z) such that (g, m, c) ∈ Z ⇐⇒ m(g) ∈ c
In effect, this removes “useless information”
[Figure: example for θ = 1]
Second contribution
Algorithm TriMax
Any triadic concept corresponds to a bicluster of similar values,
but not necessarily a maximal one!
This led us to the algorithm TriMax, which:
Processes each formal context (one for each block of tolerance)
with any existing FCA algorithm
Treats any resulting concept as a maximal bicluster candidate and
uses a simple procedure to check maximality (this may be
problematic, but experiments show good behaviour)
Can process each context separately
TriMax allows a complete, correct and non-redundant
extraction of all maximal biclusters of similar values for a
user-defined similarity parameter θ
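The overall scheme can be illustrated with a naive sketch: one dyadic context per block of tolerance, concept enumeration in each, then a maximality filter. This is not the authors' TriMax implementation. Concepts are enumerated by brute force over attribute subsets rather than with a dedicated FCA algorithm, and maximality is tested straight from the definition rather than through the modus. All function names are assumptions.

```python
from itertools import combinations, product

objects = ["g1", "g2", "g3", "g4"]
attributes = ["m1", "m2", "m3", "m4", "m5"]
rows = [[1, 2, 2, 1, 6], [2, 1, 1, 0, 6], [2, 2, 1, 7, 6], [8, 9, 2, 6, 7]]
data = {(g, m): w for g, row in zip(objects, rows) for m, w in zip(attributes, row)}


def tolerance_blocks(values, theta):
    """Blocks of tolerance as (min, max) intervals (sliding window over sorted values)."""
    ws, blocks, j = sorted(set(values)), [], 0
    for i in range(len(ws)):
        previous_j = j
        while j < len(ws) and ws[j] - ws[i] <= theta:
            j += 1
        if j > previous_j:
            blocks.append((ws[i], ws[j - 1]))
    return blocks


def concepts(relation):
    """Naive enumeration of the formal concepts of a set of (g, m) pairs."""
    found = set()
    for k in range(len(attributes) + 1):
        for B in combinations(attributes, k):
            A = frozenset(g for g in objects if all((g, m) in relation for m in B))
            closed_B = frozenset(m for m in attributes if all((g, m) in relation for g in A))
            found.add((A, closed_B))
    return found


def similar(A, B, theta):
    """True if the rectangle (A, B) is non-empty and its values are within theta."""
    values = [data[(g, m)] for g, m in product(A, B)]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(A, B, theta):
    """Maximality by definition: no single object or attribute can be added."""
    return (similar(A, B, theta)
            and not any(similar(A | {g}, B, theta) for g in set(objects) - A)
            and not any(similar(A, B | {m}, theta) for m in set(attributes) - B))


def maximal_biclusters(theta):
    """Collect maximal biclusters of similar values, one dyadic context per block."""
    result = set()
    for lo, hi in tolerance_blocks(data.values(), theta):
        relation = {(g, m) for (g, m), w in data.items() if lo <= w <= hi}
        for A, B in concepts(relation):
            if A and B and is_maximal(A, B, theta):
                result.add((A, B))  # the set removes duplicates across blocks
    return result


found = maximal_biclusters(theta=1)
print((frozenset({"g1", "g2", "g3"}), frozenset({"m1", "m2", "m3"})) in found)  # True
```

Each block is handled independently inside the loop, which is what makes the per-context processing easy to distribute.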
Experiments
TriMax - settings
Implementation: C++ with the Boost library 1.42
InClose algorithm for processing dyadic contexts
Data: gene expression data of the species Laccaria bicolor
Configuration: Intel CPU 2.54 GHz, 8 GB RAM
TriMax - monitoring aspects
Starting with all 12 attributes, we vary the number of
objects and the similarity parameter θ, and monitor:
Number of maximal biclusters of similar values
Execution time (in seconds)
Number of tolerance blocks
Density of the triadic context
Number of non-maximal biclusters compared with
the number of maximal biclusters
Execution time profiling of the main procedures of TriMax
TriMax - experimental results
[Figure: six panels showing the number of maximal biclusters, execution times in seconds, number of blocks of tolerance, density of the triadic context, number of generated biclusters, and execution time]
TriMax bottleneck
Computing the modus is problematic...
TriMax builds a formal context (2D) for each block of tolerance
extracts concepts (A, B) for each of them
computes the modus C to get the triadic concept (A, B, C) and
checks maximality
But...
In many applications, experts have preferences
One can remove a bicluster candidate before modus
computation according to some constraints
Example with θ = 33,000, 500 objects and 12 attributes
104,226 maximal biclusters extracted in 16.130 sec
5,332 maximal biclusters in 2.1 sec with at least 10 (at most 40)
objects
Comparison
Existing algorithms
Numerical Biset Miner (NBS-Miner) - not scalable
J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut
Mining Bi-sets in Numerical Data.
In KDID 2006: 11-23.
Interval Pattern Structures (IPS) - less efficient than TriMax
M. Kaytoue, S. O. Kuznetsov, and A. Napoli
Biclustering Numerical Data in Formal Concept Analysis.
ICFCA, Springer, 2011.
An example of comparison
Increasing number of objects and all 12 attributes.
Results in milliseconds.
[Figure: three panels, for θ = 0, θ = 700 and θ = 10000]
Other scenarios show similar behaviour.
Conclusion and perspectives
Conclusion
Contribution
A better understanding of closed numerical pattern mining
within FCA
A formal characterization of a type of bicluster
TriMax for efficient computation
Perspectives
top-k bicluster discovery
n-dimensional numerical datasets
Distributed computation
Constraints (size, mean-square residue, etc.)
Links with Fuzzy FCA