IJRET: International Journal of Research in Engineering and Technology ISSN: 2319-1163
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org 511
A COMPREHENSIVE STUDY OF MAJOR TECHNIQUES OF MULTI-LEVEL FREQUENT PATTERN MINING: A SURVEY
Syed Zishan Ali¹, Yogesh Rathore²
¹Computer Science and Engineering, ²Professor, Computer Science and Engineering,
¹,²Raipur Institute of Technology, Mandir Hasaud, Raipur, Chhattisgarh, INDIA
Abstract
Frequent pattern mining has become one of the most popular data mining approaches for analysing purchasing patterns. Techniques such as Apriori and FP-Growth were typically restricted to a single concept level. We extend our study to multi-level frequent patterns in multi-level environments. Mining multi-level frequent patterns can lead to the discovery of patterns at different levels of a concept hierarchy. In this study, we describe the main techniques used to solve these problems and give a comprehensive survey of the most influential algorithms that were proposed during the last decade.
Index Terms: Data Mining, Data Transformation, Frequent Pattern Mining (FPM), Transactional Database.
1. INTRODUCTION
Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset. Finding frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data indexing, classification, clustering, and other data mining tasks. Thus, frequent pattern mining has become an important data mining task and a focused theme in data mining research.
Frequent pattern mining was first proposed by Agrawal et al. [1] for market basket analysis in the form of association rule mining. It analyses customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. For instance, if customers are buying milk, how likely are they to also buy cereal (and what kind of cereal) on the same trip to the supermarket? Such information can lead to increased sales by helping retailers do selective marketing and arrange their shelf space.
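The milk-and-cereal question is usually quantified with two standard measures from association rule mining: the support count of an itemset (how many transactions contain it) and the confidence of a rule such as milk => cereal (the support of {milk, cereal} divided by the support of {milk}). A minimal Python sketch, assuming a handful of made-up baskets and illustrative function names (`support_count`, `confidence`), might look like this:

```python
from typing import FrozenSet, List

# Hypothetical baskets for illustration only; they are not data from this paper.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"milk", "bread", "cereal"}),
    frozenset({"milk", "cereal"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "cereal", "butter"}),
]


def support_count(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> int:
    """Number of baskets containing every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of the baskets containing `lhs` that also contain `rhs`."""
    lhs_count = support_count(lhs, baskets)
    if lhs_count == 0:
        raise ValueError("the rule's left-hand side never occurs")
    return support_count(lhs | rhs, baskets) / lhs_count


if __name__ == "__main__":
    milk, cereal = frozenset({"milk"}), frozenset({"cereal"})
    print(support_count(milk | cereal, BASKETS))  # 3
    print(confidence(milk, cereal, BASKETS))      # 0.75
```

Here four baskets contain milk and three of them also contain cereal, so the rule milk => cereal holds with confidence 0.75.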
One approach to multilevel mining would be to directly exploit the standard algorithms in this area, Apriori [1] and FP-Growth [2], by applying them iteratively, level by level, to each concept level. In this paper, we focus on the study of frequent patterns based on the FP-tree [3].
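A minimal sketch of this level-by-level idea, assuming hierarchical item codes like those in the sample database of Section 4.C (the first digit names the level-1 category and the first two digits the level-2 category, written 1** and 11* in that section). `generalize`, `mine_level_by_level` and `frequent_singletons` are hypothetical helpers; the stand-in miner only returns frequent 1-itemsets to keep the example short, and a full single-level miner such as Apriori or FP-growth would take its place in practice.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set

Itemsets = Dict[FrozenSet[str], int]
Miner = Callable[[List[FrozenSet[str]], int], Itemsets]

# The eight transactions of the sample database in Section 4.C.
SAMPLE_DB: List[Set[str]] = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"113", "323", "524"},
    {"131", "231"}, {"323", "411", "524", "713"},
]


def generalize(code: str, level: int) -> str:
    """Ancestor of a hierarchical item code at `level` (1-based), e.g. ('111', 2) -> '11*'."""
    return code[:level] + "*" * (len(code) - level)


def mine_level_by_level(db: List[Set[str]], depth: int, min_support: int, miner: Miner) -> Dict[int, Itemsets]:
    """Generalize every transaction to each concept level in turn and run a single-level miner on it."""
    results: Dict[int, Itemsets] = {}
    for level in range(1, depth + 1):
        # A set collapses items that share an ancestor (111 and 112 both become 11*).
        level_db = [frozenset(generalize(item, level) for item in t) for t in db]
        results[level] = miner(level_db, min_support)
    return results


def frequent_singletons(db: List[FrozenSet[str]], min_support: int) -> Itemsets:
    """Stand-in single-level miner: frequent 1-itemsets only."""
    counts = Counter(item for t in db for item in t)
    return {frozenset([item]): c for item, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    for level, patterns in mine_level_by_level(SAMPLE_DB, 3, 3, frequent_singletons).items():
        print(level, sorted((next(iter(p)), c) for p, c in patterns.items()))
```

Using one uniform threshold (3 here) for every level is exactly the limitation that the flexible support constraints of ADA AFOPT, discussed in Section 4.B, are meant to address.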
Many researchers have published a large body of work on frequent pattern mining. Pattern mining has been studied extensively in the data mining community for many years, and many improvements and extensions have been proposed. A variety of efficient algorithms, such as PrefixSpan [4], [5] and FP-tree [6], [7], have been proposed. These works have mainly focused on developing efficient mining algorithms for discovering patterns in large data collections. However, searching for useful and interesting patterns and rules is still an open problem [8]. Two of the basic mining techniques, Apriori and FP-Growth, are reviewed below.
2. APRIORI ALGORITHM
Apriori is an algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules, building on the association rule mining framework of [1]. The name of the algorithm is based on the fact that it uses prior knowledge of frequent itemset properties, as we shall see below. Apriori employs an iterative approach known as level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets. There are two steps in each iteration. The first step generates a set of candidate itemsets. In the second step, the occurrences of each candidate are counted in the database and all disqualified candidates (i.e., all infrequent itemsets) are pruned. Apriori uses two pruning techniques: first, on the basis of support count (a candidate's support must be no less than the user-specified minimum support threshold), and second, for a candidate k-itemset to be frequent, all of its (k-1)-subsets must be among the frequent (k-1)-itemsets found in the previous iteration. After a first pass finds the frequent 1-itemsets, the iterations begin with candidate 2-itemsets, and the itemset size is incremented after each iteration. The algorithm is based on the downward closure (Apriori) property of frequent itemsets: if a set of items is frequent, then all its proper subsets are also frequent.
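A compact Python sketch of the two-step iteration described above. It is an illustration rather than an optimized implementation: the original algorithm joins itemsets that share their first k-2 items and counts candidates with a hash tree, whereas this sketch simply unions any two frequent (k-1)-itemsets that differ in one item, which yields the same candidates once the subset-pruning step is applied. The demo input is assumed to be the ordered frequent-item column of Table 1 in Section 3, treated as ordinary transactions, with min_support = 3.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[Itemset, int]:
    """Level-wise search: return every itemset whose support count is >= min_support."""
    db: List[Itemset] = [frozenset(t) for t in transactions]
    # First pass: frequent 1-itemsets.
    counts = Counter(item for t in db for item in t)
    frequent: Dict[Itemset, int] = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        previous: Set[Itemset] = set(frequent)
        # Step 1 (candidate generation): join pairs of frequent (k-1)-itemsets into k-itemsets ...
        candidates: Set[Itemset] = set()
        for a, b in combinations(previous, 2):
            union = a | b
            # ... and prune any candidate with an infrequent (k-1)-subset (Apriori property).
            if len(union) == k and all(frozenset(s) in previous for s in combinations(union, k - 1)):
                candidates.add(union)
        # Step 2 (counting): one database scan per level, then prune by support count.
        cand_counts: Counter = Counter()
        for t in db:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        frequent = {c: n for c, n in cand_counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Ordered frequent-item column of Table 1 (infrequent items already removed).
    table1 = ["zisen", "zishe", "zh", "ihn", "zisen"]
    for itemset, count in sorted(apriori(table1, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(itemset)), count)
```

On this input the search stops after the 4-itemset {z, i, s, e}, which is contained in transactions 100, 200 and 500.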
3. FP-GROWTH ALGORITHM
FP-growth [9] is a well-known algorithm that uses the FP-tree
data structure to achieve a condensed representation of the
database transactions and employs a divide-and-conquer approach to decompose the mining problem into a set of smaller problems. In essence, it mines all the frequent itemsets by recursively finding the frequent itemsets in conditional pattern bases, which are efficiently constructed with the help of the node-link structure. A variant of FP-growth is the H-mine algorithm [10], which uses array-based and trie-based data structures to deal with sparse and dense datasets, respectively. FPgrowth* [11] uses an array technique to reduce the FP-tree traversal time. In FP-growth-based algorithms, the recursive construction of conditional FP-trees affects the algorithm’s performance.
TID   Items bought                 (Ordered) frequent items
100   {z, s, i, d, g, r, e, n}     {z, i, s, e, n}
200   {s, h, i, z, l, e, o}        {z, i, s, h, e}
300   {h, z, t, j, o, w}           {z, h}
400   {h, i, k, n}                 {i, h, n}
500   {s, z, i, e, l, u, n}        {z, i, s, e, n}
1. Scan the DB once to find the frequent 1-itemsets (single-item patterns).
2. Sort the frequent items in descending frequency order to obtain the F-list.
3. Scan the DB again to construct the FP-tree.
TABLE 1: min_support = 3, F-list = z-i-s-h-e-n
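The three steps above, together with the recursive mining on conditional pattern bases described at the start of this section, can be sketched in Python as follows. This is an unoptimized illustration under a few assumptions: transactions are passed as (items, count) pairs so that conditional pattern bases can reuse the same tree builder, the F-list may be supplied explicitly to reproduce z-i-s-h-e-n (ties in frequency could otherwise be ordered differently), and the input is the ordered frequent-item column of Table 1, since dropping infrequent items beforehand does not change the result. `Node`, `build_fp_tree` and `fp_growth` are illustrative names.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Transactions = List[Tuple[List[str], int]]  # (items, count) pairs


class Node:
    """FP-tree node: an item, its count, a parent link and child links."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item, self.parent, self.count = item, parent, 0
        self.children: Dict[str, "Node"] = {}


def build_fp_tree(db: Transactions, min_support: int, flist: Optional[List[str]] = None):
    """Steps 1-3: count items, fix the F-list, then insert each transaction as a tree path."""
    counts: Counter = Counter()
    for items, n in db:
        for item in set(items):
            counts[item] += n
    if flist is None:
        # Descending frequency; the sort is stable, so ties keep first-seen order.
        flist = sorted((i for i in counts if counts[i] >= min_support), key=lambda i: -counts[i])
    rank = {item: r for r, item in enumerate(flist) if counts[item] >= min_support}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)  # node-links: every node holding an item
    for items, n in db:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return root, header, rank


def fp_growth(db: Transactions, min_support: int, suffix: Tuple[str, ...] = (),
              flist: Optional[List[str]] = None,
              out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    """Mine all frequent itemsets by recursing on conditional pattern bases."""
    out = {} if out is None else out
    _, header, rank = build_fp_tree(db, min_support, flist)
    for item in sorted(header, key=rank.__getitem__, reverse=True):  # bottom of the F-list first
        pattern = suffix + (item,)
        out[frozenset(pattern)] = sum(node.count for node in header[item])
        # Conditional pattern base: the prefix path above each node of `item`, with that node's count.
        base: Transactions = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_support, pattern, None, out)
    return out


if __name__ == "__main__":
    table1 = [(list(t), 1) for t in ["zisen", "zishe", "zh", "ihn", "zisen"]]
    root, header, _ = build_fp_tree(table1, 3, list("zishen"))
    print({item: sum(node.count for node in nodes) for item, nodes in header.items()})
    print(len(fp_growth(table1, 3, flist=list("zishen"))))  # 18
```

Carrying a count with each transaction is what lets `fp_growth` feed a conditional pattern base straight back into `build_fp_tree`, so no separate conditional-tree code is needed.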
4. METHODOLOGY USED
A. ASCENDING FREQUENCY ORDERED PREFIX-TREE (AFOPT)
AFOPT is an efficient algorithm for mining frequent itemsets. It adopts the pattern-growth approach and uses a compact data structure, the Ascending Frequency Ordered Prefix-Tree (AFOPT), to represent the conditional databases [12]. The AFOPT tree structure is traversed top-down. Compared with the descending frequency order and bottom-up traversal strategy adopted by the FP-growth algorithm, the combination of top-down traversal and ascending frequency ordering requires fewer pointers to be maintained at each node, and it also reduces the traversal cost of individual conditional databases.
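The ascending-frequency strategy can be illustrated with the list-based sketch below. It is a simplification, not the AFOPT algorithm itself: as an assumption for brevity, conditional databases are kept as plain Python lists of (items, count) pairs rather than AFOPT prefix-trees, so the sketch shows only the exploration order (least frequent item first, with each conditional database restricted to the items that come after it, i.e. the more frequent ones) and not the pointer savings of the tree. `ascending_pattern_growth` is an illustrative name.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

CondDB = List[Tuple[List[str], int]]  # (items, count) pairs


def ascending_pattern_growth(db: CondDB, min_support: int, prefix: Tuple[str, ...] = (),
                             out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    """Pattern growth that explores items in ascending frequency order."""
    out = {} if out is None else out
    counts: Counter = Counter()
    for items, n in db:
        for item in set(items):
            counts[item] += n
    # Frequent items, least frequent first (ties broken by item name).
    order = sorted((i for i in counts if counts[i] >= min_support), key=lambda i: (counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    projected = []
    for items, n in db:
        kept = sorted((i for i in set(items) if i in rank), key=rank.__getitem__)
        if kept:
            projected.append((kept, n))
    for item in order:
        pattern = prefix + (item,)
        out[frozenset(pattern)] = counts[item]
        # Conditional database of `item`: what follows it in each transaction, i.e. only items
        # that are more frequent than `item`, which keeps these databases small.
        cond = [(items[items.index(item) + 1:], n) for items, n in projected if item in items]
        cond = [(items, n) for items, n in cond if items]
        if cond:
            ascending_pattern_growth(cond, min_support, pattern, out)
    return out


if __name__ == "__main__":
    table1 = [(list(t), 1) for t in ["zisen", "zishe", "zh", "ihn", "zisen"]]
    result = ascending_pattern_growth(table1, 3)
    print(len(result))  # 18
```

On the ordered frequent-item column of Table 1 it finds the same 18 frequent itemsets as FP-growth, only enumerated in a different order.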
Mining only the frequent closed itemsets or the maximal frequent itemsets reduces the size of the output. An itemset is closed if all of its supersets are less frequent than it. An itemset is maximal if it is frequent and none of its supersets is frequent. The complete set of frequent itemsets can be recovered from either the set of frequent closed itemsets or the set of maximal frequent itemsets, because every frequent itemset is a subset of at least one of them. From the frequent closed itemsets, the support information of all frequent itemsets can also be recovered, but it cannot be recovered from the maximal frequent itemsets.
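The definitions above can be checked on a small example. The sketch below computes the frequent itemsets of Table 1 (Section 3) by brute force, filters them down to the closed and maximal ones, and then rebuilds every frequent itemset's support from the closed itemsets alone, using the standard rule that an itemset's support equals the largest support of any closed itemset containing it. The helper names are illustrative, and brute-force enumeration is only viable for toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemsets = Dict[FrozenSet[str], int]


def frequent_itemsets_bruteforce(db: List[FrozenSet[str]], min_support: int) -> Itemsets:
    """Enumerate every combination of items; only suitable for tiny examples."""
    items = sorted(set().union(*db))
    result: Itemsets = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in db if candidate <= t)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(freq: Itemsets) -> Itemsets:
    """Frequent itemsets with no proper superset of equal support."""
    return {x: s for x, s in freq.items() if not any(x < y and s == t for y, t in freq.items())}


def maximal_itemsets(freq: Itemsets) -> Itemsets:
    """Frequent itemsets with no frequent proper superset."""
    return {x: s for x, s in freq.items() if not any(x < y for y in freq)}


def recover_support(itemset: FrozenSet[str], closed: Itemsets) -> int:
    """Support of a frequent itemset = largest support among the closed itemsets containing it."""
    return max(s for c, s in closed.items() if itemset <= c)


if __name__ == "__main__":
    db = [frozenset(t) for t in ["zisen", "zishe", "zh", "ihn", "zisen"]]
    freq = frequent_itemsets_bruteforce(db, 3)
    closed, maximal = closed_itemsets(freq), maximal_itemsets(freq)
    print(len(freq), len(closed), len(maximal))  # 18 5 3
    assert all(recover_support(x, closed) == s for x, s in freq.items())
```

The 18 frequent itemsets shrink to 5 closed ones ({z}, {i}, {h}, {i, n} and {z, i, s, e}) and 3 maximal ones ({h}, {i, n} and {z, i, s, e}). The maximal sets alone cannot tell that {z} has support 4 while {z, i, s, e} has support 3; the closed sets can.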
B. ADAPTIVE AFOPT ALGORITHM - ADA AFOPT
ADA AFOPT is an algorithm for solving the multilevel frequent pattern mining problem. It is obtained by extending the AFOPT algorithm to multi-level databases, and it preserves the features of the AFOPT algorithm [12], i.e. the FP-tree, FP-tree-based pattern fragment growth, and the 1-item partition-based divide-and-conquer method. The algorithm uses flexible support constraints: to avoid the problems caused by a uniform support threshold, it mines with various support constraints. A uniform support threshold may either generate uninteresting patterns at higher abstraction levels or miss potentially interesting patterns at lower abstraction levels. At each level, individual items are classified into two categories: normal items and exceptional items.
ADA AFOPT benefits users by pushing these various support constraints deep into the mining process, which greatly improves the interestingness of the generated patterns. Being based on the AFOPT algorithm, it first traverses the original database to find the frequent items at each abstraction level and sorts them in ascending frequency order. The original database is then scanned a second time to construct an AFOPT structure that represents the conditional databases of the frequent items.
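The motivation for non-uniform thresholds can be made concrete on the sample database of Section 4.C. The sketch below counts items at every concept level and compares one uniform minimum support of 3 with a hypothetical per-level setting (5 at level 1, 3 at level 2, 2 at level 3). It only illustrates level-dependent support constraints; it does not implement ADA AFOPT itself or its split into normal and exceptional items, whose exact rule is not given here. `generalize` and `frequent_by_level` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Set

# The eight transactions of the sample database in Section 4.C.
SAMPLE_DB: List[Set[str]] = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"113", "323", "524"},
    {"131", "231"}, {"323", "411", "524", "713"},
]


def generalize(code: str, level: int) -> str:
    """Ancestor of a hierarchical item code at `level` (1-based), e.g. ('111', 2) -> '11*'."""
    return code[:level] + "*" * (len(code) - level)


def frequent_by_level(db: List[Set[str]], thresholds: Dict[int, int]) -> Dict[int, Set[str]]:
    """Frequent items at each concept level, each level checked against its own minimum support."""
    result: Dict[int, Set[str]] = {}
    for level, min_support in thresholds.items():
        # Each transaction counts once per ancestor, even if several of its items share it.
        counts = Counter(g for t in db for g in {generalize(item, level) for item in t})
        result[level] = {g for g, c in counts.items() if c >= min_support}
    return result


if __name__ == "__main__":
    uniform = frequent_by_level(SAMPLE_DB, {1: 3, 2: 3, 3: 3})
    flexible = frequent_by_level(SAMPLE_DB, {1: 5, 2: 3, 3: 2})
    print(sorted(uniform[1] - flexible[1]))  # ['3**', '4**']: kept only under the uniform threshold
    print(sorted(flexible[3] - uniform[3]))  # ['121', '122', '411', '524']: missed by the uniform threshold
```

With the uniform threshold, the level-3 items 121, 122, 411 and 524 (support 2 each) are lost, while the weakly supported level-1 categories 3** and 4** (support 3 each) are kept; the per-level setting reverses both outcomes, which is the trade-off described above.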
Header table and FP-tree for Table 1 (min_support = 3, F-list = z-i-s-h-e-n):
Header table (item: frequency): z: 4, i: 4, s: 3, h: 3, e: 3, n: 3
{}
  z:4
    i:3
      s:3
        e:2
          n:2
        h:1
          e:1
    h:1
  i:1
    h:1
      n:1
C. TRANSACTION REDUCTION TECHNIQUE
Theorem: If c ∈ Fk and either c.support < min_support or Titems ≤ k (with k = 1), then c is useless in Fk+1, where Fk is the set of frequent k-patterns, c is an itemset in a transaction, and Titems is the total item count of that transaction.
Proof 1: [For c ∈ Fk]
Consider a transaction table T = {T1, T2, T3, …, Tm}, and let T1 = {a1, a2, a3, …, an} and T2 = {a1}. Since c is hierarchical data, whenever the lower-level items of c achieve the support count, the corresponding higher-level items should be added to the reduced transaction table. For example, if data c such as 111, 211, … satisfies the support count, then the higher-level items 11*, 21*, … and the next higher-level items 1**, 2** should also be added to the reduced transaction table. If the lower-level items of c do not satisfy min_support, they are removed from the reduced transaction table. Hence the proof.
Proof 2: [For Titems ≤ k, where k = 1]
Now consider the same transaction table T = {T1, T2, T3, …, Tm} with T1 = {a1, a2, a3, …, an} and T2 = {a1}. Generating the frequent (k+1)-patterns requires a transaction to contain at least k+1 items, i.e. at least 2 items when k = 1. T2 contains only one item, so it cannot contain any (k+1)-pattern and can be rejected from the transaction table. Hence the proof.
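A minimal sketch of both parts of the theorem, applied one concept level at a time; this per-level treatment is an assumption made for brevity, whereas the proof above describes adding the higher-level items alongside the lower-level ones. Items are lifted to their ancestor at the chosen level, items below min_support are removed (Proof 1), and transactions left with k or fewer items are discarded before (k+1)-itemsets are generated (Proof 2). The data is the sample database of this section; `reduce_transactions` and its parameters are illustrative names.

```python
from collections import Counter
from typing import List, Set, Tuple

# The sample database of this section as (TID, items) pairs.
SAMPLE_DB: List[Tuple[str, Set[str]]] = [
    ("T1", {"111", "121", "211", "221"}), ("T2", {"111", "211", "222", "323"}),
    ("T3", {"112", "122", "221", "411"}), ("T4", {"111", "121"}),
    ("T5", {"111", "122", "211", "221", "413"}), ("T6", {"113", "323", "524"}),
    ("T7", {"131", "231"}), ("T8", {"323", "411", "524", "713"}),
]


def generalize(code: str, level: int) -> str:
    """Ancestor of a hierarchical item code at `level` (1-based), e.g. ('111', 2) -> '11*'."""
    return code[:level] + "*" * (len(code) - level)


def reduce_transactions(db: List[Tuple[str, Set[str]]], level: int, min_support: int,
                        k: int) -> List[Tuple[str, Set[str]]]:
    """Reduced transaction table at one concept level, ready for mining (k+1)-itemsets."""
    # Replace each lower-level item by its ancestor at `level` (the higher-level item).
    lifted = [(tid, {generalize(item, level) for item in items}) for tid, items in db]
    counts = Counter(item for _, items in lifted for item in items)
    reduced = []
    for tid, items in lifted:
        # Part 1: drop items whose support is below min_support.
        kept = {item for item in items if counts[item] >= min_support}
        # Part 2: a transaction with at most k items cannot contain a (k+1)-itemset.
        if len(kept) > k:
            reduced.append((tid, kept))
    return reduced


if __name__ == "__main__":
    for tid, items in reduce_transactions(SAMPLE_DB, level=2, min_support=3, k=1):
        print(tid, sorted(items))
```

At level 2 with min_support = 3 and k = 1, only T7 (whose items 13* and 23* are both infrequent) is discarded; with k = 2, T4, T6 and T8 are discarded as well, because each is left with just two items.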
Let us consider the following example with the sample database in Table 1.
TABLE: 1 Sample Database
TID   Items
T1    {111, 121, 211, 221}
T2    {111, 211, 222, 323}
T3    {112, 122, 221, 411}
T4    {111, 121}
T5    {111, 122, 211, 221, 413}
T6    {113, 323, 524}
T7    {131, 231}
T8    {323, 411, 524, 713}
The CCB-tree algorithm [13] has been used to find the multilevel frequent 1-patterns. Its per-level output for this database is listed below; each entry begins with an item code followed by its support count (for example, 11:6:6 is the level-2 item 11* with support 6).
Level 1: 1:7:7  2:5:7  3:3:8  4:3:8  5:2:8  7:1
Level 2: 11:6:6  12:4:5  13:1:7  21:3:5  22:4:5  32:3:8  41:3:8  52:2:8  71:1
Level 3: 111:4  112:1  113:1  121:2  122:2  131:1  211:3  221:3  222:1  323:3  411:2  413:1  524:2  713:1
After the FP-tree has been generated, the next step is to generate candidate itemsets and find the frequent patterns. The algorithm begins by scanning the tree and identifying its leaf nodes, and a pointer to each leaf is inserted into a leaf-node array. A bottom-up scan is then performed from each leaf node until the root is reached. Meanwhile, each visited node is stored in a temporary buffer that records the path traversed. When a node whose support count satisfies the threshold is visited, candidate generation takes the path from the starting (leaf) node to the current node and generates all combinations of candidate 2-itemsets. Only items, from any level, whose support is at or above the minimum support threshold are considered frequent. A candidate itemset that satisfies the minimum support count is kept for the next level of processing; a node that does not satisfy the minimum support is ignored, and no candidates are generated from it. After the frequent 2-itemsets have been found in all subtrees, the next traversal generates candidate 3-itemsets. In the same way, the supports of all candidate k-itemsets (k ≥ 3) are computed and the frequent k-itemsets are obtained; the process continues until all frequent k-itemsets have been found.
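The candidate-generation loop described above can be approximated by the following sketch. As a simplification, it enumerates the k-item combinations inside each generalized transaction rather than along leaf-to-root paths of the tree, and it stops extending from items that appear in no frequent itemset of the previous round (mirroring the rule that nodes failing minimum support generate nothing). Mining level 2 of the sample database with a hypothetical min_support of 3 is only an example; `frequent_k_itemsets` is an illustrative name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set

# Sample database of Section 4.C (leaf-level codes).
SAMPLE_DB: List[Set[str]] = [
    {"111", "121", "211", "221"}, {"111", "211", "222", "323"},
    {"112", "122", "221", "411"}, {"111", "121"},
    {"111", "122", "211", "221", "413"}, {"113", "323", "524"},
    {"131", "231"}, {"323", "411", "524", "713"},
]


def generalize(code: str, level: int) -> str:
    """Ancestor of a hierarchical item code at `level` (1-based), e.g. ('111', 2) -> '11*'."""
    return code[:level] + "*" * (len(code) - level)


def frequent_k_itemsets(db: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """For k = 1, 2, ...: count every k-combination inside each transaction and keep the frequent ones."""
    result: Dict[FrozenSet[str], int] = {}
    alive: Optional[Set[str]] = None  # items occurring in some frequent (k-1)-itemset
    k = 1
    while True:
        counts: Counter = Counter()
        for t in db:
            items = sorted(t if alive is None else t & alive)
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        if not frequent:
            return result
        result.update(frequent)
        alive = set().union(*frequent)  # infrequent items generate no further candidates
        k += 1


if __name__ == "__main__":
    level2 = [{generalize(item, 2) for item in t} for t in SAMPLE_DB]
    for itemset, n in sorted(frequent_k_itemsets(level2, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

Its 1-itemset supports (for example 6 for 11*) agree with the second field of the level-2 entries listed above. At level 2 it finds six frequent 1-itemsets, five frequent 2-itemsets and two frequent 3-itemsets ({11*, 12*, 22*} and {11*, 21*, 22*}, each with support 3).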
TABLE: 2
5. COMPARISON
Table 2 compares the techniques discussed in this paper. The characteristics used to distinguish them are pre-processing, feature extraction, database, and result. Pre-processing is the main step in frequent pattern mining. The second characteristic, feature extraction, gives the extracted features that are used in classification. In AFOPT and ADA AFOPT, the tree is sorted, with items arranged in ascending frequency order. The Transaction Reduction Technique is used to remove unwanted candidates and transactions, and the resulting transactions are stored in an FP-tree that serves as input to subsequent iterations of the mining process. It reduces I/O costs and the search space without losing any patterns.
CONCLUSIONS
This paper’s objective is to present the major techniques of multilevel frequent pattern mining, and it surveys some of the important ones: the Ascending Frequency Ordered Prefix-Tree (AFOPT), the Adaptive AFOPT algorithm (ADA AFOPT), and the Transaction Reduction Technique. The reported experimental results show that AFOPT together with ADA AFOPT gives excellent results, and that the Transaction Reduction Technique is also effective in minimizing I/O cost.
REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’93), Washington, DC, pp. 207-216, 1993.
[2] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’00), pp. 1-12, 2000.
[3] T. Eavis and X. Zheng, “Multi-Level Frequent Pattern Mining,” Springer-Verlag Berlin Heidelberg, pp. 369-383, 2009.
[4] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 17th Int’l Conf. Data Eng. (ICDE ’01), pp. 215-224, 2001.
[5] X. Yan, J. Han, and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets,” Proc. SIAM Int’l Conf. Data Mining (SDM ’03), pp. 166-177, 2003.
[6] J. Han and K.C.-C. Chang, “Data Mining for Web Intelligence,” Computer, vol. 35, no. 11, pp. 64-70, Nov. 2002.
[7] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’00), pp. 1-12, 2000.
[8] Y. Li and N. Zhong, “Interpretations of Association Rules by Granular Computing,” Proc. IEEE Third Int’l Conf. Data Mining (ICDM ’03), pp. 593-596, 2003.
[9] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data, ACM Press, Dallas, Texas, pp. 1-12, May 2000.
[10] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. IEEE Int’l Conf. Data Mining, pp. 441-448, 2001.
[11] G. Grahne and J. Zhu, “Efficiently Using Prefix-Trees in Mining Frequent Itemsets,” Proc. ICDM 2003 Workshop on Frequent Itemset Mining Implementations (FIMI ’03), Melbourne, Florida, Dec. 2003.
[12] Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844, vol. III (2008), Suppl. issue: Proceedings of ICCCC 2008, pp. 437-441.
[13] K. Duraiswamy and B. Jayanthi, “A Novel Preprocessing Algorithm for Frequent Pattern Mining in Multidatasets,” International Journal of Data Engineering, vol. 2, no. 3, Aug. 2011.
TABLE: 2 Comparison of the techniques discussed in this paper
Technique | Pre-processing | Feature extraction | Result
Ascending Frequency Ordered Prefix-Tree (AFOPT) | To traverse the trees in top-down, depth-first order | Sorted tree (ascending order) | Feasible and effective
Adaptive AFOPT algorithm (ADA AFOPT) | To scan the traversed tree | Scanned tree | Excellent
Transaction Reduction Technique | To reduce non-candidate patterns | Reduced search tree with less I/O cost | Good
BIOGRAPHIES
Syed Zishan Ali is with the Department of Computer Science and Engineering, Bhilai Institute of Technology, Raipur, Chhattisgarh, India. E-mail: zishan786s@gmail.com
Yogesh Rathore is currently the M.Tech. coordinator and a Senior Lecturer in the Department of Computer Science and Engineering, Raipur Institute of Technology, Raipur, Chhattisgarh, India. E-mail: yogeshrathore23@gmail.com

More Related Content

PDF
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
PDF
Sequential Pattern Tree Mining
PDF
Ad03301810188
PPT
Mining Frequent Patterns, Association and Correlations
PPT
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
PDF
Literature Survey of modern frequent item set mining methods
PDF
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Sequential Pattern Tree Mining
Ad03301810188
Mining Frequent Patterns, Association and Correlations
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Literature Survey of modern frequent item set mining methods
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...

What's hot (20)

PDF
Ej36829834
PDF
International Journal of Engineering Research and Development
PDF
A classification of methods for frequent pattern mining
PPTX
Mining single dimensional boolean association rules from transactional
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
Frequent Item Set Mining - A Review
PPT
Associations1
PDF
REVIEW: Frequent Pattern Mining Techniques
PPT
My6asso
PDF
Ijariie1129
PDF
An improvised tree algorithm for association rule mining using transaction re...
PPTX
Mining frequent patterns association
PDF
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
PDF
Ijcatr04051004
PPTX
Apriori algorithm
PDF
A Survey on Frequent Patterns To Optimize Association Rules
PDF
A Framework to Automatically Extract Funding Information from Text
PPT
Apriori algorithm
PPTX
Apriori algorithm
PDF
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
Ej36829834
International Journal of Engineering Research and Development
A classification of methods for frequent pattern mining
Mining single dimensional boolean association rules from transactional
Welcome to International Journal of Engineering Research and Development (IJERD)
Frequent Item Set Mining - A Review
Associations1
REVIEW: Frequent Pattern Mining Techniques
My6asso
Ijariie1129
An improvised tree algorithm for association rule mining using transaction re...
Mining frequent patterns association
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
Ijcatr04051004
Apriori algorithm
A Survey on Frequent Patterns To Optimize Association Rules
A Framework to Automatically Extract Funding Information from Text
Apriori algorithm
Apriori algorithm
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
Ad

Viewers also liked (15)

PPT
God acts on those who wait
PPTX
Blended Learning for Secondary School Teachers: Teaching a new programming en...
PPT
勉強会 2014-12-11 (spain)
PPTX
Evaluation question 5
PDF
Deferred
PPTX
Image Optimization Techniques & Tips
PDF
أجرك لا شيء
PPTX
Blended Learning in learner-centered environments – a case study
PDF
Water quality modeling of an agricultural watershed with best management prac...
PDF
Delayed feedback control of nonlinear phenomena in indirect field oriented co...
PPTX
¿Por qué Mindfulness?
PDF
Agenda sistemica estudiantes
PPTX
trabajo de ingles
PPTX
Sudden death
PPTX
ви наші найкращі №4
God acts on those who wait
Blended Learning for Secondary School Teachers: Teaching a new programming en...
勉強会 2014-12-11 (spain)
Evaluation question 5
Deferred
Image Optimization Techniques & Tips
أجرك لا شيء
Blended Learning in learner-centered environments – a case study
Water quality modeling of an agricultural watershed with best management prac...
Delayed feedback control of nonlinear phenomena in indirect field oriented co...
¿Por qué Mindfulness?
Agenda sistemica estudiantes
trabajo de ingles
Sudden death
ви наші найкращі №4
Ad

Similar to A comprehensive study of major techniques of multi level frequent pattern mining a survey (20)

PDF
3.[18 22]hybrid association rule mining using ac tree
PDF
3.[18 22]hybrid association rule mining using ac tree
PDF
J017114852
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
A Brief Overview On Frequent Pattern Mining Algorithms
PDF
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
PDF
20120140502006
PDF
20120140502006
PDF
A Study of Various Projected Data Based Pattern Mining Algorithms
PDF
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
PDF
Hadoop Map-Reduce To Generate Frequent Item Set on Large Datasets Using Impro...
PDF
B017550814
PDF
An improved apriori algorithm for association rules
PDF
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
PDF
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
PDF
J0945761
PDF
Mining Frequent Item set Using Genetic Algorithm
PDF
Sequential Pattern Mining Methods: A Snap Shot
PDF
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
PDF
Review Over Sequential Rule Mining
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree
J017114852
International Journal of Engineering Research and Development (IJERD)
A Brief Overview On Frequent Pattern Mining Algorithms
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
20120140502006
20120140502006
A Study of Various Projected Data Based Pattern Mining Algorithms
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
Hadoop Map-Reduce To Generate Frequent Item Set on Large Datasets Using Impro...
B017550814
An improved apriori algorithm for association rules
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
J0945761
Mining Frequent Item set Using Genetic Algorithm
Sequential Pattern Mining Methods: A Snap Shot
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
Review Over Sequential Rule Mining

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...
Risk analysis and environmental hazard management
Review study on performance of seismically tested repaired shear walls
Monitoring and assessment of air quality with reference to dust particles (pm...
Low cost wireless sensor networks and smartphone applications for disaster ma...
Coastal zones – seismic vulnerability an analysis from east coast of india
Can fracture mechanics predict damage due disaster of structures
Assessment of seismic susceptibility of rc buildings
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...

Recently uploaded (20)

PPTX
web development for engineering and engineering
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
Digital Logic Computer Design lecture notes
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
composite construction of structures.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
PPT on Performance Review to get promotions
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
OOP with Java - Java Introduction (Basics)
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
Construction Project Organization Group 2.pptx
PPTX
Welding lecture in detail for understanding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
Well-logging-methods_new................
PPTX
Geodesy 1.pptx...............................................
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
web development for engineering and engineering
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Digital Logic Computer Design lecture notes
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
composite construction of structures.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPT on Performance Review to get promotions
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
OOP with Java - Java Introduction (Basics)
Embodied AI: Ushering in the Next Era of Intelligent Systems
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
bas. eng. economics group 4 presentation 1.pptx
Construction Project Organization Group 2.pptx
Welding lecture in detail for understanding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Well-logging-methods_new................
Geodesy 1.pptx...............................................
Internet of Things (IOT) - A guide to understanding
Operating System & Kernel Study Guide-1 - converted.pdf

A comprehensive study of major techniques of multi level frequent pattern mining a survey

  • 1. IJRET: International Journal of Research in Engineering and Technology ISSN: 2319-1163 __________________________________________________________________________________________ Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org 511 A COMPREHENSIVE STUDY OF MAJOR TECHNIQUES OF MULTI LEVEL FREQUENT PATTERN MINING: A SURVEY Syed Zishan Ali1 , Yogesh Rathore2 1 Computer Science and Engineering, 2 Professor, Computer Science and Engineering, 1, 2 Raipur Institute of Technology, Raipur, Mandir Hasaud, Raipur, Chhattisgarh, INDIA Abstract Frequent pattern mining has become one of the most popular data mining approaches for the analysis of purchasing patterns. There are techniques such as Apriority and FP-Growth, which were typically restricted to a single concept level. We extend our research to study Multi - level frequent patterns in multi-level environments. Mining Multi-level frequent pattern may lead to the discovery of mining patterns at different levels of hierarchy. In this study, we describe the main techniques used to solve these problems and give a comprehensive survey of the most influential algorithms That were proposed during the last decade. Index Terms: Data Mining, Data Transformation, Frequent Pattern Mining (FPM), Transactional Database. ----------------------------------------------------------------------***------------------------------------------------------------------------ 1. INTRODUCTION Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user- specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a Transaction data set, is a frequent itemset. Finding frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data indexing, classification, clustering, and other data mining tasks as well. 
Thus, frequent pattern mining has become an important data mining task and a focused theme in data mining research. Frequent pattern mining was first proposed by Agrawal [1] for market basket analysis in the form of association rule mining. It analyses customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. For instance, if customers are buying milk, how likely are they going to also buy cereal (and what kind of cereal) on the same trip to the supermarket? Such information can lead to increased sales by helping retailers do selective marketing and arrange their shelf space One approach to multilevel mining would be to directly exploit the standard algorithms in this area – Apriori [1] and FP-Growth [2] by iteratively applying them in a level by level manner to each concept level. In this paper, we focused on the study of frequent patterns based on the FP- tree [3]. Many scholars have published a tons of research work on frequent pattern mining. There have been extensive studies on the improvements or extensions of Pattern mining has been extensively studied in data mining communities for many years. A variety of efficient algorithms such as PrefixSpan [4],[5], FP-tree [6],[7] have been proposed. These research works have mainly focused on developing efficient mining algorithms for discovering patterns from a large data collection. However, searching for useful and interesting patterns and rules was still an open problem [8]. Some of the basic mining techniques : Apriori, Fp-Growth etc. 2. APRIORI ALGORITHM Apriori is a algorithm proposed by R. Agrawal and R Srikant in 1994 [1] for mining frequent item sets for Boolean association rule. The name of algorithm is based on the fact that the algorithm uses prior knowledge of frequent item set properties, as we shall see following. Apriori employs an iterative approach known as level-wise search, where k item set are used to explore (k+1) item sets. 
There are two steps in each iteration. The first step generates a set of candidate itemsets. In the second step, the occurrences of each candidate set in the database are counted and all disqualified candidates (i.e., all infrequent itemsets) are pruned. Apriori uses two pruning techniques: the first is based on the support count (which must be no less than the user-specified support threshold), and the second requires that, for an itemset to be frequent, all of its subsets must be among the frequent itemsets of the previous level. After the frequent 1-itemsets have been found, the iterations begin with 2-itemsets, and the size is incremented after each iteration. The algorithm is based on the downward closure (Apriori) property of frequent itemsets: if a set of items is frequent, then all its proper subsets are also frequent.

3. FP-GROWTH ALGORITHM
FP-growth [9] is a well-known algorithm that uses the FP-tree data structure to achieve a condensed representation of the
database transactions and employs a divide-and-conquer approach to decompose the mining problem into a set of smaller problems. In essence, it mines all the frequent itemsets by recursively finding all frequent itemsets in the conditional pattern base, which is efficiently constructed with the help of a node-link structure. A variant of FP-growth is the H-mine algorithm [10], which uses array-based and trie-based data structures to deal with sparse and dense datasets, respectively. FPgrowth* [11] uses an array technique to reduce the FP-tree traversal time. In FP-growth-based algorithms, the recursive construction of the FP-tree affects the algorithm's performance.

TABLE 1: min_support = 3, F-list = z-i-s-h-e-n
TID | Items bought | (Ordered) frequent items
100 | {z, s, i, d, g, r, e, n} | {z, i, s, e, n}
200 | {s, h, i, z, l, e, o} | {z, i, s, h, e}
300 | {h, z, t, j, o, w} | {z, h}
400 | {h, i, k, s, n} | {i, h, n}
500 | {s, z, i, e, l, u, e, n} | {z, i, s, e, n}

The FP-tree is built as follows:
1. Scan the database once and find the frequent 1-itemsets (single-item patterns).
2. Sort the frequent items in descending order of frequency to obtain the F-list.
3. Scan the database again and construct the FP-tree.

4. METHODOLOGY USED
A. ASCENDING FREQUENCY ORDERED PREFIX-TREE (AFOPT)
AFOPT is an efficient algorithm for mining frequent itemsets. It adopts the pattern-growth approach and uses a compact data structure, the Ascending Frequency Ordered Prefix-Tree (AFOPT), to represent the conditional databases [12]. The AFOPT tree structure is traversed top-down.
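The two database scans listed under Table 1 can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the names Node, frequent_items, build_prefix_tree and count_nodes are hypothetical, transactions are lists of item labels, and min_support is an absolute count. Passing the F-list builds an FP-tree (descending frequency order); passing the reversed F-list builds a prefix tree in ascending frequency order in the spirit of AFOPT, although a real AFOPT tree would drop the parent pointers and node-links that this sketch keeps for simplicity.

```python
from collections import Counter


class Node:
    """Prefix-tree node: item label, support count, parent link and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def frequent_items(transactions, min_support):
    """First scan: count each item once per transaction, keep those >= min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def build_prefix_tree(transactions, item_order):
    """Second scan: insert every transaction with its frequent items sorted by item_order.

    Returns the root and a header table mapping each item to its list of nodes
    (the node-links used to collect conditional pattern bases).
    """
    rank = {item: r for r, item in enumerate(item_order)}
    root = Node(None, None)
    header = {item: [] for item in item_order}
    for t in transactions:
        node = root
        for item in sorted((x for x in set(t) if x in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:  # no shared prefix: start a new branch
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefix: just increment the count
            node = child
    return root, header


def count_nodes(node):
    """Number of nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# "(Ordered) frequent items" column of Table 1, min_support = 3.
db = [["z", "i", "s", "e", "n"], ["z", "i", "s", "h", "e"], ["z", "h"],
      ["i", "h", "n"], ["z", "i", "s", "e", "n"]]
freq = frequent_items(db, 3)
f_list = ["z", "i", "s", "h", "e", "n"]  # F-list of Table 1 (descending frequency)
fp_root, fp_header = build_prefix_tree(db, f_list)  # FP-tree order
afopt_root, _ = build_prefix_tree(db, f_list[::-1])  # ascending order, AFOPT-style
print(count_nodes(fp_root), count_nodes(afopt_root))  # 11 14
```

On this small example the descending order shares more prefixes (11 nodes against 14). The advantages claimed for AFOPT concern the pointers kept per node and the traversal cost, not the size of the tree.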
Compared with the descending frequency order and bottom-up traversal strategy adopted by the FP-growth algorithm, the combination of the top-down traversal strategy and the ascending frequency ordering method requires fewer pointers to be maintained at each node, and it also reduces the traversal cost of individual conditional databases.

The goal of mining frequent closed itemsets or maximal frequent itemsets is to reduce the output size. An itemset is closed if all of its supersets are less frequent than it. An itemset is maximal if none of its supersets is frequent. The complete set of frequent itemsets can be recovered from the set of frequent closed itemsets or from the set of maximal frequent itemsets. The support information of the itemsets can be recovered from the frequent closed itemsets, but not from the maximal frequent itemsets.

B. ADAPTIVE AFOPT ALGORITHM - ADA AFOPT
ADA AFOPT is an algorithm that can be used to solve the multilevel frequent pattern mining problem. It is obtained by extending the AFOPT algorithm to multi-level databases. The features of the AFOPT algorithm [12], i.e., the FP-tree, FP-tree-based pattern fragment growth and the 1-item partition-based divide-and-conquer method, are well preserved. The algorithm uses flexible support constraints: to avoid the problems caused by a uniform support threshold, mining with various support constraints is used. A uniform support threshold might either generate uninteresting patterns at higher abstraction levels or miss potentially interesting patterns at lower abstraction levels. At each level, individual items are classified into two categories: normal items and exceptional items. The ADA AFOPT algorithm favours users by pushing these various support constraints deep into the mining process. The interestingness of the generated patterns is, hence, improved dramatically.
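The problem with a uniform support threshold can be illustrated with a small sketch. It is not ADA AFOPT itself, since the normal/exceptional item rules are not specified here in enough detail to code. It only contrasts a uniform threshold with hypothetical level-dependent thresholds on the sample database of Section C, assuming that each digit of an item code is one hierarchy level (so 111 belongs to 11*, which belongs to 1**). The function name multilevel_frequent_items is hypothetical.

```python
from collections import Counter

# Sample database of Section C; the first `level` digits of a code name its ancestor.
DB = [
    ["111", "121", "211", "221"], ["111", "211", "222", "323"],
    ["112", "122", "221", "411"], ["111", "121"],
    ["111", "122", "211", "221", "413"], ["113", "323", "524"],
    ["131", "231"], ["323", "411", "524", "713"],
]


def multilevel_frequent_items(db, min_sup_by_level, depth=3):
    """Top-down, level-by-level search with its own minimum support per level.

    An ancestor at level l is counted once per transaction, and it is kept only
    if it meets min_sup_by_level[l] and its parent at level l-1 was frequent.
    """
    frequent, parents = {}, None
    for level in range(1, depth + 1):
        counts = Counter(a for t in db for a in {item[:level] for item in t})
        frequent[level] = {
            a: c for a, c in counts.items()
            if c >= min_sup_by_level[level] and (parents is None or a[:-1] in parents)
        }
        parents = frequent[level]
    return frequent


uniform = multilevel_frequent_items(DB, {1: 4, 2: 4, 3: 4})
reduced = multilevel_frequent_items(DB, {1: 4, 2: 3, 3: 2})
print(sorted(uniform[3]), sorted(reduced[3]))
# ['111'] ['111', '121', '122', '211', '221']
```

With a uniform threshold of 4 only 111 survives at the lowest level; lowering the threshold to 3 and 2 at levels 2 and 3 also keeps 121, 122, 211 and 221, the kind of lower-level pattern a uniform threshold misses.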
Being based on the AFOPT algorithm, this algorithm first traverses the original database to find the frequent items or abstract levels and sorts them in ascending frequency order. Then the original database is scanned a second time to construct an AFOPT structure that represents the conditional databases of the frequent items.

[Figure: FP-tree for Table 1, with its header table (columns: Item, frequency, head)]
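The 1-item partition divide-and-conquer that AFOPT and ADA AFOPT rely on can be sketched as a short recursion over conditional (projected) databases. This is a simplified illustration, not the paper's implementation: it assumes plain Python lists of tuples for the conditional databases instead of the AFOPT tree, a hypothetical function name mine, and an absolute min_support. Items are processed in ascending frequency order, and the conditional database of an item keeps only the items ordered after it.

```python
from collections import Counter


def mine(db, min_support, prefix=(), out=None):
    """Pattern growth with a 1-item partition in ascending frequency order.

    db: list of transactions (tuples of items); min_support: absolute count.
    For each frequent item i (least frequent first), the conditional database
    keeps, from the transactions that contain i, only the items ordered after i,
    so each frequent itemset is generated exactly once.
    Returns {frozenset(itemset): support}.
    """
    if out is None:
        out = {}
    counts = Counter(item for t in db for item in set(t))
    order = sorted((x for x, c in counts.items() if c >= min_support),
                   key=lambda x: (counts[x], x))  # ascending support, ties by label
    rank = {x: r for r, x in enumerate(order)}
    for item in order:
        pattern = prefix + (item,)
        out[frozenset(pattern)] = counts[item]
        # Conditional database of `item`: infrequent items get rank -1 and drop out.
        cond_db = [tuple(x for x in set(t) if rank.get(x, -1) > rank[item])
                   for t in db if item in t]
        cond_db = [t for t in cond_db if t]  # empty projections carry no items
        if cond_db:
            mine(cond_db, min_support, pattern, out)
    return out


# "(Ordered) frequent items" column of Table 1, min_support = 3.
table1 = [("z", "i", "s", "e", "n"), ("z", "i", "s", "h", "e"), ("z", "h"),
          ("i", "h", "n"), ("z", "i", "s", "e", "n")]
patterns = mine(table1, 3)
print(len(patterns), patterns[frozenset("zise")])  # 18 3
```

On this input the sketch returns 18 frequent itemsets, the largest being {z, i, s, e} with support 3. Recomputing the ascending order inside each conditional database mirrors how pattern-growth methods re-rank items after every projection.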
C. TRANSACTION REDUCTION TECHNIQUE
Theorem: If c ∈ F_k and c.support < min.support, T_items ≤ k, k = 1, then c is useless in F_{k+1}, where F_k is the set of frequent k-patterns, c is an itemset in each transaction, and T_items is the total item count of each transaction.

Proof 1: [For c ∈ F_k] Consider the transactions Ti ∈ {T1, T2, T3, …, Tm}, and let T1 = {a1, a2, a3, …, an} and T2 = {a1}. Since c is hierarchical data, whenever the lower-level items of c achieve the support count, the corresponding higher-level items should be added to the reduced transaction table. For example, if the data c = 111, 211, … satisfies the support count, then the higher-level items 11*, 21*, … and, at the next higher level, 1**, 2**, … should be added to the reduced transaction table. If the lower-level items of c do not satisfy min.support, they are removed from the reduced transaction table. Hence the proof.

Proof 2: [For T_items ≤ k, where k = 1] Now consider the same transactions Ti ∈ {T1, T2, T3, …, Tm}, with T1 = {a1, a2, a3, …, an} and T2 = {a1}. During frequent (k+1)-pattern generation, a transaction requires an item count of at least k+1 = 2; a transaction that does not have one, such as T2, can be rejected from the transaction table. Hence the proof.

Let us consider the following example with a sample database (TABLE 1: Sample Database). The CCB-tree algorithm [13] has been used to find the multilevel frequent 1-patterns.

[Figure: CCB-tree of multilevel 1-patterns for the sample database]

After the FP-tree is generated, the next step is to generate candidate itemsets and find the frequent patterns. This begins by scanning the tree and identifying its leaf nodes.
A pointer to each leaf is then inserted into the leaf node array. After that, a bottom-up scan is performed from each leaf node until it reaches the root. Meanwhile, each visited node is stored in a temporary buffer that records the path traversed whenever a node with a support count is visited. Candidate generation keeps the path from the starting node (i.e., the leaf node) to the current node and generates all combinations of candidate 2-itemsets. Only items, from any level, whose support is above the minimum support threshold are considered frequent. A candidate itemset that satisfies the minimum support count can be used for next-level processing; a node that does not satisfy the minimum support is ignored, and candidate generation does nothing for it. After the frequent 2-itemsets have been found from all subtrees, the next traversal generates candidates for frequent 3-itemsets. The supports of all candidate k-itemsets (k ≥ 3) can then be computed and the frequent k-itemsets obtained. This process continues until all frequent k-patterns have been found.

TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}
T6 | {113, 323, 524}
T7 | {131, 231}
T8 | {323, 411, 524, 713}
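The two rules of the theorem can be sketched as a pre-processing pass over the sample database above. This is an illustrative sketch under stated assumptions, not the code of [13]: min_support = 2 is a hypothetical value, reduce_transactions is a hypothetical name, items are three-digit strings, and the item count of Proof 2 is taken to be the number of surviving lowest-level items. The wildcard codes 11* and 1** follow the notation of Proof 1.

```python
from collections import Counter

# Sample database, items encoded as three-digit strings.
SAMPLE_DB = {
    "T1": ["111", "121", "211", "221"], "T2": ["111", "211", "222", "323"],
    "T3": ["112", "122", "221", "411"], "T4": ["111", "121"],
    "T5": ["111", "122", "211", "221", "413"], "T6": ["113", "323", "524"],
    "T7": ["131", "231"], "T8": ["323", "411", "524", "713"],
}


def reduce_transactions(db, min_support, k=1):
    """Apply the two rules of the theorem before mining (k+1)-patterns.

    Rule 1: drop lowest-level items below min_support; for every surviving
    item, add its higher-level items (e.g. 111 -> 11* and 1**).
    Rule 2: reject a transaction whose surviving item count is <= k.
    """
    counts = Counter(item for items in db.values() for item in set(items))
    reduced = {}
    for tid, items in db.items():
        kept = {item for item in items if counts[item] >= min_support}
        if len(kept) <= k:
            continue  # too few items to contain a (k+1)-pattern
        expanded = set(kept)
        for item in kept:
            expanded.add(item[:2] + "*")
            expanded.add(item[:1] + "**")
        reduced[tid] = expanded
    return reduced


reduced = reduce_transactions(SAMPLE_DB, min_support=2)
print(sorted(reduced))  # T7 is missing: both of its items are infrequent
```

With these settings T7 is rejected because both 131 and 231 occur only once, and in T5 the item 413 is removed while 11*, 1** and the other higher-level items are added.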
5. COMPARISON
Table 2 shows the comparison between the techniques discussed in this paper. The characteristics used to distinguish them are pre-processing, feature extraction, database and result. Pre-processing is the main step in frequent pattern mining. The second characteristic is feature extraction, which gives the extracted features that are used in classification. In AFOPT and ADA AFOPT, the tree is sorted and arranged in ascending order. The Transaction Reduction Technique is used to remove unwanted candidates and transactions and to feed the resulting transactions into the FP-tree as input to subsequent iterations of the mining process. It reduces the I/O cost and the search space without losing any patterns.

CONCLUSIONS
This paper's objective is to present the major techniques of multilevel frequent pattern mining. It surveys some of the important techniques: the Ascending Frequency Ordered Prefix-Tree (AFOPT), the Adaptive AFOPT algorithm (ADA AFOPT) and the Transaction Reduction Technique. As summarized in Table 2, AFOPT with ADA AFOPT gives excellent results, and the Transaction Reduction Technique is also good at minimizing I/O cost.

REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. 1993 ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’93), Washington, DC, pp. 207-216, 1993.
[2] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 1-12, 2000.
[3] T. Eavis and X. Zheng, “Multi-Level Frequent Pattern Mining,” Springer-Verlag Berlin Heidelberg, pp. 369-383, 2009.
[4] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 17th Int’l Conf. Data Eng. (ICDE ’01), pp. 215-224, 2001.
[5] X. Yan, J. Han, and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets,” Proc. SIAM Int’l Conf. Data Mining (SDM ’03), pp. 166-177, 2003.
[6] J. Han and K.C.-C. Chang, “Data Mining for Web Intelligence,” Computer, vol. 35, no. 11, pp. 64-70, Nov. 2002.
[7] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’00), pp. 1-12, 2000.
[8] Y. Li and N. Zhong, “Interpretations of Association Rules by Granular Computing,” Proc. IEEE Third Int’l Conf. Data Mining (ICDM ’03), pp. 593-596, 2003.
[9] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD Int’l Conf. Management of Data, ACM Press, Dallas, Texas, pp. 1-12, May 2000.
[10] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. IEEE Int’l Conf. Data Mining, pp. 441-448, 2001.
[11] G. Grahne and J. Zhu, “Efficiently Using Prefix-Trees in Mining Frequent Itemsets,” FIMI ’03, Frequent Itemset Mining Implementations, Proc. ICDM 2003 Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, Dec. 2003.
[12] Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844, vol. III (2008), Suppl. issue: Proceedings of ICCCC 2008, pp. 437-441.
[13] K. Duraiswamy and B. Jayanthi, “A Novel Preprocessing Algorithm for Frequent Pattern Mining in Multidatasets,” International Journal of Data Engineering, vol. 2, no. 3, Aug. 2011.
TABLE 2: Comparison of the techniques
Techniques | Pre-processing | Feature extraction | Result
Ascending Frequency Ordered Prefix-Tree (AFOPT) | To traverse the trees in top-down depth-first order | Sorted tree (ascending order) | Feasible and effective
Adaptive AFOPT algorithm (ADA AFOPT) | To scan the traversed tree | Scanned tree | Excellent
Transaction Reduction Technique | To reduce non-candidate patterns | Reduced search tree with less I/O cost | Good
BIOGRAPHIES
Syed Zishan Ali is with the Department of Computer Science and Engineering, Bhilai Institute of Technology, Raipur, Chhattisgarh, India. E-mail: zishan786s@gmail.com

Yogesh Rathore is currently the coordinator of M.Tech and a Senior Lecturer in the Department of Computer Science and Engineering, Raipur Institute of Technology, Raipur, Chhattisgarh, India. E-mail: yogeshrathore23@gmail.com