International Journal of Technical Research and Applications e-ISSN: 2320-8163,
www.ijtra.com Volume 3, Issue 1 (Jan-Feb 2015), PP. 31-34
A FLEXIBLE APPROACH TO MINE HIGH
UTILITY ITEMSETS FROM TRANSACTIONAL
DATABASES USING UP-GROWTH+: A SURVEY
Mr. Ramesh S. Yevale¹, Prof. Vinod S. Wadne²
¹Department of Computer Engineering, ²Assistant Professor
ICOER, Wagholi, Pune, Maharashtra, India
¹ryevale33@gmail.com
Abstract- Mining high utility itemsets from transactional
databases is now a required task for processing many
transactional operations quickly. Many methods have been
presented for mining high utility itemsets from transactional
datasets, but they are subject to serious limitations. First, their
performance on low-memory systems with large transactional
datasets has not been investigated and needs to be addressed.
Second, these methods cannot avoid the screening and overhead
of null transactions, so performance eventually degrades. We
analyze new approaches to overcome these limitations, such as a
distributed programming model for mining business-oriented
transactional datasets, which not only overcomes the limitations
of main-memory-based computing but is also highly scalable as
the database size increases. We have used this approach with the
existing UP-Growth and UP-Growth+ algorithms with the aim of
further improving their performance.
Keywords: Data Mining, Frequent Itemset, Itemset Utility,
UP-Growth, UP-Growth+
I. INTRODUCTION
A group of items in a transaction database is called an
itemset. The utility of an itemset has two aspects: the internal
utility, which is the quantity of an item within a single
transaction, and the external utility, which is the importance
(for example, the unit profit) of an item across the transaction
database. The utility of an item in a transaction is defined as
the product of its external utility and its internal utility, and
the transaction utility is the sum of these item utilities over
the transaction. From the transaction utilities, the
transaction-weighted utilization (TWU) of an itemset can be
found. An itemset is called a high utility itemset only if its
utility is not less than a user-specified minimum utility
threshold (written min_sup in later sections); otherwise it is
treated as a low utility itemset. To mine these high utility
itemsets, Vincent S. Tseng et al. proposed the UP-Growth
(Utility Pattern Growth) algorithm in 2010, together with a
tree-based data structure called the UP-Tree (Utility Pattern
Tree), which efficiently maintains the information of the
transaction database related to the utility patterns. Four
strategies (DGU, DGN, DLU, and DLN) are used for efficient
construction of the UP-Tree [11] and for the processing in
UP-Growth [11]. Applying these strategies not only decreases
the estimated utilities of the potential high utility itemsets
(PHUIs) but also reduces the number of candidates. However, the
algorithm still incurs considerable execution time in phase II
(identifying local utility itemsets) and considerable I/O cost.
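The utility notions introduced above can be illustrated with a minimal Python sketch. The item names, quantities, unit profits and the threshold `min_util` are invented for illustration, and the helper names are hypothetical rather than taken from the UP-Growth paper.

```python
# Illustrative data only: item -> unit profit (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1}

# Each transaction maps item -> purchased quantity (internal utility).
DB = [
    {"a": 1, "b": 2},          # t1
    {"b": 4, "c": 3},          # t2
    {"a": 2, "b": 1, "c": 1},  # t3
]

def item_utility(item, t):
    """Utility of one item in one transaction: quantity x unit profit."""
    return t[item] * PROFIT[item]

def transaction_utility(t):
    """TU: sum of the utilities of all items in the transaction."""
    return sum(item_utility(i, t) for i in t)

def itemset_utility(itemset, db):
    """Exact utility of an itemset, summed over the transactions containing it."""
    return sum(
        sum(item_utility(i, t) for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )

def twu(itemset, db):
    """TWU: sum of TU over the transactions that contain the itemset."""
    return sum(transaction_utility(t) for t in db
               if all(i in t for i in itemset))

min_util = 15  # hypothetical user-specified threshold
for x in [("a",), ("a", "b"), ("b", "c")]:
    u = itemset_utility(x, DB)
    print(x, "utility:", u, "TWU:", twu(x, DB), "high utility:", u >= min_util)
```

Because every transaction utility includes the utilities of all its items, the TWU of an itemset is never smaller than its exact utility, which is what makes TWU usable as a pruning bound.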
In this paper, the existing UP-Growth algorithm is
improved to generate high utility itemsets efficiently for large
datasets and to reduce the execution time in phase II compared
with the existing UP-Growth algorithm. In the experimental
section, experiments are conducted on our improved algorithm
and the existing algorithm with a variety of synthetic and
real-world datasets.
II. PROBLEM DEFINITION
In this section we describe the concepts of regular
frequent pattern mining and give the basic definitions needed
to obtain the complete set of regular frequent patterns in
incremental transaction databases.
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. A set
$X = \{i_j, \ldots, i_k\} \subseteq I$, where $j \le k$ and
$j, k \in [1, n]$, is called a pattern or an itemset. A
transaction $t = (tid, Y)$ is a pair where $tid$ is a
transaction-id and $Y$ is a pattern. Let $size(t)$ be the size
of $t$, i.e., the number of items in $Y$. A transaction database
$DB$ over $I$ is a set of transactions $T = \{t_1, \ldots, t_m\}$,
where $m = |DB|$ is the size of $DB$, i.e., the total number of
transactions in $DB$. If $X \subseteq Y$, then $t$ contains $X$
(or $X$ occurs in $t$), which is denoted $t_j^X$, $j \in [1, m]$.
Therefore, $T^X = \{t_j^X, \ldots, t_k^X\}$, with $j \le k$ and
$j, k \in [1, m]$, is the set of all transactions in $DB$ in
which pattern $X$ occurs.
A. Definition 1 (frequent pattern X):
The total number of transactions in DB that contain
pattern $X$ is called the support of $X$, i.e.,
$Sup(X)$. Hence $Sup(X) = |T^X|$, where $|T^X|$ is the
size of $T^X$. The pattern $X$ is said to be frequent if
its support is greater than or equal to the user-given
minimum support threshold min_sup ($\delta$), i.e.,
$Sup(X) \ge \delta$.
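Definition 1 can be checked on a toy database with a short Python sketch. The transactions and the function names `covering_tids`, `support` and `is_frequent` are illustrative assumptions, not part of the original paper.

```python
# Toy horizontal database: tid -> set of items (values are made up).
DB = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"a", "b", "c"},
    4: {"a", "c"},
}

def covering_tids(pattern, db):
    """T^X: sorted ids of the transactions that contain every item of X."""
    return sorted(tid for tid, items in db.items() if set(pattern) <= items)

def support(pattern, db):
    """Sup(X) = |T^X|."""
    return len(covering_tids(pattern, db))

def is_frequent(pattern, db, min_sup):
    """X is frequent when Sup(X) >= min_sup."""
    return support(pattern, db) >= min_sup
```

For this toy data, `covering_tids({"a", "b"}, DB)` gives the transactions 1 and 3, so the pattern has support 2.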
B. Definition 2 (regularity of frequent pattern X)
Let $t_{j+1}^X$ and $t_j^X$, $j \in [1, (m-1)]$, be two
successive transactions in which frequent pattern $X$
appears. The difference between these two successive
transactions is defined as a period of $X$, say $p^X$
(i.e., $p^X = t_{j+1}^X - t_j^X$, $j \in [1, (m-1)]$).
To simplify the calculation of the periods of a pattern,
we consider the first transaction in the DB as null, i.e.,
$t_f = 0$, and the last transaction as the $m$th
transaction, i.e., $t_l = t_m$. For a given $T^X$, let
$P^X$ be the set of all periods of $X$, i.e.,
$P^X = \{p_1^X, \ldots, p_r^X\}$, where $r$ is the total
number of periods in $P^X$. Then the regularity of a
frequent pattern $X$ is denoted
$Reg(X) = \max\{p_1^X, \ldots, p_r^X\}$. A frequent pattern
$X$ is said to be regular frequent if its regularity is
less than or equal to the user-given maximum regularity
threshold $\lambda$.
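As a minimal sketch of Definition 2, the following Python functions compute the periods and the regularity of a pattern from the list of transaction ids in which it occurs. The function names are hypothetical, and the sketch assumes transaction ids run from 1 to m.

```python
def periods(tids, m):
    """Periods of a pattern from the tids where it occurs.

    Follows the convention in the text: a null transaction at position 0
    and the last transaction at position m act as boundaries.
    """
    points = [0] + sorted(tids) + [m]
    return [later - earlier for earlier, later in zip(points, points[1:])]

def regularity(tids, m):
    """Reg(X): the largest period of X."""
    return max(periods(tids, m))

def is_regular_frequent(tids, m, min_sup, max_reg):
    """Frequent (support >= min_sup) and regular (Reg(X) <= max_reg)."""
    return len(tids) >= min_sup and regularity(tids, m) <= max_reg
```

For example, a pattern occurring in transactions 1, 3, 4 and 8 of a 10-transaction database has boundary points 0, 1, 3, 4, 8, 10, so its periods are 1, 2, 1, 4, 2 and its regularity is 4.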
III. DATA SOURCES
A. OLTP and OLAP
OLTP (On-line Transaction Processing) is
characterized by a large number of short on-line transactions
(INSERT, UPDATE, DELETE). The main emphasis of OLTP
systems is on very fast query processing and on maintaining
data integrity in multi-access environments; their
effectiveness is measured by the number of transactions per
second. An OLTP database holds detailed and current data, and
the schema used to store transactional databases is the entity
model (usually 3NF).
OLAP (On-line Analytical Processing) is characterized by a
relatively low volume of transactions. Queries are often very
complex and involve aggregations. For OLAP systems, response
time is the effectiveness measure. OLAP applications are widely
used by data mining techniques. An OLAP database holds
aggregated, historical data, stored in multi-dimensional
schemas (usually a star schema).
B. Big Data
Big Data describes the process of extracting actionable
intelligence from disparate, and often times non-traditional,
data sources. These data sources may include structured data
such as databases, sensor, click stream and location data, as
well as unstructured data like email, HTML, social data and
images. The actionable data may be represented visually (e.g.
in a graph), but it is often distilled down to a structured
format, which is then stored in a database for further
manipulation.
C. Stock Market
Real-time stock market data (e.g. from trading
applications) is another source of transactional data.
Developing applications that utilise such data requires an
understanding of trading concepts and terminology, the
different types of market data available, and how data feed
events are processed into a market object model.
D. Datasets
The real-world data sets Accidents and Chess are obtained
from the FIMI Repository [4]; Chain-store is obtained from
NU-MineBench 2.0 [5]; Foodmart is acquired from the Microsoft
foodmart 2000 database. For all data sets except Chain-store
and Foodmart, unit profits for items in the utility tables are
generated between 1 and 1,000 using a log-normal distribution,
and quantities of items are generated randomly between 1 and
10. The two real data sets Chain-store and Foodmart already
contain unit profits and purchased quantities. The total
utilities of these two data sets are 26,388,499.8 and
120,160.84, respectively.
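The generation of utility values described above can be sketched in Python as follows. The paper does not state the parameters of the log-normal distribution, so `mu` and `sigma` below are assumed values, and the function names and the three toy transactions are hypothetical.

```python
import random

def make_utility_table(items, rng, mu=3.0, sigma=1.0):
    """Assign each item a unit profit drawn from a log-normal distribution,
    clamped to the range [1, 1000]. mu and sigma are assumed, not from the paper."""
    return {
        i: min(1000.0, max(1.0, round(rng.lognormvariate(mu, sigma), 2)))
        for i in items
    }

def add_quantities(transactions, rng):
    """Attach a random purchase quantity between 1 and 10 to every item."""
    return [{i: rng.randint(1, 10) for i in t} for t in transactions]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"]]  # made-up data
profits = make_utility_table(["a", "b", "c"], rng)
db = add_quantities(transactions, rng)

# Total utility of the data set: sum of quantity x unit profit over all items.
total_utility = sum(q * profits[i] for t in db for i, q in t.items())
```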
IV. TECHNIQUES USED FOR HIGH UTILITY
ITEMSETS MINING
A. Mining Regular Frequent Patterns
In this section we describe the mining process for regular
frequent patterns in incremental transactional databases using
the vertical data format, which requires only one database
scan. To generate length-1 itemsets, our algorithm constructs
an item header table called the RFPID-table, which consists of
four fields (Itemset, Tid, Sup, Reg). Itemset is the item name,
Tid is the list of transactions in which the item occurs, Sup
is the support of the itemset and Reg is the regularity of the
itemset. Each itemset has its own array to accommodate Tids and
other intermediate results. Let Table 1 be the transactional
database DB in horizontal format, which is somewhat similar to
the database in [9]. The horizontal database is converted into
a vertical database with one database scan to store all
length-1 items with their respective tids, support and
regularity. For example, let us consider the minimum support
threshold value, δ = 5 and maximum.
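The conversion from horizontal to vertical format can be sketched in Python as below. This is an illustrative reading of the RFPID-table, not the authors' implementation: the function name `build_rfpid_table`, the dictionary layout and the five toy transactions are assumptions, and transaction ids are assumed to run from 1 to m in ascending order.

```python
from collections import defaultdict

def build_rfpid_table(db):
    """One scan over a horizontal database, given as a list of (tid, items),
    producing item -> {"tids", "sup", "reg"} for all length-1 itemsets."""
    m = len(db)
    tidlists = defaultdict(list)
    for tid, items in db:          # the single database scan
        for item in items:
            tidlists[item].append(tid)

    table = {}
    for item, tids in tidlists.items():
        # Periods use a null transaction at 0 and the last transaction m.
        points = [0] + tids + [m]
        reg = max(b - a for a, b in zip(points, points[1:]))
        table[item] = {"tids": tids, "sup": len(tids), "reg": reg}
    return table

# Made-up horizontal database for illustration.
db = [
    (1, ["a", "b"]),
    (2, ["b", "c"]),
    (3, ["a", "b", "c"]),
    (4, ["a"]),
    (5, ["b", "c"]),
]
table = build_rfpid_table(db)
```

Once the tid lists are stored per item, the support and regularity of an item follow from its list alone, without another pass over the database.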
B. Frequent Pattern Mining Tree: Design and Construction
Let $I = \{a_1, a_2, \ldots, a_n\}$ be a set of items, and let a
transaction database be $DB = \langle T_1, T_2, \ldots, T_n \rangle$,
where $T_i$ ($i \in [1..n]$) is a transaction which contains a
set of items in $I$. The support (or occurrence frequency) of a
pattern $A$, which is a set of items, is the number of
transactions containing $A$ in $DB$. $A$ is a frequent pattern
if $A$'s support is no less than a predefined minimum support
threshold. Given a transaction database $DB$ and a minimum
support threshold, the problem of finding the complete set of
frequent patterns is called the frequent pattern mining
problem.
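The section heading refers to the construction of a frequent pattern tree, which the text does not spell out. The sketch below shows the commonly used two-scan FP-tree construction in Python, as an assumption about what is meant; the class and function names and the sample transactions are hypothetical.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First scan: support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_sup}

    root = FPNode(None)
    # Second scan: insert the frequent items of each transaction,
    # ordered by descending support (ties broken by item name).
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

# Made-up transactions; "d" occurs once and is dropped for min_sup = 2.
transactions = [["a", "b", "d"], ["b", "c"], ["a", "b", "c"], ["a", "c"]]
root, frequent = build_fp_tree(transactions, min_sup=2)
```

Transactions that share a prefix of frequent items share a path in the tree, which is how the structure compresses the database.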
C. UP-Growth Algorithm
UP-Growth [11] is an efficient algorithm for generating
high utility itemsets based on the construction of a global
UP-Tree. The UP-Tree framework follows three steps: (i)
construct the UP-Tree [11]; (ii) generate PHUIs from the
UP-Tree; (iii) identify high utility itemsets from the PHUIs.
The global UP-Tree [11] is constructed using two strategies.
(i) Discarding global unpromising items (DGU strategy)
eliminates the low utility items and their utilities from the
transaction utilities. (ii) Discarding global node utilities
(DGN strategy) is applied during global UP-Tree construction;
with DGN, the node utilities of nodes nearer to the UP-Tree
root are effectively reduced [15]. PHUI generation is similar
to the TWU approach in that the utility of each itemset is
computed with the help of an estimated utility. Finally, the
high utility itemsets (those with utility not less than
min_sup) are identified from the PHUIs. The global UP-Tree
contains many sub-paths. Each path is traced starting from the
bottom node of the header table; the set of such paths for an
item is called its conditional pattern base (CPB).
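The two global strategies can be sketched in Python as follows. This is one reading of DGU and DGN, simplified to the tree itself (no header table or node links); the names `UPNode` and `build_up_tree`, the profits, the quantities and the threshold are invented for illustration.

```python
class UPNode:
    """UP-Tree node: item, support count, and node utility (an overestimate)."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.nu = 0
        self.children = {}

def build_up_tree(db, profit, min_util):
    # Scan 1: transaction utilities and the TWU of every item.
    twu = {}
    for t in db:
        tu = sum(q * profit[i] for i, q in t.items())
        for i in t:
            twu[i] = twu.get(i, 0) + tu

    # DGU: items whose TWU is below the threshold are unpromising.
    promising = {i for i, w in twu.items() if w >= min_util}

    root = UPNode(None)
    # Scan 2: insert each reorganized transaction (promising items only,
    # in descending TWU order, ties broken by item name).
    for t in db:
        ordered = sorted((i for i in t if i in promising),
                         key=lambda i: (-twu[i], i))
        node, prefix_utility = root, 0
        for i in ordered:
            prefix_utility += t[i] * profit[i]
            if i not in node.children:
                node.children[i] = UPNode(i, node)
            node = node.children[i]
            node.count += 1
            # DGN: a node only accumulates the utilities of itself and its
            # ancestors in this transaction, not those of its descendants.
            node.nu += prefix_utility
    return root, twu

# Made-up data: unit profits and quantities per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 1}
DB = [
    {"a": 1, "b": 2, "d": 1},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]
root, twu = build_up_tree(DB, PROFIT, min_util=15)
```

With these values item d has a TWU of 10 and is discarded, and the remaining items are ordered b, c, a. Nodes close to the root, such as b, carry only their own accumulated utility rather than the full transaction utilities.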
D. Improved UP-Growth
Although the DGU and DGN strategies efficiently reduce
the number of candidates in Phase 1 (i.e., in the global
UP-Tree), they cannot be applied during the construction of a
local UP-Tree (Phase 2). Instead, the DLU strategy (discarding
local unpromising items) is used to discard the utilities of
low utility items from the path utilities of the paths, and the
DLN strategy (discarding local node utilities) is used to
discard the item utilities of descendant nodes during local
UP-Tree construction. Even so, the algorithm still faces some
performance issues in Phase 2. To overcome this, the maximum
transaction weight utilization (MTWU) is computed over all the
items, and a multiple of min_sup is taken as the user-specified
threshold value, as shown in the algorithm. With this
modification, performance improves compared with the existing
UP-Tree construction, which in turn improves the performance of
the UP-Growth algorithm. The improved utility pattern growth
algorithm is abbreviated as IUPG.
V. LIMITATIONS OF FREQUENT ITEMSETS
MINING
A. Frequent itemset mining over uncertain transaction
databases is semantically problematic and has significant
drawbacks, which lead to misleading results.
B. Apriori, while historically significant, suffers from a
number of inefficiencies or trade-offs, which have
spawned other algorithms. Candidate generation
generates large numbers of subsets (the algorithm
attempts to load up the candidate set with as many as
possible before each scan). Bottom-up subset
exploration (essentially a breadth-first traversal of the
subset lattice) finds any maximal subset S only after
all of its proper subsets.
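The level-wise behaviour described in item B can be seen in a compact Python sketch of Apriori. It is a minimal illustration with made-up transactions, not an optimized implementation, and the function name `apriori` and its return format are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise (breadth-first) frequent itemset mining.
    Returns a dict mapping each frequent itemset (frozenset) to its support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    level = {c: s for c, s in count(items).items() if s >= min_sup}
    frequent, k = dict(level), 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: s for c, s in count(candidates).items() if s >= min_sup}
        frequent.update(level)
        k += 1
    return frequent

# Made-up transactions for illustration.
result = apriori([["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "c"]], min_sup=2)
```

Each level needs its own database scan, and a k-itemset is only reached after all of its subsets have been generated and counted, which is the overhead the text refers to.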
VI. APPLICATIONS OF FREQUENT ITEMSETS
MINING
A. Methodology/Principal Findings
The claims datasets of 1 million nationally representative
people within Taiwan's National Health Insurance in 2005
were used to calculate the number of patients with one-stop
visits. Frequent itemset mining was applied to compute the
combination patterns of specialties in the one-stop visits.
Among the total 13,682,469 ambulatory care visits in 2005,
one-stop visits occurred 144,132 times and involved 296,822
visits (2.2% of all visits) by 66,294 (6.6%) persons. This
behavior became more common with age, and the percentage
reached 27.5% (5,662 in 20,579) in the age group ≥80 years.
In general, women were more likely to have one-stop visits
than men (7.2% vs. 6.0%). Internal medicine plus
ophthalmology was the most frequent combination with a
visited frequency of 3,552 times (2.5%), followed by
cardiology plus neurology with 3,183 times (2.2%). The most
frequent three-specialty combination, cardiology plus
neurology and gastroenterology, occurred only 111 times.
B. Association Rule Learning
Association rule learning is a popular and well researched
method for discovering interesting relations between variables
in large databases.
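Two standard measures for an association rule X → Y are its support and its confidence. The Python sketch below computes both for a toy database; the function name `rule_metrics` and the transactions are illustrative assumptions.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent -> consequent.

    support    = fraction of transactions containing X and Y together
    confidence = fraction of the transactions containing X that also contain Y
    """
    x, y = set(antecedent), set(consequent)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= set(t))
    n_xy = sum(1 for t in transactions if (x | y) <= set(t))
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

# Made-up transactions for illustration.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "c"]]
sup, conf = rule_metrics(["a"], ["b"], transactions)
```

Here item a occurs in three of the four transactions and a together with b in two, so the rule a → b has support 0.5 and confidence 2/3.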
VII. CONCLUSION
In this paper we have analyzed new enhanced frameworks
of the recently presented algorithms UP-Growth and
UP-Growth+, with the aim of improving processing time and
mining performance in environments with limited system
memory. We have reviewed the concepts of UP-Growth and
UP-Growth+, along with the work done so far on previous
approaches and the datasets used. In future work, we will
fully evaluate the proposed architecture and compare its
performance against existing methods in order to establish the
effectiveness and efficiency of the proposed approach.
VIII. ACKNOWLEDGEMENT
I express many thanks to Prof. Vinod S. Wadne for his
great effort in supervising and leading me to accomplish this
work. I thank the college and department staff, who were a
great source of support and encouragement; my friends and
family, for their warmth, kind encouragement and love; and
every person who gave me something to light my pathway.
Thank you for believing in me.
REFERENCES
[1] S. J. Yen and Y. S. Lee.: Mining high utility quantitative
association rules. In Proc. of 9th Int'l Conf. on Data
Warehousing and Knowledge Discovery, Lecture Notes in
Computer Science 4654, pp. 283-292, Sep., 2007.
[2] Frequent itemset mining implementations repository,
http://fimi.cs.helsinki.fi/
[3] Vincent. S. Tseng, C. W. Wu, B. E. Shie, and P. S. Yu.:
UP-Growth: An Efficient Algorithm for High Utility
Itemset Mining. In Proc. of ACM-KDD, Washington, DC,
USA, pp. 253-262, July 25–28, 2010.
[4] Y.-C. Li, J.-S. Yeh, and C.-C. Chang, “Isolated Items
Discarding Strategy for Discovering High Utility Itemsets,”
Data and Knowledge Eng., vol. 64, no. 1, pp. 198-217, Jan.
2008.
[5] C.H. Lin, D.Y. Chiu, Y.H. Wu, and A.L.P. Chen, “Mining
Frequent Itemsets from Data Streams with a Time-
Sensitive Sliding Window,” Proc. SIAM Int’l Conf. Data
Mining (SDM ’05), 2005.
[6] Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility
Itemsets Mining Algorithm,” Proc. Utility-Based Data
Mining Workshop, 2005.
[7] F. Tao, F. Murtagh, and M. Farid, “Weighted Association
Rule Mining Using Weighted Support and Significance
Framework,” Proc. ACM SIGKDD Conf. Knowledge
Discovery and Data Mining (KDD ’03), pp. 661-666, 2003
[8] H. Dutta, and J. Demme, “Distributed Storage of Large
Scale Multidimensional EEG Data using Hadoop/HBase,”
Grid and Cloud Database Management, New York City:
Springer; 2011.
[9] G. Y. Ming, W. Zhi-jun. A Vertical format algorithm for
mining frequent itemsets. IEEE Transactions, pp. 11-13
(2010).
[10] M. J. Zaki, G. Karam. Fast Vertical Mining Using Diffsets,
ACM SIGKDD. pp. 24-27 (2003).
[11] M. G. Elfeky, W. G. Aref, A. K. Elmagarmid. Periodicity
Detection in Time Series Databases. IEEE Transactions on
Knowledge and Data Engineering 17(7), pp. 875-887
(2005).
[12] A. Erwin, R.P. Gopalan, and N.R. Achuthan, “Efficient
Mining of High Utility Itemsets from Large Data Sets,”
Proc. 12th Pacific-Asia Conf. Advances in Knowledge
Discovery and Data Mining (PAKDD), pp. 554-561, 2008.
[13] R. Chan, Q. Yang, and Y. Shen, “Mining High Utility
Itemsets,” Proc. IEEE Third Int’l Conf. Data Mining, pp.
19-26, Nov. 2003.
[14] U. Yun and J.J. Leggett, “WIP: Mining Weighted
Interesting Patterns with a Strong Weight and/or Support
Affinity,” Proc. SIAM Int’l Conf. Data Mining (SDM ’06),
pp. 623-627, Apr. 2006.
[15] U. Yun, “An Efficient Mining of Weighted Frequent
Patterns with Length Decreasing Support Constraints,”
Knowledge-Based Systems, vol. 21, no. 8, pp. 741-752,
Dec. 2008.

More Related Content

PDF
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
PDF
A classification of methods for frequent pattern mining
PDF
An improvised frequent pattern tree
PPTX
Multidimensioal database
PPTX
DMDW Lesson 08 - Further Data Mining Algorithms
PDF
Mining Regular Patterns in Data Streams Using Vertical Format
PDF
Ej36829834
PPTX
DMDW Lesson 04 - Data Mining Theory
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
A classification of methods for frequent pattern mining
An improvised frequent pattern tree
Multidimensioal database
DMDW Lesson 08 - Further Data Mining Algorithms
Mining Regular Patterns in Data Streams Using Vertical Format
Ej36829834
DMDW Lesson 04 - Data Mining Theory

What's hot (17)

PPT
Mining Frequent Patterns, Association and Correlations
PDF
Ijcet 06 06_003
PPTX
DMDW Lesson 05 + 06 + 07 - Data Mining Applied
PDF
B0950814
PPT
Associations1
PPTX
Mining frequent patterns association
PPTX
Data Mining: Mining ,associations, and correlations
PDF
A Performance Based Transposition algorithm for Frequent Itemsets Generation
PPT
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
PDF
DBMS INTRODUCTION
PDF
Paper id 42201608
PDF
Big Data with Rough Set Using Map- Reduce
PPTX
Introduction to dm and dw
PPTX
introduction to Data Structure and classification
PPT
Lec1
PDF
Mining frequent itemsets (mfi) over
Mining Frequent Patterns, Association and Correlations
Ijcet 06 06_003
DMDW Lesson 05 + 06 + 07 - Data Mining Applied
B0950814
Associations1
Mining frequent patterns association
Data Mining: Mining ,associations, and correlations
A Performance Based Transposition algorithm for Frequent Itemsets Generation
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
DBMS INTRODUCTION
Paper id 42201608
Big Data with Rough Set Using Map- Reduce
Introduction to dm and dw
introduction to Data Structure and classification
Lec1
Mining frequent itemsets (mfi) over
Ad

Viewers also liked (20)

PDF
STUDIES ON PRODUCTION PERFORMANCE IN BROILER CHICKEN SUPPLEMENTING COPPER AND...
PPTX
Social media mediasharing-monicamcginnis
PDF
INVESTMENT AND ECONOMIC GROWTH IN SUDAN: AN EMPIRICAL INVESTIGATION, 1999-2011
PPTX
Slide show training_centre
PDF
NEED OF THE HOUR: A CUSTOMER CENTRIC FORMAT FOR ORGANIZED RETAILING
PDF
OPTIMIZATION OF SCALE FACTORS IN SHRINKAGE COMPENSATIONS IN SLS USING PATTERN...
PDF
EFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTY
PPT
сокальщина гра
PPTX
интернет для специальности политология
PDF
Untitled Presentation
PPTX
Providing incentives
PPTX
Social media mediasharing-monicamcginnis
PDF
Omnichannel retailing
PPTX
PDF
A SURVEY ON IRIS RECOGNITION FOR AUTHENTICATION
PPT
презентация 4
PDF
UNIVERSIDAD METROPOLITANA
PDF
PDF
2012HAITI research report strengthening local capactities
PDF
COMPARISON OF THE EXPERIMENTAL PERFORMANCE OF A THERMOELECTRIC REFRIGERATOR W...
STUDIES ON PRODUCTION PERFORMANCE IN BROILER CHICKEN SUPPLEMENTING COPPER AND...
Social media mediasharing-monicamcginnis
INVESTMENT AND ECONOMIC GROWTH IN SUDAN: AN EMPIRICAL INVESTIGATION, 1999-2011
Slide show training_centre
NEED OF THE HOUR: A CUSTOMER CENTRIC FORMAT FOR ORGANIZED RETAILING
OPTIMIZATION OF SCALE FACTORS IN SHRINKAGE COMPENSATIONS IN SLS USING PATTERN...
EFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTY
сокальщина гра
интернет для специальности политология
Untitled Presentation
Providing incentives
Social media mediasharing-monicamcginnis
Omnichannel retailing
A SURVEY ON IRIS RECOGNITION FOR AUTHENTICATION
презентация 4
UNIVERSIDAD METROPOLITANA
2012HAITI research report strengthening local capactities
COMPARISON OF THE EXPERIMENTAL PERFORMANCE OF A THERMOELECTRIC REFRIGERATOR W...
Ad

Similar to A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASES USING UP-GROWTH+: A SURVEY (20)

PDF
International Journal of Engineering Research and Development (IJERD)
PDF
The International Journal of Engineering and Science (The IJES)
PDF
LNAI 2682 Declarative Data Mining Using SQL3 1st Edition by Hasan Jamil ISBN ...
PDF
An Approach of Improvisation in Efficiency of Apriori Algorithm
PDF
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
PDF
An efficient algorithm for mining frequent inter transaction patterns
 
PDF
A Relative Study on Various Techniques for High Utility Itemset Mining from T...
PDF
A1030105
PDF
A Study of Various Projected Data Based Pattern Mining Algorithms
PDF
50120140503019
PDF
LNAI 2682 Declarative Data Mining Using SQL3 1st Edition by Hasan Jamil ISBN ...
PDF
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
PDF
Mining high utility itemsets in data streams based on the weighted sliding wi...
PDF
Ijsrdv1 i2039
PDF
A novel approach for text extraction using effective pattern matching technique
PDF
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
PDF
IRJET - Document Comparison based on TF-IDF Metric
PDF
Transaction Profitability Using HURI Algorithm [TPHURI]
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
Efficient Temporal Association Rule Mining
International Journal of Engineering Research and Development (IJERD)
The International Journal of Engineering and Science (The IJES)
LNAI 2682 Declarative Data Mining Using SQL3 1st Edition by Hasan Jamil ISBN ...
An Approach of Improvisation in Efficiency of Apriori Algorithm
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An efficient algorithm for mining frequent inter transaction patterns
 
A Relative Study on Various Techniques for High Utility Itemset Mining from T...
A1030105
A Study of Various Projected Data Based Pattern Mining Algorithms
50120140503019
LNAI 2682 Declarative Data Mining Using SQL3 1st Edition by Hasan Jamil ISBN ...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining high utility itemsets in data streams based on the weighted sliding wi...
Ijsrdv1 i2039
A novel approach for text extraction using effective pattern matching technique
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
IRJET - Document Comparison based on TF-IDF Metric
Transaction Profitability Using HURI Algorithm [TPHURI]
International Journal of Engineering Research and Development (IJERD)
Efficient Temporal Association Rule Mining

More from International Journal of Technical Research & Application (20)

PDF
STUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEW
PDF
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
PDF
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
PDF
STUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONS
PDF
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
PDF
POD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTER
PDF
DIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGA
PDF
MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...
PDF
AN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIR
PDF
LI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESS
PDF
QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...
PDF
SCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEW
PDF
IMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKING
PDF
EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...
PDF
THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...
PDF
TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...
PDF
A STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASH
PDF
AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...
PDF
OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...
PDF
PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...
STUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEW
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
STUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONS
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
POD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTER
DIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGA
MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...
AN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIR
LI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESS
QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...
SCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEW
IMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKING
EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...
THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...
TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...
A STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASH
AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...
OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...
PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...

Recently uploaded (20)

PPTX
OOP with Java - Java Introduction (Basics)
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Well-logging-methods_new................
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
web development for engineering and engineering
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
PPT on Performance Review to get promotions
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Sustainable Sites - Green Building Construction
PPTX
Geodesy 1.pptx...............................................
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Construction Project Organization Group 2.pptx
DOCX
573137875-Attendance-Management-System-original
OOP with Java - Java Introduction (Basics)
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
bas. eng. economics group 4 presentation 1.pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
CH1 Production IntroductoryConcepts.pptx
Well-logging-methods_new................
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Internet of Things (IOT) - A guide to understanding
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
web development for engineering and engineering
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPT on Performance Review to get promotions
Operating System & Kernel Study Guide-1 - converted.pdf
Sustainable Sites - Green Building Construction
Geodesy 1.pptx...............................................
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Construction Project Organization Group 2.pptx
573137875-Attendance-Management-System-original

A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASES USING UP-GROWTH+: A SURVEY

  • 1. International Journal of Technical Research and Applications e-ISSN: 2320-8163, www.ijtra.com Volume 3, Issue 1 (Jan-Feb 2015), PP. 31-34 31 | P a g e A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASES USING UP-GROWTH+: A SURVEY Mr. Ramesh S. Yevale1, Prof. Vinod S. Wadne2 1 Department of Computer Engineering, 2 Assistant Professor ICOER, Wagholi, Pune, Maharashtra, India 1 ryevale33@gmail.com Abstract- Present day, mining of high utility itemsets especially from transactional databases is required task to process many transactional operations quick. There are many methods that are presented for mining high utility itemsets from transactional datasets are subjected to some serious limitations such as performance of this methods needs to be investigated in low memory based systems for mining high utility itemsets from large transactional datasets and hence needs to address further as well. Further limitation includes these methods cannot overcome the screenings as well as overhead of null transactions; hence, performance degrades eventually. We are analyzing the new approaches to overcome these limitations such as distributed programming model for mining business-oriented transactional datasets, which overcomes the limitations and main memory- based computing, but also unexpectedly highly scalable in terms of increasing database size. We have used this approach with existing UP-Growth and UP-Growth+ with aim of improving their performances further. Keywords: Data Mining, Frequent Itemset, Itemset Utility, UP-Growth, UP-Growth+ I. INTRODUCTION A high utility itemset is defined as: A group of items in a transaction database is called itemset. This itemset in a transaction database consists of two aspects: First one is itemset in a single transaction is called internal utility and second one is itemset in different transaction database is called external utility. 
The transaction utility of an itemset is defined as the multiplication of external utility by the internal utility. By transaction utility, transaction weight utilizations (TWU) can be found. To call an itemset as high utility itemset only if its utility is not less than a user specified minimum support threshold utility value; otherwise itemset is treated as low utility itemset. To generate these high utility itemsets mining recently in 2010, UP-Growth (Utility Pattern Growth) algorithm was proposed by Vincent S. Tseng et al. for discovering high utility itemsets and a tree based data structure called UP-Tree (Utility Pattern tree) which efficiently maintains the information of transaction database related to the utility patterns. Four strategies (DGU, DGN, DLU, and DLN) used for efficient construction of UP-Tree [11] and the processing in UP-Growth [11]. By applying these strategies, can not only efficiently decrease the estimated utilities of the potential high utility itemsets (PHUI) but also effectively reduce the number of candidates. But this algorithm takes more execution time for phase II (identify local utility itemsets) and I/O cost. In this paper, the existing UP-Growth algorithm is improved to generate high utility itemsets efficiently for large datasets and reduce execution time in phase II compared with existing UP-Growth algorithm. In the experimental section, experiments are conducted on our improved algorithm and existing algorithm with variety of synthetic and real-time datasets. II. PROBLEM DEFINITION In this section we describe the concepts of regular frequent pattern mining and define the basic definitions of the problem to obtain complete set of regular frequent patterns in incremental transaction databases. Let I = {i1, i2, . . . , in} be a set of items. A set X = {ij, . . . ,ik} ⊆ I, where j ≤ k and j, k ∈ [1, n] is called a pattern or an itemest. A transaction t = (tid, Y) is a couple where tid is a transaction-id and Y is a pattern. 
Let $size(t)$ be the size of $t$, i.e., the number of items in $Y$. A transaction database $DB$ over $I$ is a set of transactions $T = \{t_1, \ldots, t_m\}$, where $m = |DB|$ is the size of $DB$, i.e., the total number of transactions in $DB$. If $X \subseteq Y$, then $t$ contains $X$ (equivalently, $X$ occurs in $t$), and such a transaction is denoted $t_j^X$, $j \in [1, m]$. Therefore, $T^X = \{t_j^X, \ldots, t_k^X\}$, with $j \le k$ and $j, k \in [1, m]$, is the set of all transactions in which pattern $X$ occurs in $DB$.
A. Definition 1 (frequent pattern X)
The total number of transactions in $DB$ that contain pattern $X$ is called the support of $X$, i.e., $Sup(X)$. Hence $Sup(X) = |T^X|$, where $|T^X|$ is the size of $T^X$. The pattern $X$ is said to be frequent if its support is greater than or equal to the user-given minimum support threshold min_sup, denoted $\delta$, i.e., $Sup(X) \ge \delta$.
B. Definition 2 (regularity of frequent pattern X)
Let $t_{j+1}^X$ and $t_j^X$, $j \in [1, m-1]$, be two successive transactions in which the frequent pattern $X$ appears. The difference between these two successive transactions is defined as a period of $X$, say $p^X$ (i.e., $p^X = t_{j+1}^X - t_j^X$, $j \in [1, m-1]$). To simplify the calculation of periods, we treat the first transaction in the DB as null, i.e., $t_f = 0$, and the last transaction as the $m$-th transaction, i.e., $t_l = t_m$. For a given $T^X$, let $P^X$ be the set of all periods of $X$, i.e., $P^X = \{p_1^X, \ldots, p_r^X\}$, where $r$ is the total number of periods in $P^X$. Then the regularity of a frequent pattern $X$ is $Reg(X) = \max\{p_1^X, \ldots, p_r^X\}$. A frequent pattern $X$ is said to be regular frequent if its regularity is less than or equal to the user-given maximum regularity threshold $\lambda$.
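Definitions 1 and 2 can be illustrated with a minimal Python sketch. The database contents, the function names, and the threshold values below are hypothetical and chosen only for illustration; the sketch also assumes that transaction ids are the integers 1 to m, so that the boundary points are 0 for the null first transaction and m for the last one.

```python
def occurrence_tids(db, pattern):
    """Sorted ids of the transactions that contain every item of `pattern`."""
    return sorted(tid for tid, items in db.items() if pattern <= items)


def support(db, pattern):
    """Sup(X): number of transactions in which the pattern occurs."""
    return len(occurrence_tids(db, pattern))


def regularity(db, pattern):
    """Reg(X): largest gap between successive occurrences of the pattern.

    The gaps are measured over the points 0, the occurrence tids, and m,
    following the convention t_f = 0 and t_l = t_m (assumes tids are 1..m).
    """
    m = len(db)
    points = [0] + occurrence_tids(db, pattern) + [m]
    return max(b - a for a, b in zip(points, points[1:]))


def is_regular_frequent(db, pattern, min_sup, max_reg):
    """True if Sup(X) >= min_sup and Reg(X) <= max_reg."""
    return support(db, pattern) >= min_sup and regularity(db, pattern) <= max_reg


# Hypothetical six-transaction database: tid -> set of items.
db = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"a", "c"},
    4: {"a", "b", "c"},
    5: {"b"},
    6: {"a", "b"},
}

pattern = {"a", "b"}
print(occurrence_tids(db, pattern))  # [1, 4, 6]
print(support(db, pattern))          # 3
print(regularity(db, pattern))       # 3 (the gap between tids 1 and 4)
```

With these hypothetical values, the pattern {a, b} is regular frequent for a minimum support of 3 and a maximum regularity of 3, but not for a maximum regularity of 2.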
  • 2. III. DATA SOURCES
A. OLTP and OLAP
OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed and current data, and the schema used to store transactional databases is the entity model (usually 3NF).
OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
B. Big Data
Big Data describes the process of extracting actionable intelligence from disparate, and often non-traditional, data sources. These data sources may include structured data such as databases, sensor, click-stream, and location data, as well as unstructured data such as email, HTML, social data, and images. The actionable data may be represented visually (e.g., in a graph), but it is often distilled down to a structured format, which is then stored in a database for further manipulation.
C. Stock Market
The goal of this article is to introduce the concepts, terminology, and code structures required to develop applications that utilise real-time stock market data (e.g., trading applications). It discusses trading concepts and the different types of market data available, and provides a practical example of how to process data feed events into a market object model.
The article is aimed at intermediate to advanced developers who wish to gain an understanding of basic financial market data processing. I recommend that those who are already familiar with trading terminology skip ahead to the Market Data section.
D. Datasets
The real-world data sets Accidents and Chess are obtained from the FIMI Repository [2]; Chain-store is obtained from NU-MineBench 2.0 [5]; Foodmart is acquired from the Microsoft foodmart 2000 database. For all of these data sets except Chain-store and Foodmart, unit profits for items in the utility tables are generated between 1 and 1,000 using a log-normal distribution, and item quantities are generated randomly between 1 and 10. The two real data sets Chain-store and Foodmart already contain unit profits and purchased quantities. The total utilities of these two data sets are 26,388,499.8 and 120,160.84, respectively.
IV. TECHNIQUES USED FOR HIGH UTILITY ITEMSETS MINING
A. Mining Regular Frequent Patterns
In this section we describe the process of mining regular frequent patterns in incremental transactional databases using the vertical data format, which requires only one database scan. To generate length-1 itemsets, our algorithm constructs an item header table called the RFPID-table, which consists of four fields (Itemset, Tid, Sup, Reg). Itemset is the item name, Tid is the list of transactions in which the item occurs, Sup is the support of the itemset, and Reg is the regularity of the itemset. Each itemset has its own array to hold Tids and other intermediate results. Let Table 1 be the transactional database DB in horizontal format, which is somewhat similar to the database in [9]. The horizontal database is converted into a vertical database with one database scan, storing all length-1 items with their respective tids, support, and regularity. For example, let us consider the minimum support threshold value δ = 5 and maximum.
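The one-scan conversion from horizontal to vertical format can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the table layout mirrors the four fields named above (Itemset, Tid, Sup, Reg), but the function names, the sample database, and the thresholds used in the example are hypothetical, since Table 1 and the full threshold values are not reproduced in the text.

```python
from collections import defaultdict


def build_vertical_table(db):
    """Build a length-1 vertical table from a horizontal database in one scan.

    `db` is a list of transactions (iterables of items); the transaction id
    is the 1-based position in the list. Returns a dict mapping each item to
    {"tids": [...], "sup": int, "reg": int}.
    """
    m = len(db)
    tids = defaultdict(list)
    last_seen = defaultdict(int)  # the null first transaction has tid 0
    reg = defaultdict(int)        # largest period seen so far per item

    for tid, items in enumerate(db, start=1):
        for item in set(items):   # ignore duplicates inside one transaction
            tids[item].append(tid)
            reg[item] = max(reg[item], tid - last_seen[item])
            last_seen[item] = tid

    table = {}
    for item, tid_list in tids.items():
        # Close the last period against the final transaction m.
        final_reg = max(reg[item], m - last_seen[item])
        table[item] = {"tids": tid_list, "sup": len(tid_list), "reg": final_reg}
    return table


def regular_frequent_items(table, min_sup, max_reg):
    """Items whose support and regularity satisfy both thresholds."""
    return sorted(
        item
        for item, row in table.items()
        if row["sup"] >= min_sup and row["reg"] <= max_reg
    )


# Hypothetical horizontal database with six transactions.
db = [
    ["a", "b"],
    ["b", "c"],
    ["a", "c"],
    ["a", "b", "c"],
    ["b"],
    ["a", "b"],
]

table = build_vertical_table(db)
print(table["a"])  # {'tids': [1, 3, 4, 6], 'sup': 4, 'reg': 2}
print(regular_frequent_items(table, min_sup=4, max_reg=2))  # ['a', 'b']
```

Keeping the tid list per item is what allows longer itemsets to be evaluated later by intersecting tid lists instead of rescanning the database, which is the usual motivation for the vertical format.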
  • 3. B. Frequent Pattern Mining Tree: Design and Construction
Let $I = \{a_1, a_2, \ldots, a_n\}$ be a set of items, and let a transaction database be $DB = \langle T_1, T_2, \ldots, T_n \rangle$, where $T_i$ ($i \in [1..n]$) is a transaction which contains a set of items in $I$. The support (or occurrence frequency) of a pattern $A$, which is a set of items, is the number of transactions containing $A$ in $DB$. $A$ is a frequent pattern if its support is no less than a predefined minimum support threshold. Given a transaction database $DB$ and a minimum support threshold, the problem of finding the complete set of frequent patterns is called the frequent pattern mining problem.
C. UP-Growth Algorithm
UP-Growth [3] is one of the efficient algorithms for generating high utility itemsets; it depends on the construction of a global UP-Tree. In phase I, the UP-Tree framework follows three steps: (i) construct the UP-Tree [3]; (ii) generate PHUIs from the UP-Tree; (iii) identify high utility itemsets using the PHUIs. The global UP-Tree [3] is constructed using two strategies: (i) discarding global unpromising items (the DGU strategy), which eliminates low utility items and their utilities from the transaction utilities; and (ii) discarding global node utilities (the DGN strategy) during global UP-Tree construction. With the DGN strategy, the utilities of nodes nearer to the root of the UP-Tree are effectively reduced [15]. PHUIs are similar to TWU in that the utility of every itemset is computed with the help of an estimated utility. Finally, high utility itemsets (those with utility not less than min_sup) are identified from the PHUIs. The global UP-Tree contains many sub-paths. Each path is traced from the bottom node of the header table. This path is named the conditional pattern base (CPB).
D. Improved UP-Growth
The DGU and DGN strategies efficiently reduce the number of candidates in Phase 1 (i.e., the global UP-Tree).
However, they cannot be applied during the construction of the local UP-Tree (Phase 2). Instead, the DLU strategy (discarding local unpromising items) is used to discard the utilities of low utility items from the path utilities, and the DLN strategy (discarding local node utilities) is used to discard the item utilities of descendant nodes during local UP-Tree construction. Even so, the algorithm still faces some performance issues in Phase 2. To overcome this, the maximum transaction-weighted utilization (MTWU) is computed over all the items, and a multiple of min_sup is taken as the user-specified threshold value, as shown in the algorithm. This modification improves performance compared with the existing UP-Tree construction and also improves the performance of the UP-Growth algorithm. The improved utility pattern growth algorithm is abbreviated as IUPG.
V. LIMITATIONS OF FREQUENT ITEMSETS MINING
A. Frequent itemset mining in uncertain transaction databases is semantically problematic and has significant drawbacks, which lead to misleading results.
B. Apriori, while historically significant, suffers from a number of inefficiencies and trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets (the algorithm attempts to load the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all of its proper subsets.
VI. APPLICATIONS OF FREQUENT ITEMSETS MINING
A. Methodology/Principal Findings
The claims datasets of 1 million nationally representative people within Taiwan's National Health Insurance in 2005 were used to calculate the number of patients with one-stop visits. Frequent itemset mining was applied to compute the combination patterns of specialties in the one-stop visits.
Among the total of 13,682,469 ambulatory care visits in 2005, one-stop visits occurred 144,132 times and involved 296,822 visits (2.2% of all visits) by 66,294 (6.6%) persons. This behavior became more common with age, and the percentage reached 27.5% (5,662 of 20,579) in the age group ≥80 years. In general, women were more likely to have one-stop visits than men (7.2% vs. 6.0%). Internal medicine plus ophthalmology was the most frequent combination, occurring 3,552 times (2.5%), followed by cardiology plus neurology with 3,183 times (2.2%). The most frequent three-specialty combination, cardiology plus neurology and gastroenterology, occurred only 111 times.
B. Association Rule Learning
Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
VII. CONCLUSION
In this paper we have analyzed new enhanced frameworks for the recently presented UP-Growth and UP-Growth+ algorithms, with the aim of improving the processing time
  • 4. performance and mining performance in low-memory environments as well. We have reviewed the concepts of UP-Growth and UP-Growth+. We have presented the work done so far on the previous approaches, along with the datasets used. In the future, we will fully evaluate the proposed architecture and compare its performance against existing methods in order to establish the effectiveness and efficiency of the proposed approach.
VIII. ACKNOWLEDGEMENT
I express many thanks to Prof. Vinod S. Wadne for his great effort in supervising and leading me to accomplish this work. I thank the college and department staff, who were a great source of support and encouragement, and my friends and family for their warm, kind encouragement and love. To every person who gave me something to light my pathway, I thank you for believing in me.
REFERENCES
[1] S. J. Yen and Y. S. Lee, “Mining High Utility Quantitative Association Rules,” Proc. 9th Int'l Conf. Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science 4654, pp. 283-292, Sep. 2007.
[2] Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi/
[3] V. S. Tseng, C. W. Wu, B. E. Shie, and P. S. Yu, “UP-Growth: An Efficient Algorithm for High Utility Itemset Mining,” Proc. ACM-KDD, Washington, DC, USA, pp. 253-262, July 25–28, 2010.
[4] Y.-C. Li, J.-S. Yeh, and C.-C. Chang, “Isolated Items Discarding Strategy for Discovering High Utility Itemsets,” Data and Knowledge Eng., vol. 64, no. 1, pp. 198-217, Jan. 2008.
[5] C.H. Lin, D.Y. Chiu, Y.H. Wu, and A.L.P. Chen, “Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window,” Proc. SIAM Int’l Conf. Data Mining (SDM ’05), 2005.
[6] Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” Proc.
Utility-Based Data Mining Workshop, 2005.
[7] F. Tao, F. Murtagh, and M. Farid, “Weighted Association Rule Mining Using Weighted Support and Significance Framework,” Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD ’03), pp. 661-666, 2003.
[8] H. Dutta and J. Demme, “Distributed Storage of Large Scale Multidimensional EEG Data using Hadoop/HBase,” Grid and Cloud Database Management, New York City: Springer, 2011.
[9] G. Y. Ming and W. Zhi-jun, “A Vertical Format Algorithm for Mining Frequent Itemsets,” IEEE Transactions, pp. 11-13, 2010.
[10] M. J. Zaki and G. Karam, “Fast Vertical Mining Using Diffsets,” ACM SIGKDD, pp. 24-27, 2003.
[11] M. G. Elfeky, W. G. Aref, and A. K. Elmagarmid, “Periodicity Detection in Time Series Databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 7, pp. 875-887, 2005.
[12] A. Erwin, R.P. Gopalan, and N.R. Achuthan, “Efficient Mining of High Utility Itemsets from Large Data Sets,” Proc. 12th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD), pp. 554-561, 2008.
[13] R. Chan, Q. Yang, and Y. Shen, “Mining High Utility Itemsets,” Proc. IEEE Third Int’l Conf. Data Mining, pp. 19-26, Nov. 2003.
[14] U. Yun and J.J. Leggett, “WIP: Mining Weighted Interesting Patterns with a Strong Weight and/or Support Affinity,” Proc. SIAM Int’l Conf. Data Mining (SDM ’06), pp. 623-627, Apr. 2006.
[15] U. Yun, “An Efficient Mining of Weighted Frequent Patterns with Length Decreasing Support Constraints,” Knowledge-Based Systems, vol. 21, no. 8, pp. 741-752, Dec. 2008.