IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 3, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 659
Mining Frequent Item set Using Genetic Algorithm
Hardik Patel¹, Prof. Jigar Patel²
¹,² Department of Computer Science
¹,² Alpha College of Engg and technology, Khatraj, Ahmadabad 382 481, India
Abstract— By applying rule mining algorithms, frequent
itemsets are generated from large data sets; the Apriori
algorithm is one example. Computing all frequent itemsets
takes a great deal of computer time. This problem can be
solved more efficiently by using a Genetic Algorithm (GA):
a GA performs a global search, and its time complexity is
lower than that of algorithms based on a greedy approach.
Genetic Algorithms (GAs) are adaptive heuristic search and
optimization methods, based on the evolutionary ideas of
natural selection and genetics, for solving both constrained
and unconstrained problems. The main aim of this work is to
find all the frequent itemsets in a given data set using a
genetic algorithm and to compare the results generated by
the GA with those of other algorithms. Population size,
number of generations, crossover probability, and mutation
probability are the GA parameters that affect the quality of
the result and the computation time.
I. INTRODUCTION
Frequent itemset (or pattern) mining is widely studied in
the data mining field because of its broad applications in
mining association rules, correlations, graph patterns,
constraint-based frequent patterns, sequential patterns,
and many other data mining tasks. Efficient algorithms for
mining frequent itemsets are critical for mining
association rules as well as for many other data mining
tasks. The major challenge in frequent pattern mining is
the large number of result patterns: as the minimum support
threshold becomes lower, an exponentially large number of
itemsets is generated. Pruning unimportant patterns
effectively during the mining process has therefore become
one of the main topics in frequent pattern mining. The main
aim is to make the process of finding patterns efficient
while still detecting the important patterns, which can
then be used in various ways.
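The effect of the threshold on the number of result patterns can be seen on a very small example. The following is a minimal sketch, not a method from this paper: the transactions and item names are invented for illustration, the threshold is expressed as an absolute support count (`min_count`), and the search is a brute-force enumeration of every subset of the items.

```python
from itertools import combinations

# Hypothetical toy transactions; the item names are made up for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
    {"bread", "milk", "butter", "jam"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set().union(*transactions))
    result = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if support_count(set(combo), transactions) >= min_count:
                result.append(frozenset(combo))
    return result


if __name__ == "__main__":
    # Lowering the threshold step by step lets more itemsets qualify.
    for min_count in (4, 3, 2, 1):
        n = len(frequent_itemsets(TRANSACTIONS, min_count))
        print(f"min_count={min_count}: {n} frequent itemsets")
```

With only four items there are 15 candidate itemsets in total; each additional item doubles the number of subsets, which is why exhaustive enumeration does not scale and pruning matters.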
Genetic algorithms (GAs), inspired by biological
evolution, are efficient, domain-independent search methods.
That is, these methods can help us solve problems
effectively in different application domains. The goals of
Holland's research have been twofold: first, to abstract and
rigorously explain the adaptive processes of natural systems;
second, to design artificial systems software that retains the
important mechanisms of natural systems. These methods can
be applied in many fields and perform well. From
the viewpoint of AI research, Holland's method provides a
good mechanism of learning.
GAs are population-based search techniques that
maintain populations of potential solutions during searches.
A string with a fixed bit-length usually represents a potential
solution. In order to evaluate each potential solution, GAs
need a payoff (or reward, objective) function that assigns a
scalar payoff to any particular solution. Once the
representation scheme and evaluation function are
determined, a GA can start searching. Initially, often at
random, GAs create a certain number, called the population
size, of strings to form the first generation. Next, the payoff
function is used to evaluate each solution in this first
generation. Better solutions obtain higher payoffs. Then, on
the basis of these evaluations, some genetic operations are
employed to generate the next generation. The procedures of
evaluation and generation are performed iteratively until the
optimal solution(s) is (are) found or the time allotted for
computation ends.
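For frequent itemset mining, one plausible way to instantiate this description is to give each item one bit position and to use the support count of the encoded itemset as the payoff. The sketch below makes exactly these assumptions; the item list, the transactions, and the choice of support count as payoff are illustrative and are not taken from this paper.

```python
import random

# Hypothetical item universe and transactions, for illustration only.
ITEMS = ["bread", "butter", "jam", "milk"]
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
]


def decode(bits):
    """Map a fixed-length bit string to the itemset it represents."""
    return {item for item, bit in zip(ITEMS, bits) if bit == "1"}


def payoff(bits):
    """Scalar payoff: support count of the encoded itemset (assumed choice)."""
    itemset = decode(bits)
    if not itemset:
        return 0  # the empty itemset carries no information
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def random_population(size, rng):
    """First generation: `size` random bit strings of length len(ITEMS)."""
    return ["".join(rng.choice("01") for _ in ITEMS) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    for bits in random_population(6, rng):
        print(bits, sorted(decode(bits)), payoff(bits))
```

A fixed-length string keeps crossover and mutation simple, since every position has the same meaning in every individual.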
The goal of this paper is to review different methods for
mining frequent itemsets.
II. DIFFERENT METHODS OF MINING FREQUENT
ITEM SETS
A. An Algorithm for Frequent Pattern Mining Based On
Apriori:
Frequent pattern mining is a heavily researched area in the
field of data mining with a wide range of applications. Mining
frequent patterns from large-scale databases has emerged as
an important problem in the data mining and knowledge
discovery community, and a number of algorithms have been
proposed to determine frequent patterns. The Apriori algorithm
was the first algorithm proposed in this field. Over time, a
number of changes to Apriori have been proposed to improve
its performance in terms of time and number of database passes.
In this paper, three different frequent pattern mining
approaches (Record filter, Intersection, and Proposed
Algorithm) are given, based on the classical Apriori algorithm.
Of these approaches, the Record filter approach proved better
than the classical Apriori algorithm, the Intersection approach
proved better than the Record filter approach, and finally the
proposed algorithm proved much better than the other
frequent pattern mining algorithms. Lastly, we perform a
comparative study of all approaches on a dataset of 2000
transactions.
Conclusion- Association rule mining has a wide range of
applications, such as market basket analysis, medical
diagnosis and research, website navigation analysis, homeland
security, and so on. In this method, we surveyed the
existing association rule mining techniques and compared
these algorithms with our modified approach. The
conventional algorithm for association rule discovery
proceeds in two or more steps; in our approach, the
discovery of all frequent itemsets takes the same steps but
takes less time than the conventional
algorithm. We can conclude that the key idea of this new
approach is reducing time: as shown
above, the proposed Apriori algorithm takes less time
than the classical Apriori algorithm. This is
fruitful in saving time in the case of large databases.
This key idea opens a new gateway for
upcoming researchers to work in the field of data mining.
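For reference, the classical level-wise Apriori procedure that these variants build on can be sketched as follows. This is a minimal illustration of the textbook algorithm only; it does not implement the Record filter, Intersection, or proposed variants, and the transactions and the absolute threshold `min_count` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy transactions; the item names are made up for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
    {"bread", "milk", "butter", "jam"},
]


def apriori(transactions, min_count):
    """Return {itemset: support count} for itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass over the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One database pass per level to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    for itemset, count in sorted(apriori(TRANSACTIONS, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The sketch makes one database pass per itemset size, which is the cost that improvements to Apriori typically try to reduce.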
B. Efficient Algorithm For Mining Frequent Itemsets
Using Clustering Techniques
Nowadays, association rules play an important role. The
purchase of one product when another product is
purchased represents an association rule. The Apriori
algorithm is the basic algorithm for mining association rules.
This paper presents an efficient Partition Algorithm for
Mining Frequent Itemsets (PAFI) using clustering. This
algorithm finds the frequent itemsets by partitioning the
database transactions into clusters. Clusters are formed
based on similarity measures between the transactions.
It then finds the frequent itemsets directly from the
transactions in the clusters using an improved Apriori
algorithm, which further reduces the number of database
scans and hence improves efficiency.
In this method, the Partition Algorithm for
Frequent Itemsets (PAFI) is applied before the
Improved Apriori Algorithm. This algorithm reduces the
number of database scans and improves efficiency and
computing time by taking advantage of the clustering
technique. Experimental results show that it obtains higher
efficiency.
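The text above does not state which similarity measure or clustering procedure PAFI uses. Purely as an illustration of grouping transactions by similarity, the sketch below assumes Jaccard similarity and a simple greedy "leader" clustering; both choices, the function names, and the threshold are hypothetical and should not be read as the PAFI algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| between two transactions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_transactions(transactions, threshold):
    """Greedy leader clustering (assumed, not PAFI itself).

    Each transaction joins the first cluster whose leader (its first
    member) is at least `threshold` similar; otherwise it starts a new
    cluster of its own.
    """
    clusters = []
    for t in transactions:
        for cluster in clusters:
            if jaccard(t, cluster[0]) >= threshold:
                cluster.append(t)
                break
        else:
            clusters.append([t])
    return clusters


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
    for i, cluster in enumerate(cluster_transactions(data, 0.5)):
        print(i, [sorted(t) for t in cluster])
```

Once transactions are grouped, a frequent itemset search can be run per cluster, which is the general idea the section describes.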
C. Efficient Hardware Data Mining For Frequent Item-set
With Apriori Algorithm
The Apriori algorithm is a popular correlation-based data
mining kernel. However, it is a computationally expensive
algorithm, and running times can stretch to days for
large databases, as database sizes can extend to gigabytes.
Through the use of a new extension to the systolic array
architecture, the time required for processing can be
significantly reduced. Our array architecture implementation
on a Xilinx Virtex-II Pro 100 provides a performance
improvement that can be orders of magnitude faster than
state-of-the-art software implementations. The system is
easily scalable and introduces an efficient "systolic
injection" method for intelligently reporting unpredictably
generated mid-array results to a controller without any
chance of collision or excessive stalling.
FPGA implementations of the Apriori algorithm
can provide significant performance improvement over
software-based approaches. We are also interested in
implementing some of the more recent (and more control-
intensive and memory-intensive) approaches in hardware,
including hash-based strategies such as DHP and trie-based
approaches. It may be possible to increase the bandwidth of
the system by processing several sub-partitions of a set in
parallel. We are also interested in leveraging our experience
with high-performance string matching for autonomous
pattern generation for network security.
D. Mining Frequent Item set for Non Binary Data set using
Genetic Algorithm
Frequent itemset mining is a basic problem in data mining
and knowledge discovery. The discovered patterns can be
used as input for association rules, which are useful in
many application domains. We have considered a large
database of customer transactions from a supermarket. Each
transaction consists of the items purchased by a customer in
a visit. We present an efficient algorithm that generates all
significant association rules between items in the database.
In general, association rule mining
algorithms such as Apriori, partition, pincer-search,
incremental, and border algorithms do not consider
negated occurrences of attributes, and they also take
more time to compute all the frequent itemsets. By using a
Genetic Algorithm (GA), we can improve the scenario:
the system can predict rules that contain negative
attributes, even with more than one
attribute in the consequent part. The major advantage of
using a GA in the discovery of frequent itemsets is
that it performs a global search and its time complexity is
lower than that of other algorithms, which are based on
the greedy approach. The main aim of this method is to find
all possible frequent itemsets in a given dataset using the
genetic algorithm.
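To make the idea of negative attributes concrete, the sketch below counts the transactions that contain some items and lack others. The three-valued gene encoding (0 = item ignored, 1 = item present, 2 = item absent), the item list, and the transactions are assumptions chosen for illustration; the source does not specify how the reviewed method encodes negation.

```python
# Hypothetical item universe and transactions, for illustration only.
ITEMS = ["bread", "butter", "jam", "milk"]
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
    {"bread", "milk", "butter", "jam"},
]


def decode(genes):
    """Split a gene list into (items required present, items required absent)."""
    present = {item for item, g in zip(ITEMS, genes) if g == 1}
    absent = {item for item, g in zip(ITEMS, genes) if g == 2}
    return present, absent


def support_with_negation(genes, transactions):
    """Count transactions with every `present` item and no `absent` item."""
    present, absent = decode(genes)
    return sum(1 for t in transactions
               if present <= t and not (absent & t))


if __name__ == "__main__":
    # "bread AND NOT milk"
    print(support_with_negation([1, 0, 0, 2], TRANSACTIONS))
    # "butter AND NOT jam"
    print(support_with_negation([0, 1, 2, 0], TRANSACTIONS))
```

Algorithms that only track which items occur in a transaction cannot express a pattern such as "bread and not milk"; an encoding with an explicit "absent" value can.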
III. INTRODUCTION TO GENETIC ALGORITHM
Genetic algorithms (GAs), inspired by biological
evolution, are efficient, domain-independent search
methods. That is, these methods can help us solve problems
effectively in different application domains. The goals
of Holland's research have been twofold: first, to abstract
and rigorously explain the adaptive processes of natural
systems; second, to design artificial systems software that
retains the important mechanisms of natural systems. These
methods can be applied in many fields and perform
well. From the viewpoint of AI research, Holland's method
provides a good mechanism of learning.
Genetic Algorithms are population-based search
techniques that maintain populations of potential solutions
during searches. A string with a fixed bit-length usually
represents a potential solution. In order to assess each
potential solution, GAs need a payoff (or reward, objective)
function that assigns a scalar payoff to any particular
solution. Once the representation scheme and evaluation
function are determined, a GA can start searching. Initially,
often at random, GAs create a certain number, called the
population size, of strings to form the first generation. Next,
the payoff function is used to evaluate each solution in this
first generation. Better solutions obtain higher payoffs.
Then, on the basis of these evaluations, some genetic
operations are employed to generate the next generation.
A. SIMPLE GENETIC ALGORITHM
1. [Start] Generate a random population of n chromosomes
(suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome
x in the population
3. [New population] Create a new population by
repeating the following steps until the new population is
complete
a) [Selection] Select two parent chromosomes from the
population according to their fitness (the better the
fitness, the bigger the chance of being selected)
b) [Crossover] With a crossover probability, cross
over the parents to form new offspring (children).
If no crossover is performed, the offspring are
exact copies of the parents.
c) [Mutation] With a mutation probability, mutate
the new offspring at each locus (position in the
chromosome).
d) [Accepting] Place the new offspring in the new
population
4. [Replace] Use the newly generated population for a further
run of the algorithm
5. [Test] If the end condition is satisfied, stop, and return
the best solution in the current population
6. [Loop] Go to step 2
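The steps above can be sketched for frequent itemset mining as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: chromosomes are bit lists with one bit per item, fitness is the support count, selection is roulette-wheel, crossover is single-point, the end condition is a fixed number of generations, and all parameter values, item names, and transactions are invented for the example.

```python
import random

# Hypothetical item universe and transactions, for illustration only.
ITEMS = ["bread", "butter", "jam", "milk"]
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
    {"bread", "milk", "butter", "jam"},
]


def fitness(chrom):
    """[Fitness] Support count of the encoded itemset (assumed fitness)."""
    itemset = {item for item, bit in zip(ITEMS, chrom) if bit}
    if not itemset:
        return 0
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def select(population, rng):
    """[Selection] Roulette wheel; +1 keeps every weight positive."""
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, p_cross, rng):
    """[Crossover] Single-point with probability p_cross, else copies."""
    if rng.random() < p_cross:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chrom, p_mut, rng):
    """[Mutation] Flip each locus independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chrom]


def run_ga(min_count, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.05, seed=0):
    """Return the set of frequent itemsets visited during the search."""
    rng = random.Random(seed)
    # [Start] random initial population
    population = [[rng.randint(0, 1) for _ in ITEMS] for _ in range(pop_size)]
    found = set()

    def record(pop):
        for chrom in pop:
            if fitness(chrom) >= min_count:
                found.add(frozenset(i for i, bit in zip(ITEMS, chrom) if bit))

    record(population)
    for _ in range(generations):  # [Test]/[Loop] fixed generation budget
        new_population = []  # [New population]
        while len(new_population) < pop_size:
            p1, p2 = select(population, rng), select(population, rng)
            c1, c2 = crossover(p1, p2, p_cross, rng)
            new_population.append(mutate(c1, p_mut, rng))  # [Accepting]
            new_population.append(mutate(c2, p_mut, rng))
        population = new_population[:pop_size]  # [Replace]
        record(population)
    return found


if __name__ == "__main__":
    for itemset in sorted(run_ga(min_count=2), key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset))
```

Because the search is randomized and bounded by the generation budget, this sketch reports only the frequent itemsets it happens to visit; population size, number of generations, and the crossover and mutation probabilities control how much of the search space is covered.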
IV. GENETIC ALGORITHM VS TRADITIONAL
METHODS
The following list gives the essential differences between
GAs and other forms of optimization.
1. Genetic algorithms work with a coded form of the function
values (parameter set), rather than with the actual values
themselves. So, for example, if we want to find the
minimum of the function f(x) = x^3 + x^2 + 5, the GA would
not deal directly with x or y values, but with strings that
encode these values. For this case, strings representing the
binary x values should be used.
2. Genetic algorithms use a set, or population, of
points to conduct a search, not just a single point in the
problem space. This gives GAs the power to search noisy
spaces littered with local optimum points. Instead of
relying on a single point to search through the space, a
GA looks at many different areas of the problem space at
once, and uses all of this information to guide it.
3. Genetic algorithms use only payoff information to
guide themselves through the problem space. Many search
techniques need a variety of information to guide
themselves. Hill climbing methods require derivatives, for
example. The only information a GA needs is some
measure of fitness about a point in the space (sometimes
known as an objective function value). Once the GA knows
the current measure of "goodness" about a point, it can use
this to continue searching for the optimum.
4. GAs are probabilistic in nature, not deterministic.
This is a direct result of the randomization techniques used
by GAs.
5. GAs are inherently parallel. Here lies one of the
most powerful features of genetic algorithms. GAs, by their
nature, are highly parallel, dealing with a large number of
points (strings) simultaneously.
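The coded representation described in point 1 can be illustrated for f(x) = x^3 + x^2 + 5. The text does not give a domain for x, so the sketch below assumes 5-bit strings that encode the integers 0 to 31; on the unrestricted real line the function has no minimum. The string length and the sign convention for the payoff are assumptions.

```python
N_BITS = 5  # assumed string length; encodes the integers 0..31


def encode(x):
    """Integer -> fixed-length binary string."""
    return format(x, f"0{N_BITS}b")


def decode(bits):
    """Fixed-length binary string -> integer."""
    return int(bits, 2)


def f(x):
    """The example objective from the text."""
    return x**3 + x**2 + 5


def payoff(bits):
    """Minimisation: lower f(x) is better, so the payoff is -f(x)."""
    return -f(decode(bits))


if __name__ == "__main__":
    for x in (0, 5, 31):
        bits = encode(x)
        print(bits, decode(bits), f(x), payoff(bits))
```

The GA operators only ever see strings such as 00101; the payoff function is the single place where a string is turned back into an x value and evaluated.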
| Criterion | Ad hoc approach (analytical, specific) | Genetic approach |
| Speed | Depending on solution, generally good | Median or low |
| Performance | Depending on solution | Fair to excellent |
| Problem understanding | Necessary | Not necessary |
| Human work needed | A few minutes to a few theses | A few days |
| Applicability | Low: most interesting problems have no usable mathematical expression, or are non-computable, or "NP-complete" (too many solutions to try them all) | General |
| Intermediary steps | are not solutions (you must wait until the end of computation) | are solutions (the solving process can be interrupted at any time, though the later the better) |
Table 1: Comparison of GA with traditional algorithms.
[1] Agrawal R., Imielinski T. and Swami A. (1993) Mining
Association rules between sets of items in large
databases, In the Proc. of the ACM SIGMOD Int’l
Conf. on Management of Data (ACM SIGMOD
‘93),Washington, USA, 207-216.
[2] Pei M., Goodman E.D., Punch F. (2000) Feature
Extraction using genetic algorithm, CaseCenter for
Computer-Aided Engineering and built-up W.
Department of Computer Science.
[3] Han J., Kamber M. Data Mining: Concepts &
Techniques, Morgan & Kaufmann, 2000.
[4] Pujari A.K., Data Mining Techniques, Universities
Press, 2001.
[5] Arun K Pujari. Data Mining Techniques (Edition
5th):Hyderabad, India: Universities Press (India)
Private Limited, 2003.
[6] Jiawei Han. Data Mining, concepts and Techniques:
San Francisco, CA: Morgan Kaufmann
Publishers.,2004.

More Related Content

PDF
Frequent Item Set Mining - A Review
PDF
Review Over Sequential Rule Mining
PDF
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System
PDF
H044063843
PDF
Ay4201347349
PDF
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
PDF
A cyber physical stream algorithm for intelligent software defined storage
PDF
(2016)application of parallel glowworm swarm optimization algorithm for data ...
Frequent Item Set Mining - A Review
Review Over Sequential Rule Mining
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System
H044063843
Ay4201347349
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
A cyber physical stream algorithm for intelligent software defined storage
(2016)application of parallel glowworm swarm optimization algorithm for data ...

What's hot (20)

PDF
International Journal of Engineering Research and Development
PDF
3.[18 22]hybrid association rule mining using ac tree
PDF
Ijricit 01-002 enhanced replica detection in short time for large data sets
PDF
Machine learning in the life sciences with knime
PDF
IRJET- A Review of Data Cleaning and its Current Approaches
PDF
Feature Subset Selection for High Dimensional Data using Clustering Techniques
PDF
V34132136
PDF
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
PDF
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
PDF
An Improved Differential Evolution Algorithm for Data Stream Clustering
PDF
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
PDF
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
PDF
A genetic based research framework 3
DOCX
Ontology based clustering algorithms
PDF
A Novel Framework on Web Usage Mining
PDF
IRJET- Reversible Data Hiding using Histogram Shifting Method: A Critical Review
PDF
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
PDF
Deep learning for medical imaging
PDF
Drug Repurposing using Deep Learning on Knowledge Graphs
PDF
Artificial Intelligence for Automating Data Analysis
International Journal of Engineering Research and Development
3.[18 22]hybrid association rule mining using ac tree
Ijricit 01-002 enhanced replica detection in short time for large data sets
Machine learning in the life sciences with knime
IRJET- A Review of Data Cleaning and its Current Approaches
Feature Subset Selection for High Dimensional Data using Clustering Techniques
V34132136
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
An Improved Differential Evolution Algorithm for Data Stream Clustering
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
A genetic based research framework 3
Ontology based clustering algorithms
A Novel Framework on Web Usage Mining
IRJET- Reversible Data Hiding using Histogram Shifting Method: A Critical Review
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
Deep learning for medical imaging
Drug Repurposing using Deep Learning on Knowledge Graphs
Artificial Intelligence for Automating Data Analysis
Ad

Viewers also liked (15)

PDF
Performance Analysis of Small Sized Engine with Supercharger using Gasoline -...
PDF
An authentication framework for wireless sensor networks using Signature Base...
PDF
Review on variants of Security aware AODV
PDF
Survey of Modified Routing Protocols for Mobile Ad-hoc Network
PDF
Segment Combination based Approach for Energy- Aware Multipath Communication ...
PDF
Design & Development of Articulated Inspection ARM for in House Inspection in...
PDF
Design and Simulation of 4-bit DAC Decoder Using Custom Designer
PDF
Low Power Design flow using Power Format
PDF
Analysis on Data Fusion Techniques for Combining Conflicting Beliefs
PDF
State of the Art in Cloud Security
PDF
A Review on Image Denoising using Wavelet Transform
PDF
Analysis and Detection of Image Forgery Methodologies
PDF
The Optimizing Multiple Travelling Salesman Problem Using Genetic Algorithm
PDF
Optimization Approach for Capacitated Vehicle Routing Problem Using Genetic A...
PDF
SDH (Synchronous Digital Hierarchy) & Its Architecture
Performance Analysis of Small Sized Engine with Supercharger using Gasoline -...
An authentication framework for wireless sensor networks using Signature Base...
Review on variants of Security aware AODV
Survey of Modified Routing Protocols for Mobile Ad-hoc Network
Segment Combination based Approach for Energy- Aware Multipath Communication ...
Design & Development of Articulated Inspection ARM for in House Inspection in...
Design and Simulation of 4-bit DAC Decoder Using Custom Designer
Low Power Design flow using Power Format
Analysis on Data Fusion Techniques for Combining Conflicting Beliefs
State of the Art in Cloud Security
A Review on Image Denoising using Wavelet Transform
Analysis and Detection of Image Forgery Methodologies
The Optimizing Multiple Travelling Salesman Problem Using Genetic Algorithm
Optimization Approach for Capacitated Vehicle Routing Problem Using Genetic A...
SDH (Synchronous Digital Hierarchy) & Its Architecture
Ad

Similar to Mining Frequent Item set Using Genetic Algorithm (20)

PDF
A Survey on Frequent Patterns To Optimize Association Rules
PDF
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
PDF
Discovering Frequent Patterns with New Mining Procedure
PDF
J017114852
PDF
A classification of methods for frequent pattern mining
PDF
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
PDF
B0950814
PDF
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
PDF
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
PPT
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
PDF
Ej36829834
PDF
An improved apriori algorithm for association rules
PDF
Volume 2-issue-6-2081-2084
PDF
Volume 2-issue-6-2081-2084
PPT
Mining Frequent Patterns, Association and Correlations
PDF
A Brief Overview On Frequent Pattern Mining Algorithms
PDF
D05333034
PPTX
Data mining techniques unit III
PDF
06FPBasic02.pdf
PDF
Ijcatr04051008
A Survey on Frequent Patterns To Optimize Association Rules
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
Discovering Frequent Patterns with New Mining Procedure
J017114852
A classification of methods for frequent pattern mining
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
B0950814
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Ej36829834
An improved apriori algorithm for association rules
Volume 2-issue-6-2081-2084
Volume 2-issue-6-2081-2084
Mining Frequent Patterns, Association and Correlations
A Brief Overview On Frequent Pattern Mining Algorithms
D05333034
Data mining techniques unit III
06FPBasic02.pdf
Ijcatr04051008

More from ijsrd.com (20)

PDF
IoT Enabled Smart Grid
PDF
A Survey Report on : Security & Challenges in Internet of Things
PDF
IoT for Everyday Life
PDF
Study on Issues in Managing and Protecting Data of IOT
PDF
Interactive Technologies for Improving Quality of Education to Build Collabor...
PDF
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
PDF
A Study of the Adverse Effects of IoT on Student's Life
PDF
Pedagogy for Effective use of ICT in English Language Learning
PDF
Virtual Eye - Smart Traffic Navigation System
PDF
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
PDF
Understanding IoT Management for Smart Refrigerator
PDF
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
PDF
A Review: Microwave Energy for materials processing
PDF
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
PDF
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
PDF
Making model of dual axis solar tracking with Maximum Power Point Tracking
PDF
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
PDF
Study and Review on Various Current Comparators
PDF
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
PDF
Defending Reactive Jammers in WSN using a Trigger Identification Service.
IoT Enabled Smart Grid
A Survey Report on : Security & Challenges in Internet of Things
IoT for Everyday Life
Study on Issues in Managing and Protecting Data of IOT
Interactive Technologies for Improving Quality of Education to Build Collabor...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
A Study of the Adverse Effects of IoT on Student's Life
Pedagogy for Effective use of ICT in English Language Learning
Virtual Eye - Smart Traffic Navigation System
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Understanding IoT Management for Smart Refrigerator
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
A Review: Microwave Energy for materials processing
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
Making model of dual axis solar tracking with Maximum Power Point Tracking
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
Study and Review on Various Current Comparators
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Defending Reactive Jammers in WSN using a Trigger Identification Service.

Recently uploaded (20)

PPTX
Sustainable Sites - Green Building Construction
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PDF
PPT on Performance Review to get promotions
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Welding lecture in detail for understanding
PPTX
web development for engineering and engineering
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
Lecture Notes Electrical Wiring System Components
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
Sustainable Sites - Green Building Construction
Internet of Things (IOT) - A guide to understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPT on Performance Review to get promotions
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
CYBER-CRIMES AND SECURITY A guide to understanding
CH1 Production IntroductoryConcepts.pptx
Welding lecture in detail for understanding
web development for engineering and engineering
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Lecture Notes Electrical Wiring System Components
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Foundation to blockchain - A guide to Blockchain Tech

Mining Frequent Item set Using Genetic Algorithm

  • 1. IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 3, 2013 | ISSN (online): 2321-0613 All rights reserved by www.ijsrd.com 659 Mining Frequent Item set Using Genetic Algorithm Hardik Patel1 Prof. Jigar Patel2 1, 2 Department of Computer Science 1, 2 Alpha College of Engg and technology Khatraj, Ahmadabad 382 481, India Abstract— By applying rule mining algorithms, frequent itemsets are generated from large data sets e.g. Apriori algorithm. It takes so much computer time to compute all frequent itemsets. We can solve this problem much efficiently by using Genetic Algorithm(GA). GA performs global search and the time complexity is less compared to other algorithms. Genetic Algorithms (GAs) are adaptive heuristic search & optimization method for solving both constrained and unconstrained problems based on the evolutionary ideas of natural selection and genetic. The main aim of this work is to find all the frequent itemsets from given data sets using genetic algorithm & compare the results generated by GA with other algorithms. Population size, number of generation, crossover probability, and mutation probability are the parameters of GA which affect the quality of result and time of calculation. I. INTRODUCTION Studies of Frequent Itemset (or pattern) Mining is recognized in the data mining field because of its large applications in mining association rules, correlations, and graph pattern constraint based on frequent patterns, sequential patterns, and many other data mining tasks. Capable algorithms for mining frequent itemsets are critical for mining association rules as well as for many other data mining tasks. The major challenge found in frequent pattern mining is a large number of result patterns. 
As the minimum threshold becomes lower, an exponentially large number of itemsets are generated.So, pruning unimportant patterns can be done effectively in mining process and that becomes one of the main topics in frequent pattern mining. Therefore, the main aim is to optimize the process of finding patterns which should be efficient and can detect the important patterns which can be used in various ways. Genetic algorithms (GAs), inspired by biological evolution, are efficient domain independent search methods. That is, these methods could help us in effectively solving problem in different application domain. The goals of Holland's research have been twofold: First, to abstract and rigorously explain the adaptive processes of nature systems. Second, to design artificial systems software that retains the important mechanisms of nature system. These methods are capable of applying in many fields and perform well. From the viewpoint of AI research, Holland's method provides a good mechanism of learning. GAs are population-based search techniques that maintain populations of potential solutions during searches. A string with a fixed bit-length usually represents a potential solution. In order to evaluate each potential solution, GAs need a payoff (or reward, objective) function that assigns scalar payoff to any particular solution. Once the representation scheme and evaluation function is determined, a GA can start searching. Initially, often at random, GAs create a certain number, called the population size, of strings to form the first generation. Next, the payoff function is used to evaluate each solution in this first generation. Better solutions obtain higher payoffs. Then, on the basis of these evaluations, some genetic operations are employed to generate the next generation. The procedures of evaluation and generation are iteratively performed until the optimal solution(s) is (are) found or the time allotted for computation ends. 
The goal of this paper is to review mining frequent item sets using different methods. II. DIFFERENT METHODS OF MINING FREQUENT ITEM SETS A. An Algorithm for Frequent Pattern Mining Based On Apriori: Frequent pattern mining is a heavily researched area in the field of data mining with wide range of applications. Mining frequent patterns from largescale databases has emerged as an important problem in data mining and knowledge discovery community number of algorithms has been proposed to determine frequent pattern. Apriori algorithm is the first algorithm proposed in this field. With the time a number of changes proposed in Apriori to enhance the performance in term of time and number of database passes. In this paper three different frequent pattern mining approaches (Record filter, Intersection and Proposed Algorithm) are given based on classical Apriori algorithm. In these approaches Record filter approach proved better than classical Apriori Algorithm, Intersection approach proved better than Record filter approach and finally proposed algorithm proved that it is much better than other frequent pattern mining algorithm. In last we perform a comparative study of all approaches on dataset of 2000 transaction. Conclusion- Association rule mining has a wide range of applicability such as market basket analysis, medical diagnosis/ research, website navigation analysis, homeland security and so on. In this method, we surveyed the list of existing association rule mining techniques and compare these algorithms with our modified approach. The conventional algorithm of association rules discovery proceeds in two and more steps but in our approach discovery of all frequent item will take the same steps but it will take the less time as compare to the conventional algorithm. We can conclude that in this new approach, we have the key ideas of reducing time. As we have proved above how the proposed Apriori algorithm take less time than that of classical apriori algorithms. 
This saving should be especially valuable for large databases, and the key idea is expected to open a new avenue for researchers working in the field of data mining.
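For reference, the classical level-wise Apriori algorithm that these approaches build on can be sketched as follows. This is a minimal Python sketch of the textbook method only; it does not implement the Record filter, Intersection, or proposed variants, whose details are not given here. The function name `apriori` and the use of an absolute count for `min_support` are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # One database pass counts the surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each iteration of the `while` loop corresponds to one pass over the database, which is the cost that the modified approaches described above aim to reduce.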
B. Efficient Algorithm For Mining Frequent Itemsets Using Clustering Techniques

Nowadays, association rules play an important role. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This method presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. The algorithm finds frequent itemsets by partitioning the database transactions into clusters, which are formed based on similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency. PAFI is applied before the improved Apriori algorithm. By taking advantage of the clustering technique, it reduces the number of database scans and improves efficiency and computing time. Experimental results indicate that it obtains higher efficiency.

C. Efficient Hardware Data Mining For Frequent Item-set With Apriori Algorithm

The Apriori algorithm is a popular correlation-based data mining kernel. However, it is computationally expensive, and running times can stretch to days for large databases, as database sizes can extend to gigabytes. Through the use of a new extension to the systolic array architecture, the time required for processing can be significantly reduced. The array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than state-of-the-art software implementations.
The system is easily scalable and introduces an efficient systolic injection method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling. FPGA implementations of the Apriori algorithm can provide significant performance improvement over software-based approaches. The authors of this method are also interested in implementing some of the more recent (and more control-intensive and memory-intensive) approaches in hardware, including hash-based strategies such as DHP and trie-based approaches. It may be possible to increase the bandwidth of the system by processing several sub-partitions of a set in parallel. They are also interested in leveraging their experience with high-performance string matching for autonomous pattern generation for network security.

D. Mining Frequent Item set for Non Binary Data set using Genetic Algorithm

Frequent itemset mining is a basic problem in data mining and knowledge discovery. The discovered patterns can be used as input for association rules, which are useful in many application domains. This method considers a large database of customer transactions from a supermarket, in which each transaction consists of the items purchased by a customer in one visit, and presents an efficient algorithm that generates all significant association rules between items in the database. In general, association rule mining algorithms such as Apriori, Partition, Pincer-Search, Incremental, and Border do not consider negated occurrences of attributes, and they take more time to compute all the frequent itemsets. Using a Genetic Algorithm (GA) improves on this: the system can predict rules that contain negative attributes, even with more than one attribute in the consequent part.
The major advantage of using a GA to discover frequent itemsets is that it performs a global search and its time complexity is lower than that of other algorithms, which are based on the greedy approach. The main aim of this method is to find all possible frequent itemsets in a given dataset using the genetic algorithm.

III. INTRODUCTION TO GENETIC ALGORITHM

Genetic algorithms (GAs), inspired by biological evolution, are efficient, domain-independent search methods; that is, they can help solve problems effectively in different application domains. The goals of Holland's research were twofold: first, to abstract and rigorously explain the adaptive processes of natural systems; second, to design artificial systems software that retains the important mechanisms of natural systems. These methods can be applied in many fields and perform well. From the viewpoint of AI research, Holland's method provides a good learning mechanism. Genetic algorithms are population-based search techniques that maintain populations of potential solutions during the search. A potential solution is usually represented by a string of fixed bit length. To assess each potential solution, a GA needs a payoff (or reward, or objective) function that assigns a scalar payoff to any particular solution. Once the representation scheme and the evaluation function are determined, the GA can start searching. Initially, the GA creates a certain number of strings, called the population size, often at random, to form the first generation. Next, the payoff function is used to evaluate each solution in this first generation; better solutions obtain higher payoffs. Then, on the basis of these evaluations, genetic operations are employed to generate the next generation.

A. SIMPLE GENETIC ALGORITHM

1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete:
   a) [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the greater the chance of being selected).
   b) [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
   c) [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
   d) [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop and return the best solution in the current population.
6. [Loop] Go to step 2.

IV. GENETIC ALGORITHM VS TRADITIONAL METHODS

The following list gives the essential differences between GAs and other forms of optimization.

1. Genetic algorithms work with a coded form of the function values (parameter set), rather than with the actual values themselves. For example, to find the minimum of the function f(x) = x^3 + x^2 + 5, the GA would not deal directly with x or y values, but with strings that encode these values. In this case, strings representing x in binary would be used.
2. Genetic algorithms use a set, or population, of points to conduct a search, not just a single point in the problem space. This gives GAs the power to search noisy spaces littered with local optima. Instead of relying on a single point to search through the space, a GA looks at many different areas of the problem space at once and uses all of this information to guide it.
3. Genetic algorithms use only payoff information to guide themselves through the problem space. Many search techniques need a variety of information to guide themselves; hill-climbing methods, for example, require derivatives. The only information a GA needs is some measure of fitness of a point in the space (sometimes known as an objective function value). Once the GA knows the current measure of "goodness" of a point, it can use this to continue searching for the optimum.
4. GAs are probabilistic in nature, not deterministic. This is a direct result of the randomization techniques used by GAs.
5. GAs are inherently parallel. Here lies one of the most powerful features of genetic algorithms.
GAs are by nature highly parallel, dealing with a large number of points (strings) simultaneously.

| Criterion | Ad hoc approach (analytical, specific) | Genetic approach |
|---|---|---|
| Speed | Depending on solution, generally good | Medium or low |
| Performance | Depending on solution | Fair to excellent |
| Problem understanding | Necessary | Not necessary |
| Human work needed | A few minutes to a few theses | A few days |
| Applicability | Low: most interesting problems have no usable mathematical expression, or are non-computable, or "NP-complete" (too many solutions to try them all) | General |
| Intermediary steps | Are not solutions (you must wait until the end of computation) | Are solutions (the solving process can be interrupted at any time, though the later the better) |

Table 1: Comparison of GA with traditional algorithms.