IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 257
AN EFFECTIVE ADAPTIVE APPROACH FOR JOINING DATA IN DATA WAREHOUSE
Sudha.S¹, Manikandan.S²
¹Research Scholar, Adhiparasakthi Engineering College, Chennai, Tamilnadu, India
²Professor, Computer Applications, RMD Engineering College, Chennai, Tamilnadu, India
Abstract
Efficient query evaluation is important for businesses because it retrieves a large amount of detail from the data warehouse.
Data warehouses have emerged as a primary business information platform in which data is stored and maintained concurrently.
Adaptations are required in the implementation of Extract-Transform-Load (ETL) operations. Several methods exist for
joining streams to produce a new relation. Our previous work used an adaptive join in the data warehouse through the ETL
procedure. That approach was a general scheme that creates many possible solutions, but its drawback is that
it does not consider exact reproduction. To overcome this problem, we present a genetic algorithm for joining streams of data.
When several queries and streams are combined in the data warehouse, the exact grouping of multiple associations is selected by the
genetic algorithm. Crossover and mutation choose the best combination of the several join associations, retrieve
the data, and produce the output in the data warehouse. The proposed genetic algorithm is used to deliver efficient
highest-join data and to increase scalability.
Keywords: join, stream, relation, Genetic algorithm.
-----------------------------------------------------------------------***-----------------------------------------------------------------------
1. INTRODUCTION
Data can be stored in different ways in different databases.
Genetic Algorithms (GAs) are powerful search techniques used to
solve many difficult problems. Despite the great successes
achieved in real-world applications, GAs have some
drawbacks. A GA requires many fitness calculations
before an acceptable solution can be found, and
fitness evaluation is not easy in many real-world applications.
In several situations the fitness evaluation
becomes computationally difficult, so GAs can be very
demanding in terms of computation. GAs have been used
frequently to solve search and optimization problems since they
were first introduced by Holland [1]. They have gained this
prominence through their robustness and simplicity.
Individuals with higher fitness have a higher probability to
survive, to reproduce, and to transmit their genetic
characteristics to future generations. GAs can perform
efficient search operations in problem spaces where it is not
easy to understand the environment. Each potential solution in
the search space is considered as an individual (phenotype).
Individuals are represented by strings that are called
chromosomes. Genes are the atomic parts of chromosomes,
and each codifies a specific characteristic of a chromosome.
There are several approaches to encoding individuals for a
variety of applications. A GA generates an initial population in
the first phase of the algorithm. Then selection (choosing
individuals for mating and for carrying their genes to the next
generation, giving higher probability to "fitter" individuals
representing better solutions), crossover (a method for
combining the genes of two mating parents), and mutation
operations are applied randomly to the current generation,
creating the next generation of solutions [3]. The individual
with the best fitness in the population is the proposed solution
of the problem. For each pair of mating individuals, the
parents' chromosomes are split into two (or more) parts, and
genes selected from both parents are combined to generate a
new chromosome, which joins the population. Mutations are
also possible, in which a randomly selected gene of an
individual is altered. Mutations prevent stagnation and enable
a wider exploration of the search space, reaching regions that
crossover alone cannot reach because crossover is limited to
the genes available in the current population [2]. To keep the
population size constant, a method is defined for selecting the
individuals that will be copied to the next generation, and
another iteration begins. The algorithm can terminate either
after a fixed number of generations or when the improvement
in overall generation quality (e.g. the average fitness of the
individuals in a generation) falls below a predetermined
threshold. A GA can also be terminated at any given time,
reporting the best currently available individual as the
discovered solution, which makes GAs very suitable for
real-time problems where only a fixed amount of time is
available for optimization.
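The generational loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the bit-string encoding, binary tournament selection, single-point crossover, the parameter values, and the name `run_ga` are all assumptions made for the example, and termination uses only the fixed-generation rule.

```python
import random


def run_ga(fitness, gene_count, pop_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Toy generational GA over bit-string chromosomes (illustrative only).

    `fitness` maps a chromosome (list of 0/1 genes) to a number to maximize.
    `gene_count` must be at least 2 so that a crossover cut point exists.
    """
    rng = random.Random(seed)

    # Initial population: random chromosomes.
    population = [[rng.randint(0, 1) for _ in range(gene_count)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select(pop):
        # Binary tournament: the fitter of two random individuals is chosen,
        # so fitter individuals have a higher probability of mating.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):  # termination: fixed number of generations
        next_gen = []
        while len(next_gen) < pop_size:  # keeps the population size constant
            p1, p2 = select(population), select(population)
            if rng.random() < crossover_rate:
                # Single-point crossover: split both parents and recombine.
                cut = rng.randint(1, gene_count - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best  # remember the best individual seen so far
    return best
```

As a usage example, `run_ga(sum, gene_count=16)` maximizes the number of 1 genes in a 16-gene chromosome, a standard toy problem. Because the best individual seen so far is tracked in every generation, the loop could also be stopped early and still report a solution, which matches the real-time property noted above.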
After the introduction to ETL and Genetic Algorithms in Section 1,
a review of the related literature on joining algorithms is given in
Section 2. Section 3 defines the main process models and the
mutation and offspring structure. Evaluation of performance
and the related results are presented in Section 4. Concluding
remarks and future work are given in Section 5.
2. RELATED WORKS
Hash-Merge Join (HMJ) (Mokbel et al. 2004), another of
the series of symmetric joins, is based on push technology and
consists of two phases, hashing and merging. None of the three
approaches above considers the metadata about the
stream. Therefore, they are unable to identify data that is
no longer required, although dropping it would reduce the
overhead. In addition, the join approaches above focus on
throughput optimization while ignoring other optimization
goals, such as the characteristics of the stream data, which are
equally important.
The MESHJOIN (Mesh Join) algorithm (Polyzotis et al. 2007)
(Polyzotis et al. 2008) was presented with the objective of
amortizing the slow disk access over as many stream tuples as
possible. To perform the join, the algorithm keeps a number of
chunks of the stream in memory at the same time. In each iteration
the algorithm loads a disk partition into memory and performs
the join with all these stream chunks. The algorithm performs
tuning for efficient memory transfer among the join
components, but some issues have been identified around
its access to the disk-based relation. Also, MESHJOIN cannot
deal with intermittency of the stream efficiently.
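The iteration described above can be illustrated with a simplified sketch. It is not the published MESHJOIN algorithm and omits its cost model and memory tuning. The function name `mesh_join`, the tuple layout, and the rule that exactly one stream chunk arrives per iteration are assumptions made for the example.

```python
from collections import deque


def mesh_join(stream_chunks, partitions, key_s=0, key_r=0):
    """Simplified MESHJOIN-style loop (illustrative only).

    stream_chunks: iterable of lists of stream tuples.
    partitions: non-empty list of lists of tuples of the disk-based relation.
    key_s / key_r: positions of the join attribute in stream / relation tuples.
    """
    k = len(partitions)
    queue = deque()            # entries: [chunk, number_of_partitions_seen]
    results = []
    incoming = iter(stream_chunks)
    exhausted = False
    step = 0
    while True:
        # Admit one new stream chunk per iteration while the stream lasts.
        if not exhausted:
            chunk = next(incoming, None)
            if chunk is None:
                exhausted = True
            else:
                queue.append([chunk, 0])
        if not queue:
            break
        # "Load" the next disk partition cyclically and index it by join key.
        index = {}
        for r in partitions[step % k]:
            index.setdefault(r[key_r], []).append(r)
        # One partition read is shared by every chunk currently in memory.
        for entry in queue:
            for s in entry[0]:
                for r in index.get(s[key_s], []):
                    results.append((s, r))
            entry[1] += 1
        # The oldest chunk leaves once it has met all k partitions.
        while queue and queue[0][1] == k:
            queue.popleft()
        step += 1
    return results
```

Each chunk that enters at iteration t meets partitions t, t+1, ..., t+k-1 (modulo k), so it is joined with the whole relation exactly once while every partition read is shared by all chunks in memory. The sketch also hints at the intermittency problem: when no new chunk arrives, queued chunks still have to wait for the remaining partitions to cycle past.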
R-MESHJOIN (reduced Mesh Join) (Naeem et al. 2010) is an
enhanced form of MESHJOIN in which one issue related to
suboptimal distribution of memory among the join
components is resolved. However, R-MESHJOIN implements
the same strategy as the MESHJOIN algorithm for retrieving
the disk-based relation.
A partition-based approach (Chakraborty et al. 2009) has been
introduced to deal with intermittency in the stream. It uses a
two-level hash table to attempt to join stream tuples as soon as
they arrive, and uses a partition-based waiting area for the
other stream tuples. The authors do not provide a cost model
for their approach. In addition, the algorithm needs a clustered
index or an equivalent sorting on the join attribute and it does
not prevent starvation of stream tuples.
One recent algorithm, HYBRIDJOIN (Hybrid Join) (Naeem et
al. 2011), addresses the issue of retrieving the disk-based
relation. An active strategy to access the disk-based relation is
introduced in HYBRIDJOIN.
Another advantage of HYBRIDJOIN is that it can deal with
bursty streams, which is a limitation of both MESHJOIN and
R-MESHJOIN. However, when long-tail distributions are
considered, the algorithm can be improved further.
3. SYSTEM MODEL
The proposed work uses a Genetic Algorithm designed
to perform the multi-join operation efficiently in an active data
warehouse. Normally, in an active data warehouse, the
information should be kept up to date with the recent values in the
database. To add the recent values to the active data
warehouse, the genetic algorithm is used here to perform
the highest-join operation on the source streams. The
architecture diagram of the proposed work, which performs the
highest-join operation using the GA, is shown in Fig. 1.
Fig. 1 System Architecture Diagram of highest joining
operation using Genetic Algorithm
In this system model, current data can be inserted into the active
data warehouse; the GA is used to verify the exact data and add it
to the warehouse. This model thus describes the task of executing the
highest-join operation using the GA. The main components of Fig. 1
are crossover estimation and mutation, while the source stream
S and the relation R are the inputs. We first select the relation R
based on the source stream and then perform the highest-join
operation using the Genetic Algorithm process.
Initial relations are loaded into the relation table, and the quality of
the relations is evaluated based on their fitness. The initial
relations are therefore stored in the relation table.
4. PERFORMANCE EVALUATION
The performance of the GA-based highest-join operation in the data
warehouse is considered in terms of the following: data retrieval,
efficiency of the join operation, and scalability.
Table 1
No of Source    Max Process Rate
95              1000
100             2800
189             4000
300             4450
670             5600
Table 1 gives the number of sources arriving in the
method and the corresponding maximum process rate. Based on
these values, the following plot was produced.
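The plot of these values includes a linear trend line for the maximum process rate. As a rough check, a least-squares line can be fitted to the five rows of Table 1 as sketched below. The helper name `fit_line` is hypothetical, and the fit is an ordinary least-squares assumption about how that trend line was drawn, not something stated in the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (illustrative)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Values copied from Table 1.
sources = [95, 100, 189, 300, 670]
max_rate = [1000, 2800, 4000, 4450, 5600]

slope, intercept = fit_line(sources, max_rate)
print(f"max process rate ~ {slope:.2f} * sources + {intercept:.0f}")
```

On these five points the fitted slope is about 6.1 process-rate units per additional source, with an intercept of about 1919. With so few points, and a visible flattening between 189 and 670 sources, the straight line is only a coarse summary of the trend.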
5. CONCLUSIONS
In this paper, we propose a genetic-algorithm-based approach to improve
the performance of our existing data join. This work carries out
the highest-join operation with the relation table in limited
memory by using a genetic algorithm. The GA offers an efficient way to
retrieve the data, and it effectively performs the highest
join of data through selection, crossover, and mutation.
REFERENCES
[1]. Holland, J.H. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, USA, 1975.
[2]. Sevinc, E., and Cosar, A. An Evolutionary Genetic Algorithm for Optimization of Distributed Database Queries. The Computer Journal, vol. 54, issue 5, 2011, 717-725.
[3]. Swami, A., and Gupta, A. Optimization of large join queries. Proceedings of ACM SIGMOD Conf. on Management of Data, Chicago, Ill, May 1988, 8–17.
[4]. Sudha, S., and Manikandan, S. Adaptive Approach for Joining and Submissive View of Data in Data Warehouse Using ETL. Indian Journal of Computer Science and Engineering (IJCSE), vol. 4, no. 3, Jun-Jul 2013, 250-252.
[5]. Naeem, M. A., Dobbie, G., and Weber, G. HYBRIDJOIN for Near-real-time Data Warehousing. International Journal of Data Warehousing and Mining (IJDWM), IGI Global, 2011.
[6]. Naeem, M. A., Dobbie, G., and Weber, G. X-HYBRIDJOIN for Near-real-time Data Warehousing. Proceedings of 28th British National Conference on Databases (BNCOD '11), Springer, Manchester, UK, 2011, 33–47.
BIOGRAPHIES
Sudha.S received the Master's degree in Computer
Applications from Bharathidasan University, Trichy,
India, in 2000, and the M.Phil. degree from the same
university in 2003.
She has been in the teaching profession for the past 13 years.
Previously she worked as a Lecturer in the Department of M.S
and B.Tech (IT). She has been an Assistant Professor at
Adhiparasakthi Engineering College, affiliated to Anna
University, Chennai, India, since 2003. She is currently pursuing
her PhD degree in the Department of Computer Science, Anna
University, Chennai, India. She has published 2 papers in refereed
journals and 5 papers in national and international conference
proceedings.
Dr. S. Manikandan received the B.Sc. degree in Mathematics
from Madurai Kamaraj University, Madurai, in 1996 and the
M.C.A. degree from Bharathidasan University, Tiruchirapalli,
in 1999. He received his M.Phil. degree in Computer Science
from Manonmaniam Sundaranar University, Tirunelveli, in
2003 and the Ph.D. degree from Anna University, Chennai, in
2009.
He has been in the teaching profession for the past 11 years.
Previously he worked as an Assistant Professor at PSNA
Engineering College, Dindigul. He is currently Director in the
Department of Computer Applications, RMD Engineering
College, Chennai. He has published six research articles in
international and national journals and presented twenty papers
in refereed national and international conferences.
[Figure: Max Process Rate (vertical axis "process rate", 0 to 7000) plotted against No of source (horizontal axis, 0 to 1000); series: Max Process Rate and Linear (Max Process Rate)]

More Related Content

PDF
IRJET- Evidence Chain for Missing Data Imputation: Survey
PDF
H04564550
PDF
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
PDF
Partitioning of Query Processing in Distributed Database System to Improve Th...
PDF
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
PDF
A unified approach for spatial data query
PDF
Using particle swarm optimization to solve test functions problems
PDF
Comparative study of various supervisedclassification methodsforanalysing def...
IRJET- Evidence Chain for Missing Data Imputation: Survey
H04564550
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
Partitioning of Query Processing in Distributed Database System to Improve Th...
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
A unified approach for spatial data query
Using particle swarm optimization to solve test functions problems
Comparative study of various supervisedclassification methodsforanalysing def...

What's hot (16)

PDF
IRJET- Customer Relationship and Management System
PDF
Assessment of Cluster Tree Analysis based on Data Linkages
PDF
Ijmet 10 01_141
PDF
Enhancing the labelling technique of
PDF
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
PDF
QUERY INVERSION TO FIND DATA PROVENANCE
PDF
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
PDF
Ijmer 46062932
PDF
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
PDF
50120130406007
PDF
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
PDF
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE
PDF
B0930610
PDF
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
PDF
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
PDF
158822 article text-413072-1-10-20170718
IRJET- Customer Relationship and Management System
Assessment of Cluster Tree Analysis based on Data Linkages
Ijmet 10 01_141
Enhancing the labelling technique of
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
QUERY INVERSION TO FIND DATA PROVENANCE
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
Ijmer 46062932
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
50120130406007
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE
B0930610
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Da...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
158822 article text-413072-1-10-20170718
Ad

Viewers also liked (20)

PDF
Performance studies of microbial fuel cell
PDF
Improvement in the usability of gis based services by
PDF
Design and implementation of secured scan based attacks on ic’s by using on c...
PDF
Effect of count and stitch length on spirality of single jersey knit fabric
PDF
To the numerical modeling of self similar solutions of
PDF
Investigation on multi cylinder s.i engine using blends of hydrogen and cng
PDF
Bivariatealgebraic integerencoded arai algorithm for
PDF
Nesting of five modulus method with improved lsb subtitution to hide an image...
PDF
Literature survey for 3 d reconstruction of brain mri
PDF
Implementation of dynamic source routing (dsr) in
PDF
Intrusion detection in heterogeneous network by multipath routing based toler...
PDF
A novel mrp so c processor for dispatch time curtailment
PDF
Effects of aging time on mechanical properties of sand cast al 4.5 cu alloy
PDF
Wound epithelization model by 3 d imaging
PDF
Ultimate strength analysis of box girder under hoggong bending moment, torque...
PDF
Study of shear walls in multistoried buildings with different thickness and r...
PDF
Mechatronics engineering education in republic of benin opportunities and cha...
PDF
Power quality enhancement by improving voltage stability using dstatcom
PDF
Simulation and performance analysis of blast
PPT
20160219 - F. Grati - Toma - Maternal Malignancies
Performance studies of microbial fuel cell
Improvement in the usability of gis based services by
Design and implementation of secured scan based attacks on ic’s by using on c...
Effect of count and stitch length on spirality of single jersey knit fabric
To the numerical modeling of self similar solutions of
Investigation on multi cylinder s.i engine using blends of hydrogen and cng
Bivariatealgebraic integerencoded arai algorithm for
Nesting of five modulus method with improved lsb subtitution to hide an image...
Literature survey for 3 d reconstruction of brain mri
Implementation of dynamic source routing (dsr) in
Intrusion detection in heterogeneous network by multipath routing based toler...
A novel mrp so c processor for dispatch time curtailment
Effects of aging time on mechanical properties of sand cast al 4.5 cu alloy
Wound epithelization model by 3 d imaging
Ultimate strength analysis of box girder under hoggong bending moment, torque...
Study of shear walls in multistoried buildings with different thickness and r...
Mechatronics engineering education in republic of benin opportunities and cha...
Power quality enhancement by improving voltage stability using dstatcom
Simulation and performance analysis of blast
20160219 - F. Grati - Toma - Maternal Malignancies
Ad

Similar to An effective adaptive approach for joining data in data (20)

PDF
Improving the effectiveness of information retrieval system using adaptive ge...
PDF
50120130405011
PDF
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
PDF
T180203125133
PDF
International Journal of Engineering and Science Invention (IJESI)
PPTX
Join operation
DOC
PDF
Applying genetic algorithms to information retrieval using vector space model
PDF
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
PPTX
unit-2 Query processing and optimization,Query equivalence, Join strategies.pptx
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
PDF
A comprehensive study of non blocking joining techniques
PDF
A comprehensive study of non blocking joining technique
PDF
Augmentation of Customer’s Profile Dataset Using Genetic Algorithm
PDF
Performance Analysis of Genetic Algorithm as a Stochastic Optimization Tool i...
PDF
A genetic algorithm coupled with tree-based pruning for mining closed associa...
PDF
50120140501018
PPTX
Lazy beats Smart and Fast
Improving the effectiveness of information retrieval system using adaptive ge...
50120130405011
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
T180203125133
International Journal of Engineering and Science Invention (IJESI)
Join operation
Applying genetic algorithms to information retrieval using vector space model
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
unit-2 Query processing and optimization,Query equivalence, Join strategies.pptx
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
A comprehensive study of non blocking joining techniques
A comprehensive study of non blocking joining technique
Augmentation of Customer’s Profile Dataset Using Genetic Algorithm
Performance Analysis of Genetic Algorithm as a Stochastic Optimization Tool i...
A genetic algorithm coupled with tree-based pruning for mining closed associa...
50120140501018
Lazy beats Smart and Fast

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...
Risk analysis and environmental hazard management
Review study on performance of seismically tested repaired shear walls
Monitoring and assessment of air quality with reference to dust particles (pm...
Low cost wireless sensor networks and smartphone applications for disaster ma...
Coastal zones – seismic vulnerability an analysis from east coast of india
Can fracture mechanics predict damage due disaster of structures
Assessment of seismic susceptibility of rc buildings
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...

Recently uploaded (20)

PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Digital Logic Computer Design lecture notes
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
Welding lecture in detail for understanding
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PPTX
bas. eng. economics group 4 presentation 1.pptx
DOCX
573137875-Attendance-Management-System-original
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
OOP with Java - Java Introduction (Basics)
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Well-logging-methods_new................
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Construction Project Organization Group 2.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
CH1 Production IntroductoryConcepts.pptx
Arduino robotics embedded978-1-4302-3184-4.pdf
Structs to JSON How Go Powers REST APIs.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Digital Logic Computer Design lecture notes
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Welding lecture in detail for understanding
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Lesson 3_Tessellation.pptx finite Mathematics
bas. eng. economics group 4 presentation 1.pptx
573137875-Attendance-Management-System-original
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
OOP with Java - Java Introduction (Basics)
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Well-logging-methods_new................
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Construction Project Organization Group 2.pptx

An effective adaptive approach for joining data in data

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 257 AN EFFECTIVE ADAPTIVE APPROACH FOR JOINING DATA IN DATA WAREHOUSE Sudha.S1 , Manikandan.S2 1 Research Scholar, Adhiparasakthi Engineering College, Chennai, Tamilnadu, India 2 Professor, Computer Applications, RMD Engineering College, Chennai, Tamilnadu, India Abstract Formulation of efficient assessment is important for the businesses, because its retrieve lot of details from the data warehouse. In Data warehouses have materialize as original business information pattern where data store and maintain in concurrent. The adaptations are requiring in the implementation of Extract Transform Load (ETL) operations. The several methods are included to joining stream and produce the innovative relation .The previous work was used the adaptive join in data warehouse using ETL procedure. This approach was common conspire, which create many possible solution. But the drawback of the previous approach is its not consider exact reproduction. To rise above the question, we are going to present genetic algorithm for joining stream of data . Several queries and streams are combined in data warehouse the selection of exact grouping of multiple associations are complete via genetic algorithm. The crossover as well as mutation prefer the paramount consortium of several associations of link for by retrieving the data and produces the output in data warehouse. The performance of the proposed genetic algorithm used to deliver efficient highest join data and increase the scalability. Keywords: join, stream, relation, Genetic algorithm. -----------------------------------------------------------------------***----------------------------------------------------------------------- 1. 
INTRODUCTION Different way we have to store the data in different database. Genetic Algorithms are powerful search techniques used to solve many difficult problems. Despite the great successes achieved in real-world applications, GAs have some drawbacks. In the genetic algorithm lot of fitness calculations are necessaryed before an acceptable solution can be found. Fitness evaluation is not easy in many real-world applications. There are several situations, in which the fitness evaluation becomes computationally difficult, so, GAs can be very demanding in terms of computation and GAs are frequently used to solve search and optimization problems since they were first introduced by Holland [1]. They have gained this prominence by their robustness and simplicity they offer. Individuals with higher aptitude have more 26 probability to survive, to reproduce, and to transmit their genetic characteristics to future generations. GAs can perform efficient search operations in problem spaces where it is not easy to understand the environment. Each potential solution in the search space is considered as an individual (phenotype). Individuals are represented by using strings that are called chromosomes. Genes are the atomic parts of chromosomes and they codify a specific characteristic of a chromosome. There are several approaches to encode individuals for a variety of applications. GAs generate a an initial population at the first phase of the algorithm and then, selection (for mating and carrying its genes to the next generation, giving higher probability to "fitter" individuals representing better solutions) crossover (method for combining genes of two mating parents), and mutation operations are applied randomly on the current generation, creating the next generation of solutions [3]. Individual having the best fitness in the population is the proposed solution of the problem. 
For each pair of mating individuals, the parents' chromosomes are split in two (or more) parts and genes selected from both parents are combined to generate a new chromosome which joins the population. Mutations are also possible where an individual's randomly selected gene is mutated. Mutations prevent stagnation and enable a wider exploration of search space that could not be reached with crossover operators as it is limited with the available genes in the current population [2]. In order to keep the population size constant a method is defined for selecting the individuals that will be copied to the next generation, and another iteration begins. Termination can be based on either by production of a fixed number of generations or the algorithm can terminate when the amount of improvement in the overall generation quality (e.g. average fitness values for individuals in a generation) falls below a predetermined threshold. Also, GA can be terminated at any given time reporting the best currently available individual as the discovered solution, thus making GA very suitable for real-time problems where only a fixed amount of time is available for optimization. After an introduction ETL and Genetic Algorithm in Section1, related literature review of the joining algorithms are given in Section 2. Section 3 defines the main process models and the mutation and offspring structure . Evaluation of performance
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 258 and the related results are presented in Section 4. Concluding clarification and the future works are given in Section 5. 2. RELATED WORKS Hash-Merge Join (HMJ) (Mokbel et al. 2004), also one from the series of symmetric joins, is based on push technology and consists of two phases, hashing and merging. All three approaches above do not consider the metadata about the stream. Therefore, they are unable to identify the data which is no-longer required and by plummeting it the overhead can be reduced. In addition, the join approaches above focus on throughput optimization while ignoring the other optimization goals such as the characteristics of stream data which are correspondingly important. The MESHJOIN (Mesh Join) algorithm (Polyzotis et al. 2007) (Polyzotis et al. 2008) has been presented with the objective to amortize the slow disk access with as many stream tuples as possible. To perform the join, the algorithm keeps a number of hunks of stream in memory at the same time. In each iteration the algorithm loads a disk partition into memory and attains the join with all these stream chunks. The algorithm performs tuning for efficient memory transfer among the join components, but It is identified in the past some issues around the access to the disk based relation. Also MESHJOIN cannot deal with intermittency of the stream competently. R-MESHJOIN (reduced Mesh Join) (Naeem et al. 2010) is an enhanced form of MESHJOIN in which one issue related to suboptimal distribution of memory among the join components is resolved. However, R-MESHJOIN implements the same strategy as the MESHJOIN algorithm for retrieving the disk-based relation. 
A partition-based approach (Chakraborty et al. 2009) has been introduced to deal with intermittency in the stream. It uses a two-level hash table to attempt to join stream tuples as soon as they arrive, and uses a partition-based waiting area for the other stream tuples. The authors do not provide a cost model for their approach. In addition, the algorithm needs a clustered index or an equivalent sorting on the join attribute, and it does not prevent starvation of stream tuples.

One recent algorithm, HYBRIDJOIN (Hybrid Join) (Naeem et al. 2011), addresses the issue of retrieving the disk-based relation by introducing an active strategy for accessing it. Another advantage of HYBRIDJOIN is that it can deal with bursty streams, which is a limitation of both MESHJOIN and R-MESHJOIN. However, when long-tail distributions are considered, the algorithm can be improved further.

3. SYSTEM MODEL

The proposed work uses a Genetic Algorithm to perform the multi-join operation efficiently in an active data warehouse. Normally, in an active data warehouse, the information should be kept up to date with the most recent values in the database. To add the recent values to the active data warehouse, the genetic algorithm is used here to perform the highest join operation on the source streams. The architecture of the proposed work for performing the highest join operation using GA is shown in Fig. 1.

Fig. 1 System architecture diagram of the highest join operation using Genetic Algorithm

In this system model, current data can be inserted into the active data warehouse; the GA is used to verify the exact data and add it to the warehouse. The model thus describes how the highest join is executed using the GA. The main components of Fig. 1 are crossover estimation and mutation, while the source stream S and the relation R are the inputs.
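The genetic algorithm loop described in the introduction (selection, crossover, mutation, a constant population size, and termination on a generation limit or a stalled average fitness) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the chromosome encoding (a permutation of relation indices read as a join order), the toy cost function, the cardinalities, the selectivity, and every function name are hypothetical.

```python
import random

# Hypothetical relation cardinalities and join selectivity (not from the paper).
CARDINALITIES = [1000, 50, 20000, 300, 7000, 120]
SELECTIVITY = 0.001


def cost(order):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    size = CARDINALITIES[order[0]]
    total = 0.0
    for idx in order[1:]:
        size = size * CARDINALITIES[idx] * SELECTIVITY
        total += size
    return total


def fitness(order):
    """Higher fitness corresponds to a cheaper join order."""
    return -cost(order)


def crossover(p1, p2, rng):
    """Keep a prefix of parent 1, then fill the rest in parent 2's gene order."""
    cut = rng.randint(1, len(p1) - 1)
    head = p1[:cut]
    return head + [g for g in p2 if g not in head]


def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen genes."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def run_ga(pop_size=20, max_generations=50, threshold=1e-6, seed=0):
    rng = random.Random(seed)
    n = len(CARDINALITIES)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    prev_avg = None
    for _ in range(max_generations):
        # Selection: the fitter half survives and acts as the parent pool.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        # Crossover and mutation produce enough offspring to refill the pool.
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # population size stays constant
        # Stop early when the average fitness no longer changes noticeably.
        avg = sum(fitness(ind) for ind in population) / pop_size
        if prev_avg is not None and abs(avg - prev_avg) < threshold:
            break
        prev_avg = avg
    return max(population, key=fitness)


best_order = run_ga()
print(best_order, cost(best_order))
```

Since the fitter half of each generation is carried over unchanged, the loop can be interrupted after any generation and still return the best individual found so far, which matches the anytime property noted in the introduction.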
We first select the relation R based on the source stream and perform the highest join operation using the Genetic Algorithm process. The initial relations are loaded into the relation table, and the quality of each relation is evaluated based on its fitness, so the initial relations are stored in the relation table.

4. PERFORMANCE EVALUATION

The performance of the GA-based highest join operation in the data warehouse is considered in terms of the following: data retrieval, efficiency of the join operation, and scalability.

Table 1

No of Source    Max Process Rate
95              1000
100             2800
189             4000
300             4450
670             5600
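A linear trend over the Table 1 values can be computed with an ordinary least-squares fit, as in the short sketch below. The assumption that the trend is fitted against the number of sources (rather than against the row position) is ours, and the helper name `linear_fit` is hypothetical.

```python
# Values copied from Table 1.
sources = [95, 100, 189, 300, 670]
rates = [1000, 2800, 4000, 4450, 5600]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_fit(sources, rates)
print(f"rate ~ {slope:.2f} * sources + {intercept:.1f}")
```

With only five measurements, such a fit is a rough summary of the trend rather than a model of the method's scalability.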
Table 1 gives details of the number of sources arriving in the method and the maximum process rate. The following chart is based on these values.

5. CONCLUSIONS

In this paper, we propose a genetic algorithm based approach to improve the performance of our existing method for joining data. This work carries out the highest join operation with the relation table in limited memory by using a genetic algorithm. The GA provides an efficient way to retrieve the data and effectively performs the highest join through selection, crossover and mutation.

REFERENCES

[1]. Holland, J.H. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975, Ann Arbor, MI, USA.
[2]. Sevinc, E., and Cosar, A. An Evolutionary Genetic Algorithm for Optimization of Distributed Database Queries. The Computer Journal, vol. 54, issue 5, 2011, 717-725.
[3]. Swami, A., and Gupta, A. Optimization of large join queries. Proceedings of ACM SIGMOD Conf. on Management of Data, Chicago, Ill, May 1988, 8–17.
[4]. S. Sudha and S. Manikandan, "ADAPTIVE APPROACH FOR JOINING AND SUBMISSIVE VIEW OF DATA IN DATA WAREHOUSE USING ETL", Indian Journal of Computer Science and Engineering (IJCSE), Vol. 4 No. 3, Jun-Jul 2013, pp. 250-252.
[5]. Naeem, M. A., Dobbie, G. & Weber, G. (2011), 'HYBRIDJOIN for Near-real-time Data Warehousing', International Journal of Data Warehousing and Mining (IJDWM), IGI Global.
[6]. Naeem, M. A., Dobbie, G. & Weber, G. (2011), X-HYBRIDJOIN for Near-real-time Data Warehousing, in 'Proceedings of 28th British National Conference on Databases (BNCOD '11)', Springer, Manchester, UK, pp. 33–47.
BIOGRAPHIES

Sudha.S received the Master's degree in Computer Applications from Bharathidasan University, Trichy, India, in 2000, and the M.Phil degree from the same university in 2003. She has been in the teaching profession for the past 13 years. Previously she worked as a Lecturer in the Department of M.S and B.Tech (IT). She has been an Assistant Professor at Adhiparasakthi Engineering College, affiliated to Anna University, Chennai, India, since 2003. She is currently pursuing her PhD degree in the Department of Computer Science, Anna University, Chennai, India. She has published 2 papers in refereed journals and 5 papers in national and international conference proceedings.

Dr. S. Manikandan received the B.Sc. degree in Mathematics from Madurai Kamaraj University, Madurai, in 1996 and the M.C.A. degree from Bharathidasan University, Tiruchirapalli, in 1999. He received his M.Phil. degree in Computer Science from Manonmaniam Sundaranar University, Tirunelveli, in 2003 and his Ph.D degree from Anna University, Chennai, in 2009. He has been in the teaching profession for the past 11 years. Previously he worked as an Assistant Professor at PSNA Engineering College, Dindigul. He is currently Director in the Department of Computer Applications, RMD Engineering College, Chennai. He has published six research articles in international and national journals and presented twenty papers at refereed national and international conferences.

[Chart: Max Process Rate plotted against No of source, with a linear trend line, Linear (Max Process Rate). x-axis: No of source; y-axis: process rate.]