Knowledge Discovery in Remote Access Databases
A thesis submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Computer Science
at the Institute of Mathematics and Computer Science Informatics,
University of Debrecen
By Zakaria Suliman Zubi
Supervised by Prof. Arato Matyas and Prof. Fazekas Gábor
Overview of the Thesis
 Part 1
   Introduction to Knowledge Discovery in Databases (KDD) and Data Mining (DM).
   Goal of the Thesis Work.
 Part 2
   Remote Access KDD models.
   Logical Foundation in Data Mining.
   Mining the Discovered Association Rules.
   Data Mining Query Languages.
 Part 3
   Knowledge Discovery Query Language (KDQL).
   I-extended Databases (I-ED).
   Implementation of KDQL.
 Conclusion.
 Appendix A, B.
Introduction to KDD and DM
 KDD is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
 DM is a single step in the KDD process: it extracts trends or patterns from raw databases and carefully and accurately transforms them into useful and understandable information.
 The introduction part (chapter 1) of the thesis covers the history, importance, appearances and tools of KDD and DM.
KDD process
 Data cleaning: the phase in which noisy and irrelevant data are removed from the collection.
 Data integration: multiple data sources, often heterogeneous, may be combined in a common source.
 Data selection: the data relevant to the analysis is decided on and retrieved from the data collection.
 Data transformation: the phase in which the selected data is transformed into forms appropriate for the mining procedure.
 Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
 Pattern evaluation: strictly interesting patterns representing knowledge are identified based on given measures.
 Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user.
KDD and DM overlap with several related fields (figure)
 Access to databases is established via Open Database Connectivity (ODBC).
 The databases are queried with Structured Query Language (SQL). SQL lets users define the data in a database and manipulate it (adding, deleting and retrieving records).
 Data visualization is used to represent data mining results.
Goal of the Thesis Work
 In this thesis work, we investigated the problem of matching DM problems with the set of DM algorithms that are suitable for solving them.
 We use visualization, integrated with algorithmic approaches, to tune the parameters of DM algorithms. This supports the parameter selection process, so far explored only by algorithmic approaches, in a more systematic way than using default values or setting parameter values without clues.
 We introduce visualization to provide expressive information about induced models and statistical entities, and to support the interactive and dynamic exploration of induced models for DM.
Remote Access KDD models
Connection between KDD and ODBC
Architecture of the ODBC_KDD (1) model (figure)
Architecture of the ODBC_KDD (2) model (figure)
Logical Foundation in Data Mining (LFDM)
Advantages of Logical Foundation in Data Mining
 Expressiveness: First-order logic can represent more complex concepts than traditional attribute-value languages.
 Readability: Formulae are easier to read than decision trees or a set of linear equations.
 Background knowledge: Background knowledge can be grown during discovery time, for example in time series.
 Multiple tables: Multiple database tables can be handled without explicit and expensive joins.
 Deductive databases: Logical discovery engines can be transparently linked to relational databases via deductive databases.
Disadvantages of Logical Foundation in Data Mining
 Language complexity: First-order hypotheses are usually constructed through heavy search (feasibility of discovery).
 Database access times: Checking a single candidate might involve heavy querying.
 Number handling: Logical approaches to discovery usually suffer from poor number-handling capabilities.
Translating first order queries into SQL
 A natural-language question such as “find all employees who earn more than 1000000 HUF a year and more than their manager” can be written as a first-order rule:
 expensive_employee(Name) ← employee(Name, Salary1, Manager), Salary1 > 1000000, employee(Manager, Salary2, _), Salary1 > Salary2
 The rule translates into the following SQL query:
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000 AND
employee_1.NAME = employee_0.MANAGER AND
employee_0.SALARY > employee_1.SALARY
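The translated query can be tried on a toy table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for an ODBC data source, and the sample rows (anna, bela, csaba, dora) are invented for the example, not taken from the thesis.

```python
import sqlite3

# In-memory database standing in for a remote ODBC source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (NAME TEXT PRIMARY KEY, SALARY INTEGER, MANAGER TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("anna", 1500000, "bela"),   # above the threshold and above her manager
        ("bela", 1200000, "csaba"),  # above the threshold but below his manager
        ("csaba", 2000000, None),    # has no manager, so the self-join drops him
        ("dora", 900000, "bela"),    # below the salary threshold
    ],
)

# The SQL form of the expensive_employee rule, as given on the slide.
QUERY = """
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000 AND
      employee_1.NAME = employee_0.MANAGER AND
      employee_0.SALARY > employee_1.SALARY
"""

expensive = [row[0] for row in conn.execute(QUERY)]
print(expensive)
```

Each body literal of the rule becomes one table alias, shared variables become join conditions, and comparisons carry over unchanged into the WHERE clause.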
Mining the Discovered Association Rules
Association Rules
 What is an association rule? Let I = {i1, i2, …, in} be the set of all possible items, and let D, the task-relevant data, be a set of transactions. Each transaction T = {ia, ib, …, it} is a set of items with T ⊂ I.
An association rule is of the form:
P → Q, where P ⊂ I, Q ⊂ I, and P ∩ Q = Ø.
P → Q holds in D with support s and has confidence c in the transaction set D, where
Support(P → Q) = Probability(P ∪ Q)
Confidence(P → Q) = Probability(Q | P)
 Example: “In 80% of the cases when people buy bread, they also buy milk”
Bread ==> milk / 80%
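The two measures can be computed directly from a list of transactions. This is a minimal sketch on invented basket data; the function names `support` and `confidence` are illustrative and do not come from the thesis.

```python
# Toy transaction set D (invented data): each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "cheese"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(body, head, transactions):
    """Support(P -> Q): fraction of transactions containing P and Q together."""
    return count(set(body) | set(head), transactions) / len(transactions)


def confidence(body, head, transactions):
    """Confidence(P -> Q): among transactions containing P, the share that also contain Q."""
    return count(set(body) | set(head), transactions) / count(body, transactions)


print(support({"bread"}, {"milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

With this data, bread appears in five transactions and bread together with milk in four, so the rule Bread ==> milk has confidence 4/5, matching the 80% example above.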
Mining the Association Rules
 What is mining association rules? Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories, and selecting the most "interesting" rules based on their confidence factors. A rule P → Q holds in D with support s and has confidence c in the transaction set D.
 Applications: Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
 Examples:
 “Body → Head [support, confidence]”
 buys(x, “bread”) → buys(x, “milk”) [6%, 65%]
 major(x, “CS”) ∧ takes(x, “Database”) → grade(x, “5”) [1%, 75%]
Mining the Association Rules (cont.)
 How do we mine association rules?
 Input:
 A database of transactions.
 Each transaction is a list of items (e.g. those purchased by a customer in a visit).
 Find all rules that associate the presence of one set of items with that of another set of items.
 Example: “98% of people who purchase tires and auto accessories also get automotive services done”
 There are no restrictions on the number of items in the body of the rule.
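A common first step towards finding all such rules is to find the itemsets that occur often enough. The sketch below uses a level-wise, Apriori-style search, a well-known technique that the slides do not name, so it should be read as one possible approach rather than the thesis's algorithm. The transactions and the `frequent_itemsets` helper are invented for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidates of size 1
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that reach the support threshold.
        level = {}
        for candidate in current:
            hits = sum(1 for t in transactions if candidate <= t)
            if hits / n >= min_support:
                level[candidate] = hits / n
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


transactions = [
    {"tires", "accessories", "service"},
    {"tires", "accessories", "service"},
    {"tires", "service"},
    {"accessories"},
    {"tires", "accessories", "service", "oil"},
]
frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Rules are then formed by splitting each frequent itemset into a body and a head and keeping the splits whose confidence is high enough.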
Data Mining Query Language (DMQL)
What is a Data Mining Query Language?
 Data Mining Query Language (DMQL): supports the iterative nature of the KDD process. Once the discovered knowledge has been presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results.
Types of discovered patterns by DMQL
 Characterization: Data characterization is a summarization of general
features of objects in a target class, and produces what is called characteristic
rules.
 Discrimination: Data discrimination produces what are called discriminant
rules and is basically the comparison of the general features of objects
between two classes referred to as the target class and the contrasting class.
 Association analysis: Association analysis is the discovery of what are
commonly called association rules.
 Classification: Classification analysis is the organization of data in given
classes.
 Prediction: Prediction has attracted considerable attention given the potential
implications of successful forecasting in a business context.
 Clustering: Clustering is the organization of data into classes whose labels are not known in advance.
 Outlier analysis: Outliers are data elements that cannot be grouped in a given
class or cluster.
 Evolution and deviation analysis: Evolution and deviation analysis pertain to the study of time-related data that changes over time.
Knowledge Discovery Query Language (KDQL)
What is KDQL in principle?
 Knowledge Discovery Query Language (KDQL) is a KDD query language proposed for the ODBC_KDD (2) model. It mines association rules in databases (e.g. a DBMS or a relational database) and then visualizes the discovered results in different chart forms (e.g. 2D and 3D). KDQL had not previously been implemented under this name. In KDQL we combine KDD technology and data visualization with the demand for a query language for DM tasks. This leads us to develop a language tool that can handle both approaches in one session.
(Figure labels: Request; Data; Data to Visualize; Visualization Tool; Database Management System (DBMS))
Visualization techniques for DMQL
(Figure labels: Data Mining Query Language (DMQL); Visualization Tools; Database Management System (DBMS))
I-Extended Databases (I-ED)
Motivation
 I-Extended Database: a database that, in addition to data, also contains intensionally defined generalizations about the data. An I-extended database has properties similar to those of an inductive database. We formalize this concept and show how it can be used throughout the whole process of DM due to the closure property of the framework.
 The basic message of the I-extended database is as follows:
 An I-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.
 An I-extended database can be queried (in principle) just by using normal relational algebra or SQL, with the added property of being able to refer to the values of the evaluation function on the patterns.
 Modeling KDD processes as a sequence of queries on an I-extended database gives rise to chances for reasoning about and optimizing these processes.
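The idea of a database that holds data, patterns and an evaluation function side by side can be sketched with ordinary SQL. The example below is an assumption-laden illustration, not the thesis's I-ED implementation: it uses Python's sqlite3 module, invented `purchase` and `pattern` tables, a comma-separated encoding of itemsets, and a user-defined SQL function named `support` as the evaluation function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The "normal database": one row per (transaction, item).
conn.execute("CREATE TABLE purchase (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "cheese"),
        (3, "bread"),
        (4, "milk"), (4, "cheese"),
    ],
)

# The pattern part: itemsets stored as comma-separated strings (illustrative encoding).
conn.execute("CREATE TABLE pattern (items TEXT)")
conn.executemany(
    "INSERT INTO pattern VALUES (?)",
    [("bread",), ("bread,milk",), ("cheese,milk",), ("bread,cheese",)],
)

# Snapshot of the transactions, used by the evaluation function.
baskets = {}
for tid, item in conn.execute("SELECT tid, item FROM purchase"):
    baskets.setdefault(tid, set()).add(item)


def support(items):
    """Evaluation function: fraction of transactions containing the whole itemset."""
    wanted = set(items.split(","))
    return sum(1 for basket in baskets.values() if wanted <= basket) / len(baskets)


# Make the evaluation function callable from SQL.
conn.create_function("support", 1, support)

rows = conn.execute(
    "SELECT items, support(items) FROM pattern "
    "WHERE support(items) >= 0.5 ORDER BY items"
).fetchall()
print(rows)
```

The final query is plain SQL over the pattern table, yet it filters on the value of the evaluation function, which is the property the list above describes.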
Implementation of KDQL
Motivation of KDQL
 The background of KDQL comes from the Structured Query Language (SQL), since several extensions to SQL have been proposed to serve as a Data Mining Query Language (DMQL).
SQL + DM (rules) is the appropriate form for this task at the user interface.
DM (rules) relies on association rules to interact with the I-extended database. The association rules are obtained through KDQL rules, and the results are represented graphically in 2D and 3D charts.
Architecture of KDQL (figure)
Example of KDQL
 For example, the rule {cheese, coke} ==> bread states that if cheese and coke are bought together in a transaction, bread is also bought in the same transaction. In this association rule, the body is a set of items and the head is a single item. The rule {cheese, coke} ==> cheese is not interesting because it is a tautology: if the head is implied by the body, the rule does not provide new information. This problem has the following formulation:
 KDQL RULE Associations AS
SELECT DISTINCT 1..n item AS BODY,
1..1 item AS HEAD,
SUPPORT, CONFIDENCE
FROM Purchase
GROUP BY transaction
EXTRACTING RULES WITH SUPPORT: 0.1,
CONFIDENCE: 0.2
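What the statement asks for can be imitated in a few lines of Python. This is a hedged sketch of the semantics only, not the KDQL engine: the `purchase` rows and the `extract_rules` function are invented, and the search is a naive enumeration of every body with a single-item head.

```python
from collections import defaultdict
from itertools import combinations

# Invented Purchase rows: (transaction, item).
purchase = [
    (1, "cheese"), (1, "coke"), (1, "bread"),
    (2, "cheese"), (2, "coke"), (2, "bread"),
    (3, "cheese"), (3, "coke"),
    (4, "bread"),
    (5, "coke"),
]


def extract_rules(rows, min_support, min_confidence):
    """Return (body, head, support, confidence) for bodies of 1..n items and heads of 1 item."""
    # GROUP BY transaction: collect the items of each transaction.
    groups = defaultdict(set)
    for tid, item in rows:
        groups[tid].add(item)
    baskets = list(groups.values())
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def count(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    rules = []
    for size in range(1, len(items)):
        for body in combinations(items, size):
            body_set = frozenset(body)
            body_count = count(body_set)
            if body_count == 0:
                continue
            for head in items:
                if head in body_set:
                    continue  # skip tautologies such as {cheese, coke} ==> cheese
                both = count(body_set | {head})
                support, confidence = both / n, both / body_count
                if support >= min_support and confidence >= min_confidence:
                    rules.append((body, head, support, confidence))
    return rules


rules = extract_rules(purchase, min_support=0.1, min_confidence=0.2)
for body, head, s, c in rules:
    print(set(body), "==>", head, round(s, 2), round(c, 2))
```

On this data the rule {cheese, coke} ==> bread is reported with support 2/5 and confidence 2/3, and no rule repeats a body item in its head.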
KDQL rules operator
 < KDQL_RULES_OP > := KDD RULES < TableName > AS
SELECT DISTINCT < Body_Description_KDQL >, < Head_Description_KDQL >
[,SUPPORT] [,CONFIDENCE]
[WHERE < WhereClause >]
FROM < FromList > [WHERE < WhereClause >]
GROUP BY < Attribute > < AttributeList >
[HAVING < HavingClause >]
[CLUSTER BY < Attribute > < AttributeList >
[HAVING < HavingClause >]]
EXTRACTING RULES WITH SUPPORT: < real >,
CONFIDENCE: < real >
 < Body_Description_KDQL > := [< Cardinality_Shape >] < AttrName > < AttrList > AS BODY
/* default cardinality shape for the Body: 1..n */
< Head_Description_KDQL > := [< Cardinality_Shape >] < AttrName > < AttrList > AS HEAD
/* default cardinality shape for the Head: 1..1 */
< Cardinality_Shape > := < Number > .. (< Number > | n)
< AttributeList > := {< AttributeName >, < AttributeName >, … < AttributeName >}
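As a small illustration of the grammar, the cardinality shape production can be parsed with a regular expression. The `parse_cardinality` helper below is hypothetical and covers only the `<Number> .. (<Number> | n)` form; representing the unbounded upper limit `n` as `None` is a choice made for this sketch.

```python
import re

# <Number> .. (<Number> | n), with optional spaces around the dots.
_SHAPE = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+|n)\s*$")


def parse_cardinality(text, default=(1, None)):
    """Parse a cardinality shape into (low, high); high is None for the unbounded 'n'.

    An omitted shape falls back to `default`, e.g. (1, None) for the body
    (1..n) or (1, 1) for the head (1..1).
    """
    if text is None or not text.strip():
        return default
    match = _SHAPE.match(text)
    if match is None:
        raise ValueError(f"not a cardinality shape: {text!r}")
    low = int(match.group(1))
    high = None if match.group(2) == "n" else int(match.group(2))
    if high is not None and high < low:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return low, high


print(parse_cardinality("1..n"))
print(parse_cardinality("1..1"))
print(parse_cardinality(None, default=(1, 1)))
```

A full parser would also have to handle the SELECT, FROM, GROUP BY, CLUSTER BY and EXTRACTING clauses; this fragment only shows how the body and head defaults from the comments above can be applied.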
Conclusion
 KDQL is a part of the ODBC_KDD (2) model.
 KDQL calls the I-extended database via an ODBC connection.
 The I-extended database retrieves all the requested information from traditional databases via ODBC.
 KDQL was implemented to handle DM tasks with visualization.
 Visualization techniques can be used to visualize interesting association rules discovered from the databases.
Results
The major results of the thesis work are summarized as follows.
 We proposed a new remote access KDD model, ODBC_KDD (2), that can deliver results with more detailed description, such as visualization, scripts, statistical inferences and more.
 We proposed and implemented a database concept called I-extended database (I-ED), which is maintained and accelerated through the Knowledge Discovery Query Language (KDQL).
 Within the ODBC_KDD (2) model we proposed a query language called KDQL. KDQL was designed to interact with the conceptual database called I-extended database. KDQL is a new KDD query language that can discover association rules.
 We use visualization tools in KDQL to represent the retrieved results in different 2D and 3D visual forms such as pie, point, line and bar charts.
 We use the support and confidence of data items to locate the important association rules in the databases, by means of the I-extended database established by KDQL.
Appendix A, B
Appendix A
 We introduced the proposed syntax of the KDQL statement rules.
Appendix B (Images from the program)
Dedications and Acknowledgments
• First I want to thank my wife Emaan Zubi for her understanding and for making the last steps of writing this dissertation enjoyable, and also my kids Yhaia, Mohamed and Suliman for being nice kids while I was doing this work.
• My parents: my father, Suliman Zubi, and my mother, Memona Yousef.
• I would like to thank Dr. Fazekas Gábor for accepting me as a Ph.D. student under his supervision. I would also like to thank him for his continuous encouragement, confidence and support, for reviewing the text of this thesis, and for sharing with me his knowledge and love of this field.
• My senior supervisor Prof. Dr. Arató Mátyás for his encouragement.
• Dr. Kormos Janos, my teacher and friend, for his insightful comments, advice and help.
• Dr. Bajalinov Erik for the frequent constructive discussions regarding programming in Delphi.
• My deepest thanks to Dr. Varga Katalin and Dr. Várterész Magdolna for refereeing my Ph.D. dissertation work.
• Mr. Basheer Nassain, the Libyan student advisor, and Mr. Khalid Zintaney of the financial office in the Libyan Embassy, Budapest, for their support.
• All people in this committee.
• Finally I want to thank all my friends and people in the Institute of Mathematics and Informatics, Debrecen University.
Thank you!!!
39
40

More Related Content

PDF
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION
PPT
PDF
Privacy Preserving Clustering on Distorted data
PPT
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
PDF
New approaches with chord in efficient p2p grid resource discovery
PDF
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
PDF
A new hybrid algorithm for business intelligence recommender system
PDF
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION
Privacy Preserving Clustering on Distorted data
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
New approaches with chord in efficient p2p grid resource discovery
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
A new hybrid algorithm for business intelligence recommender system
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...

What's hot (20)

PPT
Edi text
PDF
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
PPTX
WP4-QoS Management in the Cloud
PDF
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
PDF
Ijsws14 423 (1)-paper-17-normalization of data in (1)
PDF
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
PDF
10420140501003
PDF
Indexing based Genetic Programming Approach to Record Deduplication
PDF
Term Frequency and its Variants in Retrieval Models
PDF
Linked open data it univ 22 nov 2012
PDF
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
PDF
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
PDF
51 privacy-preserving-publication-of-set-valued-data
PDF
1699 1704
PDF
Hu3414421448
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
Cluster Based Access Privilege Management Scheme for Databases
PDF
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Edi text
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
WP4-QoS Management in the Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Ijsws14 423 (1)-paper-17-normalization of data in (1)
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
10420140501003
Indexing based Genetic Programming Approach to Record Deduplication
Term Frequency and its Variants in Retrieval Models
Linked open data it univ 22 nov 2012
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
51 privacy-preserving-publication-of-set-valued-data
1699 1704
Hu3414421448
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Cluster Based Access Privilege Management Scheme for Databases
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Ad

Viewers also liked (7)

PPT
Ismail&&ziko 2003
PPT
Knowledge Discovery Query Language (KDQL)
PPT
I- Extended Databases
PPT
Arabic Text mining Classification
PPT
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
PPT
Using Data Mining Techniques to Analyze Crime Pattern
PPTX
Data mining
Ismail&&ziko 2003
Knowledge Discovery Query Language (KDQL)
I- Extended Databases
Arabic Text mining Classification
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
Using Data Mining Techniques to Analyze Crime Pattern
Data mining
Ad

Similar to Knowledge Discovery in Remote Access Databases (20)

PDF
Association rule visualization technique
PDF
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
PPTX
kdd vs database. For data mining btech pptx
PPTX
Data mining
PDF
data mining
PDF
G045033841
PPTX
Data mining
PPTX
Data mining
PDF
DM-Unit-1-Part 1-R.pdf
PPT
1328cvkdlgkdgjfdkjgjdfgdfkgdflgkgdfglkjgld8679 - Copy.ppt
PPT
Introduction to Data Mining
PDF
lec1.pdf
PDF
Advancing Knowledge Discovery and Data Mining
PDF
Data Mining Concepts - A survey paper
PPTX
Data Mining in Operating System
PPT
Unit 3 part i Data mining
PPTX
Lect 1 2 Data Mining.pptx for the predictive ananlysis
PPTX
An Introduction to Data Mining
PPTX
Introduction to Data Mining and Data Warehousing
PDF
Data Mining and its detail processes with steps
Association rule visualization technique
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
kdd vs database. For data mining btech pptx
Data mining
data mining
G045033841
Data mining
Data mining
DM-Unit-1-Part 1-R.pdf
1328cvkdlgkdgjfdkjgjdfgdfkgdflgkgdfglkjgld8679 - Copy.ppt
Introduction to Data Mining
lec1.pdf
Advancing Knowledge Discovery and Data Mining
Data Mining Concepts - A survey paper
Data Mining in Operating System
Unit 3 part i Data mining
Lect 1 2 Data Mining.pptx for the predictive ananlysis
An Introduction to Data Mining
Introduction to Data Mining and Data Warehousing
Data Mining and its detail processes with steps

More from Zakaria Zubi (7)

PPT
applyingwebminingapplicationforuserbehaviorunderstanding-131215105223-phpapp0...
PPT
A Comparative Study of Data Mining Methods to Analyzing Libyan National Crime...
PPT
Applying web mining application for user behavior understanding
PPT
PPT
Ibtc dwt hybrid coding of digital images
PPT
Deep Web mining
PPT
Information communication technology in libya for educational purposes
applyingwebminingapplicationforuserbehaviorunderstanding-131215105223-phpapp0...
A Comparative Study of Data Mining Methods to Analyzing Libyan National Crime...
Applying web mining application for user behavior understanding
Ibtc dwt hybrid coding of digital images
Deep Web mining
Information communication technology in libya for educational purposes

Recently uploaded (20)

PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Spectroscopy.pptx food analysis technology
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Approach and Philosophy of On baking technology
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPT
Teaching material agriculture food technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Machine learning based COVID-19 study performance prediction
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Cloud computing and distributed systems.
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
MIND Revenue Release Quarter 2 2025 Press Release
“AI and Expert System Decision Support & Business Intelligence Systems”
Spectroscopy.pptx food analysis technology
Agricultural_Statistics_at_a_Glance_2022_0.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Mobile App Security Testing_ A Comprehensive Guide.pdf
Approach and Philosophy of On baking technology
sap open course for s4hana steps from ECC to s4
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Teaching material agriculture food technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
Machine learning based COVID-19 study performance prediction
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Encapsulation_ Review paper, used for researhc scholars
Network Security Unit 5.pdf for BCA BBA.
Cloud computing and distributed systems.
The AUB Centre for AI in Media Proposal.docx
Reach Out and Touch Someone: Haptics and Empathic Computing
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy

Knowledge Discovery in Remote Access Databases

  • 1. Knowledge Discovery inKnowledge Discovery in Remote Access DatabasesRemote Access Databases A thesis submitted in partial fulfillment of the requirements for the degree ofA thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer ScienceDoctor of Philosophy in Computer Science at the Institute of Mathematics and Computer Science Informaticsat the Institute of Mathematics and Computer Science Informatics Debrecen of UniversityDebrecen of University By Zakaria Suliman ZubiBy Zakaria Suliman Zubi Supervised by Prof. Arato Matyas andSupervised by Prof. Arato Matyas and Prof.Fazekas GáborProf.Fazekas Gábor
  • 2. 2 Overview of the ThesisOverview of the Thesis  Part I  Introduction to Knowledge Discovery in Databases ( KDD) and Data Mining (DM).  Goal of the Thesis Work.  Part 2  Remote Access KDD models.  Logical Foundation in Data Mining.  Mining the Discovered Association Rules.  Data Mining Query Languages.  Part 3  Knowledge Discovery Query Language ( KDQL).  I-extended Databases (I-ED).  Implementation of KDQL.  Conclusion.  Appendix A , B.
  • 3. 3 Introduction to KDDIntroduction to KDD and DMand DM  KDD is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.  DM is a single step in KDD process which deals with extracting trends or patterns from raw databases and carefully and accurately transforms them into useful and understandable information.  In the introduction part (chapter 1) I will follow the structure of expressing the History, Importance, Appearances and Tools for KDD and DM in all sections of the introduction part in this thesis. Is a phase in which noise data and irrelevant data are removed from the collection. Multiple data sources, often heterogeneous, may be combined in a common source. The data relevant to the analysis is decided on and retrieved from the data collection. It is a phase in which the selected data is transformed into forms appropriate for the mining procedure. It is the crucial step in which clever techniques are applied to extract patterns potentially useful information. Strictly interesting patterns representing knowledge are identified based on a given measures. In the final phase in which the discovered knowledge is visually represented to the user. KDD process
  • 4. 4 Introduction to KDDIntroduction to KDD and DMand DM KDD & DM shared with several topic
  • 5. 5 Introduction to KDDIntroduction to KDD and DMand DM  Access to databases was established via Open Database Connectivity (ODBC) .  Querying the databases can be maintained by Structured Query Language (SQL). The aim of using SQL is to allow users to define the data in databases and manipulate that data (adding, deleting and retrieving ) it from raw databases.  Using Data Visualization to represent Data Mining results.
  • 6. 6 Overview of the ThesisOverview of the Thesis  Part I  Introduction to Knowledge Discovery in Databases ( KDD) and Data Mining (DM).  Goal of the Thesis Work.  Part 2  Remote Access KDD models.  Logical Foundation in Data Mining.  Mining the Discovered Association Rules.  Data Mining Query Languages.  Part 3  Knowledge Discovery Query Language ( KDQL).  I-extended Databases (I-ED).  Implementation of KDQL.  Conclusion.  Appendix A , B.
  • 7. Goal of the Thesis Work
      • We investigated the problem of matching a DM problem with the set of DM algorithms suitable for solving it.
      • We use visualization, integrated with algorithmic approaches, to tune the parameters of DM algorithms. Parameter selection has so far been explored only by algorithmic approaches; visualization supports it in a more systematic way than using default values or setting parameter values without clues.
      • We introduce visualization to provide expressive information about induced models and statistical entities, and to support the interactive and dynamic exploration of induced models for DM.
  • 9. Remote Access KDD Models
      • Figure: connection between KDD and ODBC.
  • 13. Logical Foundation in Data Mining (LFDM)
      • Advantages of a logical foundation in data mining:
          • Expressiveness: first-order logic can represent more complex concepts than traditional attribute-value languages.
          • Readability: formulae are easier to read than decision trees or a set of linear equations.
          • Background knowledge: background knowledge can grow during discovery time, for example in time series.
          • Multiple tables: multiple database tables can be handled without explicit and expensive joins.
          • Deductive databases: logical discovery engines can be transparently linked to relational databases via deductive databases.
      • Disadvantages of a logical foundation in data mining:
          • Language complexity: first-order hypotheses are usually constructed through heavy search (feasibility of discovery).
          • Database access times: checking a single candidate might involve heavy querying.
          • Number handling: logical approaches to discovery usually suffer from poor number-handling capabilities.
  • 14. Logical Foundation in Data Mining (LFDM): Translating first-order queries into SQL
      • A natural-language question such as "find all employees who earn more than 1,000,000 HUF a year and more than their manager" can be written as a first-order rule:
          expensive_employee(Name) ← employee(Name, Salary1, Manager), Salary1 > 1000000, employee(Manager, Salary2, _), Salary1 > Salary2
      • The rule translates into the SQL query:
          SELECT employee_0.NAME
          FROM employee employee_0, employee employee_1
          WHERE employee_0.SALARY > 1000000
            AND employee_1.NAME = employee_0.MANAGER
            AND employee_0.SALARY > employee_1.SALARY
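To see what the translated query returns, the following sketch runs the SQL from this slide against a small in-memory table. The `sqlite3` module is used only for convenience, and the column types and the four sample employees are assumptions made for illustration.

```python
import sqlite3

# The SQL statement from the slide, unchanged apart from line breaks.
QUERY = """
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000
  AND employee_1.NAME = employee_0.MANAGER
  AND employee_0.SALARY > employee_1.SALARY
"""

# Hypothetical rows: (NAME, SALARY in HUF per year, MANAGER).
SAMPLE = [
    ("anna", 900000, None),      # top manager, has no manager of her own
    ("bela", 1200000, "anna"),   # above 1,000,000 and above anna
    ("csaba", 1100000, "bela"),  # above 1,000,000 but below bela
    ("dora", 950000, "anna"),    # above anna but not above 1,000,000
]


def expensive_employees(rows):
    """Load `rows` into an in-memory employee table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (NAME TEXT, SALARY INTEGER, MANAGER TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    names = sorted(row[0] for row in conn.execute(QUERY))
    conn.close()
    return names


if __name__ == "__main__":
    print(expensive_employees(SAMPLE))  # prints ['bela']
```

The self-join on `employee_1.NAME = employee_0.MANAGER` plays the role of the shared variable `Manager` in the first-order rule.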
  • 16. Mining the Discovered Association Rules: Association Rules
      • What is an association rule? Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items and let $D$, the task-relevant data, be a set of transactions, where each transaction $T = \{i_a, i_b, \ldots, i_t\}$ satisfies $T \subset I$. An association rule has the form $P \rightarrow Q$, where $P \subset I$, $Q \subset I$ and $P \cap Q = \emptyset$. The rule $P \rightarrow Q$ holds in $D$ with support $s$ and has confidence $c$ in the transaction set $D$:
          $\mathrm{Support}(P \rightarrow Q) = \mathrm{Probability}(P \cup Q)$
          $\mathrm{Confidence}(P \rightarrow Q) = \mathrm{Probability}(Q \mid P)$
      • Example: "In 80% of the cases when people buy bread, they also buy milk": bread ==> milk / 80%
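The two measures can be estimated from a list of transactions by counting. The sketch below is one straightforward way to do so; the function names and the six sample baskets are assumptions, chosen so that the rule bread ==> milk comes out at the 80% confidence quoted in the example. It assumes a non-empty list of transactions.

```python
from fractions import Fraction


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return Fraction(hits, len(transactions))


def confidence(transactions, body, head):
    """Estimate P(head | body) as support(body and head) / support(body)."""
    body_support = support(transactions, body)
    if body_support == 0:
        raise ValueError("body never occurs; confidence is undefined")
    return support(transactions, set(body) | set(head)) / body_support


# Hypothetical baskets: bread appears in 5 of 6, bread with milk in 4 of 6.
BASKETS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "coke"},
    {"bread"},
    {"coke"},
]

if __name__ == "__main__":
    print(support(BASKETS, {"bread", "milk"}))       # prints 2/3
    print(confidence(BASKETS, {"bread"}, {"milk"}))  # prints 4/5
```

Exact fractions are used instead of floats so that thresholds such as 80% can be compared without rounding error.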
  • 17. Mining the Discovered Association Rules: Mining the Association Rules
      • What is mining association rules? Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories, and selecting the most "interesting" rules based on their confidence factors. A rule P → Q holds in D with support s and has confidence c in the transaction set D.
      • Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
      • Examples:
          • Rule form: "Body → Head [support, confidence]"
          • buys(x, "bread") → buys(x, "milk") [6%, 65%]
          • major(x, "CS") ∧ takes(x, "Database") → grade(x, "5") [1%, 75%]
  • 18. Mining the Discovered Association Rules: Mining the Association Rules (cont.)
      • How do we mine association rules?
          • Input: a database of transactions, where each transaction is a list of items (e.g. those purchased by a customer in one visit).
          • Find all rules that associate the presence of one set of items with that of another set of items.
          • Example: "98% of people who purchase tires and auto accessories also get automotive services done".
          • There are no restrictions on the number of items in the body of the rule.
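Since the body of a rule may contain any number of items, a common first step is to find every itemset that occurs often enough to be worth turning into rules. The sketch below does this with a simplified level-wise search in the spirit of Apriori. It is not taken from the thesis: the function name, the `min_support` parameter and the sample visits are assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Level-wise search: candidates of size k are built only from items that
    still appear in some frequent itemset of size k - 1.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    result = {}
    k = 1
    while items:
        level = {}
        for candidate in combinations(items, k):
            candidate_set = frozenset(candidate)
            count = sum(1 for basket in baskets if candidate_set <= basket)
            if count / n >= min_support:
                level[candidate_set] = count / n
        if not level:
            break  # no frequent itemset of size k, so none of size k + 1
        result.update(level)
        # Items that dropped out at this level cannot occur in larger sets.
        items = sorted({item for itemset in level for item in itemset})
        k += 1
    return result


# Hypothetical visits, echoing the tires / accessories / service example.
VISITS = [
    {"tires", "accessories", "service"},
    {"tires", "accessories", "service"},
    {"tires", "accessories"},
    {"oil"},
]

if __name__ == "__main__":
    for itemset, value in sorted(
        frequent_itemsets(VISITS, 0.5).items(),
        key=lambda pair: (len(pair[0]), sorted(pair[0])),
    ):
        print(sorted(itemset), value)
```

Rules are then formed by splitting each frequent itemset into a body and a head and keeping the splits whose confidence is high enough.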
  • 20. Data Mining Query Language (DMQL): What is a Data Mining Query Language?
      • A Data Mining Query Language (DMQL) supports KDD as an iterative process. Once the discovered knowledge is presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results.
  • 21. Data Mining Query Language (DMQL): Types of patterns discovered by DMQL
      • Characterization: data characterization is a summarization of the general features of objects in a target class, and produces what are called characteristic rules.
      • Discrimination: data discrimination produces what are called discriminant rules; it is basically a comparison of the general features of objects between two classes, referred to as the target class and the contrasting class.
      • Association analysis: the discovery of what are commonly called association rules.
      • Classification: the organization of data in given classes.
      • Prediction: prediction has attracted considerable attention given the potential implications of successful forecasting in a business context.
      • Clustering: the organization of data in classes.
      • Outlier analysis: outliers are data elements that cannot be grouped in a given class or cluster.
      • Evolution and deviation analysis: the study of time-related data that changes over time.
  • 23. Knowledge Discovery Query Language (KDQL): What is KDQL in principle?
      • Knowledge Discovery Query Language (KDQL) is a KDD query language proposed for the ODBC_KDD(2) model. It mines association rules in databases (e.g. a DBMS or a relational database) and then visualizes the discovered results in different chart forms (e.g. 2D and 3D).
      • KDQL was not yet implemented as such.
      • KDQL combines KDD technology and data visualization in response to the demand for a query language for DM tasks. This leads us to develop a language tool that can handle both approaches in one session.
      • Figure labels: Request; Data; Data to Visualize; Visualization Tool; Database Management System (DBMS).
  • 24. Knowledge Discovery Query Language (KDQL): Visualization techniques for DMQL
      • Figure labels: Data Mining Query Language (DMQL); Visualization Tools; Database Management System (DBMS).
  • 26. I-Extended Databases (I-ED): Motivation
      • I-extended database: a database that, in addition to data, also contains exceedingly defined generalizations about the data. An I-extended database has properties similar to those of an inductive database. We formalize this concept and show how it can be used throughout the whole DM process, thanks to the closure property of the framework.
      • The basic message of the I-extended database is as follows:
          • An I-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.
          • An I-extended database can be queried (in principle) using just normal relational algebra or SQL, with the added ability to refer to the values of the evaluation function on the patterns.
          • Modeling KDD processes as a sequence of queries on an I-extended database opens up opportunities for reasoning about and optimizing these processes.
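The three ingredients named above (a normal database, a set of patterns, and an evaluation function) can be mocked up with ordinary tables. The sketch below is only one possible reading of the idea, not the thesis's design: the table names `purchase`, `pattern` and `evaluation`, the restriction to single-item bodies and heads, and the sample rows are all assumptions. It assumes every pattern body occurs at least once in the data.

```python
import sqlite3

# Hypothetical normal database: (transaction id, item).
PURCHASES = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"),
             (3, "bread"), (4, "milk"), (4, "coke")]
# Hypothetical patterns: (pattern id, body item, head item).
PATTERNS = [(1, "bread", "milk"), (2, "milk", "bread"), (3, "coke", "milk")]


def build_iedb():
    """Create the data, the patterns and their evaluation as three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchase (tid INTEGER, item TEXT)")
    conn.execute(
        "CREATE TABLE pattern (pid INTEGER PRIMARY KEY, body TEXT, head TEXT)"
    )
    conn.execute(
        "CREATE TABLE evaluation (pid INTEGER, support REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO purchase VALUES (?, ?)", PURCHASES)
    conn.executemany("INSERT INTO pattern VALUES (?, ?, ?)", PATTERNS)

    # Evaluation function: how often each pattern occurs in the data.
    baskets = {}
    for tid, item in PURCHASES:
        baskets.setdefault(tid, set()).add(item)
    n = len(baskets)
    for pid, body, head in PATTERNS:
        with_body = sum(1 for b in baskets.values() if body in b)
        with_both = sum(1 for b in baskets.values() if body in b and head in b)
        conn.execute(
            "INSERT INTO evaluation VALUES (?, ?, ?)",
            (pid, with_both / n, with_both / with_body),
        )
    return conn


def strong_patterns(conn, min_support, min_confidence):
    """Plain SQL that refers to the evaluation values of the patterns."""
    sql = """
        SELECT p.body, p.head
        FROM pattern p JOIN evaluation e ON e.pid = p.pid
        WHERE e.support >= ? AND e.confidence >= ?
        ORDER BY p.pid
    """
    return conn.execute(sql, (min_support, min_confidence)).fetchall()


if __name__ == "__main__":
    db = build_iedb()
    print(strong_patterns(db, 0.5, 0.6))
    db.close()
```

Because the query result is again a relation, further queries can be applied to it, which is one way to read the closure property mentioned on the slide.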
  • 28. Implementation of KDQL: Motivation of KDQL
      • The background of KDQL is the Structured Query Language (SQL), since several extensions to SQL have been proposed to serve as a Data Mining Query Language (DMQL).
      • SQL + DM (rules) is the appropriate form for this task at the user interface. DM (rules) is based on association rules and interacts with the I-extended database.
      • The association rules are obtained by means of KDQL rules, and the results are represented graphically in 2D and 3D charts.
  • 29. Implementation of KDQL: Architecture of KDQL
      • Figure: architecture of KDQL.
  • 30. Implementation of KDQL: Example of KDQL
      • For example, the rule {cheese, coke} ==> bread states that if cheese and coke are bought together in a transaction, bread is also bought in the same transaction. In this association rule, the body is a set of items and the head is a single item.
      • The rule {cheese, coke} ==> cheese is not interesting because it is a tautology: if the head is implied by the body, the rule provides no new information.
      • This problem has the following formulation:
          KDQL RULE Associations AS
          SELECT DISTINCT 1..n item AS BODY, 1..1 item AS HEAD, SUPPORT, CONFIDENCE
          FROM Purchase
          GROUP BY transaction
          EXTRACTING RULES WITH SUPPORT: 0.1, CONFIDENCE: 0.2
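The statement above can be read as: group the rows of Purchase by transaction, then report every rule with a body of one or more items and a head of exactly one item whose support and confidence reach the given thresholds. The sketch below emulates that reading by brute force in Python. It is not the KDQL implementation: the function name, the `(transaction, item)` row layout and the sample rows are assumptions for illustration.

```python
from itertools import combinations


def extract_rules(purchase_rows, min_support, min_confidence):
    """Emit (body, head, support, confidence) for bodies of 1..n items and
    heads of exactly one item that does not already occur in the body."""
    baskets = {}
    for transaction, item in purchase_rows:  # GROUP BY transaction
        baskets.setdefault(transaction, set()).add(item)
    groups = [frozenset(b) for b in baskets.values()]
    n = len(groups)

    def count(itemset):
        return sum(1 for group in groups if itemset <= group)

    items = sorted(set().union(*groups))
    rules = set()  # SELECT DISTINCT
    for size in range(1, len(items)):
        for body in combinations(items, size):
            body_set = frozenset(body)
            body_count = count(body_set)
            if body_count == 0:
                continue
            for head in items:
                if head in body_set:  # tautology such as {cheese} ==> cheese
                    continue
                both = count(body_set | {head})
                support = both / n
                confidence = both / body_count
                if support >= min_support and confidence >= min_confidence:
                    rules.add((body, head, support, confidence))
    return sorted(rules)


# Hypothetical Purchase rows: (transaction, item).
PURCHASE = [
    (1, "cheese"), (1, "coke"), (1, "bread"),
    (2, "cheese"), (2, "coke"), (2, "bread"),
    (3, "cheese"), (3, "coke"),
    (4, "bread"),
]

if __name__ == "__main__":
    for body, head, s, c in extract_rules(PURCHASE, 0.1, 0.2):
        print(set(body), "==>", head, round(s, 2), round(c, 2))
```

Enumerating every possible body is exponential in the number of items, so this only suits toy data; it is meant to make the semantics of the statement concrete, not to be efficient.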
  • 31. Implementation of KDQL: KDQL rules operator
      < KDQL_RULES_OP > := KDD RULES < TableName > AS
          SELECT DISTINCT < BodyDescr >, < HeadDescr > [,SUPPORT] [,CONFIDENCE]
          [WHERE < WhereClause >]
          FROM < FromList > [WHERE < WhereClause >]
          GROUP BY < Attribute > < AttributeList > [HAVING < HavingClause >]
          [CLUSTER BY < Attribute > < AttributeList > [HAVING < HavingClause >]]
          EXTRACTING RULES WITH SUPPORT: < real >, CONFIDENCE: < real >
      < Body_Description_KDQL > := [< Cardinaly_Sheap >] < AttrName > < AttrList > AS BODY
          /* default cardinality shape for the Body: 1..n */
      < Head_Description_KDQL > := [< Cardinaly_Sheap >] < AttrName > < AttrList > AS HEAD
          /* default cardinality shape for the Head: 1..1 */
      < Cardinaly_Sheap > := < Number > .. (< Number > | n)
      < AttributeList > := {< AttributeName >, < AttributeName >, … < AttributeName >}
  • 33. Conclusion
      • KDQL is a part of the ODBC_KDD (2) model.
      • KDQL calls the I-extended database via an ODBC connection.
      • The I-extended database retrieves all the requested information from traditional databases via ODBC.
      • KDQL was implemented to handle DM tasks with visualization.
      • Visualization techniques can be used to visualize interesting association rules discovered from the databases.
  • 34. Results
      The major results of the thesis work are summarized as follows.
      • Proposing a new remote access KDD model, called ODBC_KDD (2), that delivers results with a more detailed description, such as visualization, scripts, statistical inferences and more.
      • Proposing and implementing a database concept, called the I-extended database (I-ED), which is maintained and accelerated by the use of the Knowledge Discovery Query Language (KDQL).
      • Proposing, within the ODBC_KDD (2) model, a query language called KDQL. KDQL interacts with the conceptual database called the I-extended database. It is a new KDD query language that can discover association rules.
      • Using visualization tools in KDQL to represent the retrieved results in different 2D and 3D visual forms such as pie, point, line and bar charts.
      • Using the support and confidence of data items to locate the important association rules in the databases, by means of the I-extended database accessed through KDQL.
  • 36. Appendix A, B
      • Appendix A: we introduced the proposed syntax of the KDQL statement rules.
      • Appendix B: images from the program.
  • 37. Dedications and Acknowledgments
      • First I want to thank my wife Emaan Zubi for her understanding and for making the last steps of writing this dissertation enjoyable, and also my kids Yhaia, Mohamed and Suliman for being nice kids while I was doing this work.
      • My parents: my father Suliman Zubi and my mother Memona Yousef.
      • I would like to thank Dr. Fazekas Gábor for accepting me as a Ph.D. student under his supervision. I would also like to thank him for his continuous encouragement, confidence and support, for reviewing the text of this thesis, and for sharing with me his knowledge and love of this field.
      • My senior supervisor Prof. Dr. Arató Mátyás, for his encouragement.
      • Dr. Kormos Janos, my teacher and friend, for his insightful comments, advice and help.
      • Dr. Bajalinov Erik, for the frequent constructive discussions regarding programming in Delphi.
      • My deepest thanks to Dr. Varga Katalin and Dr. Várterész Magdolna for refereeing my Ph.D. dissertation work.
      • Mr. Basheer Nassain, the Libyan student advisor, and Mr. Khalid Zintaney of the financial office at the Libyan Embassy, Budapest, for their support.
      • All people in this committee.
      • Finally, I want to thank all my friends and the people in the Institute of Mathematics and Informatics, Debrecen University.