CodeCritics Applied to Database Schema:
Challenges and First Results
Julien Delplanque1,2
Anne Etien2
Olivier Auverlot2
Tom Mens1
Nicolas Anquetil2
Stéphane Ducasse2
1Université de Mons, Belgique
julien.delplanque@student.umons.ac.be
tom.mens@umons.ac.be
2Université de Lille, CNRS, Inria, Centrale Lille,
UMR 9189 - CRIStAL,
F-59000 Lille, France
{nom.prenom}@univ-lille1.fr
Use Case Scenario I
Smells detection
DBAs need tools to highlight smells, anti-patterns and
violations of business rules.
Rule = a property that the database should have
• Generic rules
e.g., foreign keys reference primary keys
• Company or database-specific rules
e.g., ensure that naming conventions are respected
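As an illustration of "rule = a property that the database should have", the sketch below encodes one generic rule and one company-specific rule as plain functions over a tiny in-memory schema model. The `Column`, `ForeignKey` and `Table` classes, the snake_case convention and the sample schema are all assumptions made for this example; they are not the DBCritics model.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    is_primary_key: bool = False


@dataclass
class ForeignKey:
    source_column: str  # column in the referencing table
    target_table: str
    target_column: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)


def fk_not_referencing_pk(tables):
    """Generic rule: report foreign keys whose target column is not a primary key."""
    by_name = {t.name: t for t in tables}
    violations = []
    for table in tables:
        for fk in table.foreign_keys:
            target = by_name.get(fk.target_table)
            pk_columns = {c.name for c in target.columns if c.is_primary_key} if target else set()
            if fk.target_column not in pk_columns:
                violations.append((table.name, fk.source_column))
    return violations


def badly_named_tables(tables, pattern=r"[a-z][a-z0-9_]*"):
    """Specific rule: report tables breaking a (hypothetical) snake_case convention."""
    return [t.name for t in tables if not re.fullmatch(pattern, t.name)]


# Hypothetical schema: "Invoice" breaks both rules.
schema = [
    Table("person", [Column("id", is_primary_key=True), Column("email")]),
    Table("Invoice", [Column("id", is_primary_key=True), Column("owner")],
          [ForeignKey("owner", "person", "email")]),
]
print(fk_not_referencing_pk(schema))  # [('Invoice', 'owner')]
print(badly_named_tables(schema))     # ['Invoice']
```

Keeping each rule as a function that returns the violating entities makes it easy to add database-specific rules next to the generic ones.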
Use Case Scenario II
DBMS version migration
DBMSs evolve to introduce new features or to fix bugs.
• Upgrade migration patches are rarely provided
• Sometimes a textual change log is provided
• DBAs need to identify the migration impact
Use Case Scenario III
Maintaining consistency
A DB schema may be used as a basis for multiple projects.
• Need to integrate the changes to benefit from updates to the original schema
• The consistency of the DB should be preserved after an update
Additionally...
• All kinds of entities (tables, columns, views, functions,
...) and the relationships between them are potentially
subject to quality defects
• Checking for domain-specific or system-specific rules
provides better defect prevention
• Automatic detection of quality problems is important but
resolving them is the ultimate goal
• Resolving an issue on an entity may imply changes on
other entities
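The last point can be pictured as a reachability question on a dependency graph: fixing one entity may require revisiting everything that depends on it, directly or indirectly. The sketch below is only an illustration; the entity names and the `dependents` mapping are hypothetical, and nothing here is taken from the tool itself.

```python
from collections import deque

# Hypothetical dependency edges: entity -> entities that directly depend on it.
dependents = {
    "person.email": ["active_users", "notify_user()"],
    "active_users": ["monthly_report"],
    "notify_user()": [],
    "monthly_report": [],
}


def impacted_by(entity, dependents):
    """Return every entity that transitively depends on `entity`, breadth-first."""
    seen, queue, order = {entity}, deque([entity]), []
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(impacted_by("person.email", dependents))
# ['active_users', 'notify_user()', 'monthly_report']
```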
Table of contents
1 Introduction
2 DBCritics
3 Case Studies
Overview
⇒ Apply traditional Software Quality Analysis methods to
database schemas
Examples of rules
1 Detect use of * in SELECT queries
2 View using another view
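The two example rules can be approximated on raw view definitions as follows. This is a rough sketch under strong assumptions: the view names and SQL strings are hypothetical, and regular expressions stand in for a real SQL parser, so forms such as `SELECT DISTINCT *`, quoted identifiers or schema-qualified names are not handled.

```python
import re

# Hypothetical view definitions, keyed by view name.
views = {
    "active_users": "SELECT * FROM person WHERE active",
    "user_emails": "SELECT id, email FROM active_users",
    "user_count": "SELECT COUNT(*) FROM person",
}

# A bare * or alias.* directly after SELECT or a comma; COUNT(*) does not match.
STAR = re.compile(r"(?:\bSELECT|,)\s*(?:\w+\.)?\*", re.IGNORECASE)
# The identifier following FROM or JOIN.
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)", re.IGNORECASE)


def views_using_star(views):
    """Rule 1: views whose definition selects with *."""
    return sorted(name for name, sql in views.items() if STAR.search(sql))


def views_using_views(views):
    """Rule 2: map each view to the other views it reads from."""
    result = {}
    for name, sql in views.items():
        used = sorted((set(SOURCE.findall(sql)) & set(views)) - {name})
        if used:
            result[name] = used
    return result


print(views_using_star(views))   # ['active_users']
print(views_using_views(views))  # {'user_emails': ['active_users']}
```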
Evaluation
Discovering rule violations on two real databases
• WikiMedia: 25 versions analysed
• AppSI: 12 versions analysed
            WikiMedia      AppSI
Tables      30/51          71/91
Columns     196/353        583/974
Views       0/1            30/52
Functions   3/5            46/67
Triggers    2/3            12/16
LOC         1,435/2,453    4,910/7,006
Min/Max number of entities per type for each database.
Violation count per version
Rule violations can be found in open source as well as in
proprietary DB schemas.
Violating entities proportion
Dashed: violating entities, Solid: entities count.
The number of violating entities evolves with the total number
of entities.
“Time-to-fix” of a rule violation
Corrected violations:
• WikiMedia (WM): 21/87
• AppSI: 3/85
⇒ On both DBs some violations are fixed but not all of them.
Time in days needed to correct violations:
        Min   1st quartile   Median   3rd quartile   Max
WM      95    1227           1833     2403           3644
AppSI   3     /              125      /              278
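One plausible way to obtain such figures is to take, for each corrected violation, the number of days between the version that introduced it and the version that removed it. The sketch below assumes exactly that definition; the rule names, entities and dates are hypothetical and are not taken from the study.

```python
from datetime import date
from statistics import median

# Hypothetical violation history: (rule, entity, date introduced, date fixed or None).
history = [
    ("select-star", "view_a", date(2020, 1, 1), date(2020, 1, 11)),
    ("view-on-view", "view_b", date(2020, 1, 1), date(2020, 3, 1)),
    ("fk-not-pk", "table_c", date(2020, 2, 1), date(2020, 2, 21)),
    ("select-star", "view_d", date(2020, 2, 1), None),  # still open
]


def time_to_fix_days(history):
    """Days between introduction and fix, for corrected violations only."""
    return sorted(
        (fixed - introduced).days
        for _, _, introduced, fixed in history
        if fixed is not None
    )


days = time_to_fix_days(history)
print(f"corrected: {len(days)}/{len(history)}")  # corrected: 3/4
print(min(days), median(days), max(days))        # 10 20 60
```

With only three corrected violations, as for AppSI, quartiles carry little meaning, which is presumably why only the minimum, median and maximum are reported for that database.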
False positives
Three categories of violations can be distinguished:
1 Real design issues
2 Issues that the DBA accepts to live with
3 Issues due to limitations of DBCritics
Classifying violations into these categories cannot be automated.
On AppSI v10, the DBA analysed the 81 rule violations:
Category   Count
1          51
2          8
3          22
⇒ This cannot be generalised; it only gives an indication.
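For reference, the counts above translate into shares as follows. The counts come from the slide; the short category labels are paraphrases introduced for this example.

```python
# Counts from the AppSI v10 classification; labels are shortened paraphrases.
counts = {
    "real design issue": 51,
    "accepted by the DBA": 8,
    "DBCritics limitation": 22,
}

total = sum(counts.values())
shares = {category: round(100 * n / total, 1) for category, n in counts.items()}

print(total)   # 81
print(shares)  # {'real design issue': 63.0, 'accepted by the DBA': 9.9, 'DBCritics limitation': 27.2}
```

Roughly 63% of the reported violations were real design issues, and about 27% were attributable to limitations of the tool.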
Conclusion
• Relational databases are at the core of many information
systems
• Like any artefact, they are subject to errors and quality
defects
• An empirical study on two real DBs supports the relevance
of the approach
• External validation based on feedback from AppSI’s
DBA supports the relevance of the tool’s results
Questions
• Do open-source and proprietary DB schemas behave
differently in terms of rule violations?
• How to practically integrate such an approach in the DB
life-cycle?
• How to convince DBAs of the relevance of the approach
since they have lived without such tools for years?