3. Data: Known facts that can be recorded and have an implicit meaning
Database: A collection of related data.
Database Management System (DBMS): A software package/ system to facilitate
the creation and maintenance of a computerized database
1. INTRODUCTION
Applications of DBMS
- Banking
- Airlines
- Universities
- Credit card transactions
- Tele communication
- Finance
- Sales
- Manufacturing
- Human resources
-- Electronic Commerce
-- Social Networking
-- OnLine Analytical Processing (OLAP)
Purpose of DBMS:
Traditionally, data was organized in file formats. DBMS was a new concept then, and all the research was
done to make it overcome the deficiencies in traditional style of data management.
The main purpose of the database is to operate a large amount of information by storing, retrieving, and
managing data.
4. File Organization : Terms and Concepts
•Database: Group of related files
•File: Group of records of same type
•Record: Group of related fields
•Field: Group of words or a complete
number
•Byte: Group of bits that represents a
single character
•Bit: Smallest unit of data; binary digit
(0,1)
Data Hierarchy in a Computer System
5. Problems with the Traditional File Environment
•Data redundancy &
inconsistency
•Program-Data dependence
•Lack of flexibility
•Integrity problems
•Poor security
•Atomicity problems
•No concurrency control
Traditional File Processing
6. Data Redundancy
Data Redundancy means same information is duplicated in several files. This makes data redundancy.
Data Inconsistency
Data Inconsistency means different copies of the same data are not matching. That means different
versions of same basic data are existing. This occurs as the result of update operations that are not
updating the same data stored at different places.
Example: Address Information of a customer is recorded differently in different files.
Difficulty in Accessing Data
It is not easy to retrieve information using a conventional file processing system. Convenient and efficient
information retrieval is almost impossible using conventional file processing system.
7. Data Isolation
Data are scattered in various files, and the files may be in different format, writing new
application program to retrieve data is difficult.
Integrity Problems
The data values may need to satisfy some integrity constraints. For example the balance field
Value must be grater than 5000. We have to handle this through program code in file processing
systems. But in database we can declare the integrity constraints along with definition itself.
Independence of data and program
Both the database and the user program can be altered independently of each other thus saving
time and money which would be required to retain consistency.
8. Atomicity Problem
It is difficult to ensure atomicity in file processing system.
For example transferring $100 from Account A to account B.
If a failure occurs during execution there could be situation like $100 is deducted from Account A and not
credited in Account B.
Concurrent Access anomalies
If multiple users are updating the same data simultaneously it will result in inconsistent data state.
In file processing system it is very difficult to handle this using program code. This results in concurrent
access anomalies.
Security Problems
Enforcing Security Constraints in file processing system is very difficult as the application programs are
added to the system in an ad-hoc manner.
9. DBMS and its Advantages
•A Database Management System is a collection of programs that enables users to create and
maintain a database. It is a general purpose software system that facilitates processes of
defining, constructing and manipulating databases for various applications.
•Advantages of Database approach:
• Controlling Redundancy
• Restricting Unauthorized access
• Providing persistent storage for program objects and data
structures
• Permitting inference and actions using deduction rules
• Providing multiple user interface
• Representing complex relationships among data
• Enforcing integrity constraints and providing backup and
recovery
10. Basis DBMS Approach File System Approach
Meaning DBMS is a collection of data. In DBMS, the user is not
required to write the procedures.
The file system is a collection of data. In this system, the user
has to write the procedures for managing the database.
Sharing of data Due to the centralized approach, data sharing is easy. Data is distributed in many files, and it may be of different
formats, so it isn't easy to share data.
Data Abstraction DBMS gives an abstract view of data that hides the details. The file system provides the detail of the data representation
and storage of data.
Security and Protection DBMS provides a good protection mechanism. It isn't easy to protect a file under the file system.
Recovery Mechanism DBMS provides a crash recovery mechanism, i.e., DBMS
protects the user from system failure.
The file system doesn't have a crash mechanism, i.e., if the
system crashes while entering some data, then the content of
the file will be lost.
Manipulation Techniques DBMS contains a wide variety of sophisticated techniques to
store and retrieve the data.
The file system can't efficiently store and retrieve the data.
Concurrency Problems DBMS takes care of Concurrent access of data using some
form of locking.
In the File system, concurrent access has many problems like
redirecting the file while deleting some information or updating
some information.
Where to use Database approach used in large systems which interrelate
many files.
File system approach used in large systems which interrelate
many files.
Cost The database system is expensive to design. The file system approach is cheaper to design.
Data Redundancy and Inconsistency Due to the centralization of the database, the problems of data
redundancy and inconsistency are controlled.
In this, the files and application programs are created by
different programmers so that there exists a lot of duplication
of data which may lead to inconsistency.
Structure The database structure is complex to design. The file system approach has a simple structure.
Data Independence •In this system, Data Independence exists, and it can be of two
types.Logical Data Independence
•Physical Data Independence
In the File system approach, there exists no Data
Independence.
Integrity Constraints Integrity Constraints are easy to apply. Integrity Constraints are difficult to implement in file system.
Data Models •In the database approach, 3 types of data models
exist:Hierarchal data models
•Network data models
•Relational data models
In the file system approach, there is no concept of data models
exists.
Flexibility Changes are often a necessity to the content of the data stored The flexibility of the system is less as compared to the DBMS
11. View of Data
External
Schema 1
External
Schema 2
External
Schema n
...
Conceptual
Schema
Physical
Schema
External Schemas
how the data are physically stored
what data are stored,
what relationships, constraints exist
customizations
of the conceptual
schema to the
needs of various
classes of users
actual data
1st
level
of abstraction
2nd
level
of abstraction
3rd
level
of abstraction
12. External view/External schema
• Different users often need different views of the data.
• Example:
• Accountant needs to have access to financial information on a student
• Head of Department needs access to academic information.
• Student needs to access marks information.
• An external schema is a description of part of the DB as seen by an application
programmer or a user
Logical View/Conceptual Schema
• Representation of the logical structure of the information content of the DB.
• Abstracts away from the actual physical storage.
• It is, in a sense, a composite of all the external schema.
Physical/Internal Schema
• The internal schema describes the data as it is physically stored.
• For example,
• record structure
• types of fields in a record
• existence of primary and secondary indexes
13. Example: University Database
• Conceptual schema:
– Students(sid: string, name: string, login: string,
age: integer, gpa:real)
– Courses(cid: string, cname:string, credits:integer)
– Enrolled(sid:string, cid:string, grade:string)
• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
• External Schema (View):
– Course_info(cid:string,enrollment:integer)
The views also provide a security mechanism to prevent users from accessing certain parts of the
database
For example, tellers in a bank see only that part of the database that has information on customer
accounts; they cannot access information about salaries of employees
14. Schema & Instance
Databases change over time as information is inserted and deleted
The collection of information stored in the database at a particular moment is called an instance of
the database
The overall design of the database is called the database schema
The concept of database schemas and instances can be understood by analogy to a program written
in a programming language
A database schema corresponds to the variable declarations in a program, each variable has a
particular value at a given instant
The values of the variables in a program at a point in time correspond to an instance of a database
schema
Database systems have several schemas, partitioned according to the levels of abstraction
15. The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level
A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database
Programmers develop applications by using the logical schema
Physical schema is hidden beneath the logical schema, and can usually be changed easily
without affecting application programs
• Application programs are said to exhibit physical data independence if they do not depend on
the physical schema, and thus need not be rewritten if the physical schema changes
17. Database Languages
• Data Definition Language: To specify the database schema.
• Data Manipulation Language: To express database queries and updates.
18. Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the
appropriate data model
• DML also known as query language
Two classes of languages
• Procedural – user specifies what data is required and how to get those
data
• Declarative (nonprocedural) – user specifies what data is required
without specifying how to get those data
SQL is the most widely used query language
19. • A query is a statement requesting the retrieval of information.
• DML involves information retrieval is called a query language.
Data Manipulation Language (DML)
20. Data Definition Language (DDL)
Specification notation for defining the database schema
Example: create table account (
account_number char(10),
branch_name char(10),
balance integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
• Database schema
• Data storage and definition language
• Specifies the storage structure and access methods used
• Integrity constraints
• Domain constraints
• Referential integrity (e.g. branch_name must correspond to a valid
branch in the branch table)
21. Data Dictionary
• A data dictionary contains a list of all files in the database, the number of
records in each file, and the names and types of each field
• Most database management systems keep the data dictionary hidden from
users to prevent them from accidentally destroying its contents.
• Data dictionaries do not contain any actual data from the database, only
bookkeeping information for managing it.
• Without a data dictionary, however, a database management system cannot
access data from the database.
23. Storage Manager
• Storage manager is a program module that provides the interface between the
low-level data stored in the database and the application programs and queries
submitted to the system.
• Translates DML statements into low-level file-system commands.
• The storage manager is responsible to the following tasks:
• Interaction with the file manager
• Efficient storing, retrieving and updating of data
• Controls the placement of data on disk and its movement between disk and
main memory.
• Data structures:
• Data files: which store the database itself.
• Data dictionary: stores meta data about the structures of the database.
• Indices: provide fast access to data items.
24. Storage Manager
Components:
• Authorization and integrity manager:
• Tests for satisfaction of integrity constraints and checks the authority of users to
access data.
• Transaction manager:
• Consistent state
• Concurrent transaction executions
• File manager:
• Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer manger:
• Responsible for fetching the data from disk storage into main memory, and deciding
what data to cache in main memory.
• It enables the database to handle data sizes that are much larger than the size of main
memory.
25. Query Processor
Components:
• DDL interpreter: interprets DDL statements and records the definitions in
the data dictionary.
• DML compiler: translates DML statements in a query language into an
evaluation plan consisting of low-level instructions that the query
evaluation engine understands.
• Translate into any of a number of alternative evaluation plans that all give the same result
• Query evaluation engine: executes low-level instructions generated by the
DML compiler.
27. Query Processing
• Alternative ways of evaluating a given query
• Equivalent expressions
• Different algorithms for each operation
• Cost difference between a good and a bad way of evaluating a query can
be enormous
• Need to estimate the cost of operations
• Depends critically on statistical information about relations which
the database must maintain
• Need to estimate statistics for intermediate results to compute cost of
complex expressions
28. Data Models
A data model provides a way to describe the design of a database at the physical,
logical, and view level.
A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semistructured data model (XML)
Other older models:
• Network model
• Hierarchical model
31. SQL
• SQL: Structured Query Language
• Example: Find the name of the customer with customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = ‘192-83-7465’
• Example: Find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = ‘192-83-7465’ and
depositor.account_number = account.account_number
32. The Entity-Relationship Model
• Models an enterprise as a collection of entities and relationships
• Entity: a “thing” or “object” in the enterprise that is distinguishable
from other objects
• Described by a set of attributes
• Relationship: an association among several entities
• Represented diagrammatically by an entity-relationship diagram:
33. The Entity-Relationship Model
• Rectangles: which represent entity sets.
• Ellipses: which represent attributes.
• Diamonds: which represent set of relationships among a member from each
of several entity sets.
• Lines: which link attributes to entity sets and entity sets to relationships.
• Mapping cardinalities: express the number of entities to which another
entity can be associated via a relationship set.
34. Other Data Models
• Object-Based data models
Object-oriented data model
• Object-oriented programming language
Object-relational data model
• Structured and collection types
• Object orientation
• Semistructured data models:
• Specification of data where individual data items of the same type may
have different sets of attributes.
35. Hierarchical Model
• First database systems introduced in the 1960’s were based on the “hierarchical
model”
• This model assumes all data relationships can be structured as hierarchies.
• Uses a “parent”” and “child” relationship where a child can only have one parent.
• Hierarchical Model still used
today on some IBM mainframes
such as IBM’s IMS system.
• Need to redesign entire database
to change how data is
interrelated.
• Can lead to confusion in
relationships of data.
• Only allows one parent record
type.
Ratings Salary
Compensation Job
Assignments
Pension Insurance Health
Benefits
Employer
36. Network Model
• Network Model came about from the Conference on Data Systems Language
(CODASYL).
• Design is based on “data sets”
• Similar to the hierarchical model with sequential access properties.
• Uses logically related data items to define record types.
• As data sets are added so is the
complexity and confusion.
• Procedural model of one record
processing at a time.
• Difficult to change data once it is
stored
• Need detailed understanding of
the data and how it is stored.
37. Database Users
Users are differentiated by the way they expect to interact with the system
• Application programmers – interact with system through DML calls
• Sophisticated users – interact with the system without writing the programs.
• form requests in a database query language
• Specialized users – write specialized database applications that do not fit into
the traditional data processing framework
• Naïve users – invoke one of the permanent application programs that have
been written previously
• Examples, people accessing database over the web, bank tellers, clerical
staff
• Read reports generated from the database.
38. Database Administrator
• Coordinates all the activities of the database system; the database administrator has
a good understanding of the enterprise’s information resources and needs.
• Database administrator's duties include:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting user authority to access the database
• Specifying integrity constraints
• Monitoring performance and responding to changes in requirements
39. Transaction Management
• A transaction is a collection of operations that performs a single logical
function in a database application
• Transaction-management component ensures that the database remains
in a consistent (correct) state despite system failures (e.g., power failures
and operating system crashes) and transaction failures.
• Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
40. Transaction Management
ACID Properties
• Key concept is a transaction: a sequence of database actions (reads/writes).
• A transaction is a collection of operations that performs a single logical function in a
database application
• DBMS ensures atomicity (all-or-nothing property) even if system crashes in the
middle of a execution
• Each transaction, executed completely, must take the DB between consistent states
or must not run at all.
• DBMS ensures that concurrent transactions appear to run in isolation.
• DBMS ensures durability of committed execution even if system crashes.