1. Chapter 1: Introduction
Purpose of Database Systems
View of Data
Data Models
Data Definition Language
Data Manipulation Language
Transaction Management
Storage Management
Database Administrator
Database Users
Overall System Structure
abhishek.kumar762@gmail.com
2. Database Management System (DBMS)
Collection of interrelated data
Set of programs to access the data
DBMS contains information about a particular enterprise
DBMS provides an environment that is both convenient and
efficient to use.
Database Applications:
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives
3. Purpose of Database System
In the early days, database applications were built on top
of file systems
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
Multiple file formats, duplication of information in
different files
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation — multiple files and formats
Integrity problems
Integrity constraints (e.g. account balance > 0) become
part of program code
Hard to add new constraints or change existing ones
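The balance rule above can instead be declared once in the schema, so that no application program has to re-implement it. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, the helper names `open_bank` and `try_insert`, and the sample values are assumptions made for illustration, and the slides' hyphenated column names are written with underscores because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database whose schema carries the balance rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance > 0))"  # rule lives in the schema
    )
    return conn


def try_insert(conn: sqlite3.Connection, account_number: str, balance: int) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account VALUES (?, ?)", (account_number, balance)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_bank()
print(try_insert(conn, "A-101", 500))  # accepted
print(try_insert(conn, "A-102", -20))  # rejected by the CHECK constraint
```

Changing the rule then means changing one table definition rather than every program that writes to the file.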
4. Purpose of Database Systems (Cont.)
Drawbacks of using file systems (cont.)
Atomicity of updates
Failures may leave database in an inconsistent state with
partial updates carried out
E.g. transfer of funds from one account to another should
either complete or not happen at all
Concurrent access by multiple users
Concurrent access needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies
– E.g. two people reading a balance and updating it at the
same time
Security problems
File systems do not provide a systematic way to back up and
recover data and files.
Database systems offer solutions to all the above problems
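The funds-transfer example can be sketched as a single transaction that either completes or leaves both balances untouched. This is an illustrative sketch with sqlite3, not a banking implementation: the account numbers, balances, and the helpers `make_bank`, `transfer`, and `balances` are assumptions, and the credit is deliberately applied before the debit so that a failed debit shows the earlier step being undone.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Two accounts with illustrative balances; overdrafts are forbidden."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move amount from source to target; both updates happen or neither does."""
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit above was rolled back together with the debit


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT account_number, balance FROM account"))


bank = make_bank()
transfer(bank, "A-101", "A-215", 200)   # succeeds
transfer(bank, "A-101", "A-215", 1000)  # would overdraw A-101, so nothing changes
print(balances(bank))
```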
5. Advantages of DBMS
Preventing Data Redundancy
In the database approach of storing data, the information such as
employee name or date of joining is stored at a single place in a
database. Therefore storing data in a database prevents data
redundancy and saves storage space.
Restricting Unauthorized Access
In the database approach of storing data, multiple users can access data
stored in a database. However, only some users are authorized to modify
the database.
Persistent Storage
A database provides persistent storage for data structures such as class
definitions in C++ and program variables used in programming languages.
Data structures can be stored permanently in the DBMS. Such data is
called persistent data because another program can retrieve it even after
the program that was using it has terminated.
6. Advantages of DBMS (Cont.)
Multiple User Interface
DBMS provides multiple user interfaces for users with varying levels of
technical knowledge.
Ex: a programming language interface for application programmers.
Integrity Constraints
DBMS allows us to define and implement constraints. When we
store data in a database, we can specify integrity constraints that
must hold for the data.
Ex: Employee_Id must have a unique value.
Backup and Recovery
The backup and recovery subsystem of the DBMS allows data to be
recovered after a hardware or software failure.
Data independence
Application programs should be as independent as possible from details of
data representation and storage. The DBMS can provide an abstract view
of the data to insulate application code from such details.
7. Disadvantages of DBMS
Danger of Overkill: For small, simple, single-user applications, a
database system is often not advisable.
Complexity: A database system creates additional complexity and
requirements. Providing and operating a database management
system with several users and databases is quite costly and demanding.
Qualified Personnel: The professional operation of a database system
requires appropriately trained staff. Without a qualified database
administrator, the system will not run reliably for long.
Costs: Using a database system generates new costs for the system
itself, for additional hardware, and for the more complex handling of
the system.
Enterprise vulnerability: Centralising all the enterprise
information in the database makes the database an indispensable
resource for the organization. Securing the central database becomes a
demanding task, as the survival of the organization may depend on the
security of the database.
8. ACID Properties in DBMS
Atomicity
It ensures that an update either takes effect completely or not at all;
other users never see a partially completed update.
Consistency
It ensures that a transaction takes the database from one consistent state
to another, so that a change to one value agrees with the related changes
to other values in the same instance.
Isolation
It ensures that concurrent transactions do not interfere with one another;
each transaction runs as if it were the only one in the system.
Durability
It ensures that the updates of committed transactions are preserved and
never lost, even if the system fails afterwards.
9. Levels of Abstraction
Physical level: describes how a record (e.g., customer) is
stored.
Logical level: describes data stored in database, and the
relationships among the data.
    type customer = record
        name : string;
        street : string;
        city : string;
    end;
View level: application programs hide details of data types.
Views can also hide information (e.g., salary) for security
purposes.
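The idea of a view that hides salary can be sketched with a SQL view defined over a base table. The `employee` table, its columns, the view name `employee_public`, and the sample row are hypothetical, chosen only for illustration. SQLite has no user accounts, so this shows only what the view exposes; in a multi-user DBMS, access would additionally be restricted by granting users rights on the view rather than on the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Rahul', 50000)")  # made-up row

# View level: expose only the columns an ordinary user should see.
conn.execute(
    "CREATE VIEW employee_public AS SELECT employee_id, name FROM employee"
)


def view_columns(conn: sqlite3.Connection, relation: str) -> list:
    """Column names visible through a table or view (trusted names only)."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]


print(view_columns(conn, "employee"))         # includes salary
print(view_columns(conn, "employee_public"))  # salary is hidden
```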
10. View of Data
An architecture for a database system
11. Database Schemas and Database Instances
Independent of the data model, it is important to differentiate between
the description of the database and the database itself. The description of the
database is called the database schema, or metadata. The database
schema is defined during the database design process and changes very
rarely afterwards.
The actual content of the database, the data, changes often over the years. A
database state at a specific time, defined by the currently existing content,
relationships, and their attributes, is called a database instance.
The following illustration shows that a database schema can be looked at
like a template or building plan for one or several database instances.
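The distinction can be observed directly in a small database: the stored description stays fixed while the content changes. The sketch below uses sqlite3, whose catalog table `sqlite_master` holds the table definitions; the `customer` table and the helper names `schema` and `instance` are assumptions for illustration, with two rows borrowed from the customer data used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)"
)


def schema(conn: sqlite3.Connection) -> list:
    """The description of the database: the stored table definitions."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance(conn: sqlite3.Connection) -> list:
    """The content of the database at this moment."""
    return conn.execute(
        "SELECT customer_id, customer_name FROM customer ORDER BY customer_id"
    ).fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")
conn.execute("INSERT INTO customer VALUES ('019-28-3746', 'Smith')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)      # the schema did not change
print(instance_before == instance_after)  # the instance did
```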
12. Data Models
A collection of tools for describing
data
data relationships
data semantics
data constraints
Entity-Relationship model
Relational model
Other models:
object-oriented model
semi-structured data models
Network model and hierarchical model
13. Data Models (Cont.)
A collection of concepts that can be used to describe the
structure of a database (data types, relationships, and
constraints)
basic operations (retrieval and updates)
specify the dynamic aspect or behavior of a database
application (user-defined operations)
example: COMPUTE_GPA, which can be applied to a
STUDENT object
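A user-defined operation such as COMPUTE_GPA can be pictured as a method attached to the STUDENT object. The sketch below is only an illustration: the slides do not say how a GPA is computed, so the 4-point grade scale, the credit-weighted average, and the field names are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed 4-point scale; the slides do not define one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class Student:
    name: str
    # (letter grade, credit hours) pairs; the layout is hypothetical
    grades: list = field(default_factory=list)

    def compute_gpa(self) -> float:
        """User-defined operation: credit-weighted grade point average."""
        total_credits = sum(credits for _, credits in self.grades)
        if total_credits == 0:
            return 0.0
        points = sum(GRADE_POINTS[grade] * credits for grade, credits in self.grades)
        return points / total_credits


student = Student("Rahul", [("A", 3), ("B", 3)])
print(student.compute_gpa())
```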
14. Categories of Data Models
High-level or conceptual data models (close to the way common
users perceive data)
In between: representational (or implementation) data models,
which can serve both categories
Low-level or physical data models (describe the details of
how data is stored)
15. Conceptual Data Model
Use concepts such as
Entities: a real-world object or concept (e.g., DEPT, COURSE)
Attributes: property of interest that further describes an
entity (dept no, name, telephone, etc.)
Relationships: interaction among the entities (e.g., DEPT
provides COURSE)
16. Physical Data Model
Describes how data is stored in the computer.
It represents information such as
record formats
record orderings
access path: make search more efficient
17. Representational Data Model
Used in traditional commercial DBMSs
They include
Relational Data model
Network model
Hierarchical model
18. Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among
these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects.
For example, each person is an entity, and bank accounts can be considered as
entities.
Entities are described in a database by a set of attributes.
For example, the attributes account-number and balance may describe one
particular account in a bank, and they form attributes of the account entity set.
Similarly, attributes customer-name, customer-street address and customer-city may
describe a customer entity.
An extra attribute customer-id is used to uniquely identify customers (since it may
be possible to have two customers with the same name, street address, and city).
A relationship is an association among several entities. For example, a depositor
relationship associates a customer with each account that she has. The set of all
entities of the same type and the set of all relationships of the same type are termed
an entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by
an E-R diagram.
20. Relational Model
A relational database is made up of two-dimensional tables, which represent
data in the form of rows and columns. A two-dimensional table in a relational
database is known as a relation and represents a real-world object.
Entity: An entity is a real-world object, such as a person or place, about which
information is stored in relations or tables. Each entity has a name, which is
used to identify it. An entity is represented by a table in the relational model.
Ex: employee and student are entities.
Entity instance refers to a particular occurrence of an entity in a table.
Ex: a particular employee in an organization is an instance of the employee
entity.
An entity can be classified as a strong entity or a weak entity. A strong entity is an
entity that can be uniquely identified by an attribute known as a key. A weak entity
is an entity that does not have a key of its own, i.e., it cannot be uniquely
identified by its own attributes alone.
Attribute: An attribute of an entity defines a property of that entity. It is used to
describe the specific entity.
Ex: An employee entity is characterized by employee_name, city, etc., which are its
attributes.
21. Relational Model (Cont.)
Attributes can be classified in three ways:
Composite vs. simple attributes: Composite attributes are attributes
that can be subdivided into smaller attributes. Ex: the name of an
employee can be divided into first name, middle name, and last name. Simple
attributes are attributes that cannot be subdivided any further.
Single-valued vs. multivalued attributes: Single-valued attributes are
attributes that have a single value for a particular entity. Ex: Employee
name.
Multivalued attributes are attributes that can have more than one
value for a particular entity. Ex: Contact Number.
Derived vs. stored attributes: Derived attributes are attributes whose
values can be derived from the values of other attributes. Ex: the working
years of an employee can be derived from the employee's date of joining
and the present date.
The attribute date of joining is said to be the stored attribute, from which
we derive the value of the other attribute.
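The working-years example can be sketched as a value that is computed on demand from the stored attribute instead of being stored itself. The class and field names below are hypothetical, and counting only completed years is an assumption about what "working years" means.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    name: str
    date_of_joining: date  # stored attribute

    def working_years(self, today: date) -> int:
        """Derived attribute: completed years between joining and today."""
        years = today.year - self.date_of_joining.year
        # Subtract one if this year's anniversary has not been reached yet.
        anniversary_pending = (today.month, today.day) < (
            self.date_of_joining.month,
            self.date_of_joining.day,
        )
        return years - 1 if anniversary_pending else years


employee = Employee("Rahul", date(2015, 6, 1))
print(employee.working_years(date(2024, 5, 31)))
print(employee.working_years(date(2024, 6, 1)))
```

Because the value is recomputed from the date of joining each time, it can never disagree with the stored attribute.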
22. Relational Model (Cont.)
Relational Model Concepts
Certain terms are used in the context of the relational model.
These are:
Domain: It is a set of atomic values. Values that cannot be divided into
subcomponents are called atomic values. Generally, we specify a domain as
a data type from which the values forming the domain are taken. Ex: the
employee name attribute has a domain of strings made up of the letters
a to z, i.e., it accepts only values built from those letters.
Tuple: In the relational data model, a row of a table is termed a tuple; it
gives complete information about an entity. Ex: the row with employee_name
Rahul in the Employee relation is a tuple.
Relationship: It shows the association between two or more relations, i.e.,
how entities are related to one another. Ex: An employee belongs to one
department; here, Belongs_To is the relationship.
Degree: The degree of a relationship is the number of relations participating
in that relationship. Ex: the degree of the Belongs_To relationship is two, so
it is known as a binary relationship.
23. Relational Model
Example of tabular data in the relational model
customer-name  customer-id   customer-street  customer-city  account-number
Johnson        192-83-7465   Alma             Palo Alto      A-101
Smith          019-28-3746   North            Rye            A-215
Johnson        192-83-7465   Alma             Palo Alto      A-201
Jones          321-12-3123   Main             Harrison       A-217
Smith          019-28-3746   North            Rye            A-201
(The column headings are the attributes.)
26. Database Languages
A database system provides a data definition language to specify the
database schema and a data manipulation language to express database
queries and updates.
In practice, the data definition and data manipulation languages are not two
separate languages; instead they simply form parts of a single database
language, such as the widely used SQL language.
27. Data Definition Language (DDL)
Specification notation for defining the database schema
E.g.
    create table account (
        account-number char(10),
        balance integer)
DDL compiler generates a set of tables stored in a data
dictionary
Data dictionary contains metadata (i.e., data about data)
database schema
Data storage and definition language
language in which the storage structure and access
methods used by the database system are specified
Usually an extension of the data definition language
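The link between a DDL statement and the data dictionary can be observed in SQLite, which records each table definition in its catalog and exposes the column metadata through `PRAGMA table_info`. This is a sketch of the idea in one particular system, not a description of how every DBMS stores its dictionary; the helper name `describe` is an assumption, and the column is written `account_number` because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema of the account relation.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")


def describe(conn: sqlite3.Connection, table: str) -> list:
    """Read column names and declared types back from the catalog.

    PRAGMA table_info yields rows of the form
    (cid, name, type, notnull, dflt_value, pk). Trusted table names only.
    """
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2]) for row in rows]


for column_name, declared_type in describe(conn, "account"):
    print(column_name, declared_type)
```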
28. Data Manipulation Language (DML)
Language for accessing and manipulating the data
organized by the appropriate data model
DML also known as query language
Two classes of languages
Procedural – user specifies what data is required and how to
get those data
Nonprocedural – user specifies what data is required without
specifying how to get those data
SQL is the most widely used query language
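The two classes of languages can be contrasted on one small task: total the balances above a threshold. In the sketch below, the procedural version spells out how to walk the records, while the nonprocedural version states only what is wanted and leaves the how to the database system. The account rows, the threshold of 600, and the function names are illustrative assumptions.

```python
import sqlite3

# Illustrative data; the balances are made up for this sketch.
ACCOUNTS = [("A-101", 500), ("A-215", 700), ("A-201", 900)]


def procedural_total(rows: list) -> int:
    """Procedural: say what is wanted AND how to get it, record by record."""
    total = 0
    for _, balance in rows:
        if balance > 600:
            total += balance
    return total


def nonprocedural_total(rows: list) -> int:
    """Nonprocedural: state only what is wanted; the DBMS decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(balance), 0) FROM account WHERE balance > 600"
    ).fetchone()
    return total


print(procedural_total(ACCOUNTS), nonprocedural_total(ACCOUNTS))
```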
29. SQL
SQL: widely used non-procedural language
E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'
E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and
          depositor.account-number = account.account-number
Application programs generally access databases through one of
Language extensions to allow embedded SQL
Application program interface (e.g. ODBC/JDBC) which allows SQL
queries to be sent to a database
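Both queries can be tried through an application program interface. The sketch below uses Python's sqlite3 module in place of ODBC/JDBC, which is a substitution made for the sake of a self-contained example. The customer and depositor rows follow the tabular example in the Relational Model slide; the split into three tables and the account balances are assumptions, since the slides give no balances, and the hyphenated names are written with underscores because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3


def build_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT,
                               customer_street TEXT, customer_city TEXT);
        CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
        CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
        """
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
        ("192-83-7465", "Johnson", "Alma", "Palo Alto"),
        ("019-28-3746", "Smith", "North", "Rye"),
        ("321-12-3123", "Jones", "Main", "Harrison"),
    ])
    conn.executemany("INSERT INTO account VALUES (?, ?)", [
        ("A-101", 500), ("A-215", 700), ("A-201", 900), ("A-217", 750),  # made up
    ])
    conn.executemany("INSERT INTO depositor VALUES (?, ?)", [
        ("192-83-7465", "A-101"), ("019-28-3746", "A-215"),
        ("192-83-7465", "A-201"), ("321-12-3123", "A-217"),
        ("019-28-3746", "A-201"),
    ])
    conn.commit()
    return conn


def customer_name(conn: sqlite3.Connection, customer_id: str):
    row = conn.execute(
        "SELECT customer.customer_name FROM customer"
        " WHERE customer.customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def customer_balances(conn: sqlite3.Connection, customer_id: str) -> list:
    rows = conn.execute(
        "SELECT account.balance FROM depositor, account"
        " WHERE depositor.customer_id = ?"
        " AND depositor.account_number = account.account_number"
        " ORDER BY account.balance", (customer_id,)
    )
    return [row[0] for row in rows]


bank = build_bank()
print(customer_name(bank, "192-83-7465"))
print(customer_balances(bank, "192-83-7465"))
```

The customer-id is passed as a query parameter rather than pasted into the SQL text, which is the usual way to send values through such an interface.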
30. Data Dictionary
We can define a data dictionary as a DBMS component that stores the
definition of data characteristics and relationships. The contents of the
data dictionary are “data about data”, also called metadata. The DBMS data
dictionary provides the DBMS with its self-describing characteristic. In
effect, the data dictionary resembles an X-ray of the company’s entire
data set, and is a crucial element in the data administration function.
Two main types of data dictionary exist: integrated and stand-alone.
An integrated data dictionary is included with the DBMS. For example, all
relational DBMSs include a built-in data dictionary or system catalogue
that is frequently accessed and updated by the RDBMS. Other DBMSs,
especially older types, do not have a built-in data dictionary; instead, the
DBA may use third-party stand-alone data dictionary systems.
Data dictionaries can also be classified as active or passive. An active data
dictionary is automatically updated by the DBMS with every database
access, thereby keeping its access information up-to-date. A passive data
dictionary is not updated automatically and usually requires a batch
process to be run. Data dictionary access information is normally used by
the DBMS for query optimization purposes.
The data dictionary’s main function is to store the description of all objects
that interact with the database.
31. Data Dictionary (Cont.)
Although there is no standard format for the information stored in the data
dictionary, several features are common.
For example, the data dictionary typically stores descriptions of all:
Data elements that are defined in all tables of all databases. Specifically, the data
dictionary stores the names, data types, display formats, internal storage
formats, and validation rules. The data dictionary tells where an element is
used, by whom it is used, and so on.
Tables defined in all databases. For example, the data dictionary is likely to store
the name of the table creator, the date of creation, access authorizations, the
number of columns, and so on.
Indexes defined for each database table. For each index, the DBMS stores at least
the index name, the attributes used, the location, specific index characteristics,
and the creation date.
Defined databases: who created each database, the date of creation, where the
database is located, who the DBA is, and so on.
End users and administrators of the database.
Programs that access the database, including screen formats, report formats,
application formats, SQL queries, and so on.
Access authorizations for all users of all databases.
Relationships among data elements: which elements are involved, whether the
relationships are mandatory or optional, the connectivity and cardinality, and so
on.
32. Data Independence
The three-schema architecture can be used to explain the concept of data
independence, which can be defined as the capacity to change the schema
at one level of a database system without having to change the schema at
the next higher level.
We can define two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema
without having to change external schemas or application programs. We may
change the conceptual schema to expand the database (by adding a record type or
data item), or to reduce the database (by removing a record type or data item). In
the latter case, external schemas that refer only to the remaining data should not be
affected. Only the view definition and the mappings need be changed in a DBMS that
supports logical data independence. Application programs that reference the
external schema constructs must work as before, after the conceptual schema
undergoes a logical reorganization. Changes to constraints can be applied also to the
conceptual schema without affecting the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema
without having to change the conceptual (or external) schemas. Changes to the
internal schema may be needed because some physical files had to be reorganized—
for example, by creating additional access structures—to improve the performance
of retrieval or update. If the same data as before remains in the database, we should
not have to change the conceptual schema.
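Both kinds of independence can be illustrated with a query written against a view, which here plays the role of an external schema. In the sketch below, adding an index stands in for a change to the internal schema, and adding a column to the base table stands in for a change to the conceptual schema; in both cases the query text and its result stay the same. The table, view, index, and column names and the balances are assumptions made for illustration, and SQLite is used only as a convenient small system, not as a full three-schema implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-201", 900)],  # made-up balances
)
conn.commit()

# External schema: the application only ever sees this view.
conn.execute(
    "CREATE VIEW account_balance AS SELECT account_number, balance FROM account"
)

QUERY = (
    "SELECT account_number, balance FROM account_balance"
    " WHERE balance > 600 ORDER BY account_number"
)


def run_query(conn: sqlite3.Connection) -> list:
    return conn.execute(QUERY).fetchall()


before = run_query(conn)

# Physical change: a new access structure; the query text is untouched.
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")
after_index = run_query(conn)

# Logical change: the base table gains a data item; the view shields the query.
conn.execute("ALTER TABLE account ADD COLUMN branch_name TEXT")
after_new_column = run_query(conn)

print(before == after_index == after_new_column)
```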
33. Data Independence (Cont.)
Whenever we have a multiple-level DBMS, its catalog must be expanded to
include information on how to map requests and data among the various
levels. The DBMS uses additional software to accomplish these mappings
by referring to the mapping information in the catalog. Data independence
is accomplished because, when the schema is changed at some level, the
schema at the next higher level remains unchanged; only the mapping
between the two levels is changed. Hence, application programs referring
to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data
independence, both physical and logical. However, the two levels of
mappings create an overhead during compilation or execution of a query
or program, leading to inefficiencies in the DBMS.
Because of this, few DBMSs have implemented the full three-schema
architecture.
34. Database Users
Users are differentiated by the way they expect to interact
with the system
Application programmers – interact with system through
DML calls
Sophisticated users – form requests in a database query
language
Specialized users – write specialized database applications
that do not fit into the traditional data processing
framework
Naive users – invoke one of the permanent application
programs that have been written previously
E.g. people accessing database over the web, bank tellers,
clerical staff
35. Database Administrator
Coordinates all the activities of the database system;
the database administrator has a good understanding
of the enterprise’s information resources and needs.
Database administrator's duties include:
Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
Acting as liaison with users
Monitoring performance and responding to changes in
requirements
36. Transaction Management
A transaction is a collection of operations that performs a
single logical function in a database application
Transaction-management component ensures that the
database remains in a consistent (correct) state despite
system failures (e.g., power failures and operating system
crashes) and transaction failures.
Concurrency-control manager controls the interaction
among the concurrent transactions, to ensure the
consistency of the database.
37. Storage Management
Storage manager is a program module that provides the
interface between the low-level data stored in the database
and the application programs and queries submitted to the
system.
The storage manager is responsible for the following tasks:
interaction with the file manager
efficient storing, retrieving and updating of data
39. Application Architectures
Two-tier architecture: E.g. client programs using
ODBC/JDBC to communicate with a
database
Three-tier architecture: E.g. web-based applications, and
applications built using “middleware”