Chapter 3
Database Systems and Big Data
Introduction
• Database: an organized collection of data
• It is a well-designed, organized, and carefully managed collection of data
• It can contribute to organizational success by providing managers and decision makers
with timely, accurate, and relevant information built on data.
• It also helps companies analyze information to reduce costs, increase profits, add new
customers, track past business activities, and open new market opportunities.
• A database management system (DBMS) is a group of programs that:
• Access, manage, and manipulate the database
• Provide an interface between the database and its users and other application programs
• Provide a single point of management and control over data resources, which can be
critical to maintaining the integrity and security of the data.
• A database, a DBMS, and the application programs that use the data make up a database
environment
• Databases and database management systems are becoming even more important to
organizations as they deal with rapidly increasing amounts of information.
• Most organizations have many databases; however, without good data management, it is
nearly impossible for anyone to find the right, related information needed for accurate,
business-critical decision making
Introduction
• Data fundamentals:
• Without data and the ability to process it, an organization could not successfully
complete most business activities.
• It could not pay employees, send out bills, order new inventory, or
produce information to assist managers in decision making.
• Data consists of raw facts, such as employee numbers and sales
figures.
• Data must be organized in a meaningful way to transform it into
useful information.
Hierarchy of Data
• Bit (binary digit, i.e., 0 or 1): represents a circuit that is either on
or off
• Bits can be organized into units called bytes
• A byte is made up of eight bits
• A byte typically represents a single character
• Character: the basic building block of most information
• Field: a name, number, or combination of characters that describes an
aspect of a business object or activity
• Record: a collection of related data fields
• File: a collection of related records
• Database: a collection of integrated and related files
• Hierarchy of data: bits, characters, fields, records, files, and
databases
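The hierarchy above can be traced in a short Python sketch. It assumes ASCII encoding (one byte per character) and uses hypothetical field names and values; a real DBMS stores records very differently, so treat this only as an illustration of the levels.

```python
# Walk up the hierarchy of data: bit -> byte/character -> field -> record -> file -> database.

# A character stored as one byte (assuming ASCII encoding), shown as its 8 bits.
character = "A"
byte = character.encode("ascii")   # b'A', a single byte
bits = format(byte[0], "08b")      # the byte's value written as 8 binary digits

# Field: characters that describe one aspect of a business object (hypothetical value).
last_name_field = "Adams"

# Record: a collection of related fields (hypothetical employee data).
employee_record = {"emp_no": 1001, "last_name": last_name_field, "dept_no": 10}

# File: a collection of related records.
employee_file = [
    employee_record,
    {"emp_no": 1002, "last_name": "Baker", "dept_no": 20},
]

# Database: a collection of integrated and related files.
database = {
    "employees": employee_file,
    "departments": [{"dept_no": 10, "name": "Sales"}, {"dept_no": 20, "name": "Finance"}],
}

print(bits, len(byte), len(bits))  # 01000001 1 8
```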
Data Entities, Attributes, and Keys
• Entity: a person, place, or thing for which data is collected, stored,
and maintained. Examples of entities include employees, products,
and customers.
• Most organizations organize and store data as entities.
• Attribute: a characteristic of an entity. For example, employee
number, last name, first name, hire date, and department number
are attributes for an employee.
• Data item: the specific value of an attribute
• Primary key: a field or set of fields that uniquely identifies the
record
• No other record can have the same primary key.
• The primary key is used to distinguish records so that they can be accessed,
organized, and manipulated. Primary keys ensure that each record in a file is
unique.
• Key: a field within a record that is used to identify the
record
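A minimal sketch of a primary key at work, assuming Python's built-in `sqlite3` module and a hypothetical `employee` table built from the attributes listed earlier (the employee data is made up). The DBMS refuses a second record that reuses an existing primary key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example

# Hypothetical employee entity: each column is an attribute, emp_no is the primary key.
conn.execute("""
    CREATE TABLE employee (
        emp_no     INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        hire_date  TEXT,
        dept_no    INTEGER
    )
""")
conn.execute("INSERT INTO employee VALUES (1001, 'Adams', 'Ann', '2020-03-15', 10)")

# A second record with the same primary key violates uniqueness and is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1001, 'Baker', 'Bob', '2021-07-01', 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```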
The Database Approach
• Traditional approach to data management
• Each distinct operational system used data files dedicated to that system
• Database approach to data management
• Information systems share a pool of related data
• Offers the ability to share data and information resources
• To use this approach, a database management system (DBMS) is required
• Typically, this software acts as a buffer between the application
programs and the database itself.
Data Modeling and Database Characteristics
• A database should be designed to store all data relevant to the
business and to provide quick access and easy modification.
• Moreover, a database must reflect the business processes of the
organization
• Considerations when building a database
• Content: what data should be collected, and at what cost?
• Access: what data should be provided to which users, and when?
• Logical structure: how should data be arranged so that it makes
sense?
• Physical organization: where should data be physically located?
• Archiving: how long should data be stored?
• Security: how can data be protected?
Data Modeling
• One of the tools database designers use to show the logical
relationships among data is a data model.
• Data model: a diagram of data entities and their relationships
• Data modeling usually involves developing an understanding
of a specific business problem and then analyzing the data and
information needed to deliver a solution
• Enterprise data modeling: data modeling done at the level
of the entire enterprise.
• Enterprise data modeling is an approach that starts
• by investigating the general data and information needs of the
organization at the strategic level and
• then moves on to examine more specific data and information needs
for the functional areas and departments within the organization.
• It involves analyzing the data and information needs of an entire
organization
• It provides a roadmap for building databases and information systems
Data Modeling
•Entity-relationship (ER) diagrams: data models that use basic graphical
symbols to show the organization of and relationships between data
•ER diagrams show data items in tables (entities) and the ways they
are related.
•In most cases, boxes in ER diagrams indicate data items or entities
contained in data tables, and lines show relationships between
entities
•ER diagrams help ensure that the relationships among the data
entities in a database are correctly structured so that any application
programs developed are consistent with business operations and user
needs.
•In addition, ER diagrams can serve as reference documents after a
database is in use.
•When changes to the database are needed, ER diagrams help design them
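The boxes and lines of an ER diagram eventually become tables and links between them. A sketch, assuming a hypothetical one-to-many relationship (one department has many employees) and Python's `sqlite3`, expresses the relationship line as a foreign key. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default

# Entity (box): department
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Entity (box): employee. The foreign key is the relationship line:
# one department has many employees; each employee belongs to one department.
conn.execute("""
    CREATE TABLE employee (
        emp_no    INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,
        dept_no   INTEGER NOT NULL REFERENCES department(dept_no)
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1001, 'Adams', 10)")  # valid: department 10 exists

# An employee pointing at a department that does not exist breaks the relationship.
try:
    conn.execute("INSERT INTO employee VALUES (1002, 'Baker', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```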
Relational Database Model
• Relational model: a simple but highly useful way to organize data
into collections of two-dimensional tables called relations
• Each relation (table) represents an entity
• Each row in the table represents a record, or instance, of that entity
• Each column represents an attribute of that entity
• Domain: range of allowable values for a data attribute
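Column types and `CHECK` constraints are one common way to approximate an attribute's domain in SQL. A sketch, assuming Python's `sqlite3` and a hypothetical `product` relation in which `category` must come from a fixed list and `unit_price` may not be negative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation (table) = entity; each column = attribute; CHECK clauses restrict each domain.
conn.execute("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL CHECK (category IN ('hardware', 'software', 'service')),
        unit_price REAL NOT NULL CHECK (unit_price >= 0)
    )
""")

# Each row is one instance of the entity.
conn.execute("INSERT INTO product VALUES (1, 'Laptop', 'hardware', 899.0)")

def try_insert(row):
    """Return True if the row falls inside every attribute's domain and is stored."""
    try:
        conn.execute("INSERT INTO product VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((2, "Antivirus", "software", 49.0)),  # inside every domain
    try_insert((3, "Mystery box", "toys", 10.0)),    # category outside its domain
    try_insert((4, "Refund", "service", -5.0)),      # negative price outside its domain
]
print(results)  # [True, False, False]
```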
Manipulating Data
•Selecting: eliminating rows according to certain criteria
•Projecting: eliminating columns in a table
•Joining: combining two or more tables
•Linking: combining two or more tables through common
data attributes to form a new table with only the unique
data attributes
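Selecting, projecting, and joining map onto parts of an SQL query: a `WHERE` clause selects rows, the column list projects columns, and `JOIN ... ON` joins tables through a common attribute. A sketch using Python's `sqlite3` and hypothetical `customer` and `orders` tables (linking is not shown separately):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Denver');
    INSERT INTO orders   VALUES (100, 1, 250.0), (101, 2, 75.0), (102, 1, 40.0);
""")

# Selecting: keep only the rows that meet a criterion.
selected = conn.execute("SELECT * FROM customer WHERE city = 'Boston'").fetchall()

# Projecting: keep only some of the columns.
projected = conn.execute("SELECT name FROM customer ORDER BY cust_id").fetchall()

# Joining: combine two tables through the common attribute cust_id.
joined = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customer AS c
    JOIN orders   AS o ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()

print(selected)   # [(1, 'Acme', 'Boston')]
print(projected)  # [('Acme',), ('Globex',)]
print(joined)     # [('Acme', 100, 250.0), ('Globex', 101, 75.0), ('Acme', 102, 40.0)]
```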
Data Cleansing
• Also called data cleaning or data scrubbing
• The process of detecting and then correcting or deleting
incomplete, incorrect, inaccurate, or irrelevant records that reside in a
database
• The cost of performing data cleansing can be quite high
• Data cleansing is different from data validation,
• which involves identifying “bad data” and rejecting it at the
time of data entry.
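The difference is mainly one of timing: validation rejects bad data at entry, while cleansing repairs or deletes records already stored. A plain-Python sketch with hypothetical rules (a non-empty name and a five-digit ZIP code) and hypothetical records:

```python
import re

ZIP_PATTERN = re.compile(r"^\d{5}$")  # hypothetical rule: five-digit ZIP code

def validate(record):
    """Data validation: decide at entry time whether to accept or reject a record."""
    has_name = bool(record.get("name", "").strip())
    has_zip = bool(ZIP_PATTERN.match(record.get("zip", "")))
    return has_name and has_zip

def cleanse(records):
    """Data cleansing: correct what can be fixed in stored records and delete the rest."""
    cleaned = []
    for r in records:
        fixed = {"name": r.get("name", "").strip().title(), "zip": r.get("zip", "").strip()}
        if validate(fixed):  # keep only records that are now complete and correct
            cleaned.append(fixed)
    return cleaned

# Validation at entry: the bad record never reaches the database.
entry_ok = validate({"name": "Ann Adams", "zip": "0213"})
print(entry_ok)  # False

# Cleansing after the fact: records already stored are repaired or removed.
stored = [
    {"name": "  ann adams ", "zip": "02134 "},  # fixable: stray spaces, wrong case
    {"name": "", "zip": "80202"},              # incomplete: no name, so it is deleted
]
print(cleanse(stored))  # [{'name': 'Ann Adams', 'zip': '02134'}]
```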
Relational Database Management Systems (DBMSs)
•Relational DBMSs:
• Creating and implementing the right database system ensures
that the database will support both business activities and goals
• Capabilities and types of database systems vary considerably
•SQL databases:
•SQL: a special-purpose programming language for
accessing and manipulating data stored in a relational
database
•SQL databases conform to ACID properties:
-Atomicity, consistency, isolation, and durability
•SQL was adopted by ANSI in 1986 as the standard query
language for relational databases
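Atomicity is the easiest ACID property to observe: a transaction either completes entirely or leaves no trace. A sketch, assuming Python's `sqlite3` and a hypothetical transfer between two accounts, where a `CHECK` constraint makes the second update fail partway through the transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100.0), ('B', 20.0)")
conn.commit()

def transfer(src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

ok1 = transfer("B", "A", 50.0)  # B would go negative, so the whole transfer is undone
after1 = balances()
ok2 = transfer("A", "B", 30.0)  # succeeds, both updates are kept
after2 = balances()

print(ok1, after1)  # False {'A': 100.0, 'B': 20.0}
print(ok2, after2)  # True {'A': 70.0, 'B': 50.0}
```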
Database Activities
•A) Providing a user view of the database
•B) Creating and modifying the database
•C) Storing and retrieving data
•D) Manipulating the data and generating reports
•A) Providing a User View:
•Schema: a description of the entire database
•A schema can be part of the database or a separate
schema file
•The DBMS can reference a schema to find where to
access the requested data in relation to another piece of
data
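In SQLite, for example, the schema is kept inside the database itself, in the built-in `sqlite_master` table, which the DBMS consults to locate tables. A sketch with a hypothetical two-table database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        dept_no INTEGER REFERENCES department(dept_no)
    );
""")

# The schema (a description of the whole database) is stored in sqlite_master.
schema_rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

for name, sql in schema_rows:
    print(name, "->", sql)  # each table's name and the statement that defined it

table_names = [name for name, _ in schema_rows]
print(table_names)  # ['department', 'employee']
```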
B) Creating and Modifying the Database:
• Data definition language (DDL)
• A collection of instructions and commands used to define and describe
data and relationships in a specific database
• Allows the database’s creator to describe data and relationships that are
to be contained in the schema
• Data dictionary: a detailed description of all the data used in the database
• It can also include a description of data flows, information about the
way records are organized, and the data-processing requirements
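DDL statements such as `CREATE TABLE` and `ALTER TABLE` define and change the schema. Many DBMSs also expose catalog information resembling a data dictionary; in SQLite, `PRAGMA table_info` is one such view. A sketch with a hypothetical `customer` table (a full data dictionary would also document meanings, data flows, and processing requirements, which this does not show):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table, then modify its definition.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")

# Data-dictionary-style view: one row per column (cid, name, type, notnull, default, pk).
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
for cid, name, col_type, notnull, dflt, pk in columns:
    print(f"{name:8} {col_type:8} required={bool(notnull)} primary_key={bool(pk)}")

column_names = [c[1] for c in columns]
print(column_names)  # ['cust_id', 'name', 'email']
```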
C) Storing and Retrieving Data:
• When an application program needs data, it requests the data
through the DBMS
• Concurrency control deals with the situation in which two or
more users or applications need to access the same record at the
same time
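Concurrency control often relies on locking, among other techniques. The problem it prevents, a lost update, can be sketched without a DBMS: two threads update the same hypothetical inventory record, and a lock serializes each read-modify-write. This is a simplified Python analogy, not how any particular DBMS implements locking.

```python
import threading

record = {"qty_on_hand": 0}     # hypothetical inventory record shared by two "users"
record_lock = threading.Lock()  # stands in for a DBMS record lock

def receive_stock(times):
    for _ in range(times):
        with record_lock:  # only one user may update the record at a time
            current = record["qty_on_hand"]      # read
            record["qty_on_hand"] = current + 1  # modify and write back

users = [threading.Thread(target=receive_stock, args=(10_000,)) for _ in range(2)]
for t in users:
    t.start()
for t in users:
    t.join()

print(record["qty_on_hand"])  # 20000
```

Without the lock, both threads could read the same old value and one update would overwrite the other; the lock makes the final total predictable.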
D) Manipulating Data and Generating Reports:
•Query by Example (QBE) is a visual approach to
developing database queries or requests
•Data manipulation language (DML): a specific language,
provided with a DBMS
•Allows users to access and modify the data, to make
queries, and to generate reports
•A DBMS can produce a wide variety of documents,
reports, and other output that can help organizations
achieve their goals
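DML statements (`INSERT`, `UPDATE`, `DELETE`, `SELECT`) modify and query the data, and a `GROUP BY` query is a simple report. A sketch, assuming Python's `sqlite3` and hypothetical sales data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# DML: add, change, and remove data.
conn.executemany(
    "INSERT INTO sale (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 60.0), ("West", 5.0)],
)
conn.execute("UPDATE sale SET amount = 90.0 WHERE region = 'West' AND amount = 80.0")
conn.execute("DELETE FROM sale WHERE amount < 10")

# Report: number of sales and total sales per region.
report = conn.execute("""
    SELECT region, COUNT(*) AS num_sales, SUM(amount) AS total
    FROM sale
    GROUP BY region
    ORDER BY region
""").fetchall()

for region, n, total in report:
    print(f"{region:<5} {n:>3} {total:>8.2f}")
```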
Database Administration
• Database administrators (DBAs): skilled and trained information
system professionals who:
• Work with users to define their data needs
• Apply database programming languages to craft a set of
databases to meet those needs
• Test and evaluate databases
• Implement changes to improve their databases’ performance
• Ensure that data is secure from unauthorized access
• Data administrator: a nontechnical position responsible for
defining and implementing consistent principles for a variety of
data issues
• These include setting data standards and data definitions that apply across all the
databases in an organization
• The data administrator can be a high-level position reporting to
top-level managers
Popular Database Management Systems
• Database as a Service (DaaS)
• The database is stored on a service provider’s servers
• The database is accessed by the client over a network, typically
the Internet
• Database administration is handled by the service provider
• Example of DaaS: Amazon Relational Database Service (Amazon
RDS)
Using Databases with Other Software
•DBMSs can act as front-end or back-end
applications
•Front-end applications interact directly with people
•Back-end applications interact with other programs or
applications
•Example:
•The Library of Congress (LOC) provides a back-end
application that allows Web access to its databases,
which include references to books and digital media in
the LOC collection
Big Data
• Extremely large and complex data collections
• Traditional data management software, hardware, and analysis processes are
incapable of dealing with them
• Four characteristics of big data
• Volume
• Velocity
• Variety
• Veracity
• Sources of Big Data:
Big Data Uses
• Examples:
• Retail organizations monitor social networks to engage brand advocates and
identify brand adversaries
• Advertising and marketing agencies track comments on social media
• Hospitals analyze medical data and patient records
• Consumer product companies monitor social networks to gain insight into
consumer behavior
• Financial service organizations use data to identify customers who are likely
to be attracted to increasingly targeted and sophisticated offers
• Challenges of Big Data:
-How to choose what subset of the data to store
-Where and how to store the data
-How to find the nuggets of data that are relevant to the decision
making at hand
-How to derive value from the relevant data
-How to identify which data needs to be protected from unauthorized
access
Data Management
• Data management
• An integrated set of functions that defines the processes by which data
is obtained, certified fit for use, stored, secured, and processed in such
a way as to ensure that the accessibility, reliability, and timeliness of
the data meet the needs of the data users within an organization
• Data governance
• Defines the roles, responsibilities, and processes for ensuring that data
can be trusted and used by an entire organization
Data Management
•Data management is driven by a variety of factors:
• The need to meet external regulations designed to manage risk
associated with financial misstatement
• The need to avoid the inadvertent release of sensitive data
• The need to ensure that high data quality is available for key
decisions
•Data governance requires business leadership and active
participation
• Use of a cross-functional team is recommended
• The team should consist of executives, project managers, line-of-business
managers, and data stewards
• A data steward is an individual responsible for management of
critical data elements
Data Management
• Data lifecycle management (DLM)
• A policy-based approach to managing the flow of an enterprise’s data
Data Warehouses, Data Marts, and Data Lakes
• Data warehouse: a large database that collects business information
from many sources in the enterprise in support of management
decision making
• ETL process:
• Extract
• Transform
• Load
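The three ETL steps can be sketched in plain Python. Everything here is hypothetical: two source systems with different date and amount formats, a transform step that standardizes them, and a list standing in for the warehouse table.

```python
from datetime import datetime

def extract():
    """Extract: pull raw rows from two hypothetical source systems with different formats."""
    pos_system = [{"date": "03/15/2024", "amount": "19.99"}]       # MM/DD/YYYY, text amount
    web_store = [{"order_date": "2024-03-16", "total_cents": 4500}]  # ISO date, cents
    return pos_system, web_store

def transform(pos_rows, web_rows):
    """Transform: convert both sources to one consistent format."""
    unified = []
    for r in pos_rows:
        iso_date = datetime.strptime(r["date"], "%m/%d/%Y").date().isoformat()
        unified.append({"date": iso_date, "amount": round(float(r["amount"]), 2), "source": "pos"})
    for r in web_rows:
        unified.append({"date": r["order_date"], "amount": r["total_cents"] / 100, "source": "web"})
    return unified

def load(rows, warehouse):
    """Load: append the transformed rows to the warehouse (a list stands in for a table)."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(*extract()), warehouse)
print(loaded, warehouse)
```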
Data Warehouses, Data Marts, and Data Lakes
•Data mart: a subset of a data warehouse that is used by
small- and medium-sized businesses and departments
within large companies to support decision making
•A specific area in the data mart might contain more
detailed data than the data warehouse
•Data lake: takes a “store everything” approach to big
data, saving all the data in its raw and unaltered form
• Also called an enterprise data hub
• Raw data is available when users decide just how they want to
use the data
• Only when the data is accessed for a specific analysis is it
extracted from the data lake
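The store-everything-raw, structure-on-access idea (often called schema-on-read) can be sketched with raw JSON lines. The events, field names, and the `extract_for_analysis` helper are hypothetical.

```python
import json

# Store everything: raw events are kept exactly as they arrived, with no schema imposed.
data_lake = [
    '{"type": "click", "user": "u1", "page": "/home"}',
    '{"type": "purchase", "user": "u2", "amount": 30.5}',
    '{"type": "sensor", "device": "d7", "temp_c": 21.4}',
]

def extract_for_analysis(raw_records, wanted_type, fields):
    """Apply structure only at read time, for one specific analysis (hypothetical helper)."""
    rows = []
    for line in raw_records:
        event = json.loads(line)
        if event.get("type") == wanted_type:
            rows.append({f: event.get(f) for f in fields})
    return rows

purchases = extract_for_analysis(data_lake, "purchase", ["user", "amount"])
print(purchases)  # [{'user': 'u2', 'amount': 30.5}]
```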
NoSQL Databases
• NoSQL database
• Provides a means to store and retrieve data that is modeled in some way
other than the simple two-dimensional tabular relations used in relational
databases
• Advantages:
• Ability to spread data over multiple servers so that each server contains only a
subset of the total data
• No predefined schema is required
• Data structures are more flexible and can provide improved access speed and
redundancy.
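Spreading data over multiple servers is commonly done by hashing each record's key to choose a server (sharding). A plain-Python sketch, assuming three hypothetical servers modeled as dictionaries and schema-free documents; production NoSQL systems use more elaborate schemes, such as consistent hashing and replication.

```python
import hashlib

NUM_SERVERS = 3
servers = [dict() for _ in range(NUM_SERVERS)]  # each dict stands in for one server

def server_for(key):
    """Pick a server deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SERVERS

def put(key, document):
    servers[server_for(key)][key] = document  # documents need no predefined schema

def get(key):
    return servers[server_for(key)].get(key)

# Documents with different shapes can sit side by side.
put("cust:1", {"name": "Acme", "city": "Boston"})
put("cust:2", {"name": "Globex", "tags": ["wholesale", "priority"]})
put("cust:3", {"name": "Initech"})

print(get("cust:2"))                 # {'name': 'Globex', 'tags': ['wholesale', 'priority']}
print([len(s) for s in servers])     # each server holds only a subset of the data
print(sum(len(s) for s in servers))  # 3
```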
Hadoop
•Hadoop
•An open-source software framework that includes
several software modules that provide a means for
storing and processing extremely large data sets
•Has two primary components:
•A data processing component (MapReduce)
•A distributed file system (Hadoop Distributed File
System, HDFS)
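MapReduce splits a job into a map phase that emits key/value pairs and a reduce phase that combines all values for each key, with a shuffle step grouping the pairs in between. A single-process Python sketch of the classic word-count example follows; Hadoop would run these phases in parallel across a cluster and read input from HDFS, which this does not model.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

splits = ["big data big value", "Data lakes hold raw data"]  # hypothetical input splits
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["data"], counts["big"])  # 3 2
```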
In-Memory Databases
• In-memory database (IMDB)
• A database management system that stores the entire database in random
access memory (RAM)
• Provides access to data at rates much faster than systems that store data on some form of
secondary storage
• Enables the analysis of big data and other challenging data-processing
applications
• Performs best on multiple multicore CPUs
THE END
QUESTIONS?
THANK YOU!

Fundamentals of information systems chapter 3.ppt

  • 2. Introduction • Database: is an organized collection of data • It is a well-designed, organized, and carefully managed collection of data • It can contribute to organizational success by providing managers and decision makers with timely, accurate, and relevant information built on data. • It also help companies analyze information to reduce costs, increase profits, add new customers, track past business activities, and open new market opportunities. • A database management system (DBMS) is a group of programs that: • Access, manage and manipulate the database • Provide an interface between the database and its users and other application programs • provides a single point of management and control over data resources, which can be critical to maintaining the integrity and security of the data. • A database, a DBMS, and the application programs that use the data make up a database environment • Databases and database management systems are becoming even more important to organizations as they deal with rapidly increasing amounts of information. • Most organizations have many databases; however, without good data management, it is nearly impossible for anyone to find the right and related information for accurate and business-critical decision making
  • 3. Introduction • Data fundamentals : • Without data and the ability to process it, • An organization could not successfully complete most business activities. • It cannot pay employees, send out bills, order new inventory, or produce information to assist managers in decision making. • Data consists of raw facts such as employee numbers and sales figures • Data must be organized in a meaningful way to transform it into useful information
  • 4. Hierarchy of Data • A bit (binary digit i.e.0 or 1) that represents a circuit that is either on or off • Bits can be organized into units called bytes • A byte is made up of eight bits • Each byte represents a character • Character: is the basic building block of most information. • Field: a name, number, or combination of characters that describes an aspect of a business object or activity • Record: a collection of related data fields • File: a collection of related records • Database: a collection of integrated and related files • Hierarchy of data: bits, characters, fields, records, files, and databases
  • 6. Data Entities, Attributes, and Keys • Entity: a person, place, or thing for which data is collected, stored, and maintained. Examples of entities include employees, products, and customers. • Most organizations organize and store data as entities. • Attribute: a characteristic of an entity. For example, employee number, last name, first name, hire date, and department number are attributes for an employee. • Data item: the specific value of an attribute • Primary key: a field or set of fields that uniquely identifies the record • No other record can have the same primary key. • The primary key is used to distinguish records so that they can be accessed, organized, and manipulated. Primary keys ensure that each record in a file is unique • A data key is a field within a record that is used to identify the record
  • 8. The Database Approach • Traditional approach to data management • Each distinct operational system used data files dedicated to that system • Database approach to data management • Information systems share a pool of related data • Offers the ability to share data and information resources • To use the approach A database management system (DBMS) is required • Typically, this software acts as a buffer between the application programs and the database itself.
  • 9. Data Modeling and Database Characteristics • A database should be designed to store all data relevant to the business and to provide quick access and easy modification. • Moreover, database must reflect the business processes of the organization • Considerations when building a database • Content: what data should be collected? cost? • Access: what data should be provided to which users and when? • Logical structure: how should data be arranged so that it makes sense? • Physical organization: where should data be physically located? • Archiving: how long to store? • Security: how can data be protected?
  • 10. Data Modeling • One of the tools database designers use to show the logical relationships among data is a data model. • Data model: a diagram of data entities and their relationships • Data modeling usually involves developing an understanding of a specific business problem and then analyzing the data and information needed to deliver a solution • Enterprise data modeling is a data modeling done at the level of the entire enterprise. • Enterprise data modeling is an approach that starts • by investigating the general data and information needs of the organization at the strategic level and • then moves on to examine more specific data and information needs for the functional areas and departments within the organization. • It involves analyzing the data and information needs of an entire organization • It provides a roadmap for building database and information systems
  • 13. Data Modeling •Entity-relationship (ER) diagrams: data models that use basic graphical symbols to show the organization of and relationships between data •ER diagrams show data items in tables (entities) and the ways they are related. •In most cases, boxes in ER diagrams indicate data items or entities contained in data tables, and lines show relationships between entities •ER diagrams help ensure that the relationships among the data entities in a database are correctly structured so that any application programs developed are consistent with business operations and user needs. •In addition, ER diagrams can serve as reference documents after a database is in use. •If changes are made to the database, ER diagrams help design them
  • 14. Relational Database Model • Relational model: a simple but highly useful way to organize data into collections of two-dimensional tables called relations • Relation or table represents an Entity • Each row in the table represents records or instances of an entity • Each column represents an attribute of that entity • Domain: range of allowable values for a data attribute
  • 15. Manipulating Data •Selecting: eliminating rows according to certain criteria •Projecting: eliminating columns in a table •Joining: combining two or more tables •Linking: combining two or more tables through common data attributes to form a new table with only the unique data attributes
  • 17. Data Cleansing • It is also called data cleaning or data scrubbing • The process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, irrelevant records that reside in a database • The cost of performing data cleansing can be quite high • Data cleaning is different from data validation • Which involves the identification of “bad data” and its rejection at the time of data entry.
  • 18. Relational Database Management Systems (DBMSs) •Relational DBMSs: • Creating and implementing the right database system ensures that the database will support both business activities and goals • Capabilities and types of database systems vary considerably •SQL Databases : •SQL: a special-purpose programming language for accessing and manipulating data stored in a relational database •SQL databases conform to ACID properties: -Atomicity, consistency, isolation, and durability •SQL was adopted by ANSI 1986 as the standard query language for relational databases
  • 21. Database Activities •A) Providing a user view of the database •B) Creating and modifying the database •C) Storing and retrieving data •D) Manipulating the data and generating reports •A) Providing a User View: •Schema: a description of the entire database •A schema can be part of the database or a separate schema file •The DBMS can reference a schema to find where to access the requested data in relation to another piece of data
  • 22. B) Creating and Modifying the Database: • Data definition language (DDL) • A collection of instructions and commands used to define and describe data and relationships in a specific database • Allows the database’s creator to describe data and relationships that are to be contained in the schema • Data dictionary: a detailed description of all the data used in the database • It can also include a description of data flows, information about the way records are organized, and the data-processing requirements
  • 23. C) Storing and Retrieving Data: • When an application program needs data, it requests the data through the DBMS • Concurrency control deals with the situation in which two or more users or applications need to access the same record at the same time
  • 24. D) Manipulating Data and Generating Reports: •Query by Example (QBE) is a visual approach to developing database queries or requests •Data manipulation language (DML): a specific language, provided with a DBMS •Allows users to access and modify the data, to make queries, and to generate reports •A DBMS can produce a wide variety of documents, reports, and other output that can help organizations achieve their goals
  • 25. Manipulating Data and Generating Reports
  • 26. Database Administration • Database administrators(DBAs): skilled and trained Information System professionals • Works with users to define their data needs • Applies database programming languages to craft a set of databases to meet those needs • Tests and evaluates databases • Implements changes to improve their databases’ performance • Assures that data is secure from unauthorized access • Data administrator: a non technical position responsible for defining and implementing consistent principles for a variety of data issues • Including setting data standards and data definitions that apply across all the databases in an organization • The data administrator can be a high-level position reporting to top-level managers
  • 27. Popular Database Management Systems • Database as a Service (DaaS) • The database is stored on a service provider’s servers • The database is accessed by the client over a network, typically the Internet • Database administration is handled by the service provider • Example of DaaS: Amazon Relational Database Service (Amazon RDS)
  • 28. Using Databases with Other Software •DBMSs can act as front-end or back-end applications •Front-end applications interact directly with people •Back-end applications interact with other programs or applications •Example: •The Library of Congress (LOC) provides a back-end application that allows Web access to its databases, which include references to books and digital media in the LOC collection
  • 29. Big Data • Extremely large and complex data collections • Traditional data management software, hardware, and analysis processes are incapable of dealing with them • Four characteristics of big data • Volume • Velocity • Variety • Veracity • Sources of Big Data:
  • 30. Big Data Uses • Examples: • Retail organizations monitor social networks to engage brand advocates, identify brand adversaries • Advertising and marketing agencies track comments on social media • Hospitals analyze medical data and patient records • Consumer product companies monitor social networks to gain insight into consumer behavior • Financial service organizations use data to identify customers who are likely to be attracted to increasingly targeted and sophisticated offers • Challenges of Big Data: -How to choose what subset of the data to store -Where and how to store the data -How to find the nuggets of data that are relevant to the decision making at hand -How to derive value from the relevant data -How to identify which data needs to be protected from unauthorized access
  • 31. Data Management
    • Data management: an integrated set of functions that defines the processes by which data is obtained, certified fit for use, stored, secured, and processed in such a way as to ensure that the accessibility, reliability, and timeliness of the data meet the needs of the data users within an organization
    • Data governance: defines the roles, responsibilities, and processes for ensuring that data can be trusted and used by an entire organization
  • 32. Data Management
    • Data management is driven by a variety of factors:
      • The need to meet external regulations designed to manage risk associated with financial misstatement
      • The need to avoid the inadvertent release of sensitive data
      • The need to ensure that high-quality data is available for key decisions
    • Data governance requires business leadership and active participation:
      • Use of a cross-functional team is recommended
      • The team should consist of executives, project managers, line-of-business managers, and data stewards
      • A data steward is an individual responsible for the management of critical data elements
  • 33. Data Management
    • Data lifecycle management (DLM): a policy-based approach to managing the flow of an enterprise’s data
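One way to picture a "policy-based" approach is as a set of rules that decide what happens to each record as it ages. The sketch below assumes a hypothetical three-stage policy (keep in primary storage, archive, delete) with made-up age thresholds; real DLM policies are set by each organization and by the regulations it must follow.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rule:
    max_age_days: int  # rule applies to records no older than this
    action: str        # what the policy says to do with such records


# Hypothetical policy, checked in order; thresholds are illustrative only.
POLICY = [
    Rule(max_age_days=365, action="keep in primary storage"),
    Rule(max_age_days=7 * 365, action="move to archive"),
]
DEFAULT_ACTION = "delete"  # applies to records older than every rule's limit


def lifecycle_action(created: date, today: date) -> str:
    """Return the action the policy assigns to a record created on `created`."""
    age = (today - created).days
    for rule in POLICY:
        if age <= rule.max_age_days:
            return rule.action
    return DEFAULT_ACTION


if __name__ == "__main__":
    today = date(2024, 1, 1)
    for created in (date(2023, 6, 1), date(2020, 1, 1), date(2010, 1, 1)):
        print(created, "->", lifecycle_action(created, today))
```

Keeping the rules in data (`POLICY`) rather than scattered through code is what makes the approach policy-based: changing a retention period means editing one rule, not every program that touches the data.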
  • 34. Data Warehouses, Data Marts, and Data Lakes
    • Data warehouse: a large database that collects business information from many sources in the enterprise in support of management decision making
    • ETL process:
      • Extract
      • Transform
      • Load
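The three ETL steps can be sketched in a few lines of Python: extract records from two hypothetical source systems, transform them into one common format, and load them into a warehouse table. The source data, field names, and the in-memory SQLite "warehouse" are assumptions for illustration; production ETL usually relies on dedicated tools.

```python
import sqlite3

# Hypothetical source systems that record the same kind of fact differently.
STORE_SALES = [{"store": "north", "amount_usd": "120.50", "day": "2024-03-01"}]
WEB_SALES = [{"channel": "web", "cents": 4999, "date": "01/03/2024"}]  # DD/MM/YYYY


def extract():
    """Extract: pull raw records from each source."""
    return STORE_SALES, WEB_SALES


def transform(store_rows, web_rows):
    """Transform: convert both sources into one schema (source, amount, ISO date)."""
    rows = []
    for r in store_rows:
        rows.append((r["store"], float(r["amount_usd"]), r["day"]))
    for r in web_rows:
        d, m, y = r["date"].split("/")
        rows.append((r["channel"], r["cents"] / 100, f"{y}-{m}-{d}"))
    return rows


def load(rows, conn):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (source TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(transform(*extract()), warehouse)
    print(warehouse.execute("SELECT source, amount, sale_date FROM sales").fetchall())
```

Most of the work sits in the transform step, where differing units (dollars vs. cents) and date formats are reconciled so that the warehouse holds consistent data.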
  • 35. Data Warehouses, Data Marts, and Data Lakes
    • Data mart: a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making
      • A specific area in the data mart might contain more detailed data than the data warehouse
    • Data lake: takes a “store everything” approach to big data, saving all the data in its raw and unaltered form
      • Also called an enterprise data hub
      • Raw data is available when users decide just how they want to use the data
      • Only when the data is accessed for a specific analysis is it extracted from the data lake
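The data lake's "store everything raw, decide later" idea (often called schema-on-read) can be sketched as follows: records of varying shape are saved untouched, and structure is imposed only when a specific analysis extracts what it needs. The record contents and the `ingest` and `extract_for_analysis` helpers are hypothetical.

```python
import json

# The "lake": raw records stored exactly as received, with no schema enforced.
lake = []


def ingest(raw_record: str) -> None:
    """Save the record in its raw, unaltered form."""
    lake.append(raw_record)


def extract_for_analysis(field: str) -> list:
    """Schema-on-read: parse records only now, keeping those that contain `field`."""
    values = []
    for raw in lake:
        record = json.loads(raw)
        if field in record:
            values.append(record[field])
    return values


if __name__ == "__main__":
    ingest('{"type": "click", "page": "/home"}')
    ingest('{"type": "purchase", "amount": 30.0}')
    ingest('{"type": "purchase", "amount": 12.5, "coupon": "SPRING"}')
    # One analysis decides, after the fact, that it only needs purchase amounts.
    print(sum(extract_for_analysis("amount")))
```

By contrast, a warehouse or data mart would transform the records into a fixed table layout before storing them.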
  • 36. NoSQL Databases
    • NoSQL database: provides a way to store and retrieve data that is modeled using some means other than the simple two-dimensional tabular relations used in relational databases
    • Advantages:
      • Ability to spread data over multiple servers so that each server contains only a subset of the total data
      • No predefined schema is required
      • Data structures are more flexible and can provide improved access speed and redundancy
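Two of the listed advantages, spreading data across servers and not requiring a predefined schema, can be illustrated with a toy document store that hash-partitions (shards) documents across several "servers". Here the servers are plain Python dictionaries and the documents are hypothetical; real NoSQL systems add replication, networking, and rebalancing on top of this idea.

```python
import hashlib


class ShardedStore:
    """Toy document store: each key lives on exactly one of N shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate server holding a subset of the data.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        # A stable hash (unlike Python's per-process randomized hash()) picks the shard.
        digest = hashlib.sha256(key.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, document: dict) -> None:
        # No schema check: documents may have entirely different fields.
        self._shard_for(key)[key] = document

    def get(self, key: str):
        return self._shard_for(key).get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=3)
    store.put("user:1", {"name": "Ana", "email": "ana@example.com"})
    store.put("user:2", {"name": "Ben", "tags": ["admin"], "age": 41})
    print(store.get("user:2"))
    print([len(s) for s in store.shards])  # documents held by each shard
```

Because the shard is computed from the key, a lookup goes straight to the one server that holds the document instead of searching all of them.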
  • 37. Hadoop
    • Hadoop: an open-source software framework that includes several software modules that provide a means for storing and processing extremely large data sets
    • Has two primary components:
      • A data processing component (MapReduce)
      • A distributed file system (Hadoop Distributed File System, HDFS)
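MapReduce splits a job into a map step that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group. The single-process Python sketch below runs the classic word-count example to show that data flow. It does not use Hadoop's actual Java API or HDFS, and the input "splits" are hypothetical stand-ins for file blocks that Hadoop would distribute across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(splits):
    # In Hadoop, each split would be mapped on a different node in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    splits = ["big data needs big storage", "Hadoop stores big data"]
    print(word_count(splits))
```

Because map and reduce each work on independent pieces (one split, or one key), both steps can be spread across many machines, which is what lets Hadoop scale to very large data sets.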
  • 39. In-Memory Databases
    • In-memory database (IMDB): a database management system that stores the entire database in random access memory (RAM)
      • Provides access to data at rates much faster than when the data is stored on some form of secondary storage
      • Enables the analysis of big data and other challenging data-processing applications
      • Performs best on systems with multiple multicore CPUs
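SQLite, which ships with Python, keeps the entire database in RAM when it is opened with the special name `:memory:`, so it offers a quick way to try the in-memory idea. The sketch below times the same small write-then-read workload against an in-memory database and a file on disk. The `readings` table, the row count, and the `timed_workload` helper are illustrative assumptions; timings depend heavily on hardware, operating-system caching, and SQLite settings, and SQLite is not a full enterprise IMDB.

```python
import os
import sqlite3
import tempfile
import time


def timed_workload(conn: sqlite3.Connection, rows: int = 500) -> float:
    """Run a small write-then-read workload and return elapsed seconds."""
    conn.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")
    start = time.perf_counter()
    for i in range(rows):
        with conn:  # one committed transaction per row
            conn.execute("INSERT INTO readings VALUES (?, ?)", (i % 10, i * 0.5))
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor").fetchall()
    return time.perf_counter() - start


if __name__ == "__main__":
    memory_db = sqlite3.connect(":memory:")  # whole database lives in RAM
    with tempfile.TemporaryDirectory() as tmp:
        disk_db = sqlite3.connect(os.path.join(tmp, "readings.db"))  # secondary storage
        print(f"in-memory: {timed_workload(memory_db):.4f} s")
        print(f"on disk:   {timed_workload(disk_db):.4f} s")
        disk_db.close()
```

Committing after every row exaggerates the gap, because each commit forces the disk-backed database to write to storage while the in-memory one does not; that trade-off is also why in-memory data is lost when the process ends unless it is copied to persistent storage.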

Editor's Notes

  • #1: Fundamentals of Information Systems, Ninth Edition. Chapter 3: Database System and Big Data
  • #2: Introduction. Database: an organized collection of data. A database management system (DBMS) is a group of programs that manipulate the database and provide an interface between the database and its users and other application programs.
  • #4: Hierarchy of Data. A bit (binary digit) represents a circuit that is either on or off. A byte is made up of eight bits, and each byte represents a character. Field: a name, number, or combination of characters that describes an aspect of a business object or activity. Record: a collection of related data fields. File: a collection of related records.
  • #5: Hierarchy of Data. Database: a collection of integrated and related files. Hierarchy of data: bits, characters, fields, records, files, and databases.
  • #6: Data Entities, Attributes, and Keys. Entity: a person, place, or thing for which data is collected, stored, and maintained. Attribute: a characteristic of an entity. Data item: the specific value of an attribute. Primary key: a field or set of fields that uniquely identifies the record.
  • #9: Data Modeling and Database Characteristics. Considerations when building a database: content (what data should be collected, and at what cost?); access (what data should be provided to which users, and when?); logical structure (how should data be arranged so that it makes sense?); physical organization (where should data be physically located?); archiving (how long should data be stored?); and security (how can data be protected?).
  • #15: Manipulating Data. Selecting: eliminating rows according to certain criteria. Projecting: eliminating columns in a table. Joining: combining two or more tables. Linking: combining two or more tables through common data attributes to form a new table with only the unique data attributes.
  • #17: Data Cleansing. Also called data cleaning or data scrubbing: the process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or irrelevant records that reside in a database. The cost of performing data cleansing can be quite high. Data cleansing is different from data validation, which involves the identification of “bad data” and its rejection at the time of data entry.
  • #18: Relational Database Management Systems (DBMSs). Creating and implementing the right database system ensures that the database will support both business activities and goals. Capabilities and types of database systems vary considerably.
  • #21: Database Activities: providing a user view of the database; creating and modifying the database; storing and retrieving data; and manipulating the data and generating reports.
  • #22: Creating and Modifying the Database. Data definition language (DDL): a collection of instructions and commands used to define and describe data and relationships in a specific database; it allows the database’s creator to describe the data and relationships that are to be contained in the schema. Data dictionary: a detailed description of all the data used in the database; it can also include a description of data flows, information about the way records are organized, and the data-processing requirements.
  • #23: Storing and Retrieving Data. When an application program needs data, it requests the data through the DBMS. Concurrency control deals with the situation in which two or more users or applications need to access the same record at the same time.
  • #24: Manipulating Data and Generating Reports. Query by Example (QBE) is a visual approach to developing database queries or requests. Data manipulation language (DML): a specific language, provided with a DBMS, that allows users to access and modify the data, to make queries, and to generate reports. A DBMS can produce a wide variety of documents, reports, and other output that can help organizations achieve their goals.