ECU-M 213: HEALTH INFORMATICS
By: Patience A. Jaffu
BSc Maths, CSC (Mak 2012) and MHI (Mak 2020)
Lecture 6: Big data and Data acquisition
Big data
• A collection of large and complex datasets which are difficult to process using
common database management tools or traditional data processing applications.
• Big Data is a combination of structured, semi-structured and unstructured data. It
is “data whose size forces us to look beyond the tried-and-true methods that are
prevalent at that time”
• It is characterized by five Vs: Volume, Velocity, Variety, Veracity, and Value
“When the size of the data itself becomes part of the problem and traditional
techniques for working with data run out of steam”
Characteristics of big data
• Volume (amount of data): dealing with large scales of data within data processing
(e.g. Global Supply Chains, Global Financial Analysis, DHIS2 data).
• Velocity (speed of data): dealing with streams of high frequency of incoming
real-time data (e.g. Sensors, Electronic Trading, Internet).
• Variety (range of data types/sources): dealing with data using differing syntactic
formats (e.g. Spreadsheets, XML, DBMS), schemas/graphs, and meanings.
• Value: Without business value, big data is simply a lot of data. With business value,
it becomes a rich mine of business intelligence. Spend resources on big data
analytics to realize that value.
• Veracity: the “truth” or accuracy of data and information assets, which
often determines executive-level confidence.
• It dictates how reliable and significant the data really is.
• Low-veracity data usually contains a high percentage of non-valuable, 'noisy',
and meaningless data that will not benefit an organization's analysis.
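The Variety and Veracity points above can be illustrated with a minimal sketch, assuming the same kind of reading arrives once as CSV and once as XML. The field names (`patient_id`, `temp_c`), the sample values, and the plausibility range are hypothetical and are not taken from DHIS2 or any real system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: the same kind of reading in two syntactic formats.
CSV_TEXT = "patient_id,temp_c\nP001,36.8\nP002,\n"
XML_TEXT = (
    "<readings><reading><patient_id>P003</patient_id>"
    "<temp_c>98.6</temp_c></reading></readings>"
)

def from_csv(text):
    """Parse CSV rows into plain dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_xml(text):
    """Parse <reading> elements into plain dictionaries."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "") for child in reading}
        for reading in root.findall("reading")
    ]

def is_plausible(record):
    """Veracity check (assumed rule): temperature present and within 30-45 degrees Celsius."""
    try:
        return 30.0 <= float(record["temp_c"]) <= 45.0
    except (KeyError, ValueError):
        return False

# Variety: both formats end up in one common record shape.
records = from_csv(CSV_TEXT) + from_xml(XML_TEXT)
# Veracity: keep only records that pass the plausibility rule.
trusted = [r for r in records if is_plausible(r)]
noisy_share = 1 - len(trusted) / len(records)
print(trusted)
print(noisy_share)
```

Here the empty CSV value and the out-of-range XML value (probably recorded in Fahrenheit) both count as noisy, so only one of the three records is kept.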
Big data Value chain
Data acquisition
• Data acquisition has been understood as the process of gathering, filtering, and
cleaning data before the data is put in a data warehouse or any other storage
solution.
• Data acquisition is one of the major big data challenges in terms of infrastructure
requirements.
• The infrastructure required to support the acquisition of big data must deliver low,
predictable latency (time delay) in both capturing data and executing queries; be
able to handle very high transaction volumes, often in a distributed environment;
and support flexible and dynamic data structures.
Data acquisition
• The acquisition of big data is most commonly governed by four of the
Vs (characteristics of big data): volume, velocity, variety, and value.
• Most data acquisition scenarios assume high-volume, high-velocity, high-variety,
but low-value data, making it important to have adaptable and time-efficient
gathering, filtering, and cleaning algorithms that ensure that only the high-value
fragments of the data are actually processed by the data-warehouse analysis.
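The gathering, filtering, and cleaning steps described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `value_score` field, the threshold, and the two feeds are hypothetical, and a real acquisition layer would read from streams or message queues rather than in-memory lists.

```python
def gather(sources):
    """Gather: yield raw records from every source in turn."""
    for source in sources:
        yield from source

def keep_high_value(records, min_value):
    """Filter: drop records whose (assumed) value score is below the threshold."""
    return (r for r in records if r.get("value_score", 0) >= min_value)

def clean(records):
    """Clean: strip whitespace from string fields and skip records without an id."""
    for r in records:
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        if cleaned.get("id"):
            yield cleaned

# Hypothetical input feeds.
sensor_feed = [{"id": " s1 ", "value_score": 0.9}, {"id": "s2", "value_score": 0.1}]
clinic_feed = [{"id": "", "value_score": 0.8}, {"id": "c7", "value_score": 0.7}]

# Only records that survive all three stages reach the warehouse.
warehouse = list(clean(keep_high_value(gather([sensor_feed, clinic_feed]), min_value=0.5)))
print(warehouse)
```

Generators are used so that each record flows through all stages one at a time, which keeps memory use flat when the volume is high.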
Data acquisition
• However, in healthcare, most or all data is of potentially high value, as it can be
important in improving patient outcomes.
• For such organizations, data analysis, classification, and packaging on very
high data volumes play the most central role after the data acquisition.
Data Analysis
• This is concerned with making the raw data acquired amenable to use in
decision-making as well as domain-specific usage.
• Data analysis involves exploring, transforming, and modelling data with the
goal of highlighting relevant data, synthesizing/amalgamating and extracting
useful hidden information with high potential from a business point of view.
Data Curation
• This is the active management of data over its life cycle to ensure it meets
the necessary data quality requirements for its effective usage.
• Data curators (also known as scientific curators, or data annotators) hold the
responsibility of ensuring that data are trustworthy, discoverable, accessible,
reusable, and fit their purpose.
Data Storage
• This is the persistence and management of data in a scalable way that
satisfies the needs of applications that require fast access to the data.
Data Usage
• This covers the data-driven business activities that need access to data, its
analysis, and the tools needed to integrate the data analysis within the
business activity.
• Data usage in business decision-making can enhance competitiveness
through reduction of costs, increased added value, or any other parameter
that can be measured against existing performance criteria.
• Big data has already influenced many businesses and has the potential to
impact all business sectors.
Data acquisition in the health sector
• Within the health sector big data technology aims to establish a holistic
approach whereby clinical, financial, and administrative data as well as patient
behavioral data, population data, medical device data, and any other related
health data are combined and used for retrospective, real-time, and predictive
analysis.
Data acquisition in the health sector
• In order to establish a basis for the successful implementation of big data
health applications, the challenge of data digitalization and acquisition (i.e.
putting health data in a form suitable as input for analytic solutions) needs to
be addressed.
• Today, large amounts of health data are stored in data silos, and data
exchange is only possible via scan, fax, or email.
• Due to inflexible interfaces and missing standards, the aggregation of health
data relies on individualized solutions with high costs.
Data acquisition in the health sector
• In hospitals, patient data is stored in CIS (clinical information system) or
EHR (electronic health record) systems.
• However, different clinical departments might use different systems, such as
RIS (radiology information system), LIS (laboratory information system), or
PACS (picture archiving and communication system), to store their data.
• There is no standard data model or EHR system.
• Today we can exchange data using HL7.
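As a rough illustration of what an HL7-based exchange looks like, the sketch below splits a small HL7 v2 style message into segments and fields. The message content, the sending and receiving system names, and the helper `parse_segments` are hypothetical, and the parser assumes the default `|` and `^` delimiters instead of reading them from the MSH segment.

```python
# Hypothetical HL7 v2 result message; segments are separated by carriage returns.
MESSAGE = "\r".join([
    r"MSH|^~\&|LIS|LAB|CIS|HOSP|202401151030||ORU^R01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
    "OBX|1|NM|718-7^Hemoglobin^LN||13.5|g/dL|12.0-16.0|N|||F",
])

def parse_segments(message):
    """Split a message into segments, and each segment into '|'-separated fields."""
    segments = {}
    for line in message.split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments

segments = parse_segments(MESSAGE)
pid = segments["PID"][0]
obx = segments["OBX"][0]

family, given = pid[5].split("^")[:2]   # PID-5: patient name components
code, label = obx[3].split("^")[:2]     # OBX-3: observation identifier
result = (given, family, code, label, float(obx[5]), obx[6])
print(result)
```

In the MSH segment the field separator itself counts as the first field, so list indexes there are shifted by one compared with PID and OBX. A production interface would normally use a dedicated HL7 library that also handles escape sequences and repetitions.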
Types of data
1. Structured data
2. Unstructured data
Structured data
• Structured data usually resides in relational databases (RDBMS).
• Fields store length-delineated data such as phone numbers, Social Security numbers,
or ZIP codes.
• Even text strings of variable length like names are contained in records,
making it a simple matter to search. Data may be human- or machine-
generated as long as the data is created within an RDBMS structure.
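A minimal sketch of the structured case, using Python's built-in `sqlite3` module as a stand-in for an RDBMS. The table name, column names, and sample rows are hypothetical.

```python
import sqlite3

# An in-memory database stands in for a hospital RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO patients (name, phone, zip_code) VALUES (?, ?, ?)",
    [("Jane Doe", "555-0100", "10001"), ("John Roe", "555-0101", "10002")],
)

# Because every value sits in a named field, searching is a simple query.
rows = conn.execute(
    "SELECT name, phone FROM patients WHERE zip_code = ?", ("10001",)
).fetchall()
print(rows)
conn.close()
```

The pre-defined schema is what makes the search a one-line query, which is the point the slide makes about structured data.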
Unstructured data
Unstructured data is essentially everything else.
Unstructured data has internal structure but is not structured via pre-defined
data models or schema. It may be textual or non-textual, and human- or
machine-generated. It may also be stored within a non-relational database.
1. Human-generated unstructured data includes:
• Text files: Microsoft Word, spreadsheets, PowerPoint.
• Social media: Data from Facebook, Twitter, LinkedIn.
• Website: YouTube, Instagram, photo sharing sites.
• Mobile data: Text messages, locations.
• Communication: Chat, phone recordings, collaboration software.
• Media: MP3, digital photos, audio sharing sites.
2. Machine-generated unstructured data includes:
• Satellite imagery: Weather data, land forms, military movements.
• Scientific data: Oil and gas exploration, space exploration, seismic imagery,
atmospheric data.
• Sensor data: Traffic, weather, oceanographic sensors
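To illustrate the earlier point that unstructured data has internal structure without a pre-defined schema, the sketch below pulls a blood pressure reading out of a free-text note. The note text and the `BP 120/80` pattern are hypothetical; real clinical notes vary far more and usually need proper natural language processing.

```python
import re

# Hypothetical free-text clinical note.
NOTE = "Pt seen 2024-01-15. BP 120/80 mmHg, pulse 72. Complains of headache."

BP_PATTERN = re.compile(r"BP\s+(\d{2,3})/(\d{2,3})")

def extract_blood_pressure(text):
    """Return (systolic, diastolic) if a 'BP 120/80' style reading is found, else None."""
    match = BP_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

reading = extract_blood_pressure(NOTE)
print(reading)
```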
Limitations to data acquisition
1. Privacy and security
• These need to be addressed by the systems and technologies used in the data
acquisition process.
• Many systems already generate and collect large amounts of data, but only a
small fragment is used actively in business processes.
2. Confidentiality
Confidentiality in health care refers to the obligation of professionals who have
access to patient records or communication to hold that information in
confidence.
Privacy, confidentiality and security of patient data
Confidentiality: Everyone in the organization is responsible for patient confidentiality
• Board members
• Executive leadership
• Clinical staff
• Physicians and nurses
• Administrative and clerical staff
• Students and interns
• Volunteers
What information is confidential?
The following is a list of patient information that must remain confidential
• Identity (e.g. name, address, social security number, date of birth, etc.)
• Physical condition
• Emotional condition
• Financial information
• Confidentiality ensures that individual health information is used for the intended
purpose only, and that patient consent is required for any disclosure.
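One common way to share records for analysis while protecting identity is to drop the identifying fields and replace the patient identifier with a keyed pseudonym. The sketch below assumes hypothetical field names and a placeholder key; it covers only the identity items in the list above and is not a complete de-identification procedure.

```python
import hashlib
import hmac

IDENTITY_FIELDS = {"name", "address", "date_of_birth"}   # assumed field names
SECRET_KEY = b"replace-with-a-securely-stored-key"        # placeholder only

def pseudonymise(record):
    """Return a copy with identity fields removed and a keyed pseudonym added."""
    token = hmac.new(
        SECRET_KEY, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    safe = {
        k: v for k, v in record.items()
        if k not in IDENTITY_FIELDS and k != "patient_id"
    }
    safe["pseudonym"] = token[:16]
    return safe

# Hypothetical patient record.
record = {
    "patient_id": "P001",
    "name": "Jane Doe",
    "address": "Plot 1",
    "date_of_birth": "1980-01-01",
    "diagnosis": "malaria",
}
shared = pseudonymise(record)
print(shared)
```

A keyed hash (HMAC) is used so that the same patient always maps to the same pseudonym, while someone without the key cannot recompute it from a known identifier.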
Guiding Principles
• Access patient information only if there is a ‘Need to Know’
• Discard confidential information appropriately (e.g. locked trash bins or shredders).
• Forward requests for medical records to the Health Information Management Department.
• Do not discuss confidential matters where others might overhear (e.g. cafeteria, elevator,
buses, or restaurants).
• Do not leave patients' charts or files unattended.
• Report suspicious activities that may compromise patient confidentiality to the Privacy
Officer
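The "need to know" principle above is often enforced in software as a role check plus an audit trail. The sketch below is a minimal example under stated assumptions: the roles, the record sections, and the mapping between them are hypothetical, and a real policy would come from the organization.

```python
# Hypothetical role-to-record-section mapping.
NEED_TO_KNOW = {
    "nurse": {"vitals", "medications"},
    "billing_clerk": {"billing"},
    "volunteer": set(),
}

def may_access(role, section):
    """Allow access only when the role has a documented need for that section."""
    return section in NEED_TO_KNOW.get(role, set())

def read_section(role, section, record, audit_log):
    """Return the section if permitted; log every attempt so misuse can be reported."""
    allowed = may_access(role, section)
    audit_log.append((role, section, "granted" if allowed else "denied"))
    if not allowed:
        raise PermissionError(f"{role} has no need to know {section}")
    return record[section]

record = {"vitals": {"pulse": 72}, "billing": {"balance": 0}, "medications": []}
log = []
vitals = read_section("nurse", "vitals", record, log)
try:
    read_section("volunteer", "billing", record, log)
except PermissionError:
    pass  # the denied attempt is still recorded in the audit log
print(log)
```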
Privacy
Privacy, as distinct from confidentiality, is viewed as the right of the individual client or
patient to be let alone and to make decisions about how personal information is shared
(Brodnik, 2012)
State & Federal Laws that Protect Patient Privacy
• Health Insurance Portability & Accountability Act of 1996 (HIPAA)
• American Recovery and Reinvestment Act of 2009 (ARRA): HITECH
breach notification provisions
Privacy
• The Data Protection and Privacy Act, 2019 (Uganda)
https://ulii.org/system/files/legislation/act/2019/1/THE%20DATA%20PROTECTION%20AND%20PRIVACY%20BILL%20-%20ASSENTED.pdf
Privacy
What is the purpose of HIPAA?
• Improve the efficiency and effectiveness of the health care system
• Encourage the development of an electronic health record
• Establish national standards for electronic transmission of certain health information
• Establish national standards to protect health information
• Ensure patient confidentiality
• Protect patient privacy
• Build loyalty and trust
• Provide exceptional customer service
What is PHI?
• PHI stands for Protected Health Information and includes demographic
information that identifies an individual and:
– Is created or received by a health care provider, health plan, employer, or
health care clearinghouse.
– Relates to the past, present, or future physical or mental health or condition
of an individual.
– Describes the past, present or future payment for the provision of health care
to an individual.
Who has to follow HIPAA?
Anyone who:
• Currently works directly with patients
• Currently sees, uses, or shares PHI as a part of their job
• Currently accesses any hospital systems, records, tools, and information that
may contain PHI.
The entire organization/hospital is responsible for protecting the privacy
of our patients and upholding all HIPAA Privacy & Security Rules
Privacy
Where is PHI Found?
• Medical records
• Patient information systems
• Billing information (bills, receipts, EOBs, etc.)
• Test results
• X-rays
• Clinic lists
• Labels on IV bags
• Patient menus
Where is PHI Found?
• Conversations
• Telephone notes (in certain situations)
• Patient information on a mobile device
Privacy
Permitted Uses and Disclosures of PHI Include:
1. Treatment of the patient
•Direct patient care
•Coordination of care
•Consultations
•Referrals to other health care providers
2. Payment of healthcare bills
3. Operations related to healthcare
4. Research when approved by an Institutional Review Board (IRB)
5. Required by law (e.g. subpoena, court order, etc.)
Patient Rights
1. Right to Access
• Any information contained in their medical and billing record
2. Right to Amend
• Patients may request, in writing, an amendment to their medical records if they feel the records contain incorrect
or incomplete information
3. Right to an Accounting of Disclosures
• Patients have the right to receive a list of disclosures other than those made for treatment, payment, or operations
4. Right to Request Special Communications
• Patients may ask the hospital to contact them via an alternative phone number or address
Patient Rights (continued)
5. Right to Request Restrictions
• Patients may request not to be included in the directory (opt-out). In that case, patient
information should not be shared with clergy, friends, or anyone else
6. Right to Receive a Notice of Privacy Practices
• The organisation is required to provide a written notice of how it will use and
disclose patient health information
7. Right to File a Complaint
Patients have the right to file a complaint without fear of retaliation
Security
Security refers directly to protection, and specifically to the means
used to protect the privacy of health information and support
professionals in holding that information in confidence.
• When we protect patient data, we help build trust between patients and providers.
• Ensure Protected Health Information (PHI) is not disclosed to unauthorized persons.
• Do not send email containing Protected Health Information (PHI) unless it is encrypted.
• Log off your computer if you have to leave your workstation.
• If you suspect someone is using your login ID, you must report it immediately.
• It is your responsibility to report incidents to your supervisor or the Privacy
Officer if you suspect a patient's Protected Health Information (PHI) might
have been acquired, accessed, used or disclosed without authorization.
• The Privacy, Confidentiality and Security Assessment Tool
https://www.unaids.org/sites/default/files/media_asset/confidentiality_security_assessment_tool_en.pdf
• Uganda Medical and Dental Practitioners Council (UMDPC)
https://www.umdpc.com/Resources/Code%20of%20Professional%20Ethics.pdf
• More literature: A Primer on the Privacy, Security, and Confidentiality of
Electronic Health Records by Manish Kumar, Samuel Wambugu
(MEASURE Evaluation)
Data Standards
• Data standards encompass methods, protocols, terminologies, and
specifications for the collection, exchange, storage, and retrieval of
information associated with health care applications, including medical
records, medications, radiological images, payment and reimbursement,
medical devices and monitoring systems, and administrative processes.
Standardizing health care data
• Definition of data elements—determination of the data content to be collected
and exchanged.
Data Element Tag
• A DICOM message can be visualized as a stream of data elements, where
each element is made up of four data fields: element tag, optional value
representation, value length and the value itself.
• The Data Element Tag is a pair of 16-bit unsigned integers (xxxx,xxxx)
representing the group number and the element number.
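The four fields of a data element (tag, value representation, value length, value) can be made concrete with a small byte-level sketch. It assumes the explicit-VR little-endian encoding with a 2-byte length field, which applies to short value representations such as DA (date); other encodings and long VRs are laid out differently. The function names are hypothetical and this is not a full DICOM parser.

```python
import struct

def encode_element(group, element, vr, value):
    """Encode one data element: tag, value representation, value length, value.

    Assumes explicit-VR little-endian encoding with a 2-byte length,
    which applies to short VRs such as DA, TM and CS.
    """
    if len(value) % 2:          # values are padded to an even length
        value += b" "           # space padding is assumed here (text VRs)
    # "<HH2sH": two 16-bit unsigned integers (the tag), 2-byte VR, 16-bit length.
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value

def decode_element(data):
    """Decode an element produced by encode_element."""
    group, element, vr, length = struct.unpack("<HH2sH", data[:8])
    return (group, element), vr.decode("ascii"), data[8:8 + length]

encoded = encode_element(0x0008, 0x0020, "DA", b"20240115")   # Study Date
tag, vr, value = decode_element(encoded)
print(f"({tag[0]:04X},{tag[1]:04X})", vr, value)
```

For day-to-day work a maintained DICOM library is the safer choice; the sketch only shows why the tag is described as a pair of 16-bit unsigned integers.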
Examples of data element tags:
• (0008,0020) Study Date
• (0008,0030) Study Time
• (0008,0060) Modality
• (0010,0010) Patient’s Name
• (0010,0020) Patient ID
• (0028,0010) Number of pixel rows in the image
• (0028,0011) Number of pixel columns in the image
• (0038,001A) Scheduled admission date
• (0038,001B) Scheduled admission time
• The group and element numbers are written in hexadecimal, and each can range
from 0000 to FFFF.
• Data elements are always sorted in ascending tag order in a DICOM header, which
makes the header easy to search.
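The hexadecimal notation and the ascending ordering described above can be sketched as follows. The helper names `parse_tag` and `format_tag` are hypothetical; the tags are taken from the examples on this slide.

```python
import re

TAG_PATTERN = re.compile(r"\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)")

def parse_tag(text):
    """Turn a string such as '(0038,001A)' into a (group, element) pair of integers."""
    match = TAG_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag: {text!r}")
    return int(match.group(1), 16), int(match.group(2), 16)

def format_tag(tag):
    """Format a (group, element) pair back into '(gggg,eeee)' notation."""
    return f"({tag[0]:04X},{tag[1]:04X})"

unsorted_tags = ["(0038,001A)", "(0010,0020)", "(0008,0060)", "(0028,0010)", "(0008,0020)"]
# Tuples sort by group number first, then by element number.
ordered = sorted(parse_tag(t) for t in unsorted_tags)
print([format_tag(t) for t in ordered])
```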
• Data interchange formats—standard formats for electronically encoding the data
elements
• Terminologies—the medical terms and concepts used to describe, classify, and
code the data elements and data expression languages and syntax that
describe the relationships among the terms/concepts.
• Knowledge Representation—standard methods for electronically representing
medical literature, clinical guidelines, and the like for decision support.
Three primary areas in which standards for health care data need to be developed
• Data interchange
• Terminologies
• Knowledge representation
Data Interchange Standards
• These are needed for message format, document architecture, clinical templates,
user interface, and patient data linkage.
• Message Format Standards: These facilitate interoperability through the use of
common encoding specifications, information models for defining relationships
between data elements, document architectures, and clinical templates for
structuring data as they are exchanged.
• These include the Health Level Seven [HL7] Version 2.x [V2.x] series for clinical
data messaging, Digital Imaging and Communications in Medicine [DICOM] for
medical images, National Council for Prescription Drug Programs [NCPDP] Script
for retail pharmacy messaging, Logical Observation Identifiers, Names and Codes
[LOINC] for reporting of laboratory results, and Institute of Electrical and
Electronics Engineers [IEEE] standards for medical devices (e.g. IEEE 802.16,
wireless networking).
• Health Level 7 (HL7): This is the primary data interchange standard for
clinical messaging and is presently adopted in 90 percent of large hospitals.
Terminologies
• Standardized terminologies facilitate electronic data collection at the point of
care; retrieval of relevant data, information, and knowledge (i.e., evidence);
and data reuse for multiple purposes, such as automated surveillance, clinical
decision support, and quality and cost monitoring.
• To promote patient safety and enable quality management, standardized
terminologies that represent the focus (e.g., medical diagnosis, nursing
diagnosis, patient problem) and interventions of the variety of clinicians
involved in health care as well as data about the patient (e.g., age, gender,
ethnicity, severity of illness, preferences, functional status) are necessary
SNOMED CT
• This is the most well-developed concept-oriented terminology to date. A
concept-oriented reference terminology can be defined as one that has such
characteristics as a grammar that defines the rules for automated generation
and classification of new concepts, as well as the combining of atomic
concepts to form molecular expressions.
• SNOMED CT is based on a formal terminology model that provides
unambiguous definitions of health care concepts and contains the most
granular concepts for representing clinical and patient safety information.
• SNOMED CT requires the support of additional terminologies to capture
certain clinical data not currently available in the terminology with sufficient
granularity or scope, namely laboratory, medication, and medical device data.
LOINC
• LOINC is the terminology for representing laboratory test results and is a
part of the NCVHS core terminology group.
• LOINC is the available terminology that most fully represents laboratory
data in terms of naming for tests (e.g., chemistry, hematology) and clinical
observations (e.g., blood pressure, respiratory rate). The LOINC terms are
composed of up to eight dimensions derived from component (e.g., analyte),
type of property
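In practice, a laboratory system often keeps its own local test names and maps them to LOINC codes before exchanging results. The sketch below assumes hypothetical local names (`HGB`, `GLU`, `RESP`); the LOINC codes shown are published codes. The check-digit rule (a mod-10, Luhn-style digit after the hyphen) is stated here as an assumption that should be confirmed against the LOINC documentation.

```python
# Local test names are hypothetical; the LOINC codes are published codes.
LOCAL_TO_LOINC = {
    "HGB": "718-7",     # Hemoglobin [Mass/volume] in Blood
    "GLU": "2345-7",    # Glucose [Mass/volume] in Serum or Plasma
    "RESP": "9279-1",   # Respiratory rate
}

def luhn_check_digit(number):
    """Compute a mod-10 (Luhn) check digit for a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 0:       # double every second digit, starting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def is_well_formed(code):
    """Check that 'NNNN-C' ends with the check digit computed from 'NNNN'."""
    number, _, check = code.partition("-")
    return number.isdigit() and check.isdigit() and luhn_check_digit(number) == int(check)

def to_loinc(local_code):
    """Translate a local lab code to LOINC, refusing unmapped or malformed codes."""
    code = LOCAL_TO_LOINC.get(local_code)
    if code is None or not is_well_formed(code):
        raise KeyError(f"no valid LOINC mapping for {local_code!r}")
    return code

print(to_loinc("HGB"))
```

The check only catches typing errors in a code; it does not confirm that the code exists in the current LOINC release or that the mapping is clinically correct.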
Publicly Available Medical Image Repositories

More Related Content

PPTX
Big Data Mining Methods in Medical Applications [Autosaved].pptx
PPTX
Lecture 3 Data Mining.pptx power points for graduates
PDF
Big data in healthcare
PDF
Unit 3.pdf
PPTX
Data Warehousing: Bridging Islands of Health Information Systems
PPTX
The Data Operating System: Changing the Digital Trajectory of Healthcare
PPTX
The Data Operating System: Changing the Digital Trajectory of Healthcare
PDF
Unit-1 introduction to Big data.pdf
Big Data Mining Methods in Medical Applications [Autosaved].pptx
Lecture 3 Data Mining.pptx power points for graduates
Big data in healthcare
Unit 3.pdf
Data Warehousing: Bridging Islands of Health Information Systems
The Data Operating System: Changing the Digital Trajectory of Healthcare
The Data Operating System: Changing the Digital Trajectory of Healthcare
Unit-1 introduction to Big data.pdf

Similar to Lecture 6_Data acquisition.pptx power points (20)

PDF
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
PPTX
TOPIC.pptx
PPTX
Bigdata and Hadoop with applications
PPTX
Group 2 Handling and Processing of big data.pptx
PPTX
Big data
PDF
dataminingppt-170616163835.pdf jejwwkwnwnn
PPTX
PPT
BIG DATA.ppt
PDF
Regulatory Intelligence
PPTX
PPT 1.1.4.pptx_PPT 1.1.4.pptx_PPT 1.1.4.pptx
PPTX
PPT 1.1.4.pptx_PPT 1.1.4.pptx_PPT 1.1.4.pptx
PPTX
Big data Analytics Unit - CCS334 Syllabus
PDF
CS3352-Foundations of Data Science Notes.pdf
PDF
Big Data Analytics
PPTX
DOWLD SLIDES.pptx
PPTX
MIS Big Data & Data Analytics.pptx
PPTX
How to Architect Smarter Systems for Healthcare
PDF
Big Data in Healthcare -- What Does it Mean?
PDF
Data Governance in two different data archives: When is a federal data reposi...
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
TOPIC.pptx
Bigdata and Hadoop with applications
Group 2 Handling and Processing of big data.pptx
Big data
dataminingppt-170616163835.pdf jejwwkwnwnn
BIG DATA.ppt
Regulatory Intelligence
PPT 1.1.4.pptx_PPT 1.1.4.pptx_PPT 1.1.4.pptx
PPT 1.1.4.pptx_PPT 1.1.4.pptx_PPT 1.1.4.pptx
Big data Analytics Unit - CCS334 Syllabus
CS3352-Foundations of Data Science Notes.pdf
Big Data Analytics
DOWLD SLIDES.pptx
MIS Big Data & Data Analytics.pptx
How to Architect Smarter Systems for Healthcare
Big Data in Healthcare -- What Does it Mean?
Data Governance in two different data archives: When is a federal data reposi...
Ad

More from Josephmwanika (20)

PPTX
Gasless Abdomen and displaced bowel.pptx
PPTX
INTRA-ABDOMINAL CALCIFICATIONS.pptx present
PPTX
BOWELL OBSTRUCTION..pptx presentation master
PPTX
ADOMINAL IMAGING 1.pptx presentation for master
PPTX
vascular real.pptx presentation for masters
PPTX
Trachea and Airways Diseases.Pptx present
PPTX
COPD.pptx presentation for master students
PPTX
PLEURAL Diseases.Pptx for master students
PPTX
Pulmonary nodules.Pptx for master students
PPTX
cystic lung diseases.Pptx for master students
PPTX
THYROID ULTRASOUND.Pptx presentation masterpiece
PPTX
BIOPHYSICAL PROFILE and iugr.pptx presentation
PPTX
GESTATIONAL TROPHOBLASTIC ppt.pptx for master
PPTX
MALE Pelvis power point presentation for master students
PPTX
HEMODYNAMICS OF VASCULAR DISEASES ppt.pptx
PPTX
CEREBRAL_STRUCTURE_MAIN.power points for master students
PPTX
penile DOPPLER for masters courses and above.pptx
PDF
2024 RENAL DUPLEX for master students.pdf
PPTX
DOPPLER FLOW IMAGING AND SPECTRAL ANALYSIS.pptx
PPTX
carotid stenosis [Autosaved].pptx for master students
Gasless Abdomen and displaced bowel.pptx
INTRA-ABDOMINAL CALCIFICATIONS.pptx present
BOWELL OBSTRUCTION..pptx presentation master
ADOMINAL IMAGING 1.pptx presentation for master
vascular real.pptx presentation for masters
Trachea and Airways Diseases.Pptx present
COPD.pptx presentation for master students
PLEURAL Diseases.Pptx for master students
Pulmonary nodules.Pptx for master students
cystic lung diseases.Pptx for master students
THYROID ULTRASOUND.Pptx presentation masterpiece
BIOPHYSICAL PROFILE and iugr.pptx presentation
GESTATIONAL TROPHOBLASTIC ppt.pptx for master
MALE Pelvis power point presentation for master students
HEMODYNAMICS OF VASCULAR DISEASES ppt.pptx
CEREBRAL_STRUCTURE_MAIN.power points for master students
penile DOPPLER for masters courses and above.pptx
2024 RENAL DUPLEX for master students.pdf
DOPPLER FLOW IMAGING AND SPECTRAL ANALYSIS.pptx
carotid stenosis [Autosaved].pptx for master students
Ad

Recently uploaded (20)

PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PPTX
Open Quiz Monsoon Mind Game Final Set.pptx
PDF
Open folder Downloads.pdf yes yes ges yes
PDF
Pre independence Education in Inndia.pdf
PDF
01-Introduction-to-Information-Management.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Basic Mud Logging Guide for educational purpose
PPTX
master seminar digital applications in india
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PPTX
Cardiovascular Pharmacology for pharmacy students.pptx
PDF
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
102 student loan defaulters named and shamed – Is someone you know on the list?
Open Quiz Monsoon Mind Game Final Set.pptx
Open folder Downloads.pdf yes yes ges yes
Pre independence Education in Inndia.pdf
01-Introduction-to-Information-Management.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Anesthesia in Laparoscopic Surgery in India
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Basic Mud Logging Guide for educational purpose
master seminar digital applications in india
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Abdominal Access Techniques with Prof. Dr. R K Mishra
Week 4 Term 3 Study Techniques revisited.pptx
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Cardiovascular Pharmacology for pharmacy students.pptx
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
Pharmacology of Heart Failure /Pharmacotherapy of CHF

Lecture 6_Data acquisition.pptx power points

  • 1. ECU-M 213: HEALTH INFORMATICS By: Patience A. Jaffu Bsc Maths, CSC(Mak 2012) and MHI(Mak 2020) Lecture 6: Big data and Data acquisition
  • 2. Big data • A collection of large and complex datasets which are difficult to process using common database management tools or traditional data processing applications. • Big Data is a combination of structured, semi-structured and unstructured data. It is “data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time” • It is characterized by 5big Vs; Volume, Velocity, Variety,Varacity and Value “When the size of the data itself becomes part of the problem and traditional techniques for working with data run out of steam”
  • 3. Characteristics of big data • Volume (amount of data): dealing with large scales of data within data processing (e.g. Global Supply Chains, Global Financial Analysis, DHIS2 data). • Velocity (speed of data): dealing with streams of high frequency of incoming real-time data (e.g. Sensors, Electronic Trading, Internet ). • Variety (range of data types/sources): dealing with data using differing syntactic formats (e.g. Spreadsheets, XML, DBMS), schemas/graphs, and meanings. • Value: Without business value, big data is simply a lot of data. With business value, it becomes a rich mine of business intelligence. Spend resources on big data analytics to realize that value.
  • 4. • Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence • It dictates how reliable and significant the data really is. • Low veracity data, usually contains a high percentage of non-valuable, 'noisy' and meaningless data, that will not benefit an organization's analysis.
  • 6. Data acquisition • Data acquisition has been understood as the process of gathering, filtering, and cleaning data before the data is put in a data warehouse or any other storage solution. • Data acquisition is one of the major big data challenges in terms of infrastructure requirements • The infrastructure required to support the acquisition of big data must deliver low, predictable latency(time delay) in both capturing data and in executing queries; be able to handle very high transaction volumes, often in a distributed environment; and support flexible and dynamic data structures.
  • 7. Data acquisition • The acquisition of big data is most commonly governed by four of the Vs(characteristics of big data): volume, velocity, variety, and value. • Most data acquisition scenarios assume high Vs, but low-value data, making it important to have adaptable and time-efficient gathering, filtering, and cleaning algorithms that ensure that only the high-value of the data are actually processed by the data-warehouse analysis.
  • 8. Data acquisition • However, in healthcare, most/all data is of potentially high value as it can be important in improving patient outcomes • For such organizations, data analysis, classification, and packaging on very high data volumes play the most central role after the data acquisition.
  • 9. Data Analysis • This is concerned with making the raw data acquired amenable to use in decision-making as well as domain-specific usage. • Data analysis involves exploring, transforming, and modelling data with the goal of highlighting relevant data, synthesizing/amalgamating and extracting useful hidden information with high potential from a business point of view.
  • 10. Data Curation • This is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage. • Data curators (also known as scientific curators, or data annotators) hold the responsibility of ensuring that data are trustworthy, discoverable, accessible, reusable, and fit their purpose.
  • 11. Data Storage • This is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
  • 12. Data Usage • This covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity. • Data usage in business decision-making can enhance competitiveness through reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.
  • 13. • Big data has already influenced many business and has the potential to impact all business sectors.
  • 15. Data acquisition in the health sector • Within the health sector big data technology aims to establish a holistic approach whereby clinical, financial, and administrative data as well as patient behavioral data, population data, medical device data, and any other related health data are combined and used for retrospective, real-time, and predictive analysis.
  • 16. Data acquisition in the health sector • In order to establish a basis for the successful implementation of big data health applications, the challenge of data digitalization and acquisition (i.e. putting health data in a form suitable as input for analytic solutions) needs to be addressed. • Today, large amounts of health data are stored in data silos and data exchange is only possible via Scan, Fax, or email. • Due to inflexible interfaces and missing standards, the aggregation of health data relies on individualized solutions with high costs.
  • 17. Data acquisition in the health sector • In hospitals, patient data is stored on CIS (clinical information system) or EHR (electronic health record ) systems. • However, different clinical departments might use different systems, such as RIS (radiology information system), LIS (laboratory information system ), or PACS (picture archiving and communication system) to store their data. There is no standard data model or EHR system. Today we can exchange data using HL7
  • 18. Types of data 1. Structured data 2. Unstructured data
  • 19. Structured data • Structured data usually resides in relational databases (RDBMS). • Fields store length-delineated data phone numbers, Social Security numbers, or ZIP codes. • Even text strings of variable length like names are contained in records, making it a simple matter to search. Data may be human- or machine- generated as long as the data is created within an RDBMS structure.
  • 20. Unstructured data • Unstructured data is essentially everything else. • Unstructured data has internal structure but is not structured via pre-defined data models or schema. • It may be textual or non-textual, and human- or machine-generated. • It may also be stored within a non-relational database. 1. Human-generated unstructured data includes: • Text files: Microsoft Word documents, spreadsheets, PowerPoint presentations. • Social media: Data from Facebook, Twitter, LinkedIn.
  • 21. • Website: YouTube, Instagram, photo sharing sites. • Mobile data: Text messages, locations. • Communication: Chat, phone recordings, collaboration software. • Media: MP3, digital photos, audio sharing sites.
  • 22. 2. Machine-generated unstructured data includes: • Satellite imagery: Weather data, land forms, military movements. • Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data. • Sensor data: Traffic, weather, oceanographic sensors.
  • 24. Limitations to data acquisition 1. Privacy and security • These need to be addressed by the systems and technologies used in the data acquisition process. • Many systems already generate and collect large amounts of data, but only a small fraction is used actively in business processes.
  • 25. 2. Confidentiality Confidentiality in health care refers to the obligation of professionals who have access to patient records or communication to hold that information in confidence.
  • 26. Privacy, confidentiality and security of patient data Confidentiality: Everyone in the organization is responsible for patient confidentiality • Board members • Executive leadership • Clinical staff • Physicians and nurses • Administrative and clerical staff • Students and interns • Volunteers
  • 27. What information is confidential? The following patient information must remain confidential: • Identity (e.g. name, address, Social Security number, date of birth) • Physical condition • Emotional condition • Financial information • Confidentiality ensures that individual health information is used for the intended purpose only, and that patient consent is required for any disclosure.
  • 28. Guiding Principles • Access patient information only if there is a ‘Need to Know’ • Discard confidential information appropriately (e.g. locked trash bins or shredders) • Forward requests for medical records to the Health Information Management Department • Do not discuss confidential matters where others might overhear (e.g. cafeteria, elevator, buses, or restaurants) • Do not leave patient charts or files unattended • Report suspicious activities that may compromise patient confidentiality to the Privacy Officer
  • 29. Privacy • Privacy, as distinct from confidentiality, is viewed as the right of the individual client or patient to be let alone and to make decisions about how personal information is shared (Brodnik, 2012). State & Federal Laws that Protect Patient Privacy • Health Insurance Portability & Accountability Act of 1996 (HIPAA) • American Recovery and Reinvestment Act of 2009 (ARRA): HITECH breach notification provisions
  • 30. Privacy • THE DATA PROTECTION AND PRIVACY ACT, 2019 https://ulii.org/system/files/legislation/act/2019/1/THE%20DATA%20PROTECTION%20AND%20PRIVACY%20BILL%20-%20ASSENTED.pdf
  • 31. Privacy What is the purpose of HIPAA? • Improve the efficiency and effectiveness of the health care system • Encourage the development of an electronic health record • Establish national standards for electronic transmission of certain health information • Establish national standards to protect health information • Ensure patient confidentiality • Protect patient privacy • Build loyalty and trust • Provide exceptional customer service
  • 32. What is PHI? • PHI stands for Protected Health Information and includes demographic information that identifies an individual and: – Is created or received by a health care provider, health plan, employer, or health care clearinghouse. – Relates to the past, present, or future physical or mental health or condition of an individual. – Describes the past, present, or future payment for the provision of health care to an individual.
  • 33. Who has to follow HIPAA? Anyone who: • Currently works directly with patients • Currently sees, uses, or shares PHI as a part of their job • Currently accesses any hospital systems, records, tools, or information that may contain PHI. The entire organization/hospital is responsible for protecting the privacy of its patients and upholding all HIPAA Privacy & Security Rules.
  • 35. Where is PHI Found? • Medical records • Patient information systems • Billing information (bills, receipts, EOBs, etc.) • Test results • X-rays • Clinic lists • Labels on IV bags • Patient menus
  • 36. Where is PHI Found? • Conversations • Telephone notes (in certain situations) • Patient information on a mobile device
  • 37. Privacy Permitted Uses and Disclosures of PHI Include: 1. Treatment of the patient • Direct patient care • Coordination of care • Consultations • Referrals to other health care providers 2. Payment of healthcare bills 3. Operations related to healthcare 4. Research when approved by an Institutional Review Board (IRB) 5. Required by law (e.g. subpoena, court order)
  • 38. Patient Rights 1. Right to Access • Any information contained in their medical and billing record 2. Right to Amend • Patients may request, in writing, an amendment to their medical records if they feel the records contain incorrect or incomplete information 3. Right to an Accounting of Disclosures • Patients have the right to receive a list of disclosures, other than those made for treatment, payment, or operations 4. Right to Request Special Communications • Patients may ask the hospital to contact them via an alternative phone number or address
  • 39. Patient Rights (continued) 5. Right to Request Restrictions • Patients may request not to be included in the directory (opt-out). In that case, patient information should not be shared with clergy, friends, or anyone else 6. Right to Receive a Notice of Privacy Practices • The organisation is required to provide a written notice of how it will use and disclose patient health information 7. Right to File a Complaint • Patients have the right to file a complaint without fear of retaliation
  • 40. Security Security refers directly to protection, and specifically to the means used to protect the privacy of health information and support professionals in holding that information in confidence. • When we protect patient data, we help build trust between patients and providers. • Ensure Protected Health Information (PHI) is not disclosed to unauthorized persons. • Do not send email containing Protected Health Information (PHI) unless it is encrypted. • Log off your computer if you have to leave your workstation. • If you suspect someone is using your login ID, you must report it immediately.
  • 41. • It is your responsibility to report incidents to your supervisor or the Privacy Officer if you suspect that a patient's Protected Health Information (PHI) might have been acquired, accessed, used, or disclosed without authorization.
  • 42. • The Privacy, Confidentiality and Security Assessment Tool https://www.unaids.org/sites/default/files/media_asset/confidentiality_security_assessment_tool_en.pdf • UGANDA MEDICAL AND DENTAL PRACTITIONERS COUNCIL (UMDPC) https://www.umdpc.com/Resources/Code%20of%20Professional%20Ethics.pdf
  • 43. • More literature: A Primer on the Privacy, Security, and Confidentiality of Electronic Health Records by Manish Kumar, Samuel Wambugu (MEASURE Evaluation)
  • 44. Data Standards • Data standards encompass methods, protocols, terminologies, and specifications for the collection, exchange, storage, and retrieval of information associated with health care applications, including medical records, medications, radiological images, payment and reimbursement, medical devices and monitoring systems, and administrative processes.
  • 45. Standardizing health care data • Definition of data elements: determination of the data content to be collected and exchanged. Data Element Tag • A DICOM message can be visualized as a stream of data elements, where each element is made up of four data fields: element tag, optional value representation, value length, and the value itself. • The Data Element Tag is a pair of 16-bit unsigned integers (xxxx,xxxx) representing the group number and the element number.
  • 46. Examples of data element tags: • (0008,0020) Study Date • (0008,0030) Study Time • (0008,0060) Modality • (0010,0010) Patient’s Name • (0010,0020) Patient ID • (0028,0010) Number of pixel rows in the image • (0028,0011) Number of pixel columns in the image • (0038,001A) Scheduled admission date • (0038,001B) Scheduled admission time • The tags are identified by hexadecimal numbers, and each number can range from 0000 to FFFF. • They are always sorted in ascending order in a DICOM header, which makes the header easy to search.
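The four-field layout of a data element (tag, value representation, value length, value) can be sketched with Python's standard `struct` module. This is a simplified illustration, not a DICOM parser: it assumes little-endian byte order, an explicit two-character value representation, and a 16-bit length field, and it ignores the file preamble and the value representations that use 32-bit lengths. The helper names and sample values are hypothetical; a real application would rely on a dedicated DICOM library.

```python
import struct

# Layout assumed in this sketch: group (uint16), element (uint16),
# value representation (2 ASCII characters), value length (uint16).
HEADER = "<HH2sH"
HEADER_SIZE = struct.calcsize(HEADER)  # 8 bytes


def format_tag(tag):
    """Render a (group, element) pair in the (xxxx,xxxx) hexadecimal form."""
    group, element = tag
    return f"({group:04X},{element:04X})"


def encode_element(group, element, vr, value):
    """Encode one data element; text values are space-padded to even length."""
    if len(value) % 2:
        value += b" "
    header = struct.pack(HEADER, group, element, vr.encode("ascii"), len(value))
    return header + value


def decode_elements(buffer):
    """Yield (tag, vr, value) for each data element found in the buffer."""
    offset = 0
    while offset < len(buffer):
        group, element, vr, length = struct.unpack_from(HEADER, buffer, offset)
        offset += HEADER_SIZE
        value = buffer[offset:offset + length]
        offset += length
        yield (group, element), vr.decode("ascii"), value


# Made-up values, written in ascending tag order as the slide describes.
stream = (
    encode_element(0x0008, 0x0060, "CS", b"CT")
    + encode_element(0x0010, 0x0020, "LO", b"P0001")
)
for tag, vr, value in decode_elements(stream):
    print(format_tag(tag), vr, value)
```

Because each element carries its own length, a reader can skip from one tag to the next without interpreting the values in between.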
  • 47. • Data interchange formats—standard formats for electronically encoding the data elements • Terminologies—the medical terms and concepts used to describe, classify, and code the data elements and data expression languages and syntax that describe the relationships among the terms/concepts. • Knowledge Representation—standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.
  • 48. Three primary areas in which standards for health care data need to be developed • Data interchange • Terminologies • Knowledge representation
  • 49. Data Interchange Standards • These are needed for message format, document architecture, clinical templates, user interface, and patient data linkage. • Message Format Standards: These facilitate interoperability through the use of common encoding specifications, information models for defining relationships between data elements, document architectures, and clinical templates for structuring data as they are exchanged. • These include the Health Level Seven [HL7] Version 2.x [V2.x] series for clinical data messaging, Digital Imaging and Communications in Medicine [DICOM] for medical images, National Council for Prescription Drug Programs [NCPDP] Script for retail pharmacy messaging,
  • 50. • Health Level 7 (HL7): This is the primary data interchange standard for clinical messaging and is presently adopted in 90 percent of large hospitals. • Logical Observation Identifiers, Names and Codes [LOINC] for reporting of laboratory results • Institute of Electrical and Electronics Engineers [IEEE] standards for medical devices, e.g. IEEE 802.16 (Wireless Networking)
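As a rough illustration of what an HL7 Version 2.x message looks like on the wire, the sketch below splits a pipe-delimited message into segments and fields and reads a LOINC-coded laboratory result from an OBX segment. The sample message is fabricated for illustration and carries no real patient data. The function names are hypothetical, and the sketch ignores escape sequences, repetitions, and sub-components that a production parser would have to handle.

```python
# Fabricated sample message: segments are separated by carriage returns,
# fields by "|", and components within a field by "^".
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LIS|LAB|EHR|HOSPITAL|20200101120000||ORU^R01|MSG0001|P|2.5",
    "PID|1||P0001||DOE^JANE",
    "OBX|1|NM|2345-7^Glucose^LN||95|mg/dL|||||F",
])


def parse_segments(message):
    """Group the fields of each segment under its three-letter segment ID."""
    segments = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def lab_results(message):
    """Return (code, test name, value, units) for each OBX segment."""
    results = []
    for obx in parse_segments(message).get("OBX", []):
        parts = obx[3].split("^")  # OBX-3: observation identifier
        code = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        results.append((code, name, obx[5], obx[6]))  # OBX-5 value, OBX-6 units
    return results


print(lab_results(SAMPLE_MESSAGE))
```

The example also shows how the standards work together: HL7 supplies the message structure, while the code in the OBX segment comes from LOINC (marked by the `LN` coding-system component).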
  • 51. Terminologies • Standardized terminologies facilitate electronic data collection at the point of care; retrieval of relevant data, information, and knowledge (i.e., evidence); and data reuse for multiple purposes, such as automated surveillance, clinical decision support, and quality and cost monitoring. • To promote patient safety and enable quality management, standardized terminologies that represent the focus (e.g., medical diagnosis, nursing diagnosis, patient problem) and interventions of the variety of clinicians involved in health care as well as data about the patient (e.g., age, gender, ethnicity, severity of illness, preferences, functional status) are necessary
  • 52. SNOMED CT • This is the most well-developed concept-oriented terminology to date. A concept-oriented reference terminology can be defined as one that has such characteristics as a grammar that defines the rules for automated generation and classification of new concepts, as well as the combining of atomic concepts to form molecular expressions. • SNOMED CT is based on a formal terminology model that provides unambiguous definitions of health care concepts and contains the most granular concepts for representing clinical and patient safety information.
  • 53. • SNOMED CT requires the support of additional terminologies to capture certain clinical data not currently available in the terminology with sufficient granularity or scope, namely laboratory, medication, and medical device data.
  • 54. LOINC • This is the terminology for representing laboratory test results. • LOINC is the available terminology that most fully represents laboratory data in terms of naming for tests (e.g., chemistry, hematology) and clinical observations (e.g., blood pressure, respiratory rate). The LOINC terms are composed of up to eight dimensions derived from component (e.g., analyte), type of property
  • 55. • LOINC is the terminology for representing laboratory test results and is a part of the NCVHS core terminology group.
  • 56. Publicly Available Medical Image Repositories

Editor's Notes

  • #4: Extensible Markup Language (XML) is used to describe data. The XML standard is a flexible way to create information formats and electronically share structured data via the public Internet, as well as via corporate networks.
  • #6: Big data value chain: The chain provides a framework with which to examine how to bring disparate data together in an organized fashion and create valuable information that can inform decision making at the enterprise level.
  • #10: Synthesizing data: A method that uses statistical techniques to combine results from different studies and obtain a quantitative estimate of the overall effect of a particular intervention or variable on a defined outcome.
  • #17: A data silo is a repository of fixed data that remains under the control of one department and is isolated from the rest of the organization, much like grain in a farm silo is closed off from outside elements. Data silos can have technical or cultural roots.
  • #20: A relational database refers to a database that stores data in a structured format, using rows and columns. This makes it easy to locate and access specific values within the database. It is "relational" because the values within each table are related to each other.
  • #24: Big data = structured data + unstructured data
  • #27: This helps the organisation to achieve its mission of Exceptional Care. Without Exception
  • #30: Many of our patients are also our neighbors, our friends, and our co-workers. Maintaining their privacy is essential.
  • #35: Printed materials containing any of these identifiers should not be discarded in the trash. They should be either shredded or placed in locked recycling containers.
  • #41: Passwords are only effective if they are NEVER shared, and if the guidelines for creating strong passwords are followed.
  • #46: Interchange standards can also include document architectures for structuring data elements as they are exchanged and information models that define the relationships among data elements in a message.
  • #51: Document Architecture: A method for electronically representing clinical data such as discharge summaries or progress notes. Patient safety reports also require a standardized document architecture.
  • #56: NCVHS: National Committee on Vital and Health Statistics