Department of Computer Science
Knowledge Discovery From Massive Healthcare Claims Data
Varun Chandola, Sreenivas Sukumar, Jack Schryver
Presented by Anatoli Shein
(aus4@pitt.edu)
Motivation: US health care
7/12/2023
Anatoli Shein
• 2008: 15.2% of GDP
• 2017 (projected): 19.5% of GDP
Goal: Improve cost-care ratio
• Improve healthcare operations.
• Reduce fraud, waste, and abuse.
Big Data Analytics in HealthCare
Big Data in HealthCare Categorized
Data quality and availability
• Clinical data, behavior data, and pharmaceutical data:
– Useful but unavailable
Data quality and availability
• Health insurance Data
– Available but needs preparation
State-of-the-Art Analytics for Massive Healthcare Data:
• Network analysis
• Text mining
• Temporal analysis
• Higher order feature construction
Health Insurance
• 85% of Americans have it
• Its data is stored to:
– Track payments
– Address fraud
Health Insurance Data Model
• Fee-for-service model
• Provider -> Service -> Patient -> Cost -> Justification -> Payor
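The chain above can be read as the fields of a single claim record. The sketch below is one possible representation, not the schema used in the talk; the class name, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One fee-for-service claim (hypothetical field names)."""
    provider_id: str      # who delivered the service
    procedure_code: str   # which service was delivered
    beneficiary_id: str   # the patient who received it
    billed_amount: float  # the cost charged for the service
    diagnosis_code: str   # the justification for the service
    payor_id: str         # who pays the claim


def total_billed_by_provider(claims):
    """Sum billed amounts per provider."""
    totals = {}
    for claim in claims:
        totals[claim.provider_id] = totals.get(claim.provider_id, 0.0) + claim.billed_amount
    return totals


# Made-up sample records; the codes are placeholders, not real billing codes.
claims = [
    Claim("P1", "PROC-A", "B1", 120.0, "DIAG-X", "PAYOR-1"),
    Claim("P1", "PROC-B", "B2", 180.0, "DIAG-Y", "PAYOR-1"),
    Claim("P2", "PROC-A", "B1", 110.0, "DIAG-X", "PAYOR-1"),
]
totals = total_billed_by_provider(claims)
```

Keeping the record flat, one row per service, mirrors how a fee-for-service claim is billed and makes per-provider or per-patient aggregation a simple grouping step.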
Data Maintained for Operation
• Claims information
• Patient enrollment and eligibility
• Provider enrollment
Challenges and Opportunities
• Fraud
• Waste
• Abuse
Fraud
• Billing for services that were not provided
• Large scale fraud
Waste
• Improper payments
– Double payments
– Duplicate claims
– Outdated fee schedule
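Duplicate claims are the easiest of these to illustrate. The sketch below groups claims that share the same provider, beneficiary, procedure, and service date; that matching key, the dictionary field names, and the sample rows are assumptions, and a real payer would likely apply more nuanced rules.

```python
from collections import defaultdict


def find_duplicate_claims(claims):
    """Return lists of claim IDs that share provider, beneficiary, procedure, and date."""
    groups = defaultdict(list)
    for claim in claims:
        key = (claim["provider"], claim["beneficiary"], claim["procedure"], claim["date"])
        groups[key].append(claim["claim_id"])
    # Only keys seen more than once are candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample rows for illustration.
claims = [
    {"claim_id": "C1", "provider": "P1", "beneficiary": "B1", "procedure": "PROC-A", "date": "2012-03-01"},
    {"claim_id": "C2", "provider": "P1", "beneficiary": "B2", "procedure": "PROC-A", "date": "2012-03-01"},
    {"claim_id": "C3", "provider": "P1", "beneficiary": "B1", "procedure": "PROC-A", "date": "2012-03-01"},
]
duplicates = find_duplicate_claims(claims)
```

Here C1 and C3 are flagged together, while C2 is left alone because it belongs to a different beneficiary.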
Abuse
• Prospective payment system
• Upcoding
Data Used
• Claims data (48 million beneficiaries in the US) from transactional data warehouses
• Provider enrollment data (from private organizations)
• Fraudulent providers (from the Office of Inspector General's exclusion list)
– The rest are treated as non-fraudulent
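The labeling rule on this slide can be sketched in a few lines. The function name and the provider IDs below are hypothetical; the point is only that any provider absent from the exclusion list receives a non-fraudulent label, which makes the negative class noisy because undetected fraud ends up in it.

```python
def label_providers(provider_ids, excluded_ids):
    """Map each provider to True (on the exclusion list) or False (everyone else)."""
    excluded = set(excluded_ids)
    return {provider: provider in excluded for provider in provider_ids}


# Made-up identifiers for illustration.
all_providers = ["P1", "P2", "P3", "P4"]
exclusion_list = ["P3"]
labels = label_providers(all_providers, exclusion_list)
fraud_rate = sum(labels.values()) / len(labels)
```

With so few positives, a classifier trained on labels like these is usually better judged by how it ranks providers than by raw accuracy.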
Claims Data
Analysis
• Identification of typical treatment profiles
• Identification of costly areas
Text Analysis, profile building
• Apache Mahout
• Hadoop-based technology
– MapReduce
Entities as Documents
• Document-term matrices
– P (providers)
– B (beneficiaries)
– C (procedures)
– G (diagnoses)
– D (drugs)
• Ex: PG (providers as documents, diagnoses as terms)
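A small sketch of the PG matrix may help: each provider is treated as a document and each diagnosis code on that provider's claims as a term. The talk used Mahout on Hadoop for this at scale; the in-memory version below, with its function name, field names, and sample data, is an assumption meant only to show the shape of the matrix.

```python
from collections import Counter, defaultdict


def build_document_term_matrix(claims, doc_field, term_field):
    """Count how often each term appears in the claims of each document entity."""
    counts = defaultdict(Counter)
    for claim in claims:
        counts[claim[doc_field]][claim[term_field]] += 1
    docs = sorted(counts)
    terms = sorted({term for counter in counts.values() for term in counter})
    # Rows follow docs, columns follow terms; a missing pair counts as 0.
    matrix = [[counts[doc][term] for term in terms] for doc in docs]
    return docs, terms, matrix


# Made-up claims: provider P1 bills diagnosis G1 twice and G2 once, P2 bills G2 once.
claims = [
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G2"},
    {"provider": "P2", "diagnosis": "G2"},
]
docs, terms, pg_matrix = build_document_term_matrix(claims, "provider", "diagnosis")
```

Swapping the two field names yields the other pairings on the slide, for example beneficiaries against procedures, and the resulting matrix can be passed to any standard text-mining method such as topic modeling.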
Interesting finding
• Some seemingly unrelated diagnosis codes were grouped into the same topics
– Ex: Diabetes and Dermatoses
Social Network Analysis
• Estimate a provider's fraud risk before the provider submits any claims by constructing a social network of providers
Provider Network
Texas Provider Network
Extracting Features from Provider Network
Information complexity measure
• The most distinguishing features proved to be:
– Node degree
– Number of fraudulent providers in 2-hop network
– Eigenvector centrality
– Current-flow closeness centrality
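Three of these features can be sketched on a toy graph using only the standard library. The graph, the node names, and the set of known fraudulent providers are made up; eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity (a common convergence aid), and current-flow closeness centrality is omitted because it needs a linear solver.

```python
from math import sqrt


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def two_hop_fraud_count(adjacency, node, fraudulent):
    """Number of known fraudulent providers within two hops of node."""
    reachable = set(adjacency[node])
    for neighbor in adjacency[node]:
        reachable |= adjacency[neighbor]
    reachable.discard(node)
    return len(reachable & fraudulent)


def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on (A + I); scores are normalized to unit length."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {n: scores[n] + sum(scores[m] for m in adjacency[n]) for n in adjacency}
        norm = sqrt(sum(value * value for value in updated.values()))
        scores = {n: value / norm for n, value in updated.items()}
    return scores


# Toy provider network: B is connected to A, C, and D; C and D are also linked.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
adjacency = build_adjacency(edges)
fraudulent = {"D"}

degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
fraud_near_a = two_hop_fraud_count(adjacency, "A", fraudulent)
centrality = eigenvector_centrality(adjacency)
most_central = max(centrality, key=centrality.get)
```

Each provider then contributes one row of features (degree, two-hop fraud count, centrality) that can feed a downstream classifier.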
Temporal Feature Construction
• Looking at provider data over time reveals anomalies such as:
• A sudden increase in the number of patients
• Taking on patients whose conditions differ from the provider's past profile
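The first kind of anomaly, a jump in patient volume, can be sketched with a rolling baseline. The window size, the threshold, the floor of 1.0 on the spread, and the monthly counts below are all assumptions chosen for illustration, not values from the talk.

```python
from statistics import mean, pstdev


def flag_patient_count_spikes(counts, window=3, threshold=3.0):
    """Return indices where a count far exceeds the mean of the preceding window."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        # Floor the spread at 1.0 so a very stable history does not flag tiny changes.
        spread = max(pstdev(history), 1.0)
        if counts[i] > baseline + threshold * spread:
            flagged.append(i)
    return flagged


# Made-up monthly patient counts for one provider; month index 4 is the spike.
monthly_patients = [40, 42, 41, 43, 120, 44]
spikes = flag_patient_count_spikes(monthly_patients)
```

The second kind of anomaly, a shift in the conditions treated, would need a comparison of diagnosis distributions across time windows and is not covered by this sketch.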
Fraudulent Provider Detection
Conclusions
• Introduced the domain of “big” healthcare claims data
• Analyzed healthcare claims data at a national level using state-of-the-art analytics for massive data
• Transformed the problem into well-known analysis problems from the data mining community
• Presented several approaches for identifying fraud, waste, and abuse
• Thank you.
• Questions?


Editor's Notes

  • #10: Address economic challenges. Strong analytic insight into healthcare.