1
From Mining To Analytics
Making Sense of Medicare Data
2
Tripfilms.com
3
The Achievement Network
4
Archway Health Advisors
Medicare
• Centers for Medicare & Medicaid Services (CMS)
• Medicare has been a national social insurance program since
1966, covering Americans aged 65 and older.
• “Fee For Service” Model
5
Medicare: Fee for Service
6
Each provider sends its own claims to CMS:
Hospital: 3 days, $3,000
SNF: 18 days, $12,000
Hospital (Readmit): 2 days, $5,000
Home Health: 12 visits, $4,000
Total = $24,000
Bundled Payments for Care
• Better care, smarter spending, and healthier people
• The Bundled Payments for Care Improvement (BPCI) initiative
Payment arrangements
Based on financial and performance accountability
• 4 Models:
Model 1: Retrospective Acute Care Hospital Stay Only
Model 2: Retrospective Acute Care Hospital Stay and Post-Acute Care
Model 3: Retrospective Post-Acute Care Only
Model 4: Prospective Acute Care Hospital Stay Only
7
Medicare: BPCI
8
Claims are sent to CMS:
Hospital: 3 days, $3,000
SNF: 18 days, $12,000
Hospital (Readmit) and Home Health: shown without days or amounts
Episode: $20,000
$20,000 - ($3,000 + $12,000) = $5,000
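The arithmetic on the two Medicare diagrams can be reproduced in a few lines. This is only a sketch in Python (not ECL); the dictionary and variable names are made up for illustration, while the dollar amounts come from the slides.

```python
# Amounts (in dollars) taken from the fee-for-service diagram.
fee_for_service_claims = {
    "Hospital": 3_000,
    "SNF": 12_000,
    "Hospital (Readmit)": 5_000,
    "Home Health": 4_000,
}
# Fee for service: every claim is paid, so the total is a plain sum.
fee_for_service_total = sum(fee_for_service_claims.values())

# BPCI diagram: claims are subtracted from the episode amount.
episode_amount = 20_000
bpci_claims = {"Hospital": 3_000, "SNF": 12_000}
remaining = episode_amount - sum(bpci_claims.values())

print(fee_for_service_total)  # 24000
print(remaining)              # 5000
```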
Claims
• Statement of services and costs from a healthcare
provider
Patient information
Diagnosis information
Procedure(s) information
• Types:
Hospital (Inpatient) Claims
Skilled Nursing Facility (SNF) Claims
Home Health Agency (HHA) Claims
…
9
10
Episode of Care
[Figure: claims of one episode of care laid out along a time axis: IP #1, IP #2, IP #3, SNF #1, SNF #2, HHA #1, HHA #2]
Mining Claims
11
[Figure: rows of mixed IP, SNF and HHA claims, with the IP claims pulled out into their own column. Labels: Hospital Stays, Post-Acute Care, Anchor, Episode Initiator]
Technologies considered
• Pig (& Hadoop)
Data Processing Language
Procedural Language
Relational-oriented
• SAS
Business Analytics & BI Software
De-facto standard in Healthcare Industry
Proprietary
12
HPCC Systems
Quick Introduction
Rodrigo Pastrana - Consulting Software Engineer
WHT/082311
What is HPCC Systems
14
• Open Source distributed data-intensive computing platform
• Provides end-to-end Big Data workflow management, scheduler,
integration tools, etc.
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original paper on
MapReduce from Dec. ‘04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout
LexisNexis for more than 10 years
The HPCC Systems platform
15
The Three HPCC Systems components
1. HPCC Systems Data Refinery (Thor)
• Massively parallel data processing engine
• Enables data integration on a scale not previously available
• Programmable using ECL
2. HPCC Systems Data Delivery Engine (Roxie)
• A massively parallel, high throughput query engine
• Low latency, highly concurrent and highly available
• Several advanced strategies for efficient retrieval
• Programmable using ECL
3. Enterprise Control Language (ECL)
• An easy to use, declarative, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient; automatically distributes workload across all nodes; compiles to native machine code
• Automatic parallelization and synchronization
Conclusion: End to End platform
• No need for any third party tools
16
Enterprise Control Language (ECL)
• Declarative programming language: Describe what needs to be done and not how to do it
• Powerful: High level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available.
• Extensible: Modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
• Maintainable: High level programming language, without side effects and with efficient encapsulation; programs are more succinct, reliable and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: One language to express data algorithms across the entire HPCC Systems platform: data integration, analytics and high speed delivery
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more
17
Current Status and Resources
• HPCCSystems.com – Tutorials, Docs, Platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
• Drastic GUI improvements
• Ganglia and Nagios plug-in for system monitoring and alerting
• Security Enhancements – tighter authentication measures, intra-component communication encryption
• Embedded Languages – Cassandra support, memcache and redis
access
• JSON based data support
• Dynamic ESDL – Provides simple middleware/back-end interface
definition
• Java API project – facilitates interaction between Java based apps and
HPCC web services and C++ tools
• Available now – HPCCSystems.com
Data mining with HPCC Systems
• Thor
Responsible for processing vast amounts of data
Optimized for Extraction, Transformation, Loading,
Sorting and Linking Data
• ECL
Declarative
More Data Centric
Fast & Implicitly Parallel
Inline data
Unit Tests in ECL
19
20
SQL vs ECL
SQL:
SELECT
  diag_group_cd,
  COUNT(*) AS volume,
  SUM(pmt_amt) AS costs
FROM
  inpatient_claims
GROUP BY
  diag_group_cd;
ECL:
TABLE(
  inpatient_claims,
  {
    diag_group_cd;
    INTEGER volume := COUNT(GROUP);
    REAL costs := SUM(GROUP, pmt_amt);
  },
  diag_group_cd
);
SQL:
SELECT
  *
FROM
  inpatient_claims AS l  -- plays the role of LEFT in the ECL version
  JOIN ip_value_codes AS r  -- plays the role of RIGHT in the ECL version
  ON l.id = r.id;
ECL:
JOIN(
  inpatient_claims,
  ip_value_codes,
  LEFT.id = RIGHT.id
);
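To make the semantics of the first SQL/ECL pair concrete, here is a rough single-machine equivalent in Python. It is a sketch only: the sample rows and the function name `summarize` are invented for illustration, and it says nothing about how Thor distributes the work.

```python
from collections import defaultdict

# Hypothetical sample rows; only the two columns used on the slide.
inpatient_claims = [
    {"diag_group_cd": "470", "pmt_amt": 3000.0},
    {"diag_group_cd": "470", "pmt_amt": 5000.0},
    {"diag_group_cd": "291", "pmt_amt": 12000.0},
]

def summarize(claims):
    """Return one entry per diag_group_cd with its claim count and total payment."""
    groups = defaultdict(lambda: {"volume": 0, "costs": 0.0})
    for claim in claims:
        group = groups[claim["diag_group_cd"]]
        group["volume"] += 1                 # COUNT(GROUP)
        group["costs"] += claim["pmt_amt"]   # SUM(GROUP, pmt_amt)
    return {code: dict(row) for code, row in groups.items()}

summary = summarize(inpatient_claims)
print(summary["470"])  # {'volume': 2, 'costs': 8000.0}
```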
21
SQL vs ECL
SQL:
DECLARE my_cursor CURSOR FOR
  SELECT * FROM inpatient_claims;
OPEN my_cursor;
FETCH NEXT FROM my_cursor
  INTO @…, @…;
WHILE @@FETCH_STATUS = 0
BEGIN
  …
END
CLOSE my_cursor;
DEALLOCATE my_cursor;
ECL:
ITERATE(
  inpatient_claims,
  TRANSFORM(inpatient_claim_layout,
    SELF.is_dropped :=
      is_one_year_or_greater(
        RIGHT.admsn_dt,
        RIGHT.dschrgdt);
    SELF := RIGHT;
  )
);
22
ECL ROLLUP
[Figure: records R1 to R6. Tx combines LEFT = R1 and RIGHT = R2 into RA, then LEFT = RA and RIGHT = R3 into RB, leaving RB, R4, R5, R6.]
ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
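The behaviour of ROLLUP on an already sorted dataset can be approximated on one machine as follows. This Python sketch is an illustration of the idea, not HPCC's implementation; `rollup` and the sample rows are hypothetical, and distribution across nodes is ignored.

```python
def rollup(records, condition, transform):
    """Merge each run of adjacent records for which condition(left, right) holds."""
    result = []
    for right in records:
        if result and condition(result[-1], right):
            # LEFT is the running merged record, RIGHT is the next input record.
            result[-1] = transform(result[-1], right)
        else:
            result.append(right)
    return result

# Hypothetical data: merge adjacent rows that share a key and add up their amounts.
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
merged = rollup(
    rows,
    condition=lambda left, right: left[0] == right[0],
    transform=lambda left, right: (left[0], left[1] + right[1]),
)
print(merged)  # [('a', 6), ('b', 4), ('c', 11)]
```

Because only adjacent records are compared, the input has to be sorted on the fields the condition tests before it is rolled up.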
Processing Claims
1. The intent here is to make the series of interim claims look like a single claim for
most purposes, where the admission date of the first claim becomes the
admission date of the whole claim and the discharge date of the last claim in the
series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date
from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG)
determines whether the hospital stay becomes an anchor record, or whether the
stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay
level.
5. Where the last claim in the series has patient (…) [“still a patient”, not
discharged], flag the series and drop all of its claims from the IP hospital stay
file.
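The five rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the deck's ECL: the `Claim` fields reuse names that appear in the ECL on the next slide, but the `still_patient` flag, the `ms_drg` field and the test used in `is_interim` (same beneficiary, provider and admission date) are hypothetical stand-ins for details the slides do not show.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Claim:
    bene_sk: int
    provider: str
    admsn_dt: int              # dates as YYYYMMDD integers
    dschrgdt: int
    ms_drg: str
    pmt_amt: float
    still_patient: bool = False  # hypothetical flag for "still a patient"

def is_interim(left, right):
    # Assumption: same beneficiary, provider and admission date means same stay.
    return (left.bene_sk, left.provider, left.admsn_dt) == (
        right.bene_sk, right.provider, right.admsn_dt)

def merge_interim_claims(left, right):
    # Rules 1 to 4: keep the first admission date, take the discharge date,
    # MS-DRG and patient status from the last claim, and add up the costs.
    return replace(left, dschrgdt=right.dschrgdt, ms_drg=right.ms_drg,
                   pmt_amt=left.pmt_amt + right.pmt_amt,
                   still_patient=right.still_patient)

def build_stays(claims):
    ordered = sorted(claims, key=lambda c: (c.bene_sk, c.provider,
                                            c.admsn_dt, c.dschrgdt))
    stays = []
    for claim in ordered:
        if stays and is_interim(stays[-1], claim):
            stays[-1] = merge_interim_claims(stays[-1], claim)
        else:
            stays.append(claim)
    # Rule 5: series whose last claim is "still a patient" are dropped.
    kept = [s for s in stays if not s.still_patient]
    dropped = [s for s in stays if s.still_patient]
    return kept, dropped

claims = [
    Claim(1, "010001", 20150101, 20150110, "291", 3000.0, still_patient=True),
    Claim(1, "010001", 20150101, 20150120, "470", 2000.0),
    Claim(2, "010001", 20150201, 20150205, "291", 1000.0, still_patient=True),
]
kept, dropped = build_stays(claims)
print(kept[0].dschrgdt, kept[0].ms_drg, kept[0].pmt_amt)  # 20150120 470 5000.0
```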
23
Processing Claims With ECL
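// Sort so that interim claims of the same stay are adjacent.
H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
// Merge each run of interim claims into a single claim.
H_2 := ROLLUP(H_1,
              is_interim(LEFT, RIGHT),
              merge_interim_claims(LEFT, RIGHT));
// Keep the claims from H_1 that have no match in H_2.
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
// Flag those claims as dropped and record the reason.
H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                              SELF.is_dropped := TRUE;
                              SELF.dropped_reason_code :=
                                BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                              SELF := LEFT;
                              ));
// Final result: merged stays plus the flagged claims.
H := H_2 + H_4;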
24
25
Template Language
EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId + '/';
  #FOR(folder)
    #UNIQUENAME(subId)
    %subId% := %''%;
    #UNIQUENAME(subDS)
    %subDS% := Client.Datasets(%subId%);
    [...]
    #UNIQUENAME(id)
    %id% := pId + '::' + %''%;
    #UNIQUENAME(dataDir)
    %dataDir% := %baseDataDirectory% + %''% + '/';
    #UNIQUENAME(etl)
    %etl% := Client.ETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
26
Template Language
file_set := '<folders>' +
  '<folder>M201409</folder>' +
  '<folder>M201410</folder>' +
  '<folder>M201411</folder>' +
  '<folder>M201412</folder>' +
  '<folder>M201501</folder>' +
  '<folder>M201502</folder>' +
  '<folder>M201503</folder>' +
  '</folders>';
load_all_client_files(1234, file_set, '/volume1/data/');
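What the macro appears to generate for each `<folder>` element can be sketched in Python. This is a guess at the intent rather than a translation: it assumes each `%''%` in the macro expands to the current folder name, which the transcript does not confirm, and `plan_client_loads` is a hypothetical helper that only builds the directory and identifier pairs instead of running any ETL.

```python
import xml.etree.ElementTree as ET

def plan_client_loads(client_id, file_set_xml, base_data_directory):
    """Return one (data_dir, job_id) pair per <folder> element."""
    client_dir = f"{base_data_directory}{client_id}/"
    plans = []
    for folder in ET.fromstring(file_set_xml).iter("folder"):
        name = folder.text
        # Mirrors %dataDir% and %id% in the macro, under the assumption above.
        plans.append((f"{client_dir}{name}/", f"{client_id}::{name}"))
    return plans

file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
plans = plan_client_loads(1234, file_set, "/volume1/data/")
for data_dir, job_id in plans:
    print(data_dir, job_id)
```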
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File level security when using LDAP
28
Beyond Processing: Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests
interim_claims := MODULE
// Test Data
test_set :=
BPCI.Test.Samples.ip_claim(
bene_id := 1, claim_id := 1, pmt_amt := 3042.0, ...)
+ BPCI.Test.Samples.ip_claim(
bene_id := 1, claim_id := 2, pmt_amt := 11409.0, ...)
+ ....
;
...
EXPORT Actual := Step2.ip_stays;
SHARED TestSuite := MODULE
EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1,2], 'Did not filter ou
EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3,5]);
END;
EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
35
simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
// Using inline dataset
simple_ip_claims := DATASET([
  {1,1,'0','010001',20000201,'',20000120,20000201,'61'}
], simplified_ip_layout);
ip_claims :=
  Samples.ip_claims(simple_ip_claims);
// OR passing NAMED parameters
ip_claims := Samples.ip_claims2(
  bene_id := 1,
  claim_id := 1,
  claim_type := '00'
);
36
Beyond Processing: Visualization
37
Custom Visualization
38
(No) Insights
[Chart: costs in $1,000 (axis 0 to 80) per episode, for episodes 1 to 49]
39
Insights
[Chart: costs in $1,000 (axis 0 to 80) per episode, for episodes 1 to 49. Series: No readmit, 1 Readmit, 2 Readmits]
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed and in-memory
• Data Warehouse Capabilities
• Data Services
40
Data Services
• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service: Query
• Define service using ECL
42
INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
...
oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }],
                   Layouts.service_parameters_layout);
T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    {
      STRING bpid := pData.bpid;
      UNSIGNED INTEGER1 model := pData.model;
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
      INTEGER8 total_episodes := COUNT(GROUP);
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
    }, bpid, model, post_dsch_prd_length);
END;
ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
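The grouped statistics that the report function T computes can be reproduced on a small sample in Python. This is a sketch for checking the numbers by hand, not the service itself: the sample rows are invented, and it assumes that ECL's VARIANCE is a population variance, hence `statistics.pvariance`.

```python
import math
import statistics
from collections import defaultdict

def summarize_episode_costs(rows):
    """Group rows by (bpid, model, post_dsch_prd_length) and compute cost statistics."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])
    report = []
    for (bpid, model, length), costs in sorted(groups.items()):
        report.append({
            "bpid": bpid, "model": model, "post_dsch_prd_length": length,
            "total_episodes": len(costs),
            "total_costs": sum(costs),
            "average_costs": statistics.mean(costs),
            # Assumption: population variance, to mirror SQRT(VARIANCE(GROUP, ...)).
            "std_dev_costs": math.sqrt(statistics.pvariance(costs)),
        })
    return report

# Hypothetical episodes for one bpid/model/period combination.
rows = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 14000.0},
]
report = summarize_episode_costs(rows)
print(report[0]["total_episodes"], report[0]["average_costs"], report[0]["std_dev_costs"])
```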
43
Data (Web) Services
Request:
{
  "summary": {
    "offset": 1,
    "results": 10,
    "begin_date": 20130101,
    "end_date": 20130201,
    …
  }
}
Response:
{
  "summaryResponse": {
    …
    "Results": {
      …
      "Summary": {
        "Row": [
          {
            "bpid": "9999",
            "model": 2,
            "post_dsch_prd_length": 90,
            "total_episodes": 987,
            "total_costs": 9876543.21,
            "average_costs": 12358.13
            …
          }
        ]
        …
https://.../WsEcl/forms/json/query/roxie/summary
44
Data Services: WsECL
45
Data Services: WsECL
Loading up data
• Logical vs Physical
~abc::subfolder::subsubfolder::myfile
/abc/subfolder/subsubfolder/myfile
• ECL to load data into cluster:
46
oDS := DATASET(
  Std.File.ExternalLogicalFileName('172.0.0.1','/var/lib/.../myfile.csv'),
  Layouts.ip_claim_layout,
  CSV(HEADING(0)) );
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed,, '~somewhere::over::here::myfile', OVERWRITE);
• ECL to use data loaded into cluster:
oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);
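The logical versus physical naming shown at the top of this slide is a simple textual correspondence, which the following Python sketch makes explicit. The two helper functions are hypothetical and only mirror the slide's example; on a real cluster the physical location likely also involves a data directory and per-node file parts, which are not modelled here.

```python
def logical_to_path(logical_name):
    """Turn '~abc::subfolder::myfile' into '/abc/subfolder/myfile'."""
    if not logical_name.startswith("~"):
        raise ValueError("expected a logical name starting with '~'")
    return "/" + "/".join(logical_name[1:].split("::"))

def path_to_logical(path):
    """Inverse mapping, from '/abc/subfolder/myfile' to '~abc::subfolder::myfile'."""
    return "~" + "::".join(part for part in path.split("/") if part)

print(logical_to_path("~abc::subfolder::subsubfolder::myfile"))
# /abc/subfolder/subsubfolder/myfile
```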
SuperFiles
• Super File = Symbolic link, list of sub-files
• Each sub-file must have the same layout
47
WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address; COUNT(GROUP); }, ip_address );
END;
• Including (more) data:
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::2015::04::01'),
  Std.File.FinishSuperFileTransaction()
);
48
Data Services: Reusability
EXPORT run_it( pServiceName, pParams, pReportFunction, pSortByField) := MACRO
// Filtering data based on parameters
#UNIQUENAME(DS);
%DS% := WS.Datasets.dsFactEpisodeCostsIndex;
[…]
#UNIQUENAME(B)
%B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
#UNIQUENAME(C)
%C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[
[…]
#UNIQUENAME(report)
%report% := pReportFunction(%K%);
#UNIQUENAME(sorted)
%sorted% := SORT(%report%, pSortByField);
#UNIQUENAME(O1)
%O1% := OUTPUT(pParams, NAMED('Request'));
oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
#UNIQUENAME(O2)
%O2% := OUTPUT(oSummary, NAMED('Metadata'));
#UNIQUENAME(O3)
%O3% := OUTPUT(
CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
NAMED(pServiceName), ALL);
PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
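The last steps of the macro (sort the report, count it, then return one page with CHOOSEN) can be mimicked in Python. This is a sketch of the paging logic only: `choosen` and `run_report` are hypothetical helpers, and the 1-based start position is an assumption about how CHOOSEN's third argument behaves.

```python
def choosen(records, n, startpos=1):
    """Return up to n records starting at the 1-based position startpos."""
    if startpos < 1:
        raise ValueError("startpos is 1-based")
    begin = startpos - 1
    return records[begin:begin + n]

def run_report(rows, results, offset, sort_key):
    """Sort the report, then return the total count and the requested page."""
    ordered = sorted(rows, key=sort_key)
    return {
        "Metadata": {"count": len(ordered)},
        "page": choosen(ordered, results, offset),
    }

# Hypothetical report rows, requested as 2 results starting at position 2.
rows = [{"bpid": b} for b in ["3", "1", "2", "5", "4"]]
out = run_report(rows, results=2, offset=2, sort_key=lambda r: r["bpid"])
print(out["page"])  # [{'bpid': '2'}, {'bpid': '3'}]
```

Returning the total count next to the page lets a client compute how many pages exist without a second request.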
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal:
http://hpccsystems.com
Thank you
Questions? Feedback?
