Innovation and
Reinvention Driving
Transformation
OCTOBER 9, 2018
2018 HPCC Systems® Community Day
Luke Pezet, Archway Health
HPCC Systems vs SAS: The Final Countdown
“Change is the only constant in life”
— Heraclitus
Me, Me and Me...at Archway
• Solution Architect with over 15 years of experience
• Has worked at Archway Health Advisors for ~5 years
• Archway helps care providers manage bundled payment programs.
• Needed to process medical claims 5 years ago and chose HPCC Systems over SAS,
Hadoop*, etc.
• New employees brought other technologies, including SAS
Introduction
HPCC Systems
• Open-source data-intensive computing system platform developed by
LexisNexis Risk Solutions.
• Development started before 2000.
• Scalable Data refinery called Thor and scalable rapid data delivery engine
called ROXIE.
SAS (“Statistical Analysis System”)
• Proprietary software suite developed by SAS Institute that provides advanced
analytics.
• Development started in 1966.
Use Case
• Based on the web book Regression with SAS, Chapter 1 - Simple and Multiple Regression,
from the Institute for Digital Research and Education at UCLA.
• The chapter is about data analysis and demonstrates how to use software for regression
analysis. It does not cover the statistical basis of multiple regression or which
criterion is best for choosing models.
• Data was created by randomly sampling 400 elementary schools from the California
Department of Education's API 2000 dataset.
• Contains a measure of school academic performance as well as other attributes such
as class size, enrollment, poverty, etc.
Helper
SASsy ECL bundle
ecl-bundle install https://github.com/lpezet/SASsy.git
Usage:
IMPORT SASsy;
// OR
IMPORT SASsy.PROC;
Loading data
SAS
DATA scores;
INFILE datalines dsd;
INPUT Name : $9. Score1-Score3 Team ~ $25. Div $;
DATALINES;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
ECL
// DIV is ECL's integer-division operator, so the last field is named Division here.
layout := { STRING Name; UNSIGNED Score1; UNSIGNED Score2; UNSIGNED Score3; STRING Team; STRING Division; };
scores := DATASET( [ { 'Smith', 12, 22, 46, 'Green Hornets, Atlanta', 'AAA' },
{ 'Mitchel', 23, 19, 25, 'High Volts, Portland', 'AAA' },
{ 'Jones', 9, 17, 54, 'Vulcans, Las Vegas', 'AA' } ], layout );
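The inline DATASET mirrors the SAS DATALINES example. Outside a demo, the same records would more likely come from a file sprayed onto the cluster. The following is a minimal sketch of that, assuming a comma-separated file with quoted strings and no header row; the logical file name '~demo::scores_csv' and the names ScoreRec and scoresFromFile are hypothetical and do not come from the talk.

```ecl
// Record layout matching the six columns of the sample data.
ScoreRec := RECORD
  STRING   Name;
  UNSIGNED Score1;
  UNSIGNED Score2;
  UNSIGNED Score3;
  STRING   Team;
  STRING   Division;
END;

// '~demo::scores_csv' is a hypothetical logical file name (assumption);
// substitute the name used when the file was sprayed.
// QUOTE('"') keeps embedded commas such as "Green Hornets, Atlanta" in one field.
scoresFromFile := DATASET('~demo::scores_csv', ScoreRec,
                          CSV(SEPARATOR(','), QUOTE('"')));

// Show the first few rows to confirm the fields were parsed as expected.
OUTPUT(CHOOSEN(scoresFromFile, 5), NAMED('FirstRows'));
```

The QUOTE option plays roughly the role that the dsd option plays on the SAS INFILE statement.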
Looking at the data (SAS)
PROC PRINT data="elemapi" (obs=5);
run;
Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.PRINT( ElemAPIDS, 5 );
// CHOOSEN( ElemAPIDS, 5 );
Looking at the data (SAS)
PROC CONTENTS data="elemapi";
run;
Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.CONTENTS( ElemAPIDS );
Looking at the data (SAS)
PROC MEANS data="elemapi";
var api00 acs_k3 meals full;
run;
Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.MEANS( oMeans, ElemAPIDS,
'api00,acs_k3,meals,full' );
OUTPUT( oMeans, NAMED('MEANS'));
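PROC.MEANS comes from the SASsy bundle. For readers without the bundle installed, the sketch below shows how a comparable summary can be computed with ECL's built-in aggregates. It is not how SASsy is implemented; the three-row dataset is a made-up stand-in for the elemapi data, and the names SchoolRec, schools and apiSummary are hypothetical.

```ecl
// Hypothetical stand-in for the elemapi data; the values are invented for illustration.
SchoolRec := RECORD
  UNSIGNED4 snum;
  REAL8     api00;
  REAL8     meals;
END;

schools := DATASET([{1, 693, 67}, {2, 570, 92}, {3, 546, 97}], SchoolRec);

// A TABLE with aggregates and no grouping expression summarizes
// the whole dataset into a single row.
apiSummary := TABLE(schools, {
  UNSIGNED4 n        := COUNT(GROUP);
  REAL8     mean_api := AVE(GROUP, api00);
  REAL8     min_api  := MIN(GROUP, api00);
  REAL8     max_api  := MAX(GROUP, api00);
});

OUTPUT(apiSummary, NAMED('ApiSummary'));
```

Adding a grouping expression as a third TABLE parameter would give one summary row per group, which is close to what a CLASS statement does in PROC MEANS.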
Looking at the data (ECL)
IMPORT DataPatterns;
DataPatterns.Profile( ElemAPIDS,
features := 'fill_rate,best_ecl_types,cardinality,lengths,min_max,mean,std_dev,quartiles,correlations' );
Looking at the data (SAS)
PROC UNIVARIATE data="elemapi";
var acs_k3;
run;
Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.UNIVARIATE( ElemAPIDS,
'acs_k3' );
[Output sections shown on the slide: Basics, Missing Values, Extreme - Lowest, Extreme - Highest]
Looking at the data (SAS)
PROC FREQ data="elemapi";
tables acs_k3;
run;
Looking at the data (ECL)
IMPORT SASsy.PROC;
PROC.FREQ( ACSK3Freq, ElemAPIDS,
'acs_k3' );
OUTPUT( ACSK3Freq, NAMED('Frequency') );
Looking at the data (SAS)
PROC UNIVARIATE data="elemapi";
var acs_k3;
histogram / cfill=gray;
run;
Looking at the data (ECL)
IMPORT Visualizer;
// Count schools per acs_k3 value; the chart reads the named output 'PlotData'.
PlotData := TABLE( SORT( ElemAPIDS, acs_k3 ), { STRING label := acs_k3; COUNT(GROUP); }, acs_k3 );
OUTPUT( PlotData, NAMED('PlotData') );
Visualizer.MultiD.Column( 'myChart',, 'PlotData' );
MACROs
SAS
%MACRO MISSINGCHECK(VAR, TYPE);
PROC SQL;
CREATE TABLE &VAR._&TYPE. AS
SELECT DISTINCT CLM_TYPE_1, COUNT(SYSKEY) AS
&VAR._MISSING
FROM OUTPUT.&TYPE.
WHERE &VAR. IS MISSING
GROUP BY CLM_TYPE_1
ORDER BY CLM_TYPE_1;
QUIT;
%MEND MISSINGCHECK;
%MISSINGCHECK(MEMBER_ID, &EPI.GENERAL);
%MISSINGCHECK(CLAIM_ID, &EPI.GENERAL);
%MISSINGCHECK(MS_DRG, &EPI.GENERAL);
%MISSINGCHECK(ADM_DGNS, &EPI.GENERAL);
ECL
MissingCheck( pDS, pField, pMissingValue, pByField ) := FUNCTIONMACRO
#UNIQUENAME(tabled)
%tabled% := TABLE( pDS( pField = pMissingValue ), { pByField; COUNT(GROUP); }, pByField );
#UNIQUENAME(sorted)
%sorted% := SORT( %tabled%, pByField );
RETURN %sorted%;
ENDMACRO;
MissingCheck( ElemAPIDS, meals, '', dnum );
MissingCheck( ElemAPIDS, acs_k3, '', dnum );
MissingCheck( ElemAPIDS, api00, '', dnum );
Multiple Regression (SAS)
PROC REG data="c:\sasreg\elemapi";
model api00 = acs_k3 meals full;
run;
Multiple Regression (ECL)
IMPORT ML_Core;
IMPORT LinearRegression;
IMPORT SASsy;
IndVars := 'acs_k3,meals,full';
DepVars := 'api00';
/* … */
ML_Core.ToField( inddata,
inddataNF, __id__ );
ML_Core.ToField( depdata,
depdataNF, __id__ );
MyOLS := LinearRegression.OLS(
inddataNF, depdataNF );
MyModel := MyOLS.GetModel;
SASsy.Utils.reg_report_on_all(
MyOLS, MyModel, inddataNF );
More
ECL Machine Learning Library
• Statistics (e.g. Means, Std Deviation, Modes, Medians, NTiles, etc.)
• Regression
• Clustering (e.g. K-Means)
• Classification (e.g. Logistic Regression, Decision Trees, Perceptron, etc.)
• Unstructured Data (Tokenize, Transform, CoLocation)
• Association (e.g. AprioriN)
• Matrix Manipulation
Today
HPCC Systems is used to process data at scale and on a more frequent basis
• Process Medical Claims using Thor and deliver results using Roxie
• Run ETL/ELT processes to load, clean, prepare data
• Run more advanced processing to generate outputs (Bundle Engine)
• Clusters of 8+ nodes
SAS is used to run research, exploratory data analysis and modeling.
• Uses HPCC outputs as input
• Single instance
• Restricted on CPU/RAM
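The hand-off from HPCC Systems to SAS described above can be as simple as writing a result set to a delimited file. The sketch below is one way that might look, not a description of Archway's actual pipeline; the record layout, the two sample rows and the logical file name '~demo::export::episode_summary_csv' are all hypothetical.

```ecl
// Hypothetical result layout; the real Bundle Engine outputs are not shown in the talk.
ResultRec := RECORD
  STRING20  provider_id;
  UNSIGNED4 episodes;
  REAL8     avg_cost;
END;

// Made-up rows standing in for a Thor job's result.
results := DATASET([{'P001', 12, 18250.50},
                    {'P002', 7, 20410.00}], ResultRec);

// Write a CSV logical file with one header row.
// The file name is an assumption chosen for illustration.
OUTPUT(results, , '~demo::export::episode_summary_csv',
       CSV(HEADING(SINGLE), SEPARATOR(','), QUOTE('"')), OVERWRITE);
```

Once the file has been desprayed from the cluster to a landing zone, SAS could read it with PROC IMPORT or a DATA step using INFILE with the dsd option.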
Tomorrow
HPCC Systems
• Still run ETL/ELT processes to load, clean, prepare data
• Run processes that need to happen more frequently
• Porting more Advanced Data Analysis And Modeling features to ECL
• Make it easier to create clusters to make experimentation effortless
SAS
• 1 server
• R&D for now
• Validate/compare results with HPCC Systems
Thank you!
OUTPUT(' ');
