Big Data Cloud Meetup, September 8th, 2011
HPCC Platform: Big Data Analytics and Delivery
LexisNexis’ massively parallel-processing, open-source computing platform
http://hpccsystems.com
Who’s been using the HPCC Platform and why?
Who: very large businesses, federal agencies, and national research labs.
Why: It’s 4 to 10 times faster. Products and solutions are built much faster. Very complex problems can be modeled and solved. It’s proven.
What’s changed? We just open-sourced! The HPCC Platform is now available to you.
Big Data… It’s our business.
Big Data; open-source components.
Industry solutions: Insurance, Financial Services, Cyber Security, Government, Health Care, Retail, Telecommunications, Transportation & Logistics, Online Reservations, Weblog Analysis.
Customer Data Integration, Data Fusion, Fraud Detection and Prevention, Know Your Customer, Master Data Management, Weblog Analysis.
The Platform’s Major Parts
Thor: data ingestion, hygiene, refining, transformation, linking, and fusion.
Roxie: the data delivery engine. Supports complex queries and distributed indexes. Low latency: latencies grow logarithmically.
ECL: one language. A highly expressive and efficient declarative language that solves complex problems and encourages code reuse.
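The slide doesn't describe Roxie's index internals. As a generic illustration only, assuming a sorted key index searched by binary search, the number of probes per lookup grows with log2 of the index size, so a 1,000-fold larger index costs only about 10 extra probes. This Python sketch (not ECL, and not Roxie's actual implementation) counts the probes directly.

```python
import math


def search_with_probes(keys, target):
    """Binary search over a sorted list.

    Returns (position, probes): position is -1 if `target` is absent, and
    probes counts how many keys were compared along the way.
    """
    lo, hi = 0, len(keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def worst_case_probes(n):
    """Upper bound on probes for n sorted keys: floor(log2 n) + 1."""
    return math.floor(math.log2(n)) + 1 if n > 0 else 0


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        keys = list(range(n))
        _, probes = search_with_probes(keys, n - 1)
        print(n, probes, worst_case_probes(n))
```

Going from a thousand to a million keys raises the worst case from 10 to 20 probes, which is the kind of logarithmic growth the slide refers to.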
How we’re different
It’s not a group of disparate technologies or competing visions bolted together. It’s one platform with a clear, proven vision. This by itself is powerful.
How we’re different
You can transcend MapReduce.
Build transformative data graphs and applications using ECL.
Solve very complex Big Data problems.
Don’t struggle to fit your Big Data problem into groups of MapReduce jobs.
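To show what "one data graph instead of a chain of MapReduce jobs" means, here is a minimal Python sketch (not ECL) of a declarative dataflow: each step names its inputs, and a generic runner orders and executes the whole graph. The node names, steps, and data are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_graph(nodes):
    """Evaluate a dataflow graph.

    `nodes` maps a node name to (function, [input node names]). Each node runs
    once, after all of its inputs, and receives their results as arguments.
    """
    dependencies = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        func, inputs = nodes[name]
        results[name] = func(*(results[i] for i in inputs))
    return results


def clean_names(rows):
    """Hygiene step: trim whitespace and normalize capitalization."""
    return [(pid, name.strip().title()) for pid, name in rows]


def link_addresses(people, addresses):
    """Linking step: attach each person's address by person id."""
    address_by_id = dict(addresses)
    return [(name, address_by_id[pid]) for pid, name in people]


def count_by_address(rows):
    """Aggregation step: how many linked people share each address."""
    counts = {}
    for _, address in rows:
        counts[address] = counts.get(address, 0) + 1
    return counts


# Hypothetical sample data for illustration only.
PEOPLE = [("p1", " alice "), ("p2", "BOB"), ("p3", "carol")]
ADDRESSES = [("p1", "City Walk"), ("p2", "City Walk"), ("p3", "Elm St")]

GRAPH = {
    "people": (lambda: PEOPLE, []),
    "addresses": (lambda: ADDRESSES, []),
    "clean_people": (clean_names, ["people"]),
    "linked": (link_addresses, ["clean_people", "addresses"]),
    "by_address": (count_by_address, ["linked"]),
}

if __name__ == "__main__":
    print(run_graph(GRAPH)["by_address"])
```

The design point is that steps declare their dependencies rather than being hand-partitioned into map and reduce phases, so the runner decides the execution order.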
How we’re different
No need to munge the data before ingestion.
No complex block file system.
No need to tune the number of tasks for different jobs.
A data delivery engine is included.
Use a single language for data cleansing, transformation, linking, fusion, and delivery.
ECL promotes language extension and code reuse.
Data graphs are built and optimized by the system.
The system-generated C++ is highly optimized.
Code execution is optimized.
Low and predictable latencies.
Modeling data problems as data problems leads to richer solutions.
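The slide says the system builds and optimizes data graphs, but not which rewrites it applies. As a generic illustration of one classic rewrite, filter fusion, this Python sketch merges consecutive filter steps so the data is scanned fewer times. The plan format and claims data are hypothetical, not HPCC's internal representation.

```python
def optimize(plan):
    """Fuse each run of consecutive filter steps into a single filter step."""
    optimized = []
    for op, fn in plan:
        if op == "filter" and optimized and optimized[-1][0] == "filter":
            previous = optimized[-1][1]
            # Bind both predicates now via default arguments.
            optimized[-1] = ("filter", lambda row, a=previous, b=fn: a(row) and b(row))
        else:
            optimized.append((op, fn))
    return optimized


def execute(plan, rows):
    """Run each step as one full pass over the data; return (rows, passes)."""
    passes = 0
    for op, fn in plan:
        passes += 1
        if op == "filter":
            rows = [row for row in rows if fn(row)]
        elif op == "map":
            rows = [fn(row) for row in rows]
        else:
            raise ValueError(f"unknown step: {op}")
    return rows, passes


# Hypothetical claims data and plan.
CLAIMS = [
    {"id": 1, "state": "NY", "amount": 250},
    {"id": 2, "state": "NJ", "amount": 300},
    {"id": 3, "state": "NY", "amount": 50},
]
PLAN = [
    ("filter", lambda c: c["amount"] > 100),
    ("filter", lambda c: c["state"] == "NY"),
    ("map", lambda c: c["id"]),
]

if __name__ == "__main__":
    print(execute(PLAN, CLAIMS))
    print(execute(optimize(PLAN), CLAIMS))
```

Both plans return the same rows; the optimized one needs one fewer pass over the data.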
Challenges Facing Health Care Enterprises
Challenges facing the health insurance industry:
Disparate data spread across separate physical locations.
Scale of data: Big Data is getting bigger.
Adding relationships exponentially expands the size of the Big Data analytics challenge.
LexisNexis has leveraged parallel-processing computing platforms and large-scale graph analytics for over a decade.
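A worked count makes the growth concrete. With n entities, the number of possible pairwise relationships is n(n-1)/2, which grows quadratically. The number of entities reachable through chains of relationships grows as a power of the hop count, and that is where exponential growth appears. The uniform degree of 50 used below is an arbitrary assumption for illustration.

```python
def possible_pairs(n):
    """Distinct unordered pairs among n entities: n(n-1)/2, i.e. quadratic growth."""
    return n * (n - 1) // 2


def max_reach(degree, hops):
    """Upper bound on entities reached in exactly `hops` steps if every entity
    has `degree` relationships; ignores overlap, so real graphs reach fewer."""
    return degree ** hops


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(f"{n:>9,} entities -> {possible_pairs(n):,} possible pairs")
    for hops in (1, 2, 3, 4):
        print(f"{hops} hop(s) at degree 50 -> up to {max_reach(50, hops):,} entities")
```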
Potential Fraud: a Proof of Concept (POC) for the State of New York
We applied social network analytics to information provided by the State of New York and public data supplied by LexisNexis. The goal was to identify relationships among a group of New York Medicaid recipients living in high-end condominiums within the same complex, and any links those individuals might have to medical facilities or others providing care to New York Medicaid recipients.
What’s entailed (high level)
Mix first-party data with public and third-party data sources.
Addition of external data:
Adds fidelity to existing entities.
Adds new linkages into the analysis.
Adds new entities into the analysis.
Exposes ring leaders and brokers that don’t directly participate.
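To illustrate the last point, here is a minimal Python sketch under simple assumptions. Participants are the entities present in the first-party data, and a "broker" is any outside entity that external data links to two or more participants. Both the rule and the data are hypothetical; the slide does not say how brokers were identified in the POC.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (a, b) relationship pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_brokers(graph, participants, min_links=2):
    """Nodes outside `participants` that connect to at least `min_links` of them."""
    return sorted(
        node
        for node, neighbors in graph.items()
        if node not in participants and len(neighbors & participants) >= min_links
    )


# Hypothetical first-party data: recipients and the clinics they use.
FIRST_PARTY = [("recipient_1", "clinic_A"), ("recipient_2", "clinic_A"), ("recipient_3", "clinic_B")]
PARTICIPANTS = {"recipient_1", "recipient_2", "recipient_3", "clinic_A", "clinic_B"}

# Hypothetical external (public / third-party) links, e.g. shared addresses or ownership.
EXTERNAL = [("person_X", "recipient_1"), ("person_X", "recipient_3"),
            ("person_X", "clinic_B"), ("person_Y", "recipient_2")]

if __name__ == "__main__":
    print(find_brokers(build_graph(FIRST_PARTY), PARTICIPANTS))
    print(find_brokers(build_graph(FIRST_PARTY + EXTERNAL), PARTICIPANTS))
```

With first-party data alone, no broker is visible. After adding the external links, `person_X` appears as a new entity that ties together otherwise unconnected recipients.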
Graph / Network: How we did it
3 billion derived public-data relationships between people, merged with risk indicators.
Graph analytics examine up to 20 billion data points to create variables that allow for predictive analysis incorporating relationship context and associated risk.
Targets fraud across all sectors, including health care, financial services, and government.
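The slide doesn't list the variables it derives. As a hypothetical example of "relationship context and associated risk", this Python sketch computes three values for each person: the number of relationships, how many related people carry a risk indicator, and that share.

```python
def relationship_features(edges, risk_flags):
    """Per-person graph variables from (a, b) relationships and risk flags.

    Returns {person: {"degree", "risky_neighbors", "risky_share"}}; people with no
    entry in `risk_flags` are treated as not flagged.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    features = {}
    for person, related in neighbors.items():
        risky = sum(1 for other in related if risk_flags.get(other, False))
        features[person] = {
            "degree": len(related),
            "risky_neighbors": risky,
            "risky_share": risky / len(related),
        }
    return features


# Hypothetical relationships and risk indicators.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
RISK = {"bob": True, "dan": True}

if __name__ == "__main__":
    for person, values in sorted(relationship_features(EDGES, RISK).items()):
        print(person, values)
```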
Cluster Visualization: Introduction
How many of them live in expensive residences, own expensive property, or drive expensive cars?
How many recipients are contacts of medical businesses?
How many medical businesses are associated with any of the people in the cluster?
How many are currently receiving benefits?
Legend: Medicaid Recipient; Expensive Residence; Owns Expensive Property; Owns Expensive Vehicles; Business Contact of Medical Business; Entity.
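Once each cluster member carries the indicators shown in the legend, the four questions reduce to simple counts. Here is a minimal Python sketch. The field names and sample members are hypothetical, and treating any listed medical business as "business contact of medical business" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One person in a cluster, with hypothetical indicator fields."""
    name: str
    medicaid_recipient: bool = False
    expensive_residence: bool = False
    owns_expensive_property: bool = False
    owns_expensive_vehicle: bool = False
    receiving_benefits: bool = False
    medical_businesses: tuple = ()  # medical businesses this person is a contact of


def cluster_summary(members):
    """Answer the four cluster questions with simple counts."""
    return {
        "affluence_indicators": sum(
            1 for m in members
            if m.expensive_residence or m.owns_expensive_property or m.owns_expensive_vehicle
        ),
        "recipients_linked_to_medical_business": sum(
            1 for m in members if m.medicaid_recipient and m.medical_businesses
        ),
        "distinct_medical_businesses": len(
            set().union(*(m.medical_businesses for m in members))
        ),
        "receiving_benefits": sum(1 for m in members if m.receiving_benefits),
    }


# Hypothetical cluster members.
CLUSTER = [
    Member("A", medicaid_recipient=True, expensive_residence=True,
           receiving_benefits=True, medical_businesses=("Clinic 1",)),
    Member("B", medicaid_recipient=True, owns_expensive_vehicle=True),
    Member("C", owns_expensive_property=True, medical_businesses=("Clinic 1", "Pharmacy 2")),
    Member("D", medicaid_recipient=True, receiving_benefits=True),
]

if __name__ == "__main__":
    print(cluster_summary(CLUSTER))
```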
Cluster Visualization
[Figure: cluster visualization]
City Walk Sample: Vehicle Statistics
What is the list of preferred expensive vehicles?

Make            # Owned
Mercedes-Benz   46
Lexus           41
BMW             27
Infiniti        13
Acura           9
Lincoln         8
Audi            7
Land Rover      7
Porsche         6
Jaguar          5
Mercedes Benz   3
Saab            3
Chevrolet       2
Hummer          2
Jeep            2
Nissan          2
Toyota          2
Aston Martin    1
Bentley         1
Cadillac        1
GMC             1
Honda           1
Volkswagen      1
Volvo           1
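The vehicle statistics list both "Mercedes-Benz" (46) and "Mercedes Benz" (3). If those are spelling variants of the same make (an assumption; the slide doesn't say), a hygiene step that normalizes names before counting would merge them into 49. Here is a minimal Python sketch with a hypothetical alias table.

```python
from collections import Counter

# Hypothetical alias table: normalized key -> canonical make name.
MAKE_ALIASES = {"mercedes benz": "Mercedes-Benz"}


def normalize_make(make):
    """Map spelling variants (hyphens, spacing, case) to one canonical name."""
    key = " ".join(make.replace("-", " ").split()).lower()
    return MAKE_ALIASES.get(key, make.strip())


def merge_counts(rows):
    """Sum (make, count) rows after normalizing the make names."""
    totals = Counter()
    for make, owned in rows:
        totals[normalize_make(make)] += owned
    return totals


# A few rows from the City Walk vehicle statistics.
ROWS = [("Mercedes-Benz", 46), ("Lexus", 41), ("BMW", 27), ("Mercedes Benz", 3)]

if __name__ == "__main__":
    print(merge_counts(ROWS).most_common(3))
```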
Dominant Buyers and Sellers at City Walk
Property deed reference counts

Name                 Deeds Held
Hudson Eight         78
Hudson Five          74
Hudson First         73
Hudson Nine          65
Harry Anderson       45
Hudson Ten           41
Hudson Seven         39
Home Nationwide      33
Hudson Three         33
Brian Smith          28
Alan Stevens         25
Chris Doe            24
Sophie Davis         23
Washington Mutual    23
Fleet Mortgage Co.   21
Mike Greem           21
Scott Hill           21
Betty Donaway        21
Al Clark             19
Dave Miller          17
Mark Walker          16
Mike Smith           16
Val Edwards          15
Eric Garcia          14
Dane Young           14
Bill Moore           14
Karen Carter         14
Casey Baker          14
Art Nelson           14
Cathy Parker         13
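The slide doesn't define "deed reference count". This sketch assumes one plausible reading: the number of deeds that name a party as buyer or seller. The Python below computes that count and a deterministic ranking (ties broken by name). The records are hypothetical, not the City Walk data.

```python
from collections import Counter


def deed_reference_counts(deeds):
    """Count how many deeds reference each party, as buyer or seller.

    A party named as both buyer and seller on one deed is counted once.
    """
    counts = Counter()
    for deed in deeds:
        for party in {deed["buyer"], deed["seller"]}:
            counts[party] += 1
    return counts


def top_parties(counts, n):
    """Highest counts first; ties broken alphabetically for a stable ranking."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical deed records.
DEEDS = [
    {"buyer": "Entity A", "seller": "Developer"},
    {"buyer": "Entity A", "seller": "Person B"},
    {"buyer": "Person B", "seller": "Entity A"},
    {"buyer": "Person C", "seller": "Developer"},
]

if __name__ == "__main__":
    print(top_parties(deed_reference_counts(DEEDS), 3))
```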
The engineering story
One guy (Joe Prichard). Three weeks. Less than part time.
The platform lets him focus on the data.
Joe’s a lot of fun to work with.
Do you build other POCs? Yes.
What next? Try us out!
Virtual Machine, Binaries, EC2, Data, Script, Ensemble Recipe (Juan from Canonical)
Contact Information
Charles Kaminski
Senior Architect, Academic Development Lead, HPCC Systems
[email_address]
402-619-9413

BigDataCloud Sept 8 2011 meetup - Big Data Analytics for Health by Charles Kaminski of LexisNexis
