HPCC Systems
Loading CSV Data & Querying
By Fujio Turner
@FujioTurner
Non-Indexed Full Data Set
[Chart labels: 1, 20; Customers, Development, Business]
http://hpccsystems.com/why-hpcc/benchmarks
ECL (Enterprise Control Language) 
C++ based query language 
SQL w/ JOINS 
Map/Reduce 
GraphDB 
Machine Learning
Simple to Complex Queries
“I’m sub-second fast.”
“I can query all or part of your data.”
Architecture
Thor, Roxie
[Diagram labels: Hard Disk, Index (optional); Hard Disk, Index (optional); In-memory Index; SSD; Either/Both]
Example
File -> Load File into HPCC -> Query
CSV data sample source:
http://catalog.data.gov/dataset/consumer-complaint-database
Administrator Web GUI
on IP / URL of HPCC install, port 8010
Load Data
1. Upload file*
2. Distribute to cluster
3. Name of file in cluster
4. Add ,\t as separators (most CSV files contain \t)
5. Push to cluster
*2 GB file size limit through the web GUI.
No limit if uploaded via SOAP.
*optional file rename Loaded 
In Thor Cluster
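Once the file has been pushed to the cluster, one quick sanity check is to ask the cluster whether the logical file exists. This is only a sketch: it assumes the file was named test::complaints in step 3 (the name used in the query later in this deck), and it relies on the ECL Standard Library function STD.File.FileExists.

```ecl
IMPORT STD;

// Assumed logical name: whatever was typed in step 3 of the load dialog.
// The leading '~' makes the name absolute (no default scope is prepended).
ComplaintsFile := '~test::complaints';

// TRUE if the logical file is registered on the cluster, FALSE otherwise.
OUTPUT(STD.File.FileExists(ComplaintsFile), NAMED('ComplaintsFileLoaded'));
```

If the result is FALSE, the name in the query and the name given at load time probably differ.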
How do I Query HPCC Systems?
What Is ECL?
ECL (Enterprise Control Language) is a C++ based query language (ECL code is compiled to C++) for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn.

Note: ECL is very similar to Hadoop’s Pig, but more expressive and feature-rich.
Query w/ ECL

// Schema
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23 Product;
  STRING38 State;
  // …………………………. (fields elided on the slide)
  // …………………………. (fields elided on the slide)
  STRING31 Consumer_disputed;
END;

// File location ("FROM Table") and file type (CSV)
Com := DATASET('~test::complaints', ComS,
               CSV(HEADING(1), SEPARATOR([',','\t'])));

Ma := Com(State = 'MA'); // WHERE `State` = 'MA'
Ma; // output

Other SQL analogies called out on the slide: “USE DATABASE;”, “SELECT * ….”
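To try the filter without uploading anything first, the same pattern can be run against a small inline dataset. The sketch below is not the deck's code: the three-field layout and the sample rows are made up for illustration and stand in for the uploaded file.

```ecl
// Hypothetical three-field layout; the real complaints file has more columns.
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23  Product;
  STRING2   State;
END;

// Made-up inline rows standing in for '~test::complaints'.
Com := DATASET([{1, 'Mortgage',    'MA'},
                {2, 'Credit card', 'CA'},
                {3, 'Mortgage',    'MA'}], ComS);

// Record filter: the ECL counterpart of SQL's WHERE State = 'MA'.
Ma := Com(State = 'MA');
OUTPUT(Ma, NAMED('MA_Complaints'));

// Number of matching rows, similar to SELECT COUNT(*) ... WHERE State = 'MA'.
OUTPUT(COUNT(Ma), NAMED('MA_Count'));
```

Swapping the inline DATASET for the CSV-backed definition leaves the filter and the outputs unchanged, which is the point of separating the schema, the file definition, and the query.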
Practice
1. Go to playground
2. Edit ECL
3. Pick “thor” cluster
4. Submit
http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE__-_CSV_LOAD_and_QUERY
Schema Made EZ
Storing a new file and want to make a quick schema?
Take a small part of your CSV data and go to the link below to make an ECL schema.
CSV in -> click -> schema out
http://hpccsystems.com/demos/data-profiling-demo
ECL Guide 
http://hpccsystems.com/download/docs/ecl-language-reference 
JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ….
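A few of the functions listed above can be combined as in the sketch below. It is illustrative only: the two-field layout, the sample rows, and all definition names are hypothetical, and TABLE (not on the slide's list) is used for the per-state count.

```ecl
// Hypothetical two-field layout and made-up rows, for illustration only.
ProdRec := RECORD
  STRING20 Product;
  STRING2  State;
END;

Complaints := DATASET([{'Mortgage',    'MA'},
                       {'Mortgage',    'MA'},
                       {'Credit card', 'CA'},
                       {'Mortgage',    'CA'}], ProdRec);

// SORT + DEDUP: one row per distinct (State, Product) pair.
// DEDUP compares adjacent rows, so the input is sorted on the same fields first.
SortedRows  := SORT(Complaints, State, Product);
UniquePairs := DEDUP(SortedRows, State, Product);
OUTPUT(UniquePairs, NAMED('UniquePairs'));

// Cross-tab TABLE with COUNT(GROUP): number of rows per State.
StateCountRec := RECORD
  Complaints.State;
  UNSIGNED4 Cnt := COUNT(GROUP);
END;
PerState := TABLE(Complaints, StateCountRec, State);
OUTPUT(SORT(PerState, -Cnt), NAMED('RowsPerState'));

// IF returns one of two values depending on a condition.
OUTPUT(IF(COUNT(Complaints) > 3, 'more than three rows', 'three rows or fewer'),
       NAMED('SizeCheck'));
```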
For more HPCC “How To’s”, go to SlideShare
http://www.slideshare.net/FujioTurner/
Watch how to install HPCC Systems in 5 Minutes
http://www.youtube.com/watch?v=8SV43DCUqJg
Download HPCC Systems Open Source Community Edition
http://hpccsystems.com/download/
or Source Code
https://github.com/hpcc-systems


Big Data - Load CSV File & Query the EZ way - HPCC Systems
