TeraLab, A Secure Big Data Platform
Description And Use Cases
Franck Cotton – INSEE
Kamel Gadouche – GENES/CASD
TERALAB: DESCRIPTION
*Etude Deloitte 2013
Birth of the TeraLab project
• Call for projects “Cloud computing / Big Data” conducted by the French Government
• Proposal for the construction and operation of a Big Data platform,
– For innovation, research and education projects
– Submitted by a consortium comprising
• The IMT (Institut Mines-Télécom)
• GENES, in particular the CASD (secure remote access data center)
• In partnership with INSEE
• Project selected and launched
– Budget of 5.7 M€
– Over 5 years
– Contract signed in December 2013
The TeraLab platform
• A state-of-the-art technical infrastructure
– Elastic distributed system + tera-memory server
– With unique security features
• A rich catalogue of software tools
– Data storage (MPP, NoSQL)
– Query, exploration, visualization (Pig, Hive, Mahout…)
– Management and monitoring
• Data sets
– Pre-installed (public data, open data…)
– Brought by the projects, or acquired for them
• A dedicated team
– 6 people
– Platform configuration and operation
– Project advisors
Platform organization
The CASD
• The CASD is a facility including
– A central secure computing infrastructure (IICE): “the bubble”
– Specific access devices (SD-Box™), guaranteeing imperviousness as the sole means of accessing the IICE
• With the SD-Box, researchers may
– Work remotely on confidential data
• With 64-bit statistics software: SAS, Stata, R, Gauss, Matlab, LaTeX, Excel...
• Soon with Big Data software: Hive, Pig, Mahout, Revolution Analytics, Python…
– Request inputs or outputs
• Scripts or data
• Inputs and outputs are monitored
• With the SD-Box, data owners are sure that
– The authorized researcher is the one behind the SD-Box (smartcard and biometrics)
– No data can be retrieved by researchers (no copy or paste, printing, USB keys...)
EDF2014: Franck Cotton  & Kamel Gadouche, France: TeraLab - A Secure Big Data Platform, Description And Use Cases
The TeraLab platform – planning
• 2014 - 2015
– Incremental platform construction
– Pilot projects
– No cost
• 2016 - 2018
– Professionalization (business model, methodology, client support, etc.)
– Operating expenses recovery
• 2019 and beyond
– Target service offer
– Commercial mode
TERALAB: SOME USE CASES
Use cases in public statistics
• A burning subject
– The statistical community sees Big Data as a high-priority topic
– A few experiments in some pioneering statistical institutes (Estonia, the Netherlands, etc.)
– Several actions launched by international organizations (OECD, UNECE, Eurostat)
• How TeraLab fits in
– Needs: methodological tests, exploration of data sources, process redesign
– A presentation to the French official statistics system aroused much interest
– A specific project on scanner data for the consumer price index
• Currently a 7 TB relational database
– Other ideas expressed
• Telco data for tourism statistics
• Web site log analysis
• Next-generation social declarations
Use case for health data
• French context
– Everyone has a unique personal identifier (the NIR)
• Allowing data matching
• Longitudinal studies
• Using the NIR requires high confidentiality (organized by law)
– A central database with all the health services provided to every citizen
• More than 1.2 billion records with more than a thousand variables
• About 250 terabytes of data generated each year
• Real time updates
• How TeraLab fits in
– Able to meet the challenges
• Huge volumes
• Real-time analysis
– While ensuring ultra-high security
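To put the yearly volume quoted above in perspective, the arithmetic below converts it into an average daily and per-second rate. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes) and data arriving evenly over a 365-day year, neither of which is stated in the slides, and the function name is illustrative.

```python
TB = 10**12  # assumption: decimal terabyte (10^12 bytes)
SECONDS_PER_DAY = 86_400


def average_rate(bytes_per_year: float, days_per_year: int = 365) -> tuple[float, float]:
    """Return (bytes per day, bytes per second), assuming a uniform arrival rate."""
    per_day = bytes_per_year / days_per_year
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


# 250 TB per year is the figure given on the slide.
per_day, per_second = average_rate(250 * TB)
print(f"{per_day / 10**9:.0f} GB/day, {per_second / 10**6:.1f} MB/s")
# prints "685 GB/day, 7.9 MB/s"
```

Real ingestion is unlikely to be uniform, so peak rates would be higher than this average.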
Use case for data challenges
• The DataScience web site (http://datascience.net)
– Allows data owners to issue public or private challenges based on their data
– Allows data scientists to analyze the data, to submit models and their results and to get
evaluation scores (ranking). The winner gets a prize in euros.
• The goals are to improve knowledge
– On methodological aspects
– On data management aspects
• How TeraLab fits in
– Makes it possible to organize challenges on Big Data
• hosted by TeraLab – standard
• hosted by TeraLab – bubble
– Help disseminate Big Data technologies
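The challenge workflow described above (participants submit results, receive an evaluation score, and are ranked) can be sketched as follows. This is a minimal illustration, not the DataScience site's actual mechanism: the slides do not say which metric is used, so root-mean-square error is an assumption, and the `Submission` class, function names and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    team: str                 # hypothetical team name
    predictions: list[float]  # one predicted value per held-out record


def rmse(predictions: list[float], truth: list[float]) -> float:
    """Root-mean-square error (assumed metric); lower is better."""
    if not truth or len(predictions) != len(truth):
        raise ValueError("predictions and truth must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return (sum(squared) / len(squared)) ** 0.5


def leaderboard(submissions: list[Submission], truth: list[float]) -> list[tuple[str, float]]:
    """Score every submission against the hidden truth and sort best-first."""
    scored = [(s.team, rmse(s.predictions, truth)) for s in submissions]
    return sorted(scored, key=lambda pair: pair[1])


# Illustrative values only, invented for this sketch.
truth = [1.0, 2.0, 3.0]
submissions = [
    Submission("team_a", [1.0, 2.0, 4.0]),
    Submission("team_b", [1.0, 2.0, 3.0]),
]
ranking = leaderboard(submissions, truth)
print(ranking[0][0])  # prints "team_b"
```

Keeping the truth values on the organizer's side, and returning only the score, is what would let such a challenge run on data that participants cannot extract.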
CONCLUSION
Where are we now?
• The story has just begun
– The planning is tight
• A beta version of the distributed service will open in April 2014
• The “tera-memory” server will open in summer 2014
• The ultra-secure compartment will open in September 2014
– The team is currently being set up
– Several pilot projects have been identified
– The methodology for project management is being defined
• Contact us if you have a Big Data project
• Visit us at http://www.teralab-datascience.fr/
Thank you for your attention
franck.cotton@insee.fr
kamel.gadouche@casd.eu
22/03/2014
WWW.TERALAB-DATASCIENCE.FR
