Towards real-time analysis of large data volumes for synchrotron experiments

Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab

Acknowledgements

- Jack Deslippe, David Skinner (NERSC)
- Abdelilah Essiari, Craig E. Tull (LBNL-CRD)
- Eli Dart (ESnet)
- Dula Parkinson (LBNL – ALS)

X-rays and Earth Sciences: the story of a moving bottleneck
1960s / 1970s
X-ray Source

X-ray Detectors

Henry Levy with a Picker 5-circle diffractometer and a PDP-5 computer

Data Analysis

Publication

1980s / 1990s
X-ray Source

X-ray Detectors

1995: “MD Storm” image-plate reader, read-out time: 45 minutes

Data Analysis

Publication

2000s / 2010s
X-ray Source

X-ray Detectors

Data Analysis

Publication

Future:
X-ray Source

X-ray Detectors

Interactive access to supercomputers

Data Analysis

Publication

Examples of mineral physics experiments with high data rates:
1) In situ powder diffraction with automated P-T stepping:

ALS beamline 12.2.2 with Perkin-Elmer detector (near-zero read-out delay)

http://www.ltp-oldenburg.de

Data rate: on the order of thousands of frames per day (i.e., tens of GB/day)

2) Micro-diffraction: phase, orientation, and strain mapping at high spatial resolution

Micro-diffraction setup at ALS beamline 12.3.2 with a Pilatus 1M detector.

Left: Distribution of Re₃N (black) and Re (blue) grown in a laser-heated diamond anvil cell (DAC).
Right: Relative orientation of Re₃N grains.
Source: Friedrich et al. (2010), Phys. Rev. Lett. 105, 085504.

Data rate: on the order of tens of thousands of frames per day (i.e., hundreds of GB/day)

3) Tomography: 3D mapping of geomaterials:

Tomography setup at ALS beamline 8.3.2 (diagram labels: X-rays, scintillator)

Supercritical CO₂ penetrating sandstone, ALS BL 8.3.2 (courtesy J. Ajo-Franklin)

Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience, DOI: 10.1038/NGEO1956

Data rate: on the order of hundreds of thousands of frames per day (i.e., TB/day)
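As a rough cross-check of the three quoted data rates, the sketch below multiplies an assumed frame size by an assumed daily frame count. None of these numbers come from the slides: the Perkin-Elmer frame is taken as 2048 × 2048 pixels at 16 bit, the Pilatus 1M frame as 981 × 1043 pixels at 32 bit, the tomography camera frame as 2560 × 2160 pixels at 16 bit, all uncompressed, and the frame counts are illustrative picks within the quoted orders of magnitude.

```python
"""Back-of-the-envelope daily data volumes for the three example experiments.

Frame geometries and daily frame counts are illustrative assumptions chosen to
sit within the orders of magnitude quoted on the slides.
"""

def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed detector frame in bytes."""
    return width * height * bytes_per_pixel

def daily_volume_gb(frames_per_day: int, bytes_per_frame: int) -> float:
    """Uncompressed data volume per day in decimal gigabytes (1 GB = 1e9 bytes)."""
    return frames_per_day * bytes_per_frame / 1e9

# (label, frames per day, width, height, bytes per pixel): all assumed values
EXPERIMENTS = [
    ("powder diffraction (Perkin-Elmer, assumed 2048x2048, 16-bit)", 5_000, 2048, 2048, 2),
    ("micro-diffraction (Pilatus 1M, assumed 981x1043, 32-bit)", 50_000, 981, 1043, 4),
    ("tomography (assumed 2560x2160, 16-bit camera)", 300_000, 2560, 2160, 2),
]

for label, frames, w, h, bpp in EXPERIMENTS:
    gb = daily_volume_gb(frames, frame_bytes(w, h, bpp))
    print(f"{label}: {frames:,} frames/day -> {gb:,.0f} GB/day")
```

With these assumptions the three cases come out at roughly 42 GB, 205 GB and 3.3 TB per day, consistent with the orders of magnitude on the slides; compression or region-of-interest readout would lower the numbers.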

How do we tackle this at the ALS?
1) Not quite real time – local cluster for micro-diffraction analysis
- 24 dual-socket nodes with AMD Opteron 248 processors at 2.2 GHz (48 CPUs)
- 48 GB aggregate memory
- 14 TB shared disk storage
- Gigabit Ethernet interconnect
- 212 GFLOPS (theoretical peak)
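The 212 GFLOPS figure follows from the usual theoretical-peak formula (CPUs × clock × floating-point operations per cycle). The sketch below assumes 2 double-precision FLOPs per cycle for a single-core Opteron 248, which is not stated on the slide, and also derives the memory available per CPU from the 48 GB aggregate.

```python
"""Theoretical peak and memory per CPU for the local analysis cluster.

Assumption (not on the slide): each single-core Opteron 248 retires
2 double-precision FLOPs per clock cycle.
"""
N_CPUS = 48               # 24 dual-socket nodes x 2 single-core CPUs
CLOCK_GHZ = 2.2
FLOPS_PER_CYCLE = 2       # assumed
AGGREGATE_MEMORY_GB = 48

peak_gflops = N_CPUS * CLOCK_GHZ * FLOPS_PER_CYCLE
memory_per_cpu_gb = AGGREGATE_MEMORY_GB / N_CPUS

print(f"theoretical peak: {peak_gflops:.1f} GFLOPS")
print(f"memory per CPU:   {memory_per_cpu_gb:.1f} GB")
```

This gives 211.2 GFLOPS, close to the quoted 212 GFLOPS, and about 1 GB of memory per CPU.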
1) User tunes the analysis parameters manually on a few ‘typical’ patterns
1) Analysis parameters are written into an instruction file
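The instruction-file format is not shown on the slides. As a minimal sketch, assuming a plain `key = value` text format with `#` comments, writing the tuned parameters and reading them back could look like this; all key names and paths are hypothetical.

```python
"""Minimal sketch of a 'key = value' instruction file for batch analysis.

The format and every key name are hypothetical illustrations; the actual
instruction-file format used at the ALS is not shown on the slides.
"""
import tempfile
from pathlib import Path

def write_instructions(path: Path, params: dict) -> None:
    """Write each analysis parameter as one 'key = value' line."""
    lines = [f"{key} = {value}" for key, value in params.items()]
    path.write_text("\n".join(lines) + "\n")

def read_instructions(path: Path) -> dict:
    """Parse 'key = value' lines, ignoring blank lines and '#' comments.

    Values come back as strings; callers convert types as needed.
    """
    params = {}
    for raw in path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed instruction line: {raw!r}")
        params[key.strip()] = value.strip()
    return params

if __name__ == "__main__":
    # Hypothetical parameters tuned on a few 'typical' patterns.
    tuned = {"data_dir": "/data/scan_042", "first_frame": 1,
             "last_frame": 10000, "peak_threshold": 150}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "instructions.txt"
        write_instructions(path, tuned)
        print(read_instructions(path))
```

Values are kept as strings, and anything after `#` is treated as a comment, so a value containing `#` would be truncated; a production format might prefer JSON or INI files instead.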
2) Launch the parsing script, which:
-> reads the instruction file and distributes the data files across the available CPUs
-> writes batch files that manage the individual CPUs
-> launches the software on each node
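The parsing script itself is not shown. A minimal sketch of its first two steps, assuming the data are a numbered frame range, divides the frames into near-equal contiguous chunks (one per CPU) and writes one shell batch file per chunk. The program name `analyze_frames`, its flags, and the file names are hypothetical; launching the batch files on the nodes (for example through a scheduler or `ssh`) is left out.

```python
"""Sketch of the 'parsing script' step: split a frame range across CPUs and
write one batch file per CPU.

The command 'analyze_frames', its flags, and the batch-file layout are
hypothetical placeholders, not the ALS software's actual interface.
"""
import tempfile
from pathlib import Path

def split_frames(first: int, last: int, n_cpus: int) -> list:
    """Divide frames first..last (inclusive) into at most n_cpus contiguous
    chunks whose sizes differ by at most one frame."""
    total = last - first + 1
    base, extra = divmod(total, n_cpus)
    chunks, start = [], first
    for cpu in range(n_cpus):
        size = base + (1 if cpu < extra else 0)
        if size == 0:          # fewer frames than CPUs: stop early
            break
        chunks.append((start, start + size - 1))
        start += size
    return chunks

def write_batch_files(chunks: list, instruction_file: str, out_dir: Path) -> list:
    """Write one shell batch file per chunk; each would run the (hypothetical)
    analysis program on its own frame range and log to its own file."""
    paths = []
    for cpu, (lo, hi) in enumerate(chunks):
        path = out_dir / f"batch_cpu{cpu:02d}.sh"
        path.write_text(
            "#!/bin/sh\n"
            f"analyze_frames --instructions {instruction_file} "
            f"--first {lo} --last {hi} > log_cpu{cpu:02d}.txt\n"
        )
        paths.append(path)
    return paths

if __name__ == "__main__":
    chunks = split_frames(1, 10_000, 48)
    print(chunks[0], chunks[-1])
    with tempfile.TemporaryDirectory() as tmp:
        files = write_batch_files(chunks, "instructions.txt", Path(tmp))
        print(len(files), "batch files written")
```

Contiguous chunks keep each CPU's frames adjacent on disk; if some frames take much longer to analyze than others, handing out smaller chunks dynamically would balance the load better.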
3) Results are written to a single file, which can be viewed, further analyzed, and published:
Relative lattice orientation: reveals the domain structure. The full blue-to-red color range corresponds to a 4° rotation.

Average intensity: reveals the high-resolution fine structure of the grain.
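To make the color scale of the orientation map concrete, the sketch below maps a relative rotation angle onto a scale whose full blue-to-red range spans 4°. The colormap actually used for the figure is not specified; a simple linear RGB blend from pure blue to pure red is assumed here.

```python
"""Sketch: map a relative lattice rotation (degrees) onto a blue-to-red color
scale spanning 4 degrees, as in the orientation map.

Assumption: a simple linear RGB blend from pure blue to pure red; the
colormap actually used for the figure is not specified.
"""
FULL_RANGE_DEG = 4.0  # full blue-to-red span quoted on the slide

def rotation_to_rgb(angle_deg: float, full_range: float = FULL_RANGE_DEG) -> tuple:
    """Return an 8-bit (R, G, B) color; angles outside [0, full_range] are clipped."""
    t = min(max(angle_deg / full_range, 0.0), 1.0)  # position on the scale, 0..1
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    return (red, 0, blue)

for angle in (0.0, 1.0, 2.0, 4.0, 5.0):
    print(angle, rotation_to_rgb(angle))
```

Because the scale saturates at 4°, any grain rotated by more than that relative to the reference appears fully red.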

2) Real time – collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection

Data are packaged and shipped as follows:
- After every n images, a ‘trigger file’ is deposited in a directory that is monitored by NERSC.
- A SPADE web app wraps the data (512 files at a time) in HDF5 (Hierarchical Data Format) and ships them to NERSC via a Gigabit line (to be upgraded to a 10 Gigabit line).
- At NERSC, the data are received by a SPADE instance, which places them in the target folder and on tape and sends an acknowledgment.
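SPADE's own interface is not reproduced here. The sketch below only illustrates the trigger-and-batch logic described above, assuming frames are `*.tif` files and the trigger is a file named `TRIGGER` in the same directory (both hypothetical), plus a rough transfer-time estimate for one 512-file batch; the HDF5 wrapping and the actual transfer are omitted.

```python
"""Sketch of trigger-driven batching ahead of transfer.

Assumptions (hypothetical, not SPADE's actual interface): frames are '*.tif'
files, the trigger is a file named 'TRIGGER' in the watched directory, and
each batch holds 512 files. HDF5 wrapping and the transfer itself are omitted.
"""
import tempfile
from pathlib import Path

BATCH_SIZE = 512          # files per package, as quoted on the slide
TRIGGER_NAME = "TRIGGER"  # hypothetical trigger-file name

def ready_batches(watch_dir: Path, already_sent: set) -> list:
    """If the trigger file exists, return unsent frames grouped into full
    batches of BATCH_SIZE (a partial tail waits for the next trigger)."""
    if not (watch_dir / TRIGGER_NAME).exists():
        return []
    pending = sorted(p for p in watch_dir.glob("*.tif") if p.name not in already_sent)
    n_full = len(pending) // BATCH_SIZE
    return [pending[i * BATCH_SIZE:(i + 1) * BATCH_SIZE] for i in range(n_full)]

def transfer_seconds(n_bytes: float, link_gbit_s: float, efficiency: float = 0.8) -> float:
    """Rough wire time for n_bytes on a link of link_gbit_s Gbit/s, assuming
    only the fraction `efficiency` of nominal bandwidth is achieved."""
    return n_bytes * 8 / (link_gbit_s * 1e9 * efficiency)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        for i in range(1100):
            (d / f"frame_{i:05d}.tif").touch()
        print("before trigger:", len(ready_batches(d, set())))
        (d / TRIGGER_NAME).touch()
        print("batch sizes:", [len(b) for b in ready_batches(d, set())])
    batch_bytes = BATCH_SIZE * 4e6  # assumed ~4 MB per frame
    for link in (1, 10):
        print(f"{link} Gbit/s: ~{transfer_seconds(batch_bytes, link):.0f} s per batch")
```

Under the assumed ~4 MB per frame and 80 % link efficiency, one 512-frame batch (about 2 GB) takes roughly 20 s on a 1 Gbit/s line and about 2 s on a 10 Gbit/s line, which illustrates why the planned upgrade matters at micro-diffraction data rates.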

Data transfer to NERSC during data collection: up and running

Transfer control is web-based

2) Analysis parameters are set up with a web app (under development)

Jobs are launched manually by the user via the same web page.
Test runs indicate an analysis time on the order of the data-collection time, so the analysis can in principle run synchronously with data collection.
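Whether analysis can run synchronously with data collection reduces to a throughput comparison: N parallel workers, each needing t seconds per frame, sustain N/t frames per second, which must be at least the acquisition rate. The sketch below encodes that check with assumed numbers (2 frames/s, 30 s of CPU time per frame); these are illustrative, not measured ALS values.

```python
"""Throughput check for streaming analysis during data collection.

All rates below are illustrative assumptions, not measured values.
"""

def keeps_up(frames_per_second: float, seconds_per_frame: float, workers: int) -> bool:
    """True if `workers` parallel processes, each spending `seconds_per_frame`
    on one frame, can sustain the acquisition rate."""
    return workers / seconds_per_frame >= frames_per_second

def backlog_after(duration_s: float, frames_per_second: float,
                  seconds_per_frame: float, workers: int) -> float:
    """Frames collected but not yet analyzed after `duration_s` seconds."""
    produced = frames_per_second * duration_s
    capacity = workers / seconds_per_frame * duration_s
    return produced - min(produced, capacity)

ACQ_RATE = 2.0       # frames per second (assumed)
CPU_PER_FRAME = 30   # seconds of CPU time per frame (assumed)

for n in (48, 512):
    print(n, keeps_up(ACQ_RATE, CPU_PER_FRAME, n),
          round(backlog_after(3600, ACQ_RATE, CPU_PER_FRAME, n)))
```

With these assumptions, 48 workers (the size of the local cluster) fall behind and accumulate a backlog of about 1,440 frames per hour, while 512 workers (the largest allocated Carver job) keep up comfortably.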

3) Analysis jobs are executed on Carver (under development)

Carver is an IBM iDataPlex cluster
- 1202 nodes with a total of 9984 processor cores
- 106 TFLOPS peak performance
- largest allocated parallel job is 512 cores
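To put Carver's numbers next to the local cluster's, the sketch below applies the same theoretical-peak formula (cores × clock × FLOPs per cycle). The clock of 2.67 GHz and 4 double-precision FLOPs per cycle per core are assumptions, not values from the slide; with them the whole machine lands at the quoted ~106 TFLOPS.

```python
"""Sketch: compare Carver's quoted peak with the local cluster's.

Assumptions (not on the slide): every Carver core runs at 2.67 GHz and
retires 4 double-precision FLOPs per cycle.
"""

def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle

LOCAL_CLUSTER_GFLOPS = 212.0   # quoted theoretical peak of the 48-CPU cluster

carver = peak_gflops(9984, 2.67, 4)       # whole machine
largest_job = peak_gflops(512, 2.67, 4)   # largest allocated parallel job

print(f"Carver peak: {carver / 1000:.1f} TFLOPS")
print(f"512-core job: {largest_job / 1000:.2f} TFLOPS, "
      f"{largest_job / LOCAL_CLUSTER_GFLOPS:.0f}x the local cluster")
```

Under these assumptions even a single 512-core job offers roughly 26 times the local cluster's theoretical peak.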

Summary:
- Data analysis is the new bottleneck limiting progress in many aspects of experimental mineral physics.
- Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
- These challenges cannot always be met with traditional desktop machines – software has to be automated and parallelized; collaboration with supercomputing centers is becoming important for experimental scientists too (at least for a few more iterations of Moore’s law).
- Data analysis on supercomputers, remotely controlled through web applications, is a very promising avenue, allowing big-data methods to enter mineral physics.
- Future developments may (must?) move away from supercomputers toward highly parallelized (GPU-based) local computers and/or cloud computing.

Editor's Notes

  • #4: I would like to start off by giving a brief, slightly personalized historical perspective on the application of X-rays in mineral physics research: X-rays have been applied in Earth Sciences on a routine basis for about 50 years, so this story pretty much parallels my life. In the 1960s and 1970s, when I was just learning how to spell X-ray, the first automated diffractometers replaced fully manual film techniques… The brightness of the X-rays available in those days meant that a powder or single-crystal data collection took days to weeks.
  • #5: This changed most dramatically with the advent of dedicated light sources, in particular high-energy third-generation sources such as the ESRF in Grenoble, home of ID30, the first dedicated mineral physics beamline. I had meanwhile managed to spell X-rays and was thus fortunate enough to be involved in the early days of that beamline. The brilliance of the ID30 undulator enabled experiments through a diamond anvil cell to be performed in a matter of seconds. However, each data point required physically carrying a 1 x 1 ft image plate to the one and only image-plate (IP) reader on the floor, plus a read-out time of about 45 minutes. Sadly, the tremendous increase in brightness and flux of the X-ray sources could therefore be exploited only to a limited extent.
  • #6: Another twenty years later (the age-appropriate number of light sources no longer fits on my birthday cake), we hail the advent of ultra-fast, ultra-low-noise X-ray area detectors such as the Perkin-Elmer or the Pilatus, which in principle allow data-point rates of up to 30 Hz. This leads to potentially very large data rates. However, our capabilities to deal with these data are still largely at the level of high-end desktops and serial workflow software. The opportunity offered by the combination of ever brighter light sources and fast detectors, i.e., to apply big-data methods to mineral physics research, therefore cannot be fully harnessed.
  • #7: The way out of this bottleneck lies in automating and parallelizing the analysis workflow using, at least for the time being, massively parallel supercomputers. This is the approach we are currently taking at the Advanced Light Source in collaboration with the National Energy Research Scientific Computing Center.
  • #8: Let me quickly give you three examples of the order of magnitude of the data rates we have to deal with: intense X-rays and a fast detector, coupled with programmable changes of T and P, allow a much denser coverage of the P-V-T surface and thus a much better description of the thermoelastic properties of Earth materials and their phase transitions…
  • #9: Mineral physics experiments involving very high temperatures and pressures invariably force us to deal with large spatial and temporal gradients of pressure, temperature and chemical composition. High spatial or temporal resolution is therefore needed to explore these inhomogeneities. Fast detectors and bright X-rays thus allow us to collect spatially and/or temporally highly resolved maps of our samples…
  • #10: Going beyond diffraction, various flavors of tomographic techniques now allow us to create three-dimensional images of samples in situ and ex situ, if needed even with chemical or phase selectivity. Such experiments …
  • #16: This solution works fairly well with medium-sized datasets of up to 10,000 frames; with larger data volumes and/or tricky data, analysis even on the 48-CPU cluster can take much longer than the data collection.