Introduction to Themes and Technologies
Per Öster <per.oster@csc.fi>
CSC – IT Center for Science Ltd, Finland
CSC at a glance
  Founded in 1970 as a technical support unit for Univac 1108
  Reorganized as a company, CSC - Scientific Computing Ltd., in 1993
  All shares transferred to the Ministry of Education of Finland in 1997
  Operates on a non-profit principle
  Facilities in Espoo, close to the Otaniemi community (15,000 students and 16,000 technology professionals)
  Staff: 170
  Turnover in 2008: 19.6 million euros
Themes of the First Week
Themes of the Second Week
The Acronyms
Principles of service-oriented architecture
Principles of high-throughput computing
Principles of distributed data management
Principles of job submission and execution management
Principles of using distributed and high performance systems
Higher level APIs: OGSA-DAI, SAGA and metadata management
Workflows
1. Principles of job submission and execution management
  Vision
    UNiform Interface to COmputing Resources
    seamless, secure, and intuitive
  History
    08/1997 – 12/2002: UNICORE and UNICORE Plus projects
    Initial development started in two German projects funded by the German ministry of education and research (BMBF)
    Continuation in different EU projects since 2002
    Open Source community development since summer 2004
UNICORE 6 Guiding Principles, Implementation Strategies
http://www.unicore.eu
  Open source under BSD license with software hosted on SourceForge
  Standards-based: OGSA-conform, WS-RF 1.2 compliant
  Open, extensible Service-Oriented Architecture (SOA)
  Interoperable with other Grid technologies
  Seamless, secure and intuitive, following a vertical end-to-end approach
  Mature security: X.509, proxy and VO support
  Workflow support tightly integrated while being extensible for different workflow languages and engines for domain-specific usage
  Application integration mechanisms on the client, services and resource level
  Variety of clients: graphical, command-line, API, portal, etc.
  Quick and simple installation and configuration
  Support for many operating systems (Windows, MacOS, Linux, UNIX) and batch systems (LoadLeveler, Torque, SLURM, LSF, OpenCCS)
  Implemented in Java to achieve platform independence
[Figure: UNICORE 6 architecture (http://www.unicore.eu). Layers from top to bottom: scientific clients and applications (URC Eclipse-based rich client, UCC command-line client, HiLA programming API, portal e.g. GridSphere); web service stack with the Gateway (authentication) at each site and central services running in WS-RF hosting environments (Service Registry, Workflow Engine, Service Orchestrator, CIS Info Service); Grid services hosting (UNICORE Atomic Services, OGSA-* services, UVOS VO service); XNJS with IDB (job incarnation); XUUDB and XACML entity (authorization); Target System Interface; local RMS (e.g. Torque, LL, LSF) and USpace; data transfer to external storage via GridFTP. Standards named in the figure: X.509, proxies, SOAP, WS-RF, WS-I, JSDL, OGSA-RUS, UR, GLUE 2.0, OGSA-ByteIO, OGSA-BES, HPC-P, XACML, SAML, DRMAA.]
Workflows in UNICORE
http://www.unicore.eu
  Two-layer architecture for scalability
    Workflow engine
      Based on the Shark open-source XPDL engine
      Pluggable, domain-specific workflow languages
    Service orchestrator
      Job execution and monitoring
      Callback to workflow engine
      Brokering based on pluggable strategies
  Clients
    GUI client based on Eclipse
    Command-line submission of workflows is also possible
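The idea of "brokering based on pluggable strategies" can be illustrated with a small sketch. This is not UNICORE code: the Site record, the two strategies and the Orchestrator class are all hypothetical names chosen for the example. It only shows the pattern in which the orchestrator delegates the choice of execution site to an exchangeable function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Site:
    """Hypothetical description of an execution site (not a UNICORE type)."""
    name: str
    free_cpus: int
    queued_jobs: int


# A brokering strategy is any function that picks one site from a list.
Strategy = Callable[[List[Site]], Site]


def most_free_cpus(sites: List[Site]) -> Site:
    """Prefer the site with the most idle CPUs."""
    return max(sites, key=lambda s: s.free_cpus)


def shortest_queue(sites: List[Site]) -> Site:
    """Prefer the site with the fewest waiting jobs."""
    return min(sites, key=lambda s: s.queued_jobs)


class Orchestrator:
    """Toy orchestrator: the strategy is plugged in at construction time."""

    def __init__(self, sites: List[Site], strategy: Strategy) -> None:
        self.sites = sites
        self.strategy = strategy

    def submit(self, job_name: str) -> str:
        site = self.strategy(self.sites)
        site.queued_jobs += 1  # record that the job now waits at this site
        return f"{job_name} -> {site.name}"


sites = [Site("site-1", free_cpus=64, queued_jobs=10),
         Site("site-2", free_cpus=16, queued_jobs=2)]
print(Orchestrator(sites, most_free_cpus).submit("job-a"))
print(Orchestrator(sites, shortest_queue).submit("job-b"))
```

Swapping the strategy changes where jobs go without touching the orchestrator itself, which is the point of making the brokering pluggable.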
High-Throughput Computing
  Large number of tasks that can be executed independently
    Parameter studies
    Monte Carlo or stochastic methods
    Genome sequencing (matching)
    Analysis of LHC data: starting from this, looking for this (1 in 10^13)
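A minimal sketch of what "tasks that can be executed independently" means for a Monte Carlo method follows. Each task estimates pi from its own random seed and never communicates with the others, so the tasks could run anywhere and in any order. The function names and the thread pool are assumptions made for this illustration; in a real high-throughput system each task would typically be a separate job handed to the middleware rather than a thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(seed: int, samples: int = 20_000) -> int:
    """One independent task: count random points that fall inside the unit circle."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks: int = 8, samples: int = 20_000) -> float:
    """Run the tasks (here in a thread pool as a stand-in) and combine the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        hits = list(pool.map(lambda seed: run_task(seed, samples), range(num_tasks)))
    return 4.0 * sum(hits) / (num_tasks * samples)


print(estimate_pi())
```

Only the final combination step needs all results; everything before it is independent work, which is why throughput grows with the number of machines available.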
2. Principles of high-throughput computing
  Vision
    Condor provides high-throughput computing in a variety of environments
      Local dedicated clusters (machine rooms)
      Local opportunistic (desktop) computers
      Grid environments; can submit jobs to other systems
    Can run workflows of jobs
    Can run parallel jobs
      Independently parallel (lots of single jobs)
      Tightly coupled (such as MPI)
2. Principles of high-throughput computing
  History and Activity
    Established in 1985
    Distributed computing research performed by a team of ~35 faculty, full-time staff and students who
      face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
      are involved in national and international collaborations,
      interact with users in academia and industry,
      maintain and support a distributed production environment (more than 5000 CPUs at UW),
      educate and train students.
Condor Project: Main Threads of Activities
  Distributed Computing Research: develop and evaluate new concepts, frameworks and technologies
  Develop and maintain Condor; support our users
  The Open Science Grid (OSG): build and operate a national High Throughput Computing infrastructure
  The Grid Laboratory Of Wisconsin (GLOW): build, maintain and operate a distributed computing and storage infrastructure on the UW campus
  The NSF Middleware Initiative (NMI): develop, build and operate a national Build and Test facility powered by Metronome (ETICS-II)
Web Services
  XML, DCE RPC, DCOM, RMI, CORBA
  “Web services has dramatically reduced the programming and management cost of publishing and receiving information” (Jim Gray, Microsoft Research)
  EMBRACE: 4-year EU project to establish services for the bioinformatics community
3. Principles of service-oriented architectures
  Vision
    Provide the fundamental components to get the grid working
  History
    Starting point in I-WAY, a distributed high-performance network demonstrated at the SuperComputing '95 conference and exhibition
…14 Years Later
  4 major versions
  Components to address the original problems
  Many new fields
    recent hot topics: service-oriented science, virtualization
  Diverse application areas
    recently: lots of bioinformatics and medical apps
    others include: earthquakes, particle physics, earth sciences
Globus Software now: many components
[Figure: Globus projects, grouped in the original diagram under Security, Execution Mgmt, Data Mgmt, Info Services, Common Runtime and Other. Components named: OGSA-DAI, GT4, MPICH-G2, Data Rep, Replica Location, Java Runtime, MyProxy, Delegation, GridWay, GridFTP, MDS4, CAS, C Runtime, GSI-OpenSSH, Incubator Mgmt, Reliable File Transfer, GRAM, Python Runtime, C Sec, GT4 Docs. Incubator projects named: Cog WF, GAARDS, Virt WkSp, MEDICUS, Metrics, OGRO, GDTE, UGP, GridShib, Dyn Acct, Gavia JSC, DDM, LRMA, HOC-SA, PURSE, Introduce, WEEP, Gavia MS, SGGC, ServMark, others.]
4. Principles of distributed data management
EGEE Project Overview
  17,000 users
  136,000 LCPUs (cores)
  25 PB disk
  39 PB tape
  12 million jobs/month (+45% in a year)
  268 sites (+5% in a year)
  48 countries (+10% in a year)
  162 VOs (+29% in a year)
  Source: Technical Status - Steven Newhouse - EGEE-III First Review, 24-25 June 2009
Middleware Supporting HTC
  Application areas: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences
  History of gLite
    Development started in 2004
    Entered production in May 2006
    Middleware distribution of EGEE
  Supported End-user Activity
    13,000 end-users in 112 VOs
    +44% users in a year
    23 core VOs
    A core VO has >10% of usage within its science cluster
gLite Middleware
[Figure: gLite middleware components, divided into EGEE-maintained components and external components. Groups shown: User Access, Security Services, Information Services, General Services, Compute Element, Storage Element, Physical Resources. Components named: User Interface, Virtual Organisation Membership Service, Workload Management Service, Logging & Bookkeeping Service, Hydra, BDII, Proxy Server, AMGA, File Transfer Service, LHC File Catalogue, SCAS, CREAM, LCG-CE, Disk Pool Manager, Authz. Service, BLAH, MON, LCAS & LCMAPS, dCache, Worker Node, gLExec.]
The Computing “Eco-system”
  Scientific need for all tiers!
  TIER 1: Large-scale HPC centers (Capability Computing)
  TIER 2: National/regional centers, Grid collaboration (Capacity Computing)
  TIER 3: Local centers
  TIER 4: Personal/office computing
5. Principles of using distributed and high performance systems
  ARC middleware (Advanced Resource Connector)
    open source, out-of-the-box Grid solution software which enables production-quality computational and data Grids (released in May 2002)
    development is coordinated by NDGF
    emphasis is put on scalability, stability, reliability and performance
    builds upon standard open source (OS) solutions: OpenLDAP, OpenSSL, SASL and Globus Toolkit
      adds services not provided by Globus
      extends or completely replaces some Globus components
NorduGrid collaboration*
  a community around open source Grid middleware: ARC
  national Grids (e.g. M-grid, SweGrid, NorGrid), users also outside the Nordic countries
  real users, real applications
  implemented a production Grid system working non-stop since May 2002
  open for anyone to participate
  * http://www.nordugrid.org/monitor
M-grid: the Finnish Material Sciences Grid
  joint project between seven Finnish universities, Helsinki Institute of Physics and CSC
  partners are laboratories and departments, not university IT centers
  not limited by the field of research; used for a wide range of physical, chemical and nanoscience applications
  jointly funded by the Academy of Finland and the participating universities
  first large initiative to put Grid middleware into production use in Finland
  goal: throughput computing capacity mainly for the needs of physics and chemistry researchers

General Introduction to technologies that will be seen in the school

  • 43. opened to all CSC customers in Nov 2005
      Grids at CSC (HPC and Grids in Practice)
      HP CP4000BL ProLiant Cluster
  • 46. 11 TF peak performance
  • 47. Infiniband interconnect
      gLite on HP cluster
      ARC on HP cluster
      Cray XT4/XT5
  • 53. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      OGSA-DAI Vision
        The vision is to enable the sharing of data resources to enable collaboration, and to support:
          Data access: access to structured data in distributed heterogeneous data resources.
          Data transformation, e.g. expose data in schema X to users as data in schema Y.
          Data integration, e.g. expose multiple databases to users as a single virtual database.
          Data delivery: delivering data to where it's needed by the most appropriate means, e.g. web service, e-mail, HTTP, FTP, GridFTP.
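The access, transformation and integration ideas above can be pictured with a small sketch. It deliberately does not use the OGSA-DAI API: it relies only on Python's standard sqlite3 module, and the two "sites", their table layouts and the virtual_samples function are invented for this example. Site A stores temperatures in Celsius under one schema, site B stores Fahrenheit under another, and the caller sees a single uniform result set.

```python
import sqlite3


def make_site_a() -> sqlite3.Connection:
    """Hypothetical data resource A: schema (id, temp_c)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, temp_c REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, 20.0), (2, 25.0)])
    return conn


def make_site_b() -> sqlite3.Connection:
    """Hypothetical data resource B: different schema (sample_no, temp_f)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sample_no INTEGER, temp_f REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [(3, 212.0)])
    return conn


def virtual_samples(site_a: sqlite3.Connection, site_b: sqlite3.Connection):
    """Present both resources as one virtual table of (id, temp_c) rows."""
    # Data access: query each resource in its own schema.
    rows = list(site_a.execute("SELECT id, temp_c FROM samples"))
    # Data transformation: map site B's schema and units onto site A's.
    rows += [(sample_no, (temp_f - 32.0) * 5.0 / 9.0)
             for sample_no, temp_f in site_b.execute(
                 "SELECT sample_no, temp_f FROM readings")]
    # Data integration: one combined, ordered result for the user.
    return sorted(rows)


print(virtual_samples(make_site_a(), make_site_b()))
```

The user of virtual_samples never needs to know that two databases with different schemas are involved, which is the effect the vision describes.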
  • 54. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      OGSA-DAI History
        The OGSA-DAI project started in February 2002 as part of the UK e-Science Grid Core Program
        Today it is part of OMII-UK, a partnership between:
          OMII, The University of Southampton
          myGrid, The University of Manchester
          OGSA-DAI, The University of Edinburgh
  • 55. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      Vision of a Simple API for Grid Applications (SAGA)
        Provide a simple programmatic interface that is widely adopted, usable and available for enabling applications for the grid
        Simplicity: easy to use, install, administer and maintain
        Uniformity: provides support for different application programming languages as well as consistent semantics and style for different Grid functionality
        Scalability: contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources
        Genericity: adds support for different grid middleware, even concurrent ones
        Modularity: provides a framework that is easily extendable
  • 56. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      Metadata management: Make metadata Princess in the kingdom of Semantic Web
  • 58. 7. Workflows
      Organize your work, e.g.:
        Gather initial data
        Pre-processing of data
        Define computing job(s)
        Initiate job(s)
        Gather results
        Post-processing of results
        Repeat
      During the school you will learn how to do this in different ways with the systems studied. This can also be done with specific workflow systems: Taverna, P-GRADE Portal, …
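The workflow steps listed above can be sketched as a plain Python loop. Everything here is a stand-in chosen for illustration: the "job" just evaluates a simple function, and all function names are hypothetical. On a real grid, run_job would be replaced by submission to one of the middleware systems covered in the school, or the whole loop by a workflow system.

```python
def gather_initial_data():
    """Step 1: the initial search interval (an assumed toy input)."""
    return 0.0, 10.0


def preprocess(lo, hi, n):
    """Step 2: turn the interval into n evenly spaced parameter values."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step


def define_jobs(points):
    """Step 3: one job description per parameter value."""
    return [{"x": x} for x in points]


def run_job(job):
    """Step 4: stand-in for a submitted job; evaluates (x - 3)^2."""
    return job["x"], (job["x"] - 3.0) ** 2


def post_process(results):
    """Step 6: pick the parameter value with the smallest result."""
    return min(results, key=lambda r: r[1])


def workflow(rounds=5, n=11):
    lo, hi = gather_initial_data()
    best_x = lo
    for _ in range(rounds):                       # Step 7: repeat
        points, step = preprocess(lo, hi, n)
        jobs = define_jobs(points)
        results = [run_job(job) for job in jobs]  # Steps 4 and 5: run, gather
        best_x, _ = post_process(results)
        lo, hi = best_x - step, best_x + step     # refine around the best value
    return best_x


print(workflow())
```

Each round is a parameter sweep whose jobs are independent, so the same structure maps naturally onto the high-throughput systems discussed earlier; only the loop itself has to run in sequence.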
  • 59. Motivations for developing P-GRADE portal
      P-GRADE portal should
        Answer all the questions of an e-scientist
        Hide the complexity of the underlying grid middleware
        Provide a high-level graphical user interface that is easy to use for e-scientists
        Support many different grid programming approaches (see Morris Riedel’s talk):
          Simple Scripts & Control (sequential and MPI job execution)
          Scientific Application Plug-ins (based on GEMLCA)
          Complex Workflows
          Parameter sweep applications: both on job and workflow level
          Interoperability: transparent access to grids based on different middleware technology
        Support three levels of parallelism
  • 60. Short History of P-GRADE portal
      Parallel Grid Application and Development Environment
      Initial development started in the Hungarian SuperComputing Grid project in 2003
      It has been continuously developed since 2003
      Detailed information: http://portal.p-grade.hu/
      Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/
  • 61. Integrating Practical
      Principles of service-oriented architecture
      Principles of high-throughput computing
      Principles of distributed data management
      Principles of job submission and execution management
      Principles of using distributed and high performance systems
      Higher level APIs: OGSA-DAI, SAGA and metadata management
      Workflows

Editor's Notes

  • #27: Yellow – gLite, Green – externally supported components, gLite consortium