glideinWMS for users




      Introduction to glideinWMS
                     by Igor Sfiligoi (UCSD)




CERN, Dec 2012             glideinWMS Intro    1
Scope of this talk

                         This talk provides a
                   user perspective of glideinWMS
                 for users with previous experience
                        with Grid computing.

                  It does not provide much detail
                 but concentrates on the concepts
                     behind the system instead.



The problem(s)
 ●   Users have many jobs that must be run
      ●   Each user has multiple tasks at once
 ●   Resources are provided by O(100) Grid sites

 ●   How do we schedule them to get the results
     in the shortest amount of time?
      ●   Assuming one result per task
 ●   How do we treat all users in a fair way?
      ●   Independently of how many jobs they submit

glideinWMS approach
 ●   Separates
      ●   Resource provisioning from
      ●   Resource scheduling
 ●   In practice, sends out pilot jobs to the Grid (never the user jobs!)
      ●   And creates an “overlay batch system”
 ●   Pilot jobs get ownership of the Grid slots
      ●   At least for a limited time
 ●   Known also as the pilot approach

 [Figure: an overlay batch system spanning several Grid sites]

Job scheduling
 ●   Once we have the overlay batch system,
     job scheduling works like in any dedicated batch system (B.S.)
      ●   We own the B.S. and can set the
          job scheduling policies
 ●   glideinWMS is based on HTCondor
      ●   So it can do whatever HTCondor can do
      ●   Which is quite flexible
            –    But also nothing more...
      ●   HTCondor (formerly known as Condor) is a widely used
          batch system. More details later on.
 ●   HTCondor-based pilots are usually called glideins
      ●   Thus glideinWMS stands for “glidein based Workflow Management System”
Creating the overlay
                           i.e. resource provisioning



 ●   glideinWMS will grow and shrink the
     overlay B.S. automatically
      ●   No human intervention needed
 ●   Expansion based on user jobs in the queue
      ●   The more jobs, the faster it will try to grow
      ●   Since not all jobs can run at all sites,
          different attempted growth rates for different sites
 ●   Shrinks automatically if resources unused
      ●   Again, based on user jobs in the queue
 ●   Each glidein should run at least one user job,
     but will try to run many if the Grid slot is long enough
glideinWMS in a picture


 [Figure: glideinWMS connects the HTCondor job repository with the Grid sites;
  at a Grid site, an HTCondor CPU handler runs the user job]




From the user point of view
 ●   Users see just a “regular” HTCondor system
      ●   Just a dynamic one
 ●   However
      ●   Have to be aware of the resource provisioning logic
           –     No native HTCondor tools to help with this
      ●   Debugging system problems much harder
           –     Again, no native HTCondor tools to help with this
           –     Most likely question a user will ask is
                 “Why is my job not starting?”




A few more details
 ●   So, glideinWMS is really HTCondor++
      ●   First you have to understand how HTCondor works
      ●   If you do, 99% of the problems are solved

 ●   HTCondor is composed of 3 logical pieces
      ●   Submit nodes – keep the job queue(s)
      ●   Execute nodes – own and operate the resources
      ●   A central manager – glues the other two categories
                             together and executes policies



HTCondor in a picture

 [Figure: several submit nodes and several execute nodes, all running Condor,
  connected through a central manager that also runs Condor]




Even more details
 ●   The actual work quanta are
      ●   Jobs – on the submit node, typically many per node
      ●   Slots – on the execute node, typically only a few per node
 ●   While internally implemented differently,
     jobs and slots are conceptually very similar
      ●   Both describe a logical entity
      ●   Both have attributes describing it (a “ClassAd” in HTCondor speak)
      ●   Both have requirements
 ●   HTCondor policy engine is really all about
     matchmaking jobs to slots

Matchmaking
 ●   Jobs that are not running
     (i.e. are “idle” in HTCondor speak)
     will be matched against Slots
     that don't yet run anything
     (i.e. are “Unclaimed” in HTCondor speak)
 ●   Requirements expressions can (and usually do)
     reference attributes in the other ClassAd
 ●   Both sides must evaluate to True for a match
      ●   Although we encourage all logic to reside in the
          Slot Requirement in glideinWMS setups
          (more on this later on)



A non-technical example

  Buyer Ad
    MyType = "Buyer"
    TargetType = "Pet"
    Requirements =
      (PetType == "Dog") &&
      (Price <= AcctBalance) &&
      (Size == "Large" || Size == "Very Large")
    AcctBalance = 1000
    DogLover = True
    LegalName = "Curly Howard"
    ...

  Pet Ad
    MyType = "Pet"
    TargetType = "Buyer"
    Requirements =
      DogLover == True
    PetType = "Dog"
    Color = "Brown"
    Price = 75
    Breed = "Saint Bernard"
    Size = "Very Large"
    ...

  Buyer ~= Job
  Dog == Resource ~= Slot
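
The two-sided match in this example can be sketched in Python. This is only an illustration: plain dicts stand in for ClassAds and Python callables stand in for the Requirements expressions. It is not the HTCondor ClassAd language or any HTCondor API, and the helper name `matches` is made up for this sketch.

```python
# Illustrative sketch only: dicts stand in for ClassAds, callables stand in
# for Requirements expressions. Not the HTCondor ClassAd language or API.

buyer = {
    "MyType": "Buyer",
    "TargetType": "Pet",
    "AcctBalance": 1000,
    "DogLover": True,
    "LegalName": "Curly Howard",
    # Requirements may reference attributes of the other ad ("target").
    "Requirements": lambda my, target: (
        target.get("PetType") == "Dog"
        and target.get("Price", float("inf")) <= my["AcctBalance"]
        and target.get("Size") in ("Large", "Very Large")
    ),
}

pet = {
    "MyType": "Pet",
    "TargetType": "Buyer",
    "PetType": "Dog",
    "Color": "Brown",
    "Price": 75,
    "Breed": "Saint Bernard",
    "Size": "Very Large",
    "Requirements": lambda my, target: target.get("DogLover") is True,
}


def matches(ad_a, ad_b):
    """Return True only if each ad's Requirements accept the other ad."""
    return bool(ad_a["Requirements"](ad_a, ad_b)) and bool(
        ad_b["Requirements"](ad_b, ad_a)
    )


print(matches(buyer, pet))
```

If either side's Requirements evaluate to False (for example, a buyer whose AcctBalance is below the Price), there is no match.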

Matching order
 ●   Most of the time, there are
     way more idle jobs than Unclaimed slots
      ●   So order is important
 ●   Two policies
      1) Jobs from highest priority user first
      2) Priority-FIFO policy for jobs of the same user
 ●   User priority based on usage
      ●   The more resources you use, the lower the priority
          (with priority recovery over time)
      ●   But some users may be marked as “more important”
          (priority multipliers and group quotas)
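
A minimal sketch of the two ordering policies, in Python. It is a deliberate simplification and not how the HTCondor negotiator is implemented: the `IdleJob` class, the `matching_order` function, and the numbers are all hypothetical. It assumes the HTCondor convention that a lower numeric user priority value is better, and that a higher per-job priority runs first.

```python
from dataclasses import dataclass


@dataclass
class IdleJob:
    user: str
    job_prio: int   # per-job priority chosen by the user; higher runs first
    queued_at: int  # submission order (or timestamp); lower means older


def matching_order(jobs, user_prio):
    """Order idle jobs: best user first, then priority-FIFO within a user.

    user_prio maps user -> numeric value where a LOWER value is better
    (heavier recent usage pushes the value up).
    """
    return sorted(
        jobs,
        key=lambda j: (user_prio[j.user], j.user, -j.job_prio, j.queued_at),
    )


jobs = [
    IdleJob("alice", job_prio=0, queued_at=1),
    IdleJob("bob", job_prio=0, queued_at=2),
    IdleJob("alice", job_prio=5, queued_at=3),
    IdleJob("bob", job_prio=0, queued_at=4),
]
# Hypothetical values: bob has used fewer resources recently than alice.
user_prio = {"alice": 20.0, "bob": 1.5}

ordered = matching_order(jobs, user_prio)
for job in ordered:
    print(job.user, job.job_prio, job.queued_at)
```

Here bob's jobs are considered first, in submission order; alice's higher-priority job is then considered before her older one.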


The glideinWMS part
 ●   i.e. the layer on top of HTCondor that
     finds resources on which to start the
     “Execute node” daemons
      ●   i.e. glideins
 ●   Composed of two parts:
      ●   Glidein Factory – The abstraction layer
      ●   VO Frontend     – The brain




Glidein Factory
●   The splitting in two allows for the
    Glidein Factory to be generic (i.e. serve many VO FEs)
     ●   I.e. it can (and should) be shared between many VOs
●   The G.F. is really just an abstraction layer
     ●   It insulates the VO from the provisioning details
           –     e.g. knowing the name of the Grid CE and relative RSL
     ●   Allows new technology to be added seamlessly
         (e.g. Clouds)
●   It also provides a troubleshooting service
     ●   The factory operators are supposed to
         address any Grid related problems they observe

The VO Frontend
 ●   The name may be misleading
      ●   It is really the “matchmaker of Grid resources”
 ●   Introduces a new quantum
      ●   Entry – logical equivalent of a “queue at a site”;
                  the basic working block of a G.F.
 ●   The VO Frontend
      1) Matches idle Jobs to Entries
      2) Instructs the affected G.F. to
         increase or decrease the number
         of glideins on that Entry
      ●   Thus it regulates the resource provisioning
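
The two Frontend steps can be sketched roughly as follows, in Python. This is a toy model under stated assumptions, not the actual glideinWMS Frontend algorithm: the function `glidein_requests`, the attribute names `DesiredSites` and `Site`, the even split of demand between matching Entries, and the per-Entry cap are all made up for illustration.

```python
import math
from collections import defaultdict


def glidein_requests(idle_jobs, entries, job_matches_entry, max_per_entry=10):
    """Return {entry name: number of glideins to request from the factory}.

    Each idle job contributes one unit of demand, split evenly among all
    Entries it matches, so a job that could run at several sites does not
    trigger a full glidein at each of them.
    """
    demand = defaultdict(float)
    for job in idle_jobs:
        matched = [e for e in entries if job_matches_entry(job, e)]
        for entry in matched:
            demand[entry["Name"]] += 1.0 / len(matched)
    return {
        e["Name"]: min(max_per_entry, math.ceil(demand[e["Name"]]))
        for e in entries
    }


# Hypothetical Entries and jobs; attribute names are assumptions.
entries = [
    {"Name": "entry_A", "Site": "SiteA"},
    {"Name": "entry_B", "Site": "SiteB"},
]
idle_jobs = (
    [{"DesiredSites": {"SiteA"}} for _ in range(3)]
    + [{"DesiredSites": {"SiteA", "SiteB"}} for _ in range(2)]
)


def match(job, entry):
    # Step 1: match an idle job to an Entry using job attributes only.
    return entry["Site"] in job["DesiredSites"]


# Step 2: the per-Entry numbers the Frontend would pass to the factory.
requests = glidein_requests(idle_jobs, entries, match)
print(requests)
```

With no idle jobs in the queue, every Entry gets a request of zero, which corresponds to letting the overlay shrink.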

Updated glideinWMS picture
 [Figure: the VO FE sends glidein requests (e.g. +3, +1) to the G.F.s, which
  submit glideins to the Grid; the HTCondor pool is as before: submit nodes,
  a central manager and execute nodes, all running Condor]




VO FE Matchmaking logic
 ●   Based on Job attributes
      ●   Jobs don't have “FE-specific requirements”
 ●   The exact matchmaking policy
     depends on the VO FE instance
      ●   CMS policies will be described in a different talk
 ●   glideinWMS has 2 level matchmaking
      ●   Once in the FE, then in the HTCondor C.M.
      ●   Recommended to avoid explicit
          “HTCondor requirements” in the Job ClassAd
           –     The glideins should set “Slot requirements” based on the
                 same attributes used by the VO FE, instead
                 (since the VO FE configures the glideins)
What is the user to do?
 0) Learn how to use HTCondor
 1) Learn what the VO FE policy is
 2) Create the HTCondor submit file (i.e. JDL)
   containing the necessary attributes
 3) Submit jobs
 4) Wait for the results to come back
 5) Rinse and repeat (from (2))
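
As a rough illustration of step (2), the Python sketch below generates the text of an HTCondor submit file that carries a custom job attribute. The attribute name `DesiredSites`, its value, the executable name and the helper `make_submit_description` are hypothetical; the attributes that actually matter are defined by the policy of your VO FE.

```python
def make_submit_description(executable, n_jobs, custom_attrs):
    """Build HTCondor submit file text with extra Job ClassAd attributes."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = job.$(Cluster).$(Process).out",
        "error      = job.$(Cluster).$(Process).err",
        "log        = job.$(Cluster).log",
    ]
    # A leading "+" adds a custom attribute to the Job ClassAd.
    # String values must be quoted; numbers and booleans are written bare.
    for name, value in custom_attrs.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"+{name} = {rendered}")
    lines.append(f"queue {n_jobs}")
    return "\n".join(lines) + "\n"


# Hypothetical attribute: which sites the VO FE should consider for the job.
text = make_submit_description(
    "analyze.sh", 10, {"DesiredSites": "SiteA,SiteB"}
)
print(text)
```

The resulting text can be saved to a file and passed to condor_submit (step 3). Note that it contains no explicit requirements line, in keeping with the recommendation to let the VO FE and the glideins do the matching based on job attributes.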




This is it


             ●   Hopefully you have a high level view of
                 the system now
             ●   More details in separate talks




Pointers
 ●   glideinWMS Home Page
     http://tinyurl.com/glideinWMS
 ●   HTCondor Home Page
     http://research.cs.wisc.edu/htcondor/
 ●   HTCondor support
     htcondor-users@cs.wisc.edu
     htcondor-admin@cs.wisc.edu
 ●   glideinWMS support
     glideinwms-support@fnal.gov


Acknowledgments
 ●   The creation of this document was sponsored
     by grants from the US NSF and US DOE,
     and by the University of California system




