glideinWMS for users



    Matchmaking in glideinWMS
             in CMS
                     by Igor Sfiligoi (UCSD)




CERN, Dec 2012          glideinWMS matchmaking   1
Scope of this talk



This talk provides a high-level description of how glideinWMS matchmaking works in CMS.

The reader is expected to be familiar with the CMS experiment environment:
http://cms.web.cern.ch/
glideinWMS architecture
 ●   A reminder of the overall layout

[Diagram: the VO Frontend (VO FE) asks the Glidein Factories (G.F.) to provision glideins on Grid execute nodes; the Submit nodes (Schedd) and the Central manager (Negotiator) form the HTCondor pool that the glideins join.]
Two levels of matchmaking
 ●   First in the VO Frontend
      ●   To decide where to provision resources
      ●   i.e. where to send glideins
 ●   Then in the HTCondor Negotiator
      ●   To decide which Job gets the glidein Slot

The two must have compatible policies.

[Diagram: same architecture as the previous slide, with the VO FE and the Negotiator highlighted as the two matchmaking points.]
Defining the policy
 ●    The VO FE configures the glideins
       ●   So it can define the Slot Requirements
 ●    The preferred strategy is to leave all policy
      decisions in the VO FE's hands, i.e. both
       ●   VO FE matchmaking policy
       ●   HTCondor matchmaking policy
           (easier to keep them in sync this way)
 ●    This implies
       ●   Users should not define Job Requirements
       ●   Instead, publish attributes describing requirements
     http://www.slideshare.net/igor_sfiligoi/condor-week-12-attribute-matchmaking-move-req-out-of-user-hands


CMS Production @ CERN
                Policies




Description
 ●   The VO FE @ CERN serves
     the production needs
      ●   i.e. Reconstruction and MC production
 ●   Job submission is regulated by a service managed
     by a dedicated team, so jobs are
      ●   Targeted
      ●   Well behaved
          (at least by and large)
Matchmaking policy
 ●   Two dimensions
      ●   Grid Site
      ●   Single CPU vs HTPC
 ●   The actual policy is the AND of both
 ●   Both VO FE policy and HTCondor policy
     defined in the VO FE instance configuration




Matching on Grid site name
 ●   User Jobs are expected to publish the attribute
     DESIRED_Sites               (string list)
      ●   e.g. +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
 ●   The G.F. and the glideins advertise
     GLIDEIN_CMSSite
 ●   The matchmaking policy is
     GLIDEIN_CMSSite ∈ DESIRED_Sites
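This membership test can be sketched in Python (illustrative only: in the real system it is a ClassAd expression evaluated by the VO FE and the HTCondor Negotiator, and the function signature here is an assumption):

```python
def site_matches(glidein_cms_site, desired_sites):
    """Site policy: GLIDEIN_CMSSite must be one of the entries in the
    job's DESIRED_Sites attribute (a comma-separated string list)."""
    wanted = {s.strip() for s in desired_sites.split(",") if s.strip()}
    return glidein_cms_site in wanted
```

For example, a glidein at T2_US_UCSD matches a job declaring +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD", while one at T1_US_FNAL does not.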




Matching on Job Type
 ●   User Jobs can publish the attribute
     DESIRES_HTPC            (integer representation of a Boolean value)
      ●   e.g. +DESIRES_HTPC = 1
      ●   If not defined, defaults to 0
 ●   The G.F. and the glideins may advertise
     GLIDEIN_Is_HTPC          Boolean value

      ●   If not defined, defaults to False
 ●   The matchmaking policy is
     (GLIDEIN_Is_HTPC==True)==(DESIRES_HTPC==1)
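The equality-of-booleans policy above can be sketched as a toy Python function (the real evaluation happens in ClassAds; the defaults mirror the slide):

```python
def htpc_matches(glidein_is_htpc=False, desires_htpc=0):
    """Job-type policy: an HTPC (whole-node) job matches only an HTPC
    glidein, and a single-CPU job only a non-HTPC one. Defaults follow
    the slide: DESIRES_HTPC -> 0, GLIDEIN_Is_HTPC -> False when unset."""
    return (glidein_is_htpc is True) == (desires_htpc == 1)
```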


Example submit file


         Universe   = vanilla
         Executable = mcgen
         Arguments  = -k 1543.3
         Output     = mcgen.out
         Error      = mcgen.err
         Log        = mcgen.log
         +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
         +DESIRES_HTPC = 0
         Requirements = True
         Queue 1
CMS AnaOps @ UCSD
                      Policies




Description
 ●   VO FE @ UCSD serves CMS analysis users
 ●   User Jobs are much more chaotic
      ●   Most users don't really understand their needs
      ●   Must protect from accidental errors
      ●   Yet keep the system flexible
 ●   Net result
      ●   More complex policy



Two different policies
 ●   The AnaOps FE actually has two policies
      ●   The Regular policy
      ●   The Overflow policy
 ●   The Regular policy tries to match resources
      ●   Based on User desires
 ●   The Overflow policy “outsmarts” the Users
      ●   Will violate User desires without breaking the Jobs
      ●   The aim is to finish user jobs sooner
      ●   Users can opt out, if they wish
The Regular M.M. policy
 ●   Four+one dimensions
      ●   Grid Site
      ●   Single CPU vs HTPC
      ●   Memory usage
      ●   Job duration
      ●   Number of Job Starts
          (due to preemption)
 ●   The actual policy is the AND of all of them
 ●   Both VO FE policy and HTCondor policy
     defined in the VO FE instance configuration
Grid site selection
 ●   This is both similar and different compared to
     the Production FE @CERN
      ●   Serves the same purpose, but supports three
          different ways to select a site
           –     Due to historical evolution
 ●   The three options are
      ●   GLIDEIN_CMSSite ∈ DESIRED_Sites
      ●   GLIDEIN_SEs ∈ DESIRED_SEs
          (planning to extend to (GLIDEIN_SEs ∩ DESIRED_SEs) ≠ ∅)
      ●   GLIDEIN_Gatekeeper ∈ DESIRED_Gatekeepers
 ●   The actual policy is the OR of the three
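The OR of the three tests can be sketched in Python (the dict-based ads and attribute handling are a simplification of real ClassAds; attribute names follow the slides):

```python
def _csv_set(value):
    """Split a comma-separated attribute string into a set of names."""
    return {s.strip() for s in value.split(",") if s.strip()}

def regular_site_matches(glidein, job):
    """AnaOps Regular site policy: the OR of the three historical
    site-selection tests (site name, storage element, gatekeeper)."""
    return (
        glidein.get("GLIDEIN_CMSSite") in _csv_set(job.get("DESIRED_Sites", ""))
        or glidein.get("GLIDEIN_SEs") in _csv_set(job.get("DESIRED_SEs", ""))
        or glidein.get("GLIDEIN_Gatekeeper") in _csv_set(job.get("DESIRED_Gatekeepers", ""))
    )
```

A job may thus specify any one of the three attribute families and still match.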

Job type selection
 ●   Just like @ CERN




Memory Usage
●   Most Grid sites put strict limits on the amount of
    memory that can be used
    ●   Will kill glideins if they exceed the limit
●   G.F. and glideins advertise the Entry-specific limit
    GLIDEIN_MaxMemMBs
●   Jobs can explicitly declare the needed memory via
    request_memory   (native Condor attribute, no + needed)
     ●   Condor will also measure it at run time
          –   ImageSize – Virtual memory used
          –   ResidentSetSize – True memory usage
     ●   A combination of these is used to calculate the actual JobMemory
●   Policy: JobMemory <= GLIDEIN_MaxMemMBs
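A minimal Python sketch of this policy follows. Note the slide does not specify how the measurements are combined into JobMemory; taking the larger of ImageSize and ResidentSetSize (both reported by Condor in KB) is an illustrative assumption, not the production formula:

```python
def memory_matches(job, glidein_max_mem_mbs):
    """Memory policy: JobMemory <= GLIDEIN_MaxMemMBs. JobMemory comes
    from request_memory (MB) when the user set it; otherwise it is
    estimated here from Condor's run-time measurements (KB).
    The max-of-the-two combination is an assumption for this sketch."""
    if "request_memory" in job:
        job_memory_mb = job["request_memory"]
    else:
        measured_kb = max(job.get("ImageSize", 0), job.get("ResidentSetSize", 0))
        job_memory_mb = measured_kb // 1024
    return job_memory_mb <= glidein_max_mem_mbs
```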
Job Duration                  1/2




 ●   Glideins have a limited lifetime
      ●   Must fit within the limits of the Grid site's queue
      ●   Glideins publish the deadline
          GLIDEIN_ToDie
           –     Jobs must finish before reaching the deadline
 ●   Final user job lifetime is unpredictable
      ●   Depends on the type of computing done
      ●   Users should indicate the expected job lifetime
           –     Else we have to assume reasonable defaults
                 (not many users set these values right now)
Job Duration                  2/2




 ●   The same type of computation may take
     a different amount of time
      ●   e.g. Based on the type of input
 ●   Jobs can declare two attributes
      ●   NormMaxWallTimeMins – Expected limit
      ●   MaxWallTimeMins – Absolute max limit
 ●   The matchmaking logic is
      ●   Use NormMaxWallTimeMins for the first job startup
      ●   Use MaxWallTimeMins for all other startups
          (based on the simple assumption that the job was
          killed for hitting the deadline)
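Combining this with the GLIDEIN_ToDie deadline from the previous slide, the duration check can be sketched as follows (using NumJobStarts as the restart counter and Unix-timestamp arithmetic are assumptions for this sketch):

```python
import time

def duration_matches(job, glidein_to_die, now=None):
    """Duration policy: the job's declared walltime must fit before the
    glidein's GLIDEIN_ToDie deadline (a Unix timestamp). A job that has
    never started is judged by NormMaxWallTimeMins; one that already
    started (presumably killed at a deadline) by MaxWallTimeMins."""
    now = time.time() if now is None else now
    if job.get("NumJobStarts", 0) == 0:
        walltime_mins = job["NormMaxWallTimeMins"]
    else:
        walltime_mins = job["MaxWallTimeMins"]
    return now + walltime_mins * 60 <= glidein_to_die
```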

Cut on number of re-starts
 ●   Not really a user-configurable property
      ●   More of an emergency brake
 ●   In a properly configured system,
     should never be triggered
      ●   But unexpected problems happen
      ●   So better limit the damage




The Overflow Use case
 ●   User Jobs specify a list of sites,
     because the data they need is there
 ●   With recent versions of CMSSW, jobs can
     access the data remotely
      ●   With a small performance penalty
 ●   We can thus schedule jobs “anywhere”
      ●   As long as the needed data is
          at a Site that has joined the xrootd federation
      ●   But only if no CPU available “close to the data”
           –     And not too far, either
                      http://indico.cern.ch/contributionDisplay.py?contribId=381&sessionId=5&confId=149557
                      http://indico.cern.ch/contributionDisplay.py?contribId=232&sessionId=8&confId=149557

The Overflow M.M. policy
 ●   Violate only the “Site selection” rule
      ●   Keep all the others
 ●   Plus, add one+one more:
      ●   An opt-out mechanism
      ●   Delayed matching




New Site M.M. policy
 ●   The user-specified attribute is used
     to flag the job as “Overflowable”
      ●   i.e. the job will match if and only if
          (DESIRED_<site>s ∩ SUPPORTED_<site>s) ≠ ∅
          (still supporting all 3 types of site identification)
 ●   Matching jobs can then run on any glidein
      ●   Additional limits can be put in place by the FE,
          but mostly invisible to the user
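The intersection test can be sketched in Python (illustrative; the SUPPORTED_<site>s list would come from the FE configuration, and the comma-separated string format mirrors the earlier slides):

```python
def overflow_site_matches(desired_sites, supported_sites):
    """Overflow site policy: a job is 'Overflowable' iff its desired
    site list intersects the list of sites supported by the xrootd
    federation; the same test applies to the SE and gatekeeper
    variants. Both arguments are comma-separated strings."""
    desired = {s.strip() for s in desired_sites.split(",") if s.strip()}
    supported = {s.strip() for s in supported_sites.split(",") if s.strip()}
    return bool(desired & supported)
```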




The opt-out mechanism
 ●   The Overflow policy
     considers all jobs by default
      ●   But Users may want to opt out some of the Jobs
           –     Sometimes it is a genuine need
                 (e.g. to get deterministic results when testing a site)
 ●   To opt-out, the user defines
     +CMS_ALLOW_OVERFLOW = False
      ●   The FE will not consider such jobs for Overflowing




Delayed matching
 ●   As said initially,
     Jobs should preferentially run close to the data
      ●   Overflow should only consider jobs
          “that cannot find resources close to the data”
 ●   We implemented it based on time
      ●   Jobs are matched for Overflow only
          if they have been waiting in the queue for more than 6 hours
          (users cannot influence this)
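Together with the opt-out attribute from the previous slide, the eligibility gate can be sketched as (QDate is HTCondor's queue-entry timestamp; the dict-based job ad is a simplification of a real ClassAd):

```python
def overflow_eligible(job, now, min_wait_hours=6):
    """Delayed matching: consider a job for Overflow only after it has
    been idle in the queue for more than 6 hours, and only if the user
    did not opt out with +CMS_ALLOW_OVERFLOW = False."""
    if not job.get("CMS_ALLOW_OVERFLOW", True):
        return False
    return now - job["QDate"] > min_wait_hours * 3600
```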




Example submit file

 Universe   = vanilla
 Executable = myana
 Arguments  = -k 1543.3
 Output     = myana.out
 Error      = myana.err
 Log        = myana.log
 request_memory = 1500
 +DESIRED_SEs = "dc2-grid-64.brunel.ac.uk,stormfe1.pi.infn.it"
 +NormMaxWallTimeMins = 7200
 +MaxWallTimeMins = 14400
 +DESIRES_HTPC = 0
 +CMS_ALLOW_OVERFLOW = True
 Requirements = True
 Queue 1
The End




Pointers
 ●   glideinWMS Home Page
     http://tinyurl.com/glideinWMS
 ●   HTCondor Home Page
     http://research.cs.wisc.edu/htcondor/
 ●   HTCondor support
     htcondor-users@cs.wisc.edu
     htcondor-admin@cs.wisc.edu
 ●   glideinWMS support
     glideinwms-support@fnal.gov

Acknowledgments
 ●   The creation of this document was sponsored
     by grants from the US NSF and US DOE,
     and by the University of California system




