Job Submission Plugins

Direct CREAM Plugin

Metrics

Name             Description                                          Frequency
DirectJobState   [SAM:Active+Passive]. Submits grid job to CE.        every hour
DirectJobMonit   [SAM:Active]. Monitors grid jobs submitted to CEs.   every 5 min
DirectJobSubmit  [SAM:Passive]. Holds terminal status of submission.  -

Note that an active job (e.g. a previously submitted test that has not yet reached a terminal state) prevents the submission of new tests for the same service. Active job attributes can be found in the /<metric work dir>/activejob.map file.

Timeouts

CREAM Job State Machine: [https://wiki.italiangrid.it/twiki/bin/view/CREAM/UserGuideEMI2#CREAM_job_states]

DirectJobState timeout options:

--timeout-job-discard <sec> Discard job after the timeout.
                        (Default: 21600)
--timeout-job-registered <sec> Time allowed for a job to stay in Registered status.
                         (Default: 120)
--timeout-job-pending <sec> Time allowed for a job to stay in Pending status.
                        (Default: 300)
--timeout-job-idle <sec> Time allowed for a job to stay in Idle status.
                        (Default: 2700)
--timeout-job-running <sec> Time allowed for a job to stay in Running status.
                        (Default: 3000)
--timeout-job-really-running <sec> Time allowed for a job to stay in Really-Running status.
                        (Default: 3300)
--timeout-job-held <sec> Time allowed for a job to stay in Held status.
                        (Default: 3540)
../_images/creamtimeout.jpg
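
The interplay between the per-state limits and the global discard timeout can be pictured with a minimal Python sketch. It is not the plugin's actual code: the helper name check_timeouts, the returned labels, and the assumption that each limit is measured as time spent in the current state are all hypothetical; only the default values are taken from the option list above.

```python
# Default per-state limits in seconds, copied from the option list above.
CREAM_STATE_TIMEOUTS = {
    "REGISTERED": 120,
    "PENDING": 300,
    "IDLE": 2700,
    "RUNNING": 3000,
    "REALLY-RUNNING": 3300,
    "HELD": 3540,
}
JOB_DISCARD_TIMEOUT = 21600  # default of --timeout-job-discard


def check_timeouts(state, seconds_in_state, seconds_since_submit,
                   state_timeouts=CREAM_STATE_TIMEOUTS,
                   discard_timeout=JOB_DISCARD_TIMEOUT):
    """Hypothetical helper: classify a job as 'discard', 'state-timeout' or 'ok'.

    Assumption: per-state limits count the time spent in the current state,
    while the discard limit counts the time since submission.
    """
    if seconds_since_submit > discard_timeout:
        return "discard"
    limit = state_timeouts.get(state.upper())
    if limit is not None and seconds_in_state > limit:
        return "state-timeout"
    # States without a configured limit are never flagged by this sketch.
    return "ok"
```

For example, a job that has been Pending for 301 seconds would be flagged as "state-timeout" under these assumptions, while a job submitted more than 21600 seconds ago would be classified as "discard" regardless of its state.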

CE Queue selection

If no resource is specified via the DirectJobState option:

--resource <URI>       CREAM CE to send job to. Format :
                   <host>[:<port>]/cream-<lrms-system-name>-<queue-name>

the queue is selected by a BDII lookup that matches the hostname and VO/Role (*) among queues in Production status, ordered by the lowest GlueCEStateEstimatedResponseTime. The selection algorithm can be tuned via the options:

--get-resource-by-max <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with highest value will be used for submission.
--get-resource-by-min <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with lowest value will be used for submission.
                    (Default attribute: GlueCEStateEstimatedResponseTime)
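
As an illustration of the ordering options above, the following Python sketch picks one candidate by the lowest or highest value of a BDII attribute. It is an assumption-laden sketch, not the plugin's implementation: the function name pick_resource, the representation of candidates as dictionaries, and the numeric conversion of attribute values are all hypothetical.

```python
def pick_resource(candidates,
                  attribute="GlueCEStateEstimatedResponseTime",
                  use_max=False):
    """Hypothetical helper: choose one candidate queue by a BDII attribute.

    candidates: list of dicts mapping BDII attribute names to values.
    use_max=False mirrors --get-resource-by-min, use_max=True mirrors
    --get-resource-by-max. Candidates lacking the attribute are skipped.
    """
    usable = [c for c in candidates if attribute in c]
    if not usable:
        return None

    def sort_key(candidate):
        # Assumption: the attribute holds a numeric value stored as text.
        return float(candidate[attribute])

    return max(usable, key=sort_key) if use_max else min(usable, key=sort_key)
```

With the default attribute, a queue reporting an estimated response time of 45 would be preferred over one reporting 300; passing use_max=True reverses the choice.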

(*) VO/Role selection algorithm:

  • If an FQAN is specified as a probe argument, the discovery algorithm looks for a resource (queue) with a compatible GlueCEAccessControlBaseRule. Example of the BDII attribute:

    GlueCEAccessControlBaseRule: VOMS:/lhcb/Role=pilot
    
  • If no match is found or no FQAN is specified, the algorithm looks for a VO-compatible resource, e.g.

    GlueCEAccessControlBaseRule: VOMS:/lhcb
    
    or
    
    GlueCEAccessControlBaseRule: VO:lhcb
    

The LDAP queries below mimic the algorithm's behaviour (change the hostname and VO/role accordingly):

  • With FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEImplementationName=CREAM)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(GlueCEAccessControlBaseRule=VOMS:/alice/Role=pilot))"
    
  • Without FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEImplementationName=CREAM)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(|(GlueCEAccessControlBaseRule=VOMS:/alice)(GlueCEAccessControlBaseRule=VO:alice)))"
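
The two query shapes above can also be composed programmatically. The Python sketch below only builds the LDAP filter string; the function name build_ce_filter and its parameters are hypothetical, and the fallback from the FQAN query to the VO query when nothing matches is left to the caller.

```python
def build_ce_filter(hostname, vo, fqan=None, implementation="CREAM"):
    """Hypothetical helper: build the BDII filter used to discover queues.

    fqan: e.g. "/alice/Role=pilot"; when omitted, the VO-level rules
    VOMS:/<vo> and VO:<vo> are matched instead.
    implementation: value for GlueCEImplementationName, or None to omit it.
    """
    clauses = ["(objectClass=GlueCE)"]
    if implementation:
        clauses.append(f"(GlueCEImplementationName={implementation})")
    clauses.append("(GlueCEStateStatus=Production)")
    clauses.append(f"(GlueCEInfoHostName={hostname})")
    if fqan:
        clauses.append(f"(GlueCEAccessControlBaseRule=VOMS:{fqan})")
    else:
        clauses.append(
            f"(|(GlueCEAccessControlBaseRule=VOMS:/{vo})"
            f"(GlueCEAccessControlBaseRule=VO:{vo}))"
        )
    # All clauses are AND-ed together, as in the ldapsearch examples.
    return "(&" + "".join(clauses) + ")"
```

Calling build_ce_filter("ce-iep-grid.saske.sk", "alice", fqan="/alice/Role=pilot") yields the same filter as the "With FQAN" query, and omitting fqan yields the "Without FQAN" one.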
    

WN Tests Payload

The DirectJobState metric accepts the following CLI parameters:

--add-wntar-nag <d1,d2,..> Comma-separated list of top level directories with
                    Nagios compliant directories structure to be added
                    to tarball to be sent to WN.
--add-wntar-nag-nosam Instructs the metric not to include standard SAM WN
                    probes and their Nagios config to WN tarball.
                    (Default: WN probes are included)
--add-wntar-nag-nosamcfg Instructs the metric not to include Nagios
                    configuration for SAM WN probes to WN tarball. The
                    probes themselves and respective Python packages,
                    however, will be included.

With the --add-wntar-nag <d1,d2,..> parameter, each “Nagios compliant directory structure” should look like this:

[kvs] ~ tree /path/to/your/probes/wnjob/org.my/
/path/to/your/probes/wnjob/org.my/
|-- etc
|   `-- wn.d
|       `-- org.my
|           |-- commands.cfg
|           `-- services.cfg
`-- probes
    `-- org.my
        |-- check_A
        |-- check_B
        `-- checks_lib.sh

  • probes/org.my/* should contain your probes/checks.

  • etc/wn.d/org.my/ should contain one or more files with the .cfg extension holding Nagios command and service object definitions (and, optionally, service dependency definitions). In your etc/wn.d/org.my/*.cfg files, use the following Nagios macro and framework template name when defining paths:

  • $USER3$ - macro defining the path to the <nagiosRoot>/probes/ directory on the WN. Usage:

    define command{
        command_name check_A1
        command_line $USER3$/org.my/check_A
    }
    
  • <wnjobWorkDir> - is substituted with the job’s working directory on the WN. This is handy if your check needs to create its own working directory. Possible usage (assuming -w instructs check_A to create the <wnjobWorkDir>/.mygridprobes directory):

    define command{
        command_name check_A2
        command_line $USER3$/org.my/check_A -w <wnjobWorkDir>/.mygridprobes
        }
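
Before passing a directory to --add-wntar-nag, its layout can be checked locally. The Python sketch below is a hypothetical pre-flight check, not part of the framework: the function name check_wn_layout and the rule that every namespace under etc/wn.d (org.my in the example) needs a matching directory under probes are assumptions drawn from the tree shown above.

```python
from pathlib import Path


def check_wn_layout(top_dir):
    """Hypothetical check: list problems in a --add-wntar-nag directory.

    Returns an empty list when the layout matches the expected shape:
    etc/wn.d/<namespace>/*.cfg plus probes/<namespace>/.
    """
    top = Path(top_dir)
    problems = []
    for rel in ("etc/wn.d", "probes"):
        if not (top / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    if problems:
        return problems

    namespaces = sorted(
        d.name for d in (top / "etc/wn.d").iterdir() if d.is_dir()
    )
    if not namespaces:
        problems.append("no namespace directory under etc/wn.d")
    for ns in namespaces:
        # Each namespace needs at least one Nagios .cfg file ...
        if not any((top / "etc/wn.d" / ns).glob("*.cfg")):
            problems.append(f"no .cfg files in etc/wn.d/{ns}")
        # ... and, by assumption, a matching probes directory.
        if not (top / "probes" / ns).is_dir():
            problems.append(f"missing directory: probes/{ns}")
    return problems
```

An empty result suggests the directory has the expected shape; it does not validate the Nagios object definitions themselves.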
    

Direct ARC Plugin

Metrics

Name             Description                                          Frequency
DirectJobState   [SAM:Active+Passive]. Submits grid job to CE.        every hour
DirectJobMonit   [SAM:Active]. Monitors grid jobs submitted to CEs.   every 5 min
DirectJobSubmit  [SAM:Passive]. Holds terminal status of submission.  -

Note that an active job (e.g. a previously submitted test that has not yet reached a terminal state) prevents the submission of new tests for the same service. Active job attributes can be found in the /<metric work dir>/activejob.map file.

Timeouts

The following timeouts are defined for ARC:

--timeout-job-discard <sec>   Global timeout for jobs. Job will be canceled
                              and dropped if it is not in terminal state by
                              that time. (Default: 86400)
--timeout-job-running <sec>   Time allowed for a job to stay in Running status.
                              (Default: 3000)
--timeout-job-queuing <sec>   Time allowed for a job to stay in queuing status.
                              (Default: 86400)
--timeout-job-preparing <sec> Time allowed for a job to stay in preparing status.
                              (Default: 1200)
--timeout-job-finishing <sec> Time allowed for a job to stay in finishing status.
                              (Default: 1200)

CE Queue selection

The queue is selected by a BDII lookup that matches the hostname and VO/Role (*) among queues in Production status, ordered by the lowest GlueCEStateEstimatedResponseTime. The selection algorithm can be tuned via the options:

--get-resource-by-max <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with highest value will be used for submission.
--get-resource-by-min <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with lowest value will be used for submission.
                    (Default attribute: GlueCEStateEstimatedResponseTime)

(*) VO/Role selection algorithm:

  • If an FQAN is specified as a probe argument, the discovery algorithm looks for a resource (queue) with a compatible GlueCEAccessControlBaseRule. Example of the BDII attribute:

    GlueCEAccessControlBaseRule: VOMS:/lhcb/Role=pilot
    
  • If no match is found or no FQAN is specified, the algorithm looks for a VO-compatible resource, e.g.

    GlueCEAccessControlBaseRule: VOMS:/lhcb
    
    or
    
    GlueCEAccessControlBaseRule: VO:lhcb
    

The LDAP queries below mimic the algorithm's behaviour (change the hostname and VO/role accordingly):

  • With FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEImplementationName=CREAM)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(GlueCEAccessControlBaseRule=VOMS:/alice/Role=pilot))"
    
  • Without FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEImplementationName=CREAM)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(|(GlueCEAccessControlBaseRule=VOMS:/alice)(GlueCEAccessControlBaseRule=VO:alice)))"
    

WN Tests Payload

The DirectJobState metric accepts the following CLI parameters:

--add-wntar-nag <d1,d2,..> Comma-separated list of top level directories with
                    Nagios compliant directories structure to be added
                    to tarball to be sent to WN.
--add-wntar-nag-nosam Instructs the metric not to include standard SAM WN
                    probes and their Nagios config to WN tarball.
                    (Default: WN probes are included)
--add-wntar-nag-nosamcfg Instructs the metric not to include Nagios
                    configuration for SAM WN probes to WN tarball. The
                    probes themselves and respective Python packages,
                    however, will be included.

With the --add-wntar-nag <d1,d2,..> parameter, each “Nagios compliant directory structure” should look like this:

[kvs] ~ tree /path/to/your/probes/wnjob/org.my/
/path/to/your/probes/wnjob/org.my/
|-- etc
|   `-- wn.d
|       `-- org.my
|           |-- commands.cfg
|           `-- services.cfg
`-- probes
    `-- org.my
        |-- check_A
        |-- check_B
        `-- checks_lib.sh

  • probes/org.my/* should contain your probes/checks.

  • etc/wn.d/org.my/ should contain one or more files with the .cfg extension holding Nagios command and service object definitions (and, optionally, service dependency definitions). In your etc/wn.d/org.my/*.cfg files, use the following Nagios macro and framework template name when defining paths:

  • $USER3$ - macro defining the path to the <nagiosRoot>/probes/ directory on the WN. Usage:

    define command{
        command_name check_A1
        command_line $USER3$/org.my/check_A
    }
    
  • <wnjobWorkDir> - is substituted with the job’s working directory on the WN. This is handy if your check needs to create its own working directory. Possible usage (assuming -w instructs check_A to create the <wnjobWorkDir>/.mygridprobes directory):

    define command{
        command_name check_A2
        command_line $USER3$/org.my/check_A -w <wnjobWorkDir>/.mygridprobes
        }
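
To make the <wnjobWorkDir> substitution concrete, here is a minimal Python sketch of what such a template expansion could look like. The function name render_wn_config and the plain string replacement are assumptions for illustration; the framework's real substitution mechanism is not described here. $USER3$ is deliberately left untouched, since it is a Nagios macro rather than a framework template name.

```python
def render_wn_config(cfg_text, wnjob_work_dir):
    """Hypothetical helper: expand <wnjobWorkDir> in Nagios config text.

    Only the framework template name is replaced; Nagios macros such as
    $USER3$ are left as they are.
    """
    return cfg_text.replace("<wnjobWorkDir>", wnjob_work_dir)
```

With a working directory of /tmp/job123 (an invented path), the check_A2 command line above would end in "-w /tmp/job123/.mygridprobes".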
    

Condor Plugin

Metrics

Name              Description                                                     Frequency
CONDOR-JobState   [SAM:Active+Passive]. Submits grid job to CE.                   every hour
CONDOR-JobMonit   [SAM:Active]. Monitors grid jobs submitted to CEs.              every 5 min
CONDOR-JobSubmit  [SAM:Passive]. Holds terminal status of job submission to CE.   -

Note that an active job (e.g. a previously submitted test that has not yet reached a terminal state) prevents the submission of new tests for the same service. Active job attributes can be found in the /<metric work dir>/activejob.map file.

Timeouts

Information on Condor Job State Machine: [http://research.cs.wisc.edu/htcondor/manual/v8.0/12_Appendix_A.html#sec:Job-ClassAd-Attributes]

CONDOR-JobState timeout options:

--timeout-job-discard <sec> Discard job after the timeout.
                        (Default: 21600)
--timeout-job-idle <sec> Time allowed for a job to stay in Idle status.
                        (Default: 2700)
--timeout-job-running <sec> Time allowed for a job to stay in Running status.
                        (Default: 3000)
--timeout-job-held <sec> Time allowed for a job to stay in Held status.
                        (Default: 3540)
../_images/condortimeout.jpg

JDL classads

Additional JDL classads can be specified via the following option:

--jdl-ads              Classads to add to the JDL

CE Queue selection

Explicit queue selection is possible via the --resource argument as follows:

--resource <URI>       CE to send job to. Format :
                   <type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>
                   Type is one of cream, arc, condor or gt. Schedd is mandatory for condor and
                   lrms is mandatory for cream, but both need to be specified for all types. Ports
                   default to arc:2811, cream:8443, gt:2119 and condor:9619.
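
The URI format and port defaults described above can be illustrated with a short Python sketch. It is not the plugin's parser: the function name parse_resource, the returned dictionary keys, and the choice to reject URIs that do not carry exactly three path segments are assumptions; the default ports are the ones listed above.

```python
from urllib.parse import urlsplit

# Default ports per CE type, as listed in the --resource description.
DEFAULT_PORTS = {"arc": 2811, "cream": 8443, "gt": 2119, "condor": 9619}


def parse_resource(uri):
    """Hypothetical parser for
    <type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>.
    """
    parts = urlsplit(uri)
    ce_type = parts.scheme
    if ce_type not in DEFAULT_PORTS:
        raise ValueError(f"unknown CE type: {ce_type!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    segments = parts.path.lstrip("/").split("/")
    if len(segments) != 3:
        # Assumption: schedd, lrms and queue must all be present.
        raise ValueError("expected /<schedd>/<lrms-system-name>/<queue-name>")
    schedd, lrms, queue = segments
    return {
        "type": ce_type,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[ce_type],
        "schedd": schedd,
        "lrms": lrms,
        "queue": queue,
    }
```

For instance, a condor URI without an explicit port resolves to port 9619 in this sketch, while an explicit port in the URI takes precedence over the default.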

Otherwise, the queue is selected by a BDII lookup that matches the hostname and VO (*) among queues in Production status, ordered by the lowest GlueCEStateEstimatedResponseTime. The selection algorithm can be tuned via the options:

--get-resource-by-max <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with highest value will be used for submission.
--get-resource-by-min <attribute> The BDII attribute used to order the candidate
                    resources retrieved from discovery. The one
                    with lowest value will be used for submission.
                    (Default attribute: GlueCEStateEstimatedResponseTime)

(*) VO/Role selection algorithm:

  • If an FQAN is specified as a probe argument, the discovery algorithm looks for a resource (queue) with a compatible GlueCEAccessControlBaseRule. Example of the BDII attribute:

    GlueCEAccessControlBaseRule: VOMS:/atlas/Role=pilot
    
  • If no match is found or no FQAN is specified, the algorithm looks for a VO-compatible resource, e.g.

    GlueCEAccessControlBaseRule: VOMS:/atlas
    
    or
    
    GlueCEAccessControlBaseRule: VO:atlas
    

The LDAP queries below mimic the algorithm's behaviour (change the hostname and VO/role accordingly):

  • With FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(GlueCEAccessControlBaseRule=VOMS:/atlas/Role=pilot))"
    
  • Without FQAN:

    ldapsearch -x -H ldap://sam-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid \
        "(&(objectClass=GlueCE)(GlueCEStateStatus=Production)(GlueCEInfoHostName=ce-iep-grid.saske.sk)(|(GlueCEAccessControlBaseRule=VOMS:/atlas)(GlueCEAccessControlBaseRule=VO:atlas)))"
    

WN Tests Payload

The CONDOR-JobState metric accepts the following CLI parameters:

--add-wntar-nag <d1,d2,..> Comma-separated list of top level directories with
                    Nagios compliant directories structure to be added
                    to tarball to be sent to WN.
--add-wntar-nag-nosam Instructs the metric not to include standard SAM WN
                    probes and their Nagios config to WN tarball.
                    (Default: WN probes are included)
--add-wntar-nag-nosamcfg Instructs the metric not to include Nagios
                    configuration for SAM WN probes to WN tarball. The
                    probes themselves and respective Python packages,
                    however, will be included.

With the --add-wntar-nag <d1,d2,..> parameter, each “Nagios compliant directory structure” should look like this:

[kvs] ~ tree /path/to/your/probes/wnjob/org.my/
/path/to/your/probes/wnjob/org.my/
|-- etc
|   `-- wn.d
|       `-- org.my
|           |-- commands.cfg
|           `-- services.cfg
`-- probes
    `-- org.my
        |-- check_A
        |-- check_B
        `-- checks_lib.sh

  • probes/org.my/* should contain your probes/checks.

  • etc/wn.d/org.my/ should contain one or more files with the .cfg extension holding Nagios command and service object definitions (and, optionally, service dependency definitions). In your etc/wn.d/org.my/*.cfg files, use the following Nagios macro and framework template name when defining paths:

  • $USER3$ - macro defining the path to the <nagiosRoot>/probes/ directory on the WN. Usage:

    define command{
        command_name check_A1
        command_line $USER3$/org.my/check_A
    }
    
  • <wnjobWorkDir> - is substituted with the job’s working directory on the WN. This is handy if your check needs to create its own working directory. Possible usage (assuming -w instructs check_A to create the <wnjobWorkDir>/.mygridprobes directory):

    define command{
        command_name check_A2
        command_line $USER3$/org.my/check_A -w <wnjobWorkDir>/.mygridprobes
        }