ETF is an end-to-end measurement middleware for determining service availability and reliability and for measuring performance metrics of services (such as ICMP responses, I/O latencies, etc.). It is based on Nagios and check_mk: it runs plugins to determine service state and performance, collects the results and streams them to an aggregation, visualization and reporting facility. The measurements themselves are taken by the plugins, which follow the monitoring plugins standard.
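To illustrate what following the monitoring plugins standard means in practice, here is a minimal sketch of a plugin: it reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and prints one line of text, optionally followed by `|` and performance data. The probe itself (`run_probe`), the `latency` label and the thresholds are hypothetical placeholders, not part of ETF.

```python
import time

# Exit codes defined by the monitoring plugins (Nagios) guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(latency_s, warn_s, crit_s):
    """Map a measured latency onto a plugin status code."""
    if latency_s >= crit_s:
        return CRITICAL
    if latency_s >= warn_s:
        return WARNING
    return OK


def format_output(status, latency_s, warn_s, crit_s):
    """Build the one-line output: human-readable text, then '|' and perfdata."""
    text = f"{STATUS_NAMES[status]} - probe took {latency_s:.3f}s"
    # Perfdata layout: label=value[UOM];warn;crit;min
    perfdata = f"latency={latency_s:.3f}s;{warn_s};{crit_s};0"
    return f"{text} | {perfdata}"


def run_probe():
    """Hypothetical stand-in for a real check (e.g. an API call); returns seconds."""
    start = time.monotonic()
    time.sleep(0.01)
    return time.monotonic() - start


def main(warn_s=1.0, crit_s=5.0):
    """Run the probe, print the plugin line and return the exit code."""
    try:
        latency = run_probe()
    except Exception as exc:
        # Anything that prevents a measurement is reported as UNKNOWN.
        print(f"UNKNOWN - probe failed: {exc}")
        return UNKNOWN
    status = classify(latency, warn_s, crit_s)
    print(format_output(status, latency, warn_s, crit_s))
    return status

# A real plugin script would finish by passing main()'s result to sys.exit().
```

The scheduler reads the exit code to determine the service state, while the text after `|` is what ends up as a performance metric.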
The following types of measurements are supported:
- Remote API testing - the most common type; runs a set of plugins to check the functionality of a remote API. This ranges from simple ping-like tests to more complicated workflows (such as job submission testing) where internal state needs to be persisted.
- Worker node testing - part of the job submission plugins, which deploy a micro-framework that executes the tests on the worker nodes and reports back the results.
- Local testing - not used at the moment, but a dedicated agent (check_mk agent) can be run to read out local metrics (usually performance metrics such as CPU, disk I/O, etc.) and collect/publish them together with the other tests.
ETF is currently operated as a service for WLCG experiments and WLCG operations coordination TFs/WGs. The following table summarizes the existing and planned deployments.
| Experiments / TFs | Production | QA |
|---|---|---|
| ATLAS, HTTP TF, RFC TF | etf-atlas-prod.cern.ch | etf-atlas-preprod.cern.ch |
| CMS, RFC TF | etf-cms-prod.cern.ch | etf-cms-preprod.cern.ch |
| LHCb, HTTP, RFC, MJF | etf-lhcb-prod.cern.ch | etf-lhcb-preprod.cern.ch |
| Alice, RFC TF | etf-alice-prod.cern.ch | etf-alice-preprod.cern.ch |
| | psomd.grid.iu.edu | perfsonar-itb.grid.iu.edu |
In addition, there is a central instance that provides an aggregated (site) view for ATLAS, CMS, LHCb and Alice at etf.cern.ch.
Each instance (except for the central one) provides both the new web interface based on check_mk and the legacy Nagios interface, which is located at /etf/nagios/. To access the instances you’ll need an IGTF-compliant certificate loaded in your browser.
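The same certificate-based access could in principle be scripted. The following is a rough sketch using only the Python standard library; the `https` scheme, the PEM file paths and the helper names are assumptions made for illustration, and the default trust store may need to be extended if the server certificate is issued by a CA your system does not know.

```python
import ssl
import urllib.request


def nagios_url(host):
    """URL of the legacy Nagios interface; the https scheme is an assumption."""
    return f"https://{host}/etf/nagios/"


def fetch_with_certificate(url, cert_file, key_file, timeout=30):
    """Fetch a page while presenting a client certificate (PEM files)."""
    context = ssl.create_default_context()
    # Present the user's certificate and private key to the server.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.status, response.read()
```

For example, `fetch_with_certificate(nagios_url("etf-atlas-prod.cern.ch"), "usercert.pem", "userkey.pem")` would request the legacy interface of the ATLAS production instance, with the two file names standing in for wherever your certificate and key are stored.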
For a list of the plugins/metrics supported by ETF, see the Plugins Documentation.
The preferred support channel for ETF is the WLCG Infrastructure Monitoring GGUS SU (generic SAM support). There is also a direct 3rd level support unit for ETF called WLCG Experiment Probe Submission Framework SU.