execo_g5k

Execo extension and tools for Grid5000.

To use execo_g5k, passwordless, public-key based authentification must be used (either with appropriate public / private specific g5k keys shared on all your home directories, or with an appropriate ssh-agent forwarding configuration).

The code can be run from a frontend, from a computer outside of Grid5000 (for this you need an appropriate configuration, see section Running from outside Grid5000) or from g5k nodes (in this case, you need to explicitely refer to the local Grid5000 site / frontend by name, and the node running the code needs to be able to connect to all involved frontends)

OAR functions

OarSubmission

class execo_g5k.oar.OarSubmission(resources=None, walltime=None, job_type=None, sql_properties=None, queue=None, reservation_date=None, directory=None, project=None, name=None, additional_options=None, command=None)

An oar submission.

POD style class.

members are:

  • resources: Set the requested resources for the job. Oar option -l, without the walltime. Can be a single resource specification or an iterable of resource specifications.
  • walltime: Job walltime. Walltime part of oar -l option.
  • job_type: Job type, oar option -t: deploy, besteffort, cosystem, checkpoint, timesharing, allow_classic_ssh. Can be a single job type or an iterable of job types.
  • sql_properties: constraints to properties for the job, oar option -p (use single quotes for literal strings).
  • queue: the queue to submit the job to. Oar option -q.
  • reservation_date: Request that the job starts at a specified time. Oar option -r.
  • directory: Specify the directory where to launch the command (default is current directory). Oar option -d.
  • project: pecify a name of a project the job belongs to. Oar option –project.
  • name: Specify an arbitrary name for the job. Oar option -n.
  • additional_options: passed directly to oarsub on the command line.
  • command: run by oarsub (default: sleep a long time).

oarsub

execo_g5k.oar.oarsub(job_specs, frontend_connection_params=None, timeout=False, abort_on_error=False)

Submit jobs.

Parameters:
  • job_specs – iterable of tuples (execo_g5k.oar.OarSubmission, frontend) with None for default frontend
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for submitting. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.
  • abort_on_error – default False. If True, raises an exception on any error. If False, will returned the list of job got, even if incomplete (some frontends may have failed to answer).

Returns a list of tuples (oarjob id, frontend), with frontend == None for default frontend. If submission error, oarjob id == None. The returned list matches, in the same order, the job_specs parameter.

oardel

execo_g5k.oar.oardel(job_specs, frontend_connection_params=None, timeout=False)

Delete oar jobs.

Ignores any error, so you can delete inexistant jobs, already deleted jobs, or jobs that you don’t own. Those deletions will be ignored.

Parameters:
  • job_specs – iterable of tuples (job_id, frontend) with None for default frontend
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for deleting. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

get_current_oar_jobs

execo_g5k.oar.get_current_oar_jobs(frontends=None, start_between=None, end_between=None, frontend_connection_params=None, timeout=False, abort_on_error=False)

Return a list of current active oar job ids.

The list contains tuples (oarjob id, frontend).

Parameters:
  • frontends – an iterable of frontends to connect to. A frontend with value None means default frontend. If frontends == None, means get current oar jobs only for default frontend.
  • start_between – a tuple (low, high) of endpoints. Filters and returns only jobs whose start date is in between these endpoints.
  • end_between – a tuple (low, high) of endpoints. Filters and returns only jobs whose end date is in between these endpoints.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.
  • abort_on_error – default False. If True, raises an exception on any error. If False, will returned the list of job got, even if incomplete (some frontends may have failed to answer).

get_oar_job_info

execo_g5k.oar.get_oar_job_info(oar_job_id=None, frontend=None, frontend_connection_params=None, timeout=False, nolog_exit_code=False, nolog_timeout=False, nolog_error=False)

Return a dict with informations about an oar job.

Parameters:
  • oar_job_id – the oar job id. If None given, will try to get it from OAR_JOB_ID environment variable.
  • frontend – the frontend of the oar job. If None given, use default frontend
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

Hash returned may contain these keys:

  • start_date: unix timestamp of job’s start date
  • walltime: job’s walltime (seconds)
  • scheduled_start: unix timestamp of job’s start prediction (may change between invocations)
  • state: job state. Possible states: ‘Waiting’, ‘Hold’, ‘toLaunch’, ‘toError’, ‘toAckReservation’, ‘Launching’, ‘Running’, ‘Suspended’, ‘Resuming’, ‘Finishing’, ‘Terminated’, ‘Error’, see table jobs, column state, in oar documentation http://oar.imag.fr/sources/2.5/docs/documentation/OAR-DOCUMENTATION-ADMIN/#jobs
  • name: job name

But no info may be available as long as the job is not scheduled.

wait_oar_job_start

execo_g5k.oar.wait_oar_job_start(oar_job_id=None, frontend=None, frontend_connection_params=None, timeout=None, prediction_callback=None)

Sleep until an oar job’s start time.

As long as the job isn’t scheduled, wait_oar_job_start will sleep / poll every execo_g5k.config.g5k_configuration['polling_interval'] seconds until it is scheduled. Then, knowing its start date, it will sleep the amount of time necessary to wait for the job start.

returns True if wait was successful, False otherwise (job cancelled, error)

Parameters:
  • oar_job_id – the oar job id. If None given, will try to get it from OAR_JOB_ID environment variable.
  • frontend – the frontend of the oar job. If None given, use default frontend.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is None (no timeout).
  • prediction_callback – function taking a unix timestamp as parameter. This function will be called each time oar job start prediction changes.

get_oar_job_nodes

execo_g5k.oar.get_oar_job_nodes(oar_job_id=None, frontend=None, frontend_connection_params=None, timeout=False)

Return an iterable of execo.host.Host containing the hosts of an oar job.

This method waits for the job start (the list of nodes isn’t fixed until the job start).

Parameters:
  • oar_job_id – the oar job id. If None given, will try to get it from OAR_JOB_ID environment variable.
  • frontend – the frontend of the oar job. If None given, use default frontend.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

get_oar_job_subnets

execo_g5k.oar.get_oar_job_subnets(oar_job_id=None, frontend=None, frontend_connection_params=None, timeout=False)

Return a tuple containing an iterable of tuples (IP, MAC) and a dict containing the subnet parameters of the reservation (if any).

subnet parameters dict has keys: ‘ip_prefix’, ‘broadcast’, ‘netmask’, ‘gateway’, ‘network’, ‘dns_hostname’, ‘dns_ip’.

Parameters:
  • oar_job_id – the oar job id. If None given, will try to get it from OAR_JOB_ID environment variable.
  • frontend – the frontend of the oar job. If None given, use default frontend.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

get_oar_job_kavlan

execo_g5k.oar.get_oar_job_kavlan(oar_job_id=None, frontend=None, frontend_connection_params=None, timeout=False)

Return the list of vlan ids of a job (if any).

Parameters:
  • oar_job_id – the oar job id. If None given, will try to get it from OAR_JOB_ID environment variable.
  • frontend – the frontend of the oar job. If None given, use default frontend.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

oarsubgrid

execo_g5k.oar.oarsubgrid(job_specs, reservation_date=None, walltime=None, job_type=None, queue=None, directory=None, additional_options=None, frontend_connection_params=None, timeout=False)

Similar to execo_g5k.oargrid.oargridsub, but instead of performing an oargrid reservation, it performs parallel oar submissions.

The only difference for the user is that it returns a list of tuples (oarjob id, frontend) (as execo_g5k.oar.oarsub) instead of an oargrid job id. It should run faster (since oar submission are performed in parallel instead of sequentially, and also because it bypasses the oargrid layer). As with execo_g5k.oargrid.oargridsub, all parameters reservation_date, walltime, job_type, queue, directory, additional_options from job_specs are ignored and replaced by the arguments passed. As for oargridsub, all job submissions must succeed. If at least one job submission fails, all other jobs are deleted, and it returns an empty list.

for parameter details, see execo_g5k.oargrid.oargridsub.

OARGRID functions

oargridsub

execo_g5k.oargrid.oargridsub(job_specs, reservation_date=None, walltime=None, job_type=None, queue=None, directory=None, additional_options=None, frontend_connection_params=None, timeout=False)

Submit oargrid jobs.

Parameters:
  • job_specs – iterable of tuples (OarSubmission, clusteralias). Reservation date, walltime, queue, directory, project, additional_options, command of the OarSubmission are ignored.
  • reservation_date – grid job reservation date. Default: now.
  • walltime – grid job walltime.
  • job_type – type of job for all clusters: deploy, besteffort, cosystem, checkpoint, timesharing.
  • queue – oar queue to use.
  • directory – directory where the reservation will be launched.
  • additional_options – passed directly to oargridsub on the command line.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

Returns a tuple (oargrid_job_id, ssh_key), or (None, None) if error.

Note that, as oargrid does not handle correctly quoting sql clauses enclosed inside braces, this function tries to automatically overcome this limitation by adding some, with the right escaping (backslashes). Also, note that oargrid’s command line parser does not handle correctly commas in sql clauses enclosed inside braces, as it considers it as a rdef separator. This prevents, for example, using comma separated list values for NOT IN clauses.

oargriddel

execo_g5k.oargrid.oargriddel(job_ids, frontend_connection_params=None, timeout=False)

Delete oargrid jobs.

Ignores any error, so you can delete inexistant jobs, already deleted jobs, or jobs that you don’t own. Those deletions will be ignored.

Parameters:
  • job_ids – iterable of oar grid job ids.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use g5k_configuration['default_timeout']. None means no timeout.

get_current_oargrid_jobs

execo_g5k.oargrid.get_current_oargrid_jobs(start_between=None, end_between=None, frontend_connection_params=None, timeout=False)

Return a list of current active oargrid job ids.

Parameters:
  • start_between – a tuple (low, high) of endpoints. Filters and returns only jobs whose start date is in between these endpoints.
  • end_between – a tuple (low, high) of endpoints. Filters and returns only jobs whose end date is in between these endpoints.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

get_oargrid_job_info

execo_g5k.oargrid.get_oargrid_job_info(oargrid_job_id=None, frontend_connection_params=None, timeout=False)

Return a dict with informations about an oargrid job.

Parameters:
  • oargrid_job_id – the oargrid job id.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

Hash returned contains these keys:

  • start_date: unix timestamp of job’s start date
  • walltime: job’s walltime in seconds
  • user: job’s user

get_oargrid_job_oar_jobs

execo_g5k.oargrid.get_oargrid_job_oar_jobs(oargrid_job_id=None, frontend_connection_params=None, timeout=False)

Return a list of tuples (oar job id, site), the list of individual oar jobs which make an oargrid job.

Parameters:
  • oargrid_job_id – the oargrid job id.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

wait_oargrid_job_start

execo_g5k.oargrid.wait_oargrid_job_start(oargrid_job_id=None, frontend_connection_params=None, timeout=False)

Sleep until an oargrid job’s start time.

Parameters:
  • oargrid_job_id – the oargrid job id.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

get_oargrid_job_nodes

execo_g5k.oargrid.get_oargrid_job_nodes(oargrid_job_id, frontend_connection_params=None, timeout=False)

Return an iterable of execo.host.Host containing the hosts of an oargrid job.

Parameters:
  • oargrid_job_id – the oargrid job id.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • timeout – timeout for retrieving. Default is False, which means use execo_g5k.config.g5k_configuration['default_timeout']. None means no timeout.

kadeploy3

The most user friendly way to deploy with execo is to use function execo_g5k.kadeploy.deploy with adds a layer of “cleverness” over kadeploy. It allows:

  • deploying over several sites simultaneously
  • using a test (parameter check_deployed_command: a shell command or construct) to run on the deployed hosts, to detect if the host has been deployed. The benefit is that if you call deploy at the beginning of your experiment script, then when relaunching your script while the hosts are already deployed, the deploy function will detect that they are already deployed and won’t redeploy them. The default is to use a check_deployed_command which will work with 99% of the deployed images. If needed this test can be tweaked or deactivated.
  • retrying the deployment several times, each time redeploying only the hosts which fail the aforementioned test.
  • for complex experiment scenarios, you can provide a callback function (parameter check_enough_func) which will be called at the end of each deployment try, given the list of correctly deployed and failed hosts, and which can decide if the deployed resources are sufficient for the experiment or not. If sufficient, no further deployment retry is done. This can be usefull for experiment needing complex and large grid topologies, where you can almost never have 100% deployment success, but still want the experiment to proceed as soon as you have sufficient resources for setting up your experiment topology.

Internally, execo_g5k.kadeploy.deploy uses class execo_g5k.kadeploy.Kadeployer which derives from the execo.action.Action class hierarchy, and offers no special cleverness, but handles simultaneous deployment to multiple sites. If needed it also allows asynchronous control of multiple deployments in parallel, if needed.

Deployment

class execo_g5k.kadeploy.Deployment(hosts=None, env_file=None, env_name=None, user=None, vlan=None, other_options=None)

A kadeploy3 deployment, POD style class.

env_file = None

filename of an environment to deploy

env_name = None

name of a kadeploy3 registered environment to deploy.

there must be either one of env_name or env_file parameter given. If none given, will try to use the default environement from g5k_configuration.

hosts = None

hosts: iterable of execo.host.Host on which to deploy.

other_options = None

string of other options to pass to kadeploy3

user = None

kadeploy3 user

vlan = None

if given, kadeploy3 will automatically switch the nodes to this vlan after deployment

Kadeployer

Inheritance diagram of execo_g5k.kadeploy.Kadeployer
class execo_g5k.kadeploy.Kadeployer(deployment, frontend_connection_params=None, stdout_handlers=None, stderr_handlers=None)

Bases: execo.action.Remote

Deploy an environment with kadeploy3 on several hosts.

Able to deploy in parallel to multiple frontends.

Parameters:
  • deployment – instance of Deployment class describing the intended kadeployment.
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • stdout_handlers – iterable of ProcessOutputHandlers which will be passed to the actual deploy processes.
  • stderr_handlers – iterable of ProcessOutputHandlers which will be passed to the actual deploy processes.
deployed_hosts = None

Iterable of Host containing the deployed hosts. This iterable won’t be complete if the Kadeployer has not terminated.

deployment = None

Instance of Deployment class describing the intended kadeployment.

frontend_connection_params = None

Connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.

timeout = None

Deployment timeout

undeployed_hosts = None

Iterable of Host containing the hosts not deployed. This iterable won’t be complete if the Kadeployer has not terminated.

deploy

execo_g5k.kadeploy.deploy(deployment, check_deployed_command=True, node_connection_params={'user': 'root'}, num_tries=1, check_enough_func=None, frontend_connection_params=None, deploy_timeout=None, check_timeout=30, stdout_handlers=None, stderr_handlers=None)

Deploy nodes, many times if needed, checking which of these nodes are already deployed with a user-supplied command. If no command given for checking if nodes deployed, rely on kadeploy to know which nodes are deployed.

  • loop num_tries times:
    • if check_deployed_command given, try to connect to these hosts using the supplied node_connection_params (or the default ones), and to execute check_deployed_command. If connection succeeds and the command returns 0, the host is assumed to be deployed, else it is assumed to be undeployed.
    • optionnaly call user-supplied check_enough_func, passing to it the list of deployed and undeployed hosts, to let user code decide if enough nodes deployed. Otherwise, try as long as there are undeployed nodes.
    • deploy the undeployed nodes

returns a tuple with the list of deployed hosts and the list of undeployed hosts.

When checking correctly deployed nodes with check_deployed_command, and if the deployment is using the kavlan option, this function will try to contact the nodes using the appropriate DNS hostnames in the new vlan.

Parameters:
  • deployment – instance of execo.kadeploy.Deployment class describing the intended kadeployment.
  • check_deployed_command – command to perform remotely to check node deployement. May be a String, True, False or None. If String: the actual command to be used (This command should return 0 if the node is correctly deployed, or another value otherwise). If True, the default command value will be used (from execo_g5k.config.g5k_configuration). If None or False, no check is made and deployed/undeployed status will be taken from kadeploy’s output.
  • node_connection_params – a dict similar to execo.config.default_connection_params whose values will override those in execo.config.default_connection_params when connecting to check node deployment with check_deployed_command (see below).
  • num_tries – number of deploy tries
  • check_enough_func – a function taking as parameter a list of deployed hosts and a list of undeployed hosts, which will be called at each deployment iteration end, and that should return a boolean indicating if there is already enough nodes (in this case, no further deployement will be attempted).
  • frontend_connection_params – connection params for connecting to frontends if needed. Values override those in execo_g5k.config.default_frontend_connection_params.
  • deploy_timeout – timeout for deployement. Default is None, which means no timeout.
  • check_timeout – timeout for node deployment checks. Default is 30 seconds.
  • stdout_handlers – iterable of ProcessOutputHandlers which will be passed to the actual deploy processes.
  • stderr_handlers – iterable of ProcessOutputHandlers which will be passed to the actual deploy processes.

Utilities

format_oar_date

execo_g5k.oar.format_oar_date(ts)

Return a string with the formatted date (year, month, day, hour, min, sec, ms) formatted for oar/oargrid.

timezone is forced to Europe/Paris, and timezone info is discarded, for g5k oar/oargrid.

Parameters:tz – a date in one of the formats handled.

format_oar_duration

execo_g5k.oar.format_oar_duration(duration)

Return a string with a formatted duration (hours, mins, secs, ms) formatted for oar/oargrid.

Parameters:duration – a duration in one of the formats handled.

oar_date_to_unixts

execo_g5k.oar.oar_date_to_unixts(date)

Convert a date in the format returned by oar/oargrid to an unix timestamp.

Timezone of g5k oar/oargrid timestamps is Europe/Paris.

oar_duration_to_seconds

execo_g5k.oar.oar_duration_to_seconds(duration)

Convert a duration in the format returned by oar/oargrid to a number of seconds.

get_kavlan_host_name

execo_g5k.utils.get_kavlan_host_name(host, vlanid)

Returns the DNS hostname of a host once switched in a kavlan.

G5kAutoPortForwarder

class execo_g5k.utils.G5kAutoPortForwarder(site, host, port)

Context manager for automatically opening a port forwarder if outside grid5000

Parameters:
  • site – the grid5000 site to connect to
  • host – the host to connect to in the site
  • port – the port to connect to on the host

It automatically decides if a port forwarding is needed, depending if run from inside or outside grid5000.

When entering the context, it returns a tuple actual host, actual port to connect to. It is the real host/port if inside grid5000, or the forwarded port if outside. The forwarded port, if any, is guaranteed to be operational.

When leaving the context, it kills the port forwarder background process.

Grid5000 API utilities

Execo_g5k provides easy-to-use functions to interacts with the Grid5000 API. All static parts of the API is stored locally, in $HOME/.execo/g5k_api_cache, which version is checked on the first call to a method. Beware that if API is not reachable at runtime, the local cache will be used.

Functions for wrapping the grid5000 REST API. This module also manage a cache of the Grid‘5000 Reference API (hosts and network equipments) (Data are stored in $HOME/.execo/g5k_api_cache/’ under pickle format)

All queries to the Grid5000 REST API are done with or without credentials, depending on key api_username of execo_g5k.config.g5k_configuration. If credentials are used, the password is interactively asked and if the keyring module is available, the password will then be stored in the keyring through this module and will not be asked in subsequent executions. (the keyring python module allows to access the system keyring services like gnome-keyring or kwallet. If a password needs to be changed, do it from the keyring GUI).

This module is thread-safe.

APIConnection

class execo_g5k.api_utils.APIConnection(base_uri=None, username=None, password=None, headers=None, additional_args=None, timeout=300)

Basic class for easily getting url contents.

Intended to be used to get content from restfull apis, particularly the grid5000 api.

Parameters:
  • base_uri – server base uri. defaults to g5k_configuration.get('api_uri')
  • username – username for the http connection. If None (default), use default from g5k_configuration.get('api_username'). If False, don’t use a username at all.
  • password – password for the http connection. If None (default), get the password from a keyring (if available) or interactively.
  • headers – http headers to use. If None (default), default headers accepting json answer will be used.
  • additional_args – a list of optional arguments (strings) to pass at the end of the url of all http requests.
  • timeout – timeout for the http connection.
get(relative_uri)

Get the (response, content) tuple for the given path on the server

post(relative_uri, body)

Submit the body to a given path on the server, returns the (response, content) tuple

Sites

execo_g5k.api_utils.get_g5k_sites()

Get the list of Grid5000 sites. Returns an iterable.

execo_g5k.api_utils.get_site_attributes(site)

Get the attributes of a site (as known to the g5k api) as a dict

execo_g5k.api_utils.get_site_clusters(site, queues='default')

Get the list of clusters from a site. Returns an iterable.

Parameters:
execo_g5k.api_utils.get_site_hosts(site, queues='default')

Get the list of hosts from a site. Returns an iterable.

Parameters:
execo_g5k.api_utils.get_site_network_equipments(site)

Get the list of network elements from a site. Returns an iterable.

Clusters

execo_g5k.api_utils.get_g5k_clusters(queues='default')

Get the list of all g5k clusters. Returns an iterable.

Parameters:queues – queues filter, see execo_g5k.api_utils.filter_clusters
execo_g5k.api_utils.get_cluster_attributes(cluster)

Get the attributes of a cluster (as known to the g5k api) as a dict

execo_g5k.api_utils.get_cluster_hosts(cluster)

Get the list of hosts from a cluster. Returns an iterable.

execo_g5k.api_utils.get_cluster_site(cluster)

Get the site of a cluster.

execo_g5k.api_utils.get_cluster_network_equipments(cluster)

Get the list of the network equipments used by a cluster

Hosts

execo_g5k.api_utils.get_g5k_hosts(queues='default')

Get the list of all g5k hosts. Returns an iterable.

Parameters:queues – queues filter, see execo_g5k.api_utils.filter_clusters
execo_g5k.api_utils.get_host_attributes(host)

Get the attributes of a host (as known to the g5k api) as a dict

execo_g5k.api_utils.get_host_site(host)

Get the site of a host.

Works both with a bare hostname or a fqdn.

execo_g5k.api_utils.get_host_cluster(host)

Get the cluster of a host.

Works both with a bare hostname or a fqdn.

execo_g5k.api_utils.get_host_network_equipments(host)

Network equipments

execo_g5k.api_utils.get_network_equipment_attributes(equip)

Get the attributes of a network equipment of a site as a dict

execo_g5k.api_utils.get_network_equipment_site(equip)

Return the site of a network_equipment

Other

execo_g5k.api_utils.get_resource_attributes(path)

Get generic resource (path on g5k api) attributes as a dict

execo_g5k.api_utils.group_hosts(hosts)

Given a sequence of hosts, group them in a dict by sites and clusters

execo_g5k.api_utils.canonical_host_name(host)

Convert, if needed, the host name to its canonical form without kavlan part.

Can be given a Host, will return a Host. Can be given a string, will return a string. Works with short or fqdn forms of hostnames.

execo_g5k.api_utils.filter_clusters(clusters, queues='default')

Filter a list of clusters on their queue(s).

Given a list of clusters, return the list filtered, keeping only clusters that have at least one oar queue matching one in the list of filter queues passed in parameters. The cluster queues are taken from the queues attributes in the grid5000 API. If this attribute is missing, the cluster is considered to be in queues [“admin”, “default”, “besteffort”]

Parameters:
  • clusters – list of clusters
  • queues – a queue name or a list of queues. clusters will be kept in the returned filtered list only if at least one of their queues matches one queue of this parameter. If queues = None or False, no filtering at all is done. By default, keeps clusters in queue “default”.
execo_g5k.api_utils.get_hosts_metric(hosts, metric, from_ts=None, to_ts=None, resolution=1)

Get metric values from Grid‘5000 metrology API

Parameters:
  • hosts – List of hosts
  • metric – Grid‘5000 metrology metric to fetch (eg: “power”, “cpu_user”)
  • from_ts – Time from which metric is collected, in any type supported by execo.time_utils.get_unixts, optional.
  • to_ts – Time until which metric is collected, in any type supported by execo.time_utils.get_unixts, optional.
  • resolution – time resolution, in any type supported by execo.time_utils.get_seconds, default 1 second.
Returns:

A dict of host -> List of (timestamp, metric value) retrieved from API

Planning utilities

Module provides functions to help you to plan your experiment on Grid‘5000.

Retrieve resources planning

These functions allow to retrieve retrieve resources planning, compute the slots and find slots with specific properties.

execo_g5k.planning.get_planning(elements=['grid5000'], vlan=False, subnet=False, storage=False, out_of_chart=False, starttime=None, endtime=None, ignore_besteffort=True, queues='default')

Retrieve the planning of the elements (site, cluster) and others resources. Element planning structure is {'busy': [(123456,123457), ... ], 'free': [(123457,123460), ... ]}.

Parameters:
  • elements – a list of Grid‘5000 elements (‘grid5000’, <site>, <cluster>)
  • vlan – a boolean to ask for KaVLAN computation
  • subnet – a boolean to ask for subnets computation
  • storage – a boolean to ask for sorage computation
  • out_of_chart – if True, consider that days outside weekends are busy
  • starttime – start of time period for which to compute the planning, defaults to now + 1 minute
  • endtime – end of time period for which to compute the planning, defaults to 4 weeks from now
  • ignore_besteffort – True by default, to consider the resources with besteffort jobs as available
  • queues – list of oar queues for which to get the planning

Return a dict whose keys are sites, whose values are dict whose keys are cluster, subnets, kavlan or storage, whose values are planning dicts, whose keys are hosts, subnet address range, vlan number or chunk id planning respectively.

execo_g5k.planning.compute_slots(planning, walltime, excluded_elements=None)

Compute the slots limits and find the number of available nodes for each elements and for the given walltime.

Return the list of slots where a slot is [ start, stop, freehosts ] and freehosts is a dict of Grid‘5000 element with number of nodes available {'grid5000': 40, 'lyon': 20, 'reims': 10, 'stremi': 10 }.

WARNING: slots does not includes subnets

Parameters:
  • planning – a dict of the resources planning, returned by get_planning
  • walltime – a duration in a format supported by get_seconds where the resources are available
  • excluded_elements – list of elements that will not be included in the slots computation
execo_g5k.planning.find_first_slot(slots, resources_wanted)

Return the first slot (a tuple start date, end date, resources) where some resources are available

Parameters:
  • slots – list of slots returned by compute_slots
  • resources_wanted – a dict of elements that must have some free hosts
execo_g5k.planning.find_max_slot(slots, resources_wanted)

Return the slot (a tuple start date, end date, resources) with the maximum nodes available for the given elements

Parameters:
  • slots – list of slots returned by compute_slots
  • resources_wanted – a dict of elements that must be maximized
execo_g5k.planning.find_free_slot(slots, resources_wanted)

Return the first slot (a tuple start date, end date, resources) with enough resources

Parameters:
  • slots – list of slots returned by compute_slots
  • resources_wanted – a dict describing the wanted ressources {'grid5000': 50, 'lyon': 20, 'stremi': 10 }

Jobs generation tools

execo_g5k.planning.distribute_hosts(resources_available, resources_wanted, excluded_elements=None, ratio=None)

Distribute the resources on the different sites and cluster

Parameters:
  • resources_available – a dict defining the resources available
  • resources_wanted – a dict defining the resources available you really want
  • excluded_elements – a list of elements that won’t be used
  • ratio – if not None (the default), a float between 0 and 1, to actually only use a fraction of the resources.
execo_g5k.planning.get_jobs_specs(resources, excluded_elements=None, name=None)

Generate the several job specifications from the dict of resources and the blacklisted elements

Parameters:
  • resources – a dict, whose keys are Grid‘5000 element and values the corresponding number of n_nodes
  • excluded_elements – a list of elements that won’t be used
  • name – the name of the jobs that will be given

Plot functions

execo_g5k.planning.draw_gantt(planning, colors=None, show=False, save=True, outfile=None)

Draw the hosts planning for the elements you ask (requires Matplotlib)

Parameters:
  • planning – the dict of elements planning
  • colors – a dict to define element coloring {'element': (255., 122., 122.)}
  • show – display the Gantt diagram
  • save – save the Gantt diagram to outfile
  • outfile – specify the output file
execo_g5k.planning.draw_slots(slots, colors=None, show=False, save=True, outfile=None)

Draw the number of nodes available for the clusters (requires Matplotlib >= 1.2.0)

Parameters:
  • slots – a list of slot, as returned by compute_slots
  • colors – a dict to define element coloring {'element': (255., 122., 122.)}
  • show – display the slots versus time
  • save – save the plot to outfile
  • outfile – specify the output file

Grid5000 Charter

g5k_charter_remaining

execo_g5k.charter.g5k_charter_remaining(sites, day, user=None)

Returns the amount of time remaining per cluster for submitting jobs, according to the grid5000 charter.

Grid5000 usage charter: https://www.grid5000.fr/mediawiki/index.php/Grid5000:UserCharter This function is only available if psycopg2 is available.

Parameters:
  • sites – a list of grid5000 sites
  • day – a datetime.date
Returns:

a dict, keys are cluster names, value is remaining time for each cluster (in seconds * core)

g5k_charter_time

execo_g5k.charter.g5k_charter_time(t)

Is the given date in a g5k charter time period ?

Returns a boolean, True if the given date is in a period where the g5k charter needs to be respected, False if it is in a period where charter is not applicable (night, weekends, non working days)

Parameters:t – a date in a type supported by execo.time_utils.get_unixts

get_next_charter_period

execo_g5k.charter.get_next_charter_period(start, end)

Return the next g5k charter time period.

Parameters:
  • start – timestamp in a type supported by execo.time_utils.get_unixts from which to start searching for the next g5k charter time period. If start is in a g5k charter time period, the returned g5k charter time period starts at start.
  • end – timestamp in a type supported by execo.time_utils.get_unixts until which to search for the next g5k charter time period. If end is in the g5k charter time period, the returned g5k charter time period ends at end.
Returns:

a tuple (charter_start, charter_end) of unix timestamps. (None, None) if no g5k charter time period found

Grid5000 Topology

A module based on networkx to create a topological graph of the Grid‘5000 platform. “Nodes” are used to represent elements (compute nodes, switch, router, renater) and “Edges” are the network links. Nodes has a kind data (+ power and core for compute nodes) whereas edges has bandwidth and latency information.

All information comes from the Grid‘5000 reference API.

class execo_g5k.topology.g5k_graph(elements=None)

Main graph representing the topology of the Grid‘5000 platform. All nodes elements are defined with their FQDN

Create the MultiGraph() representing Grid‘5000 network topology

Parameters:sites – add the topology of the given site(s)
add_backbone()

Add the nodes corresponding to Renater equipments

add_cluster(cluster, data=None)

Add the cluster to the graph

add_equip(equip, site)

Add a network equipment

add_host(host, data=None)

Add a host in the graph

Parameters:
  • host – a string corresponding to the node name
  • data – a dict containing the Grid‘5000 host attributes
add_site(site, data=None)

Add a site to the graph

get_backbone()

Return

get_clusters(site=None)

Return the list of clusters

get_equip_hosts(equip)

Return the nodes which are connected to the equipment

get_host_adapters(host)

Return the mountable network interfaces from a host

get_host_neighbours(host)

Return the compute nodes that are connected to the same switch, router or linecard

get_hosts(cluster=None, site=None)

Return the list of nodes corresponding to hosts

get_site_router(site)

Return the node corresponding to the router of a site

get_sites()

Return the list of sites

rm_backbone()

Remove all elements from the backbone

rm_cluster(cluster)

Remove the cluster from the graph

rm_equip(equip)

Remove an equipment from the node

rm_host(host)

Remove the host from the graph

rm_site(site)

Remove the site from the graph

execo_g5k.topology.treemap(gr, nodes_legend=None, edges_legend=None, nodes_labels=None, layout='neato', compact=False)

Create a treemap of the topology and return a matplotlib figure

Parameters:
  • nodes_legend – a dict of dicts containing the parameter used to draw the nodes, such as ‘myelement’: {‘color’: ‘#9CF7BC’, ‘shape’: ‘p’, ‘size’: 200}
  • edges_legend – a dict of dicts containing the parameter used to draw the edges, such as bandwidth: {‘width’: 0.2, ‘color’: ‘#666666’}
  • nodes_labels – a dict of dicts containing the font parameters for the labels, such as ‘myelement ‘: {‘nodes’: {}, ‘font_size’: 8, ‘font_weight’: ‘bold’, ‘str_func’: lambda n: n.split(‘.’)[1].title()}
  • layout – the graphviz tool to be used to compute node position
  • compact – represent only on node for a cluster/cabinet

WARNING: This function use matplotlib.figure that by default requires a DISPLAY. If you want use this on a headless host, you need to change the matplotlib backend before to import execo_g5k.topology module.

Configuration

This module may be configured at import time by defining two dicts g5k_configuration and default_frontend_connection_params in the file ~/.execo.conf.py

The g5k_configuration dict contains global g5k configuration parameters.

execo_g5k.config.g5k_configuration = {'no_ssh_for_local_frontend': False, 'api_uri': 'https://api.grid5000.fr/3.0/', 'default_env_name': None, 'default_timeout': 900, 'check_deployed_command': "! (mount | grep -E '^/dev/[[:alpha:]]+2 on / ')", 'oar_pgsql_ro_db': 'oar2', 'api_username': 'mimbert', 'default_env_file': None, 'kadeploy3_options': '-k -d', 'oar_pgsql_ro_user': 'oarreader', 'oar_job_key_file': '/home/mimbert/.oar_key', 'default_frontend': 'lyon', 'api_additional_args': [], 'oar_pgsql_ro_port': 5432, 'polling_interval': 20, 'kadeploy3': 'kadeploy3', 'tiny_polling_interval': 10, 'oar_pgsql_ro_password': 'read'}

Global Grid5000 configuration parameters.

  • kadeploy3: kadeploy3 command.
  • kadeploy3_options: a string with kadeploy3 command line options.
  • default_env_name: a default environment name to use for deployments (as registered to kadeploy3).
  • default_env_file: a default environment file to use for deployments (for kadeploy3).
  • default_timeout: default timeout for all calls to g5k services (except deployments).
  • check_deployed_command: default shell command used by execo_g5k.kadeploy.deploy to check that the nodes are correctly deployed. This command should return 0 if the node is correctly deployed, or another value otherwise. The default checks that the root is not on the second partition of the disk.
  • no_ssh_for_local_frontend: if True, don’t use ssh to issue g5k commands for local site. If False, always use ssh, both for remote frontends and local site. Set it to True if you are sure that your scripts always run on the local frontend.
  • polling_interval: time interval between pollings for various operations, eg. wait oar job start.
  • tiny_polling_interval: small time interval between pollings for various operations. Used for example when waiting for a job start, and start date of the job is over but the job is not yet in running state.
  • default_frontend: address of default frontend.
  • api_uri: base uri for g5k api serverr.
  • api_username: username to use for requests to the Grid5000 REST API. If None, no credentials will be used, which is fine when running on a Grid5000 frontend.
  • api_additional_args: additional arguments to append at the end all requests to g5k api. May be used to request the testing branch (use: api_additional_args = [“branch=testing”])
  • oar_job_key_file: ssh key to use for oar. If defined, takes precedence over environment variable OAR_JOB_KEY_FILE.

Its default values are:

g5k_configuration = {
    'kadeploy3': 'kadeploy3',
    'kadeploy3_options': '-k -d',
    'default_env_name': None,
    'default_env_file': None,
    'default_timeout': 900,
    'check_deployed_command': "! (mount | grep -E '^/dev/[[:alpha:]]+2 on / ')",
    'no_ssh_for_local_frontend' : False,
    'polling_interval' : 20,
    'tiny_polling_interval' : 10,
    'default_frontend' : None,
    'api_uri': "https://api.grid5000.fr/3.0/",
    'api_username': None,
    'api_additional_args': [],
    'oar_job_key_file': None,
    'oar_pgsql_ro_db': 'oar2',
    'oar_pgsql_ro_user': 'oarreader',
    'oar_pgsql_ro_password': 'read',
    'oar_pgsql_ro_port': 5432,
    }

The default_frontend_connection_params dict contains default parameters for remote connections to grid5000 frontends.

execo_g5k.config.default_frontend_connection_params = {'host_rewrite_func': <function frontend_rewrite_func>, 'pty': True}

Default connection params when connecting to a Grid5000 frontend.

Its default values are:

    default_frontend_connection_params = {
        'pty': True,
        'host_rewrite_func': lambda host: host + ".grid5000.fr"
        }

in default_frontend_connection_params, the host_rewrite_func configuration variable is set to automatically map a site name to its corresponding frontend, so that all commands are run on the proper frontends.

default_oarsh_oarcp_params contains default connection parameters suitable to connect to grid5000 nodes with oarsh / oarcp.

execo_g5k.config.default_oarsh_oarcp_params = {'taktuk_connector': 'ssh', 'port': 6667, 'host_rewrite_func': <function host_rewrite_func>, 'ssh': 'ssh', 'pty': True, 'keyfile': '/home/mimbert/.ssh/oar_key', 'scp': 'scp', 'user': 'oar'}

A convenient, predefined connection paramaters dict with oarsh / oarcp configuration.

See execo.config.make_default_connection_params

Its default values are:

    default_oarsh_oarcp_params = {
        'ssh':         'oarsh',
        'scp':         'oarcp',
        'taktuk_connector': 'oarsh',
        'pty': True,
        }

OAR keys

Oar/oargrid by default generate job specific ssh keys. So by default, one has to retrieve these keys and explicitely use them for connecting to the jobs, which is painfull. Another possibility is to tell oar/oargrid to use specific keys. Oar can automatically use the key pointed to by the environement variable OAR_JOB_KEY_FILE if it is defined. Oargrid does not automatically use this key, but execo also takes care of explicitely telling oargrid to use it if it is defined. So the most convenient way to use execo/oar/oargrid, is to set OAR_JOB_KEY_FILE in your ~/.profile to point to your internal Grid5000 ssh key and export this environment variable, or use the oar_job_key_file in execo_g5k.config.g5k_configuration.

Running from another host than a frontend

Note that when running a script from another host than a frontend (from a node or from your laptop outside Grid5000), everything will work except oarsh / oarcp connections, since these executables only exist on frontends. In this case you can still connect by using ssh / scp as user oar on port 6667 (ie. use connection_params = {'user': 'oar', 'port': 6667}). This is the port where an oar-specific ssh server is listening. This server will then change user from oar to the user currently owning the node.

Running from outside Grid5000

Execo scripts can be run from outside grid5000 with a subtle configuration of both execo and ssh.

First, in ~/.ssh/config, declare aliases for g5k connection (through the access machine). For example, here is an alias g5k for connecting through the lyon access:

Host *.g5k g5k
  ProxyCommand ssh access.grid5000.fr "nc -q 0 `echo %h | sed -ne 's/\.g5k$//p;s/^g5k$/lyon/p'` %p"

Then in ~/.execo.conf.py put this code:

import re

default_connection_params = {
    'host_rewrite_func': lambda host: re.sub("\.grid5000\.fr$", ".g5k", host),
    'taktuk_gateway': 'g5k'
    }


default_frontend_connection_params = {
    'host_rewrite_func': lambda host: host + ".g5k"
    }

g5k_configuration = {
    'api_username': '<g5k_username>',
    }

Now, every time execo tries to connect to a host, the host name is rewritten as to be reached through the Grid5000 ssh proxy connection alias, and the same for the frontends.

The perfect grid5000 connection configuration

  • use separate ssh keys for connecting from outside to grid5000 and inside grid5000:
    • use your regular ssh key for connecting from outside grid5000 to inside grid5000 by adding your regular public key to ~/.ssh/authorized_keys on each site’s nfs.
    • generate (with ssh-keygen)a specific grid5000 private/public key pair, without passphrase, for navigating inside grid5000. replicate ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub (for a dsa key pair, or the equivalent rsa keys for an rsa key pair) to ~/.ssh/ on each site’s nfs and add ~/.ssh/id_dsa.pub to ~/.ssh/authorized_keys on each site’s nfs.
  • add export OAR_JOB_KEY_FILE=~/.ssh/id_dsa (or id_rsa) to each site’s ~/.bash_profile
  • Connections should then work directly with oarsh/oarcp if you use execo_g5k.config.default_oarsh_oarcp_params connection parameters. Connections should also work directly with ssh for nodes reserved with the allow_classic_ssh option. Finally, for deployed nodes, connections should work directly because option -k is passed to kadeploy3 by default.

TODO: Currently, due to an ongoing bug or misconfiguration (see https://www.grid5000.fr/cgi-bin/bugzilla3/show_bug.cgi?id=3302), oar fails to access the ssh keys if they are not world-readable, so you need to make them so.

Note also that configuring default_oarsh_oarcp_params to bypass oarsh/oarcp and directly connect to port 6667 will save you from many problems such as high number of open pty as well as impossibility to kill oarsh / oarcp processes (due to it running sudoed):

default_oarsh_oarcp_params = {
    'user':        "oar",
    'keyfile':     "path/to/ssh/key/used/for/oar",
    'port':        6667,
    'ssh':         'ssh',
    'scp':         'scp',
    'taktuk_connector': 'ssh',
    }