Executing Workflows¶
The yadage-run Command¶
The main command line tool in yadage is the yadage-run command. This command reads a workflow definition, sequentially builds the DAG from the stage definitions when their predicates become true and submits the individual packtivities (graph nodes) to a backend (default: Multiprocessing Pool). After execution a number of visualizations are produced for convenience.
Main Usage¶
The obligatory arguments for yadage-run a writeable work directory and the workflow definition:
yadage-run <workdir> <workflow def>
If the workflow takes input parameters they can be provided as a third argument in YAML format:
yadage-run <workdir> <workflow def> <input parameters>
Execution Backends¶
Yadage supports a number of different backends used to execute the packtivities. The default is a multiprocessing pool openend at the machine used to execute the yadage-run
command. For multi-machine distributed execution, currently a shared filesystem is needed. Possible distributed backends are a Celery cluster or IPython clusters. Custom backends can be implemented by implementing as described in the packtivity documentaion.
The backend choice is set via the -b/--backend
command line option
Example:¶
A basic example with of usage with an ipython cluster with 4 cores woud be:
yadage-run -b ipcluster:4 <workdir> <workflow def>