pegasus-statistics(1)
=====================
:doctype: manpage


Name
----
pegasus-statistics - A tool to generate statistics about the workflow run.


Synopsis
--------
[verse]
*pegasus-statistics* [*-h*|*--help*]
                   [*-o*|*--output* 'dir']
                   [*-c*|*--conf* 'propfile']
                   [*-s*|*--statistics-level* 'level']
                   [*-t*|*--time-filter* 'filter']
                   [*-i*|*--ignore-db-inconsistency*]
                   [*-v*|*--verbose*]
                   [*-q*|*--quiet*]
                   [*-m*|*--multiple-wf*]
                   [*-p*|*--ispmc*]
                   [*-u*|*--isuuid*]
                   [['submitdir ..'] | ['workflow_uuid ..']]


Description
-----------
pegasus-statistics generates statistics about a workflow run, such as the
total number of jobs, tasks, and sub workflows that ran and how many of them
succeeded or failed. It generates job instance statistics such as run time
and Condor queue delay, and invocation statistics grouped by transformation
name. It also generates job instance and invocation statistics grouped by
time and host.


Options
-------
*-h*::
*--help*::
Prints a usage summary with all the available command-line options.

*-o* 'dir'::
*--output*  'dir'::
Writes the output to the given directory.

*-c* 'propfile'::
*--conf*  'propfile'::
The properties file to use. This option overrides all other property files.

*-s* 'level'::
*--statistics-level* 'level'::
Specifies the statistics information to generate. Valid levels are: *all*,
*summary*, *wf_stats*, *jb_stats*, *tf_stats*, and *ti_stats*. Default is
*summary*. The output generated by pegasus-statistics depends on the
'level' set:

- *all*: generates all the statistics information.

- *summary*: generates the workflow statistics summary. For a hierarchical
workflow, the summary covers all sub workflows.

- *wf_stats*: generates the workflow statistics information of each
individual workflow. For a hierarchical workflow, the workflow statistics
are generated for each sub workflow.

- *jb_stats*: generates the job statistics information of each individual
workflow. For a hierarchical workflow, the job statistics are generated
for each sub workflow. Note: Not supported when generating statistics
over multiple workflows.

- *tf_stats*: generates the invocation statistics information of each
individual workflow grouped by transformation name. For a hierarchical
workflow, the transformation statistics are generated for each sub workflow.

- *ti_stats*: generates job instance and invocation statistics, such as
total count and runtime, grouped by time and host.

*-t* 'filter'::
*--time-filter* 'filter'::
Specifies the time interval by which the time statistics are grouped. Valid
'filter' values are: *month*, *week*, *day*, *hour*. Default is *day*.

*-i*::
*--ignore-db-inconsistency*::
Turns off the check for database consistency.

*-v*::
*--verbose*::
Increases the log level.  If omitted, the default level will be set to
WARNING.  When this option is given, the log level is changed to INFO.
If this option is repeated, the log level will be changed to DEBUG.

*-q*::
*--quiet*::
Decreases the log level.  If omitted, the default level will be set to
WARNING.  When this option is given, the log level is changed to ERROR.

*-m*::
*--multiple-wf*::
Set this option when generating statistics over more than one workflow.
The tool automatically sets this flag if multiple submit directories or
multiple workflow UUIDs are provided. This option needs to be set
explicitly only to generate statistics over all workflows in a single
STAMPEDE database.
NOTE: When workflows are specified as UUIDs, the --conf option
needs to be set for the tool to determine the STAMPEDE database
URL.

*-p*::
*--ispmc*::
Set this flag to generate statistics for workflows which are run with
PMC clustering enabled. It is recommended that this option be used when
calculating statistics over multiple workflow runs.

*-u*::
*--isuuid*::
Set this option if the positional arguments are workflow UUIDs.
NOTE: When workflows are specified as UUIDs, the --conf option
needs to be set for the tool to determine the STAMPEDE database
URL.

Example
-------
Runs pegasus-statistics and writes the output to the given directory:
----------
$ pegasus-statistics  -o /scratch/statistics /scratch/grid-setup/run0001
----------
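
As an illustrative sketch, reusing the output and submit directories from the
example above, runs pegasus-statistics with the statistics level set to *all*
so that every statistics report described under Options is generated:
----------
$ pegasus-statistics -s all -o /scratch/statistics /scratch/grid-setup/run0001
----------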

Runs pegasus-statistics over a workflow run identified by a single workflow UUID:
----------
$ pegasus-statistics  --conf pegasusrc --isuuid 316f2986-7754-44ec-8b38-fcd0cb602ce0
----------

Runs pegasus-statistics over workflow runs identified by multiple workflow UUIDs:
----------
$ pegasus-statistics  --conf pegasusrc --isuuid 316f2986-7754-44ec-8b38-fcd0cb602ce0 \
7ef77af8-4eb2-45ca-b37d-c5a02186133a
----------

Runs pegasus-statistics over all workflows in the STAMPEDE database:
----------
$ pegasus-statistics  --conf pegasusrc --multiple-wf
----------
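
As an illustrative sketch, assuming the same example submit directory
/scratch/grid-setup/run0001 used in the first example, generates only the time
and host statistics and groups them by hour instead of the default of day:
----------
$ pegasus-statistics -s ti_stats -t hour /scratch/grid-setup/run0001
----------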

Authors
-------
Prasanth Thomas
Rajiv Mayani

Pegasus Team <http://pegasus.isi.edu>
