PJ_DUMP(1)
==========
:doctype: manpage


NAME
----
pj_dump - dump a Paje trace file in a CSV-like textual format


SYNOPSIS
--------
*pj_dump* ['OPTIONS'] ['FILE']


DESCRIPTION
-----------

The pj_dump(1) command translates the Paje trace file 'FILE' into a
CSV-like textual format (described below). It is useful for analyzing
the behavior of parallel and distributed applications that were traced
with any library that generates trace files in the Paje file
format. Once the contents of the trace file are dumped in this
CSV-like form, you are free to analyze them however you want; for
example, you can use R to draw scatter plots and Gantt charts. If
'FILE' is not provided, the standard input is used.

By default, *pj_dump* reads the trace file from the beginning until
the end of file. During this process, *pj_dump* relies on the Paje
library to recreate in memory the behavior recorded in the trace
file. This means that *pj_dump* keeps all the contents of the trace
file in memory, even if the input is very large. Once the whole trace
has been simulated, *pj_dump* writes the information in the CSV-like
textual format described in the OUTPUT DESCRIPTION section.

You can change the default behavior of *pj_dump* by providing the
parameters *--start=START* and *--end=END*, where START and END are
valid timestamps of the input trace. If provided, only the contents of
the trace between START and END are dumped. Note that even in this
case, *pj_dump* simulates the whole trace file to preserve the
semantics of the traced behavior. Another way to change the default
behavior is the *--stop-at=TIME* parameter. If provided, *pj_dump*
reads the trace file up to timestamp TIME (assuming that the trace
file is completely time-ordered) and dumps what has been simulated
until then. The *--no-strict* switch should be avoided; use it only
with old Paje trace files that have old field names in their event
definitions. The *--ignore-incomplete-links* switch makes *pj_dump*
silently ignore incomplete links. More details on this switch are
given in the OUTPUT DESCRIPTION section below.
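
If you drive *pj_dump* from a script, these parameters map directly
onto its command line. The Python sketch below assumes only that
*pj_dump* is on the PATH and uses only the long option names listed in
the OPTIONS section; the helper names `pj_dump_args` and `run_pj_dump`
are hypothetical.

```python
import subprocess


def pj_dump_args(trace, start=None, end=None, stop_at=None,
                 ignore_incomplete_links=False, user_defined=False):
    """Build a pj_dump command line from the documented options."""
    args = ["pj_dump"]
    if start is not None:
        args.append(f"--start={start}")
    if end is not None:
        args.append(f"--end={end}")
    if stop_at is not None:
        args.append(f"--stop-at={stop_at}")
    if ignore_incomplete_links:
        args.append("--ignore-incomplete-links")
    if user_defined:
        args.append("--user-defined")
    if trace is not None:
        args.append(trace)  # without FILE, pj_dump reads the standard input
    return args


def run_pj_dump(trace, **options):
    """Run pj_dump and return its standard output as a list of lines."""
    result = subprocess.run(pj_dump_args(trace, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, `run_pj_dump("trace.paje", start=10, end=20)` (with a
hypothetical file name) returns the dumped lines between timestamps 10
and 20. Because `check=True` is set, a non-zero exit status of
*pj_dump* raises `subprocess.CalledProcessError`.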


OPTIONS
-------

*pj_dump* accepts the following options:

*-a, --stop-at*='TIME'::
    Stop the trace simulation at TIME.

*-s, --start*='START'::
    Dump starts at timestamp START (instead of timestamp 0).

*-e, --end*='END'::
    Dump ends at timestamp END (instead of at the end of file).

*-n, --no-strict*::
    Support old field names in event definitions.

*-z, --ignore-incomplete-links*::
    Ignore incomplete links without warnings.

*-u, --user-defined*::
    Dump user-defined fields. See USER-DEFINED FIELDS section below.

*--type-hierarchy*='FILE'::
    Dump the type hierarchy in CSV format to FILE.
    
*--entity-hierarchy*='FILE'::
    Dump the entity hierarchy in CSV format to FILE.

*-f, --flex*::
    Use an alternative file reader based on flex/bison (experimental).

*-?, --help*::
    Show all the available options.

*--usage*::
    Give a short usage message.

INPUT DESCRIPTION
-----------------

The pj_dump(1) command expects an input that follows the Paje file
format (as described in the PDF document listed in the RESOURCES
section of this page). If 'FILE' is not provided, pj_dump(1) reads
from the standard input.


OUTPUT DESCRIPTION
------------------

It is easier to understand what follows if you are acquainted with
the Paje terminology (Container, State, Variable, Link, Event and the
information attached to each of these). Take a look at the
description of the Paje File Format (link below in the RESOURCES
section) for further details.

The contents of the lines generated by the pj_dump(1) command are
separated by commas, defining the columns. So, a line like this:

    Container, 0, LINK, 0, 4.48514, 4.48514, 9

has seven columns. The first column is always one of: Container,
State, Variable, Event or Link. The remaining columns depend on the
first one. Here is a summary of the five types of lines found in the
output of pj_dump(1):

  Container, parentContainer, containerType, startTime, endTime, duration, name
  State, container, stateType, startTime, endTime, duration, imbrication, value
  Variable, container, variableType, startTime, endTime, duration, value
  Event, container, eventType, time, value
  Link, container, linkType, startTime, endTime, duration, value, startContainer, endContainer, key

See below a detailed description with examples for each of them.
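
The five layouts can be loaded from a script as easily as from R. The
Python sketch below turns one output line into a dictionary keyed by
the column names used in this section. `parse_line` is a hypothetical
helper; it splits naively on commas, so it assumes that no value
contains a comma. The Link layout includes the trailing key column
shown in the Link example.

```python
# Column names per record type, after the leading record-type column.
COLUMNS = {
    "Container": ["parentContainer", "containerType", "startTime",
                  "endTime", "duration", "name"],
    "State": ["container", "stateType", "startTime", "endTime",
              "duration", "imbrication", "value"],
    "Variable": ["container", "variableType", "startTime", "endTime",
                 "duration", "value"],
    "Event": ["container", "eventType", "time", "value"],
    "Link": ["container", "linkType", "startTime", "endTime", "duration",
             "value", "startContainer", "endContainer", "key"],
}


def parse_line(line):
    """Turn one pj_dump output line into a dict, or None if unrecognized."""
    fields = [f.strip() for f in line.split(",")]
    names = COLUMNS.get(fields[0])
    if names is None:
        return None  # not a record line
    record = dict(zip(names, fields[1:]))
    record["kind"] = fields[0]
    # Columns beyond the default layout, e.g. user-defined fields (-u).
    record["extra"] = fields[1 + len(names):]
    return record
```

Applied to each line of *pj_dump* output (for instance read from
`sys.stdin` in a pipeline), it yields records that can be filtered by
`kind`; lines that do not start with a record type map to `None`.
Values are kept as strings, so numeric conversion is left to the
caller.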

Container
~~~~~~~~~

All lines starting with _Container_ look like this:

    Container, 0, HOST, 0, 4.48514, 4.48514, Tremblay

1. "Container"
2. "0" - The name of the parent container
3. "HOST" - The type of this container
4. "0" - The starting time
5. "4.48514" - The finish time
6. "4.48514" - The duration
7. "Tremblay" - The name of this container
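
Because each Container line names its parent, the container hierarchy
can be rebuilt from the dump alone (the *--entity-hierarchy* option
writes a separate hierarchy file; this sketch uses only the main
dump). A minimal Python sketch, assuming values contain no commas;
`container_tree` is a hypothetical helper, and the containers
`Jupiter` and `worker-1` and the type `PROCESS` in the sample are
hypothetical.

```python
from collections import defaultdict


def container_tree(lines):
    """Map each parent container name to the names of its children."""
    children = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "Container":
            parent, name = fields[1], fields[6]
            children[parent].append(name)
    return dict(children)


dump = [
    "Container, 0, HOST, 0, 4.48514, 4.48514, Tremblay",
    "Container, 0, HOST, 0, 4.48514, 4.48514, Jupiter",  # hypothetical
    "Container, Tremblay, PROCESS, 0, 4.48514, 4.48514, worker-1",  # hypothetical
]
print(container_tree(dump))
# → {'0': ['Tremblay', 'Jupiter'], 'Tremblay': ['worker-1']}
```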

State
~~~~~

All lines starting with _State_ look like this:

    State, node48, SERVICE, 691, 692, 1, 0, booked

1. "State"
2. "node48" - The name of the container
3. "SERVICE" - The type of this state
4. "691" - The starting time
5. "692" - The finish time
6. "1" - The duration
7. "0" - The imbrication level
8. "booked" - The value of the state
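
States are the basis of Gantt charts, and a common first analysis is
how long each container spent in each state value. The Python sketch
below sums durations per (container, stateType, value) at a single
imbrication level, because nested states at deeper levels overlap
their parents in time and would otherwise be counted twice.
`time_per_state` is a hypothetical helper, and the second and third
sample lines are hypothetical.

```python
from collections import defaultdict


def time_per_state(lines, level=0):
    """Sum State durations per (container, stateType, value) at one level."""
    totals = defaultdict(float)
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] != "State":
            continue
        # Deeper levels overlap their parent states, so keep only one level.
        if float(fields[6]) == level:
            totals[(fields[1], fields[2], fields[7])] += float(fields[5])
    return dict(totals)


dump = [
    "State, node48, SERVICE, 691, 692, 1, 0, booked",
    "State, node48, SERVICE, 692, 695, 3, 0, idle",    # hypothetical
    "State, node48, SERVICE, 695, 697, 2, 0, booked",  # hypothetical
]
print(time_per_state(dump))
# → {('node48', 'SERVICE', 'booked'): 3.0, ('node48', 'SERVICE', 'idle'): 3.0}
```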

Variable
~~~~~~~~

All lines starting with _Variable_ look like this:

    Variable, Tremblay, pcompute, 2.15357, 2.17013, 0.016554, 9.8095e+07

1. "Variable"
2. "Tremblay" - The name of the container
3. "pcompute" - The type of this variable
4. "2.15357" - The starting time
5. "2.17013" - The ending time
6. "0.016554" - The duration
7. "9.8095e+07" - The value of the variable
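
A Variable line states that the variable held one value during the
interval [startTime, endTime], so a time-weighted mean is the sum of
each value multiplied by its duration, divided by the total duration.
A minimal Python sketch; `weighted_mean` is a hypothetical helper and
the sample values are hypothetical.

```python
def weighted_mean(lines, container, variable_type):
    """Time-weighted mean of one variable type in one container."""
    weighted_sum = 0.0
    total_time = 0.0
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] != "Variable":
            continue
        if fields[1] != container or fields[2] != variable_type:
            continue
        duration, value = float(fields[5]), float(fields[6])
        weighted_sum += value * duration  # value is constant over the interval
        total_time += duration
    return weighted_sum / total_time if total_time > 0 else None


dump = [  # hypothetical values
    "Variable, Tremblay, pcompute, 0, 1, 1, 100",
    "Variable, Tremblay, pcompute, 1, 4, 3, 20",
]
print(weighted_mean(dump, "Tremblay", "pcompute"))  # → 40.0
```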

Event
~~~~~

All lines starting with _Event_ look like this:

    Event, Tremblay, msmark, 3.4286, finish_send_tasks

1. "Event"
2. "Tremblay" - The name of the container
3. "msmark" - The type of this event
4. "3.4286" - The instant in time when this event took place
5. "finish_send_tasks" - The value of the event
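
Events are instantaneous, so the natural aggregate is a count. The
Python sketch below counts event values whose timestamp falls inside
an optional window [start, end]. `count_events` is a hypothetical
helper, and the second sample line is hypothetical.

```python
from collections import Counter


def count_events(lines, start=float("-inf"), end=float("inf")):
    """Count Event records per value whose time lies in [start, end]."""
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "Event" and start <= float(fields[3]) <= end:
            counts[fields[4]] += 1
    return counts


dump = [
    "Event, Tremblay, msmark, 3.4286, finish_send_tasks",
    "Event, Tremblay, msmark, 5.0, finish_send_tasks",  # hypothetical
]
print(count_events(dump, end=4.0))  # → Counter({'finish_send_tasks': 1})
```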

Link
~~~~

All lines starting with _Link_ look like this:

    Link, 0, 0-HOST1-LINK4, 0, 0, 0, G, Tremblay, 9, mpi_123

1. "Link"
2. "0" - The name of the container
3. "0-HOST1-LINK4" - The type of this link
4. "0" - The starting time
5. "0" - The ending time
6. "0" - The duration
7. "G" - The value of this link
8. "Tremblay" - The starting container
9. "9" - The ending container
10. "mpi_123" - The unique key
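
Each Link connects a starting container to an ending container, so
counting links per pair yields a simple communication matrix. The
Python sketch below reads columns 8 and 9 as listed above.
`link_matrix` is a hypothetical helper, and the second and third
sample lines are hypothetical.

```python
from collections import Counter


def link_matrix(lines):
    """Count Link records per (startContainer, endContainer) pair."""
    pairs = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "Link":
            pairs[(fields[7], fields[8])] += 1
    return pairs


dump = [
    "Link, 0, 0-HOST1-LINK4, 0, 0, 0, G, Tremblay, 9, mpi_123",
    "Link, 0, 0-HOST1-LINK4, 1, 2, 1, G, Tremblay, 9, mpi_124",  # hypothetical
    "Link, 0, 0-HOST1-LINK4, 2, 3, 1, G, 9, Tremblay, mpi_125",  # hypothetical
]
for (src, dst), n in sorted(link_matrix(dump).items()):
    print(src, "->", dst, n)
# 9 -> Tremblay 1
# Tremblay -> 9 2
```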

Incomplete Links
^^^^^^^^^^^^^^^^

According to the description of the Paje File Format, a link is formed
by two events: PajeStartLink and PajeEndLink. These events are matched
by the Paje Simulator using a key that is provided in the trace
file. If one of these two events is missing for some reason and the
trace file ends (or the container is destroyed), the simulation ends
with **incomplete links**. By default, the Paje Simulator, and
consequently *pj_dump*, treats these links as errors and lists them as
follows:

    $ pj_dump  ~/tracefile.paje
    List of incomplete links in container '0':
    Link, 0, MSG_PROCESS_TASK_LINK, 0, -1, 0, SR, broadcaster-12, NULL
    Link, 0, MSG_PROCESS_TASK_LINK, 0.00013, -1, 0, SR, broadcaster-13, NULL
    Link, 0, MSG_PROCESS_TASK_LINK, 0.002868, -1, 0, SR, broadcaster-13, NULL
    (...)
    PajeLinkException: Incomplete links at the end of container with name '0'

The best course of action when this happens is to fix the tracer or
the converter that generated the trace, since it probably indicates an
error during the execution. If you consider this error acceptable, you
can pass the *-z* switch to *pj_dump* to tell the Paje Simulator to
ignore incomplete links. The whole trace file is then dumped and all
errors concerning incomplete links are silently ignored. Use with
caution.

USER-DEFINED FIELDS
-------------------

User-defined fields are a feature of the Paje trace file format for
adding information to the trace that does not belong to the
traditional fields of an event definition. An event definition with
four user-defined fields (Size, Params, Footprint and Tag) looks like
this:

    %EventDef PajeSetState 20
    %  Time      date
    %  Container string
    %  Type      string
    %  Value     string
    %  Size      string
    %  Params    string
    %  Footprint string
    %  Tag       string
    %EndEventDef

Dumping of user-defined fields is disabled by default in pj_dump. You
can enable it by passing *-u* (or *--user-defined*) as an
argument. When doing so, the CSV-like output of pj_dump differs from
the definition given in OUTPUT DESCRIPTION: besides the default fields
for each entity and container of the trace, each line has additional
fields that correspond to the user-defined fields. They appear in the
same order as in the corresponding event definition. So a *State*
defined with event definition 20 above has four additional fields in
the CSV-like output.
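
The following Python sketch shows one way to match the extra columns
with their names. It reads the field names from an event definition
like the one above and treats every field other than Time, Container,
Type and Value as user-defined (this standard set applies to
PajeSetState; other event types have other standard fields). It
assumes that the extra columns are appended after the eight default
State columns and that no value contains a comma. The helper names and
the sample State line with its values are hypothetical.

```python
# Standard fields of PajeSetState; any other field is user-defined.
STANDARD_SET_STATE_FIELDS = {"Time", "Container", "Type", "Value"}


def user_defined_names(event_def):
    """Return the user-defined field names of an event definition, in order."""
    names = []
    for line in event_def.splitlines():
        parts = line.lstrip("%").split()
        # Field lines have exactly two tokens (name, type); the
        # %EventDef (3 tokens) and %EndEventDef (1 token) lines are skipped.
        if len(parts) == 2 and parts[0] not in STANDARD_SET_STATE_FIELDS:
            names.append(parts[0])
    return names


def split_state_line(line, names):
    """Split a State line dumped with -u into default columns and extras."""
    fields = [f.strip() for f in line.split(",")]
    default, extra = fields[:8], fields[8:]
    return default, dict(zip(names, extra))


EVENT_DEF = """\
%EventDef PajeSetState 20
%  Time      date
%  Container string
%  Type      string
%  Value     string
%  Size      string
%  Params    string
%  Footprint string
%  Tag       string
%EndEventDef"""

names = user_defined_names(EVENT_DEF)
default, extra = split_state_line(
    "State, node48, SERVICE, 691, 692, 1, 0, booked, 1024, p1, f00d, t7",  # hypothetical
    names,
)
print(names)  # → ['Size', 'Params', 'Footprint', 'Tag']
print(extra)  # → {'Size': '1024', 'Params': 'p1', 'Footprint': 'f00d', 'Tag': 't7'}
```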

RESOURCES
---------

Description of the Paje trace file format:
<https://github.com/schnorr/pajeng/blob/master/doc/lang-paje/lang-paje.pdf>

Main web site:
<http://github.com/schnorr/pajeng/>


REPORTING BUGS
--------------

Report pj_dump bugs to <http://github.com/schnorr/pajeng/issues>.


COPYRIGHT
---------

Copyright \(C) 2012-2014 Lucas M. Schnorr. Free use of this software is granted under the terms of the GNU General Public License (GPL).


SEE ALSO
--------

*pj_validate(1)*
