Create analysis pipeline¶
Consider parseq_XES_scan and parseq_XAS as examples for the development steps described below.
Centralized facilities¶
Nodes and transformations¶
A pipeline for data processing of XAS spectra. This pipeline has multiple entry nodes and three fitting routines.A pipeline for data processing of XAS spectra. This pipeline has multiple entry nodes and three fitting routines.
An analysis pipeline consists of data nodes and transformations that connect the nodes, see an example on the right.
Data nodes define array names that will appear as attributes of data objects,
e.g as item.x
, with x
being an attribute of a data object item
. The
array values will be read from files or calculated from other arrays. The
pipeline can be used with or without GUI widgets. In the former case, the
defined node arrays will appear in the node plot: 1D, 2D or 3D (a stack of 2D
plots).
EXAFS data arranged in a tree. The item tooltips present data flow information, array sizes and error messages.
The data tree model (as in the Qt’s Model/View Programming concept) is a single object throughout the pipeline. In contrast, data tree widgets, see an example on the left, are present in each data node, not as a single tree instance, with the idea also to serve as a plot legend. Data can be rearranged by the user: ordered, renamed, grouped and removed. User selection in the data model is common for all transformation nodes. For 1D data, the line color is the same in all data nodes. 1D data plotting can optionally be done for several curves simultaneously: being selected either dynamically (via mouse selection) or statically (via check boxes). 2D and 3D data plotting is always done for one selected data object.
Each transformation class defines a dictionary of transformation parameters and default values for them. It also defines a static method that calculates data arrays. The transformation parameters will be attributed to each data object. The parameter values are supposed to be changed in GUI widgets. This change can be done simultaneously for one or several active data objects. Alternatively, any parameter can be copied to one or several later selected data.
Each transformation can optionally define the number of threads or processes
that will start in parallel to run the transformation of several data items.
The multiprocessing python API requires the main transformation method as a
static or class method (not an instance method). Additionally, for the sake
of inter-process data transfer in multiprocessing, all input and output arrays
have to be added to inArrays
and outArrays
lists (attributes of the
transformation class).
An example of the apply/reset popup menu on a control element.
In the pipeline GUI widgets, all interactive GUI elements can be registered
using a few dedicated methods of the base class PropWidget
. The
registration will enable (a) automatic GUI update from the active data and will
run transformations given the updated GUI elements, so no
signal slots
are typically required. The registration will also enable (b) copying
transformation parameters to other data by means of popup menus on each GUI
element, see on the right.
The transformation class docstrings will be built by ParSeq at the application strat up time using Sphinx into an html file and will be displayed in a help panel close to the transformation widget.
Undo and redo¶
TODO
File tree views and file formats¶
TODO
Metadata¶
TODO
Plots¶
TODO
Fits¶
EXAFS fit widget as an example of ParSeq fit widgets. It was built on top of the ParSeq base fit and base fit widget classes.EXAFS fit widget as an example of ParSeq fit widgets. It was built on top of the ParSeq base fit and base fit widget classes.
Data nodes can optionally host curve fitting routines. Similarly to transformations, fitting solvers can run in parallel for several data items. Fitting parameters can be constrained or tied to other parameters, also to parameters of another data item fit. See an example fit widget on the right.
Cross-data combinations¶
TODO
Standard data corrections¶
TODO
Project saving with data export and plot script generation¶
TODO
Command-line interface and start options¶
TODO
Error handling¶
TODO
Help system¶
TODO
Performance profiling¶
TODO
Prepare pipeline metadata and images¶
Create a project directory for the pipeline. Create __init__.py file that defines metadata about the project. Note that pipeline applications and ParSeq itself use the module parseq.core.singletons as a means to store global variables; the pipeline’s __init__.py module defines a few of them. Together with the docstrings of the module, these metadata will appear in the About dialog.
Create doc/_images directory and put an application icon there. The pipeline transformations will have class docstrings that may also include images; those images should be located here, in doc/_images.
Make data nodes¶
To define a node class means to name all plot arrays, define their roles, labels and units. The data containers may also have other array attributes that do not participate in plots; these are not to be declared.
Make data transformations¶
Start making a transformation class with defining a dictionary defaultParams
of default parameter values. Decide on using multiprocessing/multithreading by
specifying nThreads or nProcesses. If any of these numbers is > 1 (the
default values are both 1), specify two lists of array names: inArrays and
outArrays. Define a static or a class method Transform.run_main()
.
Note, it can have a few signatures. Within the method, get the actual
transformation parameters from the dictionary data.transformParams and the
defined data arrays as attributes of data, e.g. data.x
.
For expensive transformations, you should update the progress status.
For accessing arrays of other data objects, use a different signature of
Transform.run_main()
that contains the allData argument. Note that in
this case multiprocessing is not possible.
Make GUI widgets¶
The widgets that control transformation parameters are descendants of
PropWidget
. The main methods of that class are
PropWidget.registerPropWidget()
and
PropWidget.registerPropGroup()
. They use the Qt signal/slot mechanism to
update the corresponding transformation parameters; the user does not have to
explicitly implement the reaction slots. Additionally, these methods enable
copying transformation parameters to other data by means of popup menus, update
the GUI upon selecting data objects in the data tree, start the corresponding
transformation and operate undo and redo lists.
Because each transformation already has a set of default parameter values, these GUI widgets can gradually grow during the development time, without compromising the data pipeline functionality.
Provide docstrings in reStructuredText markup. They will be built by Sphinx and displayed near the corresponding widgets.
Make fitting worker classes¶
Similarly to a transformation class, a fitting class defines a dictionary
defaultParams, defines multiprocessing/multithreading and a static or a class
method Fit.run_main()
.
Make data pipeline¶
This is a small module that instantiates the above nodes, transformations, fits and widgets and connects them together.
Create test data tree¶
Put a few data files in a local folder (i.e. data) and create a module that defines a function that loads the data into a data tree, defines suitable transformation parameters and launches the first transformation (the next ones will start automatically).
Create pipeline starter¶
The starter should understand command line arguments and prepare options for loading the test data and to run the pipeline with and without GUI.
Creating development versions of analysis application¶
Copy the whole folder of the application to the same level but with a different name, e.g. append a version suffix. In the import section in the start script change the import name to the above created folder name. Done.