Statistics API
The Statistics class calculates and stores statistics about the data as it passes through each plugin. Each plugin object contains a Statistics object called stats_obj, through which stats are calculated and accessed. However, the dictionaries that hold the statistics for each plugin are class attributes, so they are shared across all Statistics objects.
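A minimal, hypothetical sketch shows why the shared dictionaries matter. SharedStatsDemo is an invented name, not Savu's class. A dictionary defined at class level is a single object seen by every instance, so stats stored through one instance are visible through all the others.

```python
class SharedStatsDemo:
    """Hypothetical stand-in showing a class-level dictionary shared by instances."""

    # Defined on the class, not in __init__, so every instance shares it.
    global_stats = {}

    def __init__(self, p_num):
        self.p_num = p_num

    def store(self, stats):
        # Mutate the shared dict in place; assigning self.global_stats = {...}
        # would instead create a separate per-instance attribute.
        SharedStatsDemo.global_stats.setdefault(self.p_num, []).append(stats)


stats_a = SharedStatsDemo(p_num=1)
stats_b = SharedStatsDemo(p_num=2)
stats_a.store({"max": 4.0})
print(stats_b.global_stats)  # {1: [{'max': 4.0}]}
```

The same sharing is what lets one plugin read stats produced by another.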
Calculating slice statistics
Statistics are calculated frame by frame, in tandem with the processing of each frame/slice by a plugin. This is done by set_slice_stats, which is called for every frame. Each stats object holds a dictionary called stats that contains one list per stat (max, min, mean, etc.). Every time set_slice_stats is called, the stats for the latest slice are appended to these lists. The statistics themselves are calculated in calc_slice_stats.
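The per-frame bookkeeping described above can be sketched as follows. This is a simplified, hypothetical class (SliceStatsDemo) that computes only max, min and mean with NumPy. It is not Savu's implementation, and it ignores padding and base slices.

```python
import numpy as np


class SliceStatsDemo:
    """Hypothetical sketch of frame-by-frame stat collection."""

    def __init__(self, stats_key=("max", "min", "mean")):
        self.stats_key = list(stats_key)
        # One list per stat; each call to set_slice_stats appends one entry.
        self.stats = {key: [] for key in self.stats_key}

    def calc_slice_stats(self, my_slice):
        # Compute the stats for a single slice and return them as a dict.
        funcs = {"max": np.max, "min": np.min, "mean": np.mean}
        return {key: float(funcs[key](my_slice)) for key in self.stats_key}

    def set_slice_stats(self, my_slice):
        # Append this slice's stats to the per-stat lists.
        for key, value in self.calc_slice_stats(my_slice).items():
            self.stats[key].append(value)


volume = np.arange(24, dtype=float).reshape(3, 2, 4)  # 3 slices of 2x4
slice_demo = SliceStatsDemo()
for frame in volume:
    slice_demo.set_slice_stats(frame)
print(slice_demo.stats["max"])  # [7.0, 15.0, 23.0]
```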
Statistics.set_slice_stats(my_slice, base_slice=None, pad=True)
Sets slice stats for the current slice.
Parameters
my_slice – The slice whose stats are being set.
base_slice – A base slice to calculate residuals from, used to calculate the RMSD.
pad – Whether the slice is padded (can usually be left as True even if the slice is not padded).
Statistics.calc_slice_stats(my_slice, base_slice=None, pad=True)
Calculates and returns slice stats for the current slice.
Parameters
my_slice – The slice whose stats are being calculated.
base_slice – A base slice to calculate residuals from, used to calculate the RMSD.
pad – Whether the slice is padded (can usually be left as True even if the slice is not padded).
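When a base_slice is supplied, the residuals between the two slices give an RMSD. The following is a minimal sketch assuming plain NumPy arrays. slice_rmsd and slice_nrmsd are hypothetical helpers. Normalising by the value range of the base slice is an assumption; Savu may normalise NRMSD differently.

```python
import numpy as np


def slice_rmsd(my_slice, base_slice):
    """Root-mean-square deviation of my_slice from base_slice (hypothetical helper)."""
    residuals = np.asarray(my_slice, dtype=float) - np.asarray(base_slice, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))


def slice_nrmsd(my_slice, base_slice):
    # Assumption: normalise by the value range of base_slice.
    base = np.asarray(base_slice, dtype=float)
    value_range = float(base.max() - base.min())
    if value_range == 0.0:
        return float("nan")  # a constant base slice has no range to normalise by
    return slice_rmsd(my_slice, base) / value_range


before = np.array([[0.0, 2.0], [4.0, 6.0]])
after = before + np.array([[1.0, -1.0], [1.0, -1.0]])
print(slice_rmsd(after, before))   # 1.0
print(slice_nrmsd(after, before))  # 0.16666666666666666
```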
Calculating volume statistics
At the end of the plugin, the method set_volume_stats is called, which deals with all the post-processing tasks that need to be carried out. This method combines slice stats to create volume stats; for example, the mean of all the slice-wide means is calculated to create a volume mean. These volume stats are then inserted into global_stats, the class-wide dictionary that stores the stats for all plugins. The keys in global_stats are plugin numbers, and each value is a list containing the stats for that plugin. Usually this list has just one item, but for iterative plugins it has one item per iteration of the plugin.
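The following is a minimal sketch of how slice stats might be reduced to volume stats and stored per plugin number, assuming the slice stats are plain lists of floats. set_volume_stats here is a hypothetical function, not Savu's method. A module-level dictionary stands in for the class-wide global_stats.

```python
# Hypothetical stand-in for the class-wide store:
# plugin number -> list of volume-stat dicts, one per run of the plugin.
global_stats = {}


def set_volume_stats(p_num, slice_stats):
    """Combine per-slice lists into volume-wide stats (sketch)."""
    volume = {
        "max": max(slice_stats["max"]),
        "min": min(slice_stats["min"]),
        # Mean of the slice means; this equals the true volume mean only
        # when every slice has the same number of elements.
        "mean": sum(slice_stats["mean"]) / len(slice_stats["mean"]),
    }
    global_stats.setdefault(p_num, []).append(volume)
    return volume


slice_stats = {"max": [7.0, 15.0, 23.0], "min": [0.0, 8.0, 16.0], "mean": [3.5, 11.5, 19.5]}
set_volume_stats(3, slice_stats)
set_volume_stats(3, slice_stats)  # a second iteration of the same plugin
print(len(global_stats[3]))  # 2
```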
Accessing stats
Stats can be accessed within a plugin using the following methods:
Statistics.get_stats(p_num=None, stat=None, instance=-1)
Returns stats associated with a certain plugin, given the plugin number (its place in the process list).
Parameters
p_num – Plugin number of the plugin whose stats are being fetched. If p_num <= 0, it is relative to the plugin number of the plugin currently being run; e.g. if the current plugin number is 5, p_num = -2 returns the stats of the third plugin. By default, stats for the current plugin are returned.
stat – The stat to fetch, e.g. ‘max’, ‘mean’ or ‘median_std_dev’. If left blank, the whole dictionary of stats is returned: {‘max’: , ‘min’: , ‘mean’: , ‘mean_std_dev’: , ‘median_std_dev’: , ‘NRMSD’: }
instance – Where multiple sets of stats are associated with a plugin due to iterative loops or multi-parameters, specifies which set to retrieve, e.g. 3 to retrieve the stats from the third run of the plugin. Pass ‘all’ to get a list of all sets. By default, the most recent set is retrieved.
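A hypothetical, standalone function can illustrate the lookup rules for p_num and instance. It takes global_stats and the current plugin number as explicit arguments, whereas in Savu these come from the stats object. It assumes that positive instance values are 1-based and that negative values count back from the latest run.

```python
def get_stats(global_stats, current_p_num, p_num=None, stat=None, instance=-1):
    """Hypothetical standalone version of the documented lookup rules."""
    if p_num is None:
        p_num = current_p_num  # default: the plugin currently running
    elif p_num <= 0:
        p_num = current_p_num + p_num  # relative: 5 + (-2) -> plugin 3
    runs = global_stats[p_num]  # one stats dict per run of the plugin

    def pick(run):
        # Return the whole dictionary, or a single named stat.
        return run if stat is None else run[stat]

    if instance == "all":
        return [pick(run) for run in runs]
    # Assumption: positive instances are 1-based (3 -> third run);
    # negative ones count back from the latest run (-1 -> most recent).
    index = instance - 1 if instance > 0 else instance
    return pick(runs[index])


global_stats = {
    3: [{"max": 1.0}, {"max": 2.0}, {"max": 5.0}],  # an iterative plugin
    5: [{"max": 9.0}],
}
print(get_stats(global_stats, current_p_num=5, p_num=-2, stat="max"))  # 5.0
```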
Statistics.get_stats_from_name(plugin_name, n=None, stat=None, instance=-1)
Returns stats associated with a certain plugin.
Parameters
plugin_name – Name of the plugin whose stats are being fetched.
n – Where plugin_name appears multiple times in the process list, specifies the nth instance. If not specified, the first (or only) instance is selected.
stat – The stat to fetch, e.g. ‘max’, ‘mean’ or ‘median_std_dev’. If left blank, the whole dictionary of stats is returned: {‘max’: , ‘min’: , ‘mean’: , ‘mean_std_dev’: , ‘median_std_dev’: , ‘NRMSD’: }
instance – Where multiple sets of stats are associated with a plugin due to iterative loops or multi-parameters, specifies which set to retrieve, e.g. 3 to retrieve the stats from the third run of the plugin. Pass ‘all’ to get a list of all sets. By default, the most recent set is retrieved.
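Resolving a plugin name to a plugin number could look like the sketch below. The process list is modelled as a plain list of names, and positions are assumed to be 1-based. plugin_number_from_name and the plugin names are hypothetical.

```python
def plugin_number_from_name(process_list, plugin_name, n=None):
    """Return the 1-based position of the nth occurrence of plugin_name (sketch)."""
    positions = [i + 1 for i, name in enumerate(process_list) if name == plugin_name]
    if not positions:
        raise ValueError(f"{plugin_name!r} is not in the process list")
    # n=None selects the first (or only) occurrence; n is assumed 1-based.
    return positions[(n or 1) - 1]


# Hypothetical process list; the names are illustrative only.
process_list = ["ExampleLoader", "ExampleFilter", "ExampleRecon", "ExampleFilter"]
print(plugin_number_from_name(process_list, "ExampleFilter", n=2))  # 4
```

The resulting number could then be passed as p_num to a lookup such as get_stats.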
Statistics.get_stats_from_dataset(dataset, stat=None, instance=-1)
Returns stats associated with a dataset.
Parameters
dataset – The dataset whose stats are being fetched.
stat – The stat to fetch, e.g. ‘max’, ‘mean’ or ‘median_std_dev’. If left blank, the whole dictionary of stats is returned: {‘max’: , ‘min’: , ‘mean’: , ‘mean_std_dev’: , ‘median_std_dev’: , ‘NRMSD’: }
instance – Where multiple sets of stats are associated with a dataset due to iterative loops or multi-parameters, specifies which set to retrieve, e.g. 3 to retrieve the stats from the third run of a plugin. Pass ‘all’ to get a list of all sets. By default, the most recent set is retrieved.
Which stats are calculated
The combination of stats calculated for a plugin depends on its stats object’s stats_key, a list containing the names of all the stats to be calculated. By default, it is set to max, min, mean, mean_std_dev, median_std_dev, NRMSD and zeros. It can be modified using the set_stats_key method, which should be called in the plugin's setup method (if at all). Use this method rather than changing stats_key directly: it checks that all the stats it is given are valid, and it also updates an attribute called slice_stats_key, the list of all the slice-wide stats to be calculated.
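The reason to prefer a setter can be shown with a hypothetical sketch (StatsKeyDemo is not Savu's class). It validates the new key before assigning it and keeps a derived slice-wide key in sync. The mapping of NRMSD to a per-slice RMSD is an assumption made for illustration; Savu's actual mapping may differ.

```python
class StatsKeyDemo:
    """Hypothetical sketch of why a setter beats assigning stats_key directly."""

    # The defaults listed in the documentation.
    VALID_STATS = ("max", "min", "mean", "mean_std_dev", "median_std_dev", "NRMSD", "zeros")
    # Assumption for illustration: NRMSD is built from a per-slice RMSD;
    # every other stat comes from a per-slice stat of the same name.
    SLICE_DEPENDENCIES = {"NRMSD": "RMSD"}

    def __init__(self):
        self.set_stats_key(list(self.VALID_STATS))

    def set_stats_key(self, stats_key):
        # Validate first, so an invalid call leaves the old keys untouched.
        invalid = [key for key in stats_key if key not in self.VALID_STATS]
        if invalid:
            raise ValueError(f"Unknown stats: {invalid}")
        self.stats_key = list(stats_key)
        # Keep the derived slice-wide key in sync with stats_key.
        self.slice_stats_key = [self.SLICE_DEPENDENCIES.get(key, key) for key in self.stats_key]


key_demo = StatsKeyDemo()
key_demo.set_stats_key(["max", "NRMSD"])
print(key_demo.slice_stats_key)  # ['max', 'RMSD']
```

Assigning key_demo.stats_key directly would skip both the validation and the update of slice_stats_key.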
Writing stats to file and datasets
Stats are written to an HDF5 file and also to the output datasets of each plugin. Writing to file is handled by the following method.
Statistics._write_stats_to_file(p_num=None, plugin_name=None, comm=MPI.COMM_WORLD)
Writes stats to an HDF5 (.h5) file. This file is used to create figures and tables from the stats.
Parameters
p_num – The plugin number of the plugin the stats belong to (usually left as None except in special cases).
plugin_name – The name of the plugin the stats belong to (usually left as None except in special cases).
comm – The MPI communicator the plugin is using.
MPI
When Savu is run with MPI, the slice-wide stats are split across processes, just as the slices are. To calculate volume-wide stats correctly, the stats from all processes must be collected together. This is done by the following method.
Statistics._combine_mpi_stats(slice_stats, comm=MPI.COMM_WORLD)
Combines slice stats from different processes so that volume stats can be calculated.
Parameters
slice_stats – Slice stats (each process has a different set).
comm – The MPI communicator being used.
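The gathering step can be sketched with any communicator that offers an mpi4py-style allgather, which returns one object per rank in rank order. combine_mpi_stats and TwoRankFakeComm are hypothetical names. The fake communicator lets the sketch run without MPI, and the sketch assumes every rank uses the same stat keys.

```python
def combine_mpi_stats(slice_stats, comm):
    """Merge every process's per-slice stat lists (hypothetical sketch).

    comm is assumed to behave like an mpi4py communicator, whose
    allgather(obj) returns one object per rank, in rank order.
    """
    gathered = comm.allgather(slice_stats)
    combined = {key: [] for key in slice_stats}
    for rank_stats in gathered:
        for key, values in rank_stats.items():
            combined[key].extend(values)
    return combined


class TwoRankFakeComm:
    """Serial stand-in for testing without MPI: acts as rank 0 of two ranks."""

    def __init__(self, rank1_stats):
        self.rank1_stats = rank1_stats

    def allgather(self, obj):
        return [obj, self.rank1_stats]


local = {"max": [7.0], "mean": [3.5]}
remote = {"max": [15.0, 23.0], "mean": [11.5, 19.5]}
combined = combine_mpi_stats(local, TwoRankFakeComm(remote))
print(max(combined["max"]))  # 23.0
```

Under MPI, the same function could be called as combine_mpi_stats(local, MPI.COMM_WORLD) after `from mpi4py import MPI`. Because allgather is used rather than gather, every rank ends up with the full lists and can compute the volume stats itself.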