Statistics API

The Statistics class calculates and stores statistics about the data as it passes through each plugin. Each plugin object contains a statistics object called stats_obj, through which stats are calculated and accessed. However, the dictionaries that contain the statistics for each plugin are class attributes, so they are shared across all statistics objects.
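The sharing comes from ordinary Python class-attribute semantics: a mutable object assigned in the class body is a single object seen by every instance, whereas attributes set in `__init__` belong to each instance. The minimal sketch below shows this; `StatsSharingDemo` is a hypothetical stand-in, not Savu's Statistics class.

```python
class StatsSharingDemo:
    """Hypothetical stand-in illustrating class vs instance attributes."""

    # Class attribute: a single dict shared by every instance.
    global_stats = {}

    def __init__(self):
        # Instance attribute: each object gets its own dict.
        self.stats = {"max": []}


first = StatsSharingDemo()
second = StatsSharingDemo()

# Mutating the shared dict through one instance is visible through the other.
first.global_stats[1] = [{"max": 4.0}]
print(second.global_stats[1])  # [{'max': 4.0}]

# The per-object stats dicts stay separate.
first.stats["max"].append(4.0)
print(second.stats["max"])  # []
```

The shared dictionary has to be mutated in place; rebinding it with `self.global_stats = {}` would create a new per-instance attribute and silently stop sharing.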

Calculating slice statistics

Statistics are calculated frame by frame, in tandem with the processing of each frame/slice by a plugin. This is done by set_slice_stats, which is called for every frame. Each stats object has a dictionary called stats that holds a list for each stat (max, min, mean, etc.). Every time set_slice_stats is called, the stats for the latest slice are appended to these lists. The actual calculation of the statistics takes place in calc_slice_stats.
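The append-per-slice pattern can be sketched as follows. This is a minimal illustration under stated assumptions: `SliceStatsSketch` and `STATS_KEY` are hypothetical names, only three stats are computed, and the `base_slice` and `pad` arguments of the real methods are omitted.

```python
import numpy as np

# Hypothetical subset of stats; the real stats_key is longer.
STATS_KEY = ("max", "min", "mean")


class SliceStatsSketch:
    """Hypothetical per-slice accumulator modelled on the description above."""

    def __init__(self):
        # One list per stat; each set_slice_stats call appends one value to each.
        self.stats = {key: [] for key in STATS_KEY}

    def calc_slice_stats(self, my_slice):
        # Compute and return the stats for a single slice.
        return {
            "max": float(np.max(my_slice)),
            "min": float(np.min(my_slice)),
            "mean": float(np.mean(my_slice)),
        }

    def set_slice_stats(self, my_slice):
        # Calculate this slice's stats, then append each to its list.
        for key, value in self.calc_slice_stats(my_slice).items():
            self.stats[key].append(value)


sketch = SliceStatsSketch()
sketch.set_slice_stats(np.array([[1.0, 2.0], [3.0, 4.0]]))
sketch.set_slice_stats(np.array([[0.0, 10.0]]))
print(sketch.stats)  # {'max': [4.0, 10.0], 'min': [1.0, 0.0], 'mean': [2.5, 5.0]}
```

Keeping the calculation (calc_slice_stats) separate from the bookkeeping (set_slice_stats) means the calculation can be reused wherever stats are needed without touching the stored lists.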

Statistics.set_slice_stats(my_slice, base_slice=None, pad=True)

Sets slice stats for the current slice.

Parameters
  • my_slice – The slice whose stats are being set.

  • base_slice – A base slice from which residuals are calculated; used to calculate the RMSD.

  • pad – Whether the slice is padded (this can usually be left as True even if the slice is not padded).

Statistics.calc_slice_stats(my_slice, base_slice=None, pad=True)

Calculates and returns slice stats for the current slice.

Parameters
  • my_slice – The slice whose stats are being calculated.

  • base_slice – A base slice from which residuals are calculated; used to calculate the RMSD.

  • pad – Whether the slice is padded (this can usually be left as True even if the slice is not padded).
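A residual-based stat computed against a base slice might look like the sketch below. It assumes that the RMSD is the root mean square of the element-wise residuals and that NRMSD divides this by the range of the base slice; `residual_stats` is a hypothetical helper, and Savu's actual normalisation may differ.

```python
import numpy as np


def residual_stats(my_slice, base_slice):
    """Hypothetical helper: RMSD of my_slice relative to base_slice.

    Assumption: NRMSD is the RMSD divided by the range of base_slice;
    Savu's own normalisation may differ.
    """
    my_slice = np.asarray(my_slice, dtype=float)
    base_slice = np.asarray(base_slice, dtype=float)
    residuals = my_slice - base_slice
    rmsd = float(np.sqrt(np.mean(residuals ** 2)))
    value_range = float(np.max(base_slice) - np.min(base_slice))
    # Guard against a constant base slice, whose range is zero.
    nrmsd = rmsd / value_range if value_range > 0 else float("nan")
    return {"RMSD": rmsd, "NRMSD": nrmsd}


print(residual_stats([[1.0, 1.0]], [[0.0, 2.0]]))  # {'RMSD': 1.0, 'NRMSD': 0.5}
```

Normalising by the base slice's range makes the value comparable between datasets with different intensity scales.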

Calculating volume statistics

At the end of the plugin, the method set_volume_stats is called, which deals with all the post-processing tasks that need to be carried out. This method combines slice stats to create volume stats; for example, the mean of all the slice-wide means is calculated to create a volume mean. These volume stats are then inserted into global_stats, the class-wide dictionary that stores the stats for all plugins. The keys in global_stats are plugin numbers, and each value is a list containing the stats for that plugin. Usually this list has just one item, but for iterative plugins it has one item per iteration.
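The shape of global_stats described above (plugin number mapped to a list of per-run stats) can be sketched with a plain dictionary. The module-level `global_stats` and the helper `store_volume_stats` are hypothetical stand-ins for the class-wide dictionary and the update done in set_volume_stats.

```python
# Hypothetical stand-in for the class-wide global_stats dictionary:
# plugin number -> list of volume-stats dicts, one per run or iteration.
global_stats = {}


def store_volume_stats(p_num, volume_stats):
    # Append rather than overwrite, so an iterative plugin keeps
    # one entry per iteration instead of only the last one.
    global_stats.setdefault(p_num, []).append(volume_stats)


store_volume_stats(1, {"max": 5.0})  # non-iterative plugin: one entry
for iteration in range(3):  # iterative plugin: three entries
    store_volume_stats(2, {"max": float(iteration)})

print(len(global_stats[1]), len(global_stats[2]))  # 1 3
print(global_stats[2][-1])  # {'max': 2.0}
```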

Statistics.set_volume_stats()

Calculates volume-wide statistics from slice stats, and updates class-wide arrays with these values. Links volume stats with the output dataset and writes slice stats to file.

Statistics.calc_volume_stats(slice_stats)

Calculates and returns volume-wide stats from slice-wide stats.

Parameters
  • slice_stats – The slice-wide stats that the volume-wide stats are calculated from.
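Combining slice-wide stats into volume-wide stats could look like this minimal sketch, which covers only max, min and mean. `calc_volume_stats_sketch` is a hypothetical name, and the combination rules are the straightforward ones implied by the text, not a copy of Savu's implementation.

```python
def calc_volume_stats_sketch(slice_stats):
    """Hypothetical combination of slice-wide stats into volume-wide stats."""
    means = slice_stats["mean"]
    return {
        # The volume extremes are the extremes of the slice extremes.
        "max": max(slice_stats["max"]),
        "min": min(slice_stats["min"]),
        # Mean of the slice means; this equals the true volume mean
        # only when every slice has the same number of elements.
        "mean": sum(means) / len(means),
    }


slice_stats = {"max": [4.0, 10.0], "min": [1.0, 0.0], "mean": [2.5, 5.0]}
print(calc_volume_stats_sketch(slice_stats))  # {'max': 10.0, 'min': 0.0, 'mean': 3.75}
```

The caveat in the comment matters when slices differ in size: slices [[1, 2], [3, 4]] and [[0, 10]] have means 2.5 and 5.0, giving 3.75 here, while the mean over all six values is 20/6 ≈ 3.33. Slices of a single volume normally share a shape, so the mean of means is usually exact.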

Accessing stats

Stats can be accessed within a plugin using the following methods:

Statistics.get_stats(p_num=None, stat=None, instance=-1)

Returns stats associated with a certain plugin, given the plugin number (its place in the process list).

Parameters
  • p_num – Plugin number of the plugin whose stats are being fetched. If p_num <= 0, it is interpreted relative to the number of the plugin currently being run; e.g. if the current plugin number is 5, p_num = -2 returns the stats of plugin 3. By default, the stats for the current plugin are returned.

  • stat – The stat to fetch, e.g. 'max', 'mean' or 'median_std_dev'. If left blank, the whole dictionary of stats is returned: {'max': ..., 'min': ..., 'mean': ..., 'mean_std_dev': ..., 'median_std_dev': ..., 'NRMSD': ...}

  • instance – Where a plugin has multiple sets of stats (due to iterative loops or multiple parameter values), specifies which set to retrieve, e.g. 3 for the stats from the third run of the plugin. Pass 'all' to get a list of all sets. By default, the most recent set is retrieved.
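The parameter rules above (relative p_num, 1-based positive instance, 'all', default -1) can be sketched as a standalone lookup. `lookup_stats` and its `current_p_num` argument are hypothetical, and the assumption that positive instances count from 1 is inferred from the "3 for the third run" example rather than confirmed.

```python
def lookup_stats(global_stats, current_p_num, p_num=None, stat=None, instance=-1):
    """Hypothetical lookup mirroring the documented get_stats parameters."""
    if p_num is None:
        p_num = current_p_num
    elif p_num <= 0:
        # Relative plugin number: 0 is the current plugin, -2 is two back.
        p_num = current_p_num + p_num
    stats_sets = global_stats[p_num]
    if instance == "all":
        selected = stats_sets
    elif instance > 0:
        # Assumption: positive instances count from 1 (3 -> third run).
        selected = stats_sets[instance - 1]
    else:
        # Negative instances index from the end (-1 -> most recent run).
        selected = stats_sets[instance]
    if stat is None:
        return selected
    if instance == "all":
        return [s[stat] for s in selected]
    return selected[stat]


global_stats = {
    3: [{"max": 1.0}, {"max": 2.0}, {"max": 3.0}],  # plugin 3 ran three times
    5: [{"max": 9.0}],
}
print(lookup_stats(global_stats, 5, p_num=-2, stat="max"))  # 3.0
print(lookup_stats(global_stats, 5, p_num=-2, stat="max", instance=1))  # 1.0
print(lookup_stats(global_stats, 5, p_num=-2, stat="max", instance="all"))  # [1.0, 2.0, 3.0]
```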

Statistics.get_stats_from_name(plugin_name, n=None, stat=None, instance=-1)

Returns stats associated with a plugin, given its name.

Parameters
  • plugin_name – Name of the plugin whose stats are being fetched.

  • n – Where plugin_name appears more than once in the process list, specifies the nth instance. If not specified, the first (or only) instance is selected.

  • stat – The stat to fetch, e.g. 'max', 'mean' or 'median_std_dev'. If left blank, the whole dictionary of stats is returned: {'max': ..., 'min': ..., 'mean': ..., 'mean_std_dev': ..., 'median_std_dev': ..., 'NRMSD': ...}

  • instance – Where a plugin has multiple sets of stats (due to iterative loops or multiple parameter values), specifies which set to retrieve, e.g. 3 for the stats from the third run of the plugin. Pass 'all' to get a list of all sets. By default, the most recent set is retrieved.
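Resolving a name and the n parameter to a plugin number is the step that distinguishes this method from get_stats. A minimal sketch, assuming plugin numbers start at 1 and follow process-list order; `plugin_number_from_name` and the plugin names in the example are hypothetical.

```python
def plugin_number_from_name(process_list, plugin_name, n=None):
    """Hypothetical: map the nth occurrence of plugin_name to its plugin number.

    Assumes plugin numbers start at 1 and follow process-list order.
    """
    positions = [i + 1 for i, name in enumerate(process_list) if name == plugin_name]
    if not positions:
        raise ValueError(f"{plugin_name!r} is not in the process list")
    # n counts from 1; leaving it as None selects the first occurrence.
    index = 0 if n is None else n - 1
    return positions[index]


process_list = ["Loader", "Filter", "Recon", "Filter"]  # hypothetical names
print(plugin_number_from_name(process_list, "Filter"))       # 2
print(plugin_number_from_name(process_list, "Filter", n=2))  # 4
```

The resulting number can then be used exactly as p_num is in get_stats.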

Statistics.get_stats_from_dataset(dataset, stat=None, instance=-1)

Returns stats associated with a dataset.

Parameters
  • dataset – The dataset whose associated stats are being fetched.

  • stat – The stat to fetch, e.g. 'max', 'mean' or 'median_std_dev'. If left blank, the whole dictionary of stats is returned: {'max': ..., 'min': ..., 'mean': ..., 'mean_std_dev': ..., 'median_std_dev': ..., 'NRMSD': ...}

  • instance – Where a dataset has multiple sets of stats (due to iterative loops or multiple parameter values), specifies which set to retrieve, e.g. 3 for the stats from the third run of a plugin. Pass 'all' to get a list of all sets. By default, the most recent set is retrieved.

Which stats are calculated

The combination of stats calculated for a plugin depends on its stats object's stats_key, a list containing the names of all the stats to be calculated. By default, it is set to max, min, mean, mean_std_dev, median_std_dev, NRMSD and zeros. This can be changed with the set_stats_key method, which should be called in the plugin's setup method (if at all). Use this method rather than changing stats_key directly: it checks that all the stats it is given are valid, and it also updates the slice_stats_key attribute, a list of all the slice-wide stats to be calculated.

Statistics.set_stats_key(stats_key)

Changes which stats are to be calculated for the current plugin.

Parameters
  • stats_key – List of stats to be calculated.
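The reason to prefer a setter over direct assignment is that it validates its input and keeps derived state in sync. The sketch below illustrates this under loud assumptions: `StatsKeySketch` is hypothetical, the default list above is treated as the full set of valid names, and the one-to-one rule for slice_stats_key is a placeholder, since the real mapping from volume stats to slice stats is not described here.

```python
# Default stats listed above; treated here as the full set of valid names
# (an assumption: Savu may accept others).
DEFAULT_STATS_KEY = ("max", "min", "mean", "mean_std_dev", "median_std_dev", "NRMSD", "zeros")


class StatsKeySketch:
    """Hypothetical illustration of why stats_key has a setter."""

    def __init__(self):
        self.set_stats_key(DEFAULT_STATS_KEY)

    def set_stats_key(self, stats_key):
        # Validate every name before changing any state.
        invalid = [s for s in stats_key if s not in DEFAULT_STATS_KEY]
        if invalid:
            raise ValueError(f"unknown stats: {invalid}")
        self.stats_key = list(stats_key)
        # Keep the derived attribute in sync. Placeholder rule: one
        # slice-wide stat per volume-wide stat; Savu's real mapping may differ.
        self.slice_stats_key = list(self.stats_key)


stats = StatsKeySketch()
stats.set_stats_key(["max", "mean"])
print(stats.slice_stats_key)  # ['max', 'mean']
```

Assigning `stats.stats_key` directly would skip both the validation and the slice_stats_key update, leaving the two attributes inconsistent.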

Writing stats to file and datasets

Stats are written to an HDF5 file, and also to the output datasets of each plugin, with the following methods.

Statistics._write_stats_to_file(p_num=None, plugin_name=None, comm=MPI.COMM_WORLD)

Writes stats to an HDF5 file. This file is used to create figures and tables from the stats.

Parameters
  • p_num – The plugin number of the plugin the stats belong to (usually left as None except in special cases).

  • plugin_name – The name of the plugin the stats belong to (usually left as None except in special cases).

  • comm – The MPI communicator the plugin is using.

Links the volume-wide statistics to the output dataset(s).

Parameters
  • stats_dict – Dictionary of stats being linked.

  • iterative – Boolean indicating whether the plugin is iterative.

MPI

When Savu is run with MPI, the slice-wide stats are split between the processes, as the slices are. To calculate volume-wide stats correctly, these stats must all be collected together, which is done by the following method.

Statistics._combine_mpi_stats(slice_stats, comm=MPI.COMM_WORLD)

Combines slice stats from different processes, so volume stats can be calculated.

Parameters
  • slice_stats – The slice stats (each process holds a different set).

  • comm – The MPI communicator being used.
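The merge step can be sketched without an MPI runtime by simulating what a gather returns. In a real run, the input would come from an mpi4py collective such as `comm.allgather(slice_stats)`, which returns a list with one object per rank in rank order; `combine_gathered_slice_stats` itself is a hypothetical helper, not Savu's implementation.

```python
def combine_gathered_slice_stats(gathered):
    """Hypothetical merge of per-process slice stats into one dict of lists.

    `gathered` stands in for the result of an mpi4py call such as
    comm.allgather(slice_stats): a list with one dict per rank, in rank order.
    """
    combined = {}
    for rank_stats in gathered:
        for key, values in rank_stats.items():
            combined.setdefault(key, []).extend(values)
    return combined


# Two simulated ranks, each holding the stats for its own slices.
gathered = [
    {"max": [1.0, 2.0], "mean": [0.5, 0.5]},  # rank 0
    {"max": [3.0], "mean": [1.0]},            # rank 1
]
print(combine_gathered_slice_stats(gathered))
# {'max': [1.0, 2.0, 3.0], 'mean': [0.5, 0.5, 1.0]}
```

Concatenating in rank order need not reproduce the global slice order, but volume-wide max, min and mean do not depend on order, so the combined lists are sufficient for calculating them.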

API

class Statistics