Labeled Arrays

Labeled arrays provide conveniences for accessing data in ndarrays by dimension names and labels. They are the primary data structure used throughout the emma modules.

Classes and Functions

  • LbAxis: defines an array axis with an axis name and labels. Facilitates label-based array indexing, slicing, querying, and assignment.

  • Impression: a collection of LbAxis objects that implicitly define an array’s structure. Impressions are a labeled array building block, having structure but no data. Includes methods to facilitate array construction and casting.

  • LbArray: the labeled array, containing data as a numpy or pytables ndarray, wrapped in LbAxis objects to make data retrieval by axis name(s) and label(s) simple. Includes methods to facilitate array replication.

  • LbViewer: provides mechanisms to “view” data in an array. Viewers impose contiguity in defining array slices. Non-contiguous criteria are generated as separate views to ensure exposed data are views rather than copies of the underlying LbArray data.

  • alignAxes: reorganizes axes in an LbArray to match the arrangement in a separate LbArray with the same axis names.

  • alignAxisLabels: reorganizes axis labels in an LbArray to match labels in a separate LbAxis.

  • dfToLabeledArray: creates a labeled array from a pandas DataFrame.

  • openLbArray_HDF: loads a labeled array from an on-disk array (H5 file). Arrays constructed with H5 file specs are initialized on-disk, and parameters for reloading the array from the H5 file are stored as attributes.

  • broadcast1dByAxis: takes a 1d array and casts it into the dimensions defined by a labeled array (useful in array modification applications).

  • reindexDf: reindexes a pandas DataFrame to conform to the order and length of a given LbAxis.

  • inheritValues: propagates values at an aggregate scale to finer scale based on multi-level index relationships.

  • sharesByAxis: summarizes a labeled array and reports shares by provided axes.

  • Shadow: a simple class that relates two labeled arrays to facilitate procedural mimesis.

LbAxis

class emma.labeled_array.LbAxis(name, labels, levels=None)

Defines an axis by names and labels to facilitate indexing a labeled array.

From an axis, users can get the index locations corresponding to addresses in an ndarray based on feature names (see get_loc). To retrieve the labels for a known set of indices, use the axis’s labels attribute, which is a Pandas Index (e.g., axis.labels[[0,1,2]]).

Parameters
  • name (String) – Each axis is given a name which is found as an attribute in any LbArray constructed from the axis, providing for dynamic axis naming in the labeled array. Axis names should be unique for natural naming to function predictably.

  • labels (Iterable) – Labels represent the names for each index along the axis. They must be unique. Labels are converted to a Pandas index when the axis is initialized. If a data frame is given the columns are converted to a MultiIndex.

  • levels ([String,..], default=None) – If labels should generate a MultiIndex (hierarchical mappings), level names can be specified here. If None, levels are referenced by index. If a data frame is provided as labels, the column names are used as level names unless otherwise specified here.

n

Length of this axis.

Type

Int

_parent_

To facilitate easy data access from a labeled array, its constituent axes are linked to it by setting the LbArray object as the parent of each axis object.

Type

LbArray

addLevel(labels, level_name=None, left_on=None, right_on=None)

Add a new level to this axis’s labels. The new labels are expected to merge many-to-one with existing labels or be a series of the same length as existing labels.

Parameters
  • labels (pandas DataFrame or Series) – The new labels to be added as a separate level on this axis. If a data frame is provided, the new labels are aligned to existing labels through a merge, and left_on and right_on must be provided. If a series is provided, the new labels are assumed to be well-ordered and are concatenated to the existing labels.

  • level_name ([String,..], default=None) – If labels is a series, provide a name for the new label mappings. If None, the series name attribute is used, if it exists; otherwise, a RuntimeError is raised.

  • left_on (String or [String,..]) – If labels is a data frame, provide the level name(s) in the current axis dimensions for aligning the new labels with existing labels.

  • right_on (String or [String,..]) – If labels is a data frame, provide the column name(s) in the data frame for aligning the new labels with existing labels.

Raises
  • MergeError: – New labels cannot be merged with existing labels in a m-to-1 join.

  • ValueError: – New label names collide with existing label names.

  • RuntimeError: – Labels are provided in a series with no name attribute and no level_name argument.
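The many-to-one merge described for data frame input can be illustrated with plain pandas. This is only a sketch of the idea, not emma's implementation; the zone and district names and the `zone_id` column are hypothetical.

```python
import pandas as pd

# Existing single-level axis labels (hypothetical zone names).
existing = pd.Index(["z1", "z2", "z3"], name="zone").to_frame(index=False)

# New labels supplied as a data frame: each zone maps to exactly one district.
lookup = pd.DataFrame({
    "zone_id": ["z1", "z2", "z3"],
    "district": ["d1", "d1", "d2"],
})

# validate="m:1" makes pandas raise MergeError if zone_id is not unique in lookup.
merged = existing.merge(
    lookup, how="left", left_on="zone", right_on="zone_id", validate="m:1"
)

# Combine the old and new labels into a two-level index.
new_labels = pd.MultiIndex.from_frame(merged[["zone", "district"]])
print(new_labels.tolist())
```

Here `left_on` names the existing level and `right_on` names the matching column in the new data frame, mirroring the roles of the two parameters above.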

extend(new_labels)

(FUTURE)

Add one or more labels to the axis. The new label(s) will be coerced to the same type as existing labels, if possible. Extending the axis changes the shape of the parent object, which may be expensive.

get_loc(*args, **kwargs)

Returns the index location of the provided label(s) on this axis. Labels to find are passed as a series of arguments (get_loc("Lbl1", "Lbln")). To pass an iterable container of labels (e.g., a list or 1d array), unpack it with * (get_loc(*["Lbl1", "Lbln"])).

If the axis labels are stored as a MultiIndex, provide keyword args to identify the index labels.

Returns

indices – A list of indices for the requested labels is always returned, even if only a single label is requested.

Return type

[Int,..]

Raises
  • KeyError: – If the label is not in the axis’s labels attribute.

  • ValueError: – If multiple keywords are provided but labels is a simple index.
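The lookup behavior described above can be approximated with a pandas Index. The function below is a hypothetical stand-in, not the emma method; as a simplification it raises ValueError for any keyword lookup on a simple index.

```python
import numpy as np
import pandas as pd

def get_loc_sketch(labels, *args, **kwargs):
    """Return a list of integer positions for the requested labels."""
    if kwargs:
        if not isinstance(labels, pd.MultiIndex):
            raise ValueError("keyword lookups need MultiIndex labels")
        # Keep positions whose level values match every keyword criterion.
        mask = np.ones(len(labels), dtype=bool)
        for level, wanted in kwargs.items():
            if not isinstance(wanted, (list, tuple)):
                wanted = [wanted]
            mask &= labels.get_level_values(level).isin(wanted)
        return np.flatnonzero(mask).tolist()
    # get_indexer reports -1 for labels that are not present.
    positions = labels.get_indexer(list(args))
    if (positions == -1).any():
        raise KeyError("label not found on this axis")
    return positions.tolist()

simple = pd.Index(["Lbl1", "Lbl2", "Lbl3"])
multi = pd.MultiIndex.from_tuples(
    [("north", 1), ("north", 2), ("south", 3)], names=["region", "zone"]
)
print(get_loc_sketch(simple, "Lbl3", "Lbl1"))
print(get_loc_sketch(multi, region="north"))
```

A list is returned even for a single label, matching the documented return type.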

to_dict()

Converts this axis to a dictionary. The axis name is the key for the main dictionary. If labels is a MultiIndex, levels are used to assign key names in a nested dictionary.

to_frame()

Converts this axis to a pandas data frame. If labels is a MultiIndex, levels are used to assign column names. Otherwise, a single column is returned with the name of the axis.

Impression

class emma.labeled_array.Impression(axes)

The Impression class records axis details of an uninitialized labeled array. It is a building block for the LbArray class and useful for propagating LbArrays in an analytical stream.

Parameters

axes ([LbAxis_object, ..]) – The labeled array is constructed using a collection of LbAxis objects that provide axis names and labels for name-based indexing.

shape

The shape of any ndarray conforming to this impression, based on the axes provided to construct it.

Type

(Int, ..)

ndim

The number of dimensions in any ndarray conforming to this impression.

Type

Int

cast(new_axes, drop=None, squeeze=False, **kwargs)

Replicate this impression’s axes into new dimensions, specified by one or multiple LbAxis objects. Dimensions in this array can be dropped when casting (using the drop arg) or selected by label values using keyword arguments.

new_axes: LbAxis or [LbAxis,…]

Labeled axis object(s) defining the dimensions into which this impression’s (selected) axes will be cast.

drop: [LbAxis or String,…], default=None

Specify any axes to be excluded from the new impression using the axis object or name.

squeeze: Boolean, default=False

If True, axes that have only a single label will be dropped and the dimensionality of this impression will be reduced before casting into new dimensions. If False, this impression’s dimensions are retained and then cast into the new dimensions.

fill(data, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)

Fill this impression with data, returning a labeled array (LbArray) object.

Parameters
  • data (array-like, scalar, or constructor) – The data to fill this impression with.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • kwargs – Keywords for data initialization if data is a constructor.

Returns

filled_array – A labeled array with axes matching those of this impression and data as provided to the method.

Return type

LbArray
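As a rough illustration of the three accepted kinds of data (array-like, scalar, or constructor), the sketch below derives a shape from hypothetical axes and fills it with numpy. It returns a bare ndarray and ignores the HDF options; `fill_sketch` and the axis names are assumptions made for this example only.

```python
import numpy as np

# Hypothetical axes: name -> labels, listed in dimension order.
axes = {"origins": ["a", "b"], "destinations": ["x", "y", "z"]}
shape = tuple(len(labels) for labels in axes.values())

def fill_sketch(shape, data, **kwargs):
    """Build an ndarray of the given shape from a constructor, scalar, or array."""
    if callable(data):
        # Constructors such as np.ones take the shape plus optional keywords.
        return data(shape, **kwargs)
    # Scalars and conforming arrays are broadcast to the target shape.
    return np.broadcast_to(np.asarray(data), shape).copy()

ones = fill_sketch(shape, np.ones, dtype="float32")
sevens = fill_sketch(shape, 7)
print(ones.shape, sevens.tolist())
```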

impress(drop=None, squeeze=False, fill_with=None, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, constr_kwargs={}, **kwargs)

Returns a new impression based on this impression. The new impression retains axis details (shape and labels). The impression can be made based on axis criteria passed as keyword arguments. Optionally, data can be provided to “fill” the resulting impression, returning a new LbArray object.

Parameters
  • drop ([LbAxis or String,..], default=None) – Specify any axes to be excluded from the new impression using the axis object or name. Axes used for keyword selection cannot be dropped.

  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the new impression reduced. If False, the returned impression has the same number of dimensions as the source.

  • fill_with (array-like, scalar, or constructor) – The data to fill this impression with.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • constr_kwargs (dict, default={}) – If filling the impression with a constructor (np.ones, e.g.), pass any constructor function kwargs in a dictionary.

  • kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. For example a.take(origins=[1,2,3], destinations=[4,5,6]) or a.take(**{"origins":[1,2,3], "destinations":[4,5,6]}).

LbArray

class emma.labeled_array.LbArray(data, axes, desc=None, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)

The LbArray class allows users to slice and index ndarray objects using axis names and labels instead of indices.

Standard index-based techniques for slicing ndarrays remain available in labeled arrays. For example, basic indexing (lb_array[(0, 1, 2)]) or advanced indexing (lb_array[0,:,[1,2]]) should work as on any normal ndarray. However, data can also be accessed using labels, as follows:

lb_array.foo[["bar", "baz"]] will return all values corresponding to the labels “bar” and “baz” in the “foo” dimension. These lookups can be chained, such that lb_array.foo["bar"].spam["ham", "eggs"].etc["ad nauseum"] is a valid expression.

Chained lookups call the take method, which often provides a more succinct way to retrieve select data (lb_array.take(foo="bar", spam=["ham", "eggs"], etc="ad nauseum")).

Parameters
  • data (ndarray or constructor) –

    The ndarray holds multi-dimensional data. An existing array (including an on-disk pytables array) can be used to construct the labeled array. Alternatively the user can pass an array constructor, such as np.ones, to initialize a new array.

    The data attribute for an LbArray is always a pytables array. If a numpy array or constructor is given as data, an in-memory array is generated from the original data.

  • axes ([LbAxis,..]) – (see Impression)

  • desc (String, default=None) – A description of the contents of the skim.

  • hdf_store (String, default=None) – If using an on-disk array to store the array and initialize data, give the path to the H5 file. If data is an existing pytables array and hdf_store is provided, the data will be replicated in the new node.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • kwargs – Keyword args can be given that are passed through to the constructor method call if a constructor is given for data.

size

Number of cells in the array.

Type

Integer

See also

Impression

cast(new_axes, copy_data=True, drop=None, squeeze=False, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, init_val=None, **kwargs)

Returns an ‘impression’ of this labeled array with additional axes defined. The impression is an uninitialized labeled array that retains axis details (shape and labels), but broadcasts these into one or more new axes.

Parameters
  • new_axes (LbAxis or [LbAxis,..]) – Labeled axis object(s) defining the dimensions into which an impression of this array will be stamped.

  • copy_data (Boolean, default=True) – If True, data in this array will be cast (repetitively propagated) into the new dimensions. If False, a blank impression is returned that can subsequently be filled with data using Impression.fill.

  • drop ([LbAxis or String,..], default=None) – Specify any axes to be excluded from the new array by axis object or by name. Cannot be used if copy_data is True.

  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • init_val (var, default=None) – The initial value of all cells in the new array (if copy_data is False).

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • kwargs – Keyword arguments specifying a selection from this array to cast into new dimensions.

Returns

casted – If copy_data is True, an LbArray is returned with data from the original array replicated in new dimensions. Otherwise, a new impression is made that can be filled with fresh data.

Return type

Impression or LbArray
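The effect of casting with copy_data=True can be pictured with numpy alone: existing values are repeated across each label of a new axis. The sketch assumes the new axis is appended as the last dimension, which the text above does not specify, and the period labels are hypothetical.

```python
import numpy as np

# Source data with two (hypothetical) axes: origins x destinations.
data = np.array([[1.0, 2.0], [3.0, 4.0]])

# Labels of the new axis to cast into.
periods = ["am", "pm", "night"]

# Add a trailing dimension, then repeat the values once per new label.
casted = np.repeat(data[..., np.newaxis], len(periods), axis=-1)
print(casted.shape)
```

Every slice along the new axis holds the same values as the source array.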

copy(hdf_store=None, node_path=None, name=None, driver=None, overwrite=False)

Copy this array’s contents to a new labeled array with the same axis names and labels. Optional arguments allow HDF storage in the specified file and node (overwriting existing content if directed).

If args are provided, they focus on copying the data into a new h5 data store node. Otherwise an in-memory copy (numpy) is returned.

Parameters
  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If casting to an on-disk array at an existing node, the node will be replaced with new data if overwrite=True.

Returns

copied

Return type

LbArray

dissolve(axes, levels, aggfuncs, hdf_store=None, node_path=None, name=None, overwrite=False, driver=None, *args, **kwargs)

Simplify this array by aggregating values along the specified axes at the specified levels. Multiple aggregation statistics can be requested, generating a new axis called “agg” in the resulting array.

This method relies on the Pandas DataFrame.agg method.

Parameters
  • axes (String, Int, or LbAxis) – The axes (by name, index, or object) on which to dissolve this array’s data. All axes provided are assumed to use a MultiIndex.

  • levels (String or Int) – The levels at which to dissolve data in each axis. Levels should be listed such that they can be zipped with the axes.

  • aggfuncs (function, str, list) –

    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    - function
    - string function name
    - list of functions and/or function names, e.g. [np.sum, ‘mean’]

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node in the hdf file where data will be stored.

  • overwrite (Boolean, default=False) – If casting to an on-disk array at an existing node, the node will be replaced with new data if overwrite=True.

  • args – Positional arguments to pass to aggfuncs

  • kwargs – Keyword arguments to pass to aggfuncs

Returns

dissolved_array – A new labeled array with simplified axes, aggregated values, and (if multiple aggfuncs are given) a new axis called “agg” with labels corresponding to aggregations generated.

Return type

LbArray

See also

pd.DataFrame.agg()
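The aggregation step can be sketched with pandas on a single axis whose labels are a two-level MultiIndex. This uses groupby(...).agg as a stand-in for whatever emma does internally; the district and zone labels and the `trips` column are hypothetical.

```python
import pandas as pd

# One axis with hierarchical labels: zones nested within districts.
index = pd.MultiIndex.from_tuples(
    [("d1", "z1"), ("d1", "z2"), ("d2", "z3")], names=["district", "zone"]
)
df = pd.DataFrame({"trips": [10.0, 20.0, 5.0]}, index=index)

# Dissolve the zone level, keeping one row per district and one
# column per requested aggregation.
dissolved = df.groupby(level="district").agg(["sum", "mean"])
print(dissolved)
```

Requesting two statistics yields one result column per statistic, which corresponds to the extra “agg” axis described in the return value.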

flatten()

Generates a flattened numpy array from this array’s data attribute.

mean(by_axes, **kwargs)

Return the mean value for each cell in this array, grouping along the specified axes.

Parameters
  • by_axes ([LbAxis, String, or Int,..]) – Values will be averaged along the given axes, provided as LbAxis objects, axis names, or indices.

  • kwargs – Use keyword arguments to select slices of the array prior to summarization.

put(data, **kwargs)

Assign data to selected portions of this labeled array based on provided axis criteria.

Parameters
  • data (array_like) – The data to be put into this array in the cells specified by the provided criteria.

  • kwargs – Keyword arguments providing axis/label details for producing views of the LbArray’s data. Provided data will be pushed to the array via these views.

Returns

Return type

None - this array’s data are updated with data based on kwargs

sum(by_axes, **kwargs)

Summarize the value totals in this array, grouping along the specified axes.

Parameters
  • by_axes ([LbAxis, String, or Int,..]) – Values will be summarized along the given axes, provided as LbAxis objects, axis names, or indices.

  • kwargs – Use keyword arguments to select slices of the array prior to summarization.

take(squeeze=False, **kwargs)

Retrieve values from the ndarray based on axis names and labels.

Parameters
  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.

  • kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. For example a.take(origins=[1,2,3], destinations=[4,5,6]) or a.take(**{"origins":[1,2,3], "destinations":[4,5,6]}).

Returns

filtered_array – This method ALWAYS returns a copy of the original data, filtered based on the requested labels. The primary advantage of this is that the taken datasets do not prevent the original larger set from being garbage-collected (see the NumPy notes on views versus copies at https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing). However, it also means that repeatedly taking large chunks of data interactively may consume substantial amounts of memory. The take method is envisioned to serve as a means of filtering data for analysis purposes, and those analytical means can (and should) generally be encapsulated within a localized scope to efficiently release memory allocated to filtered copies once they’ve served their purpose.

Return type

LbArray
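Label-based selection that always yields a copy can be mimicked in numpy by translating labels to positions and indexing with np.ix_. This is a simplified sketch on a bare ndarray, not the emma method; `take_sketch` and the axis dictionary are hypothetical.

```python
import numpy as np

def take_sketch(data, axes, squeeze=False, **kwargs):
    """Select by label; axes maps axis name -> list of labels in dimension order."""
    indexers = []
    for name, labels in axes.items():
        if name in kwargs:
            wanted = kwargs[name]
            if not isinstance(wanted, (list, tuple)):
                wanted = [wanted]
            indexers.append([labels.index(w) for w in wanted])
        else:
            # No criterion for this axis: keep every position.
            indexers.append(list(range(len(labels))))
    # Integer-list (advanced) indexing copies the data rather than viewing it.
    out = data[np.ix_(*indexers)]
    return out.squeeze() if squeeze else out

data = np.arange(6).reshape(2, 3)
axes = {"origins": ["a", "b"], "destinations": ["x", "y", "z"]}
picked = take_sketch(data, axes, origins="b", destinations=["z", "x"])
print(picked.tolist(), np.shares_memory(picked, data))
```

Because the result does not share memory with the source, writing to it leaves the source untouched.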

to_array()

Generates a numpy array from this array’s data attribute.

to_frame(column_name)

Flattens this array’s data and axes attributes to generate a long pandas data frame with a hierarchical index.

Parameters

column_name (String) – The name of the column in the output data frame containing this array’s data.

LbViewer

class emma.labeled_array.LbViewer(lb_array, collation=None, **kwargs)

The LbView class provides a view into an LbArray object’s data. When updates are made to the data in the view, they are also reflected in the source data. Views can only be obtained through basic slicing, so the LbView class presents simple views using a generator to iterate over basic slices of the source data.

Since views reference another array’s data, they can disrupt garbage collection, preventing memory allocated to the source ndarray from being released while the view object persists. Views are primarily intended to facilitate selective updating of the source ndarray and should be encapsulated within the processes handling the updates.

Parameters
  • lb_array (LbArray) – The labeled array from which to generate views.

  • collation ("Discrete", "Contiguous", "ByLevel", default=None) –

    - If “Discrete” or not specified, a separate view will be generated for each axis label requested, even if multiple labels could be retrieved through basic slicing. This produces array views of consistent and predictable shapes.

    - If “Contiguous”, contiguous indices will be consolidated into the smallest number of single views possible. View shapes may vary.

    - If “ByLevel”, contiguous indices are only consolidated within each index level. View shapes may vary. (Forthcoming; currently unsupported.)

  • **kwargs – Keyword arguments providing axis/label details for producing views of the LbArray’s data.
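The “Contiguous” collation can be pictured as grouping requested index positions into runs, each of which maps to one basic slice and therefore one true view. The helper below is a hypothetical sketch on a 1d numpy array, not the LbViewer implementation.

```python
import numpy as np

def contiguous_slices(indices):
    """Collapse integer indices into the fewest basic slices."""
    slices = []
    start = prev = None
    for i in sorted(indices):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            # A gap closes the current run and starts a new one.
            slices.append(slice(start, prev + 1))
            start = prev = i
    if start is not None:
        slices.append(slice(start, prev + 1))
    return slices

data = np.zeros(10)
slices = contiguous_slices([0, 1, 2, 5, 6, 9])
for s in slices:
    view = data[s]   # basic slicing returns a view
    view[...] = 1.0  # so the write lands in the source array
print(slices, data.tolist())
```

A “Discrete” collation would instead yield one length-1 slice per index, trading fewer iterations for uniform view shapes.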

class LbView(parent, data, criteria, _slice)

A view of a labeled array generated by the LbViewer class.

push()

ShadowAxis

class emma.labeled_array.Shadow(shadow_array, leader_array, shadow_axes)

A simple class to relate two labeled arrays to facilitate procedural mimesis (what happens in one array happens in the related array).

Parameters

See also

ShadowAxis

getAxisByIndex(index)

Get a shadow axis by referencing its index in the shadow_array.

Returns

shadow_axis

Return type

LbAxis or -1 if no match was found

Functions

emma.labeled_array.alignAxes(array, ref_array, inplace=False, **kwargs)

Rearrange the axes of an LbArray so that they match those in another LbArray. This function only rearranges the axes as a whole. It does not align labels for any axes (see alignAxisLabels).

Parameters
  • array (LbArray) – The input array whose axes will be rearranged.

  • ref_array (LbArray) – The reference array that will guide rearrangement of array. The two arrays must have the same number of dimensions and identical axis names (obviously, axis order need not be identical).

  • inplace (Boolean, default=False) – If True, array is updated with rearranged data and axes. If False, a new array with the rearranged values and axes is returned.

  • kwargs – Keyword arguments can be passed to initialize a new array on disk, e.g.

Returns

a

Return type

LbArray
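On a plain ndarray, rearranging whole axes by name amounts to a transpose. The sketch below assumes axis names are tracked in simple lists (hypothetical names); it does not touch labels within any axis.

```python
import numpy as np

# Axis names of the input array and of the reference array.
names = ["origins", "destinations", "modes"]
ref_names = ["modes", "origins", "destinations"]

data = np.zeros((2, 3, 4))

# For each reference axis, find its position in the input array.
order = [names.index(n) for n in ref_names]
aligned = np.transpose(data, order)
print(order, aligned.shape)
```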

emma.labeled_array.alignAxisLabels(array, axes, ref_axes, ref_array=None, inplace=False, **kwargs)

Rearrange axis labels and corresponding data indices in an LbArray so that they match those in another LbAxis. This function only rearranges the axis labels and indices within the specified axes. It does not rearrange axis positions (see alignAxes).

The axes to align must be of the same length and include identical values at all levels (i.e., if labels are a MultiIndex). Index values must be unique (across all levels, not within each level).

Parameters
  • array (LbArray) – The labeled array whose data and axes will be rearranged through label realignment.

  • axes ([str or LbAxis,..]) – Axis name(s) or object(s) for the axes to be realigned.

  • ref_axes ([str or LbAxis,..]) – A parallel listing of axis name(s) or object(s) that guide realignment of axes.

  • ref_array (LbArray, default=None) – The parent array of the ref_axes. This is only required when axis names are provided rather than axis objects.

  • inplace (Boolean, default=False) – If True, array is updated with rearranged data and axis labels. If False, a new LbArray with rearranged data and axis labels is returned.

  • kwargs – Keyword arguments can be passed to initialize a new array on disk, e.g.

Returns

a

Return type

LbArray

See also

alignAxes()
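Label realignment within one axis can be sketched as a reordering of positions along that dimension. The example uses plain lists for labels and np.take for the data; the labels are hypothetical and must match the reference exactly, as required above.

```python
import numpy as np

labels = ["b", "c", "a"]      # current label order on axis 1
ref_labels = ["a", "b", "c"]  # desired order from the reference axis

data = np.array([[20, 30, 10],
                 [21, 31, 11]])

# Position of each reference label in the current ordering.
order = [labels.index(lbl) for lbl in ref_labels]
aligned = np.take(data, order, axis=1)
print(aligned.tolist())
```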

emma.labeled_array.broadcast1dByAxis(labeled_array, axis, values, inplace=False, **kwargs)

Cast a 1d array into the dimensions of a labeled array along a specified axis.

Parameters
  • labeled_array (LbArray) – The labeled array that defines the casting dimensions

  • axis (LbAxis or String) – The axis in labeled_array along which to repeat values when casting.

  • values (1d array-like) – The values to cast into the dimensions defined by labeled_array

  • inplace (Boolean, default=False) – If True, labeled_array’s values are modified in place via the casting. Otherwise, a new LbArray is returned with casted values.

  • kwargs – Keyword arguments can be used for the initialization of a new LbArray when casting (if `inplace`=False). This is useful if casting into an H5 file, for example.

Returns

a – An LbArray with values casted into the specified dimensions.

Return type

LbArray

See also

LbArray()
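A numpy sketch of the idea follows. It assumes the 1d values line up with the labels of the named axis and are repeated across every other dimension, which is one reading of the description above; `broadcast_1d` is a hypothetical helper operating on a bare shape.

```python
import numpy as np

def broadcast_1d(values, shape, axis):
    """Expand 1d values, aligned to one axis, into the full shape."""
    values = np.asarray(values)
    # Give values length 1 in every dimension except the aligned axis.
    view_shape = [1] * len(shape)
    view_shape[axis] = shape[axis]
    return np.broadcast_to(values.reshape(view_shape), shape).copy()

by_column = broadcast_1d([1, 2, 3], (2, 3), axis=1)
by_row = broadcast_1d([1, 2], (2, 3), axis=0)
print(by_column.tolist(), by_row.tolist())
```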

emma.labeled_array.dfToLabeledArray(df, dimension_columns, value_column, dtype=None, use_index=False, fill_value=nan, desc=None, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False)

Create a labeled array from a pandas data frame. The data frame is expected to be organized in long form such that there is one column of values and n columns representing each dimension.

The input data frame is sorted based on the provided dimension columns to ensure that the resulting array is properly indexed along each axis.

Parameters
  • df (pandas Data Frame) –

  • dimension_columns ([String,..]) – A (list of) column name(s). The column names are used as axis names, while the unique values in each column become axis labels.

  • value_column (String) – The column that contains the numeric values to be loaded into the labeled array.

  • dtype (np.dtype) – Cast the values in value_column to the specified dtype.

  • fill_value (numeric, default=np.nan) – The labeled array to be constructed will be a dense matrix. If the data frame does not contain all values, missing data will be filled with this value.

  • desc (String, default=None) – A description of the contents of the skim.

  • hdf_store (String, default=None) – If using an on-disk array to store the array and initialize data, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

Returns

lb_array

Return type

LbArray

See also

LbArray()
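The long-to-dense conversion can be sketched with pandas and numpy. The result here is a bare ndarray plus label lists rather than an LbArray, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Long-form input: one value column and one column per dimension.
df = pd.DataFrame({
    "origin": ["a", "a", "b"],
    "dest": ["x", "y", "x"],
    "trips": [1.0, 2.0, 3.0],
})
dims = ["origin", "dest"]

# Sorted unique values of each dimension column become the axis labels.
axes = [sorted(df[c].unique()) for c in dims]

# Reindex against every label combination so missing cells get fill_value.
full = pd.MultiIndex.from_product(axes, names=dims)
dense = df.set_index(dims)["trips"].reindex(full, fill_value=np.nan)

arr = dense.to_numpy().reshape([len(a) for a in axes])
print(arr)
```

The (b, y) combination is absent from the input, so that cell holds the fill value.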

emma.labeled_array.inheritValues(coarse_df, fine_axis, ref_level, coarse_level=None)

Reindex rows in a data frame from a coarse axis to a fine axis.

Parameters
  • coarse_df (pandas DataFrame) – A data frame whose rows represent features at a relatively coarse scale.

  • fine_axis (LbAxis) – An LbAxis whose labels are a MultiIndex with one index level corresponding to index values in coarse_df and another level identifying related features at a relatively fine scale.

  • ref_level (String) – The name of the level in fine_axis that relates its labels to the rows in coarse_df.

  • coarse_level (String, default=None) – If coarse_df uses a multi-level index, provide the name of the level that corresponds to values in ref_level.

Returns

Return type

pd.DataFrame
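A pandas sketch of the coarse-to-fine propagation: rows of the coarse frame are repeated for every fine-scale label that references them. The district and zone labels and the `rate` column are hypothetical, and a pandas MultiIndex stands in for the fine axis.

```python
import pandas as pd

# Fine-scale labels: zones nested within districts.
fine = pd.MultiIndex.from_tuples(
    [("d1", "z1"), ("d1", "z2"), ("d2", "z3")], names=["district", "zone"]
)

# Coarse-scale values, one row per district.
coarse = pd.DataFrame(
    {"rate": [0.1, 0.2]}, index=pd.Index(["d1", "d2"], name="district")
)

# Look up each fine label's district (the ref_level), then restore the fine index.
inherited = coarse.reindex(fine.get_level_values("district"))
inherited.index = fine
print(inherited)
```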

emma.labeled_array.openLbArray_HDF(hdf_store, node_path, mode='r')

Open an LbArray object from a node in an hdf file.

The hdf file must have the expected attributes for reconstructing an LbArray object (these are stored when an array references an hdf node as its data attribute).

Parameters
  • hdf_store (String) – The path to the H5 file.

  • node_path (String) – The path to the node in the H5 file.

Returns

a – An LbArray object built around the on-disk array at the hdf node.

Return type

LbArray

See also

LbArray()

emma.labeled_array.reindexDf(df, lbaxis, level=None, fill_value=0, copy=True)

Filter and sequence rows in a data frame to match values in a given labeled axis.

df: DataFrame

The data frame to be reindexed

lbaxis: LbAxis

The referenced axis. The input data frame will be reindexed based on the labels in this axis.

level: String or Int, default=None

If the reference axis is a MultiIndex, a level name may be specified on which to reindex the input data frame.

fill_value: numeric, default=0

Value to use for missing values. Defaults to 0, but can be any compatible value.

copy: Boolean, default=True

Return a new object, even if the passed indexes are the same.
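The behavior resembles pandas DataFrame.reindex applied with the axis labels as the target. The sketch below uses a pandas Index in place of an LbAxis; the zone labels and `pop` column are hypothetical.

```python
import pandas as pd

# Input rows are out of order and missing zone z2.
df = pd.DataFrame({"pop": [5, 7]}, index=["z3", "z1"])

# Stand-in for the labels of the reference axis.
axis_labels = pd.Index(["z1", "z2", "z3"])

# Rows are sequenced to match the axis; absent labels get fill_value.
out = df.reindex(axis_labels, fill_value=0)
print(out)
```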

emma.labeled_array.sharesByAxis(lbarray, by_axes=[], hdf_store=None, node_path=None, name=None, overwrite=False)

Summarize data in a labeled array along specified axes and express the summed values as shares.

If no axis is specified, each cell in the returned array is the original value divided by the sum of the full array. When axes are given, values are grouped and summed along these axes before shares are generated.

Parameters
  • lbarray (LbArray) – The labeled array with original values.

  • by_axes ([LbAxis, String, or Integer...], default=[]) – The axis objects, names, or indices of the axes by which shares are calculated.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node in the hdf file where data will be stored.

  • overwrite (Boolean, default=False) – If casting to an on-disk array at an existing node, the node will be replaced with new data if overwrite=True.

Returns

share_array

Return type

LbArray
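The share calculation can be sketched in numpy. With no axes, every cell is divided by the grand total, as described above. With axes, this sketch assumes totals are computed per label of the given axes and each cell is divided by its group total; that reading is an assumption, and `shares_sketch` works on integer axis positions rather than LbAxis objects.

```python
import numpy as np

def shares_sketch(data, by_axes=()):
    """Express values as shares of the grand total or of per-group totals."""
    data = np.asarray(data, dtype=float)
    # Sum over every axis that is not a grouping axis, keeping dimensions
    # so the totals broadcast back against the original array.
    other = tuple(i for i in range(data.ndim) if i not in by_axes)
    totals = data.sum(axis=other, keepdims=True)
    return data / totals

data = [[1, 3], [2, 2]]
overall = shares_sketch(data)
by_row = shares_sketch(data, by_axes=(0,))
print(overall.tolist(), by_row.tolist())
```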