OD (Origin-Destination)

Origin-destination data are essential to many travel analyses. Emma relies heavily on OD data to analyze accessibility, trip distribution, system productivity, and more.

This module provides classes and functions that facilitate the creation, storage, and retrieval of OD matrices as well as methods for developing and executing functions using multiple matrices. Matrix data may be stored in the h5 file format, for efficient on-disk processing.

Module objectives:

  • Efficient I/O between tabular OD data and matrices.

  • Efficient replication of matrix structures and/or values for OD processing.

  • Consistent processing idioms for vectorized formula applications.

Classes and functions

  • Skim: stores OD data as a specialized labeled array. Skims always have congruent i (origins) and j (destinations) dimensions. Skim is a child class of LbArray, inheriting (or in a few cases overloading) its attributes and methods while adding a handful of distinct attributes and methods to ensure preservation of the i and j dimensions.

  • The loadOD_… family of functions supports reading origin-destination data from a long table into a Skim object. The Skim object must already be initialized. Supported formats include: csv (most performant), ESRI geodatabase, DBF, or Excel.

  • maxInteractions/weightedInteractions are functions that consume origin-end and destination-end activity data in a pandas DataFrame to generate a Skim reflecting trip-making potential among OD pairs.

  • summarizeAccess uses origin-end OR destination-end activity data in a pandas DataFrame and one or more Decay objects (see the decay module) to generate access scores to/from each zone.

  • distribute provides a simple function for balancing a trip table based on zone-level productions and attractions. Currently the function only works for a simple 2d matrix.

  • openSkim_HDF loads a skim object from an on-disk array (H5 file). Skims constructed with H5 file specs are initialized on-disk, and parameters for reloading the Skim object from the H5 file are stored as attributes.

  • csrVApply is an early foray into supporting sparse matrix applications, but it remains untested.

Skim

class emma.od.Skim(zones, data, axes_k=None, desc='', hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)

The Skim class stores origin-destination data in a labeled array. Attributes record dimensional details (LbAxis parameters).

Parameters
  • zones (array like 1d) – An object containing zone names and indices for reference when indexing/slicing the skim matrix. The zones argument defines the two key axes of the labeled array (the i and j dimensions) as “From” and “To”. Multi-level indices (pd.MultiIndex, e.g.) are supported.

  • data (array like or numpy array constructor) – The ndarray that contains the data in the skim. This can be an on-disk pytables array or a numpy array or numpy array constructor.

  • axes_k ([LbAxis,..], default=None) – One or more LbAxis objects that describe the k-plus dimensions of the array (those besides the “From” and “To” dimensions defined by zones).

  • desc (String) – A description of the contents of the skim.

  • hdf_store (String, default=None) – If using an on-disk array to store the skim and initialize data, give the path to the H5 file. If data is an existing pytables array and hdf_store is provided, the data will be replicated in the new node.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • kwargs – Keyword arguments for initializing the data. Kwargs vary by the data initialization method provided.
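The relationship between zones and the i/j dimensions can be illustrated with plain pandas and numpy structures (a conceptual sketch only, not the emma API):

```python
import numpy as np
import pandas as pd

# Hypothetical zone system; in emma this would be the `zones` argument.
zones = pd.Index(["A", "B", "C"], name="zone")

# A Skim's first two axes ("From" and "To") are both indexed by `zones`,
# so the core data block is always square in the i and j dimensions.
data = np.zeros((len(zones), len(zones)))

# Label-based lookup mimics slicing a Skim by origin/destination name.
i = zones.get_loc("A")   # origin index
j = zones.get_loc("C")   # destination index
data[i, j] = 12.5
print(data[0, 2])        # 12.5
```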

nzones

The number of zones in the matrix (the length of each axis)

Type

Integer

axes

The axes defining dimensions and labels. The zones parameter is used to create the “From” and “To” dimensions, and these are combined with axes_k.

Type

[LbAxis,..]

See also

LbArray

cast(new_axes, squeeze=False, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, desc='', **kwargs)

Returns a copy of this skim cast along additional axes_k dimensions.

Overrides the LbArray implementation of cast as follows:
  • Data are always copied.

  • No existing dimensions can be dropped.

  • Selection criteria (keyword arguments) are ignored for the skim’s From and To axes to ensure consistency in the i and j dimensions.

To cast skims into labeled arrays, consider the impress method. To fill with alternative data before casting, consider the stamp method, followed by cast.

Parameters
  • new_axes (LbAxis or [LbAxis,..]) – Labeled axis object(s) defining the dimensions into which an impression of this array will be stamped.

  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • kwargs – Keyword arguments specifying a selection from this array to cast into new dimensions.

Returns

casted – An LbArray with data from the original array replicated in the new dimensions. (Data are always copied when casting a skim; the base LbArray implementation may instead return an unfilled impression to be filled with fresh data.)

Return type

Impression or LbArray
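The effect of casting along an additional k dimension (with data always copied, per the override above) resembles numpy broadcasting followed by a copy. A rough sketch of the concept, not the emma implementation:

```python
import numpy as np

# A 3x3 OD matrix (the i and j dimensions).
base = np.arange(9.0).reshape(3, 3)

# "Casting" into a new 2-label k axis (e.g. Period = ["AM", "PM"]):
# the i/j values are replicated along the new dimension, then copied
# so each k slice can be modified independently.
casted = np.broadcast_to(base, (2, 3, 3)).copy()

casted[0, 0, 0] = 99.0    # editing the first slice...
print(casted[1, 0, 0])    # ...leaves the second slice untouched: 0.0
```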

copy(hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)

Copy this skim’s contents to a new labeled array with the same axis names and labels. Optional arguments allow HDF storage in the specified file and node (overwriting existing content if directed).

If HDF arguments are provided, the data are copied into a new h5 data store node. Otherwise, an in-memory (numpy) copy is returned.

Parameters
  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If casting to an on-disk array at an existing node, the node will be replaced with new data if overwrite=True.

fetch(squeeze=False, **kwargs)

Retrieve values from the Skim based on axis names and labels.

This is essentially like calling take except the i and j dimensions always remain intact and a new Skim is returned.

Parameters
  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.

  • kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. The “From” and “To” axes (i and j dimensions) cannot be used when fetching; use take instead.

Returns

filtered_skim

Return type

Skim

See also

LbArray.take()

stamp(fill_with, drop=None, squeeze=False, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, constr_kwargs={}, desc='', **kwargs)

Returns a new skim based on this skim. The new skim retains axis details (shape and labels), reflecting any axis criteria passed as keyword arguments, and must be filled with data. To create an unfilled impression, use the impress method.

Parameters
  • fill_with (array-like, scalar, or constructor) – The data to fill this impression with.

  • drop ([LBAxis or String,..], default=None) – Specify any axes to be excluded from the new impression using the axis object or name. Axes used for keyword selection cannot be dropped.

  • squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the new impression reduced. If False, the returned impression has the same number of dimensions as the source.

  • hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.

  • node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.

  • name (String, default=None) – The name of the new node to be created.

  • atom (tb.Atom) – The atom (data type) of the new array.

  • driver (String, default=None) – The HDF connection driver.

  • overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

  • constr_kwargs (dict, default={}) – If filling the impression with a constructor (np.ones, e.g.), pass any constructor function kwargs in a dictionary.

  • kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. For example, a.take(origins=[1,2,3], destinations=[4,5,6]) or a.take(**{"origins": [1,2,3], "destinations": [4,5,6]}).

See also

Impression.impress()

Functions

emma.od.csrVApply(vfunc, csr_mat, *args, **kwargs)

Apply a vectorized function to values in a sparse (csr) matrix. It should also work for a csc matrix, though this has not been verified.

Parameters
  • vfunc (callable) – A vectorized function (see numpy.vectorize)

  • csr_mat (scipy.sparse.csr.csr_matrix) – A compressed sparse row (csr) matrix

  • *args (ordered arguments for vfunc) –

  • **kwargs (key-word args accepted by vfunc) –
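The core idea — applying the vectorized function only to the stored (nonzero) values rather than densifying the matrix — can be sketched with CSR components in plain numpy (an assumed behavior sketch; csrVApply itself remains untested):

```python
import numpy as np

def csr_vapply_sketch(vfunc, data, indices, indptr, *args, **kwargs):
    """Apply a vectorized function to the stored values of a CSR matrix,
    leaving the sparsity structure (indices/indptr) unchanged."""
    return vfunc(data, *args, **kwargs), indices, indptr

# CSR components of [[0, 2, 0], [3, 0, 4]]
data = np.array([2.0, 3.0, 4.0])
indices = np.array([1, 0, 2])
indptr = np.array([0, 1, 3])

square = np.vectorize(lambda x: x * x)
new_data, _, _ = csr_vapply_sketch(square, data, indices, indptr)
print(new_data)  # [ 4.  9. 16.]
```

Note that this pattern is only valid for functions where f(0) == 0; otherwise the implicit zeros of the sparse matrix would need different treatment.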

emma.od.distribute(decay_skim, prods_df, prods_col, attrs_df, attrs_col, level=None, prods_id=None, prods_index=False, attrs_id=None, attrs_index=False, hdf_store=None, node_path=None, name=None, driver=None, overwrite=False, converges_at=1e-05, max_iters=500, tolerance=1e-08, report_convergence=False, **kwargs)

Create and solve an iterative proportional fitting (IPF) problem, seeded based on weights in the decay skim, trip productions in a data frame, and trip attractions in a data frame.

decay_skim: Skim

The skim object containing decay factors for each OD pair.

prods_df: DataFrame

The data frame containing estimates of trip productions.

prods_col: String

The name of the field in prods_df that contains trip production estimates.

attrs_df: DataFrame

The data frame containing estimates of trip attractions.

attrs_col: String

The name of the field in attrs_df that contains trip attraction estimates.

level: String or Int, default=None

If the decay skim’s zones attribute is a MultiIndex, a level name may be specified on which to reindex the input data frame.

prods_id: String, default=None

A column in the productions data frame used to reindex the data frame to match the indexing of the decay skim.

prods_index: Boolean, default=False

If True, the productions data frame’s index will be used when reindexing to match the indexing of the decay skim.

attrs_id: String, default=None

A column in the attractions data frame used to reindex the data frame to match the indexing of the decay skim.

attrs_index: Boolean, default=False

If True, the attractions data frame’s index will be used when reindexing to match the indexing of the decay skim.

hdf_store: String, default=None

If creating a new on-disk array to store the resulting array, give the path to the H5 file.

node_path: String, default=None

The node path in the H5 hierarchy within the hdf_store.

name: String, default=None

The name of the new node to be created.

driver: String, default=None

The HDF connection driver.

overwrite: Boolean, default=False

If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

converges_at: Float

Specifies a convergence value. If the percentage error between the seed marginals and the production and attraction targets is less than or equal to this value, the IPF process exits, returning an adequately fitted matrix. Default is 1e-5.

max_iters: Int

Maximum number of iterations allowed. The IPF process exits after this number of iterations even if convergence has not been achieved. Default is 500.

tolerance: Float

The IPF process exits if the difference between the convergence variables of two consecutive iterations is below this value. If there is minimal difference between two iterations, the process is unlikely to achieve substantially stronger convergence through additional iterations. Default is 1e-8.

report_convergence: Boolean

If False (default), only the rebalanced matrix is returned. If True, the details of the IPF are returned as a tuple.

kwargs:

Keyword arguments defining slices of the decay skim to take for seeding distribution.

Returns

  • trip_table (Skim) – A balanced matrix where marginals match (or approximate) the production and attraction totals given.

  • number_of_iterations (Int) – If report_convergence is True, the second value returned is the number of iterations completed by the IPF process

  • convergence (Boolean) – If report_convergence is True, the third value returned is a boolean flag indicating whether convergence was achieved.

  • narrowing (Boolean) – If report_convergence is True, the fourth value returned is a boolean flag indicating whether convergence was narrowing in the final iteration. If False, the IPF has exited due to minimal improvement in convergence in two consecutive runs.
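The balancing procedure behind distribute is standard IPF: alternately scale rows to match productions and columns to match attractions until the marginals converge. A minimal numpy sketch of the technique (not the emma implementation; the error metric here is a simplified proxy for the percentage-error check described above):

```python
import numpy as np

def ipf(seed, prods, attrs, converges_at=1e-5, max_iters=500):
    """Iteratively scale a seed matrix so its row sums match `prods`
    and its column sums match `attrs`."""
    mat = seed.astype(float).copy()
    for i in range(max_iters):
        mat *= (prods / mat.sum(axis=1))[:, None]  # fit row marginals
        mat *= attrs / mat.sum(axis=0)             # fit column marginals
        # Relative error of row marginals after the column fit.
        err = np.abs(mat.sum(axis=1) - prods).max() / prods.max()
        if err <= converges_at:
            return mat, i + 1, True
    return mat, max_iters, False

seed = np.ones((3, 3))                  # frictionless seed
prods = np.array([100.0, 50.0, 50.0])   # row targets (sum = 200)
attrs = np.array([80.0, 60.0, 60.0])    # column targets (sum = 200)
table, iters, converged = ipf(seed, prods, attrs)
```

Note that production and attraction totals must agree for the marginals to be simultaneously satisfiable.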

emma.od.loadOD_csv(skim, source_file, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None, **kwargs)

Import OD data from a long table (csv) into a labeled array (pytables array) format.

Parameters
  • skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.

  • source_file (String) – The csv file containing long-form OD data

  • o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.

  • d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.

  • val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).

  • chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.

  • level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.

  • **kwargs – Keyword arguments for pandas read_csv function.
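The chunked long-table import pattern can be sketched with the standard library and numpy (hypothetical column and zone names; the real function also resolves dimension/label criteria from val_fields_dict and reads the file in chunks of chunk_rows):

```python
import csv
import io
import numpy as np

# Hypothetical long-form OD table with one value column.
long_table = io.StringIO(
    "o,d,AM_HBW_Trips\n"
    "A,B,10\n"
    "B,A,4\n"
    "A,A,2\n"
)

zones = {"A": 0, "B": 1}    # zone label -> matrix index
matrix = np.zeros((2, 2))

# Accumulate long-table rows into the matrix; a real importer would
# process the file in chunks (e.g. 500000 rows) to bound memory use.
for row in csv.DictReader(long_table):
    matrix[zones[row["o"]], zones[row["d"]]] += float(row["AM_HBW_Trips"])

print(matrix)
# [[ 2. 10.]
#  [ 4.  0.]]
```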

emma.od.loadOD_dbf(skim, source_file, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None)

Import OD data from a long table (dbf) into a labeled array (pytables array) format.

Parameters
  • skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.

  • source_file (String) – The dbf file containing long-form OD data

  • o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.

  • d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.

  • val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).

  • chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.

  • level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.

emma.od.loadOD_gdb(skim, source_file, layer_name, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None)

Import OD data from a long table (geodatabase) into a labeled array (pytables array) format.

Parameters
  • skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.

  • source_file (String) – The geodatabase containing long-form OD data.

  • layer_name (String) – The feature class or table in source_file to import.

  • o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.

  • d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.

  • val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).

  • chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.

  • level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.

emma.od.loadOD_excel(skim, source_file, sheet_name, o_field, d_field, val_fields_dict, level=None, **kwargs)

Import OD data from a long table (Excel) into a labeled array (pytables array) format. No chunking is done with Excel files.

Parameters
  • skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.

  • source_file (xlrd_io) – The excel file containing long-form OD data.

  • o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.

  • d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.

  • val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).

  • level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.

emma.od.maxInteractions(zones_df, o_values, d_values)

Uses activity values in a zones data frame to evaluate the maximum “interaction” potential between each pair of zones in a hypothetical, frictionless world.

Interaction describes probable trip-making potential. Maximum interactions analyses frame theoretical trip-making opportunities to support network effectiveness analyses and/or seed trip distribution processes.

Parameters
  • zones_df (Pandas data frame) – The data frame containing activity information. Zone labels are not required, but consistent indexing of the data frame with any related skim objects to be used in downstream procedures is essential.

  • o_values (String) – The column in zones_df with origin-end activity data (households, trip productions, e.g.)

  • d_values (String) – The column in zones_df with destination-end activity data (jobs, trip attractions, e.g.)

Returns

maxInteractions

Return type

2d array
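In a frictionless world, every origin-end activity can pair with every destination-end activity, so the interaction potential for each OD cell is simply o_i * d_j — an outer product of the two columns. A conceptual sketch with hypothetical activity data:

```python
import numpy as np
import pandas as pd

# Hypothetical zone activity table.
zones_df = pd.DataFrame(
    {"households": [100, 50, 25], "jobs": [200, 10, 40]},
    index=["A", "B", "C"],
)

# Maximum interactions: the i,j cell is origin activity times
# destination activity, with no impedance applied.
max_interactions = np.outer(zones_df["households"], zones_df["jobs"])

print(max_interactions[0, 0])  # 100 * 200 = 20000
```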

emma.od.openSkim_HDF(hdf_store, node_path)

Open a Skim object from a node in an hdf file.

The hdf file must have the expected attributes for reconstructing a Skim object (these are stored when a skim references an hdf node as its data attribute).

Parameters
  • hdf_store (String) – The path to the H5 file.

  • node_path (String, default=None) – The path to the node in the H5 file.

Returns

s – A skim object built around the on-disk array at the hdf node.

Return type

Skim

emma.od.summarizeAccess(zones_df, activity_col, decay_skim, key_level=None, access_to_dests=True, **kwargs)

Uses activity values in a zones data frame and a decay skim to summarize the number of destination-end activities reachable from each origin zone (access_to_dests=True) or the number of origin-end activities that can reach each destination zone (access_to_dests=False).

zones_df: Pandas data frame

The index is used to match zone names in the data frame to the zones attribute of the skim.

activity_col: string or [string,…]

The column(s) in zones_df with activity data (jobs, households, e.g.)

decay_skim: Skim

The skim object containing decay factors for each OD pair.

key_level: String or Int, default=None

If the decay skim’s zones attribute is a MultiIndex, a key level may be specified on which to reindex zones_df.

access_to_dests: Boolean

If True (default), the function returns the number of destination activities reachable from each zone of origin (row sums). If False, the function returns the number of origin activities that can reach each zone of destination (column sums).

kwargs:

Keyword arguments that specify the axis dimensions and labels from which to fetch OD decay values from the decay_skim. For example: impedances=’CongTime’, period=[‘am’, ‘pm’].
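The access calculation amounts to a decay-weighted sum of activity: with a decay matrix F and a destination activity vector d, access from each origin is the row-wise weighted sum F·d (and column-wise sums with origin-end activity when access_to_dests=False). A numpy sketch of the concept with hypothetical values:

```python
import numpy as np

# Hypothetical decay factors (e.g. from a Decay object applied to travel times).
decay = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])
jobs = np.array([200.0, 10.0, 40.0])    # destination-end activity

# access_to_dests=True: decay-weighted jobs reachable from each origin.
access_from = decay @ jobs               # row-wise weighted sums

# access_to_dests=False: origin-end activity that can reach each destination.
households = np.array([100.0, 50.0, 25.0])
access_to = households @ decay           # column-wise weighted sums

print(access_from[0])  # 1.0*200 + 0.5*10 + 0.2*40 = 213.0
```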

emma.od.weightedInteractions(decay_skim, origins_df, origins_col, dests_df, dests_col, level=None, origins_id=None, origins_index=False, dests_id=None, dests_index=False, weighting_factor=1.0, hdf_store=None, node_path=None, name=None, driver=None, overwrite=False, **kwargs)

Create and solve an iterative proportional fitting problem, seeded based on weights in the decay skim, origin-end activity estimates in a data frame, and destination-end activity estimates in a data frame.

decay_skim: Skim

The skim object containing decay factors for each OD pair.

origins_df: DataFrame

The data frame containing estimates of origin-end activity.

origins_col: String

The name of the field in origins_df that contains origin activity estimates

dests_df: DataFrame

The data frame containing estimates of destination-end activity.

dests_col: String

The name of the field in dests_df that contains destination activity estimates

level: String or Int, default=None

If the reference skim’s zones attribute is a MultiIndex, a level name may be specified on which to reindex the input data frames.

origins_id: String, default=None

A column in the origins data frame used to reindex the data frame to match the indexing of the decay skim.

origins_index: Boolean, default=False

If True, the origins data frame’s index will be used when reindexing to match the indexing of the decay skim.

dests_id: String, default=None

A column in the destinations data frame used to reindex the data frame to match the indexing of the decay skim.

dests_index: Boolean, default=False

If True, the destinations data frame’s index will be used when reindexing to match the indexing of the decay skim.

weighting_factor: Numeric

A scalar value by which weighted interaction scores are multiplied (to approximate total trip counts when seeding a trip distribution matrix, e.g.)

hdf_store: String, default=None

If creating a new on-disk array to store the resulting array, give the path to the H5 file.

node_path: String, default=None

The node path in the H5 hierarchy within the hdf_store.

name: String, default=None

The name of the new node to be created.

driver: String, default=None

The HDF connection driver.

overwrite: Boolean, default=False

If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.

kwargs:

Keyword arguments defining slices of the decay skim to take for seeding distribution.

Returns

weighted_interactions – A skim object with values reflecting proportional trip-making propensities among OD pairs.

Return type

Skim
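The essential weighting idea can be sketched as a decay-tempered interaction matrix: each OD cell combines origin activity, destination activity, and the decay factor, scaled by weighting_factor (a conceptual sketch with hypothetical values, not the emma implementation):

```python
import numpy as np

def weighted_interactions_sketch(decay, o_act, d_act, weighting_factor=1.0):
    """Weight each OD pair's interaction potential (o_i * d_j) by its
    decay factor, then apply a global scaling factor."""
    return weighting_factor * decay * np.outer(o_act, d_act)

decay = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
o_act = np.array([10.0, 5.0])   # e.g. households
d_act = np.array([20.0, 8.0])   # e.g. jobs

w = weighted_interactions_sketch(decay, o_act, d_act)
print(w)
# [[200.  24.]
#  [ 30.  40.]]
```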