OD (Origin-Destination)¶
Origin-Destination data are essential to many travel analyses. Emma relies heavily on OD data to analyze accessibility, trip distribution, system productivity, etc.
This module provides classes and functions that facilitate the creation, storage, and retrieval of OD matrices as well as methods for developing and executing functions using multiple matrices. Matrix data may be stored in the h5 file format, for efficient on-disk processing.
Module objectives:
- Efficient I/O between tabular OD data and matrices.
- Efficient replication of matrix structures and/or values for OD processing.
- Consistent processing idioms for vectorized formula applications.
Classes and functions
Skim: stores OD data as a specialized labeled array. Skims always have congruent i (origins) and j (destinations) dimensions. Skim is a child class of LbArray, inheriting (or in a few cases overloading) its attributes and methods while adding a handful of distinct attributes and methods to ensure preservation of the i and j dimensions.
The loadOD_… family of functions supports reading origin-destination data from a long table into a Skim object. The skim object must already be initialized. Supported formats include: csv (most performant), ESRI geodatabase, DBF, or Excel.
maxInteractions/weightedInteractions are functions that consume origin-end and destination-end activity data in a pandas DataFrame to generate a Skim reflecting trip-making potential among OD pairs.
summarizeAccess uses origin-end OR destination-end activity data in a pandas DataFrame and one or more Decay objects (see the decay module) to generate access scores to/from each zone.
distribute provides a simple function for balancing a trip table based on zone-level productions and attractions. Currently the function only works for a simple 2d matrix.
openSkim_HDF loads a skim object from an on-disk array (H5 file). Skims constructed with H5 file specs are initialized on-disk and parameters for reloading the Skim object from the H5 file are stored as attributes.
csrVApply is an early foray into supporting sparse matrix applications, but it remains untested.
Skim¶
-
class
emma.od.
Skim
(zones, data, axes_k=None, desc='', hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)¶ The Skim class stores origin-destination data in a labeled array. Attributes record dimensional details (LbAxis parameters).
- Parameters
zones (array like 1d) – An object containing zone names and indices for reference when indexing/slicing the skim matrix. The zones argument defines the two key axes of the labeled array (the i and j dimensions) as “From” and “To”. Multi-level indices (pd.MultiIndex, e.g.) are supported.
data (array like or numpy array constructor) – The ndarray that contains the data in the skim. This can be an on-disk pytables array or a numpy array or numpy array constructor.
axes_k ([LbAxis,..], default=None) – One or more LbAxis objects that describe the k-plus dimensions of the array (those besides the “From” and “To” dimensions defined by zones).
desc (String) – A description of the contents of the skim.
hdf_store (String, default=None) – If using an on-disk array to store the skim and initialize data, give the path to the H5 file. If data is an existing pytables array and hdf_store is provided, the data will be replicated in the new node.
node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.
name (String, default=None) – The name of the new node to be created.
atom (tb.Atom) – The atom (data type) of the new array.
driver (String, default=None) – The HDF connection driver.
overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.
kwargs – Keyword arguments for initializing the data. Kwargs vary by the data initialization method provided.
-
nzones
¶ The number of zones in the matrix (the length of each of the i and j axes)
- Type
Integer
-
axes
¶ The axes defining dimensions and labels. The zones parameter is used to create the “From” and “To” dimensions, and these are combined with axes_k.
- Type
[LbAxis,..]
See also
LbArray
-
cast
(new_axes, squeeze=False, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, desc='', **kwargs)¶ Returns a copy of this skim cast along additional axes_k dimensions.
- Overrides the LbArray implementation of cast as follows:
Data are always copied.
No existing dimensions can be dropped.
Selection criteria (keyword arguments) are ignored for the skim’s From and To axes to ensure consistency in the i and j dimensions.
To cast skims into labeled arrays, consider the impress method. To fill with alternative data before casting, consider the stamp method, followed by cast.
- Parameters
new_axes (LbAxis or [LbAxis,..]) – Labeled axis object(s) defining the dimensions into which an impression of this array will be stamped.
squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.
hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.
node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.
name (String, default=None) – The name of the new node to be created.
atom (tb.Atom) – The atom (data type) of the new array.
driver (String, default=None) – The HDF connection driver.
overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.
kwargs – Keyword arguments specifying a selection from this array to cast into new dimensions.
- Returns
casted – A new Skim with data from the original array replicated in the new dimensions (data are always copied when casting a Skim).
- Return type
-
copy
(hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, **kwargs)¶ Copy this skim’s contents to a new labeled array with the same axis names and labels. Optional arguments allow HDF storage in the specified file and node (overwriting existing content if directed).
If args are provided, they focus on copying the data into a new h5 data store node. Otherwise an in-memory copy (numpy) is returned.
- Parameters
hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.
node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.
name (String, default=None) – The name of the new node to be created.
atom (tb.Atom) – The atom (data type) of the new array.
driver (String, default=None) – The HDF connection driver.
overwrite (Boolean, default=False) – If casting to an on-disk array at an existing node, the node will be replaced with new data if overwrite=True.
-
fetch
(squeeze=False, **kwargs)¶ Retrieve values from the Skim based on axis names and labels.
This is essentially like calling take except the i and j dimensions always remain intact and a new Skim is returned.
- Parameters
squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the array selection reduced. If False, the returned array has the same number of dimensions as the source array.
kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. The “From” and “To” axes (i and j dimensions) cannot be used when fetching; use take instead.
- Returns
filtered_skim
- Return type
See also
LbArray.take()
-
stamp
(fill_with, drop=None, squeeze=False, hdf_store=None, node_path=None, name=None, atom=Float64Atom(shape=(), dflt=0.0), driver=None, overwrite=False, constr_kwargs={}, desc='', **kwargs)¶ Returns a new skim based on this skim. The new skim retains axis details (shape and labels) reflecting axis criteria passed as keyword arguments. The new skim must be filled with data. To create an unfilled impression, use the impress method.
- Parameters
fill_with (array-like, scalar, or constructor) – The data to fill this impression with.
drop ([LBAxis or String,..], default=None) – Specify any axes to be excluded from the new impression using the axis object or name. Axes used for keyword selection cannot be dropped.
squeeze (Boolean, default=False) – If True, axes that have only a single label will be dropped and the dimensionality of the new impression reduced. If False, the returned impression has the same number of dimensions as the source.
hdf_store (String, default=None) – If using an on-disk array to store the filled impression, give the path to the H5 file.
node_path (String, default=None) – The node path in the H5 hierarchy within the hdf_store.
name (String, default=None) – The name of the new node to be created.
atom (tb.Atom) – The atom (data type) of the new array.
driver (String, default=None) – The HDF connection driver.
overwrite (Boolean, default=False) – If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.
constr_kwargs (Dict, default={}) – If filling the impression with a constructor (np.ones, e.g.), pass any constructor function kwargs in a dictionary.
kwargs – kwargs can be passed with keys corresponding to axis names and values corresponding to labels in the given axis. For example a.take(origins=[1,2,3], destinations=[4,5,6]) or a.take(**{"origins": [1,2,3], "destinations": [4,5,6]}).
See also
Impression.impress()
Functions¶
-
emma.od.
csrVApply
(vfunc, csr_mat, *args, **kwargs)¶ Apply a vectorized function to values in a sparse (csr) matrix. It may also work for a csc matrix, but that use is untested.
- Parameters
vfunc (callable) – A vectorized function (see numpy.vectorize)
csr_mat (scipy.sparse.csr.csr_matrix) – A compressed sparse row (csr) matrix
*args (ordered arguments for vfunc) –
**kwargs (key-word args accepted by vfunc) –
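The pattern csrVApply relies on can be sketched without scipy (an illustrative sketch, not the emma implementation; the hand-rolled `data`/`indices`/`indptr` arrays below stand in for a real csr_matrix's attributes): a CSR matrix stores its nonzero values in a flat data array, so applying an elementwise vectorized function to the matrix reduces to applying it to that array.

```python
import numpy as np

# A CSR matrix keeps its nonzeros in a flat `data` array; elementwise
# application of a vectorized function touches only those stored values.
# Note this is only safe when vfunc(0) == 0, since implicit zeros are
# never visited.
vfunc = np.vectorize(lambda x: x ** 2)

# Hand-rolled CSR pieces for the 2x3 matrix [[0, 3, 0], [4, 0, 5]]
data = np.array([3.0, 4.0, 5.0])   # nonzero values, row-major
indices = np.array([1, 0, 2])      # column index of each stored value
indptr = np.array([0, 1, 3])       # row start offsets into data/indices

new_data = vfunc(data)             # elementwise apply to the nonzeros
```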
-
emma.od.
distribute
(decay_skim, prods_df, prods_col, attrs_df, attrs_col, level=None, prods_id=None, prods_index=False, attrs_id=None, attrs_index=False, hdf_store=None, node_path=None, name=None, driver=None, overwrite=False, converges_at=1e-05, max_iters=500, tolerance=1e-08, report_convergence=False, **kwargs)¶ Create and solve an iterative proportional fitting (IPF) problem, seeded based on weights in the decay skim, trip productions in a data frame, and trip attractions in a data frame.
- decay_skim: Skim
The skim object containing decay factors for each OD pair.
- prods_df: DataFrame
The data frame containing estimates of trip productions.
- prods_col: String
The name of the field in prods_df that contains trip production estimates.
- attrs_df: DataFrame
The data frame containing estimates of trip attractions.
- attrs_col: String
The name of the field in attrs_df that contains trip attraction estimates.
- level: String or Int, default=None
If the decay skim’s zones attribute is a MultiIndex, a level name may be specified on which to reindex the input data frame.
- prods_id: String, default=None
A column in the productions data frame used to reindex the data frame to match the indexing of the decay skim.
- prods_index: Boolean, default=False
If true, the productions data frame’s index will be used when reindexing to match the indexing of the decay skim.
- attrs_id: String, default=None
A column in the attractions data frame used to reindex the data frame to match the indexing of the decay skim.
- attrs_index: Boolean, default=False
If true, the attractions data frame’s index will be used when reindexing to match the indexing of the decay skim.
- hdf_store: String, default=None
If creating a new on-disk array to store the resulting array, give the path to the H5 file.
- node_path: String, default=None
The node path in the H5 hierarchy within the hdf_store.
- name: String, default=None
The name of the new node to be created.
- driver: String, default=None
The HDF connection driver.
- overwrite: Boolean, default=False
If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.
- converges_at: Float
Specifies a convergence value. If the percentage error between seed marginals and production and attraction targets is less than or equal to this value, the IPF process exits, returning an adequately fitted matrix. Default is 1e-5.
- max_iters: Int
Maximum number of iterations allowed. The IPF process exits after this number of iterations even if convergence has not been achieved. Default is 500.
- tolerance: Float
The IPF process exits if the difference between the convergence variables of two consecutive iterations is below this value. If there is minimal difference between two iterations, the process is unlikely to achieve substantially stronger convergence through additional iterations. Default is 1e-8.
- report_convergence: Boolean
If False (default), only the rebalanced matrix is returned. If True, the details of the IPF are returned as a tuple.
- kwargs:
Keyword arguments defining slices of the decay skim to take for seeding distribution.
- Returns
trip_table (Skim) – A balanced matrix where marginals match (or approximate) the production and attraction totals given.
number_of_iterations (Int) – If report_convergence is True, the second value returned is the number of iterations completed by the IPF process
convergence (Boolean) – If report_convergence is True, the third value returned is a boolean flag indicating whether convergence was achieved.
narrowing (Boolean) – If report_convergence is True, the fourth value returned is a boolean flag indicating whether convergence was narrowing in the final iteration. If False, the IPF has exited due to minimal improvement in convergence in two consecutive runs.
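The balancing loop that distribute performs can be sketched in plain numpy (an illustrative IPF under the stated convergence criteria, not the emma implementation; the names `ipf`, `seed`, `prods`, and `attrs` are hypothetical):

```python
import numpy as np

def ipf(seed, prods, attrs, converges_at=1e-5, max_iters=500):
    """Balance `seed` so row sums approach `prods` and column sums `attrs`."""
    mat = seed.astype(float).copy()
    for it in range(max_iters):
        # Scale rows toward production targets (guard empty rows)
        row_sums = mat.sum(axis=1)
        mat *= (prods / np.where(row_sums == 0, 1, row_sums))[:, None]
        # Scale columns toward attraction targets
        col_sums = mat.sum(axis=0)
        mat *= (attrs / np.where(col_sums == 0, 1, col_sums))[None, :]
        # Exit when the worst relative row-marginal error is small enough
        err = np.abs(mat.sum(axis=1) - prods).max() / prods.max()
        if err <= converges_at:
            return mat, it + 1, True
    return mat, max_iters, False

seed = np.ones((3, 3))                    # frictionless seed matrix
prods = np.array([100.0, 50.0, 25.0])     # row (production) targets
attrs = np.array([80.0, 60.0, 35.0])      # column (attraction) targets
table, iters, converged = ipf(seed, prods, attrs)
```

Because the production and attraction totals here are equal, the loop converges quickly and the returned marginals match both target vectors.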
-
emma.od.
loadOD_csv
(skim, source_file, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None, **kwargs)¶ Import OD data from a long table (csv) into a labeled array (pytables array) format.
- Parameters
skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.
source_file (String) – The csv file containing long-form OD data
o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.
d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.
val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).
chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.
level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.
**kwargs – Keyword arguments for pandas read_csv function.
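The long-table import pattern the loadOD_… functions implement can be sketched with the stdlib csv module (an illustrative sketch; the file contents, zone labels, and the `zones` lookup dict are made up, and a real import would write into a Skim rather than a bare numpy matrix):

```python
import csv
import io
import numpy as np

# A long table names an origin, a destination, and one or more value
# columns per row; each value lands at [origin_index, destination_index].
long_table = io.StringIO(
    "o,d,AM_HBW_Trips\n"
    "A,B,12.5\n"
    "B,A,7.0\n"
    "A,A,3.0\n"
)
zones = {"A": 0, "B": 1}                 # zone label -> matrix index
mat = np.zeros((len(zones), len(zones)))

for row in csv.DictReader(long_table):
    i = zones[row["o"]]                  # origin index
    j = zones[row["d"]]                  # destination index
    mat[i, j] = float(row["AM_HBW_Trips"])
```

In loadOD_csv the same loop runs over pandas read_csv chunks (chunk_rows at a time) so very large tables never load fully into memory.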
-
emma.od.
loadOD_dbf
(skim, source_file, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None)¶ Import OD data from a long table (dbf) into a labeled array (pytables array) format.
- Parameters
skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.
source_file (String) – The dbf file containing long-form OD data
o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.
d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.
val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).
chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.
level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.
-
emma.od.
loadOD_gdb
(skim, source_file, layer_name, o_field, d_field, val_fields_dict, chunk_rows=500000, level=None)¶ Import OD data from a long table (geodatabase) into a labeled array (pytables array) format.
- Parameters
skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.
source_file (String) – The geodatabase containing long-form OD data.
layer_name (String) – The feature class or table in source_file to import.
o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.
d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.
val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).
chunk_rows (Int, default=500000) – If importing a large table, chunking rows is recommended to avoid memory errors.
level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.
-
emma.od.
loadOD_excel
(skim, source_file, sheet_name, o_field, d_field, val_fields_dict, level=None, **kwargs)¶ Import OD data from a long table (excel) into a labeled array (pytables array) format. No chunking is done with excel files.
- Parameters
skim (Skim) – The skim object to which OD values will be assigned from the long table. Origin/destination indexing is handled by the skim object’s zones attribute.
source_file (xlrd_io) – The excel file containing long-form OD data.
o_field (String or Int) – The field name or index that identifies the origin zone in each OD row.
d_field (String or Int) – The field name or index that identifies the destination zone in each OD row.
val_fields_dict ({String: {String: String}, ..}) – A dictionary with keys corresponding to column names in source_file and values corresponding to dimension/label criteria in the skim ({“AM_HBW_Trips”: {“Period”: “AM”, “Purpose”: “HBW”}, e.g.).
level (String, default=None) – If the skim’s zones attribute is a multi-index, provide the level name within the index to which values in o_field and d_field correspond.
-
emma.od.
maxInteractions
(zones_df, o_values, d_values)¶ Uses activity values in a zones data frame to evaluate the maximum “interaction” potential between each pair of zones in a hypothetical, frictionless world.
Interaction describes probable trip-making potential. Maximum interactions analyses frame theoretical trip-making opportunities to support network effectiveness analyses and/or seed trip distribution processes.
- Parameters
zones_df (Pandas data frame) – The data frame containing activity information. Zone labels are not required, but consistent indexing of the data frame with any related skim objects to be used in downstream procedures is essential.
o_values (String) – The column in zones_df with origin-end activity data (households, trip productions, e.g.)
d_values (String) – The column in zones_df with destination-end activity data (jobs, trip attractions, e.g.)
- Returns
maxInteractions
- Return type
2d array
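In a frictionless world the interaction potential of an OD pair is plausibly just the product of origin-end and destination-end activity, which makes the full matrix an outer product (an illustrative sketch consistent with the description above, not the emma implementation; the activity values are made up):

```python
import numpy as np

o_activity = np.array([10.0, 20.0, 5.0])   # e.g. households per zone
d_activity = np.array([8.0, 4.0, 12.0])    # e.g. jobs per zone

# max_interactions[i, j] = origin activity in i * destination activity in j
max_interactions = np.outer(o_activity, d_activity)
```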
-
emma.od.
openSkim_HDF
(hdf_store, node_path)¶ Open a Skim object from a node in an hdf file.
The hdf file must have the expected attributes for reconstructing a Skim object (these are stored when a skim references an hdf node as its data attribute).
- Parameters
hdf_store (String) – The path to the H5 file.
node_path (String, default=None) – The path to the node in the H5 file.
- Returns
s – A skim object built around the on-disk array at the hdf node.
- Return type
-
emma.od.
summarizeAccess
(zones_df, activity_col, decay_skim, key_level=None, access_to_dests=True, **kwargs)¶ Uses activity values in a zones data frame and a decay skim to summarize the number of destination-end activities reachable from each origin zone (access_to_dests=True) or the number of origin-end activities that can reach each destination zone (access_to_dests=False).
- zones_df: Pandas data frame
The index is used to match zone names in the data frame to the zones attribute of the skim.
- activity_col: string or [string,…]
The column(s) in zones_df with activity data (jobs, households, e.g.)
- decay_skim: Skim
The skim object containing decay factors for each OD pair.
- key_level: String or Int, default=None
If the decay skim’s zones attribute is a MultiIndex, a key level may be specified on which to reindex zones_df.
- access_to_dests: Boolean
If True (default), the function returns the number of destination activities reachable from each zone of origin (row sums). If False, the function returns the number of origin activities that can reach each zone of destination (column sums).
- kwargs:
Keyword arguments that specify the axis dimensions and labels from which to fetch OD decay values from the decay_skim. For example: impedances=’CongTime’, period=[‘am’, ‘pm’].
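The access summary described above amounts to a decay-weighted sum of activities (an illustrative sketch, not the emma implementation; the decay matrix and activity vectors are made up):

```python
import numpy as np

decay = np.array([[1.0, 0.5, 0.1],
                  [0.5, 1.0, 0.4],
                  [0.1, 0.4, 1.0]])         # decay factor for each OD pair
jobs = np.array([100.0, 50.0, 200.0])       # destination-end activity

# access_to_dests=True: decay-weighted destination activity reachable
# from each origin zone (row sums of decay * activity)
access_from = decay @ jobs

# access_to_dests=False: decay-weighted origin activity that can reach
# each destination zone (column sums), with origin-end activity
households = np.array([80.0, 120.0, 60.0])
access_to = decay.T @ households
```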
-
emma.od.
weightedInteractions
(decay_skim, origins_df, origins_col, dests_df, dests_col, level=None, origins_id=None, origins_index=False, dests_id=None, dests_index=False, weighting_factor=1.0, hdf_store=None, node_path=None, name=None, driver=None, overwrite=False, **kwargs)¶ Compute decay-weighted interaction potentials among OD pairs, seeded by weights in the decay skim, origin-end activity estimates in a data frame, and destination-end activity estimates in a data frame.
- decay_skim: Skim
The skim object containing decay factors for each OD pair.
- origins_df: DataFrame
The data frame containing estimates of origin-end activity.
- origins_col: String
The name of the field in origins_df that contains origin activity estimates
- dests_df: DataFrame
The data frame containing estimates of destination-end activity.
- dests_col: String
The name of the field in dests_df that contains destination activity estimates
- level: String or Int, default=None
If the reference skim’s zones attribute is a MultiIndex, a level name may be specified on which to reindex the input data frames.
- origins_id: String, default=None
A column in the origins data frame used to reindex the data frame to match the indexing of the decay skim.
- origins_index: Boolean, default=False
If true, the origins data frame’s index will be used when reindexing to match the indexing of the decay skim.
- dests_id: String, default=None
A column in the destinations data frame used to reindex the data frame to match the indexing of the decay skim.
- dests_index: Boolean, default=False
If true, the destinations data frame’s index will be used when reindexing to match the indexing of the decay skim.
- weighting_factor: Numeric
A scalar value by which weighted interaction scores are multiplied (to approximate total trip counts when seeding a trip distribution matrix, e.g.)
- hdf_store: String, default=None
If creating a new on-disk array to store the resulting array, give the path to the H5 file.
- node_path: String, default=None
The node path in the H5 hierarchy within the hdf_store.
- name: String, default=None
The name of the new node to be created.
- driver: String, default=None
The HDF connection driver.
- overwrite: Boolean, default=False
If True, the existing node at `node_path`/`name` (if any) will be discarded and replaced with the new node.
- kwargs:
Keyword arguments defining slices of the decay skim to take for seeding distribution.
- Returns
weighted_interactions – A skim object with values reflecting proportional trip-making propensities among OD pairs.
- Return type
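One plausible reading of the computation, given the parameters above, is a decay-discounted outer product of origin and destination activity, normalized to proportions and scaled by weighting_factor (a hedged sketch, not the emma implementation; all values are made up):

```python
import numpy as np

decay = np.array([[1.0, 0.5],
                  [0.5, 1.0]])       # decay factor for each OD pair
origins = np.array([10.0, 30.0])     # origin-end activity estimates
dests = np.array([20.0, 5.0])        # destination-end activity estimates
weighting_factor = 100.0             # e.g. approximate total trips

# Decay-weighted trip-making potentials for each OD pair
raw = decay * np.outer(origins, dests)

# Normalize to proportions and scale so the matrix sums to the factor
weighted = raw / raw.sum() * weighting_factor
```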