I/O
Travel analyses often use information about thousands of zones and record values for all possible origin-destination pairs, generating large matrices. It is often impossible to allocate enough memory to store these large datasets or to perform analytical tasks with them. Moreover, while in-memory operations are performant, many matrices’ data must persist across numerous processes. Emma uses the H5 file format and pytables to store arrays on disk and to consume and manipulate these data efficiently for travel analysis. The io module provides classes and functions to make these operations as seamless as possible.
Classes and Functions
H5Array: A class that manages connections to an H5 file. Users do not need to instantiate this class directly. Rather, the LbArray and Skim classes use it when their data attributes refer to on-disk arrays in H5 files.
initH5Array: A function that initializes an on-disk array. Users do not need to call this function directly. Rather, the LbArray and Skim classes use it to initialize on-disk arrays in H5 files as needed when setting their data attributes.
listH5Nodes: A convenience function for viewing the contents of an H5 file.
sparseToHDF/loadSparse_HDF: Basic functions to support the storage and loading of sparse matrices in H5 files.
H5Array
class emma.io.H5Array(hdf_store, node_path, driver=None)
A lightweight class facilitating access to a pytables on-disk array.
- Parameters
hdf_store (String) – The path to the H5 file.
node_path (String) – The path to the array node in the H5 file.
driver (String, default=None) – The HDF connection driver.
open(mode='a')
Connect to data at this HDF node. Can be used as a context manager in a with block.
- Parameters
mode (String, default='a') – The mode to open the file. It can be one of the following:
- ‘r’: Read-only; no data can be modified.
- ‘w’: Write; a new file is created (an existing file with the same name would be deleted).
- ‘a’: Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
- ‘r+’: Similar to ‘a’, but the file must already exist.
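As a quick illustration, the sketch below opens an on-disk array through H5Array and reads one row without loading the whole matrix. The file path and node path are hypothetical, and it assumes that the object yielded by open() in a with block behaves like the pytables array node and supports NumPy-style slicing:

    from emma.io import H5Array

    # Hypothetical file and node paths.
    skim_array = H5Array("scenario.h5", "/skims/auto_time")

    # 'r' opens the file read-only; the yielded object is assumed to be the
    # pytables array node.
    with skim_array.open(mode="r") as node:
        first_row = node[0, :]   # read a single row slice from disk
        print(first_row.shape)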
Functions
emma.io.listH5Nodes(hdf_file, at_node='/', recursive=False, pretty=False)
Returns a dictionary of nodes in an H5 file, within a specified node (default is the root). The recursive option allows all subnodes to be listed.
- Parameters
hdf_file (String) – The path to the H5 file.
at_node (String, default="/") – The node within hdf_file whose child nodes will be listed.
recursive (Boolean, default=False) – If True, all descendant nodes from at_node are listed in a nested dictionary. Otherwise, only at_node’s immediate children are returned.
pretty (Boolean, default=False) – If True, the returned dictionary is dumped to a yaml string. Pass this to Python’s print function for a highly legible view of the H5 hierarchy. If False, a dictionary of nodes is returned.
- Returns
nodes – The nodes dictionary contains node names as keys and dictionaries as values. If recursive is True, each nested dictionary conforms to this structure. Empty dict values indicate no children. If pretty, the nodes dictionary is dumped to a string.
- Return type
Dict or String
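For example, a sketch like the following (with a hypothetical file path and node names) lists the immediate children of the root node and then prints the full hierarchy as YAML:

    from emma.io import listH5Nodes

    # Hypothetical H5 file path; list only the root node's immediate children.
    children = listH5Nodes("scenario.h5")
    print(children)  # e.g. {'skims': {}, 'zones': {}} -- empty dicts mean no children

    # Walk the whole hierarchy and print it as a YAML-formatted string.
    print(listH5Nodes("scenario.h5", at_node="/", recursive=True, pretty=True))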
emma.io.sparseToHDF(sparse_mat, hdf_file, hdf_node_name)
Store a csr matrix in HDF5.
- Parameters
sparse_mat (sparse matrix) – The sparse matrix to be stored. The data will be stored in csr format. See scipy.sparse module docs for sparse matrix formats.
hdf_file (str) – The path to the HDF5 file where the matrix data will be stored.
hdf_node_name (str) – The node prefix in the HDF5 hierarchy within the hdf_file.
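A hedged sketch of storing a small sparse matrix; the file path, node prefix, and matrix values are hypothetical:

    import numpy as np
    from scipy.sparse import csr_matrix
    from emma.io import sparseToHDF

    # Build a small sparse trip matrix; zones and values are illustrative only.
    trips = csr_matrix(np.array([[0.0, 12.5, 0.0],
                                 [3.0,  0.0, 0.0],
                                 [0.0,  7.2, 0.0]]))

    # Store it under a hypothetical node prefix in the H5 file.
    sparseToHDF(trips, "scenario.h5", "/sparse/trips")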
emma.io.loadSparse_HDF(hdf_file, hdf_node_name)
Load a sparse matrix from HDF5.
- Parameters
hdf_file (str) – The path to the HDF5 file where the matrix data are stored.
hdf_node_name (str) – The node prefix in the HDF5 hierarchy within the hdf_file.
- Returns
sparse_mat – A loaded sparse matrix. The matrix is always in csr format but can be easily converted to other sparse formats on the fly.
- Return type
scipy.sparse.csr.csr_matrix
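Continuing the hypothetical example above, the stored matrix can be read back and converted to another sparse format on the fly:

    from emma.io import loadSparse_HDF

    # Load the matrix stored under the same hypothetical node prefix.
    trips = loadSparse_HDF("scenario.h5", "/sparse/trips")
    print(trips.shape, trips.nnz)

    # The result is in csr format; convert if another format is needed.
    trips_coo = trips.tocoo()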