Step 1 - Clean Skims
This script reads source csv files that represent OD (origin-destination) skims in long form. Rows are filtered and columns renamed on the fly to prepare the tables for import into the emma Skim object.
Workflow:
Specify the network configuration
Specify input/output files for each travel mode
Specify column renaming specifications, as needed
Specify criteria for row exclusions, as needed
Clean skims
Functions
The following functions are referenced in this script, from the wsa.clean_skims (or cs) submodule:
wsa.cs.previewSkim(in_file, nrows=5, logger=None, **kwargs)
Return the top rows of a csv file to preview its contents.
- Parameters
in_file (String) – Path to csv file
nrows (Integer) – The number of rows at the top of the table to load in the preview.
logger (Logger) – If desired, pass a logger object to record the skim preview. All logging is done at the INFO level.
kwargs – Keyword arguments that can be passed to pandas.read_csv
- Returns
preview
- Return type
pd.DataFrame
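The preview behaviour described above can be approximated with pandas alone. The following is a minimal sketch, not the wsa implementation: the helper name preview_csv, the sample data, and the column names are hypothetical, and an in-memory buffer stands in for a csv file path.

```python
import io

import pandas as pd

# Hypothetical long-form OD skim; the column names are illustrative only.
raw_csv = io.StringIO(
    "from_zone,to_zone,minutes\n"
    "1,1,0.0\n"
    "1,2,12.5\n"
    "2,1,13.0\n"
    "2,2,0.0\n"
)


def preview_csv(in_file, nrows=5, **kwargs):
    """Load only the first `nrows` rows of a csv file (assumed behaviour)."""
    # Extra keyword arguments are forwarded to pandas.read_csv unchanged.
    return pd.read_csv(in_file, nrows=nrows, **kwargs)


preview = preview_csv(raw_csv, nrows=2)
print(preview)
```

Previewing a few rows first is a cheap way to confirm the column names before writing the renaming and row-exclusion specifications.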
wsa.cs.cleanSkims(in_file, out_file, criteria, rename={}, chunksize=50000, logger=None, **kwargs)
Ingest a raw skim and retain only those rows that meet the provided criteria (all criteria must be true).
- Parameters
in_file (String) – Path to the raw csv data
out_file (String) – Path to the new csv output
criteria ([(String, String, Var),..]) –
A list of tuples, each specifying one criterion for filtering the raw csv data. Each tuple has three parts: (reference column, comparator, value). Use the column names expected after renaming, if rename is provided. Comparators may be provided as strings corresponding to built-in class comparison methods:
__eq__() = equals [==]
__ne__() = not equal to [!=]
__lt__() = less than [<]
__le__() = less than or equal to [<=]
__gt__() = greater than [>]
__ge__() = greater than or equal to [>=]
rename ({String: String,..}, default={}) – Optionally rename columns in the raw data based on key: value pairs in a dictionary. The key is the existing column name, and the value is the new name for that column. Only columns for which renaming is desired need to be included in the dictionary.
chunksize (Int) – The number of rows to read in from in_file at one time. All rows are evaluated in chunks to manage memory consumption.
logger (Logger) – If desired, pass a logger object to record information about the skim cleaning process. All logging is done at the INFO level.
kwargs – Any keyword arguments passed to pandas.read_csv for loading in_file.
- Returns
- Return type
None. The output is written to out_file during this process.
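To illustrate how the criteria tuples, the rename dictionary, and chunked reading fit together, here is a minimal sketch written with pandas only. It is not the wsa source code: the function name filter_in_chunks, the sample data, and the column names are hypothetical, and it assumes each comparator string is looked up as a method on the column and that all criteria are combined with a logical AND, as the description above states.

```python
import io
import os
import tempfile

import pandas as pd


def filter_in_chunks(in_file, out_file, criteria, rename=None,
                     chunksize=50000, **kwargs):
    """Hypothetical stand-in for a chunked skim-cleaning routine."""
    rename = rename or {}
    first = True
    for chunk in pd.read_csv(in_file, chunksize=chunksize, **kwargs):
        # Rename first, so criteria refer to the new column names.
        chunk = chunk.rename(columns=rename)
        keep = pd.Series(True, index=chunk.index)
        for column, comparator, value in criteria:
            # e.g. getattr(series, "__le__")(60.0) is series <= 60.0
            keep = keep & getattr(chunk[column], comparator)(value)
        # Overwrite on the first chunk, append afterwards; header only once.
        chunk[keep].to_csv(out_file, mode="w" if first else "a",
                           header=first, index=False)
        first = False


# Hypothetical raw skim held in memory instead of a file on disk.
raw_csv = io.StringIO(
    "from,to,time\n"
    "1,1,0.0\n"
    "1,2,12.5\n"
    "1,3,95.0\n"
    "2,1,13.0\n"
    "2,3,40.0\n"
    "3,1,96.0\n"
)
out_path = os.path.join(tempfile.mkdtemp(), "clean_skim.csv")

filter_in_chunks(
    raw_csv,
    out_path,
    criteria=[("minutes", "__le__", 60.0), ("from_zone", "__ne__", 3)],
    rename={"from": "from_zone", "to": "to_zone", "time": "minutes"},
    chunksize=2,
)
cleaned = pd.read_csv(out_path)
print(cleaned)
```

Because each chunk is filtered and written before the next is read, peak memory use depends on chunksize rather than on the size of the raw skim.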