Step 1 - Clean Skims

This script reads origin-destination (OD) skims stored in long form in source csv files. Rows are filtered and columns are renamed on the fly so that the resulting tables are ready to be imported into the emma Skim object.

Workflow:

  • Specify the network configuration

  • Specify input/output files for each travel mode

  • Specify column renaming specifications, as needed

  • Specify criteria for row exclusions, as needed

  • Clean skims

Functions

This script uses the following functions from the wsa.clean_skims (or cs) submodule:

wsa.cs.previewSkim(in_file, nrows=5, logger=None, **kwargs)

Return the top rows of a csv file to preview its contents.

Parameters
  • in_file (String) – Path to csv file

  • nrows (Integer) – The number of rows at the top of the table to load in the preview.

  • logger (Logger) – If desired, pass a logger object to record the skim preview. All logging is done at the INFO level.

  • kwargs – Keyword arguments that can be passed to pandas.read_csv

Returns

preview

Return type

pd.DataFrame
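The preview behavior described above can be approximated with pandas alone. The following is a minimal sketch, not the wsa implementation: the helper name preview_csv and the sample column names are hypothetical, and it assumes the preview is simply a pandas.read_csv call limited by nrows.

```python
import io
import pandas as pd


def preview_csv(in_file, nrows=5, logger=None, **kwargs):
    """Hypothetical stand-in for previewSkim: load only the top rows of a csv."""
    # nrows limits how many data rows pandas parses, so large files stay cheap to inspect
    preview = pd.read_csv(in_file, nrows=nrows, **kwargs)
    if logger is not None:
        # The documentation states that logging happens at the INFO level
        logger.info("Preview of %s:\n%s", in_file, preview)
    return preview


# In-memory sample data; the column names are invented for illustration
sample = io.StringIO(
    "from_zone,to_zone,time\n"
    "1,1,0.0\n"
    "1,2,4.5\n"
    "2,1,4.7\n"
    "2,2,0.0\n"
)
preview = preview_csv(sample, nrows=2)
print(preview)
```

Previewing a raw skim this way is a quick check of the existing column names before writing the rename and criteria specifications.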

wsa.cs.cleanSkims(in_file, out_file, criteria, rename={}, chunksize=50000, logger=None, **kwargs)

Ingest a raw skim and retain only those rows that meet the provided criteria (all criteria must be true).

Parameters
  • in_file (String) – Path to the raw csv data

  • out_file (String) – Path to the new csv output

  • criteria ([(String, String, Var),..]) –

    A list of tuples. Each tuple contains specifications for filtering the raw csv data by a particular criterion. The tuple consists of three parts: (reference column, comparator, value). Use the column names expected after renaming, if rename is provided. Comparators may be provided as strings corresponding to built-in class comparison methods:

    • __eq__() = equals [==]

    • __ne__() = not equal to [!=]

    • __lt__() = less than [<]

    • __le__() = less than or equal to [<=]

    • __gt__() = greater than [>]

    • __ge__() = greater than or equal to [>=]

  • rename ({String: String,..}, default={}) – Optionally rename columns in the raw data based on key: value pairs in a dictionary. The key is the existing column name, and the value is the new name for that column. Only columns for which renaming is desired need to be included in the dictionary.

  • chunksize (Int) – The number of rows to read in from in_file at one time. All rows are evaluated in chunks to manage memory consumption.

  • logger (Logger) – If desired, pass a logger object to record information about the skim cleaning process. All logging is done at the INFO level.

  • kwargs – Any keyword arguments passed to pandas.read_csv for loading in_file.
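To illustrate how a criteria list of (reference column, comparator, value) tuples could be evaluated, here is a minimal sketch. It assumes the comparator is passed as a method name without parentheses (for example "__gt__") and is looked up on the pandas column with getattr; the helper apply_criteria and the sample columns are hypothetical, and the actual cleanSkims internals may differ.

```python
import pandas as pd


def apply_criteria(df, criteria):
    """Hypothetical helper: keep only rows for which every criterion is true."""
    # Start with all rows selected, then narrow the mask one criterion at a time
    mask = pd.Series(True, index=df.index)
    for column, comparator, value in criteria:
        # e.g. getattr(df["time"], "__gt__")(0) is equivalent to df["time"] > 0
        mask &= getattr(df[column], comparator)(value)
    return df[mask]


# Invented sample skim in long form
df = pd.DataFrame(
    {
        "from_zone": [1, 1, 2, 2],
        "to_zone": [1, 2, 1, 2],
        "time": [0.0, 4.5, 4.7, 0.0],
    }
)

# Keep rows with a positive travel time below 4.6
criteria = [("time", "__gt__", 0), ("time", "__lt__", 4.6)]
filtered = apply_criteria(df, criteria)
print(filtered)
```

Because the criteria are combined with a logical AND, a row is dropped as soon as it fails any single criterion.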

Returns

Return type

None. The cleaned table is written to out_file during this process.
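The chunked rename-and-filter process described for cleanSkims can be sketched with pandas as follows. This is an illustration under stated assumptions, not the wsa source: the function name clean_csv_in_chunks, the file names, and the sample columns are hypothetical, and it assumes each chunk is renamed first, filtered second, and then appended to the output file.

```python
import os
import tempfile
import pandas as pd


def clean_csv_in_chunks(in_file, out_file, criteria, rename=None,
                        chunksize=50000, **kwargs):
    """Hypothetical sketch of a chunked rename-and-filter pass over a csv."""
    rename = rename or {}
    first = True
    # chunksize makes read_csv yield DataFrames of at most that many rows
    for chunk in pd.read_csv(in_file, chunksize=chunksize, **kwargs):
        # Rename first so the criteria can refer to the new column names
        chunk = chunk.rename(columns=rename)
        mask = pd.Series(True, index=chunk.index)
        for column, comparator, value in criteria:
            mask &= getattr(chunk[column], comparator)(value)
        # Overwrite on the first chunk, append afterwards; write the header once
        chunk[mask].to_csv(out_file, mode="w" if first else "a",
                           header=first, index=False)
        first = False


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw_skim.csv")      # invented file names
    clean_path = os.path.join(tmp, "clean_skim.csv")
    with open(raw_path, "w") as f:
        f.write("O,D,minutes\n1,1,0.0\n1,2,4.5\n2,1,4.7\n2,2,0.0\n3,1,95.0\n")

    clean_csv_in_chunks(
        raw_path,
        clean_path,
        criteria=[("time", "__gt__", 0), ("time", "__le__", 60)],
        rename={"minutes": "time"},
        chunksize=2,  # tiny chunks only to exercise the loop
    )
    cleaned = pd.read_csv(clean_path)

print(cleaned)
```

Processing one chunk at a time keeps memory use bounded by chunksize rather than by the size of the raw skim, which is the stated purpose of that parameter.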