An Open-Source C++ Library
for Handling Amtrak On-Time Performance Data

Bill Seymour
2023-05-15


Introduction

This paper describes the public API of an open-source C++ library that’s intended to provide some fairly simple analysis of Amtrak on-time performance (OTP) data.  This paper also provides user documentation for two open-source programs that analyze such data and produce HTML reports.

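It turns out that queries to juckins.net will give us the raw data that we need, so the library understands the Web pages returned by those queries; but it could be used to read other kinds of files as well.  The library, as supplied, knows about:

  - OTP history pages returned by https://juckins.net/amtrak_status/archive/html/history.php, and
  - connection pages returned by https://juckins.net/amtrak_status/archive/html/connections.php.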

The code can be built with any C++ implementation that conforms to C++11 or later.  The source files are available here.

The code is distributed under the Boost Software License.  (This is not part of Boost.  The author just likes their open-source license.)


Library Synopsis

namespace atkotp {

enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

class one_day final
{
public:
    one_day(const std::string&, weekday, int);
    one_day(std::string&&, weekday, int);

    // The canonical special member functions

    void swap(one_day&);

    const std::string& date() const noexcept;
    weekday day() const noexcept;
    int mins() const noexcept;
};
using std::swap;
void swap(one_day&, one_day&);

typedef std::vector<one_day> all_days;

class parser
{
public:
    virtual ~parser() noexcept { }
    enum state { Ok, Partial, Continue, Done, Error };
    state operator()(all_days&, const std::string&);
};

class history_parser final : public parser // ...
extern history_parser parse_history;

class connection_parser final : public parser // ...
extern connection_parser parse_connection;

bool read(std::istream&, all_days&, parser&);

std::string make_date(const std::string&, std::size_t);
weekday make_wday(const std::string&, std::size_t) noexcept;
int make_mins(const std::string&, std::size_t, bool = false) noexcept;

std::string date_to_string(const std::string&);
const char* wday_to_string(weekday) noexcept;
std::string mins_to_string(int);

typedef std::vector<int> times;
int min(const times&) noexcept;
int max(const times&) noexcept;
int median(const times&) noexcept;
double mean(const times&) noexcept;
double stddev(const times&, double) noexcept;

void make_stats(std::ostream&, const all_days&, const char* = nullptr);

} // namespace atkotp


Library Description

This simple library consists of just two source files: atkhist.hpp, the header that users #include; and atkhist.cpp, a translation unit that contains the definitions of the non-inline functions described herein, along with some undocumented helpers.


The Days of the Week

enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };
This scoped enumeration provides symbols for, and gives integer values to, the days of the week: 1 for Monday through 7 for Sunday, a numbering that’s widely used in the travel industry.  It also uses 0 to indicate an error and 8 to mean “all days”.
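C’s struct tm numbers the days 0 for Sunday through 6 for Saturday, so converting to this enumeration needs just one special case (the appendix relies on exactly that).  Here’s a small self-contained sketch; the enumeration is copied from the synopsis, and from_tm_wday is a hypothetical helper, not part of the library.

```cpp
#include <cassert>

// Copied from the library synopsis so that this sketch stands alone.
enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

// Hypothetical helper (not in the library): convert C's tm_wday, which counts
// 0 for Sunday through 6 for Saturday, to this enumeration.  Only Sunday needs
// special handling, because Monday (1) through Saturday (6) already agree.
weekday from_tm_wday(int tm_wday)
{
    // A debug build checks the range, in the spirit of the library's asserts.
    assert(tm_wday >= 0 && tm_wday <= 6);
    return tm_wday == 0 ? weekday::Su : static_cast<weekday>(tm_wday);
}
```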


One Day’s Data

class one_day final
{
public:
    one_day(const std::string&, weekday, int);
    one_day(std::string&&, weekday, int);

    // default special member functions

    void swap(one_day&);

    const std::string& date() const noexcept;
    weekday day() const noexcept;
    int mins() const noexcept;
};
using std::swap;
void swap(one_day&, one_day&);


Public Constructors

one_day(const std::string& date, weekday, int time_in_minutes);
one_day(std::string&& date, weekday, int time_in_minutes);
one_day objects are normally constructed from a date in YYYYMMDD format, a day of the week, and an amount of time.  Note that the date can be either copied or moved.

A debug build (the NDEBUG macro not defined) will assert if the second argument is not one of weekday::Mo through weekday::Su.
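The copy-or-move pair of constructors and the debug-build check might look something like the following.  This is a stand-in with the hypothetical name one_day_sketch, written only to illustrate the documented interface; the library’s actual data members and definitions may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

// A stand-in mirroring the documented constructors and observers of
// atkotp::one_day; it is not the library's definition.
class one_day_sketch final
{
    std::string dt;  // YYYYMMDD
    weekday dy;
    int mn;          // a signed amount of time in minutes

public:
    // Copies the date string.
    one_day_sketch(const std::string& date, weekday d, int time_in_minutes)
      : dt(date), dy(d), mn(time_in_minutes)
    {
        assert(d >= weekday::Mo && d <= weekday::Su);  // checked only if NDEBUG isn't defined
    }

    // Moves the date string, so a temporary isn't copied.
    one_day_sketch(std::string&& date, weekday d, int time_in_minutes)
      : dt(std::move(date)), dy(d), mn(time_in_minutes)
    {
        assert(d >= weekday::Mo && d <= weekday::Su);
    }

    const std::string& date() const noexcept { return dt; }
    weekday day() const noexcept { return dy; }
    int mins() const noexcept { return mn; }
};
```

An lvalue string selects the copying constructor; a temporary such as std::string("20230501") selects the moving one.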


Special Member Functions and swap

one_day(const one_day&) = default;
one_day(one_day&&) = default;

one_day& operator=(const one_day&) = default;
one_day& operator=(one_day&&) = default;

~one_day() = default;

void swap(one_day&);

// Non-member:
using std::swap;
void swap(one_day&, one_day&);
one_day objects are freely copyable, moveable, and swappable.  The destructor will be non-trivial.

Note that there’s no publicly visible default constructor.
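The member swap, the non-member swap, and the using-declaration support the usual two-step idiom for generic code: bring std::swap into scope, then call swap unqualified, so that argument-dependent lookup finds a type’s own swap when there is one.  Here’s a self-contained sketch; day_record and exchange_values are hypothetical names standing in for one_day and for user code.

```cpp
#include <string>
#include <utility>

namespace demo {

// Hypothetical stand-in with the same swap arrangement as one_day:
// a member swap plus a non-member swap in the type's own namespace.
struct day_record
{
    std::string date;
    int mins;

    void swap(day_record& other)
    {
        using std::swap;
        swap(date, other.date);
        swap(mins, other.mins);
    }
};

inline void swap(day_record& a, day_record& b) { a.swap(b); }

} // namespace demo

// The two-step idiom: std::swap is the fallback, and ADL picks
// demo::swap for demo::day_record because it's a better match.
template <typename T>
void exchange_values(T& a, T& b)
{
    using std::swap;
    swap(a, b);
}
```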


Observers

const std::string& date() const noexcept;
weekday day() const noexcept;
int mins() const noexcept;
These just return the values passed to the constructor.


An “All Days” Container

typedef std::vector<one_day> all_days;
The read() function described below will read the input file and load a vector<one_day>.


Parsing Input Files

The library provides an abstract base class for a function object that parses the input files.
class parser
{
public:
    virtual ~parser() noexcept { }

    enum state { Ok, Partial, Continue, Done, Error };
    state operator()(all_days& d, const std::string& s)
    {
        return do_parse(d, s);
    }
protected:
    virtual state do_parse(all_days&, const std::string&) = 0;
};
And it provides two such derived classes for the types of input that the library already knows about.
class history_parser final : public parser
{
    state do_parse(all_days&, const std::string&);
public:
    ~history_parser() noexcept { }
};
extern history_parser parse_history;

class connection_parser final : public parser
{
    state do_parse(all_days&, const std::string&);
public:
    ~connection_parser() noexcept { }
};
extern connection_parser parse_connection;
These are passed as the third argument to the read function described below.

Users who wish to use the library to read other kinds of files can create such derived classes of their own.  The only thing that must be provided is a do_parse function to override the pure virtual parser::do_parse(all_days&,const std::string&).

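The do_parse function parses each line of text passed as its second argument and returns one of:

  - parser::Ok if the line completed the data for a one_day, which do_parse() has appended to the all_days passed as its first argument;
  - parser::Partial if the line supplied some, but not all, of the data for a one_day;
  - parser::Continue if the line contained no one_day data;
  - parser::Done if no more input needs to be read; or
  - parser::Error if the line couldn’t be parsed.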

It’s OK if do_parse() never returns Done; the read function will just read to the end of the file and then behave as if Done had been returned.

If, when the read function exits its loop (either because Done was returned or because the end of the file was reached), a Partial has been returned without a following Ok, that will be treated as a parse error.

Note that a return of parser::Partial makes sense only if not all information for a complete one_day can be found on a single input line.  In this case, users’ parser classes will probably need to keep some internal state.

The do_parse function may also extract data other than what’s needed to construct a one_day.  If such additional data is on an input line that doesn’t otherwise contain any one_day data, do_parse() should return Continue.
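Here’s a small self-contained sketch of a user-written parser for a hypothetical comma-separated format, one line per day, like 20230430,Su,15.  The one_day struct is a simplified stand-in (the library’s one_day has observer functions rather than public data members), the parser base class is copied from the declaration above, and csv_parser and its input format are made up for the example.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-ins for the library's types so that this sketch compiles alone.
enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

struct one_day
{
    std::string date;
    weekday day;
    int mins;
    one_day(std::string d, weekday w, int m) : date(std::move(d)), day(w), mins(m) { }
};
typedef std::vector<one_day> all_days;

// The parser base class as documented above.
class parser
{
public:
    virtual ~parser() noexcept { }
    enum state { Ok, Partial, Continue, Done, Error };
    state operator()(all_days& d, const std::string& s) { return do_parse(d, s); }
protected:
    virtual state do_parse(all_days&, const std::string&) = 0;
};

// Hypothetical parser for lines like "20230430,Su,15" (date, weekday, minutes).
// Blank lines are skipped, and a line reading "END" stops the input.
class csv_parser final : public parser
{
    state do_parse(all_days& dest, const std::string& line) override
    {
        static const char* const names[] = { "Mo", "Tu", "We", "Th", "Fr", "Sa", "Su" };
        if (line.empty())
            return Continue;
        if (line == "END")
            return Done;
        if (line.size() < 13 || line[8] != ',' || line[11] != ',')
            return Error;
        const std::string wd = line.substr(9, 2);
        for (int i = 0; i < 7; ++i)
        {
            if (wd == names[i])
            {
                dest.push_back(one_day(line.substr(0, 8), static_cast<weekday>(i + 1),
                                       std::atoi(line.c_str() + 12)));
                return Ok;
            }
        }
        return Error;  // unknown weekday abbreviation
    }
};
```

Since every day’s data is on a single line, this parser never needs Partial and keeps no state between calls; the appendix shows a parser that does.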


Non-member Functions


Reading the Input File

bool read(std::istream& istr, all_days&, parser&);
This function reads istr one line at a time and calls the parser’s overloaded function call operator with each line, which loads the vector<one_day> passed by non-const reference as the second argument.  It returns whether it actually loaded anything.

Any I/O or parse error will be a fatal error:  an error message will be written to the standard error device, and the function will call std::exit(EXIT_FAILURE).

Note that absence of input data is not a fatal error:  the function just returns false if the parser couldn’t find anything to load into the vector<one_day>.
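To make the rules for Done, Partial, and Ok concrete, here’s a sketch of a loop with the documented behavior.  It’s not the library’s code: read_sketch and parse_state are hypothetical names, and it throws std::runtime_error where the library’s read() writes a message to the standard error device and calls std::exit(EXIT_FAILURE), which makes it easier to try out.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>    // std::istringstream, handy for feeding the sketch in-memory text
#include <stdexcept>
#include <string>

// Same enumerators as atkotp::parser::state.
enum class parse_state { Ok, Partial, Continue, Done, Error };

// Days can be any container with size(); parse is any callable taking
// (Days&, const std::string&) and returning a parse_state.
template <typename Days, typename Parse>
bool read_sketch(std::istream& is, Days& days, Parse parse)
{
    const std::size_t before = days.size();
    bool pending = false;  // a Partial not yet followed by an Ok
    std::string line;
    while (std::getline(is, line))
    {
        const parse_state st = parse(days, line);
        if (st == parse_state::Error)
            throw std::runtime_error("parse error");
        if (st == parse_state::Partial)
            pending = true;
        else if (st == parse_state::Ok)
            pending = false;
        else if (st == parse_state::Done)
            break;  // the parser doesn't need any more input
    }
    if (is.bad())
        throw std::runtime_error("I/O error");
    if (pending)
        throw std::runtime_error("incomplete data at end of input");
    return days.size() > before;  // did we load anything?
}
```

Reaching the end of the input is handled exactly like Done, and an input with nothing to load just returns false.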


Helpers for Parsing the Input

std::string make_date(const std::string&, std::size_t);
This function takes a U.S.-style middle-endian date in MM/DD/YYYY format and turns it into YYYYMMDD.  The second argument is the position in the input string where the date will be found.
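A sketch of the conversion, assuming (as the library function does) that the date is well formed, and further assuming two-digit months and days; make_date_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch (not the library's code): "MM/DD/YYYY" starting at input[pos]
// becomes "YYYYMMDD".
std::string make_date_sketch(const std::string& input, std::size_t pos)
{
    assert(pos + 10 <= input.size());  // room for "MM/DD/YYYY"
    return input.substr(pos + 6, 4)    // YYYY
         + input.substr(pos, 2)        // MM
         + input.substr(pos + 3, 2);   // DD
}
```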
weekday make_wday(const std::string& input_line, std::size_t pos) noexcept;
This function takes a string like “Mo”, “Tu”, etc., and turns it into a weekday enumerator.  The second argument is the position in the input string where the weekday will be found.  It returns weekday::Error if the two characters beginning at input_line[pos] don’t make sense.
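A sketch of the lookup; make_wday_sketch is a hypothetical name, and the enumeration is copied from the synopsis.

```cpp
#include <cstddef>
#include <string>

enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

// Sketch (not the library's code): map the two characters at input[pos]
// to a weekday, or weekday::Error if they aren't a known abbreviation.
weekday make_wday_sketch(const std::string& input, std::size_t pos) noexcept
{
    static const char names[] = "MoTuWeThFrSaSu";  // two characters per day, Monday first
    if (pos >= input.size() || input.size() - pos < 2)
        return weekday::Error;
    for (int i = 0; i < 7; ++i)
    {
        if (input[pos] == names[2 * i] && input[pos + 1] == names[2 * i + 1])
            return static_cast<weekday>(i + 1);  // Mo is 1, ..., Su is 7
    }
    return weekday::Error;
}
```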
int make_mins(const std::string&, std::size_t, bool next_day = false) noexcept;
This function takes a time of day like “1:45PM” and turns it into a number of minutes after midnight.  A space between the minutes and the AM or PM is optional.

The second argument is the position in the input string where the time will be found.

If the optional third argument, next_day, is true, 24 hours will be added.  This is intended to help with finding the difference between two times of day, one of which slops over midnight.
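A sketch of the conversion, assuming well-formed input as the library function does; make_mins_sketch is a hypothetical name, and the library’s handling of details like lowercase am/pm may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Sketch (not the library's code): parse a time like "1:45PM" or "11:59 PM"
// starting at input[pos] and return minutes after midnight, plus 24 hours if
// next_day is true.
int make_mins_sketch(const std::string& input, std::size_t pos, bool next_day = false) noexcept
{
    assert(pos < input.size());
    int hours = 0;
    while (std::isdigit(static_cast<unsigned char>(input[pos])))
        hours = hours * 10 + (input[pos++] - '0');
    ++pos;  // skip the ':'
    const int minutes = (input[pos] - '0') * 10 + (input[pos + 1] - '0');
    pos += 2;
    if (input[pos] == ' ')
        ++pos;  // the space before AM or PM is optional
    const bool pm = input[pos] == 'P' || input[pos] == 'p';
    hours %= 12;  // 12:xx AM is just after midnight, 12:xx PM just after noon
    if (pm)
        hours += 12;
    return hours * 60 + minutes + (next_day ? 24 * 60 : 0);
}
```

For the two JSON records in the appendix, 12:00PM minus 11:45AM is 720 - 705 = 15 minutes late, and 12:01AM the next day minus 11:59PM is 1441 - 1439 = 2 minutes late.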


Converting to Human-Readable Text

std::string date_to_string(const std::string&);
This function just adds hyphens to a YYYYMMDD date to make YYYY-MM-DD.  We might do something fancier Real Soon Now.
const char* wday_to_string(weekday) noexcept;
This function just returns the full English weekday name that corresponds to a weekday enumerator, or it returns "All Days" for weekday::All.  A debug build (the NDEBUG macro not defined) will assert if weekday::Error is passed.
std::string mins_to_string(int);
This function takes a signed number of minutes and turns it into a string like “23:59” or “-0:30”.  The hour is not zero-padded, so it’s 0 only when the absolute value is less than one hour.
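Sketches of date_to_string and mins_to_string follow; the _sketch names are hypothetical, and the library’s definitions may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Sketch: "YYYYMMDD" becomes "YYYY-MM-DD".
std::string date_to_string_sketch(const std::string& yyyymmdd)
{
    assert(yyyymmdd.size() == 8);
    return yyyymmdd.substr(0, 4) + '-' + yyyymmdd.substr(4, 2) + '-' + yyyymmdd.substr(6, 2);
}

// Sketch: a signed number of minutes becomes "[-]h:mm"; the hour isn't
// zero-padded, and the minutes always have two digits.
std::string mins_to_string_sketch(int mins)
{
    const char* sign = mins < 0 ? "-" : "";
    const int m = std::abs(mins);  // assumes mins isn't INT_MIN
    char buf[32];
    std::snprintf(buf, sizeof buf, "%s%d:%02d", sign, m / 60, m % 60);
    return buf;
}
```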


Some Trivial Statistics

typedef std::vector<int> times;
int min(const times&) noexcept;
int max(const times&) noexcept;
int median(const times&) noexcept;
double mean(const times&) noexcept;
double stddev(const times&, double) noexcept;
Given a vector of ints that’s already sorted ascending, we can compute the minimum, maximum, median, mean, and standard deviation.  In a debug build (the NDEBUG macro not defined), all of these will assert if the argument is an empty vector.
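Sketches of the statistics functions follow, with _sketch names to keep them apart from std::min and std::max.  Two details aren’t specified above, so they’re assumptions here: median_sketch averages the two middle values of an even-sized vector (truncating toward zero), and stddev_sketch takes the mean as its second argument and computes the population standard deviation rather than the sample standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<int> times;

// All of these expect a non-empty vector sorted in ascending order.
int min_sketch(const times& v) noexcept { assert(!v.empty()); return v.front(); }
int max_sketch(const times& v) noexcept { assert(!v.empty()); return v.back(); }

int median_sketch(const times& v) noexcept
{
    assert(!v.empty());
    const std::size_t n = v.size();
    return n % 2 != 0 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2;
}

double mean_sketch(const times& v) noexcept
{
    assert(!v.empty());
    double sum = 0.0;
    for (int x : v)
        sum += x;
    return sum / v.size();
}

// Population standard deviation, given the precomputed mean.
double stddev_sketch(const times& v, double mean) noexcept
{
    assert(!v.empty());
    double sq = 0.0;
    for (int x : v)
        sq += (x - mean) * (x - mean);
    return std::sqrt(sq / v.size());
}
```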
void make_stats(std::ostream&, const all_days&, const char* = nullptr);
This is the function that creates HTML tables of statistics like the tables in the sample outputs below.

The first argument is the output stream to which the HTML gets written.  The second argument is the vector<one_day> that the read() function loaded.  The optional third argument is the text for the <th colspan=5></th> element in the table headings (in the examples below, either “Late Time (negative means early)” or “Available Transfer Time”).  The default is the rather uninformative “Statistics”.
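Before writing a table, a function like make_stats presumably has to gather each weekday’s times into a sorted vector, plus one more for all days, since the statistics functions expect sorted input.  Here’s one way to do that; times_by_weekday is a hypothetical helper, not a library function, and it works with any container of objects that have day() and mins() observers.

```cpp
#include <algorithm>
#include <map>
#include <vector>

enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };
typedef std::vector<int> times;

// Hypothetical helper: collect mins() for each weekday that appears, plus all
// days under weekday::All, each sorted ascending.  The map's ordering puts
// Monday first and All last, matching the row order in the sample tables.
template <typename Days>
std::map<weekday, times> times_by_weekday(const Days& days)
{
    std::map<weekday, times> result;
    for (const auto& d : days)
    {
        result[d.day()].push_back(d.mins());
        result[weekday::All].push_back(d.mins());
    }
    for (auto& entry : result)
        std::sort(entry.second.begin(), entry.second.end());
    return result;
}
```

Days with no trains simply never appear in the map, which matches the less-than-daily examples below.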


Two Programs that Use the Library

Supplied with the library are sources for two complete programs, atk-history.cpp for analyzing historical OTP data, and atk-connection.cpp for estimating the likelihood of making connections between trains.  Both programs require that the user first run the appropriate query at juckins.net, do a “Save As” of the returned Web page, then run the desired program from a command line redirecting the saved Web page to the standard input.

Both write an HTML report to the standard output (which, presumably, the user will redirect to a file for later loading into a browser).  The output is just good old HTML…nothing fancy…and no JavaScript or <a> tags that could get you somewhere you don’t want to be.

The author’s goal wasn’t to produce pretty reports, but to provide a quick and dirty way to estimate the likelihood of a trip working more or less as planned.  If you’re looking for PDF timetables and archives of older Amtrak publications, Christopher Juckins has you covered.

If you can’t, or just don’t want to, build the code yourself, the two executables are available for Linux and for Windows.


OTP History

The atk-history program analyzes results from a query done at https://juckins.net/amtrak_status/archive/html/history.php and reports arrival times at particular stations.

Note that, at some intermediate stations, arrival times aren’t entered, so when doing the juckins.net query, you might have to ask for departure times instead of arrival times and hope that there wasn’t a large dwell time at that station on any of the days for which the query was run.  Fortunately, both arrival and departure times are typically given at any station where the scheduled dwell time is longer than what it takes to just get the passengers on and off.

An optional command line argument is an amount of time late (e.g., 2:30) that will trigger a warning message.  The default is two hours.  A single integer will be interpreted as a number of hours.
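The program’s argument handling isn’t shown here; the following sketch of one way to interpret the argument uses the hypothetical name late_threshold_mins and returns -1 for anything that doesn’t look like h:mm or a plain number of hours.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical: "h:mm" means hours and minutes; a bare integer means hours.
// Returns the threshold in minutes, or -1 if arg has neither form.
int late_threshold_mins(const std::string& arg)
{
    static const char digits[] = "0123456789";
    const std::size_t colon = arg.find(':');
    const std::string h = arg.substr(0, colon);  // the whole string if there's no ':'
    const std::string m = colon == std::string::npos ? "0" : arg.substr(colon + 1);
    if (h.empty() || m.empty() || m.size() > 2 ||
        h.find_first_not_of(digits) != std::string::npos ||
        m.find_first_not_of(digits) != std::string::npos)
        return -1;
    const int mins = std::atoi(m.c_str());
    if (mins > 59)
        return -1;
    return std::atoi(h.c_str()) * 60 + mins;
}
```

With no argument at all, the two-hour default corresponds to 120 minutes.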

For example, having queried juckins.net for Northeast Regional train 92’s arrival times at New York and saved the resulting Web page in a file called 92nyp.html, the command

atk-history 3:15 <92nyp.html
produces output like:


On-time Performance for Amtrak train 92
at New York - Penn Station, NY

For 2021-11-01 through 2022-01-28

There’s no data for 5 days (6%) of 89 total days.

For the 84 days when the arrival time is known:

Weekday     Data    Late Time (negative means early)
            Points  Min.    Max.    Median  Mean    Std. Dev.
Monday      11      -0:19   2:38    0:26    0:43    0:54
Tuesday     13      -1:02   2:09    0:22    0:20    0:49
Wednesday   13      -0:54   6:26    0:21    0:51    1:52
Thursday    12      -0:35   2:08    0:18    0:27    0:52
Friday      12      -0:17   4:54    0:09    0:48    1:30
Saturday    12      -0:58   4:05    0:10    0:43    1:26
Sunday      11      -0:34   1:59    0:08    0:19    0:43
All Days    84      -1:02   6:26    0:18    0:36    1:15

The train was more than 3:15 late on 3 days (3%).


[Note that we still call them “arrival times” even if your juckins.net query asked for departure times.  We might fix that Real Soon Now.]


Connection Likelihood

The atk-connection program analyzes results from a query done at https://juckins.net/amtrak_status/archive/html/connections.php.

An optional command line argument is an integer that specifies the number of minutes to allow for the transfer.  The default is ten minutes.

For example, having queried juckins.net for connections from the eastbound Texas Eagle to the eastbound Capitol Limited and saved the resulting Web page in a file called 22to30.html, the command

atk-connection <22to30.html
produces output like:


Likelihood of connecting from train 22 to train 30
at Chicago - Union Station, IL

For 2021-11-01 through 2022-01-26

There’s no data for one or both trains on 3 days (3%) of 87 total days.

For the 84 days for which we do have data:

Weekday     Data    Available Transfer Time
            Points  Min.    Max.    Median  Mean    Std. Dev.
Monday      13      3:34    7:49    5:22    5:22    0:53
Tuesday     13      2:02    5:30    5:02    4:44    0:57
Wednesday   13      2:24    10:10   5:03    5:09    1:41
Thursday    12      3:53    5:27    5:08    4:52    0:29
Friday      11      0:17    5:33    4:56    4:10    1:37
Saturday    11      4:12    7:23    5:14    5:19    0:44
Sunday      11      4:32    5:22    5:02    5:00    0:15
All Days    84      0:17    10:10   5:06    4:57    1:08

The transfer time was less than 10 minutes on 0 days (0%),
so we missed the connection on 3 days (3%).


[Note that missing data counts as missing a connection in the error message but isn’t used for computing the statistics.]


Two More Output Examples

Here are two examples of actual output showing that the code can handle less-than-daily trains, one for the westbound Cardinal at Chicago, and one for connecting from the Sunset Limited to the earliest Pacific Surfliner of the day.


On-time Performance for Amtrak train 51
at Chicago - Union Station, IL

For 2021-11-01 through 2022-01-29

There’s no data for 3 days (8%) of 40 total days.

For the 37 days when the arrival time is known:

Weekday     Data    Late Time (negative means early)
            Points  Min.    Max.    Median  Mean    Std. Dev.
Monday      12      -0:30   3:21    0:31    0:52    1:09
Thursday    13      -0:27   2:05    -0:06   0:08    0:39
Saturday    12      -0:26   3:16    0:12    0:36    1:11
All Days    37      -0:30   3:21    0:09    0:31    1:03

The train was more than 2:00 late on 5 days (13%).


Likelihood of connecting from train 1 to train 562
at Los Angeles - Union Station, CA

For 2021-11-01 through 2022-01-28

There’s no data for one or both trains on 3 days (8%) of 39 total days.

For the 36 days for which we do have data:

Weekday     Data    Available Transfer Time
            Points  Min.    Max.    Median  Mean    Std. Dev.
Monday      12      -0:41   1:43    0:44    0:46    0:42
Wednesday   12      -13:46  1:35    0:01    -1:03   3:59
Friday      12      -5:46   1:21    0:41    -0:06   1:57
All Days    36      -13:46  1:43    0:37    -0:08   2:42

The transfer time was less than 10 minutes on 12 days (31%),
so we missed the connection on 15 days (38%).
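The percentages in all four sample reports are consistent with dividing by the total number of days queried (including days with no data) and rounding to the nearest whole percent, with halves rounded up; that’s an inference from the samples, not documented behavior.  Also, as noted above, the number of missed connections is the number of short transfers plus the number of days with missing data (0 + 3 and 12 + 3).  A sketch with the hypothetical name percent_of, using integer arithmetic only:

```cpp
#include <cassert>

// Hypothetical: n out of total days as a whole-number percentage, rounded
// to nearest with halves rounded up.  (200n + total) / (2 total) is
// 100n / total + 1/2, truncated, which is correct for non-negative n.
int percent_of(int n, int total)
{
    assert(total > 0 && n >= 0);
    return (200 * n + total) / (2 * total);
}
```

For example, 3 of 40 days is 7.5%, reported as 8%, and 15 of 39 days is about 38.5%, reported as 38%.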


Appendix:  Example of a User-defined Parser

Here’s one way to override the parser::do_parse function to check arrival times, using as input JSON of the form:
{
  "date": "04/30/2023",
  "scheduled_time": "11:45AM",
  "actual_time": "12:00PM"
},
{
  "date": "05/01/2023",
  "scheduled_time": "11:59PM",
  "actual_time": "12:01AM+1"
}
and using <ctime> functions inherited from C to infer the weekday from the date.

Note that we can append “+1” to the actual arrival time to mean “next day”.

This uses the library-supplied make_date and make_mins functions, which assume that the input, once it’s found, has the correct format.  You might want to parse the data strings yourself if you don’t want to count on the JSON being correct.

This example doesn’t collect any additional information from the input stream, information that might otherwise have to be supplied as command line arguments to your main program.  See the code for the supplied history_parser::do_parse and connection_parser::do_parse functions (defined in atkhist.cpp) for examples of getting train numbers and station names from the input stream.


//
// my_parser.hpp
//
// A header for just the parser for including in the main program.
//

#ifndef MY_PARSER_HPP_INCLUDED
#define MY_PARSER_HPP_INCLUDED

#include "atkhist.hpp"

namespace my_parser {

class my_parser final : public atkotp::parser
{
    std::string dt;
    atkotp::weekday dy;
    int stm, atm; // scheduled and actual times in minutes

    enum bits { None = 0, Date = 1, STime = 2, ATime = 4, All = 7 } found;
    void reset() noexcept { found = None; }
    bool nothing() const noexcept { return found == None; }
    bool done() const noexcept { return found == All; }
    bool got(bits b) const noexcept { return (int(found) & int(b)) != 0; }
    void add(bits b) noexcept { found = bits(int(found) | int(b)); }

    bool get_date(const std::string&, std::size_t);
    int get_time(const std::string&, std::size_t);

    atkotp::parser::state do_parse(atkotp::all_days&, const std::string&) override;

public:
    my_parser() : found(None) { }

    //
    // If we don't want it to be copyable or moveable:
    //
    my_parser(const my_parser&) = delete;
    my_parser(my_parser&&) = delete;
    my_parser& operator=(const my_parser&) = delete;
    my_parser& operator=(my_parser&&) = delete;

    ~my_parser() = default;
};

} // namespace my_parser

#endif // MY_PARSER_HPP_INCLUDED


//
// my_parser.cpp
//
// A translation unit with definitions of my_parser's three private functions
// that weren't defined in-class.
//

#include "my_parser.hpp"

#include <climits> // INT_MIN
#include <ctime>   // time_t, tm, mktime
#include <string>  // string, stoi

namespace {
    using std::string;
    using std::size_t;
    using namespace atkotp;
}

namespace my_parser {

//
// The get_date function assigns values to the dt and dy data members
// and returns whether it was successful.
//
bool my_parser::get_date(const string& input, size_t pos)
{
    //
    // Find the '"' that starts the date:
    //
    pos = input.find_first_of('"', pos + 1);
    if (pos == string::npos)
    {
        return false; // error
    }
    dt = make_date(input, pos + 1);

    //
    // Infer the weekday:
    //
    std::tm t = std::tm(); // value-initialize so that no field is left indeterminate
    t.tm_mday = std::stoi(string(dt, 6, 2));
    t.tm_mon = std::stoi(string(dt, 4, 2)) - 1;
    t.tm_year = std::stoi(string(dt, 0, 4)) - 1900;
    t.tm_isdst = -1; // let mktime() decide whether DST is in effect
    if (std::mktime(&t) == std::time_t(-1))
    {
        return false; // another possible error
    }
    dy = t.tm_wday == 0 ? weekday::Su : weekday(t.tm_wday);
    return true;
}

//
// The get_time function doesn't know which data member to set,
// so it just returns the time as a number of minutes after midnight,
// or INT_MIN to indicate an error.  Note that "+1" in the input string
// means "next day".
//
int my_parser::get_time(const string& input, size_t pos)
{
    static const char max_time[] = "12:34 AM +1";
    pos = input.find_first_of('"', pos + 1);
    return pos == string::npos ? INT_MIN :
           make_mins(input, pos + 1, input.find("+1", pos) < pos + sizeof max_time);
}

//
// Override parser::do_parse():
//
parser::state my_parser::do_parse(all_days& dest, const string& input)
{
    //
    // Help with avoiding magic numbers:
    //
    static const char date_string[] = "date";
    static const char sked_string[] = "scheduled_time";
    static const char act_string[] = "actual_time";

    //
    // Where the data is on the input line:
    //
    size_t pos;

    //
    // Allow any number of input fields in any order on a single input line.
    //
    if ((pos = input.find(date_string)) != string::npos)
    {
        if (got(Date))
        {
            return parser::Error; // We already have a date.
        }
        if (!get_date(input, pos + sizeof date_string))
        {
            return parser::Error;
        }
        add(Date); // and fall through
    }
    if ((pos = input.find(sked_string)) != string::npos)
    {
        if (got(STime))
        {
            return parser::Error; // We already have a scheduled time.
        }
        stm = get_time(input, pos + sizeof sked_string);
        if (stm == INT_MIN)
        {
            return parser::Error;
        }
        add(STime);
    }
    if ((pos = input.find(act_string)) != string::npos)
    {
        if (got(ATime))
        {
            return parser::Error; // We already have an actual time.
        }
        atm = get_time(input, pos + sizeof act_string);
        if (atm == INT_MIN)
        {
            return parser::Error;
        }
        add(ATime);
    }

    //
    // If we have enough information to construct a one_day object:
    //
    if (done())
    {
        dest.push_back(one_day(dt, dy, atm - stm));
        reset();
        return parser::Ok; // note early return
    }

    //
    // If we don't know anything at all, we don't care about this input line.
    //
    if (nothing())
    {
        return parser::Continue;
    }

    //
    // If we get here, we have some, but not all, of the data we need
    // for a one_day.  If we're at the end of a JSON block, that's an error;
    // otherwise we hope that there's more interesting data in the block.
    //
    return input.find('}') != string::npos ? parser::Error : parser::Partial;
}

} // namespace my_parser

// End of my_parser.cpp


“Beware of bugs in the above code.  I have only proved it correct, not tried it.” — Donald E. Knuth


All suggestions and corrections will be welcome; all flames will be amusing.
Mail to was@pobox.com.