Queries to juckins.net give us the raw data that we need, so the library understands the Web pages returned by those queries; but it could be used to read other kinds of input as well. The library, as supplied, knows about:
The code can be built with any C++ implementation that conforms to C++11 or later. The source files are available here. You’ll also need a template library that computes some simple statistics; that library is available here.
The code is distributed under the Boost Software License. (This is not part of Boost. The author just likes their open-source license.)
    namespace atkotp {

    enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

    class one_day final
    {
    public:
        one_day(const std::string&, weekday, int);
        one_day(std::string&&, weekday, int);

        // Special member functions except one_day()

        void swap(one_day&);

        const std::string& date() const noexcept;
        weekday wday() const noexcept;
        int mins() const noexcept;
    };
    using std::swap;
    void swap(one_day&, one_day&);

    typedef std::vector<one_day> all_days;

    class parser
    {
    public:
        virtual ~parser() noexcept { }
        enum state { Ok, Partial, Continue, Done, Error };
        virtual state operator()(const std::string&, all_days&) = 0;
        virtual bool incomplete() const noexcept { return false; }
    };

    class history_parser final : public parser // ...
    extern history_parser parse_history;

    class connection_parser final : public parser // ...
    extern connection_parser parse_connection;

    bool read(std::istream&, all_days&, parser&);

    std::string make_date(const std::string&, std::size_t);
    weekday make_wday(const std::string&, std::size_t) noexcept;
    int make_mins(const std::string&, std::size_t, bool = false) noexcept;

    std::string date_to_string(const std::string&);
    const char* wday_to_string(weekday) noexcept;
    std::string mins_to_string(int);

    void write_stats(std::ostream&, const all_days&,
                     const char* = nullptr, int = INT_MIN);

    } // namespace atkotp
enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };
This scoped enumeration provides symbols for, and gives integer values to, the days of the week, 1 for Monday through 7 for Sunday, which numbering is widely used in the travel industry. It also uses 0 to indicate some error and 8 to mean “all days of the week”.
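As a small illustration of that numbering, the sketch below repeats the enumeration from the synopsis and adds two helpers, `wday_number` and `is_real_day`. Both helpers are hypothetical and not part of the library; they only show that a scoped enumeration needs an explicit cast to get at the integer value.

```cpp
// Copied from the library synopsis so that this sketch stands alone.
enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

// Hypothetical helper (not in the library): a scoped enumeration doesn't
// convert to int implicitly, so an explicit cast is required.
constexpr int wday_number(weekday d) noexcept
{
    return static_cast<int>(d);
}

// Hypothetical helper: true only for the seven real days of the week,
// false for weekday::Error (0) and weekday::All (8).
constexpr bool is_real_day(weekday d) noexcept
{
    return wday_number(d) >= 1 && wday_number(d) <= 7;
}

static_assert(wday_number(weekday::Mo) == 1, "Monday is 1");
static_assert(wday_number(weekday::Su) == 7, "Sunday is 7");
```

A check like `is_real_day` is roughly what the one_day constructor’s debug-build assertion, described below, has to establish.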
    class one_day final
    {
    public:
        one_day(const std::string&, weekday, int);
        one_day(std::string&&, weekday, int);

        // Default special member functions

        void swap(one_day&);

        const std::string& date() const noexcept;
        weekday wday() const noexcept;
        int mins() const noexcept;
    };
    using std::swap;
    void swap(one_day&, one_day&);
This class just keeps three pieces of data together and doesn’t provide any particular behavior.
    one_day(const std::string& date, weekday, int time);
    one_day(std::string&& date, weekday, int time);

one_day objects are normally constructed from a date in YYYYMMDD format, a day of the week, and a signed amount of time in minutes. Note that the date can be either copied or moved.

A debug build (the NDEBUG macro not defined) will assert if the second argument is not one of weekday::Mo through weekday::Su.
    one_day(const one_day&) = default;
    one_day(one_day&&) = default;
    one_day& operator=(const one_day&) = default;
    one_day& operator=(one_day&&) = default;
    ~one_day() = default;

    void swap(one_day&);

    // Non-member:
    using std::swap;
    void swap(one_day&, one_day&);

one_day objects are freely copyable, moveable, and swappable. The destructor will be non-trivial.

Note that there’s no publicly visible default constructor.
    const std::string& date() const noexcept;
    weekday wday() const noexcept;
    int mins() const noexcept;
These just return the values passed to the constructor.
typedef std::vector<one_day> all_days;
The read() function described below will read a std::istream and load a vector<one_day>.
And it provides two such derived classes for the types of input that the library already knows about.

    class parser
    {
    public:
        virtual ~parser() noexcept { }
        enum state { Ok, Partial, Continue, Done, Error };
        virtual state operator()(const std::string&, all_days&) = 0;
        virtual bool incomplete() const noexcept { return false; }
    };
These are passed as the third argument to the read function described below.

    class history_parser final : public parser
    {
    public:
        history_parser();
        ~history_parser() noexcept { }

        state operator()(const std::string&, all_days&);

        int train() const noexcept;
        const std::string& station() const noexcept;
        bool incomplete() const noexcept;
    };
    extern history_parser parse_history;

    class connection_parser final : public parser
    {
    public:
        connection_parser();
        ~connection_parser() noexcept { }

        state operator()(const std::string&, all_days&);

        int arriving_train() const noexcept;
        int departing_train() const noexcept;
        const std::string& station() const noexcept;
        bool incomplete() const noexcept;
    };
    extern connection_parser parse_connection;
Users who wish to parse other kinds of input can create such derived classes of their own. The only thing that must be provided is a function call operator to override the pure virtual parser::operator().
The function call operator parses each line of text passed as its first argument and returns one of the parser::state enumerators: Ok, Partial, Continue, Done, or Error.
A return of Done, or just reading to the end of the stream, will be treated as a parse error if Partial was returned without a following Ok when the read function exits its loop.
Note that returning Partial makes sense only if not all information for a complete one_day can be found on a single input line. In this case, users’ parsers will probably need to keep some internal state.
It’s OK to push back a one_day for a day that a train didn’t run, or for which the data is missing for any reason. Indeed, that might be desirable if your main program wants to take note of such days. Just create the one_day object passing some out-of-band value as the constructor’s third argument; that value will then be returned by one_day::mins().
The function call operator may also extract data other than what’s needed to construct
a one_day. The two parsers supplied with the library, for example, get train numbers and station names
from the HTML returned by the juckins.net queries they’re designed to handle. If such additional data
is on an input line that doesn’t otherwise contain any one_day data,
The read() function calls the parser’s virtual
bool read(std::istream& istr, all_days&, parser&);
This function reads istr, loads the vector<one_day> passed by non-const reference as the second argument by calling the parser’s function call operator with each line of text from the input stream, and returns whether it actually loaded anything.

Any I/O or parse error will be a fatal error: an error message will be written to std::cerr, and the function will call std::exit(EXIT_FAILURE). Note that absence of input data is not a fatal error: the function just returns false if the parser couldn’t find anything to load into the vector<one_day>.
std::string make_date(const std::string&, std::size_t);
This function takes a U.S.-style middle-endian date in MM/DD/YYYY format and turns it into YYYYMMDD. The second argument is the position in the input string where the date will be found.
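The rearrangement can be sketched as follows. `sketch_make_date` is a hypothetical stand-in written only from the description above, not the library’s own code, and it assumes that exactly ten characters in MM/DD/YYYY form start at `input[pos]`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for atkotp::make_date; the real function may
// differ in its details. Assumes "MM/DD/YYYY" begins at input[pos].
std::string sketch_make_date(const std::string& input, std::size_t pos)
{
    // Within the date, MM is at offset 0, DD at offset 3, YYYY at offset 6.
    return input.substr(pos + 6, 4)   // YYYY
         + input.substr(pos, 2)       // MM
         + input.substr(pos + 3, 2);  // DD
}
```

One benefit of the big-endian YYYYMMDD form is that comparing two such strings as text orders them chronologically.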
weekday make_wday(const std::string& input_line, std::size_t pos) noexcept;
This function takes a string like “Mo”, “Tu”, etc., and turns it into a weekday enumerator. The second argument is the position in the input string where the weekday will be found. It returns weekday::Error if the two characters beginning at input_line[pos] don’t make sense.
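A minimal sketch of such a lookup is shown here. `sketch_make_wday` is a hypothetical stand-in, not the library’s implementation, and it assumes that the match is case-sensitive, which the description above doesn’t actually say.

```cpp
#include <cstddef>
#include <string>

// Copied from the library synopsis so that this sketch stands alone.
enum class weekday { Error, Mo, Tu, We, Th, Fr, Sa, Su, All };

// Hypothetical stand-in for atkotp::make_wday. Assumption: the two-letter
// abbreviation must match exactly, including case.
weekday sketch_make_wday(const std::string& input, std::size_t pos) noexcept
{
    static const char names[] = "MoTuWeThFrSaSu";

    // We need two characters beginning at input[pos].
    if (pos > input.size() || input.size() - pos < 2) {
        return weekday::Error;
    }
    for (int i = 0; i < 7; ++i) {
        if (input[pos] == names[2 * i] && input[pos + 1] == names[2 * i + 1]) {
            return static_cast<weekday>(i + 1);  // Mo is 1 ... Su is 7
        }
    }
    return weekday::Error;
}
```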
int make_mins(const std::string&, std::size_t, bool next_day = false) noexcept;
This function takes a time of day like “1:45PM” and turns it into a number of minutes after midnight. A space between the minutes and the AM or PM is optional.

The second argument is the position in the input string where the time will be found.
If the optional third argument, next_day, is true, 24 hours will be added. This is intended to help with finding the difference between two times of day one of which slops over midnight.
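The arithmetic can be sketched like this. `sketch_make_mins` is a hypothetical stand-in for make_mins, and returning INT_MIN for malformed input is an assumption made for this sketch only; the library function itself expects correctly formatted input.

```cpp
#include <cctype>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical stand-in for atkotp::make_mins. Assumption: malformed
// input yields INT_MIN (the real function's behavior isn't documented).
int sketch_make_mins(const std::string& s, std::size_t pos,
                     bool next_day = false) noexcept
{
    auto digit = [&s](std::size_t i) {
        return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
    };

    // One or two digits for the hour on a 12-hour clock:
    int hour = 0, count = 0;
    while (count < 2 && digit(pos)) {
        hour = hour * 10 + (s[pos++] - '0');
        ++count;
    }
    if (count == 0 || hour < 1 || hour > 12) return INT_MIN;

    // A colon and exactly two digits for the minutes:
    if (pos >= s.size() || s[pos] != ':' || !digit(pos + 1) || !digit(pos + 2)) {
        return INT_MIN;
    }
    const int minute = (s[pos + 1] - '0') * 10 + (s[pos + 2] - '0');
    if (minute > 59) return INT_MIN;
    pos += 3;

    // An optional space, then AM or PM:
    if (pos < s.size() && s[pos] == ' ') ++pos;
    if (pos + 1 >= s.size() || s[pos + 1] != 'M') return INT_MIN;
    if (s[pos] != 'A' && s[pos] != 'P') return INT_MIN;

    hour %= 12;                     // 12:xx AM is hour 0
    if (s[pos] == 'P') hour += 12;  // 12:xx PM is hour 12

    return hour * 60 + minute + (next_day ? 24 * 60 : 0);
}
```

With these rules, a train scheduled for 11:59PM that actually arrives at 12:01AM the next day is `sketch_make_mins("12:01AM", 0, true) - sketch_make_mins("11:59PM", 0)` minutes late, which is 1441 − 1439 = 2.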
std::string date_to_string(const std::string&);
This function just adds hyphens to a YYYYMMDD date to make YYYY-MM-DD. We might do something fancier Real Soon Now.
const char* wday_to_string(weekday) noexcept;
This function just returns the full English weekday name that corresponds to a weekday enumerator, or it returns "All Days" for weekday::All. A debug build (the NDEBUG macro not defined) will assert if weekday::Error is passed; but a production build will return "Error" in that case.
std::string mins_to_string(int);
This function takes a signed number of minutes and turns it into a string like “23:59” or “-0:30”. The hour will not have a leading zero unless the absolute value is less than one hour.
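One way to get that formatting is sketched below. `sketch_mins_to_string` is a hypothetical stand-in written from the description above; the library’s own implementation may differ.

```cpp
#include <string>

// Hypothetical stand-in for atkotp::mins_to_string.
std::string sketch_mins_to_string(int mins)
{
    const bool negative = mins < 0;

    // Work in long long so that negating INT_MIN can't overflow.
    long long m = mins;
    if (negative) m = -m;

    std::string out = negative ? "-" : "";
    out += std::to_string(m / 60);  // hours: no padding, so "0" below one hour
    out += ':';
    if (m % 60 < 10) out += '0';    // minutes are always two digits
    out += std::to_string(m % 60);
    return out;
}
```

Formatting the sign separately from the magnitude is what lets a value like −30 come out as “-0:30”, since integer division alone would produce a zero hour with no sign.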
void write_stats(std::ostream&, const all_days&, const char* = nullptr, int = INT_MIN);
This is the function that creates HTML tables of statistics like the tables in the sample outputs below.

The first argument is the output stream to which the HTML gets written. The second argument is the vector<one_day> that the read() function loaded.
The optional third argument is the text for the <th colspan=5></th> element in the table headings (in the examples below, either “Late Time (negative means early)” or “Available Transfer Time”). If this argument defaults to nullptr, you’ll get the rather uninformative “Statistics”.

The optional fourth argument, which defaults to INT_MIN, is the maximum out-of-band value: any value at or below it, when returned by one_day::mins(), will mean “no data for today”. For example, if you pass −100 as this value, then anything less than −99 will mean “no data”. (The library-supplied parsers use INT_MIN as the only out-of-band value.)

This function depends on a template library that computes the desired statistics; that library is available here.
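The out-of-band rule for the fourth argument can be sketched as a simple filter. `sketch_usable_minutes` is a hypothetical helper that works on plain int values rather than one_day objects; it is not how write_stats is known to be implemented, only an illustration of the comparison.

```cpp
#include <climits>
#include <vector>

// Hypothetical helper: keep only the values that count as real data,
// i.e., those strictly greater than the maximum out-of-band value.
std::vector<int> sketch_usable_minutes(const std::vector<int>& mins,
                                       int max_oob = INT_MIN)
{
    std::vector<int> usable;
    for (int m : mins) {
        if (m > max_oob) {
            usable.push_back(m);
        }
    }
    return usable;
}
```

With a threshold of −100, the values −100 and −250 are dropped while −99 is kept; with the default threshold, only INT_MIN itself is dropped.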
Both write an HTML report to the standard output (which, presumably, the user will redirect to a file for later loading into a browser). The output is just good old HTML1…nothing fancy…and no JavaScript or <a> tags that could get you somewhere you don’t want to be.
The author’s goal wasn’t to produce pretty reports, but to provide a quick and dirty way to estimate the likelihood of a trip working more or less as planned. If you’re looking for PDF timetables and archives of older Amtrak publications, Christopher Juckins has you covered.
If you can’t, or just don’t want to, build the code yourself, the two executables are available for Linux and for Windows.
Note that, at some intermediate stations, arrival times aren’t entered, so when doing the juckins.net query, you might have to ask for departure times instead of arrival times and hope that there wasn’t a large dwell time at that station on any of the days for which the query was run. Fortunately, both arrival and departure times are typically given at any station where the scheduled dwell time is longer than what it takes to just get the passengers on and off.
An optional command line argument is an amount of time late (e.g., 2:30) that will trigger a warning message. The default is two hours. A single integer will be interpreted as a number of hours.
For example, having queried juckins.net for Northeast Regional train 92’s arrival times at New York and saved the resulting Web page in a file called 92nyp.html, the command
    atk-history 3:15 <92nyp.html

produces output like:
For the 84 days when the arrival time is known:
Late Time (negative means early)

Weekday | Data Points | Min. | Max. | Median | Mean | Std. Dev. |
---|---|---|---|---|---|---|
Monday | 11 | -0:19 | 2:38 | 0:26 | 0:43 | 0:54 |
Tuesday | 13 | -1:02 | 2:09 | 0:22 | 0:20 | 0:49 |
Wednesday | 13 | -0:54 | 6:26 | 0:21 | 0:51 | 1:52 |
Thursday | 12 | -0:35 | 2:08 | 0:18 | 0:27 | 0:52 |
Friday | 12 | -0:17 | 4:54 | 0:09 | 0:48 | 1:30 |
Saturday | 12 | -0:58 | 4:05 | 0:10 | 0:43 | 1:26 |
Sunday | 11 | -0:34 | 1:59 | 0:08 | 0:19 | 0:43 |
All Days | 84 | -1:02 | 6:26 | 0:18 | 0:36 | 1:15 |
The train was more than 3:15 late on 3 days (3%).
[Note that we still call them “arrival times” even if your juckins.net query asked for departure times. We might fix that Real Soon Now.]
An optional command line argument is an integer that specifies the number of minutes to allow for the transfer. The default is ten minutes.
For example, having queried juckins.net for connections from the eastbound Texas Eagle to the eastbound Capitol Limited and saved the resulting Web page in a file called 22to30.html, the command
    atk-connection <22to30.html

produces output like:
For the 84 days for which we do have data:
Available Transfer Time

Weekday | Data Points | Min. | Max. | Median | Mean | Std. Dev. |
---|---|---|---|---|---|---|
Monday | 13 | 3:34 | 7:49 | 5:22 | 5:22 | 0:53 |
Tuesday | 13 | 2:02 | 5:30 | 5:02 | 4:44 | 0:57 |
Wednesday | 13 | 2:24 | 10:10 | 5:03 | 5:09 | 1:41 |
Thursday | 12 | 3:53 | 5:27 | 5:08 | 4:52 | 0:29 |
Friday | 11 | 0:17 | 5:33 | 4:56 | 4:10 | 1:37 |
Saturday | 11 | 4:12 | 7:23 | 5:14 | 5:19 | 0:44 |
Sunday | 11 | 4:32 | 5:22 | 5:02 | 5:00 | 0:15 |
All Days | 84 | 0:17 | 10:10 | 5:06 | 4:57 | 1:08 |
The transfer time was less than 10 minutes on 0 days (0%),
so we missed the connection on 3 days (3%).
[Note that missing data counts as missing a connection in the error message but isn’t used for computing the statistics.]
For the 37 days when the arrival time is known:
Late Time (negative means early)

Weekday | Data Points | Min. | Max. | Median | Mean | Std. Dev. |
---|---|---|---|---|---|---|
Monday | 12 | -0:30 | 3:21 | 0:31 | 0:52 | 1:09 |
Thursday | 13 | -0:27 | 2:05 | -0:06 | 0:08 | 0:39 |
Saturday | 12 | -0:26 | 3:16 | 0:12 | 0:36 | 1:11 |
All Days | 37 | -0:30 | 3:21 | 0:09 | 0:31 | 1:03 |
The train was more than 2:00 late on 5 days (13%).
For the 36 days for which we do have data:
Available Transfer Time

Weekday | Data Points | Min. | Max. | Median | Mean | Std. Dev. |
---|---|---|---|---|---|---|
Monday | 12 | -0:41 | 1:43 | 0:44 | 0:46 | 0:42 |
Wednesday | 12 | -13:46 | 1:35 | 0:01 | -1:03 | 3:59 |
Friday | 12 | -5:46 | 1:21 | 0:41 | -0:06 | 1:57 |
All Days | 36 | -13:46 | 1:43 | 0:37 | -0:08 | 2:42 |
The transfer time was less than 10 minutes on 12 days (31%),
so we missed the connection on 15 days (38%).
and using <ctime> functions inherited from C to infer the weekday from the date.

    { "date": "04/30/2023", "scheduled_time": "11:45AM", "actual_time": "12:00PM" },
    { "date": "05/01/2023", "scheduled_time": "11:59PM", "actual_time": "12:01AM+1" }
Note that we can append “+1” to the actual arrival time to mean “next day”.
This uses the library-supplied make_date and make_mins functions which assume that the input, once it’s found, has the correct format. You might want to parse the data strings yourself if you don’t want to count on the JSON being correct.
This example doesn’t collect any additional information from the input stream which might otherwise be command line arguments to your main program. See the code for the supplied history_parser::operator() and connection_parser::operator() functions (defined in atkhist.cpp) for examples of getting train numbers and station names from the input stream.
    //
    // my_parser.hpp
    //
    // A header for just the parser for including in the main program.
    //
    #ifndef MY_PARSER_HPP_INCLUDED
    #define MY_PARSER_HPP_INCLUDED

    #include "atkhist.hpp"

    namespace my_parser {

    class my_parser final : public atkotp::parser
    {
        std::string dt;
        atkotp::weekday dy;
        int stm, atm; // scheduled and actual times

        enum bits { None = 0, Date = 1, STime = 2, ATime = 4, All = 7 } found;

        void reset() noexcept { found = None; }
        bool nothing() noexcept { return found == None; }
        bool done() noexcept { return found == All; }
        bool got(bits b) noexcept { return int(found) & int(b); }
        void add(bits b) noexcept { found = bits(int(found) | int(b)); }

        bool get_date(const std::string&, std::size_t);
        int get_time(const std::string&, std::size_t);

    public:
        my_parser() : found(None) { }
        ~my_parser() = default;

        atkotp::parser::state operator()(const std::string&, atkotp::all_days&);
    };

    } // namespace my_parser

    #endif // MY_PARSER_HPP_INCLUDED
    //
    // my_parser.cpp
    //
    // A translation unit with definitions of my_parser's three functions
    // that weren't defined in-class.
    //
    #include "my_parser.hpp"

    #include <climits> // INT_MIN
    #include <ctime>   // time_t, tm, mktime
    #include <string>  // string, stoi

    namespace {
        using std::string;
        using std::size_t;
        using namespace atkotp;
    }

    namespace my_parser {

    //
    // The get_date function assigns values to the dt and dy data members
    // and returns whether it was successful.
    //
    bool my_parser::get_date(const string& input, size_t pos)
    {
        //
        // Find the '"' that starts the date:
        //
        pos = input.find_first_of('"', pos + 1);
        if (pos == string::npos)
        {
            return false; // error
        }

        dt = make_date(input, pos + 1);

        //
        // Infer the weekday. Zero the whole struct first so that mktime
        // never reads an uninitialized member, and let mktime work out
        // whether daylight saving time is in effect.
        //
        std::tm t = std::tm();
        t.tm_mday = stoi(string(dt, 6, 2));
        t.tm_mon  = stoi(string(dt, 4, 2)) - 1;
        t.tm_year = stoi(string(dt, 0, 4)) - 1900;
        t.tm_isdst = -1;
        if (std::mktime(&t) == std::time_t(-1))
        {
            return false; // another possible error
        }
        //
        // tm_wday has 0 for Sunday; we want 7.
        //
        dy = t.tm_wday == 0 ? weekday::Su : weekday(t.tm_wday);

        return true;
    }

    //
    // The get_time function doesn't know which data member to set,
    // so it just returns the time as a number of minutes after midnight,
    // or INT_MIN to indicate an error. Note that "+1" in the input string
    // means "next day".
    //
    int my_parser::get_time(const string& input, size_t pos)
    {
        static const char max_time[] = "12:34 AM +1";

        pos = input.find_first_of('"', pos + 1);

        return pos == string::npos ? INT_MIN :
               make_mins(input, pos + 1,
                         input.find("+1", pos) < pos + sizeof max_time);
    }

    //
    // Override the function call operator:
    //
    parser::state my_parser::operator()(const string& input, all_days& dest)
    {
        //
        // Help with avoiding magic numbers for string lengths:
        //
        static const char date_string[] = "date";
        static const char sked_string[] = "scheduled_time";
        static const char act_string[]  = "actual_time";

        //
        // Where the data is on the input line:
        //
        size_t pos;

        //
        // Allow any number of input fields in any order on a single input line.
        //
        if ((pos = input.find(date_string)) != string::npos)
        {
            if (got(Date))
            {
                return parser::Error; // We already have a date.
            }
            if (!get_date(input, pos + sizeof date_string))
            {
                return parser::Error;
            }
            add(Date);
            // and fall through
        }
        if ((pos = input.find(sked_string)) != string::npos)
        {
            if (got(STime))
            {
                return parser::Error; // We already have a scheduled time.
            }
            stm = get_time(input, pos + sizeof sked_string);
            if (stm == INT_MIN)
            {
                return parser::Error;
            }
            add(STime);
        }
        if ((pos = input.find(act_string)) != string::npos)
        {
            if (got(ATime))
            {
                return parser::Error; // We already have an actual time.
            }
            atm = get_time(input, pos + sizeof act_string);
            if (atm == INT_MIN)
            {
                return parser::Error;
            }
            add(ATime);
        }

        //
        // If we have enough information to construct a one_day object:
        //
        if (done())
        {
            dest.push_back(one_day(dt, dy, atm - stm));
            reset();
            return parser::Ok; // note early return
        }

        //
        // If we don't know anything at all, we don't care about this input line.
        //
        if (nothing())
        {
            return parser::Continue;
        }

        //
        // If we get here, we have some, but not all, of the data we need
        // for a one_day. If we're at the end of a JSON block, that's an error;
        // otherwise we hope that there's more interesting data in the block.
        //
        return input.find("}") != string::npos ? parser::Error : parser::Partial;
    }

    } // namespace my_parser

    // End of my_parser.cpp
“Beware of bugs in the above code. I have only proved it correct, not tried it.” — Donald E. Knuth