Reading Directories in C++

Bill Seymour
2024-03-09


Abstract

This paper describes a quick and dirty class for cycling through directories.  It’s portable to POSIX systems and Microsoft Windows.  (The author recently had reason to loop through both Linux and Windows directories and wanted to create just a single way to do it.)

This is all really old news, and so all the code is in the public domain.

The class doesn’t take any action on any directory entry and doesn’t automatically recurse through subdirectories.  All it does is return information about the directory entries.  Users can take whatever action they like, including explicitly recursing through subdirectories.  (Deleting a file in an open directory could cause problems.)

It’s intended that an instance of the class be a singleton, and it’s neither copyable nor moveable.

The code requires at least a C++11 compiler and standard library, and it requires 64-bit time_ts.

The class is defined in the dirrdr namespace.

The source files are available in https://www.cstdbill.com/dirrdr/directory_reader.zip.


Synopsis

static_assert(sizeof(std::time_t) * CHAR_BIT >= 64, "64-bit time_t required");

namespace dirrdr {

class directory_reader final
{
public:
    directory_reader();
    ~directory_reader();

    directory_reader(const directory_reader&) = delete;
    directory_reader(directory_reader&&) = delete;
    directory_reader& operator=(const directory_reader&) = delete;
    directory_reader& operator=(directory_reader&&) = delete;

    directory_reader& open(const char* = nullptr);
    directory_reader& open(const std::string&);

    void close() noexcept;

    bool read_next() /*noexcept on Windows*/;
    bool rewind() /*noexcept on Windows*/;

    bool at_end() const noexcept;
    std::size_t dir_count() const noexcept;

    const std::string& directory() const noexcept;

    const char* entry_name() const noexcept;

    unsigned long long file_size() const noexcept;

    bool is_regular_file() const noexcept;
    bool is_directory() const noexcept;

    const implementation-specific& more_data() const noexcept;
};

} // namespace dirrdr


Special Member Functions

directory_reader();
The default constructor is the only constructor.  It just creates an object that’s not connected to any particular directory yet.

~directory_reader();
The destructor is non-trivial.

directory_reader(const directory_reader&) = delete;
directory_reader(directory_reader&&) = delete;
directory_reader& operator=(const directory_reader&) = delete;
directory_reader& operator=(directory_reader&&) = delete;
Instances of this class are neither copyable nor moveable.


Open, Close, Etc.

directory_reader& open(const char* = nullptr);
directory_reader& open(const std::string&);
These open a particular directory the name of which is passed as the argument, and then they read the first directory entry.

If no other directory is currently open, the argument must be either a full path to a directory, or a null const char* or empty string, either of which opens the current working directory.

If another directory is currently open, the state will be pushed onto a stack and the new directory will be opened.  In this case, a non-null, non-empty argument is required.  If it begins with a directory separator character ('/' on POSIX, '\\' on Windows), it will be taken to be a full path; otherwise it will be a directory relative to the most recently opened directory.  (Opening a path relative to the currently open directory is how you recurse through subdirectories.)

On Windows, the name may optionally begin with a drive letter followed by a colon.
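The rule for interpreting the argument when another directory is already open can be sketched as a small helper.  This only illustrates the rule as described above; it is not the code in the zip file.  The name resolve_path and its parameters are hypothetical, and treating any name with a drive prefix as a full path is my assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of dirrdr: work out which directory an
// open() argument names when another directory is already open.
//   current - full path of the most recently opened directory (non-empty)
//   arg     - the non-empty argument passed to open()
//   sep     - directory separator ('/' on POSIX, '\\' on Windows)
//   drives  - whether a leading "X:" drive prefix is recognized (Windows)
std::string resolve_path(const std::string& current, const std::string& arg,
                         char sep, bool drives)
{
    // Assumption: a name that starts with a drive letter and a colon
    // is passed through unchanged.
    if (drives && arg.size() >= 2 &&
        std::isalpha(static_cast<unsigned char>(arg[0])) && arg[1] == ':')
    {
        return arg;
    }

    // A leading separator means the argument is already a full path.
    if (!arg.empty() && arg[0] == sep)
    {
        return arg;
    }

    // Otherwise the argument is relative to the most recently opened directory.
    std::string result = current;
    if (result.empty() || result.back() != sep)
    {
        result += sep;
    }
    result += arg;
    return result;
}
```

For example, with "/home/bill" open, the argument "src" names "/home/bill/src", while "/tmp" is taken as given.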

These functions will throw an invalid_argument exception if they can’t open the directory for some reason.  The POSIX implementation can also throw a runtime_error if it reads the first directory entry successfully but then lstat() fails.

The open() functions return *this.
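Because open() has already read the first entry by the time it returns, a loop over one directory is naturally a do/while rather than a while.  Here is a minimal sketch of that pattern.  The function total_file_size is a hypothetical example, and it's written as a template so that it compiles without the library; fake_reader is an in-memory stand-in that I made up just to exercise it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sum the sizes of the regular files in one directory (no recursion).
// Reader is expected to look like dirrdr::directory_reader.
template <class Reader>
unsigned long long total_file_size(Reader& rdr, const std::string& path)
{
    unsigned long long total = 0;
    rdr.open(path);                // open() has already read the first entry,
    do {                           // so look at it before calling read_next()
        if (rdr.is_regular_file()) {
            total += rdr.file_size();
        }
    } while (rdr.read_next());     // false means there are no more entries
    rdr.close();
    return total;
}

// Hypothetical stand-in for directory_reader, for illustration only.
struct fake_reader
{
    struct entry { const char* name; bool regular; unsigned long long size; };
    std::vector<entry> entries;
    std::size_t pos = 0;

    fake_reader& open(const std::string&) { pos = 0; return *this; }
    void close() noexcept {}
    bool read_next() noexcept { return ++pos < entries.size(); }
    bool is_regular_file() const noexcept { return entries[pos].regular; }
    unsigned long long file_size() const noexcept { return entries[pos].size; }
};
```

With the real class you would pass a dirrdr::directory_reader instead of the stand-in.  Note that the sketch doesn’t call close() if something throws part way through the loop.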

bool at_end() const noexcept;
std::size_t dir_count() const noexcept;
at_end() returns whether all opened directories have been closed; dir_count() returns the number of currently open directories.

Class invariant:  at_end() == (dir_count() == 0)

The functions above are the only ones that can be called without a directory already being open.  In a debug build (the NDEBUG macro not defined), all the other functions will assert() if no directory is open.  In a production build, you might get demons flying out of your nose.

void close() noexcept;
This closes the current directory and pops the state off the stack so that we’re back to where we were with the previous directory (or back to nothing open).

bool read_next() /*noexcept on Windows*/;
This reads the next directory entry and returns whether it’s successful.  If it returns false, that means that we’re at the end of the directory, i.e., that there wasn’t another entry to read.

The POSIX implementation will throw a runtime_error if it can’t lstat() the directory entry; the Windows implementation won’t throw.
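Putting open(), read_next() and close() together, recursing through subdirectories might look something like the sketch below.  It’s only one way to do it.  The function collect_files is hypothetical, fake_reader is a made-up in-memory stand-in so that the example compiles without the library, and I’m assuming that directory() returns a path with no trailing separator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Append the full path of every regular file under the directory that is
// currently open in rdr, descending into subdirectories.
template <class Reader>
void collect_files(Reader& rdr, std::vector<std::string>& out, char sep = '/')
{
    do {
        // Copy the name: the reader's own buffer changes when we open().
        const std::string name = rdr.entry_name();
        if (name == "." || name == "..") {
            continue;                      // never recurse into self or parent
        }
        if (rdr.is_directory()) {
            rdr.open(name);                // relative to the directory now open
            collect_files(rdr, out, sep);  // walk the subdirectory
            rdr.close();                   // pop back to where we were
        } else if (rdr.is_regular_file()) {
            out.push_back(rdr.directory() + sep + name);
        }
    } while (rdr.read_next());
}

// Hypothetical stand-in for directory_reader, for illustration only.
struct fake_reader
{
    struct entry { std::string name; bool dir; };
    struct frame { std::string path; std::size_t pos; };
    std::map<std::string, std::vector<entry>> tree;  // full path -> entries
    std::vector<frame> stack;                        // the open directories

    fake_reader& open(const std::string& p)
    {
        const std::string full = stack.empty() ? p : stack.back().path + '/' + p;
        stack.push_back(frame{full, 0});
        return *this;
    }
    void close() noexcept { stack.pop_back(); }
    bool read_next() { return ++stack.back().pos < tree.at(stack.back().path).size(); }
    const std::string& directory() const { return stack.back().path; }
    const char* entry_name() const { return cur().name.c_str(); }
    bool is_directory() const { return cur().dir; }
    bool is_regular_file() const { return !cur().dir; }
    const entry& cur() const { return tree.at(stack.back().path)[stack.back().pos]; }
};
```

The caller opens the top-level directory, calls collect_files(), and then closes it.  Skipping . and .. is essential; without it the recursion never ends.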

bool rewind() /*noexcept on Windows*/;
This gets back to the beginning of the directory, rereads the first entry, and returns whether it’s successful, which should always be true.  (If it returns false, that means that there’s no first entry, not even “.” or “..”, which would be really surprising.)

Since it rereads the first entry, the POSIX implementation will throw a runtime_error if lstat() fails.
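One use for rewind() is making two passes over the same directory.  The sketch below counts the entries first and then collects their names.  It assumes that the directory has just been opened, so the reader is positioned at the first entry.  The names entry_names and fake_reader are hypothetical; the latter is a stand-in that lets the example compile without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the names of all entries in the directory currently open in rdr.
// Precondition: rdr is positioned at the first entry.
template <class Reader>
std::vector<std::string> entry_names(Reader& rdr)
{
    // First pass: just count the entries.
    std::size_t count = 0;
    do {
        ++count;
    } while (rdr.read_next());

    // Second pass: go back to the first entry and copy the names.
    std::vector<std::string> names;
    names.reserve(count);
    if (rdr.rewind()) {
        do {
            names.push_back(rdr.entry_name());
        } while (rdr.read_next());
    }
    return names;
}

// Hypothetical stand-in for directory_reader, for illustration only.
struct fake_reader
{
    std::vector<std::string> names;
    std::size_t pos = 0;

    bool read_next() noexcept { return ++pos < names.size(); }
    bool rewind() noexcept { pos = 0; return !names.empty(); }
    const char* entry_name() const noexcept { return names[pos].c_str(); }
};
```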


Returned Data

const std::string& directory() const noexcept;
This is the directory’s full path.

const char* entry_name() const noexcept;
This is the directory entry’s name.

unsigned long long file_size() const noexcept;
bool is_regular_file() const noexcept;
bool is_directory() const noexcept;
These three functions return the intersection of the reliable information available on POSIX and Windows.

const implementation-specific& more_data() const noexcept;
This is what you can use if you need more O/S-specific information about a directory entry.  It returns a reference to a struct stat on a POSIX system, or to a _finddatai64_t on Windows.*
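On a POSIX system, the struct stat can tell you things that the portable functions don’t.  Here’s a small sketch of what a caller might pull out of it.  The names posix_details and describe are hypothetical, and the sketch relies only on standard POSIX members and macros.

```cpp
#include <cassert>
#include <ctime>
#include <sys/stat.h>

// A few POSIX-only facts about a directory entry (hypothetical struct).
struct posix_details
{
    bool is_symlink;        // the entry itself is a symbolic link
    bool is_executable;     // any of the execute bits is set
    std::time_t modified;   // time of last modification
};

// Extract the details from a struct stat such as the one that
// more_data() returns on POSIX.
posix_details describe(const struct stat& st)
{
    posix_details d;
    d.is_symlink = S_ISLNK(st.st_mode);
    d.is_executable = (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
    d.modified = st.st_mtime;
    return d;
}
```

With an open reader called rdr, the call would be describe(rdr.more_data()).  Since the class uses lstat(), a symbolic link shows up as a link rather than as whatever it points to.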


*I can’t find a description of Microsoft’s vendor lock-in _finddatai64_t on the Web, so I can’t just link to it.

It’s defined in <io.h> as:

struct _finddatai64_t {
    unsigned attrib;
    time_t   time_create;
    time_t   time_access;
    time_t   time_write;
    __int64  size;
    char     name[FILENAME_MAX];
};

The various bits in attrib are:

#define _A_NORMAL 0x00000000
#define _A_RDONLY 0x00000001
#define _A_HIDDEN 0x00000002
#define _A_SYSTEM 0x00000004
#define _A_VOLID  0x00000008
#define _A_SUBDIR 0x00000010
#define _A_ARCH   0x00000020

I’ve determined experimentally that 0x400 is the symlink bit, but I can’t find that documented anywhere.

An __int64 is the same thing as a std::int64_t.

FILENAME_MAX is probably 260 (YMMV).
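To show how the attrib bits combine, here’s a sketch that turns an attrib value into readable text.  The function describe_attrib and the constants in namespace attr are hypothetical; the constants are local copies of the values listed above so that the example compiles without <io.h>.  The symlink bit is the experimentally determined 0x400, so treat that one with suspicion.

```cpp
#include <cassert>
#include <string>

// Local copies of the attribute bits, so that <io.h> isn't needed.
namespace attr {
    const unsigned a_rdonly  = 0x00000001;
    const unsigned a_hidden  = 0x00000002;
    const unsigned a_system  = 0x00000004;
    const unsigned a_volid   = 0x00000008;
    const unsigned a_subdir  = 0x00000010;
    const unsigned a_arch    = 0x00000020;
    const unsigned a_symlink = 0x00000400;  // determined by experiment only
}

// Return a comma-separated description of the bits set in attrib.
std::string describe_attrib(unsigned attrib)
{
    struct flag { unsigned bit; const char* label; };
    static const flag flags[] = {
        { attr::a_rdonly,  "read-only" },
        { attr::a_hidden,  "hidden"    },
        { attr::a_system,  "system"    },
        { attr::a_volid,   "volume ID" },
        { attr::a_subdir,  "directory" },
        { attr::a_arch,    "archive"   },
        { attr::a_symlink, "symlink"   }
    };

    std::string out;
    for (const flag& f : flags) {
        if (attrib & f.bit) {
            if (!out.empty()) {
                out += ", ";
            }
            out += f.label;
        }
    }
    if (out.empty()) {
        return "normal";        // _A_NORMAL is zero: no bits set
    }
    return out;
}
```

On Windows, the call would be describe_attrib(rdr.more_data().attrib) for an open reader called rdr.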


All suggestions and corrections will be welcome; all flames will be amusing.
Mail to was@pobox.com.