Collection of file system based algorithms. More...

Namespaces
	impl
	Does not exist in the actual source code, purely here to workaround doxygen limitations.

	shared_fs_mutex
	Algorithms for protecting a shared filing system resource from racy modification.

Classes
struct	contents_visitor
	A visitor for the filesystem contents algorithm. More...

struct	comparison_summary

struct	compare_visitor
	A visitor for the filesystem traversal and comparison algorithm. More...

struct	reduce_visitor
	A visitor for the filesystem traversal and reduction algorithm. More...

struct	traversal_summary
	A summary of a directory tree. More...

struct	summarize_visitor
	A visitor for the filesystem traversal and summary algorithm. More...

struct	traverse_visitor
	A visitor for the filesystem traversal algorithm. More...

class	trivial_vector
	Provides a constant time capacity expanding move-only STL vector. Requires `T` to be trivially copyable. More...

class	cached_parent_handle_adapter
	Adapts any `construct()`-able implementation to cache its parent directory handle in a process wide cache. More...

class	combining_handle_adapter
	A handle combining the data from one or two other handles. More...

Typedefs
template<class Target , class Source >
using	xor_handle_adapter = combining_handle_adapter< detail::xor_handle_adapter_op, Target, Source >
	A handle combining the data from two other handles using XOR. More...

Functions
result< file_handle::extent_type >	clone_or_copy (file_handle &src, const path_handle &destdir, path_view destleaf={}, bool preserve_timestamps=true, bool force_copy_now=false, file_handle::creation creation=file_handle::creation::always_new, deadline d={}) noexcept
	Clone or copy the extents of the filesystem entity identified by `src` to `destdir` optionally renamed to `destleaf`. More...

result< bool >	relink_or_clone_copy_unlink (file_handle &src, const path_handle &destdir, path_view destleaf={}, bool atomic_replace=true, bool preserve_timestamps=true, bool force_copy_now=false, deadline d={}) noexcept
	Relink or clone-unlink/copy-unlink the extents of the filesystem entity identified by `src` to `destdir` optionally renamed to `destleaf`. This lets you relink files across filing systems, with as close as possible matching of semantics to atomic relinking. More...

result< contents_visitor::contents_type >	contents (const path_handle &dirh, contents_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
	Calculate the contents of everything within and under `dirh`. What is returned is unordered. More...

result< comparison_summary >	compare (const path_handle &ldirh, const path_handle &rdirh, stat_t::want want=comparison_summary::default_metadata(), compare_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
	Compares the directories identified by `ldirh` and `rdirh`, and everything therein. More...

result< size_t >	reduce (directory_handle &&dirh, reduce_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
	Reduce the directory identified `dirh`, and everything therein, to the null set. More...

result< traversal_summary >	summarize (const path_handle &topdirh, stat_t::want want=traversal_summary::default_metadata(), summarize_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
	Summarise the directory identified `topdirh`, and everything therein. More...

result< size_t >	traverse (const path_handle &dirh, traverse_visitor visitor, size_t threads=0, void data=nullptr, bool force_slow_path=false) noexcept
	Traverse everything within and under `dirh`. More...

template<class T >
bool	operator== (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
bool	operator!= (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
bool	operator< (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
bool	operator<= (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
bool	operator> (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
bool	operator>= (const trivial_vector< T > &a, const trivial_vector< T > &b)
	Compare.

template<class T >
void	swap (trivial_vector< T > &a, trivial_vector< T > &b) noexcept
	Swap.

template<class T , class... Args>
result< cached_parent_handle_adapter< T > >	cache_parent (Args &&... args) noexcept
	Constructs a `T` adapted into a parent handle caching implementation. More...

Detailed Description

Collection of file system based algorithms.

Typedef Documentation

◆ xor_handle_adapter

template<class Target , class Source >

using llfio_v2_xxx::algorithm::xor_handle_adapter = typedef combining_handle_adapter<detail::xor_handle_adapter_op, Target, Source>

A handle combining the data from two other handles using XOR.

Template Parameters

Target	The type of the target handle. This is the one written to during any writes i.e. the input and second handle are XORed together and written to the first handle.
Source	The type of the second handle with which to XOR the target handle.

Warning: This class is still in development, do not use.

Function Documentation

◆ cache_parent()

template<class T , class... Args>

result<cached_parent_handle_adapter<T> > llfio_v2_xxx::algorithm::cache_parent ( Args &&... args )

inlinenoexcept

Constructs a T adapted into a parent handle caching implementation.

This function works via the construct<T>() free function framework for which your handle implementation must have registered its construction details.

   {
     construct<T> constructor{std::forward<Args>(args)...};
     OUTCOME_TRY(auto &&h, constructor());
     try
     {
       return cached_parent_handle_adapter<T>(std::move(h), constructor.base, constructor._path);
     }
     catch(...)
     {
       return error_from_exception();
     }
   }

◆ clone_or_copy()

result<file_handle::extent_type> llfio_v2_xxx::algorithm::clone_or_copy	(	file_handle &	src,
		const path_handle &	destdir,
		path_view	destleaf = `{}`,
		bool	preserve_timestamps = `true`,
		bool	force_copy_now = `false`,
		file_handle::creation	creation = `file_handle::creation::always_new`,
		deadline	d = `{}`
	)

inlinenoexcept

Clone or copy the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf.

Returns: The number of bytes cloned or copied.

Parameters

src	The file to clone or copy.
destdir	The base to lookup `destleaf` within.
destleaf	The leafname to use. If empty, use the same leafname as `src` currently has.
preserve_timestamps	Use `stat_t::stamp()` to preserve as much metadata from the original to the clone/copy as possible.
force_copy_now	Parameter to pass to `file_handle::clone_extents()` to force extents to be copied now, not copy-on-write lazily later.
creation	How to create the destination file handle. NOTE that if this is NOT `always_new`, if the destination has identical maximum extent and last modified timestamp (and permissions on POSIX) to the source, it is NOT copied, and zero is returned.
d	Deadline by which to complete the operation.

Firstly, a file_handle is constructed at the destination using creation, which defaults to always creating a new inode. The caching used for the destination handle is replicated from the source handle – be aware that not caching metadata is expensive.

Next file_handle::clone_extents() with emulate_if_unsupported = false is called on the whole file content. If extent cloning is supported, this will be very fast and not consume new disk space (note: except on networked filesystems). If the source file is sparsely allocated, the destination will have identical sparse allocation.

If the previous operation did not succeed, the disk free space is checked using statfs_t, and if the copy would exceed current disk free space, the destination file is unlinked and an error code comparing equal to errc::no_space_on_device is returned.

Next, file_handle::clone_extents() with emulate_if_unsupported = true is called on the whole file content. This copies only the allocated extents in blocks sized whatever is the large page size on this platform (2Mb on x64).

Finally, if preserve_timestamps is true, the destination file handle is restamped with the metadata from the source file handle just before the destination file handle is closed.

◆ compare()

result<comparison_summary> llfio_v2_xxx::algorithm::compare	(	const path_handle &	ldirh,
		const path_handle &	rdirh,
		stat_t::want	want = `comparison_summary::default_metadata()`,
		compare_visitor *	visitor = `nullptr`,
		size_t	threads = `0`,
		bool	force_slow_path = `false`
	)

inlinenoexcept

Compares the directories identified by ldirh and rdirh, and everything therein.

This extends summarize() to summarise two directory hierarchies, also summarising the number of types of differences between them.

It is trivially easy to further extend this implementation to synchronise the contents of a directory tree such that after completion, both trees shall be identical. See the examples directory for an example of this use case.

This is a trivial implementation on top of algorithm::traverse(), indeed it is implemented entirely as header code. You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

   {
     LLFIO_LOG_FUNCTION_CALL(&dirh);
     compare_visitor default_visitor;
     if(visitor == nullptr)
     {
       visitor = &default_visitor;
     }
     result<comparison_summary> state(in_place_type<comparison_summary>);
     state.assume_value().want = want;
     directory_entry entry{{}, stat_t(nullptr)};
     OUTCOME_TRY(entry.stat.fill(dirh, want));
     OUTCOME_TRY(compare_visitor::accumulate(state.assume_value(), &state.assume_value(), nullptr, entry, want));
     OUTCOME_TRY(traverse(dirh, visitor, threads, &state.assume_value(), force_slow_path));
     return state;
   }

◆ contents()

result<contents_visitor::contents_type> llfio_v2_xxx::algorithm::contents	(	const path_handle &	dirh,
		contents_visitor *	visitor = `nullptr`,
		size_t	threads = `0`,
		bool	force_slow_path = `false`
	)

inlinenoexcept

Calculate the contents of everything within and under dirh. What is returned is unordered.

This is a very thin veneer over traverse() which came out of the fact that I kept writing "get me the contents" traversal visitors again and again, so eventually I just wrote a library edition. Its only "clever" bit is that it stores the contents in thread local storage, and merges the contents afterwards.

It is race free to concurrent relocations of dirh. It is entirely implemented in header-only code, as it is very simple.

   {
     contents_visitor default_visitor;
     if(visitor == nullptr)
     {
       visitor = &default_visitor;
     }
     contents_visitor::_state_type state(dirh);
     OUTCOME_TRY(auto &&dirhpath, dirh.current_path());
     state.rootdirpathlen.store(dirhpath.native().size() + 1, std::memory_order_relaxed);
     OUTCOME_TRY(traverse(dirh, visitor, threads, &state, force_slow_path));
     return {std::move(state.contents)};
   }

◆ reduce()

result<size_t> llfio_v2_xxx::algorithm::reduce	(	directory_handle &&	dirh,
		reduce_visitor *	visitor = `nullptr`,
		size_t	threads = `0`,
		bool	force_slow_path = `false`
	)

inlinenoexcept

Reduce the directory identified dirh, and everything therein, to the null set.

You might be surprised to learn that most directory tree removal implementations are of poor quality, not leveraging the filesystem facilities that are available, not handling concurrent modification of the filesystem correctly, having poor performance, or failing to handle not uncommon corner cases. This implementation is considerably better quality, indeed it is to my knowledge the highest quality possible on at least Linux and Microsoft Windows.

The algorithm is as follows:

Attempt to rename dirh to a uniquely named directory. If successful, this causes concurrent users to no longer see the directory tree. It also usefully detects if permissions problems would prevent whole directory tree removal. Note that on Windows, if any process has a handle open to anything inside the directory tree, it is usually the case that all renames will be prevented.
algorithm::traverse() is begun, using the visitor supplied. This will unlink all items using a breadth-first algorithm, from root to tips. With the default visitor, directories which cannot be opened for traversal are ignored; entries which cannot be unlinked are attempted to be renamed into the base directory; entries which cannot be renamed are ignored.
Except for unrenameable files, now the entire directory tree will have been reduced to a minimum possible set of uniquely named items in the base directory, all of which by definition must be undeletable. We now loop attempting to reduce the remaining entries. The default visitor implementation takes a timeout, which if exceeded, an error code comparing equal to errc::timed_out is returned.

Even on slow filesystems such as those on Windows, or networked filesystems, this algorithm performs very well. We do not currently inspect the filing system to see if bisect unlinking directories with millions of entries will perform well (some filing systems can store very large directories with multiple independent inode locks, thus using multiple kernel threads on the same directory sees a large performance increase for very large directories). We also remove items based on enumerated order, under the assumption that filesystems will have optimised for this specific use case.

If the function succeeds, dirh is moved into the function, and the total number of filesystem entries removed is returned.

If the function fails, dirh is NOT moved into the function, and continues to refer to the (likely renamed) directory you passed in. You might do something like try to rename it into storage_backed_temporary_files_directory(), or some other hail mary action.

You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

◆ relink_or_clone_copy_unlink()

result<bool> llfio_v2_xxx::algorithm::relink_or_clone_copy_unlink	(	file_handle &	src,
		const path_handle &	destdir,
		path_view	destleaf = `{}`,
		bool	atomic_replace = `true`,
		bool	preserve_timestamps = `true`,
		bool	force_copy_now = `false`,
		deadline	d = `{}`
	)

inlinenoexcept

Relink or clone-unlink/copy-unlink the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf. This lets you relink files across filing systems, with as close as possible matching of semantics to atomic relinking.

Returns: True if a clone-unlink or copy-unlink was performed, false if a simple relink was sufficient. src on return is the relinked handle, which will be to a completely different inode if a clone-unlink or copy-unlink was performed.

Parameters

src	The file to relink or clone or copy.
destdir	The base to lookup `destleaf` within.
destleaf	The leafname to use. If empty, use the same leafname as `src` currently has.
atomic_replace	Whether any file entry at the destination should be atomically replaced, or an error returned instead.
preserve_timestamps	Use `stat_t::stamp()` to preserve as much metadata from the original to the clone/copy as possible.
force_copy_now	Parameter to pass to `file_handle::clone_extents()` to force extents to be copied now, not copy-on-write lazily later.
d	Deadline by which to complete the operation.

Firstly, .relink() is tried, which if successful, there is an immediate return with value false.

If relinking fails, an anonymous inode file_handle is constructed at the destination. The caching used for the destination handle is replicated from the source handle – be aware that not caching metadata is expensive. If atomic_replace is false, and there is a file entry matching the destination, an error code comparing equal to errc::file_exists will be returned.

Next file_handle::clone_extents() with emulate_if_unsupported = false is called on the whole file content. If extent cloning is supported, this will be very fast and not consume new disk space (note: except on networked filesystems). If the source file is sparsely allocated, the destination will have identical sparse allocation.

If the previous operation did not succeed, the disk free space is checked using statfs_t, and if the copy would exceed current disk free space, the destination file is unlinked and an error code comparing equal to errc::no_space_on_device is returned.

Next, file_handle::clone_extents() with emulate_if_unsupported = true is called on the whole file content. This copies only the allocated extents in blocks sized whatever is the large page size on this platform (2Mb on x64).

Once all the extents have been replicated into temporary inode, a hard link is attempted to the destination leafname. If that fails, if atomic_replace is false then error is returned, otherwise a hard link to a randomly chosen filename is created, and that link is relinked over the destination leafname.

Finally, if preserve_timestamps is true, the destination file handle is restamped with the metadata from the source file handle just before the destination file handle is closed.

◆ summarize()

result<traversal_summary> llfio_v2_xxx::algorithm::summarize	(	const path_handle &	topdirh,
		stat_t::want	want = `traversal_summary::default_metadata()`,
		summarize_visitor *	visitor = `nullptr`,
		size_t	threads = `0`,
		bool	force_slow_path = `false`
	)

inlinenoexcept

Summarise the directory identified topdirh, and everything therein.

It can be useful to summarise a directory hierarchy, especially to determine how much storage it occupies, but also how many mounted filesystems it straddles etc. You should specify what metadata you wish to summarise, if this is a subset of what metadata directory_handle::read() returns, performance will be considerably better. The default summarises all possible metadata.

Most errors during summary are accumulated into stats_failed and directory_opens_failed, rather than failing the summary.

This is a trivial implementation on top of algorithm::traverse(), indeed it is implemented entirely as header code. You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

   {
     LLFIO_LOG_FUNCTION_CALL(&topdirh);
     summarize_visitor default_visitor;
     if(visitor == nullptr)
     {
       visitor = &default_visitor;
     }
     result<traversal_summary> state(in_place_type<traversal_summary>);
     state.assume_value().want = want;
     directory_entry entry{{}, stat_t(nullptr)};
     directory_handle _dirh;
     if(!topdirh.is_directory())
     {
       OUTCOME_TRY(_dirh, directory_handle::directory(topdirh, {}));
     }
     const path_handle &dirh = _dirh.is_valid() ? _dirh : topdirh;
     // We should fail here if we can't stat topdirh
     OUTCOME_TRY(entry.stat.fill(dirh, want));
     summarize_visitor::accumulate(state.assume_value(), &state.assume_value(), nullptr, entry, want);
     OUTCOME_TRY(traverse(dirh, visitor, threads, &state.assume_value(), force_slow_path));
     return state;
   }

◆ traverse()

result<size_t> llfio_v2_xxx::algorithm::traverse	(	const path_handle &	dirh,
		traverse_visitor *	visitor,
		size_t	threads = `0`,
		void *	data = `nullptr`,
		bool	force_slow_path = `false`
	)

inlinenoexcept

Traverse everything within and under dirh.

The algorithm is as follows:

Call pre_enumeration() of the visitor on the directory_handle about to be enumerated.
Enumerate the contents of the directory.
Call post_enumeration() of the visitor on the contents just enumerated.
For each directory in the contents, append a base directory handle and a directory fragment to its hierarchy depth level in a stack of lists.
Loop, using the least deep available item in the stack, until the stack is empty.

If known_dirs_remaining exceeds four, a threadpool of not more than threads threads is spun up in order to traverse the hierarchy more quickly.

This algorithm is therefore primarily a breadth-first algorithm, in that we proceed from root, level by level, to the tips. The number returned is the total number of directories traversed.

Notes

The implementation tries hard to not open too many file descriptors at a time in order to not exceed the system limit, which may be as low as 1024 on POSIX. On POSIX, it checks getrlimit(RLIMIT_NOFILE) for the soft limit on open file descriptors, and if the remaining unused open file descriptors is less than 65536, it will prefer a slow path implementation which exchanges file descriptor usage for lots more dynamic memory allocation and memory copying. You can force this slow path on any platform using force_slow_path, and in correct code you should always check for failures in directory_open_failed() comparing equal to errc::too_many_files_open, and if encountered restart the traversal using the slow path forced.

Almost every modern POSIX system allows a RLIMIT_NOFILE of over a million nowadays, so you should setrlimit(RLIMIT_NOFILE) appropriately in your program if you are absolutely sure that doing so will not break code in your program (e.g. select() fails spectacularly if file descriptors exceed 1024 on most POSIX).

To give an idea of the difference slow path makes, for Linux ext4:

Slow path, 1 thread, traversed 131,915 directories and 8,254,162 entries in 3.10 seconds.
Slow path, 16 threads, traversed 131,915 directories and 8,254,162 entries in 0.966 seconds.
Fast path, 1 thread, traversed 131,915 directories and 8,254,162 entries in 2.73 seconds (+12%).
Fast path, 16 threads, traversed 131,915 directories and 8,254,162 entries in 0.525 seconds (+46%).

Namespaces

Classes

Typedefs

Functions

Detailed Description

Typedef Documentation

◆ xor_handle_adapter

Function Documentation

◆ cache_parent()

◆ clone_or_copy()

◆ compare()

◆ contents()

◆ reduce()

◆ relink_or_clone_copy_unlink()

◆ summarize()

◆ traverse()

Notes