LLFIO  v2.00
llfio_v2_xxx::algorithm Namespace Reference

Collection of file system based algorithms. More...

Namespaces

 impl
 Does not exist in the actual source code, purely here to workaround doxygen limitations.
 
 shared_fs_mutex
 Algorithms for protecting a shared filing system resource from racy modification.
 

Classes

struct  contents_visitor
 A visitor for the filesystem contents algorithm. More...
 
struct  comparison_summary
 
struct  compare_visitor
 A visitor for the filesystem traversal and comparison algorithm. More...
 
struct  reduce_visitor
 A visitor for the filesystem traversal and reduction algorithm. More...
 
struct  traversal_summary
 A summary of a directory tree. More...
 
struct  summarize_visitor
 A visitor for the filesystem traversal and summary algorithm. More...
 
struct  traverse_visitor
 A visitor for the filesystem traversal algorithm. More...
 
class  trivial_vector
 Provides a constant time capacity expanding move-only STL vector. Requires T to be trivially copyable. More...
 
class  cached_parent_handle_adapter
 Adapts any construct()-able implementation to cache its parent directory handle in a process wide cache. More...
 
class  combining_handle_adapter
 A handle combining the data from one or two other handles. More...
 

Typedefs

template<class Target , class Source >
using xor_handle_adapter = combining_handle_adapter< detail::xor_handle_adapter_op, Target, Source >
 A handle combining the data from two other handles using XOR. More...
 

Functions

result< file_handle::extent_type > clone_or_copy (file_handle &src, const path_handle &destdir, path_view destleaf={}, bool preserve_timestamps=true, bool force_copy_now=false, file_handle::creation creation=file_handle::creation::always_new, deadline d={}) noexcept
 Clone or copy the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf. More...
 
result< bool > relink_or_clone_copy_unlink (file_handle &src, const path_handle &destdir, path_view destleaf={}, bool atomic_replace=true, bool preserve_timestamps=true, bool force_copy_now=false, deadline d={}) noexcept
 Relink or clone-unlink/copy-unlink the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf. This lets you relink files across filing systems, with as close as possible matching of semantics to atomic relinking. More...
 
result< contents_visitor::contents_typecontents (const path_handle &dirh, contents_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
 Calculate the contents of everything within and under dirh. What is returned is unordered. More...
 
result< comparison_summarycompare (const path_handle &ldirh, const path_handle &rdirh, stat_t::want want=comparison_summary::default_metadata(), compare_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
 Compares the directories identified by ldirh and rdirh, and everything therein. More...
 
result< size_t > reduce (directory_handle &&dirh, reduce_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
 Reduce the directory identified dirh, and everything therein, to the null set. More...
 
result< traversal_summarysummarize (const path_handle &topdirh, stat_t::want want=traversal_summary::default_metadata(), summarize_visitor *visitor=nullptr, size_t threads=0, bool force_slow_path=false) noexcept
 Summarise the directory identified topdirh, and everything therein. More...
 
result< size_t > traverse (const path_handle &dirh, traverse_visitor *visitor, size_t threads=0, void *data=nullptr, bool force_slow_path=false) noexcept
 Traverse everything within and under dirh. More...
 
template<class T >
bool operator== (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
bool operator!= (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
bool operator< (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
bool operator<= (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
bool operator> (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
bool operator>= (const trivial_vector< T > &a, const trivial_vector< T > &b)
 Compare.
 
template<class T >
void swap (trivial_vector< T > &a, trivial_vector< T > &b) noexcept
 Swap.
 
template<class T , class... Args>
result< cached_parent_handle_adapter< T > > cache_parent (Args &&... args) noexcept
 Constructs a T adapted into a parent handle caching implementation. More...
 

Detailed Description

Collection of file system based algorithms.

Typedef Documentation

◆ xor_handle_adapter

template<class Target , class Source >
using llfio_v2_xxx::algorithm::xor_handle_adapter = typedef combining_handle_adapter<detail::xor_handle_adapter_op, Target, Source>

A handle combining the data from two other handles using XOR.

Template Parameters
TargetThe type of the target handle. This is the one written to during any writes i.e. the input and second handle are XORed together and written to the first handle.
SourceThe type of the second handle with which to XOR the target handle.
Warning
This class is still in development, do not use.

Function Documentation

◆ cache_parent()

template<class T , class... Args>
result<cached_parent_handle_adapter<T> > llfio_v2_xxx::algorithm::cache_parent ( Args &&...  args)
inlinenoexcept

Constructs a T adapted into a parent handle caching implementation.

This function works via the construct<T>() free function framework for which your handle implementation must have registered its construction details.

182  {
183  construct<T> constructor{std::forward<Args>(args)...};
184  OUTCOME_TRY(auto &&h, constructor());
185  try
186  {
187  return cached_parent_handle_adapter<T>(std::move(h), constructor.base, constructor._path);
188  }
189  catch(...)
190  {
191  return error_from_exception();
192  }
193  }

◆ clone_or_copy()

result<file_handle::extent_type> llfio_v2_xxx::algorithm::clone_or_copy ( file_handle src,
const path_handle destdir,
path_view  destleaf = {},
bool  preserve_timestamps = true,
bool  force_copy_now = false,
file_handle::creation  creation = file_handle::creation::always_new,
deadline  d = {} 
)
inlinenoexcept

Clone or copy the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf.

Returns
The number of bytes cloned or copied.
Parameters
srcThe file to clone or copy.
destdirThe base to lookup destleaf within.
destleafThe leafname to use. If empty, use the same leafname as src currently has.
preserve_timestampsUse stat_t::stamp() to preserve as much metadata from the original to the clone/copy as possible.
force_copy_nowParameter to pass to file_handle::clone_extents() to force extents to be copied now, not copy-on-write lazily later.
creationHow to create the destination file handle. NOTE that if this is NOT always_new, if the destination has identical maximum extent and last modified timestamp (and permissions on POSIX) to the source, it is NOT copied, and zero is returned.
dDeadline by which to complete the operation.

Firstly, a file_handle is constructed at the destination using creation, which defaults to always creating a new inode. The caching used for the destination handle is replicated from the source handle – be aware that not caching metadata is expensive.

Next file_handle::clone_extents() with emulate_if_unsupported = false is called on the whole file content. If extent cloning is supported, this will be very fast and not consume new disk space (note: except on networked filesystems). If the source file is sparsely allocated, the destination will have identical sparse allocation.

If the previous operation did not succeed, the disk free space is checked using statfs_t, and if the copy would exceed current disk free space, the destination file is unlinked and an error code comparing equal to errc::no_space_on_device is returned.

Next, file_handle::clone_extents() with emulate_if_unsupported = true is called on the whole file content. This copies only the allocated extents in blocks sized whatever is the large page size on this platform (2Mb on x64).

Finally, if preserve_timestamps is true, the destination file handle is restamped with the metadata from the source file handle just before the destination file handle is closed.

◆ compare()

result<comparison_summary> llfio_v2_xxx::algorithm::compare ( const path_handle ldirh,
const path_handle rdirh,
stat_t::want  want = comparison_summary::default_metadata(),
compare_visitor visitor = nullptr,
size_t  threads = 0,
bool  force_slow_path = false 
)
inlinenoexcept

Compares the directories identified by ldirh and rdirh, and everything therein.

This extends summarize() to summarise two directory hierarchies, also summarising the number of types of differences between them.

It is trivially easy to further extend this implementation to synchronise the contents of a directory tree such that after completion, both trees shall be identical. See the examples directory for an example of this use case.

This is a trivial implementation on top of algorithm::traverse(), indeed it is implemented entirely as header code. You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

106  {
107  LLFIO_LOG_FUNCTION_CALL(&dirh);
108  compare_visitor default_visitor;
109  if(visitor == nullptr)
110  {
111  visitor = &default_visitor;
112  }
113  result<comparison_summary> state(in_place_type<comparison_summary>);
114  state.assume_value().want = want;
115  directory_entry entry{{}, stat_t(nullptr)};
116  OUTCOME_TRY(entry.stat.fill(dirh, want));
117  OUTCOME_TRY(compare_visitor::accumulate(state.assume_value(), &state.assume_value(), nullptr, entry, want));
118  OUTCOME_TRY(traverse(dirh, visitor, threads, &state.assume_value(), force_slow_path));
119  return state;
120  }
result< size_t > traverse(const path_handle &dirh, traverse_visitor *visitor, size_t threads=0, void *data=nullptr, bool force_slow_path=false) noexcept
Traverse everything within and under dirh.

◆ contents()

result<contents_visitor::contents_type> llfio_v2_xxx::algorithm::contents ( const path_handle dirh,
contents_visitor visitor = nullptr,
size_t  threads = 0,
bool  force_slow_path = false 
)
inlinenoexcept

Calculate the contents of everything within and under dirh. What is returned is unordered.

This is a very thin veneer over traverse() which came out of the fact that I kept writing "get me the contents" traversal visitors again and again, so eventually I just wrote a library edition. Its only "clever" bit is that it stores the contents in thread local storage, and merges the contents afterwards.

It is race free to concurrent relocations of dirh. It is entirely implemented in header-only code, as it is very simple.

225  {
226  contents_visitor default_visitor;
227  if(visitor == nullptr)
228  {
229  visitor = &default_visitor;
230  }
231  contents_visitor::_state_type state(dirh);
232  OUTCOME_TRY(auto &&dirhpath, dirh.current_path());
233  state.rootdirpathlen.store(dirhpath.native().size() + 1, std::memory_order_relaxed);
234  OUTCOME_TRY(traverse(dirh, visitor, threads, &state, force_slow_path));
235  return {std::move(state.contents)};
236  }

◆ reduce()

result<size_t> llfio_v2_xxx::algorithm::reduce ( directory_handle &&  dirh,
reduce_visitor visitor = nullptr,
size_t  threads = 0,
bool  force_slow_path = false 
)
inlinenoexcept

Reduce the directory identified dirh, and everything therein, to the null set.

You might be surprised to learn that most directory tree removal implementations are of poor quality, not leveraging the filesystem facilities that are available, not handling concurrent modification of the filesystem correctly, having poor performance, or failing to handle not uncommon corner cases. This implementation is considerably better quality, indeed it is to my knowledge the highest quality possible on at least Linux and Microsoft Windows.

The algorithm is as follows:

  1. Attempt to rename dirh to a uniquely named directory. If successful, this causes concurrent users to no longer see the directory tree. It also usefully detects if permissions problems would prevent whole directory tree removal. Note that on Windows, if any process has a handle open to anything inside the directory tree, it is usually the case that all renames will be prevented.
  2. algorithm::traverse() is begun, using the visitor supplied. This will unlink all items using a breadth-first algorithm, from root to tips. With the default visitor, directories which cannot be opened for traversal are ignored; entries which cannot be unlinked are attempted to be renamed into the base directory; entries which cannot be renamed are ignored.
  3. Except for unrenameable files, now the entire directory tree will have been reduced to a minimum possible set of uniquely named items in the base directory, all of which by definition must be undeletable. We now loop attempting to reduce the remaining entries. The default visitor implementation takes a timeout, which if exceeded, an error code comparing equal to errc::timed_out is returned.

Even on slow filesystems such as those on Windows, or networked filesystems, this algorithm performs very well. We do not currently inspect the filing system to see if bisect unlinking directories with millions of entries will perform well (some filing systems can store very large directories with multiple independent inode locks, thus using multiple kernel threads on the same directory sees a large performance increase for very large directories). We also remove items based on enumerated order, under the assumption that filesystems will have optimised for this specific use case.

If the function succeeds, dirh is moved into the function, and the total number of filesystem entries removed is returned.

If the function fails, dirh is NOT moved into the function, and continues to refer to the (likely renamed) directory you passed in. You might do something like try to rename it into storage_backed_temporary_files_directory(), or some other hail mary action.

You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

◆ relink_or_clone_copy_unlink()

result<bool> llfio_v2_xxx::algorithm::relink_or_clone_copy_unlink ( file_handle src,
const path_handle destdir,
path_view  destleaf = {},
bool  atomic_replace = true,
bool  preserve_timestamps = true,
bool  force_copy_now = false,
deadline  d = {} 
)
inlinenoexcept

Relink or clone-unlink/copy-unlink the extents of the filesystem entity identified by src to destdir optionally renamed to destleaf. This lets you relink files across filing systems, with as close as possible matching of semantics to atomic relinking.

Returns
True if a clone-unlink or copy-unlink was performed, false if a simple relink was sufficient. src on return is the relinked handle, which will be to a completely different inode if a clone-unlink or copy-unlink was performed.
Parameters
srcThe file to relink or clone or copy.
destdirThe base to lookup destleaf within.
destleafThe leafname to use. If empty, use the same leafname as src currently has.
atomic_replaceWhether any file entry at the destination should be atomically replaced, or an error returned instead.
preserve_timestampsUse stat_t::stamp() to preserve as much metadata from the original to the clone/copy as possible.
force_copy_nowParameter to pass to file_handle::clone_extents() to force extents to be copied now, not copy-on-write lazily later.
dDeadline by which to complete the operation.

Firstly, .relink() is tried, which if successful, there is an immediate return with value false.

If relinking fails, an anonymous inode file_handle is constructed at the destination. The caching used for the destination handle is replicated from the source handle – be aware that not caching metadata is expensive. If atomic_replace is false, and there is a file entry matching the destination, an error code comparing equal to errc::file_exists will be returned.

Next file_handle::clone_extents() with emulate_if_unsupported = false is called on the whole file content. If extent cloning is supported, this will be very fast and not consume new disk space (note: except on networked filesystems). If the source file is sparsely allocated, the destination will have identical sparse allocation.

If the previous operation did not succeed, the disk free space is checked using statfs_t, and if the copy would exceed current disk free space, the destination file is unlinked and an error code comparing equal to errc::no_space_on_device is returned.

Next, file_handle::clone_extents() with emulate_if_unsupported = true is called on the whole file content. This copies only the allocated extents in blocks sized whatever is the large page size on this platform (2Mb on x64).

Once all the extents have been replicated into temporary inode, a hard link is attempted to the destination leafname. If that fails, if atomic_replace is false then error is returned, otherwise a hard link to a randomly chosen filename is created, and that link is relinked over the destination leafname.

Finally, if preserve_timestamps is true, the destination file handle is restamped with the metadata from the source file handle just before the destination file handle is closed.

◆ summarize()

result<traversal_summary> llfio_v2_xxx::algorithm::summarize ( const path_handle topdirh,
stat_t::want  want = traversal_summary::default_metadata(),
summarize_visitor visitor = nullptr,
size_t  threads = 0,
bool  force_slow_path = false 
)
inlinenoexcept

Summarise the directory identified topdirh, and everything therein.

It can be useful to summarise a directory hierarchy, especially to determine how much storage it occupies, but also how many mounted filesystems it straddles etc. You should specify what metadata you wish to summarise, if this is a subset of what metadata directory_handle::read() returns, performance will be considerably better. The default summarises all possible metadata.

Most errors during summary are accumulated into stats_failed and directory_opens_failed, rather than failing the summary.

This is a trivial implementation on top of algorithm::traverse(), indeed it is implemented entirely as header code. You should review the documentation for algorithm::traverse(), as this algorithm is entirely implemented using that algorithm.

275  {
276  LLFIO_LOG_FUNCTION_CALL(&topdirh);
277  summarize_visitor default_visitor;
278  if(visitor == nullptr)
279  {
280  visitor = &default_visitor;
281  }
282  result<traversal_summary> state(in_place_type<traversal_summary>);
283  state.assume_value().want = want;
284  directory_entry entry{{}, stat_t(nullptr)};
285  directory_handle _dirh;
286  if(!topdirh.is_directory())
287  {
288  OUTCOME_TRY(_dirh, directory_handle::directory(topdirh, {}));
289  }
290  const path_handle &dirh = _dirh.is_valid() ? _dirh : topdirh;
291  // We should fail here if we can't stat topdirh
292  OUTCOME_TRY(entry.stat.fill(dirh, want));
293  summarize_visitor::accumulate(state.assume_value(), &state.assume_value(), nullptr, entry, want);
294  OUTCOME_TRY(traverse(dirh, visitor, threads, &state.assume_value(), force_slow_path));
295  return state;
296  }
result< directory_handle > directory(const path_handle &base, directory_handle::path_view_type path, directory_handle::mode _mode=directory_handle::mode::read, directory_handle::creation _creation=directory_handle::creation::open_existing, directory_handle::caching _caching=directory_handle::caching::all, directory_handle::flag flags=directory_handle::flag::none) noexcept
Definition: directory_handle.hpp:435

◆ traverse()

result<size_t> llfio_v2_xxx::algorithm::traverse ( const path_handle dirh,
traverse_visitor visitor,
size_t  threads = 0,
void *  data = nullptr,
bool  force_slow_path = false 
)
inlinenoexcept

Traverse everything within and under dirh.

The algorithm is as follows:

  1. Call pre_enumeration() of the visitor on the directory_handle about to be enumerated.
  2. Enumerate the contents of the directory.
  3. Call post_enumeration() of the visitor on the contents just enumerated.
  4. For each directory in the contents, append a base directory handle and a directory fragment to its hierarchy depth level in a stack of lists.
  5. Loop, using the least deep available item in the stack, until the stack is empty.

If known_dirs_remaining exceeds four, a threadpool of not more than threads threads is spun up in order to traverse the hierarchy more quickly.

This algorithm is therefore primarily a breadth-first algorithm, in that we proceed from root, level by level, to the tips. The number returned is the total number of directories traversed.

Notes

The implementation tries hard to not open too many file descriptors at a time in order to not exceed the system limit, which may be as low as 1024 on POSIX. On POSIX, it checks getrlimit(RLIMIT_NOFILE) for the soft limit on open file descriptors, and if the remaining unused open file descriptors is less than 65536, it will prefer a slow path implementation which exchanges file descriptor usage for lots more dynamic memory allocation and memory copying. You can force this slow path on any platform using force_slow_path, and in correct code you should always check for failures in directory_open_failed() comparing equal to errc::too_many_files_open, and if encountered restart the traversal using the slow path forced.

Almost every modern POSIX system allows a RLIMIT_NOFILE of over a million nowadays, so you should setrlimit(RLIMIT_NOFILE) appropriately in your program if you are absolutely sure that doing so will not break code in your program (e.g. select() fails spectacularly if file descriptors exceed 1024 on most POSIX).

To give an idea of the difference slow path makes, for Linux ext4:

  • Slow path, 1 thread, traversed 131,915 directories and 8,254,162 entries in 3.10 seconds.
  • Slow path, 16 threads, traversed 131,915 directories and 8,254,162 entries in 0.966 seconds.
  • Fast path, 1 thread, traversed 131,915 directories and 8,254,162 entries in 2.73 seconds (+12%).
  • Fast path, 16 threads, traversed 131,915 directories and 8,254,162 entries in 0.525 seconds (+46%).