LLFIO v2.00
llfio_v2_xxx::map_handle Class Reference

A handle to a memory mapped region of memory, either backed by the system page file or by a section. More...

#include "map_handle.hpp"

Inheritance diagram for llfio_v2_xxx::map_handle:
llfio_v2_xxx::map_handle → llfio_v2_xxx::lockable_byte_io_handle → llfio_v2_xxx::byte_io_handle → llfio_v2_xxx::handle

Classes

struct  cache_statistics
 Statistics about the map handle cache. More...
 

Public Types

enum class  memory_accounting_kind { unknown , commit_charge , over_commit }
 The kind of memory accounting this system uses. More...
 
using extent_type = byte_io_handle::extent_type
 
using size_type = byte_io_handle::size_type
 
using mode = byte_io_handle::mode
 
using creation = byte_io_handle::creation
 
using caching = byte_io_handle::caching
 
using flag = byte_io_handle::flag
 
using buffer_type = byte_io_handle::buffer_type
 
using const_buffer_type = byte_io_handle::const_buffer_type
 
using buffers_type = byte_io_handle::buffers_type
 
using const_buffers_type = byte_io_handle::const_buffers_type
 
template<class T >
using io_request = byte_io_handle::io_request< T >
 
template<class T >
using io_result = byte_io_handle::io_result< T >
 
using path_type = byte_io_handle::path_type
 
using barrier_kind = byte_io_multiplexer::barrier_kind
 
using registered_buffer_type = byte_io_multiplexer::registered_buffer_type
 
template<class T >
using awaitable = byte_io_multiplexer::awaitable< T >
 

Public Member Functions

constexpr map_handle ()
 Default constructor.
 
 map_handle (byte *addr, size_type length, size_type pagesize, section_handle::flag flags, section_handle *section=nullptr, extent_type offset=0) noexcept
 Construct an instance managing pages at addr, length, pagesize and flags
 
constexpr map_handle (map_handle &&o) noexcept
 Implicit move construction of map_handle permitted.
 
 map_handle (const map_handle &)=delete
 No copy construction (use clone())
 
map_handle & operator= (map_handle &&o) noexcept
 Move assignment of map_handle permitted.
 
map_handle & operator= (const map_handle &)=delete
 No copy assignment.
 
void swap (map_handle &o) noexcept
 Swap with another instance.
 
virtual result< void > close () noexcept override
 Unmap the mapped view.
 
virtual native_handle_type release () noexcept override
 Releases the mapped view, but does NOT release the native handle.
 
section_handle * section () const noexcept
 The memory section this handle is using.
 
void set_section (section_handle *s) noexcept
 Sets the memory section this handle is using.
 
byte * address () const noexcept
 The address in memory where this mapped view resides.
 
extent_type offset () const noexcept
 The offset of the memory map.
 
size_type capacity () const noexcept
 The reservation size of the memory map.
 
size_type length () const noexcept
 The size of the memory map. This is the accessible size, NOT the reservation size.
 
span< byte > as_span () noexcept
 The memory map as a span of bytes.
 
span< const byte > as_span () const noexcept
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
 
size_type page_size () const noexcept
 The page size used by the map, in bytes.
 
bool is_nvram () const noexcept
 True if the map is of non-volatile RAM.
 
result< size_type > update_map () noexcept
 Update the size of the memory map to that of any backing section, up to the reservation limit.
 
result< size_type > truncate (size_type newsize, bool permit_relocation) noexcept
 
result< buffer_type > commit (buffer_type region, section_handle::flag flag=section_handle::flag::readwrite) noexcept
 
result< buffer_type > decommit (buffer_type region) noexcept
 
result< void > zero_memory (buffer_type region) noexcept
 
result< buffer_type > do_not_store (buffer_type region) noexcept
 
io_result< buffers_type > read (io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 Read data from the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< buffers_type > read (registered_buffer_type base, io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 
io_result< size_type > read (extent_type offset, std::initializer_list< buffer_type > lst, deadline d=deadline()) noexcept
 
io_result< const_buffers_type > write (io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 Write data to the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< const_buffers_type > write (registered_buffer_type base, io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 
io_result< size_type > write (extent_type offset, std::initializer_list< const_buffer_type > lst, deadline d=deadline()) noexcept
 
virtual result< void > lock_file () noexcept
 Locks the inode referred to by the open handle for exclusive access. More...
 
virtual bool try_lock_file () noexcept
 Tries to lock the inode referred to by the open handle for exclusive access, returning false if lock is currently unavailable. More...
 
virtual void unlock_file () noexcept
 Unlocks a previously acquired exclusive lock.
 
virtual result< void > lock_file_shared () noexcept
 Locks the inode referred to by the open handle for shared access. More...
 
virtual bool try_lock_file_shared () noexcept
 Tries to lock the inode referred to by the open handle for shared access, returning false if lock is currently unavailable. More...
 
virtual void unlock_file_shared () noexcept
 Unlocks a previously acquired shared lock.
 
virtual result< extent_guard > lock_file_range (extent_type offset, extent_type bytes, lock_kind kind, deadline d=deadline()) noexcept
 EXTENSION: Tries to lock the range of bytes specified for shared or exclusive access. Note that this may, or MAY NOT, observe whole file locks placed with lock(), lock_shared() etc. More...
 
result< extent_guard > lock_file_range (io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 
result< extent_guard > lock_file_range (io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 
template<class... Args>
bool try_lock_file_range (Args &&... args) noexcept
 
template<class... Args, class Rep , class Period >
bool try_lock_file_range_for (Args &&... args, const std::chrono::duration< Rep, Period > &duration) noexcept
 
template<class... Args, class Clock , class Duration >
bool try_lock_file_range_until (Args &&... args, const std::chrono::time_point< Clock, Duration > &timeout) noexcept
 
virtual void unlock_file_range (extent_type offset, extent_type bytes) noexcept
 EXTENSION: Unlocks a byte range previously locked. More...
 
byte_io_multiplexer * multiplexer () const noexcept
 The i/o multiplexer this handle will use to multiplex i/o. If this returns null, then this handle has not been registered with an i/o multiplexer yet.
 
virtual result< void > set_multiplexer (byte_io_multiplexer *c=this_thread::multiplexer()) noexcept
 Sets the i/o multiplexer this handle will use to implement read(), write() and barrier(). More...
 
size_t max_buffers () const noexcept
 The maximum number of buffers which a single read or write syscall can (atomically) process at a time for this specific open handle. On POSIX, this is known as IOV_MAX. Preferentially uses any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
result< registered_buffer_type > allocate_registered_buffer (size_t &bytes) noexcept
 Request the allocation of a new registered i/o buffer with the system suitable for maximum performance i/o, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< buffers_type > read (io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 Read data from the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< buffers_type > read (registered_buffer_type base, io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 
io_result< size_type > read (extent_type offset, std::initializer_list< buffer_type > lst, deadline d=deadline()) noexcept
 
template<class... Args>
bool try_read (Args &&... args) noexcept
 
template<class... Args, class Rep , class Period >
bool try_read_for (Args &&... args, const std::chrono::duration< Rep, Period > &duration) noexcept
 
template<class... Args, class Clock , class Duration >
bool try_read_until (Args &&... args, const std::chrono::time_point< Clock, Duration > &timeout) noexcept
 
io_result< const_buffers_type > write (io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 Write data to the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< const_buffers_type > write (registered_buffer_type base, io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 
io_result< size_type > write (extent_type offset, std::initializer_list< const_buffer_type > lst, deadline d=deadline()) noexcept
 
template<class... Args>
bool try_write (Args &&... args) noexcept
 
template<class... Args, class Rep , class Period >
bool try_write_for (Args &&... args, const std::chrono::duration< Rep, Period > &duration) noexcept
 
template<class... Args, class Clock , class Duration >
bool try_write_until (Args &&... args, const std::chrono::time_point< Clock, Duration > &timeout) noexcept
 
virtual io_result< const_buffers_type > barrier (io_request< const_buffers_type > reqs=io_request< const_buffers_type >(), barrier_kind kind=barrier_kind::nowait_data_only, deadline d=deadline()) noexcept
 Issue a write reordering barrier such that writes preceding the barrier will reach storage before writes after this barrier, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation. More...
 
io_result< const_buffers_type > barrier (barrier_kind kind, deadline d=deadline()) noexcept
 
template<class... Args>
bool try_barrier (Args &&... args) noexcept
 
template<class... Args, class Rep , class Period >
bool try_barrier_for (Args &&... args, const std::chrono::duration< Rep, Period > &duration) noexcept
 
template<class... Args, class Clock , class Duration >
bool try_barrier_until (Args &&... args, const std::chrono::time_point< Clock, Duration > &timeout) noexcept
 
awaitable< io_result< buffers_type > > co_read (io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 A coroutinised equivalent to .read() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .read(), if no i/o multiplexer has been set on this handle! More...
 
awaitable< io_result< buffers_type > > co_read (registered_buffer_type base, io_request< buffers_type > reqs, deadline d=deadline()) noexcept
 
awaitable< io_result< const_buffers_type > > co_write (io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 A coroutinised equivalent to .write() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .write(), if no i/o multiplexer has been set on this handle! More...
 
awaitable< io_result< const_buffers_type > > co_write (registered_buffer_type base, io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept
 
awaitable< io_result< const_buffers_type > > co_barrier (io_request< const_buffers_type > reqs=io_request< const_buffers_type >(), barrier_kind kind=barrier_kind::nowait_data_only, deadline d=deadline()) noexcept
 A coroutinised equivalent to .barrier() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .barrier(), if no i/o multiplexer has been set on this handle! More...
 
flag flags () const noexcept
 The flags this handle was opened with.
 
 QUICKCPPLIB_BITFIELD_BEGIN_T (flag, uint16_t)
 Bitwise flags which can be specified. More...
 
void swap (handle &o) noexcept
 Swap with another instance.
 
virtual result< path_type > current_path () const noexcept
 
result< handle > clone () const noexcept
 
bool is_valid () const noexcept
 True if the handle is valid (and usually open)
 
bool is_readable () const noexcept
 True if the handle is readable.
 
bool is_writable () const noexcept
 True if the handle is writable.
 
bool is_append_only () const noexcept
 True if the handle is append only.
 
virtual result< void > set_append_only (bool enable) noexcept
 EXTENSION: Changes whether this handle is append only or not. More...
 
bool is_multiplexable () const noexcept
 True if multiplexable.
 
bool is_nonblocking () const noexcept
 True if nonblocking.
 
bool is_seekable () const noexcept
 True if seekable.
 
bool requires_aligned_io () const noexcept
 True if requires aligned i/o.
 
bool is_kernel_handle () const noexcept
 True if native_handle() is a valid kernel handle.
 
bool is_regular () const noexcept
 True if a regular file or device.
 
bool is_directory () const noexcept
 True if a directory.
 
bool is_symlink () const noexcept
 True if a symlink.
 
bool is_pipe () const noexcept
 True if a pipe.
 
bool is_socket () const noexcept
 True if a socket.
 
bool is_multiplexer () const noexcept
 True if a multiplexer like BSD kqueues, Linux epoll or Windows IOCP.
 
bool is_process () const noexcept
 True if a process.
 
bool is_section () const noexcept
 True if a memory section.
 
bool is_allocation () const noexcept
 True if a memory allocation.
 
bool is_path () const noexcept
 True if a path or a directory.
 
bool is_tls_socket () const noexcept
 True if a TLS socket.
 
bool is_http_socket () const noexcept
 True if a HTTP socket.
 
caching kernel_caching () const noexcept
 Kernel cache strategy used by this handle.
 
bool are_reads_from_cache () const noexcept
 True if the handle uses the kernel page cache for reads.
 
bool are_writes_durable () const noexcept
 True if writes are safely on storage on completion.
 
bool are_safety_barriers_issued () const noexcept
 True if issuing safety fsyncs is on.
 
native_handle_type native_handle () const noexcept
 The native handle used by this handle.
 

Static Public Member Functions

static result< map_handle > map (size_type bytes, bool zeroed=false, section_handle::flag _flag=section_handle::flag::readwrite) noexcept
 
static result< map_handle > reserve (size_type bytes) noexcept
 
static result< map_handle > map (section_handle &section, size_type bytes=0, extent_type offset=0, section_handle::flag _flag=section_handle::flag::readwrite) noexcept
 
static memory_accounting_kind memory_accounting () noexcept
 
static cache_statistics trim_cache (std::chrono::steady_clock::time_point older_than={}, size_t max_items=(size_t) -1) noexcept
 
static bool set_cache_disabled (bool disabled) noexcept
 
static result< span< buffer_type > > prefetch (span< buffer_type > regions) noexcept
 
static result< buffer_type > prefetch (buffer_type region) noexcept
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
 

Protected Member Functions

 map_handle (section_handle *section, section_handle::flag flags)
 
virtual size_t _do_max_buffers () const noexcept override
 The virtualised implementation of max_buffers() used if no multiplexer has been set.
 
virtual io_result< const_buffers_type > _do_barrier (io_request< const_buffers_type > reqs=io_request< const_buffers_type >(), barrier_kind kind=barrier_kind::nowait_data_only, deadline d=deadline()) noexcept override
 The virtualised implementation of barrier() used if no multiplexer has been set.
 
virtual io_result< buffers_type > _do_read (io_request< buffers_type > reqs, deadline d=deadline()) noexcept override
 The virtualised implementation of read() used if no multiplexer has been set.
 
virtual io_result< const_buffers_type > _do_write (io_request< const_buffers_type > reqs, deadline d=deadline()) noexcept override
 The virtualised implementation of write() used if no multiplexer has been set.
 
bool _recycle_map () noexcept
 
virtual result< registered_buffer_type > _do_allocate_registered_buffer (size_t &bytes) noexcept
 The virtualised implementation of allocate_registered_buffer() used if no multiplexer has been set.
 
virtual io_result< buffers_type > _do_read (registered_buffer_type base, io_request< buffers_type > reqs, deadline d) noexcept
 The virtualised implementation of read() used if no multiplexer has been set.
 
virtual io_result< const_buffers_type > _do_write (registered_buffer_type base, io_request< const_buffers_type > reqs, deadline d) noexcept
 The virtualised implementation of write() used if no multiplexer has been set.
 
io_result< buffers_type > _do_multiplexer_read (registered_buffer_type &&base, io_request< buffers_type > reqs, deadline d) noexcept
 
io_result< const_buffers_type > _do_multiplexer_write (registered_buffer_type &&base, io_request< const_buffers_type > reqs, deadline d) noexcept
 
io_result< const_buffers_type > _do_multiplexer_barrier (registered_buffer_type &&base, io_request< const_buffers_type > reqs, barrier_kind kind, deadline d) noexcept
 

Static Protected Member Functions

static result< map_handle > _new_map (size_type bytes, bool fallback, section_handle::flag _flag) noexcept
 
static result< map_handle > _recycled_map (size_type bytes, section_handle::flag _flag) noexcept
 

Protected Attributes

section_handle * _section {nullptr}
 
byte * _addr {nullptr}
 
extent_type _offset {0}
 
size_type _reservation {0}
 
size_type _length {0}
 
size_type _pagesize {0}
 
section_handle::flag _flag {section_handle::flag::none}
 
byte_io_multiplexer * _ctx {nullptr}
 
union {
   native_handle_type   _v
 
   struct {
      intptr_t   _padding0_
 
      uint32_t   _padding1_
 
      flag   flags
 
      uint16_t   _padding2_
 
   }   _
 
}; 
 

Friends

class mapped_file_handle
 

Detailed Description

A handle to a memory mapped region of memory, either backed by the system page file or by a section.

An important concept to realise with mapped regions is that they can far exceed the size of their backing storage. This allows one to reserve address space for a file which may grow in the future. This is how mapped_file_handle is implemented to provide very fast memory mapped file i/o of a potentially growing file.

The size you specify when creating the map handle is the address space reservation. The map's length() will return the last known valid length of the mapped data i.e. the backing storage's length at the time of construction. This length is used by read() and write() to prevent reading and writing off the end of the mapped region. You can update this length to the backing storage's length using update_map() up to the reservation limit.

You can attempt to modify the address space reservation after creation using truncate(). If successful, this will be more efficient than tearing down the map and creating a new larger map.
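
For example, a minimal sketch of that reserve-then-grow pattern (the llfio namespace alias, the llfio::file() creator and section_handle::section() are assumptions here, documented on their own reference pages):

namespace llfio = LLFIO_V2_NAMESPACE;

// Open or create a small file, but reserve 1Mb of address space so the map
// can track future growth of the file without remapping.
llfio::file_handle fh = llfio::file({}, "example.bin",
                                    llfio::file_handle::mode::write,
                                    llfio::file_handle::creation::if_needed).value();
fh.truncate(4096).value();
llfio::section_handle sh = llfio::section_handle::section(fh).value();
llfio::map_handle mh = llfio::map_handle::map(sh, 1024 * 1024).value();

fh.truncate(65536).value();  // the backing storage grows ...
mh.update_map().value();     // ... and length() follows it, up to capacity()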

The native handle returned by this map handle is always that of the backing storage, but closing this handle does not close that of the backing storage, nor does releasing this handle release that of the backing storage. Locking byte ranges of this handle is therefore equivalent to locking byte ranges in the original backing storage, which can be very useful.

On Microsoft Windows, when mapping file content, you should try to always create the first map of that file using a writable handle. See mapped_file_handle for more detail on this.

On Linux, be aware that there is a default limit of 65,530 non-contiguous VMAs per process. It is surprisingly easy to run into this limit in real world applications. You can either require users to issue sysctl -w vm.max_map_count=262144 to increase the kernel limit, or take considerable care to never poke holes into large VMAs. .do_not_store() is very useful here for releasing the resources backing pages without decommitting them.

Commit charge:

All virtual memory systems account for memory allocated, even if never used. This is known as "commit charge". No virtual memory system will permit more pages to be committed than there is storage for them between RAM and the swap files (except for Linux, where most distributions configure "over commit" in the Linux kernel). This ensures that if the system gives you a committed memory page, you are hard guaranteed that writing into it will not fail. Note that memory mapped files have storage backed by their file contents, so except for pages written into and not yet flushed to storage, memory mapped files usually do not contribute more than a few pages each to commit charge.

Note
You can determine the virtual memory accounting model for your system using map_handle::memory_accounting(). This caches the result of interrogating the system, so it is fast after its first call.

The system commit limit can be easily exceeded if programs commit a lot of memory that they never use. To avoid this, for large allocations you should reserve pages which you don't expect to use immediately, and later explicitly commit and decommit them. You can request pages not accounted against the system commit charge using flag::nocommit. For portability, you should always combine flag::nocommit with flag::none; indeed only Linux permits the allocation of usable pages which are not charged against commit. All other platforms enforce that reserved pages must be unusable, and only pages which are committed are usable.
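
A minimal sketch of that reserve-commit-decommit pattern:

namespace llfio = LLFIO_V2_NAMESPACE;
using sflag = llfio::section_handle::flag;

// Reserve 1Gb of address space: flag::nocommit keeps it out of the system
// commit charge, flag::none keeps the reservation portable as advised above.
llfio::map_handle mh =
    llfio::map_handle::map(1024ULL * 1024 * 1024, false, sflag::nocommit | sflag::none).value();

// Explicitly commit, and make usable, only the first 64Kb.
mh.commit({mh.address(), 65536}, sflag::readwrite).value();

// ... use the committed region ...

// Return the commit charge to the system when finished with the region.
mh.decommit({mh.address(), 65536}).value();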

Separate to whether a page is committed or not is whether it actually consumes resources or not. Pages never written into are not stored by virtual memory systems, and much code when considering the memory consumption of a process only considers the portion of the total commit charge which contains modified pages. This makes sense, given the prevalence of code which commits memory it never uses, however it also leads to anti-social outcomes such as Linux distributions enabling pathological workarounds such as over commit and specialised OOM killers.

Map handle caching

Repeatedly freeing and allocating virtual memory is particularly expensive because page contents must be cleared by the system before they can be handed out again. Most kernels clear pages using an idle loop, but if the system is busy then a surprising amount of CPU time can get consumed wiping pages.

Most users of page allocated memory can tolerate receiving dirty pages, so map_handle implements a process-local cache of previously allocated page regions which have since been close()d. If a new map_handle::map() asks for virtual memory and there is a region in the cache, that region is returned instead of a new region.

Before a region is added to the cache, it is decommitted (except on Linux when overcommit is enabled, see below). It therefore only consumes virtual address space in your process, and does not otherwise consume any resources apart from a VMA entry in the kernel. In particular, it does not appear in your process' RAM consumption (except on Linux). When a region is removed from the cache, it is committed, thus adding it to your process' RAM consumption. During this decommit-recommit process the kernel may choose to scavenge the memory, in which case fresh pages will be restored. However there is a good chance that whatever the pages contained before decommit will still be there after recommit.

Linux has a famously messed up virtual memory implementation. LLFIO implements a strict memory accounting model, and ordinarily we tell Linux what pages are to be counted towards commit charge or not so you don't have to. If overcommit is disabled in the system, you then get identical strict memory accounting like on every other OS.

If however overcommit is enabled, we don't decommit pages, but rather mark them LazyFree. This is to avoid inhibiting VMA coalescing, which is super important on Linux because of its ridiculously low per-process VMA limit, typically 64k regions on most installs. Therefore, if you do disable overcommit, you will also need to substantially raise the maximum per-process VMA limit, as LLFIO will then strictly decommit memory, which prevents VMA coalescing and thus generates many more VMAs.

The process-local map handle cache does not self-trim over time, so if you wish to reclaim virtual address space you need to manually call map_handle::trim_cache() from time to time.
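
For example, a periodic trim of regions not recycled within the last minute:

namespace llfio = LLFIO_V2_NAMESPACE;

// Returns a map_handle::cache_statistics snapshot describing the cache.
auto stats = llfio::map_handle::trim_cache(std::chrono::steady_clock::now() - std::chrono::minutes(1));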

Barriers:

map_handle, because it implements byte_io_handle, implements barrier() in a very conservative way to account for OS differences i.e. it calls msync(), and then the barrier() implementation for the backing file (probably fsync() or equivalent on most platforms, which synchronises the entire file).

This is vast overkill if you are using non-volatile RAM, so a special inlined nvram_barrier() implementation taking a single buffer and no other arguments is also provided as a free function. This calls the appropriate architecture-specific instructions to cause the CPU to write all preceding writes out of the write buffers and CPU caches to main memory, so for Intel CPUs this would be CLWB <each cache line>; SFENCE;. As this is inlined, it ought to produce optimal code. If your CPU does not support the requisite instructions (or LLFIO has not added support), an empty buffer will be returned to indicate that nothing was barriered, same as the normal barrier() function.
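
A hedged sketch of its use, assuming mh is an existing map of NVRAM (check the exact signature of nvram_barrier() on its own reference page):

namespace llfio = LLFIO_V2_NAMESPACE;

// Persist a 256 byte region just written into an NVRAM-backed map.
llfio::map_handle::const_buffer_type req{mh.address(), 256};
auto barriered = llfio::nvram_barrier(req);
if(barriered.size() == 0)
{
  // requisite CPU instructions unavailable: fall back to the conservative path
  mh.barrier().value();
}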

Large page support:

Large, huge, massive and super page support is available via the section_handle::flag::page_sizes_N flags. Use these in combination with utils::page_size() to request allocations or maps which use different page sizes.
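
For example (page_sizes_1 is assumed to name the smallest of the larger page sizes on the platform):

namespace llfio = LLFIO_V2_NAMESPACE;
using sflag = llfio::section_handle::flag;

// Try a 2Mb allocation using large pages; this can fail at the whim of the
// system, so fall back to normal pages if refused.
auto r = llfio::map_handle::map(2 * 1024 * 1024, false, sflag::readwrite | sflag::page_sizes_1);
if(!r)
{
  r = llfio::map_handle::map(2 * 1024 * 1024);
}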

Windows:

Firstly, explicit large page support is only available to processes and logged in users who have been assigned the SeLockMemoryPrivilege. A default Windows installation assigns that privilege to nothing, so explicit action will need to be taken to assign that privilege per Windows installation.

Secondly, as of Windows 10 1803, there is the large page size or the normal page size. There isn't (useful) support for pages of any other size, as there is on other systems.

For allocating memory, large page allocation can randomly fail depending on what the system is feeling like, same as on all the other operating systems. It is not permitted to reserve address space using large pages.

For mapping files, large page maps do not work as of Windows 10 1803 (curiously, ReactOS does implement this). There is a big exception to this, which is for DAX formatted NTFS volumes with a formatted cluster size of the large page size, where if you map in large page sized multiples, the Windows kernel uses large pages (and one need not hold SeLockMemoryPrivilege either). Therefore, if you specify section_handle::flag::nvram with a section_handle::flag::page_sizes_N, LLFIO does not ask for large pages which would fail, it merely rounds all requests up to the nearest large page multiple.

Linux:

As usual on Linux, large page (often called huge page on Linux) support comes in many forms.

Explicit support is via MAP_HUGETLB to mmap(), and whether an explicit request succeeds or not is up to how many huge pages were configured into the running system via boot-time kernel parameters, and how many huge pages are in use already. For most recent kernels on most distributions, explicit memory allocation requests using large pages generally work without issue. As of Linux kernel 4.18, mapping files using large pages only works on tmpfs; this corresponds to path_discovery::memory_backed_temporary_files_directory() sourced anonymous section handles. Work is proceeding well for the major Linux filing systems to become able to map files using large pages soon, and in theory LLFIO-based code should "just work" on such a newer kernel.

Note that some distributions enable transparent huge pages, whereby if you request allocations of large page multiples at large page offsets, the kernel uses large pages, without you needing to specify any section_handle::flag::page_sizes_N. Almost all distributions enable opt-in transparent huge pages, where you can explicitly request that pages within a region of memory transparently use huge pages as much as possible. LLFIO does not expose such facilities; you will need to manually invoke madvise(MADV_HUGEPAGE) on the desired region, as sketched below.
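
A sketch of doing that manually on Linux:

#include <sys/mman.h>  // Linux: madvise(), MADV_HUGEPAGE

namespace llfio = LLFIO_V2_NAMESPACE;

// Opt a 64Mb region in to opt-in transparent huge pages.
llfio::map_handle mh = llfio::map_handle::map(64 * 1024 * 1024).value();
if(::madvise(mh.address(), mh.length(), MADV_HUGEPAGE) != 0)
{
  // the kernel lacks transparent huge page support, or the region is unsuitable
}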

FreeBSD:

FreeBSD has no support for failing if large pages cannot be used for a specific mmap(). The best you can do is to ask for large pages, and you may or may not get them depending on available system resources, filing system in use, etc. LLFIO does not check returned maps to discover if large pages were actually used; that check is left to end user code which really needs to know.

MacOS:

MacOS only supports large pages for memory allocations, not for mapping files. It fails if large pages could not be used when a large page allocation was requested.

See also
mapped_file_handle, algorithm::mapped_span

Member Enumeration Documentation

◆ memory_accounting_kind

The kind of memory accounting this system uses.

Enumerator
commit_charge 

This system will not permit more than physical RAM and your swap files to be committed. On every OS except for Linux, this is always the case.

over_commit 

This system will permit more memory to be committed than physical RAM and your swap files, and will terminate your process without warning at some unknown point if you write into enough of the pages committed. This is typically the default on Linux, but it can be changed at runtime.

  {
    unknown,
    /*! This system will not permit more than physical RAM and your swap files to be committed.
    On every OS except for Linux, this is always the case.
    */
    commit_charge,
    /*! This system will permit more memory to be committed than physical RAM and your swap
    files, and will terminate your process without warning at some unknown point
    if you write into enough of the pages committed. This is typically the default on Linux,
    but it can be changed at runtime.
    */
    over_commit
  };
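
For example, code which must behave differently under overcommit could branch like so:

namespace llfio = LLFIO_V2_NAMESPACE;

if(llfio::map_handle::memory_accounting() == llfio::map_handle::memory_accounting_kind::over_commit)
{
  // committed pages are not hard-guaranteed here; writing into them can
  // still invoke the OOM killer
}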

Member Function Documentation

◆ allocate_registered_buffer()

result<registered_buffer_type> llfio_v2_xxx::byte_io_handle::allocate_registered_buffer ( size_t &  bytes)
inline noexcept inherited

Request the allocation of a new registered i/o buffer with the system suitable for maximum performance i/o, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation.

Returns
A shared pointer to the i/o buffer. Note that, via shared ptr's aliasing feature, the pointer returned is not the resource under management.
Parameters
bytes - The size of the i/o buffer requested. This may be rounded (considerably) upwards; you should always use the value returned.

Some i/o multiplexer implementations have the ability to allocate i/o buffers in special memory shared between the i/o hardware and user space processes. Using registered i/o buffers can entirely eliminate all kernel transitions and memory copying during i/o, and can saturate very high end hardware from a single kernel thread.

If no multiplexer is set, the default implementation uses map_handle to allocate raw memory pages from the OS kernel. If the requested buffer size is a multiple of one of the larger page sizes from utils::page_sizes(), the request will first be attempted using the larger page size.

  {
    if(_ctx == nullptr)
    {
      return _do_allocate_registered_buffer(bytes);
    }
    return _ctx->do_byte_io_handle_allocate_registered_buffer(this, bytes);
  }
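
A usage sketch, assuming h is some open byte_io_handle and that the registered buffer dereferences to a data()/size() span of bytes:

namespace llfio = LLFIO_V2_NAMESPACE;

size_t bytes = 1024 * 1024;  // may be rounded up considerably; always use the value written back
auto buf = h.allocate_registered_buffer(bytes).value();

// Pass the registered buffer as `base` so a multiplexer can do zero-copy i/o.
llfio::byte_io_handle::buffer_type b(buf->data(), buf->size());
auto filled = h.read(buf, {{&b, 1}, 0}).value();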

◆ barrier()

virtual io_result<const_buffers_type> llfio_v2_xxx::byte_io_handle::barrier ( io_request< const_buffers_type >  reqs = io_request<const_buffers_type>(),
barrier_kind  kind = barrier_kind::nowait_data_only,
deadline  d = deadline() 
)
inline virtual noexcept inherited

Issue a write reordering barrier such that writes preceding the barrier will reach storage before writes after this barrier, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation.

Warning
Assume that this call is a no-op. It is not reliably implemented in many common use cases, for example if your code is running inside a LXC container, or if the user has mounted the filing system with non-default options. Instead open the handle with caching::reads, which means that all writes form a strict sequential order, not completing until acknowledged by the storage device. Filing systems can and do use different algorithms to give much better performance with caching::reads, some (e.g. ZFS) spectacularly better.
Let me repeat again: consider this call to be a hint to poke the kernel with a stick to go start to do some work sooner rather than later. It may be ignored entirely.
For portability, you can only assume that a barrier orders writes for a single handle instance. You cannot assume that it orders writes across multiple handles to the same inode, or across processes.
Returns
The buffers barriered, which may not be the buffers input. The size of each scatter-gather buffer is updated with the number of bytes of that buffer barriered.
Parameters
reqs - A scatter-gather and offset request for what range to barrier. May be ignored on some platforms which always write barrier the entire file. Supplying a default initialised reqs write barriers the entire file.
kind - Which kind of write reordering barrier to perform.
d - An optional deadline by which the i/o must complete, else it is cancelled. Note the function may return significantly after this deadline if the i/o takes long to cancel.
Errors returnable: Any of the values POSIX fdatasync() or Windows NtFlushBuffersFileEx() can return.
Memory Allocations: None.
  {
    return (_ctx == nullptr) ? _do_barrier(reqs, kind, d) : _do_multiplexer_barrier({}, std::move(reqs), kind, d);
  }
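
Bearing those warnings in mind, a typical whole-file barrier via the convenience overload looks like this (barrier_kind::wait_all is assumed to be the blocking all-metadata kind; mh is assumed to be an open map_handle):

namespace llfio = LLFIO_V2_NAMESPACE;

// Treat success as advisory only, per the warning above.
mh.barrier(llfio::map_handle::barrier_kind::wait_all).value();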

◆ clone()

result<handle> llfio_v2_xxx::handle::clone ( ) const
inline noexcept inherited

Clone this handle (copy constructor is disabled to avoid accidental copying)

Errors returnable: Any of the values POSIX dup() or DuplicateHandle() can return.

◆ co_barrier()

awaitable<io_result<const_buffers_type> > llfio_v2_xxx::byte_io_handle::co_barrier ( io_request< const_buffers_type >  reqs = io_request<const_buffers_type>(),
barrier_kind  kind = barrier_kind::nowait_data_only,
deadline  d = deadline() 
)
inline noexcept inherited

A coroutinised equivalent to .barrier() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .barrier(), if no i/o multiplexer has been set on this handle!

The awaitable returned is eager i.e. it immediately begins the i/o. If the i/o completes and finishes immediately, no coroutine suspension occurs.

  {
    if(_ctx == nullptr)
    {
      return awaitable<io_result<const_buffers_type>>(barrier(std::move(reqs), kind, d));
    }
    awaitable<io_result<const_buffers_type>> ret;
    ret.set_state(_ctx->construct(ret._state_storage, this, nullptr, {}, d, std::move(reqs), kind));
    return ret;
  }

◆ co_read()

awaitable<io_result<buffers_type> > llfio_v2_xxx::byte_io_handle::co_read ( io_request< buffers_type >  reqs,
deadline  d = deadline() 
)
inline noexcept inherited

A coroutinised equivalent to .read() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .read(), if no i/o multiplexer has been set on this handle!

The awaitable returned is eager i.e. it immediately begins the i/o. If the i/o completes and finishes immediately, no coroutine suspension occurs.

  {
    if(_ctx == nullptr)
    {
      return awaitable<io_result<buffers_type>>(read(std::move(reqs), d));
    }
    awaitable<io_result<buffers_type>> ret;
    ret.set_state(_ctx->construct(ret._state_storage, this, nullptr, {}, d, std::move(reqs)));
    return ret;
  }

◆ co_write()

awaitable<io_result<const_buffers_type> > llfio_v2_xxx::byte_io_handle::co_write ( io_request< const_buffers_type >  reqs,
deadline  d = deadline() 
)
inline noexcept inherited

A coroutinised equivalent to .write() which suspends the coroutine until the i/o finishes. Blocks execution, i.e. is equivalent to .write(), if no i/o multiplexer has been set on this handle!

The awaitable returned is eager i.e. it immediately begins the i/o. If the i/o completes and finishes immediately, no coroutine suspension occurs.

  {
    if(_ctx == nullptr)
    {
      return awaitable<io_result<const_buffers_type>>(write(std::move(reqs), d));
    }
    awaitable<io_result<const_buffers_type>> ret;
    ret.set_state(_ctx->construct(ret._state_storage, this, nullptr, {}, d, std::move(reqs)));
    return ret;
  }

◆ commit()

result<buffer_type> llfio_v2_xxx::map_handle::commit ( buffer_type  region,
section_handle::flag  flag = section_handle::flag::readwrite 
)
inline noexcept

Ask the system to commit the system resources to make the memory represented by the buffer available with the given permissions. addr and length should be page aligned (see page_size()); if not, the returned buffer is the region actually committed.

◆ current_path()

virtual result<path_type> llfio_v2_xxx::handle::current_path ( ) const
inline virtual noexcept inherited

Returns the current path of the open handle as said by the operating system. Note that you are NOT guaranteed that any path refreshed bears any resemblance to the original; some operating systems will return some different path which still reaches the same inode via some other route e.g. hardlinks, dereferenced symbolic links, etc. Windows and Linux correctly track changes to the specific path the handle was opened with, not getting confused by other hard links. MacOS nearly gets it right, but under some circumstances e.g. renaming may switch to a different hard link's path, which is almost certainly a bug.

If LLFIO was not able to determine the current path for this open handle e.g. the inode has been unlinked, it returns an empty path. Be aware that FreeBSD can return an empty (deleted) path for file inodes no longer cached by the kernel path cache; LLFIO cannot detect the difference. FreeBSD will also return any path leading to the inode if it is hard linked. FreeBSD does implement path retrieval for directory inodes correctly however, and see algorithm::cached_parent_handle_adapter<T> for a handle adapter which makes use of that.

On Linux if /proc is not mounted, this call fails with an error. All APIs in LLFIO which require the use of current_path() can be told to not use it e.g. flag::disable_safety_unlinks. It is up to you to detect if current_path() is not working, and to change how you call LLFIO appropriately.

On Windows, you will almost certainly get back a path of the form \!!\Device\HarddiskVolume10\Users\ned\.... See path_view for what all the path prefix sequences mean, but to summarise the \!!\ prefix is LLFIO-only and will not be accepted by other Windows APIs. Pass LLFIO derived paths through the function to_win32_path() to Win32-ise them. This function is also available on Linux where it does nothing, so you can use it in portable code.

Warning
This call is expensive, it always asks the kernel for the current path, and no checking is done to ensure what the kernel returns is accurate or even sensible. Be aware that despite these precautions, paths are unstable and can change randomly at any moment. Most code written to use absolute file system paths is racy, so don't do it; use path_handle to fix a base location on the file system and work from that anchor instead!
Memory Allocations: At least one malloc for the path_type, likely several more.
See also
algorithm::cached_parent_handle_adapter<T> which overrides this with an implementation based on retrieving the current path of a cached handle to the parent directory. On platforms with instability or failure to retrieve the correct current path for regular files, the cached parent handle adapter works around the problem by taking advantage of directory inodes not having the same instability problems on any platform.

Reimplemented in llfio_v2_xxx::symlink_handle, and llfio_v2_xxx::process_handle.

◆ decommit()

result<buffer_type> llfio_v2_xxx::map_handle::decommit ( buffer_type  region)
inline noexcept

Ask the system to make the memory represented by the buffer unavailable and to decommit the system resources representing them. addr and length should be page aligned (see page_size()); if not, the returned buffer is the region actually decommitted.

◆ do_not_store()

result<buffer_type> llfio_v2_xxx::map_handle::do_not_store ( buffer_type  region)
inline noexcept

Ask the system to unset the dirty flag for the memory represented by the buffer. This will prevent any changes not yet sent to the backing storage from being sent in the future; also, if the system kicks out this page and reloads it, you may see some edition of the underlying storage instead of what was here. addr and length should be page aligned (see page_size()); if not, the returned buffer is the region actually undirtied.

Note that commit charge is not affected by this operation, as writes into the undirtied pages are guaranteed to succeed.

You should be aware that on Microsoft Windows, the platform syscall for discarding virtual memory pages becomes hideously slow when called upon committed pages within a large address space reservation. All three syscalls were trialled, and the least worst is actually DiscardVirtualMemory(), which is what this function uses. However it still suffers exponential slowdown as more pages within a large reservation become committed, e.g. 8Gb committed within a 2Tb reservation is approximately 20x slower than when less than 1Gb is committed. Non-Windows platforms do not have this problem.

Warning
This function destroys the contents of unwritten pages in the region in a totally unpredictable fashion. Only use it if you don't care how much of the region reaches physical storage or not. Note that the region is not necessarily zeroed, and may be randomly zeroed.
Note
Microsoft Windows does not support unsetting the dirty flag on file backed maps, so on Windows this call does nothing.
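
A usage sketch, assuming mh is an existing map_handle:

namespace llfio = LLFIO_V2_NAMESPACE;

// Give back the physical pages behind the first sixteen pages of the map
// without decommitting them; their contents become undefined afterwards.
mh.do_not_store({mh.address(), mh.page_size() * 16}).value();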

◆ lock_file()

virtual result<void> llfio_v2_xxx::lockable_byte_io_handle::lock_file ( )
inline virtual noexcept inherited

Locks the inode referred to by the open handle for exclusive access.

Note that this may, or may not, interact with the byte range lock extensions. See unique_file_lock for a RAII locker.

Errors returnable: Any of the values POSIX flock() can return.
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.

◆ lock_file_range()

virtual result<extent_guard> llfio_v2_xxx::lockable_byte_io_handle::lock_file_range ( extent_type  offset,
extent_type  bytes,
lock_kind  kind,
deadline  d = deadline() 
)
inline virtual noexcept inherited

EXTENSION: Tries to lock the range of bytes specified for shared or exclusive access. Note that this may, or MAY NOT, observe whole file locks placed with lock(), lock_shared() etc.

Be aware this passes through the same semantics as the underlying OS call, including any POSIX insanity present on your platform:

  • Any fd closed on an inode must release all byte range locks on that inode for all other fds. If your OS isn't new enough to support the non-insane lock API, flag::byte_lock_insanity will be set in flags() after the first call to this function.
  • Threads replace each other's locks, indeed locks replace each other's locks.

You almost certainly should use your choice of an algorithm::shared_fs_mutex::* instead of this, as those are more portable and performant, or use the SharedMutex modelling member functions which lock the whole inode for exclusive or shared access.

Warning
This is a low-level API which you should not use directly in portable code. Another issue is that atomic lock upgrade/downgrade, if your platform implements that (you should assume it does not in portable code), means that on POSIX you need to release the old extent_guard after creating a new one over the same byte range, otherwise the old extent_guard's destructor will simply unlock the range entirely. On Windows however upgrade/downgrade locks overlay, so on that platform you must not release the old extent_guard. Look into algorithm::shared_fs_mutex::safe_byte_ranges for a portable solution.
Returns
An extent guard, the destruction of which will call unlock().
Parameters
offset - The offset to lock. Note that on POSIX the top bit is always cleared before use as POSIX uses signed transport for offsets. If you want an advisory rather than mandatory lock on Windows, one technique is to force the top bit set so the region you lock is not the one you will i/o - obviously this reduces maximum file size to (2^63)-1.
bytes - The number of bytes to lock. Setting this and the offset to zero causes the whole file to be locked.
kind - Whether the lock is to be shared or exclusive.
d - An optional deadline by which the lock must complete, else it is cancelled.
Errors returnable: Any of the values POSIX fcntl() can return; errc::timed_out; errc::not_supported may be returned if deadline i/o is not possible with this particular handle configuration (e.g. non-overlapped HANDLE on Windows).
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.

Reimplemented in llfio_v2_xxx::fast_random_file_handle.
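
A usage sketch, assuming h is an open lockable_byte_io_handle:

namespace llfio = LLFIO_V2_NAMESPACE;

// Exclusively lock the first 4096 bytes of the backing inode; the returned
// extent_guard unlocks the range when it goes out of scope.
auto guard = h.lock_file_range(0, 4096, llfio::lock_kind::exclusive).value();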

◆ lock_file_shared()

virtual result<void> llfio_v2_xxx::lockable_byte_io_handle::lock_file_shared ( )
inline virtual noexcept inherited

Locks the inode referred to by the open handle for shared access.

Note that this may, or may not, interact with the byte range lock extensions. See unique_file_lock for a RAII locker.

Errors returnable: Any of the values POSIX flock() can return.
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.

◆ map() [1/2]

static result<map_handle> llfio_v2_xxx::map_handle::map ( section_handle section,
size_type  bytes = 0,
extent_type  offset = 0,
section_handle::flag  _flag = section_handle::flag::readwrite 
)
inline static noexcept

Create a memory mapped view of a backing storage, optionally reserving additional address space for later growth.

Parameters
section - A memory section handle specifying the backing storage to use.
bytes - How many bytes to reserve (0 = the size of the section). Rounded up to nearest 64Kb on Windows.
offset - The offset into the backing storage to map from. This can be byte granularity, but be careful if you use non-pagesize offsets (see below).
_flag - The permissions with which to map the view, which are constrained by the permissions of the memory section. flag::none can be useful for reserving virtual address space without committing system resources; use commit() to later change availability of memory. Note that apart from read/write/cow/execute, the section's flags override the map's flags.
Errors returnable: Any of the values POSIX mmap() or NtMapViewOfSection() can return.
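
For example, mapping a read-only window at a non-zero offset (sh is assumed to be an existing section_handle):

namespace llfio = LLFIO_V2_NAMESPACE;

// Map 64Kb of the section starting 1Mb in, read-only.
llfio::map_handle mh =
    llfio::map_handle::map(sh, 65536, 1024 * 1024, llfio::section_handle::flag::read).value();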

◆ map() [2/2]

static result<map_handle> llfio_v2_xxx::map_handle::map ( size_type  bytes,
bool  zeroed = false,
section_handle::flag  _flag = section_handle::flag::readwrite 
)
inline static noexcept

Map unused memory into view, creating new memory if insufficient unused memory is available (i.e. add the returned memory to the process' commit charge, unless flag::nocommit was specified). Note that the memory mapped by this call may contain non-zero bits (recycled memory) unless zeroed is true.

Parameters
bytes - How many bytes to map. Typically will be rounded up to a multiple of the page size (see page_size()).
zeroed - Set to true if only all-bits-zero memory is wanted. If this is true, a syscall is always performed as the kernel probably has zeroed pages ready to go, whereas if false, the request may be satisfied from a local cache instead. The default is false.
_flag - The permissions with which to map the view.
Note
On Microsoft Windows this constructor uses the faster VirtualAlloc() which creates less versatile page backed memory. If you want anonymous memory allocated from a paging file backed section instead, create a page file backed section and then a mapped view from that using the other constructor. This makes available all those very useful VM tricks Windows can do with section mapped memory which VirtualAlloc() memory cannot do.

When this kind of map handle is closed, it is added to an internal cache so new map handle creations of this kind with zeroed = false are very quick and avoid a syscall. The internal cache may return a map slightly bigger than requested. If you wish to always invoke the syscall, specify zeroed = true.

When maps are added to the internal cache, on all systems except Linux the memory is decommitted first. This reduces commit charge appropriately, thus only virtual address space remains consumed. On Linux, if memory_accounting() is memory_accounting_kind::commit_charge, we also decommit, however be aware that this can increase the average VMA use count in the Linux kernel, and most Linux kernels are configured with a very low per-process limit of 64k VMAs (this is easy to raise using sysctl -w vm.max_map_count=262144). Otherwise on Linux to avoid increasing VMA count we instead mark closed maps as LazyFree, which means that their contents can be arbitrarily disposed of by the Linux kernel as needed, but also allows Linux to coalesce VMAs so the very low per-process limit is less likely to be exceeded. If the LazyFree syscall is not implemented on this Linux, we do nothing.

Warning
The cache does not self-trim on its own; you MUST call trim_cache() to trim allocations of virtual address space (these don't count towards process commit charge, but they do consume address space and precious VMAs in the Linux kernel). Only in 32 bit processes, where virtual address space is limited, or on Linux, where the number of VMAs allocated is considered by the Linux OOM killer, are you likely to need to care much about regular cache trimming.
Errors returnable: Any of the values POSIX mmap() or VirtualAlloc() can return.
  {
    return (zeroed || (_flag & section_handle::flag::nocommit)) ? _new_map(bytes, true, _flag) : _recycled_map(bytes, _flag);
  }
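
A usage sketch showing the cache interaction:

#include <cstring>  // std::memset

namespace llfio = LLFIO_V2_NAMESPACE;

// Allocate 1Mb of page-backed memory; contents may be recycled (non-zero).
llfio::map_handle mh = llfio::map_handle::map(1024 * 1024).value();
llfio::span<llfio::byte> mem = mh.as_span();
std::memset(mem.data(), 0, mem.size());

// close() returns the region to the process-local cache; a later map() of a
// similar size may get it back without a syscall.
mh.close().value();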

◆ max_buffers()

size_t llfio_v2_xxx::byte_io_handle::max_buffers ( ) const
inline noexcept inherited

The maximum number of buffers which a single read or write syscall can (atomically) process at a time for this specific open handle. On POSIX, this is known as IOV_MAX. Preferentially uses any i/o multiplexer set over the virtually overridable per-class implementation.

Note that the actual number of buffers accepted for a read or a write may be significantly lower than this system-defined limit, depending on available resources. The read() or write() call will return the buffers accepted at the time of invoking the syscall.

Note also that some OSs will error out if you supply more than this limit to read() or write(), but other OSs do not. Some OSs guarantee that each i/o syscall has effects atomically visible or not to other i/o, other OSs do not.

OS X does not implement scatter-gather file i/o syscalls. Thus this function will always return 1 in that situation.

Microsoft Windows may implement scatter-gather i/o under certain handle configurations. Most of the time for non-socket handles this function will return 1.

For handles which implement i/o entirely in user space, and thus syscalls are not involved, this function will return 0.

  {
    if(_ctx == nullptr)
    {
      return _do_max_buffers();
    }
    return _ctx->do_byte_io_handle_max_buffers(this);
  }

◆ prefetch()

static result<span<buffer_type> > llfio_v2_xxx::map_handle::prefetch ( span< buffer_type regions)
inline static noexcept

Ask the system to begin to asynchronously prefetch the span of memory regions given, returning the regions actually prefetched. Note that on Windows 7 or earlier the system call to implement this was not available, and so you will see an empty span returned.
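
A usage sketch, assuming mh is an existing map_handle:

namespace llfio = LLFIO_V2_NAMESPACE;

// Begin asynchronously faulting in the whole map ahead of first access.
auto prefetched = llfio::map_handle::prefetch({mh.address(), mh.length()}).value();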

◆ QUICKCPPLIB_BITFIELD_BEGIN_T()

llfio_v2_xxx::handle::QUICKCPPLIB_BITFIELD_BEGIN_T ( flag  ,
uint16_t   
)
inline inherited

Bitwise flags which can be specified.

No flags.

Unlinks the file on handle close. On POSIX, this simply unlinks whatever is pointed to by path() upon the call of close() if and only if the inode matches. On Windows, if you are on Windows 10 1709 or later, exactly the same thing occurs. On previous editions of Windows, the file entry does not disappear but becomes unavailable for anyone else to open with an errc::resource_unavailable_try_again error return. Because this is confusing, unless the win_disable_unlink_emulation flag is also specified, this POSIX behaviour is somewhat emulated by LLFIO on older Windows by renaming the file to a random name on close() causing it to appear to have been unlinked immediately.

Some kernel caching modes have unhelpfully inconsistent behaviours in getting your data onto storage, so by default, unless this flag is specified, LLFIO adds extra fsyncs to the following operations for the caching modes specified below:

  • truncation of file length, either explicitly or during file open.
  • closing of the handle, either explicitly or in the destructor.

Additionally, on Linux only, to prevent loss of file metadata:

  • on the parent directory whenever a file might have been created.
  • on the parent directory on file close.

This only occurs for these kernel caching modes: caching::none, caching::reads, caching::reads_and_metadata and caching::safety_barriers.

file_handle::unlink() could accidentally delete the wrong file if someone has renamed the open file handle since the time it was opened. To prevent this occurring, where the OS doesn't provide race-free unlink-by-open-handle we compare the inode of the path we are about to unlink with that of the open handle before unlinking.

Warning
This does not prevent races where in between the time of checking the inode and executing the unlink a third party changes the item about to be unlinked. Only operating systems with a true race-free unlink syscall are race free.

Ask the OS to disable prefetching of data. This can improve random i/o performance.

Ask the OS to maximise prefetching of data, possibly prefetching the entire file into kernel cache. This can improve sequential i/o performance.

See the documentation for unlink_on_first_close.

Microsoft Windows NTFS, having been created in the late 1980s, did not originally implement extents-based storage and thus could only represent sparse files via efficient compression of intermediate zeros. With NTFS v3.0 (Microsoft Windows 2000), a proper extents-based on-storage representation was added, thus allowing only 64Kb extent chunks written to be stored irrespective of whatever the maximum file extent was set to.

For various historical reasons, extents-based storage is disabled by default in newly created files on NTFS, unlike in almost every other major filing system. You have to explicitly "opt in" to extents-based storage.

As extents-based storage is nearly cost free on NTFS, LLFIO by default opts in to extents-based storage for any empty file it creates. If you don't want this, you can specify this flag to prevent that happening.

Filesystems tend to be embarrassingly parallel for operations performed to different inodes. Where LLFIO performs i/o to multiple inodes at a time, it will use OpenMP or the Parallelism or Concurrency standard library extensions to usually complete the operation in constant rather than linear time. If you don't want this default, you can disable default using this flag.

Microsoft Windows NTFS has the option, when creating a directory, to set whether leafname lookup will be case sensitive. This is the only way of getting exact POSIX semantics on Windows without resorting to editing the system registry, however it also affects all code doing lookups within that directory, so we must default it to off.

Create the handle in a way where i/o upon it can be multiplexed with other i/o on the same initiating thread of execution i.e. you can perform more than one read concurrently, without using threads. The blocking operations .read() and .write() may have to use a less efficient, but cancellable, blocking implementation for handles created in this way. On Microsoft Windows, this creates handles with OVERLAPPED semantics. On POSIX, this creates handles with nonblocking semantics for non-file handles such as pipes and sockets, however for file, directory and symlink handles it does not set nonblocking, as it is non-portable.

< Using insane POSIX byte range locks

< This is an inode created with no representation on the filing system

{
  none = uint16_t(0), //!< No flags
  /*! Unlinks the file on handle close. On POSIX, this simply unlinks whatever is pointed
  to by `path()` upon the call of `close()` if and only if the inode matches. On Windows,
  if you are on Windows 10 1709 or later, exactly the same thing occurs. If on previous
  editions of Windows, the file entry does not disappear but becomes unavailable for
  anyone else to open with an `errc::resource_unavailable_try_again` error return. Because this is confusing, unless the
  `win_disable_unlink_emulation` flag is also specified, this POSIX behaviour is
  somewhat emulated by LLFIO on older Windows by renaming the file to a random name on `close()`
  causing it to appear to have been unlinked immediately.
  */
  unlink_on_first_close = uint16_t(1U << 0U),

  /*! Some kernel caching modes have unhelpfully inconsistent behaviours
  in getting your data onto storage, so by default unless this flag is
  specified LLFIO adds extra fsyncs to the following operations for the
  caching modes specified below:
  * truncation of file length either explicitly or during file open.
  * closing of the handle either explicitly or in the destructor.

  Additionally on Linux only to prevent loss of file metadata:
  * On the parent directory whenever a file might have been created.
  * On the parent directory on file close.

  This only occurs for these kernel caching modes:
  * caching::none
  * caching::reads
  * caching::reads_and_metadata
  * caching::safety_barriers
  */
  disable_safety_barriers = uint16_t(1U << 2U),
  /*! `file_handle::unlink()` could accidentally delete the wrong file if someone has
  renamed the open file handle since the time it was opened. To prevent this occurring,
  where the OS doesn't provide race free unlink-by-open-handle we compare the inode of
  the path we are about to unlink with that of the open handle before unlinking.
  \warning This does not prevent races where in between the time of checking the inode
  and executing the unlink a third party changes the item about to be unlinked. Only
  operating systems with a true race-free unlink syscall are race free.
  */
  disable_safety_unlinks = uint16_t(1U << 3U),
  /*! Ask the OS to disable prefetching of data. This can improve random
  i/o performance.
  */
  disable_prefetching = uint16_t(1U << 4U),
  /*! Ask the OS to maximise prefetching of data, possibly prefetching the entire file
  into kernel cache. This can improve sequential i/o performance.
  */
  maximum_prefetching = uint16_t(1U << 5U),

  win_disable_unlink_emulation = uint16_t(1U << 9U), //!< See the documentation for `unlink_on_first_close`
  /*! Microsoft Windows NTFS, having been created in the late 1980s, did not originally
  implement extents-based storage and thus could only represent sparse files via
  efficient compression of intermediate zeros. With NTFS v3.0 (Microsoft Windows 2000),
  a proper extents-based on-storage representation was added, thus allowing only the
  64Kb extent chunks actually written to be stored irrespective of whatever the maximum
  file extent was set to.

  For various historical reasons, extents-based storage is disabled by default in newly
  created files on NTFS, unlike in almost every other major filing system. You have to
  explicitly "opt in" to extents-based storage.

  As extents-based storage is nearly cost free on NTFS, LLFIO by default opts in to
  extents-based storage for any empty file it creates. If you don't want this, you
  can specify this flag to prevent that happening.
  */
  win_disable_sparse_file_creation = uint16_t(1U << 10U),
  /*! Filesystems tend to be embarrassingly parallel for operations performed to different
  inodes. Where LLFIO performs i/o to multiple inodes at a time, it will use OpenMP or
  the Parallelism or Concurrency standard library extensions to usually complete the
  operation in constant rather than linear time. If you don't want this default, you can
  disable this default using this flag.
  */
  disable_parallelism = uint16_t(1U << 11U),
  /*! Microsoft Windows NTFS has the option, when creating a directory, to set whether
  leafname lookup will be case sensitive. This is the only way of getting exact POSIX
  semantics on Windows without resorting to editing the system registry, however it also
  affects all code doing lookups within that directory, so we must default it to off.
  */
  win_create_case_sensitive_directory = uint16_t(1U << 12U),

  /*! Create the handle in a way where i/o upon it can be multiplexed with other i/o
  on the same initiating thread of execution i.e. you can perform more than one read
  concurrently, without using threads. The blocking operations `.read()` and `.write()`
  may have to use a less efficient, but cancellable, blocking implementation for handles created
  in this way. On Microsoft Windows, this creates handles with `OVERLAPPED` semantics.
  On POSIX, this creates handles with nonblocking semantics for non-file handles such
  as pipes and sockets, however for file, directory and symlink handles it does not set
  nonblocking, as it is non-portable.
  */
  multiplexable = uint16_t(1U << 13U),

  // NOTE: IF UPDATING THIS UPDATE THE std::ostream PRINTER BELOW!!!

  byte_lock_insanity = uint16_t(1U << 14U), //!< Using insane POSIX byte range locks
  anonymous_inode = uint16_t(1U << 15U) //!< This is an inode created with no representation on the filing system
} QUICKCPPLIB_BITFIELD_END(flag)
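As these are bitfield flags, individual bits combine with `|`. A hedged sketch (the file name and the particular flag combination are illustrative, not prescriptive):

// Open or create a scratch file which unlinks itself on close and asks
// the OS not to prefetch its contents.
auto fh = llfio::file_handle::file({}, "scratch.bin",
                                   llfio::file_handle::mode::write,
                                   llfio::file_handle::creation::if_needed,
                                   llfio::file_handle::caching::all,
                                   llfio::file_handle::flag::unlink_on_first_close |
                                   llfio::file_handle::flag::disable_prefetching).value();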

◆ read() [1/2]

io_result<buffers_type> llfio_v2_xxx::byte_io_handle::read ( io_request< buffers_type >  reqs, deadline  d = deadline() )
inline noexcept inherited

Read data from the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation.

Warning
Depending on the implementation backend, very different buffers may be returned than you supplied. You should always use the buffers returned and assume that they point to different memory and that each buffer's size will have changed.
Returns
The buffers read, which may not be the buffers input. The size of each scatter-gather buffer returned is updated with the number of bytes of that buffer transferred, and the pointer to the data may be completely different to what was submitted (e.g. it may point into a memory map).
Parameters
reqs  A scatter-gather and offset request.
d  An optional deadline by which the i/o must complete, else it is cancelled. Note that the function may return significantly after this deadline if the i/o takes a long time to cancel.
Errors returnable: Any of the values POSIX read() can return, errc::timed_out, errc::operation_canceled. errc::not_supported may be returned if deadline i/o is not possible with this particular handle configuration (e.g. reading from regular files on POSIX or reading from a non-overlapped HANDLE on Windows).
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.
{
  return (_ctx == nullptr) ? _do_read(reqs, d) : _do_multiplexer_read({}, reqs, d);
}
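A hedged sketch of a single-buffer read at offset 0 (`fh` is assumed to be an open, readable handle; `consume()` is a hypothetical callback):

llfio::byte buf[4096];
llfio::file_handle::buffer_type b{buf, sizeof(buf)};
// Use the *returned* buffers: they may point into different memory
// (e.g. a memory map) and their sizes reflect the bytes transferred.
auto filled = fh.read({{&b, 1}, 0}).value();
for(auto &fb : filled)
{
  consume(fb.data(), fb.size());  // hypothetical
}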

◆ read() [2/2]

io_result<buffers_type> llfio_v2_xxx::byte_io_handle::read
inline noexcept

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

◆ reserve()

static result<map_handle> llfio_v2_xxx::map_handle::reserve ( size_type  bytes)
inline static noexcept

Reserve address space within which individual pages can later be committed. Reserved address space is NOT added to the process' commit charge.

Parameters
bytes  How many bytes to reserve. Rounded up to the nearest 64Kb on Windows.
Note
On Microsoft Windows this function uses the faster VirtualAlloc(), which creates less versatile page backed memory. If you want anonymous memory allocated from a paging file backed section instead, create a page file backed section and then a mapped view from that using the other constructor. This makes available all those very useful VM tricks Windows can do with section mapped memory which VirtualAlloc() memory cannot do.
Errors returnable: Any of the values POSIX mmap() or VirtualAlloc() can return.
{
  return _new_map(bytes, false, section_handle::flag::none | section_handle::flag::nocommit);
}
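A hedged sketch of the reserve-then-commit pattern (sizes illustrative; commit() is documented earlier on this page):

// Reserve 16Mb of address space; no commit charge is consumed yet.
auto mh = llfio::map_handle::reserve(16 * 1024 * 1024).value();
// Commit only the first page, then write into it.
llfio::map_handle::buffer_type page{mh.address(), mh.page_size()};
mh.commit(page).value();
mh.address()[0] = llfio::byte{42};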

◆ set_append_only()

virtual result<void> llfio_v2_xxx::handle::set_append_only ( bool  enable)
inline virtual noexcept inherited

EXTENSION: Changes whether this handle is append only or not.

Warning
On Windows this is implemented as a bit of a hack to make it fast like on POSIX, so make sure you open the handle for read/write originally. Note that unlike on POSIX, the append_only disposition is the only one toggled; seekable and readable remain turned on.
Errors returnable: Whatever POSIX fcntl() returns. On Windows nothing is changed on the handle.
Memory Allocations: No memory allocation.

Reimplemented in llfio_v2_xxx::process_handle.

◆ set_cache_disabled()

static bool llfio_v2_xxx::map_handle::set_cache_disabled ( bool  disabled)
inline static noexcept

Disable the map handle cache, returning its previous setting. Note that you may also wish to explicitly trim the cache.

◆ set_multiplexer()

result< void > llfio_v2_xxx::byte_io_handle::set_multiplexer ( byte_io_multiplexer *  c = this_thread::multiplexer() )
inline virtual noexcept inherited

Sets the i/o multiplexer this handle will use to implement read(), write() and barrier().

Note that this call deregisters this handle from any existing i/o multiplexer, and registers it with the new i/o multiplexer. You must therefore not call it if any i/o is currently outstanding on this handle. You should also be aware that multiple dynamic memory allocations and deallocations may occur, as well as multiple syscalls (i.e. this is an expensive call, try to do it from cold code).

If the handle was not created as multiplexable, this call always fails.

Memory Allocations: Multiple dynamic memory allocations and deallocations.

Reimplemented in llfio_v2_xxx::mapped_file_handle.

{
  if(!is_multiplexable())
  {
    return errc::operation_not_supported;
  }
  if(c == _ctx)
  {
    return success();
  }
  if(_ctx != nullptr)
  {
    OUTCOME_TRY(_ctx->do_byte_io_handle_deregister(this));
    _ctx = nullptr;
  }
  if(c != nullptr)
  {
    OUTCOME_TRY(auto &&state, c->do_byte_io_handle_register(this));
    _v.behaviour = (_v.behaviour & ~(native_handle_type::disposition::_multiplexer_state_bit0 | native_handle_type::disposition::_multiplexer_state_bit1));
    if((state & 1) != 0)
    {
      _v.behaviour |= native_handle_type::disposition::_multiplexer_state_bit0;
    }
    if((state & 2) != 0)
    {
      _v.behaviour |= native_handle_type::disposition::_multiplexer_state_bit1;
    }
  }
  _ctx = c;
  return success();
}
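A hedged sketch, assuming `mplex` is a byte_io_multiplexer instance obtained elsewhere and `h` was created with flag::multiplexable (otherwise the call fails):

// Expensive: registers h with the multiplexer, so call from cold code.
h.set_multiplexer(mplex.get()).value();
// ... multiplexed i/o on h ...
h.set_multiplexer(nullptr).value();  // deregister before mplex is destroyed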

◆ trim_cache()

static cache_statistics llfio_v2_xxx::map_handle::trim_cache ( std::chrono::steady_clock::time_point  older_than = {}, size_t  max_items = (size_t) -1 )
inline static noexcept

Get statistics about the map handle cache, optionally trimming the least recently used maps.
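A hedged sketch (the one-minute threshold is illustrative):

#include <chrono>

// Trim cached maps not used within the last minute, collecting statistics.
auto stats = llfio::map_handle::trim_cache(
  std::chrono::steady_clock::now() - std::chrono::minutes(1));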

◆ truncate()

result<size_type> llfio_v2_xxx::map_handle::truncate ( size_type  newsize, bool  permit_relocation )
inline noexcept

Resize the reservation of the memory map without changing the address (unless the map was zero sized, in which case a new address will be chosen).

If shrinking, address space is released on POSIX, and on Windows only if the new size is zero. If the new size is zero, the address is set to null to prevent surprises. Windows does not support modifying existing mapped regions, so if the new size is not zero the call will probably fail; Windows should, however, let you truncate away a previous extension if it is exact.

If expanding, an attempt is made to map in new reservation immediately after the current address reservation, thus extending the reservation. If anything else is mapped in after the current reservation, the function fails.

Note
On all supported platforms apart from OS X, proprietary flags exist to avoid performing a map if a map extension cannot be immediately placed after the current map. On OS X, we hint where we'd like the new map to go, but if something is already there OS X will place the map elsewhere. In this situation, we delete the new map and return failure, which is inefficient, but there is nothing else we can do.
Returns
The bytes actually reserved.
Parameters
newsize  The bytes to truncate the map reservation to. Rounded up to the nearest page size (POSIX) or 64Kb on Windows.
permit_relocation  Permit the address to change (some OSs provide a syscall for resizing a memory map).
Errors returnable: Any of the values POSIX mremap(), mmap(addr) or VirtualAlloc(addr) can return.
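A hedged sketch of growing a reservation in place (sizes illustrative):

auto mh = llfio::map_handle::reserve(1024 * 1024).value();
// Try to extend the reservation to 4Mb without moving the address;
// fails if something else is mapped immediately after the reservation.
auto grown = mh.truncate(4 * 1024 * 1024, false);
if(!grown)
{
  // fall back e.g. to permitting relocation: mh.truncate(bytes, true)
}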

◆ try_lock_file()

virtual bool llfio_v2_xxx::lockable_byte_io_handle::try_lock_file ( )
inline virtual noexcept inherited

Tries to lock the inode referred to by the open handle for exclusive access, returning false if lock is currently unavailable.

Note that this may, or may not, interact with the byte range lock extensions. See unique_file_lock for a RAII locker.

Errors returnable: Any of the values POSIX flock() can return.
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.
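A hedged sketch (assumes `fh` is an open lockable handle; unique_file_lock is the RAII alternative mentioned above):

if(fh.try_lock_file())  // non-blocking: false if the lock is held elsewhere
{
  // ... exclusive access to the inode ...
  fh.unlock_file();
}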

◆ try_lock_file_shared()

virtual bool llfio_v2_xxx::lockable_byte_io_handle::try_lock_file_shared ( )
inline virtual noexcept inherited

Tries to lock the inode referred to by the open handle for shared access, returning false if lock is currently unavailable.

Note that this may, or may not, interact with the byte range lock extensions. See unique_file_lock for a RAII locker.

Errors returnable: Any of the values POSIX flock() can return.
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.

◆ unlock_file_range()

virtual void llfio_v2_xxx::lockable_byte_io_handle::unlock_file_range ( extent_type  offset, extent_type  bytes )
inline virtual noexcept inherited

EXTENSION: Unlocks a byte range previously locked.

Parameters
offset  The offset to unlock. This should be an offset previously locked.
bytes  The number of bytes to unlock. This should be a byte extent previously locked.
Errors returnable: Any of the values POSIX fcntl() can return.
Memory Allocations: None.

Reimplemented in llfio_v2_xxx::fast_random_file_handle.
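A hedged sketch pairing this with its locking counterpart (assumes the lock_file_range()/lock_kind API elsewhere in lockable_byte_io_handle):

// Exclusively lock the first 4096 bytes, then release them.
fh.lock_file_range(0, 4096, llfio::lock_kind::exclusive).value();
fh.unlock_file_range(0, 4096);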

◆ write() [1/2]

io_result<const_buffers_type> llfio_v2_xxx::byte_io_handle::write ( io_request< const_buffers_type >  reqs, deadline  d = deadline() )
inline noexcept inherited

Write data to the open handle, preferentially using any i/o multiplexer set over the virtually overridable per-class implementation.

Warning
Depending on the implementation backend, not all of the buffers input may be written. For example, with a zeroed deadline, some backends may only consume as many buffers as the system has available write slots for, thus for those backends this call is "non-blocking" in the sense that it will return immediately even if it could not schedule a single buffer write. Another example is that some implementations will not auto-extend the length of a file when a write exceeds the maximum extent, you will need to issue a truncate(newsize) first.
Returns
The buffers written, which may not be the buffers input. The size of each scatter-gather buffer returned is updated with the number of bytes of that buffer transferred.
Parameters
reqs  A scatter-gather and offset request.
d  An optional deadline by which the i/o must complete, else it is cancelled. Note that the function may return significantly after this deadline if the i/o takes a long time to cancel.
Errors returnable: Any of the values POSIX write() can return, errc::timed_out, errc::operation_canceled. errc::not_supported may be returned if deadline i/o is not possible with this particular handle configuration (e.g. writing to regular files on POSIX or writing to a non-overlapped HANDLE on Windows).
Memory Allocations: The default synchronous implementation in file_handle performs no memory allocation.
{
  return (_ctx == nullptr) ? _do_write(reqs, d) : _do_multiplexer_write({}, std::move(reqs), d);
}
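A hedged sketch of a single gather write at offset 0 (`fh` is assumed to be open for writing):

static const char msg[] = "hello";
llfio::file_handle::const_buffer_type b{reinterpret_cast<const llfio::byte *>(msg), 5};
// Inspect the returned buffers: a backend may consume fewer bytes than given.
auto written = fh.write({{&b, 1}, 0}).value();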

◆ write() [2/2]

io_result<const_buffers_type> llfio_v2_xxx::byte_io_handle::write
inline noexcept

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

◆ zero_memory()

result<void> llfio_v2_xxx::map_handle::zero_memory ( buffer_type  region)
inline noexcept

Zero the memory represented by the buffer. Differs from zero() because it acts on mapped memory, not on allocated file extents.

On Linux only, any full 4Kb pages will be deallocated from the system entirely, including the extents for them in any backing storage. On newer Linux kernels the kernel can additionally swap whole 4Kb pages for freshly zeroed ones making this a very efficient way of zeroing large ranges of memory. Note that commit charge is not affected by this operation, as writes into the zeroed pages are guaranteed to succeed.

On Windows and Mac OS, this call currently only has an effect for non-backed memory due to lacking kernel support.

Errors returnable: Any of the errors returnable by madvise() or DiscardVirtualMemory() or the zero() function.
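A hedged sketch (assumes `mh` is a map_handle spanning at least one page):

// Zero one whole page; on Linux such pages may be deallocated entirely.
llfio::map_handle::buffer_type region{mh.address(), mh.page_size()};
mh.zero_memory(region).value();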

The documentation for this class was generated from the following file: