$darkmode
Elektra 0.9.12
|
One of Elektra's core goals is low memory usage. Currently, there are many places within Elektra where keys and keysets are duplicated and copied around. Most of those copied keys are never modified, but are required to be detached from the lifetime of the original instance. We want to introduce an in-memory copy-on-write mechanism to lower our memory usage.
In the near future, Elektra will also gain facilities for change tracking and session recording, both of which will potentially again duplicate keys. There are also aspirations to create a new, simple internal cache that would also benefit from a copy-on-write mechanism.
Key
and a KeySet
must be unaffected by copy-on-write.There is already some kind of copy-on-write semantics within libelektra-core
to support the mmapstorage
plugin. We can build on this and add a more generic copy-on-write to it.
This is already implemented for the MMAP cache, so the implementation should be straightforward: Do the same COW duplications as done for MMAP but with a different flag.
For the metadata, however, also COW KeySets might be needed (at least with the current API). Example:
Pros:
Cons:
Lifetime of a copied COW key MUST be less than the key it was copied from. We can not track how many keys point to the same data and name this way, so we can only free data and name if the key does not have the COW flag. If the original key gets deleted, using a COW key that points to the same data and name will lead to corrupt data. The same is true for updating values of the original key.
This is only problematic if we want to use COW for keys outside of KDB
. If it is only for use within KDB
, especially for usage as internal cache and in change tracking, we could always guarantee that the original keys are going to last as long as the KDB
instance. However, we need to document for the users of Elektra that keys returned from kdbGet
are only valid until kdbClose
. If they want to continue using them afterwards, they'd have to deep copy them.
Triggering the delete problem:
Triggering the update problem:
The same problems in principle exist for mmapstorage
where kdbSet
frees (munmap
) the keyset. We can still let users access the flag ELEKTRA_CP_COW
, we just need to clearly document what is forbidden. Maybe set the KEY_FLAG_RO_VALUE
on the original key, so that the API itself detects the error. There is, however, no flag for keyDel
that we could set.
The struct _Key
will be extended with two more pointers, if we want to eliminate the lifetime problem:
struct _Key * origData
: points to the key containing the referenced datastruct _Key * origName
: points to the key containing the referenced nameThose two pointers are needed for memory management. Each referenced key will also have its reference counter increased. This way, an original key can be keyDel()
d without impacting the copied keys.
Three new key flags will be added:
KEY_FLAG_COW_VALUE
: the value points to a value of another keyKEY_FLAG_COW_NAME
: the name points to the name of another keyKEY_FLAG_COW_META
: metakeys are copy-on-writeA new copy flag will be added:
KEY_CP_COW
: instructs keyCopy
to copy whatever it should copy as copy-on-write. This will NOT be part of KEY_CP_ALL
. We don't want developers outside of Elektra to accidentally use this.If keyCopy()
is instructed to do a copy-on-write copy:
dest->data.v
and dest->data.c
point to the exact same location as in the source. dest->dataSize
is set to the same value as source->dataSize
. KEY_FLAG_COW_VALUE
is set on dest->flags
. KEY_FLAG_RO_VALUE
is set on source->flags
. dest->originalData
is set to source
. source->refs
is incremented.dest->key
points to source->key
. dest->keySize
is set to the same value as source->keySize
. dest->ukey
points to source->ukey
. dest->keyUSize
is set to the same value as source->keyUSize
. KEY_FLAG_COW_NAME
is set on dest->flags
. KEY_FLAG_RO_NAME
is set on source->flags
. dest->originalName
is set to source
. source->refs
is incremented.dest->meta
points to a new keyset. The keys in dest->meta
are also copied with KEY_CP_COW
, i.e. they are also copy-on-write keys. KEY_FLAG_COW_META
is set on dest->flags
. KEY_FLAG_RO_META
is set on source->flags
.The source key will remain as a read-only key. This constraint is needed, because the source key is the only key we can free the resources on. If the data or the name would change in the source key, all COW-copied keys would suddenly have another value. For the same reason, the source key will need to live longer than all COW-copied keys from it.
A keyCopy()
without KEY_CP_COW
from an COW key will create a deep copy of the key. These keys are "normal" non-COW keys and can live on their own.
Every key*()
method that modifies data on a COW-copied key will need to allocate new memory for this data and remove the KEY_FLAG_COW_DATA
flag. Every key*()
method that modifies the name of a COW-copied key will need to allocate new memory for this name and remove the KEY_FLAG_COW_NAME
flag. Every key*()
method that modifies metadata needs to make sure that the same happens for metakeys.
Keysets are not copy-on-write. A ksDeepDup()
of a keyset with COW keys will create a keyset with deep-copied non-COW keys. Internally we may need a ksCowDup()
function to create a keyset with copy-on-write keys from another keyset. Whether this function will be part of the public API is a point for discussion.
Make Elektra's Key
and KeySet
data structures copy-on-write. This requires some major refactoring of code within libelektra-core
. Code that does only interact with the data structures via the public libelektra-core
API should not notice any differences. The mmapstorage
plugin needs to be updated.
Unlike "mmapstorage-like COW implementation" keyDup, keyCopy, ksCopy and ksDup will always use COW. ksCopy
and ksDup
is needed for (de)duplication of metadata. Furthermore, the API has better usability if Key and KeySet behave the same, especially for bindings where duplication might happen implicitly.
For the Key
, we need to extract everything for the data and name into their own structs. This is done for memory-management reasons, as we need to track how many keys point to the same data and/or name.
@mpranj's thoughts regarding moving name and data to separate structures:
1. If they [key name and data] are a separate entity,
mmapstorage
will need a flag once again for each of those. This is used to mark whether the data is in an mmap region or not. (or we find some bit somewhere that we can steal for this purpose)
- Adding more indirections is probably not going to help performance. (I understand that we save memory here)
For KeySet
, we need to split out everything to do with the stored keys into a separate datastructure. This includes the array itself, the sizes and the hashmap.
Why don't we just add the number of references to the original KeySet
?
We need reference counting for the internal COW datastructures. We do it the same way reference counting currently works for Key
and KeySet
. One tweak though is that the refcount should never be 0, as this does not make sense for internal datastructures.
This means we always increment the refcount after creation and always decrement before deletion, so that the refcount is never zero. An example implementation is shown below:
Instead of using different structs for _KeyData
, and _KeyName
use a more generic struct for reference counting. This would avoid some duplication on the reference counting code for the key. Keysets will still have their own data struct, as it contains more than just a pointer and a size.
In general, it should be possible to always do copy-on-write. From a users perspective, copy-on-write copies of a key (and a keyset) should behave the same. There is, however, one edge case: the user modifying the value of a key directly. This is shown in the following example:
This edge case can be accounted for by providing a private function keyDetach
, that forces that the key has its very own copy of the data.
If we do change the internal data structures it makes much more sense to fix the cache and mmapstorage afterwards (or in tandem). The most important constraint for mmap is that any structure (or bytes) that is an allocation unit (e.g. we malloc() the bytes needed for KeySet struct, so this is an unit) needs to have a flag to determine whether those bytes are actually malloc()ed or they are mmap()ed. Thus all the newly added structures as proposed will need some kind of an mmap flag.
mmapstorage
only calls munmap
in some error cases, so basically munmap
is almost never done and the keyset is never invalidated.
During kdbSet
the storage plugins always write to a temp file, due to how the resolver works. We also don't need to mmap the temp file here: when doing kdbSet
we already have the KeySet
at hand, mmap-ing it is not needed at all, because we have the data. We just want to update the cache file. The mmap
/munmap
in kdbSet are just so we can write the KeySet to a file in our format. (mmap()
is just simpler, but we could also malloc()
a region and then fwrite()
the stuff)
Therefore the only case where we return a mmap()
ed KeySet should be in kdbGet
.
When the mmapstorage
was designed/implemented, not all structures had refcounters, so there was no way to know when a munmap
is safe. This was simply out of scope at that point in time.
If refcounting is now implemented for all structures, we might be able to properly munmap
in future.
Two ideas to deal with this in conjunction with our reference counting implementation:
If we have free
function-pointer along side the refcount, mmapstorage
(and also other plugins with different allocators) could set it to their own implementation. To mimic the current behavior of mmapstorage
this would point to a no-op function. However, we could also improve things and keep track of when all data has been freed and only then call munmap
.
Another simpler way to avoid the flag, which doesn't really allow for further improvements, would be using the refcount. mmapstorage
could set the refcount to a value that is otherwise illegal. This would allow us to detect the keys. Depending on the refcount implementation good values would probably be 0 or UINT16_MAX. The special value would have to ignored by all refcounting functions (inc, dec, del) and turn the functions into no-ops.
_KeySetData
, _KeyData
and _KeyName
. We could then allocate multiple of them at the same time, and borrow and give back instance from and to the pool.KeySet * meta
directly in struct _Key
. This may help with performance in cases we need metadata. It will, however, increase memory usage. This should only be considered after some benchmarking shows this is a real issue.The following calculations are based on the AMD64 platform. All results are in bytes unless stated otherwise.
Example key: user:/hosts/ipv6/example.com = 2606:2800:220:1:248:1893:25c8:1946
We want to measure the following properties for the key:
Approach | Empty Key | Empty Key (with name) | Empty Key (with name + data) | Single Example Key | Example Key + 1 Duplicate | Example Key + 2 Duplicates |
---|---|---|---|---|---|---|
Current Implementation | 64 | 64 | 64 | 153 | 306 | 459 |
mmapstorage-like COW implementation (without additional pointers) | 64 | 64 | 64 | 153 | 217 | 281 |
mmapstorage-like COW implementation (with additional pointers) | 80 | 80 | 80 | 169 | 249 | 329 |
Full-blown COW implementation | 32 | 72 | 96 | 185 | 217 | 249 |
Full-blown COW implementation - Variant 1 (RcBuffer) | 40 | 88 | 112 | 201 | 241 | 281 |
We want to measure the following properties for the keyset:
Approach | Empty KeySet | Empty KeySet (with data) | Example KeySet | Example KeySet + 1 Duplicate | Example KeySet + 2 Duplicates |
---|---|---|---|---|---|
Current Implementation | 64 | 64 | 192 | 384 | 576 |
mmapstorage-like COW implementation (without additional pointers) | 64 | 64 | 192 | 384 | 576 |
mmapstorage-like COW implementation (with additional pointers) | 64 | 64 | 192 | 384 | 576 |
Full-blown COW implementation | 16 | 64 | 192 | 208 | 224 |
Raw data size:
28 + 1
= 29
25
34 + 1
= 35
Current Implementation:
sizeof
]: 64
64
64
Empty Key + keyname + unescaped keyname + data
= 64 + 29 + 25 + 35
= 153
Single Example Key * 2
= 153 * 2
= 306
Single Example Key * 3
= 153 * 3
= 459
sizeof
]: 64
64
Empty KeySet (with data) + 16 * pointer to keys
= 64 + 16 * 8
= 192
Example KeySet * 2
= 192 * 2
= 384
Example KeySet * 3
= 192 * 3
= 576
mmapstorage-like COW implementation (without additional pointers):
sizeof
]: 64
64
64
Empty Key + keyname + unescaped keyname + data
= 64 + 29 + 25 + 35
= 153
Single Example Key + Empty Key
= 153 + 64
= 217
Single Example Key + Empty Key * 2
= 153 + 64 * 2
= 281
mmapstorage-like COW implementation (with additional pointers):
sizeof
]: 64
sizeof
]: 80
80
80
Empty Key + keyname + unescaped keyname + data
= 80 + 29 + 25 + 35
= 169
Single Example Key + Empty Key
= 169 + 80
= 249
Single Example Key + Empty Key * 2
= 169 + 80 * 2
= 329
Full-blown COW implementation:
sizeof
]: 32
sizeof
]: Empty Key + sizeof(KeyName)
= 32 + 40
= 72
sizeof
]: Empty Key + sizeof(KeyName) + sizeof(KeyData)
= 32 + 40 + 24
= 96
Empty Key (with name + data) + keyname + unescaped keyname + data
= 96 + 29 + 25 + 35
= 185
Single Example Key + Empty Key
= 185 + 32
= 217
Single Example Key + Empty Key * 2
= 185 + 32 * 2
= 249
sizeof
]: 16
Empty KeySet + sizeof(KeySetData)
= 16 + 48
= 64
Empty KeySet (with data) + 16 * pointer to keys
= 64 + 16 * 8
= 192
Example KeySet + Empty KeySet
= 192 + 16
= 208
Example KeySet + Empty KeySet * 2
= 192 + 16 * 2
= 224
Full-blown COW implementation - Variant 1 (RcBuffer):
sizeof
]: 40
sizeof
]: Empty Key + sizeof(RcBuffer)*2
= 40 + 24*2
= 88
sizeof
]: Empty Key + sizeof(RcBuffer)*3
= 40 + 24*3
= 112
Empty Key (with name + data) + keyname + unescaped keyname + data
= 112 + 29 + 25 + 35
= 201
Single Example Key + Empty Key
= 201 + 40
= 241
Single Example Key + Empty Key * 2
= 201 + 40 * 2
= 281
sizeof
]: 16
Empty KeySet + sizeof(KeySetData)
= 16 + 48
= 64
Empty KeySet (with data) + 16 * pointer to keys
= 64 + 16 * 8
= 192
Example KeySet + Empty KeySet
= 192 + 16
= 208
Example KeySet + Empty KeySet * 2
= 192 + 16 * 2
= 224
For allocations want to measure the following properties:
Approach | Empty Key | Empty Key (with name) | Empty Key (with name + data) | Duplication | Key + 1 Duplication | Key + 2 Duplications |
---|---|---|---|---|---|---|
Current Implementation | 1 | 1 | 1 | 1 | 2 | 3 |
mmapstorage-like COW implementation (without additional pointers) | 1 | 1 | 1 | 1 | 2 | 3 |
mmapstorage-like COW implementation (with additional pointers) | 1 | 1 | 1 | 1 | 2 | 3 |
Full-blown COW implementation | 1 | 2 | 3 | 1 | 4 | 5 |
Implement the full-blown COW approach.
Key
and KeySet
objects.mmapstorage
plugins needs to be updated