Elektra
0.9.11
|
This document is a full explanation of how key names work in Elektra.
In addition to this document, a reference Python implementation can be found in keynames.py. The goal of the Python implementation is not to be fast, or to be used in any way other than as a reference. If there are any discrepancies between this document, the Python implementation and the actual C implementation in src/libs/elektra/keyname.c, you should consider them as follows:
In any case: If you find any discrepancies, please file a bug report at https://issues.libelektra.org/new.
Note: Mistakes happen. The goal of the Python implementation was to provide a reference that has a very high likelihood of being correct.
To Elektra developers: Feel free to add any unclear or previous incorrect examples to the test cases in tests/ctest/test_keyname.c. These tests are very fast (1000+ test cases per second) and the more tests the better.
Before we dive into key names, we need to talk about key name parts and namespaces.
Each key is part of one of these namespaces:
Each of these namespaces has a very specific meaning, explained in Section 1.2.
Apart from the namespace, a key name is just a series of zero or more key name parts. Each key name part is just an arbitrary (possibly empty) sequence of non-zero bytes.
So without knowing anything about how key names are written, we could say that there is a key in the namespace "system" with the key name parts "elektra", "version" and "info".
Note: Not every such sequence, is a valid key name. For more information see Section 4
Elektra's keys commonly look like Unix paths:
Note: How this representation works exactly and how namespaces come into play, will be explained in the next section. For now, we only care that there is some similarity to Unix paths.
This is on purpose. Elektra's key database (KDB) is designed to resemble a Unix filesystem as much as possible. In particular, the KDB has a similar hierarchy. More generally, all key names exhibit this hierarchy. By going back to thinking about a key name as a namespace and a series of key name parts, we can define this key hierarchy.
Each namespace has a separate hierarchy. The relation between these, will be explored in the next section. For now, we just look at a single namespace.
In a Unix filesystems, we commonly talk about files and directories. We also say a file is located within a directory. But you might also know that in Unix "everything is a file". This applies to directories as well, but "a file is located within a file" is a bit clunky, so you might say "file `A` is located below file `B`", if B
is a file within the directory A
. What makes A
a directory is just the fact that there can be other files below A
[1].
This relation of "is below" is also what defines Elektra's key hierarchy. Based on a key K
with n
key name parts, we say:
Km
is below K
, if Km
has n+m
key name parts and the first n
key name parts of Km
are equal to the key name parts of K
(in the same order).K1
is directly below K
, if K1
is below K
and K1
has n+1
key name parts.Here are a few examples to show how this works in practice (using the Unix-path-like representation teased above):
Key 1 | Key 2 | Relation |
---|---|---|
/elektra/version/info | /elektra/version | "Key 1" is directly below "Key 2" |
/elektra/version/info | /elektra | "Key 1" is below "Key 2" |
/elektra | /elektra/version/info | "Key 2" is below "Key 1" |
/elektra/version/info | /elektra/version/info | "Key 1" and "Key 2" are equal |
/elektra/version/info | /elektra/data | no relation |
/elektra/data | /elektra/version | "Key 1" and "Key 2" are siblings |
You can think of the key hierarchy (within a single namespace) as a big tree of keys. Each node in the tree is a single key K
and the children of the nodes are the keys that are directly below K
.
The diagram above shows the key hierarchy of the keys in the table above (A
-> B
denotes A
is directly below B
).
Note: While we could use the directory vs. file terminology for Elektra as well, it is recommended to avoid it. This is because in Elektra, every key may have an associated value. In particular a key may have a value, even if there are other keys below it. This value is not, as a beginner might suspect, the set of keys below it, like you could say the value of a directory is the set of files (and directories) within. The value is just another value, like any other key would have.
Instead, we recommend the terms leaf key and non-leaf key, as these are commonly used for tree-like structures and their definitions fit perfectly.
As an inverse to "is below" we sometimes use "parent". For example:
/elektra/version
is directly below /elektra
, so /elektra
is the (direct) parent of /elektra/version
/elektra/version/info
is below /elektra
, so /elektra
is a parent of /elektra/version
/elektra/version/info
and /elektra/version
are below /elektra
, so /elektra
is the common parent.This terminology can be easy to confuse, as "parent" is used for multiple different things. If the context doesn't make it clear what "parent" means, you might consider more clear terms, e.g. other common terms for tree-like structures. For example, you could use "ancestor" and "direct parent" as a clear differentiation.
There is also the term "(the) parent key". In many cases, this is refers to a very specific key in the given context that is a "common parent" to a certain set of keys. If you are working within such a context, be careful about "a parent key" vs. "the parent key".
We mentioned above that there are different namespaces in Elektra. Now we will explain their meaning.
To recap, Elektra knows these namespaces:
We mentioned above that there are key names with zero key name parts, i.e. just namespace. These are called root keys (based on Unix's filesystem root, as well as the root of a tree).
Let us explore them one by one:
KeySet
.There is also a certain ranking between the namespaces "proc", "dir", "user", "system" and "default". Namely, that they override each other in exactly this order. Given two key names with identical key name parts, but one with namespace "dir" and one with namespace "user", the one with namespace "dir" should be considered more specific and should be preferred.
A special feature of Elektra is namespace resolution. Namespace resolution is the process of finding an appropriate namespace for a key based on a key name with namespace "cascading". It is most commonly used, when you need to find which key in the KDB should be used, based on a series of key name parts.
To resolve the namespace, we just look at each of the namespaces in the ranking defined above. We then use the first namespace where the key actually exists. Namespace resolution is performed, when ksLookup
/ksLookupByName
is called with a key name with namespace "cascading" [2]. This is also done, if you call kdb get
or kdb set
with a key name with namespace "cascading".
The standard way to represent a key name in Elektra is this:
This can be deconstructed into:
system
:
/
elektra
/
version
/
info
Note: It might seem strange, that there is a part separator before the first part. This makes sense, because then the part separator always introduces a new part. A better fitting description would be "part introducer". But since we commonly call
/
a separator, we will stick to this terminology.
For keys in the namespace "cascading", we omit both the namespace itself and as well as the namespace separator:
We also have a special rule for the root keys, i.e. the key names with zero key name parts and just a namespace. According to above rules system:
would be the escaped name of the root key for the namespace "system" and the escaped name for the root key of the namespace "cascading" would be an empty string. But this is not the case. We use system:/
and /
instead. The exact details will be explored later, but for now just remember, that an (escaped) key name always contains at least one part separator (/
).
But there is a problem. We said a key name part is "an arbitrary sequence of non-zero bytes". This means "elektra/version" is a key name part as well. Since the slash /
would conflict with the part separator, we can escape it with a backslash \
(and \
can be escaped with another \
):
This can be deconstructed into:
/
elektra/version
/
info
back\slash
Note: When talking about a single key name part
/
and\
are never escaped.
Because of this escaping mechanism, we call this the escaped name of a key.
Elektra's key names are designed to mimic Unix paths to some extent. To this end we support the commonly used /.
and /..
. This is one reason, why we need to differentiate between canonical key names and non-canonical key names.
Following the syntax of Unix paths, in Elektra both /elektra/./version
and /elektra/version
refer to the same key. Similarly, /elektra/../version
and /version
refer to the same key.
To give each key a unique key name, we need to introduce a canonical (escaped) key name. For Unix paths, we could say that the canonical path is the shortest possible path that refers to a file. In Elektra this doesn't quite work, but will use this definition for now.
Note: Only escaped key names can be canonical or non-canonical, so we normally omit the "escaped" specifier.
Let us look at a few examples to get a feeling for canonical and non-canonical key names.
Non-canonical | Canonical |
---|---|
/elektra/./version | /elektra/version |
/elektra/../version | /version |
/elektra/.././version | /version |
/elektra///version | /elektra/version |
/elektra//../version | /version |
/elektra/./../version | /version |
/elektra/../../ | / |
user:/elektra/../../ | user:/ |
/elektra/version/ | /elektra/version |
As you can see, the behavior of .
and ..
matches that of Unix paths as long as we are using the namesapce "cascading". The namespace of a key name can never be changed via ..
, so user:/..
is equivalent to user:/
.
There is also a small difference in the last example. In Unix such paths would also refer to the same file, but in some Unix tools a trailing slash alters the behavior of the tool. In Elektra this is never the case. /elektra/version/
and /elektra/version
refer to the same key and are always treated as the same key name.
The only exception are the root keys. The canonical key names for the root keys always end with a /
. In fact, we will see later, that removing the /
makes the key name invalid.
There also is a completely new addition in Elektra. Elektra has a notion of array parts. These are key name parts that denote an array index. How exactly these work, will be explored later. For now, we only need to know that they start with an #
and their canonical form has n
underscores followed by n+1
digits.
A few examples for array parts:
Non-canonical | Canonical |
---|---|
/elektra/#10 | /elektra/#_10 |
/elektra/#1234 | /elektra/#___1234 |
We already know, that /
and \
have to be escaped in an escaped key name. In addition to these two, there are a few more characters that have to be escaped. However, these additional characters may only be escaped under certain conditions.
The characters in question are: .
, #
, %
. The conditions under which these characters can be escaped can be found below.
While the escaped name of a key is nice for humans, it is not very well suited for machines. The escaping of path separators makes it hard to find the key name parts of a given key name. The escaped name is also not well suited for sorting [3].
Both of these flaws are solved by the unescaped name. In unescaped names we use a zero-byte as the part separator. Since a key name part cannot contain a zero-byte, we do not need an escaping mechanism for the path separator.
However, these zero-bytes mean that the unescaped name is not a printable string and therefore not human-readable. This why we will only describe the unescaped name in the deconstructed form.
The escaped names from above correspond to the following unescaped names:
system:/elektra/version/info
0x07
0x00
elektra
0x00
version
0x00
info
0x00
/elektra/version/info
0x01
0x00
elektra
0x00
version
0x00
info
0x00
/elektra\/version\\/info
0x01
0x00
elektra/version\
0x00
info
0x00
/
:0x01
0x00
0x00
Note: We use 0xZZ to represent a single byte in hexadecimal. This form is only used when the context makes it clear that it represents a single byte and not a four character string.
The process of turning an escaped name into the corresponding unescaped name is called unescaping. Turning an unescaped name back into an escaped name is called escaping.
Unescaping works, by simply removing the backslashes \
that are used as escapes. This applies both to \/
and \\
anywhere in key name parts, as well as to the escape sequences that are only used at the start of key name parts, e.g. \#
.
Not all of the key names described by the above sections are valid under all circumstances. You might also say, that a key name as defined above does not necessarily refer to a key in the KDB. So while it might be a key name, there is no key that actually uses this name. Only valid key names refer to a key in the KDB and may be used by a key as its name. The remaining key names are referred to as invalid key names.
For unescaped key names, it is pretty simple: Unescaped key names are valid key names, if all of the following are true:
0x01
- 0x08
.0x00
.0x00
.For this reason unescaped names for root keys are 3 bytes long, with the last two bytes being zero-byte. Between the second and the last byte, we find the key name parts separated by a zero-byte 0x00
.
For escaped key names it is easier to define, what makes an invalid key name, than what makes a valid key name, so we will go this route. An escaped key name is considered an invalid key name, if any of the following are true:
\
. This is a dangling escape as we expect another character after the escape character \
.:
is not followed by a /
. (This mainly applies to root keys.):
, but the substring before the first :
is not one of: meta
, spec
, proc
, dir
, user
, system
and default
./%
or consists of a namespace followed by the namespace separator :
followed by /%
. In other words, the first escaped part is translated into an empty unescaped part. The unescaped names for these keys would collide with the root keys /
, user:/
, etc.Note: The C-API does not allow you to construct a
Key
with an invalid key name; for examplekeyNew
(andkeyVNew
) will returnNULL
.
The escape sequences \\
and \/
are always valid. For the other escape sequences certain conditions must be fulfilled:
\.
: can be used at the start of a key name part, if the whole key name part is \.
or \..
. In other words, \
can be used to escape the behavior of .
and ..
.\#
: can be used at the start of a key name part, if the key name part would be a non-canonical array part without the \
. specifically, \#
can be used, if the key name part matches the regular expression \\#[1-9][0-9]+
and the digits form a number less than or equal to 9223372036854775807 (= 2^63 - 1
). Meaning, \
can be used to avoid array part canonicalization.\%
: can be used, if the whole key name part is \%
. That is \%
is the escaped version of %
(the empty key name part).It may seem weird that some escape sequences have such specific requirements. This is necessary to create a 1:1 mapping between (canonical) escaped names and unescaped names. Without the restrictions, e.g. both \abc
and abc
would be unescaped as abc
. In addition, the conditions for \#
have to be so specific, because \
must only be allowed, if it affects unescaping. For example, we cannot allow \#abc
, because that would unescape into #abc
, just like #abc
.
As mentioned above, Elektra has a notion of array parts. More specifically, certain key name parts will be interpreted as array indices under certain circumstances (see also documentation for arrays).
We already mentioned above, that array parts have canonical and non-canoncial forms.
A canonical array part is a hash-sign #
followed by n
underscores (_
) followed by n+1
digits. Additionally, the digits must form a number greater than or equal to 0
and less than or equal to 9223372036854775807 (= 2^63 - 1
). The number must not have any leading 0
s (except for the number zero itself).
In non-canonical key names, the underscores (_
) are omitted. That is, either the correct number of underscores is used or no underscores at all.
Any other key name part that starts with a #
is never an array part.
All key name parts starting with #
are valid. It does not matter, if the key name part is an array part or not. However, some parts of Elektra may expect array parts under certain circumstances. Providing other key name parts under such circumstances, may cause problems. If the context doesn't call for an array part, then array parts behave no different and are treated as plain string just like any other key name part.
Finally, some examples:
Key name part | Behavior in array | Behavior elsewhere |
---|---|---|
#0 | Index of first element | Child named #0 |
#_10 | Index of 11th element | Child named #_10 |
#_99 | Index of 100th element | Child named #_99 |
#__100 | Index of 101st element | Child named #__100 |
#abc | Possible error [4] | Child named #abc |
#_100 | Possible error [4] | Child named #_100 |
#__10 | Possible error [4] | Child named #__10 |
#10a | Possible error [4] | Child named #10a |
Apart from invalid key names, which cannot be constructed via the C-API, there are also reserved key names. These can be used with the C-API (keyNew
returns a valid Key *
), but there might be situations, in which Key
s with such key names are treated differently or rejected entirely.
Generally speaking, any part of Elektra may define that some key names have special meaning, are not allowed, etc. However, sometimes guaranteed compatibility with other parts of Elektra is required. A good example are storage plugins. A storage plugins should strive to be compatible with as much of Elektra as possible. But since the storage plugin doesn't know anything about other plugins or even the application using Elektra, it is hard to attribute special meaning to certain key names.
This is why there are two types of reserved key name:
system:/elektra
: These key names are reserved for Elektra's internals. Each of these keys has a very specific purpose that is defined globally for all of Elektra. Using such a key name automatically carries this meaning. Even outside the context in which Elektra uses these directly, you should never use system:/elektra
keys for other purposes.®elektra
: These key names are reserved, but their meaning depends on the context. Similar to the METADATA.ini file for metadata, some conventions for these key names are defined in reserved name document.Note: We use UTF-8 here, so
®elektra
is specifically the 9-byte sequenceC2 AE 65 6C 65 6B 74 72 61
.
Key
s with such key names will never be used in the interface between storage plugins and the rest of Elektra. This allows storage plugins to use ®elektra
to encode things that otherwise wouldn't be possible (e.g. values of non-leaf keys).
[1]: This explanation of Unix paths and Unix filesystems, is not entirely accurate. But it is good enough for our purposes, so we will just ignore some details. ↑
[2]: The actual process of resolution process that happens in ksLookupByName
and ksLookup
is a bit more complicated. It may involve some keys with namespace "spec" as well. ↑
[3]: For performance reasons, we want to make the comparison between to key names as fast as possible. A good solution is a single memcmp
. But this doesn't account for the fact that key names represent a hierarchy and that /
has a special meaning: /key/sub
should always be sorted after /key
and before /key.1
. With memcmp
, /key
is first, because it is the shortest and otherwise equal. But then we would get /key.1
not /key/sub
, because / < .
in ASCII. This cannot happen with zero-bytes as the separator, because there is no byte with a smaller value. ↑
[4]: We classify this as a "possible error", because not all parts of Elektra will fully validate all conventions and rules around arrays. In particular, a standard KeySet
itself imposes no restrictions. Therefore, you can use any KeySet
locally in your application. As soon as you pass the KeySet
into an external function (e.g. kdbSet
), however, errors may occur since the KeySet
may pass through a function that requires KeySet
which are properly validated against array rules. For more details what conventions and rules exist around arrays, see documentation for arrays. ↑