@enkore
Last active May 24, 2017 13:46
Borg DEK spec

Design goals for new cryptography in Borg

2017-04/05 update

CLK4.2:

  • Note that type_byte || xxx constructions only work if type_byte differs from all type bytes currently in use

More compact encoding; the ID is authenticated via the auth tag:

type_byte := suite_id(4bit) || key_type(4bit)  // thus, suite_id=0 means the current crypto
envelope := type_byte(1) || session_id(16) || message_iv(4) || auth_tag(16) || body
session_id := 16 random bytes

See http://i.imgur.com/Thdd195.png
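
A minimal Python sketch of the compact envelope layout above, using only the standard library. It assumes suite_id sits in the high nibble of type_byte and key_type in the low nibble (the spec only gives the concatenation order), and the names pack_envelope / unpack_envelope are hypothetical, not Borg API:

```python
import struct

# type_byte(1) || session_id(16) || message_iv(4) || auth_tag(16); no padding
HEADER = struct.Struct('>B16s4s16s')


def pack_envelope(suite_id, key_type, session_id, message_iv, auth_tag, body):
    """Build an envelope. Assumption: suite_id is the high nibble of type_byte."""
    if not (0 <= suite_id < 16 and 0 <= key_type < 16):
        raise ValueError('suite_id and key_type must each fit in 4 bits')
    if (len(session_id), len(message_iv), len(auth_tag)) != (16, 4, 16):
        raise ValueError('session_id/message_iv/auth_tag must be 16/4/16 bytes')
    type_byte = (suite_id << 4) | key_type
    return HEADER.pack(type_byte, session_id, message_iv, auth_tag) + body


def unpack_envelope(envelope):
    """Split an envelope back into its fields; the body is everything after the header."""
    if len(envelope) < HEADER.size:
        raise ValueError('envelope shorter than header')
    type_byte, session_id, message_iv, auth_tag = HEADER.unpack_from(envelope)
    suite_id, key_type = type_byte >> 4, type_byte & 0x0F
    return suite_id, key_type, session_id, message_iv, auth_tag, envelope[HEADER.size:]
```

With this nibble order, suite_id=0 leaves type bytes 0x00..0x0f to the current crypto, which matches the comment on type_byte above.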

2016-11-XX/2016-12-XX/undisclosed

CLK4:

envelope := type_byte || suite_id || session_id || body
type_byte := key mode byte
suite_id := uint8/uint16 (LE)
session_id := >=16 bytes, chosen aptly, fixed length, e.g. 29 bytes for a total header length of 32 bytes (with a uint16 suite_id), or just 32 bytes, or even just 16 bytes.

Protocol:

  1. Session initiation -> generate random session_id

  2. Choose cipher suite of session

  3. HKDF parameters defined:

    ikm := enc_key || enc_hmac_key || id_key
    info := borg-data-key
    salt := suite_id || session_id
    alg := HMAC-SHA-512
    
  4. Output length defined by suite.

  5. Suite authenticated data (SAD): entire header
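
The key derivation in step 3 can be sketched as follows. This is only an illustration under stated assumptions: HKDF is written out by hand after RFC 5869 on top of the stdlib hmac/hashlib modules, suite_id is encoded as a uint16 (LE), and the function names are hypothetical:

```python
import hashlib
import hmac
import struct


def hkdf_sha512(ikm, salt, info, length):
    """HKDF (RFC 5869) with HMAC-SHA-512: extract a PRK, then expand to `length` bytes."""
    hash_len = hashlib.sha512().digest_size
    if length > 255 * hash_len:
        raise ValueError('requested output too long for HKDF')
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()  # extract step
    okm, block, counter = b'', b'', 1
    while len(okm) < length:  # expand step: T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_keys(enc_key, enc_hmac_key, id_key, suite_id, session_id, length):
    """Derive the session-scoped data key material; `length` is defined by the suite."""
    ikm = enc_key + enc_hmac_key + id_key
    salt = struct.pack('<H', suite_id) + session_id  # assumption: uint16 LE suite_id
    return hkdf_sha512(ikm, salt, b'borg-data-key', length)
```

The derivation runs once per session; every chunk written in that session carries suite_id and session_id in its header, so a reader can repeat the same call to recover the keys.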

Analysis

  1. R1: Inherent agility (suite_id)
  2. R2: matter of the cipher suite
  3. R3: matter of the cipher suite
  4. R4: Does not require extra storage for data keys
  5. R4: Every chunk has the information necessary for deriving the data key
  6. R5: matter of the cipher suite

Compared

  1. CLK4 doesn't need a key-wrapping algorithm, which CLK1 or CLK3 would require
  2. CLK4 avoids per-chunk keys, which raised doubts regarding KDF invocation counts in CLK2
  3. CLK4 would seem to have the lowest overall computational requirements: HKDF is cheaper than key wrapping, and every session needs only a few bytes from the slow system CSPRNG. Unlike CLK2, the KDF is invoked only rarely (once per session).
  4. CLK4 overhead is constant regardless of the key requirements of the used suite (unlike CLK1/CLK3). It can be plausibly made lower than CLK1/CLK3, but can't get quite as low as CLK2 (which has none or only the suite ID).

2016-09-02

  1. Algorithm agility
  2. No IV issues, specifically, no global counters and/or IV spaces smaller than 96 bits
  3. Apt for multithreaded en- and decryption (decryption should never be a problem here)
  4. The keyblob (file/repokey) has to be time-invariant unless the user specifically invokes an action that has the clear consequence of changing it. This specifically prohibits changing the default semantics of existing commands.
  5. Any new crypto has to be authenticated encryption

---

(IRC @enkore @textshell)

From these requirements (R1... Rn) it follows that

  • A "DEK-like" solution is preferable, where some key material can be inserted into the keyblob at a user command

    • This allows for R1, while not violating R4
  • For the encryption itself a nonce-less or fully nonce-reuse resistant (R2) algorithm would be best, but they don't really exist yet

  • So a key-wrapping solution seems to be the best, because it allows R3 and R4 to be fulfilled at the same time.

  • There are multiple approaches:

    These we call "chunk local key", because every chunk carries a wrapped or derived key:

    1. (CLK1) random per chunk keys encrypted with a key from the DEK (key wrapped with DEK)
    2. (CLK2) deterministically derived per chunk keys (e.g. HKDF-HMAC-SHAxxx/BLAKE2xxxx) (shared secret / initial key material in DEK, while info/salt would be chunk ID) (encryption can use static message IV of 0)
    3. (CLK3) session key wrapped for each chunk (message IV counter)

    These also fulfill R2: CLK1 has no shared keys or IVs. CLK2 ditto. CLK3 has per-session key with perfect IV counting (since it's only used in one session and never reused), therefore IV reuse is impossible.

    These fulfill R3 because multiple session keys can be used with ease to independently and concurrently encrypt chunks.

    CLK1 and CLK3 likely require a key wrapping mode (large block mode), while CLK2 requires a KDF safe for many invocations with the same IKM but different info (e.g. HKDF?)

    CLK3 can key-wrap the session key once and prepend the wrapped session key to every chunk.

    Therefore CLK3 has lower computational overhead when encrypting (it's the same as now), while higher (but optimizable) overhead when decrypting (naive: always decrypt session key; optimized: LRU cache for (wrapped session key,) -> (decrypted session key)).

    CLK2 is symmetric here (both encrypting and decrypting require the same KDF invocation to derive keys).

    CLK1 is the worst choice in this regard: it needs randomness for each chunk, and needs to key-wrap each chunk's keys. Similarly, decryption needs to key-unwrap every chunk's keys.

  • R5 depends on specific choice of underlying primitives but generally goes without saying on this level of discussion
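
The optimized CLK3 decryption path mentioned above (an LRU cache from wrapped session key to decrypted session key) could look roughly like this. The unwrap callable is a placeholder for whatever key-wrapping primitive would be chosen; nothing here is existing Borg code:

```python
from functools import lru_cache


def make_cached_unwrapper(unwrap, maxsize=128):
    """Return a version of `unwrap(wrapped_key: bytes) -> bytes` with an LRU cache.

    Chunks written in one session share the same wrapped session key, so the
    expensive unwrap runs once per session instead of once per chunk.
    """
    return lru_cache(maxsize=maxsize)(unwrap)


# Demo with a stand-in "unwrap" that only records how often it is called.
# It is NOT a key-wrapping algorithm, just a placeholder to show the caching.
calls = []


def fake_unwrap(wrapped_key):
    calls.append(wrapped_key)
    return wrapped_key[::-1]


cached_unwrap = make_cached_unwrapper(fake_unwrap)
for _ in range(1000):  # 1000 chunks from the same session
    session_key = cached_unwrap(b'wrapped-session-key')
```

In the demo loop the stand-in is invoked a single time; the remaining lookups are served from the cache.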

---

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The first idea/proposal for algorithm agility in Borg (currently not updated):

---

Goals

Define flexible and extensible encryption scheme for Borg.

External key management or UI changes are not in scope of this spec.

Terms

DEK / Data encryption key

key used for encrypting data, e.g. file chunks, but also metadata chunks (like archive's metadata[b'items']).

Here it specifically means the combination of a cipher specification (the cipher suite) and the keys required for the suite (possibly other data is needed as well, e.g. last CTR value).

DKID

An opaque 128 bit number that uniquely identifies a DEK. The external representation is 32 hexadecimal digits.

TODO: Maybe use UUIDv4 format instead here.
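
A small sketch of how a DKID might be generated and converted to its external form, using only the standard library. The helper names are made up for illustration; the UUIDv4 variant corresponds to the TODO and trades 6 of the 128 bits for the fixed version/variant fields:

```python
import secrets
import uuid

DKID_LEN = 16  # 128 bits


def new_dkid():
    """Fully random 128 bit DKID."""
    return secrets.token_bytes(DKID_LEN)


def new_dkid_uuid4():
    """Alternative from the TODO: a UUIDv4, i.e. 122 random bits plus version/variant."""
    return uuid.uuid4().bytes


def dkid_to_hex(dkid):
    """External representation: 32 hexadecimal digits."""
    if len(dkid) != DKID_LEN:
        raise ValueError('DKID must be 16 bytes')
    return dkid.hex()


def dkid_from_hex(text):
    """Parse the external representation back into the 16 byte DKID."""
    dkid = bytes.fromhex(text)
    if len(dkid) != DKID_LEN:
        raise ValueError('DKID must be 32 hexadecimal digits')
    return dkid
```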

Keyblob
The contents of the keyfile or repokey (whichever applies).
Cipher suite
A cipher suite is a named combination of authentication (e.g. an HMAC) and encryption. They are not "lego bricks" of ciphers and MACs, but pre-defined, specific combinations.

Implementation

Key storage

  • Only supported for Keyfile / Repokey

    • These are the only supported encryption modes anyway

  • Must be enabled explicitly (since older versions can't read it)

  • DEKs are stored in the keyblob as a dictionary

  • DEKs are dictionaries (serialized with msgpack like everything else in the keyblob)

  • With the following keys:

    • suite: name of the cipher suite
    • Other keys as defined by the cipher suite

    Examples:

    {'suite': 'aes-ctr-sha-256', 'last_iv': 1234, 'enc_key': b'<blob>', 'hmac_key': b'<blob>'}
    {'suite': 'chacha20-poly1305', 'last_iv': 1234, 'enc_key': b'<blob>', 'hmac_key': b'<blob>'}
    
  • All cipher suites must be AEAD. Data and HMAC keys (if applicable) are required to be different.

  • Authentication of the DKID for each payload encrypted is the responsibility of the cipher suite, as well as prepending/appending any authentication tags, initialization vectors or other data to the encrypted payload (if applicable).

  • New algorithms can be added as needed. Older versions will not be able to decrypt the data, but will be able to tell that they aren't able and inform the user appropriately.

  • Important: cipher suites may store a "last IV" value for CTR or similar modes of operation. If the changed() method of any DEK is True, then the keyblob must be written out at the end of the operation (when new data was encrypted and committed).
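
The write-out rule from the last point could be handled along these lines. This is a sketch only: StubDEK and flush_keyblob are hypothetical, the DEK dictionary is assumed to be keyed by DKID, and write_keyblob stands for whatever persists the keyblob (msgpack serialization and encryption of the keyblob are left out):

```python
class StubDEK:
    """Hypothetical stand-in for a DEK that tracks a "last IV" counter."""

    def __init__(self, suite, last_iv):
        self.suite = suite
        self.last_iv = last_iv
        self._saved_iv = last_iv  # value currently persisted in the keyblob

    def next_iv(self):
        self.last_iv += 1
        return self.last_iv

    def changed(self):
        return self.last_iv != self._saved_iv

    def serialize(self):
        return {'suite': self.suite, 'last_iv': self.last_iv}

    def mark_saved(self):
        self._saved_iv = self.last_iv


def flush_keyblob(deks, write_keyblob):
    """At the end of an operation, write the keyblob only if some DEK changed."""
    if not any(dek.changed() for dek in deks.values()):
        return False
    write_keyblob({dkid: dek.serialize() for dkid, dek in deks.items()})
    for dek in deks.values():
        dek.mark_saved()
    return True
```

A read-only operation never advances an IV, so flush_keyblob leaves the keyblob untouched, in line with R4.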

Chunk storage

  • DEK-encrypted chunks have the following layout:

    <1 byte TYPE><16 byte DKID><payload bytes>
    
  • TYPE is 0x10

  • The DKID is a 16 byte opaque bytestring

  • The Key class handles both legacy TYPE and DEK-TYPE chunks

  • If the DKID referenced is not available, a KeyNotAvailable exception is raised
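
The chunk layout above can be parsed with a few lines of Python. The function names and the plain-dict key store are assumptions made for this sketch; dispatching legacy TYPE values is left to the caller:

```python
DEK_TYPE = 0x10
DKID_LEN = 16


class KeyNotAvailable(Exception):
    """The DKID referenced by a chunk is not present in the keyblob."""


def build_chunk(dkid, payload):
    """<1 byte TYPE><16 byte DKID><payload bytes>"""
    if len(dkid) != DKID_LEN:
        raise ValueError('DKID must be 16 bytes')
    return bytes([DEK_TYPE]) + dkid + payload


def parse_chunk(chunk, deks):
    """Return (dek, payload) for a DEK-TYPE chunk; `deks` maps DKID -> DEK."""
    if len(chunk) < 1 + DKID_LEN:
        raise ValueError('chunk too short')
    if chunk[0] != DEK_TYPE:
        raise ValueError('not a DEK-TYPE chunk (legacy types are handled elsewhere)')
    dkid = chunk[1:1 + DKID_LEN]
    if dkid not in deks:
        raise KeyNotAvailable(dkid.hex())
    return deks[dkid], chunk[1 + DKID_LEN:]
```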

Crypto API

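# This API would be used by AESKeyBase, not directly to encrypt chunks. At this layer a chunk would
# already be compressed, for example.
class CipherSuite:
    NAME = 'aes-256-ctr-sha-256'

    def __init__(self, dkid, params):
        self.dkid = dkid      # 16 byte opaque DEK identifier
        self.params = params  # suite-specific keys and state from the keyblob

    @classmethod
    def create_new(cls):
        """Make a new instance with randomly generated keys."""
        raise NotImplementedError

    def serialize(self):
        """Return dict representation of this DEK."""
        # Plus the other keys as defined by the cipher suite.
        return {'suite': self.NAME}

    def changed(self):
        """True if the representation changed."""
        return False

    def encrypt(self, payload):
        # Put self.dkid into AD
        raise NotImplementedError

    def decrypt(self, encrypted_payload):
        # Check self.dkid against AD in payload
        raise NotImplementedError


class SomeSuite(CipherSuite):
    """Placeholder for a concrete cipher suite."""
    NAME = 'some-suite'


# Registry mapping suite names to their implementing classes.
CIPHERSUITES = {
    SomeSuite.NAME: SomeSuite,
}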

Roadmap

  • Should include check for DEK-chunks as soon as possible, inform user and exit gracefully.
  • Non-breaking change, but DEK-encrypted data is of course not readable by previous versions
    • To achieve proper deduplication when old versions use the same repo, the items list must not be encrypted with a DEK. A --metadata-key option could optionally enable this (and somewhat break backwards compatibility when enabled?)
      • Note that no duplicate data would be stored. Just the refcounting is off, so delete, prune w/ old version is unsafe (detectable, would abort safely).
  • Flexible enough to be the basis of more elaborate (external) key management schemes
    • E.g. option to create new archive with new key set, but don't store it ever, directly export it
@ThomasWaldmann

About the first roadmap item: if an unknown type byte is encountered, this is raised:

class UnsupportedPayloadError(Error):
    """Unsupported payload type {}. A newer version is required to access this repository."""
