
@tevador
Last active December 10, 2024 20:03

JAMTIS

This document describes a new addressing scheme for Monero.

Chapters 1-2 are intended for general audience.

Chapters 3-7 contain technical specifications.


1. Introduction

1.1 Why a new address format?

Sometime in 2024, Monero plans to adopt a new transaction protocol called Seraphis [1], which enables much larger ring sizes than the current RingCT protocol. However, due to a different key image construction, Seraphis is not compatible with CryptoNote addresses. This means that each user will need to generate a new set of addresses from their existing private keys. This provides a unique opportunity to vastly improve the addressing scheme used by Monero.

1.2 Current Monero addresses

The CryptoNote-based addressing scheme [2] currently used by Monero has several issues:

  1. Addresses are not suitable as human-readable identifiers because they are long and case-sensitive.
  2. Too much information about the wallet is leaked when scanning is delegated to a third party.
  3. Generating subaddresses requires view access to the wallet. This is why many merchants prefer integrated addresses [3].
  4. View-only wallets need key images to be imported to detect spent outputs [4].
  5. Subaddresses that belong to the same wallet can be linked via the Janus attack [5].
  6. The detection of outputs received to subaddresses is based on a lookup table, which can sometimes cause the wallet to miss outputs [6].

1.3 Jamtis

Jamtis is a new addressing scheme that was developed specifically for Seraphis and tackles all of the shortcomings of CryptoNote addresses that were mentioned above. Additionally, Jamtis incorporates two other changes related to addresses to take advantage of this large upgrade opportunity:

  • A new 16-word mnemonic scheme called Polyseed [7] that will replace the legacy 25-word seed for new wallets.
  • The removal of integrated addresses and payment IDs [8].

2. Features

2.1 Address format

Jamtis addresses, when encoded as a string, start with the prefix xmra and consist of 196 characters. Example of an address: xmra1mj0b1977bw3ympyh2yxd7hjymrw8crc9kin0dkm8d3wdu8jdhf3fkdpmgxfkbywbb9mdwkhkya4jtfn0d5h7s49bfyji1936w19tyf3906ypj09n64runqjrxwp6k2s3phxwm6wrb5c0b6c1ntrg2muge0cwdgnnr7u7bgknya9arksrj0re7whkckh51ik

There is no "main address" anymore - all Jamtis addresses are equivalent to a subaddress.

2.1.1 Recipient IDs

Jamtis introduces a short recipient identifier (RID) that can be calculated for every address. RID consists of 25 alphanumeric characters that are separated by underscores for better readability. The RID for the above address is regne_hwbna_u21gh_b54n0_8x36q. Instead of comparing long addresses, users can compare the much shorter RID. RIDs are also suitable to be communicated via phone calls, text messages or handwriting to confirm a recipient's address. This allows the address itself to be transferred via an insecure channel.

2.2 Light wallet scanning

Jamtis introduces new wallet tiers below view-only wallet. One of the new wallet tiers called "FindReceived" is intended for wallet-scanning and only has the ability to calculate view tags [9]. It cannot generate wallet addresses or decode output amounts.

View tags can be used to eliminate 99.6% of outputs that don't belong to the wallet. If provided with a list of wallet addresses, this tier can also link outputs to those addresses. Possible use cases are:

2.2.1 Wallet component

A wallet can have a "FindReceived" component that stays connected to the network at all times and filters out outputs in the blockchain. The full wallet can thus be synchronized at least 256x faster when it comes online (it only needs to check outputs with a matching view tag).

2.2.2 Third party services

If the "FindReceived" private key is provided to a 3rd party, it can preprocess the blockchain and provide a list of potential outputs. This reduces the amount of data that a light wallet has to download by a factor of at least 256. The third party will not learn which outputs actually belong to the wallet and will not see output amounts.

2.3 Wallet tiers for merchants

Jamtis introduces new wallet tiers that are useful for merchants.

2.3.1 Address generator

This tier is intended for merchant point-of-sale terminals. It can generate addresses on demand, but otherwise has no access to the wallet (i.e. it cannot recognize any payments in the blockchain).

2.3.2 Payment validator

This wallet tier combines the Address generator tier with the ability to also view received payments (including amounts). It is intended for validating paid orders. It cannot see outgoing payments and received change.

2.4 Full view-only wallets

Jamtis supports full view-only wallets that can identify spent outputs (unlike legacy view-only wallets), so they can display the correct wallet balance and list all incoming and outgoing transactions.

2.5 Janus attack mitigation

The Janus attack is a targeted attack that aims to determine whether two addresses A and B belong to the same wallet. Janus outputs are crafted in such a way that they appear to the recipient as being received to the wallet address B, while secretly using a key from address A. If the recipient confirms the receipt of the payment, the sender learns that the recipient owns both addresses A and B.

Jamtis prevents this attack by allowing the recipient to recognize a Janus output.

2.6 Robust output detection

Jamtis addresses and outputs contain an encrypted address tag which enables a more robust output detection mechanism that does not need a lookup table and can reliably detect outputs sent to arbitrary wallet addresses.

3. Notation

3.1 Serialization functions

  1. The function BytesToInt256(x) deserializes a 256-bit little-endian integer from a 32-byte input.
  2. The function Int256ToBytes(x) serializes a 256-bit integer to a 32-byte little-endian output.
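
The two serialization functions can be sketched in Python as follows. This is a minimal illustration that assumes non-negative integers below 2^256; the snake_case names are hypothetical and not part of the specification.

```python
def bytes_to_int256(x: bytes) -> int:
    """Deserialize a 256-bit little-endian integer from exactly 32 bytes."""
    if len(x) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(x, "little")


def int256_to_bytes(n: int) -> bytes:
    """Serialize a 256-bit integer to 32 little-endian bytes."""
    return n.to_bytes(32, "little")


# The integer 9 serializes to 09 followed by 31 zero bytes.
print(int256_to_bytes(9).hex())
```

The printed value has the same form as the serialized Curve25519 generator listed in § 3.3.1, whose x-coordinate is 9.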

3.2 Hash function

The function Hb(k, x) with parameters b, k, refers to the Blake2b hash function [10] initialized as follows:

  • The output length is set to b bytes.
  • Hashing is done in sequential mode.
  • The Personalization string is set to the ASCII value "Monero", padded with zero bytes.
  • If the key k is not null, the hash function is initialized using the key k (maximum 64 bytes).
  • The input x is hashed.

The function SecretDerive is defined as:

SecretDerive(k, x) = H32(k, x)
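
As an illustration, the hash function and SecretDerive could be sketched with Python's standard hashlib module, which exposes the output length, key and personalization parameters of Blake2b directly. The helper names H and secret_derive are hypothetical, and passing string inputs as ASCII bytes is an assumption.

```python
import hashlib
from typing import Optional


def H(b: int, k: Optional[bytes], x: bytes) -> bytes:
    """Blake2b with a b-byte digest, optional key k and 'Monero' personalization."""
    return hashlib.blake2b(
        x,
        digest_size=b,
        key=k or b"",        # an empty key means unkeyed hashing
        person=b"Monero",    # hashlib zero-pads the personalization to 16 bytes
    ).digest()


def secret_derive(k: bytes, x: bytes) -> bytes:
    """SecretDerive(k, x) = H32(k, x)."""
    return H(32, k, x)


secret = secret_derive(bytes(32), b"jamtis_generate_address_secret")
print(len(secret))
```

hashlib uses sequential mode by default (fanout and depth of 1), so no extra parameters are needed for that requirement.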

3.3 Elliptic curves

Two elliptic curves are used in this specification:

  1. Curve25519 - a Montgomery curve. Points on this curve include a cyclic subgroup 𝔾1.
  2. Ed25519 - a twisted Edwards curve. Points on this curve include a cyclic subgroup 𝔾2.

Both curves are birationally equivalent, so the subgroups 𝔾1 and 𝔾2 have the same prime order ℓ = 2^252 + 27742317777372353535851937790883648493. The total number of points on each curve is 8ℓ.

3.3.1 Curve25519

Curve25519 is used exclusively for the Diffie-Hellman key exchange [11].

Only a single generator point B is used:

Point Derivation Serialized (hex)
B generator of 𝔾1 0900000000000000000000000000000000000000000000000000000000000000

Private keys for Curve25519 are 32-byte integers denoted by a lowercase letter d. They are generated using the following KeyDerive1(k, x) function:

  1. d = H32(k, x)
  2. d[31] &= 0x7f (clear the most significant bit)
  3. d[0] &= 0xf8 (clear the least significant 3 bits)
  4. return d

All Curve25519 private keys are therefore multiples of the cofactor 8, which ensures that all public keys are in the prime-order subgroup. The multiplicative inverse modulo ℓ is calculated as d^-1 = 8*(8*d)^-1 to preserve the aforementioned property.
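
The KeyDerive1 steps and the cofactor-preserving inverse can be sketched as below. This is a minimal illustration, not reference code: the function names are hypothetical, the private key is returned as a Python integer, and `pow` with a negative exponent requires Python 3.8 or newer.

```python
import hashlib

# Prime order of the subgroups (see § 3.3)
L = 2**252 + 27742317777372353535851937790883648493


def key_derive1(k: bytes, x: bytes) -> int:
    """KeyDerive1: keyed 32-byte hash, then clear the top bit and the low 3 bits."""
    d = bytearray(hashlib.blake2b(x, digest_size=32, key=k, person=b"Monero").digest())
    d[31] &= 0x7F  # clear the most significant bit
    d[0] &= 0xF8   # clear the 3 least significant bits
    return int.from_bytes(d, "little")


def invert_keep_cofactor(d: int) -> int:
    """Inverse of d modulo L, computed as 8*(8*d)^-1 so it stays a multiple of 8."""
    return 8 * pow(8 * d, -1, L)


d = key_derive1(bytes(32), b"example input")
d_inv = invert_keep_cofactor(d)
print(d % 8, d_inv % 8, (d * d_inv) % L)
```

The result of `invert_keep_cofactor` is deliberately not reduced modulo ℓ, because the reduction would generally destroy divisibility by 8.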

Public keys (elements of 𝔾1) are denoted by the capital letter D and are serialized as the x-coordinate of the corresponding Curve25519 point. Scalar multiplication is denoted by a space, e.g. D = d B.

3.3.2 Ed25519

The Edwards curve is used for signatures and more complex cryptographic protocols [12]. The following three generators are used:

Point Derivation Serialized (hex)
G generator of 𝔾2 5866666666666666666666666666666666666666666666666666666666666666
U Hp("seraphis U") 126582dfc357b10ecb0ce0f12c26359f53c64d4900b7696c2c4b3f7dcab7f730
X Hp("seraphis X") 4017a126181c34b0774d590523a08346be4f42348eddd50eb7a441b571b2b613

Here Hp refers to an unspecified hash-to-point function.

Private keys for Ed25519 are 32-byte integers denoted by a lowercase letter k. They are generated using the following function:

KeyDerive2(k, x) = H64(k, x) mod ℓ
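
A minimal Python sketch of KeyDerive2, assuming the 64-byte hash output is interpreted as a little-endian integer (the byte order is an assumption; this section only defines the 256-bit case in § 3.1). The function name is hypothetical.

```python
import hashlib

# Prime order of the subgroups (see § 3.3)
L = 2**252 + 27742317777372353535851937790883648493


def key_derive2(k: bytes, x: bytes) -> int:
    """KeyDerive2(k, x) = H64(k, x) mod L, returned as an integer scalar."""
    h = hashlib.blake2b(x, digest_size=64, key=k, person=b"Monero").digest()
    return int.from_bytes(h, "little") % L  # assumption: little-endian


k = key_derive2(bytes(32), b"example input")
print(0 <= k < L)
```

Reducing a 512-bit value modulo the roughly 252-bit ℓ keeps the bias of the resulting scalar negligible.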

Public keys (elements of 𝔾2) are denoted by the capital letter K and are serialized as 256-bit integers, with the lower 255 bits being the y-coordinate of the corresponding Ed25519 point and the most significant bit being the parity of the x-coordinate. Scalar multiplication is denoted by a space, e.g. K = k G.

3.4 Block cipher

The function BlockEnc(s, x) refers to the application of the Twofish [13] permutation using the secret key s on the 16-byte input x. The function BlockDec(s, x) refers to the application of the inverse permutation using the key s.

3.5 Base32 encoding

"Base32" in this specification refers to a binary-to-text encoding using the alphabet xmrbase32cdfghijknpqtuwy01456789. This alphabet was selected for the following reasons:

  1. The order of the characters has a unique prefix that distinguishes the encoding from other variants of "base32".
  2. The alphabet contains all digits 0-9, which allows numeric values to be encoded in a human readable form.
  3. Excludes the letters o, l, v and z for the same reasons as the z-base-32 encoding [14].

4. Wallets

4.1 Wallet parameters

Each wallet consists of two main private keys and a timestamp:

Field Type Description
km private key wallet master key
kvb private key view-balance key
birthday timestamp date when the wallet was created

The master key km is required to spend money in the wallet and the view-balance key kvb provides full view-only access.

The birthday timestamp is important when restoring a wallet and determines the blockchain height where scanning for owned outputs should begin.

4.2 New wallets

4.2.1 Standard wallets

Standard Jamtis wallets are generated as a 16-word Polyseed mnemonic [7], which contains a secret seed value used to derive the wallet master key and also encodes the date when the wallet was created. The key kvb is derived from the master key.

Field Derivation
km BytesToInt256(polyseed_key) mod ℓ
kvb kvb = KeyDerive2(km, "jamtis_view_balance_key")
birthday from Polyseed

4.2.2 Multisignature wallets

Multisignature wallets are generated in a setup ceremony, where all the signers collectively generate the wallet master key km and the view-balance key kvb.

Field Derivation
km setup ceremony
kvb setup ceremony
birthday setup ceremony

4.3 Migration of legacy wallets

Legacy pre-Seraphis wallets define two private keys:

  • private spend key ks
  • private view-key kv

4.3.1 Standard wallets

Legacy standard wallets can be migrated to the new scheme based on the following table:

Field Derivation
km km = ks
kvb kvb = KeyDerive2(km, "jamtis_view_balance_key")
birthday entered manually

Legacy wallets cannot be migrated to Polyseed and will keep using the legacy 25-word seed.

4.3.2 Multisignature wallets

Legacy multisignature wallets can be migrated to the new scheme based on the following table:

Field Derivation
km km = ks
kvb kvb = kv
birthday entered manually

4.4 Additional keys

There are additional keys derived from kvb:

Key | Name | Derivation | Used to
dfr | find-received key | dfr = KeyDerive1(kvb, "jamtis_find_received_key") | scan for received outputs
dua | unlock-amounts key | dua = KeyDerive1(kvb, "jamtis_unlock_amounts_key") | decrypt output amounts
sga | generate-address secret | sga = SecretDerive(kvb, "jamtis_generate_address_secret") | generate addresses
sct | cipher-tag secret | sct = SecretDerive(sga, "jamtis_cipher_tag_secret") | encrypt address tags

The key dfr provides the ability to calculate the sender-receiver shared secret when scanning for received outputs. The key dua can be used to create a secondary shared secret and is used to decrypt output amounts.

The key sga is used to generate public addresses. It has an additional child key sct, which is used to encrypt the address tag.
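
The derivation of these four keys from the view-balance key could look like the following sketch. It assumes that kvb is passed to the hash as a 32-byte little-endian key and that domain strings are ASCII; both are assumptions, and all function names are hypothetical.

```python
import hashlib


def H(b: int, k: bytes, x: bytes) -> bytes:
    """Keyed Blake2b with a b-byte digest and 'Monero' personalization (§ 3.2)."""
    return hashlib.blake2b(x, digest_size=b, key=k, person=b"Monero").digest()


def key_derive1(k: bytes, x: bytes) -> int:
    """Curve25519 private key: clear the top bit and the low 3 bits (§ 3.3.1)."""
    d = bytearray(H(32, k, x))
    d[31] &= 0x7F
    d[0] &= 0xF8
    return int.from_bytes(d, "little")


def derive_additional_keys(k_vb: int) -> dict:
    """Derive the find-received, unlock-amounts, generate-address and cipher-tag keys."""
    kvb_bytes = k_vb.to_bytes(32, "little")  # assumption: serialized as in § 3.1
    s_ga = H(32, kvb_bytes, b"jamtis_generate_address_secret")
    return {
        "d_fr": key_derive1(kvb_bytes, b"jamtis_find_received_key"),
        "d_ua": key_derive1(kvb_bytes, b"jamtis_unlock_amounts_key"),
        "s_ga": s_ga,
        "s_ct": H(32, s_ga, b"jamtis_cipher_tag_secret"),  # child of s_ga
    }


keys = derive_additional_keys(12345)  # 12345 is a dummy key for illustration only
print(sorted(keys))
```

Because every derivation is a one-way keyed hash, a holder of s_ga can compute s_ct but cannot recover kvb, dfr or dua.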

4.5 Key hierarchy

The following figure shows the overall hierarchy of wallet keys. Note that the relationship between km and kvb only applies to standard (non-multisignature) wallets.

[Figure: key hierarchy]

4.6 Wallet access tiers

Tier | Knowledge | Off-chain capabilities | On-chain capabilities
AddrGen | sga | generate public addresses | none
FindReceived | dfr | recognize all public wallet addresses | eliminate 99.6% of non-owned outputs (up to § 5.3.5), link output to an address (except for change and self-spends)
ViewReceived | dfr, dua, sga | all | view all received except for change and self-spends (up to § 5.3.14)
ViewAll | kvb | all | view all
Master | km | all | all

4.6.1 Address generator (AddrGen)

This wallet tier can generate public addresses for the wallet. It doesn't provide any blockchain access.

4.6.2 Output scanning wallet (FindReceived)

Thanks to view tags, this tier can eliminate 99.6% of outputs that don't belong to the wallet. If provided with a list of wallet addresses, it can also link outputs to those addresses (but it cannot generate addresses on its own). This tier should provide a noticeable UX improvement with a limited impact on privacy. Possible use cases are:

  1. An always-online wallet component that filters out outputs in the blockchain. A higher-tier wallet can thus be synchronized 256x faster when it comes online.
  2. Third party scanning services. The service can preprocess the blockchain and provide a list of potential outputs with pre-calculated spend keys (up to § 5.2.4). This reduces the amount of data that a light wallet has to download by a factor of at least 256.
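
The 99.6% and 256x figures follow from a view tag that matches a non-owned output with probability 1/256. The short calculation below assumes an 8-bit, uniformly distributed view tag (the size is inferred from the factor 256) and uses a made-up number of outputs.

```python
VIEW_TAG_BITS = 8  # assumption: inferred from the factor of 256

false_positive_rate = 1 / 2**VIEW_TAG_BITS
eliminated = 1 - false_positive_rate

total_outputs = 1_000_000  # hypothetical number of outputs to scan
expected_candidates = total_outputs * false_positive_rate

print(f"eliminated: {eliminated:.1%}")               # eliminated: 99.6%
print(f"candidates left: {expected_candidates}")     # candidates left: 3906.25
```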

4.6.3 Payment validator (ViewReceived)

This tier combines the AddrGen and FindReceived tiers and can see all incoming payments to the wallet, but it cannot see outgoing payments or change outputs. It can be used for payment processing or auditing purposes.

4.6.4 View-balance wallet (ViewAll)

This is a full view-only wallet that can see all incoming and outgoing payments (and thus can calculate the correct wallet balance).

4.6.5 Master wallet (Master)

This tier has full control of the wallet.

4.7 Wallet public keys

There are 3 global wallet public keys. These keys are not usually published, but are needed by lower wallet tiers.

Key Name Value
Ks wallet spend key Ks = kvb X + km U
Dua unlock-amounts key Dua = dua B
Dfr find-received key Dfr = dfr Dua

5. Addresses

5.1 Address generation

Jamtis wallets can generate up to 2^128 different addresses. Each address is constructed from a 128-bit index j. The size of the index space allows stateless generation of new addresses without collisions, for example by constructing j as a UUID [15].

Each Jamtis address encodes the tuple (K1j, D2j, D3j, tj). The first three values are public keys, while tj is the "address tag" that contains the encrypted value of j.

5.1.1 Address keys

The three public keys are constructed as:

  • K1j = Ks + kuj U + kxj X + kgj G
  • D2j = daj Dfr
  • D3j = daj Dua

The private keys kuj, kxj, kgj and daj are derived as follows:

Keys Name Derivation
kuj spend key extensions kuj = KeyDerive2(sga, "jamtis_spendkey_extension_u" || j)
kxj spend key extensions kxj = KeyDerive2(sga, "jamtis_spendkey_extension_x" || j)
kgj spend key extensions kgj = KeyDerive2(sga, "jamtis_spendkey_extension_g" || j)
daj address keys daj = KeyDerive1(sga, "jamtis_address_privkey" || j)
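
The private-key derivations in the table can be sketched as follows, with j drawn statelessly from a random UUID. The serialization of j as 16 little-endian bytes and the ASCII domain strings are assumptions, and the function names are hypothetical.

```python
import hashlib
import uuid

L = 2**252 + 27742317777372353535851937790883648493


def H(b: int, k: bytes, x: bytes) -> bytes:
    return hashlib.blake2b(x, digest_size=b, key=k, person=b"Monero").digest()


def key_derive1(k: bytes, x: bytes) -> int:
    d = bytearray(H(32, k, x))
    d[31] &= 0x7F
    d[0] &= 0xF8
    return int.from_bytes(d, "little")


def key_derive2(k: bytes, x: bytes) -> int:
    return int.from_bytes(H(64, k, x), "little") % L


def address_private_keys(s_ga: bytes, j: int) -> dict:
    """Derive the spend key extensions and the address key for index j."""
    j_bytes = j.to_bytes(16, "little")  # assumption: 128-bit little-endian index
    return {
        "k_u": key_derive2(s_ga, b"jamtis_spendkey_extension_u" + j_bytes),
        "k_x": key_derive2(s_ga, b"jamtis_spendkey_extension_x" + j_bytes),
        "k_g": key_derive2(s_ga, b"jamtis_spendkey_extension_g" + j_bytes),
        "d_a": key_derive1(s_ga, b"jamtis_address_privkey" + j_bytes),
    }


j = uuid.uuid4().int                 # random 128-bit index, no stored counter needed
s_ga = bytes(range(32))              # dummy secret for illustration only
print(sorted(address_private_keys(s_ga, j)))
```

Only the secret sga is needed here, which is why the AddrGen tier can generate addresses without any view access.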

5.1.2 Address tag

Each address additionally includes an 18-byte tag tj = (j', hj'), which consists of the encrypted value of j:

  • j' = BlockEnc(sct, j)

and a 2-byte "tag hint", which can be used to quickly recognize owned addresses:

  • hj' = H2(sct, "jamtis_address_tag_hint" || j')
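
The tag construction can be illustrated with the sketch below. It takes an already encrypted index j' as input, because BlockEnc (Twofish) is not available in the Python standard library; the function names and the dummy values are hypothetical.

```python
import hashlib


def address_tag(s_ct: bytes, j_enc: bytes) -> bytes:
    """Build the 18-byte tag t_j = (j', h_j') from a 16-byte encrypted index j'."""
    if len(j_enc) != 16:
        raise ValueError("j' must be 16 bytes")
    hint = hashlib.blake2b(
        b"jamtis_address_tag_hint" + j_enc,
        digest_size=2,          # H2: 2-byte tag hint
        key=s_ct,
        person=b"Monero",
    ).digest()
    return j_enc + hint


def tag_hint_matches(s_ct: bytes, tag: bytes) -> bool:
    """Cheap ownership check: recompute the hint before attempting decryption."""
    return len(tag) == 18 and address_tag(s_ct, tag[:16]) == tag


s_ct = bytes(range(32))   # dummy cipher-tag secret
j_enc = bytes(16)         # placeholder for BlockEnc(s_ct, j)
tag = address_tag(s_ct, j_enc)
print(len(tag), tag_hint_matches(s_ct, tag))
```

A random 18-byte value passes the hint check with probability 2^-16, so almost all foreign tags are rejected without running the block cipher.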

5.2 Sending to an address

TODO

5.3 Receiving an output

TODO

5.4 Change and self-spends

TODO

5.5 Transaction size

Jamtis has a small impact on transaction size.

5.5.1 Transactions with 2 outputs

The size of 2-output transactions is increased by 28 bytes. The encrypted payment ID is removed, but the transaction needs two encrypted address tags t~ (one for the recipient and one for the change). Both outputs can use the same value of De.

5.5.2 Transactions with 3 or more outputs

Since there are no "main" addresses anymore, the TX_EXTRA_TAG_PUBKEY field can be removed from transactions with 3 or more outputs.

Instead, all transactions with 3 or more outputs will require one 50-byte tuple (De, t~) per output.
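
The byte counts in § 5.5.1 and § 5.5.2 can be reproduced with a short calculation. It assumes the 18-byte address tag from § 5.1.2, a 32-byte serialized key De, and the 8-byte legacy encrypted payment ID.

```python
TAG_BYTES = 18         # encrypted address tag t~ (§ 5.1.2)
PUBKEY_BYTES = 32      # serialized Curve25519 key De
PAYMENT_ID_BYTES = 8   # legacy encrypted payment ID that is removed

# 2-output transaction: two tags are added, the payment ID is dropped.
two_output_increase = 2 * TAG_BYTES - PAYMENT_ID_BYTES

# 3 or more outputs: one (De, t~) tuple per output.
tuple_bytes = PUBKEY_BYTES + TAG_BYTES

print(two_output_increase, tuple_bytes)  # 28 50
```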

6. Address encoding

6.1 Address structure

An address has the following overall structure:

Field Size (bits) Description
Header 30* human-readable address header (§ 6.2)
K1 256 address key 1
D2 255 address key 2
D3 255 address key 3
t 144 address tag
Checksum 40* (§ 6.3)

* The header and the checksum are already in base32 format

6.2 Address header

The address starts with a human-readable header, which has the following format consisting of 6 alphanumeric characters:

"xmra" <version char> <network type char>

Unlike the rest of the address, the header is never encoded and is the same for both the binary and textual representations. The string is not null terminated.

The software decoding an address shall abort if the first 4 bytes are not 0x78 0x6d 0x72 0x61 ("xmra").

The "xmra" prefix serves as a disambiguation from legacy addresses that start with "4" or "8". Additionally, base58 strings that start with the character x are invalid due to overflow [16], so legacy Monero software can never accidentally decode a Jamtis address.

6.2.1 Version character

The version character is "1". The software decoding an address shall abort if a different character is encountered.

6.2.2 Network type

network char network type
"t" testnet
"s" stagenet
"m" mainnet

The software decoding an address shall abort if an invalid network character is encountered.
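
The three header checks from § 6.2 can be combined into a single validation step. The following is a minimal sketch; the function name and error messages are hypothetical.

```python
NETWORKS = {"t": "testnet", "s": "stagenet", "m": "mainnet"}


def parse_header(address: str) -> str:
    """Validate the 6-character header and return the network type."""
    if len(address) < 6 or address[:4] != "xmra":
        raise ValueError("not a Jamtis address")
    if address[4] != "1":
        raise ValueError("unsupported address version")
    network = NETWORKS.get(address[5])
    if network is None:
        raise ValueError("invalid network type")
    return network


print(parse_header("xmra1mj0b1977bw3ympyh"))  # mainnet
```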

6.3 Checksum

The purpose of the checksum is to detect accidental corruption of the address. The checksum consists of 8 characters and is calculated with a cyclic code over GF(32) using the polynomial:

x^8 + 3x^7 + 11x^6 + 18x^5 + 5x^4 + 25x^3 + 21x^2 + 12x + 1

The checksum can detect all errors affecting 5 or fewer characters. Arbitrary corruption of the address has a chance of less than 1 in 10^12 of not being detected. Reference code for calculating the checksum is in Appendix A.
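
The error-detection property can be demonstrated with the sketch below, which restates the Appendix A routines under shorter names and corrupts a single character. The payload is a truncated, purely illustrative string rather than a valid address.

```python
CHARSET = "xmrbase32cdfghijknpqtuwy01456789"
GEN = [0x1ae45cd581, 0x359aad8f02, 0x61754f9b24, 0xc2ba1bb368, 0xcd2623e3f0]
M = 0xffffffffff


def polymod(data):
    """Remainder of the 5-bit symbol sequence modulo the generator polynomial."""
    c = 1
    for v in data:
        b = c >> 35                          # symbol shifted out of the 40-bit state
        c = ((c & 0x07ffffffff) << 5) ^ v
        for i in range(5):
            if (b >> i) & 1:
                c ^= GEN[i]
    return c


def create_checksum(data):
    p = polymod(data + [0] * 8) ^ M
    return [(p >> 5 * (7 - i)) & 31 for i in range(8)]


def verify(data):
    return polymod(data) == M


payload = [CHARSET.index(ch) for ch in "xmra1mj0b1977bw3ympyh"]  # illustrative only
encoded = payload + create_checksum(payload)

corrupted = list(encoded)
corrupted[7] ^= 1                            # change a single character

print(verify(encoded), verify(corrupted))    # True False
```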

6.4 Binary-to-text encoding

An address can be encoded into a string as follows:

address_string = header + base32(data) + checksum

where header is the 6-character human-readable header string (already in base32), data refers to the address tuple (K1, D2, D3, t), encoded in 910 bits, and the checksum is the 8-character checksum (already in base32). The total length of the encoded address is 196 characters (=6+182+8).
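
The length arithmetic can be checked from the field sizes listed in § 6.1, given that each base32 character carries 5 bits:

```python
FIELD_BITS = {"K1": 256, "D2": 255, "D3": 255, "t": 144}  # sizes from § 6.1
HEADER_CHARS = 6
CHECKSUM_CHARS = 8
BITS_PER_CHAR = 5  # base32

data_bits = sum(FIELD_BITS.values())
assert data_bits % BITS_PER_CHAR == 0      # no padding bits are needed
data_chars = data_bits // BITS_PER_CHAR

total_chars = HEADER_CHARS + data_chars + CHECKSUM_CHARS
print(data_bits, data_chars, total_chars)  # 910 182 196
```

The two 255-bit Curve25519 keys are what make the payload an exact multiple of 5 bits.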

6.4.1 QR Codes

While the canonical form of an address is lower case, when encoding an address into a QR code, the address should be converted to upper case to take advantage of the more efficient alphanumeric encoding mode.

6.5 Recipient authentication

TODO

7. Test vectors

TODO

References

  1. https://github.com/UkoeHB/Seraphis
  2. https://github.com/monero-project/research-lab/blob/master/whitepaper/whitepaper.pdf
  3. monero-project/meta#299 (comment)
  4. https://www.getmonero.org/resources/user-guides/view_only.html
  5. https://web.getmonero.org/2019/10/18/subaddress-janus.html
  6. monero-project/monero#8138
  7. https://github.com/tevador/polyseed
  8. monero-project/monero#7889
  9. monero-project/research-lab#73
  10. https://eprint.iacr.org/2013/322.pdf
  11. https://cr.yp.to/ecdh/curve25519-20060209.pdf
  12. https://ed25519.cr.yp.to/ed25519-20110926.pdf
  13. https://www.schneier.com/wp-content/uploads/2016/02/paper-twofish-paper.pdf
  14. http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
  15. https://en.wikipedia.org/wiki/Universally_unique_identifier
  16. https://github.com/monero-project/monero/blob/319b831e65437f1c8e5ff4b4cb9be03f091f6fc6/src/common/base58.cpp#L157

Appendix A: Checksum

# Jamtis address checksum algorithm

# cyclic code based on the generator 3BI5PLC1
# can detect 5 errors up to the length of 994 characters
GEN=[0x1ae45cd581, 0x359aad8f02, 0x61754f9b24, 0xc2ba1bb368, 0xcd2623e3f0]

M = 0xffffffffff

def jamtis_polymod(data):
    c = 1
    for v in data:
        b = (c >> 35)
        c = ((c & 0x07ffffffff) << 5) ^ v
        for i in range(5):
            c ^= GEN[i] if ((b >> i) & 1) else 0
    return c

def jamtis_verify_checksum(data):
    return jamtis_polymod(data) == M

def jamtis_create_checksum(data):
    polymod = jamtis_polymod(data + [0,0,0,0,0,0,0,0]) ^ M
    return [(polymod >> 5 * (7 - i)) & 31 for i in range(8)]

# test/example

CHARSET = "xmrbase32cdfghijknpqtuwy01456789"

addr_test = (
    "xmra1mj0b1977bw3ympyh2yxd7hjymrw8crc9kin0dkm8d3"
    "wdu8jdhf3fkdpmgxfkbywbb9mdwkhkya4jtfn0d5h7s49bf"
    "yji1936w19tyf3906ypj09n64runqjrxwp6k2s3phxwm6wr"
    "b5c0b6c1ntrg2muge0cwdgnnr7u7bgknya9arksrj0re7wh")

addr_data = [CHARSET.find(x) for x in addr_test]
addr_enc = addr_data + jamtis_create_checksum(addr_data)
addr = "".join([CHARSET[x] for x in addr_enc])

print(addr)
print("len =", len(addr))
print("valid =", jamtis_verify_checksum(addr_enc))
@UkoeHB

UkoeHB commented Sep 12, 2023

Why can't the bitsize of the "standard" view tag scale with volume to keep the false positive rate roughly constant? E.g. when tx volume doubles, one bit is added to the view tag deterministically. That would react much more smoothly and provide plausible deniability under all conditions.

This can be abused by a malicious remote scanning service to reduce the anonymity of users by spamming the chain.

@tevador
Author

tevador commented Sep 12, 2023

Malicious actors can reduce the anonymity of users by spamming the chain right now.

@UkoeHB

UkoeHB commented Sep 12, 2023

Yes but a dynamic view tag would make spam more damaging.

@tevador
Author

tevador commented Sep 12, 2023

The options are:

  1. Fixed-size view tag: Either good plausible deniability now and possibly inadequate filtering later, or vice versa (or somewhere in between).
  2. Multiple view tags of various sizes: coarse tuning of the false positive rate; susceptible to spam attacks (an attacker can spam for a while to make users subscribe with the larger tag); retroactive privacy loss when switching to a larger tag.
  3. Dynamic-size view tag: fine tuning of the false positive rate, no retroactive privacy loss, susceptible to spam attacks.

Choose your poison.

@jeffro256

The fourth option, which is what @UkoeHB was proposing, is a fixed-size view tag but optionally enable third parties to compute nominal address tags, which reduces light-side single-core compute time by about 100x, but increases the bandwidth by ~10%.

@tevador
Author

tevador commented Sep 12, 2023

> optionally enable third parties to compute nominal address tags

This is orthogonal, can be added to any of the above 3 options. The important point is that it does not reduce the bandwidth requirements for light clients.

@jeffro256

jeffro256 commented Sep 13, 2023

I don't know if this idea has ever been floated before, and I'm making this up right now, but we could do dynamic view tags that 1) aren't susceptible to spam attacks, 2) scale as the receiver wishes and 3) all look uniform on-chain while keeping transaction size the same. They would have an absolute maximum size set by consensus (say 2-bytes, or 3-bytes if we're pushing it, but 2-bytes is probably fine for a maximum). All tags on-chain will show up as this constant length. Let's call this number of bits b_max. The actual length of view tag/the amount of filtering that a receiver desires is encoded in the address. Let's call this value b_addr. We shouldn't give users too many options, else they will partition themselves too finely and addresses could be attempted to be correlated. Let's say that we give the users 8 (could be any number and 4 might be better) choices, which means the size of integer b_addr is fixed at 3 bits (which we could actually fit into an address in my proposal without expanding the address size since there's 4 unused bits). Let's say that our 8 choices for b_addr (the utilized bit length of the view-assist tag) are 1, 2, 4, 6, 8, 10, 12, or 16 bits wide. The higher b_addr is, the more efficient scanning is, but the smaller the anonymity pool is. Full wallets would likely set this value as high as it will go (since they lose no privacy either way if they are giving up that private key). Light wallets would select a good value for them, then send k_va (view-assist) and b_addr to their light wallet server.

Senders, when sending to an address, will extract b_addr from the address and encode b_addr number of bits into the view-assist tag, and whatever bits are left in the view tag space (b_max - b_addr), they will fill with random noise (this part is important to not miss otherwise we might accidentally filter using more bits than intended).

Light wallet servers, who know b_addr for each user, when doing DH exchanges against k_va, will match b_addr bits and send those records to the light wallet client, who scans them as usual.
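
A rough Python sketch of this idea, covering the sender filling the unused tag bits with noise and the server matching only b_addr bits. The 16-bit b_max, placing the meaningful bits in the most significant positions, and all names are assumptions made for illustration.

```python
import secrets

B_MAX = 16  # assumption: consensus maximum of 2 bytes


def make_tag(derived_tag: int, b_addr: int) -> int:
    """Sender: keep the top b_addr bits of the derived tag, randomize the rest."""
    noise_bits = B_MAX - b_addr
    kept = (derived_tag >> noise_bits) << noise_bits
    noise = secrets.randbits(noise_bits) if noise_bits else 0
    return kept | noise


def tag_matches(tag: int, derived_tag: int, b_addr: int) -> bool:
    """Server: compare only the b_addr bits the receiver asked for."""
    shift = B_MAX - b_addr
    return (tag >> shift) == (derived_tag >> shift)


derived = 0xABCD                     # dummy value standing in for the DH-derived tag
tag = make_tag(derived, b_addr=6)
print(tag_matches(tag, derived, 6))  # True
```

Without the random fill, the remaining bits would also match and the server could filter with more bits than the receiver intended.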

Cons: 1) Partitioning on receive addresses can happen. 2) Malicious senders can use more bits than requested (b_addr) to signal probabilistically to someone's light wallet server that an incoming normal enote belongs to said receiver. 3) If a receiver creates a receive address with b_addr1, then gives that address to a sender, then wants to increase b_addr1 to b_addr2>b_addr1, the receiver might not properly scan a transaction sent with the old b_addr1, and will need the sender to tell the receiver the transaction ID. Not too worried about point 2 since it's already possible to construct a transaction then blab about it. Point 1 is a little trickier.

Pros: 1) Many options for users so they are incentivized to not completely bomb their privacy even with incredibly high transaction volume and bad connectivity, 2) on-chain uniformity, 3) view tags not susceptible to spam attacks 4) no retroactive loss of privacy when increasing b_addr (moving to less private, more efficient tier) 5) There are options to do less than 1:256 view filter, e.g. 1:64 filtering.

What do you think?

@jeffro256

In the context of a Jamtis protocol where we do 2 DH operations for normal enotes anyways (to guard the sender-receiver secret from light wallet servers), all txs could also still include the independent fixed-size view tag for the second key (like the "sparse view tag" in my original proposal). Full wallets could use this fixed-size view tag as the first tag they scan against, allowing them to generate random values of b_addr for their addresses to help mitigate partitioning, while not affecting their scan time by more than fractions of a percent.

@tevador
Author

tevador commented Sep 13, 2023

Interesting idea, but:

  1. Users of remote scanning services might be coerced or tricked into using the longest possible tag, getting zero privacy, while costing the malicious service nothing.
  2. Any address with fewer than the maximum number of bits would immediately leak the fact that the user is using a remote scanner.
  3. Horrible UX when changing the tag size. Payments sent to old addresses would not be recognized and asking the sender for the TXID is not always possible (e.g. donations).

Compare that with a dynamic tag size calculated from the running average tx volume over the last 100 000 blocks so that the mean false positive rate is about 256 tag matches per day:

  1. Malicious services would have to spam constantly at least 50% of the transaction volume to add 1 bit to the tag size (reducing the effective false positive rate to 128 matches per day). This attack is not free, the attacker is paying transaction fees.
  2. Neither addresses nor transactions leak anything.
  3. No UX problems because there is always agreement about the tag size that was used in a transaction.
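
The dynamic sizing rule and the cost of the spam attack can be sketched as follows. The rounding rule (largest tag size that keeps at least 256 expected matches per day) and the function names are assumptions; the comment above only states the target of about 256 matches per day.

```python
TARGET_MATCHES_PER_DAY = 256


def dynamic_tag_bits(avg_outputs_per_day: int) -> int:
    """Largest tag size that still leaves at least the target number of matches."""
    ratio = avg_outputs_per_day // TARGET_MATCHES_PER_DAY
    return max(0, ratio.bit_length() - 1)    # floor(log2(ratio)) using integers


def expected_matches(outputs_per_day: int, bits: int) -> float:
    return outputs_per_day / 2**bits


honest = 65_536                               # hypothetical honest outputs per day
bits = dynamic_tag_bits(honest)
print(bits, expected_matches(honest, bits))   # 8 256.0

# An attacker who doubles the volume (50% of all outputs are spam) adds one bit,
# which halves the number of honest false-positive matches.
spam = 65_536
bits_spam = dynamic_tag_bits(honest + spam)
print(bits_spam, expected_matches(honest, bits_spam))  # 9 128.0
```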

@tevador
Author

tevador commented Sep 13, 2023

> Full wallets could use this fixed-size view tag as the first tag they scan against, allowing them to generate random values of b_addr for their addresses to help mitigate partitioning, while not affecting their scan time by more than fractions of a percent.

You have to look at it from a game-theoretic perspective. If users can make a choice that improves their experience regardless of what other users do, you have to assume they will make that choice (see prisoner's dilemma). Full wallet users will use the full tag size if they can speed up their scan time by a fraction of a percent.

@jeffro256

Yeah the downsides are kinda weird and hard to reason about/plan OPSEC for. If there was a way to allow someone to encode a certain level of entropy to be received by someone else without the sender knowing what the level of entropy is, I think that'd be the way to go, but I don't know if that's possible.

@jeffro256

I'm much less sure than before, but I still think that a mix of a 1-byte and a 2-byte fixed-size view tag is the best option (assuming we're doing 2 DH operations to get q). I think we should put a ton of effort into plan A, making sure the scanning compute process is as optimized as possible, and uses as many processors as are available on any given machine. The performance tests show that even doing 1 DH op (instead of 1 Twofish op) for every received record on the light wallet client-side keeps up with modern, middle-of-the-road bandwidth speed, even using just a single core.

The fact that the view tag options are so coarse might hopefully incentivize us developers in the future to push hard for plan A for light wallet users so that they don't switch to option B, the 2-byte view tags, but if they do, at least they won't have deterministic downsides and will more-or-less know what they're getting into: 256x smaller anonymity set.

@tevador
Author

tevador commented Sep 14, 2023

I don't think we should allow users to select the view tag size. There should be only one view tag and I'm more in favor of the dynamic size as it's a more future-proof solution and I'm not convinced that spam attacks are a real problem compared to the alternatives.

I'm proposing the following:

Jamtis with dynamic view tags

Pros

  • Third-parties who compute view tags on behalf of users can no longer strongly identify incoming normal enotes to known public addresses. (same as the proposal by @jeffro256)
  • Third-parties who compute view tags on behalf of users can no longer strongly identify incoming normal enotes sent to a public address that is used more than once. (same as the proposal by @jeffro256)
  • Third-parties can now compute view tags and generate public addresses on behalf of users without the ability to learn any additional balance recovery information. (same as the proposal by @jeffro256)
  • Light wallets have a fixed bandwidth (about 200KB/day) and CPU (about 100 ms/day) cost regardless of the transaction volume. These costs are so low that no third party provider should be able to successfully argue for users to hand over higher tier private keys.
  • Users cannot shoot themselves in the foot by selecting a view tag size that doesn't have enough false-positive matches.

Cons

  • Public address length is increased from 196 to 244 characters. (same as the proposal by @jeffro256)
  • Third-parties who compute view tags on behalf of users can spam the network to reduce the effective number of false-positive matches of their users.
  • Additional ~40 ms of CPU time per day for users who scan the blockchain locally, but this is negligible.
  • Additional complexity in the specs

Changes

Private keys and wallet tiers

The number of private keys stays the same, but some keys have a different function and have been renamed:

  • d_ua "unlock-amounts" -> d_vr "view-received"
  • d_fr "find-received" -> "filter-received"
k_m (master key)
 |
 |
 |
 +- k_vb (view-balance key)
     |
     |
     |
     +- d_vr (view-received key)
         |
         |
         |
         +- d_fr (filter-received key)
         |
         |
         |
         +- s_ga (generate-address secret)
             |
             |
             |
             +- s_ct (cipher-tag secret)

This cleanly maps to the supported wallet tiers:

| Tier | Knowledge | Off-chain capabilities | On-chain capabilities |
|---|---|---|---|
| Master | k_m | all | all |
| ViewBalance | k_vb | all | view all |
| ViewReceived | d_vr | all | view all received except for change and self-spends |
| FilterReceived | d_fr | recognize all public wallet addresses | calculate view tags |
| GenAddr | s_ga | generate public addresses | none |

GenAddr + FilterReceived can be safely combined. The key hierarchy ensures that no additional tiers can be constructed.

Addresses

Addresses consist of 4 public keys:

  1. K^j_1 = K_s + k^j_u U + k^j_x X + k^j_g G (unchanged)
  2. D^j_2 = (1 / d^j_a) * d_fr * B
  3. D^j_3 = (1 / d^j_a) * d_vr * B
  4. D^j_4 = (1 / d^j_a) * B

B is the Curve25519 base point. Note the inverted usage of d^j_a, which simplifies enote recovery.

There is no tag hint, so only j' = BlockEnc(s_ct, j) is part of the address. The total address length in base32 is 244 characters including the prefix and checksum.

Key exchange

The sender generates an ephemeral private key d_e and calculates D_e = d_e * D^j_4.

Shared secrets

There are 3 DH shared secrets:

  1. DH_1 = d_e * D^j_2 = d_fr * D_e
  2. DH_2 = d_e * D^j_3 = d_vr * D_e
  3. DH_3 = d_e * B = d^j_a * D_e
  • DH_1 is used to calculate the view tag.
  • DH_2 is used to derive the first high-level shared secret: s^sr_1 = H(DH_2 || D_e || input_context)
  • DH_3 is used to derive the second high-level shared secret: s^sr_2 = H(DH_3)
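
The three identities above can be checked numerically. The following is a minimal sketch that swaps Curve25519 for a toy prime-order subgroup of the integers modulo a small prime, so that scalar multiplication k * X becomes modular exponentiation. The group parameters P, Q and the toy base point B, the helper names (smul, inv, make_address_keys, sender_secrets, recipient_secrets) and the key values are all illustrative assumptions, not part of the proposal, and the toy group provides no security.

```python
# Toy stand-in for Curve25519: the order-Q subgroup of Z_P^* (P = 2Q + 1).
# "Scalar multiplication" k * X is modelled as X^k mod P. Not secure.
P = 2039   # small safe prime (assumption, for illustration only)
Q = 1019   # prime order of the subgroup generated by B
B = 4      # plays the role of the base point B

def smul(k: int, point: int) -> int:
    """Toy scalar multiplication k * point."""
    return pow(point, k % Q, P)

def inv(k: int) -> int:
    """Inverse of a scalar modulo the group order (the 1 / d^j_a term)."""
    return pow(k, -1, Q)

def make_address_keys(d_fr: int, d_vr: int, d_a: int):
    """Return (D^j_2, D^j_3, D^j_4) for the address private key d^j_a."""
    d_4 = smul(inv(d_a), B)
    return smul(d_fr, d_4), smul(d_vr, d_4), d_4

def sender_secrets(d_e: int, d_2: int, d_3: int, d_4: int):
    """Sender side: returns D_e and the tuple (DH_1, DH_2, DH_3)."""
    return smul(d_e, d_4), (smul(d_e, d_2), smul(d_e, d_3), smul(d_e, B))

def recipient_secrets(d_fr: int, d_vr: int, d_a: int, big_d_e: int):
    """Recipient side: the same three secrets, computed from D_e."""
    return smul(d_fr, big_d_e), smul(d_vr, big_d_e), smul(d_a, big_d_e)

# Hypothetical key values, chosen arbitrarily in [1, Q - 1].
d_fr, d_vr, d_a, d_e = 123, 456, 789, 321
D_2, D_3, D_4 = make_address_keys(d_fr, d_vr, d_a)
D_e, sender_view = sender_secrets(d_e, D_2, D_3, D_4)
assert sender_view == recipient_secrets(d_fr, d_vr, d_a, D_e)
```

Because 1 / d^j_a is baked into the address keys, the recipient needs no inversion during scanning: DH_1 and DH_2 use only wallet-level keys, and only DH_3 involves the per-address key.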

Self-send enotes use a different construction for the high-level secrets (unchanged).

View tags

The view tag is calculated by hashing DH_1 together with K_o (both for normal and self-send enotes).

View tag filter target

The view tag size is dynamic and is automatically adjusted based on the transaction volume so that the false positive rate (the number of view tag matches) is 480 enotes/day. Because the view tag filter rate must be a power of 2, this will actually result in a range from 480 to 960 enotes per day depending on the tx volume. If we "average the averages" over all possible values of tx volume, this will give a mean of 720 enote matches per day, or roughly 1 match per block, which is what was suggested by @jeffro256. I think this is close to the upper limit of what is acceptable for light wallet clients (~200 KB/day) and should provide a good number of false positives even if there were a short-term drop in tx volume.

The formula to calculate the view tag size in bits is:

tag_size = trunc(log2(3 * num_outputs_100k / 200000))

where num_outputs_100k is the total number of outputs in the last 100 000 blocks. The trunc(log2(x)) function can be easily calculated using only integer operations (it's basically the position of the most significant bit).

As an example, the value of num_outputs_100k is currently about 7.9 million, which results in a view tag size of 6 bits when plugged into the formula. With around 56000 daily outputs, there will be about 880 matches per day. If the long-term daily volume increases to about 62000 outputs, the view tag size will be increased to 7 bits and the number of matches will drop to 480 per day.
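
The formula and the worked example can be reproduced with integer arithmetic only. This is a sketch, not a reference implementation: the function names are made up for illustration, and clamping to the 1-16 bit range is my reading of the proposal rather than something it spells out.

```python
def raw_view_tag_size(num_outputs_100k: int) -> int:
    """trunc(log2(3 * num_outputs_100k / 200000)) using integers only.

    bit_length() - 1 is the position of the most significant bit.
    Returns -1 when the ratio is below 1; the clamp below handles that.
    """
    ratio = (3 * num_outputs_100k) // 200_000
    return ratio.bit_length() - 1

def view_tag_size(num_outputs_100k: int, lo: int = 1, hi: int = 16) -> int:
    """Tag size clamped to a supported range (clamping is an assumption)."""
    return max(lo, min(hi, raw_view_tag_size(num_outputs_100k)))

def matches_per_day(daily_outputs: int, tag_size: int) -> float:
    """Expected number of view tag matches per day for one wallet."""
    return daily_outputs / (1 << tag_size)

print(view_tag_size(7_900_000))      # 6
print(matches_per_day(56_000, 6))    # 875.0
print(matches_per_day(62_000, 7))    # 484.375
```

Flooring the ratio before taking the most significant bit gives the same result as trunc(log2(x)) for every x >= 1, which is the only range where the unclamped value matters.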

View tag size encoding

The view tag size must be encoded explicitly to avoid UX issues with missed transactions at times when the view tag size changes. This can be done with a 1-byte field per transaction (all outputs will use the same tag size).

I'm proposing a range of valid values for the tag size between 1 and 16 bits.

A 1-bit view tag requires num_outputs_100k > 133333. Since there are always at least 100k coinbase outputs, the 1-bit view tag would be "too large" only if there were fewer than 120 transactions per day, which hasn't happened on mainnet except for a few weeks shortly after launch in 2014.

A 17-bit view tag that would overflow the supported range would require num_outputs_100k > 8738133333, an increase of more than 1000x over the current tx volume. If this somehow happened, the number of false positives would exceed 960 per day, which would only have performance implications for light wallets, but would not cause any privacy problems.

So the proposed range of 1-16 bits is sufficient.

Complementary view tag

Regardless of the tag_size, the view tag is always encoded in 2 bytes as a 16-bit integer per enote. The remaining bits are filled with a "complementary" view tag calculated from s^sr_1, which needs a different private key.

For example, with tag_size = 6, the 16 bits would be CCCCCCCCCCTTTTTT, where T is a view tag bit and C is a complementary view tag bit.

Third-party scanning

The intended use is to provide d_fr to a third party, who can then calculate the "T" bits of the view tag and filter out non-matching enotes. There will always be a sufficient number of false positives so that the third party cannot learn with certainty which enotes are owned by the user. The light wallet can then calculate the "C" bits and further filter out enotes. On average, the light wallet will need to recompute K_o for 1 enote out of 65536.

Users might be tempted to provide the view-received key d_vr to the third party to speed up scanning. However, this does not save any bandwidth in practice because the server can't calculate s^sr_1 for self-send enotes. It only saves a minuscule amount of CPU time (~100 ms/day at best) in exchange for a loss of privacy for all incoming payments (including amounts).

Similarly, users might be tempted to provide the view-balance key k_vb to the third party to speed up scanning. This would save about 200 KB/day in exchange for a complete loss of privacy.

These unintended use cases are sufficiently unfavorable to restrict third party scanners to the FilterReceived wallet tier.

Scanning speed

The following table shows the cryptographic operations needed to recognize owned enotes for different types of wallets (assuming the wallet does not receive more than a few payments per day). I'm ignoring symmetric crypto operations for simplicity (they are negligible).

| Wallet type | For each enote | For ~720 enotes/day | For 1/65536 enotes |
|---|---|---|---|
| Full wallet (ViewBalance) | 1x DH | 1x DH | 3x recompute K_o |
| Full wallet (ViewReceived) | 1x DH | 1x DH | 1x recompute K_o |
| Light wallet (ViewBalance) | - | 1x DH | 3x recompute K_o |
| Light wallet (ViewReceived) | - | 1x DH | 1x recompute K_o |

Here "Light wallet" refers to a wallet that downloads data from a FilterReceived wallet service. The ViewBalance tiers need to recompute each K_o three times to detect self-send enotes.

To get an idea about the required bandwidth and CPU time, I'm estimating 256 bytes of data per view tag match, 50 μs of CPU time for DH and 50 μs of CPU time to recompute K_o (recomputing K_o needs 3 fixed-base scmults, which are about 3-4x faster than variable-base scmults for DH).

| Wallet type | Bandwidth/day | CPU/day |
|---|---|---|
| Full wallet (ViewBalance) | depends on tx volume | depends on tx volume |
| Full wallet (ViewReceived) | depends on tx volume | depends on tx volume |
| Light wallet (ViewBalance) | 180 KB | <144 ms |
| Light wallet (ViewReceived) | 180 KB | <72 ms |

So even when opening a light wallet after 1 month, sync times should be on the order of a few seconds regardless of future transaction volumes.
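
The light wallet figures follow directly from the estimates above. The short calculation below reproduces them; the constant and function names are illustrative, and the CPU numbers are upper bounds because they assume K_o is recomputed for every view tag match, whereas the complementary view tag filters most matches out first.

```python
MATCHES_PER_DAY = 720    # mean number of view tag matches per day
BYTES_PER_MATCH = 256    # estimated data downloaded per match
DH_US = 50               # estimated microseconds per DH operation
RECOMPUTE_US = 50        # estimated microseconds per K_o recomputation

def light_wallet_daily_cost(k_o_attempts: int):
    """Upper bound on (KB/day, ms/day) for a light wallet.

    k_o_attempts is 1 for ViewReceived and 3 for ViewBalance.
    """
    kb = MATCHES_PER_DAY * BYTES_PER_MATCH / 1024
    ms = MATCHES_PER_DAY * (DH_US + k_o_attempts * RECOMPUTE_US) / 1000
    return kb, ms

print(light_wallet_daily_cost(3))          # (180.0, 144.0)
print(light_wallet_daily_cost(1))          # (180.0, 72.0)
print(30 * light_wallet_daily_cost(3)[1])  # 4320.0 ms of CPU for a 30-day backlog
```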

Practical issues

How does the sending wallet figure out what view tag size to use?

Current Monero wallets already have that information. Wallets call the RPC function get_output_distribution when constructing a tx to pick decoys. This distribution contains enough information (the number of outputs in each block) to calculate the number of bits the view tag should have.

With full-chain membership proofs, wallets will still have to make an RPC call to get the current fee estimate, so that could also be used to get the current view tag size. A rough estimate could be made from the knowledge of the number of leaf nodes in the output tree.

What if a malicious sender purposely selects a shorter view tag (to cause more computation for all wallets) or a longer view tag (to reduce the recipient's light wallet privacy)?

There could be a relay rule that rejects transactions that use a view tag size other than the current or the previous one (i.e. 1 bit shorter if tx volume is growing or 1 bit longer if tx volume is dropping). It could also be enforced by consensus, but that seems like overkill.
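
A sketch of what such a relay check might look like. The function name and the way a node would track the current and previous sizes are assumptions; nothing like this exists in the current codebase as far as this thread states.

```python
def is_relayable(tx_tag_size: int, current_size: int, previous_size: int) -> bool:
    """Hypothetical relay rule: accept only the current or the previous tag size."""
    in_range = 1 <= tx_tag_size <= 16   # proposed valid range for the 1-byte field
    return in_range and tx_tag_size in (current_size, previous_size)

# Volume grew recently: the size moved from 6 to 7 bits.
assert is_relayable(7, current_size=7, previous_size=6)
assert is_relayable(6, current_size=7, previous_size=6)
assert not is_relayable(5, current_size=7, previous_size=6)
```

Accepting the previous size gives wallets with slightly stale volume data a grace period around a size change.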

@jeffro256

jeffro256 commented Sep 14, 2023

Shouldn't the calculation be DH_3 = d_e * G = 1/(d^j_a * b) * D_e?

I personally think we should set the target to 1 enote false positive per blocktime (2 minutes) to confound timing attacks. If there's a view tag hit almost every single time that a block is submitted, I imagine this would mitigate a lot of timing attacks for low wallet usage. That's about 2.8x what you're proposing, but that's still very doable today, and since it's a constant throughput, compute and bandwidth will quickly catch up.

I'm liking this proposal, and just have one more modification: a three byte fixed-size view tag for DH_2. This 1) makes full wallet scanning faster and not dependent on sender-submitted fields (the view tag width), 2) also speeds up light-wallet client side scanning as a byproduct, and 3) most importantly, completely nukes the incentive for a light wallet user to hand over their "filter-received key" else they will have no normal enote privacy. Con: enotes are 3 bytes bigger.

@tevador
Author

tevador commented Sep 14, 2023

Also shouldn't it be DH_3 = d_e * G = 1/(d^j_a * b) * D_e?

I'm using B to denote the Curve25519 base point. B = ed25519_pk_to_curve25519(G).

Second, I personally think we should set the target to 1 enote false positive per blocktime (2 minutes) to confound timing attacks. If there's a view tag hit almost every single time that a block is submitted, I imagine this would mitigate a lot of timing attacks for low wallet usage. That's about 2.8x what you're proposing, but that's still very doable today, and since it's a constant throughput, compute and bandwidth will quickly catch up.

Yes, the target could be higher than 256. I chose 256/day as it matches an 8-bit view tag with the current tx volume. The lower bound for the target is 144/day to hide when an output is spent soon after the 10 block lock time. The upper bound is only limited by the bandwidth cost for light wallets.

a three byte fixed-size view tag for DH_2. This 1) makes full wallet scanning faster and not dependent on sender-submitted fields (the view tag width), 2) also speeds up light-wallet client side scanning as a byproduct.

I'm not really sure if this is worth the ~25-70 ms of CPU time per day it would save.

3) most importantly, completely nukes the incentive for a light wallet user to hand over their "filter-received key" else they will have no normal enote privacy

Did you mean "view-received key"?

@jeffro256

jeffro256 commented Sep 15, 2023

I'm not really sure if this is worth the ~25-70 ms of CPU time per day it would save.

Here "Light wallet" refers to a wallet that downloads data from a FilterReceived wallet service. The ViewAll tiers need to recompute each K_o twice to detect self-send enotes.

You actually need to try 1 + <number of self-send types> times. For the current Jamtis code in seraphis_lib with PLAIN, DUMMY, CHANGE, & SELFSPEND enote types, this is 4 total K_o re-computations per filter-received enote hit, which might end up being not insignificant for total scan-time. However, since this cost doesn't scale up over time with dynamic view tags, I guess that I'm more okay with it as long as there's no address tag hint to tempt people to disclose the view-received private key.

I'm using B to denote the Curve25519 base point. B = ed25519_pk_to_curve25519(G)

Ah okay I thought B was D^j_ua (AKA DH Base).

Did you mean "view-received key"?

Yes I did, sorry.

Note the inverted usage of d^j_a, which simplifies enote recovery.

I do really like this feature, and AFAIK, inverting the address private key in the address, not in balance recovery, is orthogonal to all of the previously discussed changes, which is nice.

D^j_4 = (1 / d^j_a) * B

I like the simplicity of this, but if we're missing some sort of d_ua unlock-amounts factor, then we can't have tier(s) which identify transactions that we're involved in (by recomputing K_o) without knowing the amounts, which makes cold/hot/hardware wallet separation more private, but just as convenient. And since we're using the x25519 curve for this portion of the protocol, we can cache the value of d_ua * B and then multiply by d^j_a to get s^sr_2, and it's all just as performant.

To expand on the last point, we could have all secret keys (besides cipher-tag) below the view-balance secret in the derivation tree: view-received, view-sent (new key explained below), unlock-amounts, generate-address (moved out from under view-received), and filter-involved (basically the same as filter-received but the name needs an update since we use it also for outgoing). Then we can mix and match the unlock-amounts key with/without view-received and view-sent keys to create different tiers while keeping the number of operations in balance recovery the same. The new derivation tree would look like:

Private Keys

k_m (private master key)
 |
 |
 |
 +- k_vb (private view-balance key)
     |
     |
     |
     +- d_fi (private filter-involved key)
     |
     |
     |
     +- d_ua (private unlock-amounts key)
     |
     |
     |
     +- s_vs (secret view-sent key)
     |
     |
     |
     +- d_vr (private view-received key)
     |
     |
     |
     +- s_ga (secret generate-address key)
             |
             |
             |
             +- s_ct (secret cipher-tag key)

Addresses

Addresses consist of 4 public keys (just added in a factor of d_ua):

  1. K^j_1 = K_s + k^j_u U + k^j_x X + k^j_g G (unchanged)
  2. D^j_2 = 1 / (d^j_a * d_ua) * d_fi * B (filter-received -> filter-involved)
  3. D^j_3 = 1 / (d^j_a * d_ua) * d_vr * B
  4. D^j_4 = 1 / (d^j_a * d_ua) * B

Shared Secrets

There are 3 DH shared secrets:

  1. DH_1 = d_e * D^j_2 = d_fi * D_e (filter-received -> filter-involved)
  2. DH_2 = d_e * D^j_3 = d_vr * D_e (unchanged from @tevador's last post)
  3. DH_3 = d_e * B = d^j_a * d_ua * D_e (added in factor of d_ua)

The DH exchanges are used for the same normal enote high-level secrets in @tevador's post.

However, self-send enotes use a different construction for the high-level secrets (and different from before). For self-send higher level secrets, we use a combination of s_vs (view-sent secret) and d_ua (unlock-amounts key) instead of only k_vb (view-balance):

  1. s^sr_1 = H_[tau]1(s_vs || D_e || input_context)
  2. s^sr_2 = H_[tau]2(d_ua || s^sr_1)
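
A sketch of the two-step derivation, to show which key each step depends on. The hash (BLAKE2b), the way the enote type tau is mixed into the domain separator, and all names are assumptions; only the inputs to each hash are taken from the formulas above.

```python
import hashlib

def _h(domain: bytes, *parts: bytes) -> bytes:
    """Hypothetical domain-separated 32-byte hash (BLAKE2b as a stand-in)."""
    h = hashlib.blake2b(digest_size=32)
    for part in (domain, *parts):
        h.update(len(part).to_bytes(4, "little"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()

def self_send_s_sr_1(s_vs: bytes, big_d_e: bytes, input_context: bytes, tau: bytes) -> bytes:
    """s^sr_1 = H_[tau]1(s_vs || D_e || input_context): needs only the view-sent secret."""
    return _h(b"self_send_1/" + tau, s_vs, big_d_e, input_context)

def self_send_s_sr_2(d_ua: bytes, s_sr_1: bytes, tau: bytes) -> bytes:
    """s^sr_2 = H_[tau]2(d_ua || s^sr_1): additionally needs the unlock-amounts key."""
    return _h(b"self_send_2/" + tau, d_ua, s_sr_1)

# Placeholder byte strings standing in for real keys and context.
s1 = self_send_s_sr_1(b"s_vs", b"D_e", b"input_context", b"CHANGE")
s2 = self_send_s_sr_2(b"d_ua", s1, b"CHANGE")
assert s1 != self_send_s_sr_1(b"s_vs", b"D_e", b"input_context", b"SELFSPEND")
```

A holder of s_vs alone can derive s^sr_1 and so recognize self-send enotes, but cannot derive s^sr_2 without d_ua, which is what separates the "w/o amounts" tiers from the others.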

Wallet Tiers

| Tier | Knowledge | Off-chain capabilities | On-chain capabilities |
|---|---|---|---|
| GenAddr | s_ga | generate public addresses | none |
| FilterInvolved | d_fi | recognize all public wallet addresses | calculate view tags |
| ViewReceived | d_vr, d_fi, s_ga | all | view all received enotes (w/o amounts) except for change and self-spends |
| ViewSent | s_vs, d_fi, s_ga | all | view all change and self-spend enotes (w/o amounts) |
| HotWallet | s_vs, d_vr, d_fi, s_ga | all | view all received, change, and self-spend enotes (w/o amounts) |
| PaymentValidator | d_fi, d_vr, d_ua, s_ga | all | view all received enotes with amounts |
| ViewBalance | k_vb | all | view all enotes, calculate key images |
| Master | k_m | all | all |

Sorry, this post strayed away from the view tag balancing discussion, but changing the derivation tree and self-send higher-level secrets calculations in this manner can be added to the current Jamtis proposal orthogonally to make better hot/cold wallet setups for little to no extra cost.

@tevador
Copy link
Author

tevador commented Sep 15, 2023

For the current Jamtis code in seraphis_lib with PLAIN, DUMMY, CHANGE, & SELFSPEND enote types

What is the reasoning for these types? AFAICS we only need 2 types to tell the wallet if the enote should be displayed in history or not (this could also be achieved with a 1-bit flag encrypted with s^sr_2, so only 1 extra K_o recomputation is needed).

To expand on the last point, we could have all secret keys (besides cipher-tag) below the view-balance secret in the derivation tree: view-received, view-sent (new key explained below), unlock-amounts, generate-address (moved out from under view-received), and filter-involved (basically the same as filter-received but the name needs an update since we use it also for outgoing). Then we can mix and match the unlock-amounts key with/without view-received and view-sent keys to create different tiers while keeping the number of operations in balance recovery the same.

I don't like the additional tiers between "FilterInvolved" and "ViewBalance". They give more arguments for third-party scanners to request additional private keys. Especially the "HotWallet" tier sounds very dangerous as light wallet users might be satisfied with not revealing amounts, but it actually allows the third party to identify spent outputs in the blockchain.

The missing d_ua key and the key hierarchy in my proposal was intentional to prevent any wallet tiers that could be useful for 3rd party scanning other than "FilterReceived".

Here is a comment by @UkoeHB speaking against your "HotWallet" tier: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024?permalink_comment_id=4274612#gistcomment-4274612

@jeffro256

What is the reasoning for these types? AFAICS we only need 2 types to tell the wallet if the enote should be displayed in history or not (this could also be achieved with a 1-bit flag encrypted with s^sr_2, so only 1 extra K_o recomputation is needed).

Sorry for the confusion, PLAIN is the type for normal enotes. DUMMY, CHANGE, and SELFSPEND are the self-send types. As for the DUMMY type, @UkoeHB would probably be able to answer this question best. But to be fair, he added that type in when self-send type checks were relatively cheap (b/c of address tag hints).

I don't like the additional tiers between "FilterInvolved" and "ViewBalance". They give more arguments for third-party scanners to request additional private keys. Especially the "HotWallet" tier sounds very dangerous as light wallet users might be satisfied with not revealing amounts, but it actually allows the third party to identify spent outputs in the blockchain.

The missing d_ua key and the key hierarchy in my proposal was intentional to prevent any wallet tiers that could be useful for 3rd party scanning other than "FilterReceived".

Since it's only messing with self-send secret paths and account key derivation, there is literally nothing stopping someone from doing this anyways and still interacting with everyone else in a backwards-compatible manner, and without external observers knowing (senders hopefully ;) don't know the discrete log of your account secrets, so they can't know if you multiplied by d_ua or not). Also, the difference here with the "HotWallet" tier is that it isn't some cheesed tier that only makes sense as an abused scanning tier, it has an actual use case to make people's hot wallets more private.

@jeffro256

a 1-bit flag encrypted with s^sr_2, so only 1 extra K_o recomputation is needed

How would this work? I could understand if it was one bit in the address index

@tevador
Author

tevador commented Sep 15, 2023

there is literally nothing stopping someone from doing this anyways and still interacting with everyone else in a backwards-compatible manner, and without external observers knowing (senders hopefully ;) don't know the discrete log of your account secrets, so they can't know if you multiplied by d_ua or not).

Such a wallet would not be compatible with other wallet software if it was using different derivation paths and additional private keys. Yes, you can't prevent someone from inventing custom wallets that allow users to lose privacy, but it should not be supported by the official software.

Also, the difference here with the "HotWallet" tier is that it isn't some cheesed tier that only makes sense as an abused scanning tier, it has an actual use case to make people's hot wallets more private.

I don't see exactly how it's useful for hot wallets. We already have the "PaymentValidator" tier that is intended as a (view-only) hot wallet. If you are using a hardware wallet, presumably you have a "ViewBalance" tier and the hardware wallet only stores the master key. Without seeing amounts, you can't prepare a transaction to be signed with the hardware wallet.

How would this work? I could understand if it was one bit in the address index

  1. You derive the self-send s^sr_1 and recompute K_o. If it matches, you have identified a self-send enote.
  2. Derive s^sr_2 to decrypt the amount and the 1-bit flag. The flag will tell you if this self-send should be displayed in the transaction history (because the user actively sent funds to their own address) or rather be subtracted from the spent amount (because it's a change enote).

Btw, in order to properly support self-sends, I think the self-send shared secrets need to include the output index in the hash, otherwise a 2-out transaction with two self-sends (1 self-spend and 1 change) would have the same shared secrets (and view tags) for both outputs. I'm not sure how it's handled in the current Seraphis library.

@tevador
Author

tevador commented Sep 15, 2023

Here are some additions to my proposal:

View tag filter target

The filter target should be 480 enotes/day. Because the view tag filter rate must be a power of 2, this will actually result in a range from 480 to 960 enotes per day depending on the tx volume. If we "average the averages" over all possible values of tx volume, this will give a mean of 720 enote matches per day, or roughly 1 match per block, which is what was suggested by @jeffro256. I think this is close to the upper limit of what is acceptable for light wallet clients (~200 KB/day) and should provide a good number of false positives even if there were a short-term drop in tx volume.

The formula to calculate the view tag size in bits is:

tag_size = trunc(log2(3 * num_outputs_100k / 200000))

where num_outputs_100k is the total number of outputs in the last 100 000 blocks. The trunc(log2(x)) function can be easily calculated using only integer operations (it's basically the position of the most significant bit).

As an example, the value of num_outputs_100k is currently about 7.9 million, which results in a view tag size of 6 bits when plugged into the formula. With around 56000 daily outputs, there will be about 880 matches per day. If the long-term daily volume increases to about 62000 outputs, the view tag size will be increased to 7 bits and the number of matches will drop to 480 per day.

View tag size encoding

The view tag size must be encoded explicitly to avoid UX issues with missed transactions at times when the view tag size changes. This can be done with a 1-byte field per transaction (all outputs will use the same tag size).

I'm proposing a range of valid values for the tag size between 1 and 16 bits (instead of the previously proposed 5-20 bits).

A 1-bit view tag requires num_outputs_100k > 133333. Since there are always at least 100k coinbase outputs, the 1-bit view tag would be "too large" only if there were fewer than 120 transactions per day, which hasn't happened on mainnet except for a few weeks shortly after launch in 2014.

A 17-bit view tag that would overflow the supported range would require num_outputs_100k > 8738133333, an increase of more than 1000x over the current tx volume. If this somehow happened, the number of false positives would exceed 960 per day, which would only have performance implications for light wallets, but would not cause any privacy problems.

So the proposed range of 1-16 bits is sufficient.

Complementary view tag

Regardless of the tag_size, the view tag is always encoded in 2 bytes as a 16-bit integer per enote. The remaining bits are filled with a "complementary" view tag calculated from s^sr_1, which needs a different private key.

For example, with tag_size = 6, the 16 bits would be CCCCCCCCCCTTTTTT, where T is a view tag bit and C is a complementary view tag bit. This construction ensures that only a few K_o recomputations are needed per 65536 enotes.

| Wallet type | For each enote | For ~720 enotes/day | For 1/65536 enotes |
|---|---|---|---|
| Full wallet (ViewAll) | 1x DH | 1x DH | 3x recompute K_o |
| Full wallet (ViewReceived) | 1x DH | 1x DH | 1x recompute K_o |
| Light wallet (ViewAll) | - | 1x DH | 3x recompute K_o |
| Light wallet (ViewReceived) | - | 1x DH | 1x recompute K_o |

A 3rd party scanner would need to be provided with the view-received key d_vr in order to calculate the full 16-bit view tag for normal enotes. There are 3 deterrents against such usage:

  1. Complete loss of privacy for received payments (everything is leaked including amounts).
  2. Self-send enotes are not detected this way.
  3. The CPU savings for the light client are small (~100 ms/day at best).

@tevador
Author

tevador commented Sep 15, 2023

Btw, in order to properly support self-sends, I think the self-send shared secrets need to include the output index in the hash, otherwise a 2-out transaction with two self-sends (1 self-spend and 1 change) would have the same shared secrets (and view tags) for both outputs. I'm not sure how it's handled in the current Seraphis library.

I'll answer myself here: this is currently solved by including K_o in the view tag hash, which is actually a better solution than just using the output index.

@UkoeHB

UkoeHB commented Sep 16, 2023

I don't have bandwidth to respond to everything, but wanted to clarify this:

For the current Jamtis code in seraphis_lib with PLAIN, DUMMY, CHANGE, & SELFSPEND enote types

What is the reasoning for these types?

PLAIN = normal enote
DUMMY = self-send with zero amount inserted to ensure a tx has at least one self-send (a requirement added so that a remote scanner only needs to transmit key images from txs with view tag matches instead of all txs)
CHANGE = change
SELFSPEND = non-change/dummy self-send (e.g. churn), which is differentiated from change to aid bookkeeping

@tevador
Author

tevador commented Sep 16, 2023

DUMMY = self-send with zero amount inserted to ensure a tx has at least one self-send (a requirement added so that a remote scanner only needs to transmit key images from txs with view tag matches instead of all txs)

I don't think this one is needed. It can be a CHANGE with zero amount. My calculations above assume that only 3 K_o recomputations are needed (1x PLAIN, 1x CHANGE, 1x SELFSPEND).

@tevador
Author

tevador commented Sep 16, 2023

Wallet Tiers

| Tier | Knowledge | Off-chain capabilities | On-chain capabilities |
|---|---|---|---|
| GenAddr | s_ga | generate public addresses | none |
| FilterInvolved | d_fi | recognize all public wallet addresses | calculate view tags |
| ViewReceived | d_vr, d_fi, s_ga | all | view all received enotes (w/o amounts) except for change and self-spends |
| ViewSent | s_vs, d_fi, s_ga | all | view all change and self-spend enotes (w/o amounts) |
| HotWallet | s_vs, d_vr, d_fi, s_ga | all | view all received, change, and self-spend enotes (w/o amounts) |
| PaymentValidator | d_fi, d_vr, d_ua, s_ga | all | view all received enotes with amounts |
| ViewBalance | k_vb | all | view all enotes, calculate key images |
| Master | k_m | all | all |

I think there is an infinite number of ways how the protocol can be made more complex with more features that we think might be useful. However, we should avoid unnecessarily bloating the specs and overloading users with choices.

Let's have a look at the features of Jamtis that have clear evidence of popular demand:

GenAddr wallet tier

It has been voiced many times by merchants that providing the ability to generate addresses without view access to the wallet is important. It came up for example during the discussion about deprecating integrated addresses: monero-project/meta#299 (comment)

FilterReceived wallet tier and dynamic view tags

The popular demand for wallets that scan the blockchain on behalf of users is clear from the existence of such services, e.g. mymonero.com. The FilterReceived tier together with the dynamic view tag size provides a solution that preserves privacy to a certain extent and only has a small fixed cost over providing full view access to the wallet.

Full view access tier

There is plenty of evidence that view-only wallets that cannot recognize spent outputs and thus display incorrect balance are bad for UX.

monero-project/monero#8613
monero-project/monero#7365
https://old.reddit.com/r/Monero/comments/4ce5ui/what_is_the_use_of_view_only_wallet_when_its/

Robust output recognition

Again, there is plenty of evidence that the lookup-table based approach for recognizing owned outputs is problematic for UX and sometimes causes the wallet to miss payments.

monero-project/monero#8138
https://monero.stackexchange.com/questions/10704/accounts-got-deleted-from-the-wallet
https://monero.stackexchange.com/questions/10184/funds-received-from-subwallet-are-not-showing

Janus attack protection

The attack was described in an official advisory here: https://web.getmonero.org/2019/10/18/subaddress-janus.html

The proposed mitigation is quite problematic for UX:

Use separate wallets instead of separate subaddresses if you need to keep two different addresses completely unlinkable. Alternatively, do not notify any sender of the receipt of funds to your wallet.

Users not aware of the existence of this attack might get exposed, so I think there is a sufficiently strong case to mitigate it, even if the cost is 51 extra characters in every address.

Payment validator (ViewReceived) wallet tier

I think this feature does not have such a strong case as the above features. Its functionality can be achieved with the ViewBalance tier, with only a small loss of privacy compared to current CryptoNote view-only access (explained by @UkoeHB here). I couldn't find any evidence of merchants requesting more privacy for view-only wallets.

However, since we are on track to implement full-chain membership proofs, this might tip the balance in favor of a wallet tier that cannot strongly identify outgoing payments.

@jeffro256

Such a wallet would not be compatible with other wallet software if it was using different derivation paths and additional private keys. Yes, you can't prevent someone from inventing custom wallets that allow users to lose privacy, but it should not be supported by the official software.

I guess we should clarify what we're actually trying to do here: are we trying to describe an address protocol or a specific wallet design? Assuming that we use some deterministic seed phrase and/or we allow the view-balance key to be exported, people will be able to make worse wallet implementations that offer more efficient scanning. If we're trying to mitigate this issue at the addressing protocol level using incentives, we need to make this actually part of the protocol. Dynamic view tags are protocol level, and so is normal enote DH exchanges against addresses. Self-send secrets and account secret derivation really aren't. If the protections we're providing are based on telling users to pretty please not do certain actions with their keys, we can have the "official" wallet software not do these things, but we're back to square one with @UkoeHB's key offloading initial observation. Enough tempting, and users might move from the "official" wallet software to a worse implementation, and then our wallet design means nothing.

I don't see exactly how it's useful for hot wallets. We already have the "PaymentValidator" tier that is intended as a (view-only) hot wallet. If you are using a hardware wallet, presumably you have a "ViewBalance" tier and the hardware wallet only stores the master key. Without seeing amounts, you can't prepare a transaction to be signed with the hardware wallet.

How cold/hot wallet setups work today is that the hot wallet is connected to the internet and has the view key, while the cold wallet is air-gapped and has the spend key. The hot wallet first scans for incoming transactions, and then sends those to the cold wallet. The cold wallet calculates key images and sends the key images back to the hot wallet to scan for outgoing transactions. When the user wants to spend, the hot wallet collects the output distribution, sends it to the cold wallet, the cold wallet signs, then sends the signed transaction back to the hot wallet, which submits to the network.

In the proposed cold/hot wallet setup, the hot wallet would only scan all involved transactions without knowing amounts and send public transaction info to the cold wallet. When a user wants to spend funds, they do input selection and ownership proofs on the cold wallet, then send the signed transaction to the hot wallet, which finishes the transaction by completing membership proofs and submitting to the network. This type of setup is orthogonal to PaymentValidator, because it can be used without a PaymentValidator wallet, but if someone wants to spend funds that are collected from a PaymentValidator, they have to have some wallet somewhere with signing capabilities. And with the "HotWallet" tier, you can export outgoing txs to a signing wallet without knowing amounts, which is especially beneficial for merchants who frequently have large amounts of income flow to turn over.

I think there is an infinite number of ways how the protocol can be made more complex with more features that we think might be useful. However, we should avoid unnecessarily bloating the specs and overloading users with choices.

This is a UX problem, and IMO out of the scope of Jamtis. Which options are given to the users is down to the exact wallet implementation, because not all wallets have to support all features. It is my opinion that we should work as hard as possible to make Jamtis as flexible as possible for all sorts of users, while minimizing risk at the protocol level by using incentives so that users have fewer reasons to bomb their privacy for the sake of efficiency. I want to reiterate that an extremely simple "solution" to scanning inefficiency is giving away the view-balance key. That is always a choice for the user, so we always have to compete with that. Making the protocol less flexible for users will squeeze users into worse privacy tiers. I agree that if there is some "official" implementation, there shouldn't be complete freedom on the part of the users to expose certain secrets, but again, that's outside of the scope of Jamtis IMO.

Derive s^sr_2 to decrypt the amount and the 1-bit flag. The flag will tell you if this self-send should be displayed in the transaction history (because the user actively sent funds to their own address) or rather be subtracted from the spent amount (because it's a change enote).

Okay, this makes a lot of sense actually. I like this idea.
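
A rough sketch of how I read the flag handling, just to check my understanding. The names (`SelfSendEnote`, `is_change`, `summarize_outgoing_tx`) are made up for illustration and are not part of the spec. I'm also assuming that the amount and the flag have already been decrypted with the key derived from s^sr_2, and that a set flag means "change"; the actual bit encoding isn't specified here.

```python
from dataclasses import dataclass


@dataclass
class SelfSendEnote:
    amount: int      # assumed to be already decrypted via the s^sr_2-derived key
    is_change: bool  # the 1-bit flag; True = change (encoding is an assumption)


def summarize_outgoing_tx(total_inputs: int, self_sends: list[SelfSendEnote]):
    """Split self-send enotes into change and user-initiated self-spends.

    Returns (spent, history): `spent` is the input total minus change,
    `history` lists the amounts the user actively sent to their own address.
    """
    change = sum(e.amount for e in self_sends if e.is_change)
    history = [e.amount for e in self_sends if not e.is_change]
    spent = total_inputs - change
    return spent, history


# Inputs worth 100, of which 30 comes back as change and 20 is an
# explicit payment to one of the user's own addresses.
spent, history = summarize_outgoing_tx(
    100, [SelfSendEnote(30, True), SelfSendEnote(20, False)]
)
```

Here the change enote only reduces the spent amount, while the explicit self-spend shows up as its own history entry.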

@tevador
Author

tevador commented Sep 17, 2023

I guess we should clarify what we're actually trying to do here: are we trying to describe an address protocol or a specific wallet design?

This is a UX problem, and IMO out of the scope of Jamtis.

Jamtis specifies part of the Seraphis wallet design, which includes:

  1. The format of the mnemonic seed.
  2. How private keys are derived from the mnemonic seed.
  3. How public addresses are generated from private keys.
  4. How enotes are constructed based on public addresses.
  5. How owned enotes are recognized based on wallet private keys.

All of this must be part of the specs to avoid fragmentation of the Monero ecosystem. This way, a user can restore their mnemonic seed into any compliant wallet implementation and will see the correct balance.

For example, the original CryptoNote protocol didn't specify how the private spend key and view key are derived and as a result, we got at least two incompatible wallet designs that produce different view keys:

https://xmr.llcoins.net/addresstests.html

You may have noticed a critical difference between this style and the Electrum Style: MyMonero's Private View Key derivation is done by hashing random integer a, while Electrum Style derivation is done by hashing the Private Spend Key. This means that 13 and 25 word seeds are not compatible – it is not possible to create an Electrum Style seed (and account) that matches a MyMonero Style seed (and account) or vice versa; the view keypair will always be different.
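
To make the incompatibility concrete, here is a minimal sketch of the two derivation orders described in the quote. It is not taken from either wallet's code. Monero uses the original Keccak-256, which Python's standard library does not provide, so `hashlib.sha3_256` serves as a stand-in hash here. The function names are hypothetical, and the input `a` is an arbitrary 32-byte value chosen so that it is not already reduced modulo the group order.

```python
import hashlib

# Order of the Ed25519 prime-order subgroup.
L = 2**252 + 27742317777372353535851937790883648493


def H(data: bytes) -> bytes:
    # Stand-in only: Monero uses Keccak-256, not NIST SHA3-256.
    return hashlib.sha3_256(data).digest()


def sc_reduce32(data: bytes) -> bytes:
    # Interpret 32 bytes as a little-endian integer and reduce it mod L.
    return (int.from_bytes(data, "little") % L).to_bytes(32, "little")


def derive_view_from_a(a: bytes):
    # View key is derived by hashing the unreduced integer a.
    spend = sc_reduce32(a)
    view = sc_reduce32(H(a))
    return spend, view


def derive_view_from_spend(a: bytes):
    # View key is derived by hashing the (reduced) private spend key.
    spend = sc_reduce32(a)
    view = sc_reduce32(H(spend))
    return spend, view


a = b"\xff" * 32  # larger than L, so sc_reduce32(a) != a
spend_1, view_1 = derive_view_from_a(a)
spend_2, view_2 = derive_view_from_spend(a)
```

Both variants produce the same private spend key, but the hash inputs differ, so the view keys differ. A wallet restored under the other convention would therefore scan with the wrong view key.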

@tevador
Author

tevador commented Sep 17, 2023

How cold/hot wallet setups work today is that the hot wallet is connected to the internet and has the view key, while the cold wallet is air-gapped and has the spend key. The hot wallet first scans for incoming transactions, and then sends those to the cold wallet. The cold wallet calculates key images and sends the key images back to the hot wallet to scan for outgoing transactions. When the user wants to spend, the hot wallet collects the output distribution, sends it to the cold wallet, the cold wallet signs, then sends the signed transaction back to the hot wallet, which submits to the network.

Current Seraphis/Jamtis design improves this flow significantly. The hot wallet is a ViewBalance tier that can see everything and allows the user to safely prepare an unsigned transaction, including the exact list of enotes to spend and all outputs. Then there is one interaction with the cold wallet, which only displays the most basic information for confirmation and then sends the ownership proofs to the hot wallet. The hot wallet then performs decoy selection, completes the membership proof and submits the transaction to the network.
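
As a rough illustration of the difference, the two flows can be written down as message lists and the number of requests that have to cross over to the cold wallet counted. The step labels are paraphrased from the descriptions above and the helper name is hypothetical; this is a toy model, not wallet code.

```python
# Each step is (sender, receiver, payload). Labels are illustrative only.
CURRENT_FLOW = [
    ("hot", "cold", "incoming transactions"),
    ("cold", "hot", "key images"),
    ("hot", "cold", "output distribution"),
    ("cold", "hot", "signed transaction"),
    ("hot", "network", "signed transaction"),
]

SERAPHIS_FLOW = [
    ("hot", "cold", "unsigned transaction (enotes to spend, outputs)"),
    ("cold", "hot", "ownership proofs"),
    ("hot", "network", "transaction with membership proofs"),
]


def cold_wallet_requests(flow):
    """Count how many times the hot wallet has to hand data to the cold wallet."""
    return sum(
        1 for sender, receiver, _ in flow
        if sender == "hot" and receiver == "cold"
    )


current = cold_wallet_requests(CURRENT_FLOW)
seraphis = cold_wallet_requests(SERAPHIS_FLOW)
```

In this model the current setup needs two trips to the cold wallet (one for key images, one for signing), while the Seraphis/Jamtis flow needs one.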

In the proposed cold/hot wallet setup [...]

I don't see a strong case to support the additional tiers in the official specs.

input selection [...] on the cold wallet

Hardware wallets have a tiny screen and a few buttons, which are enough to display and confirm the intent, but are completely inadequate for a full wallet interface. The setup with a ViewBalance main wallet and a signing hardware wallet has a much more appealing UX and achieves the main goal why users purchase hardware wallets: theft protection.

@jeffro256
Current Seraphis/Jamtis design improves this flow significantly. The hot wallet is a ViewBalance tier that can see everything and allows the user to safely prepare an unsigned transaction, including the exact list of enotes to spend and all outputs. Then there is one interaction with the cold wallet, which only displays the most basic information for confirmation and then sends the ownership proofs to the hot wallet. The hot wallet then performs decoy selection, completes the membership proof and submits the transaction to the network.

This new flow would still be possible with almost exactly the same operations if you separate one-time address recovery and amount recovery with different keys.

I don't see a strong case to support the additional tiers in the official specs.

I think this might be tied to the following argument you make later:

The setup with a ViewBalance main wallet and a signing hardware wallet has a much more appealing UX and achieves the main goal why users purchase hardware wallets: theft protection.

For the same reason one would want theft protection, one would want to hide amounts: mitigating compromised internet-connected computing devices. If you are working hard to mitigate this threat, I don't see why it would be such a leap to assume that some would also want to hide the balances from their assumed-to-be-compromised devices, given the option.

Hardware wallets have a tiny screen and a few buttons, which are enough to display and confirm the intent, but are completely inadequate for a full wallet interface

They wouldn't have to implement any interfacing, just the transaction input selection algorithm, which is neither a compute-heavy nor a memory-heavy task. At any rate, this also wouldn't be a problem for cold wallets using a normal laptop/desktop/smartphone.

@jeffro256
jeffro256 commented Sep 18, 2023

All of this must be part of the specs to avoid fragmentation of the Monero ecosystem. This way, a user can restore their mnemonic seed into any compliant wallet implementation and will see the correct balance.

The key word here is compliant. Thus far we've been trying to mitigate people moving off of a compliant wallet design to something more suitable. The only way we can do this is by using incentives, which is why it's my opinion that it's fruitless to base the privacy on optionally doing something with your keys, given that you can interact with the ecosystem all the same. As such, telling users to not multiply by some d_ua factor is merely a recommendation, and we should not expect the ecosystem to evolve in that manner. And since, IMO, separating the one-time address recovery and amount recovery in self-send enotes provides a tangible benefit for a real use case without affecting the performance of others, we should move forward with that feature.
