@tevador
Last active December 10, 2024 20:03

JAMTIS

This document describes a new addressing scheme for Monero.

Chapters 1-2 are intended for a general audience.

Chapters 3-7 contain technical specifications.

1. Introduction

1.1 Why a new address format?

Sometime in 2024, Monero plans to adopt a new transaction protocol called Seraphis [1], which enables much larger ring sizes than the current RingCT protocol. However, due to a different key image construction, Seraphis is not compatible with CryptoNote addresses. This means that each user will need to generate a new set of addresses from their existing private keys. This provides a unique opportunity to vastly improve the addressing scheme used by Monero.

1.2 Current Monero addresses

The CryptoNote-based addressing scheme [2] currently used by Monero has several issues:

  1. Addresses are not suitable as human-readable identifiers because they are long and case-sensitive.
  2. Too much information about the wallet is leaked when scanning is delegated to a third party.
  3. Generating subaddresses requires view access to the wallet. This is why many merchants prefer integrated addresses [3].
  4. View-only wallets need key images to be imported to detect spent outputs [4].
  5. Subaddresses that belong to the same wallet can be linked via the Janus attack [5].
  6. The detection of outputs received to subaddresses is based on a lookup table, which can sometimes cause the wallet to miss outputs [6].

1.3 Jamtis

Jamtis is a new addressing scheme that was developed specifically for Seraphis and tackles all of the shortcomings of CryptoNote addresses that were mentioned above. Additionally, Jamtis incorporates two other changes related to addresses to take advantage of this large upgrade opportunity:

  • A new 16-word mnemonic scheme called Polyseed [7] that will replace the legacy 25-word seed for new wallets.
  • The removal of integrated addresses and payment IDs [8].

2. Features

2.1 Address format

Jamtis addresses, when encoded as a string, start with the prefix xmra and consist of 196 characters. Example of an address: xmra1mj0b1977bw3ympyh2yxd7hjymrw8crc9kin0dkm8d3wdu8jdhf3fkdpmgxfkbywbb9mdwkhkya4jtfn0d5h7s49bfyji1936w19tyf3906ypj09n64runqjrxwp6k2s3phxwm6wrb5c0b6c1ntrg2muge0cwdgnnr7u7bgknya9arksrj0re7whkckh51ik

There is no "main address" anymore - all Jamtis addresses are equivalent to a subaddress.

2.1.1 Recipient IDs

Jamtis introduces a short recipient identifier (RID) that can be calculated for every address. RID consists of 25 alphanumeric characters that are separated by underscores for better readability. The RID for the above address is regne_hwbna_u21gh_b54n0_8x36q. Instead of comparing long addresses, users can compare the much shorter RID. RIDs are also suitable to be communicated via phone calls, text messages or handwriting to confirm a recipient's address. This allows the address itself to be transferred via an insecure channel.

2.2 Light wallet scanning

Jamtis introduces new wallet tiers below the view-only wallet. One of the new wallet tiers, called "FindReceived", is intended for wallet scanning and only has the ability to calculate view tags [9]. It cannot generate wallet addresses or decode output amounts.

View tags can be used to eliminate 99.6% of outputs that don't belong to the wallet. If provided with a list of wallet addresses, this tier can also link outputs to those addresses. Possible use cases are:

2.2.1 Wallet component

A wallet can have a "FindReceived" component that stays connected to the network at all times and filters out outputs in the blockchain. The full wallet can thus be synchronized at least 256x faster when it comes online (it only needs to check outputs with a matching view tag).

2.2.2 Third party services

If the "FindReceived" private key is provided to a 3rd party, it can preprocess the blockchain and provide a list of potential outputs. This reduces the amount of data that a light wallet has to download by a factor of at least 256. The third party will not learn which outputs actually belong to the wallet and will not see output amounts.

2.3 Wallet tiers for merchants

Jamtis introduces new wallet tiers that are useful for merchants.

2.3.1 Address generator

This tier is intended for merchant point-of-sale terminals. It can generate addresses on demand, but otherwise has no access to the wallet (i.e. it cannot recognize any payments in the blockchain).

2.3.2 Payment validator

This wallet tier combines the Address generator tier with the ability to also view received payments (including amounts). It is intended for validating paid orders. It cannot see outgoing payments and received change.

2.4 Full view-only wallets

Jamtis supports full view-only wallets that can identify spent outputs (unlike legacy view-only wallets), so they can display the correct wallet balance and list all incoming and outgoing transactions.

2.5 Janus attack mitigation

The Janus attack is a targeted attack that aims to determine whether two addresses A and B belong to the same wallet. Janus outputs are crafted in such a way that they appear to the recipient as being received to the wallet address B, while secretly using a key from address A. If the recipient confirms the receipt of the payment, the sender learns that the recipient owns both addresses A and B.

Jamtis prevents this attack by allowing the recipient to recognize a Janus output.

2.6 Robust output detection

Jamtis addresses and outputs contain an encrypted address tag which enables a more robust output detection mechanism that does not need a lookup table and can reliably detect outputs sent to arbitrary wallet addresses.

3. Notation

3.1 Serialization functions

  1. The function BytesToInt256(x) deserializes a 256-bit little-endian integer from a 32-byte input.
  2. The function Int256ToBytes(x) serializes a 256-bit integer to a 32-byte little-endian output.
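
The two serialization functions can be sketched in Python as follows. The snake_case names and the explicit range checks are illustrative choices, not part of the specification.

```python
def bytes_to_int256(x: bytes) -> int:
    """Deserialize a 256-bit little-endian integer from a 32-byte input."""
    if len(x) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(x, "little")


def int256_to_bytes(x: int) -> bytes:
    """Serialize a 256-bit integer to a 32-byte little-endian output."""
    if not 0 <= x < 2**256:
        raise ValueError("value does not fit in 256 bits")
    return x.to_bytes(32, "little")


# Little-endian means the least significant byte comes first, so the
# integer 9 serializes to 09 followed by 31 zero bytes.
assert int256_to_bytes(9).hex() == "09" + "00" * 31
assert bytes_to_int256(int256_to_bytes(9)) == 9
```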

3.2 Hash function

The function Hb(k, x) with parameters b, k, refers to the Blake2b hash function [10] initialized as follows:

  • The output length is set to b bytes.
  • Hashing is done in sequential mode.
  • The Personalization string is set to the ASCII value "Monero", padded with zero bytes.
  • If the key k is not null, the hash function is initialized using the key k (maximum 64 bytes).
  • The input x is hashed.

The function SecretDerive is defined as:

SecretDerive(k, x) = H32(k, x)
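
A minimal sketch of Hb and SecretDerive using Python's standard `hashlib.blake2b`, which runs in sequential mode by default. The function names `H` and `secret_derive` are illustrative, and the explicit padding of the personalization string to 16 bytes (the BLAKE2b personalization length) is written out only for clarity.

```python
import hashlib
from typing import Optional

# BLAKE2b personalization field is 16 bytes; "Monero" is padded with zeros.
PERSONALIZATION = b"Monero".ljust(16, b"\x00")


def H(b: int, k: Optional[bytes], x: bytes) -> bytes:
    """Hb(k, x): BLAKE2b with a b-byte output, optionally keyed with k."""
    key = k if k is not None else b""   # an empty key means unkeyed hashing
    if len(key) > 64:
        raise ValueError("BLAKE2b keys are limited to 64 bytes")
    return hashlib.blake2b(
        x, digest_size=b, key=key, person=PERSONALIZATION
    ).digest()


def secret_derive(k: bytes, x: bytes) -> bytes:
    """SecretDerive(k, x) = H32(k, x)."""
    return H(32, k, x)


example = secret_derive(b"\x01" * 32, b"example input")
assert len(example) == 32
```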

3.3 Elliptic curves

Two elliptic curves are used in this specification:

  1. Curve25519 - a Montgomery curve. Points on this curve include a cyclic subgroup 𝔾1.
  2. Ed25519 - a twisted Edwards curve. Points on this curve include a cyclic subgroup 𝔾2.

Both curves are birationally equivalent, so the subgroups 𝔾1 and 𝔾2 have the same prime order ℓ = 2^252 + 27742317777372353535851937790883648493. The total number of points on each curve is 8ℓ.

3.3.1 Curve25519

Curve25519 is used exclusively for the Diffie-Hellman key exchange [11].

Only a single generator point B is used:

| Point | Derivation | Serialized (hex) |
|-------|------------|------------------|
| B | generator of 𝔾1 | 0900000000000000000000000000000000000000000000000000000000000000 |

Private keys for Curve25519 are 32-byte integers denoted by a lowercase letter d. They are generated using the following KeyDerive1(k, x) function:

  1. d = H32(k, x)
  2. d[31] &= 0x7f (clear the most significant bit)
  3. d[0] &= 0xf8 (clear the least significant 3 bits)
  4. return d

All Curve25519 private keys are therefore multiples of the cofactor 8, which ensures that all public keys are in the prime-order subgroup. The multiplicative inverse modulo ℓ is calculated as d^-1 = 8*(8*d)^-1 to preserve the aforementioned property.
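
The key derivation and the cofactor-preserving inverse can be illustrated with the sketch below. It assumes Python 3.8 or newer (for `pow` with a negative exponent); the helper names and the example key and input are hypothetical.

```python
import hashlib

# Prime order of the subgroups (section 3.3).
ELL = 2**252 + 27742317777372353535851937790883648493


def H32(k: bytes, x: bytes) -> bytes:
    person = b"Monero".ljust(16, b"\x00")
    return hashlib.blake2b(x, digest_size=32, key=k, person=person).digest()


def key_derive1(k: bytes, x: bytes) -> int:
    """KeyDerive1: hash, then clear the top bit and the lowest 3 bits."""
    d = bytearray(H32(k, x))
    d[31] &= 0x7F   # clear the most significant bit
    d[0] &= 0xF8    # clear the least significant 3 bits
    return int.from_bytes(d, "little")


def inverse_mod_ell(d: int) -> int:
    """d^-1 = 8 * (8*d)^-1, where the inner inverse is taken modulo ELL."""
    return 8 * pow(8 * d, -1, ELL)


d = key_derive1(b"\x01" * 32, b"example")   # hypothetical key and input
d_inv = inverse_mod_ell(d)

assert d % 8 == 0 and d < 2**255   # the private key is a multiple of 8
assert d_inv % 8 == 0              # so is its inverse
assert (d * d_inv) % ELL == 1      # and it really is an inverse mod ELL
```

Because the final factor of 8 is applied without a further reduction, the inverse stays a multiple of the cofactor while still fitting in 255 bits.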

Public keys (elements of 𝔾1) are denoted by the capital letter D and are serialized as the x-coordinate of the corresponding Curve25519 point. Scalar multiplication is denoted by a space, e.g. D = d B.

3.3.2 Ed25519

The Edwards curve is used for signatures and more complex cryptographic protocols [12]. The following three generators are used:

| Point | Derivation | Serialized (hex) |
|-------|------------|------------------|
| G | generator of 𝔾2 | 5866666666666666666666666666666666666666666666666666666666666666 |
| U | Hp("seraphis U") | 126582dfc357b10ecb0ce0f12c26359f53c64d4900b7696c2c4b3f7dcab7f730 |
| X | Hp("seraphis X") | 4017a126181c34b0774d590523a08346be4f42348eddd50eb7a441b571b2b613 |

Here Hp refers to an unspecified hash-to-point function.

Private keys for Ed25519 are 32-byte integers denoted by a lowercase letter k. They are generated using the following function:

KeyDerive2(k, x) = H64(k, x) mod ℓ
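
A sketch of KeyDerive2 in Python. The specification does not state how the 64-byte hash is converted to an integer; little-endian is assumed here for consistency with section 3.1. Reducing a 512-bit value modulo the roughly 252-bit ℓ keeps the bias of the resulting scalar negligible.

```python
import hashlib

ELL = 2**252 + 27742317777372353535851937790883648493


def key_derive2(k: bytes, x: bytes) -> int:
    """KeyDerive2(k, x) = H64(k, x) mod ELL."""
    person = b"Monero".ljust(16, b"\x00")
    h = hashlib.blake2b(x, digest_size=64, key=k, person=person).digest()
    # ASSUMPTION: the 64-byte output is read as a little-endian integer.
    return int.from_bytes(h, "little") % ELL


k = key_derive2(b"\x01" * 32, b"example")   # hypothetical key and input
assert 0 <= k < ELL
```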

Public keys (elements of 𝔾2) are denoted by the capital letter K and are serialized as 256-bit integers, with the lower 255 bits being the y-coordinate of the corresponding Ed25519 point and the most significant bit being the parity of the x-coordinate. Scalar multiplication is denoted by a space, e.g. K = k G.

3.4 Block cipher

The function BlockEnc(s, x) refers to the application of the Twofish [13] permutation using the secret key s on the 16-byte input x. The function BlockDec(s, x) refers to the application of the inverse permutation using the key s.

3.5 Base32 encoding

"Base32" in this specification refers to a binary-to-text encoding using the alphabet xmrbase32cdfghijknpqtuwy01456789. This alphabet was selected for the following reasons:

  1. The order of the characters has a unique prefix that distinguishes the encoding from other variants of "base32".
  2. The alphabet contains all digits 0-9, which allows numeric values to be encoded in a human readable form.
  3. The alphabet excludes the letters o, l, v and z for the same reasons as the z-base-32 encoding [14].
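
The properties listed above can be checked directly. The sketch below only maps between 5-bit symbols and characters (the same mapping that Appendix A uses); how raw bytes are split into 5-bit groups is not covered here. The helper names are illustrative.

```python
ALPHABET = "xmrbase32cdfghijknpqtuwy01456789"


def symbols_to_text(symbols):
    """Map 5-bit values (0..31) to characters of the alphabet."""
    if any(not 0 <= s < 32 for s in symbols):
        raise ValueError("symbols must be in the range 0..31")
    return "".join(ALPHABET[s] for s in symbols)


def text_to_symbols(text):
    """Map characters back to 5-bit values; reject foreign characters."""
    try:
        return [ALPHABET.index(c) for c in text]
    except ValueError:
        raise ValueError("character outside the base32 alphabet") from None


assert len(ALPHABET) == 32 and len(set(ALPHABET)) == 32   # 32 unique symbols
assert set("0123456789") <= set(ALPHABET)                 # all digits present
assert not set("olvz") & set(ALPHABET)                    # o, l, v, z excluded
```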

4. Wallets

4.1 Wallet parameters

Each wallet consists of two main private keys and a timestamp:

| Field | Type | Description |
|-------|------|-------------|
| km | private key | wallet master key |
| kvb | private key | view-balance key |
| birthday | timestamp | date when the wallet was created |

The master key km is required to spend money in the wallet and the view-balance key kvb provides full view-only access.

The birthday timestamp is important when restoring a wallet and determines the blockchain height where scanning for owned outputs should begin.

4.2 New wallets

4.2.1 Standard wallets

Standard Jamtis wallets are generated as a 16-word Polyseed mnemonic [7], which contains a secret seed value used to derive the wallet master key and also encodes the date when the wallet was created. The key kvb is derived from the master key.

| Field | Derivation |
|-------|------------|
| km | km = BytesToInt256(polyseed_key) mod ℓ |
| kvb | kvb = KeyDerive2(km, "jamtis_view_balance_key") |
| birthday | from Polyseed |

4.2.2 Multisignature wallets

Multisignature wallets are generated in a setup ceremony, where all the signers collectively generate the wallet master key km and the view-balance key kvb.

| Field | Derivation |
|-------|------------|
| km | setup ceremony |
| kvb | setup ceremony |
| birthday | setup ceremony |

4.3 Migration of legacy wallets

Legacy pre-Seraphis wallets define two private keys:

  • private spend key ks
  • private view-key kv

4.3.1 Standard wallets

Legacy standard wallets can be migrated to the new scheme based on the following table:

| Field | Derivation |
|-------|------------|
| km | km = ks |
| kvb | kvb = KeyDerive2(km, "jamtis_view_balance_key") |
| birthday | entered manually |

Legacy wallets cannot be migrated to Polyseed and will keep using the legacy 25-word seed.

4.3.2 Multisignature wallets

Legacy multisignature wallets can be migrated to the new scheme based on the following table:

| Field | Derivation |
|-------|------------|
| km | km = ks |
| kvb | kvb = kv |
| birthday | entered manually |

4.4 Additional keys

There are additional keys derived from kvb:

| Key | Name | Derivation | Used to |
|-----|------|------------|---------|
| dfr | find-received key | dfr = KeyDerive1(kvb, "jamtis_find_received_key") | scan for received outputs |
| dua | unlock-amounts key | dua = KeyDerive1(kvb, "jamtis_unlock_amounts_key") | decrypt output amounts |
| sga | generate-address secret | sga = SecretDerive(kvb, "jamtis_generate_address_secret") | generate addresses |
| sct | cipher-tag secret | sct = SecretDerive(sga, "jamtis_cipher_tag_secret") | encrypt address tags |

The key dfr provides the ability to calculate the sender-receiver shared secret when scanning for received outputs. The key dua can be used to create a secondary shared secret and is used to decrypt output amounts.

The key sga is used to generate public addresses. It has an additional child key sct, which is used to encrypt the address tag.
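
The derivations in this section can be sketched as follows. Two details are assumptions not stated in the text: kvb is passed to the hash as its 32-byte little-endian serialization (Int256ToBytes), and the domain-separation strings are hashed as plain ASCII. The value of kvb used below is a made-up example.

```python
import hashlib


def _h32(k: bytes, x: bytes) -> bytes:
    person = b"Monero".ljust(16, b"\x00")
    return hashlib.blake2b(x, digest_size=32, key=k, person=person).digest()


def key_derive1(k: bytes, x: bytes) -> bytes:
    """Curve25519 private key: hash, then clear the top bit and low 3 bits."""
    d = bytearray(_h32(k, x))
    d[31] &= 0x7F
    d[0] &= 0xF8
    return bytes(d)


def secret_derive(k: bytes, x: bytes) -> bytes:
    return _h32(k, x)


def derive_additional_keys(k_vb: int) -> dict:
    # ASSUMPTION: the scalar k_vb is serialized as 32 little-endian bytes.
    k_vb_bytes = k_vb.to_bytes(32, "little")
    s_ga = secret_derive(k_vb_bytes, b"jamtis_generate_address_secret")
    return {
        "d_fr": key_derive1(k_vb_bytes, b"jamtis_find_received_key"),
        "d_ua": key_derive1(k_vb_bytes, b"jamtis_unlock_amounts_key"),
        "s_ga": s_ga,
        # s_ct is a child of s_ga, not of k_vb
        "s_ct": secret_derive(s_ga, b"jamtis_cipher_tag_secret"),
    }


keys = derive_additional_keys(123456789)   # hypothetical k_vb
assert len(set(keys.values())) == 4        # all four keys are distinct
```

Since sct is derived from sga, a party that holds only sga can both generate addresses and encrypt their tags without learning kvb, dfr or dua.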

4.5 Key hierarchy

The following figure shows the overall hierarchy of wallet keys. Note that the relationship between km and kvb only applies to standard (non-multisignature) wallets.

[Figure: key hierarchy]

4.6 Wallet access tiers

| Tier | Knowledge | Off-chain capabilities | On-chain capabilities |
|------|-----------|------------------------|-----------------------|
| AddrGen | sga | generate public addresses | none |
| FindReceived | dfr | recognize all public wallet addresses | eliminate 99.6% of non-owned outputs (up to § 5.3.5), link output to an address (except for change and self-spends) |
| ViewReceived | dfr, dua, sga | all | view all received except for change and self-spends (up to § 5.3.14) |
| ViewAll | kvb | all | view all |
| Master | km | all | all |

4.6.1 Address generator (AddrGen)

This wallet tier can generate public addresses for the wallet. It doesn't provide any blockchain access.

4.6.2 Output scanning wallet (FindReceived)

Thanks to view tags, this tier can eliminate 99.6% of outputs that don't belong to the wallet. If provided with a list of wallet addresses, it can also link outputs to those addresses (but it cannot generate addresses on its own). This tier should provide a noticeable UX improvement with a limited impact on privacy. Possible use cases are:

  1. An always-online wallet component that filters out outputs in the blockchain. A higher-tier wallet can thus be synchronized 256x faster when it comes online.
  2. Third party scanning services. The service can preprocess the blockchain and provide a list of potential outputs with pre-calculated spend keys (up to § 5.2.4). This reduces the amount of data that a light wallet has to download by a factor of at least 256.
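
The 99.6% and 256x figures are consistent with a view tag that takes one of 256 values. The size of the view tag is not stated in this section, so the 8-bit width below is an assumption inferred from those figures.

```python
VIEW_TAG_BITS = 8   # ASSUMPTION: inferred from the factor of 256 in the text

# A non-owned output matches the wallet's view tag only by chance.
false_positive_rate = 1 / 2**VIEW_TAG_BITS
eliminated = 1 - false_positive_rate


def expected_candidates(non_owned_outputs: int) -> float:
    """Expected number of non-owned outputs that survive the filter."""
    return non_owned_outputs * false_positive_rate


print(f"{eliminated:.1%}")            # 99.6%
print(expected_candidates(256_000))   # 1000.0
```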

4.6.3 Payment validator (ViewReceived)

This tier combines the AddrGen and FindReceived tiers and can see all incoming payments to the wallet, but it cannot see outgoing payments or change outputs. It can be used for payment processing or auditing purposes.

4.6.4 View-balance wallet (ViewAll)

This is a full view-only wallet that can see all incoming and outgoing payments (and thus can calculate the correct wallet balance).

4.6.5 Master wallet (Master)

This tier has full control of the wallet.

4.7 Wallet public keys

There are 3 global wallet public keys. These keys are not usually published, but are needed by lower wallet tiers.

| Key | Name | Value |
|-----|------|-------|
| Ks | wallet spend key | Ks = kvb X + km U |
| Dua | unlock-amounts key | Dua = dua B |
| Dfr | find-received key | Dfr = dfr Dua |

5. Addresses

5.1 Address generation

Jamtis wallets can generate up to 2^128 different addresses. Each address is constructed from a 128-bit index j. The size of the index space allows stateless generation of new addresses without collisions, for example by constructing j as a UUID [15].
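
As an illustration of stateless index generation, the sketch below draws j from a random (version 4) UUID using Python's standard `uuid` module. The 16-byte little-endian serialization of j is an assumption; this section does not specify a byte order for the index.

```python
import uuid


def new_address_index() -> int:
    """Return a fresh 128-bit address index j without keeping any state."""
    # A version 4 UUID carries 122 random bits; the rest are fixed
    # version and variant bits.
    return uuid.uuid4().int


j = new_address_index()
j_bytes = j.to_bytes(16, "little")   # ASSUMPTION: little-endian serialization

assert 0 <= j < 2**128
assert len(j_bytes) == 16
```

Several point-of-sale terminals could each call such a function independently, without coordinating which indices have already been used.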

Each Jamtis address encodes the tuple (K1j, D2j, D3j, tj). The first three values are public keys, while tj is the "address tag" that contains the encrypted value of j.

5.1.1 Address keys

The three public keys are constructed as:

  • K1j = Ks + kuj U + kxj X + kgj G
  • D2j = daj Dfr
  • D3j = daj Dua

The private keys kuj, kxj, kgj and daj are derived as follows:

Keys Name Derivation
kuj spend key extensions kuj = KeyDerive2(sga, "jamtis_spendkey_extension_u" || j)
kxj spend key extensions kxj = KeyDerive2(sga, "jamtis_spendkey_extension_x" || j)
kgj spend key extensions kgj = KeyDerive2(sga, "jamtis_spendkey_extension_g" || j)
daj address keys daj = KeyDerive1(sga, "jamtis_address_privkey" || j)
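
Combining these formulas with Dfr = dfr Dua from section 4.7 gives D2j = daj dfr Dua = dfr D3j, so the two Curve25519 keys of an address are related by the find-received key. The sketch below checks this relation numerically with a plain x-coordinate Montgomery ladder (the ladder described in RFC 7748, without its scalar clamping). The private key values are arbitrary multiples of 8 chosen for illustration, not derived keys, and the code is not constant-time.

```python
P = 2**255 - 19   # field prime of Curve25519
A24 = 121665      # (486662 - 2) / 4
BASE_U = 9        # x-coordinate of the generator B


def scalar_mult(k: int, u: int) -> int:
    """Return the x-coordinate of k*Q, where Q has x-coordinate u."""
    x1, x2, z2, x3, z3, swap = u, 1, 0, u, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da = (x3 - z3) * a % P
        cb = (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P - 2, P) % P


# Hypothetical private keys (multiples of the cofactor 8).
d_ua, d_fr, d_a = 8 * 1111111, 8 * 2222222, 8 * 3333333

D_ua = scalar_mult(d_ua, BASE_U)   # Dua = dua B
D_fr = scalar_mult(d_fr, D_ua)     # Dfr = dfr Dua
D_2 = scalar_mult(d_a, D_fr)       # D2j = daj Dfr
D_3 = scalar_mult(d_a, D_ua)       # D3j = daj Dua

assert D_2 == scalar_mult(d_fr, D_3)   # D2j = dfr D3j
```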

5.1.2 Address tag

Each address additionally includes an 18-byte tag tj = (j', hj'), which consists of the encrypted value of j:

  • j' = BlockEnc(sct, j)

and a 2-byte "tag hint", which can be used to quickly recognize owned addresses:

  • hj' = H2(sct, "jamtis_address_tag_hint" || j')
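
The layout of the tag can be sketched as below. Python's standard library has no Twofish implementation, so the encrypted index j' is represented by a 16-byte placeholder; only the 2-byte hint and the 18-byte layout are shown. The helper names are illustrative, and the concatenation order follows the order in which the tuple (j', hj') is written.

```python
import hashlib
import hmac


def H(n: int, k: bytes, x: bytes) -> bytes:
    person = b"Monero".ljust(16, b"\x00")
    return hashlib.blake2b(x, digest_size=n, key=k, person=person).digest()


def tag_hint(s_ct: bytes, j_enc: bytes) -> bytes:
    """hj' = H2(sct, "jamtis_address_tag_hint" || j')"""
    return H(2, s_ct, b"jamtis_address_tag_hint" + j_enc)


def make_tag(s_ct: bytes, j_enc: bytes) -> bytes:
    """tj = (j', hj'): 16 bytes of encrypted index plus a 2-byte hint."""
    if len(j_enc) != 16:
        raise ValueError("the encrypted index must be 16 bytes")
    return j_enc + tag_hint(s_ct, j_enc)


def hint_matches(s_ct: bytes, tag: bytes) -> bool:
    """Cheap ownership pre-check that does not require decrypting j'."""
    if len(tag) != 18:
        return False
    return hmac.compare_digest(tag[16:], tag_hint(s_ct, tag[:16]))


s_ct = b"\x07" * 32           # hypothetical cipher-tag secret
j_enc = bytes(range(16))      # PLACEHOLDER for BlockEnc(sct, j)
tag = make_tag(s_ct, j_enc)

assert len(tag) == 18
assert hint_matches(s_ct, tag)
```

With a 2-byte hint, a tag that was not produced with sct passes this pre-check with a probability of about 1 in 65536.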

5.2 Sending to an address

TODO

5.3 Receiving an output

TODO

5.4 Change and self-spends

TODO

5.5 Transaction size

Jamtis has a small impact on transaction size.

5.5.1 Transactions with 2 outputs

The size of 2-output transactions is increased by 28 bytes. The encrypted payment ID is removed, but the transaction needs two encrypted address tags t~ (one for the recipient and one for the change). Both outputs can use the same value of De.

5.5.2 Transactions with 3 or more outputs

Since there are no "main" addresses anymore, the TX_EXTRA_TAG_PUBKEY field can be removed from transactions with 3 or more outputs.

Instead, all transactions with 3 or more outputs will require one 50-byte tuple (De, t~) per output.
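
One way to reproduce the sizes quoted in this section is shown below. The 18-byte tag and the 32-byte key come from sections 5.1.2 and 3.3.1; the 8-byte size of the legacy encrypted payment ID is an assumption about the current transaction format and is not stated in this document.

```python
TAG_BYTES = 18          # encrypted address tag t~ (section 5.1.2)
KEY_BYTES = 32          # serialized Curve25519 key De (section 3.3.1)
PAYMENT_ID_BYTES = 8    # ASSUMPTION: legacy encrypted payment ID size

# 2-output transactions: two tags are added, one payment ID is removed.
two_output_delta = 2 * TAG_BYTES - PAYMENT_ID_BYTES

# 3 or more outputs: one (De, t~) tuple per output.
per_output_tuple = KEY_BYTES + TAG_BYTES

print(two_output_delta)   # 28
print(per_output_tuple)   # 50
```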

6. Address encoding

6.1 Address structure

An address has the following overall structure:

| Field | Size (bits) | Description |
|-------|-------------|-------------|
| Header | 30* | human-readable address header (§ 6.2) |
| K1 | 256 | address key 1 |
| D2 | 255 | address key 2 |
| D3 | 255 | address key 3 |
| t | 144 | address tag |
| Checksum | 40* | (§ 6.3) |

* The header and the checksum are already in base32 format

6.2 Address header

The address starts with a human-readable header, which has the following format consisting of 6 alphanumeric characters:

"xmra" <version char> <network type char>

Unlike the rest of the address, the header is never encoded and is the same for both the binary and textual representations. The string is not null terminated.

The software decoding an address shall abort if the first 4 bytes are not 0x78 0x6d 0x72 0x61 ("xmra").

The "xmra" prefix serves as a disambiguation from legacy addresses that start with "4" or "8". Additionally, base58 strings that start with the character x are invalid due to overflow [16], so legacy Monero software can never accidentally decode a Jamtis address.

6.2.1 Version character

The version character is "1". The software decoding an address shall abort if a different character is encountered.

6.2.2 Network type

| network char | network type |
|--------------|--------------|
| "t" | testnet |
| "s" | stagenet |
| "m" | mainnet |

The software decoding an address shall abort if an invalid network character is encountered.
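
The three abort conditions of section 6.2 can be sketched as a small header parser. The exception type and the function name are hypothetical.

```python
NETWORKS = {"t": "testnet", "s": "stagenet", "m": "mainnet"}


class AddressHeaderError(ValueError):
    """Raised when the 6-character address header is not acceptable."""


def parse_header(address: str) -> str:
    """Validate the header and return the network type."""
    if len(address) < 6:
        raise AddressHeaderError("address is shorter than its header")
    if address[:4] != "xmra":
        raise AddressHeaderError("missing 'xmra' prefix")
    if address[4] != "1":
        raise AddressHeaderError("unsupported version character")
    network = NETWORKS.get(address[5])
    if network is None:
        raise AddressHeaderError("invalid network character")
    return network


assert parse_header("xmra1m") == "mainnet"
```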

6.3 Checksum

The purpose of the checksum is to detect accidental corruption of the address. The checksum consists of 8 characters and is calculated with a cyclic code over GF(32) using the polynomial:

x^8 + 3x^7 + 11x^6 + 18x^5 + 5x^4 + 25x^3 + 21x^2 + 12x + 1

The checksum can detect all errors affecting 5 or fewer characters. Arbitrary corruption of the address has a chance of less than 1 in 10^12 of not being detected. Reference code for calculating the checksum is in Appendix A.

6.4 Binary-to-text encoding

An address can be encoded into a string as follows:

address_string = header + base32(data) + checksum

where header is the 6-character human-readable header string (already in base32), data refers to the address tuple (K1, D2, D3, t), encoded in 910 bits, and the checksum is the 8-character checksum (already in base32). The total length of the encoded address is 196 characters (= 6 + 182 + 8).
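
The following sketch assembles an address string from its parts and confirms the length arithmetic (910 bits / 5 = 182 characters). The checksum routines repeat the reference code from Appendix A, which computes the checksum over the header and data characters together. The bit packing is an assumption: the fields are concatenated in the order of the table in section 6.1, most significant bit first, which the text does not specify. The field values are dummies.

```python
CHARSET = "xmrbase32cdfghijknpqtuwy01456789"
GEN = [0x1ae45cd581, 0x359aad8f02, 0x61754f9b24, 0xc2ba1bb368, 0xcd2623e3f0]
M = 0xffffffffff


def polymod(symbols):
    c = 1
    for v in symbols:
        b = c >> 35
        c = ((c & 0x07ffffffff) << 5) ^ v
        for i in range(5):
            if (b >> i) & 1:
                c ^= GEN[i]
    return c


def create_checksum(symbols):
    p = polymod(symbols + [0] * 8) ^ M
    return [(p >> 5 * (7 - i)) & 31 for i in range(8)]


def encode_address(k1: int, d2: int, d3: int, tag: bytes, network: str = "m") -> str:
    if not (k1 < 2**256 and d2 < 2**255 and d3 < 2**255 and len(tag) == 18):
        raise ValueError("field out of range")
    # ASSUMPTION: fields are packed in table order, most significant bit first.
    data = (k1 << 654) | (d2 << 399) | (d3 << 144) | int.from_bytes(tag, "big")
    body = [(data >> (5 * i)) & 31 for i in reversed(range(182))]
    header = [CHARSET.index(c) for c in "xmra1" + network]
    payload = header + body
    return "".join(CHARSET[s] for s in payload + create_checksum(payload))


addr = encode_address(k1=12345, d2=2**254 + 1, d3=9, tag=bytes(18))
assert len(addr) == 6 + 182 + 8 == 196
assert polymod([CHARSET.index(c) for c in addr]) == M
```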

6.4.1 QR Codes

While the canonical form of an address is lower case, when encoding an address into a QR code, the address should be converted to upper case to take advantage of the more efficient alphanumeric encoding mode.

6.5 Recipient authentication

TODO

7. Test vectors

TODO

References

  1. https://github.com/UkoeHB/Seraphis
  2. https://github.com/monero-project/research-lab/blob/master/whitepaper/whitepaper.pdf
  3. monero-project/meta#299 (comment)
  4. https://www.getmonero.org/resources/user-guides/view_only.html
  5. https://web.getmonero.org/2019/10/18/subaddress-janus.html
  6. monero-project/monero#8138
  7. https://github.com/tevador/polyseed
  8. monero-project/monero#7889
  9. monero-project/research-lab#73
  10. https://eprint.iacr.org/2013/322.pdf
  11. https://cr.yp.to/ecdh/curve25519-20060209.pdf
  12. https://ed25519.cr.yp.to/ed25519-20110926.pdf
  13. https://www.schneier.com/wp-content/uploads/2016/02/paper-twofish-paper.pdf
  14. http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
  15. https://en.wikipedia.org/wiki/Universally_unique_identifier
  16. https://github.com/monero-project/monero/blob/319b831e65437f1c8e5ff4b4cb9be03f091f6fc6/src/common/base58.cpp#L157

Appendix A: Checksum

# Jamtis address checksum algorithm

# cyclic code based on the generator 3BI5PLC1
# can detect 5 errors up to the length of 994 characters
GEN=[0x1ae45cd581, 0x359aad8f02, 0x61754f9b24, 0xc2ba1bb368, 0xcd2623e3f0]

M = 0xffffffffff

def jamtis_polymod(data):
    c = 1
    for v in data:
        b = (c >> 35)
        c = ((c & 0x07ffffffff) << 5) ^ v
        for i in range(5):
            c ^= GEN[i] if ((b >> i) & 1) else 0
    return c

def jamtis_verify_checksum(data):
    return jamtis_polymod(data) == M

def jamtis_create_checksum(data):
    polymod = jamtis_polymod(data + [0,0,0,0,0,0,0,0]) ^ M
    return [(polymod >> 5 * (7 - i)) & 31 for i in range(8)]

# test/example

CHARSET = "xmrbase32cdfghijknpqtuwy01456789"

addr_test = (
    "xmra1mj0b1977bw3ympyh2yxd7hjymrw8crc9kin0dkm8d3"
    "wdu8jdhf3fkdpmgxfkbywbb9mdwkhkya4jtfn0d5h7s49bf"
    "yji1936w19tyf3906ypj09n64runqjrxwp6k2s3phxwm6wr"
    "b5c0b6c1ntrg2muge0cwdgnnr7u7bgknya9arksrj0re7wh")

addr_data = [CHARSET.find(x) for x in addr_test]
addr_enc = addr_data + jamtis_create_checksum(addr_data)
addr = "".join([CHARSET[x] for x in addr_enc])

print(addr)
print("len =", len(addr))
print("valid =", jamtis_verify_checksum(addr_enc))

UkoeHB commented Dec 30, 2021

They can be migrated to the new scheme based on the following table (applies to both standard and multisig wallets):

IMO only multisig wallets should do k_vb = k_v. If normal wallets also do that, then view-only wallets and existing scanning services will acquire k_vb, when they should only have k_fr.

tevador commented Dec 30, 2021

Originally, I planned to generate new view keys for legacy wallets, but:

  1. Seems odd to have different rules for standard and multisig wallets. It is theoretically possible to convert a multisig wallet to a standard wallet, which would generate different keys if such a wallet were later restored using the merged keys.
  2. It would break existing view-only wallets.
  3. Legacy addresses will stop working anyways, so people who want to deliberately disable view-only wallets post-fork just have to generate new keys and send the funds to the new wallet. Nobody can continue sending them funds to the legacy wallet unless they publish a new address.

UkoeHB commented Dec 30, 2021

Seems odd to have different rules for standard and multisig wallets. It is theoretically possible to convert a multisig wallet to a standard wallet, which would generate different keys if such a wallet were later restored using the merged keys.

This can only be done with custom code (and is very unlikely to be done). If someone's stuff breaks, they should be responsible for fixing it in this case.

The reason multisig needs a different procedure is it isn't feasible to hold a new full-group setup ceremony for existing multisig wallets (partial-group ceremonies are feasible, e.g. of size M, because either funds are spendable [at least M people available] or unspendable [in which case any further actions with the wallet are useless]).

It would break existing view-only wallets.

Since the existing permission structure is going away, existing wallets in the old permission structure should also stop working. Otherwise there will be a silent behavior change.

Legacy addresses will stop working anyways, so people who want to deliberately disable view-only wallets post-fork just have to generate new keys and send the funds to the new wallet. Nobody can continue sending them funds to the legacy wallet unless they publish a new address.

This sounds like opt-in privacy to me (i.e. users have to proactively move all their funds to a new wallet in order to receive the default privacy guarantees of the update).

tevador commented Dec 30, 2021

Fair enough. I have reverted to the original migration procedure for standard wallets.

busyboredom commented Dec 31, 2021

There is a lot of complexity in the proposed signature types. This is my understanding of their intended use cases, can you confirm that I am understanding correctly @tevador ?

| signature type | dependent on | intended use case |
|----------------|--------------|-------------------|
| none (default?) | address, amount, description | simple 1-time payments between strangers |
| local | address | adding your address to a friend's address book, so the friend can be confident in future payments to that address |
| global | wallet | published by a charity so donors can have confidence in the charity's addresses; optionally required by exchanges or employers to ensure proper recipient |
| local with certificate | address, time | a merchant giving a time-sensitive invoice |

Would it make sense to do away with 'none' and make 'local' the default? Is there any advantage to making an RID depend on amount or description when those things are so easily verifiable anyway? Just trying to think of ways to reduce the complexity for users.

For 'local with certificate', how should the expected RID be communicated? It can't be published because it is different for every address, so it seems like it would end up being communicated over the same channel as the address. Wouldn't that defeat the purpose of having a signature? Edit: I have been corrected. The RID is independent of the address so long as the certificate is not expired, and can be published.

tevador commented Dec 31, 2021

@busyboredom I think you misunderstood the "local with certificate" signature type. The RID in this case is equal to the wallet RID independent of the address. The reason why the merchant wallets need to use the certificate is for Tier 0 wallets to be able to generate certified addresses without knowing the private key kvb.

reduce the complexity for users

None of this should be visible to the user. The user will just see a string of characters. I think the recommended implementation would be:

  1. "Basic" addresses without amount/due date/description will not use any signature to be as short as possible.
  2. Invoices from non-merchant wallets will use a local signature for privacy reasons (addresses will stay unlinkable).
  3. The global signature type may be used in some special cases by both non-merchant and merchant Tier 2 wallets. For example, an employer should require a globally signed address to send the salary to an employee. KYC exchanges might only permit withdrawals to globally signed addresses with a specific registered RID.
  4. Addresses from merchant Tier 0 or Tier 1.5 wallets should use almost exclusively the "local with certificate" type together with a DNS-based verification of the RID.

@busyboredom

Ahhhh, you are right, I misunderstood "local with certificate". Thank you for correcting me.

Wouldn't the user need to decide what type of signature to use when providing an address? That decision has a learning curve, which is what prompted my questions. The concept of a signature alone is pretty alien to most non-devs. Here's the specific scenario in my head that's bugging me:

  1. Bob gives Alice his address and its default "none" RID in person.
  2. Alice adds the address and RID to her address book.
  3. The next day, Bob sends Alice an invoice with an amount, again using the default "none" signature.
  4. Alice attempts to pay the invoice, but encounters a warning even though Bob is in her address book.
  5. Confusion and googling ensue.

In that example, Bob messed up in (1) and (3) by using "none" when he should have used "local". The confusion could be avoided by defaulting to "local" at the cost of having slightly longer default addresses.

tevador commented Jan 1, 2022

I have added the rest of test vectors for merchant wallets. Also there has been a small update of the DNS verification procedure, which now uses a CNAME record instead of a TXT record (much simpler to implement).

@busyboredom

In that example, Bob messed up in (1) and (3) by using "none" when he should have used "local".

As I said earlier, the user will not choose the signature type. It should be selected automatically based on the wallet type and other parameters.

There may be some confusion about "wallet RID" and "address RID". They are only equal for "global" and "local with certificate" signature types. I'm not sure if we should perhaps use two separate names? It does not make much sense to add someone's address RID into the address book (should be either their wallet RID or an address).

tevador commented Jan 1, 2022

The confusion could be avoided by defaulting to "local" at the cost of having slightly longer default addresses.

I realized that you kind of have a point here. For a given wallet, the address RID should only depend on the indices i,j and nothing else, otherwise the user might get a different set of address RIDs when they restore their wallet next time and get confused.

Here are two possible solutions to get a consistent sequence of RIDs:

  1. Remove the "none" signature type. This will make the shortest possible address 133 bytes long (184 characters in base58).
  2. For non-merchant wallets, dedicate even values of j to "basic" addresses that cannot contain metadata, while odd indices j could have metadata, but would not support the "none" signature type. Merchant wallets would not support the "none" signature type.

LocalMonero commented Jan 2, 2022

2.1 Address format
Jamtis addresses, when encoded as a string, start with the prefix xmr

Prefixes are important for good UX and DX. Here's some more prefix suggestions:
xmr1 instead of just xmr, because Seraphis might get an upgrade that will render old addresses obsolete, at which point new addresses will have a prefix of xmr2, etc.

Clear testnet prefix: xmr1t
Clear stagenet prefix: xmr1s
Clear mainnet prefix: xmr1m

Jamtis introduces a short recipient identifier (RID) that can be calculated for every wallet and every address. RID consists of 25 case-insensitive alphanumeric characters that are separated by hyphens for better readability. The RID for the above address is h8eug-w77qs-aaf7m-ww63i-hn33c. Instead of comparing long addresses, users can compare the much shorter RID. RIDs are also suitable to be communicated via phone calls, text messages or handwriting to confirm a recipient's address. This allows the address itself to be transferred via an insecure channel.

It would be much more human readable if the RID was, say, 4 words from a 2048-word dictionary (like BIP39), so that instead of the difficult-to-read h8eug-w77qs-aaf7m-ww63i-hn33c you would get correct-horse-battery-coffee. Emojis can also be used instead to sidestep linguistic barriers and improve brevity: ✅🐴🔋☕.

2.3 Invoices
Addresses can encode metadata such as an amount, timestamp or a short description. Such addresses can be referred to as Monero invoices. For example, the following address encodes an amount of 0.001 XMR and the message "THIS IS A TEST PAYMENT": xmrhtx7H1uwbArRUAVNjfwTYAj5kmXKCgZcR6jtGnjB8eMH3RihMnhos4f6Gz7gbj5pgNdbxr5xuPAjRT9xcRL2X9ta5ShB2sQbiMDVvECXbUXLp4cLYQTJAJ36gRKXZaTfEtfJd1oQFJSpZ2fSp1YgZHWzGh4za35MQYJQ.

Invoices should have a clear prefix. Such as xmr1i. It's also unclear how much information can be encoded into the address.

tevador commented Jan 2, 2022

Clear testnet prefix: xmr1t
Clear stagenet prefix: xmr1s
Clear mainnet prefix: xmr1m

Why would the version and the network type need to be human-readable? The purpose of the xmr prefix is to identify the opaque sequence of characters as an XMR address. The rest should be delegated to the software, which can parse the network type and the version from the binary header and display a human-readable error if the address is invalid. monero-wallet-rpc even has a specific function for this purpose.

It would be much more human readable if the RID was, say, 4 words from a 2048-word dictionary

4 BIP39 words are insecure as they only encode 44 bits. You would need at least 10-11 words to get a similar security level as the base32 identifier, but this might get confused with a wallet seed.

Invoices should have a clear prefix. Such as xmr1i.

Again, why would the user need to know in advance if the address they got has metadata or not? This can be displayed by the software when the address is actually used.

It's also unclear how much information can be encoded into the address.

It's mentioned in the specification (7.3). The description field is limited to 256 characters.

@LocalMonero

LocalMonero commented Jan 2, 2022

Why would the version and the network type need to be human-readable? Again, why would the user need to know in advance if the address they got has metadata or not? This can be displayed by the software when the address is actually used.

Here's a concrete example: we get support tickets where people ask us why a certain address isn't working when used for deposits or withdrawals. It looks like a normal XMR address but it's actually an invoice. Our support staff would instantly be able to identify that the given string is an invoice and not a standard address and reply to the customer without ever needing to spin up a wallet.

The same situation can be imagined if the addresses ever get an upgrade. You can imagine the amount of support tickets that will flood in as people try to use old version addresses to deposit/withdraw. With a clear human-readable xmr1 vs xmr2 support staff will instantly be able to tell the user what the problem is. Otherwise, the support staff would have to manually check every address before they can confirm to the user that they're using an old version address.

In the same situation, it would also simplify the experience for the user themselves. Say they have their deposit or withdrawal address saved somewhere, but then an upgrade happens, and they forgot whether they've updated their address book with the new addresses or not. They'll instantly be able to see that the addresses they have written down are xmr1 and not xmr2, hence, they need to update their address book.

Another example: during the development of applications, you will often be testing with both normal and testnet networks. Having a clear prefix simplifies the development experience, you instantly know if you're looking at a testnet or mainnet address. This is why BTC has bc1q and tb1q as prefixes.

4 BIP39 words are insecure as they only encode 44 bits. You would need at least 10-11 words to get a similar security level as the base32 identifier, but this might get confused with a wallet seed.

How many bits of security are necessary? I see your point about confusion with mnemonics. In that case it's probably optimal to just use a sequence of emojis: 🐴🍏🍍---🐭🏰🐅---🏮👔🍓

Humans are objectively better at recognizing and faster at reading such pictogram sequences as opposed to random characters. Also, differentiating Latin alphabet characters is much more difficult for people who have a completely different writing system, such as Chinese users. Just about as difficult as it is for an English speaker to differentiate Chinese characters.

Emojis achieve the goals of ease of readability, brevity, clarity and they do so across any linguistic background.

By the way, a sequence of emojis is exactly what Telegram uses for E2EE verification during calls:
[image: Telegram emoji key verification]

@tevador
Author

tevador commented Jan 3, 2022

... why a certain address isn't working when used for deposits or withdrawals
... people try to use old version addresses to deposit/withdraw

The phrase "isn't working" implies they got some sort of an error when interacting with a software (either the merchant's website or the wallet). Why couldn't the software give a clear reason why the given address cannot be used? E.g. "Sorry, the address you are trying to use is an invoice address, please use another address" (btw, there is no reason why an invoice address couldn't be used for withdrawals, but that's beside the point).

I specifically wanted to avoid having "human readable addresses". Addresses are not something that humans should understand. They communicate data to the software, not the humans. When someone gives you an address, they are not actually giving it to you but to your wallet software.

But I guess people are too dumb to read software error messages...

Your proposal would require at least 3 additional characters: version, network type and address type. I don't like that this information would be duplicated in the human-readable prefix and the binary header. Also the checksum would probably need to be modified to also cover the human readable part in some way. Or perhaps the human-readable part could be included in the RID instead of the address?

during the development of applications, you will often be testing with both normal and testnet networks. Having a clear prefix simplifies the development experience, you instantly know if you're looking at a testnet or mainnet address

Developers are smart enough to learn the base58 representations of the header values that correspond to the different network types.

How many bits of security is necessary?

Currently, the minimum recommended security level is 112 bits. The base32 identifier is 120 bits. With an alphabet of 2048 symbols, you would need 10-11 symbols to get to a safe security level.
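The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative only; the alphabet sizes and the 112-bit target are the ones mentioned in this thread.

```python
import math


def bits_encoded(alphabet_size: int, length: int) -> float:
    """Bits of entropy in `length` symbols drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)


def symbols_needed(alphabet_size: int, target_bits: int) -> int:
    """Smallest symbol count whose combinations cover 2**target_bits (exact integer math)."""
    n = 1
    while alphabet_size ** n < 2 ** target_bits:
        n += 1
    return n


if __name__ == "__main__":
    print(bits_encoded(2048, 4))      # four words from a 2048-word list
    print(symbols_needed(2048, 112))  # words needed for the 112-bit minimum
```

Each word from a 2048-entry list contributes 11 bits, so four words give 44 bits, ten give 110 and eleven give 121, which is where the "10-11 symbols" estimate comes from.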

Emojis achieve the goals of ease of readability, brevity, clarity and they do so across any linguistic background.

I think emojis are a terrible idea for several reasons:

  1. Cannot be used in text-only environments (the "CLI" in monero-wallet-cli stands for "command line interface")
  2. Cannot be used in DNS
  3. Cannot be easily written down by hand
  4. Cannot be easily typed on the keyboard
  5. Can have a different appearance on different systems
  6. Some systems don't support emojis at all, which would result in identifiers like □□□□□□□□□□

@UkoeHB

UkoeHB commented Jan 3, 2022

Developers are smart enough to learn the base58 representations of the header values that correspond to the different network types.

I don't know about you, but I'd rather understand something at a glance than have to memorize unintuitive base58 -> type mappings.

@LocalMonero

LocalMonero commented Jan 3, 2022

The phrase "isn't working" implies they got some sort of an error when interacting with a software (either the merchant's website or the wallet). Why couldn't the software give a clear reason why the given address cannot be used?

People literally don't read these, they just straight up go to the support portal. This isn't even taking into account that a lot (most?) of services won't do error messages properly.

I specifically wanted to avoid having "human readable addresses". Addresses are not something that humans should understand.

Isn't putting xmr at the beginning making it human-readable?

Your proposal would require at least 3 additional characters: version, network type and address type. I don't like that this information would be duplicated in the human-readable prefix and the binary header.

Can you redefine the binary header such that the base58 representation at the start matches the human-readable characters?

Developers are smart enough to learn the base58 representations of the header values that correspond to the different network types.

It's not that one theoretically can't, it's just that if you can make it easier for the developer without making sacrifices or with very minor ones, then why not? Hence, Bitcoin uses bc1q and tb1q. "Developers are smart enough to learn X" is basically the antithesis of improving DX.

Cannot be used in text-only environments (the "CLI" in monero-wallet-cli stands for "command line interface"); Some systems don't support emojis at all, which would result in identifiers like □□□□□□□□□□

Is it not the case that emojis are part of Unicode? If your terminal can display Chinese, which also uses Unicode (Chinese is used in monero-wallet-cli) it can display Emojis, can it not?

Cannot be used in DNS

Sure it can, it's a question of encoding.

Cannot be easily written down by hand; Cannot be easily typed on the keyboard

Isn't the RID for verification only? You wouldn't need to manually type it, would you? Just compare what you see, say, on the store web page and on your wallet?

Just like in the case of Telegram call security verification, it's never a question of typing or writing something out, it's just a question of quickly checking that two sets of characters are identical. Your goal was to encourage and simplify verification, wasn't it? What do you think is quicker and easier to verify, especially for someone without a Latin alphabet background:

  1. h8eug-w77qs-aaf7m-ww63i-hn33c, or
  2. 🐴🍏🍍---🐭🏰🐅---🏮👔🍓

Can have a different appearance on different systems

Doesn't really matter if the turtle looks slightly different on another system. It's still a turtle.

@tevador
Author

tevador commented Jan 3, 2022

Isn't the RID for verification only? You wouldn't need to manually type it, would you?

One of the current workflows requires the user to type it, but that could be changed. There are also offline uses when someone gives you their RID, possibly written on a piece of paper.

Sure it can, it's a question of encoding.

It will be converted to punycode, which is basically equivalent to the base-32 string.
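A minimal sketch of that conversion, using Python's built-in `punycode` codec and the `xn--` prefix that DNS uses for encoded labels. The helper names are made up for this example, and whether a registry would accept an emoji label is a separate question that the sketch does not address.

```python
def to_dns_label(text: str) -> str:
    """Encode a Unicode string as an ASCII label: 'xn--' followed by its punycode form."""
    return "xn--" + text.encode("punycode").decode("ascii")


def from_dns_label(label: str) -> str:
    """Reverse of to_dns_label for labels that carry the 'xn--' prefix."""
    if not label.startswith("xn--"):
        raise ValueError("not a punycode label")
    return label[4:].encode("ascii").decode("punycode")


if __name__ == "__main__":
    emoji_rid = "🐴🍏🍍"
    label = to_dns_label(emoji_rid)
    print(label)                  # ASCII letters and digits only
    print(from_dns_label(label))  # back to the emoji sequence
```

The resulting label is an opaque run of letters and digits, so on the DNS side the emoji form offers no readability gain over a base-32 string.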

The emojis work in the terminal if the font supports it. It would look something like this:

[image: emoji RID rendered in a terminal]
(the text version is truncated to have a similar bit length)

I personally find the text version more readable, but I guess we could have both. If you can compile a list of 1024 of the most distinct emojis, there can be a 12-emoji RID that also encodes 120 bits (although without a checksum).

@LocalMonero

LocalMonero commented Jan 3, 2022

Telegram only uses 33 bits for verification security. Here's an excerpt from Telegram's announcement about it:

Keys for end-to-end encrypted calls are generated using the Diffie-Hellman key exchange. Users who are on a call can ensure that there is no MitM by comparing key visualizations.

To make key verification practical in the context of a voice call, Telegram uses a three-message modification of the standard DH key exchange for calls:

A->B : (generates a and) sends g_a_hash := hash(g^a)
B->A : (stores g_a_hash, generates b and) sends g_b := g^b
A->B : (computes key (g_b)^a, then) sends g_a := g^a
B : checks hash(g_a) == g_a_hash, then computes key (g_a)^b

The idea is that Alice commits to a specific value of a (and of g_a), but does not reveal g_a to Bob (or Eve) until the very last step. Bob has to choose his value of b and g_b without knowing the true value of g_a. If Eve is performing a Man-in-the-Middle attack, she cannot change a depending on the value of g_b received from Bob and she also can't tune her value of b depending on g_a. As a result, Eve only gets one shot at injecting her parameters — and she must fire this shot with her eyes closed.

Thanks to this modification, it becomes possible to prevent eavesdropping (MitM attacks on DH) with a probability of more than 0.9999999999 by using just over 33 bits of entropy in the visualization. These bits are presented to the users in the form of four emoticons. We have selected a pool of 333 emoji that all look quite different from one another and can be easily described in simple words in any language.
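The three-message exchange quoted above can be sketched as follows. This is a toy model, not Telegram's implementation: the prime `P`, the generator `G`, the use of SHA-256 and the way the 33-bit fingerprint is derived are all assumptions chosen to keep the example short.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption, far too small for real use)
G = 3           # toy generator (assumption)


def commit(value: int) -> bytes:
    """Hash commitment to a group element."""
    return hashlib.sha256(value.to_bytes(16, "big")).digest()


def run_exchange() -> tuple:
    # Message 1, A->B: Alice picks a and sends only the hash of g^a.
    a = secrets.randbelow(P - 2) + 1
    g_a = pow(G, a, P)
    g_a_hash = commit(g_a)

    # Message 2, B->A: Bob picks b without knowing g_a and sends g^b.
    b = secrets.randbelow(P - 2) + 1
    g_b = pow(G, b, P)

    # Message 3, A->B: Alice computes the key and only now reveals g_a.
    key_a = pow(g_b, a, P)

    # Bob checks the revealed g_a against the earlier commitment.
    if commit(g_a) != g_a_hash:
        raise ValueError("commitment mismatch")
    key_b = pow(g_a, b, P)
    return key_a, key_b


def fingerprint(key: int, bits: int = 33) -> int:
    """Short value both sides would visualize (e.g. as emoji) and compare."""
    digest = hashlib.sha256(key.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


if __name__ == "__main__":
    key_a, key_b = run_exchange()
    print(fingerprint(key_a) == fingerprint(key_b))
```

The commitment is what allows such a short fingerprint: an attacker in the middle has to fix her parameters before seeing the other side's value, so she cannot search for a colliding fingerprint offline.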

It's understandable that things are very different for Monero, but do you think something like this approach or some other approach can be used to reduce the required length from 120 bits down to around 30?

@tevador
Author

tevador commented Jan 4, 2022

I checked the protocol and it seems to be based on the premise that the voice call participants can securely compare the emojis on their screens via a potentially compromised communication channel. That might work well with voice calls (it's hard for an eavesdropper to modify the words being spoken in real time), but it's basically useless for any other protocol.

Consider instead a chat channel using the same protocol. The chat is compromised by Eve, who has created two different shared secrets with Alice and Bob. Alice and Bob will each get a different set of 4 emojis. When Alice asks Bob which emojis he is seeing, Eve can modify Bob's response to match the 4 emojis Alice has on her screen (which Eve knows).

So overall, the protocol can work for interactive communications against a passive eavesdropper. I don't see how it's usable for Monero addresses.

@LocalMonero

Apparently Telegram was only able to select a pool of 333 emoji that look distinct enough from each other and can be easily described, so 14 emojis would be needed to get the same level of security. Seems like too many, killing the brevity 😞

@UkoeHB

UkoeHB commented Jan 5, 2022

Recommended semantic changes for keys:

account root key -> account-creation key: k_ar -> k_ac  (disambiguate the letter 'r' from k_fr and r; also 'account root' implies it is the root of a single account, rather than of all accounts)
account key: k_ac -> k_a  (ac is an unintuitive abbreviation of 'account')
address key: k_a -> k_addr   (a bit longer, but easy to understand at a glance)

I don't see myself ever using/remembering the wallet tier abbreviations:

AC -> AddrGenAccnt
AR -> AddrGenAll
FR -> FindReceivedSimple
ACFR -> ViewReceivedAccnt
ARFR -> ViewReceivedAll
VB -> ViewAll
M -> Master

Whenever an output with a matching view tag is discovered in a transaction that spends a previous wallet output, the shared secret calculation from § 5.3.4 is replaced with the modified calculation from § 5.4.1. The wallet will first try to find the key corresponding to a change output and if that fails, it will try the key corresponding to a self-spend. (section 5.4.2)

It should try all three methods just in case.

  • I think section 6.2.3 needs to be updated (it's k^{i,(j)} -> k^{i,j} now afaict).

@j-berman

j-berman commented Jan 5, 2022

Running through implications and expectations for light wallets in JAMTIS, please correct me if I'm wrong on any of this:

  • The light wallet server should store a user's k_fr (or "find-received key"), all tx's with view tag matched outputs, and all tx's involving those view tag matched outputs as inputs.

  • Light wallet clients will therefore need to download -- and light wallet servers will need to store -- a larger number of outputs and tx's per user. First, they'll need all view tag matched outputs, which are an expected 0.4% of the chain. Then, they'll need expected number of tx's involving those view tag matched outputs as inputs, which will be a function of ring size and tx volume.

  • In order to calculate wallet balance (simplified), the light wallet client should request all view tag matched outputs and all tx's involving those view tag matched outputs as inputs from the server, iterate over all view tag matched outputs client-side, find outputs sent to the user and derive key images (client-side), then lookup tx's with matching key images among the tx's involving view tag matched outputs as inputs (client-side), then check for change outputs and self-spends (client-side).

  • The light wallet client should keep track of previously used "accounts" and "addresses within the account" in a hash table while scanning. The client should implement a default lookahead similar to the one used in normal wallets today.

  • If a light wallet user wants to create a new account or address within an account, they will do so client-side.

  • The light wallet server should not store addresses, and will not be able to derive a user's addresses.

  • Light wallet users benefit from making extra effort not to reuse addresses. Any time a user reuses an address, the light wallet server can derive a duplicate Ks' (or "nominal spend keys") for outputs received to the same address, which indicates those outputs were sent to the user with certainty.

  • There is an optimization win where a light wallet client can send the server an encrypted cache at a plaintext checkpoint height. Thus, logins after the first login will load quicker for light wallet users using the cache, and the server can prune data for the user prior to the checkpoint height. Privacy implications would need to be thought through on this more carefully.

@UkoeHB

UkoeHB commented Jan 5, 2022

The light wallet server should store a user's k_fr (or "find-received key"), all tx's with view tag matched outputs, and all tx's involving those view tag matched outputs as inputs.

Light wallet clients will therefore need to download -- and light wallet servers will need to store -- a larger number of outputs and tx's per user. First, they'll need all view tag matched outputs, which are an expected 0.4% of the chain. Then, they'll need expected number of tx's involving those view tag matched outputs as inputs, which will be a function of ring size and tx volume.

No, the light wallet server stores: k_fr and a list of {output, txo pubkey, nominal spend key, tx id} tuples for all view tag matches.

In order to calculate wallet balance (simplified), the light wallet client should request all view tag matched outputs and all tx's involving those view tag matched outputs as inputs from the server, iterate over all view tag matched outputs client-side, find outputs sent to the user and derive key images (client-side), then lookup tx's with matching key images among the tx's involving view tag matched outputs as inputs (client-side), then check for change outputs and self-spends (client-side).

The client: A) gets a set of output tuples from the server, B) does normal scanning on all of those outputs, C) gets key images for owned outputs, D) matches each key image to a tx id, looks at outputs received from the server in that tx, checks if those outputs are change/self-spends. This part is not done: "[requests] all tx's involving those view tag matched outputs as inputs from the server".
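Steps A to D can be sketched roughly as below. The tuple fields follow the list given above; everything else is a hypothetical stand-in: `recover_key_image` represents normal scanning plus key image derivation, `spending_tx_of` represents key image to tx id pairings from some source, and `is_change_or_self_spend` represents the modified scan for change and self-spend outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class OutputTuple:
    """One view-tag match as stored by the server (field names are illustrative)."""
    output: bytes
    txo_pubkey: bytes
    nominal_spend_key: bytes
    tx_id: str


def client_scan(
    tuples: List[OutputTuple],                                # A) fetched from the server
    recover_key_image: Callable[[OutputTuple], Optional[bytes]],
    spending_tx_of: Dict[bytes, str],
    is_change_or_self_spend: Callable[[OutputTuple], bool],
) -> dict:
    owned: List[OutputTuple] = []
    spent: Dict[bytes, str] = {}
    returned: List[OutputTuple] = []

    # B) + C) normal scanning; owned outputs yield a key image.
    key_images = []
    for t in tuples:
        key_image = recover_key_image(t)
        if key_image is not None:
            owned.append(t)
            key_images.append(key_image)

    # D) match key images to tx ids, then inspect the outputs already
    #    received from the server that belong to those transactions.
    for key_image in key_images:
        tx_id = spending_tx_of.get(key_image)
        if tx_id is None:
            continue
        spent[key_image] = tx_id
        for t in tuples:
            if t.tx_id == tx_id and is_change_or_self_spend(t):
                returned.append(t)

    return {"owned": owned, "spent": spent, "returned": returned}
```

No extra per-transaction request goes to the server in this flow: change and self-spend candidates are already among the view-tag matches that the client downloaded in step A.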

The light wallet client should keep track of previously used "accounts" and "addresses within the account" in a hash table while scanning. The client should implement a default lookahead similar to the one used in normal wallets today.

Yes, but in practice this can be farmed out to a standard/pre-defined object. Check out my planned infrastructure (i.e. the ENoteFinder= light wallet server, ViewWallet= light wallet client; the ViewWallet.ENoteFinder would be implemented as a remote connection).

If a light wallet user wants to create a new account or address within an account, they will do so client-side.

The light wallet server should not store addresses, and will not be able to derive a user's addresses.

Light wallet users benefit from making extra effort not to reuse addresses. Any time a user reuses an address, the light wallet server can derive a duplicate Ks' (or "nominal spend keys") for outputs received to the same address, which indicates those outputs were sent to the user with certainty.

Correct
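The address-reuse leak is easy to picture from the server's side: it only has to look for repeated nominal spend keys among a user's view-tag matches. A minimal sketch, with a made-up function name and byte strings standing in for keys:

```python
from collections import Counter
from typing import Iterable, Set


def reused_spend_keys(nominal_spend_keys: Iterable[bytes]) -> Set[bytes]:
    """Return every nominal spend key that occurs more than once."""
    counts = Counter(nominal_spend_keys)
    return {key for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    observed = [b"K1", b"K2", b"K1", b"K3"]
    print(reused_spend_keys(observed))  # only b"K1" repeats
```

A repeated key is very unlikely to come from false-positive view tag matches, which is why it marks those outputs as belonging to the user.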

There is an optimization win where a light wallet client can send the server an encrypted cache at a plaintext checkpoint height. Thus, logins after the first login will load quicker for light wallet users using the cache, and the server can prune data for the user prior to the checkpoint height. Privacy implications would need to be thought through on this more carefully.

Like storing the light wallet client's state at the server? The server can analyze the cache size to estimate the number of outputs you own (which can be correlated with scanning requests to the server).

@j-berman

j-berman commented Jan 5, 2022

D) matches each key image to a tx id, looks at outputs received from the server in that tx, checks if those outputs are change/self-spends.

In order to match key image to tx id, the client has to query the server for tx id's by key image here, right? So essentially considering it ok that the client queries the server in a way that leaks key images? Or am I missing something there

Yes, but in practice this can be farmed out to a standard/pre-defined object. Check out my planned infrastructure (i.e. the ENoteFinder= light wallet server, ViewWallet= light wallet client; the ViewWallet.ENoteFinder would be implemented as a remote connection).

I'm not following here, can you expand on this a bit more?

Like storing the light wallet client's state at the server? The server can analyze the cache size to estimate the number of outputs you own (which can be correlated with scanning requests to the server).

I imagine there's some room for some clever padding. But alas, premature optimization imo. Reason I thought it would be more relevant is because I thought the server would store all plausible spends of view tag matched outputs, which seemed like a lot.

@UkoeHB

UkoeHB commented Jan 5, 2022

In order to match key image to tx id, the client has to query the server for tx id's by key image here, right? So essentially considering it ok that the client queries the server in a way that leaks key images? Or am I missing something there

The client just has to look at outputs it already got from the server. The server will identify all change/self-spends via view tags (but will compute an incorrect nominal spend key for them).

I'm not following here, can you expand on this a bit more?

I just mean a light wallet client doesn't need an extra implementation (unless you are unable to use the core repo, in which case you will just need to port an object definition from C++ that does all the things you need).

@tevador
Author

tevador commented Jan 5, 2022

I think @j-berman has a point in that a transaction can in theory spend an output and not send any change back to the wallet. The light client would never learn that this output has been spent.

However, this can be easily fixed by requiring all transactions sent from the wallet to have at least one output with a matching view tag. In most cases, this will be the change output, but even in a rare case when there is no change output, the wallet can select the output private key r such that the view tag also matches for the wallet key k_fr (requires 128 attempts on average).
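A toy sketch of that grinding step follows. It does not use the real Jamtis derivation: `toy_view_tag` is a stand-in that hashes some key material together with r and keeps one byte, and both function names are hypothetical. The point is only the retry loop over r.

```python
import hashlib
import secrets


def toy_view_tag(key_material: bytes, r: bytes) -> int:
    """Stand-in for the real view tag: first byte of SHA-256(key_material || r)."""
    return hashlib.sha256(key_material + r).digest()[0]


def grind_output_key(recipient_key: bytes, own_k_fr: bytes, max_attempts: int = 100_000):
    """Retry random r until the recipient's tag equals the tag the sender's own wallet computes."""
    for attempt in range(1, max_attempts + 1):
        r = secrets.token_bytes(32)
        if toy_view_tag(recipient_key, r) == toy_view_tag(own_k_fr, r):
            return r, attempt
    raise RuntimeError("no matching r found")


if __name__ == "__main__":
    r, attempts = grind_output_key(b"recipient", b"own-k_fr")
    print(attempts)  # varies from run to run
```

With a one-byte tag each attempt matches with probability 1/256, so the loop finishes quickly and the cost is negligible next to building the rest of the transaction.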

@UkoeHB
Copy link

UkoeHB commented Jan 5, 2022

I think @j-berman has a point in that a transaction can in theory spend an output and not send any change back to the wallet. The light client would never learn that this output has been spent.

Wait what? You just compute the key image and check the key image table. A light wallet client should have k_vb.

@j-berman

j-berman commented Jan 5, 2022

The server will identify all change/self-spends via view tags (but will compute an incorrect nominal spend key for them).

Ah! Got it - awesome. And when there is a tx with 0 change, clients will be sure to include a 0-amount output back to the sender? I'm not sure how this is handled today (if it's a dummy address or not).

I just mean a light wallet client doesn't need an extra implementation

Got it

EDIT: haha just saw above comments

However, this can be easily fixed by requiring all transactions sent from the wallet to have at least one output with a matching view tag. In most cases, this will be the change output, but even in a rare case when there is no change output, the wallet can select the output private key r such that the view tag also matches for the wallet key k_fr (requires 128 attempts on average).

This makes sense

@tevador
Author

tevador commented Jan 5, 2022

Wait what? You just compute the key image and check the key image table.

The light client would need to download all key images first. And then it would still not be able to show in the wallet history when exactly the output was spent.

@j-berman

j-berman commented Jan 5, 2022

Right, would seem when the server finds view tag matched outputs, it should return any key images in that tx of any rings that also have a view tag matched output.

@UkoeHB

UkoeHB commented Jan 5, 2022

Ah! Got it - awesome. And when there is a tx with 0 change, clients will be sure to include a 0-amount output back to the sender? I'm not sure how this is handled today (if it's a dummy address or not).

This kind of approach is not reliable due to collaborative funding, where you can spend funds in a tx but not be the one who defined the output set.

The light client would need to download all key images first. And then it would still not be able to show in the wallet history when exactly the output was spent.

Yes, you need pairings <key image, tx id>.
