Skip to content

Instantly share code, notes, and snippets.

@AdamISZ
Created December 27, 2018 18:48
Show Gist options
  • Save AdamISZ/4551b947789d3216bacfcb7af25e029e to your computer and use it in GitHub Desktop.
Save AdamISZ/4551b947789d3216bacfcb7af25e029e to your computer and use it in GitHub Desktop.
Basic payjoin/p2ep protocol for Joinmarket wallets

Described here is a variant of what has previously been published under the name "P2EP" or Pay-to-endpoint, in which A pays B but B contributes utxos, i.e. it's a coinjoin-payment.

I'm using the term "payjoin" here to refer to using that idea, but not including a URI/endpoint specific to B, and not allowing (as a merchant would) arbitrary payments, which opens up certain problems around snooping attackers (more on this below). So payjoin just means "A pays B but B actively participates and passes across utxos as extra inputs".

I'll defer a more features-focused and non-tech friendly description of what this means to a later blogpost.

Here is just the step by step protocol interactions, both at the user level and under-the-hood.

Alice will be Sender, Bob will be Receiver. Note that the Sender is a Taker class in JM code, and the Receiver a Maker class. This is because they inherit the main functional difference that one waits to complete the tx, and the other initiates.

Bob starts by running

python receive-payjoin.py -m mixdepth <walletname> <amount-in-sats>.

After normal Joinmarket start up (wallet sync, connect to message channels, also: Bob's bot puts a fake offer into the joinmarket pit; he effectively runs a spoofed yieldgenerator, but doesn't respond (for now) to requests in private-message)), Bob gets a printout like:

2018-12-27 17:57:17+0100 [-]  Your receiving address is:  2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5
2018-12-27 17:57:17+0100 [-]  You will receive amount:  50000000  satoshis.
2018-12-27 17:57:17+0100 [-]  The sender also needs to know your ephemeral nickname:  J57Pwz4XHyyGu9QB
2018-12-27 17:57:17+0100 [-]  This information has been stored in a file payjoin.txt; send it  to your counterparty when you are ready.

This information is passed out of band to Alice (i.e. address, amount, nick). Bob's bot now waits as long as needed until Alice turns up and completes the protocol.

Alice runs as follows:

python sendpayment.py -m 1  <alicewalletname> 50000000 2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5 -T J57Pwz4XHyyGu9QB

... using, of course, the data passed out of band. What happens next will usually not require further intervention, and after some seconds Alice will broadcast a transaction paying net 0.5 btc to Bob, in which at least one of his utxos is included.

  • Protocol

Alice sends in plaintext !pubkey <Alice-pubkey>. Bob receives and in plaintext also sends !pubkey <Bob-pubkey> back. These are of course ephemeral ECDH pubkeys used just for this session of communication.

Both sides now set up the end-to-end encryption (the existing codebase handles this automatically; both sides have crypto-box libnacl objects; and !tx messages are enforced to only be transferred under encryption).

Notice that any Alice-attacker can do this step, but it's harmless because the next step, which is encrypted, authenticates before Bob has sent any identifying info

Now Alice constructs a kind of dummy-tx: It takes form:

Inputs: <Alice utxos selected to pay .5> Outputs: <2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5: 50000000>, <Alice-change: remainder>

Alice doesn't bother to calculate a fee here (so it's zero); this was not really a transaction, just a proposal of inputs and change. We send it as a transaction, because, as noted in the code comments:

however for the purposes of code reuse and signalling
that we are the right counterparty, sending a transaction,
containing the outputs, works fine.

The meaning of "signalling" here should be fairly obvious: by passing a transaction with the correct destination and amount, we ~ prove that we are the intended sender. Note that this line of reasoning is a bit too weak for the general merchant case, because anonymous senders operating over automated protocols may use this to make repeated spend request and collect your utxos, then not pay, damaging your privacy. The previously linked blog post I believe covers this issue in some detail (it was the subject of considerable debate in the London Coinjoin meetup).

The intention here is that for an ad-hoc payment this is not really a problem. However, TODO: I will alter the code so that Alice actually sends a signed version, she loses nothing this way and Bob can broadcast it if something goes wrong Alice-side.

So after Bob receives this !tx encrypted transaction message, he decrypts, and deserializes the transaction, checks that it conforms in destination, amount, and bitcoin balance, checks that the inputs are valid, then appends his own inputs. He selects coins to fulfil the same amount as the destination (this is a heuristic that could easily change if someone has a better idea; use twice the amount? half? etc.). If he fails to select that much he tries half as much, and repeats the halving three times before giving up, i.e. at the very least he adds 1/8 the amount of coins as he is receiving. Then he of course alters the output amount at the destination address to account for the coins that he's added. Finally, he also alters the change output amount proposed by Alice to account for the bitcoin network fee that he calculates. If it's dusty, he drops it. Finally if all checks pass he signs each input he himself has added (usually 1), and reserializes the now partially signed transaction. He then sends another !tx message in return to Alice, containing this partially signed transaction.

Alice receives, decrypts and deserializes, and does the expected checks: Checks the inputs that are signed, verify correctly. Checks that her utxos are included. Checks that the destination receives net the originally intended amount. Calculates the bitcoin fee and checks that it's between 0.3 and 3x the fee that her wallet estimates for current network conditions. If not, the user is prompted to decide. If all checks pass, signs all her inputs, and broadcasts the payment.

In short: Alice is a P2EPTaker and runs sendpayment. Bob is a P2EPMaker and runs receive-payjoin. They both use standard Joinmarket wallets. Alice connects and spoofs Taker behaviour, Bob connects (first) and spoofs Maker behaviour. Bob sends destn, amt and nick out of band. Alice starts with !pubkey, Bob responds the same, both sides set up encryption, Alice calculates and sends !tx encrypted proposal, Bob adds his input(s), signs and sends back !tx, Alice checks and if OK co-signs and broadcasts.

@chris-belcher
Copy link

chris-belcher commented Dec 31, 2018

UIH2 makes most sense when applied to individual UTXOs, because the adversary is usually someone who mostly-passively observes the blockchain. For example walletexplorer.com or a blockchain analysis company.

@AdamISZ and I discussed a topic on IRC, here is a summary. If Bob the merchant keeps sweeping and aggregating a lot, then eventually he will run into the UIH2 attack, so we can't keep doing that. Bob the merchant's UTXO will keep getting higher and higher valued. But P2EP doesn't have to be applied to every single tx in order to break the common-input-ownership-heuristic. So it's perfectly fine for Bob the merchant to do P2EP when he can, and when he can't because of UIH2 to accept a regular bitcoin payment instead. This regular bitcoin payment will give him a low-valued UTXO that he can now use for future P2EP transactions.

@AdamISZ
Copy link
Author

AdamISZ commented Jan 2, 2019

ping participants @chris-belcher @LaurentMT @fivepiece

So here's my current feeling about the outcome of this discussion:

(1) Alice selects with a somewhat greedy algo; so she is more likely than default, to pick 2+ utxos to spend rather than just 1.
(2) Alice sends a signed transaction as a proposal; this is a kind of matter of politeness, because it means Bob has a fallback in case something goes wrong. He doesn't intend to use it.
(3) Bob tries to select for non-UIH2-violation.
(3a) If he succeeds, things proceed as intended, partial-sign payjointx back to alice, alice cosign and broadcast.
(3b) If he doesn't: I propose he builds the payjointx and cosigns anyway. My reasoning: the UIH2 violating (i.e. doesn't look like an ordinary payment) transaction still gives both parties benefits: (A) It violates CIOH (somewhat analogous to JM sendpayment, it may have poor privacy properties but it's still breaking simple clustering, (B) most importantly, it still has the same utxo de-fragmentation effect as the ideal case, and (C) it's not easy for a blockchain-analyst attacker to deal with this case, since there are batched payments and other non-standard transaction types, it isn't just "oh this is obviously payjoin/p2ep". I very much doubt current blockchain analysis will even consider it might be a coinjoin. Also consider the UX; if two people go to all this trouble, and then find they don't get a coinjoin transaction, it's going to be annoying/disappointing (ok that'd be a crap argument if the outcome was worse than a normal payment, but I really don't think it is).

An additional unrelated point that I might add to update this doc: let's have versioning; Alice !pubkey <lowest support ver> <highest supported ver>, Bob !pubkey <chosen ver> with obviously right now, Alice will send lowest and highest supported as 01.

@AdamISZ
Copy link
Author

AdamISZ commented Jan 2, 2019

Wallet selection logic to avoid UIH2 wherever possible; let me know if you disagree.
Note that in the 'cases' we are considering Bob's point of view, i.e. looking at the first of the two diagrams and using it to select BI (his inputs).

Let Alice's inputs be AI[1:n] and let the payment amount be x and the change amount be y.

So Alice's proposal looks like:

AI[1..n]    --->    x
                    y

Bob is going to select some list BI of coins.

After that selection, the transaction will look like:

AI[1...n]    --->    x + sum(BI)
BI[1..m]    --->    y

First observation, applying to all cases: the new output after Bob's addition, will be sum(BI)+x which is by definition always bigger than any single coin in BI.

Case 1: x > y

Case 1a: max(AI) < x
In this case, Alice's inputs don't cause a violation; and Bob's inputs by our "first observation" cannot cause a violation either. So Bob is here free to choose any input set he likes.

Case 1b: max(AI) > x
In this case Bob must choose a set BI such that the sum(BI) > max(AI) - x (hence: the new payment output x+sum(BI) > max(AI). If he can do so, he avoids UIH2, if he has insufficient coins for this, he can't.

Case 2: y > x

Case 2a: max(AI) < y
This case is actually the same as 1a, in the sense that, since sum(BI) < sum(BI)+x always, he cannot cause a violation (and this is true whether sum(BI)+x > y or not), and Alice was not causing a violation before and so will not after (max(AI)< y is enough to guarantee it). So again Bob is free to choose anything.

Case 2b: max(AI) > y
This case is actually effectively the same as 1b. Bob's task is to select sufficient coins that the new payment amount (sum(BI)+x) "overtakes" Alice's largest input max(AI). Whether he can do so or not defines whether or not he can avoid triggering/violating UIH2.

@LaurentMT
Copy link

LaurentMT commented Jan 4, 2019

@AdamISZ

I'm guessing that @LaurentMT is working on the tacit assumption that any input utxos provided are linked, and therefore what matters is to check the overall contribution of each, rather than individual utxo sizes.

Actually, it might make sense to apply the constraint to individual UTXOs in order to remove this assumption.

@chris-belcher

But P2EP doesn't have to be applied to every single tx in order to break the common-input-ownership-heuristic. So it's perfectly fine for Bob the merchant to do P2EP when he can, and when he can't because of UIH2 to accept a regular bitcoin payment instead.

Agreed

@LaurentMT
Copy link

LaurentMT commented Jan 4, 2019

Here are a few stats for UIH1:

From block 552084 to block 552207 (One day: 01/12/2018)

  • Txs with 2 outputs and more than 1 input = 35,349
    • UIH1 Txs (identifiable change output) = 19,020 (0.54)
    • !UIH1 Txs = 16,203 (0.46)
    • Ambiguous Txs = 126 (0.00)

From block 552322 to block 553207 (One week: 03/12/2018 - 09/12/2018)

  • Txs with 2 outputs and more than 1 input = 268,092
    • UIH1 Txs (identifiable change output) = 145,264 (0.54)
    • !UIH1 Txs = 121,820 (0.45)
    • Ambiguous Txs = 1,008 (0.00)

To be honest, I'm really surprised by this almost perfect 50/50. I'm still wondering if there are specific coin selection algorithms which may explain the high ratio of ! UIH1 Txs and if these algorithms create transactions with a fingerprint very different from most future P2EP transactions (e.g.: algorithm selecting all the utxos associated to a same reused address, other coin selection algorithm?). I'll try to investigate that as soon as possible.

Anyway, if we just consider these figures and if we make the hypothesis that !UIH1 txs don't have specific fingerprints, it seems better to avoid a detection by UIH1 (as in "it seems better to ignore my suggestion" :D) because P2EP transactions will then benefit from an added uncertainty on the real payment output while staying in the 50/50 ratio (I don't expect that we'll see a large volume of P2EP transactions in a near future). If the high ratio of !UIH1 Txs is caused by transactions with specific fingerprints, it's a different story (but it's just my opinion).

And here are a few stats for UIH2:

Stats from block 552084 to block 552207 (One day: 01/12/2018)

  • Txs with 2 outputs and more than 1 input = 35,349
    • UIH2 Txs = 10,986 (0.31)
    • !UIH2 Txs = 23,596 (0.67)
    • Ambiguous Txs = 767 (0.02)

From block 552322 to block 553207 (One week: 03/12/2018 - 09/12/2018)

  • Txs with 2 outputs and more than 1 input = 268,092
    • UIH2 Txs = 83,513 (0.31)
    • !UIH2 Txs = 178,638 (0.67)
    • Ambiguous Txs = 5,941 (0.02)

@AdamISZ
Copy link
Author

AdamISZ commented Jan 5, 2019

@LaurentMT i take it that everywhere I see "Txs with 2 outputs and more than 1 output " it means "... more than 1 input" right?

@AdamISZ
Copy link
Author

AdamISZ commented Jan 5, 2019

I think what I have in the code now is probably OK; I try to be !UIH2 on maker side once he sees the set-up, but I still go ahead and do a coinjoin even if that's not possible. In other words, in line with what I said in my previous two comments in this thread.

This data is very helpful, thanks, and I believe it supports that approach; there are an appreciable fraction (1/3) but still minority of 2-output txs that violate UIH2, so if we violate it 20-50% of the time, that should be fine.

I guess there's not much point in looking at UIH1 at all, and currently I don't (I wasn't entirely clear @LaurentMT from the above, but is that also what you're saying?).

Of course we can always be more sophisticated in future.

I'll also note that I've tried to mimic Core locktime and rbf approach (I have not yet put in a 5-10% opt-in rbf but that could easily be done).

@LaurentMT
Copy link

LaurentMT commented Jan 13, 2019

@AdamISZ Good catch! Fixed it in my comment.

WRT UIH1, the cause of the 50/50 is still not clear to me. Among the potential causes:

  • txs spending all utxos associated to a same address => identifiable fingerprint
  • rbf'd txs (FSS RBF) => identifiable "fingerprint" (for network eavesdroppers)

But I doubt that these 2 causes alone explain the 50/50 ratio.

@NicolasDorier
Copy link

There is 3 parameters to separately copy and paste (50000000,2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5, J57Pwz4XHyyGu9QB)

It would be cool if you could somehow combine this in one string so people do not screw up (like you screwed up yourself in your video during copy pasting) :)

@AdamISZ
Copy link
Author

AdamISZ commented Jan 15, 2019

@NicolasDorier

Agreed; in fact, @chris-belcher made the same comment at the start of this thread. Only question would be encoding; perhaps bech32 similar to BOLT11 is the way to go, else something like existing stuff (BIP21 etc?)

On a more general point: there is obviously no attempt here to develop a standardized protocol. This was more a rough-and-ready "what we can easily do in Joinmarket given its existing infrastructure for inter-client communication". AFAIK the existing standardisation proposal is BIP79 by R Havar. I'd like to comment on that proposal in some detail, but only after I've finished implementing this implementation. It obviously wouldn't be a simple matter for all, or even several, wallets to implement a common standard, not to mention, there are a number of tricky details to consider in the more general merchant use case, which as I said at the start here, I'm deliberately not trying to address in this version. BIP79 discusses some of them, as of course did the blockstream blog post which is linked at the start of the gist.

@NicolasDorier
Copy link

NicolasDorier commented Jan 17, 2019

@AdamISZ I think as long as it is easy to copy/paste the protocol itself does not matter too much. (no separator like - for example, because double clicking on the string would not select everything)
Better standard can emerge once it starts to be actually used.

@AdamISZ
Copy link
Author

AdamISZ commented Jan 21, 2019

I realise this discussion is kinda "done" now, but I would really appreciate if anyone could actually review the algo/thinking in this comment above. Especially since it's ... kinda running in "production" :)

@shesek
Copy link

shesek commented Mar 2, 2019

"UIH1" : one output is smaller than any input. This heuristically implies that that output is not a payment, and must therefore be a change output.

"UIH2": one input is larger than any output. This heuristically implies that no output is a payment, or, to say it better, it implies that this is not a normal wallet-created payment, it's something strange/exotic.

I'm a bit confused by these definitions. Aren't they only true for transactions of exactly 2 inputs?

For example, this transaction doesn't seem covered by any of the definitions, but it is a UIH2.

This transaction is covered by UIH1's definition, but is actually a UIH2.

I think a more accurate definition for UIH (1+2) could perhaps be something like "one output is smaller-or-eq the sum of inputs excluding the smallest input", or in other words "we can do without the smallest input and still have enough to fund the smaller output", or sum(in)-min(in) >= min(out).

The definition for UIH2 could be "the sum of inputs excluding the smallest input is larger-or-eq than any output", or in other words "we can do without the smallest input and still have enough to fund the larger output", or sum(in)-min(in) >= max(out).

The definition for UIH1 would then be anything that matches UIH but not UIH2, i.e. min(out) <= sum(in)-min(in) < max(out)

If we also take fees into consideration, this becomes sum(in)-min(in) >= min(out)+fee for UIH1+2 and sum(in)-min(in) >= max(out)+fee for UIH2. (of course, this is just a rough estimation, because adding inputs/outputs effects the size which effects the fee, which isn't taken into consideration here. but I think it could perhaps still provide more accurate results than ignoring the fee entirely?)

Note that these definition only work with transactions of two outputs. I think it could be extended to transactions with more outputs, by changing the UIH1+2 definition from min(out) to sum(out)-max(out) and the UIH2 definition from max(out) to sum(out)-min(out). But someone should double check that :)

@DanGould
Copy link

DanGould commented Feb 5, 2023

The Unnecessary Input Heuristics & PayJoin Transactions paper extends the discussion we're having here.

This paper investigates the resemblance of PayJoin transactions to ordinary payment transactions by examining the significance of the unnecessary input heuristic in transactions with more than one input and exactly two outputs which is the common template of recent PayJoin transactions.

It lists specific definitions of UIH1 & UIH2 where they're used in real block explorers and transaction statistics relevant to the practical privacy benefits of payjoin.

@AdamISZ
Copy link
Author

AdamISZ commented Feb 5, 2023

The Unnecessary Input Heuristics & PayJoin Transactions paper extends the discussion we're having here.

This paper investigates the resemblance of PayJoin transactions to ordinary payment transactions by examining the significance of the unnecessary input heuristic in transactions with more than one input and exactly two outputs which is the common template of recent PayJoin transactions.

It lists specific definitions of UIH1 & UIH2 where they're used in real block explorers and transaction statistics relevant to the practical privacy benefits of payjoin.

Thanks! This whole area definitely needed some deeper thinking so, nice to see that someone's doing that!

@DanGould
Copy link

DanGould commented Feb 5, 2023

The biggest takeaway I get from the paper is that @AdamISZ's definition of UIH2 would not miss UIH2 identification if it were defined as "A subset sum of inputs is larger than any output" rather than "One input is larger than any output", as put to practice in btcpay. I think this is a summary of @shesek's most recent comment.

@NicolasDorier
Copy link

NicolasDorier commented Feb 6, 2023

Reading the paper, but just dropping here: The coin selection of BTCPay, and actually anything which rely on NBitcoin should add unnecessary outputs without Payjoin.

The coin selection of NBitcoin is the same as old coin selection of Bitcoin core, where if you pay amount X, it tries to cover X + 0.1 BTC instead for consolidation purposes. I am unsure about other implementations.

Regardless, it is interesting to improve the coin selection algo of payjoin.

@NicolasDorier
Copy link

It seems quite a low hanging fruit, so I'll do it.

btcpayserver/btcpayserver#4599

@NicolasDorier
Copy link

@NicolasDorier
Copy link

NicolasDorier commented Feb 6, 2023

Actually it seems a bit different.

EDIT: I will now comment in the discussion on btcpayserver, I believe the heuristic avoidance we had in BTCPay Server wasn't working properly.

@NicolasDorier
Copy link

Ok will roll it out for next btcpay release: btcpayserver/btcpayserver#4600

@DanGould
Copy link

DanGould commented Feb 16, 2023

I think I found a way to split outputs to eliminate even the subset-sum UIH1 by creating multiple subsets of the same value. Here's an example:

Payjoin A, a real Payjoin

before output splitting:

0.02292119 btc ---> 0.01172121 btc
0.01395977 btc      0.025 btc

Payjoin A', avoiding UIH

The larger of the two outputs is split to produce another of the difference between them. (That's knapsack mixing)

0.02292119 btc ---> 0.01172121 btc
0.01395977 btc      0.01172121 btc
                    0.01327879 btc

The resulting transaction no longer conforms to unnecessary input heuristic. Obviously two outputs of identical amounts is itself a different heuristic. I imagine that could be mitigated somewhat during fee subtraction. Since bip78 payjoin is done in one interaction only the receiver may choose to split in the case that payment amount + receiver contribution > sender change amount. The sender seems to be unable to know whether they could split or not after receiving the signed payjoin proposal which they can't alter.

@shesek
Copy link

shesek commented Feb 18, 2023

@DanGould A transaction having >2 outputs is itself an indication that it was not a standard payment, possibly a stronger one compared to UIH

@DanGould
Copy link

DanGould commented Mar 1, 2023

@shesek Yes, >2 outputs removes some of the covert nature of payjoin. The question that follows is which is a privacy property should be prioritized: Concealing the fact that a sophisticated piece of software constructed the tx? Or the identity of which output received payment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment