FCMPs++ is the next protocol composition for Monero, featuring Full-chain Membership Proofs (FCMPs). These replace the existing ring signatures, which only offer an anonymity set of 16, with an anonymity set of every output on-chain. This resolves concerns such as the EAE, EABE, and Overseer attacks, as demonstrated within a Chainalysis video (starting from a known wallet state). It also resolves the risk of black marbles by ensuring every input's privacy set is the privacy set of all honest outputs on-chain (where an honest output is one whose state is not revealed to an adversary). In the worst (infeasible) case, where every already-spent output has been connected to its place of spend (via a chain reaction), the privacy set with FCMPs will still be all honest unspent outputs.
Transaction uniformity is an idea which notably traces back to this post by tevador. In it, tevador cites a post proposing to mandate all transactions have 16 outputs so an adversary cannot determine how many outputs a transaction actually has. tevador extends this with a proposal to mandate all transactions have a constant number of inputs.
Why?
Right now, 16-out transactions represent services doing bulk payment processing. Identifying the transactions which belong to individual users vs services splits Monero's privacy set. Ideally, all transactions look perfectly identical, zero-knowledge in all regards, with no additional info granted to a third-party observer.
Transactions with a high number of inputs presumably represent stores accepting Monero or even, with just three-to-five inputs, notable users of Monero. In comparison, transactions with just one input represent a user new to Monero (at least, a new wallet) or someone who recently made a transaction (aggregating all of their outputs, leaving them with just one).
This is further exacerbated by how some wallets may not support batch payments, allowing all transactions with more than two outputs to be identified as not having been made by those wallets. This fingerprint can be combined with others to further reduce the privacy set of users.
So why didn't Monero fix an input/output count a while ago?
The performance cost. Fixing to 16 outputs, the maximum number of outputs, is 87.5% wasteful for ~90% of transactions. Outputs not only exist on the blockchain but must also be indexed into the database and randomly read when using ring signatures. With FCMPs, outputs must be accumulated into the Merkle tree (an additional computational cost).
The same commentary exists for inputs, though it is much more difficult to support dummy inputs. The cited post by tevador notes a solution for RingCT yet notes its lack of applicability to Seraphis. Even if dummy outputs can be specified, we need to ensure the dummy outputs cannot be used to burn real outputs. tevador did propose a solution for dummy outputs under FCMPs. It proposes growing the blockchain's tree for each input, creating a tree solely for that input, with a dummy output. It's a great idea which should generally apply to all future iterations of the membership-proof design. That doesn't change that it's of non-trivial technical complexity when FCMPs++, the next hard fork being discussed, is already of notable technical complexity.
MAX_OUTPUTS is currently set to 16. This was a decision made years ago. It did not make the Monero protocol unusable. The reasoning was due to the computational burden of Bulletproofs and how it scales.
Bulletproofs uses 64 "IPA rows" per output. The notable aspect of the "IPA" is it scales with powers of 2. You can use 1 IPA row, 2 IPA rows, 4, 8, 16, 32, 64, 128, 256, 512... So what happens when you have three outputs? You use 192 IPA rows, yet the IPA scales to 128 and 256 rows (not 192). Internally, the IPA will be for 256 IPA rows, incurring the cost of 256 IPA rows, yet only 192 of them matter to us. This leaves 64 unused yet paid for. If we added a fourth output, it would fill the already-paid-for rows and not increase the cost of the Bulletproof (though we would still pay a cost for the output in general).
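This rounding behavior is easy to model (a minimal sketch, using the 64-rows-per-output figure above):

```python
# Model of Bulletproofs' IPA row padding: the rows actually used round
# up to the next power of two, per the 64-rows-per-output figure above.

IPA_ROWS_PER_OUTPUT = 64

def padded_ipa_rows(outputs: int) -> int:
    """IPA rows actually paid for, given a transaction's output count."""
    used = outputs * IPA_ROWS_PER_OUTPUT
    padded = 1
    while padded < used:
        padded *= 2
    return padded

# Three outputs use 192 rows yet pay for 256; a fourth output fills the
# already-paid-for rows without growing the IPA.
assert padded_ipa_rows(3) == 256
assert padded_ipa_rows(4) == 256
```

The same model reproduces the 17-output case: `padded_ipa_rows(17)` is 2048 rows, identical to 32 outputs.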
If we have 17 outputs, we incur a computational cost as if we had 32 outputs. This is not only incredibly wasteful but also opens a Denial-of-Service vector against Monero. If a single output takes a few ms, then specifying a transaction with 333 outputs would take an entire second of computational cost to verify. If the proof proves invalid, the node cannot recoup any fees (so solely requiring the fee be 'enough' does not work). This is before the rules of Bulletproofs would pad the outputs' computational cost to be equivalent to 512 outputs (a second and a half to verify). When Bulletproofs was introduced, the decision was made to bound the time it costs to verify a transaction to roughly 50ms (estimated; I have not explicitly re-benchmarked all of this on historical and modern hardware to provide a full historical analysis).
Please note Monero replaced Bulletproofs with Bulletproofs+ to prove the validity of outputs. The exact same principles apply.
Monero already has a MAX_INPUTS in practice. wallet2 will only create transactions which are 149,400 bytes or less. If all 16 outputs are present, this ends up as roughly 220 inputs with the current ring size. This means if you have more than 220 inputs and wish to spend your entire wallet balance in a single transaction, you will need to make a transaction consolidating your outputs prior to making your actual transaction. The current discussion is on setting an explicit limit at a much lower value.
FCMPs, one of the two proofs used for inputs, is premised on Bulletproofs. With that, it has the same scaling rules as previously described. It should be noted each input does add some additional proof data, so a three-input FCMP is still a few hundred bytes smaller than a four-input FCMP (despite paying for the same amount of IPA rows).
The naive solution is to use one FCMP per input. This ensures we never pay for IPA rows we don't actually use. The issue is an FCMP for one input is 5600 bytes, so we could support just 24 inputs under this policy. 24 inputs, at 5600 bytes each, plus the bytes for the signature and just two outputs, prevent adding a 25th input while still fitting within wallet2's 149,400-byte limit.
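The 24-input ceiling follows from simple byte accounting (a sketch; the ~10,000-byte allowance for the signature, two outputs, and overhead is my own placeholder for illustration):

```python
# Byte accounting for the one-FCMP-per-input policy, using the figures
# above: 5600 bytes per single-input FCMP, 149,400-byte wallet2 limit.

TX_SIZE_LIMIT = 149_400
FCMP_ONE_INPUT = 5_600

def max_individual_proofs(non_proof_bytes: int) -> int:
    """Single-input FCMPs that fit beside the rest of the transaction."""
    return (TX_SIZE_LIMIT - non_proof_bytes) // FCMP_ONE_INPUT

# Assuming ~10,000 bytes for the signature, outputs, and overhead (a
# hypothetical figure), only 24 proofs fit; a 25th would overflow.
assert max_individual_proofs(10_000) == 24
```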
The good news is we can prove for multiple inputs within an FCMP. While one input takes 5600 bytes, two inputs is just 6336 bytes (+736 bytes for the second input).
Each FCMP uses 384 IPA rows. This satisfies the power-of-two rule because, internally, the FCMP uses two Bulletproofs (one with 256 rows, one with 128 rows). This means an entire FCMP, a proof which evaluates membership within a Merkle tree with capacity for 210 billion outputs while verifying the claimed input is valid, is just six times as many IPA rows as a Bulletproof which verifies an output is a valid 64-bit number and not effectively a negative amount. The proximity of these, despite the ocean of difference in complexity, is something I'm incredibly proud of, building on top of Curve Trees and Eagen's posited idea of using divisors of an elliptic curve to prove scalar multiplications.
If we used a single FCMP, we could still support 134 inputs within the transaction size limit. Since an FCMP for one input takes 15ms to verify, we can extrapolate such an FCMP to take 2s to verify. This means any RPC request to publish a transaction could cause the node to devote 2s of CPU time to verifying it. This would immediately be a wide window for Denial-of-Service attacks, potentially causing every RPC to be whitelisted (explicitly or via some form of at-time-of-use payment, such as the existing credit scheme). At the very least, we're presumably discussing a multi-second Proof of Work (taking several to tens of seconds on lower-powered devices such as mobile phones) to prove the client isn't spending fewer resources than the RPC does on the request.
An alternative proposal was one I made in the FCMPs++ document itself. We can step proofs along powers of two, ensuring we never pay for unused IPA rows while minimizing bandwidth where possible. If we have 11 inputs, we would create one proof for 8 inputs, one proof for 2 inputs, and one proof for 1 input (22580 bytes, compared to 61600 bytes if individual proofs were used and 11904 bytes if a single proof is used). While this isn't optimal for bandwidth, it strikes a great balance with computational performance. Not only does it only pay for the actually used IPA rows (~69% of the padded 16-input Bulletproof), it enables batch verification.
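Stepping along powers of two is just the binary decomposition of the input count: each set bit becomes one proof (a minimal sketch):

```python
def proof_split(inputs: int) -> list[int]:
    """Split an input count into power-of-two proof sizes, largest first."""
    sizes = []
    bit = 1
    while bit <= inputs:
        if inputs & bit:     # this power of two is needed
            sizes.append(bit)
        bit <<= 1
    return sorted(sizes, reverse=True)

# 11 inputs -> one 8-input, one 2-input, and one 1-input proof.
assert proof_split(11) == [8, 2, 1]
```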
Batch verification is where multiple proofs are verified at once. Doing so reduces the cost of verifying each individual proof. When accepting transactions into the mempool, we can only define the batch as the proofs within the transaction. When syncing blocks, we can define the batch as all proofs within every transaction within the block. This can reduce proof verification time by almost 33%!
With our 11-input example, across three proofs, work can be shared across the first 384 IPA rows for all three proofs, and the next 384 IPA rows for the 2-input and 8-input proofs. This means not only are no IPA rows wasted, some computations can be merged for even greater performance benefits.
This idea quickly faded away because of how expensive FCMP proofs are regarding bandwidth in the first place. While the 11-input three-proof setup may be twice as fast as an 11-input one-proof, it's twice as large and bandwidth is generally prioritized more than computational cost.
So we do want to use a single proof, for bandwidth reasons, but we can't have checking transactions take multiple seconds. The question is what a reasonable limit is.
If we maintain parity with the IPA rows allowed by MAX_OUTPUTS, we should allow two inputs and a fractional input (16 outputs permit 1024 IPA rows, while each input requires 384). This is technically acceptable. So long as we allow consolidating inputs at all, an arbitrary amount of inputs can be consolidated with a logarithmic bound on time. If we had 256 inputs, they could be aggregated in just 8 steps (as log2(256) = 8). This would still be very negative from a user-experience perspective but creates a high-latency system, not an unusable one.
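The logarithmic bound can be sketched as follows, assuming each transaction merges up to MAX_INPUTS outputs into one output:

```python
import math

def consolidation_rounds(outputs: int, max_inputs: int) -> int:
    """How many transactions deep a consolidation chain must be."""
    rounds = 0
    while outputs > 1:
        outputs = math.ceil(outputs / max_inputs)
        rounds += 1
    return rounds

# 256 outputs with 2-input transactions: log2(256) = 8 rounds.
assert consolidation_rounds(256, 2) == 8
# With 16-input transactions: 2 rounds (the second can be the spend itself).
assert consolidation_rounds(256, 16) == 2
```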
If we attempt to not so drastically harm the user experience, I pointed to a distinct perspective. We could set MAX_INPUTS to MAX_OUTPUTS. What we have already declared as a sufficient rate for creating outputs would be declared the sufficient rate for consolidating inputs. We'd also have the benefit that an adversary cannot spam outputs to a wallet with fewer transactions than necessary to consolidate them. This holds true so long as MAX_INPUTS >= MAX_OUTPUTS, though such a clause can be argued unnecessary as even just two inputs can be sufficient to achieve logarithmic time bounds.
16 inputs, which would be the limit if we set MAX_INPUTS to the current MAX_OUTPUTS value, has the benefit of only adding a single spend interval of delay to consolidation. Right now, ~220 inputs can be aggregated in a single transaction. With 16 inputs, one can create 16 transactions aggregating 16 inputs into 16 outputs. After the 10-block lock passes (one 'spend interval'), the user has 16 outputs usable in a single transaction. The 256 inputs aggregated exceed the current ~220 inputs allowed.
I personally believe 16 inputs is still too computationally expensive. I'd actually prefer a MAX_INPUTS as low as 4, roughly 60ms to verify. It's still a decent fan-in factor, covers the vast majority of transactions, and makes sense as an eventual target for transaction uniformity.
I also acknowledge how impactful to user experience that would be. That's why my current advocacy is 8 inputs. Since I prefer adversaries not be able to create outputs faster than they can be consolidated, and since I believe 4/4 is a solid target for transaction uniformity (a long-term goal for privacy), I support reducing MAX_OUTPUTS to 8 as well.
Any MIN_INPUTS value which isn't 1 requires support for dummy inputs. I don't support including that complexity in this next hardfork.
We currently have a MIN_OUTPUTS of 2. Carrot, the wallet protocol planned to ship with FCMPs++, requires a change output. Establishing a logarithmic time-bound for executing payments requires being able to make more than one payment within a transaction. This means, under Carrot, we cannot reduce MAX_OUTPUTS past 3. We can, with the goal of transaction uniformity, set MIN_OUTPUTS to 3. I don't support padding input/output counts until we are ready to more fully commit to transaction uniformity, which I do support as a long-term goal.
We could partially aim for transaction uniformity by only allowing n inputs, and a power of two for outputs. I don't believe this actually achieves a real-world benefit to privacy, as an adversary can still distinguish low-output transactions from high-output transactions.
Once we have support for dummy inputs, the only bar to moving forward with transaction uniformity is the performance cost.
Bulletproofs, when used as a range proof, only grow by 64 bytes for each power of two. If we limit MAX_OUTPUTS to 4, I'd argue Bulletproofs is performant enough that we can move forward with transaction uniformity.
FCMPs++ is too early to comment on, in my opinion. We need to see how these affect testnets, stressnets, and mainnet to make commentary on the real-world performance impact.
There are further improvements which can be made to Full-chain Membership Proofs, however. Curve Forests would effectively remove the bandwidth growth of FCMPs with additional inputs. We'd only face the same 64-byte growth as we cross powers of two. It would, however, increase the computational cost of adding new outputs to the tree, linearly with the value of MAX_INPUTS. This is computationally cheaper overall, as the proofs themselves are also a bit cheaper, so long as a sufficient number of proofs actually use a high number of inputs. Under transaction uniformity, every proof would always use MAX_INPUTS, ensuring the trade-off for smaller and potentially cheaper proofs always results in cheaper proofs.
We can also halve the amount of commitments to variables we use within FCMPs if the original results from Generalized Bulletproofs are recovered. The original variant promised the ability to commit to twice as much data within each commitment. This was incorrect, as the resulting proof failed correctness (an academic term). The solution was either to not commit to the second half of the promised data, or to double the length of a polynomial (doubling the number of commitments to that polynomial within the proof). Needing double the commitments in total, yet halving the commitments within the proof, always has a slight edge on proof size and is the variant currently implemented. If someone could resolve the correctness issues, allowing committing to twice as much data without doubling the commitments within the proof itself, it'd halve the commitments we use (also halving the number of commitments within the proof) and notably improve our bandwidth.
We can also reduce the bandwidth used by increasing the time it takes to verify proofs. This is an option available now, referred to as the "compute factor" within a script I wrote to estimate proof size and computational cost. While this would double, or even quadruple, the proof time, optimizations to our underlying elliptic-curve arithmetic may sufficiently counterbalance this. The current implementations of Helios and Selene, the elliptic curves used by FCMPs, were written by me with generic arithmetic and were never advertised to be production-grade. They proved usable in production, but someone who implements tailored arithmetic may be able to achieve significant performance improvements (allowing decreasing the bandwidth without increasing the time to verify). I have already outlined a competition for developers to work on this task.
Finally, we can look at a distinct proof system entirely. zkFFT may be a better candidate than Bulletproofs for FCMPs and usable in a future protocol upgrade. We can also aim to implement an IVC/RCG proof such that one proof, with the size and computational cost of a proof for one input, proves for an arbitrary number of inputs. These likely have too large constants to make sense at this level yet are worth keeping in the back of one's mind.
Most of these can be targeted for the next year or two, assuming the proper resources are devoted. It's only distinct proof systems entirely which would likely take two to three years (if not more) to adopt.
As prior stated, so long as we allow two inputs and three outputs, we can achieve logarithmic time-bounds for consolidation and fulfillment of payments. Obviously, the average transaction should not have to incur any twenty-minute delay. Per Rucknium's tally, 97.67% of transactions used 8 or fewer inputs, and 97.71% of transactions used 8 or fewer outputs.
With FCMPs++, which is when these changes would take effect, transaction chaining becomes possible. If a user has 256 outputs they wish to consolidate, the user may sign the consolidation transactions in a single moment. Then, the wallet, running in the background with neither the private spend key nor even the private view key, may publish the consolidation transactions. The wallet would know which outputs were spent in the transactions it published but no more.
The same scheme is possible for fulfilling an arbitrary number of payments.
I will also note how existing services already have to deal with MAX_OUTPUTS. It is low enough I expect most services handling volume to have hit it in practice. Some services, even with decent volume, may not have hit MAX_INPUTS (currently implicit). Such services are at risk of erroneous or unexpected behavior if they have more inputs than are spendable within a single transaction and attempt to spend their entire balance (or enough of their balance that they would need that many inputs). While the proposed changes may force existing services to face this limit, it was always something which could be faced, can be faced today, can even be forced upon a service today, and should already be handled by services.
The impact to users would be minimized if a spend interval wasn't twenty minutes long and outputs could instead be spent immediately. This isn't possible: Monero will always have at least a 1-block lock, as outputs must be indexed/accumulated on-chain before they can be spent.
If we have a 1-block lock, and we have a 1-block reorganization, all transactions in the latest block will be invalidated and need to be republished if they refer to state from that block. With ring signatures, this would be via selecting a decoy present in the most recent block. With FCMPs, this would be via selecting the most recent tree to prove membership within. With ring signatures, the transaction would have to be signed again with the private spend key. With FCMPs, the transaction would solely need its membership proof re-proved, requiring neither the private spend key nor the private view key (solely the rerandomization, allowing linking the output to the input). That still requires all affected users be online, actively aware, and willing to republish their membership proofs. Monero itself would also pay a non-trivial computational cost for the reorganization (having to revert the tree state).
The point of the 10-block lock is for Monero nodes and for users to not be affected by small reorganizations which are accepted as part of Monero relying on Proof of Work. The Monero Research Lab did recently revisit this topic, doing a new analysis given Monero's current status, and came to the conclusion that 10 blocks still makes sense for now.
I believe the way this would change is if Monero adopted a Proof-of-Stake finality layer, yet I do not see the Monero community, who define the Monero project, as open to such a system.
Wallets are able to mitigate the 10-block lock by splitting their value across multiple outputs. This allows spending some outputs without locking all outputs (and the wallet's entire value with them). This has non-trivial privacy implications under the current ring signatures. The output count of the transaction is a fingerprint (due to the lack of transaction uniformity), allowing identifying potential split transactions. Once potential split transactions are identified, adversaries can look for transactions which have multiple inputs whose rings include distinct outputs from the potential split transaction as ring members. This gives a very high statistical likelihood that the transaction does spend the split transaction's outputs and that the potential split transaction actually was a split transaction. Even if a split transaction only has two outputs, the spend pattern is still an issue given Monero's current ring signatures. With FCMPs, these privacy risks are mitigated.
P2Pool represents a real-world use case of creating outputs en masse, forcing users to need to consolidate outputs en masse. I'd argue P2Pool is an inefficient design, even if I believe a peer-to-peer pool is important and should be prioritized, and that the Monero protocol should not pay for its inefficiencies. I'd love to see designs with less impact on Monero, via alternate payment structures (even if they have trade-offs/compromises), be developed.
Defining MAX_INPUTS would not prevent P2Pool from operating yet would increase the time users spend consolidating their outputs from it. This would further incentivize a P2Pool design which creates fewer outputs on-chain, for the user experience benefits.
I'll also note there is a dedicated issue for a coinbase consolidation transaction type. This resolves the P2Pool use case, while harming privacy, with no impact on the discussions and decisions here.
This section does not really flow with the rest of my commentary, yet how transactions are weighted was a topic brought up with the rest of items here. Since it's also being discussed, I want to make my position clear.
Fees in Monero are very annoying to get right. My proposal is to replace the fee field with a fee rate field. This forces wallets to at least calculate the transaction weight properly, as otherwise their transaction will fail to balance (inputs - outputs - fee = 0). The actual fee paid would, of course, be fee_rate * weight.
Weights in Monero are very annoying to get right. The weight of a Monero transaction, for the purposes of fees, is the byte-length of the transaction plus a 'clawback' for the range proofs on the outputs. The clawback covers the computational cost of a non-power-of-two Bulletproof whose cost isn't represented by the byte-length. The byte-length of the transaction is itself variable to various properties, such as the decoys selected and even the fee paid (so paying a high enough fee may increase the transaction's weight, forcing paying an even higher fee).
My proposal for a weight, for the purposes of fees, is base_fee + (inputs * input_fee) + (outputs * output_fee) + tx_extra_len, plus some clawback for the unrepresented computational cost of non-power-of-two inputs/outputs. I don't have an opinion on the exact clawback/constants used here. I solely want to advocate for the simplicity of a formula solely variable to the amount of inputs, the amount of outputs, and the length of the transaction's extra field.
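As a sketch of the proposed formula (every constant here is a hypothetical placeholder; as stated, I leave the exact clawback/constants open):

```python
# Sketch of the proposed weight/fee formulas. BASE_FEE, INPUT_FEE, and
# OUTPUT_FEE are hypothetical placeholders, not proposed values.

BASE_FEE = 100
INPUT_FEE = 40
OUTPUT_FEE = 30

def tx_weight(inputs: int, outputs: int, tx_extra_len: int) -> int:
    # A clawback for non-power-of-two input/output counts would be added
    # here; its exact form is intentionally left open.
    return BASE_FEE + inputs * INPUT_FEE + outputs * OUTPUT_FEE + tx_extra_len

def fee_paid(fee_rate: int, inputs: int, outputs: int, tx_extra_len: int) -> int:
    # Under the fee-rate proposal, the fee paid is fee_rate * weight.
    return fee_rate * tx_weight(inputs, outputs, tx_extra_len)

assert tx_weight(2, 3, 0) == 270
```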
This extends existing discussions from Seraphis on fee discretization.
These ideas are neither new nor mine alone. Transaction uniformity was not my idea and has been an active discussion for years. Fee discretization was not my idea and has been an active discussion for years. I started discussing limiting inputs in Monero Research Lab meetings a month ago, and there have been multiple discussions on this in the meetings since.
I encourage anyone who wishes to participate in this discussion to be aware of the considerations and to attend the Monero Research Lab meetings. They are Wednesdays at 17:00 UTC.
I don't think this has been discussed yet, but couldn't the node cache "diffs" of old trees for the past 10 (or any N) blocks for basically free rollbacks? For past FCMP trees, the node knows the first K leaves are the same as the current tree's, so each "diff" entry for a block would be some leaf cutoff value K and the rightmost edge of the tree at that time. If we roll back some blocks, we simply drop all outputs after K and re-assign the rightmost edge of the tree, which doesn't require any EC arithmetic.
This could mitigate the node performance holdups in favor of the 10-block-lock. Wallets could also leverage this same technique, but it might not be worth the code complexity unless reorgs are a more frequent occurrence.
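A minimal sketch of the diff scheme described above (names and types are hypothetical; a real node would store curve points and database state rather than plain lists):

```python
from dataclasses import dataclass, field

@dataclass
class TreeDiff:
    leaf_cutoff: int        # K: leaves shared with the current tree
    rightmost_edge: list    # the tree's rightmost path at that block

@dataclass
class TreeState:
    leaves: list = field(default_factory=list)
    rightmost_edge: list = field(default_factory=list)
    diffs: dict = field(default_factory=dict)  # block height -> TreeDiff

    def snapshot(self, height: int) -> None:
        self.diffs[height] = TreeDiff(len(self.leaves), list(self.rightmost_edge))

    def rollback(self, height: int) -> None:
        # No EC arithmetic: truncate the leaves and restore the edge.
        diff = self.diffs[height]
        del self.leaves[diff.leaf_cutoff:]
        self.rightmost_edge = list(diff.rightmost_edge)

state = TreeState()
state.leaves = ["out0", "out1"]
state.snapshot(height=100)
state.leaves += ["out2", "out3"]   # a later block adds outputs
state.rollback(height=100)         # that block is reorged away
assert state.leaves == ["out0", "out1"]
```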