pia park (dot) me

Merkle Sum Tree Attack PoC

Overview

The following article is focused on explaining PoC that did to reproduce the vulnerability that was described in this paper.

The attack is targeting the Merkle Sum Tree data structure, that is used in Summa Protocol to provide Proof of Solvency. As Summa modified their implementation slightly to deal with vulnerability, we’ll go through how’s the modification and how we tried PoC to attack to verify it’s indeed safe.

You could check full PoC code in this PR.

Merkle Sum Tree

As the original Merkle Sum Tree is broken and it has a vulnerability where the custodian can build MerkleTree where the balance of middle nod v can be max(vr, vl) <= v <= vr + vl, when vr is a balance of right node, vl is a balance of left node. Detail attack factor is described in this paper. Summa’s Merkle tree constructs a middle node that overcomes this vulnerability.

Summa’s Merkle Sum Tree

Summa’s implementation of the Merkle Sum Tree is designed for higher efficiency and addresses the vulnerabilities found in the Original Merkle Sum Tree, as described in a particular paper. It includes zkProofs for verifying tree integrity without revealing user details and an efficient approach for hashing in the middle nodes.

Middle Node Hash Comparison

Attribute Broken Merkle Tree Summa’s Approach
Formula H(LeftChild.hash, LeftChild.balance[0], LeftChild.balance[1], ..., LeftChild.balance[N_ASSETS], RightChild.hash, RightChild.balance[0], RightChild.balance[1], ..., RightChild.balance[N_ASSETS]) H(LeftChild.balance[0] + RightChild.balance[0], LeftChild.balance[1] + RightChild.balance[1], ..., LeftChild.balance[N_ASSETS - 1] + RightChild.balance[N_ASSETS - 1], LeftChild.hash, RightChild.hash)
Size 2 * (1 + N_ASSETS) N_ASSETS + 2

Broken Merkel Sum Tree is a solution to the Original Merkle Sum Tree’s vulnerability from this paper. As means, it has zkProof for verifying the tree’s integrity without revealing detailed info about the user(solution 2) and implemented middle node’s hash in hash(vl || vr || || h(l) || h(r)) this format (solution 1). However, in terms of efficiency, Summa takes a unique approach to hash.

Compared to the original middle node hash described on paper as a Broken Merkle Tree, for shorter hashing and cost for building the tree, building the zk proof of inclusion, Summa’s middle node hash formula is designed differently in this PR.

Summa’s MST Implementation Table

Leaf Node

Attribute Formula
Hash H(username, balance[0], balance[1], ..., balance[N_ASSETS - 1])
Balance balance[0], balance[1], ..., balance[N_ASSETS - 1]

Middle Node

Attribute Formula
Hash H(LeftChild.balance[0] + RightChild.balance[0], LeftChild.balance[1] + RightChild.balance[1], ..., LeftChild.balance[N_ASSETS - 1] + RightChild.balance[N_ASSETS - 1], LeftChild.hash, RightChild.hash)
Balance LeftChild.balance[0] + RightChild.balance[0], LeftChild.balance[1] + RightChild.balance[1], ..., LeftChild.balance[N_ASSETS - 1] + RightChild.balance[N_ASSETS - 1]

PoC

To revive the broken Merkle Sum Tree attack, will follow the PoC steps below:

  1. Corrupt Merkle Sum Tree Construction: will alter the construction of the middle node value into max(vr, vl).
  2. Corrupt proof generation: to fully attack this Merkle Sum Tree, need to corrupt proof generation so that the output of the proof should be valid in the original Merkle Sum Tree inclusion proof verification step.

Turned out that when trying to build a Tree in max(vr, vl) this value, and turn out cannot corrupt both balance and hash at the same time when following Summa’s Merkle Sum Tree implementation.

Corrupt Merkle Sum Tree Construction

commit: https://github.com/rkdud007/summa-solvency/pull/1/commits/774d68a6d52734831821eba5bd7062505e9a8aa8

This commit simply changed tree construction logic instead of using vr + vl, use max(vr, vl). So I called it as Corrupted Mekle Sum Tree.

As we expected, the generated proof doesn’t pass the verification.

cargo run –release –example gen_inclusion_proof

Start:   Creating params
End:     Creating params ...........................................................16.279ms


Start:   Creating proof
thread 'main' panicked at 'assertion failed: result.is_ok()', /Users/piapark/Documents/GitHub/summa-solvency/zk_prover/src/circuits/utils.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Corrupt Balance on generating proof

commit: https://github.com/rkdud007/summa-solvency/pull/1/commits/7130a8645675736c86b1cdd54bdd810aea8f98cb

We also want to corrupt proof generation logic, so that non-corrupted Merkle inclusion proof verification should pass with corrupted proof and commit.

As corruption happened in switching the parent node balance into max(vr, vl), the expected corrupted proof should be max(vr, vl) - current balance for all N_CURRUNCIES.

Screenshot 2024-03-05 at 6 58 28 PM

Corrupt Hash on generating proof

As verification checks both balance and hash from proof ( list of path nodes ), we also need to corrupt the hash of this proof.

As we need to match with corrupted construction’s result root node, ( both balance and hash ) hash generation should follow the corrupted tree building.

During construction with maximum value for middle nodes, the middle node’s hash is already corrupted with maximum value. But especially in level 1, the child node’s hash are same as the original as it is a leaf node.

Screenshot 2024-03-05 at 7 38 03 PM

During proof generation, for example in Claire case, start from the sibling leaf node, and all the path node’s hash is corrupted not only due to balance but also from the child node hash itself ( as it requires corruption on the leaf node also )

Screenshot 2024-03-05 at 7 39 51 PM

Conclusion

Our goal for exploiting this vulnerability is to match root 0xcorruptedhash with root 0xcorruptedwithmaximumhash. As the verification is happening while getting path nodes from proof ( provided leaf and middle node preimages ) and operating summing the balance & hash it within the balance and child hashes, It is impossible to corrupt both balance and hash at the same time. Balance corruption will determine the corrupted hash value, which will not match with 0xcorruptedwithmaximumhash that cannot be corrupted with proof generation.

So we could say Summa’s modified Merkle Sum Tree implementation is indeed safe through this attack PoC.

Specially Thanks to 0xPanicError for doing PoC with me together, thanks to Summa core developers especially Enrico for providing a detail explanation of Summa protocol, thanks to yAcademy for have me in amazing zk fellowship.

#Cryptography #Data Structure #Security