Caphash optimization by alexander-zw · Pull Request #45 · scipr-lab/libiop

alexander-zw · 2021-05-03T20:11:19Z

Cap hashing was implemented for merkle trees. The algebraic and non-algebraic cap hashes were both implemented in hash_enum.tcc, as well as the dummy algebraic hash. See issue #41.

A bug was fixed that merkle tree membership proof generation and verification didn't work when the input queries weren't sorted and unique. They now work when the tree is not zero knowledge, but when the tree is set to be zero knowledge, the code still fails.

The hash count method was updated to count each two-to-one hash as 2 units and the cap hash as 1 unit per input.

More tests were added, to test cap hashing, and to test large trees with randomly generated query leaves. The code no not test unsorted non-unique queries since that still fails for zero knowledge trees.

No snark protocols currently take advantage of the cap hash; they still have cap size set to 2.

~~Changes are complete, but not ready to merge because I started working based off of PR #43, so that should preferably be merged first.~~ Ready for review

ValarDragon · 2021-05-03T20:12:42Z

    /* TODO: Push setting leaf hash into internal BCS code. Currently 2 is fine, as leaf size is internally unused. */
    const size_t leaf_size = 2;
-    params.leafhasher_ = get_leafhash<FieldT, MT_root_hash>(hash_type, security_parameter, leaf_size);
+    params.leafhasher_ = get_leafhash<MT_root_hash, FieldT>(hash_type, security_parameter, leaf_size);


Was this just a bug before?

No, I switched the order to be consistent with similar functions

What was here before was right I believe https://github.com/scipr-lab/libiop/blob/master/libiop/bcs/hashing/hash_enum.hpp#L33-L34

I still believe the template ordering before was right

Oh I see, you switched the template ordering of the internal command. Currently there are inconsistent orderings throughout, I believe the old <FieldT, MT_root_hash> was consistently used everywhere. We should consistently be using one or the other, I don't really have a preference for which one.

I switched all instances of the leaf hash functions to be the same as the hashchain. If you insist, I can change it back, but it is currently consistent as well.

…rship_proof

…ation

weikengchen · 2022-09-24T08:39:55Z

-    params.hashchain_ =
-        get_hashchain<FieldT, MT_root_hash>(hash_type, security_parameter);
+    params.cap_hasher = get_cap_hash<MT_root_hash, FieldT>(hash_type, security_parameter);
+    params.hashchain_ = get_hashchain<FieldT, MT_root_hash>(hash_type, security_parameter);


As discussed above, we are switching <FieldT, MT_root_hash> to <MT_root_hash, FieldT> in leafhasher (now cap_hasher here).

Do we want to apply the same refactoring to hashchain? We probably should do that in a separate PR for ease of testing.

Yeah I'd prefer to keep it the same for this PR, unless it causes merge conflicts

If it's acceptable, I'd like to keep them both the current order for now. We can refactor both at the same time in a future PR. This PR is already quite big, and it's been a long time since I wrote this so I don't remember all the context

weikengchen · 2022-09-24T08:58:23Z

    hash_digest_type get_root() const;

+    /* These two functions do not currently work if the given positions aren't sorted or
+       have duplicates, AND the tree is set to be zero knowledge. */


Is this one a new limitation that is caused by the present PR?

I really don't remember. It might be new due to the cap hashes. If so, then the issue will not be present if you don't use the new cap hash optimization.

I highly recommend explicitly sorting the positions within this function and lift such limitation, as it is unnatural.

weikengchen · 2022-09-24T09:02:19Z

 {
-    // TODO: Better document this function, its hashing layer by layer.
-    std::size_t n = (this->num_leaves_ - 1) / 2;
+    /* n is the first index of the layer we're about to compute. It starts at the bottom layer.


Bottom layer or the layer above the bottom layer?

You're right it's the bottom inner node layer

weikengchen · 2022-09-24T09:09:39Z

        {
+            /* FIXME: This code is currently incorrect if the given list of positions is not
+               sorted or has duplicates. This could be fixed if both positions and leaf_contents
+               are sorted before the leaf hashes are calculated, which would require refactoring. */


Is this a problem before this PR, or is it a new one? (My impression is that the way we add ZK is reasonably adding this limitation. Is the original okay when the positions are not sorted?)

^ same as above. If possible, explicitly sort the positions and lift this limitation.

weikengchen · 2022-09-24T09:11:19Z

    }

+    // Add the cap, including the root's direct children and the root.
+    // The only elements should be the cap (not including the root).


Should the cap include the root, or not?

Good question. I think the cap doesn't include the root. It's been such a long time since I wrote this code so I can only guess.

weikengchen · 2022-09-24T09:17:13Z

+            throw std::invalid_argument("Poseidon only supported for 128 bit soundness.");
+        }
+        poseidon_params<FieldT> params = get_poseidon_parameters<FieldT>(hash_enum);
+        /* security parameter is -1 b/c */


Clarify "-1" and "b/c"?

I have no idea what it means, it was copied from elsewhere

Then this line should be removed.

weikengchen · 2022-09-24T09:18:22Z

+        /* We explicitly place this on heap with no destructor,
+           as this reference has to live after the function terminates */
+        std::shared_ptr<algebraic_vector_hash<FieldT>> hash_class =
+            std::make_shared<algebraic_vector_hash<FieldT>>(permutation, security_parameter - 1);


Why security_parameter - 1 instead of security_parameter?

No idea, it was also copied from elsewhere

Can you refer to the source?

weikengchen · 2022-09-24T19:00:28Z

+        cap_size);
 }

+/** Constructs a merkle tree with leaf size 2. Generates and verifies membership proofs for


What is the convention for ** or * in libiop?

I don't know if there's an existing convention, the one I use is /** if it's documentation that the user might want to see on hover, and /* if it's a multiline comment that's only useful to see in the location where it's added (// if it's a single line comment)

Or rather, that's the philosophy I have now. Looks like I didn't stick to it at the time

Considered this one resolved.

weikengchen · 2022-09-24T19:04:39Z

I did a pass of the code, and most are smaller points. Let me know when you want me to reach out to the repo's owner to coordinate merging this PR.

ValarDragon reviewed May 3, 2021

View reviewed changes

alexander-zw force-pushed the caphash branch 4 times, most recently from f5fb79d to c04d91c Compare May 9, 2021 02:51

alexander-zw changed the title ~~Caphash~~ Caphash optimization May 9, 2021

alexander-zw added 7 commits June 27, 2021 17:09

Add cap hash function using blake 2b

a3038f9

Add cap hashing mechanism to merkle tree

e03582d

Add more tests for merkle trees and add sorting to validate_set_membe…

87828ca

…rship_proof

Update method that counts the number of hashes in merkle tree verific…

62cff94

…ation

Add algebraic cap hash

62245b9

Add cap size to BCS parameters

12259be

Make hash enum function errors more descriptive

5b276ee

alexander-zw force-pushed the caphash branch from 6f819da to 5b276ee Compare June 28, 2021 00:18

alexander-zw requested a review from ValarDragon June 28, 2021 00:19