Skip to content

secure join (cpz)#1431

Open
cpz2024 wants to merge 19 commits intosecretflow:zjj/wk25_merge_joinfrom
cpz2024:join
Open

secure join (cpz)#1431
cpz2024 wants to merge 19 commits intosecretflow:zjj/wk25_merge_joinfrom
cpz2024:join

Conversation

@cpz2024
Copy link
Copy Markdown
Contributor

@cpz2024 cpz2024 commented Apr 7, 2026

Pull Request

What problem does this PR solve?

Issue Number: Fixed #

Possible side effects?

  • Performance:

  • Backward compatibility:

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces secure join operations (join_uu_ss, join_uu_vv, and join_un) to the HAL layer, supported by new MPC kernels and extensive unit tests. Key improvements include an O(1) lookup optimization for permutation solving in the MPC utility layer. Feedback identifies a potential shape mismatch bug in the associative_reduce utility, significant code duplication between join variants that warrants refactoring, and several instances of dead or commented-out code that should be addressed for better maintainability.


// Edge case: if the last dimension has <= 1 element or tensor is empty
if (N < 2 || shape.numel() == 0) {
return in;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The early return here can lead to a shape mismatch. The function is expected to return a tensor with the last dimension reduced, but when N < 2, it returns the input tensor in unmodified. For example, if in has a shape of (M, 1), the expected output shape is (M), but this will return a tensor of shape (M, 1). This can be fixed by reshaping the output before returning.

Suggested change
return in;
return hal::reshape(ctx, in, out_shape);


// Secure _join_uu_ss
// Ref: https://dl.acm.org/doi/10.1145/3372297.3423358 (figure 6)
std::vector<spu::Value> join_uu_ss(SPUContext* ctx,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is significant code duplication between join_uu_ss and join_uu_vv. For example, steps 3 through 7 (permuting tables, comparison, handling payload, and final result aggregation) are nearly identical. This duplication makes the code harder to maintain.

Consider refactoring the common logic into a helper function to reduce duplication and improve maintainability. This helper could be parameterized (e.g., with lambdas) to handle the minor differences between the two functions, such as how values are created (hal::seal vs hal::_p2v).

Comment thread src/libspu/kernel/hal/join.cc Outdated
Comment thread src/libspu/kernel/hal/join.cc Outdated
Comment thread src/libspu/mpc/common/pv2k.cc Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants