Aggregation Builder API #580

@dvadym

Feature Description

The goal is to introduce a QueryBuilder API, similar to the one implemented in PipelineDP4j (code).

Example usage of the QueryBuilder API from PipelineDP4j:

QueryBuilder.from(data)
    .groupBy(
      ColumnNames("movie_id", "movie_title", "rating_year"),
      OptimalGroupSelectionGroupBySpec.Builder()
        .setPrivacyUnit(ColumnNames("account_id"))
        .setContributionBoundingLevel(
          ContributionBoundingLevel.PartitionLevel()
        )
        .setBudget(eps = 1.1, delta = 0.005)
        .setMinPrivacyUnitsPerGroup(5),
    )
    .count(
      "rating_count",
      LaplaceCountSpec.Companion.Builder()
        // can be omitted because RecordLevel is used. Or can be set to account_id.
        .setPrivacyUnit(ColumnNames("record_id"))
        .setBudget(eps = 1.0)
        .build(),
    )
    .mean(
      valueColumnName = "rating",
      outputColumnName = "avg_rating",
        .setBudget(eps = 1.0)
        .setValueBounds(minValue = 1.0, maxValue = 5.0)
        .build(),
    )
    .build()
    .run()

The AggregationBuilder is a declarative way to define a DP query; it is the equivalent of SQL. The idea of the builder was inspired by Beam GroupBy.
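To make the declarative idea concrete, here is a minimal Python sketch of what such a builder could look like. It is not the proposed implementation: every name in it (`QueryBuilder`, `from_`, `group_by`, `count`, `mean`, `Query`, `to_pseudo_sql`) is hypothetical, and the builder only records the query description without running any DP computation. The `DP_COUNT`/`DP_MEAN` output is illustrative pseudo-SQL, not a real dialect.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Aggregation:
    """One requested DP aggregation (hypothetical spec)."""
    kind: str  # "count" or "mean"
    output_column: str
    eps: float
    value_column: Optional[str] = None
    bounds: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class Query:
    """Immutable description of a DP query; nothing is computed here."""
    group_columns: Tuple[str, ...]
    privacy_unit: Tuple[str, ...]
    aggregations: Tuple[Aggregation, ...]

    def to_pseudo_sql(self) -> str:
        # Illustrative pseudo-SQL only, to show the analogy with SQL.
        selects = list(self.group_columns)
        for agg in self.aggregations:
            arg = agg.value_column or "*"
            selects.append(f"DP_{agg.kind.upper()}({arg}) AS {agg.output_column}")
        return (f"SELECT {', '.join(selects)} "
                f"GROUP BY {', '.join(self.group_columns)}")


class QueryBuilder:
    """Hypothetical fluent builder: each call records a spec and returns self."""

    def __init__(self, data):
        self._data = data  # kept for a later run step; unused in this sketch
        self._group_columns: Tuple[str, ...] = ()
        self._privacy_unit: Tuple[str, ...] = ()
        self._aggregations = []

    @classmethod
    def from_(cls, data) -> "QueryBuilder":
        return cls(data)

    def group_by(self, columns: Sequence[str],
                 privacy_unit: Sequence[str]) -> "QueryBuilder":
        self._group_columns = tuple(columns)
        self._privacy_unit = tuple(privacy_unit)
        return self

    def count(self, output_column: str, eps: float) -> "QueryBuilder":
        self._aggregations.append(Aggregation("count", output_column, eps))
        return self

    def mean(self, value_column: str, output_column: str, eps: float,
             min_value: float, max_value: float) -> "QueryBuilder":
        self._aggregations.append(
            Aggregation("mean", output_column, eps, value_column,
                        (min_value, max_value)))
        return self

    def build(self) -> Query:
        if not self._group_columns:
            raise ValueError("group_by must be called before build")
        if not self._aggregations:
            raise ValueError("at least one aggregation is required")
        return Query(self._group_columns, self._privacy_unit,
                     tuple(self._aggregations))


rows = [{"movie_id": 1, "account_id": 7, "rating": 4.0}]
query = (QueryBuilder.from_(rows)
         .group_by(["movie_id"], privacy_unit=["account_id"])
         .count("rating_count", eps=1.0)
         .mean("rating", "avg_rating", eps=1.0, min_value=1.0, max_value=5.0)
         .build())
print(query.to_pseudo_sql())
```

Separating `build()` (validation and an immutable description) from a later run step is one possible design; it lets the same description be inspected or executed on different backends.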

Notes:

  1. data can be a DataFrame (Pandas, Beam, Spark), a Beam PCollection, or a Spark RDD.
  2. The query will use the DPEngine API to perform the DP computations.
  3. There is an experimental implementation of the QueryBuilder API (code).
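Note 1 implies that the builder has to recognize which kind of input it was given. The sketch below shows one way this could be done without importing pandas, Beam, or Spark up front: it inspects the module and class name of the input's type. The function name `detect_data_kind` and the returned labels are hypothetical, and matching on module prefixes is an assumption made for this sketch, not how the experimental implementation works.

```python
from collections.abc import Iterable


def detect_data_kind(data) -> str:
    """Classify the input by its type's module and name (hypothetical helper)."""
    module = type(data).__module__
    name = type(data).__name__

    if module.startswith("pandas") and name == "DataFrame":
        return "pandas_dataframe"
    if module.startswith("pyspark"):
        # Assumption: any non-DataFrame pyspark object passed here is an RDD.
        return "spark_dataframe" if name == "DataFrame" else "spark_rdd"
    if module.startswith("apache_beam"):
        # Assumption: any non-PCollection Beam object is a Beam DataFrame.
        return "beam_pcollection" if name == "PCollection" else "beam_dataframe"
    if isinstance(data, Iterable):
        # Plain Python collections, e.g. for local testing.
        return "local_iterable"
    raise TypeError(f"Unsupported data type: {module}.{name}")


print(detect_data_kind([{"movie_id": 1, "rating": 4.0}]))
```

An explicit `isinstance` check against the real classes would be stricter, at the cost of importing each optional framework.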
