Feature Description
The goal is to introduce a QueryBuilder API, similar to the one implemented for PipelineDP4j (code).
Example usage of the QueryBuilder API from PipelineDP4j:
    QueryBuilder.from(data)
      .groupBy(
        ColumnNames("movie_id", "movie_title", "rating_year"),
        OptimalGroupSelectionGroupBySpec.Builder()
          .setPrivacyUnit(ColumnNames("account_id"))
          .setContributionBoundingLevel(
            ContributionBoundingLevel.PartitionLevel()
          )
          .setBudget(eps = 1.1, delta = 0.005)
          .setMinPrivacyUnitsPerGroup(5),
      )
      .count(
        "rating_count",
        LaplaceCountSpec.Companion.Builder()
          // can be omitted because RecordLevel is used. Or can be set to account_id.
          .setPrivacyUnit(ColumnNames("record_id"))
          .setBudget(eps = 1.0)
          .build(),
      )
      .mean(
        valueColumnName = "rating",
        outputColumnName = "avg_rating",
          .setBudget(eps = 1.0)
          .setValueBounds(minValue = 1.0, maxValue = 5.0)
          .build(),
      )
      .build()
      .run()
The AggregationBuilder is a declarative way to define a DP query; it is the equivalent of SQL. The idea of the builder was inspired by Beam GroupBy.
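To make the "declarative, SQL-like" idea concrete, here is a minimal Python sketch of what such a builder could look like. It is not the PipelineDP4j API nor the experimental implementation: every name in it (`QueryBuilder`, `Query`, `Aggregation`, `to_sql_like`, and all parameter names) is hypothetical, and it only records the query description without performing any DP computation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Aggregation:
    """One requested aggregation (hypothetical spec, for illustration only)."""
    kind: str  # "count" or "mean"
    output_column: str
    eps: float
    value_column: Optional[str] = None
    bounds: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class Query:
    """Immutable description of a DP query; it does not compute anything."""
    group_by: Tuple[str, ...]
    privacy_unit: str
    aggregations: Tuple[Aggregation, ...]

    def to_sql_like(self) -> str:
        # Renders the non-private SQL query that the DP query approximates.
        columns = list(self.group_by)
        for agg in self.aggregations:
            if agg.kind == "count":
                columns.append(f"COUNT(*) AS {agg.output_column}")
            else:
                columns.append(f"AVG({agg.value_column}) AS {agg.output_column}")
        keys = ", ".join(self.group_by)
        return f"SELECT {', '.join(columns)} FROM data GROUP BY {keys}"


class QueryBuilder:
    """Hypothetical fluent builder: each call records intent and returns self."""

    def __init__(self):
        self._group_by = ()
        self._privacy_unit = None
        self._aggregations = []

    def group_by(self, *columns: str, privacy_unit: str) -> "QueryBuilder":
        self._group_by = tuple(columns)
        self._privacy_unit = privacy_unit
        return self

    def count(self, output_column: str, eps: float) -> "QueryBuilder":
        self._aggregations.append(Aggregation("count", output_column, eps))
        return self

    def mean(self, value_column: str, output_column: str, eps: float,
             min_value: float, max_value: float) -> "QueryBuilder":
        self._aggregations.append(
            Aggregation("mean", output_column, eps, value_column,
                        (min_value, max_value)))
        return self

    def build(self) -> Query:
        # Validate once, at build time, so that errors surface before running.
        if not self._group_by or self._privacy_unit is None:
            raise ValueError("group_by() must be called before build()")
        if not self._aggregations:
            raise ValueError("at least one aggregation is required")
        return Query(self._group_by, self._privacy_unit,
                     tuple(self._aggregations))


query = (QueryBuilder()
         .group_by("movie_id", "movie_title", "rating_year",
                   privacy_unit="account_id")
         .count("rating_count", eps=1.0)
         .mean("rating", "avg_rating", eps=1.0, min_value=1.0, max_value=5.0)
         .build())
print(query.to_sql_like())
```

Separating the immutable `Query` from the mutable builder keeps validation in one place (`build()`) and leaves room for a later `run()` step to translate the description into DP computations.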
Notes:
- data can be a DataFrame (Pandas, Beam, Spark), a Beam PCollection, or a Spark RDD.
- Query will use the DPEngine API to perform DP computations.
- There is an experimental implementation: QueryBuilder API (code).
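Since data may come from several frameworks, the builder needs some way to pick a matching backend. One possible approach, sketched below, inspects the module of the input's type so that none of the frameworks has to be imported up front. The function `detect_backend` and the backend names it returns are hypothetical and are not taken from the experimental implementation.

```python
def detect_backend(data) -> str:
    """Returns a (hypothetical) backend name for the given input collection."""
    # The top-level package of the input's type identifies the framework,
    # e.g. "pandas.core.frame" -> "pandas", "apache_beam.pvalue" -> "apache_beam".
    root_module = type(data).__module__.split(".")[0]
    backends = {
        "pandas": "pandas",
        "pyspark": "spark",
        "apache_beam": "beam",
    }
    if root_module in backends:
        return backends[root_module]
    # Assumption: plain Python sequences are processed locally.
    if isinstance(data, (list, tuple)):
        return "local"
    raise TypeError(f"Unsupported input type: {type(data).__name__}")


print(detect_backend([("account_1", "movie_1", 5.0)]))
```

Dispatching on the module name avoids hard dependencies on Pandas, Beam, and Spark, at the cost of being less strict than an explicit `isinstance` check against each framework's types.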