Skip to content

Latest commit

 

History

History
86 lines (63 loc) · 6.61 KB

File metadata and controls

86 lines (63 loc) · 6.61 KB

pivotp

Pivot CSV data. Features "smart" aggregation auto-selection based on data type & stats.

Table of Contents | Source: src/cmd/pivotp.rs | 🚀🐻‍❄️🪄

Description | Usage | Arguments | Pivotp Options | Common Options

Description

Pivots or groups CSV data using the Polars engine.

PIVOT MODE (with ): The pivot operation consists of:

  • One or more index columns (these will be the new rows)
  • A column that will be pivoted (this will create the new columns)
  • A values column that will be aggregated
  • An aggregation function to apply. Features "smart" aggregation auto-selection.

GROUP-BY MODE (without ): When is omitted, performs a group-by aggregation instead of a pivot. This is useful for simple aggregations like counting rows per group. In group-by mode, --index is required and --agg smart resolves to len (count). The none aggregation is not supported in group-by mode. If --values is omitted, a single "count" column is produced.

For examples, see https://github.com/dathere/qsv/blob/master/tests/test_pivotp.rs.

Usage

qsv pivotp [options] <on-cols> <input>
qsv pivotp [options] <input>
qsv pivotp --help

Arguments

 Argument   Description
 <on-cols>  The column(s) to pivot on (creates new columns). When omitted, pivotp runs in group-by mode.
 <input>  The input CSV file. The file must have headers. If the file has a pschema.json file, it will be used to inform the pivot operation unless --infer-len is explicitly set to a value other than the default of 10,000 rows. Stdin is not supported.

Pivotp Options

      Option       Type Description Default
 ‑i,
‑‑index 
string The column(s) to use as the index (row labels). Specify multiple columns by separating them with a comma. The output will have one row for each unique combination of the index's values. If None, all remaining columns not specified on --on and --values will be used. At least one of --index and --values must be specified. Required in group-by mode.
 ‑v,
‑‑values 
string The column(s) containing values to aggregate. If an aggregation is specified, these are the values on which the aggregation will be computed. If None, all remaining columns not specified on --on and --index will be used. At least one of --index and --values must be specified. In group-by mode, if omitted, a single "count" column is produced.
 ‑a,
‑‑agg 
string The aggregation function to use: first - First value encountered last - Last value encountered sum - Sum of values min - Minimum value max - Maximum value mean - Average value median - Median value len - Count of values item - Get single value from group. Raises error if there are multiple values. smart - use value column data type & statistics to pick an aggregation. Always uses type, cardinality, sparsity, CV, sign distribution (n_negative/n_positive), and sort_order from streaming stats. When the stats cache includes non-streaming stats (from a prior stats --everything or stats --mode --quartiles), also uses skewness and mode_count. When moarstats has been run, also leverages outlier profile, Pearson skewness, MAD/stddev ratio, median/mean ratio, and quartile coefficient of dispersion for smarter selection. With moarstats --advanced, also uses kurtosis, bimodality, entropy and Gini coefficient. For Date/DateTime values, checks sparsity and sort order. Will only work if there is one value column, otherwise it falls back to first smart
 ‑‑sort‑columns  flag Sort the transposed columns by name. (pivot mode only)
 ‑‑maintain‑order  flag Maintain output order: preserve input column order in pivot mode, and preserve group/row order in group-by mode.
 ‑‑col‑separator  string The separator in generated column names in case of multiple --values columns. (pivot mode only; ignored in group-by mode) _
 ‑‑validate  flag Validate a pivot by checking the pivot column(s)' cardinality. (pivot mode only)
 ‑‑try‑parsedates  flag When set, will attempt to parse columns as dates.
 ‑‑infer‑len  string Number of rows to scan when inferring schema. Set to 0 to scan entire file. 10000
 ‑‑decimal‑comma  flag Use comma as decimal separator when READING the input. Note that you will need to specify an alternate --delimiter.
 ‑‑ignore‑errors  flag Skip rows that can't be parsed.
 ‑‑grand‑total  flag Append a grand total row summing all numeric non-index columns. The first index column will contain "Grand ".
 ‑‑subtotal  flag Insert subtotal rows after each group in the first index column. The second index column will contain the total label. Requires 2+ index columns. (pivot mode only)
 ‑‑total‑label  string Custom label for total rows. Total

Common Options

     Option      Type Description Default
 ‑h,
‑‑help 
flag Display this message
 ‑o,
‑‑output 
string Write output to instead of stdout.
 ‑d,
‑‑delimiter 
string The field delimiter for reading/writing CSV data. Must be a single character. (default: ,)
 ‑q,
‑‑quiet 
flag Do not return smart aggregation chosen nor pivot result shape to stderr.

Source: src/cmd/pivotp.rs | Table of Contents | README