Partition a CSV based on a column value.
Table of Contents | Source: src/cmd/partition.rs | 👆
Description | Usage | Arguments | Partition Options | Common Options
Description ↩
Partitions the given CSV data into chunks based on the value of a column.
See split command to split a CSV data by row count, by number of chunks or
by kb-size.
The files are written to the output directory with filenames based on the
values in the partition column and the --filename flag.
Note: To account for case-insensitive file system collisions (e.g. macOS APFS and Windows NTFS), the command will add a number suffix to the filename if the value is already in use.
EXAMPLE:
Partition nyc311.csv file into separate files based on the value of the "Borough" column in the current directory:
$ qsv partition Borough . --filename "nyc311-{}.csv" nyc311.csvwill create the following files, each containing the data for each borough: nyc311-Bronx.csv nyc311-Brooklyn.csv nyc311-Manhattan.csv nyc311-Queens.csv nyc311-Staten_Island.csv
For more examples, see https://github.com/dathere/qsv/blob/master/tests/test_partition.rs.
Usage ↩
qsv partition [options] <column> <outdir> [<input>]
qsv partition --helpArguments ↩
| Argument | Description |
|---|---|
<column> |
The column to use as a key for partitioning. You can use the --select option to select the column by name or index, but only one column can be used for partitioning. See select command for more details. |
<outdir> |
The directory to write the output files to. |
<input> |
The CSV file to read from. If not specified, then the input will be read from stdin. |
Partition Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
‑‑filename |
string | A filename template to use when constructing the names of the output files. The string '{}' will be replaced by a value based on the partition column, but sanitized for shell safety. | {}.csv |
‑p,‑‑prefix‑length |
string | Truncate the partition column after the specified number of bytes when creating the output file. | |
‑‑drop |
flag | Drop the partition column from results. | |
‑‑limit |
string | Limit the number of simultaneously open files. Useful for partitioning large datasets with many unique values to avoid "too many open files" errors. Data is processed in batches until all unique values are processed. If not set, it will be automatically set to the system limit with a 10% safety margin. If set to 0, it will process all data at once, regardless of the system's open files limit. |
Common Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
‑h,‑‑help |
flag | Display this message | |
‑n,‑‑no‑headers |
flag | When set, the first row will NOT be interpreted as column names. Otherwise, the first row will appear in all chunks as the header row. | |
‑d,‑‑delimiter |
string | The field delimiter for reading CSV data. Must be a single character. (default: ,) |
Source: src/cmd/partition.rs
| Table of Contents | README