Modify headers of a CSV to only have "safe" names - guaranteed "database-ready"/"CKAN-ready" names.
Table of Contents | Source: src/cmd/safenames.rs
Description | Usage | Safenames Options | Common Options
Description
Modify headers of a CSV to only have "safe" names - guaranteed "database-ready" names (optimized specifically for PostgreSQL column identifiers).
Fold to lowercase. Trim leading & trailing whitespace. Replace whitespace/non-alphanumeric characters with _. If a name starts with a digit & check_first_char is true, prepend the unsafe prefix. If a header with the same name already exists, append a sequence suffix (e.g. col, col_2, col_3). Names are limited to 60 characters in length. Empty names are replaced with the unsafe prefix.
In addition, specifically because of CKAN Datastore requirements:
- Headers with leading underscores are replaced with "unsafe_" prefix.
- Headers that are named "_id" are renamed to "reserved__id".
These CKAN Datastore options can be configured via the --prefix & --reserved options, respectively.
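The renaming rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the code in src/cmd/safenames.rs: the function name `safe_names` and its parameters are hypothetical, and the order in which the rules are applied (reserved check first, truncation before the sequence suffix, leading underscores stripped before the prefix is added) is an assumption chosen to reproduce the documented results.

```python
import re

def safe_names(headers, prefix="unsafe_", reserved=("_id",),
               check_first_char=True, max_len=60):
    """Approximate the documented safenames rules (rule order is assumed)."""
    reserved_set = {r.lower() for r in reserved}
    seen = {}      # safe name -> number of times it has been produced so far
    result = []
    for raw in headers:
        # Trim surrounding whitespace and fold to lowercase.
        name = raw.strip().lower()
        if name in reserved_set:
            # Reserved names get the "reserved_" prefix, e.g. _id -> reserved__id.
            name = "reserved_" + name
        else:
            # Replace each whitespace/non-alphanumeric character with "_".
            name = re.sub(r"[^a-z0-9_]", "_", name)
            if name == "":
                name = prefix                      # empty header
            elif name.startswith("_"):
                name = prefix + name.lstrip("_")   # leading underscores (assumed handling)
            elif check_first_char and name[0].isdigit():
                name = prefix + name               # leading digit
        name = name[:max_len]                      # 60-character limit
        # Append a sequence suffix to repeated names: col, col_2, col_3.
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}")
    return result

headers = ["c1", "12_col", "Col with Embedded Spaces", "",
           "Column!@Invalid+Chars", "c1"]
print(",".join(safe_names(headers)))
```

The sketch does not handle a generated suffix colliding with an existing header (for example, an input that already contains both `c1` twice and `c1_2`), and it treats only ASCII letters and digits as alphanumeric.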
In Always (a) and Conditional (c) mode, returns number of modified headers to stderr, and sends CSV with safe headers output to stdout.
In Verify (v) mode, returns the number of unsafe headers to stderr. In Verbose (V) mode, returns the number of headers, the duplicate count, and the unsafe & safe headers to stderr. No stdout output is generated in Verify and Verbose modes.
In JSON (j) mode, returns Verbose mode info in minified JSON to stdout. In Pretty JSON (J) mode, returns Verbose mode info in pretty printed JSON to stdout.
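Because the JSON modes write to stdout, a script can capture and parse the report directly. The following is a minimal sketch, assuming `qsv` is on the PATH and exits with status 0 in JSON mode; the helper names `safenames_cmd` and `safenames_report` are hypothetical. The field names of the JSON document are not listed here, so the sketch returns the parsed object as-is rather than guessing at its keys.

```python
import json
import subprocess

def safenames_cmd(path, mode="j"):
    """Build the argument list for `qsv safenames --mode <mode> <path>`."""
    return ["qsv", "safenames", "--mode", mode, path]

def safenames_report(path, mode="j"):
    """Run qsv in a JSON mode and return the parsed report.

    Assumes qsv is installed; raises CalledProcessError on a non-zero exit.
    """
    proc = subprocess.run(
        safenames_cmd(path, mode),
        capture_output=True,  # keep stdout (the JSON) and stderr separate
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

# Example, not executed here because it needs qsv and a data.csv file:
# report = safenames_report("data.csv")
# print(report)
```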
Given data.csv:

    c1,12_col,Col with Embedded Spaces,,Column!@Invalid+Chars,c1
    1,a2,a3,a4,a5,a6

    $ qsv safenames data.csv
    c1,unsafe_12_col,col_with_embedded_spaces,unsafe_,column__invalid_chars,c1_2
    1,a2,a3,a4,a5,a6
    stderr: 5

Conditionally rename headers, allowing "quoted identifiers":

    $ qsv safenames --mode c data.csv
    c1,unsafe_12_col,Col with Embedded Spaces,unsafe_,column__invalid_chars,c1_2
    1,a2,a3,a4,a5,a6
    stderr: 4

Verify how many "unsafe" headers are found:

    $ qsv safenames --mode v data.csv
    stderr: 4

Verbose mode:

    $ qsv safenames --mode V data.csv
    stderr: 6 header/s
    1 duplicate/s: "c1:2"
    4 unsafe header/s: ["12_col", "Col with Embedded Spaces", "", "Column!@Invalid+Chars"]
    1 safe header/s: ["c1"]

Note that even though "Col with Embedded Spaces" is technically safe, it is generally discouraged. It can be created as a "quoted identifier" in PostgreSQL, but it is still marked "unsafe" by default unless the mode is set to "conditional".
It is discouraged because the embedded spaces can cause problems later on (see https://lerner.co.il/2013/11/30/quoting-postgresql/ for more info).
For more examples, see https://github.com/dathere/qsv/blob/master/tests/test_safenames.rs.
Usage
qsv safenames [options] [<input>]
qsv safenames --help

Safenames Options
| Option | Type | Description | Default |
|---|---|---|---|
| --mode | string | Rename header names to "safe" names, i.e. guaranteed "database-ready" names. It has six modes: conditional (c), always (a), verify (v) and Verbose (V), with Verbose having two submodes, JSON (j) and pretty JSON (J). | Always |
| --reserved | string | Comma-delimited list of additional case-insensitive reserved names that should be considered "unsafe". If a header name is found in the reserved list, it will be prefixed with "reserved_". | _id |
| --prefix | string | Certain systems do not allow header names to start with "_" (e.g. CKAN Datastore). This option allows the specification of the unsafe prefix to use when a header starts with "_". | unsafe_ |
Common Options
| Option | Type | Description | Default |
|---|---|---|---|
| -h, --help | flag | Display this message | |
| -o, --output | string | Write output to `<file>` instead of stdout. Note that no output is generated for Verify and Verbose modes. | |
| -d, --delimiter | string | The field delimiter for reading CSV data. Must be a single character. | , |