dedup

Remove duplicate rows (See also extdedup, extsort, sort & sortcheck commands).

Table of Contents | Source: src/cmd/dedup.rs | 🤯🚀👆

Description | Examples | Usage | Dedup Options | Common Options

Description ↩

Deduplicates CSV rows.

By default, this requires reading all of the CSV data into memory, because the rows need to be sorted first.

The exception is the --sorted option, which indicates that the CSV is already sorted, typically by the sort cmd (which offers more sorting options) or by the extsort cmd (for larger-than-memory CSV files). With --sorted, dedup runs in streaming mode with constant memory.

Either way, the output will not only be deduplicated, it will also be sorted.

A duplicate count will also be sent to stderr.
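
The difference between the two modes can be pictured with a minimal Python sketch. It is not qsv's implementation (that lives in src/cmd/dedup.rs). The function names `dedup_sorted_stream` and `dedup_in_memory` are hypothetical, rows are compared as whole lists of strings, and header handling is left out.

```python
import sys
from typing import Iterable, Iterator, List

Row = List[str]


def dedup_sorted_stream(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield unique rows from ALREADY SORTED input.

    Only the previously seen row is kept in memory, so memory use
    stays constant regardless of input size.
    """
    previous = None
    dupes = 0
    for row in rows:
        if row == previous:
            dupes += 1  # identical to its neighbour: skip it
            continue
        previous = row
        yield row
    # Report the duplicate count on stderr once the input is exhausted.
    print(dupes, file=sys.stderr)


def dedup_in_memory(rows: Iterable[Row]) -> List[Row]:
    """Sort everything first (loads all rows), then reuse the streaming pass."""
    return list(dedup_sorted_stream(sorted(rows)))
```

Because duplicates are only detected between neighbouring rows, the streaming pass is correct only when the input really is sorted. This is also why the output of the in-memory path comes out sorted.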

Examples ↩

Deduplicate an unsorted CSV file:

qsv dedup unsorted.csv -o deduped.csv

Deduplicate a sorted CSV file:

qsv sort unsorted.csv | qsv dedup --sorted -o deduped.csv

Deduplicate based on specific columns:

qsv dedup --select col1,col2 unsorted.csv -o deduped.csv

Deduplicate based on numeric comparison of col1 and col2 columns:

qsv dedup -s col1,col2 --numeric unsorted.csv -o deduped.csv

Deduplicate ignoring case of col1 and col2 columns:

qsv dedup -s col1,col2 --ignore-case unsorted.csv -o deduped.csv

Write duplicates to a separate file:

qsv dedup -s col1,col2 --dupes-output dupes.csv unsorted.csv -o deduped.csv

For more examples, see tests.
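
The column-based examples above (--select, --numeric, --ignore-case, --dupes-output) can be thought of as deduplicating on a comparison key while still emitting full-width rows. The following Python sketch illustrates that idea only. `make_key` and `dedup_by_key` are hypothetical names, columns are given as zero-based indices rather than qsv's selector syntax, the numeric mode assumes every selected value parses as a number, and keeping the first row of each duplicate group is an assumption of the sketch, not a documented qsv guarantee.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[str]


def make_key(indices: Sequence[int], ignore_case: bool = False,
             numeric: bool = False) -> Callable[[Row], Tuple]:
    """Build a comparison key from the selected columns only."""
    def key(row: Row) -> Tuple:
        values = [row[i] for i in indices]
        if numeric:
            # Assumption: all selected values are valid numbers.
            return tuple(float(v) for v in values)
        if ignore_case:
            return tuple(v.lower() for v in values)
        return tuple(values)
    return key


def dedup_by_key(rows: List[Row],
                 key: Callable[[Row], Tuple]) -> Tuple[List[Row], List[Row]]:
    """Return (kept_rows, duplicate_rows); both keep the full row width."""
    kept: List[Row] = []
    dupes: List[Row] = []
    previous = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):  # stable sort: input order within a key
        current = key(row)
        if current == previous:
            dupes.append(row)  # would go to the duplicates file
        else:
            kept.append(row)   # would go to the main output
            previous = current
    return kept, dupes
```

For instance, with a numeric key on the first column, the values `1.0` and `1` compare equal in this sketch, and `9` sorts before `10`, which would not be the case under plain string comparison.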

Usage ↩

qsv dedup [options] [<input>]
qsv dedup --help

Dedup Options ↩

| Option | Type | Description |
|---|---|---|
| `‑s,` `‑‑select` | string | Select a subset of columns to dedup on. Note that the output will remain at the full width of the CSV. See 'qsv select --help' for the format details. |
| `‑N,` `‑‑numeric` | flag | Compare according to string numerical value. |
| `‑i,` `‑‑ignore‑case` | flag | Compare strings disregarding case. |
| `‑‑sorted` | flag | The input is already sorted. Do not load the CSV into memory to sort it first. Meant to be used in tandem with, and after, an extsort. |
| `‑D,` `‑‑dupes‑output` | string | Write duplicates to `<file>`. |
| `‑H,` `‑‑human‑readable` | flag | Comma separate duplicate count. |
| `‑j,` `‑‑jobs` | string | The number of jobs to run in parallel when sorting an unsorted CSV, before deduping. When not set, the number of jobs is set to the number of CPUs detected. Does not work with the --sorted option, as it's not multithreaded. |

Common Options ↩

| Option | Type | Description |
|---|---|---|
| `‑h,` `‑‑help` | flag | Display this message. |
| `‑o,` `‑‑output` | string | Write output to `<file>` instead of stdout. |
| `‑n,` `‑‑no‑headers` | flag | When set, the first row will not be interpreted as headers. That is, it will be sorted with the rest of the rows. Otherwise, the first row will always appear as the header row in the output. |
| `‑d,` `‑‑delimiter` | string | The field delimiter for reading CSV data. Must be a single character. (default: ,) |
| `‑q,` `‑‑quiet` | flag | Do not print duplicate count to stderr. |
| `‑‑memcheck` | flag | Check if there is enough memory to load the entire CSV into memory using CONSERVATIVE heuristics. |
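
As a rough illustration of how --no-headers changes the result, and of what a comma-separated duplicate count looks like, here is a small Python sketch. It is an assumption-laden model rather than qsv's code: `dedup_with_headers` and `format_count` are hypothetical helpers, and rows are compared as plain strings.

```python
from typing import List, Tuple

Row = List[str]


def dedup_with_headers(rows: List[Row],
                       no_headers: bool = False) -> Tuple[List[Row], int]:
    """Sort and dedup rows, optionally treating the first row as a header."""
    rows = list(rows)
    if no_headers or not rows:
        header, body = [], rows            # first row is sorted like any other
    else:
        header, body = rows[:1], rows[1:]  # first row stays on top
    unique: List[Row] = []
    for row in sorted(body):
        if not unique or unique[-1] != row:
            unique.append(row)
    dupe_count = len(body) - len(unique)
    return header + unique, dupe_count


def format_count(count: int, human_readable: bool = False) -> str:
    """Format the duplicate count, with thousands separators if requested."""
    return f"{count:,}" if human_readable else str(count)
```

With the header row `name,city` followed by lowercase data, the default mode keeps `name,city` as the first output row, whereas the `no_headers=True` call sorts it in with the data, so here it ends up last.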

Source: src/cmd/dedup.rs | Table of Contents | README
