I'd like to be able to merge rows sharing a (possibly complex) key within a single (optionally multiple) file.
Example data:
This is a common, normalized representation of data. I'd like to be able to run:
$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -C -v 2
where -C stands in for the new argument that creates new columns for each row:
or
Whether row columns are appended sequentially, or interleved, is unimportant as transforming column orders after the operation is trivial.
It would be acceptable if, when csvtk found that the result would produce rows of different lengths -- for example, an ID has more or less than the usual number of rows, that it would generate an error.
You can almost get there with fold and sep, except there's no way to include additional columns in fold, so it only works on two columns:
$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -s ';' -v 2 | csvtk sep -H -s ';' -f 2 -R
1,a,b
2,x,y
The data in the additional rows is lost.
One IRL use case for this is with ledger csv, which produces output like this:
transaction uuid,date,payee,amount
aaa123,2024-03-01,Bill's Garage,50
aaa123,2024-03-01,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,200
456abc,2024-03-05,Credit card,200
This is a double-entry system, so there's always a credit line and a corresponding debit line. What I want is:
transaction uuid,date,payee,source,amount
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200
However, an MVP would produce:
transaction uuid,date,payee 1,payee 2,amount 1,amount 2
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200
I tried to not present an X:Y problem, but I think a solution would be allowing fold and sep to operate on multiple -v (and -f, respectively) columns,
I'd like to be able to merge rows sharing a (possibly complex) key within a single (optionally multiple) file.
Example data:
This is a common, normalized representation of data. I'd like to be able to run:
where
-Cstands in for the new argument that creates new columns for each row:or
Whether row columns are appended sequentially, or interleved, is unimportant as transforming column orders after the operation is trivial.
It would be acceptable if, when csvtk found that the result would produce rows of different lengths -- for example, an ID has more or less than the usual number of rows, that it would generate an error.
You can almost get there with
foldandsep, except there's no way to include additional columns infold, so it only works on two columns:The data in the additional rows is lost.
One IRL use case for this is with
ledger csv, which produces output like this:This is a double-entry system, so there's always a credit line and a corresponding debit line. What I want is:
However, an MVP would produce:
I tried to not present an X:Y problem, but I think a solution would be allowing
foldandsepto operate on multiple-v(and-f, respectively) columns,