remove-columns
Overview¶
kgtk remove-columns removes a subset of the columns from a KGTK file.
Note
This command can also remove columns from non-KGTK TSV input files (quasi-KGTK files)
when the expert option --mode=NONE is used.
Note
The output file must still contain the required columns (id for a KGTK node file; node1, label, and node2
for a KGTK edge file). This requirement may be disabled with the expert option --mode=NONE, but the
output file will then not be a valid KGTK node or edge file.
Note
kgtk reorder-columns --trim may be used as an alternative to kgtk remove-columns.
Info
See kgtk rename-columns if you wish to rename columns.
See kgtk reorder-columns if you wish to reorder columns.
List of Column Names¶
When you use this command, you supply the --columns option with
a list of column names to be removed from the output file.
Column names may be passed to the --columns option as an unquoted, space-separated
list, as with other KGTK commands.
By default, column names are split on commas (,), unless --split-on-commas=FALSE is specified.
By default, leading and trailing whitespace is removed from column names,
unless --strip-spaces=FALSE is specified.
Column names can be passed as a quoted list and split on spaces, if --split-on-spaces=TRUE is specified.
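The way these three options interact can be sketched in Python. This is a minimal illustration of the rules described above, not KGTK's actual implementation: the function name parse_column_names is hypothetical, and dropping empty names (for example from "a,,b") is an assumption.

```python
from typing import Iterable, List


def parse_column_names(
    args: Iterable[str],
    split_on_commas: bool = True,
    split_on_spaces: bool = False,
    strip_spaces: bool = True,
) -> List[str]:
    """Expand command-line arguments into a flat list of column names."""
    names: List[str] = []
    for arg in args:
        # Split on commas first (the documented default).
        parts = arg.split(",") if split_on_commas else [arg]
        for part in parts:
            # Optionally split each piece on spaces as well.
            pieces = part.split(" ") if split_on_spaces else [part]
            for piece in pieces:
                name = piece.strip() if strip_spaces else piece
                # Assumption: empty names (e.g. from "a,,b") are dropped.
                if name:
                    names.append(name)
    return names


# Unquoted list: the shell already delivers two separate arguments.
print(parse_column_names(["location", "years"]))
# Comma-separated list in a single argument.
print(parse_column_names(["location,years"]))
# Quoted, space-separated list with --split-on-spaces=TRUE.
print(parse_column_names(["location years"], split_on_spaces=True))
```

Under these assumptions, all three calls yield the same two names, location and years, which matches the examples later on this page that produce identical output from the three list styles.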
Usage¶
usage: kgtk remove-columns [-h] [-i INPUT_FILE] [-o OUTPUT_FILE] -c COLUMNS
[COLUMNS ...] [--split-on-commas [SPLIT_ON_COMMAS]]
[--split-on-spaces [SPLIT_ON_SPACES]]
[--strip-spaces [STRIP_SPACES]]
[--all-except [ALL_EXCEPT]]
[--ignore-missing-columns [IGNORE_MISSING_COLUMNS]]
[-v [optional True|False]]
Remove specific columns from a KGTK file.
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
The KGTK input file. (May be omitted or '-' for
stdin.)
-o OUTPUT_FILE, --output-file OUTPUT_FILE
The KGTK output file. (May be omitted or '-' for
stdout.)
-c COLUMNS [COLUMNS ...], --columns COLUMNS [COLUMNS ...]
Columns to remove as a comma- or space-separated
strings, e.g., id,docid or id docid
--split-on-commas [SPLIT_ON_COMMAS]
When True, parse the list of columns, splitting on
commas. (default=True).
--split-on-spaces [SPLIT_ON_SPACES]
When True, parse the list of columns, splitting on
spaces. (default=False).
--strip-spaces [STRIP_SPACES]
When True, parse the list of columns, stripping
whitespace. (default=True).
--all-except [ALL_EXCEPT]
When True, remove all columns except the listed ones.
(default=False).
--ignore-missing-columns [IGNORE_MISSING_COLUMNS]
When True, ignore missing columns. (default=False).
-v [optional True|False], --verbose [optional True|False]
Print additional progress messages (default=False).
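The effect of --all-except and --ignore-missing-columns on the output header can be sketched as follows. This is an illustrative Python model of the option descriptions above, not KGTK code: the function name columns_to_keep is hypothetical, and two behaviors are assumptions, namely that an unknown column name is an error unless --ignore-missing-columns is set, and that surviving columns keep their input order.

```python
from typing import List, Sequence


def columns_to_keep(
    header: Sequence[str],
    listed: Sequence[str],
    all_except: bool = False,
    ignore_missing_columns: bool = False,
) -> List[str]:
    """Return the header columns that would survive, in input order."""
    missing = [name for name in listed if name not in header]
    if missing and not ignore_missing_columns:
        # Assumption: an unknown column name is an error by default.
        raise ValueError(f"Columns not found in the input: {missing}")
    listed_set = set(listed)
    if all_except:
        # Inverted selection: keep only the listed columns.
        return [name for name in header if name in listed_set]
    # Default: drop the listed columns, keep everything else.
    return [name for name in header if name not in listed_set]


header = ["node1", "label", "node2", "location", "years"]
print(columns_to_keep(header, ["location", "years"]))
print(columns_to_keep(header, ["node1", "label", "node2"], all_except=True))
```

Both calls select node1, label, and node2: the first by naming the columns to drop, the second by naming the columns to keep.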
Examples¶
Sample Data¶
Suppose that file1.tsv contains the following table in KGTK format:
kgtk cat -i examples/docs/remove-columns-file1.tsv
| node1 | label | node2 | location | years |
|---|---|---|---|---|
| john | zipcode | 12345 | home | 10 |
| john | zipcode | 12346 | | |
| peter | zipcode | 12040 | home | |
| peter | zipcode | 12040 | cabin | |
| peter | zipcode | 12040 | work | 5 |
| peter | zipcode | 12040 | | 6 |
| steve | zipcode | 45601 | | 3 |
| steve | zipcode | 45601 | | 4 |
| steve | zipcode | 45601 | | 5 |
| steve | zipcode | 45601 | home | 1 |
| steve | zipcode | 45601 | work | 2 |
| steve | zipcode | 45601 | cabin | |
Remove Specific Columns using an Unquoted List¶
Copy file1.tsv, sending the output to standard output,
removing the columns location and years, using an
unquoted list:
kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
--columns location years
| node1 | label | node2 |
|---|---|---|
| john | zipcode | 12345 |
| john | zipcode | 12346 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
Remove Specific Columns using an Unquoted List, Allowing Commas¶
Copy file1.tsv, sending the output to standard output,
removing the columns location and years, using an
unquoted list that allows commas inside column names:
Note
The sample data does not include a column name that contains a comma, so this example does not really exercise the option. If the data did include such a name, there would be a warning message whenever a KGTK command reads the file's header record.
kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
--split-on-commas False \
--columns location years
| node1 | label | node2 |
|---|---|---|
| john | zipcode | 12345 |
| john | zipcode | 12346 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
Remove Specific Columns using a Comma-Separated List¶
Copy file1.tsv, sending the output to standard output,
removing the columns location and years, using a
comma-separated list:
kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
--columns location,years
| node1 | label | node2 |
|---|---|---|
| john | zipcode | 12345 |
| john | zipcode | 12346 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
Remove Specific Columns using Quotes and Spaces¶
Copy file1.tsv, sending the output to standard output,
removing the columns location and years, using a
quoted, space-separated list:
kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
--split-on-spaces True \
--columns "location years"
| node1 | label | node2 |
|---|---|---|
| john | zipcode | 12345 |
| john | zipcode | 12346 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| peter | zipcode | 12040 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
| steve | zipcode | 45601 |
Removing Required Columns¶
Copy file1.tsv, sending the output to standard output, removing the columns
label and node2, using a quoted, space-separated list. The output is not a valid
KGTK file (it is a quasi-KGTK file), so --mode=NONE must be specified:
kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
--split-on-spaces True --mode=NONE \
--columns "label node2"
| node1 | location | years |
|---|---|---|
| john | home | 10 |
| john | | |
| peter | home | |
| peter | cabin | |
| peter | work | 5 |
| peter | | 6 |
| steve | | 3 |
| steve | | 4 |
| steve | | 5 |
| steve | home | 1 |
| steve | work | 2 |
| steve | cabin | |
Note
Quasi-KGTK input files may also be processed by specifying --mode=NONE.
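The required-column rule that --mode=NONE relaxes can be sketched as a simple header check. This Python sketch models only the rule stated in the Overview (id for a node file; node1, label, and node2 for an edge file). The function name classify_header and the label "quasi" are hypothetical, and KGTK's own validation may differ in detail.

```python
from typing import Sequence

# Required columns as stated in the Overview of this page.
EDGE_REQUIRED = ("node1", "label", "node2")
NODE_REQUIRED = ("id",)


def classify_header(columns: Sequence[str]) -> str:
    """Return 'edge', 'node', or 'quasi' for a list of column names."""
    if all(name in columns for name in EDGE_REQUIRED):
        return "edge"
    if all(name in columns for name in NODE_REQUIRED):
        return "node"
    # Simplification: anything else is treated as a quasi-KGTK file.
    return "quasi"


# Header of the sample file, then the header left after removing label and node2.
print(classify_header(["node1", "label", "node2", "location", "years"]))
print(classify_header(["node1", "location", "years"]))
```

In this simplified model the sample file classifies as an edge file, while the header left after removing label and node2 satisfies neither rule, which is why the example above needs --mode=NONE.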