
remove-columns

Overview

kgtk remove-columns removes a subset of the columns from a KGTK file.

Note

This command can be used to remove columns from non-KGTK TSV input files (quasi-KGTK files) by specifying the expert option --mode=NONE.

Note

The output file must still contain the required columns: id for a KGTK node file; node1, label, and node2 for a KGTK edge file. This requirement may be disabled with the expert option --mode=NONE, but the output file will then not be a valid KGTK node or edge file.

Note

kgtk reorder-columns --trim may be used as an alternative to kgtk remove-columns.

Info

See kgtk rename-columns if you wish to rename columns.

See kgtk reorder-columns if you wish to reorder columns.

List of Column Names

When you use this command, supply the --columns option with a list of the column names to remove from the output file (or, with --all-except, the column names to keep).

Column names may be passed to the --columns option as an unquoted, space-separated list, as with other KGTK commands.

By default, column names are split on commas (,), unless --split-on-commas=FALSE is specified.

By default, leading and trailing whitespace is removed from column names, unless --strip-spaces=FALSE is specified.

If --split-on-spaces=TRUE is specified, column names may also be passed as a single quoted string, which is then split on spaces.
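The splitting rules above can be illustrated with a short Python sketch. This is not KGTK's actual parser: the function name parse_column_names is hypothetical, and the order of operations (commas, then spaces, then stripping) and the discarding of empty names are assumptions made for the illustration.

```python
def parse_column_names(args, split_on_commas=True, split_on_spaces=False,
                       strip_spaces=True):
    """Hypothetical sketch of turning --columns arguments into column names."""
    names = []
    for arg in args:  # one entry per shell argument given to --columns
        pieces = [arg]
        if split_on_commas:  # default: "location,years" -> two names
            pieces = [p for piece in pieces for p in piece.split(",")]
        if split_on_spaces:  # opt-in: "location years" -> two names
            pieces = [p for piece in pieces for p in piece.split(" ")]
        if strip_spaces:  # default: trim leading and trailing whitespace
            pieces = [p.strip() for p in pieces]
        # Assumption: empty pieces (e.g. from "a,,b") are discarded.
        names.extend(p for p in pieces if p)
    return names


print(parse_column_names(["location", "years"]))  # ['location', 'years']
print(parse_column_names(["location,years"]))     # ['location', 'years']
print(parse_column_names(["location years"], split_on_spaces=True))  # ['location', 'years']
print(parse_column_names(["a,b"], split_on_commas=False))  # ['a,b']
```

With the defaults, a quoted "location years" stays a single name; only --split-on-spaces turns it into two.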

Usage

usage: kgtk remove-columns [-h] [-i INPUT_FILE] [-o OUTPUT_FILE] -c COLUMNS
                           [COLUMNS ...] [--split-on-commas [SPLIT_ON_COMMAS]]
                           [--split-on-spaces [SPLIT_ON_SPACES]]
                           [--strip-spaces [STRIP_SPACES]]
                           [--all-except [ALL_EXCEPT]]
                           [--ignore-missing-columns [IGNORE_MISSING_COLUMNS]]
                           [-v [optional True|False]]

Remove specific columns from a KGTK file.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The KGTK input file. (May be omitted or '-' for
                        stdin.)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  -c COLUMNS [COLUMNS ...], --columns COLUMNS [COLUMNS ...]
                        Columns to remove as a comma- or space-separated
                        strings, e.g., id,docid or id docid
  --split-on-commas [SPLIT_ON_COMMAS]
                        When True, parse the list of columns, splitting on
                        commas. (default=True).
  --split-on-spaces [SPLIT_ON_SPACES]
                        When True, parse the list of columns, splitting on
                        spaces. (default=False).
  --strip-spaces [STRIP_SPACES]
                        When True, parse the list of columns, stripping
                        whitespace. (default=True).
  --all-except [ALL_EXCEPT]
                        When True, remove all columns except the listed ones.
                        (default=False).
  --ignore-missing-columns [IGNORE_MISSING_COLUMNS]
                        When True, ignore missing columns. (default=False).

  -v [optional True|False], --verbose [optional True|False]
                        Print additional progress messages (default=False).
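How --columns, --all-except, and --ignore-missing-columns interact can be sketched as a function that picks which input columns survive. select_columns is a hypothetical name; treating unknown names as an error (unless missing columns are ignored) and keeping the input column order under --all-except are assumptions, not details taken from KGTK's source.

```python
def select_columns(header, columns, all_except=False, ignore_missing=False):
    """Hypothetical sketch: return indices of the header columns to keep."""
    missing = [c for c in columns if c not in header]
    if missing and not ignore_missing:
        # Assumption: without --ignore-missing-columns, unknown names are an error.
        raise ValueError("Unknown column(s): " + ", ".join(missing))
    listed = set(columns)
    if all_except:
        # --all-except: keep only the listed columns (input order assumed).
        return [i for i, name in enumerate(header) if name in listed]
    # Default: keep every column that was not listed.
    return [i for i, name in enumerate(header) if name not in listed]


header = ["node1", "label", "node2", "location", "years"]
print(select_columns(header, ["location", "years"]))  # [0, 1, 2]
print(select_columns(header, ["node1", "label", "node2"], all_except=True))  # [0, 1, 2]
print(select_columns(header, ["docid"], ignore_missing=True))  # [0, 1, 2, 3, 4]
```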

Examples

Sample Data

Suppose that file1.tsv (examples/docs/remove-columns-file1.tsv) contains the following table in KGTK format:

kgtk cat -i examples/docs/remove-columns-file1.tsv
node1  label    node2  location  years
john   zipcode  12345  home      10
john   zipcode  12346
peter  zipcode  12040  home
peter  zipcode  12040  cabin
peter  zipcode  12040  work      5
peter  zipcode  12040            6
steve  zipcode  45601            3
steve  zipcode  45601            4
steve  zipcode  45601            5
steve  zipcode  45601  home      1
steve  zipcode  45601  work      2
steve  zipcode  45601  cabin

Remove Specific Columns using an Unquoted List

Copy file1.tsv to standard output, removing the location and years columns, which are given as an unquoted list:

kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
                    --columns location years
node1 label node2
john zipcode 12345
john zipcode 12346
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
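At the row level, the operation can be sketched as dropping the same field positions from every tab-separated line. This is a minimal sketch assuming plain TSV input with the header on the first line; remove_columns_tsv is a hypothetical name, and the sketch ignores KGTK features such as value validation and compressed input. The sample lines are the first rows of file1.tsv.

```python
def remove_columns_tsv(lines, to_remove):
    """Hypothetical sketch: drop the named columns from tab-separated lines."""
    rows = iter(lines)
    header_line = next(rows, None)
    if header_line is None:  # empty input: nothing to emit
        return
    header = header_line.rstrip("\n").split("\t")
    drop = set(to_remove)
    keep = [i for i, name in enumerate(header) if name not in drop]
    yield "\t".join(header[i] for i in keep)
    for line in rows:
        # Empty values are empty strings, so split("\t") keeps their positions.
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[i] for i in keep)


sample = [
    "node1\tlabel\tnode2\tlocation\tyears\n",
    "john\tzipcode\t12345\thome\t10\n",
    "john\tzipcode\t12346\t\t\n",
]
for out in remove_columns_tsv(sample, ["location", "years"]):
    print(out)
```

The second john row has empty location and years values; because the empty fields still occupy their tab-separated slots, the kept indices line up with the header.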

Remove Specific Columns using an Unquoted List, Allowing Commas

Copy file1.tsv to standard output, removing the location and years columns, which are given as an unquoted list with comma splitting disabled (--split-on-commas False) so that column names may contain commas:

Note

The sample data does not include a column name containing a comma, so this example produces the same output as the previous one. If it did, KGTK would issue a warning whenever a command read the file's header record.

kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
                    --split-on-commas False \
                    --columns location years
node1 label node2
john zipcode 12345
john zipcode 12346
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601

Remove Specific Columns using a Comma-Separated List

Copy file1.tsv to standard output, removing the location and years columns, which are given as a comma-separated list:

kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
                    --columns location,years
node1 label node2
john zipcode 12345
john zipcode 12346
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601

Remove Specific Columns using Quotes and Spaces

Copy file1.tsv to standard output, removing the location and years columns, which are given as a single quoted, space-separated string (this requires --split-on-spaces True):

kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
                    --split-on-spaces True \
                    --columns "location years"
node1 label node2
john zipcode 12345
john zipcode 12346
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
peter zipcode 12040
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
steve zipcode 45601
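The reason --split-on-spaces is needed here is shell quoting: the quoted form reaches kgtk as one argument rather than two. Python's shlex.split, which tokenizes with POSIX-shell-like quoting rules, can illustrate this. columns_argv is a hypothetical helper, and it assumes --columns is the last option on the command line.

```python
import shlex


def columns_argv(command):
    """Return the arguments after --columns (assumed to be the last option)."""
    argv = shlex.split(command)  # tokenize the way a POSIX shell would
    return argv[argv.index("--columns") + 1:]


print(columns_argv("kgtk remove-columns -i file1.tsv --columns location years"))
# ['location', 'years']  -> two arguments, no splitting needed
print(columns_argv('kgtk remove-columns -i file1.tsv --split-on-spaces True '
                   '--columns "location years"'))
# ['location years']     -> one argument, which --split-on-spaces True splits
```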

Removing Required Columns

Copy file1.tsv to standard output, removing the label and node2 columns, which are given as a quoted, space-separated list. Because label and node2 are required columns in a KGTK edge file, the output is not a valid KGTK file (it is a quasi-KGTK file), so --mode=NONE must be specified:

kgtk remove-columns -i examples/docs/remove-columns-file1.tsv \
                    --split-on-spaces True --mode=NONE \
                    --columns "label node2"
node1  location  years
john   home      10
john
peter  home
peter  cabin
peter  work      5
peter            6
steve            3
steve            4
steve            5
steve  home      1
steve  work      2
steve  cabin
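The rule that makes --mode=NONE necessary in this example can be sketched as a header check. classify_header is a hypothetical helper that looks only for the exact required names given in the notes above (id for node files; node1, label, and node2 for edge files). KGTK's own header validation may apply further rules that are not modeled here.

```python
EDGE_REQUIRED = ("node1", "label", "node2")
NODE_REQUIRED = ("id",)


def classify_header(columns):
    """Hypothetical sketch: 'edge', 'node', or None for a quasi-KGTK header."""
    names = set(columns)
    if names.issuperset(EDGE_REQUIRED):
        return "edge"
    if names.issuperset(NODE_REQUIRED):
        return "node"
    return None  # not a valid KGTK node or edge header: needs --mode=NONE


print(classify_header(["node1", "label", "node2", "location", "years"]))  # edge
print(classify_header(["node1", "location", "years"]))  # None
```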

Note

Quasi-KGTK input files may also be processed by specifying --mode=NONE.