ifnotempty

Summary

The ifnotempty command filters a KGTK file, passing through only those rows for which one (or more) specified columns contain nonempty values.

Note

The kgtk ifempty command is the inverse of this command: it filters on whether the specified columns are empty.

Any or All

When multiple columns are specified, the filter can require either that any of the columns be nonempty (the default) or, with --all, that all of the columns be nonempty.
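The any/all choice can be illustrated with a minimal Python sketch. This is not KGTK's implementation: the function name passes_filter and its parameters are hypothetical, and the sketch assumes that an empty cell is simply an empty string.

```python
from typing import Mapping, Sequence


def passes_filter(
    row: Mapping[str, str],
    columns: Sequence[str],
    require_all: bool = False,
) -> bool:
    """Illustrative only: decide whether a row would pass the filter.

    Assumption: a cell counts as empty when it is the empty string.
    """
    nonempty = [row.get(name, "") != "" for name in columns]
    # require_all=False mirrors the default (any column nonempty);
    # require_all=True mirrors --all (every column nonempty).
    return all(nonempty) if require_all else any(nonempty)


row = {"node1": "peter", "label": "zipcode", "node2": "12040",
       "location": "home", "years": ""}
print(passes_filter(row, ["location", "years"]))                    # True
print(passes_filter(row, ["location", "years"], require_all=True))  # False
```

The row has a location but no years value, so it passes under the "any" rule and fails under the "all" rule.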

Count Only

kgtk ifnotempty --count reports the count of rows that passed the filter instead of copying the rows to the output file. The count will normally be reported to standard error; standard output will not receive any data.

Usage

usage: kgtk ifnotempty [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
                       [--reject-file REJECT_FILE] --columns
                       FILTER_COLUMN_NAMES [FILTER_COLUMN_NAMES ...]
                       [--count [True|False]] [--all [True|False]]
                       [-v [optional True|False]]

Filter a KGTK file based on whether one or more fields are not empty. When multiple fields are specified, either any field or all fields must be not empty.

Additional options are shown in expert help.
kgtk --expert ifnotempty --help

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The KGTK input file. (May be omitted or '-' for
                        stdin.)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --reject-file REJECT_FILE
                        The KGTK file for input records that fail the filter.
                        (Optional, use '-' for stdout.)
  --columns FILTER_COLUMN_NAMES [FILTER_COLUMN_NAMES ...]
                        The columns in the file being filtered (Required).
  --count [True|False]  Only count the records, do not copy them.
                        (default=False).
  --all [True|False]    False: Test if any are not empty, True: test if all
                        are not empty (default=False).

  -v [optional True|False], --verbose [optional True|False]
                        Print additional progress messages (default=False).

Examples

Suppose that file1.tsv contains the following table in KGTK format:

kgtk cat -i examples/docs/ifnotempty-file1.tsv
node1  label    node2  location  years
john   zipcode  12345  home      10
john   zipcode  12346
peter  zipcode  12040  home
peter  zipcode  12040  work      6
steve  zipcode  45601            3
steve  zipcode  45601

Pass Records with Nonempty Cells in a Single Column

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --columns location
node1  label    node2  location  years
john   zipcode  12345  home      10
peter  zipcode  12040  home
peter  zipcode  12040  work      6

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --columns years
node1  label    node2  location  years
john   zipcode  12345  home      10
peter  zipcode  12040  work      6
steve  zipcode  45601            3

Pass Records with Nonempty Cells in Either of Two Columns

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --columns location years
node1  label    node2  location  years
john   zipcode  12345  home      10
peter  zipcode  12040  home
peter  zipcode  12040  work      6
steve  zipcode  45601            3

Pass Records with Nonempty Cells in Either of Two Columns with Rejects

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --columns location years \
                --reject-file ifempty-file1-rejects.tsv
node1  label    node2  location  years
john   zipcode  12345  home      10
peter  zipcode  12040  home
peter  zipcode  12040  work      6
steve  zipcode  45601            3

Here are the rejected edges:

kgtk cat -i ifempty-file1-rejects.tsv
node1  label    node2  location  years
john   zipcode  12346
steve  zipcode  45601
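The split between accepted and rejected records can be sketched in Python as follows. This is an illustration under stated assumptions, not how KGTK does it: split_rows is a hypothetical helper, the sample data is typed in by hand from the table above (with empty cells placed as the filter results imply), and an empty cell is assumed to be an empty string.

```python
import csv
import io

# Hand-written copy of the example table; tabs separate the columns.
SAMPLE = (
    "node1\tlabel\tnode2\tlocation\tyears\n"
    "john\tzipcode\t12345\thome\t10\n"
    "john\tzipcode\t12346\t\t\n"
    "peter\tzipcode\t12040\thome\t\n"
    "peter\tzipcode\t12040\twork\t6\n"
    "steve\tzipcode\t45601\t\t3\n"
    "steve\tzipcode\t45601\t\t\n"
)


def split_rows(text, columns):
    """Partition rows into (accepted, rejected) using the 'any' rule."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    accepted, rejected = [], []
    for row in reader:
        # A row is accepted when at least one listed column is nonempty.
        target = accepted if any(row[name] for name in columns) else rejected
        target.append(row)
    return accepted, rejected


accepted, rejected = split_rows(SAMPLE, ["location", "years"])
for row in rejected:
    print(row["node1"], row["node2"])
```

With the sample data, the two rejected rows are the ones in which both location and years are empty, matching the reject file shown above.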

Pass Records with Nonempty Cells in Both of Two Columns

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --all --columns location years
node1  label    node2  location  years
john   zipcode  12345  home      10
peter  zipcode  12040  work      6

Count Records with Nonempty Cells in a Column

kgtk ifnotempty -i examples/docs/ifnotempty-file1.tsv \
                --count --columns years

The standard error output will be:

Read 6 records, 3 records passed the filter, 3 rejected.

Note

The expert option --errors-to-stdout can be used to route this message to standard output.
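As a cross-check of the count in the example, the following Python sketch tallies the years column of the sample file. It is only an illustration: count_nonempty is a hypothetical helper, the values are copied by hand from the example table, and the printed line merely imitates the wording of the message shown above.

```python
import sys


def count_nonempty(rows, column):
    """Return (total, passed) where passed counts rows with a nonempty cell."""
    total = passed = 0
    for row in rows:
        total += 1
        if row.get(column, "") != "":  # assumption: empty means ""
            passed += 1
    return total, passed


# The years column of the six example records, in file order.
years = ["10", "", "", "6", "3", ""]
rows = [{"years": value} for value in years]

total, passed = count_nonempty(rows, "years")
print(
    f"Read {total} records, {passed} records passed the filter, "
    f"{total - passed} rejected.",
    file=sys.stderr,
)
```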