
Overview

The expand command copies its input file to its output file, expanding each record that contains | lists (multi-valued cells) into multiple records.
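Judging from the examples below, a record expands into as many records as its longest | list. Key columns are repeated on every output record, and output record i takes item i of every other column's list, or an empty value when that list is shorter. The Python sketch below illustrates that rule. `expand_record` is a hypothetical helper written for illustration, not part of the KGTK API. It assumes a record is a dict from column name to value, and it splits on every | without handling any escaping of literal | characters:

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def expand_record(row: Row, key_columns: Sequence[str]) -> List[Row]:
    """Expand one record's | lists into several records (hypothetical helper).

    Key columns are copied unchanged into every output record. Every other
    column is split on "|"; output record i takes the i-th item of each list,
    or an empty string when that list is shorter.
    """
    # Split every non-key value into its items; an empty value has no items.
    split: Dict[str, List[str]] = {
        col: (val.split("|") if val else [])
        for col, val in row.items()
        if col not in key_columns
    }
    # A record without lists (or with only empty values) still yields one record.
    count = max([len(items) for items in split.values()] + [1])
    result: List[Row] = []
    for i in range(count):
        out: Row = {}
        for col, val in row.items():
            if col in key_columns:
                out[col] = val
            else:
                items = split[col]
                out[col] = items[i] if i < len(items) else ""
        result.append(out)
    return result


row = {"node1": "peter", "label": "zipcode", "node2": "12040",
       "location": "work", "years": "5|6"}
for r in expand_record(row, ["node1", "label", "node2"]):
    print(r["node1"], r["node2"], repr(r["location"]), repr(r["years"]))
# peter 12040 'work' '5'
# peter 12040 '' '6'
```

A single non-key value such as work behaves like a one-item list, so it lands only on the first output record.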

Usage

usage: kgtk expand [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
                   [--columns KEY_COLUMN_NAMES [KEY_COLUMN_NAMES ...]]
                   [-v [optional True|False]]

Copy a KGTK file, expanding | lists into multiple records. 

Additional options are shown in expert help.
kgtk --expert expand --help

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The KGTK input file. (May be omitted or '-' for
                        stdin.)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --columns KEY_COLUMN_NAMES [KEY_COLUMN_NAMES ...]
                        The key columns will not be expanded. They will be
                        repeated on each output record. (default=id for node
                        files, (node1, label, node2) for edge files).

  -v [optional True|False], --verbose [optional True|False]
                        Print additional progress messages (default=False).
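The --columns default above depends on whether the input is a node file or an edge file. A minimal sketch of that choice follows. `default_key_columns` is a hypothetical helper; it assumes the file type is recognized from the canonical column names only (KGTK may also accept alternative column names, which the sketch ignores), and returning no key columns for a file that is neither kind is an assumption of the sketch, not documented KGTK behavior:

```python
from typing import List, Sequence

EDGE_KEYS = ["node1", "label", "node2"]


def default_key_columns(header: Sequence[str]) -> List[str]:
    """Pick default key columns from a header (hypothetical sketch)."""
    # Edge files are checked first, since an edge file may also carry an id column.
    if all(col in header for col in EDGE_KEYS):
        return list(EDGE_KEYS)
    if "id" in header:
        return ["id"]
    # Assumption: neither an edge file nor a node file, so nothing is protected.
    return []


print(default_key_columns(["node1", "label", "node2", "location", "years"]))
print(default_key_columns(["id", "name"]))
```

Passing --columns explicitly sidesteps this choice, which is what the node2 example in this page does.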

Examples

Normal Expansion

Suppose that examples/docs/expand-file1.tsv contains the following table in KGTK format (empty cells are left blank):

kgtk cat -i examples/docs/expand-file1.tsv

node1  label    node2  location         years
john   zipcode  12345  home             10
john   zipcode  12346
peter  zipcode  12040  home|cabin
peter  zipcode  12040  work             5|6
steve  zipcode  45601                   3|4|5
steve  zipcode  45601  home|work|cabin  1|2

Expand the | lists, keeping the default key columns for an edge file (node1, label, node2):

kgtk expand -i examples/docs/expand-file1.tsv

The output will be the following table in KGTK format:

node1  label    node2  location  years
john   zipcode  12345  home      10
john   zipcode  12346
peter  zipcode  12040  home
peter  zipcode  12040  cabin
peter  zipcode  12040  work      5
peter  zipcode  12040            6
steve  zipcode  45601            3
steve  zipcode  45601            4
steve  zipcode  45601            5
steve  zipcode  45601  home      1
steve  zipcode  45601  work      2
steve  zipcode  45601  cabin
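The output table can be checked by replaying the example with a small re-implementation. This is a sketch, not KGTK's code: `expand_tsv` is hypothetical, the input is the table above copied as tab-separated text, and node1, label, node2 are passed as key columns to match the edge-file default:

```python
import csv
import io
from typing import List, Optional, Sequence

# Tab-separated copy of examples/docs/expand-file1.tsv, as shown above.
FILE1 = (
    "node1\tlabel\tnode2\tlocation\tyears\n"
    "john\tzipcode\t12345\thome\t10\n"
    "john\tzipcode\t12346\t\t\n"
    "peter\tzipcode\t12040\thome|cabin\t\n"
    "peter\tzipcode\t12040\twork\t5|6\n"
    "steve\tzipcode\t45601\t\t3|4|5\n"
    "steve\tzipcode\t45601\thome|work|cabin\t1|2\n"
)


def expand_tsv(text: str, key_columns: Sequence[str]) -> List[List[str]]:
    """Return the header row followed by the expanded data rows (sketch)."""
    # QUOTE_NONE keeps any double quotes as part of the value.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [header]
    for values in reader:
        # None marks a key column; other cells become item lists (empty -> []).
        items: List[Optional[List[str]]] = [
            None if col in key_columns else (val.split("|") if val else [])
            for col, val in zip(header, values)
        ]
        # One output record per item of the longest list, and at least one.
        count = max([len(it) for it in items if it is not None] + [1])
        for i in range(count):
            rows.append([
                val if it is None else (it[i] if i < len(it) else "")
                for val, it in zip(values, items)
            ])
    return rows


for row in expand_tsv(FILE1, ["node1", "label", "node2"]):
    print("\t".join(row))
```

Because node2 is a key column here, 12040 and 45601 are repeated on every record, while the single location value work appears only on the first record produced from its input line.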

Expanding node2

Suppose you are importing an edge file (examples/docs/expand-file2.tsv) into KGTK format, with columns node1, label, and node2, but the node2 column contains | lists. KGTK File Format v2 prohibits | lists in the node1, label, and node2 columns of an edge file.

kgtk cat -i examples/docs/expand-file2.tsv

node1  label    node2        location         years
john   zipcode  12345|12346  home             10
peter  zipcode  12040        home|cabin
peter  zipcode  12040        work             5|6
steve  zipcode  45601                         3|4|5
steve  zipcode  45601        home|work|cabin  1|2
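To find the records that break the v2 rule, scan the node1, label, and node2 columns for |. The sketch below does that for the table above. `find_key_lists` is a hypothetical helper, and it treats every | as a list separator without handling escaped characters:

```python
import csv
import io
from typing import List, Sequence, Tuple

# Tab-separated copy of examples/docs/expand-file2.tsv, as shown above.
FILE2 = (
    "node1\tlabel\tnode2\tlocation\tyears\n"
    "john\tzipcode\t12345|12346\thome\t10\n"
    "peter\tzipcode\t12040\thome|cabin\t\n"
    "peter\tzipcode\t12040\twork\t5|6\n"
    "steve\tzipcode\t45601\t\t3|4|5\n"
    "steve\tzipcode\t45601\thome|work|cabin\t1|2\n"
)


def find_key_lists(
    text: str, columns: Sequence[str] = ("node1", "label", "node2")
) -> List[Tuple[int, str, str]]:
    """Return (line number, column, value) for every | list in `columns`."""
    # QUOTE_NONE keeps any double quotes as part of the value.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    positions = [(header.index(col), col) for col in columns if col in header]
    problems: List[Tuple[int, str, str]] = []
    # Line 1 is the header, so data lines start at 2.
    for line_no, values in enumerate(reader, start=2):
        for idx, col in positions:
            if "|" in values[idx]:
                problems.append((line_no, col, values[idx]))
    return problems


print(find_key_lists(FILE2))  # [(2, 'node2', '12345|12346')]
```

Only the john record is flagged; the | lists in location and years are allowed.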

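The following command expands the node2 values. Because --columns names only node1 and label as key columns, node2 is expanded like any other column, and the result is a valid KGTK edge file as long as the rest of the file is valid. The --mode=NONE option is not listed in the usage above; it is among the additional options shown by kgtk --expert expand --help.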

kgtk expand -i examples/docs/expand-file2.tsv --mode=NONE --columns node1 label

The output will be the following table in KGTK format:

node1  label    node2  location  years
john   zipcode  12345  home      10
john   zipcode  12346
peter  zipcode  12040  home
peter  zipcode         cabin
peter  zipcode  12040  work      5
peter  zipcode                   6
steve  zipcode  45601            3
steve  zipcode                   4
steve  zipcode                   5
steve  zipcode  45601  home      1
steve  zipcode         work      2
steve  zipcode         cabin
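A small Python re-implementation (the hypothetical `expand_tsv`, not KGTK's code) can replay this example with only node1 and label as key columns and confirm that no | lists remain in node1, label, or node2. It assumes tab-separated input and plain | splitting with no escape handling:

```python
import csv
import io
from typing import List, Optional, Sequence

# Tab-separated copy of examples/docs/expand-file2.tsv, as shown above.
FILE2 = (
    "node1\tlabel\tnode2\tlocation\tyears\n"
    "john\tzipcode\t12345|12346\thome\t10\n"
    "peter\tzipcode\t12040\thome|cabin\t\n"
    "peter\tzipcode\t12040\twork\t5|6\n"
    "steve\tzipcode\t45601\t\t3|4|5\n"
    "steve\tzipcode\t45601\thome|work|cabin\t1|2\n"
)


def expand_tsv(text: str, key_columns: Sequence[str]) -> List[List[str]]:
    """Return the header row followed by the expanded data rows (sketch)."""
    # QUOTE_NONE keeps any double quotes as part of the value.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [header]
    for values in reader:
        # None marks a key column; other cells become item lists (empty -> []).
        items: List[Optional[List[str]]] = [
            None if col in key_columns else (val.split("|") if val else [])
            for col, val in zip(header, values)
        ]
        # One output record per item of the longest list, and at least one.
        count = max([len(it) for it in items if it is not None] + [1])
        for i in range(count):
            rows.append([
                val if it is None else (it[i] if i < len(it) else "")
                for val, it in zip(values, items)
            ])
    return rows


out = expand_tsv(FILE2, ["node1", "label"])
for row in out:
    print("\t".join(row))
# node1, label, and node2 are the first three columns of this file.
print(any("|" in v for row in out[1:] for v in row[:3]))  # False
```

Because node2 is no longer a key column, a single node2 value such as 12040 appears only on the first record produced from its input line, which is why some output records have an empty node2.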