lift

Overview

The lift command copies its input file to its output file, adding label columns for values in the node1, label, and node2 fields. Options are available to control the columns being lifted, the source of the label values, and the destination column for the label values.

Memory Usage

By default, the input rows are saved in memory, as well as the value-to-label mapping. This will impose a limit on the size of the input files that can be processed.

Separating the labels from the edges being lifted, and presorting each of the files, enables operation with reduced memory requirements.

Usage

usage: kgtk lift [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
                 [--label-file INPUT_FILE]
                 [--unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE]
                 [--matched-label-output-file MATCHED_LABEL_OUTPUT_FILE]
                 [--unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE]
                 [--columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES [OUTPUT_LIFTED_COLUMN_NAMES ...]]]
                 [--default-value DEFAULT_VALUE]
                 [--suppress-empty-columns [True/False]]
                 [--ok-if-no-labels [True/False]]
                 [--prefilter-labels [True/False]]
                 [--input-file-is-presorted [True/False]]
                 [--label-file-is-presorted [True/False]]
                 [--clear-before-lift [CLEAR_BEFORE_LIFT]]
                 [--overwrite [OVERWRITE]]
                 [--output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]]
                 [--languages [LANGUAGE [LANGUAGE ...]]]
                 [--prioritize [True/False]] [--use-label-envar [True/False]]
                 [-v [optional True|False]]

Lift labels for a KGTK file. If called as "kgtk lift", for each of the items in the (node1, label, node2) columns, look for matching label records. If called as "kgtk add-labels", look for matching label records for all input columns. If found, lift the label values into additional columns in the current record. Label records are removed from the output unless --remove-label-records=False. 

Additional options are shown in expert help.
kgtk --expert lift --help

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The KGTK input file. (May be omitted or '-' for
                        stdin.)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --label-file INPUT_FILE
                        A KGTK file with label records (Optional, use '-' for
                        stdin.)
  --unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE
                        A KGTK output file that will contain only unmodified
                        rows. This file will have the same columns as the
                        input file. (Optional, use '-' for stdout.)
  --matched-label-output-file MATCHED_LABEL_OUTPUT_FILE
                        A KGTK output file that will contain matched label
                        edges. This file will have the same columns as the
                        source of the labels, either the input file or the
                        label file. (Optional, use '-' for stdout.)
  --unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE
                        A KGTK output file that will contain unmatched label
                        edges. This file will have the same columns as the
                        source of the labels, either the input file or the
                        label file. (Optional, use '-' for stdout.)
  --columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES [OUTPUT_LIFTED_COLUMN_NAMES ...]]
                        The columns into which to store the lifted values. The
                        default is [node1;label, label;label, node2;label] or
                        their aliases.
  --default-value DEFAULT_VALUE
                        The value to use if a lifted label is not found.
                        (default=)
  --suppress-empty-columns [True/False]
                        If true, do not create new columns that would be
                        empty. (default=False).
  --ok-if-no-labels [True/False]
                        If true, do not abort if no labels were found.
                        (default=False).
  --prefilter-labels [True/False]
                        If true, read the input file before reading the label
                        file. (default=False).
  --input-file-is-presorted [True/False]
                        If true, the input file is presorted on the column for
                        which values are to be lifted. (default=False).
  --label-file-is-presorted [True/False]
                        If true, the label file is presorted on the node1
                        column. (default=False).
  --clear-before-lift [CLEAR_BEFORE_LIFT]
                        If true, set columns to write to the default value
                        before lifting. (default=False).
  --overwrite [OVERWRITE]
                        If true, overwrite non-default values in the columns
                        to write. If false, do not overwrite non-default
                        values in the columns to write. (default=True).
  --output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]
                        If true, output only modified edges to the primary
                        output stream. (default=False).
  --languages [LANGUAGE [LANGUAGE ...]]
                        Lift only labels with a matching language qualifier.
                        ANY means any language qualifier. NONE means no
                        language qualifier. (default=ANY NONE)
  --prioritize [True/False]
                        If true and filtering labels by language, pick only
                        the label matching the language that appears before
                        other matches in the language list. (default=False).
  --use-label-envar [True/False]
                        If true, use the KGTK_LABEL_FILE envar for the label
                        file if no --label-file. (default=False).

  -v [optional True|False], --verbose [optional True|False]
                        Print additional progress messages (default=False).

Examples

Sample Data

Suppose that lift-file1.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file1.tsv
node1 label node2
Q1 P1 Q5
Q1 P2 Q6
Q1 label "Elmo"
Q2 label "Alice"
P1 label "instance of"
P2 label "friend"
Q5 label "human"
Q6 P1 Q5
Q6 label "Fred"

Default Lift

kgtk lift --input-file examples/docs/lift-file1.tsv

The output will be the following table in KGTK format:

node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 "Elmo" "instance of" "human"
Q1 P2 Q6 "Elmo" "friend" "Fred"
Q6 P1 Q5 "Fred" "instance of" "human"

kgtk lift has moved the labels into additional columns and removed the label edges from the output file.
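
The default lift can be pictured as a dictionary lookup. The following Python sketch is only an illustration of that idea, not KGTK's implementation: it assumes the edges have already been parsed into (node1, label, node2) tuples, and the helper name lift_rows is hypothetical. It uses the rows of lift-file1.tsv shown above.

```python
# Illustrative sketch only; lift_rows is a hypothetical helper, not a KGTK API.
rows = [
    ("Q1", "P1", "Q5"),
    ("Q1", "P2", "Q6"),
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
    ("Q5", "label", '"human"'),
    ("Q6", "P1", "Q5"),
    ("Q6", "label", '"Fred"'),
]

def lift_rows(rows, default=""):
    # Build the value-to-label mapping from the label edges.
    labels = {n1: n2 for n1, lab, n2 in rows if lab == "label"}
    lifted = []
    for n1, lab, n2 in rows:
        if lab == "label":
            continue  # label edges are dropped from the output
        # Append node1;label, label;label, and node2;label.
        lifted.append((n1, lab, n2,
                       labels.get(n1, default),
                       labels.get(lab, default),
                       labels.get(n2, default)))
    return lifted

for row in lift_rows(rows):
    print("\t".join(row))
```

Both the rows and the mapping are held in memory here, which mirrors the default memory behavior described under Memory Usage.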

Multiple Labels

By default, kgtk lift will build a list of labels if multiple label records are found for a property. The labels in the list will be sorted and deduplicated.

Suppose that lift-file4.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file4.tsv
node1 label node2
Q1 P1 Q5
Q1 P2 Q6
Q1 label "Elmo"
Q2 label "Alice"
P1 label "instance of"
P2 label "friend"
P2 label "amigo"
Q5 label "human"
Q5 label "homo sapiens"
Q5 label "human"
Q6 P1 Q5
Q6 label "Fred"

Lift this file with no additional arguments:

kgtk lift --input-file examples/docs/lift-file4.tsv
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 "Elmo" "instance of" "homo sapiens"|"human"
Q1 P2 Q6 "Elmo" "amigo"|"friend" "Fred"
Q6 P1 Q5 "Fred" "instance of" "homo sapiens"|"human"
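
The list-building step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: build_label_map is a hypothetical name, plain string ordering stands in for whatever ordering KGTK applies, and "|" is used as the list separator as in the output above.

```python
from collections import defaultdict

# Illustrative sketch only; build_label_map is a hypothetical helper.
def build_label_map(label_edges):
    collected = defaultdict(set)  # a set removes duplicate labels
    for node1, _label, node2 in label_edges:
        collected[node1].add(node2)
    # Sort each label set and join it into a "|"-separated list.
    return {key: "|".join(sorted(values)) for key, values in collected.items()}

label_edges = [
    ("P2", "label", '"friend"'),
    ("P2", "label", '"amigo"'),
    ("Q5", "label", '"human"'),
    ("Q5", "label", '"homo sapiens"'),
    ("Q5", "label", '"human"'),
]
label_map = build_label_map(label_edges)
print(label_map["P2"])
print(label_map["Q5"])
```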

Lifting Specific Columns

Lift this file, lifting just the node1 column:

kgtk lift --input-file examples/docs/lift-file4.tsv \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1 label node2 node1;label
Q1 P1 Q5 "Elmo"
Q1 P2 Q6 "Elmo"
Q6 P1 Q5 "Fred"

Separate Input Files

The labels may be in a separate file from the input. If --suppress-empty-columns is False (its default), then the input file may be processed in a single pass without keeping a copy in memory. The labels will still be loaded into an in-memory dictionary.

Suppose that lift-file5.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file5.tsv
node1 label node2
Q1 P1 Q5
Q1 P2 Q6
Q6 P1 Q5

And lift-file6.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file6.tsv
node1 label node2
Q1 label "Elmo"
Q2 label "Alice"
Q5 label "human"
Q6 label "Fred"
P1 label "instance of"
P2 label "friend"

Lift the node1 column, taking the labels from the separate label file:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file6.tsv \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1 label node2 node1;label
Q1 P1 Q5 "Elmo"
Q1 P2 Q6 "Elmo"
Q6 P1 Q5 "Fred"

Presorted Input Files

If the labels are in a separate file from the input rows, the labels are sorted on the node1 column, only a single column will be lifted from the input rows, the input file is sorted on that column, and --suppress-empty-columns is False (its default), then the data may be processed using a merge algorithm that does not use in-memory buffering. This is useful if the input and label files are both very large.

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --input-file-is-presorted \
          --label-file examples/docs/lift-file6.tsv \
          --label-file-is-presorted \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1 label node2 node1;label
Q1 P1 Q5 "Elmo"
Q1 P2 Q6 "Elmo"
Q6 P1 Q5 "Fred"

Small Input, Many Labels

If the label file is very large but not sorted, and the input file is small enough to fit in memory, then one alternate approach is to use --prefilter-labels. This causes the input file to be read into memory first, then the values that need labels are extracted from it. Next, the label file is read, filtering out unneeded labels and keeping only needed labels in memory. Finally, the output file is generated from the in-memory copy of the input file and the labels. Multiple columns may be lifted in a single pass with this approach.

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file6.tsv \
          --prefilter-labels
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 "Elmo" "instance of" "human"
Q1 P2 Q6 "Elmo" "friend" "Fred"
Q6 P1 Q5 "Fred" "instance of" "human"
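
The prefiltering order of operations described above can be sketched in Python. This is an illustration under stated assumptions, not KGTK's code: prefilter_lift is a hypothetical name, rows are pre-parsed tuples, and each value is assumed to have at most one label.

```python
# Illustrative sketch only; prefilter_lift is a hypothetical helper.
def prefilter_lift(input_rows, label_stream):
    rows = list(input_rows)  # 1. the input file is held in memory
    # 2. Collect every value that needs a label.
    needed = {value for row in rows for value in row[:3]}
    labels = {}
    for node1, label, node2 in label_stream:  # 3. stream the label file once
        if label == "label" and node1 in needed:
            labels[node1] = node2  # keep only the labels that are needed
    # 4. Generate the output from the in-memory rows and labels.
    lifted = [row + tuple(labels.get(v, "") for v in row[:3]) for row in rows]
    return lifted, labels

input_rows = [("Q1", "P1", "Q5"), ("Q1", "P2", "Q6"), ("Q6", "P1", "Q5")]
label_rows = [
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("Q5", "label", '"human"'),
    ("Q6", "label", '"Fred"'),
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
]
lifted, kept_labels = prefilter_lift(input_rows, iter(label_rows))
```

In this sketch the label for Q2 is never stored, because Q2 does not appear in the input rows.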

Duplicate Labels

Suppose that lift-file7.tsv contains the following table in KGTK format, which is sorted on the node1 column:

kgtk cat --input-file examples/docs/lift-file7.tsv
node1 label node2
P1 label "instance of"
P2 label "friend"
Q1 label "Elmo"
Q2 label "Alice"
Q5 label "human"
Q6 label "Fred"
Q6 label "Wilma"
Q6 label "Wilma"

Lift the duplicate labels, using the presorted options:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --input-file-is-presorted \
          --label-file examples/docs/lift-file7.tsv \
          --label-file-is-presorted \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1 label node2 node1;label
Q1 P1 Q5 "Elmo"
Q1 P2 Q6 "Elmo"
Q6 P1 Q5 "Fred"|"Wilma"
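
A merge over two presorted streams, as used conceptually by the presorted options, might look like the Python sketch below. It is a generic merge join written for illustration, not KGTK's implementation; merge_lift is a hypothetical name, and both streams are assumed to be sorted on their key with plain string ordering.

```python
# Illustrative sketch only; merge_lift is a hypothetical helper.
def merge_lift(input_rows, label_rows, key_index=0):
    """Yield input rows with one lifted column, reading each stream once."""
    label_iter = iter(label_rows)
    pending = next(label_iter, None)  # next unread label edge
    current_key, current_labels = None, []
    for row in input_rows:
        key = row[key_index]
        if key != current_key:
            # Skip label edges that sort before this key.
            while pending is not None and pending[0] < key:
                pending = next(label_iter, None)
            # Collect every label for this key, dropping duplicates.
            current_key, current_labels = key, []
            while pending is not None and pending[0] == key:
                if pending[2] not in current_labels:
                    current_labels.append(pending[2])
                pending = next(label_iter, None)
        yield row + ("|".join(current_labels),)

input_rows = [("Q1", "P1", "Q5"), ("Q1", "P2", "Q6"), ("Q6", "P1", "Q5")]
label_rows = [
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("Q5", "label", '"human"'),
    ("Q6", "label", '"Fred"'),
    ("Q6", "label", '"Wilma"'),
    ("Q6", "label", '"Wilma"'),
]
merged = list(merge_lift(input_rows, label_rows))
```

Only the labels for the current key are held at any time, which is why this approach suits very large input and label files.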

More Sample Data

Suppose that lift-file8.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file8.tsv
node1 label node2 confident
Q1 P1 Q5 True
Q1 P2 Q6 True
Q2 P1 Q5 False
Q2 P2 Q6 False

and suppose that lift-file9.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file9.tsv
node1 label node2 full-name
P1 label "instance of"
P2 label "friend"
P3 label "enemy"
Q1 name "Elmo" "Elmo Fudd"
Q2 name "Alice" "Alice Cooper"
Q5 species "human"
Q6 name "Fred" "Fred Rogers"

Default Lift, Separate Label File

Let's start with a default lift with the separate label file:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "instance of"
Q1 P2 Q6 True "friend"
Q2 P1 Q5 False "instance of"
Q2 P2 Q6 False "friend"

Lift a Single Property

Now, let's lift the name property (label column value):

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "Elmo"
Q1 P2 Q6 True "Elmo" "Fred"
Q2 P1 Q5 False "Alice"
Q2 P2 Q6 False "Alice" "Fred"

Lift with a Column Name Suffix

Now, let's lift the name property, using ";name" as the column name suffix:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-suffix ";name"
node1 label node2 confident node1;name label;name node2;name
Q1 P1 Q5 True "Elmo"
Q1 P2 Q6 True "Elmo" "Fred"
Q2 P1 Q5 False "Alice"
Q2 P2 Q6 False "Alice" "Fred"

Note

The ";name" argument needs to be quoted on the command line, since ; is a shell metacharacter.

Lift from a Specific Column

Let's lift the full names column. The --lift-from option (also known as the label-value-column option) allows us to lift from a column other than the default, node2:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "Elmo Fudd"
Q1 P2 Q6 True "Elmo Fudd" "Fred Rogers"
Q2 P1 Q5 False "Alice Cooper"
Q2 P2 Q6 False "Alice Cooper" "Fred Rogers"

Lift from a Specific Column with a Column Name Suffix

Let's lift the full names again, this time using ";full-name" as the column name suffix instead of "label".

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name"
node1 label node2 confident node1;full-name label;full-name node2;full-name
Q1 P1 Q5 True "Elmo Fudd"
Q1 P2 Q6 True "Elmo Fudd" "Fred Rogers"
Q2 P1 Q5 False "Alice Cooper"
Q2 P2 Q6 False "Alice Cooper" "Fred Rogers"

Note

The ;full-name needs to be quoted on the command line, since ; is a shell metacharacter.

Outputting Only Modified Rows

Let's output only modified rows. We will start by outputting all rows:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2
node1 label node2 confident node2;full-name
Q1 P1 Q5 True
Q1 P2 Q6 True "Fred Rogers"
Q2 P1 Q5 False
Q2 P2 Q6 False "Fred Rogers"

Next, we will output only the modified rows:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows
node1 label node2 confident node2;full-name
Q1 P2 Q6 True "Fred Rogers"
Q2 P2 Q6 False "Fred Rogers"

Unmodified Row Output File

Suppose we want to isolate the unmodified rows for further processing. We can send them to the unmodified row output file.

We will send only the modified rows to the primary output stream by using --output-only-modified-rows.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --unmodified-row-output-file lift-unmodified-rows.tsv
node1 label node2 confident node2;full-name
Q1 P2 Q6 True "Fred Rogers"
Q2 P2 Q6 False "Fred Rogers"

Here are the unmodified rows:

kgtk cat -i lift-unmodified-rows.tsv
node1 label node2 confident
Q1 P1 Q5 True
Q2 P1 Q5 False

Note

The unmodified row output file has the same columns as the primary input file. In this example, it does not have the node2;full-name column that was added to the primary output file.
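
The split between the primary output and the unmodified row output can be sketched in Python. This is an illustration only: split_rows is a hypothetical name, and the full_names dictionary stands for the full-name values of the name edges in lift-file9.tsv.

```python
# Illustrative sketch only; split_rows is a hypothetical helper.
def split_rows(rows, labels, column=2):
    modified, unmodified = [], []
    for row in rows:
        value = labels.get(row[column])
        if value is None:
            unmodified.append(row)  # keeps the input file's columns
        else:
            modified.append(row + (value,))  # gains the lifted column
    return modified, unmodified

rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
full_names = {"Q1": '"Elmo Fudd"', "Q2": '"Alice Cooper"', "Q6": '"Fred Rogers"'}
modified, unmodified = split_rows(rows, full_names)
```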

Matched Label Output File

Suppose we are interested in finding which label file edges were matched with input file edges during the lift. The --matched-label-output-file OUTPUT_FILE option provides a simple way to get this list.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --matched-label-output-file lift-matched-labels.tsv
node1 label node2 confident node2;full-name
Q1 P2 Q6 True "Fred Rogers"
Q2 P2 Q6 False "Fred Rogers"

Here are the matched labels:

kgtk cat -i lift-matched-labels.tsv
node1 label node2 full-name
Q6 name "Fred" "Fred Rogers"

Note

The matched label output file has the same columns as the label file when a label file has been specified. Otherwise, the matched label file has the same columns as the primary input file.

Note

The complementary --unmatched-label-output-file OUTPUT_FILE option writes the label edges that were not matched during the lift.

Note

It may be useful if the matched label output file had an additional column with a count of the number of matches. This option may be added in the future.

Unmatched Label Output File

Suppose we are interested in finding which label file edges were not matched with input file edges during the lift. The --unmatched-label-output-file OUTPUT_FILE option provides a simple way to get this list.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --unmatched-label-output-file lift-unmatched-labels.tsv
node1 label node2 confident node2;full-name
Q1 P2 Q6 True "Fred Rogers"
Q2 P2 Q6 False "Fred Rogers"

Here are the unmatched labels:

kgtk cat -i lift-unmatched-labels.tsv
node1 label node2 full-name
Q1 name "Elmo" "Elmo Fudd"
Q2 name "Alice" "Alice Cooper"

Note

The unmatched label output file has the same columns as the label file when a label file has been specified. Otherwise, the unmatched label file has the same columns as the primary input file.
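
The matched and unmatched label outputs can be understood as a partition of the candidate label edges. The Python sketch below illustrates this for the example above; partition_labels is a hypothetical name, and the sketch assumes, based on the outputs shown, that edges whose label is not the selected property appear in neither file.

```python
# Illustrative sketch only; partition_labels is a hypothetical helper.
def partition_labels(input_rows, label_rows, prop="name", column=2):
    wanted = {row[column] for row in input_rows}  # values being lifted
    matched, unmatched = [], []
    for edge in label_rows:
        if edge[1] != prop:
            continue  # not a candidate label edge for this property
        if edge[0] in wanted:
            matched.append(edge)
        else:
            unmatched.append(edge)
    return matched, unmatched

input_rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
label_rows = [
    ("P1", "label", '"instance of"', ""),
    ("P2", "label", '"friend"', ""),
    ("P3", "label", '"enemy"', ""),
    ("Q1", "name", '"Elmo"', '"Elmo Fudd"'),
    ("Q2", "name", '"Alice"', '"Alice Cooper"'),
    ("Q5", "species", '"human"', ""),
    ("Q6", "name", '"Fred"', '"Fred Rogers"'),
]
matched, unmatched = partition_labels(input_rows, label_rows)
```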

Lifting Labels in a Specific Language

Suppose lift-file11.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file11.tsv
node1 label node2
Q1 label 'Elmo'@en
Q2 label 'Alice'@en
Q5 label "human"
Q6 label 'Frances'@fr
P1 label "instance of"
P2 label "friend"

Lift only labels that are qualified as English, ignoring labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --language en
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en
Q6 P1 Q5

Lifting Labels in Multiple Languages

Lift only labels that are qualified as English or French, ignoring labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages en fr
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en 'Frances'@fr
Q6 P1 Q5 'Frances'@fr

Lifting Labels Qualified with Any Language

Lift only labels that have a language qualifier, in any language, ignoring labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --language ANY
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en 'Frances'@fr
Q6 P1 Q5 'Frances'@fr

Lifting Labels that Are Not Language Qualified

Lift only labels that do not have a language qualifier, ignoring labels that are language qualified:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --language NONE
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 "instance of" "human"
Q1 P2 Q6 "friend"
Q6 P1 Q5 "instance of" "human"

Lifting Labels in a Specific Language or Without Language Qualification

Lift labels that are qualified as English along with labels that have no language qualifier:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --language en NONE
node1 label node2 node1;label label;label node2;label
Q1 P1 Q5 'Elmo'@en "instance of" "human"
Q1 P2 Q6 'Elmo'@en "friend"
Q6 P1 Q5 "instance of" "human"
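
The language selection rules (a specific language, ANY, and NONE) can be summarized in a short Python sketch. It is an illustration, not KGTK's parser: language_of and keep_label are hypothetical names, language-qualified strings are assumed to look like 'text'@lang as in lift-file11.tsv, and languages are compared by exact match.

```python
# Illustrative sketch only; language_of and keep_label are hypothetical helpers.
def language_of(value):
    # 'text'@lang carries a language qualifier; "text" does not.
    if value.startswith("'") and "'@" in value:
        return value.rsplit("'@", 1)[1]
    return None

def keep_label(value, languages):
    lang = language_of(value)
    if lang is None:
        return "NONE" in languages  # unqualified labels need NONE
    return "ANY" in languages or lang in languages

labels = ["'Elmo'@en", "'Frances'@fr", '"human"']
english_only = [v for v in labels if keep_label(v, ["en"])]
any_language = [v for v in labels if keep_label(v, ["ANY"])]
unqualified = [v for v in labels if keep_label(v, ["NONE"])]
english_or_none = [v for v in labels if keep_label(v, ["en", "NONE"])]
```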

Lift Labels with Prioritized Languages

Suppose lift-file12.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file12.tsv
node1 label node2
Q1 label 'Elmo'@en
Q1 label 'Sr Elmo'@es
Q2 label 'Alice'@en
Q2 label 'Alicia'@es
Q5 label "human"
Q6 label 'Frank'@en
Q6 label 'Frances'@fr
Q6 label 'Francisco'@es
P1 label "instance of"
P2 label "friend"

Lift only labels that are qualified as English, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --language en
node1 label node2 node1;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en 'Frank'@en
Q6 P1 Q5 'Frank'@en

Lift only labels that are qualified as Spanish, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --language es
node1 label node2 node1;label node2;label
Q1 P1 Q5 'Sr Elmo'@es
Q1 P2 Q6 'Sr Elmo'@es 'Francisco'@es
Q6 P1 Q5 'Francisco'@es

Lift only labels that are qualified as English or Spanish, preferring English labels, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages en es
node1 label node2 node1;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en 'Frank'@en
Q6 P1 Q5 'Frank'@en

Lift only labels that are qualified as English or Spanish, preferring Spanish labels, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages es en
node1 label node2 node1;label node2;label
Q1 P1 Q5 'Sr Elmo'@es
Q1 P2 Q6 'Sr Elmo'@es 'Francisco'@es
Q6 P1 Q5 'Francisco'@es

Lift only labels that are qualified as French, English, or Spanish, preferring the labels in that order, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages fr en es
node1 label node2 node1;label node2;label
Q1 P1 Q5 'Elmo'@en
Q1 P2 Q6 'Elmo'@en 'Frances'@fr
Q6 P1 Q5 'Frances'@fr
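
Language prioritization, as shown in the examples above, amounts to keeping only the labels for the first language in the list that has a match. The Python sketch below illustrates that rule; prioritized is a hypothetical name, and labels are assumed to be language-qualified strings of the form 'text'@lang.

```python
# Illustrative sketch only; prioritized is a hypothetical helper.
def prioritized(labels, languages):
    for lang in languages:  # earlier languages take priority
        hits = [v for v in labels if v.endswith("'@" + lang)]
        if hits:
            return hits  # stop at the first language with a match
    return []

q1_labels = ["'Elmo'@en", "'Sr Elmo'@es"]
q6_labels = ["'Frank'@en", "'Frances'@fr", "'Francisco'@es"]

print(prioritized(q1_labels, ["fr", "en", "es"]))
print(prioritized(q6_labels, ["fr", "en", "es"]))
```

Q1 has no French label, so the sketch falls through to English for Q1 while Q6 gets its French label, which agrees with the last table above.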

Expert Example: Input Filtering

Let's lift the full names only when we are confident in the relationship. The expert options --input-select-column INPUT_SELECT_COLUMN_NAME and --input-select-value INPUT_SELECT_COLUMN_VALUE provide a built-in filter operation.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          -p name \
          --label-value-column full-name \
          --input-select-column confident \
          --input-select-value True
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "Elmo Fudd"
Q1 P2 Q6 True "Elmo Fudd" "Fred Rogers"
Q2 P1 Q5 False
Q2 P2 Q6 False

Expert Example: Lifting into node2

Let's lift full names into the node2 column, replacing the existing values there. We can do this by specifying --columns-to-lift node2 and giving an empty --lift-suffix.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix ""
node1 label node2 confident
Q1 P1 Q5 True
Q1 P2 "Fred Rogers" True
Q2 P1 Q5 False
Q2 P2 "Fred Rogers" False

Note

--lift-suffix "" uses shell quotes to specify an empty value. --lift-suffix= is another way to specify the empty lift suffix, and does not require shell quoting.

Note

This procedure, repeated for the node1, label, and node2 columns, can be used to transform relationships from one knowledge base system to another.
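
With an empty lift suffix, the destination column is the same as the source column, so the lift becomes an in-place substitution. The following Python sketch illustrates that effect for the node2 column; lift_in_place is a hypothetical name, and the full_names dictionary stands for the full-name values of the name edges in lift-file9.tsv.

```python
# Illustrative sketch only; lift_in_place is a hypothetical helper.
def lift_in_place(rows, labels, column=2):
    out = []
    for row in rows:
        cells = list(row)
        # Replace the value when a label exists; otherwise leave it alone.
        cells[column] = labels.get(cells[column], cells[column])
        out.append(tuple(cells))
    return out

rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
full_names = {"Q1": '"Elmo Fudd"', "Q2": '"Alice Cooper"', "Q6": '"Fred Rogers"'}
replaced = lift_in_place(rows, full_names)
```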

Expert Example: Lifting into node2, Outputting Only Modified Rows

Let's lift full names into the node2 column, replacing the existing values there. We can do this by specifying --columns-to-lift node2 and giving an empty --lift-suffix. We will output only modified rows.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix= \
          --output-only-modified-rows
node1 label node2 confident
Q1 P2 "Fred Rogers" True
Q2 P2 "Fred Rogers" False

Expert Example: Update Lifted Relationships

Let's lift full names into the node2 column, changing the label of the relationship when we do so.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix "" \
          --update-select-value FullName
node1 label node2 confident
Q1 P1 Q5 True
Q1 FullName "Fred Rogers" True
Q2 P1 Q5 False
Q2 FullName "Fred Rogers" False

Expert Example: Overriding the Label Match and Value Columns

Consider the following file, lift-file10.tsv, which is like lift-file9.tsv, but with the node1 and node2 columns swapped and with an additional column, action:

kgtk cat --input-file examples/docs/lift-file10.tsv
node1 label node2 full-name action
"instance of" label P1 go
"friend" label P2 go
"enemy" label P3 go
"Elmo" name Q1 "Elmo Fudd" go
"Alice" name Q2 "Alice Cooper" go
"human" species Q5 go
"Fred" name Q6 "Fred Rogers" go

Let's lift full names from this file. We'll swap the function of the node1 and node2 columns in the label file:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --label-match-column node2 \
          --label-value-column node1
node1 label node2 confident node2;label
Q1 P1 Q5 True
Q1 P2 Q6 True "Fred"
Q2 P1 Q5 False
Q2 P2 Q6 False "Fred"

Expert Example: Selecting the Labels to Lift

Let's pick up all labels using the action column's go value to select the labels that we pick:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --label-select-column action \
          --label-select-value go \
          --label-match-column node2 \
          --label-value-column node1
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "Elmo" "instance of" "human"
Q1 P2 Q6 True "Elmo" "friend" "Fred"
Q2 P1 Q5 False "Alice" "instance of" "human"
Q2 P2 Q6 False "Alice" "friend" "Fred"

If we hadn't filtered the labels, the output would have looked like this:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --label-match-column node2 \
          --label-value-column node1
node1 label node2 confident node1;label label;label node2;label
Q1 P1 Q5 True "instance of"
Q1 P2 Q6 True "friend"
Q2 P1 Q5 False "instance of"
Q2 P2 Q6 False "friend"