lift

Overview¶

The lift command copies its input file to its output file, adding label columns for values in the node1, label, and node2 fields. Options are available to control the columns being lifted, the source of the label values, and the destination column for the label values.

Memory Usage¶

By default, the input rows are saved in memory, as well as the value-to-label mapping. This will impose a limit on the size of the input files that can be processed.

Separating the labels from the edges being lifted, and presorting each of the files, enables operation with reduced memory requirements.

Usage¶

usage: kgtk lift [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
                 [--label-file INPUT_FILE]
                 [--unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE]
                 [--matched-label-output-file MATCHED_LABEL_OUTPUT_FILE]
                 [--unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE]
                 [--columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES ...]]
                 [--default-value DEFAULT_VALUE]
                 [--suppress-empty-columns [True/False]]
                 [--ok-if-no-labels [True/False]]
                 [--prefilter-labels [True/False]]
                 [--input-file-is-presorted [True/False]]
                 [--label-file-is-presorted [True/False]]
                 [--clear-before-lift [CLEAR_BEFORE_LIFT]]
                 [--overwrite [OVERWRITE]]
                 [--output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]]
                 [--languages [LANGUAGE ...]] [--prioritize [True/False]]
                 [--use-label-envar [True/False]] [-v [optional True|False]]

Lift labels for a KGTK file. If called as "kgtk lift", for each of the items in the (node1, label, node2) columns, look for matching label records. If called as "kgtk add-labels", look for matching label records for all input columns. If found, lift the label values into additional columns in the current record. Label records are removed from the output unless --remove-label-records=False. 

Additional options are shown in expert help.
kgtk --expert lift --help

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The KGTK input file. (May be omitted or '-' for
                        stdin.)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --label-file INPUT_FILE
                        A KGTK file with label records (Optional, use '-' for
                        stdin.)
  --unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE
                        A KGTK output file that will contain only unmodified
                        rows. This file will have the same columns as the
                        input file. (Optional, use '-' for stdout.)
  --matched-label-output-file MATCHED_LABEL_OUTPUT_FILE
                        A KGTK output file that will contain matched label
                        edges. This file will have the same columns as the
                        source of the labels, either the input file or the
                        label file. (Optional, use '-' for stdout.)
  --unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE
                        A KGTK output file that will contain unmatched label
                        edges. This file will have the same columns as the
                        source of the labels, either the input file or the
                        label file. (Optional, use '-' for stdout.)
  --columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES ...]
                        The columns into which to store the lifted values. The
                        default is [node1;label, label;label, node2;label] or
                        their aliases.
  --default-value DEFAULT_VALUE
                        The value to use if a lifted label is not found.
                        (default=)
  --suppress-empty-columns [True/False]
                        If true, do not create new columns that would be
                        empty. (default=False).
  --ok-if-no-labels [True/False]
                        If true, do not abort if no labels were found.
                        (default=False).
  --prefilter-labels [True/False]
                        If true, read the input file before reading the label
                        file. (default=False).
  --input-file-is-presorted [True/False]
                        If true, the input file is presorted on the column for
                        which values are to be lifted. (default=False).
  --label-file-is-presorted [True/False]
                        If true, the label file is presorted on the node1
                        column. (default=False).
  --clear-before-lift [CLEAR_BEFORE_LIFT]
                        If true, set columns to write to the default value
                        before lifting. (default=False).
  --overwrite [OVERWRITE]
                        If true, overwrite non-default values in the columns
                        to write. If false, do not overwrite non-default
                        values in the columns to write. (default=True).
  --output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]
                        If true, output only modified edges to the primary
                        output stream. (default=False).
  --languages [LANGUAGE ...]
                        Lift only labels with a matching language qualifier.
                        ANY means any language qualifier. NONE means no
                        language qualifier. (default=ANY NONE)
  --prioritize [True/False]
                        If true and filtering labels by language, pick only
                        the label matching the language that appears before
                        other matches in the language list. (default=False).
  --use-label-envar [True/False]
                        If true, use the KGTK_LABEL_FILE envar for the label
                        file if no --label-file. (default=False).

  -v [optional True|False], --verbose [optional True|False]
                        Print additional progress messages (default=False).

Examples¶

Sample Data¶

Suppose that lift-file1.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file1.tsv

node1	label	node2
Q1	P1	Q5
Q1	P2	Q6
Q1	label	"Elmo"
Q2	label	"Alice"
P1	label	"instance of"
P2	label	"friend"
Q5	label	"human"
Q6	P1	Q5
Q6	label	"Fred"

Default Lift¶

kgtk lift --input-file examples/docs/lift-file1.tsv

The output will be the following table in KGTK format:

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	"Elmo"	"instance of"	"human"
Q1	P2	Q6	"Elmo"	"friend"	"Fred"
Q6	P1	Q5	"Fred"	"instance of"	"human"

kgtk lift has moved the labels into additional columns and removed the label edges from the output file.
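The behavior above can be approximated in a few lines of Python. This is only an illustrative sketch, not KGTK's implementation: the function name `lift` and its `columns` parameter are hypothetical, and it assumes at most one label per value.

```python
def lift(rows, columns=("node1", "label", "node2")):
    """Sketch of a default lift over (node1, label, node2) tuples.

    Assumes at most one label per value; not KGTK's actual implementation.
    """
    position = {"node1": 0, "label": 1, "node2": 2}
    labels = {}   # value -> label text, taken from the "label" edges
    edges = []    # every other edge, kept in input order
    for row in rows:
        if row[1] == "label":
            labels[row[0]] = row[2]
        else:
            edges.append(row)
    header = ["node1", "label", "node2"] + [c + ";label" for c in columns]
    lifted = [
        list(edge) + [labels.get(edge[position[c]], "") for c in columns]
        for edge in edges
    ]
    return header, lifted


# The rows of lift-file1.tsv from the sample data above.
rows = [
    ("Q1", "P1", "Q5"),
    ("Q1", "P2", "Q6"),
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
    ("Q5", "label", '"human"'),
    ("Q6", "P1", "Q5"),
    ("Q6", "label", '"Fred"'),
]
header, lifted = lift(rows)
for line in [header] + lifted:
    print("\t".join(line))
```

Note that both the edges and the label map are held in memory here, which mirrors the default memory behavior described under Memory Usage.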

Multiple Labels¶

By default, kgtk lift will build a list of labels if multiple label records are found for the same value (as for P2 and Q5 in the file below). The labels in the list will be sorted and deduplicated.

Suppose that lift-file4.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file4.tsv

node1	label	node2
Q1	P1	Q5
Q1	P2	Q6
Q1	label	"Elmo"
Q2	label	"Alice"
P1	label	"instance of"
P2	label	"friend"
P2	label	"amigo"
Q5	label	"human"
Q5	label	"homo sapiens"
Q5	label	"human"
Q6	P1	Q5
Q6	label	"Fred"

Lift this file with no additional arguments:

kgtk lift --input-file examples/docs/lift-file4.tsv

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	"Elmo"	"instance of"	"homo sapiens"|"human"
Q1	P2	Q6	"Elmo"	"amigo"|"friend"	"Fred"
Q6	P1	Q5	"Fred"	"instance of"	"homo sapiens"|"human"
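The way several labels collapse into one cell can be sketched as follows. The helper name `combine_labels` is hypothetical, and plain Python string ordering is assumed for the sort; the labels are joined with the KGTK list separator, a vertical bar.

```python
def combine_labels(values):
    """Deduplicate, sort, and join label values into a single list cell."""
    return "|".join(sorted(set(values)))


# Q5 has three label records in lift-file4.tsv, two of them identical.
print(combine_labels(['"human"', '"homo sapiens"', '"human"']))
# P2 has two distinct label records.
print(combine_labels(['"friend"', '"amigo"']))
```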

Lifting Specific Columns¶

Lift this file, lifting just the node1 column:

kgtk lift --input-file examples/docs/lift-file4.tsv \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1	label	node2	node1;label
Q1	P1	Q5	"Elmo"
Q1	P2	Q6	"Elmo"
Q6	P1	Q5	"Fred"

Separate Input Files¶

The labels may be in a separate file from the input. If --suppress-empty-columns is False (its default), then the input file may be processed in a single pass without keeping a copy in memory. The labels will still be loaded into an in-memory dictionary.

Suppose that lift-file5.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file5.tsv

node1	label	node2
Q1	P1	Q5
Q1	P2	Q6
Q6	P1	Q5

And lift-file6.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file6.tsv

node1	label	node2
Q1	label	"Elmo"
Q2	label	"Alice"
Q5	label	"human"
Q6	label	"Fred"
P1	label	"instance of"
P2	label	"friend"

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file6.tsv \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1	label	node2	node1;label
Q1	P1	Q5	"Elmo"
Q1	P2	Q6	"Elmo"
Q6	P1	Q5	"Fred"

Presorted Input Files¶

If the labels are in a separate file from the input rows, the labels are sorted on the node1 column, only a single column will be lifted from the input rows, the input file is sorted on that column, and --suppress-empty-columns is False (its default), then the data may be processed using a merge algorithm that does not use in-memory buffering. This is useful if the input and label files are both very large.

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --input-file-is-presorted \
          --label-file examples/docs/lift-file6.tsv \
          --label-file-is-presorted \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1	label	node2	node1;label
Q1	P1	Q5	"Elmo"
Q1	P2	Q6	"Elmo"
Q6	P1	Q5	"Fred"
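A merge of two presorted streams can be sketched as below. This is a minimal illustration under stated assumptions, not KGTK's code: `merge_lift` is a hypothetical name, both streams are assumed to be sorted on node1 using the same ordering as Python string comparison, and the label rows shown are the rows of lift-file6.tsv rearranged into that order.

```python
def merge_lift(edges, labels):
    """Lift node1 labels by merging two streams sorted on node1.

    Only the labels for the current node1 value are held at any time.
    """
    label_iter = iter(labels)
    current = next(label_iter, None)
    group_key = None
    group_values = []
    for edge in edges:
        key = edge[0]
        if key != group_key:
            # Skip labels that sort before the value being lifted.
            while current is not None and current[0] < key:
                current = next(label_iter, None)
            # Collect the (deduplicated) labels for this value.
            group_key, group_values = key, []
            while current is not None and current[0] == key:
                if current[2] not in group_values:
                    group_values.append(current[2])
                current = next(label_iter, None)
        yield edge + ("|".join(sorted(group_values)),)


edges = [("Q1", "P1", "Q5"), ("Q1", "P2", "Q6"), ("Q6", "P1", "Q5")]
sorted_labels = [
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("Q5", "label", '"human"'),
    ("Q6", "label", '"Fred"'),
]
merged = list(merge_lift(edges, sorted_labels))
for row in merged:
    print("\t".join(row))
```

Because each stream is consumed once and in order, neither file needs to fit in memory, which is why this mode is limited to a single lifted column.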

Small Input, Many Labels¶

If the label file is very large but not sorted, and the input file is small enough to fit in memory, then one alternate approach is to use --prefilter-labels. This causes the input file to be read into memory first, then the values that need labels are extracted from it. Next, the label file is read, filtering out unneeded labels and keeping only needed labels in memory. Finally, the output file is generated from the in-memory copy of the input file and the labels. Multiple columns may be lifted in a single pass with this approach.

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file6.tsv \
          --prefilter-labels

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	"Elmo"	"instance of"	"human"
Q1	P2	Q6	"Elmo"	"friend"	"Fred"
Q6	P1	Q5	"Fred"	"instance of"	"human"
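The three phases described above might look like this in outline. The function name `prefilter_lift` is hypothetical, and the sketch assumes one label per value; it is meant to show the order of operations, not KGTK's implementation.

```python
def prefilter_lift(input_rows, label_rows):
    """Sketch of prefiltering: read the input first, then keep only needed labels."""
    # Phase 1: the input is already in memory; gather every value that needs a label.
    needed = {value for row in input_rows for value in row}
    # Phase 2: stream the label rows, discarding labels nobody asked for.
    labels = {}
    for node1, label, node2 in label_rows:
        if label == "label" and node1 in needed:
            labels[node1] = node2
    # Phase 3: generate the output from the in-memory input and the kept labels.
    lifted = [row + tuple(labels.get(v, "") for v in row) for row in input_rows]
    return labels, lifted


input_rows = [("Q1", "P1", "Q5"), ("Q1", "P2", "Q6"), ("Q6", "P1", "Q5")]
label_rows = [
    ("Q1", "label", '"Elmo"'),
    ("Q2", "label", '"Alice"'),
    ("Q5", "label", '"human"'),
    ("Q6", "label", '"Fred"'),
    ("P1", "label", '"instance of"'),
    ("P2", "label", '"friend"'),
]
kept, lifted = prefilter_lift(input_rows, label_rows)
print(sorted(kept))  # Q2 is absent: no input row refers to it
```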

Duplicate Labels¶

Suppose that lift-file7.tsv contains the following table in KGTK format, which is sorted on the node1 column:

kgtk cat --input-file examples/docs/lift-file7.tsv

node1	label	node2
P1	label	"instance of"
P2	label	"friend"
Q1	label	"Elmo"
Q2	label	"Alice"
Q5	label	"human"
Q6	label	"Fred"
Q6	label	"Wilma"
Q6	label	"Wilma"

Lift the duplicate labels, using the presorted options:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --input-file-is-presorted \
          --label-file examples/docs/lift-file7.tsv \
          --label-file-is-presorted \
          --columns-to-lift node1

The output will be the following table in KGTK format:

node1	label	node2	node1;label
Q1	P1	Q5	"Elmo"
Q1	P2	Q6	"Elmo"
Q6	P1	Q5	"Fred"|"Wilma"

More Sample Data¶

Suppose that lift-file8.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file8.tsv

node1	label	node2	confident
Q1	P1	Q5	True
Q1	P2	Q6	True
Q2	P1	Q5	False
Q2	P2	Q6	False

and suppose that lift-file9.tsv contains the following file in KGTK format:

kgtk cat --input-file examples/docs/lift-file9.tsv

node1	label	node2	full-name
P1	label	"instance of"
P2	label	"friend"
P3	label	"enemy"
Q1	name	"Elmo"	"Elmo Fudd"
Q2	name	"Alice"	"Alice Cooper"
Q5	species	"human"
Q6	name	"Fred"	"Fred Rogers"

Default Lift, Separate Label File¶

Let's start with a default lift with the separate label file:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv

node1	label	node2	confident	label;label
Q1	P1	Q5	True	"instance of"
Q1	P2	Q6	True	"friend"
Q2	P1	Q5	False	"instance of"
Q2	P2	Q6	False	"friend"

Lift a Single Property¶

Now, let's lift the name property (label column value):

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name

node1	label	node2	confident	node1;label	node2;label
Q1	P1	Q5	True	"Elmo"
Q1	P2	Q6	True	"Elmo"	"Fred"
Q2	P1	Q5	False	"Alice"
Q2	P2	Q6	False	"Alice"	"Fred"

Lift with a Column Name Suffix¶

Now, let's lift the name property, using ";name" as the column name suffix:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-suffix ";name"

node1	label	node2	confident	node1;name	node2;name
Q1	P1	Q5	True	"Elmo"
Q1	P2	Q6	True	"Elmo"	"Fred"
Q2	P1	Q5	False	"Alice"
Q2	P2	Q6	False	"Alice"	"Fred"

Note

The ;name argument needs to be quoted on the command line, since ; is a shell metacharacter.

Lift from a Specific Column¶

Let's lift the full names column. The --lift-from option (also known as the label-value-column option) allows us to lift from a column other than the default, node2:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name

node1	label	node2	confident	node1;label	node2;label
Q1	P1	Q5	True	"Elmo Fudd"
Q1	P2	Q6	True	"Elmo Fudd"	"Fred Rogers"
Q2	P1	Q5	False	"Alice Cooper"
Q2	P2	Q6	False	"Alice Cooper"	"Fred Rogers"

Lift from a Specific Column with a Column Name Suffix¶

Let's lift the full names again, this time using ";full-name" as the column name suffix instead of ";label".

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name"

node1	label	node2	confident	node1;full-name	node2;full-name
Q1	P1	Q5	True	"Elmo Fudd"
Q1	P2	Q6	True	"Elmo Fudd"	"Fred Rogers"
Q2	P1	Q5	False	"Alice Cooper"
Q2	P2	Q6	False	"Alice Cooper"	"Fred Rogers"

Note

The ;full-name needs to be quoted on the command line, since ; is a shell metacharacter.

Outputting Only Modified Rows¶

Let's output only modified rows. We will start by outputting all rows:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2

node1	label	node2	confident	node2;full-name
Q1	P1	Q5	True
Q1	P2	Q6	True	"Fred Rogers"
Q2	P1	Q5	False
Q2	P2	Q6	False	"Fred Rogers"

Next, we will output only the modified rows:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows

node1	label	node2	confident	node2;full-name
Q1	P2	Q6	True	"Fred Rogers"
Q2	P2	Q6	False	"Fred Rogers"

Unmodified Row Output File¶

Suppose we want to isolate the unmodified rows for further processing. We can send them to the unmodified row output file.

We will send only the modified rows to the primary output stream by using --output-only-modified-rows.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --unmodified-row-output-file lift-unmodified-rows.tsv

node1	label	node2	confident	node2;full-name
Q1	P2	Q6	True	"Fred Rogers"
Q2	P2	Q6	False	"Fred Rogers"

Here are the unmodified rows:

kgtk cat -i lift-unmodified-rows.tsv

node1	label	node2	confident
Q1	P1	Q5	True
Q2	P1	Q5	False

Note

The unmodified row output file has the same columns as the primary input file. In this example, it does not have the node2;full-name column that was added to the primary output file.
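The split between the primary output and the unmodified row output can be pictured with the sketch below. The function name `split_rows` and the `full_names` dictionary are hypothetical stand-ins for the lift of the node2 column shown above.

```python
def split_rows(rows, full_names):
    """Route each row to the modified or unmodified list, based on node2."""
    modified, unmodified = [], []
    for row in rows:
        name = full_names.get(row[2])
        if name is None:
            # Unmodified rows keep the input columns only.
            unmodified.append(row)
        else:
            # Modified rows gain the node2;full-name column.
            modified.append(row + (name,))
    return modified, unmodified


# Rows of lift-file8.tsv and the full-name values of the name records in lift-file9.tsv.
rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
full_names = {"Q1": '"Elmo Fudd"', "Q2": '"Alice Cooper"', "Q6": '"Fred Rogers"'}
modified, unmodified = split_rows(rows, full_names)
print(modified)
print(unmodified)
```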

Matched Label Output File¶

Suppose we are interested in finding which label file edges were matched with input file edges during the lift. The --matched-label-output-file OUTPUT_FILE option provides a simple way to get this list.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --matched-label-output-file lift-matched-labels.tsv

node1	label	node2	confident	node2;full-name
Q1	P2	Q6	True	"Fred Rogers"
Q2	P2	Q6	False	"Fred Rogers"

Here are the matched labels:

kgtk cat -i lift-matched-labels.tsv

node1	label	node2	full-name
Q6	name	"Fred"	"Fred Rogers"

Note

The matched label output file has the same columns as the label file when a label file has been specified. Otherwise, the matched label file has the same columns as the primary input file.

Note

The complementary --unmatched-label-output-file OUTPUT_FILE option writes the label edges that were not matched; it is described in the next section.

Note

It may be useful if the matched label output file had an additional column with a count of the number of matches. This option may be added in the future.

Unmatched Label Output File¶

Suppose we are interested in finding which label file edges were not matched with input file edges during the lift. The --unmatched-label-output-file OUTPUT_FILE option provides a simple way to get this list.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --lift-suffix ";full-name" \
          --columns-to-lift node2 \
          --output-only-modified-rows \
          --unmatched-label-output-file lift-unmatched-labels.tsv

node1	label	node2	confident	node2;full-name
Q1	P2	Q6	True	"Fred Rogers"
Q2	P2	Q6	False	"Fred Rogers"

Here are the unmatched labels:

kgtk cat -i lift-unmatched-labels.tsv

node1	label	node2	full-name
Q1	name	"Elmo"	"Elmo Fudd"
Q2	name	"Alice"	"Alice Cooper"

Note

The unmatched label output file has the same columns as the label file when a label file has been specified. Otherwise, the unmatched label file has the same columns as the primary input file.
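The matched and unmatched label outputs in the two examples above are consistent with the following partition. This is a sketch inferred from those outputs, not from KGTK's source: `partition_labels` is a hypothetical name, and it assumes that only label records for the selected property (name) are candidates for either file.

```python
def partition_labels(input_rows, label_rows, prop="name", column=2):
    """Split the selected label records by whether the lifted column refers to them."""
    wanted = {row[column] for row in input_rows}
    matched, unmatched = [], []
    for record in label_rows:
        if record[1] != prop:
            continue  # records for other properties go to neither list
        (matched if record[0] in wanted else unmatched).append(record)
    return matched, unmatched


input_rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
label_rows = [
    ("P1", "label", '"instance of"', ""),
    ("P2", "label", '"friend"', ""),
    ("P3", "label", '"enemy"', ""),
    ("Q1", "name", '"Elmo"', '"Elmo Fudd"'),
    ("Q2", "name", '"Alice"', '"Alice Cooper"'),
    ("Q5", "species", '"human"', ""),
    ("Q6", "name", '"Fred"', '"Fred Rogers"'),
]
matched, unmatched = partition_labels(input_rows, label_rows)
print(matched)
print(unmatched)
```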

Lifting Labels in a Specific Language¶

Suppose lift-file11.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file11.tsv

node1	label	node2
Q1	label	'Elmo'@en
Q2	label	'Alice'@en
Q5	label	"human"
Q6	label	'Frances'@fr
P1	label	"instance of"
P2	label	"friend"

Lift only labels that are qualified as English, ignoring labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages en

node1	label	node2	node1;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en
Q6	P1	Q5

Lifting Labels in Multiple Languages¶

Lift only labels that are qualified as English or French, ignoring labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages en fr

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en	'Frances'@fr
Q6	P1	Q5	'Frances'@fr

Lifting Labels Qualified with Any Language¶

Lift any labels with language qualifiers, ignoring any labels without language qualifiers.

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages ANY

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en	'Frances'@fr
Q6	P1	Q5	'Frances'@fr

Lifting Labels that Are Not Language Qualified¶

Lift only labels without language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages NONE

node1	label	node2	label;label	node2;label
Q1	P1	Q5	"instance of"	"human"
Q1	P2	Q6	"friend"
Q6	P1	Q5	"instance of"	"human"

Lifting Labels in a Specific Language or Without Language Qualification¶

Lift only labels that are qualified as English or that do not have language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --languages en NONE

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	'Elmo'@en	"instance of"	"human"
Q1	P2	Q6	'Elmo'@en	"friend"
Q6	P1	Q5		"instance of"	"human"
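The language filter used in the examples above can be sketched as a small predicate. The helper names `language_of` and `matches` are hypothetical, and the parsing assumes the simple 'text'@lang form seen in the sample data; ANY and NONE are treated as described in the usage text.

```python
def language_of(value):
    """Return the language code of a 'text'@lang value, or None if unqualified."""
    if value.startswith("'"):
        _text, sep, lang = value.rpartition("'@")
        if sep:
            return lang
    return None


def matches(value, languages):
    """True if the label value passes the given language list."""
    lang = language_of(value)
    if lang is None:
        return "NONE" in languages
    return "ANY" in languages or lang in languages


print(matches("'Elmo'@en", ["en"]))          # English label, English requested
print(matches('"human"', ["en"]))            # unqualified label, English requested
print(matches('"human"', ["en", "NONE"]))    # unqualified label, NONE allowed
print(matches("'Frances'@fr", ["ANY"]))      # any qualifier accepted
```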

Lift Labels with Prioritized Languages¶

Suppose lift-file12.tsv contains the following table in KGTK format:

kgtk cat --input-file examples/docs/lift-file12.tsv

node1	label	node2
Q1	label	'Elmo'@en
Q1	label	'Sr Elmo'@es
Q2	label	'Alice'@en
Q2	label	'Alicia'@es
Q5	label	"human"
Q6	label	'Frank'@en
Q6	label	'Frances'@fr
Q6	label	'Francisco'@es
P1	label	"instance of"
P2	label	"friend"

Lift only labels that are qualified as English, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages en

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en	'Frank'@en
Q6	P1	Q5	'Frank'@en

Lift only labels that are qualified as Spanish, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages es

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Sr Elmo'@es
Q1	P2	Q6	'Sr Elmo'@es	'Francisco'@es
Q6	P1	Q5	'Francisco'@es

Lift only labels that are qualified as English or Spanish, without prioritization, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages en es

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en|'Sr Elmo'@es
Q1	P2	Q6	'Elmo'@en|'Sr Elmo'@es	'Francisco'@es|'Frank'@en
Q6	P1	Q5	'Francisco'@es|'Frank'@en

Lift only labels that are qualified as English or Spanish, preferring English labels, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages en es --prioritize

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en	'Frank'@en
Q6	P1	Q5	'Frank'@en

Lift only labels that are qualified as English or Spanish, preferring Spanish labels, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages es en --prioritize

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Sr Elmo'@es
Q1	P2	Q6	'Sr Elmo'@es	'Francisco'@es
Q6	P1	Q5	'Francisco'@es

Lift only labels that are qualified as French, English, or Spanish, preferring the labels in that order, ignoring labels without language qualifiers:

kgtk add-labels --input-file examples/docs/lift-file5.tsv \
                --label-file examples/docs/lift-file12.tsv \
                --languages fr en es --prioritize

node1	label	node2	node1;label	node2;label
Q1	P1	Q5	'Elmo'@en
Q1	P2	Q6	'Elmo'@en	'Frances'@fr
Q6	P1	Q5	'Frances'@fr
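Prioritization, as shown in the examples above, amounts to taking the labels for the first language in the list that has any match. The sketch below uses hypothetical helper names and handles only explicit language codes; how ANY and NONE interact with --prioritize is not covered here.

```python
def language_of(value):
    """Return the language code of a 'text'@lang value, or None if unqualified."""
    if value.startswith("'"):
        _text, sep, lang = value.rpartition("'@")
        if sep:
            return lang
    return None


def pick_prioritized(values, languages):
    """Return the labels for the earliest language in the list that has a match."""
    for lang in languages:
        hits = sorted({v for v in values if language_of(v) == lang})
        if hits:
            return hits
    return []


q1 = ["'Elmo'@en", "'Sr Elmo'@es"]
q6 = ["'Frank'@en", "'Frances'@fr", "'Francisco'@es"]
print(pick_prioritized(q1, ["fr", "en", "es"]))  # no French label for Q1, so English wins
print(pick_prioritized(q6, ["fr", "en", "es"]))  # French label exists for Q6
```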

Expert Example: Input Filtering¶

Let's lift the full names only when we are confident in the relationship. The expert options --input-select-column INPUT_SELECT_COLUMN_NAME and --input-select-value INPUT_SELECT_COLUMN_VALUE provide a built-in filter operation.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          -p name \
          --label-value-column full-name \
          --input-select-column confident \
          --input-select-value True

node1	label	node2	confident	node1;label	node2;label
Q1	P1	Q5	True	"Elmo Fudd"
Q1	P2	Q6	True	"Elmo Fudd"	"Fred Rogers"
Q2	P1	Q5	False
Q2	P2	Q6	False
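The effect of the input filter in this example can be sketched as follows: rows whose select column does not hold the select value pass through with empty lifted columns. The function name `lift_selected` and its parameters are hypothetical, and the sketch only reproduces the output shown above.

```python
def lift_selected(rows, names, select_index=3, select_value="True"):
    """Lift node1 and node2 only for rows whose select column matches."""
    out = []
    for row in rows:
        if row[select_index] == select_value:
            lifted = (names.get(row[0], ""), names.get(row[2], ""))
        else:
            lifted = ("", "")  # row is copied through without lifting
        out.append(row + lifted)
    return out


rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
names = {"Q1": '"Elmo Fudd"', "Q2": '"Alice Cooper"', "Q6": '"Fred Rogers"'}
filtered = lift_selected(rows, names)
for row in filtered:
    print("\t".join(row))
```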

Expert Example: Lifting into `node2`¶

Let's lift full names into the node2 column, replacing the existing values there. We can do this by specifying --columns-to-lift node2 and giving an empty --lift-suffix.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix ""

node1	label	node2	confident
Q1	P1	Q5	True
Q1	P2	"Fred Rogers"	True
Q2	P1	Q5	False
Q2	P2	"Fred Rogers"	False

Note

--lift-suffix "" uses shell quotes to specify an empty value. --lift-suffix= is another way to specify the empty lift suffix, and does not require shell quoting.

Note

This procedure, repeated for the node1, label, and node2 columns, can be used to transform relationships from one knowledge base system to another.

Expert Example: Lifting into `node2`, Outputting Only Modified Rows¶

Let's lift full names into the node2 column, replacing the existing values there. We can do this by specifying --columns-to-lift node2 and giving an empty --lift-suffix. We will output only modified rows.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix= \
          --output-only-modified-rows

node1	label	node2	confident
Q1	P2	"Fred Rogers"	True
Q2	P2	"Fred Rogers"	False

Expert Example: Update Lifted Relationships¶

Let's lift full names into the node2 column, changing the label of the relationship when we do so.

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file9.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --lift-suffix "" \
          --update-select-value FullName

node1	label	node2	confident
Q1	P1	Q5	True
Q1	FullName	"Fred Rogers"	True
Q2	P1	Q5	False
Q2	FullName	"Fred Rogers"	False
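Lifting into node2 with an empty suffix, combined with an updated label, behaves like an in-place rewrite of the matching rows. A minimal sketch, with the hypothetical name `lift_and_relabel` and a `names` dictionary standing in for the name records of the label file:

```python
def lift_and_relabel(rows, names, new_label="FullName"):
    """Replace node2 with its lifted value and relabel the edge when a match exists."""
    out = []
    for node1, label, node2, confident in rows:
        if node2 in names:
            out.append((node1, new_label, names[node2], confident))
        else:
            out.append((node1, label, node2, confident))  # left untouched
    return out


rows = [
    ("Q1", "P1", "Q5", "True"),
    ("Q1", "P2", "Q6", "True"),
    ("Q2", "P1", "Q5", "False"),
    ("Q2", "P2", "Q6", "False"),
]
names = {"Q1": '"Elmo Fudd"', "Q2": '"Alice Cooper"', "Q6": '"Fred Rogers"'}
updated = lift_and_relabel(rows, names)
for row in updated:
    print("\t".join(row))
```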

Expert Example: Overriding the Label Match and Value Columns¶

Consider the following file, lift-file10.tsv, which is like lift-file9.tsv, but with the node1 and node2 columns swapped and with an additional column, action:

kgtk cat --input-file examples/docs/lift-file10.tsv

node1	label	node2	full-name	action
"instance of"	label	P1		go
"friend"	label	P2		go
"enemy"	label	P3		go
"Elmo"	name	Q1	"Elmo Fudd"	go
"Alice"	name	Q2	"Alice Cooper"	go
"human"	species	Q5		go
"Fred"	name	Q6	"Fred Rogers"	go

Let's lift full names from this file. We'll swap the function of the node1 and node2 columns in the label file:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --property name \
          --lift-from full-name \
          --columns-to-lift node2 \
          --label-match-column node2 \
          --label-value-column node1

node1	label	node2	confident	node2;label
Q1	P1	Q5	True
Q1	P2	Q6	True	"Fred"
Q2	P1	Q5	False
Q2	P2	Q6	False	"Fred"

Expert Example: Selecting the Labels to Lift¶

Let's pick up all labels using the action column's go value to select the labels that we pick:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --label-select-column action \
          --label-select-value go \
          --label-match-column node2 \
          --label-value-column node1

node1	label	node2	confident	node1;label	label;label	node2;label
Q1	P1	Q5	True	"Elmo"	"instance of"	"human"
Q1	P2	Q6	True	"Elmo"	"friend"	"Fred"
Q2	P1	Q5	False	"Alice"	"instance of"	"human"
Q2	P2	Q6	False	"Alice"	"friend"	"Fred"

If we hadn't filtered the labels, the output would have looked like this:

kgtk lift --input-file examples/docs/lift-file8.tsv \
          --label-file examples/docs/lift-file10.tsv \
          --label-match-column node2 \
          --label-value-column node1

node1	label	node2	confident	label;label
Q1	P1	Q5	True	"instance of"
Q1	P2	Q6	True	"friend"
Q2	P1	Q5	False	"instance of"
Q2	P2	Q6	False	"friend"

Lifting Labels Stripping Language Qualifiers¶

Lift labels without stripping language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	'Elmo'@en	"instance of"	"human"
Q1	P2	Q6	'Elmo'@en	"friend"	'Frances'@fr
Q6	P1	Q5	'Frances'@fr	"instance of"	"human"

Lift labels, stripping language qualifiers:

kgtk lift --input-file examples/docs/lift-file5.tsv \
          --label-file examples/docs/lift-file11.tsv \
          --strip-language-qualifiers

node1	label	node2	node1;label	label;label	node2;label
Q1	P1	Q5	"Elmo"	"instance of"	"human"
Q1	P2	Q6	"Elmo"	"friend"	"Frances"
Q6	P1	Q5	"Frances"	"instance of"	"human"
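Judging from the output above, stripping turns a language-qualified string such as 'Elmo'@en into the plain string "Elmo". The sketch below reproduces that transformation; the function name is hypothetical, and it assumes values without embedded or escaped quote characters.

```python
def strip_language_qualifier(value):
    """Turn 'text'@lang into "text"; leave every other value unchanged."""
    if value.startswith("'"):
        text, sep, _lang = value.rpartition("'@")
        if sep:
            # Drop the opening single quote and re-wrap in double quotes.
            return '"' + text[1:] + '"'
    return value


for value in ["'Elmo'@en", "'Frances'@fr", '"human"']:
    print(strip_language_qualifier(value))
```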
