add-labels
Overview¶
The add-labels command copies its input file to its output file, adding label columns using values obtained from a labels file.
kgtk add-labels
is implemented as an alias for kgtk lift
.
Unlike kgtk lift
, kgtk add-labels
will lift labels for all columns with
names that do not end in ';label' (or the current suffix). A seperate label
file is required; it may be specified using either --label-file LABEL_FILE
or the KGTK_LABEL_FILE
envar. The input file is assumeed to be small enough
to fit into memory, and the labels will be read with prefiltering.
Any lifted columns that are entirely empty will be suppressed.
Unlike most KGTK commands, kgtk add-labels
does not by default verify that
the input file is a valid KGTK file. This facilitates using it to add
labels to more general TSV files.
See the kgtk lift
documentation for more details on the shared behavior
of these two commands.
Memory Usage¶
The input rows are saved in memory, as well as the value-to-label mapping. This will impose a limit on the size of the input files that can be processed.
Usage¶
usage: kgtk lift [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
[--label-file INPUT_FILE]
[--unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE]
[--matched-label-output-file MATCHED_LABEL_OUTPUT_FILE]
[--unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE]
[--columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES ...]]
[--default-value DEFAULT_VALUE]
[--suppress-empty-columns [True/False]]
[--ok-if-no-labels [True/False]]
[--prefilter-labels [True/False]]
[--input-file-is-presorted [True/False]]
[--label-file-is-presorted [True/False]]
[--clear-before-lift [CLEAR_BEFORE_LIFT]]
[--overwrite [OVERWRITE]]
[--output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]]
[--languages [LANGUAGE ...]] [--prioritize [True/False]]
[--use-label-envar [True/False]] [-v [optional True|False]]
Lift labels for a KGTK file. If called as "kgtk lift", for each of the items in the (node1, label, node2) columns, look for matching label records. If called as "kgtk add-labels", look for matching label records for all input columns. If found, lift the label values into additional columns in the current record. Label records are removed from the output unless --remove-label-records=False.
Additional options are shown in expert help.
kgtk --expert lift --help
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
The KGTK input file. (May be omitted or '-' for
stdin.)
-o OUTPUT_FILE, --output-file OUTPUT_FILE
The KGTK output file. (May be omitted or '-' for
stdout.)
--label-file INPUT_FILE
A KGTK file with label records (Optional, use '-' for
stdin.)
--unmodified-row-output-file UNMODIFIED_ROW_OUTPUT_FILE
A KGTK output file that will contain only unmodified
rows. This file will have the same columns as the
input file. (Optional, use '-' for stdout.)
--matched-label-output-file MATCHED_LABEL_OUTPUT_FILE
A KGTK output file that will contain matched label
edges. This file will have the same columns as the
source of the labels, either the input file or the
label file. (Optional, use '-' for stdout.)
--unmatched-label-output-file UNMATCHED_LABEL_OUTPUT_FILE
A KGTK output file that will contain unmatched label
edges. This file will have the same columns as the
source of the labels, either the input file or the
label file. (Optional, use '-' for stdout.)
--columns-to-write [OUTPUT_LIFTED_COLUMN_NAMES ...]
The columns into which to store the lifted values. The
default is [node1;label, label;label, node2;label,
...] or their aliases.
--default-value DEFAULT_VALUE
The value to use if a lifted label is not found.
(default=)
--suppress-empty-columns [True/False]
If true, do not create new columns that would be
empty. (default=True).
--ok-if-no-labels [True/False]
If true, do not abort if no labels were found.
(default=False).
--prefilter-labels [True/False]
If true, read the input file before reading the label
file. (default=True).
--input-file-is-presorted [True/False]
If true, the input file is presorted on the column for
which values are to be lifted. (default=False).
--label-file-is-presorted [True/False]
If true, the label file is presorted on the node1
column. (default=False).
--clear-before-lift [CLEAR_BEFORE_LIFT]
If true, set columns to write to the default value
before lifting. (default=False).
--overwrite [OVERWRITE]
If true, overwrite non-default values in the columns
to write. If false, do not overwrite non-default
values in the columns to write. (default=True).
--output-only-modified-rows [OUTPUT_ONLY_MODIFIED_ROWS]
If true, output only modified edges to the primary
output stream. (default=False).
--languages [LANGUAGE ...]
Lift only labels with a matching language qualifier.
ANY means any language qualifier. NONE means no
language qualifier. (default=ANY NONE)
--prioritize [True/False]
If true and filtering labels by language, pick only
the label matching the language that appears before
other matches in the language list. (default=False).
--use-label-envar [True/False]
If true, use the KGTK_LABEL_FILE envar for the label
file if no --label-file. (default=True).
-v [optional True|False], --verbose [optional True|False]
Print additional progress messages (default=False).
Examples¶
Sample Data¶
Suppose that add-labels-file1.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-file1.tsv
node1 | label | node2 |
---|---|---|
Q1 | P1 | Q5 |
Q1 | P2 | Q6 |
Q6 | P1 | Q5 |
Suppose also that add-labels-labels.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-labels.tsv
node1 | label | node2 |
---|---|---|
Q1 | label | "Elmo" |
Q2 | label | "Alice" |
Q5 | label | "human" |
Q6 | label | "Fred" |
P1 | label | "instance of" |
P2 | label | "friend" |
Q101 | label | "red" |
Q102 | label | "blue" |
Q103 | label | "green" |
Adding Labels to an Input File¶
Let's add labels to add-labels-file1.tsv
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels.tsv
The output will be the following table in KGTK format:
node1 | label | node2 | node1;label | label;label | node2;label |
---|---|---|---|---|---|
Q1 | P1 | Q5 | "Elmo" | "instance of" | "human" |
Q1 | P2 | Q6 | "Elmo" | "friend" | "Fred" |
Q6 | P1 | Q5 | "Fred" | "instance of" | "human" |
Adding Labels to an Input File with Extra Columns¶
Let's add labels to add-labels-file2.tsv
. This file
contains the additional column color
.
kgtk cat --input-file examples/docs/add-labels-file2.tsv
node1 | label | node2 | color |
---|---|---|---|
Q1 | P1 | Q5 | Q101 |
Q1 | P2 | Q6 | Q102 |
Q6 | P1 | Q5 | Q103 |
Add labels to this file:
kgtk add-labels --input-file examples/docs/add-labels-file2.tsv \
--label-file examples/docs/add-labels-labels.tsv
The output will be the following table in KGTK format:
node1 | label | node2 | color | node1;label | label;label | node2;label | color;label |
---|---|---|---|---|---|---|---|
Q1 | P1 | Q5 | Q101 | "Elmo" | "instance of" | "human" | "red" |
Q1 | P2 | Q6 | Q102 | "Elmo" | "friend" | "Fred" | "blue" |
Q6 | P1 | Q5 | Q103 | "Fred" | "instance of" | "human" | "green" |
Adding Labels with an Existing Label Column¶
Suppose that add-labels-file3.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-file3.tsv
node1 | label | node2 | color | node1;label |
---|---|---|---|---|
Q1 | P1 | Q5 | Q101 | "Elmo" |
Q1 | P2 | Q6 | Q102 | |
Q6 | P1 | Q5 | Q103 |
Add labels to this file with no additional arguments:
kgtk add-labels --input-file examples/docs/add-labels-file3.tsv \
--label-file examples/docs/add-labels-labels.tsv
node1 | label | node2 | color | node1;label | label;label | node2;label | color;label |
---|---|---|---|---|---|---|---|
Q1 | P1 | Q5 | Q101 | "Elmo" | "instance of" | "human" | "red" |
Q1 | P2 | Q6 | Q102 | "Elmo" | "friend" | "Fred" | "blue" |
Q6 | P1 | Q5 | Q103 | "Fred" | "instance of" | "human" | "green" |
Suppression of Empty Label Columns¶
Suppose that add-labels-file4.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-file4.tsv
node1 | label | node2 | color | node1;label | nolabel |
---|---|---|---|---|---|
Q1 | P1 | Q5 | Q101 | "Elmo" | Q201 |
Q1 | P2 | Q6 | Q102 | Q202 | |
Q6 | P1 | Q5 | Q103 | Q203 |
Add labels to this file, which includes the nolabel
column, which contains
values that are not labels in the labels file:
kgtk add-labels --input-file examples/docs/add-labels-file4.tsv \
--label-file examples/docs/add-labels-labels.tsv
The output will be the following table in KGTK format:
node1 | label | node2 | color | node1;label | nolabel | label;label | node2;label | color;label |
---|---|---|---|---|---|---|---|---|
Q1 | P1 | Q5 | Q101 | "Elmo" | Q201 | "instance of" | "human" | "red" |
Q1 | P2 | Q6 | Q102 | "Elmo" | Q202 | "friend" | "Fred" | "blue" |
Q6 | P1 | Q5 | Q103 | "Fred" | Q203 | "instance of" | "human" | "green" |
Accepting Input Files that Are Not Valid KGTK FIles¶
By default, kgtk add-labels
will accept input files that are not valid
KGTK files, in the sense that they do not have the required columns (node1
,
label
, node2
for a KTK Edge file, id
for a KGTK Node file).
Suppose that add-labels-file5.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-file5.tsv --mode=NONE
label | node2 | color |
---|---|---|
P1 | Q5 | Q101 |
P2 | Q6 | Q102 |
P1 | Q5 | Q103 |
Add labels to this file, which has values that are not labels in the labels file:
kgtk add-labels --input-file examples/docs/add-labels-file5.tsv \
--label-file examples/docs/add-labels-labels.tsv
The output will be the following table in KGTK format:
label | node2 | color | label;label | node2;label | color;label |
---|---|---|---|---|---|
P1 | Q5 | Q101 | "instance of" | "human" | "red" |
P2 | Q6 | Q102 | "friend" | "Fred" | "blue" |
P1 | Q5 | Q103 | "instance of" | "human" | "green" |
Adding Labels in a Specific Language¶
Supposeadd-labels-labels2.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-labels2.tsv
node1 | label | node2 |
---|---|---|
Q1 | label | 'Elmo'@en |
Q2 | label | 'Alice'@en |
Q5 | label | "human" |
Q6 | label | 'Frances'@fr |
P1 | label | "instance of" |
P2 | label | "friend" |
Lift only labels that are qualified as English, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels2.tsv \
--language en
node1 | label | node2 | node1;label |
---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en |
Q1 | P2 | Q6 | 'Elmo'@en |
Q6 | P1 | Q5 |
Adding Labels in Multiple Languages¶
Lift only labels that are qualified as English or French, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels2.tsv \
--languages en fr
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en | |
Q1 | P2 | Q6 | 'Elmo'@en | 'Frances'@fr |
Q6 | P1 | Q5 | 'Frances'@fr |
Adding Labels Qualified with Any Language¶
Lift only labels that are qualified as English, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels2.tsv \
--language ANY
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en | |
Q1 | P2 | Q6 | 'Elmo'@en | 'Frances'@fr |
Q6 | P1 | Q5 | 'Frances'@fr |
Adding Labels that Are Not Language Qualified¶
Lift only labels that are qualified as English, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels2.tsv \
--language NONE
node1 | label | node2 | label;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | "instance of" | "human" |
Q1 | P2 | Q6 | "friend" | |
Q6 | P1 | Q5 | "instance of" | "human" |
Adding Labels in a Specific Language or Without Language Qualification¶
Lift only labels that are qualified as English, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels2.tsv \
--language en NONE
node1 | label | node2 | node1;label | label;label | node2;label |
---|---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en | "instance of" | "human" |
Q1 | P2 | Q6 | 'Elmo'@en | "friend" | |
Q6 | P1 | Q5 | "instance of" | "human" |
Adding Labels with Prioritized Languages¶
Supposeadd-labels-labels2.tsv
contains the following table in KGTK format:
kgtk cat --input-file examples/docs/add-labels-labels3.tsv
node1 | label | node2 |
---|---|---|
Q1 | label | 'Elmo'@en |
Q1 | label | 'Sr Elmo'@es |
Q2 | label | 'Alice'@en |
Q2 | label | 'Alicia'@es |
Q5 | label | "human" |
Q6 | label | 'Frank'@en |
Q6 | label | 'Frances'@fr |
Q6 | label | 'Francisco'@es |
P1 | label | "instance of" |
P2 | label | "friend" |
Add only labels that are qualified as English, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels3.tsv \
--language en
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en | |
Q1 | P2 | Q6 | 'Elmo'@en | 'Frank'@en |
Q6 | P1 | Q5 | 'Frank'@en |
Add only labels that are qualified as Spanish, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels3.tsv \
--language es
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Sr Elmo'@es | |
Q1 | P2 | Q6 | 'Sr Elmo'@es | 'Francisco'@es |
Q6 | P1 | Q5 | 'Francisco'@es |
Add only labels that are qualified as English or Spanish, preferring English labels, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels3.tsv \
--languages en es
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en|'Sr Elmo'@es | |
Q1 | P2 | Q6 | 'Elmo'@en|'Sr Elmo'@es | 'Francisco'@es|'Frank'@en |
Q6 | P1 | Q5 | 'Francisco'@es|'Frank'@en |
Add only labels that are qualified as English or Spanish, preferring Spanish labels, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels3.tsv \
--languages es en
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en|'Sr Elmo'@es | |
Q1 | P2 | Q6 | 'Elmo'@en|'Sr Elmo'@es | 'Francisco'@es|'Frank'@en |
Q6 | P1 | Q5 | 'Francisco'@es|'Frank'@en |
Add only labels that are qualified as French, English, or Spanish, preferring the labels in that order, ignoring labels without language qualifiers:
kgtk add-labels --input-file examples/docs/add-labels-file1.tsv \
--label-file examples/docs/add-labels-labels3.tsv \
--languages fr en es
node1 | label | node2 | node1;label | node2;label |
---|---|---|---|---|
Q1 | P1 | Q5 | 'Elmo'@en|'Sr Elmo'@es | |
Q1 | P2 | Q6 | 'Elmo'@en|'Sr Elmo'@es | 'Frances'@fr|'Francisco'@es|'Frank'@en |
Q6 | P1 | Q5 | 'Frances'@fr|'Francisco'@es|'Frank'@en |
Expert Example: Rejecting Input Files that Are Not Valid KGTK FIles¶
By default, kgtk add-labels
will accept input files that are not valid
KGTK files, in the sense that they do not have the required columns (node1
,
label
, node2
for a KTK Edge file, id
for a KGTK Node file).
kgtk add-labels --input-file examples/docs/add-labels-file5.tsv \
--label-file examples/docs/add-labels-labels.tsv \
--force-input-mode-none false
The output will be the following error message:
In input header 'label node2 color': Missing required column: id | ID
Exit requested