KGTK graph cache
KGTK Graph Cache¶
The KGTK Graph Cache is a SQLite database containing copies of KGTK edge.
The Graph Cache was introduced with Kypher and the kgtk query
command.
The Graph Cache may also be used with other KGTK commands (specifically, ones
that use KgtkReader to read KGTK input files).
For more details on the Graph Cache, see the documentation for the
kgtk query
command.
Specifying the Location of the Graph Cache¶
The --graph-cache grpah-cache-path
option specifies the location of the
Graph Cache to KGTK commands that use it.
If --graph-cache
is not specified, some KGTK commands will look for the
envar KGTK_GRAPH_CACHE
to find the path to the Graph Cache. This behavior
may be suppressed with --use-graph-cache-envar=false
. The default value for
this option is `--use-graph-cache-envar=true".
Stale Graph Cache Data¶
Info
This section describes the behavior of KGTK commands other than kgtk query
.
If a Graph Cache has been located and a KGTK input file has been found in the Graph Cache as well as outside the Graph Cache, the data in the Graph Cache will be considered stale if the file size or modification time stored for the file in the Graph Cache do not match the file size or modification time of the KGTK input file outside the Graph Cache. The copy of the KGTK input file in the Graph Cache will be ignored, and the copy outside the Graph Cache will be read.
If the option --ignore-stale-graph-cache=false
is specified, then KGTK input
files found in the the Graph Cache will be used without performing the staleness
check. The default value for this option is `--ignore-stale-graph-cache=true
.
There are limited circumstances in which this option should be used
- For example, this option might be used in circumstances in which the copy of the data in the Graph Cache is trusted more than the copy outside the Graph Cache.
Expert Topic: Tuning Graph Cache I/O¶
Outside the kgtk query
command, the performance of KGTK commands that read
data from the Graph Cache may be tuned with certain options. Normally, it
should not be necessary to adjust the values of these options, but when
running KGTK commands in environments with limited resources (e.g., on some
laptops or in resource-constrained VMs), tuning these option values may
provide some benefit. The options are shown with their default values:
--graph-cache-fetchmany-size 1000
--graph-cache-filter-batch-size 100