community-detection
Summary¶
This command will return a clutering results from the input kgtk file. The algorithms are provided by graph_tool (blockmodel, nested and mcmc)
Input File¶
The input file should be a KGTK Edge file with the following columns or their aliases:
node1: the subject column (source node)label: the predicate column (property name)node2: the object column (target node)
Processing an Input File that is Not a KGTK Edge File¶
If your input file doesn't have node1, label, or node2 columns (or their aliases) at all, then it is
not a valid KGTK Edge file. In this case, you also have to pass the following command line option:
--input-mode=NONE
The Output File¶
The output file is an edge file that contains the following columns:
node1: this column contains each nodelabel: this column contains only 'in'node2: this column contains the resulting clusternode2;prob: this column contains the probability/confidence of clustering
Usage¶
usage: kgtk community-detection [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
[--method METHOD] [-v [optional True|False]]
Creating community detection from graph-tool using KGTK file, available options are blockmodel, nested and mcmc
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
The KGTK input file. (May be omitted or '-' for
stdin.)
-o OUTPUT_FILE, --output-file OUTPUT_FILE
The KGTK output file. (May be omitted or '-' for
stdout.)
--method METHOD Specify the clustering method to use.
-v [optional True|False], --verbose [optional True|False]
Print additional progress messages (default=False).
Examples¶
Default model (blockmodel)¶
The following file will be used to illustrate some of the capabilities of kgtk reachable-nodes.
head examples/docs/community-detection-arnold.tsv
| node1 | label | node2 | node1;label | label;label | node2;label |
|---|---|---|---|---|---|
| Q1086823 | P22 | Q345517 | 'Christopher Lawford'@en | 'father'@en | 'Peter Lawford'@en |
| Q1086823 | P25 | Q432694 | 'Christopher Lawford'@en | 'mother'@en | 'Patricia Kennedy Lawford'@en |
| Q1086823 | P26 | Q75326809 | 'Christopher Lawford'@en | 'spouse'@en | 'Jean Edith Olssen'@en |
| Q1086823 | P3373 | Q75326777 | 'Christopher Lawford'@en | 'sibling'@en | 'Victoria Lawford'@en |
| Q1086823 | P3373 | Q75326779 | 'Christopher Lawford'@en | 'sibling'@en | 'Sydney Lawford'@en |
| Q1086823 | P3373 | Q75326780 | 'Christopher Lawford'@en | 'sibling'@en | 'Robin Lawford'@en |
| Q1086823 | P3448 | Q96079835 | 'Christopher Lawford'@en | 'stepparent'@en | 'Mary Rowan'@en |
| Q1086823 | P3448 | Q96079836 | 'Christopher Lawford'@en | 'stepparent'@en | 'Deborah Gould'@en |
| Q1086823 | P3448 | Q96079838 | 'Christopher Lawford'@en | 'stepparent'@en | 'Patricia Seaton'@en |
Find the communities using blockmodel.
kgtk community-detection -i examples/docs/community-detection-arnold.tsv --method blockmodel
| node1 | label | node2 |
|---|---|---|
| Q1086823 | in | cluster_9 |
| Q345517 | in | cluster_9 |
| Q432694 | in | cluster_3 |
| Q75326809 | in | cluster_9 |
| Q75326777 | in | cluster_9 |
| Q75326779 | in | cluster_9 |
| Q75326780 | in | cluster_9 |
| Q96079835 | in | cluster_9 |
| Q96079836 | in | cluster_9 |
| Q96079838 | in | cluster_9 |
| Q76363382 | in | cluster_9 |
| Q76363384 | in | cluster_9 |
| Q76363386 | in | cluster_9 |
| Q11673 | in | cluster_40 |
| Q467912 | in | cluster_40 |
| Q134549 | in | cluster_3 |
| Q9696 | in | cluster_17 |
| Q313696 | in | cluster_18 |
| Q236540 | in | cluster_18 |
| Q441424 | in | cluster_20 |
| Q7926996 | in | cluster_20 |
| Q25310 | in | cluster_22 |
| Q265595 | in | cluster_18 |
| Q268799 | in | cluster_18 |
| Q272401 | in | cluster_18 |
| Q272908 | in | cluster_17 |
| Q505178 | in | cluster_18 |
| Q2383370 | in | cluster_20 |
| Q3048622 | in | cluster_20 |
| Q948920 | in | cluster_20 |
| Q1352872 | in | cluster_40 |
| Q258661 | in | cluster_40 |
| Q1386420 | in | cluster_40 |
| Q1804720 | in | cluster_40 |
| Q1975383 | in | cluster_40 |
| Q273833 | in | cluster_40 |
| Q467861 | in | cluster_40 |
| Q5112377 | in | cluster_40 |
| Q5178632 | in | cluster_40 |
| Q5301573 | in | cluster_40 |
| Q6794923 | in | cluster_40 |
| Q165421 | in | cluster_45 |
| Q230303 | in | cluster_45 |
| Q316064 | in | cluster_45 |
| Q3290402 | in | cluster_45 |
| Q75326753 | in | cluster_45 |
| Q230654 | in | cluster_54 |
| Q317248 | in | cluster_54 |
| Q2685 | in | cluster_58 |
| Q3436301 | in | cluster_54 |
| Q3529079 | in | cluster_54 |
| Q4773467 | in | cluster_54 |
| Q6769708 | in | cluster_54 |
| Q28109921 | in | cluster_58 |
| Q28109928 | in | cluster_58 |
| Q4521676 | in | cluster_58 |
| Q901541 | in | cluster_58 |
| Q23800185 | in | cluster_58 |
| Q75494768 | in | cluster_58 |
| Q23800370 | in | cluster_58 |
| Q3288486 | in | cluster_58 |
| Q96076900 | in | cluster_58 |
| Q38196234 | in | cluster_58 |
| Q24004771 | in | cluster_58 |
| Q96077739 | in | cluster_54 |
| Q96077740 | in | cluster_54 |
| Q65589427 | in | cluster_54 |
| Q43100988 | in | cluster_58 |
| Q503706 | in | cluster_58 |
| Q4491 | in | cluster_58 |
| Q65589450 | in | cluster_54 |
| Q75496774 | in | cluster_58 |
| Q4616 | in | cluster_45 |
nested model¶
kgtk community-detection -i examples/docs/community-detection-arnold.tsv --method nested
| node1 | label | node2 |
|---|---|---|
| Q1086823 | in | cluster_0_6_11 |
| Q345517 | in | cluster_0_8_11 |
| Q432694 | in | cluster_0_0_39 |
| Q75326809 | in | cluster_0_6_11 |
| Q75326777 | in | cluster_0_8_11 |
| Q75326779 | in | cluster_0_6_11 |
| Q75326780 | in | cluster_0_8_11 |
| Q96079835 | in | cluster_0_0_11 |
| Q96079836 | in | cluster_0_0_11 |
| Q96079838 | in | cluster_0_8_11 |
| Q76363382 | in | cluster_0_0_11 |
| Q76363384 | in | cluster_0_8_11 |
| Q76363386 | in | cluster_0_6_11 |
| Q11673 | in | cluster_0_6_51 |
| Q467912 | in | cluster_0_8_51 |
| Q134549 | in | cluster_0_6_39 |
| Q9696 | in | cluster_0_6_30 |
| Q313696 | in | cluster_0_0_13 |
| Q236540 | in | cluster_0_8_13 |
| Q441424 | in | cluster_0_8_18 |
| Q7926996 | in | cluster_0_8_18 |
| Q25310 | in | cluster_0_6_22 |
| Q265595 | in | cluster_0_6_13 |
| Q268799 | in | cluster_0_8_13 |
| Q272401 | in | cluster_0_0_13 |
| Q272908 | in | cluster_0_0_30 |
| Q505178 | in | cluster_0_5_13 |
| Q2383370 | in | cluster_0_6_18 |
| Q3048622 | in | cluster_0_6_18 |
| Q948920 | in | cluster_0_0_18 |
| Q1352872 | in | cluster_0_5_51 |
| Q258661 | in | cluster_0_8_51 |
| Q1386420 | in | cluster_0_6_51 |
| Q1804720 | in | cluster_0_0_51 |
| Q1975383 | in | cluster_0_0_51 |
| Q273833 | in | cluster_0_0_51 |
| Q467861 | in | cluster_0_6_51 |
| Q5112377 | in | cluster_0_0_51 |
| Q5178632 | in | cluster_0_0_51 |
| Q5301573 | in | cluster_0_5_51 |
| Q6794923 | in | cluster_0_8_51 |
| Q165421 | in | cluster_0_8_20 |
| Q230303 | in | cluster_0_8_20 |
| Q316064 | in | cluster_0_8_20 |
| Q3290402 | in | cluster_0_8_20 |
| Q75326753 | in | cluster_0_6_20 |
| Q230654 | in | cluster_0_0_46 |
| Q317248 | in | cluster_0_0_70 |
| Q2685 | in | cluster_0_0_47 |
| Q3436301 | in | cluster_0_0_70 |
| Q3529079 | in | cluster_0_8_70 |
| Q4773467 | in | cluster_0_6_70 |
| Q6769708 | in | cluster_0_5_70 |
| Q28109921 | in | cluster_0_8_47 |
| Q28109928 | in | cluster_0_8_47 |
| Q4521676 | in | cluster_0_0_47 |
| Q901541 | in | cluster_0_6_47 |
| Q23800185 | in | cluster_0_8_47 |
| Q75494768 | in | cluster_0_8_47 |
| Q23800370 | in | cluster_0_8_47 |
| Q3288486 | in | cluster_0_0_47 |
| Q96076900 | in | cluster_0_0_47 |
| Q38196234 | in | cluster_0_0_47 |
| Q24004771 | in | cluster_0_0_47 |
| Q96077739 | in | cluster_0_0_70 |
| Q96077740 | in | cluster_0_8_70 |
| Q65589427 | in | cluster_0_6_70 |
| Q43100988 | in | cluster_0_0_47 |
| Q503706 | in | cluster_0_6_47 |
| Q4491 | in | cluster_0_6_47 |
| Q65589450 | in | cluster_0_0_70 |
| Q75496774 | in | cluster_0_0_47 |
| Q4616 | in | cluster_0_6_20 |
MCMC model¶
kgtk community-detection -i examples/docs/community-detection-arnold.tsv --method mcmc
| node1 | label | node2 | node2;prob |
|---|---|---|---|
| Q1086823 | in | cluster_0 | 1.0 |
| Q345517 | in | cluster_0 | 1.0 |
| Q432694 | in | cluster_3 | 0.7686768676867687 |
| Q75326809 | in | cluster_0 | 1.0 |
| Q75326777 | in | cluster_0 | 1.0 |
| Q75326779 | in | cluster_0 | 1.0 |
| Q75326780 | in | cluster_0 | 1.0 |
| Q96079835 | in | cluster_0 | 1.0 |
| Q96079836 | in | cluster_0 | 0.9998999899989999 |
| Q96079838 | in | cluster_0 | 0.9998999899989999 |
| Q76363382 | in | cluster_0 | 0.9998999899989999 |
| Q76363384 | in | cluster_0 | 0.9998999899989999 |
| Q76363386 | in | cluster_0 | 1.0 |
| Q11673 | in | cluster_2 | 0.9456945694569457 |
| Q467912 | in | cluster_2 | 1.0 |
| Q134549 | in | cluster_1 | 0.8934893489348935 |
| Q9696 | in | cluster_1 | 0.8660866086608661 |
| Q313696 | in | cluster_4 | 0.9994999499949995 |
| Q236540 | in | cluster_4 | 0.9994999499949995 |
| Q441424 | in | cluster_7 | 0.8274827482748275 |
| Q7926996 | in | cluster_7 | 0.5159515951595159 |
| Q25310 | in | cluster_6 | 1.0 |
| Q265595 | in | cluster_4 | 0.9995999599959996 |
| Q268799 | in | cluster_4 | 0.9994999499949995 |
| Q272401 | in | cluster_4 | 0.9993999399939995 |
| Q272908 | in | cluster_3 | 0.9943994399439944 |
| Q505178 | in | cluster_4 | 0.9992999299929993 |
| Q2383370 | in | cluster_7 | 0.8272827282728272 |
| Q3048622 | in | cluster_7 | 0.8273827382738274 |
| Q948920 | in | cluster_7 | 0.8272827282728272 |
| Q1352872 | in | cluster_2 | 1.0 |
| Q258661 | in | cluster_2 | 1.0 |
| Q1386420 | in | cluster_2 | 1.0 |
| Q1804720 | in | cluster_2 | 1.0 |
| Q1975383 | in | cluster_2 | 1.0 |
| Q273833 | in | cluster_2 | 1.0 |
| Q467861 | in | cluster_2 | 1.0 |
| Q5112377 | in | cluster_2 | 1.0 |
| Q5178632 | in | cluster_2 | 1.0 |
| Q5301573 | in | cluster_2 | 1.0 |
| Q6794923 | in | cluster_2 | 1.0 |
| Q165421 | in | cluster_7 | 1.0 |
| Q230303 | in | cluster_7 | 1.0 |
| Q316064 | in | cluster_7 | 1.0 |
| Q3290402 | in | cluster_7 | 1.0 |
| Q75326753 | in | cluster_7 | 1.0 |
| Q230654 | in | cluster_8 | 0.9826982698269827 |
| Q317248 | in | cluster_8 | 0.9997999799979999 |
| Q2685 | in | cluster_9 | 1.0 |
| Q3436301 | in | cluster_8 | 0.9998999899989999 |
| Q3529079 | in | cluster_8 | 1.0 |
| Q4773467 | in | cluster_8 | 0.9998999899989999 |
| Q6769708 | in | cluster_8 | 1.0 |
| Q28109921 | in | cluster_9 | 1.0 |
| Q28109928 | in | cluster_9 | 1.0 |
| Q4521676 | in | cluster_9 | 1.0 |
| Q901541 | in | cluster_9 | 1.0 |
| Q23800185 | in | cluster_9 | 1.0 |
| Q75494768 | in | cluster_9 | 1.0 |
| Q23800370 | in | cluster_9 | 0.9998999899989999 |
| Q3288486 | in | cluster_9 | 1.0 |
| Q96076900 | in | cluster_9 | 0.987998799879988 |
| Q38196234 | in | cluster_9 | 0.9998999899989999 |
| Q24004771 | in | cluster_9 | 0.9867986798679867 |
| Q96077739 | in | cluster_8 | 0.9931993199319932 |
| Q96077740 | in | cluster_8 | 0.9913991399139914 |
| Q65589427 | in | cluster_8 | 0.9426942694269427 |
| Q43100988 | in | cluster_9 | 0.986998699869987 |
| Q503706 | in | cluster_9 | 0.988998899889989 |
| Q4491 | in | cluster_9 | 0.9871987198719872 |
| Q65589450 | in | cluster_8 | 0.9425942594259425 |
| Q75496774 | in | cluster_9 | 0.9872987298729873 |
| Q4616 | in | cluster_7 | 0.5119511951195119 |