KG-index Config
This page documents the top-level indexing presets that reference the graph constructor and SFT constructor config groups.
Files Covered
- `gfmrag/workflow/config/gfm_rag/index_dataset.yaml`
- `gfmrag/workflow/config/gfm_reasoner/index_dataset.yaml`
Purpose
These presets are used by `python -m gfmrag.workflow.index_dataset` to:
- select shared construction components
- choose the graph constructor preset
- choose the SFT constructor preset
- specify the dataset root and dataset name
Shared Top-level Fields
| Parameter | Options | Note |
|---|---|---|
| `hydra.run.dir` | `outputs/kg_construction/<date>/<time>/` | Directory used by Hydra for runtime logs and outputs. |
| `defaults` | List of config groups | Pulls in component config groups. |
| `dataset` | Mapping | Chooses the dataset root, dataset name, and force flag. |
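As an illustration of the `hydra.run.dir` row, a timestamped run directory is usually expressed in Hydra with the built-in `now` resolver. The fragment below is a minimal sketch; the exact date and time format strings are assumptions and may differ from the shipped presets.

```yaml
# Sketch only: the strftime patterns below are assumed, not copied from the presets.
hydra:
  run:
    # <date>/<time> from the table, written with Hydra's `now` resolver
    dir: outputs/kg_construction/${now:%Y-%m-%d}/${now:%H-%M-%S}
```

Each invocation then writes its logs and outputs to a fresh directory, so repeated indexing runs do not overwrite each other's logs.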
Defaults in `gfm_rag/index_dataset.yaml`
| Parameter | Options | Note |
|---|---|---|
| `ner_model` | `llm_ner_model` by default | Named entity recognition preset. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used by the graph constructor. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset. |
| `graph_constructor` | `kg_constructor` by default | Graph construction preset. |
| `sft_constructor` | `gfm_rag_sft_constructor` by default | SFT constructor preset. |
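The table above corresponds to a Hydra defaults list, where each entry maps a config group to the preset selected from it. A minimal sketch, assuming the standard `group: option` syntax; the entry order and the position of `_self_` are assumptions, so check the actual file before relying on them.

```yaml
# Sketch of the defaults list in gfm_rag/index_dataset.yaml (order is assumed).
defaults:
  - _self_                                    # assumption: placement not confirmed by this page
  - ner_model: llm_ner_model                  # named entity recognition
  - openie_model: llm_openie_model            # OpenIE used by the graph constructor
  - el_model: colbert_el_model                # entity linking
  - graph_constructor: kg_constructor         # graph construction
  - sft_constructor: gfm_rag_sft_constructor  # SFT data construction
```

Swapping a component means changing the option on the right-hand side to another preset file that exists in the same config group.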
Defaults in `gfm_reasoner/index_dataset.yaml`
| Parameter | Options | Note |
|---|---|---|
| `ner_model` | `llm_ner_model` by default | Named entity recognition preset. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used by the graph constructor. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset. |
| `text_emb_model` | `qwen3_8b` by default | Text embedding preset used by `hipporag2_sft_constructor`. |
| `graph_constructor` | `kg_constructor` by default | Graph construction preset. |
| `sft_constructor` | `hipporag2_sft_constructor` by default | SFT constructor preset. |
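Compared with the `gfm_rag` preset, this one adds a `text_emb_model` group and switches the SFT constructor. The same kind of sketch, with the same caveats (entry order and `_self_` placement are assumptions):

```yaml
# Sketch of the defaults list in gfm_reasoner/index_dataset.yaml (order is assumed).
defaults:
  - _self_                                      # assumption: placement not confirmed by this page
  - ner_model: llm_ner_model
  - openie_model: llm_openie_model
  - el_model: colbert_el_model
  - text_emb_model: qwen3_8b                    # used by hipporag2_sft_constructor
  - graph_constructor: kg_constructor
  - sft_constructor: hipporag2_sft_constructor  # differs from the gfm_rag preset
```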
`dataset` Fields
| Parameter | Options | Note |
|---|---|---|
| `dataset.root` | Any valid data root | Root directory that contains the dataset folder. |
| `dataset.data_name` | Any dataset name | Dataset name under `root`. |
| `dataset.force` | `True`, `False` | Whether to force recomputation even if cached files exist. |
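For orientation, the three `dataset` fields could be written as the mapping below. The values are hypothetical placeholders, not the shipped defaults, and the default for `force` is not stated on this page.

```yaml
# Hypothetical values for illustration only.
dataset:
  root: ./data           # directory that contains the dataset folder
  data_name: my_dataset  # folder name under root, i.e. ./data/my_dataset
  force: False           # set to True to recompute even if cached files exist
```

Because these are plain Hydra config keys, they can typically also be overridden at launch with `key=value` arguments such as `dataset.data_name=my_dataset`.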
Constructor Config Groups
The graph constructor and SFT constructor config groups are documented on their own pages.
Use those pages when you want to change constructor-specific fields such as `num_processes`, `threshold`, `topk`, `enable_filtering`, or temporary working directories.