GFM-RAG Graph Index Configuration
This page documents the graph-index preset used by the GFM-RAG workflow family.
index_dataset.yaml
This preset is used by `python -m gfmrag.workflow.index_dataset`.
gfmrag/workflow/config/gfm_rag/index_dataset.yaml
hydra:
  run:
    dir: outputs/kg_construction/${now:%Y-%m-%d}/${now:%H-%M-%S} # Output directory
  searchpath:
    - pkg://gfmrag.workflow.config
defaults:
  - _self_
  - ner_model: llm_ner_model # The NER model to use
  - openie_model: llm_openie_model # The OpenIE model to use
  - el_model: colbert_el_model # The EL model to use
  - graph_constructor: kg_constructor
  - sft_constructor: gfm_rag_sft_constructor
dataset:
  root: ./data # data root directory
  data_name: hotpotqa # data name
  force: False # Whether to force recompute the dataset
Top-level Fields
| Parameter | Options | Note |
|---|---|---|
| `hydra.run.dir` | `outputs/kg_construction/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for runtime logs and outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List of config groups | Selects the shared component presets used by indexing. |
| `dataset` | Mapping | Controls the dataset root, dataset name, and whether to force recomputation. |
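To make the `hydra.run.dir` pattern concrete, the sketch below mimics how the two `${now:...}` interpolations expand into a dated output directory. It assumes that Hydra's `now` resolver formats the current time with `strftime`-style patterns. The helper name `expand_run_dir` is hypothetical and is not part of GFM-RAG or Hydra; Hydra performs this expansion itself at startup.

```python
from datetime import datetime


def expand_run_dir(now: datetime) -> str:
    """Approximate the expansion of hydra.run.dir for a given timestamp.

    Hypothetical helper for illustration only.
    """
    date_part = now.strftime("%Y-%m-%d")  # mirrors ${now:%Y-%m-%d}
    time_part = now.strftime("%H-%M-%S")  # mirrors ${now:%H-%M-%S}
    return f"outputs/kg_construction/{date_part}/{time_part}"


if __name__ == "__main__":
    # Each run started at a different second gets its own directory.
    print(expand_run_dir(datetime.now()))
```

Because the time component has one-second resolution, two runs launched at different seconds write to separate directories under `outputs/kg_construction/`.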
defaults Fields
| Parameter | Options | Note |
|---|---|---|
| `_self_` | Current file | Loads the local values in this preset. |
| `ner_model` | `llm_ner_model` by default | Named entity recognition preset used when building processed QA data. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used by the graph constructor. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset used during graph and supervision construction. |
| `graph_constructor` | `kg_constructor` by default | Graph construction preset that builds the stage1 graph files. |
| `sft_constructor` | `gfm_rag_sft_constructor` by default | SFT constructor preset that prepares processed supervision files. |
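The entries in `defaults` are config-group selections, so they can usually be changed per run without editing the preset. The sketch below builds a command line for the indexing workflow, assuming that Hydra's standard `key=value` override syntax applies here as it does in other Hydra applications. The helper `build_index_command` is hypothetical, and the example only uses option names that already appear in this preset.

```python
import shlex
import sys


def build_index_command(overrides: dict[str, str]) -> list[str]:
    """Build an argv list for the indexing workflow with Hydra-style overrides.

    Hypothetical helper: keys are config-group names or dotted config paths,
    values are the options or values to select.
    """
    command = [sys.executable, "-m", "gfmrag.workflow.index_dataset"]
    command += [f"{key}={value}" for key, value in overrides.items()]
    return command


if __name__ == "__main__":
    argv = build_index_command(
        {
            "el_model": "colbert_el_model",  # config-group selection from the defaults list
            "dataset.data_name": "hotpotqa",  # plain value override
        }
    )
    # Print the command; pass argv to subprocess.run(...) to actually launch it.
    print(shlex.join(argv))
```

Building the argument list rather than a single string avoids shell-quoting problems if a value ever contains spaces or special characters.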
dataset Fields
| Parameter | Options | Note |
|---|---|---|
| `root` | Any valid data root | Root directory that contains the dataset folder. |
| `data_name` | Any dataset name | Dataset name under `root`. |
| `force` | `True`, `False` | Whether to rebuild the processed outputs even if cached files already exist. |
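The interaction between `root`, `data_name`, and `force` can be sketched as follows. This is a minimal illustration of the caching behaviour described in the table, not the actual GFM-RAG implementation: the function names are hypothetical, and the `processed` subfolder in the usage example is an assumed location for cached outputs.

```python
from pathlib import Path


def dataset_dir(root: str, data_name: str) -> Path:
    """Return the dataset folder, i.e. <root>/<data_name>."""
    return Path(root) / data_name


def needs_rebuild(output_dir: Path, force: bool) -> bool:
    """Rebuild when forced, or when no cached outputs exist yet.

    Hypothetical check: treats a non-empty directory as a valid cache.
    """
    has_cache = output_dir.is_dir() and any(output_dir.iterdir())
    return force or not has_cache


if __name__ == "__main__":
    # "processed" is an assumed subfolder name, used only for illustration.
    cache = dataset_dir("./data", "hotpotqa") / "processed"
    print(cache, needs_rebuild(cache, force=False))
```

With `force=False`, existing outputs are reused; setting `force=True` triggers a rebuild regardless of what is already on disk.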