Skip to content

G-reasoner Graph Index Configuration

This page documents the graph-index preset used by the G-reasoner workflow family.

index_dataset.yaml

This preset is used by python -m gfmrag.workflow.index_dataset --config-name gfm_reasoner/index_dataset.

gfmrag/workflow/config/gfm_reasoner/index_dataset.yaml

gfmrag/workflow/config/gfm_reasoner/index_dataset.yaml
hydra:
  run:
    dir: outputs/kg_construction/${now:%Y-%m-%d}/${now:%H-%M-%S} # Output directory
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - ner_model: llm_ner_model # The NER model to use
  - openie_model: llm_openie_model # The OpenIE model to use
  - el_model: colbert_el_model # The default EL model to use
  - text_emb_model: qwen3_8b # The text embedding model used by hipporag2 SFT constructor
  - graph_constructor: kg_constructor # The graph constructor to use
  - sft_constructor: hipporag2_sft_constructor # The SFT constructor to use

dataset:
  root: ./data # data root directory
  data_name: hotpotqa # data name
  force: False # Whether to force recompute the dataset

Compared with the GFM-RAG preset, this file additionally selects a text_emb_model because the default hipporag2_sft_constructor uses text embeddings during supervision-data construction.

Top-level Fields

Parameter Options Note
hydra.run.dir outputs/kg_construction/${now:%Y-%m-%d}/${now:%H-%M-%S} Directory used by Hydra for runtime logs and outputs.
hydra.searchpath pkg://gfmrag.workflow.config Adds the packaged workflow config directory to Hydra's search path.
defaults List of config groups Selects the shared component presets used by indexing.
dataset Mapping Controls the dataset root, dataset name, and whether to force recomputation.

defaults Fields

Parameter Options Note
_self_ Current file Loads the local values in this preset.
ner_model llm_ner_model by default Named entity recognition preset used by the SFT constructor.
openie_model llm_openie_model by default OpenIE preset used by the graph constructor.
el_model colbert_el_model by default Entity-linking preset used by both graph construction and supervision-data construction.
text_emb_model qwen3_8b by default Text embedding preset used by hipporag2_sft_constructor.
graph_constructor kg_constructor by default Graph construction preset that builds the stage1 graph files.
sft_constructor hipporag2_sft_constructor by default SFT constructor preset used to build G-reasoner supervision data.

dataset Fields

Parameter Options Note
root Any valid data root Root directory that contains the dataset folder.
data_name Any dataset name Dataset name under root.
force True, False Whether to rebuild the processed outputs even if cached files already exist.