# G-reasoner Retrieval and QA Configuration

This page documents the retrieval, QA, and path-visualization presets in `gfmrag/workflow/config/gfm_reasoner/`.
## qa_inference.yaml

This preset is used by `python -m gfmrag.workflow.qa --config-name gfm_reasoner/qa_inference`.
`gfmrag/workflow/config/gfm_reasoner/qa_inference.yaml`:

```yaml
hydra:
  run:
    dir: outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - qa_prompt: hotpotqa
  - qa_evaluator: hotpotqa

seed: 1024

llm:
  _target_: gfmrag.llms.ChatGPT
  model_name_or_path: gpt-4o-mini
  retry: 5

test:
  n_sample: -1 # Number of samples to test, -1 means all samples
  top_k: 5
  n_threads: 5
  target_types: [document] # Node types to consider for retrieval
  retrieved_result_path: null
  node_path: null
  prediction_result_path: null
```
### Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for QA inference outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List of config groups | Selects the QA prompt and evaluator presets. |
| `seed` | Integer | Random seed used during inference. |
| `llm` | Mapping | Configures the LLM that turns retrieved evidence into final answers. |
| `test` | Mapping | Controls evaluation size, retrieval inputs, and output paths. |
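The timestamped run directory comes from Hydra's built-in `now` resolver. As a rough illustration only (this toy resolver is not Hydra's implementation), the interpolation expands like this:

```python
from datetime import datetime

def resolve_run_dir(template: str, when: datetime) -> str:
    """Expand Hydra-style ``${now:FORMAT}`` markers in a run-dir template.

    Toy stand-in for Hydra's built-in ``now`` resolver, shown only to
    illustrate how the output directory name is derived.
    """
    out = template
    while "${now:" in out:
        start = out.index("${now:")
        end = out.index("}", start)
        fmt = out[start + len("${now:"):end]
        out = out[:start] + when.strftime(fmt) + out[end + 1:]
    return out

# The qa_inference run dir for a fixed timestamp:
run_dir = resolve_run_dir(
    "outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}",
    datetime(2024, 5, 1, 13, 30, 5),
)
# run_dir == "outputs/qa_inference/2024-05-01/13-30-05"
```

In the real workflow the timestamp is taken at launch time, so every run writes to a fresh directory.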
### `defaults` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_self_` | Current file | Loads the local values in this preset. |
| `qa_prompt` | `hotpotqa` by default | Prompt template used for answer generation. |
| `qa_evaluator` | `hotpotqa` by default | Evaluator used to score predicted answers. |
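Hydra merges the entries of the defaults list in order, with `_self_` marking where this file's own values are applied. A minimal sketch of that composition (a shallow toy merge; the group contents below are made up for illustration, not the real preset contents):

```python
def compose(defaults, groups, self_cfg):
    """Merge config groups in defaults-list order (shallow toy merge).

    ``_self_`` contributes the values written in this file; entries
    later in the list override earlier ones on key collisions.
    """
    merged = {}
    for entry in defaults:
        cfg = self_cfg if entry == "_self_" else groups[entry]
        merged.update(cfg)
    return merged

# Illustrative group contents (NOT the actual gfmrag presets):
cfg = compose(
    ["_self_", "qa_prompt: hotpotqa", "qa_evaluator: hotpotqa"],
    {
        "qa_prompt: hotpotqa": {"qa_prompt": {"name": "hotpotqa"}},
        "qa_evaluator: hotpotqa": {"qa_evaluator": {"name": "hotpotqa"}},
    },
    {"seed": 1024},
)
# cfg now holds seed plus the selected prompt and evaluator presets.
```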
### `llm` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.llms.ChatGPT` by default | LLM wrapper class used for answer generation. |
| `model_name_or_path` | Any supported model name | Model used to answer from retrieved evidence. |
| `retry` | Integer | Maximum number of retry attempts when the LLM call fails. |
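The `retry` field caps how many times a failed LLM call is reattempted. A minimal sketch of that behavior, assuming the wrapper simply retries on any exception (the real `ChatGPT` wrapper may add backoff or filter error types):

```python
import time

def call_with_retry(fn, retry: int = 5, delay: float = 0.0):
    """Call ``fn`` up to ``retry`` times, re-raising the last failure.

    Sketch of the intent behind ``llm.retry``; not the actual wrapper.
    """
    last_err = None
    for _ in range(retry):
        try:
            return fn()
        except Exception as err:  # sketch only: retry on any error
            last_err = err
            time.sleep(delay)
    raise last_err

# A flaky stub that succeeds on the third call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "answer"

result = call_with_retry(flaky, retry=5)
# result == "answer" after two transient failures
```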
### `test` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `n_sample` | `-1` or positive integer | Number of samples to run; `-1` means all samples. |
| `top_k` | Positive integer | Number of retrieved nodes used to build the QA prompt. |
| `n_threads` | Positive integer | Number of worker threads used during evaluation. |
| `target_types` | List such as `[document]` | Node types consumed from the retrieval results. |
| `retrieved_result_path` | File path or `null` | Path to the saved retrieval predictions. |
| `node_path` | File path or `null` | Path to `processed/stage1/nodes.csv`. |
| `prediction_result_path` | File path or `null` | Optional output path for QA predictions. |
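Together, `n_sample`, `top_k`, and `target_types` determine which samples are evaluated and which retrieved nodes reach the prompt. A sketch under assumed data shapes (the per-node `type` field and the dict layout are hypothetical, not gfmrag's actual structures):

```python
def select_for_qa(samples, retrieved, n_sample, top_k, target_types):
    """Pick the evaluation subset and prompt evidence for each sample.

    Mirrors the documented semantics: ``n_sample == -1`` keeps every
    sample, and only the first ``top_k`` retrieved nodes whose type is
    in ``target_types`` go into the QA prompt.
    """
    if n_sample != -1:
        samples = samples[:n_sample]
    evidence = {}
    for sid in samples:
        nodes = [n for n in retrieved[sid] if n["type"] in target_types]
        evidence[sid] = nodes[:top_k]
    return evidence

# Eight retrieved document nodes; the preset keeps the top 5.
retrieved = {"q1": [{"type": "document", "text": f"d{i}"} for i in range(8)]}
evidence = select_for_qa(["q1"], retrieved,
                         n_sample=-1, top_k=5, target_types=["document"])
```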
## stage3_qa_ircot_inference.yaml

This preset is used by `python -m gfmrag.workflow.qa_ircot_inference --config-name gfm_reasoner/stage3_qa_ircot_inference`.
`gfmrag/workflow/config/gfm_reasoner/stage3_qa_ircot_inference.yaml`:

```yaml
hydra:
  run:
    dir: outputs/qa_agent_inference/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S} # Output directory
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - agent_prompt: hotpotqa_ircot # The agent prompt to use
  - qa_prompt: hotpotqa # The QA prompt to use
  - ner_model: llm_ner_model # The NER model to use
  - openie_model: llm_openie_model # The OpenIE model to use
  - el_model: colbert_el_model # The EL model to use
  - qa_evaluator: hotpotqa # The QA evaluator to use
  - graph_constructor: kg_constructor # The graph constructor to use

seed: 1024

dataset:
  root: ./data
  data_name: hotpotqa_test

llm:
  _target_: gfmrag.llms.ChatGPT # The language model to use
  model_name_or_path: gpt-4o-mini # The model name or path
  retry: 5 # Number of retries

graph_retriever:
  model_path: rmanluo/G-reasoner-34M # Checkpoint path of the pre-trained G-reasoner model
  ner_model: ${ner_model} # The NER model to use
  el_model: ${el_model} # The EL model to use
  qa_evaluator: ${qa_evaluator} # The QA evaluator to use
  target_type: document # The target type for the graph retriever
  graph_constructor: ${graph_constructor} # The graph constructor to use

test:
  top_k: 10 # Number of documents to retrieve
  max_steps: 2 # Maximum number of steps
  max_test_samples: -1 # -1 for all samples
  resume: null # Resume from previous prediction
  target_types: [document] # The target type for evaluation
```
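`test.top_k` and `test.max_steps` drive an IRCoT-style loop: retrieve evidence, reason over it, and retrieve again using the latest thought until the model commits to an answer or the step budget runs out. A skeleton with stub `retrieve` and `reason` functions (the real agent's prompts and stopping criterion live in gfmrag, not here):

```python
def ircot_answer(question, retrieve, reason, top_k=10, max_steps=2):
    """Skeleton of an IRCoT-style retrieve-then-reason loop.

    ``retrieve`` and ``reason`` are stand-ins for the graph retriever
    and the LLM; ``reason`` returns (thought, answer), with answer set
    to None while the model still wants more evidence.
    """
    evidence, thoughts = [], []
    query = question
    for _ in range(max_steps):
        evidence.extend(retrieve(query, top_k))
        thought, answer = reason(question, evidence, thoughts)
        thoughts.append(thought)
        if answer is not None:  # the model committed to a final answer
            return answer, evidence
        query = thought  # retrieve again using the latest thought
    return thoughts[-1], evidence

# Stub components for illustration:
def retrieve(query, k):
    return ["Paris is the capital of France."][:k]

def reason(question, evidence, thoughts):
    return ("found it", "Paris") if evidence else ("need more", None)

answer, evidence = ircot_answer("What is the capital of France?",
                                retrieve, reason)
# answer == "Paris" after a single retrieval step
```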
### Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/qa_agent_inference/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for agent-style QA outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List of config groups | Selects prompts, graph-side components, and evaluator presets. |
| `seed` | Integer | Random seed used during inference. |
| `dataset` | Mapping | Selects the dataset root and dataset split. |
| `llm` | Mapping | Configures the reasoning LLM. |
| `graph_retriever` | Mapping | Controls the graph retriever checkpoint and graph-side dependencies. |
| `test` | Mapping | Controls retrieval depth, max reasoning steps, and resume behavior. |
### `defaults` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_self_` | Current file | Loads the local values in this preset. |
| `agent_prompt` | `hotpotqa_ircot` by default | Prompt template for iterative reasoning. |
| `qa_prompt` | `hotpotqa` by default | Prompt template used for final answer generation. |
| `ner_model` | `llm_ner_model` by default | NER preset used during retrieval-time processing. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used when graph construction is needed. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset used during retrieval-time processing. |
| `qa_evaluator` | `hotpotqa` by default | Evaluator used to score predicted answers. |
| `graph_constructor` | `kg_constructor` by default | Graph constructor preset used when stage1 needs to be built or refreshed. |
### `dataset` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `root` | Any valid data root | Root directory that contains the dataset folder. |
| `data_name` | Any dataset name | Dataset split used for reasoning-time inference. |
### `llm` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.llms.ChatGPT` by default | LLM wrapper class used for iterative reasoning. |
| `model_name_or_path` | Any supported model name | Model used by the reasoning agent. |
| `retry` | Integer | Maximum number of retry attempts when the LLM call fails. |
### `graph_retriever` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `model_path` | Local path or HF model id | Checkpoint of the pretrained G-reasoner model. |
| `ner_model` | `${ner_model}` by default | NER preset passed into the graph retriever. |
| `el_model` | `${el_model}` by default | EL preset passed into the graph retriever. |
| `qa_evaluator` | `${qa_evaluator}` by default | QA evaluator preset used by the graph retriever. |
| `target_type` | `document` or other node type | Target node type retrieved by the graph retriever. |
| `graph_constructor` | `${graph_constructor}` by default | Graph constructor preset used if stage1 data must be built. |
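The `${ner_model}`-style values are OmegaConf interpolations: the graph retriever reuses the top-level config groups instead of duplicating them. A toy resolver showing the idea (not OmegaConf's actual implementation):

```python
def resolve_refs(node, root):
    """Resolve whole-value ``${key}`` references against the top-level config.

    Toy version of OmegaConf interpolation: in the preset above,
    ``graph_retriever.ner_model: ${ner_model}`` simply reuses the
    top-level ``ner_model`` group rather than duplicating it.
    """
    out = {}
    for key, value in node.items():
        if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
            out[key] = root[value[2:-1]]  # share the referenced mapping
        else:
            out[key] = value
    return out

root = {
    "ner_model": {"_target_": "llm_ner_model"},
    "graph_retriever": {"model_path": "rmanluo/G-reasoner-34M",
                        "ner_model": "${ner_model}"},
}
retriever_cfg = resolve_refs(root["graph_retriever"], root)
# retriever_cfg["ner_model"] is the shared top-level ner_model mapping.
```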
### `test` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `top_k` | Positive integer | Number of nodes retrieved per reasoning step. |
| `max_steps` | Positive integer | Maximum number of IRCOT reasoning steps. |
| `max_test_samples` | `-1` or positive integer | Number of examples to run; `-1` means all samples. |
| `resume` | File path or `null` | Resume from a partially written prediction file. |
| `target_types` | List such as `[document]` | Target node types used during evaluation. |
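`resume` lets an interrupted run pick up where it left off. A sketch assuming predictions are stored as JSONL with an `id` field per line (the file format and field name are assumptions, not gfmrag's actual schema):

```python
import json
import tempfile

def pending_samples(sample_ids, resume_path):
    """Return the sample ids that still need predictions.

    Sketches the documented ``test.resume`` behavior: ids already in
    the previous prediction file are skipped; ``None`` means start over.
    """
    done = set()
    if resume_path is not None:
        with open(resume_path) as fh:
            for line in fh:
                done.add(json.loads(line)["id"])
    return [sid for sid in sample_ids if sid not in done]

# Previous run predicted q1 and q2; only q3 is left.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as fh:
    fh.write(json.dumps({"id": "q1", "answer": "x"}) + "\n")
    fh.write(json.dumps({"id": "q2", "answer": "y"}) + "\n")
    path = fh.name

remaining = pending_samples(["q1", "q2", "q3"], path)
# remaining == ["q3"]
```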
## visualize_path.yaml

This preset is used for path-visualization experiments on `GraphIndexDataset`.
`gfmrag/workflow/config/gfm_reasoner/visualize_path.yaml`:

```yaml
hydra:
  run:
    dir: outputs/experiments/visualize/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_

timeout: 60
seed: 1024
load_model_from_pretrained: null # Load model from pre-trained format, which would overwrite the model configuration

dataset:
  _target_: gfmrag.graph_index_datasets.GraphIndexDataset # The QA dataset class
  data_name: hotpotqa_test_v2
  root: ./data # data root directory
  force_reload: False # Whether to force rebuild the dataset

test_max_sample: 100
```
### Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/experiments/visualize/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for visualization outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List | Loads the local values in this preset. |
| `timeout` | Positive integer | Timeout in minutes for multi-GPU execution. |
| `seed` | Integer | Random seed used during the experiment. |
| `load_model_from_pretrained` | File path or `null` | Optional pretrained checkpoint that overrides the model definition. |
| `dataset` | Mapping | Dataset configuration for the visualization run. |
| `test_max_sample` | Positive integer | Maximum number of samples used for visualization. |
### `dataset` Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.graph_index_datasets.GraphIndexDataset` | Dataset class used by the visualization script. |
| `data_name` | Any dataset name | Dataset split used for visualization. |
| `root` | Any valid data root | Root directory that contains the dataset folder. |
| `force_reload` | `True`, `False` | Whether to rebuild the dataset cache before visualization. |
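`force_reload` controls whether the processed dataset cache is trusted or rebuilt. A sketch of the semantics (the cache path layout here is invented for illustration, not `GraphIndexDataset`'s real on-disk format):

```python
import os
import tempfile

def load_dataset(root, data_name, build_fn, force_reload=False):
    """Load a cached processed dataset, rebuilding when needed.

    Rebuilds when ``force_reload`` is True or the cache is missing;
    otherwise the cached bytes are returned as-is.
    """
    cache = os.path.join(root, data_name, "processed", "cache.bin")
    if force_reload or not os.path.exists(cache):
        data = build_fn()
        os.makedirs(os.path.dirname(cache), exist_ok=True)
        with open(cache, "wb") as fh:
            fh.write(data)
        return data
    with open(cache, "rb") as fh:
        return fh.read()

# Count how often the (stub) build step actually runs.
builds = {"n": 0}
def build():
    builds["n"] += 1
    return b"graph-index"

with tempfile.TemporaryDirectory() as root:
    first = load_dataset(root, "hotpotqa_test_v2", build)                    # cache miss: builds
    second = load_dataset(root, "hotpotqa_test_v2", build)                   # cache hit: no rebuild
    third = load_dataset(root, "hotpotqa_test_v2", build, force_reload=True) # forced rebuild
```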