G-reasoner Retrieval And QA Configuration

This page documents the retrieval, QA, and path-visualization presets in gfmrag/workflow/config/gfm_reasoner/.

qa_inference.yaml

This preset is used by python -m gfmrag.workflow.qa --config-name gfm_reasoner/qa_inference.

gfmrag/workflow/config/gfm_reasoner/qa_inference.yaml

hydra:
  run:
    dir: outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - qa_prompt: hotpotqa
  - qa_evaluator: hotpotqa

seed: 1024

llm:
  _target_: gfmrag.llms.ChatGPT
  model_name_or_path: gpt-4o-mini
  retry: 5

test:
  n_sample: -1 # Number of samples to test, -1 means all samples
  top_k: 5
  n_threads: 5
  target_types: [document] # Node types to consider for retrieval
  retrieved_result_path: null
  node_path: null
  prediction_result_path: null

Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for QA inference outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List of config groups | Selects the QA prompt and evaluator presets. |
| `seed` | Integer | Random seed used during inference. |
| `llm` | Mapping | Configures the LLM that turns retrieved evidence into final answers. |
| `test` | Mapping | Controls evaluation size, retrieval inputs, and output paths. |
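
The `${now:...}` placeholders in `hydra.run.dir` take `strftime`-style format strings, so every run writes to a directory stamped with its date and time. The sketch below mimics that expansion with the standard library only. `expand_run_dir` is a hypothetical helper written for illustration and is not part of Hydra or gfmrag.

```python
from datetime import datetime


def expand_run_dir(prefix: str, when: datetime) -> str:
    """Mimic <prefix>/${now:%Y-%m-%d}/${now:%H-%M-%S} for a fixed timestamp."""
    date_part = when.strftime("%Y-%m-%d")
    time_part = when.strftime("%H-%M-%S")
    return f"{prefix}/{date_part}/{time_part}"


# A run started on 31 January 2025 at 09:05:07.
run_dir = expand_run_dir("outputs/qa_inference", datetime(2025, 1, 31, 9, 5, 7))
print(run_dir)  # outputs/qa_inference/2025-01-31/09-05-07
```

Because the timestamp goes down to the second, separate runs normally end up in separate directories.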

defaults Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_self_` | Current file | Loads the local values in this preset. |
| `qa_prompt` | `hotpotqa` by default | Prompt template used for answer generation. |
| `qa_evaluator` | `hotpotqa` by default | Evaluator used to score predicted answers. |

llm Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.llms.ChatGPT` by default | LLM wrapper class used for answer generation. |
| `model_name_or_path` | Any supported model name | Model used to answer from retrieved evidence. |
| `retry` | Integer | Maximum number of retry attempts when the LLM call fails. |
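
One way to picture the `retry` setting is as a bounded retry loop around the LLM call. The sketch below is an assumption about the general pattern, not the code inside `gfmrag.llms.ChatGPT`. `call_with_retry` and `flaky` are hypothetical names, and `retry` is treated here as the total number of attempts.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], retry: int) -> T:
    """Call fn up to `retry` times and return the first successful result."""
    last_error: Optional[Exception] = None
    for _ in range(retry):
        try:
            return fn()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
    raise RuntimeError(f"call failed after {retry} attempts") from last_error


# Stand-in for an LLM call that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "answer"


result = call_with_retry(flaky, retry=5)
print(result, attempts["count"])  # answer 3
```

A production client would usually also wait between attempts. The sketch leaves that out to stay short.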

test Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `n_sample` | `-1` or positive integer | Number of samples to run. `-1` means all samples. |
| `top_k` | Positive integer | Number of retrieved nodes used to build the QA prompt. |
| `n_threads` | Positive integer | Number of worker threads used during evaluation. |
| `target_types` | List such as `[document]` | Node types consumed from the retrieval results. |
| `retrieved_result_path` | File path or `null` | Path to the saved retrieval predictions. |
| `node_path` | File path or `null` | Path to `processed/stage1/nodes.csv`. |
| `prediction_result_path` | File path or `null` | Optional output path for QA predictions. |
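
The sketch below shows how `n_sample`, `target_types`, and `top_k` could interact when evidence is picked for the QA prompt. The shape of the retrieval results (a mapping from node type to scored nodes) and both helper names are assumptions made for illustration. The actual file format written by the retrieval stage is not described on this page.

```python
from typing import Dict, List


def limit_samples(samples: List[dict], n_sample: int) -> List[dict]:
    """Keep the first n_sample samples; a negative value keeps all of them."""
    return samples if n_sample < 0 else samples[:n_sample]


def select_evidence(
    retrieved: Dict[str, List[dict]], target_types: List[str], top_k: int
) -> Dict[str, List[dict]]:
    """Keep the top_k highest-scoring nodes for each requested node type."""
    selected = {}
    for node_type in target_types:
        ranked = sorted(
            retrieved.get(node_type, []), key=lambda node: node["score"], reverse=True
        )
        selected[node_type] = ranked[:top_k]
    return selected


# Hypothetical retrieval output for one question.
retrieved = {
    "document": [
        {"id": "doc_a", "score": 0.2},
        {"id": "doc_b", "score": 0.9},
        {"id": "doc_c", "score": 0.5},
    ],
    "entity": [{"id": "ent_x", "score": 0.7}],
}
evidence = select_evidence(retrieved, target_types=["document"], top_k=2)
print([node["id"] for node in evidence["document"]])  # ['doc_b', 'doc_c']
```

With `target_types: [document]`, the entity nodes in this example are ignored entirely.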

stage3_qa_ircot_inference.yaml

This preset is used by python -m gfmrag.workflow.qa_ircot_inference --config-name gfm_reasoner/stage3_qa_ircot_inference.

gfmrag/workflow/config/gfm_reasoner/stage3_qa_ircot_inference.yaml

hydra:
  run:
    dir: outputs/qa_agent_inference/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S} # Output directory
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - agent_prompt: hotpotqa_ircot # The agent prompt to use
  - qa_prompt: hotpotqa # The QA prompt to use
  - ner_model: llm_ner_model # The NER model to use
  - openie_model: llm_openie_model # The OpenIE model to use
  - el_model: colbert_el_model # The EL model to use
  - qa_evaluator: hotpotqa # The QA evaluator to use
  - graph_constructor: kg_constructor # The graph constructor to use

seed: 1024

dataset:
  root: ./data
  data_name: hotpotqa_test

llm:
  _target_: gfmrag.llms.ChatGPT # The language model to use
  model_name_or_path: gpt-4o-mini # The model name or path
  retry: 5 # Number of retries

graph_retriever:
  model_path: rmanluo/G-reasoner-34M # Checkpoint path of the pre-trained G-reasoner model
  ner_model: ${ner_model} # The NER model to use
  el_model: ${el_model} # The EL model to use
  qa_evaluator: ${qa_evaluator} # The QA evaluator to use
  target_type: document # The target type for the graph retriever
  graph_constructor: ${graph_constructor} # The graph constructor to use

test:
  top_k: 10 # Number of documents to retrieve
  max_steps: 2 # Maximum number of steps
  max_test_samples: -1 # -1 for all samples
  resume: null # Resume from previous prediction
  target_types: [document] # The target type for evaluation

Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/qa_agent_inference/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for agent-style QA outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List of config groups | Selects prompts, graph-side components, and evaluator presets. |
| `seed` | Integer | Random seed used during inference. |
| `dataset` | Mapping | Selects the dataset root and dataset split. |
| `llm` | Mapping | Configures the reasoning LLM. |
| `graph_retriever` | Mapping | Controls the graph retriever checkpoint and graph-side dependencies. |
| `test` | Mapping | Controls retrieval depth, max reasoning steps, and resume behavior. |

defaults Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_self_` | Current file | Loads the local values in this preset. |
| `agent_prompt` | `hotpotqa_ircot` by default | Prompt template for iterative reasoning. |
| `qa_prompt` | `hotpotqa` by default | Prompt template used for final answer generation. |
| `ner_model` | `llm_ner_model` by default | NER preset used during retrieval-time processing. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used when graph construction is needed. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset used during retrieval-time processing. |
| `qa_evaluator` | `hotpotqa` by default | Evaluator used to score predicted answers. |
| `graph_constructor` | `kg_constructor` by default | Graph constructor preset used when stage1 needs to be built or refreshed. |

dataset Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `root` | Any valid data root | Root directory that contains the dataset folder. |
| `data_name` | Any dataset name | Dataset split used for reasoning-time inference. |

llm Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.llms.ChatGPT` by default | LLM wrapper class used for iterative reasoning. |
| `model_name_or_path` | Any supported model name | Model used by the reasoning agent. |
| `retry` | Integer | Maximum number of retry attempts when the LLM call fails. |
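
The `_target_` key follows Hydra's object-instantiation convention: the dotted path names a class, and the remaining keys become constructor arguments. In real code this is done by `hydra.utils.instantiate`. The sketch below is a simplified, standard-library approximation of the idea, and it uses `datetime.timedelta` as a stand-in target so that it runs without gfmrag installed.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Simplified stand-in for Hydra instantiation: import _target_, pass the rest as kwargs."""
    params = dict(config)  # copy so the caller's mapping is left untouched
    module_path, _, class_name = params.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**params)


# Stand-in config; the llm block has the same shape, with _target_ set to
# gfmrag.llms.ChatGPT and model_name_or_path / retry as keyword arguments.
delta = instantiate({"_target_": "datetime.timedelta", "minutes": 5})
print(delta.total_seconds())  # 300.0
```

Swapping the LLM backend therefore comes down to changing `_target_` and supplying the keyword arguments that class expects.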

graph_retriever Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `model_path` | Local path or HF model id | Checkpoint of the pretrained G-reasoner model. |
| `ner_model` | `${ner_model}` by default | NER preset passed into the graph retriever. |
| `el_model` | `${el_model}` by default | EL preset passed into the graph retriever. |
| `qa_evaluator` | `${qa_evaluator}` by default | QA evaluator preset used by the graph retriever. |
| `target_type` | `document` or other node type | Target node type retrieved by the graph retriever. |
| `graph_constructor` | `${graph_constructor}` by default | Graph constructor preset used if stage1 data must be built. |
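
Values such as `${ner_model}` are interpolations: they point back at the top-level key of the same name, which the `defaults` list fills from a config group. OmegaConf resolves these for Hydra. The sketch below is a much-reduced approximation that handles only whole-value references, and the contents of the `ner_model` and `el_model` entries are placeholders, not the real presets.

```python
import re
from typing import Any

# Matches values that consist solely of one reference, e.g. "${ner_model}".
_REFERENCE = re.compile(r"^\$\{([A-Za-z_][\w.]*)\}$")


def resolve(node: Any, root: dict) -> Any:
    """Replace whole-value ${key} references with the value found at root[key]."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, root) for item in node]
    if isinstance(node, str):
        match = _REFERENCE.match(node)
        if match:
            target: Any = root
            for part in match.group(1).split("."):
                target = target[part]
            return resolve(target, root)
    return node


config = {
    "ner_model": {"name": "llm_ner_model"},    # placeholder preset contents
    "el_model": {"name": "colbert_el_model"},  # placeholder preset contents
    "graph_retriever": {
        "ner_model": "${ner_model}",
        "el_model": "${el_model}",
        "target_type": "document",
    },
}
resolved = resolve(config, config)
print(resolved["graph_retriever"]["ner_model"])  # {'name': 'llm_ner_model'}
```

Changing the `ner_model` entry in `defaults` therefore changes what `graph_retriever.ner_model` receives, without editing the `graph_retriever` block itself.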

test Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `top_k` | Positive integer | Number of nodes retrieved per reasoning step. |
| `max_steps` | Positive integer | Maximum number of IRCoT reasoning steps. |
| `max_test_samples` | `-1` or positive integer | Number of examples to run. `-1` means all samples. |
| `resume` | File path or `null` | Resume from a partially written prediction file. |
| `target_types` | List such as `[document]` | Target node types used during evaluation. |
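
To show how `top_k` and `max_steps` bound the work done per question, the sketch below implements a generic IRCoT-style loop: retrieve, reason, then reuse the latest thought as the next query. It illustrates the general pattern only and is not the loop in `gfmrag.workflow.qa_ircot_inference`. `ircot_loop`, `retrieve`, and `reason` are hypothetical names, and the two callables are toy stand-ins for the graph retriever and the LLM.

```python
from typing import Callable, List, Tuple


def ircot_loop(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    reason: Callable[[str, List[str], List[str]], Tuple[str, bool]],
    top_k: int,
    max_steps: int,
) -> Tuple[List[str], List[str]]:
    """Alternate retrieval and reasoning for at most max_steps rounds."""
    docs: List[str] = []
    thoughts: List[str] = []
    query = question
    for _ in range(max_steps):
        for doc in retrieve(query, top_k):  # at most top_k new candidates per step
            if doc not in docs:
                docs.append(doc)
        thought, done = reason(question, docs, thoughts)
        thoughts.append(thought)
        if done:
            break
        query = thought  # the next retrieval is driven by the latest thought
    return docs, thoughts


# Toy stand-ins for the retriever and the LLM.
corpus = {"q": ["a", "b", "c"], "t1": ["c", "d"]}


def retrieve(query: str, top_k: int) -> List[str]:
    return corpus.get(query, [])[:top_k]


def reason(question: str, docs: List[str], thoughts: List[str]) -> Tuple[str, bool]:
    return ("t1", False) if not thoughts else ("final answer", True)


docs, thoughts = ircot_loop("q", retrieve, reason, top_k=2, max_steps=2)
print(docs, thoughts)  # ['a', 'b', 'c', 'd'] ['t1', 'final answer']
```

In this pattern, raising `max_steps` allows more retrieval rounds per question at the cost of more LLM calls, while `top_k` caps how many nodes each round can add.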

visualize_path.yaml

This preset is used for path-visualization experiments on GraphIndexDataset.

gfmrag/workflow/config/gfm_reasoner/visualize_path.yaml

hydra:
  run:
    dir: outputs/experiments/visualize/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_

timeout: 60
seed: 1024

load_model_from_pretrained: null # Load model from pre-trained format, which would overwrite the model configuration

dataset:
  _target_: gfmrag.graph_index_datasets.GraphIndexDataset # The QA dataset class
  data_name: hotpotqa_test_v2
  root: ./data # data root directory
  force_reload: False # Whether to force rebuild the dataset

test_max_sample: 100

Top-level Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `hydra.run.dir` | `outputs/experiments/visualize/${dataset.data_name}/${now:%Y-%m-%d}/${now:%H-%M-%S}` | Directory used by Hydra for visualization outputs. |
| `hydra.searchpath` | `pkg://gfmrag.workflow.config` | Adds the packaged workflow config directory to Hydra's search path. |
| `defaults` | List | Contains only `_self_`, so only the local values in this preset are loaded. |
| `timeout` | Positive integer | Timeout in minutes for multi-GPU execution. |
| `seed` | Integer | Random seed used during the experiment. |
| `load_model_from_pretrained` | File path or `null` | Optional pretrained checkpoint that overrides the model definition. |
| `dataset` | Mapping | Dataset configuration for the visualization run. |
| `test_max_sample` | Positive integer | Maximum number of samples used for visualization. |

dataset Fields

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.graph_index_datasets.GraphIndexDataset` | Dataset class used by the visualization script. |
| `data_name` | Any dataset name | Dataset split used for visualization. |
| `root` | Any valid data root | Root directory that contains the dataset folder. |
| `force_reload` | `True`, `False` | Whether to rebuild the dataset cache before visualization. |