GFM-RAG Retrieval And QA Config

This page documents the retrieval and inference presets in gfmrag/workflow/config/gfm_rag/.

Files Covered

  • qa_inference.yaml
  • qa_ircot_inference.yaml
  • visualize_path.yaml
  • exp_visualize_path.yaml

qa_inference.yaml

This preset is used by python -m gfmrag.workflow.qa to turn saved retrieval outputs into final QA predictions.

gfmrag/workflow/config/gfm_rag/qa_inference.yaml

hydra:
  run:
    dir: outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - qa_prompt: hotpotqa
  - qa_evaluator: hotpotqa

seed: 1024

llm:
  _target_: gfmrag.llms.ChatGPT
  model_name_or_path: gpt-4o-mini
  retry: 5

test:
  n_sample: -1 # Number of samples to test, -1 means all samples
  top_k: 5
  n_threads: 5
  target_types: [document] # Node types to consider for retrieval
  retrieved_result_path: null
  node_path: null
  prediction_result_path: null

Top-level Fields

Parameter | Options | Note
hydra.run.dir | outputs/qa_inference/<date>/<time>/ | Directory used by Hydra for QA inference outputs.
defaults | List of config groups | Pulls in qa_prompt and qa_evaluator.
llm | Mapping | Configures the answer-generation model.
test | Mapping | Points to the retrieval result file, node table, and decoding options.

llm and test Fields

Parameter | Options | Note
llm.model_name_or_path | Model name or path | LLM used to generate final answers.
test.retrieved_result_path | File path | Path to retrieval predictions, typically predictions_<data_name>.json.
test.node_path | File path | Path to processed/stage1/nodes.csv.
test.top_k | Positive integer | Number of retrieved nodes used to build the QA prompt.
test.target_types | List of node types | Target node types to read from the retrieval output.
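
The preset ships with retrieved_result_path and node_path set to null, so both presumably have to be supplied before a run. The fragment below is a minimal sketch of a filled-in test block. The two paths are hypothetical placeholders chosen only to match the file names in the table above, and the remaining values are copied from the preset.

```yaml
# Sketch only: both paths below are hypothetical examples, not project defaults.
test:
  n_sample: -1                 # -1 means all samples (from the preset)
  top_k: 5                     # number of retrieved nodes used in the QA prompt
  n_threads: 5
  target_types: [document]     # node types read from the retrieval output
  retrieved_result_path: outputs/retrieval/predictions_hotpotqa_test.json  # hypothetical
  node_path: data/hotpotqa_test/processed/stage1/nodes.csv                 # hypothetical
  prediction_result_path: null # left as in the preset
```

Because the preset is a Hydra config, the same keys can usually also be set as command-line overrides of the form test.node_path=<path>.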

qa_ircot_inference.yaml

This preset is used by python -m gfmrag.workflow.qa_ircot_inference to run retrieval and IRCOT-style reasoning in one workflow.

defaults Fields

Parameter | Options | Note
agent_prompt | hotpotqa_ircot by default | Prompt template for iterative reasoning.
qa_prompt | hotpotqa by default | Prompt template used for final answer generation.
ner_model | llm_ner_model by default | NER preset used during retrieval-time processing.
openie_model | llm_openie_model by default | OpenIE preset used when graph construction is needed.
el_model | colbert_el_model by default | Entity-linking preset used during retrieval-time processing.
qa_evaluator | hotpotqa by default | Evaluator used to score predicted answers.
graph_constructor | kg_constructor by default | Graph constructor preset used if stage1 data must be built.
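
Read together, the defaults in the table correspond to a Hydra defaults list along the following lines. This is a sketch reconstructed from the table, not a copy of the file. The entry order and the position of _self_ are assumptions borrowed from qa_inference.yaml.

```yaml
# Sketch reconstructed from the defaults table; the ordering is an assumption.
defaults:
  - _self_
  - agent_prompt: hotpotqa_ircot      # prompt for iterative reasoning
  - qa_prompt: hotpotqa               # prompt for final answer generation
  - ner_model: llm_ner_model
  - openie_model: llm_openie_model
  - el_model: colbert_el_model
  - qa_evaluator: hotpotqa
  - graph_constructor: kg_constructor # used only if stage1 data must be built
```

In Hydra, the position of _self_ relative to the other entries decides whether values in this file override the selected group configs or are overridden by them, so the ordering should be checked against the shipped preset.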

Key Fields

Parameter | Options | Note
dataset | Mapping | Selects the dataset root and test split.
llm | Mapping | Chooses the reasoning and answer-generation model.
graph_retriever | Mapping | Selects the checkpoint and graph-side components.
test | Mapping | Controls top_k, max_steps, resume path, and target types.
graph_retriever.model_path | rmanluo/GFM-RAG-8M by default | Checkpoint path of the pretrained model.
graph_retriever.graph_constructor | ${graph_constructor} | Constructor used when stage1 needs to be built.
test.max_steps | Positive integer | Maximum IRCOT reasoning steps.
test.resume | File path or null | Resume from a partially written prediction file.
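
As a rough illustration of how these fields fit together, the fragment below sketches the graph_retriever and test blocks. The two graph_retriever values come from the table above. The numeric values for top_k and max_steps and the target_types entry are illustrative assumptions, not the preset's defaults.

```yaml
# Sketch only: values marked "illustrative" are assumptions, not shipped defaults.
graph_retriever:
  model_path: rmanluo/GFM-RAG-8M          # default checkpoint (from the table)
  graph_constructor: ${graph_constructor} # resolved from the defaults list
test:
  top_k: 5                  # illustrative
  max_steps: 3              # illustrative cap on IRCOT reasoning steps
  resume: null              # or a path to a partially written prediction file
  target_types: [document]  # illustrative; mirrors qa_inference.yaml
```

Setting test.resume to an existing prediction file is the documented way to continue an interrupted run. How already-answered samples are detected is not described here and should be checked in the workflow code.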

Visualization Presets

visualize_path.yaml

Use this preset for path visualization on a single GraphIndexDatasetV1 dataset. It loads the dataset directly rather than using dataset.root and dataset.data_name as a separate pair.

exp_visualize_path.yaml

This experimental preset adds retrieval-oriented controls such as:

  • test.retrieval_batch_size
  • test.save_retrieval
  • test.save_top_k_entity
  • test.max_sample