# GFM-RAG Retrieval and QA Config

This page documents the retrieval, QA inference, and path-visualization presets in `gfmrag/workflow/config/gfm_rag/`.
## Files Covered

- `qa_inference.yaml`
- `qa_ircot_inference.yaml`
- `visualize_path.yaml`
- `exp_visualize_path.yaml`
## `qa_inference.yaml`

This preset is used by `python -m gfmrag.workflow.qa` to turn saved retrieval outputs into final QA predictions.
`gfmrag/workflow/config/gfm_rag/qa_inference.yaml`
```yaml
hydra:
  run:
    dir: outputs/qa_inference/${now:%Y-%m-%d}/${now:%H-%M-%S}
  searchpath:
    - pkg://gfmrag.workflow.config

defaults:
  - _self_
  - qa_prompt: hotpotqa
  - qa_evaluator: hotpotqa

seed: 1024

llm:
  _target_: gfmrag.llms.ChatGPT
  model_name_or_path: gpt-4o-mini
  retry: 5

test:
  n_sample: -1 # Number of samples to test, -1 means all samples
  top_k: 5
  n_threads: 5
  target_types: [document] # Node types to consider for retrieval
  retrieved_result_path: null
  node_path: null
  prediction_result_path: null
```
### Top-level Fields

| Parameter | Options | Note |
|---|---|---|
| `hydra.run.dir` | `outputs/qa_inference/<date>/<time>/` | Directory used by Hydra for QA inference outputs. |
| `defaults` | List of config groups | Pulls in `qa_prompt` and `qa_evaluator`. |
| `seed` | Integer | Random seed for the run (`1024` by default). |
| `llm` | Mapping | Configures the answer-generation model. |
| `test` | Mapping | Points to the retrieval result file and node table, and sets sampling, threading, and retrieval options (`n_sample`, `n_threads`, `top_k`, `target_types`). |
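### `llm` and `test` Fields

| Parameter | Options | Note |
|---|---|---|
| `llm.model_name_or_path` | Model name or path | LLM used to generate final answers (`gpt-4o-mini` by default). |
| `test.retrieved_result_path` | File path | Path to the retrieval predictions, typically `predictions_<data_name>.json`. |
| `test.node_path` | File path | Path to `processed/stage1/nodes.csv`. |
| `test.top_k` | Positive integer | Number of retrieved nodes used to build the QA prompt. |
| `test.target_types` | List of node types | Node types to read from the retrieval output (`[document]` by default). |
| `test.n_sample` | Integer | Number of samples to test; `-1` means all samples. |
| `test.n_threads` | Positive integer | Number of threads used during QA inference. |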
## `qa_ircot_inference.yaml`

This preset is used by `python -m gfmrag.workflow.qa_ircot_inference` to run retrieval and IRCOT-style reasoning in one workflow.
### `defaults` Fields

| Parameter | Options | Note |
|---|---|---|
| `agent_prompt` | `hotpotqa_ircot` by default | Prompt template for iterative reasoning. |
| `qa_prompt` | `hotpotqa` by default | Prompt template used for final answer generation. |
| `ner_model` | `llm_ner_model` by default | NER preset used during retrieval-time processing. |
| `openie_model` | `llm_openie_model` by default | OpenIE preset used when graph construction is needed. |
| `el_model` | `colbert_el_model` by default | Entity-linking preset used during retrieval-time processing. |
| `qa_evaluator` | `hotpotqa` by default | Evaluator used to score predicted answers. |
| `graph_constructor` | `kg_constructor` by default | Graph constructor preset used if the stage1 data must be built. |
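Taken together, the table corresponds to a Hydra `defaults` list roughly like the sketch below. It is reconstructed from the table, not copied from `qa_ircot_inference.yaml`, so the entry order and the placement of `_self_` are assumptions.

```yaml
# Reconstructed from the defaults table above (not the verbatim preset).
# Entry order and the position of `_self_` are assumptions.
defaults:
  - _self_
  - agent_prompt: hotpotqa_ircot
  - qa_prompt: hotpotqa
  - ner_model: llm_ner_model
  - openie_model: llm_openie_model
  - el_model: colbert_el_model
  - qa_evaluator: hotpotqa
  - graph_constructor: kg_constructor
```

Any of these groups can be swapped at launch time with a standard Hydra group override such as `qa_evaluator=<other_evaluator>`, where `<other_evaluator>` is a placeholder for another config in the `qa_evaluator` group.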
### Key Fields

| Parameter | Options | Note |
|---|---|---|
| `dataset` | Mapping | Selects the dataset root and test split. |
| `llm` | Mapping | Chooses the reasoning and answer-generation model. |
| `graph_retriever` | Mapping | Selects the checkpoint and graph-side components. |
| `test` | Mapping | Controls `top_k`, `max_steps`, the resume path, and target types. |
| `graph_retriever.model_path` | `rmanluo/GFM-RAG-8M` by default | Local checkpoint path or Hugging Face Hub ID of the pretrained GFM retriever. |
| `graph_retriever.graph_constructor` | `${graph_constructor}` | Constructor used when the stage1 data needs to be built; resolves to the `graph_constructor` entry selected in `defaults`. |
| `test.max_steps` | Positive integer | Maximum number of IRCOT reasoning steps. |
| `test.resume` | File path or `null` | Resumes from a partially written prediction file. |
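A sketch of how these key fields fit together in YAML. Only `model_path` and the `${graph_constructor}` interpolation come from the table above; the `test` values are illustrative assumptions, and the dataset, LLM, and other graph-side keys are omitted.

```yaml
# Partial sketch; values marked "illustrative" are not taken from the preset.
graph_retriever:
  model_path: rmanluo/GFM-RAG-8M           # preset default checkpoint
  graph_constructor: ${graph_constructor}  # resolves to the group chosen in defaults
test:
  top_k: 5                  # illustrative
  max_steps: 2              # illustrative cap on IRCOT reasoning steps
  resume: null              # or a path to a partially written prediction file
  target_types: [document]  # illustrative
```

Because `graph_retriever.graph_constructor` is an interpolation, switching the `graph_constructor` entry in `defaults` also changes the constructor the retriever uses, without editing the `graph_retriever` block itself.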
## Visualization Presets

### `visualize_path.yaml`

Use this preset for path visualization on a single `GraphIndexDatasetV1` dataset. It loads the dataset directly instead of specifying it through the separate `dataset.root` and `dataset.data_name` fields.
### `exp_visualize_path.yaml`

This experimental preset adds retrieval-oriented controls such as:

- `test.retrieval_batch_size`
- `test.save_retrieval`
- `test.save_top_k_entity`
- `test.max_sample`