GFM-RAG Configs

This page groups the task presets under `gfmrag/workflow/config/gfm_rag/`.

Directory Layout

| File | Purpose | Typical entrypoint |
| --- | --- | --- |
| `index_dataset.yaml` | Build `processed/stage1/` from raw data | `python -m gfmrag.workflow.index_dataset` |
| `kgc_training.yaml` | Run KGC pretraining for the original `GFM-RAG` model family | `python -m gfmrag.workflow.kgc_training` |
| `qa_inference.yaml` | Run QA from saved retrieval outputs | `python -m gfmrag.workflow.qa` |
| `qa_ircot_inference.yaml` | Run retrieval plus IRCoT-style reasoning | `python -m gfmrag.workflow.qa_ircot_inference` |
| `sft_training.yaml` | Run supervised fine-tuning and retrieval evaluation | `python -m gfmrag.workflow.sft_training` |
| `visualize_path.yaml` | Visualize reasoning paths on dataset examples | visualization workflow |
| `exp_visualize_path.yaml` | Experimental visualization preset with retrieval controls | visualization workflow |

The `gfm_rag` presets follow a stable pattern:

- `hydra.run.dir` controls the output root.
- `defaults` pulls in shared component groups such as `ner_model`, `openie_model`, `el_model`, `text_emb_model`, `doc_ranker`, and `wandb`.
- Task-specific sections such as `dataset`, `datasets`, `graph_retriever`, `model`, `trainer`, `llm`, and `test` then override the shared pieces.
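
The pattern above can be sketched as a minimal preset. This is an illustration, not a copy of any shipped file. The top-level keys (`hydra.run.dir`, `defaults`, `dataset`) and the group names come from the description above. Every option name, path, and nested key is a hypothetical placeholder, so check the actual YAML files for the real values.

```yaml
# Illustrative sketch only: option names, paths, and nested keys are
# hypothetical placeholders, not values taken from the shipped presets.

hydra:
  run:
    # Output root for the run; ${now:...} is Hydra's built-in timestamp resolver.
    dir: outputs/example_task/${now:%Y-%m-%d}/${now:%H-%M-%S}

defaults:
  # Shared component groups; the option on the right is a placeholder.
  - ner_model: example_ner_option
  - openie_model: example_openie_option
  - text_emb_model: example_emb_option
  - wandb: example_wandb_option
  # Listing _self_ last lets values in this file take precedence
  # over anything pulled in by the groups above.
  - _self_

# Task-specific section (nested keys are hypothetical).
dataset:
  root: ./data
  data_name: example_dataset
```

If the workflows are launched through Hydra's standard entrypoint, individual values can usually also be overridden on the command line with `key=value` arguments, for example `hydra.run.dir=outputs/debug`.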