SFT Constructor Configuration

SFT constructors turn raw QA files into processed supervision data for downstream training and evaluation.

GFM-RAG SFT Constructor

An example GFM-RAG SFT constructor configuration file is shown below:

gfm_rag_sft_constructor

gfmrag/workflow/config/sft_constructor/gfm_rag_sft_constructor.yaml
_target_: gfmrag.graph_index_construction.sft_constructors.GFMRAGConstructor # The SFTConstructor class
root: tmp/qa_construction # Temporary directory for storing intermediate files during SFT construction
ner_model: ${ner_model}
el_model: ${el_model}
num_processes: 10 # Number of processes to use
force: False # Whether to force recompute the QA data
| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.graph_index_construction.sft_constructors.GFMRAGConstructor` | The class name of GFMRAGConstructor. |
| `root` | `tmp/qa_construction` | Temporary directory for processed intermediate files. |
| `ner_model` | `${ner_model}` | NER config used to identify start entities. |
| `el_model` | `${el_model}` | EL config used to map entities into the graph. |
| `num_processes` | Positive integer | Number of worker processes used during preprocessing. |
| `force` | `True`, `False` | Whether to recompute processed outputs. |
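
The `_target_` key follows the Hydra-style convention of naming the class to build, with the remaining keys passed as constructor arguments. The sketch below approximates that idea using only the standard library; it is not Hydra's actual implementation, and `instantiate_from_target` is a hypothetical helper name. To stay runnable without gfmrag installed, it uses `fractions.Fraction` as a stand-in target.

```python
import importlib
from typing import Any


def instantiate_from_target(config: dict[str, Any]) -> Any:
    """Import the class named by `_target_` and call it with the remaining keys.

    Hypothetical helper: a simplified approximation of Hydra-style instantiation.
    """
    params = dict(config)  # copy so the caller's config is not mutated
    module_path, _, class_name = params.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**params)


# Stand-in target from the standard library (an assumption for illustration only).
demo_config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
demo_object = instantiate_from_target(demo_config)
print(demo_object)  # 3/4
```

With gfmrag installed, the same pattern would presumably resolve `gfmrag.graph_index_construction.sft_constructors.GFMRAGConstructor` and pass `root`, `ner_model`, `el_model`, `num_processes`, and `force` as keyword arguments.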

GFM-Reasoner SFT Constructor

An example GFM-Reasoner SFT constructor configuration file is shown below:

gfm_reasoner_sft_constructor

gfmrag/workflow/config/sft_constructor/gfm_reasoner_sft_constructor.yaml
_target_: gfmrag.graph_index_construction.sft_constructors.GFMReasonerConstructor # The SFTConstructor class
root: tmp/qa_construction # Temporary directory for storing intermediate files during SFT construction
ner_model: ${ner_model}
el_model: ${el_model}
num_processes: 10 # Number of processes to use
force: False # Whether to force recompute the QA data
| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.graph_index_construction.sft_constructors.GFMReasonerConstructor` | The class name of GFMReasonerConstructor. |
| `root` | `tmp/qa_construction` | Temporary directory for processed intermediate files. |
| `ner_model` | `${ner_model}` | NER config used to identify start entities. |
| `el_model` | `${el_model}` | EL config used to map entities into the graph. |
| `num_processes` | Positive integer | Number of worker processes used during preprocessing. |
| `force` | `True`, `False` | Whether to recompute processed outputs. |
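
Values such as `${ner_model}` and `${el_model}` are interpolations: they point at top-level entries defined elsewhere in the composed configuration rather than holding a value themselves. The following sketch shows the idea with a tiny standard-library resolver. It is an approximation, not OmegaConf's resolver; `resolve_references`, the `sft_constructor` key, and the placeholder model entries are assumptions made for illustration.

```python
import re
from typing import Any

# Matches a value that consists entirely of one reference, e.g. "${ner_model}".
_WHOLE_REFERENCE = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_references(section: dict[str, Any], root: dict[str, Any]) -> dict[str, Any]:
    """Replace each "${name}" value in `section` with `root[name]`.

    Hypothetical helper; raises KeyError if the referenced name is missing.
    """
    resolved = {}
    for key, value in section.items():
        match = _WHOLE_REFERENCE.match(value) if isinstance(value, str) else None
        resolved[key] = root[match.group(1)] if match else value
    return resolved


root_config = {
    "ner_model": {"name": "placeholder-ner"},  # placeholder, not a real gfmrag config
    "el_model": {"name": "placeholder-el"},    # placeholder, not a real gfmrag config
    "sft_constructor": {
        "root": "tmp/qa_construction",
        "ner_model": "${ner_model}",
        "el_model": "${el_model}",
        "num_processes": 10,
        "force": False,
    },
}

resolved = resolve_references(root_config["sft_constructor"], root_config)
print(resolved["ner_model"])  # {'name': 'placeholder-ner'}
```

The practical consequence is that the NER and EL models are configured once at the top level and shared by whichever constructor references them.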

HippoRAG2 SFT Constructor

An example HippoRAG2 SFT constructor configuration file is shown below:

hipporag2_sft_constructor

gfmrag/workflow/config/sft_constructor/hipporag2_sft_constructor.yaml
_target_: gfmrag.graph_index_construction.sft_constructors.HippoRAG2Constructor # The SFTConstructor class
root: tmp/qa_construction # Temporary directory for storing intermediate files during SFT construction
text_emb_model: ${text_emb_model}
enable_filtering: True # Whether to enable filtering of the constructed QA data using a language model
num_processes: 1 # Number of processes to use
topk: 5 # Top-k nodes to be selected for each question for SFT data construction
llm_for_filtering: gpt-4o-mini # The name of the language model to use for filtering facts
retry: 5 # Number of retries for LLM inference in case of failure
force: False # Whether to force recompute the QA data
start_type: ["entity", "document"]
target_type: ["entity", "document"]
| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.graph_index_construction.sft_constructors.HippoRAG2Constructor` | The class name of HippoRAG2Constructor. |
| `root` | `tmp/qa_construction` | Temporary directory for processed intermediate files. |
| `text_emb_model` | `${text_emb_model}` | Text embedding model used for candidate generation. |
| `enable_filtering` | `True`, `False` | Whether to run LLM-based filtering over candidate facts. |
| `num_processes` | Positive integer | Number of worker processes. |
| `topk` | Positive integer | Number of candidate nodes selected per question. |
| `llm_for_filtering` | Model name | LLM used for fact filtering (e.g. `gpt-4o-mini`). |
| `retry` | Positive integer | Retry count for filtering calls. |
| `force` | `True`, `False` | Whether to recompute processed outputs. |
| `start_type` | List of node types | Node types allowed in start nodes. |
| `target_type` | List of node types | Node types allowed in target nodes. |
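
Because this constructor has more knobs than the others, a quick sanity check before launching a long run can catch typos early. The sketch below checks a plain dictionary against the option types listed above. `check_hipporag2_config` is a hypothetical helper, not part of gfmrag, and treating `entity` and `document` as the only valid node types is an assumption based on the example values.

```python
from typing import Any

# Assumption: the only node types are the two that appear in the example config.
KNOWN_NODE_TYPES = {"entity", "document"}


def check_hipporag2_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key in ("num_processes", "topk", "retry"):
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    for key in ("enable_filtering", "force"):
        if not isinstance(config.get(key), bool):
            problems.append(f"{key} must be True or False, got {config.get(key)!r}")
    for key in ("start_type", "target_type"):
        value = config.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty list, got {value!r}")
            continue
        unknown = sorted(set(value) - KNOWN_NODE_TYPES)
        if unknown:
            problems.append(f"{key} contains unknown node types: {unknown}")
    return problems


example_config = {
    "root": "tmp/qa_construction",
    "enable_filtering": True,
    "num_processes": 1,
    "topk": 5,
    "llm_for_filtering": "gpt-4o-mini",
    "retry": 5,
    "force": False,
    "start_type": ["entity", "document"],
    "target_type": ["entity", "document"],
}

problems = check_hipporag2_config(example_config)
print(problems)  # []
```

Setting `topk: 0` or listing a node type outside the assumed set would each add one entry to the returned list.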