Text Embedding Model Configuration

Pre-trained Text Embedding Model Configuration

This configuration supports most of the pre-trained text embedding models available through SentenceTransformer. Example configuration files are shown below:

all-mpnet-base-v2

gfmrag/workflow/config/text_emb_model/mpnet.yaml

_target_: gfmrag.text_emb_models.BaseTextEmbModel
text_emb_model_name: sentence-transformers/all-mpnet-base-v2
normalize: False
batch_size: 32
query_instruct: null
passage_instruct: null
model_kwargs: null

BAAI/bge-large-en

gfmrag/workflow/config/text_emb_model/bge_large_en.yaml

_target_: gfmrag.text_emb_models.BaseTextEmbModel
text_emb_model_name: BAAI/bge-large-en
normalize: True
batch_size: 32
query_instruct: "Represent this sentence for searching relevant passages: "
passage_instruct: null
model_kwargs: null

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.text_emb_models.BaseTextEmbModel` | The class name of the text embedding model. |
| `text_emb_model_name` | None | The name of the pre-trained text embedding model. |
| `normalize` | `True`, `False` | Whether to normalize the embeddings. |
| `batch_size` | None | The batch size used when computing embeddings. |
| `query_instruct` | None | The instruction prepended to the query. |
| `passage_instruct` | None | The instruction prepended to the passage. |
| `model_kwargs` | `{}` | Additional arguments passed to the model. |
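
As a rough illustration of what `query_instruct`, `passage_instruct`, `batch_size`, and `normalize` typically control in an embedding pipeline, the sketch below prepends an instruction to each text, splits the inputs into batches, and L2-normalizes a vector. The helper names (`add_instruction`, `batches`, `l2_normalize`) are hypothetical and are not taken from `gfmrag`; how `BaseTextEmbModel` performs these steps internally is an assumption here.

```python
import math
from typing import Iterator, Optional


def add_instruction(texts: list[str], instruct: Optional[str]) -> list[str]:
    """Prepend the instruction to every text; None leaves the texts unchanged."""
    if instruct is None:
        return list(texts)
    return [instruct + text for text in texts]


def batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Instruction string copied from the bge_large_en.yaml example.
queries = add_instruction(
    ["who wrote Hamlet?"],
    "Represent this sentence for searching relevant passages: ",
)
print(queries[0])

# Batch sizes produced for 70 texts with batch_size: 32.
print([len(batch) for batch in batches(["text"] * 70, 32)])

# A toy 2-dimensional "embedding" scaled to unit length.
print(l2_normalize([3.0, 4.0]))
```

When embeddings are normalized to unit length, the dot product of two embeddings equals their cosine similarity, which is the usual reason to enable `normalize`.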

Nvidia Embedding Model Configuration

This configuration supports the Nvidia embedding models. An example of an Nvidia embedding model configuration file is shown below:

nvidia/NV-Embed-v2

gfmrag/workflow/config/text_emb_model/nv_embed_v2.yaml

_target_: gfmrag.text_emb_models.NVEmbedV2
text_emb_model_name: nvidia/NV-Embed-v2
normalize: True
batch_size: 32
query_instruct: "Instruct: Given a question, retrieve entities that can help answer the question\nQuery: "
passage_instruct: null
model_kwargs:
  torch_dtype: bfloat16

| Parameter | Options | Note |
| --- | --- | --- |
| `_target_` | `gfmrag.text_emb_models.NVEmbedV2` | The class name of the Nvidia embedding model. |
| `text_emb_model_name` | `nvidia/NV-Embed-v2` | The name of the Nvidia embedding model. |
| `normalize` | `True`, `False` | Whether to normalize the embeddings. |
| `batch_size` | None | The batch size used when computing embeddings. |
| `query_instruct` | `Instruct: Given a question, retrieve entities that can help answer the question\nQuery: ` | The instruction prepended to the query. |
| `passage_instruct` | None | The instruction prepended to the passage. |
| `model_kwargs` | `{}` | Additional arguments passed to the model. |
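
The `_target_` key in these files names the class to construct, and the remaining keys act as its constructor arguments. The sketch below shows one way such a dotted path can be resolved using only the standard library. It assumes a Hydra-style convention (in Hydra itself, `hydra.utils.instantiate` plays this role); the `instantiate` helper here is a hypothetical stand-in, and `fractions.Fraction` is used as the target only so that the example runs without `gfmrag` installed.

```python
import importlib
from typing import Any


def instantiate(config: dict[str, Any]) -> Any:
    """Import the class named by `_target_` and call it with the other keys."""
    params = dict(config)  # copy so the caller's config is not mutated
    module_path, _, class_name = params.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**params)


# Stand-in target from the standard library (an assumption for illustration).
# A real config would instead name a class such as
# gfmrag.text_emb_models.NVEmbedV2 and carry keys like text_emb_model_name.
config = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
half = instantiate(config)
print(type(half).__name__, half)
```

Under this convention, switching embedding models amounts to pointing the workflow at a different YAML file, since the class and its arguments both come from the configuration.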