Entity Linking Model Config

Colbert EL Model Configuration

An example of a Colbert EL model configuration file is shown below:

colbertv2.0

gfmrag/workflow/config/el_model/colbert_el_model.yaml
_target_: gfmrag.kg_construction.entity_linking_model.ColbertELModel
checkpoint_path: tmp/colbertv2.0
root: tmp
doc_index_name: nbits_2
phrase_index_name: nbits_2

To use the colbertv2.0 model, you need to download the checkpoint archive and extract it under tmp/ so that it matches the checkpoint_path (default: tmp/colbertv2.0):

Bash
wget https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar -zxvf colbertv2.0.tar.gz -C tmp/

An example checkpoint file structure is shown below:

Text Only
tmp/colbertv2.0/
├── artifact.metadata
├── tokenizer.json
├── special_tokens_map.json
├── config.json
├── tokenizer_config.json
├── vocab.txt
└── pytorch_model.bin
| Parameter | Options | Note |
| --- | --- | --- |
| _target_ | gfmrag.kg_construction.entity_linking_model.ColbertELModel | The class name of the Colbert EL model. |
| checkpoint_path | None | The path to the checkpoint file. |
| root | None | The root directory of the model. |
| doc_index_name | None | The name of the document index. |
| phrase_index_name | None | The name of the phrase index. |

Please refer to ColbertELModel for details on the other parameters.
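
Since the _target_ field follows Hydra's instantiation convention, the model can be created directly from the YAML file above. The following is a minimal sketch, assuming Hydra and OmegaConf are installed and the colbertv2.0 checkpoint has been extracted as described above:

Python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the YAML config; Hydra resolves _target_ to the ColbertELModel class.
cfg = OmegaConf.load("gfmrag/workflow/config/el_model/colbert_el_model.yaml")

# Roughly equivalent to:
# ColbertELModel(checkpoint_path="tmp/colbertv2.0", root="tmp",
#                doc_index_name="nbits_2", phrase_index_name="nbits_2")
el_model = instantiate(cfg)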

Dense Pre-trained Text Embedding Model Configuration

This configuration supports most of the dense pre-trained text embedding models available in SentenceTransformers. An example of a dense pre-trained text embedding model configuration file is shown below:

DPR EL Model

gfmrag/workflow/config/el_model/dpr_el_model.yaml
_target_: gfmrag.kg_construction.entity_linking_model.DPRELModel
model_name: BAAI/bge-large-en-v1.5
root: tmp
use_cache: True
normalize: True
query_instruct: null
passage_instruct: null
model_kwargs: null
| Parameter | Options | Note |
| --- | --- | --- |
| _target_ | gfmrag.kg_construction.entity_linking_model.DPRELModel | The class name of the dense pre-trained text embedding model. |
| model_name | None | The name of the dense pre-trained text embedding model. |
| root | None | The root directory of the model. |
| use_cache | True, False | Whether to use the cache. |
| normalize | True, False | Whether to normalize the embeddings. |
| query_instruct | None | The instruction for the query. |
| passage_instruct | None | The instruction for the passage. |
| model_kwargs | None | Additional model arguments. |

Please refer to DPRELModel for details on the other parameters.
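
Because the DPR configuration wraps a SentenceTransformer encoder, you can swap in a different pre-trained model by overriding fields before instantiation. A minimal sketch (the model name and override values below are illustrative, not defaults):

Python
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("gfmrag/workflow/config/el_model/dpr_el_model.yaml")

# Override fields in the loaded config; any SentenceTransformer-compatible
# model name should work here (this one is only an example).
cfg.model_name = "sentence-transformers/all-MiniLM-L6-v2"
cfg.normalize = False

el_model = instantiate(cfg)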

Nvidia Embedding Model Configuration

This configuration supports most of the Nvidia embedding models. An example of an Nvidia embedding model configuration file is shown below:

nvidia/NV-Embed-v2

gfmrag/workflow/config/el_model/nv_embed_v2.yaml
_target_: gfmrag.kg_construction.entity_linking_model.NVEmbedV2ELModel
model_name: nvidia/NV-Embed-v2
root: tmp
use_cache: True
normalize: True
query_instruct: "Instruct: Given a entity, retrieve entities that are semantically equivalent to the given entity\nQuery: "
passage_instruct: null
model_kwargs:
  torch_dtype: bfloat16
| Parameter | Options | Note |
| --- | --- | --- |
| _target_ | gfmrag.kg_construction.entity_linking_model.NVEmbedV2ELModel | The class name of the Nvidia embedding model. |
| model_name | nvidia/NV-Embed-v2 | The name of the Nvidia embedding model. |
| root | None | The root directory of the model. |
| use_cache | True, False | Whether to use the cache. |
| normalize | True, False | Whether to normalize the embeddings. |
| query_instruct | Instruct: Given a entity, retrieve entities that are semantically equivalent to the given entity\nQuery: | The instruction for the query. |
| passage_instruct | None | The instruction for the passage. |
| model_kwargs | {} | Additional model arguments. |

Please refer to NVEmbedV2ELModel for details on the other parameters.
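
The model_kwargs mapping appears to be forwarded to the underlying model loader, which is how the example config requests bfloat16 weights. The same configuration can also be built programmatically instead of from the YAML file; a minimal sketch, mirroring the fields shown above:

Python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Programmatic equivalent of nv_embed_v2.yaml shown above.
cfg = OmegaConf.create({
    "_target_": "gfmrag.kg_construction.entity_linking_model.NVEmbedV2ELModel",
    "model_name": "nvidia/NV-Embed-v2",
    "root": "tmp",
    "use_cache": True,
    "normalize": True,
    "query_instruct": (
        "Instruct: Given a entity, retrieve entities that are "
        "semantically equivalent to the given entity\nQuery: "
    ),
    "passage_instruct": None,
    "model_kwargs": {"torch_dtype": "bfloat16"},
})

el_model = instantiate(cfg)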