Skip to content

NER Model

gfmrag.kg_construction.ner_model

BaseNERModel

Bases: ABC

Source code in gfmrag/kg_construction/ner_model/base_model.py
Python
class BaseNERModel(ABC):
    """Abstract interface for Named Entity Recognition (NER) models.

    Concrete subclasses must provide both a constructor and a callable
    entry point that extracts entities from raw text.
    """

    @abstractmethod
    def __init__(self, **kwargs: Any) -> None:
        """Set up the model; accepts implementation-specific keyword arguments."""
        ...

    @abstractmethod
    def __call__(self, text: str) -> list:
        """Extract named entities from ``text``.

        Implements the callable protocol: invoked when an instance of the
        class is called directly, e.g. ``model(text)``.

        Args:
            text (str): The input text to perform NER analysis on.

        Returns:
            list: Named entities found in ``text``; the element format is
                defined by the concrete implementation.

        Examples:
            >>> ner = NERModel()
            >>> entities = ner("This is a sample text")
            >>> print(entities)
            [{'text': 'sample', 'label': 'EXAMPLE'}]

        Note:
            Abstract; subclasses must override.
        """
        ...

__call__(text) abstractmethod

This method implements the callable functionality of the class to perform Named Entity Recognition on input text. When an instance of the class is called directly, this method is invoked.

Parameters:

Name Type Description Default
text str

The input text to perform NER analysis on.

required

Returns:

Name Type Description
list list

A list of named entities found in the text. Each entity is represented according to the model's output format.

Examples:

Python Console Session
>>> ner = NERModel()
>>> entities = ner("This is a sample text")
>>> print(entities)
[{'text': 'sample', 'label': 'EXAMPLE'}]
Note

This is an abstract method that should be implemented by subclasses.

Source code in gfmrag/kg_construction/ner_model/base_model.py
Python
@abstractmethod
def __call__(self, text: str) -> list:
    """
    Perform Named Entity Recognition on the input text.

    Implements the callable protocol: when an instance of the class is
    called directly, this method is invoked.

    Args:
        text (str): The input text to perform NER analysis on.

    Returns:
        list: A list of named entities found in the text. Each entity is
              represented according to the model's output format.

    Examples:
        >>> ner = NERModel()
        >>> entities = ner("This is a sample text")
        >>> print(entities)
        [{'text': 'sample', 'label': 'EXAMPLE'}]

    Note:
        This is an abstract method that must be implemented by subclasses.
    """
    pass

LLMNERModel

Bases: BaseNERModel

A Named Entity Recognition (NER) model that uses Language Models (LLMs) for entity extraction.

This class implements entity extraction using various LLM backends (OpenAI, Together, Ollama, llama.cpp) through the Langchain interface. It processes text input and returns a list of extracted named entities.

Parameters:

Name Type Description Default
llm_api Literal['openai', 'together', 'ollama', 'llama.cpp']

The LLM backend to use. Defaults to "openai".

'openai'
model_name str

Name of the specific model to use. Defaults to "gpt-4o-mini".

'gpt-4o-mini'
max_tokens int

Maximum number of tokens in the response. Defaults to 1024.

1024

Methods:

Name Description
__call__

Extracts named entities from the input text.

Raises:

Type Description
Exception

If there's an error in extracting or processing named entities.

Source code in gfmrag/kg_construction/ner_model/llm_ner_model.py
Python
class LLMNERModel(BaseNERModel):
    """A Named Entity Recognition (NER) model that uses Language Models (LLMs) for entity extraction.

    This class implements entity extraction using various LLM backends (OpenAI, Together, Ollama, llama.cpp)
    through the Langchain interface. It processes text input and returns a list of extracted named entities.

    Args:
        llm_api (Literal["openai", "together", "ollama", "llama.cpp"]): The LLM backend to use. Defaults to "openai".
        model_name (str): Name of the specific model to use. Defaults to "gpt-4o-mini".
        max_tokens (int): Maximum number of tokens in the response. Defaults to 1024.

    Methods:
        __call__: Extracts named entities from the input text.

    Raises:
        Exception: If there's an error in extracting or processing named entities
            (caught internally; an empty list is returned).
    """

    def __init__(
        self,
        llm_api: Literal["openai", "together", "ollama", "llama.cpp"] = "openai",
        model_name: str = "gpt-4o-mini",
        max_tokens: int = 1024,
    ):
        """Initialize the LLM-based NER model.

        Args:
            llm_api (Literal["openai", "together", "ollama", "llama.cpp"]): The LLM API provider to use.
                Defaults to "openai".
            model_name (str): Name of the language model to use.
                Defaults to "gpt-4o-mini".
            max_tokens (int): Maximum number of tokens for model output.
                Defaults to 1024.
        """
        self.llm_api = llm_api
        self.model_name = model_name
        self.max_tokens = max_tokens

        self.client = init_langchain_model(llm_api, model_name)

    @staticmethod
    def _parse_entities(response_content) -> dict:
        """Coerce raw model output (a dict or a JSON/dict-literal string) into a dict.

        Replaces the original ``eval()`` on model output, which is unsafe on
        untrusted input: tries ``json.loads`` first and falls back to
        ``ast.literal_eval`` for Python-dict-literal strings such as
        ``"{'named_entities': [...]}"``.

        Raises:
            ValueError/SyntaxError: If the content cannot be parsed
                (handled by the caller).
        """
        import ast
        import json

        if isinstance(response_content, dict):
            # extract_json_dict may already have produced a dict — use it as-is
            # instead of the original str()/eval() round-trip.
            return response_content
        text = str(response_content)
        try:
            return json.loads(text)
        except (TypeError, ValueError):
            return ast.literal_eval(text)

    def __call__(self, text: str) -> list:
        """Extract named entities from the input text using the configured chat model.

        Builds a one-shot prompt (system instruction, example input/output pair,
        then the query), invokes the client — with JSON response mode when the
        backend is ChatOpenAI — parses the response, and normalizes each entity
        with ``processing_phrases``.

        Args:
            text (str): The input text to extract named entities from.

        Returns:
            list: A list of processed named entities extracted from the text.
                Returns an empty list if extraction fails.
        """
        query_ner_prompts = ChatPromptTemplate.from_messages(
            [
                SystemMessage("You're a very effective entity extraction system."),
                HumanMessage(query_prompt_one_shot_input),
                AIMessage(query_prompt_one_shot_output),
                HumanMessage(query_prompt_template.format(text)),
            ]
        )
        query_ner_messages = query_ner_prompts.format_prompt()

        if isinstance(self.client, ChatOpenAI):  # JSON mode
            # OpenAI supports structured JSON output, so the content needs no
            # post-hoc JSON extraction.
            chat_completion = self.client.invoke(
                query_ner_messages.to_messages(),
                temperature=0,
                max_tokens=self.max_tokens,
                stop=["\n\n"],
                response_format={"type": "json_object"},
            )
            response_content = chat_completion.content
        elif isinstance(self.client, (ChatOllama, ChatLlamaCpp)):
            # NOTE: the raw invoke() result (the message object, not .content)
            # is passed to extract_json_dict, matching the original behavior.
            response_content = extract_json_dict(
                self.client.invoke(query_ner_messages.to_messages())
            )
        else:  # no JSON mode
            chat_completion = self.client.invoke(
                query_ner_messages.to_messages(),
                temperature=0,
                max_tokens=self.max_tokens,
                stop=["\n\n"],
            )
            response_content = extract_json_dict(chat_completion.content)

        try:
            parsed = self._parse_entities(response_content)
            # Missing "named_entities" now degrades gracefully to [] instead of
            # relying on assert-based control flow.
            ner_list = parsed.get("named_entities", [])
            return [processing_phrases(ner) for ner in ner_list]
        except Exception as e:
            logger.error(f"Error in extracting named entities: {e}")
            return []

__call__(text)

Process text input to extract named entities using different chat models.

This method handles entity extraction using various chat models (OpenAI, Ollama, LlamaCpp), with special handling for JSON mode responses.

Parameters:

Name Type Description Default
text str

The input text to extract named entities from.

required

Returns:

Name Type Description
list list

A list of processed named entities extracted from the text. Returns empty list if extraction fails.

Raises:

Nothing is raised to the caller: exceptions are caught and handled internally, and errors are logged when they occur.

Examples:

Python Console Session
>>> ner_model = NERModel()
>>> entities = ner_model("Sample text with named entities")
>>> print(entities)
['Entity1', 'Entity2']
Source code in gfmrag/kg_construction/ner_model/llm_ner_model.py
Python
def __call__(self, text: str) -> list:
    """Process text input to extract named entities using different chat models.

    This method handles entity extraction using various chat models (OpenAI, Ollama, LlamaCpp),
    with special handling for JSON mode responses.

    Args:
        text (str): The input text to extract named entities from.

    Returns:
        list: A list of processed named entities extracted from the text.
             Returns empty list if extraction fails.

    Raises:
        None: Exceptions are caught and handled internally, logging errors when they occur.

    Examples:
        >>> ner_model = NERModel()
        >>> entities = ner_model("Sample text with named entities")
        >>> print(entities)
        ['Entity1', 'Entity2']
    """
    # Build a one-shot prompt: system instruction, example input/output pair,
    # then the actual query rendered through the shared template.
    query_ner_prompts = ChatPromptTemplate.from_messages(
        [
            SystemMessage("You're a very effective entity extraction system."),
            HumanMessage(query_prompt_one_shot_input),
            AIMessage(query_prompt_one_shot_output),
            HumanMessage(query_prompt_template.format(text)),
        ]
    )
    query_ner_messages = query_ner_prompts.format_prompt()

    json_mode = False
    if isinstance(self.client, ChatOpenAI):  # JSON mode
        # OpenAI supports structured JSON responses, so no post-hoc JSON
        # extraction happens in this branch.
        chat_completion = self.client.invoke(
            query_ner_messages.to_messages(),
            temperature=0,
            max_tokens=self.max_tokens,
            stop=["\n\n"],
            response_format={"type": "json_object"},
        )
        response_content = chat_completion.content
        # NOTE(review): result of this expression is discarded — looks like
        # leftover token accounting; may raise KeyError if metadata is absent.
        chat_completion.response_metadata["token_usage"]["total_tokens"]
        json_mode = True
    elif isinstance(self.client, ChatOllama) or isinstance(
        self.client, ChatLlamaCpp
    ):
        # NOTE(review): the raw invoke() result (a message object, not its
        # .content) is passed to extract_json_dict — confirm this is intended.
        response_content = self.client.invoke(query_ner_messages.to_messages())
        response_content = extract_json_dict(response_content)
        # NOTE(review): discarded expression; also fails with AttributeError
        # if extract_json_dict returned a dict (dicts have no .split()).
        len(response_content.split())
    else:  # no JSON mode
        chat_completion = self.client.invoke(
            query_ner_messages.to_messages(),
            temperature=0,
            max_tokens=self.max_tokens,
            stop=["\n\n"],
        )
        response_content = chat_completion.content
        response_content = extract_json_dict(response_content)
        # NOTE(review): discarded token-usage expression, as above.
        chat_completion.response_metadata["token_usage"]["total_tokens"]

    if not json_mode:
        try:
            # Assert-as-validation: stringify the dict so the eval() below can
            # round-trip it; on failure fall back to an empty entity set.
            assert "named_entities" in response_content
            response_content = str(response_content)
        except Exception as e:
            print("Query NER exception", e)
            response_content = {"named_entities": []}

    try:
        # NOTE(review): eval() on model output is unsafe with untrusted input —
        # prefer json.loads / ast.literal_eval.
        ner_list = eval(response_content)["named_entities"]
        query_ner_list = [processing_phrases(ner) for ner in ner_list]
        return query_ner_list
    except Exception as e:
        logger.error(f"Error in extracting named entities: {e}")
        return []

__init__(llm_api='openai', model_name='gpt-4o-mini', max_tokens=1024)

Initialize the LLM-based NER model.

Parameters:

Name Type Description Default
llm_api Literal['openai', 'together', 'ollama', 'llama.cpp']

The LLM API provider to use. Defaults to "openai".

'openai'
model_name str

Name of the language model to use. Defaults to "gpt-4o-mini".

'gpt-4o-mini'
max_tokens int

Maximum number of tokens for model output. Defaults to 1024.

1024
Source code in gfmrag/kg_construction/ner_model/llm_ner_model.py
Python
def __init__(
    self,
    llm_api: Literal["openai", "together", "ollama", "llama.cpp"] = "openai",
    model_name: str = "gpt-4o-mini",
    max_tokens: int = 1024,
):
    """Create an LLM-backed NER model.

    Args:
        llm_api (Literal["openai", "together", "ollama", "llama.cpp"]): Which
            LLM provider to route requests through. Defaults to "openai".
        model_name (str): Identifier of the underlying language model.
            Defaults to "gpt-4o-mini".
        max_tokens (int): Upper bound on tokens generated per response.
            Defaults to 1024.
    """
    # Record the configuration so it can be inspected later.
    self.max_tokens = max_tokens
    self.model_name = model_name
    self.llm_api = llm_api

    # Build the Langchain chat client for the chosen provider and model.
    self.client = init_langchain_model(self.llm_api, self.model_name)