GFM-RAG Development

Requirements

Name	Installation	Purpose
Python 3.12	Download	The library is Python-based.
Poetry	Instructions	Poetry is used for package and virtualenv management in Python codebases.
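
Before installing, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch rather than part of GFM-RAG: the helper name `missing_tools` is hypothetical, and it assumes the interpreter is exposed as `python3`.

```shell
# Hypothetical helper (not part of GFM-RAG): print each named command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which of the required tools are not installed.
# Assumption: Python is exposed as `python3`; adjust if yours is `python`.
missing=$(missing_tools python3 poetry)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    python3 --version   # this guide expects Python 3.12
fi
```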

Getting Started

Install Dependencies

Bash

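# Install the Python dependencies declared in pyproject.toml
poetry install
# Detect the installed PyTorch version (run inside the Poetry virtualenv)
TORCH=$(python -c "import torch; print(torch.__version__)")
# Install torch_scatter and torch_sparse wheels built for that PyTorch version
pip install torch_scatter torch_sparse -f "https://data.pyg.org/whl/torch-${TORCH}.html"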
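
The wheel index URL in the last command is derived from the installed PyTorch version string. The sketch below isolates that step so the URL can be inspected before running pip. `pyg_wheel_index` is a hypothetical helper, and the version `2.4.0+cu121` is only an illustrative value, not a requirement stated by this guide.

```shell
# Hypothetical helper: build the PyG wheel index URL for a PyTorch version,
# using the same URL pattern as the install command above.
pyg_wheel_index() {
    printf 'https://data.pyg.org/whl/torch-%s.html\n' "$1"
}

# Illustrative version string only; in practice pass the value printed by
# python -c "import torch; print(torch.__version__)".
pyg_wheel_index "2.4.0+cu121"
# prints https://data.pyg.org/whl/torch-2.4.0+cu121.html
```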

Install Pre-commit Hooks

Set up pre-commit hooks for development:

Bash

pre-commit install

CUDA Installation

GFM-RAG requires the nvcc compiler to compile the rspmm kernel. If you encounter CUDA-related errors, make sure the CUDA toolkit is installed and nvcc is on your PATH. Also make sure the CUDA_HOME variable points to your CUDA installation to avoid compilation errors, e.g.,

Bash

export CUDA_HOME=/usr/local/cuda-12.4
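
To check both conditions before compiling, something like the following could be used. It is a sketch, not part of GFM-RAG: `check_cuda_home` is a hypothetical helper, and it assumes the conventional CUDA toolkit layout in which the compiler lives at `$CUDA_HOME/bin/nvcc`.

```shell
# Hypothetical helper: succeed only if CUDA_HOME is set and contains an
# executable nvcc at the conventional location $CUDA_HOME/bin/nvcc.
check_cuda_home() {
    if [ -z "${CUDA_HOME:-}" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
        echo "nvcc not found under $CUDA_HOME/bin" >&2
        return 1
    fi
    return 0
}

# Report the result without aborting the shell session.
if check_cuda_home; then
    echo "CUDA_HOME looks usable: $CUDA_HOME"
fi

# Separately confirm that nvcc is reachable through PATH.
command -v nvcc >/dev/null 2>&1 || echo "nvcc is not on PATH" >&2
```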

Repository Structure

An overview of the repository's top-level folder structure is provided below, detailing the overall design and purpose.

Bash

gfm_rag/                     # Root directory
├── docs/                    # Documentation
│   ├── DEVELOPING.md       # Development guide
│   ├── CHANGELOG.md        # Project changelog
│   ├── config/             # Configuration documentation
│   │   ├── kg_index_config.md
│   │   └── ...
│   └── workflow/           # Workflow documentation
│       ├── kg_index.md
│       ├── training.md
│       └── ...
├── gfmrag/                 # Main package
│   ├── gfmrag_retriever.py # GFM-RAG retriever
│   ├── kg_indexer.py       # KG-index builder
│   ├── models.py           # GFM models
│   ├── losses.py           # Training losses
│   ├── doc_rankers.py      # Document rankers
│   ├── datasets/           # Dataset implementations
│   │   ├── qa_dataset.py
│   │   └── ...
│   ├── kg_construction/    # Knowledge graph construction
│   │   ├── entity_linking_model/ # Entity linking models
│   │   ├── ner_model/ # Named entity recognition models
│   │   ├── openie_model/ # OpenIE models
│   │   ├── kg_constructor.py # KG constructor
│   │   ├── qa_constructor.py # QA constructor
│   │   └── utils.py
│   ├── ultra/             # ULTRA models
│   │   ├── models.py
│   │   ├── layers.py
│   │   └── ...
│   ├── workflow/           # Training and inference scripts
│   │   ├── config/         # Configuration files
│   │   │   ├── stage1_index_dataset.yaml
│   │   │   ├── stage2_qa_finetune.yaml
│   │   │   ├── stage3_qa_inference.yaml
│   │   │   └── ...
│   │   ├── stage1_index_dataset.py
│   │   ├── stage2_qa_finetune.py
│   │   └── stage3_qa_inference.py
│   ├── llms/              # Language models
│   ├── evaluation/         # Evaluator for QA
│   └── utils/             # Utility functions
├── tests/                  # Test cases
├── scripts/                  # Scripts for running experiments
├── mkdocs.yml           # Documentation configuration
├── poetry.lock         # Poetry lock file
└── pyproject.toml      # Project configuration
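
As a quick sanity check after cloning, a few of the top-level entries from the tree above can be verified from the repository root. This is only a sketch: `check_layout` is a hypothetical helper name, and the entry list is copied from the tree and is deliberately partial.

```shell
# Hypothetical helper: verify that a checkout contains some of the
# top-level entries listed in the tree above. Takes the repo root as $1.
check_layout() {
    root=${1:-.}
    rc=0
    for entry in docs gfmrag tests scripts mkdocs.yml pyproject.toml poetry.lock; do
        if [ ! -e "$root/$entry" ]; then
            echo "missing: $entry" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Run from the repository root.
if check_layout .; then
    echo "layout OK"
fi
```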

Common Commands

Serve the documentation locally:

Bash

mkdocs serve

Run the pre-commit hooks:

Bash

pre-commit run --all-files

Build the package:

Bash

poetry build