Artificial Intelligence Tools for Knowledge-Intensive Tasks (AIKIT)
Artificial Intelligence series
Article: Randolph C, Michaleas A and Ricke DO (2025) Large language models for closed-library multi-document query, test generation, and evaluation. Front. Artif. Intell. 8:1592013. doi: 10.3389/frai.2025.1592013
Journal URL: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1592013/full
Source code: https://github.com/mit-ll/AIKIT
New AI tools built on Large Language Models (LLMs) enable queries against the large body of knowledge captured during pretraining.
Retrieval-Augmented Generation (RAG) extends an LLM by retrieving additional information and combining it with the query. This additional information can come from a wide variety of sources.
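The retrieve-then-combine idea can be illustrated with a toy sketch. This is not AIKIT's implementation: it swaps a real embedding model and vector store for a simple bag-of-words cosine similarity, and the function names (`retrieve`, `build_prompt`) and the prompt wording are assumptions made for illustration. The resulting prompt is what would be passed to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Toy "document collection"; a real system would index chunks of the source files.
chunks = [
    "AIKIT is containerized for Docker and Singularity/Apptainer.",
    "FAISS and Chroma are vector stores.",
    "Jupyter notebooks are one of the AIKIT interfaces.",
]
prompt = build_prompt("Which vector stores are available?", chunks, k=1)
print(prompt)
```

In a production setup the similarity search would run over dense embeddings in a vector store, but the flow is the same: score the chunks against the query, keep the best ones, and place them in the prompt ahead of the question.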
This post introduces AIKIT, a tool for running LLM-RAG with a variety of available LLMs and a wide variety of information sources. It requires a computer capable of running LLMs. AIKIT has been tested extensively on Linux and Mac computers without any known issues.
Figure 1. Large language models (LLM) and Retrieval-Augmented Generation (RAG) overview.
To make AIKIT easier to install and run on different systems, it was containerized for both Docker and Singularity/Apptainer.
Figure 2. AIKIT containerized for Docker and Singularity.
AIKIT includes multiple interfaces: command line, Jupyter notebooks, and a custom graphical user interface (GUI).
Figure 3. AIKIT command line and web interfaces.
We found that LLM-RAG coverage of document content decreased as documents grew longer.
Figure 4. Document coverage by LLM RAG generated questions.
The coverage of content also varied by location within the documents. Hence, existing LLM-RAG libraries still have room for improvement.
Figure 5. Context utilization in varying document lengths.
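One way to quantify how coverage varies with position is to split a document into equal-width bins and measure what fraction of each bin falls inside passages that the generated questions drew on. The sketch below is an illustration under that assumption, not the method used in the article; the function name `coverage_by_position` and the example spans are hypothetical.

```python
def coverage_by_position(doc_length, used_spans, n_bins=4):
    """Fraction of each equal-width bin covered by the (start, end) spans.

    Spans are half-open character ranges [start, end); overlaps count once.
    """
    covered = [False] * doc_length
    for start, end in used_spans:
        for i in range(max(0, start), min(doc_length, end)):
            covered[i] = True

    fractions = []
    for b in range(n_bins):
        lo = b * doc_length // n_bins
        hi = (b + 1) * doc_length // n_bins
        width = hi - lo
        fractions.append(sum(covered[lo:hi]) / width if width else 0.0)
    return fractions


# Hypothetical example: a 1000-character document where the questions
# drew on the opening and on a short passage near the end.
fractions = coverage_by_position(1000, [(0, 250), (100, 200), (900, 950)], n_bins=4)
print(fractions)  # [1.0, 0.0, 0.0, 0.2]
```

A profile like this makes it easy to see whether the middle of a long document is being used less than its beginning or end.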
AIKIT currently supports two vector stores: FAISS and Chroma. We recommend FAISS: on the same RAG dataset, Chroma failed to return results on one occasion.
Supported information sources include PDF, text, Word, PowerPoint, Excel, audio, images (a description of what the image represents), and text within images.
The command line interfaces read and write JSON files for easy interfacing with custom tools.
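A custom tool can exchange data with a JSON-based command line interface using only the Python standard library. The sketch below shows the general pattern. The file names and the `question` and `response` field names are hypothetical, chosen for illustration; the actual schema is defined in the AIKIT repository. The response file here is written by hand as a stand-in for the tool's output.

```python
import json
import tempfile
from pathlib import Path


def write_queries(path, questions):
    """Write a list of questions as a JSON array of objects."""
    records = [{"question": q} for q in questions]  # hypothetical field name
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def read_responses(path):
    """Load a JSON array of objects and return (question, response) pairs."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [(r["question"], r.get("response", "")) for r in records]


with tempfile.TemporaryDirectory() as tmp:
    query_file = Path(tmp) / "queries.json"
    write_queries(query_file, ["What vector stores does AIKIT support?"])

    # Stand-in for the tool's output file, written by hand for this example.
    response_file = Path(tmp) / "responses.json"
    response_file.write_text(
        json.dumps([{"question": "What vector stores does AIKIT support?",
                     "response": "FAISS and Chroma."}]),
        encoding="utf-8",
    )
    pairs = read_responses(response_file)
    queries_written = json.loads(query_file.read_text(encoding="utf-8"))

print(pairs)
```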
The AIKIT GUI supports any database supported by Ruby on Rails (MySQL/MariaDB, PostgreSQL, Oracle, etc.). It saves queries and responses, which makes it easy to compare responses from multiple LLMs.
AIKIT also reports up to five of the information sources used to formulate each LLM-RAG response.
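Reporting a capped list of sources can be sketched as follows. How AIKIT selects its sources is not described in this post, so this is an assumption: each retrieved chunk carries a source name and a relevance score, each source is represented by its best-scoring chunk, and the list is cut at five. The function name `top_sources`, the file names, and the scores are all made up for illustration.

```python
def top_sources(scored_chunks, max_sources=5):
    """Keep each source's best-scoring chunk, then return the top sources.

    scored_chunks: iterable of (source_name, score) pairs; higher is more relevant.
    """
    best = {}
    for source, score in scored_chunks:
        if source not in best or score > best[source]:
            best[source] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_sources]


# Hypothetical retrieval results: file names and scores are made up.
hits = [
    ("report.pdf", 0.91), ("notes.txt", 0.85), ("report.pdf", 0.80),
    ("slides.pptx", 0.77), ("data.xlsx", 0.60), ("memo.docx", 0.55),
    ("scan.png", 0.40),
]
sources = top_sources(hits)
print(sources)
```

Deduplicating by source before truncating keeps one long document from occupying every slot in the list.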
This is a free substack.