How to build a RAG Application in a few minutes
I built this AI Application that knows everything about Formula 1 - here's how...
Earlier this week, I built an AI Application that knows everything about Formula 1 as soon as it happens!
Why is this so cool? Well, one of the common limitations of existing Large Language Models (LLMs), such as OpenAI’s ChatGPT, is that their knowledge is limited to the data the model was trained on. Adding new information requires the entire training process to be run again.
So how did I build this application without having to train my own LLM?
What is RAG?
RAG (Retrieval-Augmented Generation) is a method of building AI applications that lets a developer improve the accuracy and relevance of the content generated by a Large Language Model.
It combines two key components:
Data ingestion: This is the process of collecting relevant information within a topic (usually through web scraping or processing documents), formatting the data into an easily queryable format (embeddings/vectors) and storing it in a knowledge base (usually a vector database).
Intelligent Querying: This is the process of taking a question/prompt from a user and using it to query the knowledge base for relevant information that could help generate a response. The retrieved context, along with the initial prompt, is then passed to an LLM to generate a response for the user.
Let’s get into exactly how I built this project!
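Before walking through each step, here’s a minimal sketch of the overall flow in TypeScript. The helper signatures here are hypothetical placeholders, just to illustrate the two phases; the concrete scrape, embedding, and database functions used in this project are shown in the rest of this post.

type Chunk = { text: string; $vector: number[] };

// Ingestion phase: scrape each source, embed each chunk, store the vectors
async function ingestPhase(
  urls: string[],
  scrapeChunks: (url: string) => Promise<string[]>,
  embed: (text: string) => Promise<number[]>,
  store: (chunks: Chunk[]) => Promise<void>
) {
  for (const url of urls) {
    const texts = await scrapeChunks(url);
    const chunks = await Promise.all(
      texts.map(async (text) => ({ text, $vector: await embed(text) }))
    );
    await store(chunks);
  }
}

// Query phase: embed the question, find similar chunks, ask the LLM
async function answerQuestion(
  question: string,
  embed: (text: string) => Promise<number[]>,
  search: (vector: number[]) => Promise<Chunk[]>,
  generate: (question: string, context: string[]) => Promise<string>
) {
  const queryVector = await embed(question);
  const matches = await search(queryVector);
  return generate(question, matches.map((m) => m.text));
}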
Collecting data for our knowledge base
The first step in building our F1 RAG application is collecting and processing the data that will form our knowledge base. This involves three main steps:
Scraping the data
Generating embeddings
Storing the knowledge in a vector database
1. Scrape Data
We'll use Playwright to scrape text content from websites, then split it into manageable chunks using LangChain's text splitter. This ensures our data is properly segmented for processing.
import playwright from "playwright";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

export async function scrape(url: string) {
  // Scrape the text from the website
  const browser = await playwright.chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(url);
  const rawText = await page.innerText("body");
  // Replace newlines with spaces (String.replace returns a new string)
  const text = rawText.replace(/\n/g, " ");
  await browser.close();

  // Split the text into chunks
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 512,
    chunkOverlap: 100,
  });
  const output = await splitter.createDocuments([text]);
  return output;
}
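For reference, here’s roughly how the scrape helper could be called on its own (the URL is one of the sources used later in this post, and the number of chunks returned depends on the page length):

import { scrape } from "./lib/scrape";

(async () => {
  const docs = await scrape("https://en.wikipedia.org/wiki/Formula_One");
  console.log(docs.length);         // number of ~512-character chunks
  console.log(docs[0].pageContent); // text content of the first chunk
})();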
2. Generate Embeddings of Chunks
After splitting the text, we need to convert each chunk into a vector representation (embedding) using OpenAI's embedding model. These embeddings will allow us to perform semantic search later.
(You do not need to use OpenAI’s models for this project; there are open-source alternatives available.)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
});

export async function generateEmbedding(text: string) {
  // Returns the full API response; the vector itself is at embedding.data[0].embedding
  const embedding = await client.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  return embedding;
}
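As a quick sanity check, the vector itself can be pulled out of the response like this (text-embedding-ada-002 produces 1,536-dimensional vectors, which is why the collection in the next step is created with a dimension of 1536):

import { generateEmbedding } from "./lib/openai";

(async () => {
  const res = await generateEmbedding("Who won the 2024 Qatar Grand Prix?");
  const vector = res.data[0].embedding; // number[]
  console.log(vector.length);           // 1536 for text-embedding-ada-002
})();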
3. Store Data in Vector Database
We'll use Astra DB as our vector database to store both the text chunks and their embeddings. This setup allows for efficient similarity searches later.
import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient('YOUR_TOKEN');
const db = client.db('YOUR_DB_URL');
const collection = db.collection('f1gpt');

export async function createCollection() {
  const res = await db.createCollection("f1gpt", {
    vector: {
      dimension: 1536,
      metric: "dot_product",
    },
  });
  return res;
}

export async function uploadData(data: {
  $vector: number[],
  text: string,
  source: string,
}[]) {
  return await collection.insertMany(data);
}
Now that we have our individual components, we need to combine them into a complete ingestion process that will handle multiple URLs and process them in parallel.
Final Ingestion Process
This code brings everything together, processing multiple URLs concurrently and storing the results in our vector database.
For this demo, I’ve only included two URLs, but you could add as many as you want!
import { createCollection, uploadData } from "./lib/db";
import { generateEmbedding } from "./lib/openai";
import { scrape } from "./lib/scrape";

const urls = [
  "https://en.wikipedia.org/wiki/Formula_One",
  "https://en.wikipedia.org/wiki/George_Russell_(racing_driver)",
];

async function ingest() {
  let chunks: { text: string, $vector: number[], url: string }[] = [];

  // Scrape and embed each URL in parallel
  await Promise.all(urls.map(async (url) => {
    const data = await scrape(url);

    // Generate an embedding for every chunk of the page
    const embeddings = await Promise.all(
      data.map((doc) => generateEmbedding(doc.pageContent))
    );

    chunks = chunks.concat(data.map((doc, index) => ({
      text: doc.pageContent,
      $vector: embeddings[index].data[0].embedding,
      url: url,
    })));
  }));

  // Create the collection and upload every chunk
  await createCollection();
  await uploadData(chunks.map((doc) => ({
    $vector: doc.$vector,
    text: doc.text,
    source: doc.url,
  })));
}

ingest();
Building the Query Functionality
Once we have our knowledge base set up, we need to build the functionality to query it effectively to answer any questions that our users submit. This involves three main steps:
Generating an embedding for the User’s Query
Querying the database for similar documents
Generating an answer to the question using an LLM
1. Generate an Embedding for the User's Query
We use the same “generateEmbedding” function as before to convert the user's question into a vector format.
export async function generateEmbedding(text: string) {
  const embedding = await client.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  return embedding;
}
2. Query the database for similar documents
We search the vector database for documents with similar embeddings to our query, retrieving the most relevant context.
import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient('YOUR_TOKEN');
const db = client.db('YOUR_DB_URL');
const collection = db.collection('f1gpt');

export async function queryDatabase(query: number[]) {
  // Vector similarity search: sort by closeness to the query embedding
  const res = await collection.find({}, {
    sort: {
      $vector: query,
    },
    limit: 10,
  }).toArray();
  return res;
}
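Put together with the embedding helper, a standalone query could look roughly like this; each returned document carries the text and source fields stored during ingestion:

import { generateEmbedding } from "./lib/openai";
import { queryDatabase } from "./lib/db";

(async () => {
  const embedding = await generateEmbedding("George Russell qualifying penalty");
  const docs = await queryDatabase(embedding.data[0].embedding);
  // Each document contains the fields we stored during ingestion
  console.log(docs[0].text, docs[0].source);
})();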
3. Generate an answer using an LLM
Finally, we use GPT-4o to generate a natural language response based on the retrieved context.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
});

export async function generateResponse(question: string, context: string[]) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content: `You are an expert in Formula 1 racing.
        You need to answer this question using the context provided.
        Do not mention that you have been provided with the context.
        QUESTION: ${question}.
        CONTEXT: ${context.join(" ")}
      `,
    }],
  });
  return response.choices[0].message.content;
}
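One small refinement worth considering (not part of the code above) is moving the standing instructions into a system message and keeping only the question and context in the user message. A sketch, assuming the same client as above:

// A sketch of the same call with the instructions split into a system message
export async function generateResponseWithSystemPrompt(question: string, context: string[]) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: "You are an expert in Formula 1 racing. Answer the question using the provided context, and do not mention that context was provided.",
      },
      {
        role: "user",
        content: `QUESTION: ${question}\nCONTEXT: ${context.join(" ")}`,
      },
    ],
  });
  return response.choices[0].message.content;
}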
The complete querying process
This function brings all the querying components together into a single, easy-to-use function.
import { queryDatabase } from "./lib/db";
import { generateEmbedding, generateResponse } from "./lib/openai";

async function askQuestion(question: string) {
  const embedding = await generateEmbedding(question);
  const queryRes = await queryDatabase(embedding.data[0].embedding);
  const response = await generateResponse(question, queryRes.map((doc) => doc.text));
  return response;
}
Testing the RAG Application
Let's test our application with a real-world example about recent F1 events and news.
askQuestion("Why are George Russell and Max Verstappen arguing after Qatar 2024?").then((res) => {
console.log(res);
});
RESULT:
“George Russell and Max Verstappen are arguing due to a clash that occurred after Verstappen received a grid-drop penalty. Verstappen was penalized for driving slowly ahead of Russell during a qualifying session…”
Without the up-to-date context retrieved from the knowledge base, GPT-4o wouldn't be able to answer this question, because its knowledge is limited to pre-2024 data.
How could this project be improved?
There are two things that we could do to optimise this AI Application:
Increasing the number of sources and reference documents we scrape and collect information from.
Increasing the number of relevant context documents retrieved from the vector database. Currently we’re retrieving 10. This could be increased further, but be careful not to push it too high, or the accuracy of your responses may suffer (a sketch of making this configurable is shown below).
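As a minimal sketch of the second idea, the limit could be passed as a parameter to queryDatabase, with a default of 10 to mirror the value used earlier:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient('YOUR_TOKEN');
const db = client.db('YOUR_DB_URL');
const collection = db.collection('f1gpt');

// Same vector search as before, but with a configurable number of results
export async function queryDatabase(query: number[], limit = 10) {
  return await collection.find({}, {
    sort: {
      $vector: query,
    },
    limit: limit,
  }).toArray();
}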
Source Code
You can find the source code for this project on my GitHub account here: https://github.com/IAmTomShaw/f1-rag-ai