Using Vector Databases for Retrieval Augmented Generation

Hello everyone! Continuing our vector database exploration (make sure you read the last part linked here; it's pretty cool 😮‍💨), today we're going to expand the LLM's knowledge base using retrieval augmented generation.

We're going to build a bot that we can ask about famous people. The catch, obviously, is that an LLM probably knows who DiCaprio is, but it doesn't know who I am, for example (not for long though).

We'll have a vector database holding bios of people who aren't famous at all. When we ask the bot about a certain non-famous person, we'll run a semantic search for that person against our vector database, and if they exist in our database, we'll provide the retrieved bio as context to the LLM when asking about them. Let's dive in!

Retrieval Augmented Generation

Without getting too technical, retrieval augmented generation is simply referencing extra data outside of the LLM's training data. This data helps the LLM provide accurate results and avoid 'hallucinating'.

LLMs by themselves have known problems, including the following:

  1. Presenting false information when it does not have the answer.

  2. Presenting out-of-date or generic information when the user expects a specific, current response.

  3. Creating a response from non-authoritative sources.

  4. Creating inaccurate responses due to terminology confusion, wherein different training sources use the same terminology to talk about different things.

So providing extra data or context when talking to the LLM helps improve the response.
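The idea above boils down to a retrieve-then-augment loop. Here's a minimal sketch of that flow; `retrieve` is a stand-in for the vector database lookup we build below, and the bio it returns is made up for illustration:

```go
package main

import "fmt"

// retrieve is a stand-in for a semantic search over embeddings;
// the bio it returns here is hard-coded for illustration.
func retrieve(question string) string {
	return "Amr Elhewy is a software engineer who writes about databases."
}

// augment splices the retrieved context into the prompt ahead of
// the user's question, so the LLM can ground its answer in it.
func augment(question, context string) string {
	return fmt.Sprintf("Context: %s\n\nQuestion: %s", context, question)
}

func main() {
	q := "Who is Amr Elhewy?"
	fmt.Println(augment(q, retrieve(q)))
}
```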



We'll be using the same codebase as our previous semantic search article and expanding on that.

We'll have a database table with two fields, text and embedding, where text holds the person's bio and embedding holds its vector representation.
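For reference, the table could be created with DDL along these lines. This is a sketch, not the article's actual schema: I'm assuming SingleStore (whose dot_product function appears in the query below) and its packed-float32 BLOB vector format; the table name famous_people comes from the query.

```sql
-- Hypothetical schema; the article doesn't show the actual DDL.
CREATE TABLE famous_people (
    text TEXT,       -- the person's bio
    embedding BLOB   -- packed little-endian float32 vector for dot_product()
);
```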

As a refresher, we'll use the two functions below. The first takes in an embedding and performs a similarity query against the database. The second abstracts the first by embedding the query text and returning any related results.

func (mysql *MySQL) GetRelatedEmbeddings(embedding []byte) []string {
    res, err := mysql.db.Query("SELECT text, dot_product(embedding, ?) AS similarity FROM famous_people ORDER BY similarity DESC LIMIT 3", embedding)
    if err != nil {
        log.Fatal("Error querying database:", err)
    }
    defer res.Close()

    var relatedEmbeddings []string
    for res.Next() {
        var text string
        var similarity float32
        if err = res.Scan(&text, &similarity); err != nil {
            log.Fatal("Error scanning row:", err)
        }
        relatedEmbeddings = append(relatedEmbeddings, text)
    }
    return relatedEmbeddings
}

func GatherContext(text string, dbclient *structs.MySQL, openAIClient *structs.OpenAIClient) []string {
    embedding := openAIClient.GetEmbeddingForText(text)
    result := dbclient.GetRelatedEmbeddings(convertFloatToByte(embedding.Embedding))

    return result
}
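The helper convertFloatToByte isn't shown in the article. Here's a minimal sketch, assuming the database expects the vector as packed little-endian float32 bytes (SingleStore's BLOB vector format) and that embedding.Embedding is a []float32:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// convertFloatToByte packs a []float32 embedding into a little-endian
// byte slice, 4 bytes per value. Assumption: this matches the BLOB
// format the dot_product query expects.
func convertFloatToByte(floats []float32) []byte {
	buf := make([]byte, 4*len(floats))
	for i, f := range floats {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

func main() {
	b := convertFloatToByte([]float32{1, 2})
	fmt.Println(len(b)) // 8: two float32 values, 4 bytes each
}
```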

We created a new function, GPT, where we do two main things:

  1. Call GatherContext using the name we searched for, gathering any related context for that person from the database.

  2. Prompt the LLM. This step can make or break the whole process: you need to prompt the LLM correctly, otherwise it's going to be quite difficult to get the results you need. I'd recommend reading OpenAI's prompting documentation.

Prompting the LLM

We're going to be using different tactics to prompt the LLM.

  1. We'll be asking it to adopt a persona: a world-renowned detective who specializes in gathering information about famous people.

  2. Writing clear instructions telling it to say 'I don't know' if it doesn't know who the person provided is.

  3. Providing reference text ('context') with the question if the database returned any results.

This results in a prompt like this:

"You are a world-renowned detective who specializes in gathering information about famous people. 
Your task today is to gather information about some famous individuals. 
Extra context is provided below that MAY have the data you're looking for:"


"If you don't have information strictly say that you don't know who that is."

"Who is this person?"

And here's the GPT function:

func GPT(text string, dbclient *structs.MySQL, openAIClient *structs.OpenAIClient) string {
    contextData := GatherContext(text, dbclient, openAIClient)
    contextString := strings.Join(contextData, ",")
    // Assemble the prompt shown above, splicing in the retrieved context.
    prompt := "You are a world-renowned detective who specializes in gathering information about famous people. " +
        "Your task today is to gather information about some famous individuals. " +
        "Extra context is provided below that MAY have the data you're looking for:\n" +
        contextString + "\n" +
        "If you don't have information strictly say that you don't know who that is.\n" +
        "Who is this person? " + text

    queryReq := openai.CompletionRequest{
        Model:     openai.GPT3Dot5TurboInstruct,
        Prompt:    prompt,
        MaxTokens: 100,
        Stop:      []string{"\n"},
    }

    queryResponse, err := openAIClient.Client.CreateCompletion(context.Background(), queryReq)
    if err != nil {
        log.Fatal("Error creating query completion:", err)
    }
    return queryResponse.Choices[0].Text
}

We call the completion API from OpenAI with the prompt, adding in the extra context. This should yield a more accurate, relevant answer from OpenAI.


Not only does RAG augment the LLM with useful information, it also opens up endless opportunities and ideas for using AI. In this article we used a vector database, but knowledge graphs are another way to augment with information, since they can host tons of entities along with the relationships between them. Hope I was able to clear up any confusion about RAG in this article. Till the next one!
