
Retrieval Augmented Generation

Prompt Engineering versus RAG versus Fine Tuning


Prompt engineering example:


RAG versus Fine tuning


Example from OpenAI


Re-ranking: apply a cross-encoder or a rule-based re-ranker.

Classification step: have the model classify the domain and add extra metadata to the prompt.

Tools: categories of questions, e.g. figures, access to SQL databases, etc.

Query expansion: a list of questions in the prompt, executed in parallel.
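As a rough illustration of the query expansion step, the sketch below runs several rephrased questions in parallel and merges the hits. The toy corpus, the word-overlap `retrieve` function and the sub-questions are all assumptions made for the example; a real pipeline would ask an LLM for the sub-questions and query a vector store instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus; a real system would query a vector store.
CORPUS = {
    "doc1": "RAG combines retrieval with text generation",
    "doc2": "Fine tuning updates model weights on new data",
    "doc3": "Prompt engineering shapes model behaviour without training",
}

def retrieve(query: str) -> list[str]:
    """Stand-in retriever: ids of documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items()
            if words & set(text.lower().split())]

def expanded_retrieve(queries: list[str]) -> list[str]:
    """Run every expanded query in parallel and merge results in first-seen order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(retrieve, queries))  # map preserves input order
    merged: list[str] = []
    for hits in results:
        for doc_id in hits:
            if doc_id not in merged:  # drop duplicates across queries
                merged.append(doc_id)
    return merged

# Assumed sub-questions; in practice an LLM would generate them from the user question.
sub_questions = ["what is retrieval", "how does fine tuning work"]
print(expanded_retrieve(sub_questions))
```

Threads suit this step because each query is typically an I/O-bound call to a retriever or an API.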

How to evaluate?

The picture below is based on RAGAS, an evaluation framework for Retrieval Augmented Generation (RAG) pipelines. See the AWS Bedrock example here and a blog post to try here.




Pattern: Direct Schema Inference, Self-Correction & Optimization

Direct schema inference: a ‘seed prompt’ instructs the LLM to construct an SQL query corresponding to a user’s inquiry. This initial prompt is executed iteratively until it succeeds.

Please craft a SQL query for BigQuery that addresses the following QUESTION provided below. 
Ensure you reference the appropriate BigQuery tables and column names provided in the SCHEMA below. 
When joining tables, employ type coercion to guarantee data type consistency for the join columns. 
Additionally, the output column names should specify units where applicable.
Ensure your SQL query accurately defines both the start and end of the DATETIME range.

Self-correction: failures are treated as critical learning opportunities for the LLM, allowing it to scrutinize tracebacks and use error messages to refine the seed prompt into an improved query iteration.

prompt = f"""Encountered an error: {msg}. 
To address this, please generate an alternative SQL query response that avoids this specific error. 
Follow the instructions mentioned above to remediate the error. 

Modify the below SQL query to resolve the issue:

Ensure the revised SQL query aligns precisely with the requirements outlined in the initial question."""
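The two prompts can be tied together in a retry loop. The sketch below is one possible shape, not the original implementation: `fake_llm` is a hypothetical stand-in for a real model call, an in-memory SQLite database stands in for BigQuery, and the prompt wording is shortened for the example.

```python
import sqlite3

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a broken query first,
    and a corrected one once the prompt carries an error message."""
    if "Encountered an error" in prompt:
        return "SELECT COUNT(*) FROM orders"
    return "SELECT COUNT(*) FROM order_table"  # wrong table name on purpose

def generate_sql_with_retries(question, schema, conn, llm, max_attempts=3):
    """Run the seed prompt, then feed each error back until the query executes."""
    prompt = f"Write a SQL query.\nQUESTION: {question}\nSCHEMA: {schema}"
    for attempt in range(1, max_attempts + 1):
        query = llm(prompt)
        try:
            rows = conn.execute(query).fetchall()
            return query, rows, attempt
        except sqlite3.Error as exc:
            # Self-correction: pass the error message and the failed query back.
            prompt = (
                f"Encountered an error: {exc}.\n"
                f"Modify the below SQL query to resolve the issue:\n{query}\n"
                f"QUESTION: {question}\nSCHEMA: {schema}"
            )
    raise RuntimeError(f"No valid query after {max_attempts} attempts")

# SQLite is used here only so the example runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

query, rows, attempts = generate_sql_with_retries(
    "How many orders are there?", "orders(id INTEGER, amount REAL)", conn, fake_llm
)
print(query, rows, attempts)
```

Capping the number of attempts keeps a persistently failing prompt from looping forever.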



Dealing with Hallucination

Reducing hallucination:

a. Prompt Engineering: Your prompt is quite explicit, but you may want to make it even more stringent. You could add sentences that explicitly ask the model not to extrapolate from the data. What temperature are you using?

b. Confidence Scoring: Implement a confidence score mechanism to assess the relevance of the generated response to the query and the provided content. If the score is below a certain threshold, default to “Sorry, I am unable to answer your query.”

c. Post-processing: After the model generates an answer, you could add another layer of validation to verify the factual accuracy of the response against the data before sending it to the user.

d. User Feedback Loop: Allow users to flag incorrect answers, which could be used to fine-tune the model or adjust its confidence thresholds.

From: Hallucination in retrieval augmented chatbot (RAG)
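A minimal sketch of the confidence scoring idea in point b, assuming a crude lexical score: the fraction of answer words that also appear in the retrieved content. The scoring function, the 0.7 threshold and the example texts are illustrative assumptions; a production system would more likely use an embedding or model-based relevance score, and would also compare the answer against the query.

```python
import re

FALLBACK = "Sorry, I am unable to answer your query."

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(answer: str, context: str) -> float:
    """Fraction of distinct answer tokens that also appear in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)

def guarded_answer(answer: str, context: str, threshold: float = 0.7) -> str:
    """Return the answer only if its support score reaches the threshold."""
    return answer if support_score(answer, context) >= threshold else FALLBACK

context = "The warranty covers parts and labour for two years."
print(guarded_answer("The warranty covers parts for two years.", context))
print(guarded_answer("The warranty includes free shipping worldwide.", context))
```

The first answer is fully supported by the context and passes; the second shares only two of its six words with it and falls back to the default message.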

Advanced RAG Techniques:

The figure below, by Barnett S. et al., presents the possible common failures of a standard RAG method:


Picture by Authors

Furthermore, standard RAG methods can only retrieve contiguous chunks from the document corpus and lack an understanding of the overall document context. Hence the need for advanced RAG methods that let a solution understand long-document context and integrate knowledge from multiple parts of a text, such as an entire book.

RAPTOR - Recursive Abstractive Processing For Tree-Organized Retrieval

RAPTOR is an indexing and retrieval system that uses a tree structure to capture both high-level and low-level details about a text. RAPTOR clusters chunks of text, generates text summaries of those clusters and then repeats, building a tree from the bottom up. This allows it to load into an LLM's context chunks representing the text at different levels, so that it can effectively and efficiently answer questions at different levels.


Picture by the paper's authors


Picture by Langchain

The motivation behind RAPTOR is that long texts often present subtopics and hierarchical structures. RAPTOR therefore builds a recursive tree structure that balances broader topic comprehension with granular details and allows nodes to be grouped by semantic similarity, not just by their order in the text.
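The bottom-up construction can be sketched as a loop that groups nodes, summarizes each group and repeats until a single root remains. Both helpers below are placeholders invented for the example: `group_in_order` batches neighbouring nodes, whereas RAPTOR groups by semantic similarity, and `summarize` truncates text instead of calling an LLM.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    level: int
    children: list["Node"] = field(default_factory=list)

def group_in_order(nodes, size):
    """Placeholder for clustering: fixed-size batches of neighbouring nodes.
    RAPTOR itself groups by semantic similarity, not by position."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]

def summarize(texts):
    """Placeholder for an LLM summary: keeps the first three words of each text."""
    return " / ".join(" ".join(t.split()[:3]) for t in texts)

def build_tree(chunks, group_size=2):
    """Build the tree bottom-up: group, summarize, repeat until one root remains."""
    if not chunks or group_size < 2:
        raise ValueError("need at least one chunk and group_size >= 2")
    layer = [Node(text=c, level=0) for c in chunks]
    all_nodes = list(layer)
    while len(layer) > 1:
        parents = [
            Node(text=summarize([n.text for n in group]),
                 level=group[0].level + 1, children=group)
            for group in group_in_order(layer, group_size)
        ]
        all_nodes.extend(parents)  # keep every level for later retrieval
        layer = parents
    return layer[0], all_nodes

root, nodes = build_tree(["a b c d", "e f", "g", "h i j k l"])
print(root.level, len(nodes))
```

Keeping `all_nodes` alongside the root means leaves and summaries can later be searched together.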

RAPTOR steps are:

The clustering approach uses soft clustering, where nodes can belong to multiple clusters without requiring a fixed number of clusters. Thus a text segment can end up in multiple summaries, as it might be relevant to various topics. The clustering algorithm is based on Gaussian Mixture Models (GMMs).
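To make the soft assignment concrete, here is a toy one-dimensional sketch. It assumes the mixture parameters (weight, mean, standard deviation) have already been fitted and only shows how a point can land in several clusters: every component whose posterior probability exceeds a threshold is kept. The 0.1 threshold and the component values are arbitrary choices for the example, and real text embeddings are high-dimensional.

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def membership_probabilities(x, components):
    """Posterior probability of each mixture component for point x.
    components: list of (weight, mean, std) tuples, assumed already fitted."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

def soft_assign(x, components, threshold=0.1):
    """Return every cluster index whose posterior exceeds the threshold."""
    probs = membership_probabilities(x, components)
    return [i for i, p in enumerate(probs) if p > threshold]

# Two equally weighted clusters centred at 0 and 3 (assumed values).
components = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]
print(soft_assign(0.0, components))  # a point at the centre of the first cluster
print(soft_assign(1.5, components))  # a point halfway between both clusters
```

A point halfway between the two centres gets equal posterior mass from both components, so it is assigned to both clusters and would feed both summaries.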

RAPTOR steps in querying are:

RAPTOR querying uses two strategies: tree traversal and collapsed tree.


Picture by Authors

The collapsed tree approach is better, as it offers greater flexibility than tree traversal: it searches all nodes at once rather than layer by layer.
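A minimal sketch of collapsed tree retrieval, under a few assumptions: all tree nodes are flattened into one list, ranked by cosine similarity to the query, and added until a token budget is reached. The two-dimensional vectors, the whitespace word count used as a token proxy and the node texts are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def collapsed_tree_retrieve(query_vec, nodes, max_tokens):
    """Rank leaves and summaries together and fill the context up to a budget.
    nodes: list of dicts with 'text', 'vec' and 'level' keys."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, n["vec"]), reverse=True)
    selected, used = [], 0
    for node in ranked:
        cost = len(node["text"].split())  # word count as a rough token proxy
        if used + cost > max_tokens:
            break  # budget reached, stop adding nodes
        selected.append(node)
        used += cost
    return selected

# Toy flattened tree: two leaves (level 0) and one summary node (level 1).
nodes = [
    {"text": "cinderella lost her glass slipper", "vec": [1.0, 0.0], "level": 0},
    {"text": "the prince searched the kingdom", "vec": [0.0, 1.0], "level": 0},
    {"text": "cinderella story overview", "vec": [0.7, 0.7], "level": 1},
]
result = collapsed_tree_retrieve([0.9, 0.4], nodes, max_tokens=8)
print([(n["level"], n["text"]) for n in result])
```

Because every level competes in the same ranking, the result here mixes a summary node with a leaf, which tree traversal with a fixed number of nodes per layer would not do as freely.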


Source code:

Langchain implementation:

Graph Neural Network with Large Language Models (Amazon)

GNN method has th


Picture by Authors


Picture by Authors
