my_ml_notes

Retrieval Augmented Generation

Criteria to choose the right Vector Database

Choosing the right vector database can be difficult, as the choice depends on the specific application requirements.

Commonly used aspects that you need to consider are:

Here you can find an example of a vector database comparison done by Dhruv Anand.

Prompt Engineering versus RAG versus Fine Tuning

image-20231114095854888

Prompt engineering example:

image-20231114100356222

RAG versus Fine tuning

image-20231114100618413

Example from OpenAI

image-20231114101043664

Re-ranking: apply a cross-encoder or rule-based re-ranking.

Classification step: have the model classify the domain and add extra metadata to the prompt.

Tools: categories of questions, e.g. figures, access to SQL databases, etc.

Query expansion: a list of questions derived from the prompt, executed in parallel (see the sketch below).
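
Below is a minimal sketch of the query-expansion step, assuming a hypothetical asynchronous `retrieve` function against the vector store; the original question and its LLM-generated variants are executed in parallel and the results merged.

```python
import asyncio

async def retrieve(query: str) -> list[str]:
    # Placeholder: query the vector store and return the matching chunks.
    return []

async def expanded_search(question: str, variants: list[str]) -> list[str]:
    # Run the original question and its expansion queries in parallel.
    batches = await asyncio.gather(*(retrieve(q) for q in [question, *variants]))
    # Flatten and deduplicate while preserving rank order.
    seen, merged = set(), []
    for chunk in (c for batch in batches for c in batch):
        if chunk not in seen:
            seen.add(chunk)
            merged.append(chunk)
    return merged
```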

How to evaluate?

The picture below is based on RAGAS, an evaluation framework for Retrieval Augmented Generation (RAG) pipelines. See the AWS Bedrock example here and the blog to try here.

image-20231114102045866
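
A minimal sketch of a RAGAS evaluation run, assuming the `ragas` and `datasets` packages are installed and an LLM API key is configured for the judge model; the column and metric names follow the RAGAS v0.1 documentation and the sample row is made up for illustration.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# One row per (question, retrieved contexts, generated answer, reference answer).
data = Dataset.from_dict({
    "question": ["What is the refund window?"],
    "contexts": [["Refunds are accepted within 30 days of purchase."]],
    "answer": ["You can request a refund within 30 days."],
    "ground_truth": ["Refunds are possible within 30 days of purchase."],
})

scores = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])
print(scores)
```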

Techniques

Text-to-SQL

Pattern: Direct Schema Inference, Self-Correction & Optimization

Direct schema inference: a 'seed prompt' instructs the LLM to construct a SQL query corresponding to the user's inquiry. The execution of this initial prompt is retried iteratively until it succeeds.

Please craft a SQL query for BigQuery that addresses the following QUESTION provided below. 
Ensure you reference the appropriate BigQuery tables and column names provided in the SCHEMA below. 
When joining tables, employ type coercion to guarantee data type consistency for the join columns. 
Additionally, the output column names should specify units where applicable.\n
QUESTION:
{}\n
SCHEMA:
{}\n
IMPORTANT: 
Use ONLY DATETIME and DO NOT use TIMESTAMP.
--
Ensure your SQL query accurately defines both the start and end of the DATETIME range.

Self-correction: failures are treated as a learning opportunity for the LLM, allowing it to scrutinize tracebacks and use the error messages to refine the seed prompt into an improved query iteration.

prompt = f"""Encountered an error: {msg}. 
To address this, please generate an alternative SQL query response that avoids this specific error. 
Follow the instructions mentioned above to remediate the error. 

Modify the below SQL query to resolve the issue:
{generated_sql_query}

Ensure the revised SQL query aligns precisely with the requirements outlined in the initial question."""
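
A minimal sketch of the full seed-prompt / self-correction loop, assuming a hypothetical `llm` callable that returns SQL text, a `SEED_PROMPT` string holding the seed prompt shown above (with `{}` placeholders for the question and schema), and the BigQuery Python client.

```python
from google.cloud import bigquery

# Assumed to hold the seed prompt shown above, with `{}` placeholders
# for the QUESTION and the SCHEMA.
SEED_PROMPT = "Please craft a SQL query for BigQuery ...\nQUESTION:\n{}\nSCHEMA:\n{}\n"

def text_to_sql(question: str, schema: str, llm, max_attempts: int = 5):
    client = bigquery.Client()
    sql = llm(SEED_PROMPT.format(question, schema))   # direct schema inference
    for _ in range(max_attempts):
        try:
            return client.query(sql).result()          # success: return the rows
        except Exception as err:
            # Self-correction: feed the error message back to the LLM.
            fix_prompt = (
                f"Encountered an error: {err}.\n"
                "Please generate an alternative SQL query that avoids this specific error.\n"
                f"Modify the below SQL query to resolve the issue:\n{sql}"
            )
            sql = llm(fix_prompt)
    raise RuntimeError("Could not produce a runnable SQL query")
```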

Optimization:

image-20231115074047300

Dealing with Hallucination

Reducing hallucination:

a. Prompt Engineering: Your prompt is quite explicit, but you may want to make it even more stringent. You could add sentences that explicitly ask the model not to extrapolate from the data. What temperature are you using?

b. Confidence Scoring: Implement a confidence score mechanism to assess the relevance of the generated response to the query and the provided content. If the score is below a certain threshold, default to “Sorry, I am unable to answer your query.” (see the sketch after this list).

c. Post-processing: After the model generates an answer, you could add another layer of validation to verify the factual accuracy of the response against the data before sending it to the user.

d. User Feedback Loop: Allow users to flag incorrect answers, which could be used to fine-tune the model or adjust its confidence thresholds.

From: Hallucination in retrieval augmented chatbot (RAG)
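
A minimal sketch of the confidence-scoring idea from point (b), assuming a hypothetical `embed` function; cosine similarity between the generated answer and the retrieved context serves as a crude relevance score, and the threshold value is an assumption to be tuned.

```python
import numpy as np

def answer_or_refuse(answer: str, context: str, embed, threshold: float = 0.75) -> str:
    # Cosine similarity between answer and supporting context as a proxy confidence score.
    a, c = np.asarray(embed(answer)), np.asarray(embed(context))
    score = float(a @ c / (np.linalg.norm(a) * np.linalg.norm(c)))
    if score < threshold:
        return "Sorry, I am unable to answer your query."
    return answer
```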

Advanced RAG Techniques:

The figure below presents the common failure points of a standard RAG pipeline, as identified by Barnett S. et al.:

image-20240306175319053

Picture by Authors

Furthermore, standard RAG methods can only retrieve contiguous chunks from the document corpus and lack understanding of the overall document context. Hence the need for advanced RAG methods that allow the solution to understand long document context and to integrate knowledge from multiple parts of a text, such as an entire book.

RAPTOR - Recursive Abstractive Processing For Tree-Organized Retrieval

RAPTOR is an indexing and retrieval system that uses a tree structure to capture both high-level and low-level details about a text. RAPTOR clusters chunks of text, generates text summaries of those clusters, and then repeats the process, building a tree from the bottom up. This allows it to load into an LLM's context chunks representing the text at different levels, so that it can effectively and efficiently answer questions at different levels of abstraction.
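
A minimal sketch of the bottom-up tree construction, assuming hypothetical `embed`, `cluster` (soft clustering, see the GMM sketch further below) and `summarize` helpers; each pass clusters the current layer's nodes and replaces them with LLM-written summaries until only one node (or a few) remains.

```python
def build_raptor_tree(chunks: list[str], embed, cluster, summarize, max_levels: int = 4):
    tree = [chunks]                          # level 0: raw leaf chunks
    nodes = chunks
    for _ in range(max_levels):
        if len(nodes) <= 1:
            break
        vectors = [embed(n) for n in nodes]
        groups = cluster(vectors)            # soft clustering: lists of node indices
        # Summarize each cluster to form the next level of the tree.
        nodes = [summarize([nodes[i] for i in group]) for group in groups]
        tree.append(nodes)
    return tree                              # all levels are indexed for retrieval
```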

image-20240305214929149

Picture by the paper's authors

image-20240305215853619

Picture by Langchain

The motivation behind RAPTOR is that long texts often present subtopics and hierarchical structures. The RAPTOR solution therefore builds a recursive tree structure that balances broad topic comprehension with granular details, and that allows nodes to be grouped based on semantic similarity rather than just their order in the text.

RAPTOR steps are:

The clustering approach uses soft clustering, where nodes can belong to multiple clusters without requiring a fixed number of clusters. Thus a text segment can end up in multiple summaries, as it might be relevant to various topics. The clustering algorithm is based on GMMs (Gaussian Mixture Models).
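
A minimal sketch of the soft-clustering step with scikit-learn's GaussianMixture, assigning a chunk to every cluster whose posterior probability exceeds a threshold, so a chunk can appear in several summaries; RAPTOR additionally reduces embedding dimensionality with UMAP and picks the number of components via BIC, which is omitted here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_cluster(embeddings: np.ndarray, n_components: int = 10, threshold: float = 0.1):
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(embeddings)
    probs = gmm.predict_proba(embeddings)          # shape: (n_chunks, n_components)
    # A chunk joins every cluster whose membership probability passes the threshold.
    clusters = [np.where(probs[:, k] >= threshold)[0].tolist() for k in range(n_components)]
    return [c for c in clusters if c]              # drop empty clusters
```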

RAPTOR steps in querying are:

RAPTOR querying uses two strategies: tree traversal and collapsed tree.

image-20240305230255337

Picture by Authors

Collapsed tree performs better, as it offers greater flexibility than tree traversal.
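
A minimal sketch of collapsed-tree retrieval, assuming the tree from the construction sketch above and a hypothetical `embed` helper: all nodes from all levels are flattened into a single pool and ranked by cosine similarity to the query, so summaries and leaf chunks compete directly.

```python
import numpy as np

def collapsed_tree_query(tree: list[list[str]], query: str, embed, top_k: int = 5) -> list[str]:
    # Flatten every level of the tree into one candidate pool.
    pool = [node for level in tree for node in level]
    q = np.asarray(embed(query))

    def cosine(text: str) -> float:
        v = np.asarray(embed(text))
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    # Rank all nodes (leaves and summaries alike) against the query.
    return sorted(pool, key=cosine, reverse=True)[:top_k]
```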

Benefits:

Source code: https://github.com/parthsarthi03/raptor

Langchain implementation: https://github.com/langchain-ai/langchain/blob/master/cookbook/RAPTOR.ipynb

Graph Neural Network with Large Language Models (Amazon)

GNN method has th

image-20240305230513487

Picture by Authors

image-20240305230452152

Picture by Authors

Contextual Retrieval Preprocessing (By Anthropic)

Takeaways:

Issue: embedding models are excellent at capturing semantic relationships, but can miss exact matches.

Proposal:

Solution:

The proposed RAG solution (see figure below) combines the embedding and BM25 techniques using the following steps:

  1. Break the knowledge base into smaller chunks of text.
  2. Create TF-IDF encodings and semantic embeddings for these chunks.
  3. Use BM25 to find the top chunks based on exact matches.
  4. Use embeddings to find the top chunks based on semantic similarity.
  5. Combine and deduplicate the results from BM25 and semantic search using rank-fusion techniques.
  6. Add the top-K chunks to the prompt and generate the response.
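
A minimal sketch of steps 3-5, assuming the `rank_bm25` package and a hypothetical `embed` function; the two result lists are merged with reciprocal rank fusion (RRF), one common rank-fusion technique.

```python
import numpy as np
from rank_bm25 import BM25Okapi

def hybrid_search(chunks: list[str], query: str, embed, top_k: int = 20) -> list[str]:
    # Lexical ranking with BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.split() for c in chunks])
    bm25_order = np.argsort(bm25.get_scores(query.split()))[::-1]

    # Semantic ranking by cosine similarity of embeddings.
    q = np.asarray(embed(query))
    vecs = np.asarray([embed(c) for c in chunks])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    emb_order = np.argsort(sims)[::-1]

    # Reciprocal rank fusion: score = sum over rankings of 1 / (60 + rank).
    fused: dict[int, float] = {}
    for order in (bm25_order, emb_order):
        for rank, idx in enumerate(order):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (60 + rank)
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [chunks[i] for i in best]
```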

Contextual Retrieval mitigates the problem of “lacking sufficient context” by prepending chunk-specific explanatory context to each chunk before creating the embeddings (“contextual embeddings”) and the BM25 index (“contextual BM25”). See the SEC-filing example from the article below.

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."

image-20240925081635884

The following prompt is used to generate the context for each chunk:

<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else. 
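
A minimal sketch of generating the contextual prefix with the prompt above, assuming the `anthropic` Python package and an API key; the model choice is an assumption, and in the article the full document is served via prompt caching to keep the cost low.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

CONTEXT_PROMPT = """<document>
{doc}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the overall document \
for the purposes of improving search retrieval of the chunk. \
Answer only with the succinct context and nothing else."""

def contextualize(doc: str, chunk: str) -> str:
    # Ask the model for a short situating context and prepend it to the chunk
    # before embedding and BM25 indexing.
    response = client.messages.create(
        model="claude-3-haiku-20240307",   # assumed model choice
        max_tokens=150,
        messages=[{"role": "user", "content": CONTEXT_PROMPT.format(doc=doc, chunk=chunk)}],
    )
    return response.content[0].text + " " + chunk
```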

As highlighted in the article, important considerations when using contextual retrieval are:

Further improvement is achieved using reranking. Reranking is a commonly used filtering technique that ensures the most relevant chunks are passed to the model. The steps of reranking are:

  1. Perform an initial retrieval to get the top candidate chunks (e.g. top 150).
  2. Pass the top-N chunks along with the user query through the rerank model.
  3. The rerank model gives each chunk a score based on its relevance and importance to the prompt, and the top-K chunks are selected (e.g. top 20).
  4. Pass the top-K chunks into the model as context and generate the answer.

An example of a rerank model is the Cohere reranker. It is also possible to use out-of-the-box techniques such as the one used in HyDE and here.
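
A minimal sketch of the rerank step with Cohere's rerank endpoint, assuming the `cohere` package and a configured API key; the model name is an assumption, and any cross-encoder reranker could be swapped in.

```python
import cohere

def rerank(query: str, candidates: list[str], top_k: int = 20) -> list[str]:
    co = cohere.Client()  # assumes the Cohere API key is set in the environment
    # Score every candidate chunk against the query and keep the best top_k.
    result = co.rerank(model="rerank-english-v3.0", query=query,
                       documents=candidates, top_n=top_k)
    return [candidates[hit.index] for hit in result.results]
```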

image-20240925082248322

Picture by Author

Results:

image-20240925100145474

Contextual retrieval notebook example here.

References: