Choosing the right database can be difficult, as the choice depends on the specific application requirements.
Aspects commonly considered are:
Here you find an example of a vector database comparison done by Dhruv Anand.
Prompt engineering example:
RAG versus Fine tuning
Example from OpenAI
Re-ranking: apply a cross-encoder or rule-based re-ranking.
Classification step: have the model classify the domain and add extra metadata to the prompt.
Tools: categories of questions, e.g. figures, access to SQL databases, etc.
Query expansion: a list of questions in the prompt executed in parallel (see the sketch below).
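As a rough illustration of the query expansion idea, the sketch below generates question variants with the model and runs the retrievals in parallel; `llm` and `retriever` are hypothetical callables standing in for your own model client and vector store.

```python
# Query expansion sketch: the LLM proposes alternative phrasings and the
# retriever runs all of them in parallel; results are merged and deduplicated.
from concurrent.futures import ThreadPoolExecutor

def expand_query(llm, question, n=3):
    prompt = (
        f"Generate {n} alternative phrasings of the question below, one per line.\n"
        f"QUESTION: {question}"
    )
    variants = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [question] + variants[:n]

def retrieve_parallel(retriever, queries, k=4):
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = list(pool.map(lambda q: retriever(q, k), queries))
    seen, merged = set(), []
    for chunks in results:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```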
The picture below is based on RAGAS, an evaluation framework for Retrieval Augmented Generation (RAG) pipelines. See the AWS Bedrock example here and the blog to try here.
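For reference, a minimal RAGAS evaluation might look like the sketch below. Metric and column names follow earlier RAGAS releases and may differ in newer versions, and the LLM-based metrics need a model backend (e.g. an OpenAI key) configured; the sample data is illustrative only.

```python
# Minimal RAGAS evaluation sketch over a one-row dataset.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

samples = Dataset.from_dict({
    "question": ["What was the quarter-over-quarter revenue growth?"],
    "answer": ["Revenue grew by 3% over the previous quarter."],
    "contexts": [["The company's revenue grew by 3% over the previous quarter."]],
    "ground_truth": ["Revenue grew by 3% quarter over quarter."],
})

report = evaluate(samples, metrics=[faithfulness, answer_relevancy, context_precision])
print(report)  # per-metric scores for the RAG pipeline
```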
Direct schema inference: a ‘seed prompt’ instructs the LLM to construct a SQL query corresponding to the user’s inquiry. Execution of this initial prompt continues iteratively until it succeeds (a sketch of this loop follows the prompt below).
Please craft a SQL query for BigQuery that addresses the following QUESTION provided below.
Ensure you reference the appropriate BigQuery tables and column names provided in the SCHEMA below.
When joining tables, employ type coercion to guarantee data type consistency for the join columns.
Additionally, the output column names should specify units where applicable.\n
QUESTION:
{}\n
SCHEMA:
{}\n
IMPORTANT:
Use ONLY DATETIME and DO NOT use TIMESTAMP.
--
Ensure your SQL query accurately defines both the start and end of the DATETIME range.
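A minimal sketch of how this seed prompt could be wired up, assuming a hypothetical `llm` callable that returns raw SQL and the `google-cloud-bigquery` client:

```python
# Direct schema inference sketch: fill the seed prompt with the user question
# and schema, ask the LLM for SQL, and execute it on BigQuery.
from google.cloud import bigquery

def generate_and_run(llm, seed_prompt, question, schema):
    client = bigquery.Client()
    sql = llm(seed_prompt.format(question, schema))  # seed prompt has two {} slots
    rows = client.query(sql).result()                # raises if the SQL is invalid
    return sql, list(rows)
```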
Self-correction: failures are treated as a critical learning opportunity for the LLM, allowing it to scrutinize tracebacks and use error messages to refine and evolve the seed prompt into an improved query iteration (see the retry-loop sketch after the prompt below).
prompt = f"""Encountered an error: {msg}.
To address this, please generate an alternative SQL query response that avoids this specific error.
Follow the instructions mentioned above to remediate the error.
Modify the below SQL query to resolve the issue:
{generated_sql_query}
Ensure the revised SQL query aligns precisely with the requirements outlined in the initial question."""
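Putting the two prompts together, the retry loop could look roughly like this (again `llm` is a hypothetical callable, and the retry limit is an arbitrary choice):

```python
# Self-correction sketch: on failure, feed the error message back to the LLM
# and ask for a corrected query, up to a fixed number of retries.
from google.cloud import bigquery

def run_with_self_correction(llm, seed_prompt, question, schema, max_retries=3):
    client = bigquery.Client()
    generated_sql_query = llm(seed_prompt.format(question, schema))
    for _ in range(max_retries):
        try:
            return list(client.query(generated_sql_query).result())
        except Exception as exc:
            msg = str(exc)
            prompt = f"""Encountered an error: {msg}.
To address this, please generate an alternative SQL query response that avoids this specific error.
Follow the instructions mentioned above to remediate the error.
Modify the below SQL query to resolve the issue:
{generated_sql_query}
Ensure the revised SQL query aligns precisely with the requirements outlined in the initial question."""
            generated_sql_query = llm(prompt)
    raise RuntimeError("No valid query produced after self-correction retries")
```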
Optimization:
Reducing hallucination:
a. Prompt Engineering: The prompt is already quite explicit, but you can make it even more stringent by adding sentences that explicitly ask the model not to extrapolate from the data, and by using a low temperature.
b. Confidence Scoring: Implement a confidence score mechanism to assess the relevance of the generated response to the query and the provided content. If the score is below a certain threshold, default to “Sorry, I am unable to answer your query.” (see the sketch after this list).
c. Post-processing: After the model generates an answer, you could add another layer of validation to verify the factual accuracy of the response against the data before sending it to the user.
d. User Feedback Loop: Allow users to flag incorrect answers, which could be used to fine-tune the model or adjust its confidence thresholds.
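A minimal sketch of the confidence-scoring idea in point b, using sentence-transformers cosine similarity between the answer and the retrieved context; the model name and threshold are illustrative choices.

```python
# Confidence scoring sketch: refuse to answer when the generated answer is
# not sufficiently similar to the retrieved context.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def guarded_answer(answer, context, threshold=0.5):
    score = util.cos_sim(model.encode(answer), model.encode(context)).item()
    if score < threshold:
        return "Sorry, I am unable to answer your query."
    return answer
```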
The figure below presents the common failure points of a standard RAG method, as identified by Barnett S. et al. These possible failure points are:
Picture by Authors
Furthermore, standard RAG methods can only retrieve contiguous chunks from the document corpus and lack an understanding of the overall document context. Hence the need for advanced RAG methods that can understand long document context and integrate knowledge from multiple parts of a text, such as an entire book.
RAPTOR is an indexing and retrieval system that uses a tree structure to capture both high-level and low-level details about a text. RAPTOR clusters chunks of text, generates text summaries of those clusters, and then repeats, building a tree from the bottom up. This allows it to load into an LLM's context chunks that represent the text at different levels, so that it can answer questions at different levels effectively and efficiently.
Picture by the paper's authors
Picture by Langchain
The motivation behind RAPTOR is that long texts often present subtopics and hierarchical structures. The RAPTOR solution therefore builds a recursive tree structure that balances broader topic comprehension with granular details, and which allows nodes to be grouped based on semantic similarity rather than just their order in the text.
RAPTOR steps are:
The clustering approach uses soft clustering, where nodes can belong to multiple clusters without requiring a fixed number of clusters. A text segment can therefore end up in multiple summaries, as it might be relevant to various topics. The clustering algorithm is based on GMMs (Gaussian Mixture Models), as sketched below.
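A simplified sketch of the soft-clustering step with scikit-learn (the paper additionally reduces dimensionality with UMAP and picks the number of clusters via BIC; both are omitted here):

```python
# Soft clustering sketch: a chunk joins every cluster whose GMM membership
# probability exceeds a threshold, so it can appear in several summaries.
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_cluster(embeddings, n_clusters, threshold=0.1):
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(embeddings)
    probs = gmm.predict_proba(embeddings)      # shape: (n_chunks, n_clusters)
    return [np.where(p > threshold)[0].tolist() for p in probs]  # cluster ids per chunk
```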
RAPTOR steps in querying are:
RAPTOR querying uses two strategies: tree traversal and collapsed tree.
Picture by Authors
The collapsed-tree strategy performs better, as it offers greater flexibility than tree traversal.
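A rough sketch of collapsed-tree retrieval, where leaf chunks and summaries from every level are pooled into one index and ranked together; `embed` is a hypothetical embedding function, and the paper fills the context up to a token budget rather than a fixed top-k.

```python
# Collapsed-tree retrieval sketch: rank all tree nodes (chunks and summaries)
# against the query by cosine similarity and return the best k.
import numpy as np

def collapsed_tree_retrieve(embed, query, nodes, k=5):
    q = np.asarray(embed(query))
    scores = []
    for text in nodes:
        v = np.asarray(embed(text))
        scores.append(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
    top = np.argsort(scores)[::-1][:k]
    return [nodes[i] for i in top]
```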
Benefits:
Source code: https://github.com/parthsarthi03/raptor
Langchain implementation: https://github.com/langchain-ai/langchain/blob/master/cookbook/RAPTOR.ipynb
GNN method has th
Picture by Authors
Picture by Authors
Takeaways:
Issue: embedding models are excellent at capturing semantic relationships, but can miss exact matches.
Proposal:
Solution:
The proposed RAG solution (see the figure below) combines the embedding and BM25 techniques using the following steps:
Contextual Retrieval mitigates the problem of “lacking sufficient context” by prepending chunk-specific explanatory context to each chunk before embedding (“contextual embeddings”) and before creating the BM25 index (“contextual BM25”). See the SEC filing example below, taken from the article.
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
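Once every chunk has been contextualized this way, both indexes are built over the contextualized text and their rankings are merged. A rough sketch, assuming a hypothetical `embed` function and the `rank_bm25` package, with a simple reciprocal rank fusion (the article's exact fusion may differ):

```python
# Hybrid retrieval sketch: contextual embeddings + contextual BM25, merged
# with reciprocal rank fusion.
import numpy as np
from rank_bm25 import BM25Okapi

def hybrid_search(embed, contextualized_chunks, query, k=5):
    # Dense ranking over the contextual embeddings.
    chunk_vecs = np.array([embed(c) for c in contextualized_chunks])
    dense = np.argsort(-(chunk_vecs @ np.asarray(embed(query))))

    # Sparse ranking over the contextual BM25 index.
    bm25 = BM25Okapi([c.lower().split() for c in contextualized_chunks])
    sparse = np.argsort(-bm25.get_scores(query.lower().split()))

    # Reciprocal rank fusion of the two orderings.
    fused = {}
    for ranking in (dense, sparse):
        for rank, idx in enumerate(ranking):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (60 + rank)
    best = sorted(fused, key=fused.get, reverse=True)[:k]
    return [contextualized_chunks[i] for i in best]
```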
The following prompt is used to generate the context for each chunk:
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
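A sketch of running this prompt over each chunk with the Anthropic SDK (the model name is an illustrative choice; the article also recommends prompt caching so the full document is not reprocessed at full cost for every chunk):

```python
# Contextualization sketch: generate a short situating context for a chunk
# and prepend it to the chunk before indexing.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CONTEXT_PROMPT = """<document>
{doc}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else."""

def contextualize(doc, chunk):
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        messages=[{"role": "user", "content": CONTEXT_PROMPT.format(doc=doc, chunk=chunk)}],
    )
    return response.content[0].text.strip() + " " + chunk
```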
As highlighted in the article, important considerations when using contextual retrieval are:
Further improvement is achieved with reranking. Reranking is a commonly used filtering technique that ensures the most relevant chunks are passed to the model. The basic reranking steps are:
An example of a re-rank model is the Cohere reranker (see the sketch below). There is also the possibility to use out-of-the-box techniques such as the one done in HyDE and here.
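A minimal sketch of the Cohere reranking call (the model name is illustrative; check the current SDK for exact parameters):

```python
# Reranking sketch: ask the Cohere reranker to order candidate chunks by
# relevance to the query and keep the top ones.
import cohere

co = cohere.Client()  # reads COHERE_API_KEY from the environment

def rerank(query, chunks, top_n=5):
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=chunks,
        top_n=top_n,
    )
    return [chunks[r.index] for r in response.results]  # most relevant first
```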
Picture by Author
Results:
Contextual retrieval notebook example here.