Configuring the .env File
RAG Me Up implements all the components described in the High-Level Design and lets you configure them, possibly even turning some components completely off.
This is all done through the .env file (in the /server folder), for which a template named .env.template is included in the repository. Rename that file to .env and configure your RAG Me Up instance by following this page's documentation.
The following components can be configured through the .env file.
Logging
- logging_level sets the log level. Use DEBUG for local development and testing and something like WARN or ERROR for production settings.
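For example, a development setup could look like the snippet below; the value is just an illustration.

```env
# Verbose logging while developing; switch to WARN or ERROR in production
logging_level=DEBUG
```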
Document embedding
Embeddings are done locally in RAG Me Up, meaning we don't use OpenAI, Anthropic or other API providers for embeddings but instead directly use Huggingface models to do the embedding. They can either be run on GPU (fast, if you have one) or on CPU (slow, but always possible).
- embedding_model sets the Huggingface model to use. Use leaderboards like MTEB to decide what model you want to use. Try to find a trade-off between size/embedding dimension (hence, speed) and accuracy.
- embedding_cpu can be used to force CPU usage; by default RAG Me Up will use a GPU.
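As an illustration, the following settings select a small Huggingface embedding model and keep the default GPU behaviour; the model name is an example, not a recommendation.

```env
# Illustrative Huggingface embedding model; pick one via the MTEB leaderboard
embedding_model=sentence-transformers/all-MiniLM-L6-v2
# Set to True only if you need to force CPU embedding
embedding_cpu=False
```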
Data loading
- data_directory should be the full or relative path to where your data is stored. RAG Me Up will use this when it is run for the first time to read in all the supported files once and load them into the database. Once there is data present in the database, it will not load the data from this folder again.
- file_types is a comma-separated list of file types (extensions) that should be loaded. RAG Me Up currently supports pdf, json, docx, pptx, xlsx, csv, txt. If you leave one of these out, files with that extension will not be loaded, even if they are present in your data directory.
- json_schema: if JSON files are to be processed, you can specify a custom schema to load specific parts using jq.
- csv_seperator: if CSV files are to be processed, this should be the separator.
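A data loading block could look like the following; the path is a placeholder for your own setup and the jq schema shown is simply the identity filter.

```env
# Folder that is scanned on first startup (illustrative path)
data_directory=./data
# Only these extensions will be picked up
file_types=pdf,docx,txt,csv
# Only needed when loading JSON/CSV files (illustrative values)
json_schema=.
csv_seperator=,
```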
Chunking
- splitter should be the name of the splitter to use. Allowed are: RecursiveCharacterTextSplitter, SemanticChunker or ParagraphChunker (a RecursiveCharacterTextSplitter that always respects paragraph boundaries).
- recursive_splitter_chunk_size sets the chunk size for the RecursiveCharacterTextSplitter.
- recursive_splitter_chunk_overlap sets the overlap size for the RecursiveCharacterTextSplitter.
- semantic_chunker_breakpoint_threshold_type sets the threshold type for the SemanticChunker. Allowed are: percentile, standard_deviation, interquartile, gradient.
- semantic_chunker_breakpoint_threshold_amount sets the threshold amount for the SemanticChunker.
- semantic_chunker_number_of_chunks sets the number of chunks for the SemanticChunker.
- paragraph_chunker_max_chunk_size sets the maximum chunk size for the ParagraphChunker.
- paragraph_chunker_paragraph_separator sets the separator for the ParagraphChunker. This will be used to determine what a paragraph is by splitting on it as a regex.
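For instance, a configuration that chunks recursively into roughly 1000-character pieces with some overlap could look like this; the numbers are illustrative and should be tuned to your documents and embedding model.

```env
# Illustrative chunking values
splitter=RecursiveCharacterTextSplitter
recursive_splitter_chunk_size=1000
recursive_splitter_chunk_overlap=200
```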
Database setup
- postgres_uri should specify the URI of the Postgres instance to use. Check the postgres subfolder for a Docker setup that runs an image which can function as a hybrid retrieval store.
- vector_store_k is the number of document chunks to fetch (during normal retrieval) from the Postgres database.
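A minimal database block might look as follows; the connection string is purely an example and should point at your own Postgres instance.

```env
# Example connection string; replace credentials, host and database name with your own
postgres_uri=postgresql://postgres:postgres@localhost:5432/ragmeup
# Number of chunks fetched per query
vector_store_k=10
```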
Reranking
- rerank is a boolean that you can set to turn reranking on or off.
- rerank_k: the number of documents to keep after reranking. Usually you set vector_store_k relatively high and rerank_k to be your final desired number of chunks to keep.
- rerank_model: the (flashrank) rerank model to use. You can find alternatives on Huggingface but be sure they are flashrank-compatible.
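Reranking typically pairs a generous vector_store_k with a smaller rerank_k, for example (the values and the model name are illustrative):

```env
rerank=True
rerank_k=5
# Illustrative flashrank-compatible model name
rerank_model=ms-marco-MiniLM-L-12-v2
```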
HyDE
- use_hyde: boolean to turn HyDE on or off.
- hyde_query: the prompt given to the LLM to let it generate a hypothetical document. Must have the following placeholder: question (the user question).
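A sketch of what this could look like; the prompt wording is only an example, and the curly-brace placeholder style is assumed based on the Re2 format shown further down this page.

```env
use_hyde=True
# Illustrative HyDE prompt; {question} is replaced with the user's question
hyde_query=Write a short passage that would plausibly answer the following question: {question}
```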
Conversation summarization
- use_summarization is a boolean turning automatic summarization (of the conversation history) on or off.
- summarization_threshold is the number of tokens (not characters) the history should exceed before summarization kicks in. Be sure to keep a buffer between this threshold and your model's context window because the summarization prompt itself adds some overhead.
- summarization_query: the query that will be sent to the LLM to run the actual summarization. Must have the following placeholder: history (the actual conversation history, best placed at the end of the prompt).
- summarization_encoder: the tiktoken model to use to count the tokens for the summarization_threshold check.
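Put together, a summarization block might look like the sketch below; the threshold, prompt wording and encoder name are illustrative and should match the model you actually use.

```env
use_summarization=True
# Summarize once the history exceeds this many tokens (illustrative)
summarization_threshold=3000
# Illustrative prompt; {history} is replaced with the conversation so far
summarization_query=Summarize the following conversation, keeping all facts needed to answer follow-up questions: {history}
# Illustrative tiktoken encoder name used for token counting
summarization_encoder=cl100k_base
```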
RAG configuration
- temperature should be the model temperature: 0 for no variation, higher for more variation.
- rag_instruction: the system prompt/instruction to use for normal RAG query answering. It is wise to include some background on the RAG system's purpose and to always force the system to mention the sources it used. Must have the following placeholder: context (the documents retrieved from the Postgres database).
- rag_question_initial: decoration around the user's question that will be asked to the LLM. RAG Me Up allows you to differentiate initial and follow-up questions through different prompts. Must have the following placeholder: question (the original user question).
- rag_question_followup: same as above but for a follow-up question. Must have the following placeholder: question (the user's follow-up question).
- rag_fetch_new_question: the prompt sent to the LLM to check whether or not we should fetch new documents, given that we already have a follow-up question. This prompt must force the LLM to answer with yes (should fetch) or no (no fetch required) only. Must have the following placeholder: question (the user's follow-up question).
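To make the placeholder mechanics concrete, here is a heavily abbreviated sketch; the prompt texts are examples only and the curly-brace placeholder syntax is assumed from the Re2 format described below.

```env
# Abbreviated example prompts; adjust the wording to your own use case
temperature=0
rag_instruction=You are a helpful assistant. Answer using only the documents below and cite your sources. Documents: {context}
rag_question_initial=Answer the following question: {question}
rag_question_followup=Answer this follow-up question: {question}
rag_fetch_new_question=Given the conversation so far, do we need to fetch new documents to answer "{question}"? Answer with yes or no only.
```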
Rewrite loop
- use_rewrite_loop: a boolean indicating whether or not we should use the rewrite loop (run only once) to check if the retrieved documents can be used to answer the user's question.
- rewrite_query_instruction: the system prompt sent to the LLM to decide if the documents can answer the question. It must contain the retrieved documents and must force the LLM to answer with yes (the documents can answer the question) or no (we should rewrite the question), followed by a motivation. That motivation will be used in the actual rewriting. Must have the following placeholder: context (the documents retrieved from the Postgres database).
- rewrite_query_question: the message sent to the LLM containing the user's question that should be answered with the documents. Must have the following placeholder: question (the user question).
- rewrite_query_prompt: the prompt used to instruct the LLM to perform the actual rewrite. It is advised to let the LLM answer only with the rephrasing and not let it add decorations or explanations. Must have the following placeholders: question (the user question to rewrite) and motivation (the motivation output from the earlier query asking the LLM whether the documents can be used to answer the question or not).
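A possible rewrite-loop configuration, with deliberately short example prompts; the wording and the curly-brace placeholders are assumptions you should adapt.

```env
use_rewrite_loop=True
# Abbreviated example prompts
rewrite_query_instruction=Given these documents: {context} decide if they can answer the user's question. Reply with yes or no, followed by a short motivation.
rewrite_query_question=The question is: {question}
rewrite_query_prompt=Rewrite the question "{question}" so it can be answered better. Reason for rewriting: {motivation}. Reply with the rewritten question only.
```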
Re2
- use_re2: boolean indicating whether or not to use the re-reading (Re2) instruction.
- re2_prompt: the prompt that will be injected in between the re-iterating of the question. This results in the following format: [Original question]\n{re2_prompt}\n[Original question]
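For example (the prompt text follows the re-reading idea of asking the model to read the question again, but the exact wording is up to you):

```env
use_re2=True
# Injected between the two copies of the question
re2_prompt=Read the question again:
```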
Provenance
- provenance_method: the provenance attribution metric to use. Allowed values are: rerank, similarity, llm or None (to turn provenance attribution off).
- provenance_similarity_llm is the model to use when applying similarity provenance attribution to compute the similarities of the documents to the answer (and question).
- provenance_include_query: by default provenance is attributed to the answer only. Set this flag to True to also attribute to the question.
- provenance_llm_prompt is the prompt used to ask the LLM for provenance when provenance_method is set to llm. You are free to define any ranking score or mechanism but do make this really clear in the prompt. Must have the following placeholders: query (the user question), answer (the answer to the question as generated by the LLM) and context (the document chunk that we are attributing provenance for).
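As an illustration, similarity-based provenance could be configured like this; the model name is an example, not a requirement.

```env
provenance_method=similarity
# Illustrative model used to score chunk-to-answer similarity
provenance_similarity_llm=sentence-transformers/all-mpnet-base-v2
provenance_include_query=False
```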
Model selection
You can choose between different LLM providers, including running your own model locally through Ollama. Make sure that you set all environment variables that are required for your specific provider (e.g. OPENAI_API_KEY for OpenAI).
- use_openai: set to True to use OpenAI.
- openai_model_name: the model to use when selecting OpenAI.
- use_azure: set to True to use Azure OpenAI.
- use_gemini: set to True to use Gemini.
- gemini_model_name: the Gemini model to use when selecting Gemini.
- use_anthropic: set to True to use Anthropic.
- anthropic_model_name: the Anthropic model to use when selecting Anthropic.
- use_ollama: set to True to use Ollama (local model).
- ollama_model_name: the Ollama model to use when selecting Ollama. Browse the Ollama model library for available models.
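For example, to run fully locally through Ollama you would enable only the Ollama switch and pick a model; the model name below is illustrative and the other provider flags are shown explicitly set to False for clarity.

```env
use_openai=False
use_azure=False
use_gemini=False
use_anthropic=False
use_ollama=True
# Illustrative Ollama model; make sure it is pulled locally first
ollama_model_name=llama3
```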