RAG on living data

The RAG on living data workflow automates the integration of real-time data from Notion into a dynamic knowledge base. It processes and embeds page content with AI models, enabling retrieval and interaction through a chat interface. By running every minute, it keeps the knowledge base current, so users always query up-to-date information.

Published: 7/8/2025
Nodes: 34
Complexity: Complex
Tags: schedule, complex, langchain, splitinbatches, schedule trigger, sticky note, supabase, notion, noop, notion trigger, summarize, automation, advanced, cron
Categories:
Schedule Triggered, Complex Workflow
Integrations:
LangChain, SplitInBatches, Schedule Trigger, Sticky Note, Supabase, Notion, NoOp, NotionTrigger, Summarize

Target Audience

Who should use this workflow


- Data Scientists: To automate the embedding and retrieval of data from Notion for analysis and insights.
- Developers: To integrate Notion data with machine learning models seamlessly.
- Business Analysts: To ensure that the latest information from Notion is always available for decision-making.
- Content Managers: To keep track of document updates and ensure that the latest content is embedded for easy access.

Problem Solved

What problem does this workflow solve


This workflow automates the process of updating and embedding data from Notion into a vector store, ensuring that the latest information is readily available for querying. It addresses issues such as:
- Data Redundancy: By deleting old embeddings before inserting new ones, it prevents duplicate entries in the vector store (a deletion sketch follows this list).
- Timeliness: Automatically retrieves updated pages from Notion every minute, ensuring data is current.
- Efficiency: Processes documents in batches, making it scalable for large datasets.
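As a rough illustration of the deduplication step, here is a minimal sketch using the supabase-py client. The `documents` table name and the `notion_page_id` metadata key are assumptions about your vector store schema, not the template's exact configuration.

```python
# Minimal sketch of the "delete old embeddings" step, assuming a Supabase
# table named "documents" whose rows carry a JSON `metadata` column that
# records the originating Notion page id.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def delete_old_embeddings(page_id: str) -> None:
    """Remove every embedding row that belongs to the updated Notion page."""
    supabase.table("documents") \
        .delete() \
        .eq("metadata->>notion_page_id", page_id) \
        .execute()
```

Deleting by page id before re-inserting means an edited page never leaves stale chunks behind.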

Workflow Steps

Detailed explanation of the workflow process


1. Trigger: The workflow starts either on a one-minute schedule or when a page in the Notion database is updated.
2. Get Updated Pages: Retrieves pages from the Notion database that were updated in the last minute.
3. Input Reference: Serves as a placeholder for processing each updated page individually.
4. Delete Old Embeddings: Removes any existing embeddings related to the updated pages to maintain data integrity.
5. Get Page Blocks: Fetches all blocks of content from the updated Notion pages.
6. Concatenate to Single String: Combines all fetched block content into a single string for easier embedding (steps 2, 5, and 6 are sketched in the first example after this list).
7. Token Splitter: Splits the combined content into manageable chunks of 500 tokens for embedding.
8. Embeddings OpenAI: Generates vector embeddings for each chunk with an OpenAI embedding model.
9. Store in Supabase: Inserts the new embeddings into the Supabase vector store for retrieval (steps 7 through 9 are sketched in the second example after this list).
10. Vector Store Retriever: Exposes the stored embeddings to the question-and-answer chain.
11. OpenAI Chat Model: Answers user questions over the embedded data through a chat interface (steps 10 and 11 are sketched in the third example after this list).
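To make the Notion side concrete, here is a hedged sketch of steps 2, 5, and 6 using the official notion-client Python SDK. The database id is a placeholder, and block handling is simplified to rich-text-bearing block types; a real page also contains tables, images, and nested blocks.

```python
# Sketch of "Get Updated Pages", "Get Page Blocks", and "Concatenate to
# Single String" with the notion-client SDK. DATABASE_ID is a hypothetical
# placeholder; only blocks carrying rich text are extracted.
import os
from datetime import datetime, timedelta, timezone
from notion_client import Client

notion = Client(auth=os.environ["NOTION_TOKEN"])
DATABASE_ID = "your-database-id"  # placeholder, not from the template

def get_updated_pages() -> list[dict]:
    """Query pages edited within the last minute."""
    since = (datetime.now(timezone.utc) - timedelta(minutes=1)).isoformat()
    response = notion.databases.query(
        database_id=DATABASE_ID,
        filter={
            "timestamp": "last_edited_time",
            "last_edited_time": {"on_or_after": since},
        },
    )
    return response["results"]

def page_to_text(page_id: str) -> str:
    """Fetch all blocks of a page and concatenate their plain text."""
    chunks = []
    cursor = None
    while True:
        kwargs = {"block_id": page_id}
        if cursor:
            kwargs["start_cursor"] = cursor
        resp = notion.blocks.children.list(**kwargs)
        for block in resp["results"]:
            rich_text = block.get(block["type"], {}).get("rich_text", [])
            chunks.append("".join(rt["plain_text"] for rt in rich_text))
        if not resp.get("has_more"):
            break
        cursor = resp["next_cursor"]
    return "\n".join(chunks)
```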
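Steps 7 through 9 map roughly onto the following LangChain sketch. The 500-token chunk size comes from the workflow itself; the `documents` table and `match_documents` query names follow LangChain's Supabase defaults and are assumptions about your database setup.

```python
# Sketch of "Token Splitter", "Embeddings OpenAI", and "Store in Supabase"
# using LangChain. Chunk size mirrors the workflow's 500 tokens.
import os
from supabase import create_client
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed_and_store(page_id: str, text: str) -> None:
    """Split the page text into 500-token chunks, embed, and insert."""
    splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=0)
    chunks = splitter.split_text(text)
    SupabaseVectorStore.from_texts(
        texts=chunks,
        embedding=OpenAIEmbeddings(),
        # Tag each chunk with its source page so its old rows can be
        # deleted when the page changes again.
        metadatas=[{"notion_page_id": page_id}] * len(chunks),
        client=supabase,
        table_name="documents",
        query_name="match_documents",
    )
```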
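Finally, steps 10 and 11 might look like the sketch below: the stored vectors are wrapped in a retriever and handed to a retrieval QA chain. The chain style and the chat model name are illustrative assumptions, not the template's exact nodes.

```python
# Sketch of "Vector Store Retriever" and "OpenAI Chat Model": wrap the
# Supabase store in a retriever and answer questions over it.
import os
from supabase import create_client
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore
from langchain.chains import RetrievalQA

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

vector_store = SupabaseVectorStore(
    client=supabase,
    embedding=OpenAIEmbeddings(),
    table_name="documents",
    query_name="match_documents",
)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # model choice is an assumption
    retriever=vector_store.as_retriever(),
)
print(qa.invoke({"query": "What changed in the project plan today?"}))
```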

Customization Guide

How users can customize and adapt this workflow


- Change Trigger Frequency: Adjust the schedule trigger to a different interval based on the frequency of updates in Notion.
- Modify Chunk Size: Alter the chunkSize in the Token Splitter node to suit different embedding models or content types (see the sketch after this list).
- Embedding Model: Switch the OpenAI model used in the Embeddings node to a different version or type based on requirements.
- Database Integration: If using a different database instead of Supabase, update the corresponding nodes to connect to the new database.
- Additional Processing: Insert more nodes for additional processing steps after embeddings are generated, such as data validation or transformation.
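For example, changing the chunk size and the embedding model in the earlier sketches is a one-line edit each; the model name below is illustrative, not the template's default.

```python
# Hypothetical customization: smaller chunks and a different embedding model.
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings

splitter = TokenTextSplitter(chunk_size=256, chunk_overlap=32)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```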