RAG on living data

The RAG on living data workflow automates the integration of real-time data from Notion into a dynamic knowledge base. It processes and embeds page content with AI models, enabling retrieval and interaction through a chat interface. By running every minute, it keeps the knowledge base current, so users always query up-to-date information.

Published: 7/8/2025
Nodes: 34
Complexity: Complex
Tags: schedule, complex, langchain, splitinbatches, schedule trigger, sticky note, supabase, notion, noop, notion trigger, summarize, automation, advanced, cron
Categories:
Schedule Triggered, Complex Workflow
Integrations:
LangChain, SplitInBatches, Schedule Trigger, Sticky Note, Supabase, Notion, NoOp, NotionTrigger, Summarize

Target Audience

Who should use this workflow


- Data Scientists: To automate the embedding and retrieval of data from Notion for analysis and insights.
- Developers: To integrate Notion data with machine learning models seamlessly.
- Business Analysts: To ensure that the latest information from Notion is always available for decision-making.
- Content Managers: To keep track of document updates and ensure that the latest content is embedded for easy access.

Problem Solved

What problem does this workflow solve


This workflow automates the process of updating and embedding data from Notion into a vector store, ensuring that the latest information is readily available for querying. It addresses issues such as:
- Data Redundancy: By deleting old embeddings before inserting new ones, it prevents duplicate entries in the vector store (a deletion sketch follows this list).
- Timeliness: Automatically retrieves updated pages from Notion every minute, ensuring data is current.
- Efficiency: Processes documents in batches, making it scalable for large datasets.
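As a rough illustration of the deduplication step, here is a minimal sketch using the supabase-py client. The `documents` table name and the `notion_page_id` metadata key are assumptions about your vector store schema, not the template's exact configuration.

```python
# Minimal sketch of the "delete old embeddings" step, assuming a Supabase
# table named "documents" whose rows carry a JSON `metadata` column that
# records the originating Notion page id.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def delete_old_embeddings(page_id: str) -> None:
    """Remove every embedding row that belongs to the updated Notion page."""
    supabase.table("documents") \
        .delete() \
        .eq("metadata->>notion_page_id", page_id) \
        .execute()
```

Deleting by page id before re-inserting means an edited page never leaves stale chunks behind.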

Workflow Steps

Detailed explanation of the workflow process


1. Trigger: The workflow starts either on a one-minute schedule or when a page in the Notion database is updated.
2. Get Updated Pages: Retrieves pages from the Notion database that were updated in the last minute.
3. Input Reference: Serves as a placeholder for processing each updated page individually.
4. Delete Old Embeddings: Removes any existing embeddings related to the updated pages to maintain data integrity.
5. Get Page Blocks: Fetches all blocks of content from the updated Notion pages.
6. Concatenate to Single String: Combines all fetched block content into a single string for easier embedding (steps 2, 5, and 6 are sketched in the first example after this list).
7. Token Splitter: Splits the combined content into manageable chunks of 500 tokens for embedding.
8. Embeddings OpenAI: Generates vector embeddings for each chunk with an OpenAI embedding model.
9. Store in Supabase: Inserts the new embeddings into the Supabase vector store for retrieval (steps 7 through 9 are sketched in the second example after this list).
10. Vector Store Retriever: Exposes the stored embeddings to the question-and-answer chain.
11. OpenAI Chat Model: Answers user questions over the embedded data through a chat interface (steps 10 and 11 are sketched in the third example after this list).
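To make the Notion side concrete, here is a hedged sketch of steps 2, 5, and 6 using the official notion-client Python SDK. The database id is a placeholder, and block handling is simplified to rich-text-bearing block types; a real page also contains tables, images, and nested blocks.

```python
# Sketch of "Get Updated Pages", "Get Page Blocks", and "Concatenate to
# Single String" with the notion-client SDK. DATABASE_ID is a hypothetical
# placeholder; only blocks carrying rich text are extracted.
import os
from datetime import datetime, timedelta, timezone
from notion_client import Client

notion = Client(auth=os.environ["NOTION_TOKEN"])
DATABASE_ID = "your-database-id"  # placeholder, not from the template

def get_updated_pages() -> list[dict]:
    """Query pages edited within the last minute."""
    since = (datetime.now(timezone.utc) - timedelta(minutes=1)).isoformat()
    response = notion.databases.query(
        database_id=DATABASE_ID,
        filter={
            "timestamp": "last_edited_time",
            "last_edited_time": {"on_or_after": since},
        },
    )
    return response["results"]

def page_to_text(page_id: str) -> str:
    """Fetch all blocks of a page and concatenate their plain text."""
    chunks = []
    cursor = None
    while True:
        kwargs = {"block_id": page_id}
        if cursor:
            kwargs["start_cursor"] = cursor
        resp = notion.blocks.children.list(**kwargs)
        for block in resp["results"]:
            rich_text = block.get(block["type"], {}).get("rich_text", [])
            chunks.append("".join(rt["plain_text"] for rt in rich_text))
        if not resp.get("has_more"):
            break
        cursor = resp["next_cursor"]
    return "\n".join(chunks)
```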
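Steps 7 through 9 map roughly onto the following LangChain sketch. The 500-token chunk size comes from the workflow itself; the `documents` table and `match_documents` query names follow LangChain's Supabase defaults and are assumptions about your database setup.

```python
# Sketch of "Token Splitter", "Embeddings OpenAI", and "Store in Supabase"
# using LangChain. Chunk size mirrors the workflow's 500 tokens.
import os
from supabase import create_client
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed_and_store(page_id: str, text: str) -> None:
    """Split the page text into 500-token chunks, embed, and insert."""
    splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=0)
    chunks = splitter.split_text(text)
    SupabaseVectorStore.from_texts(
        texts=chunks,
        embedding=OpenAIEmbeddings(),
        # Tag each chunk with its source page so its old rows can be
        # deleted when the page changes again.
        metadatas=[{"notion_page_id": page_id}] * len(chunks),
        client=supabase,
        table_name="documents",
        query_name="match_documents",
    )
```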
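Finally, steps 10 and 11 might look like the sketch below: the stored vectors are wrapped in a retriever and handed to a retrieval QA chain. The chain style and the chat model name are illustrative assumptions, not the template's exact nodes.

```python
# Sketch of "Vector Store Retriever" and "OpenAI Chat Model": wrap the
# Supabase store in a retriever and answer questions over it.
import os
from supabase import create_client
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore
from langchain.chains import RetrievalQA

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

vector_store = SupabaseVectorStore(
    client=supabase,
    embedding=OpenAIEmbeddings(),
    table_name="documents",
    query_name="match_documents",
)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # model choice is an assumption
    retriever=vector_store.as_retriever(),
)
print(qa.invoke({"query": "What changed in the project plan today?"}))
```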

Customization Guide

How users can customize and adapt this workflow


- Change Trigger Frequency: Adjust the schedule trigger to a different interval based on the frequency of updates in Notion.
- Modify Chunk Size: Alter the chunkSize in the Token Splitter node to suit different embedding models or content types (see the sketch after this list).
- Embedding Model: Switch the OpenAI model used in the Embeddings node to a different version or type based on requirements.
- Database Integration: If using a different database instead of Supabase, update the corresponding nodes to connect to the new database.
- Additional Processing: Insert more nodes for additional processing steps after embeddings are generated, such as data validation or transformation.
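For example, changing the chunk size and the embedding model in the earlier sketches is a one-line edit each; the model name below is illustrative, not the template's default.

```python
# Hypothetical customization: smaller chunks and a different embedding model.
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings

splitter = TokenTextSplitter(chunk_size=256, chunk_overlap=32)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```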