Store Notion's Pages as Vector Documents into Supabase with OpenAI

This workflow automates the storage of newly added Notion pages as vector documents in Supabase, enhancing data accessibility and analysis. It retrieves page content, filters out non-text elements, concatenates the text, generates embeddings with OpenAI, and stores the processed data in a Supabase vector column. This ensures that valuable information is easily retrievable and ready for further use.

7/8/2025
9 nodes
Medium
Categories:
Manual Triggered, Medium Workflow
Integrations:
Sticky Note, LangChain, Notion Trigger, Notion, Filter, Summarize

Target Audience

  • Developers looking to automate data storage from Notion to Supabase.
  • Data Scientists who require a seamless method for embedding and vectorizing text data from Notion.
  • Content Creators wanting to store and analyze their Notion pages efficiently.
  • Business Analysts who need quick access to summarized data from Notion for reporting purposes.
Problem Solved

This workflow automates the process of storing Notion pages as vector documents in Supabase, addressing the challenge of manual data entry and ensuring that important textual information is efficiently stored and easily retrievable for analysis or further processing.

Workflow Steps

  1. Trigger on Notion Page Addition: The workflow starts when a new page is added to a specified Notion database, ensuring real-time data capture.
  2. Retrieve Page Content: It fetches all block content from the newly added Notion page.
  3. Filter Non-Text Content: The workflow excludes non-text blocks (like images and videos) to focus solely on textual data, enhancing the quality of the stored content.
  4. Summarize Content: It concatenates the textual content from the blocks into a single text for embedding, streamlining the data for processing.
  5. Generate Embeddings: The workflow uses OpenAI's API to create embeddings for the concatenated text, enabling advanced analysis and search capabilities.
  6. Create Metadata: It generates associated metadata (like page ID and creation time) to contextualize the stored data, making it easier to reference later.
  7. Split Content into Chunks: The text is divided into smaller, manageable chunks for efficient processing and embedding generation.
  8. Store in Supabase: Finally, the processed documents and their embeddings are stored in a Supabase table, ready for retrieval and analysis.
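The filtering, concatenation, and metadata steps above can be sketched in plain Python. The block shape and field names (`type`, `content`, `id`, `created_time`) are illustrative assumptions modeled loosely on the Notion API, not the exact output of the n8n Notion nodes.

```python
# Hypothetical sketch of steps 3, 4, and 6: keep only text-bearing
# Notion blocks, join their content, and attach page metadata.

TEXT_BLOCK_TYPES = {
    "paragraph", "heading_1", "heading_2", "heading_3",
    "bulleted_list_item", "numbered_list_item", "quote",
}

def extract_text(blocks):
    """Drop non-text blocks (images, videos, ...) and join the rest."""
    parts = [b.get("content", "") for b in blocks
             if b.get("type") in TEXT_BLOCK_TYPES]
    return "\n".join(parts)

def build_metadata(page):
    """Associate page ID and creation time with the stored document."""
    return {"pageId": page["id"], "createdTime": page["created_time"]}

# Example input with one non-text block that gets filtered out.
blocks = [
    {"type": "paragraph", "content": "First paragraph."},
    {"type": "image", "url": "https://example.com/a.png"},
    {"type": "heading_1", "content": "A heading"},
]
print(extract_text(blocks))  # -> "First paragraph.\nA heading"
```

The joined text would then be chunked and sent to the embeddings step, with the metadata stored alongside each vector.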
Customization Guide

Users can customize the workflow by:
- Changing the Notion Database: Update the databaseId in the Notion Trigger to monitor a different database.
- Modifying Filters: Adjust the filter conditions to include or exclude different types of content (e.g., text types) based on specific needs.
- Altering Chunk Size: Change the chunkSize and chunkOverlap parameters in the Token Splitter to optimize text processing for the size of the data being handled.
- Customizing Metadata: Add or modify metadata fields in the Create Metadata and Load Content node to capture additional information relevant to your use case.
- Adjusting Embedding Options: Modify the options in the Embeddings node to fine-tune how embeddings are generated, potentially improving the quality of the stored vectors.
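The interaction between chunkSize and chunkOverlap can be illustrated with a minimal character-based splitter. This is an assumption-laden sketch: the actual Token Splitter node operates on tokens rather than characters, but the overlap arithmetic is analogous.

```python
# Illustrative splitter: consecutive chunks share `chunk_overlap`
# characters, so the window advances by chunk_size - chunk_overlap.

def split_into_chunks(text, chunk_size=200, chunk_overlap=40):
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# With chunk_size=4 and chunk_overlap=2, each chunk repeats the last
# two characters of the previous one.
print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# -> ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more redundant text.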