Gmail to Vector Embeddings with PGVector and Ollama

This workflow extracts emails from Gmail, converts them into structured records, and generates vector embeddings so the content can be searched by similarity. It imports mail in bulk, one weekly date window at a time, so a large mailbox can be processed without a single oversized request. It runs from a manual trigger and uses LangChain nodes for text splitting and embedding.

7/8/2025
20 nodes
Complex
Categories:
Communication & Messaging, Complex Workflow, Manual Triggered, Data Processing & Analysis
Integrations:
LangChain, GmailTrigger, SplitInBatches, SplitOut, NoOp, Sticky Note, PostgreSQL, Gmail

Target Audience

  • Data Scientists: Those looking to analyze and extract insights from email data.
  • Marketing Professionals: Individuals who want to leverage email data for targeted campaigns.
  • Developers: Coders who seek to automate email processing and storage.
  • Business Analysts: Analysts interested in understanding communication trends and patterns within email data.
  • Researchers: Academics who require structured data from emails for studies.

Problem Solved

This workflow automates the process of extracting email data from Gmail, transforming it into structured records and vector embeddings for advanced analysis. It solves the problem of manual data entry and organization, enabling users to efficiently manage large volumes of emails and perform similarity searches on the content.

Workflow Steps

  • Manual Trigger: The workflow starts when the user triggers it manually.
  • Create the Table: A PostgreSQL table named emails_metadata is created if it doesn't already exist, providing structured storage for email data.
  • Explode Interval into Weeks: The workflow splits the period since a specified Gmail account creation date into a list of weekly intervals.
  • Set Before and After Dates: Each generated week supplies the after and before dates used to filter emails.
  • Get a Batch of Messages: Emails received between those two dates are fetched from Gmail.
  • Extract Email Fields: The fields email_text, email_from, date, and email_id are extracted from the fetched emails.
  • Store Structured: The extracted email data is inserted or updated in the emails_metadata table in PostgreSQL.
  • Loop Over Items: Emails are processed in batches, which keeps large datasets manageable.
  • Recursive Character Text Splitter: Email text is split into smaller chunks before embedding.
  • Embeddings Ollama: Each chunk is transformed into a vector embedding using the nomic-embed-text model.
  • Store Vectorized: The generated vector embeddings are stored in a separate emails_embeddings table for similarity searches.
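
The weekly windowing in the Explode Interval into Weeks and Set Before and After Dates steps can be sketched as follows. This is a minimal illustration, not the workflow's actual node code, which is not shown here. The function names explode_into_weeks and gmail_query and the example dates are hypothetical. The query string uses Gmail's after: and before: search operators.

```python
from datetime import date, timedelta


def explode_into_weeks(start: date, end: date) -> list[tuple[date, date]]:
    """Return consecutive (after, before) windows of at most 7 days from start to end."""
    windows = []
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past the end date.
        upper = min(cursor + timedelta(days=7), end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def gmail_query(after: date, before: date) -> str:
    """Build a Gmail search string for one window (hypothetical helper)."""
    return f"after:{after:%Y/%m/%d} before:{before:%Y/%m/%d}"


# Hypothetical stand-in for the whenDidICreateMyGmailAccount value.
account_created = date(2024, 1, 1)
# Hypothetical end of the import range; a real run would likely use today's date.
import_until = date(2024, 1, 20)

windows = explode_into_weeks(account_created, import_until)
queries = [gmail_query(after, before) for after, before in windows]
```

Each window's before date is the next window's after date, so the windows cover the whole range without gaps.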

Customization Guide

  • Changing Gmail Account Creation Date: Modify the date in the whenDidICreateMyGmailAccount variable within the Explode Interval into Weeks step to reflect the actual creation date of your Gmail account.
  • Adjusting Email Filters: Update the filters in the Get a Batch of Messages step to customize which emails are fetched based on specific criteria, such as labels or date ranges.
  • Modifying Chunk Size: In the Recursive Character Text Splitter, adjust the chunkSize and chunkOverlap parameters to control how the email text is split for embedding.
  • Database Configuration: Change the PostgreSQL table names and columns in the Store Structured and Store Vectorized steps if different naming conventions are preferred.
  • Embedding Model Selection: The model used in the Embeddings Ollama step can be replaced with another model if different embedding techniques are desired.
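
To see how chunkSize and chunkOverlap interact, the sketch below splits text into fixed-size character windows that share a few characters with their neighbours. It is a deliberately simplified stand-in: the Recursive Character Text Splitter also tries to break on separators such as paragraph and line breaks, which this version does not attempt. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so content cut at a
    chunk boundary still appears whole in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


# Tiny values chosen only to make the overlap visible.
chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ['abcd', 'defg', 'ghij']
```

A larger overlap preserves more context across chunk boundaries but produces more chunks, and therefore more embeddings to compute and store.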