Insert and retrieve documents

This n8n workflow automates the insertion and retrieval of Paul Graham's essays: it scrapes the essays, processes their text, and stores the resulting embeddings in a Milvus vector store. Users can then query the essay content and receive answers with citations. Ingestion is triggered manually, while retrieval runs whenever a chat message is received.

7/8/2025
25 nodes
Complex
Categories:
Complex Workflow, Manual Triggered
Integrations:
SplitOut, Sticky Note, LangChain

Target Audience

  • Content Creators: Those who want to gather and analyze essays for inspiration or research.
  • Students: Individuals looking for quality essays to reference in their academic work.
  • Developers: Programmers interested in integrating essay scraping into their applications.
  • Researchers: Academics needing a streamlined process to access and store essay content for analysis.
  • Data Scientists: Professionals who require a structured way to gather and utilize textual data for machine learning models.

Problem Solved

This workflow automates scraping essays from a website, extracting their text, and storing it in a vector database. It removes the manual collection work, so users can quickly query and analyze a collection of essays.

Workflow Steps

  • Step 1: Trigger the workflow manually by clicking "Execute Workflow".
  • Step 2: Fetch the list of essays from Paul Graham's articles page.
  • Step 3: Extract the names of the essays by parsing the page's HTML.
  • Step 4: Split the essay names into individual items for processing.
  • Step 5: Limit processing to the first 3 essays to keep the run short.
  • Step 6: Fetch the full text of each essay from its URL.
  • Step 7: Extract only the textual content from the fetched essays.
  • Step 8: Split the text into manageable chunks for further processing.
  • Step 9: Generate embeddings for the text chunks using OpenAI's embedding model.
  • Step 10: Store the embeddings in a Milvus vector store for efficient retrieval.
  • Step 11: When a chat message is received, query the vector store to find relevant essay content.
  • Step 12: Answer the user's query based on the retrieved chunks, including citations where applicable.
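
Steps 3 to 8 can be sketched outside n8n in plain Python. This is a minimal illustration using only the standard library, not the workflow's actual node configuration: the sample HTML, the file names in it, the helper names (`LinkCollector`, `TextCollector`, `chunk_text`), and the chunk size and overlap values are all assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (steps 3 and 4)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class TextCollector(HTMLParser):
    """Keep only non-empty text nodes, dropping the tags (step 7)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.parts)


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping character windows (step 8)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Hypothetical sample standing in for the fetched essay list page (step 2).
index_html = """
<table>
  <a href="essay-one.html">Essay One</a>
  <a href="essay-two.html">Essay Two</a>
  <a href="essay-three.html">Essay Three</a>
  <a href="essay-four.html">Essay Four</a>
</table>
"""

essay_links = extract_links(index_html)[:3]  # step 5: keep the first 3

# Hypothetical essay page; the workflow would fetch one per link (step 6).
essay_html = "<html><body><p>This is a <b>sample</b> essay body used for illustration.</p></body></html>"
essay_text = extract_text(essay_html)
chunks = chunk_text(essay_text)
```

The overlap means the end of one chunk is repeated at the start of the next, which helps keep a sentence that straddles a boundary retrievable from either side. Real chunk sizes would be far larger than the 40 characters used here.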

Customization Guide

  • Change the Source URL: Modify the URL in the "Fetch Essay List" node to scrape essays from a different website.
  • Adjust the Number of Essays: Change the limit in the "Limit to first 3" node to retrieve more or fewer essays.
  • Customize Text Extraction: Update the CSS selectors in the "Extract Text Only" node to fit the structure of the new essay pages.
  • Modify Embedding Model: Swap out the OpenAI model in the "Embeddings OpenAI" node for a different model if preferred.
  • Alter Vector Store Settings: Change the parameters in the "Milvus Vector Store" nodes to suit your data retrieval needs, such as adjusting the collection name or retrieval settings.
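
To make the retrieval settings more concrete, here is a minimal in-memory stand-in for the vector store lookup in steps 10 and 11. It does not use the Milvus or OpenAI APIs: the three-dimensional vectors are hand-written placeholders for real embeddings, and `cosine_similarity`, `retrieve`, and `top_k` are names assumed for this sketch. It only shows how a setting such as the number of returned chunks changes what the answering step gets to see.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, top_k=2):
    """Return the top_k (score, chunk) pairs, best match first."""
    scored = [(cosine_similarity(query_vector, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy store of (chunk label, placeholder embedding) pairs. Real embeddings
# would come from the embedding model and have far more dimensions.
store = [
    ("chunk A", [1.0, 0.0, 0.0]),
    ("chunk B", [0.0, 1.0, 0.0]),
    ("chunk C", [0.7, 0.7, 0.0]),
]

query = [0.9, 0.1, 0.0]  # placeholder embedding of the user's question
results = retrieve(query, store, top_k=2)
top_chunks = [chunk for _, chunk in results]
```

Raising `top_k` gives the answering step more context to cite from at the cost of a longer prompt; lowering it keeps answers focused on the closest matches.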