Insert and retrieve documents - N8N Workflow Directory

Target Audience

- Content Creators: Those who want to gather and analyze essays for inspiration or research.
- Students: Individuals looking for quality essays to reference in their academic work.
- Developers: Programmers interested in integrating essay scraping into their applications.
- Researchers: Academics needing a streamlined process to access and store essay content for analysis.
- Data Scientists: Professionals who require a structured way to gather and utilize textual data for machine learning models.

Problem Solved

This workflow automates scraping essays from a website, extracting their text, and storing the resulting embeddings in a Milvus vector database. A chat trigger then queries that store and answers questions from the retrieved passages, so users can search and analyze a collection of essays without gathering it by hand.

Workflow Steps

- Step 1: Trigger the workflow manually by clicking "Execute Workflow".
- Step 2: Fetch a list of essays from Paul Graham's articles page.
- Step 3: Extract the names of the essays using HTML parsing techniques.
- Step 4: Split the essay names into individual items for processing.
- Step 5: Limit the process to the first 3 essays for efficiency.
- Step 6: Fetch the full text of each essay from its URL.
- Step 7: Extract only the textual content from the fetched essays.
- Step 8: Prepare the text for further processing by splitting it into manageable chunks.
- Step 9: Generate embeddings for the essay texts using OpenAI's embedding model.
- Step 10: Store the embeddings in a Milvus vector store for efficient retrieval.
- Step 11: When a chat message is received, query the vector store to find relevant essay content.
- Step 12: Answer the user's query based on the retrieved chunks, including citations where applicable.
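
The data flow in Steps 3 to 5, 8, 9 and 11 can be sketched outside n8n in plain Python. This is a rough illustration under stated assumptions, not the workflow's actual node code: the function and class names are hypothetical, the word-count `embed` function is only a stand-in for OpenAI's embedding model, and a plain list plays the role of the Milvus collection.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags (Step 3)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, limit=3):
    """Steps 3-5: parse the links, then keep only the first `limit`."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links[:limit]


def split_into_chunks(text, chunk_size=200, overlap=40):
    """Step 8: fixed-size character chunks that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is never fully contained
    # in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def embed(text):
    """Step 9 stand-in: a word-count vector, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, top_k=2):
    """Step 11: rank stored chunks by similarity to the query."""
    query_vector = embed(query)
    ranked = sorted(
        store,
        key=lambda item: cosine(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

Each entry in `store` is assumed to be a dict with `source`, `text` and `vector` keys; keeping the source alongside each chunk is what makes the citations in Step 12 possible.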

Customization Guide

- Change the Source URL: Modify the URL in the "Fetch Essay List" node to scrape essays from a different website.
- Adjust the Number of Essays: Change the limit in the "Limit to first 3" node to retrieve more or fewer essays.
- Customize Text Extraction: Update the CSS selectors in the "Extract Text Only" node to fit the structure of the new essay pages.
- Modify Embedding Model: Swap out the OpenAI model in the "Embeddings OpenAI" node for a different model if preferred.
- Alter Vector Store Settings: Change the parameters in the "Milvus Vector Store" nodes to suit your data retrieval needs, such as adjusting the collection name or retrieval settings.
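
One way to keep track of these customization points is to treat them as a single settings record. The sketch below is only an illustration: `WorkflowSettings` and its field names are hypothetical and do not correspond to n8n node parameters, and every default except the essay limit of 3 (the source URL, selector, model name, collection name and `top_k`) is an assumed placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkflowSettings:
    """Hypothetical summary of the values a user might change."""

    source_url: str = "https://example.com/essays"   # placeholder URL
    essay_limit: int = 3                              # "Limit to first 3" node
    text_selector: str = "body"                       # assumed CSS selector
    embedding_model: str = "text-embedding-3-small"   # assumed model name
    collection_name: str = "essays"                   # assumed collection name
    top_k: int = 4                                    # assumed retrieval setting

    def __post_init__(self):
        # Reject values that would make the workflow fetch or return nothing.
        if self.essay_limit < 1:
            raise ValueError("essay_limit must be at least 1")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


# Start from the defaults and override only what differs for a new site.
defaults = WorkflowSettings()
custom = replace(defaults, essay_limit=10, collection_name="blog_posts")
```

Because the record is frozen, `replace` returns a new copy and leaves the defaults untouched, which makes it easy to compare a customized run against the original configuration.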