Vector DB Loader from Google Drive - N8N Workflow Directory

Target Audience

This workflow is designed for:
- Data Scientists: Need to automate loading, processing, and storing document embeddings.
- Developers: Want an efficient way to pull files from Google Drive and load them into a database.
- Researchers: Require a systematic way to manage and analyze large sets of documents in PDF, text, and JSON formats.
- Business Analysts: Want to use document data for insights and reporting.
- Automation Enthusiasts: Want to streamline their workflows and minimize manual tasks.

Problem Solved

This workflow addresses the challenge of:
- Manual File Handling: Reducing the time spent on downloading, processing, and storing files from Google Drive.
- Data Integration: Loading PDF, text, and JSON documents into a PostgreSQL database as embeddings for further analysis.
- File Organization: Automatically moving processed files to designated folders, ensuring better organization and accessibility.
- Complex Workflow Management: Simplifying the management of various document types and their embeddings through a structured automation process.

Workflow Steps

1. Schedule Trigger: The workflow is initiated automatically every day at 3 AM.
2. Search Folder: It searches a specific Google Drive folder for files to process.
3. Loop Over Items: Each file found is processed one by one.
4. Download File: Each file is downloaded from Google Drive.
5. Switch Node: The workflow determines the file type (PDF, text, or JSON) based on its MIME type.
6. Extract from File: Depending on the file type, the appropriate extraction method is applied:
- For PDFs, it uses the Extract from PDF node.
- For text files, it uses the Extract from Text node.
- For JSON files, it uses the Extract from JSON node.
7. Embeddings OpenAI: The extracted text is converted into embeddings using an OpenAI embedding model.
8. Postgres PGVector Store: The embeddings are stored in a PostgreSQL database through the pgvector extension.
9. Move File: Finally, the processed file is moved to a designated folder in Google Drive, ensuring organization.
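
The routing in steps 5 and 6 can be sketched outside n8n as plain Python. This is only an illustration of the logic, not the workflow's actual node configuration: the `BRANCHES` mapping, the function names, and the exact MIME-type strings checked are assumptions, and PDF parsing is left out because it needs a third-party library.

```python
import json

# Assumed MIME-type-to-branch mapping; the actual Switch node conditions
# in the workflow may differ.
BRANCHES = {
    "application/pdf": "pdf",
    "text/plain": "text",
    "application/json": "json",
}


def route_by_mime(mime_type: str) -> str:
    """Return the extraction branch for a file, mirroring the Switch node."""
    try:
        return BRANCHES[mime_type]
    except KeyError:
        raise ValueError(f"unsupported MIME type: {mime_type}") from None


def extract_text(branch: str, data: bytes) -> str:
    """Turn downloaded bytes into text for the text and JSON branches."""
    if branch == "text":
        return data.decode("utf-8")
    if branch == "json":
        # Re-serialize so the text to embed has a stable layout.
        return json.dumps(json.loads(data), sort_keys=True)
    # PDF parsing needs a third-party library and is left out of this sketch.
    raise NotImplementedError(f"no extractor for branch: {branch}")
```

For example, `extract_text(route_by_mime("text/plain"), b"hello")` returns the decoded string, while an unrecognized MIME type raises `ValueError` instead of silently falling through.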

Customization Guide

Users can customize the workflow by:
- Changing the Schedule: Adjust the Schedule Trigger node to modify the time and frequency of execution according to their needs.
- Modifying Search Parameters: Update the Search Folder node to point to different folders or change the search criteria.
- Adding New File Types: Extend the Switch node to handle additional file types by adding new conditions and extraction methods.
- Updating Database Credentials: Modify the Postgres PGVector Store node's credentials to connect to a different database.
- Customizing Data Processing: Change the parameters in the Embeddings OpenAI node to use different models or settings based on specific requirements.
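
As a rough picture of what "Adding New File Types" involves, the sketch below models the Switch node's conditions as a lookup table and registers one extra type. Everything here is hypothetical: the `EXTRACTORS` registry, the `register` decorator, and the choice of CSV as the new format are illustrative stand-ins, and in n8n the equivalent change is made by adding a condition and an extraction node in the editor.

```python
import csv
import io
from typing import Callable, Dict

Extractor = Callable[[bytes], str]

# Hypothetical registry standing in for the Switch node's conditions.
EXTRACTORS: Dict[str, Extractor] = {
    "text/plain": lambda data: data.decode("utf-8"),
}


def register(mime_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that adds an extractor for one more MIME type."""
    def wrap(func: Extractor) -> Extractor:
        EXTRACTORS[mime_type] = func
        return func
    return wrap


@register("text/csv")
def extract_csv(data: bytes) -> str:
    """Flatten CSV rows into one line of text per row."""
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    return "\n".join(" | ".join(row) for row in rows)
```

Keeping the new condition and its extraction method together in one place mirrors the customization described above: each added file type needs both a match rule and a way to turn the file into text before it reaches the embedding step.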