ManualTrigger Automate - N8N Workflow Directory

Target Audience

- Web Developers: Those looking to automate the process of scraping and converting web pages into markdown format for easier data handling.
- Data Analysts: Individuals who need to extract and analyze content from multiple URLs without manual intervention.
- Content Managers: Professionals who manage large volumes of web content and require a streamlined method for conversion and extraction.
- API Integrators: Users who work with APIs and need to implement automated workflows to enhance data processing efficiency.

Problem Solved

This workflow addresses the challenge of efficiently scraping web pages to extract content and links while converting HTML to markdown format. It automates the process, ensuring compliance with API rate limits and enabling batch processing of URLs to optimize server resources.

Workflow Steps

1. Manual Trigger: The workflow begins with a manual trigger when the user clicks ‘Test workflow’.
2. Get URLs: It retrieves URLs from the specified data source. The links to be scraped must be in a column named Page.
3. Split Out URLs: The workflow separates the URLs into individual entries for processing.
4. Limit to 40 Items: It processes up to 40 items at a time to respect server memory limits.
5. Batch Processing: Each batch of 10 URLs is handled sequentially to adhere to API rate limits of 10 requests per minute.
6. Retrieve Content: For each URL, the workflow sends a POST request to the Firecrawl API to retrieve the markdown content and links.
7. Data Formatting: The retrieved data is formatted into structured fields such as title, description, content, and links.
8. Connect to Data Source: Finally, the processed data can be sent to a specified output data source, like Airtable, for further use.
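
As a rough illustration of steps 4 and 5, the limit-then-batch logic can be sketched in Python outside of n8n. This is a minimal sketch, not the workflow's actual implementation: the function names, the `scrape` callback, and the one-minute pause between batches are assumptions chosen to match the stated limit of 10 requests per minute.

```python
import time
from typing import Callable, Iterator

MAX_ITEMS = 40            # step 4: cap on URLs handled per run
BATCH_SIZE = 10           # step 5: URLs per batch
BATCH_INTERVAL_S = 60.0   # assumption: pause one minute between batches


def limit(urls: list[str], max_items: int = MAX_ITEMS) -> list[str]:
    """Keep only the first max_items URLs."""
    return urls[:max_items]


def batches(urls: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def run(
    urls: list[str],
    scrape: Callable[[str], dict],
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Scrape the limited URL list batch by batch, pausing between batches."""
    results: list[dict] = []
    all_batches = list(batches(limit(urls)))
    for index, batch in enumerate(all_batches):
        for url in batch:
            results.append(scrape(url))
        # No pause is needed after the final batch.
        if index < len(all_batches) - 1:
            sleep(BATCH_INTERVAL_S)
    return results
```

Passing `sleep` as a parameter makes it possible to dry-run the batching without actually waiting. With 45 input URLs, only the first 40 are scraped, in four batches of 10.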

Customization Guide

Users can customize this workflow by:
- Modifying Input Source: Change the data source to pull URLs from a different database, or edit the array in the ‘Example fields from data source’ node.
- Adjusting Rate Limits: Users can modify the batch size and maximum items processed to suit their server capabilities.
- Changing Output Format: Customize the output formatting in the ‘Markdown data and Links’ node to meet specific requirements.
- Updating Authentication: Ensure the correct API key is set in the ‘Retrieve Page Markdown and Links’ node to authenticate requests.
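
To make the authentication and output-format options more concrete, the request and the field mapping might look roughly like the sketch below. It rests on several assumptions that should be checked against the Firecrawl documentation for your API version: the endpoint URL, the `formats` payload, the Bearer-token header, and the response shape (`data.markdown`, `data.links`, `data.metadata`). The function names `build_request` and `format_item` are hypothetical.

```python
import json
import urllib.request

# Assumption: illustrative endpoint; confirm against the Firecrawl docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for markdown and links."""
    payload = {"url": page_url, "formats": ["markdown", "links"]}
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key set here
            "Content-Type": "application/json",
        },
        method="POST",
    )


def format_item(response: dict) -> dict:
    """Map an assumed response shape onto the four output fields."""
    data = response.get("data", {})
    metadata = data.get("metadata", {})
    return {
        "title": metadata.get("title", ""),
        "description": metadata.get("description", ""),
        "content": data.get("markdown", ""),
        "links": data.get("links", []),
    }
```

Renaming or dropping keys in `format_item` corresponds to changing the output format, and missing fields fall back to empty values rather than raising an error.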