ManualTrigger Automate

ManualTrigger Automate scrapes web pages, converts their HTML content into markdown, and extracts their links. It handles up to 40 URLs per run in batches of 10 to respect API rate limits, and it reads from and writes to your own data sources. The workflow streamlines content retrieval for analysis while keeping memory usage in check.

7/8/2025
17 nodes
Complex
manual, complex, wait, sticky note, noop, splitinbatches, splitout, advanced, api, integration
Categories:
Complex Workflow, Manual Triggered
Integrations:
Wait, Sticky Note, NoOp, SplitInBatches, SplitOut

Target Audience

  • Web Developers: Those looking to automate the process of scraping and converting web pages into markdown format for easier data handling.
  • Data Analysts: Individuals who need to extract and analyze content from multiple URLs without manual intervention.
  • Content Managers: Professionals who manage large volumes of web content and require a streamlined method for conversion and extraction.
  • API Integrators: Users who work with APIs and need to implement automated workflows to enhance data processing efficiency.

Problem Solved

This workflow addresses the challenge of efficiently scraping web pages to extract content and links while converting HTML to markdown format. It automates the process, ensuring compliance with API rate limits and enabling batch processing of URLs to optimize server resources.

Workflow Steps

  1. Manual Trigger: The workflow begins when the user clicks ‘Test workflow’.
  2. Get URLs: It retrieves URLs from the specified data source. The column named Page must contain the links to be scraped.
  3. Split Out URLs: The workflow separates the URLs into individual entries for processing.
  4. Limit to 40 Items: It processes up to 40 items at a time to respect server memory limits.
  5. Batch Processing: The items are handled in sequential batches of 10 URLs to adhere to the API rate limit of 10 requests per minute.
  6. Retrieve Content: For each URL, the workflow sends a POST request to the Firecrawl API to retrieve the markdown content and links.
  7. Data Formatting: The retrieved data is formatted into structured fields such as title, description, content, and links.
  8. Connect to Data Source: Finally, the processed data can be sent to a specified output data source, such as Airtable, for further use.
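Steps 3 to 6 can be sketched outside n8n as a short Python loop. This is an illustrative sketch only, not the workflow's actual node configuration: the names `MAX_ITEMS`, `BATCH_SIZE`, `WAIT_SECONDS`, `batches`, and `run` are hypothetical, and the 60-second pause between batches is an assumption derived from the stated limit of 10 requests per minute. The `scrape` callable stands in for the Firecrawl request and is passed in by the caller.

```python
import time
from typing import Callable, Iterator, List

MAX_ITEMS = 40     # step 4: cap on URLs handled in one run
BATCH_SIZE = 10    # step 5: URLs per batch
WAIT_SECONDS = 60  # assumed pause so each batch of 10 fits in one minute


def batches(urls: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def run(
    urls: List[str],
    scrape: Callable[[str], object],
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Scrape up to MAX_ITEMS URLs in sequential batches, pausing between batches."""
    limited = list(urls)[:MAX_ITEMS]
    chunks = list(batches(limited, BATCH_SIZE))
    results: List[object] = []
    for index, chunk in enumerate(chunks):
        for url in chunk:
            results.append(scrape(url))
        # No pause is needed after the final batch.
        if index < len(chunks) - 1:
            sleep(WAIT_SECONDS)
    return results
```

Passing `scrape` and `sleep` as parameters keeps the loop testable without network access or real delays. In the workflow itself, the SplitInBatches and Wait nodes listed under Integrations presumably play the corresponding roles.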

Customization Guide

Users can customize this workflow by:
  • Modifying Input Source: Change the data source to pull URLs from a different database, or modify the array in the ‘Example fields from data source’ node.
  • Adjusting Rate Limits: Modify the batch size and the maximum number of items processed to suit your server capabilities.
  • Changing Output Format: Customize the output formatting in the ‘Markdown data and Links’ node to meet specific requirements.
  • Updating Authentication: Ensure the correct API key is set in the ‘Retrieve Page Markdown and Links’ node to authenticate requests.
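To make the authentication and output-format options concrete, here is a minimal Python sketch of the request and the field mapping. Several details are assumptions rather than facts from this workflow: the endpoint URL, the `formats` payload, the Bearer-token header, and the response layout (`data.markdown`, `data.links`, `data.metadata`) reflect one reading of the Firecrawl scrape API and should be checked against its current documentation. The names `build_request` and `format_page` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Firecrawl documentation.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for markdown and links."""
    payload = {"url": page_url, "formats": ["markdown", "links"]}
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def format_page(response: dict) -> dict:
    """Map an assumed response layout onto the four output fields."""
    data = response.get("data", {})
    metadata = data.get("metadata", {})
    return {
        "title": metadata.get("title", ""),
        "description": metadata.get("description", ""),
        "content": data.get("markdown", ""),
        "links": data.get("links", []),
    }
```

Changing the output format then amounts to editing the dictionary returned by `format_page`, and updating authentication amounts to supplying a different `api_key`, ideally read from a secret store or environment variable rather than hard-coded.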