ManualTrigger Automate scrapes web pages, converts their HTML content to markdown, and extracts the links they contain. It processes URLs in batches of 10 or 40 to respect API rate limits, and it can read URLs from your own data source. Batching keeps content retrieval for analysis within predictable performance and memory bounds.
This workflow addresses the challenge of efficiently scraping web pages to extract content and links while converting HTML to markdown format. It automates the process, ensuring compliance with API rate limits and enabling batch processing of URLs to optimize server resources.
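The batching idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation: the function names, the `scrape` callback, the pause between batches, and the way the 10 and 40 figures combine (a cap of 40 items per run, processed 10 at a time) are all assumptions made for the example.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_batches(
    urls: Sequence[str],
    scrape: Callable[[str], dict],
    batch_size: int = 10,       # assumed: items sent per batch
    max_items: int = 40,        # assumed: cap on items handled per run
    pause_seconds: float = 0.0,  # assumed: wait between batches for rate limits
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Scrape at most `max_items` URLs, pausing between batches."""
    batches = list(chunked(urls[:max_items], batch_size))
    results: List[dict] = []
    for index, batch in enumerate(batches):
        results.extend(scrape(url) for url in batch)
        # Pause only when another batch follows.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return results
```

Passing `scrape` and `sleep` as arguments keeps the sketch independent of any particular scraping API and makes the pacing easy to adjust or test.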
The Page field contains the links to be scraped.

Users can customize this workflow by:
- Modifying Input Source: Change the data source to pull URLs from a different database, or modify the array in the "Example fields from data source" node.
- Adjusting Rate Limits: Users can modify the batch size and maximum items processed to suit their server capabilities.
- Changing Output Format: Customize the output formatting in the "Markdown data and Links" node to meet specific requirements.
- Updating Authentication: Ensure the correct API key is set in the "Retrieve Page Markdown and Links" node to authenticate requests.
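As an illustration of the output-format customization mentioned in the list above, the following Python sketch shapes one scraped page into a single record. The field names (`url`, `markdown`, `links`, `link_count`) and the filtering rules are hypothetical choices for this example; the workflow's actual output depends on how its nodes are configured.

```python
from typing import Dict, Iterable, List, Union
from urllib.parse import urlparse


def format_output(
    url: str, markdown: str, links: Iterable[str]
) -> Dict[str, Union[str, int, List[str]]]:
    """Build one output record: trimmed markdown plus unique http(s) links."""
    seen = set()
    unique_links: List[str] = []
    for link in links:
        link = link.strip()
        # Skip blanks and duplicates, preserving first-seen order.
        if not link or link in seen:
            continue
        # Keep only web links; drop mailto:, javascript:, and similar schemes.
        if urlparse(link).scheme not in ("http", "https"):
            continue
        seen.add(link)
        unique_links.append(link)
    return {
        "url": url,
        "markdown": markdown.strip(),
        "links": unique_links,
        "link_count": len(unique_links),
    }
```

A record built this way can be passed to whatever storage or analysis step follows, and the filtering rules can be relaxed or tightened to match the required format.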