Read sitemap and filter URLs

This n8n workflow reads a sitemap.xml file, extracts its URLs, and filters them so that only PDF documents are returned. Automating the retrieval and filtering of relevant links saves time in data collection, and users can change both the sitemap URL and the filtering criteria to suit their needs.

7/8/2025
10 nodes
Medium
Categories: Manual Triggered, Medium Workflow
Integrations: Split Out, Filter, Sticky Note

Target Audience

• Webmasters: People who manage websites and need to analyze their sitemap data.
• SEO Specialists: Professionals who need to filter and extract specific URLs for optimization work.
• Developers: Anyone who wants to automate fetching and processing sitemap URLs.
• Content Managers: Users who curate content of a specific file type, such as PDFs, from a sitemap.
Problem Solved

This workflow automates the extraction and filtering of URLs from a sitemap.xml file, specifically targeting PDF documents. It removes the need to sift through long lists of URLs by hand, so users can focus on the relevant content.

Workflow Steps

• Trigger the Workflow: A manual trigger starts the workflow, so users can run it whenever needed.
• Set Sitemap URL: The user specifies the URL of the sitemap.xml file to analyze.
• Get Sitemap: The workflow fetches the sitemap from that URL.
• Convert Sitemap to JSON: The sitemap's XML is converted to JSON for easier processing.
• Split Out URLs: The JSON is split so that each URL entry becomes a separate item.
• Filter URLs: A filter keeps only the URLs that end with .pdf.
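
The steps above can be approximated outside n8n in plain Python, which makes the data flow easy to inspect. This is a minimal sketch, not how n8n implements its nodes: the function names (`fetch_sitemap`, `sitemap_to_entries`, `filter_pdfs`) and the sample URLs are hypothetical, and it assumes a plain `<urlset>` sitemap that uses the sitemaps.org 0.9 namespace. Each dict it produces roughly corresponds to one item emitted by the Split Out URLs step.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union

# Namespace defined by the sitemaps.org protocol (assumed; most sitemaps use it).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(url: str, timeout: float = 10.0) -> bytes:
    """Rough stand-in for 'Get Sitemap': download the raw XML bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def sitemap_to_entries(xml_data: Union[str, bytes]) -> list:
    """Rough stand-in for 'Convert Sitemap to JSON' + 'Split Out URLs':
    return one dict per <url> element."""
    root = ET.fromstring(xml_data)
    entries = []
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append({"loc": loc, "lastmod": lastmod})
    return entries


def filter_pdfs(entries: list) -> list:
    """Rough stand-in for 'Filter URLs': keep entries whose URL ends with .pdf."""
    return [e for e in entries if e["loc"].endswith(".pdf")]


# Inline sample so the sketch runs without network access (hypothetical URLs).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/guide.pdf</loc><lastmod>2025-01-01</lastmod></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    # With a live sitemap you would use: xml_data = fetch_sitemap("https://example.com/sitemap.xml")
    pdfs = filter_pdfs(sitemap_to_entries(SAMPLE))
    print([e["loc"] for e in pdfs])  # → ['https://example.com/docs/guide.pdf']
```

Note that Python's `str.endswith(".pdf")` is case-sensitive and would miss a URL such as `Report.PDF` or `guide.pdf?download=1`, which is worth keeping in mind when you configure the equivalent condition in the Filter URLs node.
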
Customization Guide

• Changing the Sitemap URL: Update the Set Sitemap URL node to point to a different sitemap.xml file.
• Modifying Filter Conditions: Change the conditions in the Filter URLs node to match other file types or specific URL patterns.
• Adding Processing Steps: Insert additional nodes between the existing ones to perform more complex data manipulation or further integrations.
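
When you change the Filter URLs conditions, it helps to decide up front how to treat upper-case extensions and query strings. The predicate below is a hypothetical sketch (`make_url_filter` is not an n8n API) of one possible rule set: it matches several file types case-insensitively against the URL path only, with an optional regular expression for URL patterns.

```python
import re
from urllib.parse import urlsplit


def make_url_filter(extensions=(".pdf",), pattern=None):
    """Hypothetical helper: build a predicate that keeps URLs whose path ends
    with one of `extensions` (case-insensitive) and, if `pattern` is given,
    whose full URL matches that regular expression."""
    exts = tuple(ext.lower() for ext in extensions)
    regex = re.compile(pattern) if pattern else None

    def keep(url: str) -> bool:
        # Check the path only, so "?v=2" or "#page=3" does not hide the extension.
        path = urlsplit(url).path.lower()
        if exts and not path.endswith(exts):
            return False
        if regex is not None and regex.search(url) is None:
            return False
        return True

    return keep


urls = [
    "https://example.com/a/Report.PDF",
    "https://example.com/b/sheet.xlsx?v=2",
    "https://example.com/c/page.html",
    "https://example.com/docs/manual.docx",
]

# PDFs and spreadsheets anywhere on the site.
print([u for u in urls if make_url_filter((".pdf", ".xlsx"))(u)])
# → ['https://example.com/a/Report.PDF', 'https://example.com/b/sheet.xlsx?v=2']

# Only PDF or Word documents under /docs/.
print([u for u in urls if make_url_filter((".pdf", ".docx"), pattern=r"/docs/")(u)])
# → ['https://example.com/docs/manual.docx']
```

Passing an empty `extensions` tuple disables the file-type check and leaves only the pattern test, which is one way to filter purely by URL path.
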