Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls

This n8n workflow generates an AI-ready `llms.txt` file from a Screaming Frog website crawl. It extracts key data from your Screaming Frog CSV export, filters URLs by status code and indexability, and formats the output for easy use with language models. Quickly create a downloadable file that improves content discovery for AI applications and streamlines the preparation of valuable web content for analysis.

7/8/2025
23 nodes
Complex
Tags: manual, complex, sticky note, langchain, noop, filter, summarize, formtrigger, extractfromfile, converttofile, advanced logic, conditional, files, storage
Categories: Complex Workflow, Manual Triggered
Integrations: Sticky Note, LangChain, NoOp, Filter, Summarize, FormTrigger, ExtractFromFile, ConvertToFile

Target Audience

This workflow is ideal for:
- SEO Professionals: Those who need to generate structured content files from website crawls to improve search engine optimization strategies.
- Content Marketers: Individuals looking to curate high-quality content for AI models or content discovery.
- Web Developers: Developers who want to automate the extraction and organization of website data for further analysis or integration.
- Data Analysts: Analysts needing to process and filter large amounts of data from web crawls efficiently.
- Small Business Owners: Owners of small websites who want to leverage AI for content generation without extensive technical knowledge.

Problem Solved

This workflow addresses the challenge of generating an llms.txt file from Screaming Frog exports, which can be cumbersome and time-consuming. It automates the process of filtering URLs based on specific criteria, ensuring that only valuable content is included. This helps users save time and focus on higher-level tasks while ensuring that the generated file is optimized for AI models.
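
The exact row structure is defined later in the workflow, but by convention an `llms.txt` file is a small markdown document: an H1 title, a blockquote description, and one link line per page. A representative output (with hypothetical site name, URLs, and descriptions) might look like:

```markdown
# Example Site

> A short description of the site, taken from the form input.

- [Home](https://example.com/): Welcome to Example Site
- [Pricing](https://example.com/pricing/): Plans and pricing details
- [Docs](https://example.com/docs/): Product documentation and guides
```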

Workflow Steps

1. Trigger the Workflow: The user fills out a form providing the website name, description, and uploads the internal_html.csv file from Screaming Frog.
2. Extract Data: The workflow extracts data from the uploaded CSV file, ensuring it is in a usable format for subsequent steps.
3. Set Useful Fields: Key fields such as URL, title, description, status, indexability, content type, and word count are defined for processing.
4. Filter URLs: The workflow filters URLs to retain only those with a 200 status code, marked as indexable, and with a content type of text/html.
5. Classify Content (Optional): A text classifier can be activated to intelligently filter and classify URLs based on their content quality.
6. Format Rows for llms.txt: Each URL is formatted into a specific row structure for the llms.txt file.
7. Concatenate Rows: All formatted rows are concatenated to create a single output string.
8. Set Content for llms.txt: The content of the llms.txt file is prepared, including the website title, description, and concatenated rows.
9. Generate the File: Finally, the workflow creates the llms.txt file, which can be downloaded directly from n8n or uploaded to a preferred storage solution.
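
The filter, format, and concatenate steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual node logic, and the column names (`Address`, `Title 1`, `Meta Description 1`, etc.) are assumptions; actual headers depend on your Screaming Frog export version:

```python
import csv
import io

# Hypothetical sample of an internal_html.csv export; adjust the
# header names to match your Screaming Frog version.
SAMPLE_CSV = """Address,Title 1,Meta Description 1,Status Code,Indexability,Content Type,Word Count
https://example.com/,Home,Welcome to Example,200,Indexable,text/html; charset=UTF-8,350
https://example.com/old,Old Page,Moved,301,Non-Indexable,text/html; charset=UTF-8,10
https://example.com/logo.png,,,200,Indexable,image/png,0
"""

def build_llms_txt(csv_text: str, site_name: str, site_description: str) -> str:
    rows = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in rows:
        # Filter URLs step: keep only 200, indexable, text/html pages.
        if (row["Status Code"] == "200"
                and row["Indexability"] == "Indexable"
                and row["Content Type"].startswith("text/html")):
            # Format Rows step: one markdown link line per URL.
            lines.append(f'- [{row["Title 1"]}]({row["Address"]}): '
                         f'{row["Meta Description 1"]}')
    # Concatenate Rows + Set Content steps: title, description, rows.
    return f"# {site_name}\n\n> {site_description}\n\n" + "\n".join(lines) + "\n"

print(build_llms_txt(SAMPLE_CSV, "Example", "A demo site"))
```

Only the homepage survives the filters in this sample: the redirect (301) and the image (image/png) are dropped before the rows are concatenated.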
Customization Guide

Users can customize this workflow by:
- Modifying the Form: Change the form fields to gather additional information or adjust existing prompts.
- Adjusting Filters: Add or modify filters in the Filter URLs node to refine the selection criteria based on specific needs (e.g., filtering by word count or URL path).
- Activating the Text Classifier: Enable the Text Classifier node to implement AI-driven filtering based on content quality, and customize the descriptions to fit specific content needs.
- Changing Output Formats: Modify the Set Field - llms.txt Row node to alter how the rows are structured in the output file.
- Integrating with Other Services: Replace the final download node with a service like Google Drive or OneDrive to automatically upload the generated file to a cloud storage solution.
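
As an example of the filter adjustments mentioned above, a word-count condition could be added alongside the existing rules. The threshold and column names here are hypothetical, mirroring what an extra rule in the Filter URLs node would check:

```python
# Hypothetical extra condition: drop thin pages below a chosen
# word-count threshold, in addition to the workflow's default rules.
MIN_WORDS = 200

def passes_filters(row: dict) -> bool:
    return (row["Status Code"] == "200"
            and row["Indexability"] == "Indexable"
            and row["Content Type"].startswith("text/html")
            and int(row["Word Count"] or 0) >= MIN_WORDS)

page = {"Status Code": "200", "Indexability": "Indexable",
        "Content Type": "text/html; charset=UTF-8", "Word Count": "350"}
print(passes_filters(page))  # True
```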