This workflow uses Dumpling AI to scrape book data from a specified URL. It then parses the returned HTML, sorts the books by price, converts the data into a CSV file, and emails the file. It triggers when a new URL is added to Google Sheets, so each new URL produces an organized, emailed list of books without manual work.
This workflow is ideal for:
- Bookstore Owners: Looking to track and analyze book prices from various online sources.
- Data Analysts: Who need a streamlined process for extracting and sorting book data for reports.
- Developers: Interested in automating web scraping tasks without manual intervention.
- Marketers: Who want to gather competitive pricing information for marketing strategies.
- Students and Researchers: Who require book data for academic projects or market research.
This workflow addresses the following challenges:
- Time Efficiency: Automates the extraction of book data from websites, reducing the time spent on manual data entry.
- Data Accuracy: Ensures accurate collection of book titles and prices through automated scraping, minimizing human error.
- Data Organization: Sorts book information by price, making it easier to analyze and compare values.
- Easy Sharing: Converts data into CSV format for easy sharing and further analysis via email.
The workflow consists of the following steps:
1. Trigger: Monitors a Google Sheets document for new URLs.
2. Scrape Website Content: Sends a request to Dumpling AI to fetch the HTML content of the provided URL.
3. Extract All Books: Parses the HTML to find all book entries using CSS selectors.
4. Split HTML Array: Breaks down the array of book entries into individual items for further processing.
5. Extract Individual Book Data: Pulls the title and price for each book from the HTML.
6. Sort by Price: Organizes the book data in descending order based on price.
7. Convert to CSV File: Converts the sorted book data into a CSV format for easy access.
8. Send CSV via Email: Automatically sends the generated CSV file to a specified email address.
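Steps 3 through 7 can be sketched outside the workflow editor to show what the nodes do to the data. The following is a minimal illustration in Python using only the standard library, not the workflow's actual implementation. The markup it expects (an `article` element with class `product_pod`, a link carrying a `title` attribute, and a `p` element with class `price_color`) is an assumption made for the example, and the names `BookParser`, `price_value`, `books_to_csv`, and `SAMPLE_HTML` are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample of scraped HTML; real pages will differ.
SAMPLE_HTML = """
<article class="product_pod"><h3><a href="#" title="Book A">Book A</a></h3>
<p class="price_color">$12.50</p></article>
<article class="product_pod"><h3><a href="#" title="Book B">Book B</a></h3>
<p class="price_color">$47.82</p></article>
"""


class BookParser(HTMLParser):
    """Collects title/price pairs from the assumed book markup."""

    def __init__(self):
        super().__init__()
        self.books = []         # finished {"title", "price"} records
        self._current = None    # record being built, or None outside a book
        self._in_price = False  # True while inside the price element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {"title": "", "price": ""}
        elif self._current is not None:
            if tag == "a" and attrs.get("title"):
                self._current["title"] = attrs["title"]
            elif tag == "p" and "price_color" in classes:
                self._in_price = True

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current["price"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_price = False
        elif tag == "article" and self._current is not None:
            self.books.append(self._current)
            self._current = None


def price_value(price_text):
    """Convert a price string such as '$51.77' to a float for sorting."""
    digits = "".join(ch for ch in price_text if ch.isdigit() or ch == ".")
    return float(digits) if digits else 0.0


def books_to_csv(html):
    """Extract books, sort by price (highest first), and return CSV text."""
    parser = BookParser()
    parser.feed(html)
    ordered = sorted(parser.books, key=lambda b: price_value(b["price"]), reverse=True)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


if __name__ == "__main__":
    print(books_to_csv(SAMPLE_HTML))
```

Prices are converted to numbers before sorting because comparing them as text would place "$9.00" above "$47.82".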
Users can customize this workflow through the following methods:
- Modify URL Source: Point the Google Sheets trigger at a different spreadsheet or sheet to monitor it for new URL entries.
- Adjust CSS Selectors: Update the CSS selectors in the extraction nodes to accommodate different website structures.
- Email Settings: Customize the email recipient, subject, and message content in the Gmail node for personalization.
- Sorting Preferences: Alter the sorting criteria in the sorting node to organize data by a different field or direction, such as title, or author if you also extract that field.
- Add Additional Nodes: Extend functionality by incorporating more nodes for tasks like data validation or additional data processing.
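As an illustration of the sorting customization, the sketch below shows how changing the sort field and direction affects the order of the extracted records. It is written in Python for clarity and is not the sorting node's own configuration; the function name `sort_books` and the sample records are hypothetical.

```python
def sort_books(books, field="price", descending=True):
    """Sort book records by the given field; prices are compared numerically."""

    def key(book):
        value = book.get(field, "")
        if field == "price":
            # Strip currency symbols and separators before converting.
            cleaned = "".join(ch for ch in str(value) if ch.isdigit() or ch == ".")
            return float(cleaned) if cleaned else 0.0
        # Text fields are compared case-insensitively.
        return str(value).lower()

    return sorted(books, key=key, reverse=descending)


if __name__ == "__main__":
    sample = [
        {"title": "Zebra Tales", "price": "$5.00"},
        {"title": "apple orchard", "price": "$20.00"},
    ]
    print(sort_books(sample))                                   # price, highest first
    print(sort_books(sample, field="title", descending=False))  # title, A to Z
```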