ETL pipeline

This ETL pipeline collects tweets with the hashtag #OnThisDay, analyzes their sentiment, and stores the results in PostgreSQL and MongoDB. It sends a Slack notification for each tweet whose sentiment score is above a specified threshold.

7/8/2025
9 nodes
Medium
Tags:
schedule, medium, twitter, postgresql, mongodb, slack, noop, googlecloudnaturallanguage, automation, database, data, nosql, communication, notification, logic, conditional
Categories:
Social Media Management, Data Processing & Analysis, Communication & Messaging, Schedule Triggered, Medium Workflow
Integrations:
Twitter, PostgreSQL, MongoDB, Slack, NoOp, Google Cloud Natural Language

Target Audience

This workflow is ideal for:
- Social Media Managers: To monitor and analyze tweets related to specific topics, such as #OnThisDay, and engage with followers based on sentiment.
- Data Analysts: To gather and store tweet data in PostgreSQL and MongoDB for further analysis and reporting.
- Marketers: To track sentiment around their brand or campaigns on Twitter, allowing for timely responses.
- Developers: To integrate different data sources and automate notifications through Slack.

Problem Solved

This workflow addresses the challenge of efficiently gathering, analyzing, and responding to social media content. It automates the process of:
- Collecting tweets containing #OnThisDay.
- Storing relevant data in both PostgreSQL and MongoDB.
- Analyzing sentiment using Google Cloud Natural Language.
- Sending a notification to a Slack channel when a tweet's sentiment score is above the configured threshold, so that the team can respond promptly.

Workflow Steps

1. Scheduled Trigger: The workflow runs daily at 6 AM to fetch tweets.
2. Twitter Node: It searches for the latest 3 tweets containing the hashtag #OnThisDay.
3. MongoDB Node: The tweets are inserted into the MongoDB collection for storage.
4. Google Cloud Natural Language Node: Each tweet's text is analyzed to determine its sentiment score and magnitude.
5. Set Node: The sentiment score and magnitude, along with the tweet text, are prepared for further processing.
6. Postgres Node: The relevant data is stored in the PostgreSQL database.
7. IF Node: A condition checks if the sentiment score is above a specified threshold.
8. Slack Node: If the condition is met, a notification is sent to the Slack channel with details of the tweet.
9. NoOp Node: If the condition is not met, the workflow simply ends without action.
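The control flow of steps 4 to 9 can be sketched in plain Python. This is a minimal illustration, not the workflow's actual implementation: `run_pipeline`, `Record`, `Sentiment`, and `fake_analyze` are hypothetical names, the analyzer is a keyword-based stand-in for the Google Cloud Natural Language call, the database writes are replaced by an in-memory list, and the threshold of 0.5 is an assumption because the source does not state a value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sentiment:
    score: float      # polarity; Google Cloud Natural Language uses -1.0 to 1.0
    magnitude: float  # non-negative overall strength of emotion


@dataclass
class Record:
    """Fields prepared by the Set node (step 5)."""
    text: str
    score: float
    magnitude: float


def run_pipeline(
    tweets: Iterable[str],
    analyze: Callable[[str], Sentiment],
    threshold: float,
) -> tuple[list[Record], list[Record]]:
    """Return (stored_records, records_to_notify) for one scheduled run."""
    stored: list[Record] = []
    notify: list[Record] = []
    for text in tweets:
        sentiment = analyze(text)                                    # step 4
        record = Record(text, sentiment.score, sentiment.magnitude)  # step 5
        stored.append(record)      # step 6: stand-in for the PostgreSQL insert
        if record.score > threshold:                                 # step 7
            notify.append(record)  # step 8: would be posted to Slack
        # step 9: below-threshold tweets need no further action (NoOp)
    return stored, notify


def fake_analyze(text: str) -> Sentiment:
    """Hypothetical analyzer for demonstration only; not a real model."""
    score = 0.8 if "great" in text.lower() else -0.2
    return Sentiment(score=score, magnitude=abs(score))


if __name__ == "__main__":
    sample = ["A great day in history", "A grim anniversary"]
    stored, notify = run_pipeline(sample, fake_analyze, threshold=0.5)
    for record in notify:
        print(f"notify: {record.text} (score={record.score})")
```

Passing the analyzer in as a function keeps the routing logic testable without credentials for any external service.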

Customization Guide

Users can customize this workflow by:
- Changing the Search Text: Modify the searchText parameter in the Twitter node to track different hashtags or keywords.
- Adjusting the Sentiment Threshold: Update the condition in the IF node to change the sentiment score threshold for notifications.
- Modifying the Notification Message: Customize the text parameter in the Slack node to change how notifications are formatted.
- Altering the Schedule: Adjust the triggerTimes in the Cron node (the scheduled trigger) to run the workflow at different times or frequencies.
- Expanding Data Storage: Add more fields in the Postgres and MongoDB nodes to capture additional tweet attributes as needed.
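To see how these options relate to each other, the tunable values can be modeled as a single settings object. The sketch below is illustrative only and does not reflect the workflow's stored configuration: `WorkflowSettings`, its field names, the message template, and `format_notification` are hypothetical, and the default threshold of 0.0 is a placeholder because the source does not give the actual value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkflowSettings:
    search_text: str = "#OnThisDay"  # Twitter node: searchText
    tweet_limit: int = 3             # Twitter node: number of tweets fetched
    run_hour: int = 6                # Cron node: daily run at 6 AM
    threshold: float = 0.0           # IF node: placeholder value (assumption)
    # Slack node: text parameter; this template is an invented example
    message_template: str = (
        "Sentiment {score:+.2f} (magnitude {magnitude:.2f}): {text}"
    )


def format_notification(
    settings: WorkflowSettings, text: str, score: float, magnitude: float
) -> str:
    """Render the Slack message body from the configured template."""
    return settings.message_template.format(
        text=text, score=score, magnitude=magnitude
    )


if __name__ == "__main__":
    # Track a different hashtag and notify only on clearly positive tweets.
    custom = replace(WorkflowSettings(), search_text="#TodayInHistory", threshold=0.5)
    print(custom)
    print(format_notification(custom, "Hello", 0.8, 1.2))
```

Using `replace` on a frozen dataclass produces a modified copy, so the defaults that mirror the published workflow stay untouched.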