LinkedIn Web Scraping with Bright Data MCP Server & Google Gemini

This automated workflow scrapes detailed LinkedIn profiles and company information using Bright Data and Google Gemini. It extracts the relevant data, formats it as JSON, and saves it to disk, letting users gather insights quickly and streamline their data collection process.

7/8/2025
20 nodes
Complex
Tags: manual, complex, sticky note, mcp client, langchain, aggregate, read/write file, advanced, api, integration, code, custom, files, storage
Categories:
Complex Workflow, Manual Triggered
Integrations:
Sticky Note, MCP Client, LangChain, Aggregate, Read/Write File

Target Audience

This workflow is designed for:
- Data Analysts looking to extract and analyze LinkedIn profile and company data efficiently.
- Market Researchers who need to gather insights about individuals and organizations from LinkedIn.
- Business Development Teams seeking to understand potential leads and company backgrounds.
- Developers who want to automate data collection processes using web scraping techniques.
- Entrepreneurs interested in gaining competitive intelligence from LinkedIn profiles.

Problem Solved

This workflow addresses the challenge of manually collecting and processing data from LinkedIn profiles and company pages. By automating the scraping process, it:
- Substantially reduces the time spent on manual data collection.
- Minimizes human error associated with manual data entry.
- Provides structured data in JSON format, making it easier to analyze and integrate into other systems.
- Gives users a repeatable way to obtain up-to-date information directly from LinkedIn (users remain responsible for complying with LinkedIn's terms of service).

Workflow Steps

1. Manual Trigger: The workflow begins when the user clicks 'Test workflow'.
2. Setting URLs: The workflow sets the URLs for the LinkedIn person and company profiles to scrape.
3. Scraping LinkedIn Data: Two separate nodes invoke the Bright Data MCP Client to scrape the LinkedIn person and company profiles, respectively, using the specified tool parameters.
4. Data Extraction: The scraped data is processed to extract relevant information such as 'about' and 'company story'.
5. Data Formatting: The extracted information is formatted into structured JSON by the LinkedIn Data Extractor node.
6. AI Model Integration: The Google Gemini Chat Model generates a narrative or blog post based on the extracted data.
7. Data Storage: The workflow creates binary data for both the LinkedIn person and company information and writes it to disk at the specified file paths.
8. Webhook Notifications: Finally, the workflow sends the processed data to a specified webhook URL for further integration or notification.
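Steps 4–5 can be sketched as a plain JavaScript function of the kind an n8n Code node would run. This is a minimal illustration only: the field names (`name`, `about`, `story`) and the output shape are assumptions, not the workflow's actual schema.

```javascript
// Hypothetical sketch of the extraction/formatting steps. Field names
// are assumed; adapt them to whatever the Bright Data scraper returns.
function formatLinkedInData(scrapedPerson, scrapedCompany) {
  return {
    person: {
      name: scrapedPerson.name ?? null,
      about: scrapedPerson.about ?? null,
    },
    company: {
      name: scrapedCompany.name ?? null,
      story: scrapedCompany.about ?? null,
    },
    scrapedAt: new Date().toISOString(),
  };
}

// Example with placeholder data:
const result = formatLinkedInData(
  { name: 'Jane Doe', about: 'Data engineer.' },
  { name: 'Acme Corp', about: 'Founded in 2001.' }
);
console.log(JSON.stringify(result, null, 2));
```

In n8n, the same logic would read its input via the Code node's item accessors and return items rather than logging to the console.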

Customization Guide

Users can customize this workflow by:
- Changing URLs: Update the URLs in the 'Set the URLs' and 'Set the LinkedIn Company URL' nodes to scrape different LinkedIn profiles.
- Modifying Tool Parameters: Adjust the parameters in the Bright Data MCP Client nodes to refine the data extraction for specific needs.
- Editing JSON Structure: Alter the JSON structure in the 'LinkedIn Data Extractor' node to fit the desired output format.
- Integrating Additional Nodes: Add or replace nodes to include more data-processing steps or integrate with other APIs and services.
- Adjusting Webhook URLs: Change the webhook URLs to send the output data to different endpoints for further processing or storage.
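As one way to approach the 'Editing JSON Structure' customization, the output schema could be driven by a configurable field map instead of hard-coded keys. The map below is hypothetical; the real node's fields may differ.

```javascript
// Hypothetical sketch: build an extractor from a field map so the
// output JSON structure can be changed without editing the logic.
// Keys of the map are output fields; values are the raw field names.
function buildExtractor(fieldMap) {
  return (raw) => {
    const out = {};
    for (const [outKey, rawKey] of Object.entries(fieldMap)) {
      out[outKey] = raw[rawKey] ?? null; // null for missing raw fields
    }
    return out;
  };
}

// Example: rename scraped fields to a custom schema.
const extract = buildExtractor({ headline: 'title', summary: 'about' });
console.log(extract({ title: 'Engineer', about: 'Builds pipelines.' }));
// → { headline: 'Engineer', summary: 'Builds pipelines.' }
```

This keeps the customization in one place: changing the desired output format only means editing the map passed to `buildExtractor`.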