AI Voice Chat using Webhook, Memory Manager, OpenAI, Google Gemini & ElevenLabs

AI Voice Chat using Webhook automates voice interactions by transcribing speech to text, maintaining conversation context, and generating audio responses. This workflow integrates OpenAI, Google Gemini, and ElevenLabs to provide seamless, intelligent voice communication, enhancing user engagement and accessibility.

7/8/2025
15 nodes
Complex
Categories:
Complex Workflow, Webhook Triggered
Integrations:
LangChain, Sticky Note, Aggregate, RespondToWebhook

Target Audience

This workflow is ideal for:
- Developers looking to integrate voice chat functionalities into their applications.
- Businesses that want to enhance customer support with automated voice responses.
- Educators interested in creating interactive learning platforms using voice interactions.
- Content Creators who aim to automate their audio content generation from text inputs.

Problem Solved

This workflow addresses the challenge of creating an automated voice chat system that can:
- Convert spoken language into text using OpenAI's Speech to Text API.
- Maintain context throughout conversations to provide relevant responses.
- Generate audio responses utilizing ElevenLabs, offering a variety of voices for a more engaging user experience.
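As a rough sketch, the speech-to-text step can be reproduced outside n8n with a direct call to OpenAI's audio transcription endpoint. The helper below only assembles the request pieces; the API key and audio path are placeholders, and the `requests` library is assumed:

```python
import requests

OPENAI_STT_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_request(audio_path: str, api_key: str) -> dict:
    """Assemble the pieces of an OpenAI Whisper transcription request."""
    return {
        "url": OPENAI_STT_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "whisper-1"},
        "file_field": ("file", audio_path),
    }

def transcribe(audio_path: str, api_key: str) -> str:
    """Send the audio file and return the transcript text."""
    req = build_transcription_request(audio_path, api_key)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            req["url"],
            headers=req["headers"],
            data=req["data"],
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["text"]
```

In the workflow itself, the OpenAI node handles this call; the sketch is only meant to show what the node does under the hood.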

Workflow Steps

1. Webhook Trigger: The workflow starts with a webhook that listens for incoming voice messages.
2. Speech to Text Conversion: The voice message is sent to OpenAI's Speech to Text node, which transcribes the audio into text.
3. Context Retrieval: The transcribed text is processed to retrieve the previous chat context using the Get Chat node.
4. Aggregation of Context: The context from previous messages is aggregated to maintain conversation history.
5. Language Model Processing: The Basic LLM Chain node uses the aggregated context and the current message to generate a response with the Google Gemini Chat Model.
6. Inserting Chat: The conversation is updated with the new user and AI messages using the Insert Chat node.
7. Generating Audio Response: The generated text response is sent to ElevenLabs to convert it into audio.
8. Responding to Webhook: Finally, the audio response is sent back through the webhook to the user.
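The context-aggregation and prompt-assembly steps (4 and 5) can be sketched as plain functions. The message shape below mirrors a typical chat-history format and is an assumption, not the workflow's exact memory schema:

```python
def aggregate_context(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep only the most recent turns so the prompt stays within limits."""
    return history[-max_turns:]

def build_prompt(history: list[dict], user_message: str) -> list[dict]:
    """Combine aggregated history with the current message for the LLM."""
    messages = [{
        "role": "system",
        "content": "You are a helpful voice assistant. Keep replies short.",
    }]
    messages.extend(aggregate_context(history))
    messages.append({"role": "user", "content": user_message})
    return messages
```

Trimming to a fixed number of turns is one simple retention policy; the memory manager nodes in the workflow expose equivalent knobs.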
Customization Guide

To customize this workflow:
- Change Voice Options: Modify the ElevenLabs API call to select different voice IDs for varied audio outputs.
- Adjust Context Management: Tweak the parameters in the memory management nodes to control how much context is retained and how it is processed.
- Integrate Additional APIs: Add nodes to connect with other services for enhanced functionality, such as sentiment analysis or translation.
- Modify Responses: Adjust the prompts and messages in the Basic LLM Chain to alter how the AI responds for your specific use case.
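For the voice-options customization, the relevant knobs live in the ElevenLabs text-to-speech request. The sketch below builds that request body; the voice ID is a placeholder and the settings values are illustrative defaults, not the workflow's configured ones:

```python
def build_tts_request(text: str, voice_id: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> dict:
    """Build an ElevenLabs text-to-speech request for a given voice."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "json": {
            "text": text,
            "model_id": "eleven_multilingual_v2",
            "voice_settings": {
                # stability: higher = more consistent delivery;
                # similarity_boost: higher = closer to the original voice
                "stability": stability,
                "similarity_boost": similarity_boost,
            },
        },
    }
```

Swapping `voice_id` (and, if desired, the `voice_settings` values) is all that is needed to change how the workflow's spoken replies sound.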