AI Voice Chat using Webhook, Memory Manager, OpenAI, Google Gemini & ElevenLabs

AI Voice Chat using Webhook automates voice interactions by transcribing speech to text, maintaining conversation context, and generating audio responses. This workflow integrates OpenAI, Google Gemini, and ElevenLabs to provide seamless, intelligent voice communication, enhancing user engagement and accessibility.

7/8/2025
15 nodes
Complex
Categories:
Complex Workflow, Webhook Triggered
Integrations:
LangChain, Sticky Note, Aggregate, RespondToWebhook

Target Audience

This workflow is ideal for:
- Developers looking to integrate voice chat functionalities into their applications.
- Businesses that want to enhance customer support with automated voice responses.
- Educators interested in creating interactive learning platforms using voice interactions.
- Content Creators who aim to automate their audio content generation from text inputs.

Problem Solved

This workflow addresses the challenge of creating an automated voice chat system that can:
- Convert spoken language into text using OpenAI's Speech to Text API.
- Maintain context throughout conversations to provide relevant responses.
- Generate audio responses utilizing ElevenLabs, offering a variety of voices for a more engaging user experience.
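As a rough sketch, the speech-to-text step can be reproduced outside n8n with a direct call to OpenAI's audio transcription endpoint. The helper below only assembles the request pieces; the API key and audio path are placeholders, and the `requests` library is assumed:

```python
import requests

OPENAI_STT_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_request(audio_path: str, api_key: str) -> dict:
    """Assemble the pieces of an OpenAI Whisper transcription request."""
    return {
        "url": OPENAI_STT_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "whisper-1"},
        "file_field": ("file", audio_path),
    }

def transcribe(audio_path: str, api_key: str) -> str:
    """Send the audio file and return the transcript text."""
    req = build_transcription_request(audio_path, api_key)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            req["url"],
            headers=req["headers"],
            data=req["data"],
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["text"]
```

In the workflow itself, the OpenAI node handles this call; the sketch is only meant to show what the node does under the hood.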

Workflow Steps

1. Webhook Trigger: The workflow starts with a webhook that listens for incoming voice messages.
2. Speech to Text Conversion: The voice message is sent to OpenAI's Speech to Text node, which transcribes the audio into text.
3. Context Retrieval: The transcribed text is processed to retrieve the previous chat context using the Get Chat node.
4. Aggregation of Context: The context from previous messages is aggregated to maintain conversation history.
5. Language Model Processing: The Basic LLM Chain node uses the aggregated context and the current message to generate a response with the Google Gemini Chat Model.
6. Inserting Chat: The conversation is updated with the new user and AI messages using the Insert Chat node.
7. Generating Audio Response: The generated text response is sent to ElevenLabs to convert it into audio.
8. Responding to Webhook: Finally, the audio response is sent back through the webhook to the user.
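The context-aggregation and prompt-assembly steps (4 and 5) can be sketched as plain functions. The message shape below mirrors a typical chat-history format and is an assumption, not the workflow's exact memory schema:

```python
def aggregate_context(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep only the most recent turns so the prompt stays within limits."""
    return history[-max_turns:]

def build_prompt(history: list[dict], user_message: str) -> list[dict]:
    """Combine aggregated history with the current message for the LLM."""
    messages = [{
        "role": "system",
        "content": "You are a helpful voice assistant. Keep replies short.",
    }]
    messages.extend(aggregate_context(history))
    messages.append({"role": "user", "content": user_message})
    return messages
```

Trimming to a fixed number of turns is one simple retention policy; the memory manager nodes in the workflow expose equivalent knobs.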
Customization Guide

To customize this workflow:
- Change Voice Options: Modify the ElevenLabs API call to select different voice IDs for varied audio outputs.
- Adjust Context Management: Tweak the parameters in the memory management nodes to control how much context is retained and how it is processed.
- Integrate Additional APIs: Add nodes to connect with other services for enhanced functionality, such as sentiment analysis or translation.
- Modify Responses: Adjust the prompts and messages in the Basic LLM Chain to alter how the AI responds for your specific use case.
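For the voice-options customization, the relevant knobs live in the ElevenLabs text-to-speech request. The sketch below builds that request body; the voice ID is a placeholder and the settings values are illustrative defaults, not the workflow's configured ones:

```python
def build_tts_request(text: str, voice_id: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> dict:
    """Build an ElevenLabs text-to-speech request for a given voice."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "json": {
            "text": text,
            "model_id": "eleven_multilingual_v2",
            "voice_settings": {
                # stability: higher = more consistent delivery;
                # similarity_boost: higher = closer to the original voice
                "stability": stability,
                "similarity_boost": similarity_boost,
            },
        },
    }
```

Swapping `voice_id` (and, if desired, the `voice_settings` values) is all that is needed to change how the workflow's spoken replies sound.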