Easily Compare LLMs Using OpenAI and Google Sheets

Easily compare outputs from two language models using OpenAI and Google Sheets. This workflow lets you evaluate model responses side by side in a chat interface while logging results for manual or automated assessment. Ideal for teams, it simplifies the process of selecting the best AI model for your needs and lets non-technical stakeholders easily review performance.

7/8/2025
21 nodes
Complex
Tags: manual, complex, langchain, splitinbatches, sticky note, summarize, aggregate, googlesheets, splitout, advanced
Categories:
Complex Workflow, Manual Triggered, Data Processing & Analysis
Integrations:
LangChain, SplitInBatches, Sticky Note, Summarize, Aggregate, GoogleSheets, SplitOut

Target Audience

- Data Scientists: Need to evaluate and compare different LLM outputs for specific use cases.
- AI Developers: Working on AI agents that require assessing multiple language models for performance.
- Product Managers: Want to make informed decisions about which LLM to implement based on real-world evaluations.
- Non-Technical Stakeholders: Can easily review and assess model outputs through Google Sheets without requiring deep technical knowledge.
Problem Solved

This workflow addresses the challenge of efficiently evaluating and comparing outputs from different language models (LLMs). It allows users to:
- Assess the performance of models side by side.
- Log responses in a structured format for easy analysis.
- Make data-driven decisions on which model to use in production based on comparative results.

Workflow Steps

- Step 1: Trigger the workflow by receiving a chat message.
- Step 2: Define the models to compare, such as openai/gpt-4.1 and mistralai/mistral-large.
- Step 3: Loop through each model, sending the same user input to both.
- Step 4: Each model generates a response, which is stored along with the input and context.
- Step 5: Responses are concatenated for comparison and logged to Google Sheets.
- Step 6: Users can evaluate the model outputs directly in the sheet, with options for manual or automated assessment.
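The core of Steps 2–5 can be sketched in plain Python. This is an illustrative sketch only, not the workflow's actual node code: the `call_model` stub stands in for the real LLM call that the LangChain nodes perform, and `compare_models` is a hypothetical helper name.

```python
def call_model(model: str, user_input: str) -> str:
    # Stub standing in for the real LLM call made by the workflow's
    # LangChain nodes; in practice this would be an API request.
    return f"[{model}] response to: {user_input}"

def compare_models(models: list[str], user_input: str):
    # Steps 3-4: send the same input to every model, storing each
    # response together with the model name and the original input.
    rows = []
    for model in models:
        rows.append({
            "model": model,
            "input": user_input,
            "response": call_model(model, user_input),
        })
    # Step 5: concatenate responses for side-by-side comparison
    # before logging them to the sheet.
    combined = "\n\n".join(f"{r['model']}:\n{r['response']}" for r in rows)
    return rows, combined

# Step 2: the two models named in the workflow.
models = ["openai/gpt-4.1", "mistralai/mistral-large"]
rows, combined = compare_models(models, "Summarize our Q3 results.")
```

Each entry in `rows` corresponds to one logged sheet row, and `combined` is the concatenated text a reviewer sees side by side.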
Customization Guide

- Model Selection: Modify the Define Models to Compare node to include additional models as needed.
- Google Sheets Template: Customize the Google Sheets structure to include additional evaluation criteria or change existing ones.
- Memory Management: Adjust the memory nodes to use different backends like Redis or Postgres if required for scalability.
- AI Agent Configuration: Define specific system prompts and tools in the AI Agent node to tailor responses for your use case.
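One way to think about the Google Sheets template customization is as a column layout you extend with evaluation criteria. The sketch below assumes a hypothetical column set (the actual template's columns may differ); adding a criterion is just adding a column name, and unfilled evaluation fields stay blank until a reviewer scores them.

```python
import csv
import io

# Hypothetical sheet columns: the last two are evaluation criteria
# that reviewers fill in manually (or an automated assessor populates).
COLUMNS = ["timestamp", "model", "input", "response", "score", "notes"]

def to_sheet_row(record: dict) -> list:
    # Missing evaluation fields are left blank for later review.
    return [record.get(col, "") for col in COLUMNS]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(COLUMNS)
writer.writerow(to_sheet_row({
    "timestamp": "2025-07-08T12:00:00Z",
    "model": "openai/gpt-4.1",
    "input": "Summarize our Q3 results.",
    "response": "Draft summary text.",
}))
```

Changing the evaluation scheme (say, adding an "accuracy" column) only requires extending `COLUMNS`; existing rows keep working because absent keys map to empty cells.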