[2/3] Set up medoids (2 types) for anomaly detection (crops dataset)

Target Audience

This workflow is for data scientists, agricultural researchers, and machine learning engineers who work on anomaly detection in crop datasets. It is most useful when you need to find outliers or unusual patterns in crop data that may indicate problems such as disease or environmental stress. It also suits teams that already use Qdrant for vector similarity search together with embedding models.

Problem Solved

This workflow prepares a crop dataset for anomaly detection by establishing a cluster center (medoid) and a threshold score for each crop cluster. It finds medoids in two distinct ways, one based on a distance matrix and one based on multimodal embeddings of text descriptions, so that outliers can later be identified against either type of medoid. Detecting such outliers early supports timely, data-driven interventions for crop health.
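To make the role of medoids and threshold scores concrete, here is a minimal sketch of how they might be used at detection time. It assumes cosine similarity and assumes that a point counts as an anomaly when its similarity to every medoid falls below that medoid's threshold. The decision rule, the function name `is_anomaly`, and the sample values are illustrative assumptions; this workflow itself only sets up the medoids and thresholds.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_anomaly(vector, medoids):
    """Assumed rule: anomalous if below the threshold of every (medoid, threshold) pair."""
    return all(
        cosine_similarity(vector, medoid) < threshold
        for medoid, threshold in medoids
    )


# Two hypothetical crop clusters, each described by a medoid vector and a threshold.
medoids = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.8)]

near_first_cluster = is_anomaly([0.9, 0.1], medoids)  # close to the first medoid
between_clusters = is_anomaly([0.7, 0.7], medoids)    # too far from both medoids
```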

Workflow Steps

1. Trigger the Workflow: The process begins with a manual trigger, allowing users to initiate the workflow as needed.
2. Fetch Total Points: The workflow retrieves the total number of points in the Qdrant collection to understand the dataset size.
3. Calculate Cluster Distance Matrix: It generates a distance matrix for each crop cluster, which helps in identifying the most representative points (medoids).
4. Create Sparse Matrix: Using the distance data, a sparse matrix is constructed to determine the most similar points within each cluster.
5. Set Medoid ID: The workflow identifies and sets the medoid ID for each cluster based on the previous calculations.
6. Fetch Medoid Vector: It retrieves the vector associated with the medoid to prepare for threshold calculations.
7. Prepare for Searching Threshold: The workflow prepares the variables needed to search for the threshold score, including the opposite (negated) center vector.
8. Searching Score: It queries the Qdrant collection to find the points that are most dissimilar to the center vector, which helps in establishing the threshold score.
9. Set Medoid Threshold Score: The calculated threshold score is saved in the Qdrant collection for the identified medoid.
10. Embed Text Descriptions: The workflow embeds textual descriptions of crops using a multimodal embedding model.
11. Find Text Medoid: Similar to the previous steps, it identifies a medoid for textual data.
12. Set Text Medoid ID and Threshold Score: The workflow sets the medoid ID and threshold score for textual data, ensuring consistency across both approaches.
13. Finalization: The workflow concludes by updating the Qdrant collection with the new medoid and threshold information, ready for anomaly detection tasks.
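
The medoid and threshold steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual node code: it assumes cosine similarity, works on in-memory lists instead of a Qdrant collection, and uses hypothetical function names (`find_medoid`, `threshold_score`). The medoid is taken to be the point with the highest total similarity to its cluster, and the threshold is the similarity between the medoid and the member furthest from it, found by searching with the negated medoid vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_medoid(vectors):
    """Return the index of the vector with the highest total similarity to the cluster."""
    totals = [
        sum(cosine_similarity(v, other) for other in vectors)
        for v in vectors
    ]
    return max(range(len(vectors)), key=totals.__getitem__)


def threshold_score(vectors, medoid_index):
    """Similarity between the medoid and the cluster member least similar to it."""
    medoid = vectors[medoid_index]
    opposite = [-x for x in medoid]
    # The best match for the negated medoid is the point furthest from the medoid.
    furthest = max(vectors, key=lambda v: cosine_similarity(opposite, v))
    return cosine_similarity(medoid, furthest)


# Toy cluster of 2-D vectors (illustrative values, not real crop embeddings).
cluster = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.6, 0.8]]
medoid_index = find_medoid(cluster)
threshold = threshold_score(cluster, medoid_index)
```

Searching with the negated vector works because negating a vector flips the sign of its cosine similarity to every point, so the top result for the negated vector is the lowest-scoring point for the original one.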

Customization Guide

Users can customize this workflow by modifying the following elements:
- Qdrant API Credentials: Update the credentials to connect to your Qdrant instance.
- Crop Descriptions: Edit the textual crop descriptions to reflect the specific crops you are analyzing.
- Threshold Settings: Adjust the threshold settings based on your specific requirements for anomaly detection, such as changing the number of furthest points to consider.
- Cluster Variables: Modify the cluster variables to target different crops or datasets as needed.
- Embedding Model: Replace the embedding model with another one if you prefer a different method for embedding textual data.
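
As one illustration of the threshold setting above, the sketch below takes similarity scores between a medoid and its cluster members and returns the score of the k-th furthest point rather than the single furthest one. The parameter name `furthest_points` and the function `threshold_from_scores` are hypothetical, and the actual workflow may use the additional points differently; this only shows one plausible way the setting could change the threshold.

```python
def threshold_from_scores(similarities, furthest_points=1):
    """Return the similarity of the k-th least similar point (k = furthest_points)."""
    if not 1 <= furthest_points <= len(similarities):
        raise ValueError("furthest_points must be between 1 and the number of scores")
    # Sort ascending so the least similar (furthest) points come first.
    ordered = sorted(similarities)
    return ordered[furthest_points - 1]


# Illustrative similarity scores between a medoid and its cluster members.
scores = [0.95, 0.91, 0.62, 0.88, 0.79]
threshold_k1 = threshold_from_scores(scores, furthest_points=1)  # 0.62
threshold_k2 = threshold_from_scores(scores, furthest_points=2)  # 0.79
```

With a higher threshold, more points fall below it, so under this assumed scheme skipping the single furthest member makes the detection stricter.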