As we begin developing our latest tool, Campaign Intelligence, we need to review its high-level architecture and the available resources before moving forward. This document records the research we did, the benefits and drawbacks of specific integrations for our product, and the reasoning behind the decisions we made.
Overview
Because we already use AWS S3 for asset storage, Campaign Intelligence should use AWS tools to build the MVP, since they integrate easily with our existing stack. We should use open-source tools only for specific edge cases that AWS tools can’t handle, or when an AWS tool proves too costly at scale.
For vector storage, Pinecone offers faster similarity search than Snowflake, though the difference is unlikely to affect performance at our current scale. For our use case, Snowflake’s primary advantage is Cortex Agents, which can significantly accelerate MVP development. With Cortex Agents, we can create a single query endpoint that routes each request to the appropriate data source based on the nature of the query.
The sections below cover tools for object detection/recognition in images and video, multimodal vector embedding, speech-to-text transcription, ETL orchestration, vector embedding storage, and LLM agent orchestration.
AWS Rekognition – Object Detection/Recognition
Rekognition integrates directly with our S3 data, and its out-of-the-box features will let us scale the MVP quickly. Rekognition can extract text that appears in images and videos, but it does not transcribe speech from a video’s audio track; AWS Transcribe handles that task.
Rekognition can be expensive at large scale, but it’s the appropriate tool to scale our MVP considering our current tech stack.
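As a rough sketch of how Rekognition output could feed our metadata tables, the Python below flattens a DetectLabels response and a DetectText response into simple rows. The response fields (`Labels[].Name`, `Labels[].Confidence`, `TextDetections[].DetectedText`, `Type`, `Confidence`) follow the Rekognition API. The function name `to_metadata_rows`, the row layout, and the 80% confidence cutoff are hypothetical choices for illustration. The boto3 calls appear only in the docstring, so the example runs without AWS credentials.

```python
from typing import Any

# Hypothetical cutoff; tune against real campaign assets.
MIN_CONFIDENCE = 80.0


def to_metadata_rows(
    asset_key: str,
    labels_response: dict[str, Any],
    text_response: dict[str, Any],
    min_confidence: float = MIN_CONFIDENCE,
) -> list[dict[str, Any]]:
    """Flatten Rekognition DetectLabels and DetectText responses into metadata rows.

    In the real pipeline the responses would come from boto3, for example:
        rekognition = boto3.client("rekognition")
        image = {"S3Object": {"Bucket": bucket, "Name": asset_key}}
        labels_response = rekognition.detect_labels(Image=image, MinConfidence=min_confidence)
        text_response = rekognition.detect_text(Image=image)
    """
    rows: list[dict[str, Any]] = []
    for label in labels_response.get("Labels", []):
        if label["Confidence"] >= min_confidence:
            rows.append({"asset_key": asset_key, "kind": "label",
                         "value": label["Name"], "confidence": label["Confidence"]})
    for detection in text_response.get("TextDetections", []):
        # Keep LINE detections only; WORD detections repeat the same text word by word.
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence:
            rows.append({"asset_key": asset_key, "kind": "text",
                         "value": detection["DetectedText"],
                         "confidence": detection["Confidence"]})
    return rows


# Made-up sample responses in the documented shape, for illustration only.
sample_labels = {"Labels": [{"Name": "Sneaker", "Confidence": 97.2},
                            {"Name": "Grass", "Confidence": 61.0}]}
sample_text = {"TextDetections": [
    {"DetectedText": "JUST LAUNCHED", "Type": "LINE", "Confidence": 99.1},
    {"DetectedText": "JUST", "Type": "WORD", "Confidence": 99.1},
]}
rows = to_metadata_rows("ads/spring/hero.jpg", sample_labels, sample_text)
print([(r["kind"], r["value"]) for r in rows])
# [('label', 'Sneaker'), ('text', 'JUST LAUNCHED')]
```

Keeping labels and extracted text as rows of the same shape means they can land in one Snowflake table and be filtered together.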
Pros
- Single API covers object detection, scene recognition, text extraction, face detection, and moderation.
- Direct integration with AWS S3 and IAM security.
- Fully managed service with automatic scaling.
- Continuously updated models trained on large proprietary datasets for strong baseline accuracy.
- Predictable pay-per-use pricing (per image analyzed, per minute of video processed).
- Faster time-to-market for MVP.
Cons
- Vendor lock-in within the AWS ecosystem; dependent on AWS service availability and pricing changes.
- Less flexibility to customize models or training compared to open-source alternatives.
- Ongoing per-use costs can be higher than self-hosted solutions at large scale.
- Limited visibility into model architecture and training data.
Open-Source Alternatives:
OpenCV – General-purpose computer vision library.
Detectron2 (Meta AI) – High-accuracy object detection, segmentation, and keypoint detection.
YOLOv8 (Ultralytics) – Real-time object detection and tracking.
MMDetection (OpenMMLab) – Modular object detection framework supporting many architectures.
Tesseract OCR – Open-source OCR for text extraction from images.
AWS Titan – Multimodal Embeddings
AWS Titan Multimodal Embeddings G1 is a fully managed model, available through Amazon Bedrock, that generates high-quality vector embeddings for both images and text in a shared semantic space. This allows Campaign Intelligence to index creative assets from S3 and make them searchable by image, by text, or cross-modally (e.g., “find images like this” or “find images matching this description”). Because Titan is accessed through Bedrock APIs, there is no GPU hosting or model management on our side, and a single request can embed an image, a text caption, or both.
By pairing Titan with Rekognition, the system can store, for every asset, descriptive metadata (labels, objects, and text from images/videos) in standard Snowflake columns and semantic embeddings in Snowflake VECTOR columns. This enables Cortex Agents to perform rich semantic search and insight generation without moving data between systems. While open-source models like CLIP can produce similar embeddings, Titan’s managed infrastructure, AWS integration, and scalability make it the more efficient choice for an MVP build.
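A minimal sketch of the request and response handling for Titan. It assumes the Bedrock model ID `amazon.titan-embed-image-v1` and the body fields `inputText`, `inputImage` (base64), and `embeddingConfig.outputEmbeddingLength` (256, 384, or 1024) as documented for Titan Multimodal Embeddings G1 at the time of writing, so check them against current Bedrock docs. The helper names are hypothetical, and the Bedrock call itself is shown only in a comment so the code runs offline.

```python
import base64
import json
from typing import Optional

# Model ID for Titan Multimodal Embeddings G1 on Amazon Bedrock.
TITAN_MM_MODEL_ID = "amazon.titan-embed-image-v1"
# Supported output sizes; 1024 is the model default.
ALLOWED_DIMENSIONS = (256, 384, 1024)


def build_titan_request(image_bytes: Optional[bytes] = None,
                        text: Optional[str] = None,
                        dimensions: int = 1024) -> str:
    """Build the JSON body for a Titan Multimodal Embeddings request.

    Either input may be omitted, but not both. Sending both returns a single
    embedding for the combined image and caption.
    """
    if image_bytes is None and text is None:
        raise ValueError("provide an image, text, or both")
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {ALLOWED_DIMENSIONS}")
    body: dict = {"embeddingConfig": {"outputEmbeddingLength": dimensions}}
    if image_bytes is not None:
        body["inputImage"] = base64.b64encode(image_bytes).decode("ascii")
    if text is not None:
        body["inputText"] = text
    return json.dumps(body)


def parse_titan_response(raw_body: bytes) -> list[float]:
    """Extract the embedding vector from a Titan response body."""
    return json.loads(raw_body)["embedding"]


# In the real pipeline (not executed here):
#   runtime = boto3.client("bedrock-runtime")
#   response = runtime.invoke_model(
#       modelId=TITAN_MM_MODEL_ID,
#       body=build_titan_request(image_bytes=img, text="red sneaker on grass"),
#       contentType="application/json",
#       accept="application/json",
#   )
#   vector = parse_titan_response(response["body"].read())
```

The dimension chosen here must match the size declared on the Snowflake VECTOR column that stores the result.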
Pros
- Generates vector embeddings for images, text, and cross-modal search in a shared space.
- Fully managed, no GPU provisioning or model hosting required.
- Native integration with AWS S3 and Bedrock APIs.
- Works seamlessly alongside Rekognition metadata enrichment.
- Outputs can be stored directly in Snowflake VECTOR columns for search and analysis.
- Scales on demand with predictable API usage costs.
Cons
- Vendor lock-in within the AWS ecosystem; dependent on Bedrock service availability and pricing changes.
- Less control over model architecture and fine-tuning compared to self-hosted open-source options.
- API usage costs may exceed self-hosted solutions at very large scale.
- Limited community benchmarking compared to open-source models like CLIP.
Open-Source Alternatives:
OpenAI CLIP – Image–text joint embeddings in the same semantic space.
BLIP / BLIP-2 – Image–text understanding and caption generation.
AWS Transcribe – Speech-to-Text Transcription
AWS Transcribe is a fully managed speech-to-text service that converts the audio in video or standalone audio files into time-stamped transcripts. For Campaign Intelligence, Transcribe can process the audio tracks of ad videos stored in S3, making spoken content searchable and analyzable alongside creative metadata from Rekognition and embeddings from Titan. Transcribe supports multiple languages, speaker diarization, and custom vocabularies to better handle brand-specific or industry terms. Transcription jobs can run at ingestion time and write their JSON output to S3, from which the transcripts can be loaded into Snowflake for search, sentiment analysis, or content classification.
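The sketch below shows both ends of that flow. It builds the keyword arguments for a boto3 `start_transcription_job` call on an ad video in S3, then turns the resulting transcript JSON into time-stamped segments. Parameter names follow the boto3 Transcribe API, and the transcript shape assumed here is `results.items[]` with `type`, `start_time`, `end_time`, and `alternatives[0].content`. The job-naming scheme, the fixed `en-US` language, the custom vocabulary name, and the 5-second windowing are hypothetical choices for illustration.

```python
import json
import re
from typing import Any, Optional


def build_transcription_job(bucket: str, key: str, output_bucket: str,
                            vocabulary: Optional[str] = None,
                            max_speakers: int = 2) -> dict[str, Any]:
    """Keyword arguments for boto3 transcribe.start_transcription_job().

    Usage (not executed here):
        boto3.client("transcribe").start_transcription_job(**build_transcription_job(...))
    """
    # Job names allow only letters, digits, '.', '_' and '-', and must be unique
    # per account, so a real pipeline would likely append a timestamp or hash.
    job_name = re.sub(r"[^0-9A-Za-z._-]", "-", key)[:200]
    settings: dict[str, Any] = {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers}
    if vocabulary:
        settings["VocabularyName"] = vocabulary  # custom vocabulary for brand terms
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": key.rsplit(".", 1)[-1].lower(),  # e.g. "mp4"
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        "Settings": settings,
    }


def to_segments(transcript_json: str, window_seconds: float = 5.0) -> list[dict[str, Any]]:
    """Group transcript words into roughly window_seconds-long, time-stamped segments."""
    items = json.loads(transcript_json)["results"]["items"]
    segments: list[dict[str, Any]] = []
    current: Optional[dict[str, Any]] = None
    for item in items:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timestamps; attach them to the open segment.
            if current is not None:
                current["text"] += word
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if current is None or start - current["start"] >= window_seconds:
            current = {"start": start, "end": end, "text": word}
            segments.append(current)
        else:
            current["end"] = end
            current["text"] += " " + word
    return segments
```

Each segment keeps its start and end times, so a search hit can point to the exact moment in the ad where a phrase is spoken.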
Pros
- Fully managed, scalable speech-to-text conversion with no model hosting required.
- Supports speaker diarization (distinguishing between different speakers) and channel-specific transcription.
- Custom vocabularies improve accuracy for brand names, product terms, or industry jargon.
- Native integration with S3, enabling automated processing pipelines.
- Generates time-stamped transcripts for fine-grained search and analysis.
- Works well alongside Rekognition and Titan outputs for multimodal analysis.
Cons
- Vendor lock-in within the AWS ecosystem and its pricing model.
- Less flexible than self-hosted open-source options like Whisper, which can be fine-tuned on our own data.
- Transcription quality may drop for noisy recordings or heavily accented speech.
- API costs may scale quickly for long-form or high-volume video/audio processing.
Open-Source Alternative:
OpenAI Whisper – State-of-the-art multilingual transcription, robust to noise and accents.
AWS Glue – Managed ETL/ELT Orchestration
AWS Glue is a fully managed serverless data integration service that automates the discovery, preparation, and movement of data between sources and destinations. For Campaign Intelligence, Glue can coordinate ETL pipelines from S3, BigQuery, and other sources into Snowflake, triggering processing steps such as Rekognition, Titan, and Transcribe along the way. Glue’s integration with the AWS ecosystem allows for event-driven or scheduled workflows, schema discovery, and transformation without dedicated servers or complex infrastructure management.
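A minimal sketch of the event-driven entry point. It assumes S3 `ObjectCreated` notifications invoke a small AWS Lambda function that starts one Glue job run per new asset; this is one common option, and Glue triggers driven by Amazon EventBridge are another. The job name `ci-asset-enrichment`, the `--asset_*` and `--steps` argument names, and the extension-to-step mapping are hypothetical. Only the `start_job_run(JobName=..., Arguments=...)` call shape and the S3 event record fields come from AWS.

```python
from typing import Any
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "ci-asset-enrichment"  # hypothetical Glue job name

# Hypothetical mapping from file extension to the enrichment steps the job runs.
STEPS_BY_EXTENSION = {
    "jpg": ["rekognition", "titan"],
    "jpeg": ["rekognition", "titan"],
    "png": ["rekognition", "titan"],
    "mp4": ["rekognition", "transcribe"],
    "mov": ["rekognition", "transcribe"],
}


def glue_runs_for_event(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn an S3 ObjectCreated notification into kwargs for glue.start_job_run()."""
    runs: list[dict[str, Any]] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        extension = key.rsplit(".", 1)[-1].lower() if "." in key else ""
        steps = STEPS_BY_EXTENSION.get(extension)
        if not steps:
            continue  # ignore file types we don't enrich
        runs.append({
            "JobName": GLUE_JOB_NAME,
            # Glue job arguments are strings keyed with a leading "--".
            "Arguments": {
                "--asset_bucket": bucket,
                "--asset_key": key,
                "--steps": ",".join(steps),
            },
        })
    return runs


def lambda_handler(event: dict[str, Any], context: Any) -> list[str]:
    """Hypothetical Lambda entry point that starts one Glue run per new asset."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    glue = boto3.client("glue")
    return [glue.start_job_run(**run)["JobRunId"] for run in glue_runs_for_event(event)]
```

Keeping the event parsing in a pure function makes the routing easy to unit-test without touching AWS.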
Pros
- Fully managed and serverless — no infrastructure provisioning or maintenance.
- Native integration with AWS services (S3, Athena, Redshift, Bedrock) and strong compatibility with Snowflake.
- Built-in data catalog for automated schema discovery and metadata management.
- Supports event-driven pipelines (e.g., process data as soon as it lands in S3).
- Scales automatically for large datasets and high-throughput jobs.
- Python (PySpark) support for flexible data transformations.
Cons
- Vendor lock-in to the AWS ecosystem, with limited portability to non-AWS environments.
- Cold-start latency for serverless jobs can slow down short, frequent tasks.
- Higher costs for continuous or real-time processing compared to persistent infrastructure.
- Limited UI for complex orchestration logic — may require pairing with tools like Apache Airflow for advanced scheduling and dependency management.
- PySpark-based transformations have a steeper learning curve for teams unfamiliar with Spark.
Open-Source Alternative:
Apache Airflow – Widely used workflow orchestration, supports event and schedule triggers.
Snowflake – Vector Storage
Snowflake’s native VECTOR data type allows Campaign Intelligence to store high-dimensional embeddings alongside structured performance data and creative metadata in a single platform. This eliminates the need for a separate vector database during the MVP phase, reducing infrastructure complexity and latency. Storing embeddings directly in Snowflake enables seamless filtering, aggregation, and joining with campaign metrics, while maintaining enterprise-grade governance and security controls.
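To make the hybrid-search idea concrete, the sketch below builds a Snowflake query that filters on campaign metadata and ranks by `VECTOR_COSINE_SIMILARITY`. It also includes a small pure-Python equivalent of the same ranking for testing. The table and column names (`creative_assets`, `asset_id`, `embedding`, `channel`, `ctr`) are hypothetical. Inlining the query vector as an array literal cast to `VECTOR(FLOAT, n)` is an assumption about how we would pass it, not a settled design.

```python
import math
from typing import Any, Sequence


def hybrid_search_sql(query_vector: Sequence[float], limit: int = 10) -> str:
    """Snowflake query combining a metadata filter with cosine-similarity ranking.

    The channel filter stays a %s bind parameter (the Python connector's default
    paramstyle); the query vector is inlined as a numeric array literal cast to VECTOR.
    """
    dims = len(query_vector)
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return (
        "SELECT asset_id, ctr, "
        f"VECTOR_COSINE_SIMILARITY(embedding, {literal}::VECTOR(FLOAT, {dims})) AS score "
        "FROM creative_assets "
        "WHERE channel = %s "
        f"ORDER BY score DESC LIMIT {int(limit)}"
    )


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search_local(rows: list[dict[str, Any]], query_vector: Sequence[float],
                        channel: str, limit: int = 10) -> list[str]:
    """In-memory reference with the same semantics, useful for unit tests."""
    candidates = [r for r in rows if r["channel"] == channel]
    candidates.sort(key=lambda r: cosine_similarity(r["embedding"], query_vector), reverse=True)
    return [r["asset_id"] for r in candidates[:limit]]

# With snowflake-connector-python (not executed here):
#   cursor.execute(hybrid_search_sql(query_vector), ("social",))
```

The metadata filter and the similarity ranking run in one statement, which is the single-query hybrid search listed among the pros below.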
Pros
- Stores embeddings and tabular data together for unified querying.
- Supports similarity search via built-in functions such as VECTOR_COSINE_SIMILARITY, VECTOR_L2_DISTANCE, and VECTOR_INNER_PRODUCT.
- Benefits from Snowflake’s scalability, availability, and security features.
- Simplifies architecture — no separate vector database to maintain.
- Enables hybrid search (metadata filters + vector similarity) in a single query.
Cons
- Less specialized than dedicated vector databases for ultra-low-latency workloads.
- Fewer indexing/tuning options than tools like Pinecone or Milvus.
- Vector functionality is newer and may have a smaller support ecosystem.
Open-Source Alternatives:
PostgreSQL with the pgvector extension.
Weaviate, Milvus, or Qdrant for standalone vector storage.
Snowflake Cortex Agents – LLM Orchestration
Cortex Agents are Snowflake’s LLM-powered orchestration layer, allowing AI workflows to run directly on data stored in Snowflake. For Campaign Intelligence, Cortex Agents can handle natural language queries, semantic search, and multi-step reasoning without moving data into an external application server or orchestration tool. This means users can interact with campaign data conversationally, with the agent translating their intent into SQL, embedding lookups, or multi-step insight generation flows.
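Cortex Agents perform this routing with an LLM, so the snippet below is not the Cortex Agents API and does not reflect how Snowflake implements it. It is a deliberately simple, hypothetical keyword router that only illustrates the contract we want from the single query endpoint: one natural-language question in, one chosen data path out. The route names, descriptions, and keyword cues are assumptions made for illustration. Naive substring matching will misfire on real queries, which is exactly why we would delegate this decision to an agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str         # hypothetical tool the agent would call
    description: str


STRUCTURED = Route("sql", "Aggregate campaign metrics via generated SQL")
TRANSCRIPT = Route("transcript_search", "Search spoken content from Transcribe output")
SEMANTIC = Route("vector_search", "Find similar creatives via vector similarity")

# Hypothetical keyword cues; a real agent would let the LLM decide.
CUES = {
    STRUCTURED: ("ctr", "spend", "roas", "impressions", "average", "total", "by month"),
    TRANSCRIPT: ("says", "said", "mention", "voiceover", "spoken"),
    SEMANTIC: ("similar", "looks like", "like this", "featuring", "showing"),
}


def route(question: str) -> Route:
    """Pick the route whose cues appear most often; default to semantic search."""
    q = question.lower()
    scores = {r: sum(cue in q for cue in cues) for r, cues in CUES.items()}
    best = max(scores, key=scores.get)
    # Fall back to semantic search when no cue matches at all.
    return best if scores[best] > 0 else SEMANTIC


print(route("What was the average CTR by month for Q2?").name)  # sql
```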
Pros
- Runs LLM-driven workflows directly on Snowflake-resident data.
- Reduces latency by avoiding data movement to external systems.
- Can combine structured queries, semantic search, and AI-generated insights in a single flow.
- Integrates natively with Snowflake’s vector storage and Cortex Functions (embedding generation, summarization, classification).
- Lowers infrastructure complexity.
Cons
- Less customizable than open-source agent frameworks.
- Dependent on Snowflake’s model offerings and API limits.
- Early-stage feature set compared to mature orchestration tools.
Open-Source Alternatives:
LangChain Agents
LlamaIndex Agents
Pinecone – Vector Storage (Alternative to Snowflake)
Pinecone is a specialized vector database optimized for large-scale, low-latency similarity search. It can handle billions of embeddings and offers advanced ANN features like configurable indexes and metadata filtering, giving precise control over search performance. Its mature ecosystem and strong integration support make it a solid choice when vector search is the primary workload.
However, Pinecone requires separate infrastructure and ongoing data syncing from systems like Snowflake, and it has no agent capabilities. Its top-tier search performance therefore comes at the cost of extra integration and operational overhead.
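To show what the ongoing syncing involves, here is a minimal sketch that selects rows changed since the last sync and shapes them into the `{"id", "values", "metadata"}` records accepted by the Pinecone Python client’s `Index.upsert(vectors=...)`. The column names (`asset_id`, `embedding`, `campaign_id`, `channel`, `updated_at`), the `updated_at` watermark approach, the index name, and the batch size of 100 are hypothetical. Null metadata values are dropped because Pinecone metadata does not accept nulls.

```python
from datetime import datetime
from typing import Any, Iterable, Iterator

BATCH_SIZE = 100  # hypothetical; small batches keep each upsert request modest


def changed_rows(rows: Iterable[dict[str, Any]], since: datetime) -> Iterator[dict[str, Any]]:
    """Rows updated after the last successful sync (the watermark)."""
    return (r for r in rows if r["updated_at"] > since)


def to_pinecone_batches(rows: Iterable[dict[str, Any]],
                        batch_size: int = BATCH_SIZE) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of Pinecone upsert records, at most batch_size per list."""
    batch: list[dict[str, Any]] = []
    for r in rows:
        # Keep only non-null metadata fields.
        metadata = {k: r[k] for k in ("campaign_id", "channel") if r.get(k) is not None}
        batch.append({"id": r["asset_id"], "values": list(r["embedding"]), "metadata": metadata})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Against a real index (not executed here):
#   from pinecone import Pinecone
#   index = Pinecone(api_key=...).Index("campaign-assets")
#   for batch in to_pinecone_batches(changed_rows(snowflake_rows, last_sync)):
#       index.upsert(vectors=batch)
```

Every piece of this (watermark bookkeeping, retries, deletes) is work that storing vectors directly in Snowflake avoids.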
Pros
- Purpose-built for vector search, delivering high performance for large-scale similarity queries.
- Optimized for low-latency retrieval, even with billions of vectors.
- Supports advanced ANN (approximate nearest neighbor) features, including configurable index types and metadata filtering.
- Scales independently of other infrastructure, allowing dedicated tuning for search workloads.
- Mature developer ecosystem with strong documentation, SDKs, and community support.
Cons
- Requires separate infrastructure, adding complexity to deployment and maintenance.
- Involves ongoing data syncing from Snowflake or other sources, introducing potential latency and duplication.
- Additional vendor relationship and cost on top of existing Snowflake spend.
- No built-in agent or orchestration layer for multi-step reasoning.
- Governance, permissions, and compliance must be managed separately from the primary data warehouse.