NVIDIA Cosmos Dataset Search (CDS)

NVIDIA Cosmos Dataset Search (CDS) is a comprehensive platform for semantic search across video datasets using advanced AI models. The platform enables text-to-video and video-to-video queries against large-scale video collections with GPU-accelerated inference and vector similarity search.

Key Features

Search and Retrieval

Multimodal semantic search with text and video queries
GPU-accelerated embedding generation using NVIDIA Cosmos-embed NIM
Fast vector similarity search powered by Milvus with NVIDIA cuVS acceleration
Support for multiple collections and cross-modal retrieval

Data Management

Automated video ingestion pipeline with frame extraction and metadata indexing
S3-compatible object storage (LocalStack, MinIO, AWS S3)
Batch and incremental data ingestion with Ray-based parallel processing
Support for various video formats and frame rates

User Experience

Interactive React-based web UI for search and data curation
REST API for programmatic access with comprehensive documentation
CDS CLI for collection management and data ingestion
Real-time search results with video previews and metadata

Deployment and Operations

Flexible deployment: Docker Compose for local development, Helm for production
Production-ready Helm charts for AWS EKS and generic Kubernetes
Comprehensive monitoring, logging, and observability
Modular architecture enabling independent scaling of components

Software Components

NVIDIA NIM Microservices

Cosmos-embed NIM - Unified embedding service providing state-of-the-art text and video embeddings in a common semantic space optimized for temporal understanding and cross-modal search.

Integration and Orchestration Layer

Visual Search Service - FastAPI-based REST API orchestrating ingestion and search operations
Milvus Vector Database - GPU-accelerated vector storage and similarity search with NVIDIA cuVS
Object Storage - S3-compatible storage for video files, frames, and metadata (LocalStack, MinIO, or AWS S3)
Web UI - React-based interface for interactive search, browsing, and data curation

Technical Diagram

The diagram illustrates the complete CDS architecture showing the flow from data ingestion through embedding generation to search and retrieval via both the web UI and REST API.

Workflow

Install Prerequisites - Set up required software and dependencies for your chosen deployment method.
Deploy Services - Launch CDS using Docker Compose for local deployment or Helm charts for Kubernetes deployment.
Create and Manage Collections - Use the CDS CLI or REST API to create vector collections for organizing your video datasets.
Ingest Video Data - Upload videos using the ingestion pipeline or bulk indexing, which processes videos and extracts embeddings.
Search via Web UI - Access the interactive web interface to perform text-to-video or video-to-video searches with real-time results.
Search via CLI or API - Query collections programmatically using the CDS CLI or REST API for integration into workflows and applications.

Deployment Options

CDS supports two primary deployment methods: Docker Compose for local or small-scale setups, and Kubernetes with Helm charts for scalable, production-grade environments. Easy setup and deployment guides for both methods are provided below.

Docker Compose Deployment

Best for local development, testing, and small-scale deployments. All services run on a single node with simplified configuration.

Docker Compose Deployment Prerequisites - System requirements and setup
Docker Compose Deployment Guide - Step-by-step deployment instructions
Docker Compose Troubleshooting - Common issues and solutions

Kubernetes Deployment with Helm

Recommended for production deployments requiring scalability, high availability, and enterprise features.

Deployment Guides

AWS EKS Deployment - Deploy on Amazon Elastic Kubernetes Service

Using CDS

After deploying CDS, you can interact with the system through three primary methods: Web UI, REST API, and CLI. The user guide provides quickstart tutorials for each interface.

User Guide

CDS User Guide - Interactive tutorials for Web UI, REST API, and CLI

Documentation

For comprehensive documentation, see the Complete Documentation Index.

Repository Structure

cds/
├── src/                          # Source code
│   ├── visual_search/           # Visual search service
│   ├── haystack/                # Haystack integration components
│   ├── models/                  # Model implementations
│   └── triton/                  # Triton inference server configs
├── deploy/                       # Deployment configurations
│   ├── services/                # Docker Compose configs
│   └── standalone/              # Standalone deployment options
├── infra/                        # Infrastructure as code
│   └── blueprint/               # Helm charts and Kubernetes manifests
├── docs/                         # Documentation
│   ├── docs/                    # User documentation
│   └── guides/                  # Deployment and usage guides
├── scripts/                      # Utility scripts
│   ├── evals/                   # Evaluation scripts
│   └── performance/             # Performance testing tools
└── tests/                        # Test suites

License

This project is licensed under the terms specified in the LICENSE file. This project may download and install additional third-party open source software. Review the license terms of these open source projects before use.

Security

For security concerns, please refer to our Security Policy.

Security Considerations

The Cosmos Dataset Search Blueprint is shared as a reference and is provided "as is". The security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats; define the trust boundaries, implement logging and monitoring capabilities, secure the communication channels, integrate AuthN & AuthZ with appropriate access controls, keep the deployment up to date, ensure the containers/source code are secure and free of known vulnerabilities.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
deploy/standalone		deploy/standalone
docker		docker
docs		docs
infra/blueprint		infra/blueprint
scripts		scripts
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
OSS_SOURCES.txt		OSS_SOURCES.txt
README.md		README.md
SECURITY.md		SECURITY.md
package.json		package.json
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NVIDIA Cosmos Dataset Search (CDS)

Key Features

Search and Retrieval

Data Management

User Experience

Deployment and Operations

Software Components

NVIDIA NIM Microservices

Integration and Orchestration Layer

Technical Diagram

Workflow

Deployment Options

Docker Compose Deployment

Kubernetes Deployment with Helm

Deployment Guides

Using CDS

User Guide

Documentation

Repository Structure

License

Security

Security Considerations

About

Uh oh!

Releases 1

Packages

Contributors 2

Languages

License

NVIDIA-Omniverse-blueprints/cosmos-dataset-search

Folders and files

Latest commit

History

Repository files navigation

NVIDIA Cosmos Dataset Search (CDS)

Key Features

Search and Retrieval

Data Management

User Experience

Deployment and Operations

Software Components

NVIDIA NIM Microservices

Integration and Orchestration Layer

Technical Diagram

Workflow

Deployment Options

Docker Compose Deployment

Kubernetes Deployment with Helm

Deployment Guides

Using CDS

User Guide

Documentation

Repository Structure

License

Security

Security Considerations

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages