Planned: Q3 2026

Vector Store

Billion-scale embedding retrieval at sub-millisecond latency.

Purpose-built vector database for production RAG and semantic search. Hybrid BM25 + dense retrieval, GPU-accelerated HNSW indexing, real-time upserts, and a query API designed to feel as simple as a SQL SELECT statement.

Capabilities

Everything you need, nothing you don't.

1. GPU-accelerated HNSW

Index construction and nearest-neighbour search are GPU-accelerated using cuVS (RAPIDS). Build a 100M-vector HNSW index in minutes, not hours.

2. Hybrid BM25 + dense search

Combine sparse keyword matching with dense semantic similarity in a single query. Reciprocal rank fusion merges results with configurable blend weights.
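Reciprocal rank fusion is simple enough to sketch in a few lines. Below is an illustrative, self-contained Python version; the function name and the `sparse_weight`/`dense_weight` parameters are hypothetical stand-ins, not the product's actual API. Each document's fused score is the weighted sum of `1 / (k + rank)` over the two result lists, so documents ranked highly by either retriever float to the top.

```python
def rrf_fuse(sparse_ranking, dense_ranking, k=60,
             sparse_weight=0.5, dense_weight=0.5):
    """Weighted reciprocal rank fusion over two ranked lists of doc ids.

    Illustrative sketch only. k dampens the influence of top ranks;
    60 is the value used in the original RRF paper.
    """
    scores = {}
    for weight, ranking in ((sparse_weight, sparse_ranking),
                            (dense_weight, dense_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) to the doc's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Usage: doc1 appears near the top of both lists, so it wins the fusion.
fused = rrf_fuse(["doc3", "doc1", "doc7"], ["doc1", "doc5", "doc3"])
```

Skewing the blend weights (e.g. `dense_weight=0.8`) biases the fused ranking toward semantic matches without discarding keyword evidence.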

3. Real-time upserts

Vectors are queryable within milliseconds of insertion — no batch ingestion jobs, no index rebuild downtime. Designed for live document pipelines.
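To make "queryable within milliseconds of insertion" concrete, here is a toy in-memory sketch: an upsert writes directly into the live structure that the next query reads, so there is no build or merge step between insert and search. The class and method names are illustrative, not the service's client API, and the brute-force cosine scan stands in for the real GPU HNSW graph.

```python
import math


class LiveIndex:
    """Toy index where every upsert is immediately visible to queries.

    Illustrative only: a real implementation inserts into an ANN graph,
    but the visibility property sketched here is the same.
    """

    def __init__(self):
        self.vectors = {}

    def upsert(self, doc_id, vec):
        # No staging area, no rebuild: the next query sees this vector.
        self.vectors[doc_id] = vec

    def query(self, vec, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb)

        ranked = sorted(self.vectors,
                        key=lambda d: cosine(vec, self.vectors[d]),
                        reverse=True)
        return ranked[:top_k]


idx = LiveIndex()
idx.upsert("a", [1.0, 0.0])
idx.upsert("b", [0.0, 1.0])
hit = idx.query([1.0, 0.1], top_k=1)  # "a" is already searchable
```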

4. Metadata filtering

Attach arbitrary JSON metadata to every vector. Pre-filter by metadata before the ANN search to dramatically reduce search space and improve relevance.
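The pre-filter-then-search flow can be sketched in a few lines of Python. This is an illustrative sketch, not the product's API: metadata equality filters narrow the candidate set first, and only the survivors are scored by vector distance (exact distance here, ANN in the real system).

```python
def filtered_search(records, query_vec, where, top_k=3):
    """Pre-filter by metadata, then rank survivors by L2 distance.

    records: {doc_id: (vector, metadata_dict)} -- a hypothetical layout
    where:   {key: required_value} equality filters
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Step 1: metadata pre-filter shrinks the search space.
    candidates = [
        doc_id for doc_id, (_, meta) in records.items()
        if all(meta.get(k) == v for k, v in where.items())
    ]
    # Step 2: nearest-neighbour ranking over the filtered subset only.
    return sorted(candidates,
                  key=lambda d: dist(query_vec, records[d][0]))[:top_k]


records = {
    "a": ([0.0, 0.0], {"lang": "en"}),
    "b": ([1.0, 1.0], {"lang": "en"}),
    "c": ([0.1, 0.1], {"lang": "de"}),  # close, but filtered out below
}
top = filtered_search(records, [0.0, 0.0], {"lang": "en"}, top_k=1)
```

Note that "c" never competes, even though it is the second-closest vector: the filter removes it before any distance is computed.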

5. Multi-tenancy

Namespace isolation allows one deployment to serve thousands of tenants with strict data separation. No cross-tenant data leakage at any layer.
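The isolation model can be illustrated with a toy sketch: every read and write is keyed by namespace, so a query's candidate set can only ever contain that namespace's vectors. The class below is hypothetical and uses a brute-force scan; it shows the scoping discipline, not the production storage layer.

```python
class NamespacedStore:
    """Toy multi-tenant store: one index per namespace, no shared paths.

    Illustrative sketch only. Isolation falls out of the structure:
    query() can only reach vectors filed under the caller's namespace.
    """

    def __init__(self):
        self._spaces = {}  # namespace -> {doc_id: vector}

    def upsert(self, namespace, doc_id, vec):
        self._spaces.setdefault(namespace, {})[doc_id] = vec

    def query(self, namespace, vec, top_k=3):
        # Other tenants' data is simply unreachable from here.
        space = self._spaces.get(namespace, {})

        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return sorted(space, key=lambda d: dist(vec, space[d]))[:top_k]


store = NamespacedStore()
store.upsert("tenant-a", "x", [0.0, 0.0])
store.upsert("tenant-b", "y", [0.0, 0.0])
hits = store.query("tenant-a", [0.0, 0.0])  # tenant-b's "y" cannot appear
```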

6. Serverless & dedicated tiers

Start with the serverless tier (pay per query) for development. Migrate to dedicated nodes for predictable latency and higher QPS guarantees in production.

Technical Specifications

Under the hood.

Index type: HNSW (GPU-accelerated via cuVS)
Scale: Billions of vectors per namespace
Query latency: < 1 ms (p99, dedicated tier)
Hybrid search: BM25 + dense, RRF fusion
Real-time upserts: Yes, queryable in < 10 ms
Embedding dimensions: Up to 65,536
Metadata filtering: JSON, pre-filter before ANN
Multi-tenancy: Namespace-level isolation

Vector Store is currently planned — estimated Q3 2026.

No pricing yet. We offer tailored solutions only.

Get notified at launch

Be first to shape the future.

CogniCloud is in active development. Join the waitlist to get early access and stay updated on our roadmap. No pricing yet — we'll work with each team to find the right fit.

No spam. No pricing pitches. We reach out personally to discuss your use case.
