Process millions of records overnight — at the lowest cost per token.
Not every AI workload needs real-time latency. Document classification, audio transcription, image analysis, embedding generation, and data enrichment pipelines are batch workloads that run best on cost-optimised spot clusters. CogniCloud's batch scheduler maximises spot utilisation, parallelises across hundreds of GPUs, and retries failures automatically.
70%
Cost reduction vs on-demand
10M+
Documents processed per hour
0
Failed jobs — automatic retry
∞
Parallelism — add GPUs on demand
The Challenge
Batch AI workloads are expensive to run naively: reserved instances waste money sitting idle between jobs, spot instances get preempted mid-run with no automatic recovery, and parallelising across hundreds of GPUs requires orchestration code most teams don't want to maintain.
How CogniCloud helps
Batch jobs automatically bid for spot capacity across multiple instance pools, maximising utilisation and minimising cost. CogniCloud handles all spot lifecycle events.
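Multi-pool bidding can be pictured in plain Python. The `pick_pool` helper, pool names, and prices below are hypothetical illustrations, not CogniCloud's actual bidding logic:

```python
# Hypothetical sketch: choose the cheapest spot pool that has capacity.
pools = [
    {"name": "us-east-a", "spot_price": 0.92, "available": 0},
    {"name": "us-east-b", "spot_price": 0.88, "available": 12},
    {"name": "us-west-a", "spot_price": 1.01, "available": 40},
]

def pick_pool(pools):
    # Only pools with free capacity are candidates; among those, take the cheapest.
    candidates = [p for p in pools if p["available"] > 0]
    return min(candidates, key=lambda p: p["spot_price"])

assert pick_pool(pools)["name"] == "us-east-b"
```

Spreading bids across several pools like this is what keeps a job running when any single pool is reclaimed.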
Preempted spot instances are detected within seconds. CogniCloud re-queues interrupted work items and dispatches them to new nodes — no data is lost, no manual intervention needed.
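The re-queue guarantee amounts to at-least-once delivery of work items. A minimal sketch of the idea in stdlib Python (hypothetical names, not the real scheduler internals):

```python
from queue import Queue

# Sketch of at-least-once re-queueing: an item checked out by a worker goes
# back to the queue if the worker is preempted before acknowledging it.
class RequeueingQueue:
    def __init__(self, items):
        self.pending = Queue()
        for item in items:
            self.pending.put(item)
        self.in_flight = {}  # worker_id -> item currently being processed

    def checkout(self, worker_id):
        item = self.pending.get()
        self.in_flight[worker_id] = item
        return item

    def ack(self, worker_id):
        # Worker finished its item successfully.
        self.in_flight.pop(worker_id)

    def on_preemption(self, worker_id):
        # Spot instance reclaimed mid-run: return the item for another worker.
        self.pending.put(self.in_flight.pop(worker_id))

q = RequeueingQueue(["doc-1"])
item = q.checkout("worker-a")
q.on_preemption("worker-a")            # worker-a's instance is reclaimed
assert q.checkout("worker-b") == "doc-1"  # the item is re-dispatched, not lost
```

Because items are re-dispatched rather than re-run in place, the only requirement on your job function is that processing an item twice is safe.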
Define a map function; CogniCloud automatically distributes work items across 1 to 1,000 GPU workers. Throughput scales linearly with the number of GPUs allocated.
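The map model is easy to picture in plain Python. Here a thread pool stands in for the managed GPU workers, and `embed` is a toy stand-in for real per-item work:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(doc):
    # Stand-in for per-item GPU work: just a toy transformation.
    return {"id": doc["id"], "length": len(doc["text"])}

docs = [{"id": i, "text": "x" * i} for i in range(1, 101)]

# Items are independent, so the scheduler can shard `docs` across N workers
# and throughput scales with N.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(embed, docs))

assert len(results) == 100
assert results[0] == {"id": 1, "length": 1}
```

The linear-scaling claim rests on exactly this independence: no item waits on another, so adding workers divides the wall-clock time.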
Pull data from S3, GCS, Azure Blob, Hugging Face datasets, or a SQL query. Results are written back to your storage of choice with configurable output formats.
Real-time cost-per-record, items-remaining, estimated completion time, and GPU utilisation — all visible in the dashboard and queryable via the metrics API.
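Those dashboard figures follow from three raw counters. A sketch of the assumed arithmetic (the formulas are illustrative, not the metrics API itself):

```python
def progress_metrics(done, total, cost_usd, elapsed_s):
    # Derive dashboard numbers from raw counters (assumed formulas).
    rate = done / elapsed_s                      # items per second
    return {
        "cost_per_record_usd": cost_usd / done,
        "items_remaining": total - done,
        "eta_s": (total - done) / rate,
    }

m = progress_metrics(done=2_100_000, total=3_100_000,
                     cost_usd=41.20, elapsed_s=2400)
assert m["items_remaining"] == 1_000_000
```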
Run custom Python, containerised workloads, or use CogniCloud's built-in pipelines for common tasks: embedding generation, transcription, classification, and re-ranking.
How it works
Write a simple Python function that processes one item. CogniCloud handles the rest: distribution, scheduling, retries, and output collection.
from cognicloud import batch

# `model` is assumed to be an embedding model loaded once per worker
@batch.job(
    gpus_per_worker=1,
    gpu_type="high-perf",
    strategy="spot",
    max_workers=256,
)
def embed_document(doc: dict) -> dict:
    embedding = model.encode(doc["text"])
    return {"id": doc["id"], "embedding": embedding}

Pass a dataset reference or an iterable. CogniCloud auto-shards the input and distributes work across workers.
results = batch.run(
    job=embed_document,
    dataset="s3://my-bucket/documents/*.jsonl",
    output="s3://my-bucket/embeddings/",
    max_cost_usd=200,
)

# Live progress
# ████████████░░░░░ 68%  2.1M / 3.1M docs
# Cost so far: $41.20  ETA: 18 min

Outputs are written to your destination as jobs complete — no waiting for the entire run to finish. Stream results in real time or query them after completion.
# Stream results as they finish (`db` is your own datastore client)
for result in results.stream():
    db.upsert(result["id"], result["embedding"])

# Or query the final report
report = results.summary()
# {
#   total: 3_100_000, success: 3_099_994,
#   failed: 6, duration: "26m 42s",
#   cost_usd: 58.41, tok_per_usd: 52_800
# }

Built on
Spot and on-demand GPU workers for parallel batch execution
Shares the same orchestration layer for fault-tolerant distributed work
Caches model weights across workers to eliminate per-worker load time
Used for batch inference jobs that call LLMs as part of the pipeline
LLM Fine-Tuning
Adapt foundation models to your domain — faster and cheaper.
Production Inference
Serve any LLM to millions of users at sub-10 ms TTFT.
RAG Pipelines
Ground your LLMs in real knowledge at billion-document scale.
AI for Startups
Move fast, iterate daily — without a dedicated MLOps team.
Enterprise AI
Secure, compliant, and governed AI infrastructure at any scale.
CogniCloud is in active development. Join the waitlist to get early access and stay updated on our roadmap. No pricing yet — we'll work with each team to find the right fit.
No spam. No pricing pitches. We reach out personally to discuss your use case.