Solution · Data Engineers · ML Platform Teams · Backend Engineers

Batch & Offline AI

Process millions of records overnight — at the lowest cost per token.

Not every AI workload needs real-time latency. Document classification, audio transcription, image analysis, embedding generation, and data enrichment pipelines are batch workloads that run best on cost-optimised spot clusters. CogniCloud's batch scheduler maximises spot utilisation, parallelises across hundreds of GPUs, and retries failures automatically.

70%

Cost reduction vs on-demand

10M+

Documents processed per hour

0

Failed jobs — automatic retry

1–1,000

Parallelism — add GPUs on demand

The Challenge

Why this is hard.

Batch AI workloads are expensive to run naively: reserved instances waste money sitting idle between jobs, spot instances get preempted mid-run with no automatic recovery, and parallelising across hundreds of GPUs requires orchestration code most teams don't want to maintain.

How CogniCloud helps

Everything you need, built in.

Spot-first scheduling

Batch jobs automatically bid for spot capacity across multiple instance pools, maximising utilisation and minimising cost. CogniCloud handles all spot lifecycle events.
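CogniCloud's scheduler is internal, but the core idea — pick the cheapest spot pool that can satisfy the request, and fall back to on-demand when none can — can be illustrated with a minimal sketch. The pool names, prices, and capacities below are invented for the example:

```python
def pick_pool(pools, workers_needed):
    """pools: list of (name, spot_price_per_gpu_hour, free_gpus).
    Return the cheapest pool with enough capacity, else fall back."""
    candidates = [p for p in pools if p[2] >= workers_needed]
    if not candidates:
        return "on-demand"  # no spot pool can fit the job right now
    return min(candidates, key=lambda p: p[1])[0]

pools = [
    ("us-east-1a", 0.92, 300),   # (pool, $/GPU-hour, free GPUs)
    ("us-east-1b", 0.87, 120),
    ("us-west-2a", 1.05, 500),
]

pick_pool(pools, 100)  # cheapest pool with at least 100 free GPUs
```

Bidding across multiple pools matters because preemption risk and pricing vary per pool; spreading requests keeps utilisation high even when one pool tightens.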

Automatic retry & resumption

Preempted spot instances are detected within seconds. CogniCloud re-queues interrupted work items and dispatches them to new nodes — no data is lost, no manual intervention needed.
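The re-queue mechanism can be sketched as a simple work queue: an interrupted item goes back on the queue and is handed to a fresh worker on the next pass. Here `InterruptedError` stands in for a spot preemption; the `flaky` processor is an invented example:

```python
from collections import deque

def run_with_requeue(items, process, max_attempts=3):
    """Process items, re-queueing any whose worker was preempted mid-run."""
    queue = deque((item, 1) for item in items)   # (work item, attempt number)
    done, failed = [], []
    while queue:
        item, attempt = queue.popleft()
        try:
            done.append(process(item))
        except InterruptedError:                  # stand-in for a preemption
            if attempt < max_attempts:
                queue.append((item, attempt + 1))  # dispatch to a fresh node
            else:
                failed.append(item)
    return done, failed

# A flaky processor: item 2 is "preempted" on its first attempt.
attempts = {}
def flaky(item):
    attempts[item] = attempts.get(item, 0) + 1
    if item == 2 and attempts[item] == 1:
        raise InterruptedError
    return item * 10

done, failed = run_with_requeue([1, 2, 3], flaky)
# done == [10, 30, 20] — item 2 completes after re-queueing; failed == []
```

Because each work item is independent, recovery never requires replaying the whole dataset — only the interrupted items.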

Fan-out parallelism

Define a map function; CogniCloud automatically distributes work items across 1 to 1,000 GPU workers. Throughput scales linearly with the number of GPUs allocated.
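The fan-out pattern is ordinary parallel map: one function, many workers, results collected in input order. A minimal local analogue using Python's standard library (threads here; CogniCloud distributes across GPU nodes):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(map_fn, items, workers=8):
    """Apply map_fn to every item across a worker pool; preserve input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(map_fn, items))

squares = fan_out(lambda x: x * x, range(10), workers=4)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Throughput scales with worker count only while items are independent and the data source can feed workers fast enough — the same constraint applies at 1,000 GPUs.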

Any data source

Pull data from S3, GCS, Azure Blob, Hugging Face datasets, or a SQL query. Results are written back to your storage of choice with configurable output formats.

Cost & throughput dashboard

Real-time cost-per-record, items-remaining, estimated completion time, and GPU utilisation — all visible in the dashboard and queryable via the metrics API.
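The dashboard numbers are all derived from three raw counters. A sketch of the arithmetic (the counter values below reuse the progress example from step 02; the elapsed time is invented):

```python
def progress_report(done, total, cost_so_far_usd, elapsed_s):
    """Derive dashboard metrics from raw job counters."""
    rate = done / elapsed_s                 # items per second
    remaining = total - done
    return {
        "cost_per_record_usd": cost_so_far_usd / done,
        "items_remaining": remaining,
        "eta_s": remaining / rate,          # assumes the current rate holds
    }

r = progress_report(done=2_100_000, total=3_100_000,
                    cost_so_far_usd=41.20, elapsed_s=1180)
```

The same quantities are what the metrics API exposes, so alerts (e.g. "cost per record above budget") can be computed client-side.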

Flexible processing runtimes

Run custom Python, containerised workloads, or use CogniCloud's built-in pipelines for common tasks: embedding generation, transcription, classification, and re-ranking.

How it works

From zero to production in three steps.

01

Define your pipeline

Write a simple Python function that processes one item. CogniCloud handles the rest: distribution, scheduling, retries, and output collection.

from cognicloud import batch

@batch.job(
    gpus_per_worker=1,
    gpu_type="high-perf",
    strategy="spot",
    max_workers=256,
)
def embed_document(doc: dict) -> dict:
    # `model` is loaded once per worker at startup, not per item
    embedding = model.encode(doc["text"])
    return {"id": doc["id"], "embedding": embedding}
02

Submit your dataset

Pass a dataset reference or an iterable. CogniCloud auto-shards the input and distributes work across workers.

results = batch.run(
    job=embed_document,
    dataset="s3://my-bucket/documents/*.jsonl",
    output="s3://my-bucket/embeddings/",
    max_cost_usd=200,
)

# Live progress
# ████████████░░░░░  68%  2.1M / 3.1M docs
# Cost so far: $41.20  ETA: 18 min
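Auto-sharding amounts to partitioning the matched input files across workers. A minimal round-robin sketch (file names invented for the example):

```python
def shard(paths, n_workers):
    """Round-robin input files into one shard per worker."""
    shards = [[] for _ in range(n_workers)]
    for i, path in enumerate(paths):
        shards[i % n_workers].append(path)
    return shards

files = [f"docs/part-{i:04d}.jsonl" for i in range(7)]
shards = shard(files, 3)
# shards[0] == ['docs/part-0000.jsonl', 'docs/part-0003.jsonl', 'docs/part-0006.jsonl']
```

Round-robin keeps shard sizes within one file of each other, so no single worker becomes a stragglers' bottleneck when files are similarly sized.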
03

Collect results

Outputs are written to your destination as jobs complete — no waiting for the entire run to finish. Stream results in real time or query them after completion.

# Stream results as they finish
for result in results.stream():
    db.upsert(result["id"], result["embedding"])

# Or query the final report
report = results.summary()
# {
#   total: 3_100_000, success: 3_099_994,
#   failed: 6, duration: "26m 42s",
#   cost_usd: 58.41, tok_per_usd: 52_800
# }
Platform in development

Be first to shape the future.

CogniCloud is in active development. Join the waitlist to get early access and stay updated on our roadmap. No pricing yet — we'll work with each team to find the right fit.

No spam. No pricing pitches. We reach out personally to discuss your use case.
