Documentation

In development — content coming soon

CogniCloud Docs

Everything you need to build, fine-tune, and serve AI models on CogniCloud — from first deployment to production scale.


Getting Started

From zero to inference in under 60 seconds.

Install the SDK, create an API key, and deploy any open-source model. The API is OpenAI-compatible — if you've used the OpenAI SDK, you already know CogniCloud.

Python
# pip install cognicloud
import cognicloud as cogni

cogni.api_key = "ck_live_..."

# Deploy any HuggingFace model
deployment = cogni.Inference.create(
    model="meta-llama/Llama-3-70B-Instruct",
    hardware="high-perf",
)

# OpenAI-compatible chat
response = cogni.chat.completions.create(
    model="meta-llama/Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # The final chunk's delta may carry no content; guard against None
    print(chunk.choices[0].delta.content or "", end="")
Interactive code examples will be available when documentation launches.

What goes where

How our docs are organised

Our documentation is split by audience and task. Use this guide to find the right section for what you're trying to do.

Getting Started

First-time setup: API keys, SDK install, CLI config, and your first deployment. Start here if you're new.

Core Concepts

Architecture, instance types, networking, storage, regions. Use this to understand how CogniCloud works.

Product docs

GPU Compute, Inference Gateway, Neural Cache, Vector Store, Training Jobs, Global Edge. One section per product.

API & SDK Reference

REST endpoints, authentication, Python/Node SDKs, CLI commands. For integrating CogniCloud into your stack.

Security & Compliance

Data residency, SOC 2, HIPAA, VPC peering, RBAC. For security reviews and enterprise procurement.

Guides & Tutorials

Step-by-step guides: fine-tune Llama, build RAG, migrate from OpenAI. Hands-on walkthroughs.

Documentation Structure

Everything, documented.

Full reference for every API, SDK, CLI command, and concept. All sections below are being written — sign up to get notified when each section goes live.

Getting Started

  • Quickstart guide
    5 min
  • Authentication & API keys
  • Install the SDK
    Python · Node
  • Your first deployment
  • CLI setup & configuration
  • Choosing an instance type

Core Concepts

  • Architecture overview
  • GPU instance types & tiers
  • Networking & private VPC
  • Storage volumes & NVMe
  • Spot vs on-demand instances
  • Regions & availability zones

GPU Compute

  • Launching a GPU instance
  • Multi-node cluster setup
  • NVLink topology guide
  • Spot instance lifecycle
  • Persistent volume mounts
  • Custom CUDA containers

Inference Gateway

  • Deploy any HuggingFace model
  • OpenAI-compatible API reference
  • Streaming responses (SSE)
  • Multi-LoRA adapters
  • Autoscaling & scale-to-zero
  • Quantisation (FP8, INT4, AWQ)
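Streaming responses use the standard SSE wire format listed above. As a sketch of what a client does with those events — the chunk schema shown is the OpenAI-style one the quickstart example relies on, not anything CogniCloud-specific — here is a minimal, stdlib-only parser:

```python
import json

def collect_sse_text(raw_stream):
    """Accumulate assistant text from OpenAI-style SSE chat chunks.

    Each event line looks like `data: {json}`; the stream ends with
    `data: [DONE]`. Text arrives in choices[0].delta.content.
    """
    text = []
    for line in raw_stream:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):  # first chunk often carries only the role
            text.append(delta["content"])
    return "".join(text)

# Example wire data: a role-only chunk, two content chunks, the terminator.
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(collect_sse_text(events))  # Hello, world
```

In practice you would iterate over the HTTP response's lines instead of a list, but the framing logic is the same.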

Neural Cache

  • KV-cache configuration
  • Prefix caching strategies
  • Cache tier management
  • Cache warming jobs
  • Cost attribution & metrics
  • Semantic cache
    Soon
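Prefix caching, as listed above, reuses computed attention state for prompts that share a common token prefix (for example, a shared system prompt). A toy, stdlib-only sketch of the lookup strategy — a real KV-cache stores tensors, not strings, and the class name here is purely illustrative:

```python
class PrefixCache:
    """Toy longest-prefix lookup over cached token sequences (illustrative)."""

    def __init__(self):
        self._entries = {}  # token tuple -> opaque cached state

    def put(self, tokens, state):
        self._entries[tuple(tokens)] = state

    def lookup(self, tokens):
        """Return (matched_prefix_len, state) for the longest cached prefix."""
        tokens = tuple(tokens)
        for n in range(len(tokens), 0, -1):  # try longest prefix first
            state = self._entries.get(tokens[:n])
            if state is not None:
                return n, state
        return 0, None

cache = PrefixCache()
cache.put([1, 2, 3], "kv-state-A")  # e.g. a cached system prompt
hit_len, state = cache.lookup([1, 2, 3, 4, 5])
print(hit_len, state)  # 3 kv-state-A -> only tokens 4 and 5 need fresh prefill
```

The payoff is exactly the skipped prefill: the longer the shared prefix, the less compute each request costs.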

Training Jobs

  • Defining a training job (YAML)
  • FSDP & DDP configuration
  • LoRA & QLoRA fine-tuning
  • Checkpoint management
  • Experiment tracking (W&B, MLflow)
  • Fault tolerance & auto-resume
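Training jobs are defined declaratively (in YAML, per the list above). To give a feel for the kind of fields such a spec might carry — every field name below is an assumption for illustration, not the actual schema — here is a stdlib dataclass equivalent:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class TrainingJobSpec:
    """Illustrative job spec; field names are hypothetical, not the real schema."""
    model: str
    dataset: str
    method: str = "lora"           # e.g. lora, qlora, full fine-tune
    epochs: int = 3
    checkpoint_every: int = 500    # steps between checkpoints (for auto-resume)
    trackers: list = field(default_factory=list)  # e.g. ["wandb", "mlflow"]

spec = TrainingJobSpec(
    model="meta-llama/Llama-3-70B-Instruct",
    dataset="s3://my-bucket/train.jsonl",
    trackers=["wandb"],
)
print(asdict(spec)["method"])  # lora
```

Serialising such a structure to YAML yields the declarative job file the first bullet describes.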

Vector Store

  • Creating a namespace
  • Upserting & querying vectors
  • Hybrid BM25 + dense search
  • Metadata filtering
  • Multi-tenancy & isolation
  • HNSW index tuning
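To make the upsert/query/metadata-filter flow above concrete, here is a brute-force, stdlib-only sketch using cosine similarity over an in-memory dict. A real vector store would use an HNSW index for scale, and none of these names come from the actual API:

```python
import math

class ToyNamespace:
    """In-memory stand-in for a vector-store namespace (illustrative only)."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        self._items[item_id] = (vector, metadata or {})

    def query(self, vector, top_k=3, filter=None):
        """Return [(id, score)] by cosine similarity, optionally metadata-filtered."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        hits = [
            (item_id, cosine(vector, vec))
            for item_id, (vec, meta) in self._items.items()
            if not filter or all(meta.get(k) == v for k, v in filter.items())
        ]
        return sorted(hits, key=lambda h: -h[1])[:top_k]

ns = ToyNamespace()
ns.upsert("a", [1.0, 0.0], {"lang": "en"})
ns.upsert("b", [0.0, 1.0], {"lang": "de"})
print(ns.query([0.9, 0.1], filter={"lang": "en"}))
```

Note that the filter is applied before ranking, which is also how production stores avoid returning fewer than `top_k` relevant hits.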

Global Edge

  • Anycast routing overview
  • Region pinning & data residency
  • Failover configuration
  • Edge model caching
  • Latency SLA & credits
  • CDN integration guide
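Failover configuration usually reduces to "route to the lowest-latency healthy region, honouring any residency pin, and fall back when a region goes unhealthy". A minimal illustration of that decision — region names and latency figures are made up:

```python
def pick_region(regions, pinned=None):
    """Choose a healthy region: honour pinning, else pick lowest latency.

    `regions` maps name -> {"healthy": bool, "latency_ms": float}.
    """
    if pinned and regions.get(pinned, {}).get("healthy"):
        return pinned  # data-residency pinning wins while the region is up
    healthy = {n: r for n, r in regions.items() if r["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda n: healthy[n]["latency_ms"])

regions = {
    "eu-west": {"healthy": False, "latency_ms": 18.0},
    "us-east": {"healthy": True, "latency_ms": 41.0},
    "ap-south": {"healthy": True, "latency_ms": 95.0},
}
print(pick_region(regions, pinned="eu-west"))  # us-east (pinned region is down)
```

Whether a pinned region is allowed to fail over at all depends on the residency policy — strict pinning would raise instead of rerouting.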

API Reference

  • REST API overview
  • Authentication (API keys, JWTs)
  • Chat completions
  • Embeddings endpoint
  • Vector store API
  • Batch jobs API
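Because the endpoints are OpenAI-compatible (as the quickstart shows), a raw REST call pairs a Bearer-token header with the familiar chat body. A stdlib sketch of assembling that request — only the payload shape is taken from the example above; nothing else is confirmed API detail:

```python
import json

def chat_completion_request(api_key, model, user_message, stream=False):
    """Assemble headers and body for an OpenAI-style chat completions call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # API-key auth; JWTs are also supported per the list above
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    })
    return headers, body

headers, body = chat_completion_request(
    "ck_live_...", "meta-llama/Llama-3-70B-Instruct", "Hello!"
)
print(json.loads(body)["messages"][0]["content"])  # Hello!
```

Any HTTP client can then POST this to the chat completions endpoint.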

SDK Reference

  • Python SDK
    cognicloud
  • Node.js SDK
    @cognicloud/sdk
  • Go client
    Soon
  • Rust client
    Soon
  • CLI reference (cogni)
  • OpenAI migration guide
    Popular
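Because the API is OpenAI-compatible, migration typically comes down to repointing the client's base URL and API key while leaving the rest of the code untouched. A sketch of that idea — the base URL below is a placeholder, not the production endpoint (the migration guide will have the real one):

```python
def migrate_client_config(openai_kwargs, api_key):
    """Repoint OpenAI-client constructor kwargs at a CogniCloud-style endpoint.

    The base URL here is hypothetical, used only to illustrate the pattern.
    """
    cfg = dict(openai_kwargs)          # keep timeouts, retries, etc. as-is
    cfg["api_key"] = api_key
    cfg["base_url"] = "https://api.cognicloud.example/v1"  # placeholder URL
    return cfg

cfg = migrate_client_config({"timeout": 30}, api_key="ck_live_...")
print(sorted(cfg))  # ['api_key', 'base_url', 'timeout']
```

The resulting dict is what you would pass straight to the OpenAI client constructor, which is why migrations of this style tend to be one-line diffs.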

Security & Compliance

  • Data residency & GDPR
  • SOC 2 Type II
    In progress
  • HIPAA compliance guide
  • VPC peering & PrivateLink
  • RBAC & audit logs
  • Penetration testing policy

Guides & Tutorials

  • Fine-tune Llama-3 in 30 min
    Guide
  • Build a RAG chatbot
    Guide
  • Batch embedding at scale
    Guide
  • Migrate from OpenAI
    Guide
  • Cost optimisation playbook
    Guide
  • Enterprise deployment pattern
    Guide
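Batch embedding at scale starts with splitting inputs into fixed-size batches so each request stays under payload limits. A stdlib sketch of that chunking step — the batch size of 64 is illustrative, not a documented limit:

```python
def batched(texts, batch_size=64):
    """Yield fixed-size batches of inputs for an embeddings endpoint."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

docs = [f"doc-{i}" for i in range(150)]
sizes = [len(b) for b in batched(docs)]
print(sizes)  # [64, 64, 22]
```

Each batch then becomes one embeddings request; the guide above covers the retry and throughput side of the workflow.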

Get notified when docs launch.

We'll email you as each section ships — starting with Getting Started and API Reference.

Join the waitlist