Documentation

In development — content coming soon

CogniCloud Docs

Everything you need to build, fine-tune, and serve AI models on CogniCloud — from first deployment to production scale.


Getting Started

From zero to inference in under 60 seconds.

Install the SDK, create an API key, and deploy any open-source model. The API is OpenAI-compatible — if you've used the OpenAI SDK, you already know CogniCloud.

Python
# pip install cognicloud
import cognicloud as cogni

cogni.api_key = "ck_live_..."

# Deploy any HuggingFace model
deployment = cogni.Inference.create(
    model="meta-llama/Llama-3-70B-Instruct",
    hardware="high-perf",
)

# OpenAI-compatible chat
response = cogni.chat.completions.create(
    model="meta-llama/Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # The final chunk's delta may carry no content; guard against None
    print(chunk.choices[0].delta.content or "", end="")
Interactive code examples will be available when documentation launches.

What goes where

How our docs are organised

Our documentation is split by audience and task. Use this guide to find the right section for what you're trying to do.

Getting Started

First-time setup: API keys, SDK install, CLI config, and your first deployment. Start here if you're new.

Core Concepts

Architecture, instance types, networking, storage, regions. Use this to understand how CogniCloud works.

Product docs

GPU Compute, Inference Gateway, Neural Cache, Vector Store, Training Jobs, Global Edge. One section per product.

API & SDK Reference

REST endpoints, authentication, Python/Node SDKs, CLI commands. For integrating CogniCloud into your stack.

Security & Compliance

Data residency, SOC 2, HIPAA, VPC peering, RBAC. For security reviews and enterprise procurement.

Guides & Tutorials

Step-by-step guides: fine-tune Llama, build RAG, migrate from OpenAI. Hands-on walkthroughs.

Documentation Structure

Everything, documented.

Full reference for every API, SDK, CLI command, and concept. All sections below are being written — sign up to get notified when each section goes live.

Getting Started

  • Quickstart guide
    5 min
  • Authentication & API keys
  • Install the SDK
    Python · Node
  • Your first deployment
  • CLI setup & configuration
  • Choosing an instance type

Core Concepts

  • Architecture overview
  • GPU instance types & tiers
  • Networking & private VPC
  • Storage volumes & NVMe
  • Spot vs on-demand instances
  • Regions & availability zones

GPU Compute

  • Launching a GPU instance
  • Multi-node cluster setup
  • NVLink topology guide
  • Spot instance lifecycle
  • Persistent volume mounts
  • Custom CUDA containers

Inference Gateway

  • Deploy any HuggingFace model
  • OpenAI-compatible API reference
  • Streaming responses (SSE)
  • Multi-LoRA adapters
  • Autoscaling & scale-to-zero
  • Quantisation (FP8, INT4, AWQ)
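Streaming responses use the standard SSE wire format listed above. As a sketch of what a client does with those events — the chunk schema shown is the OpenAI-style one the quickstart example relies on, not anything CogniCloud-specific — here is a minimal, stdlib-only parser:

```python
import json

def collect_sse_text(raw_stream):
    """Accumulate assistant text from OpenAI-style SSE chat chunks.

    Each event line looks like `data: {json}`; the stream ends with
    `data: [DONE]`. Text arrives in choices[0].delta.content.
    """
    text = []
    for line in raw_stream:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):  # first chunk often carries only the role
            text.append(delta["content"])
    return "".join(text)

# Example wire data: a role-only chunk, two content chunks, the terminator.
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(collect_sse_text(events))  # Hello, world
```

In practice you would iterate over the HTTP response's lines instead of a list, but the framing logic is the same.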

Neural Cache

  • KV-cache configuration
  • Prefix caching strategies
  • Cache tier management
  • Cache warming jobs
  • Cost attribution & metrics
  • Semantic cache
    Soon
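Prefix caching, as listed above, reuses computed attention state for prompts that share a common token prefix (for example, a shared system prompt). A toy, stdlib-only sketch of the lookup strategy — a real KV-cache stores tensors, not strings, and the class name here is purely illustrative:

```python
class PrefixCache:
    """Toy longest-prefix lookup over cached token sequences (illustrative)."""

    def __init__(self):
        self._entries = {}  # token tuple -> opaque cached state

    def put(self, tokens, state):
        self._entries[tuple(tokens)] = state

    def lookup(self, tokens):
        """Return (matched_prefix_len, state) for the longest cached prefix."""
        tokens = tuple(tokens)
        for n in range(len(tokens), 0, -1):  # try longest prefix first
            state = self._entries.get(tokens[:n])
            if state is not None:
                return n, state
        return 0, None

cache = PrefixCache()
cache.put([1, 2, 3], "kv-state-A")  # e.g. a cached system prompt
hit_len, state = cache.lookup([1, 2, 3, 4, 5])
print(hit_len, state)  # 3 kv-state-A -> only tokens 4 and 5 need fresh prefill
```

The payoff is exactly the skipped prefill: the longer the shared prefix, the less compute each request costs.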

Training Jobs

  • Defining a training job (YAML)
  • FSDP & DDP configuration
  • LoRA & QLoRA fine-tuning
  • Checkpoint management
  • Experiment tracking (W&B, MLflow)
  • Fault tolerance & auto-resume
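Training jobs are defined declaratively (in YAML, per the list above). To give a feel for the kind of fields such a spec might carry — every field name below is an assumption for illustration, not the actual schema — here is a stdlib dataclass equivalent:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class TrainingJobSpec:
    """Illustrative job spec; field names are hypothetical, not the real schema."""
    model: str
    dataset: str
    method: str = "lora"           # e.g. lora, qlora, full fine-tune
    epochs: int = 3
    checkpoint_every: int = 500    # steps between checkpoints (for auto-resume)
    trackers: list = field(default_factory=list)  # e.g. ["wandb", "mlflow"]

spec = TrainingJobSpec(
    model="meta-llama/Llama-3-70B-Instruct",
    dataset="s3://my-bucket/train.jsonl",
    trackers=["wandb"],
)
print(asdict(spec)["method"])  # lora
```

Serialising such a structure to YAML yields the declarative job file the first bullet describes.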

Vector Store

  • Creating a namespace
  • Upserting & querying vectors
  • Hybrid BM25 + dense search
  • Metadata filtering
  • Multi-tenancy & isolation
  • HNSW index tuning
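To make the upsert/query/metadata-filter flow above concrete, here is a brute-force, stdlib-only sketch using cosine similarity over an in-memory dict. A real vector store would use an HNSW index for scale, and none of these names come from the actual API:

```python
import math

class ToyNamespace:
    """In-memory stand-in for a vector-store namespace (illustrative only)."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        self._items[item_id] = (vector, metadata or {})

    def query(self, vector, top_k=3, filter=None):
        """Return [(id, score)] by cosine similarity, optionally metadata-filtered."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        hits = [
            (item_id, cosine(vector, vec))
            for item_id, (vec, meta) in self._items.items()
            if not filter or all(meta.get(k) == v for k, v in filter.items())
        ]
        return sorted(hits, key=lambda h: -h[1])[:top_k]

ns = ToyNamespace()
ns.upsert("a", [1.0, 0.0], {"lang": "en"})
ns.upsert("b", [0.0, 1.0], {"lang": "de"})
print(ns.query([0.9, 0.1], filter={"lang": "en"}))
```

Note that the filter is applied before ranking, which is also how production stores avoid returning fewer than `top_k` relevant hits.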

Global Edge

  • Anycast routing overview
  • Region pinning & data residency
  • Failover configuration
  • Edge model caching
  • Latency SLA & credits
  • CDN integration guide
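Failover configuration usually reduces to "route to the lowest-latency healthy region, honouring any residency pin, and fall back when a region goes unhealthy". A minimal illustration of that decision — region names and latency figures are made up:

```python
def pick_region(regions, pinned=None):
    """Choose a healthy region: honour pinning, else pick lowest latency.

    `regions` maps name -> {"healthy": bool, "latency_ms": float}.
    """
    if pinned and regions.get(pinned, {}).get("healthy"):
        return pinned  # data-residency pinning wins while the region is up
    healthy = {n: r for n, r in regions.items() if r["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda n: healthy[n]["latency_ms"])

regions = {
    "eu-west": {"healthy": False, "latency_ms": 18.0},
    "us-east": {"healthy": True, "latency_ms": 41.0},
    "ap-south": {"healthy": True, "latency_ms": 95.0},
}
print(pick_region(regions, pinned="eu-west"))  # us-east (pinned region is down)
```

Whether a pinned region is allowed to fail over at all depends on the residency policy — strict pinning would raise instead of rerouting.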

API Reference

  • REST API overview
  • Authentication (API keys, JWTs)
  • Chat completions
  • Embeddings endpoint
  • Vector store API
  • Batch jobs API
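Because the endpoints are OpenAI-compatible (as the quickstart shows), a raw REST call pairs a Bearer-token header with the familiar chat body. A stdlib sketch of assembling that request — only the payload shape is taken from the example above; nothing else is confirmed API detail:

```python
import json

def chat_completion_request(api_key, model, user_message, stream=False):
    """Assemble headers and body for an OpenAI-style chat completions call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # API-key auth; JWTs are also supported per the list above
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    })
    return headers, body

headers, body = chat_completion_request(
    "ck_live_...", "meta-llama/Llama-3-70B-Instruct", "Hello!"
)
print(json.loads(body)["messages"][0]["content"])  # Hello!
```

Any HTTP client can then POST this to the chat completions endpoint.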

SDK Reference

  • Python SDK
    cognicloud
  • Node.js SDK
    @cognicloud/sdk
  • Go client
    Soon
  • Rust client
    Soon
  • CLI reference (cogni)
  • OpenAI migration guide
    Popular
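Because the API is OpenAI-compatible, migration typically comes down to repointing the client's base URL and API key while leaving the rest of the code untouched. A sketch of that idea — the base URL below is a placeholder, not the production endpoint (the migration guide will have the real one):

```python
def migrate_client_config(openai_kwargs, api_key):
    """Repoint OpenAI-client constructor kwargs at a CogniCloud-style endpoint.

    The base URL here is hypothetical, used only to illustrate the pattern.
    """
    cfg = dict(openai_kwargs)          # keep timeouts, retries, etc. as-is
    cfg["api_key"] = api_key
    cfg["base_url"] = "https://api.cognicloud.example/v1"  # placeholder URL
    return cfg

cfg = migrate_client_config({"timeout": 30}, api_key="ck_live_...")
print(sorted(cfg))  # ['api_key', 'base_url', 'timeout']
```

The resulting dict is what you would pass straight to the OpenAI client constructor, which is why migrations of this style tend to be one-line diffs.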

Security & Compliance

  • Data residency & GDPR
  • SOC 2 Type II
    In progress
  • HIPAA compliance guide
  • VPC peering & PrivateLink
  • RBAC & audit logs
  • Penetration testing policy

Guides & Tutorials

  • Fine-tune Llama-3 in 30 min
    Guide
  • Build a RAG chatbot
    Guide
  • Batch embedding at scale
    Guide
  • Migrate from OpenAI
    Guide
  • Cost optimisation playbook
    Guide
  • Enterprise deployment pattern
    Guide
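Batch embedding at scale starts with splitting inputs into fixed-size batches so each request stays under payload limits. A stdlib sketch of that chunking step — the batch size of 64 is illustrative, not a documented limit:

```python
def batched(texts, batch_size=64):
    """Yield fixed-size batches of inputs for an embeddings endpoint."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

docs = [f"doc-{i}" for i in range(150)]
sizes = [len(b) for b in batched(docs)]
print(sizes)  # [64, 64, 22]
```

Each batch then becomes one embeddings request; the guide above covers the retry and throughput side of the workflow.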

Get notified when docs launch.

We'll email you as each section ships — starting with Getting Started and API Reference.

Join the waitlist