Usage Limits / Free-Tier Guardrails
Server-side guardrails that bound the resources a single Weaviate instance can consume. Designed for the upcoming Weaviate Cloud Free Tier; usable in any deployment that fits the supported deployment shape (see *Scope* below).
Loading actions...
Skill content
Main instructions and any bundled files for this skill.
Prompt Playground
1 VariableFill Variables
Preview
# Usage Limits / Free-Tier Guardrails
Server-side guardrails that bound the resources a single Weaviate instance can consume. Designed for the upcoming Weaviate Cloud Free Tier; usable in any deployment that fits the supported deployment shape (see *Scope* below).
All limits are **opt-in**: env vars unset means no enforcement.
> Source of truth for the design: the [RFC](https://www.notion.so/35870562ccd681ce9356e47fa7a37935). This file is the codebase-internal pointer that explains *what is implemented* and *where the hooks live*; the RFC has the full rationale and out-of-scope discussion.
## Environment variables
| Variable | Type | Default | Effect |
|---|---|---|---|
| `USAGE_LIMITS_ERROR_MESSAGE` | string | `"{limit} count limit of {value} reached for this instance."` | Operator-overridable template for the user-facing error message. Placeholders: `{limit}` (resource type) and `{value}` (configured threshold). |
| `MAXIMUM_ALLOWED_OBJECTS_COUNT` | int | `-1` (unlimited) | Cap on live object count, summed across all loaded local shards (node-wide). Checked on every single + batch insert at the storage chokepoint. |
| `MAXIMUM_ALLOWED_COLLECTIONS_COUNT` | int | `-1` (unlimited) | Cap on number of collections. Cluster-global when `NAMESPACES_ENABLED=false`; per namespace when `NAMESPACES_ENABLED=true`. |
| `MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION` | int | `-1` (unlimited) | Cap on tenants per multi-tenant collection. Checked at tenant-create time only. |
| `MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION` | int | `-1` (unlimited) | Cap on `desiredCount` of a class create request's `shardingConfig`. Config-time check. |
All values are **runtime-overrideable** via the existing runtime overrides YAML file (see `RUNTIME_OVERRIDES_*`). Field names in the YAML are the lowercase-snake-case forms of the env-var names.
### Required pairing: `REPLICATION_MAXIMUM_FACTOR=1`
The object/tenant/shard caps only work in the RF=1 deployment shape (see *Scope* below). When **any** of `MAXIMUM_ALLOWED_OBJECTS_COUNT`, `MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION`, or `MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION` is set, you must also set `REPLICATION_MAXIMUM_FACTOR=1`. Startup fails otherwise. `REPLICATION_MAXIMUM_FACTOR` also caps the per-class `replicationConfig.factor` for new classes, so the invariant holds at runtime too.
`MAXIMUM_ALLOWED_COLLECTIONS_COUNT` is **not** part of the linkage — it predates this RFC and tying it would break existing operators.
## Where each check fires
| Limit | Hook | File |
|---|---|---|
| Objects (single) | `Shard.PutObject` (top of function, before LSM write) | `adapters/repos/db/shard_write_put.go` |
| Objects (batch) | `Shard.PutObjectBatch` (top of function) | `adapters/repos/db/shard_write_batch_objects.go` |
| Collections | `usecases/schema/class.go` `AddClass()` | `usecases/schema/class.go` |
| Tenants | `usecases/schema/tenant.go` `AddTenants()` | `usecases/schema/tenant.go` |
| Shards | `usecases/schema/class.go` `AddClass()` (sharding-config validation) | `usecases/schema/class.go` |
The object check sits at the **storage chokepoint** rather than at the use-case layer. That covers both writes that arrive locally and writes that were forwarded from another node — both converge at `Shard.PutObject{,Batch}` on the home node for RF=1. The use-case layer (`usecases/objects/`) does not enforce the object cap; that hook was deliberately removed when we moved the chokepoint deeper.
The schema-side limits (collections, tenants, shards) stay at the use-case layer because that's the single coordinator path — `AddClass`/`AddTenants` go through RAFT, no forwarded-write concern.
## Counter source
The object count is **node-wide** across local shards: the manager sums each loaded shard's `bucket.CountAsync()` (`adapters/repos/db/lsmkv/bucket.go`) on every enforced write. Each `CountAsync()` is O(segments-per-shard) — it walks the live segment list and sums each segment's already-loaded net-additions counter, no I/O. For the Free-Tier shape (few shards, few segments) that's a handful of atomic reads on the hot path.
On namespace-enabled clusters the chokepoint passes the (namespace-qualified) class name and `CheckObjects` extracts the namespace; the counter then sums only indices in that namespace. The cap is applied **per namespace**, not cluster-wide. A plain (non-qualified) class name sums all indices (NS-disabled clusters; the global slice on NS-enabled clusters that have no namespaced classes yet).
We deliberately don't route through `UsageForIndex` — that path triggers other usage-module computations beyond a count.
The collection count is **schema/RAFT-backed**: the limit check in `AddClass()` calls `schemaManager.QueryCollectionsCount(namespace)`, which goes through RAFT to the leader. When `NAMESPACES_ENABLED=false` the namespace selector is empty and the response is the cluster-global `len(s.classes)`. When `NAMESPACES_ENABLED=true` the selector is the caller's `principal.Namespace` (namespaced creates already require a namespaced principal) and the count is restricted to stored class names whose `<namespace>:` prefix matches. Aliases live outside this map and do not consume the cap.
## Error response
When any limit is hit, Weaviate returns:
- **HTTP**: `429 Too Many Requests` with body
```json
{
"errorCode": "USAGE_LIMIT_EXCEEDED",
"limit": "objects",
"value": 10000,
"message": "Object count limit of 10000 reached for this instance."
}
```
- **gRPC**: `codes.ResourceExhausted` with `errdetails.ErrorInfo` carrying the same `limit`/`value`/`message` fields under `Reason="USAGE_LIMIT_EXCEEDED"`, `Domain="weaviate.usagelimits"`.
The structured fields (`errorCode`, `limit`, `value`) are stable contract regardless of the `USAGE_LIMITS_ERROR_MESSAGE` template — only the human-facing `message` text changes.
### Batch behavior
When a batch insert would exceed `MAXIMUM_ALLOWED_OBJECTS_COUNT`, the **shard-slice is rejected** as a unit:
- **Single-shard collections (Free Tier):** whole-batch rejection. No partial fill.
- **Multi-shard collections:** `Index.putObjectBatch` partitions a client batch by shard *before* forwarding, so the chokepoint sees one slice per shard. A single client batch can therefore produce per-shard partial success on multi-shard collections — accepted under our current scope (see *Scope* below).
## Scope
Supported deployment shapes (where the cap is meaningful and exact):
- **Single-node clusters** (the Free Tier sandbox case) — there is no other node.
- **Namespaced clusters in phase 1** — a namespace's collections/shards are pinned to a single node, so the per-namespace sum is local. `MAXIMUM_ALLOWED_OBJECTS_COUNT` applies per namespace.
**Out of scope:**
- **RF > 1.** The replicated write path bypasses `Shard.PutObject{,Batch}` (it goes through `shard_replication.go`'s `preparePutObject{,s}` → `s.putOne` / `s.putBatch` directly). Supporting RF>1 would require either dropping the check one level deeper or a smarter scheme like a lease-based quota.
- **Hypothetical multi-node, non-namespaced, RF=1, single-shard clusters where collections are distributed across nodes.** Each node would only see its local slice of the count, so the effective cap stacks (`cap × min(N_collections, N_nodes_with_shards)`). Not a deployment shape we ship the cap in.
- **Phase-2 namespaces** that spread a namespace's collections across nodes — same problem as the previous bullet.
## Backward-compat note: collections-limit status code
The pre-existing `MAXIMUM_ALLOWED_COLLECTIONS_COUNT` enforcement previously returned **HTTP 422 Unprocessable Entity** with a free-text "maximum number of collections" message. As of this release it returns **HTTP 429** with the structured `USAGE_LIMIT_EXCEEDED` body described above. Clients matching on the prior 422 status or message text must adapt.
## Accepted imperfections
- **Object count via async path.** Counts come from `CountAsync` and exclude the in-memory memtable, so during fast bulk imports the count lags slightly behind on-disk state. Bounded by in-flight write volume between count refreshes; self-corrects on the next flush. Sync counting on every write would scan the entire memtable — wasteful at the 10K free-tier scale, fatal at 10M+ scale.
- **Cold lazy-load shards are skipped from the sum.** Including them wouldn't force a load (counts can be read from on-disk segment metadata), but it would add a directory walk + per-segment metadata read per cold shard on every write — unacceptable on the hot path. Effect: accounts with dormant tenants may sit slightly under-counted. Future option: cache cold counts in memory at load time.
- **Per-shard-slice batch rejection** on multi-shard collections (see *Batch behavior*). Single-shard collections (Free Tier) see whole-batch rejection unchanged.
- **Tenants checked at create time only**, not on subsequent multi-tenancy config changes.
- **Schema-side caps are not transactional with RAFT.** Read-check-write is not atomic across the RAFT-replicated `AddClass`/`AddTenants` call, so two concurrent creates can both pass the check. Bounded overshoot; next request is correctly rejected.
---
# Configuration Restrictions
A second class of opt-in guardrails that constrain **what kind** of class an operator's tenants may create — distinct from the usage limits above, which cap **how much** state they can produce. Like usage limits, these are unset by default; existing deployments are unaffected.
## Environment variables
| Variable | Type | Default | Effect |
|---|---|---|---|
| `ALLOWED_VECTOR_INDEX_TYPES` | comma-separated list | unset (no restriction) | Allow-list for class `vectorIndexType` and named-vector `vectorConfig[*].vectorIndexType`. Valid entries: `hnsw`, `flat`, `dynamic`, `hfresh`. |
| `ALLOWED_COMPRESSION_TYPES` | comma-separated list | unset (no restriction) | Allow-list for the compression configured on a class's vector index. Valid entries: `none`, `pq`, `sq`, `rq-1`, `rq-8`, `bq` (same names accepted by `DEFAULT_QUANTIZATION`). Hfresh classes are exempt — hfresh has no compression knobs. |
| `RESTRICTIONS_ERROR_MESSAGE` | string | `"{value} is not allowed for {restriction}. Allowed values: {allowed}."` | Operator-overridable template for the user-facing message. Placeholders: `{restriction}`, `{value}`, `{allowed}`. |
All three are **runtime-overrideable** via the runtime overrides YAML (`allowed_vector_index_types`, `allowed_compression_types`, `restrictions_error_message`).
## Cross-field rules
Validated at startup in `Config.Validate()` (`usecases/config/config_handler.go`):
1. Each entry must be one of the canonical valid values.
2. **Single-entry allow-list:** the matching default (`DEFAULT_VECTOR_INDEX` / `DEFAULT_QUANTIZATION`) must either be unset (in which case it is seeded to the single value) or match it.
3. **Multi-entry allow-list:** the matching default must be explicitly set and present in the list.
4. **Hfresh + compression invariant:** `ALLOWED_VECTOR_INDEX_TYPES=hfresh` (only) paired with a non-empty `ALLOWED_COMPRESSION_TYPES` is rejected at startup — hfresh has no compression. Compression alongside hfresh in a *mixed* allow-list (e.g. `hfresh,hnsw`) is allowed because the non-hfresh members still need a compression policy.
## Common shapes
```yaml
# Force everyone to a single vector index type.
ALLOWED_VECTOR_INDEX_TYPES=hfresh
# DEFAULT_VECTOR_INDEX is seeded to "hfresh"; DEFAULT_QUANTIZATION and
# ALLOWED_COMPRESSION_TYPES must remain unset.
```
```yaml
# Allow hfresh + hnsw with a forced compression on the hnsw side.
ALLOWED_VECTOR_INDEX_TYPES=hfresh,hnsw
DEFAULT_VECTOR_INDEX=hfresh # must be set: multi-entry list
ALLOWED_COMPRESSION_TYPES=rq-8
DEFAULT_QUANTIZATION=rq-8 # seeded if unset
```
```yaml
# Maximum performance, cost no object: hnsw only, no compression.
ALLOWED_VECTOR_INDEX_TYPES=hnsw
ALLOWED_COMPRESSION_TYPES=none
# Defaults seeded to "hnsw" and "none" respectively.
```
## Where each check fires
| Restriction | Hook | File |
|---|---|---|
| Vector index type (legacy + named) | `Handler.validateVectorIndexType` | `usecases/schema/class.go` |
| Compression (legacy + named) | `Handler.validateAllowedCompression` (invoked from `validateVectorSettings`) | `usecases/schema/class.go` |
The compression check inspects user-supplied config only; the default compression applied later (in `enableQuantization`) is guaranteed by startup validation to be in the allow-list, so a request that arrives with no compression block still produces a compatible class.
## Error response
When a class create/update violates a restriction:
- **HTTP**: `422 Unprocessable Entity` with body
```json
{
"errorCode": "CONFIG_NOT_ALLOWED",
"restriction": "compression",
"value": "pq",
"allowed": ["rq-8"],
"message": "pq is not allowed for compression. Allowed values: rq-8."
}
```
- **gRPC**: `codes.FailedPrecondition` with `errdetails.ErrorInfo` carrying the same fields under `Reason="CONFIG_NOT_ALLOWED"`, `Domain="weaviate.restrictions"`.
The `errorCode`, `restriction`, `value`, and `allowed` fields are stable wire contract; the `message` is rendered from `RESTRICTIONS_ERROR_MESSAGE` and varies across deployments. Example operator override:
```
RESTRICTIONS_ERROR_MESSAGE=Invalid config: {value} for {restriction} is not allowed on this tier — please upgrade.
```
## Accepted imperfections
- **Compression detection is based on the parsed user config only.** A class submitted with `{"pq": {"enabled": false}}` is treated identically to a class with no compression block at all — both fall through to the default, which startup validation already vetted against the allow-list. The only way to *opt out* of all compression is `skipDefaultQuantization: true`, which the validator surfaces as the value `none`.
- **Hfresh classes bypass the compression check entirely.** That includes named-vector entries whose `vectorIndexType` is `hfresh`.
Related Skills
Frontend Typescript Linting.mdc
TypeScript and ESLint rules that MUST be followed when creating, modifying, or reviewing any file under apps/frontend/, including .ts, .tsx, .js, and .jsx files. Also apply when discussing frontend li...
2. Apply Deepthink Protocol (reason about dependencies
risks