The Foundation: Little's Law
The entire simulation is anchored by Little's Law — the fundamental theorem of queuing theory. It guarantees that for any stable system, the average concurrency (L) equals throughput (λ) times latency (W).
L (concurrency): in-flight requests active at any instant, bounded by your connection pool.
λ (throughput): successfully served requests per second after dropping errors.
W (latency): full residence time, i.e. queuing wait plus service time across every hop.
Practical intuition: with a fixed thread pool (L), if your average latency (W) doubles, the throughput (λ) the pool can sustain halves. This is why a single slow database query can starve your entire web tier of connections.
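A minimal sketch of that intuition, rearranging Little's Law to λ = L / W. It assumes average (not tail) latency and an illustrative pool of 200 connections. The function name is hypothetical.

```typescript
// Little's Law: L = λ × W, rearranged to λ = L / W.
// poolSize is L (max in-flight requests); avgLatencyMs is W in milliseconds.
function maxThroughputRps(poolSize: number, avgLatencyMs: number): number {
  // Multiply first so the arithmetic stays exact for round numbers.
  return (poolSize * 1000) / avgLatencyMs;
}

console.log(maxThroughputRps(200, 50));  // 4000 req/s at 50 ms
console.log(maxThroughputRps(200, 100)); // 2000 req/s: latency doubled, throughput halved
```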
Saturation Formula
When utilisation ρ (load relative to the node's capacity) exceeds 1.0, the node is over-saturated. The engine adds a queuing penalty of (ρ − 1) × 25 ms, i.e. 25 ms per unit of overflow. If ρ > 2.5, each request has a 40% chance of being dropped (simulating connection queue exhaustion).
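The saturation rule can be sketched as a small function. This sketch assumes ρ has already been computed for the node. The `applySaturation` name and the injectable `rand` parameter are hypothetical; `rand` is there only so the random drop can be tested deterministically.

```typescript
// Hypothetical sketch of the saturation rule described above.
interface SaturationResult {
  penaltyMs: number; // extra queuing latency added to the node
  dropped: boolean;  // true if the request is dropped as an error
}

function applySaturation(rho: number, rand: () => number = Math.random): SaturationResult {
  // 25 ms of queuing penalty per unit of utilisation above 1.0.
  const penaltyMs = rho > 1.0 ? (rho - 1) * 25 : 0;
  // Beyond 2.5x saturation, 40% of requests are dropped (queue exhaustion).
  const dropped = rho > 2.5 && rand() < 0.4;
  return { penaltyMs, dropped };
}

console.log(applySaturation(2.0)); // { penaltyMs: 25, dropped: false }
```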
Nested Private Services
In a Project Workspace you can drop one private service inside another as a Microservice node. The engine does not replay a frozen benchmark snapshot for that node. Instead, before sampling begins, it simulates the nested service's entire internal architecture against the exact RPS it receives in the current run, then feeds the resulting latency, error rate, and saturation back into the parent simulation.
How a Microservice node is resolved
Because the nested error rate drives both the node and the request path, per-node and overall metrics always agree. A service that is healthy at 100 RPS but collapses at 100k RPS will show exactly that — its internal Server/DB tiers saturate under the real load instead of reporting infinite capacity. Recursion is depth-guarded (up to 4 levels) so deeply composed systems resolve without runaway loops.
Each Microservice node also exposes a Request Absorption Rate. When a service references another service internally, this rate is the probability that an inbound request cascades to the downstream service — modelling caching, early returns, or validation that "absorbs" traffic at the edge.
Example: if user-service sits at the project root and is also nested inside both payment-service and lifestyle-service with an absorption rate of 1.0, it accumulates traffic from all three paths — direct plus both cascades — rather than only its direct share.
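Traffic accumulation across nested references can be sketched as a depth-guarded walk. This is a minimal sketch under three assumptions:

- Each reference forwards parent RPS × absorption rate.
- Nesting stops after 4 levels.
- The graph shape, function names, and the 100/50/30 RPS figures are all hypothetical.

```typescript
// Each parent lists the services nested inside it, with the Request
// Absorption Rate on that reference (1.0 = every request cascades).
interface NestedRef {
  child: string;
  absorptionRate: number;
}

type ServiceGraph = Record<string, NestedRef[]>;

const MAX_DEPTH = 4; // references nested deeper than this are ignored

function accumulateTraffic(
  graph: ServiceGraph,
  directRps: Record<string, number>,
): Map<string, number> {
  const totals = new Map<string, number>();

  const visit = (service: string, rps: number, depth: number): void => {
    if (depth > MAX_DEPTH) return; // depth guard against runaway recursion
    totals.set(service, (totals.get(service) ?? 0) + rps);
    for (const ref of graph[service] ?? []) {
      visit(ref.child, rps * ref.absorptionRate, depth + 1);
    }
  };

  // Start from the traffic each service receives directly at the project root.
  for (const [service, rps] of Object.entries(directRps)) {
    visit(service, rps, 0);
  }
  return totals;
}

// The example from the text: user-service is hit directly and via two parents.
const graph: ServiceGraph = {
  "payment-service": [{ child: "user-service", absorptionRate: 1.0 }],
  "lifestyle-service": [{ child: "user-service", absorptionRate: 1.0 }],
};
const totals = accumulateTraffic(graph, {
  "user-service": 100,
  "payment-service": 50,
  "lifestyle-service": 30,
});
console.log(totals.get("user-service")); // 180 = 100 direct + 50 + 30 cascaded
```

With an absorption rate of 0.2 on both references, the same call would credit user-service with 100 + 10 + 6 RPS instead.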
Network Physics
Every hop between nodes pays a network cost before any component logic runs. The cost depends on whether the nodes share a VPC and region.
Server / VM
CPU performance degrades exponentially as effective RPS per core approaches a saturation threshold (~800 RPS/vCPU). Low RAM adds an additional multiplier from page-fault pressure.
Saturation threshold: 800 RPS per vCPU is the stable zone. Beyond it, context-switching overhead causes non-linear latency spikes.
RAM penalty: applies only when RAM < 1 GB, as penalty = 1.5 / ram. At 0.5 GB → 3× multiplier; at 0.25 GB → 6×.
| Parameter | Unit / Range | Description |
|---|---|---|
| cpu | vCPU count | Sets the per-core RPS ceiling. More cores = linear scaling until memory is the bottleneck. |
| ram | GB | Below 1 GB RAM triggers the page-fault penalty multiplier. |
| conns | max connections | Connection pool size. Controls saturation threshold via Little's Law. |
Container
Uses the same exponential CPU model as Server/VM. Containers typically run with fractional CPU shares and low RAM, so the RAM penalty is almost always active at default values (0.5 GB → 3× multiplier).
| Parameter | Unit / Range | Description |
|---|---|---|
| cpu | CPU shares (0.1–8) | Fractional CPU allocation. 0.5 = half a vCPU. |
| ram | GB (0.1–16) | Almost always below 1 GB — RAM penalty nearly always active. |
Serverless Function
Every invocation incurs a cold-start penalty. Higher memory allocation gives proportionally more CPU, so 1 GB functions start roughly 2.8× faster than 128 MB functions.
Cold start: 100–600 ms at 128 MB, scaling down inversely with the square root of memory allocation.
Timeout: if cumulative path latency exceeds timeout × 1000 ms, the request is dropped as a 504 error.
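A sketch of the cold-start and timeout rules. Two things here are assumptions, not confirmed engine details:

- The uniform draw over 100–600 ms.
- The names `coldStartMs` and `isTimedOut`.

The `rand` parameter is injectable only to make the example deterministic.

```typescript
// Cold start: 100–600 ms at 128 MB, divided by sqrt(memory / 128).
function coldStartMs(memoryMb: number, rand: () => number = Math.random): number {
  const baseAt128 = 100 + rand() * 500;        // assumed uniform 100–600 ms draw
  return baseAt128 / Math.sqrt(memoryMb / 128); // inverse square-root scaling
}

// A request is dropped as a 504 once cumulative path latency exceeds the timeout.
function isTimedOut(pathLatencyMs: number, timeoutSec: number): boolean {
  return pathLatencyMs > timeoutSec * 1000;
}

const midpoint = () => 0.5; // fixed draw: 350 ms at 128 MB
console.log(coldStartMs(128, midpoint));  // 350
console.log(coldStartMs(1024, midpoint)); // ≈ 123.7, i.e. ~2.83× faster
console.log(isTimedOut(3500, 3));         // true
```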
| Parameter | Unit / Range | Description |
|---|---|---|
| memory | MB (128–10240) | Higher memory = more CPU = faster cold starts and execution. |
| timeout | seconds (1–900) | Absolute ceiling on total path latency. Exceeded → dropped request. |
WebSocket Server
Long-lived connections have very low per-message overhead. The main constraint is the total concurrent connection ceiling — once saturated, new handshakes are rejected. Heartbeat adds 0.5 ms overhead only when heartbeatInterval > 0 (i.e. actively configured). Setting it to 0 disables ping/pong entirely.
Utilisation uses Little's Law directly: concurrent connections ≈ throughput × avg session (30 s). At 1 000 new connections/s with a 30 s session lifetime → 30 000 concurrent, on a 50 000-conn server → 60% load.
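The utilisation arithmetic above fits in a one-line function. The 30 s session default comes from the text; the function name is hypothetical.

```typescript
// Little's Law for WebSocket sessions: concurrent = new conns/s × session length.
function wsUtilisation(newConnsPerSec: number, maxConns: number, sessionSec = 30): number {
  const concurrent = newConnsPerSec * sessionSec;
  return concurrent / maxConns;
}

console.log(wsUtilisation(1000, 50000)); // 0.6, i.e. 60% load
```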
| Parameter | Unit / Range | Description |
|---|---|---|
| conns | max connections | Total simultaneous WebSocket clients. Saturation = dropped handshakes. |
| heartbeatInterval | seconds (0 = disabled) | Set to 0 to disable ping/pong. Any positive value adds 0.5 ms per-message overhead. |
Service Worker
Models a background worker pool: Node.js worker_threads, Python Celery, a Go goroutine pool, or a Java ExecutorService. Each worker processes one job at a time with a deterministic service time. The engine uses the M/D/1 Pollaczek–Khinchine (P-K) formula, which assumes Poisson arrivals and a deterministic (fixed) service time. It is a simple closed-form approximation for a fixed thread pool.
Memory clamp: if workerCount × memoryPerWorkerMb exceeds 16 GB (the assumed host size), the engine silently clamps effectiveWorkers to fit. At 64 MB/worker the ceiling is 256 workers; at 512 MB/worker only 32 workers run regardless of config.
| Parameter | Unit / Range | Description |
|---|---|---|
| workerCount | 1–1024 | Thread / process pool size. Directly sets the serviceRate ceiling. |
| jobDurationMs | ms (1–60 000) | Average job execution time. Lower = higher throughput per worker. Sets the D in M/D/1. |
| queueDepth | 1–100 000 | Max backlog before jobs are dropped. Drop probability scales with how far backlog exceeds this. |
| retryEnabled | toggle | On failure, 50% of jobs are retried once. Clears transient errors but adds one jobDuration of latency. |
| memoryPerWorkerMb | MB (16–8 192) | Heap per worker. Clamps effective worker count if total exceeds 16 GB host memory. |
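A sketch of the memory clamp plus an M/D/1 mean queue wait. It assumes the pool is aggregated into one server whose rate is effectiveWorkers × (1000 / jobDurationMs). A real multi-worker pool is closer to M/D/c, so treat this as an approximation rather than the engine's exact code. The function names are hypothetical.

```typescript
const HOST_MEMORY_MB = 16 * 1024; // assumed 16 GB host, per the table above

// Clamp the pool so workers × memory per worker fits on the host.
function effectiveWorkers(workerCount: number, memoryPerWorkerMb: number): number {
  return Math.min(workerCount, Math.floor(HOST_MEMORY_MB / memoryPerWorkerMb));
}

// Mean queue wait in ms under M/D/1, or Infinity when the pool is saturated.
function mdOneWaitMs(
  arrivalRps: number,
  workerCount: number,
  jobDurationMs: number,
  memoryPerWorkerMb: number,
): number {
  const workers = effectiveWorkers(workerCount, memoryPerWorkerMb);
  const serviceRate = workers * (1000 / jobDurationMs); // jobs/s across the pool
  const rho = arrivalRps / serviceRate;
  if (rho >= 1) return Infinity; // unstable: the queue grows without bound
  // Pollaczek–Khinchine for deterministic service: Wq = ρ / (2μ(1 − ρ)).
  const waitSec = rho / (2 * serviceRate * (1 - rho));
  return waitSec * 1000;
}

console.log(effectiveWorkers(1024, 512));   // 32 (memory-clamped)
console.log(mdOneWaitMs(400, 64, 100, 64)); // ρ = 0.625 → ≈ 1.30 ms
```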
Container Orchestrator (K8s / ECS)
Models a cluster of worker nodes managed by a control plane. Throughput capacity scales linearly with node count, CPU per node, and overcommit ratio. It simulates scheduling overhead and optional autoscaling latency.
| Parameter | Unit / Range | Description |
|---|---|---|
| nodeCount | nodes | Number of worker nodes in the cluster. |
| cpuPerNode | cores | Compute capacity per node. |
| overcommitRatio | multiplier | CPU over-provisioning factor. Higher = more capacity but higher saturation risk. |
| autoScaling | toggle | When enabled, high utilization triggers a 10% pod startup latency penalty. |
GraphQL Server
GraphQL introduces a parsing and resolution layer. Performance is sensitive to query depth and the number of underlying resolvers invoked per request.
Batching: collapses N+1 downstream requests into single batches, reducing resolver latency by 40%.
Query cache: on a cache hit (probability queryCacheRatio), resolution is skipped entirely and the response returns in a fixed 2 ms.
| Parameter | Unit / Range | Description |
|---|---|---|
| resolversPerQuery | count | Average number of data fetchers triggered per root query. |
| maxQueryDepth | layers | Complexity ceiling. Deeper queries add recursive overhead. |
| queryCacheRatio | 0–1 | Probability of a whole-query result being served from memory. |
Microservice (Nested Service)
A Microservice node represents another private service composed inside this one. It is a black box at this level — but not a static one. The engine recursively simulates the referenced service's full internal canvas at the effective RPS this node receives, so its latency, error rate, and saturation always reflect the real load. See Nested Private Services for the full model.
Request Absorption Rate: 1.0 = every request propagates; 0.2 = 80% is absorbed at the edge (cache hits, early returns, validation drops).
| Parameter | Unit / Range | Description |
|---|---|---|
| hitRatio | 0–1 (Request Absorption Rate) | Fraction of inbound traffic that cascades to internally-referenced services. Lower = more traffic absorbed before reaching downstream services. |
API Gateway
The gateway adds a fixed 2 ms routing overhead and enforces rate limits. Requests above the rate limit are immediately dropped and counted as errors.
if (rateLimit > 0 && effRPS > rateLimit) → DROP excess requests (error)
else nodeLat += 2ms
| Parameter | Unit / Range | Description |
|---|---|---|
| rateLimit | req/s (0 = disabled) | Hard ceiling on accepted RPS. Overflow → error rate increase. |
| protocol | http / https | HTTPS adds +3 ms TLS overhead via the global network physics layer. |
Load Balancer
The LB routes each request to exactly one downstream using a weighted random selection. The routing weight depends on the algorithm configured.
round-robin: each backend gets an equal share regardless of capacity.
weighted-*: if you set a weight on the connection edge, that takes precedence. Otherwise the target's CPU count is used automatically.
weighted least-connections: same weight logic; least-connections is approximated via capacity.
resource-based: combines CPU and RAM into a composite capacity weight.
ip-hash: equal probability per backend; hash affinity is approximated as a uniform distribution.
| Parameter | Unit / Range | Description |
|---|---|---|
| algorithm | round-robin / weighted-* / resource-based / ip-hash / random | Routing strategy. See the algorithm descriptions above. |
| conns | max connections (1 000–1 000 000) | Connection pool ceiling. Default 100 000. At high RPS, a low value causes saturation errors — real LBs handle hundreds of thousands of concurrent connections. |
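Weighted random selection, as the LB section describes it, can be sketched as follows. An explicit edge weight wins; otherwise the backend's CPU count is used. The `Backend` type, its field names, and `pickBackend` are hypothetical.

```typescript
interface Backend {
  name: string;
  cpu: number;
  edgeWeight?: number; // optional weight set on the connection edge
}

// Pick exactly one backend with probability proportional to its weight.
function pickBackend(backends: Backend[], rand: () => number = Math.random): Backend {
  const weights = backends.map((b) => b.edgeWeight ?? b.cpu);
  const total = weights.reduce((sum, w) => sum + w, 0);
  let r = rand() * total; // a point in [0, total)
  for (let i = 0; i < backends.length; i++) {
    r -= weights[i];
    if (r < 0) return backends[i];
  }
  return backends[backends.length - 1]; // guard against float rounding
}

const backends: Backend[] = [
  { name: "a", cpu: 2 },
  { name: "b", cpu: 6 },
];
console.log(pickBackend(backends).name); // "b" roughly 75% of the time
```

Round-robin and ip-hash, as described above, correspond to giving every backend the same weight.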
WAF / Firewall
Deep packet inspection adds latency proportional to the number of active rules. Shallow mode skips payload analysis.
rulesPenalty = log₁₀(rulesCount) × 2ms
nodeLat += (deep ? 8ms : 2ms) + rulesPenalty
Shallow mode: header-only inspection. Base +2 ms plus rules overhead. Faster, but misses payload-level attacks.
Deep mode: full packet reassembly and payload matching. Base +8 ms plus rules overhead.
Example: 100 active rules → log₁₀(100) × 2 = 4 ms. Deep mode total: 8 + 4 = 12 ms per hop. 5 000 rules → log₁₀(5000) × 2 ≈ 7.4 ms penalty.
| Parameter | Unit / Range | Description |
|---|---|---|
| rulesCount | 10–5000 | Active WAF rules. Logarithmic overhead (doubling rules does not double latency). |
| inspectionMode | shallow / deep | Shallow = +2 ms base. Deep = +8 ms base. Both add the rules penalty on top. |
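The WAF formula translates directly into code: a base cost by inspection mode plus log₁₀(rulesCount) × 2 ms. The function name is hypothetical.

```typescript
type InspectionMode = "shallow" | "deep";

// Per-hop WAF latency in ms.
function wafLatencyMs(rulesCount: number, mode: InspectionMode): number {
  const base = mode === "deep" ? 8 : 2;              // payload vs header-only inspection
  const rulesPenalty = Math.log10(rulesCount) * 2;   // logarithmic in rule count
  return base + rulesPenalty;
}

console.log(wafLatencyMs(100, "deep"));     // 12
console.log(wafLatencyMs(5000, "shallow")); // ≈ 9.4
```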
DNS Resolver
DNS resolution latency is modelled with a probabilistic cache-hit using a hyperbolic saturation curve. TTL=30 s gives a 50% hit rate; TTL=300 s gives ~91%; TTL=3 600 s gives ~99.2%. This reflects real resolver behaviour better than a linear cap — most of the caching benefit is captured in the first few minutes of TTL.
hit = rand() < ttl / (ttl + 30)
if (hit) nodeLat += 0.5ms (local cache)
else nodeLat += 5ms + rand(0–45ms) (upstream lookup)
TTL = 0: 5–50 ms per request (~27.5 ms average). No caching; every request hits the upstream resolver.
TTL = 30 s: ~14 ms average. Short TTL; half of all traffic hits the resolver.
Default TTL: ~5 ms average. Most traffic is served from cache.
| Parameter | Unit / Range | Description |
|---|---|---|
| ttl | seconds (0–86400) | Record time-to-live. Cache hit probability = ttl / (ttl + 30). TTL=0 forces every request upstream. |
| failoverEnabled | toggle | When enabled, a failed primary DNS target retries an alternate route (+5ms failover penalty). |
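The DNS model's hit probability and expected latency can be computed in closed form. This sketch assumes the `rand(0–45ms)` term is uniform, which makes the mean miss cost 27.5 ms. Function names are hypothetical.

```typescript
// Hyperbolic cache-hit curve: ttl / (ttl + 30).
function dnsHitProbability(ttlSec: number): number {
  return ttlSec / (ttlSec + 30);
}

// Expected latency: 0.5 ms on a hit, 5 ms + uniform(0–45 ms) on a miss.
function expectedDnsLatencyMs(ttlSec: number): number {
  const hit = dnsHitProbability(ttlSec);
  const missMeanMs = 5 + 45 / 2; // 27.5 ms under the uniform assumption
  return hit * 0.5 + (1 - hit) * missMeanMs;
}

console.log(dnsHitProbability(300));   // ≈ 0.909
console.log(expectedDnsLatencyMs(0));  // 27.5
console.log(expectedDnsLatencyMs(30)); // 14
```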
CDN
On every request the CDN rolls a cache-hit against the configured ratio. Hits are served from the nearest edge PoP (fast); misses add a 75 ms origin-pull penalty on top of the 5 ms edge base, giving ~80 ms total on miss.
nodeLat += 5ms (edge base)
if (rand() ≥ cacheHitRatio) nodeLat += 75ms (origin pull)
| Parameter | Unit / Range | Description |
|---|---|---|
| cacheHitRatio | 0–1 slider | Probability of serving from edge. 0.8 = 80% pay only 5 ms; 20% pay 80 ms. |
| ttl | seconds | Cache object lifetime. Not directly used in latency — informs what hit ratio to configure. |
External Service (SaaS / API)
Models 3rd-party dependencies (Twilio, Stripe, Auth0). These services have their own latency distributions, error rates, and throughput limits outside your direct control.
Circuit breaker: if the error rate exceeds a threshold, the circuit breaker opens and fails all requests immediately for a 30 s cooldown, protecting the rest of your system.
Latency distribution: bimodal. 90% of requests take the p50 latency; 10% take the p99 tail latency.
| Parameter | Unit / Range | Description |
|---|---|---|
| throughputRps | req/s | The SLA ceiling. Exceeding this triggers 100ms congestion latency or 30% random drops. |
| latencyP50Ms | ms | Median latency. Applied to 90% of requests. |
| latencyP99Ms | ms | Tail latency. Applied to 10% of requests to model network jitter or cold paths. |
| errorRatePct | % | Baseline probability of the external service returning an error (e.g. 5xx). |
SQL Database
SQL performance is bound by IOPS (disk I/O throughput). Partitioning doubles effective IOPS. RAM above 8 GB unlocks a 1.5× buffer-pool bonus (hot pages served from memory). Sharding multiplies total capacity linearly across shard nodes. The utilisation metric uses the same effective IOPS formula as latency — partitioning and RAM bonuses are reflected in both so the load gauge is always consistent.
nodeLat += (effRPS / effectiveIOPS) × 20ms
| Parameter | Unit / Range | Description |
|---|---|---|
| iops | 100–50 000 | Disk I/O operations/sec. Primary bottleneck metric for SQL. |
| ram | GB (1–256) | Buffer pool size. >8 GB unlocks 1.5× effective IOPS via hot-page caching. |
| conns | max connections | Connection pool. Default 100 — typical PostgreSQL/MySQL default. |
| partitioning | none / range / hash | Range/hash partitioning doubles effective IOPS and halves utilisation at the same RPS. |
| sharding | toggle | Enables horizontal sharding. Multiplies effective IOPS by shard count. |
| shardCount | 2–256 (shown when sharding on) | Number of shards. Each shard independently handles effRPS / shardCount queries. |
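The SQL multipliers compose into one effective-IOPS figure. Per the description, that figure feeds both latency and utilisation. The `SqlConfig` shape and the function names are hypothetical; the multipliers come from the table.

```typescript
interface SqlConfig {
  iops: number;
  ramGb: number;
  partitioning: "none" | "range" | "hash";
  sharding: boolean;
  shardCount: number;
}

function effectiveIops(c: SqlConfig): number {
  let eff = c.iops;
  if (c.partitioning !== "none") eff *= 2; // partitioning doubles IOPS
  if (c.ramGb > 8) eff *= 1.5;             // buffer-pool bonus above 8 GB
  if (c.sharding) eff *= c.shardCount;     // linear scaling across shards
  return eff;
}

function sqlLatencyMs(effRps: number, c: SqlConfig): number {
  return (effRps / effectiveIops(c)) * 20;
}

function sqlUtilisation(effRps: number, c: SqlConfig): number {
  return effRps / effectiveIops(c); // same effective IOPS as the latency formula
}

const cfg: SqlConfig = { iops: 3000, ramGb: 16, partitioning: "hash", sharding: false, shardCount: 2 };
console.log(effectiveIops(cfg));      // 9000
console.log(sqlLatencyMs(4500, cfg)); // 10
```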
NoSQL Database
NoSQL stores are optimized for high write throughput and horizontal scale. Sharding multiplies base throughput capacity by distributing data across nodes.
nodeLat += (effRPS / (baseCapacity × (ram > 8 ? 1.5 : 1))) × 10ms
| Parameter | Unit / Range | Description |
|---|---|---|
| ram | GB (1–256) | In-memory document cache. >8 GB gives 1.5× effective capacity. |
| sharding | toggle | Distributes data across multiple shard nodes. Multiplies base capacity by shard count. |
| shardCount | 2–256 (shown when sharding on) | Default 3. Each additional shard adds 2 000 RPS to total capacity. |
Elasticsearch
Search latency decreases with more cluster nodes (parallel shard execution). Low RAM per node causes JVM heap pressure and frequent GC pauses (+10 ms penalty). Utilisation is modelled against a capacity of 1 000 QPS per node — a conservative real-world estimate for mixed search/index workloads on moderate hardware.
| Parameter | Unit / Range | Description |
|---|---|---|
| nodes | 1–50 | Cluster size. Latency scales as 20/nodes ms, so doubling nodes halves base search time. Capacity = nodes × 1 000 QPS. |
| ram | GB per node | JVM heap. Below 8 GB adds 10 ms GC pressure penalty; above 8 GB adds only 2 ms overhead. |
Object Storage (S3 / GCS / Azure Blob)
Provider-agnostic object store model. Latency is built up in six steps:

1. Storage class retrieval tier.
2. Auth overhead (standard SigV4 vs. presigned).
3. Optional encryption overhead (SSE-S3, or a KMS round-trip).
4. Transfer Acceleration edge routing, which scales the latency accumulated so far by 0.6.
5. Per-prefix request-rate throttling.
6. Bandwidth saturation.

Steps 1–3, 5, and 6 add latency. The workload type (read vs. write) sets the per-prefix request-rate cap.
nodeLat = storageClassBase (standard 50ms / ia 100ms / glacier-instant 150ms)
nodeLat += presignedUrl ? 0 : 10ms (SigV4 auth)
nodeLat += encryption === 'kms' ? 8ms : encryption === 'sse-s3' ? 1ms : 0
if (transferAcceleration) nodeLat ×= 0.6
prefixCap = prefixCount × (write ? 3 500 : 5 500) req/s
prefixSat = effRPS / prefixCap
if (prefixSat > 1) nodeLat += (prefixSat − 1) × 200ms [+ error risk]
bwSat = (effRPS × 0.1 MB) / throughput_MBs
if (bwSat > 1) nodeLat += (bwSat − 1) × 100ms
| Parameter | Unit / Range | Description |
|---|---|---|
| throughput | MB/s (1–10 000) | Bandwidth cap. Assumes ~0.1 MB avg payload per request. |
| storageClass | standard / ia / glacier-instant | standard: 50 ms. Infrequent Access: 100 ms. Glacier Instant Retrieval: 150 ms. |
| presignedUrl | toggle | Off: +10 ms for server-side SigV4 validation. On: the URL is signed ahead of time, so the simulation charges 0 ms auth overhead. |
| prefixCount | 1–1 000 | Number of key prefixes. Each prefix handles 5 500 GET/s or 3 500 PUT/s before 503 SlowDown backoff fires. |
| encryption | none / sse-s3 / kms | SSE-S3: +1 ms (AES-256 inline). SSE-KMS: +8 ms (extra KMS GenerateDataKey API call). |
| transferAcceleration | toggle | Routes via nearest CDN edge PoP → private backbone. Multiplies effective latency by 0.6 (40% reduction). |
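The six object-storage steps can be written as one function. This is a sketch, not the engine's code:

- The `ObjectStoreConfig` shape and the function name are hypothetical.
- The error risk from prefix throttling is not modelled here.
- The 0.1 MB average payload comes from the table.

```typescript
interface ObjectStoreConfig {
  storageClass: "standard" | "ia" | "glacier-instant";
  presignedUrl: boolean;
  encryption: "none" | "sse-s3" | "kms";
  transferAcceleration: boolean;
  prefixCount: number;
  throughputMBs: number;
}

const CLASS_BASE_MS: Record<ObjectStoreConfig["storageClass"], number> = {
  standard: 50,
  ia: 100,
  "glacier-instant": 150,
};

function objectStoreLatencyMs(effRps: number, write: boolean, c: ObjectStoreConfig): number {
  let lat: number = CLASS_BASE_MS[c.storageClass];                       // 1. retrieval tier
  lat += c.presignedUrl ? 0 : 10;                                        // 2. SigV4 auth
  lat += c.encryption === "kms" ? 8 : c.encryption === "sse-s3" ? 1 : 0; // 3. encryption
  if (c.transferAcceleration) lat *= 0.6;                                // 4. edge routing
  const prefixCap = c.prefixCount * (write ? 3500 : 5500);               // 5. per-prefix cap
  const prefixSat = effRps / prefixCap;
  if (prefixSat > 1) lat += (prefixSat - 1) * 200;
  const bwSat = (effRps * 0.1) / c.throughputMBs;                        // 6. bandwidth
  if (bwSat > 1) lat += (bwSat - 1) * 100;
  return lat;
}

const bucket: ObjectStoreConfig = {
  storageClass: "standard", presignedUrl: false, encryption: "none",
  transferAcceleration: false, prefixCount: 1, throughputMBs: 1000,
};
console.log(objectStoreLatencyMs(1000, false, bucket)); // 60
console.log(objectStoreLatencyMs(7000, true, bucket));  // 260 (prefix throttled at 2×)
```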
Redis
Redis is single-threaded and memory-bound. Its compute cost is negligible (0.5 ms) on top of the network round-trip. Enable sharding (Redis Cluster) to scale throughput linearly across shard nodes — each shard independently handles its key-space slice at the same 50 000 ops/s per GB rate.
cap = 50 000 × memory × (sharding ? shardCount : 1)
util = effRPS / cap
nodeLat += 0.5ms + (util > 0.8 ? (util − 0.8) × 10ms : 0)
| Parameter | Unit / Range | Description |
|---|---|---|
| memory | GB (0.1–64) | Determines ops/s capacity: 50 000 × memory GB per shard. |
| conns | max connections | Concurrent client connections. Feeds the global saturation check. |
| cacheHitRatio | 0–1 slider | Used when Redis is the target of a caching connection (isCaching flag). |
| sharding | toggle | Enables Redis Cluster mode. Throughput scales linearly with shard count. |
| shardCount | 2–256 (shown when sharding on) | Default 3 (matching Redis Cluster minimum). Each shard adds full memory × 50k capacity. |
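A sketch of the Redis capacity and latency formulas above: 50 000 ops/s per GB per shard, 0.5 ms compute, and a linear penalty once utilisation passes 80%. Function names are hypothetical.

```typescript
function redisCapacity(memoryGb: number, sharding: boolean, shardCount: number): number {
  return 50000 * memoryGb * (sharding ? shardCount : 1);
}

function redisLatencyMs(effRps: number, cap: number): number {
  const util = effRps / cap;
  // 0.5 ms compute, plus 10 ms per unit of utilisation above 0.8.
  return 0.5 + (util > 0.8 ? (util - 0.8) * 10 : 0);
}

const cap = redisCapacity(2, true, 3);     // 300 000 ops/s
console.log(redisLatencyMs(270000, cap));  // util 0.9 → ≈ 1.5 ms
```

The Memcached node follows the same shape, with 0.3 ms compute and capacity = memory × threads × 40 000 ops/s.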
Memcached
Memcached is a distributed, multi-threaded, volatile cache. Unlike Redis it has no persistence or complex data structures — just fast key/value storage. Its key advantage is multi-threading: throughput scales with both memory allocation and thread count. Sharding adds further horizontal capacity by distributing keys across independent Memcached nodes (client-side consistent hashing).
cap = memory × threads × 40 000 × (sharding ? shardCount : 1)
util = effRPS / cap
nodeLat += 0.3ms + (util > 0.8 ? (util − 0.8) × 10ms : 0)
| Parameter | Unit / Range | Description |
|---|---|---|
| memory | GB (0.1–64) | Allocated RAM. Capacity = memory × threads × 40 000 ops/s. |
| threads | 1–64 | Worker threads. Default 4. Each thread independently processes requests. |
| maxItemSize | MB (0.1–128) | Maximum cacheable object size. Default 1 MB (Memcached default slab ceiling). |
| cacheHitRatio | 0–1 slider | Used when Memcached is the target of a caching connection (isCaching flag). |
| sharding | toggle | Client-side consistent hashing across multiple Memcached nodes. Multiplies total capacity by shard count. |
| shardCount | 2–256 (shown when sharding on) | Default 2. Each node runs independently — no cross-node replication. |
Time-Series DB
Optimized for append-only streaming writes and window-based reads. It models heavy replication and background downsampling (merging old data points).
| Parameter | Unit / Range | Description |
|---|---|---|
| writeRateKps | k req/s | Ingestion ceiling. Shared across nodes before sharding. |
| downsamplingEnabled | toggle | Reduces read-path latency by using pre-aggregated data windows. |
Graph Database
Relies on pointer-following for deep relationship traversals. Latency grows non-linearly with traversal depth (hops) and average node degree (connections per node).
Cross-shard coordination: adds a 50 ms sync penalty to model global lock acquisition or ACID coordination across shards.
Traversal depth: 3 hops is the baseline; 5 hops causes exponential blowup. High-degree nodes (20+ edges) saturate CPUs rapidly.
| Parameter | Unit / Range | Description |
|---|---|---|
| avgDegree | edges | Density of the graph. More edges = more data fetched per hop. |
| traversalDepth | hops | Search radius. Latency = degree^(depth/2). |
Key-Value Store
Models simple O(1) lookups (DynamoDB, etcd, Consul). Highly parallel, deterministic performance intended for transient configuration or distributed locking.
| Parameter | Unit / Range | Description |
|---|---|---|
| consistencyLevel | eventual / strong | Strong consistency (like etcd/Raft) adds a 2ms consensus penalty. |
| evictionPolicy | lru / lfu / ttl | Behavior when RAM is full. Affects utilization metrics only. |
Blob Storage
Optimized for large object storage (images, MP4, backups). Unlike the general Object Storage node, it specifically models bandwidth-bound transfer latency with CDN bypass, multipart upload, replication penalties, and storage-class-aware first-byte cost.
| Parameter | Unit / Range | Description |
|---|---|---|
| maxThroughputMbps | Mbps | Available network bandwidth. 1000 Mbps = 1 Gbps. |
Data Warehouse (Snowflake / BigQuery)
OLAP store for multi-terabyte queries. Columnar storage and massive query parallelism reduce scan times for "Very Complex" analytics jobs.
Columnar storage: enabled by default. Optimizes wide-table aggregation by reading only the necessary columns from disk.
Parallelism: total workers assigned to one job. A parallelismDegree of 64 divides the compute cost by 64.
| Parameter | Unit / Range | Description |
|---|---|---|
| parallelismDegree | threads | Compute units dedicated to one query execution. |
| avgQueryComplexity | select | Simple (1s) to Very Complex (60s) base costs. |
Kafka / Event Stream
Kafka latency is shaped by partition count (parallelism) and broker count (replication overhead). More partitions reduce per-partition queue depth via log₂ bonus; more brokers add a small ISR sync penalty. Latency is floored at 1 ms — even a massively over-provisioned cluster has serialization overhead.
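partitionBonus = log₂(partitions) × 2ms
brokerPenalty = brokers × 0.5ms
nodeLat += max(1ms, 10ms − partitionBonus + brokerPenalty)
10 partitions, 3 brokers: 10 − (log₂(10)×2) + (3×0.5) = 10 − 6.64 + 1.5 ≈ 4.9 ms
100 partitions, 5 brokers: max(1, 10 − (log₂(100)×2) + (5×0.5)) = max(1, −0.8) = 1 ms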
| Parameter | Unit / Range | Description |
|---|---|---|
| brokers | 1–100 | Broker count. Each broker adds 0.5 ms ISR replication overhead. |
| partitions | 1–1000 | Partition count. Each doubling subtracts a fixed 2 ms of per-message queuing (the log₂ bonus). |
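The Kafka formula and both worked examples can be checked in a few lines. The function name is hypothetical.

```typescript
// 10 ms base, minus 2 ms per doubling of partitions, plus 0.5 ms per broker,
// floored at 1 ms of serialization overhead.
function kafkaLatencyMs(partitions: number, brokers: number): number {
  const partitionBonus = Math.log2(partitions) * 2;
  const brokerPenalty = brokers * 0.5;
  return Math.max(1, 10 - partitionBonus + brokerPenalty);
}

console.log(kafkaLatencyMs(10, 3).toFixed(1)); // "4.9"
console.log(kafkaLatencyMs(100, 5));           // 1 (floored)
```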
Message Queue (SQS)
FIFO queues guarantee exactly-once ordered delivery at the cost of extra coordination overhead. Standard queues trade ordering guarantees for lower latency.
| Parameter | Unit / Range | Description |
|---|---|---|
| type | standard / fifo | Standard: +5 ms. FIFO: +15 ms (ordering/dedup overhead). |
| visibilityTimeout | seconds | How long a message is invisible after being consumed. Does not affect request-path latency. |
Pub / Sub
Fan-out adds a per-subscriber delivery scheduling cost to publish latency. 0.5 ms per subscriber reflects the broker's cost of dispatching to each downstream delivery agent.
| Parameter | Unit / Range | Description |
|---|---|---|
| fanoutCount | 1–100 | Number of downstream subscribers. Each adds 0.5 ms fan-out scheduling overhead. |
Mail Server
SMTP is inherently high-latency (TCP handshake + EHLO + DATA phase + server ACK: 100–250 ms baseline). Connection-pool queuing uses a quadratic penalty, loosely modelled on M/M/c behaviour: latency grows sharply once utilisation exceeds 1.0. Above 1.5× saturation, 40% of connections are dropped as errors.
baseLat = 100ms + rand(0–150ms)
if (util > 1) baseLat += (util − 1)² × 500ms
if (util > 1.5 && rand() < 0.4) → DROP (error)
| Parameter | Unit / Range | Description |
|---|---|---|
| concurrentConns | 1–500 | Maximum simultaneous SMTP sessions. Queuing diverges above 100% utilisation; errors above 150%. |
| dailyLimit | messages/day | Gradual throttling kicks in at 10× fair-share per second. At 100× fair-share, 50% drop probability. |
| protocol | smtp / imap / pop3 | Protocol label for documentation. Overhead is modelled identically for all three in the simulation. |
RabbitMQ
An AMQP broker implementing smart-routing via exchanges. Performance depends on persistence settings, acknowledgment modes, and the total consumer pool size across all queues.
| Parameter | Unit / Range | Description |
|---|---|---|
| consumerCount | count | Total workers pulling from the queue. Multiplies throughput capacity. |
| acknowledgmentMode | auto / manual | Manual ACK adds 3ms per-message coordination latency. |
| persistenceEnabled | toggle | Durable queues add 5ms disk-sync latency to the publish path. |
Scheduler (Cron / Job Runner)
Models a background task scheduler. Unlike the Service Worker (which focuses on thread pooling), the Scheduler focuses on long-running jobs and concurrent execution limits. It follows a variation of Little's Law for queueing wait times.
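Queue wait: above 1.0 saturation, each unit of overflow adds 500 ms of wait. For example, at 2.0× saturation: (2.0 − 1) × 500 = 500 ms extra wait.
| Parameter | Unit / Range | Description |
|---|---|---|
| maxConcurrentJobs | count | Maximum simultaneous tasks. Throughput ceiling = concurrency / duration. |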
| avgJobDurationMs | ms | Mean execution time per task. Determines how quickly concurrency slots are freed. |
| cronResolutionSec | sec | The granularity of the clock (e.g. 60s for standard crontab). Informational. |
LLM / AI Model
LLM latency is dominated by token generation speed. The engine estimates output tokens as 10% of the context window size and converts generation time to milliseconds.
nodeLat += (outputTokens / tokensPerSec) × 1000ms
| Parameter | Unit / Range | Description |
|---|---|---|
| contextWindow | k tokens (4–1024) | Max input context. Proxy for output length (10% assumed as completion). |
| tokensPerSec | t/s (1–200) | Generation throughput. Primary bottleneck for LLM latency. |
| temperature | 0–2 slider | Sampling randomness. Does not affect latency in simulation. |
Vector Database
ANN search latency grows with vector dimensionality. The index type determines the search algorithm: HNSW (graph-based) is fastest; Flat (brute-force) is slowest.
dimPenalty = dimensions / 512
indexBonus = hnsw ? 0.5 : flat ? 5 : 1 (ivf)
nodeLat += 5ms × dimPenalty × indexBonus
| Parameter | Unit / Range | Description |
|---|---|---|
| dimensions | 64–3072 | Embedding vector size. Latency scales linearly with dimensions/512. |
| indexType | hnsw / ivf / flat | HNSW: 0.5× penalty. IVF: 1× (baseline). Flat: 5× (brute-force scan). |
| similarity | cosine / l2 / dot | Distance metric. Not differentiated in latency; informational only. |
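The vector-search formula translates directly into code: 5 ms × (dimensions / 512) × the index multiplier. The function name is hypothetical; the multiplier keeps the source's name `indexBonus` even though it acts as a penalty.

```typescript
type IndexType = "hnsw" | "ivf" | "flat";

function vectorLatencyMs(dimensions: number, indexType: IndexType): number {
  const dimPenalty = dimensions / 512;
  const indexBonus = indexType === "hnsw" ? 0.5 : indexType === "flat" ? 5 : 1;
  return 5 * dimPenalty * indexBonus;
}

console.log(vectorLatencyMs(1536, "hnsw")); // 7.5
console.log(vectorLatencyMs(1536, "flat")); // 75
```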
Embedding API
Embedding generation is a fixed-cost operation whose per-item cost is amortized by batching. Larger batches reduce effective per-request latency.
batchSize = 1: full 50 ms per request (no batch savings).
batchSize = 128: 50 / log₂(128) = 50 / 7 ≈ 7.1 ms per item.
| Parameter | Unit / Range | Description |
|---|---|---|
| latencyMs | ms (5–500) | Baseline embedding call latency at batch size 1. |
| batchSize | 1–2048 | Items per API call. Higher batch → logarithmically lower per-item latency. |
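The batching rule divides the baseline latency by log₂(batchSize). Since log₂(1) = 0, this sketch adds a max(1, …) guard so batch size 1 returns the full latency. That guard is an assumption about how the edge case is handled, and the function name is hypothetical.

```typescript
// Per-item latency amortised by batching: latencyMs / log₂(batchSize),
// guarded so batches of 1 (log₂ 1 = 0) pay the full latency.
function perItemEmbeddingMs(latencyMs: number, batchSize: number): number {
  return latencyMs / Math.max(1, Math.log2(batchSize));
}

console.log(perItemEmbeddingMs(50, 1));              // 50
console.log(perItemEmbeddingMs(50, 128).toFixed(1)); // "7.1"
```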
AI Agent / Orchestrator
Agentic workflows call the LLM multiple times per user request. The orchestrator models only its own routing overhead — each downstream LLM call accrues its own latency separately.
nodeLat += 10ms × avgSteps × strategyPenalty
| Parameter | Unit / Range | Description |
|---|---|---|
| multiStepFactor | 1–20 | Average number of LLM tool-call iterations per user request. |
| strategy | chain / reflex / tree | Reasoning strategy. Multiplies step overhead: chain×1, reflex×2, tree×5. |
Document Parser / ETL
Document parsing latency grows with chunk size (more data per unit) and shrinks with parallelism.
nodeLat += (25ms × chunkPenalty) / parallelJobs
| Parameter | Unit / Range | Description |
|---|---|---|
| chunkSizeBytes | 128–8192 | Size of each processed chunk. Larger chunks = more work per task = higher latency. |
| parallelJobs | 1–100 | Concurrent workers. Per-request latency is divided by parallelJobs. |
Workflow Engine
Workflow engines (Temporal, n8n, Airflow) persist state at every step, adding a fixed 10 ms checkpoint overhead. Queuing latency activates above 80% concurrency saturation and grows linearly with the overflow portion, a simplified stand-in for M/D/1 queuing (fixed service time, variable arrivals). The penalty is (sat − 0.8) × 25 ms, so only the fraction above 80% is penalised, not the full saturation value.
execSaturation = effRPS / concurrency
if (execSaturation > 0.8) nodeLat += (execSaturation − 0.8) × 25ms
| Parameter | Unit / Range | Description |
|---|---|---|
| concurrency | 10–5000 | Maximum simultaneous workflow executions. Above 80% triggers overflow queuing penalty. |
| timeout | seconds (1–3600) | Per-step execution ceiling. Long steps may cascade into downstream timeouts. |
| retryPolicy | toggle | Auto-retry on step failure adds 50 ms retry overhead but clears transient errors. |
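The workflow-engine rule is sketched below: a fixed 10 ms checkpoint, plus 25 ms per unit of saturation above 0.8. Retries and timeouts are left out. The function name is hypothetical.

```typescript
function workflowLatencyMs(effRps: number, concurrency: number): number {
  const execSaturation = effRps / concurrency;
  let lat = 10; // fixed state-checkpoint overhead
  // Only the overflow above 80% is penalised, not the full saturation value.
  if (execSaturation > 0.8) lat += (execSaturation - 0.8) * 25;
  return lat;
}

console.log(workflowLatencyMs(400, 1000));  // 10 (below 80%)
console.log(workflowLatencyMs(1200, 1000)); // ≈ 20 (0.4 overflow → +10 ms)
```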