# Expressions
Query arguments are written as expressions compiled at startup using expr-lang/expr. Each expression has access to the built-in functions, globals, and any user-defined expressions.
Tip: Use `edg repl` to try any expression interactively without a database connection. See REPL for details.
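Expressions usually appear in a query's `args:` list, one expression per bind parameter. A minimal sketch (query, table, and column names are invented for illustration):

```yaml
run:
  - name: insert_user
    type: exec
    args:
      - uuid_v4()              # $1: random UUID
      - gen('email')           # $2: random email via gofakeit
      - norm(35, 10, 18, 90)   # $3: age from a normal distribution, clamped to [18, 90]
    query: INSERT INTO users (id, email, age) VALUES ($1, $2, $3)
```

Args are evaluated in order, so later args can reference earlier ones via `arg(index)`.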
## Functions
These are edg’s built-in functions, available in any expression context (`args:`, `expressions:`, `globals`). They generate data, reference datasets, aggregate values, and control execution flow.
| Function | Returns | Description |
|---|---|---|
| `__sep__` | string | Driver-aware batch field separator. A query-text token that is replaced with the SQL function producing the ASCII unit separator character (char 31) used to delimit values within batch-expanded placeholders. Resolves to `chr(31)` for pgx, `CHAR(31)` for MySQL and MSSQL, `codepoints-to-string(31)` for Oracle, `CODE_POINTS_TO_STRING([31])` for Spanner. Can be used in any argument position within SQL. Always use `__sep__` instead of a literal comma: generated values may contain commas, which would silently corrupt your data. `string_to_array('$1', __sep__)` |
| `abs(x)` | float64 | Absolute value of x. `abs(-5.0)` -> `5` |
| `acos(x)` | float64 | Arc cosine of x (result in radians). `acos(1.0)` -> `0` |
| `arg(index)` | any | Returns the value of a previously evaluated arg by its zero-based index or name. Enables dependent columns where later args reference earlier ones. `arg(0)` -> `"Alice"`; `arg('email')` -> `"alice@example.com"` (with named args) |
| `array(minN, maxN, pattern)` | string | PostgreSQL/CockroachDB array literal with a random number of elements. `array(2, 4, 'email')` -> `{a@b.com,c@d.com,d@e.com}` |
| `asin(x)` | float64 | Arc sine of x (result in radians). `asin(1.0)` -> `1.5707...` |
| `atan(x)` | float64 | Arc tangent of x (result in radians). `atan(1.0)` -> `0.7853...` |
| `atan2(y, x)` | float64 | Two-argument arc tangent of y/x (result in radians). Handles quadrants correctly. `atan2(1.0, 1.0)` -> `0.7853...` |
| `avg(name, field)` | float64 | Average of a numeric field across all rows in a named dataset. `avg('fetch_products', 'price')` -> `19.39` |
| `batch(n)` | [][]any | Returns sequential integers [0, n) as batch arg sets. `batch(3)` -> `[[0], [1], [2]]` |
| `bit(n)` | string | Random fixed-length bit string of exactly n bits. `bit(8)` -> `10110011` |
| `blob(n)` | []byte | Random n bytes as raw binary data. Works across all databases (PostgreSQL, MySQL, Oracle, MSSQL) via bind parameters. Use this for BLOB, BYTEA, VARBINARY, and RAW columns. `blob(1024)` -> (1024 random bytes) |
| `bool()` | bool | Random true or false. Useful as a coin flip with `cond()` and `arg()` for mutually exclusive columns. `bool()` -> `true` |
| `bytes(n)` | string | Random n bytes as a hex-encoded string with `\x` prefix. PostgreSQL/CockroachDB only. For cross-database binary data, use `blob(n)` instead. `bytes(4)` -> `\x1a2b3c4d` |
| `ceil(x)` | float64 | Smallest integer greater than or equal to x. `ceil(3.2)` -> `4` |
| `coalesce(v1, v2, ...)` | any | Returns the first non-nil value from its arguments. `coalesce(nil, 'default')` -> `default` |
| `complete_array(tool, prompt, count)` | []map | Generates N structured items in a single LLM call. The tool schema is automatically wrapped in an array request. Returns []map for use with `ref_each()`. Memoized by (tool, prompt, count). Requires `--complete-api-key` or `EDG_COMPLETE_API_KEY`. See Complete. `ref_each(complete_array("review", "Generate 5 reviews", 5)).review_text` -> `"Great product!"` |
| `complete(tool, prompt)` | map | Calls an LLM with a named tool schema and returns structured data as a map. Access fields with dot notation. Per-row memoization ensures multiple field accesses with the same tool and prompt make only one API call. Requires `--complete-api-key` or `EDG_COMPLETE_API_KEY`. See Complete. `complete("review", "Review: Widget").review_text` -> `"Great product!"`; `complete("review", "Review: Widget").rating` -> `4` |
| `cond(predicate, trueVal, falseVal)` | any | Returns trueVal if predicate is true, falseVal otherwise. `cond(true, 'yes', 'no')` -> `yes` |
| `const(value)` | any | Returns the value as-is. Useful for literal constants. `const(42)` -> `42` |
| `cos(x)` | float64 | Cosine of x (x in radians). `cos(0.0)` -> `1` |
| `count(name)` | int | Number of rows in a named dataset. `count('fetch_products')` -> `5` |
| `date_offset(duration)` | string | Returns the current time offset by duration, formatted as RFC3339. `date_offset('-72h')` -> `2026-04-08T10:00:00Z` |
| `date(format, min, max)` | string | Random timestamp formatted using a Go time format string. `date('2006-01-02', '2020-01-01T00:00:00Z', '2025-01-01T00:00:00Z')` -> `2023-07-15` |
| `distinct(name, field)` | int | Number of distinct values for a field in a named dataset. `distinct('fetch_products', 'category')` -> `3` |
| `duration(min, max)` | string | Random duration between min and max (Go duration strings). `duration('1h', '24h')` -> `14h32m17s` |
| `embed(text...)` | string | Calls an external embedding API (OpenAI-compatible) and returns a vector literal. Variadic: multiple args are joined with a space. Requires `--embed-api-key` or `EDG_EMBED_API_KEY`. See Embed. `embed('hello world')` -> `[0.0123,-0.0456,...]`; `embed(field('name'), field('description'))` -> `[0.0789,...]` |
| `env_nil(name)` | any | Returns the value of an environment variable as a string, or nil if unset. Unlike `env()`, does not error on missing variables. Designed for use with `coalesce()` to provide defaults: `int(coalesce(env_nil('PORT'), 8080))`. Always returns a string when the variable exists, so wrap with `int()` or `float()` when arithmetic is needed. `env_nil('MISSING')` -> `nil`; `env_nil('HOST')` -> `localhost` |
| `env(name)` | string | Returns the value of the named environment variable, or errors if no variable with that name exists. Missing variables are caught at config load time, before any queries run. Can be composed with other functions, e.g. `upper(env('HOST'))`. For numeric values, use expr-lang conversion: `int(env('PORT'))`, `float(env('RATE'))`. `env('API_KEY')` -> `ca3864628a8f29d644e1...` |
| `exp_f(rate, min, max, precision)` | float64 | Exponentially-distributed random number in [min, max], rounded to precision decimal places. `exp_f(0.5, 0, 100, 2)` -> `3.72` |
| `exp(rate, min, max)` | float64 | Exponentially-distributed random number in [min, max], rounded to 0 decimal places. `exp(0.5, 0, 100)` -> `4` |
| `expr(expression)` | any | Evaluates an arithmetic expression. Alias for `const`; the expr engine handles the arithmetic. `expr(2 + 3)` -> `5` |
| `fail(message)` | error | Returns an error that stops the current worker gracefully. Useful with `??` to catch unexpected values: `{'a': 1}['x'] ?? fail('unknown key')`. `fail('unexpected region')` -> (worker stops with error) |
| `fatal(message)` | void | Terminates the entire process immediately. Use when an unexpected value should halt all workers, not just the current one. `fatal('missing required config')` -> (process exits) |
| `field(name)` | any | Evaluates a named field from the current query’s `object:` object. Requires `object:` to be set on the query. Use in args to cherry-pick fields or control ordering. `field('email')` -> `alice@example.com` |
| `floor(x)` | float64 | Largest integer less than or equal to x. `floor(3.7)` -> `3` |
| `gen_batch(total, batchSize, pattern)` | [][]any | Generates total values using a gofakeit pattern, grouped into batches of batchSize. Each batch arg is a string of generated values delimited by the ASCII unit separator (char 31, `\x1f`). `gen_batch(4, 2, 'firstname')` -> `[["Alice\x1fBob"], ["Carol\x1fDave"]]` |
| `gen(pattern)` | string | Generates a random value using gofakeit patterns (e.g. `gen('number:1,100')`). `gen('number:1,10')` -> `7` |
| `global_iter()` | int64 | Monotonic iteration counter shared across all workers in a stage. Increments by 1 each time any worker calls RunIteration. Never resets. Use for time-series seasonality and data drift patterns. `20.0 + 5.0 * sin(2.0 * pi * global_iter() / 1000)` -> `22.93...` |
| `global(name)` | any | Looks up a value from the globals section by name. Globals are also available directly as variables, so `global('warehouses')` and `warehouses` are equivalent. `global('warehouses')` -> `10` |
| `inet(cidr)` | string | Random IP address within the given CIDR block. `inet('192.168.1.0/24')` -> `192.168.1.42` |
| `iter()` | int | 1-based row counter for exec_batch / query_batch queries. Returns 1 for the first row, 2 for the second, etc. Resets at the start of each batch query. Useful for generating sequential IDs without a global sequence. `iter()` -> `1` |
| `json_arr(minN, maxN, pattern)` | string | Builds a JSON array of N random values (N in [minN, maxN]) generated by a gofakeit pattern. `json_arr(1, 3, 'word')` -> `["foo","bar"]` |
| `json_obj(k1, v1, k2, v2, ...)` | string | Builds a JSON object string from key-value pair arguments. `json_obj('key', 'val')` -> `{"key":"val"}` |
| `local(name)` | any | Returns the value of a named local variable. Locals can be defined on individual queries or transactions; query-level locals override transaction locals when both exist. Locals are re-evaluated per row in batch mode. Useful for calling `complete()` once and accessing multiple fields. `local("review").review_text` -> `"Great product!"` |
| `log(x)` | float64 | Natural logarithm of x. `log(1.0)` -> `0` |
| `log10(x)` | float64 | Base-10 logarithm of x. `log10(100.0)` -> `2` |
| `lognorm_f(mu, sigma, min, max, precision)` | float64 | Log-normally-distributed random number in [min, max], rounded to precision decimal places. `lognorm_f(1.0, 0.5, 1, 1000, 2)` -> `3.42` |
| `lognorm(mu, sigma, min, max)` | float64 | Log-normally-distributed random number in [min, max], rounded to 0 decimal places. `lognorm(1.0, 0.5, 1, 1000)` -> `3` |
| `max(name, field)` | float64 | Maximum value of a numeric field in a named dataset. `max('fetch_products', 'price')` -> `49.99` |
| `min(name, field)` | float64 | Minimum value of a numeric field in a named dataset. `min('fetch_products', 'price')` -> `1.99` |
| `mod(x, y)` | float64 | Floating-point remainder of x/y. `mod(10.0, 3.0)` -> `1` |
| `norm_f(mean, stddev, min, max, precision)` | float64 | Normally-distributed random number in [min, max], rounded to precision decimal places. `norm_f(50.0, 15.0, 1.0, 100.0, 2)` -> `52.37` |
| `norm_n(mean, stddev, min, max, minN, maxN)` | string | N unique normally-distributed values (N in [minN, maxN]) as a comma-separated string. `norm_n(50.0, 10.0, 1, 100, 2, 4)` -> `47,53,61` |
| `norm(mean, stddev, min, max)` | float64 | Normally-distributed random number in [min, max], rounded to 0 decimal places. `norm(4, 1, 1, 5)` -> `4` |
| `null` | nil | Null literal. Alias for nil, for users more familiar with SQL/JSON terminology. Not a function; use as a bare variable. `const(null)` -> `NULL` |
| `nullable(expr, probability)` | any | Returns NULL with the given probability (0.0–1.0), otherwise returns the expression result. `nullable(gen('email'), 0.3)` -> `NULL` |
| `nurand_n(A, x, y, min, max)` | string | Generates N unique NURand values (N in [min, max]) as a comma-separated string. `nurand_n(255, 1, 100, 3, 5)` -> `42,87,13,61` |
| `nurand(A, x, y)` | int | TPC-C Non-Uniform Random: `(((random(0,A) \| random(x,y)) + C) % (y-x+1)) + x`. `nurand(255, 1, 100)` -> `42` |
| `obj(name, field)` | any | Evaluates only the named field from an object, avoiding the cost of evaluating all fields. `obj('order', 'product')` -> `Widget` |
| `obj(name)` | map | Evaluates all field expressions for a named object defined in the objects section and returns them as a map. Access individual fields with dot notation. `obj('order').product` -> `Widget` |
| `pi` | float64 | The mathematical constant pi (3.14159…). Not a function; use as a bare variable. `2 * pi` -> `6.28318...` |
| `point_wkt(lat, lon, radiusKM)` | string | Generates a random geographic point as a WKT string: `POINT(lon lat)`. `point_wkt(51.5, -0.1, 10.0)` -> `POINT(-0.082 51.513)` |
| `point(lat, lon, radiusKM)` | map | Generates a random geographic point within radiusKM of (lat, lon). Access fields with `.lat` and `.lon`. `point(51.5, -0.1, 10.0).lat` -> `51.513` |
| `polygon_wkt(lat, lon, minKM, maxKM, points)` | string | Generates a jagged polygon with points vertices around (lat, lon), each at a random distance between minKM and maxKM. Returns a WKT POLYGON string. The ring is closed (first vertex repeated at the end). `polygon_wkt(51.1, -0.4, 5, 15, 6)` -> `POLYGON((-0.33 51.18, ...))` |
| `polygon(lat, lon, minKM, maxKM, points)` | []map | Same as `polygon_wkt`, but returns a slice of maps with `.lat` and `.lon` fields. The ring is closed (first vertex repeated at the end). Requires points >= 3. `polygon(51.1, -0.4, 5, 15, 6)[0].lat` -> `51.18` |
| `pow(x, y)` | float64 | x raised to the power y. `pow(2.0, 10.0)` -> `1024` |
| `ref_diff(name)` | map | Returns unique rows across multiple calls within the same query execution. Uses a swap-based index to avoid repeats. `ref_diff('products').name` -> `Widget` |
| `ref_each(query_or_dataset)` | [][]any or map | When given a SQL query string, executes it and returns all rows; each row becomes a separate arg set. When given a named reference dataset (unquoted), iterates sequentially through each row with same-row caching (like `ref_same`). `ref_each('SELECT id FROM t')` -> `[[1], [2], [3]]`; `ref_each(product_catalog).name` -> `Widget` |
| `ref_exp(name, rate)` | map | Returns a random row from a named dataset using an exponential distribution. Lower indices are selected more frequently; rate controls decay speed. `ref_exp('products', 1.5).name` -> `Widget` |
| `ref_lognorm(name, mu, sigma)` | map | Returns a random row from a named dataset using a log-normal distribution. Creates a right-skewed access pattern where early rows are favored. `ref_lognorm('products', 0.0, 0.5).name` -> `Widget` |
| `ref_n(name, field, min, max)` | string | Picks N unique random rows (N in [min, max]) from a named dataset, extracts field from each, and returns a comma-separated string. `ref_n('products', 'name', 2, 3)` -> `Widget,Gadget` |
| `ref_norm(name, mean, stddev)` | map | Returns a random row from a named dataset using a normal distribution. mean and stddev are expressed as fractions of the dataset length (e.g. 0.5 = middle, 0.2 = narrow spread). `ref_norm('products', 0.5, 0.2).name` -> `Gadget` |
| `ref_perm(name)` | map | Returns a random row on first call, then the same row for the entire lifetime of the worker. `ref_perm('products').name` -> `Widget` |
| `ref_rand(name)` | map | Returns a random row from a named dataset (populated by an init query). Access fields with dot notation: `ref_rand('fetch_warehouses').w_id`. `ref_rand('products').name` -> `Gadget` |
| `ref_same(name)` | map | Returns a random row, but the same row is reused across all `ref_same` calls within a single query execution. Cleared between iterations. `ref_same('products').name` -> `Widget` |
| `ref_zipf(name, s, v)` | map | Returns a random row from a named dataset using a Zipfian distribution. The first row is the “hottest”, with frequency dropping off according to s (skew, > 1) and v (>= 1). `ref_zipf('products', 2.0, 1.0).name` -> `Widget` |
| `regex(pattern)` | string | Generates a random string matching the given regular expression. `regex('[A-Z]{3}-[0-9]{4}')` -> `ABK-7291` |
| `result()` | map | Returns the first row of the current query’s SELECT result as a map. Only available in post_print (after query execution). Access columns with dot notation. `result().total` -> `10000` |
| `results()` | []map | Returns all rows of the current query’s SELECT result as a slice of maps. Only available in post_print (after query execution). Use with expr-lang builtins like `len()`, `map()`, `filter()`, `reduce()` to aggregate across rows. `len(results())` -> `5`; `reduce(results(), #acc + #.balance, 0)` -> `50000` |
| `seq_exp(name, rate)` | int | Exponentially-distributed value from a global sequence. Lower indices are selected more frequently. `seq_exp("order_id", 0.5)` -> `7` |
| `seq_global(name)` | int | Shared auto-incrementing sequence across all workers. Returns the next value from a named sequence defined in the seq config section. Thread-safe via atomic counters. `seq_global("order_id")` -> `1` |
| `seq_lognorm(name, mu, sigma)` | int | Log-normally-distributed value from a global sequence. `seq_lognorm("order_id", 2, 0.5)` -> `8` |
| `seq_norm(name, mean, stddev)` | int | Normally-distributed value from a global sequence. mean and stddev are index positions (0-based). `seq_norm("order_id", 500, 100)` -> `487` |
| `seq_rand(name)` | int | Uniform random value from the already-generated values of a global sequence. Computes valid values from the sequence’s start, step, and current counter (no values stored in memory). `seq_rand("order_id")` -> `42` |
| `seq_zipf(name, s, v)` | int | Zipfian-distributed value from a global sequence. Lower indices (earlier values) are selected more frequently. s (> 1) and v (>= 1) control the distribution shape. `seq_zipf("order_id", 2.0, 1.0)` -> `3` |
| `seq(start, step)` | int | Auto-incrementing sequence per worker. Returns start + counter * step. `seq(1, 1)` -> `1` |
| `seq_alpha(length)` | string | Auto-incrementing alpha sequence per worker. Generates base-26 strings of the given length (e.g. aaa, aab, aac, …). `seq_alpha(3)` -> `aaa` |
| `seq_alpha_global(name)` | string | Shared auto-incrementing alpha sequence across all workers. Returns the next alpha value from a named sequence defined in the seq config section (requires a length field). `seq_alpha_global("sku_code")` -> `aaa` |
| `set_exp(values, rate)` | any | Picks an item from a set using an exponential distribution. `set_exp(['low', 'med', 'high'], 0.5)` -> `low` |
| `set_lognorm(values, mu, sigma)` | any | Picks an item from a set using a log-normal distribution. `set_lognorm(['free', 'basic', 'pro'], 0.5, 0.5)` -> `free` |
| `set_norm(values, mean, stddev)` | any | Picks an item from a set using a normal distribution. `set_norm([1, 2, 3, 4, 5], 2, 0.8)` -> `3` |
| `set_rand(values, weights)` | any | Picks a random item from a set. If weights are provided, weighted random selection is used; otherwise uniform. `set_rand(['a', 'b', 'c'], [])` -> `b` |
| `set_zipf(values, s, v)` | any | Picks an item from a set using a Zipfian distribution. `set_zipf(['a', 'b', 'c'], 2.0, 1.0)` -> `a` |
| `sin(x)` | float64 | Sine of x (x in radians). `sin(pi / 2)` -> `1` |
| `sqrt(x)` | float64 | Square root of x. `sqrt(144.0)` -> `12` |
| `sum(name, field)` | float64 | Sum of a numeric field across all rows in a named dataset. `sum('fetch_products', 'price')` -> `96.95` |
| `tan(x)` | float64 | Tangent of x (x in radians). `tan(pi / 4)` -> `1` |
| `template(format, args...)` | string | Formats a string using Go’s fmt.Sprintf syntax. `template('ORD-%05d', seq(1, 1))` -> `ORD-00001` |
| `time(min, max)` | string | Random time of day between min and max (HH:MM:SS format). `time('08:00:00', '18:00:00')` -> `14:32:07` |
| `timestamp(min, max)` | string | Random timestamp between min and max (RFC3339). `timestamp('2020-01-01T00:00:00Z', '2025-01-01T00:00:00Z')` -> `2023-07-15T14:32:07Z` |
| `timez(min, max)` | string | Random time of day with a +00:00 timezone suffix. `timez('09:00:00', '17:00:00')` -> `14:32:07+00:00` |
| `uniform_f(min, max, precision)` | float64 | Uniform random float in [min, max], rounded to precision decimal places. `uniform_f(0.01, 999.99, 2)` -> `347.82` |
| `uniform(min, max)` | float64 | Uniform random float in [min, max]. `uniform(1, 100)` -> `73.12` |
| `uniq(expression [, expression...] [, maxRetries])` | any | Evaluates one or more string expressions repeatedly until a unique value (or composite tuple) is produced. Defaults to 100 retry attempts; pass an optional integer as the last argument to override. A single expression returns a single value: `uniq("gen('airlineairportiata')")` -> `LAX`. Pass multiple expressions to enforce cross-column uniqueness; this returns []any, so index to pick each column. Same-row calls with identical expressions return a cached tuple: `uniq("gen('first_name')", "gen('last_name')")[0]` -> `Alice`; `uniq("gen('first_name')", "gen('last_name')")[1]` -> `Smith`. Seen values persist across rows within a query and reset between queries. |
| `uuid_v1()` | string | Generates a Version 1 UUID (timestamp + node ID). `uuid_v1()` -> `6ba7b810-9dad-11d1-80b4-00c04fd430c8` |
| `uuid_v4()` | string | Generates a Version 4 UUID (random). `uuid_v4()` -> `550e8400-e29b-41d4-a716-446655440000` |
| `uuid_v6()` | string | Generates a Version 6 UUID (reordered timestamp). `uuid_v6()` -> `1ef21d2f-6ba7-6810-9dad-00c04fd430c8` |
| `uuid_v7()` | string | Generates a Version 7 UUID (Unix timestamp + random, sortable). `uuid_v7()` -> `018ef4c9-7f3a-7b3c-8d1a-2b4c5d6e7f8a` |
| `varbit(n)` | string | Random variable-length bit string of 1 to n bits. `varbit(8)` -> `10110` |
| `vector_norm(dims, clusters, spread, mean, stddev)` | string | Like `vector` but picks centroids using a normal distribution over cluster indices. mean is the center cluster index, stddev controls spread. `vector_norm(32, 5, 0.1, 2.0, 0.8)` |
| `vector_zipf(dims, clusters, spread, s, v)` | string | Like `vector` but picks centroids using a Zipfian distribution. Cluster 0 is the “hottest”, with frequency dropping off according to s (skew) and v (>= 1). Simulates real-world data where some categories have far more embeddings. `vector_zipf(32, 5, 0.1, 2.0, 1.0)` |
| `vector(dims, clusters, spread)` | string | Vector literal with uniform centroid selection. Generates clustered, unit-length vectors for realistic similarity search. dims is the number of dimensions, clusters is the number of cluster centroids, and spread controls intra-cluster noise (Gaussian σ). `vector(4, 3, 0.1)` -> `[0.512340,-0.234567,0.678901,0.456789]` |
| `weighted_sample_n(name, field, weightField, minN, maxN)` | string | Picks N unique rows using weighted selection and returns a comma-separated string. `weighted_sample_n('products', 'name', 'stock', 2, 3)` -> `Widget,Pen` |
| `zipf(s, v, max)` | int | Zipfian-distributed random integer in [0, max]. `zipf(2.0, 1.0, 999)` -> `3` |
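Many of these compose inside `args:`. As one sketch (table and column names invented for illustration), `bool()`, `cond()`, and `arg()` combine into the mutually exclusive columns pattern mentioned in the `bool()` entry:

```yaml
args:
  - bool()                            # $1: coin flip
  - cond(arg(0), gen('email'), null)  # $2: filled only when the flip was true
  - cond(arg(0), null, gen('phone'))  # $3: filled only when the flip was false
query: INSERT INTO contacts (prefers_email, email, phone) VALUES ($1, $2, $3)
```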
## Choosing a Sequence Generator
edg has three kinds of sequential ID generators: per batch query (`iter`), per worker (`seq`, `seq_alpha`), and global (`seq_global`, `seq_alpha_global`). Picking the wrong one silently produces incorrect data, so choose carefully.
| Function | Scope | Resets? | IDs Unique Across Workers? | Use When |
|---|---|---|---|---|
| `iter()` | Per batch query | Yes - resets to 1 at the start of each exec_batch / query_batch | N/A (single-worker seed) | Seeding tables with fixed-size ID ranges (1..N). Always starts at 1, unaffected by other queries. |
| `seq_global(name)` | Global (all workers) | Never | Yes - atomic counter | Generating globally unique IDs across concurrent workers in run. Requires a seq config entry. |
| `seq(start, step)` | Per worker | Never | No - each worker has its own counter | Generating monotonic values within a single worker’s run loop (e.g. increasing timestamps, per-worker order numbers). |
| `seq_alpha_global(name)` | Global (all workers) | Never | Yes - atomic counter | Generating globally unique alpha codes (aaa, aab, …) across workers. Requires a seq config entry with length. |
| `seq_alpha(length)` | Per worker | Never | No - each worker has its own counter | Generating monotonic alpha codes within a single worker’s run loop. |
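The global variants read from named entries in the `seq` config section. A minimal sketch of such entries (the exact YAML shape is an assumption; the field names `start`, `step`, and `length` come from the function descriptions above):

```yaml
seq:
  - name: order_id   # consumed by seq_global("order_id"), seq_rand("order_id"), ...
    start: 1
    step: 1
  - name: sku_code   # alpha sequences need a length: aaa, aab, aac, ...
    length: 3
```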
### Common mistakes
**Don’t use `seq()` across multiple seed queries.**

`seq(1, 1)` is a single counter that never resets. If `populate_accounts` uses `seq(1, 1)` with `count: 10`, the counter reaches 10. A later `populate_counters` query using the same `seq(1, 1)` continues from 11, not 1. Use `iter()` instead - it resets per batch query.
```yaml
# WRONG - counter IDs will be 11, 12, ... (not 1, 2, ...)
seed:
  - name: populate_accounts
    type: exec_batch
    count: 10
    args:
      - seq(1, 1) # 1..10
    query: INSERT INTO account (id) VALUES ($1)
  - name: populate_counters
    type: exec_batch
    count: 10
    args:
      - seq(1, 1) # 11..20
    query: INSERT INTO counter (id) VALUES ($1)
```

```yaml
# CORRECT - iter() resets per query
seed:
  - name: populate_accounts
    type: exec_batch
    count: 10
    args:
      - iter() # 1..10
    query: INSERT INTO account (id) VALUES ($1)
  - name: populate_counters
    type: exec_batch
    count: 10
    args:
      - iter() # 1..10
    query: INSERT INTO counter (id) VALUES ($1)
```

**Don’t use `seq()` when you need globally unique IDs.**
With multiple workers, each worker’s `seq(1, 1)` produces 1, 2, 3, … independently - you’ll get duplicate IDs. Use `seq_global` instead.
**Don’t use `seq_global()` for seed queries.**

The counter never resets, so re-running deseed + seed produces new IDs each time. Use `iter()` for seeds and reserve `seq_global` for run workloads.
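Putting the three rules together, a sketch that seeds with `iter()` and generates run-time IDs with `seq_global` (table names and the exact shape of the `seq` entry are illustrative assumptions):

```yaml
seed:
  - name: populate_accounts
    type: exec_batch
    count: 1000
    args:
      - iter()                  # 1..1000, restarts on every seed run
    query: INSERT INTO account (id) VALUES ($1)

seq:
  - name: order_id
    start: 1
    step: 1

run:
  - name: create_order
    type: exec
    args:
      - seq_global("order_id")  # globally unique across all workers
    query: INSERT INTO orders (id) VALUES ($1)
```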
## Function Lifecycle
Several functions maintain state. Understanding when that state resets is important for getting correct results:
| Function | Scope | Behavior |
|---|---|---|
| `arg(index)` / `arg('name')` | Per-query | Returns the value of the arg at index (or by name when using named args). Cleared before the next query; in batch queries, resets per row. |
| `complete_array(tool, prompt, count)` | Per-query | Makes one API call per unique (tool, prompt, count) tuple. The []map result is memoized so multiple `ref_each(local(...)).field` accesses within a row share the same call. Not deferred: resolves immediately even in batch queries. |
| `complete(tool, prompt)` | Per-batch | In exec/query (non-batch) queries, each unique (tool, prompt) pair makes one API call; same-row field accesses are memoized. In exec_batch/query_batch queries, all `complete()` calls are deferred: placeholder maps are inserted during arg evaluation, then all pending requests are resolved concurrently (up to 8 in parallel) after the batch is generated. |
| `embed(text...)` | Per-batch | In exec/query (non-batch) queries, each call makes a separate API request. In exec_batch/query_batch queries, all `embed()` calls within a batch are deferred: placeholders are inserted during arg evaluation, then all pending texts are resolved in a single API call (or several if `--embed-max-batch` is set). For example, a 100-row batch with `--embed-max-batch 30` produces 4 API calls (30+30+30+10) instead of 100 individual calls. |
| `global_iter()` | Global | Monotonic counter incremented once per RunIteration call by any worker. Never resets. Shared across all workers via an atomic int64. Use for time-series seasonality and data drift. |
| `iter()` | Per-query | Returns 1 for the first row, 2 for the second, etc. Resets at the start of each batch query, so the first row of every batch query sees 1. |
| `nurand(A, x, y)` | Per-worker | The TPC-C constant C is generated once per worker per A value and stays fixed for the worker’s lifetime. |
| `ref_diff(name)` | Per-query | Returns a unique row on each call within a query (no repeats). The index resets before the next query. |
| `ref_exp(name, rate)` | None | Fresh random row on every call (exponential distribution). |
| `ref_lognorm(name, mu, sigma)` | None | Fresh random row on every call (log-normal distribution). |
| `ref_norm(name, mean, stddev)` | None | Fresh random row on every call (normal distribution). |
| `ref_perm(name)` | Per-worker | Picks a row on first call and returns that same row for the entire lifetime of the worker. Never resets. |
| `ref_rand(name)` | None | Fresh random row on every call. |
| `ref_same(name)` | Per-query | Picks a row on first call within a query; all subsequent `ref_same` calls for the same dataset within that query return the same row. Cleared before the next query. |
| `ref_zipf(name, s, v)` | None | Fresh random row on every call (Zipfian distribution). |
| `result()` / `results()` | Per-query | Return the current query’s result rows. Only available in post_print expressions. Set after each `type: query` execution; cleared after each `type: exec`. |
| `seq_global(name)` | Global | Single counter shared across all workers via atomic increment. Values are globally unique. Configured in the seq config section. |
| `seq_rand(name)` | Global | Picks uniformly from the already-generated values of a global sequence. The valid value set grows as `seq_global` advances the counter; no values are stored in memory. |
| `seq_zipf` / `seq_norm` / `seq_exp` / `seq_lognorm` | Global | Same as `seq_rand` but with shaped distributions. |
| `seq(start, step)` | Per-worker | The counter starts at 0 for each worker and increments on every call. Two workers both calling `seq(1, 1)` will independently produce the same sequence - values are not globally unique. |
| `uniq(expression [, ...])` | Per-query | Tracks seen values (or composite tuples) across all rows within a query. Composite calls are cached per row so multiple arg positions share the same tuple. Resets between queries. |
| `vector` / `vector_zipf` / `vector_norm` | Per-worker | Cluster centroids are generated on first call (keyed by dims+clusters) and reused for the worker’s lifetime. Each call picks a centroid (uniform, Zipfian, or normal) and adds noise. |
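The per-query scopes matter most when one logical row needs several values that agree. A sketch (dataset, table, and column names invented for illustration): `ref_same` guarantees both args come from the same cached row, whereas two `ref_rand` calls could each pick a different row:

```yaml
run:
  - name: insert_order_line
    type: exec
    args:
      - ref_same('products').id     # first call picks and caches a row
      - ref_same('products').price  # same row; cache cleared before the next iteration
    query: INSERT INTO order_lines (product_id, unit_price) VALUES ($1, $2)
```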