# Expressions
Query arguments are written as expressions compiled at startup using expr-lang/expr. Each expression has access to the built-in functions, globals, and any user-defined expressions.
Tip: Use `edg repl` to try any expression interactively without a database connection. See REPL for details.
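Expressions usually appear in a query's `args:` list, one expression per bind parameter. A minimal sketch (query, table, and column names are invented for illustration):

```yaml
run:
  - name: insert_user
    type: exec
    args:
      - uuid_v4()              # $1: random UUID
      - gen('email')           # $2: random email via gofakeit
      - norm(35, 10, 18, 90)   # $3: age from a normal distribution, clamped to [18, 90]
    query: INSERT INTO users (id, email, age) VALUES ($1, $2, $3)
```

Args are evaluated in order, so later args can reference earlier ones via `arg(index)`.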
## Functions
These are edg’s built-in functions, available in any expression context (`args:`, `expressions:`, `globals`). They generate data, reference datasets, aggregate values, and control execution flow.
| Function | Returns | Description |
|---|---|---|
| `__sep__` | string | Driver-aware batch field separator. A query-text token that is replaced with the SQL function producing the ASCII unit separator character (char 31) used to delimit values within batch-expanded placeholders. Resolves to `chr(31)` for pgx, `CHAR(31)` for MySQL and MSSQL, `codepoints-to-string(31)` for Oracle, `CODE_POINTS_TO_STRING([31])` for Spanner. Can be used in any argument position within SQL. Always use `__sep__` instead of a literal comma: generated values may contain commas, which would silently corrupt your data. `string_to_array('$1', __sep__)` |
| `abs(x)` | float64 | Absolute value of x. `abs(-5.0)` -> `5` |
| `acos(x)` | float64 | Arc cosine of x (result in radians). `acos(1.0)` -> `0` |
| `arg(index)` | any | Returns the value of a previously evaluated arg by its zero-based index or name. Enables dependent columns where later args reference earlier ones. `arg(0)` -> `"Alice"`; `arg('email')` -> `"alice@example.com"` (with named args) |
| `array(minN, maxN, pattern)` | string | PostgreSQL/CockroachDB array literal with a random number of elements. `array(2, 4, 'email')` -> `{a@b.com,c@d.com,d@e.com}` |
| `asin(x)` | float64 | Arc sine of x (result in radians). `asin(1.0)` -> `1.5707...` |
| `atan(x)` | float64 | Arc tangent of x (result in radians). `atan(1.0)` -> `0.7853...` |
| `atan2(y, x)` | float64 | Two-argument arc tangent of y/x (result in radians). Handles quadrants correctly. `atan2(1.0, 1.0)` -> `0.7853...` |
| `avg(name, field)` | float64 | Average of a numeric field across all rows in a named dataset. `avg('fetch_products', 'price')` -> `19.39` |
| `batch(n)` | [][]any | Returns sequential integers [0, n) as batch arg sets. `batch(3)` -> `[[0], [1], [2]]` |
| `bit(n)` | string | Random fixed-length bit string of exactly n bits. `bit(8)` -> `10110011` |
| `blob(n)` | []byte | Random n bytes as raw binary data. Works across all databases (PostgreSQL, MySQL, Oracle, MSSQL) via bind parameters. Use this for BLOB, BYTEA, VARBINARY, and RAW columns. `blob(1024)` -> (1024 random bytes) |
| `bool()` | bool | Random true or false. Useful as a coin flip with `cond()` and `arg()` for mutually exclusive columns. `bool()` -> `true` |
| `bytes(n)` | string | Random n bytes as a hex-encoded string with `\x` prefix. PostgreSQL/CockroachDB only. For cross-database binary data, use `blob(n)` instead. `bytes(4)` -> `\x1a2b3c4d` |
| `ceil(x)` | float64 | Smallest integer greater than or equal to x. `ceil(3.2)` -> `4` |
| `coalesce(v1, v2, ...)` | any | Returns the first non-nil value from its arguments. `coalesce(nil, 'default')` -> `default` |
| `complete_array(tool, prompt, count)` | []map | Generates N structured items in a single LLM call. The tool schema is automatically wrapped in an array request. Returns []map for use with `ref_each()`. Memoized by (tool, prompt, count). Requires `--complete-api-key` or `EDG_COMPLETE_API_KEY`. See Complete. `ref_each(complete_array("review", "Generate 5 reviews", 5)).review_text` -> `"Great product!"` |
| `complete(tool, prompt)` | map | Calls an LLM with a named tool schema and returns structured data as a map. Access fields with dot notation. Per-row memoization ensures multiple field accesses with the same tool and prompt make only one API call. Requires `--complete-api-key` or `EDG_COMPLETE_API_KEY`. See Complete. `complete("review", "Review: Widget").review_text` -> `"Great product!"`; `complete("review", "Review: Widget").rating` -> `4` |
| `cond(predicate, trueVal, falseVal)` | any | Returns trueVal if predicate is true, falseVal otherwise. `cond(true, 'yes', 'no')` -> `yes` |
| `const(value)` | any | Returns the value as-is. Useful for literal constants. `const(42)` -> `42` |
| `cos(x)` | float64 | Cosine of x (x in radians). `cos(0.0)` -> `1` |
| `count(name)` | int | Number of rows in a named dataset. `count('fetch_products')` -> `5` |
| `date_offset(duration)` | string | Returns the current time offset by duration, formatted as RFC3339. `date_offset('-72h')` -> `2026-04-08T10:00:00Z` |
| `date(format, min, max)` | string | Random timestamp formatted using a Go time format string. `date('2006-01-02', '2020-01-01T00:00:00Z', '2025-01-01T00:00:00Z')` -> `2023-07-15` |
| `distinct(name, field)` | int | Number of distinct values for a field in a named dataset. `distinct('fetch_products', 'category')` -> `3` |
| `duration(min, max)` | string | Random duration between min and max (Go duration strings). `duration('1h', '24h')` -> `14h32m17s` |
| `embed(text...)` | string | Calls an external embedding API (OpenAI-compatible) and returns a vector literal. Variadic: multiple args are joined with a space. Requires `--embed-api-key` or `EDG_EMBED_API_KEY`. See Embed. `embed('hello world')` -> `[0.0123,-0.0456,...]`; `embed(field('name'), field('description'))` -> `[0.0789,...]` |
| `env_nil(name)` | any | Returns the value of an environment variable as a string, or nil if unset. Unlike `env()`, does not error on missing variables. Designed for use with `coalesce()` to provide defaults: `int(coalesce(env_nil('PORT'), 8080))`. Always returns a string when the variable exists, so wrap with `int()` or `float()` when arithmetic is needed. `env_nil('MISSING')` -> `nil`; `env_nil('HOST')` -> `localhost` |
| `env(name)` | string | Returns the value of the named environment variable, or errors if no variable with that name exists. Missing variables are caught at config load time, before any queries run. Can be composed with other functions, e.g. `upper(env('HOST'))`. For numeric values, use expr-lang conversion: `int(env('PORT'))`, `float(env('RATE'))`. `env('API_KEY')` -> `ca3864628a8f29d644e1...` |
| `exp_f(rate, min, max, precision)` | float64 | Exponentially-distributed random number in [min, max], rounded to precision decimal places. `exp_f(0.5, 0, 100, 2)` -> `3.72` |
| `exp(rate, min, max)` | float64 | Exponentially-distributed random number in [min, max], rounded to 0 decimal places. `exp(0.5, 0, 100)` -> `4` |
| `expr(expression)` | any | Evaluates an arithmetic expression. Alias for `const`; the expr engine handles the arithmetic. `expr(2 + 3)` -> `5` |
| `fail(message)` | error | Returns an error that stops the current worker gracefully. Useful with `??` to catch unexpected values: `{'a': 1}['x'] ?? fail('unknown key')`. `fail('unexpected region')` -> (worker stops with error) |
| `fatal(message)` | void | Terminates the entire process immediately. Use when an unexpected value should halt all workers, not just the current one. `fatal('missing required config')` -> (process exits) |
| `field(name)` | any | Evaluates a named field from the current query’s `object:` object. Requires `object:` to be set on the query. Use in args to cherry-pick fields or control ordering. `field('email')` -> `alice@example.com` |
| `floor(x)` | float64 | Largest integer less than or equal to x. `floor(3.7)` -> `3` |
| `gen_batch(total, batchSize, pattern)` | [][]any | Generates total values using a gofakeit pattern, grouped into batches of batchSize. Each batch arg is a string of generated values delimited by the ASCII unit separator (char 31, `\x1f`). `gen_batch(4, 2, 'firstname')` -> `[["Alice\x1fBob"], ["Carol\x1fDave"]]` |
| `gen(pattern)` | string | Generates a random value using gofakeit patterns (e.g. `gen('number:1,100')`). `gen('number:1,10')` -> `7` |
| `global_iter()` | int64 | Monotonic iteration counter shared across all workers in a stage. Increments by 1 each time any worker calls RunIteration. Never resets. Use for time-series seasonality and data drift patterns. `20.0 + 5.0 * sin(2.0 * pi * global_iter() / 1000)` -> `22.93...` |
| `global(name)` | any | Looks up a value from the globals section by name. Globals are also available directly as variables, so `global('warehouses')` and `warehouses` are equivalent. `global('warehouses')` -> `10` |
| `inet(cidr)` | string | Random IP address within the given CIDR block. `inet('192.168.1.0/24')` -> `192.168.1.42` |
| `iter()` | int | 1-based row counter for exec_batch / query_batch queries. Returns 1 for the first row, 2 for the second, etc. Resets at the start of each batch query. Useful for generating sequential IDs without a global sequence. `iter()` -> `1` |
| `json_arr(minN, maxN, pattern)` | string | Builds a JSON array of N random values (N in [minN, maxN]) generated by a gofakeit pattern. `json_arr(1, 3, 'word')` -> `["foo","bar"]` |
| `json_obj(k1, v1, k2, v2, ...)` | string | Builds a JSON object string from key-value pair arguments. `json_obj('key', 'val')` -> `{"key":"val"}` |
| `local(name)` | any | Returns the value of a named local variable. Locals can be defined on individual queries or transactions; query-level locals override transaction locals when both exist. Locals are re-evaluated per row in batch mode. Useful for calling `complete()` once and accessing multiple fields. `local("review").review_text` -> `"Great product!"` |
| `log(x)` | float64 | Natural logarithm of x. `log(1.0)` -> `0` |
| `log10(x)` | float64 | Base-10 logarithm of x. `log10(100.0)` -> `2` |
| `lognorm_f(mu, sigma, min, max, precision)` | float64 | Log-normally-distributed random number in [min, max], rounded to precision decimal places. `lognorm_f(1.0, 0.5, 1, 1000, 2)` -> `3.42` |
| `lognorm(mu, sigma, min, max)` | float64 | Log-normally-distributed random number in [min, max], rounded to 0 decimal places. `lognorm(1.0, 0.5, 1, 1000)` -> `3` |
| `max(name, field)` | float64 | Maximum value of a numeric field in a named dataset. `max('fetch_products', 'price')` -> `49.99` |
| `min(name, field)` | float64 | Minimum value of a numeric field in a named dataset. `min('fetch_products', 'price')` -> `1.99` |
| `mod(x, y)` | float64 | Floating-point remainder of x/y. `mod(10.0, 3.0)` -> `1` |
| `norm_f(mean, stddev, min, max, precision)` | float64 | Normally-distributed random number in [min, max], rounded to precision decimal places. `norm_f(50.0, 15.0, 1.0, 100.0, 2)` -> `52.37` |
| `norm_n(mean, stddev, min, max, minN, maxN)` | string | N unique normally-distributed values (N in [minN, maxN]) as a comma-separated string. `norm_n(50.0, 10.0, 1, 100, 2, 4)` -> `47,53,61` |
| `norm(mean, stddev, min, max)` | float64 | Normally-distributed random number in [min, max], rounded to 0 decimal places. `norm(4, 1, 1, 5)` -> `4` |
| `null` | nil | Null literal. Alias for nil, for users more familiar with SQL/JSON terminology. Not a function; use as a bare variable. `const(null)` -> `NULL` |
| `nullable(expr, probability)` | any | Returns NULL with the given probability (0.0–1.0), otherwise returns the expression result. `nullable(gen('email'), 0.3)` -> `NULL` |
| `nurand_n(A, x, y, min, max)` | string | Generates N unique NURand values (N in [min, max]) as a comma-separated string. `nurand_n(255, 1, 100, 3, 5)` -> `42,87,13,61` |
| `nurand(A, x, y)` | int | TPC-C Non-Uniform Random: `(((random(0,A) \| random(x,y)) + C) % (y-x+1)) + x`. `nurand(255, 1, 100)` -> `42` |
| `obj(name, field)` | any | Evaluates only the named field from an object, avoiding the cost of evaluating all fields. `obj('order', 'product')` -> `Widget` |
| `obj(name)` | map | Evaluates all field expressions for a named object defined in the objects section and returns them as a map. Access individual fields with dot notation. `obj('order').product` -> `Widget` |
| `pi` | float64 | The mathematical constant pi (3.14159…). Not a function; use as a bare variable. `2 * pi` -> `6.28318...` |
| `point_wkt(lat, lon, radiusKM)` | string | Generates a random geographic point as a WKT string: `POINT(lon lat)`. `point_wkt(51.5, -0.1, 10.0)` -> `POINT(-0.082 51.513)` |
| `point(lat, lon, radiusKM)` | map | Generates a random geographic point within radiusKM of (lat, lon). Access fields with `.lat` and `.lon`. `point(51.5, -0.1, 10.0).lat` -> `51.513` |
| `polygon_wkt(lat, lon, minKM, maxKM, points)` | string | Generates a jagged polygon with points vertices around (lat, lon), each at a random distance between minKM and maxKM. Returns a WKT POLYGON string. The ring is closed (first vertex repeated at the end). `polygon_wkt(51.1, -0.4, 5, 15, 6)` -> `POLYGON((-0.33 51.18, ...))` |
| `polygon(lat, lon, minKM, maxKM, points)` | []map | Same as `polygon_wkt`, but returns a slice of maps with `.lat` and `.lon` fields. The ring is closed (first vertex repeated at the end). Requires points >= 3. `polygon(51.1, -0.4, 5, 15, 6)[0].lat` -> `51.18` |
| `pow(x, y)` | float64 | x raised to the power y. `pow(2.0, 10.0)` -> `1024` |
| `ref_diff(name)` | map | Returns unique rows across multiple calls within the same query execution. Uses a swap-based index to avoid repeats. `ref_diff('products').name` -> `Widget` |
| `ref_each(query_or_dataset)` | [][]any or map | When given a SQL query string, executes it and returns all rows; each row becomes a separate arg set. When given a named reference dataset (unquoted), iterates sequentially through each row with same-row caching (like `ref_same`). `ref_each('SELECT id FROM t')` -> `[[1], [2], [3]]`; `ref_each(product_catalog).name` -> `Widget` |
| `ref_exp(name, rate)` | map | Returns a random row from a named dataset using an exponential distribution. Lower indices are selected more frequently; rate controls decay speed. `ref_exp('products', 1.5).name` -> `Widget` |
| `ref_lognorm(name, mu, sigma)` | map | Returns a random row from a named dataset using a log-normal distribution. Creates a right-skewed access pattern where early rows are favored. `ref_lognorm('products', 0.0, 0.5).name` -> `Widget` |
| `ref_n(name, field, min, max)` | string | Picks N unique random rows (N in [min, max]) from a named dataset, extracts field from each, and returns a comma-separated string. `ref_n('products', 'name', 2, 3)` -> `Widget,Gadget` |
| `ref_norm(name, mean, stddev)` | map | Returns a random row from a named dataset using a normal distribution. mean and stddev are expressed as fractions of the dataset length (e.g. 0.5 = middle, 0.2 = narrow spread). `ref_norm('products', 0.5, 0.2).name` -> `Gadget` |
| `ref_perm(name)` | map | Returns a random row on first call, then the same row for the entire lifetime of the worker. `ref_perm('products').name` -> `Widget` |
| `ref_rand(name)` | map | Returns a random row from a named dataset (populated by an init query). Access fields with dot notation: `ref_rand('fetch_warehouses').w_id`. `ref_rand('products').name` -> `Gadget` |
| `ref_same(name)` | map | Returns a random row, but the same row is reused across all `ref_same` calls within a single query execution. Cleared between iterations. `ref_same('products').name` -> `Widget` |
| `ref_zipf(name, s, v)` | map | Returns a random row from a named dataset using a Zipfian distribution. The first row is the “hottest”, with frequency dropping off according to s (skew, > 1) and v (>= 1). `ref_zipf('products', 2.0, 1.0).name` -> `Widget` |
| `regex(pattern)` | string | Generates a random string matching the given regular expression. `regex('[A-Z]{3}-[0-9]{4}')` -> `ABK-7291` |
| `result()` | map | Returns the first row of the current query’s SELECT result as a map. Only available in post_print (after query execution). Access columns with dot notation. `result().total` -> `10000` |
| `results()` | []map | Returns all rows of the current query’s SELECT result as a slice of maps. Only available in post_print (after query execution). Use with expr-lang builtins like `len()`, `map()`, `filter()`, `reduce()` to aggregate across rows. `len(results())` -> `5`; `reduce(results(), #acc + #.balance, 0)` -> `50000` |
| `seq_exp(name, rate)` | int | Exponentially-distributed value from a global sequence. Lower indices are selected more frequently. `seq_exp("order_id", 0.5)` -> `7` |
| `seq_global(name)` | int | Shared auto-incrementing sequence across all workers. Returns the next value from a named sequence defined in the seq config section. Thread-safe via atomic counters. `seq_global("order_id")` -> `1` |
| `seq_lognorm(name, mu, sigma)` | int | Log-normally-distributed value from a global sequence. `seq_lognorm("order_id", 2, 0.5)` -> `8` |
| `seq_norm(name, mean, stddev)` | int | Normally-distributed value from a global sequence. mean and stddev are index positions (0-based). `seq_norm("order_id", 500, 100)` -> `487` |
| `seq_rand(name)` | int | Uniform random value from the already-generated values of a global sequence. Computes valid values from the sequence’s start, step, and current counter (no values stored in memory). `seq_rand("order_id")` -> `42` |
| `seq_zipf(name, s, v)` | int | Zipfian-distributed value from a global sequence. Lower indices (earlier values) are selected more frequently. s (> 1) and v (>= 1) control the distribution shape. `seq_zipf("order_id", 2.0, 1.0)` -> `3` |
| `seq(start, step)` | int | Auto-incrementing sequence per worker. Returns start + counter * step. `seq(1, 1)` -> `1` |
| `seq_alpha(length)` | string | Auto-incrementing alpha sequence per worker. Generates base-26 strings of the given length (e.g. aaa, aab, aac, …). `seq_alpha(3)` -> `aaa` |
| `seq_alpha_global(name)` | string | Shared auto-incrementing alpha sequence across all workers. Returns the next alpha value from a named sequence defined in the seq config section (requires a length field). `seq_alpha_global("sku_code")` -> `aaa` |
| `set_exp(values, rate)` | any | Picks an item from a set using an exponential distribution. `set_exp(['low', 'med', 'high'], 0.5)` -> `low` |
| `set_lognorm(values, mu, sigma)` | any | Picks an item from a set using a log-normal distribution. `set_lognorm(['free', 'basic', 'pro'], 0.5, 0.5)` -> `free` |
| `set_norm(values, mean, stddev)` | any | Picks an item from a set using a normal distribution. `set_norm([1, 2, 3, 4, 5], 2, 0.8)` -> `3` |
| `set_rand(values, weights)` | any | Picks a random item from a set. If weights are provided, weighted random selection is used; otherwise uniform. `set_rand(['a', 'b', 'c'], [])` -> `b` |
| `set_zipf(values, s, v)` | any | Picks an item from a set using a Zipfian distribution. `set_zipf(['a', 'b', 'c'], 2.0, 1.0)` -> `a` |
| `sin(x)` | float64 | Sine of x (x in radians). `sin(pi / 2)` -> `1` |
| `sqrt(x)` | float64 | Square root of x. `sqrt(144.0)` -> `12` |
| `sum(name, field)` | float64 | Sum of a numeric field across all rows in a named dataset. `sum('fetch_products', 'price')` -> `96.95` |
| `tan(x)` | float64 | Tangent of x (x in radians). `tan(pi / 4)` -> `1` |
| `template(format, args...)` | string | Formats a string using Go’s fmt.Sprintf syntax. `template('ORD-%05d', seq(1, 1))` -> `ORD-00001` |
| `time(min, max)` | string | Random time of day between min and max (HH:MM:SS format). `time('08:00:00', '18:00:00')` -> `14:32:07` |
| `timestamp(min, max)` | string | Random timestamp between min and max (RFC3339). `timestamp('2020-01-01T00:00:00Z', '2025-01-01T00:00:00Z')` -> `2023-07-15T14:32:07Z` |
| `timez(min, max)` | string | Random time of day with a +00:00 timezone suffix. `timez('09:00:00', '17:00:00')` -> `14:32:07+00:00` |
| `uniform_f(min, max, precision)` | float64 | Uniform random float in [min, max], rounded to precision decimal places. `uniform_f(0.01, 999.99, 2)` -> `347.82` |
| `uniform(min, max)` | float64 | Uniform random float in [min, max]. `uniform(1, 100)` -> `73.12` |
| `uniq(expression [, expression...] [, maxRetries])` | any | Evaluates one or more string expressions repeatedly until a unique value (or composite tuple) is produced. Defaults to 100 retry attempts; pass an optional integer as the last argument to override. A single expression returns a single value: `uniq("gen('airlineairportiata')")` -> `LAX`. Pass multiple expressions to enforce cross-column uniqueness; this returns []any, so index to pick each column. Same-row calls with identical expressions return a cached tuple: `uniq("gen('first_name')", "gen('last_name')")[0]` -> `Alice`; `uniq("gen('first_name')", "gen('last_name')")[1]` -> `Smith`. Seen values persist across rows within a query and reset between queries. |
| `uuid_v1()` | string | Generates a Version 1 UUID (timestamp + node ID). `uuid_v1()` -> `6ba7b810-9dad-11d1-80b4-00c04fd430c8` |
| `uuid_v4()` | string | Generates a Version 4 UUID (random). `uuid_v4()` -> `550e8400-e29b-41d4-a716-446655440000` |
| `uuid_v6()` | string | Generates a Version 6 UUID (reordered timestamp). `uuid_v6()` -> `1ef21d2f-6ba7-6810-9dad-00c04fd430c8` |
| `uuid_v7()` | string | Generates a Version 7 UUID (Unix timestamp + random, sortable). `uuid_v7()` -> `018ef4c9-7f3a-7b3c-8d1a-2b4c5d6e7f8a` |
| `varbit(n)` | string | Random variable-length bit string of 1 to n bits. `varbit(8)` -> `10110` |
| `vector_norm(dims, clusters, spread, mean, stddev)` | string | Like `vector` but picks centroids using a normal distribution over cluster indices. mean is the center cluster index, stddev controls spread. `vector_norm(32, 5, 0.1, 2.0, 0.8)` |
| `vector_zipf(dims, clusters, spread, s, v)` | string | Like `vector` but picks centroids using a Zipfian distribution. Cluster 0 is the “hottest”, with frequency dropping off according to s (skew) and v (>= 1). Simulates real-world data where some categories have far more embeddings. `vector_zipf(32, 5, 0.1, 2.0, 1.0)` |
| `vector(dims, clusters, spread)` | string | Vector literal with uniform centroid selection. Generates clustered, unit-length vectors for realistic similarity search. dims is the number of dimensions, clusters is the number of cluster centroids, and spread controls intra-cluster noise (Gaussian σ). `vector(4, 3, 0.1)` -> `[0.512340,-0.234567,0.678901,0.456789]` |
| `weighted_sample_n(name, field, weightField, minN, maxN)` | string | Picks N unique rows using weighted selection and returns a comma-separated string. `weighted_sample_n('products', 'name', 'stock', 2, 3)` -> `Widget,Pen` |
| `zipf(s, v, max)` | int | Zipfian-distributed random integer in [0, max]. `zipf(2.0, 1.0, 999)` -> `3` |
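Many of these compose inside `args:`. As one sketch (table and column names invented for illustration), `bool()`, `cond()`, and `arg()` combine into the mutually exclusive columns pattern mentioned in the `bool()` entry:

```yaml
args:
  - bool()                            # $1: coin flip
  - cond(arg(0), gen('email'), null)  # $2: filled only when the flip was true
  - cond(arg(0), null, gen('phone'))  # $3: filled only when the flip was false
query: INSERT INTO contacts (prefers_email, email, phone) VALUES ($1, $2, $3)
```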
## Choosing a Sequence Generator
edg has three kinds of sequential ID generators: per batch query (`iter`), per worker (`seq`, `seq_alpha`), and global (`seq_global`, `seq_alpha_global`). Picking the wrong one silently produces incorrect data, so choose carefully.
| Function | Scope | Resets? | IDs Unique Across Workers? | Use When |
|---|---|---|---|---|
| `iter()` | Per batch query | Yes - resets to 1 at the start of each exec_batch / query_batch | N/A (single-worker seed) | Seeding tables with fixed-size ID ranges (1..N). Always starts at 1, unaffected by other queries. |
| `seq_global(name)` | Global (all workers) | Never | Yes - atomic counter | Generating globally unique IDs across concurrent workers in run. Requires a seq config entry. |
| `seq(start, step)` | Per worker | Never | No - each worker has its own counter | Generating monotonic values within a single worker’s run loop (e.g. increasing timestamps, per-worker order numbers). |
| `seq_alpha_global(name)` | Global (all workers) | Never | Yes - atomic counter | Generating globally unique alpha codes (aaa, aab, …) across workers. Requires a seq config entry with length. |
| `seq_alpha(length)` | Per worker | Never | No - each worker has its own counter | Generating monotonic alpha codes within a single worker’s run loop. |
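The global variants read from named entries in the `seq` config section. A minimal sketch of such entries (the exact YAML shape is an assumption; the field names `start`, `step`, and `length` come from the function descriptions above):

```yaml
seq:
  - name: order_id   # consumed by seq_global("order_id"), seq_rand("order_id"), ...
    start: 1
    step: 1
  - name: sku_code   # alpha sequences need a length: aaa, aab, aac, ...
    length: 3
```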
### Common mistakes
**Don’t use `seq()` across multiple seed queries.**

`seq(1, 1)` is a single counter that never resets. If `populate_accounts` uses `seq(1, 1)` with `count: 10`, the counter reaches 10. A later `populate_counters` query using the same `seq(1, 1)` continues from 11, not 1. Use `iter()` instead - it resets per batch query.
```yaml
# WRONG - counter IDs will be 11, 12, ... (not 1, 2, ...)
seed:
  - name: populate_accounts
    type: exec_batch
    count: 10
    args:
      - seq(1, 1) # 1..10
    query: INSERT INTO account (id) VALUES ($1)
  - name: populate_counters
    type: exec_batch
    count: 10
    args:
      - seq(1, 1) # 11..20
    query: INSERT INTO counter (id) VALUES ($1)
```

```yaml
# CORRECT - iter() resets per query
seed:
  - name: populate_accounts
    type: exec_batch
    count: 10
    args:
      - iter() # 1..10
    query: INSERT INTO account (id) VALUES ($1)
  - name: populate_counters
    type: exec_batch
    count: 10
    args:
      - iter() # 1..10
    query: INSERT INTO counter (id) VALUES ($1)
```

**Don’t use `seq()` when you need globally unique IDs.**
With multiple workers, each worker’s `seq(1, 1)` produces 1, 2, 3, … independently - you’ll get duplicate IDs. Use `seq_global` instead.
**Don’t use `seq_global()` for seed queries.**

The counter never resets, so re-running deseed + seed produces new IDs each time. Use `iter()` for seeds and reserve `seq_global` for run workloads.
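Putting the three rules together, a sketch that seeds with `iter()` and generates run-time IDs with `seq_global` (table names and the exact shape of the `seq` entry are illustrative assumptions):

```yaml
seed:
  - name: populate_accounts
    type: exec_batch
    count: 1000
    args:
      - iter()                  # 1..1000, restarts on every seed run
    query: INSERT INTO account (id) VALUES ($1)

seq:
  - name: order_id
    start: 1
    step: 1

run:
  - name: create_order
    type: exec
    args:
      - seq_global("order_id")  # globally unique across all workers
    query: INSERT INTO orders (id) VALUES ($1)
```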
## Function Lifecycle
Several functions maintain state. Understanding when that state resets is important for getting correct results:
| Function | Scope | Behavior |
|---|---|---|
| `arg(index)` / `arg('name')` | Per-query | Returns the value of the arg at index (or by name when using named args). Cleared before the next query; in batch queries, resets per row. |
| `complete_array(tool, prompt, count)` | Per-query | Makes one API call per unique (tool, prompt, count) tuple. The []map result is memoized so multiple `ref_each(local(...)).field` accesses within a row share the same call. Not deferred: resolves immediately even in batch queries. |
| `complete(tool, prompt)` | Per-batch | In exec/query (non-batch) queries, each unique (tool, prompt) pair makes one API call; same-row field accesses are memoized. In exec_batch/query_batch queries, all `complete()` calls are deferred: placeholder maps are inserted during arg evaluation, then all pending requests are resolved concurrently (up to 8 in parallel) after the batch is generated. |
| `embed(text...)` | Per-batch | In exec/query (non-batch) queries, each call makes a separate API request. In exec_batch/query_batch queries, all `embed()` calls within a batch are deferred: placeholders are inserted during arg evaluation, then all pending texts are resolved in a single API call (or several if `--embed-max-batch` is set). For example, a 100-row batch with `--embed-max-batch 30` produces 4 API calls (30+30+30+10) instead of 100 individual calls. |
| `global_iter()` | Global | Monotonic counter incremented once per RunIteration call by any worker. Never resets. Shared across all workers via an atomic int64. Use for time-series seasonality and data drift. |
| `iter()` | Per-query | Returns 1 for the first row, 2 for the second, etc. Resets at the start of each batch query, so the first row of every batch query sees 1. |
| `nurand(A, x, y)` | Per-worker | The TPC-C constant C is generated once per worker per A value and stays fixed for the worker’s lifetime. |
| `ref_diff(name)` | Per-query | Returns a unique row on each call within a query (no repeats). The index resets before the next query. |
| `ref_exp(name, rate)` | None | Fresh random row on every call (exponential distribution). |
| `ref_lognorm(name, mu, sigma)` | None | Fresh random row on every call (log-normal distribution). |
| `ref_norm(name, mean, stddev)` | None | Fresh random row on every call (normal distribution). |
| `ref_perm(name)` | Per-worker | Picks a row on first call and returns that same row for the entire lifetime of the worker. Never resets. |
| `ref_rand(name)` | None | Fresh random row on every call. |
| `ref_same(name)` | Per-query | Picks a row on first call within a query; all subsequent `ref_same` calls for the same dataset within that query return the same row. Cleared before the next query. |
| `ref_zipf(name, s, v)` | None | Fresh random row on every call (Zipfian distribution). |
| `result()` / `results()` | Per-query | Return the current query’s result rows. Only available in post_print expressions. Set after each `type: query` execution; cleared after each `type: exec`. |
| `seq_global(name)` | Global | Single counter shared across all workers via atomic increment. Values are globally unique. Configured in the seq config section. |
| `seq_rand(name)` | Global | Picks uniformly from the already-generated values of a global sequence. The valid value set grows as `seq_global` advances the counter; no values are stored in memory. |
| `seq_zipf` / `seq_norm` / `seq_exp` / `seq_lognorm` | Global | Same as `seq_rand` but with shaped distributions. |
| `seq(start, step)` | Per-worker | The counter starts at 0 for each worker and increments on every call. Two workers both calling `seq(1, 1)` will independently produce the same sequence - values are not globally unique. |
| `uniq(expression [, ...])` | Per-query | Tracks seen values (or composite tuples) across all rows within a query. Composite calls are cached per row so multiple arg positions share the same tuple. Resets between queries. |
| `vector` / `vector_zipf` / `vector_norm` | Per-worker | Cluster centroids are generated on first call (keyed by dims+clusters) and reused for the worker’s lifetime. Each call picks a centroid (uniform, Zipfian, or normal) and adds noise. |
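The per-query scopes matter most when one logical row needs several values that agree. A sketch (dataset, table, and column names invented for illustration): `ref_same` guarantees both args come from the same cached row, whereas two `ref_rand` calls could each pick a different row:

```yaml
run:
  - name: insert_order_line
    type: exec
    args:
      - ref_same('products').id     # first call picks and caches a row
      - ref_same('products').price  # same row; cache cleared before the next iteration
    query: INSERT INTO order_lines (product_id, unit_price) VALUES ($1, $2)
```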