Queries#
Sections#
Each section (up, seed, deseed, down, init, run) contains a list of named queries:
up:
- name: create_users
query: |-
CREATE TABLE IF NOT EXISTS users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email STRING NOT NULL
)
seed:
- name: populate_users
type: exec_batch
count: 1000
size: 100
args:
- gen('email')
query: |-
INSERT INTO users (email)
__values__
up and down manage schema (CREATE/DROP). seed and deseed manage data (INSERT/TRUNCATE). init runs once per worker before the workload starts, typically to fetch reference data for use in run queries. run contains the workload queries executed in a loop. Queries can be standalone or grouped into transactions.
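For example, a minimal init/run pair fetches reference rows once per worker, then reuses them in the workload loop (an illustrative sketch; the users table and query names are assumed):
init:
  - name: fetch_users
    query: SELECT id FROM users
run:
  - name: read_user
    args:
      - ref_rand('fetch_users').id
    query: SELECT * FROM users WHERE id = $1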
Args#
The args field provides values for query placeholders ($1, $2, etc.). Each expression is evaluated at runtime using the full expression environment (globals, reference data, ref_* functions, generators, and more).
Args can be written in two forms: positional (list) or named (map). Both bind to $1, $2, $3, etc. in declaration order. The difference is how you reference previously computed args within expressions.
Positional args#
The default form is a list. Reference earlier args by zero-based index with arg(0), arg(1), etc.:
args:
- gen('email')
- ref_same('regions').name
- set_rand(ref_same('regions').cities, [])
- uniform(1, 500)
- arg(0) + " (" + arg(1) + ")" # depends on args 0 and 1
Named args#
The map form gives each arg a name. Reference earlier args by name with arg('name'):
args:
email: gen('email')
region: ref_same('regions').name
city: set_rand(ref_same('regions').cities, [])
amount: uniform(1, 500)
label_named: arg('email') + " (" + arg('region') + ")"
label_pos: arg(0) + " (" + arg(1) + ")" # Produces the same as label_named.
Named args bind to placeholders in declaration order (email -> $1, region -> $2, etc.), so the query SQL is identical to the positional form. Index-based access still works (arg(0) and arg('email') return the same value).
Named and positional forms are mutually exclusive per query. Use one or the other.
See _examples/named_args/ for a complete working example.
Inline Args#
Instead of separating args into their own field, you can inline expressions directly in the query using ${expr} syntax. Each ${...} is extracted and replaced with the appropriate $N placeholder at load time. Inline args are appended after any explicit args entries.
Before (explicit args):
- name: get_user
args: [ref_rand('fetch_users').id]
query: SELECT * FROM users WHERE id = $1
After (inline args):
- name: get_user
query: SELECT * FROM users WHERE id = ${ref_rand('fetch_users').id}
Both produce the same query at runtime. Inline args work in all sections and can be mixed with explicit args:
- name: update_user
args:
- gen('name')
query: |-
UPDATE users SET name = $1
WHERE id = ${ref_rand('fetch_users').id}
Here $1 is bound to the explicit arg (gen('name')) and $2 to the inline ${ref_rand('fetch_users').id}.
Query Types#
| Type | Description |
|---|---|
| query (default) | Executes the SQL and reads result rows. Results are stored in separate memory for each worker by query name, making them available to ref_* functions. |
| exec | Executes the SQL without reading results. Use for DDL, DML that returns no rows, or when results aren’t needed. |
| query_batch | Like query, but evaluates args repeatedly (controlled by count and size) and collects values into unit-separator-delimited (ASCII 31) strings per arg position. Each batch becomes a separate query execution whose results are stored. |
| exec_batch | Like exec, but evaluates args repeatedly (controlled by count and size) and collects values into unit-separator-delimited (ASCII 31) strings per arg position. Each batch becomes a separate exec. |
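query_batch follows the same mechanics as exec_batch but stores the rows each batch returns. A Postgres-flavoured sketch (illustrative; the users table, a fetch_users init query, and the UUID cast are assumed) that batch-reads 100 random users in a single statement:
run:
  - name: read_users_batch
    type: query_batch
    count: 100
    size: 100
    args:
      - ref_rand('fetch_users').id
    query: |-
      SELECT id, email FROM users
      WHERE id = ANY(string_to_array('$1', __sep__)::UUID[])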
Type Inference#
The type field is optional. When omitted, edg infers the type from the query text:
| SQL Prefix | Inferred Type |
|---|---|
| SELECT | query |
| INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, TRUNCATE, UPSERT | exec |
For MongoDB JSON commands, the first key determines the type:
| First Key | Inferred Type |
|---|---|
| find, aggregate, count, distinct | query |
| insert, update, delete, create, drop, createIndexes | exec |
If a write statement (INSERT, UPDATE, DELETE, UPSERT) contains a RETURNING clause (Postgres, CockroachDB, SQLite) or an OUTPUT INSERTED/OUTPUT DELETED clause (MSSQL), the inferred type is query instead of exec. Oracle’s RETURNING ... INTO is excluded because it captures into PL/SQL variables rather than returning rows to the driver.
When count and size are both set, the inferred type is auto-promoted to its batch variant (exec -> exec_batch, query -> query_batch).
An explicit type: always takes precedence.
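For example, a sketch of both rules in practice (tables and generators assumed):
run:
  - name: create_account
    # INSERT ... RETURNING returns rows, so the inferred type is query
    # and the returned row is stored for ref_* functions.
    query: |-
      INSERT INTO account (balance) VALUES (100) RETURNING id
seed:
  - name: populate_accounts
    # count + size promote the INSERT-inferred exec to exec_batch.
    count: 1000
    size: 100
    args:
      - uniform_f(1, 100, 2)
    query: |-
      INSERT INTO account (balance)
      __values__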
Batch Fields#
The query_batch and exec_batch types use two additional fields to control how args are generated and grouped:
| Field | Description |
|---|---|
| count | Total number of rows to generate. Evaluated as an expression, so it can reference globals. |
| size | Number of rows per batch. If omitted or zero, defaults to count (single batch). Also evaluated as an expression. |
| batch_format | Controls how batch values are serialized when using driver-specific batch expansion. Default uses the ASCII unit separator (char 31, \x1f). Set to json for JSON arrays (used with MSSQL OPENJSON). Not needed when using __values__. |
Each arg expression is evaluated once per row. For example, with count: 1000 and size: 100, you get 10 batches of 100 rows each.
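Because count is an expression, batch shape can be driven by shared configuration. A sketch, assuming a top-level globals: key and the global() accessor described elsewhere on this page (the exact key name is an assumption):
globals:
  rows: 10000
seed:
  - name: populate_users
    type: exec_batch
    count: global('rows')
    size: 100
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      __values__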
Multi-Row VALUES (__values__ and __columns__) - Recommended#
The __values__ token generates a standard multi-row VALUES clause. This is the recommended approach for batch inserts. It produces a single INSERT statement per batch and works the same way across all SQL drivers that support multi-row VALUES (pgx, mysql, mssql, spanner, dsql).
The __columns__ token auto-expands to the column list from the query’s object: object, in declaration order. Use both together to avoid duplicating column names between the object definition and the query:
objects:
contact:
name: gen('name')
email: gen('email')
seed:
- name: populate_contacts
count: 1000
size: 100
object: contact
query: |-
INSERT INTO contact __columns__ __values__
This expands to INSERT INTO contact (name, email) VALUES (...) at load time. The args are auto-populated from the object fields, and the type is inferred as exec_batch from the INSERT prefix plus count/size.
__columns__ requires object: to be set. Using __columns__ without object: is a validation error.
You can also write the explicit form without __columns__:
seed:
- name: populate_contacts
count: 1000
size: 100
args:
- gen('name')
- gen('email')
query: |-
INSERT INTO contact (name, email)
__values__Which resolves to the following query automatically by edg:
INSERT INTO contact (name, email)
VALUES ('Alice', 'alice@example.com'), ('Bob', 'bob@example.com'), ...
Each batch of size rows produces exactly one INSERT statement. With count: 1000 and size: 100, edg sends 10 INSERT statements, each containing 100 value tuples.
__values__ also works with upsert and update patterns:
Upsert (PostgreSQL / CockroachDB):
query: |-
INSERT INTO product (name, price)
__values__
ON CONFLICT (name) DO UPDATE SET price = EXCLUDED.price
Upsert (MySQL):
query: |-
INSERT INTO product (name, price)
__values__
ON DUPLICATE KEY UPDATE price = VALUES(price)
Upsert (MSSQL):
query: |-
MERGE INTO product AS target
USING (
__values__
) AS source(name, price)
ON target.name = source.name
WHEN MATCHED THEN UPDATE SET price = source.price
WHEN NOT MATCHED THEN INSERT (name, price) VALUES (source.name, source.price);
Update via CTE (PostgreSQL / CockroachDB):
query: |-
UPDATE product
SET price = v.price
FROM (
__values__
) AS v(id, price)
WHERE product.id = v.id::UUID
__values__ works with pgx, mysql, mssql, spanner, and dsql. SQL Server limits multi-row VALUES to 1000 rows per INSERT; set size accordingly. For MongoDB and Cassandra, use their native batch patterns described below.
Oracle (INSERT ALL)#
Oracle does not support multi-row VALUES. Instead, use the parameterized form __values__(table(col1, col2)) to generate an INSERT ALL ... SELECT 1 FROM DUAL statement:
seed:
- name: insert_products
type: exec_batch
count: 100
size: 10
args:
- gen('productname')
- uniform_f(1.00, 100.00, 2)
query: |-
INSERT ALL __values__(product(name, price))
This produces:
INSERT ALL
INTO product (name, price) VALUES ('Widget', 9.99)
INTO product (name, price) VALUES ('Gadget', 24.50)
...
SELECT 1 FROM DUAL
The table name and column list are extracted from the token parameters. Non-Oracle drivers ignore the parameters and use standard multi-row VALUES syntax.
See _examples/multi_row_dml/ for complete working examples across multiple databases.
Driver-Specific Batch Expansion#
As an alternative to __values__, edg supports driver-specific batch expansion patterns using __sep__ (the ASCII unit separator). These are more verbose but give you full control over the SQL shape (useful for non-INSERT contexts or advanced patterns).
Postgres / CockroachDB#
seed:
- name: populate_users
type: exec_batch
count: 1000
size: 100
args:
- gen('email')
query: |-
INSERT INTO users (email)
SELECT unnest(string_to_array('$1', __sep__))
Which resolves to:
INSERT INTO users (email)
SELECT unnest(string_to_array(
'a@x.com\x1fb@y.com\x1f...\x1fz@x.com', chr(31)
))
MySQL#
MySQL uses JSON_TABLE to unpack batch values. The unit-separator-delimited string is converted to a JSON array using CONCAT and REPLACE, then JSON_TABLE extracts each element as a row. Multiple columns are correlated using FOR ORDINALITY:
seed:
- name: populate_users
type: exec_batch
count: 1000
size: 100
args:
- gen('email')
query: |-
INSERT INTO users (email)
SELECT j.val
FROM JSON_TABLE(
CONCAT('["', REPLACE('$1', __sep__, '","'), '"]'),
'$[*]' COLUMNS(val VARCHAR(255) PATH '$')
) j
Which resolves to:
INSERT INTO users (email)
SELECT j.val
FROM JSON_TABLE(
CONCAT('["',
REPLACE(
'a@x.com\x1fb@y.com\x1f...\x1fz@x.com',
CHAR(31), '","'
),
'"]'),
'$[*]' COLUMNS(val VARCHAR(255) PATH '$')
) j
Oracle#
Oracle batch values are unpacked using xmltable with tokenize. Multiple columns are correlated by joining on rowid:
seed:
- name: populate_users
type: exec_batch
count: 3
size: 3
args:
- gen('name')
- gen('email')
query: |-
INSERT INTO users (name, email)
SELECT x1.value, x2.value
FROM xmltable(
'for $s in tokenize($v, __sep__) return <r>{$s}</r>'
PASSING '$1' AS "v"
COLUMNS value VARCHAR2(255) PATH '.'
) x1
JOIN xmltable(
'for $s in tokenize($v, __sep__) return <r>{$s}</r>'
PASSING '$2' AS "v"
COLUMNS value VARCHAR2(255) PATH '.'
) x2 ON x1.rowid = x2.rowid
Which resolves to:
INSERT INTO users (name, email)
SELECT x1.value, x2.value
FROM xmltable(
'for $s in tokenize($v, codepoints-to-string(31)) return <r>{$s}</r>'
PASSING 'Alice\x1fBob\x1fCharlie' AS "v"
COLUMNS value VARCHAR2(255) PATH '.'
) x1
JOIN xmltable(
'for $s in tokenize($v, codepoints-to-string(31)) return <r>{$s}</r>'
PASSING 'a@x.com\x1fb@y.com\x1fc@z.com' AS "v"
COLUMNS value VARCHAR2(255) PATH '.'
) x2 ON x1.rowid = x2.rowid
MSSQL#
When batch_format is set to json, each arg position is serialized as a properly escaped JSON array (e.g. ["val1","val2","val3"]). Multiple OPENJSON calls are correlated using [key], which is the zero-based array index. NULL values appear as JSON null and can be handled with NULLIF(j.value, 'null') if the target column is nullable:
seed:
- name: populate_contacts
type: exec_batch
batch_format: json
count: 1000
size: 100
args:
- gen('name')
- gen('email')
query: |-
INSERT INTO contact (name, email)
SELECT j1.value, j2.value
FROM OPENJSON('$1') j1
JOIN OPENJSON('$2') j2 ON j1.[key] = j2.[key]
Which resolves to:
INSERT INTO contact (name, email)
SELECT j1.value, j2.value
FROM OPENJSON('["Alice","Bob",...,"Zara"]') j1
JOIN OPENJSON('["a@x.com","b@y.com",...,"z@x.com"]') j2
ON j1.[key] = j2.[key]
Spanner (GoogleSQL)#
Spanner uses GoogleSQL syntax. Batch values are unpacked with UNNEST(SPLIT(..., __sep__)). The __sep__ token resolves to CODE_POINTS_TO_STRING([31]) for Spanner, and UNNEST converts the resulting array into rows. Multiple columns are correlated using WITH OFFSET:
seed:
- name: populate_users
type: exec_batch
count: 1000
size: 100
args:
- gen('email')
query: |-
INSERT INTO users (email)
SELECT val
FROM UNNEST(SPLIT('$1', __sep__)) AS val
Which resolves to:
INSERT INTO users (email)
SELECT val
FROM UNNEST(SPLIT(
'a@x.com\x1fb@y.com\x1f...\x1fz@x.com',
CODE_POINTS_TO_STRING([31])
)) AS val
For multiple columns, use separate UNNEST calls joined with WITH OFFSET:
seed:
- name: populate_contacts
type: exec_batch
count: 1000
size: 100
args:
- gen('name')
- gen('email')
query: |-
INSERT INTO contact (name, email)
SELECT n, e
FROM UNNEST(SPLIT('$1', __sep__)) AS n WITH OFFSET o1
JOIN UNNEST(SPLIT('$2', __sep__)) AS e WITH OFFSET o2
ON o1 = o2
Spanner limitations:
- No TRUNCATE - use DELETE FROM table WHERE TRUE for deseed operations.
- Use INSERT OR UPDATE for upserts instead of ON CONFLICT.
- Drop indexes before tables - Spanner requires all indexes on a table to be dropped before the table itself. Add DROP INDEX IF EXISTS idx_name entries in the down section before the corresponding DROP TABLE.
- No RAND() - use MOD(ABS(FARM_FINGERPRINT(GENERATE_UUID())), N) for random integers in range [0, N).
- No CHR() - use CODE_POINTS_TO_STRING([code_point]) instead.
- No UNNEST(...) AS v(col1, col2, ...) - Spanner does not support column aliasing on UNNEST. Use __values__ for batch inserts, or UNNEST(...) AS val WITH OFFSET for single-column expansion.
- Strict typing with bind params - gen('number:...') returns float64, which Spanner rejects for INT64 columns when using bind params (@pN). Wrap in int(): int(gen('number:1,100')).
- String bind params - if a value needs to be STRING for Spanner but the expression returns a number, use inlined $1/'$1' placeholders instead of @pN bind params.
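Putting the first three limitations together, a deseed/down pair for Spanner might look like this (an illustrative sketch; table and index names are assumed):
deseed:
  - name: clear_users
    type: exec
    query: DELETE FROM users WHERE TRUE
down:
  - name: drop_users_email_idx
    type: exec
    query: DROP INDEX IF EXISTS idx_users_email
  - name: drop_users
    type: exec
    query: DROP TABLE IF EXISTS users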
Cassandra#
CQL has no equivalent of chr(31), so __sep__ resolves to the raw unit separator byte (\x1f). Cassandra does not need __sep__ in query text for batch expansion because exec_batch produces one statement per row automatically and edg wraps them in an unlogged batch. See the Cassandra section below for the recommended pattern.
For comparison with other drivers:
| Driver | __sep__ resolves to |
|---|---|
| PostgreSQL / CockroachDB | chr(31) |
| MySQL / MSSQL | CHAR(31) |
| Spanner | CODE_POINTS_TO_STRING([31]) |
| Oracle | codepoints-to-string(31) |
| Cassandra | raw \x1f byte |
MongoDB#
MongoDB queries use BSON/JSON command syntax instead of SQL. Placeholders ($1, $2, etc.) are inlined into the JSON command text. Batch operations insert multiple documents in a single command:
seed:
- name: insert_customers
type: exec_batch
count: 1000
args:
- gen('uuid')
- gen('email')
query: |-
{"insert": "customer", "documents": [{"_id": $1, "email": $2}]}Schema operations use JSON commands:
up:
- name: create_customer
type: exec
query: |-
{"create": "customer"}
down:
- name: drop_customer
type: exec
query: |-
{"drop": "customer"}Read queries use the find command:
init:
- name: fetch_customers
query: |-
{"find": "customer", "filter": {}}Cassandra#
Cassandra uses CQL (Cassandra Query Language). Placeholders ($1, $2, etc.) are converted to ? automatically. Batch operations use Cassandra’s unlogged batch internally. Unlike SQL drivers where __sep__ resolves to a database function like chr(31), the Cassandra driver uses the raw unit separator byte (\x1f) directly, since CQL has no equivalent function. The __sep__ token works in both standalone and transaction contexts. In transactions, each separated statement is added as an individual entry in the logged batch.
seed:
- name: insert_customers
type: exec_batch
count: 1000
size: 100
args:
- gen('uuid')
- gen('email')
query: |-
INSERT INTO edg.customer (id, email) VALUES ($1, $2)
Which resolves to an unlogged batch containing one statement per row:
BEGIN UNLOGGED BATCH
INSERT INTO edg.customer (id, email) VALUES ('a1b2...', 'alice@x.com')
INSERT INTO edg.customer (id, email) VALUES ('c3d4...', 'bob@y.com')
...
APPLY BATCH
With count: 1000 and size: 100, edg sends 10 batches of 100 INSERT statements each.
Cassandra requires a keyspace to be created before tables:
up:
- name: create_keyspace
type: exec
query: |-
CREATE KEYSPACE IF NOT EXISTS edg
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
- name: create_customer
type: exec
query: |-
CREATE TABLE IF NOT EXISTS edg.customer (
id UUID PRIMARY KEY,
email TEXT
)
Prepared Statements#
Setting prepared: true on a run query causes the SQL statement to be prepared once per worker and reused across iterations. This reduces server-side parse overhead for high-throughput workloads by allowing the database to cache the query plan.
run:
- name: lookup_product
type: query
prepared: true
args:
- ref_rand('fetch_products').id
query: |-
SELECT id, name, price FROM product WHERE id = $1
- name: update_price
type: exec
prepared: true
args:
- ref_rand('fetch_products').id
- uniform_f(1, 100, 2)
query: |-
UPDATE product SET price = $2 WHERE id = $1
Prepared statements work with both query and exec types. They are not used for batch types (query_batch, exec_batch) or queries that undergo batch expansion (via gen_batch, batch, or ref_each), since the SQL changes on each execution in those cases.
Each worker maintains its own statement cache, so prepared statements are safe to use with any number of concurrent workers. Statements are prepared lazily on first use and automatically closed when the worker finishes.
Prepared queries always use $1, $2, … placeholders regardless of the target driver. edg automatically translates them to the driver’s native format (? for mysql, :N for oracle, @pN for mssql/spanner) at prepare time.
The benefit scales with query complexity. Simple point lookups show minimal improvement, but multi-table joins and aggregations can see significant gains. For example, a 4-table join with GROUP BY, HAVING, and multiple aggregates against CockroachDB:
QUERY AVG p50 p95 p99
category_revenue 5.671ms 5.362ms 7.351ms 11.393ms
category_revenue_no_prepare 7.493ms 7.099ms 9.505ms 14.5ms
order_details 3.353ms 3.151ms 4.453ms 7.839ms
order_details_no_prepare 4.377ms 4.258ms 6.37ms 8.354ms
See _examples/prepared/ for complete working examples across all supported databases.
Transactions#
The run section supports grouping multiple queries into an explicit database transaction. Queries inside a transaction block are wrapped in BEGIN/COMMIT and execute against the same database connection, so reads and writes within a transaction see each other’s results.
run:
- transaction: make_transfer
locals:
amount: gen('number:1,100')
queries:
- name: read_source
type: query
args:
- ref_diff('fetch_accounts').id
query: SELECT id, balance FROM account WHERE id = $1::UUID
- name: read_target
type: query
args:
- ref_diff('fetch_accounts').id
query: SELECT id, balance FROM account WHERE id = $1::UUID
- name: debit_source
type: exec
args:
- ref_same('read_source').id
- local('amount')
query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID
- name: credit_target
type: exec
args:
- ref_same('read_target').id
- local('amount')
query: UPDATE account SET balance = balance + $2::FLOAT WHERE id = $1::UUID
The locals map defines transaction-scoped variables. Each expression is evaluated once when the transaction begins, and the result is available to all queries via local('name'). This ensures the same value is used consistently across multiple queries: for example, the same transfer amount for both the debit and credit.
Local names must not collide with query names in the same transaction.
Transactions and standalone queries can coexist in the same run section:
run:
- name: check_balance
type: query
args:
- ref_rand('fetch_accounts').id
query: SELECT balance FROM account WHERE id = $1::UUID
- transaction: make_transfer
locals:
amount: gen('number:1,100')
queries:
- name: read_source
type: query
args:
- ref_diff('fetch_accounts').id
query: SELECT id, balance FROM account WHERE id = $1::UUID
- name: read_target
type: query
args:
- ref_diff('fetch_accounts').id
query: SELECT id, balance FROM account WHERE id = $1::UUID
- name: debit_source
type: exec
args:
- ref_same('read_source').id
- local('amount')
query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID
- name: credit_target
type: exec
args:
- ref_same('read_target').id
- local('amount')
query: UPDATE account SET balance = balance + $2::FLOAT WHERE id = $1::UUID
When to use transactions#
Use transactions when your workload needs read-then-write patterns or multi-statement atomicity. For example:
- Read an account balance, then debit it (the read and write must see consistent data).
- Insert a row into two related tables that must either both succeed or both roll back.
- Simulate realistic application behaviour where multiple queries happen inside a single database transaction.
Use standalone queries when each operation is independent and doesn’t need transactional guarantees.
Conditional rollback#
A rollback_if element can be placed between queries in a transaction. When reached, its expression is evaluated; if it returns true, the transaction is rolled back immediately. This is not treated as an error; the worker continues to the next iteration.
run:
- name: check_balance
type: query
args:
- ref_rand('fetch_accounts').id
query: SELECT balance FROM account WHERE id = $1::UUID
- transaction: make_transfer
locals:
amount: gen('number:1,100')
queries:
- name: read_source
type: query
args: [ref_diff('fetch_accounts').id]
query: SELECT id, balance FROM account WHERE id = $1::UUID
- rollback_if: "ref_same('read_source').balance < local('amount')"
- name: debit_source
type: exec
args:
- ref_same('read_source').id
- local('amount')
query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID
In this example, the transfer amount is generated once as a local. After read_source runs, the condition checks whether the account balance can cover the transfer amount. If not, the transaction rolls back before debit_source executes. Multiple rollback_if elements can be placed at different points in the transaction to check different conditions.
The expression has access to all data in the environment, including results from queries that have already run within the transaction. A rollback_if element must not have name, type, args, or query fields.
For general-purpose branching (if/then/else and match/when/default), see the Conditionals page.
Constraints#
- Batch types not allowed: query_batch and exec_batch cannot be used inside a transaction.
- Prepared statements not allowed: queries inside a transaction cannot set prepared: true.
- A transaction must contain at least one query.
- Transaction names and standalone query names share the same namespace for run_weights.
- The rollback_if expression must evaluate to a boolean.
Non-transactional mode (--no-atomic-tx)#
Pass --no-atomic-tx (or set EDG_NO_ATOMIC_TX=true) to execute transaction queries without BEGIN/COMMIT. Queries still run sequentially with shared locals and ref_same context, but each statement commits independently. This is useful for comparing database behaviour with and without transactional guarantees.
# Transactional (default)
edg workload consistency all --driver pgx --url "..." -w 16 -d 30s
# Non-transactional
edg workload consistency all --driver pgx --url "..." -w 16 -d 30s --no-atomic-tx
When --no-atomic-tx is set:
- Queries execute against the connection pool directly (no BEGIN/COMMIT).
- locals and ref_same still work, and shared state is maintained in memory.
- rollback_if conditions are skipped, allowing concurrency anomalies to manifest.
Error handling#
If any query inside a transaction fails, the transaction is rolled back. During the run phase, the error is logged and the worker continues to the next iteration (same as standalone query errors). Conditional rollbacks (via rollback_if) are not errors and do not appear in the error rate.
Cassandra transactions#
Cassandra does not support ACID transactions. When you use a transaction block with the Cassandra driver, edg uses a logged batch internally. This has important implications:
- Writes are atomic: all exec queries in the transaction are accumulated into a single batch and sent to the database on commit. Either all writes succeed or none do.
- __sep__ is supported: queries containing __sep__ are split into individual statements, each added to the logged batch as a separate entry. This lets you use batch-format queries inside transactions.
- Reads bypass the batch: any query type inside the transaction executes immediately against the session, outside the batch. Reads do not see uncommitted writes from the same transaction.
- Rollback is a no-op: if a rollback_if condition triggers or a query fails, the batch is simply discarded. There is no undo of already-executed reads.
rollback_if has no effect with the Cassandra driver. The condition is evaluated, but the rollback does nothing: the unsent batch is silently discarded, and any reads that already executed cannot be undone. If your config includes rollback_if, edg will log a warning at startup. For conditional write logic, use lightweight transactions (LWT) with IF conditions in individual queries instead.
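A sketch of the LWT alternative (illustrative; the edg.product table, its columns, and the fetch_products init query are assumed):
run:
  - name: reserve_product
    type: exec
    args:
      - ref_rand('fetch_products').id
    query: |-
      UPDATE edg.product SET reserved = true
      WHERE id = $1 IF reserved = false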
This makes Cassandra transactions suitable for grouping related writes that must succeed together, but not for read-then-write patterns that require isolation.
For example:
run:
- transaction: create_order
queries:
- name: insert_order
type: exec
args:
- gen('uuid')
- gen('email')
query: |-
INSERT INTO edg.orders (id, email) VALUES ($1, $2)
- name: insert_audit
type: exec
args:
- gen('uuid')
- const('order_created')
query: |-
INSERT INTO edg.audit_log (id, event) VALUES ($1, $2)
Which resolves to a logged batch:
BEGIN BATCH
INSERT INTO edg.orders (id, email) VALUES ('a1b2...', 'alice@x.com')
INSERT INTO edg.audit_log (id, event) VALUES ('c3d4...', 'order_created')
APPLY BATCH
Compare this to standalone exec_batch queries, which use an unlogged batch (BEGIN UNLOGGED BATCH). Transaction blocks always use a logged batch, guaranteeing atomicity across partitions.
MongoDB transactions#
MongoDB transactions use sessions and provide full ACID guarantees. Both reads and writes participate in the transaction and see each other’s results, matching the behaviour of SQL drivers.
MongoDB transactions require a replica set or sharded cluster. Standalone mongod instances do not support multi-document transactions. If you are testing locally, use mongod --replSet rs0 or a Docker Compose setup with replica set initialisation.
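A transaction block with the MongoDB driver looks the same as with SQL drivers. A minimal sketch (illustrative; the orders collection is assumed), where the read sees the uncommitted insert from the same transaction:
run:
  - transaction: create_order
    locals:
      order_id: gen('uuid')
    queries:
      - name: insert_order
        type: exec
        args:
          - local('order_id')
          - gen('email')
        query: |-
          {"insert": "orders", "documents": [{"_id": $1, "email": $2}]}
      - name: read_order
        type: query
        args:
          - local('order_id')
        query: |-
          {"find": "orders", "filter": {"_id": $1}}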
Wait#
Queries can specify a wait duration (e.g. wait: 18s) to introduce a keying/think-time delay after execution. This only applies to queries in the run section and is ignored in other sections.
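For example (illustrative; table assumed), to pause two seconds after each product browse:
run:
  - name: browse_products
    type: query
    wait: 2s
    query: SELECT id, name FROM product LIMIT 10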
Print#
The print field accepts a list of expressions that are evaluated each iteration and aggregated across all workers for display in the progress and summary output.
Simple form#
A plain string entry auto-detects the value type. String values show frequency distributions (top 10 by count) and numeric values show min/avg/max.
run:
- name: insert_order
type: exec
args:
- gen('email')
- ref_same('regions').name
- set_rand(ref_same('regions').cities, [])
- uniform(1, 500)
print:
- ref_same('regions').name
- arg(3)
query: |-
INSERT INTO print_order (email, region, city, amount)
VALUES ($1, $2, $3, $4::DECIMAL)
With named args, the amount print becomes more readable:
print:
- ref_same('regions').name
- arg('amount')
Print expressions have access to the same context as query args: ref_same, ref_rand, arg(), global(), local(), and all built-in functions. Expressions using ref_same see the same row selected for the query args in that iteration.
Custom aggregation#
Use the map form with expr and agg fields for full control over how values are aggregated and displayed. The agg field is an expr-lang expression evaluated against the accumulated state:
| Variable | Type | Description |
|---|---|---|
| count | int | Total observations |
| freq | map[string]int | Value frequency distribution |
| min | float | Minimum numeric value |
| max | float | Maximum numeric value |
| avg | float | Mean of numeric values |
| sum | float | Sum of numeric values |
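For example, a map-form entry that reports a running dollar total might look like this (an illustrative sketch; the arg name and agg string are assumed):
print:
  - expr: arg('amount')
    agg: "'total=$' + string(int(sum))"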
These variables can be combined in any expr-lang expression to produce custom summary output:
PRINT VALUES
insert_order eu=1647 ap=1604 us=1585
insert_order min=248.87 avg=1.01 max=499.93 n=4836
insert_order paris=552 london=548 berlin=547 sydney=542 tokyo=535
insert_order avg $248 n=4836
insert_order total=$1203541
insert_order 1 - 499
insert_order 9 unique
read_order us=1699 eu=1602 ap=1531
read_order 4832
See _examples/print/ for a complete working example.
Post Print#
The post_print field works like print but evaluates after the query executes. This gives access to result(), which returns the first row of a type: query result as a map.
run:
- name: check_total_balance
type: query
query: |-
SELECT SUM(balance) AS total, COUNT(*) AS num_accounts
FROM account
post_print:
- expr: result().total
agg: "string(int(min)) + '..' + string(int(max)) + ' n=' + string(count)"
- expr: result().num_accounts
agg: "string(int(min)) + '..' + string(int(max))"result() returns a map with column names as keys. Access columns with dot notation: result().total, result().num_accounts. If no result is available (e.g. the query is type: exec), result() returns an error.
results() - all rows#
results() returns every row from the SELECT result as a slice of maps ([]map). Combine it with expr-lang builtins to aggregate, filter, or transform across all rows:
run:
- name: verify_balances
type: query
query: |-
SELECT id, balance FROM account
post_print:
# total rows returned
- expr: len(results())
# sum all balances
- expr: reduce(results(), #acc + #.balance, 0)
# find largest balance
- expr: reduce(results(), cond(#.balance > #acc, #.balance, #acc), 0)
# count overdrawn accounts
- expr: len(filter(results(), #.balance < 0))
# collect ids of high-value accounts
- expr: join(map(filter(results(), #.balance > 5000), string(int(#.id))), ",")
post_print supports the same simple and custom aggregation forms as print. Both fields can be used on the same query - print for input distributions, post_print for output observations:
PRINT VALUES
check_total_balance 10000..10000 n=470
check_total_balance 10..10
Placeholders#
Arg placeholders ($1, $2, etc.) are passed to the database in one of two ways: inlined or as bind params.
Inlining#
Inlining means edg performs a text replacement on the SQL string before sending it to the database. Every $N in the query (or __values__ token) is replaced with the literal arg values. For example, with __values__:
type: exec_batch
count: 3
args:
- gen('email')
query: |-
INSERT INTO users (email)
__values__
The SQL sent to the database becomes:
INSERT INTO users (email)
VALUES ('alice@x.com'), ('bob@y.com'), ('carol@z.com')
With driver-specific batch expansion, $N placeholders are inlined similarly:
args:
- gen_batch(1000, 100, 'email')
query: |-
INSERT INTO users (email)
SELECT unnest(string_to_array('$1', __sep__))
If $1 evaluates to alice@x.com\x1fbob@y.com\x1f..., the SQL sent to the database becomes:
INSERT INTO users (email)
SELECT unnest(string_to_array('alice@x.com\x1fbob@y.com\x1f...', chr(31)))
The database never sees $1 or __values__; it receives a fully formed query with the values baked in. This is used for:
- __values__ token replacement (recommended).
- query_batch / exec_batch types with $N placeholders (always inlined).
- Batch-expanded queries using gen_batch, batch, or ref_each (in any section).
Inlining lets you use $N as a universal placeholder syntax across all drivers (pgx, MySQL, Oracle, SQL Server) without worrying about driver-specific bind param formats. It also avoids a pgx-stdlib issue where numeric values are sent as DECIMAL, which CockroachDB can’t mix with INT in arithmetic.
Bind params#
All other queries use native driver bind parameters. The placeholder stays in the SQL and the values are sent separately, allowing the database to cache query plans and avoid re-parsing.
Each driver has its own placeholder format:
| Driver | Placeholder format |
|---|---|
| pgx (PostgreSQL / CockroachDB) | $1, $2, $3 |
| dsql (Aurora DSQL) | $1, $2, $3 |
| mysql | ? (positional) |
| oracle | :1, :2, :3 |
| mssql | @p1, @p2, @p3 |
| spanner | @p1, @p2, @p3 |
| mongodb | $1, $2, $3 (inlined into JSON commands) |
| cassandra | ? (positional) |
Since run queries always use bind params, their SQL must use the correct format for the target driver.
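For example, the lookup from the prepared-statements section written as a plain (non-prepared) mysql run query uses ? (a sketch; table and reference data assumed):
run:
  - name: lookup_product
    type: query
    args:
      - ref_rand('fetch_products').id
    query: SELECT id, name, price FROM product WHERE id = ?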
Column Name Normalisation#
When a query or query_batch result is stored, all column names are lowercased before being added to the environment. This means a SQL column W_ID becomes accessible as ref_rand('fetch_warehouses').w_id, not .W_ID.
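For example (an illustrative sketch using the TPC-C style naming above):
init:
  - name: fetch_warehouses
    query: SELECT W_ID, W_NAME FROM warehouse
run:
  - name: read_warehouse
    args:
      - ref_rand('fetch_warehouses').w_id
    query: SELECT * FROM warehouse WHERE w_id = $1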