Queries#

Sections#

Each section (up, seed, deseed, down, init, run) contains a list of named queries:

up:
  - name: create_users
    query: |-
      CREATE TABLE IF NOT EXISTS users (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        email STRING NOT NULL
      )

seed:
  - name: populate_users
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      __values__
  • up and down manage schema (CREATE/DROP).
  • seed and deseed manage data (INSERT/TRUNCATE).
  • init runs once per worker before the workload starts, typically to fetch reference data for use in run queries.
  • run contains the workload queries executed in a loop. Queries can be standalone or grouped into transactions.
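As a sketch of how init feeds run, assuming the users table from the example above (the query and column names are illustrative):

```yaml
init:
  # Runs once per worker; result rows are stored under the query name.
  - name: fetch_users
    query: SELECT id FROM users

run:
  # Runs in a loop; each iteration picks a random id from the init results.
  - name: read_user
    args:
      - ref_rand('fetch_users').id
    query: SELECT * FROM users WHERE id = $1
```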

Args#

The args field provides values for query placeholders ($1, $2, etc.). Each expression is evaluated at runtime using the full expression environment (globals, reference data, ref_* functions, generators, and more).

Args can be written in two forms: positional (list) or named (map). Both bind to $1, $2, $3, etc. in declaration order. The difference is how you reference previously computed args within expressions.

Positional args#

The default form is a list. Reference earlier args by zero-based index with arg(0), arg(1), etc.:

args:
  - gen('email')
  - ref_same('regions').name
  - set_rand(ref_same('regions').cities, [])
  - uniform(1, 500)
  - arg(0) + " (" + arg(1) + ")" # depends on args 0 and 1

Named args#

The map form gives each arg a name. Reference earlier args by name with arg('name'):

args:
  email: gen('email')
  region: ref_same('regions').name
  city: set_rand(ref_same('regions').cities, [])
  amount: uniform(1, 500)
  label_named: arg('email') + " (" + arg('region') + ")"
  label_pos: arg(0) + " (" + arg(1) + ")" # Produces the same as label_named.

Named args bind to placeholders in declaration order (email -> $1, region -> $2, etc.), so query SQL is identical to the positional form. Index-based access still works (arg(0) and arg('email') return the same value).
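For instance, a sketch showing the declaration-order binding (the table name is illustrative):

```yaml
- name: insert_order
  args:
    email: gen('email')
    amount: uniform(1, 500)
  query: |-
    INSERT INTO orders (email, amount)
    VALUES ($1, $2)  -- email -> $1, amount -> $2
```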

Named and positional forms are mutually exclusive per query. Use one or the other.

See _examples/named_args/ for a complete working example.

Inline Args#

Instead of separating args into their own field, you can inline expressions directly in the query using ${expr} syntax. Each ${...} is extracted and replaced with the appropriate $N placeholder at load time. Inline args are appended after any explicit args entries.

Before (explicit args):

- name: get_user
  args: [ref_rand('fetch_users').id]
  query: SELECT * FROM users WHERE id = $1

After (inline args):

- name: get_user
  query: SELECT * FROM users WHERE id = ${ref_rand('fetch_users').id}

Both produce the same query at runtime. Inline args work in all sections and can be mixed with explicit args:

- name: update_user
  args:
    - gen('name')
  query: |-
    UPDATE users SET name = $1
    WHERE id = ${ref_rand('fetch_users').id}

Here $1 is bound to the explicit arg (gen('name')) and $2 is bound to the inline ${ref_rand('fetch_users').id}.

Query Types#

  • query (default): Executes the SQL and reads result rows. Results are stored in separate memory for each worker by query name, making them available to ref_* functions.
  • exec: Executes the SQL without reading results. Use for DDL, DML that returns no rows, or when results aren’t needed.
  • query_batch: Like query, but evaluates args repeatedly (controlled by count and size) and collects values into unit-separator-delimited (ASCII 31) strings per arg position. Each batch becomes a separate query execution whose results are stored.
  • exec_batch: Like exec, but evaluates args repeatedly (controlled by count and size) and collects values into unit-separator-delimited (ASCII 31) strings per arg position. Each batch becomes a separate exec.

Type Inference#

The type field is optional. When omitted, edg infers the type from the query text:

SQL Prefix                                                     Inferred Type
SELECT                                                         query
INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, TRUNCATE, UPSERT  exec

For MongoDB JSON commands, the first key determines the type:

First Key                                            Inferred Type
find, aggregate, count, distinct                     query
insert, update, delete, create, drop, createIndexes  exec

If a write statement (INSERT, UPDATE, DELETE, UPSERT) contains a RETURNING clause (Postgres, CockroachDB, SQLite) or an OUTPUT INSERTED/OUTPUT DELETED clause (MSSQL), the inferred type is query instead of exec. Oracle’s RETURNING ... INTO is excluded because it captures into PL/SQL variables rather than returning rows to the driver.
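For example, a sketch of a write whose RETURNING clause causes the inferred type to be query, so its rows are stored for ref_* use (the table and column names are illustrative):

```yaml
run:
  - name: insert_user
    # No explicit type: the RETURNING clause makes edg infer query, not exec.
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      VALUES ($1)
      RETURNING id
```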

When count and size are both set, the inferred type is auto-promoted to its batch variant (exec -> exec_batch, query -> query_batch).

An explicit type: always takes precedence.

Batch Fields#

The query_batch and exec_batch types use additional fields to control how args are generated and grouped:

  • count: Total number of rows to generate. Evaluated as an expression, so it can reference globals.
  • size: Number of rows per batch. If omitted or zero, defaults to count (single batch). Also evaluated as an expression.
  • batch_format: Controls how batch values are serialized when using driver-specific batch expansion. The default uses the ASCII unit separator (char 31, \x1f). Set to json for JSON arrays (used with MSSQL OPENJSON). Not needed when using __values__.

Each arg expression is evaluated once per row. For example, with count: 1000 and size: 100, you get 10 batches of 100 rows each.
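Since count is evaluated as an expression, it can reference a global. A sketch, assuming globals are declared in a top-level globals: map (that key name is an assumption):

```yaml
globals:
  total_rows: 10000

seed:
  - name: populate_users
    count: global('total_rows')  # expression: resolves to 10000
    size: 500                    # 20 batches of 500 rows each
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      __values__
```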

The __values__ token generates a standard multi-row VALUES clause. This is the recommended approach for batch inserts. It produces a single INSERT statement per batch and works the same way across all SQL drivers that support multi-row VALUES (pgx, mysql, mssql, spanner, dsql).

The __columns__ token auto-expands to the column list of the object referenced by the query’s object: field, in declaration order. Use the two tokens together to avoid duplicating column names between the object definition and the query:

objects:
  contact:
    name: gen('name')
    email: gen('email')

seed:
  - name: populate_contacts
    count: 1000
    size: 100
    object: contact
    query: |-
      INSERT INTO contact __columns__ __values__

This expands to INSERT INTO contact (name, email) VALUES (...) at load time. The args are auto-populated from the object fields and the type is inferred as exec_batch from the INSERT prefix plus count/size.

__columns__ requires object: to be set. Using __columns__ without object: is a validation error.

You can also write the explicit form without __columns__:

seed:
  - name: populate_contacts
    count: 1000
    size: 100
    args:
      - gen('name')
      - gen('email')
    query: |-
      INSERT INTO contact (name, email)
      __values__

Which edg automatically resolves to:

INSERT INTO contact (name, email)
VALUES ('Alice', 'alice@example.com'), ('Bob', 'bob@example.com'), ...

Each batch of size rows produces exactly one INSERT statement. With count: 1000 and size: 100, edg sends 10 INSERT statements, each containing 100 value tuples.

__values__ also works with upsert and update patterns:

Upsert (PostgreSQL / CockroachDB):

query: |-
  INSERT INTO product (name, price)
  __values__
  ON CONFLICT (name) DO UPDATE SET price = EXCLUDED.price

Upsert (MySQL):

query: |-
  INSERT INTO product (name, price)
  __values__
  ON DUPLICATE KEY UPDATE price = VALUES(price)

Upsert (MSSQL):

query: |-
  MERGE INTO product AS target
  USING (
    __values__
  ) AS source(name, price)
  ON target.name = source.name
  WHEN MATCHED THEN UPDATE SET price = source.price
  WHEN NOT MATCHED THEN INSERT (name, price) VALUES (source.name, source.price);

Update via CTE (PostgreSQL / CockroachDB):

query: |-
  UPDATE product
  SET price = v.price
  FROM (
    __values__
  ) AS v(id, price)
  WHERE product.id = v.id::UUID

__values__ works with pgx, mysql, mssql, spanner, and dsql. SQL Server limits multi-row VALUES to 1000 rows per INSERT; set size accordingly. For MongoDB and Cassandra, use their native batch patterns described below.
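For example, a sketch of a seed query sized for SQL Server’s 1000-row cap:

```yaml
seed:
  - name: populate_contacts
    count: 50000
    size: 1000   # MSSQL allows at most 1000 rows per multi-row INSERT
    args:
      - gen('name')
      - gen('email')
    query: |-
      INSERT INTO contact (name, email)
      __values__
```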

Oracle (INSERT ALL)#

Oracle does not support multi-row VALUES. Instead, use the parameterized form __values__(table(col1, col2)) to generate an INSERT ALL ... SELECT 1 FROM DUAL statement:

seed:
  - name: insert_products
    type: exec_batch
    count: 100
    size: 10
    args:
      - gen('productname')
      - uniform_f(1.00, 100.00, 2)
    query: |-
      INSERT ALL __values__(product(name, price))

This produces:

INSERT ALL
INTO product (name, price) VALUES ('Widget', 9.99)
INTO product (name, price) VALUES ('Gadget', 24.50)
...
SELECT 1 FROM DUAL

The table name and column list are extracted from the token parameters. Non-Oracle drivers ignore the parameters and use standard multi-row VALUES syntax.

See _examples/multi_row_dml/ for complete working examples across multiple databases.

Driver-Specific Batch Expansion#

As an alternative to __values__, edg supports driver-specific batch expansion patterns using __sep__ (the ASCII unit separator). These are more verbose but give you full control over the SQL shape (useful for non-INSERT contexts or advanced patterns).

Postgres / CockroachDB#

seed:
  - name: populate_users
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      SELECT unnest(string_to_array('$1', __sep__))

Which resolves to:

INSERT INTO users (email)
SELECT unnest(string_to_array(
  'a@x.com\x1fb@y.com\x1f...\x1fz@x.com', chr(31)
))

MySQL#

MySQL uses JSON_TABLE to unpack batch values. The unit-separator-delimited string is converted to a JSON array using CONCAT and REPLACE, then JSON_TABLE extracts each element as a row. Multiple columns are correlated using FOR ORDINALITY:

seed:
  - name: populate_users
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      SELECT j.val
      FROM JSON_TABLE(
        CONCAT('["', REPLACE('$1', __sep__, '","'), '"]'),
        '$[*]' COLUMNS(val VARCHAR(255) PATH '$')
      ) j

Which resolves to:

INSERT INTO users (email)
SELECT j.val
FROM JSON_TABLE(
  CONCAT('["',
    REPLACE(
      'a@x.com\x1fb@y.com\x1f...\x1fz@x.com',
      CHAR(31), '","'
    ),
  '"]'),
  '$[*]' COLUMNS(val VARCHAR(255) PATH '$')
) j

Oracle#

Oracle batch values are unpacked using xmltable with tokenize. Multiple columns are correlated by joining on rowid:

seed:
  - name: populate_users
    type: exec_batch
    count: 3
    size: 3
    args:
      - gen('name')
      - gen('email')
    query: |-
      INSERT INTO users (name, email)
      SELECT x1.value, x2.value
      FROM xmltable(
             'for $s in tokenize($v, __sep__) return <r>{$s}</r>'
             PASSING '$1' AS "v"
             COLUMNS value VARCHAR2(255) PATH '.'
           ) x1
      JOIN xmltable(
             'for $s in tokenize($v, __sep__) return <r>{$s}</r>'
             PASSING '$2' AS "v"
             COLUMNS value VARCHAR2(255) PATH '.'
           ) x2 ON x1.rowid = x2.rowid

Which resolves to:

INSERT INTO users (name, email)
SELECT x1.value, x2.value
FROM xmltable(
  'for $s in tokenize($v, codepoints-to-string(31)) return <r>{$s}</r>'
  PASSING 'Alice\x1fBob\x1fCharlie' AS "v"
  COLUMNS value VARCHAR2(255) PATH '.'
) x1
JOIN xmltable(
  'for $s in tokenize($v, codepoints-to-string(31)) return <r>{$s}</r>'
  PASSING 'a@x.com\x1fb@y.com\x1fc@z.com' AS "v"
  COLUMNS value VARCHAR2(255) PATH '.'
) x2 ON x1.rowid = x2.rowid

MSSQL#

When batch_format is set to json, each arg position is serialized as a properly escaped JSON array (e.g. ["val1","val2","val3"]). Multiple OPENJSON calls are correlated using [key], which is the zero-based array index. NULL values appear as JSON null and can be handled with NULLIF(j.value, 'null') if the target column is nullable:

seed:
  - name: populate_contacts
    type: exec_batch
    batch_format: json
    count: 1000
    size: 100
    args:
      - gen('name')
      - gen('email')
    query: |-
      INSERT INTO contact (name, email)
      SELECT j1.value, j2.value
      FROM OPENJSON('$1') j1
      JOIN OPENJSON('$2') j2 ON j1.[key] = j2.[key]

Which resolves to:

INSERT INTO contact (name, email)
SELECT j1.value, j2.value
FROM OPENJSON('["Alice","Bob",...,"Zara"]') j1
JOIN OPENJSON('["a@x.com","b@y.com",...,"z@x.com"]') j2
  ON j1.[key] = j2.[key]

Spanner (GoogleSQL)#

Spanner uses GoogleSQL syntax. Batch values are unpacked with UNNEST(SPLIT(..., __sep__)). The __sep__ token resolves to CODE_POINTS_TO_STRING([31]) for Spanner, and UNNEST converts the resulting array into rows. Multiple columns are correlated using WITH OFFSET:

seed:
  - name: populate_users
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      SELECT val
      FROM UNNEST(SPLIT('$1', __sep__)) AS val

Which resolves to:

INSERT INTO users (email)
SELECT val
FROM UNNEST(SPLIT(
  'a@x.com\x1fb@y.com\x1f...\x1fz@x.com',
  CODE_POINTS_TO_STRING([31])
)) AS val

For multiple columns, use separate UNNEST calls joined with WITH OFFSET:

seed:
  - name: populate_contacts
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('name')
      - gen('email')
    query: |-
      INSERT INTO contact (name, email)
      SELECT n, e
      FROM UNNEST(SPLIT('$1', __sep__)) AS n WITH OFFSET o1
      JOIN UNNEST(SPLIT('$2', __sep__)) AS e WITH OFFSET o2
        ON o1 = o2

Spanner limitations:

  • No TRUNCATE - use DELETE FROM table WHERE TRUE for deseed operations.
  • INSERT OR UPDATE for upserts instead of ON CONFLICT.
  • Drop indexes before tables - Spanner requires all indexes on a table to be dropped before the table itself. Add DROP INDEX IF EXISTS idx_name entries in the down section before the corresponding DROP TABLE.
  • No RAND() - use MOD(ABS(FARM_FINGERPRINT(GENERATE_UUID())), N) for random integers in range [0, N).
  • No CHR() - use CODE_POINTS_TO_STRING([code_point]) instead.
  • No UNNEST(...) AS v(col1, col2, ...) - Spanner does not support column aliasing on UNNEST. Use __values__ for batch inserts, or UNNEST(...) AS val WITH OFFSET for single-column expansion.
  • Strict typing with bind params - gen('number:...') returns float64, which Spanner rejects for INT64 columns when using bind params (@pN). Wrap in int(): int(gen('number:1,100')).
  • String bind params - if a value needs to be STRING for Spanner but the expression returns a number, use inlined $1/'$1' placeholders instead of @pN bind params.
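A sketch of deseed and down sections reflecting the first and third limitations (the index and table names are illustrative):

```yaml
deseed:
  - name: clear_users
    query: DELETE FROM users WHERE TRUE  # Spanner has no TRUNCATE

down:
  - name: drop_users_email_idx
    query: DROP INDEX IF EXISTS users_email_idx  # drop indexes first
  - name: drop_users
    query: DROP TABLE IF EXISTS users
```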

Cassandra#

CQL has no equivalent of chr(31), so __sep__ resolves to the raw unit separator byte (\x1f). Cassandra does not need __sep__ in query text for batch expansion because exec_batch produces one statement per row automatically and edg wraps them in an unlogged batch. See the Cassandra section below for the recommended pattern.

For comparison with other drivers:

Driver                     __sep__ resolves to
PostgreSQL / CockroachDB   chr(31)
MySQL / MSSQL              CHAR(31)
Spanner                    CODE_POINTS_TO_STRING([31])
Oracle                     codepoints-to-string(31)
Cassandra                  raw \x1f byte

MongoDB#

MongoDB queries use BSON/JSON command syntax instead of SQL. Placeholders ($1, $2, etc.) are inlined into the JSON command text. Batch operations insert multiple documents in a single command:

seed:
  - name: insert_customers
    type: exec_batch
    count: 1000
    args:
      - gen('uuid')
      - gen('email')
    query: |-
      {"insert": "customer", "documents": [{"_id": $1, "email": $2}]}

Schema operations use JSON commands:

up:
  - name: create_customer
    type: exec
    query: |-
      {"create": "customer"}

down:
  - name: drop_customer
    type: exec
    query: |-
      {"drop": "customer"}

Read queries use the find command:

init:
  - name: fetch_customers
    query: |-
      {"find": "customer", "filter": {}}

Cassandra#

Cassandra uses CQL (Cassandra Query Language). Placeholders ($1, $2, etc.) are converted to ? automatically. Batch operations use Cassandra’s unlogged batch internally. Unlike SQL drivers where __sep__ resolves to a database function like chr(31), the Cassandra driver uses the raw unit separator byte (\x1f) directly, since CQL has no equivalent function. The __sep__ token works in both standalone and transaction contexts. In transactions, each separated statement is added as an individual entry in the logged batch.

seed:
  - name: insert_customers
    type: exec_batch
    count: 1000
    size: 100
    args:
      - gen('uuid')
      - gen('email')
    query: |-
      INSERT INTO edg.customer (id, email) VALUES ($1, $2)

Which resolves to an unlogged batch containing one statement per row:

BEGIN UNLOGGED BATCH
  INSERT INTO edg.customer (id, email) VALUES ('a1b2...', 'alice@x.com')
  INSERT INTO edg.customer (id, email) VALUES ('c3d4...', 'bob@y.com')
  ...
APPLY BATCH

With count: 1000 and size: 100, edg sends 10 batches of 100 INSERT statements each.

Cassandra requires a keyspace to be created before tables:

up:
  - name: create_keyspace
    type: exec
    query: |-
      CREATE KEYSPACE IF NOT EXISTS edg
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}

  - name: create_customer
    type: exec
    query: |-
      CREATE TABLE IF NOT EXISTS edg.customer (
        id UUID PRIMARY KEY,
        email TEXT
      )

Prepared Statements#

Setting prepared: true on a run query causes the SQL statement to be prepared once per worker and reused across iterations. This reduces server-side parse overhead for high-throughput workloads by allowing the database to cache the query plan.

run:
  - name: lookup_product
    type: query
    prepared: true
    args:
      - ref_rand('fetch_products').id
    query: |-
      SELECT id, name, price FROM product WHERE id = $1

  - name: update_price
    type: exec
    prepared: true
    args:
      - ref_rand('fetch_products').id
      - uniform_f(1, 100, 2)
    query: |-
      UPDATE product SET price = $2 WHERE id = $1

Prepared statements work with both query and exec types. They are not used for batch types (query_batch, exec_batch) or queries that undergo batch expansion (via gen_batch, batch, or ref_each), since the SQL changes on each execution in those cases.

Each worker maintains its own statement cache, so prepared statements are safe to use with any number of concurrent workers. Statements are prepared lazily on first use and automatically closed when the worker finishes.

Prepared queries always use $1, $2, … placeholders regardless of the target driver. edg automatically translates them to the driver’s native format (? for mysql, :N for oracle, @pN for mssql/spanner) at prepare time.
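For example, the same prepared query works unchanged against MySQL, where edg rewrites $1 to ? at prepare time (the table name is illustrative):

```yaml
run:
  - name: lookup_user
    prepared: true
    args:
      - ref_rand('fetch_users').id
    # Written with $1; translated to the driver's native placeholder.
    query: SELECT * FROM users WHERE id = $1
```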

The benefit scales with query complexity. Simple point lookups show minimal improvement, but multi-table joins and aggregations can see significant gains. For example, a 4-table join with GROUP BY, HAVING, and multiple aggregates against CockroachDB:

QUERY                        AVG      p50      p95      p99
category_revenue             5.671ms  5.362ms  7.351ms  11.393ms
category_revenue_no_prepare  7.493ms  7.099ms  9.505ms  14.5ms
order_details                3.353ms  3.151ms  4.453ms  7.839ms
order_details_no_prepare     4.377ms  4.258ms  6.37ms   8.354ms

See _examples/prepared/ for complete working examples across all supported databases.

Transactions#

The run section supports grouping multiple queries into an explicit database transaction. Queries inside a transaction block are wrapped in BEGIN/COMMIT and execute against the same database connection, so reads and writes within a transaction see each other’s results.

run:
  - transaction: make_transfer
    locals:
      amount: gen('number:1,100')
    queries:
      - name: read_source
        type: query
        args:
          - ref_diff('fetch_accounts').id
        query: SELECT id, balance FROM account WHERE id = $1::UUID

      - name: read_target
        type: query
        args:
          - ref_diff('fetch_accounts').id
        query: SELECT id, balance FROM account WHERE id = $1::UUID

      - name: debit_source
        type: exec
        args:
          - ref_same('read_source').id
          - local('amount')
        query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID

      - name: credit_target
        type: exec
        args:
          - ref_same('read_target').id
          - local('amount')
        query: UPDATE account SET balance = balance + $2::FLOAT WHERE id = $1::UUID

The locals map defines transaction-scoped variables. Each expression is evaluated once when the transaction begins, and the result is available to all queries via local('name'). This ensures the same value is used consistently across multiple queries: for example, the same transfer amount for both the debit and the credit.

Local names must not collide with query names in the same transaction.

Transactions and standalone queries can coexist in the same run section:

run:

  - name: check_balance
    type: query
    args:
      - ref_rand('fetch_accounts').id
    query: SELECT balance FROM account WHERE id = $1::UUID

  - transaction: make_transfer
    locals:
      amount: gen('number:1,100')
    queries:
      - name: read_source
        type: query
        args:
          - ref_diff('fetch_accounts').id
        query: SELECT id, balance FROM account WHERE id = $1::UUID

      - name: read_target
        type: query
        args:
          - ref_diff('fetch_accounts').id
        query: SELECT id, balance FROM account WHERE id = $1::UUID

      - name: debit_source
        type: exec
        args:
          - ref_same('read_source').id
          - local('amount')
        query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID

      - name: credit_target
        type: exec
        args:
          - ref_same('read_target').id
          - local('amount')
        query: UPDATE account SET balance = balance + $2::FLOAT WHERE id = $1::UUID

When to use transactions#

Use transactions when your workload needs read-then-write patterns or multi-statement atomicity. For example:

  • Read an account balance, then debit it (the read and write must see consistent data).
  • Insert a row into two related tables that must either both succeed or both roll back.
  • Simulate realistic application behaviour where multiple queries happen inside a single database transaction.

Use standalone queries when each operation is independent and doesn’t need transactional guarantees.

Conditional rollback#

A rollback_if element can be placed between queries in a transaction. When reached, its expression is evaluated; if it returns true, the transaction is rolled back immediately. This is not treated as an error; the worker simply continues to the next iteration.

run:

  - name: check_balance
    type: query
    args:
      - ref_rand('fetch_accounts').id
    query: SELECT balance FROM account WHERE id = $1::UUID

  - transaction: make_transfer
    locals:
      amount: gen('number:1,100')
    queries:
      - name: read_source
        type: query
        args: [ref_diff('fetch_accounts').id]
        query: SELECT id, balance FROM account WHERE id = $1::UUID

      - rollback_if: "ref_same('read_source').balance < local('amount')"

      - name: debit_source
        type: exec
        args:
          - ref_same('read_source').id
          - local('amount')
        query: UPDATE account SET balance = balance - $2::FLOAT WHERE id = $1::UUID

In this example, the transfer amount is generated once as a local. After read_source runs, the condition checks whether the account balance can cover the transfer amount. If not, the transaction rolls back before debit_source executes. Multiple rollback_if elements can be placed at different points in the transaction to check different conditions.

The expression has access to all data in the environment, including results from queries that have already run within the transaction. A rollback_if element must not have name, type, args, or query fields.

For general-purpose branching (if/then/else and match/when/default), see the Conditionals page.

Constraints#

  • Batch types not allowed: query_batch and exec_batch cannot be used inside a transaction.
  • Prepared statements not allowed: Queries inside a transaction cannot set prepared: true.
  • A transaction must contain at least one query.
  • Transaction names and standalone query names share the same namespace for run_weights.
  • The rollback_if expression must evaluate to a boolean.

Non-transactional mode (--no-atomic-tx)#

Pass --no-atomic-tx (or set EDG_NO_ATOMIC_TX=true) to execute transaction queries without BEGIN/COMMIT. Queries still run sequentially with shared locals and ref_same context, but each statement commits independently. This is useful for comparing database behaviour with and without transactional guarantees.

# Transactional (default)
edg workload consistency all --driver pgx --url "..." -w 16 -d 30s

# Non-transactional
edg workload consistency all --driver pgx --url "..." -w 16 -d 30s --no-atomic-tx

When --no-atomic-tx is set:

  • Queries execute against the connection pool directly (no BEGIN/COMMIT).
  • locals and ref_same still work and shared state is maintained in memory.
  • rollback_if conditions are skipped, allowing concurrency anomalies to manifest.

Error handling#

If any query inside a transaction fails, the transaction is rolled back. During the run phase, the error is logged and the worker continues to the next iteration (same as standalone query errors). Conditional rollbacks (via rollback_if) are not errors and do not appear in the error rate.

Cassandra transactions#

Cassandra does not support ACID transactions. When you use a transaction block with the Cassandra driver, edg uses a logged batch internally. This has important implications:

  • Writes are atomic: All exec queries in the transaction are accumulated into a single batch and sent to the database on commit. Either all writes succeed or none do.
  • __sep__ is supported: Queries containing __sep__ are split into individual statements, each added to the logged batch as a separate entry. This lets you use batch-format queries inside transactions.
  • Reads bypass the batch: Queries of type query inside the transaction execute immediately against the session, outside the batch. Reads do not see uncommitted writes from the same transaction.
  • Rollback is a no-op: If a rollback_if condition triggers or a query fails, the batch is simply discarded. There is no undo of already-executed reads.

rollback_if has no effect with the Cassandra driver. The condition is evaluated but the rollback does nothing. The unsent batch is silently discarded, and any reads that already executed cannot be undone. If your config includes rollback_if, edg will log a warning at startup. For conditional write logic, use lightweight transactions (LWT) with IF conditions in individual queries instead.

This makes Cassandra transactions suitable for grouping related writes that must succeed together, but not for read-then-write patterns that require isolation.

For example:

run:
  - transaction: create_order
    queries:
      - name: insert_order
        type: exec
        args:
          - gen('uuid')
          - gen('email')
        query: |-
          INSERT INTO edg.orders (id, email) VALUES ($1, $2)

      - name: insert_audit
        type: exec
        args:
          - gen('uuid')
          - const('order_created')
        query: |-
          INSERT INTO edg.audit_log (id, event) VALUES ($1, $2)

Which resolves to a logged batch:

BEGIN BATCH
  INSERT INTO edg.orders (id, email) VALUES ('a1b2...', 'alice@x.com')
  INSERT INTO edg.audit_log (id, event) VALUES ('c3d4...', 'order_created')
APPLY BATCH

Compare this to standalone exec_batch queries, which use an unlogged batch (BEGIN UNLOGGED BATCH). Transaction blocks always use a logged batch, guaranteeing atomicity across partitions.

MongoDB transactions#

MongoDB transactions use sessions and provide full ACID guarantees. Both reads and writes participate in the transaction and see each other’s results, matching the behaviour of SQL drivers.

MongoDB transactions require a replica set or sharded cluster. Standalone mongod instances do not support multi-document transactions. If you are testing locally, use mongod --replSet rs0 or a Docker Compose setup with replica set initialisation.

Wait#

Queries can specify a wait duration (e.g. wait: 18s) to introduce a keying/think-time delay after execution. This applies only to queries in the run section and is ignored in other sections.
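A minimal sketch (the table name is illustrative):

```yaml
run:
  - name: browse_products
    wait: 18s  # think-time pause after this query executes
    query: SELECT id, name FROM product LIMIT 10
```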

Print#

The print field accepts a list of expressions that are evaluated each iteration and aggregated across all workers for display in the progress and summary output.

When print expressions are provided, they replace the default log output for that query.

Simple form#

A plain string entry auto-detects the value type. String values show frequency distributions (top 10 by count) and numeric values show min/avg/max.

run:
  - name: insert_order
    type: exec
    args:
      - gen('email')
      - ref_same('regions').name
      - set_rand(ref_same('regions').cities, [])
      - uniform(1, 500)
    print:
      - ref_same('regions').name
      - arg(3)
    query: |-
      INSERT INTO print_order (email, region, city, amount)
      VALUES ($1, $2, $3, $4::DECIMAL)

With named args, the amount print becomes more readable:

    print:
      - ref_same('regions').name
      - arg('amount')

Print expressions have access to the same context as query args: ref_same, ref_rand, arg(), global(), local(), and all built-in functions. Expressions using ref_same see the same row selected for the query args in that iteration.

Custom aggregation#

Use the map form with expr and agg fields for full control over how values are aggregated and displayed. The agg field is an expr expression evaluated against the accumulated state:

Variable  Type            Description
count     int             Total observations
freq      map[string]int  Value frequency distribution
min       float           Minimum numeric value
max       float           Maximum numeric value
avg       float           Mean of numeric values
sum       float           Sum of numeric values

These variables can be combined in any expr-lang expression to produce custom summary output:

PRINT         VALUES
insert_order  eu=1647 ap=1604 us=1585
insert_order  min=248.87 avg=1.01 max=499.93 n=4836
insert_order  paris=552 london=548 berlin=547 sydney=542 tokyo=535
insert_order  avg $248 n=4836
insert_order  total=$1203541
insert_order  1 - 499
insert_order  9 unique
read_order    us=1699 eu=1602 ap=1531
read_order    4832
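A sketch of the map form producing a running total, similar to the total=$… line shown (the table name is illustrative):

```yaml
run:
  - name: insert_order
    type: exec
    args:
      - gen('email')
      - uniform(1, 500)
    print:
      - expr: arg(1)
        agg: "'total=$' + string(int(sum))"
    query: |-
      INSERT INTO print_order (email, amount) VALUES ($1, $2::DECIMAL)
```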

See _examples/print/ for a complete working example.

Post Print#

The post_print field works like print but evaluates after the query executes. This gives access to result(), which returns the first row of a type: query SELECT result as a map.

run:
  - name: check_total_balance
    type: query
    query: |-
      SELECT SUM(balance) AS total, COUNT(*) AS num_accounts
      FROM account
    post_print:
      - expr: result().total
        agg: "string(int(min)) + '..' + string(int(max)) + ' n=' + string(count)"
      - expr: result().num_accounts
        agg: "string(int(min)) + '..' + string(int(max))"

result() returns a map with column names as keys. Access columns with dot notation: result().total, result().num_accounts. If no result is available (e.g. the query is type: exec), result() returns an error.

results() - all rows#

results() returns every row from the SELECT result as a slice of maps ([]map). Combine it with expr-lang builtins to aggregate, filter, or transform across all rows:

run:
  - name: verify_balances
    type: query
    query: |-
      SELECT id, balance FROM account
    post_print:
      # total rows returned
      - expr: len(results())
      # sum all balances
      - expr: reduce(results(), #acc + #.balance, 0)
      # find largest balance
      - expr: reduce(results(), cond(#.balance > #acc, #.balance, #acc), 0)
      # count overdrawn accounts
      - expr: len(filter(results(), #.balance < 0))
      # collect ids of high-value accounts
      - expr: join(map(filter(results(), #.balance > 5000), string(int(#.id))), ",")

post_print supports the same simple and custom aggregation forms as print. Both fields can be used on the same query - print for input distributions, post_print for output observations:

PRINT                VALUES
check_total_balance  10000..10000 n=470
check_total_balance  10..10

Placeholders#

Arg placeholders ($1, $2, etc.) are passed to the database in one of two ways: inlined or as bind params.

Inlining#

Inlining means edg performs a text replacement on the SQL string before sending it to the database. Every $N in the query (or __values__ token) is replaced with the literal arg values. For example, with __values__:

type: exec_batch
count: 3
args:
  - gen('email')
query: |-
  INSERT INTO users (email)
  __values__

The SQL sent to the database becomes:

INSERT INTO users (email)
VALUES ('alice@x.com'), ('bob@y.com'), ('carol@z.com')

With driver-specific batch expansion, $N placeholders are inlined similarly:

args:
  - gen_batch(1000, 100, 'email')
query: |-
  INSERT INTO users (email)
  SELECT unnest(string_to_array('$1', __sep__))

If $1 evaluates to alice@x.com\x1fbob@y.com\x1f..., the SQL sent to the database becomes:

INSERT INTO users (email)
SELECT unnest(string_to_array('alice@x.com\x1fbob@y.com\x1f...', chr(31)))

The database never sees $1 or __values__; it receives a fully formed query with the values baked in. This is used for:

  • __values__ token replacement (recommended).
  • query_batch / exec_batch types with $N placeholders (always inlined).
  • Batch-expanded queries using gen_batch, batch, or ref_each (in any section).

Inlining lets you use $N as a universal placeholder syntax across all drivers (pgx, MySQL, Oracle, SQL Server) without worrying about driver-specific bind param formats. It also avoids a pgx-stdlib issue where numeric values are sent as DECIMAL, which CockroachDB can’t mix with INT in arithmetic.

Bind params#

All other queries use native driver bind parameters. The placeholder stays in the SQL and the values are sent separately, allowing the database to cache query plans and avoid re-parsing.

Each driver has its own placeholder format:

Driver                          Placeholder format
pgx (PostgreSQL / CockroachDB)  $1, $2, $3
dsql (Aurora DSQL)              $1, $2, $3
mysql                           ? (positional)
oracle                          :1, :2, :3
mssql                           @p1, @p2, @p3
spanner                         @p1, @p2, @p3
mongodb                         $1, $2, $3 (inlined into JSON commands)
cassandra                       ? (positional)

Since run queries normally use bind params, their SQL must use the correct placeholder format for the target driver.
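For example, a run query written for the mysql driver uses ? directly (a sketch; the table name is illustrative):

```yaml
run:
  - name: lookup_user
    args:
      - ref_rand('fetch_users').id
    query: SELECT * FROM users WHERE id = ?
```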

Column Name Normalisation#

When a query or query_batch result is stored, all column names are lowercased before being added to the environment. This means a SQL column W_ID becomes accessible as ref_rand('fetch_warehouses').w_id, not .W_ID.