Configuration#
Workloads are defined in a single YAML file with the following top-level keys:
# Variables available in all expressions.
globals:
# User-defined expression functions.
expressions:
# Reusable arg templates for queries.
objects:
# Named query templates for reducing boilerplate.
templates:
# Static datasets available to ref_* functions without a database query.
reference:
# Named auto-incrementing sequences shared across all workers.
seq:
# Staged workload execution (overridden by explicit -w and -d flags).
stages:
# Schema creation queries.
up:
# Data population queries.
seed:
# Data cleanup queries.
deseed:
# Schema teardown queries.
down:
# Per-worker initialisation queries (run before workload).
init:
# Weighted transaction mix (optional).
run_weights:
# Workload queries (standalone or grouped into transactions).
run:
# Background queries that run independently on a fixed schedule.
workers:
# Post-run assertions for CI/CD (exit non-zero on failure).
expectations:
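A minimal sketch wiring a few of these keys together (the users table and query names are illustrative):

```yaml
globals:
  rows: 1000

up:
  - name: create_users
    query: |-
      CREATE TABLE IF NOT EXISTS users (
        id INT PRIMARY KEY DEFAULT unique_rowid(),
        name STRING NOT NULL
      )

seed:
  - name: seed_users
    type: exec
    count: rows
    args:
      - gen('name')
    query: INSERT INTO users (name) VALUES ($1)

run:
  - name: count_users
    type: query
    query: SELECT count(*) FROM users
```

Globals#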
The globals section defines top-level variables available in all expressions:
globals:
warehouses: 1
districts: 10
customers: 30000
items: 100000

These can be referenced directly in arg expressions, including in arithmetic:
args:
- customers / districts # evaluates to 3000
- warehouses * 10 # evaluates to 10

Expression-valued globals#
Global values can be expressions, including references to other globals and environment variables. String values are compiled as expressions; if compilation fails (e.g. a plain string like "new york"), the value is kept as a literal.
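For example, a value that fails to compile as an expression is kept verbatim (the capacity and city globals are illustrative):

```yaml
globals:
  warehouses: 1
  capacity: warehouses * 100   # compiles as an expression, evaluates to 100
  city: new york               # not a valid expression, kept as the literal "new york"
```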
Globals are evaluated in YAML document order, so later globals can reference earlier ones:
globals:
warehouses: 1
districts: warehouses * 10 # evaluates to 10
customers: districts * 3000 # evaluates to 30000

Globals from environment variables#
Use env() to read a required environment variable, or env_nil() for an optional one. Combine env_nil() with coalesce() to provide a default:
globals:
# Required (fails at startup if DB_BATCH_SIZE is not set).
batch_size: env('DB_BATCH_SIZE')
# Optional (falls back to 10000 if CUSTOMERS is not set).
customers: int(coalesce(env_nil('CUSTOMERS'), 10000))
env() returns a string. Wrap with int() or float() if arithmetic is needed downstream.
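A short sketch of converting before arithmetic (the double_batch global is illustrative):

```yaml
globals:
  # DB_BATCH_SIZE arrives as a string; int() makes it usable in arithmetic.
  batch_size: int(env('DB_BATCH_SIZE'))
  double_batch: batch_size * 2
```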
Objects#
The objects section defines reusable named arg templates. Each key is an object name, and the value is a map of field names to expressions. Queries reference an object using the object field instead of duplicating args. Fields are bound to query parameters in declaration order ($1, $2, …):
objects:
customer:
email: gen('email')
name: gen('name')
created_at: timestamp('2020-01-01T00:00:00Z', '2024-01-01T00:00:00Z')

Object names must be defined in the objects section; referencing an unknown name is a validation error. Objects support all the same expressions as args. Multiple calls to obj() within the same arg-set evaluation return the same instance.
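Because repeated obj() calls within one arg-set evaluation return the same instance, fields drawn from the same object stay mutually consistent:

```yaml
args:
  # Both args resolve against the same generated customer instance.
  - obj('customer').email
  - obj('customer').name
```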
object: without args#
Expands all object fields as positional args in declaration order:
- name: insert_customer_row
type: exec
object: customer
query: |-
INSERT INTO customer (email, name, created_at)
VALUES ($1, $2, $3)

field()#
Cherry-picks fields from the object: context. Requires object: to be set. Mixable with other expressions:
- name: insert_customer_field
type: exec
object: customer
args:
- field('email')
- field('name')
- ref_rand('fetch_cities').name
query: |-
INSERT INTO customer (email, name, city)
VALUES ($1, $2, $3)

obj('name', 'field')#
References a specific field from any object. Does not require object: to be set:
- name: insert_customer_obj
type: exec
args:
- obj('customer', 'email')
- obj('customer', 'name')
- obj('customer', 'created_at')
query: |-
INSERT INTO customer (email, name, created_at)
VALUES ($1, $2, $3)

obj('name').field#
Evaluates all fields and returns a map for dot-access. Does not require object: to be set:
- name: insert_customer_dot
type: exec
args:
- obj('customer').email
- obj('customer').name
- obj('customer').created_at
query: |-
INSERT INTO customer (email, name, created_at)
VALUES ($1, $2, $3)

Summary#
| Syntax | Requires object: | Evaluates |
|---|---|---|
| object: order (no args) | yes | all fields, declaration order |
| field('product') | yes | single field from object context |
| obj('order', 'product') | no | single field, explicit object |
| obj('order').product | no | all fields, dot-access |
Templates#
The templates section defines named query templates that reduce boilerplate. A template is a partial query definition; any query can reference it with template: and inherit its fields. Fields set on the query override the template.
templates:
batch_insert:
count: batch_size
size: 1000
seed:
- name: seed_users
template: batch_insert
object: user
query: INSERT INTO users __columns__ __values__
- name: seed_orders
template: batch_insert
object: order
query: INSERT INTO orders __columns__ __values__

Template fields work as defaults: any field set on the query wins. Referencing an unknown template name is a validation error. Templates can be used in any section (seed, run, workers, transactions, etc.).
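For instance, a query can inherit count from batch_insert while overriding its size (the seed_events query and event object are illustrative):

```yaml
seed:
  - name: seed_events
    template: batch_insert
    size: 500   # overrides the template's size: 1000; count is inherited
    object: event
    query: INSERT INTO events __columns__ __values__
```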
Reference#
The reference section defines static datasets that are loaded into the environment at startup, making them available to ref_rand, ref_same, ref_perm, and ref_diff functions without needing an init query or database connection. Each key is a dataset name, and the value is a list of row objects:
reference:
regions:
- {name: us, cities: [a, b, c]}
- {name: eu, cities: [d, e, f]}
- {name: ap, cities: [g, h, i]}

Reference datasets work exactly like datasets populated by init queries. You can use them in any arg expression:
args:
# Random region row, access the 'name' field.
- ref_rand('regions').name
# Same row reused across all ref_same calls in this query execution.
- ref_same('regions').name
- set_rand(ref_same('regions').cities, [])

This is useful when your lookup data is small and known ahead of time, avoiding the need for a database round-trip.
Loading from CSV files#
Instead of inlining reference data in YAML, you can load it from CSV files at startup using the --csv-file and --csv-directory flags. Each CSV file becomes a reference dataset named after its filename (minus the extension). The header row defines the field names.
edg seed \
--config workload.yaml \
--csv-file data/regions.csv \
--csv-file data/translations.csv \
--url $DATABASE_URL

Or load all CSV files from a directory (non-recursive):
edg seed \
--config workload.yaml \
--csv-directory data/ \
--url $DATABASE_URL

A file named regions.csv with this content:
code,name,currency
us,United States,USD
gb,United Kingdom,GBP
de,Germany,EUR

is equivalent to this YAML reference: block:
reference:
regions:
- {code: us, name: United States, currency: USD}
- {code: gb, name: United Kingdom, currency: GBP}
- {code: de, name: Germany, currency: EUR}

Both flags are repeatable and work with all commands (up, seed, run, stage, repl, etc.). If a CSV filename collides with an existing reference: dataset name, edg returns an error.
See _examples/csv/ for a complete working example.
Seq#
The seq section defines named auto-incrementing sequences that are shared across all workers. Unlike seq(start, step) which maintains a separate counter per worker, seq_global("name") returns globally unique values using atomic counters.
seq:
- name: order_id
start: 1
step: 1
- name: line_item_id
start: 1000
step: 5
- name: sku_code
length: 3

| Field | Description |
|---|---|
| name | Sequence identifier, referenced in seq_global("name") or seq_alpha_global("name") calls. Must be unique. |
| start | Initial value for numeric sequences. |
| step | Increment between consecutive numeric values. |
| length | Character length for alpha sequences (e.g. 3 produces aaa, aab, …, zzz). Mutually exclusive with start/step. |
Each entry must use either start/step (numeric) or length (alpha), not both.
Use seq_global("name") in any arg expression:
seed:
- name: seed_orders
type: exec
count: 1000
args:
- seq_global("order_id")
- gen('firstname')
query: INSERT INTO orders (id, name) VALUES ($1, $2)

Sequences work in all sections (seed, run, workers) and produce gap-free, globally unique values across concurrent workers. The same sequence can be used in both seed and run; the counter continues from where seeding left off.
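Alpha sequences work the same way via seq_alpha_global; a sketch using the sku_code sequence defined above (the skus table is illustrative):

```yaml
seed:
  - name: seed_skus
    type: exec
    count: 100
    args:
      - seq_alpha_global("sku_code")   # aaa, aab, aac, ...
    query: INSERT INTO skus (code) VALUES ($1)
```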
To reference a random value that has already been generated by a sequence, use the distribution functions:
| Function | Description |
|---|---|
| seq_rand("name") | Uniform random pick from generated values |
| seq_zipf("name", s, v) | Zipfian-distributed pick (hot early values) |
| seq_norm("name", mean, stddev) | Normal-distributed pick |
| seq_exp("name", rate) | Exponential-distributed pick |
| seq_lognorm("name", mu, sigma) | Log-normal-distributed pick |
These functions compute valid values algebraically from the sequence's start, step, and current counter, without materialising every generated value in memory. This is useful for foreign key references where the referenced column was populated by seq_global.
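Conceptually: for a sequence with start: 1000 and step: 5 whose counter has issued n = 100 values, the generated set is exactly 1000, 1005, …, 1495, so a distribution function only has to pick an index k in [0, n) according to its distribution and return start + k * step.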
See _examples/global_sequences/ for a complete working example, but here's a quick introduction.
Create two tables: one to store 1,000 orders and one to store references to them (one column per distribution function):
globals:
orders: 1000
samples: 10000
batch_size: 100
seq:
- name: order_id
start: 1
step: 1
up:
- name: create_orders
query: |-
CREATE TABLE IF NOT EXISTS orders (
id INT PRIMARY KEY,
customer STRING NOT NULL
)
- name: create_samples
query: |-
CREATE TABLE IF NOT EXISTS samples (
id INT PRIMARY KEY DEFAULT unique_rowid(),
uniform_val INT NOT NULL,
zipf_val INT NOT NULL,
norm_val INT NOT NULL,
exp_val INT NOT NULL,
lognorm_val INT NOT NULL
)
seed:
- name: seed_orders
type: exec_batch
count: orders
size: batch_size
args:
- seq_global("order_id")
- gen('firstname') + ' ' + gen('lastname')
query: |-
INSERT INTO orders (id, customer)
SELECT
unnest(string_to_array('$1', chr(31))::INT[]),
unnest(string_to_array('$2', chr(31)))
- name: seed_samples
type: exec_batch
count: samples
size: batch_size
args:
- seq_rand("order_id")
- seq_zipf("order_id", 1.1, 1.0)
- seq_norm("order_id", 500, 150)
- seq_exp("order_id", 0.01)
- seq_lognorm("order_id", 5.5, 0.5)
query: |-
INSERT INTO samples (uniform_val, zipf_val, norm_val, exp_val, lognorm_val)
SELECT
unnest(string_to_array('$1', chr(31))::INT[]),
unnest(string_to_array('$2', chr(31))::INT[]),
unnest(string_to_array('$3', chr(31))::INT[]),
unnest(string_to_array('$4', chr(31))::INT[]),
unnest(string_to_array('$5', chr(31))::INT[])

In this pattern, each placeholder receives the whole batch as a single chr(31)-delimited string, which string_to_array splits back into one row per value.

After seeding, the samples table holds order ids drawn from each distribution (all with complete referential integrity to the orders table) and can be queried to visualise those distributions as follows:
-- Uniform distribution.
SELECT
div(uniform_val - 1, 50) * 50 + 1 AS bucket,
count(*) AS total,
repeat('█', (count(*) * 50 / max(count(*)) OVER ())::INT) AS histogram
FROM samples
GROUP BY 1
ORDER BY 1;
bucket | total | histogram
---------+-------+-----------------------------------------------------
1 | 481 | █████████████████████████████████████████████
51 | 484 | █████████████████████████████████████████████
101 | 496 | ███████████████████████████████████████████████
151 | 465 | ████████████████████████████████████████████
201 | 492 | ██████████████████████████████████████████████
251 | 510 | ████████████████████████████████████████████████
301 | 516 | ████████████████████████████████████████████████
351 | 513 | ████████████████████████████████████████████████
401 | 471 | ████████████████████████████████████████████
451 | 533 | ██████████████████████████████████████████████████
501 | 519 | █████████████████████████████████████████████████
551 | 498 | ███████████████████████████████████████████████
601 | 522 | █████████████████████████████████████████████████
651 | 518 | █████████████████████████████████████████████████
701 | 492 | ██████████████████████████████████████████████
751 | 473 | ████████████████████████████████████████████
801 | 522 | █████████████████████████████████████████████████
851 | 515 | ████████████████████████████████████████████████
901 | 472 | ████████████████████████████████████████████
951 | 508 | ████████████████████████████████████████████████
-- Normal distribution.
SELECT
div(norm_val - 1, 50) * 50 + 1 AS bucket,
count(*) AS total,
repeat('█', (count(*) * 50 / max(count(*)) OVER ())::INT) AS histogram
FROM samples
GROUP BY 1
ORDER BY 1;
bucket | total | histogram
---------+-------+-----------------------------------------------------
1 | 10 |
51 | 23 | █
101 | 59 | ██
151 | 142 | █████
201 | 254 | █████████
251 | 390 | ██████████████
301 | 667 | ████████████████████████
351 | 905 | █████████████████████████████████
401 | 1159 | ██████████████████████████████████████████
451 | 1380 | ██████████████████████████████████████████████████
501 | 1362 | █████████████████████████████████████████████████
551 | 1162 | ██████████████████████████████████████████
601 | 917 | █████████████████████████████████
651 | 652 | ████████████████████████
701 | 440 | ████████████████
751 | 263 | ██████████
801 | 128 | █████
851 | 57 | ██
901 | 27 | █
951 | 3 |
-- Exponential distribution.
SELECT
div(exp_val - 1, 50) * 50 + 1 AS bucket,
count(*) AS total,
repeat('█', (count(*) * 50 / max(count(*)) OVER ())::INT) AS histogram
FROM samples
GROUP BY 1
ORDER BY 1;
bucket | total | histogram
---------+-------+-----------------------------------------------------
1 | 3961 | ██████████████████████████████████████████████████
51 | 2436 | ███████████████████████████████
101 | 1475 | ███████████████████
151 | 782 | ██████████
201 | 532 | ███████
251 | 329 | ████
301 | 207 | ███
351 | 111 | █
401 | 66 | █
451 | 40 | █
501 | 26 |
551 | 19 |
601 | 7 |
651 | 1 |
701 | 1 |
751 | 5 |
851 | 1 |
901 | 1 |
-- Log-normal distribution.
SELECT
div(lognorm_val - 1, 50) * 50 + 1 AS bucket,
count(*) AS total,
repeat('█', (count(*) * 50 / max(count(*)) OVER ())::INT) AS histogram
FROM samples
GROUP BY 1
ORDER BY 1;
bucket | total | histogram
---------+-------+-----------------------------------------------------
1 | 10 |
51 | 342 | █████████
101 | 1293 | ███████████████████████████████████
151 | 1836 | ██████████████████████████████████████████████████
201 | 1699 | ██████████████████████████████████████████████
251 | 1425 | ███████████████████████████████████████
301 | 1024 | ████████████████████████████
351 | 767 | █████████████████████
401 | 516 | ██████████████
451 | 335 | █████████
501 | 238 | ██████
551 | 147 | ████
601 | 121 | ███
651 | 81 | ██
701 | 71 | ██
751 | 38 | █
801 | 25 | █
851 | 15 |
901 | 13 |
951 | 4 |
-- Zipfian distribution.
SELECT
div(zipf_val - 1, 50) * 50 + 1 AS bucket,
count(*) AS total,
repeat('█', (count(*) * 50 / max(count(*)) OVER ())::INT) AS histogram
FROM samples
GROUP BY 1
ORDER BY 1;
bucket | total | histogram
---------+-------+-----------------------------------------------------
1 | 6904 | ██████████████████████████████████████████████████
51 | 778 | ██████
101 | 458 | ███
151 | 312 | ██
201 | 255 | ██
251 | 177 | █
301 | 146 | █
351 | 112 | █
401 | 106 | █
451 | 77 | █
501 | 104 | █
551 | 76 | █
601 | 79 | █
651 | 87 | █
701 | 56 |
751 | 64 |
801 | 66 |
851 | 41 |
901 | 54 |
951 | 48 |

Stages#
The stages section defines a sequence of workload phases, each with its own worker count and duration. Explicitly passing -w or -d on the command line overrides the stages section, falling back to single-stage mode with the provided values.
stages:
- name: ramp
workers: 1
duration: 10s
- name: steady
workers: 10
duration: 30s
- name: cooldown
workers: 2
duration: 10s

Stages run sequentially: when a stage's duration expires, the next stage starts immediately with a new set of workers. The init section runs once before the first stage, and its results are shared across all stages.
| Field | Description |
|---|---|
| name | Stage identifier, logged when the stage starts. |
| workers | Number of concurrent workers for this stage. Defaults to 1 if omitted. |
| duration | How long this stage runs (e.g. 10s, 5m, 1h). |
| run_weights | Optional per-stage weight overrides. Falls back to top-level run_weights when omitted. |
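To bypass the stages section for a one-off single-stage run, pass -w and -d explicitly; a sketch of the invocation (worker count and duration are illustrative):

```shell
# Explicit -w/-d override stages: and run a single 10-worker, 5-minute stage.
edg run \
  --config workload.yaml \
  --url $DATABASE_URL \
  -w 10 \
  -d 5m
```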
Per-stage run weights#
Each stage can optionally define its own run_weights to change the workload mix for that phase. When a stage omits run_weights, the top-level run_weights apply. When neither exists, all run items execute sequentially.
stages:
- name: ramp
workers: 1
duration: 10s
run_weights:
check_balance: 90
credit_account: 5
make_transfer: 5
- name: steady
workers: 10
duration: 30s
# Falls back to top-level run_weights
run_weights:
check_balance: 50
credit_account: 5
make_transfer: 45

This is useful for simulating ramp-up patterns, sustained load tests, or multi-phase benchmarks without running separate commands.
See _examples/stages/ for basic staged execution and _examples/stages_run_weights/ for per-stage weight overrides.
Includes#
Use the !include YAML tag to split workload configs into reusable fragments. This is useful when multiple workloads share the same schema, reference data, or expressions.
globals: !include shared/globals.yaml
up: !include shared/schema.yaml
down: !include shared/teardown.yaml
run: !include shared/run_queries.yaml

Paths are resolved relative to the file containing the !include directive.
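With the snippet above, includes resolve against the including file's directory, e.g. a layout like:

```
workload.yaml   # contains the !include directives above
shared/
  globals.yaml
  schema.yaml
  teardown.yaml
  run_queries.yaml
```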
Mapping value#
Replace a key’s value with the content of a file:
globals: !include shared/globals.yaml

Where shared/globals.yaml contains:
batch_size: 10000
customers: 100000

Sequence value#
Replace an entire list:
up: !include shared/schema.yaml

Where shared/schema.yaml contains:
- name: create_users
query: CREATE TABLE users (id UUID PRIMARY KEY, email STRING NOT NULL)
- name: create_orders
query: CREATE TABLE orders (id UUID PRIMARY KEY, user_id UUID REFERENCES users(id))

Sequence item#
Splice items from an included file into a list alongside local entries:
run:
- name: local_query
type: query
query: SELECT 1
- !include shared/extra_queries.yaml

Items from the included file are merged into the parent sequence rather than nested.
Nested includes#
Included files can themselves use !include. Circular includes are detected and produce an error.
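For instance, a fragment can splice further fragments (paths are illustrative):

```yaml
# shared/run_queries.yaml
- !include queries/reads.yaml
- !include queries/writes.yaml
```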
See _examples/includes/ for a complete working example.