# Graph Data
The `tree:` config section generates hierarchical and graph-structured data declaratively. Instead of writing one seed query per level, you describe the shape with `levels`, and edg expands it into the correct batch inserts with parent references wired up automatically.
## Trees
A tree generates strict parent-child hierarchies (e.g. org charts, category trees, nested comments). Each node at level N references exactly one parent at level N-1. Root nodes (level 0) get NULL parents.
```yaml
seq:
  - name: emp_id
    start: 1
    step: 1
tree:
  - name: org
    levels: [1, 5, 15, 50, 200]
    id_column: id
    parent_column: manager_id
    args:
      id: seq_global('emp_id')
      name: gen('name')
      role: ~
      manager_id: ~
    level_args:
      0:
        role: const('CEO')
      1:
        role: const('VP')
      2:
        role: const('Director')
      3:
        role: const('Manager')
      4:
        role: set_rand(['Engineer', 'IC'], [])
    query: |-
      INSERT INTO employees (id, name, role, manager_id)
      __values__
```

This produces 271 rows across 5 levels:
| Level | Count | Role |
|---|---|---|
| 0 | 1 | CEO |
| 1 | 5 | VP |
| 2 | 15 | Director |
| 3 | 50 | Manager |
| 4 | 200 | Engineer or IC |
The `~` (tilde) in YAML represents `null`. Use it as a placeholder for args that edg manages automatically. The `parent_column` arg (`manager_id` above) is overwritten by edg at each level with the correct parent reference expression. Args overridden by `level_args` (like `role` above) also use `~` as a default, since they are replaced at every level.
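To make the shape concrete, here is a rough Python model of the rows the `org` example produces. It assumes ids are handed out from `emp_id` in level order and that each `ref_rand` picks uniformly from the level above; `generate_tree` is a hypothetical helper for illustration, not part of edg.

```python
import random

def generate_tree(levels, seed=None):
    """Hypothetical model of strict-tree generation (not edg's code).

    Returns (id, level, parent_id) tuples. Roots get None (SQL NULL); every
    other node gets a parent drawn uniformly from the level directly above.
    """
    rng = random.Random(seed)
    rows = []
    prev_ids = []   # ids generated at level n - 1
    next_id = 1     # assumed to mimic seq_global('emp_id') with start: 1, step: 1
    for level, count in enumerate(levels):
        current_ids = []
        for _ in range(count):
            parent = None if level == 0 else rng.choice(prev_ids)
            rows.append((next_id, level, parent))
            current_ids.append(next_id)
            next_id += 1
        prev_ids = current_ids
    return rows

rows = generate_tree([1, 5, 15, 50, 200], seed=42)
print(len(rows))  # 271
```

Under this model parents are drawn with replacement, so some nodes at level N-1 can end up with no children at all.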
### How it works
edg expands the tree config into separate seed queries at load time:
| Expanded query | Count | `manager_id` expression |
|---|---|---|
| `org_level_0` | 1 | `const(nil)` |
| `org_level_1` | 5 | `ref_rand('org_level_0').id` |
| `org_level_2` | 15 | `ref_rand('org_level_1').id` |
| `org_level_3` | 50 | `ref_rand('org_level_2').id` |
| `org_level_4` | 200 | `ref_rand('org_level_3').id` |
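The expansion itself is mechanical. This sketch rebuilds the rows of the table above from the tree settings, assuming the `.id` suffix comes from `id_column`. `expand_tree` is a hypothetical name, and it only produces the expression strings, not edg's internal representation.

```python
def expand_tree(name, levels, id_column):
    """Return (query_name, count, parent_expr) for each level of a strict tree."""
    expanded = []
    for n, count in enumerate(levels):
        if n == 0:
            parent_expr = "const(nil)"  # roots have no parent
        else:
            # Each level points at a random row of the level directly above.
            parent_expr = f"ref_rand('{name}_level_{n - 1}').{id_column}"
        expanded.append((f"{name}_level_{n}", count, parent_expr))
    return expanded

for query_name, count, parent_expr in expand_tree("org", [1, 5, 15, 50, 200], "id"):
    print(query_name, count, parent_expr)
```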
## DAGs
Set `dag: true` to generate directed acyclic graphs. In DAG mode, a node can reference a parent from any prior level, not just the one directly above. This models dependency graphs, prerequisite chains, and similar structures.
```yaml
tree:
  - name: deps
    levels: [3, 8, 20]
    id_column: id
    parent_column: dependency_id
    dag: true
    args:
      id: seq_global('dep_id')
      name: template('pkg-%d', iter())
      dependency_id: ~
    query: |-
      INSERT INTO dependencies (id, name, dependency_id)
      __values__
```

In DAG mode, the parent expression first picks one prior level uniformly at random, then a random row from that level:
| Expanded query | `dependency_id` expression |
|---|---|
| `deps_level_0` | `const(nil)` |
| `deps_level_1` | `ref_rand(set_rand(['deps_level_0'], [])).id` |
| `deps_level_2` | `ref_rand(set_rand(['deps_level_0', 'deps_level_1'], [])).id` |
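Because the level is chosen before the row, the choice is uniform over prior levels, not over prior nodes. The sketch below models that two-step pick for the `deps` example (3, 8, and 20 nodes), using Python's `random` module as a stand-in for `set_rand` and `ref_rand`; `dag_parent` is a hypothetical helper and the consecutive ids are an assumption.

```python
import random
from collections import Counter

def dag_parent(rng, level_ids, n):
    """Pick a parent for a node at level n > 0 (hypothetical model).

    Step 1 stands in for set_rand over the prior level names (uniform),
    step 2 for ref_rand(...).id (a uniform row within that level).
    """
    level = rng.randrange(n)                     # step 1: choose a prior level
    return level, rng.choice(level_ids[level])   # step 2: choose a row in it

# Assign consecutive ids to the deps levels [3, 8, 20].
level_ids, next_id = [], 1
for count in [3, 8, 20]:
    level_ids.append(list(range(next_id, next_id + count)))
    next_id += count

rng = random.Random(7)
picks = Counter(dag_parent(rng, level_ids, 2)[0] for _ in range(10_000))
print(picks)  # roughly half of the picks land on each prior level
```

Level-2 nodes split roughly evenly between the two prior levels, so each of the three level-0 nodes is chosen about 1/6 of the time, while each of the eight level-1 nodes is chosen about 1/16 of the time.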
## Per-level argument overrides
Use `level_args` to override specific args at each level. Any expression is valid. Args not overridden at a given level use the default from `args`.
```yaml
tree:
  - name: org
    levels: [1, 5, 100]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('id')
      title: gen('jobtitle')
      parent_id: ~
    level_args:
      0:
        title: const('CEO')
    query: |-
      INSERT INTO org (id, title, parent_id)
      __values__
```

Level 0 gets `const('CEO')`; levels 1 and 2 fall back to `gen('jobtitle')`.
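The override rule amounts to a dictionary merge. This sketch assumes the precedence is defaults, then `level_args`, then the parent expression edg writes into `parent_column`; `resolve_args` is a hypothetical helper, and expressions are kept as plain strings.

```python
def resolve_args(args, level_args, level, parent_column, parent_expr):
    """Return the args used by one expanded level query (hypothetical model)."""
    resolved = dict(args)                       # 1. start from the defaults
    resolved.update(level_args.get(level, {}))  # 2. per-level overrides win
    resolved[parent_column] = parent_expr       # 3. assumed: parent column is rewritten last
    return resolved

# Expressions are plain strings here; None stands in for YAML's ~.
args = {"id": "seq_global('id')", "title": "gen('jobtitle')", "parent_id": None}
level_args = {0: {"title": "const('CEO')"}}

for level in range(3):
    parent = "const(nil)" if level == 0 else f"ref_rand('org_level_{level - 1}').id"
    print(level, resolve_args(args, level_args, level, "parent_id", parent)["title"])
# 0 const('CEO')
# 1 gen('jobtitle')
# 2 gen('jobtitle')
```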
## Templates and batch options
Use `template` to inherit shared settings across multiple tree configs. Combine with `size`, `wait`, and `prepared` to control how expanded queries execute.
```yaml
templates:
  tree_batch:
    size: 500
    prepared: true
    wait: 100ms
tree:
  - name: categories
    template: tree_batch
    levels: [1, 20, 200]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('cat_id')
      name: gen('noun')
      parent_id: ~
    query: |-
      INSERT INTO categories (id, name, parent_id)
      __values__
  - name: regions
    template: tree_batch
    levels: [1, 10, 50]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('region_id')
      name: gen('country')
      parent_id: ~
    query: |-
      INSERT INTO regions (id, name, parent_id)
      __values__
```

Both trees inherit `size: 500`, `prepared: true`, and `wait: 100ms` from `tree_batch`. Each expanded level query inserts in batches of 500 rows using prepared statements, pausing 100ms between batches.
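Batching applies to each expanded level independently. A small sketch of the arithmetic, assuming a level of N rows becomes full batches of `size` plus one remainder batch; `batches` is a hypothetical helper.

```python
def batches(row_count, size):
    """Return the row count of each batch for a level of row_count rows."""
    full, rest = divmod(row_count, size)
    return [size] * full + ([rest] if rest else [])

trees = {"categories": [1, 20, 200], "regions": [1, 10, 50]}
for name, levels in trees.items():
    for n, count in enumerate(levels):
        print(f"{name}_level_{n}", batches(count, 500))

print(batches(1200, 500))  # [500, 500, 200]
```

With the counts in this example no level exceeds 500 rows, so each level query runs as a single batch; a 1,200-row level would split into 500, 500, and 200.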
Fields set directly on the tree override the template:
```yaml
tree:
  - name: huge_tree
    template: tree_batch
    size: 5000
    levels: [1, 100, 10000]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('id')
      parent_id: ~
    query: |-
      INSERT INTO nodes (id, parent_id)
      __values__
```

Here `size: 5000` overrides the template's `size: 500`, but `prepared` and `wait` are still inherited.
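Template inheritance can be modelled as a shallow merge in which fields set on the tree win. This is a sketch under that assumption (nested fields such as `args` are not merged key by key here); `apply_template` is a hypothetical helper.

```python
templates = {"tree_batch": {"size": 500, "prepared": True, "wait": "100ms"}}

def apply_template(tree, templates):
    """Return tree settings with template defaults filled in (shallow merge)."""
    name = tree.get("template")
    defaults = templates[name] if name else {}  # KeyError for an unknown template
    merged = dict(defaults)                     # copy so the template is not mutated
    merged.update({k: v for k, v in tree.items() if k != "template"})
    return merged

huge = {"name": "huge_tree", "template": "tree_batch", "size": 5000,
        "levels": [1, 100, 10000]}
merged = apply_template(huge, templates)
print(merged["size"], merged["prepared"], merged["wait"])  # 5000 True 100ms
```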
## Config reference
| Field | Required | Description |
|---|---|---|
| `name` | yes | Prefix for expanded query names (`{name}_level_{n}`) |
| `levels` | yes | List of row counts per level (integers or global references) |
| `id_column` | yes | Column name used as the node identifier |
| `parent_column` | yes | Column name for the parent reference (must appear in `args`) |
| `dag` | no | `true` for DAG mode (default `false` for a strict tree) |
| `args` | yes | Named argument expressions (same as seed queries) |
| `level_args` | no | Per-level argument overrides, keyed by level index |
| `query` | yes | `INSERT` query with a `__values__` placeholder |
| `template` | no | Inherit defaults from a named template |
| `size` | no | Batch size for expanded queries |
| `batch_format` | no | Batch format (`json`, etc.) |
| `wait` | no | Delay between expanded queries |
| `prepared` | no | Use prepared statements |
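The required fields and the `parent_column` rule in the table translate into a simple pre-flight check. This is a hypothetical validator written from the table alone; edg's own error messages, and any additional checks it performs, are not modelled.

```python
# Required fields, taken from the config reference table.
REQUIRED = ("name", "levels", "id_column", "parent_column", "args", "query")

def validate_tree(tree):
    """Return a list of problems with one tree entry (empty list if none)."""
    problems = [f"missing required field: {f}" for f in REQUIRED if f not in tree]
    parent = tree.get("parent_column")
    if parent is not None and parent not in (tree.get("args") or {}):
        problems.append(f"parent_column '{parent}' must appear in args")
    if "query" in tree and "__values__" not in tree["query"]:
        problems.append("query must contain the __values__ placeholder")
    return problems

tree = {
    "name": "org",
    "levels": [1, 5, 100],
    "id_column": "id",
    "parent_column": "parent_id",
    "args": {"id": "seq_global('id')", "title": "gen('jobtitle')"},  # parent_id forgotten
    "query": "INSERT INTO org (id, title, parent_id)\n__values__",
}
print(validate_tree(tree))  # ["parent_column 'parent_id' must appear in args"]
```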
## Expression-valued levels
Level counts can reference globals, so you can parameterise the shape:
```yaml
globals:
  total_subcats: 50
  total_items: 500
tree:
  - name: categories
    levels: [1, total_subcats, total_items]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('cat_id')
      parent_id: ~
    query: |-
      INSERT INTO categories (id, parent_id)
      __values__
```
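Resolving the level list is a lookup step that happens before expansion. The sketch below handles only integer literals and bare global names (which a YAML parser reads as strings); edg may accept richer expressions, which this does not model. `resolve_levels` is a hypothetical helper.

```python
def resolve_levels(levels, globals_):
    """Turn each level entry into an int: literals pass through, names are looked up."""
    resolved = []
    for entry in levels:
        if isinstance(entry, int):
            resolved.append(entry)                  # literal count, e.g. 1
        elif entry in globals_:
            resolved.append(int(globals_[entry]))   # bare name, e.g. total_subcats
        else:
            raise KeyError(f"unknown global in levels: {entry!r}")
    return resolved

globals_ = {"total_subcats": 50, "total_items": 500}
levels = resolve_levels([1, "total_subcats", "total_items"], globals_)
print(levels, sum(levels))  # [1, 50, 500] 551
```

Changing a single global therefore reshapes the tree without touching the `tree:` entry itself.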