Graph Data

The tree config section generates hierarchical and graph-structured data declaratively. Instead of writing one seed query per level, you describe the shape with levels, and edg expands it into the correct batch inserts with parent references wired up automatically.

Trees

A tree generates strict parent-child hierarchies (e.g. org charts, category trees, nested comments). Each node at level N references exactly one parent at level N-1. Root nodes (level 0) get NULL parents.

```yaml
seq:
  - name: emp_id
    start: 1
    step: 1

tree:
  - name: org
    levels: [1, 5, 15, 50, 200]
    id_column: id
    parent_column: manager_id
    args:
      id: seq_global('emp_id')
      name: gen('name')
      role: ~
      manager_id: ~
    level_args:
      0:
        role: const('CEO')
      1:
        role: const('VP')
      2:
        role: const('Director')
      3:
        role: const('Manager')
      4:
        role: set_rand(['Engineer', 'IC'], [])
    query: |-
      INSERT INTO employees (id, name, role, manager_id)
      __values__
```

This produces 271 rows across 5 levels:

| Level | Count | Role |
|-------|-------|------|
| 0 | 1 | CEO |
| 1 | 5 | VP |
| 2 | 15 | Director |
| 3 | 50 | Manager |
| 4 | 200 | Engineer or IC |
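The shape above can be sanity-checked with a small simulation of the rule from this section: each node at level N picks exactly one random parent from level N-1, and roots get no parent. This Python sketch is not edg's implementation. Sequential ids stand in for seq_global('emp_id'), and the helper name build_tree is hypothetical.

```python
import random


def build_tree(levels, seed=0):
    """Simulate strict-tree generation; returns rows as (id, level, parent_id)."""
    rng = random.Random(seed)
    rows = []
    prev_ids = []  # ids generated at the previous level
    next_id = 1    # stands in for seq_global('emp_id') with start 1, step 1
    for level, count in enumerate(levels):
        current_ids = []
        for _ in range(count):
            # Roots (level 0) get no parent; every other node picks one from the level above.
            parent = None if level == 0 else rng.choice(prev_ids)
            rows.append((next_id, level, parent))
            current_ids.append(next_id)
            next_id += 1
        prev_ids = current_ids
    return rows


rows = build_tree([1, 5, 15, 50, 200])
print(len(rows))  # 271
```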

The ~ (tilde) in YAML represents null. Use it as a placeholder for args that edg manages automatically. The parent_column arg (manager_id above) is overwritten by edg at each level with the correct parent reference expression. Args overridden by level_args (like role above) also use ~ as a default since they are replaced at every level.

How it works

edg expands the tree config into separate seed queries at load time:

| Expanded query | Count | manager_id expression |
|----------------|-------|-----------------------|
| org_level_0 | 1 | const(nil) |
| org_level_1 | 5 | ref_rand('org_level_0').id |
| org_level_2 | 15 | ref_rand('org_level_1').id |
| org_level_3 | 50 | ref_rand('org_level_2').id |
| org_level_4 | 200 | ref_rand('org_level_3').id |
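The names and expressions in this table follow a simple rule, written out below as a Python sketch. It only rebuilds the strings shown above. It is not edg's expansion code, and expand_tree is a hypothetical name.

```python
def expand_tree(name, levels):
    """Return (query_name, count, parent_expr) for each level of a strict tree."""
    expanded = []
    for level, count in enumerate(levels):
        if level == 0:
            parent_expr = "const(nil)"  # roots have no parent
        else:
            # Each level points at a random row of the level directly above it.
            parent_expr = f"ref_rand('{name}_level_{level - 1}').id"
        expanded.append((f"{name}_level_{level}", count, parent_expr))
    return expanded


for row in expand_tree("org", [1, 5, 15, 50, 200]):
    print(row)
```

Each level's expression names the previous level's query. This suggests the expanded queries depend on running in level order, since level N can only reference rows that level N-1 has already produced.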

DAGs

Set dag: true to generate directed acyclic graphs. In DAG mode, a node can reference a parent from any prior level, not just the one directly above. This models dependency graphs, prerequisite chains, and similar structures.

```yaml
tree:
  - name: deps
    levels: [3, 8, 20]
    id_column: id
    parent_column: dependency_id
    dag: true
    args:
      id: seq_global('dep_id')
      name: template('pkg-%d', iter())
      dependency_id: ~
    query: |-
      INSERT INTO dependencies (id, name, dependency_id)
      __values__
```

In DAG mode, the parent expression first picks one of the prior levels uniformly at random, then picks a random row from that level:

| Expanded query | dependency_id expression |
|----------------|--------------------------|
| deps_level_0 | const(nil) |
| deps_level_1 | ref_rand(set_rand(['deps_level_0'], [])).id |
| deps_level_2 | ref_rand(set_rand(['deps_level_0', 'deps_level_1'], [])).id |
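The two-step choice in these expressions can be simulated to confirm that the result stays acyclic: every parent comes from a strictly earlier level. This Python sketch assumes that set_rand with an empty weight list picks uniformly, as the uniform wording above implies. build_dag is a hypothetical helper, not part of edg.

```python
import random


def build_dag(levels, seed=0):
    """Simulate DAG-mode generation; returns rows as (id, level, parent_id)."""
    rng = random.Random(seed)
    ids_by_level = []  # ids produced by each completed level, in order
    rows = []
    next_id = 1
    for level, count in enumerate(levels):
        current = []
        for _ in range(count):
            if level == 0:
                parent = None
            else:
                # Step 1: choose one prior level uniformly (mirrors set_rand([...], [])).
                source = rng.choice(ids_by_level)
                # Step 2: choose a random row from that level (mirrors ref_rand(...).id).
                parent = rng.choice(source)
            rows.append((next_id, level, parent))
            current.append(next_id)
            next_id += 1
        ids_by_level.append(current)
    return rows


rows = build_dag([3, 8, 20])
```

Because the level is chosen before the row, selection is uniform over levels, not over all prior rows. For deps_level_2, each of the 3 rows in deps_level_0 is picked with probability 1/6, while each of the 8 rows in deps_level_1 is picked with probability 1/16.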

Per-level argument overrides

Use level_args to override specific args at each level. Any expression is valid. Args not overridden at a given level use the default from args.

```yaml
tree:
  - name: org
    levels: [1, 5, 100]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('id')
      title: gen('jobtitle')
      parent_id: ~
    level_args:
      0:
        title: const('CEO')
    query: |-
      INSERT INTO org (id, title, parent_id)
      __values__
```

Level 0 gets const('CEO'); levels 1 and 2 fall back to gen('jobtitle').
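The fallback rule works like a dictionary merge. In this Python sketch, expressions are kept as plain strings and the function name level_args_for is hypothetical. Applying the parent column last, after level_args, is our assumption, based on the description of parent_column being overwritten by edg at each level.

```python
def level_args_for(args, level_args, parent_column, level, parent_expr):
    """Merge defaults, per-level overrides, and the parent reference for one level."""
    merged = dict(args)                       # start from the defaults in `args`
    merged.update(level_args.get(level, {}))  # per-level overrides win over defaults
    merged[parent_column] = parent_expr       # the edg-managed parent column is always replaced
    return merged


args = {"id": "seq_global('id')", "title": "gen('jobtitle')", "parent_id": None}
level_args = {0: {"title": "const('CEO')"}}

for level in range(3):
    parent = "const(nil)" if level == 0 else f"ref_rand('org_level_{level - 1}').id"
    print(level, level_args_for(args, level_args, "parent_id", level, parent))
```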

Templates and batch options

Use template to inherit shared settings across multiple tree configs. Combine with size, wait, and prepared to control how expanded queries execute.

```yaml
templates:
  tree_batch:
    size: 500
    prepared: true
    wait: 100ms

tree:
  - name: categories
    template: tree_batch
    levels: [1, 20, 200]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('cat_id')
      name: gen('noun')
      parent_id: ~
    query: |-
      INSERT INTO categories (id, name, parent_id)
      __values__

  - name: regions
    template: tree_batch
    levels: [1, 10, 50]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('region_id')
      name: gen('country')
      parent_id: ~
    query: |-
      INSERT INTO regions (id, name, parent_id)
      __values__
```

Both trees inherit size: 500, prepared: true, and wait: 100ms from tree_batch. Each expanded level query inserts in batches of 500 rows using prepared statements, pausing 100ms between batches.

Fields set directly on the tree override the template:

```yaml
tree:
  - name: huge_tree
    template: tree_batch
    size: 5000
    levels: [1, 100, 10000]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('id')
      parent_id: ~
    query: |-
      INSERT INTO nodes (id, parent_id)
      __values__
```

Here size: 5000 overrides the template’s size: 500, but prepared and wait are still inherited.
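The precedence rule above amounts to a shallow merge in which the tree's own fields win. The Python sketch below models that merge and also counts batches per level. It assumes each expanded level query splits its rows into ceil(count / size) batches, as described for tree_batch. resolve and batches_per_level are hypothetical helpers, not edg functions.

```python
import math


def resolve(template, tree):
    """Tree fields override template fields; anything the tree leaves unset is inherited."""
    return {**template, **tree}


def batches_per_level(levels, size):
    """Number of INSERT batches each expanded level query would need (assumed ceil split)."""
    return [math.ceil(count / size) for count in levels]


tree_batch = {"size": 500, "prepared": True, "wait": "100ms"}
huge_tree = {"name": "huge_tree", "size": 5000, "levels": [1, 100, 10000]}

settings = resolve(tree_batch, huge_tree)
print(settings["size"], settings["prepared"], settings["wait"])  # 5000 True 100ms
print(batches_per_level(settings["levels"], settings["size"]))   # [1, 1, 2]
```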

Config reference

| Field | Required | Description |
|-------|----------|-------------|
| name | yes | Prefix for expanded query names ({name}_level_{n}) |
| levels | yes | List of row counts per level (integers or global references) |
| id_column | yes | Column name used as the node identifier |
| parent_column | yes | Column name for the parent reference (must appear in args) |
| dag | no | true for DAG mode (default false for strict tree) |
| args | yes | Named argument expressions (same as seed queries) |
| level_args | no | Per-level argument overrides, keyed by level index |
| query | yes | INSERT query with `__values__` placeholder |
| template | no | Inherit defaults from a named template |
| size | no | Batch size for expanded queries |
| batch_format | no | Batch format (json, etc.) |
| wait | no | Delay between expanded queries |
| prepared | no | Use prepared statements |
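The required fields and the parent_column rule from the table can be checked before a config is loaded. This is a sketch of such a pre-flight check, not edg's own validation. It looks at a single tree entry as written and ignores anything a template might supply. validate_tree and REQUIRED_FIELDS are hypothetical names.

```python
REQUIRED_FIELDS = ("name", "levels", "id_column", "parent_column", "args", "query")


def validate_tree(cfg):
    """Return a list of problems with one tree entry; an empty list means it looks valid."""
    problems = [f"missing required field: {field}" for field in REQUIRED_FIELDS if field not in cfg]
    args = cfg.get("args") or {}
    parent = cfg.get("parent_column")
    # Per the config reference, the parent column must appear in args.
    if parent is not None and parent not in args:
        problems.append(f"parent_column '{parent}' must appear in args")
    # The query needs the __values__ placeholder for the generated rows.
    if "query" in cfg and "__values__" not in cfg["query"]:
        problems.append("query must contain the __values__ placeholder")
    # dag is optional and defaults to false (strict tree).
    if not isinstance(cfg.get("dag", False), bool):
        problems.append("dag must be true or false")
    return problems


example = {
    "name": "org",
    "levels": [1, 5, 100],
    "id_column": "id",
    "parent_column": "parent_id",
    "args": {"id": "seq_global('id')", "title": "gen('jobtitle')"},  # parent_id is missing
    "query": "INSERT INTO org (id, title, parent_id)\n__values__",
}
print(validate_tree(example))
```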

Expression-valued levels

Level counts can reference globals, so you can parameterise the shape:

```yaml
globals:
  total_subcats: 50
  total_items: 500

tree:
  - name: categories
    levels: [1, total_subcats, total_items]
    id_column: id
    parent_column: parent_id
    args:
      id: seq_global('cat_id')
      parent_id: ~
    query: |-
      INSERT INTO categories (id, parent_id)
      __values__
```
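In YAML, the bare names in levels: [1, total_subcats, total_items] parse as strings, so resolving the shape means looking each name up in globals. The Python sketch below handles only bare global names, not arbitrary expressions, and resolve_levels is a hypothetical helper rather than edg's resolver.

```python
def resolve_levels(levels, globals_):
    """Replace global names in a levels list with their values; integers pass through."""
    resolved = []
    for entry in levels:
        if isinstance(entry, str):
            # A bare name in the YAML list refers to a key under `globals`.
            if entry not in globals_:
                raise KeyError(f"unknown global in levels: {entry}")
            entry = globals_[entry]
        resolved.append(entry)
    return resolved


globals_ = {"total_subcats": 50, "total_items": 500}
levels = resolve_levels([1, "total_subcats", "total_items"], globals_)
print(levels, sum(levels))  # [1, 50, 500] 551
```

Changing total_subcats or total_items under globals then reshapes the tree without touching the tree entry itself.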