Sync#

The sync command writes identical data to two databases and verifies consistency. It’s designed for testing dual-write patterns, CDC pipelines, and cross-database replication.

Each database gets its own config file with driver-specific SQL, and the --rng-seed flag ensures both sides generate identical data.

Subcommands#

| Command | Description |
| --- | --- |
| sync run | Run up and seed on the source, and optionally the target |
| sync verify | Compare tables row-by-row across source and target |
| sync down | Run deseed and down on both databases |

Flags#

Connection flags#

These persistent flags apply to all sync subcommands:

| Flag / Env Var | Default | Description |
| --- | --- | --- |
| --source-driver / EDG_SOURCE_DRIVER | pgx | Source database driver |
| --source-url / EDG_SOURCE_URL | | Source database connection URL |
| --source-config / EDG_SOURCE_CONFIG | | Source edg config file |
| --target-driver / EDG_TARGET_DRIVER | pgx | Target database driver |
| --target-url / EDG_TARGET_URL | | Target database connection URL |
| --target-config / EDG_TARGET_CONFIG | | Target edg config file (omit for CDC mode) |

Verify flags#

These flags are specific to sync verify:

| Flag | Default | Description |
| --- | --- | --- |
| --tables | | Comma-separated table names to verify (required) |
| --order-by | | Column for deterministic row ordering (required) |
| --ignore-columns | | Comma-separated columns to skip during comparison |
| --wait | 0 | Delay before verifying, to allow for replication lag |
| --batch-size | 10000 | Rows per verification batch |
| --verbose | false | Print individual row-level mismatches |

Dual-Write Mode example#

This example writes 1,000 users and 5,000 orders to both CockroachDB and MySQL, then verifies the data matches.

Setup#

Start both databases and wait for them to be available:

docker compose -f cmd/harness/compose/compose_crdb.yml up -d
docker compose -f cmd/harness/compose/compose_mysql.yml up -d

Seed both databases#

edg sync run \
  --source-driver mysql \
  --source-url "root:password@tcp(localhost:3306)/defaultdb?parseTime=true" \
  --source-config _examples/sync/mysql.yaml \
  --target-driver pgx \
  --target-url "postgres://root:password@localhost:26257/defaultdb?sslmode=disable" \
  --target-config _examples/sync/crdb.yaml \
  --rng-seed 42

The PRNG is re-seeded before each side, so both databases receive identical generated values.

Check the data#

CockroachDB:

cockroach sql --insecure -e "SELECT id, name FROM users ORDER BY id LIMIT 5"
  id |      name
-----+-----------------
   1 | Shea Gibson
   2 | Alta Bayer
   3 | Reagan Powell
   4 | Aiden Elliott
   5 | Dock Marquardt
(5 rows)

MySQL:

mysql -u root -ppassword -h 127.0.0.1 defaultdb -e "SELECT id, name FROM users ORDER BY id LIMIT 5"
+----+----------------+
| id | name           |
+----+----------------+
|  1 | Shea Gibson    |
|  2 | Alta Bayer     |
|  3 | Reagan Powell  |
|  4 | Aiden Elliott  |
|  5 | Dock Marquardt |
+----+----------------+

CockroachDB orders:

cockroach sql --insecure -e "SELECT * FROM orders LIMIT 5"
   id  | user_id | amount |  status   |         created_at
-------+---------+--------+-----------+-----------------------------
  1001 |     620 | 199.20 | shipped   | 2026-04-24 14:51:56.250121
  1002 |     513 | 140.76 | pending   | 2026-04-24 14:51:56.250121
  1003 |     249 | 138.95 | delivered | 2026-04-24 14:51:56.250121
  1004 |     498 | 371.20 | pending   | 2026-04-24 14:51:56.250121
  1005 |     642 | 246.28 | pending   | 2026-04-24 14:51:56.250121
(5 rows)

MySQL orders:

mysql -u root -ppassword -h 127.0.0.1 defaultdb -e "SELECT * FROM orders LIMIT 5"
+------+---------+--------+-----------+---------------------+
| id   | user_id | amount | status    | created_at          |
+------+---------+--------+-----------+---------------------+
| 1001 |     620 | 199.20 | shipped   | 2026-04-24 14:51:56 |
| 1002 |     513 | 140.76 | pending   | 2026-04-24 14:51:56 |
| 1003 |     249 | 138.95 | delivered | 2026-04-24 14:51:56 |
| 1004 |     498 | 371.20 | pending   | 2026-04-24 14:51:56 |
| 1005 |     642 | 246.28 | pending   | 2026-04-24 14:51:56 |
+------+---------+--------+-----------+---------------------+

IDs, names, amounts, and statuses match across both databases.

Verify consistency#

edg sync verify \
  --source-driver mysql \
  --source-url "root:password@tcp(localhost:3306)/defaultdb?parseTime=true" \
  --target-driver pgx \
  --target-url "postgres://root:password@localhost:26257/defaultdb?sslmode=disable" \
  --tables users,orders \
  --order-by id \
  --ignore-columns created_at
INFO table verified table=users rows=1000
INFO table verified table=orders rows=5000
INFO all tables verified

The --ignore-columns created_at flag skips the timestamp column since CockroachDB and MySQL store timestamps at different precisions.

Verification uses batched keyset pagination (default 10,000 rows per batch), so it works efficiently on tables with millions of rows without loading the entire table into memory.

Teardown#

edg sync down \
  --source-driver mysql \
  --source-url "root:password@tcp(localhost:3306)/defaultdb?parseTime=true" \
  --source-config _examples/sync/mysql.yaml \
  --target-driver pgx \
  --target-url "postgres://root:password@localhost:26257/defaultdb?sslmode=disable" \
  --target-config _examples/sync/crdb.yaml

docker compose -f cmd/harness/compose/compose_crdb.yml down
docker compose -f cmd/harness/compose/compose_mysql.yml down

External Replication mode#

When --target-config is omitted, sync run only writes to the source. The target is expected to receive data through an external replication mechanism (e.g. MOLT Fetch/Replicator, Debezium, logical replication).

edg sync run \
  --source-driver mysql \
  --source-url "root:password@tcp(localhost:3306)/defaultdb?parseTime=true" \
  --source-config _examples/sync/mysql.yaml \
  --rng-seed 42

After replication completes, use sync verify with --wait to allow for lag:

edg sync verify \
  --source-driver mysql \
  --source-url "root:password@tcp(localhost:3306)/defaultdb?parseTime=true" \
  --target-driver pgx \
  --target-url "postgres://root:password@localhost:26257/defaultdb?sslmode=disable" \
  --tables users,orders \
  --order-by id \
  --ignore-columns created_at \
  --wait 5s

Supported drivers#

sync verify works with all database drivers. SQL databases use batched keyset pagination. MongoDB uses server-side sort with keyset pagination. Cassandra fetches all rows with server-side paging and sorts client-side by --order-by.

How verification works#

The sync verify command compares tables using a merge-join over batched keyset pagination:

  1. Both databases are queried in batches using driver-appropriate pagination
  2. Rows are compared by the --order-by column using a sorted merge. Rows present in one side but not the other are reported as MISSING or EXTRA
  3. For matching rows, all columns (except --ignore-columns) are compared. Numeric columns are compared as numbers, so "364.8" and "364.80" are treated as equal
  4. Columns unique to one side (e.g. _id in MongoDB, created_at in SQL) are automatically excluded from comparison
  5. If both source and target return zero rows, verification fails with a diagnostic message rather than silently succeeding

If any mismatches are found, sync verify exits with status code 1.

Memory usage is O(batch_size) for SQL and MongoDB. Cassandra loads all rows into memory for client-side sorting, so its memory usage grows with table size.

Output#

By default, sync verify prints only a per-table summary:

WARN table has mismatches table=orders mismatches=3
ERRO verification failed: 3 mismatches

Add --verbose to see individual row-level discrepancies:

MISMATCH table=users id=42 column=email source="old@example.com" target="new@example.com"
MISSING  table=orders id=99 side=target
EXTRA    table=orders id=100 side=target

| Type | Meaning |
| --- | --- |
| MISMATCH | Row exists in both but a column value differs |
| MISSING | Row exists in source but not in target |
| EXTRA | Row exists in target but not in source |