# Configuration
SynchDB supports the following GUC variables in `postgresql.conf`. These are common parameters that apply to all connectors managed by SynchDB:
| GUC Variable | Type | Default Value | Description |
|---|---|---|---|
| synchdb.naptime | integer | 100 | The delay in milliseconds between each data polling from the Debezium runner engine |
| synchdb.dml_use_spi | boolean | false | Option to use SPI to handle DML operations |
| synchdb.synchdb_auto_launcher | boolean | true | Option to automatically launch active SynchDB connector workers. This option only works when SynchDB is included in the shared_preload_libraries GUC option |
| synchdb.dbz_batch_size | integer | 2048 | The maximum number of change events produced by the Debezium embedded engine for SynchDB to process. This batch of changes is processed within a single transaction by SynchDB |
| synchdb.dbz_queue_size | integer | 8192 | The maximum size (measured in number of change events) of the Debezium embedded engine's change event queue. It should be set to at least twice synchdb.dbz_batch_size |
| synchdb.dbz_connect_timeout_ms | integer | 30000 | The timeout value in milliseconds for the Debezium embedded engine to establish an initial connection to a remote database |
| synchdb.dbz_query_timeout_ms | integer | 600000 | The timeout value in milliseconds for the Debezium embedded engine to execute a query on a remote database |
| synchdb.dbz_skipped_operations | string | "t" | A comma-separated list of operations Debezium shall skip when processing change events. "c" is for inserts, "u" is for updates, "d" is for deletes, "t" is for truncates |
| synchdb.jvm_max_heap_size | integer | 1024 | The maximum heap size in MB to be allocated to the Java Virtual Machine (JVM) when starting a connector |
| synchdb.dbz_snapshot_thread_num | integer | 2 | The number of threads the Debezium embedded connector should spawn during initial snapshot. Please note that according to Debezium, multi-threaded snapshot is an incubating feature |
| synchdb.dbz_snapshot_fetch_size | integer | 0 | The number of rows the Debezium embedded connector should fetch at a time during initial snapshot. Set it to 0 to let the engine choose automatically |
| synchdb.dbz_snapshot_min_row_to_stream_results | integer | 0 | The minimum number of rows a remote table should contain before the Debezium embedded engine switches to streaming mode during initial snapshot. Set it to 0 to always use streaming mode |
| synchdb.dbz_incremental_snapshot_chunk_size | integer | 2048 | The maximum number of change events produced by the Debezium embedded engine for SynchDB to process during incremental snapshot |
| synchdb.dbz_incremental_snapshot_watermarking_strategy | string | "insert-insert" | The watermarking strategy used by the Debezium embedded engine to resolve potential conflicts during incremental snapshot. Possible values are "insert-insert" and "insert-delete" |
| synchdb.dbz_offset_flush_interval_ms | integer | 60000 | The interval in milliseconds at which the Debezium embedded engine flushes offset data to disk |
| synchdb.dbz_capture_only_selected_table_ddl | boolean | true | Whether the Debezium embedded engine should capture the schema of all tables (false) or only the selected tables (true) during initial snapshot |
| synchdb.max_connector_workers | integer | 30 | The maximum number of connector workers that can be running at a time |
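Some of the constraints stated in the table (for example, `synchdb.dbz_queue_size` being at least twice `synchdb.dbz_batch_size`) can be sanity-checked before restarting the server. The following is a minimal sketch in Python; the `check_synchdb_settings` helper is hypothetical and not part of SynchDB, it merely encodes the rules documented above:

```python
# Hypothetical sanity checker for the SynchDB GUCs documented above.
# It encodes only constraints stated in the table; it is not part of SynchDB.

VALID_SKIP_OPS = {"c", "u", "d", "t"}          # inserts, updates, deletes, truncates
VALID_WATERMARKING = {"insert-insert", "insert-delete"}

def check_synchdb_settings(s: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if s["synchdb.dbz_queue_size"] < 2 * s["synchdb.dbz_batch_size"]:
        problems.append("dbz_queue_size should be at least twice dbz_batch_size")
    ops = {op.strip() for op in s["synchdb.dbz_skipped_operations"].split(",")}
    if not ops <= VALID_SKIP_OPS:
        problems.append("unknown skipped operations: %s" % sorted(ops - VALID_SKIP_OPS))
    if s["synchdb.dbz_incremental_snapshot_watermarking_strategy"] not in VALID_WATERMARKING:
        problems.append("watermarking strategy must be insert-insert or insert-delete")
    return problems

settings = {
    "synchdb.dbz_batch_size": 2048,
    "synchdb.dbz_queue_size": 8192,
    "synchdb.dbz_skipped_operations": "t",
    "synchdb.dbz_incremental_snapshot_watermarking_strategy": "insert-insert",
}
print(check_synchdb_settings(settings))  # the defaults pass: []
```

Running the checker on the default values above reports no problems; shrinking the queue below twice the batch size would flag the first rule.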
## Technical Notes
- GUC (Grand Unified Configuration) variables are global configuration parameters in PostgreSQL
- Values are set in the `postgresql.conf` file
- Changes require a server restart to take effect
- `shared_preload_libraries` is a critical system configuration that determines which libraries are loaded at startup
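For example, the notes above translate into the following `postgresql.conf` lines (library name and value are the usual SynchDB defaults; adjust for your installation, and restart the server afterwards):

```
# postgresql.conf
shared_preload_libraries = 'synchdb'   # load SynchDB at server startup
synchdb.synchdb_auto_launcher = true   # resume connectors automatically after restart
```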
## Configuration Examples
```
# Example configuration in postgresql.conf
synchdb.naptime = 1000                 # Increase wait time to 1 second
synchdb.dml_use_spi = true             # Enable SPI usage for DML operations
synchdb.synchdb_auto_launcher = true   # Enable automatic connector startup
synchdb.dbz_batch_size = 4096          # Each batch can have at most 4096 change events
synchdb.dbz_queue_size = 8192          # Debezium will use a change event queue of size 8192
synchdb.jvm_max_heap_size = 2048       # 2GB heap memory to be allocated to a connector
synchdb.dbz_snapshot_fetch_size = 0    # Let Debezium pick the optimal number of rows to fetch during initial snapshot
synchdb.dbz_snapshot_min_row_to_stream_results = 0    # Always stream the results during initial snapshot
synchdb.dbz_snapshot_thread_num = 1    # Single thread during Debezium's initial snapshot
synchdb.dbz_incremental_snapshot_chunk_size = 4096    # Incremental snapshot produces change events in batches of at most 4096
synchdb.dbz_incremental_snapshot_watermarking_strategy = 'insert-insert'    # Use the insert-insert watermarking strategy
synchdb.dbz_offset_flush_interval_ms = 60000          # Flush offset data to disk every minute if needed
synchdb.dbz_capture_only_selected_table_ddl = false   # Capture the schema of all tables rather than only the selected ones
synchdb.max_connector_workers = 10     # Up to 10 connector workers can run at a time
```
## Usage Recommendations
- `synchdb.naptime`
    - Lower values: Higher update frequency but more system load
    - Higher values: Lower system load but less frequent updates
    - Adjust based on data latency requirements
- `synchdb.dml_use_spi`
    - Enable if specific SPI integration is needed
    - Keep `false` for standard DML operations
- `synchdb.synchdb_auto_launcher`
    - Recommended to keep `true` so connectors resume automatically when PostgreSQL restarts
    - Change to `false` only if manual connector control is required
- `synchdb.dbz_batch_size`
    - Lower values: Slower processing of change events at lower JVM memory usage
    - Higher values: Faster processing of change events at higher JVM memory usage
    - Adjust based on resource requirements
- `synchdb.dbz_queue_size`
    - Lower values: Smaller Debezium queue to hold change events
    - Higher values: Larger Debezium queue to hold change events
    - Needs to be set to at least twice `synchdb.dbz_batch_size`
- `synchdb.jvm_max_heap_size`
    - Lower values: Smaller heap memory allocated to the JVM
    - Higher values: Larger heap memory allocated to the JVM
    - Adjust based on system resource and workload requirements
    - Needs to be increased when working with a large number of tables
- `synchdb.dbz_snapshot_fetch_size`
    - Lower values: Fewer rows fetched from a table at a time during snapshot
    - Higher values: More rows fetched from a table at a time during snapshot
    - Recommended to keep it at 0 to let Debezium choose an optimal value
- `synchdb.dbz_snapshot_min_row_to_stream_results`
    - Lower values: Less JVM memory required, slower processing of change events
    - Higher values: More JVM memory required, faster processing of change events
    - Recommended to keep it at 0 so Debezium always uses streaming mode, reducing memory usage
- `synchdb.dbz_snapshot_thread_num`
    - Lower values: Slower data export to SynchDB for processing
    - Higher values: Faster data export to SynchDB for processing
    - Recommended to set it to the number of CPU cores
- `synchdb.dbz_incremental_snapshot_chunk_size`
    - Lower values: Slower processing of change events at lower JVM memory usage during incremental snapshot
    - Higher values: Faster processing of change events at higher JVM memory usage during incremental snapshot
    - Recommended to set it to the same value as `synchdb.dbz_batch_size` and adjust based on resource requirements
- `synchdb.dbz_offset_flush_interval_ms`
    - Lower values: More frequent updates to the offset file, more I/O, fewer old batches to re-process after recovery from a fault
    - Higher values: Less frequent updates to the offset file, less I/O, more old batches to re-process after recovery from a fault
    - Recommended to keep at 60000, per Debezium's recommendation
- `synchdb.max_connector_workers`
    - Lower values: Fewer connector workers can run at a time, less shared memory required
    - Higher values: More connector workers can run at a time, more shared memory required
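As a back-of-envelope illustration of the `synchdb.dbz_offset_flush_interval_ms` trade-off: if offsets are flushed every 60 seconds and SynchDB polls every `naptime` milliseconds, then in the worst case every batch consumed since the last flush may be re-processed after a crash. The sketch below assumes one full batch per polling cycle, which is a simplification for estimation only, not a guarantee from SynchDB:

```python
# Rough worst-case bound on change events re-processed after a crash,
# assuming one full batch is consumed per polling cycle (a simplification).

def max_reprocessed_events(flush_interval_ms: int, naptime_ms: int, batch_size: int) -> int:
    polls_between_flushes = flush_interval_ms // naptime_ms
    return polls_between_flushes * batch_size

# Defaults: flush every 60 s, poll every 100 ms, batches of up to 2048 events.
print(max_reprocessed_events(60000, 100, 2048))  # 600 polls * 2048 = 1228800
```

Under these assumptions, raising `naptime` to 1000 ms cuts the worst-case replay by a factor of ten, while lowering the flush interval has the same effect at the cost of more I/O.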
## Performance Considerations

- Adjust `synchdb.naptime` based on system load and latency requirements
- Increase `synchdb.dbz_batch_size` and `synchdb.dbz_queue_size` to raise processing throughput
- Adjust `synchdb.jvm_max_heap_size` based on workload:
    - Smaller number of tables (10k or fewer) + large amount of data per table: 512MB ~ 1024MB should suffice
    - Larger number of tables (100k or more) + moderate amount of data per table: consider increasing to 2048MB or above
- Set `synchdb.dbz_snapshot_fetch_size` to 0 to let Debezium pick an optimal fetch size
- Set `synchdb.dbz_snapshot_thread_num` to match the number of CPU cores
- Set `synchdb.dbz_snapshot_min_row_to_stream_results` to 0 to always use streaming mode and reduce memory usage
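The heap-sizing guidance above can be condensed into a small helper. The thresholds mirror the bullet points (10k and 100k tables); the value returned for counts in between is an interpolation of my own, so treat this as a starting point rather than an official SynchDB formula:

```python
# Illustrative starting point for synchdb.jvm_max_heap_size (in MB),
# following the table-count guidance in this section. Not part of SynchDB.

def suggested_jvm_heap_mb(num_tables: int) -> int:
    if num_tables <= 10_000:
        return 1024          # 512 MB ~ 1024 MB should suffice; use the upper end
    if num_tables < 100_000:
        return 2048          # between the two documented cases: assume 2048 MB
    return 4096              # 100k+ tables: 2048 MB or above; leave headroom

print(suggested_jvm_heap_mb(5_000))    # 1024
print(suggested_jvm_heap_mb(250_000))  # 4096
```

Whatever starting value is chosen, monitor actual JVM memory usage under load and adjust, as the amount of data per table also matters.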
## Common Use Cases

### High-Throughput Systems
```
synchdb.naptime = 100         # Faster polling for near-real-time updates
synchdb.dml_use_spi = false   # Standard DML for better performance
synchdb.dbz_batch_size = 4096
synchdb.dbz_queue_size = 8192
synchdb.jvm_max_heap_size = 2048
synchdb.dbz_snapshot_thread_num = 4
synchdb.dbz_snapshot_fetch_size = 0
synchdb.dbz_snapshot_min_row_to_stream_results = 0
```
### Resource-Constrained Systems
```
synchdb.naptime = 1000        # Reduced polling frequency
synchdb.dml_use_spi = false   # Minimize additional overhead
synchdb.dbz_batch_size = 1024
synchdb.dbz_queue_size = 2048
synchdb.jvm_max_heap_size = 512
synchdb.dbz_snapshot_thread_num = 1
synchdb.dbz_snapshot_fetch_size = 0
synchdb.dbz_snapshot_min_row_to_stream_results = 0
```
### Development/Testing
```
synchdb.naptime = 500         # Moderate polling frequency
synchdb.dml_use_spi = true    # Enable advanced features for testing
synchdb.dbz_batch_size = 2048
synchdb.dbz_queue_size = 4096
synchdb.jvm_max_heap_size = 1024
synchdb.dbz_snapshot_thread_num = 2
synchdb.dbz_snapshot_fetch_size = 0
synchdb.dbz_snapshot_min_row_to_stream_results = 0
```
## Troubleshooting
- High CPU Usage
    - Increase `synchdb.naptime`
    - Review DML operation patterns
    - Reduce `synchdb.dbz_batch_size` and `synchdb.dbz_queue_size`
    - Reduce `synchdb.dbz_snapshot_thread_num`
- Data Latency Issues
    - Decrease `synchdb.naptime`
    - Increase `synchdb.dbz_batch_size` and `synchdb.dbz_queue_size`
    - Check network connectivity
    - Increase `shared_buffers`
    - Split the workload across multiple connectors rather than just one
    - Start the connector in `no_data` mode to obtain the schema only and begin CDC immediately, rather than `initial` mode, which captures both the schema and initial data before CDC begins
- Startup Problems
    - Verify the `shared_preload_libraries` configuration
    - Check error messages from `synchdb_get_state()`
    - Check connector worker status
- Out of Memory Problems
    - Increase `synchdb.jvm_max_heap_size`
    - Increase `shared_buffers`
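The startup checks above can be run from psql. `SHOW shared_preload_libraries` is standard PostgreSQL; `synchdb_get_state()` is the state function mentioned in this section (its exact invocation and output columns may differ by SynchDB version, so consult the function reference):

```sql
-- Confirm SynchDB is preloaded (a restart is required after editing postgresql.conf)
SHOW shared_preload_libraries;

-- Inspect connector worker state and recent error messages
SELECT * FROM synchdb_get_state();
```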
## Best Practices
- Initial Setup
    - Start with default values
    - Monitor system performance
    - Adjust gradually based on requirements
- Production Environment
    - Document all configuration changes
    - Test changes in staging first
    - Maintain backups of working configurations
- Monitoring
    - Track system resource usage
    - Monitor data synchronization latency
    - Log configuration changes