Diagnostics

Good ZoneTree diagnostics start with the shape of the LSM tree. Most operational questions can be answered by watching how records move through the mutable segment, read-only segments, disk segment, bottom segments, WAL files, and caches.

Core Counters

Start with these maintenance counters:

CounterWhat It Tells You
MutableSegmentRecordCountrecords currently in the writable mutable segment
ReadOnlySegmentsCountnumber of frozen in-memory segments waiting for merge
ReadOnlySegmentsRecordCountrecords waiting in read-only in-memory segments
InMemoryRecordCountmutable plus read-only segment records
TotalRecordCounttotal segment records, including duplicated historical records across LSM layers
IsMergingnormal merge is running
IsBottomSegmentsMergingbottom segment merge is running
BottomSegments.Countnumber of bottom disk segments

TotalRecordCountis not the unique logical row count. LSM layers can contain older versions and deletion markers until merges remove them. Use scan-based counting when the exact live row count matters.

Maintenance Events

Maintenance events are the best low-cost way to build logs and metrics around segment movement:

  • OnMutableSegmentMovedForward
  • OnMergeOperationStarted
  • OnMergeOperationEnded
  • OnBottomSegmentsMergeOperationStarted
  • OnBottomSegmentsMergeOperationEnded
  • OnDiskSegmentCreated
  • OnDiskSegmentActivated
  • OnCanNotDropReadOnlySegment
  • OnCanNotDropDiskSegment
  • OnCanNotDropDiskSegmentCreator

Example:

zoneTree.Maintenance.OnMergeOperationEnded += (_, result) =>
{
    Console.WriteLine($"Merge ended: {result}");
};

zoneTree.Maintenance.OnMutableSegmentMovedForward += tree =>
{
    Console.WriteLine(
        $"read-only segments={tree.ReadOnlySegmentsCount}, " +
        $"in-memory records={tree.InMemoryRecordCount}");
};

Failed drop events do not mean the database is corrupt. They usually mean a file or segment is still referenced. ZoneTree can continue, and cleanup can happen later.

Write Pressure

Symptoms:

  • ReadOnlySegmentsCountkeeps growing,
  • ReadOnlySegmentsRecordCountkeeps growing,
  • memory grows with write volume,
  • merges run often or for a long time.

Check:

  • whether a maintainer is running,
  • MaximumReadOnlySegmentCount,
  • ThresholdForMergeOperationStart,
  • MutableSegmentMaxItemCount,
  • disk write throughput,
  • value size.

For large values, a defaultMutableSegmentMaxItemCountof1_000_000records can be too high. Lower it so the mutable segment moves forward before the process holds too much live value data.

See large values and write-heavy workloads.

Merge Pressure

Symptoms:

  • normal merge is almost always active,
  • bottom segments accumulate,
  • disk usage grows faster than live data,
  • foreground writes are fine but background work never catches up.

Check:

  • merge result events,
  • read-only segment count and record count,
  • disk segment max item count,
  • multipart minimum and maximum record count,
  • bottom segment count,
  • long-lived iterators that may delay segment disposal.

Multipart disk segments reduce rewrite work for localized merges, but they do not remove the cost of wide random overlap. If incoming read-only segments touch most of the keyspace, many parts can be affected.

See write amplification.

Read Path And Cache Behavior

For compressed disk segments, ZoneTree reads compressed blocks and keeps decompressed blocks in an internal block cache. The maintainer cleanup job releases inactive blocks based onBlockCacheLifeTime.

Symptoms:

  • repeated disk reads are slower than expected,
  • memory grows during read-heavy workloads,
  • full scans disturb random-read performance.

Check:

  • whetherCreateMaintainer()is used in long-lived applications,
  • BlockCacheLifeTime,
  • InactiveBlockCacheCleanupInterval,
  • disk segment compression block size,
  • iteratorcontributeToTheBlockCache.

One-off scans should usually avoid contributing to the block cache. Repeated scans over a useful working set can enable cache contribution intentionally.

See read path caching.

WAL And Recovery Signals

Track WAL size and recovery logs for persistent databases.

Symptoms:

  • startup recovery takes longer than expected,
  • WAL files grow,
  • recovery reports incomplete tails,
  • recovery reports checksum or deserialization failures.

Check:

  • WAL mode,
  • whether maintenance is merging read-only segments,
  • whether incremental backup is intentionally enabled,
  • serializer compatibility,
  • storage errors.

An incomplete tail after a process crash is different from checksum or deserialization failure. Valid records before the incomplete tail can still be used. Checksum and deserialization failures should be investigated as data-integrity issues.

See recovery and WAL modes.

Backup Signals

For live backup, watch:

  • generation duration,
  • failed generation logs,
  • file transfer duration,
  • record batch size,
  • local retention behavior if usingLocalLiveBackupProvider,
  • restore tests from real generations.

Live backup is generation based. A generation contains the disk segment files and optional in-memory record batch needed for that backup point. Restore should be tested before production traffic depends on it.

See backups.

Memory Diagnostics

OS process memory does not equal live ZoneTree data. .NET can retain memory for future allocations even after objects become collectible.

Use .NET diagnostics when memory matters:

  • live managed object size,
  • allocation rate,
  • large object heap usage,
  • GC pauses,
  • retained references,
  • long-lived iterators,
  • mutable segment size,
  • block cache lifetime.

The most common ZoneTree-side memory levers are:

  • MutableSegmentMaxItemCount,
  • value size,
  • maintainer cleanup,
  • block cache lifetime,
  • iterator lifetime.

Logging

Configure a logger when running in production:

using ZoneTree.Logger;

using var zoneTree = new ZoneTreeFactory<int, string>()
    .SetDataDirectory("data/app")
    .SetLogger(new ConsoleLogger(LogLevel.Info))
    .OpenOrCreate();

Capture at least:

  • failed merges,
  • failed drops,
  • WAL read errors,
  • recovery warnings,
  • backup failures,
  • unusually long maintenance operations.

Benchmark Records

When comparing results, record the full shape:

  • key and value type,
  • serializers,
  • comparer,
  • WAL mode,
  • disk segment mode,
  • compression settings,
  • mutable segment size,
  • multipart min/max record count,
  • disk segment max item count,
  • block cache lifetime,
  • storage hardware,
  • maintenance settings.

Without those details, benchmark numbers are difficult to explain or reproduce.