Large Values

Large values change the economics of a ZoneTree. Most defaults are expressed as record counts, but large values are governed by bytes, serialization cost, compression behavior, and merge IO.

This page is about values such as:

  • long strings,
  • JSON documents,
  • large byte arrays,
  • serialized object payloads,
  • mutable object graphs.

Start With The Value Model

Decide whether ZoneTree should store the whole value or a smaller reference to it.

Good fits for storing directly:

  • compact records,
  • small documents,
  • values that are read and updated as one unit,
  • values that compress well and remain operationally manageable.

Consider storing metadata or a content reference when values are very large:

  • content address,
  • file/object-store path,
  • blob id,
  • offset into an external payload store,
  • small searchable metadata plus external content.

ZoneTree is a storage engine. It can store large values, but the best product shape may be a compact ordered index in ZoneTree and large payloads elsewhere.
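
As a rough illustration of the "content reference" option, the sketch below derives a small reference value from a payload. The BlobRef type and the BlobRefs helper are hypothetical names introduced here, not part of ZoneTree, and the choice of SHA-256 as the content address is an assumption. Storing such a type as a ZoneTree value would also need a matching serializer, which is not shown.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical reference type: a small immutable value that could be stored
// in place of the payload itself. Not a ZoneTree type.
public readonly record struct BlobRef(string ContentAddress, long Length);

public static class BlobRefs
{
    // Derives a content address (lowercase hex SHA-256) from the payload bytes.
    // The payload itself would be written to an external store under that address.
    public static BlobRef FromPayload(byte[] payload)
    {
        byte[] hash = SHA256.HashData(payload);
        string address = Convert.ToHexString(hash).ToLowerInvariant();
        return new BlobRef(address, payload.Length);
    }
}
```

The reference stays a fixed, small size regardless of payload size, so record-count based defaults behave more like they do for compact records.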

Mutable Segment Size

The first large-value control is MutableSegmentMaxItemCount.

The default is 1_000_000 records. That can be reasonable for compact records, but 1_000_000 large strings or documents can use far too much memory. For example, at roughly 10 KB per value, a full mutable segment holds about 10 GB of payload.

Tune this by expected byte size, not only by record count.

using var zoneTree = new ZoneTreeFactory<int, string>()
    .SetDataDirectory("data/large-values")
    .SetMutableSegmentMaxItemCount(10_000)
    .OpenOrCreate();

Lower values move data toward read-only segments and disk sooner. That reduces active mutable memory, but can increase merge frequency. Keep a maintainer alive so read-only segments do not accumulate.
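
One way to tune by bytes is to start from a memory budget and an estimated average record size, then derive the record count. The helper below is a minimal sketch. SegmentSizing and its parameters are hypothetical names, and the average record size is an estimate you would have to measure for your own data.

```csharp
using System;

public static class SegmentSizing
{
    // Converts a memory budget in bytes into a record count, given an
    // estimated average in-memory size per record (key + value + overhead).
    public static int ItemCountForBudget(long budgetBytes, long averageRecordBytes)
    {
        if (budgetBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(budgetBytes));
        if (averageRecordBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(averageRecordBytes));

        long count = budgetBytes / averageRecordBytes;
        // Never return less than one record or more than an int can hold.
        return (int)Math.Clamp(count, 1, int.MaxValue);
    }
}
```

With a 256 MiB budget and records averaging 32 KiB, this yields 8192, which could then be passed to SetMutableSegmentMaxItemCount.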

Disk Part Size

Large values also change multipart disk segment tuning.

MinimumRecordCount and MaximumRecordCount are record counts. With large values, the same count can represent much more disk data and much more merge work.

For large values, use lower multipart part counts so each local rewrite unit stays reasonable.

using var zoneTree = new ZoneTreeFactory<int, string>()
    .SetDataDirectory("data/large-values")
    .ConfigureDiskSegmentOptions(options =>
    {
        options.MinimumRecordCount = 50_000;
        options.MaximumRecordCount = 150_000;
    })
    .OpenOrCreate();

This gives multipart merge a smaller horizontal unit. Clean old parts can still be carried forward, but changed ranges do not have to rewrite as much payload data.

See write amplification.

Disk Segment Max Item Count

DiskSegmentMaxItemCount controls when the active disk segment is sealed and moved to bottom segments.

The default is 20_000_000 records. For large values, that may represent a very large active disk segment.

Use a lower value when you want smaller operational boundaries:

using var zoneTree = new ZoneTreeFactory<int, string>()
    .SetDataDirectory("data/large-values")
    .SetDiskSegmentMaxItemCount(2_000_000)
    .OpenOrCreate();

Tune this together with multipart part counts. DiskSegmentMaxItemCount controls the vertical boundary. MinimumRecordCount and MaximumRecordCount control the horizontal part size inside that disk segment.
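
To see what a given record count means in bytes, it can help to estimate the size of one multipart part and of a full disk segment. The sketch below is a back-of-the-envelope helper. DiskSizing is a hypothetical name, and the average record size and compression ratio are inputs you would measure, not values ZoneTree reports.

```csharp
public static class DiskSizing
{
    // Estimated on-disk bytes for a record count, an average serialized record
    // size, and a compression ratio (compressed / uncompressed, 1.0 = none).
    public static long EstimateBytes(
        long recordCount, long averageRecordBytes, double compressionRatio = 1.0)
        => (long)(recordCount * averageRecordBytes * compressionRatio);
}
```

For 16 KiB records without compression, a part capped at 150_000 records is about 2.46 GB, and a disk segment capped at 2_000_000 records is about 32.8 GB. Halving the compression ratio halves both estimates.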

Compression

Large text values, JSON, and repetitive documents often compress well. Already-compressed payloads usually do not.

ZoneTree uses compression for both WAL and disk segment storage by default. Compression can reduce IO and storage size, but it adds CPU cost.

For disk segments, compression block size also affects read memory. A cached disk block is held decompressed. The default disk compression block size is 4 MB.

For large random reads, smaller disk compression blocks may reduce read amplification and block cache memory pressure. For sequential reads and compressible data, larger blocks may be better.

using var zoneTree = new ZoneTreeFactory<int, string>()
    .SetDataDirectory("data/large-values")
    .SetDiskSegmentCompressionBlockSize(1024 * 1024)
    .OpenOrCreate();

Benchmark with representative values.
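
A quick first check, before a full benchmark, is to measure how well a sample of real values compresses. The sketch below uses GZip from the .NET base library purely as a stand-in. It is an assumption that this approximates your configured ZoneTree compression method, so treat the ratio as a rough signal only. CompressionProbe is a hypothetical helper name.

```csharp
using System.IO;
using System.IO.Compression;

public static class CompressionProbe
{
    // Returns compressed size / original size for one sample value.
    // Values near or above 1.0 mean the payload is effectively incompressible.
    public static double Ratio(byte[] sample)
    {
        if (sample.Length == 0) return 1.0;

        using var output = new MemoryStream();
        // Dispose the GZip stream before reading the length so all data is flushed.
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
        {
            gzip.Write(sample, 0, sample.Length);
        }
        return (double)output.Length / sample.Length;
    }
}
```

Running this over a few hundred representative values shows whether compression is likely to pay for its CPU cost. Already-compressed payloads such as images or archives will typically report a ratio close to 1.0.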

See compression and read-path caching.

WAL Cost

Large values make WAL records larger. The default async compressed WAL is usually the right starting point because it keeps WAL protection enabled while compressing and writing in the background.

If WAL files are large or write latency is sensitive, test with real payloads and storage. Do not switch to No WAL for persistent data just to hide large-value cost. Use No WAL only for cache, temporary, or intentionally rebuildable data.

Value Mutability

Large values are often reference types. Treat stored values as immutable snapshots.

Do not insert a mutable object and then mutate it in place. If the object is still in an in-memory segment, the mutation can change the visible value without a new WAL record, without a new operation index, and without predictable recovery behavior.

Prefer:

  • immutable records or classes,
  • serialized payloads,
  • clone-and-upsert updates,
  • compact structs when the value is small enough.

See value mutability.
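
The clone-and-upsert pattern can be sketched with an immutable record. Document and DocumentUpdates are hypothetical types used only for illustration, and a real ZoneTree value of this shape would also need a serializer.

```csharp
using System.Collections.Immutable;

// Hypothetical immutable value type: no setters, and the tag list cannot be
// changed in place after construction.
public sealed record Document(string Title, ImmutableArray<string> Tags);

public static class DocumentUpdates
{
    // Builds a new snapshot instead of mutating the stored instance.
    public static Document AddTag(Document current, string tag)
        => current with { Tags = current.Tags.Add(tag) };
}
```

The caller reads the current value, builds the updated snapshot, and writes it back with an upsert, so the change goes through the normal write path rather than altering an object that an in-memory segment still references.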

Reads

Large values make point reads and range scans more expensive.

For repeated nearby reads, block cache behavior matters. Keep a maintainer alive and tune BlockCacheLifeTime if repeated reads benefit from cached decompressed blocks.

For one-off scans, keep iterator block cache contribution disabled so the scan does not fill the shared block cache with data that will not be read again.

using var iterator = zoneTree.CreateIterator(
    contributeToTheBlockCache: false);

Use contributeToTheBlockCache: true only when the scan represents a useful working set.

Symptom Guide

| Symptom | Likely pressure | First actions |
|---|---|---|
| Memory grows quickly during inserts | too many large values in mutable/read-only memory | lower MutableSegmentMaxItemCount; keep maintenance active |
| Read-only segments accumulate | large values make merge slower than writes | lower mutable segment size carefully; inspect merge duration and storage throughput |
| Merges are slow | large payloads increase merge IO and serialization cost | reduce value size; lower multipart part counts; tune compression |
| Active disk segment becomes too large | DiskSegmentMaxItemCount is too high for payload size | lower DiskSegmentMaxItemCount |
| WAL files are large | values are large or poorly compressible | test compressed WAL with real payloads; reduce payload size if possible |
| Random reads are expensive | large compressed blocks or large serialized values | tune disk compression block size; split metadata from payload |
| Process memory stays high after reads | decompressed block cache retains large blocks | shorten BlockCacheLifeTime; check iterator behavior |

Practical Model

For large values, tune by bytes and access pattern:

  • lower MutableSegmentMaxItemCount,
  • lower multipart MinimumRecordCount and MaximumRecordCount,
  • consider lower DiskSegmentMaxItemCount,
  • benchmark compression with real values,
  • tune disk compression block size for read behavior,
  • keep block cache lifetime aligned with memory budget,
  • store references instead of payloads when values are too large for the desired operational shape.