Inside GBase 8a’s DataCell: How 65,536-Row Blocks Shape Loading, Query, and Compression

DEV Community·Michael·21 days ago

GBase 8a’s columnar engine uses DataCells (DCs) as its fundamental I/O unit. Each DC holds exactly 65,536 rows, and the last block of a DC remains uncompressed by design. This architecture has profound and largely positive effects on data loading, query performance, and compression, a deliberate trade‑off tailored for analytical workloads.

Impact on Data Loading

- Ultra‑fast bulk writes: Organising data into 65,536‑row DCs turns scattered inserts into massive sequential writes, drastically cutting disk seeks. Documented load speeds exceed 30 TB/hour.
- Append‑only tail: New data always lands in the uncompressed tail of the current DC without touching existing DCs, making insertion extremely lightweight.
- Trade‑off: Data that doesn’t fill a full DC stays uncompressed and misses out on bulk compression and optimal I/O until a full DC is accumulated.

Impact on Query Performance

- Column‑level I/O: Only the columns referenced in a query trigger I/O; untouched columns are never read.…
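To make the 65,536-row block size concrete, here is a minimal Python sketch of the arithmetic it implies: which DC a given row falls into, and how many rows of a load end up in complete DCs versus the partially filled, uncompressed tail. The function names `locate_row` and `dc_layout` are hypothetical and chosen for illustration only. They are not part of any GBase 8a API, and the sketch assumes rows are assigned to DCs strictly in insertion order.

```python
DC_ROWS = 65_536  # rows per DataCell, as stated in the article


def locate_row(row_index: int) -> tuple[int, int]:
    """Return (dc_number, offset_within_dc) for a zero-based row index.

    Hypothetical helper: assumes rows fill DCs sequentially.
    """
    if row_index < 0:
        raise ValueError("row_index must be non-negative")
    return divmod(row_index, DC_ROWS)


def dc_layout(total_rows: int) -> tuple[int, int]:
    """Return (full_dcs, tail_rows) for a column holding total_rows rows.

    tail_rows is the number of rows left in the partially filled DC,
    which the article describes as staying uncompressed.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    return divmod(total_rows, DC_ROWS)


if __name__ == "__main__":
    print(locate_row(100_000))  # (1, 34464)
    print(dc_layout(200_000))   # (3, 3392)
```

Under these assumptions, a 200,000-row load yields three full DCs and 3,392 rows in the tail, which is the portion that misses out on bulk compression until further rows arrive.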
