Open a text editor, type a character, and it appears on screen. That single keystroke triggers a surprisingly deep question: how does the editor represent your document in memory so that insertions, deletions, and cursor movements all feel instant, even on a file with millions of lines?
The naive answer, a flat string or array of characters, falls apart fast. Insert a character at position 0 of a 10MB file and you are copying 10MB of data to make room. Real editors cannot afford this. The data structure behind the text buffer is one of the most consequential architectural decisions an editor makes, and different editors have made radically different choices.
The Problem
Text editing looks simple but has a hostile access pattern for data structures. The operations are:
- Insert a character or string at an arbitrary position
- Delete a character or range at an arbitrary position
- Read a range of text (for rendering, search, syntax highlighting)
- Navigate to a line number or byte offset quickly
The challenge is that these operations happen at random positions throughout the document, often thousands of times per second (think: holding down backspace, or a find-and-replace across a large file). The user expects every operation to feel instantaneous.
A flat array handles reads beautifully, O(1) random access, but insertions and deletions in the middle are O(n) because everything after the edit point must shift. For a 100-line config file this is invisible. For a 500,000-line log file, it is a visible stutter on every keystroke.
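To make the cost concrete, here is a quick sketch using Python lists as a stand-in for a flat character buffer. It times a front-of-buffer insert at two document sizes; the absolute numbers depend on your machine, but the larger buffer should be slower by roughly the size ratio, because every element after the insertion point shifts.

```python
import timeit

# Two "documents" as flat character buffers.
small = list("x" * 1_000)
large = list("x" * 1_000_000)

# Inserting at position 0 shifts every later element: O(n) per insert.
t_small = timeit.timeit(lambda: small.insert(0, "a"), number=100)
t_large = timeit.timeit(lambda: large.insert(0, "a"), number=100)

print(f"1K chars:  {t_small:.5f}s for 100 inserts")
print(f"1M chars:  {t_large:.5f}s for 100 inserts")
```

Appending to the end, by contrast, is amortized O(1), which is exactly the asymmetry the data structures below are designed to eliminate.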
This is the core tension: data structures that are great for sequential reads tend to be terrible for random inserts, and vice versa. Every text editor buffer is a different answer to this trade-off.
Prerequisites
- Familiarity with basic data structures: arrays, linked lists, balanced binary trees
- Understanding of big-O complexity notation
- General awareness of how text editors render documents (viewport, cursor, selections)
- For the CRDT section: a rough idea of what eventual consistency means in distributed systems
Technical Decisions
The Four Contenders
The history of text editor buffers is essentially four major data structures, each born from a different era and set of constraints:
| Data Structure | Notable Users | Era |
|---|---|---|
| Gap Buffer | Emacs, Scintilla | 1970s-present |
| Piece Table | VS Code, AbiWord, original Word for Windows | 1980s-present |
| Rope | Xi Editor, Crop, some game engines | 1990s-present |
| CRDT (Yjs, Automerge, etc.) | Google Docs (OT variant), Zed, various collaborative editors | 2010s-present |
Each one made a bet about what matters most: simplicity, memory efficiency, worst-case latency, or multi-user concurrency. None of them won universally.
Why Not Just Use a Linked List of Lines?
Early editors like ed and vi used arrays of lines, where each line was a separate string. This works reasonably well for line-oriented editing, but it has two fatal problems for modern use. First, operations that span lines (multi-cursor edits, block selections, large pastes) become complicated because you are working across array boundaries. Second, very long lines (minified JavaScript, for example) degenerate to the flat array problem within a single line. Most modern editors operate on a flat character sequence internally and derive line information separately through a line index.
Why Undo is a Data Structure Concern
The choice of buffer data structure deeply affects how undo/redo works. A gap buffer must snapshot or diff the content to support undo, because edits modify the buffer in place. A piece table, by contrast, is append-only: the original file content is never mutated, and every edit just adds a new piece descriptor. This means you can implement undo by simply removing the most recent piece table entries, essentially rewinding the edit history. This difference alone was a major reason the VS Code team chose piece tables.
Implementation
The Gap Buffer: Emacs's Workhorse
The gap buffer is the oldest trick in the editor book and possibly the most elegant for its simplicity. The idea: store the document in a single contiguous array, but maintain an empty "gap" at the cursor position. When the user types, characters fill the gap. When the gap runs out, you grow the array (typically doubling, amortized O(1) for sequential inserts). When the cursor moves, you shift the gap to the new position by copying text across it.
The memory layout looks like this for the text "Hello World" with the cursor after "Hello":
```
[ H | e | l | l | o | _ | _ | _ | _ | _ | W | o | r | l | d ]
                      ^                   ^
                  gap start            gap end
```
Typing a space fills one gap slot. Moving the cursor to the end of "World" means copying "World" from after the gap to before it, then the gap sits at the end.
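The mechanics fit in a small class. This is a minimal sketch (class and method names are mine; real implementations like Emacs's work on raw byte arrays, handle encodings, and track marker positions across gap moves):

```python
class GapBuffer:
    """Minimal gap buffer: one array with an empty gap at the cursor."""

    def __init__(self, text="", gap_size=8):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # gap initially sits at the end
        self.gap_end = len(self.buf)

    def move_gap(self, pos):
        """Shift the gap to start at `pos`, copying the text in between."""
        if pos < self.gap_start:        # copy chars from before the gap to after it
            n = self.gap_start - pos
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start, self.gap_end = pos, self.gap_end - n
        elif pos > self.gap_start:      # copy chars from after the gap to before it
            n = pos - self.gap_start
            self.buf[self.gap_start:self.gap_start + n] = \
                self.buf[self.gap_end:self.gap_end + n]
            self.gap_start, self.gap_end = pos, self.gap_end + n

    def insert(self, pos, text):
        self.move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            # Grow by doubling. (List splicing is itself O(n) in Python; a real
            # implementation reallocates the raw array.)
            extra = max(len(text), len(self.buf))
            self.buf[self.gap_end:self.gap_end] = [None] * extra
            self.gap_end += extra
        for ch in text:                 # typing fills the gap, one slot per char
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, n):
        """Delete n characters at pos by simply widening the gap over them."""
        self.move_gap(pos)
        self.gap_end += n

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Note that `delete` does no copying at all: widening the gap logically erases the characters, which is why backspace at the cursor is O(1).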
Where it shines: The gap buffer is fast when edits are localized. Most human editing happens near the cursor: you type a line, maybe backspace a few characters, type some more. For this access pattern, the gap buffer is extremely fast, often just a single array write per keystroke with no allocations.
Where it breaks down: Moving the cursor a large distance and then editing requires shifting the gap, which copies data proportional to the distance moved. On a 50MB file, jumping from the top to the bottom and inserting a character copies 50MB. This is why Emacs can sometimes pause noticeably on very large files when you jump around. Multi-cursor editing is also painful because you either need multiple gaps (complicated) or you are constantly shifting a single gap between cursor locations.
The gap buffer also has poor cache behavior when the gap is large. Modern CPUs love sequential memory access, and the gap creates a discontinuity that the prefetcher cannot bridge.
Complexity summary:
| Operation | Cost |
|---|---|
| Insert at cursor | O(1) amortized |
| Insert at arbitrary position | O(n) to move gap |
| Delete at cursor | O(1) |
| Read across gap | O(1) but two-part copy |
| Line index lookup | Requires auxiliary structure |
Emacs has used a gap buffer since the late 1970s. Despite its limitations, it has survived because most editing really is local, and the simplicity means fewer bugs and easier maintenance than fancier structures.
The Piece Table: VS Code's Append-Only Log
The piece table was described by J Strother Moore in 1981, but it reached mainstream attention when the VS Code team published their analysis of why they chose it. The core idea is deceptively simple: never modify the original file content. Instead, maintain two buffers:
- The original buffer: the file as it was read from disk, immutable
- The add buffer: an append-only buffer where all new text goes
The document is described by a table of "pieces," each of which points to a span in either the original buffer or the add buffer:
```
Original buffer: "Hello World"
Add buffer:      " Beautiful"

Piece table (buffer, start, length):
  (original, 0,  5)  → "Hello"
  (add,      0, 10)  → " Beautiful"
  (original, 5,  6)  → " World"

Logical document: "Hello Beautiful World"
```
Inserting text means: append the new text to the add buffer, then split the piece that contains the insertion point into two pieces and insert a new piece between them pointing to the appended text. Deletion means adjusting piece boundaries (or removing pieces entirely). The original buffer and previously appended text are never touched.
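A minimal sketch of the insert path (names are mine, and this linear scan over the piece list is the part VS Code replaces with a balanced tree; deletion, which only trims piece boundaries, is omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class Piece:
    source: str   # "orig" or "add"
    start: int
    length: int

class PieceTable:
    """Minimal piece table: the original text is never mutated; edits append
    to the add buffer and splice piece descriptors."""

    def __init__(self, text):
        self.orig = text
        self.add = ""
        self.pieces = [Piece("orig", 0, len(text))] if text else []

    def insert(self, pos, text):
        # Step 1: append the new text to the add buffer (append-only).
        new = Piece("add", len(self.add), len(text))
        self.add += text
        # Step 2: find the piece containing pos, split it, splice `new` between.
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                cut = pos - offset
                left = Piece(p.source, p.start, cut)
                right = Piece(p.source, p.start + cut, p.length - cut)
                self.pieces[i:i + 1] = [x for x in (left, new, right) if x.length > 0]
                return
            offset += p.length
        self.pieces.append(new)   # insertion at the very end of the document

    def text(self):
        bufs = {"orig": self.orig, "add": self.add}
        return "".join(bufs[p.source][p.start:p.start + p.length]
                       for p in self.pieces)
```

After any number of edits, `self.orig` is byte-for-byte the file as loaded, which is what makes the undo and external-modification tricks described below possible.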
Where it shines: Memory efficiency is outstanding for typical editing sessions. Opening a 10MB file uses ~10MB for the original buffer (which can be memory-mapped directly from disk), and the add buffer only grows by the amount of text you actually type, often a few KB. The piece table itself is a small list of descriptors.
Undo is almost free: since the original buffer and add buffer are never modified, you can undo by reverting piece table entries. VS Code exploits this heavily.
The piece table also handles large file operations well. Deleting a 1MB block is just adjusting a few piece boundaries, not moving any text. Copy-paste of a large block within the same file can reference the same underlying buffer spans.
Where it breaks down: Reading text is no longer a simple array index. To read a range, you must walk the piece table to find which pieces contain the range, then concatenate slices from potentially different buffers. For syntax highlighting and rendering, this means the editor must materialize text into a contiguous buffer for the renderer, or the renderer must understand the piece table abstraction.
Sequential character-by-character reads (like a regex engine scanning the file) pay overhead per piece boundary. If the piece table has thousands of entries after heavy editing, this cost adds up. VS Code mitigates this by storing the piece table in a balanced binary tree (a red-black tree) indexed by both offset and line number, giving O(log n) access to any position where n is the number of pieces.
Complexity summary (VS Code's tree-based implementation):
| Operation | Cost |
|---|---|
| Insert | O(log n) where n = piece count |
| Delete | O(log n) |
| Read at offset | O(log n) to find piece, then O(1) within piece |
| Line number lookup | O(log n) via augmented tree |
| Memory overhead | Original file + typed text + piece descriptors |
One subtle advantage: because the original file buffer is immutable, VS Code can detect external file modifications by comparing the on-disk content to the original buffer. If they match, the piece table is still valid. If not, the file was modified externally.
The Rope: Xi Editor's Balanced Tree of Strings
The rope data structure, formalized by Boehm, Atkinson, and Plass in their 1995 paper "Ropes: an Alternative to Strings," takes a different approach entirely. Instead of one buffer with clever indexing, a rope breaks the text into chunks stored at the leaves of a balanced binary tree. Internal nodes store metadata: the total length of their left subtree (and often line counts, Unicode code point counts, or other metrics).
A rope representing "Hello Beautiful World" might look like:
```
              [21]
             /    \
          [5]      [16]
           |      /    \
      "Hello"  [10]    [6]
                 |       |
       " Beautiful"  " World"
```
Each leaf holds a chunk of text (typically 64 to 1024 bytes). Internal nodes cache aggregate information. To find the character at position 7, you walk from the root: the left subtree has 5 characters, so position 7 is at offset 2 in the right subtree. Walk right: the left child has 10 characters, so offset 2 is in that leaf. The character is 'e' (the 'e' in "Beautiful"). This walk is O(log n) where n is the number of leaves.
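That walk is only a few lines of code. Here is a minimal sketch assuming a two-node-type tree (my own `Leaf`/`Node` names) with cached left-subtree lengths and no rebalancing; a production rope also enforces chunk size bounds and rebalances on edit:

```python
class Leaf:
    """A chunk of text at the bottom of the tree."""
    def __init__(self, text):
        self.text = text
        self.length = len(text)

class Node:
    """Internal node caching the length of its left subtree (its 'weight')."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.weight = left.length
        self.length = left.length + right.length

def char_at(rope, i):
    """Descend from the root, going left or right by comparing i
    against each node's cached left-subtree length."""
    while isinstance(rope, Node):
        if i < rope.weight:
            rope = rope.left
        else:
            i -= rope.weight
            rope = rope.right
    return rope.text[i]

def concat(a, b):
    # O(1) if rebalancing is deferred: just allocate a new root.
    return Node(a, b)

# The example tree from above: "Hello" + (" Beautiful" + " World")
rope = concat(Leaf("Hello"), concat(Leaf(" Beautiful"), Leaf(" World")))
assert char_at(rope, 7) == "e"   # the 'e' in "Beautiful"
```

Note that neither original rope is touched by `concat`: both become shared subtrees of the new root, which is exactly the structural sharing the persistence and undo discussion below relies on.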
Where it shines: Ropes have excellent worst-case behavior. Every operation, insert, delete, concatenation, split, is O(log n) regardless of where in the document it happens. There is no gap to move, no piece table to fragment. This makes ropes predictable, which matters for real-time editors that need consistent frame times.
Concatenation is particularly cheap: to join two ropes, create a new root node with the two ropes as children. This is O(1) if you defer rebalancing (or O(log n) if you rebalance immediately). This makes operations like "paste a 100MB chunk" essentially instant.
Ropes also compose well with functional programming patterns. Since nodes are immutable (you create new nodes for edits rather than modifying in place), you get persistent data structures for free. Xi editor used this for its undo system: each edit creates a new rope that shares most of its nodes with the previous version. The memory overhead is proportional to the edit, not the document size.
The chunk-based structure also maps well to parallel processing. Syntax highlighting, word counting, and search can operate on chunks independently and combine results.
Where it breaks down: Ropes have higher constant factors than gap buffers and piece tables for small documents. Each node is a heap allocation (or arena allocation), and tree traversal involves pointer chasing, which is hostile to CPU caches. For a 200-line file, a gap buffer will outperform a rope on every operation simply because the gap buffer is one contiguous allocation.
The implementation complexity is also significantly higher. Balancing the tree, managing chunk sizes (too small and you have overhead, too large and you lose the benefits), and maintaining augmented metadata through rotations requires careful engineering. Xi editor's rope implementation in Rust is roughly 2,000 lines of non-trivial code. Emacs's gap buffer logic is a fraction of that.
Reading a contiguous range of text requires visiting multiple leaves and copying their contents. For a renderer that needs a screen's worth of text (say, 80 columns by 50 rows = 4,000 characters), this might touch 4 to 60 leaves depending on chunk size and edit history, with a memory copy for each.
Complexity summary:
| Operation | Cost |
|---|---|
| Insert | O(log n) |
| Delete | O(log n) |
| Concatenate two ropes | O(log n) with rebalance |
| Split at position | O(log n) |
| Read at offset | O(log n) to find leaf |
| Index by line number | O(log n) with augmented nodes |
Xi editor, which was developed at Google as an experimental high-performance editor, chose ropes precisely because of this predictable worst-case behavior. The project also used ropes as the wire format between the front-end and back-end processes, serializing rope diffs rather than full text snapshots. The project was wound down in 2020, but its rope library (xi-rope) influenced several subsequent editors.
CRDTs: When Multiple People Edit the Same Document
Everything above assumes a single user. The moment two users edit the same document simultaneously, the problem changes fundamentally. It is no longer enough to have an efficient buffer. You need the buffer to support concurrent, potentially conflicting edits and converge to the same state on all replicas without central coordination.
This is the domain of Conflict-free Replicated Data Types (CRDTs) and their predecessor, Operational Transformation (OT).
Operational Transformation was the first approach, used by Google Docs and earlier collaborative editors. OT works by transforming operations against each other: if user A inserts at position 5 and user B deletes at position 3, then by the time A's operation reaches B's replica, position 5 is now position 4 (because B's deletion shifted everything). OT defines transformation functions that adjust operations to account for concurrent edits.
OT works, Google Docs proves it at scale, but it has a painful property: the transformation functions must be correct for every possible pair of concurrent operations, and proving correctness is notoriously hard. The original OT paper had bugs. Many subsequent papers also had bugs. OT also typically requires a central server to determine the total ordering of operations, which limits architectural flexibility.
CRDTs take a different approach. Instead of transforming operations after the fact, CRDTs design the data structure itself so that concurrent operations commute: applying them in any order produces the same result. For text editing, this usually means assigning a globally unique, ordered identifier to every character.
In a sequence CRDT like Yjs or Automerge, each character gets an ID that encodes both its position in the sequence and which replica created it. These IDs are designed so that the intended ordering can always be reconstructed, regardless of the order in which replicas receive operations.
For example, if user A types "Hello" and user B concurrently types "World" after the same anchor point, the CRDT's ID scheme ensures a deterministic merge: maybe "HelloWorld" or "WorldHello" depending on the tiebreaking rule, but always the same result on every replica.
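A heavily simplified sketch in the spirit of RGA can show the mechanism. This is a toy, not a production CRDT: it assumes causal delivery, skips tombstones, and uses a naive O(n) integration scan; all names are mine. The point is just that unique `(counter, replica)` IDs plus a deterministic tiebreak make concurrent inserts commute:

```python
class RGAText:
    """Toy sequence CRDT. Every character carries a globally unique, totally
    ordered ID (counter, replica); an insert names the ID of the character it
    should follow (None = document start)."""

    def __init__(self, replica):
        self.replica = replica
        self.counter = 0
        self.chars = []   # (id, char) pairs in document order

    def local_insert(self, after_id, ch):
        self.counter += 1
        op = ((self.counter, self.replica), after_id, ch)
        self.apply(op)
        return op         # in a real system this op is broadcast to peers

    def apply(self, op):
        new_id, after_id, ch = op
        i = 0
        if after_id is not None:   # locate the anchor character
            i = next(k for k, (cid, _) in enumerate(self.chars)
                     if cid == after_id) + 1
        # Deterministic tiebreak: skip concurrent inserts with a greater ID,
        # so every replica orders same-anchor siblings identically.
        while i < len(self.chars) and self.chars[i][0] > new_id:
            i += 1
        self.chars.insert(i, (new_id, ch))

    def text(self):
        return "".join(ch for _, ch in self.chars)
```

Replaying both users' operations in either order yields the same document on both replicas, which is the commutativity property that real CRDTs like Yjs and Automerge engineer much more carefully and compactly.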
Where CRDTs shine: No central server required. Replicas can work offline, sync later, and converge automatically. This enables true peer-to-peer collaboration and offline-first editing. Zed, the collaborative code editor built in Rust, uses a CRDT for its buffer. So does the Ink & Switch research lab's suite of local-first applications.
CRDTs also compose well: you can have a CRDT for the text content, a separate CRDT for cursor positions, another for comments or annotations, and they all merge independently.
Where CRDTs break down: Memory overhead. Every character that has ever existed in the document (including deleted ones, which must be retained as "tombstones" for convergence) needs a unique ID. For a document with heavy editing, the metadata can exceed the text content by 2-10x. Yjs is remarkably efficient here, compressing runs of sequential inserts by the same user, but the overhead is still real.
Performance of the merge operation can also be surprising. While single-character inserts are typically O(log n), merging two replicas that have diverged significantly can be expensive as the CRDT must integrate many concurrent operations and resolve their ordering.
The "intention preservation" problem is also fundamental. When user A selects a word and bolds it while user B deletes that word, what should happen? CRDTs guarantee convergence (all replicas agree) but not necessarily that the result matches anyone's intention. These semantic conflicts still require application-level resolution.
Complexity summary:
| Operation | Cost | Notes |
|---|---|---|
| Local insert | O(log n) typical | Depends on CRDT implementation |
| Local delete | O(log n) | Tombstone created, not actually removed |
| Merge with remote | O(k log n) | k = number of remote operations |
| Memory per character | ID + tombstone flag | 2-10x overhead vs plain text |
| Convergence | Guaranteed | By mathematical construction |
How It All Fits Together
The choice of buffer data structure ripples through the entire editor architecture:
Rendering pipeline: A gap buffer can hand the renderer a near-contiguous block of memory (just skip the gap). A piece table requires materializing text from multiple pieces. A rope requires walking leaves. A CRDT must filter tombstones. The renderer's complexity is inversely proportional to the buffer's simplicity.
Syntax highlighting: Modern editors use incremental parsing (often via tree-sitter) which needs to efficiently re-parse only the changed region. Ropes and piece tables naturally track edit boundaries, making incremental parse tree updates easier. Gap buffers require the editor to separately track what changed.
Undo/redo: Piece tables and ropes (when used persistently) get undo nearly for free because they preserve history structurally. Gap buffers require an explicit undo stack with either snapshots or inverse operations. CRDTs can treat undo as a new operation (insert what was deleted, delete what was inserted) that propagates to all replicas, though this interacts complexly with concurrent edits.
Memory mapping: Piece tables are uniquely suited to memory-mapped file I/O. The original buffer can be an mmap'd view of the file, meaning the OS handles paging and the editor uses no additional memory for unchanged content. Gap buffers cannot be mmap'd because they mutate the buffer in place. Ropes could theoretically mmap leaf nodes, but the chunk structure rarely aligns with file layout.
Large file performance: For files over 100MB, piece tables and ropes maintain consistent performance because they avoid O(n) copies. Gap buffers become impractical unless the user only edits near the cursor. CRDTs are generally not optimized for large-file single-user editing.
The tradeoff space in summary:
| Dimension | Gap Buffer | Piece Table | Rope | CRDT |
|---|---|---|---|---|
| Local edit speed | Excellent (at cursor) | Good | Good | Good |
| Worst-case edit | Poor (O(n) gap move) | Good (O(log n)) | Good (O(log n)) | Varies |
| Memory efficiency | Very good | Excellent | Good | Poor |
| Undo complexity | High (explicit stack) | Low (structural) | Low (persistent) | Medium (operational) |
| Implementation complexity | Low | Medium | High | Very high |
| Multi-cursor support | Poor | Good | Good | Native |
| Collaborative editing | Not supported | Not supported | Possible but complex | Native |
| Cache friendliness | Good (near cursor) | Moderate | Poor (pointer chasing) | Poor |
Lessons Learned
Locality of edits is the key insight. The gap buffer survives because it exploits a deep truth about human editing: most edits happen near the cursor. Data structures that optimize for the common case rather than the worst case often win in practice. Emacs has been fast enough for 45 years on a data structure with O(n) worst-case behavior.
Immutability is a superpower for editors. Both piece tables and persistent ropes demonstrate that never mutating existing content simplifies undo, crash recovery, and change detection. VS Code's piece table literally keeps the original file intact in memory. If the editor crashes, the original file is untouched on disk.
CRDTs change the question, not just the answer. Moving from single-user to collaborative editing is not about finding a faster data structure. It is about accepting fundamentally different constraints: you must handle concurrent operations, you must preserve causality, and you must accept that "correct" sometimes means "deterministically resolved but not what either user intended."
There is no universal winner. The Zed editor uses CRDTs even for single-user editing because they built for collaboration from the start. Emacs uses a gap buffer because its extension system and editing model assume contiguous memory. VS Code uses a piece table because it handles large files and undo elegantly. Xi used ropes because it wanted predictable latency. Each choice reflects the editor's values and constraints, not a ranking of data structures.
The markdown-specific problem is actually the rendering problem. Markdown editors like Obsidian, Typora, and Notable do not typically use exotic buffer data structures. Their technical challenge is live preview: parsing markdown to an AST, rendering it, and keeping the rendered view synchronized with edits. Tree-sitter's incremental parsing makes this tractable by re-parsing only the changed subtree, but the complexity lives in the rendering pipeline, not the buffer.
What's Next
This post covered the buffer, the core data structure that stores text. But a modern editor has at least three other data-structure-heavy subsystems worth exploring:
- The line index: How do you go from byte offset to line:column and back? Augmented balanced trees, Fenwick trees, and cached newline arrays all appear in the wild.
- The syntax tree: Tree-sitter builds an incremental concrete syntax tree that survives edits. How it does incremental re-parsing is its own deep topic.
- The selection model: Multiple cursors, rectangular selections, and folded regions all need their own data structures that compose with the buffer.
Each of these interacts with the buffer choice and inherits its trade-offs.
References
- Ropes: An Alternative to Strings (Boehm, Atkinson, Plass, 1995)
- Text Buffer Reimplementation (VS Code blog, 2018)
- Data Structures for Text Sequences (Charles Crowley, 1998)
- Xi Editor Rope Science (Raph Levien)
- Yjs: A CRDT Framework for Shared Editing
- Automerge: A JSON-like CRDT
- Tree-sitter: Incremental Parsing
- Zed Editor Architecture
- The Gap Buffer Data Structure (Emacs Internals)