<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://jtsylve.blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jtsylve.blog/" rel="alternate" type="text/html" /><updated>2026-06-09T05:00:36+00:00</updated><id>https://jtsylve.blog/feed.xml</id><title type="html">Joe T. Sylve, Ph.D.</title><subtitle>Digital Forensic Researcher and Educator</subtitle><entry><title type="html">Clonegroups</title><link href="https://jtsylve.blog/post/2026/06/09/APFS-Clonegroups" rel="alternate" type="text/html" title="Clonegroups" /><published>2026-06-09T00:00:00+00:00</published><updated>2026-06-09T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/09/APFS%20Clonegroups</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/09/APFS-Clonegroups"><![CDATA[<p>In our <a href="/post/2022/12/19/APFS-Data-Streams">post on Data Streams</a>, we discussed how APFS implements file cloning through shared extents and reference counting. While <code class="language-plaintext highlighter-rouge">j_phys_ext_val_t</code> reference counts and <code class="language-plaintext highlighter-rouge">j_dstream_id_val_t</code> track sharing at the extent level, APFS also maintains a higher-level grouping mechanism called <em>clonegroups</em> that tracks which inodes share physical data. This post covers the clonegroup tree and its role in managing cloned files.</p>

<h2 id="overview">Overview</h2>

<p>The <em>clonegroup tree</em> tracks groups of files that share physical data extents through cloning (e.g., <code class="language-plaintext highlighter-rouge">cp -c</code> or the <code class="language-plaintext highlighter-rouge">clonefile</code> syscall). It is a <a href="/post/2022/12/08/APFS-BTrees">B-Tree</a> with subtype <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_CLONEGROUP_TREE</code>, referenced by the <code class="language-plaintext highlighter-rouge">apfs_clonegroup_tree_oid</code> field in the <a href="/post/2022/12/13/APFS-Volume-Superblock">Volume Superblock</a>.</p>

<p>Within each clone group, exactly one inode is designated the <em>full clone</em>: it owns the physical data extents shared by the group. All other members are <em>partial clones</em> that reference the full clone’s extents via copy-on-write. When an inode has the <code class="language-plaintext highlighter-rouge">INO_EXT_TYPE_CLONEGROUP_ID</code> (type 21) extended field set, it belongs to the clone group identified by that field’s value.</p>

<h2 id="record-types">Record Types</h2>

<p>The clonegroup tree contains two types of records, distinguished by a <code class="language-plaintext highlighter-rouge">record_type</code> field in the key:</p>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Type</th>
      <th>Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Mapping</td>
      <td>Maps an inode to a clone group. One record per member inode.</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Cookie</td>
      <td>Inserted when only one member remains, signaling the group can be cleaned up.</td>
    </tr>
  </tbody>
</table>

<h2 id="on-disk-structures">On-Disk Structures</h2>

<h3 id="mapping-records-record_type--1">Mapping Records (record_type = 1)</h3>

<p>Mapping records track which inodes belong to a clone group.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">clonegroup_mapping_key</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">group_id</span><span class="p">;</span>     <span class="c1">// 0x00</span>
    <span class="kt">uint8_t</span> <span class="n">record_type</span><span class="p">;</span>   <span class="c1">// 0x08 (always 1)</span>
    <span class="kt">uint64_t</span> <span class="n">inode_id</span><span class="p">;</span>     <span class="c1">// 0x09</span>
    <span class="kt">uint64_t</span> <span class="n">private_id</span><span class="p">;</span>   <span class="c1">// 0x11</span>
<span class="p">}</span> <span class="n">clonegroup_mapping_key_t</span><span class="p">;</span> <span class="c1">// 0x19 (25 bytes, packed)</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">group_id</code>: The clone group identifier</li>
  <li><code class="language-plaintext highlighter-rouge">record_type</code>: Always 1 for mapping records</li>
  <li><code class="language-plaintext highlighter-rouge">inode_id</code>: The inode number of the group member</li>
  <li><code class="language-plaintext highlighter-rouge">private_id</code>: The inode’s data stream identifier (<code class="language-plaintext highlighter-rouge">private_id</code> from <code class="language-plaintext highlighter-rouge">j_inode_val_t</code>)</li>
</ul>

<p>Keys are sorted by <code class="language-plaintext highlighter-rouge">group_id</code>, then <code class="language-plaintext highlighter-rouge">record_type</code>, then <code class="language-plaintext highlighter-rouge">inode_id</code>, then <code class="language-plaintext highlighter-rouge">private_id</code>.</p>
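<p>The sketch below decodes a raw 25-byte mapping key and compares two keys in this sort order. It assumes the key fields are little-endian, as elsewhere in APFS. The names <code class="language-plaintext highlighter-rouge">MappingKey</code>, <code class="language-plaintext highlighter-rouge">parse_mapping_key</code>, and <code class="language-plaintext highlighter-rouge">mapping_key_less</code> are hypothetical and come from no Apple API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>

// In-memory form of the packed 25-byte clonegroup mapping key (hypothetical name).
struct MappingKey {
    uint64_t group_id;
    uint8_t record_type;
    uint64_t inode_id;
    uint64_t private_id;
};

// Assumption: multi-byte key fields are little-endian.
static uint64_t read_le64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Decodes a raw key; rejects the wrong length or a record_type other than 1.
std::optional<MappingKey> parse_mapping_key(const uint8_t* buf, size_t len) {
    if (len != 25 || buf[8] != 1) return std::nullopt;
    return MappingKey{read_le64(buf), buf[8], read_le64(buf + 9), read_le64(buf + 17)};
}

// Orders keys by group_id, then record_type, then inode_id, then private_id.
bool mapping_key_less(const MappingKey& a, const MappingKey& b) {
    return std::tie(a.group_id, a.record_type, a.inode_id, a.private_id) <
           std::tie(b.group_id, b.record_type, b.inode_id, b.private_id);
}
```

<p>Because <code class="language-plaintext highlighter-rouge">group_id</code> is the most significant part of the key, all records for one group sit next to each other in the tree. A single range scan therefore returns every member of a group.</p>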

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define CLONEGROUP_FLAG_FULL_CLONE     0x10
#define CLONEGROUP_FLAG_PURGEABLE_MASK 0x0F
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="nc">clonegroup_val</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">physical_size</span><span class="p">;</span> <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">flags</span><span class="p">;</span>         <span class="c1">// 0x08</span>
    <span class="kt">uint8_t</span> <span class="n">xfields</span><span class="p">[];</span>      <span class="c1">// 0x0C</span>
<span class="p">}</span> <span class="n">clonegroup_val_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">physical_size</code>: The total physical size in bytes of extents this inode contributes to the group. For the full clone, this equals the on-disk size of all shared extents. For partial clones, this is 0.</li>
  <li><code class="language-plaintext highlighter-rouge">flags</code>: Bit 4 (<code class="language-plaintext highlighter-rouge">CLONEGROUP_FLAG_FULL_CLONE</code>) indicates this inode owns the physical extents. Bits 0-3 encode purgeable urgency.</li>
  <li><code class="language-plaintext highlighter-rouge">xfields</code>: Optional extended fields (same format as inode extended fields)</li>
</ul>
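<p>The two defines above split <code class="language-plaintext highlighter-rouge">flags</code> into a full-clone bit and a 4-bit purgeable urgency. This sketch decodes them. It also adds a consistency check that follows from the rule that partial clones have a <code class="language-plaintext highlighter-rouge">physical_size</code> of 0. The struct and function names are hypothetical, and the meaning of each urgency level is not covered here.</p>

```cpp
#include <cassert>
#include <cstdint>

#define CLONEGROUP_FLAG_FULL_CLONE     0x10
#define CLONEGROUP_FLAG_PURGEABLE_MASK 0x0F

// Decoded view of a mapping value's fixed fields (hypothetical name).
struct CloneMemberInfo {
    uint64_t physical_size;
    bool full_clone;
    uint8_t purgeable_urgency;  // bits 0-3 of flags
};

CloneMemberInfo decode_clonegroup_val(uint64_t physical_size, uint32_t flags) {
    CloneMemberInfo info;
    info.physical_size = physical_size;
    info.full_clone = (flags & CLONEGROUP_FLAG_FULL_CLONE) != 0;
    info.purgeable_urgency = static_cast<uint8_t>(flags & CLONEGROUP_FLAG_PURGEABLE_MASK);
    return info;
}

// Heuristic: a partial clone contributes no physical bytes, so a partial
// clone with a nonzero physical_size is worth flagging for review.
bool looks_consistent(const CloneMemberInfo& info) {
    return info.full_clone || info.physical_size == 0;
}
```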

<h3 id="cookie-records-record_type--2">Cookie Records (record_type = 2)</h3>

<p>Cookie records signal that a clone group has been reduced to a single member and can be cleaned up.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">clonegroup_cookie_key</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">group_id</span><span class="p">;</span>    <span class="c1">// 0x00</span>
    <span class="kt">uint8_t</span> <span class="n">record_type</span><span class="p">;</span>  <span class="c1">// 0x08 (always 2)</span>
    <span class="kt">uint64_t</span> <span class="n">cookie</span><span class="p">;</span>      <span class="c1">// 0x09</span>
<span class="p">}</span> <span class="n">clonegroup_cookie_key_t</span><span class="p">;</span> <span class="c1">// 0x11 (17 bytes, packed)</span>
</code></pre></div></div>

<p>The cookie value is a single byte set to 0. Its presence triggers the solo-group cleanup path.</p>
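<p>Both key layouts begin with a <code class="language-plaintext highlighter-rouge">uint64_t</code> group identifier followed by the one-byte <code class="language-plaintext highlighter-rouge">record_type</code> at offset 0x08. A parser can therefore dispatch on that byte and then check the expected key length: 25 bytes for mappings and 17 for cookies. The sketch below does this. The enum and function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class CloneGroupRecord { Mapping, Cookie, Invalid };

// Dispatches on record_type (offset 0x08) and validates the key length.
CloneGroupRecord classify_clonegroup_key(const uint8_t* key, size_t key_len) {
    if (key_len < 9) return CloneGroupRecord::Invalid;
    switch (key[8]) {
    case 1: return key_len == 25 ? CloneGroupRecord::Mapping : CloneGroupRecord::Invalid;
    case 2: return key_len == 17 ? CloneGroupRecord::Cookie : CloneGroupRecord::Invalid;
    default: return CloneGroupRecord::Invalid;
    }
}

// A cookie's value is a single zero byte; anything else is unexpected.
bool is_valid_cookie_value(const uint8_t* val, size_t val_len) {
    return val_len == 1 && val[0] == 0;
}
```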

<h2 id="lifecycle">Lifecycle</h2>

<h3 id="group-creation">Group Creation</h3>

<p>When a file is first cloned and the clone group does not yet exist:</p>

<ol>
  <li>A mapping record is inserted for the source inode with <code class="language-plaintext highlighter-rouge">CLONEGROUP_FLAG_FULL_CLONE</code> set and <code class="language-plaintext highlighter-rouge">physical_size</code> reflecting its data extent size.</li>
  <li><code class="language-plaintext highlighter-rouge">INO_EXT_TYPE_CLONEGROUP_ID</code> is set on the source inode.</li>
  <li>A mapping record is inserted for the clone with <code class="language-plaintext highlighter-rouge">physical_size = 0</code> (partial clone).</li>
  <li><code class="language-plaintext highlighter-rouge">INO_EXT_TYPE_CLONEGROUP_ID</code> is set on the clone.</li>
</ol>

<h3 id="adding-members">Adding Members</h3>

<p>Each subsequent clone of any group member gets its own mapping record as a partial clone. The group grows without any data being physically copied.</p>

<h3 id="full-clone-promotion-and-demotion">Full Clone Promotion and Demotion</h3>

<p>As clones diverge through copy-on-write, an inode’s relationship to the shared extents changes:</p>

<ul>
  <li>When an inode that was a partial clone has fully diverged (all its extents are unique), it becomes a full clone of its own data.</li>
  <li>When a full clone is deleted, ownership of the shared physical extents must transfer to another group member.</li>
</ul>

<p>These transitions are tracked by setting or clearing <code class="language-plaintext highlighter-rouge">CLONEGROUP_FLAG_FULL_CLONE</code> and updating <code class="language-plaintext highlighter-rouge">physical_size</code>.</p>

<h3 id="deletion">Deletion</h3>

<p>When a group member is deleted:</p>

<ol>
  <li>Its mapping record is removed from the clonegroup tree.</li>
  <li>If the deleted inode was the full clone, ownership transfers to another member.</li>
  <li>If only one member remains, a cookie record is inserted to mark the group for cleanup.</li>
</ol>
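<p>The deletion steps can be modeled on a toy in-memory clone group, sketched below. The text does not say which member inherits ownership. This model assumes, purely for illustration, the member with the lowest inode number, and it assumes the old owner’s <code class="language-plaintext highlighter-rouge">physical_size</code> moves with the flag. <code class="language-plaintext highlighter-rouge">CloneGroup</code> and <code class="language-plaintext highlighter-rouge">remove_member</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Member {
    uint64_t physical_size;
    bool full_clone;
};

// Hypothetical model of one clone group: mapping records keyed by inode_id,
// plus whether a cookie record is present.
struct CloneGroup {
    std::map<uint64_t, Member> members;
    bool has_cookie = false;
};

void remove_member(CloneGroup& g, uint64_t inode_id) {
    auto it = g.members.find(inode_id);
    if (it == g.members.end()) return;
    Member removed = it->second;
    g.members.erase(it);  // step 1: drop the mapping record
    if (removed.full_clone && !g.members.empty()) {
        // step 2: transfer ownership (assumed heir: lowest inode number)
        Member& heir = g.members.begin()->second;
        heir.full_clone = true;
        heir.physical_size = removed.physical_size;
    }
    if (g.members.size() == 1) g.has_cookie = true;  // step 3: mark for cleanup
}
```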

<h3 id="solo-group-cleanup">Solo Group Cleanup</h3>

<p>When a group is reduced to a single member, the clone group tracking overhead is no longer needed. The cleanup process removes the remaining mapping record, the cookie record, and the <code class="language-plaintext highlighter-rouge">INO_EXT_TYPE_CLONEGROUP_ID</code> extended field from the surviving inode.</p>

<h2 id="forensic-considerations">Forensic Considerations</h2>

<p>The clonegroup tree provides insight into file relationships that cannot be derived from extent records alone:</p>

<ul>
  <li>It reveals which files were created by cloning, even after copy-on-write has caused their extents to partially or fully diverge.</li>
  <li>The <code class="language-plaintext highlighter-rouge">physical_size</code> field on the full clone indicates how much shared data exists, which is important for accurate disk space accounting.</li>
  <li>Cookie records reveal clone groups that are in the process of being dissolved.</li>
  <li>The <code class="language-plaintext highlighter-rouge">group_id</code> links related files that may be spread across different directories, enabling reconstruction of clone relationships.</li>
</ul>
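<p>Reconstructing clone relationships amounts to grouping decoded mapping records by <code class="language-plaintext highlighter-rouge">group_id</code>. The sketch below does that. It keeps every inode flagged as a full clone in a list rather than a single slot, so a group with zero or several full clones stands out as an anomaly. The record and summary types are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One decoded mapping record: key fields plus the value's flags.
struct MappingRecord {
    uint64_t group_id;
    uint64_t inode_id;
    uint32_t flags;
};

struct GroupSummary {
    std::vector<uint64_t> members;      // every inode_id in the group
    std::vector<uint64_t> full_clones;  // inodes with CLONEGROUP_FLAG_FULL_CLONE (0x10)
};

// Groups records by group_id, regardless of which directories the files live in.
std::map<uint64_t, GroupSummary> reconstruct_groups(const std::vector<MappingRecord>& recs) {
    std::map<uint64_t, GroupSummary> groups;
    for (const auto& r : recs) {
        GroupSummary& g = groups[r.group_id];
        g.members.push_back(r.inode_id);
        if (r.flags & 0x10) g.full_clones.push_back(r.inode_id);
    }
    return groups;
}
```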

<h2 id="conclusion">Conclusion</h2>

<p>Clonegroups provide the bookkeeping layer above APFS’s extent-level reference counting. While physical extents track shared blocks, clonegroups track shared <em>relationships</em> between files. This enables efficient space accounting, orderly ownership transfer during deletion, and cleanup when clone groups dissolve.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="clonegroups" /><category term="copy-on-write" /><summary type="html"><![CDATA[In our post on Data Streams, we discussed how APFS implements file cloning through shared extents and reference counting. While j_phys_ext_val_t reference counts and j_dstream_id_val_t track sharing at the extent level, APFS also maintains a higher-level grouping mechanism called clonegroups that tracks which inodes share physical data. This post covers the clonegroup tree and its role in managing cloned files.]]></summary></entry><entry><title type="html">Transparent Compression (DECMPFS)</title><link href="https://jtsylve.blog/post/2026/06/08/APFS-DECMPFS" rel="alternate" type="text/html" title="Transparent Compression (DECMPFS)" /><published>2026-06-08T00:00:00+00:00</published><updated>2026-06-08T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/08/APFS%20DECMPFS</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/08/APFS-DECMPFS"><![CDATA[<p>APFS supports transparent file compression through the DECMPFS (Decompression File System) framework, shared with HFS+. Compressed files appear normal to applications but store their data in a compressed form on disk, significantly reducing space usage on system volumes. This post covers the on-disk format, compression types, and how to parse compressed files.</p>

<h2 id="overview">Overview</h2>

<p>A compressed file is identified by the <code class="language-plaintext highlighter-rouge">UF_COMPRESSED</code> BSD flag set in its <a href="/post/2022/12/16/APFS-Inode-and-Directory-Records">inode record</a>. When this flag is present, the file carries an <a href="/post/2022/12/19/APFS-Data-Streams">extended attribute</a> named <code class="language-plaintext highlighter-rouge">com.apple.decmpfs</code> that begins with a compression header. Small files store their compressed data inline in that attribute, after the header; larger files store it in the file’s resource fork. The kernel transparently decompresses data on read, so applications never see the compressed form.</p>

<h2 id="the-decmpfs_disk_header">The decmpfs_disk_header</h2>

<p>The <code class="language-plaintext highlighter-rouge">com.apple.decmpfs</code> extended attribute begins with a fixed header:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define DECMPFS_MAGIC 0x636d7066 // 'cmpf'
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">compression_magic</span><span class="p">;</span>  <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">compression_type</span><span class="p">;</span>   <span class="c1">// 0x04</span>
    <span class="kt">uint64_t</span> <span class="n">uncompressed_size</span><span class="p">;</span>  <span class="c1">// 0x08</span>
    <span class="kt">uint8_t</span> <span class="n">attr_bytes</span><span class="p">[];</span>        <span class="c1">// 0x10</span>
<span class="p">}</span> <span class="n">decmpfs_disk_header</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">compression_magic</code>: Must equal <code class="language-plaintext highlighter-rouge">DECMPFS_MAGIC</code> (<code class="language-plaintext highlighter-rouge">0x636d7066</code>). All fields are little-endian.</li>
  <li><code class="language-plaintext highlighter-rouge">compression_type</code>: Identifies the compression algorithm and data location (see below)</li>
  <li><code class="language-plaintext highlighter-rouge">uncompressed_size</code>: The original uncompressed file size in bytes (for <code class="language-plaintext highlighter-rouge">DATALESS_PKG_CMPFS_TYPE</code> this field is reinterpreted: the low 40 bits hold the package size and the upper bits hold a child-entry count)</li>
  <li><code class="language-plaintext highlighter-rouge">attr_bytes</code>: Inline compressed data (for xattr-stored types), or empty for resource fork types</li>
</ul>

<p>The maximum size of the entire <code class="language-plaintext highlighter-rouge">com.apple.decmpfs</code> extended attribute is 3802 bytes. If the compressed data exceeds this limit, it must be stored in the resource fork.</p>
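<p>A minimal sketch of header validation follows. It checks the 3802-byte limit, decodes the little-endian fields, and returns a view of the inline <code class="language-plaintext highlighter-rouge">attr_bytes</code>. Because the magic is stored little-endian, the first four bytes on disk read <code class="language-plaintext highlighter-rouge">fpmc</code>. The struct and function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr uint32_t DECMPFS_MAGIC = 0x636d7066;  // 'cmpf'
constexpr size_t DECMPFS_HEADER_SIZE = 16;
constexpr size_t DECMPFS_MAX_XATTR_SIZE = 3802;

struct DecmpfsHeader {
    uint32_t compression_type;
    uint64_t uncompressed_size;
    const uint8_t* attr_bytes;  // inline payload; points into the caller's buffer
    size_t attr_len;
};

static uint32_t le32(const uint8_t* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}
static uint64_t le64(const uint8_t* p) {
    return uint64_t(le32(p)) | uint64_t(le32(p + 4)) << 32;
}

// Validates and decodes the value of the com.apple.decmpfs extended attribute.
std::optional<DecmpfsHeader> parse_decmpfs(const uint8_t* xattr, size_t len) {
    if (len < DECMPFS_HEADER_SIZE || len > DECMPFS_MAX_XATTR_SIZE) return std::nullopt;
    if (le32(xattr) != DECMPFS_MAGIC) return std::nullopt;
    return DecmpfsHeader{le32(xattr + 4), le64(xattr + 8),
                         xattr + DECMPFS_HEADER_SIZE, len - DECMPFS_HEADER_SIZE};
}
```

<p>For <code class="language-plaintext highlighter-rouge">DATALESS_PKG_CMPFS_TYPE</code>, the caller must reinterpret <code class="language-plaintext highlighter-rouge">uncompressed_size</code> as described above rather than use it directly.</p>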

<h2 id="compression-types">Compression Types</h2>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Type</th>
      <th>Algorithm</th>
      <th>Location</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>None</td>
      <td>xattr</td>
      <td>Small files stored uncompressed inline</td>
    </tr>
    <tr>
      <td>3</td>
      <td>zlib</td>
      <td>xattr</td>
      <td>Small zlib-compressed files</td>
    </tr>
    <tr>
      <td>4</td>
      <td>zlib</td>
      <td>resource fork</td>
      <td>Larger zlib-compressed files</td>
    </tr>
    <tr>
      <td>5</td>
      <td>Dataless</td>
      <td>none</td>
      <td>Data fetched on demand (iCloud/network)</td>
    </tr>
    <tr>
      <td>7</td>
      <td>LZVN</td>
      <td>xattr</td>
      <td>Fast LZ77 variant (macOS 10.9+)</td>
    </tr>
    <tr>
      <td>8</td>
      <td>LZVN</td>
      <td>resource fork</td>
      <td>Larger LZVN files</td>
    </tr>
    <tr>
      <td>9</td>
      <td>None</td>
      <td>xattr</td>
      <td>Uncompressed variant in LZVN format</td>
    </tr>
    <tr>
      <td>10</td>
      <td>None</td>
      <td>resource fork</td>
      <td>64KB chunks, uncompressed</td>
    </tr>
    <tr>
      <td>11</td>
      <td>LZFSE</td>
      <td>xattr</td>
      <td>High-efficiency entropy-coded (macOS 10.11+)</td>
    </tr>
    <tr>
      <td>12</td>
      <td>LZFSE</td>
      <td>resource fork</td>
      <td>Larger LZFSE files</td>
    </tr>
    <tr>
      <td>13</td>
      <td>LZBITMAP</td>
      <td>xattr</td>
      <td>Block bitmap compression</td>
    </tr>
    <tr>
      <td>14</td>
      <td>LZBITMAP</td>
      <td>resource fork</td>
      <td>Larger LZBITMAP files</td>
    </tr>
  </tbody>
</table>

<p>Apart from type 5, which has no local data, odd-numbered types (1, 3, 7, 9, 11, 13) store data inline in the extended attribute. Even-numbered types (4, 8, 10, 12, 14) store data in the resource fork.</p>

<h3 id="dataless-files">Dataless Files</h3>

<p>Special compression types represent files whose content is not stored locally:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define DATALESS_CMPFS_TYPE     0x80000001
#define DATALESS_PKG_CMPFS_TYPE 0x80000002
</span></code></pre></div></div>

<p>These are placeholders for iCloud-synced or network-mounted content. The metadata (size, permissions) exists locally, but the data is fetched on demand.</p>
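<p>Since parity alone misclassifies type 5, a table-driven lookup is safer. The sketch below maps a <code class="language-plaintext highlighter-rouge">compression_type</code> to where its payload lives, following the table and the dataless constants above. Unlisted values such as 2 and 6 are reported as unknown rather than guessed. The enum and function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

enum class DataLocation { Inline, ResourceFork, NotLocal, Unknown };

constexpr uint32_t DATALESS_CMPFS_TYPE     = 0x80000001;
constexpr uint32_t DATALESS_PKG_CMPFS_TYPE = 0x80000002;

DataLocation data_location(uint32_t type) {
    if (type == 5 || type == DATALESS_CMPFS_TYPE || type == DATALESS_PKG_CMPFS_TYPE)
        return DataLocation::NotLocal;  // content fetched on demand
    switch (type) {
    case 1: case 3: case 7: case 9: case 11: case 13:
        return DataLocation::Inline;
    case 4: case 8: case 10: case 12: case 14:
        return DataLocation::ResourceFork;
    default:
        return DataLocation::Unknown;  // not listed in the table
    }
}
```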

<h2 id="parsing-a-compressed-file">Parsing a Compressed File</h2>

<ol>
  <li>Check the <code class="language-plaintext highlighter-rouge">UF_COMPRESSED</code> flag (bit 5 of <code class="language-plaintext highlighter-rouge">bsd_flags</code> in <code class="language-plaintext highlighter-rouge">j_inode_val_t</code>).</li>
  <li>Read the <code class="language-plaintext highlighter-rouge">com.apple.decmpfs</code> extended attribute from the File System Tree.</li>
  <li>Verify <code class="language-plaintext highlighter-rouge">compression_magic</code> equals <code class="language-plaintext highlighter-rouge">DECMPFS_MAGIC</code>.</li>
  <li>Read <code class="language-plaintext highlighter-rouge">compression_type</code> to determine the algorithm and data location.</li>
  <li>Locate the compressed data:
    <ul>
      <li><strong>Inline (types 1, 3, 7, 9, 11, 13):</strong> Data follows the header in <code class="language-plaintext highlighter-rouge">attr_bytes</code>.</li>
      <li><strong>Resource fork (types 4, 8, 10, 12, 14):</strong> Data is in the <code class="language-plaintext highlighter-rouge">com.apple.ResourceFork</code> extended attribute.</li>
    </ul>
  </li>
  <li>Decompress using the appropriate algorithm.</li>
</ol>

<h2 id="resource-fork-chunking">Resource Fork Chunking</h2>

<p>Resource fork compression types split data into 65,536-byte (64 KB) chunks. Two chunking schemes exist:</p>
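<p>Assuming every chunk except possibly the last holds exactly 65,536 uncompressed bytes, the chunk count and the chunk containing a given file offset follow from simple arithmetic. A reader can then decompress only the chunk it needs. The helper names below are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t DECMPFS_CHUNK_SIZE = 65536;

// Number of 64 KB chunks needed for a file of the given logical size.
uint64_t chunk_count(uint64_t uncompressed_size) {
    return (uncompressed_size + DECMPFS_CHUNK_SIZE - 1) / DECMPFS_CHUNK_SIZE;
}

// Which chunk holds a logical byte offset, and where inside it that byte falls.
struct ChunkPos { uint64_t index; uint64_t offset_in_chunk; };

ChunkPos locate(uint64_t logical_offset) {
    return {logical_offset / DECMPFS_CHUNK_SIZE, logical_offset % DECMPFS_CHUNK_SIZE};
}

// Expected decompressed length of one chunk: full size except possibly the last.
uint64_t chunk_plain_len(uint64_t uncompressed_size, uint64_t index) {
    uint64_t start = index * DECMPFS_CHUNK_SIZE;
    if (start >= uncompressed_size) return 0;
    uint64_t rem = uncompressed_size - start;
    return rem < DECMPFS_CHUNK_SIZE ? rem : DECMPFS_CHUNK_SIZE;
}
```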

<h3 id="scheme-v1-type-4-zlib">Scheme v1 (Type 4, zlib)</h3>

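<p>The resource fork uses the classic resource fork layout: a 16-byte big-endian header whose first field is the offset of the data section (normally 256), plus a resource map containing a resource of type <code class="language-plaintext highlighter-rouge">'cmpf'</code> (<code class="language-plaintext highlighter-rouge">0x636D7066</code>). The data section begins with a <code class="language-plaintext highlighter-rouge">uint32_t</code> resource length at offset 256, followed at offset 260 by a little-endian <code class="language-plaintext highlighter-rouge">uint32_t</code> chunk count. Starting at offset 264, each chunk is described by an 8-byte entry containing a <code class="language-plaintext highlighter-rouge">uint32_t</code> offset and a <code class="language-plaintext highlighter-rouge">uint32_t</code> size. The offsets are measured from offset 260.</p>

<h3 id="scheme-v2-types-8-10-12-14">Scheme v2 (Types 8, 10, 12, 14)</h3>

<p>These types use no resource map. The resource fork begins directly with a chunk table: an array of little-endian <code class="language-plaintext highlighter-rouge">uint32_t</code> offsets, one per chunk, plus a trailing entry that marks the end of the last chunk. The first chunk immediately follows the table, so the first offset also equals the table’s size, and the chunk count is <code class="language-plaintext highlighter-rouge">offsets[0] / 4 - 1</code>. The compressed size of chunk <code class="language-plaintext highlighter-rouge">i</code> is <code class="language-plaintext highlighter-rouge">offsets[i+1] - offsets[i]</code>.</p>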
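<p>Both chunk-table layouts described above can be decoded with a few little-endian reads. The first is a plain array of <code class="language-plaintext highlighter-rouge">uint32_t</code> offsets with a trailing entry. The second is a chunk count at offset 260 followed by 8-byte offset/size pairs at offset 264. The sketch below assumes little-endian table entries. It returns offsets exactly as stored, leaving any base adjustment to the caller. The chunk count for the plain array is passed in, for example from the chunk arithmetic on <code class="language-plaintext highlighter-rouge">uncompressed_size</code>. All names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChunkEntry { uint32_t offset; uint32_t size; };

static uint32_t le32(const uint8_t* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}

// Plain array of n + 1 offsets; chunk i spans [offsets[i], offsets[i+1]).
// Returns an empty vector on truncated or non-monotonic input.
std::vector<ChunkEntry> decode_offset_array(const uint8_t* p, size_t len, size_t n) {
    std::vector<ChunkEntry> out;
    if (len / 4 < n + 1) return out;
    for (size_t i = 0; i < n; ++i) {
        uint32_t start = le32(p + 4 * i), end = le32(p + 4 * (i + 1));
        if (end < start) return {};
        out.push_back({start, end - start});
    }
    return out;
}

// Count at offset 260 of the resource fork, then {offset, size} pairs from 264.
std::vector<ChunkEntry> decode_pair_table(const uint8_t* fork, size_t len) {
    std::vector<ChunkEntry> out;
    if (len < 264) return out;
    uint32_t count = le32(fork + 260);
    if ((len - 264) / 8 < count) return out;  // table would run past the buffer
    for (uint32_t i = 0; i < count; ++i) {
        const uint8_t* e = fork + 264 + 8 * size_t(i);
        out.push_back({le32(e), le32(e + 4)});
    }
    return out;
}
```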

<h2 id="interaction-with-apfs">Interaction with APFS</h2>

<p>For compressed files, the kernel hides the compression-related extended attributes from userland:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">com.apple.decmpfs</code> is always hidden</li>
  <li><code class="language-plaintext highlighter-rouge">com.apple.ResourceFork</code> is hidden when it contains compression data</li>
</ul>

<p>This means forensic tools accessing raw APFS structures will see these attributes, but tools going through the VFS layer will not. The <code class="language-plaintext highlighter-rouge">INODE_HAS_UNCOMPRESSED_SIZE</code> flag (0x40000) in <code class="language-plaintext highlighter-rouge">internal_flags</code> indicates the inode’s <code class="language-plaintext highlighter-rouge">uncompressed_size</code> field is valid.</p>
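<p>A tool working from raw structures needs a rule for the logical size of a compressed file. The sketch below prefers the inode’s own <code class="language-plaintext highlighter-rouge">uncompressed_size</code> when <code class="language-plaintext highlighter-rouge">INODE_HAS_UNCOMPRESSED_SIZE</code> is set, and otherwise falls back to the size from the decmpfs header. That preference order is an assumption, not a documented rule. <code class="language-plaintext highlighter-rouge">InodeFields</code> is a hypothetical subset of <code class="language-plaintext highlighter-rouge">j_inode_val_t</code>.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint64_t INODE_HAS_UNCOMPRESSED_SIZE = 0x40000;
constexpr uint32_t UF_COMPRESSED = 0x20;  // bit 5 of bsd_flags

// Hypothetical subset of the inode fields needed for the decision.
struct InodeFields {
    uint32_t bsd_flags;
    uint64_t internal_flags;
    uint64_t uncompressed_size;
};

// Logical size of a compressed file, or nothing if it is not decmpfs-compressed.
// header_size is the decmpfs header's uncompressed_size, if the caller parsed one.
std::optional<uint64_t> logical_size(const InodeFields& ino, std::optional<uint64_t> header_size) {
    if (!(ino.bsd_flags & UF_COMPRESSED)) return std::nullopt;
    if (ino.internal_flags & INODE_HAS_UNCOMPRESSED_SIZE) return ino.uncompressed_size;
    return header_size;  // assumed fallback
}
```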

<p>On <a href="/post/2022/12/20/APFS-Sealed-Volumes">sealed volumes</a>, compressed data integrity is verified through the sealed volume’s hash tree. The <code class="language-plaintext highlighter-rouge">apfs_verify_uncompressed_data</code> function checks decompressed blocks against expected hashes.</p>

<h2 id="forensic-considerations">Forensic Considerations</h2>

<ul>
  <li>Transparent compression is extremely common on macOS system volumes. Most files in <code class="language-plaintext highlighter-rouge">/System</code> and <code class="language-plaintext highlighter-rouge">/usr</code> are compressed.</li>
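  <li>The inode’s data stream does not give the file’s logical size: the compressed bytes live in extended attributes, and the data fork of a compressed file is typically empty. The <em>actual</em> size is in <code class="language-plaintext highlighter-rouge">uncompressed_size</code> from the decmpfs header, or in the inode’s own <code class="language-plaintext highlighter-rouge">uncompressed_size</code> field when <code class="language-plaintext highlighter-rouge">INODE_HAS_UNCOMPRESSED_SIZE</code> is set.</li>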
  <li>Tools that read raw disk data must handle decompression to access file contents.</li>
  <li>The compression type reveals which macOS version created the file: LZVN (10.9+), LZFSE (10.11+), LZBITMAP (macOS 11+).</li>
  <li>Dataless files (types 0x80000001, 0x80000002) indicate cloud-synced content whose data was never stored locally or has been evicted.</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>DECMPFS provides transparent, per-file compression that is deeply integrated into APFS through extended attributes and resource forks. Understanding the compression types and chunking schemes is essential for any tool that needs to access file contents on APFS volumes, particularly system volumes where compression is the default.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="compression" /><category term="decmpfs" /><summary type="html"><![CDATA[APFS supports transparent file compression through the DECMPFS (Decompression File System) framework, shared with HFS+. Compressed files appear normal to applications but store their data in a compressed form on disk, significantly reducing space usage on system volumes. This post covers the on-disk format, compression types, and how to parse compressed files.]]></summary></entry><entry><title type="html">Hard Links and Siblings</title><link href="https://jtsylve.blog/post/2026/06/05/APFS-Siblings" rel="alternate" type="text/html" title="Hard Links and Siblings" /><published>2026-06-05T00:00:00+00:00</published><updated>2026-06-05T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/05/APFS%20Siblings</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/05/APFS-Siblings"><![CDATA[<p>In our <a href="/post/2022/12/16/APFS-Inode-and-Directory-Records">post on Inode and Directory Records</a>, we noted that a single inode may be referenced by more than one directory record, as is the case with hard links. In <a href="/post/2022/12/15/APFS-FSTrees">File System Trees</a>, we listed <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_LINK</code> and <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_MAP</code> among the record types. Today we examine how APFS explicitly tracks hard links through a mechanism called <em>siblings</em>.</p>

<h2 id="why-siblings-exist">Why Siblings Exist</h2>

<p>Traditional Unix file systems track hard links implicitly: an inode has a link count (<code class="language-plaintext highlighter-rouge">nlink</code>), and each directory entry pointing to it constitutes a link. There is no built-in way to enumerate all the names of a hard-linked file without scanning the entire file system.</p>

<p>APFS tracks hard links explicitly. Each hard link to an inode is called a <em>sibling</em> and is assigned its own unique identifier. This enables:</p>
<ul>
  <li>Efficient enumeration of all names for a file</li>
  <li>Bidirectional mapping between sibling identifiers and inodes</li>
  <li>Support for macOS Carbon APIs that require distinguishing between links to the same file</li>
  <li>Proper Spotlight indexing and file coordination across multiple names</li>
</ul>

<p>The sibling with the lowest identifier is the <em>primary link</em>. The inode’s <code class="language-plaintext highlighter-rouge">parent_id</code> and <code class="language-plaintext highlighter-rouge">INO_EXT_TYPE_NAME</code> extended field always reflect the primary link’s parent directory and name.</p>

<h2 id="sibling-link-records">Sibling Link Records</h2>

<p><em>Sibling link records</em> (type <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_LINK</code>) map from an inode to each of its hard links. They are stored in the <a href="/post/2022/12/15/APFS-FSTrees">File System Tree</a>.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">j_sibling_key</span> <span class="p">{</span>
    <span class="n">j_key_t</span> <span class="n">hdr</span><span class="p">;</span>          <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">sibling_id</span><span class="p">;</span>  <span class="c1">// 0x08</span>
<span class="p">}</span> <span class="n">j_sibling_key_t</span><span class="p">;</span>        <span class="c1">// 0x10</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">hdr</code>: The record’s header. The object identifier is the inode number.</li>
  <li><code class="language-plaintext highlighter-rouge">sibling_id</code>: The sibling’s unique identifier</li>
</ul>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">j_sibling_val</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">parent_id</span><span class="p">;</span>  <span class="c1">// 0x00</span>
    <span class="kt">uint16_t</span> <span class="n">name_len</span><span class="p">;</span>   <span class="c1">// 0x08</span>
    <span class="kt">uint8_t</span> <span class="n">name</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>     <span class="c1">// 0x0A</span>
<span class="p">}</span> <span class="n">j_sibling_val_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">parent_id</code>: The inode number of the parent directory containing this link</li>
  <li><code class="language-plaintext highlighter-rouge">name_len</code>: The length of the name including the null terminator</li>
  <li><code class="language-plaintext highlighter-rouge">name</code>: The null-terminated UTF-8 name of the directory entry</li>
</ul>

<p>For a file with three hard links, there will be three sibling link records, all sharing the same inode number in their key header but each with a unique <code class="language-plaintext highlighter-rouge">sibling_id</code>. Each record stores the parent directory and name for that particular link.</p>
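<p>The sketch below decodes a raw <code class="language-plaintext highlighter-rouge">j_sibling_val_t</code> using the offsets above. Because <code class="language-plaintext highlighter-rouge">name_len</code> counts the null terminator, it must be at least 1, and the byte it points past must be zero. Little-endian fields are assumed; the names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct SiblingLink {
    uint64_t parent_id;
    std::string name;  // without the trailing NUL
};

static uint64_t le64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// parent_id at 0x00, name_len at 0x08, name at 0x0A.
std::optional<SiblingLink> parse_sibling_val(const uint8_t* v, size_t len) {
    if (len < 10) return std::nullopt;
    uint16_t name_len = uint16_t(v[8] | (v[9] << 8));
    if (name_len == 0 || len < 10 + size_t(name_len) || v[10 + name_len - 1] != 0)
        return std::nullopt;  // missing or misplaced terminator
    return SiblingLink{le64(v), std::string(reinterpret_cast<const char*>(v + 10), name_len - 1)};
}
```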

<h2 id="sibling-map-records">Sibling Map Records</h2>

<p><em>Sibling map records</em> (type <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_MAP</code>) provide the reverse mapping: given a sibling identifier, find the inode.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">j_sibling_map_key</span> <span class="p">{</span>
    <span class="n">j_key_t</span> <span class="n">hdr</span><span class="p">;</span> <span class="c1">// 0x00</span>
<span class="p">}</span> <span class="n">j_sibling_map_key_t</span><span class="p">;</span> <span class="c1">// 0x08</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">hdr</code>: The record’s header. The object identifier is the sibling’s unique identifier.</li>
</ul>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">j_sibling_map_val</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">file_id</span><span class="p">;</span> <span class="c1">// 0x00</span>
<span class="p">}</span> <span class="n">j_sibling_map_val_t</span><span class="p">;</span> <span class="c1">// 0x08</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">file_id</code>: The inode number of the underlying file</li>
</ul>

<p>This bidirectional mapping (sibling link: inode -&gt; sibling ID + location; sibling map: sibling ID -&gt; inode) allows efficient traversal in either direction.</p>
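<p>A reader can mirror the two record types in memory, sketched below. A multimap from inode to sibling link records handles the forward direction. A map from sibling ID to inode handles the reverse, starting from a directory entry’s <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code>. <code class="language-plaintext highlighter-rouge">SiblingIndex</code> and its members are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct SiblingLinkRec { uint64_t inode_id, sibling_id, parent_id; std::string name; };

struct SiblingIndex {
    std::multimap<uint64_t, SiblingLinkRec> links_by_inode;  // APFS_TYPE_SIBLING_LINK
    std::map<uint64_t, uint64_t> inode_by_sibling;           // APFS_TYPE_SIBLING_MAP

    void add(const SiblingLinkRec& r) {
        links_by_inode.emplace(r.inode_id, r);
        inode_by_sibling[r.sibling_id] = r.inode_id;
    }
    // Forward: every (parent, name) pair for one inode.
    std::vector<SiblingLinkRec> names_of(uint64_t inode_id) const {
        std::vector<SiblingLinkRec> out;
        auto range = links_by_inode.equal_range(inode_id);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }
    // Reverse: the inode behind a sibling ID, if one is recorded.
    std::optional<uint64_t> inode_of(uint64_t sibling_id) const {
        auto it = inode_by_sibling.find(sibling_id);
        if (it == inode_by_sibling.end()) return std::nullopt;
        return it->second;
    }
};
```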

<h2 id="sibling-identifier-allocation">Sibling Identifier Allocation</h2>

<p>Sibling identifiers are allocated from the same object identifier space as inode numbers (from the volume’s <code class="language-plaintext highlighter-rouge">next_obj_id</code> counter). Each directory record for a hard-linked file stores its sibling identifier in the <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> extended field, linking the directory entry to its corresponding sibling records.</p>

<h2 id="operations">Operations</h2>

<p>When the first hard link is created (the target’s <code class="language-plaintext highlighter-rouge">nlink</code> is still 1 and its existing directory entry has no <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> field), the original entry is first promoted to a sibling: a sibling identifier is allocated for it, a <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> field is added to that existing directory entry, and sibling link and map records are created for the original link. The steps below then run for the new link.</p>

<p>When a hard link is created:</p>
<ol>
  <li>A new sibling identifier is allocated from <code class="language-plaintext highlighter-rouge">next_obj_id</code> for the new link (on the first hard link, a second identifier is also allocated to promote the original entry; see above).</li>
  <li>A sibling link record is inserted into the File System Tree, keyed by the target inode number and the new sibling ID.</li>
  <li>A sibling map record is inserted, keyed by the sibling ID, with the target inode as the value.</li>
  <li>The directory record receives a <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> extended field with the sibling ID.</li>
</ol>

<p>Because sibling identifiers are handed out in increasing order from <code class="language-plaintext highlighter-rouge">next_obj_id</code>, a newly created link always has a higher identifier than every existing sibling. Creating a link therefore never changes which sibling is the primary link.</p>
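<p>The creation steps can be sketched in C with in-memory arrays standing in for the File System Tree. This is only an illustration under stated assumptions: the types <code class="language-plaintext highlighter-rouge">sibling_store_t</code>, <code class="language-plaintext highlighter-rouge">add_sibling</code>, and <code class="language-plaintext highlighter-rouge">create_hard_link</code> are hypothetical, the link’s name is omitted, and a sibling ID of 0 is used to mean “the original entry has no <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> field yet”.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-ins for the two record types (names omitted). */
typedef struct { uint64_t inode, sibling_id, parent_id; } sibling_link_t;
typedef struct { uint64_t sibling_id, file_id; } sibling_map_t;

#define MAX_SIBS 16

typedef struct {
    uint64_t next_obj_id;            /* stands in for the volume's counter */
    sibling_link_t links[MAX_SIBS];
    sibling_map_t maps[MAX_SIBS];
    size_t count;
} sibling_store_t;

/* Steps 1-3: allocate an ID, then insert the link and map records. The
 * returned ID is what goes into the DREC_EXT_TYPE_SIBLING_ID field. */
static uint64_t add_sibling(sibling_store_t *s, uint64_t inode, uint64_t parent_id) {
    assert(s->count < MAX_SIBS);
    uint64_t id = s->next_obj_id++;
    s->links[s->count] = (sibling_link_t){ inode, id, parent_id };
    s->maps[s->count] = (sibling_map_t){ id, inode };
    s->count++;
    return id;
}

/* Create a new link to `inode` in `new_parent`. *orig_id is the original
 * entry's sibling ID; 0 means it has no DREC_EXT_TYPE_SIBLING_ID yet. */
static uint64_t create_hard_link(sibling_store_t *s, uint64_t inode, uint32_t *nlink,
                                 uint64_t orig_parent, uint64_t *orig_id,
                                 uint64_t new_parent) {
    if (*nlink == 1 && *orig_id == 0)
        *orig_id = add_sibling(s, inode, orig_parent);   /* promote original */
    uint64_t id = add_sibling(s, inode, new_parent);
    (*nlink)++;
    return id;
}
```

<p>On the first link, the original entry is promoted before the new link is recorded, so it receives the lower identifier and remains the primary link.</p>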

<p>When a hard link is removed:</p>
<ol>
  <li>Both the sibling link record and sibling map record are deleted.</li>
  <li>If the removed link was the primary link, the inode’s metadata is updated to reflect the next-lowest sibling as the new primary.</li>
</ol>
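<p>A minimal sketch of removal and primary re-election, assuming (as the steps above describe) that the primary link is the sibling with the lowest identifier. The array of <code class="language-plaintext highlighter-rouge">sibling_link_t</code> values is a hypothetical in-memory copy of one inode’s sibling link records; deleting the matching sibling map record is left as a comment.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of one inode's sibling link records. */
typedef struct { uint64_t sibling_id, parent_id; } sibling_link_t;

/* The primary link is taken to be the sibling with the lowest identifier. */
static uint64_t primary_sibling(const sibling_link_t *links, size_t n) {
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < n; i++)
        if (links[i].sibling_id < best)
            best = links[i].sibling_id;
    return best;
}

/* Remove one link (the matching sibling map record would be deleted too)
 * and return the primary link's ID afterwards, or 0 if none remain. */
static uint64_t remove_sibling(sibling_link_t *links, size_t *n, uint64_t id) {
    for (size_t i = 0; i < *n; i++) {
        if (links[i].sibling_id == id) {
            links[i] = links[*n - 1];   /* order is irrelevant here */
            (*n)--;
            break;
        }
    }
    return *n ? primary_sibling(links, *n) : 0;
}
```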

<h2 id="hard-link-fixup-at-mount">Hard-Link Fixup at Mount</h2>

<p>On volumes where the <code class="language-plaintext highlighter-rouge">APFS_FEATURE_HARDLINK_MAP_RECORDS</code> feature flag (bit 1 of <code class="language-plaintext highlighter-rouge">apfs_features</code>) is not set, the implementation runs a fixup pass at mount time. This pass iterates all <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_LINK</code> records and ensures a corresponding <code class="language-plaintext highlighter-rouge">APFS_TYPE_SIBLING_MAP</code> record exists for each one. Progress is tracked via the <code class="language-plaintext highlighter-rouge">fixup-hardlink-progress</code> extended attribute on the root directory (inode 2), which stores the last processed object identifier.</p>

<p>Once fixup completes, <code class="language-plaintext highlighter-rouge">APFS_FEATURE_HARDLINK_MAP_RECORDS</code> is set and the progress attribute is removed. This mechanism handles the transition from older APFS implementations that did not maintain sibling map records.</p>
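<p>The resumable shape of this pass can be sketched as follows. Several details are assumptions rather than documented behavior: sibling link records are visited in inode order (the File System Tree’s key order), the “last processed object identifier” is treated as an inode number, and work is bounded by a hypothetical per-call inode budget. The record arrays and <code class="language-plaintext highlighter-rouge">fixup_step</code> are illustrative only.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened views of the records. */
typedef struct { uint64_t inode, sibling_id; } sibling_link_t;
typedef struct { uint64_t sibling_id, file_id; } sibling_map_t;

typedef struct {
    sibling_map_t recs[32];
    size_t count;
} map_table_t;

static bool has_map(const map_table_t *t, uint64_t sibling_id) {
    for (size_t i = 0; i < t->count; i++)
        if (t->recs[i].sibling_id == sibling_id)
            return true;
    return false;
}

/* Process up to `inode_budget` inodes after *progress (which plays the role
 * of the fixup-hardlink-progress value), creating any missing map records.
 * Returns true once every link has been checked. */
static bool fixup_step(const sibling_link_t *links, size_t n, map_table_t *maps,
                       uint64_t *progress, size_t inode_budget) {
    size_t i = 0;
    while (i < n) {
        uint64_t inode = links[i].inode;
        if (inode <= *progress) { i++; continue; }   /* done in an earlier pass */
        if (inode_budget == 0) return false;         /* resume next time */
        inode_budget--;
        for (; i < n && links[i].inode == inode; i++) {
            if (!has_map(maps, links[i].sibling_id)) {
                assert(maps->count < 32);
                maps->recs[maps->count++] = (sibling_map_t){ links[i].sibling_id, inode };
            }
        }
        *progress = inode;   /* all links of this inode are now covered */
    }
    return true;
}
```

<p>Advancing the progress marker only after every link of an inode has been handled means an interrupted pass never skips links that share an inode.</p>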

<h2 id="forensic-considerations">Forensic Considerations</h2>

<p>Sibling records are valuable for forensic analysis:</p>

<ul>
  <li>They allow complete enumeration of all paths to a file without scanning every directory entry on the volume.</li>
  <li>The <code class="language-plaintext highlighter-rouge">parent_id</code> in sibling link records reveals which directories contain links to a file, even if some of those directory entries have been deleted or are in snapshots.</li>
  <li>Inconsistencies between sibling records and directory entries (or between sibling link and sibling map records) may indicate tampering or corruption.</li>
  <li>The <code class="language-plaintext highlighter-rouge">DREC_EXT_TYPE_SIBLING_ID</code> extended field in directory records provides a cross-reference that can validate the integrity of the sibling mapping.</li>
</ul>
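<p>One way to carry out the cross-check between sibling link and sibling map records is sketched below. The flattened record arrays are hypothetical, and a real tool would read them from the File System Tree. The function counts links with no map, maps that point at a different inode, and maps with no link.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened views of the two record types. */
typedef struct { uint64_t inode, sibling_id; } sibling_link_t;
typedef struct { uint64_t sibling_id, file_id; } sibling_map_t;

static size_t count_sibling_mismatches(const sibling_link_t *links, size_t nl,
                                       const sibling_map_t *maps, size_t nm) {
    size_t bad = 0;
    /* Every link needs a map record that points back at the same inode. */
    for (size_t i = 0; i < nl; i++) {
        size_t j = 0;
        while (j < nm && maps[j].sibling_id != links[i].sibling_id) j++;
        if (j == nm || maps[j].file_id != links[i].inode) bad++;
    }
    /* Every map record needs a corresponding link record. */
    for (size_t j = 0; j < nm; j++) {
        size_t i = 0;
        while (i < nl && links[i].sibling_id != maps[j].sibling_id) i++;
        if (i == nl) bad++;
    }
    return bad;
}
```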

<h2 id="conclusion">Conclusion</h2>

<p>APFS’s explicit hard link tracking through sibling records distinguishes it from traditional Unix file systems. The bidirectional mapping between inodes and sibling identifiers enables efficient enumeration, correct primary link tracking, and robust support for macOS APIs that distinguish between names of the same file.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="hard-links" /><category term="siblings" /><summary type="html"><![CDATA[In our post on Inode and Directory Records, we noted that a single inode may be referenced by more than one directory record, as is the case with hard links. In File System Trees, we listed APFS_TYPE_SIBLING_LINK and APFS_TYPE_SIBLING_MAP among the record types. Today we examine how APFS explicitly tracks hard links through a mechanism called siblings.]]></summary></entry><entry><title type="html">EFI Jumpstart</title><link href="https://jtsylve.blog/post/2026/06/04/APFS-EFI-Jumpstart" rel="alternate" type="text/html" title="EFI Jumpstart" /><published>2026-06-04T00:00:00+00:00</published><updated>2026-06-04T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/04/APFS%20EFI%20Jumpstart</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/04/APFS-EFI-Jumpstart"><![CDATA[<p>APFS containers include an embedded EFI driver that allows UEFI firmware to boot from APFS partitions without requiring a built-in APFS driver. This post covers the <code class="language-plaintext highlighter-rouge">nx_efi_jumpstart_t</code> structure and the boot procedure that uses it.</p>

<h2 id="overview">Overview</h2>

<p>The EFI jumpstart mechanism is intentionally simple. The driver can be located by reading a few data structures starting from physical block zero, without any B-Tree traversal or complex APFS parsing. This minimal dependency means that UEFI firmware (or virtualization software) can load the APFS driver with only basic block-read capability.</p>

<p>The <code class="language-plaintext highlighter-rouge">nx_efi_jumpstart</code> field of the <a href="/post/2022/12/06/APFS-NX-Superblock">NX Superblock</a> stores the physical block address of the jumpstart structure. This field is written during container creation and is not used by the kernel APFS driver during normal operation.</p>

<h2 id="boot-procedure">Boot Procedure</h2>

<p>To boot from an APFS partition using the embedded EFI driver:</p>

<ol>
  <li>
    <p>Read physical block zero (the container superblock). Verify the Fletcher-64 checksum and confirm <code class="language-plaintext highlighter-rouge">nx_magic</code> equals <code class="language-plaintext highlighter-rouge">NX_MAGIC</code> (<code class="language-plaintext highlighter-rouge">'BSXN'</code>).</p>
  </li>
  <li>
    <p>Read the physical block at the address in <code class="language-plaintext highlighter-rouge">nx_efi_jumpstart</code>.</p>
  </li>
  <li>
    <p>Verify <code class="language-plaintext highlighter-rouge">nej_magic</code> equals <code class="language-plaintext highlighter-rouge">NX_EFI_JUMPSTART_MAGIC</code> (<code class="language-plaintext highlighter-rouge">'RDSJ'</code>), verify the Fletcher-64 checksum, and confirm <code class="language-plaintext highlighter-rouge">nej_version</code> is 1.</p>
  </li>
  <li>
    <p>Allocate a contiguous memory buffer large enough for the driver: at least <code class="language-plaintext highlighter-rouge">nej_efi_file_len</code> bytes, rounded up to a whole number of blocks if each extent is read in full.</p>
  </li>
  <li>
    <p>Read the <code class="language-plaintext highlighter-rouge">nej_num_extents</code> extent records from <code class="language-plaintext highlighter-rouge">nej_rec_extents</code> and load each extent’s blocks sequentially into the memory buffer.</p>
  </li>
  <li>
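    <p>Load and start the driver image from the buffer (for example, with the UEFI boot services <code class="language-plaintext highlighter-rouge">LoadImage</code> and <code class="language-plaintext highlighter-rouge">StartImage</code>).</p>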
  </li>
</ol>
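<p>Steps 1 and 3 both validate a physical object: the Fletcher-64 checksum in <code class="language-plaintext highlighter-rouge">o_cksum</code> and a 32-bit magic at offset <code class="language-plaintext highlighter-rouge">0x20</code>. The sketch below shows the widely used formulation of the APFS Fletcher-64 checksum (32-bit little-endian words, modulus <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code>, computed over everything after the 8-byte checksum field). The helper names are hypothetical, and the magic constants are the multi-character literals from the text written as integers.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NX_MAGIC               0x4253584Eu  /* 'BSXN', bytes "NXSB" on disk */
#define NX_EFI_JUMPSTART_MAGIC 0x5244534Au  /* 'RDSJ', bytes "JSDR" on disk */

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t le64(const uint8_t *p) {
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* Fletcher-64 over every 32-bit word after the 8-byte o_cksum field. */
static uint64_t apfs_fletcher64(const uint8_t *obj, size_t size) {
    const uint64_t mod = 0xFFFFFFFFu;
    uint64_t sum1 = 0, sum2 = 0;
    for (size_t off = 8; off + 4 <= size; off += 4) {
        sum1 = (sum1 + le32(obj + off)) % mod;
        sum2 = (sum2 + sum1) % mod;
    }
    uint64_t c1 = mod - ((sum1 + sum2) % mod);
    uint64_t c2 = mod - ((sum1 + c1) % mod);
    return c2 << 32 | c1;
}

/* The stored checksum matches and the magic at offset 0x20 is as expected. */
static bool object_is_valid(const uint8_t *obj, size_t size, uint32_t magic) {
    return size >= 0x24 &&
           apfs_fletcher64(obj, size) == le64(obj) &&
           le32(obj + 0x20) == magic;
}
```

<p>For step 1, <code class="language-plaintext highlighter-rouge">object_is_valid(block0, block_size, NX_MAGIC)</code> would be applied to block zero; for step 3, the same call with <code class="language-plaintext highlighter-rouge">NX_EFI_JUMPSTART_MAGIC</code> is applied to the jumpstart block, followed by the version check.</p>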

<h2 id="nx_efi_jumpstart_t">nx_efi_jumpstart_t</h2>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define NX_EFI_JUMPSTART_MAGIC 'RDSJ'
#define NX_EFI_JUMPSTART_VERSION 1
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="nc">nx_efi_jumpstart</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">nej_o</span><span class="p">;</span>             <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">nej_magic</span><span class="p">;</span>           <span class="c1">// 0x20</span>
    <span class="kt">uint32_t</span> <span class="n">nej_version</span><span class="p">;</span>         <span class="c1">// 0x24</span>
    <span class="kt">uint32_t</span> <span class="n">nej_efi_file_len</span><span class="p">;</span>    <span class="c1">// 0x28</span>
    <span class="kt">uint32_t</span> <span class="n">nej_num_extents</span><span class="p">;</span>     <span class="c1">// 0x2C</span>
    <span class="kt">uint64_t</span> <span class="n">nej_reserved</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>    <span class="c1">// 0x30</span>
    <span class="n">prange_t</span> <span class="n">nej_rec_extents</span><span class="p">[];</span>   <span class="c1">// 0xB0</span>
<span class="p">}</span> <span class="n">nx_efi_jumpstart_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">nej_o</code>: The object header (type <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_EFI_JUMPSTART</code>, physical)</li>
  <li><code class="language-plaintext highlighter-rouge">nej_magic</code>: Must equal <code class="language-plaintext highlighter-rouge">NX_EFI_JUMPSTART_MAGIC</code> (<code class="language-plaintext highlighter-rouge">'RDSJ'</code>, on-disk bytes <code class="language-plaintext highlighter-rouge">4A 53 44 52</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">nej_version</code>: Must equal 1</li>
  <li><code class="language-plaintext highlighter-rouge">nej_efi_file_len</code>: The total size of the embedded EFI driver in bytes</li>
  <li><code class="language-plaintext highlighter-rouge">nej_num_extents</code>: The number of physical extent records that follow</li>
  <li><code class="language-plaintext highlighter-rouge">nej_reserved</code>: Reserved (128 bytes, set to zero)</li>
  <li><code class="language-plaintext highlighter-rouge">nej_rec_extents</code>: A variable-length array of <code class="language-plaintext highlighter-rouge">prange_t</code> records describing where the EFI driver blocks are stored on disk</li>
</ul>
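<p>When mirroring these structures in C, compile-time checks are a cheap way to confirm that the layout matches the offsets listed above. The sketch below assumes the usual <code class="language-plaintext highlighter-rouge">obj_phys_t</code> layout (8-byte checksum, object ID, transaction ID, type, subtype) and natural alignment on a typical 64-bit compiler. Casting raw block bytes to this struct is only safe on a little-endian host.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t oid_t;
typedef uint64_t xid_t;
typedef int64_t  paddr_t;

typedef struct obj_phys {
    uint8_t  o_cksum[8];   /* Fletcher-64 checksum */
    oid_t    o_oid;
    xid_t    o_xid;
    uint32_t o_type;
    uint32_t o_subtype;
} obj_phys_t;

typedef struct prange {
    paddr_t  pr_start_paddr;
    uint64_t pr_block_count;
} prange_t;

typedef struct nx_efi_jumpstart {
    obj_phys_t nej_o;
    uint32_t   nej_magic;
    uint32_t   nej_version;
    uint32_t   nej_efi_file_len;
    uint32_t   nej_num_extents;
    uint64_t   nej_reserved[16];
    prange_t   nej_rec_extents[];   /* flexible array member */
} nx_efi_jumpstart_t;

/* Compile-time checks that the C layout matches the documented offsets. */
static_assert(sizeof(obj_phys_t) == 0x20, "obj_phys_t size");
static_assert(offsetof(nx_efi_jumpstart_t, nej_magic) == 0x20, "nej_magic");
static_assert(offsetof(nx_efi_jumpstart_t, nej_num_extents) == 0x2C, "nej_num_extents");
static_assert(offsetof(nx_efi_jumpstart_t, nej_rec_extents) == 0xB0, "nej_rec_extents");
static_assert(sizeof(prange_t) == 0x10, "prange_t size");
```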

<p>Each <code class="language-plaintext highlighter-rouge">prange_t</code> in the extent array specifies a starting physical address and a block count:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">prange</span> <span class="p">{</span>
    <span class="n">paddr_t</span> <span class="n">pr_start_paddr</span><span class="p">;</span> <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">pr_block_count</span><span class="p">;</span> <span class="c1">// 0x08</span>
<span class="p">}</span> <span class="n">prange_t</span><span class="p">;</span>                  <span class="c1">// 0x10</span>
</code></pre></div></div>

<p>The extents must be read sequentially and concatenated to assemble the complete driver image.</p>
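<p>A minimal sketch of the assembly step, assuming the container image is already in memory as an array of <code class="language-plaintext highlighter-rouge">image_blocks</code> blocks. The function name and parameters are hypothetical. The buffer is sized to whole blocks, so only its first <code class="language-plaintext highlighter-rouge">nej_efi_file_len</code> bytes are the driver.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int64_t pr_start_paddr; uint64_t pr_block_count; } prange_t;

/* Concatenate the extents in order. Returns a malloc'd buffer (caller frees)
 * or NULL if an extent is out of range or the extents are too short. */
static uint8_t *assemble_efi_driver(const uint8_t *image, uint64_t image_blocks,
                                    uint32_t block_size, const prange_t *ext,
                                    uint32_t num_extents, uint32_t file_len) {
    uint64_t total = 0;
    for (uint32_t i = 0; i < num_extents; i++)
        total += ext[i].pr_block_count * block_size;
    if (total < file_len)
        return NULL;                              /* cannot hold the driver */
    uint8_t *buf = malloc(total ? total : 1);
    if (!buf)
        return NULL;
    uint64_t off = 0;
    for (uint32_t i = 0; i < num_extents; i++) {
        uint64_t start = (uint64_t)ext[i].pr_start_paddr;
        uint64_t n = ext[i].pr_block_count;
        if (ext[i].pr_start_paddr < 0 || start > image_blocks || n > image_blocks - start) {
            free(buf);
            return NULL;                          /* extent outside the image */
        }
        memcpy(buf + off, image + start * block_size, n * block_size);
        off += n * block_size;
    }
    return buf;   /* bytes [0, file_len) are the driver image */
}
```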

<h2 id="gpt-partition-type">GPT Partition Type</h2>

<p>APFS partitions are identified in the GUID Partition Table by the following type UUID:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7C3457EF-0000-11AA-AA11-00306543ECAC
</code></pre></div></div>

<p>UEFI firmware uses this UUID to identify partitions that may contain an APFS container with an embedded EFI driver.</p>
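<p>When scanning raw GPT partition entries, note that GPT stores the first three GUID fields little-endian, so the APFS type GUID appears on disk as <code class="language-plaintext highlighter-rouge">EF 57 34 7C 00 00 AA 11 AA 11 00 30 65 43 EC AC</code>. The sketch below (helper names hypothetical) formats the 16 on-disk bytes into the canonical string and compares it against the APFS type.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an on-disk GPT GUID: the first three fields are little-endian,
 * and the last eight bytes are stored in order. */
static void gpt_guid_to_string(const uint8_t g[16], char out[37]) {
    snprintf(out, 37,
             "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             g[3], g[2], g[1], g[0], g[5], g[4], g[7], g[6],
             g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
}

static int is_apfs_partition_type(const uint8_t g[16]) {
    char s[37];
    gpt_guid_to_string(g, s);
    return strcmp(s, "7C3457EF-0000-11AA-AA11-00306543ECAC") == 0;
}
```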

<h2 id="forensic-considerations">Forensic Considerations</h2>

<p>The EFI jumpstart structure is useful for forensic validation:</p>

<ul>
  <li>A valid jumpstart structure is strong evidence that the partition was formatted as APFS and that the block holding the structure has not since been overwritten.</li>
  <li>The driver extents reference physical blocks that should be within the container’s bounds. Out-of-range addresses indicate corruption.</li>
  <li>The jumpstart structure is independent of checkpoints. Since it is only written during container creation, it provides a stable reference point that survives checkpoint-level corruption.</li>
</ul>
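<p>The bounds check can be sketched as below, using the container’s <code class="language-plaintext highlighter-rouge">nx_block_count</code> as the upper limit. The function name is hypothetical, and the final size comparison is an extra plausibility test added for illustration: the extents should hold at least <code class="language-plaintext highlighter-rouge">nej_efi_file_len</code> bytes.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int64_t pr_start_paddr; uint64_t pr_block_count; } prange_t;

/* True if every extent lies within [0, nx_block_count) and the extents
 * together hold at least file_len bytes. */
static bool jumpstart_extents_plausible(const prange_t *ext, uint32_t n,
                                        uint64_t nx_block_count,
                                        uint32_t block_size, uint32_t file_len) {
    uint64_t blocks = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (ext[i].pr_start_paddr < 0)
            return false;                           /* negative address */
        uint64_t start = (uint64_t)ext[i].pr_start_paddr;
        if (start >= nx_block_count ||
            ext[i].pr_block_count > nx_block_count - start)
            return false;                           /* runs past the container */
        blocks += ext[i].pr_block_count;
    }
    return blocks * block_size >= file_len;
}
```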

<h2 id="conclusion">Conclusion</h2>

<p>The EFI jumpstart mechanism provides a minimal, self-contained boot path for APFS containers. Its simplicity (a single physical object with direct extent references) ensures that UEFI firmware can load the APFS driver without implementing any of the complex B-Tree or checkpoint logic that the rest of APFS requires.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="efi" /><category term="boot" /><summary type="html"><![CDATA[APFS containers include an embedded EFI driver that allows UEFI firmware to boot from APFS partitions without requiring a built-in APFS driver. This post covers the nx_efi_jumpstart_t structure and the boot procedure that uses it.]]></summary></entry><entry><title type="html">The Reaper</title><link href="https://jtsylve.blog/post/2026/06/03/APFS-Reaper" rel="alternate" type="text/html" title="The Reaper" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/03/APFS%20Reaper</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/03/APFS-Reaper"><![CDATA[<p>In our <a href="/post/2022/12/05/APFS-Containers">post on Containers</a>, we introduced the Reaper as the subsystem responsible for garbage collection in APFS. The Reaper handles deletions that are too large to complete within a single transaction, such as deleting an entire volume or cleaning up after a snapshot deletion. In this post, we will examine the Reaper’s on-disk structures and its multi-phase state machine.</p>

<h2 id="overview">Overview</h2>

<p>Each APFS container has exactly one Reaper, stored as an ephemeral object in the checkpoint data area. Its object identifier is recorded in the <code class="language-plaintext highlighter-rouge">nx_reaper_oid</code> field of the <a href="/post/2022/12/06/APFS-NX-Superblock">NX Superblock</a>. The Reaper runs in a dedicated kernel thread with throttled I/O priority, processing entries from a linked list of <em>reap list blocks</em>. When a handler cannot complete its work within a single transaction, it saves its progress in a state buffer and resumes in a new transaction.</p>

<h2 id="nx_reaper_phys_t">nx_reaper_phys_t</h2>

<p>The top-level Reaper structure.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">nx_reaper_phys</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">nr_o</span><span class="p">;</span>              <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">nr_next_reap_id</span><span class="p">;</span>    <span class="c1">// 0x20</span>
    <span class="kt">uint64_t</span> <span class="n">nr_completed_id</span><span class="p">;</span>    <span class="c1">// 0x28</span>
    <span class="n">oid_t</span> <span class="n">nr_head</span><span class="p">;</span>               <span class="c1">// 0x30</span>
    <span class="n">oid_t</span> <span class="n">nr_tail</span><span class="p">;</span>               <span class="c1">// 0x38</span>
    <span class="kt">uint32_t</span> <span class="n">nr_flags</span><span class="p">;</span>           <span class="c1">// 0x40</span>
    <span class="kt">uint32_t</span> <span class="n">nr_rlcount</span><span class="p">;</span>         <span class="c1">// 0x44</span>
    <span class="kt">uint32_t</span> <span class="n">nr_type</span><span class="p">;</span>            <span class="c1">// 0x48</span>
    <span class="kt">uint32_t</span> <span class="n">nr_size</span><span class="p">;</span>            <span class="c1">// 0x4C</span>
    <span class="n">oid_t</span> <span class="n">nr_fs_oid</span><span class="p">;</span>             <span class="c1">// 0x50</span>
    <span class="n">oid_t</span> <span class="n">nr_oid</span><span class="p">;</span>                <span class="c1">// 0x58</span>
    <span class="n">xid_t</span> <span class="n">nr_xid</span><span class="p">;</span>                <span class="c1">// 0x60</span>
    <span class="kt">uint32_t</span> <span class="n">nr_nrle_flags</span><span class="p">;</span>      <span class="c1">// 0x68</span>
    <span class="kt">uint32_t</span> <span class="n">nr_state_buffer_size</span><span class="p">;</span> <span class="c1">// 0x6C</span>
    <span class="kt">uint8_t</span> <span class="n">nr_state_buffer</span><span class="p">[];</span>   <span class="c1">// 0x70</span>
<span class="p">}</span> <span class="n">nx_reaper_phys_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">nr_o</code>: The object header (type <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_NX_REAPER</code>, ephemeral)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_next_reap_id</code>: The identifier to assign to the next reap operation (initialized to 1)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_completed_id</code>: The identifier of the most recently completed reap operation</li>
  <li><code class="language-plaintext highlighter-rouge">nr_head</code>: Object identifier of the first reap list block (zero if empty)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_tail</code>: Object identifier of the last reap list block (zero if empty)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_flags</code>: Reaper state flags (see below)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_rlcount</code>: Number of reap list blocks in the chain</li>
  <li><code class="language-plaintext highlighter-rouge">nr_type</code>: The object type currently being reaped</li>
  <li><code class="language-plaintext highlighter-rouge">nr_size</code>: Size parameter for the current reap operation</li>
  <li><code class="language-plaintext highlighter-rouge">nr_fs_oid</code>: The volume object identifier associated with the current reap</li>
  <li><code class="language-plaintext highlighter-rouge">nr_oid</code>: Object identifier of the object being reaped (zero when idle)</li>
  <li><code class="language-plaintext highlighter-rouge">nr_xid</code>: Transaction identifier for the current operation</li>
  <li><code class="language-plaintext highlighter-rouge">nr_nrle_flags</code>: Flags from the reap list entry being processed</li>
  <li><code class="language-plaintext highlighter-rouge">nr_state_buffer_size</code>: Size of the state buffer in bytes</li>
  <li><code class="language-plaintext highlighter-rouge">nr_state_buffer</code>: Variable-length buffer for handler progress state</li>
</ul>

<p>The state buffer allows reap handlers to save their position across transaction boundaries. For a 4096-byte block, this buffer is 3984 bytes.</p>
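<p>The 3984-byte figure is simply the block size minus the 0x70-byte fixed header. The sketch below assumes the Reaper occupies a single block, as that example implies, and checks the header layout at compile time. The helper name is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t oid_t;
typedef uint64_t xid_t;

typedef struct {
    uint8_t o_cksum[8];
    oid_t o_oid;
    xid_t o_xid;
    uint32_t o_type, o_subtype;
} obj_phys_t;

typedef struct nx_reaper_phys {
    obj_phys_t nr_o;
    uint64_t nr_next_reap_id, nr_completed_id;
    oid_t nr_head, nr_tail;
    uint32_t nr_flags, nr_rlcount, nr_type, nr_size;
    oid_t nr_fs_oid, nr_oid;
    xid_t nr_xid;
    uint32_t nr_nrle_flags, nr_state_buffer_size;
    uint8_t nr_state_buffer[];
} nx_reaper_phys_t;

static_assert(offsetof(nx_reaper_phys_t, nr_fs_oid) == 0x50, "nr_fs_oid");
static_assert(offsetof(nx_reaper_phys_t, nr_state_buffer) == 0x70, "nr_state_buffer");

/* Bytes left for handler state once the fixed header is accounted for. */
static uint32_t reaper_state_buffer_capacity(uint32_t block_size) {
    return block_size - (uint32_t)offsetof(nx_reaper_phys_t, nr_state_buffer);
}
```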

<h4 id="reaper-flags">Reaper Flags</h4>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Name</th>
      <th>Value</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>NR_BHM_FLAG</td>
      <td>0x00000001</td>
      <td>Must always be set (initialization flag)</td>
    </tr>
    <tr>
      <td>NR_CONTINUE</td>
      <td>0x00000002</td>
      <td>An object is partially reaped and requires continued processing</td>
    </tr>
  </tbody>
</table>

<h2 id="reap-lists">Reap Lists</h2>

<p>Reap list blocks form a singly linked list from <code class="language-plaintext highlighter-rouge">nr_head</code> to <code class="language-plaintext highlighter-rouge">nr_tail</code>. Each block contains an array of entries describing objects to be reaped.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">nx_reap_list_phys</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">nrl_o</span><span class="p">;</span>                <span class="c1">// 0x00</span>
    <span class="n">oid_t</span> <span class="n">nrl_next</span><span class="p">;</span>                  <span class="c1">// 0x20</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_flags</span><span class="p">;</span>              <span class="c1">// 0x28</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_max</span><span class="p">;</span>                <span class="c1">// 0x2C</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_count</span><span class="p">;</span>              <span class="c1">// 0x30</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_first</span><span class="p">;</span>             <span class="c1">// 0x34</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_last</span><span class="p">;</span>              <span class="c1">// 0x38</span>
    <span class="kt">uint32_t</span> <span class="n">nrl_free</span><span class="p">;</span>              <span class="c1">// 0x3C</span>
    <span class="n">nx_reap_list_entry_t</span> <span class="n">nrl_entries</span><span class="p">[];</span> <span class="c1">// 0x40</span>
<span class="p">}</span> <span class="n">nx_reap_list_phys_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">nrl_o</code>: The object header (type <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_NX_REAP_LIST</code>, ephemeral)</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_next</code>: Object identifier of the next reap list block in the chain, or zero</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_flags</code>: Reserved</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_max</code>: Maximum number of entries (calculated as <code class="language-plaintext highlighter-rouge">(block_size - 64) / 40</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_count</code>: Number of active entries</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_first</code>: Index of the first active entry, or <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code> if empty</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_last</code>: Index of the last active entry, or <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code> if empty</li>
  <li><code class="language-plaintext highlighter-rouge">nrl_free</code>: Index of the first free entry slot, or <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code> if full</li>
</ul>

<p>Within each block, two linked lists are threaded through the entry array using index chains: an active list of entries awaiting processing and a free list of available slots.</p>
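<p>Following one of these index chains is a short loop. The sketch below computes <code class="language-plaintext highlighter-rouge">nrl_max</code> from the 64-byte header and 40-byte entries, then measures a chain starting at <code class="language-plaintext highlighter-rouge">nrl_first</code> (active list) or <code class="language-plaintext highlighter-rouge">nrl_free</code> (free list). The names <code class="language-plaintext highlighter-rouge">INVALID_INDEX</code>, <code class="language-plaintext highlighter-rouge">reap_list_max</code>, and <code class="language-plaintext highlighter-rouge">chain_length</code> are hypothetical. The hop limit is a defensive assumption so that a corrupt, cyclic chain cannot loop forever.</p>

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_INDEX 0xFFFFFFFFu   /* the "no entry" sentinel */

typedef struct nx_reap_list_entry {
    uint32_t nrle_next;
    uint32_t nrle_flags;
    uint32_t nrle_type;
    uint32_t nrle_size;
    uint64_t nrle_fs_oid;   /* oid_t */
    uint64_t nrle_oid;      /* oid_t */
    uint64_t nrle_xid;      /* xid_t */
} nx_reap_list_entry_t;

static_assert(sizeof(nx_reap_list_entry_t) == 40, "entry size");

/* nrl_max: a 64-byte block header followed by 40-byte entries. */
static uint32_t reap_list_max(uint32_t block_size) {
    return (block_size - 64) / (uint32_t)sizeof(nx_reap_list_entry_t);
}

/* Length of one index chain; -1 for an out-of-range index or a cycle. */
static int64_t chain_length(const nx_reap_list_entry_t *entries, uint32_t max,
                            uint32_t head) {
    int64_t len = 0;
    for (uint32_t i = head; i != INVALID_INDEX; i = entries[i].nrle_next) {
        if (i >= max || len >= (int64_t)max)
            return -1;   /* bad index, or more hops than slots */
        len++;
    }
    return len;
}
```

<p>In a consistent block, the chain starting at <code class="language-plaintext highlighter-rouge">nrl_first</code> should contain exactly <code class="language-plaintext highlighter-rouge">nrl_count</code> entries.</p>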

<h3 id="nx_reap_list_entry_t">nx_reap_list_entry_t</h3>

<p>Each entry describes a single object to be reaped.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">nx_reap_list_entry</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">nrle_next</span><span class="p">;</span>   <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">nrle_flags</span><span class="p">;</span>  <span class="c1">// 0x04</span>
    <span class="kt">uint32_t</span> <span class="n">nrle_type</span><span class="p">;</span>   <span class="c1">// 0x08</span>
    <span class="kt">uint32_t</span> <span class="n">nrle_size</span><span class="p">;</span>   <span class="c1">// 0x0C</span>
    <span class="n">oid_t</span> <span class="n">nrle_fs_oid</span><span class="p">;</span>    <span class="c1">// 0x10</span>
    <span class="n">oid_t</span> <span class="n">nrle_oid</span><span class="p">;</span>       <span class="c1">// 0x18</span>
    <span class="n">xid_t</span> <span class="n">nrle_xid</span><span class="p">;</span>       <span class="c1">// 0x20</span>
<span class="p">}</span> <span class="n">nx_reap_list_entry_t</span><span class="p">;</span>   <span class="c1">// 0x28</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">nrle_next</code>: Index of the next entry in the chain, or <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code></li>
  <li><code class="language-plaintext highlighter-rouge">nrle_flags</code>: Entry flags (see below)</li>
  <li><code class="language-plaintext highlighter-rouge">nrle_type</code>: The object type to reap</li>
  <li><code class="language-plaintext highlighter-rouge">nrle_size</code>: Size parameter for the handler</li>
  <li><code class="language-plaintext highlighter-rouge">nrle_fs_oid</code>: Volume object identifier (zero for container-level objects)</li>
  <li><code class="language-plaintext highlighter-rouge">nrle_oid</code>: Object identifier of the object to reap</li>
  <li><code class="language-plaintext highlighter-rouge">nrle_xid</code>: Transaction or reap identifier</li>
</ul>

<h4 id="reap-list-entry-flags">Reap List Entry Flags</h4>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Name</th>
      <th>Value</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>NRLE_VALID</td>
      <td>0x00000001</td>
      <td>The entry contains valid data</td>
    </tr>
    <tr>
      <td>NRLE_REAP_ID_RECORD</td>
      <td>0x00000002</td>
      <td>Triggers a completion notification (updates <code class="language-plaintext highlighter-rouge">nr_completed_id</code>)</td>
    </tr>
    <tr>
      <td>NRLE_CALL</td>
      <td>0x00000004</td>
      <td>Triggers the reap handler for the specified object</td>
    </tr>
    <tr>
      <td>NRLE_COMPLETION</td>
      <td>0x00000008</td>
      <td>Marks the entry as a post-reap completion callback</td>
    </tr>
    <tr>
      <td>NRLE_CLEANUP</td>
      <td>0x00000010</td>
      <td>Triggers cleanup operations after reaping</td>
    </tr>
  </tbody>
</table>

<p>When an object is added to the Reaper, two entries are typically appended: a <em>call entry</em> (<code class="language-plaintext highlighter-rouge">NRLE_VALID | NRLE_CALL</code>) that triggers the type-specific handler, and a <em>completion entry</em> (<code class="language-plaintext highlighter-rouge">NRLE_VALID | NRLE_REAP_ID_RECORD</code>) that updates <code class="language-plaintext highlighter-rouge">nr_completed_id</code> when processed. Sub-object entries (such as a volume’s object map during volume deletion) are inserted at the head so they are processed before their parent.</p>
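<p>The queueing pattern can be sketched on a single in-memory reap list block. This is a simplified model, not the on-disk layout: <code class="language-plaintext highlighter-rouge">entry_t</code>, <code class="language-plaintext highlighter-rouge">reap_list_t</code>, and the helper functions are hypothetical, and the block holds only eight entries. Slots come from the free list; a normal object gets a call entry and a completion entry (carrying its reap ID in the <code class="language-plaintext highlighter-rouge">xid</code> field) at the tail, while a sub-object is pushed at the head.</p>

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_INDEX 0xFFFFFFFFu
#define NRLE_VALID          0x1u
#define NRLE_REAP_ID_RECORD 0x2u
#define NRLE_CALL           0x4u
#define MAX_ENTRIES 8

/* Hypothetical simplified entry and block. */
typedef struct { uint32_t next, flags, type; uint64_t oid, xid; } entry_t;
typedef struct {
    uint32_t first, last, free, count;
    entry_t e[MAX_ENTRIES];
} reap_list_t;

static void reap_list_init(reap_list_t *l) {
    l->first = l->last = INVALID_INDEX;
    l->count = 0;
    l->free = 0;                                  /* free list: 0 -> 1 -> ... */
    for (uint32_t i = 0; i < MAX_ENTRIES; i++)
        l->e[i].next = (i + 1 < MAX_ENTRIES) ? i + 1 : INVALID_INDEX;
}

/* Pop a slot off the free list and fill it in. */
static uint32_t take_slot(reap_list_t *l, uint32_t flags, uint32_t type,
                          uint64_t oid, uint64_t xid) {
    uint32_t i = l->free;
    assert(i != INVALID_INDEX);                   /* block is full */
    l->free = l->e[i].next;
    l->e[i] = (entry_t){ INVALID_INDEX, flags, type, oid, xid };
    l->count++;
    return i;
}

static void push_tail(reap_list_t *l, uint32_t i) {
    if (l->last == INVALID_INDEX) l->first = i;
    else l->e[l->last].next = i;
    l->last = i;
}

static void push_head(reap_list_t *l, uint32_t i) {
    l->e[i].next = l->first;
    l->first = i;
    if (l->last == INVALID_INDEX) l->last = i;
}

/* Queue an object: a call entry, then a completion entry with its reap ID. */
static void reap_object(reap_list_t *l, uint32_t type, uint64_t oid,
                        uint64_t xid, uint64_t reap_id) {
    push_tail(l, take_slot(l, NRLE_VALID | NRLE_CALL, type, oid, xid));
    push_tail(l, take_slot(l, NRLE_VALID | NRLE_REAP_ID_RECORD, type, oid, reap_id));
}
```

<p>Pushing a sub-object with <code class="language-plaintext highlighter-rouge">push_head</code> after its parent has been queued places it in front of the parent’s call entry, so it is processed first.</p>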

<h2 id="volume-deletion-phases">Volume Deletion Phases</h2>

<p>The most complex reap operation is deleting an entire volume. This proceeds through a sequence of phases (beginning at <code class="language-plaintext highlighter-rouge">APFS_REAP_PHASE_START</code> = 0, which transitions immediately to the snapshot phase), tracked in an <code class="language-plaintext highlighter-rouge">apfs_reap_state_t</code> stored in <code class="language-plaintext highlighter-rouge">nr_state_buffer</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">apfs_reap_state</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">last_pbn</span><span class="p">;</span>    <span class="c1">// 0x00</span>
    <span class="n">xid_t</span> <span class="n">cur_snap_xid</span><span class="p">;</span>   <span class="c1">// 0x08</span>
    <span class="kt">uint32_t</span> <span class="n">phase</span><span class="p">;</span>       <span class="c1">// 0x10</span>
<span class="p">}</span> <span class="n">apfs_reap_state_t</span><span class="p">;</span>     <span class="c1">// 0x14</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">last_pbn</code>: Physical block number where extent reaping last paused</li>
  <li><code class="language-plaintext highlighter-rouge">cur_snap_xid</code>: Transaction identifier of the snapshot currently being reaped</li>
  <li><code class="language-plaintext highlighter-rouge">phase</code>: Current deletion phase (0-4)</li>
</ul>
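<p>Because the state is a packed 20-byte (0x14) record, a forensic parser is better off reading it field by field from <code class="language-plaintext highlighter-rouge">nr_state_buffer</code> than casting the buffer to a struct. The sketch below decodes the three little-endian fields and rejects phases outside 0 through 4. The helper names are hypothetical, and the phase constants follow the values given in this post.</p>

```c
#include <assert.h>
#include <stdint.h>

enum {
    APFS_REAP_PHASE_START = 0,
    APFS_REAP_PHASE_SNAPSHOTS = 1,
    APFS_REAP_PHASE_ACTIVE_FS = 2,
    APFS_REAP_PHASE_DESTROY_OMAP = 3,
    APFS_REAP_PHASE_DONE = 4,
};

typedef struct { uint64_t last_pbn, cur_snap_xid; uint32_t phase; } reap_state_t;

/* Read an n-byte little-endian integer. */
static uint64_t rd_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = v << 8 | p[i];
    return v;
}

/* Returns 0 on success, -1 if the buffer is short or the phase is unknown. */
static int parse_reap_state(const uint8_t *buf, uint32_t size, reap_state_t *out) {
    if (size < 0x14)
        return -1;
    out->last_pbn     = rd_le(buf + 0x00, 8);
    out->cur_snap_xid = rd_le(buf + 0x08, 8);
    out->phase        = (uint32_t)rd_le(buf + 0x10, 4);
    return out->phase <= APFS_REAP_PHASE_DONE ? 0 : -1;
}
```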

<h3 id="phase-1-apfs_reap_phase_snapshots">Phase 1: APFS_REAP_PHASE_SNAPSHOTS</h3>

<p>All snapshots belonging to the volume are reaped. The Reaper iterates through each snapshot’s extent reference tree, freeing physical extents. Progress is tracked by <code class="language-plaintext highlighter-rouge">cur_snap_xid</code>. Each snapshot’s extentref tree is then deleted, exactly as in normal <a href="/post/2022/12/28/APFS-Snapshot-Metadata">snapshot deletion</a>.</p>

<h3 id="phase-2-apfs_reap_phase_active_fs">Phase 2: APFS_REAP_PHASE_ACTIVE_FS</h3>

<p>After all snapshots are gone, the active file system’s extents are freed. The Reaper walks the volume’s extent reference tree and frees all data extents owned by the volume. Progress is tracked by <code class="language-plaintext highlighter-rouge">last_pbn</code>. Supplemental trees are also destroyed: the sealed volume’s file extent tree (<code class="language-plaintext highlighter-rouge">apfs_fext_tree_oid</code>, present when <code class="language-plaintext highlighter-rouge">APFS_INCOMPAT_SEALED_VOLUME</code> is set) and the per-file key upgrade/rotation tree (<code class="language-plaintext highlighter-rouge">apfs_pfkur_tree_oid</code>, present when <code class="language-plaintext highlighter-rouge">APFS_INCOMPAT_PFK_UPGRADE_ROTATION</code> is set).</p>

<h3 id="phase-3-apfs_reap_phase_destroy_omap">Phase 3: APFS_REAP_PHASE_DESTROY_OMAP</h3>

<p>The volume’s <a href="/post/2022/12/12/APFS-OMAP">Object Map</a> is destroyed. This is added to the Reaper as a sub-object, using its own state tracking (<code class="language-plaintext highlighter-rouge">omap_reap_state_t</code>). After the OMAP is fully reaped, crypto state, key caches, and the volume’s superblock metadata are cleared.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">omap_reap_state</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">omr_phase</span><span class="p">;</span>  <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">omr_pad</span><span class="p">;</span>    <span class="c1">// 0x04</span>
    <span class="n">omap_key_t</span> <span class="n">omr_ok</span><span class="p">;</span>   <span class="c1">// 0x08</span>
<span class="p">}</span> <span class="n">omap_reap_state_t</span><span class="p">;</span>     <span class="c1">// 0x18</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">omr_phase</code>: Current phase (<code class="language-plaintext highlighter-rouge">OMAP_REAP_PHASE_MAP_TREE</code> = 1, <code class="language-plaintext highlighter-rouge">OMAP_REAP_PHASE_SNAPSHOT_TREE</code> = 2)</li>
  <li><code class="language-plaintext highlighter-rouge">omr_ok</code>: The last freed key, used to resume iteration after a transaction boundary</li>
</ul>
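<p>Resuming from <code class="language-plaintext highlighter-rouge">omr_ok</code> amounts to finding the first key that sorts after it. Object map keys are ordered by object identifier and then by transaction identifier. The sketch below illustrates this with a binary search over a sorted in-memory array of keys (a simplification of a B-Tree lookup); the function names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t ok_oid, ok_xid; } omap_key_t;

/* Object map keys sort by oid, then by xid. */
static int omap_key_cmp(omap_key_t a, omap_key_t b) {
    if (a.ok_oid != b.ok_oid) return a.ok_oid < b.ok_oid ? -1 : 1;
    if (a.ok_xid != b.ok_xid) return a.ok_xid < b.ok_xid ? -1 : 1;
    return 0;
}

/* Index of the first key strictly after omr_ok, i.e. where freeing resumes. */
static size_t omap_resume_index(const omap_key_t *keys, size_t n, omap_key_t omr_ok) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (omap_key_cmp(keys[mid], omr_ok) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```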

<h3 id="phase-4-apfs_reap_phase_done">Phase 4: APFS_REAP_PHASE_DONE</h3>

<p>All volume structures have been freed. The reap operation is complete.</p>
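<p>The phase sequence behaves like a small state machine driven one transaction at a time. In the sketch below, the work counters in <code class="language-plaintext highlighter-rouge">volume_reap_t</code> are hypothetical stand-ins for the real progress markers (<code class="language-plaintext highlighter-rouge">cur_snap_xid</code>, <code class="language-plaintext highlighter-rouge">last_pbn</code>, <code class="language-plaintext highlighter-rouge">omr_ok</code>), and <code class="language-plaintext highlighter-rouge">budget</code> models how much work fits in one transaction.</p>

```c
#include <assert.h>
#include <stdint.h>

enum {
    APFS_REAP_PHASE_START = 0,
    APFS_REAP_PHASE_SNAPSHOTS = 1,
    APFS_REAP_PHASE_ACTIVE_FS = 2,
    APFS_REAP_PHASE_DESTROY_OMAP = 3,
    APFS_REAP_PHASE_DONE = 4,
};

/* Hypothetical work counters standing in for the saved progress markers. */
typedef struct {
    uint32_t phase;
    uint32_t snaps_left;
    uint32_t extents_left;
    uint32_t omap_left;
} volume_reap_t;

/* Do at most `budget` units of work in one transaction. Returns 1 when the
 * volume is fully reaped, 0 if work remains, -1 for an unknown phase. */
static int reap_volume_step(volume_reap_t *v, uint32_t budget) {
    while (budget > 0) {
        switch (v->phase) {
        case APFS_REAP_PHASE_START:          /* no work; move straight on */
            v->phase = APFS_REAP_PHASE_SNAPSHOTS;
            break;
        case APFS_REAP_PHASE_SNAPSHOTS:      /* one snapshot per unit */
            if (v->snaps_left == 0) { v->phase = APFS_REAP_PHASE_ACTIVE_FS; break; }
            v->snaps_left--; budget--;
            break;
        case APFS_REAP_PHASE_ACTIVE_FS:      /* one extent per unit */
            if (v->extents_left == 0) { v->phase = APFS_REAP_PHASE_DESTROY_OMAP; break; }
            v->extents_left--; budget--;
            break;
        case APFS_REAP_PHASE_DESTROY_OMAP:   /* one omap entry per unit */
            if (v->omap_left == 0) { v->phase = APFS_REAP_PHASE_DONE; break; }
            v->omap_left--; budget--;
            break;
        case APFS_REAP_PHASE_DONE:
            return 1;
        default:
            return -1;
        }
    }
    return v->phase == APFS_REAP_PHASE_DONE;
}
```

<p>Because all progress lives in the state struct rather than in local variables, each call can start in a fresh transaction and pick up exactly where the previous one stopped.</p>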

<h2 id="crash-recovery">Crash Recovery</h2>

<p>The Reaper’s design guarantees crash-safe resumption. If the system crashes mid-reap:</p>

<ol>
  <li>On the next mount, the Reaper’s ephemeral object is restored from the latest valid checkpoint. Because <code class="language-plaintext highlighter-rouge">nr_oid</code> is nonzero and <code class="language-plaintext highlighter-rouge">NR_CONTINUE</code> is set in <code class="language-plaintext highlighter-rouge">nr_flags</code>, the Reaper knows it must resume an interrupted operation.</li>
  <li>The Reaper thread starts and enters a transaction.</li>
  <li>Since <code class="language-plaintext highlighter-rouge">nr_oid</code> is already set, it skips entry dequeue and goes directly to handler dispatch.</li>
  <li>The handler reads its saved state from <code class="language-plaintext highlighter-rouge">nr_state_buffer</code> and resumes where it left off.</li>
</ol>

<p>This ensures that even multi-transaction deletions spanning many checkpoints will always complete, regardless of how many crashes occur during the process.</p>
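<p>The resume decision at the start of the Reaper thread can be reduced to a small check. In this sketch, <code class="language-plaintext highlighter-rouge">reaper_t</code> is a hypothetical subset of <code class="language-plaintext highlighter-rouge">nx_reaper_phys_t</code>, and the action enum is illustrative. It treats a nonzero <code class="language-plaintext highlighter-rouge">nr_oid</code> together with <code class="language-plaintext highlighter-rouge">NR_CONTINUE</code> as the signal to skip dequeueing and redispatch the saved handler.</p>

```c
#include <assert.h>
#include <stdint.h>

#define NR_BHM_FLAG 0x1u
#define NR_CONTINUE 0x2u

/* Hypothetical subset of nx_reaper_phys_t. */
typedef struct { uint32_t nr_flags; uint64_t nr_oid; } reaper_t;

typedef enum { REAPER_IDLE, REAPER_RESUME, REAPER_DEQUEUE } reaper_action_t;

/* What the Reaper does first after mount. */
static reaper_action_t reaper_first_action(const reaper_t *r, int list_empty) {
    if (r->nr_oid != 0 && (r->nr_flags & NR_CONTINUE))
        return REAPER_RESUME;      /* handler reloads nr_state_buffer */
    return list_empty ? REAPER_IDLE : REAPER_DEQUEUE;
}
```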

<h2 id="forensic-considerations">Forensic Considerations</h2>

<p>The Reaper is forensically significant because:</p>

<ul>
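  <li><strong>Partially reaped volumes</strong> may still contain recoverable data. The <code class="language-plaintext highlighter-rouge">phase</code> field in the reap state indicates how far deletion has progressed, and structures handled by later phases may still be fully intact. For example, while the Reaper is still in the snapshot phase, the active file system’s extents and the volume’s object map have not yet been freed.</li>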
  <li><strong>The reap list</strong> reveals which objects are pending deletion. A volume that appears missing from the container’s <code class="language-plaintext highlighter-rouge">nx_fs_oid</code> array may still exist in the Reaper’s queue.</li>
  <li><strong>The <code class="language-plaintext highlighter-rouge">nr_completed_id</code> and <code class="language-plaintext highlighter-rouge">nr_next_reap_id</code> fields</strong> provide a history of how many reap operations have occurred, giving insight into container activity.</li>
  <li><strong>Free queue entries</strong> from reaper-freed blocks retain their transaction identifiers, indicating when deletion occurred.</li>
</ul>
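<p>As a rough triage aid, the phase recorded in the reap state can be mapped to the structures the reap has not yet touched. The sketch assumes the volume reap phases run strictly in order. It uses the numbering from Apple’s APFS reference, from <code class="language-plaintext highlighter-rouge">APFS_REAP_PHASE_START</code> = 0 through <code class="language-plaintext highlighter-rouge">APFS_REAP_PHASE_DONE</code> = 4. The labels are illustrative shorthand, and a phase that is still in progress may be only partially reaped.</p>

```python
from enum import IntEnum


class ReapPhase(IntEnum):
    """Volume reap phases, numbered as in Apple's APFS reference."""
    START = 0
    SNAPSHOTS = 1
    ACTIVE_FS = 2
    DESTROY_OMAP = 3
    DONE = 4


# Illustrative shorthand for what each working phase tears down.
PHASE_TARGETS = {
    ReapPhase.SNAPSHOTS: "snapshots",
    ReapPhase.ACTIVE_FS: "active file system",
    ReapPhase.DESTROY_OMAP: "object map",
}


def triage(phase):
    """Return (target of the phase in progress, targets not yet reached)."""
    current = ReapPhase(phase)  # raises ValueError for unknown values
    in_progress = PHASE_TARGETS.get(current)
    untouched = [label for p, label in PHASE_TARGETS.items() if p > current]
    return in_progress, untouched
```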

<h2 id="conclusion">Conclusion</h2>

<p>The Reaper provides crash-safe, multi-transaction garbage collection for APFS. Its state machine design allows arbitrarily large deletions (entire volumes, snapshot cleanup, object map destruction) to proceed incrementally across as many transactions as needed, with guaranteed resumption after crashes. Combined with the <a href="/post/2026/06/02/APFS-Space-Manager">Space Manager’s</a> free queues, it ensures that block deallocation is always consistent and recoverable.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="reaper" /><category term="garbage-collection" /><summary type="html"><![CDATA[In our post on Containers, we introduced the Reaper as the subsystem responsible for garbage collection in APFS. The Reaper handles deletions that are too large to complete within a single transaction, such as deleting an entire volume or cleaning up after a snapshot deletion. In this post, we will examine the Reaper’s on-disk structures and its multi-phase state machine.]]></summary></entry><entry><title type="html">SpiceCrypt 3.0: QSPICE Support</title><link href="https://jtsylve.blog/post/2026/06/03/spice-crypt-3.0" rel="alternate" type="text/html" title="SpiceCrypt 3.0: QSPICE Support" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/03/spice-crypt-3.0</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/03/spice-crypt-3.0"><![CDATA[<p><a href="https://github.com/jtsylve/spice-crypt/">SpiceCrypt 3.0.0</a> is out.  When I <a href="/post/2026/03/18/PSpice-Encryption-Weakness">introduced SpiceCrypt in March</a>, it decrypted PSpice and LTspice model files so engineers could use lawfully obtained models in any simulator.  
This release adds QSPICE, the protection scheme used by Qorvo’s simulator, and with it SpiceCrypt now spans the three most widely used SPICE tools in a single auto-detecting library and tool.</p>

<h2 id="whats-new">What’s new</h2>

<ul>
  <li><strong>QSPICE <code class="language-plaintext highlighter-rouge">.prot</code> decryption.</strong>  SpiceCrypt now decrypts QSPICE protected sub-circuits: randomized base-16 encoding, a seed-keyed dual stream cipher, DEFLATE decompression, and Windows-1252 detokenization.  Surrounding plaintext lines pass through untouched.  The full reverse-engineered scheme is documented in <a href="https://github.com/jtsylve/spice-crypt/blob/master/SPECIFICATIONS/qspice.md"><code class="language-plaintext highlighter-rouge">SPECIFICATIONS/qspice.md</code></a>.</li>
  <li><strong>Unified auto-detection.</strong>  <code class="language-plaintext highlighter-rouge">decrypt_stream()</code> and <code class="language-plaintext highlighter-rouge">decrypt()</code> now auto-detect across Binary File, PSpice, QSPICE, and LTspice formats.  Point SpiceCrypt at a file and it picks the right scheme.</li>
  <li><strong>New public API.</strong>  <code class="language-plaintext highlighter-rouge">QSpiceFileParser</code> and <code class="language-plaintext highlighter-rouge">QSpiceCipher</code> are now exported for callers that want to work with QSPICE directly.</li>
  <li><strong>Block-count reporting.</strong>  Since a single file can hold many protected sub-circuits, <code class="language-plaintext highlighter-rouge">decrypt_stream()</code> now returns the block count for QSPICE inputs.</li>
  <li><strong>Graceful degradation.</strong>  A protected block that fails to decode now passes through unchanged with a warning instead of aborting the whole file.</li>
</ul>

<h2 id="breaking-changes">Breaking changes</h2>

<p>The deprecated v2.0.0 backward-compatibility shims have been removed: the top-level <code class="language-plaintext highlighter-rouge">des.py</code>, <code class="language-plaintext highlighter-rouge">binary_file.py</code>, and <code class="language-plaintext highlighter-rouge">crypto_state.py</code> modules are gone.  Import the LTspice internals directly instead:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">spice_crypt.ltspice</span> <span class="kn">import</span> <span class="bp">...</span>
</code></pre></div></div>

<p>The CLI and the primary <code class="language-plaintext highlighter-rouge">decrypt</code> / <code class="language-plaintext highlighter-rouge">decrypt_stream</code> entry points are unchanged.</p>

<h2 id="upgrading">Upgrading</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">--upgrade</span> spice-crypt
</code></pre></div></div>

<p>Or with uv:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uv tool <span class="nb">install</span> <span class="nt">--upgrade</span> spice-crypt
</code></pre></div></div>

<h2 id="links">Links</h2>

<ul>
  <li><strong>Repository</strong>: <a href="https://github.com/jtsylve/spice-crypt">github.com/jtsylve/spice-crypt</a></li>
  <li><strong>PyPI</strong>: <a href="https://pypi.org/project/spice-crypt/">pypi.org/project/spice-crypt</a></li>
  <li><strong>QSPICE specification</strong>: <a href="https://github.com/jtsylve/spice-crypt/blob/master/SPECIFICATIONS/qspice.md">SPECIFICATIONS/qspice.md</a></li>
</ul>

<p>If you run into issues or have feature requests, please <a href="https://github.com/jtsylve/spice-crypt/issues">open an issue</a> on GitHub.</p>

<p><strong>Disclaimer:</strong> SpiceCrypt is intended solely for enabling simulator interoperability with lawfully obtained models.  Using it to violate intellectual property rights is immoral and is not an acceptable use of the tool.</p>]]></content><author><name></name></author><category term="security-research" /><category term="encryption" /><category term="qspice" /><category term="spice" /><category term="reverse-engineering" /><category term="encryption" /><category term="interoperability" /><summary type="html"><![CDATA[SpiceCrypt 3.0.0 is out. When I introduced SpiceCrypt in March, it decrypted PSpice and LTspice model files so engineers could use lawfully obtained models in any simulator. This release adds QSPICE, the protection scheme used by Qorvo’s simulator, and with it SpiceCrypt now spans the three most widely used SPICE tools in a single auto-detecting library and tool.]]></summary></entry><entry><title type="html">Space Manager</title><link href="https://jtsylve.blog/post/2026/06/02/APFS-Space-Manager" rel="alternate" type="text/html" title="Space Manager" /><published>2026-06-02T00:00:00+00:00</published><updated>2026-06-02T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/02/APFS%20Space%20Manager</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/02/APFS-Space-Manager"><![CDATA[<p>In our <a href="/post/2022/12/05/APFS-Containers">earlier post on Containers</a>, we introduced the Space Manager as the subsystem responsible for tracking which blocks are in use across all storage tiers and for allocating and freeing blocks on behalf of volumes. That post promised more detail in the future. Today we deliver on that promise by examining the Space Manager’s on-disk structures, including its hierarchical chunk tracking system, free queues, internal pool, and allocation zones.</p>

<h2 id="overview">Overview</h2>

<p>Each APFS container has exactly one Space Manager, stored as an ephemeral object in the checkpoint data area. Its object identifier is recorded in the <code class="language-plaintext highlighter-rouge">nx_spaceman_oid</code> field of the <a href="/post/2022/12/06/APFS-NX-Superblock">NX Superblock</a>. The Space Manager tracks block allocation through a three-tier hierarchy. For each device, the top-level <code class="language-plaintext highlighter-rouge">spaceman_phys_t</code> structure holds summary metadata and an array of addresses. Those addresses point either to <em>Chunk Address Blocks</em> (CABs) or, when no CABs are needed, directly to <em>Chunk Info Blocks</em> (CIBs). CABs in turn hold CIB addresses, and each CIB describes a run of chunks and records where each chunk’s allocation bitmap lives.</p>

<h2 id="chunks-and-bitmaps">Chunks and Bitmaps</h2>

<p>The Space Manager divides each storage device into fixed-size <em>chunks</em>. Each chunk is a contiguous range of blocks tracked by a single allocation bitmap. The number of blocks per chunk is stored in <code class="language-plaintext highlighter-rouge">sm_blocks_per_chunk</code>.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">chunk_info</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">ci_xid</span><span class="p">;</span>         <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">ci_addr</span><span class="p">;</span>        <span class="c1">// 0x08</span>
    <span class="kt">uint32_t</span> <span class="n">ci_block_count</span><span class="p">;</span> <span class="c1">// 0x10</span>
    <span class="kt">uint32_t</span> <span class="n">ci_free_count</span><span class="p">;</span>  <span class="c1">// 0x14</span>
    <span class="n">paddr_t</span> <span class="n">ci_bitmap_addr</span><span class="p">;</span>  <span class="c1">// 0x18</span>
<span class="p">}</span> <span class="n">chunk_info_t</span><span class="p">;</span>              <span class="c1">// 0x20</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">ci_xid</code>: The transaction identifier of the last transaction that modified this chunk’s bitmap</li>
  <li><code class="language-plaintext highlighter-rouge">ci_addr</code>: The first block address of this chunk</li>
  <li><code class="language-plaintext highlighter-rouge">ci_block_count</code>: The number of blocks in this chunk (lower 20 bits). Upper 12 bits hold flags (see below).</li>
  <li><code class="language-plaintext highlighter-rouge">ci_free_count</code>: The number of free blocks in this chunk (lower 20 bits)</li>
  <li><code class="language-plaintext highlighter-rouge">ci_bitmap_addr</code>: The physical address of the allocation bitmap for this chunk, or zero if no bitmap has been allocated</li>
</ul>

<h4 id="chunk-info-flags">Chunk Info Flags</h4>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Name</th>
      <th>Value</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CI_PINNED_TO_MAIN</td>
      <td>0x04000000</td>
      <td>The chunk is within the metazone region (reserved for metadata)</td>
    </tr>
    <tr>
      <td>CI_ALLOC_ZONE_HINT</td>
      <td>0x08000000</td>
      <td>The chunk is currently assigned to an allocation zone</td>
    </tr>
  </tbody>
</table>
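<p>Here is a minimal Python sketch of decoding a single 32-byte <code class="language-plaintext highlighter-rouge">chunk_info_t</code>. It assumes little-endian on-disk byte order and the field layout shown above. The 20-bit count mask and the two flag values come from the text and table above; <code class="language-plaintext highlighter-rouge">parse_chunk_info</code> is an illustrative name.</p>

```python
import struct

CHUNK_INFO = struct.Struct("<QQIIQ")  # chunk_info_t: 0x20 bytes, little-endian
COUNT_MASK = 0x000FFFFF               # lower 20 bits hold the count
CI_PINNED_TO_MAIN = 0x04000000
CI_ALLOC_ZONE_HINT = 0x08000000


def parse_chunk_info(buf, offset=0):
    """Decode one chunk_info_t starting at `offset` in `buf`."""
    xid, addr, block_field, free_field, bitmap_addr = CHUNK_INFO.unpack_from(buf, offset)
    return {
        "xid": xid,
        "addr": addr,
        "block_count": block_field & COUNT_MASK,
        "free_count": free_field & COUNT_MASK,
        "bitmap_addr": bitmap_addr,  # 0 means no bitmap has been allocated
        "pinned_to_main": bool(block_field & CI_PINNED_TO_MAIN),
        "alloc_zone_hint": bool(block_field & CI_ALLOC_ZONE_HINT),
    }
```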

<h2 id="chunk-info-blocks-and-chunk-address-blocks">Chunk Info Blocks and Chunk Address Blocks</h2>

<p>Chunk info structures are grouped into <em>Chunk Info Blocks</em> (CIBs), physical objects that each hold an array of <code class="language-plaintext highlighter-rouge">chunk_info_t</code> entries.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">chunk_info_block</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">cib_o</span><span class="p">;</span>              <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">cib_index</span><span class="p">;</span>            <span class="c1">// 0x20</span>
    <span class="kt">uint32_t</span> <span class="n">cib_chunk_info_count</span><span class="p">;</span> <span class="c1">// 0x24</span>
    <span class="n">chunk_info_t</span> <span class="n">cib_chunk_info</span><span class="p">[];</span> <span class="c1">// 0x28</span>
<span class="p">}</span> <span class="n">chunk_info_block_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cib_o</code>: The object header (type <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_SPACEMAN_CIB</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">cib_index</code>: The index of this CIB within its device’s array</li>
  <li><code class="language-plaintext highlighter-rouge">cib_chunk_info_count</code>: The number of chunk info entries in this block</li>
  <li><code class="language-plaintext highlighter-rouge">cib_chunk_info</code>: A variable-length array of chunk info structures</li>
</ul>
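<p>Walking a CIB follows directly from the offsets above. This sketch assumes <code class="language-plaintext highlighter-rouge">block</code> holds one CIB already read from disk. It skips the 0x20-byte object header without verifying its checksum and returns the chunk entries with the flag bits masked off their counts. <code class="language-plaintext highlighter-rouge">parse_cib</code> is a hypothetical helper.</p>

```python
import struct

OBJ_PHYS_SIZE = 0x20                    # obj_phys_t header
CHUNK_INFO = struct.Struct("<QQIIQ")    # chunk_info_t, 0x20 bytes
COUNT_MASK = 0x000FFFFF


def parse_cib(block):
    """Return (cib_index, [(addr, block_count, free_count, bitmap_addr), ...])."""
    cib_index, count = struct.unpack_from("<II", block, OBJ_PHYS_SIZE)
    entries = []
    for i in range(count):
        _xid, addr, blocks, free, bitmap = CHUNK_INFO.unpack_from(
            block, 0x28 + i * CHUNK_INFO.size)
        entries.append((addr, blocks & COUNT_MASK, free & COUNT_MASK, bitmap))
    return cib_index, entries
```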

<p>For large containers where the number of CIBs exceeds what can be stored directly in the Space Manager, a second level of indirection is used: <em>Chunk Address Blocks</em> (CABs).</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">cib_addr_block</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">cab_o</span><span class="p">;</span>       <span class="c1">// 0x00</span>
    <span class="kt">uint32_t</span> <span class="n">cab_index</span><span class="p">;</span>     <span class="c1">// 0x20</span>
    <span class="kt">uint32_t</span> <span class="n">cab_cib_count</span><span class="p">;</span> <span class="c1">// 0x24</span>
    <span class="n">paddr_t</span> <span class="n">cab_cib_addr</span><span class="p">[];</span> <span class="c1">// 0x28</span>
<span class="p">}</span> <span class="n">cib_addr_block_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cab_o</code>: The object header (type <code class="language-plaintext highlighter-rouge">OBJECT_TYPE_SPACEMAN_CAB</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">cab_index</code>: The index of this CAB within its device’s array</li>
  <li><code class="language-plaintext highlighter-rouge">cab_cib_count</code>: The number of CIB addresses stored in this block</li>
  <li><code class="language-plaintext highlighter-rouge">cab_cib_addr</code>: A variable-length array of physical CIB addresses</li>
</ul>

<p>When <code class="language-plaintext highlighter-rouge">sm_cab_count</code> in the device structure is zero, CIB addresses are stored directly in the Space Manager. When nonzero, the CAB indirection layer is present.</p>
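<p>Putting the hierarchy together, the position of any block can be found by successive division. This sketch assumes a block address on the main device, where chunk 0 starts at block 0; Tier-2 addresses are not handled. It takes <code class="language-plaintext highlighter-rouge">sm_blocks_per_chunk</code>, <code class="language-plaintext highlighter-rouge">sm_chunks_per_cib</code>, and <code class="language-plaintext highlighter-rouge">sm_cibs_per_cab</code> from <code class="language-plaintext highlighter-rouge">spaceman_phys_t</code>. <code class="language-plaintext highlighter-rouge">locate_block</code> is a hypothetical helper.</p>

```python
def locate_block(paddr, blocks_per_chunk, chunks_per_cib, cibs_per_cab, cab_count):
    """Map a main-device block address onto the CAB/CIB/chunk/bitmap hierarchy."""
    chunk = paddr // blocks_per_chunk
    cib = chunk // chunks_per_cib
    loc = {
        "chunk_index": chunk,
        "cib_index": cib,                        # position in the device's CIB list
        "entry_in_cib": chunk % chunks_per_cib,  # index into cib_chunk_info[]
        "bit_in_bitmap": paddr % blocks_per_chunk,
    }
    if cab_count:  # sm_cab_count != 0: CIB addresses live inside CABs
        loc["cab_index"] = cib // cibs_per_cab
        loc["cib_slot_in_cab"] = cib % cibs_per_cab  # index into cab_cib_addr[]
    return loc
```

<p>For example, if each chunk’s bitmap is exactly one 4 KiB block, a chunk covers 4096 × 8 = 32,768 blocks, and block 10,000,000 lands in chunk 305 at bit 5,760.</p>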

<h2 id="free-queues">Free Queues</h2>

<p>When blocks are freed, they are not immediately returned to the allocation bitmaps. Instead, they are placed into <em>free queues</em>: B-Trees that hold recently freed extents until all transactions that might reference them have been checkpointed. This ensures crash-safe deallocation.</p>

<p>APFS maintains three free queues:</p>

<table style="margin-left: 0">
  <thead>
    <tr>
      <th>Name</th>
      <th>Value</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SFQ_IP</td>
      <td>0</td>
      <td>Internal pool free queue</td>
    </tr>
    <tr>
      <td>SFQ_MAIN</td>
      <td>1</td>
      <td>Main device free queue</td>
    </tr>
    <tr>
      <td>SFQ_TIER2</td>
      <td>2</td>
      <td>Tier-2 (HDD on Fusion) device free queue</td>
    </tr>
  </tbody>
</table>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_free_queue</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">sfq_count</span><span class="p">;</span>           <span class="c1">// 0x00</span>
    <span class="n">oid_t</span> <span class="n">sfq_tree_oid</span><span class="p">;</span>           <span class="c1">// 0x08</span>
    <span class="n">xid_t</span> <span class="n">sfq_oldest_xid</span><span class="p">;</span>         <span class="c1">// 0x10</span>
    <span class="kt">uint16_t</span> <span class="n">sfq_tree_node_limit</span><span class="p">;</span> <span class="c1">// 0x18</span>
    <span class="kt">uint16_t</span> <span class="n">sfq_pad16</span><span class="p">;</span>           <span class="c1">// 0x1A</span>
    <span class="kt">uint32_t</span> <span class="n">sfq_pad32</span><span class="p">;</span>           <span class="c1">// 0x1C</span>
    <span class="kt">uint64_t</span> <span class="n">sfq_reserved</span><span class="p">;</span>        <span class="c1">// 0x20</span>
<span class="p">}</span> <span class="n">spaceman_free_queue_t</span><span class="p">;</span>          <span class="c1">// 0x28</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sfq_count</code>: The number of entries in this free queue</li>
  <li><code class="language-plaintext highlighter-rouge">sfq_tree_oid</code>: The object identifier of the B-Tree that stores the entries, or zero if not yet created</li>
  <li><code class="language-plaintext highlighter-rouge">sfq_oldest_xid</code>: The oldest transaction identifier among all entries</li>
  <li><code class="language-plaintext highlighter-rouge">sfq_tree_node_limit</code>: When the B-Tree node count exceeds this limit, the queue is drained more aggressively</li>
</ul>

<h3 id="free-queue-entries">Free Queue Entries</h3>

<p>Free queue entries use a key that sorts first by transaction identifier, then by physical address:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_free_queue_key</span> <span class="p">{</span>
    <span class="n">xid_t</span> <span class="n">sfqk_xid</span><span class="p">;</span>          <span class="c1">// 0x00</span>
    <span class="n">paddr_t</span> <span class="n">sfqk_paddr</span><span class="p">;</span>      <span class="c1">// 0x08</span>
<span class="p">}</span> <span class="n">spaceman_free_queue_key_t</span><span class="p">;</span> <span class="c1">// 0x10</span>
</code></pre></div></div>

<p>The value is a <code class="language-plaintext highlighter-rouge">uint64_t</code> block count. Single-block extents store a zero-length value in the B-Tree to save space (the count of 1 is implied).</p>

<p>When inserting entries, the implementation coalesces adjacent extents that share the same transaction identifier, reducing B-Tree size and improving drain efficiency.</p>
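<p>The value encoding and the coalescing rule can be sketched as follows. This illustrates the idea under stated assumptions and is not Apple’s code. Extents are modeled as <code class="language-plaintext highlighter-rouge">(xid, paddr, count)</code> tuples, sorting them reproduces the key order described above, and extents are assumed not to overlap. The real implementation coalesces at insert time, whereas this sketch merges a batch after the fact.</p>

```python
import struct


def encode_fq_value(block_count):
    """Single-block extents store an empty value; others a little-endian uint64."""
    return b"" if block_count == 1 else struct.pack("<Q", block_count)


def decode_fq_value(value):
    """Inverse of encode_fq_value: an empty value implies a count of 1."""
    return 1 if len(value) == 0 else struct.unpack("<Q", value)[0]


def coalesce(extents):
    """Merge abutting (xid, paddr, count) extents that share an xid."""
    merged = []
    for xid, paddr, count in sorted(extents):  # key order: xid, then paddr
        if merged:
            last_xid, last_paddr, last_count = merged[-1]
            if last_xid == xid and last_paddr + last_count == paddr:
                merged[-1] = (xid, last_paddr, last_count + count)
                continue
        merged.append((xid, paddr, count))
    return merged
```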

<h2 id="internal-pool">Internal Pool</h2>

<p>The <em>Internal Pool</em> (IP) is a dedicated set of blocks used for allocating B-Tree nodes and other metadata structures. It provides a reserved area that guarantees metadata allocations can succeed even when the container is nearly full. The IP has its own allocation bitmaps, separate from the per-chunk bitmaps used for data.</p>

<p>Key fields in <code class="language-plaintext highlighter-rouge">spaceman_phys_t</code>:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_base</code>: The physical base address of the internal pool blocks</li>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_block_count</code>: The total number of blocks in the pool (bit 63 is a fragmentation flag)</li>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_bm_base</code>: The physical base address of the IP bitmap blocks</li>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_bm_block_count</code>: The number of IP bitmap blocks (bit 31 is a fragmentation flag)</li>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_bm_size_in_blocks</code>: The number of bitmap blocks needed to cover the pool</li>
  <li><code class="language-plaintext highlighter-rouge">sm_ip_bm_tx_multiplier</code>: The number of bitmaps per transaction (at least 4)</li>
</ul>

<p>When the fragmentation flag is set (bit 63 of <code class="language-plaintext highlighter-rouge">sm_ip_block_count</code> or bit 31 of <code class="language-plaintext highlighter-rouge">sm_ip_bm_block_count</code>), the pool blocks or bitmaps are not contiguous. Their physical addresses must be looked up through a <em>Metadata Fragmented Extent List Tree</em> rather than computed from the base address.</p>
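<p>A reader should check the fragmentation flag before computing an address, roughly as shown here. The sketch assumes that in the unfragmented case the <code class="language-plaintext highlighter-rouge">index</code>-th block lives at <code class="language-plaintext highlighter-rouge">base + index</code>, which is the simplest reading of “computed from the base address.” It does not resolve the fragmented case through the extent list tree. <code class="language-plaintext highlighter-rouge">contiguous_address</code> is a hypothetical helper.</p>

```python
def contiguous_address(base, count_field, flag_bit, index):
    """Address of the index-th block in an IP range, or None if fragmented.

    flag_bit is 63 for sm_ip_block_count and 31 for sm_ip_bm_block_count.
    """
    flag = 1 << flag_bit
    count = count_field & ~flag  # strip the fragmentation flag
    if not 0 <= index < count:
        raise IndexError(f"index {index} outside range of {count} blocks")
    if count_field & flag:
        return None  # must be resolved through the fragmented extent list tree
    return base + index
```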

<h2 id="allocation-zones">Allocation Zones</h2>

<p>APFS uses <em>allocation zones</em> to group related allocations together on disk, reducing fragmentation and improving sequential read performance. Each device has up to 8 allocation zones (<code class="language-plaintext highlighter-rouge">SM_DATAZONE_ALLOCZONE_COUNT</code>), with zone IDs 1 through 4 corresponding to minimum allocation sizes in blocks.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_allocation_zone_info_phys</span> <span class="p">{</span>
    <span class="n">spaceman_allocation_zone_boundaries_t</span> <span class="n">saz_current_boundaries</span><span class="p">;</span>
    <span class="n">spaceman_allocation_zone_boundaries_t</span> <span class="n">saz_previous_boundaries</span><span class="p">[</span><span class="mi">7</span><span class="p">];</span>
    <span class="kt">uint16_t</span> <span class="n">saz_zone_id</span><span class="p">;</span>
    <span class="kt">uint16_t</span> <span class="n">saz_previous_boundary_index</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">saz_reserved</span><span class="p">;</span>
<span class="p">}</span> <span class="n">spaceman_allocation_zone_info_phys_t</span><span class="p">;</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">saz_current_boundaries</code>: The current start and end block addresses of this zone</li>
  <li><code class="language-plaintext highlighter-rouge">saz_previous_boundaries</code>: A circular buffer of the 7 most recent previous chunk assignments</li>
  <li><code class="language-plaintext highlighter-rouge">saz_zone_id</code>: The allocation size class (1-4 blocks, or 0 for unused)</li>
  <li><code class="language-plaintext highlighter-rouge">saz_previous_boundary_index</code>: Index into the circular buffer for the next rotation</li>
</ul>

<p>Each allocation zone boundary is a simple range:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_allocation_zone_boundaries</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">saz_zone_start</span><span class="p">;</span> <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">saz_zone_end</span><span class="p">;</span>   <span class="c1">// 0x08</span>
<span class="p">}</span> <span class="n">spaceman_allocation_zone_boundaries_t</span><span class="p">;</span>
</code></pre></div></div>

<p>When an allocation zone’s current chunk becomes full, the allocator scans for a new chunk with sufficient free space, rotates the old boundaries into the circular buffer, and updates the current boundaries. The <code class="language-plaintext highlighter-rouge">CI_ALLOC_ZONE_HINT</code> flag on chunks tracks which chunk is currently assigned to a zone.</p>
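<p>The boundary rotation can be modeled as a ring buffer. In this sketch, <code class="language-plaintext highlighter-rouge">AllocZone</code> is a hypothetical stand-in for <code class="language-plaintext highlighter-rouge">spaceman_allocation_zone_info_phys_t</code>. The order of operations is an assumption, not confirmed behavior: write at <code class="language-plaintext highlighter-rouge">saz_previous_boundary_index</code>, then advance the index modulo 7.</p>

```python
from dataclasses import dataclass, field

NUM_PREVIOUS = 7  # entries in saz_previous_boundaries


@dataclass
class AllocZone:
    """Hypothetical in-memory model of spaceman_allocation_zone_info_phys_t."""
    zone_id: int
    current: tuple = (0, 0)  # (saz_zone_start, saz_zone_end)
    previous: list = field(default_factory=lambda: [(0, 0)] * NUM_PREVIOUS)
    previous_index: int = 0  # saz_previous_boundary_index

    def rotate(self, new_start, new_end):
        """Retire the current boundaries into the ring, then adopt new ones."""
        self.previous[self.previous_index] = self.current
        self.previous_index = (self.previous_index + 1) % NUM_PREVIOUS
        self.current = (new_start, new_end)

    def history(self):
        """Retained previous boundaries, oldest first."""
        i = self.previous_index
        return self.previous[i:] + self.previous[:i]
```

<p>Under the same ordering assumption, <code class="language-plaintext highlighter-rouge">history()</code> returns the retained boundaries from oldest to newest. That order hints at the sequence in which a zone moved across the disk.</p>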

<h2 id="metazone">Metazone</h2>

<p>The <em>metazone</em> is a contiguous region at the beginning of each device reserved exclusively for metadata allocation. Data allocations must not use metazone blocks. This separation ensures that metadata structures (B-Tree nodes, Space Manager bitmaps) are clustered together near the start of the device for efficient access.</p>

<p>The metazone size scales with device capacity:</p>
<ul>
  <li>Devices smaller than approximately 6 GB have no metazone</li>
  <li>Devices smaller than 16 GB use a 512 MB metazone</li>
  <li>Larger devices use a tiered formula that allocates progressively smaller fractions as device size increases, capped at one-quarter of the total device size</li>
</ul>

<p>Chunks within the metazone are marked with the <code class="language-plaintext highlighter-rouge">CI_PINNED_TO_MAIN</code> flag and are excluded from data allocation zones.</p>

<h2 id="spaceman_phys_t">spaceman_phys_t</h2>

<p>The top-level structure tying everything together:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_phys</span> <span class="p">{</span>
    <span class="n">obj_phys_t</span> <span class="n">sm_o</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_block_size</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_blocks_per_chunk</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_chunks_per_cib</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_cibs_per_cab</span><span class="p">;</span>
    <span class="n">spaceman_device_t</span> <span class="n">sm_dev</span><span class="p">[</span><span class="n">SD_COUNT</span><span class="p">];</span>
    <span class="kt">uint32_t</span> <span class="n">sm_flags</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bm_tx_multiplier</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">sm_ip_block_count</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bm_size_in_blocks</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bm_block_count</span><span class="p">;</span>
    <span class="n">paddr_t</span> <span class="n">sm_ip_bm_base</span><span class="p">;</span>
    <span class="n">paddr_t</span> <span class="n">sm_ip_base</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">sm_fs_reserve_block_count</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">sm_fs_reserve_alloc_count</span><span class="p">;</span>
    <span class="n">spaceman_free_queue_t</span> <span class="n">sm_fq</span><span class="p">[</span><span class="n">SFQ_COUNT</span><span class="p">];</span>
    <span class="kt">uint16_t</span> <span class="n">sm_ip_bm_free_head</span><span class="p">;</span>
    <span class="kt">uint16_t</span> <span class="n">sm_ip_bm_free_tail</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bm_xid_offset</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bitmap_offset</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_ip_bm_free_next_offset</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_version</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">sm_struct_size</span><span class="p">;</span>
    <span class="n">spaceman_datazone_info_phys_t</span> <span class="n">sm_datazone</span><span class="p">;</span>
    <span class="c1">// Variable-length arrays follow...</span>
<span class="p">}</span> <span class="n">spaceman_phys_t</span><span class="p">;</span>
</code></pre></div></div>

<p>The structure is followed by variable-length arrays: IP bitmap XID arrays, IP bitmap offset arrays, IP bitmap free-next arrays, and CIB/CAB address arrays for each device. The total on-disk size must fit within one block.</p>

<p>Each device is described by a <code class="language-plaintext highlighter-rouge">spaceman_device_t</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="nc">spaceman_device</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">sm_block_count</span><span class="p">;</span>  <span class="c1">// 0x00</span>
    <span class="kt">uint64_t</span> <span class="n">sm_chunk_count</span><span class="p">;</span>  <span class="c1">// 0x08</span>
    <span class="kt">uint32_t</span> <span class="n">sm_cib_count</span><span class="p">;</span>    <span class="c1">// 0x10</span>
    <span class="kt">uint32_t</span> <span class="n">sm_cab_count</span><span class="p">;</span>    <span class="c1">// 0x14</span>
    <span class="kt">uint64_t</span> <span class="n">sm_free_count</span><span class="p">;</span>   <span class="c1">// 0x18</span>
    <span class="kt">uint32_t</span> <span class="n">sm_addr_offset</span><span class="p">;</span>  <span class="c1">// 0x20</span>
    <span class="kt">uint32_t</span> <span class="n">sm_reserved</span><span class="p">;</span>     <span class="c1">// 0x24</span>
    <span class="kt">uint64_t</span> <span class="n">sm_reserved2</span><span class="p">;</span>    <span class="c1">// 0x28</span>
<span class="p">}</span> <span class="n">spaceman_device_t</span><span class="p">;</span>          <span class="c1">// 0x30</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sm_block_count</code>: Total blocks on this device</li>
  <li><code class="language-plaintext highlighter-rouge">sm_chunk_count</code>: Number of chunks</li>
  <li><code class="language-plaintext highlighter-rouge">sm_cib_count</code>: Number of CIBs</li>
  <li><code class="language-plaintext highlighter-rouge">sm_cab_count</code>: Number of CABs (zero if CIBs are stored directly)</li>
  <li><code class="language-plaintext highlighter-rouge">sm_free_count</code>: Total free blocks on this device</li>
  <li><code class="language-plaintext highlighter-rouge">sm_addr_offset</code>: Byte offset within <code class="language-plaintext highlighter-rouge">spaceman_phys_t</code> where the CIB/CAB address array begins</li>
</ul>
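<p>Locating a device’s CIB or CAB address array in a raw Space Manager block can be sketched as follows. The sketch assumes the fields of <code class="language-plaintext highlighter-rouge">spaceman_phys_t</code> are packed in the order shown. Under that assumption, <code class="language-plaintext highlighter-rouge">sm_dev[0]</code> begins at offset 0x30, after the 0x20-byte object header and four <code class="language-plaintext highlighter-rouge">uint32_t</code> fields. It also assumes the array holds <code class="language-plaintext highlighter-rouge">sm_cab_count</code> CAB addresses when that count is nonzero, and <code class="language-plaintext highlighter-rouge">sm_cib_count</code> CIB addresses otherwise. <code class="language-plaintext highlighter-rouge">read_device</code> and <code class="language-plaintext highlighter-rouge">read_addr_array</code> are hypothetical helpers.</p>

```python
import struct

SPACEMAN_DEVICE = struct.Struct("<QQIIQIIQ")  # spaceman_device_t, 0x30 bytes
SM_DEV_OFFSET = 0x30  # assumed offset of sm_dev[0] within spaceman_phys_t


def read_device(sm_block, dev_index):
    """Decode sm_dev[dev_index] (0 = main device, 1 = tier 2)."""
    (block_count, chunk_count, cib_count, cab_count,
     free_count, addr_offset, _reserved, _reserved2) = SPACEMAN_DEVICE.unpack_from(
        sm_block, SM_DEV_OFFSET + dev_index * SPACEMAN_DEVICE.size)
    return {
        "block_count": block_count,
        "chunk_count": chunk_count,
        "cib_count": cib_count,
        "cab_count": cab_count,
        "free_count": free_count,
        "addr_offset": addr_offset,
    }


def read_addr_array(sm_block, dev):
    """Return CAB addresses if the device uses CABs, otherwise CIB addresses."""
    n = dev["cab_count"] or dev["cib_count"]
    return list(struct.unpack_from(f"<{n}Q", sm_block, dev["addr_offset"]))
```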

<h2 id="forensic-considerations">Forensic Considerations</h2>

<p>The Space Manager is particularly valuable for forensic analysis:</p>

<ul>
  <li><strong>Free queue entries</strong> identify blocks that were recently freed but may still contain recoverable data. The transaction identifier on each entry indicates when the block was freed.</li>
  <li><strong>Allocation bitmaps</strong> reveal which blocks are currently in use versus free, which can be cross-referenced against file extent records to find orphaned data.</li>
  <li><strong>Chunk info transaction identifiers</strong> (<code class="language-plaintext highlighter-rouge">ci_xid</code>) indicate when each chunk’s allocation state last changed, providing a coarse timeline of write activity across the disk.</li>
  <li><strong>Allocation zones</strong> reveal where the file system tends to place related data, which can help reconstruct file system activity patterns.</li>
</ul>
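<p>Here is a minimal sketch of the bitmap cross-reference. It assumes LSB-first bit order within each bitmap byte, so bit <em>i</em> of the chunk maps to byte <em>i</em>/8, bit <em>i</em> mod 8. It also assumes a set of block addresses has already been gathered from file extent records. <code class="language-plaintext highlighter-rouge">cross_reference</code> is a hypothetical helper. Allocated blocks that no extent references may simply hold metadata such as B-Tree nodes, so treat both result lists as leads rather than conclusions.</p>

```python
def allocated_offsets(bitmap, blocks_per_chunk):
    """Chunk-relative indices whose bit is set, assuming LSB-first bit order."""
    return {i for i in range(blocks_per_chunk) if (bitmap[i // 8] >> (i % 8)) & 1}


def cross_reference(chunk_addr, bitmap, blocks_per_chunk, referenced):
    """Compare one chunk's bitmap with blocks referenced by file extents."""
    allocated = {chunk_addr + i for i in allocated_offsets(bitmap, blocks_per_chunk)}
    in_chunk = {b for b in referenced if chunk_addr <= b < chunk_addr + blocks_per_chunk}
    return {
        "allocated_unreferenced": sorted(allocated - in_chunk),
        "referenced_but_free": sorted(in_chunk - allocated),
    }
```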

<h2 id="conclusion">Conclusion</h2>

<p>The Space Manager implements a sophisticated hierarchical allocation system that balances performance, fragmentation avoidance, and crash safety. Its three-tier structure (CABs, CIBs, bitmaps) scales from tiny containers to multi-terabyte devices. Free queues ensure safe deallocation across transactions, while allocation zones and the metazone organize blocks for optimal access patterns.</p>]]></content><author><name></name></author><category term="file-systems" /><category term="apfs" /><category term="apfs" /><category term="space-manager" /><category term="allocation" /><summary type="html"><![CDATA[In our earlier post on Containers, we introduced the Space Manager as the subsystem responsible for tracking which blocks are in use across all storage tiers and for allocating and freeing blocks on behalf of volumes. That post promised more detail in the future. Today we deliver on that promise by examining the Space Manager’s on-disk structures, including its hierarchical chunk tracking system, free queues, internal pool, and allocation zones.]]></summary></entry><entry><title type="html">Revisiting the APFS Series</title><link href="https://jtsylve.blog/post/2026/06/01/Revisiting-the-APFS-Series" rel="alternate" type="text/html" title="Revisiting the APFS Series" /><published>2026-06-01T00:00:00+00:00</published><updated>2026-06-01T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/06/01/Revisiting%20the%20APFS%20Series</id><content type="html" xml:base="https://jtsylve.blog/post/2026/06/01/Revisiting-the-APFS-Series"><![CDATA[<p>Back in 2022 I started the <a href="/post/2022/11/27/APFS-Advent-Challenge-2022">APFS Advent Challenge</a>: a daily run of posts dissecting the on-disk internals of Apple’s file system. Nearly four years later, both APFS and our collective understanding of it have moved on. So I’ve gone back through the entire series, brought every post up to date, and over the next two weeks I’ll be adding new parts to fill in the gaps.</p>

<h2 id="whats-changed-since-2022">What’s changed since 2022</h2>

<p>APFS is not a frozen target. It has continued to evolve across macOS releases, picking up new on-disk features and quietly refining old ones. The structures I documented in December 2022 were accurate then, but a lot has shifted underneath them since.</p>

<p>My own approach has changed too. The original posts were written largely day-by-day, under the self-imposed pressure of an advent calendar. Since then I’ve built far better tooling for this kind of work, and I now validate each structure directly against the current implementation rather than against memory and notes. That has let me correct a few details, sharpen explanations that were fuzzier than I’d like, and add depth in places where the original posts only scratched the surface.</p>

<h2 id="living-references-not-snapshots">Living references, not snapshots</h2>

<p>The most important change is one of intent. I want this series to be something you can actually rely on, not a museum piece dated December 2022.</p>

<p>So rather than leaving the original posts untouched and bolting corrections onto the end, I’ve revised them in place. Each one now reflects current APFS instead of a four-year-old snapshot. Posts that were updated carry an “Updated” date in their byline so you can see at a glance what has been touched. The permalinks haven’t moved, so any links or bookmarks you already have will keep working.</p>

<h2 id="new-parts-coming-over-the-next-two-weeks">New parts, coming over the next two weeks</h2>

<p>The original run also left real gaps. Some of the container’s internal machinery never got covered, and several features either postdate the 2022 series or simply didn’t make the cut when I ran out of December. Over the next two weeks I’ll be publishing new entries to round the series out:</p>

<ul>
  <li><strong>Space Manager</strong>: how APFS tracks free and allocated blocks</li>
  <li><strong>The Reaper</strong>: crash-safe, multi-transaction garbage collection</li>
  <li><strong>EFI Jumpstart</strong>: booting from an APFS container</li>
  <li><strong>Hard Links and Siblings</strong>: the sibling-link records behind hard links</li>
  <li><strong>Transparent Compression (DECMPFS)</strong>: inline and resource-fork compression</li>
  <li><strong>Clonegroups</strong>: tracking copy-on-write clones</li>
  <li><strong>Encryption Rolling</strong>: re-encrypting a volume in place</li>
  <li><strong>Volume Grafting</strong>: overlaying volumes for system updates</li>
  <li><strong>Speculative Telemetry</strong>: tracking speculatively downloaded content</li>
</ul>

<p>That brings the series to 27 parts spanning the container layer, B-Trees, the volume and file-system layer, integrity and encryption, and APFS’s more advanced features.</p>

<h2 id="read-the-series">Read the series</h2>

<p>The <a href="/apfs/">series index</a> lays out the full planned structure. Parts that haven’t published yet are marked “Coming Soon” and will light up as they go live over the coming days. If you read the series back in 2022, it’s worth a second look. If you’re coming to it fresh, there’s never been a better time.</p>]]></content><author><name></name></author><category term="meta" /><category term="apfs" /><summary type="html"><![CDATA[Back in 2022 I started the APFS Advent Challenge: a daily run of posts dissecting the on-disk internals of Apple’s file system. Nearly four years later, both APFS and our collective understanding of it have moved on. So I’ve gone back through the entire series, brought every post up to date, and over the next two weeks I’ll be adding new parts to fill in the gaps.]]></summary></entry><entry><title type="html">IDA-MCP Is Now RE-MCP With Ghidra Support</title><link href="https://jtsylve.blog/post/2026/05/04/ida-mcp-becomes-re-mcp" rel="alternate" type="text/html" title="IDA-MCP Is Now RE-MCP With Ghidra Support" /><published>2026-05-04T00:00:00+00:00</published><updated>2026-05-04T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/05/04/ida-mcp-becomes-re-mcp</id><content type="html" xml:base="https://jtsylve.blog/post/2026/05/04/ida-mcp-becomes-re-mcp"><![CDATA[<p>When I started building ida-mcp, the goal was simple: give an LLM headless access to IDA Pro through MCP (Model Context Protocol). Open a binary, decompile functions, follow cross-references, rename symbols.</p>

<p>2.0 added a supervisor/worker architecture for analyzing multiple binaries simultaneously. 2.1 introduced progressive tool discovery so the LLM could find specialized tools on demand instead of loading ~195 schemas at startup. 2.2 added meta-tools that let the LLM write multi-step analysis scripts, issue bulk operations, and persist state across sessions through a daemon.</p>

<p>Each release solved a real friction point. But that progression revealed something about the interface itself. The tools the LLM actually called (decompile this function, get cross-references to that address, rename this symbol, search for strings matching this pattern) described reverse engineering in the abstract, not IDA in particular. IDA was the engine behind those tools, but the tool surface itself was generic. An LLM asking to decompile <code class="language-plaintext highlighter-rouge">main</code> doesn’t care whether the answer comes from Hex-Rays or Ghidra’s decompiler. It cares about the pseudocode.</p>

<p>That realization is why ida-mcp is now <a href="https://github.com/jtsylve/re-mcp">re-mcp</a> (reverse engineering MCP). Version 3.0 ships with a full <a href="https://ghidra-sre.org/">Ghidra</a> backend alongside the existing IDA Pro backend, with a shared tool interface that makes LLM workflows portable across both.</p>

<h2 id="why-ghidra-matters-here">Why Ghidra matters here</h2>

<p>The most common response I heard after publishing ida-mcp was some variation of “this looks great, but I don’t have an IDA license.” IDA Pro is the industry standard for binary analysis, but it costs thousands of dollars per seat. For students, independent researchers, CTF players, and hobbyists, that puts LLM-driven reverse engineering out of reach before it even starts.</p>

<p>Ghidra, released by the NSA as open source in 2019, has become the primary free alternative. It supports dozens of processor architectures, its decompiler is capable, and it has an active community building extensions and loaders. By adding Ghidra as a backend, re-mcp makes everything from 2.0 through 2.2 (multi-database analysis, progressive tool discovery, <code class="language-plaintext highlighter-rouge">execute</code> scripts, <code class="language-plaintext highlighter-rouge">batch</code> operations) available to anyone willing to install a free tool and a JDK.</p>

<h2 id="getting-started-with-ghidra">Getting started with Ghidra</h2>

<p>The Ghidra backend requires Python 3.12+, <a href="https://ghidra-sre.org/">Ghidra 12+</a>, and JDK 21+. Ghidra’s install path is found automatically from the <code class="language-plaintext highlighter-rouge">GHIDRA_INSTALL_DIR</code> environment variable or platform-specific default locations.</p>
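<p>As a rough illustration of that lookup order, here is a hypothetical resolver. It checks <code class="language-plaintext highlighter-rouge">GHIDRA_INSTALL_DIR</code> first and then a caller-supplied list of candidate directories standing in for the platform defaults, which this post doesn’t list. As a sanity check it assumes a Ghidra install contains <code class="language-plaintext highlighter-rouge">Ghidra/application.properties</code> with an <code class="language-plaintext highlighter-rouge">application.version</code> entry; re-mcp’s actual discovery logic may differ.</p>

```python
import os
from pathlib import Path

# Assumption: this file exists in Ghidra distributions, so its presence is
# used to confirm that a directory really is a Ghidra install.
PROPERTIES = Path("Ghidra") / "application.properties"


def ghidra_version(install_dir):
    """Return the application.version string from a Ghidra install, or None."""
    props = Path(install_dir) / PROPERTIES
    if not props.is_file():
        return None
    for line in props.read_text(encoding="utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "application.version":
            return value.strip()
    return None


def resolve_ghidra_install(candidates=(), min_major=12, environ=None):
    """Return the first usable Ghidra install directory, or None.

    GHIDRA_INSTALL_DIR is tried first; `candidates` is a hypothetical
    stand-in for the platform-specific default locations.
    """
    environ = os.environ if environ is None else environ
    search = []
    if environ.get("GHIDRA_INSTALL_DIR"):
        search.append(environ["GHIDRA_INSTALL_DIR"])
    search.extend(candidates)
    for path in search:
        version = ghidra_version(path)
        if version is None:
            continue  # not a Ghidra install; try the next location
        major = version.split(".")[0]
        if major.isdigit() and int(major) >= min_major:
            return Path(path)
    return None
```

<p>For example, <code class="language-plaintext highlighter-rouge">resolve_ghidra_install(["/opt/ghidra"])</code> would fall back to that placeholder directory when <code class="language-plaintext highlighter-rouge">GHIDRA_INSTALL_DIR</code> is unset or doesn’t point at a Ghidra 12+ install.</p>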

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uv tool <span class="nb">install </span>re-mcp-ghidra
</code></pre></div></div>

<p>Then configure your MCP client:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"mcpServers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"ghidra"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"uvx"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"re-mcp-ghidra"</span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>From there, everything works the way it did with IDA. Open a binary, wait for analysis to complete, and start asking questions.</p>

<p>The meta-tools from 2.2 work on the Ghidra backend too. Here’s an <code class="language-plaintext highlighter-rouge">execute</code> script that finds functions referencing error strings and summarizes them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">strings</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">find_code_by_string</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span>
    <span class="sh">"</span><span class="s">pattern</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">invalid|error|fail</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">limit</span><span class="sh">"</span><span class="p">:</span> <span class="mi">50</span>
<span class="p">})</span>
<span class="n">seen</span> <span class="o">=</span> <span class="nf">set</span><span class="p">()</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">hit</span> <span class="ow">in</span> <span class="n">strings</span><span class="p">[</span><span class="sh">"</span><span class="s">items</span><span class="sh">"</span><span class="p">]:</span>
    <span class="n">fn</span> <span class="o">=</span> <span class="n">hit</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">function_name</span><span class="sh">"</span><span class="p">,</span> <span class="sh">""</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">fn</span> <span class="ow">or</span> <span class="n">fn</span> <span class="ow">in</span> <span class="n">seen</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="n">seen</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">fn</span><span class="p">)</span>
    <span class="n">decomp</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">decompile_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">hit</span><span class="p">[</span><span class="sh">"</span><span class="s">function_address</span><span class="sh">"</span><span class="p">]</span>
    <span class="p">})</span>
    <span class="n">results</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
        <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">:</span> <span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">function_name</span><span class="sh">"</span><span class="p">],</span>
        <span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">],</span>
        <span class="sh">"</span><span class="s">matched_string</span><span class="sh">"</span><span class="p">:</span> <span class="n">hit</span><span class="p">[</span><span class="sh">"</span><span class="s">string_value</span><span class="sh">"</span><span class="p">],</span>
        <span class="sh">"</span><span class="s">lines</span><span class="sh">"</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">decompiled_code</span><span class="sh">"</span><span class="p">].</span><span class="nf">splitlines</span><span class="p">())</span>
    <span class="p">})</span>
<span class="k">return</span> <span class="p">{</span><span class="sh">"</span><span class="s">functions_with_error_strings</span><span class="sh">"</span><span class="p">:</span> <span class="n">results</span><span class="p">}</span>
</code></pre></div></div>

<p>One tool call. The LLM gets back each distinct function that references one of the matched error strings (up to the 50-hit <code class="language-plaintext highlighter-rouge">limit</code>), along with its decompiled size in lines, ready for triage. The same workflow pattern from the <a href="/post/2026/04/21/ida-mcp-2.2">2.2 post</a> applies here; the only difference is in response field names, such as <code class="language-plaintext highlighter-rouge">decompiled_code</code> vs. <code class="language-plaintext highlighter-rouge">pseudocode</code>.</p>
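<p>Since field names are the only thing that changes between backends, one way to keep a single script portable is to look fields up through a small alias table. This is a hypothetical helper, not part of re-mcp: the aliases come from the two examples in this series (<code class="language-plaintext highlighter-rouge">decompiled_code</code>/<code class="language-plaintext highlighter-rouge">function_name</code> in the Ghidra script above, <code class="language-plaintext highlighter-rouge">pseudocode</code>/<code class="language-plaintext highlighter-rouge">name</code> in the IDA script from the 2.2 post), so verify them against the schemas your server reports.</p>

```python
# Canonical field -> names it may appear under, taken from this series'
# examples (IDA first, Ghidra second). Treat the mapping as an assumption.
FIELD_ALIASES = {
    "pseudocode": ("pseudocode", "decompiled_code"),
    "name": ("name", "function_name"),
}


def field(resp, canonical, default=None):
    """Look up a response field under whichever alias the backend used."""
    for key in FIELD_ALIASES.get(canonical, (canonical,)):
        if key in resp:
            return resp[key]
    return default


def pseudocode_lines(resp):
    """Count decompiled lines regardless of which backend produced resp."""
    return len(field(resp, "pseudocode", "").splitlines())
```

<p>A script could then call <code class="language-plaintext highlighter-rouge">pseudocode_lines(decomp)</code> instead of indexing either field directly and run unchanged on both engines, assuming <code class="language-plaintext highlighter-rouge">execute</code> permits defining helper functions.</p>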

<h2 id="comparing-engines">Comparing engines</h2>

<p>There’s a practical reason to support both backends even if you already have an IDA license. IDA and Ghidra have different analysis engines, different heuristics for function boundary detection, and different type propagation strategies. Running the same binary through both and comparing the output is common practice in professional reverse engineering, because each tool catches things the other misses.</p>

<p>With re-mcp, you configure both servers, and the LLM can open the same binary in each and compare function lists, decompiler output, and cross-references across the two.</p>
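<p>The comparison step itself is simple set arithmetic once both function lists are in hand. The sketch below assumes each list is a sequence of dicts with <code class="language-plaintext highlighter-rouge">address</code> and <code class="language-plaintext highlighter-rouge">name</code> keys (hypothetical field names; check the actual response schema) and keys functions by start address, the natural join point between two engines that name things differently.</p>

```python
def parse_addr(value):
    """Accept ints or hex strings such as '0x401000', '00401000' or '401000h'."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower().removesuffix("h")
    return int(text, 16)  # int() with base 16 tolerates a leading '0x'


def compare_functions(ida_funcs, ghidra_funcs):
    """Compare two function lists keyed by start address.

    Each list holds dicts with hypothetical "address" and "name" keys.
    """
    ida = {parse_addr(f["address"]): f["name"] for f in ida_funcs}
    ghidra = {parse_addr(f["address"]): f["name"] for f in ghidra_funcs}
    shared = set(ida).intersection(ghidra)
    return {
        # Boundaries only one engine found.
        "only_ida": sorted(ida.keys() - ghidra.keys()),
        "only_ghidra": sorted(ghidra.keys() - ida.keys()),
        # Same start address, different name: (address, ida_name, ghidra_name).
        "renamed": sorted(
            (addr, ida[addr], ghidra[addr])
            for addr in shared
            if ida[addr] != ghidra[addr]
        ),
    }
```

<p>Expect plenty of “renamed” noise on stripped binaries, since IDA’s default <code class="language-plaintext highlighter-rouge">sub_</code> names and Ghidra’s <code class="language-plaintext highlighter-rouge">FUN_</code> names never match; the <code class="language-plaintext highlighter-rouge">only_ida</code> and <code class="language-plaintext highlighter-rouge">only_ghidra</code> lists, where the engines disagree about function boundaries, are often the more interesting signal.</p>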

<h2 id="one-interface-two-engines">One interface, two engines</h2>

<p>Both backends implement the same core tool interface: identical tool names, identical parameters, and the same categories of information in responses (though individual response field names may differ slightly between engines). From a user’s perspective, it doesn’t matter which engine is running: the LLM issues the same tool calls and gets comparable results back either way.</p>

<p>The shared surface covers the operations that define a reverse engineering session:</p>

<ul>
  <li><strong>Functions</strong>: list, decompile, disassemble, rename, set prototypes</li>
  <li><strong>Navigation</strong>: cross-references (to and from), imports, exports, entry points, names</li>
  <li><strong>Search</strong>: strings with regex filtering, byte patterns, immediate values</li>
  <li><strong>Types</strong>: local type libraries, structures, enums, type application</li>
  <li><strong>Annotation</strong>: comments, names, bookmarks</li>
  <li><strong>Patching</strong>: byte-level modification, segment operations</li>
  <li><strong>Meta-tools</strong>: <code class="language-plaintext highlighter-rouge">search_tools</code>, <code class="language-plaintext highlighter-rouge">get_schema</code>, <code class="language-plaintext highlighter-rouge">call</code>, <code class="language-plaintext highlighter-rouge">execute</code>, <code class="language-plaintext highlighter-rouge">batch</code></li>
</ul>

<p>An <code class="language-plaintext highlighter-rouge">execute</code> script that crawls error strings, decompiles referencing functions, and renames them follows the same logic on either engine; scripts only need to adjust for the field name differences noted above.</p>

<p>Each backend also retains capabilities specific to its engine. The IDA backend keeps everything from the 2.x releases: IDAPython scripting via <code class="language-plaintext highlighter-rouge">run_script</code>, file region mapping, executable rebuilding, IDC evaluation, and the eight guided prompts for structured analysis workflows. The Ghidra backend brings its own strengths: Function ID for automatic library function identification and data type archive support.</p>

<h2 id="architecture-and-transport">Architecture and transport</h2>

<p>re-mcp is a monorepo with three packages: <strong>re-mcp-core</strong> (supervisor, transport, meta-tools), <strong>re-mcp-ida</strong> (IDA Pro backend wrapping idalib), and <strong>re-mcp-ghidra</strong> (Ghidra backend wrapping <a href="https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Features/PyGhidra">pyghidra</a>). The core package doesn’t depend on IDA or Ghidra. Backends are discovered through Python entry points, so you install only what you need:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># IDA users</span>
uv tool <span class="nb">install </span>re-mcp-ida

<span class="c"># Ghidra users</span>
uv tool <span class="nb">install </span>re-mcp-ghidra

<span class="c"># Both</span>
uv tool <span class="nb">install </span>re-mcp <span class="nt">--with</span> re-mcp-ida <span class="nt">--with</span> re-mcp-ghidra
</code></pre></div></div>

<p>Future backends (Binary Ninja, radare2, or something that doesn’t exist yet) would slot in as additional packages implementing the same worker interface, with no changes to the core or any existing backend.</p>
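<p>What “the same worker interface” might look like is easiest to see as a structural <code class="language-plaintext highlighter-rouge">typing.Protocol</code>. The method names below (<code class="language-plaintext highlighter-rouge">open</code>, <code class="language-plaintext highlighter-rouge">call</code>, <code class="language-plaintext highlighter-rouge">close</code>) and the toy <code class="language-plaintext highlighter-rouge">EchoBackend</code> are invented for illustration; this post doesn’t show re-mcp-core’s real interface.</p>

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Hypothetical shape of a worker backend; not re-mcp's real interface."""

    name: str

    def open(self, path: str) -> None: ...

    def call(self, tool: str, params: dict[str, Any]) -> dict[str, Any]: ...

    def close(self) -> None: ...


class EchoBackend:
    """Toy backend that satisfies the protocol without any RE engine."""

    name = "echo"

    def __init__(self):
        self.path = None

    def open(self, path: str) -> None:
        self.path = path

    def call(self, tool: str, params: dict[str, Any]) -> dict[str, Any]:
        # A real backend would dispatch to its engine; this just echoes.
        return {"backend": self.name, "binary": self.path, "tool": tool, **params}

    def close(self) -> None:
        self.path = None
```

<p>Because a <code class="language-plaintext highlighter-rouge">Protocol</code> is checked structurally, a backend written this way wouldn’t need to subclass anything from the core; it would only need to provide matching attributes and methods.</p>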

<p>re-mcp 3.0 switches the default transport to direct stdio: one session, and workers terminate when the client disconnects. This is simpler to set up than the HTTP daemon that ida-mcp 2.2 defaulted to, and it works with any MCP client that supports the standard stdio transport. For workflows that need persistence, the daemon is still available via the <code class="language-plaintext highlighter-rouge">proxy</code> or <code class="language-plaintext highlighter-rouge">serve</code> subcommands (e.g., <code class="language-plaintext highlighter-rouge">re-mcp-ghidra serve</code>, <code class="language-plaintext highlighter-rouge">re-mcp-ida serve</code>). The transport mode is independent of the backend; all options work the same for <code class="language-plaintext highlighter-rouge">re-mcp-ida</code>, <code class="language-plaintext highlighter-rouge">re-mcp-ghidra</code>, and the unified <code class="language-plaintext highlighter-rouge">re-mcp --backend &lt;name&gt;</code> command.</p>
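<p>The MCP specification defines the stdio transport as newline-delimited JSON-RPC over the server process’s stdin and stdout, with no embedded newlines inside a message. A minimal standard-library sketch of that framing (not re-mcp’s code) shows why a stdio session is naturally tied to one client: when the client goes away, the server simply reads end-of-file.</p>

```python
import json


def encode_message(message):
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes any newlines inside strings, so the only raw
    # newline in the output is the trailing delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def read_messages(stream):
    """Yield decoded messages from a binary stream until EOF.

    EOF means the client disconnected; in a stdio server, that is the
    point where a supervisor would shut down its workers.
    """
    for raw in stream:
        line = raw.strip()
        if line:
            yield json.loads(line)
```

<p>The daemon modes exist precisely because this lifecycle is so tight: with stdio, the server’s lifetime is the client’s lifetime.</p>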

<h2 id="migrating-from-ida-mcp">Migrating from ida-mcp</h2>

<p>The legacy <code class="language-plaintext highlighter-rouge">ida-mcp</code> PyPI package now redirects to <code class="language-plaintext highlighter-rouge">re-mcp-ida</code>. Existing installations continue to work after upgrading:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uv tool <span class="nb">install</span> <span class="nt">--upgrade</span> ida-mcp
<span class="c"># or install directly</span>
uv tool <span class="nb">install </span>re-mcp-ida
</code></pre></div></div>

<p>The MCP tool interface is backward compatible. Existing <code class="language-plaintext highlighter-rouge">execute</code> scripts, <code class="language-plaintext highlighter-rouge">batch</code> operations, and direct tool calls work without changes. Requirements are unchanged: IDA Pro 9+ with Python 3.12+. The main visible difference is the entry point name (<code class="language-plaintext highlighter-rouge">ida-mcp</code> becomes <code class="language-plaintext highlighter-rouge">re-mcp-ida</code>), though the old name continues to work as an alias.</p>

<p>Environment variables follow the same pattern as before, prefixed per backend. <code class="language-plaintext highlighter-rouge">IDA_MCP_</code> variables carry over unchanged for the IDA backend; the Ghidra backend uses <code class="language-plaintext highlighter-rouge">GHIDRA_MCP_</code> with the same suffixes.</p>
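<p>In code, the per-backend prefix amounts to a one-line lookup. This sketch uses the two prefixes named above with a placeholder suffix, <code class="language-plaintext highlighter-rouge">EXAMPLE</code>; see the repository for the real variable names.</p>

```python
import os

# Prefixes as described in the post; any suffix passed in is up to the caller.
PREFIXES = {"ida": "IDA_MCP_", "ghidra": "GHIDRA_MCP_"}


def backend_setting(backend, suffix, default=None, environ=None):
    """Read PREFIX + suffix for the given backend, falling back to default."""
    environ = os.environ if environ is None else environ
    return environ.get(PREFIXES[backend] + suffix, default)
```

<p>The same suffix therefore resolves to <code class="language-plaintext highlighter-rouge">IDA_MCP_EXAMPLE</code> on one backend and <code class="language-plaintext highlighter-rouge">GHIDRA_MCP_EXAMPLE</code> on the other.</p>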

<h2 id="links">Links</h2>

<ul>
  <li><strong>Repository</strong>: <a href="https://github.com/jtsylve/re-mcp">github.com/jtsylve/re-mcp</a></li>
  <li><strong>PyPI</strong>: <a href="https://pypi.org/project/re-mcp-ida/">re-mcp-ida</a> · <a href="https://pypi.org/project/re-mcp-ghidra/">re-mcp-ghidra</a> · <a href="https://pypi.org/project/re-mcp/">re-mcp</a></li>
</ul>

<p>If you run into issues or have feature requests, please <a href="https://github.com/jtsylve/re-mcp/issues">open an issue</a> on GitHub.</p>

<hr />

<p><em>IDA Pro and Hex-Rays are trademarks of Hex-Rays SA. Ghidra is developed by the National Security Agency. re-mcp is an independent project and is not affiliated with or endorsed by Hex-Rays or the NSA.</em></p>]]></content><author><name></name></author><category term="reverse-engineering" /><category term="tools" /><category term="ida-pro" /><category term="ghidra" /><category term="mcp" /><category term="llm" /><category term="ai" /><category term="idalib" /><category term="pyghidra" /><category term="reverse-engineering" /><summary type="html"><![CDATA[When I started building ida-mcp, the goal was simple: give an LLM headless access to IDA Pro through MCP (Model Context Protocol). Open a binary, decompile functions, follow cross-references, rename symbols.]]></summary></entry><entry><title type="html">ida-mcp 2.2: From Tool Calls to Analysis Scripts</title><link href="https://jtsylve.blog/post/2026/04/21/ida-mcp-2.2" rel="alternate" type="text/html" title="ida-mcp 2.2: From Tool Calls to Analysis Scripts" /><published>2026-04-21T00:00:00+00:00</published><updated>2026-04-21T00:00:00+00:00</updated><id>https://jtsylve.blog/post/2026/04/21/ida-mcp-2.2</id><content type="html" xml:base="https://jtsylve.blog/post/2026/04/21/ida-mcp-2.2"><![CDATA[<p><a href="https://github.com/jtsylve/ida-mcp">ida-mcp 2.2.0</a> is out. This release removes the friction between what the LLM <em>wants</em> to do and what MCP lets it express in a single round trip.</p>

<p>In 2.1, each action was a discrete tool call: decompile this function, get cross-references to that address, rename this symbol. Every step was a full MCP round trip. Every intermediate result landed in the context window. An analysis workflow that a human would express as a ten-line IDAPython script became thirty sequential tool calls, each waiting for the previous one to return before the LLM could decide what to do next. The LLM knew what it wanted to do, but it couldn’t say it all at once.</p>

<p>2.2 introduces meta-tools that let the LLM operate at a higher level of abstraction: writing multi-step analysis scripts, issuing bulk operations, and calling tools it discovers at runtime. It also makes the server persistent, so analysis state survives across sessions. And for the first time, ida-mcp can analyze firmware and raw binaries directly.</p>

<h2 id="meta-tools">Meta-tools</h2>

<h3 id="execute-sandboxed-analysis-scripts"><code class="language-plaintext highlighter-rouge">execute</code>: sandboxed analysis scripts</h3>

<p><code class="language-plaintext highlighter-rouge">execute</code> accepts Python code that calls IDA tools through <code class="language-plaintext highlighter-rouge">await invoke(name, params)</code>, with the full language available: loops, conditionals, list comprehensions, regex, and <code class="language-plaintext highlighter-rouge">struct</code> unpacking. Individual tools are still the right choice for simple operations, but for multi-step analysis, the LLM becomes a script writer.</p>
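<p>Scripts like the ones in this post use <code class="language-plaintext highlighter-rouge">await</code> and <code class="language-plaintext highlighter-rouge">return</code> at the top level, which plain <code class="language-plaintext highlighter-rouge">exec()</code> rejects. A common way to support that, sketched here as an assumption rather than taken from ida-mcp’s source, is to indent the script into the body of an async function, compile it, and await the result. None of the sandboxing that <code class="language-plaintext highlighter-rouge">execute</code> provides is reproduced here; never run untrusted code this way.</p>

```python
import asyncio
import textwrap


async def run_script(source, invoke):
    """Run script text that uses top-level `await` and `return`.

    The text is indented into the body of an async function, so both become
    legal. Caveat: multi-line string literals in the script pick up the
    added indentation. No sandboxing is performed.
    """
    wrapped = "async def __script__(invoke):\n" + textwrap.indent(source, "    ")
    namespace = {}
    exec(compile(wrapped, "execute-script", "exec"), namespace)
    return await namespace["__script__"](invoke)
```

<p>Here <code class="language-plaintext highlighter-rouge">invoke</code> is whatever coroutine the host passes in: a real server would dispatch to its tools, while a test can pass a stub that echoes its arguments, e.g. <code class="language-plaintext highlighter-rouge">asyncio.run(run_script(code, stub))</code>.</p>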

<p>Consider a common reverse engineering task: finding every function that references an error string and understanding how each one handles the error. In 2.1, this was a multi-step conversation:</p>

<ol>
  <li>Call <code class="language-plaintext highlighter-rouge">get_strings</code> with a filter → get back 40 matching strings</li>
  <li>Call <code class="language-plaintext highlighter-rouge">get_xrefs_to</code> for the first string address → get back 3 cross-references</li>
  <li>Call <code class="language-plaintext highlighter-rouge">decompile_function</code> for each referencing function → get back pseudocode</li>
  <li>Repeat steps 2–3 for each of the remaining 39 strings</li>
</ol>

<p>That’s potentially 160+ tool calls, each a full round trip, with the LLM holding intermediate addresses in context between calls. If the context window fills up mid-workflow, earlier results get compacted and the LLM loses track of where it was.</p>

<p>With <code class="language-plaintext highlighter-rouge">execute</code>, the same workflow is a single tool call:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">strings</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">get_strings</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">filter</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">error|fail|panic</span><span class="sh">"</span><span class="p">})</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">strings</span><span class="p">[</span><span class="sh">"</span><span class="s">strings</span><span class="sh">"</span><span class="p">]:</span>
    <span class="n">xrefs</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">get_xrefs_to</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">s</span><span class="p">[</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">]})</span>
    <span class="k">for</span> <span class="n">xref</span> <span class="ow">in</span> <span class="n">xrefs</span><span class="p">[</span><span class="sh">"</span><span class="s">xrefs</span><span class="sh">"</span><span class="p">]:</span>
        <span class="n">decomp</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">decompile_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">xref</span><span class="p">[</span><span class="sh">"</span><span class="s">from</span><span class="sh">"</span><span class="p">]})</span>
        <span class="n">results</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
            <span class="sh">"</span><span class="s">string</span><span class="sh">"</span><span class="p">:</span> <span class="n">s</span><span class="p">[</span><span class="sh">"</span><span class="s">value</span><span class="sh">"</span><span class="p">],</span>
            <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">:</span> <span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">],</span>
            <span class="sh">"</span><span class="s">pseudocode</span><span class="sh">"</span><span class="p">:</span> <span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">pseudocode</span><span class="sh">"</span><span class="p">]</span>
        <span class="p">})</span>
<span class="k">return</span> <span class="n">results</span>
</code></pre></div></div>

<p>One round trip. The LLM gets back a structured result containing every error-handling function with its decompiled pseudocode. No intermediate state to track, no context window spent on addresses it only needed temporarily. And if the LLM decides the approach is wrong, it’s only wasted one tool call finding out.</p>

<p>Any “get a list, then process each item” workflow collapses from O(n) tool calls to one. The bigger gain is for workflows that don’t reduce to sequential calls: conditional logic, data transformation, or cross-referencing between results.</p>

<p><strong>Automated renaming based on string references:</strong></p>

<p>A stripped binary might have thousands of <code class="language-plaintext highlighter-rouge">sub_*</code> functions with no meaningful names, but many of them reference string literals that hint at their purpose. A human analyst would scan through decompiled output, spot a string like <code class="language-plaintext highlighter-rouge">"failed to parse header"</code>, and rename the function accordingly. With <code class="language-plaintext highlighter-rouge">execute</code>, the LLM can do this systematically across the entire binary in a single tool call:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">re</span>

<span class="n">funcs</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">list_functions</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">filter</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">sub_</span><span class="sh">"</span><span class="p">})</span>
<span class="n">renamed</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">func</span> <span class="ow">in</span> <span class="n">funcs</span><span class="p">[</span><span class="sh">"</span><span class="s">functions</span><span class="sh">"</span><span class="p">]:</span>
    <span class="n">decomp</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">decompile_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">func</span><span class="p">[</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">]})</span>
    <span class="n">strings</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">findall</span><span class="p">(</span><span class="sa">r</span><span class="sh">'"</span><span class="s">([^</span><span class="sh">"</span><span class="s">]{4,})</span><span class="sh">"'</span><span class="p">,</span> <span class="n">decomp</span><span class="p">[</span><span class="sh">"</span><span class="s">pseudocode</span><span class="sh">"</span><span class="p">])</span>
    <span class="k">if</span> <span class="n">strings</span><span class="p">:</span>
        <span class="n">candidate</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="sa">r</span><span class="sh">'</span><span class="s">[^a-zA-Z0-9_]</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">_</span><span class="sh">'</span><span class="p">,</span> <span class="n">strings</span><span class="p">[</span><span class="mi">0</span><span class="p">])[:</span><span class="mi">40</span><span class="p">]</span>
        <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">rename_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">func</span><span class="p">[</span><span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">],</span>
            <span class="sh">"</span><span class="s">new_name</span><span class="sh">"</span><span class="p">:</span> <span class="sa">f</span><span class="sh">"</span><span class="s">uses_</span><span class="si">{</span><span class="n">candidate</span><span class="si">}</span><span class="sh">"</span>
        <span class="p">})</span>
        <span class="n">renamed</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span><span class="sh">"</span><span class="s">old</span><span class="sh">"</span><span class="p">:</span> <span class="n">func</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">],</span> <span class="sh">"</span><span class="s">new</span><span class="sh">"</span><span class="p">:</span> <span class="sa">f</span><span class="sh">"</span><span class="s">uses_</span><span class="si">{</span><span class="n">candidate</span><span class="si">}</span><span class="sh">"</span><span class="p">})</span>
<span class="k">return</span> <span class="p">{</span><span class="sh">"</span><span class="s">renamed</span><span class="sh">"</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">renamed</span><span class="p">),</span> <span class="sh">"</span><span class="s">functions</span><span class="sh">"</span><span class="p">:</span> <span class="n">renamed</span><span class="p">}</span>
</code></pre></div></div>

<p>The names this generates are rough: a first pass rather than a final answer. But <code class="language-plaintext highlighter-rouge">uses_failed_to_parse_header</code> is vastly more useful than <code class="language-plaintext highlighter-rouge">sub_140001A30</code> when you’re trying to understand a binary’s structure, and the LLM can refine them in a second pass once it understands the broader architecture.</p>
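<p>One rough edge to be aware of: two functions that reference the same string literal get the same candidate name. IDA generally refuses to give two functions the same name, so the second <code class="language-plaintext highlighter-rouge">rename_function</code> call would likely fail. Below is a minimal sketch of one way to disambiguate, in plain Python. The helpers <code class="language-plaintext highlighter-rouge">candidate_name</code> and <code class="language-plaintext highlighter-rouge">make_unique_name</code> are hypothetical and not part of ida-mcp.</p>

```python
import re


def candidate_name(literal: str, prefix: str = "uses_", max_len: int = 40) -> str:
    """Mirror the script above: sanitize the literal, truncate, then add the prefix."""
    return prefix + re.sub(r"[^a-zA-Z0-9_]", "_", literal)[:max_len]


def make_unique_name(base: str, taken: set) -> str:
    """Return base, or base_2, base_3, ... if an earlier rename already claimed it."""
    name, suffix = base, 2
    while name in taken:
        name = f"{base}_{suffix}"
        suffix += 1
    taken.add(name)
    return name


taken = set()
for literal in ["failed to parse header", "failed to parse header", "bad CRC"]:
    print(make_unique_name(candidate_name(literal), taken))
# uses_failed_to_parse_header
# uses_failed_to_parse_header_2
# uses_bad_CRC
```

<p>If you also seed <code class="language-plaintext highlighter-rouge">taken</code> with the names already present in the database, the script avoids collisions with existing functions too.</p>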

<p><strong>Cross-database patch diffing:</strong></p>

<p>Patch analysis requires comparing function lists between two versions of a library, identifying what was added or removed, and diffing the implementations that exist in both. Without <code class="language-plaintext highlighter-rouge">execute</code>, the LLM would pull function lists from each database in separate tool calls, hold both in context, compute set differences itself, and decompile shared functions one at a time to find the ones that changed. That means dozens of round trips, with large intermediate results sitting in context.</p>

<p>With <code class="language-plaintext highlighter-rouge">execute</code>, the entire triage happens server-side:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">old_funcs</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">list_functions</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">database</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">libcrypto_1.1.1</span><span class="sh">"</span><span class="p">})</span>
<span class="n">new_funcs</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">list_functions</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span><span class="sh">"</span><span class="s">database</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">libcrypto_1.1.2</span><span class="sh">"</span><span class="p">})</span>

<span class="n">old_names</span> <span class="o">=</span> <span class="p">{</span><span class="n">f</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">]</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">old_funcs</span><span class="p">[</span><span class="sh">"</span><span class="s">functions</span><span class="sh">"</span><span class="p">]}</span>
<span class="n">new_names</span> <span class="o">=</span> <span class="p">{</span><span class="n">f</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">]</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">new_funcs</span><span class="p">[</span><span class="sh">"</span><span class="s">functions</span><span class="sh">"</span><span class="p">]}</span>

<span class="n">added</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">new_names</span> <span class="o">-</span> <span class="n">old_names</span><span class="p">)</span>
<span class="n">removed</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">old_names</span> <span class="o">-</span> <span class="n">new_names</span><span class="p">)</span>

<span class="c1"># Spot-check shared functions for implementation changes
</span><span class="n">changed</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">old_names</span> <span class="o">&amp;</span> <span class="n">new_names</span><span class="p">)[:</span><span class="mi">30</span><span class="p">]:</span>
    <span class="n">old_dec</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">decompile_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="sh">"</span><span class="s">database</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">libcrypto_1.1.1</span><span class="sh">"</span>
    <span class="p">})</span>
    <span class="n">new_dec</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">decompile_function</span><span class="sh">"</span><span class="p">,</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">address</span><span class="sh">"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="sh">"</span><span class="s">database</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">libcrypto_1.1.2</span><span class="sh">"</span>
    <span class="p">})</span>
    <span class="k">if</span> <span class="n">old_dec</span><span class="p">[</span><span class="sh">"</span><span class="s">pseudocode</span><span class="sh">"</span><span class="p">]</span> <span class="o">!=</span> <span class="n">new_dec</span><span class="p">[</span><span class="sh">"</span><span class="s">pseudocode</span><span class="sh">"</span><span class="p">]:</span>
        <span class="n">changed</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>

<span class="k">return</span> <span class="p">{</span>
    <span class="sh">"</span><span class="s">added</span><span class="sh">"</span><span class="p">:</span> <span class="n">added</span><span class="p">[:</span><span class="mi">50</span><span class="p">],</span>
    <span class="sh">"</span><span class="s">removed</span><span class="sh">"</span><span class="p">:</span> <span class="n">removed</span><span class="p">[:</span><span class="mi">50</span><span class="p">],</span>
    <span class="sh">"</span><span class="s">changed</span><span class="sh">"</span><span class="p">:</span> <span class="n">changed</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">summary</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">added</span><span class="sh">"</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">added</span><span class="p">),</span>
        <span class="sh">"</span><span class="s">removed</span><span class="sh">"</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">removed</span><span class="p">),</span>
        <span class="sh">"</span><span class="s">shared_checked</span><span class="sh">"</span><span class="p">:</span> <span class="nf">min</span><span class="p">(</span><span class="mi">30</span><span class="p">,</span> <span class="nf">len</span><span class="p">(</span><span class="n">old_names</span> <span class="o">&amp;</span> <span class="n">new_names</span><span class="p">)),</span>
        <span class="sh">"</span><span class="s">shared_changed</span><span class="sh">"</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">changed</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">database</code> parameter override lets a single <code class="language-plaintext highlighter-rouge">execute</code> block work across multiple open databases. Each <code class="language-plaintext highlighter-rouge">invoke</code> call can target a different database by name. The LLM gets back a structured summary of what changed between versions, and can then drill into specific changed functions in follow-up calls. The set operations, sorting, and conditional comparison all happen server-side rather than burning context on intermediate data the LLM only needs to pass through.</p>
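<p>Comparing pseudocode with <code class="language-plaintext highlighter-rouge">!=</code> is a coarse test. Functions often move between builds, so IDA's address-bearing dummy names (<code class="language-plaintext highlighter-rouge">sub_</code>, <code class="language-plaintext highlighter-rouge">loc_</code>, <code class="language-plaintext highlighter-rouge">dword_</code>, and so on) can change even when the logic did not. Here is a minimal sketch, assuming the pseudocode arrives as plain text. It masks those names before comparing and builds a unified diff with the standard library's <code class="language-plaintext highlighter-rouge">difflib</code>. The prefix list is illustrative rather than exhaustive. <code class="language-plaintext highlighter-rouge">difflib</code> may or may not be on the sandbox's allow-list, so the diff step could just as well run client-side.</p>

```python
import difflib
import re

# IDA-style dummy names embed an address (e.g. sub_140001A30, dword_14001F000).
_DUMMY_NAME = re.compile(r"\b(sub|loc|unk|byte|word|dword|qword|off|asc|stru)_[0-9A-Fa-f]+\b")


def normalize(pseudocode: str) -> str:
    """Replace address-bearing dummy names so relocated code compares equal."""
    return _DUMMY_NAME.sub(r"\1_ADDR", pseudocode)


def pseudocode_diff(old: str, new: str, name: str = "func") -> list:
    """Return unified-diff lines, or an empty list if only addresses changed."""
    return list(difflib.unified_diff(
        normalize(old).splitlines(),
        normalize(new).splitlines(),
        fromfile=f"old/{name}",
        tofile=f"new/{name}",
        lineterm="",
    ))


old = "int f() {\n  return sub_401000(1);\n}"
moved = "int f() {\n  return sub_402340(1);\n}"    # same logic, callee relocated
patched = "int f() {\n  return sub_402340(2);\n}"  # argument actually changed

print(pseudocode_diff(old, moved))  # prints []
print("\n".join(pseudocode_diff(old, patched, "f")))
```

<p>This is only a first-order filter. Decompiler-assigned locals (<code class="language-plaintext highlighter-rouge">v1</code>, <code class="language-plaintext highlighter-rouge">v2</code>, …) can still renumber between versions and show up as noise.</p>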

<h4 id="the-sandbox">The sandbox</h4>

<p>The code runs in a <a href="https://restrictedpython.readthedocs.io/">RestrictedPython</a> sandbox. The LLM can import <code class="language-plaintext highlighter-rouge">re</code>, <code class="language-plaintext highlighter-rouge">struct</code>, <code class="language-plaintext highlighter-rouge">json</code>, <code class="language-plaintext highlighter-rouge">math</code>, <code class="language-plaintext highlighter-rouge">collections</code>, <code class="language-plaintext highlighter-rouge">itertools</code>, <code class="language-plaintext highlighter-rouge">functools</code>, and a few other safe standard library modules. It cannot access the filesystem, open network connections, or spawn subprocesses. Attribute access to dunder names (<code class="language-plaintext highlighter-rouge">__class__</code>, <code class="language-plaintext highlighter-rouge">__globals__</code>, <code class="language-plaintext highlighter-rouge">__code__</code>) is blocked at the AST level, closing the classic introspection routes out of a Python sandbox. Print output is capped at ~1 MiB to prevent runaway loops from exhausting worker memory.</p>
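<p>To show what "blocked at the AST level" means, here is an illustrative sketch using the standard <code class="language-plaintext highlighter-rouge">ast</code> module. It inspects a script's syntax tree for dunder attribute access without ever executing it, and pairs that with a byte-capped output buffer of the kind a print limit implies. This is not RestrictedPython's implementation, which does its own and stricter checking. The class and function names are hypothetical. A real sandbox must also handle <code class="language-plaintext highlighter-rouge">getattr()</code> calls with string arguments, which an attribute-node check alone does not catch.</p>

```python
import ast


class DunderAttributeChecker(ast.NodeVisitor):
    """Record every attribute access whose name is a dunder such as __class__."""

    def __init__(self):
        self.violations = []

    def visit_Attribute(self, node: ast.Attribute) -> None:
        if node.attr.startswith("__") and node.attr.endswith("__"):
            self.violations.append((node.lineno, node.attr))
        self.generic_visit(node)  # keep walking into the object being accessed


def find_dunder_access(source: str) -> list:
    """Parse (never execute) the script and list the accesses a sandbox would reject."""
    checker = DunderAttributeChecker()
    checker.visit(ast.parse(source))
    return checker.violations


class CappedOutput:
    """Collect printed text but stop accepting it past a byte budget."""

    def __init__(self, limit_bytes: int = 1024 * 1024):  # ~1 MiB, per the post
        self.limit_bytes = limit_bytes
        self.parts = []
        self.used = 0

    def write(self, text: str) -> None:
        size = len(text.encode("utf-8"))
        if self.used + size > self.limit_bytes:
            raise RuntimeError("print output limit exceeded")
        self.parts.append(text)
        self.used += size

    def getvalue(self) -> str:
        return "".join(self.parts)


escape = "().__class__.__bases__[0].__subclasses__()"
print(sorted(attr for _, attr in find_dunder_access(escape)))
# ['__bases__', '__class__', '__subclasses__']

out = CappedOutput(limit_bytes=16)
out.write("hello, ")
try:
    out.write("this line is too long")
except RuntimeError as err:
    print(err)  # print output limit exceeded
```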

<p>Database lifecycle tools (<code class="language-plaintext highlighter-rouge">open_database</code>, <code class="language-plaintext highlighter-rouge">close_database</code>, <code class="language-plaintext highlighter-rouge">wait_for_analysis</code>) are blocked inside the sandbox; an <code class="language-plaintext highlighter-rouge">execute</code> block shouldn’t be spawning or tearing down workers as a side effect. The meta-tools themselves (<code class="language-plaintext highlighter-rouge">execute</code>, <code class="language-plaintext highlighter-rouge">batch</code>, <code class="language-plaintext highlighter-rouge">call</code>) are also blocked to prevent recursion. Everything else (decompilation, disassembly, renaming, commenting, type manipulation, structure editing) is available through <code class="language-plaintext highlighter-rouge">await invoke()</code>.</p>

<p>A failed <code class="language-plaintext highlighter-rouge">invoke</code> call raises a Python exception that the script can catch with <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">except</code>, or that terminates the block with an error message if uncaught.</p>
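<p>The post doesn't show how <code class="language-plaintext highlighter-rouge">invoke</code> is wired up. Still, the rules in the two paragraphs above can be illustrated with a small <code class="language-plaintext highlighter-rouge">asyncio</code> sketch. A blocklist is checked before dispatch, and tool failures are re-raised as one exception type that the script can catch. Only the blocked tool names come from the post. <code class="language-plaintext highlighter-rouge">ToolError</code>, <code class="language-plaintext highlighter-rouge">make_invoke</code>, and the stand-in <code class="language-plaintext highlighter-rouge">fake_rename</code> tool are hypothetical.</p>

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

BLOCKED_IN_SANDBOX = frozenset({
    "open_database", "close_database", "wait_for_analysis",  # database lifecycle
    "execute", "batch", "call",                               # meta-tools: no recursion
})


class ToolError(Exception):
    """Hypothetical exception type raised into the script on refusal or failure."""


def make_invoke(tools: Dict[str, Callable[..., Awaitable[object]]]):
    """Build an invoke() coroutine bound to a table of tool implementations."""

    async def invoke(name: str, params: Optional[dict] = None):
        if name in BLOCKED_IN_SANDBOX:
            raise ToolError(f"{name} is not available inside execute")
        if name not in tools:
            raise ToolError(f"unknown tool: {name}")
        try:
            return await tools[name](**(params or {}))
        except Exception as exc:  # surface any tool failure as one catchable type
            raise ToolError(f"{name} failed: {exc}") from exc

    return invoke


async def fake_rename(address, new_name):
    """Stand-in tool that rejects anything not shaped like a hex address."""
    if not address.startswith("0x"):
        raise ValueError(f"bad address {address!r}")
    return {"address": address, "name": new_name}


async def script(invoke):
    """An execute-style script that catches failures and keeps going."""
    calls = [
        ("rename_function", {"address": "0x401000", "new_name": "parse_header"}),
        ("rename_function", {"address": "oops", "new_name": "x"}),
        ("open_database", {"file_path": "/tmp/other.bin"}),
    ]
    results = []
    for tool, params in calls:
        try:
            results.append(await invoke(tool, params))
        except ToolError as err:
            results.append({"error": str(err)})
    return results


invoke = make_invoke({"rename_function": fake_rename})
print(asyncio.run(script(invoke)))
```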

<p>If the LLM writes an <code class="language-plaintext highlighter-rouge">execute</code> block that contains a single <code class="language-plaintext highlighter-rouge">invoke</code> call with no processing logic around it, the server detects this and returns a hint suggesting the simpler <code class="language-plaintext highlighter-rouge">call</code> meta-tool instead. Small nudges like this help the LLM learn the right tool for the job over the course of a session.</p>
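<p>The post doesn't say how the server recognizes a trivial block. One plausible approach, sketched here with the standard <code class="language-plaintext highlighter-rouge">ast</code> module, is to parse the script and check whether its whole body is a single <code class="language-plaintext highlighter-rouge">await invoke(...)</code>, optionally returned. The function name <code class="language-plaintext highlighter-rouge">is_trivial_execute</code> and the exact heuristic are assumptions. It deliberately ignores near-trivial variants such as assigning the result and then returning it.</p>

```python
import ast


def _is_invoke_await(node: ast.AST) -> bool:
    """True for an expression of the form `await invoke(...)`."""
    return (isinstance(node, ast.Await)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "invoke")


def is_trivial_execute(source: str) -> bool:
    """True if the script is one `await invoke(...)` with no logic around it."""
    # ast.parse only builds the tree, so top-level await/return are accepted here;
    # the "outside function" checks belong to the compiler, not the parser.
    tree = ast.parse(source)
    if len(tree.body) != 1:
        return False
    stmt = tree.body[0]
    if isinstance(stmt, (ast.Expr, ast.Return)) and stmt.value is not None:
        return _is_invoke_await(stmt.value)
    return False


print(is_trivial_execute('return await invoke("decompile_function", {"address": "0x401000"})'))  # True
print(is_trivial_execute('r = await invoke("list_functions", {})\nreturn len(r["functions"])'))   # False
```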

<h3 id="batch-bulk-operations-without-scripting-overhead"><code class="language-plaintext highlighter-rouge">batch</code>: bulk operations without scripting overhead</h3>

<p>Not every multi-call workflow needs control flow. Sometimes it’s the same operation twenty times: decompile a list of functions, rename a set of symbols, add comments at known addresses. For these, <code class="language-plaintext highlighter-rouge">execute</code> is overkill, paying sandbox overhead just to loop over a list. <code class="language-plaintext highlighter-rouge">batch</code> handles the case directly: it takes a list of operations and runs them sequentially with per-item error handling.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"operations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"decompile_function"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401000"</span><span class="p">}},</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"decompile_function"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401100"</span><span class="p">}},</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"rename_function"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401000"</span><span class="p">,</span><span class="w"> </span><span class="nl">"new_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"parse_header"</span><span class="p">}},</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"rename_function"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401100"</span><span class="p">,</span><span class="w"> </span><span class="nl">"new_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"validate_checksum"</span><span class="p">}},</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"set_comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401000"</span><span class="p">,</span><span class="w"> </span><span class="nl">"comment"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Entry point for packet parsing"</span><span class="p">}},</span><span class="w">
    </span><span class="p">{</span><span class="nl">"tool"</span><span class="p">:</span><span class="w"> </span><span class="s2">"set_comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x401100"</span><span class="p">,</span><span class="w"> </span><span class="nl">"comment"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CRC-32 validation"</span><span class="p">}}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Each call accepts up to 50 operations, and different tools can be mixed freely. This example decompiles two functions, renames them, and annotates them. That is six operations spanning three tools, collapsed into a single call where 2.1 needed at least one call per tool.</p>

<p>In 2.1, batching was baked into individual tools: <code class="language-plaintext highlighter-rouge">decompile_function</code> accepted up to 50 addresses, <code class="language-plaintext highlighter-rouge">get_xrefs_to</code> accepted up to 50, each with its own batch parameter format. The LLM had to remember which tools supported batching and how each one worked. The unified <code class="language-plaintext highlighter-rouge">batch</code> meta-tool replaces all of that: a list of <code class="language-plaintext highlighter-rouge">{tool, params}</code> objects. Any tool can be batched.</p>

<p><code class="language-plaintext highlighter-rouge">stop_on_error</code> controls whether the batch aborts on the first failure or continues collecting results. The default is to continue: if 30 functions are being renamed and one address is invalid, the other 29 still succeed. The response includes per-operation success/failure status, so the LLM can see exactly what failed and decide whether to retry or move on.</p>
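<p>To make the <code class="language-plaintext highlighter-rouge">stop_on_error</code> semantics concrete, here is a minimal <code class="language-plaintext highlighter-rouge">asyncio</code> sketch of a sequential batch runner. The 50-operation cap and the continue-by-default behavior come from the post. The shape of each result entry (<code class="language-plaintext highlighter-rouge">index</code>, <code class="language-plaintext highlighter-rouge">ok</code>, <code class="language-plaintext highlighter-rouge">result</code>/<code class="language-plaintext highlighter-rouge">error</code>) and the stand-in <code class="language-plaintext highlighter-rouge">rename_function</code> coroutine are assumptions, not ida-mcp's actual response format.</p>

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

MAX_OPERATIONS = 50  # per-call limit stated in the post

ToolFn = Callable[..., Awaitable[object]]


async def run_batch(operations: List[dict], tools: Dict[str, ToolFn],
                    stop_on_error: bool = False) -> List[dict]:
    """Run {tool, params} operations sequentially, recording per-operation status."""
    if len(operations) > MAX_OPERATIONS:
        raise ValueError(f"at most {MAX_OPERATIONS} operations per batch")
    results = []
    for index, op in enumerate(operations):
        name = op.get("tool")
        try:
            if name not in tools:
                raise LookupError(f"unknown tool: {name}")
            value = await tools[name](**op.get("params", {}))
            results.append({"index": index, "tool": name, "ok": True, "result": value})
        except Exception as exc:
            results.append({"index": index, "tool": name, "ok": False, "error": str(exc)})
            if stop_on_error:
                break  # abort the remaining operations; results so far are still returned
    return results


async def rename_function(address: str, new_name: str) -> dict:
    """Stand-in tool: rejects anything that doesn't look like a hex address."""
    if not address.startswith("0x"):
        raise ValueError(f"invalid address {address!r}")
    return {"address": address, "name": new_name}


tools = {"rename_function": rename_function}
ops = [
    {"tool": "rename_function", "params": {"address": "0x401000", "new_name": "parse_header"}},
    {"tool": "rename_function", "params": {"address": "bogus", "new_name": "oops"}},
    {"tool": "rename_function", "params": {"address": "0x401100", "new_name": "validate_checksum"}},
]
print([r["ok"] for r in asyncio.run(run_batch(ops, tools))])         # [True, False, True]
print(len(asyncio.run(run_batch(ops, tools, stop_on_error=True))))  # 2
```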

<p>The split is straightforward: if there’s no data dependency between operations (the output of one doesn’t feed into another), the LLM uses <code class="language-plaintext highlighter-rouge">batch</code>. If the workflow chains outputs, filters intermediate results, or applies conditional logic, it writes an <code class="language-plaintext highlighter-rouge">execute</code> script.</p>

<h3 id="call-and-get_schema-the-discovery-layer"><code class="language-plaintext highlighter-rouge">call</code> and <code class="language-plaintext highlighter-rouge">get_schema</code>: the discovery layer</h3>

<p>2.1 introduced progressive tool discovery: ~20 core tools registered upfront, the rest discoverable via <code class="language-plaintext highlighter-rouge">search_tools</code> and callable through <code class="language-plaintext highlighter-rouge">call_tool</code>. 2.2 refines this into a cleaner surface:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">search_tools</code></strong>: regex search over tool names, descriptions, and tags. Returns compact signatures by default; pass <code class="language-plaintext highlighter-rouge">detail="detailed"</code> for descriptions or <code class="language-plaintext highlighter-rouge">detail="full"</code> for complete schemas.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">get_schema</code></strong>: fetch the full parameter schema for a specific tool by name, skipping the search when the LLM already knows what it wants.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">call</code></strong>: invoke any tool by name (renamed from <code class="language-plaintext highlighter-rouge">call_tool</code>), including hidden tools not in the client’s tool list.</li>
</ul>

<p>About 25 tools are now pinned (up from ~20), and the total count is down to ~125 after 2.1’s resource consolidation. The remaining ~100 specialized tools are discoverable through <code class="language-plaintext highlighter-rouge">search_tools</code> and callable through <code class="language-plaintext highlighter-rouge">call</code>, <code class="language-plaintext highlighter-rouge">batch</code>, or <code class="language-plaintext highlighter-rouge">execute</code>.</p>
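<p>The sketch below illustrates regex search with detail levels over a toy in-memory catalog. Only the tool names and the compact/detailed/full levels come from the post. The <code class="language-plaintext highlighter-rouge">ToolInfo</code> class, the descriptions and tags, case-insensitive matching, and the exact shape of each detail level are all assumptions.</p>

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolInfo:
    name: str
    description: str
    tags: List[str] = field(default_factory=list)
    params: dict = field(default_factory=dict)  # JSON-schema-like parameter map

    def signature(self) -> str:
        return f"{self.name}({', '.join(self.params)})"


def search_tools(catalog: List[ToolInfo], pattern: str, detail: str = "compact") -> list:
    """Regex-match names, descriptions, and tags; shape the output by detail level."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = [t for t in catalog
            if regex.search(t.name) or regex.search(t.description)
            or any(regex.search(tag) for tag in t.tags)]
    if detail == "compact":
        return [t.signature() for t in hits]
    if detail == "detailed":
        return [{"signature": t.signature(), "description": t.description} for t in hits]
    if detail == "full":
        return [{"name": t.name, "description": t.description, "tags": t.tags,
                 "parameters": t.params} for t in hits]
    raise ValueError(f"unknown detail level: {detail}")


catalog = [
    ToolInfo("decompile_function", "Decompile a function to pseudocode",
             ["decompiler"], {"address": {"type": "string"}}),
    ToolInfo("rename_function", "Rename the function at an address",
             ["naming"], {"address": {"type": "string"}, "new_name": {"type": "string"}}),
    ToolInfo("set_comment", "Attach a comment to an address",
             ["annotation"], {"address": {"type": "string"}, "comment": {"type": "string"}}),
]
print(search_tools(catalog, "rename|comment"))
# ['rename_function(address, new_name)', 'set_comment(address, comment)']
```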

<p>Together, the five meta-tools form a hierarchy:</p>

<table>
  <thead>
    <tr>
      <th>Need</th>
      <th>Meta-tool</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Find a tool</td>
      <td><code class="language-plaintext highlighter-rouge">search_tools</code></td>
    </tr>
    <tr>
      <td>Check its parameters</td>
      <td><code class="language-plaintext highlighter-rouge">get_schema</code></td>
    </tr>
    <tr>
      <td>Call it once</td>
      <td><code class="language-plaintext highlighter-rouge">call</code> (or directly, if pinned)</td>
    </tr>
    <tr>
      <td>Call many tools independently</td>
      <td><code class="language-plaintext highlighter-rouge">batch</code></td>
    </tr>
    <tr>
      <td>Chain tool outputs with logic</td>
      <td><code class="language-plaintext highlighter-rouge">execute</code></td>
    </tr>
  </tbody>
</table>

<p>The LLM picks the right level without prompting. A quick rename uses a pinned tool directly. A bulk annotation uses <code class="language-plaintext highlighter-rouge">batch</code>. A multi-step investigation uses <code class="language-plaintext highlighter-rouge">execute</code>. When it needs something specialized (applying a calling convention, editing register variables), it searches, checks the schema, and calls through <code class="language-plaintext highlighter-rouge">call</code>.</p>

<h2 id="daemon-mode">Daemon mode</h2>

<p>The meta-tools only pay off if the server stays alive long enough to use them. In 2.1, ida-mcp ran as a stdio subprocess of the MCP client. When the client disconnected (closing an editor, cycling a session, restarting after a crash), the server process died and took all worker state with it. Every open database, every completed auto-analysis pass, every renamed function: gone. For quick, single-session analysis, this was acceptable. But reverse engineering work rarely fits in a single session. You open a binary, let auto-analysis run, rename a few hundred functions, apply types, and then come back the next day to continue. Or the session cycles for an unrelated reason and you lose everything.</p>

<p>The problem was worse in Claude Code, where subagents share a single MCP session. A subagent halfway through analyzing a firmware image (hundreds of functions renamed, types applied) loses everything when the session cycles. It reconnects, but has to reopen, re-analyze, and reconstruct its progress from whatever survived context compaction.</p>

<p>In 2.2, the server runs as a persistent HTTP daemon behind a lightweight stdio proxy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LLM Client  &lt;──stdio──&gt;  Proxy  &lt;──HTTP──&gt;  Daemon
                                             Workers + Databases
</code></pre></div></div>

<p>The first time an MCP client connects, the proxy spawns a daemon process and detaches it. Subsequent connections (including reconnections after a session cycle, from a different editor, or from a completely new conversation) reuse the running daemon. Workers and their databases persist across disconnects: renamed symbols, added comments, applied types all survive.</p>

<p>The daemon also supports collaboration across clients. If a human analyst has been annotating a binary through one MCP session, a second session connecting to the same daemon sees all those annotations immediately. The daemon doesn’t care who made the changes; it just maintains the databases.</p>

<p>The daemon listens on <code class="language-plaintext highlighter-rouge">127.0.0.1</code> with a per-instance 256-bit bearer token. The state file is written with <code class="language-plaintext highlighter-rouge">0600</code> permissions so only the spawning user can read the token. To stop the daemon:</p>
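<p>Here is a minimal sketch of the two security measures described above: generating a 256-bit token and writing it to a file only its owner can read, then checking a bearer header in constant time. ida-mcp's actual state-file path and layout aren't described in the post. The JSON keys, the port number, and the function names here are hypothetical.</p>

```python
import hmac
import json
import os
import secrets
import tempfile


def write_state_file(path: str, port: int) -> str:
    """Generate a fresh 256-bit bearer token and save it with owner-only permissions."""
    token = secrets.token_hex(32)  # 32 random bytes = 256 bits, as 64 hex characters
    # Passing the mode to os.open means a newly created file is never group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    if hasattr(os, "fchmod"):
        os.fchmod(fd, 0o600)  # the mode above only applies when the file is first created
    with os.fdopen(fd, "w") as fh:
        json.dump({"host": "127.0.0.1", "port": port, "token": token}, fh)
    return token


def is_authorized(header_value: str, expected_token: str) -> bool:
    """Check an "Authorization: Bearer <token>" header value in constant time."""
    scheme, _, presented = header_value.partition(" ")
    if scheme != "Bearer":
        return False
    return hmac.compare_digest(presented.encode(), expected_token.encode())


state_path = os.path.join(tempfile.mkdtemp(), "daemon-state.json")
token = write_state_file(state_path, port=8765)
print(is_authorized(f"Bearer {token}", token))  # True
print(is_authorized("Bearer guess", token))     # False
```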

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ida-mcp stop
</code></pre></div></div>

<p>This is the default transport now. Existing MCP client configurations (<code class="language-plaintext highlighter-rouge">ida-mcp</code> as the command) work without changes. The proxy handles daemon lifecycle transparently.</p>

<h2 id="raw-binary-and-firmware-support">Raw binary and firmware support</h2>

<p>ida-mcp could already open ELF, PE, and Mach-O files, where IDA auto-detects the architecture and load address from file headers. But firmware analysis (bootloaders, ROM dumps, flash extractions) starts with a blob of bytes and no metadata. Previously, you had to preprocess the binary in IDA’s GUI or write a loader script before ida-mcp could work with it. In 2.2, <code class="language-plaintext highlighter-rouge">open_database</code> accepts three new parameters that give the LLM what it needs to bootstrap analysis on raw binaries:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">processor</code></strong>: the IDA processor module with an optional variant (e.g., <code class="language-plaintext highlighter-rouge">arm:ARMv7-M</code> for Cortex-M firmware, <code class="language-plaintext highlighter-rouge">metapc:80386p</code> for 32-bit x86, <code class="language-plaintext highlighter-rouge">mips:mipsl</code> for little-endian MIPS)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">loader</code></strong>: explicit loader selection (e.g., <code class="language-plaintext highlighter-rouge">"Binary file"</code> for raw blobs)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">base_address</code></strong>: the load address in hex or decimal (e.g., <code class="language-plaintext highlighter-rouge">"0x08000000"</code> for a typical STM32 flash base)</li>
</ul>

<p>For structured formats, these parameters are optional. IDA figures them out from the file headers. For raw binaries, the LLM needs to provide them. If the user says “analyze this Cortex-M firmware dump loaded at 0x08000000,” those three parameters map directly:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"file_path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/path/to/firmware.bin"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"processor"</span><span class="p">:</span><span class="w"> </span><span class="s2">"arm:ARMv7-M"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"loader"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Binary file"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"base_address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x08000000"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The server validates processor names and catches a subtle headless-mode pitfall: processor names like <code class="language-plaintext highlighter-rouge">arm</code>, <code class="language-plaintext highlighter-rouge">metapc</code>, and <code class="language-plaintext highlighter-rouge">mips</code> are ambiguous. In IDA’s GUI, selecting one of these pops up a dialog asking which variant you mean: ARM or AArch64? 32-bit or 64-bit x86? But headless <code class="language-plaintext highlighter-rouge">idalib</code> never shows that dialog. It silently picks a default, and the default is often wrong. A Cortex-M firmware blob opened with bare <code class="language-plaintext highlighter-rouge">arm</code> ends up disassembled as AArch64, producing nonsense.</p>

<p>The server rejects these bare names on raw binaries and returns the available variants with descriptions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"arm" is ambiguous for raw binaries. It defaults to AArch64 in headless mode.
Use a specific variant:
  arm:ARMv7-M    Cortex-M (32-bit Thumb-2)
  arm:ARMv7-A    32-bit A-profile
  arm:AArch64    64-bit (explicit)
</code></pre></div></div>
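<p>A minimal sketch of the validation step: split <code class="language-plaintext highlighter-rouge">module[:variant]</code>, refuse bare ambiguous modules when opening a raw binary, and parse <code class="language-plaintext highlighter-rouge">base_address</code> as hex or decimal. The ambiguous module names and the ARM variants come from the post. Hard-coding them, the <code class="language-plaintext highlighter-rouge">ProcessorError</code> type, and the function names are assumptions. A real server would more likely enumerate variants from IDA itself.</p>

```python
from typing import Optional, Tuple

# Bare module names the post calls out as ambiguous in headless mode.
AMBIGUOUS_FOR_RAW = {"arm", "metapc", "mips"}

# Suggested variants, copied from the example error above; only ARM is listed here.
KNOWN_VARIANTS = {
    "arm": {
        "ARMv7-M": "Cortex-M (32-bit Thumb-2)",
        "ARMv7-A": "32-bit A-profile",
        "AArch64": "64-bit (explicit)",
    },
}


class ProcessorError(ValueError):
    """Hypothetical error type for rejected processor specs."""


def parse_processor(spec: str, raw_binary: bool) -> Tuple[str, Optional[str]]:
    """Split "module[:variant]" and refuse bare ambiguous modules for raw blobs."""
    module, sep, variant = spec.strip().partition(":")
    if not module or (sep and not variant):
        raise ProcessorError(f"malformed processor spec {spec!r}")
    if raw_binary and not variant and module in AMBIGUOUS_FOR_RAW:
        variants = KNOWN_VARIANTS.get(module, {})
        hint = "\n".join(f"  {module}:{name:<10} {desc}" for name, desc in variants.items())
        raise ProcessorError(
            f'"{module}" is ambiguous for raw binaries. Use a specific variant:\n'
            + (hint or "  (call list_targets to enumerate variants)")
        )
    return module, variant or None


def parse_base_address(text: str) -> int:
    """Accept "0x"-prefixed hex or plain decimal ("0x08000000" or "134217728")."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)  # explicit base 10 tolerates leading zeros, unlike int(x, 0)


print(parse_processor("arm:ARMv7-M", raw_binary=True))  # ('arm', 'ARMv7-M')
print(hex(parse_base_address("0x08000000")))            # 0x8000000

message = ""
try:
    parse_processor("arm", raw_binary=True)
except ProcessorError as err:
    message = str(err)
print(message)
```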

<p>The LLM can also call <code class="language-plaintext highlighter-rouge">list_targets</code> to enumerate all available processors and loaders, so it can match an unknown binary to the right target without guessing.</p>

<h2 id="fat-mach-o-support">Fat Mach-O support</h2>

<p>macOS universal binaries pack multiple architecture slices into a single file. In 2.1, opening one would silently pick whichever slice IDA defaulted to, usually arm64, even when the target was x86_64. Nothing indicated the wrong slice had been selected until the disassembly didn’t make sense.</p>

<p>In 2.2, the server parses the fat header, identifies the available slices, and requires the caller to choose explicitly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AmbiguousFatBinary: universal binary contains multiple architectures.
Available slices: arm64, arm64e, x86_64
Pass fat_arch="arm64" to select a slice.
</code></pre></div></div>

<p>Each slice gets its own <code class="language-plaintext highlighter-rouge">.i64</code> sidecar (<code class="language-plaintext highlighter-rouge">binary.arm64.i64</code>, <code class="language-plaintext highlighter-rouge">binary.x86_64.i64</code>), so multiple architectures can be opened simultaneously in separate workers. Combined with <code class="language-plaintext highlighter-rouge">execute</code>’s cross-database support, the LLM can decompile the same function in both the arm64 and x86_64 slices and diff the pseudocode. This helps when finding platform-specific behavior, verifying that a vulnerability affects all architectures, or understanding how the compiler optimized differently for each target.</p>

<p>The fat header parser also handles an edge case that has bitten other tools: Java <code class="language-plaintext highlighter-rouge">.class</code> files share the same magic bytes (<code class="language-plaintext highlighter-rouge">0xCAFEBABE</code>) as Mach-O fat binaries. The parser validates slice counts and CPU types to distinguish the two, so a directory full of Java classes won’t trigger false fat-binary detection.</p>
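<p>The post doesn't include the parser, so here is a minimal sketch of the general technique using <code class="language-plaintext highlighter-rouge">struct</code>. It reads the big-endian fat header, rejects implausible slice counts, and requires every slice to have a known CPU type. The slice-count check is what separates Java class files: their minor/major version fields sit where <code class="language-plaintext highlighter-rouge">nfat_arch</code> would be, and Java major versions start at 45. The sketch has several limits. It handles only the 32-bit fat format, not the 64-bit <code class="language-plaintext highlighter-rouge">FAT_MAGIC_64</code> variant, and it knows just four CPU types. It also assumes <code class="language-plaintext highlighter-rouge">data</code> holds the entire file, and the 30-slice threshold is an arbitrary choice for illustration.</p>

```python
import struct
from typing import List, Optional

FAT_MAGIC = 0xCAFEBABE  # big-endian; also the first four bytes of every Java .class
CPU_ARCH_ABI64 = 0x01000000
CPU_SUBTYPE_ARM64E = 2

CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    12: "arm",
    12 | CPU_ARCH_ABI64: "arm64",
}

# A genuine fat header should report far fewer slices than Java's major version (>= 45).
MAX_PLAUSIBLE_SLICES = 30  # heuristic threshold chosen for this sketch


def fat_slices(data: bytes) -> Optional[List[str]]:
    """Return slice names for a 32-bit fat Mach-O, or None if it isn't one."""
    if len(data) < 8:
        return None
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC or not 1 <= nfat_arch <= MAX_PLAUSIBLE_SLICES:
        return None
    if len(data) < 8 + 20 * nfat_arch:  # each fat_arch entry is 20 bytes
        return None
    names = []
    for i in range(nfat_arch):
        cputype, cpusubtype, offset, size, _align = struct.unpack_from(">5I", data, 8 + 20 * i)
        name = CPU_NAMES.get(cputype)
        if name is None or offset + size > len(data):  # unknown CPU or slice past EOF
            return None
        if name == "arm64" and (cpusubtype & 0x00FFFFFF) == CPU_SUBTYPE_ARM64E:
            name = "arm64e"  # the high byte carries capability flags, so mask it off
        names.append(name)
    return names


table_end = 8 + 20 * 3
universal = struct.pack(">II", FAT_MAGIC, 3) + b"".join(
    struct.pack(">5I", cpu, sub, table_end, 0, 0)
    for cpu, sub in [(0x0100000C, 0), (0x0100000C, 0x80000002), (0x01000007, 3)]
)
java_class = struct.pack(">IHH", FAT_MAGIC, 0, 52)  # header of a Java 8 class file

print(fat_slices(universal))   # ['arm64', 'arm64e', 'x86_64']
print(fat_slices(java_class))  # None
```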

<h2 id="tuning-for-your-model-and-client">Tuning for your model and client</h2>

<p>Not every model writes good Python, and not every MCP client needs server-side tool discovery. The meta-tools are designed to be independently useful, so you can enable the ones that match your setup and disable the ones that don’t.</p>

<p>Three environment variables control which meta-tools are available:</p>

<ul>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">IDA_MCP_DISABLE_EXECUTE</code></strong>: hides the <code class="language-plaintext highlighter-rouge">execute</code> meta-tool. Smaller models or those without strong code generation can produce unreliable Python in <code class="language-plaintext highlighter-rouge">execute</code> blocks: wrong parameter names, broken control flow, off-by-one iteration. For these models, discrete tool calls are more reliable: each call is independently validated, and errors are clear and localized. Disabling <code class="language-plaintext highlighter-rouge">execute</code> keeps <code class="language-plaintext highlighter-rouge">batch</code> for bulk operations and <code class="language-plaintext highlighter-rouge">call</code> for hidden tools.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">IDA_MCP_DISABLE_BATCH</code></strong>: hides the <code class="language-plaintext highlighter-rouge">batch</code> meta-tool. Useful if your workflow routes all multi-step work through <code class="language-plaintext highlighter-rouge">execute</code> anyway, since having both visible can lead the LLM to pick the wrong one.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">IDA_MCP_DISABLE_TOOL_SEARCH</code></strong>: disables server-side progressive disclosure entirely. All ~125 tools become directly visible in the client’s tool list, and <code class="language-plaintext highlighter-rouge">search_tools</code> and <code class="language-plaintext highlighter-rouge">get_schema</code> are removed. This is the right setting for clients like Claude Code that implement their own tool deferral. Claude Code already defers tool schemas and loads them on demand. If ida-mcp <em>also</em> hides tools behind <code class="language-plaintext highlighter-rouge">search_tools</code>, the LLM has to go through two layers of discovery to reach a specialized tool. Disabling the server-side layer removes that redundancy.</p>
  </li>
</ul>
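
<p>Because each variable switches off one piece independently, the combinations are easy to reason about. The sketch below models that composition in Python. It is hypothetical rather than ida-mcp’s implementation: the tool names come from this post, <code class="language-plaintext highlighter-rouge">visible_meta_tools</code> and <code class="language-plaintext highlighter-rouge">flag</code> are made-up names, and treating <code class="language-plaintext highlighter-rouge">"1"</code>, <code class="language-plaintext highlighter-rouge">"true"</code>, or <code class="language-plaintext highlighter-rouge">"yes"</code> as “set” is an assumption. The post’s own configuration example only uses <code class="language-plaintext highlighter-rouge">"1"</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
from collections.abc import Mapping

# Assumption: any of these values counts as "set"; the post only shows "1".
TRUTHY = {"1", "true", "yes"}


def flag(env: Mapping[str, str], name: str) -> bool:
    """Return True if the named variable is set to a truthy value."""
    return env.get(name, "").strip().lower() in TRUTHY


def visible_meta_tools(env: Mapping[str, str] = os.environ) -> set[str]:
    """Model how the three variables compose (hypothetical, not ida-mcp code)."""
    tools = {"execute", "batch", "call", "search_tools", "get_schema"}
    if flag(env, "IDA_MCP_DISABLE_EXECUTE"):
        tools.discard("execute")
    if flag(env, "IDA_MCP_DISABLE_BATCH"):
        tools.discard("batch")
    if flag(env, "IDA_MCP_DISABLE_TOOL_SEARCH"):
        # Every tool is listed directly, so the discovery tools go away.
        # (The post does not say whether call stays; this sketch keeps it.)
        tools -= {"search_tools", "get_schema"}
    return tools


print(sorted(visible_meta_tools({"IDA_MCP_DISABLE_TOOL_SEARCH": "1"})))
# ['batch', 'call', 'execute']
</code></pre></div></div>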

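<p>These are environment variables read by the server process, so they apply to all sessions against that daemon. Set them in the <code class="language-plaintext highlighter-rouge">env</code> block of the ida-mcp server entry in your MCP client configuration:</p>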

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ida-mcp"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"env"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"IDA_MCP_DISABLE_TOOL_SEARCH"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>As a starting point: if you’re using Claude (Opus or Sonnet) through Claude Code, disable tool search. If you’re using a smaller model or a client without native tool deferral, leave tool search enabled and let server-side progressive disclosure handle it. If the model’s Python proves unreliable, disable <code class="language-plaintext highlighter-rouge">execute</code> as well.</p>

<h2 id="other-improvements">Other improvements</h2>

<ul>
  <li><strong>Per-run log files</strong>: Each server run writes to its own timestamped log file, and <code class="language-plaintext highlighter-rouge">open_database</code> warnings (e.g., loader compatibility issues) are surfaced to the client instead of being silently swallowed. When something goes wrong, you can find the relevant log without scrolling through a monolithic file.</li>
  <li><strong>Heartbeat progress reporting</strong>: <code class="language-plaintext highlighter-rouge">save_database</code> calls and <code class="language-plaintext highlighter-rouge">execute</code> blocks send progress notifications every 5 seconds to prevent client timeouts on large databases. Saving a database with millions of functions and extensive annotations can take minutes. Without heartbeats, the MCP client would assume the server had hung and disconnect.</li>
  <li><strong>Database reopen fix</strong>: Reopening an existing <code class="language-plaintext highlighter-rouge">.i64</code> no longer passes stale loader options that caused <code class="language-plaintext highlighter-rouge">idalib</code> to <code class="language-plaintext highlighter-rouge">exit(1)</code> on format mismatch. This was annoying because the failure mode was a silent exit with no error message: the worker just disappeared.</li>
</ul>
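
<p>The heartbeat pattern is straightforward to sketch with <code class="language-plaintext highlighter-rouge">asyncio</code>. This is a minimal illustration, not ida-mcp’s code. <code class="language-plaintext highlighter-rouge">run_with_heartbeat</code> and the <code class="language-plaintext highlighter-rouge">send_progress</code> callback are hypothetical names, and the blocking call is assumed to run in a worker thread. A real MCP server would turn each beat into a <code class="language-plaintext highlighter-rouge">notifications/progress</code> message rather than a print.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio
import time


async def run_with_heartbeat(blocking_fn, send_progress, interval=5.0):
    """Run blocking_fn in a worker thread, awaiting send_progress(n) every
    `interval` seconds until it finishes, then return its result."""
    task = asyncio.create_task(asyncio.to_thread(blocking_fn))
    beats = 0
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=interval)
            if done:
                return task.result()  # re-raises if blocking_fn failed
            beats += 1
            await send_progress(beats)
    finally:
        # No-op once the task has finished. If send_progress raised, this
        # stops waiting on the thread, which still runs to completion.
        task.cancel()


def slow_save():
    """Stand-in for a long-running save_database call."""
    time.sleep(1.2)
    return "saved"


async def main():
    async def report(beat):
        # A real MCP server would emit a notifications/progress message here.
        print(f"heartbeat {beat}: still working")

    # Interval shortened from 5 s so the demo finishes quickly.
    print(await run_with_heartbeat(slow_save, report, interval=0.5))


asyncio.run(main())
</code></pre></div></div>

<p>Running the blocking call in a thread keeps the event loop free to send heartbeats. A blocking call made directly on the loop would stall the notifications along with everything else.</p>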

<h2 id="upgrading">Upgrading</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uv tool <span class="nb">install</span> <span class="nt">--upgrade</span> ida-mcp
</code></pre></div></div>

<p>Or with pip:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">--upgrade</span> ida-mcp
</code></pre></div></div>

<p>The MCP interface is backward compatible, so existing client configurations work without changes. The daemon spawns automatically on first connection.</p>

<h2 id="links">Links</h2>

<ul>
  <li><strong>Repository</strong>: <a href="https://github.com/jtsylve/ida-mcp">github.com/jtsylve/ida-mcp</a></li>
  <li><strong>PyPI</strong>: <a href="https://pypi.org/project/ida-mcp/">pypi.org/project/ida-mcp</a></li>
  <li><strong>Previous post</strong>: <a href="/post/2026/04/07/ida-mcp-2.1">ida-mcp 2.1: Progressive Tool Discovery, Background Analysis, and Batch Operations</a></li>
</ul>

<p>If you run into issues or have feature requests, please <a href="https://github.com/jtsylve/ida-mcp/issues">open an issue</a> on GitHub.</p>

<hr />

<p><em>IDA Pro and Hex-Rays are trademarks of Hex-Rays SA. ida-mcp is an independent project and is not affiliated with or endorsed by Hex-Rays.</em></p>]]></content><author><name></name></author><category term="reverse-engineering" /><category term="tools" /><category term="ida-pro" /><category term="mcp" /><category term="llm" /><category term="ai" /><category term="idalib" /><category term="reverse-engineering" /><summary type="html"><![CDATA[ida-mcp 2.2.0 is out. This release removes the friction between what the LLM wants to do and what MCP lets it express in a single round trip.]]></summary></entry></feed>