Checkpoint Maps and Ephemeral Objects
In our last post, we discussed NX Superblock Objects and how they can be used to locate the Checkpoint Descriptor Area in which they are stored. Today, we will discuss the other type of objects that are stored in the descriptor area, Checkpoint Maps, and how they can be used to find persistent, ephemeral objects on disk.
On-Disk Structures
Each Checkpoint Mapping structure gives information about a single ephemeral object that is stored in the Checkpoint Data Area.
typedef struct checkpoint_mapping {
    uint32_t cpm_type;    // 0x00
    uint32_t cpm_subtype; // 0x04
    uint32_t cpm_size;    // 0x08
    uint32_t cpm_pad;     // 0x0C
    oid_t    cpm_fs_oid;  // 0x10
    oid_t    cpm_oid;     // 0x18
    oid_t    cpm_paddr;   // 0x20
} checkpoint_mapping_t;   // 0x28
- cpm_type: The type of the mapped object
- cpm_subtype: The (optional) subtype of the mapped object
- cpm_size: The size (in bytes) of the mapped object
- cpm_pad: Reserved padding
- cpm_fs_oid: The virtual object identifier of the file system that owns the ephemeral object
- cpm_oid: The object identifier of the mapped object
- cpm_paddr: The physical address of the start of the object
Checkpoint Map Objects contain a simple array of checkpoint_mapping_t structures. Each entry in the map corresponds to an ephemeral object stored in the Checkpoint Data Area. If there are more mappings than can fit into a single Checkpoint Map Object, additional map objects are added to the Checkpoint Descriptor Area. A Checkpoint’s final Checkpoint Map Object is marked with the CHECKPOINT_MAP_LAST flag.
#define CHECKPOINT_MAP_LAST 0x00000001
typedef struct checkpoint_map_phys {
    obj_phys_t           cpm_o;     // 0x00
    uint32_t             cpm_flags; // 0x20
    uint32_t             cpm_count; // 0x24
    checkpoint_mapping_t cpm_map[]; // 0x28
} checkpoint_map_phys_t;
- cpm_o: The object header
- cpm_flags: A set of bit-flags. Currently, only CHECKPOINT_MAP_LAST is defined
- cpm_count: The number of mappings stored in this Checkpoint Map
- cpm_map: An array of cpm_count Checkpoint Mappings
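
To make the layout concrete, here is a minimal sketch of looking up a single mapping by object identifier inside one Checkpoint Map block that has already been read into memory. The helper name cpm_find_mapping is hypothetical, the obj_phys_t layout is assumed to be the standard 32-byte APFS object header, and the code assumes a little-endian host so that on-disk fields can be copied directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t oid_t;
typedef uint64_t xid_t;

/* Assumed layout of the 32-byte APFS object header. */
typedef struct obj_phys {
    uint8_t  o_cksum[8];
    oid_t    o_oid;
    xid_t    o_xid;
    uint32_t o_type;
    uint32_t o_subtype;
} obj_phys_t;

typedef struct checkpoint_mapping {
    uint32_t cpm_type;
    uint32_t cpm_subtype;
    uint32_t cpm_size;
    uint32_t cpm_pad;
    oid_t    cpm_fs_oid;
    oid_t    cpm_oid;
    oid_t    cpm_paddr;
} checkpoint_mapping_t;

/* Offsets taken from the checkpoint_map_phys_t definition above. */
#define CPM_COUNT_OFFSET 0x24u
#define CPM_MAP_OFFSET   0x28u

static_assert(sizeof(obj_phys_t) == 0x20, "object header must be 32 bytes");
static_assert(sizeof(checkpoint_mapping_t) == 0x28, "mapping must be 40 bytes");

/*
 * Hypothetical helper: scan the cpm_map array of one map block for `oid`.
 * Returns 1 and fills *out on success, 0 if the mapping is absent or the
 * block's cpm_count cannot fit in a block of the given size.
 */
int cpm_find_mapping(const uint8_t *block, size_t block_size,
                     oid_t oid, checkpoint_mapping_t *out)
{
    uint32_t count;

    if (block_size < CPM_MAP_OFFSET) {
        return 0;
    }
    memcpy(&count, block + CPM_COUNT_OFFSET, sizeof count);

    /* Never trust cpm_count beyond what the block can physically hold. */
    if (count > (block_size - CPM_MAP_OFFSET) / sizeof(checkpoint_mapping_t)) {
        return 0;
    }

    for (uint32_t i = 0; i < count; i++) {
        checkpoint_mapping_t m;
        /* memcpy avoids unaligned access into the raw block buffer. */
        memcpy(&m, block + CPM_MAP_OFFSET + (size_t)i * sizeof m, sizeof m);
        if (m.cpm_oid == oid) {
            *out = m;
            return 1;
        }
    }
    return 0;
}
```

A linear scan is enough here because a single block holds at most about a hundred mappings at a 4096-byte block size.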
Locating Ephemeral Objects
Once you’ve identified the location of the Checkpoint Data Area, enumeration of on-disk ephemeral objects is simple. NOTE: You cannot rely on the block-zero NX Superblock copy. You must locate the NX Superblock that belongs to the Checkpoint you’re examining.
Because there are relatively few persistent, ephemeral objects, linear-time enumeration of all of a checkpoint’s mappings is practical. This means that no complex data structures stand between us and the objects that we’re looking for.
- The nx_xp_desc_index member of the Checkpoint’s nx_superblock_t stores a zero-based block index into the Checkpoint Descriptor Area. This is the location of the first Checkpoint Map Object. Locate this object and validate it using the checksum stored in its object header.
- Read the cpm_count member of the Checkpoint Map. This contains the number of Checkpoint Mappings stored in the current map.
- Enumerate the mappings stored in the cpm_map array. These mappings each contain information about an on-disk ephemeral object, including the physical block address in which it is stored.
- Once all mappings have been enumerated, read the cpm_flags member. If the bit defined in CHECKPOINT_MAP_LAST is set, you’ve reached the end of your journey; otherwise, there are more ephemeral objects to enumerate.
- The next Checkpoint Map Object should follow the current map object, but it is important to remember that the Checkpoint Descriptor Area acts as a circular buffer. You can determine the number of blocks in the Checkpoint Descriptor Area by reading the nx_xp_desc_blocks member of the NX Superblock and ignoring the most-significant bit. If the current map is stored in the last block of the descriptor area, then the next map will be stored in the first.
// calculating the next index in the circular buffer
next_index = current_index + 1;
if (next_index == (nx_xp_desc_blocks & 0x7FFFFFFF)) {
    next_index = 0;
}

// alternatively...
next_index = (current_index + 1) % (nx_xp_desc_blocks & 0x7FFFFFFF);
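
Putting the steps together, the walk over a checkpoint's map objects might look like the sketch below. It assumes the whole Checkpoint Descriptor Area has been read into one contiguous in-memory buffer and that the host is little-endian; the function name count_checkpoint_mappings is hypothetical, and checksum validation is omitted to keep the loop visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHECKPOINT_MAP_LAST 0x00000001u

/* Offsets taken from the checkpoint_map_phys_t definition. */
#define CPM_FLAGS_OFFSET 0x20u
#define CPM_COUNT_OFFSET 0x24u
#define CPM_MAP_OFFSET   0x28u

/*
 * Hypothetical helper: starting at nx_xp_desc_index, follow Checkpoint Map
 * blocks around the circular descriptor area until CHECKPOINT_MAP_LAST is
 * seen, summing cpm_count along the way.
 * Returns 0 on success, -1 if the inputs are out of range or no block
 * carries the CHECKPOINT_MAP_LAST flag.
 */
int count_checkpoint_mappings(const uint8_t *desc_area, uint32_t block_size,
                              uint32_t nx_xp_desc_blocks,
                              uint32_t nx_xp_desc_index,
                              uint64_t *total_out)
{
    /* Ignore the most-significant bit, as described above. */
    uint32_t nblocks = nx_xp_desc_blocks & 0x7FFFFFFFu;
    uint32_t index = nx_xp_desc_index;
    uint64_t total = 0;

    if (nblocks == 0 || index >= nblocks || block_size < CPM_MAP_OFFSET) {
        return -1;
    }

    /* Visiting more than nblocks blocks would mean we have looped forever. */
    for (uint32_t visited = 0; visited < nblocks; visited++) {
        const uint8_t *block = desc_area + (size_t)index * block_size;
        uint32_t flags, count;

        memcpy(&flags, block + CPM_FLAGS_OFFSET, sizeof flags);
        memcpy(&count, block + CPM_COUNT_OFFSET, sizeof count);
        total += count;

        if (flags & CHECKPOINT_MAP_LAST) {
            *total_out = total;
            return 0;
        }
        index = (index + 1) % nblocks; /* wrap around the circular buffer */
    }
    return -1;
}
```

Bounding the loop by the number of descriptor blocks is a defensive choice for this sketch: a corrupted area in which no map has CHECKPOINT_MAP_LAST set would otherwise never terminate.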
Checkpoint Validation
When loading a checkpoint, each mapping block and its entries must be validated before the ephemeral objects they reference can be trusted:
Mapping block validation:
- o_type must be OBJECT_TYPE_CHECKPOINT_MAP with the physical flag set (0x4000000C)
- o_subtype must be zero
- o_xid must match the checkpoint superblock’s o_xid
- o_oid must equal the physical block address where the mapping block is stored
- cpm_count must not exceed (block_size - 40) / 40 (each checkpoint_mapping_t is 40 bytes, and the fixed header is 40 bytes)
- The last mapping block must have CHECKPOINT_MAP_LAST set; earlier blocks must not
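
The mapping block checks listed above translate fairly directly into code. The following is a sketch rather than a reference implementation: cpm_header_t and cpm_header_ok are hypothetical names, the caller is assumed to know whether the block is expected to be the last one in the chain, and the obj_phys_t layout is assumed to be the standard 32-byte object header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t oid_t;
typedef uint64_t xid_t;

/* Assumed layout of the 32-byte APFS object header. */
typedef struct obj_phys {
    uint8_t  o_cksum[8];
    oid_t    o_oid;
    xid_t    o_xid;
    uint32_t o_type;
    uint32_t o_subtype;
} obj_phys_t;

/* Hypothetical name for the fixed 40-byte prefix of checkpoint_map_phys_t. */
typedef struct cpm_header {
    obj_phys_t cpm_o;
    uint32_t   cpm_flags;
    uint32_t   cpm_count;
} cpm_header_t;

static_assert(sizeof(cpm_header_t) == 40, "fixed header must be 40 bytes");

#define CHECKPOINT_MAP_LAST 0x00000001u
#define CPM_EXPECTED_O_TYPE 0x4000000Cu /* OBJECT_TYPE_CHECKPOINT_MAP | physical flag */
#define CPM_MAPPING_SIZE    40u         /* sizeof(checkpoint_mapping_t) */

/*
 * Returns 1 if the header passes every mapping block check, 0 otherwise.
 * block_paddr: physical block address the header was read from.
 * sb_xid:      o_xid of the checkpoint's NX Superblock.
 * is_last:     nonzero if this is expected to be the final map block.
 */
int cpm_header_ok(const cpm_header_t *h, uint64_t block_paddr, xid_t sb_xid,
                  uint32_t block_size, int is_last)
{
    if (block_size < sizeof(cpm_header_t)) return 0;
    if (h->cpm_o.o_type != CPM_EXPECTED_O_TYPE) return 0;
    if (h->cpm_o.o_subtype != 0) return 0;
    if (h->cpm_o.o_xid != sb_xid) return 0;
    if (h->cpm_o.o_oid != block_paddr) return 0;

    /* (block_size - 40) / 40 mappings fit after the fixed header. */
    uint32_t max_count =
        (block_size - (uint32_t)sizeof(cpm_header_t)) / CPM_MAPPING_SIZE;
    if (h->cpm_count > max_count) return 0;

    /* CHECKPOINT_MAP_LAST must be set on the last block and only there. */
    int has_last = (h->cpm_flags & CHECKPOINT_MAP_LAST) != 0;
    return has_last == (is_last != 0);
}
```

With a 4096-byte block, the count limit works out to (4096 - 40) / 40 = 101 mappings per block.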
Per-entry validation:
- cpm_type must have OBJ_EPHEMERAL (0x80000000) set, and the low 16-bit type must be one of: OBJECT_TYPE_BTREE (2), OBJECT_TYPE_BTREE_NODE (3), OBJECT_TYPE_SPACEMAN (5), OBJECT_TYPE_NX_REAPER (0x11), or OBJECT_TYPE_NX_REAP_LIST (0x12)
- cpm_subtype must not have OBJ_PHYSICAL or OBJ_EPHEMERAL set, and its low 16 bits must be a valid subtype (zero or one of the recognized B-tree subtypes such as OBJECT_TYPE_OMAP, OBJECT_TYPE_FSTREE, or OBJECT_TYPE_SPACEMAN_FREE_QUEUE)
- cpm_oid must be nonzero
- cpm_paddr must fall within the checkpoint data area
- cpm_size must be nonzero, block-aligned, and fit within the data area bounds
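
The per-entry checks can be sketched the same way. Several details here are assumptions made for illustration: the function name checkpoint_mapping_ok is hypothetical, the data area is described by a caller-supplied base block and block count, "fits within the data area bounds" is read as "is no larger than the data area", and the list of acceptable subtype values is left out because the text above only gives examples of it.

```c
#include <stdint.h>

typedef uint64_t oid_t;

typedef struct checkpoint_mapping {
    uint32_t cpm_type;
    uint32_t cpm_subtype;
    uint32_t cpm_size;
    uint32_t cpm_pad;
    oid_t    cpm_fs_oid;
    oid_t    cpm_oid;
    oid_t    cpm_paddr;
} checkpoint_mapping_t;

#define OBJ_EPHEMERAL            0x80000000u
#define OBJ_PHYSICAL             0x40000000u
#define OBJECT_TYPE_MASK         0x0000FFFFu
#define OBJECT_TYPE_BTREE        0x02u
#define OBJECT_TYPE_BTREE_NODE   0x03u
#define OBJECT_TYPE_SPACEMAN     0x05u
#define OBJECT_TYPE_NX_REAPER    0x11u
#define OBJECT_TYPE_NX_REAP_LIST 0x12u

/* The five object types that the list above allows to be ephemeral. */
static int is_allowed_ephemeral_type(uint32_t type)
{
    switch (type) {
    case OBJECT_TYPE_BTREE:
    case OBJECT_TYPE_BTREE_NODE:
    case OBJECT_TYPE_SPACEMAN:
    case OBJECT_TYPE_NX_REAPER:
    case OBJECT_TYPE_NX_REAP_LIST:
        return 1;
    default:
        return 0;
    }
}

/*
 * Returns 1 if the mapping passes the per-entry checks, 0 otherwise.
 * data_base / data_blocks: first block and length (in blocks) of the
 * Checkpoint Data Area, supplied by the caller (an assumption of this sketch).
 */
int checkpoint_mapping_ok(const checkpoint_mapping_t *m, uint32_t block_size,
                          uint64_t data_base, uint64_t data_blocks)
{
    if (block_size == 0) return 0;

    if (!(m->cpm_type & OBJ_EPHEMERAL)) return 0;
    if (!is_allowed_ephemeral_type(m->cpm_type & OBJECT_TYPE_MASK)) return 0;

    /* Only the storage flags of the subtype are checked in this sketch. */
    if (m->cpm_subtype & (OBJ_PHYSICAL | OBJ_EPHEMERAL)) return 0;

    if (m->cpm_oid == 0) return 0;

    /* Subtracting first avoids overflow in data_base + data_blocks. */
    if (m->cpm_paddr < data_base) return 0;
    if (m->cpm_paddr - data_base >= data_blocks) return 0;

    if (m->cpm_size == 0 || m->cpm_size % block_size != 0) return 0;
    if (m->cpm_size / block_size > data_blocks) return 0;

    return 1;
}
```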
When reading ephemeral objects from the data area, verify each object’s checksum and confirm that its o_type, o_subtype, o_oid, and o_xid match the corresponding mapping entry. If any ephemeral object is malformed or metadata ranges overlap, the entire checkpoint is invalid; the mount procedure must fall back to an older checkpoint from the descriptor area.
Recovery from Invalid Checkpoints
If a checkpoint’s mapping blocks or ephemeral objects fail validation, APFS does not give up. It falls back to scanning the checkpoint descriptor area for an older valid checkpoint with a lower transaction identifier. This is why the descriptor area is a circular buffer: it retains multiple checkpoints, and the mount procedure can walk backward to find one that is fully consistent.
On untrusted storage (external or removable media), additional consistency checks are performed on recently-changed container structures. If these checks also fail, the mount procedure continues scanning backward. This ensures that even after a crash that corrupted the most recent checkpoint, the container can recover to a consistent state.
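
As a rough illustration of the fallback, suppose every checkpoint found in the descriptor area has already been run through the validation described above. Choosing which one to mount then reduces to picking the valid candidate with the highest transaction identifier. The checkpoint_candidate_t type and pick_checkpoint function below are hypothetical names for this sketch, not part of APFS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one checkpoint found in the descriptor area. */
typedef struct checkpoint_candidate {
    uint64_t xid;   /* o_xid of the checkpoint's NX Superblock */
    int      valid; /* nonzero if its maps and ephemeral objects validated */
} checkpoint_candidate_t;

/*
 * Returns the index of the valid candidate with the highest xid,
 * or -1 if no candidate validated.
 */
ptrdiff_t pick_checkpoint(const checkpoint_candidate_t *c, size_t n)
{
    ptrdiff_t best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!c[i].valid) {
            continue; /* skip checkpoints that failed validation */
        }
        if (best < 0 || c[i].xid > c[best].xid) {
            best = (ptrdiff_t)i;
        }
    }
    return best;
}
```

A real mount path would more likely validate lazily, newest first, and stop at the first checkpoint that passes; validating everything up front simply keeps the selection logic easy to follow.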
Conclusion
Compared to other kinds of objects in APFS, each checkpoint only maintains a relatively small number of on-disk ephemeral objects. Due to their nature, these objects are likely all read into memory at once when the Checkpoint is mounted. Thanks to these facts, ephemeral objects are stored on disk in a way that is relatively simple for us to find and enumerate. Proper validation of both the mapping blocks and the ephemeral objects they reference is essential for ensuring that the checkpoint represents a consistent state.
If only it were always that simple… Next up in this series we will discuss B-Trees, APFS’s method of choice for referencing potentially large sets of data on disk.