2022 APFS Advent Challenge Day 5 - Checkpoint Maps and Ephemeral Objects
In our last post, we discussed NX Superblock Objects and how they can be used to locate the Checkpoint Descriptor Area in which they are stored. Today, we will discuss the other type of objects that are stored in the descriptor area, Checkpoint Maps, and how they can be used to find persistent, ephemeral objects on disk.
On-Disk Structures
Each Checkpoint Mapping structure gives information about a single ephemeral object that is stored in the Checkpoint Data Area.
typedef struct checkpoint_mapping {
    uint32_t cpm_type;     // 0x00
    uint32_t cpm_subtype;  // 0x04
    uint32_t cpm_size;     // 0x08
    uint32_t cpm_pad;      // 0x0C
    oid_t    cpm_fs_oid;   // 0x10
    oid_t    cpm_oid;      // 0x18
    oid_t    cpm_paddr;    // 0x20
} checkpoint_mapping_t;    // 0x28
cpm_type
: The type of the mapped object
cpm_subtype
: The (optional) subtype of the mapped object
cpm_size
: The size (in bytes) of the mapped object
cpm_pad
: Reserved padding
cpm_fs_oid
: The virtual object identifier of the file system that owns the ephemeral object
cpm_oid
: The object identifier of the mapped object
cpm_paddr
: The physical address of the start of the object
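As a rough illustration of how these fields line up on disk, the sketch below decodes one mapping from a raw byte buffer. It assumes that `oid_t` is a 64-bit integer and that on-disk values are little-endian; `read_le32`, `read_le64`, and `parse_checkpoint_mapping` are hypothetical helper names, not part of any APFS API.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t oid_t; /* assumption: object identifiers are 64 bits wide */

typedef struct checkpoint_mapping {
    uint32_t cpm_type;
    uint32_t cpm_subtype;
    uint32_t cpm_size;
    uint32_t cpm_pad;
    oid_t    cpm_fs_oid;
    oid_t    cpm_oid;
    oid_t    cpm_paddr;
} checkpoint_mapping_t;

/* Four 32-bit fields followed by three 64-bit fields: 16 + 24 = 40 bytes. */
_Static_assert(offsetof(checkpoint_mapping_t, cpm_paddr) == 0x20, "cpm_paddr offset");
_Static_assert(sizeof(checkpoint_mapping_t) == 0x28, "mapping size");

/* Hypothetical helper: read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read a little-endian 64-bit value from a byte buffer. */
uint64_t read_le64(const uint8_t *p) {
    return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Hypothetical helper: decode one 0x28-byte mapping, independent of host byte order. */
checkpoint_mapping_t parse_checkpoint_mapping(const uint8_t *raw) {
    checkpoint_mapping_t m;
    m.cpm_type    = read_le32(raw + 0x00);
    m.cpm_subtype = read_le32(raw + 0x04);
    m.cpm_size    = read_le32(raw + 0x08);
    m.cpm_pad     = read_le32(raw + 0x0C);
    m.cpm_fs_oid  = read_le64(raw + 0x10);
    m.cpm_oid     = read_le64(raw + 0x18);
    m.cpm_paddr   = read_le64(raw + 0x20);
    return m;
}
```

Decoding field by field, rather than casting the buffer to the struct, avoids alignment and endianness surprises when the parser runs on a host that differs from the one that wrote the disk.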
Checkpoint Map Objects contain a simple array of checkpoint_mapping_t structures. Each entry in the map corresponds to an ephemeral object stored in the Checkpoint Data Area. If there are more mappings than can fit into a single Checkpoint Map Object, additional map objects are added to the Checkpoint Descriptor Area. A Checkpoint’s final Checkpoint Map Object is marked with the CHECKPOINT_MAP_LAST flag.
#define CHECKPOINT_MAP_LAST 0x00000001
typedef struct checkpoint_map_phys {
    obj_phys_t           cpm_o;      // 0x00
    uint32_t             cpm_flags;  // 0x20
    uint32_t             cpm_count;  // 0x24
    checkpoint_mapping_t cpm_map[];  // 0x28
} checkpoint_map_phys_t;
cpm_o
: The object header
cpm_flags
: A set of bit-flags. Currently, only CHECKPOINT_MAP_LAST is defined
cpm_count
: The number of mappings stored in this Checkpoint Map
cpm_map
: An array of cpm_count Checkpoint Mappings
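Since the mapping array starts at offset 0x28 and each mapping holds four 32-bit fields and three 64-bit identifiers (0x28 bytes), the capacity of one map object follows from simple arithmetic. The sketch below assumes that a Checkpoint Map Object occupies exactly one block; `max_mappings_per_map` and the two size constants are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* obj_phys_t header (0x20) + cpm_flags (4) + cpm_count (4) */
#define CHECKPOINT_MAP_HEADER_SIZE 0x28u
/* 4 x uint32_t + 3 x 64-bit identifiers */
#define CHECKPOINT_MAPPING_SIZE    0x28u

/* Hypothetical helper: the largest cpm_count that fits in one block,
 * assuming the map object is exactly one block long. */
uint32_t max_mappings_per_map(uint32_t block_size) {
    if (block_size < CHECKPOINT_MAP_HEADER_SIZE) {
        return 0; /* too small to hold even the header */
    }
    return (block_size - CHECKPOINT_MAP_HEADER_SIZE) / CHECKPOINT_MAPPING_SIZE;
}
```

For a 4096-byte block this gives (4096 - 40) / 40 = 101 mappings. A parser could use the result as a sanity check and reject any map whose cpm_count exceeds it before reading the array.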
Locating Ephemeral Objects
Once you’ve identified the location of the Checkpoint Data Area, enumeration of on-disk ephemeral objects is fairly straightforward. NOTE: You cannot rely on the zero-block NX Superblock copy. You must locate the NX Superblock that belongs to the Checkpoint you’re examining.
Because there are relatively few persistent, ephemeral objects, linear-time enumeration of all of a checkpoint’s mappings is practical. This means that no complex data structures stand between us and the objects that we’re looking for.
- The nx_xp_desc_index member of the Checkpoint’s nx_superblock_t stores a zero-based block index into the Checkpoint Descriptor Area. This is the location of the first Checkpoint Map Object. Locate this object and validate it using the checksum stored in its object header.
- Read the cpm_count member of the Checkpoint Map. This contains the number of Checkpoint Mappings stored in the current map.
- Enumerate the mappings stored in the cpm_map array. These mappings each contain information about an on-disk ephemeral object, including the physical block address in which it is stored.
- Once all mappings have been enumerated, read the cpm_flags member. If the bit defined in CHECKPOINT_MAP_LAST is set, you’ve reached the end of your journey; otherwise, there are more ephemeral objects to enumerate.
- The next Checkpoint Map Object should follow the current map object, but it is important to remember that the Checkpoint Descriptor Area acts as a circular buffer. You can determine the number of blocks in the Checkpoint Descriptor Area by reading the nx_xp_desc_blocks member of the NX Superblock and ignoring the most-significant bit. If the current map is stored in the last block of the descriptor area, then the next map will be stored in the first.
// calculating the next index in the circular buffer
next_index = current_index + 1;
if (next_index == (nx_xp_desc_blocks & 0x7FFFFFFF)) {
    next_index = 0;
}
// alternatively...
next_index = (current_index + 1) % (nx_xp_desc_blocks & 0x7FFFFFFF);
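Putting the steps together, the following sketch walks a checkpoint's chain of map objects and totals their mappings. It is a simplified model rather than a full parser: `map_summary_t` is a hypothetical in-memory stand-in for each descriptor-area block (only cpm_flags and cpm_count are kept), reading blocks from disk and checksum validation are left out, and the bound on visited blocks is a defensive addition to guard against a corrupt chain that never sets CHECKPOINT_MAP_LAST.

```c
#include <stdint.h>

#define CHECKPOINT_MAP_LAST 0x00000001u

/* Hypothetical stand-in for one already-parsed Checkpoint Map Object. */
typedef struct {
    uint32_t cpm_flags;
    uint32_t cpm_count;
} map_summary_t;

/* Hypothetical helper: total number of mappings in the checkpoint whose first
 * map sits at nx_xp_desc_index. Returns -1 if the inputs are invalid or if no
 * map with CHECKPOINT_MAP_LAST is found after visiting every block once. */
int64_t count_checkpoint_mappings(const map_summary_t *desc_area,
                                  uint32_t nx_xp_desc_blocks,
                                  uint32_t nx_xp_desc_index) {
    /* Ignore the most-significant bit to get the block count. */
    uint32_t block_count = nx_xp_desc_blocks & 0x7FFFFFFFu;
    if (block_count == 0 || nx_xp_desc_index >= block_count) {
        return -1;
    }

    int64_t total = 0;
    uint32_t index = nx_xp_desc_index;
    for (uint32_t visited = 0; visited < block_count; visited++) {
        const map_summary_t *map = &desc_area[index];
        total += map->cpm_count;
        if (map->cpm_flags & CHECKPOINT_MAP_LAST) {
            return total; /* final map of this checkpoint */
        }
        index = (index + 1) % block_count; /* wrap around the circular buffer */
    }
    return -1; /* chain never terminated */
}
```

In a real parser, the loop body would read and validate the block at `index`, then iterate over its cpm_map entries instead of only adding cpm_count.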
Conclusion
Compared to other kinds of objects in APFS, each checkpoint maintains only a relatively small number of on-disk ephemeral objects. Due to their nature, these objects are likely all read into memory at once when the Checkpoint is mounted. Thanks to these facts, ephemeral objects are stored on disk in a way that is relatively simple for us to find and enumerate.
If only it were always that simple… Next up in this series we will discuss B-Trees – APFS’s method of choice for referencing potentially large sets of data on disk.