2022 APFS Advent Challenge Day 13 - Data Streams

Monday, December 19, 2022

Data in APFS that is too large to store within records are stored elsewhere on disk and referenced by data streams (dstreams). Similar to non-resident attributes in NTFS, APFS data streams manage a set of extents that reference the number and order of blocks on the disk which contain external data. In this post, we will discuss how data streams are used in APFS to manage one or more forks of data in inodes as well as their record structures in the File System Tree.

Inode Default Data Streams

Each file has a default data stream that stores what we typically refer to as the file’s data. This stream’s object identifier may or may not be different from the inode’s. It is stored in the private_id field of the inode’s j_inode_val_t structure. Metadata about the default data stream is stored as a j_dstream_t structure in an inode extended field with the type of INO_EXT_TYPE_DSTREAM.

typedef struct j_dstream {
    uint64_t size;                // 0x00
    uint64_t alloced_size;        // 0x08
    uint64_t default_crypto_id;   // 0x10
    uint64_t total_bytes_written; // 0x18
    uint64_t total_bytes_read;    // 0x20
} j_stream_t;                     // 0x28
  • size: The size of the logical data (in bytes)
  • alloced_size: The total space allocated for the data stream (in bytes), including any unused space
  • default_crypto_id: The default encryption key or tweak used in this data stream
  • total_bytes_written: The total number of bytes that have been written to this data stream
  • total_bytes_read: The total number of bytes that have been read from this data stream

The logical size and allocated size of a dstream may differ. The allocated size is always a factor of the container’s block size. If the file contents do not fill up the last block, then the allocated size may be larger than the logical size. APFS also allows dstreams to be sparsely allocated. Some extent ranges that logically contain all zero-bytes may not be stored on disk. In these instances, the allocated size may be smaller than the logical size of the stream.

The default_crypto_id comes in to play when we’re dealing with encrypted volumes. We will discuss more about APFS encryption in a future post.

The total_bytes_written and total_bytes_read fields are performance counters we can use to determine how often a data stream has been read-from or written-to. They are only periodically updated, and more research is needed to determine what triggers these values to be flushed to disk. Both values are allowed to overflow and reset from zero, so their utility for forensic analysis is relatively limited.

Extended Attributes

Along with the default data stream, files in APFS can also contain other forks. Like in HFS+, these additional data streams are called extended attributes but are similar in concept to alternate data streams in NTFS.

Extended attributes are stored in the File System Tree as records with a type identifier of APFS_TYPE_XATTR and the same object identifier as the inode record. The key half of an extended attribute record is a j_xattr_key_t structure.

typedef struct j_xattr_key {
    j_key_t hdr;       // 0x00
    uint16_t name_len; // 0x08
    uint8_t name[0];   // 0x0A
} j_xattr_key_t;
  • hdr: The record’s header
  • name_len: The length of the extended attribute’s name (in bytes), including the final null character.
  • name: The null-terminated, UTF-8 encoded name of the extended attribute

The value half of the extended attribute record is a j_xattr_val_t structure.

typedef struct j_xattr_val {
    uint16_t flags;     // 0x00
    uint16_t xdata_len; // 0x02
    uint8_t xdata[0];   // 0x04
} j_xattr_val_t;
  • flags: The extended attribute record’s flags
  • xdata_len: The length of the extended attribute’s inline data
  • xdata: The extended attribute data or the identifier of a data stream that contains the data

Extended Attribute Value Flags

Name Value Description
XATTR_DATA_STREAM 0x00000001 The attribute data is stored in a data stream
XATTR_DATA_EMBEDDED 0x00000002 The attribute data is stored directly in the record
XATTR_FILE_SYSTEM_OWNED 0x00000004 The extended attribute record is owned by the file system
XATTR_RESERVED_8 0x00000008 reserved

Like NTFS attributes, APFS extended attributes that are small enough can store their data directly in the attribute record itself. In these instances, the XATTR_DATA_EMBEDDED flag will be set and the stream’s data is stored in the xdata field.

Instead, when the XATTR_DATA_STREAM flag is set, xdata stores a j_xattr_dstream_t structure.

typedef struct j_xattr_dstream {
    uint64_t xattr_obj_id; // 0x00
    j_dstream_t dstream;   // 0x08
};                         // 0x30
  • xattr_obj_id: The object identifier of the extended attribute’s data stream
  • dstream: The metadata of the extended attribute’s data stream (see above)

Data Stream Extents

Except for Sealed Volumes__ (which we will discuss in the future), the _extents of a dstream are stored in the volume’s File System Tree as a set of records with the type APFS_TYPE_FILE_EXTENT. For streams with non-contiguous data, there will be more than one extent record.

The file extent record keys are of the type j_file_extent_key_t and encode the object identifier of the dstream in their record header, along with the logical offset of the extent in the stream.

typedef struct j_file_extent_key {
    j_key_t hdr;           // 0x00
    uint64_t logical_addr; // 0x08
} j_file_extent_key_t;     // 0x10
  • hdr: The record’s header
  • logical_addr: The offset within the file’s data (in bytes) for the data stored in this extent

The value half of a file extent record takes the form of a j_file_extent_val_t structure and is used to denote the physical location of the extent data on disk.

// length and flags masks
#define J_FILE_EXTENT_LEN_MASK 0x00ffffffffffffffULL
#define J_FILE_EXTENT_FLAG_MASK 0xff00000000000000ULL
#define J_FILE_EXTENT_FLAG_SHIFT 56

typedef struct j_file_extent_val {
    uint64_t len_and_flags;  // 0x00
    uint64_t phys_block_num; // 0x08
    uint64_t crypto_id;      // 0x10
} j_file_extent_val_t;       // 0x18
  • len_and_flags: A bit-field encoding the length (in bytes) of the extent in the 56 least significant bits and its flags in the most significant bits
  • phys_block_num: The physical block number of the first block in the extent
  • crypto_id: The encryption key or tweak used in this extent (or zero if not encrypted)

The eight most significant bits of the len_and_flags field are reserved for flags, but no flags are currently defined.

If the value of phys_block_num is zero, then the extent is sparse and should be interpreted as containing all zero bytes.

The crypto_id field is specific to encrypted volumes and will be discussed in a future post.

Conclusion

Understanding data streams and their on-disk structures are essential to analyzing APFS. This post discussed the default data stream, extended attributes, and file extents. Later this week, we will discuss how parsing this information differs in both Sealed and Encrypted volumes.

This post is part of my 2022 APFS Advent Challenge

Every weekday in the month of December, I will attempt to post a blog about APFS internals. For each day that I miss a post, I will donate $100 to support humanitarian aid for the Ukrainian people. If you find value in this series, and would like to support this effort, please consider donating to the GoFundMe. Slava Ukraini! 🇺🇦

Find an issue or technical inaccuracy in this post? Please file an issue so that it may be corrected.