Programs like Everything are incredibly fast at searching by filename. How do they do it? Let's find out!
NTFS, the filesystem used by all modern versions of Windows, has an interesting property. Instead of storing file metadata in tables scattered across the partition (usually to optimise hard disk seek times going from file metadata to actual file data), NTFS stores all the file metadata in a few large contiguous blocks.
(Please note that I will be using "file" to refer to both files and directories, as is done in NTFS.)
These blocks are collectively known as the MFT (master file table).
For those unaware, the metadata for a given file typically includes its filename, its size, and pointers to where the contents of the file can be found in the partition. If the file is a directory, this will also include pointers to where the metadata for its contents can be found in the MFT.
Here's why this is useful. Because file metadata is stored in only a few contiguous blocks in the partition, we can really quickly scan all the files in the partition to get each file's associated metadata. On most other filesystems, you'd have to recurse through each directory manually and enumerate each's contents. This is perfect for when you want to search through an entire partition, or compute summary statistics for directories.
In this tutorial we'll look at how to extract this information.
First, let's get a simple program setup.
We'll be using Sean Barrett's data structures library to keep things simple.
Compile with cl /Zi mftreader.cpp
.
1 2 3 4 5 6 7 8 9 10 11 | #include <stdio.h> #include <assert.h> #include <stdint.h> #include <windows.h> #define STB_DS_IMPLEMENTATION #include "stb_ds.h" int main(int argc, char **argv) { return 0; } |
Instead of reading from a particular file, we need to read from arbitrary locations in the partition, so let's open a partition handle.
1 | drive = CreateFile("\\\\.\\C:", GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL); |
For this to work, you'll need to run the program as administrator. Make sure to not pass the GENERIC_WRITE flag, as you really, really, do not want to modify your filesystem's structures. I am not responsible for any data loss or damage resulting from following this tutorial.
We'll also define a function to read from a specified position in the partition.
1 2 3 4 5 6 7 8 9 | HANDLE drive; void Read(void *buffer, uint64_t from, uint64_t count) { DWORD bytesAccessed; LONG high = from >> 32; SetFilePointer(drive, from & 0xFFFFFFFF, &high, FILE_BEGIN); ReadFile(drive, buffer, count, &bytesAccessed, NULL); assert(bytesAccessed == count); } |
We'll be using assert
s throughout this tutorial instead of proper error checking, as this is just a simple program.
You may wish to replace these later, but make sure to keep in mind that directly reading from the MFT isn't something you're supposed to do.
The filesystem driver could be updating the MFT as we read it, so there may be inconsistencies in the data you gather.
Before we can parse the MFT, we need to find where the first block begins. Once there, we can find pointers to the subsequent blocks.
The position of the first MFT block is stored in the very first sector of the NTFS partition. A sector is a 512-byte section of a partition. The first sector of a partition is affectionately known as the boot sector, since up until very recently it also stored code needed to boot the operating system. I've annotated the fields important to us.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | #pragma pack(push,1) struct BootSector { uint8_t jump[3]; char name[8]; uint16_t bytesPerSector; // The number of bytes in a sector. This should be 512. uint8_t sectorsPerCluster; // The number of sectors in a cluster. Clusters are used for less-granular access to the partition. They're usually 4KB. uint16_t reservedSectors; uint8_t unused0[3]; uint16_t unused1; uint8_t media; uint16_t unused2; uint16_t sectorsPerTrack; uint16_t headsPerCylinder; uint32_t hiddenSectors; uint32_t unused3; uint32_t unused4; uint64_t totalSectors; uint64_t mftStart; // The start of the MFT, given as a cluster index. uint64_t mftMirrorStart; uint32_t clustersPerFileRecord; uint32_t clustersPerIndexBlock; uint64_t serialNumber; uint32_t checksum; uint8_t bootloader[426]; uint16_t bootSignature; }; #pragma pack(pop) BootSector bootSector; |
I've wrapped this structure in pack pragmas so no padding is inserted by the compiler. You'll want to put all other filesystem structures in between these pragmas too.
Now we read the boot sector:
1 | Read(&bootSector, 0, 512); |
And now we can read the first 1KB of the MFT (that's the bit that points to the other blocks in the MFT).
1 2 3 4 5 | #define MFT_FILE_SIZE (1024) uint8_t mftFile[MFT_FILE_SIZE]; uint64_t bytesPerCluster = bootSector.bytesPerSector * bootSector.sectorsPerCluster; Read(&mftFile, bootSector.mftStart * bytesPerCluster, MFT_FILE_SIZE); |
The MFT consists of 1KB entries, and the data describing the MFT is also 1KB. That's because the first entry of the MFT is the data describing the MFT. Since all other entries in the MFT are files, this is stored as if the MFT was a file. It actually works quite neatly. Therefore to find the other MFT blocks, we need to parse the first entry of the MFT.
The basic layout of an MFT entry - called a file record - is the following:
struct FileRecord { FileRecordHeader header; struct { AttributeHeader header; uint8_t contents[...]; // Note: the contents might be stored elsewhere in the partition (a non-resident attribute). } attributes[...]; };
Therefore we'll need some more structure definitions. (You haven't forgotten about the pack pragmas, right?)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | struct FileRecordHeader { uint32_t magic; // "FILE" uint16_t updateSequenceOffset; uint16_t updateSequenceSize; uint64_t logSequence; uint16_t sequenceNumber; uint16_t hardLinkCount; uint16_t firstAttributeOffset; // Number of bytes between the start of the header and the first attribute header. uint16_t inUse : 1; // Whether the record is in use. uint16_t isDirectory : 1; uint32_t usedSize; uint32_t allocatedSize; uint64_t fileReference; uint16_t nextAttributeID; uint16_t unused; uint32_t recordNumber; // The record number. We'll need this later. }; struct AttributeHeader { uint32_t attributeType; // The type of attribute. We'll be interested in $DATA (0x80), and $FILE_NAME (0x30). uint32_t length; // The length of the attribute in the file record. uint8_t nonResident; // false = attribute's contents is stored within the file record in the MFT; true = it's stored elsewhere uint8_t nameLength; uint16_t nameOffset; uint16_t flags; uint16_t attributeID; }; |
Unfortunately, it's a bit more complicated than this. There are some additional fields that are different for non-resident and resident attributes.
See the comment in AttributeHeader.nonResident
for an explanation of the difference between resident and non-resident attributes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | struct ResidentAttributeHeader : AttributeHeader { uint32_t attributeLength; uint16_t attributeOffset; uint8_t indexed; uint8_t unused; }; struct NonResidentAttributeHeader : AttributeHeader { uint64_t firstCluster; uint64_t lastCluster; uint16_t dataRunsOffset; // The offset in bytes from the start of attribute header to the description of where the attribute's contents can be found. uint16_t compressionUnit; uint32_t unused; uint64_t attributeAllocated; uint64_t attributeSize; uint64_t streamDataSize; }; |
We're going to search for the $DATA attribute in the file record. The $DATA attribute is used to store the contents of a file, so in the case of the MFT, it stores the MFT. Since the entire MFT can't fit within a single MFT file record, the $DATA attribute will be non-resident. That means the attribute will instead list the contiguous blocks that do contain the MFT.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | FileRecordHeader *fileRecord = (FileRecordHeader *) mftFile; AttributeHeader *attribute = (AttributeHeader *) (mftFile + fileRecord->firstAttributeOffset); NonResidentAttributeHeader *dataAttribute = nullptr; assert(fileRecord->magic == 0x454C4946); while (true) { if (attribute->attributeType == 0x80) { dataAttribute = (NonResidentAttributeHeader *) attribute; } else if (attribute->attributeType == 0xFFFFFFFF) { break; } attribute = (AttributeHeader *) ((uint8_t *) attribute + attribute->length); } assert(dataAttribute); |
As described above, a non-resident attribute lists the contiguous blocks that make up the attribute's contents. Therefore, with the MFT's $DATA attribute, we have the list of blocks containing the MFT. (Remember - we only got the location of the first MFT block from the bootloader.)
Each block is called a data run.
Data runs must begin and end at cluster boundaries (usually every 4KB - see the BootSector
definition).
In NTFS, the array of data runs is stored in the following format.
struct DataRun { uint8_t lengthFieldBytes : 4; uint8_t offsetFieldBytes : 4; uint64_t length : lengthFieldBytes * 8; // Length of the data run in clusters. int64_t offset : offsetFieldBytes * 8; // Offset of the start of this data run in clusters from the start of the previous data run, or the start of the partition if this is the first run. };
Variable length fields... it's complicated. We'll have to parse these carefully. First, the header structure.
1 2 3 4 | struct RunHeader { uint8_t lengthFieldBytes : 4; uint8_t offsetFieldBytes : 4; }; |
Loop through each data run in the $DATA attribute.
1 2 3 4 5 6 7 | RunHeader *dataRun = (RunHeader *) ((uint8_t *) dataAttribute + dataAttribute->dataRunsOffset); uint64_t clusterNumber = 0; while (((uint8_t *) dataRun - (uint8_t *) dataAttribute) < dataAttribute->length && dataRun->lengthFieldBytes) { ... dataRun = (RunHeader *) ((uint8_t *) dataRun + 1 + dataRun->lengthFieldBytes + dataRun->offsetFieldBytes); } |
For each data run we need to calculate its length and absolute offset (i.e. from the start of the partition).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | uint64_t length = 0, offset = 0; for (int i = 0; i < dataRun->lengthFieldBytes; i++) { length |= (uint64_t) (((uint8_t *) dataRun)[1 + i]) << (i * 8); } for (int i = 0; i < dataRun->offsetFieldBytes; i++) { offset |= (uint64_t) (((uint8_t *) dataRun)[1 + dataRun->lengthFieldBytes + i]) << (i * 8); } if (offset & ((uint64_t) 1 << (dataRun->offsetFieldBytes * 8 - 1))) { // Sign extend the offset. for (int i = dataRun->offsetFieldBytes; i < 8; i++) { offset |= (uint64_t) 0xFF << (i * 8); } } clusterNumber += offset; |
Then we have clusterNumber
and length
calculated.
We've made it! It's time to load each MFT block and scan the file records. Because an MFT block can be quite large (hundred of MBs), we'll process it in 64MB chunks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | #define MFT_FILES_PER_BUFFER (65536) uint8_t mftBuffer[MFT_FILES_PER_BUFFER * MFT_FILE_SIZE]; uint64_t filesRemaining = length * bytesPerCluster / MFT_FILE_SIZE; uint64_t positionInBlock = 0; while (filesRemaining) { fprintf(stderr, "...\n"); uint64_t filesToLoad = MFT_FILES_PER_BUFFER; if (filesRemaining < MFT_FILES_PER_BUFFER) filesToLoad = filesRemaining; Read(&mftBuffer, clusterNumber * bytesPerCluster + positionInBlock, filesToLoad * MFT_FILE_SIZE); positionInBlock += filesToLoad * MFT_FILE_SIZE; filesRemaining -= filesToLoad; for (int i = 0; i < filesToLoad; i++) { // Process the file record. } } |
Standard stuff. Now we process each file.
Let's ignore any file records that aren't in use.
1 2 3 4 | FileRecordHeader *fileRecord = (FileRecordHeader *) (mftBuffer + MFT_FILE_SIZE * i); assert(fileRecord->magic == 0x454C4946); if (!fileRecord->inUse) continue; |
Now we need to find the $FILENAME attribute. (There might be multiple $FILENAME attributes actually - that's how hard links work in NTFS.) The attribute parsing code is the same as before.
1 2 3 4 5 6 7 8 9 10 11 | AttributeHeader *attribute = (AttributeHeader *) ((uint8_t *) fileRecord + fileRecord->firstAttributeOffset); while ((uint8_t *) attribute - (uint8_t *) fileRecord < MFT_FILE_SIZE) { if (attribute->attributeType == 0x30) { // Parse the $FILE_NAME attribute. } else if (attribute->attributeType == 0xFFFFFFFF) { break; } attribute = (AttributeHeader *) ((uint8_t *) attribute + attribute->length); } |
The $FILE_NAME attribute contains the file's name in its contents. The filename is unlikely to be very long, so this attribute will be resident. (If you never understood the whole non-resident stuff, this just means the filename will be within the file record - we don't have to hunt it down elsewhere.)
Here's the header for the attribute.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | struct FileNameAttributeHeader : ResidentAttributeHeader { uint64_t parentRecordNumber : 48; // The record number (see FileRecordHeader.recordNumber) of the directory containing this file. uint64_t sequenceNumber : 16; uint64_t creationTime; uint64_t modificationTime; uint64_t metadataModificationTime; uint64_t readTime; uint64_t allocatedSize; uint64_t realSize; uint32_t flags; uint32_t repase; uint8_t fileNameLength; // The length of the filename in wide chars. uint8_t namespaceType; // If this = 2 then it's the DOS name of the file. We ignore these. wchar_t fileName[1]; // The filename in UTF-16. }; |
There's a couple other interesting fields in there you might want to grab, but we'll ignore them for now. We'll just store an array of each file's name and its parent.
1 2 3 4 5 6 | struct File { uint64_t parent; // The index in the files array of the parent. char *name; // The file name. Zero-terminated; UTF-8. }; File *files; |
For every $FILE_NAME attribute, we'll add an entry to this array.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | FileNameAttributeHeader *fileNameAttribute = (FileNameAttributeHeader *) attribute; if (fileNameAttribute->namespaceType != 2 && !fileNameAttribute->nonResident) { File file = {}; file.parent = fileNameAttribute->parentRecordNumber; file.name = DuplicateName(fileNameAttribute->fileName, fileNameAttribute->fileNameLength); // We'll write the DuplicateName function next! uint64_t oldLength = arrlenu(files); if (fileRecord->recordNumber >= oldLength) { // Initialise skipped fields to 0. arrsetlen(files, fileRecord->recordNumber + 1); memset(files + oldLength, 0, sizeof(File) * (fileRecord->recordNumber - oldLength)); } files[fileRecord->recordNumber] = file; } |
We need to store the filenames elsewhere, since the MFT buffer will be reused. The following function allocates spaces for the filename, and converts it to UTF-8. We group allocations together into 16MB chunks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | char *DuplicateName(wchar_t *name, size_t nameLength) { static char *allocationBlock = nullptr; static size_t bytesRemaining = 0; size_t bytesNeeded = WideCharToMultiByte(CP_UTF8, 0, name, nameLength, NULL, 0, NULL, NULL) + 1; if (bytesRemaining < bytesNeeded) { allocationBlock = (char *) malloc((bytesRemaining = 16 * 1024 * 1024)); } char *buffer = allocationBlock; buffer[bytesNeeded - 1] = 0; // Zero-terminate. WideCharToMultiByte(CP_UTF8, 0, name, nameLength, allocationBlock, bytesNeeded, NULL, NULL); bytesRemaining -= bytesNeeded; allocationBlock += bytesNeeded; return buffer; } |
...and that's it! We're done.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 | /* This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include <stdio.h> #include <assert.h> #include <stdint.h> #include <windows.h> #define STB_DS_IMPLEMENTATION #include "stb_ds.h" #pragma pack(push,1) struct BootSector { uint8_t jump[3]; char name[8]; uint16_t bytesPerSector; uint8_t sectorsPerCluster; uint16_t reservedSectors; uint8_t unused0[3]; uint16_t unused1; uint8_t media; uint16_t unused2; uint16_t sectorsPerTrack; uint16_t headsPerCylinder; uint32_t hiddenSectors; uint32_t unused3; uint32_t unused4; uint64_t totalSectors; uint64_t mftStart; uint64_t mftMirrorStart; uint32_t clustersPerFileRecord; uint32_t clustersPerIndexBlock; uint64_t serialNumber; uint32_t checksum; uint8_t bootloader[426]; uint16_t bootSignature; }; struct FileRecordHeader { uint32_t magic; uint16_t updateSequenceOffset; uint16_t updateSequenceSize; uint64_t logSequence; uint16_t sequenceNumber; uint16_t hardLinkCount; uint16_t firstAttributeOffset; uint16_t inUse : 1; uint16_t isDirectory : 1; uint32_t usedSize; uint32_t allocatedSize; uint64_t fileReference; uint16_t nextAttributeID; uint16_t unused; uint32_t recordNumber; }; struct AttributeHeader { uint32_t attributeType; uint32_t length; uint8_t nonResident; uint8_t nameLength; uint16_t nameOffset; uint16_t flags; uint16_t attributeID; }; struct ResidentAttributeHeader : AttributeHeader { uint32_t attributeLength; uint16_t attributeOffset; uint8_t indexed; uint8_t unused; }; struct FileNameAttributeHeader : ResidentAttributeHeader { uint64_t parentRecordNumber : 48; uint64_t sequenceNumber : 16; uint64_t creationTime; uint64_t modificationTime; uint64_t metadataModificationTime; uint64_t readTime; uint64_t allocatedSize; uint64_t realSize; uint32_t flags; uint32_t repase; uint8_t fileNameLength; uint8_t namespaceType; wchar_t fileName[1]; }; struct NonResidentAttributeHeader : AttributeHeader { uint64_t firstCluster; uint64_t lastCluster; uint16_t dataRunsOffset; uint16_t compressionUnit; uint32_t unused; uint64_t attributeAllocated; uint64_t attributeSize; uint64_t streamDataSize; }; struct RunHeader { uint8_t lengthFieldBytes : 4; uint8_t offsetFieldBytes : 4; }; #pragma pack(pop) struct File { uint64_t parent; char *name; }; File *files; DWORD bytesAccessed; HANDLE drive; BootSector bootSector; #define MFT_FILE_SIZE (1024) uint8_t mftFile[MFT_FILE_SIZE]; #define MFT_FILES_PER_BUFFER (65536) uint8_t mftBuffer[MFT_FILES_PER_BUFFER * MFT_FILE_SIZE]; char *DuplicateName(wchar_t *name, size_t nameLength) { static char *allocationBlock = nullptr; static size_t bytesRemaining = 0; size_t bytesNeeded = WideCharToMultiByte(CP_UTF8, 0, name, nameLength, NULL, 0, NULL, NULL) + 1; if (bytesRemaining < bytesNeeded) { allocationBlock = (char *) malloc((bytesRemaining = 16 * 1024 * 1024)); } char *buffer = allocationBlock; buffer[bytesNeeded - 1] = 0; WideCharToMultiByte(CP_UTF8, 0, name, nameLength, allocationBlock, bytesNeeded, NULL, NULL); bytesRemaining -= bytesNeeded; allocationBlock += bytesNeeded; return buffer; } void Read(void *buffer, uint64_t from, uint64_t count) { LONG high = from >> 32; SetFilePointer(drive, from & 0xFFFFFFFF, &high, FILE_BEGIN); ReadFile(drive, buffer, count, &bytesAccessed, NULL); assert(bytesAccessed == count); } int main(int argc, char **argv) { drive = CreateFile("\\\\.\\C:", GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL); Read(&bootSector, 0, 512); uint64_t bytesPerCluster = bootSector.bytesPerSector * bootSector.sectorsPerCluster; Read(&mftFile, bootSector.mftStart * bytesPerCluster, MFT_FILE_SIZE); FileRecordHeader *fileRecord = (FileRecordHeader *) mftFile; AttributeHeader *attribute = (AttributeHeader *) (mftFile + fileRecord->firstAttributeOffset); NonResidentAttributeHeader *dataAttribute = nullptr; uint64_t approximateRecordCount = 0; assert(fileRecord->magic == 0x454C4946); while (true) { if (attribute->attributeType == 0x80) { dataAttribute = (NonResidentAttributeHeader *) attribute; } else if (attribute->attributeType == 0xB0) { approximateRecordCount = ((NonResidentAttributeHeader *) attribute)->attributeSize * 8; } else if (attribute->attributeType == 0xFFFFFFFF) { break; } attribute = (AttributeHeader *) ((uint8_t *) attribute + attribute->length); } assert(dataAttribute); RunHeader *dataRun = (RunHeader *) ((uint8_t *) dataAttribute + dataAttribute->dataRunsOffset); uint64_t clusterNumber = 0, recordsProcessed = 0; while (((uint8_t *) dataRun - (uint8_t *) dataAttribute) < dataAttribute->length && dataRun->lengthFieldBytes) { uint64_t length = 0, offset = 0; for (int i = 0; i < dataRun->lengthFieldBytes; i++) { length |= (uint64_t) (((uint8_t *) dataRun)[1 + i]) << (i * 8); } for (int i = 0; i < dataRun->offsetFieldBytes; i++) { offset |= (uint64_t) (((uint8_t *) dataRun)[1 + dataRun->lengthFieldBytes + i]) << (i * 8); } if (offset & ((uint64_t) 1 << (dataRun->offsetFieldBytes * 8 - 1))) { for (int i = dataRun->offsetFieldBytes; i < 8; i++) { offset |= (uint64_t) 0xFF << (i * 8); } } clusterNumber += offset; dataRun = (RunHeader *) ((uint8_t *) dataRun + 1 + dataRun->lengthFieldBytes + dataRun->offsetFieldBytes); uint64_t filesRemaining = length * bytesPerCluster / MFT_FILE_SIZE; uint64_t positionInBlock = 0; while (filesRemaining) { fprintf(stderr, "%d%% ", (int) (recordsProcessed * 100 / approximateRecordCount)); uint64_t filesToLoad = MFT_FILES_PER_BUFFER; if (filesRemaining < MFT_FILES_PER_BUFFER) filesToLoad = filesRemaining; Read(&mftBuffer, clusterNumber * bytesPerCluster + positionInBlock, filesToLoad * MFT_FILE_SIZE); positionInBlock += filesToLoad * MFT_FILE_SIZE; filesRemaining -= filesToLoad; for (int i = 0; i < filesToLoad; i++) { // Even on an SSD, processing the file records takes only a fraction of the time to read the data, // so there's not much point in multithreading this. FileRecordHeader *fileRecord = (FileRecordHeader *) (mftBuffer + MFT_FILE_SIZE * i); recordsProcessed++; if (!fileRecord->inUse) continue; AttributeHeader *attribute = (AttributeHeader *) ((uint8_t *) fileRecord + fileRecord->firstAttributeOffset); assert(fileRecord->magic == 0x454C4946); while ((uint8_t *) attribute - (uint8_t *) fileRecord < MFT_FILE_SIZE) { if (attribute->attributeType == 0x30) { FileNameAttributeHeader *fileNameAttribute = (FileNameAttributeHeader *) attribute; if (fileNameAttribute->namespaceType != 2 && !fileNameAttribute->nonResident) { File file = {}; file.parent = fileNameAttribute->parentRecordNumber; file.name = DuplicateName(fileNameAttribute->fileName, fileNameAttribute->fileNameLength); uint64_t oldLength = arrlenu(files); if (fileRecord->recordNumber >= oldLength) { arrsetlen(files, fileRecord->recordNumber + 1); memset(files + oldLength, 0, sizeof(File) * (fileRecord->recordNumber - oldLength)); } files[fileRecord->recordNumber] = file; } } else if (attribute->attributeType == 0xFFFFFFFF) { break; } attribute = (AttributeHeader *) ((uint8_t *) attribute + attribute->length); } } } } fprintf(stderr, "\nFound %lld files.\n", arrlen(files)); return 0; } |