File sizes not adding up in Snow Leopard

When you get information on a file in the Finder, you can see numerous attributes including the file size. Under some circumstances, however, the size reported in the information window of the Finder may differ from the size reported by other file-sizing utilities.

We were recently contacted by a MacFixIt reader who was trying to get the size of a selection of files via AppleScript, and to do so was using OS X's "System Events" to grab file sizes with AppleScript code similar to the following:

tell application "Finder"
     -- The Finder returns the current selection as a list of aliases
     set selectionList to selection as alias list
     tell application "System Events"
          set x to 0
          repeat with thisItem in selectionList
               -- Ask System Events for each item's size and keep a running total
               set x to x + (physical size of disk item (thisItem as text))
          end repeat
     end tell
     display dialog "The Size of " & selectionList & " is " & x & " Bytes"
end tell

Screenshot: The byte count when using the script (foremost window) differs from the byte count shown in the Finder's information window for the file.

The code is simple and straightforward: it asks "System Events" for the size of each file in a list, then adds the file sizes together. For the most part it outputs the same file size as the Finder for selected files; however, for some items (namely Apple-supplied application packages included in OS X, like "Stickies") the output is very different from what the Finder reports.

Testing the problem

Since the problem could have been a bug in System Events, the first step was to compare the file sizes reported by various utilities, including the Unix commands "ls" (list) and "du" (disk usage). Oddly, "ls" showed the same file size as the Finder, while "du" showed a significantly smaller size, just like System Events. With the Finder and one Unix command agreeing on one value, and System Events and the other Unix command agreeing on another, it seems that both values are correct; however, a file cannot be two sizes at the same time, so there must be something wrong with one of these calculations.
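The split between "ls" and "du" comes down to which number each command prints: "ls -l" shows a file's logical length, while "du" adds up the disk blocks allocated to it. A minimal Python sketch of the two measurements follows; the function name `ls_and_du_sizes` is our own, and it assumes a POSIX system where `st_blocks` counts 512-byte units, as on OS X and Linux.

```python
import os
import sys


def ls_and_du_sizes(path):
    """Return (logical_bytes, allocated_bytes) for a single file.

    logical_bytes mirrors what `ls -l` prints (st_size).
    allocated_bytes mirrors what `du` counts: st_blocks is the number
    of 512-byte blocks allocated to the file.
    """
    st = os.lstat(path)  # do not follow symlinks, like du
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for p in sys.argv[1:]:
        logical, allocated = ls_and_du_sizes(p)
        print(f"{p}: ls={logical} bytes, du={allocated} bytes")
```

Pointed at an affected file (for example, the binary inside Stickies.app/Contents/MacOS), the two numbers should diverge the same way the Terminal output does.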

Screenshot: The Unix commands "ls" and "du" also show different sizes for the same file on disk.

One possible explanation was that System Events was not including all of the files in the application package in its size calculation; however, this only happened for some application packages, specifically those included with OS X. Additionally, when the "ls" and "du" commands were run on an individual component of the package, they still showed different sizes, which suggests the cause did not lie in the packaging.
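Because an application is really a folder of files, sizing it means walking the package and totaling its contents. The sketch below is a hypothetical `package_sizes` helper, not how the Finder or System Events actually compute their figures; it totals both the logical and the allocated size of everything inside a package so the two can be compared for the whole bundle.

```python
import os


def package_sizes(root):
    """Sum logical and allocated sizes for a file or a bundle directory.

    Walks every file inside `root` (an application package is just a
    directory tree) without following symlinked directories.
    Returns (logical_bytes, allocated_bytes).
    """
    if not os.path.isdir(root):
        st = os.lstat(root)
        return st.st_size, st.st_blocks * 512

    logical = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            logical += st.st_size              # what ls-style tools add up
            allocated += st.st_blocks * 512    # what du-style tools add up
    return logical, allocated
```

For an ordinary package the allocated total is usually a little larger than the logical total, since each file is rounded up to whole blocks; a package whose allocated total is much smaller is worth a closer look.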

Though only a remote and not very related possibility, the next step was to test the application package's "code signing" feature by adding files to the package and thereby breaking its code signature. As expected, this did not affect the file sizes, which were still reported differently by the different commands.

Moving on, the last step was to test the application and its components after moving and copying them to new locations on the hard drive. When the files were moved around the drive, the sizes did not change, for either the whole application package or the individual components. However, when any of the files were copied to new locations on the drive, either by duplicating them or by copying them to a new folder, the Finder, System Events, "ls", and "du" all reported the larger of the previously reported sizes (for example, 218KB for the binary file instead of 84KB).
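The move-versus-copy experiment can be scripted the same way. This sketch is hypothetical (`move_then_copy` and the `moved_`/`copied_` file names are our own): it renames a file within one volume, copies it with Python's `shutil.copy2`, and records the logical and allocated sizes at each stage. It really does move the source file, so use it only on files you are willing to relocate, and keep `workdir` on the same volume, because `os.replace` may fail across filesystems. Whether a copy stays compressed depends on the tool doing the copying.

```python
import os
import shutil


def sizes(path):
    """(logical_bytes, allocated_bytes) for one file."""
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512


def move_then_copy(src, workdir):
    """Move `src` into `workdir`, then copy it, recording sizes each time."""
    base = os.path.basename(src)
    results = {"original": sizes(src)}

    moved = os.path.join(workdir, "moved_" + base)
    os.replace(src, moved)        # a rename: same file, same storage
    results["moved"] = sizes(moved)

    copied = os.path.join(workdir, "copied_" + base)
    shutil.copy2(moved, copied)   # writes the data out to a new file
    results["copied"] = sizes(copied)
    return results
```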

Reason: Filesystem compression

Screenshot: The utility "hfsdebug" shows the catalog and attributes entries for the file, revealing that filesystem compression is behind the two reported file sizes.

These tests showed that the utilities are working fine in reporting file sizes, and that the differences stem from how the files are stored on disk. Files installed by the OS stayed in a smaller, compressed form as long as they were not copied or duplicated; moving them did not change this. Copying, however, somehow undid the compression for the new copies of the files.

To test this, I grabbed a copy of hfsdebug and ran it on an application's binary file (the same "Stickies" binary used in the previous tests) to see that file's properties as recorded in the filesystem structures. The utility showed that the file had been compressed, and that its logical size and its uncompressed size matched the two sizes reported earlier by the various utilities.
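hfsdebug reads the filesystem structures directly, but on OS X 10.6 and later the kernel also exposes a per-file "compressed" flag, UF_COMPRESSED, through stat(). A small sketch, assuming a Python build that exposes `st_flags` (BSD-derived systems such as OS X do; Linux does not), could test for it; `is_fs_compressed` is our own name.

```python
import os
import stat

# stat.UF_COMPRESSED is the BSD file flag macOS sets on HFS+-compressed
# files; fall back to its numeric value (0x20) if the name is missing.
UF_COMPRESSED = getattr(stat, "UF_COMPRESSED", 0x00000020)


def is_fs_compressed(path):
    """True if the file carries the UF_COMPRESSED flag.

    st_flags only exists on BSD-derived systems such as OS X;
    elsewhere the attribute is missing and this returns False.
    """
    flags = getattr(os.lstat(path), "st_flags", 0)
    return bool(flags & UF_COMPRESSED)
```

On the command line, "ls -lO" should show the same flag as "compressed" in its flags column.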

For some reason, some utilities are reporting the compressed size, and others are reporting the uncompressed size for these files.

Why the difference?

HFS+ filesystem compression is a low-level method of making more efficient use of space on the hard drive without a noticeable hit on performance. Apple uses it in Snow Leopard to help reduce the installation footprint beyond what was saved by removing all PowerPC code from the system. As a result, applications like Stickies and other files installed by the OS X installer may be stored in compressed form.

Even with filesystem compression enabled, it is odd that some system utilities report the compressed size and others report the uncompressed size. The reason for this discrepancy lies in where each utility gets its information on the size of the file, which can be understood by taking a brief look at how the HFS+ filesystem is organized.

The HFS+ filesystem uses several structures to store information about data on the disk. There is the volume header, which contains items like the total number of files, the number of blocks, and the block size (and therefore the total volume size), and there is the Allocation file (volume bitmap), which is a binary map of every block on the disk recording which ones are in use (kind of like a grid).
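The arithmetic behind the volume header and the Allocation file is simple. As we read Apple's Technote TN1150, the volume header stores the allocation block size (blockSize) and the total block count (totalBlocks), and the Allocation file holds one bit per allocation block, with the most significant bit of the first byte standing for block 0. The helper names below are our own, and the functions work on values already read from a volume; reading the raw structures off a disk is not shown.

```python
def volume_size(total_blocks, block_size):
    """Volume size in bytes from the header's totalBlocks and blockSize."""
    return total_blocks * block_size


def min_bitmap_bytes(total_blocks):
    """Smallest Allocation file, in bytes, with one bit per block."""
    return (total_blocks + 7) // 8


def is_block_allocated(bitmap, block):
    """True if `block` is marked in use; block 0 is the MSB of byte 0."""
    byte = bitmap[block // 8]
    mask = 0x80 >> (block % 8)
    return bool(byte & mask)
```

For example, a volume of 1,000,000 blocks of 4,096 bytes is 4,096,000,000 bytes and needs at least 125,000 bytes of bitmap.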

Beyond these are the Catalog file and the Attributes file, which are very efficient databases (called "B-trees") of information about files on the disk, including the file ID, the location of each file's data on the drive, the owner, group, respective permissions, modification dates, and the logical size of files on the disk. The Catalog file contains the older types of information about files, and the Attributes file contains more modern information (in this case, details about filesystem compression).

When Apple added filesystem compression to the HFS+ filesystem, the compression information (including the uncompressed size of files) was added to the HFS+ Attributes file, leaving the actual size of data on disk (the "compressed" size) in the HFS+ Catalog file. Therefore, if a utility is not aware of filesystem compression or otherwise calculates sizes without referencing the Attributes file (either of which may be the case with the Unix command "du" or System Events), it will show the compressed size.
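To make the two lookups concrete, here is a sketch of what a compression-aware tool extracts from the compression attribute. As we understand Apple's open-source xnu headers, the attribute is named "com.apple.decmpfs" and begins with a 16-byte little-endian header: a magic number spelling "cmpf", a compression type, and the uncompressed size. The kernel normally hides this attribute from ordinary extended-attribute calls, so this sketch assumes you already have its raw bytes. `reported_size` is a hypothetical model of the paragraph above, not any real utility's code.

```python
import struct

# Assumed header layout: little-endian uint32 magic, uint32 compression
# type, uint64 uncompressed size (16 bytes), then optional payload.
DECMPFS_MAGIC = 0x636D7066  # spells 'cmpf'; stored bytes read b"fpmc"
_HEADER = struct.Struct("<IIQ")


def parse_decmpfs_header(raw):
    """Return (compression_type, uncompressed_size) from raw bytes."""
    if len(raw) < _HEADER.size:
        raise ValueError("attribute too short for a decmpfs header")
    magic, compression_type, uncompressed_size = _HEADER.unpack_from(raw)
    if magic != DECMPFS_MAGIC:
        raise ValueError("not a decmpfs header")
    return compression_type, uncompressed_size


def reported_size(catalog_size, decmpfs_raw=None, compression_aware=True):
    """Hypothetical model: which size a tool shows for a file.

    catalog_size stands in for the on-disk size in the Catalog file;
    decmpfs_raw stands in for the compression attribute, if present.
    A tool that skips the attribute falls back to the catalog figure.
    """
    if compression_aware and decmpfs_raw is not None:
        return parse_decmpfs_header(decmpfs_raw)[1]
    return catalog_size
```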

This is not necessarily a bad thing, since it shows how much space the file actually takes up on disk; however, it may lead to confusion when running scripts or other routines that need the full size of the files in their uncompressed form.

Overall, the only real bug here (if one can call it that) is the lack of consistency between the various ways to size files on disk, with no indication of why there is a difference. The sizes reported are accurate depending on the viewpoint, but may be a little confusing to users depending on what they are trying to do.
