How to manually verify copied files in OS X

If you would like to verify two copies of files or two mirrored directories, there are some Terminal commands that will let you do just that.

When you copy files from one location to another in OS X, the system should run a number of checks to validate the data and ensure that it was copied correctly; however, if you are using third-party utilities to copy a file, you might be concerned about potential corruption.

MacFixIt reader Douglas recently wrote in with such a concern.

I was wondering if there is any way to verify a copy of files from one hard drive to another? I used SuperDuper to copy one of my RAIDs to another RAID; it was 7TB of stuff and took two days to copy. I just want to make sure nothing was corrupted while copying. I used Chronosync to check SuperDuper and it looked OK, but I [want to be sure].

Many file-copying tools run "checksum" routines on the files they copy. A checksum is a quick way to summarize the bits and bytes in a file as a signature code that is, for practical purposes, unique to that content and can be used to verify the file's integrity.

In addition, Apple's default file-system format includes a journal, which records pending changes to the file system's structure before they are made permanent on the disk and therefore greatly reduces potential corruption from write interruptions such as power failures.

Because of these safeguards, manual verification of files is usually not needed; however, if desired, it can be done.

There are several ways you can manually check the integrity of copied files. The general approach has two parts: first check the folder structure against the original to see whether any files are missing, and then, if everything has been copied, verify the integrity of the copied files.

Checking folder structure
To check the folder structure, you can use the "diff" command in OS X Terminal. This command compares two sources line by line, and can also compare the directory trees of copied files to see if a file or two is missing. To do this, open the OS X Terminal utility and follow these steps:

  1. Type the following command followed by a single space:

    diff -rq

  2. Drag the first directory to the terminal, and then drag the second directory there as well.
  3. Complete the command by typing the following:

    > ~/diff.txt

  4. Press Enter to execute the command when it looks something like the following:

    diff -rq folderpath1 folderpath2 > ~/diff.txt

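When finished, the command will have created a text file called "diff.txt" in your home directory. It lists the files that are in one directory tree but not in the other (lines beginning with "Only in"), as well as files that exist in both trees but whose contents differ (lines beginning with "Files" and ending with "differ"). If the two trees match, the file will be empty. You can then use this file to track down the files that were not copied or were copied incorrectly.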
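
To get a feel for the report before running it on a large volume, you can try the command on a small throwaway example. The following is only a sketch: the folder and file names are made up for illustration, and it writes the report to a variable instead of "~/diff.txt" so that it cleans up after itself.

```shell
#!/bin/sh
# Build two tiny sample trees (hypothetical names) in a temporary folder.
work=$(mktemp -d "${TMPDIR:-/tmp}/difftest.XXXXXX")
mkdir -p "$work/original" "$work/copy"
printf 'alpha\n' > "$work/original/a.txt"
printf 'alpha\n' > "$work/copy/a.txt"       # identical copy
printf 'beta\n'  > "$work/original/b.txt"
printf 'BETA\n'  > "$work/copy/b.txt"       # same name, different content
printf 'gamma\n' > "$work/original/c.txt"   # never copied

# -r descends into subfolders; -q reports only whether files differ.
# diff exits with status 1 when it finds differences, hence "|| true".
report=$(diff -rq "$work/original" "$work/copy" || true)
printf '%s\n' "$report"

# Remove the sample trees again.
rm -rf "$work"
```

The identical file "a.txt" produces no output at all, while "b.txt" shows up on a "Files ... differ" line and "c.txt" on an "Only in" line.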

The "diff" command should pick up individual file changes in the directory tree, but may not do so in all instances. Therefore, in addition to using diff, you might consider using a checksum routine to verify the integrity of the copied files.

Validating individual files
To verify each file, you will need to run a checksum on it and then compare the checksums.

To do this for a single file, type the command "md5" (or "md5 -q" to only show the checksum itself) in Terminal followed by a single space. Then drag the first copy to the Terminal window (this should enter a full path to the file) and then press Enter to execute the command.

Checksum comparison in the OS X Terminal
In this example, the two files show different checksums indicating they are different. The first instance shows "md5" run by itself and the second shows "md5 -q" where only the checksum is output. Screenshot by Topher Kessler/CNET

In the output you should see a long alphanumeric string that represents a unique signature for the file. Perform these steps on the second file, and then see if the output string is the same for both files. If so, the two files have identical contents and the copy is intact.

In addition to "md5" you can use the command "shasum", which computes a different checksum (SHA-1 by default).

Validating contents of folders
You can also use checksums in a similar manner to validate the contents of two directories. For example, if you have a folder containing multiple files, you can compute a cumulative checksum for all the files within that folder, and compare this checksum to that of another directory. To do this, type the following command, replacing "FOLDER_PATH" with the one you want to check:

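find -s FOLDER_PATH -type f -exec md5 -q {} \; | md5

One way to do this is to first type "find -s" in Terminal, followed by a single space, then drag the folder of interest to the Terminal window to fill in the FOLDER_PATH portion, and then type the rest of the command.

This command computes the checksum of every file in the specified folder in sorted order and then computes a checksum of that list, giving a single output string like the ones from the previous single-file checksums. The "-q" option matters here: without it, each line would include the file's full path, so two folders in different locations would never produce the same result. (The "md5sum" command common on Linux systems is generally not part of a standard OS X installation; "md5" is the built-in equivalent.) For example, to run this on a folder called "folder1" in your account's Desktop directory, either of the following two versions of this command will work:

find -s ~/Desktop/folder1 -type f -exec md5 -q {} \; | md5
find -s /Users/USERNAME/Desktop/folder1 -type f -exec md5 -q {} \; | md5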

In this manner you can compute a checksum for each folder as a whole and then compare the two results to see whether the folder contents were properly copied.
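
A single cumulative checksum tells you that something differs but not which file. One possible variation, sketched below, is to save a list with one checksum per file for each folder and then compare the two lists with "diff". The function names "file_md5" and "folder_manifest" are invented for this example, the "md5sum" fallback is an assumption for systems without "md5", and file names containing line breaks are not handled.

```shell
#!/bin/sh
# Print only the MD5 digest of a file ("md5 -q" on OS X, "md5sum" elsewhere).
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d ' ' -f 1
    fi
}

# Print one "checksum  ./relative/path" line per file, sorted by path.
# Working from inside the folder keeps the paths relative, so two copies
# in different locations produce identical lists when their contents match.
folder_manifest() {
    ( cd "$1" || exit 1
      find . -type f | LC_ALL=C sort | while IFS= read -r f; do
          printf '%s  %s\n' "$(file_md5 "$f")" "$f"
      done )
}

# Example with hypothetical paths:
#   folder_manifest /Volumes/RAID1 > ~/raid1.txt
#   folder_manifest /Volumes/RAID2 > ~/raid2.txt
#   diff ~/raid1.txt ~/raid2.txt && echo "folders match"
```

Because each line carries the file's relative path, this list also catches renamed or missing files, and the "diff" output points directly at the files that need attention.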

Rsync is your friend
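A final approach is to use the popular "rsync" command to perform a dry-run synchronization between two folders and display which files the command has found to be different. Rsync uses a checksum routine of its own to compare files when syncing two folders, so you can make use of this to run a comparison between them. To do this, in the Terminal type "rsync -nrcv" followed by a single space (the options request a dry run, recursion into subfolders, checksum comparison, and verbose output, respectively), and then drag the initial folder to the Terminal window, followed by dragging the destination folder to the window. Make sure the first folder's path ends with a trailing slash ("/") so that rsync compares the contents of that folder with the destination; without the slash, rsync treats the folder itself as an item to place inside the destination and will report every file as different.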

rsync dry run in the OS X Terminal
The rsync command's output will list the files in the source directory that are different, which you can then investigate. Screenshot by Topher Kessler/CNET

After their paths have been entered in Terminal, execute the command, and rsync will list the files in the initial folder that are missing from the destination folder or do not otherwise match those in it. You can then use rsync itself to update the folders and ensure that they are synced properly, or you can investigate these files and determine why they were not copied properly to begin with. (Note: Some popular syncing tools such as Carbon Copy Cloner use rsync, so you may be able to use them to perform this kind of comparison.)
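
The same dry run can be wrapped in a small function so that the trailing slash on the source folder is never forgotten. This is only a sketch: the function name "rsync_compare" and the paths in the example call are hypothetical, and the exact header and summary lines that rsync prints around the file list vary between rsync versions.

```shell
#!/bin/sh
# List the files that rsync's checksum comparison considers different,
# without copying anything.
#   -n  dry run (make no changes)   -r  recurse into subfolders
#   -c  compare by checksum         -v  list the affected files
rsync_compare() {
    # "${1%/}/" strips one trailing slash if present and adds exactly one
    # back, so rsync compares the folder's contents, not the folder itself.
    rsync -nrcv "${1%/}/" "${2%/}/"
}

# Example call with hypothetical paths:
#   rsync_compare /Volumes/RAID1 /Volumes/RAID2
```

Files that are identical in both folders are not listed; anything that does appear between rsync's header and summary lines is either missing from the destination or has different contents there.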

Update (3/20/2013): Clarified and corrected the usage of the "diff" command.
