Note: as of Feb. 2010, this program has been updated to work with long (>255 character) paths.
The MD5 program is designed to be used for forensic purposes to verify file integrity.
The MD5 program is very similar to the HASH program. It can perform the same calculations as the HASH program but provides a slightly different output format. The HASH program however has been specifically designed to work with the HASHKEEPER program designed by NDIC.
The MD5 program is designed to calculate the MD5 (128 bit) hash value of a file. It uses the MD5 algorithm as described by R. Rivest in an article published in 1992 (RFC 1321). The article is available on the internet by searching the Web for MD5.
The following is a quote from that article. “The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest.”
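What this means is that, in practice, two files will not produce the same “fingerprint” by chance unless they are identical. Note, however, that deliberately constructed MD5 collisions have been demonstrated since the article was written, so the conjecture quoted above does not hold against an intentional attacker.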
CERT at Carnegie Mellon University uses an MD5 signature to validate sensitive data sent out over the Internet. When information is distributed with the MD5 signature value, the MD5 program can be used to validate the integrity of the data.
MD5 can be used to produce an MD5 hash of a file. The output produced is identical to that produced by the UNIX version of the MD5 and md5sum programs available at many internet sites.
MD5 can also be used to match computed hashes against a sorted reference file of MD5's supplied by the user, reporting matches or mismatches. This type of matching is most efficient when comparing against the NSRL MD5's, or a reference set of MD5's that the user has generated. (See the --MATCH option in the options section.)
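For comparison, the same digest can be reproduced with Python's standard hashlib module. This is only a minimal sketch, not the MD5 program's own code; the function name md5_of_file and the 64 KB chunk size are illustrative choices.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file as 32 uppercase hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # md5sum prints lowercase hex; uppercase mirrors the sample output in this document.
    return digest.hexdigest().upper()
```

Comparing the result, ignoring case, with the value printed by md5sum for the same file should show the same 32 hex characters.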
The current version of MD5 also implements the NIST recognized SHA-1 (Secure Hash Algorithm). Use of the -s option will produce the SHA calculation instead of the MD5. The SHA calculation is the only secure hash algorithm currently recognized by NIST.
The use of the -B (Both) option will produce both the MD5 and SHA of a file. It is a time consuming option. For a single file, you might also try the sha_verify program found in the FTP site at dmares.com. Login as anonymous and look in the NT_32 directory.
The NIST SHA2 versions of the Secure Hash Algorithm have also been implemented. There are three versions of SHA2: 256, 384 and 512 bit. These options are implemented as -256, -384, and -512. When using these options, the -s option may also be used to get a full range of SHA values. A little bit of overkill.
The SHA2 code implemented in this program was modified from code written by:
AUTHOR: Aaron D. Gifford <email@example.com>
Copyright (c) 2000-2001, Aaron D. Gifford All rights reserved.
Redistribution and use in source and binary forms, with or without modification are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTOR(S) ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTOR(S) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
When the user supplies a filename(s) on the command line, the MD5 program calculates the hash value and prints it on the screen. The output is simple and can be redirected.
Default output includes the path and hash output. If no recursion is chosen then only the current directory is searched for the file and only the filename is provided. If recursion is chosen, then the entire path is provided along with the calculated value.
If any options are used that would request the inclusion of file times (all three can be selected in the 32 bit version), then the output is substantially increased to include file size, and times. These outputs can grow depending on the length of the output filename path field.
This program is INI file compatible.
All options should be preceded by a (-) minus sign. Some can be grouped together, and others, where specified, MUST be grouped without a space. The options are grouped where appropriate.
Some options, because they deal with specific 32 bit items like MDS or file times, are only active in the 32 bit version running on an appropriate file system.
-p + path(s): If more than one directory is needed to be looked at, then add the paths here as appropriate. (-p c:\windows d:\work) [PATH]=path
-f + filespec: If more than one file type is needed, add them here. (-f *.c *.obj *.dll) [FILES]=filetype
If these options are used, the program builds a matrix of paths and file types. It searches all the requested directories for all the requested file types, giving a total of all the matching files in all the paths requested. These options are added to any default command line provided. (C:>mdir c:\work\*.c -f *.dll -p d:\windows)
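The path and file type matrix can be pictured with a short Python sketch. This is an illustration only, not the program's implementation; the function name build_file_list is hypothetical, matching is assumed to be case-insensitive (as on Windows file systems), and no recursion is performed.

```python
import fnmatch
import os

def build_file_list(paths, filespecs):
    """Return every file in each directory that matches any of the filespecs."""
    matches = []
    for directory in paths:
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if not os.path.isfile(full):
                continue
            # Keep the file if it matches at least one requested pattern.
            # Assumption: case-insensitive matching, so both sides are lowercased.
            if any(fnmatch.fnmatch(name.lower(), spec.lower()) for spec in filespecs):
                matches.append(full)
    return matches
```

Under these assumptions, build_file_list([r"c:\windows", r"d:\work"], ["*.c", "*.obj", "*.dll"]) would visit both directories and test each file against all three patterns.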
-x+filespec: e(x)clude these file types from listing. (same format as -f option) (-x thesefiles.txt) [EXCLUDE]=filetype
-oO + filename: Output file name. Place the output to a filename. If uppercase ‘O’ then existing output is appended to. [OUTPUT]=filename
-a: append output to filename provided in -o option. Serves same purpose as using an upper case O. (-a) [APPEND]=[ON|OFF]
-1 + filename: (that's a one, not ell) The filename here is a file which will contain accounting/log information about the run. It is always appended to, and contains the command line and statistics about how many files were processed and the time of the run. The file can later be used as a batch file for duplicating the runs. The ACCT environment variable can also be set. (SET ACCT=logfilename). Or use the .INI option [ACCT=filename]. The order of priority is: Environment, INI file, Command Line option. To explicitly turn logging off, use a +1.
-S: If the file system is NTFS, this option causes all Alternate Data Stream files to be processed also.
Hash calculation options: (-s -B -c -256 -384 -512 ) Default option is MD5 128 bit.
-s: produce the 160 bit SHA output instead of the 128 bit MD5 hash.
-B: produce Both the MD5 and SHA (160) of a file.
-256: produce the 256 bit SHA2 calculation. (see note below)
-384:produce the 384 bit SHA2 calculation. (see note below)
-512: produce the 512 bit SHA2 calculation. (see note below)
-c: produce a 32 bit CRC output instead of the 128 bit MD5 hash.(see note below)
(Note: some combinations of the -c -s -256 -384 -512 are mutually exclusive. However, the MD5 hash is ALWAYS included with the -256, -384 and -512 as a default. If you wish to use combinations, try them first.)
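To show how several of these values can be produced from one read of a file, here is a hedged Python sketch using the standard hashlib and zlib modules. It is not the program's code: the function name hash_file is hypothetical, and the CRC-32 variant used by zlib is only assumed to correspond to the -c option.

```python
import hashlib
import zlib

def hash_file(path, algorithms=("md5", "sha1"), chunk_size=65536):
    """Compute several digests and a CRC-32 of a file in a single read pass."""
    digests = {name: hashlib.new(name) for name in algorithms}
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the same chunk to every requested algorithm.
            for digest in digests.values():
                digest.update(chunk)
            crc = zlib.crc32(chunk, crc)
    results = {name: d.hexdigest().upper() for name, d in digests.items()}
    # Assumption: zlib's CRC-32 stands in for the -c calculation.
    results["crc32"] = format(crc & 0xFFFFFFFF, "08X")
    return results
```

Passing algorithms=("md5", "sha1", "sha256", "sha384", "sha512") yields hex strings of 32, 40, 64, 96 and 128 characters respectively, which corresponds to the 128, 160, 256, 384 and 512 bit outputs listed above.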
-g + #: Where the # is replaced by a number indicating: list all files ‘g’reater than # days old. You can use a -gl pair to bracket file ages. [OLDER]=50
-l + #: (ell, not one) Where the # is replaced by a number indicating: list all files ‘l’ess than # days old. You can use a -gl pair to bracket file ages. To get today's files, use (-l 1) [NEWER]=10
-g + mm-dd-yyyy[acw]
Process only those files (g)reater (older) than this mm-dd-yyyy date. The date MUST be in the form mm-dd-yyyy. It MUST have a two digit month and day (leading 0 if necessary), and it MUST have a 4 digit year. The date comparison is made as of midnight on the date given for the -g option. For this reason, the day provided is NOT included in the calculation. I.e., if you entered -g 01-01-2006 you would only process dates PRIOR to 1/1/2006. This means all of 2005 and before. See below for the [acw] meanings.
-l + mm-dd-yyyy[acw]: (that's an ell, not a one). Process only those files (l)ess than (newer than) this mm-dd-yyyy date. The date MUST be in the form mm-dd-yyyy. It MUST have a two digit month and day (leading 0 if necessary), and it MUST have a 4 digit year. The date comparison is made as of midnight on the date given for the -l option. For this reason, the day provided IS included in the calculation. I.e., if you entered -l 01-01-2006 you would process all of 2006 to the current date.
If no 'acw' modifier is used, the default time used to check the age is the current write or last modification time.
You can however, alter which time is used in the age calculation. To do this, add any or all of the acw indicators. For instance, if you wanted the date checking to respond to the access date, you would add an 'a'. ie: -l 10-10-2005a would show all files accessed on or after 10-10-2005.
If you added more letters to the date, i.e. -g 10-10-2005cw, you would get all files with EITHER a creation or a last modified date older than 10-10-2005. The added [acw] times are logically OR'd, so any date meeting the criteria will cause the file to be selected for processing.
The use of all three, -g 10-10-2005acw, allows the program to simultaneously check and evaluate all three dates.
Caution should be exercised in using all three dates, as in most cases, almost every file may fit the criteria.
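The date rules above (midnight cutoff, day excluded for -g, day included for -l, and OR'd [acw] modifiers) can be summarized in a short Python sketch. It is an interpretation of the text, not the program's code; the function name passes_date_filter and the dictionary layout are hypothetical.

```python
from datetime import datetime

def passes_date_filter(file_times, date_text, mode, which="w"):
    """Return True if any selected timestamp satisfies the -g or -l date test.

    file_times: dict with keys 'a', 'c', 'w' mapping to datetime values.
    date_text:  cutoff in mm-dd-yyyy form.
    mode:       'g' for older than the cutoff, 'l' for on or after it.
    which:      any combination of 'a', 'c', 'w'; the tests are OR'd.
    """
    # strptime yields midnight at the start of the given day.
    cutoff = datetime.strptime(date_text, "%m-%d-%Y")
    for key in which:
        stamp = file_times[key]
        if mode == "g" and stamp < cutoff:
            return True  # strictly before midnight, so the given day is excluded
        if mode == "l" and stamp >= cutoff:
            return True  # the given day itself is included
    return False
```

With which="acw" a file is selected as soon as any one of its three times meets the test, which is why almost every file tends to qualify.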
-L + #: Where the # is replaced by a number indicating, list all files less than # bytes in size. (-L 100000) [LESSTHAN]=100000
-G + #: Where the # is replaced by a number indicating, list all files greater than # bytes in size. You can use a -GL pair to bracket file sizes. (-G 10000) (-G 10000 -L 100000) [GREATER]=10000
-P: Pause after every 20 lines. (default is not to pause after every screen.) [PAUSE]=ON
-d “delimiter”: replace “delimiter” with a delimiter (typically a pipe ‘|’) within double quotes with which to delimit fields. If the delimiter is not printable, use its decimal ASCII value but don’t place it in quotes. (-d “|”) [DELIMETER]=|
-w #: Change the default width of the filename from 38 to whatever value you wish. If you have long filenames, this may be necessary to accommodate the entire name. If a filename longer than 38 is used, the output tends to be more than one line long. Note: If the special sequence (-w 1, that's a one, not an ell) is used, then the output becomes a variable length record with only 1 space between the filename and the hash value. (-w 50) [WIDTH]=50
-[tT][acw3]: Show the file time as last ‘a’ccessed, last ‘w’ritten, ‘c’reated, or show all ‘3’. No spaces between the -t and the modifier. ( -tc or -t3 ) Default is the ‘w’rite time, which is identical to what DIR or Explorer displays. Note: The 3 file time capability is only available under 32 bit operating systems using the 32 bit version of the program. [TIME]=[A|C|W|3], [ALLTIMES]=[ON|OFF]. If the 't' is uppercase 'T', the file dates are printed in YYYYMMDD format for easier sorting.
-z: If using the 32 bit version, display time in ‘Z’ULU GMT format. The letters GMT will be at the end of the output line indicating such. Use GMT to get relative references, especially when dealing with 2 or more time zones. (-z) [ZULU]=[ON|OFF]
-m: Show file last write date. Same as -tw option. This significantly adds to the size of the output record. (-m) [MILITARY]=[ON|OFF]
-A[hrsm]: Show only files with the following attributes: h=Hidden files, r=Readonly, s=system only, m=modified. The [hrsdm] must be right after the -A without any spaces. The -A is case sensitive. [HIDDEN|READONLY|SYSTEM|ARCHIVE]=[ON|OFF]
-R Reset file times.
-v Silent run. NO VERBOSE. Do not print normal column headings above numbers. This provides cleaner screen output for redirection to a file. This can also be accomplished by setting an environment variable called silent to ON. (set SILENT=ON). The SILENT environment variable is used by crckit also.
-U no 'U'pper case. This converts all the hex values in the md5 field to lower case values. Thus ABCDEF would be abcdef.
-D xx: This is the standard default format of the -D option. It will start processing the file xx bytes from the beginning. The xx offset is counted from 0. So to start at the 100th byte, you would enter 99 (which is actually the 100th byte). It then processes the rest of the file. If you need to process only a portion of the file, use the modified version of the -D option. (see next option).
-D xx[,XX[oc]]: supersedes the basic -D option, and is ONLY available in a special version which costs a few dollars more.
Use this option to process only a part of a file. This option will start processing the file at the xx byte of the file, and process to the XX byte of the file or, with the (c) modifier, process to the xx byte plus XX bytes.
The xx value counts from offset 0, so to start at the 100th character, enter 99.
To use this modified xx,XX option, the format must have the comma (,) followed by another value, with an optional alpha modifier. The XX value defaults to the ending byte that should be processed to. The default modifier for this is letter o.
If you wish to have the program count for you, you can use the 'c' modifier which means "count" this many characters from the beginning value entered. The xx,XX format is required. The [co] modifiers are optional.
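The offset arithmetic described above can be illustrated with a small Python sketch. It is only an approximation of the -D behaviour, not the program's code: the function name md5_of_range is hypothetical, the ending offset for the 'o' modifier is assumed to be inclusive, and the selected range is read into memory in one piece for brevity.

```python
import hashlib

def md5_of_range(path, start, second=None, modifier="o"):
    """Hash part of a file, loosely following the -D xx[,XX[oc]] description.

    start:    zero-based offset of the first byte to hash (99 = the 100th byte).
    second:   None to hash to the end of the file; otherwise the XX value.
    modifier: 'o' treats XX as the ending offset, 'c' as a byte count.
    """
    with open(path, "rb") as handle:
        handle.seek(start)
        if second is None:
            data = handle.read()
        elif modifier == "c":
            data = handle.read(second)
        else:
            # Assumption: the ending offset is inclusive.
            data = handle.read(max(second - start + 1, 0))
    return hashlib.md5(data).hexdigest().upper()
```

Under these assumptions, md5_of_range(path, 10, 19) and md5_of_range(path, 10, 10, "c") hash the same ten bytes.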
A sample would be:
--MATCH=NSRL_MD5_filename: (This --MATCH option has three alternatives.) The user provides the filename of the file that contains the MD5 hashes to compare against. The file MUST be one record (MD5) per line, it MUST be sorted, and it MUST be CR/LF delimited. In other words, the file must contain ONLY 34 character records. If it is NOT sorted, or each record is NOT 34 bytes in size, the comparisons WILL fail. Versions after 3/12/2012 attempt to perform an internal sort check of this file to make sure it is properly sorted.
If the --MATCH= option is used, all the file information is printed, and the result (MATCH, NO_MATCH) is also displayed.
If a 0 is added to the keyword MATCH, as --MATCH0=...., then the output will contain only those files whose hashes DO NOT MATCH the MD5's in the reference file. An output file MUST be provided; output to the screen will not accomplish the proper results.
If a 1 is added to the keyword MATCH, as --MATCH1=...., then the output will contain only those files whose hashes DO MATCH the MD5's in the reference file. An output file MUST be provided; output to the screen will not accomplish the proper results. (Sample command lines are shown with the sample output below.)
NOTE: when viewing the output on the screen, and the filename is chosen to be longer than about 35 characters, the display of the MATCH0, MATCH1 will not properly remove the incorrect lines from the screen. It is therefore highly recommended that an output file always be used when using the MATCH0 or MATCH1 options.
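The sorting and fixed record size requirements make sense if the reference file is searched by record position. The Python sketch below shows one way such a lookup could work, using a binary search over 34 byte records. It is an assumption about the technique, not the program's actual code; the names RECORD_SIZE, is_sorted and lookup are hypothetical, and the reference hashes are assumed to be uppercase hex.

```python
import os

RECORD_SIZE = 34  # 32 hex characters plus CR/LF, as described above

def is_sorted(reference_path):
    """Check that every record is 34 bytes and not smaller than the one before it."""
    previous = b""
    with open(reference_path, "rb") as handle:
        while True:
            record = handle.read(RECORD_SIZE)
            if not record:
                return True
            if len(record) != RECORD_SIZE or record[:32] < previous:
                return False
            previous = record[:32]

def lookup(reference_path, md5_hex):
    """Binary-search the reference file; return 'MATCH' or 'NO_MATCH'."""
    target = md5_hex.upper().encode("ascii")
    with open(reference_path, "rb") as handle:
        low = 0
        high = os.path.getsize(reference_path) // RECORD_SIZE - 1
        while low <= high:
            middle = (low + high) // 2
            # Fixed-size records let us jump straight to record number 'middle'.
            handle.seek(middle * RECORD_SIZE)
            record = handle.read(32)
            if record == target:
                return "MATCH"
            if record < target:
                low = middle + 1
            else:
                high = middle - 1
    return "NO_MATCH"
```

If a record were shorter or longer than 34 bytes, every later seek would land in the middle of a hash, and an unsorted file would send the search in the wrong direction. Either fault makes the comparisons fail in this sketch.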
--source=listfilename: Provide a list of files to hash in the file identified by the name: listfilename. One filename per line. The filename must contain the complete path of the file to hash. The program reads the text file one line at a time and processes that file. There should be a blank line at the end to indicate no more files to process.
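The list file format described for --source (one full path per line, ended by a blank line) can be read with a few lines of Python. This is a sketch of the described behaviour only; the function name read_source_list is hypothetical and UTF-8 text encoding is an assumption.

```python
def read_source_list(list_path):
    """Return the file paths named in a list file, stopping at the first blank line."""
    paths = []
    # Assumption: the list file is UTF-8 (or plain ASCII) text.
    with open(list_path, "r", encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                break  # a blank line marks the end of the list
            paths.append(entry)
    return paths
```

Each returned path would then be hashed in turn, in the order given in the list file.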
Sample output. (if the -r option was used, the entire path would be shown )
OPTIONS.obj F057CBF3F765F30B0CA8C3DFBBFC8BA0
RECURSE.obj A16C61DD74DAE55241909D6B1604929A
FIXNAME.obj AB4C84E456F6293749AA5A4FA7EFF9A2
C:>md5 --MATCH=reference_md5s -o outputfile
C:>md5 --MATCH1=reference_md5s -o matched_outputfile
C:>md5 --MATCH0=reference_md5s -o no_matched_outputfile
Sample output. (if the --MATCH was used,)
OPTIONS.obj F057CBF3F765F30B0CA8C3DFBBFC8BA0 NO_MATCH
RECURSE.obj A16C61DD74DAE55241909D6B1604929A MATCH
FIXNAME.obj AB4C84E456F6293749AA5A4FA7EFF9A2 NO_MATCH
C:>md5 -256 -o junk
Sample output of -256 option. One space between MD5 and SHA value.
MD5 SHA
junk 94A2ED51F8B7255685B85BA2AE36140B D1A9E9E993A6EB1A45FB7A0DC250FE1C2131BD2B
Sample output with -m option. The size of the output filename has been shortened for display purposes.
Program started Mon Dec 28 13:43:51 1998 GMT, 08:43 EST
MD5.exe *.exe -o junk -m
MD5_32.EXE 6CE903B30B410F8A9E6BCF1F05A74864 130760 12/27/1998 16:19w EST
MD5.exe 142D15AE29D85406F8A23A843D0B0D73 130760 12/27/1998 16:19w EST
Processed 2 files, 261520 bytes: Elapsed: 0 hrs. 0 mins. 0 secs.