ZIP IT - HASHES

The truth is out there
So make sure your software can find it.


First authored Feb. 2023


Before you get into this article, you might want to read this associated sequence of articles.

Start here:

Inventory/Catalog files  Creating an inventory of evidentiary files
Forensic file copying   Article tests over 40 "forensic" file copiers
Forensic Hashing        Article tests over 30 "forensic" hash programs.
ZIP-IT for forensic retention  Article tests a few zipping programs and
ZIP_IT_TAKE2       More tests for your zipping capabilities.
ZIP FILE/container  Hashing your zip container reliably
MATCH FILE HASHES  Demonstrates hash matches using Maresware.
A HASH software buffet   How-to use Maresware hash software


Read this article and raise your forensic intelligence level a few points. 😄

A little background. If you haven't already read ZIP-IT (listed above), please read it before reading this article.

Some preliminary information:

Did you ever stop to think about this situation, or something close to it? You created a zip container to provide to the opposition. Later, for whatever reason, you created another zip container with the identical files and again provided it to the opposition. However, your opponent pulled a fast one and hashed both containers. Guess what: the hashes are different. This started a major discussion: WHY, if the contents of both containers are supposed to be identical, are the container hashes different? Answer that one in court. Well, here is a possible answer.

I want to remind you that all the testing I have done for this article, for the articles mentioned above, and for any other testing-related article referenced here was done using Windows on an NTFS file system on a desktop computer. NTFS was chosen as the test environment because I believe a significant number of corporate and other forensic investigations take place on NTFS file systems. Also, NTFS's ability to update a file's last access date, its long filenames, and its alternate data streams add to the forensic and evidentiary complexity.

The test computer used for other forensic evidentiary tests has the last access update registry key turned on, so any action on a file causes the file's last access date to be updated. In the following tests of various zipping programs, that capability does not come into question, although it is referenced in some instances. It has nothing to do with the overall testing process for this article. However, as stated below, when a zipping program altered (did not reset) the source last access date, the date was reset so that all subsequent tests and comparisons were not affected by altered source access dates.

This short article was written as a result of tests I performed on some of the recognized file zipping programs. They include programs that are routinely recommended and used by most people processing evidence for retention, attorney discovery, or court adjudication. The programs used are: WINrar, 7-Zip, PKzip, and WINzip. During testing, WINzip proved to be the hardest and least forensic program to use in an evidentiary environment, so minimal testing was done with it.

Now, let the tests begin.


TEST OVERVIEW:

These tests were to confirm or deny (like my governmentese) that when a zipping program zips suspect files into a container, the container ends up with a specific hash value.

(NOTE: because some of the zipping programs routinely alter the source file's last access date when they add the file to the container, after each run the file dates, especially last access, were ALWAYS reset to their initial settings so that each subsequent zipping run saw the same source file date/times.)

Then, at a later time (maybe a few minutes, a few hours, or days later), if those same suspect files are zipped to a new container, the new container's hash is completely different from the prior one(s). Another old investigator (don't tell him I called him old) and I discussed this change of the container hash each time a new container is created. So I decided to test that theory.

I set up the following test data as described here:
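A top-level directory called TEST21 (the name is irrelevant) was set up with six files, each containing 50 bytes of a single hex value, as indicated by its name: HEX00.TXT contains 50 bytes of hex 00, HEX01.TXT contains 50 bytes of hex 01, and so on.
In the listings below, the c, w and a suffixes on the time stamps mark the created, last-written and last-accessed date/times.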

HEX00.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
HEX01.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
HEX02.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
HEX03.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
HEX04.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
HEX05.TXT | 50 | 01/01/2019 07:34:56:789c | 01/01/2019 07:34:56:789w | 01/01/2019 07:34:56:789a EST
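For readers who want to repeat the test, the data set described above can be rebuilt with a short script. The Python sketch below is an illustration under stated assumptions, not the tool used for this article: it writes HEX00.TXT through HEX05.TXT with 50 bytes each and sets the last-written and last-access times to 01/01/2019 07:34:56.789 EST, treating EST as a fixed UTC-5 offset. Python's os.utime cannot set the NTFS creation time, so on Windows that date would need another tool. The names make_test_set and to_epoch_ns are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumption: EST is treated as a fixed UTC-5 offset (no daylight saving in January).
EST = timezone(timedelta(hours=-5))
SOURCE_TIME = datetime(2019, 1, 1, 7, 34, 56, 789000, tzinfo=EST)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(dt: datetime) -> int:
    """Convert an aware datetime to exact integer nanoseconds since the Unix epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def make_test_set(directory: Path, count: int = 6, size: int = 50) -> list[Path]:
    """Create HEX00.TXT, HEX01.TXT, ... each holding `size` copies of one byte value."""
    directory.mkdir(parents=True, exist_ok=True)
    stamp = to_epoch_ns(SOURCE_TIME)
    paths = []
    for value in range(count):
        path = directory / f"HEX{value:02X}.TXT"
        path.write_bytes(bytes([value]) * size)
        # Sets last-access and last-written times only; creation time is not covered.
        os.utime(path, ns=(stamp, stamp))
        paths.append(path)
    return paths
```

Calling make_test_set(Path("TEST21")) creates the directory and the six files. Integer nanoseconds are used instead of floating-point seconds so the .789 milliseconds are stored without rounding.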

These files were then hashed to determine the correct hash of each file:

E:\...\HEX00.TXT | 871BDD96B159C14D15C8D97D9111E9C8 | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST  
E:\...\HEX01.TXT | 76E7E36462E7E73C6D8D927BA0E78F73 | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST  
E:\...\HEX02.TXT | 85F7588E2D312BBD69E927CD3701AF2E | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST  
E:\...\HEX03.TXT | 07EE8AEA7E9AA3EFAC64666095EC4876 | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST  
E:\...\HEX04.TXT | AE86B1B5EE54B541BCF64E2C3743D00D | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST
E:\...\HEX05.TXT | 37953A8A9A6E70A349875E4B69DCFF1C | 50 |01/01/2019 07:34:56:789c 01/01/2019 07:34:56:789w 01/01/2019 07:34:56:789a EST  
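A listing like the one above can be approximated in Python with hashlib. This is a sketch, not the Maresware HASH program that produced the listing: it only emits name, MD5 and size, and the helper names md5_file and hash_listing are hypothetical. Files are read in chunks so the same code also works for large evidence files.

```python
import hashlib
from pathlib import Path


def md5_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the upper-case hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def hash_listing(directory: Path, pattern: str = "HEX*.TXT") -> list[str]:
    """Build 'name | MD5 | size' lines for matching files, sorted by name."""
    lines = []
    for path in sorted(directory.glob(pattern)):
        lines.append(f"{path.name} | {md5_file(path)} | {path.stat().st_size}")
    return lines
```

Note that glob matching is case-sensitive on Linux but not on Windows.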

Then, for each zipping program, I compressed the files into a "container" using mostly default settings. For some programs only 2 or 3 containers were created; for others, a few more. Because the system had last access update turned on, between runs I reset the file date/times to the correct values shown above before creating the next container. This way all the containers held identical data.

The output container file names generally take the form TEST21_xx_DD_HHHH.ext, where:
xx identifies the program: 7z (7-zip), zp (pkzip), wz (winzip); the rar names omit it,
DD is the day of the month (e.g. 22 or 23), and
HHHH is the time of the run (24-hour hours and minutes).
So two 7z containers created at different times are named TEST21_7z_22_0929.7z and TEST21_7z_22_1005.7z.
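As a small illustration of the naming convention, the hypothetical helper below builds a container name from the program and the run time. The program codes and the "no code for rar" rule are read from the description above; the dictionary keys and the function itself are only a sketch.

```python
from datetime import datetime

# Program code and extension per the convention above; rar names carry no code.
TOOL_CODES = {
    "7-zip": ("7z", ".7z"),
    "pkzip": ("zp", ".zip"),
    "winzip": ("wz", ".zip"),
    "winrar": ("", ".rar"),
}


def container_name(tool: str, run_time: datetime, base: str = "TEST21") -> str:
    """Build TEST21_xx_DD_HHHH.ext from the program name and the time of the run."""
    code, ext = TOOL_CODES[tool.lower()]
    parts = [base]
    if code:
        parts.append(code)
    parts.append(run_time.strftime("%d"))    # DD: day of the month
    parts.append(run_time.strftime("%H%M"))  # HHHH: 24-hour hours and minutes
    return "_".join(parts) + ext
```

For example, a 7-zip run at 09:29 on the 22nd maps to TEST21_7z_22_0929.7z, and a WINrar run at 15:56 on the 19th maps to TEST21_19_1556.rar.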


WINRAR

First things first. If you read my other zipping articles, you will see that WINrar is the only zipping program I have found that is truly compatible with all my evidentiary tests, and in my humble opinion it is the best one to use. That being said, let's begin.
WINrar hashes of the test containers created at different times:

TEST21_19_1556.rar | 1CD6A6B06BC093893C3BD3DA65FDD130 | 550 | 01/19/2023 15:56:53:234c 01/19/2023 15:56:53:234w 01/19/2023 15:56:53:234a EST
TESt21_19_1559.rar | 9C381BB508D0E9A76D7A9555DBFF85CC | 538 | 01/19/2023 15:59:28:576c 01/19/2023 15:59:28:576w 01/19/2023 15:59:28:576a EST
TEST21_19_1604.rar | B8E5D4573CE01D993AD99F11A1328C2A | 550 | 01/19/2023 16:04:12:441c 01/19/2023 16:04:12:457w 01/19/2023 16:04:12:457a EST
It appears that a delay of as little as 3 minutes will cause the rar file to have a different hash. It is thought (I know thinking is dangerous) that this difference results from WINRAR keeping, somewhere in the header, some sort of date/time reference to when the program was run and the rar file created. Other administrative data may also be kept in the container header, but I'm not concerned with that. What is of interest is that each run created a different container hash.

At this point I have no way of identifying, in any of the programs, which/where/when/why the final compressed file header is altered or modified. All I can say is that it appears all the programs may alter something in the header based on the time they are run.
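The idea that a stored time alone can change a container hash can be demonstrated with Python's built-in zipfile module. This is a swapped-in illustration: it does not show what WINrar, 7-Zip, PKzip or WINzip actually store. It zips the same six 50-byte files three times in memory, stamping every entry with a chosen date/time. In a normal ZIP file the entry time is usually taken from the source file's last-written date, which was held constant in these tests, so this demonstrates the mechanism rather than the specific field the tested programs change.

```python
import hashlib
import io
import zipfile

# The six 50-byte test files from this article, built in memory.
FILES = {f"HEX{v:02X}.TXT": bytes([v]) * 50 for v in range(6)}


def build_zip(entry_time: tuple[int, int, int, int, int, int]) -> bytes:
    """Zip FILES into an in-memory container, stamping every entry with entry_time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in FILES.items():
            info = zipfile.ZipInfo(name, date_time=entry_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, data)
    return buffer.getvalue()


def md5_hex(blob: bytes) -> str:
    return hashlib.md5(blob).hexdigest().upper()


run_a = build_zip((2023, 1, 22, 9, 29, 54))
run_b = build_zip((2023, 1, 22, 9, 29, 54))
run_c = build_zip((2023, 1, 22, 10, 5, 32))

print(md5_hex(run_a) == md5_hex(run_b))  # same stored time: identical containers
print(md5_hex(run_a) == md5_hex(run_c))  # only the stored time differs
```

Identical payload plus identical stored time gives byte-identical containers; change only the stored time and the container MD5 changes, even though every extracted file is unchanged.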


7-ZIP

7-zip runs made from about half an hour to a few hours after the first show an obvious hash difference in the 7z output file. This alteration is to be expected in any future zip creation, and is most likely, as mentioned before, the result of some type of header information.

TEST21_7z_22_0929.7z  | 507396E4262FD18A3A5AA9D226E23057 | 262 | 01/22/2023 09:29:54:623c 01/22/2023 09:05:10:875w 01/22/2023 09:29:54:639a 
TEST21_7z_22_1005.7z  | 7362015616B8FBE14989C6DA9525FA2E | 259 | 01/22/2023 10:05:32:400c 01/22/2023 10:01:46:325w 01/22/2023 10:05:32:416a 
TEST21_7z_22_1414.7z  | 1EA8F6D1FCA90C54D628956778AC8CF1 | 228 | 01/22/2023 14:14:00:890c 01/22/2023 14:14:00:890w 01/22/2023 14:14:00:890a 


PKZIP:

PKzip alters the last access date of the files being zipped, so after each run the last access date was reset before the next run.
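The reset step can be scripted. The sketch below, with hypothetical helper names snapshot_times and restore_times, records each source file's last-access and last-written times before a zipping run and writes them back afterwards with Python's os.utime. It assumes the zipping program does not modify the files themselves, and it cannot restore the creation date, which os.utime does not cover.

```python
import os
from pathlib import Path


def snapshot_times(paths: list[Path]) -> dict[Path, tuple[int, int]]:
    """Record each file's last-access and last-written times (nanoseconds) before a run."""
    saved = {}
    for path in paths:
        st = path.stat()
        saved[path] = (st.st_atime_ns, st.st_mtime_ns)
    return saved


def restore_times(saved: dict[Path, tuple[int, int]]) -> None:
    """Write the recorded access/write times back after the zipping program has run."""
    for path, (atime_ns, mtime_ns) in saved.items():
        os.utime(path, ns=(atime_ns, mtime_ns))
```

Usage: call saved = snapshot_times(paths) before running the zipping program, then restore_times(saved) after it exits.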

TEST21_zp_22_0942.zip | 5DFD6A149863C66A7975B082034154ED | 1076 | 01/22/2023 09:44:44:207c 01/22/2023 09:44:44:394w 01/22/2023 09:44:44:394a EST
TEST21_zp_22_1034.zip | 9B737AEA124D90B52E4691EABF5AE35E | 1076 | 01/22/2023 10:34:39:636c 01/22/2023 10:34:39:761w 01/22/2023 10:34:39:761a EST
TEST21_zp_22_1053.zip | D08A6686FA5F24C561933ACDBC2DD9AC | 1076 | 01/22/2023 10:53:38:736c 01/22/2023 10:53:38:876w 01/22/2023 10:53:38:876a EST
TEST21_zp_22_1425.zip | 7668D7FF9E36C31ADEDF270FF67E7805 | 1224 | 01/22/2023 14:26:11:075c 01/22/2023 14:24:01:257w 01/22/2023 14:26:11:075a EST
TEST21_zp_23_0635.zip | 91EB0C67203CBC69A71CCA4B2DC1648B | 1072 | 01/23/2023 06:36:07:370c 01/23/2023 06:35:26:552w 01/23/2023 06:36:07:370a EST


WINzip

WINzip is a terrible GUI program for evidentiary zipping, so not much was done except to confirm that, like all the others, it created a different container hash on each run. See below.

TEST21_wz_22_1000.zip | 14488AFD9393867E48EBDF1187FC2047 | 1042 | 01/22/2023 10:00:56:697c 01/22/2023 09:59:05:197w 01/23/2023 07:50:15:404a EST  
TEST21_wz_23_0738.zip | 3692BC8D73DC19BF4C05A4A6D45CBBFA | 1042 | 01/23/2023 07:38:27:440c 01/23/2023 07:37:04:236w 01/23/2023 07:38:27:456a EST  
TEST21_wz_23_0752.zip | 4E61DD7BF881AB9D114981361C48073A | 1042 | 01/23/2023 07:52:13:357c 01/23/2023 07:51:26:049w 01/23/2023 07:52:13:379a EST  


FINAL REVIEW CONFIRMATION

All the compressed files were extracted to separate directories named after the HHHH part of each container name, to keep all the extracted data separate. Then all the HEXnn.TXT files were hashed (command line used is shown below): a total of 15 directories and 90 files. Below is an excerpt from the log file created, listing the directories, whose names follow the HHHH names of the zipped containers:

 DIR    0635
 DIR    0738
 DIR    0752
 DIR    0929
 DIR    0942
 DIR    1000
 DIR    1005
 DIR    1023
 DIR    1034
 DIR    1053
 DIR    1414
 DIR    1425
 DIR    1556
 DIR    1559
 DIR    1604
Maresware HASH command line used to obtain hashes of the 90 extracted files, followed by the file count from the log:
c:\maresware\hash.exe    -f hex*.txt -w 50 -tw -d "|" -v -o RESTORED_HASHES.TXT -1 logfile
Number of files processed:        90
The 90 hashes were then sorted and counted to confirm that all the extracts were as they should be and that no erroneous hash values showed up. Below is the total count for each of the file hashes. The final hashes match the original inputs, with no unusual hashes showing up, indicating that all extracts are as they should be. Had an unusual hash shown up in the extraction of the containers, it would mean that the zipping program caused an alteration in the data upon extraction. That would not be nice!
HEX03.TXT | 07EE8AEA7E9AA3EFAC64666095EC4876 |  50|01/01/2019|07:34:56:789w|EST|  +15 
HEX05.TXT | 37953A8A9A6E70A349875E4B69DCFF1C |  50|01/01/2019|07:34:56:789w|EST|  +15 
HEX01.TXT | 76E7E36462E7E73C6D8D927BA0E78F73 |  50|01/01/2019|07:34:56:789w|EST|  +15 
HEX02.TXT | 85F7588E2D312BBD69E927CD3701AF2E |  50|01/01/2019|07:34:56:789w|EST|  +15 
HEX00.TXT | 871BDD96B159C14D15C8D97D9111E9C8 |  50|01/01/2019|07:34:56:789w|EST|  +15 
HEX04.TXT | AE86B1B5EE54B541BCF64E2C3743D00D |  50|01/01/2019|07:34:56:789w|EST|  +15
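The sort-and-count check can also be scripted. This Python sketch is an assumption-based stand-in for the Maresware procedure, with hypothetical function names: it hashes every HEX*.TXT file found under an extraction root (searching subdirectories recursively, in case a program kept a folder level), tallies the MD5 values, and reports any hash that is not one of the originals, or any original that does not appear once per extracted container. The ORIGINALS table is copied from the source file hash listing in this article.

```python
import hashlib
from collections import Counter
from pathlib import Path

# Reference MD5 values of the source files, from the hash listing in this article.
ORIGINALS = {
    "HEX00.TXT": "871BDD96B159C14D15C8D97D9111E9C8",
    "HEX01.TXT": "76E7E36462E7E73C6D8D927BA0E78F73",
    "HEX02.TXT": "85F7588E2D312BBD69E927CD3701AF2E",
    "HEX03.TXT": "07EE8AEA7E9AA3EFAC64666095EC4876",
    "HEX04.TXT": "AE86B1B5EE54B541BCF64E2C3743D00D",
    "HEX05.TXT": "37953A8A9A6E70A349875E4B69DCFF1C",
}


def md5_file(path: Path) -> str:
    """Upper-case hex MD5 of a small file."""
    return hashlib.md5(path.read_bytes()).hexdigest().upper()


def tally_extracted(root: Path, pattern: str = "HEX*.TXT") -> Counter:
    """Count the MD5 values of every matching file below root, at any depth."""
    return Counter(md5_file(p) for p in root.rglob(pattern))


def check_extracts(tally: Counter, originals: dict[str, str], runs: int) -> list[str]:
    """Report unknown hashes and originals not seen exactly once per container."""
    problems = []
    known = set(originals.values())
    for digest, count in sorted(tally.items()):
        if digest not in known:
            problems.append(f"unexpected hash {digest} seen {count} time(s)")
    for name, digest in originals.items():
        if tally[digest] != runs:
            problems.append(f"{name}: {tally[digest]} copies, expected {runs}")
    return problems
```

Run as check_extracts(tally_extracted(root), ORIGINALS, runs=15), where root holds the 15 extraction directories, an empty problem list corresponds to the "+15" counts shown above.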


Final Conclusion regarding container hash values.

(Reminder: some zipping programs do NOT allow the original access date of a suspect file to be reset when the file is added to a container, so any subsequently created container would obviously have alterations in its hash. That situation is not considered here. Again, ALL subsequent containers were created from IDENTICALLY dated suspect files. Creating containers from files with different access dates, or any other differing date, would not fit the logic of these tests.)

When a common zipping program is used to create a container for evidence, each time the same evidence is placed into a new container, the hash of the new container will differ from that of the previously created container. It was not studied, but a practical explanation for why each subsequent container has a different hash value might be that the zipping program embeds something in the container header that differs from run to run. The most logical candidate (not tested) is the date/time the container is created. This would account for the fact that even an interval of a few minutes results in a different container hash.

I would be interested to know if anyone reading this article runs their own test and can identify what changes in the header from run to run. That might explain why the hashes are different.
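For anyone who takes up that challenge, a reasonable first step is a byte-level comparison of two containers. The hypothetical Python helpers below list the first offsets at which two files differ. Keep in mind that when the sizes differ (the rar containers above are 550 and 538 bytes, for example), everything after the first inserted or removed byte is shifted, so only the earliest differences point at a changed field; a format-aware parser would be the next step. Windows' built-in fc /b command performs a similar raw comparison.

```python
from pathlib import Path


def differing_offsets(a: bytes, b: bytes, limit: int = 20) -> list[tuple[int, int, int]]:
    """List (offset, byte_in_a, byte_in_b) where two buffers differ, up to `limit` entries."""
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append((offset, x, y))
            if len(diffs) == limit:
                break
    return diffs


def compare_containers(path_a: Path, path_b: Path) -> None:
    """Print a size warning and the first differing offsets of two container files."""
    a, b = path_a.read_bytes(), path_b.read_bytes()
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)} bytes")
    for offset, x, y in differing_offsets(a, b):
        print(f"0x{offset:06X}: {x:02X} != {y:02X}")
```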


Associated articles and programs of interest:
Inventory/Catalog files  Creating an inventory of evidentiary files
Forensic file copying  Article tests over 40 "forensic" file copiers
Forensic Hashing  Article tests over 30 "forensic" hash programs.
ZIP-IT for forensic retention  Article tests a few zipping programs and
MATCH FILE HASHES  Demonstrates hash matches using Maresware.
A HASH software buffet   How-to use Maresware hash software

 

A fun fact for you real forensicators: decode the time of all the source files, 07:34:56:789w|EST|, and figure out what it is in GMT.

I would appreciate any comments or input you have regarding this article. Thank you. dan at dmares dot com