NSRL Hash Set
Version 2022.03 SQL data set (circa March 2022): The newly formatted March 2022 SQL version, 2022.03.xxx.modern, is what is currently available from NIST, and is the source of the massaged data found here. The references below to prior versions (< 2.75) remain only for information and review.
Various NSRL releases, from version 2.58 through the December 2021 version 2.75 files, containing 174+ million unique MD5|SHA values from the RDS, ANDROID and
IOS sets, have been combined, uniqued and zipped into a set of four (4) zip files of about 1.5G each in size. The record format of the file, as shown
below, is MD5|SHA: a 75-character record, with the MD5 and SHA values pipe delimited. The file is sorted on the MD5 value, so it can be searched
on MD5 as if it were indexed, and sequentially searched on the SHA values. A sequential search of the 174+ million records should take a
reasonably fast computer under 2 minutes, while a binary search on the MD5 values is as fast as a traditional indexed search.
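The binary search described above works because every record is the same fixed length and the file is sorted on the leading MD5 field, so a record's byte offset is just its record number times the record length. A minimal sketch in Python, assuming the 75-byte record layout described here (32-character MD5, a pipe, 40-character SHA-1, and a CR/LF); the function name and file path are illustrative, not part of any Maresware tool:

```python
import os

REC_LEN = 75          # 32 (MD5) + 1 ('|') + 40 (SHA-1) + 2 (CR/LF) -- assumed layout

def md5_in_nsrl(path, md5_hex):
    """Binary-search a sorted, fixed-record-length file for an MD5 value."""
    target = md5_hex.upper().encode("ascii")
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        n_recs = f.tell() // REC_LEN
        lo, hi = 0, n_recs - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            f.seek(mid * REC_LEN)       # jump straight to record 'mid'
            rec_md5 = f.read(32)        # first 32 bytes are the MD5 key
            if rec_md5 == target:
                return True
            elif rec_md5 < target:
                lo = mid + 1
            else:
                hi = mid - 1
    return False
```

Because each probe is a single seek plus a 32-byte read, even 174+ million records need only about 28 probes, which is why a binary search finishes in a second or two while a linear scan takes minutes.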
The current (March 2022) values from NIST are for RDS_2022.03.1_modern. The hash counts below are from the v2.75 NIST page. (See the NIST - NSRL site for explanation: NSRL-NIST overview.)

RDS 2.75 December 2021 Hash Counts (before combining and uniquing):
Modern:           202,302,512
Modern (minimal):  41,850,362
Modern (unique):   22,366,821
Legacy:           134,570,414
Android:           50,308,347
iOS:               13,124,271

Just a quick FYI regarding the Maresware search potential: it did a linear search of the 174+ million records in just about 2 minutes, and my CPU is over 10 years old. Imagine the speed of a newer system. The bsearch program does a binary search in seconds.

Log of the linear search run:

Output file sample
Processing input _FINAL_MERGED_UNIQUED
Records in file 174,391,680
program started: Sat Oct 30 06:53:54 2021
Output file name = sample
Output record length is 75
No of records read = 174,391,680
Elapsed time: 0 hrs. 2 mins. 6 secs
==========================

The same run using bsearch resulted in the following log (the 1 second is rounded up):

Records: 174,391,680
No of records written = 8
Elapsed time: 0 hrs. 0 mins. 1 secs
I have split the combined (older and new) 175,831,092 items into four (4) smaller zipped files and made them
available for download. Stats of the files for the 2022 SQL data sets (zipped, individual, and total):

FILE                  | MD5                              | SIZE           | Records
NSRL_0-3.zip          | 735F327311D8152EEE36E902FBA34135 |  1,763,191,848 |
NSRL_4-7.zip          | 09A1F69CF7AB6645C130B27CE9369096 |  1,763,027,765 |
NSRL_8-B.zip          | 3CE491F678BE387E42DDFAD45CD11B09 |  1,762,833,850 |
NSRL_C-F.zip          | C8688E36255FEE393B8CCDA27C470380 |  1,762,737,301 |
0-3.MD5               | 38BC44D9BFA08534F183707D538374D8 |  3,297,294,300 |  43,963,924
4-7.MD5               | 5822E67F0B3DA92FD619D013BA155B7D |  3,296,989,275 |  43,959,857
8-B.MD5               | 6A8BE019408109D2CA4865BE62B014C6 |  3,296,614,350 |  43,954,858
C-F.MD5               | 83B2F2BA8A17BC47585BB170E2C3C512 |  3,296,433,975 |  43,952,453
_FINAL_MERGED_UNIQUED | 1B42BA8E0175DCE5378C277C088B8CB1 | 13,187,331,900 | 175,831,092

You should unzip them, then merge them in sorted fashion to restore the entire data set. If you need help doing the merge, let me know. Once you merge the files, I suggest you use the sortchek.exe program (and its help file) to verify that the total set is still sorted. The suggested command line for sortchek is:

D:\>sortchek merged_nist_nsrl -r 75 -p 0 -l 32

Replace the merged_nist_nsrl name with whatever yours is named. If sortchek finds a record out of order, it will show you.
SAMPLE RUNS:
Actual real-life demo/test: I conducted a hash run on one of my folders, which held approximately 68 gig in 41,000 files. I then sorted the output on the MD5 hash and compared those 41,000 items against the 174+ million NSRL records to see how many of the 41,000 were not in the NSRL data set, using two of the Maresware programs (disksort and compare) after the initial hash run was completed.

First, perform the hash of the suspect tree/directory, which, depending on the size of the data set being hashed, will take some time to produce the file h_hashes. (Or you could extract the hash values from your suite's process and reform the records to fit the disksort required format.) Then:

C:\>disksort h_hashes h_hashes.srt -r 354 -p 211 -l 32 -A
C:\>compare K:\_FINAL_MERGED_UNIQUED H:\h_hashes.srt not_on_nsrl compare_nsrl.par -u -1 compare.log
Elapsed time: 0 hrs. 1 mins. 56 secs

I found that about 24,000 of the hashed files were not in the NSRL data set. That was not too surprising. What was surprising (to some) is that the entire process after the initial hash run, sorting the hash output and then running the compare program to find the items not among the 174+ million NSRL records, took less than 2 minutes. And this was done on a 10-year-old CPU.

The simple commands above can be used as the basis for a generic batch file which can easily be modified on a per-case basis and re-run over and over, not only to find suspect items but to ensure consistent repeatability (if that's a word). The delimited outputs of various hash software and suite software can easily be reformed for processing by the Maresware software, which accomplishes the above tasks. If you wish to find out how to create and use such a batch process, send me an email at dm @ dmares.com, or give me a call at 678-427-3275 and leave a message, because I don't answer unless you are in my phone directory (spam calls, you understand).
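The hash-then-sort step of the workflow above can be sketched in Python for readers without the Maresware tools. This is only an illustration, not the disksort/compare record format: it walks a directory tree, computes each file's MD5 incrementally (so large files are not read into memory at once), and returns the pairs sorted on the hash, ready to be compared against a sorted NSRL set:

```python
import hashlib
import os

def md5_of_file(path, bufsize=1 << 20):
    """MD5 a file in 1 MB chunks, returning the uppercase hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest().upper()

def hash_tree(root):
    """Return (md5, path) pairs for every file under root, sorted on the MD5."""
    pairs = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            pairs.append((md5_of_file(full), full))
    return sorted(pairs)
```

With both the suspect hashes and the NSRL file sorted on MD5, the compare step is a single merge-style pass (or a binary search per hash): any MD5 from hash_tree that never appears in the NSRL set goes on the "not on NSRL" list of candidate files to examine.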
A reference page for algorithms and other documents may be found at: NIST. Research the documents link.