Computer Forensics and Data Analysis
Software Training Services  
NSRL Hash Set

Version 2022.03 SQL data set: (circa Mar. 2022)

The newly formatted March 2022 SQL version, 2022.03.xxx.modern, is what is currently available from NIST, and it is the source of the massaged data found here. The references below to prior versions (<2.75) are here only for information and review.

Various NSRL versions, from version 2.58 through the Dec. 2021 version 2.75, containing 174+ million unique MD5|SHA values from the RDS, ANDROID and IOS sets, have been combined, uniqued and zipped into a set of four (4) zip files of about 1.5G each in size. The record format of the file, as seen below, is MD5|SHA: the MD5 and SHA values separated by a pipe. That makes it a 75 character record (32 for the MD5, 1 for the pipe, 40 for the SHA, and 2 for the line ending). The file is sorted on the MD5 value, so it can be searched on MD5 as if it were indexed, and sequentially searched on the SHA values. A sequential search of the 174+ million records should take a reasonably fast computer about 2 minutes, while a binary search of the MD5 values is as fast as a traditional indexed search.

Even though many suites can process MD5 and SHA lists, I would imagine 170+ million records might cause some to choke. The Maresware search, bsearch and compare programs can perform the searches and comparisons very easily and are batch file compatible. Each of these programs has its own speciality in the process, and a review of its help file is suggested.

1D6EBB5A789ABD108FF578263E1F40F3|0000002D9D62AEBE1E0E9DB6C4C4C7C16A163D2C
9B3702B0E788C6D62996392FE3C9786A|00000142988AFA836117B1B572FAE4713F200567
Review this section: (NSRL data files) for an explanation of the data formats, and of what makes up the contents of the zip file.
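
For readers who want to see why a file sorted on MD5 can be searched "as if it were indexed", here is a minimal Python sketch of a binary search over fixed-length records. It is not how bsearch is implemented. It only assumes the layout described above: 75-byte records, with the MD5 in the first 32 bytes and a two-byte line ending. The function name find_md5 is made up for this illustration.

```python
import os

RECORD_LEN = 75   # assumed: 32 (MD5) + 1 ('|') + 40 (SHA) + 2 (CR LF)
KEY_LEN = 32      # the MD5 occupies the first 32 bytes of each record


def find_md5(path, md5_hex):
    """Binary-search a sorted fixed-length file; return the record or None."""
    target = md5_hex.strip().upper().encode("ascii")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD_LEN
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD_LEN)        # jump straight to record number mid
            record = f.read(RECORD_LEN)
            key = record[:KEY_LEN]
            if key == target:
                return record.rstrip(b"\r\n").decode("ascii")
            if key < target:
                lo = mid + 1                # target is in the upper half
            else:
                hi = mid                    # target is in the lower half
    return None
```

Each lookup reads only about log2(N) records (roughly 28 for 175 million), which is why the elapsed time is so short compared to a sequential pass.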

If you wish a single MD5 file that looks like this, contact me. Or better yet, learn how to use the filbreak.exe program, and create it yourself.
1D6EBB5A789ABD108FF578263E1F40F3
9B3702B0E788C6D62996392FE3C9786A
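
If filbreak.exe is not at hand, the same MD5-only file can be produced with a few lines of script. The following Python sketch is a stand-in, not a description of how filbreak works. It assumes the pipe-delimited MD5|SHA layout shown above, and the function name write_md5_only is made up for this illustration.

```python
def write_md5_only(src_path, dst_path):
    """Write the text before the first '|' of each line (the MD5) to dst_path."""
    count = 0
    with open(src_path, "r", encoding="ascii", newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            md5 = line.split("|", 1)[0].strip()
            if md5:                       # skip blank lines
                dst.write(md5 + "\r\n")   # keep the CR LF line ending
                count += 1
    return count                          # number of MD5 values written
```

Because the input is already sorted on the MD5, the output stays in sorted order.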

The current (March 2022) RDS_2022.03.1_modern values from NIST are
552,038,839 total
43,262,568 unique

The values for v 2.75 from the NIST page are below. (See the NIST - NSRL site for explanation: NSRL-NIST overview.)

RDS 2.75 December 2021 Hash Counts (before combining and uniquing)

Modern:             202,302,512
Modern (minimal):    41,850,362
Modern (unique):     22,366,821
Legacy:             134,570,414
Android:             50,308,347
iOS:                 13,124,271

Just a quick FYI regarding the Maresware search potential. It did a linear search of the 170+ million records in just about 2 minutes, and my CPU is over 10 years old. Imagine the speed of a newer system. The bsearch program does a binary search in seconds.

Output file         sample
Processing input _FINAL_MERGED_UNIQUED
Records in file           174,391,680
program started: Sat Oct 30 06:53:54 2021


Output file name =            sample
Output record length is         75
No of records read =     174,391,680

Elapsed time: 0 hrs. 2 mins. 6 secs 
   ==========================
The same run using bsearch resulted in the following log, and the 1 second is rounded up:
Records:                  174,391,680
No of records written =             8
Elapsed time: 0 hrs. 0 mins. 1 secs
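
Since the file is not sorted on the SHA values, looking those up means reading every record once. A minimal Python sketch of such a sequential pass is shown here for illustration only. It is not the Maresware search program and will not match its speed. It assumes the MD5|SHA pipe-delimited layout, and the function name find_sha_matches is made up.

```python
def find_sha_matches(path, wanted_shas):
    """Read every record once; return the records whose SHA is in wanted_shas."""
    wanted = {s.strip().upper() for s in wanted_shas}
    hits = []
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            record = line.rstrip("\r\n")
            _md5, _sep, sha = record.partition("|")
            if sha in wanted:             # set lookup, so many SHAs cost one pass
                hits.append(record)
    return hits
```

Any number of SHA values can be checked in the same single pass, because each record is tested against the whole set at once.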

I have split the combined (older and new) 175,831,092 items into 4 smaller zipped files and made them available for download.
NSRL_0-3.zip   contains 43,604,416 items with first character 0-3
NSRL_4-7.zip   contains 43,599,754 items with first character 4-7
NSRL_8-B.zip   contains 43,594,971 items with first character 8-B
NSRL_C-F.zip   contains 43,592,539 items with first character C-F

  Stats of the files for the 2022 SQL data sets: total, individual, zipped
  FILE                   |  MD5                               |  SIZE (bytes)     |  Records
  NSRL_0-3.zip           |  735F327311D8152EEE36E902FBA34135  |   1,763,191,848   |
  NSRL_4-7.zip           |  09A1F69CF7AB6645C130B27CE9369096  |   1,763,027,765   |
  NSRL_8-B.zip           |  3CE491F678BE387E42DDFAD45CD11B09  |   1,762,833,850   |
  NSRL_C-F.zip           |  C8688E36255FEE393B8CCDA27C470380  |   1,762,737,301   |

  0-3.MD5                |  38BC44D9BFA08534F183707D538374D8  |   3,297,294,300   |   43,963,924
  4-7.MD5                |  5822E67F0B3DA92FD619D013BA155B7D  |   3,296,989,275   |   43,959,857
  8-B.MD5                |  6A8BE019408109D2CA4865BE62B014C6  |   3,296,614,350   |   43,954,858
  C-F.MD5                |  83B2F2BA8A17BC47585BB170E2C3C512  |   3,296,433,975   |   43,952,453
                                                                 ==============      ===========
  _FINAL_MERGED_UNIQUED  |  1B42BA8E0175DCE5378C277C088B8CB1  |  13,187,331,900   |  175,831,092
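
The MD5 column above can be used to confirm that a download or an unzipped file arrived intact. Any hashing program will do. As one option, here is a small Python sketch using the standard hashlib module. The function name file_md5 is made up for this illustration.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 of a file as upper-case hex, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)          # multi-gigabyte files never sit in memory
    return digest.hexdigest().upper()
```

For example, file_md5("NSRL_0-3.zip") should return the value listed for that file in the table. If it does not, download the file again before merging.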

You should unzip them, and then merge them in sorted order to restore the entire data set. If you need help doing the merge, let me know. Once you merge the files, I suggest you use the sortchek.exe program (and its help file) to verify that the total set is still sorted. The suggested command line for sortchek is:
D:>sortchek     merged_nist_nsrl     -r 75 -p 0 -l 32
Replace the merged_nist_nsrl name with whatever yours is named. If it finds a record out of order, it will show you.
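
For illustration, the merge and the sort check can also be sketched in Python. This is not the Maresware implementation. The sketch assumes 75-byte records ending in a line terminator, and it reads the sortchek options above as record length (-r 75), key position (-p 0) and key length (-l 32), which is my interpretation of that command line. The function names are made up.

```python
import heapq


def merge_sorted(part_paths, out_path):
    """Merge already-sorted part files into one sorted output file."""
    files = [open(p, "rb") for p in part_paths]
    try:
        with open(out_path, "wb") as out:
            # heapq.merge reads the parts line by line and always emits the
            # lowest pending line, so the output stays sorted on the MD5.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


def first_out_of_order(path, record_len=75, key_pos=0, key_len=32):
    """Return the 0-based number of the first out-of-order record, or None."""
    prev = b""
    index = 0
    with open(path, "rb") as f:
        while True:
            record = f.read(record_len)
            if len(record) < record_len:  # end of file (or a short last record)
                return None
            key = record[key_pos:key_pos + key_len]
            if key < prev:
                return index
            prev = key
            index += 1
```

A call such as merge_sorted(["0-3.MD5", "4-7.MD5", "8-B.MD5", "C-F.MD5"], "merged_nist_nsrl") followed by first_out_of_order("merged_nist_nsrl") should return None if the merged set is still sorted.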

SAMPLE RUNS:

NSRL_DEMO.zip  is a 1G zipped file containing some sample NSRL data with a few MARESWARE batch files to demonstrate how to use and run MARESWARE when processing the NSRL data which is referenced above. When you download the file, unzip it to a clean directory, and read the _readme.txt file. The batch files contained within the folders give a high level view of how to run some of the software to efficiently process NSRL data and your collection of MD5 records. If you have questions let me know.

Actual Real life demo-test

I conducted a hash run on one of my folders, which held approximately 68 gig in 41000 files. Then I sorted the output on the MD5 hash, and compared the 41000 items with the 174+ million NSRL records to see how many of the 41000 were not in the NSRL data set. After the initial hash run was completed, this took only two of the Maresware programs: disksort and compare.

First, perform the hash of the suspect tree/directory to obtain the file h_hashes. Depending on the size of the data set being hashed, this will take some time. Alternatively, you could extract the hash values from your suite's output and reformat the records to fit the format disksort requires. Then:

c:>disksort    h_hashes    h_hashes.srt   -r 354   -p 211   -l 32 -A 
C:>compare     K:_FINAL_MERGED_UNIQUED    H:h_hashes.srt   not_on_nsrl   compare_nsrl.par -u -1 compare.log
Elapsed time: 0 hrs. 1 mins. 56 secs
I found that about 24000 of the hashed files were not in the NSRL data set. That was not too surprising. What was surprising (to some) is that the entire process after the initial hash run took less than 2 minutes: running the sort program to sort the hash output, then running the compare program to find the mismatches not in the 174+ million item NSRL set. And this was done on a 10 year old CPU.
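
The reason the comparison is so quick is that both files are sorted on the MD5, so each one only has to be read once, side by side. The Python sketch below illustrates that idea only. It is not the compare program, and it does not read the compare_nsrl.par parameter file. The function names are made up. The key_pos argument is a byte offset into each line, which you would set to wherever the MD5 sits in your own hash output.

```python
def md5_keys(path, key_pos=0, key_len=32):
    """Yield the upper-cased MD5 field of each line in a file sorted on that field."""
    with open(path, "rb") as f:
        for line in f:
            yield line[key_pos:key_pos + key_len].upper()


def not_in_reference(suspect_keys, reference_keys):
    """Yield each key from sorted suspect_keys that is absent from sorted reference_keys."""
    ref_iter = iter(reference_keys)
    ref = next(ref_iter, None)
    for key in suspect_keys:
        # Advance the reference side until it catches up with the suspect key.
        while ref is not None and ref < key:
            ref = next(ref_iter, None)
        if ref != key:                    # no match, or the reference is exhausted
            yield key
```

Usage would look something like not_in_reference(md5_keys("h_hashes.srt", key_pos=211), md5_keys("_FINAL_MERGED_UNIQUED")), where 211 is taken from the disksort command line above on the assumption that -p is a zero-based offset.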

The simple batch script above can be used as the basis for a generic batch file which can easily be modified on a per case basis and re-run over and over, not only to find suspect items, but to ensure consistent repeatability (if that's a word). The delimited outputs of various hash software and suite software can easily be modified to be processed by the Maresware software which accomplishes the above tasks. If you wish to find out how to create and use such a batch process, send me an email at: dm @ dmares.com or give me a call at: 678-427-3275 and leave a message, because I don't answer unless you are in my phone directory. (Spam calls, you understand.)

A reference page for algorithms and other documents may be found at: NIST. Research the documents link.

 
copyright © 1998-2021 by Dan Mares and/or Mares and Company, LLC