FILE INVENTORY or LIST_IT

The truth is out there
So make sure your software can list it.

First authored June 2022. By the time you read this article, a lot of time may have passed, and the software that was tested may have been updated and just might now pass the tests. So you should conduct tests of your own to see if the current version passes your tests and meets your needs.


Here are a few articles you might like to read in the order listed. But before reading them, think about this small difference: the difference between "processing the evidence", and "conducting the forensic investigation". I think these articles are more targeted to the processing of the evidence rather than the direction you use to conduct the investigation. They may be very similar but no cigar.



CAVEAT
You know what that is: "a modifying or cautionary detail to be considered when evaluating".

The cautionary detail is that my testing took place during the 2019-2022 time frame, and I suspect that some of the versions of the software being tested have been updated, modified, fixed. Not to say the problems I found have been fixed, just that there are newer versions out there, and maybe, just maybe, the shortcomings I found may have been fixed. Just saying.


After reading this article, read these next ones in forensic order.

Start here:

This is the article you are currently on Inventory/Catalog files  Creating an inventory of evidentiary files
Forensic Hashing  Article tests over 30 "forensic" hash programs.
Forensic file copying  Article tests over 40 "forensic" file copiers
ZIP-IT for forensic retention  Article tests a few zipping programs and
ZIP_IT_TAKE2  More tests for your zipping capabilities.
ZIP FILE/container/container  Hashing your zip container reliably
MATCH FILE HASHES  Demonstrates hash matches using Maresware.
A HASH software buffet   How-to use Maresware hash software


Preliminary case information which determines why I chose the items to test.

First, if you have a situation where you can seize the entire computer, or make a full bit image of the drive, then some of these test requirements will be easily met using a suite. See Suite stuff   below. However, there are situations which will be a little more restrictive, and which will cause you (or rather your software) to be more restrictive in what and how you process the evidence. That situation will be explained here, and again below, just so you get the idea behind the topics I chose to perform the tests around. I think (I know thinking is bad) that testing software under these more restrictive scenarios will show that the software can perform not only in a more restrictive environment, but also in one in which you have complete control.

So let's begin:

The tests were performed on an NTFS file system because I believe that is the most common file system used by corporations today. It also offers more of the items with which we will perform the tests.

So number one is the fact that the software will be able to find unicode file names. Not necessarily display in full unicode format, but merely find and process those items.

Then, second, because we are on the NTFS file system, we must be able to find and process long filenames. Those are filename paths greater than 255 characters. You will be surprised at how many programs can't do that. I have seen cataloging software turn the long filenames into traditional 8.3 path/filenames. Try and explain that to the opposition.

Third, again because of NTFS, we will assume that the owner of the computer system (usually a corporation) has last access update turned on. The last access update may or may not be important to your investigations, but if it is turned on, your program should be able to NOT tamper with the evidence's last access date. Wouldn't you like to keep a record of all the original file MAC date/times?? I would. This is part of the inventory you should take of everything you seize regardless of the type of case.

And fourth and final: again because of NTFS, we should be able to find, identify, and process where necessary any alternate data streams. Consider a porn investigation where the user downloads porn from various sites. Did you know that some browsers (I'm not telling you which, that's for you to find out) actually store in ADS's the original URL and other information of the download? Might be very interesting in porn or other internet investigations.
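For the long filename requirement above, here is a minimal sketch of one way a script can reach paths beyond the traditional Windows limit: the Win32 extended-length prefix \\?\. The helper name to_extended_length is hypothetical and is not taken from any of the tested programs. It only builds the path string, so whether a given tool or API then accepts that string is something you still have to test.

```python
import ntpath

def to_extended_length(path: str) -> str:
    """Return an absolute Windows path with the extended-length prefix added."""
    if path.startswith("\\\\?\\"):
        return path                        # already prefixed, leave it alone
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    path = ntpath.normpath(path)           # resolve '.', '..' and '/' before prefixing
    if path.startswith("\\\\"):
        # UNC form: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path

print(to_extended_length(r"F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z"))
```

The path is normalized first on the assumption that, once the prefix is present, Windows takes the string literally. Treat that as something to confirm on your own test tree.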

Also, you must consider, when copying from the suspect drive to a work drive for transport to your office, that the copy program retains ALL original file dates so as not to corrupt or influence the analysis. Does yours maintain all the dates?
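One hedged way to answer that question for any copy routine is to record the dates before the copy and compare them afterwards. The sketch below uses Python's shutil.copy2 purely as a stand-in for "your copy program"; the function name copy_and_compare is hypothetical. As far as I know, copy2 tries to carry over the modified and access times but makes no attempt to carry over the creation date, so a real check should include that date too.

```python
import os
import shutil

def copy_and_compare(src: str, dst: str) -> dict:
    """Copy src to dst with shutil.copy2 and report which dates survived."""
    before = os.stat(src)      # read the source dates BEFORE the copy touches it
    shutil.copy2(src, dst)     # stand-in for whatever copy program is under test
    after = os.stat(dst)
    return {
        "modified_kept": before.st_mtime_ns == after.st_mtime_ns,
        # Reading the source during the copy may itself update its access
        # date, so this one can legitimately come back False.
        "accessed_kept": before.st_atime_ns == after.st_atime_ns,
    }
```

Run it against a test file with known dates, never against original evidence, and compare the result with what your inventory recorded.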

If you perform a bit-image of the drive using a suite, most of these items above will easily be identified and located as evidence. However, in our test scenario, we are sitting at a corporate server where we can ONLY process/examine/image/copy (call it what you will) that directory tree belonging to the suspect. So this fine line refinement and restriction must be considered when testing our software. Period....


 

Do you create an inventory of the items you seize during an investigation?
Then why don't you routinely create an inventory (full listing, catalog) of ALL the files within the drive, tree, directory of the suspect computer you just seized?

Read this article and raise your forensic intelligence level a few points. 😄

This article will discuss the idea and processes that might be considered when listing, or producing a catalog of files
A: contained within the entire tree/directory on the suspect's machine while at the original physical location or
B: within a specific evidence directory that you have restored on your forensic/analysis computer from the suspect or seized computer. (hopefully A and B match)
C: the work or evidence files you are reviewing on your forensic workstation, or
D: the entirety of the evidence you are turning over to the reviewer or prosecutor, which is most likely NOT the entire file list.

After all, if you can't create a catalog of the files within this subject tree, or a list of files within your evidence presentation, how will you or others know what files might be of interest and should be highlighted or captured for evidentiary purposes?

We also discuss some of the possible shortcomings, problems, and/or restrictions you may encounter when using the more traditional file listing software which may have been recommended through one of your forensic listservs.

I tested a number of file listing programs, some of which are suite type, installable, and stand alone. When tested against my simple evidence tree, I found most of them lacking the capability to accomplish all the requirements I set up.

A definition: Let's use the term catalog to also mean a list or listing of the files within the subject location. This catalog listing should lend itself to easily be imported or further massaged by a data base, spreadsheet, or simple text editor. Technically and legally, what we are talking about here is actually an "inventory" of the files contained within the specific evidence location (ie: computer hard drive, server or suspect folder, forensic work drive files). So for practical purposes and the purpose of this article, the following all mean the same: catalog, list, inventory, whole bag of ....

Table of Contents:
BASIC  explanation of why you should create a catalog.
Overview   of my test requirements.
Suite stuff  Suites don't do it all.
Other considerations  of format of the data.
Programs Tested  

Before we start:
A challenge   (6/2020) for you to test your forensic LIST_IT/copy/zip software for forensic and evidentiary reliability.


my cataloging BASICS

List or create an inventory of the files within the evidence source, or your forensic working folder.
A lot of circular references. That way you become a big wheel. 😄

There are some situations where you might want or need to create a catalog of the files within the specific evidentiary tree you are working with.
1. Create a true and accurate catalog of ALL the files contained within the seized evidence available (ie: a suspect tree on the company server, or a tree on the suspect computer).
2. Create a catalog of the files produced or mentioned in your forensic report so the reviewer will have a clean succinct easy to review list of ALL the files you are working with.
3. Create a catalog of any and all files within other key directories which are either part of the original evidence collection, or the final product that is going to long term storage.
4. Create a catalog of those files seized, which are NOT part of the final report.

Situation 1: Installation of cataloging software is restricted due to corporate or legal restriction.
The source of your evidence is located on a corporate server or on a stand alone (users) computer at a corporation. The major problem is that this corporation, or search warrant has the following restriction.

You cannot install any of your software on the corporate computer system or on the suspect computer. This means that any software you use must be run from a thumb drive and be ready to go from the thumb. You cannot load anything onto the corporate computer(s). This restriction most likely will prohibit you from using a suite software package.

Situation 2: Suspect system has Last Access Date update turned on.
Once you determine that you can only run the cataloging software from the thumb drive you have to consider if the software will also perform some sort of hashing of the files. This is not generally a requirement, but some of the cataloging software has this capability. Since the suspect computer has last access update turned on, you MUST make sure any and all of your processes do NOT update or alter any of the MAC dates. You must also make sure that your software, when capturing the file list also captures the true and accurate MAC dates. Else you could alter/corrupt the original evidence. Capturing the MAC times in GMT format is a plus.
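As an illustration of capturing the three MAC dates in GMT, here is a minimal Python sketch. The names utc_stamp and mac_times_utc are hypothetical. The assumption is that a metadata query (os.stat) does not open the file content and so should not by itself change the last access date, but that is exactly the kind of claim you should verify on your own test tree.

```python
import os
from datetime import datetime, timezone

def utc_stamp(ts: float) -> str:
    """Format a timestamp as a UTC (GMT) date/time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y/%m/%d %H:%M:%S")

def mac_times_utc(path: str) -> dict:
    """Return the write, access and create dates of one file, in UTC."""
    st = os.stat(path)  # metadata query only; the file content is not read
    # Assumption: st_birthtime is present only on some platforms and Python
    # versions. The fallback st_ctime is the creation time on Windows but the
    # metadata-change time on Unix, so label it honestly in a real report.
    created = getattr(st, "st_birthtime", st.st_ctime)
    return {
        "write": utc_stamp(st.st_mtime),
        "access": utc_stamp(st.st_atime),
        "create": utc_stamp(created),
    }
```

Recording the times in UTC avoids arguments later about which time zone or daylight saving rule the listing machine was using.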

Situation 3: Final data production/catalog for reviewer.
Now you have completed your examination on your forensic computer and it's time to prepare a final report, provide evidence files and a file list to the attorney. You may also wish to create a separate catalog of all the files which make up your final report. This list might be created for future reference.

You have hundreds or thousands of evidence files extracted from your forensic process which are on your forensic computer. Your production process produces the selected files to the reviewer. However, to make things understandable for the attorney, you wish to create a catalog of ALL the evidence files which are being provided. This list should be a clean and succinct listing which most likely will be imported and possibly additionally massaged/sorted/selected by the reviewer. So the final catalog for presentation to the reviewer must be clean and easily manipulated and re-processed by the reviewer.

You would be surprised at the kludgey (that's a technical term) format which a lot of these recommended packages produce. Which are almost impossible to form or reform to a clean format. But don't take my word for it.

Below. Notice some of the output formats create completely separate segments for each folder. This would be almost impossible to reprocess logically for hundreds of folders in the tree.

There are other output formats found (see the "other consideration" section below). All of which are problematic when considering taking the output to the next step.

For instance, prior to 2022 the NIST NSRL data sets were produced in a clean flat file format. This meant that those files could be processed/reprocessed by almost any program capable of manipulating "flat" files. So 10 years from now, any program worth its weight in bits could process that data. Then in 2022 NIST decided in their infinite "wisdom" that they would now produce the data sets in SQL format. Which means that basically only one type of software could process the data for the next step. Maybe you have SQL knowledge and maybe you don't. But you will need to obtain SQL software. Then, what happens years down the road when SQL becomes extinct, and you have little capability of processing this ancient data? However, "flat" data formats will probably never go out of style, however you decide to process that data.

Finally, the easiest output format might be a fixed length or properly delimited file with complete information in each record that would allow for easy processing or import to a spreadsheet or database. So research and practice generating different formats which would best suit your needs. DAH!
Something like this pipe delimited record. No added overhead or "STUFF" to bloat the size.

  NAME       | EXT |   SIZE | WRITE     | WR_TIME | CREATE    | CR TIME | ACCESS   | ACC TIME| MD5     | FULL PATH                |DR SER NO
filename.jpg | JPG | 176,626| 2020/03/03| 07:34:56| 2020/03/03| 07:34:56|2021-12-31| 11:38:00| C06BA...| F:\SUBJECT1\filename.jpg | ABC909   
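
To make the idea concrete, here is a minimal Python sketch that walks a tree and writes one complete pipe delimited record per file. It is not how any of the tested programs work. The name catalog_tree and the field list are assumptions for illustration, and the sketch does not list alternate data streams or compute hashes.

```python
import os
from datetime import datetime, timezone

FIELDS = ["NAME", "EXT", "SIZE", "WRITE_UTC", "ACCESS_UTC", "FULL_PATH"]

def _utc(ts: float) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y/%m/%d %H:%M:%S")

def catalog_tree(root: str, out_path: str) -> int:
    """Write one pipe delimited record per file under root; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        out.write("|".join(FIELDS) + "\n")           # single header line
        for folder, _dirs, files in os.walk(root):
            for name in sorted(files):
                full = os.path.join(folder, name)
                st = os.stat(full)                   # metadata only, content not read
                ext = os.path.splitext(name)[1].lstrip(".").upper()
                record = [name, ext, str(st.st_size),
                          _utc(st.st_mtime), _utc(st.st_atime), full]
                out.write("|".join(record) + "\n")   # full path repeated in EVERY record
                count += 1
    return count
```

Every record carries its own full path, so there are no separate folder segments to reassemble. The pipe is a convenient delimiter on Windows because it is not a legal character in a Windows file name. Keep the output file outside the tree being cataloged.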

Which of the following sample output formats would you rather create and/or have available to load into a spreadsheet or perform additional analysis on? (Some fields truncated or removed for legibility.) Notice some of the output formats create completely separate segments for each folder. This would be almost impossible to reprocess logically for hundreds of folders in the tree.


=====================================================
Notice the separate set of records/lines identifying each new folder name. Difficult to ingest.
 Volume in drive Y is Y_2T
 Volume Serial Number is 7C1E-81A3
 Directory of Y:\TMP\TEST_USB\SOURCE2
05/23/2022  11:21 AM    "DIRECTORY"    CYRILLIC_COPY
05/23/2022  11:21 AM    "DIRECTORY"    CYRILLIC_NAMES
01/01/2019  08:34 AM                48 ALTERNATE_STREAM_FILE.TXT
                                    34 ALTERNATE_STREAM_FILE.TXT:ALTERNATE.TXT:$DATA

 Directory of Y:\TMP\TEST_USB\SOURCE2\CYRILLIC_NAMES
01/01/2019  08:34 AM            12,889 Cyrillic.7z
                                47,814 Cyrillic.7z:LFN_HASHES.TXT:$DATA
                                    34 Cyrillic.7z:signature.txt:$DATA
01/01/2019  08:34 AM            25,894 CYRILLIC_NAMES_W_ADS.7z
 ======================================================  OR

Reasonable output, as long as it's properly delimited.

FOLDER	C:\TMP\TEST_USB\D1\	-------	2	15	772,744	772,744
FILE	---A---X	1/1/2019 07:34	1/1/2019 07:34	1/1/2019 07:34	54	_RESET_D1.BAT
FILE	---A----	1/1/2019 07:34	1/1/2019 07:34	1/1/2019 07:34	48	ALTERNATE_STREAM_FILE.TXT
FOLDER	C:\TMP\TEST_USB\D1\CYRILLIC_NAMES\	-------	0	5	226,341	226,341
FILE	--------	1/1/2019 07:34	1/1/2019 07:34	1/1/2019 07:34	12,889	Cyrillic.7z
FILE	--------	1/1/2019 07:34	1/1/2019 07:34	1/1/2019 07:34	93,971	Cyrillic_NAMES_W_ADS_PK.zip

  ======================================================  OR Next 3 as long as properly delimited

Path,File,Size,Created,Modified
Y:\TMP\TEST_USB\SOURCE2\,ALTERNATE_STREAM_FILE.TXT,48,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\CYRILLIC_COPY\,Cyrillic.7z,12889,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\,Lec 11.htm,52219,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\,ZERO_BYTE.TXT,"",1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
  ====================================================  OR

  PATH                                                                               |   SIZE| ATTR  |   MDATE  |   MTIME     | TZ|   SERIAL #| DISK LABEL 
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z                                                 |  12889|.......|01/01/2019|07:34:56:789w|EST|  BA0E-5287|   1G_CRUZER
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z:LFN_HASHES.TXT                                  |  47814|.adata.|01/01/2019|07:34:56:789w|EST|  BA0E-5287|   1G_CRUZER
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z:signature.txt                                   |     34|.adata.|01/01/2019|07:34:56:789w|EST|  BA0E-5287|   1G_CRUZER
F:\...\fifth_folder_starting_at_188_characters_of_longfilenames\ads.htm              |   8550|.......|01/01/2019|07:34:56:789w|EST|  BA0E-5287|   1G_CRUZER
F:\...\fifth_folder_starting_at_188_characters_of_longfilenames\ads.htm:ads_hash.txt |    388|.adata.|01/01/2019|07:34:56:789w|EST|  BA0E-5287|   1G_CRUZER

  ===================================================== OR
Name                         Format        Size  Modified             Created              Accessed             MD5                               Path                                  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALTERNATE_STREAM_FILE.TXT    TXT        48 Byte  2020/01/01 07:34:56  2020/01/01 07:34:56  2021-12-31 11:38:00  844FB10A6494E10139AD5D91661B5D29  F:\SUBJECT1\ALTERNATE_STREAM_FILE.TXT 
CHESS_20180226A_sml.jpg      JPG     176.626 KB  2020/03/03 07:34:56  2020/03/03 07:34:56  2021-12-31 11:38:00  C0668A5AC70243D30EE3C4DD35B0678B  F:\SUBJECT1\CHESS_20180226A_sml.jpg   
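
To show what "difficult to ingest" means in practice, here is a hedged sketch of the extra work the first sample above forces on you: a small Python parser that remembers the current "Directory of" line and glues it onto every file and stream line below it. The function name flatten_dir_listing and the regular expressions are assumptions fitted to that one sample, not a general parser for every listing program.

```python
import re

# A dated file line: date, time, size, name (folder lines carry no numeric size)
FILE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2} [AP]M)\s+([\d,]+)\s+(.+)$")
# An alternate data stream line: indented, size, then name ending in :$DATA
ADS_RE = re.compile(r"^\s+([\d,]+)\s+(\S.*:\$DATA)$")

def flatten_dir_listing(text: str) -> list:
    """Turn a folder-segmented listing into one pipe delimited record per item."""
    records, folder = [], ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Directory of "):
            folder = stripped[len("Directory of "):].rstrip("\\")
            continue
        m = FILE_RE.match(stripped)
        if m:
            date, time, size, name = m.groups()
            records.append("|".join([folder + "\\" + name, size.replace(",", ""), date, time]))
            continue
        m = ADS_RE.match(line)
        if m:
            size, name = m.groups()
            # stream lines in the sample carry no date of their own
            records.append("|".join([folder + "\\" + name, size.replace(",", ""), "", ""]))
    return records

sample = r"""
 Directory of Y:\TMP\TEST_USB\SOURCE2\CYRILLIC_NAMES
01/01/2019  08:34 AM            12,889 Cyrillic.7z
                                    34 Cyrillic.7z:signature.txt:$DATA
"""
for record in flatten_dir_listing(sample):
    print(record)
```

A catalog that already has the complete path in every record needs none of this.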


OTHER CONSIDERATIONS

As far as the actual format of the final data file, a lot of responses have been received. Some suggest XML, since that is how many of the suites can export the data. Others say JSON (SON of Java 😇 ) might be better. And still others might say a Mongo data base (yeh right!!!), or any number of other formats.

But consider this. First, what I was explaining in the last section was the initial output of the more popular cataloging software. In most cases shown above, all the formats are little less than cludgey (not sure of the correct spelling) when attempting to import them into the next step, which may be a data base or a spreadsheet. Also, let me mention that I have seen Excel choke on incorrectly formatted CSV data where a field may have an unusual format, and the CSV totally confuses the program.

Try importing these two data records (separately), one CSV and the other pipe delimited, into Excel and see what happens to the quoted \"Roswell\" city name.
"dan mares","1234 lakeway,\"rosell\", ga","12345"
dan mares|1234 lakeway,\"rosell\", ga|12345
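
If you don't have Excel handy, a rough stand-in is Python's csv module. It is not Excel and will not fail in exactly the same way, but this small sketch shows the same class of problem using the two records above exactly as written.

```python
import csv

# The two records from the text, character for character (raw strings keep
# the backslashes as typed).
csv_line = r'"dan mares","1234 lakeway,\"rosell\", ga","12345"'
pipe_line = r'dan mares|1234 lakeway,\"rosell\", ga|12345'

csv_fields = next(csv.reader([csv_line]))   # default CSV dialect
pipe_fields = pipe_line.split("|")          # plain split on the delimiter

print(len(csv_fields), csv_fields)
print(len(pipe_fields), pipe_fields)
```

The pipe delimited record splits into the three intended fields. With the default dialect, the CSV reader does not, because the comma and quote characters inside the address collide with the format's own delimiters.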

Also, the added volume of data that comes from formatting each of hundreds of thousands of records as JSON may be a problem, either for data loading or just for handling the size. Bigger isn't always better. Just ask .... (well you know).
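
A quick way to see the overhead for yourself is to build the same record both ways and compare lengths. This is a minimal sketch: the field names are assumptions and the values are borrowed from the sample record shown earlier in this article.

```python
import json

record = {
    "name": "filename.jpg",
    "ext": "JPG",
    "size": 176626,
    "write": "2020/03/03 07:34:56",
    "full_path": "F:\\SUBJECT1\\filename.jpg",
}

pipe_record = "|".join(str(value) for value in record.values())
json_record = json.dumps(record)   # repeats every field name in every record

print(len(pipe_record), pipe_record)
print(len(json_record), json_record)
```

JSON carries the field names, quotes, and escaped backslashes in every single record, where the flat file states the field names once in a header line. Multiply the difference by a few hundred thousand files.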

XML and JSON might be alternative processing formats, but again, how many of the usual cataloging software packages (except suites) output that way? And for what reason? So that they can say, do it my way, not how you wish to further process the data.

Then the most important consideration comes to light. In what format can your customer (yes, the prosecutor or manager) handle the data? Do they have the knowledge and expertise, and maybe even the software, that can efficiently handle the XML, JSON or DataBase format you provide them? Do they even want to learn another language format? Or maybe all they want in the report is a clean set of records that they can open with a word processor, or text editor, to get a quick view of the data. Maybe they have their own program which they wish to use, and converting from your format to theirs might be somewhat difficult. So, it's not so much how you process or handle the data, it's how this initial cataloging software outputs the data so the next step is manageable for ALL.

So, KISS it when considering the next person's needs and capabilities to analyze or review the data. And keep in mind: years down the road, will your format be usable?


TEST REQUIREMENTS

The requirements I set up when I performed my tests were a few simple items as described here.

When performing your own tests of the cataloging software, these items are simple requirements but most of the tested software failed one or all of them. So test your own software against similar requirements. Remember, your requirements may not be those which the defense attorney will challenge.

Consider that today, I would expect to find most corporations are using Windows operating systems, with NTFS file formats on their main drives. For this reason, I concentrated a lot of my testing on the requirements of the NTFS system.

My "definition" and explanation of what good cataloging software should be capable of and should accomplish is:

The test requirements at a minimum are the following:
   -   ★ NTFS Long Filename/path identification/process: Able to find, articulate and list all files found within any long filename paths.
   -   ★ NTFS Alternate Data Streams: Able to find, articulate and list all Alternate data streams, whether in LFN's or normal file lengths.
   -   ★ Report Generation: Able to provide output easily imported into a spreadsheet or data base for next step process, see above sample outputs.
   -   ★ Time display/retention: Able to find/display and include in report all three MAC times. GMT time listing might also be nice.
   -    An added plus might be to produce a log of the "cataloging" process for the final report. But not part of the testing.
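
One hedged way to score a program against the first two starred items is to run it on a test tree where you already know which long paths and alternate data streams exist, then check its output for them. The sketch below only classifies path strings taken from a catalog. The names classify_entry and summarize are hypothetical, the 255 character limit is taken from the description earlier in this article, and the colon test relies on the colon not being legal inside an ordinary Windows file name.

```python
def classify_entry(path: str, long_limit: int = 255) -> dict:
    """Classify one cataloged path as long-path and/or alternate data stream."""
    name = path.rsplit("\\", 1)[-1]           # last component of the path
    base, colon, stream = name.partition(":")
    return {
        "file": path[: len(path) - len(name)] + base,   # path without the stream part
        "stream": stream.split(":")[0] if colon else "",
        "is_ads": bool(colon),
        "is_long": len(path) > long_limit,
    }

def summarize(paths, long_limit: int = 255) -> dict:
    """Count how many catalog entries are streams and how many are long paths."""
    rows = [classify_entry(p, long_limit) for p in paths]
    return {
        "entries": len(rows),
        "ads": sum(r["is_ads"] for r in rows),
        "long": sum(r["is_long"] for r in rows),
    }
```

If you know the test tree holds, say, a dozen streams and the program's catalog summarizes to zero, you have your answer for that column.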


SUITE STUFF

A suite digression here.

Sometimes you can do a bit image of the entire drive with a suitable suite. Let's hope you aren't imaging a multi-terabyte server where you only need a single suspect's directory. When you do the bit image, the suites generally can create reasonably understandable catalog outputs which can be further manipulated as needed. This section does not deal with processing full bit images of the data. It deals with using a suite to process data less than the bit image. And it may digress a little to explain why, if you obtain my test data, you should not use a suite to perform the tests. Again, suites on full bit images work fine, but we are not talking about that here. We are talking about a single top level tree/directory of a single suspect, possibly contained on a large corporate server located who knows where.

Remember, the situation we are talking about here is twofold. First you may be at a location which has a large server farm, and you can only obtain a single tree/folder belonging to the suspect. A full bit image of the server is not possible. Another instance might be that you are prohibited either by the corporation or court order to install the suite on the computer, or you just aren't in a position to do a full bit image of the drive and must ultimately rely on a logical processing of the tree.

In the cases mentioned above you must therefore run the suite against a single tree/directory. Now if the suite can capture a low level bit image at this point, good for you. But in most instances where you can only operate at the directory level, most of the suite software can at best operate at the logical level. Which means you will have to capture your data at the tree level. Therefore no bit captures allowed.

So when testing your suite against its capability to create a reasonable catalog of files, and to make the test results evenly evaluated, make sure you are doing so at the logical folder level and not the bit image level. You will find a significant difference in the output capability.

Also, remember, your final output is something the reviewer can see, feel, and massage for their use. So the output of the suite may not be totally compatible with their needs.

================================================================

Want to see how bad some software is at creating a full catalog of a tree? Try creating a full tree catalog using Windows Explorer. HA HA

Some preliminary information: I want to remind you that all the testing I have done and referenced in this and any other testing related article was done using Windows 10 on an NTFS file system on a desktop computer. The NTFS file system was used as the test environment because I believe that a significant number of corporate and other forensic investigations take place on the NTFS file system. Also, a test environment with the ability to alter a file's last access date, use long filenames, and use alternate data streams adds to the forensic and evidentiary complexity.


Final Review Note
Cataloging of evidence filenames for reports and court.

Everyone who deals with “digital evidence” should be aware that no matter what or when you obtain the evidence, your ultimate goal or expected end point is to present this evidence in court. So treat all electronic evidence as if it will end up as court evidence. If you don’t do it from the beginning, it will be hard to backtrack later.

When you first encounter the suspect system, don't you think it might be wise to obtain a full listing of all the files that are visible within the suspect tree/directory? Don't forget, this is possible evidence, and you want to catalog all the evidence you seize. Yes/NO?

This suspect tree might be the entire drive of the suspect, or a single tree on a large server that belongs to the suspect. In any case, you may have hundreds of thousands of files within this evidence location. Wouldn't you, and your later reviewers/attorneys, etc. like to have a full and complete list of those files???

A catalog or list should be created of the original evidence, of any important intermediate product, and above all of the final product that will be sent to attorneys or produced in court.

Cataloging original evidence files is justified and almost mandatory. But let the attorneys argue that one. What about any report provided to outsiders? Are you certain the recipient will not alter the content (add or delete files) and present the alteration as original? Yes, you will say, the recipient has integrity. DAH!
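
If the catalog you kept includes a hash for each file, checking what comes back is a simple comparison. Here is a minimal sketch, assuming each catalog line is just "path|hash". The names load_catalog and compare_catalogs and that two-field layout are assumptions for illustration.

```python
def load_catalog(lines) -> dict:
    """Read 'path|hash' records into a dict keyed by path."""
    catalog = {}
    for line in lines:
        line = line.strip()
        if line:
            path, _, digest = line.rpartition("|")   # hash is the last field
            catalog[path] = digest.upper()
    return catalog

def compare_catalogs(original: dict, returned: dict) -> dict:
    """Report files added, removed, or changed between two catalogs."""
    both = set(original) & set(returned)
    return {
        "added": sorted(set(returned) - set(original)),
        "removed": sorted(set(original) - set(returned)),
        "changed": sorted(p for p in both if original[p] != returned[p]),
    }
```

An empty result in all three lists is what you hope to see. Anything else is a question for the recipient.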

And to put the icing on the cake, so to speak: what if this catalog made sure to include any Alternate Data Stream of each file? If you have reviewed some of my other articles, you will remember that when downloading files from the internet, some browsers add the source of the item and place it in an alternate data stream. Wouldn't it be nice to be able to see, in the download folder, that there might be good source evidence in alternate data streams which might lead you to valuable evidence?


For those adventurous souls who wish to test their forensic suites or stand alone software, I have created a software testing challenge  to see if your cataloging/listing, copy, zip software passes the test.
Also available is a self extracting executable which contains about 50+- files and which must be run on an NTFS file system to get all the benefits. Email or call 678-427-3275 (leave a message) for the file and its password.


Programs Tested
Here is a list of the programs tested.
Versions not listed. The tests were conducted circa 2022.
Some, like FTK-IMAGER, had the capability of running against a Drive and a Folder, and were tested under both.
DIR_CMD                  CMD
DIRLISTER                GUI
DISKCAT                  CMD
FORENSIC EXPLORER        GUI
FILELIST_CREATOR         GUI
FILELIST_v2              GUI
FTK_IMAGER               GUI (tested on both Drive and Folder)
KARENWARE                GUI (must be installed)
PARABEN_E3               GUI
POWERSHELL               CMD
SEARCHMYFILES            GUI
SLEUTHKIT_FLS            CMD
TREESIZE_FREE            GUI (must be installed)
Windows - Powershell     CMD
Below are the results, sorted from best to not the best. The order listed in no way corresponds to the alphabetic listing above. And some of the above, like DIR, actually have two lines within the results.

When an item reads 1/2 this means that most likely the process worked on short filenames but did not work on long filenames, or it worked on top level files, but did not work on Alternate Data Streams. The 1/2 representation merely means it worked sometimes but not 100%.

The date columns indicate whether the program recorded that particular date of the file.

Some outputs created a two line output for each file as shown in some of the above examples. The chart does not indicate that operation as it is left to the user to test and figure that one out for yourself.

If a column entry is blank, it probably means I couldn't figure out what the program was doing.
  LFN   ADS   Write Date  Create Date Access Date 
                                                  
   YES   YES   YES         YES         YES        
   YES   NO    YES         YES         YES        
   YES   NO    YES         YES         YES        
   YES   1/2   YES         YES         YES        
   YES   NO    YES         YES                    
   YES   NO    YES         NO          NO         
   YES   NO    NO          NO          NO         
   YES   1/2   1/2         1/2         1/2        
   FAIL  FAIL  YES         YES         YES        
   NO    NO    YES         YES         YES        
   NO    NO    YES         YES         NO         
   Image Image Image       Image       Image  only
   1/2   NO    YES         YES         1/2        
   NO    NO                                       
   NO    NO                                       
   NO    1/2                                      


Associated articles and programs of interest:
Forensic file copying  Article tests over 40 "forensic" file copiers
Forensic Hashing  Article tests over 30 "forensic" hash programs.
ZIP-IT for forensic retention  Article tests a few zipping programs and
ZIP_IT_TAKE2  More tests for your zipping capabilities.
MATCH FILE HASHES  Demonstrates hash matches using Maresware.
A HASH software buffet   How-to use Maresware hash software

I would appreciate any comment or input you have regarding this article. Thank you. dan at dmares dot com