This program currently does NOT support Unicode input files.
This is a command line program.
MUST be run within a command window as administrator.
Both versions work the same way and have similar command lines, but they operate on different input files. Read this help file with the input file you are using in mind.
A small amount of sample data is available in a zip file which can be downloaded. The files within this zip file are:
a number of sample keyword files, like: metadata_docs_keywords
two sample TAB delimited X-Ways metadata files: DOCS and JPEG_META.TAB
a commands.bat file containing two sample commands, which shows how to process the sample data.
Unzip the files, then run that batch file:
The output files have a _tmp added to the filename: DOCS_META_tmp.tab
Load (import) this file into a spreadsheet and take a look at the added columns. (Make sure your import settings specify a tab delimited file.)
This program finds (and isolates) semi-colon (;) delimited fields within the X-Ways metadata field that is exported during the X-Ways "Export List" operation or the HTML report generation of X-Ways. The default X-Ways export list is tab delimited, and this program ONLY works on those TAB delimited files.
The X-ways report that is generated is a typical HTML report.
The report contains a Metadata: field. Its contents are usually individual metadata items separated by a carriage return or an HTML line break tag, similar to the sample shown here. It may contain a lot of irrelevant information.
Important note: See below section labelled HTML_REPORT.
Equipment make: SOME CAMERA MAKE
Keywords: help me
Date Original: 2009:08:20 10:27:26
Date digitized: 2009:08:20 10:27:26
Focal length: 4.0
F number: 3.5
The "Export List" version, which deals with the X-Ways "Export List" option, takes the user identified field(s), isolates and extracts each one, and sets each one up as its own tab delimited field within the output record which was generated by X-Ways. These newly inserted tab delimited fields can now be easily imported into a spreadsheet and manipulated by the user.
The resulting output record now has one or more additional tab delimited fields, each of which was identified as a semi-colon (;) delimited sub-field within the X-Ways metadata field. The original metadata field is not modified in any way, and is always maintained in the new output record.
The HTML report processing version takes the selected or requested metadata fields, and these are the only metadata components included in the resulting html report. Any additional unused/unnecessary Metadata items are removed from the output html file. This reduces erroneous and extra html data which the user feels may be unnecessary in the final HTML report. Each (new) line in the resulting html output file is now also labelled Metadata:
The "Metadata:" label in the report MUST be the first item on its line in the html report and be left justified with no leading spaces. If you wish to change the Metadata: field to be BOLDED so it will stand out, you may do this, but you can only use the < B > and < / B > html tags. Those are the only html tags that the program understands when looking for the Metadata field name. If you use the < STRONG > tag it will not work.
If the program does not respond properly (meaning it does not properly find and parse the Metadata field), please open the html report with a text editor and do a search and replace. Remove the bolding or any other formatting around the word Metadata, and make it left justified on the line. In other words: replace the string < B >Metadata: < / b > with just Metadata:
Open and examine the X-Ways generated html report, and you will understand the above restriction.
WHY THIS PROGRAM WAS WRITTEN
The reason this program was created is that when X-Ways finds and extracts metadata from within a file, it extracts many different metadata items from within the metadata content of that file. The metadata extracted differs depending on the type and content of the file being processed, and a large number of variable items are placed within this metadata field. (See the list of metadata fields I have found, below.) Some of these (when available) may be the Last Printed date, Author, various Exif data, date(s), camera type, and other useful metadata information. However, since the metadata field as extracted is basically a field with free form, semi-colon (;) delimited sub-fields, it is not easy to identify either the targeted item (e.g. Last Printed date) or the location within the metadata field where it is found. If you have ever imported the export list to a spreadsheet you have experienced this problem.
This unknown location within the metadata field, and the variability of the metadata information, make it almost impossible to isolate and reparse the targeted item (e.g. Last Printed:). This program identifies the targeted (user identified) sub-field, extracts and isolates its data, and makes a separate delimited item/column of the data. Then, when the resulting data file is imported into a spreadsheet such as Excel, that user identified sub-field is its own unique column within Excel and can be processed like other columns. If you have ever tried to parse the metadata field you know what we are talking about.
Said another way: this program takes the metadata field and, based on the user's input (hopefully properly and correctly researched information), parses the metadata field to locate the semi-colon delimited field(s) which are needed. It then reparses the metadata field and separates out the selected field(s) into their OWN separate tab column, which when imported to the spreadsheet will process very nicely. The original data record is not changed, except that now it has added tab delimited items.
The original field which this program was written to parse was the "Last Printed:" date item within the doc and spreadsheet generated metadata. It has also been tested on email, Exif, and link file metadata, and seems to work with all of these metadata fields. Any feedback on its operation is appreciated.
There is one caveat: X-WAYS ALLOWS carriage returns to be embedded within the metadata field of the "Export List" process. These embedded carriage returns are usually a result of parsing email items but, regardless of the source file, they will cause a spreadsheet (Excel) to have major problems. This program finds those embedded carriage returns and converts them to blanks.
If you only want to convert the carriage returns embedded within the metadata field, simply select one metadata field (any field will do) and run the program. Your output contains the added field, and the metadata field itself now has the carriage returns converted.
04-04-2012 NOTE: Thanks to Jimmy Weg, I have made a second program called x-ways_report_process.exe that is designed to work on the metadata field that is included in the X-Ways HTML report files. It will search the Metadata: line and select out only those segments which the user requests on the command line. The user can input up to 10 items to select on the command line. Alternatively (4/9/2012), if the 3rd item on the command line (which would normally be the metadata item searched for) is replaced by the name of a file containing the items, then those items are searched for. The metadata items must be one per line in a text file. See the command line below.
This program takes a single filename as its input, and up to ten other items on the command line. Be careful, this IS a COMMAND LINE program.
The program takes the input filename, parses it, and adds _tmp to the name, generating a new filename which it uses for the output. So an input file named xways_export.txt will generate an output file named xways_export_tmp.txt. Look for a new output filename similar to the input, with the added _tmp.
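The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the program's own code; the function name tmp_output_name is hypothetical.

```python
import os

def tmp_output_name(input_name):
    """Insert _tmp between the base name and the extension (hypothetical helper)."""
    root, ext = os.path.splitext(input_name)
    return root + "_tmp" + ext

print(tmp_output_name("xways_export.txt"))
```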
The input file should be the usual tab delimited file which is exported through the X-Ways "Export List" option. THE INPUT MUST BE A TAB DELIMITED FILE, and you must advise the spreadsheet program of this when importing the data. This is the default export format of the X-Ways "Export List" operation. The user may include in the output record any other fields they would usually include. HOWEVER: the last field in the exported record MUST be the metadata field. This last field is the ONLY one which is searched or processed for the item(s) provided by the user on the command line. If the metadata field is NOT the last field, the output file will not have the expected content.
The traditional format of the X-Ways metadata field from the "Export List" process is a single field within the tab delimited record. Below is a sample of three fields (path, hash, metadata). Notice that in the metadata there are (colon delimited) sub-fields of: File name:, Sequence:, Version:, Length:, Cluster:, Modification:
\WINDOWS\system32\config\Newsid Backup F67ACE253768387C57471BE55F051ABC File name: temRoot\System32\Config\DEFAULT;Sequence: 831;Version: 1.5;Length: 499712; Cluster: 1;Modification: 12/20/2011 04:50:23;Last Printed: 12/25/2011 06:50:23
From the report html file, we find the format below. Notice this version displays the html code of the BR tag to indicate a line break.
Metadata: Width: 3296 < BR > Height: 2472 < BR > Orientation: 1 < BR > Software: OLYMUS CAMERA MODEL < BR > Equipment make: YOUR CAMERA COMPANY < BR > Model: THE MODEL < BR > Maker note: (12728 bytes) < BR > Keywords: any words in the metadata < BR > Date Original: 2011:03:20 13:27:26 < BR > Date digitized: 2011:03:20 13: 27:26 < BR > Thumbnail: true < BR > Focal length: 4.0 < BR > F number: 4.70
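As a rough sketch of what the report version does with a line like the one above, the Python below splits a Metadata: line on its BR tags and keeps only the requested items, each on its own Metadata: line. It is an illustration under assumptions, not the program's actual code: the function name filter_metadata_line is hypothetical, matching an item by its leading field name is an assumption, and the pattern tolerates spaces inside the tag only because the sample above shows them that way.

```python
import re

def filter_metadata_line(line, wanted):
    """Keep only the requested items from one 'Metadata:' report line."""
    prefix = "Metadata:"
    if not line.startswith(prefix):
        return [line]  # not a metadata line: pass it through untouched
    body = line[len(prefix):]
    # Split on the BR tag, allowing spaces inside it and either letter case
    items = [item.strip()
             for item in re.split(r"<\s*BR\s*/?\s*>", body, flags=re.IGNORECASE)]
    # Case sensitive match on the field name, as in the report processor
    kept = [item for item in items if any(item.startswith(w) for w in wanted)]
    return [prefix + " " + item for item in kept]

sample = ("Metadata: Width: 3296 < BR > Height: 2472 < BR > "
          "Keywords: any words in the metadata < BR > F number: 4.70")
for out_line in filter_metadata_line(sample, ["Keywords:", "F number:"]):
    print(out_line)
```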
Within this "Export List" metadata field, X-Ways usually separates the metadata into semi-colon (;) delimited items, so within the metadata you have multiple items. One of these items is what the user will probably be looking for. One usual item is the "Last Printed:" date of Office documents. If available, this "Last Printed:" date will be one of the semi-colon (;) delimited items within the metadata field. In other instances, there may not be any metadata at all, or the item being looked for is not part of the metadata extracted. These are the three possibilities. If you don't know what this is referring to, don't bother to read on.
On the command line, after the user provides the input filename, you are required to input a search string (or a text file, one item per line of the fields to search for). This search string is the name of the semi-colon delimited field within the metadata field which is the item to look for. For the purposes of further discussion we will use the "Last Printed:" field name which is sometimes part of the metadata of Office documents. Notice that the actual name of the field usually ends with a colon (:). This is how X-Ways seems to identify the item name.
SPECIAL CASE for not adding field title in output record
The Last Printed: field is displayed in the output record as:
The program will read each record within the input file. It then finds the last tab delimited field (which MUST be the metadata field). Within the metadata field, it then looks for the string(s) which the user has input, in this case "Last Printed:". The string searched for is case sensitive, so be aware of any anomalies that might exist in the X-Ways data record, especially the CaSe sensiTivity of the item being sought.
Once the string is located, the program assumes it is the semi-colon delimited field to extract. It outputs the first part of the record, up to the metadata field; then it outputs the subset of the metadata field which the user asked for; and finally it outputs the complete metadata field as it was originally in the X-Ways output record.
What we end up with is the searched for field, tab delimited, inserted just BEFORE the original metadata field.
This stand alone tabbed field is now properly formatted so that when the user imports the resulting output file into a spreadsheet, that field is easily identified and processed.
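The steps above can be sketched in Python as follows. This is a minimal illustration, not the program's source: the function name extract_subfields is hypothetical, and two details are assumptions, namely that the extracted column keeps the field name together with its value, and that a record without the requested item gets an empty column.

```python
def extract_subfields(record, wanted):
    """Insert each requested sub-field as its own tab column before the metadata."""
    fields = record.rstrip("\r\n").split("\t")
    metadata = fields[-1]  # the metadata field MUST be the last column
    subfields = [s.strip() for s in metadata.split(";")]
    extracted = []
    for name in wanted:
        # Case sensitive search; empty column if the item is absent (assumption)
        match = next((s for s in subfields if s.startswith(name)), "")
        extracted.append(match)
    # Leading fields, then the new columns, then the untouched metadata field
    return "\t".join(fields[:-1] + extracted + [metadata])

record = ("\\WINDOWS\\system32\\config\tF67ACE253768387C57471BE55F051ABC\t"
          "Sequence: 831;Version: 1.5;Last Printed: 12/25/2011 06:50:23")
print(extract_subfields(record, ["Last Printed:"]))
```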
C:> x-ways_report_process.exe report_input.html file_containing_metadata_fields_to_look_for (preferred version)
C:> x_ways_meta_processing inputfilename.txt "String_to_search_for:" "Another_string_max_of_10:"
C:> x_ways_meta_processing inputfilename.txt "Last Printed:"
C:> x_ways_meta_processing inputfilename.txt "Last Printed:" 2> CR_error_filename
C:> x-ways_report_process.exe report_input.html "Last Printed:" "Keywords:"
Notice that all the search strings in the commands above terminate in a colon (:). This is because in my research, most if not all of the metadata field names within the X-Ways metadata column are identified by a colon terminator. It is not required, but seems to be the standard.
Each of these commands will attempt to locate the "String(s)_to_search_for" field within the metadata field, and extract it to another tab delimited field within a new output file.
Please note that when using the command line to identify the metadata field(s) you wish to locate, a maximum of ten metadata strings can be searched for per run of x_ways_meta_processing.exe.
For this reason, it is preferred that you use the text file which contains your strings. This makes the list easily modified and reusable.
A sample text file might contain
Sample string(s) file
Last Saved By:
See the full list of items I have found.
The redirection 2> to the CR_error_filename, (only used in the "Export List" processing) finds and lists those records in the input file which contain embedded carriage returns in the metadata field, and changes the embedded carriage return to blanks. The result is that the data file can easily and cleanly be imported to the spreadsheet.
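The carriage return clean-up can be illustrated with the Python sketch below. It rests on an assumption that may not hold for every export: records end with a carriage return plus line feed pair, while an embedded carriage return stands alone. The function name clean_embedded_cr is hypothetical, and the list of affected records is written to the error stream so that a 2> redirection would capture it.

```python
import sys

def clean_embedded_cr(raw):
    """Replace lone carriage returns inside records with blanks.

    Returns the cleaned text and the 1-based numbers of the records changed.
    Assumes records end with CR+LF and embedded returns are a lone CR.
    """
    records = raw.split("\r\n")
    flagged = []
    cleaned = []
    for number, rec in enumerate(records, start=1):
        if "\r" in rec:
            flagged.append(number)
            rec = rec.replace("\r", " ")
        cleaned.append(rec)
    return "\r\n".join(cleaned), flagged

raw = "a\tb\tSubject: hi\rthere;From: x\r\nc\td\tFoo: 1\r\n"
text, flagged = clean_embedded_cr(raw)
for number in flagged:
    print("embedded carriage return in record", number, file=sys.stderr)
```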
There is a way to get the "report html process" version to create separate tagged Metadata: lines for each and every metadata item. It makes the reading of the report a lot cleaner and easier. If you wish to learn how to do this, give us a call: 770-242-6687 X 119.
None, but a weird way to search for items case insensitive.
The default is to search for the strings as case sensitive, so you had better get it correct.
However, if you call the program with ALL UPPERCASE characters (X_WAYS_META_PROCESSING), then the search is done case insensitive. This case insensitivity is NOT currently available in the report processor program.
Below are fields I have found in the metadata column of X-Ways. I have yet to add any email eml fields. The list is long. When using these in the program, if your research confirms what we have here, be sure to include the colon as part of the field name. That is usually the field delimiter. Also, do proper research to determine the case of the field you are searching for. Many programs arbitrarily alter the case. Notice some items below (see Content-type) have two versions.
attached to a shape:
Creator Host OS:
File format revision:
File history flags:
Last Opened By:
Last Saved By:
Moved to recycle bin:
Network share name:
(Original Filename: *** see below)
Target File Size:
(Volume Serial: *** see below)
Original Filename: use with metadata of $R... files
SPECIAL INSTRUCTIONS: READ CAREFULLY
The "Volume Serial:" number is the serial number given to the disk by Microsoft at the time of formatting. It is most easily seen when doing a "dir" of the drive. The response shows up as " Volume Serial Number is 1442-13FE". However, Microsoft stores the volume serial number in the boot record in little-endian fashion at displacement 72 (from 0). So if you are trying to confirm/find the serial number 1442-13FE at displacement 72, you would actually need to look for: FE134214 (without the dash). The link file internal record of the serial number is displayed as it is in the DIR command, so when looking at the raw (boot record) data, you need to convert to little-endian.
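The byte reversal described above can be checked with a short Python sketch. The function name serial_to_boot_record_hex is hypothetical; it only reorders the four bytes of the serial number as displayed by DIR.

```python
def serial_to_boot_record_hex(serial):
    """Convert a DIR style serial (e.g. 1442-13FE) to its little-endian hex bytes."""
    value = int(serial.replace("-", ""), 16)
    return value.to_bytes(4, "little").hex().upper()

print(serial_to_boot_record_hex("1442-13FE"))  # FE134214
```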
X-WAYS and $R.... MetaData for Original Filename:
X-Ways $R... (recycled files) and obtaining the Original Filename
When X-Ways exports the metadata of the $R files, it produces a "Moved to recycle bin" field like:
Moved to recycle bin: 2015/03/24 22:46:58.0 +0;C:\Users\DAN\Documents\Admin\Filename_whatever.pdf
Notice the actual original filename doesn't have the traditional (colon :) field delimiter or a unique field name before it.
It is merged with the Moved to... as a single field. The default operation of this program will not be able to parse the original filename because it is combined with the MOVED date.
In order to allow for correct parsing of the original filename into a field, we must do the following.
Look at the time offset which was used. In this case it is +0 followed by a semicolon delimiter. Assuming the entire file has the same +0; offset, we can change the +0; to reflect a correct field delimiter. Do the following.
Perform a search and replace with the following parameter (using the offset as the key).
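Find: +0;
Replace with: +0;Original Filename:
This will fix the fields so that now we have:
Moved to recycle bin: 2015/03/24 22:46:58.0 +0;Original Filename:C:\Users\DAN\Documents\Admin\Filename_whatever.pdf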
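For those who prefer a script to a text editor, the same search and replace can be sketched in Python. The function name tag_original_filename is hypothetical, and the sketch assumes, as the text does, that every record uses the same offset and that the offset string followed by a semicolon does not occur anywhere else in the data.

```python
def tag_original_filename(metadata, offset="+0"):
    """Give the filename after the time offset its own 'Original Filename:' name."""
    return metadata.replace(offset + ";", offset + ";Original Filename:")

line = (r"Moved to recycle bin: 2015/03/24 22:46:58.0 +0;"
        r"C:\Users\DAN\Documents\Admin\Filename_whatever.pdf")
print(tag_original_filename(line))
```

After this change, "Original Filename:" can be given to the program as a search string like any other field name.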
X-WAYS_ID_rename A sister program to take the X-Ways export list data and rename the exported files.
EML_PROCESS A sister program which can easily separate the header fields within eml files.
CSV2PIPE Is capable of removing embedded carriage returns from csv files.