URL_SRCH

PURPOSE   OPERATION   OPTIONS   COMMAND LINES  


Author: Dan Mares, dmares @ maresware . com (you will be asked for e-mail address confirmation)
Portions Copyright © 2005-2016 by Dan Mares and Mares and Company, LLC
Phone: 678-427-3275
Last update: March 28, 2017

Get url_srch.exe

This is a command line program.
It MUST be run within a command window, as administrator.


PURPOSE

Search files on a drive to determine whether they contain any of the following indicators:

IP addresses (IPV4 and IPV6)
E-mail addresses
Web site URLs
U.S. phone numbers
U.S. Social Security Numbers
Credit card numbers of bank issuance (not gas or store cards)

The program takes command line options to define the path and file types to search. Once it determines the files to search, it proceeds to open and examine the contents of each file for indications of any of the items above.

This program has been found to be extremely useful in finding these items in exported freespace and unused space files. Export those items from FTK, and run this program against them to determine the existence of the targets.



OPERATION

The program opens each file identified by the command line parameters and proceeds to identify the items (IPs, e-mail addresses, URLs, phone numbers, SSNs, credit card numbers).

The program is set to identify IP addresses (IPV4 and IPV6), e-mail addresses, URLs, U.S. phone numbers, U.S. Social Security Numbers, and bank-issued credit card numbers.

Every effort has been made to eliminate false hits: IP addresses, e-mails, URLs, phone and SSN numbers that would not normally be of an acceptable range or format. However, it is always better to report some incorrect formats than to miss a meaningful item.
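As a rough illustration of the kinds of patterns involved, the sketch below matches a few of the indicator types with simplified regular expressions. These patterns are assumptions for illustration only; url_srch's internal matching rules are more elaborate and apply the range and format checks described above.

```python
# Simplified candidate patterns for a few indicator types.
# These are illustrative sketches, NOT url_srch's internal rules.
import re

patterns = {
    "ipv4":  re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

text = "Contact bob@example.com at 192.168.1.10, SSN 123-45-6789."
for name, pat in patterns.items():
    # findall returns every non-overlapping hit of that indicator type
    print(name, pat.findall(text))
```

Note that a naive IPV4 pattern like this one also accepts out-of-range octets such as 999.999.999.999, which is exactly the kind of false hit the program tries to screen out.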

IPV6 formats
The basic IPV6 format is 8 sets of 4 hex digits (e.g. ABCD) separated by colons (:):
abcd:1234:5678:90ef:abcd:1234:5678:90ef
However, there are so many possible exceptions to this format that any one of them could be missed by this program.
The user should research the standard, become familiar with the exceptions, and determine whether an exception appears in their search data.
Possible research:
Reference 1
ipv6.com
Oracle reference
Note the special instance of bypassing (compressing) 0000:0000 segments with a double colon (::).
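The zero-compression exception mentioned above can be demonstrated with Python's standard ipaddress module: compressed forms such as 2001:db8::1 expand back to the full 8-group notation of the basic format. This is only an illustration of the notation; it is not part of url_srch.

```python
# Sketch: expanding compressed IPV6 notation with Python's stdlib.
import ipaddress

addrs = [
    "abcd:1234:5678:90ef:abcd:1234:5678:90ef",  # full 8-group form
    "2001:db8::1",   # :: compresses a run of all-zero groups
    "::1",           # loopback, maximal compression
]

for a in addrs:
    # .exploded gives the canonical 8 groups of 4 hex digits
    print(a, "->", ipaddress.IPv6Address(a).exploded)
```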

Some users have a single list of credit card numbers (i.e. just numbers, no other text on each line). Because of an anomaly (not a bug) in the logic, this format, with a single item on the line, will not obtain correct answers. If you have credit card numbers with only one item per line, the best way to obtain correct processing is to add about 10 blank spaces, or some dummy text, to each line. Instead of this:
1234567890123
do something like this:
1234567890123 add any text here
This added text on each line will obtain correct processing.
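The padding workaround above can be scripted rather than done by hand. A minimal sketch (the file names and filler text are examples only):

```python
# Sketch: pad a one-number-per-line card list so url_srch processes it
# correctly, per the workaround described above.
def pad_line(line, filler="add any text here"):
    """Append dummy text after a bare card number; drop blank lines."""
    number = line.strip()
    return f"{number} {filler}\n" if number else ""

# Typical use, assuming an input file named ccards.txt:
# with open("ccards.txt") as src, open("ccards_padded.txt", "w") as dst:
#     for line in src:
#         dst.write(pad_line(line))

print(pad_line("1234567890123"))
```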

The international country URL suffixes have generally been covered and accounted for as best as possible. The following usual list is currently accounted for:

.com.mx
.com
.edu
.net.nz
.net
.org.uk
.org.nz
.org
.gov
.biz
.info
.us
.co.uk
.co.nz
.me.uk
.name



OPTIONS

-?:  Get a help screen.

-p + path(s):  If more than one directory needs to be looked at, add the paths here as appropriate. (-p c:\windows    d:\work)   [PATH=path]

-f + filespec:  If more than one file type is needed, add them here. (-f   *.c   *.obj   *.dll)   [FILES=filetype]

If these options are used, the program builds a matrix of paths and file types. It searches all the requested directories for all the requested file types, thus giving a total of all the files in all the paths requested. These options are added to any default command line provided. (C:\>url_srch c:\work\*.c -f *.dll -p d:\windows)

-r:  DO NOT recurse through path provided. Default is recurse through path (-p option).

-x + filespec:  e(x)clude these file types from listing. Maximum of 100 file types accepted. (same format as -f option) (-x thesefiles.txt)

-oO + filename:  Output file name. Place the output in a file. If uppercase ‘O’ is used, output is appended to an existing output file.

-U:  Also search for Unicode type hits. If this is not chosen, only 8-bit ASCII values are looked at, which means you might miss a lot of Unicode hits. This option slows down processing.

-[euiPSC6]:    Select ONLY (e)mails, (u)rls, (i)ps, (P)hone numbers, (S)SN's, (C)redit card numbers, (6)IPV6 values. Only those items meeting the criteria are found. The default is to find all items except credit card numbers.
NOTE: IPV6 in unicode files is not yet implemented

-m #[CLR]:  Where # is the new maximum length of the output line containing the hit. If -m is used, the hit string is enclosed within « and » (decimal 174, 175). After the number you can place a 'C', 'L' or 'R' (i.e. -m 90L) to tell the program where to place the hit string within the line: (C)enter, (L)eft or (R)ight. 'C' is the default.

-d + #:  Where # is the ASCII value of the delimiter to use between fields. The default delimiter is the pipe (|), ASCII decimal 124. If the value is a single digit, it must be preceded by a 0. (-d 02) -d is only available with a -m or -w option.

-Ww [#]:  Print single-line wide output for input into a database. The -d (delimiter) option is encouraged at this point. If -W is used, the output file header is not inserted; this is better for import into databases. Replace # with the maximum path length to print (the # for path size is optional). -w is the default; to turn it off use -w0.
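The single-line, delimited output that -W produces can be read back with any delimited-text reader. A sketch, assuming a pipe delimiter and made-up sample rows (the real column layout depends on the options used):

```python
# Sketch: reading -W style pipe-delimited output back into rows.
# The sample text stands in for a url_srch output file; the columns
# shown here are invented for illustration.
import csv
import io

sample = ("c:\\work\\file1.txt|EMAIL|bob@example.com\n"
          "c:\\work\\file2.txt|IP|10.0.0.5\n")

for row in csv.reader(io.StringIO(sample), delimiter="|"):
    print(row)
```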

-D [#]: begin processing files this many bytes in from beginning.

-E [#]: end processing at this location in file.

-1 + logfile:  Create a logfile of the operation.

-R:  Reset access date to original date before operation

-g + #    Where the # is replaced by a number indicating: list all files ‘g’reater than # days old. You can use a -gl pair to bracket file ages. [OLDER]=50

-l + #    (ell, not one) Where the # is replaced by a number indicating: list all files ‘l’ess than # days old. You can use a -gl pair to bracket file ages. To get today's files, use (-l 1) [NEWER]=10

-g + mm-dd-yyyy[acw]:  Process only those files (g)reater (older) than this mm-dd-yyyy date. The date MUST be in the form mm-dd-yyyy. It MUST have a two digit month and day (leading 0 if necessary), and it MUST have a 4 digit year. The date calculation is made as of midnight on the date given for the -g option. For this reason, the day provided is NOT included in the calculation. I.e. if you entered -g 01-01-2006 you would only process files dated PRIOR to 1/1/2006; this means all of 2005 and before. The optional [acw] suffix selects which file date is tested: (a)ccess, (c)reate or (w)rite.

-l + mm-dd-yyyy[acw]:  (that's an ell, not a one) Process only those files (l)ess than (newer than) this mm-dd-yyyy date. The same format rules apply. The date calculation is made as of midnight on the date given for the -l option. For this reason, the day provided IS included in the calculation. I.e. if you entered -l 01-01-2006 you would process all of 2006 to the current date.
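The -g/-l date bracketing described above can be sketched as follows. This is an illustration of the documented semantics, not url_srch's actual code:

```python
# Sketch of the -g/-l date-bracket logic:
# -g mm-dd-yyyy keeps files strictly BEFORE midnight at the start of
#   that day (the given day is NOT included);
# -l mm-dd-yyyy keeps files from that midnight onward (the given day
#   IS included).
from datetime import datetime

def selected(file_time, g_date=None, l_date=None):
    fmt = "%m-%d-%Y"
    if g_date and not file_time < datetime.strptime(g_date, fmt):
        return False   # -g: the given day itself is excluded
    if l_date and not file_time >= datetime.strptime(l_date, fmt):
        return False   # -l: the given day is included
    return True

print(selected(datetime(2005, 12, 31), g_date="01-01-2006"))  # all of 2005 passes
print(selected(datetime(2006, 1, 1),  g_date="01-01-2006"))   # the given day fails
print(selected(datetime(2006, 1, 1),  l_date="01-01-2006"))   # the given day passes
```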

-L + #:  Where the # is replaced by a number indicating: list all files less than # bytes in size. (-L 100000) [LESSTHAN=xxx]

-G + #:  Where the # is replaced by a number indicating: list all files greater than # bytes in size. You can use a -GL pair to bracket file sizes. (-G 10000) (-G 10000 -L 100000) [GREATER]=10000

--email=textfile:    textfile contains a list, one per line, of the email addresses to look for. This restricts the output of the email searches to ONLY those emails listed in this text file. The file can contain a single domain to get all emails for that domain, i.e. @gmail.com or @yahoo.com will get all gmail and yahoo emails.

--urls=textfile:    textfile contains a list, one per line, of the urls to look for. The format should be abc.com. This restricts the output of the URL searches to ONLY those listed in this text file. Do not include the http: unless you feel it is absolutely necessary. Sample: dmares.com, nist.gov

--ips=textfile:    textfile contains a list, one per line, of the IPs to look for. The format should be n[nn].n[nn].n[nn].n[nn]: 4 octets, but each octet doesn't have to be 3 digits. This restricts the output of the IP searches to ONLY those listed in this text file. Sample: 123.45.90.1, 69.89.12.222
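The way a restriction list narrows hits can be sketched as follows, including the domain-only form allowed for --email. The data is hypothetical; url_srch performs this filtering internally:

```python
# Sketch: how a restriction list like --email=textfile narrows hits.
# "restrict" stands in for the lines of the textfile (hypothetical data).
restrict = {"@gmail.com", "bob@example.org"}

def keep(email):
    """Keep a hit only if it matches an exact entry or a @domain entry."""
    email = email.lower()
    return any(
        email == r or (r.startswith("@") and email.endswith(r))
        for r in restrict
    )

print(keep("alice@gmail.com"))   # kept: matches the @gmail.com domain entry
print(keep("bob@example.org"))   # kept: exact match
print(keep("carol@yahoo.com"))   # dropped: not listed
```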

--split[=xx]:  Split the output into files of this many records each. The -v option is turned on to eliminate headers. If no count is chosen for the split, 5,000 is the default. Examples: --split, --split=5000



COMMAND LINES

Command lines can take one of three formats:

DO NOT use the full filename/path as the -p or -f option.
If using the -p option, include only the path here, and
with the -f option, only place the filename, without paths.

URL_SRCH
URL_SRCH -p  d:\path  -o  c:\tmp\IP_output   -i6          -w -m 200 -d "|" 
URL_SRCH -p  d:\path  -o  c:\tmp\output                   -w -m 200 -d "|" 
URL_SRCH -p  d:\path  -f  ccards.txt -o  c:\tmp\output    -w -m 200 -d "|" 
URL_SRCH -p  d:\path  -o  c:\tmp\output -C                -w -m 200 -d "|" 
URL_SRCH -p  d:\path  -o  c:\tmp\output -U                -w -m 200 -d "|"   (add UNICODE to the search) 

