I needed to search Google for things, so I wrote a little tool in Python to do it for me. This tool lets you search Google for a search string. You can then filter the results further by performing a regex comparison on the description, the URL, or both. It uses Google’s web interface and parses the results; it does NOT use the Google API. It works with Google’s current interface.
Usage is as follows:
–
usage: googleFinder.py [-h] [-o OUTFILE] [-cookie COOKIE] [-uG URL_GREP]
[-dG DESCRIPTION_GREP] [--version]
Start_Result End_Result Search_Terms
goolgeFinder v1.0. Command Google.
positional arguments:
Start_Result Starting Search Result
End_Result Ending Search Result
Search_Terms Terms of your google search
optional arguments:
-h, --help show this help message and exit
-o OUTFILE Specify a file to output to.
-cookie COOKIE If google has blocked traffic capture an authenticated
cookie by solving the google captcha and enter it
here.
-uG URL_GREP Regex for URL matching.
-dG DESCRIPTION_GREP Regex for Description body matching
--version show program’s version number and exit
–
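For anyone curious how a command line like this can be declared, here is a minimal argparse sketch that accepts the same options. The option names and help strings are copied from the help text above; the `dest` names, the integer type on the two result numbers, and the `build_parser` function are my own assumptions and not necessarily how googleFinder.py is written.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options shown in the help text.

    Hypothetical reconstruction: only the flag names and help strings come
    from the usage output; dest names and types are assumptions.
    """
    parser = argparse.ArgumentParser(prog="googleFinder.py")
    parser.add_argument("-o", dest="outfile", metavar="OUTFILE",
                        help="Specify a file to output to.")
    parser.add_argument("-cookie", dest="cookie", metavar="COOKIE",
                        help="Authenticated cookie captured after solving the captcha.")
    parser.add_argument("-uG", dest="url_grep", metavar="URL_GREP",
                        help="Regex for URL matching.")
    parser.add_argument("-dG", dest="description_grep", metavar="DESCRIPTION_GREP",
                        help="Regex for Description body matching")
    parser.add_argument("--version", action="version", version="%(prog)s v1.0")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("Start_Result", type=int, help="Starting Search Result")
    parser.add_argument("End_Result", type=int, help="Ending Search Result")
    parser.add_argument("Search_Terms", help="Terms of your google search")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv so the sketch runs as-is.
    args = build_parser().parse_args(["-dG", "sale", "1", "500", "Powered by oSCommerce"])
    print(args.Start_Result, args.End_Result, args.Search_Terms, args.description_grep)
```

Note that argparse allows multi-letter options with a single dash, such as -cookie, -uG, and -dG, as long as they are given exactly as declared.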
Examples of usage:
Search results 1-500 for the string “Powered by oSCommerce”
googleFinder.py 1 500 "Powered by oSCommerce"
Filter the above results such that the description contains the word “sale”
googleFinder.py -dG sale 1 500 "Powered by oSCommerce"
Filter the above results such that the actual URL matches “(p|P)roduct-info.php”
googleFinder.py -uG "(p|P)roduct-info.php" -dG sale 1 500 "Powered by oSCommerce"
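To make the filtering idea concrete, here is a rough sketch of how results could be narrowed by a URL regex, a description regex, or both. It assumes results are already parsed into (url, description) pairs and that both filters must match when both are given; the `filter_results` function and the example.com URLs are made up for illustration and are not taken from googleFinder.py.

```python
import re


def filter_results(results, url_grep=None, description_grep=None):
    """Keep the (url, description) pairs that match the given regexes.

    A filter left as None is skipped. When both filters are given, a result
    must match both to be kept (an assumption about the intended behavior).
    """
    url_re = re.compile(url_grep) if url_grep else None
    desc_re = re.compile(description_grep) if description_grep else None
    kept = []
    for url, description in results:
        if url_re and not url_re.search(url):
            continue
        if desc_re and not desc_re.search(description):
            continue
        kept.append((url, description))
    return kept


if __name__ == "__main__":
    # Made-up sample data standing in for parsed search results.
    sample = [
        ("http://shop.example.com/product-info.php?id=1", "Parts for sale"),
        ("http://shop.example.com/Product-info.php?id=2", "New arrivals"),
        ("http://shop.example.com/index.php", "Big sale today"),
    ]
    for url, description in filter_results(sample, "(p|P)roduct-info.php", "sale"):
        print(url, "-", description)
```

One detail worth knowing: the unescaped dot in “(p|P)roduct-info.php” matches any character, so write `\.` if you only want a literal dot.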
By default, googleFinder gets a new Google cookie every few rounds to avoid being CAPTCHAed; however, this only works for so long before the IP is CAPTCHAed. Once this occurs, you must solve the captcha. After the captcha is solved, future searches that carry the solved captcha cookie will not be captchaed (from my trials). A captcha cookie can be obtained by clearing your cookies, solving the captcha, and then grabbing the new cookie, either by monitoring network traffic or with the Firefox “Cookie Editor” plugin. Once the cookie is obtained, insert it as follows:
googleFinder.py -cookie "PREF=ID=63393ff579a63fa6:FF=0:TM=1297657258:LM=1297657258:S=fgmi3-uDGra9MDkk; expires=Wed, 13-Feb-2013 04:20:58 GMT; path=/; domain=.google.com; NID=44=LqLWYcrVDW1rw89614a-ZOcvFliG2We_vDJd7ebERHQCz8i6cZxAumats9BCDY8tC19kL5WaFY1DE2jdgnp4wRViGGryl_rfGrsipdK-IeBrEK97uP9XjrJu526LKcbs; expires=Tue, 16-Aug-2011 04:20:58 GMT; path=/; domain=.google.com; HttpOnly" -uG "(p|P)roduct-info.php" -dG sale 1 500 "Powered by oSCommerce"
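The cookie string above is in the form a server sends it, with attributes such as expires, path, and domain mixed in, while a request's Cookie header carries only the name=value pairs. The post does not say whether googleFinder strips those attributes itself, so here is a small sketch of how that reduction could be done. The `to_cookie_header` helper and the shortened sample values are hypothetical.

```python
# Attribute names that can appear in a Set-Cookie string but are not cookies themselves.
COOKIE_ATTRIBUTES = {"expires", "path", "domain", "max-age", "secure", "httponly", "samesite"}


def to_cookie_header(raw):
    """Reduce a Set-Cookie style string to 'name=value; name=value' form."""
    pairs = []
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue
        # Compare only the text before the first '=' so values that contain
        # '=' themselves (like PREF=ID=...) stay intact.
        name = part.split("=", 1)[0].strip().lower()
        if name in COOKIE_ATTRIBUTES:
            continue
        pairs.append(part)
    return "; ".join(pairs)


if __name__ == "__main__":
    # Shortened, made-up values in the same shape as the cookie shown above.
    raw = ("PREF=ID=abc:FF=0; expires=Wed, 13-Feb-2013 04:20:58 GMT; path=/; "
           "domain=.google.com; NID=44=xyz; expires=Tue, 16-Aug-2011 04:20:58 GMT; "
           "path=/; domain=.google.com; HttpOnly")
    print(to_cookie_header(raw))
```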
To write the results to an output file, add -o FILENAME directly after googleFinder.py.
I hope you guys enjoy this and find it useful. I plan to release a multithreaded version soon, which should bring a significant speedup. For now, just run the script in several screens.
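If you go the several-screens route, you need to hand each copy its own slice of the results. Here is one possible way to split an inclusive range like 1-500 into even chunks and print a command for each. The `split_range` helper and the output file names are my own invention for illustration; only the command-line shape comes from the examples above.

```python
def split_range(start, end, parts):
    """Split the inclusive range start..end into `parts` contiguous chunks.

    Chunk sizes differ by at most one; if there are fewer results than
    parts, one chunk per result is returned.
    """
    total = end - start + 1
    if parts < 1 or total < 1:
        raise ValueError("need at least one part and a non-empty range")
    parts = min(parts, total)
    base, extra = divmod(total, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks absorb the remainder, one result each.
        size = base + (1 if i < extra else 0)
        chunks.append((lo, lo + size - 1))
        lo += size
    return chunks


if __name__ == "__main__":
    # One command per screen session; the output file names are hypothetical.
    for lo, hi in split_range(1, 500, 4):
        print(f'googleFinder.py -o results_{lo}_{hi}.txt {lo} {hi} "Powered by oSCommerce"')
```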