Scrape Google from the command line

 Bash

This code is POC only -- actually using it would violate Google's TOS, which forbids scraping. It is published here for educational value only.

Hypothetically, the following command should return a list of the top 500 or so hits in Google for

The results will be prepended with digits, followed by a dot and some whitespace (Lynx adds these).
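If the leading numbers get in the way, they can be stripped afterwards. This is a minimal sketch, assuming each line looks like optional whitespace, digits, a dot, and whitespace before the URL; `strip_lynx_numbers` is a hypothetical helper name and the sample URLs are made up.

```shell
# Hypothetical helper: remove the "  12. " style prefix from each line on stdin.
strip_lynx_numbers() {
  sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]+//'
}

# Made-up sample lines imitating the numbered output described above.
printf '   1. http://example.com/a\n  12. http://example.org/b\n' | strip_lynx_numbers
```

Lines that do not start with such a prefix pass through unchanged.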

You must have Lynx and Wget installed on your system for this to work.
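One way to check for both tools up front is sketched here. `missing_tools` is a hypothetical helper, not part of the original snippet; it relies only on the shell's built-in `command -v`.

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Prints nothing when both tools are installed.
missing_tools lynx wget
```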

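Keep in mind that *nix shells expand $variables inside double quotes before Perl ever sees them, so on those systems the argument to perl -e needs single quotes rather than the double quotes used in the command below; see the comments.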
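The quoting caveat can be seen without running Perl at all. The sketch below prints the program text exactly as the shell would hand it over; `show_arg` is a hypothetical helper, and the short loop is a stand-in rather than the real command.

```shell
# Give i an empty value; an unset i expands the same way in a fresh terminal.
i=

# Hypothetical helper: print the first argument exactly as received,
# which is what perl -e would receive as its program text.
show_arg() {
  printf '%s\n' "$1"
}

# Double quotes: the shell replaces every $i with an empty string first.
show_arg "$i=0;while($i<3){print $i++}"
# prints: =0;while(<3){print ++}

# Single quotes: the text arrives untouched.
show_arg '$i=0;while($i<3){print $i++}'
# prints: $i=0;while($i<3){print $i++}
```

The mangled first form starts with a bare `=`, which is consistent with a Perl syntax error near "=".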

  perl -e "$i=0;while($i<1000){sleep 1; open(WGET,qq/|xargs lynx -dump/);printf WGET qq{$i&sa=N},$i+=10}" | grep "\/\/[^/]*\/"

Posted By: hemanthhm on January 11, 2009

syntax error at -e line 1, near "=" Unterminated operator at -e line 1.


Posted By: noah on June 11, 2009

@hemanthhm I don't know what to tell you -- it works fine for me, I just double-checked.

Posted By: knshetty on July 19, 2009

If a Perl one-liner wrapped in double quotes (i.e. perl -e ".....") produces a syntax error, try single quotes instead -> perl -e '.....' Applying that pattern to the script at hand, we get -> perl -e '$i=0;while($i

Posted By: noah on September 29, 2009

I think I finally get what the problem was here: in a *nix shell, double quotes let the shell expand $i before Perl ever sees it, so the argument to perl -e has to go in single quotes, while the DOS/Windows shell only treats double quotes as quoting. So you have to be aware of which platform you are on and wrap the argument to perl -e in the appropriate type of quotes.

Posted By: scraper on November 21, 2009

Thanks for the nice Perl command. That for sure is one more proof that Perl is a spaghetti language, but a powerful one :-)

