Posted By: noah on 12/09/07


search google results commandline iterator perl wget metrics aggregator lynx scraping analysis one-liners


Scrape Google from the command line

Published in: Bash

This code is POC only -- actually using it would violate Google's TOS, which forbids scraping. It is published here for educational value only.

Hypothetically, the following command should return a list of the top 500 or so hits in Google for

Each result will be prefixed with a number, a dot, and some whitespace (Lynx adds these).

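You must have Perl and Lynx installed on your system for this to work. Despite the WGET filehandle name, the command as shown pipes each URL through xargs to Lynx and never calls wget.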

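Mind the quoting. In a *nix shell, wrap the argument to perl -e in single quotes: inside double quotes the shell expands $i itself before Perl ever sees it. The Windows cmd.exe shell needs double quotes instead. See the comments.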

  perl -e '$i=0;while($i<1000){sleep 1; open(WGET,qq/|xargs lynx -dump/);printf WGET qq{$i&sa=N},$i+=10}' | grep '//[^/]*/'
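
The numbered prefixes Lynx adds to each result can be stripped after the grep. Below is a minimal sketch, assuming GNU or BSD sed (both accept -E) and assuming the reference lines look like "   1. http://host/path". The helper name strip_lynx_refs and the sample lines are made up for illustration; they are not real search output.

```shell
#!/usr/bin/env bash
# strip_lynx_refs is a hypothetical helper, not part of the original snippet.
# It reads lynx -dump style text on stdin, keeps only lines containing
# "//host/" (the same test as the one-liner's grep), and removes the
# leading "NN. " reference number that Lynx adds.
strip_lynx_refs() {
  grep '//[^/]*/' | sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]+//'
}

# Made-up sample input in the assumed lynx -dump reference format.
printf '%s\n' \
  '   1. http://www.example.com/' \
  'Some page text without a link' \
  '  12. http://www.example.org/docs/index.html' \
  | strip_lynx_refs
# prints:
# http://www.example.com/
# http://www.example.org/docs/index.html
```

Appending "| strip_lynx_refs" (or the grep and sed pair directly) to the end of the one-liner would leave bare URLs, one per line.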



Posted By: hemanthhm on January 11, 2009

syntax error at -e line 1, near "=" Unterminated operator at -e line 1.


Posted By: noah on June 11, 2009

@hemanthhm I don't know what to tell you -- it works fine for me, I just double-checked.

Posted By: knshetty on July 19, 2009

For some reason, if a Perl script wrapped in double quotes (i.e. perl -e ".....") produces a syntax error, try the single-quoted alternative instead: perl -e '.....'. Applying that pattern to the script at hand, we get: perl -e '$i=0;while($i

Posted By: noah on September 29, 2009

I think I finally get what the problem was here: the *nix shell uses single quotes, while the DOS/Windows shell uses double quotes. So you have to be aware of which platform you are on and wrap the argument to perl -e in the appropriate type of quotes.
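
To see why the double-quoted form fails in Bash, here is a small sketch that prints the text each quoting style actually delivers. The function show_arg is hypothetical, and printf stands in for perl -e so nothing is executed. With double quotes, Bash substitutes its own (here unset, so empty) $i, which means Perl would receive =0;while(<1000)... and that fits the syntax error near "=" reported above.

```shell
#!/usr/bin/env bash
# Hypothetical demo: show_arg just echoes its first argument, standing in
# for `perl -e` so we can see exactly what the program would receive.
unset i   # make sure the shell has no variable named i

show_arg() { printf '%s\n' "$1"; }

# Double quotes: Bash expands $i (empty here) before the program sees it.
show_arg "$i=0;while($i<1000){$i+=10}"
# prints: =0;while(<1000){+=10}

# Single quotes: the text reaches the program untouched.
show_arg '$i=0;while($i<1000){$i+=10}'
# prints: $i=0;while($i<1000){$i+=10}
```

The one-liner itself sidesteps nested quotes by using Perl's qq/.../ and qq{...} operators, which is why a single pair of outer single quotes is enough in Bash.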

Posted By: scraper on November 21, 2009

Thanks for the nice Perl command. It is surely more proof that Perl is a spaghetti language, but a powerful one :-)

While this perl/lynx code will get results, it won't work very well.

I recently stumbled upon an article called "Scraping Google for Fun and Profit" that goes much deeper into the subject. It shows how to scrape not just a few hundred but millions of hits from Google. Free PHP code is included, covering filtering of advertisements and parsing the data (title, description, host, URL, etc.) into an array.

Works for web and console.

Here is the article, hope you like it:
