You're viewing a comment by chris and its responses.

chris
August 23, 2010, 10:50

Very nice program! I'm finding that Google hates being scraped unless I put in a huge delay. I need to thoroughly sift through a single domain for tens of thousands of pages of data, and at this rate it's going to take weeks. Does anyone know where I can get or buy archived search data so I could sort through it locally, without the lag and the terms-of-service issues?

Comment Responses

Bob
October 19, 2010, 21:13

Get a web-crawling program and just crawl that domain. Skip Google.

I do this all the time, using a free program called WinHTTrack (on Windows; also available on other platforms). See

Aim HTTrack at the top page of the site and start crawling. It does a great job of grabbing everything that is linked.
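If you'd rather script the idea yourself, here is a minimal sketch of what a single-domain crawler does: a breadth-first walk from the top page that follows only links on the same host, with a delay between requests. This is not HTTrack's code. The names `crawl`, `fetch_url`, and `LinkExtractor` are hypothetical, pages are assumed to be UTF-8, and unlike HTTrack it keeps pages in memory instead of saving them to disk and does not check robots.txt.

```python
"""Minimal same-domain crawler sketch (hypothetical; not HTTrack itself)."""
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Default fetcher: download a page, assuming UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str,
          fetch: Callable[[str], str] = fetch_url,
          max_pages: int = 100,
          delay: float = 1.0) -> Dict[str, str]:
    """Breadth-first crawl restricted to start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen: Set[str] = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        if delay:
            time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

Because the fetcher is a parameter, you can swap in your own (for example one that writes each page to disk) and raise `max_pages` for a site with tens of thousands of pages.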
