You're replying to a comment by chad.

May 05, 2009, 01:32

The code was just a snippet of some other code I have, but in regards to your points:

1. I see only 3 points where errors can creep in:
1- if the urlopen fails,
2- during the htmlparser() if the HTML is super-malformed (same w/ BeautifulSoup), or
3- if Google changes their HTML format (but that will screw up almost any scraper).
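
For what it's worth, the first point is easy to guard against. Here is a minimal sketch, assuming Python 3's urllib.request rather than whatever the original snippet imported; fetch_html is a made-up name for illustration, not something from the post:

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper, not part of the original snippet.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (URLError, OSError, ValueError):
        # URLError covers HTTP and network errors, OSError covers
        # timeouts, and ValueError covers strings that aren't URLs.
        return None
```

The caller then only has to check for None before handing the bytes to the parser.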

The xpath and the rest of the code will work without problem, since xpath() returns an empty list ('[]') when the expression matches nothing.
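
A rough sketch of that behaviour, assuming lxml is installed. The XPath expression, the sample markup and the name extract_links are all placeholders I made up, not Google's real result markup or the code from the post:

```python
import lxml.html

# Deliberately sloppy markup: the <a> and the <div> are never closed.
MALFORMED = "<html><body><h3><a href='http://example.com/'>first hit</h3><div>unclosed"


def extract_links(html, expression="//h3/a/@href"):
    """Parse the HTML leniently and return whatever the XPath matches.

    extract_links and the default expression are hypothetical.
    """
    tree = lxml.html.fromstring(html)
    # xpath() returns a list; it is simply empty when nothing matches.
    return tree.xpath(expression)


links = extract_links(MALFORMED)
no_links = extract_links("<p>no results here</p>")
```

So a format change on the page shows up as an empty list instead of an exception, which the calling code can test for.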

4. If you change the line:
then lxml handles malformed html almost as well as BeautifulSoup.

Anyways, I enjoy your blog, and just thought that I'd throw that out there.
