
Chris S Permalink
February 17, 2011, 23:35

Hello,
I ran into problems calculating the number of search results (num_results) returned by Google after they changed their page layout and formatting. Here is one solution I found: I edited the regular expression to accept the new format and made a few other small tweaks.

 
def _extract_info(self, soup):
    empty_info = {'from': 0, 'to': 0, 'total': 0}
    div_results = soup.find('div', id='resultStats')  # the id of the stats div has changed
    if not div_results:
        self._maybe_raise(ParseError, "Div with number of results was not found on Google search page", soup)
        return empty_info

    txt = ''.join(div_results.findAll(text=True))
    txt = txt.replace(',', '')  # Remove thousands separators
    txt = txt.strip()           # Remove surrounding whitespace, including line breaks

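    # New format: About XXXXX results  (x.xx seconds)
    # self._re_search_strings must be a 3-tuple filling the three %s slots
    # (the words for "About", "results" and "seconds" in the format above).
    matches = re.search(r'%s (\d+) %s\s+\((\d+\.\d+) %s\)' % self._re_search_strings, txt, re.U)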

    if not matches:
        return empty_info
    return {'total': int(matches.group(1)), 'time': float(matches.group(2))} 

I have only tested the above code using a few queries but it appears to return the correct results. I hope this will be of help to someone.
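
For anyone who wants to try the regular expression without hitting Google, here is a minimal standalone sketch of the same parsing step. The names RE_SEARCH_STRINGS and parse_result_stats are made up for this example, and the three strings are only my assumption about what self._re_search_strings holds for English results pages.

```python
import re

# Assumed stand-in for self._re_search_strings (English pages only).
RE_SEARCH_STRINGS = ("About", "results", "seconds")

# Same pattern as in _extract_info, compiled once.
STATS_RE = re.compile(
    r'%s (\d+) %s\s+\((\d+\.\d+) %s\)' % RE_SEARCH_STRINGS, re.U)


def parse_result_stats(txt):
    """Return {'total': int, 'time': float}, or None if txt does not match."""
    txt = txt.replace(',', '').strip()  # drop thousands separators and whitespace
    matches = STATS_RE.search(txt)
    if not matches:
        return None
    return {'total': int(matches.group(1)), 'time': float(matches.group(2))}


print(parse_result_stats("About 1,230,000 results  (0.21 seconds) \n"))
print(parse_result_stats("No results found"))
```

The first call should give a total of 1230000 and a time of 0.21, and the second should give None. Note that the pattern only matches counts that start with the "About" word, so any other wording falls through to the no-match case.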

Chris
