
alessio Permalink
June 22, 2011, 12:47

Hello, you did a great job and your code really helped me. For the number of results I have a very dirty solution that works with the current Google version, and that someone with regexp skills might improve.
Just add this function (adapted from the previous _extract_info):

def _extract_total_results_num(self, soup):
        # Needs "import re" at the top of the module.
        # The div with id 'resultStats' holds the "About N results" text.
        div_ssb = soup.find('div', id='resultStats')

        if not div_ssb:
            self._maybe_raise(ParseError, "Div with number of results was not found on Google search page", soup)
            return 0

        txt = ''.join(div_ssb.findAll(text=True))
        txt = txt.replace(',', '')  # drop thousands separators
        matches = re.search(r'About (\d+) results', txt, re.U)
        if not matches:
            return 0
        # Return an int so every code path yields the same type.
        return int(matches.group(1))

And then modify the get_results function by substituting:

'total': MAX_VALUE}

with:

'total': self._extract_total_results_num(page)}
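
To try the regexp step on its own, without the rest of the library, here is a minimal sketch. The helper name parse_total_results and the sample strings are made up for illustration, and it assumes the English "About N results" wording, so it may break whenever Google changes that text:

```python
import re

# Hypothetical standalone helper, not part of the original library.
# Assumes English wording such as "About 1,230,000 results".
_RESULTS_RE = re.compile(r'(?:About\s+)?([\d,]+)\s+results?', re.U)

def parse_total_results(text):
    """Return the result count found in text as an int, or 0 if absent."""
    match = _RESULTS_RE.search(text)
    if not match:
        return 0
    # Strip thousands separators before converting.
    digits = match.group(1).replace(',', '')
    return int(digits) if digits else 0

print(parse_total_results('About 1,230,000 results (0.21 seconds)'))  # 1230000
print(parse_total_results('1 result'))                                # 1
print(parse_total_results('No stats here'))                           # 0
```

Capturing the digits in a group avoids slicing the match by fixed offsets, so the optional "About" prefix does not matter.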

Just let me know if it worked for someone else...
