You're viewing a comment by Chris S and its responses.


Hello,
I was running into issues calculating the number of search results (num_results) returned by Google, because Google changed its layout and formatting. The following is one solution I found to the problem. I have edited the regular expression to accept the new format, along with a few other small tweaks.

    def _extract_info(self, soup):
        empty_info = {'from': 0, 'to': 0, 'total': 0}
        # The div holding the result count now has the id 'resultStats'
        div_results = soup.find('div', id='resultStats')
        if not div_results:
            self._maybe_raise(ParseError, "Div with number of results was not found on Google search page", soup)
            return empty_info
        txt = ''.join(div_results.findAll(text=True))
        txt = txt.replace(',', '')  # remove thousands separators
        txt = txt.strip()           # remove surrounding whitespace, including line breaks
        # New format: About XXXXX results (x.xx seconds)
        # Assumes self._re_search_strings holds the three words of that format,
        # e.g. ('About', 'results', 'seconds') for English.
        matches = re.search(r'%s (\d+) %s\s+\((\d+\.\d+) %s\)' % self._re_search_strings, txt, re.U)
        if not matches:
            return empty_info
        return {'total': int(matches.group(1)), 'time': float(matches.group(2))}

I have only tested the above code with a few queries, but it appears to return the correct results. I hope this helps someone.
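To try the new pattern outside the xgoogle search class, here is a minimal standalone sketch. It assumes the three search strings are ('About', 'results', 'seconds'), as in English result pages; RE_SEARCH_STRINGS and parse_result_stats are hypothetical names used only for illustration, and no BeautifulSoup parsing is done here, just the text of the results div.

```python
import re

# Hypothetical stand-in for self._re_search_strings (assumed English wording).
RE_SEARCH_STRINGS = ('About', 'results', 'seconds')

# Same pattern as in _extract_info: "About N results (T seconds)".
PATTERN = re.compile(r'%s (\d+) %s\s+\((\d+\.\d+) %s\)' % RE_SEARCH_STRINGS, re.U)


def parse_result_stats(txt):
    """Parse the 'About N results (T seconds)' text into total and time."""
    txt = txt.replace(',', '').strip()  # drop thousands separators and surrounding whitespace
    matches = PATTERN.search(txt)
    if not matches:
        # Same keys on both paths, so callers can always read 'total' and 'time'.
        return {'total': 0, 'time': 0.0}
    return {'total': int(matches.group(1)), 'time': float(matches.group(2))}


print(parse_result_stats('About 1,230,000 results (0.25 seconds)\n'))
# → {'total': 1230000, 'time': 0.25}
```

Because Python 3 string patterns are Unicode-aware, `\s+` also matches a non-breaking space between "results" and the opening parenthesis, in case the page uses `&nbsp;` there.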
Chris