Why Does the Google Search API Suck?

First attempts

Ruby doesn’t have CPAN

I spent quite a bit of time Friday evening and this morning trying to get a simple Ruby script to pull search data from Google’s crappy search API. There is a standard library for SOAP/WSDL stuff for Ruby here. It is, like so many other potentially good Ruby libraries, almost totally undocumented.

I played around with that for a while and failed to get anything useful. After a little more searching, I found ruby-google on the RAA. It looked promising, specifically:

Ruby/Google offers a higher-level abstraction of Google’s SOAP-driven Web API. It allows you to programmatically query the Google search-engine from the comfort of your favourite programming language, as long as that’s Ruby.

The aim of the library is to make the details of the raw data structures returned by the Web API irrelevant, in the process making the API more accessible for everyday use.

After installing it on amartya, our development server, I tried using the thing. It barfed with some error. I then tried tweaking it for a bit but gave up because the thing hadn’t been updated since mid-2003.

More Annoyed

The rest of the evening and this morning progressed in the same way. Long story short: tried to get a Ruby version to work on 3 different machines (and OSes), gave up on Ruby.

Perl has CPAN

I switched to Perl and happily found CPAN modules that looked promising, specifically Net::Google.

Not if you haven’t got a good version of Perl

I then spent hours trying to install all of the dependencies for the CPAN meta-module on amartya before realizing that it was running Perl 5.005_03! After that, I gave up completely on amartya and tried my Dreamhost machine, where I failed because I wasn’t root [yes, I know how to fix that but was too annoyed by this point]. Then I failed on my iBook because I somehow screwed up the urllist to download from [yeah, I tried fixing that too].

Why Google?

Then I stopped and looked at Yahoo’s API. It worked out of the box (in Ruby). Here’s the basic code (WordPress escapes the double quotes for some reason):

require 'rexml/document'
require 'net/http'
require 'uri'

# Application ID issued by Yahoo
APP_ID = "whatever_you_asked_Yahoo_for"
# The query string has to be URL-encoded already
query = "a%20search%20string"
url = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=#{APP_ID}&query=#{query}"
response = Net::HTTP.get_response(URI.parse(url))
data_xml = REXML::Document.new(response.body)
# Print the title, URL, and summary of each result
data_xml.elements.each("ResultSet/Result") { |r|
  puts r.elements["Title"].text
  puts r.elements["Url"].text
  puts r.elements["Summary"].text
}
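
The query above is percent-encoded by hand, which gets tedious for anything beyond a test string. Here is a minimal sketch of letting Ruby's standard uri library do the encoding instead. The helper name build_search_url and the BASE_URL constant are my own inventions for illustration; only the endpoint and the appid and query parameters come from the code above.

```ruby
require 'uri'

# Endpoint taken from the snippet above; the constant name is an assumption.
BASE_URL = "http://api.search.yahoo.com/WebSearchService/V1/webSearch"

# Hypothetical helper: builds the request URL and encodes both parameters,
# so the query can be passed in as plain text.
def build_search_url(app_id, query)
  params = URI.encode_www_form("appid" => app_id, "query" => query)
  "#{BASE_URL}?#{params}"
end

puts build_search_url("whatever_you_asked_Yahoo_for", "a search string")
```

Note that URI.encode_www_form writes spaces as + rather than %20, which is the usual form encoding for query strings. The resulting URL can be passed to URI.parse and Net::HTTP.get_response exactly as before.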

I wish I had tried Yahoo from the beginning.
