July 25, 2003

Scraping news.google.de

Ah, Google, endless source of joy and anger! Google recently launched a German edition of its news aggregator, news.google.de, which collects headlines from online news sources.

I'd really like to know how they do it: they claim it works without human intervention, and it does an impressive job of recognizing common subjects and clustering them. My guess, though, is that Google won't publish any details about the technology. The best we can hope for is a joke, like their "explanation" of PageRank.

Another thing Google doesn't provide is an RSS feed for the news. However, it's easy to roll your own. I wrote a version in Perl on the day news.google.de started, but it can also be done in a more exotic language. Here it is in Ruby:

#!/usr/bin/ruby
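require 'net/http'
# Fetch the page to scrape as one string.
google_s = Net::HTTP.get('news.google.de', '/news/de/de/mainlite.html')
# Captures per headline: 1 = href, 2 = link text, 3 = the text before the next <br>.
google_re = Regexp.new('<a class=y href="(.*?)">(.*?)<\/a>.*?<\/b><br>(.*?)<br>')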
puts "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
puts "<rdf:RDF"
puts "  xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
puts "  xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
puts "  xmlns:sy=\"http://purl.org/rss/1.0/modules/syndication/\""
puts "  xmlns=\"http://purl.org/rss/1.0/\">"
puts "<channel rdf:about=\"http://news.google.de/\">"
puts "<title>news.google.de</title>"
puts "<link>http://news.google.de/</link>"
puts "<description>Schlagzeilen von Google (auf deutsch)</description>"
puts "<dc:language>de-de</dc:language>"
puts "<dc:creator></dc:creator>"
puts "</channel>"
# Emit one <item> per headline; scan yields the three capture groups of each match.
google_s.scan(google_re) do |href, title, description|
  puts "<item>"
  puts "  <link>http://news.google.de/#{href}</link>"
  puts "  <title>#{title}</title>"
  puts "  <description>#{description}</description>"
  puts "</item>"
end
puts "</rdf:RDF>"
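
The script above prints the matched fragments exactly as they appear in the page, so a stray & or <b> in a headline would probably leave the feed malformed. RSS 1.0 also expects an rdf:about on every item and an <items> list inside the channel. Here is a minimal sketch of how that might be handled, assuming the page markup is what the regex above expects. The helper names xml_text, extract_items, items_seq and item_xml are made up for this example, and named entities such as &uuml; are not decoded, only escaped.

```ruby
require 'cgi/escape'

# Pattern copied from the script above; it assumes the markup that script
# was written against and has not been checked against the live page.
HEADLINE_RE = /<a class=y href="(.*?)">(.*?)<\/a>.*?<\/b><br>(.*?)<br>/

# Hypothetical helper: drop inline tags, then decode and re-encode the basic
# entities so the result is safe to place inside an XML element or attribute.
def xml_text(fragment)
  CGI.escapeHTML(CGI.unescapeHTML(fragment.gsub(/<[^>]*>/, '')))
end

# Hypothetical helper: one hash per headline found in the HTML string.
# The base URL is prepended to the href, as in the original script.
def extract_items(html, base = 'http://news.google.de/')
  html.scan(HEADLINE_RE).map do |href, title, description|
    { link: xml_text(base + href),
      title: xml_text(title),
      description: xml_text(description) }
  end
end

# The <items> sequence that RSS 1.0 expects inside <channel>.
def items_seq(items)
  lis = items.map { |i| "      <rdf:li rdf:resource=\"#{i[:link]}\"/>" }
  ['  <items>', '    <rdf:Seq>', *lis, '    </rdf:Seq>', '  </items>'].join("\n")
end

# One <item> element; rdf:about repeats the link so it matches the rdf:li entry.
def item_xml(item)
  <<~XML
    <item rdf:about="#{item[:link]}">
      <link>#{item[:link]}</link>
      <title>#{item[:title]}</title>
      <description>#{item[:description]}</description>
    </item>
  XML
end
```

To use it, call extract_items(google_s) once, print items_seq(items) just before the closing </channel> tag, and then print item_xml(item) for each item in place of the loop.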
Posted by jens at July 25, 2003 11:40 AM
Comments

Hi, I just noticed this page due to a referer. Can I clarify that Google doesn't inspire anger in me? Anger at the people who spam and censor it - but not Google itself.

Posted by: Seth Finkelstein at February 2, 2004 10:51 AM
