Detecting Atom/RSS Feeds in Ruby
In the current boom of social networking sites (SNS), usability is becoming increasingly important. People have accounts on many different websites, and registering for yet another one is getting more and more tiring. This is one of the main reasons why Confabio.com doesn't require you to sign up and log in. It's also one of the main reasons why websites like Wakoopa.com make their registration as painless as possible. A colleague of mine was even experimenting with the idea of dropping the username/email requirement altogether. Also, OpenID is still too young and Sun's Liberty Alliance is just too corporate and slow.
But for most social networking sites it's pretty simple: they just need people to enter information. So let's make that as easy for the user as possible.
Entering Syndication Feeds
For one of my projects I have to let users enter information about themselves so they can build up their own profile. What I really like about some of the new sites is that they aggregate your blog's contents and your Flickr pictures.
One such website is the Tokyo-based social networking site Asooboo.com. After signing up you can enter your blog feed and Flickr username, and it will keep track of all your stories and pictures. I think that's really cool and it's one of the first steps in making the web more ubiquitous. You can later change your feed URL under 'edit my profile'.
Entering Links instead of Feeds
Entering feeds is nice, but for users who are not tech-savvy, 'Feed', 'RSS' and 'Atom' might raise question marks. So I think it would be nice if users didn't have to worry about feeds at all and could just enter their links, like:
My Websites and Profiles:
- http://blog.dominiek.com/
- http://www.flickr.com/photos/dominiekterheide/
- http://del.icio.us/dominiekth
It would then show a fancy spinner and convert them into 'My Blog', 'My Pictures' and 'My Links'. All content will be aggregated automatically if it can detect any RSS feeds on those pages.
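The labelling part could be as simple as a lookup on the link's host. This is only a rough sketch of the idea: the LABELS_BY_HOST table and the label_for method are names I made up for this example, and treating every unknown host as a blog is an assumption, not something the project necessarily does.

```ruby
require 'uri'

# Hypothetical mapping for this sketch: which known hosts get which label.
LABELS_BY_HOST = {
  'www.flickr.com' => 'My Pictures',
  'flickr.com'     => 'My Pictures',
  'del.icio.us'    => 'My Links'
}

# Assumption: anything we don't recognise is treated as a blog.
def label_for(link)
  host = URI.parse(link).host.to_s.downcase
  LABELS_BY_HOST.fetch(host, 'My Blog')
end

links = [
  'http://blog.dominiek.com/',
  'http://www.flickr.com/photos/dominiekterheide/',
  'http://del.icio.us/dominiekth'
]
links.each { |link| puts "#{label_for(link)}: #{link}" }
```

A real version would probably look at the detected feed as well, since a host name alone says little about what kind of content is behind a link.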
Detecting RSS feeds
When you use a proper browser like Mozilla Firefox, you will see a syndication icon every time you visit a website that has RSS feeds.
It does this by reading the <link> tags that a page puts in its HTML <head> to advertise its feeds.
After a quick search I couldn't find any code to do this in my own project, so I wrote a little piece of code for it, with a Ruby on Rails integration test.
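To make that a bit more concrete: the tags in question are <link rel="alternate"> elements whose type is application/atom+xml or application/rss+xml. Here is a rough sketch that lists all of them on a page. It is not the code I ended up using (that one follows further down); FEED_TYPES and feed_links are names made up for this example, and the regexes assume reasonably tidy HTML.

```ruby
# Illustrative only: FEED_TYPES and feed_links are made-up names for this sketch.
FEED_TYPES = {
  'application/atom+xml' => :atom,
  'application/rss+xml'  => :rss
}

# Return [[:atom, href], [:rss, href], ...] for every feed <link> tag in the HTML.
def feed_links(html)
  html.scan(/<link\b[^>]*>/i).map { |tag|
    # collect the tag's attributes into a hash, whatever order they come in
    attrs = {}
    tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/) { |name, value| attrs[name.downcase] = value }
    # only rel="alternate" links advertise an alternative version of the page
    next unless attrs['rel'].to_s.downcase.split.include?('alternate')
    kind = FEED_TYPES[attrs['type']]
    [kind, attrs['href']] if kind && attrs['href']
  }.compact
end

html = <<HTML
<head>
  <link href="/super.css" rel="stylesheet" type="text/css" />
  <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed/rss.xml" />
</head>
HTML

p feed_links(html)
# => [[:atom, "/feed/atom.xml"], [:rss, "/feed/rss.xml"]]
```

Parsing the attributes into a hash first means the order of href and type inside the tag doesn't matter. For messy real-world pages a proper HTML parser would be the safer choice.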
You can use it like this:
FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
=> "http://blog.dominiek.com/feed/atom.xml"
FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
=> "http://blog.dominiek.com/feed/atom.xml"
FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
=> "http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200"
# alternatively you can parse HTML with FeedDetector.get_feed_path(html_data)
# see integration test for more examples
FeedDetector + Test
Excuse my quickly mashed-together code. The FeedDetector (lib/feed_detector.rb):
require 'net/http'
require 'uri'

class FeedDetector
  ##
  # return the feed url for a url
  # for example: http://blog.dominiek.com/ => http://blog.dominiek.com/feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.fetch_feed_url(page_url, only_detect=nil)
    url = URI.parse(page_url)
    # build a new string here: appending the port to url.host itself
    # would modify the host stored inside the URI object
    host_with_port = url.port == 80 ? url.host : "#{url.host}:#{url.port}"
    # request_uri keeps the query string and gives '/' for an empty path
    req = Net::HTTP::Get.new(url.request_uri)
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    feed_url = self.get_feed_path(res.body, only_detect)
    # turn a path like /feed/atom.xml into a full url, leave full urls alone
    feed_url = "http://#{host_with_port}/#{feed_url.gsub(/^\//, '')}" unless !feed_url || feed_url =~ /^https?:\/\//
    # no feed link found: assume the page itself is the feed
    feed_url || page_url
  end

  ##
  # get the feed href from an HTML document
  # for example:
  # ...
  # <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  # ...
  # => /feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.get_feed_path(html, only_detect=nil)
    md = nil
    # atom is tried first; each type is matched with href before and after it
    unless only_detect && only_detect != :atom
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/atom\+xml.*>/.match(html)
      md ||= /<link.*application\/atom\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
    end
    unless only_detect && only_detect != :rss
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/rss\+xml.*>/.match(html)
      md ||= /<link.*application\/rss\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
    end
    md && md[1]
  end
end
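The string concatenation in fetch_feed_url treats every relative href as if it started at the root of the site. A possible alternative, shown here only as a sketch, is to let Ruby's URI.join resolve the href against the page URL. The absolute_feed_url method is a hypothetical helper for this example and not part of the FeedDetector class.

```ruby
require 'uri'

# Hypothetical helper for this sketch, not part of FeedDetector.
# Resolves a feed href (absolute or relative) against the page it was found on.
def absolute_feed_url(page_url, feed_path)
  URI.join(page_url, feed_path).to_s
end

# root-relative path on a non-standard port
puts absolute_feed_url('http://blog.dominiek.com:8000/', '/feed/atom.xml')
# => http://blog.dominiek.com:8000/feed/atom.xml

# path relative to the current directory of the page
puts absolute_feed_url('http://blog.dominiek.com/archives/', 'feed/atom.xml')
# => http://blog.dominiek.com/archives/feed/atom.xml

# an href that is already a full url is returned as it is
puts absolute_feed_url('http://blog.dominiek.com/', 'http://digigen.nl/feed/')
# => http://digigen.nl/feed/
```

Note that the second case differs from what FeedDetector does: there the result would be http://blog.dominiek.com/feed/atom.xml, so swapping this in changes behaviour for hrefs without a leading slash.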
The integration test (test/integration/feed_detector_test.rb):
require "#{File.dirname(__FILE__)}/../test_helper"

class FeedDetectorTest < ActionController::IntegrationTest
  def test_fetch_feed_url
    return # comment out this line to test HTTP fetching
    # test mephisto
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    # test wordpress
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/')
    assert_equal('http://digigen.nl/feed/', feed_url)
    # test non conventional port
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com:8000/')
    assert_equal('http://blog.dominiek.com:8000/feed/atom.xml', feed_url)
    # test only_detect rss/atom on flickr
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :atom)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200', feed_url)
    # make sure that feeds return themselves
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/feed/')
    assert_equal('http://digigen.nl/feed/', feed_url)
  end

  def test_get_feed_path
    body = []
    body << ' <html>'
    body << ' <head>'
    body << ' <link href="/super.css" rel="alternate" type="text/css"/>'
    body << ' <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />'
    body << ' </head>'
    body << ' </html>'
    # Mephisto
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    body[3] = ' <link href=\'/feed/atom.xml\' rel="alternate" type="application/atom+xml" />'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    # Flickr
    body[3] = '<link rel="alternate" type="application/atom+xml" title="Flickr: Photos from dominiekth Atom feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_path)
    # with both feed types present, atom wins unless :rss is forced
    body[4] = '<link rel="alternate" type="application/rss+xml" title="Flickr: Photos from dominiekth RSS feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200', feed_path)
    # Wordpress
    body[3] = '<link rel="alternate" type="application/rss+xml" title="Digigen RSS Feed" href="http://digigen.nl/feed/" />'
    body[4] = ' </head>'
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :atom)
    assert_nil(feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://digigen.nl/feed/', feed_path)
  end
end
I'm sure this will be useful to some people, so enjoy!