Today and yesterday I’ve spent another couple of hours coding and designing away on my (autistic) little project Wigitize.com. I’m keeping an exact log of the hours I’m spending and of what’s happening, so that I can publish some full-exposure articles about how to build a project like this.

Currently these are the things that still need to be done:

  • let people choose a style for their widget
  • write a simple API and a page to show how it works
  • optimize the backend so that a background worker is used
  • allow people to ‘grab the grabber’ so that they can offer widgets of their own content to their users
  • rent a machine with bandwidth; I’m thinking of slicehost.com (much cheaper than EC2)
  • rethink the name WIGITIZE. It sounds good to my Dutch-English ears, but it might not to native speakers.

Another little sneak preview:


I've wanted to do this for a while now: basically, a service that allows you to convert RSS feeds into embeddable widgets. (A little geeky, yes, I know. More about this later!)



iKnowの日記を自分のミクシィ日記に設定することが出来ます。
Did you know you can set iKnow as a journal on Mixi?

ミクシィで設定変更をクリックして下さい。
Click on settings when logged into your Mixi account.


設定変更のページで「日記・ブログの選択」と「日記・ブログのURL」と「RSSのURL」を入力して下さい。
On the settings page, fill in “blog selection”, “blog URL” and “RSS URL”.

「dominiekth」は私のiKnowユーザーネーム。
Replace “dominiekth” with your own username.

That’s it!

In the current SNS (social networking site) boom, usability is becoming increasingly important. People have accounts on many different websites, and registering for yet another one is getting more and more tiring. This is one of the main reasons why Confabio.com doesn't require you to sign up and log in. It's also one of the main reasons why websites like Wakoopa.com make their registration as painless as possible. A colleague of mine was even experimenting with the idea of omitting the username/email requirement altogether. Also, OpenID is still too young and Sun's Liberty Alliance is just too corporate and slow.

But for most social networking sites it's pretty simple: they just need people to enter information. So let's make that as easy for the user as possible.

Entering Syndication Feeds

For one of my projects I have to let users enter information about themselves, so that they can build up their own profile. What I really like about some of the new sites is that they aggregate your blog's contents and your Flickr pictures.

One such website is the Tokyo-based social networking site Asooboo.com. After signing up you can enter your blog feed and Flickr username, and it will keep track of all your stories and pictures. I think that's really cool, and it's one of the first steps in making the web more ubiquitous. You can later change your feed URL under 'edit my profile':

Entering Links instead of Feeds

Entering feeds is nice, but to users who are not tech-savvy, 'Feed, RSS and Atom' might raise question marks. Therefore I think it would be nice if users didn't have to worry about feeds, but could instead just enter their links, like:

My Websites and Profiles:

  • http://blog.dominiek.com/
  • http://www.flickr.com/photos/dominiekterheide/
  • http://del.icio.us/dominiekth

It would then show a fancy spinner and convert them to 'My Blog', 'My Pictures' and 'My Links'. All content will be aggregated automatically if any RSS feeds can be detected on those pages.
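The conversion from link to label could be sketched roughly like this. Note that the host table and the label_for helper are my own assumptions for illustration; only the three labels come from the idea above, and treating every unknown host as a blog is just a guess at a sensible default.

```ruby
require 'uri'

# Hypothetical host-to-label table (an assumption, not an existing API).
LABELS_BY_HOST = {
  'www.flickr.com' => 'My Pictures',
  'flickr.com'     => 'My Pictures',
  'del.icio.us'    => 'My Links'
}

# Hypothetical helper: pick a profile label for a link the user entered.
# Hosts that are not in the table are assumed to be blogs.
def label_for(link)
  host = URI.parse(link).host.to_s.downcase
  LABELS_BY_HOST.fetch(host, 'My Blog')
end

links = [
  'http://blog.dominiek.com/',
  'http://www.flickr.com/photos/dominiekterheide/',
  'http://del.icio.us/dominiekth'
]
links.each { |link| puts "#{label_for(link)}: #{link}" }
```

A real version would probably decide the label only after feed detection has succeeded, so that a link without any feed is simply shown as a plain link.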

Detecting RSS feeds

When you use a proper browser like Mozilla Firefox you will see a syndication icon every time you visit a website that has RSS feeds:

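It does this by reading the <link rel="alternate"> tags in the page's HTML head that have a type of application/rss+xml or application/atom+xml.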
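To make that a bit more concrete, here is a minimal sketch of what reading those tags can look like. It is separate from the FeedDetector further down: advertised_feeds and FEED_TYPES are hypothetical names, and the regular expressions assume reasonably tidy HTML rather than covering every way a tag can be written.

```ruby
# Hypothetical lookup table: MIME type in the <link> tag => kind of feed.
FEED_TYPES = {
  'application/atom+xml' => :atom,
  'application/rss+xml'  => :rss
}

# Hypothetical helper: list every feed a page advertises
# as [kind, href] pairs, in document order.
def advertised_feeds(html)
  feeds = []
  # look at each <link ...> tag on its own
  html.scan(/<link\b[^>]*>/i).each do |tag|
    type = tag[/type=["']?([^\s"'>]+)/i, 1]
    href = tag[/href=["']?([^\s"'>]+)/i, 1]
    kind = FEED_TYPES[type.to_s.downcase]
    feeds << [kind, href] if kind && href
  end
  feeds
end

html = <<HTML
<head>
  <link href="/super.css" rel="stylesheet" type="text/css" />
  <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  <link rel="alternate" type="application/rss+xml" href="/feed/rss.xml" />
</head>
HTML

p advertised_feeds(html)
```

The stylesheet link is skipped because its type is not a feed type, and the order of the type and href attributes inside a tag does not matter.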

After a quick search I couldn't find any code to do this in my own project, so I wrote a little piece of code for it, with a Ruby on Rails integration test.

You can use it like this:

 FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
 => "http://blog.dominiek.com/feed/atom.xml"
 FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
 => "http://blog.dominiek.com/feed/atom.xml"
 FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
 => "http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200"
 # alternatively you can parse HTML with FeedDetector.get_feed_path(html_data)
 # see integration test for more examples

FeedDetector + Test

Excuse my quick mash code. The FeedDetector (lib/feed_detector.rb):


require 'net/http'

class FeedDetector

  ##
  # return the feed url for a url
  # for example: http://blog.dominiek.com/ => http://blog.dominiek.com/feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.fetch_feed_url(page_url, only_detect=nil)
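    url = URI.parse(page_url)
    # dup the host first: appending with << would otherwise modify url.host in place
    host_with_port = url.host.dup
    host_with_port << ":#{url.port}" unless url.port == 80
    # fall back to "/" when the URL has no path (e.g. http://blog.dominiek.com)
    req = Net::HTTP::Get.new(url.path.empty? ? '/' : url.path)
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    feed_url = self.get_feed_path(res.body, only_detect)
    # turn a relative feed path into an absolute URL; leave absolute URLs alone
    feed_url = "http://#{host_with_port}/#{feed_url.gsub(/^\//, '')}" unless !feed_url || feed_url =~ /^https?:\/\//
    # no feed link found: assume the page itself is the feed
    feed_url || page_url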
  end

  ##
  # get the feed href from an HTML document
  # for example:
  # ...
  # <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  # ...
  # => /feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.get_feed_path(html, only_detect=nil)
    unless only_detect && only_detect != :atom
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/atom\+xml.*>/.match(html) 
      md ||= /<link.*application\/atom\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html) 
    end
    unless only_detect && only_detect != :rss
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/rss\+xml.*>/.match(html) 
      md ||= /<link.*application\/rss\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html) 
    end
    md && md[1]
  end

end
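As an aside, gluing the host, port and path back together by hand could also be left to Ruby's standard library. The sketch below uses URI.join for that; absolute_feed_url is a hypothetical helper name and not part of the FeedDetector above.

```ruby
require 'uri'

# Hypothetical helper: resolve a feed href found on page_url into an
# absolute URL. URI.join keeps the scheme and a non-default port of the
# page, and returns hrefs that are already absolute unchanged.
def absolute_feed_url(page_url, feed_href)
  URI.join(page_url, feed_href).to_s
end

puts absolute_feed_url('http://blog.dominiek.com/', '/feed/atom.xml')
puts absolute_feed_url('http://blog.dominiek.com:8000/', '/feed/atom.xml')
puts absolute_feed_url('http://blog.dominiek.com/', 'http://digigen.nl/feed/')
```

This also covers hrefs that are relative to the current directory (like feed.xml without a leading slash), which the string concatenation in fetch_feed_url treats as relative to the site root.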

The integration test (test/integration/feed_detector_test.rb):


require "#{File.dirname(__FILE__)}/../test_helper"


class FeedDetectorTest < ActionController::IntegrationTest

  def test_fetch_feed_url
    return # comment out this line to test HTTP fetching

    # test mephisto
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    # test wordpress
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/')
    assert_equal('http://digigen.nl/feed/', feed_url)

    # test non conventional port
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com:8000/')
    assert_equal('http://blog.dominiek.com:8000/feed/atom.xml', feed_url)

    # test only_detect rss/atom on flickr
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :atom)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200', feed_url)

    # make sure that feeds return themselves
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/feed/')
    assert_equal('http://digigen.nl/feed/', feed_url)
  end

  def test_get_feed_path
    body = []
    body << ' <html>'
    body << '  <head>'
    body << '   <link href="/super.css" rel="alternate" type="text/css"/>'
    body << '   <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />'
    body << '  </head>'
    body << ' </html>'

    # Mephisto
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    body[3] = '   <link href=\'/feed/atom.xml\' rel="alternate" type="application/atom+xml" />'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)

    # FlickR
    body[3] = '<link rel="alternate" type="application/atom+xml" title="Flickr: Photos from dominiekth Atom feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_path)
    body[4] = '<link rel="alternate"   type="application/rss+xml" title="Flickr: Photos from dominiekth RSS feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200', feed_path)

    # Wordpress
    body[3] = '<link rel="alternate" type="application/rss+xml" title="Digigen RSS Feed" href="http://digigen.nl/feed/" />'
    body[4] = ' </head>'
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :atom)
    assert_equal(nil, feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://digigen.nl/feed/', feed_path)
  end

end

I hope this is useful to some people, so enjoy!