Next Opportunities in Seamless Web Integration
on January 15, 2008
This is a more extensive write-up of a problem I have talked about before. It’s about something many of us web innovators face today. Like many problems, it can be turned into an opportunity if handled well.
The new Web – let’s stop using 2.0 – is much more than social networking sites and round cornered layouts. As Tim O’Reilly said, people who think Web 2.0 is about AJAX are completely missing the boat. This is true, but it is important to keep in mind that the shift from desktop products to web services is a big deal. Online services already make people a lot more productive, and thanks to the characteristics of the web the companies developing them can distribute and evolve their products much better. This means that certain value adders have now migrated to the web.
What is happening now is that many of these online services want to be able to harness the power of the new social characteristics of the web. People have their friends online now and there is a lot of activity between those people and services (UGC). Everybody wants to tap into this attention stream between people. It helps contextualize your offering and helps spread the goodness between like-minded people.
Unfortunately, getting this social activity going around your service is not easy. And frankly, most of the time it results in creating another lousy SNS with worse social features. Fortunately, most online services open up their API’s allowing you to integrate with their services to create a mashup.
- rate, review and discuss cheeses
- have support groups like: ‘Why is cheese so expensive in Japan?’
- add cheeses they’ve eaten, for example through IM or mail
- upload/import pictures of their cheese related experience
- provide little cheese diary entries for the real fanatics
- using Twitter to add and rate cheeses
- using Flickr to upload/import pictures about cheeses
- using Tumblr to provide journal functionality
We could use one of those create-your-own-community sites like Ning, but then we can’t build our core business around it: selling and recommending cheese based on our customers’ attention profiles. Fulfilling all the cheese needs of our customers is why we exist.
Mashing up a service like this using Twitter, Flickr and Tumblr is easy. Turning that mashup into a success is nearly impossible. The problem lies in the shallowness of the way we do mashups now. People need a Twitter, Flickr and Tumblr account to use all functionality in our system. Explain that to the 60 year old French cheese farmer who just plugged in his DSL modem!
Now, I don’t want to start debating about web savviness and targeting certain markets. My main point is that there is a certain flaw in all API’s: the end user needs to register with the API provider before the API can be used anywhere.

- our cheese shop is using OpenID
- our cheese shop can create a new OpenID easily
- the services we integrate with are using OpenID
- the services allow first-time use of their service without ANY burden to our customers (truly seamless)
I think OpenID is the path we are moving towards, although I’m sure that at first most services will not be prepared for the seamlessness we want. Also, I have an interesting scenario for the case that OpenID doesn’t take off (will save that one for later).
Services will be able to focus on their core added value by specializing. Building things that have been built before is extra baggage that seamless integration can remove. Also, when a service continues to specialize it needs to think from the very start about its own integration in the collective. In our Cheese example, our service should have an API from day 1.
But right now, most services are not ready for this seamless integration. If FlickR would allow use of their API without showing any FlickR logos or Flickr ads, how would they make money? What if we want to display our own CheesR logo everywhere and completely exclude the user from knowing FlickR is used?
This is where Freemium Integration is born.
Right now, most popular services offer API’s. If you look carefully, most of them don’t guarantee any level of service, with the exception of Google’s. API’s are still at a very early stage.
Soon, API’s will become the core of any successful online service. They will be the door to most of the revenue. Services will continue to specialize in the things they do well, recommending wine, recommending books or providing entertainment you appreciate. This specialization might go on to a point where there will no longer be an actual site, but just a widget, API, Facebook Application or some other integratable entity.
These integratable entities – that I will just call API’s for now – will have several ways of generating revenue for their creators.
- Free Integration. This could be either seamless or semi-seamless integration together with in-content contextual advertisement. In order to make these advertisements contextual there will need to be some kind of Attention Profile exchange between the two parties.
- Premium Integration. Seamless integration without brands or attention debt. The client-party will have to pay the provider an Amazon AWS like compensation for usage.
I think over the next short-term march towards the first intelligent agents, it is important to keep the API in mind. Make sure that your online service can be used with or without OpenID and that third parties can integrate seamlessly without any hassle. This also means changing your mindset towards branding since this is a concept likely to change very rapidly soon.
The purpose of this article is to spark discussion from the community. Have any of you encountered a similar discussion or desire for seamlessness? Any opinions welcome!
Wigitize.com, widgitize any web feed
on December 31, 2007
Twittering, blogging and yocto-content
on September 10, 2007
Most of you might have heard about the latest trend: twitter.com. At first I thought the idea was really stupid and could only be of use to egocentric people (like me):
- twitter is a blog that allows you to write posts with a maximum of 140 characters
- these one-liners can be 'watched' by your friends or you can watch your friends

"I'm at the bakery", "Britney Spears is so cool!" are examples of these so-called 'tweets'. But these examples are bad. Like blogging, people can write really really useless things. Blogs are also most often abused by people who write about 'how depressed they are' or how shallow they are.
But some blogs, a small number, provide real high quality content. Some of them even get printed as books (The book I'm reading now by Seth Godin appears to be a printout of his blog).
If I'm right, blog posts are micro-content (and my semantic web obsessed colleague can correct me if I'm wrong). Quality blog posts are on average the size of magazine articles and they provide the same information. Blog posts have the extra advantage of being interconnected in the blogosphere.
Twitter on the other hand is nano-content, and it obviously has other uses. People are not quite clear yet about these uses. Examples of high quality tweets are: "stranded in Korea because of typhoon" and "Harry Potter dies in the end". These provide quick communications to the people that are subscribed to your messages.
Another use would be to use twitter as a thought notebook, where you can write little ideas like "hey, what about yocto-content" or "KFC is booming here, buy KFC China stocks". An advantage that these tweets have is that they can be written quickly which allows you to have a very active messaging stream. This is one of the main abstract uses of twitter: displaying activity. My colleague had an interesting suggestion that just like corporate blogs, there might be a use for corporate twitter-like applications.
As a wannabe entrepreneur, I think there are 2 big opportunities here:
Twitter without the hype
There are many tools that make twittering easier, like twittermail (mail2twitter). But there is one obvious thing always painfully stuck in our eyes: the big ugly twitter logo on our profiles. On your twitter page (like http://twitter.com/dominiek) you can only customize the layout a bit, but it will always look like this.
Therefore, it would be great to have a service that allows you to simply have a list of latest-thoughts or latest-communiques. It should also allow geeks to put in their own markup code and to attach their own domains (like thoughts.dominiek.com). But more importantly, it should allow corporations to make use of their own brand. So this service should be transparent and brandless (conforms with Seth's statement that branding is a dying industry, sorry Russ Meyer).
Yocto-content
If twitter is nano content, would something even smaller also work? Let's check wikipedia for a name:

So what would this yocto-content look like? Probably one word or a hyphenated word. You can have a stream of simple keywords to 'tag your life' in a way. For example: work, container, work, work, namkee, heineken, back-to-work, vacation, china, work, holland, bureaucracy, fly, korean.
You could then visualize this stream (over time) like a tagcloud. Also you can compare it to the cloud of other people and detect similar lives or interests. Other uses are still to be explored.. ;]
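To make the tagcloud and comparison idea a bit more concrete, here is a minimal Ruby sketch. The method names tag_cloud and similarity, the friend's stream, and the choice of Jaccard similarity are all my own assumptions for illustration; other ways of comparing clouds would work just as well.

```ruby
# Hypothetical sketch: tag_cloud and similarity are illustrative names only.

# Count how often each keyword occurs in a stream of one-word posts.
def tag_cloud(stream)
  counts = Hash.new(0)
  stream.each { |word| counts[word.downcase] += 1 }
  counts
end

# Jaccard similarity: shared keywords divided by all distinct keywords.
# It ignores frequencies; it is just one simple way to compare two clouds.
def similarity(cloud_a, cloud_b)
  shared = cloud_a.keys & cloud_b.keys
  all    = cloud_a.keys | cloud_b.keys
  return 0.0 if all.empty?
  shared.size.to_f / all.size
end

# The example stream from the text, plus a made-up stream for a friend.
mine   = tag_cloud(%w[work container work work namkee heineken back-to-work
                      vacation china work holland bureaucracy fly korean])
friend = tag_cloud(%w[work china vacation sushi work])

# Most frequent keywords first, as a tagcloud would size them.
mine.sort_by { |word, count| [-count, word] }.first(3).each do |word, count|
  puts "#{word} (#{count})"
end

puts similarity(mine, friend) # 3 shared keywords out of 12 distinct => 0.25
```

A frequency-aware measure would probably say more about 'similar lives' than plain keyword overlap, but this is enough to play with the idea.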
Detecting atom/rss Feeds in Ruby
on June 22, 2007
In the current SNS - Social Networking Site - boom it is becoming increasingly important to deal with usability. People have accounts for many different websites and it's getting more and more tiring to register for a new account. This is one of the main reasons why Confabio.com doesn't require you to sign up and log in. And it's also one of the main reasons why websites like Wakoopa.com make their registration as painless as possible. A colleague of mine was even experimenting with the idea of omitting the username/email requirement altogether. Also, OpenID is still too young and Sun's Liberty Alliance is just too corporate and slow.
But for most social networking sites it's pretty simple: they just need people to enter information. So let's make that as easy for the user as possible.
Entering Syndication Feeds
For one of my projects I have to let users enter information about themselves. This is so they can build up their own profile. What I really like about some of the new sites is that they aggregate your blog's contents and your FlickR pictures.
One of such websites is the Tokyo based Social Networking Site Asooboo.com. After signing up you can enter your blog feed and FlickR username and it will keep track of all your stories and pictures. I think that's really cool and it's one of the first steps in making the web more ubiquitous. You can later change your Feed URL in your 'edit my profile':
Entering Links instead of Feeds
Entering feeds is nice, but for users that are not tech-savvy 'Feed, RSS and Atom' might raise question marks. Therefore I think it would be nice if users didn't have to worry about feeds, but could instead just enter their links like:
My Websites and Profiles:
- http://blog.dominiek.com/
- http://www.flickr.com/photos/dominiekterheide/
- http://del.icio.us/dominiekth
It would then show a fancy spinner and convert it to 'My Blog', 'My Pictures' and 'My Links'. All content will be automatically aggregated if it can detect any RSS feeds on those pages.
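The link-to-label step could be sketched roughly like this. The LABELS table, the label_for name and the 'My Blog' fallback for unknown hosts are assumptions of mine, not part of any existing code:

```ruby
require 'uri'

# Hypothetical mapping from host to profile label. The hosts and labels come
# from the example links above; treating every other host as a blog is an
# assumption made for this sketch.
LABELS = {
  'flickr.com'  => 'My Pictures',
  'del.icio.us' => 'My Links'
}

# Return the profile label for a link a user entered.
def label_for(link)
  host = URI.parse(link).host.to_s.sub(/\Awww\./, '')
  LABELS.fetch(host, 'My Blog')
end

puts label_for('http://blog.dominiek.com/')
puts label_for('http://www.flickr.com/photos/dominiekterheide/')
puts label_for('http://del.icio.us/dominiekth')
```

Only the labelling is shown here; finding the actual feed behind each link is a separate step.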
Detecting RSS feeds
When you use a proper browser like Mozilla Firefox you will see a syndication icon every time you visit a website that has RSS feeds:
It does this by reading certain HTML tags.
After a quick search I couldn't find any code to do this in my own project, so I wrote a little piece of code for it with a RubyOnRails integration test.
You can use it like this:
FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
=> "http://blog.dominiek.com/feed/atom.xml"
FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
=> "http://blog.dominiek.com/feed/atom.xml"
FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
=> "http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200"
# alternatively you can parse HTML with FeedDetector.get_feed_path(html_data)
# see integration test for more examples
FeedDetector + Test
Excuse my quick mash code. The FeedDetector (lib/feed_detector.rb):
require 'net/http'
require 'uri'

class FeedDetector

  ##
  # return the feed url for a url
  # for example: http://blog.dominiek.com/ => http://blog.dominiek.com/feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.fetch_feed_url(page_url, only_detect=nil)
    url = URI.parse(page_url)
    # dup the host: appending the port with << would otherwise change url.host itself
    host_with_port = url.host.dup
    host_with_port << ":#{url.port}" unless url.port == 80
    # an empty path (http://example.com) is not a valid request path, so fall back to /
    req = Net::HTTP::Get.new(url.path.empty? ? '/' : url.path)
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    feed_url = self.get_feed_path(res.body, only_detect)
    # turn a relative feed path into an absolute url
    feed_url = "http://#{host_with_port}/#{feed_url.gsub(/^\//, '')}" unless !feed_url || feed_url =~ /^http:\/\//
    # no feed link found: assume the page itself is the feed
    feed_url || page_url
  end

  ##
  # get the feed href from an HTML document
  # for example:
  # ...
  # <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  # ...
  # => /feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.get_feed_path(html, only_detect=nil)
    md = nil
    # atom is tried first, with href either before or after the type attribute
    unless only_detect && only_detect != :atom
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/atom\+xml.*>/.match(html)
      md ||= /<link.*application\/atom\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
    end
    unless only_detect && only_detect != :rss
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/rss\+xml.*>/.match(html)
      md ||= /<link.*application\/rss\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html)
    end
    md && md[1]
  end

end
The integration test (test/integration/feed_detector_test.rb):
require "#{File.dirname(__FILE__)}/../test_helper"

class FeedDetectorTest < ActionController::IntegrationTest

  def test_fetch_feed_url
    return # remove this line to test HTTP fetching
    # test mephisto
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    # test wordpress
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/')
    assert_equal('http://digigen.nl/feed/', feed_url)
    # test non conventional port
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com:8000/')
    assert_equal('http://blog.dominiek.com:8000/feed/atom.xml', feed_url)
    # test only_detect rss/atom on flickr
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :atom)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200', feed_url)
    # make sure that feeds return themselves
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/feed/')
    assert_equal('http://digigen.nl/feed/', feed_url)
  end

  def test_get_feed_path
    body = []
    body << ' <html>'
    body << ' <head>'
    body << ' <link href="/super.css" rel="alternate" type="text/css"/>'
    body << ' <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />'
    body << ' </head>'
    body << ' </html>'
    # Mephisto
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    body[3] = ' <link href=\'/feed/atom.xml\' rel="alternate" type="application/atom+xml" />'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    # FlickR
    body[3] = '<link rel="alternate" type="application/atom+xml" title="Flickr: Photos from dominiekth Atom feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_path)
    body[4] = '<link rel="alternate" type="application/rss+xml" title="Flickr: Photos from dominiekth RSS feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=atom', feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200', feed_path)
    # Wordpress
    body[3] = '<link rel="alternate" type="application/rss+xml" title="Digigen RSS Feed" href="http://digigen.nl/feed/" />'
    body[4] = ' </head>'
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :atom)
    assert_equal(nil, feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://digigen.nl/feed/', feed_path)
  end

end
I'm sure this might be useful to some people so Enjoy!
Second The Hague web 2.0 Borrel
on June 17, 2007
Yesterday I attended a borrel - a gathering for drinks (のみかい) - in the beautiful city of The Hague. About twenty Dutch web entrepreneurs met up in the appropriately named venue Ondernemerscafe (entrepreneur cafe). There were a lot of interesting people and the food was quite nice. I'm not quite sure who made the free beers possible, but thanks a lot!
The following projects are both domestically and internationally interesting:
- fleck.com adding collaborative notes to web pages
- wakoopa.com community software usage
- roomware project physical world meets virtual self
And I'm happy to announce that people were quite excited about:
Friday I released the first version of confabio.com and it's running quietly now. This version is just to offer a sneak preview like I did at the meeting yesterday. Plug in your cam and check it out: confabio.com!
How Last.fm saved my Love for Music
on May 31, 2007
Music - the symphonic phenomenon that binds all us Humans and maybe even separates us from the lower lifeforms - is something I've been forgetting over the past two years. The last two years I've been very focussed on my career and I haven't spent much time downloading or listening to music. Since my rock teenager days my passion for music has been decreasing. The only moments of music for me were around the time I purchased an iPod or went to [the Royal Dutch Concert Building].
But now, all has changed.
I found a way to be lazy, not having to purchase or download any songs. This new way, this revolution, is called Last.fm. Some of my friends already had an account for a long time. Typically the real audiophiles were the early users of this system. OK, so what is this last.fm and why is it so great?
Last.fm User Experience
Last.fm is a little web startup from London. A few months ago they probably had about two dozen employees (now 40). Last.fm is an online audio player that gives you these experiences:
- generate your own 'radio' by entering artists/categories
- unlike normal radio you can skip tracks, love tracks or ban tracks
- while playing, real-time user generated information is displayed in a wiki fashion
- the company's obsession with statistics allows you to see your musical compatibility with other users
- when listening to iTunes, last.fm will try to spy on what you listen to mostly
- The Long Tail, one of last.fm's goals is making all the music in the world available
Basically last.fm and the whole world knows what kind of music you like. For most people this is not a problem, since they really really like showing off what music they like (especially teenagers).
The effects of the Long Tail allow you to explore niches you didn't think existed:

Look I'm the top listener of that dude from Okinawa!
Last.fm Freemium
Last.fm gets the essence of music, they just 'get it'. To quote the first sentence from this research paper on 'Why do Humans Value Music?':
Whenever and wherever humans have existed music has existed also. Since music occurs only when people choose to create and share it, and since they always have done so and no doubt always will, music clearly must have important value for people.
You hear that? Sharing the music!
Like Google, last.fm connects supply with demand. Thanks to their thorough statistical systems, they really know what people like. Last.fm has a Freemium business model that allows you to upgrade your account for $30/year to get these benefits:
- listening to the tracks you specified with the 'love' button
- listening to the tracks of your musically compatible friends
- listen to your own composed playlists
- social status (like Flickr-pro)
For now, last.fm doesn't offer real 'select and play'. All tracks are played in random order as part of a radio stream of at least 10 tracks. They do have a little link that allows you to 'buy the cd' (linking you to amazon.com). I assume that in the future we can expect a button that says 'buy and put in my songs'. Of course they depend on the slowly evolving record industry for that.
Maybe these advertisements on their website will speed things up:

Announcing Confabio.com (beta), Web Video Conference
on May 17, 2007
The project is well on its way to becoming a nice product. Also, as you might see in the picture below, it's kind of fun to use!
Digigen.nl, joint web entrepreneurship blog
on May 02, 2007
During my scarce free time in the weekends and nights, I will work on some of my web 2.0 projects. A good friend of mine, Aram Versteegen, who’s doing the same thing will join me in blogging the hardships of starting cool web projects.
digigen.nl will be a full disclosure of our ideas, technologies and experiences.
Newest articles:
Digigen, Short History and Naked Future
Web Video Conference for the Masses
New Web Project, Confabio.com
on April 24, 2007
This article is better explained on my new joint entrepreneur blog: digigen.nl
Ok I think I will blog my ideas more openly. Being afraid that someone might steal them is just stupid, for several reasons. As Clive Thompson correctly points out in the latest Wired, businesses are starting to realize the importance of getting naked: sharing the secrets, mistakes and hardships. Also, in the great manifesto Getting Real there is this simple equation:
idea can be:
- -10 really dumb
- 1 stupid
- 5 ok
- 10 smart
- 20 excellent
execution can be:
- 1 almost nothing
- 10 bad
- 1000 ok
- 10000 good
- 10000000 excellent
If you multiply these two variables, you will get the amount of cash you earn (in euros, I recommend).
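As a quick worked example of that multiplication, here is a small Ruby sketch. The numbers are copied from the two lists above; the constant and method names (IDEA, EXECUTION, worth) are just names I made up for the illustration.

```ruby
# Multipliers copied from the two lists above.
IDEA = {
  really_dumb: -10, stupid: 1, ok: 5, smart: 10, excellent: 20
}
EXECUTION = {
  almost_nothing: 1, bad: 10, ok: 1000, good: 10_000, excellent: 10_000_000
}

# Worth of a project: idea multiplier times execution value.
def worth(idea, execution)
  IDEA.fetch(idea) * EXECUTION.fetch(execution)
end

puts worth(:excellent, :almost_nothing) # 20 * 1      => 20
puts worth(:ok, :good)                  # 5 * 10000   => 50000
```

So an excellent idea with almost no execution is worth far less than an ok idea that is executed well.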
Confabio.com
This idea is not a 20, but it could be at least a 5. It happened while I was doing three things:
- communicating a lot overseas, for work and private matters
- playing around with my hot sexy black macbook (named burakubuuku) and its cool built-in webcam
- playing around with Adobe flex
This led me to make a little website that does the following:
- Show your head on the screen
- Recording your nasty head and broadcasting it to a Red5 server
- Showing your head again by streaming that same broadcast
Confabio will be as simple as possible. First it was intended to be an arty farty project, but I think it can actually be useful. When a person goes to confabio.com he will see his own video. When another internet user opens confabio.com a second video-stream will be displayed, a third, fourth and so on. Of course, the more people join, the smaller the screens become.
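The shrinking screens are simple grid arithmetic. The real client would do this in Flex, so the following Ruby is only a sketch of the calculation; the tile_size name, the 800x600 stage and the near-square grid are all assumptions of mine.

```ruby
# Hypothetical sketch: fit stream_count video tiles on a stage by using a
# grid that is as close to square as possible. The 800x600 default stage
# size is an arbitrary assumption.
def tile_size(stream_count, stage_width = 800, stage_height = 600)
  raise ArgumentError, 'need at least one stream' if stream_count < 1
  columns = Math.sqrt(stream_count).ceil
  rows    = (stream_count.to_f / columns).ceil
  [stage_width / columns, stage_height / rows]
end

[1, 2, 4, 5].each do |n|
  width, height = tile_size(n)
  puts "#{n} streams: #{width}x#{height} per video"
end
```

With one visitor the video fills the stage, with four visitors each video gets a quarter of it, and so on.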
I haven’t really looked at so-called competitors yet and I don’t intend to. The only features I’m willing to add are:
- displaying your location discretely under your video-stream
- having tags/channels/rooms: confabio.com/dominiek_room, confabio.com/cooking
Techniques
Mochiron (of course) I will use RubyOnRails, but for most of the stuff I will need to use Flash.
- RubyOnRails will facilitate metadata to swf: geo-location information, IP
- RubyOnRails will use the weborb plugin to facilitate serverside data facilities for the swf app
- Adobe Flex will be used to compile .swf files (using .as, .mxml and .css files)
- Red5 is a Java server that facilitates video play/publish streams (Note: I really hate this piece of software because it solves a solution and not a problem, check out this 5-page hello world)
- Adobe Flash Player > 8 is needed in order to access microphone and webcam
Why will Confabio.com work?
I don’t know and I shouldn’t care. Primary focus should be: getting this working and actually start using it. The factors that could make the execution a success:
- Simplicity, this is relatively simple as long as we don’t add too many features
- the need to use it, I really need a tool like this so I will start using my own product
Why will Confabio.com not work?
There is no business-model yet, and I think a nice extra in this project will be a bandwidth monitor to project expenses. Also, we need a good design and perhaps a better name.
the Interdependency Stage of the Web
on March 09, 2007
note: I wrote this article in January and I just found it hidden deep down in my backpack
I think that one of the major successes of the new Web is thanks to open communication, open standards and open information. Opening up information has given rise to the Community websites. The meaning of Community is ‘People that are grouped together and share’. Opening up information and communication is the lubricant of this sharing process. But I think there is more to this than just social websites.
Apart from open communication, the current Web 2.0 hype offers a nice example of ‘openness’: often unique identifiers to information are human-readable. For example a URL like http://website/buy_product/frogpad is much more trustworthy than: http://website/rq_handler.asp?corporate_id=672&transaction_confirm=true&request_id=2. These little details of openness generate trust and I think they unconsciously attract users.
About two years ago I started reading a self-help book called ‘The Seven Habits of Highly Effective People’. This book contains some nice principles/habits for achieving success in life. Success, in this case, means material, mental and spiritual success. One of the core principles of the book is achieving ‘Interdependence’. In order to become interdependent, one must first become independent (rise from co-dependency). Only after becoming independent can we start reaping the benefits of synergy: the whole will become more than the sum of its parts.
Websites are starting to offer information loosely, for example through RSS, Webservices and other data offerings. They not only offer data, but they also create new data from multiple offerings. I think this loosely coupled way of exchanging information can be seen as the first steps of the internet becoming interdependent. The early web was static and the only form of interdependency was exchanging links (which nonetheless made it a tremendous success!).
I think it’s time we start examining the impacts of interdependency on humans and their systems. We can then use this to anticipate the Web’s needs and innovate accordingly.
Highly Effective People 2.0
on January 28, 2007
Since my last school days are numbered – last week I had a 9 out of 10 for my final thesis – I’m preparing to get more organized. Even though this week is about celebration and slacking off, I will have to prepare for a busy and volatile life.
That’s where Backpackit comes in! This online tool lets you organize loosely and integrates with email/sms/calendar. So far it works pretty neatly for my situation. It is loose and principle-based, which fits the philosophies of The Seven Habits of Highly Effective People.
This Friday I will move to Tokyo for two months where I will do: projects, job hunting and learn Japanese.
Code update: www.darkwired.org IRC 2.0
on January 19, 2007
Since my domain darkwired.org is only used for IRC (chatting) I decided to write a web interface. The Web interface uses the EyeRC API and is built on RubyOnRails.
Currently, you can join our test channel on http://www.darkwired.org/ :

I have no intention to develop it any further, however, I might take some time to finish it off (sourcecode):
- refactoring
- unit test case
I hope this was the last IRC client I ever wrote :)
(related code: BASH IRC Client )
the Webemoth
on January 15, 2007
Before O’Reilly dropped the term ‘Web 2.0’ there was no Web 1.0. At this point, people are discussing the definition of Web 3.0 – while we don’t even have a definition of Web 2.0.
In his article on KurzweilAI, Nova Spivack outlines the characteristics of all three versions. All of them are valid observations I think. However, Web3.0 observations like mobile internet access, broadband adoption, SaaS, open communications etc. can also be done in the Web2.0 and Web1.0 eras.
I don’t think we’re entering the ‘third generation’ because there can be no clear definition of ‘the new version’. I think the term Web2.0 was just a joke (and intended so by O’Reilly), funny, ha ha, over. Let’s just drop the dot and the whole versioning system altogether. The evolution of the Web is real however, and it’s dead serious.
The post 2006 Web – I like to call it ‘the Webemoth’ – is growing and shaping itself in numerous trends. In the Herald Tribune, John Markoff observes the following ‘next steps’:
- natural language understanding
- machine learning
- the semantic web
- data mining
I think these trends – when added to our current trends – will indeed move us towards a better web.
Natural Language Parsing is something which is popular among some of the latest ‘top secret’ start-ups like powerset.com. I also think it’s something very interesting and I bet Google has been working on it for quite some time. NLP is already being done in high-end systems like NSA’s echelon to keep track of our emails here in Europe. Just like powerful encryption, it’s the next thing to come into the hands of ordinary cattle like us.
Machine Learning is something being done all over the world – like for example at your credit card company – and will advance the web enormously. Some of the top ranking websites are already harvesting the power of pattern recognition for music and book recommendations (pandora.com, amazon.com). In my country there are a lot of artificial intelligence studies available and I know a lot of people attending them.
The Semantic Web is something we are missing now. Many websites are already making advances to this semantic web below the radar. Every day there are numerous new so called mash-ups that stitch together data from various sources. In order to do something useful with this data they need to know the semantics. Sites providing API interfaces like Wikipedia and Amazon.com offer only limited semantics. Sometimes they are missing and then developers switch to data mining techniques like scraping (parsing the user interfaces to ‘pick out’ the data).
I must say I’m a bit skeptical about the move towards the Semantic Web in the short run. Semantic – semantikos, giving signs – refers to the meaning of information. Current .com behemoths like youtube.com profit from the user contributed information. More specifically, they profit from displaying this information but not from the information itself (at least, not yet). Sometimes I watch some TV at youtube-like sites and I guess I’m a very bad customer: I block the advertisements (so I guess they should move the ads to the stream itself). This is my skepticism:
- Are there business models that can profit from just providing semantics?
- If there are, will they affect the semantics/content itself?
The Web in 2007, Hype Down, Potential Up.
on December 29, 2006
ReadWriteWeb has just posted their Web predictions for 2007 [1]. I think it’s nice that they omitted ‘2.0’ in that title. Just like in the first bubble, the Web has two types of trends, the potential and the hype. This has several consequences for 2007.
Andreas Kluth writes that the Web 2.0 hype will die down in 2007 [2]. Thanks to private equity, this bubble burst will be fairly discreet compared to the 2000 burst [3]. I think this will happen too, especially since the US economy is becoming less optimistic. The first ones to go are probably some of the 400 social-networking alternatives or rounded-cornered BullshitR websites. However, this is just the ‘hype’ part of Web 2.0, not the potential/abstract part.
‘Don’t bet against the Internet’ is what Eric Schmidt advises. In his article in ‘the Economist’ he states: “The past few years have taught us that business models based on controlling consumers or content don’t work. Betting against the net is foolish because you’re betting against human ingenuity and creativity“ [4]. Indeed we have seen profound results of social-content websites like Wikipedia [5]; another example can be found in the defeat of portals by search [6]. I think this is the ‘potential’ part of the new web, which can be summarized by a set of ground principles:
- clouds rather than trees
- simple rather than complex (KISS, Keep It Simple Stupid [7])
- loose rather than tight
- open rather than closed
references:
- [1] http://www.readwriteweb.com/archives/biggest_web_trend_2007.php
- [2] “When the hype dies down”, The Economist, 21st edition, page 26
- [3] http://en.wikipedia.org/wiki/Dot-com_bubble
- [4] “Don’t bet against the internet”, The Economist, 21st edition, page 124
- [5] http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
- [6] “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture”, John Battelle
- [7] http://en.wikipedia.org/wiki/KISS_principle
Making Content Smarter, using the Web's Collective Knowledge.
on December 12, 2006
My English vocabulary is relatively small since it’s not my native language. Often I read materials on the Internet and sometimes I have to look up some terms. If I didn’t look up these new words, I would never learn them.
Secondly, it often happens that I know the meaning of some concepts, but I would like to know more about them. Looking up everything takes a while, which disturbs your mental model while reading.
This is why I started working on a little project to embed the knowledge of the Web in the reading process. I’m developing this thing in a pragmatic way using RubyOnRails. The main objectives are:
- Providing the content with a simple piece of code that can make any piece of HTML smarter.
- Using Natural Language Parsing (NLP) to pick out the important words.
- Using Princeton’s WordNet to explain basic concepts.
I’ve just finished these objectives, which required writing a little HTML parser and an interface to use the Python NLP toolkit (which is far superior to Ruby’s).
| NLP screenshot | Wordnet screenshot |
At the moment I’m refactoring the code so it can be distributed as a RubyOnRails plugin. For the near future the following features are on my TODO list:
- Interfacing with the Wikipedia encyclopaedia.
- Detecting concepts like ‘Software Engineer’ rather than detecting ‘Software’ and ‘Engineer’.
- Looking for alternative forms of user/reading interaction.
Current code: laboratoire_nuage-101206.tar.gz (or browse)
Requires: Python-NLTK, Ruby-Linguistics, Ruby-WordNet and RubyOnRails

