Gmail, another Scalable Service for Startups
on December 31, 2007
I think it's getting easier and easier to start your own knowledge-driven venture. For a couple of years now, I have been managing my own server (together with my friend). Managing your own server is a pain, but a must if you are a tech entrepreneur.
Luckily, services have been popping up lately that alleviate a lot of IT pains. This week, I have been looking for services that let me outsource the hosting of my email. A few services like SproutIt's Mailroom came close, but didn't quite cut it.
Then I found out that I can actually attach my domain names to Gmail! I can also use IMAP, POP, etc. I now have most of my domain names attached to it, and I pay JACK SHIT!

All thanks to another great article from ReadWriteWeb.

Wigitize.com, widgitize any web feed
on December 31, 2007
Enjoy the Holidays, Drink 2.0
on December 28, 2007
For me, an important part of enjoying the holidays is having a nice drink. And what better way than drinking 2.0!
I'm entering the wines I drink into snooth.com, a wine recommendation website. Based on how I rate certain wines, it recommends others to me.
Also, if you're new to wine (like me), check out this guy:
http://tv.winelibrary.com/new_to_wine
360+ web episodes about drinking wine. Their slogan: 'changing the wine world'. Very interesting indeed!
What's wrong with this Shampoo design?
on December 24, 2007
The hotel I’m staying at has the usual little bottles of toiletries. They claim to be Italian, so their labels are in both Italian and English:

Apart from the bad coloring, there is an obvious flaw in the rightmost one (the Shampoo): it says shampoo twice. Now, I know I’m really nitpicking and I might be wrong, but I really think the philosophy behind this is wrong. The current shift towards attention as the scarce resource requires us to limit the amount of information we display. That means radically stripping out all unnecessary information. Also, the age of autistically organizing things into hierarchies is over. We contextualize now, so shampoo is just shampoo, not shampoo:shampoo because ‘we HAVE to show both languages’.
Perhaps I’m totally wrong and my mind is just corrupted with the programming heuristic of Don’t Repeat Yourself.
Another reason to love my Macbook
on December 17, 2007
After upgrading to the new Mac OS X Leopard, I can browse the network in Cover Flow on acid.
For PCs they chose an excellent icon:

You can read your iKnow journal on Mixi!
on December 17, 2007
Did you know you can set your iKnow journal as your journal on Mixi?
Click on settings when logged into your Mixi account.
On the settings page, fill in “blog selection”, “blog URL” and “RSS URL”.
“dominiekth” is my iKnow username; replace it with your own username.
That’s it!

AJAX snippet: Blank out a Div with a Spinner
on December 16, 2007
While doing some quick prototyping for one of my projects, I was building a week/month/year period selector. When loading stuff with AJAX, it is important to show your users that you are actually refreshing a piece of data on the page. There are many ways of indicating this, but for now I just wanted something quick and useful (without any strings that might have to be translated in the future):
[Screenshot: example data]
[Screenshot: the same data while loading]
I mashed together a quick piece of JavaScript and CSS since I couldn’t find anything out there. If you know something that does this, please let me know! (My code is obviously quick and dirty and only tested on Firefox.)
Put this in one of your JavaScript files (in the case of Rails: application.js):
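function spin_div(div_id) {
  // Look up the container element (Prototype's $() helper)
  var container = $(div_id);
  // Give the overlay the container's top offset, width and height
  var positioning = 'top: ' + container.offsetTop + 'px; ' +
    'width: ' + container.offsetWidth + 'px; ' +
    'height: ' + container.offsetHeight + 'px; ';
  // Append the overlay; it disappears once the container's content is replaced
  container.innerHTML += '<div class="spin_div" style="position: absolute; ' + positioning + '"></div>';
}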
And here is some CSS which you can customize (spinner.gif is a generated spinner from ajaxload.info):
.spin_div {
background: #fff url('/images/spinner.gif') no-repeat center center;
opacity: 0.75;              /* standard */
filter: alpha(opacity=75);  /* Internet Explorer */
-moz-opacity: 0.75;         /* older Firefox */
-khtml-opacity: 0.75;       /* older Safari/Konqueror */
}
Now, when using Rails’ or Prototype’s AJAX routines, just pass the spin_div call as the :loading callback:
link_to_remote('label', :url => takes_awhile_url, :update => 'container_id', :loading => "spin_div('container_id');")
The great thing about this method is that the spinner gets destroyed automatically when the content of the container is refreshed.
Radical Transparency and Web Integration
on December 15, 2007
Twitter is great, and this small $5 million company is growing like crazy. The core of their service is inherently simple: blogging/chatting in messages no longer than 140 characters. They opened up all of their APIs and Twitter is now flourishing with activity. Hell, we even integrated it into our language learning platform: http://iknow.co.jp/ (more to come). It’s only the first step, however; these services could open up even more!
iKnow is a service that specializes in online learning, so the SNS (social networking) part of the site, while important, is secondary. Right now, iKnow lets you upload a picture together with your journal entry. It would be nice if we could provide more picture uploading and managing functionality, but we don’t want to build that. Flickr specializes in these things and would make a nice addition. Unfortunately, a service like Flickr doesn’t provide a truly transparent API yet: people still need to register for a Flickr account.
It would also be nice if we could provide status updates for everyone on the website using Twitter, but people still need to go through the registration procedure at Twitter.com to be able to use it. I think the next success in web integration lies in opening up your APIs to the extent that using them is completely transparent: users don’t need to worry about registering at the third-party service.
But what about the revenue? If people don’t come to your website anymore, you cannot get any advertising money! That may be so, since most sites still rely on people being exposed to banners/AdWords on their website. These things will change, however:
- there will be in-content advertising, such as the in-video ads that are already emerging
- the freemium model will also be applied to APIs. A free API is for personal use; a premium API is for integration into other web services. At the very least, the premium API will not require your users to pre-register.
The big picture really makes sense: online services will become more specialized. Right now you could imagine photo management being done by something like Flickr/PicasaWeb, status updates by Twitter, music integration by last.fm, etc. But things could fragment even more once there is an open integration market: a Facebook-style wall service, an embeddable message/mail service, a tagging service, a rating service or an image cropping service. An example of the latter is PicNik, a service recently integrated into Flickr.
Side thought: will this kill branding?
Quick and Dirty Rails Optimization Guide
on December 11, 2007
These are quick notes I spontaneously ranted down about my experience with Rails and making it perform.
One of the reasons I am working on my current project here in Tokyo is that I get to experience the hardships that come with user growth. Apart from learning how to actually get a project to take off, it is also interesting to learn what to do when it actually does!
When we had our first growth spikes, a lot of people were using the system at the same time. As a learning system, we have the disadvantage of a lot of data-intensive processing. This article is about the code optimization part of a Rails project rather than the systems part (which is another chapter, properly described in articles such as these).
Finding the slowest requests
There are several tools that sift through the production logs to find out which pages are the slowest. Tackle those pages first.
Depending on your system load (high memory, CPU or DB), you might want to prioritize render-heavy over DB-heavy pages, but from what I’ve seen, most of the first optimization steps are in DB-heavy pages.
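To give an idea of what those log tools do, here is a small Ruby script that ranks requests by total time. It assumes the Rails 1.2/2.0 log format, where each request ends with a line like `Completed in 0.12345 (8 reqs/sec) | Rendering: ... | DB: ... | 200 OK [http://host/path]`. The method name `slowest_requests` is made up for this sketch, and a real tool would also group by controller/action.

```ruby
# Matches a Rails 1.2/2.0-style "Completed in" log line (format assumed).
COMPLETED_LINE = /Completed in ([\d.]+).*?\[(\S+)\]/

# Hypothetical helper: returns up to `limit` [seconds, url] pairs, slowest first.
def slowest_requests(lines, limit = 10)
  timings = lines.map do |line|
    match = COMPLETED_LINE.match(line)
    match && [match[1].to_f, match[2]]
  end
  timings.compact.sort_by { |seconds, _url| -seconds }.first(limit)
end

# Usage: ruby slowest.rb log/production.log
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  slowest_requests(File.readlines(ARGV[0])).each do |seconds, url|
    puts format('%8.3fs  %s', seconds, url)
  end
end
```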
Optimizing a request / page
Render heavy, database heavy? What are you talking about? Basically, when you look at the Mongrel development log, load time is split into two categories: render time and DB time. Render time is simply the time it takes excluding calls to the database.

When a page has a high render time but a low percentage of database time, a lot of time is being spent on calculations or on moving data around. For these pages, make sure that:
- there is no code that blocks the request (HTTP/network calls, external commands, disk IO). Move such code to its own background worker.
- there isn’t too much ActiveRecord code that loads big chunks of data. These calls appear to cause a low DB load, but in fact use up a lot of CPU and memory. Only load the data you display.
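The post doesn't commit to a particular worker library at this point (backgroundrb comes up further down). As a library-agnostic sketch of the idea, the class below uses Ruby's standard `Thread` and `Queue`: the request only enqueues the slow job and returns immediately. `BackgroundWorker` is a hypothetical name, and in production you would rather run the worker in a separate process so a busy job can't slow down Mongrel.

```ruby
require 'thread'

# Hypothetical in-process worker: jobs run one by one on a separate thread,
# so the code that enqueues them (the request) never waits for them.
class BackgroundWorker
  def initialize
    @jobs = Queue.new
    @thread = Thread.new do
      # Queue#pop blocks until a job arrives; :stop ends the loop.
      while (job = @jobs.pop) != :stop
        job.call
      end
    end
  end

  # Called from the request: returns immediately.
  def enqueue(&job)
    @jobs << job
  end

  # Lets already queued jobs finish, then stops the worker thread.
  def shutdown
    @jobs << :stop
    @thread.join
  end
end
```

In a controller, a blocking call such as sending a mail would become something like `worker.enqueue { deliver_mail(user) }`, where `deliver_mail` stands for whatever slow call you had.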
When optimizing individual pages, this is my way to go:
- run mongrel_rails on your powerful development machine
- this might be controversial, but… LOAD THE ENTIRE PRODUCTION DATABASE. I’m not kidding. This gives you benefits in terms of optimization, but also for the usability side of developing (perhaps an extremely literal take on a Getting Real chapter). However, I do recommend that you make a ‘rake db:make_developer_friendly’ task that obfuscates private user data.
- open up a terminal and run ‘tail -f log/development.log’. That way you can take a good real-time look at everything that happens when a request is made. Hit enter a couple of times to create a visual separation between requests :)
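The post doesn't show the `db:make_developer_friendly` task itself. Here is a minimal sketch of what its core could look like, assuming users have `email` and `real_name` fields that count as private (both field names are assumptions). A user is a plain Hash so the sketch runs without Rails; in the real rake task you would loop over your user records and save each one.

```ruby
# Hypothetical core of a 'rake db:make_developer_friendly' task.
# Replaces private fields with fake but unique values derived from the id,
# so the data keeps its size and shape but identifies nobody.
def make_developer_friendly(user)
  id = user.fetch(:id)
  user.merge(
    email:     "user#{id}@example.com", # unique, but reaches nobody
    real_name: "User #{id}"
  )
end
```

Deriving the fake values from the id keeps them unique, which matters if your schema has unique indexes on those columns.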
ActiveRecord is killing you with a thousand cuts
ActiveRecord is a great thing, but when it comes to performance you have to keep it in check. (Even in production, I think it still adds great value!) Stuff like blog_entry.user.username looks quite innocent, but when you have a listing of 100 blog_entries, you’re screwed: it does a query to load the belongs_to :user relationship every time this is called in the listing, so you end up with 100 extra queries for a single HTTP request.
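To make the "100 extra queries" concrete, here is a plain-Ruby simulation of that lazy belongs_to loading. `FakeDB` is a hypothetical stand-in for the database that only counts queries; it is not how ActiveRecord is implemented, it just reproduces the access pattern of `blog_entry.user.username` inside a loop.

```ruby
# Hypothetical stand-in for the database that counts every query.
class FakeDB
  attr_reader :queries

  def initialize(users)
    @users = users # id => username
    @queries = 0
  end

  # Plays the role of the SELECT behind blog_entry.user
  def find_user(id)
    @queries += 1
    { id: id, username: @users.fetch(id) }
  end
end

BlogEntry = Struct.new(:title, :user_id)

# Mimics a listing that lazily loads the user of every entry (N+1 queries).
def render_listing(db, entries)
  entries.map { |entry| "#{entry.title} by #{db.find_user(entry.user_id)[:username]}" }
end
```

With 100 entries written by only two users, this still does 100 user lookups.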

You can combat this by preloading and customizing ActiveRecord loads. In the case of blog_entry.user.username, you could do a BlogEntry.find(:all, :include => :user), which preloads the belongs_to user. However, this is not always appropriate:
- :include doesn’t preload polymorphic associations
- :include doesn’t play well with :joins/:select yet
- if you only need the username, don’t load the whole user
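The last point (only load what you need) can be combined with preloading by hand. The sketch below is not ActiveRecord's internal preloading code. It shows the idea: collect the distinct user ids, fetch only `id` and `username` in ONE query, and look the names up in a Hash while rendering. `preload_usernames` is a hypothetical helper, and the block stands in for whatever executes the SQL (for example your connection's select method). It must return `[id, username]` rows.

```ruby
BlogEntry = Struct.new(:title, :user_id)

# Hypothetical helper: one query for all distinct user ids, selecting only
# the columns the listing needs. Returns a Hash of id => username.
def preload_usernames(entries)
  ids = entries.map(&:user_id).uniq
  return {} if ids.empty?
  # to_i keeps the interpolated IN list safe from SQL injection
  sql = "SELECT id, username FROM users WHERE id IN (#{ids.map(&:to_i).join(', ')})"
  yield(sql).to_h
end
```

In the view you would then write `usernames[entry.user_id]` instead of `entry.user.username`.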
I’m not sure if I remember correctly, but sometimes it is actually faster not to preload at all and just let ActiveRecord do its thing.
Explain these slooow queries
When you have one of those queries that take more than 0.03 secs, you might want to analyze it a bit.
In my current project there are MySQL pro hired-gun consultants that go very far with this stuff, but it’s always good to know a few of their tricks yourself:
Open up your MySQL client and run the query against your copy of the production data:
- Always put SQL_NO_CACHE right after the SELECT keyword. This makes sure you aren’t looking at MySQL cache load times.
- Put EXPLAIN in front of your query and look for big integers (especially in the rows column), which might mean that you’re missing an index.
- Geez, this output looks fucked! True. End the query with \G instead of a semicolon and the pipe characters are gone (you get one column per line).
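If you do this often, a tiny helper can turn a query copied from the log into both statements at once. This is a hypothetical convenience, not part of any tool mentioned here. It only rewrites strings. It assumes the query starts with SELECT, and EXPLAIN gets the plain query because EXPLAIN doesn't execute it, so the query cache is irrelevant there.

```ruby
# Hypothetical helper: prepares a slow SELECT for the mysql client.
#   :timing  -> SQL_NO_CACHE right after SELECT, to measure the real time
#   :explain -> EXPLAIN in front and \G at the end, for vertical output
def profiling_statements(query)
  sql = query.strip.chomp(';')
  no_cache = sql.sub(/\ASELECT\s+/i, 'SELECT SQL_NO_CACHE ')
  {
    timing:  "#{no_cache};",
    explain: "EXPLAIN #{sql}\\G" # "\\G" is a literal backslash-G
  }
end
```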
Appropriate indexes should be set up at an early stage. Adding or removing an index can take hours once you’ve accumulated a lot of data!
SQL caching is great, but useless for data sources that change by the second; SQL_NO_CACHE can even be faster in those cases. For the places that ARE suitable for SQL caching, make sure you don’t work against it: the MySQL query cache only reuses a result when the query text is exactly the same.
('created_at > ?', 5.days.ago.utc)                    # Not SQL cached
('created_at > ?', 5.days.ago.beginning_of_day.utc)   # SQL cached!
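Why the second one caches: the time ends up interpolated into the SQL text, and `5.days.ago` yields a different timestamp every second, so the text never repeats. The plain-Ruby sketch below reproduces that without ActiveSupport. It rounds down to midnight UTC, which approximates `beginning_of_day.utc` (Rails rounds in the local time zone first). `recent_entries_sql` is a hypothetical name.

```ruby
FIVE_DAYS = 5 * 24 * 60 * 60 # seconds

# Hypothetical helper: builds the SQL text for "entries of the last 5 days".
# With round_to_day, the boundary is midnight UTC, so the text (and thus the
# MySQL query cache key) stays the same for a whole day.
def recent_entries_sql(now, round_to_day)
  since = now.getutc - FIVE_DAYS # getutc returns a copy; Time#utc would mutate `now`
  since = Time.utc(since.year, since.month, since.day) if round_to_day
  "SELECT * FROM blog_entries WHERE created_at > '#{since.strftime('%Y-%m-%d %H:%M:%S')}'"
end
```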
In some cases you might be pulling in data for a restricted subset of parents. For example, you want to get all the messages posted by users who belong to a certain group with x conditions. In those cases, it might be faster to first retrieve the ids of all those users in one query, and then do a second SELECT with a giant “user_id IN (?)” condition.
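A sketch of that two-step approach, with hypothetical table and column names (`users.group_id`, `messages.user_id`). The block plays the role of the database connection: it receives SQL and returns rows as arrays. Step one fetches only user ids, step two uses them in a plain IN list instead of joining.

```ruby
# Hypothetical two-step replacement for a big JOIN.
def messages_for_group(group_id)
  # Step 1: only the ids of the matching users
  user_rows = yield("SELECT id FROM users WHERE group_id = #{group_id.to_i}")
  user_ids = user_rows.map { |row| row.first.to_i }
  return [] if user_ids.empty? # "IN ()" would be invalid SQL
  # Step 2: the messages of those users, via a (possibly giant) IN list
  yield("SELECT * FROM messages WHERE user_id IN (#{user_ids.join(', ')})")
end
```

Whether this beats the JOIN depends on your data and indexes, so measure both with SQL_NO_CACHE.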
Fragment cache the hell out of it!
You can lower the load of your pages by fragment caching certain areas in your views. A fragment cache works like this:
<% cache_method(identifier) do %>
  your code here
<% end %>
Code in that block will only run once until clear_cache_method(identifier) is called.
There are several ways of clearing these fragments:
- Clearing them at specific places while executing alterations. This requires specific knowledge of the behavior.
- Clearing them whenever a change is made to an entity/model. You can use cache_sweepers (observers) for that.
- Clearing them periodically with a cronjob. This is useful when the behavior is very complicated.
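Here is a plain-Ruby model of those three strategies. `FragmentCache` and `UserSweeper` are hypothetical stand-ins (Rails ships its own fragment cache and sweeper classes). The point is the mechanics: a cached block runs only on a miss, and each strategy is just a different moment at which a key gets expired.

```ruby
# Hypothetical in-memory fragment cache.
class FragmentCache
  def initialize
    @store = {}
    @written_at = {}
  end

  # Runs the block only when the key is missing (a cache miss).
  def cache(key)
    return @store[key] if @store.key?(key)
    @written_at[key] = Time.now
    @store[key] = yield
  end

  # Strategies 1 and 2: expire explicitly where data changes, or from a sweeper.
  def expire(key)
    @store.delete(key)
    @written_at.delete(key)
  end

  # Strategy 3: what a periodic cron job would call.
  def expire_older_than(seconds, now = Time.now)
    @written_at.select { |_key, at| now - at > seconds }.keys.each { |key| expire(key) }
  end
end

# Observer in the spirit of a cache sweeper: after a user is saved,
# expire the fragments that display users.
class UserSweeper
  def initialize(cache)
    @cache = cache
  end

  def after_save(_user)
    @cache.expire('newest_users')
  end
end
```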
There is one Rails good-practice guideline that plays very well with fragment caching: fat models, skinny controllers.
<% cache_method(identifier) do %>
  <% @newest_users.each do %>
    ... # no DB calls cached here!!
The instance variable @newest_users is populated in the controller, so the query runs on every request and the cache is nothing more than an HTML cache. What you should do is MAKE SURE THAT THE DATABASE CALL IS DONE IN THE VIEW.
PHP users will go insane now. What? Database queries in the view? Are you an amateur? Well, it’s actually quite elegantly tucked away in the model:
<% cache_method(identifier) do %>
  <% User.newest.each do %>
    ...
No code in the controller, all execution in the fragment. Yeah!
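Here is a plain-Ruby way to see why this matters. `User.newest` below is a hypothetical model method that counts its "queries" instead of hitting MySQL, and `cache_fragment` is a minimal stand-in for the view's cache helper. Because the query happens inside the cached block, a cache hit costs no database call at all.

```ruby
# Hypothetical fat-model method: the query lives in the model, and the
# caller (the view) decides when it runs.
class User
  @queries = 0

  class << self
    attr_reader :queries

    def newest(limit = 5)
      @queries += 1 # stands in for the real SELECT
      (1..limit).map { |i| "user#{i}" }
    end
  end
end

# Minimal fragment cache: the block (and thus the query) runs only on a miss.
FRAGMENTS = {}
def cache_fragment(key)
  FRAGMENTS.fetch(key) { FRAGMENTS[key] = yield }
end

# What the view does: the DB call sits inside the cached block.
def render_newest_users
  cache_fragment('newest_users') { User.newest.map { |name| "<li>#{name}</li>" }.join }
end
```

If the controller had called `User.newest` and stored the result in `@newest_users`, the count would go up on every request, cached or not.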
Join the summaries!
When you have a system with very complicated datasets, you will need big queries with a lot of joins. To improve performance, you can ‘denormalize’ the database, making the structure simpler. But sometimes you can’t. What you CAN do, and probably have to do, is summarize that data so it can be accessed quickly.
Finding the right architecture for a summary table took some trial and error. At the moment, this is how I roll:
- add an AR model MyEntitySummary
- add a class method MyEntitySummary.full_regenerate (truncate table and insert all my_entity_summaries)
- add a class method MyEntitySummary.update_for(my_entity) (update one row of my_entity_summaries)
- make sure that both are using the same pieces of SQL (DRY)
- the first time you migrate, call MyEntitySummary.full_regenerate
And now comes the tricky part. Preferably, you only call update_for from now on; full_regenerate is only for the first time or for emergencies. You can call update_for from an after_save callback or an observer (preferably through backgroundrb).
You might be tempted to put full_regenerate in a cronjob and run it every hour. Only do that when it’s really necessary, since it will cause big load spikes on your servers. Also, we have had some trouble with table locks etc.
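Here is an in-memory sketch of the summary pattern above, assuming a hypothetical entity with an `id` and a list of `scores`, summarized as a count and an average. Unlike the post's class methods, it uses instance methods so the sketch carries its own storage (a Hash instead of the summary table). The DRY rule shows up as a single `summarize` method that both entry points share. In the real model, that shared piece would be the SQL.

```ruby
# Hypothetical in-memory version of MyEntitySummary.
class MyEntitySummary
  attr_reader :rows

  def initialize
    @rows = {} # plays the role of the my_entity_summaries table
  end

  # First migration / emergencies: truncate and rebuild every row.
  def full_regenerate(entities)
    @rows.clear
    entities.each { |entity| update_for(entity) }
  end

  # Normal path (e.g. after_save): refresh only the changed entity's row.
  def update_for(entity)
    @rows[entity[:id]] = summarize(entity)
  end

  private

  # The single place where a summary row is computed.
  def summarize(entity)
    scores = entity[:scores]
    average = scores.empty? ? 0.0 : scores.sum.to_f / scores.size
    { count: scores.size, average: average }
  end
end
```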
Size, count, length?? Cache that count
As you might know, these methods behave differently when you call them on an association. For example, user.blog_entries.length pulls in the full data set and returns its size. user.blog_entries.count, on the other hand, just does a count query without pulling in any data.
I could show you a nice table of when to use what, but I’m not going to. Actually, I’m not so sure anymore, since I’ve seen some weird stuff lately. Basically, don’t use length unless you know what you’re doing. I like size, but to be safe I just use count.
If you have a lot of count queries, or you want to join in a count for a big query, you might want to take a look at counter_cache. Documentation for counter_cache can’t be found anywhere, so I will tell you briefly:
- the counter cache stores the count in the parent (the side that has_many)
- this count is stored in an SQL field and should ALWAYS BE AN INTEGER (I wasted some time on that, rails will not say anything)
- In the example of user.blog_entries, you have to open the BlogEntry model and add: belongs_to :user, :counter_cache => true
- All you need to do now is add a migration saying: add_column :users, :blog_entries_count, :integer, :default => 0
- If you are testing properly, you WILL get failing tests now :-)
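Here is a plain-Ruby picture of what `:counter_cache => true` keeps up to date for you. The method names are hypothetical, since Rails does this bookkeeping through its own callbacks. The idea: an integer on the parent that goes up when a child is created and down when it is destroyed, so the count can be read without a COUNT query. It also shows why the column must be an integer with a default of 0, because `nil + 1` raises an error.

```ruby
# Hypothetical parent holding the cached count (users.blog_entries_count).
class User
  attr_reader :blog_entries_count

  def initialize
    @blog_entries_count = 0 # mirrors :default => 0 in the migration
  end

  def increment_blog_entries_count!
    @blog_entries_count += 1
  end

  def decrement_blog_entries_count!
    @blog_entries_count -= 1
  end
end

# Hypothetical child: creating/destroying it keeps the parent's count in sync.
class BlogEntry
  def self.create(user)
    user.increment_blog_entries_count!
    new(user)
  end

  def initialize(user)
    @user = user
  end

  def destroy
    @user.decrement_blog_entries_count!
  end
end
```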
C’est tout
This is just a small set of things you can do to get better performance. Some of it might be wrong or idiotic since it is based on my own trial and error experience (only way I learn). I hope you can use this to solve your performance luxury problem soon ;-)
Dominiek.com, Now with Extra Data
on December 02, 2007
This week, a cool Japanese coder reminded me of the importance of blogging: it is the best resume out there. It is also your permanent footprint in the eternal virtual world.
This week I have been playing around with the JSON feeds that Twitter offers. This is how I got the 'What am I doing now?' at the top here. I also did it for delicious. I am very excited about these JSON widgets and will soon start coding and blogging more about them. Too bad last.fm doesn't offer them yet.
Soon, I will also start blogging about the project I'm consulting for:
