The World’s Leading Microsoft .NET Magazine
   
 
The .NET Addict's Blog

Whitepaper : In Search of the Perfect Tag Cloud

posted Fri 11 Aug 06

Lately I have been working on several social software projects, one of which required me to display a tag cloud. For the Web 2.0-challenged, a tag cloud is essentially just a weighted list. In most social bookmarking applications, the weight is typically the number of bookmarks associated with a given tag.
 
There are two main problems that I encountered when trying to build the perfect tag cloud. The first was data volume: the more tags you have, the more you need to accommodate within the summary clouds. If readers have to scroll down to see the rest of the cloud, they can no longer grasp the relative weight of the tags at a glance. The other problem is font size distribution: how do you create the perfect distribution of tags across a fixed number of rendered font sizes? To deal with these questions I considered several approaches for limiting and filtering the data, as well as several algorithms for font size distribution, including linear algorithms, logarithmic algorithms, and even k-Means clustering.
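To make the difference between linear and logarithmic font size distribution concrete, here is a minimal Python sketch. It is not the code from the whitepaper: the function names (linear_bucket, log_bucket, build_cloud), the limit parameter, and the sample tag counts are all made up for illustration, and it assumes every tag weight is a count of at least 1 so the logarithm is defined.

```python
import math

def linear_bucket(weight, min_w, max_w, buckets):
    """Map a weight to a font-size bucket (0..buckets-1) on a linear scale."""
    if max_w == min_w:
        return 0  # all tags weigh the same, so use the smallest size
    ratio = (weight - min_w) / (max_w - min_w)
    return min(buckets - 1, int(ratio * buckets))

def log_bucket(weight, min_w, max_w, buckets):
    """Same mapping, but on a logarithmic scale (assumes weights >= 1)."""
    if max_w == min_w:
        return 0
    ratio = (math.log(weight) - math.log(min_w)) / (math.log(max_w) - math.log(min_w))
    return min(buckets - 1, int(ratio * buckets))

def build_cloud(tag_counts, buckets=5, limit=None, scale=log_bucket):
    """Return {tag: bucket} in alphabetical order.

    limit (hypothetical parameter) keeps only the heaviest tags, which is
    one simple way to stop the cloud from growing past a single screen.
    """
    items = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
    if limit is not None:
        items = items[:limit]
    if not items:
        return {}
    weights = [w for _, w in items]
    lo, hi = min(weights), max(weights)
    return {tag: scale(w, lo, hi, buckets) for tag, w in sorted(items)}

# Made-up sample data: one very popular tag and a long tail.
counts = {"web2.0": 1000, "dotnet": 100, "ajax": 10, "wpf": 1}
linear_cloud = build_cloud(counts, buckets=4, scale=linear_bucket)
log_cloud = build_cloud(counts, buckets=4, scale=log_bucket)
```

With this sample data the linear scale puts every tag except the most popular one in the smallest bucket, while the logarithmic scale spreads the four tags across all four buckets. Each bucket index would then be mapped to a CSS class or font size when the cloud is rendered.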
 
As a result of my research into building tag clouds, I've created a Whitepaper : In Search of the Perfect Tag Cloud. It's a pretty hefty paper that comes in at around 13 pages in PDF format, so I can't blog the whole thing. However, I have uploaded the Whitepaper to my blog host and you can download that PDF here.
 
I'd love to know what people think of the analysis I've done so far. Hopefully some of the information contained in the Whitepaper will be useful to some people and will help them get further along in their Web 2.0 / social software projects.

Again, click here to download the Whitepaper PDF.





1. Don Libes left...
Tue 15 Aug 06 5:39 pm :: http://www.libes.com/don/blog

Nice paper but it kind of fell down at the end where you covered clustering algorithms. Surely, there's a fast heuristic clustering algorithm that while not perfect is good enough. But your conclusion is probably fine too - logarithmic is 'good enough.'

Two other points: (1) I think you should talk about clouds in a larger sense though. They're not just for social bookmarking. (2) Sometimes we care less about the common tags and more about the not-so-common tags. This is particularly important when using systems that intuit the tags by analyzing the text. For example, a cloud about US government material would prominently feature tags like "US" "government" "federal" etc but those are pointless to have. Sometimes, it's more important to be able to see the tags that are only mentioned a few times. That isn't handled well by any of the approaches you mentioned (except for the ones you considered as failures - and I agree with you on those).


2. Kevin Hoffman left...
Tue 15 Aug 06 7:22 pm

The algorithms that I discuss are purely about font size distribution. Filtering the data that goes into a cloud is relatively trivial. I am currently using the logarithmic cloud algorithm to "cloud" popular tags, not-so-popular tags, my tags, someone else's tags... you get the idea.


3. Dick left...
Wed 18 Oct 06 6:37 am

Kevin, it might be worth checking that algorithm for the log(size). The original pseudo code at echochamber.code was incorrect. In a quick test for me it looked like it just goes back to linear, plus the top band never deviates etc...

