Short URLs

Microblogging makes short URLs important. There are a number of shortening services, but I prefer http://is.gd because its domain is only five characters long.

Services

Others I’ve found and used include

A number of people publish top 10 lists; you can use Google to find them.

Problem definition

I am interested in writing or implementing a URL shortener, for use inside the firewall, or implementing one through an API. It seems quite simple until you consider performance at scale. My first thought was to use a database with an identifier/sequence data type as the key to the hashing algorithm, since it’s best to keep a record of the URLs issued, so that if people ask twice for the same URL the existing short URL can be reused.
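As a minimal sketch of that first thought, the following uses SQLite's auto-incrementing integer key as the identifier and reuses the existing row when the same URL is submitted twice. Everything named here is an assumption for illustration: the `urls` table, the `shorten` and `encode_base62` functions, and the choice of base-62 encoding in place of a hashing algorithm (an encoding of a unique identifier cannot collide, which a hash could).

```python
import sqlite3
import string

# 62 symbols: 0-9, a-z, A-Z (an assumed alphabet, not taken from any service)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Turn a non-negative integer identifier into a short string."""
    if n < 0:
        raise ValueError("identifier must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) urls table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "long_url TEXT NOT NULL UNIQUE)"
    )
    return conn


def shorten(conn: sqlite3.Connection, long_url: str) -> str:
    """Return the short code for long_url, reusing it if already issued."""
    with conn:  # commits on success, rolls back on error
        # The UNIQUE constraint makes a repeat request a no-op here.
        conn.execute(
            "INSERT OR IGNORE INTO urls (long_url) VALUES (?)", (long_url,)
        )
        row = conn.execute(
            "SELECT id FROM urls WHERE long_url = ?", (long_url,)
        ).fetchone()
    return encode_base62(row[0])
```

Calling `shorten` twice with the same URL returns the same code, while a different URL gets the next identifier. Note that every call still goes through the database, which is exactly where the scaling question comes from.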

The second part is resolving the short URL on request. Is this just an Apache solution with a very large redirect file?
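To make the redirect file idea concrete, here is a small sketch that renders a mapping of short codes to long URLs as Apache `Redirect` directives (the mod_alias directive). The function name `render_redirect_lines` and the example codes are assumptions for illustration, and the sketch assumes the long URLs are already percent-encoded so that they contain no whitespace.

```python
def render_redirect_lines(mapping: dict) -> str:
    """Render {code: long_url} as one Apache Redirect directive per line."""
    lines = []
    for code, long_url in sorted(mapping.items()):
        # Whitespace would split the directive into extra arguments.
        if any(ch.isspace() for ch in code + long_url):
            raise ValueError(f"whitespace not allowed in {code!r} entry")
        lines.append(f"Redirect 301 /{code} {long_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_redirect_lines({
        "1": "http://example.com/a",
        "2": "http://example.com/b",
    }), end="")
```

The open question with this approach is how the web server picks up new entries: if the file has to be regenerated and reloaded whenever a URL is issued, that step becomes its own bottleneck as the file grows.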

Some research

I searched Google for “url shortner howto” to see what it came up with.

Two interesting finds included

Ideas

Idea No. 1 was to use a database identifier type and hash it. I was going to investigate whether we could use a web server redirection file, or whether I’d have to write something using a DB retrieval. However, both of these database transactions introduce serial bottlenecks and inhibit scalability.

So can we pre-allocate the short URLs in blocks, and allocate them using Hadoop? Is Hadoop a sensible solution for “find me the next unused”, or perhaps it doesn’t matter; it’s just “find me one unused slot”?
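The block idea can be sketched without Hadoop at all, which may help in judging whether Hadoop is needed. In this sketch a central source hands out contiguous blocks of identifiers, and each worker serves requests from its own block, only going back to the source when the block runs out. The class names `BlockSource` and `Worker`, the block size, and the in-process lock standing in for a shared database or service are all assumptions for illustration.

```python
import threading


class BlockSource:
    """Central counter that hands out contiguous blocks of identifiers."""

    def __init__(self, block_size: int = 1000, start: int = 1):
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()  # stands in for a shared DB or service

    def reserve(self) -> range:
        # The only serialised step: one short critical section per block.
        with self._lock:
            first = self._next
            self._next += self._block_size
        return range(first, first + self._block_size)


class Worker:
    """Issues identifiers from a locally held block, refilling when empty."""

    def __init__(self, source: BlockSource):
        self._source = source
        self._ids = iter(())  # start with an empty block

    def next_id(self) -> int:
        nxt = next(self._ids, None)
        if nxt is None:
            # Local block exhausted, so fetch a fresh one from the source.
            self._ids = iter(self._source.reserve())
            nxt = next(self._ids)
        return nxt
```

With this arrangement the question really is “find me one unused slot” rather than “find me the next unused”: identifiers are unique across workers but not issued in strict order, and any part of a block left unused when a worker stops is simply skipped.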
