I attempted this project (a URL shortener) as a kind of dare. Over lunch one day, Atul casually commented that a URL shortener would be ten minutes of work for someone like me, and that a blog post about it would make a nice fit for my site.
I wasn't sure it was a ten-minute job, but it sounded like a small, fun project for a weekend attempt.
It took more than half a day for me to be content (not happy, just content) with the resulting work.
Nevertheless, it was time well spent. Here's the link to the actual project: <a href="https://github.com/emofeedback/urlshortener">github.com/emofeedback/urlshortener</a>
A few clarifications about some of the choices (should I call it software engineering?):
- Redis, because it's extremely efficient, keeps everything in RAM, and there are (unconfirmed) reports of it being used to power China's internal Twitter clone.
- A RAM-based database means lower latency than a disk-plus-cache database, but it needs more RAM.
- Used 5 characters because, well, the internet is big, and that seemed like a good number to cover most unique URLs, especially combined with the 78 characters allowed in a URL: 78^5 possible codes, about 2.9 billion. Thanks to a colleague for suggesting the idea; I was thinking about hashing/random generation before that. (There's a sketch of the encoding after this list.)
- A pair of hash maps (shortened URL -> original URL and original URL -> shortened URL) was kind of against my instincts. I still keep thinking this is too much memory and we should find a different way. If I drop the use case where submitting an already-shortened URL for shortening again returns the existing short URL, I can eliminate the original URL -> shortened URL map.
- To check whether a new original URL has already been shortened, HyperLogLog comes to the rescue: Redis's PFADD returns 1 if the URL is (probably) new and 0 if it has (probably) been seen before, so the check costs almost no extra memory. (See the shortening sketch after this list.)
- On the UI front, I just put up a minimal HTML form that takes an original URL and returns JSON. (A sketch of such an endpoint follows the list.)
- Yet to do: add a proper Fabric script to set up the virtualenv and Python packages, install Redis, drop in the nginx config, and restart nginx. (A rough sketch is below.)
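Here's a minimal sketch of the 5-character encoding. I haven't pinned down the exact 78-character alphabet here, so the one below is an assumption built from URL-safe characters, and the counter-based scheme is just one way to hand out codes:

```python
import string

# Assumed 78-character alphabet (52 letters + 10 digits + 16 URL-safe
# symbols); the actual character set in the repo may differ.
ALPHABET = string.ascii_letters + string.digits + "-._~!$&'()*+,;=:"
assert len(ALPHABET) == 78

def encode(n, length=5):
    """Encode an integer as a fixed-length base-78 string.

    78**5 is roughly 2.9 billion distinct 5-character codes.
    """
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 78)
        chars.append(ALPHABET[rem])
    return "".join(chars)
```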
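And a sketch of the shortening flow with the HyperLogLog check and the two hash maps. The key names ("seen_urls", "url:", "code:") are made up for illustration; it reuses encode() from the sketch above and assumes the redis-py client:

```python
import redis

r = redis.Redis()  # assumes a Redis server on localhost

def shorten(original_url):
    """Return a 5-char code for original_url, reusing an existing one if any."""
    # PFADD returns 1 when the HLL estimate changed (URL is probably new)
    # and 0 when the URL has probably been seen before.
    if not r.pfadd("seen_urls", original_url):
        existing = r.get("code:" + original_url)
        if existing:
            return existing.decode()
    code = encode(r.incr("url_counter"))  # encode() from the sketch above
    r.set("url:" + code, original_url)    # shortened -> original (for redirects)
    r.set("code:" + original_url, code)   # original -> shortened (for dedup)
    return code
```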
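The form-to-JSON part could look like this, assuming Flask purely for illustration (the actual repo may wire it up differently):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # Minimal form: one text box, one submit button.
    return (
        '<form method="post" action="/shorten">'
        '<input type="text" name="url">'
        '<input type="submit" value="Shorten">'
        "</form>"
    )

@app.route("/shorten", methods=["POST"])
def shorten_url():
    original = request.form["url"]
    # shorten() is the function from the sketch above.
    return jsonify({"original": original, "short": shorten(original)})
```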
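For the yet-to-do deployment script, something along these lines (Fabric 1.x style; the host, paths, and commands are placeholders, not the repo's):

```python
from fabric.api import cd, env, run, sudo

env.hosts = ["deploy@example.com"]  # placeholder host

def deploy():
    # System packages: Redis and nginx.
    sudo("apt-get install -y redis-server nginx")
    # Python side: virtualenv + packages.
    run("virtualenv ~/urlshortener-env")
    with cd("~/urlshortener"):
        run("~/urlshortener-env/bin/pip install -r requirements.txt")
    # Drop in the nginx config, then restart nginx.
    sudo("cp deploy/nginx.conf /etc/nginx/sites-enabled/urlshortener")
    sudo("service nginx restart")
```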
You can see it live here.