21 November 2010

A Simple URL Shortening Algorithm in JAVA

We have so many url shortening services available today, I am not sure what kind of algorithm they use to shorten a particular url. Given the limitations over the characters which can be used in a url it becomes pretty much obvious that we are limited to 62 alpha numeric chars i.e. [a-z 0-9 A-Z]. Though - (hyphen) and _ (underscore) are allowed in a url still we want to avoid them for many good reasons. Very obvious would be a bad looking url like http://xyz.com/c0--rw_ or http://xyz.com/______-.
Following is the simple implementation of base10 to base62 converter, that's all we need to shorten a url. With 62 chars and a unique string 7 char long we can shorten:
627 = 3,521,614,606,208 urls
that's a lots of urls.

How shortening works in the present case:

Suppose you have a table with following columns:
1. unique auto increment id (long),
2. url (string),
3. base62 string (string)
Now the trick is that we convert unique id to base62 string not the url, and then the url is mapped to the unique id. For example if we want to shorten the following url:
http://news.xinhuanet.com/english2010/world/2010-11/18/c_13612801.htm
First we need to look for the last unique id in the table then add 1 to it and convert the resulting number to base62. Suppose last unique id was 678544325 now the next id 678544326 will be mapped to the above url and base62 of a 678544326 will be:
45*624+57*623+6*622+23*621+20*620
means a five char url, having following array indexes {45}{57}{6}{23}{20} in
String[] elements = {
                "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
                "p","q","r","s","t","u","v","w","x","y","z","1","2","3","4",
                "5","6","7","8","9","0","A","B","C","D","E","F","G","H","I",
                "J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X",
                "Y","Z"
                };
which will give a base62 string: JVgxu and a shortened url can be http://xyz.com/JVgxu

Following is the java code to convert a number to base62 string
/**
 * @author vikasing
 *
 */
public class Base62Converter {
    private final int LENGTH_OF_URL_CODE=6;
    public String convertTo62Base(long toBeConverted)
    {
        String[] elements = {
                "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
                "p","q","r","s","t","u","v","w","x","y","z","1","2","3","4",
                "5","6","7","8","9","0","A","B","C","D","E","F","G","H","I",
                "J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X",
                "Y","Z"
                };
        String convertedString="";
        int numOfDiffChars= elements.length;
        if(toBeConverted<numOfDiffChars+1 && toBeConverted>0)
        {
            convertedString=elements[(int) (toBeConverted-1)];
        }
        else if(toBeConverted>numOfDiffChars)
        {
            long mod = 0;
            long multiplier = 0;
            boolean determinedTheLength=false;
            for(int j=LENGTH_OF_URL_CODE;j>=0;j--)
            {
                multiplier=(long) (toBeConverted/Math.pow(numOfDiffChars,j));
                if(multiplier>0 && toBeConverted>=numOfDiffChars)
                {
                    convertedString+=elements[(int) multiplier];
                    determinedTheLength=true;
                }
                else if(determinedTheLength && multiplier==0)
                {
                    convertedString+=elements[0];
                }
                else if(toBeConverted<numOfDiffChars)
                {
                    convertedString+=elements[(int) mod];
                }
                
                mod=(long) (toBeConverted%Math.pow(numOfDiffChars,j));
                toBeConverted=mod;                
            }
            
        }
        return convertedString;
    }

}
Above code is part of the project Crowl on which I have been working for a while. File can be browsed under org.crow.utils package.

Update: found this precise code for base62 conversion on the web:
public String converter ( int base, long decimalNumber)
 {
  
   String tempVal = decimalNumber == 0 ? "0" : "";
         long mod = 0;

         while( decimalNumber != 0 ) {
             mod = decimalNumber % base;
             tempVal = baseDigits.substring( (int)mod, (int)mod + 1 ) + tempVal;
             decimalNumber = decimalNumber / base;
         }
         System.out.print(tempVal);
         return tempVal;
 }

I didn't check the performance of the above code but it is smaller than the first one but both give the same output.

8 comments:

  1. Hi Vikash, good article.
    I have a doubt about converting a URL to a shorter one. After conversion, if we post this URL (http://xxxx.com/Au7jP for example) to a service (Twitter for example) and somebody click it, where the user should be take to?
    When rendering the shortened URL, should it be done between using the correct link? Something like that: "http://xxxx.com/Au7jP"

    Thanks guy!

    ReplyDelete
  2. @Gustavo
    Ya that's correct, it should be done between redirecting to the correct link by the url shortening service. Imagine a MySQL table with 2 columns:
    ---------------------------------
    UniqueID | URL
    ---------------------------------
    Au7jP |somedomain.com/1222

    your program (shortening service e.g e.co, goo.gl) will look for the id Au7jP and will redirect to somedomain.com/1222.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Hi Vikash,

    Good one. This meets my requirement exactly. Thanks. Between how could we confirm that, It will always return the unique string. I am very much afraid in this point. Because this is the core part if we implement a shorten URL using this.

    Thanks,
    Matthew stephen.

    ReplyDelete
  5. Ji,

    just want to say that he 2 method do give different results if number to convert <=62.

    Stefan

    ReplyDelete
  6. I read quite a few tutorials before this one for this topic but I have to say nothing is as simple to understand and clearly written as this one. Keep up the good work.

    ReplyDelete