21 November 2010

A Simple URL Shortening Algorithm in JAVA

We have so many url shortening services available today, I am not sure what kind of algorithm they use to shorten a particular url. Given the limitations over the characters which can be used in a url it becomes pretty much obvious that we are limited to 62 alpha numeric chars i.e. [a-z 0-9 A-Z]. Though - (hyphen) and _ (underscore) are allowed in a url still we want to avoid them for many good reasons. Very obvious would be a bad looking url like http://xyz.com/c0--rw_ or http://xyz.com/______-.
Following is the simple implementation of base10 to base62 converter, that's all we need to shorten a url. With 62 chars and a unique string 7 char long we can shorten:
627 = 3,521,614,606,208 urls
that's a lots of urls.

How shortening works in the present case:

Suppose you have a table with following columns:
1. unique auto increment id (long),
2. url (string),
3. base62 string (string)
Now the trick is that we convert unique id to base62 string not the url, and then the url is mapped to the unique id. For example if we want to shorten the following url:
http://news.xinhuanet.com/english2010/world/2010-11/18/c_13612801.htm
First we need to look for the last unique id in the table then add 1 to it and convert the resulting number to base62. Suppose last unique id was 678544325 now the next id 678544326 will be mapped to the above url and base62 of a 678544326 will be:
45*624+57*623+6*622+23*621+20*620
means a five char url, having following array indexes {45}{57}{6}{23}{20} in
String[] elements = {
                "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
                "p","q","r","s","t","u","v","w","x","y","z","1","2","3","4",
                "5","6","7","8","9","0","A","B","C","D","E","F","G","H","I",
                "J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X",
                "Y","Z"
                };
which will give a base62 string: JVgxu and a shortened url can be http://xyz.com/JVgxu

Following is the java code to convert a number to base62 string
/**
 * @author vikasing
 *
 */
public class Base62Converter {
    private final int LENGTH_OF_URL_CODE=6;
    public String convertTo62Base(long toBeConverted)
    {
        String[] elements = {
                "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
                "p","q","r","s","t","u","v","w","x","y","z","1","2","3","4",
                "5","6","7","8","9","0","A","B","C","D","E","F","G","H","I",
                "J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X",
                "Y","Z"
                };
        String convertedString="";
        int numOfDiffChars= elements.length;
        if(toBeConverted<numOfDiffChars+1 && toBeConverted>0)
        {
            convertedString=elements[(int) (toBeConverted-1)];
        }
        else if(toBeConverted>numOfDiffChars)
        {
            long mod = 0;
            long multiplier = 0;
            boolean determinedTheLength=false;
            for(int j=LENGTH_OF_URL_CODE;j>=0;j--)
            {
                multiplier=(long) (toBeConverted/Math.pow(numOfDiffChars,j));
                if(multiplier>0 && toBeConverted>=numOfDiffChars)
                {
                    convertedString+=elements[(int) multiplier];
                    determinedTheLength=true;
                }
                else if(determinedTheLength && multiplier==0)
                {
                    convertedString+=elements[0];
                }
                else if(toBeConverted<numOfDiffChars)
                {
                    convertedString+=elements[(int) mod];
                }
                
                mod=(long) (toBeConverted%Math.pow(numOfDiffChars,j));
                toBeConverted=mod;                
            }
            
        }
        return convertedString;
    }

}
Above code is part of the project Crowl on which I have been working for a while. File can be browsed under org.crow.utils package.

Update: found this precise code for base62 conversion on the web:
public String converter ( int base, long decimalNumber)
 {
  
   String tempVal = decimalNumber == 0 ? "0" : "";
         long mod = 0;

         while( decimalNumber != 0 ) {
             mod = decimalNumber % base;
             tempVal = baseDigits.substring( (int)mod, (int)mod + 1 ) + tempVal;
             decimalNumber = decimalNumber / base;
         }
         System.out.print(tempVal);
         return tempVal;
 }

I didn't check the performance of the above code but it is smaller than the first one but both give the same output.