vikasing

05 May 2015

Word vectors using LSA, Part - 2

Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. [1] More about LSA can be found here and here. LSA uses Singular Value Decomposition (SVD), a matrix factorization method. For a given matrix A,

SVD (A) = U*S*V^T

In the current scenario matrix A is a term-document matrix (m terms * n documents). Visually SVD looks like this:
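In terms of matrix shapes, the decomposition can be sketched as below. This assumes the usual LSA practice of keeping only the k largest singular values; k and the rank r are symbols introduced here for illustration and are not stated in the post:

$$A_{m \times n} = U_{m \times r}\, S_{r \times r}\, V^T_{r \times n} \approx U_{m \times k}\, S_{k \times k}\, V^T_{k \times n}, \quad k \le r \le \min(m, n)$$

With this truncation each of the m terms gets a k-dimensional row in U (commonly scaled by S), and those rows are what typically get compared as word vectors.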

Unlike word2vec, LSA does not require any training. But it suffers from the curse of dimensionality, because SVD calculations get slower and slower as we increase the number of documents, i.e. the size of matrix A. On a single machine it can take hours. The overall cost of calculating the SVD is O(mn²) flops. This means that if we had m = 100,000 unique words and n = 80,000 documents, it would require 6.4 x 10¹⁴ flops, or 640,000 GFlop. At stock clock speed (4.0 GHz) my AMD FX-8350 gives around 40 GFlops. So it would take around 640,000/40 = 16,000 seconds, which is around 4 hours 30 minutes. [2]
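The estimate above is plain arithmetic, so it is easy to replay for other matrix sizes. The following is only a sketch: the class and method names are made up for illustration, and it takes the m·n² flop count and the 40 GFlops figure at face value.

```java
// Back-of-the-envelope SVD cost estimate; all names here are illustrative.
final class SvdCostEstimate {

    /** Approximate flop count for the SVD of an m x n matrix, taken as m * n^2. */
    static double flops(long terms, long documents) {
        return (double) terms * documents * documents;
    }

    /** Seconds needed at a sustained throughput given in GFlops. */
    static double seconds(double flops, double gflops) {
        return flops / (gflops * 1e9);
    }

    public static void main(String[] args) {
        double f = flops(100_000, 80_000); // m = 100,000 terms, n = 80,000 documents
        double s = seconds(f, 40.0);       // 40 GFlops, the figure quoted for the FX-8350
        System.out.printf("%.1e flops, %.0f seconds, %.2f hours%n", f, s, s / 3600.0);
    }
}
```

Doubling the number of documents quadruples the estimate, which is why the document count matters much more than the term count here.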

In my previous post I had used 1.7 million sentences and 44 million words for training word2vec. If we ran SVD on a matrix built from a corpus that large, it might end up taking centuries on my machine. However, SVD calculations on large matrices can be done on a large Spark cluster. [3] [4]

Results

I kept the document count constant at 2500 and let the term count vary. To rank the terms in relation to the query term I used cosine similarity. This time, along with named entities, I also added noun phrases. The data consists of news articles from yesterday (4th May, 2015). Here is the vector for the first query "delhi":

[law_minister=0.34, jitender_singh_tomar=0.23, chief_minister=0.22, fake=0.21, arvind_kejriwal=0.21, degree=0.18, protest=0.16, law_degree=0.16, win=0.15, congress=0.15, aam_aadmi_party=0.14, incident=0.14]

Notice that the vector contains terms like chief_minister and law_degree which are not named entities.
Query for "chief_minister":

[arvind_kejriwal=0.34, parkash_singh_badal=0.29, today=0.28, delhi=0.28, mamata_banerjee=0.27, office=0.26, state=0.26, people=0.26, act=0.25, mufti_mohammad_sayeed=0.25, bjp=0.25, jammu_and_kashmir=0.25, governor=0.24]

The vector gives the names of the chief ministers who were in the news recently. The same goes for the query "prime_minister":

[shinzo_abe=0.37, japanese=0.33, sushil_koirala=0.27, david_cameron=0.27, tony_abbott=0.26, 2015=0.24, benjamin_netanyahu=0.24, president=0.23, country=0.23, government=0.22, washington=0.22]

Let's look up a person now, "rohit_sharma":

[mumbai_indians=0.45, skipper=0.4, ritika_sajdeh=0.37, captain=0.36, batsman=0.35, indian=0.34, lendl_simmons=0.34, kieron_pollard=0.33, parthiv_patel=0.31, ipl=0.29, runs=0.29, good=0.29, ambati_rayudu=0.29, mitchell_mcclenaghan=0.27, unmukt_chand=0.27]
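The ranking behind the queries above can be sketched roughly as follows. This is not the code from the post: the class name, method names and the tiny 2-dimensional vectors are invented for illustration, and it assumes every term already has a dense vector (for example a row of the reduced term matrix).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, method and variable names are hypothetical.
final class CosineRanker {

    /** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to `top` terms most similar to `query`, best first. */
    static List<Map.Entry<String, Double>> rank(String query, Map<String, double[]> vectors, int top) {
        double[] q = vectors.get(query);
        if (q == null) {
            throw new IllegalArgumentException("unknown term: " + query);
        }
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            if (!e.getKey().equals(query)) {
                scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), cosine(q, e.getValue())));
            }
        }
        // Highest similarity first
        scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(top, scored.size())));
    }

    public static void main(String[] args) {
        // Made-up toy vectors, not values from the post
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("delhi", new double[] {1.0, 0.2});
        vectors.put("arvind_kejriwal", new double[] {0.9, 0.4});
        vectors.put("ipl", new double[] {0.1, 1.0});
        System.out.println(rank("delhi", vectors, 2));
    }
}
```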

Finding relations

What if I query for chief_minister and west_bengal and add both the vectors?

[mamata_banerjee=0.69, bjp=0.67, state=0.6]

It gives the correct result: Mamata Banerjee is the current Chief Minister of West Bengal. Note that the numbers no longer represent cosine similarity.
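One possible reading of "adding both the vectors" is that the per-term scores of the two query results are summed, which would also explain scores above 1 in combined queries. That reading is an assumption, not something the post confirms, and the names and numbers below are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: combines two term->score result maps by summing scores.
final class ScoreAdder {

    /** Sums the scores of two result maps; a term missing from one map counts as 0 there. */
    static Map<String, Double> add(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> sum = new HashMap<>(a);
        for (Map.Entry<String, Double> e : b.entrySet()) {
            sum.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up scores for two separate queries
        Map<String, Double> first = new HashMap<>();
        first.put("mamata_banerjee", 0.3);
        first.put("state", 0.25);
        Map<String, Double> second = new HashMap<>();
        second.put("mamata_banerjee", 0.4);
        second.put("bjp", 0.35);
        System.out.println(add(first, second));
    }
}
```

Terms that score well for both queries rise to the top, while terms tied to only one of them keep their single score.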

What if we want to find out a relationship, instead of querying? Query for india and narendra_modi:

[prime_minister=0.5, make=0.42, government=0.4, country=0.4]

Querying mumbai_attack with charged gives a list of a few names of those who were involved/charged:

[people=1.14, left=1.08, november=1.05, dead=1.05, 166=1.05, executing=1.04, planning=1.04, 2008=1.02, hamad_amin_sadiq=1.0, shahid_jameel_riaz=1.0, mazhar_iqbal=1.0, jamil_ahmed=1.0, younis_anjum=0.94, abdul_wajid=0.94, zaki-ur_rehman_lakhvi=0.62]

Although the above results look good, they are not always accurate. For example, the query for captain and royal_challengers_bangalore does not return virat_kohli as the first result:

[ipl=0.67, rcb=0.66, match=0.64, kolkata_knight_riders=0.6, virat_kohli=0.57]

I guess more data from different time periods can help in establishing concrete relationships.

Word vectors obtained from LSA can be useful in expanding search queries, guessing relationships (as shown above), generating similarity-based recommendations and many other text-related tasks.
I wrote a one-file implementation of LSA in Java (it's buggy and design-pattern free!). It uses jBLAS for SVD and other matrix operations; the code can be found at github.
A couple more links to understand LSA through examples:

24 March 2015

Word vectors (word2vec) on named entities and phrases - I

word2vec is a C lib to compute the vector representation of a given word (or a phrase). It was released by a few Googlers and is maintained at word2vec. A couple of nice articles on what word2vec is capable of (roughly):

Word vectors can boost performance of many ML and NLP applications, for example, sentiment analysis, recommendations, chat threading etc.
I used deeplearning4j's implementation of word2vec. The example given on that page does not work with the latest release of dl4j (at present 0.0.3.3); a working example can be found here. I ended up using Stanford's CoreNLP for named entity recognition; OpenNLP works fine too.

The training was done on recent news data gathered from various sources. Articles were split into sentences (using OpenNLP), and duplicate and short sentences were removed. The size of the corpus was around 300MB, containing 1.7 million sentences and 44 million words. The training took almost 36 hours with 3 iterations and a layer size of 200. Let's start with simple examples:

For the word water I get the following word vector (limited to 21 words):

[groundwater, vapor, heater, pollutant, rainwater, dioxide, wastewater, sewage, potable, moisture, seawater, methane, nitrogen, vegetation, vapour, oxide, reservoir, hydrogen, plume, monoxide, sediment]

We can see that almost all the words are used in the context of water, but this is limited to the trained corpus. With a different corpus you'll get a different set of results. Let's look at something which was in the news recently, e.g., the term plane:

[mh370, c17, crashland, skidd, 777, takeoff, qz8501, transasia, malaysia_airline, aircraft, globemaster, turboprop, cockpit, laguardia_airport, 1086, locator, singleengine, atr, solarpower, midair]

Except for 1086 and atr, every word (or phrase) in the vector makes sense at first glance. But if you search for 1086 and atr, you'll find that 1086 was a Delta Air Lines flight which crashed recently and ATR is an aircraft manufacturer. Let's look at the vector for an entity (specifically a phrase); for example, Leslee Udwin was in the news recently:

[mukesh_singh, gangrape, nirbhaya, storyville, documentary, rapist, tihar, andrew_jarecki, citizenfour, telecast, bar_council_of_india, udwin, laura_poitra, filmmaker, jinx, bci, bbc, derogatory, chai_j, leslie_udwin, hansal_mehta, bbc_storyville]

You can relate most of the words/phrases in the vector to Leslee Udwin or her documentary India's Daughter. Other words in the vector are either the names of the documentaries or the documentary makers, for example, The Jinx is an HBO documentary mini-series directed by Andrew Jarecki.

The dl4j library also provides a mechanism for vector addition and subtraction. For subtraction the code is as follows:

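// p holds the terms to add (positive), n the terms to subtract (negative)
List<String> p = new ArrayList<>(), n = new ArrayList<>();
p.add("imitation");
n.add("oscar");
// vec is the trained word2vec model; asks for the 20 words nearest to (imitation - oscar)
vec.wordsNearest(p, n, 20);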

Here is how the subtraction works. The vector for imitation:

[grand_budapest_hotel, michael_keaton, screenplay, birdman, boyhood, eddie_redmayne, whiplash, benedict_cumberbatch, felicity_jone, jk_simmon, richard_linklater, julianne_moore, j_k_simmon, wes_anderson, patricia_arquette, edward_norton, graham_moore, alejandro_gonzalez_inarritu, stephen_hawk, alejandro_g_inarritu, alexandre_desplat]

It contains many Oscars entries and related terms, so subtracting the vector of term oscar should remove all those entries and give us something related to The Imitation Game:

[changer, throne, lllp, alan_tur, chris_kyle, oneindia, lilih620150308, sniper, rarerbeware, grand_budapest_hotel, benedict_cumberbatch, mockingjay, iseven, cable_news_network, extractable, theory, watchapple, enigma, codebreaker, washington_posttv, mathematician]

This vector is not a very good representation of the movie The Imitation Game; there is a lot of noise. This is because of the poor and small training data. But we see a few terms in the vector which are related to the movie, e.g.,

alan_tur*, benedict_cumberbatch, enigma, theory, mathematician

* ing was removed by the tokenizer

I have trained the model on entities for now (by replacing spaces with underscores). I am planning to train it on general phrases as well, so that, for example, Member of Parliament is combined into a single term member_of_parliament. I will publish the results in the second part. Next I want to compare it with Brown clustering, which is used for a similar purpose.

28 February 2015

Side Projects

I started my professional career in July 2008. Fresh out of college, I was really excited about working on real projects. After two months of training I was assigned to the Call Handling (IVR) team, a .Net based project. Soon I realized that the project did not require much coding; the whole day went into writing test cases and IVR workflows in XML. This motivated me to work on the following side projects:

Indews.in (2008)
Soon after the 2008 Mumbai attacks, I decided to create my own news aggregation site for India, hence the name Indews (Indian News). I was not happy with how Google News clubbed the news and sometimes ended up showing stale information as the main headline. I implemented some part of the site, wrote a very basic crawler in C# and hosted the site on a local IIS server at home. It had a really bad interface and did not survive more than 3 months. I had lost all my interest in a general news aggregator; now I was mainly interested in tech/programming news.

9AM.in (2009-2010)
After shutting down indews.in, I started a project called 9am under http://www.natmac.org/9am/. I got bored with ASP.Net technologies: there was a lot of abstraction and many times I did not understand how things worked underneath, for example the ajax implementation of ASP.Net. It was all magic and lots of dlls. And there were very few open source projects in C#. So I decided to move away from .Net and migrated all my code to Java. Having worked on Java and J2EE in college projects, it was easy.
9am was again an RSS/ATOM aggregator, but this time I was crawling the whole web for the tech related stuff and some news sites for general news (kept general news anyway from indews!). You can find an internet archive snapshot here.
9am had many features, like finding top keywords, grouping similar items, inbuilt search, and categorizing a feed item into one of these categories:

DBs, UI, .Net, S/W Engg, Languages, Mobile, Java, XML, OS

Categorization was based on bag of words and worked fairly well. It was also hosted on my home computer, using a static IP and a 384 Kbps network connection. The crawler used to crawl a few thousand URLs every day based on a revisit policy. The database had around 60,000 feed (RSS/ATOM) URLs and every day it used to discover new ones. Some of the website owners got pissed with the crawling and asked me to remove the URLs. Since everything was automated, I had no control over URL discovery.
9am was really fun: it used to discover really good articles on the web every day and I always had something amazing to read in my office. Following is an internet archive screenshot of the Language category under TECH tab:

The whole setup had many issues: daytime power cuts, internet outages, slow internet, a slow machine, poor MySQL full text search. Despite all that, it used to get ~5000 visits/day from Google.
When I was moving to another city, I had to shut it down. For unknown reasons it remained that way forever. The crawler code can be found at https://code.google.com/p/crowl/ and https://github.com/vikasing/crowl

Mozvo.com (2011-2013)
Mozvo analyzed the sentiments of tweets, reviews and blogs to create a Mozvo score for a movie. It had many other cool features like movie recommendations, actor profiles, friends' tweets about a movie, a movie explorer based on many attributes etc. This was the most ambitious side project I ever did. It also involved two more guys from the same company I was working at. We worked after office, almost every day; initially it felt like it might end up evolving into a startup. I mainly worked on the back-end part of it, which had MongoDB as its database and a data layer written in Java. It was fun building the core parts. I ended up learning lots of new stuff.

We kept on adding many features without asking our users whether they really wanted them or not. It was like a playground for us, whatever we (or any one of us) thought was cool, we ended up implementing that ignoring the outcome. We did not analyze whether any feature was helping us in retaining the users. Google brought all the traffic and that was not really enough, ~200 visits/day. Gradually we lost our interest and in April 2013 we altogether stopped working on it. It is still alive at mozvo.com but in a dormant state.

GizmoAge (2012)
This was an Android app built on top of PhoneGap, main aim was to collect latest gadget news and group it to remove ambiguity. The first version of the app was ready to use and did look much better than many apps in the Play Store. I published the app in Play Store, but removed it after a couple of months, don't remember why :), I guess there were some server issues.

Cryptocurrency Mining (2014)
This was my first hardware hacking project. I ended up investing around $1000 in this project: bought 2 top end graphics cards, an 850 watt SMPS and lots of hacky stuff like PCI risers, power buttons from Hong Kong etc.
The rig mined all the popular alt coins at that time, from Dogecoin to Coinocoin. I also did some trading at various exchanges. After three months of mining all the fun was gone, so I stopped my rig and decided to sell the hardware. But before that the following happened:
I had to RMA one of the graphics cards, and the motherboard short circuited (no RMA). I also lost around 0.1 bitcoin in trading. Sometime later I sold my 0.42 bitcoin and stopped the crypto currency madness altogether.
Nevertheless it was fun. I got to learn many things about crypto currencies, e.g. bitcoin, blockchain, ASIC, primecoin, mintcoin, CPU only coins, and there were these crazy ideas of coin drops, also country specific coins like Auroracoin for Iceland.

The mining rig, mining #Fluttercoin pic.twitter.com/SgcwsNIYoa
— Vikash Singh (@vikasing) April 14, 2014

Others
There were some other small projects here and there:

Crowl (2009- ): The web crawler which powered 9am, still working on it, it's much more powerful now.
NiceText (2012-): This is a very small library I wrote to extract the text from a webpage. Other libs boilerpipe and readability port did not work that well on many pages. This is a part of crowl project. I wrote a post about it and here is the github link.
jaLSA (2014-): A lib I wrote to do Latent Semantic Analysis. It was needed for a project I was working on in my previous company.
velocityplus (2011): An eclipse plugin for Apache velocity templating engine. Worked on it when I was working in my first company. It is unfinished, got really bored while developing it.
Fing.in (2012): A Bollywood news portal, it was supposed to be a sub project of mozvo.com. Finished it locally but did not publish it anywhere.
Paltan.org (2008): I briefly worked on creating a social networking website for my college group, it was based on Wordpress based Buddypress. But the Buddypress itself was in beta development and lacked many obvious features. It was all PHP, lost my interest very soon, did not go anywhere, shut it down after sometime.
letsj.com (2012): An aggregator for Java related articles. Intention was to use Lucene as a database as well as indexing engine, got into many issues, abandoned.

26 November 2014

HDMI monitor blinking (flickering)

I have a dual monitor setup at home: one is connected via HDMI and the other through DVI. A couple of days back I noticed that the HDMI monitor was blinking very often; it used to go blank for a couple of seconds, while the DVI one worked just fine. According to a few forum posts the issue was related to the graphics driver, but in my case the problem was occurring on both Windows 7 and ArchLinux (my Arch installation uses open source drivers), so I ruled out driver issues.

Somewhere it was also mentioned that it might have something to do with a "ground loop"; you can read more about it on Wikipedia. Then I realized I had recently changed the power cord for the HDMI monitor, and the power cord which was in use did not have an earthing pin (the 3rd big pin in a plug). It looked like this:

When I replaced the 2-pin power cord with a 3-pin one (the one with the earthing pin), the blinking problem disappeared. So I guess it was related to earthing but not necessarily to a ground loop.

20 August 2014

Install aura on ArchLinux without haskell dependencies

Here is how to install Aura package manager on ArchLinux:

Download this https://aur.archlinux.org/cgit/aur.git/snapshot/aura-bin.tar.gz file and extract, go to the extracted directory, you'll find PKGBUILD file.
Install the missing dependencies using pacman e.g. sudo pacman -S ghc fakeroot
Run the command makepkg; this will generate a package file (.pkg.tar.xz).
Run command sudo pacman -U aura-bin-1.x.x.x-x-x86_64.pkg.tar.xz, in the same directory, replace x.x.x-x with the actual version or just press tab after entering the partial command sudo pacman -U aura-bin, hit enter.
If you did not get any errors, the installation was successful, validate by running command aura in the terminal.

18 August 2014

Few days with Arch Linux

Over a year ago I wrote a post called Ubuntu Fail. My opinion has changed a lot since then: I have totally stopped using Windows as the primary OS on any of my computers. My three desktops and one NUC run some form of Ubuntu 14.04 without any issues. I boot into Windows only when I want to play some game.

My laptop, which is my primary machine, was running Ubuntu till last month (July 2014), when one kernel update broke the sound driver. I fixed it by updating the kernel manually to the latest version, which was 3.15.*, but then I thought: why not install a distro which always gets updated to the latest kernel, libs and drivers without all this manual effort? I started looking for such a distro and read a few posts talking about Fedora and Arch Linux. I'd installed Arch Linux on my Raspberry Pi long back and I was impressed with its performance then; it seemed faster than Raspbian, the official distro for the RPi. I also liked the Arch wiki.

So I decided to replace my Ubuntu GNOME 14.04 with Arch. I backed up all my data, installed basic Arch and did the basic configuration; everything just worked fine. I went with my usual choice of DE, i.e. GNOME 3; no issues there either. Although it took 2 days to set up, it was fun.

It's been 3 weeks since the Arch installation and I haven't had any major issues. My system is more stable and faster than the previous Ubuntu installation, and that fuzzy touchpad issue which I described in my Ubuntu Fail post is not there in Arch.

I don't think I'm going back to Windows anytime soon.

20 December 2013

Ebay India Orders Fuck Up

I purchased a 24'' monitor to replace my 4 year old 19'' monitor from eBay on 13th of December. The estimated delivery date mentioned on the product page was 17th Dec. On 17th Dec I received this email from eBay:

MC011 XXXXXXX: Update- Error in payment mode details updated with the courier

Hello,

Thank you for shopping with eBay

We have noticed that due to a system bug there are few transactions where a wrong information about the payment mode has been updated in the system. You might receive SMS alert or an Emails asking you to keep the cod amount ready with you at the time of delivery. Request you to kindly ignore this SMS/Email and not to give any cod amount at the time of delivery if you have made payment of your transaction by using Debit/Credit card or via Net banking.

Kindly be rest assured this issues has been addressed and resolved, the correct information of payment has been shared with the courier to avoid confusion at the time of delivery. If the courier executive is asking you to pay the cod amount you can refuse the payment and expect the product to be delivered in the next 2-3 working days without the cash being collected. If your transaction is getting returned due to this reason, you will get the refund of your transaction as per Paisa pay timelines.

We sincerely regret for the inconvenience caused to you and also appreciate your cooperation and understanding at this time.

Regards
eBay India

Next day I received this SMS from BlueDart courier company:

Ebay order will reach thru BlueDart Awb xxxxxxxxx on or after 19-DEC-13.Kindly keep Cash Rs.xxxxx ready.

Today (20th December) morning I received an email saying:

A refund has automatically been initiated on item "BenQ GL2450HM LED Monitor" on 20-Dec-2013 since our shipping partner has not confirmed pickup of your shipment from the seller within the timelines.

I called up the seller and asked about this refund thing; the seller did not have any clue about it. He told me that the item had already been shipped. Around 3 PM the courier company guys appeared with the monitor at my doorstep and asked me to pay the whole amount again, which I had already paid while placing the order. I tried to explain to them that it was a prepaid item, not Cash On Delivery, but it was of no use. Those guys took the monitor back and updated the status as: "Cnee Refused To Accept Shipment".
In the evening I got another email saying:

"Your refund request has been put on hold by PaisaPay on 20-Dec-13 as the seller has appealed against this refund"

I checked my account page and here is how it looked:

Notice the dates underlined by the orange line: they are all future dates. How on earth can a company like eBay fuck up so much and at so many levels??
First they fucked up the order payment mode, then gave wrong information about the shipment, and now all the dates are celebrating Christmas already.
This was indeed an enlightening experience; I will probably avoid going to eBay any more. It's already the 8th day since I placed the order and the customer care guy told me to wait till 25th Dec. WOW!!

09 September 2013

Keyword extraction in Java

Around two months back I started working on a Java library: NiceText, to isolate data mining stuff from Crowl (another weekend project!) and make it an independent lib. I already implemented a text extractor for web pages (hence the name NiceText), which I observed sometimes performs faster and better than boilerpipe library.

I also wanted to extract keywords (or keyphrases) from text. I considered the widely used TF-IDF algorithm for this purpose. Since keyphrases can contain more than one word, I also considered n-grams (mono, bi and tri). For example, consider the following text:

Astronomers have gotten the first-ever peek at our solar system's tail, called the heliotail, finding that it's shaped like a four-leaf clover, NASA scientists announced.
The discovery was made using NASA's Interstellar Boundary Explorer (IBEX), a coffee-table-sized spacecraft that is studying the edge of the solar system.
'We always drew pictures where the tail of the solar system just trailed off the page.'
- David McComas, IBEX principal investigator
"Many models have suggested the heliotail might look like this or like that, but we have had no observations," David McComas, IBEX principal investigator at Southwest Research Institute in San Antonio, Tex., said in a statement. "We always drew pictures where the tail of the solar system just trailed off the page, since we couldn't even speculate about what it really looked like." [Images: NASA's IBEX Sees Our Solar System's Tail]
The heliotail is "a much larger structure with a much more interesting configuration" than scientists had previously predicted, McComas added during a news conference announcing the finding.
The heliotail is inflated by the solar wind of particles streaming off the sun, and the four-leaf clover shape is the result of fast solar wind shooting out near the sun's poles and slower wind flowing from near the sun's equator, researchers say. The finding is based on the first three years of IBEX's measurements of energetic neutral atoms.
In the interstellar boundary region, charged particles from the sun stream outward far beyond the planets toward the gas- and dust-filled space between stars. Collisions between these particles and interstellar material create fast-moving particles with no charge, known as energetic neutral atoms, or ENAs. Some of these particles speed inward toward the sun, where IBEX can detect them from its perch 200,000 miles above Earth.
"Scientists have always presumed that the heliosphere had a tail," Eric Christian, IBEX mission scientist at Goddard Space Flight Center in Greenbelt, Md., said during a Google+ Hangout announcing the finds. "But this is actually the first real data that we have to give us the shape of the tail."
Though IBEX data has given scientists an idea of the shape and structure of the heliotail, they say they have not been able to measure its length particularly well. They think it is probably evaporating over something like the 1000 times the distance between the Earth and the sun, McComas said.
The $169 million IBEX spacecraft, launched in 2008, was built for an initial two-year mission, which has since been extended. Early on in its mission, IBEX detected ENAs flowing toward the sun in an unexpected pattern: They were significantly enhanced in a mysterious ribbon on the edge of the solar system that scientists now think is a reflection of the solar wind, shot back toward the sun by a strong galactic magnetic field.
IBEX has made several other important discoveries throughout its mission. In 2010, the spacecraft turned its gaze back toward Earth and got the first-ever peek at the solar wind crashing into the planet's magnetosphere. Last year, NASA announced that the spacecraft made its first detection of matter from outside the solar system, finding alien particles of hydrogen, oxygen and neon in the interstellar wind.

The extracted keywords are (arranged in decreasing order of tf-idf score):

ibex, solar system, heliotail, interstellar, mccomas, nasa, solar wind, spacecraft, particles, tail, toward sun, shape

As you can observe, the keyphrases "solar wind" and "solar system" have a common word "solar", but it does not appear as a keyword, because the program takes care of this ambiguity by comparing monograms with bigrams and then bigrams with trigrams to check whether certain words appear together the same number of times. For example: if the word "solar" appeared 10 times with the word "system" and they individually occurred exactly 10 times each, then both words are merged to form the keyphrase "solar system".
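The merge rule from that example can be sketched as below. This is a simplified illustration, not the NiceText code: it only handles monograms versus bigrams, uses the strict "exactly the same count" test from the example, and adds a hypothetical minCount parameter so that words seen only once are not merged by accident. All names are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of merging two words that always occur together.
final class BigramMerger {

    /** Counts how often each item occurs, keeping first-seen order. */
    static Map<String, Integer> count(List<String> items) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    /** Builds space-joined bigrams from adjacent tokens (tokens must not contain spaces). */
    static List<String> bigrams(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }

    /** Bigrams seen at least minCount times whose two words never occur apart. */
    static List<String> mergeable(List<String> tokens, int minCount) {
        Map<String, Integer> unigramCounts = count(tokens);
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : count(bigrams(tokens)).entrySet()) {
            String[] words = e.getKey().split(" ");
            int together = e.getValue();
            boolean alwaysTogether = unigramCounts.get(words[0]) == together
                    && unigramCounts.get(words[1]) == together;
            if (together >= minCount && alwaysTogether) {
                merged.add(e.getKey());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("solar system is big solar system".split(" "));
        System.out.println(mergeable(tokens, 2));
    }
}
```

The same comparison can then be repeated one level up, bigrams against trigrams.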

The tf can be calculated in many ways, but in the above case it is calculated by the formula below (it worked better than others!):
$$tf = \log(TermFrequency+1)$$
Another way to calculate the tf is (it is commented out, you can uncomment and comment the above one, if you decide to use the below one):
$$tf = \alpha + (1-\alpha) \cdot {TermFrequency \over \max\{f(w,d): w \in d\}}$$
Above method somewhat takes care of the bias created by document length. You can read more about it here.
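Both tf variants can be tried side by side with a small sketch like the one below. It is not taken from TfIdf.java: the names are made up, the augmented form is written in its common textbook shape α + (1 − α)·f / max f, and the idf is assumed to be the plain log(N / df), which the post does not spell out.

```java
// Hypothetical tf-idf helpers; the idf definition is an assumption, not from the post.
final class TfIdfSketch {

    /** Log-scaled term frequency: log(tf + 1). */
    static double logTf(int termFrequency) {
        return Math.log(termFrequency + 1);
    }

    /** Augmented term frequency: alpha + (1 - alpha) * tf / (max tf in the document). */
    static double augmentedTf(int termFrequency, int maxFrequencyInDoc, double alpha) {
        return alpha + (1 - alpha) * termFrequency / (double) maxFrequencyInDoc;
    }

    /** Assumed inverse document frequency: log(N / df). */
    static double idf(int totalDocs, int docsWithTerm) {
        return Math.log((double) totalDocs / docsWithTerm);
    }

    /** tf-idf score using the log-scaled tf. */
    static double tfIdf(int termFrequency, int totalDocs, int docsWithTerm) {
        return logTf(termFrequency) * idf(totalDocs, docsWithTerm);
    }

    public static void main(String[] args) {
        // Made-up counts: a term seen 9 times, present in 4 of 40 documents
        System.out.println(tfIdf(9, 40, 4));
        System.out.println(augmentedTf(9, 12, 0.5) * idf(40, 4));
    }
}
```

A term that appears in every document gets an idf of 0 under this definition, so it drops out of the keyword list regardless of which tf variant is used.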

As we know, tf-idf depends on multiple documents; a larger number of documents leads to better accuracy. The above results are based on only 40+ documents present @ https://github.com/vikasing/NiceText/tree/master/data

CODE:

Code for extracting the keyphrases from the text is present in https://github.com/vikasing/NiceText/blob/master/src/com/vikasing/nicetext/TfIdf.java

I would suggest taking the whole project (or forking it to play with it) to avoid any dependency issues. This is still a WIP; I'll release a lib (a jar) soon after adding more features and fixing some bugs.

30 June 2013

Ubuntu Fail

Recently I tried to switch permanently to Ubuntu. The logic was that I mainly do 3 things on my laptop: internet browsing, coding and watching movies/videos on VLC, and none of them requires Windows, so switching to Ubuntu made sense. It also has some tools which I wanted to learn, e.g. octave. I didn't want to lose my Windows 8 installation and wanted to install Ubuntu 12.04 LTS parallel to W8. An earlier attempt at installing Ubuntu had failed due to a lot of EFI related issues.

Ubuntu 12.04

Installation was smooth and I was able to get the grub screen, which displayed a number of options to choose from. I logged into Ubuntu. The familiar Unity interface came up, which I did not like, so I went ahead and installed GNOME. But it ran in fallback mode due to graphics driver issues. My laptop has hybrid graphics: ATI + Intel 4000. I looked on the web for the drivers but found nothing; Intel had stopped developing a graphics driver for 12.04 due to some dependency issues, but there was a driver for 12.10.

Another issue which I faced due to the lack of the graphics drivers was the pixelated video play in VLC. So I thought of upgrading from 12.04 to 12.10.

Ubuntu 12.10

After the upgrade was done, I could not get the grub; it got screwed somehow. I got a blank screen, nothing else. I ran the live cd and did a boot repair. It restored the grub, but when I selected the linux option in the grub and hit Enter, a blank screen appeared and kept appearing until I switched off the system. I tried recovery mode, which logged me into the command line mode. There was a message which said I could upgrade to 13.04 from 12.10. Knowing that Intel provided good support for 13.04, I decided to upgrade.

Ubuntu 13.04

Again I had to run boot repair to fix the grub issues. Even after that I was not able to log in; the blank screen didn't go away no matter what I did. This time I thought of installing 13.04 from scratch, a fresh installation. Everything went smoothly and I was able to log in. After using it for about half an hour I noticed a weird issue: my mouse pointer was moving a little out of place, and the motion was not precise. It seemed to be a driver issue with the Synaptics touchpad. The latest Synaptics driver was already installed. I tried tweaking the Synaptics settings, but nothing noticeable happened. And this issue was affecting my whole experience: I was not able to open the links I intended to open, I was not able to close windows etc. It was worse than the graphics issue I had in 12.04.

So I went back to 12.04 and this time I installed the Xubuntu desktop. It worked ok for some time until one day I had to connect my laptop to a projector. Due to the missing graphics driver only one display was supported; I selected the monitor option and my laptop screen went blank. When the job was done, I disconnected the projector hoping my laptop screen would light up, but it remained blank.

Xubuntu-desktop had another issue, the fan would start spinning at full speed after waking the laptop from suspension. I tried a patch which I found on the net, it did not help.

Linux Mint 15

Finally I thought of giving Mint a try, since it came with a lot of pre-installed software and drivers. I installed it after wiping the previous 12.04 installation. When I tried to boot I got a grub rescue screen; I tried many things to restore grub but failed every time. Now I was not able to boot into any of the systems, neither Windows 8 nor Mint 15. I gave up, removed grub, booted into W8 and formatted the Mint partition. Now I have decided to run linux on top of Windows, in VirtualBox.

Final Thoughts

Ubuntu 13.04 with GNOME has better looks than Windows 8, but when it comes to hardware support, many h/w companies don't care about this market segment (linux). GRUB is another rotten area in the Linux territory; it might work only if you are damn lucky. Linux guys need to work together to create a better alternative to GRUB. Finally, I still hope that one day I'll be able to make a permanent switch to Ubuntu or any other popular distribution.

27 June 2013

Angelhack 2013 Bangalore Experience

Angelhack 2013 was held at the Microsoft Research/Microsoft Accelerator for Azure office at Lavelle Road, Bangalore. We were given 23 hours to code/hack, from 2PM on June 22nd till 1PM the next day. My partner and I divided the work and started to code. Soon we realized that there were issues with wi-fi connectivity at the venue: there was no internet for a long time and we were expected to code/hack without it. After a couple of hours, when the technician could not sort out the issue, we were given generic wi-fi access, which was painfully slow and occasionally inaccessible. For important things I relied on my mobile data connection, and many others had USB modems. This situation did not improve at all.

Organizers were happy to help/instruct.
Facilities were good, probably the best I've seen in an IT office.

I slept around 3AM after a 2nd git commit. Some did not sleep. I woke up around 6:30AM. We resumed our work around 7AM.

Demo Time

The demo session started somewhere around 2:30PM. Each team was given 2 minutes for presentation + 1 minute for Q&A from the judges. We'd prepared a video of whatever we could finish. It was not even half baked, just featuring some HTML mockups and FB integration.

Judging (Only reason behind this post)

Initially, when the judges were ROFLing, making fun of the ideas and passing stupid comments, I was forced to think that these people really knew what they were talking about. But when a team presented an idea of querying data by typing human-readable questions in a text field, one judge (the dominating one!) started to compare it with Wolfram Alpha, and went on to say that Wolfram Alpha was not present in the Indian market, but when they come, your app wouldn't stand a chance.

Another team presented an app based on some medical data; it could tell you what disease you might have based on your symptoms. One judge suggested that the presenter participate in a Kaggle competition, "The Heritage Health Prize". BTW that event ended in April 2013. It may restart sometime this year, but no announcement has been made yet about the dates.

When we presented an idea for a generic platform for peer-to-peer learning, the judges responded with "many tried it and failed".

One guy presented an app which created a cinemagram of a video. He explained how it worked and claimed that it was faster than the apps present in the market. When the judges asked why it was faster, he simply said that he didn't know. Guess what? He won the hackathon.

Finding out that something like cinemagram already exists was just a Google search away. In fact there is a company named cinemagram which has apps on both the Android and iOS app stores. And when you search for cinemagram on the Play Store you get more than 60 apps.

Why am I pissed?

My hackathon partner has a startup and he told me that he'd met most of these judges at least once while pitching his product. They are well known in India's VC circle, kind of celebrities in the startup world in Bangalore.

This is what upsets me: at an event like a hackathon we don't need celebrity judges who can only focus on India when the event clearly wasn't about India. These people have a certain mindset, for example "this has failed before and would fail again" or "this already exists", along with ignorance about what has been happening around the world, like this cinemagram concept. The judges had great knowledge of the Indian market and showed complete ignorance of the outside world. Imagine the people who seek funding from these ignorant people. Lots of great ideas may be dying every day.

Now I am a bit concerned, because I'd be launching my startup in the near future, and if the whole Indian industry is filled with such idiot VCs, I am going to face a really hard time.