02 November 2012

Resolving grub rescue issue after Windows 8 installation

The issue occurs when you upgrade Windows 7 to Windows 8 on a machine where a Linux distribution (e.g. Ubuntu) was once installed alongside Windows 7. The grub rescue screen appears right after the first restart and doesn't let you boot into Windows. In my case the text looked like this:

error: unknown filesystem.
grub rescue>

This happens because grub cannot find the Linux partition, which you may have deleted earlier (for example by simply formatting the Linux disk partition). First, try these options:

1. Repair Windows start-up using the Windows 7 installation DVD.
2. Use the command prompt option on the Windows 7 installation DVD to fix the master boot record (MBR) [Link] & [Link]
3. Try removing grub using an Ubuntu live CD/DVD [Link]
4. Use the Ubuntu tool "Boot-Repair" [Link]

If none of the above options worked, use the Ubuntu live CD/DVD to install Boot-Repair. After installation, click on "Create a BootInfo summary".

This analyzes the boot information of all partitions and creates a report, which you access through the URL the program gives you. The URL will look like: http://paste.ubuntu.com/(a_number)/

The report is divided into many sections; look for the block that contains an entry mentioning grub.

For example, in my case the grub entry was located on the partition sda3:



sda3:__________________________________________________________________________
    File system:       vfat
    Boot sector type:  Unknown
    Boot sector info:  No errors found in the Boot Parameter Block.
    Operating System:  
    Boot files:        /EFI/Boot/bootx64.efi /EFI/ubuntu/grubx64.efi 
                       /EFI/Microsoft/Boot/bootmgfw.efi 
                       /EFI/Microsoft/Boot/bootmgr.efi 
                       /EFI/Microsoft/Boot/memtest.efi
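
If the report is long, the search for the grub entry can also be done mechanically. The following is only a rough sketch: it assumes the report has been saved as plain text and that every partition section starts with a header line like "sda3:____" as in the excerpt above. The class name, the method name and the sda2 lines in the sample are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GrubPartitionFinder {
    // Assumed header format: a device name, a colon, then a run of underscores.
    private static final Pattern HEADER = Pattern.compile("^\\s*(\\w+):_+\\s*$");

    /** Returns the partitions whose section mentions "grub" (case-insensitive). */
    static List<String> partitionsMentioningGrub(String report) {
        List<String> hits = new ArrayList<>();
        String current = null;
        for (String line : report.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                current = m.group(1); // a new partition section starts here
            } else if (current != null
                    && line.toLowerCase(Locale.ROOT).contains("grub")
                    && !hits.contains(current)) {
                hits.add(current);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shortened sample; the sda2 section is invented for contrast.
        String report = String.join("\n",
            "sda2:____________________",
            "    File system:       ntfs",
            "    Boot files:        /bootmgr /Boot/BCD",
            "",
            "sda3:____________________",
            "    File system:       vfat",
            "    Boot files:        /EFI/Boot/bootx64.efi /EFI/ubuntu/grubx64.efi",
            "                       /EFI/Microsoft/Boot/bootmgfw.efi");
        System.out.println(partitionsMentioningGrub(report));
    }
}
```

With the sample above, only sda3 is reported, because it is the only section whose boot files include a grub entry.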

In your case the grub boot file may be located on a different partition. Mount that partition on a newd directory (in the Ubuntu live CD/DVD session), creating the directory first if it does not exist:

sudo mkdir -p /newd
sudo mount /dev/sda3 /newd
Go to the newd directory and then to EFI by typing:
cd /newd/EFI/
Remove the whole ubuntu directory using the command:
sudo rm -r ubuntu
End the live Ubuntu session and restart. If you can boot into Windows, it worked! Otherwise look for other solutions on Google (one of them was re-installing Ubuntu, which fixes grub).

13 April 2012

Introduction to node.js

Recently I had to give a presentation on a technology, and I shortlisted two: a graph database (neo4j) and node.js. I had heard a lot about node.js but was not sure what it really was, so I chose it for the presentation. Here it is:


27 March 2012

Extracting meaningful text from webpages

I was trying to extract the meaningful text from a webpage at a given URL for crowl. For example, if I visit a news site to read a particular article, I find a lot of clutter around the news text: ads, related news stories, top news stories, comments on the article, links to other websites and much more.
Let's take this The Times of India article as an example:

 http://timesofindia.indiatimes.com/tech/news/hardware/83-year-old-woman-sues-Apple-for-1m/articleshow/12415012.cms

The useful text in The Times of India article makes up around 30% of the total content; the remaining 70% is clutter. You may argue that you need the links to the most popular stories, related stories etc. But still there is a lot of extra stuff that we really don't care about. Extracting the meaningful information from such a page is a nightmare. We can start by getting the HTML source and stripping the HTML tags from the text. Using regular expressions, let's remove all the links too. The resulting content looks like:

83-year-old woman sues Apple for $1m - The Times of India | The Times of India | | More More ADVERTISEMENT Hardware The Times of India The Times of India Indiatimes Web (by Google) Video Photos You are here:  »   »   » Hardware Breaking News: 83-year-old woman sues Apple for $1m The writer has posted comments on this articleANI | Mar 26, 2012, 04.42PM IST My Saved articles Read more:||||||| SHARE AND DISCUSS NEW YORK: An 83-year-old American woman has sued for 1 million dollars after she failed to see the glass door at the tech giant's office and smashed her face. Evelyn Paswall, a former Manhattan fur-company vice president, went to to return an on December 13. While approaching the store, Paswall didn't realize she was heading straight for a wall of glass. She smashed her face against it, breaking her nose, Paswall claims in her suit filed in the US Eastern District federal court. Now the Forest Hills, Queens, resident, Paswall claimed in her lawsuit that the company was negligent not elderly-proofing the store's see-through fa ade, The New York Post reports. She argues that Apple should have put marks on the glass that older people could spot before they come face-to-face with disaster. "The defendant was negligent . . . in allowing a clear, see-through glass wall and/or door to exist without proper warning," Paswall suit said. Hi ! Do you like this story? My saved articles RELATED COVERAGE Articles Blogs LATEST NEWS » ......

As you can see, the above text contains a lot of extra text that we don't want. Attempts have been made to extract just the main content; here is one such article: How to Extract a Webpage’s Main Article Content
The Java program used to get the above text (Jsoup can be downloaded from here):

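 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;

 public class ExtractText {
     public static void main(String[] args) throws Exception {
         // Matches whole anchor elements, including the link text
         String href = "(?is)<a\\b[^>]*>.*?</a>";
         Document doc = Jsoup.connect("http://timesofindia.indiatimes.com/tech/news/hardware/83-year-old-woman-sues-Apple-for-1m/articleshow/12415012.cms").get();
         String source = doc.html();
         // Drop the links, then let Jsoup strip the remaining tags
         source = source.replaceAll(href, "");
         System.out.println(Jsoup.parse(source).text());
     }
 }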
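
To see what removing links and tags with regular expressions does in isolation, here is a small sketch that needs neither Jsoup nor a network connection. The HTML snippet, the class name and the method name are made up for illustration, and regular expressions are a fragile way to handle real-world HTML, so treat this as a toy.

```java
import java.util.regex.Pattern;

class LinkStripDemo {
    // (?is): case-insensitive, and '.' also matches line breaks.
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>.*?</a>");
    // Any remaining tag, e.g. <p> or </div>.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    /** Removes anchor elements (with their text), then all other tags. */
    static String stripLinksAndTags(String html) {
        String noLinks = ANCHOR.matcher(html).replaceAll("");
        String noTags = TAG.matcher(noLinks).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical snippet, loosely modelled on the article.
        String html = "<p>A woman has sued <a href=\"/apple\">Apple</a> for $1m.</p>"
                    + "<div><a href=\"/more\">Related stories</a></div>";
        System.out.println(stripLinksAndTags(html));
    }
}
```

Note the side effect: words that were themselves links (here "Apple") disappear along with the navigation links, which is why the extracted text above has gaps such as "has sued for 1 million dollars".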

The best Java library I could find for getting the main text from a web page was boilerpipe, which can be tested here. It does a pretty good job of removing the clutter around the meaningful text. Running The Times of India news article link through boilerpipe gives the following text:

Tweet
NEW YORK: An 83-year-old American woman has sued Apple for 1 million dollars after she failed to see the glass door at the tech giant's office and smashed her face.
Evelyn Paswall, a former Manhattan fur-company vice president, went to Apple's Manhasset store to return an iPhone on December 13.
While approaching the store, Paswall didn't realize she was heading straight for a wall of glass.
She smashed her face against it, breaking her nose, Paswall claims in her suit filed in the US Eastern District federal court.
Now the Forest Hills, Queens, resident, Paswall claimed in her lawsuit that the company was negligent not elderly-proofing the store's see-through fa ade, The New York Post reports.
She argues that Apple should have put marks on the glass that older people could spot before they come face-to-face with disaster.
"The defendant was negligent . . . in allowing a clear, see-through glass wall and/or door to exist without proper warning," Paswall suit said.
Hi !

The above text is very close to what we want. The boilerpipe library is based on this paper. By combining Jsoup (to get the page title) with boilerpipe (to get the page content), we can get the meaningful content of a webpage.
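
To give a feel for why clutter can be told apart from article text at all, here is a crude sketch that keeps a block only if it has enough words and is not dominated by links. This is not boilerpipe's actual algorithm, only a toy heuristic in a similar spirit. The class name, the method names, the sample blocks and the thresholds (5 words, 30% link density) are all arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkDensityDemo {
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    static int wordCount(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    /** Replaces tags with spaces and collapses whitespace. */
    static String plainText(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /** Fraction of a block's words that sit inside links (0.0 for an empty block). */
    static double linkDensity(String blockHtml) {
        int linked = 0;
        Matcher m = ANCHOR.matcher(blockHtml);
        while (m.find()) {
            linked += wordCount(plainText(m.group(1)));
        }
        int total = wordCount(plainText(blockHtml));
        return total == 0 ? 0.0 : (double) linked / total;
    }

    /** Keeps the text of blocks that are long enough and not dominated by links. */
    static List<String> keepContentBlocks(List<String> blocks, int minWords, double maxLinkDensity) {
        List<String> kept = new ArrayList<>();
        for (String block : blocks) {
            String text = plainText(block);
            if (wordCount(text) >= minWords && linkDensity(block) <= maxLinkDensity) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical blocks: a navigation bar, an article sentence, a leftover widget.
        List<String> blocks = Arrays.asList(
            "<a href=\"/a\">Top stories</a> <a href=\"/b\">Related news</a>",
            "<p>An 83-year-old American woman has sued <a href=\"/apple\">Apple</a> for 1 million dollars.</p>",
            "<p>Hi !</p>");
        for (String text : keepContentBlocks(blocks, 5, 0.3)) {
            System.out.println(text);
        }
    }
}
```

With these sample blocks only the article sentence survives: the navigation bar consists entirely of links, and "Hi !" is too short. Unlike the link-removal approach, the linked word "Apple" stays in the text, because whole blocks are judged rather than individual links.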