April 10, 2009

How to download the contents of a site?

Sites such as http://www.nios.ac.in/sec_cour.htm have a vast resource of books and other online material which is often inaccessible, either because the servers are down or for some other pathetic reason. Further, many profs and teachers share their course material and lecture notes online, for example, http://faculty.ccp.edu/faculty/dsantos/lecture_notes.html. Clicking through each page requires patience that many of us, including lil ol me, lack, and I am not too comfortable with tools such as FlashGot, a Firefox plugin. I prefer good old wget.

A simple way to mirror the contents of a site is wget -mk: -m mirrors the site recursively, and -k converts the links in the downloaded pages so that they point to the local copies, which means you can view all the pages even when you are offline. However, many links point back up to the parent, for example, the prof might link back to the university website, which can drag a lot of unrelated stuff into the download. I am not sure I am completely correct on this point, but I avoid such problems with the -np switch of wget, which ensures that I do not fetch pages above the one I started from, i.e., I only traverse down towards the leaf nodes. So, if you have to download notes from a particular site:
1. Open your favourite terminal
2. Create a new folder to avoid a messy directory structure, e.g. mkdir -p /home/ashwin/Books/nios-notes
3. cd /home/ashwin/Books/nios-notes
4. wget -np -mk http://www.nios.ac.in/sec_cour.html
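
In case the short switches look cryptic, here is the same command with the long options spelled out; it does exactly the same thing, just easier to remember when you come back to it later:

wget --no-parent --mirror --convert-links http://www.nios.ac.in/sec_cour.html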

Enjoy a few Tom and Jerry cartoons, depending on your internet bandwidth, and voila, the notes are mirrored, with all the links in the downloaded pages pointing to the local files.
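
One more note: if you only care about the actual documents (most lecture notes end up as PDFs anyway) and want to be a little gentle on the server, something along these lines should work. This is just a sketch: -A restricts the download to the listed extensions, -w pauses between requests, and the two-second wait and the pdf,ps list are values I picked, not anything the sites above require:

wget -r -np -A pdf,ps -w 2 http://faculty.ccp.edu/faculty/dsantos/lecture_notes.html

With -A the HTML pages are still fetched so that wget can follow their links, but they are deleted afterwards since they do not match the accept list, so you end up with just the documents rather than a browsable mirror.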