
How-To and Frequently Asked Questions


What is this all about?

 

BioDB100 is an initiative from the Asia Pacific Bioinformatics Network (APBioNet), to build at least 100 interoperable MIABi standard-compliant bioinformatics databases.

This platform of databases will enable participants to

  1. Save, compress and archive their database at a particular point in time
  2. Re-instantiate their database at some future point in time, e.g. when the current database is discontinued
  3. Allow others to search for their database with keywords
  4. Allow others to re-instantiate their database on demand
  5. Create new databases and services with multiple instantiated databases

Announced by the President of APBioNet, Shoba Ranganathan, during the International Conference on Bioinformatics InCoB 2010 in Tokyo, this initiative will eventually apply to all papers accepted for InCoB conferences which have database content or webservices.

Everyone bemoans the loss of data, and readers are frequently frustrated when a paper points to a database that is no longer accessible or whose services have been discontinued. That need not happen if you have saved a version of your database on BioDB100.

Who can participate?

 

Anyone who has created a bioinformatics database or web service can participate in this project by contributing the database to this platform and joining the BioDB100 Database Development Team (contact tinwee --at-- bic.nus.edu.sg).

What do I benefit from participation?

 

The benefits of participation are many. If you have published a paper on a bioinformatics database and are maintaining it on your own server, how long do you think it will stay running? Statistically, the odds are stacked against you. Why not make a snapshot of your database and deposit the entire database and all its services in a re-instantiable form?

If your database shuts down, anyone, including yourself, can re-instantiate the original database on the BioDB100 cloud within minutes.

When the platform is fully operational, members of the Database Development Team will be able to create new databases by accessing other databases in an interoperable manner, devise new ways of looking at old data, and carry out data mining and knowledge discovery on this interoperable platform.

How can I make a new BioSlax Module?

 

First, understand what BioSlax is (see BioSlax.com). It is a freely downloadable edition of Slax, a live operating system based on Slackware, packed with bioinformatics software.

The live operating system originated as a LiveCD: a volatile operating system that runs in memory rather than being fully installed like a regular Linux system. When the power is shut off or the OS powers down, everything you have done on the OS disappears and BioSlax reverts to its original pristine condition.

However, BioSlax comes with a feature that allows the current state of the OS to be preserved. Because it is a live system, the entire run-time OS lives in memory, including a filesystem holding all the files and folders. Every new file and every modification made since the LiveOS was instantiated can be saved as a snapshot and compressed with LZMA into a BioSlax module, the LZM module (use the savechanges command). This LZM module can be inserted into a new LiveOS instance of BioSlax and activated manually (using the activate command). All the new files are written over the existing run-time folders, and the appropriate commands are run to restart whatever services (e.g. the apache server) rely on the new files.

If you wish to activate it at boot time, simply copy the LZM module file into the /bioslax/modules folder of the bootable image. When the LiveOS boots up, it goes through the /bioslax/modules folder and activates each LZM file in alphabetical order. A file in a later LZM module will clobber (overwrite) any copy of the same file from an earlier module, so please be careful.

Thus, more than one LZM file can be activated in sequence at boot-up time.
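
Because the activation order decides which module wins when two modules carry the same file, it can help to preview that order before copying modules into the bootable image. The sketch below is only an illustration: the function name list_activation_order is hypothetical, and it assumes that the boot-time "alphabetical order" matches a plain byte-wise sort, which has not been confirmed against the BioSlax boot scripts.

```shell
#!/bin/sh
# Print the .lzm files of a modules folder in byte-wise alphabetical order.
# ASSUMPTION: this matches the order used at boot time.
list_activation_order() {
    modules_dir=$1
    for f in "$modules_dir"/*.lzm; do
        [ -e "$f" ] || continue    # skip the unexpanded pattern when no module exists
        basename "$f"
    done | LC_ALL=C sort           # LC_ALL=C makes the order locale-independent
}
```

Calling list_activation_order /bioslax/modules prints one module name per line; under the stated assumption, the module printed last is the one whose files survive a conflict.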

How big a module can I upload?

 

The LZM module upload limit is 500MB. An LZM holds compressed data, so all the files that go into the module (HTTP files, binaries, required libraries and the actual database itself) must together fit within 500MB once compressed. Submitters who need to upload larger database files can contact the system administrator.
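
A quick local check of the module size before uploading can save a failed submission. The following is a minimal sketch: the function name lzm_within_limit is hypothetical, and it assumes that 1MB means 1024*1024 bytes, which the upload form may or may not use.

```shell
#!/bin/sh
# Upload limit quoted in this FAQ.
# ASSUMPTION: 1MB is counted as 1024*1024 bytes.
LZM_LIMIT_BYTES=$((500 * 1024 * 1024))

# Return 0 if the file is within the limit, 1 if it is too big,
# 2 if the file does not exist.
lzm_within_limit() {
    file=$1
    limit=${2:-$LZM_LIMIT_BYTES}          # optional second argument overrides the limit
    [ -f "$file" ] || return 2
    size=$(wc -c < "$file" | tr -d ' ')   # size in bytes; strip padding some wc versions add
    [ "$size" -le "$limit" ]
}
```

For example, lzm_within_limit nameofmydb.lzm || echo "module too large" reports a problem only when the module is over the limit or missing.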

How to test the insertion and activation of the LZM module?

 

Once you have saved all the database/service specific files into an lzm compressed file as nameofmydb.lzm, you will need to test if it really works on boot up. To do this, save the lzm module on your home machine or desktop by file transfer. (It is always good practice to save copies everywhere, or at least a copy somewhere.)

Reboot your bioslax instance or virtual machine or LiveCD. Copy the lzm module into your home directory, and issue the command as root superuser:

# activate nameofmydb.lzm

The activate command will unsquash everything in the lzm module file and write it into the existing file structure of the operating system. So a file stored in the module as etc/httpd/httpd.conf will be written over /etc/httpd/httpd.conf (provided it is not in use). If you need to add a ScriptAlias or otherwise modify httpd.conf, it is recommended that you create a new file, say mydb.conf, and put all your configuration directives in there, e.g.

ScriptAlias /mgalign-cgi "/var/www/cgi-bin/mgalign-cgi"
<Directory "/var/www/cgi-bin/mgalign-cgi">
    Options None
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

Then save this file as mydb.conf and copy it into etc/httpd/alias/mydb.conf in your module directory before running dir2lzm. Any conf files in the /etc/httpd/alias folder will be detected during the "activate" command, and an "include" statement for each will be appended to /etc/httpd/httpd.conf. When the apache httpd server is restarted automatically, the new directives will be processed. This is the recommended way of modifying the behaviour of httpd.

In other words, if you have to remap the root document folder or alias cgi-bin directories while porting over to bioslax, this is a good way of doing it. Take the current httpd.conf and identify what needs to be changed. Put the directives inside a separate conf file, copy it into the etc/httpd/alias folder of your module directory, and run dir2lzm on that directory. Upon activation, the conf file will be copied into the /etc/httpd/alias folder, and the directives in your new config file will override all affected httpd configurations.

This is safer than editing httpd.conf directly, because otherwise you would have to wait for httpd to shut down, copy the new file over httpd.conf, and then restart httpd. Without the restart, the httpd server will not read the new configuration; or worse, the "activate" command may fail to overwrite the old httpd.conf.
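
The separate-file approach can be prepared in the module directory with a few lines of shell. This is a sketch under stated assumptions, not part of BioSlax: the function name stage_alias_conf and its arguments are hypothetical, and it writes only the ScriptAlias line; any further directives your service needs would be added to the same file.

```shell
#!/bin/sh
# Write an Apache alias snippet into a staging tree so that dir2lzm
# packs it as etc/httpd/alias/<name>.conf.
stage_alias_conf() {
    stage=$1      # staging directory, e.g. /tmp/mydb
    name=$2       # snippet name (hypothetical), e.g. mydb
    url_path=$3   # URL prefix, e.g. /mgalign-cgi
    cgi_dir=$4    # CGI directory on the target, e.g. /var/www/cgi-bin/mgalign-cgi

    mkdir -p "$stage/etc/httpd/alias" || return 1
    # The unquoted heredoc delimiter lets the shell expand the variables.
    cat > "$stage/etc/httpd/alias/$name.conf" <<EOF
ScriptAlias $url_path "$cgi_dir"
EOF
}
```

Running stage_alias_conf /tmp/mydb mydb /mgalign-cgi /var/www/cgi-bin/mgalign-cgi before dir2lzm leaves httpd.conf itself untouched in the module.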

How to convert a directory into an LZM module?

 

The dir2lzm command is a bioslax-specific utility. If you store all your files in a folder, say /tmp/mydb/*, dir2lzm will be able to compress and squash everything in that directory if you issue the command

$  dir2lzm
Convert directory tree into .lzm compressed module
usage: /usr/bin/dir2lzm source_directory output_file.lzm
$  dir2lzm /tmp/mydb  mydb.lzm

If you save all files in /tmp/mydb/, e.g.
/tmp/mydb/foofolder/blah.txt
/tmp/mydb/barfolder/blah.txt
/tmp/mydb/usr/local/bin/myprog
/tmp/mydb/var/www/htdocs/my.html
/tmp/mydb/var/www/cgi-bin/my.cgi

dir2lzm (run from /tmp) will generate /tmp/mydb.lzm, which you can transfer to a new bioslax instance. Running activate mydb.lzm there will insert the files at the root level, i.e.
/foofolder/blah.txt
/barfolder/blah.txt
/usr/local/bin/myprog
/var/www/htdocs/my.html
/var/www/cgi-bin/my.cgi
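
The mapping from staging directory to root level can be previewed before the module is built. The sketch below uses only standard tools; the function name preview_target_paths is hypothetical, and it only lists regular files (directories and symbolic links are not shown).

```shell
#!/bin/sh
# List every regular file under a staging directory as the absolute
# path it would occupy once the module is activated.
preview_target_paths() {
    # find prints paths such as ./usr/local/bin/myprog; sed drops the
    # leading dot so that they read as root-level paths.
    ( cd "$1" && find . -type f | sed 's|^\.||' | LC_ALL=C sort )
}
```

For the layout above, preview_target_paths /tmp/mydb would list /barfolder/blah.txt, /foofolder/blah.txt and so on, one per line.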

To check that the directories are correct, you can unpack the module into a folder and verify the paths, using the lzm2dir command.

$ lzm2dir
Convert .lzm compressed module back into directory tree
usage: /usr/bin/lzm2dir source_file.lzm existing_output_directory
$ mkdir testfolder
$ lzm2dir mydb.lzm testfolder
$ ls -al testfolder
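
Instead of checking the unpacked folder by eye, its file list can be compared with that of the original staging directory. This is a minimal sketch with hypothetical function names (tree_listing, same_file_list); it compares file paths only, not file contents or permissions.

```shell
#!/bin/sh
# Print the relative paths of all regular files in a tree, sorted.
tree_listing() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Return 0 when both trees contain the same set of file paths,
# 1 when they differ, 2 when a directory cannot be read.
same_file_list() {
    listing_a=$(tree_listing "$1") || return 2
    listing_b=$(tree_listing "$2") || return 2
    [ "$listing_a" = "$listing_b" ]
}
```

After unpacking with lzm2dir, same_file_list /tmp/mydb testfolder && echo "paths match" confirms that nothing was lost or misplaced.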

Example of how to convert a webservice into an LZM module?

 

Porting over MGALIGNIT service

Let's take as an example a simple web service, the MGAlignIt web tool (Lee, Tan and Ranganathan, 2003):

Lee, B.T.K., Tan, T.W. and Ranganathan, S. (2003). MGAlignIt: a web service for the alignment of mRNA/EST and genomic sequences. Nucleic Acids Research, 31(13), 3533-3536.

This Web service is extant at http://proline.bic.nus.edu.sg/mgalign. As a best practice for ensuring re-instantiability should this Web service end, MGAlignIt has been deposited with biodb100.apbionet.org on a BioSlax 7.5 cloud-reinstantiable OS base (ISO bootable image from http://www.bioslax.org).

A compressed tar file (mg.tar.gz) of all MGAlign programs and documents, containing the MGAlign-specific files from /etc, /usr and /var, was transferred from proline.bic.nus.edu.sg.

This was copied into a BioSlax 7.5 instance (137.132.19.172) on the biodb100 cloud, instantiated using http://vmc.apbionet.org/

MGAlign Files

1. Web configuration (/etc)

/etc/www/httpd/httpd.conf was edited to do some aliasing to the appropriate cgi-bin directory

2. MGAlignIt uses Python 2.5 (/usr)

Python modules not already in the base OS BioSlax 6.5 were added, in particular the PIL module found in site-packages: /usr/lib/python2.5/site-packages/* and the PIL python executables in /usr/bin/*

3. MGAlignIt webservice files (/var)

These are found in two directories

  • /var/www/cgi-bin/
  • /var/www/htdocs/

3a. MGAlignIt web documents in /var/www/htdocs
Only two folders are involved:

3a.i MGAlign files /var/www/htdocs/mgalign
proline:/var/www/htdocs/mgalign/* were transferred to
bioslax:/var/www/htdocs/mgalign

3a.ii Python scripts and files /var/www/htdocs/Python
proline:Python modules were transferred into
bioslax:/var/www/htdocs/Python

3b. MGAlignIt cgi executables and libraries in /var/www/cgi-bin
Only one folder is involved:
proline:/var/www/cgi-bin/mgalign-cgi/* moved to
bioslax:/var/www/cgi-bin/mgalign-cgi

Check all hyperlinks and all cgi-bin scripts for file pointers specific to the local machine. Modify them manually so that hyperlinks are relative (in this case all were already compatible) and cgi-bin file pointers are portable.
For example, cgi-bin Python scripts that import modules kept in some other location need the full path of that location added to their module search path:

sys.path.append("/var/www/htdocs/Python/modules")
sys.path.append("/var/www/htdocs/Python/projects")
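
Machine-specific pointers are easier to find with a recursive search than by opening every script. The sketch below is an illustration only: the function name find_hardcoded_pointers is hypothetical, and the strings worth searching for (an old host name, an old home directory) depend on your own service.

```shell
#!/bin/sh
# List the lines in a web tree that contain a given fixed string, so
# they can be reviewed and made portable by hand.
# Output format: file:line-number:matching line
find_hardcoded_pointers() {
    tree=$1       # directory to scan, e.g. /var/www/cgi-bin/mgalign-cgi
    pattern=$2    # fixed string to look for, e.g. proline.bic.nus.edu.sg
    # -r recurse, -n show line numbers, -F treat the pattern literally
    grep -rnF -- "$pattern" "$tree"
}
```

The function returns a non-zero status when nothing matches, so find_hardcoded_pointers /var/www/cgi-bin/mgalign-cgi proline.bic.nus.edu.sg || echo "no pointers found" doubles as a final check.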

All these changes were conveniently collected in the folder /mnt/live/memory/changes of the BioSlax 7.5 instance (137.132.19.172), from which the module was assembled:

# mkdir /tmp/mg
# mkdir /tmp/mg/etc; mkdir /tmp/mg/etc/httpd
# cp /mnt/live/memory/changes/etc/httpd/httpd.conf /tmp/mg/etc/httpd/
# mkdir /tmp/mg/usr; cp -Rp /mnt/live/memory/changes/usr/* /tmp/mg/usr
# mkdir /tmp/mg/var; cp -Rpv /mnt/live/memory/changes/var/www /tmp/mg/var
# cd /tmp; dir2lzm mg mgalign.lzm

Although using the command "savechanges" will save everything in /mnt/live/memory/changes/* as an lzm instantly, it includes a lot of unnecessary non-MGAlign files. Using the above manual approach will give you an opportunity to identify files which are necessary to the webservice and also for you to detect and modify hard-coded pointers to machine-specific files which need to be ported to bioslax.
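
The manual approach can be wrapped in a small helper that copies only the named subtrees out of the changes folder. This is a sketch, not a BioSlax tool: the function name stage_changes is hypothetical, and which subtrees to name is a decision you still have to make by inspecting your own service.

```shell
#!/bin/sh
# Copy selected files or subtrees from the live "changes" area into a
# staging directory, keeping their paths relative to the root.
stage_changes() {
    changes=$1    # e.g. /mnt/live/memory/changes
    stage=$2      # e.g. /tmp/mg
    shift 2
    for rel in "$@"; do               # e.g. usr var/www etc/httpd/httpd.conf
        if [ ! -e "$changes/$rel" ]; then
            echo "missing: $changes/$rel" >&2
            return 1
        fi
        parent=$(dirname "$rel")      # "." for a top-level entry such as usr
        mkdir -p "$stage/$parent" || return 1
        cp -Rp "$changes/$rel" "$stage/$parent/" || return 1
    done
}
```

For the MGAlignIt example this would be stage_changes /mnt/live/memory/changes /tmp/mg usr var/www, followed by dir2lzm as before; anything not named stays out of the module.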

Important Note:
If you wish to modify the httpd.conf, there is an undocumented bug in the liveCD that does not seem to allow the httpd.conf to be overwritten. The solution is to save every new httpd configuration into a separate file, e.g. mydb.conf, and save it as /tmp/mg/etc/httpd/alias/mydb.conf. When you boot up the VM instance and transfer your mydb.lzm into it, the web cgi will shut down the httpd server, sweep the /etc/httpd/alias folder, detect any new file (in this case mydb.conf), construct an include statement for it, append that statement to /etc/httpd/httpd.conf, and restart the httpd server. The restarted server reads the new httpd.conf and, at its last line, follows the include to read the latest configuration from /etc/httpd/alias/mydb.conf.

How can I run a command when I activate an lzm module?

 

When you instantiate a server, our cgi scripts will remotely boot up the bioslax base OS and, once it has booted, copy the lzm module you have chosen to the new instance by file transfer, saving it into /mnt/hda2/modules. They will then stop the httpd server, run the "activate" command, and restart the httpd server.

The "activate" command will union-write the files in the lzm module into the live operating system on the fly. In other words, it will write the compressed files of the lzm module into the /mnt/live/memory/changes/ folders.

Then, it will run all commands in /etc/rc.d/init.d/*. One of them is the rc.update_httpd_conf command, which sweeps the /etc/httpd/alias folder for any new conf files and inserts an include statement at the bottom of httpd.conf for each. The httpd server reads the config files in the alias folder when it is restarted by the next command in the "activate" script (mentioned in a previous FAQ).

So if you insert executables into the /etc/rc.d/init.d folder (making sure you have made them executable with chmod +x), all those commands will be executed in alphabetical order. This is useful if you have to run a command upon "activate", i.e. if you have to insert the files and then run a command during instantiation, use this method.
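
Such a start-up command can be placed into the module directory before running dir2lzm. The sketch below is only an illustration: the function name stage_init_script and the script name rc.mydb are hypothetical, and the body of the script is read from standard input.

```shell
#!/bin/sh
# Place an executable start-up script into a staging tree so that
# dir2lzm packs it as etc/rc.d/init.d/<name>.
stage_init_script() {
    stage=$1     # staging directory, e.g. /tmp/mydb
    name=$2      # script name (hypothetical), e.g. rc.mydb
    target="$stage/etc/rc.d/init.d/$name"

    mkdir -p "$stage/etc/rc.d/init.d" || return 1
    cat > "$target" || return 1      # the script body comes from standard input
    chmod +x "$target"               # without this the script would not be run
}
```

For example, stage_init_script /tmp/mydb rc.mydb < my_startup.sh stages an existing script; since execution follows alphabetical order, the chosen name also decides when it runs relative to the other scripts in the folder.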


A cloud-based initiative of APBioNet and NUS