How-To and Frequently Asked Questions
BioDB100 is an initiative of the
Asia Pacific Bioinformatics Network (APBioNet)
to build at least 100 interoperable, MIABi standard-compliant databases.
This platform of databases will enable participants to:
- Save, compress and archive their database at a particular point in time
- Re-instantiate their database at some future point in time, e.g. when
the current database is discontinued
- Allow others to search for their database with keywords
- Allow others to re-instantiate their database on demand
- Create new databases and services with multiple instantiated databases
Announced by the President of APBioNet, Shoba Ranganathan, during
the International Conference on Bioinformatics InCoB 2010 in Tokyo,
this initiative will eventually apply to all papers accepted
for InCoB conferences which have database content or webservices.
Everyone bemoans the loss of data, and readers are frequently frustrated
when a paper they have read points to a database which is
no longer accessible or has had its services discontinued. That need not happen if
you have saved a version of your database on BioDB100.
Anyone who has created a bioinformatics database or web service can
participate in this project by contributing their database to this
platform and joining the BioDB100 Database Development Team (contact
tinwee --at-- bic.nus.edu.sg).
The benefits of participation are many. If you have published
a paper on a bioinformatics database and are maintaining it
on your own server, how long do you think it will stay running?
Statistically, the odds are stacked against you. Why not make
a snapshot of your database, deposit the entire database and
all its services in a re-instantiable form?
If your database shuts down, anyone, including yourself,
can re-instantiate the original database
within minutes on our BioDB100 cloud.
When the platform is fully operational, and if you are a member of the Database
Development Team, you can create new databases by accessing
other databases in an interoperable manner, create new
ways of looking at old data, and carry out data mining
and knowledge discovery on this interoperable platform.
First, understand what BioSlax is
It is an open, downloadable version of Slax packed with bioinformatics software;
Slax is a Slackware-based live operating system.
The live operating system originated as a LiveCD:
an operating system that is volatile, runs in memory and
is not a fully installed version of a Linux operating system.
When the power is shut off or the OS powers down,
everything you have done on the OS disappears and
BioSlax reverts to its original pristine condition.
However, BioSlax comes with a feature that allows the
current state of the OS to be preserved. Because it
is a live system, the entire run-time OS lives in memory.
All the files and folders are in a filesystem in memory,
which can be saved and compressed into a BioSlax module
using LZMA compression technology. This special BioSlax module
(an LZM module) is a snapshot of every new file and every
modification made since the LiveOS was instantiated,
and is created with the savechanges command. The
LZM module can be inserted into a new LiveOS instance
of BioSlax and activated manually
(using the activate command). All the new files
will be written over the existing run-time folders, and
the appropriate commands are run to restart whatever
services, e.g. the apache server, rely on the new
If you wish to activate it at boot time, all you have
to do is copy the LZM module file into a specific folder
of the bootable image, the /bioslax/modules folder. When
the LiveOS boots up, it goes through the /bioslax/modules folder
and, in alphabetical order, activates each LZM file
in the folder in turn. If several modules contain the same file, the
last LZM file will clobber (overwrite) every
previous copy of that file, so please be careful.
Thus, more than one LZM file can be
activated in sequence at boot-up time.
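
Because the activation order decides which copy of a file wins, it can help to preview that order before booting. The following is a minimal sketch, assuming the boot-time sweep sorts file names in plain byte order (the exact collation BioSlax uses is not stated here). list_module_order is a hypothetical helper, not a BioSlax command.

```shell
#!/bin/sh
# Hypothetical helper: print the .lzm files in a modules folder in the
# order an alphabetical sweep would visit them.
# Assumption: byte-wise (C locale) sorting.
list_module_order() {
    dir="$1"
    for f in "$dir"/*.lzm; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
        basename "$f"
    done | LC_ALL=C sort
}
```

On a BioSlax image this could be called as list_module_order /bioslax/modules. The last name printed is the module whose files would overwrite any earlier copies.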
The LZM module upload limit is 500MB. LZMs are compressed data,
so all files that relate to the LZM module (http files, binaries,
required libraries, etc., including the actual database itself)
must fit within 500MB once compressed into the module. However, submitters can contact the
system administrator if larger database files need to be uploaded.
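
A quick size check before uploading can save a failed transfer. This is a sketch only: it assumes "500MB" means 500 * 1024 * 1024 bytes, which the text does not specify, and within_upload_limit is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: the 500MB limit is interpreted as 500 * 1024 * 1024 bytes.
LIMIT_BYTES=$((500 * 1024 * 1024))

# Hypothetical helper: succeeds (exit 0) if the file is within the limit,
# fails with 1 if it is too large, or with 2 if the file does not exist.
within_upload_limit() {
    [ -f "$1" ] || return 2
    size=$(wc -c < "$1" | tr -d ' ')   # byte count, whitespace stripped
    [ "$size" -le "$LIMIT_BYTES" ]
}
```

For example, within_upload_limit nameofmydb.lzm && echo "OK to upload".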
Once you have saved all the database/service-specific files into
an lzm compressed file, e.g. nameofmydb.lzm, you will need to test
whether it really works on boot-up. To do this, save the lzm module
on your home machine or desktop by file transfer. (It is always
good practice to save copies everywhere, or at least a copy
Reboot your BioSlax instance, virtual machine or LiveCD.
Copy the lzm module into your home directory, and issue the command
as root superuser:
# activate nameofmydb.lzm
The activate command will unsquash everything in the lzm module file
and write it into the existing file structure of the operating system.
So a file stored in the module as etc/httpd/httpd.conf will get written over
/etc/httpd/httpd.conf (provided it is not being used). Note that
if you have a ScriptAlias in httpd.conf or something else you need to
modify, it is recommended that you create a new file, say mydb.conf,
and put all your configuration directives in there, e.g.
ScriptAlias /mgalign-cgi "/var/www/cgi-bin/mgalign-cgi"
Allow from all
Then, save this file as mydb.conf and
copy it to etc/httpd/alias/mydb.conf inside the folder you are packing
before running dir2lzm. Any conf files in the
etc/httpd/alias folder will be detected during
the "activate" command, and an "include" statement for each
is appended to /etc/httpd/httpd.conf. When the
apache httpd server is restarted automatically,
the new directives will be processed.
This is the recommended way of modifying
the behaviour of httpd.conf.
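
As an illustration of the recommendation above, the sketch below writes a separate config file into a staging folder so that it lands in /etc/httpd/alias on activation. Only the ScriptAlias and "Allow from all" lines come from the example above. The enclosing Directory block (Apache 2.2-style access control), the staging path and the helper name write_alias_conf are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create etc/httpd/alias/mydb.conf under a staging
# folder (e.g. /tmp/mydb) that will later be packed with dir2lzm.
# Assumption: wrapping "Allow from all" in a <Directory> block is our
# guess at a valid context for that directive, not the author's file.
write_alias_conf() {
    stage="$1"
    mkdir -p "$stage/etc/httpd/alias" || return 1
    cat > "$stage/etc/httpd/alias/mydb.conf" <<'EOF'
ScriptAlias /mgalign-cgi "/var/www/cgi-bin/mgalign-cgi"
<Directory "/var/www/cgi-bin/mgalign-cgi">
    Allow from all
</Directory>
EOF
}
```

Calling write_alias_conf /tmp/mydb would leave the main httpd.conf untouched and keep all service-specific directives in one small file.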
In other words, if you have to do some remapping of the
root document folder or aliasing of cgi-bin directories
while porting over to BioSlax, this is a good way
of doing it. Take the current httpd.conf and identify
what needs to be changed. Put the directives inside
a separate conf file, copy it into the etc/httpd/alias
folder, and run dir2lzm on the folder; upon activation,
the conf file will be copied into the /etc/httpd/alias folder.
Thus the directives in your new config file will override
all affected httpd configurations.
This is a safer way of adding changes to your httpd.conf
because otherwise you would have to wait for httpd to shut down,
copy the new file over httpd.conf, and then
restart httpd. If you do not, the httpd server will not read the
new configuration in httpd.conf; or worse, the "activate"
command may fail to overwrite the old httpd.conf.
The dir2lzm command is a BioSlax-specific utility. If you store
all your files in a folder, say /tmp/mydb/, dir2lzm can
compress and squash everything in that directory:
Convert directory tree into .lzm compressed module
usage: /usr/bin/dir2lzm source_directory output_file.lzm
$ dir2lzm /tmp/mydb mydb.lzm
If you save all files in /tmp/mydb/, e.g.
it will generate /tmp/mydb.lzm, which you can transfer to
a new BioSlax instance. Activating mydb.lzm there
will insert the files at the root level, i.e.
To check that the directories are correct, you can unpack the module
into a folder and verify the paths using the lzm2dir command.
Convert .lzm compressed module back into directory tree
usage: /usr/bin/lzm2dir source_file.lzm existing_output_directory
$ mkdir testfolder
$ lzm2dir my.lzm testfolder
$ ls -al testfolder
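
The same path check can be made on an unpacked or staging folder without reading through a long listing. A minimal sketch, with preview_targets as a hypothetical helper name: it prints each file as the absolute path it would occupy if the folder were activated at the root level.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under a folder as the
# absolute path it would have once the module is activated at "/".
preview_targets() {
    ( cd "$1" && find . -type f | sed 's|^\.||' | LC_ALL=C sort )
}
```

For example, preview_targets testfolder should show paths such as /etc/httpd/alias/mydb.conf. A path that begins with an extra folder level usually means the module was packed from the wrong directory.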
Porting over MGALIGNIT service
Let's take as an example a simple web service, the MGAlignIt web tool from
Lee, B.T.K., Tan, T.W. and Ranganathan, S. (2003).
MGAlignIt: a web service for the alignment of mRNA/EST and genomic sequences. Nucleic Acids Research, 31(13), 3533-3536.
This Web service is extant at
As a best practice for ensuring re-instantiability should this
web service be discontinued, MGAlignIt has been deposited with biodb100.apbionet.org
on a BioSlax 7.5 cloud-reinstantiable OS base (ISO bootable image
A fully compatible MGAlign tar compressed file of all MGAlign
programs and documents was transferred from proline.bic.nus.edu.sg
(mg.tar.gz), copying specific files from MGAlign in /etc /usr and /var
This was copied into a BioSlax 7.5 instance (126.96.36.199)
on biodb100 cloud by instantiation using http://vmc.apbionet.org/
1. Web configuration (/etc)
/etc/www/httpd/httpd.conf was edited to
do some aliasing to the appropriate cgi-bin directory
2. MGAlignIt uses Python 2.5 (/usr)
Python modules not already in the base OS BioSlax 6.5 were added,
in particular, the PIL module found in site-packages:
and the PIL python executables in
3. MGAlignIt webservice files (/var)
These are found in two directories
3a. MGAlignIt web documents in /var/www/htdocs
Only two folders are involved:
3a.i MGAlign files /var/www/htdocs/mgalign
proline:/var/www/htdocs/mgalign/* were transferred to
3a.ii Python scripts and files /var/www/htdocs/Python
proline:Python modules were transferred into
3b. MGAlignIt cgi executables and libraries in /var/www/cgi-bin
Only one folder is involved:
proline:/var/www/cgi-bin/mgalign-cgi/* moved to
Check all hyperlinks and all cgi-bin scripts for file
pointers specific to the local machine. Modify them
manually to make the hyperlinks relative (in this case all were
compatible) and the cgi-bin file pointers portable.
For example, change the full paths that the cgi-bin Python
scripts use to include modules which are kept in
some other location.
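
Searching for machine-specific strings can be partly automated. The sketch below wraps a recursive, fixed-string grep; find_hardcoded is a hypothetical helper, and grep -r is assumed to be available (it is in GNU, BSD and BusyBox grep, but is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical helper: report file name, line number and text of every
# line under a directory that mentions a machine-specific string,
# such as a host name or an absolute path.
find_hardcoded() {
    dir="$1"
    pattern="$2"
    grep -rnF -- "$pattern" "$dir"
}
```

For example, find_hardcoded /var/www/cgi-bin/mgalign-cgi proline.bic.nus.edu.sg lists every script line that still refers to the original host. The function returns a non-zero status when nothing is found.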
These changes were conveniently found in the BioSlax 7.5 instance,
188.8.131.52, in the folder
# mkdir /tmp/mg
# mkdir /tmp/mg/etc; mkdir /tmp/mg/etc/httpd
# cp /mnt/live/memory/changes/etc/httpd/httpd.conf /tmp/mg/etc/httpd/
# mkdir /tmp/mg/usr; cp -Rp /mnt/live/memory/changes/usr/* /tmp/mg/usr
# mkdir /tmp/mg/var; cp -Rpv /mnt/live/memory/changes/var/www /tmp/mg/var
# cd /tmp; dir2lzm mg mgalign.lzm
Although using the command "savechanges" will save everything
in /mnt/live/memory/changes/* as an lzm instantly, it includes
a lot of unnecessary non-MGAlign files.
Using the above manual approach will
give you an opportunity to identify files which are necessary to the webservice
and also for you to detect and modify hard-coded pointers to machine-specific files which need to be ported to bioslax.
If you wish to modify httpd.conf, note that there is an undocumented
bug in the LiveCD that does not seem to allow httpd.conf to
be overwritten. The solution is to save every new httpd configuration
into a separate file, e.g. mydb.conf,
and save it as /tmp/mg/etc/httpd/alias/mydb.conf. When you
boot up the VM instance and transfer your mydb.lzm into
the instance, the web cgi will shut down the httpd server, sweep
the /etc/httpd/alias folder, detect any new file (in this
case mydb.conf), construct an include statement,
append it to the /etc/httpd/httpd.conf file, and restart
the httpd server. The server will read the new httpd.conf
and, at the last line, follow the include statement, which
will read from /etc/httpd/alias/mydb.conf all the
When you instantiate a server, our cgi scripts will remotely
boot up the BioSlax base OS and, once it has booted, copy the lzm
module you have chosen to the new instance by file transfer.
The scripts save the lzm into /mnt/hda2/modules,
stop the httpd server, run the "activate" command,
and then restart the httpd server.
The "activate" command will union-write the files in the lzm
module into the live operating system on the fly. In other
words, it will write the compressed files from the lzm module into
the /mnt/live/memory/changes/ folders.
Then, it will run all commands in /etc/rc.d/init.d/*.
One of them is the rc.update_httpd_conf command,
which will sweep the /etc/httpd/alias folder for
any new conf files and insert an include statement
at the bottom of httpd.conf. That statement will get the
httpd server to read the config file in the alias folder
when the httpd server is restarted by the next command
in the "activate" script (mentioned in a previous FAQ).
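
To make the sweep concrete, here is a sketch of what a script like rc.update_httpd_conf might do. It is not the actual BioSlax script, whose contents are not shown here. The function name, its arguments and the rule of skipping files that are already included are all assumptions.

```shell
#!/bin/sh
# Sketch only: NOT the real rc.update_httpd_conf.
# Appends one "Include <file>" line to the main config for each *.conf
# file in the alias folder, unless that exact line is already present.
update_httpd_conf() {
    conf="$1"        # e.g. /etc/httpd/httpd.conf
    alias_dir="$2"   # e.g. /etc/httpd/alias
    for f in "$alias_dir"/*.conf; do
        [ -e "$f" ] || continue      # no conf files: nothing to do
        line="Include $f"
        grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

Checking for an existing line makes the sweep safe to run more than once, which matters if a module is activated again on the same instance.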
So if you insert executables inside the /etc/rc.d/init.d folder
(making sure you have made them executable with chmod +x),
all those commands will be executed in alphabetical
sort order. This is useful if you have to run a command
upon "activate", i.e. if you have to insert the files and then
run a command during instantiation, use this method.
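
One way to prepare such a command is to place it in the staging folder, already marked executable, before the module is packed. The sketch below assumes a staging layout like /tmp/mydb. add_init_script and the script name rc.mydb are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: store a script read from standard input as
# etc/rc.d/init.d/<name> under the staging folder and make it executable.
add_init_script() {
    stage="$1"
    name="$2"
    target="$stage/etc/rc.d/init.d/$name"
    mkdir -p "$stage/etc/rc.d/init.d" || return 1
    cat > "$target" || return 1      # script body comes from stdin
    chmod +x "$target"
}
```

For example, printf '#!/bin/sh\necho started\n' | add_init_script /tmp/mydb rc.mydb. Because execution follows sort order, the chosen name also decides when the script runs relative to the others.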
A cloud-based initiative of APBioNet and NUS