A quick tutorial for BioXRT TBrowse

This is a quick tutorial to take you through the main steps to create a BioXRT database and publish it on the internet through TBrowse. This tutorial assumes that you have successfully set up an Apache web server with MySQL, Perl, BioPerl, other BioXRT dependencies and the BioXRT system itself (instructions for installing system dependencies, and the necessary BioPerl modules, are included in the BioXRT distribution).

Aside: $HTDOCS indicates your web server's DocumentRoot, the place where web pages are stored. It is defined when the server is installed (usually in the httpd.conf file), and your server administrator can tell you what it is.

1. About the XRT model

    Before data can be loaded into the database, it has to be converted into XRT format, which is simply a tab-delimited flat file with column headers (see the XRT specification for details). In the BioXRT distribution, you will find a sample XRT table named “Gene.xrt” in the “$HTDOCS/bioxrt/tutorial/sample_data/chr7annotation” directory. The file looks like this:

    ID      Type        Symbol  OldID       LocusLinkID  C_ID/LocusLink
    GA0001  Known_Gene  ABP1    HSC7000001  26           LL000026
    GA0002  Known_Gene  ACHE    HSC7000002  43           LL000043
    GA0003  Known_Gene  ACTB    HSC7000003  60           LL000060
    ......

    XRT organizes data into classes (e.g. gene, transcript, clone, mutation, disease), and each class has a set of attributes that describe the properties of its elements. The XRT table above contains three elements of the Gene class, each with six attributes: the first line gives the attribute names, and each following line holds the attribute values for one element. Three attributes (ID, P_ID, C_ID) have special meaning to the system. The ID field, which is mandatory for every entry, stores the unique identifier of the data element. P_ID and C_ID express more complex parent / child relationships between entries and are not used in this tutorial.
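
    The layout described above can be checked mechanically before a table is loaded. The following Python sketch is not part of the BioXRT distribution; the function name read_xrt is hypothetical, and the code assumes only what this section states: a tab-delimited file, attribute names on the first line, and a mandatory, unique ID for every element.

```python
import csv
import io


def read_xrt(handle):
    """Read an XRT table: tab-delimited text whose first line names the attributes.

    Returns (attribute_names, elements); each element is a dict keyed by attribute.
    Raises ValueError if the mandatory ID attribute is missing, empty, or repeated.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None or "ID" not in header:
        raise ValueError("XRT table has no ID column")
    elements, seen = [], set()
    for row in reader:
        if not row:  # ignore blank lines
            continue
        element = dict(zip(header, row))
        element_id = element.get("ID", "")
        if not element_id:
            raise ValueError("element without an ID")
        if element_id in seen:
            raise ValueError(f"duplicate ID: {element_id}")
        seen.add(element_id)
        elements.append(element)
    return header, elements


# The three sample rows shown above, written with real tab characters.
SAMPLE = (
    "ID\tType\tSymbol\tOldID\tLocusLinkID\tC_ID/LocusLink\n"
    "GA0001\tKnown_Gene\tABP1\tHSC7000001\t26\tLL000026\n"
    "GA0002\tKnown_Gene\tACHE\tHSC7000002\t43\tLL000043\n"
    "GA0003\tKnown_Gene\tACTB\tHSC7000003\t60\tLL000060\n"
)

header, genes = read_xrt(io.StringIO(SAMPLE))
print(len(header), "attributes,", len(genes), "elements")
```

    To check a real file, pass an open file handle instead of the in-memory sample, for example open("Gene.xrt", newline="").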
     

2. Preparing XRT tables

    Original data can come from diverse sources, including external data sets and internal experimental results. For simple data, such as data with just one class, we suggest keeping it in a Microsoft Excel spreadsheet, which biologists with little or no database knowledge can maintain and which is readily converted into XRT format. Data in other tabular or text formats can be converted to XRT format with standard tools, and for data in more complex formats a simple parser can automate the process.
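
    As one illustration of such a conversion, the Python sketch below turns a comma-separated export (for example, a spreadsheet saved as CSV) into tab-delimited XRT text. It is an assumption-laden example rather than a BioXRT tool: the function name csv_to_xrt is hypothetical, and the code assumes the first CSV row already holds the attribute names.

```python
import csv
import io


def csv_to_xrt(csv_handle, xrt_handle):
    """Copy a comma-separated table to tab-delimited text, header row first.

    Returns the number of rows written. Values containing a tab or a newline
    are rejected because they would break the tab-delimited layout.
    """
    rows_written = 0
    for row in csv.reader(csv_handle):
        cells = [cell.strip() for cell in row]
        if not any(cells):  # skip rows that are entirely empty
            continue
        for cell in cells:
            if "\t" in cell or "\n" in cell:
                raise ValueError(f"value not allowed in a tab-delimited file: {cell!r}")
        xrt_handle.write("\t".join(cells) + "\n")
        rows_written += 1
    return rows_written


# A small CSV export using values from the sample Gene.xrt table.
CSV_EXPORT = (
    "ID,Type,Symbol\n"
    "GA0001,Known_Gene,ABP1\n"
    "\n"
    "GA0002,Known_Gene,ACHE\n"
)

out = io.StringIO()
count = csv_to_xrt(io.StringIO(CSV_EXPORT), out)
xrt_text = out.getvalue()
print(xrt_text, end="")
```

    With real files, open the input with newline="" as the csv module recommends, and write the output to a file ending in .xrt.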
     

3. Loading XRT tables into database

    XRT tables can be loaded directly into a database with the tools provided. For example, you can create an empty database with the name “chr7annotation”, and load the Gene.xrt file into it, with the following two commands (you may need to substitute -u(ser) and -p(ass) settings depending on how you have configured MySQL):

    # note: mysql expects no space between -p and the password
    mysql -u username -ppassword -e "CREATE DATABASE chr7annotation"
    bulk_load_xrt.pl -database chr7annotation -user username -pass password Gene.xrt

    Revised data (for example, after new attributes have been added) can be imported again with the bulk_load_xrt.pl command, replacing the earlier version of the data.
     

4. Defining output table for TBrowse

    TBrowse now needs a configuration file that tells it how to display this data set as an HTML table. In the “$HTDOCS/bioxrt/tutorial/sample_conf” directory, you will find a sample configuration file named “02.chr7annotation.conf”. Copy it into your TBrowse configuration directory ($CONF/bioxrt.conf, specified at the top of the cgi-bin/tbrowse script). The table definition section of the file (illustrated below) defines a table named “Gene” with four columns derived from four attributes of the Gene class. Note that an arbitrary web link can be defined for each table column. You may have to fill in the user and pass settings needed to connect to the MySQL database.
     
    #### configuration file of pre-defined tables for BioXRT TBrowse
    xrt_db      = chr7annotation
    host        = localhost
    user        =
    pass        =

    description = TCAG annotated genes on human chromosome 7

    page header = <h2>BioXRT TBrowse Test: TCAG annotated genes on human chromosome 7</h2>

    #bgcolor    = #E5F5F5
    width       = 780
    rows        = 20
    rows2choose = 10 20 50 100 200 0
    layout      = hHtml
    show back   = 1
    link_target =

    ### below are the table definitions
    # table key in the square brackets

    [Gene]
    view_id           = V0001
    title             = TCAG annotated genes on human chromosome 7
    main_class        = Gene
    Column1.          = 0::ID::Gene ID::http://www.chr7.org/cgi-bin/geneview?id=*
    Column2.          = 0::Type
    Column3.          = 0::Symbol
    Column4.rclass    = 0
    Column4.attribute = LocusLinkID
    Column4.header    = Locus Link ID
    Column4.url       = http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=*

    The view_id is simply a unique name for this view, and main_class indicates the class on which the attributes in the column definition rows are based. A table column is defined by four options: class, attribute, column header, and hyperlink, of which the last two are optional. A column can be written in either of two formats: a one-row definition, in which the options are delimited by “::” (Column1 to Column3 above), or a multi-row definition, in which each option has its own row (Column4 above). The column settings are fairly self-explanatory, with two exceptions. The “*” in the URL is replaced with the value of the column in question. The “0” class should be left intact for simple displays that involve only one class, such as this one; the class option comes into play in more complex tables.
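
    To make the one-row format concrete, the Python sketch below splits a definition on “::” and fills in a link by replacing “*” with a value. It only illustrates the format as described here and is not TBrowse's own parser; the names Column, parse_one_row and link_for are hypothetical, and the field names mirror the multi-row keys (rclass, attribute, header, url).

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    rclass: str                   # "0" in single-class tables such as Gene
    attribute: str                # attribute name, e.g. "ID"
    header: Optional[str] = None  # optional column header
    url: Optional[str] = None     # optional link template containing "*"


def parse_one_row(definition: str) -> Column:
    """Split a one-row definition: class::attribute[::header[::url]]."""
    parts = [part.strip() for part in definition.split("::", 3)]
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"class and attribute are required: {definition!r}")
    header = parts[2] if len(parts) > 2 and parts[2] else None
    url = parts[3] if len(parts) > 3 and parts[3] else None
    return Column(parts[0], parts[1], header, url)


def link_for(column: Column, value: str) -> Optional[str]:
    """Replace "*" in the column's URL template with the cell value."""
    if column.url is None:
        return None
    return column.url.replace("*", value)


id_column = parse_one_row(
    "0::ID::Gene ID::http://www.chr7.org/cgi-bin/geneview?id=*"
)
type_column = parse_one_row("0::Type")
print(id_column.header, "->", link_for(id_column, "GA0001"))
```

    The split is limited to three separators so that anything after the third “::” stays part of the URL.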
     

5. Exploring the data in TBrowse

    In your web browser, open

    http://YourDomainName/cgi-bin/tbrowse?source=chr7annotation

    and you should be able to see the interface of TBrowse. Using the simple controls presented, you can browse the entire table, search for keywords, and filter results by column values to obtain your data of interest. If there is a particular 'default' view you wish to present to visitors to your site, all controls can be represented in the URL you provide. For example,

    http://YourDomainName/cgi-bin/tbrowse?source=chr7annotation&table=Gene&keyword=Known_Gene&fcol=Column3&fcomp==&fkwd=ACHE

    will display the results for the keyword “Known_Gene”, but show only the matching records whose Column3 (i.e. Symbol) equals “ACHE”.
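
    If you generate such links from a script, the query string can be assembled rather than typed by hand. The Python sketch below uses the parameter names from the example URL above (source, table, keyword, fcol, fcomp, fkwd); the helper name tbrowse_url is hypothetical. Note that urlencode percent-encodes the “=” comparison operator as “%3D”, which assumes TBrowse decodes query strings in the standard CGI way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tbrowse_url(base, **params):
    """Append the given TBrowse parameters to the script URL as a query string."""
    return f"{base}?{urlencode(params)}"


url = tbrowse_url(
    "http://YourDomainName/cgi-bin/tbrowse",
    source="chr7annotation",
    table="Gene",
    keyword="Known_Gene",
    fcol="Column3",   # the Symbol column in the Gene table
    fcomp="=",        # comparison operator, sent as %3D
    fkwd="ACHE",
)
print(url)

# Decoding the query string recovers the original values, including "=".
decoded = parse_qs(urlsplit(url).query)
```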