it4gh icon

IT4GH: Colin Beckingham - Note #4

Converting the ITIS database into Mysql format for reading/reference purposes only

dated: June 12, 2005

Summary

The ITIS database of taxonomic nomenclature (http://www.itis.usda.gov) is a very comprehensive list of latin names which is actively kept up-to-date by the professionals at the Smithsonian Institution in cooperation with a number of government departments, academics and other from across North America. This information is kept freely available via file transfer protocol. Simple queries are best handled through the ITIS website interactive forms. However, complex queries may be either impossible or time consuming to set up using the web form. In this case you need the full download at your fingertips.

Please note the privacy statement on the webpage regarding copyright. Much of the information is in the public domain, but the user is responsible for determining what is not in the public domain.

The format of the download anticipates that the user will have an installation of the relational database information system 'Informix' available as a container for the data. For those who do not have this installation it is still possible to use the informix import files to put the data into other formats, including Mysql. A process is described for achieving this objective. It is assumed that the purpose is for individual reference and that there will be no addition, editing or deletion of data, and that the user has moderate to advanced Mysql and server skills and permissions.

These notes are based on the following versions:
    Mysql: 4.0.18
    Linux: SUSE Enterprise Server 9
    Editor: Programmer's File Editor 1.01.000 on Windows 98 SE

Assumptions

The result of this import process should be suitable for reading information only. A lot of key and constraint information is dropped to make the import easier, which however is vital for those editing and adding new records.

The process

  1. Obtain the compressed files by ftp, decompress, keep the compressed file as a backup in case something goes wrong in the processing. You will need a decompressor such as ZipGenius to unzip tar.gz type files on a windows machine.
  2. Creating the tables.
  3. Modifying the instructions: The file 'itis.sql' contains the instructions for creating the tables using SQL (Structured Query Language). However the sql format is specifically related to Informix. Mysql is also able to read instructions from a text file; however the instructions are significantly different. The following table shows the Informix and Mysql versions side by side for one of the 21 tables.
    Informix format
    Mysql format
    { TABLE "itis".taxon_unit_types row size = 29 number of columns = 6 index size =
                 15 }
    { unload file name = taxon00105.unl number of rows = 136 }

    create table "itis".taxon_unit_types
      (
        kingdom_id integer not null ,
        rank_id smallint not null ,
        rank_name char(15) not null ,
        dir_parent_rank_id smallint not null ,
        req_parent_rank_id smallint not null ,
        update_date date not null
      );
    revoke all on "itis".taxon_unit_types from "public";
    create table taxon_unit_types
      (
        kingdom_id int ,
        rank_id smallint ,
        rank_name char(15) ,
        dir_parent_rank_id smallint ,
        req_parent_rank_id smallint ,
        update_date date
      );
    load data infile '/tmp/itis/taxon00105.unl' into table taxon_unit_types fields terminated by '|';

  4. A few notes:
    1. The schema name "itis." is dropped
    2. Table and field names remain the same as in Informix - with a couple of exceptions:
      1. table 'taxonomic_units' has a field called 'usage' which is a mysql keyword. Renaming this field to 'usagex' removes the import issue
      2. table 'vernaculars' contains a reference to a 'unique' index, this should be replaced with 'index unique'
    3. Field types must be converted to the Mysql equivalent, e.g. 'serial' => 'int'.
    4. Remove all references that are specific to informix:
      1. 'year to minute' and 'year to second'
      2. character specifications such as (xx,x) should be replaced by (xx)
      3. constraint clauses
      4. references to char which call for length > 255 must be replaced by 'text' type columns
    5. It will be necessary to use an editor which can handle newlines easily in the search and replace process.
  5. Once the file has been correctly formatted and saved as 'myitis.sql' we transfer all files to a temporary directory on the server such as '/tmp/itis/', log in to mysql and run the commands:

    mysql> create database itis;

    -- then 'grant' permissions on the new database and quit the mysql interactive shell.

    unixprompt > mysql -u xxxxx -pyyyyyyy itis < /tmp/itis/myitis.sql

    - the result of which should be no error messages, meaning that all the tables have been set up and the data imported. On a Pentium 400 the import takes about 5 minutes.

  6. You may wish to set up further indexes to make searching and sorting faster.
The Result

We now have a functional reference version of the ITIS database. Some knowledge of databases and queries is required to set up the necessary access to the information. An example which retrieves the records from the kingdom 'Plantae' which shows genus, species, subspecies and sub-subspecies might be:

SELECT taxonomic_units.unit_ind2, taxonomic_units.unit_name1, taxonomic_units.unit_ind1, taxonomic_units.unit_name2, taxonomic_units.unit_ind3, taxonomic_units.unit_name3, taxonomic_units.unit_ind4, taxonomic_units.unit_name4, taxonomic_units.usage, taxonomic_units.unaccept_reason
FROM taxonomic_units
WHERE (((taxonomic_units.kingdom_id)=3))
ORDER BY taxonomic_units.unit_name1, taxonomic_units.unit_name2, taxonomic_units.unit_name3, taxonomic_units.unit_name4;




Colin Beckingham is a freelance writer and PHP programmer who lives near Kingston, Ontario.
He can be contacted through his web site at http://www.it4gh.com

Valid XHTML 1.1

Updated: Sept 13, 2006

grapes