The Giardia Genome Project: Production Pipeline and Assembly of LI-COR Bi-Directional Sequence Data

Hilary G. Morrison, Andrew G. McArthur, Julie E.J. Nixon, Nora Q.E. Passamaneck, Ulandt Kim, Melissa K. Crocker, Gregory Hinkle, Michael E. Holder, Rebecca Farr, Claudia I. Reich, Gary J. Olsen, Lorena A. Fierro, Stephen B. Aley, Rodney D. Adam, Frances D. Gillin and Mitchell L. Sogin
The Josephine Bay Paul Center for Comparative Molecular Biology and Evolution
The Marine Biological Laboratory, Woods Hole, MA 02543
E-mail: morrison@mbl.edu, sogin@mbl.edu


Abstract

We have undertaken complete sequencing and annotation of the 12-Mb genome of the eukaryotic parasite Giardia lamblia (Figure 1). Our genome project relies on LI-COR bi-directional reads for primary shotgun sequence data. In three years we have achieved four-fold coverage of the genome (>99% of its coding capacity) and have assembled the data into approximately 1400 contigs. We are now using directed plasmid and BAC sequencing to join contigs and to map them to BACs and to Giardia's five chromosomes. We have recently begun annotation of the genome. Our production pipeline begins with individual reads (SMP or SAMP files, after basecalling and minimal editing) and ends with annotated contig files. We use several pre-existing tools for sequence analysis, including modules from the SEALS, BLAST, GCG, and PHRED/PHRAP/CONSED packages. We have also written a number of UNIX® and Perl scripts specific to LI-COR data and to this genome project.
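The relationship between genome size, fold coverage, and read count can be illustrated with the idealized Lander-Waterman model, in which reads land uniformly at random. The sketch below is a minimal illustration under stated assumptions, not the project's own calculation: the 500-bp average read length and the helper names `reads_needed` and `lander_waterman` are hypothetical, and only the 12-Mb genome size and four-fold coverage come from the text. The model gives the expected fraction of *bases* covered (1 - e^-c) and the expected number of contigs (N·e^-c when any overlap is detectable), which are different quantities from the per-gene "coding capacity" figure quoted above.

```python
import math

def reads_needed(genome_bp: float, coverage: float, read_bp: float) -> int:
    """Smallest read count N with N * read_bp / genome_bp >= coverage."""
    return math.ceil(coverage * genome_bp / read_bp)

def lander_waterman(genome_bp: float, n_reads: int, read_bp: float,
                    min_overlap_frac: float = 0.0):
    """Idealized Lander-Waterman expectations for uniformly random reads.

    Returns (fold coverage c, expected fraction of bases covered,
    expected number of contigs). min_overlap_frac is the fraction of a
    read that must overlap its neighbour for the join to be detected.
    """
    c = n_reads * read_bp / genome_bp
    covered = 1.0 - math.exp(-c)
    contigs = n_reads * math.exp(-c * (1.0 - min_overlap_frac))
    return c, covered, contigs

GENOME_BP = 12e6   # 12-Mb Giardia genome (from the text)
READ_BP = 500      # HYPOTHETICAL average edited read length

n = reads_needed(GENOME_BP, 4.0, READ_BP)
# Bi-directional sequencing yields two reads per template.
templates = math.ceil(n / 2)
c, covered, contigs = lander_waterman(GENOME_BP, n, READ_BP)
print(f"{n} reads from {templates} templates -> {c:.1f}x, "
      f"~{covered:.1%} of bases covered, ~{contigs:.0f} contigs")
```

Real shotgun data depart from this model (repeats, cloning bias, read-length variation), so the numbers are a rough reference point rather than a prediction of an actual assembly.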


Figure 1. Giardia lamblia parasite.
