The Giardia Genome Project: Production Pipeline and Assembly of LI-COR Bi-Directional Sequence Data

Hilary G. Morrison, Andrew G. McArthur, Julie E.J. Nixon, Nora Q.E. Passamaneck, Ulandt Kim, Melissa K. Crocker, Gregory Hinkle, Michael E. Holder, Rebecca Farr, Claudia I. Reich, Gary J. Olsen, Lorena A. Fierro, Stephen B. Aley, Rodney D. Adam, Frances D. Gillin and Mitchell L. Sogin
The Josephine Bay Paul Center for Comparative Molecular Biology and Evolution
The Marine Biological Laboratory, Woods Hole, MA 02543
E-mail: morrison@mbl.edu, sogin@mbl.edu


Conclusions

At the end of three years of sequencing, the Giardia lamblia sequence database (www.mbl.edu/Giardia) contains over 50 million bases, representing four-fold genome coverage. The average trimmed read length is 885 nucleotides (Figure 5). Data quality remains very high (phd quality value > 20) out to 950-1000 bases (Figure 6). At the time of the last data release, the reads assembled into approximately 1400 contigs with a total length greater than 11 Mbp (Figure 7). These results demonstrate that a shotgun sequencing approach using bi-directional reactions and LI-COR automated sequencers is well suited to a small genome project.
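The quality and coverage figures above rest on two standard relations: a phred-style quality value Q corresponds to an error probability P = 10^(-Q/10), so Q > 20 means fewer than 1 error per 100 base calls, and average fold coverage is total sequenced bases divided by genome size. The sketch below works through both. The 12.5 Mbp genome size is an assumption back-calculated from 50 million bases at four-fold coverage, the expected-fraction-covered step uses the Lander-Waterman (Poisson) approximation, which is not part of the original analysis, and all function names are hypothetical.

```python
import math


def phred_error_probability(q: float) -> float:
    """Probability that a base call is wrong, given a phred-style quality value q."""
    # Phred quality is defined as Q = -10 * log10(P_error), so P_error = 10^(-Q/10).
    return 10 ** (-q / 10)


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation for the fraction of the genome covered by at least one read."""
    # Assumes reads are placed uniformly at random (Poisson model): P(depth = 0) = e^(-c).
    return 1 - math.exp(-coverage)


if __name__ == "__main__":
    # Q = 20 corresponds to 1 expected error per 100 base calls.
    print(phred_error_probability(20))
    # Hypothetical genome size of 12.5 Mbp, implied by 50 Mbp of sequence at four-fold coverage.
    c = fold_coverage(50e6, 12.5e6)
    print(c)  # 4.0
    print(round(expected_fraction_covered(c), 3))  # 0.982
```

Under the random-placement assumption, four-fold coverage would leave roughly 2% of the genome unsampled, which is one reason a shotgun assembly at this depth still consists of many contigs.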

Figure 5 (image not reproduced)

Figure 6 (image not reproduced)

Figure 7 (image not reproduced)

Because we generate long, accurate reads, reliable first-pass sequence data can be released to the scientific community and used to jump-start specific research projects. Furthermore, the long bi-directional reads have allowed us to link several hundred contigs into "super-contigs": when the forward and reverse reads from a single clone assemble into two different contigs, that clone joins the two contigs.
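The super-contig linking described above can be sketched as a union-find over contigs: any clone whose two end reads land in different contigs merges those contigs into one group. This is a minimal illustration, not the project's assembly software; the function `build_supercontigs` and the read/clone naming scheme are hypothetical, and a real scaffolder would also check read orientation and clone insert size before joining contigs.

```python
from collections import defaultdict


def build_supercontigs(read_to_contig, read_to_clone):
    """Group contigs into super-contigs using paired reads from the same clone.

    read_to_contig: maps each read name to the contig it assembled into.
    read_to_clone: maps each read name to the clone (template) it came from.
    Returns a sorted list of super-contigs, each a sorted list of contig names.
    """
    parent = {c: c for c in set(read_to_contig.values())}

    def find(c):
        # Follow parent pointers to the set representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect the set of contigs hit by the reads of each clone.
    clone_contigs = defaultdict(set)
    for read, clone in read_to_clone.items():
        if read in read_to_contig:
            clone_contigs[clone].add(read_to_contig[read])

    # A clone whose end reads fall in different contigs links those contigs.
    for contigs in clone_contigs.values():
        ordered = sorted(contigs)
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical reads named "<clone>.<f|r>" for forward and reverse reactions.
    reads = {"c1.f": "ctgA", "c1.r": "ctgB", "c2.f": "ctgB",
             "c2.r": "ctgC", "c3.f": "ctgD", "c3.r": "ctgD"}
    clones = {r: r.split(".")[0] for r in reads}
    print(build_supercontigs(reads, clones))  # [['ctgA', 'ctgB', 'ctgC'], ['ctgD']]
```

Here clone c1 links ctgA to ctgB and clone c2 links ctgB to ctgC, so the three contigs form one super-contig, while clone c3 has both reads in ctgD and adds no link.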

BLAST results show that over 27% of the reads have significant similarity to published protein sequences. Among these are proteins involved in intermediary metabolism, nucleic acid processing, cytoskeletal structure, and cell division. Interestingly, although introns have not been reported in Giardia, we have identified genes similar to PRP8 and to RNP-specific proteins, both of which are involved in RNA splicing, and we have detected genes that may contain introns. Similarly, although Giardia is amitochondriate, we have found homologues of mitochondrial proteins, which suggests that at some point in its evolutionary history Giardia harbored a prokaryotic endosymbiont. Another surprising discovery is a protein that shares nearly 95% amino acid similarity with the product of a cDNA expressed in embryonic mouse and human placental tissue; the function of this protein is unknown in any system. As expected, we have also found many open reading frames that return no significant BLASTX hits and may encode unique or novel proteins.
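Open reading frames like those mentioned above can be located with a simple six-frame scan before being searched with BLASTX. The sketch below is a simplified illustration, not CRITICA or the project's own gene-finding method: it assumes the standard genetic code, reports each ORF from the first ATG after a stop codon through the next in-frame stop, and uses a hypothetical 100-codon minimum length (counting the stop codon). The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; N is left as N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, start, end) for ATG-initiated ORFs of at least min_codons codons.

    Coordinates are 0-based and half-open on the scanned strand (for "-", on the
    reverse complement); end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through complete codons in this reading frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG", min_codons=3))  # [('+', 0, 9)]
```

ORFs found this way whose translations return no significant BLASTX hits would be candidates for the unique or novel proteins described above; the length cutoff trades sensitivity for fewer spurious short ORFs.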

References

S. Roemer, J. Amen, R. Bruce et al. 1997. A New Near-IR Fluorescence Automated DNA Sequencer. Poster presented at Automation in Mapping and DNA Sequencing, Heidelberg, Germany, March 1997. LI-COR Application Note #484. http://bio.licor.com.

S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic Local Alignment Search Tool. Journal of Molecular Biology 215:403-410.

D. Gordon, C. Abajian, and P. Green. 1998. CONSED: A Graphical Tool for Sequence Finishing. Genome Research 8:195-202.

J.H. Badger and G.J. Olsen. 1999. CRITICA: Coding Region Identification Tool Invoking Comparative Analysis. Molecular Biology and Evolution 16:512-524.

Acknowledgments

Supported by grant AI43272 to M.L.S. from the National Institutes of Health, by the LI-COR Biotechnology Division, and by the generosity of the G. Unger Vetlesen Foundation. The following have also contributed to this work: Bruce Luders, Scott Bressoud, Elizabeth Duffy, Margaret Bradley, Seth Ament, Dave Gellis, Jeff Kim, John Darga, Alexandria Papa and Martin Foster.
