Tetrahymena Whole-Genome Sequencing Project: A Concept Paper
November 5, 2001
Preface. To make its wide range of information easier to navigate, this concept paper is organized into two components: Main and Appendix. The Main document contains an introductory overview of the case for sequencing the Tetrahymena genome, a concise description of the sequencing and annotation project, and specific answers to the Non-Mammalian Models Committee questionnaire. The Appendix expands on two topics covered in the main document: advanced molecular genetic tools, and unique or otherwise special studies that the availability of the genome sequence would enable. The latter are grouped by topic into numbered tables for easy reference from the main document. All cited references are listed in the Appendix, organized so that related references are generally clustered together.
Contents
1. Introduction: Why sequence the Tetrahymena genome?
2. Aims and outline of the project
3. Answers to the Non-Mammalian Models Committee questionnaire
   a) Community process
   b) Other sources of support
   c) Advantages and limitations of the model organism for research purposes
   d) Justification for needing the genomic resources now
   e) Existence or plans to develop the proposed resources outside the U.S.
   f) & g) Unique advantages of having the genomic information of this organism and scientific advances that will be made possible
   h) Cost of the project
   i) Duration of the project
   j) Support of resources after the completion of the project
   k) Availability of data and resources generated by this project to the research community
   l) Genomic resources currently existing
   m) Size of the research community
   n) & o) Who will benefit from the improved genomic resources and how?
   p) Material transfer agreements
Table 1. Tetrahymena Genomic Resources and Database Needs
Appendix
------------------
1. Introduction: Why sequence the Tetrahymena genome?
Tetrahymena is a freshwater protozoan that is highly successful ecologically. It has been used as a microbial animal model for more than 75 years -- ever since Nobel Laureate André Lwoff [1] succeeded in 1923 in growing this unicell under axenic conditions, i.e., in pure culture. Tetrahymena has typical eukaryotic biology. Although unicellular, Tetrahymena displays a degree of cellular structural and functional complexity fully comparable to that of humans and other metazoans. Its ultrastructural morphology, cell physiology, development, biochemistry, genetics, and molecular biology have been comprehensively investigated [2-6]. Certain eukaryotic mechanisms are uniquely or especially well developed in Tetrahymena, and they have facilitated discoveries that have generated major fields of fundamental research:
Advanced molecular and genetic tools developed in Tetrahymena have kept this organism at the forefront of fundamental research. This is particularly true in areas that are less accessible to in vivo experimental investigation in other model organisms, such as regulated secretion, cell motility, phagocytosis, telomere function, the function of post-translational modifications of histones and tubulins, and developmental DNA rearrangements. This self-assessment is supported by sustained extramural grant support of Tetrahymena research and by published statements from leading researchers doing related work on other organisms: telomerase [21], tubulins [22-24], regulated secretion of stored proteins [25] and development [26].
These advances in Tetrahymena knowledge and technology have resulted from the very productive and highly collaborative efforts of the ciliate community, which is the largest genetic model organism community without a genome project. We have now reached a point where the enormous potential of Tetrahymena for research in many areas -- fundamental science, genomic, biomedical, public health, bioagricultural, environmental and biotechnological research -- will be wasted unless its genome sequence is determined quickly. The ciliate molecular biology research community has chosen Tetrahymena as the first ciliate whose genome should be sequenced because it has the most advantageous combination of biological features, the only genetic and physical maps available for any ciliate together with other important accumulated genomic resources, and the most powerful array of molecular genetic tools for post-genomic, in vivo experimental functional genomics. In this document, the ciliate community proposes a project to sequence, assemble, annotate and make publicly available the entire Tetrahymena expressed (macronuclear) genome, under the plan described later in this concept paper.
There are at least five major reasons, each unique or special and described in more detail later, why sequencing the Tetrahymena genome would be an important contribution to science.
1) Evolutionary genomics: key phylogenetic position for
comparative genomics. The ciliate Tetrahymena
occupies a key position in the third, major independent branch of eukaryotic
evolution, the Alveolata [27; 28]. All of the model organisms that have "completed" or ongoing genome
projects belong to the two other major clades: the Heterokonta (metazoa, fungi,
Dictyostelium) or the Viridiplantae (plants and Chlamydomonas).
The Alveolata also include the Dinoflagellates and the Apicomplexa -- a group
exclusively composed of medically or agriculturally important parasites of
metazoa. Several Apicomplexans, e.g., Plasmodium (the human malarial
parasite), have ongoing genome projects, but their genomes are small (10-20% of
the Tetrahymena genome). This genome
simplification likely results mainly from the loss of functions supplied by their
hosts. No free-living member of the entire Alveolate clade -- let alone an
experimentally tractable genetic model
organism -- has an ongoing genome project.
2) Investigating the unknown functions of important human genes. Humans share a higher degree of functional conservation with ciliates than with other microbial model organisms. This is evidenced by the better matches (i.e., lower probability of a chance match) of Tetrahymena ESTs [29] and Paramecium coding sequences [30] to human sequences than to those of non-ciliate microbial model organisms. Significant Tetrahymena EST matches to human proteins are not restricted to housekeeping genes [29]. Examples include an opioid-regulated protein of previously unknown function (recently elucidated in Tetrahymena [31]), a protein required for stem cell maintenance [32], a brain NMDA-receptor glutamate-binding protein, and several human brain-expressed genes of unknown function sequenced by the Japanese KIAA project. Some of these proteins are not found in yeast. Tetrahymena is thus an excellent unicellular animal model.
Sequence conservation over more than a billion years of independent evolution predicts that these genes have important functions -- so that their mutational dysfunction is likely to cause human hereditary disease -- and that the proteins have retained their basic, ancestral biochemistry and molecular biology. The human genome sequence predicts thousands of human genes of unknown function [33; 34]. Sequence conservation, coupled with the advanced and powerful experimental tools available in Tetrahymena, thus offers the biomedical research community an enormous opportunity to use Tetrahymena to elucidate experimentally the in vivo function of many important human genes at the cellular and molecular level. The results of this work would complement investigations of human gene function at more integrative levels in multicellular animal models.
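In BLAST-type similarity searches, the "probability of a chance match" cited above is usually quantified with Karlin-Altschul statistics. The text does not say which search program or parameters produced the cited matches, so the following is general background only. For a local alignment with raw score S between a query of length m and a database of total length n:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad P = 1 - e^{-E}
```

Here K and λ are constants set by the scoring system and residue composition, E is the expected number of chance hits scoring at least S, and P is the probability of at least one such hit. In this sense a "better" match is one with a smaller E, or equivalently a smaller P.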
3) Experimental functional genomics: advanced molecular
genetic tools. An impressive array of robust and novel molecular genetic
tools has placed Tetrahymena at the
forefront of experimental, in vivo
functional genomics research [35]. Two unique genetic features, heterokaryons
and assortment genetics, are used in combination with a battery of DNA-mediated
transformation techniques in novel, powerful and versatile ways. We anticipate
an increased use of these methods by the general scientific community once the
genome sequence becomes available.
We propose to sequence the expressed (somatic or
macronuclear) genome because, during its programmed differentiation, it retains
all the genes and other DNA elements required for vegetative life, while
eliminating most of the repeated sequence in the germline genome. Furthermore,
we propose to seek the finished sequence
of the genome for several scientifically important reasons:
The size of the Tetrahymena macronuclear (MAC) genome (~180 Mb) precludes a timely and cost-effective sequencing effort distributed across the Tetrahymena research community. The plan that follows is based on careful consideration of the interest expressed, feasibility assessments, sequencing approaches and preliminary cost estimates provided by five major sequencing facilities: The Institute for Genomic Research (TIGR), Whitehead Institute-MIT, University of Oklahoma, University of Washington, and Integrated Genomics Company. Three centers (including the Joint Genome Institute) have obtained Tetrahymena DNA from us and intend to start genomic test sequencing.
b) Closure of the genomic sequence. Most, if not all, of the Tetrahymena macronuclear genome can be closed with high-throughput technology already in use in other genome projects. Higher priority will be given to closing protein-coding segments and their flanking sequences. Sequencing the macronuclear genome will avoid obstacles that have prevented closure of other eukaryotic genomes, e.g., centromeric DNA, repeated DNA and extended GC tracts. Closing the non-protein-coding regions with the highest AT content may present both a challenge and an opportunity. The closure of two entire ~1 Mb chromosomes of the malarial parasite Plasmodium [39; 40] shows that the challenge can be overcome, even though their average A+T content (83%) is significantly higher than that of the Tetrahymena genome (75%). The opportunity is to use the Tetrahymena sequencing project to develop and test technology that facilitates the cloning and sequencing of larger AT-rich genomes.
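The amount of closure work left depends on how many gaps remain after whole-genome shotgun (WGS) sequencing. For a rough sense of scale, the idealized Lander-Waterman model can be applied to the ~180 Mb MAC genome at the 5-fold and 8-fold coverage levels discussed in the cost estimate (section 3h). This is a sketch under stated assumptions, not a project estimate. The 500-bp mean read length is a hypothetical value not given in the text. The model also assumes uniformly random read positions and ignores the minimum overlap needed to join reads.

```python
import math

GENOME_BP = 180_000_000  # ~180 Mb MAC genome size given in the text
READ_BP = 500            # assumed mean read length (hypothetical, not from the text)


def lander_waterman(coverage: float, genome_bp: int = GENOME_BP,
                    read_bp: int = READ_BP) -> dict:
    """Idealized Lander-Waterman estimates for random shotgun sequencing.

    Assumes read start positions are uniformly random and ignores the
    minimum overlap needed to join two reads (theta = 0).
    """
    n_reads = coverage * genome_bp / read_bp
    p_uncovered = math.exp(-coverage)  # chance a given base lies in no read
    return {
        "reads": n_reads,
        "fraction_covered": 1.0 - p_uncovered,
        "uncovered_bp": genome_bp * p_uncovered,
        # Expected number of contigs is N * e^(-c); for large N the number
        # of gaps between contigs is essentially the same.
        "expected_gaps": n_reads * p_uncovered,
    }


for c in (5, 8):
    est = lander_waterman(c)
    print(f"{c}X: {est['reads']:,.0f} reads, "
          f"{est['fraction_covered']:.2%} of bases covered, "
          f"~{est['expected_gaps']:,.0f} gaps")
```

Under these assumptions, 5X leaves roughly 12,000 gaps and 8X roughly 1,000. The fraction of bases covered rises from about 99.3% to about 99.97%. Real data fall short of this ideal, especially for AT-rich DNA that clones or sequences poorly, so these figures are optimistic.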
3. Answers to the Non-Mammalian Models Committee
questionnaire (http://www.nih.gov/science/models/process/index.html)
a. By what process did the community obtain input and reach a
consensus about the priority for the proposed project?
b. What other
sources of support, including non-U.S. sources, exist?
·
Project under the direction of William
Nierman (TIGR) to make a Tetrahymena
BAC library of ~50-kb inserts. This library would supplement or replace the
linking library for the assembly and scaffolding of the Tetrahymena MAC genome.
·
We are currently exploring additional
sources of partial support for the genome-sequencing project.
Table 1 lists, more systematically, the funding already awarded for completed or in-progress genome-wide projects.
c. What are the
advantages and limitations of the model organism for research purposes,
including genome size, tractability for genetic studies, ease of use,
generation time, storage of organism or gametes, etc.?
c1. Genome size and other genomic features
Advantages:
Potential Limitation:
a) "Conventional genetics" tools
·
Conjugation is readily induced and experimentally manipulated,
allowing crossing, genetic analysis, mapping, and manipulation of replaced
genes.
·
Readily inducible self-fertilization, leading to whole-genome
homozygotes in a single step.
3) Ease of use [reviewed in 44]
·
Dual, self-sufficient nutritional modes: phagocytosis of particles (bacteria) and uptake of small molecules by active transport. This allows complete control of Tetrahymena's chemical and physical growth environment, and allows phagocytosis to be made essential or dispensable, depending on experimental conditions.
·
Large cell size (50 × 30 µm): facile injection, cytology,
immunocytology and FISH, electrophysiological recording and large-scale cell
fractionation (micronuclei, mature and developmental stage-specific
macronuclei, nucleoli, mitochondria, cilia, phagosomes, lysosomes, protein
storage secretory granules, cell cortex, etc.).
·
Growth across a wide range of culture volumes, from microdrops to large bioreactors: allows industrial-scale production of valuable small molecules and macromolecules.
·
Vast temperature range for growth (18 °C to 41 °C):
great latitude in experimental investigation.
·
Readily visualized and quantifiable physiological endpoints:
growth rate, phagocytosis rate, induced exocytosis, swimming speed and
direction, chemotaxis, active water expulsion rate, cytokinesis, conjugation,
meiosis induction, nuclear differentiation: useful for fundamental studies and
for determining quantitative structure/activity relationships (e.g., drug
design, environmental toxicants).
4) Generation time
·
Fastest-growing microbial animal model (doubling time as short as 1.5 h): quick results, low maintenance costs, compact space requirements, noncontroversial animal model.
5) Storage of the organism or gametes
Cells are readily frozen and stored alive at liquid nitrogen temperature [45]. This allows long-term maintenance and germline protection of valuable strains.
6) Other advantages
·
Mitosis and meiosis restricted to a germline nucleus that is
non-essential for growth, and gene transcription restricted to a non-mitotic
nucleus: facilitates mutational and other experimental analyses of these
processes.
·
Developmentally programmed, site-specific DNA rearrangements
during MAC differentiation: precise germline-determined chromosome breakage;
formation of physically and genetically identifiable MAC chromosomes;
extensive, suppressible chromosome diminution.
·
Developmentally regulated nuclear apoptosis at the
"early development" stage of conjugation, resulting in the selective
elimination of the parental macronucleus.
·
Single germline copy of ribosomal RNA genes. This is a
unique feature of Tetrahymena that
has allowed conventional rRNA genetics, ribosomal antisense repression and
mutagenesis technology (see section c2 and Appendix section 1).
·
Abundance of sibling species with well-characterized
phylogeny: useful for decryption of regulatory DNA elements and functional RNA
domains.
7) A challenge: the Tetrahymena variant genetic code (UAR=Q)
The ciliates constitute a remarkable natural laboratory for investigating the late evolution of a diversity of variant genetic codes. In Tetrahymena [46] and many other ciliates, UAR (UAA and UAG, stop codons in the "universal" code) are additional glutamine codons, leaving UGA as the only stop codon. This poses no significant problem for expressing foreign genes in Tetrahymena, an application that has already begun and is likely to become a major use of this organism. Genes that already end in UGA (from other ciliates or from universal-code organisms) can be expressed directly in Tetrahymena. For the rest, it would suffice to include a UGA codon immediately 3' of the insertion site in a universal expression cassette.
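A small translator shows the practical consequence of UAR = Q. The sketch below encodes the standard code and the ciliate variant as 64-character strings indexed in TCAG order. In the ciliate variant, UAA and UAG read as glutamine, which corresponds to NCBI genetic code table 6. DNA letters are used (T for U). The example ORFs are made up for illustration and are not real Tetrahymena or foreign genes.

```python
BASES = "TCAG"
# Amino acid for each of the 64 codons in TCAG order (first base slowest);
# '*' marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Ciliate variant (NCBI table 6): TAA and TAG encode Q; TGA is the only stop.
CILIATE = STANDARD[:10] + "QQ" + STANDARD[12:]


def translate(orf: str, code: str) -> str:
    """Translate an in-frame DNA ORF, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3].upper()
        idx = (16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1])
               + BASES.index(codon[2]))
        aa = code[idx]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical universal-code genes to be expressed in Tetrahymena.
ends_in_uga = "ATGAAACTGTGA"   # Met-Lys-Leu, stop (TGA)
ends_in_uaa = "ATGAAACTGTAA"   # Met-Lys-Leu, stop (TAA)
cassette_uga = "TGA"           # UGA supplied by the cassette, 3' of the insert

print(translate(ends_in_uga, CILIATE))
print(translate(ends_in_uaa + cassette_uga, CILIATE))
```

A gene that already ends in UGA translates the same way under both codes. A gene ending in UAA is read through in Tetrahymena, and the cassette's UGA then terminates it. The native UAA becomes one extra C-terminal glutamine.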
The variant code does become an important consideration when Tetrahymena genes are expressed in universal-code cells. This challenge has been addressed in two ways: 1) a general approach based on expression in a host carrying a UAR nonsense suppressor [47], or 2) UAR-to-CAR codon replacement in the Tetrahymena gene, either by targeted in vitro mutagenesis (if the codons are few) or by efficient protocols for de novo synthesis of the entire gene [48-50].
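The codon-replacement strategy (option 2) can be planned in silico. Scan the ORF in frame and change each TAA or TAG to CAA or CAG, both of which encode glutamine in the standard code. The terminal TGA stays as it is. The helper below is hypothetical and is not a published protocol. It assumes the input is an in-frame ORF that starts at its start codon.

```python
def uar_to_car(orf: str) -> tuple[str, list[int]]:
    """Recode a Tetrahymena ORF for expression in a universal-code host.

    Replaces in-frame TAA/TAG (glutamine in Tetrahymena) with CAA/CAG,
    which also encode glutamine in the standard code. Returns the recoded
    sequence and the 0-based start positions of the changed codons.
    """
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons, changed = [], []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        if codon in ("TAA", "TAG"):
            codon = "C" + codon[1:]  # TAA -> CAA, TAG -> CAG
            changed.append(i)
        codons.append(codon)
    return "".join(codons), changed


recoded, sites = uar_to_car("ATGTAACCTTAGTGA")  # hypothetical ORF
print(recoded, sites)
```

The number of changed sites suggests which route is more practical. A few sites favor targeted mutagenesis, and many sites favor de novo synthesis of the whole gene.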
d. What is the justification
for needing the genomic resources now, rather than later, when costs are
likely to be lower?
e. Do the
proposed resources exist, or are there plans to develop such resources, outside
the U.S.?
Substantial resources to support the proposed project already exist (see section 3l).
None of the resources proposed in this project exist outside the U.S. A project is being explored with Genome Canada to contribute sequencing and closure in conjunction with (i.e., as matching funds for) the proposed project (section 3b).
f & g. What
are the unique advantages of having the genomic information of this
organism? What scientific advances will be made possible that otherwise would
not, given the current state of the genomic tools?
Tetrahymena has conserved virtually all the ancestral cellular processes and structures shared by humans and other living eukaryotes. In addition, this organism exhibits insightful elaborations of basic eukaryotic mechanisms, which highlight the functional versatility and diversity of these mechanisms and make them particularly accessible to investigation. We expect that thousands of Tetrahymena proteins will have sequence homology with important human proteins whose mutational dysfunction is expected to cause hereditary disease (section 1b). Thus the Tetrahymena genome sequence, together with the organism's favorable biological features (section 3c) and advanced experimental tools (section 3c2 and Appendix section 1), should contribute significantly to elucidating the molecular basis of diseases that fall under the mission of every medically oriented NIH institute. Here we list concrete areas of general research interest in which ongoing investigations in Tetrahymena, combined with the availability of the genome sequence, present unique opportunities to contribute to the mission of a number of NIH Institutes and other federal granting agencies. These research areas are listed in the table below, with pointers to the more detailed information given in the corresponding tables of Appendix Section 2.
Research area (potential funding sources) | Appendix Section 2 table
In vivo telomere function and telomerase function (NCI, NIA, NIGMS, NSF) | 1
Chromosome copy number homeostasis (NCI, NIGMS, NSF) | 2
Developmentally regulated chromosome breakage and telomere formation (chromosome healing) (NCI, NIA, NIGMS, NSF) | 2
Developmentally regulated ribosomal gene amplification (NCI, NIGMS, NSF) | 2
Developmentally regulated, immunoglobulin-gene-like chromosome breakage-rejoining (chromatin diminution) (NCI, NHLBI, NIAID, NIGMS, NSF) | 2
Function of chromatin in mitosis, meiosis, transcription and developmentally programmed DNA rearrangement (NCI, NIGMS, NSF) | 3
Function of histone post-translational modifications (NCI, NIGMS, NSF) | 3
Epigenetic inheritance (NCI, NICHD, NIGMS, NSF, USDA) | 4
Developmentally regulated apoptosis (NCI, NHLBI, NIA, NIAID, NIAMS, NICHD, NIGMS, NINDS, NSF) | 5
Genetic analysis of the functions of the large rRNAs (NIGMS, NSF) | 6
Microtubule diversity: 17 distinct systems including cilia, centriolar structures and mitotic spindles (NCI, NEI, NHLBI, NICHD, NIGMS, NSF) | 7
Function of tubulin post-translational modifications (NIGMS, NSF) | 7
Cytoskeletal motors (NCI, NHLBI, NIAMS, NICHD, NIGMS, NINDS, NSF) | 8
Regulated secretion of protein storage granules (NIDDK, NINDS) | 9
Phagocytosis and phagosome-mediated bacterial pathogenesis (NEI, NHLBI, NIAID) | 10
Chemoreception and signal transduction (NCRR, NHLBI, NIDA) | 11
Cell-cycle-dependent regulation of cytoskeletal proteins (NCI) | 12
Stem cell maintenance (NCI, NICHD) | 12
Differential control of DNA replication and division in germinal and somatic nuclei (NCI) | 12
Highly organized and polarized cell cortex; cellular handedness; intracellular positional information (NICHD, NSF) | 13
Developmentally controlled interactions between nuclei and the cell cortex (NICHD, NSF) | 13
Biotechnology (NCI, NIAID, NIDDK, USDA) | 14
Environmental adaptation and monitoring (DOE, EPA, NIEHS, NSF) | 15
Eukaryotic evolution (NHGRI, NIGMS, NSF) | 16
Science education opportunities (NHGRI, NSF) | 17
Additional intriguing phenomena in Tetrahymena not yet investigated molecularly: sexual maturation, resistance to viral infection, and conserved signaling elements | 18
h. With as great
precision as possible, what is the cost of the project?
The estimated
total cost is $20.9M, distributed as follows:
Stage | Year | Category | Cost
1 | 1 | 8X-coverage WGS sequencing and assembly* | $7.5M
1 | 1 | Database establishment** | $0.55M
2 | 2 | Sequence closure and electronic annotation | $9.5M
2 | 2 | Database maintenance | $0.35M
3 | 3 | Manual annotation by sequencing center experts | $2.5M
3 | 3 | Database expansion to deal with the finished sequence | $0.45M
Total | 3 years | Direct and indirect costs*** | $20.9M
* If funding becomes a limiting factor, it would be very important to achieve at least 5-fold WGS coverage during the first year, so that electronic annotation could begin quickly. This 5-fold level of WGS sequencing would already bring major benefits for comparative and experimental functional genomics to the ciliate and general research communities, because it should identify virtually all the Tetrahymena genes that match those in other organisms -- and already provide finished sequence for many of them. Although cloning some genes from partial sequence would still require gene-by-gene work, this would already enormously accelerate the pace of research and discovery.
** The needs are described in detail in section k below.
Estimated costs are based on current salary scales and are reported on the
basis of establishing an independent database. Some savings are expected
(mainly in salaries) if, as we intend, we can affiliate with an existing model
organism database.
*** This total would be reduced by any matching contribution
negotiated with Genome Canada.
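As a quick consistency check on the cost table, the line items can be summed by year. The sketch below uses exact decimal arithmetic to avoid floating-point rounding surprises. The shortened category labels are ours.

```python
from decimal import Decimal, ROUND_HALF_UP

# Line items from the cost table, in millions of US dollars.
COSTS = {
    1: {"8X WGS sequencing and assembly": Decimal("7.5"),
        "Database establishment": Decimal("0.55")},
    2: {"Sequence closure and electronic annotation": Decimal("9.5"),
        "Database maintenance": Decimal("0.35")},
    3: {"Manual annotation": Decimal("2.5"),
        "Database expansion": Decimal("0.45")},
}

yearly = {year: sum(items.values()) for year, items in COSTS.items()}
total = sum(yearly.values())
rounded = total.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

for year, subtotal in yearly.items():
    print(f"Year {year}: ${subtotal}M")
print(f"Total: ${total}M (reported as ${rounded}M)")
```

The items add up to $20.85M, which rounds to the $20.9M total shown in the table.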
i. What is the
duration of the project?
Three years, as
follows:
Year 1:
j. How will
resources, such as databases and repositories, be supported after the
completion of the project?
k. How will data
and resources generated by this project be made available rapidly and
efficiently to the research community?
·
Posting partial sequence and assembled contigs in a public,
freely available and user-friendly sequencing center database as they become
available.
The Tetrahymena database would initially be staffed as follows, taking into account the large genome size and gene number:
1) A full-time Database Administrator with strong bioinformatics experience.
2) A full-time Programmer, to customize software for use with Tetrahymena. In years two and three, this position would be reduced to 50%.
3) Two full-time Curators, having responsibility for:
The Curators should have substantial experience with Tetrahymena/ciliate experimental biology and genomics. Two well-qualified members of the ciliate research community have already expressed interest in filling these positions. A third Curator would be added in year 3 to deal with the influx of finished genomic sequence.
In addition, we would need hardware with sufficient power to handle the computational needs of a ~180 Mb genome and ~30,000 genes.
We have considered two general strategies for establishing the Tetrahymena database.
1) Affiliation with a well-established genetic model organism database, such as FlyBase, SGD or WormBase, which would be immediately capable of providing administration, server and informatics support.
2) Establishing our own independent database, with the staffing proposed above.
We strongly favor the first alternative as a time- and cost-effective way to establish our database. We have had exploratory discussions with senior members of the above databases, who have responded supportively. We have also learned that software packages are under development that would facilitate and generalize the establishment and maintenance of independent genetic model organism databases. We are therefore confident that we can establish a database that will be responsive to the needs of the scientific community. Once the Tetrahymena database is functioning successfully, we look forward to expanding it into a general ciliate database that includes genomic data from Paramecium and other experimental ciliates with their own valuable biological and experimental features.
l. What genomic
resources, including databases and repositories, currently exist?
·
Physically mapped sequence-tagged sites (STS): Cbs, cloned DNA polymorphisms and other STS. These will anchor the DNA sequence to the physical map.
In addition, sequence from a random genomic sequencing pilot project on the related ciliate Paramecium [30] is likely to improve the quality of Tetrahymena gene prediction and annotation.
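Physically mapped STS anchor the sequence by being located within assembled contigs, which places each contig on a mapped MAC chromosome. The sketch below illustrates that idea with exact string matching on both strands. The function name, contig names, STS names and chromosome labels are all hypothetical. A real pipeline would need to tolerate sequencing errors and polymorphisms, for example through alignment-based searches, and this sketch does not.

```python
def anchor_contigs(contigs: dict[str, str],
                   sts_map: dict[str, tuple[str, str]]) -> dict[str, list]:
    """Place physically mapped sequence-tagged sites (STS) on contigs.

    contigs: contig name -> DNA sequence (hypothetical assembly output)
    sts_map: STS name -> (STS sequence, physically mapped MAC chromosome)
    Returns contig name -> list of (STS name, offset, chromosome) hits.
    Only the first exact match per strand is reported; offsets refer to
    the contig's forward strand.
    """
    complement = str.maketrans("ACGT", "TGCA")
    hits: dict[str, list] = {name: [] for name in contigs}
    for sts_name, (sts_seq, chrom) in sts_map.items():
        rev_comp = sts_seq.translate(complement)[::-1]
        # Search both strands; skip the second search for palindromic STS.
        probes = (sts_seq,) if rev_comp == sts_seq else (sts_seq, rev_comp)
        for contig_name, seq in contigs.items():
            for probe in probes:
                pos = seq.find(probe)
                if pos != -1:
                    hits[contig_name].append((sts_name, pos, chrom))
    return hits


contigs = {"ctg1": "GGATCCAAATTTGCATGC", "ctg2": "CCCCAAAAGGGG"}
sts_map = {"STS-A": ("AAATTTGC", "MAC-chr-X"),
           "STS-B": ("CCCCTTTT", "MAC-chr-Y")}
for contig, anchors in anchor_contigs(contigs, sts_map).items():
    print(contig, anchors)
```

Here ctg2 is anchored through the reverse complement of STS-B, which shows why both strands are searched.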
m. What is the
size of the research community for the organism?
n & o. Who
will benefit from the improved genomic resources? The immediate community? The
broader biomedical research community? What will be the benefits?
The most immediate beneficiaries of the Tetrahymena genome sequence and annotation will be the many ongoing Tetrahymena research programs funded mainly by U.S., but also by Canadian, European and Asian, granting agencies. The genome sequence:
The sequence will also illuminate the biology of, and facilitate research on, other ciliates and alveolates (including apicomplexan parasites of humans and animals) by providing the sequences of protein homologs that are evolutionarily much closer than those in heterokonts or green plants. This will facilitate the cloning of genes of interest using labeled probes or degenerate PCR primers.
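Degenerate primers of this kind are usually designed by reverse-translating a conserved peptide into IUPAC ambiguity codes. The sketch below shows that step and deliberately keeps it simple. It ignores codon-usage bias, primer length, melting temperature and 3'-end rules. The function names and the example peptide are hypothetical. The choice of genetic code matters. For a target in a ciliate with UAR = Q, glutamine must include the UAR codons (NCBI table 6). For an apicomplexan or other universal-code target, the standard code applies.

```python
from itertools import product

BASES = "TCAG"
# Standard code in TCAG order (first base slowest); '*' marks stop codons.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Ciliate variant (NCBI table 6): TAA and TAG encode Q; TGA is the only stop.
CILIATE = STANDARD[:10] + "QQ" + STANDARD[12:]

# IUPAC ambiguity code for each set of allowed bases.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}


def codons_for(aa: str, code: str) -> list[str]:
    """All codons that encode amino acid `aa` under the given code."""
    return ["".join(c) for c, a in zip(product(BASES, repeat=3), code)
            if a == aa]


def degenerate_primer(peptide: str, code: str) -> tuple[str, int]:
    """Reverse-translate a peptide into a degenerate primer.

    Returns the IUPAC primer and the number of distinct sequences it
    represents (the product of per-position choices). This can exceed the
    true number of codon combinations; for example, Leu becomes YTN.
    """
    primer, degeneracy = [], 1
    for aa in peptide:
        codons = codons_for(aa, code)
        if not codons:
            raise ValueError(f"no codon for {aa!r} in this code")
        for pos in range(3):
            allowed = frozenset(c[pos] for c in codons)
            primer.append(IUPAC[allowed])
            degeneracy *= len(allowed)
    return "".join(primer), degeneracy


print(degenerate_primer("MQW", CILIATE))   # target gene in a UAR=Q ciliate
print(degenerate_primer("MQW", STANDARD))  # target in a universal-code organism
```

Under the ciliate code, glutamine becomes YAR (CAA, CAG, TAA, TAG) instead of CAR. This doubles the degeneracy at that codon.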
Finally, through collaborative research, the sequence will place the remarkably complete, animal-like proteome and the advanced experimental tools of Tetrahymena, documented in this concept paper, at the service of fundamental, biomedical and applied research by the general scientific community. A list of postgenomic resources that would facilitate this research is included in Appendix section 3, although we are not requesting funding for these resources now. In the long term, the benefits to the general research community may well become one of the most important contributions of the Tetrahymena genome sequence.
p. Are there any material transfer
agreements that would affect the availability of data or resources produced by
this project?
No material transfer agreements are anticipated (or desired) that would affect
the availability of data or resources produced by this project. The centers
that have so far shown the greatest interest and intellectual engagement in
this project are non-profit organizations. Regarding existing genomic
resources, all the sequences are posted in GenBank and/or in freely accessible
public databases. All mutants and other useful Tetrahymena laboratory strains are freely made available without
restriction. The highly inducible MTT promoter is subject to a very permissive
MTA designed to promote its academic use.
Table 1. Tetrahymena Genomic Resources and Database Needs
Funding sources are indicated in parentheses. Columns: Resource; Already available; Expected (when); Wanted in the database.

DNA sequence maps
Already available:
- 1-2 kb surrounding an estimated 15% of the chromosome breakage sites already sequenced (NCRR)
- ~75 kb in 53 contigs from randomly cloned inserts from the 75-kb MAC chromosome fraction
Expected (when):
- In 1 year: the rest of the ~300 chromosome breakage sites (already funded by NCRR)
* In 2-3 years: 180 Mb of finished MAC DNA sequence (funds requested for this project)
Wanted in the database:
* Location of genes (characterized or predicted), ESTs, physically and genetically mapped STS, other landmarks

Proteins and ESTs
Already available:
- About 500 non-redundant ESTs from exponentially growing cells (University of Chicago, Tetrahymena community, and a private contribution)
- More than 150 genes in GenBank, mainly from T. thermophila and also from T. pyriformis; some are genomic and others are mRNA sequences (funded over the years mainly by NIGMS, NSF and the American Cancer Society)
Expected (when):
- In 1 year: 70,000 ESTs from several libraries (mainly Genome Canada; also University of Chicago and the Tetrahymena community)
- In 2-3 years: roughly 30,000 predicted genes. Based on pilot studies, about half are expected to match genes in public databases, and a very large majority of those are expected to match human proteins (funds requested for this project)
Wanted in the database:
* Gene names
* Sequence map coordinates
- Function: experimentally determined or *predicted
* Predicted protein domains/motifs
* Links to similar genes/proteins, especially in humans, other model organisms and other Alveolate species
- Regulation pattern
- Knockout phenotype
- Posttranslational modification
- Genetic, physical & functional interactions

Physical maps
Already available:
- Physical size of MAC chromosomes flanking ~15% of chromosome breakage sites (NCRR, NSF)
- Physical size of different MAC chromosomes carrying ~65 RAPD DNA markers (NCRR)
Expected (when):
- In 1 year: nearly all of the ~300 Cbs junctions characterized (already funded by NCRR)
- In 3 years: 2,000-4,000 physically mapped sequence-tagged sites, mainly ESTs (already funded by NCRR)
Wanted in the database:
- MIC and MAC molecular distance maps, anchored to the sequence map

Genetic maps
Already available:
- Germline linkage maps and macronuclear coassortment maps linking an estimated 2/3 of the genome (NCRR)
Expected (when): none listed
Wanted in the database:
- Conventional germline genetic maps and MAC coassortment groups, both anchored to physical and sequence maps
- Germline deletion maps (nullisomics, unisomics, partial chromosome deletions)

Genetic markers & large deletions
Already available:
- More than 400 genetically mapped markers (conventional mutant genes, RAPDs, RFLPs, Cbs-associated polymorphisms) (mainly NCRR, also NSF)
- More than 50 mutant genes and other genetic features assigned only to a chromosome arm (NIGMS & NSF, over the years)
- More than 100 mapped partial deletions of germline chromosomes (NSF)
Expected (when): none listed
Wanted in the database:
- How to test for each marker, diagnostic phenotypes and map coordinates

References
Already available:
- Thousands of references on the biology of cloned genes and mutants
Expected (when):
- An increase in Tetrahymena's share of the current literature, currently nearly 300 papers per year
Wanted in the database:
- Linked to database entries
Database elements preceded by an
asterisk and their annotation would be generated as part of the whole-genome
sequencing project. Annotations for the remaining data sets would come
primarily from Tetrahymena research experts.
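Table 1's "Wanted in the database" column for genes and proteins implies a per-gene record whose fields come from two sources. Asterisked elements are generated by the sequencing project, and the rest come mainly from community curation. The sketch below records that provenance in dataclass field metadata. GeneRecord, fields_by_source and all field names are hypothetical, and the document does not specify a schema or database software.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

PROJECT = {"source": "project"}      # asterisked elements in Table 1
COMMUNITY = {"source": "community"}  # mainly from Tetrahymena experts


@dataclass
class GeneRecord:
    """Hypothetical gene entry modeled on the Table 1 wish list."""
    name: str = field(metadata=PROJECT)
    contig: str = field(metadata=PROJECT)       # sequence map coordinates
    start: int = field(metadata=PROJECT)
    end: int = field(metadata=PROJECT)
    predicted_function: Optional[str] = field(default=None, metadata=PROJECT)
    predicted_domains: list[str] = field(default_factory=list, metadata=PROJECT)
    homolog_links: list[str] = field(default_factory=list, metadata=PROJECT)
    experimental_function: Optional[str] = field(default=None, metadata=COMMUNITY)
    regulation_pattern: Optional[str] = field(default=None, metadata=COMMUNITY)
    knockout_phenotype: Optional[str] = field(default=None, metadata=COMMUNITY)
    ptms: list[str] = field(default_factory=list, metadata=COMMUNITY)
    interactions: list[str] = field(default_factory=list, metadata=COMMUNITY)


def fields_by_source(source: str) -> list[str]:
    """Names of GeneRecord fields whose provenance matches `source`."""
    return [f.name for f in fields(GeneRecord)
            if f.metadata.get("source") == source]


rec = GeneRecord(name="hypothetical-gene-1", contig="ctg1", start=100, end=1300)
print(fields_by_source("community"))
```

Keeping provenance with each field lets curators see at a glance which annotations came from the automated project pipeline and which still await experimental input.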