Chromosome Structure
o
Each chromosome consists of a single molecule of deoxyribonucleic
acid (DNA) plus its associated proteins
o
DNA is composed of nucleotides, which are composed of:
§
Deoxyribose sugar
§
Phosphate group
§
Nitrogen-containing base
·
Purines:
o
Adenine (A)
o
Guanine (G)
·
Pyrimidines:
o
Cytosine (C)
o
Thymine (T)
o
Nucleotides (and DNA strands) have a 3’ end and a 5’ end
§
The strand opposite has 3’ and 5’ ends in the opposite
orientation
§
Hydrogen bonds between base pairs
·
G-C pairs have 3 hydrogen bonds
·
A-T pairs have 2 hydrogen bonds
§
Nucleotides can only be added to the 3’ end
§
Phosphodiester bonds between
adjacent nucleotides
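The pairing rules above (A with T, G with C, strands running in opposite 5’ to 3’ directions) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for these notes, and the hydrogen-bond counts are the ones listed above.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair, as listed in the notes (A-T: 2, G-C: 3).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand):
    """Return the opposite strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand):
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


top = "ATGC"                       # hypothetical sequence, written 5'->3'
bottom = reverse_complement(top)   # opposite strand, also written 5'->3'
print(top, bottom, hydrogen_bonds(top))   # prints: ATGC GCAT 10
```

Reversing before complementing is what encodes the antiparallel orientation: the base paired with the 5’ end of one strand sits at the 3’ end of the other.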
o
Replication origins:
§
Sequences of DNA where DNA helicases
and single-strand binding proteins start separating the 2 strands during replication
o
Chromatin:
§
Composition:
·
A single continuous molecule of DNA
o
50 Mbp (chr. 21) to 250 Mbp (chr. 1)
·
Histone
o
8 molecules of the core histones (one
pair of each of H2A, H2B, H3, and H4) form an octamer
o
H3 and H2A can be substituted with specialized histones
§
Vary by cell type
§
change how DNA is packaged
§
change how accessible it is to regulatory molecules that
determine gene expression or other genome functions
·
Nonhistone proteins
o
Heterogeneous group
o
Transcription factors included
§
DNA double helix winds twice around an octamer
to form a 10nm nucleosome
§
Nucleosomes are joined by a segment of linker DNA, which is bound by histone H1
·
“beads on a string”
·
Chromatin is in this conformation when the region is actively
being transcribed
§
Nucleosomes are coiled into
a 30nm solenoid
·
Visible as a string on EM
§
Solenoids are packed into loops (DNA looped domains) tethered to
a nonhistone protein matrix
·
Present during interphase
o
There is little overlap between chromosomes during interphase
o
Position of the chromosomes is not constant between cells
·
These loops may be functional units of DNA replication or gene
transcription
·
May be one level of control of gene expression
·
Specific DNA sequences called scaffold-associated regions (SARs)
or matrix-attachment regions (MARs) interact with the scaffold proteins
§
Looped domains coil into chromosomes during metaphase
·
The staining in G-banding is darker in areas where looped domains
are packed tighter
§
2 types of chromatin in eukaryotic cells:
·
Euchromatin
·
Heterochromatin
§
Euchromatin:
·
Loosely organized, extended, uncoiled
·
early replicating during S-phase
·
Transcriptionally active; contains most genes
§
Heterochromatin
·
Genetically inactive (mostly)
·
Late replicating during S-phase
·
Remains dark-staining in interphase
·
Constitutive heterochromatin
o
Simple repeats
o
Around centromeres of all chromosomes
and at the distal end of the Y chromosome
o
No transcribed genes
o
Variations have no phenotypic effect
o
Chromosomes 1, 9, 16, Y have variably sized heterochromatic
regions
o
Involved in regulation of crossing over during meiosis
·
Facultative heterochromatin
o
Inactivated X chromosome
o
Condensed during interphase (Barr body)
o
Replicates late during S-phase
o
3 functional regions must be present in a eukaryotic chromosome in
order for it to replicate and segregate correctly
§
Nucleolar organizer regions (NORs)
§
Centromere
§
Telomeres
o
Centromere:
§
Refers to the DNA at the site of the spindle-fiber attachment
§
2 sister chromatids are joined here
§
Essential for chromosome to survive cell division
§
Interaction with the mitotic spindle
·
Kinetochore apparatus:
o
Protein complex that attaches the centromere
to the spindle fibers
§
2-4 Mbp of α-satellite DNA (alphoid DNA)
·
Bound by a special histone (CENP-A histone H3 variant)
o
Nucleolar organizer
regions (NORs)
§
Stalks of satellites of acrocentric
chromosomes contain NORs
·
Theoretically there are 10 per cell
·
Not all are necessarily active during any given cell cycle
§
This is where nucleoli form in interphase
cells
§
Also the site of ribosomal RNA genes that produce rRNA (tandemly repeated genes)
o
Telomere:
§
Physical end of chromosome
§
Nonhistone proteins complex
with telomeric DNA to protect the end from nucleases
§
Plays a role in synapsis during meiosis
·
Chromosome pairing is initiated in the subtelomeric
regions
§
Tandem repeats of TTAGGG over 3-20 kb at the chromosome end
§
Short G-rich unpaired tail at the very end of DNA helix
§
Telomerase synthesizes the TTAGGG repeats during replication
§
Progressive shortening of telomeres in normal cells
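As a rough check on the numbers above, a 3-20 kb tract of a 6-base unit corresponds to somewhere between 500 and about 3,300 copies of TTAGGG. The sketch below does that arithmetic and also counts uninterrupted repeats at the end of a sequence. The function names and the test sequence are invented for illustration, and 1 kb is taken as 1,000 bases.

```python
import re

TELOMERE_UNIT = "TTAGGG"


def repeats_in_tract(tract_bp, unit=TELOMERE_UNIT):
    """Whole copies of the repeat unit that fit in a tract of tract_bp bases."""
    return tract_bp // len(unit)


def terminal_repeat_count(seq, unit=TELOMERE_UNIT):
    """Count uninterrupted copies of the unit at the very end of seq."""
    match = re.search(f"(?:{unit})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


print(repeats_in_tract(3_000), repeats_in_tract(20_000))   # prints: 500 3333
print(terminal_repeat_count("ACGTTTAGGGTTAGGG"))           # prints: 2
```

Comparing `terminal_repeat_count` across successive copies of a sequence is one simple way to picture the progressive shortening mentioned above.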
o
Mitochondrial chromosome
§
Each mitochondrion contains multiple copies of the
mitochondrial chromosome
§
Circular, 16kb
§
37 genes
§
Products function in the mitochondria
o
Genome organization:
§
Regions of the genome with similar characteristics,
organization, replication, and expression tend to be clustered together
o
3 types of DNA:
§
Unique sequence DNA (50-75% of human genome)
§
Repetitive sequence DNA (10-15% of genome – not including
interspersed repetitive DNA)
§
Unclassified spacer DNA (~25% of genome)
o
Unique sequence DNA (single-copy DNA):
§
Most genes are here
·
A gene is the entire nucleic acid sequence that is necessary for
the synthesis of a functional gene product (polypeptide or RNA)
·
~33% of human DNA is transcribed into pre-mRNA (mRNA precursors)
o
~95% is introns
§
Intron length varies
considerably, median 3.3 kb
o
~1.5% of DNA corresponds to exons of mRNA
§
Most exons are 50-200 bp
·
~5% of total DNA contains regulatory elements
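The percentages above hang together arithmetically: if ~33% of the genome is transcribed into pre-mRNA and ~95% of that is intronic, the remaining ~5% of the transcribed portion works out to roughly 1.65% of the genome, close to the ~1.5% quoted for exons. A small sketch of that check, using only the rounded figures from these notes (the variable names are illustrative):

```python
transcribed_fraction = 0.33   # ~33% of DNA transcribed into pre-mRNA (from the notes)
intron_share = 0.95           # ~95% of the transcribed DNA is intronic (from the notes)

# Fractions of the whole genome, obtained by multiplying through.
intron_fraction = transcribed_fraction * intron_share
exon_fraction = transcribed_fraction * (1 - intron_share)

print(f"introns: about {intron_fraction * 100:.2f}% of the genome")
print(f"exons:   about {exon_fraction * 100:.2f}% of the genome")
```

The inputs are approximate, so the result should be read as a consistency check rather than a measurement.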
§
One unique copy (or at most a few copies) per haploid set
·
Solitary genes (25-50% of protein-coding genes)
o
Represented only once per haploid genome
·
Duplicated genes (~50% of protein-coding genes)
o
Close but nonidentical sequences
o
Generally located within 5-50 kb of each other
o
Gene family:
§
A set of duplicated genes that encode proteins with similar but
non-identical sequences
§
Encode closely related, homologous proteins that together make up a protein
family
§
Some protein families (protein kinases,
transcription factors) have hundreds of members
§
Most have a few to 30 members
o
Duplicated genes can be “created” by unequal cross-over during
meiotic recombination
·
Multiple copy genes (not in tandem arrays)
o
tRNA
o
histone
o
often in clusters but generally not in tandem arrays like rRNA and snRNA genes
§
Most single-copy DNA is found in short (several kb or less)
stretches interspersed with repetitive sequence DNA
o
Repetitious DNA (Repetitive sequence DNA):
§
Tandemly arranged or
interspersed (dispersed) amongst unique sequence DNA
§
Contributes to maintaining chromosome structure
§
Simple-sequence DNA (satellite DNA) (3% of genome)
·
Repetitive tandemly arranged DNA
·
No known function for most satellite DNA
·
Much lies near centromeres
o
May assist in attaching chromosomes to
spindle microtubules
·
Also telomeres
·
And specific locations within arms of particular chromosomes
·
Within a species, the sequences of the repeats are highly conserved
o
However, differences in the number of repeats are common
§
Likely due to unequal crossing over during meiosis
·
3 categories of satellite DNA:
o
α-satellite DNA
o
Minisatellite DNA
o
Microsatellite DNA
·
α-satellite DNA:
o
Located in heterochromatin associated with centromeres
of all human chromosomes
o
~171 bp repeat in a tandem array
(higher order repeats - HORs) of up to 1 Mbp or more
§
HORs are highly homogeneous for any given chromosome
§
Size and number of repeats is chromosome-specific
1. Some α-satellite probes are chromosome specific
1. X, Y, 18 have chromosome-specific probes
2. 13 & 21 α-satellite DNA is not chromosome specific
§
Some α-satellite probes will stain all centromeres
o
Total length varies substantially:
§
Ex. 100 kb to 6 Mb on chr 21
§
6-fold variation on chr 5
o
Generally not transcribed
o
Biologic role is unclear
§
Believed to play a role in centromere
function by ensuring proper chromosome segregation in mitosis and meiosis
o
3 suprachromosomal families based on
cross-hybridization at low stringency (degree of homology):
§
Suprachromosomal family I
(1,3,5,6,7,10,12,16,19)
§
Suprachromosomal family II
(2,4,8,9,13,14,15,18,20,21,22)
§
Suprachromosomal family III
(11,17,X)
·
β-satellite DNA:
o
Also found at centromeres
·
Minisatellites:
o
15-100 bp repeats
o
1-5 kbp total length
o
Usually at the distal ends of chromosomes
o
Highly polymorphic
§
DNA fingerprinting is based on these
1. PCR on either end of multiple known minisatellites
·
Microsatellites:
o
1-13 bp repeat
§
Most are 1-3 bp
o
Usually less than 150 bp total length
o
Highly polymorphic
o
Thought to occur by “backward slippage” of a daughter strand on its
template during DNA replication
o
Expanded microsatellites within transcribed genes can cause
various types of neuromuscular diseases
·
Simple sequence repeats (SSRs) / small (short) tandem repeats
(STRs):
o
3-6 bp repeats
§
Perfect or nearly perfect repeats
o
Found in coding and noncoding DNAs
o
Highly polymorphic
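Because micro- and minisatellites are defined by a short unit repeated in tandem, perfect repeats can be located with a regular expression that captures a candidate unit and then requires further back-to-back copies of it. The sketch below is a minimal illustration, not a production repeat finder: the function name, the default unit range of 1-6 bp, and the three-copy threshold are assumptions chosen for these notes, and imperfect repeats are not detected.

```python
import re


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    The unit is matched lazily, so the shortest repeating unit is reported
    (e.g. "CAG" rather than "CAGCAG").
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing four copies of CAG.
print(find_tandem_repeats("TTCAGCAGCAGCAGTT"))   # prints: [(2, 'CAG', 4)]
```

Running the same function on the same locus from two individuals and comparing the copy counts gives a feel for what "highly polymorphic" means here: the unit is shared, the number of copies differs.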
·
Tandemly repeated genes:
o
Genes encoding rRNAs
§
Likely these genes need to be multiple in order to keep up with
the demands of embryonic development
o
Other noncoding RNAs
§
Some of the snRNAs involved in RNA splicing
§
Repetitive interspersed (dispersed) DNA (interspersed repeats)
(mobile DNA elements) (transposable elements) (moderately repeated DNA)
(intermediate-repeat DNA) (~45% of genome):
·
Interspersed amongst unique sequence DNA
·
2 main categories: short and long
·
In most cases appear to have no specific function
o
May exist only to maintain themselves – “selfish DNA”
·
many can generate copies of themselves, and integrate elsewhere
in the genome (transposition)
o
occasionally causing insertional
inactivation of a medically important gene
§
Example: tumour suppressor gene
§
Accounts for only ~0.1-0.2% of mutations in humans
o
This transposition can occur in germ cells or somatic cells
during mitosis
o
Most transposons in humans are retrotransposons
§
Transpose using an RNA intermediate and reverse transcriptase
§
DNA transposons, by contrast, contain a region that
encodes transposase
§
DNA transposons have an inverted repeat of ~50 bp at each end
§
At either end of the insertion site there is a short direct repeat, created
when the insertion site is duplicated during transposition
o
overall frequency of L1 and SINE retrotransposition
is ~1 per 8 individuals
o
unequal crossing over between mobile elements between and within
genes can be responsible for duplication of genes and exons,
bringing about different members of gene families
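The direct repeats that flank an inserted element can be pictured as a duplication of the target site: the few bases at the insertion point end up on both sides of the new element. The sketch below models only that outcome, not the enzymology. The function name, the 5-base duplication length, and the sequences are assumptions made up for illustration.

```python
def insert_with_tsd(genome, element, site, tsd_len=5):
    """Insert element at site, duplicating the tsd_len bases that start there.

    The bases genome[site:site + tsd_len] appear on both sides of the
    inserted element as a short direct repeat (target-site duplication).
    """
    target = genome[site:site + tsd_len]
    if len(target) < tsd_len:
        raise ValueError("target site runs past the end of the sequence")
    return genome[:site] + target + element + target + genome[site + tsd_len:]


before = "AAAACGTGCTTTT"    # hypothetical host sequence
after = insert_with_tsd(before, "nnnn", site=4)
print(after)                # prints: AAAACGTGCnnnnCGTGCTTTT
```

In the output the inserted element (lower case) is flanked by two copies of CGTGC in the same orientation, which is what "direct repeat" means in the notes above.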
·
LTR retrotransposons (viral retrotransposons) (8% of human genome):
o
Contain long terminal repeats (LTRs)
§
250-600 bp total length
§
Characteristic of integrated retroviral DNA
o
Encode all the proteins of the most common type of retroviruses,
except the envelope proteins
·
LINEs (long interspersed elements) (~20% of human genome)
o
~ 6 kbp
§
Most are truncated at their 5’ end, so average size is only 900 bp
o
~900,000 sites per genome
o
3 major families – L1, L2, L3
§
Only L1 family transposes in human genome
o
Transpose by a different mechanism than LTR retrotransposons
o
Short direct repeats on either side of LINE
o
Contain sequences encoding:
§
an RNA-binding protein
§
a protein homologous to reverse transcriptase of retroviruses,
but also with DNA endonuclease activity
§
Only 0.01% contain full intact open reading frames for the two
encoded proteins
o
insertion within genes can result in disease
o
High A-T content
o
Contain cleavage sites for L1
o
G-dark bands predominantly
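The LINE figures above can be cross-checked with simple arithmetic: ~900,000 sites at an average of ~900 bp is about 810 Mbp, and 0.01% of 900,000 is about 90 elements with intact open reading frames. Turning 810 Mbp into a share of the genome needs a genome size, which these notes do not give; the sketch assumes a haploid genome of ~3.2 Gbp, and that value is an assumption, as are the names used.

```python
def genome_share(copies, mean_bp, genome_bp):
    """Fraction of the genome occupied by `copies` elements of mean length `mean_bp`."""
    return copies * mean_bp / genome_bp


HAPLOID_GENOME_BP = 3_200_000_000   # ASSUMPTION: ~3.2 Gbp, not stated in these notes
LINE_SITES = 900_000                # from the notes
LINE_MEAN_BP = 900                  # from the notes (average after 5' truncation)
INTACT_SHARE = 0.0001               # 0.01% with intact ORFs, from the notes

line_share = genome_share(LINE_SITES, LINE_MEAN_BP, HAPLOID_GENOME_BP)
intact_lines = round(LINE_SITES * INTACT_SHARE)

print(f"LINE share of genome: {line_share:.0%}")    # prints: LINE share of genome: 25%
print(f"LINEs with intact ORFs: {intact_lines}")    # prints: LINEs with intact ORFs: 90
```

The result lands a little above the ~20% quoted, which is unsurprising given how round the inputs are; the point is only that the copy number, average length, and genome share are of a consistent order.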
·
SINEs (short interspersed elements) (13% of human genome)
o
90-500 bp
o
Most do not encode proteins
o
Most likely the proteins expressed from full-length LINEs mediate
transposition of SINEs
o
Flanked by short direct repeats
o
Example: Alu elements (> 10% of human genome)
§
~300 bp repeats
1. Most are truncated at the 5’ end, similar to LINEs
§
Contain a single cleavage site recognized by the restriction endonuclease AluI
§
> 1 million family members
1. Family members are recognizably related but not identical in sequence
§
Many are transcribed as part of pre-mRNAs and the noncoding
regions of mRNAs
§
Considerable homology with 7SL RNA
1. Component of the signal-recognition particle
1. Aids in targeting certain polypeptides to the membranes of the endoplasmic reticulum
§
Found in G-light bands
§
High G-C content
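Alu elements take their name from the AluI cleavage site they contain. AluI recognizes the sequence AGCT and cuts between the G and the C, leaving blunt ends. A short sketch of locating those sites and splitting a sequence at them follows. The function names and example sequence are invented for illustration, and only the strand shown is scanned, which is enough here because AGCT reads the same on both strands.

```python
ALUI_SITE = "AGCT"  # AluI recognition sequence; it cuts between the G and the C


def alui_cut_positions(seq):
    """Return the 0-based indices at which AluI would cut seq."""
    seq = seq.upper()
    cuts = []
    start = seq.find(ALUI_SITE)
    while start != -1:
        cuts.append(start + 2)                    # blunt cut right after "AG"
        start = seq.find(ALUI_SITE, start + 1)    # keep scanning for more sites
    return cuts


def digest(seq, cuts):
    """Split seq into fragments at the given cut positions."""
    edges = [0] + list(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]


fragment = "GGAGCTCC"                             # hypothetical sequence, one site
cuts = alui_cut_positions(fragment)
print(cuts, digest(fragment, cuts))               # prints: [4] ['GGAG', 'CTCC']
```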
·
Processed pseudogenes:
o
mRNAs that have been reverse transcribed and randomly integrated
into chromosomal DNA
§
thought to be a rare event
§
flanked by short direct repeats
o
nonfunctional
§
Duplicated sequences
·
High sequence conservation often
·
Many different locations around the genome
·
Segmental duplications involve substantial segments of a
chromosome
o
Hundreds of kb
o
At least 5% of the genome
o
Can lead to genomic rearrangements with deletion or duplication
of the region between the gene copies, due to non-allelic homologous
recombination (NAHR)
·
Low-copy repeats (LCRs)
o
Stretches of duplicated DNA that are 10-500 kb in size with
sequence identity > 95%
o
Increase the probability of NAHR in these areas
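NAHR between two copies of a repeat can be pictured with a chromosome written as a list of segments. When homologs misalign so that the first repeat copy on one pairs with the second copy on the other, a crossover inside the repeat yields one product that has lost the intervening region and a reciprocal product that carries it twice. The sketch below is a toy model of that outcome. The function name and segment labels are hypothetical, and the two repeat copies are assumed to lie in the same orientation.

```python
def nahr_products(chrom, repeat="LCR"):
    """Return (deletion, duplication) products of a crossover between the
    first two copies of `repeat` on misaligned homologous chromosomes."""
    first = chrom.index(repeat)
    second = chrom.index(repeat, first + 1)   # raises ValueError if only one copy
    # One product: left side up to copy 1 joined to what follows copy 2.
    deletion = chrom[:first + 1] + chrom[second + 1:]
    # Reciprocal product: left side up to copy 2 joined to what follows copy 1.
    duplication = chrom[:second + 1] + chrom[first + 1:]
    return deletion, duplication


chromosome = ["A", "LCR", "B", "LCR", "C"]    # B is the region between the repeats
deleted, duplicated = nahr_products(chromosome)
print(deleted)      # prints: ['A', 'LCR', 'C']
print(duplicated)   # prints: ['A', 'LCR', 'B', 'LCR', 'B', 'LCR', 'C']
```

Region B is missing from the first product and present twice in the second, matching the deletion and duplication outcomes described above.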
References: