Chromosome Structure

 

o   Each chromosome consists of a single molecule of deoxyribonucleic acid (DNA) plus its associated proteins

o   DNA is composed of nucleotides, which are composed of:

§  Deoxyribose sugar

§  Phosphate group

§  Nitrogen-containing base

·         Purines:

o   Adenine (A)

o   Guanine (G)

·         Pyrimidines:

o   Cytosine (C)

o   Thymine (T)

o   Nucleotides (and DNA strands) have a 3’ end and a 5’ end

§  The strand opposite has 3’ and 5’ ends in the opposite orientation

§  Hydrogen bonds between base pairs

·         G-C pairs have 3 hydrogen bonds

·         A-T pairs have 2 hydrogen bonds

§  Nucleotides can only be added to the 3’ end

§  Phosphodiester bonds between adjacent nucleotides
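The antiparallel orientation and the pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names `reverse_complement` and `hydrogen_bonds` and the example sequence are assumptions, not part of the notes.

```python
# Pairing rules from the notes: A-T pairs have 2 hydrogen bonds, G-C pairs have 3.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the opposite strand, also written 5' -> 3'.

    The strands are antiparallel, so the partner is read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


top = "ATGCGC"                  # hypothetical 5' -> 3' sequence
print(reverse_complement(top))  # GCGCAT
print(hydrogen_bonds(top))      # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

A G-C rich duplex of the same length has more hydrogen bonds than an A-T rich one, which is what the second function makes visible.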

o   Replication origins:

§  Sequences of DNA where DNA helicases and single-strand binding proteins start separating the 2 strands during replication

o   Chromatin:

§  Composition:

·         A single continuous molecule of DNA

o   50 Mbp (chr. 21) to 250 Mbp (chr. 1)

·         Histone

o   8 molecules of the core histones (one pair of each of H2A, H2B, H3, and H4) form an octamer

o   H3 and H2A can be substituted with specialized histones

§  Vary by cell type

§  Change how DNA is packaged

§  Change how accessible the DNA is to regulatory molecules that determine gene expression or other genome functions

·         Nonhistone proteins

o   Heterogeneous group

o   Transcription factors included

§  DNA double helix winds almost twice around an octamer to form a 10 nm nucleosome

§  Nucleosomes are joined by short stretches of linker DNA, to which histone H1 binds

·         “beads on a string”

·         Chromatin is in this conformation when the region is actively being transcribed

§  Nucleosomes are coiled into a 30nm solenoid

·         Visible as a string on EM

§  Solenoids are packed into loops (DNA looped domains) tethered to a nonhistone protein matrix

·         Present during interphase

o   There is little overlap between chromosomes during interphase

o   Position of the chromosomes is not constant between cells

·         These loops may be functional units of DNA replication or gene transcription

·         May be one level of control of gene expression

·         Specific DNA sequences called scaffold-associated regions (SARs) or matrix-attachment regions (MARs) interact with the scaffold proteins

§  Looped domains coil into chromosomes during metaphase

·         The staining in G-banding is darker in areas where looped domains are packed tighter

§  2 types of chromatin in eukaryotic cells:

·         Euchromatin

·         Heterochromatin

§  Euchromatin:

·         Loosely organized, extended, uncoiled

·         Early replicating during S-phase

·         Transcriptionally active; contains most expressed genes

§  Heterochromatin

·         Genetically inactive (mostly)

·         Late replicating during S-phase

·         Remains dark-staining in interphase

·         Constitutive heterochromatin

o   Simple repeats

o   Around centromeres of all chromosomes and at the distal end of the Y chromosome

o   No transcribed genes

o   Variations have no phenotypic effect

o   Chromosomes 1, 9, 16, Y have variably sized heterochromatic regions

o   Involved in regulation of crossing over during meiosis

·         Facultative heterochromatin

o   Inactivated X chromosome

o   Condensed during interphase (Barr body)

o   Replicates late during S-phase

o   3 functional regions must be present in a eukaryotic chromosome for it to replicate and segregate correctly

§  Nucleolar organizer regions (NORs)

§  Centromere

§  Telomeres

o   Centromere:

§  Refers to the DNA at the site of the spindle-fiber attachment

§  2 sister chromatids are joined here

§  Essential for chromosome to survive cell division

§  Interaction with the mitotic spindle

·         Kinetochore apparatus:

o   Protein complex that attaches the centromere to the spindle fibers

§  2-4 Mbp of α-satellite DNA (alphoid DNA)

·         Bound by a special histone (CENP-A histone H3 variant)

o   Nucleolar organizer regions (NORs)

§  The satellite stalks of the acrocentric chromosomes contain the NORs

·         Theoretically there are 10 per cell

·         Not all are necessarily active during any given cell cycle

§  This is where nucleoli form in interphase cells

§  Also the site of ribosomal RNA genes that produce rRNA (tandemly repeated genes)

o   Telomere:

§  Physical end of chromosome

§  Nonhistone proteins complex with telomeric DNA to protect the end from nucleases

§  Plays a role in synapsis during meiosis

·         Chromosome pairing is initiated in the subtelomeric regions

§  Tandem repeats of TTAGGG over 3-20 kb at the chromosome end

§  Short G-rich unpaired tail at the very end of DNA helix

§  Telomerase synthesizes the TTAGGG repeats during replication

§  Progressive shortening of telomeres in normal cells
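As a rough illustration of the TTAGGG repeat structure described above, the sketch below counts uninterrupted repeats at the end of a sequence and converts the 3-20 kb tract length into an approximate repeat number. The function names and the toy sequence are assumptions made for the example.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # repeat unit given in the notes


def terminal_repeat_count(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Count uninterrupted copies of `unit` at the very end of `seq`."""
    match = re.search(f"(?:{unit})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


def repeats_in_tract(tract_bp: int, unit: str = TELOMERE_UNIT) -> int:
    """Whole repeat units that fit in a tract of `tract_bp` base pairs."""
    return tract_bp // len(unit)


toy_end = "ACGTAC" + TELOMERE_UNIT * 5  # hypothetical chromosome end
print(terminal_repeat_count(toy_end))   # 5
print(repeats_in_tract(3_000), repeats_in_tract(20_000))  # 500 3333
```

So a 3-20 kb telomere corresponds to roughly 500 to 3,300 copies of the hexamer, and progressive shortening means this count falls in normal cells.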

o   Mitochondrial chromosome

§  Each mitochondrion contains a number of copies of the mitochondrial chromosome

§  Circular, ~16.6 kb

§  37 genes

§  Products function in the mitochondria

o   Genome organization:

§  Regions of the genome with similar characteristics, organization, replication, and expression tend to be clustered together

o   3 types of DNA:

§  Unique sequence DNA (50-75% of human genome)

§  Repetitive sequence DNA (10-15% of genome – not including interspersed repetitive DNA)

§  Unclassified spacer DNA (~25% of genome)

o   Unique sequence DNA (single-copy DNA):

§  Most genes are here

·         A gene is the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA)

·         ~33% of human DNA is transcribed into pre-mRNA

o   ~95% is introns

§  Intron length varies considerably, median 3.3 kb

o   ~1.5% of DNA encodes exons of mRNA

§  Most exons are 50-200 bp

·         ~5% of total DNA contains regulatory elements
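The percentages above are roughly self-consistent, which a quick calculation shows. The variable names are illustrative, and the inputs are the approximate figures from the notes, so the result is an estimate only.

```python
# Approximate figures taken from the notes.
transcribed_fraction = 0.33      # share of the genome transcribed into pre-mRNA
intron_share_of_transcript = 0.95

intron_fraction = transcribed_fraction * intron_share_of_transcript
exon_fraction = transcribed_fraction * (1 - intron_share_of_transcript)

print(f"introns: {intron_fraction:.2%} of genome")  # introns: 31.35% of genome
print(f"exons:   {exon_fraction:.2%} of genome")    # exons:   1.65% of genome
```

The ~1.65% obtained this way is in line with the ~1.5% quoted for exons.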

§  One unique copy (or at most a few copies) per haploid set

·         Solitary genes (25-50% of protein-coding genes)

o   Represented only once per haploid genome

·         Duplicated genes (~50% of protein-coding genes)

o   Close but nonidentical sequences

o   Generally located within 5-50 kb of each other

o   Gene family:

§  A set of duplicated genes that encode proteins with similar but non-identical sequences

§  Encode for closely related homologous proteins called a protein family

§  Some protein families (protein kinases, transcription factors) have hundreds of members

§  Most have a few to 30 members

o   Duplicated genes can be “created” by unequal cross-over during meiotic recombination

·         Multiple copy genes (not in tandem arrays)

o   tRNA

o   histone

o   often in clusters but generally not in tandem arrays like rRNA and snRNA genes

§  Most single-copy DNA is found in short (several kb or less) stretches interspersed with repetitive sequence DNA

o   Repetitious DNA (Repetitive sequence DNA):

§  Tandemly arranged or interspersed (dispersed) amongst unique sequence DNA

§  Contributes to maintaining chromosome structure

§  Simple-sequence DNA (satellite DNA) (3% of genome)

·         Repetitive tandemly arranged DNA

·         No known function for most satellite DNA

·         Much lies near centromeres

o   May assist in attaching chromosomes to spindle microtubules

·         Also telomeres

·         And specific locations within arms of particular chromosomes

·         Within a species, the sequence of the repeats is highly conserved

o   However, differences in the number of repeats are common

§  Likely due to unequal crossing over during meiosis

·         3 categories of satellite DNA:

o   α-satellite DNA

o   Minisatellite DNA

o   Microsatellite DNA

·         α-satellite DNA:

o   Located in heterochromatin associated with centromeres of all human chromosomes

o   ~171 bp repeat in a tandem array (higher order repeats - HORs) of up to 1 Mbp or more

§  HORs are highly homogeneous for any given chromosome

§  Size and number of repeats is chromosome-specific

1.    Some α-satellite probes are chromosome-specific

1.    X, Y, 18 have chromosome-specific probes

2.    13 & 21 α-satellite DNA is not chromosome-specific

§  Some α-satellite probes will stain all centromeres

o   Total length varies substantially:

§  Ex. 100 kb to 6 Mb on chr 21

§  6-fold variation on chr 5

o   Generally not transcribed

o   Biologic role is unclear

§  Believed to play a role in centromere function by ensuring proper chromosome segregation in mitosis and meiosis

o   3 suprachromosomal families based on cross-hybridization at low stringency (degree of homology):

§  Suprachromosomal family I (1,3,5,6,7,10,12,16,19)

§  Suprachromosomal family II (2,4,8,9,13,14,15,18,20,21,22)

§  Suprachromosomal family III (11,17,X)

·         β-satellite DNA:

o   Also found at centromeres

·         Minisatellites:

o   15-100 bp repeats

o   1-5 kbp total length

o   Distal ends of chromosomes usually

o   Highly polymorphic

§  DNA fingerprinting is based on these

1.    PCR on either end of multiple known minisatellites
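A minimal sketch of the idea behind fingerprinting with minisatellites: the length of a PCR product across a locus reflects the number of repeat units, and a profile is the set of repeat counts across several loci. All locus names, flank sizes, and product lengths below are hypothetical.

```python
def repeat_count(product_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Number of repeat units in an amplified allele.

    product_bp: PCR product length; flank_bp: combined non-repeat sequence
    between the primers and the repeat array; unit_bp: repeat unit length.
    """
    repeat_region = product_bp - flank_bp
    if repeat_region < 0 or repeat_region % unit_bp:
        raise ValueError("product length is not flanks plus whole repeat units")
    return repeat_region // unit_bp


def same_profile(a: dict, b: dict) -> bool:
    """True when both samples carry the same allele pair at every shared locus."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(sorted(a[locus]) == sorted(b[locus]) for locus in shared)


print(repeat_count(product_bp=1160, flank_bp=200, unit_bp=32))  # 30

sample_1 = {"locusA": (30, 42), "locusB": (18, 18)}
sample_2 = {"locusA": (42, 30), "locusB": (18, 18)}
sample_3 = {"locusA": (30, 31), "locusB": (18, 18)}
print(same_profile(sample_1, sample_2))  # True
print(same_profile(sample_1, sample_3))  # False
```

Because repeat number is highly polymorphic, matching counts across many loci is unlikely between unrelated individuals.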

·         Microsatellites:

o   1-13 bp repeat

§  Most are 1-3 bp

o   Less than 150 bp total length usually

o   Highly polymorphic

o   Thought to occur by “backward slippage” of a daughter strand on its template during DNA replication

o   Expanded microsatellites within transcribed genes can cause various types of neuromuscular diseases

·         Simple sequence repeats (SSRs) / small (short) tandem repeats (STRs):

o   3-6 bp repeats

§  Perfect or nearly perfect repeats

o   Found in coding and noncoding DNAs

o   Highly polymorphic
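Perfect short tandem repeats of the kind described above can be located with a regular expression and a back-reference. This is a sketch, not a production repeat finder: the function name and the thresholds (unit of 1-6 bp, at least 4 copies) are assumptions chosen for the example.

```python
import re


def find_tandem_repeats(seq: str, min_unit: int = 1, max_unit: int = 6, min_copies: int = 4):
    """Yield (start, unit, copies) for perfect tandem repeats in `seq`.

    The lazy quantifier makes the shortest repeating unit win, so
    CAGCAGCAGCAG is reported as CAG x 4 rather than CAGCAG x 2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


toy = "TTGA" + "CAG" * 5 + "TTACG"     # hypothetical sequence
print(list(find_tandem_repeats(toy)))  # [(4, 'CAG', 5)]
```

Comparing the `copies` value for the same locus between individuals is what makes these repeats useful as polymorphic markers.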

·         Tandemly repeated genes:

o   Genes encoding rRNAs

§  Multiple copies are likely needed to meet the demand for rRNA during embryonic development

o   Other noncoding RNAs

§  Some of the snRNAs involved in RNA splicing

§  Repetitive interspersed (dispersed) DNA (interspersed repeats) (mobile DNA elements) (transposable elements) (moderately repeated DNA) (intermediate-repeat DNA) (~45% of genome):

·         Interspersed amongst unique sequence DNA

·         2 main categories: short and long

·         In most cases appear to have no specific function

o   May exist only to maintain themselves – “selfish DNA”

·         Many can generate copies of themselves and integrate elsewhere in the genome (transposition)

o   Occasionally this causes insertional inactivation of a medically important gene

§  Example: tumour suppressor gene

§  Accounts for only ~0.1-0.2% of mutations in humans

o   This transposition can occur in germ cells or somatic cells during mitosis

o   Most transposons in humans are retrotransposons

§  Transpose using an RNA intermediate and reverse transcriptase

§  DNA transposons, by contrast, contain a region that encodes transposase

§  DNA transposons invariably have an inverted repeat of ~50 bp at each end

§  At either end of an inserted element there is a short direct repeat of the target site, created when the staggered cuts made at the insertion site are filled in

o   Overall frequency of L1 and SINE retrotransposition is ~1 per 8 individuals

o   Unequal crossing over between mobile elements, located between or within genes, can duplicate genes and exons, giving rise to different members of gene families
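The short direct repeats that flank an inserted mobile element can be illustrated with a toy string model: the target site ends up on both sides of the element. The function name, the sequences, and the 4 bp target site are assumptions for the sketch.

```python
def insert_with_target_site_duplication(genome: str, element: str, pos: int, tsd_len: int) -> str:
    """Insert `element` at `pos`, duplicating the `tsd_len` bases of target site.

    Models a staggered cut at the target followed by fill-in, which leaves
    one copy of the target site on each side of the inserted element.
    """
    target = genome[pos:pos + tsd_len]
    return genome[:pos] + target + element + target + genome[pos + tsd_len:]


genome = "AAAACCGGTTTT"  # hypothetical host sequence
element = "xxxx"         # stands in for the mobile element (lower case for visibility)
print(insert_with_target_site_duplication(genome, element, pos=4, tsd_len=4))
# AAAACCGGxxxxCCGGTTTT
```

In the output, CCGG appears once before and once after the element, in the same orientation, which is the direct repeat signature.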

·         LTR retrotransposons (viral retrotransposons) (8% of human genome):

o   Contain long terminal repeats (LTRs)

§  250-600 bp total length

§  Characteristic of integrated retroviral DNA

o   Encode all the proteins of the most common type of retroviruses, except the envelope proteins

·         LINEs (long interspersed elements) (~20% of human genome)

o   ~ 6 kbp

§  Most are truncated at their 5’ end, so average size is only 900 bp

o   ~900,000 sites per genome

o   3 major families – L1, L2, L3

§  Only L1 family transposes in human genome

o   Transpose by a different mechanism than LTR retrotransposons

o   Short direct repeats on either side of LINE

o   Contain sequences encoding:

§  an RNA-binding protein

§  a protein homologous to reverse transcriptase of retroviruses, but also with DNA endonuclease activity

§  Only 0.01% contain full intact open reading frames for the two encoded proteins

o   insertion within genes can result in disease

o   High A-T content

o   Contain cleavage sites for L1

o   G-dark bands predominantly

·         SINEs (short interspersed elements) (13% of human genome)

o   90-500 bp

o   Most do not encode proteins

o   Most likely the proteins expressed from full-length LINEs mediate transposition of SINEs

o   Flanked by short direct repeats

o   Example: Alu elements (> 10% of human genome)

§  ~300 bp repeats

1.    Most are truncated at the 5’ end, similar to LINEs

§  Contain a single cleavage site recognized by the restriction endonuclease AluI

§  > 1 million family members

1.    Family members are recognizably related but not identical in sequence

§  Many are transcribed in pre-mRNA and noncoding regions of mRNA

§  Considerable homology with 7SL RNA

1.    Component of the signal-recognition particle

1.    Aids in targeting certain polypeptides to the membranes of the endoplasmic reticulum

§  Found in G-light bands

§  High G-C content
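The copy numbers and element sizes quoted above can be checked against the stated genome shares with simple arithmetic. The haploid genome size of ~3.1 Gbp is an assumption brought in for this estimate (it is not stated in the notes), and the inputs are rounded figures, so only rough agreement should be expected.

```python
# Assumption: approximate haploid human genome size, not taken from the notes.
HAPLOID_GENOME_BP = 3.1e9


def genome_share(copies: int, mean_length_bp: int) -> float:
    """Fraction of the genome occupied by `copies` elements of a given mean length."""
    return copies * mean_length_bp / HAPLOID_GENOME_BP


line_share = genome_share(copies=900_000, mean_length_bp=900)   # LINEs
alu_share = genome_share(copies=1_000_000, mean_length_bp=300)  # Alu elements

print(f"LINEs: {line_share:.0%}, Alu: {alu_share:.0%}")  # LINEs: 26%, Alu: 10%
```

Both estimates land in the same range as the ~20% (LINEs) and > 10% (Alu) figures given in the notes, allowing for rounding in the inputs.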

·         Processed pseudogenes:

o   mRNAs that have been reverse transcribed and randomly integrated into chromosomal DNA

§  thought to be a rare event

§  flanked by short direct repeats

o   nonfunctional

§  Duplicated sequences

·         High sequence conservation often

·         Many different locations around the genome

·         Segmental duplications involve substantial segments of a chromosome

o   Hundreds of kb

o   At least 5% of the genome

o   Can lead to genomic rearrangements with deletion or duplication of the region between the gene copies, due to non-allelic homologous recombination (NAHR)

·         Low-copy repeats (LCRs)

o   Stretches of duplicated DNA that are 10-500 kb in size with sequence homologies > 95%

o   Increase the probability of NAHR in these areas
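How misaligned recombination between two copies of a repeat yields a deletion and a reciprocal duplication can be sketched with a list-based toy chromosome. The segment labels and the function name `nahr_products` are hypothetical, and the model ignores everything except segment order.

```python
def nahr_products(chromosome: list, lcr: str = "LCR"):
    """Return (deletion, duplication) products of NAHR between the first two `lcr` copies.

    Models an exchange in which the first repeat copy on one chromatid pairs
    with the second copy on the other: one product loses the segment between
    the repeats, the reciprocal product carries it twice.
    """
    first = chromosome.index(lcr)
    second = chromosome.index(lcr, first + 1)
    between = chromosome[first + 1:second]
    deletion = chromosome[:first + 1] + chromosome[second + 1:]
    duplication = chromosome[:second + 1] + between + chromosome[second:]
    return deletion, duplication


chromosome = ["a", "LCR", "geneX", "LCR", "b"]  # hypothetical segment order
deleted, duplicated = nahr_products(chromosome)
print(deleted)     # ['a', 'LCR', 'b']
print(duplicated)  # ['a', 'LCR', 'geneX', 'LCR', 'geneX', 'LCR', 'b']
```

The deletion product keeps one repeat copy and loses geneX, while the duplication product has three repeat copies and two copies of geneX.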

 

 

 
