New insulin-like growth factor (IGF)-precursor sequences from mammalian genomes: the molecular evolution of IGFs and associated peptides in primates
Introduction
The two insulin-like growth factors (IGFs), IGF-I and IGF-II, together with insulin, are members of a family of small proteins that show similarity at the levels of primary and tertiary structure, and some overlap in biological activities [4]. Structurally, the two IGFs resemble each other quite closely, and differ from insulin in having a single polypeptide chain (rather than two), and large complex genes rather than a small compact one. Nevertheless, the genes for insulin and IGF-II are co-located on the same chromosome in mammals (chromosome 11 in man), separated by only about 2 kb of DNA, whereas the gene for IGF-I is found on human chromosome 12 [5], [12], [29]. IGF-I, IGF-II and insulin occur as distinct proteins in all vertebrates except for the primitive jawless fish (Agnatha) [18], [23].
In man, the IGF-II gene lies immediately downstream of the small (∼2 kb; 3 exons) insulin gene and is large (∼30 kb), with 10 exons and various alternative splicing patterns, though these give rise to just a single IGF-II precursor [10], [23]. The IGF-I gene extends over ∼85 kb and comprises 6 exons showing alternative splicing patterns, which in this case give rise to IGF-I precursors with two alternative signal peptides and three alternative C-terminal E-domains, but a single mature IGF-I [10], [23], [24]. Fig. 1 summarizes the insulin and IGF precursors, indicating the exons that encode them.
Comparative studies on mammalian insulin show that the protein is strongly conserved [2], with two exceptions: the hystricomorph rodents (guinea pig and relatives), where insulin sequences are very variable and differ substantially from those of other rodents, and the new-world monkeys (NWM), where insulin sequences differ significantly from those of other primates. However, the sequences of additional parts of pre-proinsulin, the signal peptide and the C-peptide, are less strongly conserved, presumably reflecting less stringent functional requirements for these parts of the molecule. For the IGFs, the structures of the mammalian proteins are again strongly conserved [23], and here significant exceptions have not been noted. The sequence of guinea pig IGF-I is not markedly different from that of IGF-I from other mammals [3], but sequence studies on IGFs from NWM have not been reported. Sequences of additional regions of the IGF precursors, signal peptides and E-domains, are less conserved, but the amount of comparative data is limited. Differences between the IGF genes of man and rat, and in their processing to give different mRNAs and IGF precursors, have been noted [10], [24].
The availability of draft and complete genomic sequences for a number of mammals increases considerably the amount of comparative data available for studying mammalian genes and their protein products. This is particularly true for primates where at least 9 genome sequences are available in complete or draft form, or as unassembled sequence traces. These data were utilised here to study the IGFs, and associated peptides, of primates. Previously unreported sequences are described for precursors of IGF-I (9 species), IGF-II (5 species) and insulin (3 species). Questions addressed include: (1) Do the sequences of NWM IGFs differ from those of other primates, as is the case for insulin? (2) How conserved are the peptides produced from the C-terminal end (the E-domain) of IGF-I and IGF-II? These have been reported in a number of cases to show independent biological actions, and in the case of IGF-I come in several different forms, due to alternative processing of the IGF-I gene. (3) Do the multiple forms of signal peptide produced by alternative splicing show varying rates of molecular evolution? To assess the extent of sequence conservation the ratio between nonsynonymous and synonymous nucleotide substitutions (dN/dS) has been used. Synonymous substitutions are those that do not affect the protein sequence, and are therefore not subject to the selective constraints maintaining protein structure in evolution [13]. The dN/dS ratio therefore gives an indication of the rate of protein evolution relative to the underlying ‘neutral’ rate.
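To make the dN/dS idea concrete, the following is a minimal Python sketch of a pairwise estimate in the style of the Nei-Gojobori counting method with a Jukes-Cantor correction. It is an illustration only and is not the procedure used in this study, which is not specified in this excerpt. The function names (`syn_sites`, `codon_diffs`, `dn_ds`) and the example sequences are hypothetical. The sketch assumes two in-frame, gap-free, equal-length coding sequences without stop codons, and the standard genetic code.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3).

    Each position contributes the fraction of its three possible
    single-base changes that leave the amino acid unchanged.
    Changes to a stop codon are counted as nonsynonymous here.
    """
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                total += 1.0 / 3.0
    return total


def codon_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons.

    For codons differing at several positions, counts are averaged over
    all orders of single-base steps, skipping paths through stop codons.
    """
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:  # every path crosses a stop codon
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2.0
        syn, non = codon_diffs(c1, c2)
        s_diff += syn
        n_diff += non
    n_sites = len(seq1) - s_sites
    return jukes_cantor(n_diff / n_sites), jukes_cantor(s_diff / s_sites)


# Hypothetical example: one synonymous change (GCT -> GCC, both alanine).
dn, ds = dn_ds("ATGGCTAAAGGTCTG", "ATGGCCAAAGGTCTG")
print(dn, ds)
```

In this toy pair the only difference is synonymous, so dN is zero and dS is positive, giving a dN/dS of zero. A ratio well below 1 indicates that amino acid changes are being removed by selection, while a ratio approaching 1 indicates that the protein sequence is changing at close to the neutral rate. For very short sequences, such as signal peptides or E-domains, such estimates rest on few sites and should be read with caution.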
Section snippets
DNA sequence data
Genomic sequence data were obtained from the ensembl (http://www.ensembl.org) website [15], separately for each of the species studied, by searching ensembl assemblies and/or the ensembl Trace Server (http://trace.ensembl.org), using the FASTA or BLAST search methods [1]. In some cases the ncbi trace archive (http://www.ncbi.nlm.nih.gov/Traces) was also used. Table 1 summarizes the data extracted for primate species for which a substantial amount of genomic sequence is available. Tree shrew (
Established and novel IGF and insulin gene sequences
IGF-I genes were identified and assembled for man, chimpanzee, orangutan, gibbon, macaque, marmoset, tarsier, mouse lemur and bushbaby, as well as tree shrew and dog (Table 1). The sequences of the 6 exons were complete, except for the non-coding 3′ end of the very long exon 6; the data for part of tree shrew exon 5 were poor. The gene sequences, except for most of the very large introns 3 and 5 and part of exon 6 are given in Supplementary Fig. S1. IGF-I precursor sequences were deduced from
Conclusions
This study confirms the expectation that sequences of well-characterized active protein(s) within a hormone precursor will be more conserved than the rest of the prohormone. However, for the IGF and insulin precursors in primates three important exceptions have been identified:
- (1) The sequences of mature IGFs and insulins are very strongly conserved (Fig. 2), except for the bursts of rapid change seen for insulin and IGF-I on the lineage leading to NWM. This had been seen previously for insulin,
References (42)
- et al. Basic Local Alignment Search Tool. J. Mol. Biol. (1990)
- et al. Methods Enzymol. (1990)
- et al. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene (1988)
- et al. Specific cell surface binding sites shared by human Pro-IGF-I Eb-peptides and rainbow trout Pro-IGF-I Ea-4-peptide. Gen. Comp. Endocrinol. (2003)
- et al. Early events in the biosynthesis of secretory and membrane proteins: the signal hypothesis. Recent Prog. Hormone Res. (1980)
- et al. A new pro-migratory activity on human myogenic precursor cells for a synthetic peptide within the E domain of the mechano growth factor. Exp. Cell Res. (2007)
- et al. The phylogeny of the insulin-like growth factors. Int. Rev. Cytol. (1998)
- et al. Organization and sequence of the human insulin-like growth factor I gene. Alternative RNA processing produces two insulin-like growth factor I precursor peptides. J. Biol. Chem. (1986)
- et al. Biosynthesis of human insulin-like growth factor I (IGF-I). The primary translation product of IGF-1 mRNA contains an unusual 48-amino acid signal peptide. J. Biol. Chem. (1987)
- Signal sequences. The limits of variation. J. Mol. Biol. (1985)