New insulin-like growth factor (IGF)-precursor sequences from mammalian genomes: the molecular evolution of IGFs and associated peptides in primates

https://doi.org/10.1016/j.ghir.2008.05.001

Abstract

The insulin-like growth factors (IGF-I and IGF-II) and insulin are related proteins that play an important role in the regulation of metabolism and growth. In mammals these proteins are generally strongly conserved, though the sequence of insulin underwent periods of rapid change during the evolution of hystricomorph rodents and new-world monkeys (NWM). Genomic sequence information now available for a number of mammals provides gene sequences for insulin and IGF precursors from several new species, and these have been used here to study the evolution of these proteins in primates. The sequence of insulin is strongly conserved in primates except on the branch leading to NWM; the sequence of marmoset insulin confirms the episode of rapid evolution in this lineage. Strongly conserved sequences are also seen for IGF-I and IGF-II, though for IGF-I (but not IGF-II) the marmoset sequence again shows an episode of fairly rapid evolution, paralleling the changes seen in insulin. Thus in NWM the sequences of insulin and IGF-I show a co-evolution that may reflect a coordinated change in the functional properties of these two molecules. The other components of the insulin and IGF precursors (signal peptides, E-domains of IGFs, insulin C-peptide) are much less strongly conserved, to varying extents. Signal peptides are generally quite variable, but the sequence encoding the N-terminal region of the unusually long signal peptide of IGF-I is strongly conserved, suggesting specific function(s) associated at least partly with the nucleotide rather than the protein sequence. The Ea domain of proIGF-I and the N-terminal end of the E-domain of proIGF-II are quite strongly conserved, which accords with reports of a biologically active peptide (preptin) derived from the latter. However, the C-terminal parts of the Eb and Ec domains of proIGF-I (produced by alternative splicing) are very variable, which is of interest in view of reports of peptides with important biological activities deriving from these regions.

Introduction

The two insulin-like growth factors (IGFs), IGF-I and IGF-II, together with insulin, are members of a family of small proteins that show similarity in primary and tertiary structure and some overlap in biological activities [4]. Structurally, the two IGFs resemble each other closely and differ from insulin in consisting of a single polypeptide chain (rather than two) and in being encoded by large, complex genes rather than a small, compact one. Nevertheless, the genes for insulin and IGF-II are co-located on the same chromosome in mammals (chromosome 11 in man), separated by only about 2 kb of DNA, whereas the gene for IGF-I is on human chromosome 12 [5], [12], [29]. IGF-I, IGF-II and insulin occur as distinct proteins in all vertebrates except the primitive jawless fish (Agnatha) [18], [23].

In man, the IGF-II gene lies immediately downstream of the small (∼2 kb; 3 exons) insulin gene. It is large (∼30 kb), with 10 exons and various alternative splicing patterns, though these give rise to just a single IGF-II precursor [10], [23]. The IGF-I gene extends over ∼85 kb and comprises 6 exons whose alternative splicing patterns, in this case, give rise to IGF-I precursors with two alternative signal peptides and three alternative C-terminal E-domains, but a single mature IGF-I [10], [23], [24]. Fig. 1 summarizes the insulin and IGF precursors, indicating the exons that encode them.
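
How one gene can yield several precursors through alternative splicing can be sketched in a few lines. This is a minimal illustration under stated assumptions: the exon names and sequences are hypothetical toy placeholders, not real IGF-I data, and splicing is reduced to concatenating the coding parts of the chosen exons before translating with the standard genetic code. Because exons are joined before translation, the reading frame carries across exon boundaries, as it does in a real mRNA.

```python
from itertools import product

# Standard genetic code: codons enumerated TTT, TTC, TTA, ..., GGG ("TCAG" order);
# '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(exons: dict[str, str], pattern: list[str]) -> str:
    """Join the coding parts of the named exons, in the order given, into one CDS."""
    return "".join(exons[name] for name in pattern)


def translate(cds: str) -> str:
    """Translate an uppercase DNA coding sequence up to (not including) the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy exons, NOT real IGF-I sequence: two alternative 5' exons
# (standing in for alternative signal-peptide exons), one shared core exon and
# two alternative 3' exons (standing in for alternative E-domain exons).
TOY_EXONS = {
    "first_a": "ATGAAA",
    "first_b": "ATGCGT",
    "core": "GGCTGGTTT",
    "last_1": "GAATAA",
    "last_2": "CCCAGCTAG",
}

for pattern in (["first_a", "core", "last_1"], ["first_b", "core", "last_2"]):
    print("+".join(pattern), translate(splice(TOY_EXONS, pattern)))
# first_a+core+last_1 MKGWFE
# first_b+core+last_2 MRGWFPS
```

The two toy precursors share the core residues (GWF) but differ at both ends, which is the pattern described for IGF-I: alternative signal peptides and E-domains around a single mature peptide.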

Comparative studies on mammalian insulin show that the protein is strongly conserved [2], with two exceptions: the hystricomorph rodents (guinea pig and relatives), whose insulin sequences are very variable and differ substantially from those of other rodents, and the new-world monkeys (NWM), whose insulin sequences differ significantly from those of other primates. However, the other parts of preproinsulin, the signal peptide and the C-peptide, are less strongly conserved, presumably reflecting less stringent functional requirements for these parts of the molecule. The mammalian IGFs are likewise strongly conserved [23], and here no significant exceptions have been noted. The sequence of guinea pig IGF-I is not markedly different from that of IGF-I from other mammals [3], but sequences of IGFs from NWM have not been reported. The other regions of the IGF precursors (signal peptides and E-domains) are less conserved, but comparative data for them are limited. Differences between the human and rat IGF genes, and in how they are processed to give different mRNAs and IGF precursors, have been noted [10], [24].

The availability of draft and complete genomic sequences for a number of mammals considerably increases the comparative data available for studying mammalian genes and their protein products. This is particularly true for primates, for which at least 9 genome sequences are available in complete or draft form, or as unassembled sequence traces. These data were used here to study the IGFs, and associated peptides, of primates. Previously unreported sequences are described for precursors of IGF-I (9 species), IGF-II (5 species) and insulin (3 species). Questions addressed include: (1) Do the sequences of NWM IGFs differ from those of other primates, as is the case for insulin? (2) How conserved are the peptides produced from the C-terminal end (the E-domain) of IGF-I and IGF-II? In a number of cases these peptides have been reported to have independent biological actions, and for IGF-I they occur in several forms owing to alternative processing of the IGF-I gene. (3) Do the multiple forms of signal peptide produced by alternative splicing show different rates of molecular evolution? To assess the extent of sequence conservation, the ratio of nonsynonymous to synonymous substitution rates (dN/dS), each expressed per site, has been used. Synonymous substitutions are those that do not change the protein sequence and are therefore not subject to the selective constraints that maintain protein structure during evolution [13]. The dN/dS ratio therefore indicates the rate of protein evolution relative to the underlying ‘neutral’ rate; values well below 1 indicate strong conservation of the protein sequence.
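
The dN/dS calculation can be sketched with the Nei-Gojobori counting method followed by a Jukes-Cantor correction. This is an assumed, illustrative choice: the software and substitution model used in the study are not stated in this section. Further assumptions: the two sequences are aligned, gap-free, uppercase, in frame and free of stop codons; when counting sites, changes to a stop codon are treated as nonsynonymous; and mutational pathways between codons that pass through a stop codon are excluded.

```python
import math
from itertools import permutations, product

# Standard genetic code in "TCAG" order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Sum over the 3 positions of the fraction of possible single-base changes
    that leave the amino acid unchanged (changes to stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences between two codons, averaged
    over all orders of the single-base steps that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total, nonsyn_total, n_paths = 0.0, 0.0, 0
    for order in permutations(diff):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits: -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return 0.75 * math.log(1 / (1 - 4 * p / 3))


def dn_ds(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (dN, dS, dN/dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Synonymous sites are averaged over the two sequences
        s_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s, n = codon_differences(c1, c2)
        s_diffs += s
        n_diffs += n
    if s_sites == 0:
        raise ValueError("no synonymous sites in these sequences")
    n_sites = len(seq1) - s_sites
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    # The ratio is undefined (NaN) when there is no synonymous divergence
    return d_n, d_s, (d_n / d_s if d_s > 0 else math.nan)


# Hypothetical toy alignment: the only difference is CTG -> CTC (both Leu),
# a synonymous change, so dN and the ratio are both 0.
d_n, d_s, ratio = dn_ds("CTGAAA", "CTCAAA")
print(f"dN={d_n:.3f} dS={d_s:.3f} dN/dS={ratio:.3f}")
# prints dN=0.000 dS=1.648 dN/dS=0.000
```

Averaging over pathways matters only for codons differing at two or three positions; for real pairwise comparisons, published implementations (and likelihood-based alternatives) should be preferred over this sketch.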

DNA sequence data

Genomic sequence data for each species studied were obtained from the Ensembl website (http://www.ensembl.org) [15] by searching Ensembl assemblies and/or the Ensembl Trace Server (http://trace.ensembl.org) with the FASTA or BLAST search methods [1]. In some cases the NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/Traces) was also used. Table 1 summarizes the data extracted for primate species for which a substantial amount of genomic sequence is available. Tree shrew (

Established and novel IGF and insulin gene sequences

IGF-I genes were identified and assembled for man, chimpanzee, orangutan, gibbon, macaque, marmoset, tarsier, mouse lemur and bushbaby, as well as tree shrew and dog (Table 1). The sequences of all 6 exons were complete, except for the non-coding 3′ end of the very long exon 6; the data for part of tree shrew exon 5 were poor. The gene sequences, except for most of the very large introns 3 and 5 and part of exon 6, are given in Supplementary Fig. S1. IGF-I precursor sequences were deduced from

Conclusions

This study confirms the expectation that the sequences of well-characterized active protein(s) within a hormone precursor are more conserved than the rest of the prohormone. However, for the primate IGF and insulin precursors, three important exceptions have been identified:

  (1) The sequences of mature IGFs and insulins are very strongly conserved (Fig. 2), except for the bursts of rapid change seen for insulin and IGF-I on the lineage leading to NWM. This had been seen previously for insulin,

References (42)

  • M. Wallis. Mammalian genome projects reveal new growth hormone (GH) sequences. Characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica). Gen. Comp. Endocrinol. (2008).
  • H.E. Wilson et al. Monoclonal antibodies to the carboxy-terminal Ea sequence of pro-insulin-like growth factor-IA (proIGF-IA) recognize proIGF-IA secreted by IM9 B-lymphocytes. Growth Horm. IGF Res. (2001).
  • S.Y. Yang et al. Different roles of the IGF-I Ec peptide (MGF) and mature IGF-I in myoblast proliferation and differentiation. FEBS Lett. (2002).
  • J.J. Beintema et al. Molecular evolution of rodent insulins. Mol. Biol. Evol. (1987).
  • G.I. Bell et al. Sequence of a cDNA encoding guinea pig IGF-I. Nucl. Acids Res. (1990).
  • T.L. Blundell et al. Tertiary structures, receptor binding and antigenicity of insulinlike growth factors. Fed. Proc. (1983).
  • J.E. Brissenden et al. Human chromosomal mapping of genes for insulin-like growth factors I and II and epidermal growth factor. Nature (1984).
  • C.M. Buchanan et al. Preptin derived from proinsulin-like growth factor II (proIGF-II) is secreted from pancreatic islet β-cells and enhances insulin secretion. Biochem. J. (2001).
  • M.J. Chen et al. Suppression of growth and cancer-induced angiogenesis of aggressive human breast cancer cells (MDA-MB-231) on the chorioallantoic membrane of developing chicken embryos by E-peptide of Pro-IGF-I. J. Cell. Biochem. (2007).
  • S.L. Chew et al. An alternatively spliced human insulin-like growth factor-I transcript with hepatic tissue expression that diverts away from the mitogenic IBE1 peptide. Endocrinology (1995).
  • J. Cornish et al. Preptin, another peptide product of the pancreatic beta-cell, is osteogenic in vitro and in vivo. Am. J. Physiol. Endocrinol. Metab. (2007).