An initial map of insertion and deletion (INDEL) variation in the human genome

  1. Ryan E. Mills (1,2),
  2. Christopher T. Luttig (1),
  3. Christine E. Larkins (3),
  4. Adam Beauchamp (4),
  5. Circe Tsui (1,2),
  6. W. Stephen Pittard (2,5), and
  7. Scott E. Devine (1,2,3,4,6)
  1. Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia 30322, USA;
  2. Center for Bioinformatics, Emory University School of Medicine, Atlanta, Georgia 30322, USA;
  3. Biochemistry, Cell, and Developmental Biology Graduate Program, Emory University School of Medicine, Atlanta, Georgia 30322, USA;
  4. Genetics and Molecular Biology Graduate Program, Emory University School of Medicine, Atlanta, Georgia 30322, USA;
  5. Bimcore, Emory University School of Medicine, Atlanta, Georgia 30322, USA

    Abstract

    Although many studies have been conducted to identify single nucleotide polymorphisms (SNPs) in humans, few have sought to identify alternative forms of natural genetic variation, such as insertion and deletion (INDEL) polymorphisms. In this report, we describe an initial map of human INDEL variation that contains 415,436 unique INDEL polymorphisms. These INDELs were identified with a computational approach using DNA re-sequencing traces that were originally generated for SNP discovery projects. They range from 1 bp to 9989 bp in length and are split almost equally between insertions and deletions, relative to the chimpanzee genome sequence. Five major classes of INDELs were identified: (1) insertions and deletions of single base pairs, (2) monomeric base pair expansions, (3) multi-base pair expansions of 2–15 bp repeat units, (4) transposon insertions, and (5) INDELs containing random DNA sequences. Our INDELs are distributed throughout the human genome with an average density of one INDEL per 7.2 kb of DNA. Variation hotspots were identified with up to 48-fold regional increases in INDEL and/or SNP variation compared with the chromosomal averages for the same chromosomes. Over 148,000 INDELs (35.7%) were identified within known genes, and 5542 of these were located in the promoters and exons of genes, where they would be expected to have the greatest influence on gene function. All INDELs in this study have been deposited into dbSNP and have been integrated into maps of human genetic variation that are available to the research community.
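The summary figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline: the function and constant names are hypothetical, and the only inputs are the three numbers quoted above (415,436 INDELs, one INDEL per 7.2 kb, and 35.7% within known genes).

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
TOTAL_INDELS = 415_436    # unique INDEL polymorphisms in the map
KB_PER_INDEL = 7.2        # reported average spacing, in kilobases
GENIC_FRACTION = 0.357    # reported fraction of INDELs within known genes


def implied_span_gb(n_indels: int, kb_per_indel: float) -> float:
    """Sequence length in gigabases implied by a count and an average spacing."""
    return n_indels * kb_per_indel * 1_000 / 1e9


def implied_genic_count(n_indels: int, fraction: float) -> int:
    """Number of INDELs implied by a reported fraction of the total."""
    return round(n_indels * fraction)


span_gb = implied_span_gb(TOTAL_INDELS, KB_PER_INDEL)
genic = implied_genic_count(TOTAL_INDELS, GENIC_FRACTION)

print(f"Implied sequence span: {span_gb:.2f} Gb")
print(f"Implied genic INDELs:  {genic}")
```

Under these assumptions the density corresponds to roughly 3 Gb of sequence, which is on the order of the human genome, and the genic fraction corresponds to a count just above 148,000, in line with the "over 148,000" stated in the abstract.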

    Footnotes

    • 6 Corresponding author.

      6 E-mail sedevin{at}emory.edu; fax (404) 727-3452.

    • [Supplemental material is available online at www.genome.org. All INDELs described in this manuscript have been deposited into dbSNP under the “Devine_lab” handle.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4565806.

      • Received August 13, 2005.
      • Accepted July 12, 2006.