Dalarna University, du.se Publications
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
Michigan State University, East Lansing, United States. ORCID iD: 0000-0002-4872-1961
2012 (English). In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 109, no 33, p. 13272-13277. Article in journal (Refereed), Published.
Abstract [en]

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
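The abstract combines three ideas: store k-mers in a Bloom filter, treat membership queries as an implicit de Bruijn graph (a k-mer's neighbors are the k-mers that overlap it by k-1 bases and are reported present), and partition reads by graph connectivity before assembly. The following is a minimal, illustrative Python sketch of those ideas under stated assumptions; it is not the authors' implementation. The names `KmerBloomFilter` and `partition_reads`, the SHA-256-with-index-prefix hashing, the use of canonical (strand-independent) k-mers, and the default filter sizes are all hypothetical choices made for this example.

```python
import hashlib
from collections import deque

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent form: the lexicographically smaller of kmer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


class KmerBloomFilter:
    """Toy Bloom filter over canonical k-mers (hypothetical design, for illustration)."""

    def __init__(self, k, num_bits, num_hashes):
        self.k = k
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Assumption: derive num_hashes bit positions from SHA-256 with an index prefix.
        key = canonical(kmer).encode()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, kmer):
        # May return True for k-mers never added (false positives), never False for added ones.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(kmer))

    def neighbors(self, kmer):
        """Yield canonical k-mers overlapping kmer by k-1 bases that the filter reports present."""
        seen = set()
        for base in "ACGT":
            for cand in (kmer[1:] + base, base + kmer[:-1]):
                c = canonical(cand)
                if c not in seen and c in self:
                    seen.add(c)
                    yield c


def partition_reads(reads, k, num_bits=1 << 20, num_hashes=4):
    """Label each read with the connected component of its first k-mer in the implicit graph."""
    bf = KmerBloomFilter(k, num_bits, num_hashes)
    for read in reads:
        if len(read) < k:
            raise ValueError("every read must be at least k bases long")
        for kmer in kmers(read, k):
            bf.add(kmer)

    # Simplification: this visited map is exact, so it does not save memory the way
    # a real low-memory partitioner would need to.
    kmer_to_label = {}
    labels = []
    next_label = 0
    for read in reads:
        start = canonical(read[:k])
        if start not in kmer_to_label:
            # Breadth-first traversal of the component containing this read's first k-mer.
            kmer_to_label[start] = next_label
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nb in bf.neighbors(node):
                    if nb not in kmer_to_label:
                        kmer_to_label[nb] = next_label
                        queue.append(nb)
            next_label += 1
        labels.append(kmer_to_label[start])
    return labels


if __name__ == "__main__":
    seq_a = "AGGAGAAGGGAGAAGGAGGAAGAGAAGGGAGGAAGAG"
    seq_b = "CACCAACCCACAACCACACCAACCCAACAC"
    # The first two reads overlap by more than k bases; the third comes from an unrelated sequence.
    print(partition_reads([seq_a[:25], seq_a[12:], seq_b], k=11))  # [0, 0, 1]
```

Because a Bloom filter can report k-mers that were never inserted, the traversal can follow spurious edges; for a filter with m bits, h hash functions and n stored k-mers, the standard approximate false-positive rate is (1 - e^(-hn/m))^h, so shrinking the bits per k-mer trades memory for more such errors. The example uses a deliberately large filter so that spurious edges are very unlikely.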

Place, publisher, year, edition, pages
2012. Vol. 109, no 33, p. 13272-13277
Keywords [en]
Compression; Metagenomics; article; gene sequence; mathematical analysis; metagenome; plots and curves; priority journal; probabilistic de Bruijn graph; Base Pairing; Chromosomes, Bacterial; Computational Biology; DNA, Circular; Escherichia coli; Genome, Bacterial; Information Theory; Nonlinear Dynamics; Sequence Analysis, DNA; Soil Microbiology
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:du-37190
DOI: 10.1073/pnas.1121464109
Scopus ID: 2-s2.0-84865176493
OAI: oai:DiVA.org:du-37190
DiVA, id: diva2:1557634
Available from: 2021-05-26. Created: 2021-05-26. Last updated: 2021-05-26. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records

Hintze, Arend