The Impact of an Attention Mechanism on the Representations in Neural Networks, Focusing on Catastrophic Forgetting and Robustness to Input Noise
Dalarna University, School of Information and Engineering, Microdata Analysis.
2024 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This study explores how attention mechanisms affect the distribution of representations within neural networks, focusing on catastrophic forgetting and robustness to input noise. We compare Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), their attention-enhanced counterparts (RNNA, LSTMA, GRUA), and the Transformer model using musical sequences from "Daisy Bell". A key finding is the difference in how these models distribute information across their representations. Base models such as RNN, LSTM, and GRU concentrate information within specific nodes, while attention-enhanced models spread information across more nodes and thereby demonstrate greater robustness to input noise. This is shown by significant differences in performance deterioration between base models and their attention-augmented versions. However, base models such as RNN and GRU exhibit better resistance to catastrophic forgetting than their attention-enhanced counterparts. Despite this, attention models show a positive correlation between higher overlap percentages in their representations and improved accuracy on certain tasks, alongside a negative correlation between accuracy and the number of empty nodes. The Transformer model stands out by maintaining high accuracy across tasks, likely due to its self-attention mechanisms. These results suggest that while attention mechanisms enhance robustness to noise, further research is needed to address catastrophic forgetting in neural networks.
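The attention-enhanced recurrent models the abstract compares all share one core operation: pooling a sequence of recurrent hidden states into a context vector via softmax-normalized scores, so that information is spread across many time steps rather than concentrated in the final state. As a rough illustration only (this is not the thesis's implementation; the dot-product scoring, shapes, and function name are assumptions), a minimal NumPy sketch:

```python
import numpy as np

def attention_pool(hidden_states, score_vec):
    """Pool a sequence of hidden states with dot-product attention.

    hidden_states: (T, H) array, one recurrent hidden state per time step.
    score_vec:     (H,) learned scoring vector (fixed here for illustration).
    Returns (context, weights): the (H,) weighted sum and the (T,) weights.
    """
    scores = hidden_states @ score_vec        # (T,) one score per time step
    scores = scores - scores.max()            # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()         # softmax over time: weights sum to 1
    context = weights @ hidden_states         # (H,) attention-weighted combination
    return context, weights

# Usage: pool five hidden states of size four into one context vector.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
w = rng.normal(size=4)
context, weights = attention_pool(H, w)
```

Because every time step receives a nonzero weight, the context vector mixes information from the whole sequence — the kind of spread-out representation the abstract associates with improved noise robustness.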

Place, publisher, year, edition, pages
2024.
Keywords [en]
Representations, Hidden States, Recurrent Neural Networks (RNN), Long Short Term Memory (LSTM), Gated Recurrent Units (GRU), Attention Mechanism, Transformer, Catastrophic Forgetting, Robustness to Noise, Note Sequence(s) (NS)
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:du-48953
OAI: oai:DiVA.org:du-48953
DiVA, id: diva2:1881929
Subject / course
Microdata Analysis
Available from: 2024-07-04 Created: 2024-07-04

Open Access in DiVA

fulltext (1219 kB), 205 downloads
File information
File name: FULLTEXT01.pdf
File size: 1219 kB
Checksum (SHA-512): bba668d3c3785ac1b6b52db1f52103bf62541396fd10b26917208bcbb92cb13dc1781ffe7ace63f1801e5bbf6fcf20f41a1de8bc13be41c28003d794251dfe8d
Type: fulltext
Mimetype: application/pdf

By organisation
Microdata Analysis
Electrical Engineering, Electronic Engineering, Information Engineering

Total: 205 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 782 hits