Evaluation of Specialized and Non-Specialized Classifiers in Predicting Critical Audit Matters
Dalarna University, School of Information and Engineering, Microdata Analysis.
Dalarna University, School of Information and Engineering, Microdata Analysis.
2024 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

This thesis compares the performance of specialized and non-specialized classifiers in predicting Critical Audit Matters (CAMs) from auditors' reports. FinBERT, a domain-specific pre-trained NLP model, is evaluated against traditional non-specialized classifiers: Naive Bayes, Random Forest, Support Vector Machine, and K-Nearest Neighbors. The aim is to assess how effectively each classifier identifies CAM topics from textual data and accounting variables. The results show that FinBERT outperforms the non-specialized classifiers on most metrics, demonstrating the benefits of advanced NLP techniques for financial text analysis. Combining textual and numerical data was initially expected to improve performance, but the experiments showed that adding the numeric data reduced it. The findings highlight the importance of domain-specific models and suggest further exploration with diverse data sources and additional NLP models to improve audit report analysis.

Place, publisher, year, edition, pages
2024.
Keywords [en]
Critical Audit Matters (CAMs), Financial data, Financial Bidirectional Encoder Representations from Transformers (FinBERT), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbours (KNN), Large Language Models (LLMs)
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:du-48955
OAI: oai:DiVA.org:du-48955
DiVA, id: diva2:1881938
Subject / course
Microdata Analysis
Available from: 2024-07-04 Created: 2024-07-04 Last updated: 2025-10-09

Open Access in DiVA

fulltext (1524 kB), 273 downloads
File information
File name: FULLTEXT01.pdf
File size: 1524 kB
Checksum (SHA-512): cc6349b2fb550627c91905b9e5a1783477800952003c736c22cf91645a607fdb90c8d2c8cfaa31a5b6dc6c505d79d183a71e56e4f6e7cfaf31439aa9252e3443
Type: fulltext
Mimetype: application/pdf
