Comparing RAG and Fine-Tuned LLMs to ChatGPT for Domain-Specific Insights
Dalarna University, School of Information and Engineering.
Dalarna University, School of Information and Engineering.
2025 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

This thesis explores the use of retrieval-augmented generation (RAG) to enhance large language models (LLMs) so that they can answer domain-specific questions about Positive Energy Districts (PEDs). PEDs represent an innovative approach to achieving an energy surplus and reducing greenhouse gas emissions, in line with the European Union's goals for sustainable urban energy solutions. However, existing Question Answering (QA) frameworks lack the specialized design needed to tackle the complex challenges of PED projects. The study constructs a structured corpus of academic literature about PEDs, fine-tunes models such as BERT and T5 on it, and evaluates their performance in answering domain-specific questions alongside GPT-enhanced pipelines and ChatGPT. Expert evaluations and cosine similarity analysis show that GPT-based pipelines outperform the standalone models in accuracy, relevance, and readability. The results also expose the limitations of semantic similarity as the sole metric for assessing response quality. This work underscores the role of structured domain-specific corpora and generative models in improving decision-making for stakeholders in urban sustainability. The findings point to future research directions, including integrating real-time information retrieval and expanding knowledge bases to further refine QA frameworks. These contributions advance the implementation of energy-positive urban environments and support global efforts toward sustainable urban development.
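The abstract names cosine similarity as one of the evaluation measures. As a rough illustration only, the following Python sketch scores a model answer against a reference answer. This record does not state which text representation the thesis used, so the term-count vectors and the helper names term_vector and cosine_similarity below are hypothetical stand-ins (an embedding model would typically replace term_vector), and the example sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-case the text and count word tokens.

    Hypothetical stand-in for a sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Dot product over the terms both vectors share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example: a reference answer and a model-generated answer.
reference = "A positive energy district produces more energy than it consumes annually."
answer = "A PED produces more energy than it consumes over a year."
score = cosine_similarity(term_vector(reference), term_vector(answer))
print(f"{score:.3f}")
```

Because the score depends only on the direction of the vectors, an answer that reuses the reference's vocabulary while getting the facts wrong can still score high. This is consistent with the abstract's caution against relying on semantic similarity as the sole metric.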

Place, publisher, year, edition, pages
2025.
Keywords [en]
Retrieval-Augmented Generation (RAG), Positive Energy Districts (PEDs), Large Language Models (LLMs), Fine-Tuning, Domain-Specific Corpus, BERT, T5, GPT-4, Cosine Similarity, Corpus Preprocessing, Knowledge Retrieval, Energy Efficiency, Stakeholder Communication
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:du-50141
OAI: oai:DiVA.org:du-50141
DiVA, id: diva2:1935432
Subject / course
Microdata Analysis
Available from: 2025-02-06 Created: 2025-02-06 Last updated: 2025-10-09

Open Access in DiVA

Full text (7271 kB), 648 downloads
File information
File name: FULLTEXT01.pdf
File size: 7271 kB
Checksum (SHA-512):
fabb360c0384a1570ba209c75fb459d650a3a13aaf79e2b4967ae09c5b92402616b112ceb18d52be2976080f7d1f961e514ced6f7221a31aefe2c690db82f579
Type: fulltext
MIME type: application/pdf

Total: 648 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 2069 hits