Smart Search Engine: A Design and Test of Intelligent Search of News with Classification
Dalarna University, School of Information and Engineering.
Dalarna University, School of Information and Engineering.
2021 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Background

Google, Bing, and Baidu are the most commonly used search engines in the world, but they have some problems. For example, when searching for "Jaguar", most of the search results are about cars, not animals. This is the problem of polysemy: search engines tend to return the most popular results, which are not necessarily the most relevant ones.

Aim

We want to design and implement a search function and explore whether classifying news can improve precision when users search for news.

Method

In this research, we collect data with a web crawler that crawls news articles from BBC News. We then use NLTK and an inverted index for data pre-processing, and BM25 for data processing.
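As a rough illustration of how an inverted index and BM25 fit together, the following is a minimal sketch, not the thesis implementation. It assumes a plain regular-expression tokenizer standing in for NLTK, three made-up documents in place of crawled BBC articles, the common defaults `k1=1.5` and `b=0.75`, and the non-negative (Lucene-style) variant of the BM25 IDF term. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase word split as a stand-in for NLTK pre-processing.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: term -> {doc_id: term frequency in that document}.
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.5, b=0.75):
    # Score every document that contains at least one query term.
    n_docs = len(lengths)
    if n_docs == 0:
        return []
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up documents for illustration only.
docs = {
    "car": "Jaguar unveils a new electric car",
    "animal": "A jaguar was spotted in the rainforest",
    "other": "Markets rise as inflation slows",
}
index, lengths = build_index(docs)
results = bm25_search("jaguar car", index, lengths)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this toy run the query "jaguar car" ranks the car article above the animal article and leaves the unrelated article out, because only documents found through the inverted index are scored.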

Results

Compared to the normal search function, our function has a lower recall rate and a higher precision.
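To make the two measures concrete, here is a small sketch of how precision and recall can be computed for one query. The document IDs and result lists are invented for illustration and are not data from the thesis; the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    # Precision: share of retrieved items that are relevant.
    # Recall: share of relevant items that were retrieved.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four articles are truly relevant to the query.
relevant = {"a1", "a2", "a3", "a4"}
plain = ["a1", "a2", "a3", "c1", "c2", "c3"]  # unclassified search results
filtered = ["a1", "a2"]                       # results restricted to one category

print(precision_recall(plain, relevant))     # (0.5, 0.75)
print(precision_recall(filtered, relevant))  # (1.0, 0.5)
```

The filtered list drops irrelevant hits, which raises precision, but it also misses some relevant articles, which lowers recall. This is the kind of trade-off the result above describes.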

Conclusions

This search function can improve the precision when people search for news.

Implications

This search function can be used not only for news but for any kind of content, and it has great potential in search engines. It could be combined with machine learning to analyze users' search habits and so search and classify more accurately.

Place, publisher, year, edition, pages
2021.
Keywords [en]
Smart search, precision, recall rate, NLTK, inverted index, BM25
National Category
Information Systems
Identifiers
URN: urn:nbn:se:du-37601
OAI: oai:DiVA.org:du-37601
DiVA, id: diva2:1577981
Subject / course
Information Systems
Available from: 2021-07-05 Created: 2021-07-05 Last updated: 2025-10-09

Open Access in DiVA

fulltext (1251 kB), 563 downloads
File information
File name: FULLTEXT01.pdf
File size: 1251 kB
Checksum SHA-512:
8bcbeb244de66a8f59fe6c63df0a50ea6b7fcc5eb7f99b508accf8ac2276a028bbbefec0364fa4a0c7fe7b87a7a9808b1afce42b1c451af867ff7f2d577c21a2
Type: fulltext
Mimetype: application/pdf
