An Investigation of the LDA based Topic Model Approach for Data Mining Twitter Social Network
Dalarna University, School of Technology and Business Studies, Microdata Analysis.
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 10 credits / 15 HE credits. Student thesis
Abstract [en]

In recent years, Twitter has become a highly popular form of social media. Twitter provides a platform for users to post short messages for followers to read in an on- or off-line fashion. Twitter is used in a variety of ways, from posting about personal daily life to keeping up to date with current events.

This thesis aims to find a reliable pipeline to analyse and visualize the hottest topics (or trends) that people are talking about on Twitter during a period of time. A topic model is used to cluster Twitter messages and identify topic words; the topic words, combined with the tweets’ influence, are then graphically represented by visualization software to reflect the trend under each topic. However, two limitations of Twitter messages prevent standard topic model tools from being applied to their full potential: Twitter messages are short and colloquial, so a single message provides little useful information for the topic model to work properly. Thus, we propose a pooling schema to enhance the performance of a topic model on Twitter data.

Meanwhile, to identify a reliable pipeline for the task, we compared different methodologies during the process: performance with and without the pooling schema in the data sampling step, performance with and without TF*IDF in the data processing step, and finally the performance of Latent Dirichlet Allocation (LDA) against Correlated Topic Models (CTM) in identifying topics. The results show that LDA-TF*IDF with the pooling schema is the most accurate model for identifying Twitter trends.
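The abstract does not say how tweets are pooled or how TF*IDF is computed, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes pooling by hashtag (a hypothetical choice), a plain term-frequency times log inverse-document-frequency weight, and invented sample tweets. The function names `pool_by_hashtag` and `tf_idf` are illustrative only. The topic-model step itself (LDA or CTM) is left out.

```python
import math
import re
from collections import Counter, defaultdict


def pool_by_hashtag(tweets):
    """Merge tweets that share a hashtag into one longer pseudo-document.

    Pooling by hashtag is an assumption made for this sketch; the abstract
    does not specify the pooling key used in the thesis.
    """
    pools = defaultdict(list)
    for i, text in enumerate(tweets):
        lowered = text.lower()
        tags = re.findall(r"#(\w+)", lowered)
        # Drop the hashtags themselves, then keep plain word tokens.
        words = re.findall(r"[a-z']+", re.sub(r"#\w+", " ", lowered))
        # A tweet without any hashtag stays in a pool of its own.
        for key in tags or [f"tweet:{i}"]:
            pools[key].extend(words)
    return dict(pools)


def tf_idf(pools):
    """Weight each word in each pool by term frequency * log(N / document frequency)."""
    n_docs = len(pools)
    doc_freq = Counter()
    for words in pools.values():
        doc_freq.update(set(words))
    weights = {}
    for key, words in pools.items():
        counts = Counter(words)
        total = len(words)
        weights[key] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


# Invented sample data, for illustration only.
tweets = [
    "Snow in Falun today #weather",
    "More snow expected tomorrow #weather",
    "Great goal in the final #football",
    "What a final, what a goal #football",
    "Just had coffee",
]

pools = pool_by_hashtag(tweets)
weights = tf_idf(pools)

for key, scores in weights.items():
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(key, top)
```

In a pipeline of the kind the abstract describes, the pooled and weighted pseudo-documents would then be passed to a topic model. Pooling gives the model longer documents to work with, and the TF*IDF weights downplay words that appear in many pools.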

Place, publisher, year, edition, pages
2015.
Keywords [en]
Topic model, Latent Dirichlet Allocation (LDA), Twitter
National Category
Business Administration
Identifiers
URN: urn:nbn:se:du-18642
OAI: oai:DiVA.org:du-18642
DiVA, id: diva2:828229
Available from: 2015-06-30 Created: 2015-06-30

Open Access in DiVA

No full text in DiVA
