Construction site accident analysis using text mining and natural language processing techniques
Dalarna University, School of Technology and Business Studies, Microdata Analysis.
Dalarna University, School of Technology and Business Studies, Computer Engineering. ORCID iD: 0000-0002-1429-2345
2019 (English). In: Automation in Construction, ISSN 0926-5805, E-ISSN 1872-7891, Vol. 99, p. 238-248. Article in journal (Refereed), Published
Abstract [en]

Workplace safety is a major concern in many countries. Among the various industries, the construction sector is identified as the most hazardous workplace. Construction accidents not only cause human suffering but also result in large financial losses. To prevent the recurrence of similar accidents and to make scientific risk control plans, analysis of accidents is essential. In the construction industry, fatality and catastrophe investigation summary reports are available for past accidents. In this study, text mining and natural language processing (NLP) techniques are applied to analyze construction accident reports. More specifically, five baseline models, support vector machine (SVM), linear regression (LR), K-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), together with an ensemble model, are proposed to classify the causes of the accidents. In addition, the Sequential Quadratic Programming (SQP) algorithm is used to optimize the weight of each classifier in the ensemble model. Experimental results show that the optimized ensemble model outperforms the other models considered in this study in terms of average weighted F1 score. The results also show that the proposed approach is more robust for classes with low support. Moreover, an unsupervised chunking approach is proposed to extract the common objects that cause accidents, based on grammar rules identified in the reports. As harmful objects are one of the major factors leading to construction accidents, identifying such objects is extremely helpful for mitigating potential risks. Certain limitations of the proposed methods are discussed, and suggestions for future improvements are provided.
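
As a rough illustration of two ideas named in the abstract, a weighted ensemble of classifiers and the support-weighted F1 score, a minimal Python sketch follows. The function names, the toy probabilities and the hand-picked weights are assumptions made for illustration only. This record does not give the authors' implementation, and the SQP weight optimization is not reproduced here.

```python
from collections import Counter


def weighted_vote(prob_lists, weights):
    """Combine per-classifier class probabilities with a weighted sum
    and return the index of the highest-scoring class."""
    n_classes = len(prob_lists[0])
    scores = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its
    support (the number of true instances of that class)."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical class probabilities for ONE report from three classifiers.
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]

# Equal weights and hand-picked weights can lead to different decisions,
# which is why the choice of weights matters.
equal_choice = weighted_vote(probs, [1 / 3, 1 / 3, 1 / 3])
tuned_choice = weighted_vote(probs, [0.8, 0.1, 0.1])

# Toy labels: class 0 has support 3, class 1 has support 1.
score = weighted_f1([0, 0, 0, 1], [0, 0, 1, 1])
```

Because each class contributes in proportion to its support, a low-support class has little influence on the weighted average, so its per-class F1 is worth inspecting separately when judging robustness on rare accident causes.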

Place, publisher, year, edition, pages
2019. Vol. 99, p. 238-248
Keywords [en]
Construction site accident analysis, Machine learning, Natural language processing, Optimization, Sequential quadratic programming, Text mining
National Category
Computer and Information Sciences
Research subject
Energy and Built Environments
Identifiers
URN: urn:nbn:se:du-29254
DOI: 10.1016/j.autcon.2018.12.016
ISI: 000456759400020
Scopus ID: 2-s2.0-85058940383
OAI: oai:DiVA.org:du-29254
DiVA, id: diva2:1275663
Available from: 2019-01-07. Created: 2019-01-07. Last updated: 2019-06-27. Bibliographically approved

Authors
Zhang, Fan; Fleyeh, Hasan
