Highlights
- Evaluated six machine learning algorithms for classifying accident narratives according to 11 accident types.
- Found that the support vector machine (SVM) produced the best performance.
- Across the 11 accident types, the SVM achieved an average precision of 0.73, an average recall of 0.63, and an average F1 score of 0.67.
- Commonly mislabeled cases were examined using a confusion matrix and a qualitative review of the mislabeled cases.
- A set of 1,000 labeled accident narratives and more than 3,000 unlabeled narratives were made publicly available.
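The precision, recall, F1 and confusion-matrix figures in the highlights follow the standard per-class definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The sketch below computes them in plain Python from paired true and predicted labels. The label names and toy predictions are hypothetical, and taking the unweighted (macro) mean across classes is an assumed reading of "average" in the highlights, not something the source specifies.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Confusion matrix as a Counter keyed by (true label, predicted label)."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    cm = confusion_counts(y_true, y_pred)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = cm[(label, label)]
        predicted = sum(n for (_, p), n in cm.items() if p == label)  # TP + FP
        actual = sum(n for (t, _), n in cm.items() if t == label)     # TP + FN
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted (macro) mean of precision, recall and F1 across labels."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from the OSHA narratives.
    y_true = ["falls", "falls", "struck_by", "struck_by", "electrocution", "falls"]
    y_pred = ["falls", "struck_by", "struck_by", "struck_by", "falls", "falls"]
    for (t, p), n in sorted(confusion_counts(y_true, y_pred).items()):
        print(f"true={t:13s} pred={p:13s} count={n}")
    scores = per_class_scores(y_true, y_pred)
    for label, (p, r, f) in scores.items():
        print(f"{label:13s} P={p:.2f} R={r:.2f} F1={f:.2f}")
    print("macro P/R/F1:", [round(v, 2) for v in macro_average(scores)])
```

For these toy labels the sketch reports F1 = 0.00 for electrocution, 0.67 for falls and 0.80 for struck_by, and macro precision, recall and F1 of 0.44, 0.56 and 0.49.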

    Abstract

    Learning from past accidents is fundamental to accident prevention. Thus, accident and near-miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near-miss narratives can be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1,000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, namely support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experiments were conducted on the tokenization of the processed text and on non-linear SVMs, and a grid search was performed over the hyperparameters of the SVM models. The best-performing classifiers were a linear SVM with unigram tokenization and a radial basis function (RBF) SVM with unigram tokenization; in view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9, and the F1 score ranged from 0.45 to 0.92. The reasons for misclassification are discussed, and suggestions for improving performance are provided.
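To illustrate the recommended configuration, the sketch below trains a one-vs-rest linear SVM on unigram (bag-of-words) features in plain Python, using Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. It is not the authors' implementation: the regularization strength `lam`, the epoch count, the toy narratives and the label names are all hypothetical, no bias term is fitted, and neither the RBF variant nor the hyperparameter grid search is reproduced. A value such as `lam` is the kind of hyperparameter a grid search would tune.

```python
import re
from collections import Counter


def unigram_tokens(text):
    """Lowercase the text and split it into alphabetic word tokens (unigrams)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map each distinct unigram in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for token in unigram_tokens(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag of unigrams {feature index: count}; unseen words are dropped."""
    counts = Counter(tok for tok in unigram_tokens(text) if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}


def dot(w, x):
    return sum(w[i] * v for i, v in x.items())


def train_binary_svm(X, y, n_features, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the hinge loss; y holds +1/-1 labels."""
    w = [0.0] * n_features
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decaying step size
            margin = label * dot(w, x)                # margin before this step
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularization shrink
            if margin < 1.0:                          # hinge loss is active
                for i, v in x.items():
                    w[i] += eta * label * v
    return w


def train_one_vs_rest(texts, labels, lam=0.01, epochs=20):
    """Train one binary linear SVM per class (that class vs. all others)."""
    vocab = build_vocabulary(texts)
    X = [vectorize(t, vocab) for t in texts]
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(X, y, len(vocab), lam, epochs)
    return vocab, models


def predict(text, vocab, models):
    """Pick the class whose linear score for the narrative is highest."""
    x = vectorize(text, vocab)
    return max(models, key=lambda cls: dot(models[cls], x))


# Hypothetical toy narratives and labels, not taken from the OSHA data.
TRAIN_TEXTS = [
    "Worker fell from ladder",
    "Worker fell through roof opening",
    "Worker struck by falling beam",
    "Worker struck by reversing truck",
    "Worker contacted live power line",
    "Worker electrocuted by live wire",
]
TRAIN_LABELS = ["falls", "falls", "struck_by", "struck_by",
                "electrocution", "electrocution"]

if __name__ == "__main__":
    vocab, models = train_one_vs_rest(TRAIN_TEXTS, TRAIN_LABELS)
    print(predict("fell from roof", vocab, models))  # falls
    print(predict("beam struck", vocab, models))     # struck_by
    print(predict("live wire", vocab, models))       # electrocution
```

Each binary classifier scores a narrative by summing the weights of its unigrams, so words seen only in one class's training narratives can only gain positive weight in that class's classifier and non-positive weight in the others; that is why the short queries above land in the expected classes.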



    Title:

    Construction accident narrative classification: An evaluation of text mining techniques

    Contributors:

    Published in:

    Publication date:

    2017-08-26

    Format / extent:

    9 pages

    Media type:

    Article (journal)

    Format:

    Electronic resource

    Language:

    English




    Text Mining Analysis of Railroad Accident Investigation Reports

    Williams, Trefor / Betak, John / Findley, Bridgette | British Library Conference Proceedings | 2016



    Evaluation of database balancing techniques for road accident severity classification employing Artificial Neural Network

    Maria Lígia Chuerubim / Leonardo N. Ferreira / Alan D.B. Valejo et al. | DOAJ | 2020

    Open access

    Automobile accident classification

    Sanford, W.E. | Engineering Index Backfile | 1929