Crash Narrative Classification: Identifying Agricultural Crashes Using Machine Learning with Curated Keywords



Details

  • Description:
    Objective: Crashes involving particular vehicle types, such as farm equipment, are traditionally identified from the structured (coded) data fields of a crash report. However, relying on structured data alone can lead to misclassification of vehicle or crash type. The objective of this article is to examine machine learning methods for identifying agricultural crashes from the crash narrative and to transfer the resulting models to different settings (e.g., future years of data, other states).
    Methods: Different data representations (e.g., bag-of-words [BoW], bag-of-keywords [BoK]) and document classification algorithms (e.g., support vector machine [SVM], multinomial naïve Bayes classifier [MNB]) were explored using Texas and Louisiana crash narratives across different time periods.
    Results: The BoK-support vector classifier (SVC), BoK-MNB, and BoW-SVC models trained with Texas data were better predictive models than the baseline rule-based algorithm on the future year test data, with F1 scores of 0.88, 0.89, and 0.85, respectively, vs. 0.84. The BoK-MNB model trained with Louisiana data performed closest to the baseline rule-based algorithm on the future year test data (F1 scores: 0.91 for the baseline rule-based algorithm vs. 0.89 for BoK-MNB). The BoK-SVC and BoK-MNB models trained with both Texas and Louisiana data were better predictive models for the Texas future year test data, with F1 scores of 0.89 and 0.90, respectively, vs. 0.84. The BoK-MNB model trained with both states' data was a better predictive model for the Louisiana future year test data (F1 score 0.94 vs. 0.91).
    Conclusions: The findings of this study suggest that machine learning methods can potentially reduce the human effort required to develop keyword lists and manually review narratives. [Description provided by NIOSH]
  • ISSN:
    1538-9588
  • Pages in Document:
    74-78
  • Volume:
    22
  • Issue:
    1
  • NIOSHTIC Number:
    nn:20068187
  • Citation:
    Traffic Inj Prev 2021 Jan; 22(1):74-78
  • Contact Point Address:
    Amber Brooke Trueblood, Center for Transportation Safety, Texas A&M Transportation Institute, 3135 TAMU, College Station, TX, 77843-3135
  • Email:
    a-trueblood@tti.tamu.edu
  • Federal Fiscal Year:
    2021
  • Performing Organization:
    University of Texas Health Center at Tyler
  • Peer Reviewed:
    True
  • Start Date:
    20010930
  • Source Full Name:
    Traffic Injury Prevention
  • End Date:
    20270929
  • Main Document Checksum:
    urn:sha-512:9fe43af13cca492759d0fd9ff22140d26dced3f29dd7c2bb2834ea22e64feab8d281c2dba649eb5d3cf86d93412a2d5fbf0f49e7b27c1967b5acb849ed0959d4
  • File Type:
    PDF - 1.14 MB
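
The Description above names a bag-of-keywords (BoK) representation paired with a multinomial naive Bayes (MNB) classifier and reports F1 scores. The following is a minimal sketch of that general combination, written with only the Python standard library. It is not the authors' implementation: the keyword list, the toy narratives, and every function name are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical curated keyword list (illustrative; not the study's list).
KEYWORDS = ["tractor", "combine", "hay", "farm", "trailer"]

def bag_of_keywords(narrative, keywords=KEYWORDS):
    """Represent a narrative as counts of each curated keyword."""
    tokens = Counter(re.findall(r"[a-z]+", narrative.lower()))
    return [tokens[k] for k in keywords]

def train_mnb(vectors, labels, alpha=1.0):
    """Fit multinomial naive Bayes with additive (Laplace) smoothing."""
    model = {}
    n_features = len(vectors[0])
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(len(rows) / len(labels))
        log_likelihood = [math.log((t + alpha) / denom) for t in totals]
        model[c] = (log_prior, log_likelihood)
    return model

def predict(model, vector):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, log_likelihood = model[c]
        return log_prior + sum(x * w for x, w in zip(vector, log_likelihood))
    return max(model, key=score)

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Invented toy narratives: 1 = agricultural crash, 0 = other crash.
train_texts = [
    "Unit 1, a farm tractor pulling a hay trailer, was struck from behind.",
    "The combine left the farm field and turned onto the highway.",
    "Unit 2, a truck tractor with a semi trailer, jackknifed on the interstate.",
    "Unit 1 rear-ended a boat trailer stopped at the light.",
]
train_labels = [1, 1, 0, 0]
test_texts = [
    "A farm tractor hauling hay was sideswiped.",
    "The semi trailer struck the guardrail.",
    "The combine was struck while crossing the road.",
]
test_labels = [1, 0, 1]

model = train_mnb([bag_of_keywords(t) for t in train_texts], train_labels)
predictions = [predict(model, bag_of_keywords(t)) for t in test_texts]
print("F1 on toy test set:", f1_score(test_labels, predictions))
```

The toy data include a "truck tractor" narrative to show why keyword counts alone can be ambiguous; a narrative containing none of the keywords would produce an all-zero vector and be decided by the class prior alone, which is one reason a real keyword list needs careful curation.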
