Development and evaluation of an auto-coding model for coding unstructured text data among workers' compensation claims.
Public Domain
-
2013/05/01
Details
-
Description:Work-related musculoskeletal disorders (MSDs) caused by ergonomic risk factors such as overexertion and repetitive motion, and injuries caused by a slip, trip, or fall (STF), are common among workers and result in pain, disability, and substantial cost to workers and employers (Bureau of Labor Statistics, 2011; Liberty Mutual Research Institute for Safety, 2011). The majority of occupational injuries and illnesses can be categorized as an MSD or an STF (Bureau of Labor Statistics, 2011). Improved surveillance of occupational illnesses and injuries (II) classified as MSDs and STFs has been a high national priority, as determined by the National Occupational Research Agenda (NORA). In fact, surveillance of MSDs and STFs was included as a strategic goal in ninety percent of the agendas of the ten NORA sectors (e.g., manufacturing, construction, wholesale/retail trade [WRT]). Tracking the incidence and prevalence of MSDs and STFs among Ohio workers is one aim of the partnership between the National Institute for Occupational Safety and Health (NIOSH) and the Ohio Bureau of Workers' Compensation (OBWC). The OBWC collects claims data primarily to manage claims and determine future workers' compensation premiums. Prior to 2007, OBWC had no systematic way of tracking events or exposures (i.e., causation) such as ergonomic risk factors and slips, trips, or falls. Causation was recorded only in a free-text field (unstructured data) used to describe the work-related cause of the claim. Tracking the incidence and prevalence of MSDs and STFs among Ohio workers would therefore require coding causation for millions of unstructured fields, and doing this manually was not feasible. Recently, Lehto et al. (Lehto et al., 2009; Wellman et al., 2004) demonstrated that computer learning algorithms using Bayesian methods could auto-code injury narratives into different causation groups efficiently and accurately, without any manual intervention.
The authors demonstrated that the algorithms could code thousands of claims in a matter of minutes or hours with a high degree of accuracy by "learning" from a training set, that is, claims previously coded by experts. Furthermore, these algorithms provided a score for each claim that reflected the algorithm's confidence in the prediction, so claims with low confidence scores could be flagged for manual review. The main goal of this project was to develop and evaluate an auto-coding method that could be used to aid the manual coding of OBWC claim causations as MSD, STF, or other (OTH).
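The workflow described above (learn from expert-coded claims, predict a causation category, flag low-confidence predictions for manual review) can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier with add-one smoothing, which is one common Bayesian approach; the record does not specify the exact model used. The training narratives, the function names (`train`, `predict`, `auto_code`), and the 0.8 review threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented example narratives, NOT real OBWC claims (assumption for illustration).
TRAINING_SET = [
    ("strained back lifting heavy box", "MSD"),
    ("shoulder pain from repetitive lifting on line", "MSD"),
    ("wrist pain from repetitive motion typing", "MSD"),
    ("slipped on wet floor and fell", "STF"),
    ("tripped over cord and fell on knee", "STF"),
    ("fell from ladder after slipping on rung", "STF"),
    ("cut finger on sheet metal", "OTH"),
    ("burned hand on hot press", "OTH"),
    ("struck by forklift in warehouse", "OTH"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count claims per category and word occurrences per category."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return (best_label, posterior_probability) for one narrative."""
    class_counts, word_counts, vocab = model
    total_claims = sum(class_counts.values())
    log_scores = {}
    for label, n_claims in class_counts.items():
        score = math.log(n_claims / total_claims)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Normalize log scores into posterior probabilities (stable softmax).
    top = max(log_scores.values())
    weights = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
    norm = sum(weights.values())
    probs = {lbl: w / norm for lbl, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def auto_code(model, text, threshold=0.8):
    """Predict a category and flag the claim for manual review if the
    posterior probability falls below the (hypothetical) threshold."""
    label, confidence = predict(model, text)
    return label, confidence, confidence < threshold


if __name__ == "__main__":
    model = train(TRAINING_SET)
    for narrative in ["slipped on ice and fell", "back strain lifting", "unknown event"]:
        label, confidence, needs_review = auto_code(model, narrative)
        print(f"{narrative!r}: {label} (confidence {confidence:.2f}, review={needs_review})")
```

In a sketch like this, the review threshold trades manual workload against coding accuracy: raising it sends more claims to human coders, while lowering it accepts more automatic codes.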
-
Pages in Document:153-156
-
NIOSHTIC Number:nn:20042653
-
Citation:Use of workers' compensation data for occupational safety and health: proceedings from June 2012 workshop. Utterback DF, Schnorr TM, eds. Cincinnati, OH: U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, DHHS (NIOSH) Publication No. 2013-147, 2013 May:153-156
-
Federal Fiscal Year:2013
-
Peer Reviewed:False
-
Source Full Name:Use of workers' compensation data for occupational safety and health: proceedings from June 2012 workshop
-
Main Document Checksum:urn:sha-512:b0aafae5e7569ffac359c4ab8dfd730a8f43121aae0d1bedaf265add02a4e1a738451584f95836141dad615ce22b9401dccf32261589508ad0a2c92151d1f107
CDC STACKS serves as an archival repository of CDC-published products including scientific findings, journal articles, guidelines, recommendations, or other public health information authored or co-authored by CDC or funded partners.
As a repository, CDC STACKS retains documents in their original published format to ensure public access to scientific information.