<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article"><?properties manuscript?><front><journal-meta><journal-id journal-id-type="nlm-journal-id">1264241</journal-id><journal-id journal-id-type="pubmed-jr-id">27000</journal-id><journal-id journal-id-type="nlm-ta">J Safety Res</journal-id><journal-id journal-id-type="iso-abbrev">J Safety Res</journal-id><journal-title-group><journal-title>Journal of safety research</journal-title></journal-title-group><issn pub-type="ppub">0022-4375</issn><issn pub-type="epub">1879-1247</issn></journal-meta><article-meta><article-id pub-id-type="pmid">27620937</article-id><article-id pub-id-type="pmc">5023031</article-id><article-id pub-id-type="doi">10.1016/j.jsr.2016.07.002</article-id><article-id pub-id-type="manuscript">HHSPA811456</article-id><article-categories><subj-group subj-group-type="heading"><subject>Article</subject></subj-group></article-categories><title-group><article-title>Off-road truck-related accidents in U.S. 
mines</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Dindarloo</surname><given-names>Saeid R.</given-names></name><xref ref-type="aff" rid="A1">a</xref><xref rid="FN1" ref-type="author-notes">*</xref></contrib><contrib contrib-type="author"><name><surname>Pollard</surname><given-names>Jonisha P.</given-names></name><xref ref-type="aff" rid="A2">b</xref></contrib><contrib contrib-type="author"><name><surname>Siami-Irdemoosa</surname><given-names>Elnaz</given-names></name><xref ref-type="aff" rid="A3">c</xref></contrib></contrib-group><aff id="A1"><label>a</label>Department of Mining and Nuclear Engineering, Missouri University of Science and Technology, MO, USA</aff><aff id="A2"><label>b</label>Workplace Health Branch, Pittsburgh Mining Research Division, NIOSH, PA, USA</aff><aff id="A3"><label>c</label>Department of Geoscience and Geological and Petroleum Engineering, Missouri University of Science and Technology, MO, USA</aff><author-notes><corresp id="FN1"><label>*</label>Corresponding author at: 226 McNutt Hall, 1400 N. Bishop Ave., Rolla, MO 65409, USA. <email>srd5zb@mst.edu</email> (S.R. Dindarloo)</corresp></author-notes><pub-date pub-type="nihms-submitted"><day>19</day><month>8</month><year>2016</year></pub-date><pub-date pub-type="epub"><day>25</day><month>7</month><year>2016</year></pub-date><pub-date pub-type="ppub"><month>9</month><year>2016</year></pub-date><pub-date pub-type="pmc-release"><day>01</day><month>9</month><year>2017</year></pub-date><volume>58</volume><fpage>79</fpage><lpage>87</lpage><!--elocation-id from pubmed: 10.1016/j.jsr.2016.07.002--><abstract><sec id="S1"><title>Introduction</title><p id="P1">Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industry. 
A systematic analysis of all off-road truck-related accidents, injuries, and illnesses, which are reported and published by the Mine Safety and Health Administration (MSHA), is expected to provide practical insights for identifying the accident patterns and trends in the available raw database. Therefore, appropriate safety management measures can be administered and implemented based on these accident patterns/trends.</p></sec><sec id="S2"><title>Methods</title><p id="P2">A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated to the severity level.</p></sec><sec id="S3"><title>Results</title><p id="P3">Given the set of specified attributes, the clustering sub-model was able to cluster the accident records into 5 distinct groups. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives.</p></sec><sec id="S4"><title>Conclusions</title><p id="P4">The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. 
Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries.</p></sec><sec id="S5"><title>Practical application</title><p id="P5">Analyzing injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity.</p></sec></abstract><kwd-group><kwd>Off-road mining trucks</kwd><kwd>Fatalities and injuries</kwd><kwd>K-means clustering</kwd><kwd>Genetic programming</kwd><kwd>Classification</kwd></kwd-group></article-meta></front><body><sec id="S6"><title>1. Introduction</title><p id="P6">Analysis of workplace injuries has been heavily utilized as a means to determine high-risk tasks, prioritize workplace redesign, and determine areas of concern for worker safety in many industries including healthcare, construction, retail and services, and mining (<xref rid="R2" ref-type="bibr">Cato, Olson, &#x00026; Studer, 1989</xref>; <xref rid="R6" ref-type="bibr">Drury, Porter, &#x00026; Dempsey, 2012</xref>; <xref rid="R15" ref-type="bibr">Mardis &#x00026; Pratt, 2003</xref>; <xref rid="R18" ref-type="bibr">Moore, Porter, &#x00026; Dempsey, 2009</xref>; <xref rid="R20" ref-type="bibr">Pollard, Heberger, &#x00026; Dempsey, 2014</xref>; <xref rid="R23" ref-type="bibr">Schoenfisch, Lipscomb, Shishlov, &#x00026; Myers, 2010</xref>; <xref rid="R25" ref-type="bibr">Turin, Wiehagen, Jaspal, &#x00026; Mayton, 2001</xref>; <xref rid="R27" ref-type="bibr">Wiehagen, Mayton, Jaspal, &#x00026; Turin, 2001</xref>). While many industries would require injury records from individual companies or insurance providers to perform an analysis, mining is uniquely suited for a more comprehensive injury analysis. An important feature of U.S. mining is the accessibility of injury records. 
The Mine Safety and Health Administration requires all mine operators and contractors to file a Mine Accident, Injury and Illness Report (MSHA Form 7000-1) for all reportable accidents, injuries, and illnesses incurred at U.S. mining facilities. Reportable illnesses include any illness or disease that may have resulted from work. The database of these reports is available in the public domain and is provided by the National Institute for Occupational Safety and Health (<ext-link ext-link-type="uri" xlink:href="http://www.cdc.gov/niosh/mining/data/default.html">http://www.cdc.gov/niosh/mining/data/default.html</ext-link>). Each entry of the database contains 36 unique attributes including: mine id, mining method, accident date, degree of injury, accident classification, mining equipment, employee's experience and activity, and a narrative briefly explaining the accident. Previous mining research has examined the injury and fatality causes associated with maintenance and repair, haulage vehicles, ingress and egress from mobile equipment, operating underground and surface mining mobile equipment, and other mining tasks (<xref rid="R6" ref-type="bibr">Drury et al., 2012</xref>; <xref rid="R18" ref-type="bibr">Moore et al., 2009</xref>; <xref rid="R20" ref-type="bibr">Pollard et al., 2014</xref>; <xref rid="R21" ref-type="bibr">Reardon, Heberger, &#x00026; Dempsey, 2014</xref>; <xref rid="R25" ref-type="bibr">Turin et al., 2001</xref>; <xref rid="R27" ref-type="bibr">Wiehagen et al., 2001</xref>). Traditional injury data analysis uses counts and cross-tabulations as a means to determine trends in injuries. While this typically yields useful information, more sophisticated data mining techniques may allow for improved classification of injuries through identification of injury patterns.</p><p id="P7">Clustering and classification are two widely used methods of data mining for the purpose of pattern recognition. 
Clustering is among the unsupervised methods of pattern recognition, while classification is a supervised learning method. By an unsupervised method, one means that the data analyzer does not have any prior hypothesis or pre-specified models for the data, but wants to understand the general characteristics or the structure of the high-dimensional data. A supervised method means that the investigator wants to confirm the validity of a hypothesis/model or a set of assumptions, given the available data (<xref rid="R11" ref-type="bibr">Jain, 2010</xref>). Clustering and classification are also described as learning from unlabeled and labeled data, respectively. In pattern recognition, data analysis is concerned with predictive modeling: given some training data, the prediction task is to find the behavior of the unseen test data. This task is also referred to as learning. Often, a clear distinction is made between learning problems that are (i) supervised (classification) or (ii) unsupervised (clustering), the first involving only labeled data (training patterns with known category labels), while the latter involves only unlabeled data (<xref rid="R7" ref-type="bibr">Duda, Hart, &#x00026; Stork, 2001</xref>; <xref rid="R11" ref-type="bibr">Jain, 2010</xref>). Clustering and classification are performed using different algorithms but may be used together to improve prediction accuracy.</p><p id="P8">The aim of this research was to gain a better understanding of the factors associated with severe injuries (fatalities and permanent disabilities) in U.S. mining by employing data mining techniques. Clustering and classification were employed for a comprehensive analysis of off-road truck-related accidents and injuries reported to MSHA during a 13-year period (2000&#x02013;2012). Gene expression programming was used for classification, allowing all injury attributes to be considered and tested to determine which were associated with the highest prediction accuracies. 
The most explanatory attributes were selected among the available 36 unique attributes in the MSHA database. Then, K-means clustering was used as a means to identify similarity/dissimilarity between the accident records using the selected attributes for the purposes of pattern recognition in the raw data. It should be noted that the goal of this study was not to establish cause&#x02013;effect relationships between accident attributes and outcomes, but to: (a) use data mining to systematically identify important attributes from MSHA incident reports that are highly associated with the outcomes of accidents (classification), and (b) recognize patterns in the accidents (clustering) given a set of work-related attributes.</p></sec><sec id="S7"><title>2. Materials and methods</title><sec id="S8"><title>2.1. MSHA injury data</title><p id="P9">A dataset comprised of 13 years (2000&#x02013;2012) of Mine Accident, Injury and Illness Reports was selected beginning with 1/1/2000 (<xref rid="R17" ref-type="bibr">MSHA, 2014</xref>). From this dataset, records of severe injuries (fatalities and permanent disabilities) and non-severe injuries associated with off-road trucks were selected. The NIOSH code &#x0201c;minemach-44, all accidents related to off-road mining trucks&#x0201d; was identified to select the records of interest in this study. A total of 5,831 records of injuries (both severe and non-severe) were filtered for further analysis. This dataset included 125 severe records that affected 140 employees. These severe injuries consisted of 88 fatalities and 52 permanent disabilities. These records were analyzed using Minitab (Minitab Inc., State College, Pennsylvania), MATLAB (MathWorks, Inc., Natick, Massachusetts), Rapidminer (RapidMiner, Inc., Cambridge, Massachusetts) and GenExprotools (Gepsoft Limited, Bristol, UK) to determine factors associated with the highest counts of severe injuries.</p></sec><sec id="S9"><title>2.2. 
GEP-clustering modeling</title><p id="P10">The objective of clustering is to discriminate between dissimilar data by dataset partitioning (clustering). As an unsupervised data mining technique, the aim of clustering is to split a heterogeneous dataset into several more homogeneous groups. The optimization task is to maximize the similarity among members of the same cluster and the dissimilarity between members of different clusters. K-means clustering is used to partition a large, highly variable dataset such that like data are grouped together. As an example, one is given a set of <italic>n</italic> data points in <italic>d</italic>-dimensional space (<italic>R<sup>d</sup></italic>) and an integer <italic>k</italic>. The goal is to determine a set of <italic>k</italic> points in <italic>R<sup>d</sup></italic>, called centers, so as to minimize the mean squared distance from each data point to its nearest center (<xref rid="R12" ref-type="bibr">Kanungo et al., 2002</xref>). Let <italic>X</italic> = {<italic>x<sub>i</sub></italic>}, <italic>i</italic> = 1, &#x02026;, <italic>n</italic>, be the set of <italic>d</italic>-dimensional observations to be clustered into a set of <italic>K</italic> clusters, <italic>C</italic> = {<italic>c<sub>k</sub></italic>, <italic>k</italic> = 1, &#x02026;, <italic>K</italic>}. The K-means algorithm finds a partition such that the squared error between the empirical mean of a cluster and the points in the cluster is minimized. Let <italic>&#x003bc;<sub>k</sub></italic> be the mean of the cluster <italic>c<sub>k</sub></italic>. The squared error between <italic>&#x003bc;<sub>k</sub></italic> and the points in cluster <italic>c<sub>k</sub></italic> is defined as shown in <xref rid="FD1" ref-type="disp-formula">Eq. (1)</xref>. The goal of K-means clustering is to minimize the sum of the squared error over all <italic>K</italic> clusters as shown in <xref rid="FD2" ref-type="disp-formula">Eq. (2)</xref>. 
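</p><p>As a rough illustration of the objective just described, the following Python sketch computes the per-cluster squared error of Eq. (1) and the total of Eq. (2) using plain Lloyd-style iterations. It is not the implementation used in this study, which relied on the software packages listed in Section 2.1. The function names <italic>cluster_sse</italic> and <italic>kmeans</italic>, the random initialization, and the iteration cap are assumptions made for this example only.</p><preformat>
```python
import numpy as np


def cluster_sse(points, center):
    """Eq. (1): squared error between one cluster's points and its mean."""
    diffs = points - center
    return float(np.sum(diffs ** 2))


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns (labels, centers, total squared error).

    Hypothetical helper for illustration: X is an (n, d) float array and
    k is the number of clusters (K in the text).
    """
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Eq. (2): sum of the per-cluster squared errors.
    total = sum(cluster_sse(X[labels == j], centers[j]) for j in range(k))
    return labels, centers, total
```
</preformat><p>On a toy input of two well-separated pairs of points, such a routine should place each pair in its own cluster. Applying it to the MSHA records would first require a numerical encoding of the categorical attributes, a step that is not shown here.</p><p>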
(For a detailed review of the theory and background of K-means clustering see: <xref rid="R10" ref-type="bibr">Halkidi et al., 2001</xref>, <xref rid="R9" ref-type="bibr">Fraley and Raftery, 2002</xref>, <xref rid="R11" ref-type="bibr">Jain, 2010</xref>.)</p><disp-formula id="FD1"><label>(1)</label><mml:math id="M1" display="block" overflow="scroll"><mml:mi>J</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003bc;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:math></disp-formula><disp-formula id="FD2"><label>(2)</label><mml:math id="M2" display="block" overflow="scroll"><mml:mi>J</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>c</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003bc;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:math></disp-formula><p id="P11">Genetic programming (GP) creates a functional relationship between inputs (attributes) and outputs to 
predict the occurrence of the output based on the properties of the attributes. Genetic programming can be represented as a hierarchically structured tree comprising functions and terminals. <xref rid="F1" ref-type="fig">Fig. 1</xref> illustrates a simple representation of a GP tree for the function <italic>Coshx</italic>/(<italic>C</italic><sub>1</sub><italic>sinx</italic>). The tree reads from left to right and from bottom to top. Mimicking the Darwinian principle of survival, the fittest solutions (smallest error) are chosen to generate a population of new offspring programs for the next generation (<xref rid="R14" ref-type="bibr">Koza, 1992</xref>). In the next step, some genetic operations, namely mutation and crossover, will generate new offspring from the fittest programs of the previous generation. The operator selects a random node in a tree and replaces it with another node or subtree. The new offspring will be evaluated with the error or fitness function. The process continues until reaching a predefined threshold in terms of the best fit or error. In GP, thousands of solutions (computer programs) are generated and evolved consecutively based on the Darwinian principle of survival. The search for the solution starts with a population of completely randomly generated programs (solutions) from a predefined set of available functions (e.g., arithmetic functions) and terminals (independent variables). All programs are measured against a fitness function (e.g., root mean square error in a regression problem) and the best ones survive and are bred to the next generation.</p><p id="P12">Proposed by <xref rid="R8" ref-type="bibr">Ferreira (2001)</xref>, gene expression programming (GEP) is a development of the conventional GP. As with GP, in GEP the main steps include: the function set, terminal set, fitness function, control parameters, and stop criteria. The fundamental difference between GP and GEP resides in the nature of the individuals. 
In GP the individuals are nonlinear entities of different sizes and shapes (parse trees). In GEP the individuals are also nonlinear entities of different sizes and shapes (expression trees), but these complex entities are encoded as simple strings of fixed length (chromosomes). The chromosomes in this study are the attributes of each record of injuries (60 attributes and 5831 records). Unlike the parse tree representation in conventional GP, GEP uses a fixed length of character strings to represent solutions to the problems, which are afterwards expressed as parse trees of different sizes and shapes. These trees are called GEP expression trees (ETs). One advantage of the GEP technique is that the creation of genetic diversity is extremely simplified as genetic operators work at the chromosomal level. Another GEP strength is its unique, multi-genic nature that allows the evolution of more complex programs composed of several subprograms. The GEP algorithm begins with an initial population of chromosomes, which are randomly generated, linear strings of a fixed length. Then, the linear chromosomes are expressed as ETs and the fitness of each individual is evaluated based on a predefined fitness function. The individuals are then selected, according to fitness, to form a new generation&#x02014;i.e., the higher the fitness value, the more chance an individual has to be selected. The selected individuals are also subjected to reproduction with modification, through genetic operators like crossover, mutation, and rotation. The individuals of this new generation are, in their turn, subjected to the same developmental process: expression of the genomes, selection, and reproduction. The process is repeated for a certain number of generations or until a solution has been found.</p><p id="P13">The GEP algorithm was used for classification of the off-road truck-related injury dataset. 
The objective was to identify the most explanatory (independent) variables (attributes) to be used in the subsequent clustering sub-model. One advantage of GEP over conventional classification algorithms (such as support vector machines and artificial neural networks) is that it can select the best explanatory features to achieve the highest fitness or accuracy. Hence, all available independent variables in the incident reports, both categorical and numerical, were initially used in the GEP model building as explanatory variables. The injury severity was converted into a binary variable with: class 0 for those severe truck-related incidents that resulted in either fatalities or permanent disabilities, and class 1 for all other injuries. Two-thirds of the records (3,888) were used for training the algorithm and the remaining third (1,943) for model testing. The fitness function was set as the sensitivity/specificity to achieve maximum classification accuracy. K-means clustering was used to group similar injuries, thereby maximizing the similarity of records in each cluster. All data (both severe and non-severe) were analyzed together and separated into clusters.</p></sec></sec><sec id="S10"><title>3. Results</title><sec id="S11" sec-type="methods"><title>3.1. MSHA injury data overview</title><p id="P14">A brief analysis of the MSHA injury data was performed to give an understanding of the types of data available in the injury records. A temporal illustration of off-road truck-related fatalities and disabilities (<xref rid="F2" ref-type="fig">Fig. 2</xref>) shows fluctuations in the fatality and disability counts over the years with a clear decreasing trend. The distribution of accident types is shown in <xref rid="F3" ref-type="fig">Fig. 3</xref>. An analysis of the type of mine operation found that most severe injuries occurred at surface mines that use strips, quarries, or open pits as shown in <xref rid="F4" ref-type="fig">Fig. 
4</xref>. Employee job experience was also examined as shown in <xref rid="F5" ref-type="fig">Fig. 5a</xref>. Frequencies of severe versus other injuries among victims with different levels of job experience are illustrated in <xref rid="F5" ref-type="fig">Fig. 5b</xref>. Nearly half of the affected employees had less than 5 years of job experience. Work activity at the time of injury was also examined as shown in <xref rid="F6" ref-type="fig">Fig. 6</xref>.</p><p id="P15">Most of the severe injuries were sustained while the employee was operating the off-road truck. In the 56 severe accidents (57 victims) that occurred while operating a truck, 46 employees sustained fatal injuries and 11 employees were permanently disabled. The four main causes of these accidents (as identified in the report of the MSHA investigation) were: (i) losing control of the truck, (ii) berm/dump failure, (iii) unsafe/careless actions, and (iv) truck/component mechanical failure. In the category of &#x0201c;unsafe/careless actions,&#x0201d; MSHA identified that the underlying root causes were associated either with poor safety training, management, and enforcement (managerial causes) or with failure of the employees to obey safety regulations (personal issues). In nearly all of the 26 accidents (affecting 27 victims) that occurred during the &#x0201c;maintenance/repair&#x0201d; activity, either the employees' failure to regard safety regulations or a component/equipment malfunction/failure caused the injuries. A major cause of fatalities in this category was failure to block off/lockout the truck/bed. This was due either to the employees' unsafe actions or to the mechanical malfunction of the equipment. In the &#x0201c;other&#x0201d; category, 17 severe accidents resulted in 15 fatalities and 7 permanent disabilities. The main causes in this category were: (i) the operator(s) exited the cab while operating the equipment and (ii) unsafe/careless minor repairing attempts.</p></sec><sec id="S12"><title>3.2. 
Results of gene expression programming</title><p id="P16">Gene expression programming utilized all available variables in the incident records to determine those that best predicted injury severity. The important GEP parameters (number of chromosomes, head size, and number of genes) were determined through a grid search algorithm to achieve the maximum classification accuracy. A grid search is an exhaustive search in the pre-specified domains of the parameters of interest. In this case, using the grid search, the best results (highest accuracies) were achieved when the three parameters were set to 34, 11, and 5, respectively. The resulting best model is shown in the GEP expression tree in <xref rid="F7" ref-type="fig">Fig. 7</xref> with the key in <xref rid="T1" ref-type="table">Table 1</xref>. The two classes are &#x0201c;severe&#x0201d; and &#x0201c;other injuries.&#x0201d; The resulting model identified five attributes that best explained the injury severity. These attributes were: the operation subunit (NIOSH code &#x0201c;subunit&#x0201d;), the month of the year in which the accident occurred (NIOSH code &#x0201c;month&#x0201d;), the victim's job experience at the time of the accident (NIOSH code &#x0201c;expjob&#x0201d;), the employee's activity at the time of the accident (NIOSH code &#x0201c;mwactiv&#x0201d;), and the type of operation (NIOSH code &#x0201c;commod,&#x0201d; including: coal, metal, non-metal, stone, sand &#x00026; gravel operators, as well as coal and non-coal contractors). Overall classification accuracies of 64.55% and 65.65% were obtained for the training and testing datasets, respectively. 
The classification accuracies for the severe injury class and the all other injuries class in the testing dataset were 63.04% and 64.69%, respectively (see <xref rid="T2" ref-type="table">Table 2</xref>).</p><p id="P17">In order to examine the accuracy of the GEP classification algorithm, the whole database was modeled with the widely-used traditional method of logistic regression for binary classification (<xref rid="R3" ref-type="bibr">Cox, 1958</xref>). In a binary logistic regression, the dependent variable is a two-class categorical variable. Here, the two classes were severe and non-severe accidents. The objective of the regression was to classify the dataset using the available attributes. Similar to GEP, the logistic regression builds a relationship between the (categorical) dependent variable and (a combination of categorical, ordinal, and numerical) independent variables (attributes). A logistic regression model assigns to each set of attributes a number between 0 and 1. This output can be interpreted as the probability of a set of attributes belonging to each class. For instance, an output value equal to 0.3 means that the accident (given its attributes) is a non-severe one (class 0) with a probability of 70% and a severe accident (class 1) with a probability of 30%. Using the same training, testing, and attribute sets as the GEP, the results of logistic regressions were obtained for the whole dataset and are compared with the GEP results in <xref rid="T2" ref-type="table">Table 2</xref>. GEP had a superior performance compared to logistic regression resulting in a more accurate classification of both severe and non-severe accidents.</p></sec><sec id="S13"><title>3.3. 
Results of K-means clustering</title><p id="P18">Using the GEP sub-model's output, the resulting clusters from the K-means clustering were defined based on their most dominant attribute, which discriminated them from the other clusters while maintaining the highest similarity inside the cluster. Therefore, the clusters were defined based on their attributes as specified in <xref rid="T3" ref-type="table">Table 3a</xref>. The optimal number of clusters was identified using the gap statistic method (<xref rid="R24" ref-type="bibr">Tibshirani, Walther, &#x00026; Hastie, 2001</xref>). Five clusters were created as defined in <xref rid="T8" ref-type="table">Table 4</xref>. The number of incidents in each cluster and the average job experience of employees at the time of the accident are provided in <xref rid="T3" ref-type="table">Tables 3a</xref>, <xref rid="T4" ref-type="table">3b</xref>, <xref rid="T5" ref-type="table">3c</xref>, and <xref rid="T6" ref-type="table">3d</xref> along with the dominant subunit and activity, operation, and month codes for each cluster. The activity, operation type, and subunit codes in <xref rid="T3" ref-type="table">Table 3a</xref> are defined in <xref rid="T4" ref-type="table">Tables 3b</xref>&#x02013;<xref rid="T6" ref-type="table">3d</xref>. The last column in <xref rid="T3" ref-type="table">Table 3a</xref> shows the number (percentage) of severe accidents (Codes 1 and 2) inside each of the five proposed clusters. In summary, <xref rid="T3" ref-type="table">Tables 3a</xref>, <xref rid="T4" ref-type="table">3b</xref>, <xref rid="T5" ref-type="table">3c</xref>, and <xref rid="T6" ref-type="table">3d</xref> show the most discriminating attribute(s) for each cluster. For instance, in cluster 0, the operation code 4 (stone operator) is dominant (48%). This means that almost half of the accidents in this cluster occurred in the &#x0201c;stone operator&#x0201d; classification. 
Over two-thirds (72%) of the victims had less than 5 years of &#x0201c;job experience.&#x0201d; Also, the dominant subunit code was 9 (i.e., mill or preparation plants, see <xref rid="T6" ref-type="table">Table 3d</xref>), which includes over 90% of this cluster's accidents. Similarly, the major activity codes in cluster 0 were 1 and 5 (see <xref rid="T4" ref-type="table">Table 3b</xref>), which implies that the two activities of &#x0201c;getting on/off truck&#x0201d; and &#x0201c;driving off-road truck&#x0201d; were associated with 72% of the accidents during the period 2000&#x02013;2012. In this cluster, a total of 22 fatalities and permanent disabilities were recorded, accounting for 3.4% of the accidents reported in this cluster (including non-severe ones).</p></sec></sec><sec id="S14"><title>4. Discussion and conclusions</title><p id="P19">Truck-related fatalities have been previously examined in the literature. An analysis of fatal truck-related accidents during the period 1995&#x02013;2006 revealed the three most frequent causes of the haul truck-related fatalities as: (i) failure of victims to respect haul truck working area, (ii) failure to provide adequate berms, and (iii) failure of mechanical components (<xref rid="R16" ref-type="bibr">Md-Nor, Kecojevic, Komijenivic, &#x00026; Groves, 2008</xref>). Another study covering the period 1995&#x02013;2002 (<xref rid="R13" ref-type="bibr">Kecojevic &#x00026; Radomsky, 2004</xref>) categorized the major causes of fatalities as follows: (i) failure of mechanical components (22%); (ii) lack of and/or failure to obey warning signals (20%); (iii) failure to maintain adequate berm (13%); (iv) inadequate hazard training (10%); and (v) failure to recognize adverse geological conditions (10%). 
<xref rid="R13" ref-type="bibr">Kecojevic and Radomsky (2004)</xref> associated 70% of all fatalities to the above five categories, which are consistent with the results of studies for periods 1995&#x02013;2006 and the current study. <xref rid="R22" ref-type="bibr">Ruff, Coleman, and Martini (2011)</xref> studied the equipment-related fatalities for the period 2000&#x02013;2007 and found that, for mobile equipment, the most frequent fatalities were related to loss of control or visibility issues during the operation of the equipment. A more recent analysis examined 133 fatality reports for the period 1995&#x02013;2010 using a previously developed coding scheme to determine repeating patterns of accidents (<xref rid="R6" ref-type="bibr">Drury et al., 2012</xref>). In this work, the authors were able to more fully develop the classification patterns previously reported by <xref rid="R26" ref-type="bibr">Wenner and Drury (2000)</xref>. This scheme was broken into driving and non-driving accidents. Under driving, the factors included: loss of control, failure of ground, two-vehicle collisions, mechanical failure (sudden and inadequate performance), and leaving the driving track. Under non-driving, the factors included: unexpected movement (of the vehicle or part of the vehicle or vehicle's load), falls from vehicle, and hit by other vehicle. Comparison of the results of these previous studies with the current study reveals that the root causes of truck-related fatalities have not changed in the past two decades. A new method to determine factors associated with these injuries is needed.</p><p id="P20">Gene expression programming was found to successfully predict severe accidents in mining based on available injury data. 
The accuracies obtained using GEP for both the training and testing data are comparable or superior to those reported in car and traffic accident studies (<xref rid="R1" ref-type="bibr">Abdelwahab &#x00026; Abdel-Aty, 2001</xref>; <xref rid="R4" ref-type="bibr">De O&#x000f1;a, L&#x000f3;pez, Mujalli, &#x00026; Calvo, 2013</xref>; <xref rid="R5" ref-type="bibr">De O&#x000f1;a, Mujalli, &#x00026; Calvo, 2011</xref>; <xref rid="R19" ref-type="bibr">Mujalli &#x00026; De O&#x000f1;a, 2011</xref>). Therefore, the GEP model's inputs (variables) had the highest explanatory characteristics, among 36 attributes, for severity estimation. The breakdown of injuries within the clusters was quite interesting.</p><p id="P21">Minerals processing mills and coal preparation plants essentially formed their own cluster (cluster 0). Therefore, there appears to be a unique element to these plants that created different truck-related injury scenarios from the other subunits. Typically, this subunit includes mills, coal preparation plants, breaker operations, shops, and yards associated with one specific mine. These are non-production locations and likely utilize differing types of equipment and have different geographies and layouts from the pits or underground mining locations. Also, of the 653 incidents in this cluster over the past 13 years, about 31% (203 instances) happened during June&#x02013;August. The main reason(s) behind the high accident rates during these months should be examined on a case-by-case basis for all of the mining sites in this cluster. Therefore, the root cause(s) for the higher accident rates cannot be identified from the aggregate data for this particular cluster of mining sites. Furthermore, per <xref rid="T3" ref-type="table">Table 3a</xref>, over 70% of the victims had job experience in the range of 0&#x02013;5 years. Therefore, inexperienced crews at the mining sites in cluster 0 accounted for a larger share of accidents than in the other four clusters. 
The distribution of these inexperienced workers within this cluster as compared to the other clusters is not known, so determining the relative rates of accidents was not possible. However, the high number of accidents among inexperienced workers can justify implementing further training and safety measures. Additionally, over 70% of all off-road truck-related accidents happened when the truck operators were either getting on/off the trucks or when they were operating the trucks. Previous research has identified the safety issues associated with getting on and off of mobile equipment (<xref rid="R18" ref-type="bibr">Moore et al., 2009</xref>). Thus, future safety improvement plans for this cluster of mining operations should focus on the root causes of ingress and egress injuries as well as those injuries sustained during operation of these trucks.</p><p id="P22">The interesting pattern in cluster 1 is that in 55% of all cases, the victim had been inspecting the truck for maintenance/repair with non-powered hand tools. Moreover, over half of the accidents were related to either the coal or stone operators. Although this cluster did not include any severe injuries, it is clear that the use of hand tools for vehicle inspection creates a hazard at mines. Also, similar to cluster 0, the highest numbers of accidents in this cluster were recorded in the June&#x02013;August period. Unlike cluster 0, in cluster 1 the share of inexperienced employees among all of the accidents is not very high relative to more experienced employees. Also, a majority of the accidents (78%) took place in either strip or open pit mines.</p><p id="P23">With 2,352 accidents, cluster 2 was the largest cluster in this study. Interestingly, all of the accidents in the past 13 years in this cluster happened in the second half of the year. 
Although it is not within the scope of this study to analyze the root causes or contributing factors for this unique pattern, it provides a well-justified guideline for conducting future, more in-depth analyses to determine activities or tasks associated with high numbers of accidents. Also, similar to cluster 1, in the past 13 years most of the accidents in this cluster happened in surface (strip or open pit) mines. In summary, cluster 2 showed that ingress and egress from off-road trucks is still a problem and requires immediate attention.</p><p id="P24">With 187 accidents, cluster 3 was the smallest cluster in this study. Similar to cluster 1, no severe accidents had been reported since 2000. Over 60% of the accidents were associated with a non-specified activity and coded as code 6 (other). Therefore, the activities preceding accidents in this cluster were not among the more frequent accident-causing activities (e.g., getting on/off truck, handling supplies). In terms of victim job experience, this cluster did not differ from clusters 1, 2, and 4 (see <xref rid="T3" ref-type="table">Table 3a</xref>).</p><p id="P25">Similar to cluster 2, one interesting pattern in cluster 4 is that no accident (severe or non-severe) had been recorded in one half of the year. Unlike cluster 2, all of the accidents in cluster 4 since 2000 occurred in the first half of the year. Again, this pattern (along with the pattern in cluster 2) is a reasonable justification for conducting seasonal studies for improving safety in all of the mining sites in this cluster. Thus, the main reasons for zero accidents in the second half of each year and a total of 2,192 in the first half should be identified. Also, nearly 90% of these accidents occurred at surface mines. Cluster 4 was also found to have the highest number of severe accidents. 
More in-depth analysis would need to be conducted to determine the key differences between the accidents in clusters 2 and 4 and to explain what appears to be a seasonal division in accidents that contributes to increased accident severity. More training may be needed to address the injuries in this cluster.</p><p id="P26">This study was limited to data associated with off-road trucks in mining. Thus, injuries that were not associated with off-road trucks were not included in this study. It is unclear whether this hybrid methodology will have sufficient accuracy when applied to other types of injuries. Furthermore, the study analyzed only the accident reports within the period January 2000 to December 2012. Data included in this analysis were limited to those provided in the MSHA incident reports, which have been collected, organized, and released to the public by NIOSH. Increasing the level of detail provided in incident reports will likely result in improved injury clustering and classifications. It is also important to note that correlation does not equal causation and no causal relationships are implied based on the results of this analysis. A more in-depth analysis will be necessary to determine causal factors for injuries sustained in each cluster.</p><p id="P27">In conclusion, clustering, classification, and a hybrid methodology of both&#x02014;using K-means clustering and gene expression programming&#x02014;were shown to be effective when analyzing mining injury data. In particular, the GEP classification sub-model performed better than the traditional method of logistic regression in accident type classification. Furthermore, the patterns identified in the accident database using the clustering method are not achievable with traditional injury data analysis, which uses counts and cross-tabulations. 
Therefore, determining the most dominant attributes for specific types of injuries (classification) and separating (clustering) the data based on these attributes may allow researchers to better understand the nature and causal factors of mining injuries. Clustering could likely improve traditional injury analysis methods. Injuries sustained in minerals processing mills and coal preparation plants, and injuries sustained during the operation of trucks, should be analyzed separately from larger datasets. The use of non-powered hand tools for off-road truck inspections, ingress and egress from off-road trucks, and newer employees operating dump trucks should be investigated further to determine ways to prevent these severe and non-severe accidents. While this analysis was limited to those injuries associated with off-road trucks, it is expected that larger, broader injury data analyses will also benefit from this hybrid clustering-classification methodology.</p></sec><sec id="S15"><title>5. Practical applications</title><p id="P28">Analyzing the injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity. Off-road truck-related injuries continue to plague the mining industry resulting in fatalities, permanent disabilities, and other less severe injuries. Many factors contribute to these injuries, and many of these are likely preventable. This analysis revealed that injuries associated with the use of non-powered hand tools for off-road truck inspections, ingress and egress from off-road trucks, and newer employees operating dump trucks are areas deserving of attention to determine ways to prevent future injuries.</p></sec></body><back><fn-group><fn id="FN2"><p><bold>Disclaimer</bold></p><p>The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the National Institute for Occupational Safety and Health. 
Mention of any company or product does not constitute endorsement by NIOSH.</p></fn></fn-group><ref-list><ref id="R1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Abdelwahab</surname><given-names>HT</given-names></name><name><surname>Abdel-Aty</surname><given-names>MA</given-names></name></person-group><year>2001</year><article-title>Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections</article-title><source>Transportation Research Record</source><volume>1746</volume><fpage>6</fpage><lpage>13</lpage></element-citation></ref><ref id="R2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cato</surname><given-names>C</given-names></name><name><surname>Olson</surname><given-names>DK</given-names></name><name><surname>Studer</surname><given-names>M</given-names></name></person-group><year>1989</year><article-title>Incidence, prevalence and variables associated with low back pain in staff nurses</article-title><source>American Association of Occupational Health Nurses Journal</source><volume>37</volume><fpage>321</fpage><lpage>327</lpage></element-citation></ref><ref id="R3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname><given-names>DR</given-names></name></person-group><year>1958</year><article-title>The regression analysis of binary sequences (with discussion)</article-title><source>Journal of the Royal Statistical Society B</source><volume>20</volume><fpage>215</fpage><lpage>242</lpage></element-citation></ref><ref id="R4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>De 
O&#x000f1;a</surname><given-names>J</given-names></name><name><surname>L&#x000f3;pez</surname><given-names>G</given-names></name><name><surname>Mujalli</surname><given-names>R</given-names></name><name><surname>Calvo</surname><given-names>FJ</given-names></name></person-group><year>2013</year><article-title>Analysis of traffic accidents on rural highways using latent class clustering and Bayesian networks</article-title><source>Accident Analysis and Prevention</source><volume>51</volume><fpage>1</fpage><lpage>10</lpage><pub-id pub-id-type="pmid">23182777</pub-id></element-citation></ref><ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>De O&#x000f1;a</surname><given-names>J</given-names></name><name><surname>Mujalli</surname><given-names>RO</given-names></name><name><surname>Calvo</surname><given-names>FJ</given-names></name></person-group><year>2011</year><article-title>Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks</article-title><source>Accident Analysis and Prevention</source><volume>43</volume><issue>1</issue><fpage>402</fpage><lpage>411</lpage><pub-id pub-id-type="pmid">21094338</pub-id></element-citation></ref><ref id="R6"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Drury</surname><given-names>CG</given-names></name><name><surname>Porter</surname><given-names>WL</given-names></name><name><surname>Dempsey</surname><given-names>PG</given-names></name></person-group><year>2012</year><article-title>Patterns in mining haul-truck accidents</article-title><source>Proceedings of the human factors and ergonomics society 56th annual meeting</source><fpage>2011</fpage><lpage>2015</lpage><publisher-loc>Santa Monica, CA</publisher-loc><publisher-name>Human Factors and Ergonomics Society</publisher-name></element-citation></ref><ref id="R7"><element-citation publication-type="book"><person-group 
person-group-type="author"><name><surname>Duda</surname><given-names>R</given-names></name><name><surname>Hart</surname><given-names>P</given-names></name><name><surname>Stork</surname><given-names>D</given-names></name></person-group><year>2001</year><source>Pattern classification</source><edition>2nd</edition><publisher-loc>New York</publisher-loc><publisher-name>John Wiley and Sons</publisher-name></element-citation></ref><ref id="R8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ferreira</surname><given-names>C</given-names></name></person-group><year>2001</year><article-title>Gene expression programming: A new adaptive algorithm for solving problems</article-title><source>Complex Systems</source><volume>13</volume><issue>2</issue><fpage>87</fpage><lpage>129</lpage></element-citation></ref><ref id="R9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fraley</surname><given-names>C</given-names></name><name><surname>Raftery</surname><given-names>AE</given-names></name></person-group><year>2002</year><article-title>Model-based clustering, discriminant analysis, and density estimation</article-title><source>Journal of the American Statistical Association</source><volume>97</volume><issue>458</issue><fpage>611</fpage><lpage>631</lpage></element-citation></ref><ref id="R10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Halkidi</surname><given-names>M</given-names></name><name><surname>Batistakis</surname><given-names>Y</given-names></name><name><surname>Vazirgiannis</surname><given-names>M</given-names></name></person-group><year>2001</year><article-title>On clustering validation techniques</article-title><source>Journal of Intelligent Information Systems</source><volume>17</volume><issue>2-3</issue><fpage>107</fpage><lpage>145</lpage></element-citation></ref><ref id="R11"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname><given-names>AK</given-names></name></person-group><year>2010</year><article-title>Data clustering: 50 years beyond K-means</article-title><source>Pattern Recognition Letters</source><volume>31</volume><issue>8</issue><fpage>651</fpage><lpage>666</lpage></element-citation></ref><ref id="R12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kanungo</surname><given-names>T</given-names></name><name><surname>Mount</surname><given-names>DM</given-names></name><name><surname>Netanyahu</surname><given-names>NS</given-names></name><name><surname>Piatko</surname><given-names>CD</given-names></name><name><surname>Silverman</surname><given-names>R</given-names></name><name><surname>Wu</surname><given-names>AY</given-names></name></person-group><year>2002</year><article-title>An efficient k-means clustering algorithm: Analysis and implementation</article-title><source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source><volume>24</volume><issue>7</issue><fpage>881</fpage><lpage>892</lpage></element-citation></ref><ref id="R13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kecojevic</surname><given-names>V</given-names></name><name><surname>Radomsky</surname><given-names>M</given-names></name></person-group><year>2004</year><article-title>The causes and control of loader- and truck-related fatalities in surface mining operations</article-title><source>Injury Control and Safety Promotion</source><volume>11</volume><issue>4</issue><fpage>239</fpage><lpage>251</lpage><pub-id pub-id-type="pmid">15903158</pub-id></element-citation></ref><ref id="R14"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Koza</surname><given-names>JR</given-names></name></person-group><year>1992</year><source>Genetic programming, on the 
programming of computers by means of natural selection</source><publisher-loc>Cambridge, MA</publisher-loc><publisher-name>MIT Press</publisher-name></element-citation></ref><ref id="R15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mardis</surname><given-names>AL</given-names></name><name><surname>Pratt</surname><given-names>SG</given-names></name></person-group><year>2003</year><article-title>Nonfatal injuries to young workers in the retail trades and services industries in 1998</article-title><source>Journal of Occupational and Environmental Medicine</source><volume>45</volume><issue>3</issue><fpage>316</fpage><lpage>323</lpage><pub-id pub-id-type="pmid">12661189</pub-id></element-citation></ref><ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Md-Nor</surname><given-names>ZA</given-names></name><name><surname>Kecojevic</surname><given-names>V</given-names></name><name><surname>Komijenivic</surname><given-names>D</given-names></name><name><surname>Groves</surname><given-names>W</given-names></name></person-group><year>2008</year><article-title>Risk assessment for haul truck-related fatalities in mining</article-title><source>Mining Engineering</source><volume>60</volume><issue>3</issue><fpage>43</fpage><lpage>49</lpage></element-citation></ref><ref id="R17"><element-citation publication-type="web"><collab>Mine Safety and Health Administration (MSHA)</collab><year>2014</year><article-title>Summary of selected accidents/injuries/illnesses reported to MSHA under 30 CFR part 50, mine injury and worktime quarterly and self extracting files, 2000&#x02013;2014</article-title><comment>(Retrieved Dec 25, 2014 from) <ext-link ext-link-type="uri" xlink:href="http://www.msha.gov">http://www.msha.gov</ext-link></comment></element-citation></ref><ref id="R18"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Moore</surname><given-names>SM</given-names></name><name><surname>Porter</surname><given-names>WL</given-names></name><name><surname>Dempsey</surname><given-names>PG</given-names></name></person-group><year>2009</year><article-title>Fall from equipment injuries in U.S. mining: Identification of specific research areas for future investigation</article-title><source>Journal of Safety Research</source><volume>40</volume><fpage>455</fpage><lpage>460</lpage><pub-id pub-id-type="pmid">19945559</pub-id></element-citation></ref><ref id="R19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mujalli</surname><given-names>RO</given-names></name><name><surname>De O&#x000f1;a</surname><given-names>J</given-names></name></person-group><year>2011</year><article-title>A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks</article-title><source>Journal of Safety Research</source><volume>42</volume><issue>5</issue><fpage>317</fpage><lpage>326</lpage><pub-id pub-id-type="pmid">22093565</pub-id></element-citation></ref><ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pollard</surname><given-names>JP</given-names></name><name><surname>Heberger</surname><given-names>J</given-names></name><name><surname>Dempsey</surname><given-names>PG</given-names></name></person-group><year>2014</year><article-title>Maintenance and repair injuries in US mining</article-title><source>Journal of Quality in Maintenance Engineering</source><volume>20</volume><issue>1</issue><fpage>20</fpage><lpage>31</lpage></element-citation></ref><ref id="R21"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Reardon</surname><given-names>LM</given-names></name><name><surname>Heberger</surname><given-names>JR</given-names></name><name><surname>Dempsey</surname><given-names>PG</given-names></name></person-group><year>2014</year><article-title>Analysis of fatalities during maintenance and repair operations in the U.S. mining sector</article-title><source>IIE Transactions on Occupational Ergonomics &#x00026; Human Factors</source><volume>2</volume><issue>1</issue><fpage>27</fpage><lpage>38</lpage></element-citation></ref><ref id="R22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ruff</surname><given-names>T</given-names></name><name><surname>Coleman</surname><given-names>P</given-names></name><name><surname>Martini</surname><given-names>L</given-names></name></person-group><year>2011</year><article-title>Machine-related injuries in the US mining industry and priorities for safety research</article-title><source>International Journal of Injury Control and Safety Promotion</source><volume>18</volume><issue>1</issue><fpage>11</fpage><lpage>20</lpage><pub-id pub-id-type="pmid">20496188</pub-id></element-citation></ref><ref id="R23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schoenfisch</surname><given-names>AL</given-names></name><name><surname>Lipscomb</surname><given-names>HJ</given-names></name><name><surname>Shishlov</surname><given-names>K</given-names></name><name><surname>Myers</surname><given-names>DJ</given-names></name></person-group><year>2010</year><article-title>Nonfatal construction industry-related injuries treated in hospital emergency departments in the United States, 1998&#x02013;2005</article-title><source>American Journal of Industrial Medicine</source><volume>53</volume><issue>6</issue><fpage>570</fpage><lpage>580</lpage><pub-id pub-id-type="pmid">20506460</pub-id></element-citation></ref><ref 
id="R24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tibshirani</surname><given-names>R</given-names></name><name><surname>Walther</surname><given-names>G</given-names></name><name><surname>Hastie</surname><given-names>T</given-names></name></person-group><year>2001</year><article-title>Estimating the number of clusters in a data set via the gap statistic</article-title><source>Journal of the Royal Statistical Society, Series B (Statistical Methodology)</source><volume>63</volume><issue>2</issue><fpage>411</fpage><lpage>423</lpage></element-citation></ref><ref id="R25"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Turin</surname><given-names>FC</given-names></name><name><surname>Wiehagen</surname><given-names>WJ</given-names></name><name><surname>Jaspal</surname><given-names>JS</given-names></name><name><surname>Mayton</surname><given-names>AG</given-names></name></person-group><year>2001</year><source>Haulage truck dump site safety: An examination of reported injuries (DHHS (NIOSH) publication no 2001&#x02013;124, information circular 9454)</source><publisher-loc>Pittsburgh, PA</publisher-loc><publisher-name>U.S. 
Department of Health and Human Services, Public Health Services, CDC-NIOSH</publisher-name></element-citation></ref><ref id="R26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wenner</surname><given-names>C</given-names></name><name><surname>Drury</surname><given-names>CG</given-names></name></person-group><year>2000</year><article-title>Analyzing human error in aircraft ground damage incidents</article-title><source>International Journal of Industrial Ergonomics</source><volume>26</volume><issue>2</issue><fpage>177</fpage><lpage>199</lpage></element-citation></ref><ref id="R27"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Wiehagen</surname><given-names>WJ</given-names></name><name><surname>Mayton</surname><given-names>AG</given-names></name><name><surname>Jaspal</surname><given-names>JS</given-names></name><name><surname>Turin</surname><given-names>FC</given-names></name></person-group><year>2001</year><source>An analysis of serious injuries to dozer operators in the U.S. mining industry (DHHS (NIOSH) publication no 2001&#x02013;126, information circular 9455)</source><publisher-loc>Pittsburgh, PA</publisher-loc><publisher-name>U.S. Department of Health and Human Services, Public Health Services, CDC-NIOSH</publisher-name></element-citation></ref></ref-list><bio id="B1"><p><bold>Saeid R. Dindarloo</bold> holds Ph.D., M.Sc. and B.Sc. degrees, all in Mining Engineering, from Missouri University of Science and Technology, USA, and Amirkabir University of Technology (Tehran Polytechnic), Iran. Dr. Dindarloo has 10 years of research and professional experience. He has extensive experience in mine design, planning, and computer applications. His current research interests include mine safety and health, mining machinery, and mining equipment management.</p></bio><bio id="B2"><p><bold>Jonisha P. 
Pollard</bold>, M.S., CPE, is a Research Engineer at the National Institute for Occupational Safety and Health in the Office of Mine Safety and Health Research. She joined NIOSH while pursuing her Master's degree in Bioengineering from the University of Pittsburgh in 2007. During her time at NIOSH, she has been involved in numerous research studies and has published research on knee injuries in mining, the effect of cap lamp lighting on balance, the effect of kneepads on balance, injuries due to maintenance and repair work in mining, locomotion in restricted workspaces, and a variety of other topics. Ms. Pollard enjoys conducting field and laboratory research to positively impact the health and safety of mine workers.</p></bio><bio id="B3"><p><bold>Elnaz Siami Irdemoosa</bold> obtained her B.Sc. in Mining Engineering and her M.Sc. in Rock Mechanics from Amirkabir University of Technology (Tehran Polytechnic). She is currently a Ph.D. candidate at Missouri University of Science and Technology in the field of Geological Engineering. Her research interests include underground design and construction, tunneling, construction management, and geophysical methods.</p></bio></back><floats-group><fig id="F1" orientation="portrait" position="float"><label>Fig. 1</label><caption><p>GP tree representation of <italic>Coshx</italic>/(<italic>C</italic><sub>1</sub><italic>sinx</italic>).</p></caption><graphic xlink:href="nihms811456f1"/></fig><fig id="F2" orientation="portrait" position="float"><label>Fig. 2</label><caption><p>Time series of off-road truck-related severe injuries at US mines (2000&#x02013;2012).</p></caption><graphic xlink:href="nihms811456f2"/></fig><fig id="F3" orientation="portrait" position="float"><label>Fig. 3</label><caption><p>Distribution of off-road truck-related accident types at US mines (2000&#x02013;2012). 
Only two non-occupational fatalities occurred.</p></caption><graphic xlink:href="nihms811456f3"/></fig><fig id="F4" orientation="portrait" position="float"><label>Fig. 4</label><caption><p>Distribution of mine operation type for severe injuries related to off-road trucks at US mines (2000&#x02013;2012).</p></caption><graphic xlink:href="nihms811456f4"/></fig><fig id="F5" orientation="portrait" position="float"><label>Fig. 5</label><caption><p>a Distribution of job experience (years) for severe injuries related to off-road trucks at US mines (2000&#x02013;2012). b Distribution of job experience (years) for severe injuries vs. other injuries, related to off-road trucks at US mines (2000&#x02013;2012).</p></caption><graphic xlink:href="nihms811456f5"/></fig><fig id="F6" orientation="portrait" position="float"><label>Fig. 6</label><caption><p>Distribution of worker activity at the time of off-road truck-related severe injury in US mines (2000&#x02013;2012).</p></caption><graphic xlink:href="nihms811456f6"/></fig><fig id="F7" orientation="portrait" position="float"><label>Fig. 
7</label><caption><p>The three GEP expression trees (Sub-ET 1, Sub-ET 2, and Sub-ET 3) for classification of the injury severity.</p></caption><graphic xlink:href="nihms811456f7"/></fig><table-wrap id="T1" position="float" orientation="portrait"><label>Table 1</label><caption><p>GEP tree key.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="center" rowspan="1" colspan="1"/><th valign="middle" align="left" rowspan="1" colspan="1">Symbol</th><th valign="middle" align="left" rowspan="1" colspan="1">Definition</th></tr></thead><tbody><tr><td valign="middle" rowspan="10" align="center" colspan="1">Functions</td><td valign="middle" align="left" rowspan="1" colspan="1">LT2A</td><td valign="middle" align="left" rowspan="1" colspan="1">if x&#x0003c;y, then x, else y</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">3Rt</td><td valign="middle" align="left" rowspan="1" colspan="1">x<sup>1/3</sup></td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">LT2G</td><td valign="middle" align="left" rowspan="1" colspan="1">if x&#x0003c;y, then (x+y), else arctan(x*y)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">GOE2E</td><td valign="middle" align="left" rowspan="1" colspan="1">if x&#x0003e;=y, then (x+y), else (x*y)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">OR1</td><td valign="middle" align="left" rowspan="1" colspan="1">if x&#x0003c;0 OR y&#x0003c;0, then 1, else 0</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">OR2</td><td valign="middle" align="left" rowspan="1" colspan="1">if x&#x0003e;0 OR y&#x0003e;0, then 1, else 0</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">Avg2</td><td valign="middle" align="left" rowspan="1" colspan="1">avg(x,y)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">Exp</td><td valign="middle" align="left" rowspan="1" colspan="1">Exponential</td></tr><tr><td 
valign="middle" align="left" rowspan="1" colspan="1">Atan</td><td valign="middle" align="left" rowspan="1" colspan="1">Arctan</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">X2</td><td valign="middle" align="left" rowspan="1" colspan="1">X<sup>2</sup></td></tr><tr><td valign="bottom" colspan="3" rowspan="1">
<hr/></td></tr><tr><td valign="middle" rowspan="5" align="center" colspan="1">Explanatory Variables</td><td valign="middle" align="left" rowspan="1" colspan="1">d0</td><td valign="middle" align="left" rowspan="1" colspan="1">Subunit</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">d1</td><td valign="middle" align="left" rowspan="1" colspan="1">Month of the year</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">d2</td><td valign="middle" align="left" rowspan="1" colspan="1">Activity</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">d3</td><td valign="middle" align="left" rowspan="1" colspan="1">Job Experience</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">d4</td><td valign="middle" align="left" rowspan="1" colspan="1">Operation</td></tr></tbody></table></table-wrap><table-wrap id="T2" position="float" orientation="portrait"><label>Table 2</label><caption><p>GEP classification accuracies.</p></caption><table frame="hsides" rules="groups"><thead><tr><th rowspan="3" valign="top" align="left" colspan="1"/><th rowspan="3" valign="top" align="left" colspan="1">Injury classification</th><th colspan="2" valign="middle" align="left" rowspan="1">Accuracy (%)</th></tr><tr><th valign="bottom" colspan="2" rowspan="1">
<hr/></th></tr><tr><th valign="middle" align="left" rowspan="1" colspan="1">GEP</th><th valign="middle" align="left" rowspan="1" colspan="1">Logistic regression</th></tr></thead><tbody><tr><td rowspan="2" valign="top" align="left" colspan="1">Training</td><td valign="middle" align="left" rowspan="1" colspan="1">Severe</td><td valign="middle" align="left" rowspan="1" colspan="1">74.68</td><td valign="middle" align="left" rowspan="1" colspan="1">61.24</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">Other</td><td valign="middle" align="left" rowspan="1" colspan="1">64.33</td><td valign="middle" align="left" rowspan="1" colspan="1">53.62</td></tr><tr><td rowspan="2" valign="top" align="left" colspan="1">Testing</td><td valign="middle" align="left" rowspan="1" colspan="1">Severe</td><td valign="middle" align="left" rowspan="1" colspan="1">63.04</td><td valign="middle" align="left" rowspan="1" colspan="1">49.59</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">Other</td><td valign="middle" align="left" rowspan="1" colspan="1">64.69</td><td valign="middle" align="left" rowspan="1" colspan="1">51.13</td></tr></tbody></table></table-wrap><table-wrap id="T3" orientation="landscape" position="float"><label>Table 3a</label><caption><p>Cluster specifications.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="left" rowspan="1" colspan="1">Cluster</th><th valign="middle" align="left" rowspan="1" colspan="1">Number of incidents</th><th valign="middle" align="right" rowspan="1" colspan="1">Month of year</th><th valign="middle" align="right" rowspan="1" colspan="1">Operation Code</th><th valign="middle" align="left" rowspan="1" colspan="1">Average experience (years)</th><th valign="middle" align="left" rowspan="1" colspan="1">Subunit code</th><th valign="middle" align="right" rowspan="1" colspan="1">Activity code</th><th valign="middle" align="right" rowspan="1" colspan="1">Number of Severe 
Injuries</th></tr></thead><tbody><tr><td valign="middle" align="left" rowspan="1" colspan="1">0</td><td valign="middle" align="left" rowspan="1" colspan="1">653</td><td valign="middle" align="right" rowspan="1" colspan="1">6, 7, 8 (31%)</td><td valign="middle" align="right" rowspan="1" colspan="1">4 (48%)</td><td valign="middle" align="left" rowspan="1" colspan="1">0&#x02013;5 (72%)</td><td valign="middle" align="left" rowspan="1" colspan="1">9 (91%)</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 5 (72%)</td><td valign="middle" align="right" rowspan="1" colspan="1">22 (3.4%)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">1</td><td valign="middle" align="left" rowspan="1" colspan="1">371</td><td valign="middle" align="right" rowspan="1" colspan="1">6, 7, 8 (34%)</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 4 (54%)</td><td valign="middle" align="left" rowspan="1" colspan="1">0&#x02013;5 (51%)</td><td valign="middle" align="left" rowspan="1" colspan="1">3 (78%)</td><td valign="middle" align="right" rowspan="1" colspan="1">4 (55%)</td><td valign="middle" align="right" rowspan="1" colspan="1">0 (0%)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">2</td><td valign="middle" align="left" rowspan="1" colspan="1">2,352</td><td valign="middle" align="right" rowspan="1" colspan="1">1&#x02013;6 (0%)</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 4 (55%)</td><td valign="middle" align="left" rowspan="1" colspan="1">0&#x02013;5 (51%)</td><td valign="middle" align="left" rowspan="1" colspan="1">3 (78%)</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 5 (79%)</td><td valign="middle" align="right" rowspan="1" colspan="1">58 (2.5%)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">3</td><td valign="middle" align="left" rowspan="1" colspan="1">187</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 4, 10 (33%)</td><td valign="middle" 
align="right" rowspan="1" colspan="1">1, 4 (59%)</td><td valign="middle" align="left" rowspan="1" colspan="1">0&#x02013;5 (49%)</td><td valign="middle" align="left" rowspan="1" colspan="1">3 (68%)</td><td valign="middle" align="right" rowspan="1" colspan="1">6 (62%)</td><td valign="middle" align="right" rowspan="1" colspan="1">0 (0%)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">4</td><td valign="middle" align="left" rowspan="1" colspan="1">2,191</td><td valign="middle" align="right" rowspan="1" colspan="1">7&#x02013;12 (0%)</td><td valign="middle" align="right" rowspan="1" colspan="1">1, 4 (45%)</td><td valign="middle" align="left" rowspan="1" colspan="1">0&#x02013;5 (57%)</td><td valign="middle" align="left" rowspan="1" colspan="1">3 (89%)</td><td valign="middle" align="right" rowspan="1" colspan="1">5 (52%)</td><td valign="middle" align="right" rowspan="1" colspan="1">45 (2.1%)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">Total</td><td valign="middle" align="left" rowspan="1" colspan="1">5,754</td><td valign="middle" align="right" rowspan="1" colspan="1"/><td valign="middle" align="right" rowspan="1" colspan="1"/><td valign="middle" align="left" rowspan="1" colspan="1"/><td valign="middle" align="left" rowspan="1" colspan="1"/><td valign="middle" align="right" rowspan="1" colspan="1"/><td valign="middle" align="right" rowspan="1" colspan="1">125 (2.17%)</td></tr></tbody></table></table-wrap><table-wrap id="T4" position="float" orientation="portrait"><label>Table 3b</label><caption><p>Activity code definitions.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="left" rowspan="1" colspan="1">Code</th><th valign="middle" align="left" rowspan="1" colspan="1">Description</th></tr></thead><tbody><tr><td valign="middle" align="left" rowspan="1" colspan="1">1</td><td valign="middle" align="left" rowspan="1" colspan="1">Getting on/off truck</td></tr><tr><td valign="middle" align="left" 
rowspan="1" colspan="1">2</td><td valign="middle" align="left" rowspan="1" colspan="1">Handling supplies/materials</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">3</td><td valign="middle" align="left" rowspan="1" colspan="1">Idle (Lunch, break, etc.)</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">4</td><td valign="middle" align="left" rowspan="1" colspan="1">Machine maintenance/repair</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">5</td><td valign="middle" align="left" rowspan="1" colspan="1">Operating truck</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">6</td><td valign="middle" align="left" rowspan="1" colspan="1">Other</td></tr></tbody></table></table-wrap><table-wrap id="T5" position="float" orientation="portrait"><label>Table 3c</label><caption><p>Operation code definitions.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="left" rowspan="1" colspan="1">Code</th><th valign="middle" align="left" rowspan="1" colspan="1">Description</th></tr></thead><tbody><tr><td valign="middle" align="left" rowspan="1" colspan="1">1</td><td valign="middle" align="left" rowspan="1" colspan="1">Coal operator</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">2</td><td valign="middle" align="left" rowspan="1" colspan="1">Metal operator</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">3</td><td valign="middle" align="left" rowspan="1" colspan="1">Non-metal operator</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">4</td><td valign="middle" align="left" rowspan="1" colspan="1">Stone operator</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">5</td><td valign="middle" align="left" rowspan="1" colspan="1">Sand and gravel operator</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">6</td><td valign="middle" align="left" rowspan="1" colspan="1">Coal 
contractor</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">7</td><td valign="middle" align="left" rowspan="1" colspan="1">Non-coal contractor</td></tr></tbody></table></table-wrap><table-wrap id="T6" position="float" orientation="portrait"><label>Table 3d</label><caption><p>Subunit code definitions.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="left" rowspan="1" colspan="1">Code</th><th valign="middle" align="left" rowspan="1" colspan="1">Description</th></tr></thead><tbody><tr><td valign="middle" align="left" rowspan="1" colspan="1">1</td><td valign="middle" align="left" rowspan="1" colspan="1">Underground operations</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">2</td><td valign="middle" align="left" rowspan="1" colspan="1">Surface at underground</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">3</td><td valign="middle" align="left" rowspan="1" colspan="1">Surface: strip or open pit mining</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">4</td><td valign="middle" align="left" rowspan="1" colspan="1">Auger</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">5</td><td valign="middle" align="left" rowspan="1" colspan="1">Culm banks</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">6</td><td valign="middle" align="left" rowspan="1" colspan="1">Dredge</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">7</td><td valign="middle" align="left" rowspan="1" colspan="1">Other surface</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">8</td><td valign="middle" align="left" rowspan="1" colspan="1">Independent shop and yards</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">9</td><td valign="middle" align="left" rowspan="1" colspan="1">Mill or preparation plant</td></tr><tr><td valign="middle" align="left" rowspan="1" colspan="1">10</td><td valign="middle" 
align="left" rowspan="1" colspan="1">Office</td></tr></tbody></table></table-wrap><table-wrap id="T8" position="float" orientation="portrait"><label>Table 4</label><caption><p>Cluster definitions.</p></caption><table frame="hsides" rules="groups"><thead><tr><th valign="middle" align="left" rowspan="1" colspan="1">Cluster</th><th valign="middle" align="left" rowspan="1" colspan="1">Label</th></tr></thead><tbody><tr><td valign="top" align="left" rowspan="1" colspan="1">0</td><td valign="top" align="left" rowspan="1" colspan="1">Accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster is associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations.</td></tr><tr><td valign="top" align="left" rowspan="1" colspan="1">1</td><td valign="top" align="left" rowspan="1" colspan="1">In 55% of the cases in this cluster, the victim had been inspecting the truck for maintenance/repair and using non-powered hand tools. Most of these accidents occurred at coal or stone operations. This cluster was not associated with any severe accidents.</td></tr><tr><td valign="top" align="left" rowspan="1" colspan="1">2</td><td valign="top" align="left" rowspan="1" colspan="1">No accidents occurred during the first half (January through June) of the year. Half of these victims were injured while getting on or off equipment, machines, etc. The average job experience for this cluster is less than 5 years. Also, the percentage of fatalities in this cluster is considerable. The cluster contains the highest absolute number of severe injuries (58 victims).</td></tr><tr><td valign="top" align="left" rowspan="1" colspan="1">3</td><td valign="top" align="left" rowspan="1" colspan="1">In this cluster, 62% of all incidents occurred when the victim was either running or walking. 
No severe injuries were recorded in this cluster.</td></tr><tr><td valign="top" align="left" rowspan="1" colspan="1">4</td><td valign="top" align="left" rowspan="1" colspan="1">The second highest number of severe accidents occurred in this cluster (45 accidents). Over half of the accidents occurred when equipment operators with less than 5 years of experience (57%) were driving dump trucks (activity code 5).</td></tr></tbody></table></table-wrap></floats-group></article>