<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.3 20210610//EN" "JATS-archivearticle1-3-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.3" xml:lang="en" article-type="research-article"><?properties manuscript?><processing-meta base-tagset="archiving" mathml-version="3.0" table-model="xhtml" tagset-family="jats"><restricted-by>pmc</restricted-by></processing-meta><front><journal-meta><journal-id journal-id-type="nlm-journal-id">9507598</journal-id><journal-id journal-id-type="pubmed-jr-id">21558</journal-id><journal-id journal-id-type="nlm-ta">Int J Occup Saf Ergon</journal-id><journal-id journal-id-type="iso-abbrev">Int J Occup Saf Ergon</journal-id><journal-title-group><journal-title>International journal of occupational safety and ergonomics : JOSE</journal-title></journal-title-group><issn pub-type="ppub">1080-3548</issn><issn pub-type="epub">2376-9130</issn></journal-meta><article-meta><article-id pub-id-type="pmid">38576355</article-id><article-id pub-id-type="pmc">11089329</article-id><article-id pub-id-type="doi">10.1080/10803548.2024.2325301</article-id><article-id pub-id-type="manuscript">HHSPA1984496</article-id><article-categories><subj-group subj-group-type="heading"><subject>Article</subject></subj-group></article-categories><title-group><article-title>Establishment-level safety analytics: a scoping review</article-title></title-group><contrib-group><contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0001-9043-4539</contrib-id><name><surname>Foreman</surname><given-names>Anne M.</given-names></name><xref rid="A1" ref-type="aff">a</xref></contrib><contrib contrib-type="author"><name><surname>Friedel</surname><given-names>Jonathan E.</given-names></name><xref rid="A2" ref-type="aff">b</xref></contrib><contrib contrib-type="author"><name><surname>Ezerins</surname><given-names>Maira E.</given-names></name><xref rid="A3" ref-type="aff">c</xref></contrib><contrib 
contrib-type="author"><name><surname>Matthews</surname><given-names>Riggs</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>Nicholson</surname><given-names>Royale E.</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>Wellersdick</surname><given-names>Logan</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>Bergman</surname><given-names>Shawn</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>A&#x000e7;&#x00131;kg&#x000f6;z</surname><given-names>Yalcin</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>Ludwig</surname><given-names>Timothy D.</given-names></name><xref rid="A4" ref-type="aff">d</xref></contrib><contrib contrib-type="author"><name><surname>Wirth</surname><given-names>Oliver</given-names></name><xref rid="A1" ref-type="aff">a</xref></contrib></contrib-group><aff id="A1"><label>a</label>Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV, USA</aff><aff id="A2"><label>b</label>Department of Psychology, Georgia Southern University, Statesboro, GA, USA</aff><aff id="A3"><label>c</label>Department of Management, The Sam M. Walton College of Business, University of Arkansas, Fayetteville, AR, USA</aff><aff id="A4"><label>d</label>Department of Psychology, Appalachian State University, Boone, NC, USA</aff><author-notes><corresp id="CR1"><bold>CONTACT</bold> Anne M. 
Foreman <email>amforeman@cdc.gov</email>; <email>vpc3@cdc.gov</email> Health Effects Laboratory Division, National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505, USA</corresp></author-notes><pub-date pub-type="nihms-submitted"><day>6</day><month>5</month><year>2024</year></pub-date><pub-date pub-type="ppub"><month>6</month><year>2024</year></pub-date><pub-date pub-type="epub"><day>05</day><month>4</month><year>2024</year></pub-date><pub-date pub-type="pmc-release"><day>01</day><month>6</month><year>2024</year></pub-date><volume>30</volume><issue>2</issue><fpage>559</fpage><lpage>570</lpage><abstract id="ABS1"><p id="P1">Data analytics has been widely applied in fields such as medicine and supply chain management, but its application in occupational safety has only recently become more common. The purpose of this scoping review was to summarize studies that employed analytics within establishments to reveal insights about work-related injuries or fatalities. Over 300 articles were reviewed to survey the objectives, scope and methods used in this emerging field. We conclude that the use of analytics to provide actionable insights that address occupational safety concerns is still in its infancy. Our review shows that most articles were focused on method development and validation, including studies that tested novel methods or compared the utility of multiple methods. 
Many of the studies cited various challenges in overcoming barriers caused by inadequate or inefficient technical infrastructures and unsupportive data cultures that threaten the accuracy and quality of insights revealed by the analytics.</p></abstract><kwd-group><kwd>data analytics</kwd><kwd>occupational safety</kwd><kwd>injuries</kwd><kwd>data mining</kwd></kwd-group></article-meta></front><body><sec id="S1"><label>1.</label><title>Introduction</title><p id="P2">There has been growing interest in the application of data analytics to aid research and practice in occupational safety. This growth is demonstrated by the rapid increase in the number of Scopus results for the search &#x02018;safety analytics&#x02019; from 2003 to 2022 (see <xref rid="F1" ref-type="fig">Figure 1</xref>). The popular press coverage on safety analytics has increased in kind. A post on the NIOSH Science Blog titled &#x02018;Can Predictive Analytics Help Reduce Workplace Risk?&#x02019; outlines the potential benefits and barriers to the application of analytics to workplace safety, using the logic that &#x02018;if injuries can be predicted accurately, they can be prevented&#x02019; [<xref rid="R1" ref-type="bibr">1</xref>]. Similarly, a 2021 article in <italic toggle="yes">Industrial Safety &#x00026; Hygiene News</italic> titled &#x02018;Rethinking Predictive Analysis: Learn How to Stop Workplace Incidents Before They Occur&#x02019; [<xref rid="R2" ref-type="bibr">2</xref>] described the advantages of analytic approaches to safety in identifying patterns associated with risk. 
The number of companies applying analytics to their occupational safety data may be unknown but, given the number of consultation companies now providing occupational safety solutions within their data analytic services and considering the substantial increase in the number of published studies related to this topic, we can infer the number is likely increasing.</p><p id="P3">The growing number of articles in the occupational safety analytics literature (reflected in <xref rid="F1" ref-type="fig">Figure 1</xref>) cover a variety of topic areas including sensor technology, hazard evaluation, personal protective equipment (PPE) identification and evaluating injuries across sectors, within industries and within organizations. Sensor technology studies are concerned with implementing sensors for detecting worker-specific variables like fatigue [<xref rid="R3" ref-type="bibr">3</xref>], muscle force [<xref rid="R4" ref-type="bibr">4</xref>] or energy expenditure [<xref rid="R5" ref-type="bibr">5</xref>], then analyzing the sensor data with analytic techniques. Hazard evaluation studies are concerned with using analytics to identify hazards in the work environment. Hazards in this context are aspects of technology, tasks or activities with potential for harm [<xref rid="R6" ref-type="bibr">6</xref>]. Some examples include implementing a video detection algorithm to detect safety equipment [<xref rid="R7" ref-type="bibr">7</xref>] and developing machine learning algorithms to predict hearing loss in workers associated with industrial noise [<xref rid="R8" ref-type="bibr">8</xref>]. PPE identification studies are concerned with using computer vision technology to identify PPE use among workers. Studies have been conducted targeting hard hat [<xref rid="R9" ref-type="bibr">9</xref>] and harness [<xref rid="R10" ref-type="bibr">10</xref>] use, among other types of PPE. 
Finally, analytics studies have evaluated injury data to identify the causes of incidents at multiple levels of analysis, such as across national sectors [<xref rid="R11" ref-type="bibr">11</xref>], within industries [<xref rid="R12" ref-type="bibr">12</xref>] and within organizations [<xref rid="R13" ref-type="bibr">13</xref>].</p><p id="P4">Acknowledging the lack of agreement on the definition of analytics in the literature [<xref rid="R14" ref-type="bibr">14</xref>,<xref rid="R15" ref-type="bibr">15</xref>], we started with a consensus definition: &#x02018;the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data&#x02019; (p. 3) [<xref rid="R16" ref-type="bibr">16</xref>]. For the purposes of this review, we further limited the definition to include only existing data, thereby excluding occupational safety research that used simulated data. Although this consensus definition is rather broad and non-specific, its inclusivity permits us to capture a wide range of analytic approaches in the review. The existing body of studies is highly heterogeneous because analytics in occupational safety reflects a rather recent shift away from traditional approaches, which often consisted of simple frequency counts of injuries across time, toward applied data analyses and more complex analytic approaches such as machine learning.</p><p id="P5">The purpose of the present article is to review studies that use analytics within enterprises or establishments to reveal insights about work-related injuries or fatalities. 
Our review focuses on enterprise-level and establishment-level analytics instead of industry-level analytics, which are usually based on analyses of national databases such as those recorded by the Occupational Safety and Health Administration or the Bureau of Labor Statistics. Injury reporting to industry or regulatory bodies often only includes a small number of variables, such as days away from work or restricted time at work. Reviews have been published on industry-level safety analytics [<xref rid="R17" ref-type="bibr">17</xref>], but to date no reviews have focused on enterprise-level or establishment-level analytics. The present review may serve as a useful resource for researchers, safety professionals and others interested in implementing analytics within organizations by summarizing the methods and findings of research that has been conducted at similar levels.</p></sec><sec id="S2"><label>2.</label><title>Method</title><p id="P6">We performed a scoping review &#x02013; a preliminary assessment of the size and scope of available research literature &#x02013; of selected articles in which occupational safety analytics was conducted at the enterprise or establishment level. A scoping review approach was used instead of a systematic review approach, which seeks to identify and synthesize evidence related to a specific question, because the application of analytics in this area of investigation is a relatively new phenomenon and there is substantial heterogeneity in types of applications of analytics in occupational safety and the methods used. In conducting the scoping review, we followed guidelines recommended by the Preferred Reporting Items for Systematic reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR [<xref rid="R18" ref-type="bibr">18</xref>]).</p><sec id="S3"><label>2.1.</label><title>Eligibility criteria and search details</title><p id="P7">We searched Scopus and Web of Science for relevant articles. 
There were two sets of search terms: one set related to analytics and the other set related to occupational safety. The search terms were developed through consultation with experts and based on the results of preliminary attempts to search various databases. The same terms were used across databases, but the search strategy was tailored to the particular syntax of each database. The search string, shown here in Scopus syntax, was as follows:</p><disp-quote id="Q1"><p id="P8">TITLE-ABS-KEY ((&#x02018;big data&#x02019; OR analytics OR &#x02018;data mining&#x02019; OR &#x02018;machine learning&#x02019; OR clustering OR &#x02018;decision tree&#x02217;&#x02019; OR &#x02018;neural network&#x02019; OR &#x02018;artificial intelligence&#x02019; OR &#x02018;association rule&#x02019;) AND (&#x02018;occupational safety&#x02019; OR &#x02018;occupational injur&#x02217;&#x02019; OR &#x02018;work&#x02217; injur&#x02217;&#x02019; OR &#x02018;occupational health and safety&#x02019; OR &#x02018;industrial safety&#x02019; OR &#x02018;construction safety&#x02019; OR &#x02018;manufacturing safety&#x02019;))</p></disp-quote><p id="P9">The search was limited to articles published after 2006, and only peer-reviewed journal articles and conference publications were considered. Furthermore, articles had to be written in English, involve the use of analytics techniques, have injuries or near misses as an outcome of interest (i.e., the dependent variable) and use data collected at the enterprise or establishment level. The inclusion and exclusion criteria are listed in <xref rid="T1" ref-type="table">Table 1</xref>. Studies that did not meet these criteria were excluded both to keep the number of resulting articles manageable and to focus on studies that evaluated injuries or near misses as the outcome variable. Studies may refer to injuries as &#x02018;incidents&#x02019; or &#x02018;accidents&#x02019; as these terms are often used interchangeably [<xref rid="R19" ref-type="bibr">19</xref>]. 
Excluded studies included those that were not concerned with occupational safety (e.g., pedestrian safety, patient safety); dealt with occupational health issues instead of injuries (e.g., pneumoconiosis, cardiovascular disease); were concerned with assessing risks or hazards; or were literature reviews, commentaries or other types of non-empirical research papers.</p><p id="P10">Additional searches, beyond the initial Scopus and Web of Science search, were conducted to ensure that as many qualifying studies as possible were identified. The <italic toggle="yes">Safety Science</italic> journal had the largest number of qualifying articles identified from the database searches (described earlier). Therefore, we conducted a manual search of the titles and abstracts of every paper published in <italic toggle="yes">Safety Science</italic> from 2005 to September 2023. Additionally, citation searches were conducted for the studies that met the inclusion criteria. This involved screening, against the inclusion criteria, both the references of each included study and the subsequent citations of that study listed by Google Scholar.</p></sec><sec id="S4"><label>2.2.</label><title>Selection and data charting</title><p id="P11">At least two co-authors reviewed each citation, and the first author (A.M.F.) reviewed all citations. If two authors disagreed on an article&#x02019;s inclusion, a third author would break the tie. The searches were conducted in September 2023. A data chart was populated for each article selected for inclusion. The data chart contained the following columns: year, title, authors, affiliation, author keywords, journal, study question(s)/objective(s), description of data source(s), data preprocessing, predictor variables, outcome variables, analytics techniques, analytics-derived insights, challenges described and notes. A thematic synthesis was used to identify and summarize patterns in the included articles and identify knowledge gaps. 
These findings will be described in the following section.</p></sec></sec><sec id="S5"><label>3.</label><title>Results</title><p id="P12">A flow chart summarizing the scoping review process is shown in <xref rid="F2" ref-type="fig">Figure 2</xref>. Searches of Scopus and Web of Science resulted in 2845 citations. After the removal of duplicates, 2617 unique citations remained. After reviewing the abstracts for meeting the exclusion criteria, 2292 studies were excluded and 325 studies required further review. The full text of the remaining 325 studies was further examined to assess their eligibility. Because the present review is focused on analytics conducted at the enterprise or establishment level, 71 studies were excluded because their data sources were collected across sectors or industries (e.g., occupational injuries by country) and 225 studies were excluded because their data were collected within a sector or industry. The remaining 29 studies from the initial search satisfied all the inclusion criteria. From the manual search of <italic toggle="yes">Safety Science</italic>, we identified three additional citations. We also identified another 17 studies via the forward and backward citation search method. The final sample included 49 studies.</p><p id="P13"><xref rid="F3" ref-type="fig">Figure 3</xref> shows summary characteristics of the included studies. <xref rid="F3" ref-type="fig">Figure 3(A)</xref> displays the years of publication for the 49 studies. The first study that met the inclusion criteria was published in 2008. There was a marked increase in the number of studies in 2017, and the greatest number of studies were published in 2019. The countries in which the studies were conducted are shown in <xref rid="F3" ref-type="fig">Figure 3(B)</xref>. Most studies were conducted in India (<italic toggle="yes">n</italic> = 29; 59% of the included studies). Italy and Iran were the second most frequent with three (6%) studies each. 
<xref rid="F3" ref-type="fig">Figure 3(C)</xref> displays the sectors in which the studies were conducted. These included manufacturing (<italic toggle="yes">n</italic> = 36; 73%), construction (<italic toggle="yes">n</italic> = 7; 15%), utilities (<italic toggle="yes">n</italic> = 3; 6%), transportation and warehousing (<italic toggle="yes">n</italic> = 2; 4%), and agriculture, forestry, fishing and hunting (<italic toggle="yes">n</italic> = 1; 2%). Finally, <xref rid="F3" ref-type="fig">Figure 3(D)</xref> shows the types of manufacturing represented in the scoping review studies. Most of the studies conducted in manufacturing were in steel manufacturing (83%; <italic toggle="yes">n</italic> = 30/36).</p><sec id="S6"><label>3.1.</label><title>Study objectives</title><p id="P14">All studies included in the scoping review attempted to predict injuries or near misses, or, alternatively, to identify common patterns related to injuries or near misses. This is not surprising as one of the inclusion criteria for a study was that it be related to analytics of injuries or near misses. Most of the studies were concerned with exploring data from incident reports, safety inspection reports and other safety data, although there were a few exceptions with more specific research objectives. For example, Marques et al. [<xref rid="R20" ref-type="bibr">20</xref>] examined the effects of drug and alcohol testing frequency on the occurrence of injuries. Additionally, Tsang et al. [<xref rid="R21" ref-type="bibr">21</xref>] explored the effects of employee characteristics (e.g., age, body mass index [BMI], average heart rate) and environmental/behavioral variables (e.g., ambient temperature, recovery time) on injuries among cold-storage employees.</p></sec><sec id="S7"><label>3.2.</label><title>Descriptions of data sources</title><p id="P15">Many studies did not disclose the sources of their data. 
This is problematic, as it impedes the ability of others in the field to replicate or extend that research. There were some common themes in the studies that did disclose information on the sources and types of data. Twenty-five studies that included information on data sources were conducted in steel plants, and a few studies were conducted in construction organizations or refineries. The datasets usually spanned 3&#x02013;5 years, with the longest spanning 16 years [<xref rid="R22" ref-type="bibr">22</xref>]. Various people collected the data, including front-line workers [<xref rid="R23" ref-type="bibr">23</xref>,<xref rid="R24" ref-type="bibr">24</xref>] and managers [<xref rid="R25" ref-type="bibr">25</xref>], and data were typically stored in the organization&#x02019;s online safety management system (SMS) [<xref rid="R23" ref-type="bibr">23</xref>,<xref rid="R26" ref-type="bibr">26</xref>&#x02013;<xref rid="R31" ref-type="bibr">31</xref>].</p><p id="P16">The studies used both structured and unstructured data with a mix of data types (e.g., categorical, numerical, textual). Structured data can be thought of as searchable data that are easily processed by a computer and have a clearly defined organizational system [<xref rid="R32" ref-type="bibr">32</xref>]. Unstructured data cannot be easily processed with conventional tools and have no clearly defined organizational system (e.g., the text of written incident reports, photographs) [<xref rid="R32" ref-type="bibr">32</xref>]. Commonly used structured data included the frequency of injuries and incidents. Unstructured data included text-based descriptions of incidents and observations as well as occasional photographic observations.</p><p id="P17">There was a high degree of variability in the types of data collected. Most of the variables included in the studies were reactive or lagging indicators. 
These variables, such as incident reports, injuries and lost workdays, are recorded after a safety incident. <xref rid="F4" ref-type="fig">Figure 4(B)</xref> shows the most frequently included incident details. Only 10 (22%) studies included proactive or leading indicators. Leading indicators occur before a safety incident and include preventive variables such as safety audit reports, frequency of toolbox meetings and safety training history. A few studies included human resource management data such as employee age, education and marital status. <xref rid="F4" ref-type="fig">Figure 4(A)</xref> shows the most frequently included employee characteristics. Although some studies in the construction sector included details related to project cost, stage of completion or complexity [<xref rid="R33" ref-type="bibr">33</xref>,<xref rid="R34" ref-type="bibr">34</xref>], few studies used data from organizational areas outside safety, such as production or maintenance. Tsang et al. [<xref rid="R21" ref-type="bibr">21</xref>] did include production data, but these data were used for analyses separate from those investigating injury.</p><p id="P18">Some of the studies included upwards of 20 variables in their analyses, and although a majority provided definitions for each variable, 14 (29%) of the studies did not provide any definitions. For example, one study examined several predictor variables, including &#x02018;cause of accident&#x02019;, but did not describe what causes were captured by that variable [<xref rid="R35" ref-type="bibr">35</xref>]. Although some variables may be self-explanatory or not require detailed definitions (e.g., year, day of the week, season), other variables necessitate further explanation (e.g., nature of incident, equipment damage score). 
To fully understand and interpret the results of the analyses conducted, the reader needs to know what the variables are measuring so that they may draw informed conclusions from the study results, make comparisons to their own organizational data or compare results across studies.</p></sec><sec id="S8"><label>3.3.</label><title>Data preprocessing</title><p id="P19">The preprocessing stage of analytics involves selecting variables to analyze, handling missing data and restructuring text data. This stage typically takes the most time, and there are different methods for approaching the preprocessing steps. <xref rid="F5" ref-type="fig">Figure 5(A)</xref> shows the approaches to variable selection reported in the studies included in this review. Variable selection involves deciding which variables to include in the analysis. For 27% (<italic toggle="yes">n</italic> = 13) of the studies, there was no description of the process for variable selection [<xref rid="R12" ref-type="bibr">12</xref>,<xref rid="R22" ref-type="bibr">22</xref>,<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R25" ref-type="bibr">25</xref>,<xref rid="R34" ref-type="bibr">34</xref>&#x02013;<xref rid="R42" ref-type="bibr">42</xref>]. For studies that described the process, the most common approaches were statistical in nature, including using Boruta feature selection [<xref rid="R34" ref-type="bibr">34</xref>,<xref rid="R44" ref-type="bibr">44</xref>,<xref rid="R45" ref-type="bibr">45</xref>], <italic toggle="yes">&#x003c7;</italic><sup>2</sup> [<xref rid="R27" ref-type="bibr">27</xref>,<xref rid="R28" ref-type="bibr">28</xref>,<xref rid="R33" ref-type="bibr">33</xref>] and random forest [<xref rid="R46" ref-type="bibr">46</xref>], among others [<xref rid="R22" ref-type="bibr">22</xref>,<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R47" ref-type="bibr">47</xref>&#x02013;<xref rid="R53" ref-type="bibr">53</xref>]. 
Fifteen studies stated inclusion or exclusion criteria, which included using all variables [<xref rid="R29" ref-type="bibr">29</xref>,<xref rid="R31" ref-type="bibr">31</xref>,<xref rid="R54" ref-type="bibr">54</xref>&#x02013;<xref rid="R58" ref-type="bibr">58</xref>], the most common variables [<xref rid="R59" ref-type="bibr">59</xref>] or variables that aligned with specific International Organization for Standardization recommendations [<xref rid="R21" ref-type="bibr">21</xref>], among others [<xref rid="R20" ref-type="bibr">20</xref>,<xref rid="R60" ref-type="bibr">60</xref>&#x02013;<xref rid="R64" ref-type="bibr">64</xref>]. Last, five studies stated that their variables were selected based on domain knowledge or consultations with experts [<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R65" ref-type="bibr">65</xref>&#x02013;<xref rid="R68" ref-type="bibr">68</xref>].</p><p id="P20">Accounting for and handling missing data is an important step in preprocessing and analysis. The approaches to handling missing data are shown in <xref rid="F5" ref-type="fig">Figure 5(B)</xref>. Most of the studies (59%; <italic toggle="yes">n</italic> = 29) did not include a description of how missing data were handled. Eight studies used a statistical approach to handling missing data, which included imputation [<xref rid="R33" ref-type="bibr">33</xref>,<xref rid="R50" ref-type="bibr">50</xref>], random forest [<xref rid="R27" ref-type="bibr">27</xref>,<xref rid="R44" ref-type="bibr">44</xref>&#x02013;<xref rid="R47" ref-type="bibr">47</xref>] and expectation maximization [<xref rid="R65" ref-type="bibr">65</xref>]. 
Seven studies mentioned that missing data were accounted for but did not provide specific details [<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R26" ref-type="bibr">26</xref>,<xref rid="R29" ref-type="bibr">29</xref>,<xref rid="R39" ref-type="bibr">39</xref>,<xref rid="R42" ref-type="bibr">42</xref>,<xref rid="R57" ref-type="bibr">57</xref>,<xref rid="R61" ref-type="bibr">61</xref>]. Four studies stated that data with missing values were omitted from the analysis [<xref rid="R35" ref-type="bibr">35</xref>,<xref rid="R41" ref-type="bibr">41</xref>,<xref rid="R59" ref-type="bibr">59</xref>,<xref rid="R60" ref-type="bibr">60</xref>]. One study consulted with experts to inform the techniques to handle missing data but did not describe the chosen techniques in the manuscript [<xref rid="R30" ref-type="bibr">30</xref>].</p><p id="P21">Free-text data (e.g., incident narratives) can often provide a great deal of information, but they must first be converted into a format that can be analyzed. That is, unstructured, open-ended text (e.g., explanations on an incident report) must be converted to a structured format that can be read by data analysis software. There are numerous methods to convert free text to structured data. The methods to convert free text reported by the studies in this review are shown in <xref rid="F5" ref-type="fig">Figure 5(C)</xref>. The most common approach was latent Dirichlet allocation [<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R27" ref-type="bibr">27</xref>,<xref rid="R44" ref-type="bibr">44</xref>,<xref rid="R47" ref-type="bibr">47</xref>,<xref rid="R53" ref-type="bibr">53</xref>,<xref rid="R56" ref-type="bibr">56</xref>] followed by term frequency-inverse document frequency (TF-IDF) [<xref rid="R13" ref-type="bibr">13</xref>,<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R38" ref-type="bibr">38</xref>,<xref rid="R54" ref-type="bibr">54</xref>,<xref rid="R64" ref-type="bibr">64</xref>]. 
Other studies used structural topic modeling [<xref rid="R45" ref-type="bibr">45</xref>,<xref rid="R57" ref-type="bibr">57</xref>] and expectation maximization [<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R28" ref-type="bibr">28</xref>].</p><p id="P22">Class imbalance in the data was another concern addressed during the data preparation phase in several studies. Class imbalance occurs when one class of the outcome variable occurs disproportionately more often than another, which biases models toward the majority class [<xref rid="R69" ref-type="bibr">69</xref>]. In such datasets, the class of greatest interest is often the minority class, such as workplace injuries, as opposed to the more common near-miss events [<xref rid="R70" ref-type="bibr">70</xref>]. Data-level techniques are the most popular methods for addressing class imbalance and consist of two approaches, undersampling and oversampling. In oversampling, observations from the minority class are replicated to increase the size of that class, and in undersampling, observations from the majority class are deleted to decrease the size of that class. In the reviewed studies, undersampling was used in one study [<xref rid="R50" ref-type="bibr">50</xref>] and oversampling was used in a handful of studies to address class imbalance issues. Oversampling techniques implemented included the synthetic minority oversampling technique (SMOTE) [<xref rid="R22" ref-type="bibr">22</xref>,<xref rid="R33" ref-type="bibr">33</xref>,<xref rid="R34" ref-type="bibr">34</xref>,<xref rid="R44" ref-type="bibr">44</xref>], majority weighted minority oversampling technique (MWMOTE) [<xref rid="R44" ref-type="bibr">44</xref>], borderline SMOTE (BLSMOTE) [<xref rid="R69" ref-type="bibr">69</xref>] and <italic toggle="yes">k</italic>-means SMOTE (KMSMOTE) [<xref rid="R44" ref-type="bibr">44</xref>].</p><p id="P23">The software used to conduct analysis is not strictly part of data preprocessing. 
However, there is an increasing number of widely available software programs for conducting analytics, and differences in how these programs implement the algorithms (e.g., non-convergence rates) can lead to slightly different analysis outcomes. We have no reason to suspect that any such issue affected the outcomes of the studies under review, but it is good practice to report the software used for preprocessing and data analysis. The software used in the reviewed studies is shown in <xref rid="F5" ref-type="fig">Figure 5(D)</xref>. Most of the studies (78%) described which software was used to analyze data. The most common software was R [<xref rid="R25" ref-type="bibr">25</xref>,<xref rid="R26" ref-type="bibr">26</xref>,<xref rid="R31" ref-type="bibr">31</xref>,<xref rid="R33" ref-type="bibr">33</xref>,<xref rid="R38" ref-type="bibr">38</xref>,<xref rid="R43" ref-type="bibr">43</xref>&#x02013;<xref rid="R47" ref-type="bibr">47</xref>,<xref rid="R52" ref-type="bibr">52</xref>,<xref rid="R57" ref-type="bibr">57</xref>,<xref rid="R66" ref-type="bibr">66</xref>], which is an open-source, free software program. The next two most frequently used software programs were SAS [<xref rid="R23" ref-type="bibr">23</xref>,<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R58" ref-type="bibr">58</xref>,<xref rid="R60" ref-type="bibr">60</xref>,<xref rid="R61" ref-type="bibr">61</xref>,<xref rid="R67" ref-type="bibr">67</xref>] and SPSS [<xref rid="R20" ref-type="bibr">20</xref>,<xref rid="R35" ref-type="bibr">35</xref>,<xref rid="R37" ref-type="bibr">37</xref>,<xref rid="R40" ref-type="bibr">40</xref>,<xref rid="R59" ref-type="bibr">59</xref>,<xref rid="R68" ref-type="bibr">68</xref>].</p></sec><sec id="S9"><label>3.4.</label><title>Analytics approaches</title><p id="P24">Approaches to analytics are sometimes divided into classification and regression tasks [<xref rid="R71" ref-type="bibr">71</xref>]. 
In classification methodologies, the output of a statistical model is assigned to a particular class. In the case of occupational safety, an example of the output may be an injury event or a non-injury event. In regression methodologies, the output of a model is a continuous variable. In occupational safety research, an example of a continuous variable may be the frequency or rate of injuries. For example, Ajayi et al. [<xref rid="R33" ref-type="bibr">33</xref>] compared several different machine learning techniques predicting the number of hand injuries in power infrastructure operations workers. For most of the selected studies, the output of the models was assigned to a class (e.g., fatal injury, serious injury, first aid [<xref rid="R56" ref-type="bibr">56</xref>]). Most analytic techniques can be used for both classification and regression.</p><p id="P25"><xref rid="F6" ref-type="fig">Figure 6</xref> shows the techniques that were used in studies focusing on a single analytics approach or method. The most common analytics techniques among these studies were association rule mining [<xref rid="R21" ref-type="bibr">21</xref>,<xref rid="R25" ref-type="bibr">25</xref>,<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R39" ref-type="bibr">39</xref>,<xref rid="R54" ref-type="bibr">54</xref>,<xref rid="R59" ref-type="bibr">59</xref>,<xref rid="R63" ref-type="bibr">63</xref>,<xref rid="R64" ref-type="bibr">64</xref>,<xref rid="R72" ref-type="bibr">72</xref>] and Bayesian networks [<xref rid="R26" ref-type="bibr">26</xref>,<xref rid="R29" ref-type="bibr">29</xref>,<xref rid="R36" ref-type="bibr">36</xref>,<xref rid="R41" ref-type="bibr">41</xref>,<xref rid="R62" ref-type="bibr">62</xref>]. 
Many of these single-approach studies employed techniques that examine the relations among temporal variables (e.g., antecedent events preceding near misses or injuries), including association rule mining, multiple correspondence analysis, vector autoregression, object role modeling, cause-and-effect diagrams and the axiomatic design framework. As an example, association rule mining was originally developed to detect patterns in retail store transactions; it identifies frequently occurring patterns in data by finding conditional associations (e.g., if/then or antecedent/consequent relationships). By examining these conditional associations, events or characteristics that are commonly correlated with injury (the &#x02018;then&#x02019; or &#x02018;consequent&#x02019; portion of the association) can be detected. The resulting rules can vary in the number of items (e.g., a four-item rule could be: young workers [Item 1] with shorter job tenures [Item 2] who work in Department 3 [Item 3] are more likely to have lower limb injuries [Item 4]), although the analysis becomes more complex with additional items. 
Studies in this scoping review obtained two-item [<xref rid="R21" ref-type="bibr">21</xref>,<xref rid="R25" ref-type="bibr">25</xref>,<xref rid="R39" ref-type="bibr">39</xref>], three-item [<xref rid="R20" ref-type="bibr">20</xref>,<xref rid="R24" ref-type="bibr">24</xref>,<xref rid="R29" ref-type="bibr">29</xref>,<xref rid="R38" ref-type="bibr">38</xref>,<xref rid="R53" ref-type="bibr">53</xref>,<xref rid="R58" ref-type="bibr">58</xref>,<xref rid="R62" ref-type="bibr">62</xref>,<xref rid="R63" ref-type="bibr">63</xref>,<xref rid="R71" ref-type="bibr">71</xref>], four-item [<xref rid="R21" ref-type="bibr">21</xref>,<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R39" ref-type="bibr">39</xref>,<xref rid="R40" ref-type="bibr">40</xref>,<xref rid="R54" ref-type="bibr">54</xref>,<xref rid="R72" ref-type="bibr">72</xref>] and five-item [<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R39" ref-type="bibr">39</xref>,<xref rid="R54" ref-type="bibr">54</xref>] rules. In general, the researchers were able to identify more specific scenarios in which injuries were likely to happen through association rule mining. For example, Buddhakulsomsiri et al. [<xref rid="R63" ref-type="bibr">63</xref>] found that incidents that resulted in major injuries and high costs were associated with work performed by outside contractors, less experienced workers and workers between 41 and 50 years of age. 
The studies that used association rule mining were in oil refining [<xref rid="R25" ref-type="bibr">25</xref>], warehousing [<xref rid="R21" ref-type="bibr">21</xref>], construction [<xref rid="R59" ref-type="bibr">59</xref>], steel manufacturing [<xref rid="R39" ref-type="bibr">39</xref>,<xref rid="R54" ref-type="bibr">54</xref>,<xref rid="R64" ref-type="bibr">64</xref>,<xref rid="R67" ref-type="bibr">67</xref>] and car manufacturing [<xref rid="R40" ref-type="bibr">40</xref>].</p><p id="P26">Many studies (41%) compared the performance of several different analytics approaches. The studies and the techniques compared are presented in <xref rid="T2" ref-type="table">Table 2</xref>. The objective of those comparisons was to identify the techniques that were most accurate in predicting injuries (or near misses, depending on the study). The techniques identified as the most accurate in each study are highlighted in <xref rid="T2" ref-type="table">Table 2</xref> and included random forest and classification and regression tree techniques. A summary table of the analytics techniques performed in all 49 included studies can be found in the <xref rid="SD1" ref-type="supplementary-material">Supplemental data</xref>.</p></sec><sec id="S10"><label>3.5.</label><title>Analytics-derived insights</title><p id="P27">Most of the studies provided detailed findings from the analytics conducted. In these studies, many of the relations identified were rather complex and unlikely to be detected with more traditional, simpler analytical approaches. For example, Ajayi et al. [<xref rid="R33" ref-type="bibr">33</xref>] identified complex interactions among project complexity, time of year, worker age and experience, and task type that were more predictive of injury. Sarkar et al. 
[<xref rid="R44" ref-type="bibr">44</xref>] developed 19 specific safety rules that described scenarios under which incidents were more likely to happen in different divisions of a steel manufacturing facility. Lingard et al. [<xref rid="R43" ref-type="bibr">43</xref>] identified complicated interactions among safety indicators over time that suggested a cyclical relationship between the behavior of management and the occurrence of injury. Through a <italic toggle="yes">&#x003c7;</italic><sup>2</sup> automatic interaction detector (CHAID) analysis, Marques et al. [<xref rid="R20" ref-type="bibr">20</xref>] were able to identify an optimal schedule for drug and alcohol testing of employees that reduced injuries but would not place undue burden on either the organization or its employees. Although the findings of some studies were somewhat unsurprising (e.g., attributes like incident types and primary causes were related to injury risk) [<xref rid="R22" ref-type="bibr">22</xref>,<xref rid="R28" ref-type="bibr">28</xref>,<xref rid="R67" ref-type="bibr">67</xref>], many analyses in these studies made use of free-text, narrative data which often go unanalyzed. Finally, six studies were methodologically focused and only reported information related to model fit (e.g., accuracy, robustness) [<xref rid="R28" ref-type="bibr">28</xref>,<xref rid="R36" ref-type="bibr">36</xref>,<xref rid="R38" ref-type="bibr">38</xref>,<xref rid="R41" ref-type="bibr">41</xref>,<xref rid="R45" ref-type="bibr">45</xref>,<xref rid="R66" ref-type="bibr">66</xref>].</p><p id="P28">Although a desirable end goal of any analytics effort is to reveal actionable insights, the findings may be less actionable than desired. In fact, they may reveal more fundamental or underlying problems with a company&#x02019;s existing data reporting systems. For example, through an expectation-maximization-based clustering analysis of free-text injury narratives, Verma et al. 
[<xref rid="R67" ref-type="bibr">67</xref>] identified misunderstandings on the part of workers about operational definitions for incident variables. The analysis revealed that workers were categorizing some events related to falling materials as slip/trip/fall events instead of &#x02018;struck by&#x02019; events. This study demonstrates that the analytics process is not unidirectional: by identifying unclear or incorrect instructions for incident reporting, it can also inform and improve the data collection systems themselves.</p><p id="P29">As analytics in occupational safety is a relatively new area of exploration, most of the studies were designed as methodological demonstrations or were considered pilot studies on which future studies could be based. Only one study in the review described actions taken based on the analytics that were conducted. Tsang et al. [<xref rid="R21" ref-type="bibr">21</xref>] reported that, following implementation of the analytics software that was developed, actions such as allocating additional recovery time from low-temperature environments or wearing at least three layers of clothing insulation decreased the frequency of employee injuries from 12 per week to five.</p></sec><sec id="S11"><label>3.6.</label><title>Limitations and future directions identified by the included studies</title><p id="P30">The included studies identified many limitations. Two categories of limitations were cited most frequently: model choice and data integrity. Although all models had some degree of success in predicting safety incidents, some data did not meet the assumptions of the chosen model, which undermined validity. 
In one example, which concerned text analytics, researchers found that the use of shorthand narratives to describe accidents left too little contextual data to support valid conclusions [<xref rid="R36" ref-type="bibr">36</xref>]. To mitigate this shortcoming, Sarkar et al. [<xref rid="R28" ref-type="bibr">28</xref>] used more advanced methods such as expectation-maximization algorithms in follow-up studies. The authors also stated that more in-depth analyses using similar statistical techniques may yield richer insights. As another example, Guo et al. [<xref rid="R59" ref-type="bibr">59</xref>] suggested using additional multidimensional association rule mining of unsafe behavior to learn more about behavior patterns.</p><p id="P31">Other common limitations identified by the studies in the review included data collection methods, data integrity and data variety/breadth. Recurring problems with collection methods included biased observers, a lack of subject matter experts and the underreporting of injuries; a solution frequently proposed in the identified studies is the use of standardized surveys developed by subject matter experts. Common data integrity issues included confidentiality concerns and inconsistency of record-keeping across different departments. The final limitation within datasets concerned the breadth of data. In one case, while building a rule mining database, the researchers found that their case study was not sufficiently large or broad to allow for generalizable results [<xref rid="R59" ref-type="bibr">59</xref>]. 
Another study found that the sample size limited the authors&#x02019; ability to extract decision rules from the decision tree they had generated [<xref rid="R65" ref-type="bibr">65</xref>].</p></sec></sec><sec id="S12"><label>4.</label><title>Discussion</title><p id="P32">For the current scoping review, we identified 49 studies conducted since 2007 that implemented analytics techniques within an enterprise or establishment to improve workplace safety. The identified studies were conducted predominantly in the steel manufacturing industry within India, although a smaller number of studies were conducted in other manufacturing industries, warehousing and construction. More than half (<italic toggle="yes">n</italic> = 27, 55%) of the included studies were performed by Maiti and colleagues at the Indian Institute of Technology Kharagpur, which accounts for the predominance of studies conducted in an Indian steel manufacturing plant. Their productivity demonstrates the depth of occupational safety insights that can be obtained through data analytics. The review uncovered a wide variety in the types of analytics that have been implemented and the software used to conduct the analyses. The insights revealed with analytics approaches in many studies were likely more complex than those that can be identified by traditional approaches to injury data analysis.</p><sec id="S13"><label>4.1.</label><title>Variety of data sources</title><p id="P33">The reviewed studies primarily analyzed safety-related data. Most of the data consisted of lagging indicators such as the details of incidents and near misses. More recent studies have incorporated leading indicators, including safety audit reports, safety training records and the frequency of toolbox talks. By incorporating these leading indicators into safety analytics, organizations can assess how the effort put toward proactive safety programs is borne out quantitatively in improvements to lagging indicator metrics. 
Additionally, by conducting the somewhat onerous data preprocessing stages for leading indicators, organizations may be able to identify areas in need of improvement or gaps in their safety programs. Extending data sources beyond safety-specific data to other divisions of an organization could have the potential to detect new relations among variables. For example, incorporating production data (e.g., production pressure), human resource management data (e.g., overtime) or weather data (e.g., temperature, humidity, precipitation) could provide a more detailed picture of the potential causes of injury.</p><p id="P34">Providing more information about the variables analyzed is another area that could be improved. More than one-quarter of the studies did not provide definitions for the variables in the analysis, and on a few occasions the names of the variables were shorthand labels used in the data analysis programs, making interpretation by the reader more difficult. Explicitly stating the variable type, whether it was Boolean, numeric, categorical or free text, would also improve the understanding of the data, including if and how the data were converted from one format to another.</p></sec><sec id="S14"><label>4.2.</label><title>Data preprocessing and analytics approaches</title><p id="P35">As with the sources of data, more information about the data preprocessing stages could improve future manuscripts. A large proportion of the studies did not include information related to how variables were selected for analysis or how missing data were dealt with. In some cases, variable selection may be straightforward if the variables are limited to those collected as part of incident reporting. In any case, explicitly stating what variables were included and excluded would help the reader better understand the authors&#x02019; analytics process. 
Regarding missing data, a variety of approaches can be implemented, including omitting files with missing data from the analysis or filling in the missing values through imputation [<xref rid="R73" ref-type="bibr">73</xref>]. More than half of the included studies did not mention missing data, and of those that did, only 11 described the specific procedures used. Given that missing data are inevitable, particularly with data collected by managers and workers, how they are handled is an important data preprocessing detail that should be reported.</p><p id="P36">A variety of analytics techniques were implemented across the included studies, but the reasons for selecting them were often not described. Stating a rationale for the chosen methodology would assist others in deciding what approach to take with their own organizational data. Similarly, providing more detailed information about the software and packages utilized would provide further assistance to the research community. For example, providing a list of the R packages selected for data analysis would help others replicate an approach with their own data.</p></sec><sec id="S15"><label>4.3.</label><title>Findings</title><p id="P37">The analytics conducted in the included studies resulted in findings that often would not be obtained with approaches to injury data analysis that solely involve descriptive statistics. In many cases, more complex relations indicating the specific circumstances under which types of incidents occurred were detected and described. One potential area of improvement is to more clearly describe detected relationships among variables. Jargon that is specific to an organization may be difficult for the reader to interpret, and thus better-defined terms would improve understanding. This includes ensuring that all acronyms are spelled out and adequately defined within tables, figures and text. 
Further, studies often failed to describe their findings in much practical depth. While technique comparisons were common, the models were not compared to the base rate of safety predictions without modeling [<xref rid="R45" ref-type="bibr">45</xref>]. Studies also failed to report the practical application of their models; trained models were excellent at predicting outcomes in another already-gathered dataset, but there was no mention of accident reduction after the models were implemented at a facility [<xref rid="R41" ref-type="bibr">41</xref>,<xref rid="R65" ref-type="bibr">65</xref>].</p><p id="P38">Other studies generated conclusions using an inductive approach, with hypotheses generated post hoc [<xref rid="R33" ref-type="bibr">33</xref>]. While this method allows scientists to be more liberal in their investigations, a lack of a priori hypothesis generation is a principal factor that leads to reproducibility issues and to &#x02018;just so&#x02019; explanations that overgeneralize results [<xref rid="R74" ref-type="bibr">74</xref>]. Once again, without follow-up studies confirming a reduction in outcome variables from before to after model integration, it is difficult to assess whether these inductive conclusions are valid.</p><p id="P39">Studies often attributed blame to safety observers, citing a lack of informative data entry [<xref rid="R30" ref-type="bibr">30</xref>,<xref rid="R67" ref-type="bibr">67</xref>] or data integrity issues related to a diversity of reporting methods and different storage media [<xref rid="R63" ref-type="bibr">63</xref>]. 
Although there may well be data integrity issues associated with the method of entry, a productive safety culture discourages blaming individuals and instead focuses on aligning goals [<xref rid="R75" ref-type="bibr">75</xref>].</p></sec><sec id="S16"><label>4.4.</label><title>Future directions</title><p id="P40">Future studies can overcome the limitations described in the previous section with additional focus on prescriptive rather than simply predictive outcomes. Although models have shown success in predictive power, follow-up studies demonstrating reduction in real-world injuries would better inform practitioners in the field and demonstrate the utility of analytics. Comparing models with more traditional analytics tools, such as multiple regression, could offer practitioners a better perspective on the comparative advantages of more sophisticated models. Additionally, 37 out of 49 studies focused on steel manufacturing or construction. Expansion of these models to new sectors could both improve safety in those industries and possibly offer new insights into current predictive models.</p></sec></sec><sec id="S17"><label>5.</label><title>Conclusions</title><p id="P41">The present scoping review is the first of its kind to review applications of analytics to occupational safety-related concerns at the establishment and enterprise levels. More than 300 articles from databases and journal reviews were reviewed to survey the objectives, scope and methods used in this emerging field. Despite widespread interest and long-term reliance on data analytics in other fields, we conclude that the use of analytics to provide actionable insights that address occupational safety concerns is still in its infancy. Our review shows that most of the articles were focused on method development and validation, including studies that tested novel methods or compared the utility of multiple methods. 
Despite these promising efforts, few studies reported actionable insights derived directly from the analytics. Therefore, the espoused goals and promise of analytics for occupational safety have yet to be fully realized. Nevertheless, we are optimistic that increasing use of and reliance on analytics by safety practitioners and researchers will spur rapid progress in this field, and the work described in the studies included in this review has resulted in a relative treasure trove of references for those interested in applying particular methods to their organizational data. Our review also revealed a final point worth emphasizing, and that is the importance of establishing &#x02018;readiness&#x02019; for analytics. Many of the studies cited various challenges in overcoming barriers caused by inadequate or inefficient technical infrastructures and unsupportive data cultures that threaten the accuracy and quality of insights revealed by the analytics. The old adage &#x02018;garbage in, garbage out&#x02019; characterizes a common threat to many well-intentioned analytics initiatives within companies. Indeed, many establishments or enterprises are simply not ready for analytics because inadequate measurement systems are in place. 
An &#x02018;analytics readiness audit&#x02019; seems to be a good first step before embarking on further analytics inquiries.</p></sec><sec sec-type="supplementary-material" id="SM1"><title>Supplementary Material</title><supplementary-material id="SD1" position="float" content-type="local-data"><label>supplementary material</label><media xlink:href="NIHMS1984496-supplement-supplementary_material.docx" id="d66e1035" position="anchor"/></supplementary-material></sec></body><back><fn-group><fn id="FN1"><p id="P42">Disclaimer</p><p id="P43">The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health (NIOSH), Centers for Disease Control and Prevention.</p></fn><fn fn-type="COI-statement" id="FN2"><p id="P44">Disclosure statement</p><p id="P45">No potential conflict of interest was reported by the authors.</p></fn><fn id="FN3"><p id="P46">Supplemental data</p><p id="P47">Supplemental data for this article can be accessed at <ext-link xlink:href="10.1080/10803548.2024.2325301" ext-link-type="doi">http://dx.doi.org/10.1080/10803548.2024.2325301</ext-link>.</p></fn></fn-group><ref-list><title>References</title><ref id="R1"><label>[1]</label><mixed-citation publication-type="book"><name><surname>Wagner</surname><given-names>GR</given-names></name>. <source>Can predictive analytics help reduce workplace risk? [Internet]</source>. <publisher-name>NIOSH science blog</publisher-name>; <year>2014</year>. Available from: <comment><ext-link xlink:href="https://blogs.cdc.gov/niosh-science-blog/2014/10/02/pa/" ext-link-type="uri">https://blogs.cdc.gov/niosh-science-blog/2014/10/02/pa/</ext-link></comment> [<date-in-citation>cited 2024 Mar 20</date-in-citation>].</mixed-citation></ref><ref id="R2"><label>[2]</label><mixed-citation publication-type="book"><name><surname>Suman</surname><given-names>A</given-names></name>. 
<source>Rethinking predictive analysis: Learn how to stop workplace incidents before they occur [Internet]</source>. <publisher-name>ISHN</publisher-name>; <year>2021</year> [<date-in-citation>cited 2024 Mar 20</date-in-citation>]. Available from: <comment><ext-link xlink:href="https://www.ishn.com/articles/113203-rethinking-predictive-analysis-learn-how-to-stop-workplace-incidents-before-they-occur" ext-link-type="uri">https://www.ishn.com/articles/113203-rethinking-predictive-analysis-learn-how-to-stop-workplace-incidents-before-they-occur</ext-link></comment></mixed-citation></ref><ref id="R3"><label>[3]</label><mixed-citation publication-type="journal"><name><surname>Maman</surname><given-names>ZS</given-names></name>, <name><surname>Chen</surname><given-names>Y-J</given-names></name>, <name><surname>Baghdadi</surname><given-names>A</given-names></name>, <etal/>
<article-title>A data analytic framework for physical fatigue management using wearable sensors</article-title>. <source>Expert Syst Appl</source>. <year>2020</year>;<volume>155</volume>:113405. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2020.113405</pub-id></mixed-citation></ref><ref id="R4"><label>[4]</label><mixed-citation publication-type="journal"><name><surname>Patel</surname><given-names>V</given-names></name>, <name><surname>Chesmore</surname><given-names>A</given-names></name>, <name><surname>Legner</surname><given-names>CM</given-names></name>, <etal/>
<article-title>Trends in workplace wearable technologies and connected-worker solutions for next-generation occupational safety, health, and productivity</article-title>. <source>Adv Intell Syst</source>. <year>2022</year>;<volume>4</volume>(<issue>1</issue>):2100099. doi:<pub-id pub-id-type="doi">10.1002/aisy.202100099</pub-id></mixed-citation></ref><ref id="R5"><label>[5]</label><mixed-citation publication-type="journal"><name><surname>Jebelli</surname><given-names>H</given-names></name>, <name><surname>Choi</surname><given-names>B</given-names></name>, <name><surname>Lee</surname><given-names>S</given-names></name>. <article-title>Application of wearable biosensors to construction sites. II: assessing workers&#x02019; physical demand</article-title>. <source>J Constr Eng Manag</source>. <year>2019</year>;<volume>145</volume>(<issue>12</issue>):<fpage>1</fpage>&#x02013;<lpage>12</lpage>.</mixed-citation></ref><ref id="R6"><label>[6]</label><mixed-citation publication-type="journal"><name><surname>Manuele</surname><given-names>FA</given-names></name>. <article-title>Risk assessment &#x00026; hierarchies of control</article-title>. <source>Prof Saf</source>. <year>2005</year>;<volume>50</volume>(<issue>5</issue>):<fpage>33</fpage>&#x02013;<lpage>39</lpage>.</mixed-citation></ref><ref id="R7"><label>[7]</label><mixed-citation publication-type="journal"><name><surname>Phuc</surname><given-names>LTH</given-names></name>, <name><surname>Jeon</surname><given-names>H</given-names></name>, <name><surname>Truong</surname><given-names>NTN</given-names></name>, <etal/>
<article-title>Applying the Haar-cascade algorithm for detecting safety equipment in safety management systems for multiple working environments</article-title>. <source>Electronics (Basel)</source>. <year>2019</year>;<volume>8</volume>(<issue>10</issue>):<fpage>1079</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics8101079</pub-id></mixed-citation></ref><ref id="R8"><label>[8]</label><mixed-citation publication-type="journal"><name><surname>Zhao</surname><given-names>Y</given-names></name>, <name><surname>Li</surname><given-names>J</given-names></name>, <name><surname>Zhang</surname><given-names>M</given-names></name>, <etal/>
<article-title>Machine learning models for the hearing impairment prediction in workers exposed to complex industrial noise: a pilot study</article-title>. <source>Ear Hear</source>. <year>2019</year>;<volume>40</volume>(<issue>3</issue>):<fpage>690</fpage>. doi:<pub-id pub-id-type="doi">10.1097/AUD.0000000000000649</pub-id></mixed-citation></ref><ref id="R9"><label>[9]</label><mixed-citation publication-type="journal"><name><surname>Shrestha</surname><given-names>K</given-names></name>, <name><surname>Shrestha</surname><given-names>PP</given-names></name>, <name><surname>Bajracharya</surname><given-names>D</given-names></name>, <etal/>
<article-title>Hard-hat detection for construction safety visualization</article-title>. <source>J Constr Eng</source>. <year>2015</year>;<volume>2015</volume>(<issue>1</issue>):<fpage>1</fpage>&#x02013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1155/2015/721380</pub-id></mixed-citation></ref><ref id="R10"><label>[10]</label><mixed-citation publication-type="journal"><name><surname>Fang</surname><given-names>W</given-names></name>, <name><surname>Ding</surname><given-names>L</given-names></name>, <name><surname>Luo</surname><given-names>H</given-names></name>, <etal/>
<article-title>Falls from heights: a computer vision-based approach for safety harness detection</article-title>. <source>Autom Constr</source>. <year>2018</year>;<volume>91</volume>:<fpage>53</fpage>&#x02013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.autcon.2018.02.018</pub-id></mixed-citation></ref><ref id="R11"><label>[11]</label><mixed-citation publication-type="journal"><name><surname>Rivas</surname><given-names>T</given-names></name>, <name><surname>Paz</surname><given-names>M</given-names></name>, <name><surname>Mart&#x000ed;n</surname><given-names>J</given-names></name>, <etal/>
<article-title>Explaining and predicting workplace accidents using data-mining techniques</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2011</year>;<volume>96</volume>(<issue>7</issue>):<fpage>739</fpage>&#x02013;<lpage>747</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2011.03.006</pub-id></mixed-citation></ref><ref id="R12"><label>[12]</label><mixed-citation publication-type="journal"><name><surname>Mohammadian</surname><given-names>F</given-names></name>, <name><surname>Sadeghi</surname><given-names>M</given-names></name>, <name><surname>Hanifi</surname><given-names>SM</given-names></name>, <etal/>
<article-title>Modeling important factors on occupational accident severity factor in the construction industry using a combination of artificial neural network and genetic algorithm</article-title>. <source>Work</source>. <year>2022</year>;<volume>73</volume>(<issue>1</issue>):<fpage>189</fpage>&#x02013;<lpage>202</lpage>. doi:<pub-id pub-id-type="doi">10.3233/WOR-205271</pub-id><pub-id pub-id-type="pmid">35871380</pub-id>
</mixed-citation></ref><ref id="R13"><label>[13]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Ejaz</surname><given-names>N</given-names></name>, <name><surname>Kumar</surname><given-names>M</given-names></name>, <etal/>
<source>Root cause analysis of incidents using text clustering and classification algorithms</source>. In: <name><surname>Singh</surname><given-names>PK</given-names></name>, <name><surname>Panigrahi</surname><given-names>BK</given-names></name>, <name><surname>Suryadevara</surname><given-names>NK</given-names></name>, editors. <conf-name>Proceedings of ICETIT 2019</conf-name>. <conf-loc>Cham, Switzerland</conf-loc>: <publisher-name>Springer</publisher-name>; <year>2019</year>. p. <fpage>707</fpage>&#x02013;<lpage>718</lpage>.</mixed-citation></ref><ref id="R14"><label>[14]</label><mixed-citation publication-type="journal"><name><surname>Longbing</surname><given-names>B</given-names></name>. <article-title>Data science and analytics: a new era</article-title>. <source>Int J Data Sci Analytics</source>. <year>2016</year>;<volume>1</volume>:<fpage>1</fpage>&#x02013;<lpage>2</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s41060-016-0006-1</pub-id></mixed-citation></ref><ref id="R15"><label>[15]</label><mixed-citation publication-type="confproc"><name><surname>Almosallam</surname><given-names>EA</given-names></name>, <name><surname>Ouertani</surname><given-names>HC</given-names></name>. <source>Learning analytics: definitions, applications and related fields</source>. In: <name><surname>Herawan</surname><given-names>T</given-names></name>, <name><surname>Deris</surname><given-names>MM</given-names></name>, <name><surname>Abawajy</surname><given-names>J</given-names></name>, editors. <conf-name>Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013)</conf-name>. <conf-loc>Singapore</conf-loc>: <publisher-name>Springer</publisher-name>; <year>2014</year>. p. <fpage>721</fpage>&#x02013;<lpage>730</lpage>.</mixed-citation></ref><ref id="R16"><label>[16]</label><mixed-citation publication-type="journal"><name><surname>Cooper</surname><given-names>A</given-names></name>. <article-title>What is analytics? 
Definition and essential characteristics</article-title>. <source>CETIS Analytics Ser</source>. <year>2012</year>;<volume>1</volume>(<issue>5</issue>):<fpage>1</fpage>&#x02013;<lpage>10</lpage>.</mixed-citation></ref><ref id="R17"><label>[17]</label><mixed-citation publication-type="journal"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <article-title>Machine learning in occupational accident analysis: a review using science mapping approach with citation network analysis</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>131</volume>:104900. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2020.104900</pub-id></mixed-citation></ref><ref id="R18"><label>[18]</label><mixed-citation publication-type="journal"><name><surname>Tricco</surname><given-names>AC</given-names></name>, <name><surname>Lillie</surname><given-names>E</given-names></name>, <name><surname>Zarin</surname><given-names>W</given-names></name>, <etal/>
<article-title>PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation</article-title>. <source>Ann Intern Med</source>.
<year>2018</year>;<volume>169</volume>(<issue>7</issue>):<fpage>467</fpage>&#x02013;<lpage>473</lpage>. doi:<pub-id pub-id-type="doi">10.7326/M18-0850</pub-id><pub-id pub-id-type="pmid">30178033</pub-id>
</mixed-citation></ref><ref id="R19"><label>[19]</label><mixed-citation publication-type="book"><name><surname>Nemmers</surname><given-names>P</given-names></name>. <source>The differences between incidents vs. accidents in the workplace</source>. <publisher-name>National Association of Safety Professionals</publisher-name>; <year>2023</year> [<date-in-citation>cited March 20, 2024</date-in-citation>]. Available from <comment><ext-link xlink:href="https://naspweb.com/blog/the-differences-between-incidents-and-accidents-in-the-workplace" ext-link-type="uri">https://naspweb.com/blog/the-differences-between-incidents-and-accidents-in-the-workplace</ext-link></comment></mixed-citation></ref><ref id="R20"><label>[20]</label><mixed-citation publication-type="journal"><name><surname>Marques</surname><given-names>PH</given-names></name>, <name><surname>Jesus</surname><given-names>V</given-names></name>, <name><surname>Olea</surname><given-names>SA</given-names></name>, <etal/>
<article-title>The effect of alcohol and drug testing at the workplace on individual&#x02019;s occupational accident risk</article-title>. <source>Saf Sci</source>. <year>2014</year>;<volume>68</volume>:<fpage>108</fpage>&#x02013;<lpage>120</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2014.03.007</pub-id></mixed-citation></ref><ref id="R21"><label>[21]</label><mixed-citation publication-type="journal"><name><surname>Tsang</surname><given-names>Y</given-names></name>, <name><surname>Choy</surname><given-names>K</given-names></name>, <name><surname>Koo</surname><given-names>P</given-names></name>, <etal/>
<article-title>A fuzzy association rule-based knowledge management system for occupational safety and health programs in cold storage facilities</article-title>. <source>VINE J Inf Knowl Manage Syst</source>. <year>2018</year>;<volume>2</volume>:<fpage>199</fpage>&#x02013;<lpage>206</lpage>.</mixed-citation></ref><ref id="R22"><label>[22]</label><mixed-citation publication-type="confproc"><name><surname>Hoenigsberger</surname><given-names>F</given-names></name>, <name><surname>Saranti</surname><given-names>A</given-names></name>, <name><surname>Angerschmid</surname><given-names>A</given-names></name>, <etal/> <source>Machine learning and knowledge extraction to support work safety for smart forest operations</source>. In: <conf-name>International Cross-Domain Conference for Machine Learning and Knowledge Extraction</conf-name>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2022</year>.</mixed-citation></ref><ref id="R23"><label>[23]</label><mixed-citation publication-type="book"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Chatterjee</surname><given-names>S</given-names></name>, <name><surname>Sarkar</surname><given-names>S</given-names></name>, <etal/>
<part-title>Data-driven mapping between proactive and reactive measures of occupational safety performance</part-title>. In: <name><surname>Maiti</surname><given-names>J</given-names></name>, <name><surname>Ray</surname><given-names>PK</given-names></name>, editors. <source>Industrial safety management</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2018</year>. p. <fpage>53</fpage>&#x02013;<lpage>63</lpage>.</mixed-citation></ref><ref id="R24"><label>[24]</label><mixed-citation publication-type="journal"><name><surname>Singh</surname><given-names>K</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>, <name><surname>Dhalmahapatra</surname><given-names>K</given-names></name>. <article-title>Chain of events model for safety management: data analytics approach</article-title>. <source>Saf Sci</source>. <year>2019</year>;<volume>118</volume>:<fpage>568</fpage>&#x02013;<lpage>582</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2019.05.044</pub-id></mixed-citation></ref><ref id="R25"><label>[25]</label><mixed-citation publication-type="journal"><name><surname>Bevilacqua</surname><given-names>M</given-names></name>, <name><surname>Ciarapica</surname><given-names>FE</given-names></name>. <article-title>Human factor risk management in the process industry: a case study</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2018</year>;<volume>169</volume>:<fpage>149</fpage>&#x02013;<lpage>159</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2017.08.013</pub-id></mixed-citation></ref><ref id="R26"><label>[26]</label><mixed-citation publication-type="confproc"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Rajput</surname><given-names>D</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Prioritization of near-miss incidents using text mining and Bayesian network</source>. 
In: <conf-name>Advances in Computing and Data Sciences: First International Conference, ICACDS 2016</conf-name>, <conf-loc>Ghaziabad, India</conf-loc>, <conf-date>November 11&#x02013;12, 2016</conf-date>, Revised Selected Papers 1. Singapore: <publisher-name>Springer</publisher-name>; <year>2017</year>. p. <fpage>183</fpage>&#x02013;<lpage>191</lpage>.</mixed-citation></ref><ref id="R27"><label>[27]</label><mixed-citation publication-type="journal"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Vinay</surname><given-names>S</given-names></name>, <name><surname>Raj</surname><given-names>R</given-names></name>, <etal/>
<article-title>Application of optimized machine learning techniques for prediction of occupational accidents</article-title>. <source>Comput Oper Res</source>. <year>2019</year>;<volume>106</volume>:<fpage>210</fpage>&#x02013;<lpage>224</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cor.2018.02.021</pub-id></mixed-citation></ref><ref id="R28"><label>[28]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Lodhi</surname><given-names>V</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Text-clustering based deep neural network for prediction of occupational accident risk: a case study</source>. In: <conf-name>2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)</conf-name>. <conf-loc>Pattaya, Thailand</conf-loc>; <conf-date>15&#x02013;17 November 2018</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R29"><label>[29]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Kumar</surname><given-names>A</given-names></name>, <name><surname>Mohanpuria</surname><given-names>SK</given-names></name>, <etal/>
<source>Application of Bayesian network model in explaining occupational accidents in a steel industry</source>. In: <year>2017</year>
<conf-name>Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)</conf-name>. <conf-loc>Kolkata, India</conf-loc>; <conf-date>03&#x02013;05 November 2017</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R30"><label>[30]</label><mixed-citation publication-type="journal"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Khan</surname><given-names>SD</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>, <etal/>
<article-title>Identifying patterns of safety related incidents in a steel plant using association rule mining of incident investigation reports</article-title>. <source>Saf Sci</source>. <year>2014</year>;<volume>70</volume>:<fpage>89</fpage>&#x02013;<lpage>98</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2014.05.007</pub-id></mixed-citation></ref><ref id="R31"><label>[31]</label><mixed-citation publication-type="journal"><name><surname>Dhalmahapatra</surname><given-names>K</given-names></name>, <name><surname>Shingade</surname><given-names>R</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <article-title>An innovative integrated modelling of safety data using multiple correspondence analysis and fuzzy discretization techniques</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>130</volume>:104828. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2020.104828</pub-id></mixed-citation></ref><ref id="R32"><label>[32]</label><mixed-citation publication-type="confproc"><name><surname>Rusu</surname><given-names>O</given-names></name>, <name><surname>Halcu</surname><given-names>I</given-names></name>, <name><surname>Grigoriu</surname><given-names>O</given-names></name>, <etal/>
<source>Converting unstructured and semi-structured data into knowledge</source>. In: <conf-name>2013 11th RoEduNet International Conference</conf-name>; <conf-loc>Sinaia, Romania</conf-loc>; <conf-date>17&#x02013;19 January 2013</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>4</lpage>.</mixed-citation></ref><ref id="R33"><label>[33]</label><mixed-citation publication-type="journal"><name><surname>Ajayi</surname><given-names>A</given-names></name>, <name><surname>Oyedele</surname><given-names>L</given-names></name>, <name><surname>Akinade</surname><given-names>O</given-names></name>, <etal/>
<article-title>Optimised big data analytics for health and safety hazards prediction in power infrastructure operations</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>125</volume>:104656. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2020.104656</pub-id></mixed-citation></ref><ref id="R34"><label>[34]</label><mixed-citation publication-type="journal"><name><surname>Poh</surname><given-names>CQ</given-names></name>, <name><surname>Ubeynarayana</surname><given-names>CU</given-names></name>, <name><surname>Goh</surname><given-names>YM</given-names></name>. <article-title>Safety leading indicators for construction sites: a machine learning approach</article-title>. <source>Autom Constr</source>. <year>2018</year>;<volume>93</volume>:<fpage>375</fpage>&#x02013;<lpage>386</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.autcon.2018.03.022</pub-id></mixed-citation></ref><ref id="R35"><label>[35]</label><mixed-citation publication-type="journal"><name><surname>Shirali</surname><given-names>GA</given-names></name>, <name><surname>Noroozi</surname><given-names>MV</given-names></name>, <name><surname>Malehi</surname><given-names>AS</given-names></name>. <article-title>Predicting the outcome of occupational accidents by CART and CHAID methods at a steel factory in Iran</article-title>. <source>J Public Health Res</source>. <year>2018</year>;<volume>7</volume>(<issue>2</issue>):<fpage>74</fpage>&#x02013;<lpage>80</lpage>. doi:<pub-id pub-id-type="doi">10.4081/jphr.2018.1361</pub-id></mixed-citation></ref><ref id="R36"><label>[36]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Vinay</surname><given-names>S</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Text mining based safety risk assessment and prediction of occupational accidents in a steel plant</source>. 
In: <conf-name>2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT)</conf-name>.
<conf-loc>New Delhi, India</conf-loc>; <conf-date>11&#x02013;13 March 2016</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R37"><label>[37]</label><mixed-citation publication-type="journal"><name><surname>Bevilacqua</surname><given-names>M</given-names></name>, <name><surname>Ciarapica</surname><given-names>FE</given-names></name>, <name><surname>Giacchetta</surname><given-names>G</given-names></name>. <article-title>Data mining for occupational injury risk: a case study</article-title>. <source>Int J Reliab Qual Saf Eng</source>. <year>2010</year>;<volume>17</volume>(<issue>4</issue>):<fpage>351</fpage>&#x02013;<lpage>380</lpage>. doi:<pub-id pub-id-type="doi">10.1142/S021853931000386X</pub-id></mixed-citation></ref><ref id="R38"><label>[38]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Pateshwari</surname><given-names>V</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Predictive model for incident occurrences in steel plant in India</source>. In: <conf-name>2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT)</conf-name>. <conf-loc>Delhi, India</conf-loc>; <conf-date>03&#x02013;05 July 2017</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>5</lpage>.</mixed-citation></ref><ref id="R39"><label>[39]</label><mixed-citation publication-type="book"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Lohani</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <part-title>Genetic algorithm-based association rule mining approach towards rule generation of occupational accidents</part-title>. 
In: <name><surname>Mandal</surname><given-names>J</given-names></name>, <name><surname>Dutta</surname><given-names>P</given-names></name>, <name><surname>Mukhopadhyay</surname><given-names>S</given-names></name>, editors. <source>Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science</source>, vol <volume>776</volume>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2017</year>. doi:<pub-id pub-id-type="doi">10.1007/978-981-10-6430-2_40</pub-id></mixed-citation></ref><ref id="R40"><label>[40]</label><mixed-citation publication-type="journal"><name><surname>Khosrowabadi</surname><given-names>N</given-names></name>, <name><surname>Ghousi</surname><given-names>R</given-names></name>, <name><surname>Makui</surname><given-names>A</given-names></name>. <article-title>Decision support approach on occupational safety using data mining</article-title>. <source>Int J Ind Eng Prod Res</source>. <year>2019</year>;<volume>30</volume>(<issue>2</issue>):<fpage>149</fpage>&#x02013;<lpage>164</lpage>.</mixed-citation></ref><ref id="R41"><label>[41]</label><mixed-citation publication-type="confproc"><name><surname>Pekel</surname><given-names>E</given-names></name>, <name><surname>Ak&#x0015f;ehir</surname><given-names>ZD</given-names></name>, <name><surname>Meto</surname><given-names>B</given-names></name>, <etal/>
<source>A Bayesian network application in occupational health and safety</source>. In: <conf-name>2018 3rd International Conference on Computer Science and Engineering (UBMK)</conf-name>. <conf-loc>Sarajevo, Bosnia and Herzegovina</conf-loc>; <conf-date>20&#x02013;23 September 2018</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>5</lpage>.</mixed-citation></ref><ref id="R42"><label>[42]</label><mixed-citation publication-type="confproc"><name><surname>Muthusamy</surname><given-names>K</given-names></name>, <name><surname>Gunasegaran</surname><given-names>HR</given-names></name>, <name><surname>Natarajan</surname><given-names>E</given-names></name>, <etal/>
<source>Analysis of potential project work accidents: a case study of a construction project in Malaysia</source>. In: <conf-name>2021 IEEE European Technology and Engineering Management Summit (E-TEMS)</conf-name>. <conf-loc>Dortmund, Germany</conf-loc>; <conf-date>18&#x02013;20 March 2021</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R43"><label>[43]</label><mixed-citation publication-type="journal"><name><surname>Lingard</surname><given-names>H</given-names></name>, <name><surname>Hallowell</surname><given-names>M</given-names></name>, <name><surname>Salas</surname><given-names>R</given-names></name>, <etal/>
<article-title>Leading or lagging? Temporal analysis of safety indicators on a large infrastructure construction project</article-title>. <source>Saf Sci</source>. <year>2017</year>;<volume>91</volume>:<fpage>206</fpage>&#x02013;<lpage>220</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2016.08.020</pub-id></mixed-citation></ref><ref id="R44"><label>[44]</label><mixed-citation publication-type="journal"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Pramanik</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>, <etal/>
<article-title>Predicting and analyzing injury severity: a machine learning-based approach using class-imbalanced proactive and reactive data</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>125</volume>:104616. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2020.104616</pub-id></mixed-citation></ref><ref id="R45"><label>[45]</label><mixed-citation publication-type="book"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Gaine</surname><given-names>S</given-names></name>, <name><surname>Deshmukh</surname><given-names>A</given-names></name>, <etal/>
<part-title>A structural topic modelling-based machine learning approach for pattern extraction from accident data</part-title>. In: <name><surname>Raju</surname><given-names>KS</given-names></name>, <name><surname>Senkerik</surname><given-names>R</given-names></name>, <name><surname>Lanka</surname><given-names>SP</given-names></name>, <name><surname>Rajagopal</surname><given-names>V</given-names></name>, editors. <source>Data engineering and communication technology</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2020</year>. p. <fpage>555</fpage>&#x02013;<lpage>564</lpage>.</mixed-citation></ref><ref id="R46"><label>[46]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Patel</surname><given-names>A</given-names></name>, <name><surname>Madaan</surname><given-names>S</given-names></name>, <etal/>
<source>Prediction of occupational accidents using decision tree approach</source>. In: <conf-name>2016 IEEE Annual India Conference (INDICON)</conf-name>. <conf-loc>Bangalore, India</conf-loc>; <conf-date>16&#x02013;18 December 2016</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R47"><label>[47]</label><mixed-citation publication-type="journal"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Raj</surname><given-names>R</given-names></name>, <name><surname>Vinay</surname><given-names>S</given-names></name>, <etal/>
<article-title>An optimization-based decision tree approach for predicting slip&#x02013;trip&#x02013;fall accidents at work</article-title>. <source>Saf Sci</source>. <year>2019</year>;<volume>118</volume>:<fpage>57</fpage>&#x02013;<lpage>69</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2019.05.009</pub-id></mixed-citation></ref><ref id="R48"><label>[48]</label><mixed-citation publication-type="journal"><name><surname>Polyvyanyy</surname><given-names>A</given-names></name>, <name><surname>Pika</surname><given-names>A</given-names></name>, <name><surname>Wynn</surname><given-names>MT</given-names></name>, <etal/>
<article-title>A systematic approach for discovering causal dependencies between observations and incidents in the health and safety domain</article-title>. <source>Saf Sci</source>. <year>2019</year>;<volume>118</volume>:<fpage>345</fpage>&#x02013;<lpage>354</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2019.04.045</pub-id></mixed-citation></ref><ref id="R49"><label>[49]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Baidya</surname><given-names>S</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Application of rough set theory in accident analysis at work: a case study</source>. In: <conf-name>2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)</conf-name>. <conf-loc>Kolkata, India</conf-loc>; <conf-date>03&#x02013;05 November 2017</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R50"><label>[50]</label><mixed-citation publication-type="journal"><name><surname>Oyedele</surname><given-names>A</given-names></name>, <name><surname>Ajayi</surname><given-names>A</given-names></name>, <name><surname>Oyedele</surname><given-names>LO</given-names></name>, <etal/>
<article-title>Deep learning and boosted trees for injuries prediction in power infrastructure projects</article-title>. <source>Appl Soft Comput</source>. <year>2021</year>;<volume>110</volume>:107587. doi:<pub-id pub-id-type="doi">10.1016/j.asoc.2021.107587</pub-id></mixed-citation></ref><ref id="R51"><label>[51]</label><mixed-citation publication-type="confproc"><name><surname>Ugur</surname><given-names>O</given-names></name>, <name><surname>Arisoy</surname><given-names>AA</given-names></name>, <name><surname>Ganiz</surname><given-names>MC</given-names></name>, <etal/>
<source>Descriptive and prescriptive analysis of construction site incidents using decision tree classification and association rule mining</source>. In: <conf-name>2021 International Conference on Innovations in Intelligent Systems and Applications (INISTA)</conf-name>. <conf-loc>Kocaeli, Turkey</conf-loc>; <conf-date>25&#x02013;27 August 2021</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R52"><label>[52]</label><mixed-citation publication-type="journal"><name><surname>Dhalmahapatra</surname><given-names>K</given-names></name>, <name><surname>Shingade</surname><given-names>R</given-names></name>, <name><surname>Mahajan</surname><given-names>H</given-names></name>, <etal/>
<article-title>Decision support system for safety improvement: an approach using multiple correspondence analysis, t-SNE algorithm and k-means clustering</article-title>. <source>Comput Ind Eng</source>. <year>2019</year>;<volume>128</volume>:<fpage>277</fpage>&#x02013;<lpage>289</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cie.2018.12.044</pub-id></mixed-citation></ref><ref id="R53"><label>[53]</label><mixed-citation publication-type="journal"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Pramanik</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <article-title>An integrated approach using rough set theory, ANFIS, and Z-number in occupational risk prediction</article-title>. <source>Eng Appl Artif Intell</source>. <year>2023</year>;<volume>117</volume>:105515. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2022.105515</pub-id></mixed-citation></ref><ref id="R54"><label>[54]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Vinay</surname><given-names>S</given-names></name>, <name><surname>Djeddi</surname><given-names>C</given-names></name>, <etal/>
<source>Text mining-based association rule mining for incident analysis: a case study of a steel plant in India</source>. In: <name><surname>Djeddi</surname><given-names>C</given-names></name>, <name><surname>Kessentini</surname><given-names>Y</given-names></name>, <name><surname>Siddiqi</surname><given-names>I</given-names></name>, <name><surname>Jmaiel</surname><given-names>M</given-names></name>, editors. <conf-name>Mediterranean Conference on Pattern Recognition and Artificial Intelligence</conf-name>. <conf-loc>Hammamet, Tunisia</conf-loc>; <conf-date>20&#x02013;22 December 2020</conf-date>. p. <fpage>257</fpage>&#x02013;<lpage>273</lpage>.</mixed-citation></ref><ref id="R55"><label>[55]</label><mixed-citation publication-type="book"><name><surname>Pramanik</surname><given-names>A</given-names></name>, <name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Sai Siddharth</surname><given-names>V</given-names></name>, <etal/>
<part-title>Semi-automated ontology creation and upgradation for rail-road incidents: a case of a steel plant in India</part-title>. In: <name><surname>Tavares</surname><given-names>JMRS</given-names></name>, <name><surname>Chakrabarti</surname><given-names>S</given-names></name>, <name><surname>Bhattacharya</surname><given-names>A</given-names></name>, <etal/>, editors. <source>Emerging technologies in data mining and information security</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2021</year>. p. <fpage>285</fpage>&#x02013;<lpage>294</lpage>.</mixed-citation></ref><ref id="R56"><label>[56]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Ejaz</surname><given-names>N</given-names></name>, <name><surname>Promod</surname><given-names>C</given-names></name>, <etal/>
<source>Pattern extraction using proactive and reactive data: a case study of contractors&#x02019; safety in a steel plant</source>. In: <name><surname>Singh</surname><given-names>PK</given-names></name>, <name><surname>Panigrahi</surname><given-names>BK</given-names></name>, <name><surname>Suryadevara</surname><given-names>NK</given-names></name>, <etal/>, editors. <conf-name>Proceedings of ICETIT 2019 Emerging Trends in Information Technology</conf-name>. <conf-loc>Cham, Switzerland</conf-loc>: <publisher-name>Springer</publisher-name>. p. <fpage>731</fpage>&#x02013;<lpage>742</lpage>.</mixed-citation></ref><ref id="R57"><label>[57]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Ejaz</surname><given-names>N</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Application of hybrid clustering technique for pattern extraction of accident at work: a case study of a steel industry</source>. In: <conf-name>2018 4th International Conference on Recent Advances in Information Technology (RAIT)</conf-name>. <conf-loc>Dhanbad, India</conf-loc>; <conf-date>15&#x02013;17 March 2018</conf-date>. p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</mixed-citation></ref><ref id="R58"><label>[58]</label><mixed-citation publication-type="journal"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <article-title>Text-document clustering-based cause and effect analysis methodology for steel plant incident data</article-title>. <source>Int J Inj Contr Saf Promot</source>. <year>2018</year>;<volume>25</volume>(<issue>4</issue>):<fpage>416</fpage>&#x02013;<lpage>426</lpage>. doi:<pub-id pub-id-type="doi">10.1080/17457300.2018.1456468</pub-id><pub-id pub-id-type="pmid">29629618</pub-id>
</mixed-citation></ref><ref id="R59"><label>[59]</label><mixed-citation publication-type="journal"><name><surname>Guo</surname><given-names>S</given-names></name>, <name><surname>Zhang</surname><given-names>P</given-names></name>, <name><surname>Ding</surname><given-names>L</given-names></name>. <article-title>Time-statistical laws of workers&#x02019; unsafe behavior in the construction industry: a case study</article-title>. <source>Physica A</source>. <year>2019</year>;<volume>515</volume>:<fpage>419</fpage>&#x02013;<lpage>429</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.physa.2018.09.091</pub-id></mixed-citation></ref><ref id="R60"><label>[60]</label><mixed-citation publication-type="journal"><name><surname>Versteeg</surname><given-names>K</given-names></name>, <name><surname>Bigelow</surname><given-names>P</given-names></name>, <name><surname>Dale</surname><given-names>AM</given-names></name>, <etal/>
<article-title>Utilizing construction safety leading and lagging indicators to measure project safety performance: a case study</article-title>. <source>Saf Sci</source>. <year>2019</year>;<volume>120</volume>:<fpage>411</fpage>&#x02013;<lpage>421</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2019.06.035</pub-id></mixed-citation></ref><ref id="R61"><label>[61]</label><mixed-citation publication-type="journal"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>, <name><surname>Boustras</surname><given-names>G</given-names></name>. <article-title>Analysis of categorical incident data and design for safety interventions using axiomatic design framework</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>123</volume>:104557. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2019.104557</pub-id></mixed-citation></ref><ref id="R62"><label>[62]</label><mixed-citation publication-type="journal"><name><surname>Ghasemi</surname><given-names>F</given-names></name>, <name><surname>Kalatpour</surname><given-names>O</given-names></name>, <name><surname>Moghimbeigi</surname><given-names>A</given-names></name>, <etal/>
<article-title>Selecting strategies to reduce high-risk unsafe work behaviors using the safety behavior sampling technique and Bayesian network analysis</article-title>. <source>J Res Health Sci</source>. <year>2017</year>;<volume>17</volume>(<issue>1</issue>):<fpage>372</fpage>.</mixed-citation></ref><ref id="R63"><label>[63]</label><mixed-citation publication-type="journal"><name><surname>Buddhakulsomsiri</surname><given-names>J</given-names></name>, <name><surname>Pannakkong</surname><given-names>W</given-names></name>, <name><surname>Nanthavanij</surname><given-names>S</given-names></name>. <article-title>Application of association rule algorithm to industrial safety data mining</article-title>. <source>Int J Ind Syst Eng</source>. <year>2015</year>;<volume>21</volume>(<issue>4</issue>):<fpage>415</fpage>&#x02013;<lpage>437</lpage>. doi:<pub-id pub-id-type="doi">10.1504/IJISE.2015.072728</pub-id></mixed-citation></ref><ref id="R64"><label>[64]</label><mixed-citation publication-type="journal"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Dhalmahapatra</surname><given-names>K</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <article-title>Forecasting occupational safety performance and mining text-based association rules for incident occurrences</article-title>. <source>Saf Sci</source>. <year>2023</year>;<volume>159</volume>:106014. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2022.106014</pub-id></mixed-citation></ref><ref id="R65"><label>[65]</label><mixed-citation publication-type="book"><name><surname>Dhalmahapatra</surname><given-names>K</given-names></name>, <name><surname>Singh</surname><given-names>K</given-names></name>, <name><surname>Jain</surname><given-names>Y</given-names></name>, <etal/>
<part-title>Exploring causes of crane accidents from incident reports using decision tree</part-title>. In: <name><surname>Satapathy</surname><given-names>SC</given-names></name>, <name><surname>Joshi</surname><given-names>A</given-names></name>, editors. <source>Information and communication technology for intelligent systems</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2019</year>. p. <fpage>175</fpage>&#x02013;<lpage>183</lpage>.</mixed-citation></ref><ref id="R66"><label>[66]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Vinay</surname><given-names>S</given-names></name>, <name><surname>Pateshwari</surname><given-names>V</given-names></name>, <etal/>
<source>Study of optimized SVM for incident prediction of a steel plant in India</source>. In: <conf-name>2016 IEEE Annual India Conference (INDICON)</conf-name>, <conf-date>16&#x02013;18 December 2016</conf-date>, <conf-loc>Bangalore, India</conf-loc>; <year>2016</year>.</mixed-citation></ref><ref id="R67"><label>[67]</label><mixed-citation publication-type="journal"><name><surname>Verma</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>, <name><surname>Gaikwad</surname><given-names>V</given-names></name>. <article-title>A preliminary analysis of incident investigation reports of an integrated steel plant: some reflection</article-title>. <source>Int J Inj Contr Saf Promot</source>. <year>2018</year>;<volume>25</volume>(<issue>2</issue>):<fpage>180</fpage>&#x02013;<lpage>194</lpage>. doi:<pub-id pub-id-type="doi">10.1080/17457300.2017.1416482</pub-id><pub-id pub-id-type="pmid">29280419</pub-id>
</mixed-citation></ref><ref id="R68"><label>[68]</label><mixed-citation publication-type="journal"><name><surname>Bevilacqua</surname><given-names>M</given-names></name>, <name><surname>Ciarapica</surname><given-names>F</given-names></name>, <name><surname>Giacchetta</surname><given-names>G</given-names></name>. <article-title>Industrial and occupational ergonomics in the petrochemical process industry: a regression trees approach</article-title>. <source>Accid Anal Prev</source>. <year>2008</year>;<volume>40</volume>(<issue>4</issue>):<fpage>1468</fpage>&#x02013;<lpage>1479</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aap.2008.03.012</pub-id><pub-id pub-id-type="pmid">18606280</pub-id>
</mixed-citation></ref><ref id="R69"><label>[69]</label><mixed-citation publication-type="confproc"><name><surname>Sarkar</surname><given-names>S</given-names></name>, <name><surname>Khatedi</surname><given-names>N</given-names></name>, <name><surname>Pramanik</surname><given-names>A</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>An ensemble learning-based undersampling technique for handling class-imbalance problem</source>. In: <name><surname>Singh</surname><given-names>PK</given-names></name>, <name><surname>Panigrahi</surname><given-names>BK</given-names></name>, <name><surname>Suryadevara</surname><given-names>NK</given-names></name>, <etal/>, editors. <conf-name>Proceedings of ICETIT 2019</conf-name>. <conf-loc>Cham</conf-loc>: <publisher-name>Springer</publisher-name>; <year>2020</year>. p. <fpage>586</fpage>&#x02013;<lpage>595</lpage>.</mixed-citation></ref><ref id="R70"><label>[70]</label><mixed-citation publication-type="journal"><name><surname>Cord&#x000f3;n</surname><given-names>I</given-names></name>, <name><surname>Garc&#x000ed;a</surname><given-names>S</given-names></name>, <name><surname>Fern&#x000e1;ndez</surname><given-names>A</given-names></name>, <etal/>
<article-title>Imbalance: oversampling algorithms for imbalanced classification in R</article-title>. <source>Knowl Based Syst</source>. <year>2018</year>;<volume>161</volume>:<fpage>329</fpage>&#x02013;<lpage>341</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2018.07.035</pub-id></mixed-citation></ref><ref id="R71"><label>[71]</label><mixed-citation publication-type="journal"><name><surname>Mishra</surname><given-names>N</given-names></name>, <name><surname>Silakari</surname><given-names>S</given-names></name>. <article-title>Predictive analytics: a survey, trends, applications, oppurtunities &#x00026; challenges</article-title>. <source>Int J Comput Sci Inf Technol</source>. <year>2012</year>;<volume>3</volume>(<issue>3</issue>):<fpage>4434</fpage>&#x02013;<lpage>4438</lpage>.</mixed-citation></ref><ref id="R72"><label>[72]</label><mixed-citation publication-type="confproc"><name><surname>Singh</surname><given-names>K</given-names></name>, <name><surname>Maiti</surname><given-names>J</given-names></name>. <source>Mining frequent patterns with temporal effect: a case of accident path analysis</source>. In: <name><surname>Singh</surname><given-names>PK</given-names></name>, <name><surname>Panigrahi</surname><given-names>BK</given-names></name>, <name><surname>Suryadevara</surname><given-names>NK</given-names></name>, <etal/>, editors. <conf-name>Proceedings of ICETIT 2019</conf-name>. <conf-loc>Singapore</conf-loc>: <publisher-name>Springer</publisher-name>; <year>2019</year>. p. <fpage>596</fpage>&#x02013;<lpage>603</lpage>.</mixed-citation></ref><ref id="R73"><label>[73]</label><mixed-citation publication-type="journal"><name><surname>Collins</surname><given-names>LM</given-names></name>, <name><surname>Schafer</surname><given-names>JL</given-names></name>, <name><surname>Kam</surname><given-names>C-M</given-names></name>. <article-title>A comparison of inclusive and restrictive strategies in modern missing data procedures</article-title>. 
<source>Psychol Methods</source>. <year>2001</year>;<volume>6</volume>(<issue>4</issue>):<fpage>330</fpage>. doi:<pub-id pub-id-type="doi">10.1037/1082-989X.6.4.330</pub-id><pub-id pub-id-type="pmid">11778676</pub-id>
</mixed-citation></ref><ref id="R74"><label>[74]</label><mixed-citation publication-type="journal"><name><surname>Baker</surname><given-names>M</given-names></name>. <article-title>1,500 scientists lift the lid on reproducibility</article-title>. <source>Nature</source>. <year>2016</year>;<volume>533</volume>:<fpage>452</fpage>&#x02013;<lpage>454</lpage>. doi:<pub-id pub-id-type="doi">10.1038/533452a</pub-id><pub-id pub-id-type="pmid">27225100</pub-id>
</mixed-citation></ref><ref id="R75"><label>[75]</label><mixed-citation publication-type="journal"><name><surname>Milch</surname><given-names>V</given-names></name>, <name><surname>Laumann</surname><given-names>K</given-names></name>. <article-title>Interorganizational complexity and organizational accident risk: a literature review</article-title>. <source>Safety Sci</source>. <year>2016</year>;<volume>82</volume>:<fpage>9</fpage>&#x02013;<lpage>17</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2015.08.010</pub-id></mixed-citation></ref></ref-list></back><floats-group><fig position="float" id="F1"><label>Figure 1.</label><caption><p id="P48">Number of Web of Science search results from 2007 to 2022 for the search &#x02018;safety analytics&#x02019; in the article title, abstract and key words.</p></caption><graphic xlink:href="nihms-1984496-f0001" position="float"/></fig><fig position="float" id="F2"><label>Figure 2.</label><caption><p id="P49">Flow diagram for the scoping review study selection process. 
The <italic toggle="yes">n</italic> denotes the number of articles under consideration in each step of the process.</p></caption><graphic xlink:href="nihms-1984496-f0002" position="float"/></fig><fig position="float" id="F3"><label>Figure 3.</label><caption><p id="P50">Frequency counts: (A) year of publication; (B) country where the research was conducted; (C) sector in which the research was conducted; (D) industry in which the research was conducted if the sector was manufacturing.</p></caption><graphic xlink:href="nihms-1984496-f0003" position="float"/></fig><fig position="float" id="F4"><label>Figure 4.</label><caption><p id="P51">Most frequently analyzed variables for (A) employee characteristics and (B) incident details in the included studies.</p></caption><graphic xlink:href="nihms-1984496-f0004" position="float"/></fig><fig position="float" id="F5"><label>Figure 5.</label><caption><p id="P52">Summary data for different aspects of data preprocessing: (A) methods employed for variable selection; (B) missing data; (C) text-mining preprocessing; (D) software used for the included studies. 
Note: TF-IDF = term frequency-inverse document frequency.</p></caption><graphic xlink:href="nihms-1984496-f0005" position="float"/></fig><fig position="float" id="F6"><label>Figure 6.</label><caption><p id="P53">Techniques employed by studies that used only one technique (in contrast to the studies that compared two or more techniques).</p></caption><graphic xlink:href="nihms-1984496-f0006" position="float"/></fig><table-wrap position="float" id="T1"><label>Table 1.</label><caption><p id="P54">Inclusion and exclusion criteria for studies considered for the scoping review.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="middle" rowspan="1" colspan="1">Inclusion criteria</th><th align="center" valign="middle" rowspan="1" colspan="1">Exclusion criteria</th></tr></thead><tbody><tr><td align="left" valign="top" rowspan="1" colspan="1">&#x02003;Reports, observational studies, experimental studies, case studies<break/>&#x02003;Available in English<break/>&#x02003;Concerned with occupational safety<break/>&#x02003;Involve the use of analytics<break/>&#x02003;Published between 2007 and 2021<break/>&#x02003;Conducted within an establishment or enterprise<break/>&#x02003;Dependent variable is injuries, fatalities or near misses</td><td align="left" valign="top" rowspan="1" colspan="1">Reviews, meta-analyses, laboratory studies, commentaries<break/>Unpublished documents<break/>Non-English language<break/>Published before 2007<break/>Conducted across multiple establishments or enterprises, industries or sectors<break/>Focused on risk or hazard assessment<break/>Only self-reported data analyzed</td></tr></tbody></table></table-wrap><table-wrap position="float" id="T2"><label>Table 2.</label><caption><p id="P55">Studies included in the scoping review that compared the performance of different analytics techniques listed by citation and 
the techniques that were compared.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="middle" rowspan="1" colspan="1">Citation</th><th align="center" valign="middle" rowspan="1" colspan="1">Techniques compared</th></tr></thead><tbody><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R33" ref-type="bibr">33</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">CART, <bold>GBM</bold>, RF</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R68" ref-type="bibr">68</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>CART,</bold> CHAID, eCHAID, QUEST</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R37" ref-type="bibr">37</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">ANN, CART, CHAID, eCHAID, negative binomial regression, <bold>neuro-fuzzy systems</bold>, QUEST</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R50" ref-type="bibr">50</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>Deep neural network,</bold> gradient-boosted machines, extreme gradient boosting, SVM, KNN</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R34" ref-type="bibr">34</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">DT, <bold>RF</bold>, logistic regression, KNN, SVM</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R55" ref-type="bibr">55</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>RF</bold>, SVM</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R69" ref-type="bibr">69</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">RF, <bold>SVM</bold></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R57" 
ref-type="bibr">57</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><italic toggle="yes">k</italic>-Means clustering, <bold>SOM-based <italic toggle="yes">k</italic>-means clustering</bold>, SOM-based hierarchical clustering</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R56" ref-type="bibr">56</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">C5.0, <bold>CHAID</bold></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R45" ref-type="bibr">45</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>RF</bold>, SVM, KNN</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R28" ref-type="bibr">28</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">DNN, <bold>expectation-maximization-based DNN</bold>, SVM, RF</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R46" ref-type="bibr">46</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">CART-tuning algorithms: <bold>genetic algorithm</bold>, grid-based, pruned grid-based</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R38" ref-type="bibr">38</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">SVM, <bold>RF</bold>, <bold>ME</bold></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R53" ref-type="bibr">53</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">ANN, CART, KNN, NB, <bold>RF</bold>, SVM</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R47" ref-type="bibr">47</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">CART, C5.0, <bold>RF</bold></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R66" ref-type="bibr">66</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">Parameter optimization of SVM: <bold>grid search</bold>, genetic algorithm, BAT 
algorithm</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R27" ref-type="bibr">27</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">ANN, <bold>SVM</bold></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R35" ref-type="bibr">35</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>CART</bold>, CHAID</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R51" ref-type="bibr">51</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1">
<bold>DT, ARM</bold>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">[<xref rid="R53" ref-type="bibr">53</xref>]</td><td align="left" valign="top" rowspan="1" colspan="1"><bold>ANFIS</bold>, SVM, ANN, KNN, NB, RF</td></tr></tbody></table><table-wrap-foot><fn id="TFN1"><p id="P56">Note: Techniques in bold were deemed the best for the study data. ANFIS = adaptive neuro-fuzzy inference system; ANN = artificial neural network; ARM = association-rule mining; BAT = metaheuristic bio-inspired algorithm; CART = classification and regression tree; CHAID = <italic toggle="yes">&#x003c7;</italic><sup>2</sup> automatic interaction detector; DNN = deep neural network; DT = decision tree; eCHAID = exhaustive <italic toggle="yes">&#x003c7;</italic><sup>2</sup> automatic interaction detector; GBM = gradient boosting machine; KNN = <italic toggle="yes">k</italic>-nearest neighbors; ME = maximum entropy; NB = na&#x000ef;ve Bayes; QUEST = quick, unbiased and efficient statistical tree; RF = random forest; SOM = self-organizing map; SVM = support vector machine.</p></fn></table-wrap-foot></table-wrap></floats-group></article>