A robust computer vision-based approach is developed to estimate the load asymmetry angle defined in the revised NIOSH lifting equation (RNLE). The asymmetry angle enables the computation of a recommended weight limit for repetitive lifting operations in a workplace to prevent lower back injuries. The open-source package OpenPose is applied to estimate the 2D locations of the worker's skeletal joints in two synchronous videos. A computer vision correspondence and depth estimation method then combines these joint location estimates to recover the 3D coordinates of the skeletal joints during lifting. The asymmetry angle is deduced from a subset of these 3D positions. Error analysis reveals unreliable angle estimates caused by occlusion of the upper limbs. A robust angle estimation method that mitigates this challenge is developed: we propose flagging unreliable angle estimates based on the average confidence score of the 2D joint estimates provided by OpenPose. An optimal threshold is derived that balances the percentage reduction in estimation error variance against the percentage of angle estimates flagged. Tested on 360 lifting instances in a NIOSH-provided dataset, the standard deviation of the angle estimation error is reduced from 10.13° to 4.99°. To realize this error variance reduction, 34% of the estimated angles are flagged and require further validation.

Overexertion during manual lifting is a leading cause of lower back pain and related health issues that cost the industry billions of dollars annually [

The RNLE computes the recommended weight limit (RWL) and the lifting index (LI), defined as the ratio of the load weight (L) to the RWL [

Video monitoring is a non-intrusive approach to acquire the measurements for the RNLE [

In this study, we present a robust computer vision workflow to estimate the asymmetry angle of asymmetric manual lifting. Specifically, the proposed system processes two video clips taken synchronously by two video cameras of a manual lifting operation. For each video, it detects the keyframe at which the lifting operation starts. Then, it applies the open-source human pose estimation software package OpenPose [

This approach exhibits several unique features: (a) It leverages an open-source software package to estimate the 2D image coordinates of body skeletal joints for each camera; no re-training on the experiment data in this work was performed. (b) Camera poses in the experiment are not available; a structure-from-motion procedure with manually selected matching feature points is used to estimate them. (c) A robust angle estimation procedure using the estimated 3D coordinates of skeletal joints is proposed. The empirical relation between the angle estimation error and the confidence score of the 2D skeletal joint estimates is leveraged to predict unreliable angle estimates. By rejecting these outliers, the overall angle estimation accuracy is significantly improved.

An important technical innovation of this work is the development of an end-to-end workflow that incorporates generic computer vision software modules such as OpenPose to provide a robust estimate of the angle of asymmetry. Existing approaches [

A potential challenge of applying a generic pose estimation module is the need to assess the resulting estimation error. In this work, we analyze the causes of excessive angle estimation error and leverage the confidence score reported by the generic pose estimation package to infer the angle estimation error. Our results show that an exploratory investigation of the estimation error is an integral part of the proposed workflow and enhances the reliability of the outcome.

In the rest of this paper, the definition of load asymmetry angle and the computer vision-based pose estimation method are discussed in

In the RNLE, the RWL is expressed as the product of a load constant (a nominal weight of about 23 kg) and six multipliers: the horizontal, vertical, distance, angle, frequency, and coupling multipliers [
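As a concrete illustration, the RWL and LI computations can be sketched as below. The multiplier formulas follow the standard metric-unit RNLE definitions; the frequency and coupling multipliers normally come from lookup tables and are passed in directly, the range checks are simplified, and the function and variable names are ours, not from the paper.

```python
# Sketch of the RNLE recommended weight limit (RWL) and lifting index (LI),
# using the standard metric-unit multiplier formulas. Range handling is
# simplified; the RNLE applications manual sets a multiplier to zero when
# its input falls outside the valid range.

def rwl_kg(h_cm, v_cm, d_cm, a_deg, fm, cm, lc_kg=23.0):
    """Recommended weight limit (kg) for one lifting task."""
    hm = min(1.0, 25.0 / h_cm)                      # horizontal multiplier
    vm = max(0.0, 1.0 - 0.003 * abs(v_cm - 75.0))   # vertical multiplier
    dm = min(1.0, 0.82 + 4.5 / d_cm)                # distance multiplier
    am = max(0.0, 1.0 - 0.0032 * a_deg)             # asymmetry (angle) multiplier
    return lc_kg * hm * vm * dm * am * fm * cm

def lifting_index(load_kg, rwl):
    """LI = load weight / RWL; LI > 1 indicates elevated risk."""
    return load_kg / rwl

# Example: a symmetric lift (A = 0 deg) vs. a 90-deg asymmetric lift.
rwl_sym = rwl_kg(h_cm=30, v_cm=75, d_cm=40, a_deg=0, fm=1.0, cm=1.0)
rwl_asym = rwl_kg(h_cm=30, v_cm=75, d_cm=40, a_deg=90, fm=1.0, cm=1.0)
```

Holding all other factors fixed, increasing the asymmetry angle from 0° to 90° lowers the asymmetry multiplier from 1.0 to about 0.71 and shrinks the RWL accordingly, which is why an accurate angle estimate matters.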

From

From

Previously, we developed a video-processing algorithm [

Several deep-neural-network-based human pose estimation algorithms have been developed recently and made available as open-source software packages [

The dataset used in this work consists of two synchronous video clips taken by two cameras on opposing sides of a subject performing a lifting operation. In each trial, the subject walks toward a shelf, picks up the object from the shelf, turns around, and walks to the destination to drop off the object. One camera is stationary and captures the entire cycle of the lifting operation. The other camera was panned horizontally by an operator to track the movement of the subject. No camera intrinsic parameters (e.g., focal length) or extrinsic parameters (e.g., pose and position) are available.

Using a lifting instance detection algorithm [

Given the 2D skeletal joint coordinates estimated using OpenPose from the respective key video frames, our next goal is to estimate the 3D coordinates of these skeletal joints. Since the camera poses are not available, our approach is to first calibrate the cameras using a

SfM is a computer vision technique that simultaneously estimates camera poses and the 3D coordinates of a set of corresponding feature points extracted from two or more views (cameras) of the same scene. It consists of the following steps:

a) Extract a set of visually distinct feature points from both keyframes.

b) Establish correspondence matching of feature points between the keyframes.

c) Use the epipolar constraint to estimate the fundamental matrix

d) Estimate the (relative) camera pose (

e) Estimate the 3D coordinates of the matching feature points.

In this work, in step a), we applied the SURF feature detector to extract the set of feature points. In step b), we manually selected a set of matching corner feature points corresponding to static objects visible in both keyframes. Steps c) and d) are realized using MATLAB Computer Vision Toolbox functions
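As an illustration of the final step, the linear (DLT) triangulation of one matched point from the two recovered camera projection matrices can be sketched as follows. The paper uses MATLAB toolbox functions; this numpy sketch only mirrors the underlying computation, and the function name is ours.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2 : 3x4 camera projection matrices (from the SfM steps above).
    x1, x2 : (u, v) coordinates of the matched point in each view.
    Returns the 3D point in the world frame of the projection matrices.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Running this on every matched skeletal joint yields the 3D joint coordinates used downstream for the angle computation.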

The decision to apply the OpenPose package without retraining to elicit the 2D coordinates of skeletal joints from each view directly impacts the method described above. Specifically, these 2D skeletal joint coordinates may not be accurate enough to be used in SfM (step b) for estimating the camera poses.

Given the 3D coordinates of wrists and hip joints, we may proceed to estimate the asymmetry line defined in

Let _{LW}, _{RW}, _{LH}, and _{RH} be the horizontal (

Then the wrist angle θ_{W} = tan^{−1}(d_{2}/d_{1}). Similarly, one may compute the hip direction angle θ_{H}. Finally, the asymmetry angle

Recall that θ_{W} (or θ_{H})

Hence

θ_{W} and θ_{H} in
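Under one plausible convention (the paper's exact vector definitions appear in its equations), the angle computation above can be sketched from the 3D wrist and hip coordinates: project onto the horizontal plane, take the direction angles of the wrist line and the hip line, and difference them. Function names and the right-to-left vector convention are our illustrative choices.

```python
import math

def direction_angle(p_left, p_right):
    """Horizontal direction angle (degrees) of the line from the right
    joint to the left joint. Points are (x, y, z) with z vertical, so
    only the horizontal components are used."""
    d1 = p_left[0] - p_right[0]
    d2 = p_left[1] - p_right[1]
    return math.degrees(math.atan2(d2, d1))

def asymmetry_angle(lw, rw, lh, rh):
    """Angle between the wrist line and the hip line, wrapped to [-180, 180)."""
    a = direction_angle(lw, rw) - direction_angle(lh, rh)
    return (a + 180.0) % 360.0 - 180.0
```

A lift performed squarely in front of the body gives an angle near 0°, while twisting the trunk rotates the wrist line relative to the hip line and the angle grows accordingly.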

The experiment data was adapted from a study conducted by NIOSH [

For each of the 12 initial hand locations, a subject repeated the lifting task three times. Learning effects were mitigated by having the subjects perform the trials in random order. The subject walked from a starting point toward the lifting station, lifted the object with both hands, turned around, and walked back to the drop-off station near the starting point. A wire basket was used as the lifting object; its wire texture reduced, to some extent, the visual occlusion of the hands holding the object. The basket was set on a 12 × 12 cm platform so that the subjects could perform the lifting tasks naturally. The height of the platform was adjusted according to the 12 designated initial hand locations.

A MoCap system, OptiTrack (Model Flex 13, Innovative Sports, Inc., Chicago, USA), was used to track marker clusters attached to 13 positions on the subject's body. This MoCap system reports an average accuracy of 0.7 mm in the 3D coordinate system when calibrated.

Two cameras were used to record the video data. Both were mounted on tripods and synchronized with the MoCap system. One camera was a web camera (Microsoft 1080p LifeCam, 640 × 480 pixels, 30 fps), placed 4 meters from the starting point at eye-level height, with a fixed viewing angle perpendicular to the subject's walking path. The other camera was a camcorder (Sony, 1280 × 720 pixels, 30 fps), located across the walking path of the subject and operated by a staff member (seen in

Since the two cameras have different resolutions, we scale the video frames of the camcorder (1280 × 720) so that the subject has about the same height (in pixels) as in the web-camera frames. This

Each camera was initially calibrated by capturing a short clip of a calibration checkerboard, and the MATLAB camera calibration app was applied to obtain the intrinsic parameters, including the focal length, of each camera. The camera poses, however, are not available, since the checkerboard clips were recorded before the experiment at different camera poses.

Ten subjects were recruited, all employees of the Division of Applied Research and Technology of NIOSH in Cincinnati, Ohio. Inclusion and exclusion criteria were applied to screen the subjects. Written consent was obtained according to the NIOSH-approved IRB study protocol.

The path the subject is directed to follow during each lifting trial is marked on the floor, with the initial position, the lifting location, and the finishing line identified. The subject is instructed to line up their toes with each of these lines when performing the tasks. The subject walks from the initial position toward the lifting station following the line and lifts the basket in front with both hands. The subject then turns around, carries the basket to a shelf to release the object, and walks to the finishing line. Subjects perform these steps at their own pace and turn as they prefer. The distance between these locations is no more than 20 steps. An experiment was not recorded until the subject was familiar with the required steps. Each trial lasted about 15 seconds.

The frame number of the beginning of lifting (BOL) for the ground-truth data was established manually and independently by two researchers. BOL is defined as the instant when the basket starts to move. The ground-truth (MoCap) asymmetry angles are then estimated by feeding the corresponding MoCap-annotated 3D skeletal joint positions into

We estimated the angle of asymmetry for each of the 360 lifting instances in the NIOSH dataset and compared it to the computed ground-truth values. The error distribution is shown in

After reviewing the corresponding videos and the 2D pose estimates, we found that the large angle estimation errors are often due to self-occlusion. Since the cameras are placed facing the sagittal plane of the subject, only one side of the body is exposed to each camera. As shown in

Out of the 360 lifting instances in the dataset, the numbers of instances in which some joints are occluded and not detected by OpenPose are listed in

We also hypothesize that the weighted average confidence score of the 2D joint position estimates is correlated with the angle estimation error. In
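For reference, OpenPose reports one (x, y, confidence) triple per keypoint for each detected person, so an averaged confidence score over the joints used in the angle estimate can be computed as below. The uniform default weighting is our illustrative choice, not necessarily the weighting used in the paper.

```python
import numpy as np

def averaged_confidence(keypoints, joint_ids, weights=None):
    """Weighted average of OpenPose confidence scores.

    keypoints : sequence of (x, y, confidence) triples, one per keypoint,
                as returned by OpenPose for one detected person.
    joint_ids : indices of the joints used in the angle estimate
                (e.g., wrists and hips).
    weights   : optional per-joint weights; uniform if omitted.
    """
    conf = np.asarray([keypoints[j][2] for j in joint_ids], dtype=float)
    if weights is None:
        weights = np.ones_like(conf)
    return float(np.average(conf, weights=weights))
```

An occluded joint typically receives a low confidence score (or is missing entirely), which drags the average down and signals a potentially unreliable angle estimate.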

We searched the averaged-confidence-score range from 0.1 to 0.8 and found that a threshold of 0.5 yields the optimal solution. This optimization process is summarized in
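The threshold search can be sketched as follows. The cost below trades the residual error spread of the retained estimates against the fraction flagged; the weighting `alpha` and the exact cost form are our assumptions standing in for the paper's definition.

```python
import numpy as np

def best_threshold(errors, scores, candidates, alpha=10.0):
    """Pick the confidence threshold minimizing an assumed cost.

    errors     : angle estimation errors (degrees) per lifting instance.
    scores     : averaged confidence score per lifting instance.
    candidates : threshold values to try (e.g., 0.1 to 0.8).
    alpha      : assumed weight on the fraction of instances flagged.
    """
    errors = np.asarray(errors, dtype=float)
    scores = np.asarray(scores, dtype=float)
    best_t, best_cost = None, None
    for t in candidates:
        keep = scores >= t          # instances below t are flagged
        if keep.sum() < 2:
            continue                # too few retained to measure spread
        # residual error spread (degrees) + penalty on fraction flagged
        cost = errors[keep].std() + alpha * (1.0 - keep.mean())
        if best_cost is None or cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

With errors that grow as confidence drops, the sweep settles on an intermediate threshold: low thresholds keep large-error outliers, while high thresholds flag too many usable estimates.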

In this research, an algorithm to estimate the load asymmetry angle for the RNLE was developed. The algorithm requires no customized training on local datasets and assesses whether each estimated angle is reliable. We identified self-occlusion as the main source of estimation error; it may be mitigated with additional cameras placed to avoid viewing blind spots. Future work will focus on training robust learning-based 3D pose estimation algorithms that provide accurate 3D body coordinates in harsh work environments, and on integrating the estimation of other RNLE parameters into a single estimation process.

This study was funded in part by a NIOSH research contract 5D30118P01249. The authors thank the NIOSH research team members Menekse Barim, Maire Hayden, and Dwight Werren for their assistance with the dataset used for this study. The findings and conclusions in this study are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health (NIOSH), the Centers for Disease Control and Prevention (CDC). Mention of any company or product does not constitute an endorsement by NIOSH, CDC.

Graphic Representation of Asymmetry angle [

Sony camera view [Photo credit: CDC/NIOSH].

LifeCam camera view [Photo credit: CDC/NIOSH].

Distribution of angle estimation errors (mean = 0.48°, Std = 10.13°)

OpenPose [

Absolute values of angle estimation error as a function of averaged confidence scores of 2D joint estimates

Distribution of angle estimation errors after excluding unreliable estimates (mean = −1.12°, Std = 4.99°)

Choosing optimal threshold value of averaged confidence score to minimize overall cost.

% of instances in which the joint is not detected by OpenPose

| | Wrist missing | Elbow missing |
|---|---|---|
| Right side | 16.94% | 4.72% |
| Left side | 28.61% | 15.56% |