Monitoring mining-induced seismicity (MIS) can help engineers understand the rock mass response to resource extraction. With a thorough understanding of ongoing geomechanical processes, engineers can operate mines, especially those with a propensity for rockbursting, more safely and efficiently. Unfortunately, processing MIS data usually requires significant effort from human analysts, which can result in substantial costs and time commitments. The problem is exacerbated for operations that produce copious amounts of MIS, such as mines with high stress and/or extraction ratios. Recently, deep learning methods have shown the ability to significantly improve the quality of automated arrival-time picking on earthquake data recorded by regional seismic networks. However, relatively little has been published on applying these techniques to MIS. In this study, we compare the performance of a convolutional neural network (CNN), originally trained to pick arrival times on the Southern California Seismic Network (SCSN), to that of human analysts on coal-mine-related MIS. We perform comparisons on several coal-related MIS data sets recorded at various network scales, sampling rates and mines. We find that the Southern-California-trained CNN does not perform well on any of our data sets without retraining. However, applying the concept of transfer learning, we retrain the SCSN model with relatively little MIS data, after which the CNN performs nearly as well as a human analyst. When retrained with data from a single analyst, the analyst-CNN pick-time residual variance is lower than the variance observed between human analysts. We also compare the retrained CNN to a simpler, optimized picking algorithm, which falls short of the CNN’s performance. We conclude that CNNs can achieve a significant improvement in automated phase picking, although some data-set-specific training will usually be required.
Moreover, initializing training with weights found from other, even very different, data sets can greatly reduce the amount of training data required to achieve a given performance threshold.
Because seismic monitoring offers unique insight into the Earth’s response to mining, it has become standard practice in deep underground hard-rock mines, especially those experiencing rockbursting (
The standard product of seismic monitoring is an earthquake catalogue—a listing of information about each discrete seismic event including origin time, location, magnitude and often other source parameters such as radiated energy or the moment tensor. Event locations are particularly important for accurately interpreting the geomechanical significance of the seismicity because location errors propagate to other source parameter estimates. For coal mining environments, the greatest source of location error typically stems from inaccurately modelling the complex, time-dependent velocity structure in which the events occur (
Over the past few decades, many automatic picking techniques have been developed for estimating body wave arrival times. Commonly used methods are based on detecting changes in observed energy, polarization, or other statistical properties of the recorded time-series (e.g.
Several studies have also explored the use of relatively simplistic neural networks for phase-picking tasks (e.g.
CNNs have also been developed for performing tasks other than arrival-time picking, for example determining first motions (
However, little work has been published on applying CNN phase pickers to MIS. In this study, we assess the performance of a publicly accessible CNN
The data sets used in this study were collected by five different seismic networks monitoring underground longwall coal mines in the United States. Four of the networks (data sets A–D) were deployed and operated by the National Institute for Occupational Safety and Health (NIOSH). No geographic references are made to these operations as the mines wish to remain anonymous. The fifth network (data set E) is operated by the University of Utah.
Data sets A and B were derived from two separate temporary surface deployments at the same mine, consisting of 5-Hz, three-component MagSeis ZLand geophones sampling at 500 sps with a gain setting of 30 dB (
Due to the large volume of data, only 1251 events (10 per cent of the total) from data set A were manually processed by a single analyst, resulting in 12 527
Data set C was collected by a dense microseismic network of both in-mine and surface sensors. The surface stations were 4.5-Hz geophones sampling at 1000 sps, and the underground stations were 14-Hz, three-component geophones sampling at 5000 sps. The network operated for approximately 5 yr and recorded events associated with the mining of four longwall panels. Over 210 000 seismic triggers were recorded during the network lifetime, although many of the triggers were caused by noise associated with mine operations rather than induced seismicity.
For this study, we used analyst-processed events detected during an arbitrarily selected month. During this time period, the network consisted of 8 underground and 11 surface stations covering 5.5 km^{2} (
Data set D consists of two years of data collected by a local surface network surrounding a longwall coal mine (
Data set E includes 1929 events located by the University of Utah Seismograph Stations (UUSS) that originated in the coal mining regions of Utah between 2012 October 01 and 2019 December 04 (
We evaluated three models for automatic picking of
First, vertical-channel data were extracted from each station-event pair in the five data sets. Stations in data set D had two vertical channels, one from an L4 geophone and the other from an accelerometer, so the channel on which the analyst made the
Due to data storage limitations, only triggered waveforms for data set C were archived. These triggered waveforms generally only had a very small number of samples available before the
Like
For each data set, the base CNN model was retrained, using the Southern California Seismic Network (SCSN) model’s weights as a starting point. The input arrays for training were created through the following procedure, based on that described by
We used the same training data (including the same preprocessing) to optimize a Baer picker (
The base and trained CNN, and trained Baer models were evaluated using the test data for each of the five data sets. For the test data, the traces were segmented into 400-sample windows and the analyst pick index was shifted using the same process described above.
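The test-window construction can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors’ exact routine; the function name, the random placement of the pick within the window, and the 50-sample guard band are our own assumptions:

```python
import numpy as np

def make_test_window(trace, pick_index, window_len=400, rng=None):
    """Cut a window_len-sample window around an analyst pick, placing the
    pick at a random offset so a model cannot simply learn a fixed pick
    position. Returns the window and the shifted pick index within it."""
    rng = np.random.default_rng() if rng is None else rng
    # Keep the pick at least 50 samples from either edge (assumed guard band).
    offset = int(rng.integers(low=50, high=window_len - 50))
    start = pick_index - offset
    if start < 0 or start + window_len > len(trace):
        raise ValueError('pick too close to the trace boundary')
    window = trace[start:start + window_len]
    # Demean and normalize to unit peak amplitude, as is typical for CNN inputs.
    window = window - window.mean()
    return window / np.abs(window).max(), offset
```

Randomizing the pick position within the window prevents the evaluation from rewarding a model that simply predicts a constant index.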
In order to estimate variability among human analysts, the test data from data sets A and B were processed by three analysts in addition to the standard analyst who originally processed the training data. Although the sample size is small, these comparisons serve as an approximation of human-level performance for data sets A and B.
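For reference, the residual statistics used throughout these comparisons (mean, sample standard deviation and mean absolute error of pick-time residuals, in samples) can be computed as follows; this is an illustrative helper, not code from the study:

```python
import numpy as np

def residual_stats(picks_a, picks_b):
    """Summarize pick-time residuals (in samples) between two sets of
    picks for the same traces: mean, sample standard deviation and
    mean absolute error."""
    r = np.asarray(picks_a, dtype=float) - np.asarray(picks_b, dtype=float)
    return {'mean': r.mean(), 'std': r.std(ddof=1), 'mae': np.abs(r).mean()}
```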
The trained CNN model’s weights and trained Baer model’s parameters from data set A were used on data set B rather than retraining both models, because the networks were similar and located close together.
The trained CNN performed better than the trained Baer model on all data sets. Both models performed within levels of measured human variance for all but one of the evaluation metrics for data sets A and B. From an operational standpoint, both models probably perform ‘well enough’ to produce meaningful event locations. For example, when using picks from either model to locate events, 75 per cent of the events from data sets A and B locate within 16 m of the location resulting from manual
The base CNN did not perform adequately for any of the data sets but was greatly improved through retraining. In order to quantify the benefits gleaned by transfer learning, and to determine how much data are required to adequately retrain the CNN starting with the SCSN weights, we explored several training restrictions using a variable number of seismic traces for training on data set A (
The improvements in training drop off sharply for both the Baer and the CNN around 200 traces and the improvements start to level off around 5000 traces, although minor improvements probably continue past the largest test data set of 10 000 traces.
We see no benefits to only allowing the outer (non-convolutional) layers to update during training.
A CNN with no starting weights achieves the same mean absolute error as the Baer picker once the test data set reaches about 5000 traces.
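The training-restriction experiment behind these findings can be framed as a simple learning-curve loop. The harness below is a generic sketch (the function names and callable interface are our own); the actual picker training and error evaluation are injected as callables:

```python
import numpy as np

def learning_curve(train_fn, eval_fn, X, y, sizes, rng=None):
    """Train a picker on progressively larger subsets of the training data
    and report a held-out error (e.g. mean absolute pick-time error in
    samples) for each subset size."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(X))  # fix one shuffle so subsets are nested
    results = {}
    for n in sizes:
        idx = order[:n]
        model = train_fn(X[idx], y[idx])   # returns a fitted model
        results[n] = eval_fn(model)        # returns error on held-out data
    return results
```

Plotting `results` against `sizes` gives learning curves of the kind discussed above; nesting the subsets ensures each larger run includes all traces from the smaller ones.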
For an operator of a similar (local) network deployed in or around a coal mine, the first finding has practical significance: with only 5000 manually processed traces, a CNN model can be trained to pick P-arrival times with acceptable performance. Admittedly, there will still need to be human review of phase picks, particularly on events with high location residuals, whose traces tend to contain multiple events, but the analyst workload would be greatly reduced compared to fully manual processing workflows.
The poor performance of the base model on all data sets is not surprising considering the significant differences between the MIS data sets and the tectonic seismicity recorded by the SCSN, and it certainly does not represent a deficiency of the original work. However, the CNN’s failure to extrapolate to new types of seismicity and networks clearly demonstrates that the CNN has not internalized the
After retraining the base CNN, we attempted to quantify the performance degradation for picking on the original SCSN test data. If little or no degradation occurred, it would mean that creating a general picker, one that would perform well on a wide variety of network and event types, could be possible with this CNN architecture. When using the CNNs trained on data sets A and C, the pick-time residuals for the SCSN data had standard deviations around 200 per cent and 120 per cent higher than the base CNN, while the standard deviation using the CNN trained on data set D was only around 30 per cent higher. Unfortunately, these degradations indicate that the CNN is somewhat dependent on its network and training data and cannot be applied generally without some retraining. However, as evidenced by the excellent transferability from data set A to B, it may be possible to generate a small number of trained models to select from based on network and waveform characteristics.
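If a small library of trained models were maintained, selection could be as simple as matching basic network characteristics. The sketch below is purely illustrative of that idea; the metadata fields and the equal weighting of the two mismatch terms are invented for the example:

```python
def select_model(models, samp_rate, aperture_km):
    """Pick the pretrained model whose training network most closely
    matches the target network's sampling rate and aperture.
    `models` maps a model name to a metadata dict (invented schema)."""
    def mismatch(meta):
        # Relative mismatch in sampling rate plus relative mismatch in aperture.
        return (abs(meta['samp_rate'] - samp_rate) / samp_rate
                + abs(meta['aperture_km'] - aperture_km) / aperture_km)
    return min(models, key=lambda name: mismatch(models[name]))
```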
The original motivation for this research was to process the entirety of the data sets A and B deployments (36 012 events in total). We used less than 1 per cent of the data in order to train, test and evaluate the different models. We then processed the remaining events with the trained CNN from data set A in conjunction with a simple moving window scheme, which took around 5 d. Had a human analyst processed the same amount of data, it would have taken approximately two years, assuming a 40-hr work week.
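The simple moving-window scheme can be sketched as follows; the single-window picker interface (returning a window-relative pick index and a confidence) is our assumption:

```python
import numpy as np

def moving_window_picks(trace, picker, window_len=400, step=200, threshold=0.5):
    """Slide a fixed-length window along a continuous trace, apply a
    single-window picker to each window, keep picks whose confidence
    exceeds a threshold, and convert window-relative pick indices to
    absolute sample indices."""
    picks = []
    for start in range(0, len(trace) - window_len + 1, step):
        idx, conf = picker(trace[start:start + window_len])
        if conf >= threshold:
            picks.append(start + idx)
    return picks
```

Overlapping windows (here, 50 per cent) help ensure each arrival falls well inside at least one window; duplicate picks from adjacent windows can then be merged by proximity.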
Expanding this type of study to include additional picking algorithms, including CNN pickers that return characteristic functions (CFs), and perhaps combining various pickers in concert, would be an interesting line of future research. A high-quality, open-source package that facilitates these types of studies through a unified Application Programming Interface would be a boon to both network operators and seismology researchers. A larger, more statistically rigorous effort to quantify the variability between human analysts, accounting for network geometry and type, phase and experience levels, would provide important benchmarks for assessing the performance of future automated phase-picking, detection and classification models.
In the near future, we expect neural-network-based models will adequately perform the simpler tasks currently performed by seismic analysts. However, analysts will still be needed to provide oversight and quality assurance, and to process particularly unusual signals.
We have shown that a CNN trained on millions of regionally recorded earthquake traces to estimate
We also demonstrated that the retrained CNN is superior to the Baer picking algorithm optimized on the same training data. Properly tuning any phase picker to a specific data set, however, remains an important consideration. Both the optimized Baer picker and the trained CNN model exhibit less variance in pick-time residuals than the variance observed between human analysts. The application of improved phase picking models has the potential to reduce the time, cost and manual intervention required to extract actionable information from MIS. This will make it easier for ground-control experts at mines to understand the rock mass response to mining and more effectively detect and address certain types of stability issues.
This work would not be possible without the published models of
Disclaimer
The findings and conclusions in this paper are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention. Mention of company names or products does not constitute endorsement by NIOSH.
Data Availability
The decontextualized event waveforms and phase picks used in this study were compiled by
Plan-view plot of data sets A and B. Triangles represent the locations of nodes, and rectangles outline longwall panels. The shading denotes the number of events that occurred within a 400-m^{2} area.
Plan-view plot of data set C. Triangles indicate the location of surface sensors, while inverted triangles demarcate underground sensors. The rectangles outline the longwall panels, and shading denotes the number of events that occurred within a 1050-m^{2} area.
Plan-view plot of data set D. Black triangles represent the locations of sensors, and rectangles outline the longwall panels. The shading denotes the number of events that occurred within a 11 500-m^{2} area.
Plan-view plot of data set E. Black triangles mark the locations of UUSS sensors. The dark lines delineate the coal mining regions. The inset shows the region’s location within a map of the state of Utah, USA.
Zoomed-in sample traces from each data set showing picks from each model/analyst.
Summary of residuals between each analyst pair for the test sets of data sets A and B. Residual statistics of mean (μ), standard deviation (
Comparison of each model’s picks to the analyst’s picks for the data set A test set. Statistics are shown in samples. Residual statistics of mean (μ), standard deviation (
Comparison of each model’s picks to the analyst’s picks for the data set B test set. Statistics are shown in samples. Residual statistics of mean (μ), standard deviation (
Comparison of each model’s picks to the analyst’s picks for the data set C test set. Statistics are shown in samples. Residual statistics of mean (μ), standard deviation (
Comparison of each model’s picks to the analyst’s picks for the data set D test set. Statistics are shown in samples. Residual statistics of mean (μ), standard deviation (
Comparison of each model’s picks to the analyst’s picks for the data set E test set. Statistics are shown in samples. Residual statistics of mean (μ), standard deviation (
Mean absolute error of pick-time residuals for various models trained on differing numbers of traces for data set A. Baer is the trained Baer picker, CNN is the base CNN (which starts with the SCSN weights), CNN (empty) starts with random weights, and CNN (last) is the base CNN but only the last three layers of the network are allowed to update during training.
Dominant frequency, assuming a nominal sampling rate of 100 Hz, versus the absolute value of the base model residuals. A one-bin Kernel Density Estimate is shown for each data set.
The entirety of data set A: (top) shows the events located using picks made by the original, unoptimized Baer picker, and (bottom) shows the events located using picks made by the trained model.
The training and testing times of the different models. Data set B has no training values because we simply used the models trained on data set A.
Training  Testing  

Data set  Number of training traces  CNN time (min)  CNN epochs  Baer time (min)  Number of test traces  CNN time (s)  Baer time (s) 
A  12 527  17.52  14  39.52  514  0.87  0.14 
B  –  –  –  –  677  1.11  0.10 
C  17 937  14.43  5  65.87  5981  9.10  0.94 
D  23 990  25.10  8  92.72  7997  12.20  1.28 
E  12 693  10.36  5  47.67  4231  6.48  0.63 
Analyst comparison residuals. All statistics are in samples.
Analyst 1  Analyst 2  μ 




Standard analyst  Analyst A  −1.120  1.235  1.785  3.169 
Standard analyst  Analyst B  −1.055  1.332  1.937  3.119 
Standard analyst  Analyst C  −1.020  1.468  1.826  3.904 
Analyst A  Analyst B  0.143  1.266  1.330  2.594 
Analyst A  Analyst C  0.186  1.186  1.372  2.629 
Analyst B  Analyst C  −0.053  1.515  1.616  3.293 
Mean statistics  −0.487  1.334  1.645  3.118 