Chemical properties of coal largely determine coal handling, processing, beneficiation methods, and design of coal-fired power plants. Furthermore, these properties impact coal strength, coal blending during mining, as well as coal's gas content, which is important for mining safety. For these processes and the associated quantitative predictions to be successful, safe, and economically feasible, the chemical properties of coal must be determined and mapped accurately so that they can be inferred prior to mining.
Ultimate analysis quantifies principal chemical elements in coal. These elements are C, H, N, S, O, and, depending on the basis, ash, and/or moisture. The basis for the data is determined by the condition of the sample at the time of analysis, with an “as-received” basis being the closest to sampling conditions and thus to the in-situ conditions of the coal. The parts determined or calculated as the result of ultimate analyses are compositions, reported in weight percent, and pose the challenges of statistical analyses of compositional data. The treatment of parts using proper compositional methods may be even more important in mapping them, as most mapping methods carry uncertainty due to partial sampling as well.
In this work, we map the ultimate analyses parts of the Springfield coal from an Indiana section of the Illinois basin, USA, using sequential Gaussian simulation of isometric log-ratio transformed compositions. We compare the results with those of direct simulations of compositional parts. We also compare the implications of these approaches in calculating other properties using correlations to identify the differences and consequences. Although the study here is for coal, the methods described in the paper are applicable to any situation involving compositional data and its mapping.
Aside from being characterized as containing only organic and inorganic compounds, coal is a chemically, petrographically, and physically complex and heterogeneous natural material, which is not easy to fully characterize for all of its properties. Its organic part contains different macerals as the building blocks, whereas the inorganic part contains different clays, minerals, and various major and trace elements. These compositional properties of coal and how they may interact with air, for instance, can significantly influence how it should be mined, handled, processed, utilized, and even how coal-fired power plants should be designed to reduce emissions. Moreover, these properties affect coal strength, coal blending design, as well as coal's gas content, which is important for mining safety. In other words, coal's properties and composition impact all processes, from its safe mining to its utilization in different industries.
Two of the basic and most common analyses to describe properties of coal are proximate and ultimate analyses. Proximate analysis determines moisture, volatile matter, ash, and fixed carbon within the coal (
The elemental composition of coal determined by ultimate analysis can be important for various purposes. For instance, it gives an idea about maturity of coal. Carbon, oxygen, and hydrogen are, to some degree, rank-dependent elements. The highest rank coals have the highest carbon contents and the lowest oxygen contents. High-volatile B and C bituminous coals, on the other hand, have the highest hydrogen contents, with decreasing amounts as rank increases. Therefore, ratios of these elements can indicate the rank of the coal and its coalification degree.
The results of ultimate analysis can also be used in different correlations to predict various properties of coal.
The ultimate analysis is a
Compositional data analysis techniques have been applied to different problems related to coal and to different areas of the earth and environmental sciences (geochemistry, in particular).
As with any non-compositional data, the knowledge of the spatial distribution of compositional data is also often desirable as these properties of any commodity, e.g. coal, may have safety and health, as well as economic and environmental implications in exploiting and utilizing them. Geostatistics is a powerful technique for investigation of spatial relationships of regionalized variables and for modeling them (e.g.
In this paper, we map the ultimate analyses parts of the Springfield coal from an Indiana section of the Illinois basin, USA. If determined on an “as-received” basis, the weight percentages of the ultimate analysis parts (C, H, N, S, O, and ash yield) sum to 100%. That is, they are constrained data in a simplex rather than data in real space that can take values from −∞ to +∞, and they carry relative information, which is the information of interest. Therefore, they need special pre-processing for a mathematically adequate mapping using geostatistical methods. In this work, we use isometric log-ratio transformed compositions in sequential Gaussian simulation. We compare the results from this approach with those of direct simulations of compositional parts using sequential Gaussian simulation. We also compare the implications of these approaches in modeling heat value data for this coal to identify the differences and the potential error incurred. In this study, the approach is demonstrated for coal. However, the methods described in the paper are applicable to any situation involving compositional data and its mapping.
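
The constant-sum constraint and the relative nature of the information can be illustrated with a minimal Python sketch. The `closure` helper is a hypothetical name introduced here, and the input values are the rounded part means from the descriptive statistics table, used only for illustration and not as a measured sample.

```python
import numpy as np

def closure(x, total=100.0):
    """Rescale a vector of positive parts so that it sums to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)

# Order: C, H, N, S, O, ash (rounded means, illustration only)
parts = np.array([64.0, 5.6, 1.4, 3.3, 16.7, 8.8])
comp = closure(parts)

# Ratios between parts carry the information and are unaffected by closure
log_ratio_raw = np.log(parts[0] / parts[4])     # ln(C/O) before closure
log_ratio_closed = np.log(comp[0] / comp[4])    # ln(C/O) after closure

print("sum before closure:", parts.sum())
print("sum after closure:", comp.sum())
print("ln(C/O) unchanged:", np.isclose(log_ratio_raw, log_ratio_closed))
```

Because only ratios survive rescaling, log-ratio transformations are a natural pre-processing step before applying methods that assume unconstrained real-valued data.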
The studied area of the Springfield coal seam is located in the Indiana section of the Illinois Basin, USA. The Springfield coal seam is in the Petersburg Formation of the Carbondale Group and is one of the most important coal seams of economic value along with the Danville, Hymera, Herrin, and Seelyville coals (
The ultimate analysis data used in this work was compiled from the Indiana Geological Survey Coal Stratigraphic and Coal Quality Databases (
The data given in
In most situations, classical statistics is applicable to data for which Euclidean distance is meaningful. However, a composition is a vector that carries information about the relative importance of the measured parts in the whole. Since the parts in a composition are relative to each other in the whole, this gives a composition an intrinsic multivariate property. This multivariate property can be attained by means of a comparison between parts as pairs for their importance in the whole (
The compositional bi-plot that is based on the centered log-ratio (clr) transformation is one of the most popular ways to jointly represent the variables due to its connection to principal component analysis (PCA) (
Geostatistics applies to attributes defined in real number space with the assumption of Euclidean distance (
In this work, our objective was to map ultimate properties of Springfield coal shown in
The details of sequential Gaussian simulation (e.g.
In addition, central to its application, sequential Gaussian simulation requires that the data follow a univariate normal distribution. Since none of the parts met this requirement, as seen in
The workflow adopted for the compositional geostatistical analysis and mapping of the data is shown in
In this work, ilr transformation was used to produce a new set of variables in an unconstrained space with an orthogonal coordinate system, where standard geostatistical methods can be applied without violating the mathematics. The ilr transformation was selected because it avoids singular covariance matrices and sub-compositional incoherence (
The first step of ilr transformation is to generate a class of interpretable ilr variables, called balances. As mentioned earlier, a D-part composition whose values sum up to a constant is in a simplex of D-1 dimension due to the relativeness of measurements and thus the multivariate nature of the data. Therefore, for a 6-part system, 5 ilr balances were generated through binary partition of parts of the composition by using a binary partition matrix,
Performing this transformation for all 54 spatial data locations with 6 parts gave 5 ilr balances, each of which had 54 spatial data points that could be used for analysis and modeling.
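
As an illustration of this step, the following Python sketch builds ilr balances from a sequential binary partition. The partition matrix `SBP` below is a hypothetical example and not the one used in this study, the helper names are introduced here for illustration, and the 54 six-part samples are synthetic.

```python
import numpy as np

def balance_basis(sbp):
    """Orthonormal contrast matrix from a sequential binary partition.
    Each row of sbp holds +1 (numerator group), -1 (denominator group), 0 (unused)."""
    sbp = np.asarray(sbp, dtype=float)
    psi = np.zeros_like(sbp)
    for i, row in enumerate(sbp):
        r = np.sum(row > 0)   # number of parts in the numerator group
        s = np.sum(row < 0)   # number of parts in the denominator group
        psi[i, row > 0] = np.sqrt(s / (r * (r + s)))
        psi[i, row < 0] = -np.sqrt(r / (s * (r + s)))
    return psi

def ilr(x, psi):
    """ilr balances: centered log-ratios projected onto the balance basis."""
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ psi.T

# Hypothetical partition for parts ordered C, H, N, S, O, ash
SBP = [
    [+1, +1, +1, +1, +1, -1],   # organic elements vs. ash
    [+1, +1, -1, -1, -1,  0],   # C, H vs. N, S, O
    [+1, -1,  0,  0,  0,  0],   # C vs. H
    [ 0,  0, -1, -1, +1,  0],   # O vs. N, S
    [ 0,  0, +1, -1,  0,  0],   # N vs. S
]
psi = balance_basis(SBP)

# Synthetic stand-in for 54 sampled locations with 6 parts each (in %)
rng = np.random.default_rng(42)
samples = 100.0 * rng.dirichlet([64.0, 5.6, 1.4, 3.3, 16.7, 8.8], size=54)

balances = ilr(samples, psi)
print(balances.shape)   # 54 locations by 5 balances
```

Each column of `balances` can then be treated as an unconstrained real variable. A different partition yields different balances, but the distances between samples in ilr coordinates do not depend on that choice.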
The next step was to do spatial modeling of ilr balances. This required two considerations; the first one was to check the normality of ilr balances, as before, to be able to proceed with variogram analysis and then with sequential Gaussian simulation. The histograms of ilr balances were not normally distributed, and thus variogram analyses were performed after normal-score transformation (
The second consideration was to understand if there was spatial cross-correlation between ilr balances that would require sequential Gaussian co-simulation, instead of sequential Gaussian simulation. In order to explore cross-correlations, structural analysis was performed by generating cross-variograms between all normal-score transformed ilr balances (
The next and final step was to transform ilr balance realizations back to the simplex (percentages). The back transformation was performed using Eq. (4). In this equation,
Application of direct simulation and compositional geostatistical simulation through ilr transformation generated 100 realizations for each of the parts of the ultimate analysis, i.e. 1200 realizations in total for the direct and compositional approaches combined. The obvious question is whether these are different from each other and what the consequences are of using direct simulation compared to the more complex compositional approach.
Finally, in order to further explore the difference between direct geostatistical simulation using raw compositions and simulation with the isometric log-ratio transformation, compositional distances between the results were computed. Here, the interest is to exemplify the relative variation of the simulated parts between the two sample sets (raw versus compositional simulations) at each node location shown in the maps. For this purpose, two measures of distance were used. The first one was the Euclidean distance of the ilr balances (
In these equations,
The compositional distance maps are illustrated using realization 25 and E-type, and are given in
Aside from a purely geostatistical modeling of compositional data point of view and a comparison of the two modeling approaches, the next question is what the implications of the differences may be from an engineering point of view. For instance, what would the consequences be if the parts modeled by direct simulation rather than compositional were to be used in correlations to predict other properties? In order to explore this aspect of geostatistical modeling of compositional data, a correlation was established to estimate an as-received calorific value (CV) of Springfield coal by using its ultimate properties. The correlation was established by using 265 data points available in the Indiana Coal Quality Database (
In order to estimate calorific value distribution of Springfield coal, all parts realizations generated using direct and compositional simulation were used in the correlation. To continue with the same realization as an example,
To determine which of the two mapping techniques, direct or compositional simulation of parts, yields estimates closer to calorific value maps modeled directly from pointwise data, the measured calorific values at the initial 54 data points, collocated with the ultimate analysis parts, were modeled by sequential Gaussian simulation. In this process, variograms of the 54 pointwise calorific values were modeled, and 100 realizations were generated using exactly the same simulation parameters, including the initial seed number, as in the simulation of the ultimate analysis parts. For comparison, the realizations corresponding to the 11 realizations presented earlier were selected as the benchmark data.
In this equation, (
Although the methodology explained in this paper is demonstrated for coal using ultimate analysis data, the method is general and applicable to any setting and any compositional data. Analysis and modeling of the ultimate analysis data showed that the compositional modeling approach is coherent and does not violate the mathematical constraints imposed on the measurements or the basis on which the data are reported. Equally important, using the results of compositional modeling, rather than those of direct modeling, in correlations to predict non-compositional parameters produces more accurate results. This may have engineering and safety consequences. In this paper, one such application, chosen because of data availability, was demonstrated for predicting the in-situ calorific value of coal, which has economic and resource utilization benefits. Likewise, the methane content of coal seams can be predicted with the same approach, using proximate analysis or maceral compositions in correlations. Similarly, self-heating temperature can be predicted using correlations involving the composition of coal. Accurate prediction of such parameters has the potential to improve the safety and productivity of mines and to better guide engineers in the design of mines and of methane and self-heating control systems. Therefore, going one extra step through compositional modeling, when the data require it, has multiple benefits for the economics and safety of operations.
In this paper, a compositional modeling approach through ilr transformation was demonstrated to map ultimate properties of Springfield coal in the Indiana section of the Illinois basin. The results were compared with the results of direct spatial modeling of raw measurements. Further, the results were used in a correlation to predict calorific value distribution within the coal seam for comparison purposes.
This study demonstrated that combining the ilr transformation with sequential Gaussian simulation exploits the benefits of both techniques. Sequential Gaussian simulation offers the advantages that the errors are conditionally unbiased, the histogram of each realization reproduces the histogram of the data, and multiple realizations generated stochastically can be used for uncertainty assessment. On the other hand, compositional modeling ensures that the mathematics and the limits of the data are not violated. In our study, compositional simulation gave exactly 100% as the sum of the values of the parts, regardless of the type of map and of how many realizations were included in the assessment. Direct modeling of parts using raw data, on the other hand, created an error of as much as approximately ±21% in the sum of the part values in individual realizations due to direct variogram modeling of compositional data.
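
The reason the compositional route always returns sums of exactly 100% can be sketched in Python. This is an illustration under stated assumptions and not the simulation performed in this study: random draws stand in for simulated values at grid nodes, the `pivot_basis` and `ilr_inverse` helpers are hypothetical names, the basis is one of many valid orthonormal ilr bases, and the means and standard deviations are the rounded values from the descriptive statistics table.

```python
import numpy as np

def pivot_basis(D):
    """An orthonormal (D-1) x D contrast matrix (one valid ilr basis)."""
    psi = np.zeros((D - 1, D))
    for i in range(D - 1):
        k = D - i - 1                     # number of parts after part i
        psi[i, i] = np.sqrt(k / (k + 1))
        psi[i, i + 1:] = -1.0 / np.sqrt(k * (k + 1))
    return psi

def ilr_inverse(z, psi, total=100.0):
    """Map ilr coordinates back to the simplex, closed to `total`."""
    x = np.exp(np.asarray(z, dtype=float) @ psi)
    return total * x / x.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_nodes = 1000
psi = pivot_basis(6)

# Stand-in for simulated ilr balances at grid nodes (not actual SGS output)
z_sim = rng.normal(scale=0.3, size=(n_nodes, 5))
comp_sums = ilr_inverse(z_sim, psi).sum(axis=1)

# Stand-in for direct simulation: each part drawn independently
means = np.array([64.0, 5.6, 1.4, 3.3, 16.7, 8.8])   # C, H, N, S, O, ash
stds = np.array([2.6, 0.3, 0.2, 1.2, 2.6, 2.0])
direct_sums = rng.normal(means, stds, size=(n_nodes, 6)).sum(axis=1)

print("compositional: max |sum - 100| =", np.abs(comp_sums - 100).max())
print("direct:        max |sum - 100| =", np.abs(direct_sums - 100).max())
```

The closure inside the back-transformation enforces the constant sum by construction, whereas nothing ties independently simulated parts together, so their sums drift away from 100%.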
In order to exemplify the relative variation between the parts simulated using raw compositions and those simulated with the isometric log-ratio transformation, compositional distances between the results were computed. The results showed that the distances were larger in areas without hard data, where values were geostatistically simulated. This indicates that applying compositional data treatments prior to geostatistical simulation helps eliminate dispersion of data and spurious spatial fluctuations of compositions.
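
A minimal Python sketch of the first distance measure follows. Because the ilr transformation is an isometry, the Euclidean distance between ilr balances equals the Euclidean distance between centered log-ratio (clr) vectors, which is used here to keep the example short. The node values are hypothetical and chosen only for illustration.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of one or more compositions."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance in clr space (equal to the distance in ilr space)."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Hypothetical values (C, H, N, S, O, ash, in %) at two grid nodes
direct_nodes = np.array([[66.0, 5.9, 1.5, 2.1, 18.0, 9.5],
                         [63.0, 5.5, 1.4, 3.4, 17.0, 8.7]])
comp_nodes = np.array([[64.2, 5.6, 1.4, 3.2, 16.8, 8.8],
                       [63.5, 5.6, 1.4, 3.3, 17.2, 9.0]])

d = aitchison_distance(direct_nodes, comp_nodes)
print(d)   # one distance per node
```

The measure depends only on ratios between parts, so it can be evaluated even when directly simulated parts do not sum to 100%, and a large relative change in a minor part such as sulfur contributes as much as a comparable relative change in carbon.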
Part values modeled using both approaches were used in a correlation to predict the calorific value of coal. The results were compared with those from modeling of collocated pointwise calorific value data. Calorific value maps predicted using parts from compositional modeling had a data spread closer to the benchmark data range and showed less prediction error than those predicted using parts from direct geostatistical modeling.
Using, or not using, compositional modeling when the data call for it may have important safety and engineering consequences. For instance, besides other non-compositional parameters, methane content of coal seams can be predicted using proximate analysis or maceral compositions in correlations. Likewise, self-heating temperature can be predicted using correlations involving composition of coal. Better prediction of such parameters and their distributions within the seam has the potential to improve safety and productivity of mines and to guide engineers towards more efficient design of methane and self-heating control systems.
For the National Institute for Occupational Safety and Health (NIOSH), the findings and conclusions in any paper are those of the authors and do not necessarily represent the views of NIOSH. Mention of any company name, product, or software does not constitute endorsement by NIOSH. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Profs. Vera Pawlowsky-Glahn, Juan Jose Egozcue, and Karel Hron are gratefully acknowledged for reviewing an earlier version of this paper and for making useful comments.
General stratigraphy of Indiana's Pennsylvanian system of formations and coal members (after
Posting of the ultimate properties of Springfield coal within the studied area. Values are in %.
Frequency histograms of the raw ultimate analysis of Springfield coal within the studied area.
Compositional clr bi-plot of the ultimate analysis parts from 54 samples presented in
Methodology applied in this work and the workflow for each approach.
Progression of variogram building (for carbon %) for direct geostatistical modeling.
Progression of variogram building for normal scores of ilr balances to be used in sequential Gaussian simulation in compositional modeling.
Structural functions, auto- and cross-variograms, between normal scores of ilr balances.
Maps of ilr balances (realization 25) generated using sequential Gaussian simulation.
Maps of ultimate analysis parts (realization 25) using direct simulation (top row) and compositional simulation (bottom row). Values are in %.
E-type maps of C, H, and O generated based on 100 realizations.
Sum of cell values of parts of E-type maps (A) as well as those of 11 randomly selected realizations (B), as a comparison between direct and compositional simulation.
Compositional distance maps computed using
Calorific value distribution of realization 25 using parts maps generated through direct and compositional simulation.
Calorific value distribution of cells of 11 realizations calculated using the results of direct and compositional simulation of ultimate analysis parts.
Distribution of data from cells of 11 realizations modeled using 54 collocated calorific value points with the ultimate analysis data (A), and the errors incurred by using the results of compositional and direct modeling of parts in correlation (B).
Basic standard descriptive statistics of the parts of ultimate analysis based on raw data just for exploration purposes.
| Part | Mean | Std. dev. | Upper | Median | Lower | Min | Max |
|---|---|---|---|---|---|---|---|
| Ash (%) | 8.8 | 2.0 | 10.2 | 8.5 | 7.6 | 4.7 | 13.2 |
| C (%) | 64.0 | 2.6 | 65.2 | 63.6 | 62.7 | 57.7 | 71.3 |
| H (%) | 5.6 | 0.3 | 5.8 | 5.6 | 5.5 | 4.8 | 6.0 |
| N (%) | 1.4 | 0.2 | 1.4 | 1.4 | 1.3 | 1.0 | 1.8 |
| O (%) | 16.7 | 2.6 | 18.2 | 16.8 | 15.1 | 9.9 | 21.6 |
| S (%) | 3.3 | 1.2 | 4.1 | 3.3 | 2.7 | 0.8 | 6.0 |
Analytical variograms and their parameters of normal scores of each of the raw parts (in %).
| Component | Model | Nugget | Sill-nugget | Range (ft) |
|---|---|---|---|---|
| C | Spherical | 0.01 | 0.84 | 5618 |
| H | Spherical | 0.01 | 0.96 | 3040 |
| N | Spherical | 0.01 | 0.81 | 5408 |
| O | Spherical | 0.00 | 0.85 | 2500 |
| S | Spherical | 0.01 | 0.64 | 2700 |
| Ash | Spherical | 0.01 | 0.74 | 2665 |
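
The tabulated parameters can be turned into a model curve with a short Python sketch. It assumes the standard spherical variogram form and uses the carbon row of the table above; the function name and the evaluation lags are illustrative choices, not taken from the study.

```python
import numpy as np

def spherical_variogram(h, nugget, partial_sill, range_ft):
    """Spherical variogram: rises from the nugget and reaches
    nugget + partial_sill at the range; zero at zero lag."""
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / range_ft, 1.0)
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio**3)
    return np.where(h > 0, gamma, 0.0)

# Carbon (normal scores): nugget 0.01, sill minus nugget 0.84, range 5618 ft
lags = np.array([0.0, 1000.0, 2809.0, 5618.0, 10000.0])
gamma_c = spherical_variogram(lags, nugget=0.01, partial_sill=0.84, range_ft=5618.0)

for h, g in zip(lags, gamma_c):
    print(f"h = {h:7.0f} ft  gamma = {g:.3f}")
```

Beyond the range the curve stays flat at the total sill, which is the nugget plus the "Sill-nugget" column.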
Analytical variograms and their parameters of normal scores of ilr balances.
| ilr | Model | Nugget | Sill-nugget | Range (ft) |
|---|---|---|---|---|
| 1 | Spherical | 0.01 | 0.98 | 6300 |
| 2 | Spherical | 0.25 | 0.75 | 3050 |
| 3 | Spherical | 0.01 | 0.96 | 3100 |
| 4 | Spherical | 0.02 | 0.98 | 5000 |
| 5 | Spherical | 0.15 | 0.85 | 6700 |
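
The normal scores referred to in the variogram tables can be obtained with a rank-based transformation. The Python sketch below is one common way to do this, not necessarily the implementation used in the study; the plotting position (rank + 0.5)/n, the helper names, and the synthetic skewed data are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(values):
    """Rank-based normal-score transform of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    n = values.size
    ranks = np.argsort(np.argsort(values))   # 0 .. n-1, ties broken by order
    probs = (ranks + 0.5) / n                # strictly inside (0, 1)
    std_normal = NormalDist()
    return np.array([std_normal.inv_cdf(p) for p in probs])

def back_transform(scores, ref_values, ref_scores):
    """Map normal scores back to data units by interpolating the
    score-to-value table built from the reference sample."""
    order = np.argsort(ref_scores)
    return np.interp(scores, ref_scores[order], ref_values[order])

# Synthetic, skewed stand-in for one variable at 54 locations
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=54)

scores = normal_scores(data)
restored = back_transform(scores, data, scores)

print("mean of scores:", scores.mean())
print("round trip ok:", np.allclose(restored, data))
```

Simulated normal scores are mapped back to data units with the same table, which is why each realization reproduces the histogram of the input sample.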
Basic statistics of benchmark calorific value data and of the errors (
| Variable | Cells | Min. | Max. | Mean | Std. dev. |
|---|---|---|---|---|---|
| Calorific value (Btu/lb) – benchmark data | 231,000 | 10,589 | 12,617 | 11,526 | 365.2 |
| Rel. error (%) – compositional sim. | 231,000 | −18.1 | 10.5 | −0.703 | 3.477 |
| Rel. error (%) – direct sim. | 231,000 | −24.8 | 20.6 | −0.505 | 5.269 |