An Improved Test of Equality of Mean Directions for the Langevin-von Mises-Fisher Distribution
Pavlina Rumcheva and Brett Presnell
U.S. Centers for Disease Control and Prevention and University of Florida
Australian & New Zealand Journal of Statistics, doi: 10.1111/anzs.12183
National Center for Health Statistics, U.S. Centers for Disease Control and Prevention, 3311 Toledo Road, Hyattsville, MD 20782, USA prumcheva@cdc.gov
Department of Statistics, University of Florida, 102 Griffin-Floyd Hall, Gainesville, FL 32611, USA
Volume 59, issue 1, pages 119-135.
A multi-sample test for equality of mean directions is developed for populations having Langevin-von Mises-Fisher distributions with a common unknown concentration. The proposed test statistic is a monotone transformation of the likelihood ratio. The high-concentration asymptotic null distribution of the test statistic is derived. In contrast to previously suggested high-concentration tests, the high-concentration asymptotic approximation to the null distribution of the proposed test statistic is also valid for large sample sizes with any fixed nonzero concentration parameter. Simulations of size and power show that the proposed test outperforms competing tests. An example with three-dimensional data from an anthropological study illustrates the practical application of the testing procedure.
1. Introduction
Directional data are observed in many scientific fields and are especially common in the earth sciences. Directions are measured in various dimensions and can be represented as unit vectors or combinations of angles. The analysis of such data requires statistical methods that properly account for the structure of the sample space. In this article we represent directions as unit vectors, and thus the sample space will be the unit circle in two dimensions and the unit sphere in three dimensions. In general, the sample space of a d-dimensional direction is the unit hypersphere in ℝ^{d} denoted by 𝕊^{d−1} = {x ∈ ℝ^{d} : ||x|| = 1}, where ||·|| is the usual Euclidean norm for vectors. We derive all results in this paper for the general d-dimensional case.
Let U be a random direction in ℝ^{d}; that is, U ∈ 𝕊^{d−1}. Two important characteristics of the distribution of U are its mean direction μ = E(U)/||E(U)|| and its mean resultant length ρ = ||E(U)||, which measure the location and concentration of the distribution, respectively (here E(U) represents the usual componentwise expectation for vectors). The mean resultant length takes values in the interval [0, 1], with larger values of ρ indicating higher concentration: if ρ = 1, then U is a constant; and if ρ = 0, then the mean direction is not defined, as when U is uniformly distributed on the sphere.
Models for directional data often employ the Langevin-von Mises-Fisher distribution, L_{d}(μ, κ), whose density with respect to the uniform distribution on the unit hypersphere is
$$f(u; \mu, \kappa) = \frac{(\kappa/2)^{d/2-1}}{\Gamma(d/2)\, I_{d/2-1}(\kappa)} \exp(\kappa \mu^{\top} u), \qquad u \in \mathbb{S}^{d-1}, \tag{1}$$
where μ ∈ 𝕊^{d−1}, κ > 0, Γ denotes the gamma function, and I_{ν} is the modified Bessel function of the first kind and order ν (see Abramowitz & Stegun 1970, page 374). This distribution is unimodal and rotationally symmetric about its mean direction μ, with mean resultant length (see Watson 1983, page 201)
$$\rho = A_d(\kappa) = \frac{I_{d/2}(\kappa)}{I_{d/2-1}(\kappa)}. \tag{2}$$
The mean resultant length ρ is a strictly increasing function of the concentration parameter κ, with ρ ↓ 0 as κ ↓ 0 and ρ ↑ 1 as κ ↑ ∞. Thus, larger values of κ correspond to more concentrated distributions, with κ = 0 corresponding to the uniform distribution on the sphere and κ = ∞ corresponding to the point mass at μ.
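As a numerical illustration of the relationship between κ and ρ, the following Python sketch evaluates A_{d}(κ) from the power series of the modified Bessel function, using only the standard library. The function names `log_bessel_i` and `mean_resultant_length` are illustrative choices, not part of the original paper, and the truncated series is only intended for moderate values of κ.

```python
import math

def log_bessel_i(nu, x, terms=500):
    """log I_nu(x) from the series sum_m (x/2)^(2m+nu) / (m! * Gamma(m+nu+1)), x > 0."""
    logs = [(2 * m + nu) * math.log(x / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + nu + 1)
            for m in range(terms)]
    top = max(logs)
    # log-sum-exp keeps the summation stable when individual terms are large
    return top + math.log(sum(math.exp(v - top) for v in logs))

def mean_resultant_length(d, kappa):
    """A_d(kappa) = I_{d/2}(kappa) / I_{d/2-1}(kappa), for kappa > 0."""
    return math.exp(log_bessel_i(d / 2.0, kappa) - log_bessel_i(d / 2.0 - 1.0, kappa))

# rho increases with kappa in both the circular (d = 2) and spherical (d = 3) cases
for kappa in (0.5, 1.0, 5.0, 20.0):
    print(kappa, mean_resultant_length(2, kappa), mean_resultant_length(3, kappa))
```

For d = 3 the ratio reduces to the closed form coth(κ) − 1/κ, which gives a convenient check on the series.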
Given several independent random samples of directions of equal concentration, one often wishes to test for equality of the corresponding population mean directions, the directional analogue of the classical one-way analysis of variance problem. Let U_{i1}, . . . , U_{in_i}, i = 1, . . . , k, be k independent random samples of sizes n_{i} from L_{d}(μ_{i}, κ), i = 1, . . . , k, where μ_{i} is the mean direction of the ith sample and κ is the common unknown concentration. We wish to test the null hypothesis that the k mean directions are equal against the alternative that at least two mean directions differ; that is, we wish to test
$$H_0: \mu_1 = \cdots = \mu_k \quad \text{versus} \quad H_a: \mu_i \neq \mu_j \ \text{for at least one pair } (i, j). \tag{3}$$
Before discussing how this can be accomplished with existing tests, we introduce some additional definitions and notation. For the ith sample, the resultant vector is the sum
$\sum_{j=1}^{n_i} U_{ij}$ and the sample mean vector is the average $\bar{U}_i = n_i^{-1} \sum_{j=1}^{n_i} U_{ij}$. The resultant length is denoted by $R_i = \big\| \sum_{j=1}^{n_i} U_{ij} \big\|$ and the sample mean resultant length by $\bar{R}_i = n_i^{-1} R_i$. The sample mean direction is then defined as the unit vector $\bar{R}_i^{-1} \bar{U}_i$. For the combined sample with a total sample size n = n_{1} + ··· + n_{k}, we will denote the sample mean vector, resultant length, and sample mean resultant length by Ū, R, and R̄, respectively.
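The sample quantities just defined are simple to compute. The sketch below is a minimal illustration in Python (standard library only); the function name `sample_summaries` and the three example vectors are hypothetical and are not taken from the paper.

```python
import math

def sample_summaries(directions):
    """Return (resultant length R, mean resultant length R-bar, sample mean direction)."""
    n = len(directions)
    d = len(directions[0])
    # resultant vector: componentwise sum of the unit vectors
    resultant = [sum(u[c] for u in directions) for c in range(d)]
    R = math.sqrt(sum(x * x for x in resultant))
    # the mean direction is undefined when R == 0
    mean_direction = [x / R for x in resultant]
    return R, R / n, mean_direction

# Hypothetical sample of three unit vectors on the circle (d = 2)
sample = [(1.0, 0.0), (0.0, 1.0), (math.cos(math.pi / 4), math.sin(math.pi / 4))]
R, R_bar, mu_hat = sample_summaries(sample)
print(R, R_bar, mu_hat)
```

Dividing the resultant vector by R is the same as dividing the sample mean vector by the sample mean resultant length, since both differ from the former by the factor n.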
Watson & Williams (1956) proposed a high-concentration F-test of (3), applicable to any number of samples in any dimension, generalizing a test previously introduced by Watson (1956) for the spherical (3-dimensional) case. The Watson & Williams test is based on a decomposition of the total variation represented by 2κ(n − R) into within-sample and between-sample components, as expressed by the formula
$$2\kappa(n - R) = 2\kappa\Big(n - \sum_{i=1}^{k} R_i\Big) + 2\kappa\Big(\sum_{i=1}^{k} R_i - R\Big).$$
Following Watson (1956), Watson & Williams noted that, under H_{0},
$$2\kappa \begin{pmatrix} n - \sum_{i=1}^{k} R_i \\ \sum_{i=1}^{k} R_i - R \end{pmatrix} \xrightarrow{d} \begin{pmatrix} \chi^2_{(n-k)(d-1)} \\ \chi^2_{(k-1)(d-1)} \end{pmatrix} \quad \text{as } \kappa \to \infty, \tag{4}$$
where $\chi^2_{(n-k)(d-1)}$ and $\chi^2_{(k-1)(d-1)}$ are independent chi-squared distributed random variables with (n − k)(d − 1) and (k − 1)(d − 1) degrees of freedom, respectively, and $\xrightarrow{d}$ denotes convergence in distribution. The test statistic suggested by Watson & Williams,
$$W = \frac{(n-k)\big(\sum_{i=1}^{k} R_i - R\big)}{(k-1)\big(n - \sum_{i=1}^{k} R_i\big)}, \tag{5}$$
is the ratio of the between-sample and within-sample variabilities, each divided by the appropriate degrees of freedom. The null hypothesis of equal mean directions is rejected for large values of W, whose high-concentration asymptotic distribution under the null hypothesis is F with (k − 1)(d − 1) numerator and (n − k)(d − 1) denominator degrees of freedom.
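The Watson & Williams statistic depends on the data only through the resultant lengths. A minimal Python sketch follows; the function names and the numerical inputs are hypothetical and serve only to illustrate the formula. The p-value would then be obtained from the upper tail of the F distribution with the stated degrees of freedom, using any statistical library.

```python
def watson_williams_w(group_resultants, combined_resultant, n):
    """W = (n - k)(sum R_i - R) / ((k - 1)(n - sum R_i))."""
    k = len(group_resultants)
    total = sum(group_resultants)
    return (n - k) * (total - combined_resultant) / ((k - 1) * (n - total))

def f_degrees_of_freedom(n, k, d):
    """Numerator and denominator degrees of freedom of the F reference distribution."""
    return (k - 1) * (d - 1), (n - k) * (d - 1)

# Hypothetical summary: k = 2 samples of 10 directions each in d = 3,
# with R_1 = R_2 = 9.5 and combined resultant length R = 18.8
w = watson_williams_w([9.5, 9.5], 18.8, 20)
print(w, f_degrees_of_freedom(20, 2, 3))  # w is about 3.6, with (2, 36) degrees of freedom
```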
In the circular (2-dimensional) case, Stephens (1972) incorporated a multiplicative factor into the Watson & Williams statistic (5) in order to improve the F-approximation. Mardia & Jupp (2000, page 191) suggested a similar correction for the spherical (3-dimensional) case, and Stephens (1992) provided further details for the general d-dimensional case.
Watson & Williams (1956) also developed exact tests of (3) based on the conditional distribution of R_{1} + ··· + R_{k} given R, which does not depend on the common unknown κ. Intuitively, if the value of R_{1} + ··· + R_{k} is large relative to R, then a departure from the null hypothesis of equal mean directions is indicated (see Figure 7.1 of Stephens 1992). In the circular case with two samples, Watson & Williams derived the joint density of R_{1} and R_{2} given R, and Stephens (1972) provided tables with critical values for the conditional test based on the test statistic R_{1} + R_{2} given R. For the two-sample problem in three dimensions, Stephens (1969) derived the conditional density of R_{1} and R_{2} given R using the joint distribution of R_{1}, R_{2}, and R derived earlier by Fisher (1953), and provided tables with critical values for the significance test under certain restrictions. As far as we are aware, critical values have not been calculated for higher dimensions or for comparing more than two groups (also see Mardia & Jupp 2000, page 222).
Harrison, Kanji & Gadsden (1986) suggested a multi-sample test of (3) for circular data, which was later generalized for any dimension in Mardia & Jupp (2000, page 225). This test is based on the ANOVA decomposition
$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} \| U_{ij} - \bar{U} \|^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \| U_{ij} - \bar{U}_i \|^2 + \sum_{i=1}^{k} n_i \| \bar{U}_i - \bar{U} \|^2,$$
which can be rewritten as
$$n(1 - \bar{R}^2) = \Big(n - \sum_{i=1}^{k} n_i \bar{R}_i^2\Big) + \Big(\sum_{i=1}^{k} n_i \bar{R}_i^2 - n \bar{R}^2\Big),$$
where the terms represent, respectively, the total variation, the variation within samples, and the variation between samples. This suggests the test statistic
$$A = \frac{(n-k)\big(\sum_{i=1}^{k} n_i \bar{R}_i^2 - n \bar{R}^2\big)}{(k-1)\big(n - \sum_{i=1}^{k} n_i \bar{R}_i^2\big)}, \tag{6}$$
which compares between-sample and within-sample variability. The null hypothesis of equal mean directions is rejected for large values of A, whose high-concentration null distribution is again F with (k − 1)(d − 1) and (n − k)(d − 1) degrees of freedom.
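The statistic A can be computed from the group sizes and mean resultant lengths alone. The following Python sketch is an illustration under hypothetical inputs; the function name `harrison_a` is an assumption made here for convenience.

```python
def harrison_a(sizes, mean_resultants, combined_mean_resultant):
    """A = (n - k)(sum n_i Rbar_i^2 - n Rbar^2) / ((k - 1)(n - sum n_i Rbar_i^2))."""
    k = len(sizes)
    n = sum(sizes)
    weighted = sum(n_i * r_i ** 2 for n_i, r_i in zip(sizes, mean_resultants))
    between = weighted - n * combined_mean_resultant ** 2
    within = n - weighted
    return (n - k) * between / ((k - 1) * within)

# Hypothetical summary: two samples of size 10 with Rbar_1 = Rbar_2 = 0.95
# and combined mean resultant length Rbar = 0.94
a = harrison_a([10, 10], [0.95, 0.95], 0.94)
print(a)  # about 3.489
```

These inputs correspond to R_1 = R_2 = 9.5 and R = 18.8, for which the Watson & Williams formula gives W = 3.6; the two statistics agree only to first order in the concentration.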
Of course the hypotheses in (3) can also be tested using the likelihood-ratio test statistic (see Mardia & Jupp 2000, page 224)
$$G = 2\Big(\hat{\kappa}_a \sum_{i=1}^{k} R_i - \hat{\kappa}_0 R + n\, a_d(\hat{\kappa}_a) - n\, a_d(\hat{\kappa}_0)\Big), \tag{7}$$
where κ̂_{0} and κ̂_{a} are the maximum likelihood estimators (MLEs) of κ under the null and alternative hypotheses, respectively, and a_{d}(κ) = log(κ^{d/2−1}/I_{d/2−1}(κ)). The null hypothesis is rejected for large values of G, whose large-sample asymptotic null distribution is chi-squared with (k − 1)(d − 1) degrees of freedom. For small values of κ, Mardia & Jupp (2000, page 225) suggested incorporating a multiplicative factor into G to improve the chi-squared approximation. In the circular case, Upton (1976) retained the chi-squared reference distribution, but used an approximation of the likelihood-ratio test statistic obtained by applying the approximations and substitutions described in Upton (1973). Note, however, that Upton’s (1976) primary motivation was to develop a test that could be calculated easily, and specifically one that did not require evaluation of Bessel functions; these considerations are not particularly relevant in today’s computing environment.
In this paper we present an improved version of the multi-sample likelihood-ratio test for equality of mean directions in the general d-dimensional case, assuming a common, unknown concentration. Our procedure is based on a simple, monotone transformation of the likelihood-ratio statistic which can be applied with any number of samples in any dimension. We show that, under the null hypothesis, the high-concentration asymptotic distribution of our modified likelihood-ratio statistic is F with (k − 1)(d − 1) and (n − k)(d − 1) degrees of freedom. We demonstrate by simulation that the null distribution of the test statistic is well approximated by this F distribution even with small sample sizes and moderate concentrations. Our simulations of size and power show that our test is superior to competing tests. We also illustrate the practical application of our test by analyzing three-dimensional directions of primate vertebral facets.
2. The High-Concentration Likelihood-Ratio Test
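Let U_{c} = (U_{11}, . . . , U_{1n_1}, . . . , U_{k1}, . . . , U_{kn_k}) denote the d × n matrix of the combined k independent random samples of sizes n_{i} from L_{d}(μ_{i}, κ), i = 1, . . . , k. The likelihood function, using (1), is
$$L(\mu_1, \ldots, \mu_k, \kappa; U_c) = \left( \frac{(\kappa/2)^{d/2-1}}{\Gamma(d/2)\, I_{d/2-1}(\kappa)} \right)^{n} \exp\Big( \kappa \sum_{i=1}^{k} \mu_i^{\top} \sum_{j=1}^{n_i} U_{ij} \Big).$$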
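Under the null hypothesis in (3), the MLEs of the common mean direction and the common concentration κ are μ̂_{0} = Ū/R̄ and $\hat{\kappa}_0 = A_d^{-1}(\bar{R})$, respectively (see Mardia & Jupp 2000, page 224), where $A_d^{-1}(\cdot)$ denotes the inverse of the function A_{d}(·) defined in (2). Under the alternative hypothesis, the MLEs of μ_{i} and κ are μ̂_{ai} = Ū_{i}/R̄_{i} and $\hat{\kappa}_a = A_d^{-1}(\tilde{R})$, where $\tilde{R} = n^{-1} \sum_{i=1}^{k} R_i$. Therefore, the likelihood ratio is
$$\Lambda = \left( \frac{\hat{\kappa}_0^{\,d/2-1}\, I_{d/2-1}(\hat{\kappa}_a)}{\hat{\kappa}_a^{\,d/2-1}\, I_{d/2-1}(\hat{\kappa}_0)} \right)^{n} \exp\big( n (\hat{\kappa}_0 \bar{R} - \hat{\kappa}_a \tilde{R}) \big). \tag{8}$$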
For large κ, the L_{d}(μ, κ) distribution can be approximated by a spherical normal distribution in the hyperplane tangent to 𝕊^{d−1} at μ (see Mardia & Jupp 2000, page 172). For testing (3), this suggests the following power transformation of (8), which is analogous to the transformation relating the likelihood ratio statistic and the usual “ratio of sums of squares” F-test statistic in a classical one-way analysis of variance (see Scheffé 1961, page 36):
$$P = \frac{n-k}{k-1} \Big( \Lambda^{-2/(n(d-1))} - 1 \Big). \tag{9}$$
We will show that the high-concentration asymptotic null distribution of P is F with (k − 1)(d − 1) and (n − k)(d − 1) degrees of freedom. Of course, for fixed positive concentration, the null distribution of the likelihood-ratio test statistic G = −2 log Λ in (7) is approximately $\chi^2_{(k-1)(d-1)}$ when the group sizes n_{i} are large. Using the Maclaurin series expansion of the exponential function, we see that under the null hypothesis, for large n,
$$P = \frac{n-k}{k-1} \left( \exp\Big( \frac{G}{n(d-1)} \Big) - 1 \right) = \frac{G}{(k-1)(d-1)} + O_p(n^{-1}),$$
and therefore P is approximately distributed as $\chi^2_{(k-1)(d-1)}$ scaled by ((k − 1)(d − 1))^{−1}. This scaled chi-squared distribution is also the large-n limit of the F reference distribution suggested by the high-concentration theory, and therefore the F approximation is applicable in both high-concentration and large-sample settings. Note that this is not true for the tests based on W in (5) and A in (6).
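To make the construction of G and P concrete, the following Python sketch computes both statistics in the spherical case d = 3 only, where A_{3}(κ) = coth(κ) − 1/κ and I_{1/2}(κ) = (2/(πκ))^{1/2} sinh(κ) are available in closed form. This is an illustration under stated assumptions, not the authors' code (the paper's computations were done in R): the function names, the bisection bounds, and the input values are all hypothetical.

```python
import math

def a3(kappa):
    """A_3(kappa) = coth(kappa) - 1/kappa, the d = 3 case of the mean resultant length."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def a3_inverse(r, lo=1e-6, hi=1e6):
    """Solve A_3(kappa) = r by bisection; A_3 is strictly increasing in kappa."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if a3(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_norm3(kappa):
    """a_3(kappa) = log(kappa^(1/2) / I_(1/2)(kappa)) = log(kappa) + log(pi/2)/2 - log(sinh(kappa))."""
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) + 0.5 * math.log(math.pi / 2.0) - log_sinh

def lrt_statistics(group_resultants, combined_resultant, n):
    """Return (G, P) for d = 3 from the resultant lengths R_1..R_k and R."""
    k, d = len(group_resultants), 3
    total = sum(group_resultants)
    kappa_0 = a3_inverse(combined_resultant / n)   # MLE under H0
    kappa_a = a3_inverse(total / n)                # MLE under Ha
    G = 2.0 * (kappa_a * total - kappa_0 * combined_resultant
               + n * log_norm3(kappa_a) - n * log_norm3(kappa_0))
    P = (n - k) / (k - 1) * math.expm1(G / (n * (d - 1)))
    return G, P

# Hypothetical summary: two samples of size 10, R_1 = R_2 = 9.5, R = 18.8
G, P = lrt_statistics([9.5, 9.5], 18.8, 20)
print(G, P)
```

For these highly concentrated hypothetical inputs, P is very close to the Watson & Williams value of 3.6 obtained from the same resultant lengths, as the high-concentration theory suggests.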
In the remainder of this section, we will be concerned with establishing the high-concentration null distribution of P. We adopt the standard notation O and O_{p} for quantities that are bounded and bounded in probability, respectively. For brevity, we will omit κ → ∞ in expressions involving $\xrightarrow{d}$, O or O_{p}, taking this as implied. We will also take it as implied that statements involving O_{p} hold for the null hypothesis.
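For large κ, the modified Bessel function I_{ν}(κ) can be expanded as (see Abramowitz & Stegun 1970, 9.7.1)
$$I_\nu(\kappa) = \frac{e^{\kappa}}{\sqrt{2\pi\kappa}} \left( 1 - \frac{4\nu^2 - 1}{8\kappa} + \frac{(4\nu^2 - 1)(4\nu^2 - 9)}{2(8\kappa)^2} + O(\kappa^{-3}) \right). \tag{10}$$
Using this, Schou (1978) determined that the mean resultant length has expansion
$$A_d(\kappa) = 1 - \frac{d-1}{2\kappa} + \frac{(d-1)(d-3)}{8\kappa^2} + O(\kappa^{-3}). \tag{11}$$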
Letting y = y(κ) = 1 − A_{d}(κ), or equivalently, $\kappa = A_d^{-1}(1 - y)$, from (11) it follows that, as y ↓ 0,
$$\frac{1}{\kappa} = \frac{2}{d-1}\, y + \frac{d-3}{(d-1)^2}\, y^2 + O(y^3). \tag{12}$$
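The inversion from the expansion of A_{d}(κ) to the expansion of 1/κ in powers of y can be checked by a short calculation. The worked steps below are a sketch added for the reader, with x = 1/κ and an undetermined coefficient c introduced only for this purpose.

```latex
\begin{align*}
y &= 1 - A_d(\kappa)
   = \frac{d-1}{2}\,x - \frac{(d-1)(d-3)}{8}\,x^{2} + O(x^{3}),
   \qquad x = \kappa^{-1}, \\
x &= \frac{2}{d-1}\,y + c\,y^{2} + O(y^{3})
   \qquad \text{(trial form, $c$ to be determined)}, \\
y &= y + \left( \frac{(d-1)\,c}{2} - \frac{d-3}{2(d-1)} \right) y^{2} + O(y^{3})
   \quad\Longrightarrow\quad
   c = \frac{d-3}{(d-1)^{2}} .
\end{align*}
```

The third line is obtained by substituting the trial form into the first line; matching the coefficients of y^{2} gives the stated value of c. For d = 3 the quadratic term vanishes.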
We will apply this expansion to each of the random variables κ̂_{0} and κ̂_{a}.
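Under H_{0}, from (4), it follows that
$$2\kappa n (1 - \bar{R}) \xrightarrow{d} \chi^2_{(n-1)(d-1)},$$
and thus
$$1 - \bar{R} = O_p(\kappa^{-1}), \qquad 1 - \tilde{R} = O_p(\kappa^{-1}). \tag{13}$$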
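Since the MLEs of κ under the null and alternative hypotheses are $\hat{\kappa}_0 = A_d^{-1}(\bar{R})$ and $\hat{\kappa}_a = A_d^{-1}(\tilde{R})$, respectively, we separately take y = 1 − R̄ and y = 1 − R̃ in the asymptotic expansion (12) for κ^{−1}, and use (13), to obtain
$$\frac{1}{\hat{\kappa}_0} = \frac{2(1 - \bar{R})}{d-1} + O_p(\kappa^{-2}), \qquad \frac{1}{\hat{\kappa}_a} = \frac{2(1 - \tilde{R})}{d-1} + O_p(\kappa^{-2}). \tag{14}$$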
Now, from the asymptotic expansions for I_{(d−2)/2}(κ) and A_{d}(κ) in (10) and (11), and the Maclaurin series expansions of the exponential function exp(x) and geometric series (1 − x)^{−1}, we have
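Now, using (15) in (9), together with the binomial series for (1 + x)^{a}, a ∈ ℝ, and the result (1 − R̃)^{−1} = O_{p}(κ) implied by (4), we have
$$\begin{aligned}
P &= \frac{n-k}{k-1} \left( \frac{1 - \bar{R} + O_p(\kappa^{-2})}{(1 - \tilde{R})\,(1 + O_p(\kappa^{-1}))} \,\big(1 + O_p(\kappa^{-1})\big) - 1 \right) \\
  &= \frac{n-k}{k-1} \left( \Big( \frac{1 - \bar{R}}{1 - \tilde{R}} + O_p(\kappa)\, O_p(\kappa^{-2}) \Big) \big(1 + O_p(\kappa^{-1})\big) - 1 \right) \\
  &= \frac{n-k}{k-1} \left( \frac{\tilde{R} - \bar{R}}{1 - \tilde{R}} + O_p(\kappa^{-1}) \right) \\
  &= W + O_p(\kappa^{-1}),
\end{aligned}$$
where W is the Watson & Williams test statistic (5). Therefore, under the null hypothesis, the test statistic P is equal to W up to a first-order approximation, and from this it follows that the high-concentration asymptotic null distribution of P is F with (k − 1)(d − 1) and (n − k)(d − 1) degrees of freedom, as stated earlier.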
Using additional terms in the asymptotic expansions, one can show that
In the second term on the right-hand side of this expression, the factor (R̃ − R̄)(1 − R̄)/(1 − R̃) is of order κ^{−1} in probability and is always non-negative. The factor (1 − (d − 2)^{2})/(2(d − 1)^{2}) is zero for d = 3, and thus, as κ→ ∞, the difference between the test statistics P and W diminishes faster for d = 3 than for other dimensions. Also, notice that for d = 2, this factor is positive (0.5), and therefore the test statistic P will be greater than W for κ sufficiently large; for d ≥ 4, this coefficient is negative, and thus P will be smaller than W for large values of κ. These facts are reflected in the simulations in the next section.
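The sign pattern of the dimension-dependent factor described above is easy to tabulate. The short Python sketch below evaluates (1 − (d − 2)^{2})/(2(d − 1)^{2}) for the dimensions used in the simulations; the function name is an illustrative choice.

```python
def second_order_factor(d):
    """(1 - (d - 2)^2) / (2 (d - 1)^2), the factor multiplying the second-order term."""
    return (1 - (d - 2) ** 2) / (2 * (d - 1) ** 2)

# positive for d = 2, zero for d = 3, negative for d >= 4
for d in (2, 3, 4, 10, 100):
    print(d, second_order_factor(d))
```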
It can also be shown, using the equality 1 − R̄^{2} = 2(1 − R̄)(1 − (1 − R̄)/2), that A = W + O_{p}(κ^{−1}). However, the simulation results of the next section will demonstrate that the test based on P outperforms the tests based on W, A, and G, as the null distribution of P is approximated better by the F distribution in both the high-concentration and large-sample settings.
3. Simulations and Comparisons
Simulations of size and power were carried out for the proposed test with statistic P, and for the tests based on the statistics W, G, and A described in the Introduction. We also included the modified version of W given by the formula M = (1 + 3/(8κ̂_{0}))W for d = 2 (Stephens 1972), and $M = \big(1 - 1/(5\hat{\kappa}_0^{2})\big) W$ for d = 3 (Mardia & Jupp 2000, page 191), where κ̂_{0} is the MLE of κ under H_{0}. Since all tests under consideration are approximate, we first examine how well each of them maintains its nominal size (following Upton 1976, we refer to this as the accuracy of the test). We then compare the powers of the tests. Upton’s (1976) test for the circular case was not considered in our simulations, because simulations in Upton’s (1976) paper showed that this test is similar in power but slightly less accurate compared to Stephens’s (1972) test using the statistic M.
The performance of the tests was investigated for dimensions d = 2 and d = 3 with common mean resultant lengths ρ = 0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.75, 0.85, 0.95, and sample size combinations (n_{1}, n_{2}) = (10, 10), (10, 20), (20, 20), (20, 40), (40, 40) for the two-sample problem (k = 2) and (n_{1}, n_{2}, n_{3}) = (10, 10, 10), (10, 20, 20), (20, 20, 40), (20, 30, 40), (40, 40, 40) for the three-sample problem (k = 3). Limited simulations were also performed for dimensions d = 4, d = 10, and d = 100. We focus here on simulation results for values of ρ ≥ 0.45, but note that for ρ < 0.45, only the test based on P maintained a size reasonably close to its nominal size.
The generation of a random direction from the Langevin-von Mises-Fisher distribution can be accomplished with the procedures described in Best & Fisher (1979) for d = 2 and Fisher, Lewis & Willcox (1981) for d = 3. We used R (R Core Team 2015) for all computations. The R package circular (Agostinelli & Lund 2013) was used to generate random directions for the two-dimensional case, and the R package movMF (Hornik & Grün 2014) for the cases d = 4, d = 10, and d = 100.
All tests considered are invariant to rotation of the circle/sphere, so in all cases one of the population mean directions can be set to the angle 0 for d = 2, and to the direction with colatitude and longitude both equal to 0 for d = 3; we will refer to these as the reference directions. In the simulations for power, for k = 2, the second population mean direction was chosen to be separated from the reference by the angle δ = 5, 10, 15, 20, 25, 30, 35, 40 degrees. For k = 3, two scenarios were used: in the first, two groups shared a common population mean direction defined as the reference, and the third group had a mean direction δ degrees away; in the second, the three groups had coplanar mean directions, two of which formed an angle of 2δ degrees and the third was the bisector of this angle. All test statistics were simulated 100 000 times for each combination of sample size, mean resultant length ρ, and separation angle δ.
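For readers who wish to reproduce simulations of this kind in the spherical case, random directions with the reference mean direction can be generated by inverting the distribution function of the cosine of the colatitude, whose density on [−1, 1] is proportional to exp(κw). The Python sketch below uses this standard inversion with the standard library only; it is not necessarily the exact algorithm of the papers cited above, and the function name `sample_vmf3` and the seed are hypothetical.

```python
import math
import random

def sample_vmf3(kappa, n, rng):
    """Draw n directions from L_3(mu, kappa) with mu = (0, 0, 1), for kappa > 0."""
    out = []
    for _ in range(n):
        u = rng.random()
        # inverse CDF of w = cos(colatitude): w = 1 + log(u + (1 - u) exp(-2 kappa)) / kappa
        w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
        w = max(-1.0, min(1.0, w))          # guard against rounding outside [-1, 1]
        phi = 2.0 * math.pi * rng.random()  # longitude is uniform by rotational symmetry
        s = math.sqrt(1.0 - w * w)
        out.append((s * math.cos(phi), s * math.sin(phi), w))
    return out

rng = random.Random(1)
points = sample_vmf3(5.0, 5000, rng)
mean_z = sum(p[2] for p in points) / len(points)
print(mean_z, 1.0 / math.tanh(5.0) - 1.0 / 5.0)  # sample mean of w versus A_3(5)
```

A second population with mean direction δ degrees away from the reference can then be obtained by rotating the generated vectors, since the distribution is rotationally symmetric about its mean direction.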
Under the null hypothesis, p-values should be uniformly distributed on the interval (0, 1). In Figures 1 and 2, Q-Q plots are shown comparing the p-values obtained by simulation under the null hypothesis to the target uniform distribution for d = 2, k = 2, (n_{1}, n_{2}) = (10, 10), and for d = 3, k = 3, (n_{1}, n_{2}, n_{3}) = (20, 30, 40). In these plots, the ordered observed p-values (vertical axes) obtained under the null hypothesis are plotted against the corresponding quantiles (horizontal axes) of the uniform distribution. The axes are truncated to the range 0 to 0.10 to provide more detail in the range of most interest in hypothesis testing. A test maintains its nominal size if the simulated curve coincides with the (dashed) diagonal line; the test is conservative if the curve lies above the diagonal line, and anti-conservative if the curve lies below the diagonal line.
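The quantities behind these plots and the associated size tables can be sketched in a few lines of Python. The function names and the plotting positions (i + 0.5)/N are assumptions made for this illustration; the paper does not state which plotting positions were used.

```python
def empirical_size(p_values, alpha):
    """Proportion of simulated null p-values at or below the nominal level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

def qq_points(p_values):
    """Pairs (uniform quantile, ordered p-value); plotting positions are an assumption."""
    n = len(p_values)
    return [((i + 0.5) / n, p) for i, p in enumerate(sorted(p_values))]

# Hypothetical p-values, for illustration only
p_sim = [0.50, 0.01, 0.20, 0.04]
print(empirical_size(p_sim, 0.05))
print(qq_points(p_sim))
```

An empirical size below the nominal level corresponds to a conservative test, and one above it to an anti-conservative test.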
Table 1 provides further details by giving the actual sizes of the tests for nominal sizes α = 0.01, 0.05, 0.10. Tables 2 and 3 compare the powers of the tests at nominal size α = 0.05 as the mean resultant length ρ increases from 0.45 to 0.85. Similar figures and tables for the other scenarios considered are provided in the supplementary materials.
Based on these results, we make the following observations:
With the exception of the standard likelihood-ratio test based on G (using the chi-squared approximation), the accuracy of each test improves with increasing ρ. However, the modified likelihood-ratio test based on P is consistently the best at maintaining its size, especially for values of ρ less than 0.75.
As expected from the discussion at the end of Section 2, when ρ ≥ 0.45, the Watson & Williams test based on W is conservative for d = 2, but anti-conservative for d ≥ 3. When ρ < 0.45, this test becomes anti-conservative for d = 2 and remains anti-conservative for d ≥ 3.
The test based on M is usually better than W, but our simulations show that this is not always the case. In particular, M seems to be less accurate than W in the two-dimensional case when ρ ≤ 0.50.
For d = 2 and d = 3, the test based on A is conservative even for very large values of ρ, and its power never exceeded the power of any of the other tests when ρ ≥ 0.45 (see supplementary figures and tables).
The accuracy of the likelihood-ratio test statistic G is poor, although it does improve with increasing sample size. Among the other tests, only the accuracy of the modified likelihood-ratio test based on P seems to improve with increasing sample size, at least over the range of sample sizes considered here (see supplementary figures and tables).
As expected (see Tables 2 and 3), the power of each test increases with ρ, and, not surprisingly, the anti-conservative tests based on M and G (and on W when d = 3) have an artificial power advantage which is reflected in these tables. However, this advantage appears to be entirely explained by the discrepancy in the actual size of the tests. For example, if we use our simulation results to calibrate the tests based on P and W to have equal size, then the powers of the tests become essentially equal (see Tables 5 and 6 in the supplementary materials).
Finally, the test based on P appears to maintain its advantage over the tests based on W, M, and G as the dimension increases to d = 4, d = 10, and d = 100. For d = 4 and d = 10, the test with P again seems to perform better than the test with A. For d = 100, P and A appear to maintain their nominal size even for concentrations down to ρ = 0.10, and they also seem to have similar power. Note that for d = 4, d = 10 and d = 100, the multiplicative factor in M was calculated using only the first three terms of the expansion provided for the correction in Stephens (1992); the performance of M may improve if more terms of this expansion are used.
In conclusion, these simulation results show that the test based on P maintains its size and power across the range of concentrations usually encountered, much better than any of the other tests considered. This test does not require any special adjustments that depend on dimension or (estimated) concentration, nor does it require the use of the bootstrap or other computationally intensive techniques. There seems to be no reason that this version of the likelihood-ratio test should not be adopted in preference to the other tests in this comparison.
4. Orientations of Primate Vertebral Facets
Anthropologists have long been interested in the relationship between form and function. One specific area of research is the relationship between skeletal form and locomotion, including in particular the spine morphology and locomotor forces transmitted in the lumbar vertebrae in orthograde primates (see Johnson & Shapiro 1998).
In this example we consider the three-dimensional orientation of the last lumbar inferior facets in three species of primates: chimpanzees, gorillas, and humans. The data were collected at the Division of Anthropology and the Division of Vertebrate Zoology at the American Museum of Natural History in New York City, by Dorion A. Keifer, as part of Keifer’s (2005) master’s thesis research conducted at the Department of Anthropology at the University of Florida. Details on the methods and procedures for specimen selection and for measuring different vertebral elements can be found in Keifer’s (2005) thesis.
To illustrate the application of the proposed testing procedure, we consider the normal vector to the last lumbar right inferior facet, as this vector is considered important for the development of biomechanical models on the force direction in the facet. Table 4 contains the three-dimensional directions of the normal vectors (rounded to three decimal places) for the three samples of primates, with the sample mean directions and sample mean resultant lengths provided in the last two rows. One outlying observation in the humans sample and two in the gorillas sample were excluded from the original data set. Note that our numerical results were obtained using the original (full precision) data. Figure 3 shows the Lambert’s azimuthal equal-area projection (see Fisher, Lewis & Embleton 1987) of the data onto the plane tangent to the sphere at the point of the overall sample mean direction of the three samples combined, after applying the rotation (i) of Fisher & Best (1984).
To test the goodness-of-fit of the three-dimensional Langevin-von Mises-Fisher distribution to the data, we performed the tests on colatitude, uniformity, and normality described in Fisher & Best (1984) for each of the three samples, and we found no evidence against the null hypothesis of a Langevin distribution at the 0.05 level. Before testing for equality of the mean directions, Bartlett’s test of homogeneity was performed to verify the assumption of equal concentrations (see Mardia & Jupp 2000, page 226). The test was expected to perform satisfactorily when R̄ ≥ 0.67, which is the case in this example as R̄ = 0.97. The data in the three samples are highly concentrated, and the p-value from Bartlett’s test is 0.127, indicating no evidence of unequal concentrations. We then tested for equality of the three mean directions using the test statistic P and found a statistically significant difference (p-value < 0.0001). Pairwise tests on the mean directions were then performed using P, finding statistically significant differences for all three pairs of primate species at the 0.05 level after accounting for the multiple testing with Holm’s (1979) simultaneous testing procedure: Human & Gorilla (p-value < 0.0001), Human & Chimpanzee (p-value < 0.0001), and Gorilla & Chimpanzee (p-value = 0.008).
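Holm's step-down procedure used for the pairwise comparisons can be sketched as follows. This Python illustration is not the authors' code; the function name is hypothetical, and because two of the pairwise p-values are reported only as "< 0.0001", the value 0.0001 is used below as a placeholder upper bound. It is also an assumption that the reported p-values are unadjusted.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's sequentially rejective procedure: one rejection flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # compare the (rank+1)-th smallest p-value with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject

pairs = ["Human & Gorilla", "Human & Chimpanzee", "Gorilla & Chimpanzee"]
p_pairwise = [0.0001, 0.0001, 0.008]  # first two are placeholder upper bounds
print(dict(zip(pairs, holm_reject(p_pairwise))))
```

With these values the thresholds are 0.05/3, 0.05/2 and 0.05, so all three hypotheses of equal mean directions are rejected, in agreement with the conclusion stated above.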
5. Conclusion
In this paper we propose an improved version of the multi-sample likelihood-ratio test of equality of mean directions for populations having Langevin-von Mises-Fisher distributions with a common unknown concentration in the general d-dimensional case. Our test statistic is a monotone transformation of the likelihood ratio. We show that the high-concentration asymptotic null distribution of the test statistic is the F distribution. We also demonstrate that the F approximation to the null distribution is applicable in the large-sample setting. Our simulations of size and power show that the proposed test outperforms competing tests. The proposed test performs well even with small sample sizes and moderate concentrations across all dimensions considered. Further examination of the behavior of the test under departures from the assumptions of Langevin distribution or equal concentrations would provide beneficial information on its robustness. Additional simulations on the size and power of the test in dimensions higher than three would extend its practical applicability.
Supplementary Material
The first author thanks the Department of Statistics at the University of Florida, USA, the School of Public Health at the University of Sydney, Australia, and the National Center for Health Statistics at the Centers for Disease Control and Prevention, USA, for their support while the author worked on the results and preparation of this paper. The authors also thank the reviewers for their helpful comments, which significantly improved the paper.
The findings and conclusions in this paper are those of the authors and do not necessarily represent the official views of the National Center for Health Statistics, U.S. Centers for Disease Control and Prevention.
References
Abramowitz, M. & Stegun, I.A. (1970).
Agostinelli, C. & Lund, U. (2013).
Best, D.J. & Fisher, N.I. (1979). Efficient simulation of the von Mises distribution.
Fisher, N.I. & Best, D.J. (1984). Goodness-of-fit tests for Fisher’s distribution on the sphere.
Fisher, N.I., Lewis, T. & Embleton, J.J. (1987).
Fisher, N.I., Lewis, T. & Willcox, M.E. (1981). Tests of discordancy for samples from Fisher’s distribution on the sphere.
Fisher, R.A. (1953). Dispersion on a sphere.
Harrison, D., Kanji, G.K. & Gadsden, R.J. (1986). Analysis of variance for circular data.
Holm, S. (1979). A simple sequentially rejective multiple test procedure.
Hornik, K. & Grün, B. (2014). movMF: An R package for fitting mixtures of von Mises-Fisher distributions.
Johnson, S.E. & Shapiro, L.J. (1998). Positional behavior and vertebral morphology in atelines and cebines.
Keifer, D.A. (2005). Last lumbar facet and pedicle orientation in orthograde primates.
Mardia, K.V. & Jupp, P.E. (2000).
R Core Team (2015).
Scheffé, H. (1961).
Schou, G. (1978). Estimation of the concentration parameter in von Mises-Fisher distributions.
Stephens, M.A. (1969). Multi-sample tests for the Fisher distribution for directions.
Stephens, M.A. (1972). Multisample tests for the von Mises distribution.
Stephens, M.A. (1992). On Watson’s ANOVA for Directions.
Upton, G.J.G. (1973). Single-sample tests for the von Mises distribution.
Upton, G.J.G. (1976). More multisample tests for the von Mises distribution.
Watson, G.S. (1956). Analysis of dispersion on a sphere.
Watson, G.S. (1983).
Watson, G.S. & Williams, E.J. (1956). On the construction of significance tests on the circle and the sphere.
Figure 1. Size of the tests with statistics P, M, W, G, and A (100 000 simulated values each), examined by plotting the ordered p-values (vertical axis) versus uniform quantiles (horizontal axis), for d = 2, k = 2, (n_{1}, n_{2}) = (10, 10), and ρ = 0.45, 0.60, 0.75, 0.85.
Figure 2. Size of the tests with statistics P, M, W, G, and A (100 000 simulated values each), examined by plotting the ordered p-values (vertical axis) versus uniform quantiles (horizontal axis), for d = 3, k = 3, (n_{1}, n_{2}, n_{3}) = (20, 30, 40), and ρ = 0.45, 0.60, 0.75, 0.85.
Figure 3. Three-dimensional directions of the normal vectors to the last lumbar right inferior facets in the samples of humans (circles), gorillas (asterisks), and chimpanzees (triangles) (Table 4), projected onto the plane tangent to the sphere at the point of the combined sample mean direction using Lambert’s azimuthal equal-area projection. The angles θ and ϕ denote the colatitude and longitude of the direction, respectively, after applying the rotation (i) of Fisher & Best (1984).
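Table 1. Actual size of the tests based on the statistics P, M, W, G, and A (100 000 simulated values each), at nominal size α = 0.01, 0.05, 0.10, for d = 2, k = 2, (n_{1}, n_{2}) = (10, 10), and d = 3, k = 3, (n_{1}, n_{2}, n_{3}) = (20, 30, 40), and ρ = 0.45, 0.60, 0.75, 0.85.

d = 2, k = 2, (n_{1}, n_{2}) = (10, 10)

| Statistic | ρ = 0.45, α = 0.01 | ρ = 0.45, α = 0.05 | ρ = 0.45, α = 0.10 | ρ = 0.60, α = 0.01 | ρ = 0.60, α = 0.05 | ρ = 0.60, α = 0.10 |
|---|---|---|---|---|---|---|
| P | 0.011 | 0.055 | 0.110 | 0.010 | 0.050 | 0.099 |
| M | 0.024 | 0.088 | 0.154 | 0.011 | 0.053 | 0.106 |
| W | 0.007 | 0.043 | 0.093 | 0.005 | 0.033 | 0.072 |
| G | 0.017 | 0.073 | 0.135 | 0.016 | 0.066 | 0.122 |
| A | 0.002 | 0.021 | 0.064 | 0.002 | 0.023 | 0.064 |

| Statistic | ρ = 0.75, α = 0.01 | ρ = 0.75, α = 0.05 | ρ = 0.75, α = 0.10 | ρ = 0.85, α = 0.01 | ρ = 0.85, α = 0.05 | ρ = 0.85, α = 0.10 |
|---|---|---|---|---|---|---|
| P | 0.010 | 0.050 | 0.099 | 0.010 | 0.050 | 0.099 |
| M | 0.010 | 0.048 | 0.097 | 0.010 | 0.050 | 0.099 |
| W | 0.006 | 0.035 | 0.077 | 0.007 | 0.041 | 0.085 |
| G | 0.016 | 0.066 | 0.123 | 0.016 | 0.066 | 0.122 |
| A | 0.004 | 0.030 | 0.072 | 0.006 | 0.037 | 0.081 |

d = 3, k = 3, (n_{1}, n_{2}, n_{3}) = (20, 30, 40)

| Statistic | ρ = 0.45, α = 0.01 | ρ = 0.45, α = 0.05 | ρ = 0.45, α = 0.10 | ρ = 0.60, α = 0.01 | ρ = 0.60, α = 0.05 | ρ = 0.60, α = 0.10 |
|---|---|---|---|---|---|---|
| P | 0.011 | 0.052 | 0.103 | 0.010 | 0.050 | 0.100 |
| M | 0.014 | 0.064 | 0.122 | 0.010 | 0.050 | 0.100 |
| W | 0.022 | 0.087 | 0.155 | 0.012 | 0.057 | 0.112 |
| G | 0.012 | 0.058 | 0.112 | 0.012 | 0.056 | 0.109 |
| A | 0.003 | 0.028 | 0.070 | 0.004 | 0.030 | 0.073 |

| Statistic | ρ = 0.75, α = 0.01 | ρ = 0.75, α = 0.05 | ρ = 0.75, α = 0.10 | ρ = 0.85, α = 0.01 | ρ = 0.85, α = 0.05 | ρ = 0.85, α = 0.10 |
|---|---|---|---|---|---|---|
| P | 0.009 | 0.049 | 0.099 | 0.010 | 0.051 | 0.100 |
| M | 0.009 | 0.048 | 0.097 | 0.010 | 0.050 | 0.099 |
| W | 0.010 | 0.050 | 0.100 | 0.010 | 0.051 | 0.100 |
| G | 0.011 | 0.055 | 0.108 | 0.012 | 0.056 | 0.109 |
| A | 0.005 | 0.036 | 0.081 | 0.007 | 0.042 | 0.088 |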
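Table 2. Power of the tests with statistics P, M, W, and G (100 000 simulated values each) at nominal size α = 0.05, for d = 2, k = 2, (n_{1}, n_{2}) = (10, 10), mean directions separated by δ = 5, 10, 15, 20, 25, 30, 35, 40 degrees, and ρ = 0.45, 0.60, 0.75, 0.85.

| δ | P (ρ = 0.45) | M (ρ = 0.45) | W (ρ = 0.45) | G (ρ = 0.45) | P (ρ = 0.60) | M (ρ = 0.60) | W (ρ = 0.60) | G (ρ = 0.60) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.06 | 0.09 | 0.05 | 0.08 | 0.05 | 0.06 | 0.04 | 0.07 |
| 10 | 0.06 | 0.10 | 0.05 | 0.08 | 0.06 | 0.07 | 0.04 | 0.08 |
| 15 | 0.07 | 0.11 | 0.06 | 0.09 | 0.08 | 0.09 | 0.06 | 0.10 |
| 20 | 0.09 | 0.13 | 0.07 | 0.11 | 0.11 | 0.12 | 0.08 | 0.13 |
| 25 | 0.10 | 0.15 | 0.08 | 0.13 | 0.14 | 0.15 | 0.11 | 0.17 |
| 30 | 0.12 | 0.18 | 0.10 | 0.15 | 0.18 | 0.20 | 0.14 | 0.22 |
| 35 | 0.15 | 0.20 | 0.12 | 0.18 | 0.23 | 0.25 | 0.18 | 0.28 |
| 40 | 0.17 | 0.24 | 0.15 | 0.21 | 0.29 | 0.31 | 0.23 | 0.33 |

| δ | P (ρ = 0.75) | M (ρ = 0.75) | W (ρ = 0.75) | G (ρ = 0.75) | P (ρ = 0.85) | M (ρ = 0.85) | W (ρ = 0.85) | G (ρ = 0.85) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.06 | 0.05 | 0.04 | 0.07 | 0.06 | 0.06 | 0.05 | 0.08 |
| 10 | 0.08 | 0.08 | 0.06 | 0.10 | 0.10 | 0.10 | 0.09 | 0.13 |
| 15 | 0.12 | 0.11 | 0.09 | 0.14 | 0.17 | 0.17 | 0.14 | 0.20 |
| 20 | 0.17 | 0.16 | 0.13 | 0.20 | 0.26 | 0.26 | 0.23 | 0.31 |
| 25 | 0.24 | 0.23 | 0.19 | 0.28 | 0.38 | 0.38 | 0.34 | 0.43 |
| 30 | 0.32 | 0.31 | 0.26 | 0.37 | 0.50 | 0.50 | 0.46 | 0.56 |
| 35 | 0.41 | 0.40 | 0.34 | 0.46 | 0.63 | 0.63 | 0.59 | 0.68 |
| 40 | 0.50 | 0.49 | 0.43 | 0.55 | 0.74 | 0.74 | 0.70 | 0.79 |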
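Table 3. Power of the tests with statistics P, M, W, and G (100 000 simulated values each) at nominal size α = 0.05, for d = 3, k = 3, (n_{1}, n_{2}, n_{3}) = (20, 30, 40), two equal mean directions and one separated by δ = 5, 10, 15, 20, 25, 30, 35, 40 degrees, and ρ = 0.45, 0.60, 0.75, 0.85.

| δ | P (ρ = 0.45) | M (ρ = 0.45) | W (ρ = 0.45) | G (ρ = 0.45) | P (ρ = 0.60) | M (ρ = 0.60) | W (ρ = 0.60) | G (ρ = 0.60) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.06 | 0.07 | 0.09 | 0.06 | 0.06 | 0.06 | 0.07 | 0.07 |
| 10 | 0.07 | 0.09 | 0.12 | 0.08 | 0.10 | 0.10 | 0.11 | 0.11 |
| 15 | 0.10 | 0.12 | 0.16 | 0.11 | 0.17 | 0.17 | 0.18 | 0.18 |
| 20 | 0.15 | 0.18 | 0.22 | 0.16 | 0.27 | 0.28 | 0.30 | 0.29 |
| 25 | 0.22 | 0.25 | 0.30 | 0.23 | 0.42 | 0.42 | 0.45 | 0.44 |
| 30 | 0.30 | 0.34 | 0.39 | 0.32 | 0.58 | 0.58 | 0.61 | 0.60 |
| 35 | 0.40 | 0.44 | 0.50 | 0.42 | 0.72 | 0.73 | 0.75 | 0.74 |
| 40 | 0.50 | 0.54 | 0.61 | 0.52 | 0.85 | 0.85 | 0.87 | 0.86 |

| δ | P (ρ = 0.75) | M (ρ = 0.75) | W (ρ = 0.75) | G (ρ = 0.75) | P (ρ = 0.85) | M (ρ = 0.85) | W (ρ = 0.85) | G (ρ = 0.85) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.07 | 0.07 | 0.07 | 0.08 | 0.10 | 0.09 | 0.10 | 0.10 |
| 10 | 0.16 | 0.15 | 0.16 | 0.17 | 0.27 | 0.27 | 0.27 | 0.29 |
| 15 | 0.32 | 0.31 | 0.32 | 0.34 | 0.58 | 0.58 | 0.58 | 0.60 |
| 20 | 0.55 | 0.54 | 0.55 | 0.57 | 0.85 | 0.85 | 0.85 | 0.86 |
| 25 | 0.76 | 0.76 | 0.76 | 0.78 | 0.97 | 0.97 | 0.97 | 0.97 |
| 30 | 0.91 | 0.90 | 0.91 | 0.91 | 1.00 | 1.00 | 1.00 | 1.00 |
| 35 | 0.97 | 0.97 | 0.97 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 |
| 40 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |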
Table 4. Three-dimensional directions of the normal vectors to the last lumbar right inferior facets in samples of three species of primates (humans, gorillas, and chimpanzees) rounded to three decimal places. The mean direction R̄^{−1}Ū and the mean resultant length R̄ are calculated for each sample separately.