Discounting is the process by which outcomes lose value. Much of discounting research has focused on differences in the degree of discounting across various groups. This research has relied heavily on conventional null hypothesis significance testing that is familiar to psychology in general such as

Delay discounting describes the process by which delayed outcomes lose value (

There are several commonly used direct measures and several derived measures of the degree of discounting. The most common direct measures of discounting are indifference points. An indifference point is the immediate amount of an outcome that is subjectively equal in value to a larger but delayed amount of that outcome. Because the delayed outcome has a discounted value, this smaller but immediate amount has the same reinforcing efficacy as the delayed outcome. A popular derived measure of discounting is area under the curve (AUC;

Another technique to describe discounting data is to quantify the relation between delay and the value of an outcome by fitting theoretical models of delay discounting (

Often the goal of obtaining these measures of discounting is to compare the degree of discounting based on some other question of interest (e.g., group differences). These research questions about similarities or differences in the degree of discounting typically rely on the derived measures of discounting described above. For these sorts of questions, statistical comparisons are made on AUC, free parameters obtained from the theoretical models of discounting, or ED-50.

The search for differences in delay discounting between groups has relied heavily on traditional inferential null-hypothesis statistical testing (NHST). With NHST, researchers have generally attempted to determine how compatible measured group differences in the degree of discounting are with the null hypothesis that there are no differences (e.g.,

As interest in discounting has grown, researchers are asking more complex research questions (e.g., simultaneous between-group and within-subjects differences in discounting, changes in the degree of discounting over time) that often rely on more complex study designs (e.g., clinical controlled trials, longitudinal designs). The conventional statistical tests used to compare delay discounting either between or within groups may no longer be adequate in these circumstances. For example, (

One test, the Generalized Estimating Equation (GEE), can address many of the above-mentioned shortcomings of conventional tests. A GEE is a semi-parametric test that accounts for the inter-correlation of repeated measures, similar to the random intercept of a mixed-effects model (

The goal of this study is to compare the analyses of discounting data with conventional statistical methods (specifically, a mixed-effects ANOVA) and with GEE methods. Generalized Estimating Equations have not been widely used in delay discounting research. Although they are similar to mixed-effects ANOVAs, it is possible that the output of a GEE could be drastically different from the output of a mixed-effects ANOVA for discounting data. A Monte Carlo method was used to facilitate the comparisons between the statistical techniques. Monte Carlo methods use various forms of random sampling to simulate data (

For this study, we used the discounting data from

We made two predictions with this study. The first prediction was that the conventional statistical tests and the GEE would produce similar patterns of results. The conventional statistical tests and the GEE rely on similar assumptions and statistical models. Therefore, it is reasonable to assume that the different approaches will provide a similar pattern of results. As described above, the GEE is more appropriate for discounting data because it is relatively more robust to deviations from the standard assumptions of the tests and because it was designed to account for clustering within the data (i.e., data from a single participant). In other words, the GEE should produce a similar pattern of results but is a more appropriate statistical test for discounting data. The second prediction was that when the simulation ignored the experimental group membership (i.e., simulating a null condition by random assignment of participants to the groups), the GEE would be less likely to find significant group differences. When both groups are sampled from the same population, the null hypothesis (i.e., there are no differences between the groups) is true. Therefore, any significant group difference found when the simulation randomly assigns participants to a group is a Type I error (false positive).

The data for this study are from

In this data set, the two outcomes that were discounted were the delay to arriving at the destination and the probability of being in a car crash. The delays to the destination were 30 seconds, 3 minutes, 15 minutes, 1 hour, and 3 hours. The probabilities shown to the participants of a car crash were 10%, 1%, 0.3%, 0.1%, and 0.03%. As is conventional, the probability factor was analyzed as odds against the car crash (i.e., the odds of arriving at the destination safely). Participants rated their likelihood of responding to a text message for each of the 25 pairwise combinations of the delay to the destination and the probability of a car crash.
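The conversion from crash probability to odds against can be sketched as follows. This is an illustration in Python, not the R code used in the study, and it assumes the standard definition, odds against = (1 − p) / p. The "1 in N" probabilities are also an assumption: the odds of 299 and 2999 that appear in the post-hoc tables correspond exactly to probabilities of 1/300 and 1/3000, which would round to the 0.3% and 0.03% listed above.

```python
from fractions import Fraction


def odds_against(p):
    """Return the odds against an event with probability p: (1 - p) / p."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (1 - p) / p


# Assumed exact "1 in N" crash probabilities (hypothetical; see lead-in).
crash_probabilities = [Fraction(1, n) for n in (10, 100, 300, 1000, 3000)]
odds = [odds_against(p) for p in crash_probabilities]
print([int(o) for o in odds])
```

Exact fractions are used here only to avoid floating-point rounding in the illustration.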

For the purpose of this study, participants from

Conventional parametric statistical tests, conventional non-parametric statistical tests, and a GEE were used to compare differences in the degree of discounting. With each statistical analysis, only certain comparisons were made with the

To conduct the parametric and the non-parametric statistical analyses, we used AUC as our measure of discounting. For each participant, AUC for probability discounting was calculated for all five of the delays to the destination. Additionally, the AUC for delay discounting was calculated for all five of the odds against a car crash. These AUC values are displayed as the marginal figures
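As a rough illustration of how an AUC value of this kind can be computed, the sketch below sums trapezoids over normalized axes. It is written in Python rather than the R used in the study, and several details are assumptions: the x-values (delays or odds against) are divided by their maximum, the ratings are divided by a maximum possible rating `y_max`, and no extra anchor point at zero delay is added. The function name and the example ratings are hypothetical.

```python
def area_under_curve(x, y, y_max=1.0):
    """Trapezoidal area under a discounting curve with both axes normalized.

    x: delays (or odds against); y: ratings or indifference points.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    points = sorted(zip(x, y))          # order the points by x
    x_max = points[-1][0]
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        width = (x1 - x0) / x_max       # normalized distance between x-values
        height = (y0 + y1) / 2 / y_max  # mean normalized value of the pair
        total += width * height
    return total


# Delays from the study in seconds; the ratings are made up for illustration.
delays = [30, 180, 900, 3600, 10800]
ratings = [20, 35, 60, 80, 95]
auc = area_under_curve(delays, ratings, y_max=100)
print(round(auc, 3))
```

Under this normalization the result always falls between 0 and 1, which makes values comparable across participants and across the two discounting dimensions.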

The goal of the parametric analyses was to duplicate a mixed-effects ANOVA with the relevant post-hoc tests. In lieu of ANOVAs on AUC, we conducted linear mixed-effects regression models on AUC (lme function; no random effects specified). The mixed-effects models were selected because the post-hoc tests of interest were burdensome to conduct when using the built-in R functions for a mixed-effects ANOVA. The key difference between a mixed-effects regression model and a mixed-effects ANOVA is that the regression model provides results in terms of differences in AUC from an arbitrary reference (i.e., an intercept parameter), whereas the ANOVA provides results in terms of differences from the mean of AUC. As a supplemental step, we used the R “anova” function to convert the linear mixed-effects model results into mixed-effects ANOVA results.

Two separate models were specified for the parametric tests: one model for AUC of delay discounting and one model for AUC of probability discounting. The model for AUC of delay discounting was specified to include the main effects of group, probability, and the interaction of group and probability. The model for AUC of probability discounting was specified to include the main effects of group, delay, and the interaction of group and delay. For each mixed-effects model, post-hoc comparisons were completed with the “emmeans” and “contrast” functions (emmeans package;

The non-parametric statistical testing was relatively limited compared to the parametric tests and the GEE, because there is no non-parametric equivalent for mixed-effects models. For this reason, only the results of non-parametric post-hoc tests can be directly compared to the results of the parametric and GEE post-hoc tests. The post-hoc tests for the non-parametric analyses were Mann-Whitney U tests (“wilcox.test” function with the paired parameter set to false). The non-parametric statistical tests were included in this study because the data from

We also conducted four Friedman’s tests to provide information that is related to the parametric tests’ and GEE’s main effects of delay on probability discounting and probability on delay discounting. A Friedman’s test (“friedman.test” function) is the non-parametric equivalent of a repeated-measures ANOVA. The Friedman’s tests can only provide the main effects within each group, whereas the main effects in the parametric tests and the GEE collapse the data across the groups. Therefore, the results of the Friedman’s tests provide a limited amount of similar, but not identical, information compared with the main effects in the parametric tests and the GEE. One Friedman’s test per group was used to determine if there were differences in delay discounting across the odds against a car crash. Two additional Friedman’s tests, again one per group, were used to determine if there were differences in probability discounting across the delays to the destination.

The GEE analysis was conducted with the “geeglm” function (geepack R package;

The Monte Carlo simulations relied on randomly selecting participants to create a simulated sample and then conducting the relevant conventional parametric tests, non-parametric tests, and the GEE on that simulated sample. Participants were randomly selected, with replacement, via the “sample” function in R. When a participant was selected for inclusion, all of their experimentally obtained data were copied into the current simulated sample. After being selected for inclusion in a simulated sample, a new participant number was assigned to the currently selected data so that if a participant was randomly selected twice for a single simulated sample, then the GEE would treat each selection as a separate participant cluster.
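The resampling step described above can be sketched as follows. The study used the R “sample” function; this Python version is only an illustration, and the function name, the data layout (a dictionary from participant ID to that participant's ratings), and the example values are all hypothetical.

```python
import random


def resample_participants(data_by_participant, n_draws, rng):
    """Draw participants with replacement and relabel each draw.

    Every draw gets a new, unique participant number, so a participant
    drawn twice ends up as two separate clusters in the simulated sample.
    """
    original_ids = list(data_by_participant)
    drawn_ids = rng.choices(original_ids, k=n_draws)  # with replacement
    simulated = {}
    for new_id, original_id in enumerate(drawn_ids, start=1):
        # Copy all of the selected participant's data into the sample.
        simulated[new_id] = list(data_by_participant[original_id])
    return simulated


# Hypothetical ratings for three participants (not data from the study).
observed = {"p01": [90, 60, 20], "p02": [80, 75, 40], "p03": [50, 10, 5]}
sample = resample_participants(observed, n_draws=5, rng=random.Random(1))
print(sample)
```

Relabeling inside the loop is what keeps a repeated draw from being merged into a single cluster by a clustered analysis such as the GEE.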

Two separate Monte Carlo simulations were conducted. The first Monte Carlo simulation maintained the experimental group memberships. For these simulations, participants were randomly selected from within each group. For the second Monte Carlo simulation, participants were randomly selected with replacement and then assigned to either the high-TWD group or the low-TWD group. After the total sample had been created by randomly selecting participants, each analysis was conducted on the simulated sample.
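The difference between the two simulations can be sketched as below, again in Python for illustration only. One detail is an assumption that the text does not state: each simulated group is given the same size as the corresponding experimental group. The function names and participant labels are hypothetical.

```python
import random


def resample_within_groups(groups, rng):
    """Simulation 1: draw each simulated group from its own members."""
    return {name: rng.choices(members, k=len(members))
            for name, members in groups.items()}


def resample_ignoring_groups(groups, rng):
    """Simulation 2: draw every simulated group from the pooled sample,
    which imposes a null effect of group membership."""
    pooled = [m for members in groups.values() for m in members]
    return {name: rng.choices(pooled, k=len(members))
            for name, members in groups.items()}


# Hypothetical participant labels; group sizes are arbitrary.
groups = {"high-TWD": ["h1", "h2", "h3"],
          "low-TWD": ["l1", "l2", "l3", "l4"]}
rng = random.Random(7)
maintained = resample_within_groups(groups, rng)
reassigned = resample_ignoring_groups(groups, rng)
print(maintained)
print(reassigned)
```

In the second function a simulated high-TWD group can contain participants from either experimental group, so any group difference detected there reflects chance alone.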

After the analyses on all of the simulated data sets were completed, we calculated two main measures to describe the pattern of results across the simulated data sets. These measures were calculated separately for the simulations in which group membership was maintained and those in which group membership was randomly assigned. The first measure was the number of simulations in which a statistically significant effect was found. This measure was calculated for each comparison in the parametric tests, the non-parametric tests, and the GEE. For example, we calculated the number of data sets in which the GEE detected a statistically significant effect of delay on probability discounting. A second measure was calculated to assess the degree to which the results of the parametric statistical tests agreed with the results of the GEE. For each simulated data set, the results of the parametric analyses were compared to the results of the GEE. For each comparison of interest (e.g., main effects, interactions, post-hoc tests), the statistical significance of the parametric test was compared to the statistical significance of the GEE. If both the parametric test and the GEE indicated a significant effect, or both indicated a non-significant effect, then that test was counted as in agreement. If one analysis indicated a statistically significant effect for that simulated sample and the other analysis indicated a non-significant effect, then that test was not counted as being in agreement. The agreement scores were then counted across all of the simulated samples for each specific comparison of interest. This measure of agreement was calculated only when participants were resampled from within the existing groups. Agreement scores were not calculated for the non-parametric tests because only the post-hoc Mann-Whitney U tests could be compared to the conventional parametric and GEE post-hoc tests.
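Both summary measures reduce to simple counts over the simulated samples. The sketch below (Python, for illustration) assumes that each analysis yields one p-value per simulated sample for a given comparison and that significance is judged at an alpha of .05; the function names and p-values are hypothetical.

```python
def count_significant(p_values, alpha=0.05):
    """Number of simulated samples with a statistically significant result."""
    return sum(p < alpha for p in p_values)


def count_agreements(p_values_a, p_values_b, alpha=0.05):
    """Number of simulated samples in which two analyses reach the same
    decision (both significant or both non-significant)."""
    if len(p_values_a) != len(p_values_b):
        raise ValueError("both analyses need one p-value per simulated sample")
    return sum((a < alpha) == (b < alpha)
               for a, b in zip(p_values_a, p_values_b))


# Hypothetical p-values from four simulated samples (not study results).
gee_p = [0.010, 0.200, 0.030, 0.600]
parametric_p = [0.020, 0.040, 0.030, 0.700]
print(count_significant(gee_p))                # 2
print(count_significant(parametric_p))         # 3
print(count_agreements(gee_p, parametric_p))   # 3
```

Note that agreement counts both joint detections and joint non-detections, so two analyses can agree often even when neither finds many significant effects.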

Two sets of 1,000 samples were simulated via a Monte Carlo resampling procedure. In the first set, samples were simulated by randomly selecting participants and including their data, with the membership of each participant in either the high- or low-TWD group being maintained. In the second set of simulated samples, participants were randomly selected and then assigned to either the high- or low-TWD group based on the resampling procedure. To determine if the random sampling was adequate, Kullback-Leibler divergence values were calculated for both groups in each set of simulated samples. Kullback-Leibler divergence provides a metric for the difference between two distributions. The measure is non-negative and equals 0 only when the two distributions are identical; lower values indicate more similar distributions and higher values indicate more dissimilar distributions. The obtained frequency of each participant in the simulated samples was compared to the expected uniform distribution. When group memberships were maintained, the divergence of the high-TWD samples was 0.0004 and the divergence of the low-TWD samples was 0.0006. When participants were randomly assigned to groups, the divergence for the high-TWD samples was 0.0009 and the divergence of the low-TWD samples was 0.0014. Overall, these low divergence values indicate that participants were almost uniformly distributed across the groups in the simulated samples.
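A check of this kind can be sketched as follows (Python, for illustration). The sketch compares how often each participant was drawn against a uniform expectation using the standard definition, the sum over participants of p · log(p / q). The natural logarithm, the function name, and the example draws are assumptions; the text does not state which variant or log base was used.

```python
import math
from collections import Counter


def kl_divergence_from_uniform(draws, population):
    """Kullback-Leibler divergence of observed draw frequencies from a
    uniform distribution over the population (natural log)."""
    counts = Counter(draws)
    n_draws = len(draws)
    q = 1 / len(population)             # expected uniform proportion
    total = 0.0
    for member in population:
        p = counts[member] / n_draws    # observed proportion
        if p > 0:                       # terms with p == 0 contribute nothing
            total += p * math.log(p / q)
    return total


# Hypothetical draws from a population of three participants.
population = ["p01", "p02", "p03"]
even_draws = ["p01", "p02", "p03", "p01", "p02", "p03"]
uneven_draws = ["p01", "p01", "p01", "p01", "p02", "p03"]
print(kl_divergence_from_uniform(even_draws, population))
print(kl_divergence_from_uniform(uneven_draws, population))
```

Evenly spread draws give a divergence of 0, and the value grows as some participants are drawn far more often than others.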

Only the GEE was capable of providing a single statistical comparison to determine if there was an overall difference in discounting based on whether a person was more likely to text while driving (high-TWD) or less likely (low-TWD). In the experimentally obtained data set, there were significant differences in indifference points based on group membership (χ^{2} = 276.2,

The Monte Carlo simulations allow for an assessment of how likely it is that the results obtained from the experimental data set were due to chance sampling. When participants were resampled from within their respective experimental groups, there was a significant difference in the average indifference points between the groups in 988 of the 1,000 simulated samples. When participants were randomly assigned to the experimental groups, there was a significant difference in the average indifference points between the groups in 54 of the 1,000 simulated samples.
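One way to read the random-assignment count is to convert it to a rate and compare it with what a nominal alpha would produce by chance. The sketch below (Python, for illustration) uses a normal approximation to the binomial and assumes independent simulated samples and an alpha of .05; neither the approximation nor the function names come from the reported analysis.

```python
import math


def false_positive_rate(n_significant, n_simulations):
    """Proportion of null simulations that produced a significant result."""
    return n_significant / n_simulations


def expected_count_range(n_simulations, alpha=0.05, z=1.96):
    """Approximate 95% range for the number of false positives when the
    true Type I error rate equals alpha (normal approximation)."""
    mean = n_simulations * alpha
    sd = math.sqrt(n_simulations * alpha * (1 - alpha))
    return mean - z * sd, mean + z * sd


rate = false_positive_rate(54, 1000)
low, high = expected_count_range(1000)
print(rate)                        # 0.054
print(round(low, 1), round(high, 1))
print(low <= 54 <= high)           # True
```

Under these assumptions, 54 significant results in 1,000 null simulations falls within the range expected for a test with a 5% false-positive rate.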

The effect of delay on probability discounting across both groups combined was assessed statistically with the GEE and the conventional parametric tests. The non-parametric Friedman’s tests were also conducted within each group; as described above, these tests do not provide exactly the same information as the conventional parametric and GEE tests, which collapse the data across the groups. For the experimentally obtained data, with the GEE there was a significant effect of delay on indifference points for probability discounting (χ^{2} = 131.2

The Monte Carlo simulation provided measures of how likely each statistical comparison was to yield a significant effect, as well as how often the GEE and the conventional parametric statistical tests agreed. When participants were selected from within groups, there was a significant effect of delay to the destination on the degree of probability discounting 992 times for the GEE, 1,000 times for the conventional parametric test, and 1,000 times for the non-parametric tests in both the high- and low-TWD groups. The pattern of results obtained from the GEE was the same as the pattern obtained with the conventional parametric tests 992 times, indicating that the GEE and the conventional parametric test provided similar results. When participants were randomly assigned to groups, there was a significant effect of delay to the destination on the degree of probability discounting 994 times for the GEE, 1,000 times for the conventional statistical test, and 1,000 times for the non-parametric tests in both the high- and low-TWD groups. Across both sets of simulations, all three statistical techniques produced similar results.

The effect of odds against a car crash on the degree of delay discounting shared identical features (with different variables) to the analyses described above. Therefore, the nature of the statistical comparisons was the same. In the experimentally obtained data, in the GEE there was a significant effect of odds against a car crash on indifference points for delay discounting (χ^{2} = 4935,

The Monte Carlo simulation provided measures of how likely each statistical comparison was to yield a significant effect, as well as how often the GEE and the conventional parametric statistical tests agreed. When participants were selected from within the groups, all four statistical tests found significant effects for all 1,000 simulated samples. The pattern of results obtained from the GEE was the same as the pattern of results obtained with the conventional parametric test for each of the 1,000 simulated samples. When participants were randomly assigned to groups, the comparisons were statistically significant for the GEE, the conventional parametric test, and the non-parametric tests for all 1,000 simulated samples.

The interaction of whether a person was more likely to text while driving or less likely to text while driving (i.e., group membership) and the effect of delay to the destination on probability discounting could be assessed statistically with the GEE and the conventional parametric statistics. For the experimental data set, with the GEE there were differences across the groups in the effect that delay to the destination had on indifference points (i.e., a significant interaction; χ^{2} = 26.4

The Monte Carlo simulation was again used to determine the likelihood of a significant interaction of group membership and the delay to the destination, as well as the degree of agreement between the GEE and the conventional parametric tests. When participants were selected from within groups, there was a statistically significant interaction detected with the GEE 369 times and a significant interaction detected with the conventional parametric test 708 times. The Monte Carlo simulation revealed that the GEE and the conventional statistical test agreed only 383 times. When participants were randomly assigned to the groups, there was a significant interaction detected only 66 times with the GEE and only 93 times with the conventional statistical test.

The interaction of whether a person was more likely to text while driving or less likely to text while driving (i.e., group membership) and the effect of odds against a car crash on delay discounting could be assessed statistically with the GEE and the conventional parametric statistics. For the experimentally obtained data, with the GEE there were significant differences across the groups in the effect that odds against a car crash had on indifference points (i.e., a significant interaction; χ^{2} = 26.4,

The Monte Carlo simulations were used to derive the same measures described above. When participants were selected from within groups, there was a statistically significant interaction detected 971 times with the GEE and 724 times with the conventional statistical tests. The GEE and conventional parametric tests agreed for 745 of the 1,000 simulated samples. When participants were randomly assigned to the groups, there was a significant interaction detected only 78 times with the GEE and only 88 times with the conventional statistical test.

The post-hoc comparisons of the degree of probability discounting between the groups could be examined with the GEE, the conventional statistical tests, and the non-parametric tests.

In this study, we compared the effectiveness of three different statistical approaches in describing discounting data using Monte Carlo simulations. The Monte Carlo simulations were used to create data sets from an experimentally obtained data set. Participants were randomly selected with replacement and placed into simulated data sets. Statistical analyses were then conducted on the simulated data sets. In one set of simulations, participants were only randomly selected for a group if they were members of that group in the experimental data set. In the second set of simulations, participants were randomly assigned to a group. Across the three analysis types, the Monte Carlo simulations found a pattern of results that was generally similar to what was obtained with the experimental data set. Where it was possible to assess the agreement between significant effects in the GEE analysis and significant effects in the conventional parametric tests, the two types of analysis were largely in agreement. Finally, when participants were randomly assigned to groups (i.e., an imposed null effect of group membership), the analyses found significant effects in only a small number of the simulated samples, which was in line with the expected Type I error rate of .05.

The GEE provided a pattern of results similar to that of conventional mixed-effects ANOVAs. Across the 1,000 simulations in which participants were resampled from within their respective experimental groups, the GEE and the conventional parametric statistical tests had a high degree of agreement in terms of whether or not statistically significant effects were detected in the simulated samples. The GEE and conventional tests had a low degree of agreement only for the comparison that tested for an interaction between group membership and delay to the destination on the degree of probability discounting. Additionally, the experimental data were not normally distributed, and it is reasonable to assume that the majority of the simulated samples did not have normally distributed data. Despite this violation, both the GEE and the conventional parametric tests reliably found statistical differences across the simulated samples, and there was a high degree of agreement in the pattern of results found by the tests. It is therefore reasonable to conclude that the GEE is in line with more conventionally used statistical tests and can be a useful and reliable tool for NHST with discounting data (e.g.,

The results of the GEE and the conventional statistical tests diverged when an interaction between group membership and delay to the destination on the degree of probability discounting was tested. There are several possible reasons for this pattern of disagreement. It is possible that, for this specific comparison, the true difference between the groups was very small, so the simulation procedure could not reliably produce samples in which a difference existed. If such a scenario occurred, the GEE would likely be more conservative than the conventional mixed-effects ANOVA because the GEE accounts for the entirety of the simulated data set all at once, whereas the mixed-effects ANOVA accounts for only some features of the data.

An alternative explanation for this discrepancy might lie in the distributions of the obtained indifference points and the derived AUC.

Considering that the GEE appears to produce patterns of results that are generally similar to those of more conventional parametric tests, it is important to note that the GEE has several properties that make it potentially superior to the conventional mixed-effects ANOVA and non-parametric statistics. First, a GEE can easily be conducted on the obtained indifference points instead of a derived measure like AUC or an output from model fitting. This allows the GEE to compare differences across measures of interest (e.g., group membership) without making any underlying assumptions about the nature of the data. There are two key benefits of conducting an analysis on the indifference points instead of on a summary measure or the output from model fitting. The first benefit is that the indifference points provide a richer data source. When calculating AUC or a model output, several indifference points are collapsed into a single measure. By collapsing the data, both degrees of freedom for later analyses and variability within the collapsed data are lost. Any time degrees of freedom or data richness are lost, future comparisons become more difficult. For example, imagine a scenario in which a researcher had two subjects and obtained seven indifference points per subject. It would not be possible to compare the degree of discounting across the subjects using AUC or an output from a model because the seven indifference points have been collapsed into a single data point per subject. Because the indifference points are a richer data source, it would be possible to use a GEE to compare the degree of discounting between those subjects, although in this extreme case researchers would not typically compare the differences between two subjects. Relying on a richer data source is generally a better analytic strategy.

A second benefit of GEE is that it can, potentially, abate a common problem in the characteristics of obtained AUC values or model outputs. There are several well-known limitations of using AUC or the results of model fits to analyze delay discounting data (

Finally, one limitation of analyses that rely on categorical comparisons (i.e.,

This study relied on a Monte Carlo method to simulate data sets so that many “artificial experiments” could be conducted to compare different types of analyses. This study provides further evidence that Monte Carlo methods are a tool that should be more commonly used by behavior analysts. Monte Carlo methods can be used to answer a wide variety of quantitative questions. For example

In conclusion, there are many potential benefits to analyzing discounting data with a GEE. Often, more conventional and simpler statistical tests are appropriate for discounting data. However, as behavior analysts begin to ask more complex questions that lead to extremely complex data structures, it is important that we use the proper statistical tools for the data at hand. The GEE might not be the best or most appropriate statistical test in every situation, but we encourage authors to be as careful with their decision-making about statistical analyses as they are with decision-making about experimental designs.

Authors’ note: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention. The authors would like to thank Amy Odum and Oliver Wirth for providing comments on an earlier version of this manuscript and Yusuke Hayashi for thoughtful conversations in developing this study. Correspondence concerning this article should be addressed to Jonathan Friedel.

Histograms of all indifference points, AUC for delay discounting, and AUC for probability discounting from

Indifference points and AUC values from

List of statistical comparisons included in the Monte Carlo Simulations

Analysis | Conv. | Non-par. | GEE |
---|---|---|---|
Main effect of group on discounting | - | - | X |
Effect of delay on probability discounting | X | * | X |
Effect of probability on delay discounting | X | * | X |
Interaction of group and delay on probability discounting | X | - | X |
Interaction of group and probability on delay discounting | X | - | X |
Post-hoc: Differences in probability discounting between groups at each level of delay | X | X | X |
Post-hoc: Differences in delay discounting between groups at each level of probability | X | X | X |

Counts of significant post-hoc tests for between-group differences in probability discounting at each delay to the destination

Delay to the destination | Within-group resampling: Conv. | Within-group resampling: Non-par. | Within-group resampling: GEE | Random assignment: Conv. | Random assignment: Non-par. | Random assignment: GEE |
---|---|---|---|---|---|---|
30 seconds | 636 | 627 | 468 | 45 | 44 | 67 |
3 minutes | 920 | 984 | 877 | 44 | 51 | 54 |
15 minutes | 999 | 1000 | 966 | 44 | 49 | 52 |
1 hour | 998 | 998 | 967 | 42 | 43 | 55 |
3 hours | 997 | 998 | 996 | 41 | 34 | 43 |

Note:

indicate that a statistically significant difference between the groups was found in the experimentally obtained data (

Counts of significant post-hoc tests for between-group differences in delay discounting at each level of odds against a car crash

Odds against a car crash | Within-group resampling: Conv. | Within-group resampling: Non-par. | Within-group resampling: GEE | Random assignment: Conv. | Random assignment: Non-par. | Random assignment: GEE |
---|---|---|---|---|---|---|
9 | 910 | 932 | 754 | 36 | 55 | 58 |
99 | 982 | 973 | 963 | 41 | 43 | 51 |
299 | 997 | 993 | 988 | 41 | 39 | 53 |
999 | 998 | 998 | 991 | 39 | 35 | 47 |
2999 | 998 | 998 | 998 | 51 | 39 | 48 |

Note:

indicate that a statistically significant difference between the groups was found in the experimentally obtained data (