By RAFAŁ WAŚKO (Predictive Solutions)
Pearson’s chi-square test is among the most popular statistical tests. It is worth noting at the outset that it has more than one application. In this material, I will discuss the main differences between its variants and introduce the most important issues related to the chi-square test.
To begin with, it is worth recalling the basics of Pearson’s chi-square test. The variant that probably comes to mind first is the chi-square independence test. In survey research in marketing, psychology or sociology, most of the variables at the analyst’s disposal are qualitative. The chi-square independence test is a popular way to analyse two such variables and determine whether there is a statistically significant relationship between them.
We can use the chi-square test of contingency (more widely known in English as the goodness-of-fit test) when we have a single qualitative variable. Often, but not always, the analyst expects the categories to have equal proportions, e.g., when the groups compared in a t-test for independent groups or in an analysis of variance are meant to be of equal size. The test allows us to check whether the frequency distribution of a categorical variable differs significantly from our expectation. In other words, the chi-square test of contingency is used to assess whether the empirical distribution of the data is consistent with the theoretical distribution specified by the null hypothesis.
A similar form of test is the chi-square test of homogeneity, which checks, for example, whether a variable has the same proportions in two groups. In general, the chi-square test of homogeneity is used to check whether the frequency distribution of a categorical variable differs from another defined distribution. It is used when the researcher wants to check whether there is a significant difference between the distributions of a categorical variable in at least two groups. Examples of questions that can be tested using the chi-square test of homogeneity include whether a certain event occurs equally often in different groups, or whether consumer preferences for different products differ between groups.
From a mathematical point of view, it is worth noting that these are actually the same tests. However, we often think of them as different tests because they are used for different purposes.
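One way to see why the computation looks the same is to work through a two-way table. The sketch below assumes the usual rule that the expected count in each cell equals row total times column total divided by the grand total. The function names and the counts are hypothetical and serve only as an illustration.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_from_table(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows are two groups, columns are two answers.
table = [[30, 20],
         [20, 30]]

chi2 = chi_square_from_table(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Whether the rows are read as separately sampled groups (homogeneity) or as a second variable measured on the same sample (independence), the arithmetic above does not change; only the interpretation does.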
CHI-SQUARE TEST FORMULA
The formulae for the homogeneity test and the contingency test are, in the main, very similar to each other. In both cases, the calculation of the chi-square statistic is based on observed and expected values:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
$\chi^2$ – chi-square test statistic,
$O_i$ – observed values,
$E_i$ – expected values,
$k$ – number of measurements/groups.
As can be seen, the formula is similar to that for the chi-square independence test. The greater the difference between the observed and expected values, the greater the value of the chi-square statistic will be. To decide whether the difference is statistically significant, compare the resulting test value with the table of critical values of the chi-square distribution.
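The relationship between the size of the differences and the value of the statistic can be illustrated with a minimal sketch. The function name and the counts are assumptions made for this illustration; they do not come from the article's data.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for a two-category variable, 100 observations.
expected = [50, 50]
close_fit = chi_square_statistic([52, 48], expected)  # small differences
poor_fit = chi_square_statistic([70, 30], expected)   # large differences

print(f"close fit: {close_fit:.2f}, poor fit: {poor_fit:.2f}")
```

The second set of counts lies much further from the expected values, so its statistic is correspondingly larger.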
EXAMPLE OF CALCULATING A CHI-SQUARE STATISTIC
We asked respondents if they did any physical activity at least once a week, e.g., running, gym, cycling. We received the following results:
Table 1. Engaging in physical activity at least once a week
We want to know whether the number of people who do physical activity at least once a week differs significantly from the number of those who do not. To do this, we will calculate the chi-square statistic. The easiest way to do this is to use a properly prepared table.
Table 2. Calculation of chi-square statistics for physical activity data
Having calculated the chi-square statistic, we still need to calculate the number of degrees of freedom (df) to answer the question posed above. The formula for the number of degrees of freedom is as follows:
df = k-1
k – number of categories.
In our example the variable has two categories (exercisers and non-exercisers), so the number of degrees of freedom is 2 − 1 = 1.
Then compare the chi-square value with the table of critical values of the chi-square distribution; for df = 1 and a significance level of 0.05 the critical value is 3.841. In our example the calculated statistic does not exceed the critical value, so the chi-square test showed no statistically significant difference between exercisers and non-exercisers.
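The whole procedure can be sketched in a few lines of Python. The counts below are hypothetical and are not the figures from Table 1; they were chosen only so that the calculation can be followed by hand. The p-value shortcut used here holds for df = 1 only.

```python
import math

# Hypothetical answers to the physical activity question (NOT Table 1 data).
observed = {"yes": 54, "no": 46}

n = sum(observed.values())   # total number of respondents
k = len(observed)            # number of categories
expected = n / k             # equal proportions under the null hypothesis

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1

# Critical value from the chi-square table for df = 1, alpha = 0.05.
CRITICAL_VALUE = 3.841

# For df = 1 only, the p-value equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

significant = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, df = {df}, significant: {significant}")
```

With these assumed counts the statistic is 0.64, well below 3.841, so the null hypothesis of equal proportions would not be rejected.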
CHI-SQUARE TEST OF CONTINGENCY AS A MEASURE OF VARIATION FOR QUALITATIVE VARIABLES IN PS IMAGO PRO
In this material, I have discussed the basic issues of the chi-square test of contingency and how we can calculate it without using a computer and statistical software. Let us now turn to a non-obvious application of this test, namely using it as a measure of variation for qualitative variables.
Let us return to the example of people exercising. If the numbers of exercisers and non-exercisers are the same, the value of the chi-square statistic will be 0. The same will be true if the variable under analysis has more than two categories with equal counts. In other words, the statistic takes its minimum value, 0, when the counts are distributed uniformly across the categories. If the value is close to 0, the variation between the categories of the variable under study can be interpreted as small. The maximum value, on the other hand, is reached when all observations fall into one category of the variable.
One of the procedures that allows the chi-square test statistic to be calculated in PS IMAGO PRO is Data Audit. The procedure prepares a summary of the analysed variables in the form of tables containing selected statistics, broken down into qualitative and quantitative variables. Let us analyse another example in which we have a variable with four categories.
Table 3. Distribution of the variable “Type of car body”
Table 4. Chi-square results for the analysed variable
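The reading of the statistic as a measure of variation can be checked numerically. The sketch below uses hypothetical counts for a four-category variable, not the data from Table 3, and it does not reproduce the output of the Data Audit procedure. It relies on the fact that, with equal expected counts, the statistic reaches n(k − 1) when all n observations fall into one of k categories. Dividing by that maximum is an additional assumption made here to obtain a value between 0 and 1.

```python
def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def max_chi_square(n, k):
    """Largest possible value: all n observations in one of k categories."""
    return n * (k - 1)


# Hypothetical distributions of 100 observations over four categories.
uniform = [25, 25, 25, 25]
mixed = [40, 30, 20, 10]
concentrated = [100, 0, 0, 0]

for name, counts in [("uniform", uniform), ("mixed", mixed),
                     ("concentrated", concentrated)]:
    chi2 = chi_square_uniform(counts)
    share = chi2 / max_chi_square(sum(counts), len(counts))
    print(f"{name}: chi-square = {chi2:.1f}, share of maximum = {share:.3f}")
```

The uniform distribution gives 0, the fully concentrated one gives the maximum of 300, and the mixed one falls in between.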