CalcFisher for performing Fisher’s exact test
Fisher’s exact test (FET) calculates the exact probability value for the relationship between two dichotomous variables. It is an extremely useful non-parametric method for analyzing statistical association between two independent sample groups and is commonly used for analyzing clinical and experimental data in biomedical research. The results of FET are expressed as an exact probability (P-value) ranging between 0 and 1. The data for FET are conveniently represented by a 2×2 table, made of 2 rows and 2 columns. The two rows are two independent groups and the two columns represent the two effects or conditions. Although the calculations required for FET are fairly straightforward, the construction of additional 2×2 tables and the computation of the respective probabilities using the factorial formula entail considerable time and effort, especially when the lowest cell value is high. We have developed a computer program to solve the complexities involved in factorial computations for data analysis using FET.
Figure 1. Construction of 2×2 contingency table and formulae for P-value calculation.
The design of 2×2 contingency tables provides a comprehensive view of the data to be analyzed. There are four input parameters (frequencies) belonging to two different groups. The top row is one group and the bottom row is another group, whereas the ‘+’ and ‘–’ signs above the two columns indicate the presence or absence of a certain condition, respectively. The standard formula (Formula 1) for calculating the P-value is shown in Table 1. If the smallest cell value in the contingency table is 0, then only one exact probability has to be calculated, which is the simplest form of FET. However, if none of the cell frequencies is 0, more extreme deviations from the distribution could occur with the same marginal totals; thus, all those possible deviations must be considered and the respective probabilities summed for testing the null hypothesis. For instance, if the smallest cell value is 2, then three exact probabilities (using smallest cell values 2, 1 and 0) must be determined and then summed to get the exact P-value.
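The summation described above can be illustrated with a minimal Python sketch. It is not the CalcFisher source code; it assumes that Formula 1 is the usual factorial expression for a 2×2 table (product of the factorials of the four marginal totals, divided by the factorial of the grand total times the factorials of the four cells). The function names and the example frequencies are hypothetical.

```python
from fractions import Fraction
from math import factorial


def table_probability(x1, x2, x3, x4):
    """Exact probability of one 2x2 table (rows: x1 x2 / x3 x4)."""
    t1, t2 = x1 + x2, x3 + x4          # row totals
    t3, t4 = x1 + x3, x2 + x4          # column totals
    total = t1 + t2                    # all subjects (X)
    numerator = factorial(t1) * factorial(t2) * factorial(t3) * factorial(t4)
    denominator = (factorial(total) * factorial(x1) * factorial(x2)
                   * factorial(x3) * factorial(x4))
    return Fraction(numerator, denominator)


def summed_exact_p(x1, x2, x3, x4):
    """Sum the probabilities of the observed table and of every more
    extreme table obtained by lowering the smallest cell down to 0."""
    cells = [x1, x2, x3, x4]
    i = cells.index(min(cells))        # position of the smallest cell
    # The smallest cell and its diagonal partner (index 3 - i) decrease
    # together; the other two cells increase, so all marginals stay fixed.
    step = [-1 if j in (i, 3 - i) else 1 for j in range(4)]
    p = Fraction(0)
    for k in range(cells[i] + 1):
        p += table_probability(*[c + k * s for c, s in zip(cells, step)])
    return p


# Hypothetical table with smallest cell 2: three probabilities are summed.
p_value = summed_exact_p(2, 8, 7, 3)
print(float(p_value))
```

Python integers have arbitrary precision, so the factorials here do not overflow; the sketch only shows the arithmetic, not the numeric limits of a fixed-width floating-point type.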
Our preliminary efforts while developing this Visual Basic application showed that the standard Formula 1 could only be used for up to a total of 113 subjects (X = 113); beyond that, the output of the factorial computations exceeds the range of Visual Basic. Consequently, a modified procedure (Formula 2, Table 1) based on logarithmic conversions was used to perform FET for a wider range of frequencies. According to this procedure, the program computes the P-value of the original frequencies as the antilogarithm of the value obtained by subtracting the logarithm of the factorial of the total subjects (X) and the sum of the logarithms of the factorials of the individual frequencies (x1, x2, x3 and x4) from the sum of the logarithms of the factorials of the row and column totals (t1, t2, t3 and t4). The program then identifies the minimum frequency in the 2×2 table, subtracts 1 from this frequency and adjusts the remaining frequencies in the table so that the row totals (t1 and t2) and column totals (t3 and t4) remain constant. The resulting set of frequencies is also used to compute the respective P-value. The whole process of subtracting 1 from the current minimum frequency, adjusting the remaining 3 frequencies and computing the P-value is repeated until the least frequency becomes 0. All the P-values (obtained by using the least frequency 0, 1, 2, ..., xmin) are summed to get the exact P-value (Formula 2). The software rounds the P-values to the fifth decimal place for ease of presentation.
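The logarithmic procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the Visual Basic code of CalcFisher: log-factorials are obtained here with `math.lgamma` (an implementation choice of this sketch, since ln n! = lgamma(n + 1)), and the function names and example frequencies are hypothetical.

```python
from math import exp, lgamma


def log_factorial(n):
    """Natural logarithm of n!, computed without forming n! itself."""
    return lgamma(n + 1)


def log_table_probability(x1, x2, x3, x4):
    """Log of the exact probability of one 2x2 table (rows: x1 x2 / x3 x4)."""
    t1, t2 = x1 + x2, x3 + x4          # row totals
    t3, t4 = x1 + x3, x2 + x4          # column totals
    total = t1 + t2                    # all subjects (X)
    return (log_factorial(t1) + log_factorial(t2)
            + log_factorial(t3) + log_factorial(t4)
            - log_factorial(total)
            - log_factorial(x1) - log_factorial(x2)
            - log_factorial(x3) - log_factorial(x4))


def fisher_p_log(x1, x2, x3, x4):
    """Sum exp(log-probability) while the smallest cell is stepped to 0."""
    cells = [x1, x2, x3, x4]
    i = cells.index(min(cells))        # smallest cell
    partner = 3 - i                    # its diagonal partner
    p = 0.0
    while True:
        p += exp(log_table_probability(*cells))   # antilogarithm
        if cells[i] == 0:
            break
        # Subtract 1 from the smallest cell and adjust the other three
        # so that row and column totals remain constant.
        for j in range(4):
            cells[j] += -1 if j in (i, partner) else 1
    return p


# Hypothetical example, rounded to five decimals for presentation.
print(round(fisher_p_log(2, 8, 7, 3), 5))
```

Because only sums and differences of logarithms are formed, the intermediate values stay small even for tables with hundreds of subjects per cell, for example `fisher_p_log(300, 200, 250, 250)`.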
Commonly used statistical packages, including SPSS and EPI-INFO, can also be used for computing FET, but only under certain conditions. The former calculates Fisher’s exact P-value only when the total number of subjects is twenty or less, whereas the latter performs FET when any of the expected values is less than five. CalcFisher, on the other hand, computes P-values irrespective of cell frequencies and can therefore be used to apply FET to any data set. Both SPSS and EPI-INFO basically compute the χ² statistic in the 2×2 table format when the cell values are high. In fact, the χ² test is an approximation to FET and, when applied with an appropriate continuity correction, leads to a fair approximation to the exact probability. However, the estimate of probability in the χ² test may not be very accurate if the marginal totals are very uneven or if one of the values is very small. FET, by contrast, is a valid procedure for any number of frequencies and can easily be performed using CalcFisher. Thus, the complexity of factorial computations can be greatly simplified by using logarithmic methodology. Log-based computations are highly suitable for developing Visual Basic applications, as they involve fewer operations and also keep the output of intermediate steps within the permissible range of Visual Basic. The operational simplicity and integrated report format of CalcFisher make it a handy tool for performing Fisher’s exact test.
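For readers who want to see the χ² approximation next to the exact test, the following Python sketch computes the χ² statistic with Yates’ continuity correction for a 2×2 table. It is an illustration only and is not taken from SPSS, EPI-INFO or CalcFisher; the function name and the example frequencies are hypothetical. It assumes one degree of freedom, for which the upper-tail probability of the χ² distribution equals erfc(sqrt(χ²/2)), and it assumes that none of the marginal totals is 0.

```python
from math import erfc, sqrt


def yates_chi_square(x1, x2, x3, x4):
    """Continuity-corrected chi-square statistic and its two-sided
    P-value for a 2x2 table (rows: x1 x2 / x3 x4)."""
    t1, t2 = x1 + x2, x3 + x4          # row totals
    t3, t4 = x1 + x3, x2 + x4          # column totals
    n = t1 + t2                        # all subjects (X)
    # Yates' correction: shrink |x1*x4 - x2*x3| by n/2, never below 0.
    diff = max(0.0, abs(x1 * x4 - x2 * x3) - n / 2)
    chi2 = n * diff ** 2 / (t1 * t2 * t3 * t4)
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_two_sided = erfc(sqrt(chi2 / 2))
    return chi2, p_two_sided


chi2, p_two_sided = yates_chi_square(2, 8, 7, 3)
print(chi2, p_two_sided)
```

The P-value returned here is two-sided, so it should be halved before a rough comparison with an exact probability obtained by summing tables in one direction only.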
Khan HA (2003) A Visual Basic software for Fisher’s exact test. J. Stat. Software. 8 (21), 1-7.