ArraySolver for analysis of gene expression data
Microarray technology is a versatile technique for measuring the expression of thousands of genes simultaneously in a single experiment. However, extracting meaningful information from such large data sets is a major challenge. The raw microarray data must first be normalized to correct for slide-to-slide experimental variation before any statistical interpretation can be meaningfully carried out. One of the major goals of microarray data analysis is to identify genes that are differentially expressed between two or more kinds of samples or experimental conditions. Both parametric and nonparametric approaches have been applied for this purpose. Gene clustering (hierarchical grouping), on the other hand, is a commonly used computational tool for molecular classification of disease states, functional grouping of genes, and biological description of gene regulation. The strategies of filtering differentially expressed genes and functional clustering are usually applied in tandem for molecular classification of gene signatures or fingerprints with embedded diagnostic and prognostic features.
The use of an appropriate statistical method for two-group comparisons (e.g. normal versus diseased) is an important criterion for the effective application of gene signatures. Cluster analysis cannot be considered a valid method for comparing gene expression between two samples or groups. Similarly, tools for determining differentially expressed genes tend to apply filters that would disturb the basic configuration of gene signatures and would not be suitable for an integrated two-group comparison. Although normalization of microarray data might validate parametric statistics for detecting differences between the two groups, a nonparametric (distribution-free) approach seems to be more reliable and appropriate for such a data structure. The Wilcoxon matched-pairs signed-rank test (the nonparametric counterpart of the paired t-test) examines the differences between dependent groups and could be more useful for analyzing microarray expression data. The Wilcoxon signed-rank test has been applied for pair-wise comparison of gene expression data obtained from reverse-transcription PCR, real-time PCR, in-situ hybridization, immunohistochemistry and laser dosimetry. However, its potential application to microarray data analysis has largely been neglected, possibly because of the computational complexity involved, especially when the number of pairs is large.
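To make the test concrete, the following is a minimal Python sketch of the Wilcoxon matched-pairs signed-rank test for two paired groups of expression values. It is not the ArraySolver implementation, which runs inside Microsoft Excel. The function name `wilcoxon_signed_rank` and the sample values are hypothetical. The sketch also assumes the usual textbook choices: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value comes from the normal approximation without continuity correction, which is reasonable only when the number of pairs is fairly large.

```python
import math


def wilcoxon_signed_rank(group_a, group_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, w_minus, z, p_value) for paired observations.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups must contain the same number of values")

    # Paired differences; pairs with no difference carry no sign and are dropped.
    diffs = [b - a for a, b in zip(group_a, group_b) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum the ranks of positive and negative differences separately.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Mean and tie-corrected variance of W+ under the null hypothesis.
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return w_plus, w_minus, z, p_value


if __name__ == "__main__":
    # Hypothetical expression values for the same genes in two groups.
    normal = [2.1, 3.4, 1.8, 5.0, 4.2, 2.9, 3.7, 6.1]
    diseased = [2.9, 3.1, 2.6, 6.4, 4.0, 3.8, 4.9, 7.5]
    w_plus, w_minus, z, p = wilcoxon_signed_rank(normal, diseased)
    print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}, p = {p:.4f}")
```

For small numbers of pairs an exact null distribution is generally preferred over the normal approximation used here. Statistical libraries such as SciPy provide a tested implementation.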
Figure 1. Display window showing color-coded gene expression of 2 groups. The program was executed after choosing the option of differentially expressed genes.
We have developed ArraySolver, a Microsoft Excel-based tool that reduces the complexity of gene expression data by using color-coded graphics and performs the Wilcoxon signed-rank test within the same framework. ArraySolver is a convenient tool for the analysis and interpretation of gene expression data. The color-coded graphical display reduces the complexity of tabular data, whereas the Wilcoxon signed-rank test provides an appropriate and reliable statistical analysis. Although ArraySolver can handle very large data sets, it is highly desirable to apply this software to pre-filtered data or gene signatures for meaningful interpretation of results.
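As a rough illustration of how color coding can replace a table of numbers, the sketch below maps an expression value to an RGB color. The green-black-red scale is a common heat-map convention and is assumed here. The source does not specify ArraySolver's actual color scheme, and the function name `expression_color` and its `low` and `high` bounds are hypothetical.

```python
def expression_color(value, low, high):
    """Map an expression value to a hex color on a green-black-red scale.

    Values at or below `low` are pure green, values at or above `high`
    are pure red, and the midpoint of the range is black.
    """
    if high <= low:
        raise ValueError("high must be greater than low")

    # Rescale to [-1, 1] around the midpoint, clamping out-of-range values.
    midpoint = (low + high) / 2
    half_range = (high - low) / 2
    scaled = max(-1.0, min(1.0, (value - midpoint) / half_range))

    intensity = round(abs(scaled) * 255)
    if scaled >= 0:
        return f"#{intensity:02x}0000"  # above midpoint: shades of red
    return f"#00{intensity:02x}00"      # below midpoint: shades of green


if __name__ == "__main__":
    # Hypothetical expression values on a 0-10 scale.
    for value in (0.0, 2.5, 5.0, 7.5, 10.0):
        print(value, expression_color(value, low=0.0, high=10.0))
```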
Khan HA (2004) ArraySolver: an algorithm for color-coded graphical display and Wilcoxon signed-rank statistics for comparing microarray gene expression data. Comp. Func. Genom. 5 (1), 39-47.