A t-SNE-Plot-Aided Generalized F-test for Analysis of Gene Expression Data - Abstract
In this paper, we introduce a novel approach for analyzing gene expression data that integrates t-distributed Stochastic Neighbor Embedding (t-SNE) for data clustering with a generalized F-test for multiple mean comparison. High-dimensional gene expression data often pose challenges because the number of features exceeds the total sample size across the individual clusters, which limits the applicability of traditional multivariate methods such as Multivariate Analysis of Variance (MANOVA). We first use t-SNE to perform nonlinear dimensionality reduction and cluster the gene expression data, providing clear visual separation of the groups. A generalized F-test is then applied to compare the mean expression levels across these clusters. The method is further enhanced by projecting the data onto lower-dimensional spaces using Principal Component Analysis (PCA), so that conclusions remain robust across different projection spaces. Our approach provides an efficient solution to the problem of multiple mean comparison in high-dimensional settings, where traditional methods fall short. We demonstrate the effectiveness of the proposed method through a case study involving real gene expression data, highlighting its practical utility for researchers in genomics and bioinformatics. Future work will explore post-hoc analyses after rejecting the null hypothesis of equal mean expression levels.
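
As a rough illustration of the pipeline summarized above, the following Python sketch embeds the data with t-SNE, assigns cluster labels, projects the original data with PCA, and tests for equal cluster means on each projected component. It is a minimal sketch under stated assumptions, not the method of the paper: the generalized F-test is not specified in this abstract, so the classical one-way ANOVA F-test from SciPy is used as a stand-in; k-means is assumed as the way to turn the t-SNE embedding into cluster labels; and the function name `tsne_then_f_test`, the parameter values, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def tsne_then_f_test(X, n_clusters=3, n_components=2, seed=0):
    """Cluster samples via t-SNE + k-means, then F-test each PCA score."""
    # Step 1: nonlinear 2-D embedding, used to separate the groups visually.
    embedding = TSNE(n_components=2, perplexity=10,
                     random_state=seed).fit_transform(X)
    # Step 2: cluster labels from the embedding (k-means is an assumption here).
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embedding)
    # Step 3: project the original high-dimensional data onto a few PCs.
    scores = PCA(n_components=n_components).fit_transform(X)
    # Step 4: classical one-way ANOVA F-test on each PC score
    # (a stand-in for the generalized F-test, which is not given here).
    results = []
    for j in range(n_components):
        groups = [scores[labels == k, j] for k in range(n_clusters)]
        f_stat, p_value = f_oneway(*groups)
        results.append((f_stat, p_value))
    return labels, results


# Synthetic stand-in for expression data: 3 groups of 20 samples, 200 features,
# so the number of features exceeds the total sample size.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 200))
X = np.vstack([c + rng.normal(size=(20, 200)) for c in centers])
labels, results = tsne_then_f_test(X)
```

Each entry of `results` is an `(F, p)` pair for one principal component. Repeating the call with different values of `n_components` gives a simple way to see whether the conclusion changes with the projection space.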