Overcoming bias when analysing transcriptomics data


In a nutshell: A common method for linking gene expression with brain structure and function is affected by substantial bias. A new software toolbox can help researchers overcome it.

In recent years, efforts to understand the brain have been enhanced by transcriptomic profiling – a measure of the expression levels of every gene in a genome, in every cell or tissue across the brain. This information helps to link the brain’s molecular activity with observable measures relating to its structure or function (known as ‘phenotypes’).

New research from the Brain Function CoE shows that a common method for discovering the links between gene expression and phenotype is affected by substantial bias.

To link gene expression to a brain phenotype, researchers use an approach called gene category enrichment analysis (GCEA). GCEA uses statistical methods to score how well the expression of each gene correlates with a particular phenotype. The genes are then grouped by category, and their scores are combined. GCEA then measures the statistical significance of each category’s combined score, which identifies the gene categories most strongly related to the phenotype.
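
The steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' toolbox: it assumes Pearson correlation as the per-gene score, the mean as the combined category score, and a permutation test against random gene sets of the same size as the significance test. All function names are hypothetical.

```python
import numpy as np


def gene_scores(expression, phenotype):
    """Score each gene by the Pearson correlation between its expression
    across brain regions and the phenotype.

    expression: array of shape (n_regions, n_genes)
    phenotype:  array of shape (n_regions,)
    """
    x = expression - expression.mean(axis=0)
    y = phenotype - phenotype.mean()
    return x.T @ y / (np.linalg.norm(x, axis=0) * np.linalg.norm(y))


def category_score(scores, gene_idx):
    """Combine the scores of the genes in one category (here: their mean)."""
    return scores[gene_idx].mean()


def random_gene_p_value(scores, gene_idx, n_null=2000, rng=None):
    """Assumed significance test: compare the category's score with the
    scores of randomly chosen gene sets of the same size."""
    rng = np.random.default_rng(rng)
    observed = category_score(scores, gene_idx)
    k = len(gene_idx)
    null = np.array([
        scores[rng.choice(scores.size, size=k, replace=False)].mean()
        for _ in range(n_null)
    ])
    # One-sided p-value with the usual +1 correction for permutation tests.
    return (1 + np.sum(null >= observed)) / (1 + n_null)
```

Given an expression matrix and a phenotype, calling `gene_scores` and then `random_gene_p_value` for each category gives one p-value per category, and the categories with the smallest p-values are the ones reported as enriched.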

Brain Function CoE researchers Ben Fulcher and Alex Fornito, together with Aurina Arnatkeviciute from Monash University, examined the statistical biases involved in using GCEA with transcriptomic data. They found that the rate at which a particular gene category is linked to a random phenotype is much higher than would be expected by chance. This leads to false positives – associations reported where there really are none. For some gene categories, the researchers found that more than 20% of associations were false positives.
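
The idea of testing a category against random phenotypes can be illustrated with a small simulation. The sketch below is not the researchers' analysis: the data are synthetic, the function names are hypothetical, and it assumes, for illustration only, that the genes in one category tend to be expressed together (the article does not name the causes of the bias). It counts how often a category is flagged as significant when the phenotype is pure noise.

```python
import numpy as np


def correlate_columns(expression, phenotype):
    """Pearson correlation between each gene (column) and the phenotype."""
    x = expression - expression.mean(axis=0)
    y = phenotype - phenotype.mean()
    return x.T @ y / (np.linalg.norm(x, axis=0) * np.linalg.norm(y))


def simulate_expression(n_regions=40, n_genes=300, n_coexpressed=20, seed=0):
    """Synthetic data (an assumption): the first n_coexpressed genes share a
    common spatial pattern, all other genes are independent noise."""
    rng = np.random.default_rng(seed)
    expression = rng.normal(size=(n_regions, n_genes))
    shared = rng.normal(size=n_regions)
    expression[:, :n_coexpressed] += 2.0 * shared[:, None]
    return expression


def false_positive_rate(expression, gene_idx, n_phenotypes=200,
                        n_null=200, alpha=0.05, seed=1):
    """Fraction of random phenotypes for which the category is called
    significant by a random-gene permutation test."""
    rng = np.random.default_rng(seed)
    n_regions, n_genes = expression.shape
    k = len(gene_idx)
    hits = 0
    for _ in range(n_phenotypes):
        # A random phenotype that is unrelated to every gene by construction.
        phenotype = rng.normal(size=n_regions)
        scores = correlate_columns(expression, phenotype)
        observed = scores[gene_idx].mean()
        null = np.array([
            scores[rng.choice(n_genes, size=k, replace=False)].mean()
            for _ in range(n_null)
        ])
        p = (1 + np.sum(null >= observed)) / (1 + n_null)
        hits += int(p < alpha)
    return hits / n_phenotypes
```

In this toy setup, `false_positive_rate(expr, np.arange(20))` for the co-expressed category tends to come out well above the nominal 5%, while a category of independent genes such as `np.arange(20, 40)` stays close to it. Every one of those hits is a false positive, because the phenotypes are random.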

After identifying the causes of this false-positive bias, the researchers designed a new GCEA approach that overcomes it by measuring statistical significance in a different way. Their software toolbox, which can perform both conventional GCEA and the new approach, is freely available online.
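
The article does not describe how the new approach measures significance. One way to change the reference point, assumed here purely for illustration, is to compare a category's score with the scores the same category obtains for an ensemble of randomized phenotypes, rather than with the scores of random gene sets. The sketch below uses simple shuffles of the phenotype across regions as that ensemble; the function names are hypothetical and this should not be read as the toolbox's implementation.

```python
import numpy as np


def category_score(expression, phenotype, gene_idx):
    """Mean Pearson correlation between the phenotype and each gene
    in the category."""
    x = expression[:, gene_idx]
    x = x - x.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = x.T @ y / (np.linalg.norm(x, axis=0) * np.linalg.norm(y))
    return r.mean()


def phenotype_shuffle_p_value(expression, phenotype, gene_idx,
                              n_null=1000, seed=0):
    """Assumed alternative test: keep the category's genes fixed and
    randomize the phenotype instead of the genes."""
    rng = np.random.default_rng(seed)
    observed = category_score(expression, phenotype, gene_idx)
    null = np.array([
        category_score(expression, rng.permutation(phenotype), gene_idx)
        for _ in range(n_null)
    ])
    # One-sided p-value with the usual +1 correction for permutation tests.
    return (1 + np.sum(null >= observed)) / (1 + n_null)
```

Because the category's genes stay the same in every null sample, any relationship among those genes is present in both the observed score and the null scores. Note that plain shuffling ignores any spatial structure in the phenotype, so a realistic null ensemble would need to be chosen with more care.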

Next steps:
The team has no plans to do more work in this research area.

Fulcher, B. D., Arnatkeviciute, A., & Fornito, A. (2021). Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nature Communications, 12, 2669. doi: 10.1038/s41467-021-22862-1

Republish this article:

We believe in sharing knowledge. We use a Creative Commons Attribution 4.0 International License, which allows unrestricted use of this content, subject only to appropriate attribution. So please use this article as is, or edit it to fit your purposes. Referrals, mentions and links are appreciated.