Table 2 Comparison of discussed methods for addressing heteroscedastic noise
From: Orbitrap noise structure and method for noise unbiased multivariate analysis
Type | Method | Description | Comments |
---|---|---|---|
No scaling | - | - | Overweights higher intensity peaks relative to lower intensity ones by an amount proportional to their difference (unbounded). |
Heuristic | Variance scaling | The noise variance vector is estimated as the sample variance. | The sample variance includes systematic spatial variation, which is erroneously treated as noise. It underestimates the significance of high-intensity over-dispersed peaks. |
Heuristic | Pareto scaling | The noise variance vector is estimated as the sample standard deviation. | The sample standard deviation includes systematic spatial variation, which is erroneously treated as noise. It underestimates the significance of low-intensity/censored peaks. |
Heuristic | Root-mean scaling | The noise variance is assumed to equal the global mean spectrum (i.e., the sample mean). | Overestimates the significance of low-intensity/censored peaks relative to high-intensity ones by an amount bounded by a constant factor. |
Heuristic | Log transform | Takes the logarithm of intensities in each spectrum. | From a noise equalisation standpoint, assumes noise variance is an exponential function of the signal. It is mathematically problematic for sparse datasets with a high fraction of zeros. |
Heuristic | Square-root transform | Takes the square-root of intensities in each spectrum. | Prone to overfitting the noise since each data element is assigned an individual estimate of its variance. These estimates can be exceedingly poor for low-intensity signals. |
Machine learning-based | PFA | Directly estimates the uncorrelated noise variance vector as part of an iterative matrix factorization algorithm. | Erroneously characterises uncorrelated systematic variation as noise, can be very time-consuming to compute, and is susceptible to outliers. |
Model-based (proposed in this work) | WSoR | The noise variance vector is computed based on a weighted-sum-of-Ricians statistical model. | - |
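
The heuristic rows of the table can be illustrated with a minimal sketch. It assumes the data are a list of spectra (rows) over a common set of peaks (columns), that scaling means dividing each peak by the square root of its estimated noise variance, and that no mean-centring is applied. The helper names `noise_variance_estimate`, `scale_peaks` and `transform`, and the `offset` added before the logarithm, are hypothetical and not taken from the paper. PFA and WSoR are not covered.

```python
import math
import statistics


def noise_variance_estimate(column, method):
    """Heuristic per-peak noise variance estimate (hypothetical helper)."""
    if method == "variance":    # variance scaling: sample variance
        return statistics.variance(column)
    if method == "pareto":      # Pareto scaling: sample standard deviation
        return statistics.stdev(column)
    if method == "root_mean":   # root-mean scaling: sample mean
        return statistics.fmean(column)
    raise ValueError(f"unknown scaling method: {method}")


def scale_peaks(data, method):
    """Divide each peak (column) by the square root of its variance estimate."""
    weights = []
    for column in zip(*data):
        var = noise_variance_estimate(column, method)
        # Assumption: peaks with a non-positive estimate are zeroed out.
        weights.append(1.0 / math.sqrt(var) if var > 0 else 0.0)
    return [[x * w for x, w in zip(row, weights)] for row in data]


def transform(data, kind, offset=1.0):
    """Element-wise transform applied to every intensity in every spectrum."""
    if kind == "sqrt":
        return [[math.sqrt(x) for x in row] for row in data]
    if kind == "log":
        # Assumption: an offset is added because log(0) is undefined,
        # which is the problem the table notes for sparse data.
        return [[math.log(x + offset) for x in row] for row in data]
    raise ValueError(f"unknown transform: {kind}")


# Toy example: three spectra, one low-intensity and one high-intensity peak.
spectra = [[1.0, 100.0], [3.0, 400.0], [0.0, 250.0]]
for name in ("variance", "pareto", "root_mean"):
    print(name, scale_peaks(spectra, name))
print("sqrt", transform(spectra, "sqrt"))
print("log", transform(spectra, "log"))
```

The three scaling methods differ only in which sample statistic stands in for the noise variance, which is why each one inherits the bias noted in its Comments column.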