Fig. 2: Data melting for data restructuring and supervised learning prediction performances. | npj Microgravity

From: GLARE: discovering hidden patterns in spaceflight transcriptome using representation learning

a Illustration of the restructuring process for our base data model, in which the experiment location becomes a categorical variable (right-hand table, column label 'Location'). The left-hand table shows raw FPKM numerical data (denoted as ###) as reads per locus (e.g., AT1G01010, AT1G01020, across all ~25,000 genes in the Arabidopsis genome) for each experimental sample (e.g., FLT_col_Light, the FLT sample of the Columbia ecotype grown in the light, or GC_col_Light, the GC sample of the Columbia ecotype grown in the light). Each gene has one row in the data matrix, but each column label encodes multiple experimental factors (flight vs ground, genotype, and lighting regime). After restructuring (right-hand table), each gene has two instances, one from the flight sample and one from the ground sample. This facilitates subsequent analysis that asks, in this case, whether flight versus ground is a key discriminator within the dataset. b Receiver Operating Characteristic (ROC) curves for the training and test datasets on the best-performing data model, which is the base model. The blue line represents the XGBoost classifier, plotting the true positive rate against the false positive rate of the model predictions on the training data (left) and the test set (right). The red line is the random-chance baseline. FLT spaceflight, GC ground control.
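The restructuring in panel a can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the FPKM values are made-up placeholders, the `*_col_Dark` sample labels and the helper `melt_by_location` are hypothetical, and the layout of the remaining columns (`col_Light`, `col_Dark`) is a guess at how the right-hand table might look.

```python
# Hypothetical wide table: one row per gene, one column per sample.
# All FPKM values are placeholders, not data from the paper.
# The *_col_Dark labels are assumed for illustration only.
wide = {
    "AT1G01010": {"FLT_col_Light": 1.0, "GC_col_Light": 2.0,
                  "FLT_col_Dark": 3.0, "GC_col_Dark": 4.0},
    "AT1G01020": {"FLT_col_Light": 5.0, "GC_col_Light": 6.0,
                  "FLT_col_Dark": 7.0, "GC_col_Dark": 8.0},
}


def melt_by_location(wide_table):
    """Return one row per (gene, location) pair.

    The location prefix (FLT or GC) is split off each sample label and
    stored in a categorical 'Location' column; the rest of the label
    (genotype and lighting regime) stays as a value column.
    """
    long_rows = {}
    for gene, samples in wide_table.items():
        for label, fpkm in samples.items():
            # "FLT_col_Light" -> location "FLT", remainder "col_Light"
            location, remainder = label.split("_", 1)
            row = long_rows.setdefault(
                (gene, location), {"Gene": gene, "Location": location}
            )
            row[remainder] = fpkm
    return list(long_rows.values())


rows = melt_by_location(wide)
for row in rows:
    print(row)
```

Each gene now appears twice, once with `Location` set to `FLT` and once with `GC`, so a classifier can be trained to predict `Location` from the remaining columns.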
