Extended Data Fig. 1: Pictorial representation of the complete Emu algorithm.

The gray arrows are followed until the expectation–maximization (EM) iterations are complete; the pink arrows are then followed to the final composition estimate. The method starts by establishing a probability for each alignment type in C = [mismatch (X), insertion (I), deletion (D), softclip (S)] from occurrence counts in the primary alignments. Next, the alignment probability P(r|t) is calculated for each read-taxonomy pair (r, t), taking the maximum alignment probability between r and t. Meanwhile, an evenly distributed composition vector F is initialized. The EM phase begins by determining P(t|r), the probability that r emanated from t, for every pair (r, t) that has a P(r|t). F is updated accordingly, and the total log likelihood of the estimate is calculated. If the total log likelihood increases significantly over the previous iteration (by more than 0.01), EM iterations continue. Otherwise, the loop is exited and F is trimmed to remove all entries below the set threshold. Then, following the pink arrows, one final round of estimation is completed with the trimmed F to produce the final sample composition estimate.
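
The EM loop, trimming step, and final estimation round described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the Emu implementation: the function names `em_step` and `estimate_composition`, the parameter `min_abundance` and its value, and the toy P(r|t) values are all assumptions made for the example. The sketch also assumes the standard mixture-model updates, P(t|r) proportional to P(r|t)·F(t) and F(t) equal to the mean of P(t|r) over reads, and evaluates the log likelihood under the F that enters each step.

```python
import math


def em_step(read_probs, f):
    """One EM round: returns the updated F and the log likelihood under the input F.

    read_probs: list of dicts, one per read, mapping taxon -> P(r|t).
    f: dict mapping taxon -> current abundance estimate F(t).
    """
    totals = {t: 0.0 for t in f}
    log_lik = 0.0
    n_reads = 0
    for probs in read_probs:
        # Unnormalised P(t|r): P(r|t) * F(t), restricted to taxa still present in F.
        weighted = {t: p * f[t] for t, p in probs.items() if f.get(t, 0.0) > 0.0}
        denom = sum(weighted.values())
        if denom == 0.0:
            continue  # the read has no remaining candidate taxon
        log_lik += math.log(denom)
        n_reads += 1
        for t, w in weighted.items():
            totals[t] += w / denom  # accumulate P(t|r)
    if n_reads == 0:
        raise ValueError("no read is compatible with the current F")
    new_f = {t: total / n_reads for t, total in totals.items()}
    return new_f, log_lik


def estimate_composition(read_probs, min_abundance=0.01, tol=0.01):
    """Run EM until the log likelihood gain is at most tol, trim F, then run one final round."""
    taxa = sorted({t for probs in read_probs for t in probs})
    f = {t: 1.0 / len(taxa) for t in taxa}  # evenly distributed starting F
    prev_ll = -math.inf
    while True:
        f_new, ll = em_step(read_probs, f)
        if ll - prev_ll <= tol:  # no significant increase: leave the loop
            break
        f, prev_ll = f_new, ll
    # Remove entries below the (illustrative) threshold.
    trimmed = {t: v for t, v in f.items() if v >= min_abundance}
    # One final round of estimation with the trimmed F.
    final_f, _ = em_step(read_probs, trimmed)
    return final_f


# Toy input: five reads with made-up alignment probabilities for taxa A, B and C.
toy_reads = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.8, "B": 0.2},
    {"A": 0.7},
    {"B": 0.6, "C": 0.001},
    {"A": 0.9, "C": 0.0001},
]

if __name__ == "__main__":
    print(estimate_composition(toy_reads))
```

In this toy input, taxon C is supported only by very weak alignments, so its abundance falls below the illustrative threshold and it is dropped before the final round. Because P(t|r) is unchanged by rescaling F, the trimmed vector does not need to be renormalised before that round; the update itself returns an F that sums to 1.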