Extended Data Fig. 4: Bioinformatic pipeline for determining material identity/composition from NGS data.

FASTQ NGS data were demultiplexed by row and column barcodes to regroup sequences amplified from the same DNA input. For each amplicon sequence, grep was then used to search for the dominant and variant alleles, and the variant allele frequency (VAF) at each SNP locus was calculated from the resulting allele counts. If the encapsulated cells came from a single donor, their VAF profile was compared against the profiles of the 20 pre-screened HUVEC donors, and the donor with the highest match rate was identified as the source of the encapsulated cells. When one or two donors were used for the encapsulated cells, the log-likelihood of every possible donor composition was calculated, and the composition with the highest overall log-likelihood was assigned as the cell composition. Quality control for the log-likelihood analysis required that: 1) at least 25 of the 30 SNP loci had sequencing coverage >50; 2) the overall log-likelihood was higher than -200; and 3) the goodness score was higher than 10, where goodness is defined as the difference in log-likelihood between the most likely and the second most likely donor pairs. The material corresponding to the identified donor or donor composition was thereby identified as the material encapsulating the cells.
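The caption names the analysis steps but not the exact matching rule or likelihood model. The following Python code is a minimal sketch of those steps under stated assumptions, not the authors' implementation. A substring test stands in for grep when counting allele-bearing reads. Donor genotypes are encoded as expected VAFs of 0, 0.5 or 1 and called with hypothetical cut-offs (0.2 and 0.8) for the single-donor match rate. Each candidate donor pair is scored with a binomial log-likelihood that assumes a 1:1 cell mix and a 1% error floor (`eps`), and a pair of identical donors stands for a single-donor composition. The QC thresholds (coverage >50 at ≥25 loci, log-likelihood >-200, goodness >10) are taken from the caption, but because the likelihood model here is assumed, the -200 cut-off may not sit on the same scale as the original analysis. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def count_alleles(reads, dominant_seq, variant_seq):
    """Count reads containing each allele string (a stand-in for `grep -c`)."""
    dom = sum(dominant_seq in r for r in reads)
    var = sum(variant_seq in r for r in reads)
    return dom, var


def vaf(dom, var):
    """Variant allele frequency at one locus; NaN if neither allele was seen."""
    total = dom + var
    return var / total if total else float("nan")


def call_genotype(v):
    """Hypothetical cut-offs: 0 (hom. dominant), 0.5 (het) or 1 (hom. variant)."""
    if v < 0.2:
        return 0.0
    if v > 0.8:
        return 1.0
    return 0.5


def match_rate(observed_vafs, donor_genotypes):
    """Fraction of loci with data whose called genotype equals the donor's."""
    pairs = [(v, g) for v, g in zip(observed_vafs, donor_genotypes)
             if not math.isnan(v)]
    if not pairs:
        return 0.0
    return sum(call_genotype(v) == g for v, g in pairs) / len(pairs)


def best_single_donor(observed_vafs, donors):
    """Single-donor case: return the donor with the highest match rate."""
    return max(donors, key=lambda d: match_rate(observed_vafs, donors[d]))


def binom_loglik(k, n, p, eps=0.01):
    """Binomial log-probability of k variant reads out of n (eps: assumed error floor)."""
    p = min(max(p, eps), 1 - eps)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def call_composition(counts, donors, min_cov=50, min_loci=25,
                     min_ll=-200.0, min_goodness=10.0):
    """counts: per-locus (variant_reads, total_reads).

    Returns (best_pair, log_likelihood, goodness), or None if QC fails.
    """
    covered = [i for i, (_, n) in enumerate(counts) if n > min_cov]
    if len(covered) < min_loci:  # QC 1: enough well-covered loci
        return None
    scores = []
    for a, b in combinations_with_replacement(sorted(donors), 2):
        # Assumed 1:1 mix; a == b encodes a single-donor composition.
        ll = sum(binom_loglik(counts[i][0], counts[i][1],
                              (donors[a][i] + donors[b][i]) / 2)
                 for i in covered)
        scores.append((ll, (a, b)))
    scores.sort(reverse=True)
    (best_ll, best), (second_ll, _) = scores[0], scores[1]
    goodness = best_ll - second_ll
    if best_ll <= min_ll or goodness <= min_goodness:  # QC 2 and 3
        return None
    return best, best_ll, goodness
```

For instance, with three hypothetical donors whose genotype profiles are `[0, 0.5, 1]`, `[1, 0.5, 0]` and `[0.5, 0, 0.5]` repeated over 30 loci, observing 100 variant reads out of 200 at every locus (the expected profile of a 1:1 mix of the first two donors) selects that pair, with a log-likelihood well above -200 and a large goodness margin. Fixing the mix at 1:1 keeps the sketch short; a fuller model might also fit the mixing fraction.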