Fig. 1

Schematic representation of the analytic strategy. a Whole-exome sequencing data of ESCC from three patient cohorts (Caucasian, Vietnamese and Chinese) were obtained from this study and from TCGA. Our strategy includes two major steps to remove confounders. First, to remove technical confounders, we applied the same procedure to process sequencing reads generated on the HiSeq sequencing platform. We then downsampled the data to balance the depth of coverage among the three cohorts and applied a stringent approach to call somatic single-nucleotide mutations using multiple mutation callers. Second, to remove biological confounders, we calculated propensity scores, reweighted the samples in each cohort, and compared gene mutation frequencies between two balanced cohorts. We considered five biological factors (age at diagnosis, gender, tumor stage, smoking history, and alcohol consumption history) in the propensity score adjustment. b Hierarchical clustering of patient samples by common SNP status in exonic regions. Asian and Caucasian patients form two distinct clusters.
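The propensity score reweighting step in panel a can be illustrated with a minimal Python sketch. The caption does not state the propensity model or the weighting scheme, so this sketch assumes a logistic regression (fitted by Newton-Raphson) and inverse-probability-of-treatment weights, 1/e for one cohort and 1/(1 - e) for the other. All function names are hypothetical, and the five biological factors would first need to be encoded numerically (for example, tumor stage as indicator columns).

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit logistic regression by Newton-Raphson; returns coefficients (intercept first)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))      # current fitted probabilities
        grad = Xd.T @ (y - p)                     # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        hess += ridge * np.eye(Xd.shape[1])       # tiny ridge for numerical stability
        beta += np.linalg.solve(hess, grad)       # Newton update
    return beta


def propensity_scores(X, cohort):
    """Probability of belonging to cohort 1 given the covariates X (assumed logistic model)."""
    beta = fit_logistic(X, cohort)
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def iptw_weights(ps, cohort):
    """Assumed inverse-probability weights: 1/e for cohort 1, 1/(1 - e) for cohort 0."""
    return np.where(cohort == 1, 1.0 / ps, 1.0 / (1.0 - ps))


def weighted_mutation_freq(mutated, weights, mask):
    """Weighted fraction of samples in `mask` carrying a mutation (mutated is 0/1)."""
    return np.sum(weights[mask] * mutated[mask]) / np.sum(weights[mask])
```

For a single gene, calling `weighted_mutation_freq` once with each cohort's boolean mask and the same weight vector gives the two reweighted mutation frequencies to be compared.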