Figure 4
From: Regional gender differences in an autosomal disease result in corresponding diversity differences

Data analysis process. The process comprises five steps: data cleaning, association mining, significance testing of the association rules, discussion of rule regionality, and discussion of rule replacement. Data cleaning consisted mainly of removing items that lacked the needed information. Associations were mined with the apriori algorithm, and the rules it identified were tested with Fisher’s exact test. The regionality and replacement of the significant rules were then discussed. To assess regionality, each significant rule found in one region was tested in the other region; a rule that was significant in one region but not in the other was considered regional. To assess whether an apriori rule could reasonably be replaced by a simpler rule, two conditional probabilities were compared, one for the apriori rule and one for the candidate simpler rule. If the two probabilities differed significantly, the replacement was considered unreasonable. Finally, a permutation test was designed for the Xishuangbanna rules under discussion that include “Sex”.
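
The significance-testing and regionality steps in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes a one-sided Fisher's exact test on the 2x2 table of antecedent versus consequent and a significance level of 0.05, neither of which the caption specifies. The function names (`contingency`, `fisher_greater`, `is_regional`) and the toy records with items "X" and "Y" are hypothetical.

```python
from math import comb

def contingency(records, antecedent, consequent):
    """Count the 2x2 table for a rule antecedent -> consequent.

    records is a list of item sets. Returns (a, b, c, d):
    a = both sides present, b = antecedent only,
    c = consequent only,    d = neither.
    """
    a = b = c = d = 0
    for items in records:
        has_ante = antecedent <= items
        has_cons = consequent <= items
        if has_ante and has_cons:
            a += 1
        elif has_ante:
            b += 1
        elif has_cons:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (assumed direction: positive association).

    Sums the hypergeometric probabilities of tables with the same margins
    whose top-left cell is at least as large as the observed one.
    """
    n = a + b + c + d
    row1 = a + b          # records containing the antecedent
    col1 = a + c          # records containing the consequent
    k_max = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(n - row1, col1 - k)
                    for k in range(a, k_max + 1))
    return numerator / comb(n, col1)

def is_regional(home, other, antecedent, consequent, alpha=0.05):
    """True if the rule is significant in `home` but not in `other`.

    alpha = 0.05 is an assumed threshold, not taken from the caption.
    """
    p_home = fisher_greater(*contingency(home, antecedent, consequent))
    p_other = fisher_greater(*contingency(other, antecedent, consequent))
    return (p_home < alpha) and not (p_other < alpha)

# Toy data (hypothetical): X and Y always co-occur in region_a,
# and are independent in region_b.
region_a = [{"X", "Y"}] * 10 + [set()] * 10
region_b = [{"X", "Y"}] * 5 + [{"X"}] * 5 + [{"Y"}] * 5 + [set()] * 5

rule_is_regional = is_regional(region_a, region_b, {"X"}, {"Y"})
```

In this toy setup the rule X -> Y is strongly associated in `region_a` and not in `region_b`, so `rule_is_regional` marks it as regional to `region_a`. A two-sided test or a multiple-testing correction could be substituted without changing the structure of the check.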