Extended Data Table 2 Word statistics for geographical attribution

To discover underlying patterns in Ithaca’s predictions, we compute statistics that track the words appearing most frequently (“frequency”) in texts whose region Ithaca predicts correctly (“accuracy”). For each word in the test set, we compute its average accuracy and its frequency of appearance. This visualization is intended to evaluate whether the occurrence of particular words is correlated with the model’s geographical attributions.
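As an illustration only, the per-word statistics described above could be computed along the following lines. This is a minimal sketch and not the authors' implementation. It assumes that the test set is available as (text, true region, predicted region) triples, that words are separated by whitespace, that "frequency" counts the number of test texts containing a word, and that "accuracy" is the fraction of those texts whose region was predicted correctly. The function name `word_statistics` and the toy data are hypothetical.

```python
from collections import defaultdict


def word_statistics(samples):
    """Compute per-word frequency and average accuracy.

    `samples` is an iterable of (text, true_region, predicted_region)
    triples. This input layout is an assumption made for illustration.
    """
    counts = defaultdict(int)   # number of texts containing each word
    correct = defaultdict(int)  # number of those texts predicted correctly

    for text, true_region, predicted_region in samples:
        is_correct = true_region == predicted_region
        # Count each word once per text (an assumption, not from the source).
        for word in set(text.split()):
            counts[word] += 1
            correct[word] += int(is_correct)

    return {
        word: {
            "frequency": counts[word],
            "accuracy": correct[word] / counts[word],
        }
        for word in counts
    }


# Toy data with placeholder words and region labels (entirely made up).
samples = [
    ("alpha beta", "R1", "R1"),   # predicted correctly
    ("alpha gamma", "R2", "R1"),  # predicted incorrectly
    ("beta beta", "R1", "R1"),    # predicted correctly
]
stats = word_statistics(samples)
```

Under these assumptions, `stats["alpha"]` has a frequency of 2 and an accuracy of 0.5, because the word occurs in one correctly and one incorrectly attributed text. Counting a word once per text keeps a single long inscription from dominating the average; counting every occurrence would be an equally plausible reading of "frequency".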
