CountryInfo-1
-
This is the GitHub repository for the CountryInfo.txt file and related utility programs.
CountryInfo.txt is a general purpose file intended to facilitate natural language
processing of news reports and political texts.
It was originally developed to identify
states for the text filtering system used in the development of the Correlates of War
project dataset MID4, then extended to incorporate CIA World Factbook and WordNet
information for the development of TABARI Dictionaries.
The file contains about 32,000 lines
with country names, synonyms and other alternative forms, major city and region names,
and national leaders.
It covers about 240 countries and administrative units
(e.g. American Samoa, Christmas Island, Hong Kong, Greenland).
The file is internally documented,
and its format is almost, but not quite, XML.
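
Because the format is close to XML without being strictly well-formed, a strict XML parser may reject it, and a tolerant pattern-based scan is one possible way to read it. The following is only a minimal sketch under stated assumptions: the tag names (`Country`, `CountryName`, `Capital`), the sample records, and the `parse_countries` helper are all hypothetical and are not taken from CountryInfo.txt itself. Check the documentation inside the file for the actual field names before adapting it.

```python
import re

# Hypothetical sample: the tag names and layout here are assumptions,
# not copied from CountryInfo.txt.
SAMPLE = """\
<Country>
<CountryName>EXAMPLELAND</CountryName>
<Capital>EXAMPLE_CITY</Capital>
</Country>
<Country>
<CountryName>SAMPLESTAN</CountryName>
</Country>
"""

# Match each record block, then each simple <Tag>value</Tag> field inside it.
BLOCK_RE = re.compile(r"<Country>(.*?)</Country>", re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_countries(text):
    """Return one dict of tag -> value per <Country> block found in text."""
    records = []
    for block in BLOCK_RE.finditer(text):
        fields = {}
        for tag, value in FIELD_RE.findall(block.group(1)):
            fields[tag] = value.strip()
        records.append(fields)
    return records


if __name__ == "__main__":
    for record in parse_countries(SAMPLE):
        print(record)
```

A regex scan like this ignores anything it does not recognize, which trades strictness for robustness; fields that span multiple lines or contain nested tags would need additional handling.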