CountryInfo-1


  • This is the GitHub repository for CountryInfo.txt and related utility programs. CountryInfo.txt is a general-purpose file intended to facilitate natural language processing of news reports and political texts. It was originally developed to identify states for the text filtering system used in developing the Correlates of War project's MID4 dataset, and was then extended to incorporate CIA World Factbook and WordNet information for the development of TABARI dictionaries. The file contains about 32,000 lines of country names, synonyms and other alternative forms, major city and region names, and national leaders. It covers about 240 countries and administrative units (e.g. American Samoa, Christmas Island, Hong Kong, Greenland). It is internally documented and almost, but not quite, XML.
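Because the file is almost, but not quite, XML, a strict parser such as `xml.etree.ElementTree` may reject it, so a tolerant regex scan is one option. The sketch below is a minimal illustration under stated assumptions, not the repository's own utility code: it assumes each entry sits in a hypothetical `<Country>...</Country>` block containing flat `<Tag>...</Tag>` fields, with multi-line fields holding one entry per line. The tag names `Country`, `CountryCode` and `CountryName` are placeholders; check the file's internal documentation for the real layout.

```python
import re
from collections import defaultdict

# Hypothetical layout: records wrapped in <Country>...</Country>.
# The real CountryInfo.txt tag names may differ.
RECORD_RE = re.compile(r"<Country>(.*?)</Country>", re.DOTALL)
# A flat field: an opening tag, its body, and the matching closing tag.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_records(text):
    """Return one dict per record, mapping each tag to a list of entries."""
    records = []
    for block in RECORD_RE.findall(text):
        fields = defaultdict(list)
        for tag, body in FIELD_RE.findall(block):
            # Treat each non-blank line of a field body as a separate entry,
            # so single-line and multi-line fields are handled the same way.
            for line in body.splitlines():
                line = line.strip()
                if line:
                    fields[tag].append(line)
        records.append(dict(fields))
    return records
```

For a made-up input record with `<CountryCode>GRL</CountryCode>` and a `<CountryName>` block listing `GREENLAND` and `KALAALLIT NUNAAT` on separate lines, `parse_records` returns one dict with `CountryCode` mapped to `['GRL']` and `CountryName` mapped to both names. A regex scan cannot handle nested fields of the same tag, which is the trade-off for tolerating input that is not well-formed XML.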