scraper
-
We face a tradeoff between the broadest possible geographic coverage (which
means including every local paper we can find) and accuracy and relevance
(which would lead us to include only large, well-known, high-quality news
outlets).
We're trying to balance the two objectives by including a third
column indicating whether the source is a wire service, a dependable
news source with solid international coverage, or a local source that may
contribute extra noise to the data and may require specialized actor
dictionaries.
The distinction between the latter two is hazy and requires a
judgement call.
Eventually, these labels can be used to build event datasets
that are optimized either for accuracy and stability (at the cost of
sparseness) or for micro-level, geographically dispersed (but noisy) coverage.
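
As a rough illustration of how the labels could drive dataset construction,
here is a minimal sketch in Python. It assumes a comma-separated source list
whose third column holds the label. The label values ("wire",
"international", "local"), the example rows, and the helper name
`select_sources` are all hypothetical and are not taken from this project.

```python
import csv
import io

# Hypothetical source list: name, URL, and a third column holding the label.
# The rows and the label values are illustrative assumptions only.
SAMPLE = """\
Example Wire,http://wire.example.com/rss,wire
Example World News,http://world.example.com/rss,international
Example Town Gazette,http://gazette.example.com/rss,local
"""


def select_sources(rows, wanted_labels):
    """Return the rows whose third column (the label) is in wanted_labels."""
    wanted = set(wanted_labels)
    return [row for row in rows if len(row) >= 3 and row[2].strip() in wanted]


rows = list(csv.reader(io.StringIO(SAMPLE)))

# Accuracy and stability: keep only wire services and dependable outlets.
stable = select_sources(rows, {"wire", "international"})

# Broad but noisy coverage: keep local sources as well.
broad = select_sources(rows, {"wire", "international", "local"})
```

Keeping the label as data rather than maintaining separate source lists means
the hazy judgement call between the latter two categories can be revised
later without touching the scraping code.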