Brief: System to read and understand books and other texts
Code: GitHub

  • Workhorse is a system for setting up dedicated servers that produce tagged, analyzed, and understood texts and support other linguistic research. Our datasets are Wikipedia, Gutenberg, and, we hope, full-text books from Google Books, all appropriately licensed. We aim to develop a highly annotated, freely available corpus of marked-up texts that have been processed with a wide variety of state-of-the-art systems. We also aim to apply natural language understanding, knowledge base population, and other techniques to the texts to derive useful knowledge.
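
As a rough illustration of what "processed with a wide variety of systems" could mean in practice, the sketch below stores each system's output as a separate named layer of character spans over the same text. Everything in it is an assumption made for illustration, not Workhorse's actual design: the `AnnotatedText` class, the `annotate` helper, and the two toy annotators are hypothetical stand-ins for real taggers and parsers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a span annotation is (start, end, label).
Span = Tuple[int, int, str]
Annotator = Callable[[str], List[Span]]


@dataclass
class AnnotatedText:
    """A source text plus one list of spans per annotation layer."""
    source: str  # e.g. "wikipedia" or "gutenberg"
    text: str
    layers: Dict[str, List[Span]] = field(default_factory=dict)


def tokenize(text: str) -> List[Span]:
    """Toy whitespace tokenizer standing in for a real system."""
    spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)  # search from the end of the last token
        pos = start + len(word)
        spans.append((start, pos, "TOKEN"))
    return spans


def capitalized(text: str) -> List[Span]:
    """Toy tagger: marks tokens whose first character is uppercase."""
    return [(s, e, "CAP") for s, e, _ in tokenize(text) if text[s].isupper()]


def annotate(doc: AnnotatedText, annotators: Dict[str, Annotator]) -> AnnotatedText:
    """Run every annotator over the text and store its output as a layer."""
    for name, fn in annotators.items():
        doc.layers[name] = fn(doc.text)
    return doc


doc = annotate(
    AnnotatedText("gutenberg", "Call me Ishmael."),
    {"tokens": tokenize, "caps": capitalized},
)
for name, spans in doc.layers.items():
    print(name, [(doc.text[s:e], label) for s, e, label in spans])
```

Keeping each system's output in its own layer, keyed by character offsets rather than by token index, is one way to let independent tools annotate the same text without having to agree on a tokenization.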