Description
- Bios is a suite of syntactico-semantic analyzers that includes the most common tools needed for the shallow analysis of English text. The following tools are currently included:
  (*) Smart tokenizer that recognizes abbreviations, SGML tags, etc.
  (*) Part-of-speech (POS) tagger, implemented as a wrapper around the TnT tagger by Thorsten Brants.
  (*) Syntactic chunker that uses the labels promoted by the CoNLL chunking evaluations (http://www.cnts.ua.ac.be/conll2000/chunking).
  (*) Named-Entity Recognition and Classification (NERC) for the CoNLL entity types plus an additional 11 numerical entity types.
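
For readers unfamiliar with the CoNLL chunk labels mentioned above, here is a minimal sketch of how such per-token labels (B-X opens a chunk of type X, I-X continues it, O marks a token outside any chunk) can be grouped into phrases. It is not part of Bios and does not use its API. The function name `bio_to_chunks` and the sample sentence are assumptions made for illustration only.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (chunk_type, text) pairs from CoNLL-style B-/I-/O tags.

    Hypothetical helper for illustration; Bios itself may expose its
    chunker output differently.
    """
    chunks: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the chunk being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_tokens:
            chunks.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Token outside any chunk: it only ends the open chunk.
            flush()
            continue
        prefix, _, chunk_type = tag.partition("-")
        # A B- tag always opens a new chunk; an I- tag whose type differs
        # from the open chunk is treated leniently as opening one too.
        if prefix == "B" or chunk_type != current_type:
            flush()
            current_type = chunk_type
        current_tokens.append(token)

    flush()
    return chunks


if __name__ == "__main__":
    tokens = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow"]
    tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]
    for chunk_type, text in bio_to_chunks(tokens, tags):
        print(chunk_type, text)
```

With the sample tags, the tokens are grouped into four chunks: two noun phrases ("He", "the current account deficit") and two verb phrases ("reckons", "will narrow").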