CTCDecoder


  • The RNN output matrix of the **Mini example** testcase contains 2 time-steps (t0 and t1) and 3 labels (a, b and -, where - represents the CTC-blank). Best path decoding (see left figure) takes the most probable label per time-step, which gives the path "--" and therefore the recognized text "" with probability 0.6\*0.6=0.36. Beam search, prefix search and token passing instead calculate the probability of labelings, summing over all paths that collapse to the same labeling. For the labeling "a", these algorithms sum over the paths "-a", "a-" and "aa" (see right figure), giving probability 0.6\*0.4+0.4\*0.6+0.4\*0.4=0.64. The only path that yields "" still has probability 0.36, so "a" is the result returned by beam search, prefix search and token passing.
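
The arithmetic above can be checked with a brute-force sketch that enumerates every path, collapses it to its labeling and sums the path probabilities. This is only an illustration, not the decoders shipped in this repository: the names `MAT`, `collapse`, `best_path` and `labeling_probs` are hypothetical, and the probability 0.0 for label b is an assumption inferred from the stated values 0.4 (a) and 0.6 (-) summing to 1.

```python
from collections import defaultdict
from itertools import product

LABELS = "ab-"  # column order: a, b, CTC-blank
BLANK = "-"

# Rows are time-steps t0 and t1. The value 0.0 for "b" is an assumption,
# inferred from 0.4 (a) + 0.6 (blank) already summing to 1.
MAT = [[0.4, 0.0, 0.6],
       [0.4, 0.0, 0.6]]


def collapse(path):
    """Map a path to its labeling: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for c in path:
        if c != prev and c != BLANK:
            out.append(c)
        prev = c
    return "".join(out)


def best_path(mat):
    """Take the most probable label per time-step, then collapse the path."""
    path = "".join(LABELS[max(range(len(row)), key=row.__getitem__)] for row in mat)
    return collapse(path)


def labeling_probs(mat):
    """Sum the probabilities of all paths that collapse to the same labeling."""
    probs = defaultdict(float)
    for idxs in product(range(len(LABELS)), repeat=len(mat)):
        p = 1.0
        for t, i in enumerate(idxs):
            p *= mat[t][i]
        probs[collapse("".join(LABELS[i] for i in idxs))] += p
    return dict(probs)


if __name__ == "__main__":
    probs = labeling_probs(MAT)
    print("best path decoding:", repr(best_path(MAT)))
    print("most probable labeling:", repr(max(probs, key=probs.get)))
    print("P('a') =", round(probs["a"], 2), " P('') =", round(probs[""], 2))
```

Exhaustive enumeration grows exponentially with the number of time-steps, so it is only practical for toy inputs like this one. It is useful here because it makes the difference between scoring a single path and scoring a labeling explicit.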