Visualize how Byte-Pair Encoding learns a vocabulary and tokenizes text.
The token sequence at each step. The next pair to be merged is highlighted in yellow.
The most frequent pair (highlighted) is selected for the next merge.
Pairs are merged in this specific order.
Includes initial characters and new merged tokens.
Text is broken down using the learned vocabulary.
The final integer representation of the text.