Skip to content
ahead x

Language models

Tokenization

Tokenization is the process of breaking text down into tokens, that is, into small building blocks the model can work with. The model learns in advance which token splits fit best, based on many texts.

Related terms

← Back to the glossary