Tokenization

/ˌtoʊkənaɪˈzeɪʃən/ noun

Definition

The process of breaking down text or code into individual meaningful units called tokens, such as keywords, operators, identifiers, and literals. It is typically the first stage of processing source text, often called lexical analysis, in which a continuous stream of characters is divided into discrete elements before parsing.
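The definition above can be seen in action with Python's standard-library `tokenize` module, which splits a line of code into (token type, token text) pairs; this is a minimal sketch, and the whitespace-filtering step is just one way to keep the output readable.

```python
import io
import tokenize

# Tokenize a single line of Python source into (type, text) pairs.
source = "while count < 10: count += 1"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop whitespace-only tokens such as NEWLINE
]
print(tokens)
```

Running this shows the keyword `while`, the identifiers, the operators `<` and `+=`, and the numeric literals each emerging as a separate token, exactly the discrete units the definition describes.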

Etymology

From 'token' meaning a sign or symbol, combined with the suffix '-ization.' In computing, adopted in the 1960s from linguistics where it meant breaking speech into individual words or morphemes.

Kelly Says

Tokenization is like cutting a sentence into individual words with scissors, except the computer needs to know that 'while' is different from 'while(' and that spaces sometimes matter and sometimes don't! It's why you can't just sprinkle spaces into your variable names.
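Kelly's point about spaces can be made concrete with a toy lexer. This is a hypothetical sketch (the token names and keyword set are invented for illustration): a tokenizer matches the longest run it can, so `while count` is a keyword followed by an identifier, while `whilecount` collapses into a single identifier.

```python
import re

KEYWORDS = {"while", "if"}

# Match, in order of preference: a word, a number, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")

def lex(source):
    tokens = []
    for word, num, op in TOKEN_RE.findall(source):
        if word:
            # A word is only a keyword if the WHOLE word matches:
            # this "longest match" rule is why spacing changes the tokens.
            kind = "KEYWORD" if word in KEYWORDS else "NAME"
            tokens.append((kind, word))
        elif num:
            tokens.append(("NUMBER", num))
        else:
            tokens.append(("OP", op))
    return tokens

print(lex("while count"))  # keyword, then identifier
print(lex("whilecount"))   # one single identifier
```

The longest-match behavior shown here is often called "maximal munch," and it is the standard rule in real lexers.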
