Question 16
Domain 1: Generative AI with LLMs and Prompting
What happens during the token generation process when a decoder-only LLM produces an output?
Correct answer: C
Explanation
A decoder-only LLM generates the next token by producing logits, or raw scores, over the vocabulary. Those scores are turned into probabilities with softmax, then a token is chosen by sampling or selecting the highest-probability option.
Why each option is right or wrong
A. The model generates all tokens in the output sequence simultaneously in a single forward pass using parallel decoding, producing the complete response at once rather than sequentially
B. The model outputs dense embedding vectors from its final transformer layer that are directly used as the text representation without any additional projection or sampling step
C. The model outputs logits (raw scores) for each vocabulary token, which are converted to probabilities via softmax and then sampled or selected
During autoregressive decoding, a decoder-only transformer computes a hidden state for the current context and applies the output projection to produce one logit per vocabulary item; if the vocabulary has V tokens, the result is a V-dimensional score vector. Those logits are then normalized with the softmax function, \(p_i = e^{z_i}/\sum_j e^{z_j}\), and the next token is either sampled from that distribution or chosen by an argmax/greedy rule, depending on the decoding strategy.
D. The model directly outputs text characters one at a time using a character-level mapping function, bypassing any intermediate probability computation or vocabulary-based token selection
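The single decoding step described under option C can be sketched in a few lines of Python. This is a minimal illustration, not a real model: the four-word vocabulary, the logit values, and the helper names (softmax, greedy_pick, sample_pick) are all assumptions made up for the example, and the logits are hard-coded in place of an actual output projection.

```python
import math
import random

def softmax(logits):
    """Convert raw scores z_i into probabilities p_i = exp(z_i) / sum_j exp(z_j)."""
    m = max(logits)  # subtracting the max improves numerical stability; result is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Greedy (argmax) decoding: return the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    """Sampling: draw one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical toy vocabulary (V = 4) and made-up logits for one decoding step.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)                      # one probability per vocabulary token
greedy_token = vocab[greedy_pick(probs)]     # highest-probability choice
sampled_token = vocab[sample_pick(probs, random.Random(0))]  # seeded random draw
```

In a full generation loop, the chosen token would be appended to the context and the step repeated, which is why the output is produced one token at a time rather than all at once.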