NCA-GENL Practice Q16

A. The model generates all tokens in the output sequence simultaneously in a single forward pass using parallel decoding, producing the complete response at once rather than sequentially

B. The model outputs dense embedding vectors from its final transformer layer that are directly used as the text representation without any additional projection or sampling step

C. The model outputs logits (raw scores) for each vocabulary token, which are converted to probabilities via softmax and then sampled or selected

During autoregressive decoding, a decoder-only transformer computes a hidden state for the current context and applies the output projection to produce one logit per vocabulary item; if the vocabulary has V tokens, the result is a V-dimensional score vector. Those logits are then normalized with the softmax function, \(p_i = e^{z_i}/\sum_j e^{z_j}\), where \(z_i\) is the logit for token \(i\) and \(p_i\) is its probability, and the next token is either sampled from that distribution or chosen greedily by argmax, depending on the decoding strategy.

D. The model directly outputs text characters one at a time using a character-level mapping function, bypassing any intermediate probability computation or vocabulary-based token selection
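
The logits-to-softmax-to-selection step described under option C can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular framework implements decoding: the four-word vocabulary, the logit values, and the helper names `softmax`, `greedy_select`, and `sample_select` are all hypothetical, chosen only to make the step concrete.

```python
import math
import random


def softmax(logits):
    """Turn raw scores z_i into probabilities p_i = exp(z_i) / sum_j exp(z_j)."""
    # Subtracting the max does not change the result but avoids overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_select(probs):
    """Greedy decoding: return the index of the most probable token (argmax)."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample_select(probs, rng):
    """Sampling: draw one token index according to the probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical toy vocabulary (V = 4) and one logit per vocabulary token.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
greedy_token = vocab[greedy_select(probs)]
sampled_token = vocab[sample_select(probs, random.Random(0))]

print("probabilities:", [round(p, 3) for p in probs])
print("greedy choice:", greedy_token)
print("sampled choice:", sampled_token)
```

Greedy selection always returns the highest-scoring token, while sampling can return any token with nonzero probability. In an actual model this step repeats once per generated token, with each chosen token appended to the context before the next forward pass.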

Question 16

Explanation

Why each option is right or wrong