Question 17
Domain 3: Data Transformation, Cleansing, and Quality
A silver table combines phone numbers from multiple countries in inconsistent formats. What transformation goal should be prioritized before deduplication?
Correct answer: B
Explanation
Deduplication depends on comparing like-for-like values, so phone numbers from multiple countries must first be standardized into one canonical format (for example, E.164: a plus sign, the country calling code, then the remaining digits with no separators). This falls under the exam’s data transformation and cleansing objectives: “apply advanced data transformations” and handle “bad data” before downstream processing.
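The standardization step can be sketched in plain Python. This is a minimal illustration, not a production parser: the function name `to_canonical`, the two-entry `CALLING_CODES` table, and the sample rows are all hypothetical, and the sketch assumes each row carries a country hint. A real pipeline would more likely rely on a dedicated phone-parsing library.

```python
import re
from typing import Optional

# Hypothetical, illustrative subset of ISO country code -> calling code.
CALLING_CODES = {"US": "1", "GB": "44"}

def to_canonical(raw: str, country: str) -> Optional[str]:
    """Return an E.164-style string (e.g. '+14155550123'), or None if no digits."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop spaces, dots, dashes, parentheses
    if not digits:
        return None
    if text.startswith("+"):
        # Already in international form; only the separators needed removing.
        return "+" + digits
    if digits.startswith("00"):
        # '00' international dialing prefix (assumed here to mean '+').
        return "+" + digits[2:]
    # National form: drop a single leading trunk '0', then add the calling code.
    national = digits[1:] if digits.startswith("0") else digits
    return "+" + CALLING_CODES[country] + national

# Hypothetical sample rows: (raw phone string, country hint).
rows = [
    ("(415) 555-0123", "US"),
    ("+1 415.555.0123", "US"),
    ("020 7946 0958", "GB"),
    ("0044 20 7946 0958", "GB"),
]

canonical = [to_canonical(raw, country) for raw, country in rows]
unique_numbers = set(canonical)  # deduplication now compares like-for-like keys
```

With these sample rows, the four raw strings collapse to two canonical numbers, which is the comparison that deduplication needs.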
Why each option is right or wrong
A. Hash the raw string as-is
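Hashing the raw string carries every formatting difference into the hash, so two spellings of the same number produce different hashes and still look like distinct records. It does not normalize country-specific phone formats.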
B. Standardize numbers into one canonical format
Deduplication only works when the comparison keys are normalized; with phone numbers stored in country-specific and inconsistent formats, the same subscriber can appear as multiple distinct strings. Under the exam’s data transformation and cleansing objective, the first step is to standardize the values into a single canonical representation so downstream matching operates on like-for-like data rather than formatting noise.
C. Sort rows alphabetically
Alphabetical sorting changes row order, not the underlying phone-number format, so differently formatted duplicates still fail to match.
D. Convert all values to NULL when separators differ
Different separators indicate formatting differences, not invalid values requiring NULL replacement. Nulling them would discard valid phone numbers.
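The contrast between options A, B, and C can be checked with a short Python sketch. The two sample strings and the helper names `sha256_hex` and `digits_only` are hypothetical, and the sketch assumes both inputs already carry a leading country code, so stripping separators is enough to canonicalize them.

```python
import hashlib
import re

def sha256_hex(value: str) -> str:
    """Hex SHA-256 digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def digits_only(value: str) -> str:
    """Strip every non-digit and re-add '+' (assumes an international-form input)."""
    return "+" + re.sub(r"\D", "", value)

# Hypothetical: the same number written with different separators.
raw_a = "+1 (415) 555-0123"
raw_b = "+1-415-555-0123"

# Option A: hashing the raw strings keeps the formatting noise.
raw_hashes_match = sha256_hex(raw_a) == sha256_hex(raw_b)

# Option B: hashing after standardization compares like-for-like values.
canonical_hashes_match = (
    sha256_hex(digits_only(raw_a)) == sha256_hex(digits_only(raw_b))
)

# Option C: sorting reorders the rows but leaves both spellings in place.
distinct_after_sort = len(set(sorted([raw_a, raw_b])))
```

Here the raw hashes differ, the canonical hashes agree, and sorting still leaves two distinct strings.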