Question 20
Domain 2: ML Model Development
A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket. Which solution will meet these requirements?
Correct answer: C
Explanation
Amazon Kendra is a managed enterprise search service built for semantic search over unstructured content, and its S3 connector can ingest documents directly from an Amazon S3 bucket. Querying Kendra then returns relevance-ranked results based on meaning rather than exact keywords, which fits the requirement to "provide semantic search of text files."
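As a minimal sketch of the query step, assuming Python with a boto3 Kendra client (`boto3.client("kendra")`) and an index that has already ingested the S3 documents: the helper name `semantic_search` and the flattened result dictionary are hypothetical, while `query`, `IndexId`, `QueryText`, `PageSize`, and the `ResultItems` response fields are part of the Kendra Query API. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict, List


def semantic_search(client: Any, index_id: str, question: str, top_k: int = 5) -> List[Dict[str, str]]:
    """Query an existing Kendra index and return simplified, relevance-ranked hits.

    `client` is assumed to be a boto3 Kendra client; any object with a compatible
    `query` method works, which keeps the function testable offline.
    """
    # Kendra ranks results by meaning, so the question can be phrased naturally.
    response = client.query(IndexId=index_id, QueryText=question, PageSize=top_k)

    hits: List[Dict[str, str]] = []
    for item in response.get("ResultItems", []):
        hits.append({
            # Result type, e.g. DOCUMENT or ANSWER.
            "type": item.get("Type", ""),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            # For S3-sourced documents this is typically the object's location.
            "uri": item.get("DocumentURI", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            # Kendra reports confidence as a bucket (e.g. HIGH), not a raw score.
            "confidence": item.get("ScoreAttributes", {}).get("ScoreConfidence", ""),
        })
    return hits
```

Passing the client in, rather than creating it inside the function, is a design choice for testability; in production you would call it with a real `boto3.client("kendra")` and your index ID.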
Why each option is right or wrong
A. Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.
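AWS Batch could run the job that generates embeddings, but AWS Glue is a data integration (ETL) and Data Catalog service, not a vector store, and SQL queries match exact values or patterns rather than retrieving semantically similar content.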
B. Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.
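A SageMaker notebook can generate embeddings, but SageMaker Feature Store is built to store and serve ML features for training and inference; it provides no vector similarity search, so SQL queries against it cannot perform semantic retrieval.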
C. Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.
Amazon Kendra is the AWS managed service for semantic search over unstructured enterprise content, and its S3 data source connector crawls and ingests documents directly from an Amazon S3 bucket. You create an index, add the bucket as a data source, run a sync to ingest the files, and then query the index for results ranked by meaning rather than keyword overlap. Kendra manages the index and retrieval models, so there is no vector store to build or operate, and the text repository already migrated to S3 can be searched as-is.
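The setup steps (create an index, add the S3 bucket as a data source, start a sync) can be sketched in Python with boto3. This is a minimal sketch under several assumptions: the IAM role ARNs for the index and the data source already exist, `DEVELOPER_EDITION` is acceptable for the workload, and the `Type="S3"` data source with `S3Configuration` is used (AWS also documents a newer template-based S3 connector, so confirm which one applies to your account). The helper names `wait_until_active` and `index_s3_bucket` are hypothetical; the client methods (`create_index`, `describe_index`, `create_data_source`, `describe_data_source`, `start_data_source_sync_job`) are Kendra API operations.

```python
import time
from typing import Any, Callable, Dict


def wait_until_active(describe: Callable[[], Dict[str, Any]],
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 3600.0) -> None:
    """Poll a Kendra describe_* call until its Status is ACTIVE (hypothetical helper)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = describe()["Status"]
        if status == "ACTIVE":
            return
        if status == "FAILED":
            raise RuntimeError("Kendra resource entered the FAILED state")
        if time.monotonic() > deadline:
            raise TimeoutError(f"resource still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


def index_s3_bucket(client: Any, bucket_name: str, index_role_arn: str,
                    data_source_role_arn: str, index_name: str = "text-repository",
                    poll_seconds: float = 30.0) -> Dict[str, str]:
    """Create a Kendra index, attach an S3 data source, and start ingestion.

    `client` is assumed to be boto3.client("kendra"); role ARNs are placeholders
    you must supply. Index creation is asynchronous, hence the polling.
    """
    # Assumption: the Developer edition is sufficient for a proof of concept.
    index_id = client.create_index(
        Name=index_name, Edition="DEVELOPER_EDITION", RoleArn=index_role_arn
    )["Id"]
    wait_until_active(lambda: client.describe_index(Id=index_id), poll_seconds)

    # Assumption: the S3 connector configured via Type="S3" / S3Configuration.
    data_source_id = client.create_data_source(
        Name=f"{index_name}-s3",
        IndexId=index_id,
        Type="S3",
        Configuration={"S3Configuration": {"BucketName": bucket_name}},
        RoleArn=data_source_role_arn,
    )["Id"]
    wait_until_active(
        lambda: client.describe_data_source(Id=data_source_id, IndexId=index_id),
        poll_seconds,
    )

    # The sync job crawls the bucket and ingests the text files into the index.
    execution_id = client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    return {"index_id": index_id, "data_source_id": data_source_id,
            "execution_id": execution_id}
```

The sync runs asynchronously; once it completes, the index can be queried with the Kendra `query` operation. Polling with a timeout is a simple choice here; an event-driven approach would avoid blocking but adds moving parts.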
D. Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.
Textract extracts text, forms, and tables from scanned documents and images; it does not index a document collection or provide semantic search across it.