Question 5

Domain 2: Data Preparation

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in HTML format. They want to develop a solution using the fewest lines of code. Which Python package should be used to extract the text from the source documents?

A. pytesseract
B. numpy
C. pypdf2
D. beautifulsoup

Explanation
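
Of the four packages, beautifulsoup (option D) is the one built for parsing HTML: pytesseract performs OCR on images, numpy handles numerical arrays, and pypdf2 reads PDF files. As a minimal sketch of how few lines the extraction takes, assuming the package is installed as beautifulsoup4 and imported as bs4, the example below strips the markup from a small HTML string. The names html_to_text and sample_html are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Return the readable text of an HTML document as one string."""
    # "html.parser" is Python's built-in parser, so no extra parser
    # package is required.
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements; their contents are not prose.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text nodes with spaces and trim whitespace.
    return soup.get_text(separator=" ", strip=True)


# Hypothetical sample document, used only for illustration.
sample_html = (
    "<html><head><style>p {color: red;}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
)
print(html_to_text(sample_html))
```

In a RAG pipeline, the returned string would typically be passed on to the chunking and embedding steps.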

Why each option is right or wrong