soup = BeautifulSoup(content)
You can switch the parser.
soup = BeautifulSoup(content, "html.parser")
or
pip install lxml
soup = BeautifulSoup(content, "lxml")
NOTE: sometimes lxml fails to find some HTML elements, in which case I have to fall back to html.parser.
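The fallback described in the note can be sketched roughly as below. This is only an illustration, not a fixed recipe: find_all_with_fallback is a hypothetical helper name, and it assumes that "the element was not found" means find_all returned nothing for the tag you are looking for.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_all_with_fallback(content, tag, parsers=("lxml", "html.parser")):
    """Return (parser_name, matches) from the first parser that finds `tag`.

    Hypothetical helper: tries each parser in order and moves on when a
    parser is not installed or finds nothing.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(content, parser)
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser is missing.
            continue
        matches = soup.find_all(tag)
        if matches:
            return parser, matches
    return None, []


parser, cells = find_all_with_fallback("<table><tr><td>1</td></tr></table>", "td")
print(parser, [td.get_text() for td in cells])
```

Which parser name gets printed depends on whether lxml is installed. On well-formed markup like this, both parsers should find the same cells.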
You can specify the encoding of the HTML content as well. In some uncommon cases I have to specify the encoding, otherwise Unicode characters are not output correctly.
soup = BeautifulSoup(content, "html.parser", from_encoding="utf-8")
r = requests.get("https://news.ycombinator.com")
encoding = r.encoding if "charset" in r.headers.get("content-type", "").lower() else None
soup = BeautifulSoup(r.content, from_encoding=encoding)
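The requests snippet above only trusts an encoding when the Content-Type header actually declares a charset. As far as I understand, requests otherwise falls back to a default guess for text responses, so passing None lets Beautiful Soup detect the encoding itself. A minimal, standard-library-only sketch of that header check follows. declared_charset is a hypothetical helper name, not part of requests or Beautiful Soup.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html"))                 # None
```

The result can be passed straight to from_encoding. A None value leaves the detection to Beautiful Soup.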
References: