BeautifulSoup HTML Parser and Encoding

soup = BeautifulSoup(content)

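Without a second argument, BeautifulSoup picks the best parser that is installed and warns that none was specified. You can choose the parser explicitly. Python's built-in html.parser needs no extra install: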

soup = BeautifulSoup(content, "html.parser")

lxml is a third-party parser, so install it first:

pip install lxml

soup = BeautifulSoup(content, "lxml")

NOTE: lxml sometimes fails to find HTML elements that html.parser does find. In those cases I fall back to html.parser.
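The fallback described in the note can be sketched roughly as follows. The helper name find_with_fallback and the sample HTML are hypothetical, made up for illustration; the sketch assumes "not found" simply means find() returned None, and it also skips a parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_with_fallback(content, name, parsers=("lxml", "html.parser"), **attrs):
    """Hypothetical helper: try each parser in order until the element is found.

    Returns (element, parser_name), or (None, None) if no parser finds it.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(content, parser)
        except FeatureNotFound:
            # This parser (for example lxml) is not installed; try the next one.
            continue
        element = soup.find(name, **attrs)
        if element is not None:
            return element, parser
    return None, None


# Made-up sample document, only for demonstration.
html = "<html><body><p class='title'>Hello</p></body></html>"
element, parser_used = find_with_fallback(html, "p", class_="title")
print(element.get_text(), parser_used)
```

Trying lxml first keeps its speed for the common case, and html.parser only runs when lxml is missing or comes up empty.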

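You can also specify the encoding of the HTML content. In some uncommon cases I have to specify the encoding, otherwise Unicode characters are not output correctly. Note that from_encoding only applies when content is bytes, not an already decoded str.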

soup = BeautifulSoup(content, "html.parser", from_encoding="utf-8")

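With requests, pass the response's encoding only when the server actually declared a charset in the Content-Type header. Otherwise pass None and let BeautifulSoup detect it:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://news.ycombinator.com")
# r.encoding may be a guess by requests, so trust it only if a charset was declared
encoding = r.encoding if "charset" in r.headers.get("content-type", "").lower() else None
soup = BeautifulSoup(r.content, "html.parser", from_encoding=encoding)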
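To illustrate why an explicit encoding can matter, here is a minimal sketch with bytes in a legacy encoding and no charset declaration in the markup. The sample content and the choice of windows-1251 are assumptions made for the demonstration; without the hint, BeautifulSoup has to guess and may guess wrong.

```python
from bs4 import BeautifulSoup

# Made-up sample: Cyrillic text encoded as windows-1251, with no <meta charset>.
content = "<p>Привет</p>".encode("windows-1251")

# Tell BeautifulSoup how to decode the bytes instead of letting it guess.
soup = BeautifulSoup(content, "html.parser", from_encoding="windows-1251")
text = soup.p.get_text()
print(text)

# original_encoding reports the encoding that was actually used for decoding.
print(soup.original_encoding)
```

Checking soup.original_encoding after parsing is a quick way to see which encoding was applied when the output looks wrong.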

References:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser