BeautifulSoup HTML Parser and Encoding

August 20, 2018
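
from bs4 import BeautifulSoup

# With no parser named, BeautifulSoup picks the best one installed and warns about it.
soup = BeautifulSoup(content)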

You can switch parsers by passing the parser name as the second argument.

soup = BeautifulSoup(content, "html.parser")

or

# requires: pip install lxml
soup = BeautifulSoup(content, "lxml")

NOTE: lxml sometimes fails to find HTML elements that html.parser does find; in those cases I fall back to html.parser.
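The fallback described in the note can be sketched roughly as below. This is only an illustration, not a recommended pattern: the helper name `find_all_with_fallback` and the sample HTML are made up for this example, and it assumes "nothing found" is a good enough signal to try the next parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_all_with_fallback(content, name, parsers=("lxml", "html.parser"), **kwargs):
    """Hypothetical helper: try each parser in order and return the first
    non-empty find_all() result together with the parser that produced it."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(content, parser)
        except FeatureNotFound:
            # The parser (e.g. lxml) is not installed; try the next one.
            continue
        found = soup.find_all(name, **kwargs)
        if found:
            return parser, found
    return None, []


html = "<html><body><p class='title'>Hello</p></body></html>"
parser_used, elements = find_all_with_fallback(html, "p", class_="title")
print(parser_used, [el.get_text() for el in elements])
```

The function returns the parser name along with the elements, so you can log which parser ended up being used for a given page.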

You can specify the encoding of the HTML content as well. In some uncommon cases I have to specify the encoding; otherwise Unicode characters are not output correctly.

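soup = BeautifulSoup(content, "html.parser", from_encoding="utf-8")

With requests, pass the response encoding only when the server actually declares a charset. When it does not, requests falls back to a default (ISO-8859-1) for text responses, which may be wrong, so it is better to let BeautifulSoup detect the encoding itself.

import requests

r = requests.get("https://news.ycombinator.com")
# Trust r.encoding only if the Content-Type header names a charset.
encoding = r.encoding if "charset" in r.headers.get("content-type", "").lower() else None
soup = BeautifulSoup(r.content, "html.parser", from_encoding=encoding)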
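To see the effect of from_encoding without making a network request, here is a small offline sketch. The helper `soup_from_bytes`, the sample bytes, and the header value are all invented for illustration; it uses the standard library's email.message.Message to read the charset out of a Content-Type string, which is one possible way to do it and not what requests does internally.

```python
from email.message import Message

from bs4 import BeautifulSoup


def soup_from_bytes(body, content_type):
    """Hypothetical helper: pass from_encoding only when the Content-Type
    value declares a charset; otherwise let BeautifulSoup detect it."""
    msg = Message()
    msg["content-type"] = content_type
    charset = msg.get_content_charset()  # None when no charset is declared
    return BeautifulSoup(body, "html.parser", from_encoding=charset)


# Sample bytes encoded as ISO-8859-1 (not UTF-8), with a matching header value.
body = "<p>café</p>".encode("iso-8859-1")
soup = soup_from_bytes(body, "text/html; charset=ISO-8859-1")
text = soup.p.get_text()
print(text)
```

Because the bytes are handed to BeautifulSoup undecoded, from_encoding is what tells it how to turn them into text.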

This work is licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License.