BeautifulSoup HTML Parser and Encoding

from bs4 import BeautifulSoup
soup = BeautifulSoup(content)

You can switch the parser.

soup = BeautifulSoup(content, "html.parser")

or

pip install lxml
soup = BeautifulSoup(content, "lxml")

NOTE: lxml sometimes fails to find certain HTML elements; in those cases I fall back to html.parser.
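
The fallback described in the note could be sketched roughly as below. This is only an illustration, not a recommended recipe: `select_with_fallback` is a hypothetical helper name, the sample HTML is made up, and it assumes "the parser failed" simply means the CSS selector returned nothing. It also skips lxml if it is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def select_with_fallback(content, selector, parsers=("lxml", "html.parser")):
    """Hypothetical helper: try each parser in order and return
    (parser_name, matches) for the first one whose tree matches the selector."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(content, parser)
        except FeatureNotFound:
            # The requested parser (e.g. lxml) is not installed; try the next one.
            continue
        matches = soup.select(selector)
        if matches:
            return parser, matches
    # No parser produced a match.
    return None, []


# Made-up sample markup for illustration only.
html = "<html><body><div id='main'><p class='intro'>Hello</p></div></body></html>"
parser, matches = select_with_fallback(html, "p.intro")
print(parser, [m.get_text() for m in matches])
```

Parsing the same document twice costs time, so this only makes sense for the occasional page where the first parser misses elements.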

You can also specify the encoding of the HTML content. In some uncommon cases I have to specify the encoding, otherwise Unicode characters are not output correctly.

soup = BeautifulSoup(content, "html.parser", from_encoding="utf-8")
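
To see what from_encoding does, here is a small sketch using made-up sample bytes in Shift JIS with no charset declared in the markup. Note that from_encoding only applies when the input is bytes; if you pass an already-decoded str, BeautifulSoup ignores it.

```python
from bs4 import BeautifulSoup

# Made-up sample: bytes in a legacy encoding, with no <meta charset> hint.
raw = "<p>こんにちは</p>".encode("shift_jis")

# Tell BeautifulSoup which encoding to use instead of letting it guess.
soup = BeautifulSoup(raw, "html.parser", from_encoding="shift_jis")

print(soup.p.get_text())       # the decoded text
print(soup.original_encoding)  # the encoding BeautifulSoup ended up using
```

Without from_encoding, BeautifulSoup has to guess the encoding from the bytes, and the result can depend on which detection libraries are installed.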
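
With requests, pass the response's encoding only when the server actually declares a charset; otherwise pass None and let BeautifulSoup detect it from the document.

import requests
r = requests.get("https://news.ycombinator.com")
encoding = r.encoding if "charset" in r.headers.get("content-type", "").lower() else None
soup = BeautifulSoup(r.content, from_encoding=encoding)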
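
The reason for the charset check is that requests falls back to ISO-8859-1 for text responses when the Content-Type header carries no charset, and that guess is often wrong. As an alternative to the substring test, the header could be parsed with the standard library. This is a minimal sketch; `declared_charset` is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
from email.message import Message


def declared_charset(content_type):
    """Hypothetical helper: return the charset declared in a Content-Type
    header value (lowercased), or None if the header declares none."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(declared_charset("text/html; charset=UTF-8"))
print(declared_charset("text/html"))
```

The result can be passed straight through, for example `BeautifulSoup(r.content, "html.parser", from_encoding=declared_charset(r.headers.get("content-type", "")))`.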

✨ By Desmond Lua
