Do I need to use HTML entities?

Based on the comments I have received, I looked into this a little further. It seems that currently the best practice is to forgo using HTML entities and use the actual UTF-8 character instead. The reasons listed are as follows:

  1. UTF-8 encodings are easier to read and edit for those who understand what the character means and know how to type it.
  2. UTF-8 encodings are just as unintelligible as HTML entity encodings for those who don't understand them, but they have the advantage of rendering as the special characters themselves rather than as hard-to-understand decimal or hex codes.

As long as your page's encoding is properly set to UTF-8, you should use the actual character instead of an HTML entity. I read several documents about this topic, but the most helpful were:

  • UTF-8: The Secret of Character Encoding
  • Wikipedia Special Characters Help

From the UTF-8: The Secret of Character Encoding article:

Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages. Bots will now actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.

That article also gives a nice example involving Chinese encoding. Here is the abbreviated example for the sake of laziness:

UTF-8:

這兩個字是甚麼意思

HTML Entities:

&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;

The UTF-8 and HTML entity encodings are both meaningless to me, but at least the UTF-8 encoding is recognizable as a foreign language, and it will render properly in an edit box. The article goes on to say the following about the HTML entity-encoded version:

Extremely inconvenient for those of us who actually know what character entities are, totally unintelligible to poor users who don't! Even the slightly more user-friendly, "intelligible" character entities like &theta; will leave users who are uninterested in learning HTML scratching their heads. On the other hand, if they see θ in an edit box, they'll know that it's a special character, and treat it accordingly, even if they don't know how to write that character themselves.
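The two forms in the example above are interchangeable, which a few lines of Python can show. This is only a sketch using the standard library; the helper name `to_numeric_entities` is made up for illustration and is not from the article.

```python
import html

text = "這兩個字是甚麼意思"


def to_numeric_entities(s: str) -> str:
    """Replace every non-ASCII character with a decimal reference such as &#36889;.

    Hypothetical helper: it relies on the "xmlcharrefreplace" error handler
    that Python's str.encode() provides.
    """
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_numeric_entities(text)   # ASCII-only, entity-encoded form
decoded = html.unescape(encoded)      # back to the real characters

print(encoded)
print(decoded == text)
```

Both strings render identically in a browser; only the second is readable in an edit box.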

As others have noted, you still have to use HTML entities for reserved XML characters (ampersand, less-than, greater-than).

An HTML entity is a piece of text ("string") that begins with an ampersand (&) and ends with a semicolon (;). Entities are frequently used to display reserved characters (which would otherwise be interpreted as HTML code), and invisible characters (like non-breaking spaces). You can also use them in place of other characters that are difficult to type with a standard keyboard.

Note: Many characters have memorable entities. For example, the entity for the copyright symbol (©) is &copy;. For characters with less memorable entities, you can use a reference chart or decoder tool.

Reserved characters

Some special characters are reserved for use in HTML, meaning that your browser will parse them as HTML code. For example, if you use the less-than (<) sign, the browser interprets any text that follows as a tag.

To display these characters as text, replace them with their corresponding character entities: &amp; for the ampersand (&), &lt; for less-than (<), &gt; for greater-than (>), and &quot; for the double quote (").
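In practice this replacement is rarely done by hand. As a minimal sketch, Python's standard `html.escape` performs it for text content; the sample string `raw` is invented for illustration.

```python
import html

# Text that would be misparsed as markup if written into a page unescaped.
raw = 'if (a < b && c > d) print("ok")'

# quote=False escapes only &, < and >, which is enough for text between tags.
safe = html.escape(raw, quote=False)

print(safe)
print(html.unescape(safe) == raw)  # the browser sees the original text again
```

Note that `&` is escaped first, so an `&lt;` that is already in the input becomes `&amp;lt;` and still displays literally.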


I wouldn't replace a character with an entity unless the literal character invalidates the HTML. Entities look ugly and are difficult to read. The simplest and best teacher for what does and does not require an HTML entity is an HTML validator. The most trustworthy validator is at: https://validator.w3.org/

Not all entities are needed to make valid HTML. For example, &quot; usually isn't needed in text. It makes the HTML look more cryptic to developers, adds more bytes to the download size, and doesn't make the HTML any more valid. The validator will ensure that special characters don't introduce character encoding issues or other HTML validity problems. Trust the validator more than what any person says, including myself.

I'm fairly confident about a few pointers, though. Of the specific characters you mentioned, exclamation marks, regular quotation marks, and regular hyphens shouldn't require entities. Some similar but more special characters, such as directed (left and right) double quotes, left and right single quotes/apostrophes, and long dashes, can need entities when the page's encoding cannot represent them; the validator should point this out.

An ampersand should always be written as an entity, &amp;, in HTML. The < and > signs in text nodes must be replaced with &lt; and &gt; too. There are different rules for < and > inside JavaScript and CSS, though. Also note that URLs need to be encoded differently from text nodes. Regular double quotes wrap an attribute's value, so a double quote inside the value needs to be written as &quot; (a backslash does not escape it in HTML).
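To make the point about different contexts concrete, here is a small sketch that encodes one string three ways using Python's standard library. The sample value and the `/search?q=` URL are invented for illustration. In an attribute value, a double quote is written as &quot; because HTML has no backslash escaping.

```python
import html
from urllib.parse import quote

value = 'Tom & "Jerry" <1940>'

# Text node: only &, < and > need entities.
text_node = html.escape(value, quote=False)

# Double-quoted attribute value: " must also become &quot;.
attribute = html.escape(value, quote=True)

# URL component: percent-encoding, not entities.
url_part = quote(value, safe="")

link = f'<a href="/search?q={url_part}" title="{attribute}">{text_node}</a>'
print(link)
```

The same input ends up as `Tom &amp; "Jerry" &lt;1940&gt;` in the text node but as `Tom%20%26%20%22Jerry%22%20%3C1940%3E` in the URL.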

Why are HTML character entities necessary?

You use entities to help the parser distinguish between characters that should be interpreted as HTML and characters you actually want to show to the user, because HTML reserves a special set of characters for itself. For example, text typed between literal < and > signs is treated as a tag; if HTML does not define a tag by that name, the text simply disappears from the rendered page.

Do you not use entity references?

Google's HTML/CSS Style Guide advises against using entity references: Do not use entity references. There is no need to use entity references like &mdash;, &rdquo;, or &#x263a;, assuming the same encoding (UTF-8) is used for files and editors as well as among teams.
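A cleanup pass in this spirit could be sketched as follows. This is an illustration only, not a tool from the style guide: the function name `literalize` and the choice to keep references for reserved characters and the non-breaking space are assumptions.

```python
import html
import re

# Matches well-formed references such as &copy;, &#169; or &#xA9;.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that stay as references: markup-significant ones and the
# invisible non-breaking space (an assumption made for this sketch).
RESERVED = set("&<>\"'\u00a0")


def literalize(source: str) -> str:
    """Replace unnecessary entity references with the literal character."""
    def swap(match: re.Match) -> str:
        ref = match.group(0)
        char = html.unescape(ref)
        # Keep unknown references and anything that decodes to a reserved character.
        if char == ref or char in RESERVED:
            return ref
        return char

    return ENTITY.sub(swap, source)


print(literalize("Tom &amp; Jerry &copy; 2024 &ldquo;hi&rdquo; &#x263a; &lt;b&gt;"))
```

The output file must then be saved and served as UTF-8, which is the condition the guide itself attaches.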

What does HTML entities do in PHP?

htmlentities() function: htmlentities() is a built-in PHP function that converts every character that has an HTML entity equivalent into that entity.
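For readers who don't use PHP, the idea can be approximated in Python. This is a rough analogue and not a reimplementation: `to_named_entities` is a hypothetical helper, it uses the HTML 4 name table in `html.entities.codepoint2name`, and it does not reproduce PHP's flags or its quote handling.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace each character that has an HTML 4 named entity with that entity."""
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Characters without a named entity (letters, digits, spaces) pass through.
        parts.append(f"&{name};" if name else ch)
    return "".join(parts)


print(to_named_entities("café © <b>"))
```

Compare this with `html.escape`, which touches only the handful of reserved characters and leaves é and © alone.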

When should you use HTML?

HTML is used for:

  • Structuring web pages. With tags and elements, we can define the headings, paragraphs, and other contents of a web page.
  • Navigating the internet.
  • Embedding images and videos.
  • Improving client-side data storage and offline capabilities.
  • Game development.
  • Interacting with native APIs.