PHP HTML Encode with Example Program

What is HTML encoding in PHP?

Encoding is the process of transforming reserved characters into HTML entity characters.

HTML character entities are expressed as &value;, where "value" is an abbreviation or number representing each character.

HTML provides an extensive set of entities. For purposes of encoding, though, we need only consider four of them:

Char    Entity
<       &lt;
>       &gt;
&       &amp;
"       &quot;
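As a quick check of the table, the following minimal sketch encodes a string that contains all four reserved characters. It uses htmlspecialchars(), one of the PHP functions covered in the next section, and passes the flags and charset explicitly so the result does not depend on the PHP version's defaults. The variable names and the sample link are only illustrative.

```php
<?php
// A string containing all four reserved characters: < > & "
$raw = '<a href="page.php?a=1&b=2">link</a>';

// ENT_COMPAT encodes double quotes but leaves single quotes alone.
$encoded = htmlspecialchars($raw, ENT_COMPAT, 'UTF-8');

echo $encoded, PHP_EOL;
// &lt;a href=&quot;page.php?a=1&amp;b=2&quot;&gt;link&lt;/a&gt;
```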

PHP Encoding Functions

PHP provides two primary functions for encoding HTML characters.

  • htmlentities()
  • htmlspecialchars()

The htmlspecialchars() function encodes only the four reserved characters (plus the single quote, depending on the flags), while the htmlentities() function encodes every character that has an HTML entity equivalent.
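The difference is easiest to see on a string that mixes reserved characters with other characters that have named entities. The sketch below assumes UTF-8 input and PHP 7.0 or later (for the "\u{...}" escapes); the sample text is made up for illustration.

```php
<?php
// "\u{00E9}" is é and "\u{00A9}" is ©; both have named HTML entities.
$text = "Caf\u{00E9} <b>menu</b> \u{00A9} Example";

$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;   // only < and > are encoded; é and © stay literal
echo $entities, PHP_EOL;  // Caf&eacute; &lt;b&gt;menu&lt;/b&gt; &copy; Example
```

If the page is served as UTF-8, the htmlspecialchars() result is usually sufficient, because é and © can appear in the page as they are.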

PHP htmlentities() Function

We shall first examine the PHP htmlentities() function.

This function turns every character that has an HTML entity equivalent into that entity.

It is a good option when you need to display text safely inside an HTML page without the browser interpreting it as markup.

Syntax:

The general syntax of the function is as shown:

htmlentities(string, flags, character-set, double_encode)

Parameters

Parameter – Description

string – Required. Specifies the string to convert.

flags – Optional. Sets how to deal with quotes, invalid encoding, and the type of document that is being used.

The available quote styles are:
• ENT_COMPAT – Encodes only double quotes. This was the default before PHP 8.1
• ENT_QUOTES – Encodes double and single quotes. Part of the default as of PHP 8.1
• ENT_NOQUOTES – Does not encode any quotes

Invalid encoding:
• ENT_IGNORE – Ignores invalid encoding instead of making the function return an empty string. Should be avoided, because it may have security implications
• ENT_SUBSTITUTE – Replaces invalid encoding for a given character set with the Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other character sets) instead of returning an empty string
• ENT_DISALLOWED – Replaces code points that are invalid in the specified doctype with the Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other character sets)

Extra flags to specify the doctype used:
• ENT_HTML401 – Default. Handle code as HTML 4.01
• ENT_HTML5 – Handle code as HTML 5
• ENT_XML1 – Handle code as XML 1
• ENT_XHTML – Handle code as XHTML

character-set – Optional. A string that specifies which character set to use.

The values that can be used include:
• UTF-8 – ASCII-compatible multi-byte 8-bit Unicode
• ISO-8859-1 – Western European
• ISO-8859-15 – Western European (adds the Euro sign and the French and Finnish letters that were left out of ISO-8859-1)
• cp866 – Cyrillic charset for DOS
• cp1251 – Cyrillic charset for Windows
• KOI8-R – Russian
• BIG5 – Traditional Chinese, mostly used in Taiwan
• GB2312 – Simplified Chinese, national standard character set

Note: Versions of PHP before 5.4 ignore character sets that aren’t recognized and use ISO-8859-1 instead. As of PHP 5.4, an unrecognized character set is ignored and UTF-8 is used instead.

double_encode – Optional. A boolean value that says whether existing HTML entities should be encoded or not.

• TRUE – Default. Everything is encoded, including existing HTML entities.
• FALSE – Existing HTML entities won’t be encoded.
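The effect of double_encode is easiest to see on input that already contains an entity. The following is a minimal sketch with a made-up sample string; the flags and charset are passed explicitly so that only the fourth argument differs between the two calls.

```php
<?php
// The input already contains one entity (&amp;) plus a raw tag.
$input = 'Tom &amp; Jerry <b>';

// double_encode = true (the default): the existing &amp; is encoded again.
$twice = htmlentities($input, ENT_QUOTES, 'UTF-8', true);

// double_encode = false: existing entities are left untouched.
$once = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

echo $twice, PHP_EOL; // Tom &amp;amp; Jerry &lt;b&gt;
echo $once, PHP_EOL;  // Tom &amp; Jerry &lt;b&gt;
```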

Technical Details

Return Value – Returns the converted string. If the string parameter contains invalid encoding, an empty string is returned unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.

PHP Version – 4+

Changelog:
• PHP 8.1: The default value of the flags parameter changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
• PHP 5.6: The default value of the character-set parameter changed to the value of the default_charset configuration setting.
• PHP 5.4: The default value of the character-set parameter changed to UTF-8. ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_HTML5, ENT_XML1, and ENT_XHTML were added.
• PHP 5.3: The ENT_IGNORE constant was added.
• PHP 5.2.3: The double_encode parameter was added.
• PHP 4.1: The character-set parameter was added.
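The empty-string return value is a common surprise, so here is a small sketch of it. It assumes the input holds ISO-8859-1 bytes while the function is told to expect UTF-8; the variable names are illustrative only.

```php
<?php
// "\xE9" is é in ISO-8859-1, but on its own it is not valid UTF-8.
$latin1Bytes = "Caf\xE9";

// Neither ENT_IGNORE nor ENT_SUBSTITUTE: the whole result is an empty string.
$strict = htmlentities($latin1Bytes, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE: the invalid byte is replaced with U+FFFD, the rest survives.
$lenient = htmlentities($latin1Bytes, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                          // bool(true)
var_dump(strpos($lenient, "\u{FFFD}") !== false);  // bool(true)
```

Passing the character set that the data really uses ('ISO-8859-1' in this case) avoids the problem altogether.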

The function is similar to htmlspecialchars(), with the exception that it encodes every character that has an HTML entity equivalent, not only the reserved ones.

In the following example, you’ll see how to use the htmlentities() function.

<?php
    $str = "<p>This is <i>valid</i> HTML code</p>";
    echo htmlentities($str);
?>

The code above prints the string with all of its tags turned into entities:

&lt;p&gt;This is &lt;i&gt;valid&lt;/i&gt; HTML code&lt;/p&gt;

Like the htmlspecialchars() function, it supports flags and a character-set argument. See the PHP documentation for the full details.

PHP htmlspecialchars()

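This function turns the characters that are special or reserved in HTML into HTML entities. You can change its behavior with flags. Before PHP 8.1 it leaves single quotes untouched by default; as of PHP 8.1 the default flags include ENT_QUOTES, so single quotes are encoded as well.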

Syntax:

htmlspecialchars(string, flags, character-set, double_encode)

Parameters:

Parameter – Description

string – Required. Specifies the string to convert.

flags – Optional. Sets how to deal with quotes, invalid encoding, and the type of document that is being used.

The available quote styles are:
• ENT_COMPAT – Encodes only double quotes. This was the default before PHP 8.1
• ENT_QUOTES – Encodes double and single quotes. Part of the default as of PHP 8.1
• ENT_NOQUOTES – Does not encode any quotes

Invalid encoding:
• ENT_IGNORE – Ignores invalid encoding instead of making the function return an empty string. Should be avoided, because it may have security implications
• ENT_SUBSTITUTE – Replaces invalid encoding for a given character set with the Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other character sets) instead of returning an empty string
• ENT_DISALLOWED – Replaces code points that are invalid in the specified doctype with the Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other character sets)

Extra flags to specify the doctype used:
• ENT_HTML401 – Default. Handle code as HTML 4.01
• ENT_HTML5 – Handle code as HTML 5
• ENT_XML1 – Handle code as XML 1
• ENT_XHTML – Handle code as XHTML

character-set – Optional. A string that specifies which character set to use.

The values that can be used include:
• UTF-8 – ASCII-compatible multi-byte 8-bit Unicode
• ISO-8859-1 – Western European
• ISO-8859-15 – Western European (adds the Euro sign and the French and Finnish letters that were left out of ISO-8859-1)
• cp866 – Cyrillic charset for DOS
• cp1251 – Cyrillic charset for Windows
• KOI8-R – Russian
• BIG5 – Traditional Chinese, mostly used in Taiwan
• GB2312 – Simplified Chinese, national standard character set

Note: Versions of PHP before 5.4 ignore character sets that aren’t recognized and use ISO-8859-1 instead. As of PHP 5.4, an unrecognized character set is ignored and UTF-8 is used instead.

double_encode – Optional. A boolean value that says whether existing HTML entities should be encoded or not.

• TRUE – Default. Everything is encoded, including existing HTML entities.
• FALSE – Existing HTML entities won’t be encoded.

Technical Details

Return Value – Returns the converted string. If the string contains invalid encoding, an empty string is returned unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.

PHP Version – 4+

Changelog:
• PHP 8.1: The default value of the flags parameter changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
• PHP 5.6: The default value of the character-set parameter changed to the value of the default_charset configuration setting.
• PHP 5.4: The default value of the character-set parameter changed to UTF-8. ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_HTML5, ENT_XML1, and ENT_XHTML were added.
• PHP 5.3: The ENT_IGNORE constant was added.
• PHP 5.2.3: The double_encode parameter was added.
• PHP 4.1: The character-set parameter was added.
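To make the three quote styles concrete, the sketch below runs the same made-up sentence through each of them. The flags are always passed explicitly, so the output is the same regardless of which PHP version's default applies.

```php
<?php
$quote = 'She said "hi" and it\'s fine';

$compat   = htmlspecialchars($quote, ENT_COMPAT, 'UTF-8');
$quotes   = htmlspecialchars($quote, ENT_QUOTES, 'UTF-8');
$noquotes = htmlspecialchars($quote, ENT_NOQUOTES, 'UTF-8');

echo $compat, PHP_EOL;   // She said &quot;hi&quot; and it's fine
echo $quotes, PHP_EOL;   // She said &quot;hi&quot; and it&#039;s fine
echo $noquotes, PHP_EOL; // She said "hi" and it's fine
```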

The function takes the string that needs to be encoded.

You can also pass flag values that change how the function works.

PHP also lets you choose which character set to use when encoding. The supported charsets are listed under the character-set parameter above, and the full list is in the PHP documentation.

The htmlspecialchars() function is shown in the next example.

<?php
    $str = "HTML uses < and > for <em>tags</em>";
    echo htmlspecialchars($str);
?>

In the above example, the HTML characters in the variable $str will be encoded.

Output:

HTML uses &lt; and &gt; for &lt;em&gt;tags&lt;/em&gt;

You can use a flag, as shown in the example below, to tell the function to encode both single and double quotes:

<?php
    $str = "Single quotes such as 'and' are encoded with ENT_QUOTES";
    echo htmlspecialchars($str, ENT_QUOTES);
?>

Once you run the above code, the function encodes the single quotes as &#039; and gives the following output:

Single quotes such as &#039;and&#039; are encoded with ENT_QUOTES
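The entity used for an encoded single quote depends on the doctype flag. As a small sketch (the variable names are illustrative), ENT_QUOTES combined with ENT_HTML401 produces the numeric entity, while ENT_QUOTES combined with ENT_HTML5 produces the named one:

```php
<?php
$text = "it's";

// HTML 4.01 has no named entity for the apostrophe, so a numeric one is used.
$html401 = htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// HTML 5 defines &apos;, so the named entity is used instead.
$html5 = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html401, PHP_EOL; // it&#039;s
echo $html5, PHP_EOL;   // it&apos;s
```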

Summary

In summary, htmlentities() and htmlspecialchars() do the same job, but htmlentities() encodes every character that has an HTML entity equivalent, whereas htmlspecialchars() encodes only the few reserved characters.

Now that we know how to use these functions, we can apply HTML encoding wherever a program writes text into an HTML page.
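One common way to put this into practice is a small wrapper that is called on every value before it is printed. The sketch below is only an illustration: escape_html() is a hypothetical helper name, not a built-in PHP function, and the choice of ENT_QUOTES | ENT_SUBSTITUTE with UTF-8 is an assumption that suits pages served as UTF-8. It requires PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical helper (not part of PHP): encode a value for HTML output.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Pretend this text came from a form field or a query string.
$comment = '<script>alert("x")</script>';

$html = '<p class="comment">' . escape_html($comment) . '</p>';

echo $html, PHP_EOL;
// <p class="comment">&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

Because the script tag arrives in the page as entities, the browser displays it as text instead of running it.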

Lastly, if you want to learn more about PHP HTML Encode, please leave a comment below. We’ll be happy to hear it!
