Tuesday 1 July 2014

C#.Net - Handling special characters in XML strings

Because XML syntax reserves some characters for markup (tags and attributes), those characters cannot be used directly inside element content or attribute values.
To include such a character in an XML file, use its numeric character reference instead of the character itself. A numeric character reference names a Unicode code point, so it is written the same way whatever the file's encoding; the files discussed here declare encoding="UTF-8" in the prolog, and that declaration should not be changed.
The numeric character reference uses the format:

&#nn;  decimal form
&#xhh; hexadecimal form
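As a minimal sketch of the two forms, the helper class below builds the decimal and hexadecimal reference for a single character from its code point. CharRef, ToDecimal and ToHex are hypothetical names used only for illustration; they are not part of the .NET Framework, and the sketch does not handle characters outside the Basic Multilingual Plane (surrogate pairs).

```csharp
using System;
using System.Globalization;

public static class CharRef
{
    // Hypothetical helper: decimal form, e.g. '<' becomes "&#60;".
    public static string ToDecimal(char c)
    {
        return string.Format(CultureInfo.InvariantCulture, "&#{0};", (int)c);
    }

    // Hypothetical helper: hexadecimal form, e.g. '<' becomes "&#x3C;".
    public static string ToHex(char c)
    {
        return string.Format(CultureInfo.InvariantCulture, "&#x{0:X};", (int)c);
    }
}
```

For example, CharRef.ToDecimal('&') gives "&#38;" and CharRef.ToHex('&') gives "&#x26;".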
We can use the SecurityElement.Escape method to replace the invalid XML characters in a string (<, >, ", ' and &) with their valid XML equivalents (&lt;, &gt;, &quot;, &apos; and &amp;) [1].
// Usage (requires: using System.Security;)
strXML = SecurityElement.Escape(strXML);
Namespace: System.Security
Assembly: mscorlib (in mscorlib.dll)
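To show the effect end to end, here is a small sketch that escapes a string, embeds it in an element and reads it back with XmlDocument. The class name EscapeDemo, its two methods and the element name "note" are assumptions made for this example, not something taken from MSDN.

```csharp
using System;
using System.Security;
using System.Xml;

public static class EscapeDemo
{
    // Wraps the text in a <note> element after escaping < > " ' and &.
    // The element name "note" is only an example.
    public static string BuildNote(string text)
    {
        return "<note>" + SecurityElement.Escape(text) + "</note>";
    }

    // Parses the XML and returns the element's text content,
    // with the entity references resolved by the parser.
    public static string ReadNote(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

EscapeDemo.BuildNote("a < b & c") yields <note>a &lt; b &amp; c</note>, and passing that string to EscapeDemo.ReadNote returns the original text. Without the Escape call, LoadXml would throw an XmlException on the same input.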
I have also used the UrlEncode and UrlDecode methods of the HttpUtility class to guard against cross-site scripting attacks, and this also helped me to get rid of the XmlException “Data at the root level is invalid”.
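The post does not show how these methods were called, so the following is only an assumed usage: encode the value before it is stored or transmitted, and decode it when it is read back. UrlDemo and its method names are hypothetical; on the .NET Framework the code needs a reference to System.Web.dll.

```csharp
using System;
using System.Web;

public static class UrlDemo
{
    // Percent-encodes the value so that characters such as < > and &
    // no longer appear literally in the result.
    public static string Encode(string raw)
    {
        return HttpUtility.UrlEncode(raw);
    }

    // Reverses Encode and restores the original text.
    public static string Decode(string encoded)
    {
        return HttpUtility.UrlDecode(encoded);
    }
}
```

For instance, UrlDemo.Decode(UrlDemo.Encode("<b>a & b</b>")) gives back the original string, while the encoded intermediate contains none of the XML markup characters.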
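The following table lists common ASCII characters with their decimal numeric character references. Of these, only &, <, >, " and ' have special meaning in XML markup; the others can normally be written literally.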

Code     Name                   Displayed as
&#09;    Horizontal tab         non-printing
&#10;    Line feed              non-printing
&#13;    Carriage return        non-printing
&#32;    Space                  non-printing
&#33;    Exclamation mark       !
&#34;    Quotation mark         "
&#35;    Number sign            #
&#36;    Dollar sign            $
&#37;    Percent sign           %
&#38;    Ampersand              &
&#39;    Apostrophe             '
&#40;    Left parenthesis       (
&#41;    Right parenthesis      )
&#42;    Asterisk               *
&#43;    Plus sign              +
&#44;    Comma                  ,
&#45;    Hyphen                 -
&#46;    Period                 .
&#47;    Slash                  /
&#58;    Colon                  :
&#59;    Semi-colon             ;
&#60;    Less than              <
&#61;    Equals sign            =
&#62;    Greater than           >
&#63;    Question mark          ?
&#64;    At                     @
&#91;    Left square bracket    [
&#92;    Backslash              \
&#93;    Right square bracket   ]
&#94;    Caret                  ^
&#95;    Underscore             _
&#96;    Grave accent           `
&#123;   Left curly brace       {
&#124;   Vertical bar           |
&#125;   Right curly brace      }
&#126;   Tilde                  ~
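
The codes in the table can be checked by letting an XML parser resolve them. The sketch below uses XElement.Parse from System.Xml.Linq; the class name RefDemo and the wrapper element "v" are assumptions made for the example.

```csharp
using System;
using System.Xml.Linq;

public static class RefDemo
{
    // Wraps the given XML fragment in a <v> element, parses it and
    // returns the text content with all character references resolved.
    public static string Resolve(string body)
    {
        return XElement.Parse("<v>" + body + "</v>").Value;
    }
}
```

RefDemo.Resolve("&#60;tag&#62; &#38; &#x7E;") returns the text <tag> & ~, which matches the rows for 60, 62, 38 and 126 (hexadecimal 7E) above.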

Reference:
1. MSDN