What is URL Encoding?
We know that a URL, which is composed of characters, is used to locate resources on the Internet. A browser is one of the entry points from the computer to the Internet, and the computer has its own character encoding rules. Because there is a boundary between the Internet and the computer, the character encoding rules on each side differ.
URL encoding therefore converts characters into a format that can be transmitted over the Internet.
What are the rules of URL encoding?
A URL can only be sent over the Internet using the ASCII character set. Only alphanumerics [0-9a-zA-Z], the special characters “$-_.+!*’(),” [not including the quotes], and reserved characters used for their reserved purposes may be used unencoded within a URL.
When we send a URL from a browser, we often type characters outside the ASCII set, so the URL must be converted into a valid ASCII format.
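As a rough illustration of this conversion, the sketch below percent-encodes a piece of text while leaving the alphanumerics and the special characters listed above untouched. It uses Python's standard `urllib.parse.quote`; the names `SAFE_SPECIALS` and `encode_segment` are made up for this example, and UTF-8 is assumed as the byte encoding. Note that `quote` also leaves `~` unencoded by default, which goes slightly beyond the list above.

```python
from urllib.parse import quote

# Special characters the text lists as usable without encoding.
# This set is taken from the paragraph above; it is not urllib's default.
SAFE_SPECIALS = "$-_.+!*'(),"

def encode_segment(text: str) -> str:
    """Percent-encode text (as UTF-8 bytes), keeping alphanumerics and SAFE_SPECIALS."""
    return quote(text, safe=SAFE_SPECIALS, encoding="utf-8")

for raw in ["hello world", "a+b(c)", "50%", "café"]:
    encoded = encode_segment(raw)
    # Every result contains ASCII characters only.
    print(raw, "->", encoded, "| ASCII only:", encoded.isascii())
```

The space, the percent sign, and the non-ASCII `é` are replaced by `%XX` sequences, while `a+b(c)` passes through unchanged.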
But this is not enough. Characters also fall into categories such as safe, reserved, and unsafe:
| Classification | Included characters | Encoding required? |
|---|---|---|
| ASCII control characters | Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F hex (127 decimal). | Yes |
| Non-ASCII characters | Includes the entire “top half” of the ISO-Latin set, 80-FF hex (128-255 decimal). | Yes |
| Unsafe characters | Includes the blank/empty space and
* Note: Reserved characters only need encoding when not used for their defined, reserved purposes.
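The classification can be sketched as a small lookup on the character's code point. This is only an illustration of the ranges given in the table: the function name `classify` and its labels are hypothetical, and because the list of unsafe characters is incomplete here, only the space is treated as unsafe.

```python
def classify(ch: str) -> str:
    """Return a rough category for a single character, following the table's ranges."""
    code = ord(ch)
    if code <= 0x1F or code == 0x7F:
        return "control"      # 00-1F hex and 7F hex: must be encoded
    if code >= 0x80:
        return "non-ascii"    # 80 hex and above: must be encoded
    if ch == " ":
        return "unsafe"       # only the space is covered in this sketch
    return "other"            # safe or reserved; not distinguished here

for sample in ["\n", "\x7f", "é", " ", "a"]:
    print(repr(sample), "->", classify(sample))
```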
To make it clear, there are 3 rules:
1. URI characters before the “?” are percent-encoded using UTF-8. For example, the non-ASCII Chinese character 春 is encoded as “%E6%98%A5”.
2. URI characters after the “?” are encoded using the operating system's default encoding; for Chinese characters, this might be GB2312 or UTF-8.
3. When a GET or POST request is issued from a web page in a browser, the encoding is determined by the charset the page declares:
<meta http-equiv="Content-Type" content="text/html;charset=xxxx">
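The three rules can be imitated with Python's `urllib.parse` to see how the same character ends up as different bytes. This is a sketch of the behavior described above, not of any browser's actual code: the helper names, the `os_charset` parameter, and the regular expression used to read the charset from the meta tag are all assumptions made for the example.

```python
import re
from urllib.parse import quote, urlencode

def encode_path(path: str) -> str:
    """Rule 1: the part before '?' is percent-encoded as UTF-8."""
    return quote(path, safe="/", encoding="utf-8")

def encode_query_value(value: str, os_charset: str) -> str:
    """Rule 2: the part after '?' uses the OS default charset (passed in here)."""
    return quote(value, safe="", encoding=os_charset)

def encode_form(fields: dict, html: str) -> str:
    """Rule 3: a request made from a page uses the charset the page declares."""
    match = re.search(r"charset=([\w-]+)", html, re.IGNORECASE)
    charset = match.group(1) if match else "utf-8"  # fallback is an assumption
    return urlencode(fields, encoding=charset)

page = '<meta http-equiv="Content-Type" content="text/html;charset=gb2312">'

print(encode_path("/wiki/春"))             # /wiki/%E6%98%A5
print(encode_query_value("春", "gb2312"))  # %B4%BA
print(encode_query_value("春", "utf-8"))   # %E6%98%A5
print(encode_form({"q": "春"}, page))      # q=%B4%BA
```

The same character 春 becomes three bytes under UTF-8 and two bytes under GB2312, so a server that guesses the wrong charset will decode it incorrectly.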
As we can see, the result depends on the browser, the OS, and the web page charset. There is, however, a silver bullet for handling this mess.
Encode everything in the URL according to rules 1, 2, and 3 before sending it to the server side.
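One way to follow this advice is to percent-encode the URL explicitly in your own code, so that nothing is left to the browser or OS defaults. The sketch below assumes the server expects UTF-8 for both the path and the query; `build_url` and the host `www.example.com` are placeholders for illustration.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(host: str, path: str, params: dict) -> str:
    """Build an http URL whose path and query are explicitly encoded as UTF-8."""
    encoded_path = quote(path, safe="/", encoding="utf-8")
    encoded_query = urlencode(params, encoding="utf-8")
    # An empty string for the fragment means no '#...' part is appended.
    return urlunsplit(("http", host, encoded_path, encoded_query, ""))

print(build_url("www.example.com", "/search/春", {"q": "春节"}))
# http://www.example.com/search/%E6%98%A5?q=%E6%98%A5%E8%8A%82
```

Because the encoding is chosen in code, the resulting URL is plain ASCII and is the same on every machine.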
Summary and Tips
- Know where the boundary lies between the characters you type and the characters sent over the Internet.
- Be clear about the characters you are dealing with: do they require encoding?
- Don’t confuse URL encoding with JSON encoding, OData encoding, or other web-related encoding rules.