The Encoded Result
Encoding the input
search query=hello world&category=café & résumé&redirect=https://example.com/path?id=42
produces:
search%20query%3Dhello%20world%26category%3Dcaf%C3%A9%20%26%20r%C3%A9sum%C3%A9%26redirect%3Dhttps%3A%2F%2Fexample.com%2Fpath%3Fid%3D42
Every character that is not an unreserved character (A-Z a-z 0-9 - _ . ~) has been replaced with its percent encoded form. The = signs, & characters, the accented letters, and the full URL in the redirect parameter are all safely encoded.
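The result above can be reproduced with Python's standard library. This is a minimal sketch, assuming `quote` with `safe=""` is an acceptable stand-in for whatever encoder produced the original output; the variable names are illustrative.

```python
from urllib.parse import quote

raw = "search query=hello world&category=café & résumé&redirect=https://example.com/path?id=42"

# safe="" leaves only the unreserved characters unencoded;
# the default safe="/" would keep slashes literal.
encoded = quote(raw, safe="")
print(encoded)
```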
Percent Encoding (RFC 3986)
Percent encoding is defined in RFC 3986, the specification for Uniform Resource Identifiers. The mechanism is simple: a byte that cannot appear literally is written as a percent sign followed by two hexadecimal digits representing that byte’s value. RFC 3986 recommends uppercase digits, though decoders treat %c3 and %C3 as the same byte.
Unreserved characters that never need encoding:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
Everything else (spaces, punctuation, non ASCII, control characters) must be encoded when it appears in a position where the URL parser would misinterpret it.
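To make the rule concrete, here is a hand-rolled sketch of an encoder that treats everything outside the unreserved set as needing encoding. The name `percent_encode` is hypothetical, and the function assumes UTF-8 input; in practice a library function such as `urllib.parse.quote` is the better choice.

```python
# The unreserved set from RFC 3986, stored as byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Encode every byte of text that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))        # safe to emit literally
        else:
            out.append(f"%{byte:02X}")   # percent sign + two uppercase hex digits
    return "".join(out)


print(percent_encode("a b/c~d"))
```

This sketch encodes reserved characters unconditionally, which is the right behavior for a parameter value but not for a complete URL.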
Why Special Characters Break URLs
A URL has defined syntax. Characters like ?, &, =, #, /, and : each have specific roles in the URL structure:
- ? separates the path from the query string
- & separates query parameters from each other
- = separates a parameter key from its value
- # starts the fragment identifier
- / separates path segments
- : separates the scheme from the rest
If a query parameter value contains any of these characters without encoding, the URL parser can read them as structural markers, and the parameter value gets cut off or misinterpreted. A redirect URL like https://example.com/path?id=42 used as a parameter value contains ?, =, :, and /. RFC 3986 permits ?, /, and : inside a query string, so a conforming parser often recovers this particular value intact, but that is luck rather than safety: as soon as the nested URL carries a second parameter (&) or a fragment (#), the outer URL splits in the wrong place.
Example of what goes wrong without encoding (the extra ref parameter is added here for illustration):
# Intended: one parameter called "redirect"
https://api.example.com/login?redirect=https://example.com/path?id=42&ref=home
# What the parser sees: two parameters
# redirect = https://example.com/path?id=42
# ref = home (stripped from the redirect target and handed to /login instead)
With encoding:
https://api.example.com/login?redirect=https%3A%2F%2Fexample.com%2Fpath%3Fid%3D42%26ref%3Dhome
Now the parser sees one parameter with an opaque value that the application can decode and use safely.
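The difference is easy to check with a real parser. The sketch below uses Python's `urlsplit` and `parse_qs`, and assumes a nested URL that carries a second parameter; the `ref=home` part is made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical redirect target with two parameters of its own.
target = "https://example.com/path?id=42&ref=home"

raw_url = "https://api.example.com/login?redirect=" + target
encoded_url = "https://api.example.com/login?redirect=" + quote(target, safe="")

raw_params = parse_qs(urlsplit(raw_url).query)
encoded_params = parse_qs(urlsplit(encoded_url).query)

print(raw_params)      # ref has leaked out of the redirect value
print(encoded_params)  # a single redirect parameter holding the whole target
```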
encodeURI vs encodeURIComponent
JavaScript provides two built in encoding functions. They are not interchangeable.
encodeURI is for encoding a complete URL. It leaves all characters that are meaningful in URL structure untouched, because the assumption is the URL already has valid structure and you just want to fix any non ASCII characters or literal spaces:
encodeURI("https://example.com/search?q=hello world")
// "https://example.com/search?q=hello%20world"
// Note: ? and = are NOT encoded
encodeURIComponent is for encoding a value that will be embedded inside a URL component (a query parameter value, a path segment, or a fragment). It encodes everything that could be mistaken for URL structure:
encodeURIComponent("hello world&goodbye=world")
// "hello%20world%26goodbye%3Dworld"
// Note: & and = ARE encoded
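For readers working outside JavaScript, the same split can be approximated in Python by changing which characters `quote` treats as safe. This is only a rough analogue: the helper names are hypothetical, the safe sets are copied from the characters the two JavaScript functions leave unescaped, and edge cases such as lone surrogates behave differently.

```python
from urllib.parse import quote

# Characters encodeURI leaves alone, beyond letters and digits.
ENCODE_URI_SAFE = ";,/?:@&=+$-_.!~*'()#"
# encodeURIComponent leaves far fewer characters alone.
ENCODE_URI_COMPONENT_SAFE = "-_.!~*'()"


def encode_uri(url: str) -> str:
    """Rough analogue of encodeURI: keeps URL structure intact."""
    return quote(url, safe=ENCODE_URI_SAFE)


def encode_uri_component(value: str) -> str:
    """Rough analogue of encodeURIComponent: encodes structural characters."""
    return quote(value, safe=ENCODE_URI_COMPONENT_SAFE)


print(encode_uri("https://example.com/search?q=hello world"))
print(encode_uri_component("hello world&goodbye=world"))
```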
For building query strings, always use encodeURIComponent on both keys and values:
const params = {
  q: "café & résumé",
  redirect: "https://example.com/path?id=42"
};
const queryString = Object.entries(params)
  .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
  .join("&");
// q=caf%C3%A9%20%26%20r%C3%A9sum%C3%A9&redirect=https%3A%2F%2Fexample.com%2Fpath%3Fid%3D42
Or use URLSearchParams, which handles encoding automatically:
const params = new URLSearchParams({
  q: "café & résumé",
  redirect: "https://example.com/path?id=42"
});
params.toString();
// q=caf%C3%A9+%26+r%C3%A9sum%C3%A9&redirect=https%3A%2F%2Fexample.com%2Fpath%3Fid%3D42
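Note that URLSearchParams uses form encoding. Spaces become +, not %20. This is correct for query strings submitted by HTML forms, and most servers handle both. If you need %20 for spaces, use encodeURIComponent manually. Be aware that encodeURIComponent still leaves ! ' ( ) * unencoded, so its output is close to, but not exactly, "everything except the RFC 3986 unreserved set".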
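Python's `urlencode` offers the same choice through its `quote_via` parameter. A minimal sketch, reusing the parameters from the JavaScript example:

```python
from urllib.parse import quote, urlencode

params = {
    "q": "café & résumé",
    "redirect": "https://example.com/path?id=42",
}

# Default: quote_plus, i.e. form encoding, spaces become +.
form_style = urlencode(params)

# quote_via=quote switches to %20 for spaces.
percent_style = urlencode(params, quote_via=quote)

print(form_style)
print(percent_style)
```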
The + vs %20 Space Debate
Two specifications encode spaces differently:
- RFC 3986 (URI): Space -> %20
- HTML form encoding (application/x-www-form-urlencoded): Space -> +
Both are widely accepted by web servers. The ambiguity bites when you decode: a + in the query string is a space only if the value was form encoded. In a URL that was constructed directly (not from a form submission), a literal + might be intentional. Always know which encoding convention the sender used before decoding.
In Python, the difference is explicit:
from urllib.parse import quote, quote_plus
quote("hello world") # "hello%20world" (RFC 3986)
quote_plus("hello world") # "hello+world" (form encoding)
For API development, prefer %20. For HTML form handling, expect + and use the appropriate decoder.
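The decoding side shows why the convention matters. In this sketch the input string is an illustrative value containing a literal plus sign; the two standard library decoders disagree about what it means.

```python
from urllib.parse import unquote, unquote_plus

value = "1+1%3D2"

as_percent = unquote(value)    # leaves + untouched
as_form = unquote_plus(value)  # treats + as an encoded space

print(as_percent)
print(as_form)
```

Only the sender knows which reading is right, which is why a literal + that must survive form decoding should be sent as %2B.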
Encoding Non ASCII Characters
Non ASCII characters go through two steps:
- Encode the character as UTF-8 bytes.
- Percent encode each byte.
The letter é (U+00E9) is one code point but two UTF-8 bytes: 0xC3 and 0xA9. The percent encoded form is %C3%A9.
The emoji 😀 (U+1F600) is four UTF-8 bytes (0xF0 0x9F 0x98 0x80), encoding to %F0%9F%98%80.
You can verify this in a browser console:
encodeURIComponent("é") // "%C3%A9"
encodeURIComponent("😀") // "%F0%9F%98%80"
Or in Python:
from urllib.parse import quote
quote("é") # "%C3%A9"
quote("😀") # "%F0%9F%98%80"
UTF-8 is the universal standard for percent encoding non ASCII characters. Do not assume Latin-1 or Windows-1252 encoding unless you are working with a legacy system that explicitly documents it.
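A short sketch of what a mismatch looks like, assuming a legacy sender that percent encoded é as Latin-1. Python's `quote` and `unquote` both accept an `encoding` argument, which makes the failure easy to reproduce.

```python
from urllib.parse import quote, unquote

utf8_form = quote("é")                         # two bytes: C3 A9
latin1_form = quote("é", encoding="latin-1")   # one byte: E9

# Decoding Latin-1 output as UTF-8 (the default) cannot recover the letter;
# unquote substitutes U+FFFD, the replacement character.
misread = unquote(latin1_form)

# Decoding with the documented legacy encoding recovers it.
recovered = unquote(latin1_form, encoding="latin-1")

print(utf8_form, latin1_form, repr(misread), recovered)
```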
Common Mistakes
Encoding the full URL instead of just parameter values
If you pass https://example.com/path?q=hello to encodeURIComponent, the colon, slashes, question mark, and equals sign all get encoded, producing https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dhello, which no longer works as a URL. Encode parameter values, not full URLs.
Forgetting to encode parameter keys
Keys can also contain special characters if they come from user input. Encode both keys and values.
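Both of the mistakes above can be avoided by leaving the base URL alone and encoding each key and value separately. A minimal Python sketch; the key "sort by" and its value are hypothetical stand-ins for user-supplied input.

```python
from urllib.parse import quote, urlencode

base = "https://example.com/path"

# Mistake: encoding the whole URL destroys its structure.
broken = quote(base + "?q=hello", safe="")

# Safer: urlencode encodes every key and every value individually.
query = urlencode({"q": "hello", "sort by": "date&time"}, quote_via=quote)
url = f"{base}?{query}"

print(broken)
print(url)
```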
Double encoding
If you receive a URL encoded string and encode it again, every % becomes %25. The decoded result will contain literal %XX sequences instead of the intended characters. Decode first, then reencode if the structure has changed.
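Double encoding is easy to reproduce. In this sketch a value is encoded twice, then decoded once, which is what a receiver that expects single encoding would do.

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")   # space -> %20
twice = quote(once, safe="")           # the % itself -> %25

decoded_once = unquote(twice)          # still contains a literal %20
decoded_twice = unquote(decoded_once)  # only now is the space back

print(once, twice, decoded_once, decoded_twice)
```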
Assuming the server will handle malformed input
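Some servers are lenient and make a best guess at input that was never encoded properly, for example by passing through a stray % that is not followed by two hexadecimal digits. Others are strict and reject it. Do not rely on server side leniency. Encode correctly on the client.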
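Python's own `unquote` is an example of a lenient decoder: it passes malformed percent sequences through unchanged. The sketch below wraps it in a stricter check. The name `strict_unquote` and the choice to raise `ValueError` are assumptions made for illustration, not part of any standard API.

```python
import re
from urllib.parse import unquote

# A % that is not followed by exactly two hex digits is malformed.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def strict_unquote(value: str) -> str:
    """Decode value, rejecting malformed percent sequences instead of guessing."""
    if _BAD_ESCAPE.search(value):
        raise ValueError(f"malformed percent sequence in {value!r}")
    return unquote(value)


print(unquote("100%"))            # lenient: the stray % is passed through
print(strict_unquote("100%25"))   # a correctly encoded % decodes cleanly
```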