RFC 4514 specifies the string representation of Distinguished Names (DNs) in the Lightweight Directory Access Protocol (LDAP). It defines the rules for escaping special characters so that LDAP servers and clients interpret DNs correctly and without ambiguity or parsing errors. Here are the key reasons why special characters must be escaped:
1. Avoiding ambiguity:
Certain characters have special meanings in DNs. For example:
- Commas (`,`) separate different components of a DN.
- Equal signs (`=`) separate attribute types from attribute values.
- Plus signs (`+`) separate multiple attribute-value pairs within a Relative Distinguished Name (RDN).
Escaping these characters prevents them from being misinterpreted as delimiters, ensuring that the actual content is correctly parsed.
2. Ensuring valid syntax:
LDAP DNs follow a strict syntax. Special characters within attribute values need to be escaped to adhere to this syntax. This ensures that the DNs are valid and can be processed correctly by LDAP servers.
3. Preventing injection attacks:
Escaping special characters helps prevent LDAP injection attacks, where an attacker supplies input containing special characters to alter the structure of a DN or query. Properly escaped input is treated as literal data rather than as part of the DN syntax. Note that LDAP search filters have their own escaping rules, defined in RFC 4515.
4. Handling non-printable and UTF-8 characters:
Under RFC 4514, the NUL character must be escaped as `\00`. Other non-printable characters and characters outside the ASCII range, such as those with umlauts, may appear directly as UTF-8 or be hex-escaped for compatibility with systems that handle only ASCII. In the escaped form, each UTF-8 byte of the character is written as a backslash (`\`) followed by two hexadecimal digits.
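The ambiguity described in point 1 can be seen by splitting a DN into its RDNs. The following is a minimal sketch, not a full RFC 4514 parser: the helper name `split_rdns` and the sample DN (with its `OU` and `DC` components) are assumptions made for illustration.

```python
def split_rdns(dn: str) -> list[str]:
    """Split a DN string on commas that are not escaped with a backslash."""
    parts, current = [], []
    i = 0
    while i < len(dn):
        ch = dn[i]
        if ch == "\\" and i + 1 < len(dn):
            # Keep the backslash together with the character it escapes.
            current.append(dn[i:i + 2])
            i += 2
        elif ch == ",":
            # An unescaped comma ends the current RDN.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


# Hypothetical DN: the comma inside the CN value is escaped.
dn = r"CN=Smith\, John,OU=People,DC=example,DC=com"
print(dn.split(","))   # naive split cuts the CN value in two
print(split_rdns(dn))  # escape-aware split keeps the CN value intact
```

The naive `str.split` yields five pieces because it treats the escaped comma as a delimiter, while the escape-aware version yields the four intended RDNs.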
Examples of escaping characters:
- Comma (`,`):
- Original: `CN=Smith, John`
- Escaped: `CN=Smith\, John`
- Plus sign (`+`):
- Original: `CN=Smith+John`
- Escaped: `CN=Smith\+John`
- Quotation mark (`"`):
- Original: `CN="John Smith"`
- Escaped: `CN=\"John Smith\"`
- Backslash (`\`):
- Original: `CN=John\Smith`
- Escaped: `CN=John\\Smith`
- UTF-8 Characters:
- Original: `CN=Jörg Müller`
- Escaped: `CN=J\C3\B6rg M\C3\BCller`
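To check how the escaped forms above map back to the original values, a reader can reverse the escaping. The sketch below works on a single attribute value (the part after `CN=`), not on a whole DN; the function name `unescape_dn_value` is an assumption for illustration, and the code does not validate every rule of RFC 4514.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def unescape_dn_value(escaped: str) -> str:
    """Turn an escaped attribute value back into its literal text."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        pair = escaped[i + 1:i + 3]
        if len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            # Backslash plus two hex digits encodes one raw byte.
            out.append(int(pair, 16))
            i += 3
        elif i + 1 < len(escaped):
            # Backslash plus a special character stands for that character.
            out += escaped[i + 1].encode("utf-8")
            i += 2
        else:
            raise ValueError("dangling backslash at end of value")
    # Hex-escaped bytes are collected first, then decoded together as UTF-8.
    return out.decode("utf-8")


print(unescape_dn_value(r"Smith\, John"))
print(unescape_dn_value(r"J\C3\B6rg M\C3\BCller"))
```

Collecting bytes before decoding matters for multi-byte characters: `\C3` and `\B6` are meaningful only as a pair.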
LDAP subject names often need to include special characters in attribute values. These characters must be escaped according to the rules defined by RFC 4514. Below is a list of the characters that RFC 4514 treats as special in attribute values, along with how they are escaped:
1. Comma (`,`) - `\,`
2. Plus (`+`) - `\+`
3. Quote (`"`) - `\"`
4. Backslash (`\`) - `\\`
5. Less than (`<`) - `\<`
6. Greater than (`>`) - `\>`
7. Semicolon (`;`) - `\;`
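8. Equals (`=`) - `\=` (RFC 4514 permits this escape but does not require it inside an attribute value)
9. Leading or trailing space - `\ ` (a space elsewhere in the value needs no escaping); a leading number sign (`#`) is likewise escaped as `\#`
10. Hexadecimal escapes: The NUL character must be written as `\00`. Any other character may be written as a backslash followed by two hexadecimal digits for each of its UTF-8 bytes, a form commonly used for control characters. For example: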
- Newline: `\0A`
- Carriage Return: `\0D`
Examples of escaped characters in Subject Names:
- Comma: `CN=Smith\, John`
- Plus: `CN=Smith\+John`
- Quote: `CN=\"John Smith\"`
- Backslash: `CN=John\\Smith`
- Less than: `CN=John\<Smith`
- Greater than: `CN=John\>Smith`
- Semicolon: `CN=John\;Smith`
- Equals: `CN=John\=Smith`
- Leading space: `CN=\ John`
- Trailing space: `CN=John\ `
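The escaping rules listed above can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: the name `escape_dn_value` is hypothetical, the function expects only the attribute value (not the `CN=` prefix), it escapes `=` to mirror the list above even though RFC 4514 does not require that, and it leaves non-ASCII characters as plain UTF-8. It also escapes a leading `#` and hex-escapes control characters including NUL.

```python
# Characters that are escaped with a backslash wherever they occur.
ALWAYS_ESCAPED = set('"+,;<>\\')


def escape_dn_value(value: str) -> str:
    """Escape one attribute value for use in a DN string."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in ALWAYS_ESCAPED:
            out.append("\\" + ch)
        elif ch == "=":
            # Optional under RFC 4514; escaped here to match the list above.
            out.append("\\=")
        elif i == 0 and ch in " #":
            # Leading space or number sign.
            out.append("\\" + ch)
        elif i == last and ch == " ":
            # Trailing space.
            out.append("\\ ")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Control characters (including NUL) as two-digit hex escapes.
            out.append("\\%02X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


for raw in ["Smith, John", "John<Smith", " John", "John ", "line1\nline2"]:
    print("CN=" + escape_dn_value(raw))
```

In production code, an established LDAP library's escaping helper is usually a safer choice than a hand-written function like this one.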
Other special characters in UTF-8:
In addition to these special characters, LDAP Subject Names may include a variety of UTF-8 characters, such as accented letters and other international characters. Here are some examples:
1. Accented characters:
- `á` (U+00E1)
- `é` (U+00E9)
- `í` (U+00ED)
- `ó` (U+00F3)
- `ú` (U+00FA)
- `ñ` (U+00F1)
- `ç` (U+00E7)
- `ø` (U+00F8)
- `å` (U+00E5)
- `ü` (U+00FC)
2. Special characters:
- `ß` (U+00DF)
- `Æ` (U+00C6)
- `æ` (U+00E6)
- `Ø` (U+00D8)
- `Å` (U+00C5)
- `Þ` (U+00DE)
- `Ð` (U+00D0)
Examples of International Characters in Subject Names:
- `CN=Jörg Müller`
- `CN=François Dupont`
- `CN=José García`
- `CN=Søren Kierkegaard`
- `CN=Åsa Larsson`
When using these characters in LDAP subject names, make sure that the LDAP server and clients are configured to handle UTF-8, so that the characters are interpreted and displayed correctly.
Practical examples:
To write the Common Name (CN) "Søren Kierkegaard" in ASCII-only form for use in an LDAP subject name, each non-ASCII character is replaced by its hex-escaped UTF-8 bytes.
The steps to encode "Søren Kierkegaard" are:
1. Convert the character "ø" to UTF-8 bytes:
- The character "ø" (U+00F8) in UTF-8 is C3 B8.
2. Represent each UTF-8 byte as two hexadecimal digits prefixed by a backslash:
- "ø" becomes `\C3\B8`.
3. Construct the LDAP subject name with the encoded character:
- `CN=Søren Kierkegaard` becomes `CN=S\C3\B8ren Kierkegaard`.
The steps to encode "Jörg Müller" are:
1. Convert the characters to UTF-8 bytes:
- The character "ö" (U+00F6) in UTF-8 is C3 B6.
- The character "ü" (U+00FC) in UTF-8 is C3 BC.
2. Represent each UTF-8 byte as two hexadecimal digits prefixed by a backslash:
- "ö" becomes `\C3\B6`.
- "ü" becomes `\C3\BC`.
3. Construct the LDAP subject name with the encoded characters:
- `CN=Jörg Müller` becomes `CN=J\C3\B6rg M\C3\BCller`.
This ensures that special characters and UTF-8 characters are properly included and interpreted in LDAP subject names.
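The three steps above can be sketched in a few lines of Python. The function name `hex_escape_non_ascii` is an assumption for illustration; it handles only the non-ASCII part of the job and does not escape delimiters such as commas.

```python
def hex_escape_non_ascii(value: str) -> str:
    """Replace each non-ASCII character with its hex-escaped UTF-8 bytes."""
    out = []
    for ch in value:
        if ord(ch) < 0x80:
            # ASCII characters are copied unchanged.
            out.append(ch)
        else:
            # Step 1: convert to UTF-8 bytes; step 2: backslash + two hex digits each.
            out.append("".join("\\%02X" % b for b in ch.encode("utf-8")))
    # Step 3: the pieces are joined back into the final value.
    return "".join(out)


print("CN=" + hex_escape_non_ascii("Søren Kierkegaard"))
print("CN=" + hex_escape_non_ascii("Jörg Müller"))
```

Characters outside the Latin-1 range work the same way; they simply produce three or four escaped bytes instead of two.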
Summary:
Escaping special characters in DNs, as specified by RFC 4514, is crucial for correct interpretation and processing by LDAP servers and clients. It avoids ambiguities, ensures valid syntax, helps prevent injection attacks, and handles non-printable and UTF-8 characters. With proper escaping, DNs are represented accurately and securely in LDAP directories.