diff --git a/doc/draft/draft-ietf-idn-uri-02.txt b/doc/draft/draft-ietf-idn-uri-02.txt
new file mode 100644
index 0000000000..aabaa9c541
--- /dev/null
+++ b/doc/draft/draft-ietf-idn-uri-02.txt
@@ -0,0 +1,449 @@
+
+
+Network Working Group                                          M. Duerst
+Internet-Draft                                       W3C/Keio University
+Expires: December 30, 2002                                  July 1, 2002
+
+
+                 Internationalized Domain Names in URIs
+                         draft-ietf-idn-uri-02
+
+Status of this Memo
+
+   This document is an Internet-Draft and is in full conformance with
+   all provisions of Section 10 of RFC2026.
+
+   Internet-Drafts are working documents of the Internet Engineering
+   Task Force (IETF), its areas, and its working groups. Note that
+   other groups may also distribute working documents as Internet-
+   Drafts.
+
+   Internet-Drafts are draft documents valid for a maximum of six months
+   and may be updated, replaced, or obsoleted by other documents at any
+   time. It is inappropriate to use Internet-Drafts as reference
+   material or to cite them other than as "work in progress."
+
+   The list of current Internet-Drafts can be accessed at
+   http://www.ietf.org/ietf/1id-abstracts.txt.
+
+   The list of Internet-Draft Shadow Directories can be accessed at
+   http://www.ietf.org/shadow.html.
+
+   This Internet-Draft will expire on December 30, 2002.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+Abstract
+
+   This document proposes to upgrade the definition of URIs (RFC 2396)
+   [RFC2396] to work consistently with internationalized domain names.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 1]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+Table of Contents
+
+   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
+   2.  URI syntax changes . . . . . . . . . . . . . . . . . . . . . .  3
+   3.  Security considerations  . . . . . . . . . . . . . . . . . . .  5
+   4.  Change Log . . . . . . . . . . . . . . . . . . . . . . . . . .
5
+   4.1 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02  .  5
+   4.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01  .  5
+       References . . . . . . . . . . . . . . . . . . . . . . . . . .  5
+       Author's Address . . . . . . . . . . . . . . . . . . . . . . .  7
+       Full Copyright Statement . . . . . . . . . . . . . . . . . . .  8
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 2]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+1.  Introduction
+
+   Internet domain names serve to identify hosts and services on the
+   Internet in a convenient way. The IETF IDN working group [IDNWG] has
+   been working on extending the character repertoire usable in domain
+   names beyond a subset of US-ASCII.
+
+   One of the most important places where domain names appear is in
+   Uniform Resource Identifiers (URIs, [RFC2396], as modified by
+   [RFC2732]). However, in the current definition of the generic URI
+   syntax, the restrictions on domain names are 'hard-coded'. In
+   Section 2, this document relaxes these restrictions by updating the
+   syntax, and defines how internationalized domain names are encoded in
+   URIs.
+
+   The syntax in this document has been chosen to further increase the
+   uniformity of URI syntax, which is a very important principle of
+   URIs.
+
+   In practice, escaped domain names should be used as rarely as
+   possible. Wherever possible, the actual characters in
+   Internationalized Domain Names should be preserved as long as
+   possible by using IRIs [IRI] rather than URIs, and only converting to
+   URIs and then to ACE-encoded [IDNA] domain names (or ideally directly
+   to ACE-encoding without even using URIs) when resolving the IRI.
+   Also, this document in no way excludes the use of ACE encoding
+   directly in a URI domain name part. ACE encoding may be used
+   directly in a URI domain name part if this is considered necessary
+   for interoperability.
+
+   Please note that even with the definition of URIs in [RFC2396], some
+   URIs can already contain host names with escaped characters. For
+   example, mailto:example@w%33.org is legal per [RFC2396] because the
+   mailto: URI scheme does not follow the generic syntax of [RFC2396].
+
+2.  URI syntax changes
+
+   The syntax of URIs [RFC2396] currently contains the following rules
+   relevant to domain names:
+
+      hostname      = *( domainlabel "." ) toplabel [ "." ]
+      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
+      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 3]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+   The latter two rules are changed as follows:
+
+      domainlabel   = anchar | anchar *( anchar | "-" ) anchar
+      toplabel      = achar | achar *( anchar | "-" ) anchar
+
+   and the following rules are added:
+
+      anchar        = alphanum | escaped
+      achar         = alpha | escaped
+
+   Characters outside the repertoire (alphanum) are encoded by first
+   encoding the characters in UTF-8 [RFC2279], resulting in a sequence
+   of octets, and then escaping these octets according to the rules
+   defined in [RFC2396].
+
+   Using UTF-8 ensures that this encoding interoperates with IRIs [IRI].
+   It is also aligned with the recommendations in [RFC2277] and
+   [RFC2718], and is consistent with the URN syntax [RFC2141] as well as
+   recent URL scheme definitions that define encodings of non-ASCII
+   characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
+   [RFC2384]).
+
+   The above syntax rules permit domain names that are permitted
+   neither as US-ASCII-only domain names nor as internationalized
+   domain names. However, such syntax should never be used, and will
+   always be rejected by resolvers. For US-ASCII-only domain names, the
+   syntax rules in [RFC2396] are relevant. For example,
+   http://www.w%33.org is legal, because the corresponding 'w3' is a
+   legal 'domainlabel' according to [RFC2396].
+   However, http://%2a.example.org is illegal because the corresponding
+   '*' is not a legal 'domainlabel' according to [RFC2396]. For domain
+   names containing non-ASCII characters, the legal domain names are
+   those for which the ToASCII operation ([IDNA], [Nameprep]; using the
+   unescaped UTF-8 values as input) is successful.
+
+   For consistency in comparison operations and for interoperability
+   with older software, the following should be noted: 1) US-ASCII
+   characters in domain names should not be escaped. 2) Because of the
+   principle of syntax uniformity for URIs, it is always more prudent to
+   take into account the possibility that US-ASCII characters are
+   escaped.
+
+   The work of the IDN WG includes some procedures for name preparation
+   [Nameprep]. Before encoding an internationalized domain name in a
+   URI, this preparation step SHOULD be applied. However, the URI
+   resolver MUST also apply any steps required as part of domain name
+   resolution by [IDNA].
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 4]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+3.  Security considerations
+
+   The security considerations of [RFC2396] and those applying to
+   internationalized domain names apply. There may be an increased
+   potential to smuggle escaped US-ASCII-based domain names across
+   firewalls, although because of the uniform syntax principle for URIs,
+   such a potential already exists.
+
+4.  Change Log
+
+4.1 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02
+
+      Moved change log to back
+
+      Changed to only change URIs; IRI syntax updated directly in IRI
+      draft.
+
+      Removed syntax restriction on %hh in the US-ASCII part, but made
+      clear that restrictions to domain names apply.
+
+      Made clear that escaped domain names in URIs should only be an
+      intermediate representation.
+
+      Gave example of mailto: as already allowing escaped host names.
+
+4.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
+
+      Changed requirement for URI/IRI resolvers from MUST to SHOULD
+
+      Changed IRI syntax slightly (ichar -> idchar, based on changes in
+      [IRI])
+
+      Various wording changes
+
+References
+
+   [IDNA]      Faltstrom, P., Hoffman, P. and A. Costello,
+               "Internationalizing Domain Names in Applications (IDNA)",
+               draft-ietf-idn-idna-09.txt (work in progress), May 2002.
+
+   [IDNWG]     "IETF Internationalized Domain Name (idn) Working Group".
+
+   [IRI]       Duerst, M. and M. Suignard, "Internationalized Resource
+               Identifiers (IRI)", draft-duerst-iri-01 (work in
+               progress), July 2002.
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 5]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+   [ISO10646]  International Organization for Standardization,
+               "Information Technology - Universal Multiple-Octet Coded
+               Character Set (UCS) - Part 1: Architecture and Basic
+               Multilingual Plane", ISO Standard 10646-1, October 2000.
+
+   [Nameprep]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+               Profile for Internationalized Domain Names",
+               draft-ietf-idn-nameprep-10.txt (work in progress), May
+               2002.
+
+   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
+               Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2141]   Moats, R., "URN Syntax", RFC 2141, May 1997.
+
+   [RFC2192]   Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.
+
+   [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
+               Languages", BCP 18, RFC 2277, January 1998.
+
+   [RFC2279]   Yergeau, F., "UTF-8, a transformation format of ISO
+               10646", RFC 2279, January 1998.
+
+   [RFC2384]   Gellens, R., "POP URL Scheme", RFC 2384, August 1998.
+
+   [RFC2396]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
+               Resource Identifiers (URI): Generic Syntax", RFC 2396,
+               August 1998.
+
+   [RFC2640]   Curtin, B., "Internationalization of the File Transfer
+               Protocol", RFC 2640, July 1999.
+
+   [RFC2718]   Masinter, L., Alvestrand, H., Zigmond, D. and R.
+               Petke, "Guidelines for new URL Schemes", RFC 2718,
+               November 1999.
+
+   [RFC2732]   Hinden, R., Carpenter, B. and L. Masinter, "Format for
+               Literal IPv6 Addresses in URL's", RFC 2732, December
+               1999.
+
+
+
+
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 6]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+Author's Address
+
+   Martin Duerst
+   W3C/Keio University
+   5322 Endo
+   Fujisawa 252-8520
+   Japan
+
+   Phone: +81 466 49 1170
+   Fax:   +81 466 49 1171
+   EMail: duerst@w3.org
+   URI:   http://www.w3.org/People/D%C3%BCrst/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 7]
+
+Internet-Draft                IDNs in URIs                     July 2002
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works. However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Duerst                  Expires December 30, 2002               [Page 8]
+
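Illustrative note (not part of the draft): the encoding in Section 2
(UTF-8 first, then %HH escaping of the resulting octets) and the
legality rule for escaped host names can be sketched in Python as
below. This is a minimal sketch under stated assumptions. The function
names escape_host, unescape_host, to_ace and is_legal_host are
hypothetical, and Python's built-in "idna" codec is assumed as a
stand-in for the ToASCII operation; that codec implements the later
RFC 3490 form of IDNA, not draft-ietf-idn-idna-09, so results may
differ from what the draft's references specified at the time.

```python
import re
from urllib.parse import unquote

# Hostname syntax of RFC 2396, applied to US-ASCII-only names.
_RFC2396_HOSTNAME = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)*"  # *( domainlabel "." )
    r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"            # toplabel
    r"\.?"                                              # [ "." ]
)


def escape_host(host: str) -> str:
    """Encode to UTF-8, then %HH-escape every non-ASCII octet.

    US-ASCII characters are left as they are, following the draft's
    note that they should not be escaped.
    """
    return "".join(
        chr(octet) if octet < 0x80 else "%{:02X}".format(octet)
        for octet in host.encode("utf-8")
    )


def unescape_host(escaped: str) -> str:
    """Undo %HH escaping and decode the octets strictly as UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


def to_ace(escaped: str) -> str:
    """Convert an escaped host name to ACE form.

    Assumption: the built-in "idna" codec stands in for ToASCII.
    """
    return unescape_host(escaped).encode("idna").decode("ascii")


def is_legal_host(escaped: str) -> bool:
    """Apply the legality rule of Section 2 to an escaped host name."""
    try:
        host = unescape_host(escaped)
    except UnicodeDecodeError:
        return False  # the escaped octets are not valid UTF-8
    if host.isascii():
        # US-ASCII-only names must match the RFC 2396 hostname syntax.
        return _RFC2396_HOSTNAME.fullmatch(host) is not None
    try:
        host.encode("idna")  # non-ASCII names: ToASCII must succeed
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    print(escape_host("www.d\u00fcrst.example"))
    print(to_ace("www.d%C3%BCrst.example"))
    print(is_legal_host("www.w%33.org"))
    print(is_legal_host("%2a.example.org"))
```

With these definitions, escape_host turns the label "dürst" into
"d%C3%BCrst", the same escaping that appears in the author's URI in
the Author's Address section. The two is_legal_host calls correspond
to the draft's own examples: www.w%33.org unescapes to a name that
matches the RFC 2396 syntax, while %2a.example.org unescapes to a
label containing '*' and is rejected.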