INTERNET-DRAFT                                          John C Klensin
                                                       21 October 2002
Expires April 2003


            National and Local Characters in DNS TLD Names
                     draft-klensin-idn-tld-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026 except that the
right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

In the context of work on internationalizing the Domain Name System
(DNS), there have been extensive discussions about "multilingual" or
"internationalized" top level domain names (TLDs), especially for
countries whose predominant language is not written in a Roman-based
script. This document reviews some of the motivations for such
domains and the constraints that the DNS imposes. It then suggests
an alternative, local translation, that may solve a superset of the
problem while avoiding protocol changes, serious deployment delays,
and other difficulties.

Table of Contents

1. Introduction
1.1 Background on the "Multilingual Name" Problem
1.2 Domain Name System Constraints
1.3 Internationalization and Localization
2. Client-side solutions
2.1 IDNA and the client
2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
3.2 Unification of country code domains
3.3 User understanding of local and global references
3.4 Limits on TLD propagation
4. Security Considerations
5. References
6. Acknowledgements
7. Author's Address

1. Introduction

1.1 Background on the "Multilingual Name" Problem

People who share a language prefer to communicate in it, using whatever
characters are normally used to write that language, rather than in some
"foreign" one. There have been standards for using mutually-agreed
characters and languages in electronic mail message bodies and selected
headers since the introduction of MIME in 1992 [MIME], and the Web has
permitted multilingual text since its inception. However, since domain
names are exposed to users in email addresses and URLs, and in
corresponding arrangements in other protocols, demand rapidly arose to
permit domain names in applications that used characters other than
those of the very restrictive, ASCII-subset, "LDH" conventions [LDH].
The effort to do this rapidly became known as "multilingual domain
names", although that is a misnomer, since the DNS deals only with
characters and identifier strings, and not, except by accident, with
what people usually think of as "names". And there has been little
interest in what would actually be a "multilingual name" (i.e., a name
that contains components from more than one language); the interest has
been only in the use of strings conforming to different languages in the
context of the DNS.

1.1.1 Approaches to the requirement

If the requirement is seen, not as "modifying the DNS", but as
"providing users with access to the DNS from a variety of languages and
character sets", three sets of proposals have emerged in the IETF and
elsewhere. They are:

(1) Perform processing in client software that recodes a user-visible
string into an ASCII-compatible form that can safely be passed
through the DNS protocols and stored in the DNS. This is the
approach used, for example, in the IETF's "IDNA" protocol [IDNA].

(2) Modify the DNS to be more hospitable to non-ASCII names and
strings. There have been a variety of proposals to do this in almost
as many ways, some of which have been implemented on a proprietary
basis by various vendors. None of them have gained acceptance in the
IETF community, primarily because they would take a long time to
deploy and would leave many problems unsolved.

(3) Move the problem out of the DNS entirely, relying instead on a
"directory" or "presentation" layer to handle internationalization.
The rationale for this approach is discussed in [DNSROLE].

This document proposes a fourth approach, applicable to the top level
domains (TLDs) only (see section 1.2.1 for a discussion of the special
issues that make TLDs problematic). That approach could be used as an
alternative or supplement to the strategies summarized above.

1.1.2 Writing the name of one's country in its own characters

An early focus of the "multilingual domain name" efforts was expressed
in statements such as "users in my country, in which ASCII is rarely
used, should be able to write an entire domain name in their own
character set." In particular, since all top-level domain names, at
present, follow the LDH rules, the somewhat more restrictive naming
rules discussed in [STD3], and the coding conventions specified in
[RFC1591], all fully-qualified DNS names were effectively required to
contain at least one ASCII label (the TLD name), and that was considered
inappropriate. One should, instead, be able to write the name of the
ccTLD for China in Chinese, the name of the ccTLD for Saudi Arabia in
Arabic, and so on.

1.1.3 Countries with multiple languages and countries with multiple names

From a user interface standpoint, writing ccTLD names in local
characters is a problem. As discussed in section 1.2.2, the DNS itself
does not easily permit a domain to be referred to by more than one name
(or spelling or translation of a name). Countries with more than one
official language would require that the country name be represented in
each of those languages. And, just as it is important that a user in
China be able to represent the name of the Chinese ccTLD in Chinese
characters, she should be able to access a Chinese-language site in
France using Chinese characters, requiring that she be able to write the
name of the French ccTLD in those characters rather than in a form based
on a Roman character set.

1.2 Domain Name System Constraints

1.2.1 Administrative hierarchy

The domain name system is designed around the idea of an "administrative
hierarchy", with the entity responsible for a given node of the
hierarchy responsible for policies applicable to its subhierarchies (cf.
[STD13]). The model works quite well for the domain and subdomains of a
particular enterprise, where the hierarchy can be organized to match the
organizational structure, there are established ways to set policies, and
there are, at least presumably, shared assumptions about overall goals
and objectives among all registrants in the domain. It is more
problematic when a domain is shared by unrelated entities that lack
common policy assumptions: it is difficult to reach agreement on rules
that should apply to all of them. That situation always prevails for
the labels registered in a TLD (second-level names) except in those TLDs
for which the second level is structural (e.g., the .CO, .AC, .GOV
conventions in many ccTLDs), in which case it exists for the labels
within that structural level.

TLDs may, but need not, have consistent registration policies for those
second (or third) level names. Countries (or ccTLD administrators) have
often adopted rules about what entities may register in those ccTLDs,
and the forms the names may take. RFC 1591 outlined registration norms
for most of the gTLDs, even though those norms have been largely ignored
in recent years. And some recent "sponsored" domains are based on quite
specific rules about appropriate registrations. Homogeneous
registration rules for the root are, by contrast, impossible: almost by
definition, the subdomains registered in it are diverse and no single
policy applying to all root subdomains (TLDs) is feasible.

1.2.2 Aliases

In an environment different from the DNS, a rational way to permit
assigning local-language names to a country code (or other) domain would
be to set up an alias for the name, or to use some sort of "see instead"
reference. But the DNS does not have quite the right facilities for
either. Instead, it supports a "CNAME" record, whose label can refer
only to a particular label and not to a subtree. For example, if A.B.C
is a fully-qualified name, then a CNAME reference from X to A would make
X.B.C appear to have the same values as A.B.C. However, a CNAME
reference from Y to C would not make A.B.Y referenceable (or even
defined) at all. A second record type, DNAME [RFC2672], can provide an
alias for a portion of the tree. But it is problematic technically, and
its use is strongly discouraged except for transition uses from one
domain to another.
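
The difference between the two record types can be shown with a toy model. The Python sketch below is not a resolver and handles only name rewriting. It does no case folding and has no record types or zone boundaries. The function `resolve` and its table arguments are hypothetical. The sketch reuses the names from the example above and the documentation address 192.0.2.1. A CNAME matches only one exact owner name. A DNAME rewrites every name strictly below its owner.

```python
from typing import Dict, Optional


def resolve(name: str,
            records: Dict[str, str],
            cnames: Dict[str, str],
            dnames: Dict[str, str],
            depth: int = 0) -> Optional[str]:
    """Follow toy CNAME/DNAME aliases until data is found, else None."""
    if depth > 8:                    # crude guard against alias loops
        return None
    if name in records:
        return records[name]
    if name in cnames:               # CNAME: exact owner name only
        return resolve(cnames[name], records, cnames, dnames, depth + 1)
    for owner, target in dnames.items():
        suffix = "." + owner
        if name.endswith(suffix):    # DNAME: names strictly below the owner
            prefix = name[: -len(suffix)]
            return resolve(prefix + "." + target, records, cnames, dnames,
                           depth + 1)
    return None


records = {"A.B.C": "192.0.2.1"}

# CNAME from X to A: X.B.C now answers with the data of A.B.C.
print(resolve("X.B.C", records, cnames={"X.B.C": "A.B.C"}, dnames={}))  # 192.0.2.1
# CNAME from Y to C: only the single name Y is aliased; A.B.Y is undefined.
print(resolve("A.B.Y", records, cnames={"Y": "C"}, dnames={}))          # None
# DNAME from Y to C: the whole subtree below Y maps onto the subtree below C.
print(resolve("A.B.Y", records, cnames={}, dnames={"Y": "C"}))          # 192.0.2.1
```

In this model, aliasing a whole TLD subtree needs the DNAME-style suffix rewrite. A CNAME-style exact match never reaches names below the aliased one.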

1.3 Internationalization and Localization

It has often been observed that while many people talk about
"internationalization" (a term we typically use for making something
globally accessible while incorporating a broad-range "universal"
character set and conventions appropriate to all languages), they often
really mean, and want, "localization" (making things work well in a
particular locality, or well, but potentially differently, for a broad
range of localities). Anything that actually involves the DNS must be
global and hence internationalized, since the DNS cannot meaningfully
support different responses based, e.g., on the location of the user
making a query. While the DNS cannot support localization internally,
many of the features discussed earlier in this section are much more
easily thought about in local terms (whether localized to a geographical
area, users of a language, or using some other criteria) than in global
ones.

2. Client-side solutions

Traditionally, the IETF has avoided becoming involved in standardization
for actions that take place strictly on individual hosts on the network,
assuming that it should confine itself to behavior that is observable
"on the wire", i.e., in protocols between network hosts. Exceptions to
this general principle have been made when different clients were
required to utilize data or interpret values in compatible ways to
preserve interoperability: the standards for email and web body formats,
and IDNA itself, are examples of these exceptions. Regardless of what
is required to be standardized, it is almost never required, and often
unwise, that a user interface, by default, present on-the-wire formats
to the user. However, in most cases when the presentation format and
the wire format differ, the client program must either ensure that the
wire format can be reconstructed from user input, or keep the wire
format, while hidden, bound to the presentation mechanism so that it can
be reconstructed. And, while it is rarely a goal in itself, it is often
necessary that the user be at least vaguely aware that the wire ("real")
format is different from the presentation one and that the wire format
be available for debugging.
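
A client could keep the wire form bound to what the user sees by storing only the wire form. It would then derive the display form from it on demand. The sketch below assumes Python's built-in `idna` codec as the transformation. The class `BoundName` and its methods are hypothetical names, not part of any standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundName:
    """A name as presented to the user, bound to the wire form it came from.

    Only the wire form is stored; the display form is always derived
    from it, so the two cannot drift apart.
    """
    wire: str  # ASCII form exchanged with the DNS

    @property
    def display(self) -> str:
        # Presentation form, derived with Python's built-in IDNA codec.
        return self.wire.encode("ascii").decode("idna")

    @classmethod
    def from_user(cls, typed: str) -> "BoundName":
        # Reconstruct the wire form from what the user typed.
        return cls(wire=typed.encode("idna").decode("ascii"))

    def debug(self) -> str:
        # Expose the "real" form for troubleshooting.
        return f"{self.display} [wire: {self.wire}]"


name = BoundName.from_user("Bücher.example")
print(name.display)  # bücher.example
print(name.debug())  # bücher.example [wire: xn--bcher-kva.example]
```

The displayed form comes back in lower case because the codec's Nameprep step case-folds the input. This is one small way the presentation and wire forms can differ from what the user typed.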

2.1 IDNA and the client

As mentioned above, IDNA itself is entirely a client-side protocol. It
works by providing labels to the DNS in a special format (so-called
"ACE", for ASCII-Compatible Encoding). When labels in that format are
encountered, they are transformed, by the client, back into
internationalized (normally Unicode) characters. In the context of this
document, the important observation about IDNA is that any application
program that supports it is already doing considerable transformation
work on the client; it is not simply presenting the on-the-wire formats
to the user.
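
Python's built-in `idna` codec shows this client-side transformation. The codec implements the version of IDNA later published as RFC 3490 (IDNA2003), including Nameprep. Details such as the `xn--` ACE prefix therefore follow that RFC rather than the draft cited in [IDNA]. The helper names `to_ace` and `from_ace` are hypothetical.

```python
def to_ace(name: str) -> str:
    """Unicode domain name typed by a user -> ACE form passed to the DNS."""
    return name.encode("idna").decode("ascii")


def from_ace(name: str) -> str:
    """ACE domain name returned by the DNS -> Unicode form for display."""
    return name.encode("ascii").decode("idna")


wire = to_ace("bücher.example")
print(wire)                          # xn--bcher-kva.example
print(from_ace(wire))                # bücher.example
# Each non-ASCII label becomes "xn--" followed by its Punycode encoding;
# ASCII-only labels such as "example" pass through unchanged.
print("bücher".encode("punycode"))   # b'bcher-kva'
```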

2.2 Local translation tables for TLD names

We suggest that, in addition to maintaining the code and tables required
to support IDNA, clients may want to maintain a table that contains a
list of TLDs and that maps between them and locally-desirable names.
For ccTLDs, these might be the names (or locally-standard abbreviations)
by which the relevant countries are known locally (whether in ASCII
characters or others). With some care on the part of the application
designer (e.g., to ensure that local forms do not conflict with the
actual TLD names), a particular TLD name input from the user could be
either in local or standard form without special tagging or problems.
When DNS names are received by these client programs, the TLD labels
would be mapped to local form before IDNA is applied to the rest of the
name; when names are received from users, local TLD names would be
mapped to the global ones before being passed into IDNA or for other DNS
processing.
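
The two mapping directions described above can be sketched in Python. Everything here is an assumption for illustration:

- the table contents, which are Chinese names for the ccTLDs of China and France, echoing the example in section 1.1.3;
- the stand-in TLD list `KNOWN_TLDS`;
- the function names.

The sketch also assumes that names use ASCII full stops and carry no trailing root dot. It relies on Python's built-in `idna` codec for the rest of the name.

```python
# Hypothetical locale table: local TLD name -> global TLD name.
LOCAL_TO_GLOBAL = {"中国": "cn", "法国": "fr"}
GLOBAL_TO_LOCAL = {g: l for l, g in LOCAL_TO_GLOBAL.items()}
# Hypothetical stand-in for the real TLD list, used for the conflict check.
KNOWN_TLDS = {"cn", "fr", "com", "org", "net"}


def check_table() -> None:
    """Reject local forms that collide with real TLD names (raw or ACE)."""
    for local in LOCAL_TO_GLOBAL:
        ace = local.encode("idna").decode("ascii")
        if local.lower() in KNOWN_TLDS or ace.lower() in KNOWN_TLDS:
            raise ValueError(f"local name {local!r} collides with a real TLD")


def user_to_wire(typed: str) -> str:
    """User input -> DNS: map a local TLD to the global one, then apply IDNA."""
    labels = typed.split(".")
    labels[-1] = LOCAL_TO_GLOBAL.get(labels[-1], labels[-1])
    return ".".join(labels).encode("idna").decode("ascii")


def wire_to_user(wire: str) -> str:
    """DNS -> display: map the TLD to local form, decode the rest with IDNA."""
    *rest, tld = wire.split(".")
    local_tld = GLOBAL_TO_LOCAL.get(tld.lower(), tld)
    shown = [label.encode("ascii").decode("idna") for label in rest]
    return ".".join(shown + [local_tld])


check_table()
print(user_to_wire("bücher.法国"))       # xn--bcher-kva.fr
print(user_to_wire("bücher.fr"))         # xn--bcher-kva.fr
print(wire_to_user("xn--bcher-kva.fr"))  # bücher.法国
```

The table is consulted only at the edges, when names enter or leave the client. The DNS itself therefore sees only `xn--bcher-kva.fr`. The user may type either the local or the standard TLD, with no special tagging.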

3. Advantages and disadvantages of local translation

3.1 Every TLD in the local language and character set

The notion of a top-level domain whose name matches, e.g., the name that
is used for a country in that country or the name of a language in that
language is, as mentioned above, immediately appealing. But most of the
reasons for it argue equally strongly for other TLDs being accessible
from that language. A user in Korea who can access the national ccTLD
in the Korean language and character set has every reason to expect that
both generic top level domains and domains associated with other
countries would be similarly accessible, especially if the second-level
domains bear Korean names. A user in Spain or Portugal, or in Latin
America, would presumably have similar expectations, but would expect to
use Spanish or Portuguese names, not Korean ones.

That level of local optimization is not realistic (some would argue not
possible) with the DNS, since it would ultimately require that every top
level domain be replicated for each of the world's languages. That
replication process would involve not just the top level domain itself:
in principle, all of its subtrees would need to be completely replicated
as well (or at least all of the subtrees for which the language
associated with a given replicant was relevant). The administrative
hierarchy characteristics of the DNS (see section 1.2.1) turn the
replication process into an administrative nightmare: every
administrator of a second-level domain in the world would be forced to
maintain dozens, probably hundreds, of similar zone files for the
replicates of the domain. Even if only the zones relevant to a
particular country or language were replicated, the administrative and
tracking problems to bind these to the appropriate top-level domain and
keep all of the replicas synchronized would be extremely difficult at
best. And many administrators of third- and fourth-level domains, and
beyond, would be faced with similar problems.

By contrast, dealing with the names of TLDs as a localization problem,
using local translation, is fairly simple. Each function represented by
a TLD (a country, generic registrations, or purpose-specific
registrations) could be represented in the local language and
character set as needed. And, for countries with many languages, or
users living in, working in, or visiting countries where their language
was not dominant, "local" could be defined in terms of the needs or
wishes of each particular user.

3.2 Unification of country code domains

It follows from some of the comments above that, while there appears to
be some immediate appeal in having (at least) two domains for each
country, one using the ISO 3166-1 code and another one using a name
based on the national name in the national language, such a situation
would create considerable problems for registrants in the multiple
domains. For registrants maintaining enterprise or organizational
subdomains, ease of administration in a single family of zone files will
usually make a registration in a single top-level domain preferable to
replicated sets of them, at least as long as their functional
requirements (such as local-language access) are met by the unified
structure.

Of course, having replicated domains might be popular with registries
and registrars, since replication would almost inevitably increase the
total number of domains to be registered.

3.3 User understanding of local and global references

While the IDNA tables (actually Nameprep and Stringprep; see the IDNA
specification) must be identical globally for IDNA to work reliably, the
tables for mapping between local names and TLD names could be locally
determined, and differ from one locale to another, as long as users
understood that international interchange of names required using the
standard forms. That understanding could be assisted by software. It
is likely that, at least for the foreseeable future, DNS names being
passed among users in different countries, or using different languages,
will be forced to be in ACE form to guarantee compatibility in any
event, so the marginal knowledge or effort needed to put TLD names into
standard form and transmit them that way would be very small.
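
Only the standard form is interchanged, so two locales may map the same TLD to entirely different local names. In this sketch the Spanish and Korean tables and the helper names are hypothetical. It shows that both users produce the same standard form while each sees a different local form.

```python
# Hypothetical per-locale tables: local TLD name -> global TLD name.
TABLES = {
    "es": {"francia": "fr"},   # Spanish-speaking users
    "ko": {"프랑스": "fr"},     # Korean-speaking users
}


def to_standard(typed: str, locale: str) -> str:
    """Map a locally typed name to the standard (ACE) form for interchange."""
    labels = typed.split(".")
    labels[-1] = TABLES[locale].get(labels[-1].lower(), labels[-1])
    return ".".join(labels).encode("idna").decode("ascii")


def to_local(standard: str, locale: str) -> str:
    """Render a standard-form name using the given locale's table."""
    reverse = {g: l for l, g in TABLES[locale].items()}
    *rest, tld = standard.split(".")
    shown = [label.encode("ascii").decode("idna") for label in rest]
    return ".".join(shown + [reverse.get(tld.lower(), tld)])


wire = to_standard("example.Francia", "es")
print(wire)                                        # example.fr
print(to_standard("example.프랑스", "ko") == wire)  # True
print(to_local(wire, "ko"))                        # example.프랑스
```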

3.4 Limits on TLD propagation

The concept of using local translation does have one side-effect, which
some portions of the Internet community might consider undesirable.
The size and complexity of translation tables, and maintaining those
tables, will be, to a considerable extent, a function of the number of
top-level domains, the frequency with which new domains are added, and
the number of domains that are added at a time. A country or other
locale that wished to maintain a full set of translations (i.e., so that
every TLD had a representation in the local language) would presumably
find setting up a table for the current collection of a few hundred
domains to be a task that would take some days. If the number of TLDs
were relatively stable, with a relatively small number being added at
infrequent intervals, the updates could probably be dealt with on an ad
hoc basis. But, if large numbers of domains were added frequently, or
if the total number of TLDs became very large, maintaining the table
might require dedicated staff. Worse, updating the tables stored on
client machines might require update and synchronization protocols and
all of the related complexities.
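
Keeping a table current amounts to comparing it against the current list of TLDs. This is a minimal sketch with hypothetical table contents. It assumes the TLD list comes from some authoritative source; here that list is simply hard-coded. Every TLD added to the root shows up as a missing translation, and every TLD removed leaves a stale entry behind.

```python
from typing import Dict, Set, Tuple


def table_gaps(root_tlds: Set[str],
               table: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare a local -> global TLD table with the current set of TLDs.

    Returns (missing, stale): TLDs with no local name yet, and table
    entries that point at TLDs no longer in the root.
    """
    translated = set(table.values())
    missing = root_tlds - translated
    stale = translated - root_tlds
    return missing, stale


# Hypothetical data: a small local table and a hard-coded TLD snapshot.
table = {"中国": "cn", "法国": "fr"}
root_tlds = {"cn", "fr", "com", "org", "info"}

missing, stale = table_gaps(root_tlds, table)
print(sorted(missing))  # ['com', 'info', 'org']
print(sorted(stale))    # []
```

Each batch of new TLDs lands in `missing` and has to be translated by hand. This is the growth effect the paragraph above describes. Distributing the corrected table to client machines is a separate problem that the sketch does not address.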

4. Security Considerations

IDNA provides a client-based mechanism for presenting Unicode names in
applications while passing only ASCII-based names on the wire. As such,
it constitutes a major step along the path of introducing a client-based
presentation layer into the Internet. Client-based presentation layer
transformations introduce risks from variant tables that can change
meaning without external protection. For example, if a mapping table
normally maps A onto C and that table is altered by an attacker so that
A maps onto D instead, much mischief can be committed. On the other
hand, these are not the usual sort of network attacks: they may be
thought of as falling into the "users can always cause harm to
themselves" category. The local translation model outlined here does
not significantly increase the risks over those associated with IDNA,
but may provide some new avenues for exploiting them.
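
The A-onto-C versus A-onto-D example can be made concrete with a local translation table. The table contents and function names below are hypothetical. The `.example` TLD plays the role of the attacker's destination D.

The draft does not specify any protection mechanism. The digest comparison here is only one illustration of what "external protection" might mean. It helps only if the pinned digest is stored where an attacker cannot also change it.

```python
import hashlib
import json
from typing import Dict


def to_wire(typed: str, table: Dict[str, str]) -> str:
    """Map a local TLD through the table, then apply IDNA."""
    labels = typed.split(".")
    labels[-1] = table.get(labels[-1], labels[-1])
    return ".".join(labels).encode("idna").decode("ascii")


def table_digest(table: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON rendering of the table."""
    canonical = json.dumps(table, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"法国": "fr"}          # A maps onto C
tampered = {"法国": "example"}     # attacker's edit: A now maps onto D
pinned = table_digest(original)    # recorded when the table was installed

print(to_wire("bank.法国", original))    # bank.fr
print(to_wire("bank.法国", tampered))    # bank.example
print(table_digest(tampered) == pinned)  # False
```

The user types the same string in both cases. Only the table decides which name is actually looked up.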

Both this approach and IDNA rely on having updated programs present
information to the user in a very different form than the one in which
it is transmitted on the wire. Unless the internal (wire) form is
always used in interchange, there are possibilities for ambiguity and
confusion about references.

5. References

[DNSROLE] Klensin, J.C., "Role of the Domain Name System", work in
progress (draft-klensin-dns-role-04.txt).

[IDNA] Faltstrom, P., Hoffman, P. and A. M. Costello,
"Internationalizing Domain Names in Applications (IDNA)", work in
progress (draft-ietf-idn-idna-13.txt).

[LDH] STD13 and comments.

[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", RFC 1341, June 1992. Updated and replaced
by Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996. Also, Moore, K., "Representation of
Non-ASCII Text in Internet Message Headers", RFC 1342, June 1992.
Updated and replaced by Moore, K., "MIME (Multipurpose Internet
Mail Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.

[RFC1591] Postel, J., "Domain Name System Structure and Delegation",
RFC 1591, March 1994.

[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC 2672,
August 1999.

[STD3] Braden, R., Ed., "Requirements for Internet Hosts - Application
and Support", RFC 1123, October 1989.

[STD13] Mockapetris, P.V., "Domain names - concepts and facilities",
RFC 1034, and "Domain names - implementation and specification",
RFC 1035, November 1987.

6. Acknowledgements

This document was inspired by a number of conversations in ICANN, IETF,
MINC, and private contexts about the future evolution and
internationalization of top level domains. Discussions within, and
about, the ICANN IDN Committee have been particularly helpful, although
several of the members of that committee may be surprised about where
those discussions led.

7. Author's Address

John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140 USA
email: john+ietf@jck.com