mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-30 05:57:52 +00:00

remove expired/drafts

Mark Andrews 2004-03-10 00:39:48 +00:00
parent a80cc8dfd9
commit d38cad0add
20 changed files with 0 additions and 23568 deletions


@ -1,767 +0,0 @@
Internet Engineering Task Force Jun-ichiro itojun Hagino
INTERNET-DRAFT IIJ Research Laboratory
Expires: January 19, 2002 July 19, 2001
Comparison of AAAA and A6 (do we really need A6?)
draft-ietf-dnsext-aaaa-a6-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''
To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.
Distribution of this memo is unlimited.
The internet-draft will expire in 6 months. The date of expiration will
be January 19, 2002.
Abstract
At this moment, there are two DNS resource record types defined for
holding IPv6 addresses in the DNS database: AAAA [Thomson, 1995] and A6
[Crawford, 2000]. AAAA has been used for IPv6 network operation since
1996. Questions have arisen as to whether we really need A6, and
whether it is really possible to migrate to A6. Some say AAAA is enough
and A6 is not necessary. Some say A6 is necessary and AAAA should be
deprecated.
The draft tries to understand the pros and cons of these two record
types, and makes suggestions on the deployment of IPv6 record types.
The draft does not cover the use of bit string labels and the DNAME
resource record (reverse mapping). The nibble form seems to be well
accepted in the community and the newer formats have high deployment
costs, so we see little need or demand for migration. Refer to the
IETF50 dnsext working group minutes for more details.
HAGINO Expires: January 19, 2002 [Page 1]
DRAFT Comparison of AAAA and A6 July 2001
1. A brief summary of the IPv6 resource record types
1.1. AAAA record
The AAAA resource record is formatted as follows. The DNS record type
value for AAAA is 28 (assigned by IANA). Note that the AAAA record is
formatted as fixed-length data.
+------------+
|IPv6 Address|
| (16 octets)|
+------------+
With AAAA, we can define DNS records for IPv6 address resolution as
follows, just like A records for IPv4.
$ORIGIN X.EXAMPLE.
N AAAA 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:000E:EB22:0001:1234:5678:9ABC:DEF0
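As a minimal sketch of the fixed-length format above (Python, standard
library only; the helper name aaaa_rdata is hypothetical and not part of
this draft), the RDATA of a AAAA record is simply the 16 octets of the
address in network byte order:

```python
import ipaddress

def aaaa_rdata(address: str) -> bytes:
    """Return the 16-octet AAAA RDATA for a textual IPv6 address."""
    return ipaddress.IPv6Address(address).packed

# First address from the example zone above.
rdata = aaaa_rdata("2345:00C1:CA11:0001:1234:5678:9ABC:DEF0")
print(len(rdata), rdata.hex())
```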
1.2. A6 record
The A6 resource record is formatted as follows. The DNS record type
value for A6 is 38 (assigned by IANA). Note that the A6 record is
formatted as variable-length data.
+-----------+------------------+-------------------+
|Prefix len.| Address suffix | Prefix name |
| (1 octet) | (0..16 octets) | (0..255 octets) |
+-----------+------------------+-------------------+
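The variable-length layout can be sketched as follows. This is an
illustration only, not a reference encoder: it assumes the RFC2874 rules
that the suffix field holds exactly (128 - prefix length) bits rounded up
to whole octets, that the prefix name is an uncompressed domain name, and
that the prefix name is omitted when the prefix length is 0. The helper
names encode_name and encode_a6 are hypothetical.

```python
import ipaddress

def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed, length-prefixed labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        if len(label) > 63:
            raise ValueError("label longer than 63 octets")
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"  # root label terminates the name

def encode_a6(prefix_len: int, suffix: str, prefix_name: str = "") -> bytes:
    """Build A6 RDATA: prefix length, address suffix, optional prefix name."""
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be 0..128")
    suffix_bits = 128 - prefix_len
    suffix_octets = (suffix_bits + 7) // 8      # 0..16 octets
    # Keep only the low-order suffix bits; leading pad bits stay zero.
    value = int(ipaddress.IPv6Address(suffix)) & ((1 << suffix_bits) - 1)
    rdata = bytes([prefix_len]) + value.to_bytes(suffix_octets, "big")
    if prefix_len > 0:
        rdata += encode_name(prefix_name)
    return rdata

# A non-fragmented record carries 1 + 16 octets, one more than AAAA.
print(len(encode_a6(0, "2345:00C0::")))
```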
With A6, it is possible to define an IPv6 address by using multiple DNS
records. Here is an example taken from RFC2874:
$ORIGIN X.EXAMPLE.
N A6 64 ::1234:5678:9ABC:DEF0 SUBNET-1.IP6
SUBNET-1.IP6 A6 48 0:0:0:1:: IP6
IP6 A6 48 0::0 SUBSCRIBER-X.IP6.A.NET.
IP6 A6 48 0::0 SUBSCRIBER-X.IP6.B.NET.
SUBSCRIBER-X.IP6.A.NET. A6 40 0:0:0011:: A.NET.IP6.C.NET.
SUBSCRIBER-X.IP6.A.NET. A6 40 0:0:0011:: A.NET.IP6.D.NET.
SUBSCRIBER-X.IP6.B.NET. A6 40 0:0:0022:: B-NET.IP6.E.NET.
A.NET.IP6.C.NET. A6 28 0:0001:CA00:: C.NET.ALPHA-TLA.ORG.
A.NET.IP6.D.NET. A6 28 0:0002:DA00:: D.NET.ALPHA-TLA.ORG.
B-NET.IP6.E.NET. A6 32 0:0:EB00:: E.NET.ALPHA-TLA.ORG.
C.NET.ALPHA-TLA.ORG. A6 0 2345:00C0::
D.NET.ALPHA-TLA.ORG. A6 0 2345:00D0::
E.NET.ALPHA-TLA.ORG. A6 0 2345:000E::
If we translate the above into AAAA records, it will be as follows:
$ORIGIN X.EXAMPLE.
N AAAA 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:000E:EB22:0001:1234:5678:9ABC:DEF0
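The translation from the A6 chain to these three addresses can be
sketched as below. The code is an illustration under stated assumptions,
not a resolver: the zone is an in-memory table with all names written
fully qualified, there is no network, TTL or DNSSEC handling, and the
depth limit of 16 is an arbitrary choice made for this sketch. The names
ZONE and resolve_a6 are hypothetical.

```python
import ipaddress

# owner name -> list of (prefix length, address suffix, prefix name)
ZONE = {
    "N.X.EXAMPLE.": [(64, "::1234:5678:9ABC:DEF0", "SUBNET-1.IP6.X.EXAMPLE.")],
    "SUBNET-1.IP6.X.EXAMPLE.": [(48, "0:0:0:1::", "IP6.X.EXAMPLE.")],
    "IP6.X.EXAMPLE.": [
        (48, "0::0", "SUBSCRIBER-X.IP6.A.NET."),
        (48, "0::0", "SUBSCRIBER-X.IP6.B.NET."),
    ],
    "SUBSCRIBER-X.IP6.A.NET.": [
        (40, "0:0:0011::", "A.NET.IP6.C.NET."),
        (40, "0:0:0011::", "A.NET.IP6.D.NET."),
    ],
    "SUBSCRIBER-X.IP6.B.NET.": [(40, "0:0:0022::", "B-NET.IP6.E.NET.")],
    "A.NET.IP6.C.NET.": [(28, "0:0001:CA00::", "C.NET.ALPHA-TLA.ORG.")],
    "A.NET.IP6.D.NET.": [(28, "0:0002:DA00::", "D.NET.ALPHA-TLA.ORG.")],
    "B-NET.IP6.E.NET.": [(32, "0:0:EB00::", "E.NET.ALPHA-TLA.ORG.")],
    "C.NET.ALPHA-TLA.ORG.": [(0, "2345:00C0::", None)],
    "D.NET.ALPHA-TLA.ORG.": [(0, "2345:00D0::", None)],
    "E.NET.ALPHA-TLA.ORG.": [(0, "2345:000E::", None)],
}

ALL_ONES = (1 << 128) - 1

def resolve_a6(name, zone, depth=0, max_depth=16):
    """Return the set of full IPv6 addresses reachable from `name`."""
    if depth >= max_depth:
        raise RuntimeError("A6 chain too long")
    addresses = set()
    for prefix_len, suffix, prefix_name in zone.get(name, []):
        low_mask = ALL_ONES >> prefix_len        # low (128 - prefix_len) bits
        low_bits = int(ipaddress.IPv6Address(suffix)) & low_mask
        if prefix_len == 0:                      # end of the chain
            addresses.add(ipaddress.IPv6Address(low_bits))
            continue
        # Every address of the prefix name contributes its high bits.
        for prefix_addr in resolve_a6(prefix_name, zone, depth + 1, max_depth):
            high_bits = int(prefix_addr) & (ALL_ONES ^ low_mask)
            addresses.add(ipaddress.IPv6Address(high_bits | low_bits))
    return addresses

for address in sorted(resolve_a6("N.X.EXAMPLE.", ZONE)):
    print(address.exploded)
```

Each level of the chain is one more lookup in this table; in a real
resolver each level would be at least one more query roundtrip.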
It is also possible to use A6 records in a ``non-fragmented'' manner, as
shown below.
$ORIGIN X.EXAMPLE.
N A6 0 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
N A6 0 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
N A6 0 2345:000E:EB22:0001:1234:5678:9ABC:DEF0
There is a large design difference between A6 and AAAA. A6 shifts more
of the address resolution work to the resolver side, in order to reduce
zone file maintenance cost; the complexity lies on the resolver side.
AAAA asks zone file maintainers to supply the full 128-bit IPv6 address
in one record, so the resolver side can be implemented very simply.
2. Deployment status
2.1. Name servers/resolvers
At the time of writing, AAAA is deployed fairly widely. BIND4 (since
4.9.4), BIND8, BIND9 and other implementations support AAAA, both as DNS
servers and as resolver libraries. In contrast, the author knows of
only one DNS server/resolver implementation that supports A6: BIND9.
Almost all of the IPv6-ready operating systems ship with the BIND4 or
BIND8 resolver library. [need to check situations with resolver
libraries based on non-BIND code] Therefore, they cannot query A6
records (unless applications are linked with BIND9 libraries
explicitly).
2.2. IPv6 network
IPv6 networks have been deployed widely since 1996. Though many of the
participants consider them to be experimental, commercial IPv6 services
have been deployed since around 1999, especially in Asian countries.
Even today, there are numerous IPv6 networks operated just as seriously
as IPv4 networks.
2.3. DNS database
There are no IPv6-reachable root DNS servers, partly because we have
both AAAA and A6 and have not decided which one we really want to
deploy (so we cannot publish IPv6 root NS records). The lack of
IPv6-reachable root DNS servers is now preventing IPv6-only or
IPv4/v6 dual stack network operations.
At this moment, only a very small number of ccTLD registries accept
registration requests for IPv6 glue records. Many of the ccTLDs and
gTLDs do not take IPv6 glue records, partly because of the lack of
consensus between AAAA and A6. Again, the lack of IPv6 glue records is
causing pain in IPv6-ready network operations. As one exception, the JP
ccTLD accepts IPv6 glue records and registers them as AAAA records. In
our experience, IPv6 NS records (with AAAA) work flawlessly. For
example, try the following commands to see how the JP ccTLD registers
IPv6 glue records (``/e'' is for English-language output):
% whois -h whois.nic.ad.jp wide.ad.jp/e
% whois -h whois.nic.ad.jp ns1.v6.wide.ad.jp/e
3. Deploying DNS records
At this moment, the following four strategies are proposed for the
deployment of IPv6 DNS resource records: AAAA, fragmented A6 records,
non-fragmented A6 records, and AAAA synthesis.
3.1. AAAA records
AAAA records have been used on the IPv6 network (also known as the
6bone) since it started in 1996, and have been working just fine ever
since. The AAAA record is a straight extension of the A record; it
needs a single query-response roundtrip to resolve a name into an IPv6
address.
A6 was proposed to add network renumbering friendliness to AAAA. With
AAAA, a full 128-bit IPv6 address needs to be supplied in a DNS resource
record. Therefore, in the event of a network renumbering,
administrators need to update the whole DNS zone file with the new IPv6
address
prefixes. We will discuss the issues with renumbering in a dedicated
section.
3.2. Fragmented A6 records
If we are to use fragmented A6 records (128 bits split across multiple
A6 records), there are many issues and concerns.
If we are to resolve IPv6 addresses using fragmented A6 records, we need
to query the DNS multiple times to resolve a single DNS name into an
IPv6 address. Since there have been no DNS record types that require
multiple queries for resolution of the address itself, we have very
little experience with such resource records.
There will be more delay in name resolution compared to queries for
A/AAAA records. If we define a record with a larger number of
fragments, there will be more query roundtrips. There are only a few
opportunities to query fragments in parallel. In the above example, we
can resolve A.NET.IP6.C.NET and A.NET.IP6.D.NET in parallel, but not
others.
At this moment, there is very little documentation available regarding
the relationship between DNS record TTL and the query delays. For
example, if the DNS record TTL is smaller than the communication delay
between the querier and the DNS servers, what should happen?
o If we compute DNS record TTL based on the wallclock on the DNS server
side, the DNS records are already expired and the querier will not be
able to reassemble a complete IPv6 record. Worse, by setting up
records with a very low TTL, one can send recursive DNS resolvers into
an infinite loop by letting them chase a wrong A6 chain (see the
section on security consideration) [BIND 9.2.0snap: resolver does not
go into infinite loop, meaning that BIND 9.2.0snap resolver does not
really honor DNS record TTL during A6 reassembly].
o If we compute it starting from the time the querier got the record, we
will have some jitter in TTL computation among multiple queriers. If
the query delays are long enough, the querier would end up having
inconsistent A6 fragments, and the IPv6 address can be bogus after
reassembly. With record types other than A6, we have had no such
problem, since we have never tried to reassemble an address out of
multiple DNS records (with CNAME chain chasing a similar problem can
arise, but the failure mode is much simpler to diagnose as the records
are treated as atomic entities).
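The timing argument can be made concrete with a small model. It is a
sketch under simplifying assumptions that are not in the draft:
fragments are fetched strictly one after another, every query takes the
same delay, and each TTL is counted from the moment the fragment is
received (the second case above). The function name
expired_before_completion is hypothetical.

```python
def expired_before_completion(ttl, delay, fragments):
    """Return the indexes of fragments whose TTL has already run out
    when the last fragment of the chain arrives.

    ttl       -- TTL of every fragment, in seconds
    delay     -- query/response roundtrip time per fragment, in seconds
    fragments -- number of A6 fragments fetched in sequence
    """
    finish = fragments * delay                 # arrival time of last fragment
    expired = []
    for index in range(fragments):
        arrival = (index + 1) * delay          # when this fragment arrived
        if arrival + ttl < finish:             # expired before reassembly ends
            expired.append(index)
    return expired

# Six fragments, 0.5 s per query, TTL of 1 s.
print(expired_before_completion(ttl=1, delay=0.5, fragments=6))
# The same chain with a TTL of one hour.
print(expired_before_completion(ttl=3600, delay=0.5, fragments=6))
```

In this model a fragment expires whenever the remaining part of the
chain takes longer than the TTL to fetch, so longer chains and slower
servers both make the problem worse.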
Some say that caches will avoid querying fragmented A6s again and
again. However, most library resolver implementations do not cache
anything. The traffic between the library resolver and the first-hop
nameserver will not be decreased by the cached records. The TTL problem
(see above) is unavoidable for a library resolver without a cache. [XXX
will they interpret TTL field? BIND8 resolver does not]
If some of the fragments are DNSSEC-signed and some are not, how should
we treat that address? RFC2874 section 6 briefly talks about it, but it
is not clear whether a complete answer is given.
It is much harder to implement A6 fragment reassembly code than to
implement a AAAA record resolver. A AAAA record resolver can be
implemented as a straight extension of an A record resolver.
o It is much harder to design timeout handling for the A6 reassembly.
There would be multiple timeout parameters, including (1) communication
timeout for a single A6 fragment, (2) communication timeout for the
IPv6 address itself (total time needed for reassembly) and (3) TTL
timeout for A6 fragment records.
o In the case of library resolver implementation, it is harder to deal
with exceptions (signals in UNIX case) for the large code fragment for
resolvers.
o When the A6 prefix length field is not a multiple of 8, the address
suffix portion needs to be shifted bitwise while A6 fragments are
reassembled. Also, resolver implementations must be careful about
overlaps of the bits. From our implementation experience, the logic
gets very complex and we (unfortunately) expect to see a lot of
security-critical bugs in the future.
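The last point can be illustrated with an octet-oriented merge of one
prefix and one suffix. This is a sketch, not code from any resolver: it
assumes both inputs are already expanded to 16 octets, and it treats any
suffix bit set inside the prefix range as an error, which is one
possible policy rather than something this draft specifies. The name
merge_fragment is hypothetical.

```python
import ipaddress

def merge_fragment(prefix_addr: bytes, prefix_len: int, suffix_addr: bytes) -> bytes:
    """Take the top prefix_len bits from prefix_addr and the remaining
    bits from suffix_addr. Both inputs are 16-octet addresses."""
    if len(prefix_addr) != 16 or len(suffix_addr) != 16:
        raise ValueError("addresses must be 16 octets")
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be 0..128")
    whole, rest = divmod(prefix_len, 8)        # full octets, leftover bits
    if any(suffix_addr[:whole]):
        raise ValueError("suffix overlaps prefix")
    merged = bytearray(suffix_addr)
    merged[:whole] = prefix_addr[:whole]
    if rest:
        # The boundary octet is shared: its top `rest` bits belong to
        # the prefix, the remaining low bits belong to the suffix.
        high_mask = (0xFF << (8 - rest)) & 0xFF
        if suffix_addr[whole] & high_mask:
            raise ValueError("suffix overlaps prefix")
        merged[whole] = (prefix_addr[whole] & high_mask) | (
            suffix_addr[whole] & (0xFF ^ high_mask)
        )
    return bytes(merged)

# Prefix length 28 splits the fourth octet in the middle.
prefix = ipaddress.IPv6Address("2345:00C0::").packed
suffix = ipaddress.IPv6Address("0:0001:CA00::").packed
print(ipaddress.IPv6Address(merge_fragment(prefix, 28, suffix)))
```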
In RFC2874, a suggestion is made to use a limited number of fragments
per IPv6 address. However, there is no protocol limitation defined.
The lack of a limit makes it easier for malicious parties to mount DoS
attacks using lots of A6 fragments (see the section on security
consideration). [BIND 9.2.0snap: The implementation limits the number
of fragments within an A6 chain to be smaller than 16; It is not a
protocol limitation but an implementation choice. Not sure if it is the
right choice or not]
With fragmented A6 records, in a multi-prefix network configuration, it
is not possible for us to limit the addresses in the DNS database to a
specific set of records, for example for load distribution purposes.
Consider the following example. Even if we would like to advertise only
2345:00D2:DA11:1:1234:5678:9ABC:DEF0 for N.X.EXAMPLE, it is not possible
to do so. It becomes mandatory for us to define the whole IPv6 address
by using ``A6 0'' for N.X.EXAMPLE, and in effect, the benefit of A6
(renumber friendliness) goes away.
; with the following record we would advertise both records
$ORIGIN X.EXAMPLE.
N A6 64 ::1234:5678:9ABC:DEF0 SUBNET-1.IP6
M A6 64 ::2345:2345:2345:2345 SUBNET-1.IP6
SUBNET-1.IP6 A6 0 2345:00C1:CA11:1::
A6 0 2345:00D2:DA11:1::
; we need to do the following, jeopardizing renumbering
; friendliness for N.X.EXAMPLE
$ORIGIN X.EXAMPLE.
N A6 0 2345:00C1:CA11:1:1234:5678:9ABC:DEF0
M A6 64 ::2345:2345:2345:2345 SUBNET-1.IP6
SUBNET-1.IP6 A6 0 2345:00C1:CA11:1::
A6 0 2345:00D2:DA11:1::
The A6 resource record type and A6 fragment/reassembly were introduced
to help administrators with network renumbering. When a network gets
renumbered, the administrator needs to update only the A6 fragments for
the higher address bits (prefixes). Again, we will discuss the issues
with renumbering in a dedicated section.
3.3. Non-fragmented A6 records
There are proposals to use non-fragmented A6 records in most of the
places, like ``A6 0 <128bit>'', so that we would be able to switch to
fragmented A6 records when we find a need for A6.
From the packet format point of view, the approach has no benefit
over AAAA. Rather, there is a one-byte overhead to every
(unfragmented) A6 record compared to a AAAA record.
If the nameserver/resolver programs hardcode A6 processing to handle no
fragments, there will be no future possibility for us to introduce
fragmented A6 records. When there is no need for A6 reassembly, there
will be no code deployment, and even if the reassembly code gets
deployed it will not be tested enough. The author believes that the
``prepare for the future, use non-fragmented A6'' argument is not
worthwhile.
In the event of renumbering, non-fragmented A6 record has the same
property as AAAA (the whole zone file has to be updated).
3.4. AAAA synthesis (A6 and AAAA hybrid approach)
At this moment, end hosts support AAAA records only. Some people would
like to see A6 deployment in DNS databases even with the lack of end
host support. To work around the deployment issues of A6, the
following approach was proposed in the IETF50 dnsext working group
session. It is called ``AAAA synthesis'' [Austein, 2001]:
o Deploy A6 DNS records worldwide. The proposal was not specific about
whether we would deploy fragmented A6 records, or non-fragmented A6
records (``A6 0'').
o When a host sends a AAAA query to a DNS server, the DNS server
queries A6 fragments, reassembles them, and responds with a AAAA record.
The approach needs to be diagnosed/specified in far more detail. For
example, the following questions need to be answered:
o What DNS error code should be returned to the AAAA querier if the A6
reassembly fails?
o What TTL should the synthesized AAAA record have? [BIND 9.2.0snap
uses TTL=0]
o Which nameserver in the DNS recursive query chain should synthesize
the AAAA record? Is the synthesis mandatory for every DNS server
implementation?
o What should we do if the A6 reassembly takes too much time?
o What should we do about DNSSEC signatures?
o What if the resolver wants no synthesis? Do we want to have a flag
bit in DNS packet, to enable/disable AAAA synthesis?
o What are the relationships between A6 TTL, AAAA TTL, A6 query
timeouts, AAAA query timeouts, and other timeout parameters?
The approach seems to be vulnerable to DoS attacks, because the
nameserver reassembles A6 fragments on behalf of the AAAA querier. See
the security consideration section for more details.
3.5. Issues in keeping both AAAA and A6
If we are to keep both AAAA and A6 records in the worldwide DNS
database, it would impose more query delays on the client resolvers.
Suppose we have a dual-stack host implementation. If it needs to
resolve a name into addresses, the node would need to query in the
following order (in the order which RFC2874 suggests):
o Query A6 records, and get full IPv6 addresses by chasing and
reassembling A6 fragment chain.
o Query AAAA records.
o Query A records.
o Sort the result based on a destination address ordering rule. An
example of the ordering rule is presented as a draft [Draves, 2001].
o Contact the destination addresses in sequence.
The ordering imposes additional delays on the resolvers. The above
ordering would be necessary for all approaches that use A6, as there are
existing AAAA records in the world.
4. Network renumbering
Some say that there will be more frequent renumbering in IPv6 network
operation, and that A6 is necessary to reduce the zone modification cost
as well as the zone signing cost of a renumbering operation.
It is not clear if we really want to renumber that frequently. With
IPv6, it should be easier for ISPs to assign addresses statically to the
downstream customers, rather than dynamically as we do in IPv4 dialup
connectivity today. If ISPs do assign static IPv6 address blocks to the
customers, there is no need to renumber customer networks that
frequently (unless the customer decides to switch upstream ISPs that
often).
NOTE: Roaming dialup users, like those who carry laptop computers
worldwide, seem to have a different issue from stationary dialup users.
See [Hagino, 2000] for more discussions.
It is questionable if it is possible to renumber IPv6 networks more
frequently than with IPv4. The router renumbering protocol (RFC2894)
[Crawford, 2000], IPv6 host autoconfiguration and IPv6 address lifetime
[Thomson, 1998] can help us renumber the IPv6 network; however, network
renumbering itself is not an easy task. To maintain reachability
from the outside world, a site administrator needs to carefully
coordinate site renumbering. The minimal interval between renumberings
is restricted by DNS record timeouts, as DNS records will be cached
around the world. If the TTL of the DNS records is X, the interval
between renumberings must be longer than 2 * X. If we consider
clients/servers that try to validate addresses using reverse lookups,
we also need to care about the relationship between IPv6 address
lifetime [Thomson, 1998] and the interval between renumberings. At the
IETF50 ipngwg session, there was a presentation by JINMEI Tatsuya
regarding a site renumbering experiment. It is recommended to read
through the IETF49 minutes and slides. [XXX Fred Baker had a draft on
this - where?] For network renumbering to be successful, no
configuration files should have hardcoded (numeric) IP addresses. This
is a very hard requirement to meet. We fail to satisfy it in many
network renumbering events, and the failure causes a lot of trouble.
At this moment there is no mechanism defined for ISPs to renumber
downstream customers at will. Even though it may sound interesting for
ISPs, doing so would cause a lot of (social and political) issues, so
the author would say it is rather unrealistic to pursue this route.
The only possible candidate, the router renumbering protocol (RFC2894)
[Crawford, 2000], does not really fit the situation. The protocol is
defined using IPsec authentication over site-local multicast packets.
It would be cumbersome to run the router renumbering protocol across
multiple
administrative domains, as (1) customers will not want to share IPsec
authentication keys for routers with the upstream ISP, and (2) the
customer network will be administered as a separate site from the
upstream ISP (even though the router renumbering protocol could be used
with unicast addresses, it is not realistic to assume that we can
maintain the list of IPv6 addresses for all the routers in both
customers' and ISPs' networks).
A6 was designed to help administrators update zone files during network
renumbering events. Even with AAAA, zone file modification itself is
easy; you can just query-replace the addresses in the zone files. The
difficulty of network renumbering comes from elsewhere.
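The ``query-replace'' step for AAAA can be sketched as follows. This is
an illustration only: the /48 prefixes are borrowed from the earlier
example zone, zone file parsing is left out, and the helper name
renumber is hypothetical.

```python
import ipaddress

def renumber(address: str, old_prefix: str, new_prefix: str) -> ipaddress.IPv6Address:
    """Move `address` from old_prefix to new_prefix, keeping the low bits.
    Addresses outside old_prefix are returned unchanged."""
    old = ipaddress.IPv6Network(old_prefix)
    new = ipaddress.IPv6Network(new_prefix)
    if old.prefixlen != new.prefixlen:
        raise ValueError("prefixes must have the same length")
    addr = ipaddress.IPv6Address(address)
    if addr not in old:
        return addr
    low_bits = int(addr) & int(old.hostmask)   # subnet and interface bits
    return ipaddress.IPv6Address(int(new.network_address) | low_bits)

# Renumber one AAAA record from the first provider prefix to the second.
print(renumber("2345:00C1:CA11:0001:1234:5678:9ABC:DEF0",
               "2345:00C1:CA11::/48", "2345:00D2:DA11::/48").exploded)
```

Applied to every AAAA record in a zone, this covers the zone file part
of a renumbering event; it does nothing for cached records or for
addresses hardcoded elsewhere.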
With AAAA, we need to sign the whole zone again with DNSSEC keys after
renumbering the network. With A6, we need to sign only the upper bits
with DNSSEC keys. Therefore, A6 will impose less zone signing cost in
the event of network renumbering. As seen above, it is questionable
whether we will renumber networks that often, so it is questionable
whether A6 brings us an overall benefit. Note, however, that even if we
use A6 to facilitate more frequent renumbering and lower signing cost,
all glue records have to be installed as non-fragmented A6 records
(``A6 0''), and have to be signed again on renumbering events.
5. Security consideration
There are a couple of security worries mentioned above. To give
a brief summary:
o There will be a higher delay imposed by query/reply roundtrips for
fragmented A6 records. This could affect every service that relies
upon DNS records.
o There is no upper limit defined for the number of A6 fragments for
defining an IPv6 address. Malicious parties may try to set up very
complex A6 chains and confuse nameservers worldwide.
o An A6 resolver/nameserver is much harder to implement correctly than
a AAAA resolver/nameserver. A6 fragment reassembly code needs to take
care of bitwise data reassembly, bitwise overlap checks, and others.
From our implementation experience, we expect to see a lot of
security-related bugs in the future.
o The interaction between DNS record TTL and the DNS query delays leads
to a non-trivial timeout problem.
We would like to go into more details for some of these.
5.1. DoS attacks against AAAA synthesis
When a DNS server is configured for AAAA synthesis, malicious parties
can mount DoS attacks using the interaction between DNS TTL and query
delays. The attack can chew CPU time and/or memory, as well as some
network bandwidth on a victim nameserver, by the following steps:
o A bad guy configures a record with a very complex A6 chain on some
nameservers (the bad guy has to have control over the servers).
The nameservers can be located anywhere in the world. The A6 chain
should have a very low TTL (like 1 or 0 seconds). The attack works
better if we have higher delays between the victim nameservers and the
nameservers that serve A6 fragments.
o The bad guy queries the record using a AAAA request to the victim
nameserver.
o The victim nameserver will try to reassemble A6 fragments. During the
reassembly process, the victim nameserver puts A6 fragments into the
local cache. The cached records will expire during the reassembly
process. The nameserver will need to query a lot of A6 fragments
(more traffic). The server can go into an infinite loop, if it tries
to query the expired A6 fragments again.
Note, however, that this problem could be considered a problem in
recursive resolvers in general (like CNAME and NS chasing); A6 and AAAA
synthesis make the problem more apparent, and more complex to diagnose.
To remedy this problem, we have a couple of solutions:
(1) Deprecate A6 and deploy AAAA worldwide. If we do not have A6, the
problem goes away.
(2) Even if we use A6, do not configure nameservers for AAAA synthesis.
Deployment issues with existing IPv6 hosts get much harder.
(3) Impose a protocol limitation to the number of A6 fragments.
(4) Do not query the expired records in A6 chain again. In other words,
implement resolvers that ignore TTL on DNS records. Not sure if it
is the right thing to do.
6. Conclusion
NOTE: the section expresses the impressions of the author.
The A6/AAAA discussion has been an obstacle for IPv6 deployment, as the
deployment of IPv6 NS records has been deferred because of the
discussion. The author does not see a benefit in keeping both AAAA and
A6 records, as it imposes more query delays on the clients. So the
author believes that we need to pick one of them.
Given the unlikeliness of frequent network renumbering, the author
believes that the A6's benefit in lower zone signing cost is not
significant. The benefit of A6 (in zone signing cost) is much less than
the expected complication that will be imposed by A6 operations.
From the above discussions, the author suggests keeping AAAA and
deprecating A6 (moving the A6 document to historic state). The author
believes that A6 can cause more problems than the benefits it may
bring. A6 will make IPv6 DNS operation more complicated and vulnerable
to attacks. AAAA has been proven to work in our IPv6 network operation
since 1996. AAAA has been working just fine in existing IPv6 networks,
and the author believes that it will continue to do so in the coming
days.
References
Thomson, 1995.
S. Thomson and C. Huitema, "DNS Extensions to support IP version 6" in
RFC1886 (December 1995). ftp://ftp.isi.edu/in-notes/rfc1886.txt.
Crawford, 2000.
M. Crawford, C. Huitema, and S. Thomson, "DNS Extensions to Support IPv6
Address Aggregation and Renumbering" in RFC2874 (July 2000).
ftp://ftp.isi.edu/in-notes/rfc2874.txt.
Austein, 2001.
R. Austein, "Tradeoffs in DNS support for IPv6" in draft-ietf-dnsext-
ipv6-dns-tradeoffs-00.txt (July 2001). work in progress material.
Draves, 2001.
Richard Draves, "Default Address Selection for IPv6" in draft-ietf-
ipngwg-default-addr-select-04.txt (May 2001). work in progress material.
Hagino, 2000.
Jun-ichiro Hagino and Kazu Yamamoto, "Requirements for IPv6 dialup PPP
operation" in draft-itojun-ipv6-dialup-requirement-00.txt (July 2000).
work in progress material.
Crawford, 2000.
Matt Crawford, "Router Renumbering for IPv6" in RFC2894 (August 2000).
ftp://ftp.isi.edu/in-notes/rfc2894.txt.
Thomson, 1998.
S. Thomson and T. Narten, "IPv6 Stateless Address Autoconfiguration" in
RFC2462 (December 1998). ftp://ftp.isi.edu/in-notes/rfc2462.txt.
Change history
none.
Acknowledgements
The draft was written based on discussions in IETF IPv6 and dnsext
working groups, and help from WIDE research group.
Author's address
Jun-ichiro itojun HAGINO
Research Laboratory, Internet Initiative Japan Inc.
Takebashi Yasuda Bldg.,
3-13 Kanda Nishiki-cho,
Chiyoda-ku,Tokyo 101-0054, JAPAN
Tel: +81-3-5259-6350
Fax: +81-3-5259-6351
Email: itojun@iijlab.net


@ -1,394 +0,0 @@
INTERNET-DRAFT Peter Koch
Expires: September 2001 Universitaet Bielefeld
Updates: RFC 1035 March 2001
A DNS RR Type for Lists of Address Prefixes (APL RR)
draft-ietf-dnsext-apl-rr-02.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Comments should be sent to the author or the DNSEXT WG mailing list
<namedroppers@OPS.IETF.ORG>.
Abstract
The Domain Name System is primarily used to translate domain names
into IPv4 addresses using A RRs. Several approaches exist to describe
networks or address ranges. This document specifies a new DNS RR type
"APL" for address prefix lists.
1. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Domain names herein are for explanatory purposes only and should not
be expected to lead to useful information in real life [RFC2606].
Koch Expires September 2001 [Page 1]
INTERNET-DRAFT DNS APL RR March 2001
2. Background
The Domain Name System [RFC1034], [RFC1035] provides a mechanism to
associate addresses and other Internet infrastructure elements with
hierarchically built domain names. Various types of resource records
have been defined, especially those for IPv4 and IPv6 [RFC2874]
addresses. In [RFC1101] a method is described to publish information
about the address space allocated to an organisation. In older BIND
versions, a weak form of controlling access to zone data was
implemented using TXT RRs describing address ranges.
This document specifies a new RR type for address prefix lists.
3. APL RR Type
An APL record has the DNS type of "APL" [draft, IANA: not yet applied
for] and a numeric value of [draft, IANA:to be assigned]. The APL RR
is defined in the IN class only. APL RRs cause no additional section
processing.
4. APL RDATA format
The RDATA section consists of zero or more items (<apitem>) of the
form
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| ADDRESSFAMILY |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| PREFIX | N | AFDLENGTH |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
/ AFDPART /
| |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
ADDRESSFAMILY  16 bit unsigned value as assigned by IANA
               (see IANA Considerations)
PREFIX         8 bit unsigned binary coded prefix length.
               Upper and lower bounds and interpretation of
               this value are address family specific.
N              negation flag, indicates the presence of the
               "!" character in the textual format. It has
               the value "1" if the "!" was given, "0" else.
AFDLENGTH      length in octets of the following address
               family dependent part (7 bit unsigned).
AFDPART        address family dependent part. See below.
This document defines the AFDPARTs for address families 1 (IPv4) and
2 (IPv6). Future revisions may deal with additional address
families.
4.1. AFDPART for IPv4
The encoding of an IPv4 address (address family 1) follows the
encoding specified for the A RR by [RFC1035], section 3.4.1.
PREFIX specifies the number of bits of the IPv4 address starting at
the most significant bit. Legal values range from 0 to 32.
Trailing zero octets do not bear any information (e.g. there is no
semantic difference between 10.0.0.0/16 and 10/16) in an address
prefix, so the shortest possible AFDLENGTH can be used to encode it.
However, for DNSSEC [RFC2535] a single wire encoding must be used by
all. Therefore the sender MUST NOT include trailing zero octets in
the AFDPART regardless of the value of PREFIX. This includes cases in
which AFDLENGTH times 8 results in a value less than PREFIX. The
AFDPART is padded with zero bits to match a full octet boundary.
An IPv4 AFDPART has a variable length of 0 to 4 octets.
4.2. AFDPART for IPv6
The 128 bit IPv6 address (address family 2) is encoded in network
byte order (high-order byte first).
PREFIX specifies the number of bits of the IPv6 address starting at
the most significant bit. Legal values range from 0 to 128.
With the same reasoning as in 4.1 above, the sender MUST NOT include
trailing zero octets in the AFDPART regardless of the value of
PREFIX. This includes cases in which AFDLENGTH times 8 results in a
value less than PREFIX. The AFDPART is padded with zero bits to
match a full octet boundary.
An IPv6 AFDPART has a variable length of 0 to 16 octets.
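As an illustration only, the encoding rules of 4.1 and 4.2 might be
sketched in Python as follows. The function name encode_apitem and
its parameters are assumptions of this sketch, not part of the
specification. The sketch also assumes that N occupies the most
significant bit of the octet it shares with AFDLENGTH, as the field
order above suggests, and it does not check whether address bits
beyond PREFIX are zero.

```python
import ipaddress
import struct

# Address family numbers used by this document: 1 = IPv4, 2 = IPv6.
AF_IPV4, AF_IPV6 = 1, 2


def encode_apitem(family: int, address: str, prefix: int,
                  negate: bool = False) -> bytes:
    """Encode one <apitem>: ADDRESSFAMILY | PREFIX | N+AFDLENGTH | AFDPART."""
    if family == AF_IPV4:
        packed, max_prefix = ipaddress.IPv4Address(address).packed, 32
    elif family == AF_IPV6:
        packed, max_prefix = ipaddress.IPv6Address(address).packed, 128
    else:
        raise ValueError("only address families 1 and 2 are defined here")
    if not 0 <= prefix <= max_prefix:
        raise ValueError("prefix length out of range for this family")
    # The sender MUST NOT include trailing zero octets, whatever PREFIX is.
    afdpart = packed.rstrip(b"\x00")
    # Assumption: N is the top bit, AFDLENGTH the low 7 bits of one octet.
    n_and_length = (0x80 if negate else 0) | len(afdpart)
    return struct.pack("!HBB", family, prefix, n_and_length) + afdpart
```

With this sketch, 10.0.0.0/16 yields a one-octet AFDPART, which is
the case where AFDLENGTH times 8 is less than PREFIX.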
5. Zone File Syntax
The textual representation of an APL RR in a DNS zone file is as
follows:
<owner> IN <TTL> APL {[!]afi:address/prefix}*
The data consists of zero or more strings of the address family
indicator <afi>, immediately followed by a colon ":", an address,
immediately followed by the "/" character, immediately followed by a
decimal numeric value for the prefix length. Any such string may be
preceded by a "!" character. The strings are separated by whitespace.
The <afi> is the decimal numeric value of that particular address
family.
5.1. Textual Representation of IPv4 Addresses
An IPv4 address in the <address> part of an <apitem> is in dotted
quad notation, just as in an A RR. The <prefix> has values from the
interval 0..32 (decimal).
5.2. Textual Representation of IPv6 Addresses
The representation of an IPv6 address in the <address> part of an
<apitem> follows [RFC2373], section 2.2. Legal values for <prefix>
are from the interval 0..128 (decimal).
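A parser for the textual form described in this section could look
roughly like the following Python sketch. The names ApItem,
parse_apitem and parse_apl_rdata are hypothetical. The sketch
handles only the whitespace separated <apitem> strings and assumes
that zone file parentheses and comments have already been removed.

```python
import ipaddress
from typing import List, NamedTuple, Union


class ApItem(NamedTuple):
    negated: bool
    family: int
    address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    prefix: int


def parse_apitem(text: str) -> ApItem:
    """Parse one "[!]afi:address/prefix" string."""
    negated = text.startswith("!")
    body = text[1:] if negated else text
    # The first colon ends the <afi>; IPv6 addresses contain more colons.
    afi_text, colon, rest = body.partition(":")
    address_text, slash, prefix_text = rest.rpartition("/")
    if not colon or not slash:
        raise ValueError(f"malformed apitem: {text!r}")
    family, prefix = int(afi_text), int(prefix_text)
    if family == 1:
        address, limit = ipaddress.IPv4Address(address_text), 32
    elif family == 2:
        address, limit = ipaddress.IPv6Address(address_text), 128
    else:
        raise ValueError(f"address family {family} is not defined here")
    if not 0 <= prefix <= limit:
        raise ValueError(f"prefix {prefix} outside 0..{limit}")
    return ApItem(negated, family, address, prefix)


def parse_apl_rdata(text: str) -> List[ApItem]:
    """Parse zero or more <apitem> strings; order is preserved."""
    return [parse_apitem(token) for token in text.split()]
```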
6. APL RR usage
An APL RR with empty RDATA is valid and implements an empty list.
Multiple occurrences of the same <apitem> in a single APL RR are
allowed and MUST NOT be merged by a DNS server or resolver. <apitems>
MUST be kept in order and MUST NOT be rearranged or aggregated.
A single APL RR may contain <apitems> belonging to different address
families. The maximum number of <apitems> is upper bounded by the
available RDATA space.
RRSets consisting of more than one APL RR are legal but the
interpretation is left to the particular application.
7. Applicability Statement
The APL RR defines a framework without specifying any particular
meaning for the list of prefixes. It is expected that APL RRs will
be used in different application scenarios which have to be
documented separately. Those scenarios may be distinguished by
characteristic prefixes placed in front of the DNS owner name.
An APL application specification MUST include information on
o the characteristic prefix, if any
o how to interpret APL RRSets consisting of more than one RR
o how to interpret an empty APL RR
o which address families are expected to appear in the APL RRs for
that application
o how to deal with APL RR list elements which belong to other
address families, including those not yet defined
o the exact semantics of list elements negated by the "!" character
Possible applications include the publication of address ranges
similar to [RFC1101], description of zones built following [RFC2317]
and in-band access control to limit general access or zone transfer
(AXFR) availability for zone data held in DNS servers.
The specification of particular application scenarios is out of the
scope of this document.
8. Examples
The following examples only illustrate some of the possible usages
outlined in the previous section. None of those applications are
hereby specified nor is it implied that any particular APL RR based
application does exist now or will exist in the future.
; RFC 1101-like announcement of address ranges for foo.example
foo.example. IN APL 1:192.168.32.0/21 !1:192.168.38.0/28
; CIDR blocks covered by classless delegation
42.168.192.IN-ADDR.ARPA. IN APL ( 1:192.168.42.0/26 1:192.168.42.64/26
1:192.168.42.128/25 )
; Zone transfer restriction
_axfr.sbo.example. IN APL 1:127.0.0.1/32 1:172.16.64.0/22
; List of address ranges for multicast
multicast.example. IN APL 1:224.0.0.0/4 2:FF00:0:0:0:0:0:0:0/8
Note that since trailing zeroes are ignored in the first APL RR the
AFDLENGTH of both <apitems> is three.
9. Security Considerations
Any information obtained from the DNS should be regarded as unsafe
unless techniques specified in [RFC2535] or [RFC2845] were used. The
definition of a new RR type does not introduce security problems into
the DNS, but usage of information made available by APL RRs may
compromise security. This includes disclosure of network topology
information and in particular the use of APL RRs to construct access
control lists.
10. IANA Considerations
This section is to be interpreted as following [RFC2434].
This document does not define any new namespaces. It uses the 16 bit
identifiers for address families maintained by IANA in
http://www.iana.org/numbers.html.
IANA is asked to assign a numeric RR type value for APL.
11. Acknowledgements
The author would like to thank Mark Andrews, Olafur Gudmundsson, Ed
Lewis, Thomas Narten, Erik Nordmark, and Paul Vixie for their review
and constructive comments.
12. References
[RFC1034] Mockapetris,P., "Domain Names - Concepts and Facilities",
RFC 1034, STD 13, November 1987
[RFC1035] Mockapetris,P., "Domain Names - Implementation and
Specification", RFC 1035, STD 13, November 1987
[RFC1101] Mockapetris,P., "DNS Encoding of Network Names and Other
Types", RFC 1101, April 1989
[RFC2119] Bradner,S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, BCP 14, March 1997
[RFC2181] Elz,R., Bush,R., "Clarifications to the DNS
Specification", RFC 2181, July 1997
[RFC2317] Eidnes,H., de Groot,G., Vixie,P., "Classless IN-ADDR.ARPA
delegation", RFC 2317, March 1998
[RFC2373] Hinden,R., Deering,S., "IP Version 6 Addressing
Architecture", RFC 2373, July 1998
[RFC2434] Narten,T., Alvestrand,H., "Guidelines for Writing an IANA
Considerations Section in RFCs", RFC 2434, BCP 26, October
1998
[RFC2535] Eastlake,D., "Domain Name System Security Extensions", RFC
2535, March 1999
[RFC2606] Eastlake,D., Panitz,A., "Reserved Top Level DNS Names",
RFC 2606, BCP 32, June 1999
[RFC2845] Vixie,P., Gudmundsson,O., Eastlake,D., Wellington,B.,
"Secret Key Transaction Authentication for DNS (TSIG)",
RFC 2845, May 2000
[RFC2874] Crawford,M., Huitema,C., "DNS Extensions to Support IPv6
Address Aggregation and Renumbering", RFC 2874, July 2000
13. Author's Address
Peter Koch
Universitaet Bielefeld
Technische Fakultaet
D-33594 Bielefeld
Germany
+49 521 106 2902
<pk@TechFak.Uni-Bielefeld.DE>


@ -1,992 +0,0 @@
DNSEXT Working Group Olafur Gudmundsson
INTERNET-DRAFT June 2003
<draft-ietf-dnsext-delegation-signer-15.txt>
Updates: RFC 1035, RFC 2535, RFC 3008, RFC 3090.
Delegation Signer Resource Record
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This draft expires on January 19, 2004.
Copyright Notice
Copyright (C) The Internet Society (2003). All rights reserved.
Abstract
The delegation signer (DS) resource record is inserted at a zone cut
(i.e., a delegation point) to indicate that the delegated zone is
digitally signed and that the delegated zone recognizes the indicated
key as a valid zone key for the delegated zone. The DS RR is a
modification to the DNS Security Extensions definition, motivated by
operational considerations. The intent is to use this resource record
as an explicit statement about the delegation, rather than relying on
inference.
Gudmundsson Expires January 2004 [Page 1]
INTERNET-DRAFT Delegation Signer Record June 2003
This document defines the DS RR, gives examples of how it is used and
describes the implications on resolvers. This change is not backwards
compatible with RFC 2535.
This document updates RFC1035, RFC2535, RFC3008 and RFC3090.
Table of contents
Status of this Memo . . . . . . . . . . . . . . . . . . . . . . . . 1
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Table of contents . . . . . . . . . . . . . . . . . . . . . . . . . 2
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Reserved Words . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Specification of the Delegation key Signer . . . . . . . . . . . 4
2.1 Delegation Signer Record Model . . . . . . . . . . . . . . . . 4
2.2 Protocol Change . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 RFC2535 2.3.4 and 3.4: Special Considerations at
Delegation Points . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1.1 Special processing for DS queries . . . . . . . . . . . . 6
2.2.1.2 Special processing when child and an ancestor share
server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1.3 Modification on use of KEY RR in the construction of
Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Signer's Name (replaces RFC3008 section 2.7) . . . . . . . . 9
2.2.3 Changes to RFC3090 . . . . . . . . . . . . . . . . . . . . . 9
2.2.3.1 RFC3090: Updates to section 1: Introduction . . . . . . . . 9
2.2.3.2 RFC3090 section 2.1: Globally Secured . . . . . . . . . . . 9
2.2.3.3 RFC3090 section 3: Experimental Status. . . . . . . . . . 10
2.2.4 NULL KEY elimination . . . . . . . . . . . . . . . . . . . . 10
2.3 Comments on Protocol Changes . . . . . . . . . . . . . . . . . 10
2.4 Wire Format of the DS record . . . . . . . . . . . . . . . . . 11
2.4.1 Justifications for Fields . . . . . . . . . . . . . . . . . . 12
2.5 Presentation Format of the DS Record . . . . . . . . . . . . . 12
2.6 Transition Issues for Installed Base . . . . . . . . . . . . . 12
2.6.1 Backwards compatibility with RFC2535 and RFC1035 . . . . . . 12
2.7 KEY and corresponding DS record example . . . . . . . . . . . . 13
3 Resolver . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1 DS Example . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Resolver Cost Estimates for DS Records . . . . . . . . . . . . 15
4 Security Considerations: . . . . . . . . . . . . . . . . . . . . 15
5 IANA Considerations: . . . . . . . . . . . . . . . . . . . . . . 16
6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 16
Normative References: . . . . . . . . . . . . . . . . . . . . . . 16
Informational References . . . . . . . . . . . . . . . . . . . . 17
Author Address . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Full Copyright Statement . . . . . . . . . . . . . . . . . . . . . 17
1 Introduction
Familiarity with the DNS system [RFC1035], DNS security extensions
[RFC2535] and DNSSEC terminology [RFC3090] is important.
Experience shows that when the same data can reside in two
administratively different DNS zones, the data frequently gets out of
sync. The presence of an NS RRset in a zone anywhere other than at
the apex indicates a zone cut or delegation. The RDATA of the NS
RRset specifies the authoritative servers for the delegated or
"child" zone. Based on actual measurements, 10-30% of all delegations
on the Internet have differing NS RRsets at parent and child. There
are a number of reasons for this, including a lack of communication
between parent and child and bogus name servers being listed to meet
registry requirements.
DNSSEC [RFC2535,RFC3008,RFC3090] specifies that a child zone needs to
have its KEY RRset signed by its parent to create a verifiable chain
of KEYs. There has been some debate on where the signed KEY RRset
should reside, whether at the child [RFC2535] or at the parent. If
the KEY RRset resides at the child, maintaining the signed KEY RRset
in the child requires frequent two-way communication between the two
parties. First the child transmits the KEY RRset to the parent and
then the parent sends the signature(s) to the child. Storing the KEY
RRset at the parent was thought to simplify the communication.
DNSSEC [RFC2535] requires that the parent store a NULL KEY record for
an unsecure child zone to indicate that the child is unsecure. A NULL
KEY record is a waste: an entire signed RRset is used to communicate
effectively one bit of information--that the child is unsecure.
Chasing down NULL KEY RRsets complicates the resolution process in
many cases, because servers for both parent and child need to be
queried for the KEY RRset if the child server does not return it.
Storing the KEY RRset only in the parent zone simplifies this and
would allow the elimination of the NULL KEY RRsets entirely. For
large delegation zones the cost of NULL keys is a significant barrier
to deployment.
Prior to the restrictions imposed by RFC3445 [RFC3445], another
implication of the DNSSEC key model was that the KEY record could be
used to store public keys for other protocols in addition to DNSSEC
keys. There are a number of potential problems with this, including:
1. The KEY RRset can become quite large if many applications and
protocols store their keys at the zone apex. Possible protocols
are IPSEC, HTTP, SMTP, SSH and others that use public key
cryptography.
2. The KEY RRset may require frequent updates.
3. The probability of compromised or lost keys, which trigger
emergency key rollover procedures, increases.
4. The parent may refuse to sign KEY RRsets with non-DNSSEC zone
keys.
5. The parent may not meet the child's expectations of turnaround
time for resigning the KEY RRset.
Given these reasons, SIG@parent isn't any better than SIG/KEY@Child.
1.2 Reserved Words
The key words "MAY","MAY NOT", "MUST", "MUST NOT", "REQUIRED",
"RECOMMENDED", "SHOULD", and "SHOULD NOT" in this document are to be
interpreted as described in RFC2119.
2 Specification of the Delegation key Signer
This section defines the Delegation Signer (DS) RR type (type code
TBD) and the changes to DNS to accommodate it.
2.1 Delegation Signer Record Model
This document presents a replacement for the DNSSEC KEY record chain
of trust [RFC2535] that uses a new RR that resides only at the
parent. This record identifies the key(s) that the child uses to
self-sign its own KEY RRset.
Even though DS identifies two roles for KEYs, Key Signing Key (KSK)
and Zone Signing Key (ZSK), there is no requirement that a zone use two
different keys for these roles. It is expected that many small zones
will only use one key, while larger zones will be more likely to use
multiple keys.
The chain of trust is now established by verifying the parent KEY
RRset, the DS RRset from the parent and the KEY RRset at the child.
This is cryptographically equivalent to using just KEY records.
Communication between the parent and child is greatly reduced, since
the child only needs to notify the parent about changes in keys that
sign its apex KEY RRset. The parent is ignorant of all other keys in
the child's apex KEY RRset. Furthermore, the child maintains full
control over the apex KEY RRset and its content. The child can
maintain any policies regarding its KEY usage for DNSSEC with minimal
impact on the parent. Thus if the child wants to have frequent key
rollover for its DNS zone keys, the parent does not need to be aware
of it. The child can use one key to sign only its apex KEY RRset and
a different key to sign the other RRsets in the zone.
This model fits well with a slow roll out of DNSSEC and the islands
of security model. In this model, someone who trusts "good.example."
can preconfigure a key from "good.example." as a trusted key, and
from then on trusts any data signed by that key or that has a chain
of trust to that key. If "example." starts advertising DS records,
"good.example." does not have to change operations by suspending
self-signing. DS records can be used in configuration files to
identify trusted keys instead of KEY records. Another significant
advantage is that the amount of information stored in large
delegation zones is reduced: rather than the NULL KEY record at every
unsecure delegation demanded by RFC 2535, only secure delegations
require additional information in the form of a signed DS RRset.
The main disadvantage of this approach is that verifying a zone's KEY
RRset requires two signature verification operations instead of the
one in RFC 2535 chain of trust. There is no impact on the number of
signatures verified for other types of RRsets.
2.2 Protocol Change
All DNS servers and resolvers that support DS MUST support the OK bit
[RFC3225] and a larger message size [RFC3226]. In order for a
delegation to be considered secure the delegation MUST contain a DS
RRset. If a query contains the OK bit, a server returning a referral
for the delegation MUST include the following RRsets in the authority
section in this order:
If DS RRset is present:
parent's copy of child's NS RRset
DS and SIG(DS)
If no DS RRset is present:
parent's copy of child's NS RRset
parent's zone NXT and SIG(NXT)
This increases the size of referral messages, possibly causing some
or all glue to be omitted. If the DS or NXT RRsets with signatures do
not fit in the DNS message, the TC bit MUST be set. Additional
section processing is not changed.
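The referral rule above can be sketched in Python as follows, for a
query that carries the OK bit. The function name
referral_authority, the sizes mapping and the space budget are
assumptions of this sketch. The text requires the TC bit when the
DS or NXT RRsets with signatures do not fit; which RRsets remain in
a truncated message is not stated, so keeping only the NS RRset is
likewise an assumption.

```python
from typing import Dict, List, Tuple


def referral_authority(has_ds: bool, sizes: Dict[str, int],
                       space: int) -> Tuple[List[str], bool]:
    """Return (authority section RRsets in order, TC bit) for a referral.

    sizes maps an RRset label such as "SIG(DS)" to its encoded size in
    octets; space is the room left in the message (both hypothetical).
    """
    # Secure delegation: DS and its SIG. Otherwise: the parent's NXT.
    security = ["DS", "SIG(DS)"] if has_ds else ["NXT", "SIG(NXT)"]
    remaining = space - sizes["NS"]
    needed = sum(sizes[name] for name in security)
    if needed <= remaining:
        return ["NS"] + security, False
    # The security RRsets do not fit, so the TC bit MUST be set.
    return ["NS"], True
```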
A DS RRset accompanying a NS RRset indicates that the child zone is
secure. If a NS RRset exists without a DS RRset, the child zone is
unsecure (from the parent's point of view). DS RRsets MUST NOT appear
at non-delegation points or at a zone's apex.
Section 2.2.1 defines special considerations related to authoritative
servers responding to DS queries and replaces RFC2535 sections 2.3.4
and 3.4. Section 2.2.2 replaces RFC3008 section 2.7, and section
2.2.3 updates RFC3090.
2.2.1 RFC2535 2.3.4 and 3.4: Special Considerations at Delegation Points
DNS security views each zone as a unit of data completely under the
control of the zone owner with each entry (RRset) signed by a special
private key held by the zone manager. But the DNS protocol views the
leaf nodes in a zone that are also the apex nodes of a child zone
(i.e., delegation points) as "really" belonging to the child zone.
The corresponding domain names appear in two master files and might
have RRsets signed by both the parent and child zones' keys. A
retrieval could get a mixture of these RRsets and SIGs, especially
since one server could be serving both the zone above and below a
delegation point [RFC 2181].
Each DS RRset stored in the parent zone MUST be signed by at least
one of the parent zone's private keys. The parent zone MUST NOT
contain a KEY RRset at any delegation point. Delegations in the
parent MAY contain only the following RR types: NS, DS, NXT and SIG.
The NS RRset MUST NOT be signed. The NXT RRset is the exceptional
case: it will always appear differently and authoritatively in both
the parent and child zones if both are secure.
A secure zone MUST contain a self-signed KEY RRset at its apex. Upon
verifying the DS RRset from the parent, a resolver MAY trust any KEY
identified in the DS RRset as a valid signer of the child's apex KEY
RRset. Resolvers configured to trust one of the keys signing the KEY
RRset MAY now treat any data signed by the zone keys in the KEY RRset
as secure. In all other cases resolvers MUST consider the zone
unsecure. A DS RRset MUST NOT appear at a zone's apex.
An authoritative server queried for type DS MUST return the DS RRset
in the answer section.
2.2.1.1 Special processing for DS queries
When a server is authoritative for the parent zone at a delegation
point and receives a query for the DS record at that name, it MUST
answer based on data in the parent zone, returning either the DS RRset
or a negative answer. This is true whether or not it is also
authoritative for the child zone.
When the server is authoritative for the child zone at a delegation
point but not the parent zone, there is no natural response, since
the child zone is not authoritative for the DS record at the zone's
apex. As these queries are only expected to originate from recursive
servers which are not DS-aware, the authoritative server MUST answer
with:
RCODE: NOERROR
AA bit: set
Answer Section: Empty
Authority Section: SOA [+ SIG(SOA) + NXT + SIG(NXT)]
That is, it answers as if it is authoritative and the DS record does
not exist. DS-aware recursive servers will query the parent zone at
delegation points, so will not be affected by this.
A server that is authoritative only for the child zone and is also a
caching server MAY (if the RD bit is set in the query) perform
recursion to find the DS record at the delegation point, or MAY
return the DS record from its cache. In this case, the AA bit MUST
NOT be set in the response.
2.2.1.2 Special processing when child and an ancestor share server
Special rules are needed to permit DS RR aware servers to gracefully
interact with older caches which otherwise might falsely label a
server as lame because of the placement of the DS RR set.
Such a situation might arise when a server is authoritative for both
a zone and its grandparent, but not the parent. This sounds like an
obscure example, but it is very real. The root zone is currently
served on 13 machines, and "root-servers.net." is served on 4 of the
same 13, but "net." is served elsewhere.
When a server receives a query for (<QNAME>, DS, <QCLASS>), the
response MUST be determined from reading these rules in order:
1) If the server is authoritative for the zone that holds the DS RR
set (i.e., the zone that delegates <QNAME>, aka the "parent" zone),
the response contains the DS RR set as an authoritative answer.
2) If the server is offering recursive service and the RD bit is set
in the query, the server performs the query itself (according to the
rules for resolvers described below) and returns its findings.
3) If the server is authoritative for the zone that holds the
<QNAME>'s SOA RR set, the response is an authoritative negative
answer as described in 2.2.1.1.
4) If the server is authoritative for a zone or zones above the
QNAME, a referral to the most enclosing zone's servers is made.
5) If the server is not authoritative for any part of the QNAME, a
response indicating a lame server for QNAME is given.
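Read as a decision procedure, the five rules above might be
sketched in Python as follows. The enumeration DsResponse, the
function ds_query_response and its boolean inputs are hypothetical
names chosen for this sketch; how a server determines each input is
outside its scope.

```python
from enum import Enum


class DsResponse(Enum):
    AUTHORITATIVE_DS = 1        # rule 1: answer from the parent zone
    RECURSE = 2                 # rule 2: resolve and return the findings
    AUTHORITATIVE_NEGATIVE = 3  # rule 3: negative answer as in 2.2.1.1
    REFERRAL = 4                # rule 4: refer to the most enclosing zone
    LAME = 5                    # rule 5: not authoritative for any part


def ds_query_response(auth_for_parent: bool, recursion_and_rd: bool,
                      auth_for_qname_zone: bool,
                      auth_for_ancestor: bool) -> DsResponse:
    """Apply the rules for (<QNAME>, DS, <QCLASS>) in the stated order."""
    if auth_for_parent:
        return DsResponse.AUTHORITATIVE_DS
    if recursion_and_rd:
        return DsResponse.RECURSE
    if auth_for_qname_zone:
        return DsResponse.AUTHORITATIVE_NEGATIVE
    if auth_for_ancestor:
        return DsResponse.REFERRAL
    return DsResponse.LAME
```

A server that holds a zone and its grandparent but not the parent,
and that does not offer recursion, falls through to rule 3.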
Using these rules will require some special processing on the part of
a DS RR aware resolver. To illustrate this, an example is used.
Assume a server is authoritative for roots.example.net. and for the
root zone but not for the intervening two zones (or the intervening
two-label-deep zone). Assume that QNAME=roots.example.net., QTYPE=DS,
and QCLASS=IN.
The resolver will issue this request (assuming no cached data)
expecting a referral to a net. server. Instead, rule number 3 above
applies and a negative answer is returned by the server. The
resolver does not accept this answer as final, because the SOA RR in
the negative answer reveals the context within which the server has
answered.
A solution to this is to instruct the resolver to hunt for the
authoritative zone of the data in a brute force manner.
This can be accomplished by taking the owner name of the returned SOA
RR and stripping off left-hand labels until a successful NS
response is obtained. A successful response here means that the
answer has NS records in it. (This entertains the possibility that a
cut point can be two labels down in a zone.)
Returning to the example, the response will include a negative answer
with either the SOA RR for "roots.example.net." or "example.net."
depending on whether roots.example.net is a delegated domain. In
either case, removing the left most label of the SOA owner name will
lead to the location of the desired data.
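The label stripping search described above could be sketched in
Python as follows. The function find_enclosing_cut and the callback
has_ns_answer are hypothetical: the callback stands in for sending
an NS query for a name and reporting whether the answer contains NS
records.

```python
from typing import Callable, Optional


def find_enclosing_cut(soa_owner: str,
                       has_ns_answer: Callable[[str], bool]) -> Optional[str]:
    """Strip left-hand labels from the SOA owner name until an NS query
    for the remaining name succeeds; return that name, or None."""
    labels = [label for label in soa_owner.split(".") if label]
    # start=1 drops one label, start=2 drops two, and so on up to the root.
    for start in range(1, len(labels) + 1):
        candidate = ".".join(labels[start:]) + "."
        if has_ns_answer(candidate):
            return candidate
    return None
```

For example, with an SOA owner name of "roots.example.net." the
sketch tries "example.net.", then "net.", then the root.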
2.2.1.3 Modification on use of KEY RR in the construction of Responses
This section updates RFC2535 section 3.5 by replacing it with the
following:
A query for KEY RR MUST NOT trigger any additional section
processing. Security aware resolvers will include corresponding SIG
records in the answer section.
KEY records SHOULD NOT be added to the additional records section in
response to any query.
RFC2535 specified that KEY records be added to the additional section
when SOA or NS records were included in an answer. This was done to
reduce round trips (in the case of SOA) and to force out NULL KEYs
(in the NS case). As this document obsoletes NULL keys there is no
need for the inclusion of KEYs with NSs. Furthermore as SOAs are
included in the authority section of negative answers, including the
KEYs each time will cause redundant transfers of KEYs.
RFC2535 section 3.5 also included a rule for adding the KEY RRset to
the response for a query for A and AAAA types. As Restrict KEY
[RFC3445] eliminated use of the KEY RR by all applications, this rule
is no longer needed.
2.2.2 Signer's Name (replaces RFC3008 section 2.7)
The signer's name field of a SIG RR MUST contain the name of the zone
to which the data and signature belong. The combination of signer's
name, key tag, and algorithm MUST identify a zone key if the SIG is
to be considered material. This document defines a standard policy
for DNSSEC validation; local policy MAY override the standard policy.
There are no restrictions on the signer field of a SIG(0) record.
The combination of signer's name, key tag, and algorithm MUST
identify a key if this SIG(0) is to be processed.
2.2.3 Changes to RFC3090
A number of sections of RFC3090 need to be updated to reflect the DS
record.
2.2.3.1 RFC3090: Updates to section 1: Introduction
Most of the text is still relevant but the words ``NULL key'' are to
be replaced with ``missing DS RRset''. In section 1.3 the last three
paragraphs discuss the confusion in sections of RFC 2535 that are
replaced in section 2.2.1 above. Therefore, these paragraphs are now
obsolete.
2.2.3.2 RFC3090 section 2.1: Globally Secured
Rule 2.1.b is replaced by the following rule:
2.1.b. The KEY RRset at a zone's apex MUST be self-signed by a
private key whose public counterpart MUST appear in a zone signing
KEY RR (2.a) owned by the zone's apex and specifying a mandatory-to-
implement algorithm. This KEY RR MUST be identified by a DS RR in a
signed DS RRset in the parent zone.
If a zone cannot get its parent to advertise a DS record for it, the
child zone cannot be considered globally secured. The only exception
to this is the root zone, for which there is no parent zone.
2.2.3.3 RFC3090 section 3: Experimental Status.
The only difference between experimental status and globally secured
is the missing DS RRset in the parent zone. All locally secured zones
are experimental.
2.2.4 NULL KEY elimination
RFC3445 section 3 eliminates the top two bits in the flags field of
the KEY RR. These two bits were used to indicate NULL KEY or NO KEY.
RFC3090 defines that a zone is either secure or not; these rules
eliminate the possible need to put NULL keys in the zone apex to
indicate that the zone is not secured for an algorithm. Along with
this document, these other two eliminate all uses for the NULL KEY.
This document obsoletes NULL KEY.
2.3 Comments on Protocol Changes
Over the years there have been various discussions surrounding the
DNS delegation model, declaring it to be broken because there is no
good way to assert if a delegation exists. In the RFC2535 version of
DNSSEC, the presence of the NS bit in the NXT bit map proves there is
a delegation at this name. Something more explicit is needed and the
DS record addresses this need for secure delegations.
The DS record is a major change to DNS: it is the first resource
record that can appear only on the upper side of a delegation. Adding
it will cause interoperability problems and requires a flag day for
DNSSEC. Many old servers and resolvers MUST be upgraded to take
advantage of DS. Some old servers will be able to be authoritative
for zones with DS records but will not add the NXT or DS records to
the authority section. The same is true for caching servers; in
fact, some might even refuse to pass on the DS or NXT records.
2.4 Wire Format of the DS record
The DS (type=TBD) record contains these fields: key tag, algorithm,
digest type, and the digest of a public key KEY record that is
allowed and/or used to sign the child's apex KEY RRset. Other keys
MAY sign the child's apex KEY RRset.
                     1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            key tag            |   algorithm   |  Digest type  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                digest (length depends on type)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (SHA-1 digest is 20 bytes)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The key tag is calculated as specified in RFC2535. Algorithm MUST be
an algorithm number assigned in the range 1..251 and the algorithm
MUST be allowed to sign DNS data. The digest type is an identifier
for the digest algorithm used. The digest is calculated over the
canonical name of the delegated domain name followed by the whole
RDATA of the KEY record (all four fields).
digest = hash( canonical FQDN on KEY RR | KEY_RR_rdata)
KEY_RR_rdata = Flags | Protocol | Algorithm | Public Key
Digest type value 0 is reserved, value 1 is SHA-1, and reserving
other types requires IETF standards action. For interoperability
reasons, keeping the number of digest algorithms low is strongly
RECOMMENDED. The only reason to reserve additional digest types is
to increase security.
DS records MUST point to zone KEY records that are allowed to
authenticate DNS data. The indicated KEY record's protocol field MUST
be set to 3; flag field bit 7 MUST be set to 1. The value of other
flag bits is not significant for the purposes of this document.
The size of the DS RDATA for type 1 (SHA-1) is 24 bytes, regardless
of key size. New digest types probably will have larger digests.
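As a rough sketch under stated assumptions, the DS RDATA for digest
type 1 could be built in Python as shown below. The helper names
name_to_wire and make_ds_rdata are hypothetical. The key tag is
taken as an input rather than computed, names with escaped
characters are not handled, and flag bit 7 is read with bit 0 as
the most significant bit of the 16 bit flags field (mask 0x0100).

```python
import hashlib
import struct


def name_to_wire(fqdn: str) -> bytes:
    """Canonical wire form: lower case, uncompressed, length-prefixed."""
    wire = b""
    for label in fqdn.lower().split("."):
        if label:
            raw = label.encode("ascii")
            if len(raw) > 63:
                raise ValueError("label longer than 63 octets")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"  # the root label terminates the name


def make_ds_rdata(owner: str, key_tag: int, flags: int, protocol: int,
                  algorithm: int, public_key: bytes) -> bytes:
    """Build key tag | algorithm | digest type | SHA-1 digest."""
    if protocol != 3:
        raise ValueError("KEY protocol field MUST be 3")
    if not flags & 0x0100:
        raise ValueError("KEY flag bit 7 (zone key) MUST be set")
    if not 1 <= algorithm <= 251:
        raise ValueError("algorithm MUST be in the range 1..251")
    # KEY_RR_rdata = Flags | Protocol | Algorithm | Public Key
    key_rdata = struct.pack("!HBB", flags, protocol, algorithm) + public_key
    digest = hashlib.sha1(name_to_wire(owner) + key_rdata).digest()
    return struct.pack("!HBB", key_tag, algorithm, 1) + digest
```

The result is 24 octets long whatever the length of public_key,
which matches the size stated above.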
2.4.1 Justifications for Fields
The algorithm and key tag fields are present to allow resolvers to
quickly identify the candidate KEY records to examine. SHA-1 is a
strong cryptographic checksum: it is computationally infeasible for
an attacker to generate a KEY record that has the same SHA-1 digest.
Combining the name of the key and the key rdata as input to the
digest provides stronger assurance of the binding. Having the key
tag in the DS record adds greater assurance than the SHA-1 digest
alone, as there are now two different mapping functions.
This format allows concise representation of the keys that the child
will use, thus keeping down the size of the answer for the
delegation, reducing the probability of DNS message overflow. The
SHA-1 hash is strong enough to uniquely identify the key and is
similar to the PGP key footprint. The digest type field is present
for possible future expansion.
The DS record is well suited to listing trusted keys for islands of
security in configuration files.
2.5 Presentation Format of the DS Record
The presentation format of the DS record consists of three numbers
(key tag, algorithm and digest type) followed by the digest itself
presented in hex:
example. DS 12345 3 1 123456789abcdef67890123456789abcdef67890
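A record in this presentation format can be split into its fields with a few lines of Python. The sketch below is only an illustration: the DSRecord type and parse_ds function are hypothetical names, and the input is assumed to have exactly the form shown above (owner name, the mnemonic DS, three numbers, hex digest) with no TTL or class fields.

```python
from typing import NamedTuple


class DSRecord(NamedTuple):
    owner: str
    key_tag: int
    algorithm: int
    digest_type: int
    digest: bytes


def parse_ds(line: str) -> DSRecord:
    """Parse '<owner> DS <key tag> <algorithm> <digest type> <hex digest>'."""
    fields = line.split()
    if len(fields) < 6 or fields[1].upper() != "DS":
        raise ValueError("not a DS record in the expected form")
    owner = fields[0]
    key_tag, algorithm, digest_type = (int(f) for f in fields[2:5])
    # Allow the hex digest to be split over several whitespace-separated chunks.
    digest = bytes.fromhex("".join(fields[5:]))
    if digest_type == 1 and len(digest) != 20:
        raise ValueError("a SHA-1 digest must be 20 bytes long")
    return DSRecord(owner, key_tag, algorithm, digest_type, digest)


record = parse_ds("example. DS 12345 3 1 123456789abcdef67890123456789abcdef67890")
print(record.key_tag, record.algorithm, record.digest_type, len(record.digest))
# prints: 12345 3 1 20
```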
2.6 Transition Issues for Installed Base
No backwards compatibility with RFC2535 is provided.
RFC2535-compliant resolvers will assume that all DS-secured
delegations are locally secure. This is bad, but the DNSEXT Working
Group has determined that rather than dealing with both
RFC2535-secured zones and DS-secured zones, a rapid adoption of DS is
preferable. Thus the only option for early adopters is to upgrade to
DS as soon as possible.
2.6.1 Backwards compatibility with RFC2535 and RFC1035
This section documents how a resolver determines the type of
delegation.
RFC1035 delegation (in parent) has:
RFC1035 NS
RFC2535 adds the following two cases:
Secure RFC2535: NS + NXT + SIG(NXT)
NXT bit map contains: NS SIG NXT
Unsecure RFC2535: NS + KEY + SIG(KEY) + NXT + SIG(NXT)
NXT bit map contains: NS SIG KEY NXT
KEY must be a NULL key.
DNSSEC with DS has the following two states:
Secure DS: NS + DS + SIG(DS)
NXT bit map contains: NS SIG NXT DS
Unsecure DS: NS + NXT + SIG(NXT)
NXT bit map contains: NS SIG NXT
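The cases listed above can be summarized as a small decision function. This is a sketch under stated assumptions, not resolver code: classify_delegation and its return strings are hypothetical, the input is the set of RR types found at the delegation point in the parent zone (SIG records are left out for brevity), and the caller is assumed to have verified the relevant signatures already.

```python
def classify_delegation(rr_types: set, null_key: bool = False) -> str:
    """Classify a delegation from the RR types present in the parent zone."""
    if "NS" not in rr_types:
        return "not a delegation"
    if "DS" in rr_types:
        return "secure DS"
    if "KEY" in rr_types:
        # RFC2535 marks an unsecure delegation with a NULL key in the parent.
        return "unsecure RFC2535" if null_key else "not covered by the cases above"
    if "NXT" in rr_types:
        # NS + NXT + SIG(NXT) looks the same in both schemes.
        return "secure RFC2535 or unsecure DS (ambiguous)"
    return "RFC1035"


print(classify_delegation({"NS"}))
print(classify_delegation({"NS", "DS", "NXT"}))
print(classify_delegation({"NS", "KEY", "NXT"}, null_key=True))
print(classify_delegation({"NS", "NXT"}))
```

The last call shows why the record sets alone cannot tell a secure RFC2535 delegation from an unsecure DS delegation: both have the same NXT bit map.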
It is difficult for a resolver to determine if a delegation is secure
RFC 2535 or unsecure DS. This could be overcome by adding a flag to
the NXT bit map, but only upgraded resolvers would understand this
flag, anyway. Having both parent and child signatures for a KEY RRset
might allow old resolvers to accept a zone as secure, but the cost of
doing this for a long time is much higher than just prohibiting RFC
2535-style signatures at child zone apexes and forcing rapid
deployment of DS-enabled servers and resolvers.
RFC 2535 and DS can in theory be deployed in parallel, but this would
require resolvers to deal with RFC 2535 configurations forever. This
document obsoletes the NULL KEY in parent zones, which is a difficult
enough change to cause a flag day.
2.7 KEY and corresponding DS record example
This is an example of a KEY record and the corresponding DS record.
dskey.example. KEY 256 3 1 (
AQPwHb4UL1U9RHaU8qP+Ts5bVOU1s7fYbj2b3CCbzNdj
4+/ECd18yKiyUQqKqQFWW5T3iVc8SJOKnueJHt/Jb/wt
) ; key id = 28668
DS 28668 1 1 49FD46E6C4B45C55D4AC69CBD3CD34AC1AFE51DE
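The key id in this example can be checked with a short sketch. It assumes the RFC 2535 rule for algorithm 1 (RSA/MD5), under which the key tag is the most significant 16 of the least significant 24 bits of the public key modulus; since the modulus is the last field of the key material, those bits are the third-from-last and second-from-last octets. The function name is hypothetical, and the sketch does not cover the checksum-style key tag used for other algorithms.

```python
import base64

# Public key field of the dskey.example. KEY record above.
PUBLIC_KEY_B64 = (
    "AQPwHb4UL1U9RHaU8qP+Ts5bVOU1s7fYbj2b3CCbzNdj"
    "4+/ECd18yKiyUQqKqQFWW5T3iVc8SJOKnueJHt/Jb/wt"
)


def key_tag_algorithm_1(public_key: bytes) -> int:
    """Most significant 16 of the least significant 24 bits of the key."""
    return int.from_bytes(public_key[-3:-1], "big")


public_key = base64.b64decode(PUBLIC_KEY_B64)
print(key_tag_algorithm_1(public_key))  # prints: 28668
```

The result matches both the "key id" comment on the KEY record and the key tag field of the DS record.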
3 Resolver
3.1 DS Example
To create a chain of trust, a resolver goes from trusted KEY to DS to
KEY.
Assume the key for domain "example." is trusted. Zone "example."
contains at least the following records:
example. SOA <soa stuff>
example. NS ns.example.
example. KEY <stuff>
example. NXT NS SOA KEY SIG NXT secure.example.
example. SIG(SOA)
example. SIG(NS)
example. SIG(NXT)
example. SIG(KEY)
secure.example. NS ns1.secure.example.
secure.example. DS tag=12345 alg=3 digest_type=1 <foofoo>
secure.example. NXT NS SIG NXT DS unsecure.example.
secure.example. SIG(NXT)
secure.example. SIG(DS)
unsecure.example. NS ns1.unsecure.example.
unsecure.example. NXT NS SIG NXT example.
unsecure.example. SIG(NXT)
In zone "secure.example." the following records exist:
secure.example. SOA <soa stuff>
secure.example. NS ns1.secure.example.
secure.example. KEY <tag=12345 alg=3>
secure.example. KEY <tag=54321 alg=5>
secure.example. NXT <nxt stuff>
secure.example. SIG(KEY) <key-tag=12345 alg=3>
secure.example. SIG(SOA) <key-tag=54321 alg=5>
secure.example. SIG(NS) <key-tag=54321 alg=5>
secure.example. SIG(NXT) <key-tag=54321 alg=5>
In this example the private key for "example." signs the DS record
for "secure.example.", making that a secure delegation. The DS record
states which key is expected to sign the KEY RRset at
"secure.example.". Here "secure.example." signs its KEY RRset with
the KEY identified in the DS RRset, thus the KEY RRset is validated
and trusted.
This example has only one DS record for the child, but parents MUST
allow multiple DS records to facilitate key rollover and multiple KEY
algorithms.
The resolver determines the security status of "unsecure.example." by
examining the parent zone's NXT record for this name. The absence of
the DS bit indicates an unsecure delegation. Note the NXT record
SHOULD only be examined after verifying the corresponding signature.
3.2 Resolver Cost Estimates for DS Records
From an RFC2535 resolver's point of view, for each delegation followed
to chase down an answer, one KEY RRset has to be verified.
Additional RRsets might also need to be verified based on local
policy (e.g., the contents of the NS RRset). Once the resolver gets
to the appropriate delegation, validating the answer might require
verifying one or more signatures. A simple A record lookup requires
at least N delegations to be verified and one RRset, for a cost of
N+1. For a DS-enabled resolver, the cost is 2N+1. For an MX record,
where the target of the MX record is in the same zone as the MX
record, the costs are N+2 and 2N+2, for RFC 2535 and DS,
respectively. In the case of a negative answer the same ratios hold
true.
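The cost figures above can be tabulated with a small helper. This is only an illustrative sketch: the function name is hypothetical, and the split into "per delegation" and "answer" verifications is one reading of the paragraph (one KEY RRset per delegation for RFC 2535, a DS RRset plus a KEY RRset per delegation for DS).

```python
def verifications(n: int, record: str = "A", ds: bool = False) -> int:
    """RRset verifications for a lookup that crosses n secure delegations."""
    per_delegation = 2 if ds else 1  # DS + KEY RRsets, or the KEY RRset only
    answer_rrsets = {"A": 1, "MX": 2}[record]  # MX RRset plus the target's RRset
    return per_delegation * n + answer_rrsets


print("N  A/2535  A/DS  MX/2535  MX/DS")
for n in (1, 3, 5):
    print(n, verifications(n), verifications(n, ds=True),
          verifications(n, "MX"), verifications(n, "MX", ds=True))
```

For N = 3 this gives 4 and 7 verifications for an A record lookup, and 5 and 8 for the MX case.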
The resolver has to do an extra query to get the DS record, and this
increases the overall cost of resolving the question, but this is
never worse than chasing down NULL KEY records from the parent in
RFC2535 DNSSEC.
DS adds processing overhead on resolvers and increases the size of
delegation answers, but much less than storing signatures in the
parent zone.
4 Security Considerations:
This document proposes a change to the validation chain of KEY
records in DNSSEC. The change is not believed to reduce security in
the overall system. In RFC2535 DNSSEC, the child zone has to
communicate keys to its parent and prudent parents will require some
authentication with that transaction. The modified protocol will
require the same authentication, but allows the child to exert more
local control over its own KEY RRset.
There is a remote possibility that an attacker could generate a valid
KEY that matches all the DS fields, of a specific DS set, and thus
forge data from the child. This possibility is considered
impractical, as on average more than
2 ^ (160 - <Number of keys in DS set>)
keys would have to be generated before a match would be found.
An attacker that wants to match any DS record will have to generate
on average at least 2^80 keys.
The DS record represents a change to the DNSSEC protocol and there is
an installed base of implementations, as well as textbooks on how to
set up secure delegations. Implementations that do not understand the
DS record will not be able to follow the KEY to DS to KEY chain and
will consider all zones secured that way as unsecure.
5 IANA Considerations:
IANA needs to allocate an RR type code for DS from the standard RR
type space (type 43 requested).
IANA needs to open a new registry for the DS RR type for digest
algorithms. Defined types are:
0 is Reserved,
1 is SHA-1.
Adding new reservations requires IETF standards action.
6 Acknowledgments
Over the last few years a number of people have contributed ideas
that are captured in this document. The core idea of using one key to
sign only the KEY RRset comes from discussions with Bill Manning and
Perry Metzger on how to put in a single root key in all resolvers.
Alexis Yushin, Brian Wellington, Sam Weiler, Paul Vixie, Jakob
Schlyter, Scott Rose, Edward Lewis, Lars-Johan Liman, Matt Larson,
Mark Kosters, Dan Massey, Olaf Kolman, Phillip Hallam-Baker, Miek
Gieben, Havard Eidnes, Donald Eastlake 3rd., Randy Bush, David
Blacka, Steve Bellovin, Rob Austein, Derek Atkins, Roy Arends, Mark
Andrews, Harald Alvestrand, and others have provided useful comments.
Normative References:
[RFC1035] P. Mockapetris, ``Domain Names - Implementation and
Specification'', STD 13, RFC 1035, November 1987.
[RFC2535] D. Eastlake, ``Domain Name System Security Extensions'', RFC
2535, March 1999.
[RFC3008] B. Wellington, ``Domain Name System Security (DNSSEC) Signing
Authority'', RFC 3008, November 2000.
[RFC3090] E. Lewis, ``DNS Security Extension Clarification on Zone
Status'', RFC 3090, March 2001.
[RFC3225] D. Conrad, ``Indicating Resolver Support of DNSSEC'', RFC
3225, December 2001.
[RFC3445] D. Massey, S. Rose, ``Limiting the scope of the KEY Resource
Record (RR)'', RFC 3445, December 2002.
Informational References
[RFC2181] R. Elz, R. Bush, ``Clarifications to the DNS Specification'',
RFC 2181, July 1997.
[RFC3226] O. Gudmundsson, ``DNSSEC and IPv6 A6 aware server/resolver
message size requirements'', RFC 3226, December 2001.
Author Address
Olafur Gudmundsson
3821 Village Park Drive
Chevy Chase, MD, 20815
USA
<ogud@ogud.com>
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


@ -1,219 +0,0 @@
Internet Draft Naomasa Maruyama
draft-ietf-idn-aceid-02.txt Yoshiro Yoneya
Jun 19, 2001 JPNIC
Expires Dec 19, 2001
Proposal for a determining process of ACE identifier
Status of this memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
In IETF IDN WG, various kinds of ASCII Compatible Encodings,
hereafter abbreviated as "ACE", are discussed as methods for realizing
multilingual domain names (hereafter referred to as "MDN"). Each ACE
uses a prefix or a suffix as an identifier in order for MDNs to fit
within the existing ASCII domain name space. In other words,
acceptance of an ACE proposal as an Internet standard means that the
existing ASCII domain name space will be partitioned, in order to
accommodate MDN space.
This document describes possible trouble in the standardization
process of ACE, and proposes a solution for it.
1. Present situation and concern
At present, some specifications relating to MDN specify their own
ACE identifiers. In these drafts, multilingual domain names encoded
into ASCII character strings, with the ACE identifiers at their heads
or tails, are merely ASCII character strings. It is therefore
possible, accidentally or intentionally, to register a domain name
that is not an MDN but has the designated ACE identifier string.
If this kind of registration takes place, there is no warranty
that the domain name will be consistent with MDN semantics.
Furthermore, there is no warranty that the name, interpreted as an
MDN, will comply with the registration policies of the registry, when
the ACE identifier proposal is finally accepted as an Internet
standard. This might cause problems with name disputes and/or
revocations.
Therefore, the current situation, which lets independent ACE proposal
authors arbitrarily select an ACE identifier and hence permits domain
name registrants to register such names, may hinder deployment of MDN
technology.
2. Selecting ACE identifiers
In order to maintain a smooth standardization process for ACE,
this document proposes a strategy for selecting and reserving ACE
identifiers and a method for assigning them.
2.1 The ACE identifier candidates and tentative suspension of
registering relevant domain names
All strings starting with a combination of two alpha-numericals,
followed by two hyphens, are defined to be ACE prefix identifier
candidates. All strings starting with two hyphens followed by two
alpha-numericals are defined as ACE suffix identifier candidates. ACE
prefix identifier candidates and ACE suffix identifier candidates are
collectively called ACE identifier candidates.
All the domain name registries recognized by ICANN SHOULD
tentatively suspend registration of domain names which have an ACE
prefix identifier candidate at the head of at least one label of the
domain name and those which have an ACE suffix identifier candidate at
the tail of at least one label of the name. These domain names are
collectively called "relevant domain names".
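A registry could screen for relevant domain names with a check along the following lines. This is a sketch under stated assumptions: "alpha-numericals" are taken to mean ASCII letters and digits compared case-insensitively, and the names PREFIX_CANDIDATE, SUFFIX_CANDIDATE and is_relevant are illustrative only.

```python
import re

# Two alpha-numericals followed by two hyphens at the head of a label.
PREFIX_CANDIDATE = re.compile(r"^[a-z0-9]{2}--")
# Two hyphens followed by two alpha-numericals at the tail of a label.
SUFFIX_CANDIDATE = re.compile(r"--[a-z0-9]{2}$")


def is_relevant(domain_name: str) -> bool:
    """True if at least one label carries an ACE identifier candidate."""
    labels = domain_name.lower().rstrip(".").split(".")
    return any(
        bool(PREFIX_CANDIDATE.search(label) or SUFFIX_CANDIDATE.search(label))
        for label in labels
    )


print(is_relevant("bq--example.jp"))   # True: a label starts with "bq--"
print(is_relevant("example--jp.com"))  # True: a label ends with "--jp"
print(is_relevant("my-example.org"))   # False
```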
This suspension should be continued until September 1, 2001
00:00:00 UTC.
2.2 Survey of relevant domain name registration
All registries recognized by ICANN SHOULD conduct a survey about
relevant domain names registered in their zone, and report, no later
than August 11, 2001 00:00:00 UTC, all of the ACE identifier
candidates which are used by relevant domain names.
2.3 Selection of ACE identifiers and permanent blocking of
relevant domain names
The IDN WG or other organ of IETF or ICANN MUST summarize the
reports and list ACE identifier candidates that are not reported to be
used in registered domain names by August 18, 2001 00:00:00 UTC, and
select ten to twenty ACE prefix identifier candidates and ten to
twenty ACE suffix identifier candidates for ACE identifiers. Among
these twenty to forty ACE identifiers, one prefix identifier and one
suffix identifier will be used for experiments. The others will be
used one by one as the ACE standard evolves.
The list of ACE identifiers will be sent to IANA, and will be
maintained by IANA from August 25, 2001 00:00:00 UTC. Domain names
relevant to these identifiers SHOULD NOT be registered in any DNS
zone, except for registration of multilingual domain names compliant
with one of the future IDN standards. IANA will notify all ICANN
recognized registries of this new restriction on the domain name
space immediately after it receives the list.
2.4 Blocking of registration for relevant domain names
Domain names relevant to ACE identifiers selected by the procedure
described in section 2.3 SHOULD NOT be registered in any zone of ICANN
recognized registries, except for registration of multilingual domain
names compliant with one of the future IDN standards. All ICANN
recognized registries SHOULD implement this restriction no later than
September 1, 2001 00:00:00 UTC.
Registration for domain names relevant to ACE identifier
candidates, tentatively suspended by 2.1, but not relevant to ACE
identifiers selected by section 2.3 MAY be reopened from September 1,
2001 00:00:00 UTC.
3. Use of an ACE identifier in writing an ACE proposal
When writing an ACE proposal using an ACE identifier, the author
SHOULD either describe the ACE identifier as "to be decided" and leave
it to the discretion of the IDN WG or other organ of IETF or ICANN, or
use either of the experimental ACE identifiers defined in section 2.3,
with a unique version number added after or before the prefix or
suffix.
If a proposal is validated and published as an Internet Draft, the
IDN WG or other organ of IETF or ICANN MUST replace the "to be
decided" part with an experimental identifier with a unique version
number added after or before the prefix or the suffix.
4. Determination of ACE identifier
When an Internet Draft relating to ACE is accepted as an Internet
standard and becomes an RFC, IDN WG or other organ of IETF or ICANN
MUST replace the experimental ACE identifier, augmented by the version
number, with one of the ACE identifiers.
5. Security considerations
None in particular.
6. Changes from the previous version
We excluded suffixes of one hyphen followed by three alpha-
numericals from the candidates. This is because we found that, as of
Nov. 29, 2000, there were 23,921 domain names registered in the .JP
space relevant to these suffixes. This was more than 10% of 227,852
total registrations in the JPNIC database at the moment, and hence we
felt these suffixes are not good candidates.
In addition to this and some minor linguistic corrections, we
changed "The IDN WG" in section 2.3 to "The IDN WG or other organ of
IETF or ICANN".
7. References
[IDNREQ] Z Wenzel, J Seng, "Requirements of Internationalized Domain
Names", draft-ietf-idn-requirements-03.txt, Jun 2000.
[RACE] P Hoffman, "RACE: Row-based ASCII Compatible Encoding for
IDN", draft-ietf-idn-race-02.txt, Oct 2000.
[BRACE] A Costello, "BRACE: Bi-mode Row-based ASCII-Compatible
Encoding for IDN", draft-ietf-idn-brace-00.txt, Sep 2000.
[LACE] P Hoffman, "LACE: Length-based ASCII Compatible Encoding for
IDN", draft-ietf-idn-lace-00.txt, Nov 2000.
[VERSION] M Blanchet, "Handling versions of internationalized domain
names protocols", draft-ietf-idn-version-00.txt, Nov 2000.
8. Acknowledgements
We would like to express our hearty thanks to members of JPNIC IDN
Task Force for valuable discussions about this issue. We also would
like to express our appreciation to Mr. Dave Crocker for checking and
correcting the preliminary version of this draft.
9. Author's Address
Naomasa Maruyama
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
maruyama@nic.ad.jp
Yoshiro Yoneya
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
yone@nic.ad.jp


@ -1,454 +0,0 @@
Internet Draft James SENG
<draft-ietf-idn-cjk-01.txt> Yoshiro YONEYA
11th Apr 2001 Kenny HUANG
Expires 11 Oct 2001 KIM Kyongsok
Han Ideograph (CJK) for Internationalized Domain Names
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
During the development of Internationalized Domain Names (IDN), it was
discovered that there is a substantial lack of information about, and
misunderstanding of, Han ideographs and their folding mechanism.
This document attempts to address some of the issues in doing Han
folding with respect to IDN. Hopefully, it will dispel some of the
common misunderstandings of this problem and open a discussion of the
issues with Han ideographs and their folding mechanism.
This document addresses a problem very specific to IDN and thus is not
meant as a reference for generic Han folding. Generic Han folding is
much more complicated and certainly beyond the scope of this document.
However, this document may be applicable to other areas that are
related to names, e.g. Common Name Resolution Protocol [CNRP].
1. Definition and convention
Characters mentioned in this document are identified by their position
or code point in the Unicode character set [UCS]. The notation U+12AB,
for example, indicates the character at the position 12AB (hexadecimal)
in the [UCS]. It is strongly recommended that a [UCS] table is available
for reference for the ideograph described.
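For readers without a [UCS] table at hand, the U+XXXX notation can be converted to and from actual characters programmatically. The following is a minimal sketch; the function names are illustrative and are not part of any standard.

```python
import re

# Matches the U+12AB style notation used throughout this document.
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def decode_notation(text: str) -> str:
    """Turn every U+XXXX token in the text into the character it names."""
    return "".join(chr(int(digits, 16)) for digits in CODE_POINT.findall(text))


def encode_notation(chars: str) -> str:
    """Write a string as space-separated U+XXXX tokens."""
    return " ".join(f"U+{ord(ch):04X}" for ch in chars)


kanji = decode_notation("{U+6F22 U+5B57}")
print(len(kanji))              # prints: 2
print(encode_notation(kanji))  # prints: U+6F22 U+5B57
```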
Han ideographs are defined as the Chinese ideographs from U+3400 to
U+9FFF, commonly known as CJK Unified Ideographs. This covers Chinese
'hanzi' {U+6F22 U+5B57/U+6C49 U+5B57}, Japanese 'kanji'
{U+6F22 U+5B57} and Korean 'hanja' {U+6F22 U+5B57/U+D55C U+C790}.
Additional Han ideographs will appear in other locations (not
necessarily in plane 0) in the future.
Conversion between ideographs can be done using four different
approaches: code-based substitution, character-based substitution,
lexicon-based substitution and context-based substitution. Han folding
refers only to code-based substitution, similar to case mapping of
alphabetic characters.
2. Introduction
Traditionally, domain names have been case insensitive (as defined in
[RFC1035] Section 2.3.3). While this is not a problem when domain names
are restricted to English letters and digits, it becomes a
serious problem for IDN. An important criterion for having a robust IDN
is to have good normalization and canonicalization forms. This is to
ensure that domain name duplications are kept to a minimum.
Fortunately, the Unicode Consortium is developing technical reports on
canonicalization [UTR21] and normalization [UTR15]. Hence, it becomes
simple for IDN to build upon the work of Unicode and use these
references.
Unfortunately, both [UTR15] and [UTR21] are limited in scope and do not
address many other scripts. In particular, Han ideographs are not
discussed in detail in these documents, and most experts are quick to
point out that solving this problem is technically impossible.
2.1 Han ideographs
While there are many forms or writing styles for Chinese characters, the
most commonly used, 'zhengti' {U+6B63 U+4F53/U+6B63 U+9AD4}, represents
Chinese ideographs by radicals (U+2E80-U+2FDF) that are composed of
simple strokes.
When the Unicode Consortium started work on the Universal Character Set,
it was suggested that Hanzi, Kanji and Hanja ideographs should be unified
into a single code space. This resulted in the CJK Unification, whereby
27,786 Han ideographs are allocated in the U+3400-U+9FFF and
U+F900-U+FAFF ranges. Another 41,000 Han ideographs will be added to
Plane 2.
Ideographs are common in China, Korea and Japan but as ideographs spread
and evolve, the form of the ideographs sometimes differs slightly from
country to country. For example, the word 'villa' {U+838A} 'zhuang' in
Chinese, in Japanese is 'sou' {U+8358}. These are given different code
points in Unicode.
3. Chinese (Hanzi)
Chinese ideographs or hanzi {U+6F22 U+5B57/U+6C49 U+5B57} originated
from pictographs. They are 'pictures' which evolved into ideographs
over several thousand years. For instance, the ideograph for "hill"
{U+5C71} still bears some resemblance to the three peaks of a hill.
Not all ideographs are pictographs. There are other classifications such
as compound ideographs, phonetic ideographs etc. For example,
'endurance' {U+5FCD} is a pierced 'knife' {U+5200} above the 'heart'
{U+5FC3}, or as a Chinese saying goes, 'endurance is like having a
pierced knife in your heart'.
Hence, almost every Han ideograph carries some meaning by itself, which
is very different from most other scripts. This causes some confusion
that Han folding is a form of lexicon-based substitution.
Chinese ideographs underwent a major change in the 1950s after the
establishment of the People's Republic of China. A committee on Language
Reform was established in China whose activities included simplification
of Chinese ideographs. Simplified Chinese (SC) is used in China
and Singapore, and Traditional Chinese (TC) in Taiwan, Hong Kong PRC,
Macau PRC, and most other overseas Chinese communities.
The process is to take complex ideographs and simplify them. The main
purpose is to make them easier to remember and write and thus to raise
the literacy of the population.
For example, 'lightning' TC {U+96FB} becomes SC {U+7535} (the 'rain'
{U+96E8} part of the TC is dropped). In many cases, the SC bears no
resemblance to the original traditional form, e.g. 'dragon' TC
{U+9F8D} SC {U+9F99}. Two different TC may also have the same SC since
it means fewer ideographs to learn, e.g. SC {U+53D1} can be {U+767C} or
{U+9AEE} depending on semantics. The official 'Comprehensive List of
Simplified Characters', last published in 1986, lists 2244 SC
[ZONGBIAO].
Therefore, the process of SC-to-TC is very complicated. It is not
possible to do it accurately without considering the semantics of the
phrase.
On the other hand, TC-to-SC is much simpler, although different TCs may
map to one single SC. While Unicode does not handle TC & SC, the
informal [UNIHAN] document lists 2145 TC and their equivalent SC
mappings. However, because that document is informal and not part of the
Unicode standard, it is incomplete and has mistakes in the code points.
Hence, precise tables for TC-to-SC conversion have not been fully laid
out.
In domain names, we are particularly interested in equivalence
comparison of the names, and not in converting SC-to-TC. Therefore, for
this purpose, equivalence matching can be done by applying TC-to-SC
folding prior to comparison, similar to lower-casing English
strings before comparing them, e.g. 'taiwan' SC {U+53F0 U+6E7E} will
match TC {U+81FA U+7063} or TC {U+53F0 U+7063}.
The side effect of this method is that comparing SC {U+53D1} to TC
{U+767C} or TC {U+9AEE} will both be positive. This implies that
'hair' SC {U+5934 U+53D1} will match TC {U+982D U+9AEE}. It will
also match TC {U+982D U+767C}, which does not have any meaning in
Chinese.
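The fold-then-compare approach, including its side effect, can be sketched as follows. The table is a tiny illustrative excerpt and not a real TC-to-SC mapping: its entries are the Unicode chart code points of the ideographs discussed in this section, and the names TC_TO_SC, fold and equivalent are hypothetical.

```python
# Traditional code point -> simplified code point (illustrative excerpt only).
TC_TO_SC = {
    0x96FB: 0x7535,  # 'lightning'
    0x9F8D: 0x9F99,  # 'dragon'
    0x767C: 0x53D1,  # one of the two TC forms behind SC U+53D1
    0x9AEE: 0x53D1,  # the other TC form, used in 'hair'
    0x982D: 0x5934,  # first ideograph of 'hair'
    0x81FA: 0x53F0,  # first ideograph of 'taiwan'
    0x7063: 0x6E7E,  # second ideograph of 'taiwan'
}


def fold(name: str) -> str:
    """Replace every TC ideograph found in the table by its SC form."""
    return name.translate(TC_TO_SC)


def equivalent(a: str, b: str) -> bool:
    """Compare two names after TC-to-SC folding."""
    return fold(a) == fold(b)


print(equivalent("\u53f0\u6e7e", "\u81fa\u7063"))  # SC vs. TC 'taiwan': True
print(equivalent("\u5934\u53d1", "\u982d\u9aee"))  # SC vs. TC 'hair': True
print(equivalent("\u982d\u9aee", "\u982d\u767c"))  # side effect: True
```

The last comparison succeeds even though the second string is not a meaningful word, because both TC ideographs fold to the same SC code point.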
It should also be noted that SC are not used together with TC. Hence,
'hair' is either written as SC {U+5934 U+53D1} or TC {U+982D U+9AEE}
but (almost) never {U+5934 U+9AEE} or {U+982D U+53D1}. So the problem
of SC and TC may not be too serious for IDN.
Unfortunately, when it comes to names in Chinese, in places where SC are
used (i.e. Singapore and China), traditional and simplified ideographs
are sometimes mixed within a single name for artistic reasons. Some
people even 'create' ideographs for their names.
[Need to add a section on Bopomofo U+3118 to U+312A in future draft]
4. Korean (Hanja and Hangeul)
Korea was one of the first cultures to import Chinese ideographs into
its language as a written form. These Korean ideographs are known as
'hanja' {U+6F22 U+5B57/U+D55C U+C790} and they were widely used until
recently, when 'hangeul' {U+D55C U+AE00} became more popular.
Hangeul {U+D55C U+AE00} is a systematic script designed by a 15th century
ruler and linguistic expert, King Sejong {U+4E16 U+5B97}. It is based
on the pronunciation of the Korean language, hanmal. A Korean syllable
is composed of 'jamo' {U+5B57 U+6BCD/U+C790 U+BAA8} elements that
represent different sounds. Hence, unlike Han ideographs, a hangeul
syllable does not have any meaning by itself.
Each hanja ideograph can be represented by a hangeul syllable. For
example, 'samsung' hanja {U+4E09 U+661F} hangeul {U+C0BC U+C131}. Note
that {U+4E09} is pronounced as 'sa-ah-am' or in jamo {U+3145} {U+314F}
{U+3141}, which gives hangeul {U+C0BC}. While Jamo decompositions are
described in [UTR15] in Form D decomposition, this document also
suggests another hangeul canonical decomposition in Appendix A to
accommodate both modern and old hangeul.
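The Form D decomposition mentioned above can be reproduced with the standard arithmetic for precomposed hangeul syllables. The sketch below illustrates that Unicode algorithm, not the alternative decomposition planned for Appendix A; the function name is hypothetical. Note that Form D yields conjoining jamo (U+1100 block), which are distinct code points from the compatibility jamo U+3145, U+314F and U+3141 cited in the text.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose_syllable(syllable: str) -> str:
    """Arithmetic decomposition of one precomposed hangeul syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed hangeul syllable")
    lead = L_BASE + index // (V_COUNT * T_COUNT)
    vowel = V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT
    trail = index % T_COUNT  # 0 means "no trailing consonant"
    jamo = chr(lead) + chr(vowel)
    if trail:
        jamo += chr(T_BASE + trail)
    return jamo


sam = "\uC0BC"  # first syllable of 'samsung'
jamo = decompose_syllable(sam)
print(" ".join(f"U+{ord(ch):04X}" for ch in jamo))  # prints: U+1109 U+1161 U+11B7
print(jamo == unicodedata.normalize("NFD", sam))     # prints: True
```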
[Need to fill up Appendix A when information is more complete]
Most hanja characters have only one pronunciation. However, the
pronunciation of some hanja differs according to orthography (the same
is true for Chinese & Japanese) or the position in a word, which makes
this more complex. And of course, conversion of hangeul back to hanja is
impossible by code substitution without consideration of semantics.
Koreans also invented their own ideographs, which are called 'gugja'
{U+56FD U+5B57/U+AD6D U+C790}.
5. Japanese (Kanji, Hiragana, Katakana)
Japanese has adopted Chinese ideographs from the Koreans and the Chinese
since the 5th century. Chinese ideographs in Japanese are known as 'kanji'
{U+6F22 U+5B57}. The Japanese also developed their own syllabaries,
hiragana {U+5E73 U+4EEE U+540D} (U+3040-U+309F) and katakana {U+7247
U+4EEE U+540D} (U+30A0-U+30FF); both are derived from kanji that have
the same pronunciation. Hiragana is a simplified cursive form, for
example, 'a' {U+3042} was derived from 'an' {U+5B89}. Katakana is a
simplified partial form, for example, 'a' {U+30A2} was derived from 'a'
{U+963F}. However, kanji remain very much integrated within the
Japanese language.
Japanese also invented ideographs known as 'kokuji' {U+56FD U+5B57}. For
example, 'iwashi' {U+9C2F} is a Japanese kokuji ideograph. Kokuji are
invented according to Han ligature rules. For example, 'touge' "mountain
pass" {U+5CE0} is a conjunction of meaning with 'yama' "mountain"
{U+5C71} + 'ue' "up" {U+4E0A} + 'shita' "down" {U+4E0B}.
Japanese is also a phonetic language, i.e. the script itself is based on
pronunciation. Each hiragana corresponds to one pronunciation, and 48
hiragana form the basis of the Japanese language, including the less
commonly used 'we' {U+3091}. Furthermore, hiragana has 35 more forms to
represent voiced sounds, P-sounds and double consonants. For example, 'ga'
{U+304C} is the voiced sound of 'ka' {U+304B}. Katakana is a mirror of
hiragana with a few more forms, and it is used to integrate foreign
words or phrases into Japanese, to emphasize words or phrases even
in Japanese, or to represent onomatopoeia. For example, 'hamburger',
pronounced as 'han-baa-gaa' in Japanese, is written as {U+30CF U+30F3
U+30D0 U+30FC U+30AC U+30FC} instead of {U+306F U+3093 U+3070 U+3041
U+304C U+3041} because it is a foreign word.
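The mirrored layout of the two syllabaries is visible in the code points: each hiragana from U+3041 to U+3096 has its katakana counterpart exactly 0x60 positions later. The sketch below only illustrates that layout; this document does not propose kana folding for IDN, and the names used are hypothetical.

```python
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
KANA_OFFSET = 0x60  # distance from a hiragana to its katakana counterpart

HIRAGANA_TO_KATAKANA = {
    cp: cp + KANA_OFFSET for cp in range(HIRAGANA_FIRST, HIRAGANA_LAST + 1)
}


def to_katakana(text: str) -> str:
    """Map every hiragana in the text to the mirrored katakana."""
    return text.translate(HIRAGANA_TO_KATAKANA)


def notation(chars: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in chars)


print(notation(to_katakana("\u3042")))  # prints: U+30A2
print(notation(to_katakana("\u304c")))  # prints: U+30AC
print(notation(to_katakana("\u306f\u3093\u3070\u3041\u304c\u3041")))
# prints: U+30CF U+30F3 U+30D0 U+30A1 U+30AC U+30A1
```

The last line differs from the katakana spelling of 'hamburger' given above, which uses the prolonged sound mark U+30FC, so a pure code point offset does not reproduce conventional spelling.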
If Japanese used hiragana and katakana only, written Japanese would
obviously be very long. Hence, kanji are used
when referring to nouns or verbs. Each kanji corresponds to one or more
hiragana characters. For example, 'japan', pronounced as 'nippon'
{U+306B U+3063 U+307D U+3093}, is written as {U+65E5 U+672C} instead.
Hiragana, like Korean jamo, have no meaning by themselves. Also, a kanji
can take on different pronunciations (which means different hiragana)
depending on where and how it is used in the sentence. For example, 'sky'
{U+7A7A} can be pronounced as {U+305D U+3089} or {U+30BD U+30E9}.
Hence, a code substitution between hiragana and kanji is impractical.
On the other hand, there are kanji that have the same meaning and the
same pronunciation and are thus equivalent. For example, 'river' "kawa"
can be either {U+5DDD} or {U+6CB3}. The only difference between the two
ideographs is that they signify the 'size of the river' (the latter is
a bigger river).
Japanese also reduced complex Chinese ideographs to a simplified form.
For example, 'both' {U+5169} was simplified to {U+4E21}. Note that Chinese
simplified it to {U+4E24} instead. However, traditional Japanese kanji
are seldom used nowadays except in documenting old historical texts, where
they are treated as different from the more commonly used simplified
forms, or in proper nouns such as a person's name or a trademark.
Hence, Han folding here is not recommended.
4. Vietnamese
While Vietnamese also adopted Chinese ideographs ('chu han') and created
its own ideographs ('chu nom'), both have now been replaced by the romanized
'quoc ngu'. Hence, this document does not attempt to address any
issues with 'chu han' or 'chu nom'.
5. zVariant
Unicode has a three-dimensional conceptual model of ideograph
unification. The three dimensions are semantic (X-axis - meaning,
function), abstract shape (Y-axis - general form) and actual shape
(Z-axis - instantiated, typefaced).
When two ideographs have similar etymology but are given two different
code points in Unicode, they are known as zVariant ideographs, i.e. they
belong to the same 'Z' axis. For example, 'villa' {U+838A} and {U+8358}.
6. Ideographic Description
In Unicode v3.0, ideographic description characters (U+2FF0-U+2FFB) were
introduced, allowing Han ideographs to be constructed from radicals
(U+2E80-U+2FD5) and Han ideographs (U+3400-U+9FFF).
The intention of this description method is to allow ideographs that are
not defined by Unicode to be described. Hence, there is no guarantee that
these ideographs can be displayed properly. In addition, this method is
not deterministic: it allows the same ideograph to be represented by
different sequences.
For example, 'zong' {U+9B03} (for the sake of discussion, we use
an ideograph which is already in Unicode) can be decomposed to U+2FF1
U+9ADF U+5B97 using descriptive code points and Unified Ideographs.
U+9ADF can also be decomposed as U+2FF0 U+2ED2 U+2F3A and U+5B97 as
U+2FF5 U+2F28 U+2F70. In addition, U+9ADF is equivalent to U+2FBD.
Hence, if we were to use descriptive code points and radicals only,
we could get U+2FF1 U+2FBD U+2FF5 U+2F28 U+2F70 or U+2FF1 U+2FF0 U+2ED2
U+2F3A U+2FF5 U+2F28 U+2F70.
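The non-determinism described above can be illustrated with a minimal
sketch in C. The two sequences are copied from the 'zong' example; the
helper same_sequence() is a hypothetical name introduced here only for
illustration. It performs the exact code point comparison that a protocol
such as IDN would need, and the two descriptions of the same ideograph do
not compare equal.

```c
#include <assert.h>
#include <stddef.h>

/* Two ideographic description sequences for 'zong' {U+9B03}, */
/* taken from the example in the text.                        */
static const unsigned long ids_short[] = {
  0x2FF1, 0x2FBD, 0x2FF5, 0x2F28, 0x2F70
};
static const unsigned long ids_long[] = {
  0x2FF1, 0x2FF0, 0x2ED2, 0x2F3A, 0x2FF5, 0x2F28, 0x2F70
};

/* same_sequence() is a hypothetical helper: it returns 1 if the */
/* two code point sequences are identical, 0 otherwise.          */
static int same_sequence(const unsigned long *a, size_t na,
                         const unsigned long *b, size_t nb)
{
  size_t i;
  if (na != nb) return 0;
  for (i = 0; i < na; ++i) {
    if (a[i] != b[i]) return 0;
  }
  return 1;
}
```

Comparing ids_short (5 code points) with ids_long (7 code points) yields
0 even though both describe the same ideograph.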
In addition, certain radicals have been simplified and are thus, in some
contexts, equivalent. For example, the radical for 'bird' can be either
U+2EE6 or U+2FC3.
Hence, until there is a deterministic well-defined rule for
ideographic description, ideographs formed by this method are not
recommended for use in domain names.
It should be noted that the Unicode Consortium never intended the
ideographic description to be used in protocols like IDN where exact
comparison must be done. But it is certainly desirable to have this
feature, as it is common for Chinese to invent ideographs for names by
adding radicals to or removing them from standard ideographs.
7. Mechanism
The implicit proposal in this document is that CJKV ideographs may or
may not be "folded" for the purposes of comparison of domain names.
But if folding is required, there are three different ways that this
folding could be done.
a) Folding by DNS clients, or by user agents
b) Folding by DNS servers
c) Folding by Domain Name registration services for the purposes of
preventing confusing allocations of CJKV Domain Names which would,
if transcoded, be the same
Before we can comment much further, we need to know which use is
planned.
The third use is important and should be put in place. Alternatively,
this problem can be reduced by representing non-ASCII characters in
domain names or other parts of URLs using hex-escaped character
references in HTML pages.
To characterize Han characters as ideographs or pictograms is
inadequate, because most Han ideographs have both a phonetic and
a semantic element. Indeed, this is enough to characterize Chinese
writing as phonetic, though it is other things as well. Thus, it is
difficult to comment on whether folding is useful for Chinese or not.
The first use has the problem that lightweight devices do not have
enough room to fit a Unicode X-axis mapping table.
The second use has the problem that introducing mapping will limit the
performance of DNS servers. Alphabetic case mapping can be performed
using a single logical AND instruction; CJKV character folding requires
a lookup table.
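The cost difference described above can be sketched in C. This is only
an illustration under stated assumptions: fold_ascii_letter(), fold_cjkv()
and fold_table are hypothetical names, and the two-entry table reuses the
'both' example from this document, with U+5169 chosen arbitrarily as the
folding target. A real table would be far larger, which is exactly the
problem for servers and lightweight clients.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII letters fold to uppercase by clearing bit 0x20 with one AND. */
/* This is valid only for the code points of A-Z and a-z.             */
static unsigned long fold_ascii_letter(unsigned long c)
{
  return c & ~0x20UL;
}

struct fold_entry { unsigned long variant, folded; };

/* Hypothetical folding table, sorted by variant.  The entries reuse */
/* the 'both' example: U+4E21 (Japanese) and U+4E24 (Chinese) are    */
/* both simplifications of U+5169.                                   */
static const struct fold_entry fold_table[] = {
  { 0x4E21, 0x5169 },
  { 0x4E24, 0x5169 }
};

/* CJKV folding needs a table search (binary search here) per */
/* character; unknown code points fold to themselves.         */
static unsigned long fold_cjkv(unsigned long c)
{
  size_t lo = 0, hi = sizeof fold_table / sizeof fold_table[0];
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (fold_table[mid].variant == c) return fold_table[mid].folded;
    if (fold_table[mid].variant < c) lo = mid + 1;
    else hi = mid;
  }
  return c;
}
```

The ASCII case costs one instruction per character, while the CJKV case
costs a search whose table must be stored and kept up to date.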
In alphabetic scripts, there is also a requirement to fold Latin, Greek,
Cyrillic, Hebrew and Arabic together. There may be a stronger
requirement for CJKV characters.
Note also that because modern operating systems are Unicode-based and
have network-downloadable IMEs, "interoperability" is becoming less
equivalent to "use BIG5 characters only" or "use GB2312 characters only"
or "use Shift-JIS characters only".
If conservative safety is really required, then
1) find the x-axis characters which are available in all major CJK
character sets used on the internet;
2) only allow variants of those in domain names;
3) when one variant is used, no other can be allocated. So comparisons
are made on x-axis characters, but the licensee of that domain name
can pick which y or z variants they wish to use.
Acknowledgement
The editor gratefully acknowledges the contributions of:
Paul Hoffman <phoffman@imc.org>
Jiang Mingliang <jiang@i-DNS.net>
Dongman Lee <dlee@icu.ac.kr>
Karlsson Kent <keka@im.se>
Author(s)
James SENG
i-DNS.net International Pte Ltd.
8 Temasek Boulevard
Suntec Tower 3 #24-02
Singapore 038988
Email: James@Seng.cc
Tel: +65 2468208
Yoshiro YONEYA
NTT Software Corporation
Shinagawa IntercityBldg., B-13F
2-15-2 Kohnan, Minato-ku Tokyo 108-6113 Japan
Email: yone@po.ntts.co.jp
Tel: +81-3-5782-7291
Kenny HUANG
Geotempo International Ltd; TWNIC
3F, No 16 Kang Hwa Street, Nei Hu
Taipei 114, Taiwan
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510
KIM Kyongsok/GIM Gyeongseog
References
[UNISTD3] The Unicode Standard v3.0. Unicode Consortium.
[UCS] ISBN 0-201-61633-5
[IDN] "IETF Internationalized Domain Names Working Group",
idn@ops.ietf.org, James Seng, Marc Blanchet
[CNRP] "Common Name Resolution Protocol",
cnrp-ietf@lists.netsol.com, Leslie Daigle
[CJKV] CJKV Information Processing ISBN 1-56592-224-7
[C2C] The pitfalls and Complexities of Chinese to Chinese
Conversion. http://www.basistech.com/articles/C2C.html,
Jack Halpern, Jouni Kerman
[KANJIDIC] Sanseido's Unicode Kanji Information Dictionary
ISBN 4-385-13690-4
[UNICHART] Unicode chart http://charts.unicode.org/
[ZONGBIAO] Simplified Characters Standard Chart 2nd Edition, 1986
[UNIHAN] Unicode Han Database, Unicode Consortium
ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt
[ISO11941] ISO TS 11941: Information and documentation -
Transliteration of Korean script into Latin characters.
Technical Specification 11941. First edition. 1996-12-31.
ISO (International Organization for Standardization).
[KimK 1990] "A New Proposal for a Standard Hangeul (or Korean Script)
Code", KIM Kyongsok. Computer Standards & Interfaces,
Vol. 9, No. 3, pp. 187-202, 1990.
[KimK 1992] "A common Approach to Designing the Hangeul Code and
Keyboard", KIM Kyongsok. Computer Standards & Interfaces,
Vol. 14, No. 4, pp. 297-325, Aug. 1992.
[KimK 1999] A Hangeul story inside computers. KIM, Kyongsok. Busan
National University Press. 1999. [in Hangeul]

@ -1,864 +0,0 @@
INTERNET-DRAFT Mark Welter
draft-ietf-idn-dude-02.txt Brian W. Spolarich
Expires 2001-Dec-07 Adam M. Costello
2001-Jun-07
Differential Unicode Domain Encoding (DUDE)
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Distribution of this document is unlimited. Please send comments to
the authors or to the idn working group at idn@ops.ietf.org.
Abstract
DUDE is a reversible transformation from a sequence of nonnegative
integer values to a sequence of letters, digits, and hyphens (LDH
characters). DUDE provides a simple and efficient ASCII-Compatible
Encoding (ACE) of Unicode strings [UNICODE] for use with
Internationalized Domain Names [IDN] [IDNA].
Contents
1. Introduction
2. Terminology
3. Overview
4. Base-32 characters
5. Encoding procedure
6. Decoding procedure
7. Example strings
8. Security considerations
9. References
A. Acknowledgements
B. Author contact information
C. Mixed-case annotation
D. Differences from draft-ietf-idn-dude-01
E. Example implementation
1. Introduction
The IDNA draft [IDNA] describes an architecture for supporting
internationalized domain names. Each label of a domain name may
begin with a special prefix, in which case the remainder of the
label is an ASCII-Compatible Encoding (ACE) of a Unicode string
satisfying certain constraints. For the details of the constraints,
see [IDNA] and [NAMEPREP]. The prefix has not yet been specified,
but see http://www.i-d-n.net/ for prefixes to be used for testing
and experimentation.
DUDE is intended to be used as an ACE within IDNA, and has been
designed to have the following features:
* Completeness: Every sequence of nonnegative integers maps to an
LDH string. Restrictions on which integers are allowed, and on
sequence length, may be imposed by higher layers.
* Uniqueness: Every sequence of nonnegative integers maps to at
most one LDH string.
* Reversibility: Any Unicode string mapped to an LDH string can
be recovered from that LDH string.
* Efficient encoding: The ratio of encoded size to original size
is small. This is important in the context of domain names
because [RFC1034] restricts the length of a domain label to 63
characters.
* Simplicity: The encoding and decoding algorithms are reasonably
simple to implement. The goals of efficiency and simplicity are
at odds; DUDE places greater emphasis on simplicity.
An optional feature is described in appendix C "Mixed-case
annotation".
2. Terminology
The key words "must", "shall", "required", "should", "recommended",
and "may" in this document are to be interpreted as described in
RFC 2119 [RFC2119].
LDH characters are the letters A-Z and a-z, the digits 0-9, and
hyphen-minus.
A quartet is a sequence of four bits (also known as a nibble or
nybble).
A quintet is a sequence of five bits.
Hexadecimal values are shown preceded by "0x". For example, 0x60
is decimal 96.
As in the Unicode Standard [UNICODE], Unicode code points are
denoted by "U+" followed by four to six hexadecimal digits, while a
range of code points is denoted by two hexadecimal numbers separated
by "..", with no prefixes.
XOR means bitwise exclusive or. Given two nonnegative integer
values A and B, A XOR B is the nonnegative integer value whose
binary representation is 1 in whichever places the binary
representations of A and B disagree, and 0 wherever they agree.
For the purpose of applying this rule, recall that an integer's
representation begins with an infinite number of unwritten zeros.
In some programming languages, care may need to be taken that A and
B are stored in variables of the same type and size.
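As a small illustration of the XOR definition above, the following C
sketch keeps both operands in the same unsigned type. The names
code_value and xor_values() are assumptions introduced here, not part of
DUDE; unsigned long is used because ANSI C guarantees it at least 32
bits, enough for any Unicode code point.

```c
#include <assert.h>

/* Hypothetical type for the integers DUDE operates on. */
typedef unsigned long code_value;

/* Both operands share one unsigned type, so no sign extension or */
/* truncation can change the result.                              */
static code_value xor_values(code_value a, code_value b)
{
  return a ^ b;
}
```

For instance, xor_values(0x60, 0x61) is 0x01 because the two values
differ only in the lowest bit, and xor_values(0x2C7EF, 0x60) is 0x2C78F.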
3. Overview
DUDE encodes a sequence of nonnegative integral values as a sequence
of LDH characters, although implementations will of course need to
represent the output characters somehow, typically as ASCII octets.
When DUDE is used to encode Unicode characters, the input values are
Unicode code points (integral values in the range 0..10FFFF, but not
D800..DFFF, which are reserved for use by UTF-16).
Each value in the input sequence is represented by one or more LDH
characters in the encoded string. The value 0x2D is represented
by hyphen-minus (U+002D). Each non-hyphen-minus character in
the encoded string represents a quintet. A sequence of quintets
represents the bitwise XOR between each non-0x2D integer and the
previous one.
4. Base-32 characters
"a" = 0 = 0x00 = 00000 "s" = 16 = 0x10 = 10000
"b" = 1 = 0x01 = 00001 "t" = 17 = 0x11 = 10001
"c" = 2 = 0x02 = 00010 "u" = 18 = 0x12 = 10010
"d" = 3 = 0x03 = 00011 "v" = 19 = 0x13 = 10011
"e" = 4 = 0x04 = 00100 "w" = 20 = 0x14 = 10100
"f" = 5 = 0x05 = 00101 "x" = 21 = 0x15 = 10101
"g" = 6 = 0x06 = 00110 "y" = 22 = 0x16 = 10110
"h" = 7 = 0x07 = 00111 "z" = 23 = 0x17 = 10111
"i" = 8 = 0x08 = 01000 "2" = 24 = 0x18 = 11000
"j" = 9 = 0x09 = 01001 "3" = 25 = 0x19 = 11001
"k" = 10 = 0x0A = 01010 "4" = 26 = 0x1A = 11010
"m" = 11 = 0x0B = 01011 "5" = 27 = 0x1B = 11011
"n" = 12 = 0x0C = 01100 "6" = 28 = 0x1C = 11100
"p" = 13 = 0x0D = 01101 "7" = 29 = 0x1D = 11101
"q" = 14 = 0x0E = 01110 "8" = 30 = 0x1E = 11110
"r" = 15 = 0x0F = 01111 "9" = 31 = 0x1F = 11111
The digits "0" and "1" and the letters "o" and "l" are not used, to
avoid transcription errors.
A decoder must accept both the uppercase and lowercase forms of
the base-32 characters (including mixtures of both forms). An
encoder should output only lowercase forms or only uppercase forms
(unless it uses the feature described in the appendix C "Mixed-case
annotation").
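The table and the case rule of this section can be sketched in C as
follows. This is a simplified illustration, not the code of appendix E:
it assumes an ASCII-compatible execution character set (which ANSI C does
not guarantee), and the names dude_alphabet, quintet_to_char() and
char_to_quintet() are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* The 32 base-32 characters in value order: no "l", "o", "0" or "1". */
static const char dude_alphabet[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Map a quintet (0..31) to its lowercase base-32 character. */
static char quintet_to_char(unsigned int q)
{
  return dude_alphabet[q & 0x1F];
}

/* Map a base-32 character of either case to its quintet value, */
/* or return -1 if the character is not in the alphabet.        */
static int char_to_quintet(char c)
{
  const char *p;
  if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a'); /* ASCII only */
  if (c == '\0') return -1;  /* strchr would match the terminator */
  p = strchr(dude_alphabet, c);
  return p ? (int)(p - dude_alphabet) : -1;
}
```

With these helpers, quintet_to_char(11) gives "m", char_to_quintet('K')
gives 10, and the excluded characters "l", "o", "0" and "1" are rejected.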
5. Encoding procedure
All ordering of bits, quartets, and quintets is big-endian (most
significant first).
let prev = 0x60
for each input integer n (in order) do begin
  if n == 0x2D then output hyphen-minus
  else begin
    let diff = prev XOR n
    represent diff in base 16 as a sequence of quartets,
      as few as are sufficient (but at least one)
    prepend 0 to the last quartet and 1 to each of the others
    output a base-32 character corresponding to each quintet
    let prev = n
  end
end
If an encoder encounters an input value larger than expected (for
example, the largest Unicode code point is U+10FFFF, and nameprep
[NAMEPREP03] can never output a code point larger than U+EFFFD),
the encoder may either encode the value correctly, or may fail, but
it must not produce incorrect output. The encoder must fail if it
encounters a negative input value.
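The encoding procedure can be sketched compactly in C. This is a
simplified illustration under stated assumptions, not the reference code
of appendix E: it assumes an ASCII-compatible character set, ignores the
mixed-case annotation, and uses the hypothetical names alphabet and
dude_encode_sketch(). Negative inputs cannot occur because the input type
is unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char alphabet[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Encode n code points into out (capacity cap, including the null). */
/* Returns the string length, or (size_t)-1 if out is too small.     */
static size_t dude_encode_sketch(const unsigned long *in, size_t n,
                                 char *out, size_t cap)
{
  unsigned long prev = 0x60, diff, tmp;
  size_t i, j, k, len = 0;
  for (i = 0; i < n; ++i) {
    if (in[i] == 0x2D) {              /* hyphen-minus stands for itself */
      if (len + 1 >= cap) return (size_t)-1;
      out[len++] = '-';
      continue;
    }
    diff = prev ^ in[i];
    /* k = number of quartets needed for diff (at least one): */
    for (k = 1, tmp = diff >> 4; tmp != 0; tmp >>= 4) ++k;
    if (len + k >= cap) return (size_t)-1;
    for (j = 0; j < k; ++j) {
      /* Most significant quartet first; every quintet except */
      /* the last one gets a leading 1 bit.                   */
      unsigned int quartet =
        (unsigned int)((diff >> (4 * (k - 1 - j))) & 0xF);
      out[len++] = alphabet[(j + 1 < k ? 0x10 : 0) | quartet];
    }
    prev = in[i];
  }
  if (len >= cap) return (size_t)-1;
  out[len] = '\0';
  return len;
}
```

As a worked case, u+2C7EF gives diff = 0x60 XOR 0x2C7EF = 0x2C78F, whose
quartets 2, C, 7, 8, F become the quintets 10010, 11100, 10111, 11000,
01111, that is "u6z2r". Repeating the same code point gives diff = 0,
encoded as "a".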
6. Decoding procedure
let prev = 0x60
while the input string is not exhausted do begin
  if the next character is hyphen-minus
  then consume it and output 0x2D
  else begin
    consume characters and convert them to quintets until
      encountering a quintet whose first bit is 0
    fail upon encountering a non-base-32 character or end-of-input
    strip the first bit of each quintet
    concatenate the resulting quartets to form diff
    let prev = prev XOR diff
    output prev
  end
end
encode the output sequence and compare it to the input string
fail if they do not match (case-insensitively)
The comparison at the end is necessary to guarantee the uniqueness
property (there cannot be two distinct encoded strings representing
the same sequence of integers). This check also frees the decoder
from having to check for overflow while decoding the base-32
characters. (If the decoder is one step of a larger decoding
process, it may be possible to defer the re-encoding and comparison
to the end of that larger decoding process.)
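The decoding procedure, including the final re-encoding check, can be
sketched in C as follows. This is a simplified illustration, not the
reference code of appendix E: it assumes an ASCII-compatible character
set, and the names quintet_of(), encode() and dude_decode_sketch() are
hypothetical. The caller supplies scratch space of at least
strlen(s) + 1 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char alphabet[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Quintet value of a base-32 character of either case, or -1. */
static int quintet_of(char c)
{
  const char *p;
  if (c == '\0') return -1;
  p = strchr(alphabet, tolower((unsigned char)c));
  return p ? (int)(p - alphabet) : -1;
}

/* Minimal encoder used only for the uniqueness check. */
static int encode(const unsigned long *in, size_t n, char *out, size_t cap)
{
  unsigned long prev = 0x60, diff, tmp;
  size_t i, j, k, len = 0;
  for (i = 0; i < n; ++i) {
    if (in[i] == 0x2D) {
      if (len + 1 >= cap) return -1;
      out[len++] = '-';
      continue;
    }
    diff = prev ^ in[i];
    for (k = 1, tmp = diff >> 4; tmp != 0; tmp >>= 4) ++k;
    if (len + k >= cap) return -1;
    for (j = 0; j < k; ++j)
      out[len++] = alphabet[(j + 1 < k ? 0x10 : 0) |
                            (int)((diff >> (4 * (k - 1 - j))) & 0xF)];
    prev = in[i];
  }
  if (len >= cap) return -1;
  out[len] = '\0';
  return 0;
}

/* Decode s into out (room for max_out values).  Returns the number */
/* of values decoded, or -1 on malformed or non-canonical input.    */
static long dude_decode_sketch(const char *s, unsigned long *out,
                               size_t max_out, char *scratch)
{
  unsigned long prev = 0x60, diff;
  size_t i = 0, n = 0, len = strlen(s);
  int q;
  while (s[i] != '\0') {
    if (n == max_out) return -1;
    if (s[i] == '-') { out[n++] = 0x2D; ++i; continue; }
    diff = 0;
    do {                       /* read quintets until one starts with 0 */
      q = quintet_of(s[i++]);
      if (q < 0) return -1;    /* bad character or end of input */
      diff = (diff << 4) | (unsigned long)(q & 0xF);
    } while (q & 0x10);
    prev ^= diff;
    out[n++] = prev;
  }
  /* Re-encode and compare case-insensitively.  This rejects both */
  /* non-minimal sequences and overflowed values.                 */
  if (encode(out, n, scratch, len + 1) != 0) return -1;
  for (i = 0; i <= len; ++i)
    if (tolower((unsigned char)scratch[i]) !=
        tolower((unsigned char)s[i])) return -1;
  return (long)n;
}
```

For example, "sb" decodes to the same value as "b" (the leading quintet
contributes only a zero quartet), so the re-encoding check rejects it;
this is what preserves the uniqueness property.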
7. Example strings
The first several examples are nonsense strings of mostly unassigned
code points intended to exercise the corner cases of the algorithm.
(A) u+0061
DUDE: b
(B) u+2C7EF u+2C7EF
DUDE: u6z2ra
(C) u+1752B u+1752A
DUDE: tzxwmb
(D) u+63AB1 u+63ABA
DUDE: yv47bm
(E) u+261AF u+261BF
DUDE: uyt6rta
(F) u+C3A31 u+C3A8C
DUDE: 6v4xb5p
(G) u+09F44 u+0954C
DUDE: 39ue4si
(H) u+8D1A3 u+8C8A3
DUDE: 27t6dt3sa
(I) u+6C2B6 u+CC266
DUDE: y6u7g4ss7a
(J) u+002D u+002D u+002D u+E848F
DUDE: ---82w8r
(K) u+BD08E u+002D u+002D u+002D
DUDE: 57s8q---
(L) u+A9A24 u+002D u+002D u+002D u+C05B7
DUDE: 434we---y393d
(M) u+7FFFFFFF
DUDE: z999993r or explicit failure
The next several examples are realistic Unicode strings that could
be used in domain names. They exhibit single-row text, two-row
text, ideographic text, and mixtures thereof. These examples are
names of Japanese television programs, music artists, and songs,
merely because one of the authors happened to have them handy.
(N) 3<nen>b<gumi><kinpachi><sensei> (Latin, kanji)
u+0033 u+5E74 u+0062 u+7D44 u+91D1 u+516B u+5148 u+751F
DUDE: xdx8whx8tgz7ug863f6s5kuduwxh
(O) <amuro><namie>-with-super-monkeys (Latin, kanji, hyphens)
u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
u+0068 u+002D u+0073 u+0075 u+0070 u+0065 u+0072 u+002D u+006D
u+006F u+006E u+006B u+0065 u+0079 u+0073
DUDE: x58jupu8nuy6gt99m-yssctqtptn-tmgftfth-trcbfqtnk
(P) maji<de>koi<suru>5<byou><mae> (Latin, hiragana, kanji)
u+006D u+0061 u+006A u+0069 u+3067 u+006B u+006F u+0069 u+3059
u+308B u+0035 u+79D2 u+524D
DUDE: pnmdvssqvssnegvsva7cvs5qz38hu53r
(Q) <pafii>de<runba> (Latin, katakana)
u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
DUDE: vs5bezgxrvs3ibvs2qtiud
(R) <sono><supiido><de> (hiragana, katakana)
u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
DUDE: vsvpvd7hypuivf4q
8. Security considerations
Users expect each domain name in DNS to be controlled by a single
authority. If a Unicode string intended for use as a domain label
could map to multiple ACE labels, then an internationalized domain
name could map to multiple ACE domain names, each controlled by
a different authority, some of which could be spoofs that hijack
service requests intended for another. Therefore DUDE is designed
so that each Unicode string has a unique encoding.
However, there can still be multiple Unicode representations of the
"same" text, for various definitions of "same". This problem is
addressed to some extent by the Unicode standard under the topic of
canonicalization, and this work is leveraged for domain names by
"nameprep" [NAMEPREP03].
9. References
[IDN] Internationalized Domain Names (IETF working group),
http://www.i-d-n.net/, idn@ops.ietf.org.
[IDNA] Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
Names In Applications (IDNA)", draft-ietf-idn-idna-01.
[NAMEPREP03] Paul Hoffman, Marc Blanchet, "Preparation
of Internationalized Host Names", 2001-Feb-24,
draft-ietf-idn-nameprep-03.
[RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
Table Specification", 1985-Oct, RFC 952.
[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
1987-Nov, RFC 1034.
[RFC1123] Internet Engineering Task Force, R. Braden (editor),
"Requirements for Internet Hosts -- Application and Support",
1989-Oct, RFC 1123.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", 1997-Mar, RFC 2119.
[SFS] David Mazieres et al, "Self-certifying File System",
http://www.fs.net/.
[UNICODE] The Unicode Consortium, "The Unicode Standard",
http://www.unicode.org/unicode/standard/standard.html.
A. Acknowledgements
The basic encoding of integers to quartets to quintets to base-32
comes from earlier IETF work by Martin Duerst. DUDE uses a slight
variation on the idea.
Paul Hoffman provided helpful comments on this document.
The idea of avoiding 0, 1, o, and l in base-32 strings was taken
from SFS [SFS].
B. Author contact information
Mark Welter <mwelter@walid.com>
Brian W. Spolarich <briansp@walid.com>
WALID, Inc.
State Technology Park
2245 S. State St.
Ann Arbor, MI 48104
+1 734 822 2020
Adam M. Costello <amc@cs.berkeley.edu>
University of California, Berkeley
http://www.cs.berkeley.edu/~amc/
C. Mixed-case annotation
In order to use DUDE to represent case-insensitive Unicode strings,
higher layers need to case-fold the Unicode strings prior to DUDE
encoding. The encoded string can, however, use mixed-case base-32
(rather than all-lowercase or all-uppercase as recommended in
section 4 "Base-32 characters") as an annotation telling how to
convert the folded Unicode string into a mixed-case Unicode string
for display purposes.
Each Unicode code point (unless it is U+002D hyphen-minus) is
represented by a sequence of base-32 characters, the last of which
is always a letter (as opposed to a digit). If that letter is
uppercase, it is a suggestion that the Unicode character be mapped
to uppercase (if possible); if the letter is lowercase, it is a
suggestion that the Unicode character be mapped to lowercase (if
possible).
DUDE encoders and decoders are not required to support these
annotations, and higher layers need not use them.
Example: In order to suggest that example O in section 7 "Example
strings" be displayed as:
<amuro><namie>-with-SUPER-MONKEYS
one could capitalize the DUDE encoding as:
x58jupu8nuy6gt99m-yssctqtptn-tMGFtFtH-tRCBFQtNK
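The annotation can be read back with a short scan of the encoded string,
sketched here in C. This is an illustration under stated assumptions: the
function name dude_case_flags() is hypothetical, the input is assumed to
be a well-formed DUDE string in an ASCII-compatible character set, and a
hyphen-minus is given the flag 0. It relies on the fact that the last
character of each sequence carries a quintet of the form 0xxxx, which is
always one of the letters "a" through "r".

```c
#include <assert.h>
#include <stddef.h>

/* Write one uppercase flag per encoded code point into flags.    */
/* Returns the number of flags written, or -1 if flags is too     */
/* small.  The input is not validated.                            */
static long dude_case_flags(const char *s, unsigned char *flags,
                            size_t max_flags)
{
  size_t n = 0;
  for (; *s != '\0'; ++s) {
    int upper = (*s >= 'A' && *s <= 'Z');
    char lower = upper ? (char)(*s - 'A' + 'a') : *s;
    /* "a".."r" hold quintets 0..15 (0xxxx) and end a sequence; */
    /* "s".."z" and "2".."9" hold 1xxxx, so more will follow.   */
    int ends_sequence = (*s == '-') || (lower >= 'a' && lower <= 'r');
    if (!ends_sequence) continue;
    if (n == max_flags) return -1;
    flags[n++] = (unsigned char)upper;
  }
  return (long)n;
}
```

Applied to the capitalized string above, this yields 24 flags, with
nonzero values exactly for the letters of "SUPER" and "MONKEYS".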
D. Differences from draft-ietf-idn-dude-01
Four changes have been made since draft-ietf-idn-dude-01 (DUDE-01):
1) DUDE-01 computed the XOR of each integer with the previous one
in order to decide how many bits of each integer to encode, but
now the XOR itself is encoded, so there is no need for a mask.
2) DUDE-01 made the first quintet of each sequence different from
the rest, while now it is the last quintet that differs, so it's
easier for the decoder to detect the end of the sequence.
3) The base-32 map has changed to avoid 0, 1, o, and l, to help
humans avoid transcription errors.
4) The initial value of the previous code point has changed from 0
to 0x60, making the encodings of a few domain names shorter and
none longer.
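The effect of change 4 can be checked with a small C sketch that counts
the base-32 characters needed for one code point. The function name
dude_char_count() is hypothetical and exists only for this illustration.

```c
#include <assert.h>

/* Number of base-32 characters needed to encode code point n */
/* when the previous value is prev (one per quartet of diff). */
static unsigned int dude_char_count(unsigned long prev, unsigned long n)
{
  unsigned long diff = prev ^ n;
  unsigned int k = 1;
  while ((diff >>= 4) != 0) ++k;
  return k;
}
```

With the old initial value 0, a label starting with "a" (0x61) needs two
characters for its first code point; with 0x60 the difference is 0x01 and
a single character suffices.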
E. Example implementation
/******************************************/
/* dude.c 0.2.3 (2001-May-31-Thu) */
/* Adam M. Costello <amc@cs.berkeley.edu> */
/******************************************/
/* This is ANSI C code (C89) implementing */
/* DUDE (draft-ietf-idn-dude-02). */
/************************************************************/
/* Public interface (would normally go in its own .h file): */
#include <limits.h>
enum dude_status {
  dude_success,
  dude_bad_input,
  dude_big_output   /* Output would exceed the space provided. */
};
enum case_sensitivity { case_sensitive, case_insensitive };
#if UINT_MAX >= 0x1FFFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif
enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] );
/* dude_encode() converts Unicode to DUDE (without any */
/* signature). The input must be represented as an array */
/* of Unicode code points (not code units; surrogate pairs */
/* are not allowed), and the output will be represented as */
/* null-terminated ASCII. The input_length is the number of code */
/* points in the input. The output_size is an in/out argument: */
/* the caller must pass in the maximum number of characters */
/* that may be output (including the terminating null), and on */
/* successful return it will contain the number of characters */
/* actually output (including the terminating null, so it will be */
/* one more than strlen() would return, which is why it is called */
/* output_size rather than output_length). The uppercase_flags */
/* array must hold input_length boolean values, where nonzero */
/* means the corresponding Unicode character should be forced */
/* to uppercase after being decoded, and zero means it is */
/* caseless or should be forced to lowercase. Alternatively, */
/* uppercase_flags may be a null pointer, which is equivalent */
/* to all zeros. The encoder always outputs lowercase base-32 */
/* characters except when nonzero values of uppercase_flags */
/* require otherwise. The return value may be any of the */
/* dude_status values defined above; if not dude_success, then */
/* output_size and output may contain garbage. On success, the */
/* encoder will never need to write an output_size greater than */
/* input_length*k+1 if k >= 2 and all the input code points are */
/* less than 1 << (4*k), because of how the encoding is defined */
/* (k >= 2 is needed because the initial value of prev, 0x60, */
/* already occupies two quartets). */
enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] );
/* dude_decode() converts DUDE (without any signature) to */
/* Unicode. The input must be represented as null-terminated */
/* ASCII, and the output will be represented as an array of */
/* Unicode code points. The case_sensitivity argument influences */
/* the check on the well-formedness of the input string; it */
/* must be case_sensitive if case-sensitive comparisons are */
/* allowed on encoded strings, case_insensitive otherwise. */
/* The scratch_space must point to space at least as large */
/* as the input, which will get overwritten (this allows the */
/* decoder to avoid calling malloc()). The output_length is */
/* an in/out argument: the caller must pass in the maximum */
/* number of code points that may be output, and on successful */
/* return it will contain the actual number of code points */
/* output. The uppercase_flags array must have room for at */
/* least output_length values, or it may be a null pointer if */
/* the case information is not needed. A nonzero flag indicates */
/* that the corresponding Unicode character should be forced to */
/* uppercase by the caller, while zero means it is caseless or */
/* should be forced to lowercase. The return value may be any */
/* of the dude_status values defined above; if not dude_success, */
/* then output_length, output, and uppercase_flags may contain */
/* garbage. On success, the decoder will never need to write */
/* an output_length greater than the length of the input (not */
/* counting the null terminator), because of how the encoding is */
/* defined. */
/**********************************************************/
/* Implementation (would normally go in its own .c file): */
#include <string.h>
/* Character utilities: */
/* base32[q] is the lowercase base-32 character representing */
/* the number q from the range 0 to 31. Note that we cannot */
/* use string literals for ASCII characters because an ANSI C */
/* compiler does not necessarily use ASCII. */
static const char base32[] = {
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,     /* a-k */
  109, 110,                                               /* m-n */
  112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,  /* p-z */
  50, 51, 52, 53, 54, 55, 56, 57                          /* 2-9 */
};
/* base32_decode(c) returns the value of a base-32 character, in the */
/* range 0 to 31, or the constant base32_invalid if c is not a valid */
/* base-32 character. */
enum { base32_invalid = 32 };
static unsigned int base32_decode(char c)
{
  if (c < 50) return base32_invalid;
  if (c <= 57) return c - 26;
  if (c < 97) c += 32;
  if (c < 97 || c == 108 || c == 111 || c > 122) return base32_invalid;
  return c - 97 - (c > 108) - (c > 111);
}
/* unequal(case_sensitivity,s1,s2) returns 0 if the strings s1 and s2 */
/* are equal, 1 otherwise. If case_sensitivity is case_insensitive, */
/* then ASCII A-Z are considered equal to a-z respectively. */
static int unequal( enum case_sensitivity case_sensitivity,
                    const char s1[], const char s2[]        )
{
  char c1, c2;
  if (case_sensitivity != case_insensitive) return strcmp(s1,s2) != 0;
  for (;;) {
    c1 = *s1;
    c2 = *s2;
    if (c1 >= 65 && c1 <= 90) c1 += 32;
    if (c2 >= 65 && c2 <= 90) c2 += 32;
    if (c1 != c2) return 1;
    if (c1 == 0) return 0;
    ++s1, ++s2;
  }
}
/* Encoder: */
enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] )
{
  unsigned int max_out, in, out, k, j;
  u_code_point prev, codept, diff, tmp;
  char shift;
  prev = 0x60;
  max_out = *output_size;
  for (in = out = 0; in < input_length; ++in) {
    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */
    codept = input[in];
    if (codept == 0x2D) {
      /* Hyphen-minus stands for itself. */
      if (max_out - out < 1) return dude_big_output;
      output[out++] = 0x2D;
      continue;
    }
    diff = prev ^ codept;
    /* Compute the number of base-32 characters (k): */
    for (tmp = diff >> 4, k = 1; tmp != 0; ++k, tmp >>= 4);
    if (max_out - out < k) return dude_big_output;
    shift = uppercase_flags && uppercase_flags[in] ? 32 : 0;
    /* shift controls the case of the last base-32 digit. */
    /* Each quintet has the form 1xxxx except the last is 0xxxx. */
    /* Computing the base-32 digits in reverse order is easiest. */
    out += k;
    output[out - 1] = base32[diff & 0xF] - shift;
    for (j = 2; j <= k; ++j) {
      diff >>= 4;
      output[out - j] = base32[0x10 | (diff & 0xF)];
    }
    prev = codept;
  }
  /* Append the null terminator: */
  if (max_out - out < 1) return dude_big_output;
  output[out++] = 0;
  *output_size = out;
  return dude_success;
}
/* Decoder: */
enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] )
{
  u_code_point prev, q, diff;
  char c;
  unsigned int max_out, in, out, scratch_size;
  enum dude_status status;
  prev = 0x60;
  max_out = *output_length;
  for (c = input[in = 0], out = 0; c != 0; c = input[++in], ++out) {
    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */
    if (max_out - out < 1) return dude_big_output;
    if (c == 0x2D) output[out] = c;  /* hyphen-minus is literal */
    else {
      /* Base-32 sequence. Decode quintets until 0xxxx is found: */
      for (diff = 0; ; c = input[++in]) {
        q = base32_decode(c);
        if (q == base32_invalid) return dude_bad_input;
        diff = (diff << 4) | (q & 0xF);
        if (q >> 4 == 0) break;
      }
      prev = output[out] = prev ^ diff;
    }
    /* Case of last character determines uppercase flag: */
    if (uppercase_flags) uppercase_flags[out] = c >= 65 && c <= 90;
  }
  /* Enforce the uniqueness of the encoding by re-encoding */
  /* the output and comparing the result to the input:     */
  scratch_size = ++in;
  status = dude_encode(out, output, uppercase_flags,
                       &scratch_size, scratch_space);
  if (status != dude_success || scratch_size != in ||
      unequal(case_sensitivity, scratch_space, input)
     ) return dude_bad_input;
  *output_length = out;
  return dude_success;
}
/******************************************************************/
/* Wrapper for testing (would normally go in a separate .c file): */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a */
/* command-line option. */
enum {
  unicode_max_length = 256,
  ace_max_size = 256,
  test_case_sensitivity = case_insensitive
                          /* suitable for host names */
};
static void usage(char **argv)
{
  fprintf(stderr,
    "%s -e reads code points and writes a DUDE string.\n"
    "%s -d reads a DUDE string and writes code points.\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "A DUDE string is a newline-terminated sequence of LDH characters\n"
    "(without any signature).\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}
static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}
static const char too_big[] =
"input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char io_error[] = "I/O error\n";
/* The following string is used to convert LDH */
/* characters between ASCII and the native charset: */
static const char ldh_ascii[] =
  "................"
  "................"
  ".............-.."
  "0123456789......"
  ".ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ....."
  ".abcdefghijklmno"
  "pqrstuvwxyz";
int main(int argc, char **argv)
{
  enum dude_status status;
  int r;
  char *p;
  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][2] != 0) usage(argv);
  if (argv[1][1] == 'e') {
    u_code_point input[unicode_max_length];
    unsigned long codept;
    unsigned char uppercase_flags[unicode_max_length];
    char output[ace_max_size], uplus[3];
    unsigned int input_length, output_size, i;
    /* Read the input code points: */
    input_length = 0;
    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
      if (r == EOF || r == 0) break;
      if (r != 2 || uplus[1] != '+' || codept > (u_code_point)-1) {
        fail(invalid_input);
      }
      if (input_length == unicode_max_length) fail(too_big);
      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
      else fail(invalid_input);
      input[input_length++] = codept;
    }
    /* Encode: */
    output_size = ace_max_size;
    status = dude_encode(input_length, input, uppercase_flags,
                         &output_size, output);
    if (status == dude_bad_input) fail(invalid_input);
    if (status == dude_big_output) fail(too_big);
    assert(status == dude_success);
    /* Convert to native charset and output: */
    for (p = output; *p != 0; ++p) {
      i = *p;
      assert(i <= 122 && ldh_ascii[i] != '.');
      *p = ldh_ascii[i];
    }
    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }
if (argv[1][1] == 'd') {
char input[ace_max_size], scratch[ace_max_size], *pp;
u_code_point output[unicode_max_length];
unsigned char uppercase_flags[unicode_max_length];
unsigned int input_length, output_length, i;
/* Read the DUDE input string and convert to ASCII: */
fgets(input, ace_max_size, stdin);
if (ferror(stdin)) fail(io_error);
if (feof(stdin)) fail(invalid_input);
input_length = strlen(input);
if (input[input_length - 1] != '\n') fail(too_big);
input[--input_length] = 0;
for (p = input; *p != 0; ++p) {
pp = strchr(ldh_ascii, *p);
if (pp == 0) fail(invalid_input);
*p = pp - ldh_ascii;
}
/* Decode: */
output_length = unicode_max_length;
status = dude_decode(test_case_sensitivity, scratch, input,
&output_length, output, uppercase_flags);
if (status == dude_bad_input) fail(invalid_input);
if (status == dude_big_output) fail(too_big);
assert(status == dude_success);
/* Output the result: */
for (i = 0; i < output_length; ++i) {
r = printf("%s+%04lX\n",
uppercase_flags[i] ? "U" : "u",
(unsigned long) output[i] );
if (r < 0) fail(io_error);
}
return EXIT_SUCCESS;
}
usage(argv);
return EXIT_SUCCESS; /* not reached, but quiets compiler warning */
}
INTERNET-DRAFT expires 2001-Dec-07

@ -1,612 +0,0 @@
Internet Draft Patrik Faltstrom
draft-ietf-idn-idna-07.txt Cisco
February 24, 2002 Paul Hoffman
Expires in six months IMC & VPNC
Adam M. Costello
UC Berkeley
Internationalizing Domain Names in Applications (IDNA)
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Until now, there has been no standard method for domain names to use
characters outside the ASCII repertoire. This document defines
internationalized domain names (IDNs) and a mechanism called IDNA for
handling them in a standard fashion. IDNs use characters drawn from a
large repertoire (Unicode), but IDNA allows the non-ASCII characters to
be represented using the same octets used in so-called host names
today. IDNA is only meant for processing domain names, not free
text.
1. Introduction
IDNA works by allowing applications to use certain ASCII name labels
(beginning with a special prefix) to represent non-ASCII name labels.
Lower-layer protocols need not be aware of this; therefore IDNA does not
require changes to any infrastructure. In particular, IDNA does not
require any changes to DNS servers, resolvers, or protocol elements,
because the ASCII name service provided by the existing DNS is entirely
sufficient.
This document does not require any applications to conform to IDNA,
but applications can elect to use IDNA in order to support IDN while
maintaining interoperability with existing infrastructure. Adding IDNA
support to an existing application entails changes to the application
only, and leaves room for flexibility in the user interface.
A great deal of the discussion of IDN solutions has focused on
transition issues and how IDN will work in a world where not all of the
components have been updated. Other proposals would require that user
applications, resolvers, and DNS servers be updated in order for a user
to use an internationalized domain name. Rather than require widespread
updating of all components, IDNA requires only user applications to be
updated; no changes are needed to the DNS protocol, to any DNS servers, or
to the resolvers on users' computers.
1.1 Interaction of protocol parts
IDNA requires that implementations process input strings with Nameprep
[NAMEPREP], which is a profile of Stringprep [STRINGPREP], and then with
Punycode [PUNYCODE]. Implementations of IDNA MUST fully implement
Nameprep and Punycode; neither Nameprep nor Punycode is optional.
2 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].
A code point is an integral value associated with a character in a coded
character set.
Unicode [UNICODE] is a coded character set containing tens of thousands
of characters. A single Unicode code point is denoted by "U+" followed
by four to six hexadecimal digits, while a range of Unicode code points
is denoted by two hexadecimal numbers separated by "..", with no
prefixes.
ASCII means US-ASCII, a coded character set containing 128 characters
associated with code points in the range 0..7F. Unicode is an extension
of ASCII: it includes all the ASCII characters and associates them with
the same code points.
The term "LDH code points" is defined in this document to mean the code
points associated with ASCII letters, digits, and the hyphen-minus; that
is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an abbreviation for
"letters, digits, hyphen".
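As an illustration of the definition above, the LDH test can be written
as a small predicate over code points. This is only a sketch; the
function name is_ldh_code_point is hypothetical and does not appear in
this document.

```c
#include <stdbool.h>

/* Hypothetical helper (not defined by the draft): reports whether a
   code point is one of the LDH code points U+002D, 30..39, 41..5A,
   or 61..7A. */
static bool is_ldh_code_point(unsigned long cp)
{
  return cp == 0x2D                      /* hyphen-minus */
      || (cp >= 0x30 && cp <= 0x39)      /* digits 0-9   */
      || (cp >= 0x41 && cp <= 0x5A)      /* letters A-Z  */
      || (cp >= 0x61 && cp <= 0x7A);     /* letters a-z  */
}
```

The ranges are written as ASCII code values rather than character
constants so that the result does not depend on the native character set.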
[STD13] talks about "domain names" and "host names", but many people use
the terms interchangeably. Further, because [STD13] was not terribly
clear, many people who are sure they know the exact definitions of each
of these terms disagree on the definitions.
A label is an individual part of a domain name. Labels are usually shown
separated by dots; for example, the domain name "www.example.com" is
composed of three labels: "www", "example", and "com". (The zero-length
root label that is implied in domain names, as described in [STD13], is
not considered a label in this specification.) Throughout this document
the term "label" is shorthand for "text label", and "every label" means
"every text label". In IDNA, not all text strings can be labels.
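The division of a name into labels can be sketched as follows. This is
an assumption-laden illustration, not part of the specification: the
name is taken to be a C string with labels separated by the ASCII dot,
a single trailing dot is treated as the implied root label and ignored,
and the names label_span and split_labels are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: where one label sits inside the dotted name. */
struct label_span {
  size_t start;    /* offset of the first character of the label */
  size_t length;   /* number of characters in the label          */
};

/* Splits a dotted name into labels.  Returns the number of labels
   stored in spans, or 0 if the name is empty or has more labels
   than max_spans.  Empty labels (as in "a..b") are reported with
   length 0; rejecting them is left to the caller. */
static size_t split_labels(const char *name,
                           struct label_span *spans, size_t max_spans)
{
  size_t count = 0, start = 0, i, n = strlen(name);

  if (n > 0 && name[n - 1] == '.') --n;   /* ignore the root label */
  if (n == 0) return 0;

  for (i = 0; i <= n; ++i) {
    if (i == n || name[i] == '.') {
      if (count == max_spans) return 0;
      spans[count].start = start;
      spans[count].length = i - start;
      ++count;
      start = i + 1;
    }
  }
  return count;
}
```

For "www.example.com" this yields three spans covering "www",
"example", and "com".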
An "internationalized domain name" (IDN) is a domain name for which the
ToASCII operation (see section 4) can be applied to each label without
failing. This document does not attempt to define an "internationalized
host name". It is expected that protocols and name-handling bodies will
want to limit the characters allowed in IDNs further than what is
specified in this document, such as to prohibit additional characters
that they feel are unneeded or harmful in registered domain names.
An "internationalized label" is a label composed of characters from the
Unicode character set; note, however, that not every string of Unicode
characters can be an internationalized label. To allow internationalized
labels to be handled by existing applications, IDNA uses an "ACE label"
(ACE stands for ASCII Compatible Encoding), which can be represented
using only ASCII characters but is equivalent to a label containing
non-ASCII characters. More rigorously, an ACE label is defined to be any
label that the ToUnicode operation would alter (see section 4.2). For
every internationalized label that cannot be directly represented in
ASCII, there is an equivalent ACE label. The conversion of labels to and
from the ACE form is specified in section 4.
The "ACE prefix" is defined in this document to be a string of ASCII
characters that appears at the beginning of every ACE label. It is
specified in section 5.
A "domain name slot" is defined in this document to be a protocol element
or a function argument or a return value (and so on) explicitly
designated for carrying a domain name. Examples of domain name slots
include: the QNAME field of a DNS query; the name argument of the
gethostbyname() library function; the part of an email address following
the at-sign (@) in the From: field of an email message header; and the host
portion of the URI in the src attribute of an HTML <IMG> tag.
General text that just happens to contain a domain name is not a domain name
slot; for example, a domain name appearing in the plain text body of an
email message is not occupying a domain name slot.
An "internationalized domain name slot" is defined in this document to
be a domain name slot explicitly designated for carrying an
internationalized domain name as defined in this document. The
designation may be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).
A "generic domain name slot" is defined in this document to be any
domain name slot that is not an internationalized domain name slot.
Obviously, this includes any domain name slot whose specification
predates IDNA.
3. Requirements
IDNA conformance means adherence to the following three requirements:
1) Whenever a domain name is put into a generic domain name slot (see
section 2), every label MUST contain only ASCII characters. Given an
internationalized domain name (IDN), an equivalent domain name
satisfying this requirement can be obtained by applying the ToASCII
operation (see section 4) to each label.
2) ACE labels obtained from domain name slots SHOULD be hidden from
users except when the use of the non-ASCII form would cause problems or
when the ACE form is explicitly requested. Given an internationalized
domain name, an equivalent domain name containing no ACE labels can be
obtained by applying the ToUnicode operation (see section 4) to each
label. When requirements 1 and 2 both apply, requirement 1 takes
precedence.
3) Whenever two labels are compared, they MUST be considered to
match if and only if their ASCII forms (obtained by applying ToASCII)
match using a case-insensitive ASCII comparison.
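The comparison in requirement 3 might be sketched as below. The sketch
assumes that both labels have already been passed through ToASCII by
some other routine and are held as ASCII octets; the names
fold_ascii_octet and ace_forms_match are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fold ASCII A-Z (0x41..0x5A) to a-z; leave every other octet alone.
   Done by hand so the result does not depend on the C locale. */
static unsigned char fold_ascii_octet(unsigned char c)
{
  return (c >= 0x41 && c <= 0x5A) ? (unsigned char)(c + 0x20) : c;
}

/* Case-insensitive ASCII comparison of two labels in ASCII form.
   Both inputs are assumed to be outputs of ToASCII. */
static bool ace_forms_match(const unsigned char *a, size_t a_len,
                            const unsigned char *b, size_t b_len)
{
  size_t i;

  if (a_len != b_len) return false;
  for (i = 0; i < a_len; ++i) {
    if (fold_ascii_octet(a[i]) != fold_ascii_octet(b[i])) return false;
  }
  return true;
}
```

Only the 26 ASCII letters are folded; no other octets are treated as
equivalent to one another.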
4. Conversion operations
This section specifies the ToASCII and ToUnicode operations. Each one
operates on a sequence of Unicode code points (but remember that all
ASCII code points are also Unicode code points). When domain names are
represented using character sets other than Unicode and ASCII, they will
need to first be transcoded to Unicode before these operations can be
applied, and might need to be transcoded back afterwards.
4.1 ToASCII
The ToASCII operation takes a sequence of Unicode code points and
transforms it into a sequence of code points in the ASCII range (0..7F).
The original sequence and the resulting sequence are equivalent labels.
(If the original is an internationalized label that cannot be directly
represented in ASCII, the result will be the equivalent ACE label.)
ToASCII fails if any step of it fails. If any step fails, the original
sequence MUST NOT be used as a label in an IDN.
The inputs to ToASCII are a sequence of code points; a flag indicating
whether to prohibit unassigned code points (see [STRINGPREP]); and a
flag indicating whether to apply the host name syntax rules. The output
of ToASCII is either a sequence of ASCII code points or a failure
condition.
ToASCII never alters a sequence of code points that are all in the ASCII
range to begin with (although it could fail).
ToASCII consists of the following steps:
1. If all code points in the sequence are in the ASCII range (0..7F)
then skip to step 3.
2. Perform the steps specified in [NAMEPREP] and fail if there is
an error.
3. If the label is part of a host name (or is subject to the host
name syntax rules) then perform these checks:
(a) Verify the absence of non-LDH ASCII code points; that is,
the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
(b) Verify the absence of leading and trailing hyphen-minus;
that is, the absence of U+002D at the beginning and end of
the sequence.
4. If all code points in the sequence are in the ASCII range (0..7F),
then skip to step 8.
5. Verify that the sequence does NOT begin with the ACE prefix.
6. Encode the sequence using the encoding algorithm in [PUNYCODE].
7. Prepend the ACE prefix.
8. Verify that the number of code points is in the range 1 to 63
inclusive.
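Steps 3 and 8 above are simple enough to sketch directly. The
following is an illustration only: it does not implement Nameprep or
Punycode (steps 2 and 6), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Step 3(a) helper: true for hyphen-minus, digits, and ASCII letters. */
static bool ascii_is_ldh(unsigned long cp)
{
  return cp == 0x2D || (cp >= 0x30 && cp <= 0x39)
      || (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A);
}

/* Step 3: host name syntax checks.  Non-ASCII code points are not
   examined here; only ASCII code points outside LDH are rejected. */
static bool toascii_step3_ok(const unsigned long *cp, size_t n)
{
  size_t i;

  for (i = 0; i < n; ++i) {
    if (cp[i] <= 0x7F && !ascii_is_ldh(cp[i])) return false;  /* 3(a) */
  }
  if (n > 0 && (cp[0] == 0x2D || cp[n - 1] == 0x2D)) {
    return false;                                             /* 3(b) */
  }
  return true;
}

/* Step 8: the final sequence must hold 1 to 63 code points. */
static bool toascii_step8_ok(size_t n)
{
  return n >= 1 && n <= 63;
}
```

A complete ToASCII would call the step 3 check only when the host name
syntax flag is set, and would apply the step 8 check to the sequence
produced after the ACE prefix has been prepended.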
4.2 ToUnicode
The ToUnicode operation takes a sequence of Unicode code points and
returns a sequence of Unicode code points. If the input sequence is a
label in ACE form, then the result is an equivalent internationalized
label that is not in ACE form, otherwise the original sequence is
returned unaltered.
ToUnicode never fails. If any step fails, then the original input
sequence is returned immediately in that step.
The inputs to ToUnicode are a sequence of code points; a flag indicating
whether to prohibit unassigned code points (see [STRINGPREP]); and a
flag indicating whether to apply the host name syntax rules. The output
of ToUnicode is always a sequence of Unicode code points.
1. If all code points in the sequence are in the ASCII range (0..7F)
then skip to step 3.
2. Perform the steps specified in [NAMEPREP] and fail if there is an
error. (If step 3 of ToASCII is also performed here, it will not
affect the overall behavior of ToUnicode, but it is not
necessary.)
3. Verify that the sequence begins with the ACE prefix, and save a
copy of the sequence.
4. Remove the ACE prefix.
5. Decode the sequence using the decoding algorithm in [PUNYCODE]. Save
a copy of the result of this step.
6. Apply ToASCII.
7. Verify that the sequence matches the saved copy from step 3, using
a case-insensitive ASCII comparison.
8. Return the saved copy from step 5.
5. ACE prefix
[[ Note to the IESG and Internet Draft readers: The two uses of the
string "IESG--" below are to be changed at time of publication to a
prefix which fulfills the requirements in the first paragraph. ]]
The ACE prefix, used in the conversion operations (section 4), is two
alphanumeric ASCII characters followed by two hyphen-minuses. It cannot
be any of the prefixes already used in earlier documents, which include
the following: "bl--", "bq--", "dq--", "lq--", "mq--", "ra--", "wq--"
and "zq--". The ToASCII and ToUnicode operations MUST recognize the ACE
prefix in a case-insensitive manner.
The ACE prefix for IDNA is "IESG--".
This means that an ACE label might be "IESG--de-jg4avhby1noc0d", where
"de-jg4avhby1noc0d" is the part of the ACE label that is generated by
the encoding steps in [PUNYCODE].
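Case-insensitive recognition of the ACE prefix might look like the
sketch below. Because the real prefix is not yet assigned, the prefix
is passed in as a parameter; the function names are hypothetical, and
the label and prefix are assumed to be held as ASCII octets in
NUL-terminated strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fold ASCII A-Z (0x41..0x5A) to lower case; other octets unchanged. */
static unsigned char prefix_fold(unsigned char c)
{
  return (c >= 0x41 && c <= 0x5A) ? (unsigned char)(c + 0x20) : c;
}

/* Returns true if label begins with prefix, comparing ASCII letters
   without regard to case.  The prefix is a parameter because this
   document uses only a placeholder value. */
static bool has_ace_prefix(const char *label, const char *prefix)
{
  size_t i, n = strlen(prefix);

  if (strlen(label) < n) return false;
  for (i = 0; i < n; ++i) {
    if (prefix_fold((unsigned char)label[i]) !=
        prefix_fold((unsigned char)prefix[i])) return false;
  }
  return true;
}
```

With the placeholder prefix, has_ace_prefix("iesg--de-jg4avhby1noc0d",
"IESG--") is true. A label that passes this test is only a candidate
ACE label; the remaining ToUnicode steps decide whether it is valid.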
6. Implications for typical applications using DNS
In IDNA, applications perform the processing needed to input
internationalized domain names from users, display internationalized
domain names to users, and process the inputs and outputs from DNS and
other protocols that carry domain names.
The components and interfaces between them can be represented
pictorially as:
                 +------+
                 | User |
                 +------+
                    ^
                    | Input and display: local interface methods
                    | (pen, keyboard, glowing phosphorus, ...)
+-------------------|-------------------------------+
|                   v                               |
|          +-----------------------------+          |
|          |         Application         |          |
|          |  (conversion between local  |          |
|          |  character set and Unicode  |          |
|          |        is done here)        |          |
|          +-----------------------------+          |
|                   ^        ^                      | End system
|                   |        |                      |
| Call to resolver: |        | Application-specific |
|               ACE |        | protocol:            |
|                   v        | predefined by the    |
|           +----------+     | protocol or defaults |
|           | Resolver |     | to ACE               |
|           +----------+     |                      |
|                 ^          |                      |
+-----------------|----------|----------------------+
    DNS protocol: |          |
              ACE |          |
                  v          v
           +-------------+    +---------------------+
           | DNS servers |    | Application servers |
           +-------------+    +---------------------+
6.1 Entry and display in applications
Applications can accept domain names using any character set or sets
desired by the application developer, and can display domain names in any
charset. That is, the IDNA protocol does not affect the interface
between users and applications.
An IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s)
supported by the application, and as an ACE label. ACE labels that are
displayed or input MUST always include the ACE prefix. Applications MAY
allow input and display of ACE labels, but are not encouraged to do so
except as an interface for special purposes, possibly for debugging. ACE
encoding is opaque and ugly, and should thus only be exposed to users
who absolutely need it. The optional use, especially during a transition
period, of ACE encodings in the user interface is described in section
6.4. Because name labels encoded as ACE name labels can be rendered
either as the encoded ASCII characters or the proper decoded characters,
the application MAY have an option for the user to select the preferred
method of display; if it does, rendering the ACE SHOULD NOT be the
default.
Domain names are often stored and transported in many places. For example,
they are part of documents such as mail messages and web pages. They are
transported in many parts of many protocols, such as both the
control commands and the RFC 2822 body parts of SMTP, and the headers
and the body content in HTTP. It is important to remember that domain
names appear both in domain name slots and in the content that is passed
over protocols.
In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in any
charset allowed by the protocol or document format. If a protocol or
document format only allows one charset, the labels MUST be given in
that charset.
In any place where a protocol or document format allows transmission of
the characters in internationalized labels, internationalized labels
SHOULD be transmitted using whatever character encoding and escape
mechanism that the protocol or document format uses at that place.
All protocols that use domain name slots already have the capacity for
handling domain names in the ASCII charset. Thus, ACE labels
(internationalized labels that have been processed with the ToASCII
operation) can inherently be handled by those protocols.
6.2 Applications and resolver libraries
Applications normally use functions in the operating system when they
resolve DNS queries. Those functions in the operating system are often
called "the resolver library", and the applications communicate with the
resolver libraries through a programming interface (API).
Because these resolver libraries today expect only domain names in
ASCII, applications MUST prepare labels that are passed to the resolver
library using the ToASCII operation. Labels received from the resolver
library contain only ASCII characters; internationalized labels that
cannot be represented directly in ASCII use the ACE form. ACE labels
always include the ACE prefix.
IDNA-aware applications MUST be able to work with both
non-internationalized labels (those that conform to [STD13]
and [STD3]) and internationalized labels.
It is expected that new versions of the resolver libraries in the future
will be able to accept domain names in other formats than ASCII, and
application developers might one day pass not only domain names in
Unicode, but also in local script to a new API for the resolver
libraries in the operating system.
6.3 DNS servers
An operating system might have a set of libraries for performing the
ToASCII operation. The input to such a library might be in one or more
charsets that are used in applications (UTF-8 and UTF-16 are likely
candidates for almost any operating system, and script-specific charsets
are likely for localized operating systems).
For internationalized labels that cannot be represented directly in
ASCII, DNS servers MUST use the ACE form produced by the ToASCII
operation. All IDNs served by DNS servers MUST contain only ASCII
characters.
If a signalling system which makes negotiation possible between old and
new DNS clients and servers is standardized in the future, the encoding
of the query in the DNS protocol itself can be changed from ACE to
something else, such as UTF-8. The question whether or not this should
be used is, however, a separate problem and is not discussed in this
memo.
6.4 Avoiding exposing users to the raw ACE encoding
All applications that might show the user a domain name obtained from a
domain name slot, such as from gethostbyaddr or part of a mail header,
SHOULD be updated as soon as possible in order to prevent users from
seeing the ACE.
If an application decodes an ACE name using ToUnicode but cannot show
all of the characters in the decoded name, such as if the name contains
characters that the output system cannot display, the application SHOULD
show the name in ACE format (which always includes the ACE prefix)
instead of displaying the name with the replacement character (U+FFFD).
This is to make it easier for the user to transfer the name correctly to
other programs. Programs that by default show the ACE form when they
cannot show all the characters in a name label SHOULD also have a
mechanism to show the name that is produced by the ToUnicode operation
with as many characters as possible and replacement characters in the
positions where characters cannot be displayed.
The ToUnicode operation does not alter labels that are not valid ACE
labels, even if they begin with the ACE prefix. After ToUnicode has been
applied, if a label still begins with the ACE prefix, then it is not a
valid ACE label, and is not equivalent to any of the intermediate
Unicode strings constructed by ToUnicode.
6.5 Bidirectional text in domain names
The display of domain names that contain bidirectional text is not covered
in this document. It may be covered in a future version of this
document, or may be covered in a different document.
For developers interested in displaying domain names that have
bidirectional text, the Unicode standard has an extensive discussion of
how to reorder glyphs for display when dealing with
bidirectional text such as Arabic or Hebrew. See [UAX9] for more
information. In particular, all Unicode text is stored in logical order.
6.6 DNSSEC authentication of IDN domain names
DNS Security [DNSSEC] is a method for supplying cryptographic
verification information along with DNS messages. Public Key
Cryptography is used in conjunction with digital signatures to provide a
means for a requester of domain information to authenticate the source
of the data. This ensures that it can be traced back to a trusted
source, either directly, or via a chain of trust linking the source of
the information to the top of the DNS hierarchy.
IDNA specifies that all internationalized domain names served by DNS
servers that cannot be represented directly in ASCII must use the ACE
form produced by the ToASCII operation. This operation must be performed
prior to a zone being signed by the private key for that zone. Because
of this ordering, it is important to recognize that DNSSEC authenticates
the ASCII domain name, not the Unicode form or the mapping between the
Unicode form and the ASCII form. In other words, the output of ToASCII
is the canonical name. In the presence of DNSSEC, this is the name that
MUST be signed in the zone and MUST be validated against. It also SHOULD
be used for other name comparisons, such as when a browser wants to
indicate that a URL has been previously visited.
One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the resolution flow
than DNSSEC authenticating nameservers for DNSSEC to work.
6.7 Limitations of IDNA
The IDNA protocol does not solve all linguistic issues with users
inputting names in different scripts. Many important language-based and
script-based mappings are not covered in IDNA and must be handled
outside the protocol. For example, names that are entered in a mix of
traditional and simplified Chinese characters will not be mapped to a
single canonical name. Another example is that Scandinavian names
entered with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).
7. Name Server Considerations
Internationalized domain name data in zone files (as specified by section
5 of RFC 1035) MUST be processed with ToASCII before it is entered in
the zone files.
It is imperative that there be only one ASCII encoding for a particular
domain name. ACE is an encoding for domain name labels that use non-ASCII
characters. Thus, a primary master name server MUST NOT contain an
ACE-encoded label that decodes to an ASCII label. The ToASCII operation
assures that no such names are ever output from the operation.
Name servers MUST NOT serve records with domain names that contain
non-ASCII characters; such names MUST be converted to ACE form by the
ToASCII operation in order to be served. If names that are not processed
by ToASCII are passed to an application, it will result in unpredictable
behavior. Note that [STRINGPREP] describes how to handle versioning of
unallocated codepoints.
8. Root Server Considerations
IDNs are likely to be somewhat longer than current host names, so the
bandwidth needed by the root servers should go up by a small amount.
Also, queries and responses for IDNs will probably be somewhat longer
than typical queries today, so more queries and responses may be forced
to go to TCP instead of UDP.
9. Security Considerations
Security on the Internet partly relies on the DNS. Thus, any
change to the characteristics of the DNS can change the security of much
of the Internet.
This memo describes an algorithm which encodes characters that are not
valid according to STD3 and STD13 into octet values that are valid. No
security issues such as string length increases or new allowed values
are introduced by the encoding process or the use of these encoded
values, apart from those introduced by the ACE encoding itself.
Domain names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized domain name.
Because this document normatively refers to [NAMEPREP], it includes the
security considerations from that document as well.
A. References
[PUNYCODE] Adam Costello, "Punycode", draft-ietf-idn-punycode.
[DNSSEC] Don Eastlake, "Domain Name System Security Extensions", RFC
2535, March 1999.
[NAMEPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Domain Names", draft-ietf-idn-nameprep.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[STD3] Bob Braden, "Requirements for Internet Hosts -- Communication
Layers" (RFC 1122) and "Requirements for Internet Hosts -- Application
and Support" (RFC 1123), STD 3, October 1989.
[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
1034) and "Domain names - implementation and specification" (RFC 1035),
STD 13, November 1987.
[STRINGPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Strings ("stringprep")", draft-hoffman-stringprep,
work in progress.
[UAX9] Unicode Standard Annex #9, The Bidirectional Algorithm,
<http://www.unicode.org/unicode/reports/tr9/>.
[UNICODE] The Unicode Standard, Version 3.1.0: The Unicode Consortium.
The Unicode Standard, Version 3.0. Reading, MA, Addison-Wesley
Developers Press, 2000. ISBN 0-201-61633-5, as amended by: Unicode
Standard Annex #27: Unicode 3.1,
<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.
B. Authors' Addresses
Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden
paf@cisco.com
Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org
Adam M. Costello
University of California, Berkeley
idna-spec.amc @ nicemice.net

@ -1,426 +0,0 @@
Internet Draft Marc Blanchet
draft-ietf-idn-idne-02.txt Viagenie
March 19, 2001 Paul Hoffman
Expires in six months IMC & VPNC
Internationalized domain names using EDNS (IDNE)
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
The current DNS infrastructure does not provide a way to use
internationalized domain names (IDN). This document describes an
extension mechanism based on EDNS which enables the use of IDN without
causing harm to the current DNS. IDNE enables IDN host names with as
many characters as current ASCII-only host names. It fully supports
UTF-8 and conforms to the IDN requirements.
1. Introduction
Various proposals for IDN have tried to integrate IDN into the current
limited ASCII DNS. However, the compatibility issues impose too many
constraints on the architecture. Many of these proposals require
modifications to the applications or to the DNS protocol or to the
servers. This proposal takes a different approach: it uses the
standardized extension mechanism for DNS (EDNS) and uses UTF-8 as the
mandatory charset. It causes no harm to the current DNS because it uses
the EDNS extension mechanism. The major drawback of this proposal is
that all protocols, applications and DNS servers will have to be
upgraded to support this proposal.
1.1 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].
Hexadecimal values are shown preceded with an "0x". For example,
"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
shown preceded with an "0b". For example, a nine-bit value might be
shown as "0b101101111".
Examples in this document use the notation from the Unicode Standard
[UNICODE3] as well as the ISO 10646 [ISO10646] names. For example, the
letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
A". In the lists of prohibited characters, the "U+" is left off to make
the lists easier to read.
1.2 IDN summary
Using the terminology in [IDNCOMP], this protocol specifies an IDN
architecture of arch-2 (send binary or ACE). The binary format is
bin-1.1 (UTF-8), and the method for distinguishing binary from current
names is bin-2.4 (mark binary with EDNS0). The transition period is not
specified.
2. Functional Description
DNS queries and responses containing IDNE labels have the following
properties:
- The string in the label MUST be pre-processed as described in
[NAMEPREP] before the query or response is prepared.
- The characters in the label MUST be encoded using UTF-8 [RFC2279].
- The entire label MUST be encoded using EDNS [RFC2671].
- The version of the IDN protocol MUST be identified.
3. Encoding
An IDNE label uses the EDNS extended label type prefix (0b01), as
described in [RFC2671]. (A normal label type always begins with 0b00.) A
new extended label type for IDNE is used to identify an IDNE label. This
document uses 0b000010 as the extended label type; however, the label
type will be assigned by IANA and it may not be 0b000010.
     0                   1                   2
bits 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . .
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+-+-+-+
    |0 1|    ELT    |     Size      |       IDN label ...         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+-+-+-+-+
ELT: The six-bit extended label type to be assigned by the IANA for an
IDN label. In this document, the value 0b000010 is used, although that
might be changed by IANA.
Size: Size (in octets) of the IDN label following. This MUST NOT
be zero.
IDN label: Label, encoded in UTF-8 [RFC2279]. Note that this label might
contain all ASCII characters, and thus can be used for host name labels
that are legal in [STD13].
IDNE labels can be mixed with STD13 labels in a domain name.
The compression scheme in section 4.1.4 of [STD13] is supported as is.
Pointers can refer to either IDN labels or non-IDN labels.
3.1 Examples
3.1.1 Basic example
The following example shows the domain name me.com where the "e" in "me" is
replaced by a <LATIN CAPITAL LETTER E WITH ACUTE>, which is U+00C9. The
decomposition and downcasing specified in [NAMEPREP] changes the second
character to <LATIN SMALL LETTER E WITH ACUTE>, U+00E9. This string is
then transformed using UTF-8 [RFC2279] to 0x6DC3A9.
Ignoring the other fields of the message, the domain name portion of the
datagram could look like:
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    20 | 0  1  0  0  0  0  1  0| 0  0  0  0  0  0  1  1|
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    22 |       0x6D (m)        |     0xC3 (e'(1))      |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    24 |     0xA9 (e'(2))      |           3           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    26 |       0x63 (c)        |       0x6F (o)        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    28 |       0x6D (m)        |         0x00          |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Octet 20 means EDNS extended label type (0b01) using the IDN label
type (0b000010)
Octet 21 means size of label is 3 octets following
Octet 22-24 are the "m*" label encoded in UTF-8
Octet 25-28 are "com" encoded as a STD13 label
Octet 29 is the root domain
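The octets in the basic example can be reproduced with a short Python
sketch. This is an illustration only: the function names are invented
here, the prepare() step is a stand-in for [NAMEPREP] that performs just
NFKC normalization and lowercasing, and the extended label type is the
placeholder value 0b000010 used in this document.

```python
import unicodedata

IDNE_ELT = 0b000010  # placeholder ELT from this document; IANA may assign another


def prepare(label: str) -> str:
    """Stand-in for [NAMEPREP]: NFKC normalization plus lowercasing only."""
    return unicodedata.normalize("NFKC", label).lower()


def encode_idne_label(label: str) -> bytes:
    """Encode one label as an IDNE extended label (type octet, size, UTF-8)."""
    data = prepare(label).encode("utf-8")
    if not 1 <= len(data) <= 255:
        raise ValueError("IDNE label must be 1..255 octets")
    first = (0b01 << 6) | IDNE_ELT  # 0b01 prefix followed by the six-bit ELT
    return bytes([first, len(data)]) + data


def encode_std13_label(label: str) -> bytes:
    """Encode one ASCII label in the STD13 form (length octet, then data)."""
    data = label.encode("ascii")
    if not 1 <= len(data) <= 63:
        raise ValueError("STD13 label must be 1..63 octets")
    return bytes([len(data)]) + data


# First label sent as IDNE, "com" as a plain STD13 label, then the root.
wire = encode_idne_label("m\u00c9") + encode_std13_label("com") + b"\x00"
print(wire.hex(" "))
```

The ten octets produced correspond to octets 20 through 29 in the figure
above.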
3.1.2 Example with compression
Using the previous labels, one datagram might contain "www.m*.com" and
"m*.com" (where the "*" is <LATIN CAPITAL LETTER E WITH ACUTE>).
Ignoring the other fields of the message, the domain name portions of
the datagram could look like:
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    20 | 0  1  0  0  0  0  1  0| 0  0  0  0  0  0  1  1|
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    22 |       0x6D (m)        |     0xC3 (e'(1))      |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    24 |     0xA9 (e'(2))      |           3           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    26 |       0x63 (c)        |       0x6F (o)        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    28 |       0x6D (m)        |         0x00          |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
       . . .
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    40 |           3           |       0x77 (w)        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    42 |       0x77 (w)        |       0x77 (w)        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    44 | 1  1|                   20                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The domain name "m*.com" is shown at offset 20. The domain name
"www.m*.com" is shown at offset 40; this definition uses a pointer to
concatenate a label for www to the previously defined "m*.com".
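How a receiver might walk such a message can be sketched in Python. The
sketch below is an assumption-laden illustration, not a reference
implementation: read_name() is a hypothetical helper, the ELT is the
placeholder 0b000010, the filler octets stand in for the message fields
the figure omits, and the cap on pointer hops is an arbitrary guard
against pointer loops.

```python
IDNE_ELT = 0b000010  # placeholder extended label type from this document


def read_name(msg: bytes, offset: int) -> list[str]:
    """Return the labels of the domain name starting at `offset`."""
    labels = []
    hops = 0
    while True:
        first = msg[offset]
        kind = first >> 6  # top two bits select the label type
        if kind == 0b11:
            # STD13 compression pointer: 14-bit offset into the message.
            hops += 1
            if hops > 32:
                raise ValueError("too many compression pointers")
            offset = ((first & 0x3F) << 8) | msg[offset + 1]
        elif kind == 0b00:
            if first == 0:
                return labels  # root label ends the name
            labels.append(msg[offset + 1:offset + 1 + first].decode("ascii"))
            offset += 1 + first
        elif kind == 0b01 and (first & 0x3F) == IDNE_ELT:
            size = msg[offset + 1]
            if size == 0:
                raise ValueError("IDNE label size must not be zero")
            labels.append(msg[offset + 2:offset + 2 + size].decode("utf-8"))
            offset += 2 + size
        else:
            raise ValueError(f"unsupported label type at offset {offset}")


# Octets 0-19 and 30-39 are filler for the parts of the message not shown.
message = (
    bytes(20)
    + bytes.fromhex("42 03 6d c3 a9 03 63 6f 6d 00")  # m<e-acute>.com at offset 20
    + bytes(10)
    + bytes.fromhex("03 77 77 77 c0 14")              # www + pointer to offset 20
)
print(read_name(message, 40))
```

Because the pointer may refer to either kind of label, the same loop
handles IDNE labels, STD13 labels, and pointers without special cases.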
4. Label Size
In IDNE, the maximum length of a label is 255 octets, and the maximum
size for a domain name is 1023 octets. The reason for using these values
is so that IDNE labels can have the same number of characters as the
ASCII-based labels in [STD13]. Because character encoding in UTF-8 is
variable length, the maximum octet length for characters expected in the
foreseeable future (that is, 4 octets for a single character) was used.
Note that this extension allows some IDNE labels to be longer than 63
characters and some IDNE names to be longer than 255 octets.
Software creating DNS queries or responses using IDNE MUST verify that,
after IDN preparation and transformation to UTF-8, no labels are longer
than 255 octets and no names are longer than 1023 octets. If
there is a user interface associated with the process creating the query
or response, that interface SHOULD give the user an error message.
Software MUST NOT transmit DNS queries or responses which contain labels
that are longer than 255 octets or names that are longer than 1023
octets. Servers MUST NOT accept DNS queries or responses which contain
labels that are longer than 255 octets or names that are longer than
1023 octets, and MUST send the NOTIMPL RCODE error message if such
queries or responses are received.
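The limits follow from simple arithmetic: 63 characters at 4 octets each
is 252 octets, which fits in 255, and 255 characters at 4 octets each is
1020 octets, which fits in 1023. A sender-side check could look like the
Python sketch below. The function name is hypothetical, and the way the
name length is counted (type octet, size octet, and data for every
label, plus the root octet) is an assumption, since this document does
not spell out the accounting.

```python
MAX_LABEL_OCTETS = 255
MAX_NAME_OCTETS = 1023


def check_idne_sizes(labels):
    """Return the assumed wire size of the name, or raise ValueError.

    `labels` holds already-prepared labels as Python strings.
    """
    total = 1  # terminating root octet (assumed to count toward the name)
    for label in labels:
        octets = len(label.encode("utf-8"))
        if octets == 0 or octets > MAX_LABEL_OCTETS:
            raise ValueError(f"label is {octets} octets, limit is {MAX_LABEL_OCTETS}")
        total += 2 + octets  # type octet + size octet + UTF-8 data
    if total > MAX_NAME_OCTETS:
        raise ValueError(f"name is {total} octets, limit is {MAX_NAME_OCTETS}")
    return total


# 63 characters of 4 octets each still fit in one label.
print(check_idne_sizes(["\U00010000" * 63, "com"]))
```

A user interface would catch the ValueError and show the message to the
user instead of sending the query.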
5. UDP Packet Size
IDNE-capable senders and receivers MUST support UDP packet sizes of 1220
octets, not including IP and UDP headers (note that the minimum MTU for
IPv6 is 1280 [RFC2460]). A sender MUST announce its capability in the
OPT pseudo-RR described in section 4.3 of [RFC2671] by setting the CLASS
field (the sender's UDP payload size) to a value greater than or equal
to 1220.
6. Canonicalization, Prohibited Characters, and Case Folding
The string in the label MUST be pre-processed as described in [NAMEPREP]
before the query or response is prepared. A query or response MUST NOT
contain a label that does not conform to [NAMEPREP].
7. Versions of IDNE
The IDN protocol version number MUST be included in the OPT RR RDATA of
EDNS (described in Section 4.4 of [RFC2671]). An OPTION-CODE will be
assigned by IANA for storing the IDNE protocol version number; this
document uses 0x0001 for the OPTION-CODE. The value (that
is, the OPTION-DATA) is the version number coded in 8 bits.
All requesters MUST send this information as part of the OPT RR included
in the EDNS packet.
7.1 This version of IDNE
This document describes version 1 of IDNE. This version is a combination
of the protocol in this document and the rules as described in
[NAMEPREP]. Note that [NAMEPREP] describes a single version of the list
of canonicalization, case folding, and prohibited characters, and that
this document is linked to that single version of [NAMEPREP].
The identifiers for this specification are:
OPTION-CODE = 0x0001 (IDNE protocol version)
OPTION-LENGTH = 0x0001 (1 octet following)
OPTION-DATA = 0x01 (IDNE protocol version 1)
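Putting the identifiers above together with the payload size
requirement of section 5, an OPT pseudo-RR carrying the IDNE version
might be assembled as in the following Python sketch. The field layout
follows the OPT format in [RFC2671]; the function name is hypothetical,
the OPTION-CODE is the placeholder 0x0001 from this document, and the
extended RCODE and flags field is simply set to zero.

```python
import struct

OPT_TYPE = 41              # TYPE value of the OPT pseudo-RR in RFC 2671
IDNE_OPTION_CODE = 0x0001  # placeholder; the real code is to be assigned by IANA
IDNE_VERSION = 1


def build_opt_rr(udp_payload_size: int = 1220) -> bytes:
    """Build an OPT pseudo-RR that announces the IDNE version."""
    if udp_payload_size < 1220:
        raise ValueError("IDNE requires a UDP payload size of at least 1220")
    # OPTION-CODE (2 octets), OPTION-LENGTH (2 octets), OPTION-DATA (1 octet)
    option = struct.pack("!HHB", IDNE_OPTION_CODE, 1, IDNE_VERSION)
    # NAME = root, TYPE = OPT, CLASS = payload size, TTL = 0, RDLEN
    header = struct.pack("!BHHIH", 0, OPT_TYPE, udp_payload_size, 0, len(option))
    return header + option


print(build_opt_rr().hex(" "))
```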
7.2 Creating new versions of IDNE
A new version of IDNE is created by a standards-track RFC that
specifies:
- a normative reference to [NAMEPREP] or a successor document to
[NAMEPREP]
- an IDNE version number that is 1 greater than the highest IDNE version
number at the time the RFC is published
If there are any changes to the encoding or interpretation of the
protocol, they must also be specified in the same standards-track RFC.
7.3 Prohibited characters and versions of IDNE
If a server receives a request containing an illegal or unknown
character (as described in the version number in the request), it MUST
send a NOTIMPL RCODE to the client. For example, if a server that
understands both version 1 and version 2 receives a request that is
marked as version 1, but contains a label that includes a character that
is prohibited in version 1 but allowed in version 2, that server must
still send a NOTIMPL RCODE to the client.
8. API Specifications
The current API for TCP/IP uses gethostbyname and gethostbyaddr for IPv4
and getipnodebyname and getipnodebyaddr (specified in RFC 2553) for
both IPv4 and IPv6. These function calls return hostent structs, where
the h_name field is a pointer to char. In this context, an application
that receives a UTF-8 string needs to know that UTF-8 can use more than
one octet per character.
A new flag "IDN" (to appear in netdb.h) is defined to be passed in the
flags argument of getipnodebyname and getipnodebyaddr. This flag tells
the resolver to request an IDNE-encoded name. No new return code is
defined since the return codes in RFC 2553 are meaningful in the IDNE
context.
Developers who have not yet converted their code to IPv6 and still want
to enable IDNs with this API can define the getipnodeby* functions as
macros that map to the IPv4 gethostby* functions, accept the "IDN" flag,
and then process the call differently based on the presence of the flag.
9. Transition and Deployment
Deployment of this proposal means updating clients and servers, as well
as applications and protocols, and therefore a transition strategy is
proposed. Because many DNS servers do not yet handle IDNE and may take
years or decades to do so, an ASCII-compatible encoding (ACE) format for
IDN names is also needed as a transition to an all-IDNE DNS. Note that
IDNE and an ACE are not related, and do not interact in the DNS. If the
IETF chooses to have an ACE mechanism in use at the same time as IDNE,
it would be wise to choose an ACE that allows as many characters as
possible in the name parts and full names.
IDNE allows names with as many characters as current names. This means
that it is possible to create names in IDNE that are longer than those
that can be created in the ACE protocols that have been described so
far. Although not prohibited, it is unwise to create a name that can be
legally represented in IDNE but not in the ACE, or a name that can be
legally represented in the ACE but not in IDNE.
The IETF should periodically evaluate the benefits and problems
associated with having three different formats for names (STD13, IDNE,
and ACE). If at some point it is decided that the problems outweigh the
benefits, the IETF can state a time when one or more of the services
should not be used on the Internet.
10. Root Server Considerations
Because this specification uses EDNS, root servers should be prepared to
receive EDNS requests. This specification handles IDN top-level domains
in exactly the same fashion as it does every other domain.
Considerations about IDN top-level domains are outside of this work, but
the first IDN top-level domains would require all root servers to be
ready for IDNE requests.
11. IANA Considerations
[[ TBD. This section will have two parts. The first will request an EDNS
option code. The second will specify how IDNE version numbers are
allocated (namely, standards-track RFC only). ]]
12. Security Considerations
Because IDNE uses EDNS, it inherits the same security considerations as
EDNS.
Much of the security of the Internet relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of the
Internet.
Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host name.
Because this document normatively refers to [NAMEPREP] and [RFC2671],
it includes the security considerations from those documents as well.
13. References
[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.
[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
1: Architecture and Basic Multilingual Plane. Five amendments and a
technical corrigendum have been published up to now. UTF-16 is described
in Annex Q, published as Amendment 1. 17 other amendments are currently
at various stages of standardization. [[[ THIS REFERENCE NEEDS TO BE
UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]
[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO
10646", January 1998, RFC 2279.
[RFC2460] Steve Deering & Bob Hinden, "Internet Protocol, Version 6 (IPv6)
Specification", December 1998, RFC 2460.
[RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August
1999, RFC 2671.
[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035).
[UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
A. Acknowledgements
This document is the result of the thinking of many people. The following
people made significant comments on the early drafts:
Andre Cormier
Andrew Draper
Bill Sommerfeld
Francois Yergeau
B. Changes from -01 to -02
None.
C. Authors' Addresses
Marc Blanchet
Viagenie
2875 boul. Laurier, bureau 300
Sainte-Foy, QC G1V 2M2 Canada
Marc.Blanchet@viagenie.qc.ca
Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org


@ -1,540 +0,0 @@
INTERNET-DRAFT Hongbo Shi
draft-ietf-idn-iptr-02.txt Waseda University
17 May 2001 Jiang Ming Liang
Expires: 17 November 2001 i-DNS.net
Internationalized PTR Resource Record (IPTR)
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This draft attempts to address the problem of how an IP address SHOULD
be properly mapped to a set of Internationalized Domain Names (IDNs).
It is currently unspecified how a PTR record can be used for this
purpose. In addition, the syntax of the PTR resource record may be
too restrictive for such a mapping in a more culturally meaningful
context. This document suggests a new TYPE called IPTR using EDNS0
and a mechanism to combine language information with such a mapping.
1. Introduction
Reverse mapping is a very important and essential function in the DNS.
In today's Domain Name System, PTR RRs are used to support address-
to-domain mappings. However, a current PTR RR does not provide support
for proper address-to-IDN mappings, without certain modifications.
Modifying the PTR structure will also affect the current reverse
Shi, Jiang [Page 1]
INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001
mapping architecture. This document describes a new RR TYPE named IPTR
to provide address-to-IDN mappings. It also specifies that on receiving
an IPTR query a name server should respond with all the corresponding
IPTR RRs in one response. In short, "one IP, several IDNs".
1.1 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in RFC
2119 [RFC2119].
1.2 Background and Designs
When Internationalized Domain Names come into wide use, an Internet
host is likely to have domain names in different languages. In
today's Internet, even though [RFC2181] clarifies the use of PTR
records, the design of the PTR mapping algorithm and the implementation
of most resolvers still limit IP address to domain name mapping to
"one IP, one domain name".
For example, BIND treats PTR records specially: the normal sorting
preference (e.g. cyclic or random) does not apply, and a "fixed" order
is always used. A client that queries a BIND server and does not look
beyond the first PTR RR therefore sees the same name no matter how many
times it queries. In other words, a PTR RRset differs from an A RRset,
where the first record in the RRset might differ from query to query.
This is even more restrictive in a world of IDNs, where a name in a
particular language has to be chosen. In short, given how PTR is used,
there is no point in returning an IDN in a language the client does not
know.
The authors also believe that putting language information into
address-to-name mappings will be beneficial to future applications.
The design purpose of the IPTR RR type is to provide a mechanism that
can map an IP address to the corresponding IDN per language. It also
means that IPTR suggests a new mapping algorithm for the reverse
mapping that uses language information.
CNAME MUST continue to work for IPTR as it works now for PTR records.
The behavior of a resolver on the use of IPTR will be specified in a
separate draft or a later version of this draft.
1.3 Functional Description
DNS queries and responses involving the IPTR type MUST have the
following properties:
- When the QTYPE is IPTR, the corresponding IDNs SHOULD be
returned in one response.
- The characters in the label MUST be encoded using UTF-8
[RFC2279].
- The entire label MUST be encoded using EDNS [RFC2671].
- Exceptional handling of PTR for the IDN is REQUIRED.
2. IPTR definition
The structure of an IPTR RR is somewhat like the MX RR. In addition to
the IP address in the IN-ADDR.ARPA domain and the domain name field
(similar to a PTR RR), a new field called LANGUAGE has been defined.
A domain name in an IPTR RR MUST be encoded in UTF-8, and an IDN in this
document MUST be prepared as described in [NAMEPREP]. Below is an
example of an IPTR RR:
1.2.3.4.IN-ADDR.ARPA. IPTR "LANGUAGE" "name-in-utf8"
[RFC1766] describes the ISO 639/ISO 3166 conventions. A language name
is always written in lower case, while country codes are written in
upper case. Here, the "LANGUAGE" field in an IPTR RR SHOULD be compared
in a case-insensitive manner and MUST follow the conventions defined
in [RFC1766].
For Example:
4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name-in-utf8"
4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name-in-utf8"
4.3.2.1.IN-ADDR.ARPA. IPTR "ja-JP" "name-in-utf8"
4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name-in-utf8"
The notion of canonical names and aliases described in 3.6.2
[RFC1034], and 10.2 [RFC2181] MUST be preserved for IPTR record types.
An IPTR RR SHOULD be limited to one primary IDN per LANGUAGE, similar
to a PTR RR.
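To make the record layout concrete, the Python sketch below parses IPTR
lines in the presentation form shown above and indexes them by LANGUAGE,
comparing tags case-insensitively. The type and function names are
hypothetical, the quoted-string syntax is taken from the examples in
this section rather than from a defined master file format, and the
names in the sample data are placeholders.

```python
import shlex
from typing import NamedTuple


class IPTRRecord(NamedTuple):
    owner: str
    language: str
    name: str


def parse_iptr(line: str) -> IPTRRecord:
    """Parse one line such as: 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name"."""
    owner, rrtype, language, name = shlex.split(line)
    if rrtype.upper() != "IPTR":
        raise ValueError(f"not an IPTR record: {rrtype}")
    return IPTRRecord(owner, language, name)


def index_by_language(records):
    """Group IDNs by lower-cased language tag (tags compare case-insensitively)."""
    index = {}
    for record in records:
        index.setdefault(record.language.lower(), []).append(record.name)
    return index


zone = [
    '4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name1-in-utf8"',
    '4.3.2.1.IN-ADDR.ARPA. IPTR "ja-JP" "name2-in-utf8"',
]
index = index_by_language(parse_iptr(line) for line in zone)
print(index["ZH-cn".lower()])
```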
3. IPTR on IPv6
Mapping IPv6 to IDNs can be similarly supported. This document
recommends continuing to use the IP6.INT domain defined in [RFC1886] for
IPTR mappings. For example, the lookup corresponding to the address
4321:0:1:2:3:4:567:89ab would be:
b.a.9.8.7.6.5.0.4.0.0.0.3.0.0.0.2.0.0.0.1.0.0.0.0.0.0.0.1.2.3.4.IP6.INT.
IPTR "LANGUAGE" "name-in-utf8"
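The owner name in the example can be derived mechanically: expand the
address to its 32 hexadecimal nibbles, reverse them, and append the
IP6.INT suffix. A minimal Python sketch follows, using the standard
ipaddress module; the function name is invented for illustration.

```python
import ipaddress


def ip6_int_name(address: str) -> str:
    """Return the nibble-reversed IP6.INT owner name for an IPv6 address."""
    # .exploded gives all eight groups as four lower-case hex digits each.
    nibbles = ipaddress.IPv6Address(address).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".IP6.INT."


print(ip6_int_name("4321:0:1:2:3:4:567:89ab"))
```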
4. Packet format for IPTR
EDNS0 [RFC2671] is REQUIRED to implement IPTR.
0 1 2 3 4
bits 0 1 2 3 4 5 6 7 8 9 0 1...9 0...8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 ...
+-+-+-+-+-+-+-+-+-+-//-+-+-//-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+
|0 1| ELT | LANGUAGE | Size | IDN label... |
+-+-+-+-+-+-+-+-+-+-//-+-+-//-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+
LANGUAGE: An argument for IPTR to define the kind of language
used in the following IDN label. The size is 2 octets.
ELT: To be defined in [IDNE].
5. Coexistence
5.1 IDN Consideration
IPTR described above is based on "a set of IDNs", strictly speaking, a
set of canonical IDNs. On the other hand, confusion about IDN, such as
"IDN MUST exist with ASCII domain name", has led to a belief that a PTR
record should have exactly one RR in its RRSet. In short, the "IDN
ONLY" case will exist. Thus, exceptional handling of PTR is
REQUIRED.
On the other hand, IDN is still RECOMMENDED to exist with more than
one ASCII domain name.
5.2 PTR Extension
In the case of "IDN ONLY", if an IPTR RR is present, the PTR RR MUST
contain a domain name in ACE so that IDN-unaware systems can still be
served. Otherwise a "Syntax Error" message SHOULD be reported when an
administrator configures the DNS zone files.
5.3 IPTR and PTR
This provides backward compatibility for IDN-unaware systems that
cannot provide the IPTR function. In addition, if a client cannot find
an IDN for the corresponding LANGUAGE, then the corresponding PTR RR
SHOULD be used as the answer.
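The fallback rule can be expressed as a small selection function. The
Python sketch below is only one possible reading: the function name and
the (language, name) pair representation are assumptions, and the ACE
string in the usage line is a made-up placeholder, not a real encoding.

```python
def pick_reverse_name(iptr_records, ptr_name, language):
    """Choose the name to present for a reverse lookup.

    iptr_records: iterable of (language, name) pairs from the IPTR RRSet.
    ptr_name:     the name from the PTR RR (ACE or ASCII), used as fallback.
    language:     the language tag the client is looking for.
    """
    wanted = language.lower()
    for lang, name in iptr_records:
        if lang.lower() == wanted:
            return name
    return ptr_name  # no IDN in the requested language: fall back to PTR


records = [("zh-TW", "name1-in-utf8"), ("ja-JP", "name2-in-utf8")]
print(pick_reverse_name(records, "ace-name.example", "ko-KR"))
```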
6. IPTR query/response
When the QTYPE is IPTR in a query, all of the corresponding IPTR RRs
SHOULD be returned in one response. DNS messages are limited to 512
octets or less in size when sent over UDP. Therefore, for the case
where all the RRs cannot fit in one UDP packet, this draft describes two
solutions: one for the current environment and one for the near future.
6.1 Transport
Today, DNS queries and responses are carried in UDP datagrams or over
TCP connections [RFC1034]. As specified above, the IPTR RRSet is
RECOMMENDED to be returned in one response. The size of a DNS message
could exceed 512 octets when multiple RRs are present. Therefore, this
draft makes the following two recommendations.
- "Use UDP first; if UDP is not large enough, then change to TCP" is
RECOMMENDED.
The server MUST send back the response with the TC bit set. Then
the resolver SHOULD resend the query using TCP on server port
53 (decimal). This behavior is consistent with the current DNS
specification [RFC1035].
- In the future, EDNS0 is REQUIRED to send large packets.
Before a client sends a query for an IPTR record, it MUST first
determine whether the server supports EDNS0. If the server
supports EDNS0, then the client MAY send the IPTR query.
Otherwise, the client MUST change the QTYPE to PTR.
With EDNS0, the size of the UDP payload is no longer limited to 512
octets.
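The two recommendations amount to a size comparison on the server side.
The following Python sketch shows that decision in isolation; the
function name, the return strings, and the idea of passing the
requester's advertised EDNS0 payload size as a parameter are all
assumptions made for illustration.

```python
CLASSIC_UDP_LIMIT = 512  # octets, the limit for DNS over UDP without EDNS0


def plan_response(response_size, edns_payload_size=None):
    """Decide how a response of `response_size` octets reaches the client.

    edns_payload_size is the UDP payload size the requester advertised
    through EDNS0, or None if the query carried no OPT pseudo-RR.
    """
    limit = CLASSIC_UDP_LIMIT
    if edns_payload_size is not None:
        limit = max(limit, edns_payload_size)
    if response_size <= limit:
        return "udp"
    # Too large: answer with the TC bit set so the resolver retries over TCP.
    return "udp-truncated-then-tcp"


print(plan_response(900))
print(plan_response(900, edns_payload_size=1220))
```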
6.2 Standard sample
A resolver that wants to find the IDNs corresponding to the IP
address 1.2.3.4 would send a query of the form QTYPE=IPTR,
QCLASS=IN, QNAME=4.3.2.1.IN-ADDR.ARPA, and would receive:
           +------------------------------------------------------+
Header     | OPCODE=SQUERY, RESPONSE, AA                          |
           +------------------------------------------------------+
Question   | QNAME=4.3.2.1.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=IPTR     |
           +------------------------------------------------------+
Answer     | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name1-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name2-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "ja-JP" "name3-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name4-in-utf8"   |
           +------------------------------------------------------+
Authority  | ...                                                  |
           +------------------------------------------------------+
Additional | ...                                                  |
           +------------------------------------------------------+
7. IPTR Usage
The "foo1.example" names in the following samples MAY or MAY NOT be
represented in the same characters.
4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[foo1.example] in utf8"
                     IPTR "zh-CN" "[foo1.example] in utf8"
                     IPTR "ja-JP" "[foo1.example] in utf8"
                     IPTR "ko-KR" "[foo1.example] in utf8"
Moreover,
4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[foo1.example] in utf8"
                     IPTR "zh-TW" "[foo2.example] in utf8"
                     ...
                     IPTR "zh-CN" "[foo1.example] in utf8"
                     IPTR "zh-CN" "[foo2.example] in utf8"
                     ...
                     IPTR "ja-JP" "[foo1.example] in utf8"
                     IPTR "ja-JP" "[foo2.example] in utf8"
                     ...
                     IPTR "ko-KR" "[foo1.example] in utf8"
                     IPTR "ko-KR" "[foo2.example] in utf8"
                     ...
may also exist. "foo2.example" MUST be different from
"foo1.example" if they are assigned the same LANGUAGE. Otherwise a
"Syntax Error" SHOULD be reported when an administrator configures
the zone files. Furthermore, the "foo2.example" names in the samples
above MAY or MAY NOT be represented in the same characters.
Thus,
4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[samefoo.sample] in utf8"
                     IPTR "zh-TW" "[samefoo.sample] in utf8"
causes a "Syntax Error".
And,
4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[samefoo.sample] in utf8"
                     IPTR "zh-TW" "[difffoo.sample] in utf8"
                     IPTR "zh-CN" "[samefoo.sample] in utf8"
                     IPTR "ja-JP" "[samefoo.sample] in utf8"
                     IPTR "ko-KR" "[samefoo.sample] in utf8"
is allowed.
8. Changes
Following the discussion at the IETF 49 meeting in San Diego, we
deleted the section "Open Issues" that appeared in the previous
version (01) of this draft.
References
[IDNREQ] Zita Wenzel & James Seng, "Requirements of Internationalized
Domain Names", draft-ietf-idn-requirements.
[IDNE] Marc Blanchet & Paul Hoffman, "Internationalized domain
names using EDNS", draft-ietf-idn-idne.
[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.
[RFC1034] P. Mockapetris, "DOMAIN NAMES - CONCEPTS AND FACILITIES",
November 1987, RFC1034
[RFC1035] P. Mockapetris, "DOMAIN NAMES - IMPLEMENTATION AND
SPECIFICATION", November 1987, RFC1035
[RFC1766] H. Alvestrand, "Tags for the Identification of
Languages", March 1995, RFC 1766
[RFC1886] S. Thomson, C. Huitema, "DNS Extensions to support IP
version 6", December 1995, RFC1886
[RFC2181] R. Elz, R. Bush, "Clarifications to the DNS
Specification", July 1997, RFC2181
[RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO
10646", January 1998, RFC 2279.
[RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)",
August 1999, RFC 2671.
[ISO 639] ISO 639:1988 (E/F) - Code for the representation of names
of languages - The International Organization for Standardization,
1st edition, 1988 17 pages Prepared by ISO/TC 37 - Terminology
(principles and coordination).
[ISO 3166] ISO 3166:1988 (E/F) - Codes for the representation of
names of countries - The International Organization for
Standardization, 3rd edition, 1988-08-15.
Acknowledgements
James Seng and Yoshiro Yoneya have given many comments in our
e-mail discussions. Harald Alvestrand and Mark Davis have given many
suggestions in the idn-wg mailing list discussions. Many other
people have also given us their comments in the idn-wg and
BIND-user mailing list discussions.
Authors' Information
Hongbo Shi
Waseda University
3-4-1 Okubo, Shinjyuku-ku
Tokyo, 169-8555 Japan
shi@goto.info.waseda.ac.jp
Jiang Ming Liang
i-DNS.net
8 Temasek Boulevard
#24-02 Suntec Tower Three
Singapore 038988
jiang@i-DNS.net

@ -1,655 +0,0 @@
IETF IDN Working Group Editors Zita Wenzel, James Seng
Internet Draft draft-ietf-idn-requirements-09.txt
21 November 2001 Expires 21 May 2002
Requirements of Internationalized Domain Names
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026 [8].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or made obsolete by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Intended Scope
The intended scope of this document is to explore requirements for the
internationalization of domain names on the Internet. It is not
intended to document user requirements. It is recommended that
solutions not necessarily be within the DNS itself, but could be a layer
interjected between the application and the DNS. Proposals SHOULD
fulfill most, if not all, of the requirements. This document MAY be
updated based on actual trials.
Abstract
This document describes the requirements for encoding international
characters into DNS names and records. This document is guidance for
developing protocols for internationalized domain names.
1. Introduction
At present, the encoding of Internet domain names is restricted to a
subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
other text based protocols on the Internet have already been at least
partially internationalized. It is important for domain names to be
similarly internationalized or for an equivalent solution to be found.
This document assumes that the most effective solution involves putting
non-ASCII names inside some parts of the overall DNS system although
this assumption may not be the consensus of the IETF community.
However, several sections of this document, including "Definitions and
Conventions" should be useful in any case. A reasonable familiarity
with DNS terminology is assumed in this document.
This document is being discussed on the "idn" mailing list. To join the
list, send a message to <majordomo@ops.ietf.org> with the words
"subscribe idn" in the body of the message. Archives of the mailing
list can also be found at ftp://ops.ietf.org/pub/lists/idn*.
1.1 Definitions and Conventions
A language is a way that humans interact. In computerized form, a text
in a written language can be expressed as a string of characters.
The same set of characters can often be used for many written languages,
and many written languages can be expressed using different scripts.
The same characters are often shown with somewhat different glyphs
(shapes) for display of a text depending on the font used, the
automatic shaping applied, or the automatic formation of ligatures. In
addition, the same characters can be shown with somewhat different
glyphs (shapes) for display of a text depending on the language being
used, even within the same font or through automatic font change.
Character: A character is a member of a set of elements used for
organization, control, or representation of textual data.
Graphic character: A graphic character is a character, other than a
control function, that has a visual representation normally
handwritten, printed, or displayed.
Characters mentioned in this document are identified by their position
in the Unicode character set. This character set is also
known as the UCS (ISO 10646) [19]. The notation U+12AB, for example,
indicates the character at position 12AB (hexadecimal) in the Unicode
character set. Note that the use of this notation is not an
indication of a requirement to use Unicode.
Examples quoted in this document should be considered as a method to
further explain the meanings and principles adopted by the document. It
is not a requirement for the protocol to satisfy the examples.
Unicode Technical Report #17 [24] defines a character encoding
model in several levels (much of the text below is quoted from
Unicode Technical Report #17).
[N.B. Sections 1-6 below to be unpacked and reworded to be
independent of the Unicode Technical Report #17.]
1. An abstract character repertoire (ACR) is defined as the set of
abstract characters to be encoded, normally a familiar alphabet
or symbol set. The word abstract just means that these objects
are defined by convention (such as the 26 letters of the English
alphabet, uppercase and lowercase forms). Examples: the ASCII
repertoire, the Latin 9 repertoire, the JIS X 0208 repertoire,
the UCS repertoire (of a particular version).
2. A coded character set (CCS) is defined to be a mapping from a
set of abstract characters to the set of non-negative integers.
This range of integers need not be contiguous. An abstract
character is defined to be in a coded character set if the coded
character set maps from it to an integer. That integer is said
to be the code point for the abstract character. That abstract
character is then an encoded character. Examples: ASCII, Latin-15,
JIS X 0208, the UCS.
3. A character encoding form (CEF) is a mapping from the set of integers
used in a CCS to the set of sequences of code units. A code unit
is an integer occupying a specified binary width in a computer
architecture, such as a septet, an octet, or a 16-bit unit. The
encoding form enables character representation as actual data in
a computer. The sequences of code units do not necessarily have the
same length. Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8.
4. A character encoding scheme (CES) is a mapping of code units into
serialized octet sequences. Character encoding schemes are relevant
to the issue of cross-platform persistent data involving code units
wider than a byte, where byte-swapping may be required to put data
into the byte polarity canonical for a particular platform.
The CES may involve two or more CCS's, and may include code units
(e.g., single shifts, SI/SO, or escape sequences) that are not part
of the CCS per se, but which are defined by the character encoding
architecture and which may require an external registry of particular
values (as for the ISO 2022 escape sequences). In such a case, the
CES is called a compound CES. (A CES that only involves a single
CCS is called a simple CES.) Examples: ASCII, Latin-15, Shift-JIS,
UTF-16BE, UTF-16LE, UTF-8.
5. The mapping from an abstract character repertoire (ACR) to a
serialized sequence of octets is called a Character Map (CM). A simple
character map thus implicitly includes a CCS, a CEF, and a CES,
mapping from abstract characters to code units to octets. A compound
character map includes a compound CES, and thus includes more than one
CCS and CEF. In that case, the abstract character repertoire for the
character map is the union of the repertoires covered by the coded
character sets involved.
A sequence of encoded characters must be unambiguously
mapped onto a sequence of octets by the charset. The charset must be
specified in all instances, as in Internet protocols, where textual
content is treated as an ordered sequence of octets, and where the
textual content must be reconstructible from that sequence of
octets. Charset names are registered by the IANA according to
procedures documented in RFC 2278 [12]. In many cases, the same
name is used for both a character map and for a character encoding
scheme, such as UTF-16BE. Typically this is done for simple
character maps when such usage is clear from context.
6. A transfer encoding syntax (TES) is a reversible transform of encoded
data which may (or may not) include textual data represented in
one or more character encoding schemes. Examples: 8bit,
Quoted-Printable, BASE64, UTF-7 (defunct), UTF-5, and RACE.
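The levels above can be made concrete with a small Python sketch. It
assumes Python's built-in codecs as stand-ins for the CEF/CES pairs and
uses BASE64 as the TES; the function name encoding_levels is
hypothetical.

```python
import base64


def encoding_levels(text: str) -> dict:
    """Show one string at several levels of the character encoding model."""
    utf8 = text.encode("utf-8")
    return {
        # CCS: each abstract character maps to a non-negative integer.
        "code_points": [ord(ch) for ch in text],
        # CEF + CES: UTF-8 code units are octets, so byte order is not an issue.
        "utf-8": utf8,
        # One CEF (UTF-16), two CESs that differ only in byte order.
        "utf-16be": text.encode("utf-16-be"),
        "utf-16le": text.encode("utf-16-le"),
        # TES: a reversible transform applied to already-encoded data.
        "base64": base64.b64encode(utf8).decode("ascii"),
    }
```

For the single character U+00E9, the sketch yields one code point, two
UTF-8 octets, and two UTF-16 serializations that differ only in octet
order.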
1.2 Description of the Domain Name System
The Domain Name System is defined by RFC 1034 [4] and RFC 1035 [5], with
clarifications, extensions and modifications given in RFC 1123 [6],
RFC 1996 [7], RFC 2181 [10], and others. Of special importance here are the
security extensions described in RFC 2535 [14] and related RFCs.
Over the years, many different words have been used to describe the
components of resource naming on the Internet (e.g., URI, URN); to make
certain that the set of terms used in this document is well-defined and
unambiguous, the definitions are given here.
Master server: A master server for a zone holds the main copy of that
zone. This copy is sometimes stored in a zone file. A slave server for
a zone holds a complete copy of the records for that zone. Slave
servers MAY be either authorized by the zone owner (secondary servers)
or unauthorized (sometimes called "stealth secondaries"). Master and
authorized slave servers are listed in the NS records for the zone,
and are termed "authoritative" servers. In many contexts outside this
document, the term "primary" is used interchangeably with "master" and
"secondary" is used interchangeably with "slave".
Caching server: A caching server holds temporary copies of DNS
records; it uses records to answer queries about domain names. Further
explanation of these terms can be found in RFC 1034 [4] and RFC 1996
[7].
DNS names can be represented in multiple forms, with different
properties for internationalization. The most important ones are:
- Domain name: The binary representation of a name used internally in
the DNS protocol. This consists of a series of components of 1-63
octets, with an overall length limited to 255 octets (including the
length fields).
- Master file format domain name: This is a representation of the name
as a sequence of characters in some character set; the common
convention (derived from RFC 1035 [5] section 5.1) is to represent the
octets of the name as ASCII characters where the octet is in the set
corresponding to the ASCII values for [a-z,A-Z,0-9,-], using an escape
mechanism (\x or \NNN) where not, and separating the components of the
name by the dot character (".").
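The two forms above can be sketched in Python as follows. This is a
minimal illustration, not a complete implementation: the function names
are hypothetical, name compression is ignored, and the escape used is
the decimal \DDD form of RFC 1035 section 5.1.

```python
from typing import List

# Octets shown as-is in the common master file convention: [a-zA-Z0-9-].
_PLAIN = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"
)


def to_wire(labels: List[bytes]) -> bytes:
    """Encode components as length-prefixed octets ending with the root."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each component must be 1-63 octets")
        out.append(len(label))      # one-octet length field
        out += label
    out.append(0)                   # zero-length root label ends the name
    if len(out) > 255:
        raise ValueError("name exceeds 255 octets including length fields")
    return bytes(out)


def to_master_file(labels: List[bytes]) -> str:
    """Render components as text, escaping other octets as \\DDD (decimal)."""
    def render(label: bytes) -> str:
        return "".join(
            chr(octet) if octet in _PLAIN else "\\%03d" % octet
            for octet in label
        )
    return ".".join(render(label) for label in labels) + "."
```

Note that a dot inside a component is escaped in the textual form, since
an unescaped dot separates components.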
The form specified for most protocols using the DNS is a limited form of
the master file format domain name. This limited form is defined in
RFC 1034 [4] Section 3.5 and RFC 1123 [6]. In most implementations of
applications today, domain names in the Internet have been limited to
the much more restricted forms used, e.g., in email, which defines its
own rules. Those names are limited to the upper- and lower-case
letters a-z (interpreted in a case-independent fashion), the digits,
and the hyphen-minus, all in ASCII.
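A check for this restricted form might look like the following Python
sketch. It tests only the character set and the 63-octet component
limit stated in this section; the function names are hypothetical, and
further rules (for example on where a hyphen may appear) are not
enforced.

```python
import re

# Letters a-z in either case, digits, and hyphen-minus, 1-63 per component.
_LDH_LABEL = re.compile(r"[A-Za-z0-9-]{1,63}")


def is_restricted_name(name: str) -> bool:
    """Return True if every component uses only letters, digits, hyphen."""
    if name.endswith("."):
        name = name[:-1]            # ignore one trailing root dot
    if not name:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))


def same_restricted_name(a: str, b: str) -> bool:
    """Compare two restricted names in a case-independent fashion."""
    if not (is_restricted_name(a) and is_restricted_name(b)):
        raise ValueError("both names must be in the restricted form")
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```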
1.3 Definition of "hostname" and "Internationalized Domain Name"
Hostname:
In the DNS protocols, a name is referred to as a sequence of octets.
However, when discussing requirements for internationalized domain
names, what we are looking for are ways to represent characters that
are meaningful for humans.
Internationalized Domain Name:
In this document, this representation is referred to as a
"hostname". While this term has been used for many different purposes
over the years, it is used here in the sense of sequence of characters
(not octets) representing a domain name conforming to the limited
hostname syntax specified in RFC 952 [3]. This document attempts to
define the requirements for an "Internationalized Domain Name"
(IDN). IDN is defined as a sequence of characters that can be used in
the context of functions where a hostname is used today, but contains
one or more characters that are outside the set of characters
specified as legal characters for host names in RFC 1123 [6].
1.4 A multilayer model of the DNS function
The DNS can be seen as a multilayer function:
- The bottom layer is where the packets are passed across the Internet
in a DNS query and a DNS response. At this level, what matters is
the format and meaning of bits and octets in a DNS packet.
- Above that is the "DNS service", created by an infrastructure of DNS
servers and the NS records that point to those servers, which are in
turn pointed to by the root servers (listed in the "root cache file"
on each DNS server, often called "named.cache"). It is at this level
that the statement "the DNS has a single root" (RFC 2826 [17]) makes
sense, but still, what is being transferred are octets, not
characters.
- Interfacing to the user is a service layer, often called "the resolver
library". It is often embedded in the operating system or system
libraries of the client machines. It is at the top of this layer that
the API calls commonly known as "gethostbyname" and "gethostbyaddr"
reside. These calls have been modified to support IPv6 (RFC 2553 [15]). A
conceptually similar layer exists in authoritative DNS servers,
comprising the parts that generate "meaningful" strings in DNS files.
Due to the popularity of the "master file" format, this layer often
exists only in the administrative routines of the service maintainers.
- The users of this layer (the resolver library) are the application programs
that use the DNS, such as mailers, mail servers, Web clients, Web
servers, Web caches, IRC clients, FTP clients, distributed file
systems, distributed databases, and almost all other applications on
TCP/IP.
Graphically, one can illustrate it like this:
+---------------+                            +---------------------+
|  Application  |                            |     (Base data)     |
+---------------+                            +---------------------+
        |  Application service interface                |
        |  For ex. GethostbyXXXX interface              | (no standard)
+---------------+                            +---------------------+
|   Resolver    |                            |   Auth DNS server   |
+---------------+                            +---------------------+
        |      <----- DNS service interface ----->      |
+------------------------------------------------------------------+
|                           DNS service                            |
|  +-----------------------+              +--------------------+   |
|  | Forwarding DNS server |              | Caching DNS server |   |
|  +-----------------------+              +--------------------+   |
|                                                                  |
|                   +-------------------------+                    |
|                   | Parent-zone DNS servers |                    |
|                   +-------------------------+                    |
|                                                                  |
|                   +-------------------------+                    |
|                   | Root DNS servers        |                    |
|                   +-------------------------+                    |
|                                                                  |
+------------------------------------------------------------------+
1.5 Service model of the DNS
The Domain Name Service is used for multiple purposes, each of which is
characterized by what it puts into the system (the query) and what it
expects as a result (the reply).
The most used ones in the current DNS are:
- Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get
back an IPv4 or IPv6 address.
- Hostname-to-mail server service (MX): As above, but the expected
return value is a hostname and a priority for SMTP servers.
- Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in
in-addr.arpa. or ip6.arpa form respectively) and get back a hostname.
- Domain delegation service (NS): Enter a domain name and get back
nameserver records (designated hosts which provide authoritative
nameservice) for the domain.
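For the address-to-hostname service in the list above, the query name is
derived mechanically from the address. A minimal Python sketch using the
standard ipaddress module follows; the function name ptr_query_name is
hypothetical, and no DNS query is actually sent.

```python
import ipaddress


def ptr_query_name(address: str) -> str:
    """Build the in-addr.arpa or ip6.arpa name used for a PTR lookup."""
    # reverse_pointer reverses the octets (IPv4) or nibbles (IPv6).
    return ipaddress.ip_address(address).reverse_pointer + "."
```

For example, the IPv4 address 192.0.2.1 gives the query name
1.2.0.192.in-addr.arpa., while an IPv6 address gives 32 reversed
hexadecimal digits under ip6.arpa.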
New services are being defined, either as entirely new services (IPv6 to
hostname mapping using binary labels) or as embellishments to other
services (such as DNS Security (DNSSEC) [14], returning information
about whether a given DNS service is performed securely or not).
These services exist, conceptually, at the Application/Resolver
interface, NOT at the DNS-service interface. This document attempts to
set requirements for an equivalent of the "used services" given above,
where "hostname" is replaced by "Internationalized Domain Name". This
does not preclude the fact that IDN should work with any kind of DNS
queries. IDN is a new service. Since existing protocols like SMTP or
HTTP use the old service, it is a matter of great concern how the new
and old services work together, and how other protocols can take
advantage of the new service.
2. General Requirements
These requirements address two concerns: The service offered to the
users (the application service), and the protocol extensions, if needed,
added to support this service.
In the requirements, we attempt to use the term "service" whenever a
requirement concerns the service, and "protocol" whenever a requirement
is believed to constrain the possible implementation.
2.1 Compatibility and Interoperability
[1] The DNS is essential to the entire Internet. Therefore, the service
MUST NOT damage present DNS protocol interoperability. It MUST make the
minimum number of changes to existing protocols on all layers of the
stack. It MUST continue to allow any system anywhere that implements
the IDN specification to resolve any internationalized domain name.
[2] The service MUST preserve the basic concept and facilities of domain
names as described in RFC 1034 [4]. It MUST maintain a single, global,
universal, and consistent hierarchical namespace.
[3] The DNS protocol (the packet formats that go on the wire) MUST
NOT limit the codepoints that can be used. A service defined on top of
the DNS, for instance the IDN-to-address function, MAY limit the
codepoints that can be used. The service descriptions MUST describe
what limitations are imposed.
[4] The protocol MUST work for all features of DNS, IPv4, and
IPv6. The protocol MUST NOT allow an IDN to be returned to a requestor
that requests the IP-to-(old)-domain-name mapping service.
[5] The same name resolution request MUST generate the same response,
regardless of the location or localization settings in the resolver, in
the master server, and in any slave servers involved in the resolution
process.
[6] The protocol MUST NOT require that the current DNS cache
servers be modified to support IDN. If a cache server can have
additional functionality to support IDN better, this additional
functionality MUST NOT cause problems for resolving correctly
functioning current domain names.
[7] A caching server MUST NOT return data in response to a query that
would not have been returned if the same query had been presented to an
authoritative server. This applies fully for the cases when:
- The caching server does not know about IDN
- The caching server implements the whole specification
- The caching server implements a valid subset of the specification
[8] The service MAY modify the DNS protocol (RFC 1035 [5]) and other related
work undertaken by the DNS Extensions (DNSEXT) [2] working group. However,
these changes SHOULD be as small as possible and any changes SHOULD be
coordinated with the DNSEXT working group.
[9] The protocol supporting the service SHOULD be as simple as possible
from the user's perspective. Ideally, users SHOULD NOT realize that IDN
was added on to the existing DNS.
[10] The best solution is one that maintains maximum feasible
compatibility with current DNS standards as long as it meets the other
requirements in this document.
[11] The protocol should handle with care new revisions of the CCS.
Undefined codepoints should not be allowed unless a new revision of
the protocol can handle them. Protocol revisions should be tagged.
2.2 Internationalization
[12] Internationalized characters MUST be allowed to be represented and
used in DNS names and records. The protocol MUST specify what charset is
used when resolving domain names and how characters are encoded in DNS
records.
[13] Codepoints SHOULD be from the Universal Set as defined in
ISO-10646 or Unicode. The specifics of versions MUST be defined in the
proposed solution. If multiple charsets are allowed, each charset MUST
be tagged and conform to RFC 2277 [11].
[14] The protocol MUST NOT reject any non-IDN characters (to be
defined) in any DNS queries or responses.
[15] The protocol SHOULD NOT invent a new CCS for the purpose of IDN
only and SHOULD use an existing CES. The charset(s) chosen SHOULD also be
non-ambiguous.
[16] The protocol SHOULD NOT make any assumptions about the location
in a domain name where internationalization might appear. In other
words, it SHOULD NOT differentiate between any part of a domain name
because this MAY impose restrictions on future internationalization
efforts. For example, the Top-Level Domains (TLDs) can be
internationalized.
[17] The protocol also SHOULD NOT impose any localized restrictions.
For example, an IDN implementation which only allows domain
names to use a single local script would immediately restrict
multinational organizations.
[18] While there are a wide range of devices that use the DNS and a wide
range of characteristics of international scripts and methods of
domain name input and display, IDN is only concerned with the
protocol. Therefore, there MUST be a single way of encoding an
internationalized domain name within the DNS.
2.3 Canonicalization
Matching is a complicated process for IDN. Canonicalization
of characters MUST follow precise and predictable rules to ensure
consistency. "Requirements for String Identity Matching and String
Indexing" [1] is RECOMMENDED as a guide on canonicalization.
The DNS has to match a host name in a request with a host name held
in one or more zones. It also needs to sort names into order. It is
expected that some sort of canonicalization algorithm will be used as
the first step of this process. This section discusses some of the
properties which will be REQUIRED of that algorithm.
[19] To achieve interoperability, canonicalization MUST be done at a
single well-defined place in the DNS resolution process. The protocol
MUST specify canonicalization; it MUST specify exactly where in the
DNS that canonicalization happens and does not happen; it MUST specify
how additions to ISO 10646 will affect the stability of the DNS and
the amount of work done on the root DNS servers.
[20] The canonicalization algorithm MAY specify operations for case,
ligature, and punctuation folding.
[21] In order to retain backward compatibility with the current DNS,
the service MUST retain the case-insensitive comparison for US-ASCII
as specified in RFC 1035 [5]. For example, Latin capital letter A
(U+0041) MUST match Latin small letter a (U+0061). Unicode Technical
Report #21 [25] describes some of the issues with case
mapping. Case-insensitivity for non-US-ASCII characters MUST be discussed in the
protocol proposal.
[22] Case folding MUST be locale independent. If it were
locale-dependent, then different clients would get different results.
For example, Latin capital letter I (U+0049) case folded to lower case
in the Turkish context will become Latin small letter dotless i
(U+0131). But in the English context, it will become Latin small
letter i (U+0069).
[23] If other canonicalization is done, it MUST be done before the
domain name is resolved. Further, the canonicalization MUST be easily
upgradable as new languages and writing systems are added.
[24] Any conversion (case, ligature folding, punctuation folding, etc.)
from what the user enters into a client to what the client asks for
resolution MUST be done identically on any request from any client.
[25] If the charset can be normalized, then it SHOULD be normalized
before it is used in IDN. Normalization SHOULD follow Unicode
Technical Report #15 [23].
[26] The protocol SHOULD avoid inventing a new normalization form
provided a technically sufficient one is available.
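Several of the requirements in this section can be illustrated with a
short Python sketch. It is not a proposed canonicalization algorithm.
It assumes Unicode Normalization Form C for [25] and Python's
locale-independent str.lower() for [22]; all function names are
hypothetical, and the Turkish mapping is included only to show the
locale-dependent behaviour that [22] rules out.

```python
import unicodedata


def ascii_lower(name: str) -> str:
    """Fold only A-Z to a-z, as in the existing US-ASCII comparison [21]."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in name)


def canonicalize(name: str) -> str:
    """Normalize to form C, then fold case without consulting any locale."""
    folded = unicodedata.normalize("NFC", name).lower()
    return unicodedata.normalize("NFC", folded)


# Hypothetical Turkish tailoring: I -> dotless i, I-with-dot -> i.
_TURKISH = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(name: str) -> str:
    """Locale-dependent folding, shown only as a counter-example to [22]."""
    return name.translate(_TURKISH).lower()
```

With these definitions, canonicalize("I") gives U+0069 while
turkish_lower("I") gives U+0131, so two clients using different
foldings would ask for different names.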
2.4 Operational Issues
[27] Zone files SHOULD remain easily editable.
[28] An IDN-capable resolver or server SHALL NOT generate more traffic
than a non-IDN-capable resolver or server would when resolving an
ASCII-only domain name. The amount of traffic generated when resolving
an IDN SHALL be similar to that generated when resolving an ASCII-only
name.
[29] The service SHOULD NOT add new centralized administration for the
DNS. A domain administrator SHOULD be able to create internationalized
names as easily as adding current domain names.
[30] The protocol MUST work with DNSSEC. The protocol MAY break
language sort order.
3. Security Considerations
Any solution that meets the requirements in this document MUST NOT be
less secure than the current DNS. Specifically, the mapping of
internationalized host names to and from IP addresses MUST have the
same characteristics as the mapping of today's host names.
Specifying requirements for internationalized domain names does not
itself raise any new security issues. However, any change to the DNS MAY
affect the security of any protocol that relies on the DNS or on
DNS names. A thorough evaluation of those protocols for security
concerns will be needed when they are developed. In particular, IDNs
MUST be compatible with DNSSEC and, if multiple charsets or
representation forms are permitted, the implications of this for name
spoofing MUST be thoroughly understood.
4. References
[1] World Wide Web Consortium, "Requirements for string identity
matching and String Indexing", http://www.w3.org/TR/WD-charreq, July
1998.
[2] Olafur Gudmundsson, Randy Bush, "IETF DNS Extensions Working Group"
(DNSEXT), namedroppers@ops.ietf.org.
[3] K. Harrenstien, M.K. Stahl, E.J. Feinler, "DoD Internet Host Table
Specification", RFC 952, October 1985.
[4] P. Mockapetris, "Domain Names - Concepts and Facilities",
RFC 1034, November 1987.
[5] P. Mockapetris, "Domain Names - Implementation and
Specification", RFC 1035, November 1987.
[6] R. Braden, "Requirements for Internet Hosts -- Application and
Support", RFC 1123, October 1989.
[7] P. Vixie, "A Mechanism for Prompt Notification of Zone Changes
(DNS NOTIFY)", RFC 1996, August 1996.
[8] S. Bradner, "The Internet Standards Process -- Revision 3", RFC
2026, October 1996.
[9] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, March 1997.
[10] R. Elz, R. Bush, "Clarifications to the DNS Specification",
RFC 2181, July 1997.
[11] H. Alvestrand, "IETF Policy on Character Sets and Languages", RFC
2277, January 1998.
[12] N. Freed and J. Postel, "IANA Charset Registration Procedures",
RFC 2278, January 1998.
[13] F. Yergeau, "UTF-8, a transformation format of ISO 10646", RFC
2279, January 1998.
[14] D. Eastlake, "Domain Name System Security Extensions", RFC 2535,
March 1999.
[15] R. Gilligan et al, "Basic Socket Interface Extensions for IPv6",
RFC 2553, March 1999.
[16] L. Daigle et al, "A Tangled Web: Issues of I18N, Domain Names,
and the Other Internet protocols", RFC 2825, May 2000.
[17] Internet Architecture Board, "IAB Technical Comment on the Unique DNS
Root", RFC 2826, May 2000.
[18] P. Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare-00.txt, June 2000.
[19] ISO/IEC 10646-1:2000 (note that an amendment 1 is in
preparation), ISO/IEC 10646-2 (in preparation), plus corrigenda and
amendments to these standards.
[20] The Unicode Consortium, "The Unicode Standard". Described at
http://www.unicode.org/unicode/standard/versions/.
[21] The Unicode Consortium, "The Unicode Standard -- Version 3.0",
ISBN 0-201-61633-5. Same repertoire as ISO/IEC 10646-1:2000. Described
at http://www.unicode.org/unicode/standard/versions/Unicode3.0.html.
[22] Coded Character Set -- 7-bit American Standard Code for
Information Interchange, ANSI X3.4-1986; also: ISO/IEC 646 (IRV).
[23] M. Davis and M. Duerst, Unicode Consortium, "Unicode
Normalization Forms", Unicode Standard Annex #15,
http://www.unicode.org/unicode/reports/tr15/, 2000-08-31.
[24] K. Whistler and M. Davis, Unicode Consortium, "Character Encoding
Model", Unicode Technical Report #17,
http://www.unicode.org/unicode/reports/tr17/, 2000-08-31.
[25] M. Davis, Unicode Consortium, "Case Mappings", Unicode Technical
Report #21, http://www.unicode.org/unicode/reports/tr21/, 2000-09-12.
5. Editors' Contact
Zita Wenzel, Ph.D.
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA
90292 USA
Tel: +1 310 448 8462
Fax: +1 310 823 6714
zita@isi.edu
James Seng
i-DNS.net International Pte Ltd.
8 Temasek Boulevard
#24-02 Suntec Tower 3
Singapore 038988
Tel: +65 248 6208
Fax: +65 248 6198
Email: jseng@pobox.org.sg
6. Acknowledgements
The editors gratefully acknowledge the contributions of:
Harald Tveit Alvestrand <Harald@Alvestrand.no>
Mark Andrews <Mark.Andrews@nominum.com>
RJ Atkinson <request not to have email>
Alan Barrett <apb@cequrux.com>
Marc Blanchet <blanchet@mailviagenie.qc.ca>
Randy Bush <randy@psg.com>
Andrew Draper <ADRAPER@altera.com>
Martin Duerst <duerst@w3.org>
Patrik Faltstrom <paf@swip.net>
Ned Freed <ned.freed@innosoft.com>
Olafur Gudmundsson <ogud@ogud.com>
Paul Hoffman <phoffman@imc.org>
Simon Josefsson <jas+idn@pdc.kth.se>
Kent Karlsson <keka@im.se>
John Klensin <klensin+idn@jck.com>
Tan Juay Kwang <tanjk@i-dns.net>
Dongman Lee <dlee@icu.ac.kr>
Bill Manning <bmanning@ISI.EDU>
Dan Oscarsson <Dan.Oscarsson@trab.se>
J. William Semich <bill@mail.nic.nu>
Yoshiro Yoneda <yone@nic.ad.jp>

@ -1,557 +0,0 @@
Internet Draft Dan Oscarsson
draft-ietf-idn-udns-03.txt Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535 19 August 2001
Expires: 19 February 2002
Using the Universal Character Set in the Domain Name System (UDNS)
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Since the Domain Name System (DNS) [RFC1035] was created there has
been a desire to use other characters than ASCII in domain names.
Lately this desire has grown very strong and several groups have
started to experiment with non-ASCII names. This document defines
how the Universal Character Set (UCS) [ISO10646] is to be used in
DNS. It includes both a transition scheme for older software
supporting non-ASCII handling in applications only, as well as how to
use UCS in labels and having more than 63 octets in a label.
1. Introduction
While the need for non-ASCII domain names has existed since the
creation of the DNS, the need has increased very much during the
last few years. Currently there are at least two implementations
using UTF-8 in use, and others using other methods.
To avoid several different implementations of non-ASCII names in DNS
Dan Oscarsson Expires: 19 February 2002 [Page 1]
Internet Draft Universal DNS 19 August 2001
that do not work together, and to avoid breaking the current ASCII
only DNS, there is an immediate need to standardise how DNS shall
handle non-ASCII names.
While the DNS protocol allows any octet in character data, so far the
octets are only defined for the ASCII code points. Octets outside the
ASCII range have no defined interpretation. This document defines how
all octets are to be used in character data allowing a standardised
way to use non-ASCII in DNS.
The specification here conforms to the IDN requirements [IDNREQ].
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
IDN: Internationalised Domain Name, here used to mean a domain name
containing non-ASCII characters.
ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
compatible with the ASCII host name syntax.
1.2 Previous versions of this document
This version contains just minor corrections to the fourth version.
The third version of this document included a way to return both
ASCII and non-ASCII versions of a name. As this could not be
guaranteed to work it has been removed.
The second version of this document was available as draft-ietf-idn-
udns-00.txt. It included a lot of possibilities as well as a flag bit
that is now removed.
The first version of this document was available as draft-oscarsson-
i18ndns-00.txt.
2. The DNS Protocol
The DNS protocol is used when communicating between DNS servers and
other DNS servers or DNS clients. User interface issues like the
format of zone files or how to enter or display domain names are not
part of the protocol.
The update of the protocol defined here can be used immediately as it
is fully compatible with the DNS of today.
For a long time there will be software understanding UCS in DNS and
software only understanding ASCII in DNS. It is therefore necessary
to support a mix of both types. In the following text, software
understanding UCS in DNS is called UDNS aware.
This specification supports the following scenarios:
- UDNS unaware client, UDNS aware DNS server
- UDNS aware client, UDNS unaware DNS server
- UDNS aware client, UDNS aware DNS server
2.1 Fundamentals
2.1.1 Standard Character Encoding (SCE)
Character data needs to be able to represent as many as possible of
the characters in the world as well as being compatible with ASCII.
Character data is used in labels and in text fields in the RDATA part
of a RR.
The Standard Character Encoding of character data used in the DNS
protocol MUST:
- Use ISO 10646 (UCS) [ISO10646] as coded character set.
- Be normalised using form C as defined in Unicode technical report
#15 [UTR15]. See also [CHNORM].
- Be encoded using the UTF-8 [RFC2279] character encoding scheme.
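The three rules above can be sketched in Python as follows. This is a
minimal illustration only: it assumes that the normalisation data in
Python's unicodedata module is acceptable for the purpose, and the
function names to_sce and is_sce are hypothetical.

```python
import unicodedata


def to_sce(text: str) -> bytes:
    """Normalise UCS text to form C, then serialise it as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def is_sce(octets: bytes) -> bool:
    """Check that octets are valid UTF-8 and already normalised to form C."""
    try:
        text = octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return unicodedata.normalize("NFC", text) == text
```

A sender would apply to_sce before placing character data in a label or
RDATA text field; a receiver could use is_sce to detect data that does
not follow the encoding.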
2.1.2 Binary Comparison Format (BCF)
RFC 1035 states that the labels of a name are matched case-
insensitively. When using UCS this is no longer enough as there are
other forms than case that need to match as equivalent. Form-
insensitive matching of UCS includes:
- Letters of different case are compared as the same character.
- Code points of primary typographical variations of the same
character are compared as the same character. An example is double
width/normal width characters or presentation forms of a
character.
- Some characters are represented with multiple code points in UCS.
All code points of one character must compare as the same. For
example the degree Kelvin sign is the same as the letter K.
The original definition is now extended to be: labels must be
compared using form-insensitivity.
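The kinds of equivalence listed above can be approximated in Python
with compatibility normalisation followed by case folding. This is an
assumption made for illustration and is not the comparison defined by
this document; the function names are hypothetical.

```python
import unicodedata


def form_key(label: str) -> str:
    """Approximate a form-insensitive comparison key (illustrative only)."""
    # NFKC maps width variants and duplicate code points to one form.
    folded = unicodedata.normalize("NFKC", label).casefold()
    # Case folding can denormalise, so normalise once more.
    return unicodedata.normalize("NFKC", folded)


def form_insensitive_equal(a: str, b: str) -> bool:
    """Compare two labels using the approximate key above."""
    return form_key(a) == form_key(b)
```

Under this approximation the Kelvin sign (U+212A) compares equal to the
letter K in either case, and double width Latin letters compare equal
to their normal width forms.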
To handle form-insensitivity, the Binary Comparison Format (BCF) is
defined here, to which strings can be mapped. After strings are mapped
to BCF they can be compared using binary string comparison.
Implementors may implement the form-insensitive comparison without
using BCF, as long as the results are the same.
Mapping of a label to BCF is typically done by steps like: changing
all upper case letters to lower case, mapping different forms to one
form and changing different code points of one character into a
single code point.
For the UCS character code range 0-255 (ASCII and ISO 8859-1) the BCF
MUST be done by mapping all upper case characters to lower case
following the one to one mapping as defined in the Unicode 3.0
Character Database [UDATA].
The Binary Comparison Format (BCF) for the rest of UCS will be
defined in a separate document. The nearest equivalent today is
[NAMEPREP].
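For the code range 0-255, the mapping might be sketched in Python as
below. The sketch assumes that Python's built-in lower-case mapping
agrees with the Unicode 3.0 Character Database for this range, and it
rejects anything outside the range since no BCF is defined for it here.
The function name bcf_latin1 is hypothetical.

```python
def bcf_latin1(label: str) -> str:
    """Map upper case to lower case for code points 0-255 only."""
    out = []
    for ch in label:
        if ord(ch) > 0xFF:
            raise ValueError("BCF beyond code point 255 is not defined here")
        lower = ch.lower()
        # Keep the mapping one to one and inside the 0-255 range.
        if len(lower) == 1 and ord(lower) <= 0xFF:
            out.append(lower)
        else:
            out.append(ch)
    return "".join(out)
```

Two labels in this range then match if their mapped forms are equal
octet for octet after encoding.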
2.1.3 Backward Compatibility Encoding (BCE)
To support older software expecting only ASCII and to support
downgrading from 8-bit to 7-bit ASCII in other protocols (like SMTP)
a Backward Compatibility Encoding (BCE) is available. It is a
transition mechanism and will no longer be supported at some future
time when it is so decided.
The Backward Compatibility Encoding (BCE) of a label is defined as
the BCF of the label encoded using an ASCII Compatible Encoding
(ACE).
The ACE to be used is defined in a separate document. Suitable
candidates include [SACE] and [RACE].
The reason that the BCF form of the label is used is to support
solutions where only applications know about non-ASCII labels. By
using BCF the server need not know about UCS and can just do binary
matching so it can be handled in old servers. However, because BCF
destroys information contained in the original form of a label, it is
impossible to return the original form to a client using BCE.
2.1.4 Long names
The current DNS protocol limits a label to 63 octets. As UTF-8 takes
more than one octet for some characters, a UTF-8 name cannot have 63
characters in a label like an ASCII name can. For example a name
using Hangul would have a maximum of 21 characters.
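The figure of 21 follows from each Hangul syllable taking three octets
in UTF-8, since 63 / 3 = 21. A small Python check, with the
hypothetical helper fits_in_label, illustrates this:

```python
LABEL_LIMIT = 63  # octets per label in the current DNS protocol


def fits_in_label(label: str) -> bool:
    """Return True if the UTF-8 form of the label is at most 63 octets."""
    return len(label.encode("utf-8")) <= LABEL_LIMIT
```

A label of 21 Hangul syllables occupies exactly 63 octets and fits; a
label of 22 syllables needs 66 octets and does not.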
The limits imposed by RFC 1035 are 63 octets per label and 255 octets
for the full name. The 255 limit is not a protocol limit but one to
simplify implementations.
To support longer names a long label type is defined using [RFC2671]
as extended label 0b000011 (the label type will be assigned by IANA
and may not be the number used here).
                     1 1 1 1 1 1 1 1 1 1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|0 1 0 0 0 0 1 1|    length     | label data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
length:     length of label in octets
label data: the label
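A minimal Python sketch of this wire layout follows. It rests on
assumptions that are not spelled out in the diagram: the length field
is taken to be a single octet, the label data is taken to be UTF-8 in
normalisation form C, and the type octet is the provisional value shown
above, which IANA may replace. The function names are hypothetical.

```python
import unicodedata

LONG_LABEL_TYPE = 0b01000011  # provisional value from the diagram


def encode_long_label(label: str) -> bytes:
    """Build type octet + length octet + label data (assumed UTF-8, NFC)."""
    data = unicodedata.normalize("NFC", label).encode("utf-8")
    if len(data) > 255:
        raise ValueError("label does not fit in a one-octet length field")
    return bytes([LONG_LABEL_TYPE, len(data)]) + data


def decode_long_label(wire: bytes) -> str:
    """Inverse of encode_long_label for a buffer holding one long label."""
    if len(wire) < 2 or wire[0] != LONG_LABEL_TYPE:
        raise ValueError("not a long label")
    length = wire[1]
    data = wire[2:2 + length]
    if len(data) != length:
        raise ValueError("truncated label")
    return data.decode("utf-8")


print(encode_long_label("\u00e9").hex())
```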
The long label MUST be handled by all software following this
specification. Also, they MUST support a UDP packet size of up to
1280 bytes.
The limits for labels are updated since RFC 1035 as follows:
A label is limited to a maximum of 63 character code points in UCS
normalised using Unicode form C. The full name is limited to a
maximum of 255 character code points normalised as for a label.
A long label MUST always use the Standard Character Encoding (SCE).
As long labels are not understood by older software, a response MUST
NOT include a long label unless the query did. At a later date, IETF
may change this.
2.2 Rules for matching of domain names in UDNS aware DNS servers
To be able to handle correct domain name matching in lookups, the
following MUST be followed by DNS servers:
- Do matching on authoritative data using form-insensitive matching
for the characters used in the data (for example a zone using only
ASCII need only handle matching of ASCII characters).
- On non-authoritative data, either do binary matching or case-
insensitive matching on ASCII letters and binary matching on all
others.
The effect of the above is:
- only servers handling authoritative data must implement form-
insensitive matching of names. And they need only implement the
subset needed for the subset of characters of UCS they support in
their authoritative zones.
- it normally gives fast lookup because data is usually sent like:
resolver <-> server <-> authoritative server.
While form-insensitive matching can be complex and CPU consuming,
the server in the middle will do caching with only simple and fast
binary matching. So complex matching rules should
not slow down DNS very much.
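The second matching rule (case-insensitive on ASCII letters, binary on
everything else) can be sketched in Python as below. The function names
are hypothetical, and the labels are assumed to be raw octet strings as
they appear on the wire.

```python
def fold_ascii(label: bytes) -> bytes:
    """Lower-case ASCII letters only; every other octet is left untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def match_non_authoritative(a: bytes, b: bytes) -> bool:
    """Case-insensitive on ASCII letters, binary on all other octets."""
    return fold_ascii(a) == fold_ascii(b)


print(match_non_authoritative(b"WWW", b"www"))
print(match_non_authoritative("\u00c9".encode("utf-8"), "\u00e9".encode("utf-8")))
```

A cache in the middle can use this comparison without any knowledge of
UCS, since octets outside the ASCII letter range are never altered.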
2.3 Mixing of UDNS aware and non-UDNS aware clients and servers
To handle the mixing of UDNS aware and non-UDNS aware clients and
servers the following MUST be followed for clients and servers.
2.3.1 Native UDNS aware client
A native UDNS aware client is a client supporting everything in this
document.
When doing a query it MUST:
- Use the long label in the QNAME.
- If the server rejects the query due to the long label, retry the
query using the normal short label. If the QNAME contains non-ASCII
it must be encoded using BCE.
- Handle answers containing BCE.
The client may skip trying a query using the long label if it knows
the server does not understand it.
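The query procedure above might be sketched in Python as follows. All
names are hypothetical: send_long and send_short stand for whatever
transport the resolver uses, bce_encode stands for the BCE encoder
defined elsewhere, and LongLabelRejected is an invented signal for a
server that refuses a long label.

```python
class LongLabelRejected(Exception):
    """Hypothetical signal that the server refused a long-label query."""


def resolve(qname, send_long, send_short, bce_encode, server_knows_long=True):
    """Try the long label first, then fall back to a short label with BCE."""
    if server_knows_long:
        try:
            return send_long(qname)
        except LongLabelRejected:
            pass  # fall through to the short-label retry
    if not qname.isascii():
        # Non-ASCII names must be BCE encoded before a short-label query.
        qname = bce_encode(qname)
    return send_short(qname)
```

Passing server_knows_long=False models the case where the client
already knows that the server does not understand long labels.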
2.3.2 Application based UDNS aware client
An application based UDNS aware client is a client supporting UDNS
through BCE handling in the application.
It only understands BCE and needs only a non-UDNS aware resolver to
work. All encoding and decoding of BCE is handled in the
application.
Due to BCE being an ACE of BCF the names returned in an answer need
not contain the real form of the name. Instead they may contain the
simplified form used in name matching. As this is a transition
mechanism to support non-ASCII in names before the DNS servers have
been upgraded, it is acceptable and will give people a reason to
upgrade.
2.3.3 non-UDNS aware client
A non-UDNS aware client will send ASCII or whatever is sent from an
application. It can be BCE which will for the client just be ASCII
text.
2.3.4 UDNS aware server
A UDNS aware server MUST handle everything in this document and follow:
- If an incoming query contains a long label the answer may contain
a long label and the client is identified as being UDNS aware.
- If the query comes from a non-UDNS aware client and the answer
contains non-ASCII, the non-ASCII labels must be encoded using
BCE.
- If a short label is used in a query and the QNAME contains non-
ASCII, an authoritative server must handle the query if the
character encoding can be recognised. It must recognise SCE and
should recognise common encodings used for the labels in the
domain it is authoritative for. Answers will use BCE for all labels
except the one matching QNAME. This will allow clients using the
local character set to work in many cases before the resolver code
is upgraded.
2.3.5 non-UDNS aware server
A non-UDNS aware server can only handle ASCII matching when comparing
names. It can support the transition mechanism with BCE. The
authoritative zones will then have to be loaded with manually BCE
encoded names.
2.4 DNSSEC
As labels now can have non-ASCII in them, DNSSEC [RFC2535] needs to be
revised so that it also can handle that.
3. Effect on other protocols
As a domain name may now include non-ASCII, many other protocols that
include domain names need to be updated. For example SMTP, HTTP and
URIs. The BCE format can be used when interfacing with ASCII only
software or protocols. Protocols like SMTP could be extended using
ESMTP and a UTF8 option that defines that all headers are in UTF-8.
It is recommended that protocols updated to handle i18n do this by
encoding character data in the same standard format as defined for
DNS in this document (UCS normalised form C). The use of encoding it
in ASCII or by tagged character sets should be avoided.
DNS data does not contain only domain names; for example, e-mail
addresses are also included. So an e-mail address would be expected
to be changed to include non-ASCII both before and after the @-sign.
Software needs to be updated to follow the user interface
recommendations given above, so that a human will see the characters
in their local character set, if possible.
4. Security Considerations
As always with data, if software does not check for data that can be
a problem, security may be affected. As more characters than ASCII are
allowed, software only expecting ASCII and with no checks may now get
security problems.
5. References
[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, November 1987.
[RFC1035] P. Mockapetris, "Domain Names - Implementation and
Specification", STD 13, RFC 1035, November 1987.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[RFC2181] R. Elz and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
[RFC2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
RFC 2279, January 1998.
[RFC2535] D. Eastlake, "Domain Name System Security Extensions",
RFC 2535, March 1999.
[RFC2671] P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
2671, August 1999.
[ISO10646] ISO/IEC 10646-1:2000. International Standard --
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS)
[Unicode] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
http://www.unicode.org/unicode/standard/versions/
Unicode3.0.html
[UTR15] M. Davis and M. Duerst, "Unicode Normalization Forms",
Unicode Technical Report #15, Nov 1999,
http://www.unicode.org/unicode/reports/tr15/.
[UTR21] M. Davis, "Case Mappings", Unicode Technical Report #21,
Dec 1999, http://www.unicode.org/unicode/reports/tr21/.
[UDATA] The Unicode Character Database,
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
The database is described in
ftp://ftp.unicode.org/Public/UNIDATA/
UnicodeCharacterDatabase.html.
[IDNREQ] James Seng, "Requirements of Internationalized Domain
Names", draft-ietf-idn-requirement.
[IANADNS] Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
System (DNS) IANA Considerations", draft-ietf-dnsext-iana-dns.
[IDNE] Marc Blanchet, Paul Hoffman, "Internationalized domain
names using EDNS (IDNE)", draft-ietf-idn-idne.
[CHNORM] M. Duerst, M. Davis, "Character Normalization in IETF
Protocols", draft-duerst-i18n-norm.
[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.
[NAMEPREP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.
[SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding", draft-
ietf-idn-sace.
[RACE] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
for IDN", draft-ietf-idn-race.
6. Acknowledgements
Paul Hoffman for giving many comments in our e-mail discussions.
Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
Karlsson.
Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper for
comments on my draft.
Discussions and comments by the members of the IDN working group.
Author's Address
Dan Oscarsson
Telia ProSoft AB
Box 85
201 20 Malmo
Sweden
E-mail: Dan.Oscarsson@trab.se

View File

@ -1,442 +0,0 @@
Network Working Group M. Duerst
Internet-Draft W3C
Expires: May 4, 2003 November 3, 2002
Internationalized Domain Names in URIs
draft-ietf-idn-uri-03
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 4, 2003.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document proposes to upgrade the definition of URIs (RFC 2396)
[RFC2396] to work consistently with internationalized domain names.
Duerst Expires May 4, 2003 [Page 1]
Internet-Draft IDNs in URIs November 2002
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. URI syntax changes . . . . . . . . . . . . . . . . . . . . . . 3
3. Security considerations . . . . . . . . . . . . . . . . . . . 5
4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 5
5. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03 . 5
5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02 . 5
5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01 . 5
References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7
Full Copyright Statement . . . . . . . . . . . . . . . . . . . 8
1. Introduction
Internet domain names serve to identify hosts and services on the
Internet in a convenient way. The IETF IDN working group [IDNWG] has
been working on extending the character repertoire usable in domain
names beyond a subset of US-ASCII.
One of the most important places where domain names appear is
Uniform Resource Identifiers (URIs, [RFC2396], as modified by
[RFC2732]). However, in the current definition of the generic URI
syntax, the restrictions on domain names are 'hard-coded'. In
Section 2, this document relaxes these restrictions by updating the
syntax, and defines how internationalized domain names are encoded in
URIs.
The syntax in this document has been chosen to further increase the
uniformity of URI syntax, which is a very important principle of
URIs.
In practice, escaped domain names should be used as rarely as
possible. Wherever possible, the actual characters in
Internationalized Domain Names should be preserved as long as
possible by using IRIs [IRI] rather than URIs, and only converting to
URIs and then to ACE-encoded [IDNA] domain names (or ideally directly
to ACE-encoding without even using URIs) when resolving the IRI.
Also, this document does not exclude the use of ACE encoding directly
in a URI domain name part. ACE encoding may be used directly in a
URI domain name part if this is considered necessary for
interoperability.
Please note that even with the definition of URIs in [RFC2396], some
URIs can already contain host names with escaped characters. For
example, mailto:example@w%33.org is legal per [RFC2396] because the
mailto: URI scheme does not follow the generic syntax of [RFC2396].
2. URI syntax changes
The syntax of URIs [RFC2396] currently contains the following rules
relevant to domain names:
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
The latter two rules are changed as follows:
domainlabel = anchar | anchar *( anchar | "-" ) anchar
toplabel = achar | achar *( anchar | "-" ) anchar
and the following rules are added:
anchar = alphanum | escaped
achar = alpha | escaped
Characters outside the repertoire (alphanum) are encoded by first
encoding the characters in UTF-8 [RFC2279], resulting in a sequence
of octets, and then escaping these octets according to the rules
defined in [RFC2396].
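As a rough illustration, the two-step encoding (UTF-8, then escaping of
the resulting octets) can be reproduced with Python's standard
urllib.parse.quote. The function name escape_hostname and the sample
host name are assumptions made for this sketch only.

```python
from urllib.parse import quote, unquote


def escape_hostname(hostname: str) -> str:
    """Encode each label as UTF-8 and %-escape the non-ASCII octets."""
    # quote() encodes to UTF-8 by default and leaves letters, digits and
    # "-" unescaped, so plain US-ASCII labels pass through unchanged.
    return ".".join(quote(label, safe="") for label in hostname.split("."))


escaped = escape_hostname("d\u00fcrst.example")
print(escaped)
print(unquote(escaped) == "d\u00fcrst.example")
```

The letter u with diaeresis becomes the two UTF-8 octets C3 BC, which is
the same escaping that appears in the author's own URI in this draft.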
Using UTF-8 assures that this encoding interoperates with IRIs [IRI].
It is also aligned with the recommendations in [RFC2277] and
[RFC2718], and is consistent with the URN syntax [RFC2141] as well as
recent URL scheme definitions that define encodings of non-ASCII
characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
[RFC2384]).
The above syntax rules permit domain names that are neither
permitted as US-ASCII only domain names nor as internationalized
domain names. However, such domain names should never be used, and
will never be resolved because no such domains will be registered.
For US-ASCII only domain names, the syntax rules in [RFC2396] are
relevant. For example, http://www.w%33.org is legal, because the
corresponding 'w3' is a legal 'domainlabel' according to [RFC2396].
However, http://%2a.example.org is illegal because the corresponding
'*' is not a legal 'domainlabel' according to [RFC2396].
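The two examples can be checked mechanically. The Python sketch below
unescapes each label and tests it against the original 'domainlabel'
and 'toplabel' rules of [RFC2396]. It takes the host name part only,
not a full URI, and the function name is hypothetical.

```python
import re
from urllib.parse import unquote

# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
# toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_legal_ascii_hostname(escaped: str) -> bool:
    """Unescape each label, then apply the unmodified RFC 2396 rules."""
    if escaped.endswith("."):
        escaped = escaped[:-1]  # hostname may end with an optional "."
    # Split before unescaping so that an escaped "." cannot add a label.
    labels = [unquote(label) for label in escaped.split(".")]
    *domain_labels, top = labels
    return (TOPLABEL.fullmatch(top) is not None
            and all(DOMAINLABEL.fullmatch(label) for label in domain_labels))


print(is_legal_ascii_hostname("www.w%33.org"))
print(is_legal_ascii_hostname("%2a.example.org"))
```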
For domain names containing non-ASCII characters, the legal domain
names are those for which the ToASCII operation ([IDNA], [Nameprep];
using the unescaped UTF-8 values as input), with the flags
"UseSTD3ASCIIRules" and "AllowUnassigned" set, is successful. The
URI resolver MUST apply any steps required as part of domain name
resolution by [IDNA], in particular the ToASCII operation, with the
above-mentioned flags set. URIs where the ToASCII operation results
in an error should be treated as unresolvable.
For domain names containing non-ASCII characters, the Nameprep
specification ([Nameprep]) defines some mappings, which mainly
include normalization to NFKC and folding to lower case. When
encoding an internationalized domain name in a URI, these mappings
SHOULD NOT be applied. It should be assumed that the domain name is
already normalized as far as appropriate.
For consistency in comparison operations and for interoperability
with older software, the following should be noted: 1) US-ASCII
characters in domain names should not be escaped. 2) Because of the
principle of syntax uniformity for URIs, it is always more prudent to
take into account the possibility that US-ASCII characters are
escaped.
3. Security considerations
The security considerations of [RFC2396] and those applying to
internationalized domain names apply. There may be an increased
potential to smuggle escaped US-ASCII-based domain names across
firewalls, although because of the uniform syntax principle for URIs,
such a potential already exists.
4. Acknowledgements
Erik Nordmark
5. Change Log
5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03
Clarified expectations on name checking.
5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02
Moved change log to back
Changed to only change URIs; IRI syntax updated directly in IRI
draft.
Removed syntax restriction on %hh in the US-ASCII part, but made
clear that restrictions to domain names apply.
Made clear that escaped domain names in URIs should only be an
intermediate representation.
Gave example of mailto: as already allowing escaped host names.
Corrected some typos.
5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
Changed requirement for URI/IRI resolvers from MUST to SHOULD
Changed IRI syntax slightly (ichar -> idchar, based on changes in
[IRI])
Various wording changes
References
[IDNA] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
draft-ietf-idn-idna-14.txt (work in progress), October
2002, <http://www.ietf.org/internet-drafts/draft-ietf-
idn-idna-14.txt>.
[IDNWG] "IETF Internationalized Domain Name (idn) Working Group".
[IRI] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRI)", draft-duerst-iri-02.txt (work in
progress), November 2002, <http://www.ietf.org/internet-
drafts/draft-duerst-iri-02.txt>.
[ISO10646] International Organization for Standardization,
"Information Technology - Universal Multiple-Octet Coded
Character Set (UCS) - Part 1: Architecture and Basic
Multilingual Plane", ISO Standard 10646-1, October 2000.
[Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names", draft-ietf-
idn-nameprep-11.txt (work in progress), June 2002,
<http://www.ietf.org/internet-drafts/draft-ietf-idn-
nameprep-11.txt>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.
[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.
[RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO
10646", RFC 2279, January 1998.
[RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998.
[RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396,
August 1998.
[RFC2640] Curtin, B., "Internationalization of the File Transfer
Protocol", RFC 2640, July 1999.
[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
"Guidelines for new URL Schemes", RFC 2718, November
1999.
[RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for
Literal IPv6 Addresses in URL's", RFC 2732, December
1999.
Author's Address
Martin Duerst
World Wide Web Consortium
200 Technology Square
Cambridge, MA 02139
U.S.A.
Phone: +1 617 253 5509
Fax: +1 617 258 5999
EMail: duerst@w3.org
URI: http://www.w3.org/People/D%C3%BCrst/
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.

View File

@ -1,505 +0,0 @@
IETF IDN Working Group Sung Jae Shim
Internet Draft DualName, Inc.
Document: draft-ietf-idn-vidn-01.txt 2 March 2001
Expires: 2 September 2001
Virtually Internationalized Domain Names (VIDN)
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite them other
than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
1. Abstract
This document proposes a method that enables domain names to be used in both
local and English scripts, as a directory-search solution at an upper layer
above the DNS. The method first converts virtual domain names typed in local
scripts into the corresponding domain names in English scripts that comply with
the DNS, using the knowledge of transliteration between local and English
scripts. Then, the method searches for and displays domain names in English
scripts that are active on the Internet so that the user can choose any of them.
The conversion takes place automatically and transparently in the user's
applications before DNS queries are sent, and so, the method does not make any
change to the DNS nor require separate name servers.
2. Conventions and definitions used in this document
The key words "REQUIRED" and "MAY" in this document are to be interpreted as
described in RFC-2119 [1].
A "host" is a computer or device attached to the Internet. A "user host" is a
computer or device with which a user is connected to the Internet, and a "user"
is a person who uses a user host. A "server host" is a computer or device that
provides services to user hosts.
An "entity" is an organization or individual that has a domain name registered
with the DNS.
A "local language" is a language other than English language that a user prefers
to use in a local context. "Local scripts" are scripts of a local language and
"English scripts" are scripts of English language.
A "virtual domain name" is a domain name in local scripts, and it is not
registered with the DNS but used for the convenience of users. An "English
domain name" is a domain name in English scripts. A "domain name" refers to an
English domain name that complies with the DNS, unless specified otherwise.
A "coded portion" is a pre-coded portion of a domain name (e.g., generic codes
including 'com', 'edu', 'gov', 'int', 'mil', 'net', 'org', and country codes
such as 'kr', 'jp', 'cn', and so on). An "entity-defined portion" is a portion
of a domain name, which is defined by the entity that holds the domain name
(e.g., host name, organization name, server name, and so on).
The method proposed in this document is called "virtually internationalized
domain names (VIDN)," as it enables domain names in English scripts to be used
virtually in local scripts.
A number of Korean-language characters are used in the original of this document
for examples, which is available from the author upon request. The software used
for Internet-Drafts does not allow using multilingual characters other than
ASCII characters. Thus, this document may not display Korean-language characters
properly, although it may be comprehensible without the examples using Korean-
language characters. Also, when you open the original of this document, please
set your view encoding to Korean so that Korean-language characters are
displayed properly.
3. Introduction
Domain names are valuable to Internet users as a main identifier of entities and
resources on the Internet. The DNS allows using only English scripts in naming
hosts or clusters of hosts on the Internet. More specifically, the DNS uses only
the basic Latin letters (case-insensitive), the decimal digits (0-9) and the
hyphen (-) in domain names. But there is a growing need for internationalized
domain names in local scripts. Recognizing this need, various methods have been
proposed to use local scripts in domain names. But to date, no method appears to
meet all the requirements of internationalized domain names as described in
Wenzel and Seng [2].
A group of earlier methods tries to put internationalized domain names in local
scripts inside some parts of the overall DNS, using special encoding schemes of
Universal Character Set (UCS). But these methods put too much of a burden on the
DNS, requiring a great deal of work for transition and update of the DNS
components and the applications working with the DNS. Another group of earlier
methods tries to build separate directory services for internationalized domain
names or keywords in local scripts. But these methods also require complex
implementation efforts, duplicating much of the work already done for the DNS.
Both groups of earlier methods require creating internationalized domain
names or keywords in local scripts from scratch, which is a costly and lengthy
process on the parts of the DNS and Internet users. Further, domain names or
keywords created in local scripts are usable only by those who know the local
scripts, and so, they may segregate the Internet into many groups of different
sets of local scripts that are less universal than English scripts.
VIDN intends to provide a more immediate and less costly solution to
internationalized domain names than earlier methods. VIDN does not make any
change to the DNS nor require creating additional domain names in local scripts.
VIDN takes notice of the fact that many domain names currently used in regions
where English scripts are not widely used have their entity-defined portions
consisting of English scripts as transliterated from the respective local
scripts. Using this knowledge of transliteration between local and English
scripts, VIDN converts virtual domain names typed in local scripts into the
corresponding domain names in English scripts that comply with the DNS. In this
way, VIDN enables the same domain names to be used not only in English scripts
as usual but also in local scripts, without creating additional domain names in
local scripts.
4. VIDN method
4.1. Objectives
Earlier methods of internationalized domain names try to create domain names or
keywords in local scripts one way or another in addition to existing domain
names in English scripts, and put them inside or outside the DNS, using special
encoding schemes or lookup services. These methods require a lengthy and costly
process of creating domain names in local scripts and updating the DNS
components and applications. Even when they are successfully implemented, these
methods have a risk of localizing the Internet by segregating it into groups of
different sets of local scripts that are less universal than English scripts and
so diminishing the international scope of the Internet. Further, these methods
may cause more problems and disputes on copyrights, trademarks, and so on, in
local contexts than those that we experience with current domain names in
English scripts.
VIDN intends to provide a solution to the problems of earlier methods of
internationalized domain names. VIDN enables the same domain names to be used in
both English scripts as usual and local scripts, and so, there is no need to
create domain names in local scripts in addition to domain names in English
scripts. VIDN works automatically and transparently in applications at user
hosts before DNS requests are sent, and so, there is no need to make any change
to the DNS or to have additional name servers. For these reasons as well as
others, VIDN can be implemented more immediately with less cost than other
methods of internationalized domain names.
4.2. Description
It is important to note that most domain names used in regions where English
scripts are not widely used have their entity-defined portions consisting of
English scripts as transliterated from local scripts. Of course, there are many
domain names in those regions that do not follow this kind of transliteration
between local and English scripts. In such cases, new domain names in English
scripts need to be created following this transliteration, but the number would
be minimal, compared to the number of internationalized domain names in local
scripts to be created and registered under other methods.
The English scripts transliterated from local scripts do not have any meanings
in English language, but their originals in local scripts before the
transliteration have some meanings in the respective local language, usually
indicating organization names, brand names, trademarks, and so on. VIDN makes it
possible to use these original local scripts as the entity-defined portions of virtual
domain names in local scripts, by transliterating them into the corresponding
entity-defined portions of actual domain names in English scripts. In this way,
VIDN enables the same domain names in English scripts to be used virtually in
local scripts without actually creating domain names in local scripts.
Just as domain names in English scripts overlay IP addresses, virtual domain names
in local scripts overlay actual domain names in English scripts. The relationship
between virtual domain names in local scripts and actual domain names in English
scripts can be depicted as:
            +---------------------------------+
            |              User               |
            +---------------------------------+
                 |                       |
+----------------|-----------------------|------------------+
|                v  (Transliteration)    v                  |
|  +---------------------+  |  +-----------------------+    |
|  | Virtual domain name |  |  | Actual domain name    |    |
|  | in local scripts    |--+->| in English scripts    |    |
|  +---------------------+     +-----------------------+    |
|  User application                      |                  |
+----------------------------------------|------------------+
                                         v
                                   DNS requests
VIDN uses the phonemes of local and English scripts as a medium in
transliterating the entity-defined portions of virtual domain names in local
scripts into those of actual domain names in English scripts. This process of
transliteration can be depicted as:
        Local scripts                        English scripts
+----------------------------+       +-----------------------------+
| Characters ----> Phonemes -----------> Phonemes ----> Characters |
|              |             |   |   |              |              |
|              |             |   |   |              |              |
| (Inverse of transcription) | Match |       (Transcription)       |
+----------------------------+       +-----------------------------+
               |                                    ^
               |         (Transliteration)          |
               +------------------------------------+
First, each entity-defined portion of a virtual domain name typed in local
scripts is decomposed into individual characters or sets of characters so that
each individual character or set of characters can represent an individual
phoneme of the local language. This is the inverse of transcription of phonemes
into characters. Second, each individual phoneme of the local language is
matched with an equivalent phoneme of English language that has the same or most
proximate sound. Third, each phoneme of English language is transcribed into the
corresponding character or set of characters in English language. Finally, all
the characters or sets of characters converted into English scripts are united
to compose the corresponding entity-defined portion of an actual domain name in
English scripts.
For example, the Korean word '세기', which means 'century' in English, is
transliterated into 'segi' in English scripts, so an entity whose name contains
'세기' in Korean may have 'segi' as an entity-defined portion of its domain name
in English scripts. VIDN enables the use of '세기' as an entity-defined portion
of a virtual domain name in Korean scripts, which is converted into 'segi', the
corresponding entity-defined portion of an actual domain name in English scripts.
In other words, the phonemes represented by the characters that make up '세기' in
Korean scripts have the same sounds as the phonemes represented by the characters
that make up 'segi' in English scripts. In the local context, '세기' in Korean
scripts is clearly easier to remember and type, and more intuitive and
meaningful, than 'segi' in English scripts.
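The four-step process described above can be sketched for Korean as follows.
This is only an illustration under stated assumptions: the decomposition step
relies on the arithmetic layout of precomposed Hangul syllables in Unicode, and
the phoneme tables (LEADS, VOWELS, TAILS) and the function names are
hypothetical stand-ins, since this document does not list the tables VIDN
actually uses.

```python
# Hypothetical phoneme tables: one English spelling per Korean phoneme.
# Index order follows the Unicode ordering of Hangul jamo.
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]            # 19 initials
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]  # 21 vowels
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable


def decompose(syllable):
    """Step 1: split one precomposed syllable into phoneme indices."""
    offset = ord(syllable) - HANGUL_BASE
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail


def transliterate_label(label):
    """Steps 2-4: match each phoneme, transcribe it, and join the pieces."""
    out = []
    for ch in label:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            lead, vowel, tail = decompose(ch)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # characters already in English scripts pass through
    return "".join(out)


# U+C138 U+AE30 is the Korean word meaning 'century'.
print(transliterate_label("\uc138\uae30"))
```

With these tables the word meaning 'century' comes out as 'segi'. Using a
single spelling per phoneme is a simplification; in practice one phoneme may
have several plausible English spellings.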
An entity-defined portion of a virtual domain name in Korean scripts, '야후', is
transliterated into 'yahoo' in English scripts, since the phonemes represented
by the characters that make up '야후' in Korean scripts have the same sounds as
the phonemes represented by the characters that make up 'yahoo' in English
scripts. That is, '야후' in Korean scripts is pronounced the same as 'yahoo' in
English scripts, so it is easy for Korean-speaking people to recognize '야후' in
Korean scripts as the virtual equivalent of 'yahoo' in English scripts. VIDN
enables the use of virtual domain names in local scripts both for domain names
whose originals are in local scripts, e.g., '세기' in Korean scripts, and for
domain names whose originals are in English scripts, e.g., '야후' in Korean
scripts. In this way, VIDN is able to make domain names truly international,
allowing the same domain names to be used both in English and local scripts.
The coded portions of domain names, such as generic codes and country codes, can
also be transliterated from local scripts into English scripts, using their
phonemes as a medium. For example, seven generic codes in English scripts, 'com',
'edu', 'gov', 'int', 'mil', 'net', and 'org', can be transliterated from '컴',
'에듀', '고브', '인트', '밀', '네트', and '오그' in Korean scripts, respectively,
which can be used as the corresponding generic codes of virtual domain names in
Korean scripts. Based upon its meaning in English, each coded portion of actual
domain names can also be pre-assigned a virtual equivalent word or code in local
scripts. For example, the same seven generic codes can be pre-assigned '기업'
(meaning 'commercial' in Korean), '학교' (meaning 'education'), '정부' (meaning
'government'), '국제' (meaning 'international'), '군용' (meaning 'military'),
'네트' (meaning 'network'), and '단체' (meaning 'organization'), respectively,
which can be used as the corresponding generic codes of virtual domain names in
Korean scripts.
VIDN does not create such complexities as other conversion methods based upon
semantics do, since it uses phonemes as a medium of transliteration between
local and English scripts. Further, most languages have a small number of
phonemes. For example, Korean language has nineteen consonant phonemes and
twenty-one vowel phonemes, and English language has twenty-four consonant
phonemes and twenty vowel phonemes. Each phoneme of Korean language can be
matched with a phoneme of English language that has the same or proximate sound,
and vice versa.
Some characters or sets of characters may represent more than one phoneme. Some
phonemes may be represented by more than one character or set of characters.
Also, not every character or set of characters in local scripts may be neatly
transliterated into only one character or set of characters in English scripts.
In practice, people often transliterate the same local scripts differently into
English scripts or vice versa. VIDN incorporates the provisions to deal with
those variations that usually occur in particular situations as well as those
variations that are caused by common usage or idiomatic expressions. More
fundamentally, VIDN uses phonemes, which are largely universal across different
languages, as a medium of transliteration rather than following a fixed set of
transliteration rules, which in many non-English-speaking countries either does
not exist or is not followed by many people.
One virtual domain name typed in local scripts may be converted into more than
one possible domain name in English scripts. In such a case, VIDN can search for
and display only those domain names in English scripts that are active on the
Internet, so that the user can choose any of them. Further, VIDN can be used as
a directory-search solution at an upper layer above the DNS. That is, the user
can use VIDN to query a phoneme-based domain name request in local scripts,
receive one or more corresponding domain names in English or ASCII-compatible
scripts preferably, choose one based upon the results of that search, and make
the final DNS request using any protocol or method to be chosen for
internationalized domain names. In this directory-search role, VIDN uses a
one-to-many mapping between virtual domain names in local scripts and actual
domain names in English scripts.
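The one-to-many conversion and the filtering of inactive names can be sketched
as below. The alternative-spelling tables and the function names are
hypothetical, and the check for active names is passed in as a function so that
the sketch runs without network access; a real client would perform DNS lookups
at that point.

```python
from itertools import product

# Hypothetical alternatives: some phonemes have more than one common spelling.
LEAD_ALTS = {"j": ["j", "ch"]}
VOWEL_ALTS = {"u": ["u", "oo"]}


def candidates(syllables):
    """Expand (lead, vowel, tail) spellings into every possible English label."""
    per_syllable = []
    for lead, vowel, tail in syllables:
        leads = LEAD_ALTS.get(lead, [lead])
        vowels = VOWEL_ALTS.get(vowel, [vowel])
        per_syllable.append([l + v + tail for l, v in product(leads, vowels)])
    return ["".join(parts) for parts in product(*per_syllable)]


def active_only(names, is_active):
    """Keep only the names that the supplied lookup reports as active."""
    return [name for name in names if is_active(name)]


# Two syllables already matched to phoneme spellings: 'jung' + 'ang'.
labels = candidates([("j", "u", "ng"), ("", "a", "ng")])
domains = [label + ".com" for label in labels]

# Stand-in for real DNS lookups (an assumption made for this sketch).
registered = {"jungang.com", "chungang.com"}
print(domains)
print(active_only(domains, registered.__contains__))
```

The user would then be shown only the names in the second list and asked to
choose one of them.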
VIDN needs the one-to-many mapping and subsequent multiple DNS lookups only at
the first query of each virtual domain name typed in local scripts at the user
host. After the first query, the virtual domain name is set to the domain name
in English scripts that has been chosen at the first query. Any subsequent
queries with the same virtual domain name generate only one query with the
selected domain name in English scripts. Once the user selects one possible
domain name in English scripts from the list, VIDN remembers the user's
selection and directs the user to the same domain name at his or her subsequent
queries with that virtual domain name. In this way, VIDN can generate less
traffic on the DNS, while providing faster, easier, and simpler navigation on
the Internet to the user, using local scripts.
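A minimal sketch of this remembering behaviour follows. The class name, the
callbacks, and the placeholder names are assumptions made for illustration; the
document does not describe how VIDN stores the user's selection.

```python
class SelectionCache:
    """Remember which English-script name the user chose for a virtual name."""

    def __init__(self, expand, choose):
        self._expand = expand    # virtual name -> list of candidate names
        self._choose = choose    # list of candidates -> the user's pick
        self._chosen = {}
        self.expansions = 0      # how often the one-to-many step actually ran

    def resolve(self, virtual_name):
        if virtual_name not in self._chosen:
            # First query only: expand, look up, and let the user choose.
            self.expansions += 1
            options = self._expand(virtual_name)
            self._chosen[virtual_name] = self._choose(options)
        return self._chosen[virtual_name]


cache = SelectionCache(
    expand=lambda name: ["jungang.com", "chungang.com"],  # placeholder expansion
    choose=lambda options: options[0],                    # placeholder for a user prompt
)
first = cache.resolve("virtual-name")
second = cache.resolve("virtual-name")
print(first, second, cache.expansions)
```

Only the first call runs the expansion; the second call returns the stored
choice and would generate a single DNS query.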
Utilizing a coding scheme, VIDN is also capable of making each virtual domain
name typed in local scripts correspond to exactly one actual domain name in
English scripts. In this coding scheme, a unique code such as the Unicode or
hexadecimal code represented by the virtual domain name, is pre-assigned to one
of the corresponding domain names in English scripts and stored in the
respective server host, so that both the user host and the server host can
support and understand the code. Then, VIDN checks whether the code at each
server host matches with the code generated at the user host. If one of the
servers stores the code that matches with the code generated at the user host,
the virtual domain name typed at the user host is recognized as corresponding
only to the domain name of that server host, and the user host is connected to
the server host. The domain names of the remaining server hosts that do not have
the matching code are also displayed at the user host as alternative sites.
Because a unique code is assigned to only one of the domain names in English
scripts, it does not cause any domain name squatting problem beyond what we
experience with current domain names in English scripts. Unique codes do not
need to be stored in any specific format, that is, they can be embedded in HTML,
XML, WML, and so on, so that the user host can interpret the retrieved code
correctly. Likewise, unique codes do not require any specific intermediate
transport protocol such as TCP/IP. The only requirement is that the protocol
must be understood among all participating user hosts and server hosts. For
security purposes, this coding scheme may use an encryption technique.
For example, '중앙.컴', a virtual domain name typed in Korean scripts, may
result in four corresponding domain names in English scripts, 'jungang.com',
'joongang.com', 'chungang.com', and 'choongang.com', since the phonemes
represented by the characters that make up '중앙.컴' in Korean scripts can have
the same or almost the same sounds as the phonemes represented by the characters
that make up 'jungang.com', 'joongang.com', 'chungang.com', or 'choongang.com'
in English scripts. In this case, we assume that the server host with the domain
name 'jungang.com' has the pre-assigned code that matches the code generated
when '중앙.컴' in Korean scripts is entered in user applications. Then, the user
host is connected to this server host, and the other server hosts may be listed
to the user as alternative sites so that the user can try them.
The process of this coding scheme that makes each virtual domain name in local
scripts correspond to only one actual domain name in English scripts, can be
depicted as:
            +---------------------------------+
            |              User               |
            +---------------------------------+
                 |                       |
+----------------|-----------------------|------------------+
|                v                       v                  |
|   +---------------------+     +-----------------------+   |
|   | Virtual domain name |     | Potential domain names|   |
|   | in a local language |---->| in English            |   |
|   | e.g., '중앙.컴'     |     | e.g., 'jungang.com'   |   |
|   |       (code: 297437)|     |       'joongang.com'  |   |
|   |                     |     |       'chungang.com'  |   |
|   |                     |     |       'choongang.com' |   |
|   +---------------------+     +-----------------------+   |
| User application                       |                  |
+----------------------------------------|------------------+
                 ^                       |
                 |                       | Code check by VIDN
 Connection to   |                       |    +-- 'jungang.com'
 the server host |                       |    |     (code: 297437)
 'jungang.com'   |                       |    |-- 'joongang.com'
                 |                       |----+     (not active)
                 |                       |    |-- 'chungang.com'
                 |                       |    |     (code: 381274)
                 |    DNS request and    |    +-- 'choongang.com'
                 |       response        |          (not active)
                 +-----------------------+
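The code check in the figure can be sketched as follows. How the code is
generated at the user host and how it is retrieved from each server host are
not specified here, so the dictionary below simply stands in for the retrieved
values, and the function name is hypothetical.

```python
def pick_by_code(user_code, servers):
    """Return (matched name, alternative names) among the active server hosts.

    `servers` maps a domain name to the code stored on that host, or to None
    when the host is not active.
    """
    active = {name: code for name, code in servers.items() if code is not None}
    matched = next((n for n, c in active.items() if c == user_code), None)
    alternatives = [n for n in active if n != matched]
    return matched, alternatives


# Values taken from the figure above.
servers = {
    "jungang.com": 297437,
    "joongang.com": None,     # not active
    "chungang.com": 381274,
    "choongang.com": None,    # not active
}
print(pick_by_code(297437, servers))
```

The user host connects to the matched name, and the remaining active names are
offered as alternative sites.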
Since VIDN converts the entity-defined portions and the coded portions of a
virtual domain name separately, it preserves the current syntax of domain
names, that is, the hierarchical dotted notation, which Internet users are
familiar with. Also, VIDN allows a virtual domain name to mix local and English
scripts as the user wishes, since the conversion takes place on each individual
portion of the domain name and on each individual character or set of characters
within that portion.
While VIDN preserves the hierarchical dotted notation of current domain names,
the principles of VIDN are applicable to domain names in other possible
notations such as those in a natural language (e.g., 'microsoft windows' rather
than 'windows.microsoft.com'). Also, the principles of VIDN can be applied to
other identifiers used on the Internet, such as user IDs of e-mail addresses,
names of directories and folders, names of web pages and files, keywords used in
search engines and directory services, and so on, allowing them to be used
interchangeably in local and English scripts, without creating additional
identifiers in local scripts. The conversion of VIDN can be done between any two
sets of scripts interchangeably. Thus, even when the DNS accepts and registers
domain names in other local scripts in addition to English, VIDN can allow using
the same domain names in any two sets of scripts by converting virtual domain
names in one set of scripts into actual domain names in another set of scripts.
4.3. Development and implementation
In a preferred arrangement, the development of VIDN for each set of local
scripts may be administered by one or more local standards bodies in regions
where the local scripts are widely used, for example, Korean Network Information
Center for Korean scripts, Japan Network Information Center for Japanese scripts,
and China, Hong Kong and Taiwan Network Information Centers for Chinese scripts,
in consultation with experts on phonemics and linguistics of the respective
local language and English language. Also, the unique codes for one-to-one
mapping between virtual domain names in local scripts and actual domain names in
English scripts can be administered by a central standards body like IANA.
Alternatively, the unique codes for each set of local scripts may be
administered by one or more local standards bodies in regions where the local
scripts are widely used, as with the development of VIDN.
VIDN is implemented in applications at the user host. That is, the conversion of
virtual domain names in local scripts into the corresponding actual domain names
in English scripts takes place at the user host before DNS requests are sent.
Thus, neither a special encoding nor a separate lookup service is needed to
implement VIDN. VIDN is also modularized with each module being used for
conversion of virtual domain names in one set of local scripts into the
corresponding actual domain names in English scripts. A user needs only the
module for conversion of his or her preferred set of local scripts into English
scripts. Alternatively, VIDN can be implemented at a central server host or a
cluster of local server hosts. A central server can provide the conversion
service for all sets of local scripts, or a cluster of local server hosts can
share the conversion service. In the latter case, each local server host can
provide the conversion service for one or more sets of local scripts used in a
certain region.
Because of its small size, VIDN can be easily embedded into application
software such as web browsers, e-mail clients, FTP clients, and so on at the user
host, or it can work as an add-on program to such software. In either case, the
only requirement on the part of the user is to install VIDN or software
embedding VIDN at the user host. Using virtual domain names in local scripts in
accordance with the principles of VIDN is very intuitive to those who use the
local scripts. The only requirement on the part of the entity whose server host
provides Internet services to user hosts is to have an actual domain name in
English scripts into which virtual domain names in local scripts are neatly
transliterated in accordance with the principles of VIDN. Most entities in
regions where English scripts are not widely used already have such domain names
in English scripts. Finally, there is nothing to change on the part of the DNS,
since VIDN uses the current DNS as it is.
Taken together, the features of VIDN can meet all the requirements of
internationalized domain names as described in Wenzel and Seng [1], with respect
to compatibility and interoperability, internationalization, canonicalization,
and operating issues. Given the fact that different methods toward
internationalized domain names confuse users, as already observed in some
regions where some of these methods have already been commercialized, e.g.,
Korea, Japan and China, it is important to find and implement the most effective
solution to internationalized domain names as soon as possible.
4.4. Current status
VIDN has been developed for Korean-English conversion as a web browser add-on
program. The program contains all the features described in this document and is
capable of listing all the domain names in English scripts that correspond to a
virtual domain name typed in Korean scripts so that a user can choose any of
them. The program can cover more than ninety percent of the sample. That is, the
results of testing indicate that more than ninety percent of web sites in Korea
can be accessed using virtual domain names in Korean scripts without creating
additional domain names in Korean scripts. The remaining ten percent of domain
names are mostly those that contain acronyms, abbreviations or initials. With
improvement of its knowledge of transliteration, the program is expected to
cover more domain names used in Korea.
5. Security considerations
Because VIDN uses the DNS as it is, it inherits the same security considerations
as the DNS.
6. Intellectual property considerations
It is the intention of DualName, Inc. to submit the VIDN method and other
elements of VIDN software to IETF for review, comment or standardization.
DualName has applied for one or more patents on the technology related to
virtual domain name software and virtual email software. If a standard is
adopted by IETF and any patents are issued to DualName with claims that are
necessary for practicing the standard, DualName is prepared to make available,
upon written request, a non-exclusive license under fair, reasonable and
non-discriminatory terms and conditions, based on the principle of reciprocity,
consistent with established practice.
7. References
1 Wenzel, Z. and Seng, J. (Editors), "Requirements of Internationalized Domain
Names," draft-ietf-idn-requirements-03.txt, August 2000
8. Author's address
Sung Jae Shim
DualName, Inc.
3600 Wilshire Boulevard, Suite 1814
Los Angeles, California 90010
USA
Email: shimsungjae@dualname.com


@ -1,437 +0,0 @@
INTERNET-DRAFT John C Klensin
21 October 2002
Expires April 2003
National and Local Characters in DNS TLD Names
draft-klensin-idn-tld-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026 except that the
right to produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
In the context of work on internationalizing the Domain Name System
(DNS), there have been extensive discussions about "multilingual" or
"internationalized" top level domain names (TLDs), especially for
countries whose predominant language is not written in a Roman-based
script. This document reviews some of the motivations for such
domains and the constraints that the DNS imposes. It then suggests
an alternative, local translation, that may solve a superset of the
problem while avoiding protocol changes, serious deployment delays,
and other difficulties.
Table of Contents
1. Introduction
1.1 Background on the "Multilingual Name" Problem
1.2 Domain Name System Constraints
1.3 Internationalization and Localization
2. Client-side solutions
2.1 IDNA and the client
2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
3.2 Unification of country code domains
3.3 User understanding of local and global references
3.4 Limits on TLD propagation
4. Security Considerations
5. References
6. Acknowledgements
7. Author's Address
1. Introduction
1.1 Background on the "Multilingual Name" Problem
People who share a language prefer to communicate in it, using whatever
characters are normally used to write that language, rather than in some
"foreign" one. There have been standards for using mutually-agreed
characters and languages in electronic mail message bodies and selected
headers since the introduction of MIME in 1992 [MIME] and the Web has
permitted multilingual text since its inception. However, since domain
names are exposed to users in email addresses and URLs, and
corresponding arrangements in other protocols, demand rapidly arose to
permit domain names in applications that used characters other than
those of the very restrictive, ASCII-subset, "LDH" conventions [LDH].
The effort to do this rapidly became known as "multilingual domain
names", although that is a misnomer, since the DNS deals only with
characters and identifier strings, and not, except by accident, what
people usually think of as "names". And there has been little actual
interest in what would actually be a "multilingual name" -- i.e., a name
that contains components from more than one language -- but only the use
of strings conforming to different languages in the context of the DNS.
1.1.1 Approaches to the requirement
If the requirement is seen, not as "modifying the DNS", but as
"providing users with access to the DNS from a variety of languages and
character sets", three sets of proposals have emerged in the IETF and
elsewhere. They are:
(1) Perform processing in client software that recodes a user-visible
string into an ASCII-compatible form that can safely be passed
through the DNS protocols and stored in the DNS. This is the
approach used, for example, in the IETF's "IDNA" protocol [IDNA].
(2) Modify the DNS to be more hospitable to non-ASCII names and
strings. There have been a variety of proposals to do this in almost
as many ways, some of which have been implemented on a proprietary
basis by various vendors. None of them have gained acceptance in the
IETF community, primarily because they would take a long time to
deploy and would leave many problems unsolved.
(3) Move the problem out of the DNS entirely, relying instead on a
"directory" or "presentation" layer to handle internationalization.
The rationale for this approach is discussed in [DNSROLE].
This document proposes a fourth approach, applicable to the top level
domains (TLDs) only (see section 1.2.1 for a discussion of the special
issues that make TLDs problematic). That approach could be used as an
alternate or supplement to the strategies summarized above.
1.1.2 Writing the name of one's country in its own characters
An early focus of the "multilingual domain name" efforts was expressed
in statements such as "users in my country, in which ASCII is rarely
used, should be able to write an entire domain name in their own
character set." In particular, since all top-level domain names, at
present, follow the LDH rules, the somewhat more restrictive naming
rules discussed in [STD3], and the coding conventions specified in
[RFC1591], all fully-qualified DNS names were effectively required to
contain at least one ASCII label (the TLD name), and that was considered
inappropriate. One should, instead, be able to write the name of the
ccTLD for China in Chinese, the name of the ccTLD for Saudi Arabia in
Arabic, and so on.
1.1.3 Countries with multiple languages and countries with multiple
names
From a user interface standpoint, writing ccTLD names in local
characters is a problem. As discussed in section 1.2.2, the DNS itself
does not easily permit a domain to be referred to by more than one name
(or spelling or translation of a name). Countries with more than one
official language would require that the country name be represented in
each of those languages. And, just as it is important that a user in
China be able to represent the name of the Chinese ccTLD in Chinese
characters, she should be able to access a Chinese-language site in
France using Chinese characters, requiring that she be able to write the
name of the French ccTLD in those characters rather than in a form based
on a Roman character set.
1.2 Domain Name System Constraints
1.2.1 Administrative hierarchy
The domain name system is designed around the idea of an "administrative
hierarchy", with the entity responsible for a given node of the
hierarchy responsible for policies applicable to its subhierarchies (Cf.
[STD13]). The model works quite well for the domain and subdomains of a
particular enterprise, where the hierarchy can be organized to match the
organizational structure, there are established ways to set policies and
there are, at least presumably, shared assumptions about overall goals
and objectives among all registrants in the domain. It is more
problematic when a domain is shared by unrelated entities which lack
common policy assumptions. It is difficult to reach agreement on rules
that should apply to all of them. That situation always prevails for
the labels registered in a TLD (second-level names) except in those TLDs
for which the second level is structural (e.g., the .CO, .AC, .GOV
conventions in many ccTLDs), in which case it exists for the labels
within that structural level.
TLDs may, but need not, have consistent registration policies for those
second (or third) level names. Countries (or ccTLD administrators) have
often adopted rules about what entities may register in those ccTLDs,
and the forms the names may take. RFC 1591 outlined registration norms
for most of the gTLDs, even though those norms have been largely ignored
in recent years. And some recent "sponsored" domains are based on quite
specific rules about appropriate registrations. Homogeneous
registration rules for the root are, by contrast, impossible: almost by
definition, the subdomains registered in it are diverse and no single
policy applying to all root subdomains (TLDs) is feasible.
1.2.2 Aliases
In an environment different from the DNS, a rational way to permit
assigning local-language names to a country code (or other) domain would
be to set up an alias for the name, or to use some sort of "see instead"
reference. But the DNS does not have quite the right facilities for
either. Instead, it supports a "CNAME" record, whose label can refer
only to a particular label and not to a subtree. For example, if A.B.C
is a fully-qualified name, then a CNAME reference from X to A would make
X.B.C appear to have the same values as A.B.C. However, a CNAME
reference from Y to C would not make A.B.Y referenceable (or even
defined) at all. A second record type, DNAME [RFC2672], can provide an
alias for a portion of the tree. But it is problematic technically, and
its use is strongly discouraged except for transition uses from one
domain to another.
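The difference between the two record types can be illustrated with a toy
model. This sketch is not a DNS implementation: the function name and the
dictionaries are hypothetical, names are written in lower case without a
trailing dot, and only a single rewriting step is modelled.

```python
def resolve_alias(name, cnames, dnames):
    """Rewrite `name` once using a toy model of CNAME and DNAME aliases.

    `cnames` maps one exact owner name to its target; `dnames` maps the root
    of a subtree to the root that replaces it.
    """
    if name in cnames:                       # CNAME: exact owner name only
        return cnames[name]
    for owner, target in dnames.items():     # DNAME: names strictly below owner
        if name.endswith("." + owner):
            return name[: -len(owner)] + target
    return None                              # no alias applies


cnames = {"x.b.c": "a.b.c", "y": "c"}
print(resolve_alias("x.b.c", cnames, {}))       # the single name is aliased
print(resolve_alias("a.b.y", cnames, {}))       # a CNAME at 'y' does not cover its subtree
print(resolve_alias("a.b.y", {}, {"y": "c"}))   # a DNAME at 'y' does
```

In this model the CNAME from Y to C leaves A.B.Y undefined, while the DNAME
rewrites every name below Y.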
1.3 Internationalization and Localization
It has often been observed that while many people talk about
"internationalization" (a term we typically use for making something
globally accessible while incorporating a broad-range "universal"
character set and conventions appropriate to all languages), they often really
mean, and want, "localization" (making things work well in a particular
locality, or well, but potentially differently, for a broad range of
localities). Anything that actually involves the DNS must be global and
hence internationalized since the DNS cannot meaningfully support
different responses based, e.g., on the location of the user making a
query. While the DNS cannot support localization internally, many of
the features discussed earlier in this section are much more easily
thought about in local terms -- whether localized to a geographical area,
users of a language, or using some other criteria -- than in global ones.
2. Client-side solutions
Traditionally, the IETF has avoided becoming involved in standardization
for actions that take place strictly on individual hosts on the network,
assuming that it should confine itself to behavior that is observable
"on the wire", i.e., in protocols between network hosts. Exceptions to
this general principle have been made when different clients were
required to utilize data or interpret values in compatible ways to
preserve interoperability: the standards for email and web body formats,
and IDNA itself, are examples of these exceptions. Regardless of what
is required to be standardized, it is almost never required, and often
unwise, that a user interface, by default, present on-the-wire formats
to the user. However, in most cases when the presentation format and
the wire format differ, the client program must take precautions that
the wire format can be reconstructed from user input, or to keep the
wire format, while hidden, bound to the presentation mechanism so that
it can be reconstructed. And, while it is rarely a goal in itself, it
is often necessary that the user be at least vaguely aware that the wire
("real") format is different from the presentation one and that the wire
format be available for debugging.
2.1 IDNA and the client
As mentioned above, IDNA itself is entirely a client-side protocol. It
works by providing labels to the DNS in a special format (so-called
"ACE"). When labels in that format are encountered, they are
transformed, by the client, back into internationalized (normally
Unicode) characters. In the context of this document, the important
observation about IDNA is that any application program that supports it
is already doing considerable transformation work on the client; it is
not simply presenting the on-the-wire formats to the user.
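The client-side transformation can be seen with Python's built-in "idna"
codec. Note the assumption: the codec implements IDNA as later published in
RFC 3490, so the exact ACE form shown here may differ from what
implementations produced while the protocol was still a draft. The sample
label is an illustration and does not come from this document.

```python
# The built-in "idna" codec performs the conversion on the client.
label = "b\u00fccher"          # a label containing one non-ASCII character

ace = label.encode("idna")     # the form that is passed to the DNS
print(ace)

shown = ace.decode("idna")     # the form the client presents to the user
print(shown == label)
```

The DNS only ever sees the ASCII-compatible form; the internationalized form
exists solely in the client.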
2.2 Local translation tables for TLD names
We suggest that, in addition to maintaining the code and tables required
to support IDNA, clients may want to maintain a table that contains a
list of TLDs and that maps between them and locally-desirable names.
For ccTLDs, these might be the names (or locally-standard abbreviations)
by which the relevant countries are known locally (whether in ASCII
characters or others). With some care on the part of the application
designer (e.g., to ensure that local forms do not conflict with the
actual TLD names), a particular TLD name input from the user could be
either in local or standard form without special tagging or problems.
When DNS names are received by these client programs, the TLD labels
would be mapped to local form before IDNA is applied to the rest of the
name; when names are received from users, local TLD names would be
mapped to the global ones before being passed into IDNA or for other DNS
processing.
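A minimal sketch of such a table and of the two mapping directions follows.
The table contents and the function names are hypothetical, the Korean entries
are merely examples of local country names, and Python's built-in "idna" codec
(which implements RFC 3490) stands in for the client's IDNA code.

```python
# Hypothetical per-user table: global TLD label -> locally preferred label.
GLOBAL_TO_LOCAL = {"kr": "한국", "fr": "프랑스"}


def build_reverse(table):
    """Invert the table, refusing local forms that collide with real TLDs."""
    reverse = {}
    for tld, local in table.items():
        if local in table:
            raise ValueError(f"local form {local!r} conflicts with a TLD name")
        reverse[local] = tld
    return reverse


LOCAL_TO_GLOBAL = build_reverse(GLOBAL_TO_LOCAL)


def to_wire(name):
    """User input -> name for DNS processing: map the TLD, then apply IDNA."""
    *rest, tld = name.split(".")
    tld = LOCAL_TO_GLOBAL.get(tld, tld)      # local or standard form accepted
    return ".".join(rest + [tld]).encode("idna").decode("ascii")


def to_display(name):
    """Name from the DNS -> presentation: map the TLD, decode the other labels."""
    *rest, tld = name.split(".")
    rest = [label.encode("ascii").decode("idna") for label in rest]
    return ".".join(rest + [GLOBAL_TO_LOCAL.get(tld, tld)])


print(to_wire("example.한국"))
print(to_display("example.kr"))
```

Because the table lives entirely in the client, each user can hold a different
table without any change to the DNS.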
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
The notion of a top-level domain whose name matches, e.g., the name that
is used for a country in that country or the name of a language in that
language is, as mentioned above, immediately appealing. But most of the
reasons for it argue equally strongly for other TLDs being accessible
from that language. A user in Korea who can access the national ccTLD
in the Korean language and character set has every reason to expect that
both generic top level domains and domains associated with other
countries would be similarly accessible, especially if the second-level
domains bear Korean names. A user in Spain or Portugal, or in Latin
America, would presumably have similar expectations, but would expect to
use Spanish names, not Korean ones.
That level of local optimization is not realistic --some would argue not
possible-- with the DNS since it would ultimately require that every top
level domain be replicated for each of the world's languages. That
replication process would involve not just the top level domain itself:
in principle, all of its subtrees would need to be completely replicated
as well (or at least all of the subtrees for which a the language
associated with the a given replicant was relevant). The administrative
hierarchy characteristics of the DNS (see section 1.2.1) turn the
replication process into an administrative nightmare: every
administrator of a second-level domain in the world would be forced to
maintain dozens, probably hundreds, of similar zone files for the
replicates of the domain. Even if only the zones relevant to a
particular country or language were replicated, the administrative and
tracking problems to bind these to the appropriate top-level domain and
keep all of the replicas synchronized would be extremely difficult at
best. And many administrators of third- and fourth-level domains, and
beyond, would be faced with similar problems.
By contrast, dealing with the names of TLDs as a localization problem,
using local translation, is fairly simple. Each function represented by
a TLD -- a country, generic registrations, or purpose-specific
registrations -- could be represented in the local language and
character set as needed. And, for countries with many languages, or
users living, working, or visiting countries where their language was
not dominant, "local" could be defined in terms of the needs or wishes
of each particular user.
3.2 Unification of country code domains
It follows from some of the comments above that, while there appears to
be some immediate appeal in having (at least) two domains for each
country, one using the ISO 3166-1 code and another one using a name
based on the national name in the national language, such a situation
would create considerable problems for registrants in the multiple
domains. For registrants maintaining enterprise or organizational
subdomains, ease of administration in a single family of zone files will
usually make a registration in a single top-level domain preferable to
replicated sets of them, at least as long as their functional
requirements (such as local-language access) are met by the unified
structure.
Of course, having replicated domains might be popular with registries
and registrars, since replication would almost inevitably increase the
total number of domains to be registered.
3.3 User understanding of local and global references
While the IDNA tables (actually Nameprep and Stringprep -- see the IDNA
specification) must be identical globally for IDNA to work reliably, the
tables for mapping between local names and TLD names could be locally
determined, and differ from one locale to another, as long as users
understood that international interchange of names required using the
standard forms. That understanding could be assisted by software. It
is likely that, at least for the foreseeable future, DNS names being
passed among users in different countries, or using different languages,
will be forced to be in ACE form to guarantee compatibility in any
event, so the marginal knowledge or effort needed to put TLD names into
standard form and transmit them that way would be very small.
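
A rough sketch of putting a name into standard form for interchange, under stated assumptions: the function name and the table passed to it are hypothetical, and Python's built-in "idna" codec (which implements the IDNA ToASCII operation as later published in RFC 3490) stands in for whatever IDNA implementation a client actually uses.

```python
def to_interchange_form(name, local_to_global):
    """Map a local TLD to its standard form, then ACE-encode the name.

    `local_to_global` is a hypothetical locale-specific table mapping
    local TLD forms to standard TLD names.
    """
    head, sep, tld = name.rpartition(".")
    standard = head + sep + local_to_global.get(tld.lower(), tld)
    # The "idna" codec applies Nameprep and Punycode label by label.
    return standard.encode("idna").decode("ascii")
```

The TLD lookup is a single dictionary access ahead of the ACE conversion, which is the sense in which the extra effort is marginal.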
3.4 Limits on TLD propagation
The concept of using local translation does have one side-effect, which
some portions of the Internet community might consider undesirable.
The size and complexity of translation tables, and maintaining those
tables, will be, to a considerable extent, a function of the number of
top-level domains, the frequency with which new domains are added, and
the number of domains that are added at a time. A country or other
locale that wished to maintain a full set of translations (i.e., so that
every TLD had a representation in the local language) would presumably
find setting up a table for the current collection of a few hundred
domains to be a task that would take some days. If the number of TLDs
was relatively stable, with a relatively small number being added at
infrequent intervals, the updates could probably be dealt with on an ad
hoc basis. But, if large numbers of domains were added frequently, or
if the total number of TLDs became very large, maintaining the table
might require dedicated staff. Worse, updating the tables stored on
client machines might require update and synchronization protocols and
all of the related complexities.
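
The maintenance burden can be made concrete with a small sketch that compares a translation table against the current list of TLDs. The function name and data shapes are assumptions for illustration; how a locale would obtain the current TLD list is outside the scope of this example.

```python
def table_gaps(current_tlds, translations):
    """Compare a translation table with the current set of TLDs.

    `translations` maps standard TLD -> local form (hypothetical shape).
    Returns (missing, stale): TLDs that have no local form yet, and
    table entries whose TLD is no longer in the current list.
    """
    current = {t.lower() for t in current_tlds}
    translated = {t.lower() for t in translations}
    return sorted(current - translated), sorted(translated - current)
```

Each addition to the TLD list shows up as a "missing" entry that someone must translate by hand, which is why the frequency and size of additions drive the cost.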
4. Security Considerations
IDNA provides a client-based mechanism for presenting Unicode names in
applications while passing only ASCII-based names on the wire. As such,
it constitutes a major step along the path of introducing a client-based
presentation layer into the Internet. Client-based presentation layer
transformations introduce risks from variant tables that can change
meaning without external protection. For example, if a mapping table
normally maps A onto C and that table is altered by an attacker so that
A maps onto D instead, much mischief can be committed. On the other
hand, these are not the usual sort of network attacks: they may be
thought of as falling into the "users can always cause harm to
themselves" category. The local translation model outlined here does
not significantly increase the risks over those associated with IDNA,
but may provide some new avenues for exploiting them.
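
One possible form of the "external protection" mentioned above is a digest check on the mapping table. The sketch below is an assumption for illustration only; this document does not specify any such mechanism, and the function name and the way the expected digest is distributed are hypothetical.

```python
import hashlib
import json


def table_digest(table):
    """SHA-256 over a canonical (sorted-key) JSON encoding of the table."""
    canonical = json.dumps(table, sort_keys=True, ensure_ascii=True)
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


trusted = {"A": "C"}
# Assumed to be obtained separately from the table, over a trusted channel.
expected = table_digest(trusted)

tampered = {"A": "D"}  # the attacker's alteration described in the text
assert table_digest(trusted) == expected
assert table_digest(tampered) != expected
```

A check of this kind detects alteration of the table in storage or transit; it does nothing about a user who deliberately installs a different table.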
Both this approach and IDNA rely on having updated programs present
information to the user in a very different form than the one in which
it is transmitted on the wire. Unless the internal (wire) form is
always used in interchange, there are possibilities for ambiguity and
confusion about references.
5. References
[DNSROLE] Klensin, J.C., "Role of the Domain Name System", work in
progress (draft-klensin-dns-role-04.txt).
[IDNA] Faltstrom, P., P. Hoffman, and A. M. Costello, "Internationalizing
Domain Names in Applications (IDNA)", work in progress
(draft-ietf-idn-idna-13.txt).
[LDH] STD13 and comments
[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", RFC 1341, June 1992. Updated and replaced
by Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC2045, November 1996. Also, Moore, K., "Representation of
Non-ASCII Text in Internet Message Headers", RFC 1342, June 1992.
Updated and replaced by Moore, K., "MIME (Multipurpose Internet
Mail Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.
[RFC1591] Postel, J., "Domain Name System Structure and Delegation",
RFC1591, March 1994.
[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC 2672,
August 1999.
[STD3] Braden, R., Ed., "Requirements for Internet Hosts - Application and
Support", RFC1123, October 1989.
[STD13] Mockapetris, P.V., "Domain names - concepts and
facilities", RFC 1034, and "Domain names - implementation and
specification", RFC 1035, November 1987.
6. Acknowledgements
This document was inspired by a number of conversations in ICANN, IETF,
MINC, and private contexts about the future evolution and
internationalization of top level domains. Discussions within, and
about, the ICANN IDN Committee have been particularly helpful, although
several of the members of that committee may be surprised about where
those discussions led.
7. Author's Address
John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140 USA
email: john+ietf@jck.com