Internet Engineering Task Force                     Jun-ichiro itojun Hagino
INTERNET-DRAFT                                       IIJ Research Laboratory
Expires: January 19, 2002                                      July 19, 2001


           Comparison of AAAA and A6 (do we really need A6?)
                   draft-ietf-dnsext-aaaa-a6-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.

Distribution of this memo is unlimited.

This Internet-Draft will expire in six months. The date of expiration
will be January 19, 2002.

Abstract

At this moment, there are two DNS resource record types defined for
holding IPv6 addresses in the DNS database: AAAA [Thomson, 1995] and A6
[Crawford, 2000]. AAAA has been used for IPv6 network operation since
1996. Questions have arisen as to whether we really need A6, and whether
it is really possible to migrate to A6. Some say AAAA is enough and A6
is not necessary. Others say A6 is necessary and AAAA should be
deprecated.

This draft tries to weigh the pros and cons of these two record types,
and makes suggestions on the deployment of IPv6 record types.

This draft does not cover the use of bit-string labels and the DNAME
resource record (reverse mapping). The nibble form seems to be well
accepted in the community and the newer formats have high deployment
costs, so we see little need for, and few voices calling for,
migration. Refer to the IETF50 dnsext working group minutes for more
details.

HAGINO Expires: January 19, 2002 [Page 1]

DRAFT Comparison of AAAA and A6 July 2001

1. A brief summary of the IPv6 resource record types

1.1. AAAA record

The AAAA resource record is formatted as follows. The DNS record type
value for AAAA is 28 (assigned by IANA). Note that the AAAA record data
has a fixed length.

    +------------+
    |IPv6 Address|
    | (16 octets)|
    +------------+

With AAAA, we can define DNS records for IPv6 address resolution as
follows, just like A records for IPv4.

    $ORIGIN X.EXAMPLE.
    N AAAA 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
    N AAAA 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
    N AAAA 2345:000E:EB22:0001:1234:5678:9ABC:DEF0

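For illustration, here is a minimal Python sketch (standard library only) of why AAAA is trivial to handle: its RDATA is just the 16-octet address in network byte order. The function names are hypothetical helpers, not part of BIND or any other DNS library.

```python
import ipaddress

AAAA_TYPE = 28  # RR type value for AAAA (assigned by IANA)


def encode_aaaa_rdata(text: str) -> bytes:
    """Return AAAA RDATA: the IPv6 address as 16 octets, network byte order."""
    return ipaddress.IPv6Address(text).packed


def decode_aaaa_rdata(rdata: bytes) -> ipaddress.IPv6Address:
    """Parse AAAA RDATA; any length other than 16 octets is malformed."""
    if len(rdata) != 16:
        raise ValueError(f"AAAA RDATA must be 16 octets, got {len(rdata)}")
    return ipaddress.IPv6Address(rdata)


if __name__ == "__main__":
    for text in ("2345:00C1:CA11:0001:1234:5678:9ABC:DEF0",
                 "2345:00D2:DA11:0001:1234:5678:9ABC:DEF0",
                 "2345:000E:EB22:0001:1234:5678:9ABC:DEF0"):
        rdata = encode_aaaa_rdata(text)
        # Every AAAA record carries exactly 16 octets, whatever the address.
        print(len(rdata), decode_aaaa_rdata(rdata))
```

A resolver needs nothing beyond this fixed-size copy, which is why AAAA support can be added to an A record resolver with very little code.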
1.2. A6 record

The A6 resource record is formatted as follows. The DNS record type
value for A6 is 38 (assigned by IANA). Note that the A6 record data has
a variable length.

    +-----------+------------------+-------------------+
    |Prefix len.| Address suffix   | Prefix name       |
    | (1 octet) | (0..16 octets)   | (0..255 octets)   |
    +-----------+------------------+-------------------+

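The following Python sketch encodes A6 RDATA under the layout in the diagram above, assuming the RFC2874 rules as we read them: the suffix field carries only the low-order 128 - P bits, padded with zero bits to an octet boundary, and the prefix name is present only when P > 0 and is written uncompressed. The helper names are hypothetical, and names must be passed fully qualified (the $ORIGIN expansion is assumed to be done already).

```python
import ipaddress
from typing import Optional

A6_TYPE = 38  # RR type value for A6 (assigned by IANA)


def encode_name(name: str) -> bytes:
    """Encode a fully qualified name as uncompressed wire-format labels."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # only happens for the root name "."
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


def encode_a6_rdata(prefix_len: int, suffix: str,
                    prefix_name: Optional[str]) -> bytes:
    """Build A6 RDATA: prefix length, address suffix, optional prefix name."""
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be in 0..128")
    suffix_bits = 128 - prefix_len
    suffix_octets = (suffix_bits + 7) // 8  # 0..16 octets
    value = int(ipaddress.IPv6Address(suffix))
    if value >> suffix_bits:
        raise ValueError("suffix has bits set inside the prefix portion")
    rdata = bytes([prefix_len]) + value.to_bytes(suffix_octets, "big")
    if prefix_len > 0:
        if prefix_name is None:
            raise ValueError("a prefix name is required when prefix length > 0")
        rdata += encode_name(prefix_name)
    elif prefix_name is not None:
        raise ValueError("an A6 0 record carries no prefix name")
    return rdata


if __name__ == "__main__":
    # Records from the RFC2874 example, with names written out in full.
    print(encode_a6_rdata(64, "::1234:5678:9ABC:DEF0", "SUBNET-1.IP6.X.EXAMPLE.").hex())
    print(encode_a6_rdata(28, "0:0001:CA00::", "C.NET.ALPHA-TLA.ORG.").hex())
    print(encode_a6_rdata(0, "2345:00C0::", None).hex())
```

For a prefix length that is not a multiple of 8 (such as 28), the first suffix octet mixes pad bits and real address bits; that is the source of the bitwise shifting discussed in section 3.2.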
With A6, it is possible to define an IPv6 address by using multiple DNS
records. Here is an example taken from RFC2874:

    $ORIGIN X.EXAMPLE.
    N                       A6 64 ::1234:5678:9ABC:DEF0  SUBNET-1.IP6
    SUBNET-1.IP6            A6 48 0:0:0:1::              IP6
    IP6                     A6 48 0::0                   SUBSCRIBER-X.IP6.A.NET.
    IP6                     A6 48 0::0                   SUBSCRIBER-X.IP6.B.NET.

    SUBSCRIBER-X.IP6.A.NET. A6 40 0:0:0011::             A.NET.IP6.C.NET.
    SUBSCRIBER-X.IP6.A.NET. A6 40 0:0:0011::             A.NET.IP6.D.NET.

    SUBSCRIBER-X.IP6.B.NET. A6 40 0:0:0022::             B-NET.IP6.E.NET.

    A.NET.IP6.C.NET.        A6 28 0:0001:CA00::          C.NET.ALPHA-TLA.ORG.

    A.NET.IP6.D.NET.        A6 28 0:0002:DA00::          D.NET.ALPHA-TLA.ORG.

    B-NET.IP6.E.NET.        A6 32 0:0:EB00::             E.NET.ALPHA-TLA.ORG.

    C.NET.ALPHA-TLA.ORG.    A6 0  2345:00C0::
    D.NET.ALPHA-TLA.ORG.    A6 0  2345:00D0::
    E.NET.ALPHA-TLA.ORG.    A6 0  2345:000E::

If we translate the above into AAAA records, it will be as follows:

    $ORIGIN X.EXAMPLE.
    N AAAA 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
    N AAAA 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
    N AAAA 2345:000E:EB22:0001:1234:5678:9ABC:DEF0

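To show what the resolver has to do with the records above, here is a minimal Python sketch of A6 chain reassembly. It assumes all fragments are already available in a local table (no network I/O, caching or TTL handling), writes every name fully qualified, and has no protection against loops or overly long chains; reassemble() is a hypothetical helper, not BIND's implementation. Each A6 record keeps its own low-order 128 - P bits and takes the leading P bits from every address of its prefix name.

```python
import ipaddress
from typing import Dict, List, Optional, Set, Tuple

# One A6 record: (prefix length, address suffix, prefix name or None).
A6Record = Tuple[int, str, Optional[str]]

ALL_ONES = (1 << 128) - 1

# The RFC2874 example above, with every name written out in full.
ZONE: Dict[str, List[A6Record]] = {
    "N.X.EXAMPLE.": [(64, "::1234:5678:9ABC:DEF0", "SUBNET-1.IP6.X.EXAMPLE.")],
    "SUBNET-1.IP6.X.EXAMPLE.": [(48, "0:0:0:1::", "IP6.X.EXAMPLE.")],
    "IP6.X.EXAMPLE.": [(48, "0::0", "SUBSCRIBER-X.IP6.A.NET."),
                       (48, "0::0", "SUBSCRIBER-X.IP6.B.NET.")],
    "SUBSCRIBER-X.IP6.A.NET.": [(40, "0:0:0011::", "A.NET.IP6.C.NET."),
                                (40, "0:0:0011::", "A.NET.IP6.D.NET.")],
    "SUBSCRIBER-X.IP6.B.NET.": [(40, "0:0:0022::", "B-NET.IP6.E.NET.")],
    "A.NET.IP6.C.NET.": [(28, "0:0001:CA00::", "C.NET.ALPHA-TLA.ORG.")],
    "A.NET.IP6.D.NET.": [(28, "0:0002:DA00::", "D.NET.ALPHA-TLA.ORG.")],
    "B-NET.IP6.E.NET.": [(32, "0:0:EB00::", "E.NET.ALPHA-TLA.ORG.")],
    "C.NET.ALPHA-TLA.ORG.": [(0, "2345:00C0::", None)],
    "D.NET.ALPHA-TLA.ORG.": [(0, "2345:00D0::", None)],
    "E.NET.ALPHA-TLA.ORG.": [(0, "2345:000E::", None)],
}


def reassemble(zone: Dict[str, List[A6Record]], name: str) -> Set[int]:
    """Return every 128-bit address the A6 chain starting at `name` yields.

    No loop or fragment-count protection: a cyclic chain recurses until
    Python raises RecursionError.
    """
    addresses: Set[int] = set()
    for prefix_len, suffix, prefix_name in zone[name]:
        low_mask = ALL_ONES >> prefix_len  # selects the 128 - P suffix bits
        low_bits = int(ipaddress.IPv6Address(suffix)) & low_mask
        if prefix_len == 0:
            addresses.add(low_bits)  # "A6 0": the suffix is the whole address
            continue
        # Every address of the prefix name supplies the leading P bits.
        for prefix_addr in reassemble(zone, prefix_name):
            addresses.add((prefix_addr & ~low_mask & ALL_ONES) | low_bits)
    return addresses


if __name__ == "__main__":
    for addr in sorted(reassemble(ZONE, "N.X.EXAMPLE.")):
        print(ipaddress.IPv6Address(addr).exploded.upper())
```

Run as a script, it prints the same three addresses as the AAAA translation above, in numeric order. A single AAAA query returns those addresses directly; the A6 version needs every record in the table.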
It is also possible to use A6 records in a ``non-fragmented'' manner,
as below.

    $ORIGIN X.EXAMPLE.
    N A6 0 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
    N A6 0 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
    N A6 0 2345:000E:EB22:0001:1234:5678:9ABC:DEF0

There is a large design difference between A6 and AAAA. A6 shifts more
of the address resolution work to the resolver side, in order to reduce
the cost of zone file maintenance; the complexity lives in the resolver.
AAAA asks zone file maintainers to supply the full 128-bit IPv6 address
in one record, so the resolver side can be implemented very simply.

2. Deployment status

2.1. Name servers/resolvers

As of this writing, AAAA is deployed quite widely. BIND4 (since 4.9.4),
BIND8, BIND9 and other implementations support AAAA, both as DNS
servers and as resolver libraries. In contrast, the author knows of only
one DNS server/resolver implementation that supports A6: BIND9.

Almost all IPv6-ready operating systems ship with the BIND4 or BIND8
resolver library. [need to check the situation with resolver libraries
based on non-BIND code] Therefore, they cannot query A6 records (unless
applications get linked with BIND9 libraries explicitly).

2.2. IPv6 network

IPv6 networks have been deployed widely since 1996. Though many of the
participants consider them experimental, commercial IPv6 services have
been deployed since around 1999, especially in Asian countries. Today,
there are numerous IPv6 networks operated just as seriously as IPv4
networks.

2.3. DNS database

There are no IPv6-reachable root DNS servers, partly because we have
both AAAA and A6 and have not decided which one we would really like to
deploy (so we cannot put IPv6 root NS records in place). The lack of
IPv6-reachable root DNS servers is now preventing IPv6-only or IPv4/v6
dual-stack network operations.

At this moment, a very small number of ccTLD registries accept
registration requests for IPv6 glue records. Many of the ccTLDs and
gTLDs do not take IPv6 glue records, partly because of the lack of
consensus between AAAA and A6. Again, the lack of IPv6 glue records is
causing pain in IPv6-ready network operations. For example, the JP ccTLD
accepts IPv6 glue records and registers them as AAAA records. In our
experience, IPv6 NS records (with AAAA) work flawlessly. Try the
following commands to see how the JP ccTLD registers IPv6 glue records
(``/e'' is for English-language output):

    % whois -h whois.nic.ad.jp wide.ad.jp/e
    % whois -h whois.nic.ad.jp ns1.v6.wide.ad.jp/e

3. Deploying DNS records

At this moment, the following four strategies are proposed for the
deployment of IPv6 DNS resource records: AAAA, fragmented A6 records,
non-fragmented A6 records, and AAAA synthesis.

3.1. AAAA records

AAAA records have been used on the IPv6 network (also known as the
6bone) since it started in 1996, and have been working just fine ever
since. The AAAA record is a straightforward extension of the A record;
it needs a single query-response round trip to resolve a name into an
IPv6 address.

A6 was proposed to add network renumbering friendliness to AAAA. With
AAAA, a full 128-bit IPv6 address needs to be supplied in a DNS resource
record. Therefore, in the event of network renumbering, administrators
need to update the whole DNS zone file with the new IPv6 address
prefixes. We will discuss the issues with renumbering in a dedicated
section.

3.2. Fragmented A6 records

If we are to use fragmented A6 records (the 128 bits split across
multiple A6 records), we have a lot of issues and worries.

To resolve IPv6 addresses using fragmented A6 records, we need to query
DNS multiple times to resolve a single DNS name into an IPv6 address.
Since no earlier DNS record type required multiple queries to resolve
the address itself, we have very little experience with such resource
records.

There will be more delays in name resolution compared to queries for
A/AAAA records. The more fragments a record is split into, the more
query round trips are needed. There are only a few opportunities to
query fragments in parallel. In the above example, we can resolve
A.NET.IP6.C.NET and A.NET.IP6.D.NET in parallel, but not the others.

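As a rough illustration of the round-trip cost, the following Python sketch counts the dependent query levels in the example chain. It assumes cold caches, one round trip per query, and that queries within one level are sent in parallel; LINKS and sequential_round_trips() are hypothetical names introduced here.

```python
from typing import Dict, List, Optional

# Prefix-name links of the RFC2874 example: owner -> prefix names referenced
# by its A6 records (None marks an "A6 0" record, which ends the chain).
LINKS: Dict[str, List[Optional[str]]] = {
    "N.X.EXAMPLE.": ["SUBNET-1.IP6.X.EXAMPLE."],
    "SUBNET-1.IP6.X.EXAMPLE.": ["IP6.X.EXAMPLE."],
    "IP6.X.EXAMPLE.": ["SUBSCRIBER-X.IP6.A.NET.", "SUBSCRIBER-X.IP6.B.NET."],
    "SUBSCRIBER-X.IP6.A.NET.": ["A.NET.IP6.C.NET.", "A.NET.IP6.D.NET."],
    "SUBSCRIBER-X.IP6.B.NET.": ["B-NET.IP6.E.NET."],
    "A.NET.IP6.C.NET.": ["C.NET.ALPHA-TLA.ORG."],
    "A.NET.IP6.D.NET.": ["D.NET.ALPHA-TLA.ORG."],
    "B-NET.IP6.E.NET.": ["E.NET.ALPHA-TLA.ORG."],
    "C.NET.ALPHA-TLA.ORG.": [None],
    "D.NET.ALPHA-TLA.ORG.": [None],
    "E.NET.ALPHA-TLA.ORG.": [None],
}


def sequential_round_trips(links: Dict[str, List[Optional[str]]], name: str) -> int:
    """Length of the longest chain of dependent A6 queries starting at `name`.

    A prefix name is only learned when the record referencing it arrives,
    so each level costs at least one round trip even if the queries within
    a level are sent in parallel.
    """
    deepest = 0
    for prefix_name in links[name]:
        if prefix_name is not None:
            deepest = max(deepest, sequential_round_trips(links, prefix_name))
    return 1 + deepest


if __name__ == "__main__":
    rtt_ms = 100  # hypothetical round-trip time to the authoritative servers
    trips = sequential_round_trips(LINKS, "N.X.EXAMPLE.")
    print(f"A6: {trips} round trips, at least {trips * rtt_ms} ms; AAAA: 1 round trip")
```

For N.X.EXAMPLE the chain is six levels deep (N, SUBNET-1.IP6, IP6, SUBSCRIBER-X.IP6.A.NET, A.NET.IP6.C.NET, C.NET.ALPHA-TLA.ORG), against a single round trip for an AAAA query.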
At this moment, there is very little documentation available regarding
the relationship between DNS record TTL and query delays. For example,
if the DNS record TTL is smaller than the communication delay between
the querier and the DNS servers, what should happen?

o If we compute the DNS record TTL based on the wallclock on the DNS
  server side, the DNS records have already expired when they arrive,
  and the querier will not be able to reassemble a complete IPv6
  address. Worse, by setting up records with a very low TTL, one can
  make recursive DNS resolvers go into an infinite loop chasing a wrong
  A6 chain (see the section on security considerations). [BIND
  9.2.0snap: the resolver does not go into an infinite loop, meaning
  that the BIND 9.2.0snap resolver does not really honor DNS record TTL
  during A6 reassembly.]

o If we compute it starting from the time the querier got the record,
  there will be some jitter in TTL computation among multiple queriers.
  If the query delays are long enough, the querier could end up with
  inconsistent A6 fragments, and the IPv6 address can be bogus after
  reassembly. With record types other than A6, we had no such problem,
  since we have never tried to reassemble an address out of multiple
  DNS records (a similar problem can arise with CNAME chain chasing, but
  the failure mode is much simpler to diagnose, as each record is
  treated as an atomic entity).

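The second case can be made concrete with a toy timing model in Python. The model is an assumption made for illustration, not something RFC2874 specifies: fragments are fetched strictly one after another, each costs one round trip, and each TTL counts down from the moment that fragment arrives.

```python
from typing import List


def stale_fragments(ttls_ms: List[int], rtt_ms: int) -> List[int]:
    """Indices of A6 fragments whose TTL runs out before reassembly completes.

    Toy model (an assumption, not from RFC2874): fragment i arrives
    (i + 1) * rtt_ms after the lookup starts, its TTL counts down from its
    arrival, and reassembly completes when the last fragment arrives.
    """
    done_ms = len(ttls_ms) * rtt_ms
    return [i for i, ttl in enumerate(ttls_ms) if (i + 1) * rtt_ms + ttl < done_ms]


if __name__ == "__main__":
    # A six-level chain like the RFC2874 example, 200 ms per round trip.
    print(stale_fragments([3_600_000] * 6, 200))  # hour-long TTLs -> []
    print(stale_fragments([100] * 6, 200))        # 100 ms TTLs -> [0, 1, 2, 3, 4]
```

With TTLs shorter than the total reassembly time, every fragment but the last has expired by the time the address is complete, so the reassembled address is built from data that is no longer valid.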
Some say that caches will avoid querying fragmented A6 records again and
again. However, most library resolver implementations do not cache
anything, so the traffic between the library resolver and the first-hop
nameserver will not be reduced by cached records. The TTL problem (see
above) is unavoidable for a library resolver without a cache. [XXX will
they interpret the TTL field? The BIND8 resolver does not.]

If some of the fragments are DNSSEC-signed and some are not, how should
we treat that address? RFC2874 section 6 briefly talks about it, but it
is not clear that a complete answer is given.

It is much harder to implement A6 fragment reassembly code than to
implement an AAAA record resolver. An AAAA record resolver can be
implemented as a straightforward extension of an A record resolver.

o It is much harder to design timeout handling for A6 reassembly. There
  would be multiple timeout parameters, including (1) the communication
  timeout for a single A6 fragment, (2) the communication timeout for
  the IPv6 address itself (the total time needed for reassembly) and
  (3) the TTL timeout for A6 fragment records.

o In the case of a library resolver implementation, it is harder to deal
  with exceptions (signals, in the UNIX case) within the large body of
  resolver code.

o When the A6 prefix length field is not a multiple of 8, the address
  suffix portion needs to be shifted bitwise while A6 fragments are
  reassembled. Also, resolver implementations must be careful about
  overlapping bits. From our implementation experience, the logic gets
  very complex, and we (unfortunately) expect to see a lot of
  security-critical bugs in the future.

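To illustrate the bitwise handling, here is a minimal Python sketch of an A6 RDATA parser that rejects non-zero pad bits and trailing data. It follows the RFC2874 layout as summarized in section 1.2, handles only uncompressed prefix names, and its helper names are hypothetical. GOOD is the wire form of ``A6 28 0:0001:CA00:: C.NET.ALPHA-TLA.ORG.'' from the example; BAD is the same record with one pad bit set.

```python
import ipaddress
from typing import Optional, Tuple


def decode_name(data: bytes, offset: int) -> Tuple[str, int]:
    """Decode an uncompressed wire-format name; return (name, next offset)."""
    labels = []
    while True:
        if offset >= len(data):
            raise ValueError("truncated name")
        length = data[offset]
        offset += 1
        if length == 0:
            return ".".join(labels) + ".", offset
        if length > 63:  # compression pointers are not expected here
            raise ValueError("unsupported label type")
        if offset + length > len(data):
            raise ValueError("truncated label")
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length


def decode_a6_rdata(rdata: bytes) -> Tuple[int, ipaddress.IPv6Address, Optional[str]]:
    """Parse A6 RDATA into (prefix length, suffix as an address, prefix name)."""
    if not rdata:
        raise ValueError("empty RDATA")
    prefix_len = rdata[0]
    if prefix_len > 128:
        raise ValueError("prefix length out of range")
    suffix_bits = 128 - prefix_len
    end = 1 + (suffix_bits + 7) // 8
    if len(rdata) < end:
        raise ValueError("truncated address suffix")
    suffix = int.from_bytes(rdata[1:end], "big")
    # If P is not a multiple of 8, the first suffix octet holds P % 8
    # high-order pad bits; they must be zero or they would overlap prefix bits.
    if suffix >> suffix_bits:
        raise ValueError("non-zero pad bits in address suffix")
    prefix_name = None
    if prefix_len > 0:
        prefix_name, end = decode_name(rdata, end)
    if end != len(rdata):
        raise ValueError("trailing octets after A6 RDATA")
    return prefix_len, ipaddress.IPv6Address(suffix), prefix_name


def is_valid_a6(rdata: bytes) -> bool:
    """True if `rdata` parses cleanly as A6 RDATA."""
    try:
        decode_a6_rdata(rdata)
    except ValueError:
        return False
    return True


# 28, then 13 suffix octets whose first octet 0x01 is 4 pad bits + 4 real bits.
GOOD = (bytes([28]) + bytes.fromhex("01ca") + bytes(11)
        + b"\x01C\x03NET\x09ALPHA-TLA\x03ORG\x00")
# Same record with one pad bit set: it would overwrite a bit of the prefix.
BAD = GOOD[:1] + bytes([GOOD[1] | 0x80]) + GOOD[2:]

if __name__ == "__main__":
    print(decode_a6_rdata(GOOD))
    print(is_valid_a6(BAD))
```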
RFC2874 suggests using a limited number of fragments per IPv6 address.
However, no protocol limit is defined. This omission makes it easier for
malicious parties to mount DoS attacks using lots of A6 fragments (see
the section on security considerations). [BIND 9.2.0snap: the
implementation limits the number of fragments within an A6 chain to
fewer than 16. This is not a protocol limitation but an implementation
choice; it is not clear whether it is the right choice.]

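Lacking a protocol limit, an implementation has to pick its own. The Python sketch below caps the number of fragments on a chain; MAX_FRAGMENTS = 16 is a hypothetical value chosen to mirror the BIND 9.2.0snap note above, and BIND's actual counting may differ. SHORT, LOOP and LONG are illustrative tables, not real zones.

```python
import ipaddress
from typing import Dict, List, Optional, Set, Tuple

A6Record = Tuple[int, str, Optional[str]]  # (prefix length, suffix, prefix name)
ALL_ONES = (1 << 128) - 1
MAX_FRAGMENTS = 16  # hypothetical cap: chains must stay below this length


class ChainTooLong(Exception):
    """Raised when an A6 chain would need MAX_FRAGMENTS or more fragments."""


def reassemble_limited(zone: Dict[str, List[A6Record]], name: str,
                       depth: int = 1) -> Set[int]:
    """Reassemble an A6 chain, refusing chains of MAX_FRAGMENTS or more.

    `depth` counts the fragments on the current chain, starting at 1 for
    `name` itself; a missing name simply contributes no addresses.
    """
    if depth >= MAX_FRAGMENTS:
        raise ChainTooLong(f"A6 chain reached {depth} fragments at {name}")
    addresses: Set[int] = set()
    for prefix_len, suffix, prefix_name in zone.get(name, []):
        low_mask = ALL_ONES >> prefix_len
        low_bits = int(ipaddress.IPv6Address(suffix)) & low_mask
        if prefix_len == 0:
            addresses.add(low_bits)
            continue
        for prefix_addr in reassemble_limited(zone, prefix_name, depth + 1):
            addresses.add((prefix_addr & ~low_mask & ALL_ONES) | low_bits)
    return addresses


def within_limit(zone: Dict[str, List[A6Record]], name: str) -> bool:
    """True if the chain for `name` stays under the fragment cap."""
    try:
        reassemble_limited(zone, name)
    except ChainTooLong:
        return False
    return True


# A short, legitimate chain built from the draft's example addresses.
SHORT = {
    "N.X.EXAMPLE.": [(64, "::1234:5678:9ABC:DEF0", "SUBNET-1.IP6.X.EXAMPLE.")],
    "SUBNET-1.IP6.X.EXAMPLE.": [(0, "2345:00C1:CA11:1::", None)],
}
# Hostile: a record whose prefix name points back at itself.
LOOP = {"LOOP.EVIL.EXAMPLE.": [(64, "::1", "LOOP.EVIL.EXAMPLE.")]}
# Hostile: a straight chain of 20 fragments (F0 .. F19).
LONG = {f"F{i}.EVIL.EXAMPLE.": [(64, "::1", f"F{i + 1}.EVIL.EXAMPLE.")]
        for i in range(19)}
LONG["F19.EVIL.EXAMPLE."] = [(0, "2345:00C1:CA11:1::", None)]

if __name__ == "__main__":
    for zone, name in ((SHORT, "N.X.EXAMPLE."), (LOOP, "LOOP.EVIL.EXAMPLE."),
                       (LONG, "F0.EVIL.EXAMPLE.")):
        print(name, within_limit(zone, name))
```

The cap also stops a self-referencing chain, which would otherwise recurse forever. It does not bound the fan-out of a chain whose names each carry many A6 records, so a depth limit alone is only a partial defense.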
With fragmented A6 records in a multi-prefix network configuration, we
cannot restrict the addresses published in the DNS database to a
specific subset, for example for load distribution purposes. Consider
the following example. Even if we would like to advertise only
2345:00C1:CA11:1:1234:5678:9ABC:DEF0 for N.X.EXAMPLE, it is not possible
to do so with fragmented A6 records. We are forced to define the whole
IPv6 address using ``A6 0'' for N.X.EXAMPLE, and in effect, the benefit
of A6 (renumbering friendliness) goes away.

    ; with the following records, we would advertise both addresses
    $ORIGIN X.EXAMPLE.
    N             A6 64 ::1234:5678:9ABC:DEF0  SUBNET-1.IP6
    M             A6 64 ::2345:2345:2345:2345  SUBNET-1.IP6
    SUBNET-1.IP6  A6 0  2345:00C1:CA11:1::
                  A6 0  2345:00D2:DA11:1::

    ; we need to do the following, jeopardizing renumbering
    ; friendliness for N.X.EXAMPLE
    $ORIGIN X.EXAMPLE.
    N             A6 0  2345:00C1:CA11:1:1234:5678:9ABC:DEF0
    M             A6 64 ::2345:2345:2345:2345  SUBNET-1.IP6
    SUBNET-1.IP6  A6 0  2345:00C1:CA11:1::
                  A6 0  2345:00D2:DA11:1::

The A6 resource record type and A6 fragmentation/reassembly were
introduced to help administrators with network renumbering. When a
network gets renumbered, the administrator needs to update only the A6
fragments for the higher address bits (prefixes). Again, we will discuss
the issues with renumbering in a dedicated section.

3.3. Non-fragmented A6 records

There are proposals to use non-fragmented A6 records, like ``A6 0
<128bit>'', in most places, so that we would be able to switch to
fragmented A6 records when we find a need for them.

From the packet format point of view, this approach has no benefit over
AAAA. Rather, every (non-fragmented) A6 record carries a one-octet
overhead (the prefix length field) compared to an AAAA record.

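The one-octet difference follows directly from the RDLENGTH arithmetic implied by the formats in section 1. This Python sketch assumes the RFC2874 layout; the function names are hypothetical, and the 24-octet name length is the uncompressed wire size of SUBNET-1.IP6.X.EXAMPLE.

```python
def aaaa_rdlength() -> int:
    """AAAA RDATA is always the 16-octet address."""
    return 16


def a6_rdlength(prefix_len: int, prefix_name_octets: int = 0) -> int:
    """A6 RDATA length: 1 prefix-length octet, ceil((128 - P) / 8) suffix
    octets, and the wire-format prefix name when P > 0."""
    suffix_octets = (128 - prefix_len + 7) // 8
    return 1 + suffix_octets + (prefix_name_octets if prefix_len > 0 else 0)


if __name__ == "__main__":
    print("AAAA:", aaaa_rdlength())
    print("A6 0:", a6_rdlength(0))
    print("A6 64 ... SUBNET-1.IP6.X.EXAMPLE.:", a6_rdlength(64, 24))
```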
If nameserver/resolver programs hardcode A6 processing to handle no
fragments, there will be no future possibility for us to introduce
fragmented A6 records. When there is no need for A6 reassembly, the
reassembly code will not be deployed, and even if it does get deployed,
it will not be tested enough. The author believes that the ``prepare for
the future, use non-fragmented A6'' argument is not worthwhile.

In the event of renumbering, a non-fragmented A6 record has the same
property as AAAA (the whole zone file has to be updated).

3.4. AAAA synthesis (A6 and AAAA hybrid approach)

At this moment, end hosts support AAAA records only. Some people would
like to see A6 deployed in DNS databases even without end host support.
To work around the deployment issues of A6, the following approach,
called ``AAAA synthesis'' [Austein, 2001], was proposed during the
dnsext working group session at IETF50:

o Deploy A6 DNS records worldwide. The proposal was not specific about
  whether we would deploy fragmented A6 records or non-fragmented A6
  records (``A6 0'').

o When a host queries a DNS server for an AAAA record, the DNS server
  queries the A6 fragments, reassembles them, and responds with an AAAA
  record.

The approach needs to be analyzed and specified in far more detail. For
example, the following questions need to be answered:

o What DNS error code should be returned to the AAAA querier if the A6
  reassembly fails?

o What TTL should the synthesized AAAA record have? [BIND 9.2.0snap
  uses TTL=0]

o Which nameserver in the recursive query chain should synthesize the
  AAAA record? Is synthesis mandatory for every DNS server
  implementation?

o What should we do if the A6 reassembly takes too much time?

o What should we do about DNSSEC signatures?

o What if the resolver wants no synthesis? Do we want a flag bit in the
  DNS packet to enable/disable AAAA synthesis?

o What are the relationships between A6 TTL, AAAA TTL, A6 query
  timeouts, AAAA query timeouts, and other timeout parameters?

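To make these open questions concrete, here is a minimal Python sketch of a server-side AAAA synthesis step working on a local table of A6 records with TTLs. Every policy in it is an assumption chosen for illustration, not an answer from this draft or from [Austein, 2001]: the synthesized TTL is the smallest TTL on the chain (BIND 9.2.0snap reportedly uses TTL=0 instead), any reassembly failure becomes SERVFAIL, and the chain is capped at a hypothetical MAX_FRAGMENTS. DNSSEC, timeouts and network fetches are not modeled.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One A6 record with its TTL: (ttl, prefix length, suffix, prefix name or None).
TimedA6 = Tuple[int, int, str, Optional[str]]
ALL_ONES = (1 << 128) - 1
MAX_FRAGMENTS = 16  # hypothetical chain cap


@dataclass
class Answer:
    rcode: str  # "NOERROR", or "SERVFAIL" when reassembly fails (assumed policy)
    records: List[Tuple[str, int, str]] = field(default_factory=list)  # (owner, TTL, AAAA)


def reassemble_with_ttl(zone: Dict[str, List[TimedA6]], name: str,
                        depth: int = 1) -> Dict[int, int]:
    """Map each reassembled address to the smallest TTL along its chain."""
    if depth >= MAX_FRAGMENTS:
        raise ValueError("A6 chain too long")
    result: Dict[int, int] = {}
    for ttl, prefix_len, suffix, prefix_name in zone[name]:  # KeyError if absent
        low_mask = ALL_ONES >> prefix_len
        low_bits = int(ipaddress.IPv6Address(suffix)) & low_mask
        if prefix_len == 0:
            pairs = [(low_bits, ttl)]
        else:
            pairs = [((addr & ~low_mask & ALL_ONES) | low_bits, min(ttl, sub_ttl))
                     for addr, sub_ttl in
                     reassemble_with_ttl(zone, prefix_name, depth + 1).items()]
        for addr, addr_ttl in pairs:
            result[addr] = min(addr_ttl, result.get(addr, addr_ttl))
    return result


def synthesize_aaaa(zone: Dict[str, List[TimedA6]], qname: str) -> Answer:
    """Answer an AAAA query by reassembling the A6 chain for `qname`."""
    try:
        reassembled = reassemble_with_ttl(zone, qname)
    except (KeyError, ValueError):  # missing fragment, bad data, or chain too long
        return Answer("SERVFAIL")
    records = [(qname, ttl, str(ipaddress.IPv6Address(addr)))
               for addr, ttl in sorted(reassembled.items())]
    return Answer("NOERROR", records)


EXAMPLE_ZONE = {
    "N.X.EXAMPLE.": [(3600, 64, "::1234:5678:9ABC:DEF0", "SUBNET-1.IP6.X.EXAMPLE.")],
    "SUBNET-1.IP6.X.EXAMPLE.": [(300, 0, "2345:00C1:CA11:1::", None)],
}
BROKEN_ZONE = {
    "N.X.EXAMPLE.": [(3600, 64, "::1234:5678:9ABC:DEF0", "MISSING.X.EXAMPLE.")],
}

if __name__ == "__main__":
    print(synthesize_aaaa(EXAMPLE_ZONE, "N.X.EXAMPLE."))
    print(synthesize_aaaa(BROKEN_ZONE, "N.X.EXAMPLE."))
```

Note that the server, not the querier, does all the chain chasing here; that is exactly the property the next paragraph identifies as a DoS exposure.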
The approach seems to be vulnerable to DoS attacks, because the
nameserver reassembles A6 fragments on behalf of the AAAA querier. See
the security considerations section for more details.

3.5. Issues in keeping both AAAA and A6

If we are to keep both AAAA and A6 records in the worldwide DNS
database, it would impose more query delays on client resolvers.
Suppose we have a dual-stack host implementation. If it needs to
resolve a name into addresses, the node would need to query in the
following order (the order RFC2874 suggests):

o Query A6 records, and get full IPv6 addresses by chasing and
  reassembling the A6 fragment chain.

o Query AAAA records.

o Query A records.

o Sort the results based on a destination address ordering rule. An
  example of such a rule is presented in a draft [Draves, 2001].

o Contact the destination addresses in sequence.

This ordering imposes additional delays on the resolvers. The above
ordering would be necessary for all approaches that use A6, as there are
existing AAAA records in the world.

4. Network renumbering

Some say that renumbering will be more frequent in IPv6 network
operation, and that A6 is necessary to reduce the zone modification cost
as well as the zone signing cost of a renumbering operation.

It is not clear if we really want to renumber that frequently. With
IPv6, it should be easier for ISPs to assign addresses statically to
downstream customers, rather than dynamically as we do with IPv4 dialup
connectivity today. If ISPs do assign static IPv6 address blocks to
customers, there is no need to renumber customer networks that
frequently (unless the customer decides to switch upstream ISPs that
often). NOTE: Roaming dialup users, like those who carry laptop
computers worldwide, seem to have a different issue from stationary
dialup users. See [Hagino, 2000] for more discussion.

It is questionable whether it is possible to renumber IPv6 networks more
frequently than IPv4 networks. The router renumbering protocol
[Crawford, 2000], IPv6 host autoconfiguration and IPv6 address lifetime
[Thomson, 1998] can help us renumber an IPv6 network; however, network
renumbering itself is not an easy task. To maintain reachability from
the outside world, a site administrator needs to coordinate the site
renumbering carefully. The minimal interval between renumberings is
restricted by DNS record timeouts, as DNS records will be cached around
the world. If the TTL of the DNS records is X, the interval between
renumberings must be longer than 2 * X. If we consider clients/servers
that try to validate addresses using reverse lookups, we also need to
consider the relationship between the IPv6 address lifetime [Thomson,
1998] and the interval between renumberings. At the IETF50 ipngwg
session, there was a presentation by JINMEI Tatsuya on a site
renumbering experiment. It is recommended to read through the IETF49
minutes and slides. [XXX Fred Baker had a draft on this - where?] For
network renumbering to be successful, no configuration files should
contain hardcoded (numeric) IP addresses. This is a very hard
requirement to meet. We fail to satisfy it in many network renumbering
events, and the failure causes a lot of trouble.

At this moment there is no mechanism defined for ISPs to renumber
downstream customers at will. Even though it may sound interesting for
ISPs, doing so would cause a lot of (social and political) issues, so
the author would say it is rather unrealistic to pursue this route. The
only possible candidate, the router renumbering protocol [Crawford,
2000], does not really fit the situation. The protocol is defined using
IPsec authentication over site-local multicast packets. It would be
cumbersome to run the router renumbering protocol across multiple
administrative domains, as (1) customers will not want to share the
IPsec authentication keys for their routers with the upstream ISP, and
(2) the customer network will be administered as a separate site from
the upstream ISP. (Even though the router renumbering protocol could be
used with unicast addresses, it is not realistic to assume that we can
maintain the list of IPv6 addresses for all the routers in both the
customers' and the ISPs' networks.)

A6 was designed to help administrators update zone files during network
renumbering events. But even with AAAA, zone file modification itself
is easy; you can just query-replace the addresses in the zone files.
The difficulty of network renumbering comes from elsewhere.

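As an illustration of the query-replace step, here is a minimal Python sketch that moves AAAA addresses from one prefix to another in zone-file text while keeping the lower bits. It assumes records of the simple ``owner AAAA address'' form used in this draft and prefixes of equal length; the new prefix 2345:00C9:CA99::/48 is hypothetical. It deliberately does nothing about DNSSEC re-signing, reverse zones or hardcoded addresses in configuration files, which is where the real difficulty lies.

```python
import ipaddress
import re

# Matches "AAAA <address>" in a zone file line; IPv4-embedded forms are ignored.
AAAA_RE = re.compile(r"(\bAAAA\s+)([0-9A-Fa-f:]+)")


def renumber_aaaa(zone_text: str, old_prefix: str, new_prefix: str) -> str:
    """Move AAAA addresses inside old_prefix into new_prefix.

    Only the prefix bits change; the remaining (host) bits are kept.
    Both prefixes must have the same length.
    """
    old_net = ipaddress.IPv6Network(old_prefix)
    new_net = ipaddress.IPv6Network(new_prefix)
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("old and new prefixes must have the same length")
    host_mask = int(old_net.hostmask)

    def rewrite(match):
        addr = ipaddress.IPv6Address(match.group(2))
        if addr not in old_net:
            return match.group(0)  # some other prefix: leave untouched
        moved = int(new_net.network_address) | (int(addr) & host_mask)
        return match.group(1) + ipaddress.IPv6Address(moved).exploded.upper()

    return AAAA_RE.sub(rewrite, zone_text)


ZONE_TEXT = """$ORIGIN X.EXAMPLE.
N AAAA 2345:00C1:CA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:00D2:DA11:0001:1234:5678:9ABC:DEF0
N AAAA 2345:000E:EB22:0001:1234:5678:9ABC:DEF0
"""

if __name__ == "__main__":
    # Hypothetical renumbering of the 2345:00C1:CA11::/48 site into a new /48.
    print(renumber_aaaa(ZONE_TEXT, "2345:00C1:CA11::/48", "2345:00C9:CA99::/48"))
```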
With AAAA, we need to sign the whole zone again with DNSSEC keys after
renumbering the network. With A6, only the records holding the upper
bits need to be signed again. Therefore, A6 will impose a lower zone
signing cost in the event of network renumbering. As seen above, it is
questionable whether we will renumber networks that often, so it is
questionable whether A6 brings us an overall benefit. Note, however,
that even if we use A6 to facilitate more frequent renumbering and lower
signing cost, all glue records have to be installed as non-fragmented
A6 records (``A6 0''), and must be signed again on renumbering events.

5. Security consideration

There are a couple of security worries mentioned above. To give a brief
summary:

o There will be a higher delay imposed by query/reply round trips for
  fragmented A6 records. This could affect every service that relies
  upon DNS records.

o There is no upper limit defined for the number of A6 fragments that
  make up an IPv6 address. Malicious parties may set up very complex A6
  chains and confuse nameservers worldwide.

o An A6 resolver/nameserver is much harder to implement correctly than
  an AAAA resolver/nameserver. A6 fragment reassembly code needs to take
  care of bitwise data reassembly, bitwise overlap checks, and more.
  From our implementation experience, we expect to see a lot of
  security-related bugs in the future.

o The interaction between DNS record TTL and DNS query delays leads to
  non-trivial timeout problems.

We would like to go into more detail on some of these.

5.1. DoS attacks against AAAA synthesis

When a DNS server is configured for AAAA synthesis, malicious parties
can mount DoS attacks using the interaction between DNS TTL and query
delays. The attack can chew up CPU time and/or memory, as well as some
network bandwidth, on a victim nameserver, by the following steps:

o A bad guy configures a record with a very complex A6 chain on some
  nameservers (the bad guy has to have control over those servers).
  The nameservers can be located anywhere in the world. The A6 chain
  should have a very low TTL (like 1 or 0 seconds). The attack works
  better if there are higher delays between the victim nameserver and
  the nameservers that serve the A6 fragments.

o The bad guy queries the victim nameserver for the record using an
  AAAA request.

o The victim nameserver will try to reassemble the A6 fragments. During
  the reassembly process, the victim nameserver puts A6 fragments into
  its local cache. The cached records will expire during the reassembly
  process, so the nameserver will need to query a lot of A6 fragments
  (more traffic). The server can go into an infinite loop if it tries
  to query the expired A6 fragments again.

Note, however, that this problem could be considered a problem with
recursive resolvers in general (as with CNAME and NS chasing); A6 and
AAAA synthesis make the problem more apparent, and more complex to
diagnose.

To remedy this problem, we have a couple of solutions:

(1) Deprecate A6 and deploy AAAA worldwide. If we do not have A6, the
    problem goes away.

(2) Even if we use A6, do not configure nameservers for AAAA synthesis.
    Deployment issues with existing IPv6 hosts then get much harder.

(3) Impose a protocol limit on the number of A6 fragments.

(4) Do not query the expired records in an A6 chain again. In other
    words, implement resolvers that ignore TTL on DNS records. It is not
    clear if this is the right thing to do.

6. Conclusion

NOTE: this section expresses the impressions of the author.

The A6/AAAA discussion has been an obstacle to IPv6 deployment, as the
deployment of IPv6 NS records has been deferred because of the
discussion. The author does not see a benefit in keeping both AAAA and
A6 records, as doing so imposes more query delays on the clients. So
the author believes that we need to pick one of them.

Given that frequent network renumbering is unlikely, the author believes
that A6's benefit of lower zone signing cost is not significant. The
benefit of A6 (in zone signing cost) is much less than the expected
complication that will be imposed by A6 operations.

From the above discussion, the author suggests keeping AAAA and
deprecating A6 (moving the A6 document to historic status). The author
believes that A6 can cause more problems than it brings benefits. A6
will make IPv6 DNS operation more complicated and more vulnerable to
attacks. AAAA has been proven to work right in our IPv6 network
operation since 1996. AAAA has been working just fine in existing IPv6
networks, and the author believes that it will continue to do so in the
coming days.

References

Thomson, 1995.
    S. Thomson and C. Huitema, "DNS Extensions to support IP version 6"
    in RFC1886 (December 1995). ftp://ftp.isi.edu/in-notes/rfc1886.txt.

Crawford, 2000.
    M. Crawford, C. Huitema, and S. Thomson, "DNS Extensions to Support
    IPv6 Address Aggregation and Renumbering" in RFC2874 (July 2000).
    ftp://ftp.isi.edu/in-notes/rfc2874.txt.

Austein, 2001.
    R. Austein, "Tradeoffs in DNS support for IPv6" in
    draft-ietf-dnsext-ipv6-dns-tradeoffs-00.txt (July 2001). Work in
    progress.

Draves, 2001.
    R. Draves, "Default Address Selection for IPv6" in
    draft-ietf-ipngwg-default-addr-select-04.txt (May 2001). Work in
    progress.

Hagino, 2000.
    J. Hagino and K. Yamamoto, "Requirements for IPv6 dialup PPP
    operation" in draft-itojun-ipv6-dialup-requirement-00.txt (July
    2000). Work in progress.

Crawford, 2000.
    M. Crawford, "Router Renumbering for IPv6" in RFC2894 (August 2000).
    ftp://ftp.isi.edu/in-notes/rfc2894.txt.

Thomson, 1998.
    S. Thomson and T. Narten, "IPv6 Stateless Address Autoconfiguration"
    in RFC2462 (December 1998). ftp://ftp.isi.edu/in-notes/rfc2462.txt.

Change history

none.

Acknowledgements

This draft was written based on discussions in the IETF IPv6 and dnsext
working groups, and with help from the WIDE research group.

Author's address

    Jun-ichiro itojun HAGINO
    Research Laboratory, Internet Initiative Japan Inc.
    Takebashi Yasuda Bldg.,
    3-13 Kanda Nishiki-cho,
    Chiyoda-ku, Tokyo 101-0054, JAPAN
    Tel: +81-3-5259-6350
    Fax: +81-3-5259-6351
    Email: itojun@iijlab.net

@ -1,394 +0,0 @@
INTERNET-DRAFT                                                Peter Koch
Expires: September 2001                           Universitaet Bielefeld
Updates: RFC 1035                                             March 2001

          A DNS RR Type for Lists of Address Prefixes (APL RR)
                    draft-ietf-dnsext-apl-rr-02.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Comments should be sent to the author or the DNSEXT WG mailing list
<namedroppers@OPS.IETF.ORG>.

Abstract

The Domain Name System is primarily used to translate domain names
into IPv4 addresses using A RRs. Several approaches exist to describe
networks or address ranges. This document specifies a new DNS RR type
"APL" for address prefix lists.

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

Domain names herein are for explanatory purposes only and should not
be expected to lead to useful information in real life [RFC2606].

Koch                    Expires September 2001                  [Page 1]

INTERNET-DRAFT                DNS APL RR                      March 2001

2. Background

The Domain Name System [RFC1034], [RFC1035] provides a mechanism to
associate addresses and other Internet infrastructure elements with
hierarchically built domain names. Various types of resource records
have been defined, especially those for IPv4 and IPv6 [RFC2874]
addresses. In [RFC1101] a method is described to publish information
about the address space allocated to an organisation. In older BIND
versions, a weak form of controlling access to zone data was
implemented using TXT RRs describing address ranges.

This document specifies a new RR type for address prefix lists.

3. APL RR Type

An APL record has the DNS type of "APL" [draft, IANA: not yet applied
for] and a numeric value of [draft, IANA:to be assigned]. The APL RR
is defined in the IN class only. APL RRs cause no additional section
processing.

4. APL RDATA format

The RDATA section consists of zero or more items (<apitem>) of the
form

   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |                         ADDRESSFAMILY                         |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |            PREFIX             | N |         AFDLENGTH         |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   /                            AFDPART                            /
   |                                                               |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   ADDRESSFAMILY   16 bit unsigned value as assigned by IANA
                   (see IANA Considerations)
   PREFIX          8 bit unsigned binary coded prefix length.
                   Upper and lower bounds and interpretation of
                   this value are address family specific.
   N               negation flag, indicates the presence of the
                   "!" character in the textual format. It has
                   the value "1" if the "!" was given, "0" otherwise.
   AFDLENGTH       length in octets of the following address
                   family dependent part (7 bit unsigned).
   AFDPART         address family dependent part. See below.

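The layout above can be exercised with a short parser. This is a minimal Python sketch, not taken from BIND or any other implementation; the names `APItem` and `parse_apl_rdata` are hypothetical, and it assumes that N is the high-order bit of the octet it shares with the 7 bit AFDLENGTH, as the diagram shows. It leaves AFDPART uninterpreted, since that part is address family specific.

```python
from typing import List, NamedTuple


class APItem(NamedTuple):
    """One <apitem> as it appears on the wire (hypothetical container)."""
    family: int     # ADDRESSFAMILY, 16 bit unsigned
    prefix: int     # PREFIX, 8 bit unsigned
    negated: bool   # N flag ("!" in the textual format)
    afdpart: bytes  # AFDPART, exactly AFDLENGTH octets


def parse_apl_rdata(rdata: bytes) -> List[APItem]:
    """Split APL RDATA into its <apitem>s, keeping their wire order."""
    items: List[APItem] = []
    pos = 0
    while pos < len(rdata):  # empty RDATA yields an empty list
        if len(rdata) - pos < 4:
            raise ValueError("truncated <apitem> header")
        family = int.from_bytes(rdata[pos:pos + 2], "big")
        prefix = rdata[pos + 2]
        negated = bool(rdata[pos + 3] & 0x80)  # N: high-order bit
        afdlength = rdata[pos + 3] & 0x7F      # AFDLENGTH: low 7 bits
        start, end = pos + 4, pos + 4 + afdlength
        if end > len(rdata):
            raise ValueError("AFDPART extends past the end of RDATA")
        items.append(APItem(family, prefix, negated, rdata[start:end]))
        pos = end
    return items


# Wire form of "1:192.168.32.0/21 !1:192.168.38.0/28" without trailing zero octets.
example = bytes([0, 1, 21, 3, 192, 168, 32, 0, 1, 28, 0x83, 192, 168, 38])
for item in parse_apl_rdata(example):
    print(item.family, item.prefix, item.negated, list(item.afdpart))
# prints "1 21 False [192, 168, 32]" then "1 28 True [192, 168, 38]"
```

Because every <apitem> carries its own AFDLENGTH, a parser can step over items of address families it does not understand without losing its place in the RDATA.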
This document defines the AFDPARTs for address families 1 (IPv4) and
2 (IPv6). Future revisions may deal with additional address
families.

4.1. AFDPART for IPv4

The encoding of an IPv4 address (address family 1) follows the
encoding specified for the A RR by [RFC1035], section 3.4.1.

PREFIX specifies the number of bits of the IPv4 address starting at
the most significant bit. Legal values range from 0 to 32.

Trailing zero octets in an address prefix carry no information (e.g.,
there is no semantic difference between 10.0.0.0/16 and 10/16), so
the shortest possible AFDLENGTH can be used to encode it. However,
for DNSSEC [RFC2535] a single wire encoding must be used by all
parties. Therefore the sender MUST NOT include trailing zero octets
in the AFDPART regardless of the value of PREFIX. This includes cases
in which AFDLENGTH times 8 results in a value less than PREFIX. The
AFDPART is padded with zero bits to match a full octet boundary.

An IPv4 AFDPART has a variable length of 0 to 4 octets.

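The sender-side rule for IPv4 can be sketched in a few lines of Python using the standard `ipaddress` module. This is a minimal illustration; `encode_ipv4_apitem` is a hypothetical helper, not an API of any DNS library. It strips trailing zero octets whatever PREFIX says, and it does not clear bits beyond PREFIX, a case this section does not address.

```python
import ipaddress


def encode_ipv4_apitem(address: str, prefix: int, negated: bool = False) -> bytes:
    """Encode one IPv4 <apitem> (address family 1) in its single canonical form."""
    if not 0 <= prefix <= 32:
        raise ValueError("IPv4 PREFIX must be in the range 0..32")
    # 4 octets in network order, minus the trailing zero octets that MUST NOT be sent.
    afdpart = ipaddress.IPv4Address(address).packed.rstrip(b"\x00")
    n_and_length = (0x80 if negated else 0x00) | len(afdpart)
    return (1).to_bytes(2, "big") + bytes([prefix, n_and_length]) + afdpart


# 10.0.0.0/16 is sent with a one-octet AFDPART (0x0a).
print(encode_ipv4_apitem("10.0.0.0", 16).hex())  # prints 000110010a
```

Calling it as `encode_ipv4_apitem("10.0.0.0", 32)` also yields a one-octet AFDPART, which is the "AFDLENGTH times 8 less than PREFIX" case the text explicitly allows.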
4.2. AFDPART for IPv6

The 128 bit IPv6 address (address family 2) is encoded in network
byte order (high-order byte first).

PREFIX specifies the number of bits of the IPv6 address starting at
the most significant bit. Legal values range from 0 to 128.

For the same reason as in 4.1 above, the sender MUST NOT include
trailing zero octets in the AFDPART regardless of the value of
PREFIX. This includes cases in which AFDLENGTH times 8 results in a
value less than PREFIX. The AFDPART is padded with zero bits to
match a full octet boundary.

An IPv6 AFDPART has a variable length of 0 to 16 octets.

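On the receiving side, the full 128 bit address is recovered by padding the AFDPART with zero octets. The following minimal Python sketch uses the standard `ipaddress` module; both function names are hypothetical. With `strict=False`, `ipaddress` clears any bits set beyond PREFIX in the returned network instead of rejecting them; whether an application should accept such input is not settled by this section.

```python
import ipaddress


def ipv6_afdpart(address: str) -> bytes:
    """Shortest IPv6 AFDPART: the 16 address octets minus trailing zero octets."""
    return ipaddress.IPv6Address(address).packed.rstrip(b"\x00")


def ipv6_prefix_from_afdpart(afdpart: bytes, prefix: int) -> ipaddress.IPv6Network:
    """Rebuild the IPv6 prefix from a received AFDPART and PREFIX."""
    if len(afdpart) > 16:
        raise ValueError("an IPv6 AFDPART is at most 16 octets")
    if not 0 <= prefix <= 128:
        raise ValueError("IPv6 PREFIX must be in the range 0..128")
    padded = afdpart.ljust(16, b"\x00")  # restore the omitted trailing zero octets
    # strict=False: bits set beyond PREFIX are cleared instead of raising ValueError.
    return ipaddress.IPv6Network((padded, prefix), strict=False)


# FF00:0:0:0:0:0:0:0/8 needs a single AFDPART octet.
part = ipv6_afdpart("FF00:0:0:0:0:0:0:0")
print(len(part), ipv6_prefix_from_afdpart(part, 8))  # prints 1 ff00::/8
```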
5. Zone File Syntax

The textual representation of an APL RR in a DNS zone file is as
follows:

   <owner>   IN   <TTL>   APL   {[!]afi:address/prefix}*

The data consists of zero or more strings of the address family
indicator <afi>, immediately followed by a colon ":", an address,
immediately followed by the "/" character, immediately followed by a
decimal numeric value for the prefix length. Any such string may be
preceded by a "!" character. The strings are separated by whitespace.
The <afi> is the decimal numeric value of that particular address
family.

5.1. Textual Representation of IPv4 Addresses

An IPv4 address in the <address> part of an <apitem> is in dotted
quad notation, just as in an A RR. The <prefix> has values from the
interval 0..32 (decimal).

5.2. Textual Representation of IPv6 Addresses

The representation of an IPv6 address in the <address> part of an
<apitem> follows [RFC2373], section 2.2. Legal values for <prefix>
are from the interval 0..128 (decimal).

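The textual rules of sections 5 to 5.2 map onto a small tokenizer. This is a minimal Python sketch with hypothetical names. It assumes that the master-file parentheses used for multi-line records have already been removed, and it rejects address families other than 1 and 2, although an application specification might instead choose to skip them.

```python
import ipaddress
import re
from typing import List, Tuple

# [!]afi:address/prefix, where afi and prefix are decimal numbers.
APITEM_TEXT = re.compile(r"(!?)(\d+):([^/\s]+)/(\d+)")
ADDRESS_PARSERS = {1: ipaddress.IPv4Address, 2: ipaddress.IPv6Address}
MAX_PREFIX = {1: 32, 2: 128}


def parse_apl_text(rdata_text: str) -> List[Tuple[bool, int, str, int]]:
    """Return (negated, afi, address, prefix) tuples in their original order."""
    items = []
    for token in rdata_text.split():  # <apitem> strings are whitespace separated
        match = APITEM_TEXT.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed <apitem>: {token!r}")
        bang, afi_text, address, prefix_text = match.groups()
        afi, prefix = int(afi_text), int(prefix_text)
        if afi not in ADDRESS_PARSERS:
            raise ValueError(f"address family {afi} is not handled by this sketch")
        ADDRESS_PARSERS[afi](address)  # raises ValueError for a malformed address
        if prefix > MAX_PREFIX[afi]:
            raise ValueError(f"prefix /{prefix} is out of range for family {afi}")
        items.append((bang == "!", afi, address, prefix))
    return items


print(parse_apl_text("1:192.168.32.0/21 !1:192.168.38.0/28"))
# prints [(False, 1, '192.168.32.0', 21), (True, 1, '192.168.38.0', 28)]
```

Returning a list rather than a set keeps duplicates and order intact, which matches the requirement that <apitems> are neither merged nor rearranged.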
6. APL RR usage

An APL RR with empty RDATA is valid and implements an empty list.
Multiple occurrences of the same <apitem> in a single APL RR are
allowed and MUST NOT be merged by a DNS server or resolver. <apitems>
MUST be kept in order and MUST NOT be rearranged or aggregated.

A single APL RR may contain <apitems> belonging to different address
families. The number of <apitems> is limited only by the available
RDATA space.

RRsets consisting of more than one APL RR are legal but the
interpretation is left to the particular application.

7. Applicability Statement

The APL RR defines a framework without specifying any particular
meaning for the list of prefixes. It is expected that APL RRs will
be used in different application scenarios which have to be
documented separately. Those scenarios may be distinguished by
characteristic prefixes placed in front of the DNS owner name.

An APL application specification MUST include information on

   o the characteristic prefix, if any

   o how to interpret APL RRsets consisting of more than one RR

   o how to interpret an empty APL RR

   o which address families are expected to appear in the APL RRs for
     that application

   o how to deal with APL RR list elements which belong to other
     address families, including those not yet defined

   o the exact semantics of list elements negated by the "!" character

Possible applications include the publication of address ranges
similar to [RFC1101], description of zones built following [RFC2317]
and in-band access control to limit general access or zone transfer
(AXFR) availability for zone data held in DNS servers.

The specification of particular application scenarios is out of the
scope of this document.

8. Examples

The following examples only illustrate some of the possible usages
outlined in the previous section. None of those applications are
hereby specified nor is it implied that any particular APL RR based
application does exist now or will exist in the future.

   ; RFC 1101-like announcement of address ranges for foo.example
   foo.example.  IN APL 1:192.168.32.0/21 !1:192.168.38.0/28

   ; CIDR blocks covered by classless delegation
   42.168.192.IN-ADDR.ARPA.  IN APL ( 1:192.168.42.0/26 1:192.168.42.64/26
                                      1:192.168.42.128/25 )

   ; Zone transfer restriction
   _axfr.sbo.example.  IN APL 1:127.0.0.1/32 1:172.16.64.0/22

   ; List of address ranges for multicast
   multicast.example.  IN APL 1:224.0.0.0/4 2:FF00:0:0:0:0:0:0:0/8

Note that because trailing zero octets are not encoded, the AFDLENGTH
of both <apitems> in the first APL RR is three.

9. Security Considerations

Any information obtained from the DNS should be regarded as unsafe
unless techniques specified in [RFC2535] or [RFC2845] were used. The
definition of a new RR type does not introduce security problems into
the DNS, but usage of information made available by APL RRs may
compromise security. This includes disclosure of network topology
information and in particular the use of APL RRs to construct access
control lists.

10. IANA Considerations

This section is to be interpreted as following [RFC2434].

This document does not define any new namespaces. It uses the 16 bit
identifiers for address families maintained by IANA in
http://www.iana.org/numbers.html.

IANA is asked to assign a numeric RR type value for APL.

11. Acknowledgements

The author would like to thank Mark Andrews, Olafur Gudmundsson, Ed
Lewis, Thomas Narten, Erik Nordmark, and Paul Vixie for their review
and constructive comments.

12. References

[RFC1034]  Mockapetris, P., "Domain Names - Concepts and Facilities",
           RFC 1034, STD 13, November 1987.

[RFC1035]  Mockapetris, P., "Domain Names - Implementation and
           Specification", RFC 1035, STD 13, November 1987.

[RFC1101]  Mockapetris, P., "DNS Encoding of Network Names and Other
           Types", RFC 1101, April 1989.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", RFC 2119, BCP 14, March 1997.

[RFC2181]  Elz, R., Bush, R., "Clarifications to the DNS
           Specification", RFC 2181, July 1997.

[RFC2317]  Eidnes, H., de Groot, G., Vixie, P., "Classless IN-ADDR.ARPA
           delegation", RFC 2317, March 1998.

[RFC2373]  Hinden, R., Deering, S., "IP Version 6 Addressing
           Architecture", RFC 2373, July 1998.

[RFC2434]  Narten, T., Alvestrand, H., "Guidelines for Writing an IANA
           Considerations Section in RFCs", RFC 2434, BCP 26, October
           1998.

[RFC2535]  Eastlake, D., "Domain Name System Security Extensions",
           RFC 2535, March 1999.

[RFC2606]  Eastlake, D., Panitz, A., "Reserved Top Level DNS Names",
           RFC 2606, BCP 32, June 1999.

[RFC2845]  Vixie, P., Gudmundsson, O., Eastlake, D., Wellington, B.,
           "Secret Key Transaction Authentication for DNS (TSIG)",
           RFC 2845, May 2000.

[RFC2874]  Crawford, M., Huitema, C., "DNS Extensions to Support IPv6
           Address Aggregation and Renumbering", RFC 2874, July 2000.

13. Author's Address

Peter Koch
Universitaet Bielefeld
Technische Fakultaet
D-33594 Bielefeld
Germany
+49 521 106 2902
<pk@TechFak.Uni-Bielefeld.DE>

@ -1,992 +0,0 @@
DNSEXT Working Group                                  Olafur Gudmundsson
INTERNET-DRAFT                                                 June 2003
<draft-ietf-dnsext-delegation-signer-15.txt>

Updates: RFC 1035, RFC 2535, RFC 3008, RFC 3090.

                   Delegation Signer Resource Record

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This draft expires on January 19, 2004.

Copyright Notice

Copyright (C) The Internet Society (2003). All rights reserved.

Abstract

The delegation signer (DS) resource record is inserted at a zone cut
(i.e., a delegation point) to indicate that the delegated zone is
digitally signed and that the delegated zone recognizes the indicated
key as a valid zone key for the delegated zone. The DS RR is a
modification to the DNS Security Extensions definition, motivated by
operational considerations. The intent is to use this resource record
as an explicit statement about the delegation, rather than relying on
inference.

Gudmundsson              Expires January 2004                   [Page 1]

INTERNET-DRAFT         Delegation Signer Record                June 2003

This document defines the DS RR, gives examples of how it is used and
describes the implications for resolvers. This change is not backwards
compatible with RFC 2535.
This document updates RFC1035, RFC2535, RFC3008 and RFC3090.

Table of contents

Status of this Memo . . . . . . . . . . . . . . . . . . . . . . . . 1
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Table of contents . . . . . . . . . . . . . . . . . . . . . . . . . 2
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Reserved Words . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Specification of the Delegation key Signer . . . . . . . . . . . . 4
2.1 Delegation Signer Record Model . . . . . . . . . . . . . . . . . 4
2.2 Protocol Change . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 RFC2535 2.3.4 and 3.4: Special Considerations at
      Delegation Points . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1.1 Special processing for DS queries . . . . . . . . . . . . . . 6
2.2.1.2 Special processing when child and an ancestor share
        server . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1.3 Modification on use of KEY RR in the construction of
        Responses . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Signer's Name (replaces RFC3008 section 2.7) . . . . . . . . . 9
2.2.3 Changes to RFC3090 . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3.1 RFC3090: Updates to section 1: Introduction . . . . . . . . . 9
2.2.3.2 RFC3090 section 2.1: Globally Secured . . . . . . . . . . . . 9
2.2.3.3 RFC3090 section 3: Experimental Status . . . . . . . . . . . 10
2.2.4 NULL KEY elimination . . . . . . . . . . . . . . . . . . . . . 10
2.3 Comments on Protocol Changes . . . . . . . . . . . . . . . . . . 10
2.4 Wire Format of the DS record . . . . . . . . . . . . . . . . . . 11
2.4.1 Justifications for Fields . . . . . . . . . . . . . . . . . . . 12
2.5 Presentation Format of the DS Record . . . . . . . . . . . . . . 12
2.6 Transition Issues for Installed Base . . . . . . . . . . . . . . 12
2.6.1 Backwards compatibility with RFC2535 and RFC1035 . . . . . . . 12
2.7 KEY and corresponding DS record example . . . . . . . . . . . . . 13
3 Resolver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1 DS Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Resolver Cost Estimates for DS Records . . . . . . . . . . . . . 15
4 Security Considerations . . . . . . . . . . . . . . . . . . . . . . 15
5 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . . . 16
6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Normative References . . . . . . . . . . . . . . . . . . . . . . . . 16
Informational References . . . . . . . . . . . . . . . . . . . . . . 17
Author Address . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Full Copyright Statement . . . . . . . . . . . . . . . . . . . . . . 17

1 Introduction

Familiarity with the DNS system [RFC1035], DNS security extensions
[RFC2535] and DNSSEC terminology [RFC3090] is important.

Experience shows that when the same data can reside in two
administratively different DNS zones, the data frequently gets out of
sync. The presence of an NS RRset in a zone anywhere other than at
the apex indicates a zone cut or delegation. The RDATA of the NS
RRset specifies the authoritative servers for the delegated or
"child" zone. Based on actual measurements, 10-30% of all delegations
on the Internet have differing NS RRsets at parent and child. There
are a number of reasons for this, including a lack of communication
between parent and child and bogus name servers being listed to meet
registry requirements.

DNSSEC [RFC2535,RFC3008,RFC3090] specifies that a child zone needs to
have its KEY RRset signed by its parent to create a verifiable chain
of KEYs. There has been some debate on where the signed KEY RRset
should reside, whether at the child [RFC2535] or at the parent. If
the KEY RRset resides at the child, maintaining the signed KEY RRset
in the child requires frequent two-way communication between the two
parties. First the child transmits the KEY RRset to the parent and
then the parent sends the signature(s) to the child. Storing the KEY
RRset at the parent was thought to simplify the communication.

DNSSEC [RFC2535] requires that the parent store a NULL KEY record for
an unsecure child zone to indicate that the child is unsecure. A NULL
KEY record is a waste: an entire signed RRset is used to communicate
effectively one bit of information--that the child is unsecure.
Chasing down NULL KEY RRsets complicates the resolution process in
many cases, because servers for both parent and child need to be
queried for the KEY RRset if the child server does not return it.
Storing the KEY RRset only in the parent zone simplifies this and
would allow the elimination of the NULL KEY RRsets entirely. For
large delegation zones the cost of NULL keys is a significant barrier
to deployment.

Prior to the restrictions imposed by RFC 3445 [RFC3445], another
implication of the DNSSEC key model is that the KEY record could be
used to store public keys for other protocols in addition to DNSSEC
keys. There are a number of potential problems with this, including:

   1. The KEY RRset can become quite large if many applications and
      protocols store their keys at the zone apex. Possible protocols
      are IPsec, HTTP, SMTP, SSH and others that use public key
      cryptography.
   2. The KEY RRset may require frequent updates.
   3. The probability of compromised or lost keys, which trigger
      emergency key rollover procedures, increases.
   4. The parent may refuse to sign KEY RRsets with non-DNSSEC zone
      keys.
   5. The parent may not meet the child's expectations of turnaround
      time for resigning the KEY RRset.

Given these reasons, SIG@parent isn't any better than SIG/KEY@Child.

1.2 Reserved Words

The key words "MAY", "MAY NOT", "MUST", "MUST NOT", "REQUIRED",
"RECOMMENDED", "SHOULD", and "SHOULD NOT" in this document are to be
interpreted as described in RFC2119.

2 Specification of the Delegation key Signer

This section defines the Delegation Signer (DS) RR type (type code
TBD) and the changes to DNS to accommodate it.

2.1 Delegation Signer Record Model

This document presents a replacement for the DNSSEC KEY record chain
of trust [RFC2535] that uses a new RR that resides only at the
parent. This record identifies the key(s) that the child uses to
self-sign its own KEY RRset.

Even though DS identifies two roles for KEYs, Key Signing Key (KSK)
and Zone Signing Key (ZSK), there is no requirement that a zone use
two different keys for these roles. It is expected that many small
zones will only use one key, while larger zones will be more likely
to use multiple keys.

The chain of trust is now established by verifying the parent KEY
RRset, the DS RRset from the parent and the KEY RRset at the child.
This is cryptographically equivalent to using just KEY records.

Communication between the parent and child is greatly reduced, since
the child only needs to notify the parent about changes in keys that
sign its apex KEY RRset. The parent is ignorant of all other keys in
the child's apex KEY RRset. Furthermore, the child maintains full
control over the apex KEY RRset and its content. The child can
maintain any policies regarding its KEY usage for DNSSEC with minimal
impact on the parent. Thus if the child wants to have frequent key
rollover for its DNS zone keys, the parent does not need to be aware
of it. The child can use one key to sign only its apex KEY RRset and
a different key to sign the other RRsets in the zone.

This model fits well with a slow roll out of DNSSEC and the islands
of security model. In this model, someone who trusts "good.example."
can preconfigure a key from "good.example." as a trusted key, and
from then on trusts any data signed by that key or that has a chain
of trust to that key. If "example." starts advertising DS records,
"good.example." does not have to change operations by suspending
self-signing. DS records can be used in configuration files to
identify trusted keys instead of KEY records. Another significant
advantage is that the amount of information stored in large
delegation zones is reduced: rather than the NULL KEY record at every
unsecure delegation demanded by RFC 2535, only secure delegations
require additional information in the form of a signed DS RRset.

The main disadvantage of this approach is that verifying a zone's KEY
RRset requires two signature verification operations instead of the
one required by the RFC 2535 chain of trust. There is no impact on the
number of signatures verified for other types of RRsets.

2.2 Protocol Change

All DNS servers and resolvers that support DS MUST support the OK bit
[RFC3225] and a larger message size [RFC3226]. In order for a
delegation to be considered secure the delegation MUST contain a DS
RRset. If a query contains the OK bit, a server returning a referral
for the delegation MUST include the following RRsets in the authority
section in this order:

   If DS RRset is present:
      parent's copy of child's NS RRset
      DS and SIG(DS)

   If no DS RRset is present:
      parent's copy of child's NS RRset
      parent's zone NXT and SIG(NXT)

This increases the size of referral messages, possibly causing some
or all glue to be omitted. If the DS or NXT RRsets with signatures do
not fit in the DNS message, the TC bit MUST be set. Additional
section processing is not changed.

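The ordering and truncation rules for a referral sent in reply to a query with the OK bit can be sketched as follows. This is a minimal Python illustration under stated assumptions: RRsets are reduced to hypothetical labels with precomputed wire sizes, `budget` is the space left after the header and question, glue in the additional section is not modeled, and both the behaviour without the OK bit and the choice of what to keep once TC is set are this sketch's own, not something the section specifies.

```python
from typing import Dict, List, Tuple


def referral_authority(sizes: Dict[str, int], budget: int,
                       ok_bit: bool) -> Tuple[List[str], bool]:
    """Return (authority section RRsets in order, TC bit) for one referral.

    `sizes` maps the RRsets the parent holds at the delegation point, using
    hypothetical labels "NS", "DS", "SIG(DS)", "NXT" and "SIG(NXT)", to their
    encoded size in octets.
    """
    section = ["NS"]                       # parent's copy of the child's NS RRset
    if not ok_bit:
        return section, False              # assumption: plain RFC 1035 referral
    if "DS" in sizes:
        extra = ["DS", "SIG(DS)"]          # secure delegation
    else:
        extra = ["NXT", "SIG(NXT)"]        # parent's NXT shows that no DS exists
    needed = sizes["NS"] + sum(sizes[name] for name in extra)
    if needed > budget:
        return section, True               # DS/NXT with SIGs do not fit: set TC
    return section + extra, False


secure = {"NS": 60, "DS": 40, "SIG(DS)": 120}
print(referral_authority(secure, 512, ok_bit=True))  # (['NS', 'DS', 'SIG(DS)'], False)
print(referral_authority(secure, 200, ok_bit=True))  # (['NS'], True)
```

The all-or-nothing treatment of DS or NXT together with its SIG mirrors the requirement that the TC bit be set whenever the signed RRsets do not fit, rather than sending them unsigned.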
A DS RRset accompanying an NS RRset indicates that the child zone is
secure. If an NS RRset exists without a DS RRset, the child zone is
unsecure (from the parent's point of view). DS RRsets MUST NOT appear
at non-delegation points or at a zone's apex.

Section 2.2.1 defines special considerations related to authoritative
servers responding to DS queries and replaces RFC2535 sections 2.3.4
and 3.4. Section 2.2.2 replaces RFC3008 section 2.7, and section
2.2.3 updates RFC3090.

2.2.1 RFC2535 2.3.4 and 3.4: Special Considerations at Delegation Points

DNS security views each zone as a unit of data completely under the
control of the zone owner with each entry (RRset) signed by a special
private key held by the zone manager. But the DNS protocol views the
leaf nodes in a zone that are also the apex nodes of a child zone
(i.e., delegation points) as "really" belonging to the child zone.
The corresponding domain names appear in two master files and might
have RRsets signed by both the parent and child zones' keys. A
retrieval could get a mixture of these RRsets and SIGs, especially
since one server could be serving both the zone above and below a
delegation point [RFC 2181].

Each DS RRset stored in the parent zone MUST be signed by at least
one of the parent zone's private keys. The parent zone MUST NOT
contain a KEY RRset at any delegation point. Delegations in the
parent MAY contain only the following RR types: NS, DS, NXT and SIG.
The NS RRset MUST NOT be signed. The NXT RRset is the exceptional
case: it will always appear differently and authoritatively in both
the parent and child zones if both are secure.

A secure zone MUST contain a self-signed KEY RRset at its apex. Upon
verifying the DS RRset from the parent, a resolver MAY trust any KEY
identified in the DS RRset as a valid signer of the child's apex KEY
RRset. Resolvers configured to trust one of the keys signing the KEY
RRset MAY now treat any data signed by the zone keys in the KEY RRset
as secure. In all other cases resolvers MUST consider the zone
unsecure. A DS RRset MUST NOT appear at a zone's apex.

An authoritative server queried for type DS MUST return the DS RRset
in the answer section.

2.2.1.1 Special processing for DS queries

When a server is authoritative for the parent zone at a delegation
point and receives a query for the DS record at that name, it MUST
answer based on data in the parent zone, returning either the DS
RRset or a negative answer. This is true whether or not it is also
authoritative for the child zone.

When the server is authoritative for the child zone at a delegation
point but not the parent zone, there is no natural response, since
the child zone is not authoritative for the DS record at the zone's
apex. As these queries are only expected to originate from recursive
servers which are not DS-aware, the authoritative server MUST answer
with:

   RCODE:             NOERROR
   AA bit:            set
   Answer Section:    Empty
   Authority Section: SOA [+ SIG(SOA) + NXT + SIG(NXT)]

That is, it answers as if it is authoritative and the DS record does
not exist. DS-aware recursive servers will query the parent zone at
delegation points, and so will not be affected by this.

A server that is authoritative for only the child zone and is also a
caching server MAY (if the RD bit is set in the query) perform
recursion to find the DS record at the delegation point, or MAY
return the DS record from its cache. In this case, the AA bit MUST
NOT be set in the response.

2.2.1.2 Special processing when child and an ancestor share server

Special rules are needed to permit DS-aware servers to gracefully
interact with older caches, which otherwise might falsely label a
server as lame because of the placement of the DS RRset.

Such a situation might arise when a server is authoritative for both
a zone and its grandparent, but not the parent. This sounds like an
obscure example, but it is very real. The root zone is currently
served on 13 machines, and "root-servers.net." is served on 4 of the
same 13, but "net." is served elsewhere.

When a server receives a query for (<QNAME>, DS, <QCLASS>), the
response MUST be determined by applying these rules in order:

1) If the server is authoritative for the zone that holds the DS RR
set (i.e., the zone that delegates <QNAME>, aka the "parent" zone),
the response contains the DS RR set as an authoritative answer.

2) If the server is offering recursive service and the RD bit is set
in the query, the server performs the query itself (according to the
rules for resolvers described below) and returns its findings.

3) If the server is authoritative for the zone that holds the
<QNAME>'s SOA RR set, the response is an authoritative negative
answer as described in section 2.2.1.1.

4) If the server is authoritative for a zone or zones above the
QNAME, a referral to the servers of the closest enclosing zone is
made.

5) If the server is not authoritative for any part of the QNAME, a
response indicating a lame server for QNAME is given.

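Because the rules are applied strictly in order, they can be read as a small decision procedure. The Python sketch below encodes that procedure. The ServerView fields and the returned strings are hypothetical names chosen for illustration. The sketch only decides which kind of response applies and does not build DNS messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerView:
    """Hypothetical summary of what a server knows when a DS query arrives."""
    auth_for_parent: bool          # serves the zone that delegates QNAME
    offers_recursion: bool         # offers recursive service
    rd_bit: bool                   # RD bit set in the query
    auth_for_qname_soa: bool       # serves the zone holding QNAME's SOA RRset
    enclosing_zone: Optional[str]  # closest zone above QNAME it serves, if any


def ds_response_kind(v: ServerView) -> str:
    """Apply rules 1-5 of section 2.2.1.2 in order; the first match wins."""
    if v.auth_for_parent:                       # rule 1
        return "authoritative DS answer"
    if v.offers_recursion and v.rd_bit:         # rule 2
        return "recursive lookup"
    if v.auth_for_qname_soa:                    # rule 3 (see 2.2.1.1)
        return "authoritative negative answer"
    if v.enclosing_zone is not None:            # rule 4
        return "referral to " + v.enclosing_zone
    return "lame"                               # rule 5
```

Consider the roots.example.net. scenario discussed next: a non-recursive server that is authoritative for the root zone and for roots.example.net. For that server, the procedure stops at rule 3, even though rule 4 would also match.
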
Using these rules will require some special processing on the part of
a DS-aware resolver, as the following example illustrates.

Assume a server is authoritative for roots.example.net. and for the
root zone, but not for the two intervening zones (or the intervening
two-label-deep zone). Assume further that QNAME=roots.example.net.,
QTYPE=DS, and QCLASS=IN.

The resolver will issue this request (assuming no cached data)
expecting a referral to a net. server. Instead, rule number 3 above
applies and a negative answer is returned by the server. The
resolver should not accept this answer as final, because it can
determine from the SOA RR in the negative answer the context within
which the server has answered.

A solution to this is to instruct the resolver to hunt for the
authoritative zone of the data in a brute force manner.

This can be accomplished by taking the owner name of the returned SOA
RR and stripping off enough left-hand labels until a successful NS
response is obtained. A successful response here means that the
answer has NS records in it. (This allows for the possibility that a
zone cut can be two labels down within a zone.)

Returning to the example, the response will include a negative answer
with either the SOA RR for "roots.example.net." or "example.net."
depending on whether roots.example.net. is a delegated domain. In
either case, removing the leftmost label of the SOA owner name will
lead to the location of the desired data.

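A minimal Python sketch of the brute-force hunt follows. It makes two assumptions. First, names are absolute presentation names without escaped dots. Second, has_ns is a hypothetical callback that stands in for an NS query and reports whether the answer contained NS records. The sketch illustrates the label-stripping loop only; it is not a complete resolver.

```python
from typing import Callable, Optional


def strip_leftmost_label(name: str) -> Optional[str]:
    """Return the parent of an absolute name, e.g. "a.b." -> "b.", "b." -> "."."""
    if name == ".":
        return None  # the root has no parent
    _, _, rest = name.partition(".")
    return rest or "."


def find_zone_with_ns(soa_owner: str,
                      has_ns: Callable[[str], bool]) -> Optional[str]:
    """Strip labels off the SOA owner name until an NS query succeeds."""
    name = strip_leftmost_label(soa_owner)
    while name is not None:
        if has_ns(name):  # hypothetical stand-in for an NS lookup
            return name
        name = strip_leftmost_label(name)
    return None
```

In the example above, take a callback that answers yes for example.net. Starting from the SOA owner roots.example.net., the loop stops after stripping a single label and returns example.net.
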
2.2.1.3 Modification on use of KEY RR in the construction of Responses

This section updates RFC2535 section 3.5 by replacing it with the
following:

A query for KEY RR MUST NOT trigger any additional section
processing. Security aware resolvers will include corresponding SIG
records in the answer section.

KEY records SHOULD NOT be added to the additional records section in
response to any query.

RFC2535 specified that KEY records be added to the additional section
when SOA or NS records were included in an answer. This was done to
reduce round trips (in the case of SOA) and to force out NULL KEYs
(in the NS case). As this document obsoletes NULL keys, there is no
need for the inclusion of KEYs with NSs. Furthermore, as SOAs are
included in the authority section of negative answers, including the
KEYs each time will cause redundant transfers of KEYs.

RFC2535 section 3.5 also included a rule for adding the KEY RRset to
the response for a query for A and AAAA types. As Restrict KEY
[RFC3445] eliminated the use of KEY RRs by all applications, this
rule is no longer needed.

2.2.2 Signer's Name (replaces RFC3008 section 2.7)

The signer's name field of a SIG RR MUST contain the name of the zone
to which the data and signature belong. The combination of signer's
name, key tag, and algorithm MUST identify a zone key if the SIG is
to be considered material. This document defines a standard policy
for DNSSEC validation; local policy MAY override the standard policy.

There are no restrictions on the signer field of a SIG(0) record.
The combination of signer's name, key tag, and algorithm MUST
identify a key if this SIG(0) is to be processed.

2.2.3 Changes to RFC3090

A number of sections of RFC3090 need to be updated to reflect the DS
record.

2.2.3.1 RFC3090: Updates to section 1: Introduction

Most of the text is still relevant but the words ``NULL key'' are to
be replaced with ``missing DS RRset''. In section 1.3 the last three
paragraphs discuss the confusion in sections of RFC 2535 that are
replaced in section 2.2.1 above. Therefore, these paragraphs are now
obsolete.

2.2.3.2 RFC3090 section 2.1: Globally Secured

Rule 2.1.b is replaced by the following rule:

2.1.b. The KEY RRset at a zone's apex MUST be self-signed by a
private key whose public counterpart MUST appear in a zone signing
KEY RR (2.a) owned by the zone's apex and specifying a mandatory-to-
implement algorithm. This KEY RR MUST be identified by a DS RR in a
signed DS RRset in the parent zone.

If a zone cannot get its parent to advertise a DS record for it, the
child zone cannot be considered globally secured. The only exception
to this is the root zone, for which there is no parent zone.

2.2.3.3 RFC3090 section 3: Experimental Status.

The only difference between experimental status and globally secured
is the missing DS RRset in the parent zone. All locally secured zones
are experimental.

2.2.4 NULL KEY elimination

RFC3445 section 3 eliminates the top two bits in the flags field of
the KEY RR. These two bits were used to indicate NULL KEY or NO KEY.
RFC3090 defines that a zone is either secure or not; these rules
eliminate the possible need to put NULL keys at the zone apex to
indicate that the zone is not secured for an algorithm. Together with
this document, these two RFCs eliminate all uses of the NULL KEY.
This document obsoletes the NULL KEY.

2.3 Comments on Protocol Changes

Over the years there have been various discussions surrounding the
DNS delegation model, declaring it to be broken because there is no
good way to assert whether a delegation exists. In the RFC2535 version
of DNSSEC, the presence of the NS bit in the NXT bit map proves there
is a delegation at this name. Something more explicit is needed and
the DS record addresses this need for secure delegations.

The DS record is a major change to DNS: it is the first resource
record that can appear only on the upper side of a delegation. Adding
it will cause interoperability problems and requires a flag day for
DNSSEC. Many old servers and resolvers MUST be upgraded to take
advantage of DS. Some old servers will be able to be authoritative
for zones with DS records but will not add the NXT or DS records to
the authority section. The same is true for caching servers; in
fact, some might even refuse to pass on the DS or NXT records.

2.4 Wire Format of the DS record

The DS (type=TBD) record contains these fields: key tag, algorithm,
digest type, and the digest of a public key KEY record that is
allowed and/or used to sign the child's apex KEY RRset. Other keys
MAY sign the child's apex KEY RRset.

                        1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           key tag             |   algorithm   |  Digest type  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                digest (length depends on type)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (SHA-1 digest is 20 bytes)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The key tag is calculated as specified in RFC2535. Algorithm MUST be
an algorithm number assigned in the range 1..251 and the algorithm
MUST be allowed to sign DNS data. The digest type is an identifier
for the digest algorithm used. The digest is calculated over the
canonical name of the delegated domain name followed by the whole
RDATA of the KEY record (all four fields).

   digest = hash( canonical FQDN on KEY RR | KEY_RR_rdata)

   KEY_RR_rdata = Flags | Protocol | Algorithm | Public Key

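The digest formula above can be sketched in Python for digest type 1 (SHA-1). The sketch rests on three assumptions. First, names are absolute ASCII presentation names without escaped characters. Second, "canonical FQDN" means the lowercased, uncompressed wire-format name. Third, the key fields are already available as integers and raw key bytes. The helper names are illustrative, not part of any standard API.

```python
import hashlib
import struct


def canonical_wire_name(name: str) -> bytes:
    """Uncompressed, lowercased wire-format name (assumes no escaped dots)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.lower().encode("ascii")
            out.append(len(raw))  # length octet
            out += raw
    out.append(0)  # terminating root label
    return bytes(out)


def key_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """KEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha1(owner: str, rdata: bytes) -> bytes:
    """Digest type 1: SHA-1 over the canonical owner name followed by KEY RDATA."""
    return hashlib.sha1(canonical_wire_name(owner) + rdata).digest()
```

To exercise the sketch against the example in section 2.7:

- decode the base64 public key with base64.b64decode;
- wrap it with key_rdata(256, 3, 1, ...);
- hash it with ds_digest_sha1("dskey.example.", ...);
- compare the result with the DS digest shown there.
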
Digest type value 0 is reserved, value 1 is SHA-1, and reserving
other types requires IETF standards action. For interoperability
reasons, keeping the number of digest algorithms low is strongly
RECOMMENDED. The only reason to reserve additional digest types is
to increase security.

DS records MUST point to zone KEY records that are allowed to
authenticate DNS data. The indicated KEY record's protocol field MUST
be set to 3; flag field bit 7 MUST be set to 1. The value of other
flag bits is not significant for the purposes of this document.

The size of the DS RDATA for type 1 (SHA-1) is 24 bytes, regardless
of key size. New digest types probably will have larger digests.

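The 24-byte figure comes from four fixed octets (a 16-bit key tag and two 8-bit fields) plus a 20-byte SHA-1 digest. The Python sketch below packs and unpacks that layout in network byte order. DSRdata and the function names are illustrative. The sketch knows only digest type 1, which is an assumption matching the types defined in this document.

```python
import struct
from typing import NamedTuple

DIGEST_LENGTHS = {1: 20}  # digest type 1 (SHA-1) is the only type defined here


class DSRdata(NamedTuple):
    key_tag: int      # 16 bits
    algorithm: int    # 8 bits
    digest_type: int  # 8 bits
    digest: bytes


def pack_ds(ds: DSRdata) -> bytes:
    """Serialize DS RDATA: key tag, algorithm, digest type, then the digest."""
    if DIGEST_LENGTHS.get(ds.digest_type) != len(ds.digest):
        raise ValueError("unknown digest type or wrong digest length")
    return struct.pack("!HBB", ds.key_tag, ds.algorithm, ds.digest_type) + ds.digest


def unpack_ds(rdata: bytes) -> DSRdata:
    """Parse DS RDATA; the digest is whatever follows the 4 fixed octets."""
    if len(rdata) < 4:
        raise ValueError("DS RDATA shorter than its fixed fields")
    key_tag, algorithm, digest_type = struct.unpack("!HBB", rdata[:4])
    return DSRdata(key_tag, algorithm, digest_type, rdata[4:])
```

For example, key tag 12345 (0x3039) with algorithm 3 and digest type 1 packs to the fixed octets 30 39 03 01, followed by the 20 digest octets.
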
2.4.1 Justifications for Fields

The algorithm and key tag fields are present to allow resolvers to
quickly identify the candidate KEY records to examine. SHA-1 is a
strong cryptographic checksum: it is computationally infeasible for
an attacker to generate a KEY record that has the same SHA-1 digest.
Combining the name of the key and the key rdata as input to the
digest provides stronger assurance of the binding. Having the key
tag in the DS record adds greater assurance than the SHA-1 digest
alone, as there are now two different mapping functions.

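The key tag serves as a cheap pre-filter; the digest comparison is what binds a DS record to a KEY. The Python sketch below shows the filtering step. It assumes the RFC 2535 Appendix C checksum, which covers every algorithm except 1 (RSA/MD5). That algorithm, which the example in section 2.7 happens to use, needs the separate RFC 2535 rule and is deliberately left out. The sketch also checks the section 2.4 requirements: protocol 3, and flag bit 7 (counted from the most significant bit, i.e. 0x0100). The function names are illustrative.

```python
import struct
from typing import List

ZONE_KEY_FLAG = 0x0100  # flag bit 7, counting from the most significant bit


def keytag(rdata: bytes) -> int:
    """Key tag checksum from RFC 2535 Appendix C over the whole KEY RDATA.

    Algorithm 1 (RSA/MD5) uses a different rule and is not covered here.
    """
    if rdata[3] == 1:
        raise NotImplementedError("RSA/MD5 key tags are computed differently")
    ac = 0
    for i, octet in enumerate(rdata):
        ac += octet if i & 1 else octet << 8  # even offsets are the high byte
    ac += (ac >> 16) & 0xFFFF  # fold the carry back in
    return ac & 0xFFFF


def is_ds_eligible(rdata: bytes) -> bool:
    """Section 2.4: protocol field 3 and flag bit 7 set."""
    flags, protocol = struct.unpack("!HB", rdata[:3])
    return protocol == 3 and bool(flags & ZONE_KEY_FLAG)


def candidate_keys(keyset: List[bytes], ds_tag: int, ds_alg: int) -> List[bytes]:
    """Keep KEYs whose algorithm and key tag match the DS record."""
    return [k for k in keyset
            if len(k) >= 4 and is_ds_eligible(k)
            and k[3] == ds_alg and keytag(k) == ds_tag]
```

Keys that survive this filter still have to be checked against the DS digest before they are trusted.
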
This format allows concise representation of the keys that the child
will use, thus keeping down the size of the answer for the
delegation, reducing the probability of DNS message overflow. The
SHA-1 hash is strong enough to uniquely identify the key and is
similar to the PGP key footprint. The digest type field is present
for possible future expansion.

The DS record is well suited to listing trusted keys for islands of
security in configuration files.

2.5 Presentation Format of the DS Record

The presentation format of the DS record consists of three numbers
(key tag, algorithm and digest type) followed by the digest itself
presented in hex:

   example.   DS  12345 3 1 123456789abcdef67890123456789abcdef67890

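The Python sketch below parses and emits the RDATA part of that presentation format. It handles only the four RDATA fields, not the owner name or type mnemonic. The sketch accepts upper- or lowercase hex, since section 2.7 uses uppercase. It also tolerates whitespace inside the digest, which is an assumption of the sketch rather than something this section states. The names are illustrative.

```python
from typing import NamedTuple


class DSText(NamedTuple):
    key_tag: int
    algorithm: int
    digest_type: int
    digest: bytes


def parse_ds(text: str) -> DSText:
    """Parse '<key tag> <algorithm> <digest type> <hex digest>'."""
    fields = text.split()
    if len(fields) < 4:
        raise ValueError("need key tag, algorithm, digest type and digest")
    key_tag, algorithm, digest_type = (int(f) for f in fields[:3])
    digest = bytes.fromhex("".join(fields[3:]))  # hex may be split by spaces
    return DSText(key_tag, algorithm, digest_type, digest)


def format_ds(ds: DSText) -> str:
    """Emit the presentation form with the digest in lowercase hex."""
    return f"{ds.key_tag} {ds.algorithm} {ds.digest_type} {ds.digest.hex()}"
```

Parsing the RDATA of the example above yields key tag 12345, algorithm 3, digest type 1 and a 20-byte digest. Formatting that result reproduces the same text.
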
2.6 Transition Issues for Installed Base

No backwards compatibility with RFC2535 is provided.

RFC2535-compliant resolvers will assume that all DS-secured
delegations are locally secure. This is bad, but the DNSEXT Working
Group has determined that rather than dealing with both
RFC2535-secured zones and DS-secured zones, a rapid adoption of DS is
preferable. Thus the only option for early adopters is to upgrade to
DS as soon as possible.

2.6.1 Backwards compatibility with RFC2535 and RFC1035

This section documents how a resolver determines the type of
delegation.

RFC1035 delegation (in parent) has:

   RFC1035           NS

RFC2535 adds the following two cases:

   Secure RFC2535:   NS + NXT + SIG(NXT)
                     NXT bit map contains: NS SIG NXT
   Unsecure RFC2535: NS + KEY + SIG(KEY) + NXT + SIG(NXT)
                     NXT bit map contains: NS SIG KEY NXT
                     KEY must be a NULL key.

DNSSEC with DS has the following two states:

   Secure DS:        NS + DS + SIG(DS)
                     NXT bit map contains: NS SIG NXT DS
   Unsecure DS:      NS + NXT + SIG(NXT)
                     NXT bit map contains: NS SIG NXT

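A resolver can tell the bit maps above apart mechanically, with one exception: secure RFC2535 and unsecure DS look identical. The Python sketch below does the classification. It assumes the caller passes the set of type mnemonics from an NXT record whose SIG has already been verified, which is the order section 3.1 asks for. The returned labels are illustrative. The ambiguous case is reported as unsecure DS, which is how a DS-aware resolver treats a missing DS bit.

```python
from typing import AbstractSet


def classify_delegation(nxt_types: AbstractSet[str]) -> str:
    """Classify a delegation from the types in the parent's NXT bit map.

    The NXT RRset's signature is assumed to have been verified already.
    """
    if "NS" not in nxt_types:
        return "not a delegation"
    if "DS" in nxt_types:
        return "secure DS"
    if "KEY" in nxt_types:
        return "unsecure RFC2535 (NULL KEY)"
    # NS SIG NXT alone is ambiguous: secure RFC2535 or unsecure DS.
    # A DS-aware resolver treats it as an unsecure delegation.
    return "unsecure DS"
```
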
It is difficult for a resolver to determine whether a delegation is
secure RFC 2535 or unsecure DS. This could be overcome by adding a
flag to the NXT bit map, but only upgraded resolvers would understand
this flag, anyway. Having both parent and child signatures for a KEY
RRset might allow old resolvers to accept a zone as secure, but the
cost of doing this for a long time is much higher than just
prohibiting RFC 2535-style signatures at child zone apexes and forcing
rapid deployment of DS-enabled servers and resolvers.

RFC 2535 and DS can in theory be deployed in parallel, but this would
require resolvers to deal with RFC 2535 configurations forever. This
document obsoletes the NULL KEY in parent zones, which is a difficult
enough change to cause a flag day.

2.7 KEY and corresponding DS record example

This is an example of a KEY record and the corresponding DS record.

   dskey.example.  KEY  256 3 1 (
                        AQPwHb4UL1U9RHaU8qP+Ts5bVOU1s7fYbj2b3CCbzNdj
                        4+/ECd18yKiyUQqKqQFWW5T3iVc8SJOKnueJHt/Jb/wt
                        ) ; key id = 28668
                   DS   28668 1 1 49FD46E6C4B45C55D4AC69CBD3CD34AC1AFE51DE

3 Resolver

3.1 DS Example

To create a chain of trust, a resolver goes from trusted KEY to DS to
KEY.

Assume the key for domain "example." is trusted. Zone "example."
contains at least the following records:

   example.           SOA    <soa stuff>
   example.           NS     ns.example.
   example.           KEY    <stuff>
   example.           NXT    NS SOA KEY SIG NXT secure.example.
   example.           SIG(SOA)
   example.           SIG(NS)
   example.           SIG(NXT)
   example.           SIG(KEY)
   secure.example.    NS     ns1.secure.example.
   secure.example.    DS     tag=12345 alg=3 digest_type=1 <foofoo>
   secure.example.    NXT    NS SIG NXT DS unsecure.example.
   secure.example.    SIG(NXT)
   secure.example.    SIG(DS)
   unsecure.example.  NS     ns1.unsecure.example.
   unsecure.example.  NXT    NS SIG NXT example.
   unsecure.example.  SIG(NXT)

In zone "secure.example." the following records exist:

   secure.example.    SOA      <soa stuff>
   secure.example.    NS       ns1.secure.example.
   secure.example.    KEY      <tag=12345 alg=3>
   secure.example.    KEY      <tag=54321 alg=5>
   secure.example.    NXT      <nxt stuff>
   secure.example.    SIG(KEY) <key-tag=12345 alg=3>
   secure.example.    SIG(SOA) <key-tag=54321 alg=5>
   secure.example.    SIG(NS)  <key-tag=54321 alg=5>
   secure.example.    SIG(NXT) <key-tag=54321 alg=5>

In this example the private key for "example." signs the DS record
for "secure.example.", making that a secure delegation. The DS record
states which key is expected to sign the KEY RRset at
"secure.example.". Here "secure.example." signs its KEY RRset with
the KEY identified in the DS RRset, thus the KEY RRset is validated
and trusted.

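The KEY to DS to KEY step for "secure.example." can be sketched in Python. The sketch makes several assumptions:

- the parent's DS RRset has already been validated with the trusted "example." key;
- only digest type 1 is handled;
- key tag filtering is omitted for brevity;
- signature checking is left to signed_keyset, a hypothetical callback that reports whether the given KEY's SIG over the child's apex KEY RRset verifies.

This is not a complete validator.

```python
import hashlib
from typing import Callable, Iterable


def _wire_name(name: str) -> bytes:
    """Lowercased, uncompressed wire-format name (no escaped dots assumed)."""
    labels = [lab.lower().encode("ascii")
              for lab in name.rstrip(".").split(".") if lab]
    return b"".join(bytes([len(lab)]) + lab for lab in labels) + b"\x00"


def ds_matches_key(owner: str, key: bytes, ds_alg: int,
                   ds_digest_type: int, ds_digest: bytes) -> bool:
    """True if this KEY RDATA is the key a type-1 (SHA-1) DS record names."""
    if ds_digest_type != 1 or len(key) < 4 or key[3] != ds_alg:
        return False
    return hashlib.sha1(_wire_name(owner) + key).digest() == ds_digest


def child_keyset_trusted(owner: str, keyset: Iterable[bytes], ds_alg: int,
                         ds_digest_type: int, ds_digest: bytes,
                         signed_keyset: Callable[[bytes], bool]) -> bool:
    """Trusted DS -> matching KEY -> KEY RRset self-signature by that KEY."""
    return any(ds_matches_key(owner, k, ds_alg, ds_digest_type, ds_digest)
               and signed_keyset(k)
               for k in keyset)
```

A KEY that matches the DS but whose signature over the KEY RRset does not verify still leaves the child unvalidated. That is why the two checks are combined with "and".
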
This example has only one DS record for the child, but parents MUST
allow multiple DS records to facilitate key rollover and multiple KEY
algorithms.

The resolver determines the security status of "unsecure.example." by
examining the parent zone's NXT record for this name. The absence of
the DS bit indicates an unsecure delegation. Note the NXT record
SHOULD only be examined after verifying the corresponding signature.

3.2 Resolver Cost Estimates for DS Records

From an RFC2535 resolver's point of view, for each delegation followed
to chase down an answer, one KEY RRset has to be verified.
Additional RRsets might also need to be verified based on local
policy (e.g., the contents of the NS RRset). Once the resolver gets
to the appropriate delegation, validating the answer might require
verifying one or more signatures. A simple A record lookup requires
verifying at least N delegations and one RRset, a cost of N+1. For a
DS-enabled resolver, the cost is 2N+1. For an MX record, where the
target of the MX record is in the same zone as the MX record, the
costs are N+2 and 2N+2, for RFC 2535 and DS, respectively. In the
case of negative answers the same ratios hold true.

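The counts in this paragraph follow from a simple model. Under RFC2535, each delegation costs one verification (the KEY RRset). Under DS, each delegation costs two (the DS RRset and the child's KEY RRset). In both cases, each answer RRset costs one more. The Python sketch below encodes that model. The per-delegation breakdown is an interpretation of the text, and the function name is illustrative.

```python
def verifications(delegations: int, answer_rrsets: int, ds: bool) -> int:
    """Signature verifications per lookup under the cost model of section 3.2.

    RFC2535: one KEY RRset per delegation. DS: the parent's DS RRset plus the
    child's KEY RRset per delegation. Answer RRsets are verified once each.
    """
    per_delegation = 2 if ds else 1
    return per_delegation * delegations + answer_rrsets
```

For three delegations, a simple A lookup costs 4 verifications under RFC2535 and 7 with DS. An MX lookup whose target is in the same zone (two answer RRsets) costs 5 and 8.
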
The resolver has to do an extra query to get the DS record and this
increases the overall cost of resolving this question, but this is
never worse than chasing down NULL KEY records from the parent in
RFC2535 DNSSEC.

DS adds processing overhead on resolvers and increases the size of
delegation answers, but much less than storing signatures in the
parent zone.

4 Security Considerations:

This document proposes a change to the validation chain of KEY
records in DNSSEC. The change is not believed to reduce security in
the overall system. In RFC2535 DNSSEC, the child zone has to
communicate keys to its parent and prudent parents will require some
authentication with that transaction. The modified protocol will
require the same authentication, but allows the child to exert more
local control over its own KEY RRset.

There is a remote possibility that an attacker could generate a valid
KEY that matches all the DS fields of a specific DS set, and thus
forge data from the child. This possibility is considered
impractical, as on average more than

   2 ^ (160 - <Number of keys in DS set>)

keys would have to be generated before a match would be found.

An attacker that wants to match any DS record will have to generate
on average at least 2^80 keys.

The DS record represents a change to the DNSSEC protocol and there is
an installed base of implementations, as well as textbooks on how to
set up secure delegations. Implementations that do not understand the
DS record will not be able to follow the KEY to DS to KEY chain and
will consider all zones secured that way as unsecure.

5 IANA Considerations:

IANA needs to allocate an RR type code for DS from the standard RR
type space (type 43 requested).

IANA needs to open a new registry for the DS RR type for digest
algorithms. Defined types are:

   0 is Reserved,
   1 is SHA-1.

Adding new reservations requires IETF standards action.

6 Acknowledgments

Over the last few years a number of people have contributed ideas
that are captured in this document. The core idea of using one key to
sign only the KEY RRset comes from discussions with Bill Manning and
Perry Metzger on how to put in a single root key in all resolvers.
Alexis Yushin, Brian Wellington, Sam Weiler, Paul Vixie, Jakob
Schlyter, Scott Rose, Edward Lewis, Lars-Johan Liman, Matt Larson,
Mark Kosters, Dan Massey, Olaf Kolkman, Phillip Hallam-Baker, Miek
Gieben, Havard Eidnes, Donald Eastlake 3rd., Randy Bush, David
Blacka, Steve Bellovin, Rob Austein, Derek Atkins, Roy Arends, Mark
Andrews, Harald Alvestrand, and others have provided useful comments.

Normative References:

[RFC1035] P. Mockapetris, ``Domain Names - Implementation and
          Specification'', STD 13, RFC 1035, November 1987.

[RFC2535] D. Eastlake, ``Domain Name System Security Extensions'',
          RFC 2535, March 1999.

[RFC3008] B. Wellington, ``Domain Name System Security (DNSSEC)
          Signing Authority'', RFC 3008, November 2000.

[RFC3090] E. Lewis, ``DNS Security Extension Clarification on Zone
          Status'', RFC 3090, March 2001.

[RFC3225] D. Conrad, ``Indicating Resolver Support of DNSSEC'',
          RFC 3225, December 2001.

[RFC3445] D. Massey, S. Rose, ``Limiting the scope of the KEY Resource
          Record (RR)'', RFC 3445, December 2002.

Informational References

[RFC2181] R. Elz, R. Bush, ``Clarifications to the DNS
          Specification'', RFC 2181, July 1997.

[RFC3226] O. Gudmundsson, ``DNSSEC and IPv6 A6 aware server/resolver
          message size requirements'', RFC 3226, December 2001.

Author Address

   Olafur Gudmundsson
   3821 Village Park Drive
   Chevy Chase, MD, 20815
   USA
   <ogud@ogud.com>

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Internet Draft                                          Naomasa Maruyama
draft-ietf-idn-aceid-02.txt                               Yoshiro Yoneya
Jun 19, 2000                                                       JPNIC
Expires Dec 19, 2001

      Proposal for a determining process of ACE identifier

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

In the IETF IDN WG, various kinds of ASCII Compatible Encodings,
hereafter abbreviated as "ACE", are discussed as methods for realizing
multilingual domain names (hereafter referred to as "MDN"). Each ACE
uses a prefix or a suffix as an identifier in order for MDNs to fit
within the existing ASCII domain name space. In other words,
acceptance of an ACE proposal as an Internet standard means that the
existing ASCII domain name space will be partitioned in order to
accommodate the MDN space.

This document describes possible problems in the standardization
process of ACE and proposes a solution for them.

1. Present situation and concern

At present, some specifications relating to MDN specify their own
ACE identifiers. In these drafts, multilingual domain names encoded
into ASCII character strings, with the ACE identifiers at their heads
or tails, are merely ASCII character strings. It is possible,
accidentally or intentionally, to register a domain name that is not
an MDN but has the designated ACE identifier string.

If this kind of registration takes place, there is no guarantee
that the domain name will be consistent with MDN semantics.
Furthermore, there is no guarantee that the name, interpreted as an
MDN, will comply with the registration policies of the registry when
the ACE identifier proposal is finally accepted as an Internet
standard. This might cause problems with name disputes and/or
revocations.

Therefore, the current situation, in which independent ACE proposal
authors arbitrarily select an ACE identifier and domain name
registrants are thereby permitted to register such names, may hinder
deployment of MDN technology.

2. Selecting ACE identifiers

In order to maintain a smooth standardization process for ACE,
this document proposes a strategy for selecting and reserving ACE
identifiers and a method for assigning them.

2.1 The ACE identifier candidates and tentative suspension of
    registering relevant domain names

All strings starting with a combination of two alphanumeric
characters followed by two hyphens are defined to be ACE prefix
identifier candidates. All strings starting with two hyphens followed
by two alphanumeric characters are defined as ACE suffix identifier
candidates. ACE prefix identifier candidates and ACE suffix
identifier candidates are collectively called ACE identifier
candidates.

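The Python sketch below flags the ACE identifier candidates that occur in a domain name, following the definitions above. It makes three assumptions. First, "alphanumeric" means ASCII letters and digits. Second, comparison is case-insensitive, as elsewhere in the DNS. Third, a candidate counts only at the head (prefix) or tail (suffix) of a label, as the next paragraph describes. The function names are illustrative.

```python
import re
from typing import Set

PREFIX_CANDIDATE = re.compile(r"[a-z0-9]{2}--", re.IGNORECASE)
SUFFIX_CANDIDATE = re.compile(r"--[a-z0-9]{2}\Z", re.IGNORECASE)


def ace_candidates(domain: str) -> Set[str]:
    """ACE identifier candidates found at the head or tail of any label."""
    found = set()
    for label in domain.rstrip(".").split("."):
        head = PREFIX_CANDIDATE.match(label)  # match() anchors at the start
        if head:
            found.add(head.group(0).lower())
        tail = SUFFIX_CANDIDATE.search(label)  # \Z anchors at the end
        if tail:
            found.add(tail.group(0).lower())
    return found


def is_relevant_domain_name(domain: str) -> bool:
    """A name is relevant if any of its labels carries a candidate."""
    return bool(ace_candidates(domain))
```

For example, "ab--cd.jp" carries both the prefix candidate "ab--" and the suffix candidate "--cd". By contrast, "www.example.com" carries neither.
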
All the domain name registries recognized by ICANN SHOULD
tentatively suspend registration of domain names which have an ACE
prefix identifier candidate at the head of at least one label of the
domain name and those which have an ACE suffix identifier candidate at
the tail of at least one label of the name. These domain names are
collectively called "relevant domain names".

This suspension should be continued until September 1, 2001
00:00:00 UTC.

2.2 Survey of relevant domain name registration

All registries recognized by ICANN SHOULD conduct a survey about
relevant domain names registered in their zone, and report, no later
than August 11, 2001 00:00:00 UTC, all of the ACE identifier
candidates which are used by relevant domain names.

2.3 Selection of ACE identifiers and permanent blocking of
    relevant domain names

The IDN WG or other organ of IETF or ICANN MUST summarize the
reports and list ACE identifier candidates that are not reported to be
used in registered domain names by August 18, 2001 00:00:00 UTC, and
select ten to twenty ACE prefix identifier candidates and ten to
twenty ACE suffix identifier candidates for ACE identifiers. Among
these twenty to forty ACE identifiers, one prefix identifier and one
suffix identifier will be used for experiments. The others will be
used one by one as the ACE standard evolves.

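Under the same assumptions (ASCII letters and digits, compared case-insensitively), there are 36 x 36 = 1296 prefix candidates and as many suffix candidates. The Python sketch below computes the pool that remains after removing the candidates reported in the survey of section 2.2. Choosing ten to twenty of each from that pool remains a decision for the IDN WG or other organ of IETF or ICANN, and is not modeled here. The names are illustrative.

```python
import string
from itertools import product
from typing import Iterable, List, Tuple

ALNUM = string.ascii_lowercase + string.digits  # 36 symbols, case-folded


def unreported_candidates(reported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Prefix and suffix candidates not reported as used by any registry."""
    pairs = ["".join(p) for p in product(ALNUM, repeat=2)]
    used = {r.lower() for r in reported}
    prefixes = sorted(p + "--" for p in pairs if p + "--" not in used)
    suffixes = sorted("--" + p for p in pairs if "--" + p not in used)
    return prefixes, suffixes
```

With no reports, each list holds all 1296 candidates. Every reported candidate, in any letter case, shrinks the corresponding list by one.
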
The list of ACE identifiers will be sent to IANA, and will be
maintained by IANA from August 25, 2001 00:00:00 UTC. Domain names
relevant to these identifiers SHOULD NOT be registered in any DNS
zone, except for registration of multilingual domain names compliant
with one of the future IDN standards. IANA will notify all ICANN
recognized registries of this new restriction on the domain name
space immediately after it receives the list.

2.4 Blocking of registration for relevant domain names

Domain names relevant to ACE identifiers selected by the procedure
described in section 2.3 SHOULD NOT be registered in any zone of ICANN
recognized registries except for registration of multilingual domain
names compliant with one of the future IDN standards. All ICANN
recognized registries SHOULD implement this restriction no later than
September 1, 2001 00:00:00 UTC.

Registration for domain names relevant to ACE identifier
|
||||
candidates, tentatively suspended by 2.1, but not relevant to ACE
|
||||
identifiers selected by section 2.3 MAY be reopened from September 1,
|
||||
2001 00:00:00 UTC.

3. Use of an ACE identifier in writing an ACE proposal

When writing an ACE proposal that uses an ACE identifier, the author
SHOULD either describe the ACE identifier as "to be decided" and leave
it to the discretion of the IDN WG or other organ of IETF or ICANN, or
use one of the experimental ACE identifiers defined in section 2.3,
with a unique version number added before or after the prefix or
suffix.

If a proposal is validated and published as an Internet Draft, the
IDN WG or other organ of IETF or ICANN MUST replace the "to be
decided" part with an experimental identifier with a unique version
number added before or after the prefix or the suffix.

4. Determination of ACE identifier

When an Internet Draft relating to ACE is accepted as an Internet
standard and becomes an RFC, the IDN WG or other organ of IETF or
ICANN MUST replace the experimental ACE identifier, augmented by the
version number, with one of the ACE identifiers.

5. Security considerations

None in particular.

6. Changes from the previous version

We excluded suffixes consisting of one hyphen followed by three
alphanumeric characters from the candidates, because we found that,
as of Nov. 29, 2000, there were 23,921 domain names registered in the
.JP space relevant to these suffixes. This was more than 10% (about
10.5%) of the 227,852 total registrations in the JPNIC database at
that time, so we judged these suffixes to be poor candidates.
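
The figures above can be checked with a few lines of Python. The regular expression is one possible reading of "one hyphen followed by three alphanumeric characters" at the end of a label. The sample labels are made up for illustration.

```python
import re

# One hyphen followed by exactly three ASCII letters or digits, ending the label.
EXCLUDED_SUFFIX = re.compile(r"-[A-Za-z0-9]{3}\Z")


def has_excluded_suffix(label):
    return EXCLUDED_SUFFIX.search(label) is not None


# Figures reported for the .JP space as of Nov. 29, 2000.
relevant, total = 23921, 227852
share = relevant / total
print(f"{share:.1%}")  # 10.5%

print(has_excluded_suffix("example-abc"))   # True
print(has_excluded_suffix("example-abcd"))  # False
```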

In addition to this and some minor linguistic corrections, we
changed "The IDN WG" in section 2.3 to "The IDN WG or other organ of
IETF or ICANN".

7. References

[IDNREQ]   Z. Wenzel, J. Seng, "Requirements of Internationalized
           Domain Names", draft-ietf-idn-requirements-03.txt, Jun 2000.

[RACE]     P. Hoffman, "RACE: Row-based ASCII Compatible Encoding for
           IDN", draft-ietf-idn-race-02.txt, Oct 2000.

[BRACE]    A. Costello, "BRACE: Bi-mode Row-based ASCII-Compatible
           Encoding for IDN", draft-ietf-idn-brace-00.txt, Sep 2000.

[LACE]     P. Hoffman, "LACE: Length-based ASCII Compatible Encoding for
           IDN", draft-ietf-idn-lace-00.txt, Nov 2000.

[VERSION]  M. Blanchet, "Handling versions of internationalized domain
           names protocols", draft-ietf-idn-version-00.txt, Nov 2000.

8. Acknowledgements

We would like to express our hearty thanks to the members of the
JPNIC IDN Task Force for valuable discussions about this issue. We
would also like to express our appreciation to Mr. Dave Crocker for
checking and correcting the preliminary version of this draft.

9. Authors' Addresses

Naomasa Maruyama
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
maruyama@nic.ad.jp

Yoshiro Yoneya
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
yone@nic.ad.jp
@ -1,454 +0,0 @@
Internet Draft                                           James SENG
<draft-ietf-idn-cjk-01.txt>                          Yoshiro YONEYA
11th Apr 2001                                           Kenny HUANG
Expires 11 Oct 2001                                    KIM Kyongsok

Han Ideograph (CJK) for Internationalized Domain Names

Status of this Memo

This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

During the development of Internationalized Domain Names (IDN), it was
discovered that there is a substantial lack of information about, and
much misunderstanding of, Han ideographs and their folding mechanism.

This document attempts to address some of the issues of Han folding
with respect to IDN. Hopefully, it will dispel some of the common
misunderstandings of this problem and clarify the issues with Han
ideographs and their folding mechanism.

This document addresses a problem specific to IDN and thus is not
meant as a reference for generic Han folding, which is much more
complicated and certainly beyond the scope of this document. However,
this document may be applicable to other areas related to names, e.g.
the Common Name Resolution Protocol [CNRP].

1. Definitions and conventions

Characters mentioned in this document are identified by their position
or code point in the Unicode character set [UCS]. The notation U+12AB,
for example, indicates the character at position 12AB (hexadecimal)
in the [UCS]. It is strongly recommended that a [UCS] table be at
hand for looking up the ideographs described.

Han ideographs are defined as the Chinese ideographs in the range
U+3400 to U+9FFF, commonly known as the CJK Unified Ideographs. This
covers Chinese 'hanzi' {U+6F22 U+5B57/U+6C49 U+5B57}, Japanese 'kanji'
{U+6F22 U+5B57} and Korean 'hanja' {U+6F22 U+5B57/U+D55C U+C790}.
Additional Han ideographs will appear in other locations (not
necessarily in plane 0) in the future.
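
The U+ notation and the Han range defined above can be made concrete in a few lines. This is a minimal sketch with hypothetical helper names. The range test uses only U+3400..U+9FFF as defined here, so it deliberately ignores the compatibility ideographs at U+F900-U+FAFF and any future Plane 2 additions.

```python
import re


def from_ucs_notation(text):
    """Return the characters named by notation such as "{U+6F22 U+5B57}"."""
    hex_digits = re.findall(r"U\+([0-9A-Fa-f]{4,6})", text)
    return "".join(chr(int(h, 16)) for h in hex_digits)


def to_ucs_notation(s):
    """Return the U+XXXX notation for each character of s."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def is_han(ch):
    """True if ch lies in U+3400..U+9FFF, the Han range defined above."""
    return 0x3400 <= ord(ch) <= 0x9FFF


kanji = from_ucs_notation("{U+6F22 U+5B57}")  # 'kanji' written in kanji
print(to_ucs_notation(kanji))                 # U+6F22 U+5B57
print(all(is_han(ch) for ch in kanji))        # True
print(is_han(from_ucs_notation("U+D55C")))    # False (a hangeul syllable)
```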

Conversion between ideographs can be done using four different
approaches: code-based substitution, character-based substitution,
lexicon-based substitution and context-based substitution. Han folding
refers only to code-based substitution, similar to case mapping of
alphabetic characters.

2. Introduction

Traditionally, domain names have been case insensitive (as defined in
[RFC1035] Section 2.3.3). While this is not a problem when domain names
are restricted to English letters and digits, it becomes a serious
problem for IDN. An important criterion for a robust IDN is to have
good normalization and canonicalization forms, so that duplicate
domain names are kept to a minimum.

Fortunately, the Unicode Consortium is developing technical reports
on canonicalization [UTR21] and normalization [UTR15]. Hence, IDN can
simply build on the work of Unicode and use these references.

Unfortunately, both [UTR15] and [UTR21] are limited in scope and do
not address many other scripts. In particular, Han ideographs are not
discussed in detail in these documents, and most experts are quick to
point out that solving this problem in general is technically
impossible.

2.1 Han ideographs

While there are many forms or writing styles for Chinese characters,
the most commonly used, 'zhengti' {U+6B63 U+4F53/U+6B63 U+9AD4},
represents Chinese ideographs by radicals (U+2E80-U+2FDF), which are
composed of simple strokes.

When the Unicode Consortium started work on the Universal Character
Set, it was suggested that Hanzi, Kanji and Hanja ideographs should be
unified into a single code space. This resulted in the CJK
Unification, whereby 27,786 Han ideographs are allocated in the ranges
U+3400-U+9FFF and U+F900-U+FAFF. Another 41,000 Han ideographs will
be added to Plane 2.

Ideographs are common in China, Korea and Japan, but as ideographs
spread and evolved, their forms sometimes came to differ slightly
from country to country. For example, the word 'villa' is written
{U+838A} ('zhuang') in Chinese but {U+8358} ('sou') in Japanese.
These forms are given different code points in Unicode.

3. Chinese (Hanzi)

Chinese ideographs, or hanzi {U+6F22 U+5B57/U+6C49 U+5B57}, originated
from pictographs. They are 'pictures' that evolved into ideographs
over several thousand years. For instance, the ideograph for 'hill'
{U+5C71} still bears some resemblance to the three peaks of a hill.

Not all ideographs are pictographs. There are other classes, such as
compound ideographs and phonetic ideographs. For example, 'endurance'
{U+5FCD} is a pierced 'knife' {U+5200} above the 'heart' {U+5FC3};
as a Chinese saying goes, 'endurance is like having a knife piercing
your heart'.

Hence, almost every Han ideograph carries some meaning by itself,
which is very different from most other scripts. This leads to the
mistaken belief that Han folding is a form of lexicon-based
substitution.

Chinese ideographs underwent a major change in the 1950s after the
establishment of the People's Republic of China. A Committee on
Language Reform was established in China, whose activities included
the simplification of Chinese ideographs. Simplified Chinese (SC) is
used in China and Singapore, and Traditional Chinese (TC) in Taiwan,
Hong Kong PRC, Macau PRC, and by most other overseas Chinese.

The process takes complex ideographs and simplifies them. The main
purpose is to make them easier to remember and write, and thus to
raise the literacy of the population.

For example, 'lightning' TC {U+96FB} becomes SC {U+7535} (the 'rain'
{U+96E8} part of the TC is dropped). In many cases, the SC bear no
resemblance to any of the original traditional forms, e.g. 'dragon'
TC {U+9F8D} and SC {U+9F99}. Two different TC may also share the same
SC, since that means fewer ideographs to learn; e.g. SC {U+53D1} can
be TC {U+767C} or {U+9AEE} depending on semantics. The official
'Comprehensive List of Simplified Characters', last published in
1986, lists 2244 SC [ZONGBIAO].

Therefore, SC-to-TC conversion is very complicated: it cannot be done
accurately without considering the semantics of the phrase.

On the other hand, TC-to-SC conversion is much simpler, although
different TC may map to a single SC. While Unicode itself does not
handle TC and SC equivalence, the informal [UNIHAN] database lists
2145 TC with their equivalent SC. However, because that database is
informal and not part of the Unicode standard, it is incomplete and
contains mistakes in the code points. Hence, precise tables for
TC-to-SC conversion have not yet been fully laid out.

For domain names, we are interested in comparing names for
equivalence, not in converting SC to TC. For this purpose, equivalence
matching can be done by applying TC-to-SC folding to both names before
comparing them, much as English strings are lower-cased before being
compared. For example, 'taiwan' SC {U+53F0 U+6E7E} will match TC
{U+81FA U+7063} or TC {U+53F0 U+7063}.

A side effect of this method is that comparing SC {U+53D1} with either
TC {U+767C} or TC {U+9AEE} yields a match. This implies that 'hair'
SC {U+5934 U+53D1} will match TC {U+982D U+9AEE}, but it will also
match TC {U+982D U+767C}, which has no meaning in Chinese.
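
TC-to-SC folding before comparison can be sketched with Python's `str.translate`. The mapping table is hypothetical and deliberately tiny. It holds only the pairs discussed in this section, including U+7063 for the TC 'wan' of 'taiwan' and U+767C for the second TC form of SC {U+53D1}. It is not taken from [UNIHAN]. The last comparison reproduces the side effect described above.

```python
# Hypothetical TC-to-SC table with only the pairs discussed here;
# a usable table would need thousands of entries.
TC_TO_SC = {
    0x81FA: 0x53F0,  # TC 'tai' -> SC
    0x7063: 0x6E7E,  # TC 'wan' -> SC
    0x982D: 0x5934,  # TC 'head' -> SC
    0x9AEE: 0x53D1,  # TC 'hair' -> SC
    0x767C: 0x53D1,  # TC 'send out' -> the same SC as U+9AEE
    0x96FB: 0x7535,  # TC 'lightning' -> SC
    0x9F8D: 0x9F99,  # TC 'dragon' -> SC
}


def han_fold(s):
    """Code-based substitution: replace each listed TC code point by its SC."""
    return s.translate(TC_TO_SC)


def equivalent(a, b):
    return han_fold(a) == han_fold(b)


taiwan_sc = "\u53F0\u6E7E"
print(equivalent(taiwan_sc, "\u81FA\u7063"))       # True
print(equivalent(taiwan_sc, "\u53F0\u7063"))       # True
print(equivalent("\u5934\u53D1", "\u982D\u9AEE"))  # True ('hair')
print(equivalent("\u5934\u53D1", "\u982D\u767C"))  # True (meaningless TC)
```

The folding is lossy by design. It answers "are these names equivalent?" and never tries to recover the TC form, which is the direction the text describes as requiring semantics.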

It should also be noted that SC are not normally mixed with TC.
Hence, 'hair' is written either as SC {U+5934 U+53D1} or as TC
{U+982D U+9AEE}, but (almost) never as {U+5934 U+9AEE} or
{U+982D U+53D1}. So the SC/TC problem may not be too serious for IDN.

Unfortunately, in Chinese names in places where SC are used (i.e.
Singapore and China), traditional and simplified ideographs are
sometimes mixed within a single name for artistic reasons. Some
people even 'create' ideographs for their names.

[Need to add a section on Bopomofo U+3118 to U+312A in future draft]

4. Korean (Hanja and Hangeul)

Korea was one of the first cultures to import Chinese ideographs as
the written form of its language. These Korean ideographs are known
as 'hanja' {U+6F22 U+5B57/U+D55C U+C790}, and they were widely used
until recently, when 'hangeul' {U+D55C U+AE00} became more popular.

Hangeul {U+D55C U+AE00} is a systematic script designed by a
15th-century ruler and linguist, King Sejong {U+4E16 U+5B97}. It is
based on the pronunciation of the Korean language, hanmal. A Korean
syllable is composed of 'jamo' {U+5B57 U+6BCD/U+C790 U+BAA8} elements
that represent different sounds. Hence, unlike a Han ideograph, a
hangeul syllable has no meaning by itself.

Each hanja ideograph can be represented by a hangeul syllable. For
example, 'samsung' is hanja {U+4E09 U+661F} or hangeul
{U+C0BC U+C131}. Note that {U+4E09} is pronounced 'sam', i.e. the
jamo {U+3145} {U+314F} {U+3141}, which combine into the hangeul
syllable {U+C0BC}. While jamo decomposition is described in [UTR15]
as Normalization Form D, this document also suggests another hangeul
canonical decomposition in Appendix A to accommodate both modern and
old hangeul.
[Need to fill up Appendix A when information is more complete]
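
The jamo-to-syllable step in the 'samsung' example is pure arithmetic in Unicode. The sketch below uses the standard Unicode Hangul syllable composition formula. The lookup lists that translate the compatibility jamo cited in the text into composition indices were written out by hand for this illustration. They cover modern jamo only, not the old hangeul that Appendix A is meant to accommodate.

```python
import unicodedata

# Compatibility jamo code points (as cited in the text), listed in the
# index order used by the Unicode Hangul syllable composition formula.
INITIALS = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
            0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
            0x314A, 0x314B, 0x314C, 0x314D, 0x314E]          # 19 initials
MEDIALS = list(range(0x314F, 0x3164))                       # 21 vowels
FINALS = [0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137,
          0x3139, 0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F,
          0x3140, 0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147,
          0x3148, 0x314A, 0x314B, 0x314C, 0x314D, 0x314E]    # 27 finals

S_BASE = 0xAC00  # first precomposed hangeul syllable


def compose(initial, medial, final=None):
    """Combine compatibility jamo code points into one hangeul syllable."""
    l_index = INITIALS.index(initial)
    v_index = MEDIALS.index(medial)
    t_index = FINALS.index(final) + 1 if final is not None else 0
    return chr(S_BASE + (l_index * 21 + v_index) * 28 + t_index)


sam = compose(0x3145, 0x314F, 0x3141)
print(f"U+{ord(sam):04X}")  # U+C0BC
# Normalization Form D splits the syllable into conjoining jamo.
print(" ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", sam)))
# U+1109 U+1161 U+11B7
```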

Most hanja have only one pronunciation. However, the pronunciation of
some hanja differs according to orthography (as also happens in
Chinese and Japanese) or to their position in a word, which makes
this more complex. And of course, converting hangeul back to hanja by
code substitution is impossible without considering semantics.

Koreans also invented their own ideographs, which are called 'gugja'
{U+56FD U+5B57/U+AD6D U+C790}.

5. Japanese (Kanji, Hiragana, Katakana)

The Japanese have adopted Chinese ideographs from the Koreans and the
Chinese since the 5th century. Chinese ideographs in Japanese are
known as 'kanji' {U+6F22 U+5B57}. The Japanese also developed their
own syllabaries, hiragana {U+5E73 U+4EEE U+540D} (U+3040-U+309F) and
katakana {U+7247 U+4EEE U+540D} (U+30A0-U+30FF), both derived from
kanji with the same pronunciation. Hiragana is a simplified cursive
form; for example, 'a' {U+3042} was derived from 'an' {U+5B89}.
Katakana is a simplified partial form; for example, 'a' {U+30A2} was
derived from 'a' {U+963F}. However, kanji remain fully integrated
into the Japanese language.
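
The ranges given above are enough to tell these scripts apart one code point at a time. This is a minimal sketch. The Hangul syllable range U+AC00..U+D7A3 is added for comparison and is not given in this document.

```python
def script_of(ch):
    """Classify a character by the code point ranges cited in this document
    (plus the Hangul syllable range, added for comparison)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x3400 <= cp <= 0x9FFF:
        return "han"
    if 0xAC00 <= cp <= 0xD7A3:
        return "hangeul"
    return "other"


# 'han-baa-gaa' (hamburger) in katakana, and 'nippon' in kanji.
hamburger = "\u30CF\u30F3\u30D0\u30FC\u30AC\u30FC"
nippon = "\u65E5\u672C"
print({script_of(c) for c in hamburger})  # {'katakana'}
print({script_of(c) for c in nippon})     # {'han'}
```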

The Japanese also invented ideographs known as 'kokuji' {U+56FD
U+5B57}. For example, 'iwashi' {U+9C2F} is a Japanese kokuji. Kokuji
are invented according to Han ligature rules. For example, 'touge'
"mountain pass" {U+5CE0} combines the meanings of 'yama' "mountain"
{U+5C71}, 'ue' "up" {U+4E0A} and 'shita' "down" {U+4E0B}.

Japanese is also a phonetic language, i.e. the script itself is based
on pronunciation. Each hiragana corresponds to one sound, and 48
hiragana form the basis of the Japanese language, including the less
commonly used 'we' {U+3091}. Furthermore, hiragana has 35 more forms
to represent voiced sounds, P-sounds and double consonants. For
example, 'ga' {U+304C} is the voiced form of 'ka' {U+304B}. Katakana
mirrors hiragana with a few more forms; it is used to integrate
foreign words or phrases into Japanese, to emphasize words or phrases
even in Japanese, or to represent onomatopoeia. For example,
'hamburger', pronounced 'han-baa-gaa' in Japanese, is written as
{U+30CF U+30F3 U+30D0 U+30FC U+30AC U+30FC} instead of {U+306F U+3093
U+3070 U+3041 U+304C U+3041} because it is a foreign word.

If Japanese were written with hiragana and katakana only, written
Japanese would obviously be very long. Hence, kanji are used for
nouns and verbs. Each kanji corresponds to one or more hiragana
characters. For example, 'japan', pronounced 'nippon' {U+306B U+3063
U+307D U+3093}, is written as {U+65E5 U+672C} instead.

Hiragana, like Korean jamo, has no meaning by itself. Also, a kanji
can take on different pronunciations (and hence different hiragana)
depending on where and how it is used in a sentence. For example,
'sky' {U+7A7A} can be pronounced 'sora' {U+305D U+3089} or 'kuu'
{U+304F U+3046}.

Hence, code substitution between hiragana and kanji is impractical.

On the other hand, some kanji have the same meaning and the same
pronunciation, and are effectively equivalent. For example, 'river'
"kawa" can be either {U+5DDD} or {U+6CB3}. The only difference
between the two ideographs is the size of the river they signify (the
latter denotes a bigger river).

The Japanese also reduced complex Chinese ideographs to simplified
forms. For example, 'both' {U+5169} was simplified to {U+4E21}. Note
that Chinese simplified it to {U+4E24} instead. However, traditional
Japanese kanji are seldom used nowadays, except in old historical
texts, where they are treated differently from the more commonly used
simplified forms, or in proper nouns such as personal names or
trademarks. Hence, Han folding is not recommended here.

6. Vietnamese

While the Vietnamese also adopted Chinese ideographs ('chu han') and
created their own ideographs ('chu nom'), these have been replaced by
the romanized 'quoc ngu' today. Hence, this document does not attempt
to address any issues with 'chu han' or 'chu nom'.

7. zVariant

Unicode has a three-dimensional conceptual model for ideograph
unification. The three dimensions are semantic (X-axis - meaning,
function), abstract shape (Y-axis - general form) and actual shape
(Z-axis - instantiated, type-faced).

When two ideographs have similar etymology but are given two
different code points in Unicode, they are known as zVariant
ideographs, i.e. they are variants along the 'Z' axis. For example,
'villa' {U+838A} and {U+8358}.

8. Ideographic Description

Unicode v3.0 introduced ideographic description characters
(U+2FF0-U+2FFB), which allow a Han ideograph to be described using
radicals (U+2E80-U+2FD5) and Han ideographs (U+3400-U+9FFF).

The intention of this description method is to allow ideographs that
are not defined by Unicode to be described. Hence, such ideographs
will not necessarily be displayed properly. In addition, the method
is not deterministic: the same ideograph can be represented by
different sequences.

For example, 'zong' {U+9B03} (for the sake of discussion, we use an
ideograph which is already in Unicode) can be decomposed into U+2FF1
U+9ADF U+5B97 using description characters and unified ideographs.
U+9ADF can also be decomposed as U+2FF0 U+2ED2 U+2F3A, and U+5B97 as
U+2FF5 U+2F27 U+2F70. In addition, U+9ADF is equivalent to U+2FBD.
Hence, if we use only description characters and radicals, we can get
U+2FF1 U+2FBD U+2FF5 U+2F27 U+2F70 or U+2FF1 U+2FF0 U+2ED2 U+2F3A
U+2FF5 U+2F27 U+2F70.
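
A well-formedness check does not remove the non-determinism, but it shows the problem concretely. Both sequences below are valid descriptions of the same ideograph, built from the decompositions given in the text, yet a code-point comparison of the kind domain name matching performs sees them as different. This is a minimal sketch. It treats every code point that is not a description character as a single component and does not check that components are radicals or ideographs.

```python
# Ideographic description characters U+2FF0..U+2FFB: U+2FF2 and U+2FF3
# take three components, all others take two.
TERNARY = {0x2FF2, 0x2FF3}


def ids_end(seq, i=0):
    """Parse one description starting at seq[i]; return the index just past
    it, or None if the sequence runs out of components."""
    if i >= len(seq):
        return None
    cp = seq[i]
    if not 0x2FF0 <= cp <= 0x2FFB:
        return i + 1  # any other code point counts as one component
    i += 1
    for _ in range(3 if cp in TERNARY else 2):
        i = ids_end(seq, i)
        if i is None:
            return None
    return i


def is_well_formed(seq):
    return ids_end(seq) == len(seq)


# Two descriptions of 'zong' {U+9B03}.
a = [0x2FF1, 0x9ADF, 0x5B97]
b = [0x2FF1, 0x2FF0, 0x2ED2, 0x2F3A, 0x5B97]
print(is_well_formed(a), is_well_formed(b))  # True True
print(a == b)                                # False
```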

In addition, certain radicals have been simplified and are thus, in
some contexts, equivalent. For example, the radical for 'bird' can be
either U+2EE6 or U+2FC3.

Hence, until there is a deterministic, well-defined rule for
ideographic description, ideographs formed by this method are not
recommended for use in domain names.

It should be noted that the Unicode Consortium never intended
ideographic description to be used in protocols like IDN, where exact
comparison must be done. But this feature would certainly be
desirable, as it is common for Chinese to invent ideographs for names
by adding radicals to, or removing them from, standard ideographs.

9. Mechanism

The implicit proposal in this document is that CJKV ideographs may or
may not be "folded" for the purposes of comparing domain names.

But if folding is required, there are three different ways that this
folding could be done:

a) Folding by DNS clients, or by user agents
b) Folding by DNS servers
c) Folding by domain name registration services, to prevent the
   allocation of confusable CJKV domain names which would be the same
   if transcoded

Before we can comment much further, we need to know which use is
planned.

The third use is important and should be put in place. Alternatively,
the problem can be reduced by representing non-ASCII characters in
domain names or other parts of URLs as hex-escaped character
references in HTML pages.

Characterizing Han characters as ideographs or pictograms is
inadequate, because most Han ideographs have both a phonetic and a
semantic element. Indeed, this is enough to characterize Chinese
writing as phonetic, though it is other things as well. Thus, it is
difficult to say whether folding is useful for Chinese or not.

The first use has the problem that lightweight devices do not have
enough room to fit a Unicode X-axis mapping table.

The second use has the problem that introducing mapping will reduce
the performance of DNS servers. Alphabetic case mapping can be
performed using a single logical AND instruction; CJKV character
folding requires a lookup table.
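
The contrast drawn above can be shown in a few lines. For ASCII letters the case difference is the single bit 0x20, so one AND with 0xDF maps a lowercase letter to uppercase. A range check is still needed so that digits, dots and hyphens are left alone. CJKV folding has no such bit pattern and needs a table lookup. The two-entry table is a hypothetical stand-in for a real X-axis mapping table.

```python
def ascii_upper(b):
    """Uppercase one ASCII byte: clearing bit 0x20 is a single AND."""
    if 0x61 <= b <= 0x7A:  # 'a'..'z' only
        return b & 0xDF    # 0xDF is ~0x20 within a byte
    return b


# Hypothetical folding table: every variant needs an explicit entry.
FOLD_TABLE = {0x96FB: 0x7535, 0x9F8D: 0x9F99}


def cjkv_fold(cp):
    return FOLD_TABLE.get(cp, cp)


print(bytes(ascii_upper(b) for b in b"example.jp"))  # b'EXAMPLE.JP'
print(hex(cjkv_fold(0x9F8D)))                        # 0x9f99
```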

In alphabetic scripts, there is also a requirement to fold Latin,
Greek, Cyrillic, Hebrew and Arabic together. There may be a stronger
requirement for CJKV characters.

Note also that because modern operating systems are Unicode-based and
have network-downloadable IMEs, "interoperability" is becoming less
equivalent to "use BIG5 characters only", "use GB2312 characters only"
or "use Shift-JIS characters only".

If conservative safety is really required, then
1) find the X-axis characters which are available in all major CJK
   character sets used on the Internet;
2) only allow variants of those in domain names;
3) when one variant is used, no other can be allocated. So comparisons
   are made on X-axis characters, but the licensee of that domain name
   can pick which Y or Z variants they wish to use.
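
Steps 1-3 can be sketched as a registry that compares names on a variant key. Both tables here are hypothetical. `ALLOWED` stands in for the result of step 1 and `REPRESENTATIVE` for an X-axis mapping, and the entries simply reuse examples from this document. The registry stores the label exactly as submitted, so the holder keeps the Y or Z variant they chose.

```python
# Hypothetical data, reusing examples from this document.
ALLOWED = {0x838A, 0x53D1, 0x5934}  # stand-in for the X-axis set of step 1
REPRESENTATIVE = {                  # variant -> X-axis representative
    0x8358: 0x838A,                 # 'villa' zVariant
    0x9AEE: 0x53D1,                 # TC forms of SC {U+53D1}
    0x767C: 0x53D1,
    0x982D: 0x5934,                 # TC 'head' -> SC
}


def variant_key(label):
    """Fold every character to its X-axis representative."""
    return "".join(chr(REPRESENTATIVE.get(ord(c), ord(c))) for c in label)


class Registry:
    def __init__(self):
        self._by_key = {}

    def register(self, label, holder):
        key = variant_key(label)
        if any(ord(c) not in ALLOWED for c in key):
            return False  # step 2: only variants of allowed characters
        if key in self._by_key:
            return False  # step 3: a variant is already allocated
        self._by_key[key] = (label, holder)  # holder keeps the chosen variant
        return True


registry = Registry()
print(registry.register("\u838A", "alice"))        # True
print(registry.register("\u8358", "bob"))          # False
print(registry.register("\u982D\u9AEE", "carol"))  # True
print(registry.register("\u5934\u53D1", "dave"))   # False
```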

Acknowledgement

The editor gratefully acknowledges the contributions of:

Paul Hoffman <phoffman@imc.org>
Jiang Mingliang <jiang@i-DNS.net>
Dongman Lee <dlee@icu.ac.kr>
Karlsson Kent <keka@im.se>

Author(s)

James SENG
i-DNS.net International Pte Ltd.
8 Temasek Boulevard
Suntec Tower 3 #24-02
Singapore 038988
Email: James@Seng.cc
Tel: +65 2468208

Yoshiro YONEYA
NTT Software Corporation
Shinagawa IntercityBldg., B-13F
2-15-2 Kohnan, Minato-ku Tokyo 108-6113 Japan
Email: yone@po.ntts.co.jp
Tel: +81-3-5782-7291

Kenny HUANG
Geotempo International Ltd; TWNIC
3F, No 16 Kang Hwa Street, Nei Hu
Taipei 114, Taiwan
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510

KIM Kyongsok/GIM Gyeongseog

References

[UNISTD3]    The Unicode Standard v3.0. Unicode Consortium.
[UCS]        ISBN 0-201-61633-5

[IDN]        "IETF Internationalized Domain Names Working Group",
             idn@ops.ietf.org, James Seng, Marc Blanchet

[CNRP]       "Common Name Resolution Protocol",
             cnrp-ietf@lists.netsol.com, Leslie Daigle

[CJKV]       CJKV Information Processing, ISBN 1-56592-224-7

[C2C]        "The Pitfalls and Complexities of Chinese to Chinese
             Conversion", Jack Halpern, Jouni Kerman,
             http://www.basistech.com/articles/C2C.html

[KANJIDIC]   Sanseido's Unicode Kanji Information Dictionary,
             ISBN 4-385-13690-4

[UNICHART]   Unicode chart, http://charts.unicode.org/

[ZONGBIAO]   Simplified Characters Standard Chart, 2nd Edition, 1986

[UNIHAN]     Unicode Han Database, Unicode Consortium,
             ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt

[ISO11941]   ISO TS 11941: Information and documentation -
             Transliteration of Korean script into Latin characters.
             Technical Specification 11941. First edition. 1996-12-31.
             ISO (International Organization for Standardization).

[KimK 1990]  "A New Proposal for a Standard Hangeul (or Korean Script)
             Code", KIM Kyongsok. Computer Standards & Interfaces,
             Vol. 9, No. 3, pp. 187-202, 1990.

[KimK 1992]  "A common Approach to Designing the Hangeul Code and
             Keyboard", KIM Kyongsok. Computer Standards & Interfaces,
             Vol. 14, No. 4, pp. 297-325, Aug. 1992.

[KimK 1999]  A Hangeul story inside computers. KIM, Kyongsok. Busan
             National University Press. 1999. [in Hangeul]
@ -1,864 +0,0 @@
INTERNET-DRAFT                                          Mark Welter
draft-ietf-idn-dude-02.txt                       Brian W. Spolarich
Expires 2001-Dec-07                                Adam M. Costello
                                                        2001-Jun-07

            Differential Unicode Domain Encoding (DUDE)

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Distribution of this document is unlimited. Please send comments to
the authors or to the idn working group at idn@ops.ietf.org.

Abstract

DUDE is a reversible transformation from a sequence of nonnegative
integer values to a sequence of letters, digits, and hyphens (LDH
characters). DUDE provides a simple and efficient ASCII-Compatible
Encoding (ACE) of Unicode strings [UNICODE] for use with
Internationalized Domain Names [IDN] [IDNA].

Contents

1. Introduction
2. Terminology
3. Overview
4. Base-32 characters
5. Encoding procedure
6. Decoding procedure
7. Example strings
8. Security considerations
9. References
A. Acknowledgements
B. Author contact information
C. Mixed-case annotation
D. Differences from draft-ietf-idn-dude-01
E. Example implementation

1. Introduction

The IDNA draft [IDNA] describes an architecture for supporting
internationalized domain names. Each label of a domain name may
begin with a special prefix, in which case the remainder of the
label is an ASCII-Compatible Encoding (ACE) of a Unicode string
satisfying certain constraints. For the details of the constraints,
see [IDNA] and [NAMEPREP]. The prefix has not yet been specified,
but see http://www.i-d-n.net/ for prefixes to be used for testing
and experimentation.

DUDE is intended to be used as an ACE within IDNA, and has been
designed to have the following features:

  * Completeness: Every sequence of nonnegative integers maps to an
    LDH string. Restrictions on which integers are allowed, and on
    sequence length, may be imposed by higher layers.

  * Uniqueness: Every sequence of nonnegative integers maps to at
    most one LDH string.

  * Reversibility: Any Unicode string mapped to an LDH string can
    be recovered from that LDH string.

  * Efficient encoding: The ratio of encoded size to original size
    is small. This is important in the context of domain names
    because [RFC1034] restricts the length of a domain label to 63
    characters.

  * Simplicity: The encoding and decoding algorithms are reasonably
    simple to implement. The goals of efficiency and simplicity are
    at odds; DUDE places greater emphasis on simplicity.

An optional feature is described in appendix C "Mixed-case
annotation".

2. Terminology

The key words "must", "shall", "required", "should", "recommended",
and "may" in this document are to be interpreted as described in
RFC 2119 [RFC2119].

LDH characters are the letters A-Z and a-z, the digits 0-9, and
hyphen-minus.
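
The LDH definition translates directly into a character-set test. This is a minimal sketch with a hypothetical function name. It checks only the character set and the 63-character label limit cited in the introduction, not other hostname rules such as those on leading or trailing hyphens.

```python
import string

LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh_label(label, max_length=63):
    """True if label is non-empty, uses only LDH characters, and fits the
    63-character label limit."""
    return 0 < len(label) <= max_length and all(c in LDH for c in label)


print(is_ldh_label("Example-123"))  # True
print(is_ldh_label("exa_mple"))     # False: underscore is not LDH
print(is_ldh_label("a" * 64))       # False: too long
```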

A quartet is a sequence of four bits (also known as a nibble or
nybble).

A quintet is a sequence of five bits.

Hexadecimal values are shown preceded by "0x". For example, 0x60
is decimal 96.

As in the Unicode Standard [UNICODE], Unicode code points are
denoted by "U+" followed by four to six hexadecimal digits, while a
range of code points is denoted by two hexadecimal numbers separated
by "..", with no prefixes.

XOR means bitwise exclusive or. Given two nonnegative integer
values A and B, A XOR B is the nonnegative integer value whose
binary representation is 1 in whichever places the binary
representations of A and B disagree, and 0 wherever they agree.
For the purpose of applying this rule, recall that an integer's
representation begins with an infinite number of unwritten zeros.
In some programming languages, care may need to be taken that A and
B are stored in variables of the same type and size.
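
The definition can be checked bit by bit against Python's built-in `^` operator. The input values are arbitrary examples. Python integers are unbounded, so the "same type and size" caveat does not arise there.

```python
def xor_by_definition(a, b):
    """Compute A XOR B exactly as defined above: a 1 bit wherever the binary
    representations of A and B disagree (missing high bits count as 0)."""
    result, position = 0, 0
    while a or b:
        if (a & 1) != (b & 1):
            result |= 1 << position
        a, b, position = a >> 1, b >> 1, position + 1
    return result


print(hex(xor_by_definition(0x6F22, 0x5B57)))  # 0x3475
print(hex(0x6F22 ^ 0x5B57))                    # 0x3475
```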
|
||||
|
||||
3. Overview

DUDE encodes a sequence of nonnegative integral values as a sequence
of LDH characters, although implementations will of course need to
represent the output characters somehow, typically as ASCII octets.
When DUDE is used to encode Unicode characters, the input values are
Unicode code points (integral values in the range 0..10FFFF, but not
D800..DFFF, which are reserved for use by UTF-16).

Each value in the input sequence is represented by one or more LDH
characters in the encoded string. The value 0x2D is represented
by hyphen-minus (U+002D). Each non-hyphen-minus character in
the encoded string represents a quintet. A sequence of quintets
represents the bitwise XOR between a non-0x2D integer and the
previous non-0x2D integer (hyphen-minus does not take part in the
chain; for the first value, the previous value is taken to be 0x60,
as specified in section 5).

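As an illustration only (not part of this specification), the integer-level step described above can be sketched in C. The function name dude_diffs, the use of unsigned long for input values, and the parallel is_hyphen array are assumptions of this sketch; the initial previous value 0x60 is taken from section 5. A hyphen-minus is passed through and does not update prev.

```c
#include <stddef.h>

/* Hypothetical helper: for each input value, either mark it as */
/* a hyphen-minus (is_hyphen[i] = 1, diff[i] unused) or store   */
/* the XOR of the value with the previous non-0x2D value.       */
/* prev starts at 0x60, as in the section 5 procedure.          */
void dude_diffs(const unsigned long in[], size_t n,
                unsigned long diff[], int is_hyphen[])
{
  unsigned long prev = 0x60;
  size_t i;

  for (i = 0; i < n; ++i) {
    if (in[i] == 0x2D) {
      is_hyphen[i] = 1;     /* literal; prev is left unchanged */
      diff[i] = 0;
    } else {
      is_hyphen[i] = 0;
      diff[i] = prev ^ in[i];
      prev = in[i];
    }
  }
}
```

For example (L) of section 7 (u+A9A24, three hyphen-minus, u+C05B7), the two diffs are 0xA9A44 and 0x69F93; the quintet step of section 5 turns them into "434we" and "y393d".
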
4. Base-32 characters

"a" =  0 = 0x00 = 00000    "s" = 16 = 0x10 = 10000
"b" =  1 = 0x01 = 00001    "t" = 17 = 0x11 = 10001
"c" =  2 = 0x02 = 00010    "u" = 18 = 0x12 = 10010
"d" =  3 = 0x03 = 00011    "v" = 19 = 0x13 = 10011
"e" =  4 = 0x04 = 00100    "w" = 20 = 0x14 = 10100
"f" =  5 = 0x05 = 00101    "x" = 21 = 0x15 = 10101
"g" =  6 = 0x06 = 00110    "y" = 22 = 0x16 = 10110
"h" =  7 = 0x07 = 00111    "z" = 23 = 0x17 = 10111
"i" =  8 = 0x08 = 01000    "2" = 24 = 0x18 = 11000
"j" =  9 = 0x09 = 01001    "3" = 25 = 0x19 = 11001
"k" = 10 = 0x0A = 01010    "4" = 26 = 0x1A = 11010
"m" = 11 = 0x0B = 01011    "5" = 27 = 0x1B = 11011
"n" = 12 = 0x0C = 01100    "6" = 28 = 0x1C = 11100
"p" = 13 = 0x0D = 01101    "7" = 29 = 0x1D = 11101
"q" = 14 = 0x0E = 01110    "8" = 30 = 0x1E = 11110
"r" = 15 = 0x0F = 01111    "9" = 31 = 0x1F = 11111

The digits "0" and "1" and the letters "o" and "l" are not used, to
avoid transcription errors.

A decoder must accept both the uppercase and lowercase forms of
the base-32 characters (including mixtures of both forms). An
encoder should output only lowercase forms or only uppercase forms
(unless it uses the feature described in appendix C "Mixed-case
annotation").

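The table can be expressed as a lookup string. The sketch below is hypothetical (the names dude_b32, dude_b32_char and dude_b32_value are not defined by this document) and, unlike the code in appendix E, it assumes that the C execution character set is ASCII so that character literals can be used. The decoding direction folds uppercase to lowercase, as required above.

```c
#include <string.h>

/* Lowercase base-32 alphabet from the table above, indexed by */
/* quintet value. Assumes an ASCII execution character set.    */
static const char dude_b32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Quintet value 0..31 to lowercase base-32 character. */
char dude_b32_char(unsigned int v)
{
  return dude_b32[v & 0x1F];
}

/* Base-32 character (either case) to value 0..31, or -1 if c */
/* is not a base-32 character (this includes 0, 1, o and l).  */
int dude_b32_value(char c)
{
  const char *p;

  if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
  if (c == '\0') return -1;   /* strchr would match the terminator */
  p = strchr(dude_b32, c);
  return p ? (int)(p - dude_b32) : -1;
}
```
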
5. Encoding procedure

All ordering of bits, quartets, and quintets is big-endian (most
significant first).

let prev = 0x60
for each input integer n (in order) do begin
  if n == 0x2D then output hyphen-minus
  else begin
    let diff = prev XOR n
    represent diff in base 16 as a sequence of quartets,
      as few as are sufficient (but at least one)
    prepend 0 to the last quartet and 1 to each of the others
    output a base-32 character corresponding to each quintet
    let prev = n
  end
end

If an encoder encounters an input value larger than expected (for
example, the largest Unicode code point is U+10FFFF, and nameprep
[NAMEPREP03] can never output a code point larger than U+EFFFD),
the encoder may either encode the value correctly, or may fail, but
it must not produce incorrect output. The encoder must fail if it
encounters a negative input value.

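A minimal C sketch of the procedure above follows. It is not the appendix E implementation: the name dude_encode_sketch is hypothetical, it assumes an ASCII execution character set, it omits the mixed-case annotation and all bounds checking, and it assumes every input value is below 2^32, so that each value produces at most 8 characters and the caller can size out as 8*n+1.

```c
#include <stddef.h>

/* Lowercase base-32 alphabet of section 4 (ASCII assumed). */
static const char enc_b32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Sketch of the section 5 procedure. out must hold 8*n+1 chars */
/* (values below 2^32 assumed). Returns out, null-terminated.   */
char *dude_encode_sketch(const unsigned long in[], size_t n, char *out)
{
  unsigned long prev = 0x60, diff, tmp;
  size_t i, o = 0;
  unsigned int k, j, quartet;

  for (i = 0; i < n; ++i) {
    if (in[i] == 0x2D) {            /* hyphen-minus stands for itself */
      out[o++] = '-';
      continue;
    }
    diff = prev ^ in[i];
    /* k = number of quartets needed (at least one). */
    for (k = 1, tmp = diff >> 4; tmp != 0; ++k, tmp >>= 4)
      ;
    /* Emit quartets most significant first; all but the last */
    /* get a leading 1 bit (0x10), the last gets a leading 0.  */
    for (j = k; j > 0; --j) {
      quartet = (unsigned int)((diff >> (4 * (j - 1))) & 0xF);
      out[o++] = enc_b32[(j > 1 ? 0x10 : 0) | quartet];
    }
    prev = in[i];
  }
  out[o] = '\0';
  return out;
}
```

Run over the inputs of examples (A), (B), (J), (K), (L), (M), and (R) in section 7, it reproduces the DUDE strings listed there.
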
6. Decoding procedure

let prev = 0x60
while the input string is not exhausted do begin
  if the next character is hyphen-minus
    then consume it and output 0x2D
  else begin
    consume characters and convert them to quintets until
      encountering a quintet whose first bit is 0
    fail upon encountering a non-base-32 character or end-of-input
    strip the first bit of each quintet
    concatenate the resulting quartets to form diff
    let prev = prev XOR diff
    output prev
  end
end
encode the output sequence and compare it to the input string
fail if they do not match (case-insensitively)

The comparison at the end is necessary to guarantee the uniqueness
property (there cannot be two distinct encoded strings representing
the same sequence of integers). This check also frees the decoder
from having to check for overflow while decoding the base-32
characters. (If the decoder is one step of a larger decoding
process, it may be possible to defer the re-encoding and comparison
to the end of that larger decoding process.)

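The decoding procedure, including the final re-encoding check, might be sketched as follows. All names are hypothetical and ASCII is assumed. Inputs longer than 256 characters are simply rejected so the check buffer can live on the stack, and the caller must supply room in out for at least strlen(s) values (each value consumes at least one input character). The check rejects non-minimal forms such as "sb" (a leading zero quartet), which would otherwise decode to the same value as "b".

```c
#include <stddef.h>
#include <string.h>

static const char dec_b32[] = "abcdefghijkmnpqrstuvwxyz23456789";

/* Base-32 character (either case) to 0..31, or -1. */
static int dec_value(char c)
{
  const char *p;
  if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
  if (c == '\0') return -1;
  p = strchr(dec_b32, c);
  return p ? (int)(p - dec_b32) : -1;
}

/* Compact section 5 encoder, used only for the final check. */
static void dec_reencode(const unsigned long v[], size_t n, char *s)
{
  unsigned long prev = 0x60, d, t;
  size_t i;
  int k;

  for (i = 0; i < n; ++i) {
    if (v[i] == 0x2D) { *s++ = '-'; continue; }
    d = prev ^ v[i];
    for (k = 1, t = d >> 4; t != 0; ++k, t >>= 4)
      ;
    for (; k > 0; --k)
      *s++ = dec_b32[(k > 1 ? 0x10 : 0) | ((d >> (4 * (k - 1))) & 0xF)];
    prev = v[i];
  }
  *s = '\0';
}

/* Returns the number of values written to out, or -1 on failure. */
long dude_decode_sketch(const char *s, unsigned long out[])
{
  unsigned long prev = 0x60, diff;
  size_t n = 0, i = 0, len = strlen(s);
  char check[257];
  int v;

  if (len > 256) return -1;
  while (s[i] != '\0') {
    if (s[i] == '-') {            /* hyphen-minus decodes to 0x2D */
      out[n++] = 0x2D;
      ++i;
      continue;
    }
    diff = 0;
    do {                          /* gather quintets up to 0xxxx */
      v = dec_value(s[i]);        /* -1 on bad char or end of input */
      if (v < 0) return -1;
      diff = (diff << 4) | (unsigned long)(v & 0xF);
      ++i;
    } while (v & 0x10);
    prev ^= diff;
    out[n++] = prev;
  }
  /* Uniqueness check: re-encode and compare case-insensitively. */
  dec_reencode(out, n, check);
  for (i = 0; i <= len; ++i) {
    char c = s[i];
    if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
    if (c != check[i]) return -1;
  }
  return (long)n;
}
```
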
7. Example strings

The first several examples are nonsense strings of mostly unassigned
code points intended to exercise the corner cases of the algorithm.

(A) u+0061
    DUDE: b

(B) u+2C7EF u+2C7EF
    DUDE: u6z2ra

(C) u+1752B u+1752A
    DUDE: tzxwmb

(D) u+63AB1 u+63ABA
    DUDE: yv47bm

(E) u+261AF u+261BF
    DUDE: uyt6rta

(F) u+C3A31 u+C3A8C
    DUDE: 6v4xb5p

(G) u+09F44 u+0954C
    DUDE: 39ue4si

(H) u+8D1A3 u+8C8A3
    DUDE: 27t6dt3sa

(I) u+6C2B6 u+CC266
    DUDE: y6u7g4ss7a

(J) u+002D u+002D u+002D u+E848F
    DUDE: ---82w8r

(K) u+BD08E u+002D u+002D u+002D
    DUDE: 57s8q---

(L) u+A9A24 u+002D u+002D u+002D u+C05B7
    DUDE: 434we---y393d

(M) u+7FFFFFFF
    DUDE: z999993r or explicit failure

The next several examples are realistic Unicode strings that could
be used in domain names. They exhibit single-row text, two-row
text, ideographic text, and mixtures thereof. These examples are
names of Japanese television programs, music artists, and songs,
merely because one of the authors happened to have them handy.

(N) 3<nen>b<gumi><kinpachi><sensei> (Latin, kanji)
    u+0033 u+5E74 u+0062 u+7D44 u+91D1 u+516B u+5148 u+751F
    DUDE: xdx8whx8tgz7ug863f6s5kuduwxh

(O) <amuro><namie>-with-super-monkeys (Latin, kanji, hyphens)
    u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
    u+0068 u+002D u+0073 u+0075 u+0070 u+0065 u+0072 u+002D u+006D
    u+006F u+006E u+006B u+0065 u+0079 u+0073
    DUDE: x58jupu8nuy6gt99m-yssctqtptn-tmgftfth-trcbfqtnk

(P) maji<de>koi<suru>5<byou><mae> (Latin, hiragana, kanji)
    u+006D u+0061 u+006A u+0069 u+3067 u+006B u+006F u+0069 u+3059
    u+308B u+0035 u+79D2 u+524D
    DUDE: pnmdvssqvssnegvsva7cvs5qz38hu53r

(Q) <pafii>de<runba> (Latin, katakana)
    u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
    DUDE: vs5bezgxrvs3ibvs2qtiud

(R) <sono><supiido><de> (hiragana, katakana)
    u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
    DUDE: vsvpvd7hypuivf4q

8. Security considerations

Users expect each domain name in DNS to be controlled by a single
authority. If a Unicode string intended for use as a domain label
could map to multiple ACE labels, then an internationalized domain
name could map to multiple ACE domain names, each controlled by
a different authority, some of which could be spoofs that hijack
service requests intended for another. Therefore DUDE is designed
so that each Unicode string has a unique encoding.

However, there can still be multiple Unicode representations of the
"same" text, for various definitions of "same". This problem is
addressed to some extent by the Unicode standard under the topic of
canonicalization, and this work is leveraged for domain names by
"nameprep" [NAMEPREP03].

9. References

[IDN]         Internationalized Domain Names (IETF working group),
              http://www.i-d-n.net/, idn@ops.ietf.org.

[IDNA]        Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
              Names In Applications (IDNA)", draft-ietf-idn-idna-01.

[NAMEPREP03]  Paul Hoffman, Marc Blanchet, "Preparation
              of Internationalized Host Names", 2001-Feb-24,
              draft-ietf-idn-nameprep-03.

[RFC952]      K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
              Table Specification", 1985-Oct, RFC 952.

[RFC1034]     P. Mockapetris, "Domain Names - Concepts and Facilities",
              1987-Nov, RFC 1034.

[RFC1123]     Internet Engineering Task Force, R. Braden (editor),
              "Requirements for Internet Hosts -- Application and Support",
              1989-Oct, RFC 1123.

[RFC2119]     Scott Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", 1997-Mar, RFC 2119.

[SFS]         David Mazieres et al., "Self-certifying File System",
              http://www.fs.net/.

[UNICODE]     The Unicode Consortium, "The Unicode Standard",
              http://www.unicode.org/unicode/standard/standard.html.

A. Acknowledgements

The basic encoding of integers to quartets to quintets to base-32
comes from earlier IETF work by Martin Duerst. DUDE uses a slight
variation on the idea.

Paul Hoffman provided helpful comments on this document.

The idea of avoiding 0, 1, o, and l in base-32 strings was taken
from SFS [SFS].

B. Author contact information

Mark Welter <mwelter@walid.com>
Brian W. Spolarich <briansp@walid.com>
WALID, Inc.
State Technology Park
2245 S. State St.
Ann Arbor, MI 48104
+1 734 822 2020

Adam M. Costello <amc@cs.berkeley.edu>
University of California, Berkeley
http://www.cs.berkeley.edu/~amc/

C. Mixed-case annotation

In order to use DUDE to represent case-insensitive Unicode strings,
higher layers need to case-fold the Unicode strings prior to DUDE
encoding. The encoded string can, however, use mixed-case base-32
(rather than all-lowercase or all-uppercase as recommended in
section 4 "Base-32 characters") as an annotation telling how to
convert the folded Unicode string into a mixed-case Unicode string
for display purposes.

Each Unicode code point (unless it is U+002D hyphen-minus) is
represented by a sequence of base-32 characters, the last of which
is always a letter (as opposed to a digit). If that letter is
uppercase, it is a suggestion that the Unicode character be mapped
to uppercase (if possible); if the letter is lowercase, it is a
suggestion that the Unicode character be mapped to lowercase (if
possible).

DUDE encoders and decoders are not required to support these
annotations, and higher layers need not use them.

Example: In order to suggest that example O in section 7 "Example
strings" be displayed as:

    <amuro><namie>-with-SUPER-MONKEYS

one could capitalize the DUDE encoding as:

    x58jupu8nuy6gt99m-yssctqtptn-tMGFtFtH-tRCBFQtNK

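The annotation can be read without fully decoding the string, because a character ends a code point's sequence exactly when its base-32 value is below 16 (a quintet whose first bit is 0, which is always one of the letters "a" through "r"). The hypothetical helper below assumes ASCII and is not part of appendix E; it returns one flag per code point, using 0 for hyphen-minus.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: flags[i] = 1 if the i-th code point's last */
/* base-32 character is uppercase, else 0. Returns the number of   */
/* code points, or -1 on a non-base-32 character or a truncated    */
/* sequence. flags must have room for strlen(s) entries.           */
long dude_case_flags(const char *s, unsigned char flags[])
{
  static const char b32[] = "abcdefghijkmnpqrstuvwxyz23456789";
  long n = 0;
  int pending = 0;            /* inside an unfinished sequence */
  const char *p;
  char lower;

  for (; *s != '\0'; ++s) {
    if (*s == '-') {
      if (pending) return -1;
      flags[n++] = 0;
      continue;
    }
    lower = (*s >= 'A' && *s <= 'Z') ? (char)(*s - 'A' + 'a') : *s;
    p = strchr(b32, lower);
    if (p == NULL) return -1;
    if (p - b32 >= 16) {
      pending = 1;            /* 1xxxx: more quintets follow */
    } else {
      pending = 0;            /* 0xxxx: last quintet of this value */
      flags[n++] = (unsigned char)(*s >= 'A' && *s <= 'Z');
    }
  }
  return pending ? -1 : n;
}
```

For the annotated form of example O shown above, it reports 24 code points, with flags set for the five letters of "SUPER" and the seven of "MONKEYS".
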
D. Differences from draft-ietf-idn-dude-01

Four changes have been made since draft-ietf-idn-dude-01 (DUDE-01):

1) DUDE-01 computed the XOR of each integer with the previous one
   in order to decide how many bits of each integer to encode, but
   now the XOR itself is encoded, so there is no need for a mask.

2) DUDE-01 made the first quintet of each sequence different from
   the rest, while now it is the last quintet that differs, so it's
   easier for the decoder to detect the end of the sequence.

3) The base-32 map has changed to avoid 0, 1, o, and l, to help
   humans avoid transcription errors.

4) The initial value of the previous code point has changed from 0
   to 0x60, making the encodings of a few domain names shorter and
   none longer.

E. Example implementation

/******************************************/
/* dude.c 0.2.3 (2001-May-31-Thu)         */
/* Adam M. Costello <amc@cs.berkeley.edu> */
/******************************************/

/* This is ANSI C code (C89) implementing */
/* DUDE (draft-ietf-idn-dude-02).         */

/************************************************************/
/* Public interface (would normally go in its own .h file): */

#include <limits.h>

enum dude_status {
  dude_success,
  dude_bad_input,
  dude_big_output  /* Output would exceed the space provided. */
};

enum case_sensitivity { case_sensitive, case_insensitive };

#if UINT_MAX >= 0x1FFFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif

enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] );

/* dude_encode() converts Unicode to DUDE (without any            */
/* signature). The input must be represented as an array          */
/* of Unicode code points (not code units; surrogate pairs        */
/* are not allowed), and the output will be represented as        */
/* null-terminated ASCII. The input_length is the number of code  */
/* points in the input. The output_size is an in/out argument:    */
/* the caller must pass in the maximum number of characters       */
/* that may be output (including the terminating null), and on    */
/* successful return it will contain the number of characters     */
/* actually output (including the terminating null, so it will be */
/* one more than strlen() would return, which is why it is called */
/* output_size rather than output_length). The uppercase_flags    */
/* array must hold input_length boolean values, where nonzero     */
/* means the corresponding Unicode character should be forced     */
/* to uppercase after being decoded, and zero means it is         */
/* caseless or should be forced to lowercase. Alternatively,      */
/* uppercase_flags may be a null pointer, which is equivalent     */
/* to all zeros. The encoder always outputs lowercase base-32     */
/* characters except when nonzero values of uppercase_flags       */
/* require otherwise. The return value may be any of the          */
/* dude_status values defined above; if not dude_success, then    */
/* output_size and output may contain garbage. On success, the    */
/* encoder will never need to write an output_size greater than   */
/* input_length*k+1 if all the input code points are less than 1  */
/* << (4*k), because of how the encoding is defined.              */

enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] );

/* dude_decode() converts DUDE (without any signature) to          */
/* Unicode. The input must be represented as null-terminated       */
/* ASCII, and the output will be represented as an array of        */
/* Unicode code points. The case_sensitivity argument influences   */
/* the check on the well-formedness of the input string; it        */
/* must be case_sensitive if case-sensitive comparisons are        */
/* allowed on encoded strings, case_insensitive otherwise.         */
/* The scratch_space must point to space at least as large         */
/* as the input, which will get overwritten (this allows the       */
/* decoder to avoid calling malloc()). The output_length is        */
/* an in/out argument: the caller must pass in the maximum         */
/* number of code points that may be output, and on successful     */
/* return it will contain the actual number of code points         */
/* output. The uppercase_flags array must have room for at         */
/* least output_length values, or it may be a null pointer if      */
/* the case information is not needed. A nonzero flag indicates    */
/* that the corresponding Unicode character should be forced to    */
/* uppercase by the caller, while zero means it is caseless or     */
/* should be forced to lowercase. The return value may be any      */
/* of the dude_status values defined above; if not dude_success,   */
/* then output_length, output, and uppercase_flags may contain     */
/* garbage. On success, the decoder will never need to write       */
/* an output_length greater than the length of the input (not      */
/* counting the null terminator), because of how the encoding is   */
/* defined.                                                        */

/**********************************************************/
/* Implementation (would normally go in its own .c file): */

#include <string.h>

/* Character utilities: */

/* base32[q] is the lowercase base-32 character representing */
/* the number q from the range 0 to 31. Note that we cannot  */
/* use string literals for ASCII characters because an ANSI C */
/* compiler does not necessarily use ASCII.                  */

static const char base32[] = {
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,     /* a-k */
  109, 110,                                               /* m-n */
  112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,  /* p-z */
  50, 51, 52, 53, 54, 55, 56, 57                          /* 2-9 */
};

/* base32_decode(c) returns the value of a base-32 character, in the */
/* range 0 to 31, or the constant base32_invalid if c is not a valid */
/* base-32 character.                                                */

enum { base32_invalid = 32 };

static unsigned int base32_decode(char c)
{
  if (c < 50) return base32_invalid;
  if (c <= 57) return c - 26;
  if (c < 97) c += 32;
  if (c < 97 || c == 108 || c == 111 || c > 122) return base32_invalid;
  return c - 97 - (c > 108) - (c > 111);
}

/* unequal(case_sensitivity,s1,s2) returns 0 if the strings s1 and s2 */
/* are equal, 1 otherwise. If case_sensitivity is case_insensitive,   */
/* then ASCII A-Z are considered equal to a-z respectively.           */

static int unequal( enum case_sensitivity case_sensitivity,
                    const char s1[], const char s2[] )
{
  char c1, c2;

  if (case_sensitivity != case_insensitive) return strcmp(s1,s2) != 0;

  for (;;) {
    c1 = *s1;
    c2 = *s2;
    if (c1 >= 65 && c1 <= 90) c1 += 32;
    if (c2 >= 65 && c2 <= 90) c2 += 32;
    if (c1 != c2) return 1;
    if (c1 == 0) return 0;
    ++s1, ++s2;
  }
}

/* Encoder: */

enum dude_status dude_encode(
  unsigned int input_length,
  const u_code_point input[],
  const unsigned char uppercase_flags[],
  unsigned int *output_size,
  char output[] )
{
  unsigned int max_out, in, out, k, j;
  u_code_point prev, codept, diff, tmp;
  char shift;

  prev = 0x60;
  max_out = *output_size;

  for (in = out = 0; in < input_length; ++in) {

    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */

    codept = input[in];

    if (codept == 0x2D) {
      /* Hyphen-minus stands for itself. */
      if (max_out - out < 1) return dude_big_output;
      output[out++] = 0x2D;
      continue;
    }

    diff = prev ^ codept;

    /* Compute the number of base-32 characters (k): */
    for (tmp = diff >> 4, k = 1; tmp != 0; ++k, tmp >>= 4);

    if (max_out - out < k) return dude_big_output;
    shift = uppercase_flags && uppercase_flags[in] ? 32 : 0;
    /* shift controls the case of the last base-32 digit. */

    /* Each quintet has the form 1xxxx except the last is 0xxxx. */
    /* Computing the base-32 digits in reverse order is easiest. */

    out += k;
    output[out - 1] = base32[diff & 0xF] - shift;

    for (j = 2; j <= k; ++j) {
      diff >>= 4;
      output[out - j] = base32[0x10 | (diff & 0xF)];
    }

    prev = codept;
  }

  /* Append the null terminator: */
  if (max_out - out < 1) return dude_big_output;
  output[out++] = 0;

  *output_size = out;
  return dude_success;
}

/* Decoder: */

enum dude_status dude_decode(
  enum case_sensitivity case_sensitivity,
  char scratch_space[],
  const char input[],
  unsigned int *output_length,
  u_code_point output[],
  unsigned char uppercase_flags[] )
{
  u_code_point prev, q, diff;
  char c;
  unsigned int max_out, in, out, scratch_size;
  enum dude_status status;

  prev = 0x60;
  max_out = *output_length;

  for (c = input[in = 0], out = 0; c != 0; c = input[++in], ++out) {

    /* At the start of each iteration, in and out are the number of */
    /* items already input/output, or equivalently, the indices of  */
    /* the next items to be input/output.                           */

    if (max_out - out < 1) return dude_big_output;

    if (c == 0x2D) output[out] = c;  /* hyphen-minus is literal */
    else {
      /* Base-32 sequence. Decode quintets until 0xxxx is found: */

      for (diff = 0; ; c = input[++in]) {
        q = base32_decode(c);
        if (q == base32_invalid) return dude_bad_input;
        diff = (diff << 4) | (q & 0xF);
        if (q >> 4 == 0) break;
      }

      prev = output[out] = prev ^ diff;
    }

    /* Case of last character determines uppercase flag: */
    if (uppercase_flags) uppercase_flags[out] = c >= 65 && c <= 90;
  }

  /* Enforce the uniqueness of the encoding by re-encoding */
  /* the output and comparing the result to the input:     */

  scratch_size = ++in;
  status = dude_encode(out, output, uppercase_flags,
                       &scratch_size, scratch_space);
  if (status != dude_success || scratch_size != in ||
      unequal(case_sensitivity, scratch_space, input)
     ) return dude_bad_input;

  *output_length = out;
  return dude_success;
}

/******************************************************************/
/* Wrapper for testing (would normally go in a separate .c file): */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a  */
/* command-line option.                                             */

enum {
  unicode_max_length = 256,
  ace_max_size = 256,
  test_case_sensitivity = case_insensitive
                          /* suitable for host names */
};

static void usage(char **argv)
{
  fprintf(stderr,
    "%s -e reads code points and writes a DUDE string.\n"
    "%s -d reads a DUDE string and writes code points.\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "A DUDE string is a newline-terminated sequence of LDH characters\n"
    "(without any signature).\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}

static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}

static const char too_big[] =
  "input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char io_error[] = "I/O error\n";

/* The following string is used to convert LDH         */
/* characters between ASCII and the native charset:    */

static const char ldh_ascii[] =
  "................"
  "................"
  ".............-.."
  "0123456789......"
  ".ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ....."
  ".abcdefghijklmno"
  "pqrstuvwxyz";

int main(int argc, char **argv)
{
  enum dude_status status;
  int r;
  char *p;

  /* The option must be exactly "-e" or "-d". Testing argv[1][1] */
  /* first avoids reading past the end of a lone "-" argument.   */
  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][1] == 0 || argv[1][2] != 0) usage(argv);

  if (argv[1][1] == 'e') {
    u_code_point input[unicode_max_length];
    unsigned long codept;
    unsigned char uppercase_flags[unicode_max_length];
    char output[ace_max_size], uplus[3];
    unsigned int input_length, output_size, i;

    /* Read the input code points: */

    input_length = 0;

    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
      if (r == EOF || r == 0) break;

      if (r != 2 || uplus[1] != '+' || codept > (u_code_point)-1) {
        fail(invalid_input);
      }

      if (input_length == unicode_max_length) fail(too_big);

      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
      else fail(invalid_input);

      input[input_length++] = codept;
    }

    /* Encode: */

    output_size = ace_max_size;
    status = dude_encode(input_length, input, uppercase_flags,
                         &output_size, output);
    if (status == dude_bad_input) fail(invalid_input);
    if (status == dude_big_output) fail(too_big);
    assert(status == dude_success);

    /* Convert to native charset and output: */

    for (p = output; *p != 0; ++p) {
      i = *p;
      assert(i <= 122 && ldh_ascii[i] != '.');
      *p = ldh_ascii[i];
    }

    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }

  if (argv[1][1] == 'd') {
    char input[ace_max_size], scratch[ace_max_size], *pp;
    u_code_point output[unicode_max_length];
    unsigned char uppercase_flags[unicode_max_length];
    unsigned int input_length, output_length, i;

    /* Read the DUDE input string and convert to ASCII: */

    fgets(input, ace_max_size, stdin);
    if (ferror(stdin)) fail(io_error);
    if (feof(stdin)) fail(invalid_input);
    input_length = strlen(input);
    if (input[input_length - 1] != '\n') fail(too_big);
    input[--input_length] = 0;

    for (p = input; *p != 0; ++p) {
      pp = strchr(ldh_ascii, *p);
      /* '.' is only a filler in ldh_ascii, not an LDH character, */
      /* so it must be rejected rather than mapped to ASCII 0.    */
      if (pp == 0 || *p == '.') fail(invalid_input);
      *p = pp - ldh_ascii;
    }

    /* Decode: */

    output_length = unicode_max_length;
    status = dude_decode(test_case_sensitivity, scratch, input,
                         &output_length, output, uppercase_flags);
    if (status == dude_bad_input) fail(invalid_input);
    if (status == dude_big_output) fail(too_big);
    assert(status == dude_success);

    /* Output the result: */

    for (i = 0; i < output_length; ++i) {
      r = printf("%s+%04lX\n",
                 uppercase_flags[i] ? "U" : "u",
                 (unsigned long) output[i] );
      if (r < 0) fail(io_error);
    }

    return EXIT_SUCCESS;
  }

  usage(argv);
  return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
}

INTERNET-DRAFT expires 2001-Dec-07
|
@ -1,612 +0,0 @@
|
||||
Internet Draft                                 Patrik Faltstrom
draft-ietf-idn-idna-07.txt                     Cisco
February 24, 2002                              Paul Hoffman
Expires in six months                          IMC & VPNC
                                               Adam M. Costello
                                               UC Berkeley

        Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

Until now, there has been no standard method for domain names to use
characters outside the ASCII repertoire. This document defines
internationalized domain names (IDNs) and a mechanism called IDNA for
handling them in a standard fashion. IDNs use characters drawn from a
large repertoire (Unicode), but IDNA allows the non-ASCII characters to
be represented using the same octets used in so-called host names
today. IDNA is only meant for processing domain names, not free
text.

1. Introduction

IDNA works by allowing applications to use certain ASCII name labels
(beginning with a special prefix) to represent non-ASCII name labels.
Lower-layer protocols need not be aware of this; therefore IDNA does not
require changes to any infrastructure. In particular, IDNA does not
require any changes to DNS servers, resolvers, or protocol elements,
because the ASCII name service provided by the existing DNS is entirely
sufficient.

This document does not require any applications to conform to IDNA,
but applications can elect to use IDNA in order to support IDN while
maintaining interoperability with existing infrastructure. Adding IDNA
support to an existing application entails changes to the application
only, and leaves room for flexibility in the user interface.

A great deal of the discussion of IDN solutions has focused on
transition issues and how IDN will work in a world where not all of the
components have been updated. Other proposals would require that user
applications, resolvers, and DNS servers be updated in order for a user
to use an internationalized domain name. Rather than require widespread
updating of all components, IDNA requires only user applications to be
updated; no changes are needed to the DNS protocol or any DNS servers or
the resolvers on users' computers.

1.1 Interaction of protocol parts

IDNA requires that implementations process input strings with Nameprep
[NAMEPREP], which is a profile of Stringprep [STRINGPREP], and then with
Punycode [PUNYCODE]. Implementations of IDNA MUST fully implement
Nameprep and Punycode; neither Nameprep nor Punycode is optional.

2. Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

A code point is an integral value associated with a character in a coded
character set.

Unicode [UNICODE] is a coded character set containing tens of thousands
of characters. A single Unicode code point is denoted by "U+" followed
by four to six hexadecimal digits, while a range of Unicode code points
is denoted by two hexadecimal numbers separated by "..", with no
prefixes.

ASCII means US-ASCII, a coded character set containing 128 characters
associated with code points in the range 0..7F. Unicode is an extension
of ASCII: it includes all the ASCII characters and associates them with
the same code points.

The term "LDH code points" is defined in this document to mean the code
points associated with ASCII letters, digits, and the hyphen-minus; that
is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an abbreviation for
"letters, digits, hyphen".

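The "U+" notation, the range notation and the LDH definition are mechanical enough to state as code. The Python sketch below is illustrative only. The function names are hypothetical and are not defined by this document.

```python
def format_code_point(cp: int) -> str:
    """Render a code point as "U+" followed by four to six hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return f"U+{cp:04X}"


def format_range(first: int, last: int) -> str:
    """Render a range as two hex numbers separated by "..", with no prefixes."""
    return f"{first:X}..{last:X}"


def is_ldh(cp: int) -> bool:
    """LDH code points: U+002D, 30..39, 41..5A, and 61..7A."""
    return (
        cp == 0x2D
        or 0x30 <= cp <= 0x39
        or 0x41 <= cp <= 0x5A
        or 0x61 <= cp <= 0x7A
    )


print(format_code_point(ord("a")))              # U+0061
print(format_range(0, 0x7F))                    # 0..7F
print([c for c in "a-Z_9." if is_ldh(ord(c))])  # ['a', '-', 'Z', '9']
```
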
[STD13] talks about "domain names" and "host names", but many people use
the terms interchangeably. Further, because [STD13] was not terribly
clear, many people who are sure they know the exact definitions of each
of these terms disagree on the definitions.

A label is an individual part of a domain name. Labels are usually shown
separated by dots; for example, the domain name "www.example.com" is
composed of three labels: "www", "example", and "com". (The zero-length
root label that is implied in domain names, as described in [STD13], is
not considered a label in this specification.) Throughout this document
the term "label" is shorthand for "text label", and "every label" means
"every text label". In IDNA, not all text strings can be labels.

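As a sketch of the label definition, splitting on U+002E and dropping the implied root label gives the three labels of the example. The sketch is in Python, and `split_labels` is a hypothetical helper.

```python
def split_labels(domain_name: str) -> list[str]:
    """Split a domain name into its text labels.

    A trailing dot marks the zero-length root label, which is not counted
    as a label here, so it is dropped. Only "." (U+002E) is treated as a
    separator in this sketch.
    """
    if domain_name.endswith("."):
        domain_name = domain_name[:-1]
    return domain_name.split(".") if domain_name else []


print(split_labels("www.example.com"))   # ['www', 'example', 'com']
print(split_labels("www.example.com."))  # ['www', 'example', 'com']
```
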
An "internationalized domain name" (IDN) is a domain name for which the
ToASCII operation (see section 4) can be applied to each label without
failing. This document does not attempt to define an "internationalized
host name". It is expected that protocols and name-handling bodies will
want to limit the characters allowed in IDNs further than what is
specified in this document, such as to prohibit additional characters
that they feel are unneeded or harmful in registered domain names.

An "internationalized label" is a label composed of characters from the
Unicode character set; note, however, that not every string of Unicode
characters can be an internationalized label. To allow internationalized
labels to be handled by existing applications, IDNA uses an "ACE label"
(ACE stands for ASCII Compatible Encoding), which can be represented
using only ASCII characters but is equivalent to a label containing
non-ASCII characters. More rigorously, an ACE label is defined to be any
label that the ToUnicode operation would alter (see section 4.2). For
every internationalized label that cannot be directly represented in
ASCII, there is an equivalent ACE label. The conversion of labels to and
from the ACE form is specified in section 4.

The "ACE prefix" is defined in this document to be a string of ASCII
characters that appears at the beginning of every ACE label. It is
specified in section 5.

A "domain name slot" is defined in this document to be a protocol element
or a function argument or a return value (and so on) explicitly
designated for carrying a domain name. Examples of domain name slots
include: the QNAME field of a DNS query; the name argument of the
gethostbyname() library function; the part of an email address following
the at-sign (@) in the From: field of an email message header; and the host
portion of the URI in the src attribute of an HTML <IMG> tag.
General text that just happens to contain a domain name is not a domain name
slot; for example, a domain name appearing in the plain text body of an
email message is not occupying a domain name slot.

An "internationalized domain name slot" is defined in this document to
be a domain name slot explicitly designated for carrying an
internationalized domain name as defined in this document. The
designation may be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).

A "generic domain name slot" is defined in this document to be any
domain name slot that is not an internationalized domain name slot.
Obviously, this includes any domain name slot whose specification
predates IDNA.

3. Requirements

IDNA conformance means adherence to the following three requirements:

1) Whenever a domain name is put into a generic domain name slot (see
section 2), every label MUST contain only ASCII characters. Given an
internationalized domain name (IDN), an equivalent domain name
satisfying this requirement can be obtained by applying the ToASCII
operation (see section 4) to each label.

2) ACE labels obtained from domain name slots SHOULD be hidden from
users except when the use of the non-ASCII form would cause problems or
when the ACE form is explicitly requested. Given an internationalized
domain name, an equivalent domain name containing no ACE labels can be
obtained by applying the ToUnicode operation (see section 4) to each
label. When requirements 1 and 2 both apply, requirement 1 takes
precedence.

3) Whenever two labels are compared, they MUST be considered to
match if and only if their ASCII forms (obtained by applying ToASCII)
match using a case-insensitive ASCII comparison.

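Requirement 3 can be sketched with Python's standard library. Note the assumption: `encodings.idna.ToASCII` implements the operation as later published, with the "xn--" prefix rather than this draft's placeholder. Both labels pass through the same ToASCII, so the comparison logic is unaffected. `labels_match` is a hypothetical name.

```python
import encodings.idna  # stdlib: ToASCII() returns the ASCII form as bytes


def labels_match(a: str, b: str) -> bool:
    """Requirement 3: equal ToASCII forms under ASCII case-insensitivity.

    Raises UnicodeError if ToASCII fails for either label.
    """
    ascii_a = encodings.idna.ToASCII(a)
    ascii_b = encodings.idna.ToASCII(b)
    # bytes.lower() folds only A-Z, i.e. a case-insensitive ASCII comparison.
    return ascii_a.lower() == ascii_b.lower()


print(labels_match("Bücher", "BÜCHER"))  # True
print(labels_match("Bücher", "bucher"))  # False
```
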
4. Conversion operations

This section specifies the ToASCII and ToUnicode operations. Each one
operates on a sequence of Unicode code points (but remember that all
ASCII code points are also Unicode code points). When domain names are
represented using character sets other than Unicode and ASCII, they will
need to first be transcoded to Unicode before these operations can be
applied, and might need to be transcoded back afterwards.

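For names held in a legacy character set, the transcoding around the two operations might look like this in Python. Three assumptions apply:

- The stdlib `encodings.idna.ToASCII` and `ToUnicode` stand in for the operations specified below. They use the "xn--" prefix, not this draft's placeholder.
- Python's ToUnicode raises on invalid input, unlike the operation specified here.
- ISO-8859-1 is only an example charset.

The function names are hypothetical.

```python
import encodings.idna  # stdlib ToASCII()/ToUnicode(); they use the "xn--" prefix


def to_ascii_from_charset(raw: bytes, charset: str) -> bytes:
    """Transcode a label from a local charset to Unicode, then apply ToASCII."""
    return encodings.idna.ToASCII(raw.decode(charset))


def to_charset_from_ascii(ace: str, charset: str) -> bytes:
    """Apply ToUnicode, then transcode back to the local charset.

    Raises UnicodeEncodeError if the decoded label has characters the
    charset cannot represent.
    """
    return encodings.idna.ToUnicode(ace).encode(charset)


latin1 = "bücher".encode("iso-8859-1")                 # b'b\xfccher'
print(to_ascii_from_charset(latin1, "iso-8859-1"))     # b'xn--bcher-kva'
```
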
4.1 ToASCII

The ToASCII operation takes a sequence of Unicode code points and
transforms it into a sequence of code points in the ASCII range (0..7F).
The original sequence and the resulting sequence are equivalent labels.
(If the original is an internationalized label that cannot be directly
represented in ASCII, the result will be the equivalent ACE label.)

ToASCII fails if any step of it fails. If any step fails, the original
sequence MUST NOT be used as a label in an IDN.

The inputs to ToASCII are a sequence of code points; a flag indicating
whether to prohibit unassigned code points (see [STRINGPREP]); and a
flag indicating whether to apply the host name syntax rules. The output
of ToASCII is either a sequence of ASCII code points or a failure
condition.

ToASCII never alters a sequence of code points that are all in the ASCII
range to begin with (although it could fail).

ToASCII consists of the following steps:

1. If all code points in the sequence are in the ASCII range (0..7F)
then skip to step 3.

2. Perform the steps specified in [NAMEPREP] and fail if there is
an error.

3. If the label is part of a host name (or is subject to the host
name syntax rules) then perform these checks:

    (a) Verify the absence of non-LDH ASCII code points; that is,
    the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

    (b) Verify the absence of leading and trailing hyphen-minus;
    that is, the absence of U+002D at the beginning and end of
    the sequence.

4. If all code points in the sequence are in the ASCII range (0..7F),
then skip to step 8.

5. Verify that the sequence does NOT begin with the ACE prefix.

6. Encode the sequence using the encoding algorithm in [PUNYCODE].

7. Prepend the ACE prefix.

8. Verify that the number of code points is in the range 1 to 63
inclusive.

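The eight ToASCII steps map fairly directly onto code. The following Python sketch makes these assumptions:

- Step 2 uses the stdlib `encodings.idna.nameprep`.
- Step 6 uses the stdlib `punycode` codec.
- The prefix is this draft's placeholder "IESG--".
- The stdlib Nameprep has no switch for unassigned code points, so the sketch always prohibits them.

All names in it are hypothetical.

```python
import encodings.idna  # stdlib: nameprep() implements the Nameprep profile

ACE_PREFIX = "IESG--"  # this draft's placeholder (see section 5)


def _is_ascii(s: str) -> bool:
    return all(ord(ch) <= 0x7F for ch in s)


def _has_ace_prefix(label: str) -> bool:
    # Section 5: the prefix is recognized case-insensitively.
    head = label[: len(ACE_PREFIX)]
    return _is_ascii(head) and head.lower() == ACE_PREFIX.lower()


def to_ascii(label: str, use_std3_ascii_rules: bool = True) -> str:
    """Sketch of ToASCII; raises UnicodeError when any step fails.

    Unassigned code points are always prohibited here, because the stdlib
    nameprep() has no option to allow them.
    """
    # Steps 1-2: Nameprep is skipped for all-ASCII input.
    if not _is_ascii(label):
        label = encodings.idna.nameprep(label)

    # Step 3: host name syntax rules; only non-LDH *ASCII* code points fail.
    if use_std3_ascii_rules:
        for ch in label:
            if _is_ascii(ch) and not (ch == "-" or ch.isalnum()):
                raise UnicodeError(f"non-LDH code point U+{ord(ch):04X}")
        if label.startswith("-") or label.endswith("-"):
            raise UnicodeError("leading or trailing hyphen-minus")

    # Step 4: all-ASCII labels go straight to step 8.
    if not _is_ascii(label):
        # Step 5: must not already begin with the ACE prefix.
        if _has_ace_prefix(label):
            raise UnicodeError("label already begins with the ACE prefix")
        # Steps 6-7: Punycode-encode, then prepend the ACE prefix.
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")

    # Step 8: between 1 and 63 code points inclusive.
    if not 1 <= len(label) <= 63:
        raise UnicodeError("label must be 1 to 63 code points long")
    return label


print(to_ascii("Bücher"))  # IESG--bcher-kva
```

Consider the input string U+30D1 U+30D5 U+30A3 U+30FC "de" U+30EB U+30F3 U+30D0, a sample string from the Punycode specification. For it, the sketch returns "IESG--de-jg4avhby1noc0d", the example ACE label given in section 5.
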
4.2 ToUnicode

The ToUnicode operation takes a sequence of Unicode code points and
returns a sequence of Unicode code points. If the input sequence is a
label in ACE form, then the result is an equivalent internationalized
label that is not in ACE form, otherwise the original sequence is
returned unaltered.

ToUnicode never fails. If any step fails, then the original input
sequence is returned immediately in that step.

The inputs to ToUnicode are a sequence of code points; a flag indicating
whether to prohibit unassigned code points (see [STRINGPREP]); and a
flag indicating whether to apply the host name syntax rules. The output
of ToUnicode is always a sequence of Unicode code points.

1. If all code points in the sequence are in the ASCII range (0..7F)
then skip to step 3.

2. Perform the steps specified in [NAMEPREP] and fail if there is an
error. (If step 3 of ToASCII is also performed here, it will not
affect the overall behavior of ToUnicode, but it is not
necessary.)

3. Verify that the sequence begins with the ACE prefix, and save a
copy of the sequence.

4. Remove the ACE prefix.

5. Decode the sequence using the decoding algorithm in [PUNYCODE]. Save
a copy of the result of this step.

6. Apply ToASCII.

7. Verify that the sequence matches the saved copy from step 3, using
a case-insensitive ASCII comparison.

8. Return the saved copy from step 5.

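A matching ToUnicode sketch follows, under the same assumptions: the stdlib Nameprep and Punycode, and the placeholder prefix "IESG--". It carries a compact, hypothetical copy of a ToASCII sketch so that it runs on its own. Every failure path returns the original input, so the function never fails.

```python
import encodings.idna  # stdlib: nameprep() implements the Nameprep profile

ACE_PREFIX = "IESG--"  # this draft's placeholder (see section 5)


def _is_ascii(s: str) -> bool:
    return all(ord(ch) <= 0x7F for ch in s)


def _has_ace_prefix(label: str) -> bool:
    head = label[: len(ACE_PREFIX)]
    return _is_ascii(head) and head.lower() == ACE_PREFIX.lower()


def _to_ascii(label: str, use_std3_ascii_rules: bool) -> str:
    """Compact ToASCII (section 4.1); raises UnicodeError on failure."""
    if not _is_ascii(label):
        label = encodings.idna.nameprep(label)
    if use_std3_ascii_rules:
        if any(_is_ascii(ch) and not (ch == "-" or ch.isalnum()) for ch in label):
            raise UnicodeError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise UnicodeError("leading or trailing hyphen-minus")
    if not _is_ascii(label):
        if _has_ace_prefix(label):
            raise UnicodeError("label already begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise UnicodeError("label must be 1 to 63 code points long")
    return label


def to_unicode(label: str, use_std3_ascii_rules: bool = False) -> str:
    """Sketch of ToUnicode: never fails; any failing step returns the input."""
    original = label
    try:
        # Steps 1-2: Nameprep only if a non-ASCII code point is present.
        if not _is_ascii(label):
            label = encodings.idna.nameprep(label)
        # Step 3: require the ACE prefix and keep a copy of the sequence.
        if not _has_ace_prefix(label):
            return original
        saved = label
        # Steps 4-5: drop the prefix and Punycode-decode (may raise).
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Step 6: apply ToASCII to the decoded label.
        reencoded = _to_ascii(decoded, use_std3_ascii_rules)
        # Step 7: case-insensitive ASCII comparison with the step-3 copy.
        if reencoded.lower() != saved.lower():
            return original
        # Step 8: return the decoded label.
        return decoded
    except UnicodeError:
        return original


print(to_unicode("iesg--bcher-kva"))  # bücher
```

In this sketch, `to_unicode("IESG--example-")` comes back unchanged. Its Punycode part decodes to the plain ASCII label "example", whose ToASCII form does not match the input.
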
5. ACE prefix

[[ Note to the IESG and Internet Draft readers: The two uses of the
string "IESG--" below are to be changed at time of publication to a
prefix which fulfills the requirements in the first paragraph. ]]

The ACE prefix, used in the conversion operations (section 4), is two
alphanumeric ASCII characters followed by two hyphen-minuses. It cannot
be any of the prefixes already used in earlier documents, which include
the following: "bl--", "bq--", "dq--", "lq--", "mq--", "ra--", "wq--"
and "zq--". The ToASCII and ToUnicode operations MUST recognize the ACE
prefix in a case-insensitive manner.

The ACE prefix for IDNA is "IESG--".

This means that an ACE label might be "IESG--de-jg4avhby1noc0d", where
"de-jg4avhby1noc0d" is the part of the ACE label that is generated by
the encoding steps in [PUNYCODE].

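The two rules in this section, the shape of an acceptable prefix and case-insensitive recognition, might be checked as follows in Python. The function names are hypothetical. Comparing candidates against the earlier prefixes case-insensitively is an assumption.

```python
import re

# Prefixes already used by earlier documents, as listed in this section.
USED_PREFIXES = {"bl--", "bq--", "dq--", "lq--", "mq--", "ra--", "wq--", "zq--"}


def is_acceptable_prefix(candidate: str) -> bool:
    """Two alphanumeric ASCII characters plus two hyphen-minuses, not used before.

    Comparing against the used prefixes case-insensitively is an assumption,
    made because the prefix itself is recognized case-insensitively.
    """
    if re.fullmatch(r"[A-Za-z0-9]{2}--", candidate) is None:
        return False
    return candidate.lower() not in USED_PREFIXES


def has_ace_prefix(label: str, prefix: str) -> bool:
    """Recognize the ACE prefix using an ASCII case-insensitive comparison."""
    head = label[: len(prefix)]
    return head.isascii() and head.lower() == prefix.lower()


print(is_acceptable_prefix("zq--"))                         # False
print(has_ace_prefix("iesg--de-jg4avhby1noc0d", "IESG--"))  # True
```
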
6. Implications for typical applications using DNS

In IDNA, applications perform the processing needed to input
internationalized domain names from users, display internationalized
domain names to users, and process the inputs and outputs from DNS and
other protocols that carry domain names.

The components and interfaces between them can be represented
pictorially as:

        +------+
        | User |
        +------+
           ^
           | Input and display: local interface methods
           | (pen, keyboard, glowing phosphorus, ...)
+----------|-----------------------------------------+
|          v                                         |
|  +-----------------------------+                   |
|  |         Application         |                   |
|  |  (conversion between local  |                   |
|  |  character set and Unicode  |                   |
|  |        is done here)        |                   |
|  +-----------------------------+                   |
|       ^                  ^                         | End system
|       |                  |                         |
| Call to resolver:        | Application-specific    |
|      ACE                 | protocol:               |
|       v                  | predefined by the       |
|  +----------+            | protocol or defaults    |
|  | Resolver |            | to ACE                  |
|  +----------+            |                         |
|       ^                  |                         |
+-------|------------------|-------------------------+
        | DNS protocol:    |
        | ACE              |
        v                  v
 +-------------+  +---------------------+
 | DNS servers |  | Application servers |
 +-------------+  +---------------------+

6.1 Entry and display in applications

Applications can accept domain names using any character set or sets
desired by the application developer, and can display domain names in any
charset. That is, the IDNA protocol does not affect the interface
between users and applications.

An IDNA-aware application can accept and display internationalized
domain names in two formats: in the internationalized character set(s)
supported by the application, and as ACE labels. ACE labels that are
displayed or input MUST always include the ACE prefix. Applications MAY
allow input and display of ACE labels, but are not encouraged to do so
except as an interface for special purposes, possibly for debugging. ACE
encoding is opaque and ugly, and should thus only be exposed to users
who absolutely need it. The optional use, especially during a transition
period, of ACE encodings in the user interface is described in section
6.4. Because ACE-encoded labels can be rendered either as the encoded
ASCII characters or as the properly decoded characters, the application
MAY have an option for the user to select the preferred method of
display; if it does, rendering the ACE SHOULD NOT be the default.

Domain names are often stored and transported in many places. For example,
they are part of documents such as mail messages and web pages. They are
transported in many parts of many protocols, such as both the
control commands and the RFC 2822 body parts of SMTP, and the headers
and the body content in HTTP. It is important to remember that domain
names appear both in domain name slots and in the content that is passed
over protocols.

In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in any
charset allowed by the protocol or document format. If a protocol or
document format only allows one charset, the labels MUST be given in
that charset.

In any place where a protocol or document format allows transmission of
the characters in internationalized labels, internationalized labels
SHOULD be transmitted using whatever character encoding and escape
mechanism the protocol or document format uses at that place.

All protocols that use domain name slots already have the capacity for
handling domain names in the ASCII charset. Thus, ACE labels
(internationalized labels that have been processed with the ToASCII
operation) can inherently be handled by those protocols.

6.2 Applications and resolver libraries

Applications normally use functions in the operating system when they
resolve DNS queries. Those functions in the operating system are often
called "the resolver library", and the applications communicate with the
resolver libraries through a programming interface (API).

Because these resolver libraries today expect only domain names in
ASCII, applications MUST prepare labels that are passed to the resolver
library using the ToASCII operation. Labels received from the resolver
library contain only ASCII characters; internationalized labels that
cannot be represented directly in ASCII use the ACE form. ACE labels
always include the ACE prefix.

IDNA-aware applications MUST be able to work with both
non-internationalized labels (those that conform to [STD13]
and [STD3]) and internationalized labels.

It is expected that new versions of the resolver libraries in the future
will be able to accept domain names in other formats than ASCII, and
application developers might one day pass not only domain names in
Unicode, but also in local script to a new API for the resolver
libraries in the operating system.

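In Python, an application might prepare a host name for the resolver library roughly as follows. This is a sketch that relies on the built-in "idna" codec. That codec applies ToASCII label by label and uses the "xn--" prefix that IDNA adopted when it was published as RFC 3490, not this draft's placeholder. `to_resolver_form` and `resolve` are hypothetical names.

```python
import socket


def to_resolver_form(hostname: str) -> str:
    """Apply ToASCII to each label so the resolver only sees ASCII.

    Python's built-in "idna" codec does the per-label ToASCII; it uses the
    "xn--" prefix of the published IDNA standard, not this draft's placeholder.
    """
    return hostname.encode("idna").decode("ascii")


def resolve(hostname: str, port: int = 80) -> list:
    """Look up a possibly internationalized host name (needs network access)."""
    return socket.getaddrinfo(to_resolver_form(hostname), port)


print(to_resolver_form("Bücher.example"))  # xn--bcher-kva.example
```
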
6.3 DNS servers

An operating system might have a set of libraries for performing the
ToASCII operation. The input to such a library might be in one or more
charsets that are used in applications (UTF-8 and UTF-16 are likely
candidates for almost any operating system, and script-specific charsets
are likely for localized operating systems).

For internationalized labels that cannot be represented directly in
ASCII, DNS servers MUST use the ACE form produced by the ToASCII
operation. All IDNs served by DNS servers MUST contain only ASCII
characters.

If a signalling system which makes negotiation possible between old and
new DNS clients and servers is standardized in the future, the encoding
of the query in the DNS protocol itself can be changed from ACE to
something else, such as UTF-8. The question whether or not this should
be used is, however, a separate problem and is not discussed in this
memo.

6.4 Avoiding exposing users to the raw ACE encoding

All applications that might show the user a domain name obtained from a
domain name slot, such as from gethostbyaddr or part of a mail header,
SHOULD be updated as soon as possible in order to prevent users from
seeing the ACE.

If an application decodes an ACE name using ToUnicode but cannot show
all of the characters in the decoded name, such as if the name contains
characters that the output system cannot display, the application SHOULD
show the name in ACE format (which always includes the ACE prefix)
instead of displaying the name with the replacement character (U+FFFD).
This is to make it easier for the user to transfer the name correctly to
other programs. Programs that by default show the ACE form when they
cannot show all the characters in a name label SHOULD also have a
mechanism to show the name that is produced by the ToUnicode operation
with as many characters as possible and replacement characters in the
positions where characters cannot be displayed.

The ToUnicode operation does not alter labels that are not valid ACE
labels, even if they begin with the ACE prefix. After ToUnicode has been
applied, if a label still begins with the ACE prefix, then it is not a
valid ACE label, and is not equivalent to any of the intermediate
Unicode strings constructed by ToUnicode.

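The display rule above might be sketched like this in Python. The sketch uses the stdlib `encodings.idna.ToUnicode`, which uses the "xn--" prefix and raises on invalid input instead of returning it, so the sketch catches that. The output charset stands in for whatever the display system can render. All function names are hypothetical.

```python
import encodings.idna  # stdlib ToUnicode(); uses the "xn--" prefix


def _decode(label: str) -> str:
    # Python's ToUnicode raises on bad input, whereas ToUnicode in this
    # document returns its input unaltered; emulate the latter.
    try:
        return encodings.idna.ToUnicode(label)
    except UnicodeError:
        return label


def display_label(label: str, output_charset: str) -> str:
    """Decoded form if it can be shown in full, otherwise the ACE form."""
    decoded = _decode(label)
    try:
        decoded.encode(output_charset)
    except UnicodeEncodeError:
        return label  # cannot show every character: fall back to ACE
    return decoded


def display_with_replacements(label: str, output_charset: str) -> str:
    """Secondary view: as many characters as possible, U+FFFD elsewhere."""
    shown = []
    for ch in _decode(label):
        try:
            ch.encode(output_charset)
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append("\uFFFD")
    return "".join(shown)


print(display_label("xn--bcher-kva", "iso-8859-1"))  # bücher
print(display_label("xn--bcher-kva", "ascii"))       # xn--bcher-kva
```
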
6.5 Bidirectional text in domain names

The display of domain names that contain bidirectional text is not covered
in this document. It may be covered in a future version of this
document, or may be covered in a different document.

For developers interested in displaying domain names that have
bidirectional text, the Unicode standard has an extensive discussion of
how to reorder glyphs for display when dealing with
bidirectional text such as Arabic or Hebrew. See [UAX9] for more
information. In particular, all Unicode text is stored in logical order.

6.6 DNSSEC authentication of IDN domain names

DNS Security [DNSSEC] is a method for supplying cryptographic
verification information along with DNS messages. Public Key
Cryptography is used in conjunction with digital signatures to provide a
means for a requester of domain information to authenticate the source
of the data. This ensures that it can be traced back to a trusted
source, either directly, or via a chain of trust linking the source of
the information to the top of the DNS hierarchy.

IDNA specifies that all internationalized domain names served by DNS
servers that cannot be represented directly in ASCII must use the ACE
form produced by the ToASCII operation. This operation must be performed
prior to a zone being signed by the private key for that zone. Because
of this ordering, it is important to recognize that DNSSEC authenticates
the ASCII domain name, not the Unicode form or the mapping between the
Unicode form and the ASCII form. In other words, the output of ToASCII
is the canonical name. In the presence of DNSSEC, this is the name that
MUST be signed in the zone and MUST be validated against. It also SHOULD
be used for other name comparisons, such as when a browser wants to
indicate that a URL has been previously visited.

One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the resolution flow
than DNSSEC authenticating nameservers for DNSSEC to work.

6.7 Limitations of IDNA

The IDNA protocol does not solve all linguistic issues with users
inputting names in different scripts. Many important language-based and
script-based mappings are not covered in IDNA and must be handled
outside the protocol. For example, names that are entered in a mix of
traditional and simplified Chinese characters will not be mapped to a
single canonical name. Another example is Scandinavian names that are
entered with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).

7. Name Server Considerations

Internationalized domain name data in zone files (as specified by section
5 of RFC 1035) MUST be processed with ToASCII before it is entered in
the zone files.

It is imperative that there be only one ASCII encoding for a particular
domain name. ACE is an encoding for domain name labels that use non-ASCII
characters. Thus, a primary master name server MUST NOT contain an
ACE-encoded label that decodes to an ASCII label. The ToASCII operation
assures that no such names are ever output from the operation.

Name servers MUST NOT serve records with domain names that contain
non-ASCII characters; such names MUST be converted to ACE form by the
ToASCII operation in order to be served. If names that are not processed
by ToASCII are passed to an application, it will result in unpredictable
behavior. Note that [STRINGPREP] describes how to handle versioning of
unallocated codepoints.

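A zone-preparation step might look like this in Python. It assumes that the built-in "idna" codec performs ToASCII, using the "xn--" prefix from RFC 3490 rather than this draft's placeholder. `ace_decodes_to_ascii` is a hypothetical lint check for zone data that did not come through ToASCII.

```python
ACE_PREFIX = "xn--"  # prefix used by Python's "idna" codec, not this draft's placeholder


def owner_name_for_zone_file(name: str) -> str:
    """ToASCII every label (via the built-in "idna" codec) before zone entry."""
    return name.encode("idna").decode("ascii")


def ace_decodes_to_ascii(label: str) -> bool:
    """Flag an ACE-looking label whose Punycode part decodes to pure ASCII.

    ToASCII never emits such a label, so finding one in zone data means
    the name did not come through ToASCII.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return False
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False  # not valid Punycode at all; a separate problem
    return decoded.isascii()


print(owner_name_for_zone_file("Bücher.example"))  # xn--bcher-kva.example
print(ace_decodes_to_ascii("xn--example-"))        # True
```
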
8. Root Server Considerations

IDNs are likely to be somewhat longer than current host names, so the
bandwidth needed by the root servers should go up by a small amount.
Also, queries and responses for IDNs will probably be somewhat longer
than typical queries today, so more queries and responses may be forced
to go to TCP instead of UDP.

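The sketch below puts "somewhat longer" in numbers. It counts the octets a name occupies uncompressed in a DNS message, per RFC 1035: one length octet per label, plus the label itself, plus the terminating root octet. It is a Python illustration only, and `wire_length` is a hypothetical helper. For context, RFC 1035 limits classic UDP messages to 512 octets; larger responses are truncated and retried over TCP.

```python
def wire_length(ascii_name: str) -> int:
    """Octets an uncompressed domain name occupies in a DNS message.

    Each label costs one length octet plus its own octets; the name ends
    with a single zero octet for the root label (RFC 1035, section 3.1).
    """
    labels = [label for label in ascii_name.rstrip(".").split(".") if label]
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


ace_name = "bücher.example".encode("idna").decode("ascii")  # 'xn--bcher-kva.example'
print(wire_length("bucher.example"))  # 16
print(wire_length(ace_name))          # 23
```

Here the ACE form of "bücher.example" takes 23 octets, against 16 for the ASCII look-alike "bucher.example".
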
9. Security Considerations

Security on the Internet partly relies on the DNS. Thus, any
change to the characteristics of the DNS can change the security of much
of the Internet.

This memo describes an algorithm which encodes characters that are not
valid according to STD3 and STD13 into octet values that are valid. No
security issues such as string length increases or new allowed values
are introduced by the encoding process or the use of these encoded
values, apart from those introduced by the ACE encoding itself.

Domain names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized domain name.

Because this document normatively refers to [NAMEPREP], it includes the
security considerations from that document as well.

A. References

[PUNYCODE] Adam Costello, "Punycode", draft-ietf-idn-punycode.

[DNSSEC] Don Eastlake, "Domain Name System Security Extensions", RFC
2535, March 1999.

[NAMEPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Domain Names", draft-ietf-idn-nameprep.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.

[STD3] Bob Braden, "Requirements for Internet Hosts -- Communication
Layers" (RFC 1122) and "Requirements for Internet Hosts -- Application
and Support" (RFC 1123), STD 3, October 1989.

[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
1034) and "Domain names - implementation and specification" (RFC 1035),
STD 13, November 1987.

[STRINGPREP] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Strings ("stringprep")", draft-hoffman-stringprep,
work in progress.

[UAX9] Unicode Standard Annex #9, The Bidirectional Algorithm,
<http://www.unicode.org/unicode/reports/tr9/>.

[UNICODE] The Unicode Standard, Version 3.1.0: The Unicode Consortium.
The Unicode Standard, Version 3.0. Reading, MA, Addison-Wesley
Developers Press, 2000. ISBN 0-201-61633-5, as amended by: Unicode
Standard Annex #27: Unicode 3.1,
<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.

B. Authors' Addresses

Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden
paf@cisco.com

Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org

Adam M. Costello
University of California, Berkeley
idna-spec.amc @ nicemice.net
@ -1,426 +0,0 @@
Internet Draft                                             Marc Blanchet
draft-ietf-idn-idne-02.txt                                      Viagenie
March 19, 2001                                              Paul Hoffman
Expires in six months                                         IMC & VPNC

Internationalized domain names using EDNS (IDNE)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

The current DNS infrastructure does not provide a way to use
internationalized domain names (IDN). This document describes an
extension mechanism based on EDNS which enables the use of IDN without
causing harm to the current DNS. IDNE enables IDN host names with as
many characters as current ASCII-only host names. It fully supports
UTF-8 and conforms to the IDN requirements.

1. Introduction

Various proposals for IDN have tried to integrate IDN into the current
limited ASCII DNS. However, the compatibility issues impose too many
constraints on the architecture. Many of these proposals require
modifications to the applications or to the DNS protocol or to the
servers. This proposal takes a different approach: it uses the
standardized extension mechanism for DNS (EDNS) and uses UTF-8 as the
mandatory charset. It causes no harm to the current DNS because it uses
the EDNS extension mechanism. The major drawback of this proposal is
that all protocols, applications and DNS servers will have to be
upgraded to support this proposal.

1.1 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Hexadecimal values are shown preceded by "0x". For example,
"0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are
shown preceded by "0b". For example, a nine-bit value might be
shown as "0b101101111".

Examples in this document use the notation from the Unicode Standard
[UNICODE3] as well as the ISO 10646 [ISO10646] names. For example, the
letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
A". In the lists of prohibited characters, the "U+" is left off to make
the lists easier to read.

1.2 IDN summary

Using the terminology in [IDNCOMP], this protocol specifies an IDN
architecture of arch-2 (send binary or ACE). The binary format is
bin-1.1 (UTF-8), and the method for distinguishing binary from current
names is bin-2.4 (mark binary with EDNS0). The transition period is not
specified.

2. Functional Description

DNS queries and responses containing IDNE labels have the following
properties:

- The string in the label MUST be pre-processed as described in
  [NAMEPREP] before the query or response is prepared.

- The characters in the label MUST be encoded using UTF-8 [RFC2279].

- The entire label MUST be encoded as an EDNS extended label [RFC2671].

- The version of the IDN protocol MUST be identified.

3. Encoding

An IDNE label uses the EDNS extended label type prefix (0b01), as
described in [RFC2671]. (A normal label type always begins with 0b00.) A
new extended label type for IDNE is used to identify an IDNE label. This
document uses 0b000010 as the extended label type; however, the label
type will be assigned by IANA and it may not be 0b000010.

      0                   1                   2
bits  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . .
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+-+
     |0 1|    ELT    |     Size      | IDN label ...       //       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+-+

ELT: The six-bit extended label type to be assigned by the IANA for an
IDN label. In this document, the value 0b000010 is used, although that
might be changed by IANA.

Size: Size (in octets) of the IDN label following. This MUST NOT
be zero.

IDN label: Label, encoded in UTF-8 [RFC2279]. Note that this label might
consist entirely of ASCII characters, and thus can be used for host name
labels that are legal in [STD13].

IDNE labels can be mixed with STD13 labels in a domain name.

The compression scheme in section 4.1.4 of [STD13] is supported as is.
Pointers can refer to either IDN labels or non-IDN labels.

3.1 Examples

3.1.1 Basic example

The following example shows the domain name me.com where the "e" in
"me" is replaced by <LATIN CAPITAL LETTER E WITH ACUTE>, which is
U+00C9. The decomposition and downcasing specified in [NAMEPREP]
change the second character to <LATIN SMALL LETTER E WITH ACUTE>,
U+00E9. This string is then transformed using UTF-8 [RFC2279] into the
octets 0x6D 0xC3 0xA9.

Ignoring the other fields of the message, the domain name portion of
the datagram could look like:

      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   20 | 0  1  0  0  0  0  1  0| 0  0  0  0  0  0  1  1|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   22 |       0x6D (m)        |    0xC3 (e'(1))       |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   24 |    0xA9 (e'(2))       |           3           |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   26 |       0x63 (c)        |       0x6F (o)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   28 |       0x6D (m)        |         0x00          |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Octet 20 holds the EDNS extended label type prefix (0b01) followed by
the IDN label type (0b000010).
Octet 21 gives the size of the label: 3 octets follow.
Octets 22-24 are the "m*" label encoded in UTF-8 (here "*" stands for
the accented "e").
Octets 25-28 are "com" encoded as an STD13 label.
Octet 29 is the zero-length root label that ends the name.

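The encoding in Section 3 and the example above can be sketched in
Python. This is a minimal illustration, not a reference implementation:
it assumes that [NAMEPREP] can be approximated by Unicode case folding
plus NFKC normalization (the real profile also maps and prohibits
specific characters), it uses the placeholder ELT value 0b000010 from
this document, and, as a design choice of the sketch, it emits short
all-ASCII labels as ordinary STD13 labels. The helper names are
hypothetical.

```python
import unicodedata

IDNE_ELT = 0b000010  # placeholder ELT used in this document; IANA may differ


def prep(label: str) -> str:
    # Rough stand-in for [NAMEPREP]: case folding followed by NFKC.
    return unicodedata.normalize("NFKC", label.casefold())


def encode_label(label: str) -> bytes:
    """Encode one label as an STD13 label (short ASCII) or an IDNE label."""
    data = prep(label).encode("utf-8")
    if not data:
        raise ValueError("empty label")
    if data.isascii() and len(data) <= 63:
        return bytes([len(data)]) + data       # 0b00 prefix + 6-bit length
    if len(data) > 255:
        raise ValueError("IDNE label longer than 255 octets")
    first = (0b01 << 6) | IDNE_ELT             # 0b01 prefix + 6-bit ELT
    return bytes([first, len(data)]) + data    # then Size, then UTF-8 data


def encode_name(name: str) -> bytes:
    """Encode a dotted name (no trailing dot) and terminate it with the root."""
    return b"".join(encode_label(part) for part in name.split(".")) + b"\x00"


# "m" + U+00C9 + ".com" reproduces octets 20-29 of the example above.
print(encode_name("m\u00c9.com").hex())  # 42036dc3a903636f6d00
```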
3.1.2 Example with compression

Using the previous labels, one datagram might contain "www.m*.com" and
"m*.com" (where "*" again stands for the accented "e" of the previous
example).

Ignoring the other fields of the message, the domain name portions of
the datagram could look like:

      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   20 | 0  1  0  0  0  0  1  0| 0  0  0  0  0  0  1  1|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   22 |       0x6D (m)        |    0xC3 (e'(1))       |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   24 |    0xA9 (e'(2))       |           3           |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   26 |       0x63 (c)        |       0x6F (o)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   28 |       0x6D (m)        |         0x00          |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                             . . .
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   40 |           3           |       0x77 (w)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   42 |       0x77 (w)        |       0x77 (w)        |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   44 | 1  1|                   20                    |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The domain name "m*.com" is shown at offset 20. The domain name
"www.m*.com" is shown at offset 40; it is encoded as a label for "www"
followed by a pointer to the previously defined "m*.com". The two
octets at offset 44 form that pointer, 0xC014: the prefix 0b11
followed by the 14-bit offset 20.

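A decoder has to follow compression pointers across both label kinds.
The sketch below, again assuming the placeholder ELT 0b000010 and using
a hypothetical helper name, walks a name starting at a given offset and
rebuilds the message layout of the example above (unused octets are
zero-filled).

```python
IDNE_ELT = 0b000010  # placeholder ELT used in this document


def decode_name(msg: bytes, offset: int) -> list:
    """Return the labels of the name at `offset`, following pointers."""
    labels = []
    visited = set()                      # detects pointer loops
    while True:
        if offset in visited:
            raise ValueError("compression pointer loop")
        visited.add(offset)
        first = msg[offset]
        prefix, low6 = first >> 6, first & 0x3F
        if prefix == 0b00:               # STD13 label; length 0 is the root
            if low6 == 0:
                return labels
            labels.append(msg[offset + 1:offset + 1 + low6].decode("ascii"))
            offset += 1 + low6
        elif prefix == 0b11:             # pointer: 14-bit offset
            offset = (low6 << 8) | msg[offset + 1]
        elif prefix == 0b01 and low6 == IDNE_ELT:
            size = msg[offset + 1]
            if size == 0:
                raise ValueError("IDNE Size MUST NOT be zero")
            labels.append(msg[offset + 2:offset + 2 + size].decode("utf-8"))
            offset += 2 + size
        else:
            raise ValueError("unsupported label type 0x%02x" % first)


msg = bytearray(46)
msg[20:30] = bytes.fromhex("42036dc3a903636f6d00")  # "m*.com" at offset 20
msg[40:46] = b"\x03www\xc0\x14"                     # "www" + pointer to 20
print(decode_name(bytes(msg), 40))  # ['www', 'mé', 'com']
```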
4. Label Size

In IDNE, the maximum length of a label is 255 octets, and the maximum
size of a domain name is 1023 octets. These values allow IDNE labels to
hold the same number of characters as the ASCII-based labels in
[STD13]. Because UTF-8 is a variable-length encoding, the limits were
derived from the maximum octet length expected for a character in the
foreseeable future, namely 4 octets: 63 such characters occupy 252
octets, and 255 such characters occupy 1020 octets. Note that this
extension allows some IDNE labels to be longer than 63 characters and
some IDNE names to be longer than 255 octets.

Software creating DNS queries or responses using IDNE MUST verify,
after IDN preparation and transformation to UTF-8, that no label is
longer than 255 octets and that no name is longer than 1023 octets. If
there is a user interface associated with the process creating the
query or response, that interface SHOULD give the user an error
message.

Software MUST NOT transmit DNS queries or responses that contain labels
longer than 255 octets or names longer than 1023 octets. Servers MUST
NOT accept DNS queries or responses that contain labels longer than 255
octets or names longer than 1023 octets, and MUST send the NOTIMPL
RCODE error message if such queries or responses are received.

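The length checks above might be implemented as follows. This section
does not state exactly how the 1023-octet name limit is measured; the
sketch assumes it applies to the encoded wire form (type/length octets,
label data, and the terminating root octet), by analogy with the
255-octet name limit of STD13. It also assumes the name has already
been prepared with [NAMEPREP]. The function name is hypothetical.

```python
MAX_LABEL_OCTETS = 255
MAX_NAME_OCTETS = 1023


def check_idne_lengths(name: str) -> None:
    """Raise ValueError if a prepared name violates the Section 4 limits.

    Assumption: the name limit counts wire octets, i.e. one length octet
    per STD13 label, two (type + Size) per IDNE label, plus the root octet.
    """
    total = 1                                   # terminating root octet
    for label in name.split("."):
        data = label.encode("utf-8")
        if len(data) > MAX_LABEL_OCTETS:
            raise ValueError("label of %d octets exceeds %d"
                             % (len(data), MAX_LABEL_OCTETS))
        std13 = data.isascii() and len(data) <= 63
        total += len(data) + (1 if std13 else 2)
    if total > MAX_NAME_OCTETS:
        raise ValueError("name of %d octets exceeds %d"
                         % (total, MAX_NAME_OCTETS))


# 63 four-octet characters fit in one label (252 octets); 64 would not.
check_idne_lengths("\U00010000" * 63 + ".com")
```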
5. UDP Packet Size

IDNE-capable senders and receivers MUST support UDP payloads of 1220
octets, not including the IP and UDP headers. (The minimum MTU for IPv6
is 1280 octets [RFC2460]; after the 40-octet IPv6 header and the
8-octet UDP header, 1232 octets remain.) A sender MUST announce its
capability in the OPT pseudo-RR described in Section 4.3 of [RFC2671]
by setting the CLASS field, which carries the sender's UDP payload
size, to a value greater than or equal to 1220.

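A receiver could test the capability announced in Section 5 by
inspecting the fixed part of the OPT pseudo-RR. The sketch below
assumes the RR has already been located in the additional section and
is passed in as raw octets; only NAME, TYPE (41), and CLASS are
examined, and the function name is hypothetical.

```python
import struct

OPT_TYPE = 41               # OPT pseudo-RR type [RFC2671]
IDNE_MIN_PAYLOAD = 1220     # Section 5


def announces_idne_payload(opt_rr: bytes) -> bool:
    """True if the OPT RR advertises a UDP payload size of at least 1220."""
    # Fixed part: NAME (1 octet, root) + TYPE + CLASS + TTL (4) + RDLEN (2).
    if len(opt_rr) < 11 or opt_rr[0] != 0:
        raise ValueError("OPT RR must start with the root name")
    rtype, udp_payload = struct.unpack("!HH", opt_rr[1:5])
    if rtype != OPT_TYPE:
        raise ValueError("not an OPT RR")
    return udp_payload >= IDNE_MIN_PAYLOAD
```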
6. Canonicalization, Prohibited Characters, and Case Folding

The string in the label MUST be pre-processed as described in
[NAMEPREP] before the query or response is prepared. A query or
response MUST NOT contain a label that does not conform to [NAMEPREP].

7. Versions of IDNE

The IDN protocol version number MUST be included in the RDATA of the
EDNS OPT RR (described in Section 4.4 of [RFC2671]). An OPTION-CODE
will be assigned by IANA for carrying the IDNE protocol version number;
this document uses 0x0001 as the OPTION-CODE. The value (that is, the
OPTION-DATA) is the version number, coded in 8 bits.

All requesters MUST send this information as part of the OPT RR
included in the EDNS packet.

7.1 This version of IDNE

This document describes version 1 of IDNE. This version is a
combination of the protocol in this document and the rules described
in [NAMEPREP]. Note that [NAMEPREP] describes a single version of the
list of canonicalization, case folding, and prohibited characters, and
that this document is linked to that single version of [NAMEPREP].

The identifiers for this specification are:

   OPTION-CODE   = 0x0001  (IDNE protocol version)
   OPTION-LENGTH = 0x0001  (1 octet following)
   OPTION-DATA   = 0x01    (IDNE protocol version 1)

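A requester could build an OPT RR carrying the identifiers above as
shown here. This is a sketch only: OPTION-CODE 0x0001 is the
placeholder used by this document rather than an IANA assignment, the
extended RCODE, EDNS version, and flags in the TTL field are simply set
to zero, and the default payload size of 1220 follows Section 5. The
function name is hypothetical.

```python
import struct

OPT_TYPE = 41
IDNE_VERSION_OPTION = 0x0001   # placeholder OPTION-CODE from this document


def build_opt_rr(udp_payload: int = 1220, idne_version: int = 1) -> bytes:
    """Return an OPT pseudo-RR whose RDATA carries the IDNE version option."""
    # OPTION-CODE, OPTION-LENGTH (1), OPTION-DATA (8-bit version number)
    option = struct.pack("!HHB", IDNE_VERSION_OPTION, 1, idne_version)
    # TYPE, CLASS (= UDP payload size), TTL (all zero here), RDLEN
    fixed = struct.pack("!HHIH", OPT_TYPE, udp_payload, 0, len(option))
    return b"\x00" + fixed + option   # NAME is the root (a single zero octet)


print(build_opt_rr().hex())  # 00002904c40000000000050001000101
```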
7.2 Creating new versions of IDNE

A new version of IDNE is created by a standards-track RFC that
specifies:

- a normative reference to [NAMEPREP] or to a successor document to
[NAMEPREP]

- an IDNE version number that is 1 greater than the highest IDNE
version number at the time the RFC is published

If there are any changes to the encoding or interpretation of the
protocol, they must also be specified in the same standards-track RFC.

7.3 Prohibited characters and versions of IDNE

If a server receives a request containing a character that is illegal
or unknown under the IDNE version given in the request, it MUST send a
NOTIMPL RCODE to the client. For example, if a server that understands
both version 1 and version 2 receives a request marked as version 1
that contains a label with a character prohibited in version 1 but
allowed in version 2, that server must still send a NOTIMPL RCODE to
the client.

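The rule above means a server judges each request only against the
version named in it. In the sketch below, the per-version tables of
prohibited characters are hypothetical placeholders (the real tables
come from the [NAMEPREP] version that each IDNE version references),
unassigned ("unknown") characters are not modeled, and RCODE 4 is the
Not Implemented code of [STD13].

```python
NOERROR, NOTIMPL = 0, 4   # RCODE values; 4 = Not Implemented [STD13]

# Hypothetical tables: version 2 permits a character version 1 prohibits.
PROHIBITED = {
    1: frozenset("\u00a0\u3000"),
    2: frozenset("\u00a0"),
}


def rcode_for_request(version: int, labels: list) -> int:
    """Return the RCODE for an IDNE request marked with `version`."""
    table = PROHIBITED.get(version)
    if table is None:
        return NOTIMPL                      # version not understood
    if any(ch in table for label in labels for ch in label):
        return NOTIMPL                      # prohibited in *this* version
    return NOERROR


# A server that knows versions 1 and 2 still rejects U+3000 under version 1.
print(rcode_for_request(1, ["a\u3000b"]), rcode_for_request(2, ["a\u3000b"]))
# 4 0
```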
8. API Specifications

The current API for TCP/IP uses gethostbyname and gethostbyaddr for
IPv4, and getipnodebyname and getipnodebyaddr (specified in RFC 2553)
for both IPv4 and IPv6. These functions return hostent structures whose
h_name field is a pointer to char. In this context, receiving a UTF-8
string means that the application must be aware that UTF-8 may use
more than one octet per character.

A new flag "IDN" (to appear in netdb.h) is defined to be passed in the
flags argument of getipnodebyname and getipnodebyaddr. This flag tells
the resolver to request an IDNE-encoded name. No new return code is
defined, since the return codes in RFC 2553 are meaningful in the IDNE
context.

If an application has not yet been converted to IPv6 but its developer
still wants to enable IDNs with this API, one can define macros that
map the getipnodeby* functions onto the IPv4 gethostby* functions,
including the "IDN" flag, and then process results differently based
on the presence of the flag.

9. Transition and Deployment

Deployment of this proposal means updating clients and servers, as well
as applications and protocols; therefore, a transition strategy is
proposed. Because many DNS servers do not yet handle IDNE and may take
years or decades to do so, an ASCII-compatible encoding (ACE) format
for IDN names is also needed as a transition to an all-IDNE DNS. Note
that IDNE and an ACE are not related and do not interact in the DNS.
If the IETF chooses to have an ACE mechanism in use at the same time
as IDNE, it would be wise to choose an ACE that allows as many
characters as possible in the name parts and full names.

IDNE allows names with as many characters as current names. This means
that it is possible to create names in IDNE that are longer than those
that can be created in the ACE protocols that have been described so
far. Although not prohibited, it is unwise to create a name that can be
legally represented in IDNE but not in the ACE, or a name that can be
legally represented in the ACE but not in IDNE.

The IETF should periodically evaluate the benefits and problems
associated with having three different formats for names (STD13, IDNE,
and ACE). If at some point it is decided that the problems outweigh the
benefits, the IETF can state a time when one or more of the services
should no longer be used on the Internet.

10. Root Server Considerations

Because this specification uses EDNS, root servers should be prepared
to receive EDNS requests. This specification handles IDN top-level
domains in exactly the same fashion as it does every other domain.
Considerations about IDN top-level domains are outside the scope of
this work, but the first IDN top-level domains would require all root
servers to be ready for IDNE requests.

11. IANA Considerations

[[ TBD. This section will have two parts. The first will request an
EDNS option code. The second will specify how IDNE version numbers are
allocated (namely, standards-track RFC only). ]]

12. Security Considerations

Because IDNE uses EDNS, it inherits the same security considerations as
EDNS.

Much of the security of the Internet relies on the DNS. Thus, any
change to the characteristics of the DNS can change the security of
much of the Internet.

Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host name.

Because this document normatively refers to [NAMEPREP] and [RFC2671],
it includes the security considerations from those documents as well.

13. References

[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.

[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) --
Part 1: Architecture and Basic Multilingual Plane. Five amendments and
a technical corrigendum have been published up to now. UTF-16 is
described in Annex Q, published as Amendment 1. 17 other amendments are
currently at various stages of standardization. [[[ THIS REFERENCE
NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]

[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO
10646", January 1998, RFC 2279.

[RFC2460] Steve Deering & Bob Hinden, "Internet Protocol, Version 6
(IPv6) Specification", December 1998, RFC 2460.

[RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August
1999, RFC 2671.

[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035).

[UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.

A. Acknowledgements

This document is the result of the thinking of many people. The
following people made significant comments on the early drafts:

   Andre Cormier
   Andrew Draper
   Bill Sommerfeld
   Francois Yergeau

B. Changes from -01 to -02

None.

C. Authors' Addresses

Marc Blanchet
Viagenie
2875 boul. Laurier, bureau 300
Sainte-Foy, QC G1V 2M2 Canada
Marc.Blanchet@viagenie.qc.ca

Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org

INTERNET-DRAFT                                             Hongbo Shi
draft-ietf-idn-iptr-02.txt                          Waseda University
17 May 2001                                          Jiang Ming Liang
Expires: 17 November 2001                                   i-DNS.net

          Internationalized PTR Resource Record (IPTR)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

This draft attempts to address the problem of how an IP address should
be properly mapped to a set of Internationalized Domain Names (IDNs).
It is currently unspecified how a PTR record can be used for this
purpose. In addition, the syntax of the PTR resource record may be too
restrictive for such a mapping in a more culturally meaningful context.
This document suggests a new TYPE called IPTR, using EDNS0, and a
mechanism to combine language information with such a mapping.

1. Introduction

Reverse mapping is an essential function of the DNS. In today's Domain
Name System, PTR RRs are used to support address-to-domain mappings.
However, the current PTR RR does not support proper address-to-IDN
mappings without certain modifications, and modifying the PTR
structure would also affect the current reverse

Shi, Jiang [Page 1]

INTERNET-DRAFT Internationalized PTR Resource Record 17 May 2001

mapping architecture. This document describes a new RR TYPE named IPTR
to provide address-to-IDN mappings, and it also specifies that on
receiving an IPTR query, a name server should respond with all the
corresponding IPTR RRs in one response. In short: "one IP, several
IDNs".

1.1 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in RFC
2119 [RFC2119].

1.2 Background and Designs

When Internationalized Domain Names come into wide use, an Internet
host is likely to have domain names in different languages. In today's
Internet, even though [RFC2181] clarified that a PTR RRset may contain
more than one record, the design of the PTR mapping algorithm and the
implementation of most resolvers still limit IP address to domain name
mapping to "one IP, one domain name".

For example, BIND treats PTR records specially so that the normal
sorting preference (e.g., cyclic or random) does not apply; instead, a
fixed order is always used. A client that queries a BIND server and
does not look beyond the first PTR RR therefore receives the same name
no matter how many times it queries. In other words, a PTR RRset
differs from an A RRset, where the first record in the RRset may
differ from query to query.

This is even more restrictive in a world of IDNs, where a client may
want to choose a name in a particular language. In short, given the
way PTR is used, there is little point in returning an IDN in an
unknown language.

The authors also believe that putting language information into
address-to-name mappings will be beneficial to future applications.

The purpose of the IPTR RR type is to provide a mechanism that can map
an IP address to the corresponding IDN for each language. IPTR thus
suggests a new algorithm for reverse mapping that uses language
information.

CNAME MUST continue to work for IPTR as it works now for PTR records.

The behavior of a resolver using IPTR will be specified in a separate
draft or in a later version of this draft.

1.3 Functional Description

DNS queries and responses involving the IPTR type MUST have the
following properties:

- When the QTYPE is IPTR, the corresponding IDNs SHOULD be returned in
one response.

- The characters in the label MUST be encoded using UTF-8 [RFC2279].

- The entire label MUST be encoded using EDNS [RFC2671].

- Exceptional handling of PTR for IDNs is REQUIRED.

2. IPTR definition

The structure of an IPTR RR is somewhat like that of the MX RR. In
addition to the IP address in the IN-ADDR.ARPA domain and the domain
name field (as in a PTR RR), a new field called LANGUAGE is defined. A
domain name in an IPTR RR MUST be encoded in UTF-8, and IDNs in this
document MUST be prepared as specified in [NAMEPREP]. Below is an
example of an IPTR RR:

   1.2.3.4.IN-ADDR.ARPA. IPTR "LANGUAGE" "name-in-utf8"

[RFC1766] describes the ISO 639/ISO 3166 conventions: a language name
is always written in lower case, while country codes are written in
upper case. The LANGUAGE field in an IPTR RR SHOULD be handled in a
case-insensitive manner and MUST follow the conventions defined in
[RFC1766].

For example:

   4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name-in-utf8"
   4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name-in-utf8"
   4.3.2.1.IN-ADDR.ARPA. IPTR "ja-JP" "name-in-utf8"
   4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name-in-utf8"

The notion of canonical names and aliases described in Section 3.6.2
of [RFC1034] and Section 10.2 of [RFC2181] MUST be preserved for IPTR
record types. An IPTR RR SHOULD be limited to one primary IDN per
LANGUAGE, similar to a PTR RR.

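The presentation format above can be parsed with a few lines of
Python. This sketch handles only the four-field form shown in the
examples (no TTL or class fields) and compares LANGUAGE tags
case-insensitively, as this section requires; the record type and
helper names are illustrative, not part of any zone-file library.

```python
import shlex
from typing import NamedTuple


class IPTRRecord(NamedTuple):
    owner: str
    language: str
    name: str


def parse_iptr(line: str) -> IPTRRecord:
    """Parse: OWNER IPTR "LANGUAGE" "name-in-utf8" (quotes are removed)."""
    owner, rrtype, language, name = shlex.split(line)
    if rrtype.upper() != "IPTR":
        raise ValueError("not an IPTR record: %r" % line)
    return IPTRRecord(owner, language, name)


def same_language(a: str, b: str) -> bool:
    """Case-insensitive comparison of RFC 1766 language tags."""
    return a.casefold() == b.casefold()


rr = parse_iptr('4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name-in-utf8"')
print(rr.language, same_language(rr.language, "ZH-cn"))  # zh-CN True
```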
3. IPTR on IPv6

Mapping IPv6 addresses to IDNs can be supported similarly. This
document recommends continuing to use the IP6.INT domain defined in
[RFC1886] for IPTR mappings. For example, the lookup corresponding to
the address 4321:0:1:2:3:4:567:89ab would be:

   b.a.9.8.7.6.5.0.4.0.0.0.3.0.0.0.2.0.0.0.1.0.0.0.0.0.0.0.1.2.3.4.IP6.INT.
      IPTR "LANGUAGE" "name-in-utf8"

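The owner names used for IPTR lookups can be derived mechanically. The
sketch below uses Python's standard ipaddress module and builds the
IPv6 nibble name under IP6.INT by hand, as this section recommends (the
module's own reverse_pointer attribute produces ip6.arpa names
instead). The function name is illustrative.

```python
import ipaddress


def iptr_owner_name(addr: str) -> str:
    """Reverse-mapping owner name: IN-ADDR.ARPA for IPv4, IP6.INT for IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ".".join(reversed(str(ip).split("."))) + ".IN-ADDR.ARPA."
    nibbles = ip.exploded.replace(":", "")    # 32 hex digits, lower case
    return ".".join(reversed(nibbles)) + ".IP6.INT."


print(iptr_owner_name("4321:0:1:2:3:4:567:89ab"))
# b.a.9.8.7.6.5.0.4.0.0.0.3.0.0.0.2.0.0.0.1.0.0.0.0.0.0.0.1.2.3.4.IP6.INT.
```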
4. Packet format for IPTR

EDNS0 [RFC2671] is REQUIRED to implement IPTR.

         0                   1                   2                   3
    bits 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . .
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+-+
        |0 1|    ELT    |           LANGUAGE            |     Size      | IDN label ...  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//-+-+-+-+

LANGUAGE: An argument for IPTR that identifies the language used in
the following IDN label. Its size is 2 octets.

ELT: To be defined in [IDNE].

5. Coexistence

5.1 IDN Consideration

IPTR as described above is based on "a set of IDNs"; strictly
speaking, a set of canonical IDNs. On the other hand, confusion about
IDNs, such as the belief that "an IDN MUST exist with an ASCII domain
name", has led to a belief that a PTR record should have exactly one
RR in its RRset. In short, the "IDN ONLY" phenomenon will exist. Thus,
exceptional handling of PTR is REQUIRED.

On the other hand, it is still RECOMMENDED that an IDN exist alongside
more than one ASCII domain name.

5.2 PTR Extension

In the "IDN ONLY" case, if an IPTR RR exists, the PTR RR MUST contain
a domain name in ACE so that it can coexist with IDN-unaware systems.
Otherwise, a "Syntax Error" message SHOULD be reported when an
administrator configures the DNS zone files.

5.3 IPTR and PTR

PTR serves as a backward-compatible mechanism for IDN-unaware systems
that cannot provide the IPTR function. In addition, if a client
ultimately cannot find an IDN for the desired LANGUAGE, the
corresponding PTR RR SHOULD be used as the answer.

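A client-side selection following Sections 5.2 and 5.3 might look like
this. The data shapes are assumptions made for the sketch (IPTR
answers as (LANGUAGE, name) pairs, PTR answers as a list of ACE
names), and this draft does not say which PTR record to prefer when
several exist; the sketch simply takes the first. The name
"ace-name.example" is hypothetical.

```python
from typing import List, Optional, Tuple


def choose_name(iptr_rrs: List[Tuple[str, str]],
                ptr_rrs: List[str],
                language: str) -> Optional[str]:
    """Return the IDN for `language`, else fall back to a PTR name."""
    wanted = language.casefold()            # RFC 1766 tags: case-insensitive
    for tag, name in iptr_rrs:
        if tag.casefold() == wanted:
            return name
    return ptr_rrs[0] if ptr_rrs else None  # Section 5.3 fallback


answers = [("zh-CN", "name1-in-utf8"), ("ja-JP", "name3-in-utf8")]
print(choose_name(answers, ["ace-name.example"], "JA-jp"))  # name3-in-utf8
print(choose_name(answers, ["ace-name.example"], "ko-KR"))  # ace-name.example
```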
6. IPTR query/response

When the QTYPE in a query is IPTR, all of the corresponding IPTR RRs
SHOULD be returned in one response. Without EDNS0, DNS messages sent
over UDP are limited to 512 octets [RFC1035]. For the case where all
the RRs cannot fit in one UDP packet, this draft therefore describes
two solutions: one for the current environment and one for the near
future.

6.1 Transport

Today, DNS queries and responses are carried in UDP datagrams or over
TCP connections [RFC1034]. Because the IPTR RRset is RECOMMENDED to be
returned in one response, the size of a DNS message could exceed 512
octets when multiple RRs are present. This draft therefore makes the
following two recommendations.

- "Use UDP first; if UDP is not large enough, switch to TCP" is
RECOMMENDED.

The server MUST send back the response with the TC bit set. The
resolver SHOULD then resend the query using TCP on server port 53
(decimal). This behavior is consistent with the current DNS
specification [RFC1035].

- In the future, EDNS0 is REQUIRED to send large packets.

Before a client sends a query for an IPTR record, it MUST first
determine whether the server supports EDNS0. If it does, the client
MAY send the IPTR query. Otherwise, unfortunately, the client MUST
change the QTYPE to PTR.

Hence, the size of the UDP payload is no longer limited to 512 octets.

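The two recommendations can be combined into a small decision
procedure. The transport callables here are hypothetical stand-ins for
a real resolver library, each returning a (TC bit, answers) pair; the
sketch performs no network I/O.

```python
from typing import Callable, List, Tuple

# A transport takes (qname, qtype) and returns (TC bit, answers).
Transport = Callable[[str, str], Tuple[bool, List[str]]]


def choose_qtype(server_supports_edns0: bool) -> str:
    """Future mode: send IPTR only to EDNS0-capable servers, else use PTR."""
    return "IPTR" if server_supports_edns0 else "PTR"


def query(qname: str, qtype: str, udp: Transport, tcp: Transport) -> List[str]:
    """Current mode: try UDP first; if the TC bit is set, retry over TCP."""
    truncated, answers = udp(qname, qtype)
    if truncated:
        truncated, answers = tcp(qname, qtype)  # TCP to server port 53
    return answers


def fake_udp(qname: str, qtype: str) -> Tuple[bool, List[str]]:
    return True, ["name1-in-utf8"]              # truncated UDP answer


def fake_tcp(qname: str, qtype: str) -> Tuple[bool, List[str]]:
    return False, ["name1-in-utf8", "name2-in-utf8",
                   "name3-in-utf8", "name4-in-utf8"]


qtype = choose_qtype(server_supports_edns0=True)
print(qtype, len(query("4.3.2.1.IN-ADDR.ARPA.", qtype, fake_udp, fake_tcp)))
# IPTR 4
```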
6.2 Standard sample

A resolver that wants to find the IDNs corresponding to the IP address
1.2.3.4 would issue a query with QTYPE=IPTR, QCLASS=IN,
QNAME=4.3.2.1.IN-ADDR.ARPA, and would receive:

           +------------------------------------------------------+
Header     | OPCODE=SQUERY, RESPONSE, AA                          |
           +------------------------------------------------------+
Question   | QNAME=4.3.2.1.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=IPTR     |
           +------------------------------------------------------+
Answer     | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-CN" "name1-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "zh-TW" "name2-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "ja-JP" "name3-in-utf8"   |
           | 4.3.2.1.IN-ADDR.ARPA. IPTR "ko-KR" "name4-in-utf8"   |
           +------------------------------------------------------+
Authority  | ...                                                  |
           +------------------------------------------------------+
Additional | ...                                                  |
           +------------------------------------------------------+

7. IPTR Usage

In the following samples, the names written as "foo1.example" MAY or
MAY NOT be represented by the same characters in each language.

   4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[foo1.example] in utf8"
                        IPTR "zh-CN" "[foo1.example] in utf8"
                        IPTR "ja-JP" "[foo1.example] in utf8"
                        IPTR "ko-KR" "[foo1.example] in utf8"

Moreover, records such as the following will also exist:

   4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[foo1.example] in utf8"
                        IPTR "zh-TW" "[foo2.example] in utf8"
                        ...
                        IPTR "zh-CN" "[foo1.example] in utf8"
                        IPTR "zh-CN" "[foo2.example] in utf8"
                        ...
                        IPTR "ja-JP" "[foo1.example] in utf8"
                        IPTR "ja-JP" "[foo2.example] in utf8"
                        ...
                        IPTR "ko-KR" "[foo1.example] in utf8"
                        IPTR "ko-KR" "[foo2.example] in utf8"
                        ...

"foo2.example" MUST be different from "foo1.example" if they are
tagged with the same LANGUAGE; otherwise, a "Syntax Error" SHOULD be
reported when an administrator configures the zone files. Furthermore,
"foo2.example" in the samples above MAY or MAY NOT be represented by
the same characters.

Thus,

   4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[samefoo.sample] in utf8"
                        IPTR "zh-TW" "[samefoo.sample] in utf8"

results in a "Syntax Error".

And,

   4.3.2.1.IN-ADDR.ARPA IPTR "zh-TW" "[samefoo.sample] in utf8"
                        IPTR "zh-TW" "[difffoo.sample] in utf8"
                        IPTR "zh-CN" "[samefoo.sample] in utf8"
                        IPTR "ja-JP" "[samefoo.sample] in utf8"
                        IPTR "ko-KR" "[samefoo.sample] in utf8"

is allowed.

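A zone loader could enforce the rule above with a simple per-LANGUAGE
check. This sketch assumes the names have already been prepared with
[NAMEPREP], so exact string comparison is sufficient, and it reports
problems as strings rather than through any particular name server's
error mechanism. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def validate_iptr_rrset(records: List[Tuple[str, str]]) -> List[str]:
    """Return "Syntax Error" messages for an IPTR RRset of one owner name.

    records: (LANGUAGE, name) pairs; an empty result means the set is valid.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    errors = []
    for language, name in records:
        key = language.casefold()          # LANGUAGE is case-insensitive
        if name in seen[key]:
            errors.append('Syntax Error: "%s" repeated for %s' % (name, language))
        seen[key].add(name)
    return errors


print(validate_iptr_rrset([("zh-TW", "samefoo.sample"),
                           ("zh-TW", "samefoo.sample")]))
# ['Syntax Error: "samefoo.sample" repeated for zh-TW']
```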
8. Changes

Following the discussion at the IETF 49 meeting in San Diego, we
deleted the "Open Issues" section of our previous draft (version 01).

References

[IDNREQ] Zita Wenzel & James Seng, "Requirements of Internationalized
Domain Names", draft-ietf-idn-requirements.

[IDNE] Marc Blanchet & Paul Hoffman, "Internationalized domain names
using EDNS", draft-ietf-idn-idne.

[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.

[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
November 1987, RFC 1034.

[RFC1035] P. Mockapetris, "Domain Names - Implementation and
Specification", November 1987, RFC 1035.

[RFC1766] H. Alvestrand, "Tags for the Identification of Languages",
March 1995, RFC 1766.

[RFC1886] S. Thomson, C. Huitema, "DNS Extensions to support IP
version 6", December 1995, RFC 1886.

[RFC2181] R. Elz, R. Bush, "Clarifications to the DNS Specification",
July 1997, RFC 2181.

[RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO
10646", January 1998, RFC 2279.

[RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August
1999, RFC 2671.

[ISO 639] ISO 639:1988 (E/F), "Code for the representation of names of
languages", The International Organization for Standardization, 1st
edition, 1988, 17 pages. Prepared by ISO/TC 37 - Terminology
(principles and coordination).

[ISO 3166] ISO 3166:1988 (E/F), "Codes for the representation of names
of countries", The International Organization for Standardization, 3rd
edition, 1988-08-15.

Acknowledgements

James Seng and Yoshiro Yoneya gave many comments in our e-mail
discussions. Harald Alvestrand and Mark Davis gave many suggestions in
the idn-wg mailing list discussions. Many other people also gave us
their comments in the idn-wg and BIND-user mailing list discussions.

Authors' Information

Hongbo Shi
Waseda University
3-4-1 Okubo, Shinjyuku-ku
Tokyo, 169-8555 Japan
shi@goto.info.waseda.ac.jp

Jiang Ming Liang
i-DNS.net
8 Temasek Boulevard
#24-02 Suntec Tower Three
Singapore 038988
jiang@i-DNS.net

IETF IDN Working Group Editors Zita Wenzel, James Seng
|
||||
Internet Draft draft-ietf-idn-requirements-09.txt
|
||||
21 November 2001 Expires 21 May 2002
|
||||
|
||||
Requirements of Internationalized Domain Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026 [8].

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or made obsolete by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Intended Scope

The intended scope of this document is to explore requirements for the
internationalization of domain names on the Internet. It is not
intended to document user requirements. Solutions need not reside
within the DNS itself; they could instead be a layer interjected
between the application and the DNS. Proposals SHOULD fulfill most, if
not all, of the requirements. This document MAY be updated based on
actual trials.
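
The "layer interjected between the application and the DNS" can be pictured with a minimal Python sketch. This is not a proposal from this document. It relies on Python's built-in `idna` codec, which implements the later IDNA2003 specification (RFC 3490), and the function name `to_dns_form` is hypothetical. The only point is that the DNS and existing resolvers never see anything but ASCII labels.

```python
def to_dns_form(name: str) -> str:
    """Hypothetical shim: map a user-entered name to the form an
    unmodified resolver would be asked to look up."""
    # Pure-ASCII names pass through untouched, so existing names and
    # existing resolvers are unaffected.
    if name.isascii():
        return name
    # Non-ASCII names are converted label by label by Python's IDNA2003
    # codec into ASCII ("xn--") labels.
    return name.encode("idna").decode("ascii")


print(to_dns_form("www.example.com"))  # www.example.com
print(to_dns_form("bücher.example"))   # xn--bcher-kva.example
```

Because pure-ASCII names bypass the conversion, names that work today resolve exactly as before.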

Abstract

This document describes the requirements for encoding international
characters into DNS names and records. It is intended as guidance for
developing protocols for internationalized domain names.

1. Introduction

At present, the encoding of Internet domain names is restricted to a
subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
other text-based protocols on the Internet have already been at least
partially internationalized. It is important for domain names to be
similarly internationalized or for an equivalent solution to be found.
This document assumes that the most effective solution involves putting
non-ASCII names inside some parts of the overall DNS system, although
this assumption may not reflect the consensus of the IETF community.
However, several sections of this document, including "Definitions and
Conventions", should be useful in any case. A reasonable familiarity
with DNS terminology is assumed in this document.

This document is being discussed on the "idn" mailing list. To join the
list, send a message to <majordomo@ops.ietf.org> with the words
"subscribe idn" in the body of the message. Archives of the mailing
list can also be found at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Definitions and Conventions

A language is a way that humans interact. In computerized form, a text
in a written language can be expressed as a string of characters.
The same set of characters can often be used for many written languages,
and many written languages can be expressed using different scripts.
The same characters are often shown with somewhat different glyphs
(shapes) for display of a text depending on the font used, the
automatic shaping applied, or the automatic formation of ligatures. In
addition, the same characters can be shown with somewhat different
glyphs (shapes) for display of a text depending on the language being
used, even within the same font or through automatic font change.

Character: A character is a member of a set of elements used for
organization, control, or representation of textual data.

Graphic character: A graphic character is a character, other than a
control function, that has a visual representation normally
handwritten, printed, or displayed.

Characters mentioned in this document are identified by their position
in the Unicode character set. This character set is also
known as the UCS (ISO 10646) [19]. The notation U+12AB, for example,
indicates the character at position 12AB (hexadecimal) in the Unicode
character set. Note that the use of this notation is not an
indication of a requirement to use Unicode.

Examples quoted in this document are intended only to further explain
the meanings and principles adopted by the document. The protocol is
not required to satisfy the examples.

Unicode Technical Report #17 [24] defines a character encoding
model in several levels (much of the text below is quoted from
Unicode Technical Report #17).

[N.B. Sections 1-6 below to be unpacked and reworded to be
independent of the Unicode Technical Report #17.]

1. An abstract character repertoire (ACR) is defined as the set of
abstract characters to be encoded, normally a familiar alphabet
or symbol set. The word abstract just means that these objects
are defined by convention (such as the 26 letters of the English
alphabet, uppercase and lowercase forms). Examples: the ASCII
repertoire, the Latin 9 repertoire, the JIS X 0208 repertoire,
the UCS repertoire (of a particular version).

2. A coded character set (CCS) is defined to be a mapping from a
set of abstract characters to the set of non-negative integers.
This range of integers need not be contiguous. An abstract
character is defined to be in a coded character set if the coded
character set maps from it to an integer. That integer is said
to be the code point for the abstract character. That abstract
character is then an encoded character. Examples: ASCII, Latin-15,
JIS X 0208, the UCS.

3. A character encoding form (CEF) is a mapping from the set of integers
used in a CCS to the set of sequences of code units. A code unit
is an integer occupying a specified binary width in a computer
architecture, such as a septet, an octet, or a 16-bit unit. The
encoding form enables character representation as actual data in
a computer. The sequences of code units do not necessarily have the
same length. Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8.

4. A character encoding scheme (CES) is a mapping of code units into
serialized octet sequences. Character encoding schemes are relevant
to the issue of cross-platform persistent data involving code units
wider than a byte, where byte-swapping may be required to put data
into the byte polarity canonical for a particular platform.

The CES may involve two or more CCSs, and may include code units
(e.g., single shifts, SI/SO, or escape sequences) that are not part
of the CCS per se, but which are defined by the character encoding
architecture and which may require an external registry of particular
values (as for the ISO 2022 escape sequences). In such a case, the
CES is called a compound CES. (A CES that only involves a single
CCS is called a simple CES.) Examples: ASCII, Latin-15, Shift-JIS,
UTF-16BE, UTF-16LE, UTF-8.

5. The mapping from an abstract character repertoire (ACR) to a
serialized sequence of octets is called a Character Map (CM). A simple
character map thus implicitly includes a CCS, a CEF, and a CES,
mapping from abstract characters to code units to octets. A compound
character map includes a compound CES, and thus includes more than one
CCS and CEF. In that case, the abstract character repertoire for the
character map is the union of the repertoires covered by the coded
character sets involved.

A sequence of encoded characters must be unambiguously
mapped onto a sequence of octets by the charset. The charset must be
specified in all instances, as in Internet protocols, where textual
content is treated as an ordered sequence of octets, and where the
textual content must be reconstructible from that sequence of
octets. Charset names are registered by the IANA according to
procedures documented in RFC 2278 [12]. In many cases, the same
name is used for both a character map and for a character encoding
scheme, such as UTF-16BE. Typically this is done for simple
character maps when such usage is clear from context.

6. A transfer encoding syntax (TES) is a reversible transform of encoded
data which may (or may not) include textual data represented in
one or more character encoding schemes. Examples: 8bit,
Quoted-Printable, BASE64, UTF-7 (defunct), UTF-5, and RACE.
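
The following Python sketch makes the levels above concrete by tracing a single character through each one. It shows the code point (CCS), the UTF-16 and UTF-8 code units (CEF), the big- and little-endian UTF-16 octets (CES), and BASE64 applied on top of the UTF-8 octets (TES). The helper names are illustrative. Unicode is assumed as the CCS purely for demonstration.

```python
import base64


def utf16_code_units(cp: int) -> list:
    """CEF: map a code point to one or two 16-bit UTF-16 code units."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000  # supplementary planes are split into a surrogate pair
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]


def encoding_layers(ch: str) -> dict:
    """Show one character at each level of the character encoding model."""
    cp = ord(ch)  # CCS: abstract character -> non-negative integer
    utf8 = ch.encode("utf-8")
    return {
        "code_point": f"U+{cp:04X}",
        "utf16_units": [f"{u:04X}" for u in utf16_code_units(cp)],  # CEF
        "utf8_units": utf8.hex(),                                  # CEF, 8-bit units
        "utf16be": ch.encode("utf-16-be").hex(),                   # CES, big-endian
        "utf16le": ch.encode("utf-16-le").hex(),                   # CES, little-endian
        "base64": base64.b64encode(utf8).decode("ascii"),          # TES over UTF-8
    }


for ch in ("\u00e9", "\U0001D11E"):  # e-acute (BMP) and a musical symbol (plane 1)
    print(encoding_layers(ch))
```

For U+00E9 the sketch gives one UTF-16 code unit but two UTF-8 code units (c3 a9). U+1D11E, by contrast, needs a UTF-16 surrogate pair (D834 DD1E). This is the sense in which code-unit sequences "do not necessarily have the same length."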

1.2 Description of the Domain Name System

The Domain Name System is defined by RFC 1034 [4] and RFC 1035 [5], with
clarifications, extensions and modifications given in RFC 1123 [6],
RFC 1996 [7], RFC 2181 [10], and others. Of special importance here are
the security extensions described in RFC 2535 [14] and related RFCs.

Over the years, many different words have been used to describe the
components of resource naming on the Internet (e.g., URI, URN); to make
certain that the set of terms used in this document is well-defined and
unambiguous, the definitions are given here.

Master server: A master server for a zone holds the main copy of that
zone. This copy is sometimes stored in a zone file. A slave server for
a zone holds a complete copy of the records for that zone. Slave
servers MAY be either authorized by the zone owner (secondary servers)
or unauthorized (sometimes called "stealth secondaries"). Master and
authorized slave servers are listed in the NS records for the zone,
and are termed "authoritative" servers. In many contexts outside this
document, the term "primary" is used interchangeably with "master" and
"secondary" is used interchangeably with "slave".

Caching server: A caching server holds temporary copies of DNS
records; it uses these records to answer queries about domain names.
Further explanation of these terms can be found in RFC 1034 [4] and
RFC 1996 [7].

DNS names can be represented in multiple forms, with different
properties for internationalization. The most important ones are:

- Domain name: The binary representation of a name used internally in
the DNS protocol. This consists of a series of components of 1-63
octets, with an overall length limited to 255 octets (including the
length fields).

- Master file format domain name: This is a representation of the name
as a sequence of characters in some character set; the common
convention (derived from RFC 1035 [5] section 5.1) is to represent the
octets of the name as ASCII characters where the octet is in the set
corresponding to the ASCII values for [a-z,A-Z,0-9,-], using an escape
mechanism (\X or \DDD) where not, and separating the components of the
name by the dot character (".").
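
The two forms can be contrasted with a short Python sketch. `to_wire` builds the length-prefixed binary form and enforces the 1-63 and 255 octet limits. `wire_to_master` renders binary labels in master file format, escaping any octet outside [a-z,A-Z,0-9,-]. Both function names are hypothetical. `to_wire` accepts only plain ASCII input without escapes. Using the \X form only for "." and "\", and \DDD for everything else, is a simplifying choice made for this sketch.

```python
LDH = frozenset(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed labels (RFC 1035 style)."""
    labels = name.rstrip(".").split(".") if name.strip(".") else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label of {len(raw)} octets; must be 1-63")
        out.append(len(raw))  # one-octet length field
        out += raw
    out.append(0)  # the zero-length root label ends the name
    if len(out) > 255:
        raise ValueError(f"name is {len(out)} octets; limit is 255")
    return bytes(out)


def label_to_master(label: bytes) -> str:
    """Render one binary label as master file text."""
    parts = []
    for octet in label:
        if octet in LDH:
            parts.append(chr(octet))
        elif octet in (0x2E, 0x5C):  # "." or "\" inside a label: \X form
            parts.append("\\" + chr(octet))
        else:
            parts.append(f"\\{octet:03d}")  # anything else: \DDD (decimal)
    return "".join(parts)


def wire_to_master(wire: bytes) -> str:
    """Walk the length-prefixed labels and join them with dots."""
    labels, i = [], 0
    while wire[i] != 0:
        n = wire[i]
        labels.append(label_to_master(wire[i + 1:i + 1 + n]))
        i += 1 + n
    return ".".join(labels) + "."


print(to_wire("www.example.com"))  # b'\x03www\x07example\x03com\x00'
print(wire_to_master(b"\x03a.b\x02\xc3\xa9\x00"))  # a\.b.\195\169.
```

The second call shows why the binary form is octet-based. A dot inside a label is legal on the wire, and the two UTF-8 octets of "é" survive only as escapes in the text form.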

The form specified for most protocols using the DNS is a limited form of
the master file format domain name. This limited form is defined in
RFC 1034 [4] Section 3.5 and RFC 1123 [6]. In most implementations of
applications today, domain names in the Internet have been limited to
the much more restricted forms used, e.g., in email, which defines its
own rules. Those names are limited to the upper- and lower-case
letters a-z (interpreted in a case-independent fashion), the digits,
and the hyphen-minus, all in ASCII.
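
Here is a minimal Python sketch of the restricted host-name form described above. Each label uses only ASCII letters, digits, and hyphen-minus, and is 1-63 characters long. Following RFC 1034 Section 3.5, with the RFC 1123 relaxation that permits a leading digit, a label neither starts nor ends with a hyphen. Comparison folds only the ASCII letters A-Z. The function names are hypothetical, and the overall 255-octet limit is not checked here.

```python
import re

# One label: ASCII letters, digits, hyphen-minus; 1-63 characters; first and
# last characters must be a letter or digit (RFC 1123 allows a leading digit).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\Z")

# Case-independent comparison covers only the ASCII letters A-Z.
_ASCII_FOLD = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")


def is_hostname(name: str) -> bool:
    """True if every label of `name` fits the restricted host-name form."""
    labels = name.rstrip(".").split(".")
    return all(_LABEL.match(label) for label in labels)


def same_hostname(a: str, b: str) -> bool:
    """Compare two names, ignoring ASCII case and a trailing root dot."""
    return a.rstrip(".").translate(_ASCII_FOLD) == b.rstrip(".").translate(_ASCII_FOLD)


print(is_hostname("www.example.com"))                        # True
print(is_hostname("bücher.example"))                         # False
print(same_hostname("WWW.Example.COM", "www.example.com."))  # True
```

An explicit A-Z table is used instead of `str.lower()`, because `str.lower()` would also fold non-ASCII letters such as "É".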

1.3 Definition of "hostname" and "Internationalized Domain Name"

Hostname:

In the DNS protocols, a name is treated as a sequence of octets.
However, when discussing requirements for internationalized domain
names, what we are looking for are ways to represent characters that
are meaningful for humans. In this document, this representation is
referred to as a "hostname". While this term has been used for many
different purposes over the years, it is used here in the sense of a
sequence of characters (not octets) representing a domain name
conforming to the limited hostname syntax specified in RFC 952 [3].

Internationalized Domain Name:

This document attempts to define the requirements for an
"Internationalized Domain Name" (IDN). IDN is defined as a sequence of
characters that can be used in the context of functions where a
hostname is used today, but contains one or more characters that are
outside the set of characters specified as legal characters for host
names in RFC 1123 [6].
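
Read literally, the IDN definition turns "is this an IDN?" into a question about the character repertoire alone. The Python classifier below is a hypothetical sketch under that reading. It ignores label length and hyphen placement. Under this literal test, a name containing a character such as "_" would also count as an IDN.

```python
import string

# Characters legal in host names (RFC 1123): ASCII letters, digits and
# hyphen-minus, plus "." as the label separator.
HOSTNAME_CHARS = frozenset(string.ascii_letters + string.digits + "-.")


def non_hostname_chars(name: str) -> list:
    """Characters of `name` outside the host-name set, in order of first use."""
    seen = []
    for ch in name:
        if ch not in HOSTNAME_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def is_idn(name: str) -> bool:
    """An IDN contains at least one character outside the host-name set."""
    return bool(non_hostname_chars(name))


print(is_idn("www.example.com"))             # False
print(non_hostname_chars("bücher.example"))  # ['ü']
print(is_idn("日本.example"))                # True
```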

1.4 A multilayer model of the DNS function

The DNS can be seen as a multilayer function:

- The bottom layer is where the packets are passed across the Internet
in a DNS query and a DNS response. At this level, what matters is
the format and meaning of bits and octets in a DNS packet.

- Above that is the "DNS service", created by an infrastructure of DNS
servers and the NS records that point to those servers, anchored at
the root servers (listed in the "root cache file" on each DNS server,
often called "named.cache"). It is at this level that the statement
"the DNS has a single root" (RFC 2826 [17]) makes sense, but still,
what is being transferred are octets, not characters.

- Interfacing to the user is a service layer, often called "the resolver
library". It is often embedded in the operating system or system
libraries of the client machines. It is at the top of this layer that
the API calls commonly known as "gethostbyname" and "gethostbyaddr"
reside. RFC 2553 [15] extends these interfaces to support IPv6. A
conceptually similar layer exists in authoritative DNS servers,
comprising the parts that generate "meaningful" strings in DNS files.
Due to the popularity of the "master file" format, this layer often
exists only in the administrative routines of the service maintainers.

- The users of this layer (the resolver library) are the application
programs that use the DNS, such as mailers, mail servers, Web clients,
Web servers, Web caches, IRC clients, FTP clients, distributed file
systems, distributed databases, and almost all other applications on
TCP/IP.
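
From the application's side, the resolver-library layer is simply a function call. The Python sketch below uses `socket.getaddrinfo()`, the protocol-independent lookup interface that RFC 2553 describes alongside its IPv6 extensions. The wrapper name `resolve` is hypothetical. AI_NUMERICHOST is used only so that the example runs without network access or a configured DNS server.

```python
import socket


def resolve(host: str, numeric_only: bool = False) -> list:
    """Hand a name to the resolver library and return the addresses it finds.

    The application never constructs DNS packets; building queries and
    parsing responses happens below this call.
    """
    flags = socket.AI_NUMERICHOST if numeric_only else 0
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM, flags=flags)
    # Each entry is (family, type, proto, canonname, sockaddr); for both
    # IPv4 and IPv6 the address string is sockaddr[0].
    return sorted({info[4][0] for info in infos})


print(resolve("127.0.0.1", numeric_only=True))  # ['127.0.0.1']
```

Calling `resolve("www.example.com")` without the flag would pass through the full resolver and DNS service layers, so its result depends on the local configuration.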

Graphically, one can illustrate it like this:

+---------------+                            +---------------------+
|  Application  |                            |     (Base data)     |
+---------------+                            +---------------------+
        | Application service interface                 |
        | For ex. GethostbyXXXX interface               | (no standard)
+---------------+                            +---------------------+
|   Resolver    |                            |   Auth DNS server   |
+---------------+                            +---------------------+
        |      <----- DNS service interface ----->      |
+------------------------------------------------------------------+
|                           DNS service                            |
| +-----------------------+                 +--------------------+ |
| | Forwarding DNS server |                 | Caching DNS server | |
| +-----------------------+                 +--------------------+ |
|                                                                  |
|                    +-------------------------+                   |
|                    | Parent-zone DNS servers |                   |
|                    +-------------------------+                   |
|                                                                  |
|                    +-------------------------+                   |
|                    |    Root DNS servers     |                   |
|                    +-------------------------+                   |
|                                                                  |
+------------------------------------------------------------------+

1.5 Service model of the DNS

The Domain Name Service is used for multiple purposes, each of which is
characterized by what it puts into the system (the query) and what it
expects as a result (the reply).

The most commonly used ones in the current DNS are:

- Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get
back an IPv4 or IPv6 address.

- Hostname-to-mail server service (MX): As above, but the expected
return value is a hostname and a priority for SMTP servers.

- Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in
in-addr.arpa. or ip6.arpa form respectively) and get back a hostname.

- Domain delegation service (NS): Enter a domain name and get back
nameserver records (designated hosts which provide authoritative
nameservice) for the domain.
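
For the address-to-hostname service, the query name is derived mechanically from the address. The minimal Python sketch below uses the standard `ipaddress` module. Its `reverse_pointer` attribute yields the in-addr.arpa name for IPv4 and the nibble-format ip6.arpa name for IPv6, which is the format later standardized in RFC 3596. The service-to-record-type table is illustrative and simply restates the list above.

```python
import ipaddress

# Record types behind each service in the list above (illustrative only).
SERVICE_RR_TYPES = {
    "hostname-to-address": ("A", "AAAA", "A6"),
    "hostname-to-mail-server": ("MX",),
    "address-to-hostname": ("PTR",),
    "domain-delegation": ("NS",),
}


def ptr_query_name(address: str) -> str:
    """Return the fully qualified name queried for a PTR lookup."""
    # reverse_pointer gives "1.2.0.192.in-addr.arpa" for 192.0.2.1 and a
    # 32-nibble name under "ip6.arpa" for IPv6, without a trailing dot.
    return ipaddress.ip_address(address).reverse_pointer + "."


print(ptr_query_name("192.0.2.1"))    # 1.2.0.192.in-addr.arpa.
print(ptr_query_name("2001:db8::1"))  # 32 reversed nibbles, then ip6.arpa.
```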

New services are being defined, either as entirely new services (IPv6 to
hostname mapping using binary labels) or as embellishments to other
services, such as DNS Security (DNSSEC) [14], which returns information
about whether a given DNS service is performed securely or not.

These services exist, conceptually, at the Application/Resolver
interface, NOT at the DNS-service interface. This document attempts to
set requirements for an equivalent of the "used services" given above,
where "hostname" is replaced by "Internationalized Domain Name". This
does not mean that IDN need work only with those services; IDN should
work with any kind of DNS query. IDN is a new service. Since existing
protocols like SMTP or HTTP use the old service, it is a matter of great
concern how the new and old services work together, and how other
protocols can take advantage of the new service.

2. General Requirements

These requirements address two concerns: the service offered to the
users (the application service), and the protocol extensions, if needed,
added to support this service.

In the requirements, we attempt to use the term "service" whenever a
requirement concerns the service, and "protocol" whenever a requirement
is believed to constrain the possible implementation.

2.1 Compatibility and Interoperability

[1] The DNS is essential to the entire Internet. Therefore, the service
MUST NOT damage present DNS protocol interoperability. It MUST make the
minimum number of changes to existing protocols on all layers of the
stack. It MUST continue to allow any system anywhere that implements
the IDN specification to resolve any internationalized domain name.

[2] The service MUST preserve the basic concept and facilities of domain
names as described in RFC 1034 [4]. It MUST maintain a single, global,
universal, and consistent hierarchical namespace.

[3] The DNS protocol (the packet formats that go on the wire) MUST
NOT limit the codepoints that can be used. A service defined on top of
the DNS, for instance the IDN-to-address function, MAY limit the
codepoints that can be used. The service descriptions MUST describe
what limitations are imposed.

[4] The protocol MUST work for all features of DNS, IPv4, and
IPv6. The protocol MUST NOT allow an IDN to be returned to a requestor
that requests the IP-to-(old)-domain-name mapping service.

[5] The same name resolution request MUST generate the same response,
regardless of the location or localization settings in the resolver, in
the master server, and in any slave servers involved in the resolution
process.

[6] The protocol MUST NOT require that the current DNS cache
servers be modified to support IDN. If a cache server can have
additional functionality to support IDN better, this additional
functionality MUST NOT cause problems for resolving correctly
functioning current domain names.

[7] A caching server MUST NOT return data in response to a query that
would not have been returned if the same query had been presented to an
authoritative server. This applies fully for the cases when:

- The caching server does not know about IDN
- The caching server implements the whole specification
- The caching server implements a valid subset of the specification

[8] The service MAY modify the DNS protocol (RFC 1035 [5]) and other
related work undertaken by the DNS Extensions (DNSEXT) [2] working
group. However, these changes SHOULD be as small as possible and any
changes SHOULD be coordinated with the DNSEXT working group.

[9] The protocol supporting the service SHOULD be as simple as possible
from the user's perspective. Ideally, users SHOULD NOT realize that IDN
was added on to the existing DNS.

[10] The best solution is one that maintains maximum feasible
compatibility with current DNS standards as long as it meets the other
requirements in this document.

[11] The protocol should handle new revisions of the CCS with care.
Undefined codepoints should not be allowed unless a new revision of
the protocol can handle them. Protocol revisions should be tagged.

2.2 Internationalization

[12] Internationalized characters MUST be allowed to be represented and
used in DNS names and records. The protocol MUST specify what charset is
used when resolving domain names and how characters are encoded in DNS
records.

[13] Codepoints SHOULD be from the Universal Character Set as defined in
ISO-10646 or Unicode. The specifics of versions MUST be defined in the
proposed solution. If multiple charsets are allowed, each charset MUST
be tagged and conform to RFC 2277 [11].

[14] The protocol MUST NOT reject any non-IDN characters (to be
defined) in any DNS queries or responses.

[15] The protocol SHOULD NOT invent a new CCS for the purpose of IDN
only and SHOULD use an existing CES. The charset(s) chosen SHOULD also be
unambiguous.

[16] The protocol SHOULD NOT make any assumptions about the location
in a domain name where internationalization might appear. In other
words, it SHOULD NOT differentiate between any part of a domain name
because this MAY impose restrictions on future internationalization
efforts. For example, the Top-Level Domains (TLDs) can be
internationalized.

[17] The protocol also SHOULD NOT impose any locale-specific
restrictions. For example, an IDN implementation which only allows
domain names to use a single local script would immediately restrict
multinational organizations.

[18] While there is a wide range of devices that use the DNS and a wide
range of characteristics of international scripts and methods of
domain name input and display, IDN is only concerned with the
protocol. Therefore, there MUST be a single way of encoding an
internationalized domain name within the DNS.
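
Requirement [12] leaves open how characters are encoded, but whatever encoding is chosen must still fit within the 63-octet label limit from Section 1.2. The worked Python sketch below assumes, purely for illustration, that labels carry raw UTF-8 (or UTF-16BE) octets. The helper names are hypothetical.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1035


def label_octets(label: str, encoding: str = "utf-8") -> int:
    """Number of octets the label would occupy in the given encoding."""
    return len(label.encode(encoding))


def fits_in_label(label: str, encoding: str = "utf-8") -> bool:
    return 1 <= label_octets(label, encoding) <= MAX_LABEL_OCTETS


print(label_octets("example"))               # 7  (1 octet per ASCII letter)
print(label_octets("bücher"))                # 7  ("ü" takes 2 octets)
print(label_octets("日本語"))                # 9  (3 octets per ideograph)
print(label_octets("example", "utf-16-be"))  # 14 (2 octets per BMP character)
print(fits_in_label("日" * 21))              # True  (63 octets)
print(fits_in_label("日" * 22))              # False (66 octets)
```

Under raw UTF-8, a label can therefore hold 63 ASCII characters but only 21 typical CJK ideographs, since each ideograph takes three octets (63 / 3 = 21).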

2.3 Canonicalization

Matching rules are a complicated process for IDN. Canonicalization
of characters MUST follow precise and predictable rules to ensure
consistency. "Requirements for String Identity Matching and String
Indexing" [1] is RECOMMENDED as a guide on canonicalization.

The DNS has to match a host name in a request with a host name held
in one or more zones. It also needs to sort names into order. It is
expected that some sort of canonicalization algorithm will be used as
the first step of this process. This section discusses some of the
properties which will be REQUIRED of that algorithm.

[19] To achieve interoperability, canonicalization MUST be done at a
single well-defined place in the DNS resolution process. The protocol
MUST specify canonicalization; it MUST specify exactly where in the
DNS that canonicalization happens and does not happen; it MUST specify
how additions to ISO 10646 will affect the stability of the DNS and
the amount of work done on the root DNS servers.

[20] The canonicalization algorithm MAY specify operations for case,
ligature, and punctuation folding.

[21] In order to retain backward compatibility with the current DNS,
the service MUST retain the case-insensitive comparison for US-ASCII
as specified in RFC 1035 [5]. For example, Latin capital letter A
(U+0041) MUST match Latin small letter a (U+0061). Unicode Technical
Report #21 [25] describes some of the issues with case
mapping. Case-insensitivity for non-US-ASCII characters MUST be
discussed in the protocol proposal.

[22] Case folding MUST be locale independent. If it were
locale-dependent, then different clients would get different results.
For example, Latin capital letter I (U+0049) case folded to lower case
in the Turkish context will become Latin small letter dotless i
(U+0131). But in the English context, it will become Latin small
letter i (U+0069).

[23] If other canonicalization is done, it MUST be done before the
domain name is resolved. Further, the canonicalization MUST be easily
upgradable as new languages and writing systems are added.

[24] Any conversion (case, ligature folding, punctuation folding, etc.)
from what the user enters into a client to what the client asks for
resolution MUST be done identically on any request from any client.

[25] If the charset can be normalized, then it SHOULD be normalized
before it is used in IDN. Normalization SHOULD follow Unicode
Standard Annex #15 [23].

[26] The protocol SHOULD avoid inventing a new normalization form
provided a technically sufficient one is available.
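
Several of the requirements above ([20] to [22] and [25]) can be illustrated with a small canonicalization function. This is a sketch, not a proposal. It assumes Unicode, applies Python's locale-independent full case folding (`str.casefold`), and then applies NFKC normalization from UAX #15. The function name `canonical` is hypothetical.

```python
import unicodedata


def canonical(label: str) -> str:
    """Fold case without reference to any locale, then apply NFKC."""
    # str.casefold() applies Unicode's default, locale-independent full case
    # folding: "I" always becomes "i", never dotless "ı" (U+0131), and the
    # "ﬁ" ligature (U+FB01) is expanded to "fi".
    folded = label.casefold()
    # NFKC then composes sequences such as "u" + U+0308 into "ü" and maps
    # compatibility characters such as fullwidth "ａ" (U+FF41) to "a".
    return unicodedata.normalize("NFKC", folded)


print(canonical("A") == canonical("a"))                  # True  ([21])
print(canonical("I"))                                    # i     ([22])
print(canonical("\ufb01le"))                             # file  ([20])
print(canonical("\uff21\uff22\uff23"))                   # abc
print(canonical("Bücher") == canonical("BU\u0308CHER"))  # True  ([25])
```

Folding before normalizing is only one possible ordering. The later Nameprep profile (RFC 3491) specifies its own mapping tables and processing order, so this sketch should not be read as equivalent to it.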

2.4 Operational Issues

[27] Zone files SHOULD remain easily editable.

[28] An IDN-capable resolver or server SHALL NOT generate more traffic
than a non-IDN-capable resolver or server would when resolving an
ASCII-only domain name. The amount of traffic generated when resolving
an IDN SHALL be similar to that generated when resolving an ASCII-only
name.

[29] The service SHOULD NOT add new centralized administration for the
DNS. A domain administrator SHOULD be able to create internationalized
names as easily as current domain names.

[30] The protocol MUST work with DNSSEC. The protocol MAY break
language sort order.

3. Security Considerations

Any solution that meets the requirements in this document MUST NOT be
less secure than the current DNS. Specifically, the mapping of
internationalized host names to and from IP addresses MUST have the
same characteristics as the mapping of today's host names.

Specifying requirements for internationalized domain names does not
itself raise any new security issues. However, any change to the DNS MAY
affect the security of any protocol that relies on the DNS or on
DNS names. A thorough evaluation of those protocols for security
concerns will be needed when they are developed. In particular, IDNs
MUST be compatible with DNSSEC and, if multiple charsets or
representation forms are permitted, the implications for name spoofing
MUST be thoroughly understood.

4. References

[1] World Wide Web Consortium, "Requirements for String Identity
Matching and String Indexing", http://www.w3.org/TR/WD-charreq, July
1998.

[2] Olafur Gudmundsson, Randy Bush, "IETF DNS Extensions Working Group"
(DNSEXT), namedroppers@ops.ietf.org.

[3] K. Harrenstien, M.K. Stahl, E.J. Feinler, "DoD Internet Host Table
Specification", RFC 952, October 1985.

[4] P. Mockapetris, "Domain Names - Concepts and Facilities",
RFC 1034, November 1987.

[5] P. Mockapetris, "Domain Names - Implementation and
Specification", RFC 1035, November 1987.

[6] R. Braden, "Requirements for Internet Hosts -- Application and
Support", RFC 1123, October 1989.

[7] P. Vixie, "A Mechanism for Prompt Notification of Zone Changes
(DNS NOTIFY)", RFC 1996, August 1996.

[8] S. Bradner, "The Internet Standards Process -- Revision 3", RFC
2026, October 1996.

[9] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, March 1997.

[10] R. Elz, R. Bush, "Clarifications to the DNS Specification",
RFC 2181, July 1997.

[11] H. Alvestrand, "IETF Policy on Character Sets and Languages", RFC
2277, January 1998.

[12] N. Freed and J. Postel, "IANA Charset Registration Procedures",
RFC 2278, January 1998.

[13] F. Yergeau, "UTF-8, a transformation format of ISO 10646", RFC
2279, January 1998.

[14] D. Eastlake, "Domain Name System Security Extensions", RFC 2535,
March 1999.

[15] R. Gilligan et al., "Basic Socket Interface Extensions for IPv6",
RFC 2553, March 1999.

[16] L. Daigle et al., "A Tangled Web: Issues of I18N, Domain Names,
and the Other Internet protocols", RFC 2825, May 2000.

[17] Internet Architecture Board, "IAB Technical Comment on the Unique DNS
Root", RFC 2826, May 2000.

[18] P. Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare-00.txt, June 2000.

[19] ISO/IEC 10646-1:2000 (note that an amendment 1 is in
preparation), ISO/IEC 10646-2 (in preparation), plus corrigenda and
amendments to these standards.

[20] The Unicode Consortium, "The Unicode Standard". Described at
http://www.unicode.org/unicode/standard/versions/.

[21] The Unicode Consortium, "The Unicode Standard -- Version 3.0",
ISBN 0-201-61633-5. Same repertoire as ISO/IEC 10646-1:2000. Described
at http://www.unicode.org/unicode/standard/versions/Unicode3.0.html.

[22] Coded Character Set -- 7-bit American Standard Code for
Information Interchange, ANSI X3.4-1986; also: ISO/IEC 646 (IRV).

[23] M. Davis and M. Duerst, Unicode Consortium, "Unicode
Normalization Forms", Unicode Standard Annex #15,
http://www.unicode.org/unicode/reports/tr15/, 2000-08-31.

[24] K. Whistler and M. Davis, Unicode Consortium, "Character Encoding
Model", Unicode Technical Report #17,
http://www.unicode.org/unicode/reports/tr17/, 2000-08-31.

[25] M. Davis, Unicode Consortium, "Case Mappings", Unicode Technical
Report #21, http://www.unicode.org/unicode/reports/tr21/, 2000-09-12.

5. Editors' Contact

Zita Wenzel, Ph.D.
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA
90292 USA
Tel: +1 310 448 8462
Fax: +1 310 823 6714
zita@isi.edu

James Seng
i-DNS.net International Pte Ltd.
8 Temasek Boulevard
#24-02 Suntec Tower 3
Singapore 038988
Tel: +65 248 6208
Fax: +65 248 6198
Email: jseng@pobox.org.sg
|
||||
|
||||
6. Acknowledgements

The editors gratefully acknowledge the contributions of:

Harald Tveit Alvestrand <Harald@Alvestrand.no>
Mark Andrews <Mark.Andrews@nominum.com>
RJ Atkinson <request not to have email>
Alan Barrett <apb@cequrux.com>
Marc Blanchet <blanchet@mailviagenie.qc.ca>
Randy Bush <randy@psg.com>
Andrew Draper <ADRAPER@altera.com>
Martin Duerst <duerst@w3.org>
Patrik Faltstrom <paf@swip.net>
Ned Freed <ned.freed@innosoft.com>
Olafur Gudmundsson <ogud@ogud.com>
Paul Hoffman <phoffman@imc.org>
Simon Josefsson <jas+idn@pdc.kth.se>
Kent Karlsson <keka@im.se>
John Klensin <klensin+idn@jck.com>
Tan Juay Kwang <tanjk@i-dns.net>
Dongman Lee <dlee@icu.ac.kr>
Bill Manning <bmanning@ISI.EDU>
Dan Oscarsson <Dan.Oscarsson@trab.se>
J. William Semich <bill@mail.nic.nu>
Yoshiro Yoneda <yone@nic.ad.jp>
@ -1,557 +0,0 @@

Internet Draft                                             Dan Oscarsson
draft-ietf-idn-udns-03.txt                                 Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535                       19 August 2001
Expires: 19 February 2002


Using the Universal Character Set in the Domain Name System (UDNS)


Status of this memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

Since the Domain Name System (DNS) [RFC1035] was created, there has
been a desire to use characters other than ASCII in domain names.
Lately this desire has grown very strong, and several groups have
started to experiment with non-ASCII names. This document defines
how the Universal Character Set (UCS) [ISO10646] is to be used in
DNS. It includes a transition scheme for older software that
supports non-ASCII handling in applications only. It also defines
how to use UCS in labels and how to allow more than 63 octets in a
label.


1. Introduction

While the need for non-ASCII domain names has existed since the
creation of the DNS, it has increased greatly during the last few
years. Currently at least two implementations that use UTF-8 are in
use, and others use other methods.

To avoid several different implementations of non-ASCII names in DNS

Dan Oscarsson Expires: 19 February 2002 [Page 1]
Internet Draft Universal DNS 19 August 2001

that do not work together, and to avoid breaking the current
ASCII-only DNS, there is an immediate need to standardise how DNS
handles non-ASCII names.

The DNS protocol allows any octet in character data, but so far
octets are defined only for the ASCII code points. Octets outside
the ASCII range have no defined interpretation. This document defines
how all octets are to be used in character data, which gives a
standardised way to use non-ASCII in DNS.

The specification here conforms to the IDN requirements [IDNREQ].

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

IDN: Internationalised Domain Name, here used to mean a domain name
containing non-ASCII characters.

ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
compatible with the ASCII host name syntax.

1.2 Previous versions of this document

This version contains only minor corrections to the fourth version.

The third version of this document included a way to return both
ASCII and non-ASCII versions of a name. As this could not be
guaranteed to work, it has been removed.

The second version of this document was available as
draft-ietf-idn-udns-00.txt. It included many options as well as a
flag bit that has since been removed.

The first version of this document was available as
draft-oscarsson-i18ndns-00.txt.


2. The DNS Protocol

The DNS protocol is used for communication between DNS servers, and
between DNS servers and DNS clients. User interface issues, such as
the format of zone files or how domain names are entered or
displayed, are not part of the protocol.

The protocol update defined here can be used immediately, as it is
fully compatible with today's DNS.

For a long time there will be software that understands UCS in DNS
alongside software that understands only ASCII in DNS. It is
therefore necessary to support a mix of both types. In the following
text, software that understands UCS in DNS is called UDNS aware.

This specification supports the following scenarios:

- UDNS unaware client, UDNS aware DNS server
- UDNS aware client, UDNS unaware DNS server
- UDNS aware client, UDNS aware DNS server


2.1 Fundamentals

2.1.1 Standard Character Encoding (SCE)

Character data needs to be able to represent as many of the world's
characters as possible while remaining compatible with ASCII.
Character data is used in labels and in text fields in the RDATA part
of an RR.

The Standard Character Encoding of character data used in the DNS
protocol MUST:
- Use ISO 10646 (UCS) [ISO10646] as the coded character set.
- Be normalised using form C as defined in Unicode Technical Report
  #15 [UTR15]. See also [CHNORM].
- Be encoded using the UTF-8 [RFC2279] character encoding scheme.

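As an illustration only, the three SCE requirements can be sketched in Python. The sketch assumes that the standard `unicodedata` module supplies the form C (NFC) normalisation. `to_sce` is a hypothetical helper name and is not part of any specification.

```python
import unicodedata


def to_sce(text: str) -> bytes:
    """Standard Character Encoding: UCS code points (a Python str),
    normalised to Unicode form C, then encoded as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# Precomposed U+00E9 and "e" + U+0301 COMBINING ACUTE ACCENT normalise
# to the same form C sequence, and therefore to the same octets.
print(to_sce("caf\u00e9") == to_sce("cafe\u0301"))  # True
print(to_sce("caf\u00e9"))                          # b'caf\xc3\xa9'
```

Both spellings produce identical octets on the wire, so comparisons after SCE conversion do not depend on how the input was composed.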

2.1.2 Binary Comparison Format (BCF)

RFC 1035 states that the labels of a name are matched
case-insensitively. When using UCS this is no longer enough, because
differences other than case also need to match as equivalent.
Form-insensitive matching of UCS includes:
- Letters of different case are compared as the same character.
- Code points of primary typographical variations of the same
  character are compared as the same character. Examples are
  double-width and normal-width characters, or presentation forms of
  a character.
- Some characters are represented by multiple code points in UCS.
  All code points of one character must compare as the same. For
  example, the Kelvin sign (U+212A) is the same as the letter K.

The original definition is now extended to: labels must be compared
using form-insensitivity.

To handle form-insensitivity, this document defines the Binary
Comparison Format (BCF), to which strings can be mapped. After strings
are mapped to BCF, they can be compared using binary string
comparison. Implementors may implement form-insensitive comparison
without using BCF, as long as the results are the same.

Mapping a label to BCF typically involves steps such as changing all
upper case letters to lower case, mapping different forms to one
form, and changing different code points of one character into a
single code point.

For the UCS character code range 0-255 (ASCII and ISO 8859-1), BCF
MUST be produced by mapping all upper case characters to lower case,
following the one-to-one mapping defined in the Unicode 3.0
Character Database [UDATA].

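A minimal sketch of the 0-255 rule follows. It assumes that Python's `str.lower()` gives, for these code points, the same single-code-point result as the one-to-one lower case mapping in [UDATA]. `bcf_latin1` and `same_label` are hypothetical helpers. Code points above 255 pass through unchanged because their BCF is left to a separate document, so this sketch does not fold the Kelvin sign, for example.

```python
def bcf_latin1(label: str) -> str:
    """BCF for code points 0-255 only: map upper case letters to lower case.

    For U+0000..U+00FF, str.lower() yields exactly one lower case code point
    per upper case letter, matching the one-to-one UnicodeData mapping.
    Code points above 255 are returned unchanged (their BCF is not defined here).
    """
    return "".join(c.lower() if ord(c) <= 0xFF else c for c in label)


def same_label(a: str, b: str) -> bool:
    # Once both labels are in BCF, a plain binary comparison is sufficient.
    return bcf_latin1(a).encode("utf-8") == bcf_latin1(b).encode("utf-8")


print(bcf_latin1("WWW.EXAMPLE.SE"))        # www.example.se
print(same_label("\u00c5SA", "\u00e5sa"))  # True
```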

The Binary Comparison Format (BCF) for the rest of UCS will be
defined in a separate document. The closest existing definition is
[NAMEPREP].

2.1.3 Backward Compatibility Encoding (BCE)

A Backward Compatibility Encoding (BCE) is available to support
older software that expects only ASCII, and to support downgrading
from 8-bit to 7-bit ASCII in other protocols (like SMTP). It is a
transition mechanism and will no longer be supported at some future
time, when so decided.

The Backward Compatibility Encoding (BCE) of a label is defined as
the BCF of the label encoded using an ASCII Compatible Encoding
(ACE).

The ACE to be used is defined in a separate document. Suitable
definitions include [SACE] and [RACE].

The BCF form of the label is used to support solutions where only
applications know about non-ASCII labels. Because BCF is used, the
server does not need to know about UCS and can simply do binary
matching, so old servers can handle it. However, BCF discards
information contained in the original form of a label. It is
therefore impossible to return the original form to a client using
BCE.


2.1.4 Long names

The current DNS protocol limits a label to 63 octets. Because UTF-8
takes more than one octet for some characters, a UTF-8 name cannot
always have 63 characters in a label the way an ASCII name can. For
example, a name using Hangul would have a maximum of 21 characters.

The limits imposed by RFC 1035 are 63 octets per label and 255 octets
for the full name. The 255 limit is not a protocol limit but one
intended to simplify implementations.

To support longer names, a long label type is defined using [RFC2671]
as extended label 0b000011 (the label type will be assigned by IANA
and may not be the number used here).

                        1 1 1 1 1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
   |0 1 0 0 0 0 1 1|    length     | label data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

length: length of the label in octets
label data: the label

All software following this specification MUST handle the long
label. Such software MUST also support UDP messages of up to 1280
octets.

The limits for labels are updated from RFC 1035 as follows: a label
is limited to a maximum of 63 character code points in UCS, normalised
using Unicode form C. The full name is limited to a maximum of 255
character code points, normalised as for a label.

A long label MUST always use the Standard Character Encoding (SCE).

Because older software does not understand long labels, a response
MUST NOT include a long label unless the query did. The IETF may
change this at a later date.


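The following Python sketch encodes and decodes a long label as drawn in the diagram of this section. It rests on three assumptions:

- The first octet is the draft's placeholder 0b01000011 (0x43). The leading 0b01 marks an [RFC2671] extended label type.
- The length field is one octet, as the diagram suggests.
- The helper names are hypothetical.

The sketch also enforces the 63 code point limit after form C normalisation. A UTF-8 encoded code point takes at most four octets, so 63 code points never exceed 252 octets and a one-octet length always suffices.

```python
import unicodedata
from typing import Tuple

# Draft placeholder: 0b01 marks an RFC 2671 extended label type and 0b000011
# is the type value used in the draft. IANA may assign a different value.
LONG_LABEL_TYPE = 0b01000011  # 0x43


def encode_long_label(label: str) -> bytes:
    """Encode one label as: type octet, length octet, SCE label data."""
    nfc = unicodedata.normalize("NFC", label)
    if not 0 < len(nfc) <= 63:
        raise ValueError("a label must have 1 to 63 code points after NFC")
    data = nfc.encode("utf-8")
    # At most 63 * 4 = 252 octets, so the (assumed) one-octet length suffices.
    return bytes([LONG_LABEL_TYPE, len(data)]) + data


def decode_long_label(wire: bytes) -> Tuple[str, int]:
    """Decode a long label at the start of wire; return (label, octets used)."""
    if len(wire) < 2 or wire[0] != LONG_LABEL_TYPE:
        raise ValueError("not a long label")
    length = wire[1]
    data = wire[2:2 + length]
    if len(data) != length:
        raise ValueError("truncated long label")
    return data.decode("utf-8"), 2 + length


# 63 Hangul syllables take 189 octets of UTF-8, too long for a normal label.
wire = encode_long_label("\uac00" * 63)
print(wire[0] == 0x43, wire[1])  # True 189
```

Each Hangul syllable takes three UTF-8 octets. A normal label therefore holds only 21 of them, while this long label carries 63.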

2.2 Rules for matching of domain names in UDNS aware DNS servers

To match domain names correctly in lookups, DNS servers MUST follow
these rules:
- On authoritative data, use form-insensitive matching for the
  characters used in the data (for example, a zone using only ASCII
  need only handle matching of ASCII characters).
- On non-authoritative data, use either binary matching, or
  case-insensitive matching on ASCII letters and binary matching on
  all other characters.

The effect of the above is:
- Only servers handling authoritative data must implement
  form-insensitive matching of names. They need only implement the
  subset required for the characters of UCS they support in their
  authoritative zones.
- Lookups are normally fast, because data usually flows like this:
  resolver <-> server <-> authoritative server.
  Form-insensitive matching can be complex and CPU consuming, but the
  server in the middle does caching with only simple and fast binary
  matching. Complex matching rules should therefore not slow down DNS
  very much.

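The two matching rules of section 2.2 can be sketched in Python as follows. The draft leaves BCF for most of UCS to a separate document, so this sketch substitutes case folding followed by NFKC normalisation as a stand-in for BCF. This roughly follows the approach of [NAMEPREP]. That substitution and the function names are assumptions of this example.

```python
import unicodedata


def _stand_in_bcf(name: bytes) -> bytes:
    # Assumption: case folding + NFKC stands in for BCF here; the draft only
    # defines BCF for code points 0-255 and defers the rest.
    text = name.decode("utf-8")
    return unicodedata.normalize("NFKC", text.casefold()).encode("utf-8")


def names_match(query_name: bytes, stored_name: bytes,
                authoritative: bool) -> bool:
    if authoritative:
        # Authoritative data: form-insensitive matching.
        return _stand_in_bcf(query_name) == _stand_in_bcf(stored_name)
    # Non-authoritative (cached) data: bytes.lower() changes only the octets
    # for "A"-"Z", so this is ASCII case-insensitive and binary otherwise.
    return query_name.lower() == stored_name.lower()


kelvin = "\u212a.example".encode("utf-8")  # U+212A KELVIN SIGN
print(names_match(kelvin, b"k.example", authoritative=True))   # True
print(names_match(kelvin, b"k.example", authoritative=False))  # False
```

The non-authoritative branch needs no Unicode tables at all, which is why caching servers in the middle stay fast.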

2.3 Mixing of UDNS aware and non-UDNS aware clients and servers

To handle a mix of UDNS aware and non-UDNS aware clients and servers,
clients and servers MUST follow the rules below.

2.3.1 Native UDNS aware client

A native UDNS aware client is a client that supports everything in
this document.

When doing a query it MUST:
- Use the long label in the QNAME.
- If the server rejects the query because of the long label, retry
  the query using the normal short label. If the QNAME contains
  non-ASCII, it must be encoded using BCE.
- Handle answers containing BCE.

The client may skip trying a query using the long label if it knows
the server does not understand it.

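A sketch of the native client's query procedure follows. Every external dependency is passed in as a hypothetical callable:

- `send_long` and `send_short` stand for the transport.
- `LongLabelRejected` stands for however a server signals rejection, which the draft does not specify.
- `to_bce` stands for the ACE of a label's BCF, which is defined elsewhere.

BCE is applied per label because BCE is defined per label. Handling BCE in answers is left to the caller.

```python
from typing import Callable


class LongLabelRejected(Exception):
    """Hypothetical: raised by send_long when a server rejects a long label."""


def native_udns_query(
    name: str,
    send_long: Callable[[str], bytes],
    send_short: Callable[[str], bytes],
    to_bce: Callable[[str], str],
    server_understands_long: bool = True,
) -> bytes:
    """Query order for a native UDNS aware client (section 2.3.1)."""
    # First try the long label, unless the server is known not to support it.
    if server_understands_long:
        try:
            return send_long(name)
        except LongLabelRejected:
            pass
    # Retry with normal short labels; non-ASCII labels are sent as BCE.
    qname = ".".join(
        label if label.isascii() else to_bce(label) for label in name.split(".")
    )
    return send_short(qname)
```

Injecting the transport and the ACE as parameters keeps the sketch honest about what the draft leaves unspecified.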

2.3.2 Application based UDNS aware client

An application based UDNS aware client is a client that supports UDNS
through BCE handling in the application.

It only understands BCE and needs only a non-UDNS aware resolver to
work. All encoding and decoding of BCE is handled in the
application.

Because BCE is an ACE of BCF, the names returned in an answer need
not contain the real form of the name. Instead, they may contain the
simplified form used in name matching. As this is a transition
mechanism to support non-ASCII in names before the DNS servers have
been upgraded, this is acceptable, and it will give people a reason
to upgrade.

2.3.3 non-UDNS aware client

A non-UDNS aware client will send ASCII, or whatever an application
passes to it. This can be BCE, which for the client is just ASCII
text.

2.3.4 UDNS aware server

A UDNS aware server MUST handle everything in this document and
follow these rules:
- If an incoming query contains a long label, the answer may contain
  a long label, and the client is identified as being UDNS aware.
- If the query comes from a non-UDNS aware client and the answer
  contains non-ASCII, the non-ASCII labels must be encoded using
  BCE.
- If a short label is used in a query and the QNAME contains
  non-ASCII, an authoritative server must handle the query if the
  character encoding can be recognised. It must recognise SCE and
  should recognise common encodings used for the labels in the
  domain it is authoritative for. Answers will use BCE for all labels
  except the one matching QNAME. This allows clients using the local
  character set to work in many cases before the resolver code is
  upgraded.

2.3.5 non-UDNS aware server

A non-UDNS aware server can only handle ASCII matching when comparing
names. It can support the transition mechanism with BCE. Its
authoritative zones will then have to be loaded with manually
BCE-encoded names.

2.4 DNSSEC

Because labels can now contain non-ASCII, DNSSEC [RFC2535] needs to
be revised so that it can also handle them.


3. Effect on other protocols

Because a domain name may now include non-ASCII, many other protocols
that include domain names need to be updated, for example SMTP, HTTP
and URIs. The BCE format can be used when interfacing with ASCII-only
software or protocols. Protocols like SMTP could be extended using
ESMTP and a UTF8 option that defines that all headers are in UTF-8.

It is recommended that protocols updated to handle i18n do so by
encoding character data in the same standard format as defined for
DNS in this document (UCS normalised using form C). Encoding it in
ASCII or using tagged character sets should be avoided.

DNS data contains more than domain names; for example, e-mail
addresses are also included. An e-mail address would therefore be
expected to change to include non-ASCII both before and after the
@-sign.

Software needs to be updated to follow the user interface
recommendations given above, so that a human will see the characters
in their local character set, if possible.

4. Security Considerations

As always with data, if software does not check for data that can be
a problem, security may be affected. Because characters beyond ASCII
are now allowed, software that expects only ASCII and performs no
checks may now have security problems.


5. References

[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, November 1987.

[RFC1035] P. Mockapetris, "Domain Names - Implementation and
Specification", STD 13, RFC 1035, November 1987.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.

[RFC2181] R. Elz and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.

[RFC2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
RFC 2279, January 1998.

[RFC2535] D. Eastlake, "Domain Name System Security Extensions",
RFC 2535, March 1999.

[RFC2671] P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
2671, August 1999.

[ISO10646] ISO/IEC 10646-1:2000. International Standard --
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS).

[Unicode] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
http://www.unicode.org/unicode/standard/versions/Unicode3.0.html

[UTR15] M. Davis and M. Duerst, "Unicode Normalization Forms",
Unicode Technical Report #15, Nov 1999,
http://www.unicode.org/unicode/reports/tr15/.

[UTR21] M. Davis, "Case Mappings", Unicode Technical Report #21,
Dec 1999, http://www.unicode.org/unicode/reports/tr21/.

[UDATA] The Unicode Character Database,
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
The database is described in
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html.

[IDNREQ] James Seng, "Requirements of Internationalized Domain
Names", draft-ietf-idn-requirement.

[IANADNS] Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
System (DNS) IANA Considerations", draft-ietf-dnsext-iana-dns.

[IDNE] Marc Blanchet, Paul Hoffman, "Internationalized domain
names using EDNS (IDNE)", draft-ietf-idn-idne.

[CHNORM] M. Duerst, M. Davis, "Character Normalization in IETF
Protocols", draft-duerst-i18n-norm.

[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.

[NAMEPREP] Paul Hoffman, Marc Blanchet, draft-ietf-idn-nameprep.

[SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding",
draft-ietf-idn-sace.

[RACE] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
for IDN", draft-ietf-idn-race.

6. Acknowledgements

Paul Hoffman, for many comments in our e-mail discussions.

Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
Karlsson.

Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper, for
comments on my draft.

Discussions and comments by the members of the IDN working group.


Author's Address

Dan Oscarsson
Telia ProSoft AB
Box 85
201 20 Malmo
Sweden

E-mail: Dan.Oscarsson@trab.se

@ -1,442 +0,0 @@

Network Working Group                                          M. Duerst
Internet-Draft                                                       W3C
Expires: May 4, 2003                                    November 3, 2002


Internationalized Domain Names in URIs
draft-ietf-idn-uri-03

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 4, 2003.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

This document proposes to upgrade the definition of URIs (RFC 2396)
[RFC2396] to work consistently with internationalized domain names.

Duerst Expires May 4, 2003 [Page 1]
Internet-Draft IDNs in URIs November 2002

Table of Contents

1. Introduction
2. URI syntax changes
3. Security considerations
4. Acknowledgements
5. Change Log
   5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03
   5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02
   5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
References
Author's Address
Full Copyright Statement

1. Introduction

Internet domain names serve to identify hosts and services on the
Internet in a convenient way. The IETF IDN working group [IDNWG] has
been working on extending the character repertoire usable in domain
names beyond a subset of US-ASCII.

One of the most important places where domain names appear is in
Uniform Resource Identifiers (URIs, [RFC2396], as modified by
[RFC2732]). However, in the current definition of the generic URI
syntax, the restrictions on domain names are 'hard-coded'. In
Section 2, this document relaxes these restrictions by updating the
syntax, and defines how internationalized domain names are encoded in
URIs.

The syntax in this document has been chosen to further increase the
uniformity of URI syntax, which is a very important principle of
URIs.

In practice, escaped domain names should be used as rarely as
possible. Wherever possible, the actual characters in
Internationalized Domain Names should be preserved as long as
possible by using IRIs [IRI] rather than URIs. Conversion to URIs and
then to ACE-encoded [IDNA] domain names should happen only when the
IRI is resolved (ideally directly to the ACE encoding, without using
URIs at all). Also, this document does not exclude the use of ACE
encoding directly in a URI domain name part. ACE encoding may be used
directly in a URI domain name part if this is considered necessary
for interoperability.

Please note that even with the definition of URIs in [RFC2396], some
URIs can already contain host names with escaped characters. For
example, mailto:example@w%33.org is legal per [RFC2396] because the
mailto: URI scheme does not follow the generic syntax of [RFC2396].

2. URI syntax changes

The syntax of URIs [RFC2396] currently contains the following rules
relevant to domain names:

hostname    = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel    = alpha | alpha *( alphanum | "-" ) alphanum

The latter two rules are changed as follows:

domainlabel = anchar | anchar *( anchar | "-" ) anchar
toplabel    = achar | achar *( anchar | "-" ) anchar

and the following rules are added:

anchar      = alphanum | escaped
achar       = alpha | escaped

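To check purely syntactic validity, the updated rules can be transcribed into Python regular expressions. The transcription assumes that `escaped` is "%" followed by two hex digits, as in [RFC2396]. This is a sketch, and the helper name is hypothetical. As discussed later in this section, passing this syntax check does not make a name legal or resolvable.

```python
import re

ESCAPED = r"%[0-9A-Fa-f]{2}"                 # escaped = "%" hex hex
ANCHAR = rf"(?:[A-Za-z0-9]|{ESCAPED})"       # anchar  = alphanum | escaped
ACHAR = rf"(?:[A-Za-z]|{ESCAPED})"           # achar   = alpha | escaped
DOMAINLABEL = rf"{ANCHAR}(?:(?:{ANCHAR}|-)*{ANCHAR})?"
TOPLABEL = rf"{ACHAR}(?:(?:{ANCHAR}|-)*{ANCHAR})?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def is_syntactic_hostname(host: str) -> bool:
    return HOSTNAME.fullmatch(host) is not None


print(is_syntactic_hostname("www.w%33.org"))      # True
print(is_syntactic_hostname("-bad.example.org"))  # False
```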

Characters outside the repertoire (alphanum) are encoded in two
steps. First, the characters are encoded in UTF-8 [RFC2279], which
gives a sequence of octets. Then these octets are escaped according
to the rules defined in [RFC2396].

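In Python, the encoding step can be sketched with `urllib.parse.quote`. It encodes a string as UTF-8 and %-escapes each resulting octet, while always leaving ASCII letters, digits, "." and "-" untouched. `escape_idn_host` is a hypothetical name. In keeping with the Nameprep paragraph of this section, it applies no NFKC or case mapping.

```python
from urllib.parse import quote


def escape_idn_host(host: str) -> str:
    """%-escape the UTF-8 octets of non-ASCII characters in a host name.

    quote() always leaves ASCII letters, digits, "_", ".", "-" and "~"
    unescaped; safe="" makes it escape everything else, including "/".
    No Nameprep mapping (NFKC, case folding) is applied.
    """
    return quote(host, safe="")


print(escape_idn_host("b\u00fccher.example"))  # b%C3%BCcher.example
```

Other ASCII characters that `quote` would escape, such as "*", are not legal in host names in the first place.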

Using UTF-8 ensures that this encoding interoperates with IRIs [IRI].
It is also aligned with the recommendations in [RFC2277] and
[RFC2718], and is consistent with the URN syntax [RFC2141] as well as
recent URL scheme definitions that define encodings of non-ASCII
characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
[RFC2384]).

The above syntax rules permit domain names that are permitted
neither as US-ASCII-only domain names nor as internationalized
domain names. However, such domain names should never be used, and
will never be resolved, because no such domains will be registered.
For US-ASCII-only domain names, the syntax rules in [RFC2396] are
relevant. For example, http://www.w%33.org is legal, because the
corresponding 'w3' is a legal 'domainlabel' according to [RFC2396].
However, http://%2a.example.org is illegal because the corresponding
'*' is not a legal 'domainlabel' according to [RFC2396].

For domain names containing non-ASCII characters, the legal domain
names are those for which the ToASCII operation succeeds ([IDNA],
[Nameprep]; using the unescaped UTF-8 values as input, with the flags
"UseSTD3ASCIIRules" and "AllowUnassigned" set). The URI resolver MUST
apply any steps required as part of domain name resolution by [IDNA],
in particular the ToASCII operation, with the above-mentioned flags
set. URIs where the ToASCII operation results in an error should be
treated as unresolvable.

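A sketch of the resolver-side check follows. It uses Python's built-in `idna` codec, which implements the IDNA 2003 ToASCII operation with Nameprep. The sketch assumes two things about that codec:

- It does not reject unassigned code points, which corresponds to "AllowUnassigned" being set.
- It does not apply the STD3 host name rules itself, so the sketch checks them explicitly after conversion.

The function name is hypothetical.

```python
import re
from urllib.parse import unquote

# STD3 host name label: letters, digits, hyphen; no leading/trailing hyphen.
_STD3_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def uri_host_to_ace(escaped_host: str) -> str:
    """Unescape a URI host, then apply ToASCII label by label.

    Raises ValueError when the host should be treated as unresolvable.
    """
    try:
        host = unquote(escaped_host, encoding="utf-8", errors="strict")
        trailing_dot = host.endswith(".")
        if trailing_dot:
            host = host[:-1]
        ace_labels = []
        for label in host.split("."):
            # ToASCII (Nameprep + Punycode) via Python's "idna" codec.
            ace = label.encode("idna").decode("ascii")
            # UseSTD3ASCIIRules, checked explicitly (assumed not done above).
            if not _STD3_LABEL.fullmatch(ace):
                raise ValueError(f"label {label!r} violates STD3 rules")
            ace_labels.append(ace)
    except UnicodeError as exc:  # bad UTF-8 escapes or ToASCII failure
        raise ValueError(f"ToASCII failed for {escaped_host!r}") from exc
    return ".".join(ace_labels) + ("." if trailing_dot else "")


print(uri_host_to_ace("b%C3%BCcher.example"))  # xn--bcher-kva.example
```

With this check, the two examples above behave as the text describes. "www.w%33.org" resolves as "www.w3.org", while "%2a.example.org" fails because "*" violates the STD3 rules.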

For domain names containing non-ASCII characters, the Nameprep
specification ([Nameprep]) defines some mappings, which mainly
include normalization to NFKC and folding to lower case. When
encoding an internationalized domain name in a URI, these mappings
SHOULD NOT be applied. It should be assumed that the domain name is
already normalized as far as appropriate.

For consistency in comparison operations and for interoperability
with older software, the following should be noted: 1) US-ASCII
characters in domain names should not be escaped. 2) Because of the
principle of syntax uniformity for URIs, it is always more prudent to
take into account the possibility that US-ASCII characters are
escaped.

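The following sketch shows a comparison-time normalisation that follows both notes. It undoes escapes of US-ASCII letters, digits and "-", which should not have been escaped, and it leaves all other escapes in place. Uppercasing the hex digits of the remaining escapes is an extra choice made for this example, not a rule stated here. "." is deliberately left escaped so that label boundaries are not altered. The function name is hypothetical.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
_LDH = frozenset("abcdefghijklmnopqrstuvwxyz"
                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def normalize_host_escapes(host: str) -> str:
    """Undo %hh escapes of US-ASCII letters, digits and "-"; keep the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in _LDH:
            return char
        # Other escapes stay escaped; upper-case hex is this sketch's choice.
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, host)


print(normalize_host_escapes("www.w%33.org"))         # www.w3.org
print(normalize_host_escapes("b%c3%bccher.example"))  # b%C3%BCcher.example
```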
3. Security considerations
|
||||
|
||||
The security considerations of [RFC2396] and those applying to
|
||||
internationalized domain names apply. There may be an increased
|
||||
potential to smuggle escaped US-ASCII-based domain names across
|
||||
firewalls, although because of the uniform syntax principle for URIs,
|
||||
such a potential is already existing.

4. Acknowledgements

Erik Nordmark

5. Change Log

5.1 Changes from draft-ietf-idn-uri-02 to draft-ietf-idn-uri-03

Clarified expectations on name checking.

5.2 Changes from draft-ietf-idn-uri-01 to draft-ietf-idn-uri-02

Moved change log to back

Changed to only change URIs; IRI syntax updated directly in IRI
draft.

Removed syntax restriction on %hh in the US-ASCII part, but made
clear that restrictions to domain names apply.

Made clear that escaped domain names in URIs should only be an
intermediate representation.

Gave example of mailto: as already allowing escaped host names.

Corrected some typos.

5.3 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01

Changed requirement for URI/IRI resolvers from MUST to SHOULD

Changed IRI syntax slightly (ichar -> idchar, based on changes in
[IRI])

Various wording changes

References

[IDNA] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
draft-ietf-idn-idna-14.txt (work in progress), October
2002, <http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-14.txt>.

[IDNWG] "IETF Internationalized Domain Name (idn) Working Group".

[IRI] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRI)", draft-duerst-iri-02.txt (work in
progress), November 2002,
<http://www.ietf.org/internet-drafts/draft-duerst-iri-02.txt>.

[ISO10646] International Organization for Standardization,
"Information Technology - Universal Multiple-Octet Coded
Character Set (UCS) - Part 1: Architecture and Basic
Multilingual Plane", ISO Standard 10646-1, October 2000.

[Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names",
draft-ietf-idn-nameprep-11.txt (work in progress), June 2002,
<http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-11.txt>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.

[RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO
10646", RFC 2279, January 1998.

[RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

[RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396,
August 1998.

[RFC2640] Curtin, B., "Internationalization of the File Transfer
Protocol", RFC 2640, July 1999.

[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
"Guidelines for new URL Schemes", RFC 2718, November
1999.

[RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for
Literal IPv6 Addresses in URL's", RFC 2732, December
1999.

Author's Address

Martin Duerst
World Wide Web Consortium
200 Technology Square
Cambridge, MA 02139
U.S.A.

Phone: +1 617 253 5509
Fax: +1 617 258 5999
EMail: duerst@w3.org
URI: http://www.w3.org/People/D%C3%BCrst/

Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.

@ -1,505 +0,0 @@

IETF IDN Working Group Sung Jae Shim
Internet Draft DualName, Inc.
Document: draft-ietf-idn-vidn-01.txt 2 March 2001
Expires: 2 September 2001

Virtually Internationalized Domain Names (VIDN)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite them other
than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

This document proposes a method that enables domain names to be used in both
local and English scripts, as a directory-search solution at an upper layer
above the DNS. The method first converts virtual domain names typed in local
scripts into the corresponding domain names in English scripts that comply with
the DNS, using knowledge of transliteration between local and English scripts.
Then, the method searches for and displays the domain names in English scripts
that are active on the Internet, so that the user can choose any of them. The
conversion takes place automatically and transparently in the user's
applications before DNS queries are sent. As a result, the method neither
changes the DNS nor requires separate name servers.

2. Conventions and definitions used in this document

The key words "REQUIRED" and "MAY" in this document are to be interpreted as
described in RFC-2119 [1].

A "host" is a computer or device attached to the Internet. A "user host" is a
computer or device with which a user is connected to the Internet, and a "user"
is a person who uses a user host. A "server host" is a computer or device that
provides services to user hosts.

An "entity" is an organization or individual that has a domain name registered
with the DNS.

A "local language" is a language other than English that a user prefers to use
in a local context. "Local scripts" are the scripts of a local language, and
"English scripts" are the scripts of the English language.

A "virtual domain name" is a domain name in local scripts. It is not registered
with the DNS but is used for the convenience of users. An "English domain name"
is a domain name in English scripts. A "domain name" refers to an English domain
name that complies with the DNS, unless specified otherwise.

A "coded portion" is a pre-coded portion of a domain name (e.g., generic codes
including 'com', 'edu', 'gov', 'int', 'mil', 'net', 'org', and country codes
such as 'kr', 'jp', 'cn', and so on). An "entity-defined portion" is a portion
of a domain name that is defined by the entity holding the domain name (e.g.,
host name, organization name, server name, and so on).

The method proposed in this document is called "virtually internationalized
domain names (VIDN)," as it enables domain names in English scripts to be used
virtually in local scripts.

The original of this document uses a number of Korean-language characters in
its examples and is available from the author upon request. The software used
for Internet-Drafts does not allow multilingual characters other than ASCII
characters. Thus, this document may not display Korean-language characters
properly, although it should be comprehensible without them. When opening the
original of this document, please set your viewer's encoding to Korean so that
Korean-language characters are displayed properly.

3. Introduction

Domain names are valuable to Internet users as a main identifier of entities and
resources on the Internet. The DNS allows only English scripts in naming hosts
or clusters of hosts on the Internet. More specifically, the DNS uses only the
basic Latin alphabet (case-insensitive), the decimal digits (0-9), and the
hyphen (-) in domain names. But there is a growing need for internationalized
domain names in local scripts. Recognizing this need, various methods have been
proposed to use local scripts in domain names. But to date, no method appears to
meet all the requirements of internationalized domain names as described in
Wenzel and Seng [2].

One group of earlier methods tries to put internationalized domain names in
local scripts inside some parts of the overall DNS, using special encoding
schemes of the Universal Character Set (UCS). But these methods put too much of
a burden on the DNS. They require a great deal of work for the transition and
update of the DNS components and of the applications working with the DNS.

Another group of earlier methods tries to build separate directory services for
internationalized domain names or keywords in local scripts. But these methods
also require complex implementation efforts, duplicating much of the work
already done for the DNS.

Both groups of earlier methods require creating internationalized domain names
or keywords in local scripts from scratch. This is a costly and lengthy process
for both the DNS and Internet users. Further, domain names or keywords created
in local scripts are usable only by those who know the local scripts. They may
therefore segregate the Internet into many groups of different sets of local
scripts that are less universal than English scripts.

VIDN intends to provide a more immediate and less costly solution to
internationalized domain names than earlier methods. VIDN neither changes the
DNS nor requires creating additional domain names in local scripts. VIDN relies
on the fact that many domain names currently used in regions where English
scripts are not widely used have entity-defined portions consisting of English
scripts transliterated from the respective local scripts. Using this knowledge
of transliteration between local and English scripts, VIDN converts virtual
domain names typed in local scripts into the corresponding domain names in
English scripts that comply with the DNS. In this way, VIDN enables the same
domain names to be used not only in English scripts as usual but also in local
scripts, without creating additional domain names in local scripts.

4. VIDN method

4.1. Objectives

Earlier methods of internationalized domain names try to create domain names or
keywords in local scripts, one way or another, in addition to existing domain
names in English scripts. They then put these names inside or outside the DNS,
using special encoding schemes or lookup services. These methods require a
lengthy and costly process of creating domain names in local scripts and
updating the DNS components and applications.

Even when they are successfully implemented, these methods risk localizing the
Internet by segregating it into groups of different sets of local scripts that
are less universal than English scripts, thus diminishing the international
scope of the Internet. Further, in local contexts these methods may cause more
problems and disputes over copyrights, trademarks, and so on, than we
experience with current domain names in English scripts.

VIDN intends to solve the problems of earlier methods of internationalized
domain names. VIDN enables the same domain names to be used both in English
scripts, as usual, and in local scripts. There is therefore no need to create
domain names in local scripts in addition to domain names in English scripts.
VIDN works automatically and transparently in applications at user hosts before
DNS requests are sent, so there is no need to change the DNS or to have
additional name servers. For these reasons, among others, VIDN can be
implemented sooner and at lower cost than other methods of internationalized
domain names.

4.2. Description

Most domain names used in regions where English scripts are not widely used
have entity-defined portions consisting of English scripts transliterated from
local scripts. Of course, many domain names in those regions do not follow this
kind of transliteration between local and English scripts. In such cases, new
domain names in English scripts need to be created following this
transliteration. Their number would be minimal, however, compared to the number
of internationalized domain names in local scripts that would have to be created
and registered under other methods.

The English scripts transliterated from local scripts do not have any meaning in
the English language. Their originals in local scripts, before transliteration,
do have meanings in the respective local language, usually indicating
organization names, brand names, trademarks, and so on. VIDN enables these
original local scripts to be used as the entity-defined portions of virtual
domain names in local scripts, by transliterating them into the corresponding
entity-defined portions of actual domain names in English scripts. In this way,
VIDN enables the same domain names in English scripts to be used virtually in
local scripts without actually creating domain names in local scripts.

Just as domain names in English scripts overlay IP addresses, virtual domain
names in local scripts overlay actual domain names in English scripts. The
relationship between virtual domain names in local scripts and actual domain
names in English scripts can be depicted as:

         +-----------------------------------------+
         |                  User                   |
         +-----------------------------------------+
              |                               |
 +------------|-------------------------------|---------------+
 |            v       (Transliteration)       v               |
 | +---------------------+    |   +-----------------------+   |
 | | Virtual domain name |    |   |  Actual domain name   |   |
 | |  in local scripts   |----+-->|  in English scripts   |   |
 | +---------------------+        +-----------------------+   |
 |  User application                          |               |
 +--------------------------------------------|---------------+
                                              v
                                        DNS requests

VIDN uses the phonemes of local and English scripts as a medium in
transliterating the entity-defined portions of virtual domain names in local
scripts into those of actual domain names in English scripts. This process of
transliteration can be depicted as:

          Local scripts                          English scripts
 +------------------------------+        +------------------------------+
 | Characters ----> Phonemes --------------> Phonemes ----> Characters  |
 |  (Inverse of transcription)  |  Match |         (Transcription)      |
 +------------------------------+        +------------------------------+
       |                                                        ^
       |                   (Transliteration)                    |
       +--------------------------------------------------------+

The transliteration proceeds in four steps:

1. Each entity-defined portion of a virtual domain name typed in local scripts
   is decomposed into individual characters or sets of characters, so that each
   individual character or set of characters represents an individual phoneme
   of the local language. This is the inverse of transcription of phonemes into
   characters.

2. Each individual phoneme of the local language is matched with an equivalent
   phoneme of the English language that has the same or the most proximate
   sound.

3. Each phoneme of the English language is transcribed into the corresponding
   character or set of characters in English scripts.

4. All the characters or sets of characters converted into English scripts are
   joined to compose the corresponding entity-defined portion of an actual
   domain name in English scripts.
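For Korean, step 1 can rely on an arithmetic property of Unicode. Every precomposed Hangul syllable (U+AC00 to U+D7A3) decomposes into an initial consonant, a vowel and an optional final consonant, and each of these corresponds roughly to one phoneme. The Python sketch below is illustrative only and rests on several assumptions:

* the tables roughly follow the Revised Romanization of Korean;
* steps 2 and 3 are collapsed into a single lookup;
* sound changes across syllable boundaries are ignored.

```python
from typing import Tuple

HANGUL_BASE = 0xAC00
N_VOWELS, N_FINALS = 21, 28

# Steps 2+3 collapsed (assumption): each jamo, roughly one Korean phoneme,
# maps directly to an English spelling, loosely after Revised Romanization.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Step 1: split one Hangul syllable into (initial, vowel, final) indices."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < len(INITIALS) * N_VOWELS * N_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, N_VOWELS * N_FINALS)
    vowel, final = divmod(rest, N_FINALS)
    return initial, vowel, final


def transliterate(label: str) -> str:
    """Steps 1-4: decompose, map each phoneme to English letters, join."""
    parts = []
    for ch in label:
        if ch.isascii():
            parts.append(ch.lower())  # already in English scripts
            continue
        initial, vowel, final = decompose(ch)
        parts.append(INITIALS[initial] + VOWELS[vowel] + FINALS[final])
    return "".join(parts)


print(transliterate("세기"), transliterate("중앙"), transliterate("야후"))
# segi jungang yahu
```

With these tables, '세기' yields 'segi' and '중앙' yields 'jungang'. '야후', however, yields 'yahu' rather than 'yahoo'. This shows why a single fixed table is not enough and why variant spellings have to be considered.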

For example, the Korean word '세기', which means 'century' in English, is
transliterated into 'segi' in English scripts. An entity whose name contains
'세기' may therefore have 'segi' as an entity-defined portion of its domain name.
VIDN enables '세기' to be used as an entity-defined portion of a virtual domain
name in Korean scripts, which is converted into 'segi', the corresponding
entity-defined portion of an actual domain name in English scripts. In other
words, the phonemes represented by the characters composing '세기' in Korean
scripts have the same sounds as the phonemes represented by the characters
composing 'segi' in English scripts. In the local context, '세기' in Korean
scripts is clearly easier to remember and type, and more intuitive and
meaningful, than 'segi' in English scripts.

An entity-defined portion of a virtual domain name in Korean scripts, '야후', is
transliterated into 'yahoo' in English scripts. The phonemes represented by the
characters composing '야후' in Korean scripts have the same sounds as the
phonemes represented by the characters composing 'yahoo' in English scripts.
That is, '야후' in Korean scripts is pronounced the same as 'yahoo' in English
scripts, so Korean-speaking people can easily recognize '야후' in Korean scripts
as the virtual equivalent of 'yahoo' in English scripts.

VIDN thus enables virtual domain names in local scripts to be used both for
domain names whose originals are in local scripts, e.g., '세기' in Korean
scripts, and for domain names whose originals are in English scripts, e.g.,
'야후' in Korean scripts. In this way, VIDN is able to make domain names truly
international, allowing the same domain names to be used in both English and
local scripts.

The coded portions of domain names, such as generic codes and country codes, can
also be transliterated from local scripts into English scripts, using their
phonemes as a medium. For example, the seven generic codes in English scripts,
'com', 'edu', 'gov', 'int', 'mil', 'net', and 'org', can be transliterated from
'컴', 'Àí´€', '—<>¦Š', 'ÁðË«', ' Ï', 'þÚË«', and 'ÀÁ˜Ú' in Korean scripts,
respectively. These can be used as the corresponding generic codes of virtual
domain names in Korean scripts.

Alternatively, each coded portion of actual domain names can be pre-assigned a
virtual equivalent word or code in local scripts, based upon its meaning in the
English language. For example, the seven generic codes 'com', 'edu', 'gov',
'int', 'mil', 'net', and 'org' can be pre-assigned the following Korean words,
respectively, which can be used as the corresponding generic codes of virtual
domain names in Korean scripts:

'˜‚¾•' (meaning 'commercial' in Korean), 'ÌϘþ' (meaning 'education'),
'Âñ¦ð' (meaning 'government'), '˜ ª' (meaning 'international'), '˜¦À‹'
(meaning 'military'), 'þÚË«' (meaning 'network'), and '³›È' (meaning
'organization').

VIDN avoids the complexities that semantics-based conversion methods create,
because it uses phonemes as the medium of transliteration between local and
English scripts. Further, most languages have a small number of phonemes. For
example, the Korean language has nineteen consonant phonemes and twenty-one
vowel phonemes, and the English language has twenty-four consonant phonemes and
twenty vowel phonemes. Each phoneme of the Korean language can be matched with a
phoneme of the English language that has the same or a proximate sound, and
vice versa.

Some characters or sets of characters may represent more than one phoneme, and
some phonemes may be represented by more than one character or set of
characters. Also, not every character or set of characters in local scripts can
be neatly transliterated into exactly one character or set of characters in
English scripts. In practice, people often transliterate the same local scripts
differently into English scripts, or vice versa.

VIDN incorporates provisions to deal with the variations that usually occur in
particular situations, as well as those caused by common usage or idiomatic
expressions. More fundamentally, VIDN uses phonemes as the medium of
transliteration, because phonemes are broadly shared across different
languages. It does not follow a fixed set of transliteration rules: such rules
do not exist in many non-English-speaking countries and are not followed by
many non-English-speaking people.

One virtual domain name typed in local scripts may be converted into more than
one possible domain name in English scripts. In such cases, VIDN can search for
and display only those domain names in English scripts that are active on the
Internet, so that the user can choose any of them.

VIDN can also be used as a directory-search solution at an upper layer above
the DNS. That is, the user can use VIDN to:

1. submit a phoneme-based domain name request in local scripts;

2. receive one or more corresponding domain names, preferably in English or
   ASCII-compatible scripts;

3. choose one based upon the results of that search; and

4. make the final DNS request using whatever protocol or method is chosen for
   internationalized domain names.

For this directory search, VIDN uses a one-to-many mapping between virtual
domain names in local scripts and actual domain names in English scripts.
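A minimal Python sketch of this one-to-many mapping follows. It makes three assumptions:

* the romanized units of a name are already known;
* a hypothetical variant table lists alternative English spellings for some units;
* "active" is tested by asking the local resolver through `socket.getaddrinfo`, which is only one possible definition.

The variant table is chosen so that the units of 'jungang' expand to the four spellings used in the 'jungang.com' example of this section.

```python
import socket
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical variant table: romanized unit -> alternative English spellings.
VARIANTS: Dict[str, List[str]] = {"j": ["j", "ch"], "u": ["u", "oo"]}


def candidates(units: Sequence[str]) -> List[str]:
    """Expand a sequence of romanized units into every spelling combination."""
    choices = [VARIANTS.get(unit, [unit]) for unit in units]
    return ["".join(combo) for combo in product(*choices)]


def resolves(name: str) -> bool:
    """One possible 'active' test: the name resolves via the local resolver."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def active_candidates(units: Sequence[str], suffix: str,
                      is_active: Callable[[str], bool] = resolves) -> List[str]:
    """Return only the candidate domain names that pass the 'active' test."""
    names = [label + suffix for label in candidates(units)]
    return [name for name in names if is_active(name)]


# Units for the Korean name romanized as 'jungang' (j-u-ng-a-ng).
print(candidates(["j", "u", "ng", "a", "ng"]))
# ['jungang', 'joongang', 'chungang', 'choongang']
```

Passing `is_active` as a parameter keeps the search testable without network access. It also leaves room for a different definition of "active", such as the code check of the coding scheme described later in this section.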

VIDN needs the one-to-many mapping, and the multiple DNS lookups that follow it,
only at the first query of each virtual domain name typed in local scripts at
the user host. After the first query, the virtual domain name is bound to the
domain name in English scripts chosen at that first query. Any subsequent query
with the same virtual domain name then generates only one query, with the
selected domain name in English scripts. That is, once the user selects one
possible domain name in English scripts from the list, VIDN remembers the
user's selection and directs the user to the same domain name in his or her
subsequent queries with that virtual domain name. In this way, VIDN can generate
less traffic on the DNS, while providing the user faster, easier, and simpler
navigation on the Internet in local scripts.
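The remembered selection can be modeled as a small per-host cache. In this Python sketch, `lookup` stands for the one-to-many search with its multiple DNS lookups, and `choose` stands for the prompt that lets the user pick a name. Both are hypothetical callables. Persisting the cache across sessions is left out.

```python
from typing import Callable, Dict, List


class SelectionCache:
    """Remembers which English-script domain name the user chose for each
    virtual domain name, so the expensive search runs only once per name."""

    def __init__(self, lookup: Callable[[str], List[str]],
                 choose: Callable[[List[str]], str]) -> None:
        self._lookup = lookup   # virtual name -> active candidate names
        self._choose = choose   # asks the user to pick one candidate
        self._chosen: Dict[str, str] = {}

    def resolve(self, virtual_name: str) -> str:
        if virtual_name not in self._chosen:
            # First query: one-to-many search, then the user's choice.
            options = self._lookup(virtual_name)
            self._chosen[virtual_name] = self._choose(options)
        # Later queries: a single remembered name, no repeated search.
        return self._chosen[virtual_name]
```

Only the first `resolve` call for a given virtual name reaches `lookup`. Every later call returns the stored choice.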

Using a coding scheme, VIDN can also make each virtual domain name typed in
local scripts correspond to exactly one actual domain name in English scripts.
The scheme works as follows:

1. A unique code, such as the Unicode or hexadecimal code of the virtual domain
   name, is pre-assigned to one of the corresponding domain names in English
   scripts. The code is stored at the respective server host, so that both the
   user host and the server host can support and understand it.

2. VIDN checks whether the code at each server host matches the code generated
   at the user host.

3. If one of the server hosts stores a code that matches the code generated at
   the user host, the virtual domain name typed at the user host is recognized
   as corresponding only to the domain name of that server host, and the user
   host is connected to that server host.

4. The domain names of the remaining server hosts, which do not have the
   matching code, are also displayed at the user host as alternative sites.

Because a unique code is assigned to only one of the domain names in English
scripts, it does not cause any domain name squatting problem beyond what we
experience with current domain names in English scripts. Unique codes do not
need to be stored in any specific format: they can be embedded in HTML, XML,
WML, and so on, as long as the user host can interpret the retrieved code
correctly. Likewise, unique codes do not require any specific intermediate
transport protocol such as TCP/IP. The only requirement is that the protocol be
understood by all participating user hosts and server hosts. For security
purposes, this coding scheme may use an encryption technique.

For example, '중앙.컴', a virtual domain name typed in Korean scripts, may
result in four corresponding domain names in English scripts: 'jungang.com',
'joongang.com', 'chungang.com', and 'choongang.com'. This is because the
phonemes represented by the characters composing '중앙.컴' in Korean scripts can
have the same or almost the same sounds as the phonemes represented by the
characters composing each of these four names in English scripts.

In this case, assume that the server host with the domain name 'jungang.com' has
the pre-assigned code that matches the code generated when '중앙.컴' in Korean
scripts is entered in user applications. The user host is then connected to
this server host. The other server hosts may be listed to the user as
alternative sites, so that the user can try them.
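A Python sketch of the code check in this example follows. The draft leaves the form of the unique code open. Here it is assumed, purely for illustration, to be the hexadecimal form of the virtual name's UTF-8 octets. The codes published by the candidate server hosts are passed in as a dictionary, with None marking a host that is not active, instead of being retrieved from HTML or another format. `virtual_code` and `match_code` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def virtual_code(virtual_name: str) -> str:
    """Assumed unique code: hex string of the name's UTF-8 octets."""
    return virtual_name.encode("utf-8").hex()


def match_code(virtual_name: str,
               server_codes: Dict[str, Optional[str]]
               ) -> Tuple[Optional[str], List[str]]:
    """Return (host to connect to, alternative sites).

    server_codes maps each candidate domain name to the code its server
    host publishes, or to None when the host is not active.
    """
    wanted = virtual_code(virtual_name)
    connected: Optional[str] = None
    alternatives: List[str] = []
    for domain, code in server_codes.items():
        if code is None:
            continue                      # not active: not listed at all
        if connected is None and code == wanted:
            connected = domain            # the unique match
        else:
            alternatives.append(domain)   # active, but no matching code
    return connected, alternatives
```

Suppose 'jungang.com' publishes `virtual_code("중앙.컴")`, 'chungang.com' publishes some other code, and the other two names are not active. Then the result is `("jungang.com", ["chungang.com"])`, mirroring the example above.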

The process by which this coding scheme makes each virtual domain name in local
scripts correspond to only one actual domain name in English scripts can be
depicted as:

         +-----------------------------------------+
         |                  User                   |
         +-----------------------------------------+
              |                             |
 +------------|-----------------------------|-------------------+
 |            v                             v                   |
 | +---------------------+     +-------------------------+      |
 | | Virtual domain name |     | Potential domain names  |      |
 | | in a local language |---->|       in English        |      |
 | |   e.g., '중앙.컴'   |     |  e.g., 'jungang.com'    |      |
 | |   (code: 297437)    |     |        'joongang.com'   |      |
 | |                     |     |        'chungang.com'   |      |
 | |                     |     |        'choongang.com'  |      |
 | +---------------------+     +-------------------------+      |
 |  User application                                |           |
 +--------------------------------^-----------------|-----------+
                                  |                 |  Code check by VIDN
                    Connection to |                 |      +-- 'jungang.com'
                  the server host |                 |      |   (code: 297437)
                    'jungang.com' |                 |      |-- 'joongang.com'
                                  |                 |------+   (not active)
                                  |                 |      |-- 'chungang.com'
                                  |                 |      |   (code: 381274)
                                  | DNS request and |      +-- 'choongang.com'
                                  |    response     |          (not active)
                                  +-----------------+

Since VIDN converts the entity-defined portions and the coded portions of a
virtual domain name separately, it preserves the current syntax of domain names,
that is, the hierarchical dotted notation with which Internet users are
familiar. VIDN also allows the user to type a virtual domain name that mixes
local and English scripts. This works because the conversion takes place on each
individual portion of the domain name, and on each individual character or set
of characters within the portion.
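A Python sketch of the portion-by-portion conversion follows. It assumes a hypothetical table of coded portions, in which only '컴' for 'com' is filled in, and a caller-supplied function that transliterates entity-defined portions, for example a phoneme-based transliterator. ASCII labels pass through unchanged, which is what lets a mixed-script name such as '중앙.com' work. A label that mixes scripts internally is handed to the transliterator as a whole.

```python
from typing import Callable, Dict

# Hypothetical table of coded portions in Korean scripts.
CODED_PORTIONS: Dict[str, str] = {"컴": "com"}


def convert_virtual_name(virtual_name: str,
                         transliterate_label: Callable[[str], str]) -> str:
    """Convert each dot-separated portion independently, keeping the dots."""
    converted = []
    for label in virtual_name.split("."):
        if label in CODED_PORTIONS:
            converted.append(CODED_PORTIONS[label])       # coded portion
        elif label.isascii():
            converted.append(label.lower())               # already English scripts
        else:
            converted.append(transliterate_label(label))  # entity-defined portion
    return ".".join(converted)
```

Because the dots are never touched, the hierarchical structure of the name survives the conversion unchanged.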

While VIDN preserves the hierarchical dotted notation of current domain names,
the principles of VIDN also apply to domain names in other possible notations,
such as those in a natural language (e.g., 'microsoft windows' rather than
'windows.microsoft.com'). They can also be applied to other identifiers used on
the Internet, allowing those identifiers to be used interchangeably in local and
English scripts without creating additional identifiers in local scripts.
Examples include user IDs in e-mail addresses, names of directories and folders,
names of web pages and files, and keywords used in search engines and directory
services.

The conversion of VIDN can be done between any two sets of scripts. Thus, even
if the DNS comes to accept and register domain names in local scripts other than
English, VIDN can allow the same domain names to be used in any two sets of
scripts, by converting virtual domain names in one set of scripts into actual
domain names in another set of scripts.

4.3. Development and implementation

In a preferred arrangement, the development of VIDN for each set of local
scripts may be administered by one or more local standards bodies in regions
where the local scripts are widely used, in consultation with experts on the
phonemics and linguistics of the respective local language and English.
Examples are the Korea Network Information Center for Korean scripts, the Japan
Network Information Center for Japanese scripts, and the China, Hong Kong, and
Taiwan Network Information Centers for Chinese scripts.

The unique codes for one-to-one mapping between virtual domain names in local
scripts and actual domain names in English scripts can be administered by a
central standards body such as IANA. Alternatively, as with the development of
VIDN, the unique codes for each set of local scripts may be administered by one
or more local standards bodies in regions where the local scripts are widely
used.
|
||||
|
||||
VIDN is implemented in applications at the user host. That is, the conversion of
virtual domain names in local scripts into the corresponding actual domain names
in English scripts takes place at the user host before DNS requests are sent.
Thus, neither a special encoding nor a separate lookup service is needed to
implement VIDN. VIDN is also modular, with each module converting virtual
domain names in one set of local scripts into the corresponding actual domain
names in English scripts. A user needs only the module for his or her preferred
set of local scripts. Alternatively, VIDN can be implemented at a central server
host or a cluster of local server hosts. A central server can provide the
conversion service for all sets of local scripts, or a cluster of local server
hosts can share the conversion service. In the latter case, each local server
host can provide the conversion service for one or more sets of local scripts
used in a certain region.

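As an illustration of the modular, client-side conversion described above,
the following Python sketch applies one module per set of local scripts,
label by label, and returns every candidate actual name so that the user can
choose among them (as the Korean add-on described in section 4.4 does). The
module interface, the `MODULES` registry, and the table entries (alternative
romanizations of two Korean words) are hypothetical; VIDN's actual
transliteration rules are not reproduced here.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical module interface: one local-script label in, candidate
# English-script (ASCII) labels out.
Module = Callable[[str], List[str]]

# Illustrative table only; it is NOT VIDN's actual transliteration data.
KOREAN_TABLE: Dict[str, List[str]] = {
    "한국": ["hanguk", "hankook"],
    "서울": ["seoul"],
}


def korean_module(label: str) -> List[str]:
    if label.isascii():
        return [label.lower()]  # labels already in ASCII pass through
    return KOREAN_TABLE.get(label, [])


# One module per set of local scripts; a user installs only the one needed.
MODULES: Dict[str, Module] = {"ko": korean_module}


def candidates(virtual_name: str, script: str) -> List[str]:
    """List every actual (ASCII) domain name a virtual name may map to.

    Runs on the user host, before any DNS query is sent.
    """
    convert = MODULES[script]
    per_label = [convert(label) for label in virtual_name.split(".")]
    if not all(per_label):
        return []  # some label has no known transliteration
    return [".".join(combo) for combo in product(*per_label)]


print(candidates("한국.com", "ko"))  # ['hanguk.com', 'hankook.com']
```

Enumerating candidates as a Cartesian product keeps each module independent
of the others; a real table would likely need ranking, since the candidate
list grows multiplicatively with the number of ambiguous labels.
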
Because of its small size, VIDN can easily be embedded into application
software such as web browsers, e-mail clients, FTP clients, and so on at the
user host, or it can work as an add-on program to such software. In either case,
the only requirement on the part of the user is to install VIDN or software
embedding VIDN at the user host. Using virtual domain names in local scripts in
accordance with the principles of VIDN is very intuitive to those who use the
local scripts. The only requirement on the part of the entity whose server host
provides Internet services to user hosts is to have an actual domain name in
English scripts into which virtual domain names in local scripts are neatly
transliterated in accordance with the principles of VIDN. Most entities in
regions where English scripts are not widely used already have such domain names
in English scripts. Finally, nothing needs to change on the part of the DNS,
since VIDN uses the current DNS as it is.

Taken together, the features of VIDN can meet all the requirements of
internationalized domain names described in Wenzel and Seng [1] with respect
to compatibility and interoperability, internationalization, canonicalization,
and operating issues. Because different approaches to internationalized
domain names confuse users, as has already been observed in regions where
some of these approaches have been commercialized (e.g., Korea, Japan and
China), it is important to find and implement the most effective solution to
internationalized domain names as soon as possible.

4.4. Current status

VIDN has been developed for Korean-English conversion as a web browser add-on
program. The program contains all the features described in this document and is
capable of listing all the domain names in English scripts that correspond to a
virtual domain name typed in Korean scripts, so that a user can choose any of
them. In testing, the program covered more than ninety percent of the sampled
domain names; that is, the results indicate that more than ninety percent of
web sites in Korea can be accessed using virtual domain names in Korean scripts
without creating additional domain names in Korean scripts. The remaining ten
percent of domain names are mostly those that contain acronyms, abbreviations or
initials. With improvement of its knowledge of transliteration, the program is
expected to cover more domain names used in Korea.

5. Security considerations

Because VIDN uses the DNS as it is, it inherits the same security considerations
as the DNS.

6. Intellectual property considerations

It is the intention of DualName, Inc. to submit the VIDN method and other
elements of VIDN software to the IETF for review, comment or standardization.

DualName has applied for one or more patents on the technology related to
virtual domain name software and virtual email software. If a standard is
adopted by the IETF and any patents are issued to DualName with claims that are
necessary for practicing the standard, DualName is prepared to make available,
upon written request, a non-exclusive license under fair, reasonable and
non-discriminatory terms and conditions, based on the principle of reciprocity,
consistent with established practice.

7. References

[1] Wenzel, Z. and J. Seng (Editors), "Requirements of Internationalized
    Domain Names", draft-ietf-idn-requirements-03.txt, August 2000.

8. Author's address

Sung Jae Shim
DualName, Inc.
3600 Wilshire Boulevard, Suite 1814
Los Angeles, California 90010
USA
Email: shimsungjae@dualname.com

@ -1,437 +0,0 @@
INTERNET-DRAFT John C Klensin
21 October 2002
Expires April 2003

National and Local Characters in DNS TLD Names
draft-klensin-idn-tld-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026 except that the
right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

In the context of work on internationalizing the Domain Name System
(DNS), there have been extensive discussions about "multilingual" or
"internationalized" top level domain names (TLDs), especially for
countries whose predominant language is not written in a Roman-based
script. This document reviews some of the motivations for such
domains and the constraints that the DNS imposes. It then suggests
an alternative, local translation, that may solve a superset of the
problem while avoiding protocol changes, serious deployment delays,
and other difficulties.

Table of Contents

1. Introduction
1.1 Background on the "Multilingual Name" Problem
1.2 Domain Name System Constraints
1.3 Internationalization and Localization
2. Client-side solutions
2.1 IDNA and the client
2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
3.2 Unification of country code domains
3.3 User understanding of local and global references
3.4 Limits on TLD propagation
4. Security Considerations
5. References
6. Acknowledgements
7. Author's Address

1. Introduction

1.1 Background on the "Multilingual Name" Problem

People who share a language prefer to communicate in it, using whatever
characters are normally used to write that language, rather than in some
"foreign" one. There have been standards for using mutually-agreed
characters and languages in electronic mail message bodies and selected
headers since the introduction of MIME in 1992 [MIME], and the Web has
permitted multilingual text since its inception. However, since domain
names are exposed to users in email addresses and URLs, and in
corresponding arrangements in other protocols, demand rapidly arose to
permit domain names in applications that used characters other than
those of the very restrictive, ASCII-subset, "LDH" (letters, digits,
hyphen) conventions [LDH]. The effort to do this rapidly became known as
"multilingual domain names", although that is a misnomer, since the DNS
deals only with characters and identifier strings, and not, except by
accident, with what people usually think of as "names". There has also
been little interest in what would actually be a "multilingual name"
-- i.e., a name that contains components from more than one language --
as opposed to the use of strings conforming to different languages in
the context of the DNS.

1.1.1 Approaches to the requirement

If the requirement is seen not as "modifying the DNS" but as
"providing users with access to the DNS from a variety of languages and
character sets", three sets of proposals have emerged in the IETF and
elsewhere. They are:

(1) Perform processing in client software that recodes a user-visible
string into an ASCII-compatible form that can safely be passed
through the DNS protocols and stored in the DNS. This is the
approach used, for example, in the IETF's "IDNA" protocol [IDNA].

(2) Modify the DNS to be more hospitable to non-ASCII names and
strings. There have been a variety of proposals to do this in almost
as many ways, some of which have been implemented on a proprietary
basis by various vendors. None of them has gained acceptance in the
IETF community, primarily because they would take a long time to
deploy and would leave many problems unsolved.

(3) Move the problem out of the DNS entirely, relying instead on a
"directory" or "presentation" layer to handle internationalization.
The rationale for this approach is discussed in [DNSROLE].

This document proposes a fourth approach, applicable to the top level
domains (TLDs) only (see section 1.2.1 for a discussion of the special
issues that make TLDs problematic). That approach could be used as an
alternative to, or a supplement for, the strategies summarized above.

1.1.2 Writing the name of one's country in its own characters

An early focus of the "multilingual domain name" efforts was expressed
in statements such as "users in my country, in which ASCII is rarely
used, should be able to write an entire domain name in their own
character set." In particular, since all top-level domain names, at
present, follow the LDH rules, the somewhat more restrictive naming
rules discussed in [STD3], and the coding conventions specified in
[RFC1591], all fully-qualified DNS names were effectively required to
contain at least one ASCII label (the TLD name), and that was considered
inappropriate. One should, instead, be able to write the name of the
ccTLD for China in Chinese, the name of the ccTLD for Saudi Arabia in
Arabic, and so on.

1.1.3 Countries with multiple languages and countries with multiple names

From a user interface standpoint, writing ccTLD names in local
characters is a problem. As discussed in section 1.2.2, the DNS itself
does not easily permit a domain to be referred to by more than one name
(or spelling or translation of a name). Countries with more than one
official language would require that the country name be represented in
each of those languages. And, just as it is important that a user in
China be able to represent the name of the Chinese ccTLD in Chinese
characters, she should be able to access a Chinese-language site in
France using Chinese characters, requiring that she be able to write the
name of the French ccTLD in those characters rather than in a form based
on a Roman character set.

1.2 Domain Name System Constraints

1.2.1 Administrative hierarchy

The domain name system is designed around the idea of an "administrative
hierarchy", with the entity responsible for a given node of the
hierarchy responsible for policies applicable to its subhierarchies (cf.
[STD13]). The model works quite well for the domain and subdomains of a
particular enterprise, where the hierarchy can be organized to match the
organizational structure, there are established ways to set policies, and
there are, at least presumably, shared assumptions about overall goals
and objectives among all registrants in the domain. It is more
problematic when a domain is shared by unrelated entities which lack
common policy assumptions: it is difficult to reach agreement on rules
that should apply to all of them. That situation always prevails for
the labels registered in a TLD (second-level names), except in those TLDs
for which the second level is structural (e.g., the .CO, .AC, .GOV
conventions in many ccTLDs), in which case it exists for the labels
within that structural level.

TLDs may, but need not, have consistent registration policies for those
second (or third) level names. Countries (or ccTLD administrators) have
often adopted rules about what entities may register in those ccTLDs,
and the forms the names may take. RFC 1591 outlined registration norms
for most of the gTLDs, even though those norms have been largely ignored
in recent years. And some recent "sponsored" domains are based on quite
specific rules about appropriate registrations. Homogeneous
registration rules for the root are, by contrast, impossible: almost by
definition, the subdomains registered in it are diverse and no single
policy applying to all root subdomains (TLDs) is feasible.

1.2.2 Aliases

In an environment different from the DNS, a rational way to permit
assigning local-language names to a country code (or other) domain would
be to set up an alias for the name, or to use some sort of "see instead"
reference. But the DNS does not have quite the right facilities for
either. Instead, it supports a "CNAME" record, which can alias only a
single name and not a subtree. For example, if A.B.C is a
fully-qualified name, then a CNAME reference from X to A would make
X.B.C appear to have the same values as A.B.C. However, a CNAME
reference from Y to C would not make A.B.Y referenceable (or even
defined) at all. A second record type, DNAME [RFC2672], can provide an
alias for a portion of the tree. But it is problematic technically, and
its use is strongly discouraged except for transition uses from one
domain to another.

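The CNAME/DNAME distinction can be made concrete with a minimal Python
sketch that models only owner-name rewriting, not actual resolution, using
the names from the example above. The function names are illustrative.
Following RFC 2672, the DNAME substitution applies only to names strictly
below its owner, not to the owner name itself.

```python
from typing import Dict, List, Optional


def labels(name: str) -> List[str]:
    """Split a domain name into lower-case labels (DNS matching ignores ASCII case)."""
    return name.lower().rstrip(".").split(".")


def apply_cname(qname: str, cnames: Dict[str, str]) -> Optional[str]:
    """A CNAME aliases exactly one owner name; names below it are unaffected."""
    return cnames.get(".".join(labels(qname)))


def apply_dname(qname: str, owner: str, target: str) -> Optional[str]:
    """A DNAME at `owner` rewrites names strictly below `owner` to lie below `target`."""
    q, o = labels(qname), labels(owner)
    if len(q) <= len(o) or q[-len(o):] != o:
        return None  # not a proper descendant of the DNAME owner
    return ".".join(q[:-len(o)] + labels(target))


# CNAME X.B.C -> A.B.C: only X.B.C itself is aliased.
print(apply_cname("X.B.C", {"x.b.c": "a.b.c"}))  # a.b.c
# CNAME Y -> C does nothing for names under Y ...
print(apply_cname("A.B.Y", {"y": "c"}))           # None
# ... whereas DNAME Y -> C maps the whole subtree below Y.
print(apply_dname("A.B.Y", "Y", "C"))             # a.b.c
```
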
1.3 Internationalization and Localization

It has often been observed that while many people talk about
"internationalization" (a term we typically use for making something
globally accessible while incorporating a broad-range "universal"
character set and conventions appropriate to all languages), they often
really mean, and want, "localization" (making things work well in a
particular locality, or well, but potentially differently, for a broad
range of localities). Anything that actually involves the DNS must be
global and hence internationalized, since the DNS cannot meaningfully
support different responses based, e.g., on the location of the user
making a query. While the DNS cannot support localization internally,
many of the features discussed earlier in this section are much more
easily thought about in local terms -- whether localized to a
geographical area, users of a language, or using some other criteria --
than in global ones.

2. Client-side solutions

Traditionally, the IETF has avoided becoming involved in standardization
of actions that take place strictly on individual hosts on the network,
assuming that it should confine itself to behavior that is observable
"on the wire", i.e., in protocols between network hosts. Exceptions to
this general principle have been made when different clients were
required to utilize data or interpret values in compatible ways to
preserve interoperability: the standards for email and web body formats,
and IDNA itself, are examples of these exceptions. Regardless of what
is required to be standardized, it is almost never required, and often
unwise, that a user interface, by default, present on-the-wire formats
to the user. However, in most cases when the presentation format and
the wire format differ, the client program must take precautions to
ensure either that the wire format can be reconstructed from user input
or that the wire format, while hidden, remains bound to the presentation
form so that it can be recovered. And, while it is rarely a goal in
itself, it is often necessary that the user be at least vaguely aware
that the wire ("real") format is different from the presentation one
and that the wire format be available for debugging.

2.1 IDNA and the client

As mentioned above, IDNA itself is entirely a client-side protocol. It
works by providing labels to the DNS in a special format (so-called
"ACE", for ASCII-Compatible Encoding). When labels in that format are
encountered, they are transformed, by the client, back into
internationalized (normally Unicode) characters. In the context of this
document, the important observation about IDNA is that any application
program that supports it is already doing considerable transformation
work on the client; it is not simply presenting the on-the-wire formats
to the user.

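To make the client-side transformation concrete, the sketch below assumes
Python 3, whose standard-library `idna` codec implements IDNA as later
published in RFC 3490 (IDNA2003); the later IDNA2008 rules differ in some
details, and the helper names `to_wire` and `to_display` are illustrative
only.

```python
def to_wire(user_name: str) -> str:
    """User input (Unicode) -> ACE form passed to the DNS (Nameprep + Punycode)."""
    return user_name.encode("idna").decode("ascii")


def to_display(wire_name: str) -> str:
    """ACE labels received from the DNS -> Unicode for presentation."""
    return wire_name.encode("ascii").decode("idna")


wire = to_wire("bücher.example")
print(wire)              # xn--bcher-kva.example
print(to_display(wire))  # bücher.example
```

Only the ACE string ever reaches the resolver; the Unicode form exists purely
for presentation.
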
2.2 Local translation tables for TLD names

We suggest that, in addition to maintaining the code and tables required
to support IDNA, clients may want to maintain a table that contains a
list of TLDs and that maps between them and locally-desirable names.
For ccTLDs, these might be the names (or locally-standard abbreviations)
by which the relevant countries are known locally (whether in ASCII
characters or others). With some care on the part of the application
designer (e.g., to ensure that local forms do not conflict with the
actual TLD names), a particular TLD name input from the user could be
either in local or standard form without special tagging or problems.
When DNS names are received by these client programs, the TLD labels
would be mapped to local form before IDNA is applied to the rest of the
name; when names are received from users, local TLD names would be
mapped to the global ones before being passed into IDNA or for other DNS
processing.

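A minimal sketch of such a table, assuming Python and a simple dictionary
representation. The class name, method names, and the sample entries (the
Korean names for Korea and Japan) are hypothetical. It enforces the
precaution mentioned above, rejecting local forms that collide with real TLD
labels, and maps only the final label, leaving the other labels for IDNA
processing.

```python
from typing import Dict, Set


class TldTable:
    """Hypothetical client-side table mapping local TLD names to global TLD labels."""

    def __init__(self, local_to_global: Dict[str, str], global_tlds: Set[str]):
        # global_tlds: the real TLD labels, in lower case.
        self.to_global: Dict[str, str] = {}
        self.to_local: Dict[str, str] = {}
        for local, glob in local_to_global.items():
            local, glob = local.lower(), glob.lower()
            # A local form must not collide with a real TLD label, or user
            # input ending in that label would be ambiguous.
            if local in global_tlds:
                raise ValueError(f"local name {local!r} conflicts with a real TLD")
            self.to_global[local] = glob
            # The first local form listed is the one used for display.
            self.to_local.setdefault(glob, local)

    def from_user(self, name: str) -> str:
        """User input: replace a local TLD name with the global one (before IDNA)."""
        *rest, tld = name.split(".")
        return ".".join(rest + [self.to_global.get(tld.lower(), tld)])

    def to_user(self, name: str) -> str:
        """Name from the DNS: replace the global TLD with its local form."""
        *rest, tld = name.split(".")
        return ".".join(rest + [self.to_local.get(tld.lower(), tld)])


table = TldTable({"한국": "kr", "일본": "jp"}, {"kr", "jp", "com"})
print(table.from_user("example.한국"))  # example.kr
print(table.from_user("example.kr"))    # example.kr (standard form also accepted)
print(table.to_user("example.kr"))      # example.한국
```

Because the standard TLD label passes through `from_user` unchanged, users
can type either form without tagging. If several local forms map to the same
TLD, the first one listed is used for display; that is a design choice of
this sketch, not something the proposal specifies.
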
3. Advantages and disadvantages of local translation

3.1 Every TLD in the local language and character set

The notion of a top-level domain whose name matches, e.g., the name that
is used for a country in that country or the name of a language in that
language is, as mentioned above, immediately appealing. But most of the
reasons for it argue equally strongly for other TLDs being accessible
from that language. A user in Korea who can access the national ccTLD
in the Korean language and character set has every reason to expect that
both generic top level domains and domains associated with other
countries would be similarly accessible, especially if the second-level
domains bear Korean names. A user in Spain or Portugal, or in Latin
America, would presumably have similar expectations, but would expect to
use Spanish or Portuguese names, not Korean ones.

That level of local optimization is not realistic -- some would argue not
possible -- with the DNS, since it would ultimately require that every top
level domain be replicated for each of the world's languages. That
replication process would involve not just the top level domain itself:
in principle, all of its subtrees would need to be completely replicated
as well (or at least all of the subtrees for which the language
associated with a given replica was relevant). The administrative
hierarchy characteristics of the DNS (see section 1.2.1) turn the
replication process into an administrative nightmare: every
administrator of a second-level domain in the world would be forced to
maintain dozens, probably hundreds, of similar zone files for the
replicas of the domain. Even if only the zones relevant to a
particular country or language were replicated, the administrative and
tracking problems to bind these to the appropriate top-level domain and
keep all of the replicas synchronized would be extremely difficult at
best. And many administrators of third- and fourth-level domains, and
beyond, would be faced with similar problems.

By contrast, dealing with the names of TLDs as a localization problem,
using local translation, is fairly simple. Each function represented by
a TLD -- a country, generic registrations, or purpose-specific
registrations -- could be represented in the local language and
character set as needed. And, for countries with many languages, or
users living in, working in, or visiting countries where their language
was not dominant, "local" could be defined in terms of the needs or
wishes of each particular user.

3.2 Unification of country code domains

It follows from some of the comments above that, while there appears to
be some immediate appeal in having (at least) two domains for each
country, one using the ISO 3166-1 code and another one using a name
based on the national name in the national language, such a situation
would create considerable problems for registrants in the multiple
domains. For registrants maintaining enterprise or organizational
subdomains, ease of administration in a single family of zone files will
usually make a registration in a single top-level domain preferable to
replicated sets of them, at least as long as their functional
requirements (such as local-language access) are met by the unified
structure.

Of course, having replicated domains might be popular with registries
and registrars, since replication would almost inevitably increase the
total number of domains to be registered.

3.3 User understanding of local and global references

While the IDNA tables (actually Nameprep and Stringprep -- see the IDNA
specification) must be identical globally for IDNA to work reliably, the
tables for mapping between local names and TLD names could be locally
determined, and differ from one locale to another, as long as users
understood that international interchange of names required using the
standard forms. That understanding could be assisted by software. It
is likely that, at least for the foreseeable future, DNS names being
passed among users in different countries, or using different languages,
will be forced to be in ACE form to guarantee compatibility in any
event, so the marginal knowledge or effort needed to put TLD names into
standard form and transmit them that way would be very small.

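The small extra step needed to produce the standard interchange form can be
sketched by combining a local table with Python's standard-library `idna`
codec (IDNA as published in RFC 3490). The table contents and the function
name `interchange_form` are hypothetical.

```python
# Hypothetical per-locale table: local TLD name -> standard TLD label.
LOCAL_TLDS = {"한국": "kr"}


def interchange_form(user_name: str) -> str:
    """Standard form for passing a name to users elsewhere: global TLD, ACE labels."""
    *rest, tld = user_name.split(".")
    tld = LOCAL_TLDS.get(tld, tld)          # 1. local TLD name -> standard TLD
    standard = ".".join(rest + [tld])
    return standard.encode("idna").decode("ascii")  # 2. other labels -> ACE


print(interchange_form("bücher.한국"))  # xn--bcher-kva.kr
```
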
3.4 Limits on TLD propagation

The concept of using local translation does have one side-effect, which
some portions of the Internet community might consider undesirable.
The size and complexity of translation tables, and maintaining those
tables, will be, to a considerable extent, a function of the number of
top-level domains, the frequency with which new domains are added, and
the number of domains that are added at a time. A country or other
locale that wished to maintain a full set of translations (i.e., so that
every TLD had a representation in the local language) would presumably
find setting up a table for the current collection of a few hundred
domains to be a task that would take some days. If the number of TLDs
was relatively stable, with a relatively small number being added at
infrequent intervals, the updates could probably be dealt with on an ad
hoc basis. But, if large numbers of domains were added frequently, or
if the total number of TLDs became very large, maintaining the table
might require dedicated staff. Worse, updating the tables stored on
client machines might require update and synchronization protocols and
all of the related complexities.

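The maintenance burden described above is essentially a recurring comparison
between the local table and the current TLD list. A minimal Python sketch of
that comparison follows; how the current TLD list is obtained and
distributed to clients is deliberately left open, and the function name and
sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def audit_table(local_to_global: Dict[str, str],
                current_tlds: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare a local translation table with the current list of TLDs.

    Returns (untranslated, stale): TLDs that still lack a local name, and
    local names whose TLD no longer appears in the list.
    """
    translated = set(local_to_global.values())
    untranslated = current_tlds - translated
    stale = {local for local, tld in local_to_global.items()
             if tld not in current_tlds}
    return untranslated, stale


untranslated, stale = audit_table({"한국": "kr", "일본": "jp"}, {"kr", "com", "org"})
print(sorted(untranslated))  # ['com', 'org']
print(sorted(stale))         # ['일본']
```

Every TLD in `untranslated` needs a local name chosen by the table
maintainer; the more often that set is non-empty, the more the work shifts
from occasional ad hoc updates toward dedicated staffing and synchronization.
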
4. Security Considerations

IDNA provides a client-based mechanism for presenting Unicode names in
applications while passing only ASCII-based names on the wire. As such,
it constitutes a major step along the path of introducing a client-based
presentation layer into the Internet. Client-based presentation layer
transformations introduce risks from variant tables that can change
meaning without external protection. For example, if a mapping table
normally maps A onto C and that table is altered by an attacker so that
A maps onto D instead, much mischief can be committed. On the other
hand, these are not the usual sort of network attacks: they may be
thought of as falling into the "users can always cause harm to
themselves" category. The local translation model outlined here does
not significantly increase the risks over those associated with IDNA,
but may provide some new avenues for exploiting them.

Both this approach and IDNA rely on having updated programs present
information to the user in a very different form than the one in which
it is transmitted on the wire. Unless the internal (wire) form is
always used in interchange, there are possibilities for ambiguity and
confusion about references.

5. References

[DNSROLE] Klensin, J.C., "Role of the Domain Name System", work in
progress (draft-klensin-dns-role-04.txt).

[IDNA] Faltstrom, P., P. Hoffman, A. M. Costello, "Internationalizing
Domain Names in Applications (IDNA)", work in progress
(draft-ietf-idn-idna-13.txt).

[LDH] STD13 and comments

[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", RFC 1341, June 1992. Updated and replaced
by Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996. Also, Moore, K., "Representation of
Non-ASCII Text in Internet Message Headers", RFC 1342, June 1992.
Updated and replaced by Moore, K., "MIME (Multipurpose Internet
Mail Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.

[RFC1591] Postel, J., "Domain Name System Structure and Delegation",
RFC 1591, March 1994.

[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC 2672,
August 1999.

[STD3] Braden, R., Ed., "Requirements for Internet Hosts - Application and
Support", RFC 1123, October 1989.

[STD13] Mockapetris, P.V., "Domain names - concepts and
facilities", RFC 1034, and "Domain names - implementation and
specification", RFC 1035, November 1987.

6. Acknowledgements

This document was inspired by a number of conversations in ICANN, IETF,
MINC, and private contexts about the future evolution and
internationalization of top level domains. Discussions within, and
about, the ICANN IDN Committee have been particularly helpful, although
several of the members of that committee may be surprised about where
those discussions led.

7. Author's Address

John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140 USA
email: john+ietf@jck.com