bind/doc/design/search

Copyright (C) Internet Systems Consortium, Inc. ("ISC")

See COPYRIGHT in the source root or http://isc.org/copyright.html for terms.

$Id: search,v 1.10 2004/03/05 05:04:47 marka Exp $

What follows is pseudocode for the zone and cache lookup algorithms, as they
will work in the RBT DB.

Note: These algorithms differ in some respects from those discussed in
the RFCs and drafts.  I believe these algorithms provide better
answers in some cases.


Preliminary Stuff

BIND 9 zone databases are versioned, and every search is done in the
context of some version.  There are a number of ways of implementing
versioning.  The method that's going to be used in the RBT DB is to
store a serial number with every rdataset.  All rdatasets added as the
result of a single database update have the same serial number.  This
serial number is not related to the SOA serial, since the SOA serial
is under user control and can do weird things.  The database serial
number is a monotonically increasing value.  When you go to retrieve
an rdataset, you may encounter many rdatasets of that type at any
given node.  The correct one to return, called the "active rdataset",
has the greatest serial number less than or equal to the serial number
used for the search.  The version whose serial number is being used in
the search is the "target version".

Cache databases are not versioned.  A search will always return the
most recent value.

DKZC == Deepest Known Zone Cut.  This is the zone cut closest to the
desired name.  In a zone, it's either a delegation out of authoritative
data, or it's the top of the zone.

ZC == "zone cut", a node not at the zone top which has an active NS
rdataset, or a node (including the zone top) with an active DNAME
rdataset.


Zone Search Algorithm

	Inputs:
		Search name
		Search rdata type		(including ANY)
		Search options

		The search options parameter is a flags variable.  Current
		flags are

			Glue OK			If set, then the caller is
						wants best match results for
						the search name, even if it's
						glue.  If not set, the caller
						will get a delegation if the
						search name is glue.

			Glue Validation		Section 7.18 of RFC 2136
						requires that certain data that
						is not in the zone and is not
						glue remain stored in the zone.
						A search can never return this
						data, but there might be glue
						mixed in with it.  Telling glue
						from non glue involves some
						work, especially since the
						database is versioned.  Often,
						however, the caller will know
						the name it's looking for is
						glue, so validation isn't
						required.

	Outputs:
		result code
		a node
		the name of the node
		rdataset			(not bound if querying for ANY)

		Note:  The node, name, and rdataset are optional.  If the
		caller doesn't care about them, they won't be set.

	Note: there is no EDNS1 "longest match" support in the algorithm yet,
	though I know how to do it.


	cname_ok = yes
	search_must_succeed = no

	Search down from the root of the tree.  If, while going down, we
	encounter a zone cut node, then search the rdatasets at the zone
	cut for active DNAME or NS rdatasets.  Note that if we find both
	an active DNAME rdataset and an active NS rdataset, then the DNAME
	rdataset has precedence.

		If we found an active DNAME rdataset, the search ends here.
			result = DNS_R_DNAME
			foundname = name of this node
			*nodep = this node
			rdataset is the DNAME
			return

		If we found an active NS rdataset
			If finding glue is not OK, or we're not searching for
			a glue type, then the search ends here.
				result = DNS_R_DELEGATION
				foundname = name of this node
				*nodep = this node
				rdataset = NS
				return
			Else
				We remember that this node is the ZC.
				We remember this node's name.
				We'll ignore any zone cuts found further down
				the tree.
				Continue the search down.

 Partial_Match:
	If we don't have an exact match to the name
		If we're below a zone cut, then we need to return a referral.
			result = DNS_R_DELEGATION;
			foundname = ZC name
			*nodep = ZC
			rdataset = NS
			return
		Else If this zone has any wildcards, then
			Go looking for a wildcard match for this name.
			If we found one,
				result = DNS_R_WILDCARD
				foundname = wildcard node name
				Fall through to searching the wildcard node
				for the desired type.
		Else
			NXDOMAIN	(finally!)
			If this is a secure zone then
				Find the greatest predecessor to this node
				that has at least one active rdataset.
				Change the type we're search for to NXT
				cname_ok = no
				search_must_succeed = yes
			Else
				result = DNS_R_NXDOMAIN
				foundname = <empty>
				rdataset = <unbound>
				*nodep = NULL
				return

	If we're here, then we've got a node and are now trying to find
	an active rdataset of the desired type, or, in the case of an ANY
	query, any active rdataset.

	If we're beneath a zone cut
		cname_ok = no
		If the caller wants us to validate glue, then see if the
		current name is a valid glue name for the ZC.
			If not,
				result = DNS_R_DELEGATION;
				foundname = ZC name
				*nodep = ZC
				rdataset = NS
				return

	If the desired type is KEY, SIG, or NXT, then
		cname_ok = no

	foundname = current node name
	*nodep = current node;

	Search the rdataset list for the desired type.  If cname_ok, also
	look for a CNAME rdataset.  While searching, remember the active NXT
	rdataset if we come across it.  We must also determine if there are
	any active rdatasets at the node.

	If there are no active rdatasets at the node, then we've got an
	exact name match, but the name doesn't exist in the desired version.
	This means we really have a partial match.  Goto Partial_Match.

	If we didn't find the type we were looking for (including a failed
	ANY search)
		If (search_must_succeed), then
			The database is bad, e.g. missing NXT records.
			result = DNS_R_BADDB
			*nodep = NULL
			foundname = <empty>
		Else if we're beneath a zone cut
			result = DNS_R_DELEGATION
			foundname = ZC name
			*nodep = ZC
			rdataset = NS
		Else
			result = DNS_R_NXRDATASET
			If this is a secure zone then
				If we found an active NXT rdataset
					rdataset = NXT rdataset
				Else
					result = DNS_R_BADDB
					*nodep = NULL
					foundname = <empty>
			Else
				rdataset = <unbound>
		return

	We have found the type we were looking for or we've found a CNAME.

	If we're not doing any ANY query, didn't find the type we were looking
	for, but did find a CNAME
		result = DNS_R_CNAME
		rdataset = CNAME
	Else If we're beneath a zone cut
		result = DNS_R_GLUE
	Else
		result = DNS_R_SUCCESS

	If type is ANY
		rdataset = <unbound>
	else
		rdataset = the type we were looking for


XXX This is now old XXX

Now for the cache lookup algorithm, which is a little different.  The
cache algorithm takes an optional "zone DKZC".  Say a server is
authoritative for vix.com but not rc.vix.com.  When it looks up
bb.rc.vix.com it will search vix.com and discover the delegation to
rc.vix.com.  We then want to look in the cache for bb.rc.vix.com, and
if we don't find it, the authoritative delegation might be the best
DKZC (since there might not be anything for rc.vix.com in the cache),
so that's why we allow it to be an argument to the cache search
algorithm.  Of course, the cache might have data for rc.vix.com
cached, in which case we should use it and not the DKZC.

DKZC A is "better" than DKZC B if DKZC A is a proper subdomain of DKZC
B.


Cache Search Algorithm:

	Go down as far as possible remembering every parent node.
	Remember the predecessor too.

	If some rdataset for name exists

		Look for desired type or CNAME

		If found
			If negative cache entry
				Indicate this and return.
			If CNAME?
				Indicate it and return.
			Return.
		Else
			Indicate we know nothing about this type at this
			node.
			Return.

	Else
		(Peek at predecessor to see if it has an NXT for the same
		 zone and which covers the QNAME.  If so, return it.)

		Go up until we find a node with a DNAME or a zone cut.
		XXX DNAME draft says go up until you prove that there are no
		    ancestor DNAMEs at all XXX

		If there's a DNAME
			Return a DNAME result with the dname node and node name
			XXX what if the zone DKZC is better (i.e. deeper)? XXX

		We know nothing about this name.

		XXX DNAME draft says that if we have a zone DKZC, we should
		    use it now.  I say use the best DKZC you've got. XXX

		If we get all the way to '.' and we don't even have the
		root NS records
			If we have a DKZC from authoritative data
				Return it.
			Else
				Return NO_KNOWN_AUTHORITY
				(this will cause priming of root servers or,
				 perhaps, forwarding)

		If we have a zone DKZC and it's better than the one we found
		in the cache
			Return it (node and name).

		Return the cache DKZC (node and name).