diff --git a/doc/design/cc-protocol.txt b/doc/design/cc-protocol.txt new file mode 100644 index 0000000000..0129530878 --- /dev/null +++ b/doc/design/cc-protocol.txt @@ -0,0 +1,296 @@ +protocol version 0x536b616e + +DATA 0x01 +HASH 0x02 +LIST 0x03 +NULL 0x04 +TYPE_MASK 0x0f + +LENGTH_32 0x00 +LENGTH_16 0x10 +LENGTH_8 0x20 +LENGTH_MASK 0xf0 + + +MESSAGE ENCODING +---------------- + +When decoding, the entire message length must be known. If this is +transmitted over a raw stream such as TCP, this is usually encoded +with a 4-byte length followed by the message itself. If some other +wrapping is used (say as part of a different message structure) the +length of the message must be preserved and included for decoding. + +The first 4 bytes of the message is the protocol version encoded +directly as a 4-byte value. Immediately following this is a HASH +element. The length of the hash element is the remainder of the +message after subtracting 4 bytes for the protocol version. + +This initial HASH is intended to be used by the message routing system +if one is in use. + + +ITEM TYPES +---------- + +There are four basic types encoded in this protocol. A simple data +blob (DATA), a tag-value series (HASH), an ordered list (LIST), and +a NULL type (which is used internally to encode DATA types which are +empty and can be used to indicate existance without data in a hash.) + +Each item can be of any type, so a hash of hashes and hashes of lists +are typical. + +All multi-byte integers which are encoded in binary are in network +byte order. + + +ITEM ENCODING +------------- + +Each item is preceeded by a single byte which describes that item. +This byte contains the item type and item length encoding: + + Thing Length Description + ---------------- -------- ------------------------------------ + TyLen 1 byte Item type and length encoding + Length variable Item data blob length + Item Data variable Item data blob + +The TyLen field includes both the item data type and the item's +length. The length bytes are encoded depending on the length of data +portion, and the smallest data encoding type supported should be +used. Note that this length compression is used just for data +compactness. It is wasteful to encode the most common length (8-bit +length) as 4 bytes, so this method allows one byte to be used rather +than 4, three of which are nearly always zero. + + +HASH +---- + +This is a tag/value pair where each tag is an opaque unique blob and +the data elements are of any type. Hashes are not encoded in any +specific tag or item order. + +The length of the HASH's data area is processed for tag/value pairs +until the entire area is consumed. Running out of data prematurely +indicates an incorrectly encoded message. + +The data area consists of repeated items: + + Thing Length Description + ---------------- -------- ------------------------------------ + Tag Length 1 byte The length of the tag. + Tag Variable The tag name + Item Variable Encoded item + +The Tag Length field is always one byte, which limits the tag name to +255 bytes maximum. A tag length of zero is invalid. + + +LIST +---- + +A LIST is a list of items encoded and decoded in a specific order. +The order is chosen entirely by the source curing encoding. + +The length of the LIST's data is consumed by the ITEMs it contains. +Running out of room prematurely indicates an incorrectly encoded +message. + +The data area consists of repeated items: + + Thing Length Description + -------------- ------ ---------------------------------------- + Item Variable Encoded item + + +DATA +---- + +A DATA item is a simple blob of data. No further processing of this +data is performed by this protocol on these elements. + +The data blob is the entire data area. The data area can be 0 or more +bytes long. + +It is typical to encode integers as strings rather than binary +integers. However, so long as both sender and recipient agree on the +format of the data blob itself, any blob encoding may be used. + + +NULL +---- + +This data element indicates no data is actually present. This can be +used to indicate that a tag is present in a HASH but no data is +actually at that location, or in a LIST to indicate empty item +positions. + +There is no data portion of this type, and the encoded length is +ignored and is always zero. + +Note that this is different than a DATA element with a zero length. + + +EXAMPLE +------- + +This is Ruby syntax, but should be clear enough for anyone to read. + +Example data encoding: + +{ + "from" => "sender@host", + "to" => "recipient@host", + "seq" => 1234, + "data" => { + "list" => [ 1, 2, nil, "this" ], + "description" => "Fun for all", + }, +} + + +Wire-format: + +In this format, strings are not shown in hex, but are included "like +this." Descriptions are written (like this.) + +Message Length: 0x64 (100 bytes) +Protocol Version: 0x53 0x6b 0x61 0x6e +(remaining length: 96 bytes) + +0x04 "from" 0x21 0x0b "sender@host" +0x02 "to" 0x21 0x0e "recipient@host" +0x03 "seq" 0x21 0x04 "1234" +0x04 "data" 0x22 + 0x04 "list" 0x23 + 0x21 0x01 "1" + 0x21 0x01 "2" + 0x04 + 0x21 0x04 "this" + 0x0b "description" 0x0b "Fun for all" + + +MESSAGE ROUTING +--------------- + +The message routing daemon uses the top-level hash to contain routing +instructions and additional control data. Not all of these are +required for various control message types; see the individual +descriptions for more information. + + Tag Description + ------- ---------------------------------------- + msg Sender-supplied data + from sender's identity + group Group name this message is being sent to + instance Instance in this group + repl if present, this message is a reply. + seq sequence number, used in replies + to recipient or "*" for no specific receiver + type "send" for a channel message + + +"type" is a DATA element, which indicates to the message routing +system what the purpose of this message is. + + +Get Local Name (type "getlname") +-------------------------------- + +Upon connection, this is the first message to be sent to the control +daemon. It will return the local name of this client. Each +connection gets its own unique local name, and local names are never +repeated. They should be considered opaque strings, in a format +useful only to the message routing system. They are used in replies +or to send to a specific destination. + +To request the local name, the only element included is the + "type" => "getlname" +tuple. The response is also a simple, single tuple: + "lname" => "UTF-8 encoded local name blob" + +Until this message is sent, no other types of messages may be sent on +this connection. + + +Regular Group Messages (type "send") +------------------------------------ + +When sending a message: + +"msg" is the sender supplied data. It is encoded as per its type. +It is a required field, but may be the NULL type if not needed. +In OpenReg, this was another wire format message, stored as an +ITEM_DATA. This was done to make it easy to decode the routing +information without having to decode arbitrary application-supplied +data, but rather treat this application data as an opaque blob. + +"from" is a DATA element, and its value is a UTF-8 encoded sender +identity. It MUST be the "local name" supplied by the message +routing system upon connection. The message routing system will +enforce this, but will not add it. It is a required field. + +"group" is a DATA element, and its value is the UTF-8 encoded group +name this message is being transmitted to. It is a required field for +all messages of type "send". + +"instance" is a DATA element, and its value is the UTF-8 encoded +instance name, with "*" meaning all instances. + +"repl" is the sequence number being replied to, if this is a reply. + +"seq" is a unique identity per client. That is, the +tuple must be unique over the lifetime of the connection, or at least +over the lifetime of the expected reply duration. + +"to" is a DATA element, and its value is a UTF-8 encoded recipient +identity. This must be a specific recipient name or "*" to indicate +"all listeners on this channel." It is a required field. + +When a message of type "send" is received by the client, all the data +is used as above. This indicates a message of the given type was +received. + +A client does not see its own transmissions. (XXXMLG Need to check this) + + +Group Subscriptions (type "subscribe") +-------------------------------------- + +A subscription requires the "group", "instance", and a flag to +indicate the subscription type ("sybtype"). If instance is "*" the +instance name will be ignored when decising to forward a message to +this client or not. + +"subtype" is a DATA element, and contains "normal" for normal channel +subscriptions, "meonly" for only those messages on a channel with the +recipient specified exactly as the local name, or "promisc" to receive +all channel messages regardless of other filters. As its name +implies, "normal" is for typical subscriptions, and "promisc" is +intended for channel message debugging. + +There is no response to this message. + + +Group Unsubscribe (type "unsubscribe") +------------------------------- + +The fields to be included are "group" and "instance" and have the same +meaning as a "subscribe" message. + +There is no response to this message. + + +Statistics (type "stats") +------------------------- + +Request statistics from the message router. No other fields are +inclued in the request. + +The response contains a single element "stats" which is an opaque +element. This is used mostly for debugging, and its format is +specific to the message router. In general, some method to simply +dump raw messages would produce something useful during debugging.