UnShell Protocol Specification

Version: 0.7.0
Status: Draft
Last updated: 2026-04-23

1. Introduction

Non-Normative

The UnShell protocol is a tree-addressed packet protocol for remote procedure calls and bidirectional hook-backed data exchange across a hierarchy of connected endpoints.

The protocol is intended to be small, extensible, and canonical: the core stays narrow enough for constrained implementations; new behavior is introduced through leaves, procedures, and payload schemas instead of frequent protocol redesign; and each core protocol behavior has one clearly defined expression.

This document combines exact protocol definition with rationale. Rationale blocks explain why a rule exists, but do not define interoperability requirements.

2. Document Conventions

Normative

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all capitals.

Unless a section is explicitly marked otherwise, sections labeled Normative define protocol requirements and sections labeled Non-Normative provide description, rationale, deployment guidance, or open design commentary.

All Rationale blocks in this document are non-normative.

3. Purpose and Scope

Non-Normative

The purpose of this specification is to define the set of protocol components required to assemble complete UnShell protocol packets and to provide a framework for extending the protocol through leaves and procedure contracts.

To achieve this purpose, the scope of this specification includes:

  • endpoint addressing by path
  • packet framing
  • packet structure
  • local authority rules for downwards procedure calls
  • path-based routing behavior
  • upwards and downwards packet semantics
  • hook behavior
  • protocol fault behavior
  • the required introspection procedure
  • extension through leaves, procedures, and payload schemas

The UnShell protocol assumes that a connection already exists and that any required authentication, authorization, and routing admission decisions have already been handled by the surrounding system.

The following items are beyond the scope of this specification:

  • authentication
  • authorization
  • connection establishment
  • admission protocol
  • transport selection
  • encryption
  • obfuscation
  • router management interfaces
  • deployment-specific orchestration behavior
  • sensing, analytics, and decision-making systems above the protocol layer

Every implementation is expected to maintain its own live connection set and its own ground truth about which peers are connected, admitted, and routable.

Rationale: Authentication and handshakes were intentionally removed from the core scope. They are too deployment-specific to define canonically without bloating the protocol.

Rationale: Packet serialization is in scope because independently authored endpoints need one canonical byte representation in order to interoperate. Transport selection remains out of scope because the same framed packet bytes can be carried over different transports.

4. Protocol Overview

Non-Normative

Endpoints are addressed by path.

Leaves are hosted by endpoints.

A superior endpoint issues a downwards Call toward a subordinate endpoint or one of its leaves.

If the caller wants output, it declares a hook inside the call. The recipient returns one or more Data packets toward the hook host. Once a hook exists, either side may continue exchanging Data packets associated with that hook. A side signals that it is done by setting end_hook = true on its final Data packet; the hook closes when both sides have done so. If normal execution cannot proceed, the endpoint may instead send a Fault packet upstream for that hook, which closes it immediately.

The protocol therefore has three core packet roles:

  • Call for downwards invocation
  • Data for returned data and ongoing hook traffic
  • Fault for upstream protocol failure reporting tied to a hook

This document uses the following notation for readability:

  • /a/b/c for endpoint paths
  • /a/b/c { leaf: tty0 } for a leaf on an endpoint
  • /a/b/c { hook: 7 } for a hook hosted by an endpoint

These notations are descriptive only. Leaves and hooks are not encoded as path segments.

5. Terms and Definitions

Normative

| Term | Definition |
| --- | --- |
| Tree | The set of connected endpoints arranged by path. |
| Endpoint | A participant in the protocol that can send, receive, host leaves, and route packets. |
| Path | An ordered sequence of segments identifying an endpoint, serialized as Vec<String>. |
| Upwards | In the direction of rising authority, closer to the root endpoint. |
| Downwards | In the direction of falling authority, farther from the root endpoint. |
| Leaf | A named service or object hosted by an endpoint. |
| Call | A downwards packet that invokes a procedure on an endpoint or leaf. |
| Procedure | An application-defined operation identified by procedure_id. |
| Hook | A bidirectional interaction channel declared inside a Call and identified by hook_id relative to the calling endpoint that declared it. |
| Authority | The endpoint that directly maintains a child connection at a local routing boundary. |
| Subordinate | The lower of two endpoints in a described authority relationship. |
| Registered | Local connection state in which a peer participates in routing. |
| Unregistered | Local connection state in which a peer is connected but not routable. |

6. Naming and Structural Conventions

Normative

Paths are serialized as Vec<String>.

Leaf identity is carried in dst_leaf.

Hook identity is carried in hook_id.

No path prefixes are reserved by this protocol.

dst_leaf names a specific leaf hosted by the destination endpoint. Leaf names SHOULD follow the same dotted convention as procedure_id: org.product.vN.part.name. The reserved empty string "" MUST NOT be used as a leaf name.

procedure_id is the canonical identifier for a procedure contract. A procedure contract includes the source library or namespace, the specific procedure identity, and the expected input and output schema pair.

Except for the reserved empty string "" used by the required introspection procedure defined in Section 12.1, procedure_id SHOULD follow the dotted convention org.product.vN.part.name, where:

  • org identifies the owning organization or namespace root
  • product identifies the product or system namespace
  • vN identifies the contract version in whatever versioning scheme the owning product uses
  • part identifies the subsystem, leaf family, or functional area
  • name identifies the exact procedure or payload contract name

Each segment SHOULD be non-empty. Implementations SHOULD restrict segments to lowercase ASCII letters, digits, and underscores for portability. The version segment SHOULD appear in the third position.

A Data packet carries the same procedure_id as the Call that established its hook.

Rationale: procedure_id is intentionally stricter than a method name or content type. It identifies a full callable contract, not just a label. The dotted convention is a strong recommendation rather than a wire-format requirement because the protocol itself does not parse or validate procedure_id structure — it is treated as an opaque string for routing and matching purposes. Version segment format is deliberately left to the owning product to avoid constraining existing versioning schemes.
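
The recommended naming shape can be checked with a small lint. The sketch below is illustrative only: the function names are hypothetical, and it assumes exactly five dot-separated segments, which the text above recommends but does not require on the wire.

```rust
/// Returns true if `id` follows the recommended `org.product.vN.part.name` shape.
/// Assumption: exactly five dot-separated segments. The protocol itself treats
/// procedure_id as an opaque string, so this is a lint, not a wire check.
pub fn follows_dotted_convention(id: &str) -> bool {
    let segments: Vec<&str> = id.split('.').collect();
    segments.len() == 5
        && segments.iter().all(|s| {
            // Each segment is non-empty and uses only the portable character set.
            !s.is_empty()
                && s.chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
        })
}

/// The empty string is reserved for introspection and is exempt from the lint.
pub fn is_introspection_id(id: &str) -> bool {
    id.is_empty()
}
```

Because the convention is a SHOULD, a tool built on this sketch would warn rather than reject.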

7. Endpoint Model

Normative

7.1 Local Authority

Each endpoint enforces authority only at the connections it directly maintains.

At a local routing boundary:

  • a Call packet MUST be accepted only if it arrives from the direct parent connection permitted to issue downwards calls into the destination subtree represented by that boundary
  • a Call packet that violates that rule MUST be dropped silently
  • a Data packet MUST be accepted only if it belongs to a valid hook flow, routes correctly by path, and its src_path matches the expected peer recorded in local hook state; otherwise it MUST be discarded
  • a Fault packet MUST be accepted only if it arrives from the subordinate side of a hook-attributable call flow and its src_path matches the expected subordinate peer recorded in local hook state or pending call context

This protocol does not define a protocol-level authority error packet.

7.2 Local Connection States

Each implementation MUST maintain at least the following local states:

| State | Meaning |
| --- | --- |
| Unregistered | The connection exists locally but is not part of routing state. |
| Registered | The connection is admitted into local routing state and may send, receive, or forward protocol traffic. |

While a connection is Unregistered, an implementation:

  • MUST NOT forward protocol packets through it
  • MUST NOT trust its path claims for routing
  • MUST NOT allocate hook state on its behalf
  • MUST NOT execute protocol procedures received from it

Transition into Registered is implementation-defined and out of scope for this document.

Transition out of Registered MUST invalidate all local routing entries and hook state associated with that connection.

Rationale: The protocol no longer defines a handshake, but it still needs a hard boundary between connected peers and admitted peers.
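
A minimal sketch of the two-state boundary, assuming a hypothetical per-connection record. The type, field, and method names are illustrative and do not come from this specification, and how a peer is admitted remains out of scope.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Unregistered,
    Registered,
}

/// Hypothetical per-connection record; the fields are illustrative.
pub struct Connection {
    pub state: ConnState,
    /// Hook state associated with this connection: hook_id -> peer path.
    pub hooks: HashMap<u64, Vec<String>>,
    /// Routing entries (full child paths) associated with this connection.
    pub routes: Vec<Vec<String>>,
}

impl Connection {
    /// New connections start out quarantined.
    pub fn new() -> Self {
        Connection {
            state: ConnState::Unregistered,
            hooks: HashMap::new(),
            routes: Vec::new(),
        }
    }

    /// Forwarding, hook allocation, and procedure execution are gated on this.
    pub fn may_process_packets(&self) -> bool {
        self.state == ConnState::Registered
    }

    /// Records the outcome of an admission decision made elsewhere.
    pub fn admit(&mut self, path: Vec<String>) {
        self.state = ConnState::Registered;
        self.routes.push(path);
    }

    /// Leaving Registered invalidates all routing entries and hook state.
    pub fn unregister(&mut self) {
        self.state = ConnState::Unregistered;
        self.routes.clear();
        self.hooks.clear();
    }
}
```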

8. Packet Framing

Normative

Each protocol packet consists of two length-prefixed byte sections:

  1. header bytes
  2. payload bytes

Both lengths MUST be encoded as big-endian u32.

The header MUST be serialized before the payload.

header bytes and payload bytes MUST use the rkyv archived format.

The canonical rkyv format-control settings for this protocol are:

  • little-endian primitives
  • aligned primitives
  • 32-bit relative pointers

An implementation that uses different rkyv format-control settings is not protocol-compatible.

Routing decisions MUST be made from header fields only.

Routers MUST NOT inspect payload structure in order to route a packet.

Rationale: rkyv does not define one single universal format independent of configuration. Its archived representation depends on format-control settings such as endianness, alignment, and pointer width. This specification therefore fixes those settings so "use rkyv" means one exact interoperable byte format rather than a family of related formats.
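
The framing rules can be sketched with the standard library alone. This example assumes each length prefix immediately precedes its own section, giving the layout [header_len][header][payload_len][payload]; the text above does not spell out that ordering. The header and payload are treated as already-serialized opaque bytes, the function names are hypothetical, and the size limits are the recommended values from Section 18.

```rust
/// Recommended implementation limits from Section 18.
pub const MAX_HEADER_LEN: usize = 64 * 1024;
pub const MAX_PAYLOAD_LEN: usize = 64 * 1024 * 1024;

/// Frames already-serialized header and payload bytes.
/// Assumed layout: [header_len][header][payload_len][payload].
pub fn encode_frame(header: &[u8], payload: &[u8]) -> Option<Vec<u8>> {
    if header.len() > MAX_HEADER_LEN || payload.len() > MAX_PAYLOAD_LEN {
        return None;
    }
    let mut out = Vec::with_capacity(8 + header.len() + payload.len());
    // Both lengths are big-endian u32, independent of the rkyv settings.
    out.extend_from_slice(&(header.len() as u32).to_be_bytes());
    out.extend_from_slice(header);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Reads one length-prefixed section, returning (section, remaining bytes).
fn read_section(buf: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > max || buf.len() - 4 < len {
        return None;
    }
    Some((&buf[4..4 + len], &buf[4 + len..]))
}

/// Splits a complete frame into (header, payload, trailing bytes).
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    let (header, rest) = read_section(buf, MAX_HEADER_LEN)?;
    let (payload, rest) = read_section(rest, MAX_PAYLOAD_LEN)?;
    Some((header, payload, rest))
}
```

Since the canonical settings use aligned primitives, a receiver would likely need to copy each section into suitably aligned storage before accessing it as an rkyv archive; that step is omitted here.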

9. Packet Types

Normative

This protocol defines exactly three packet types.

| Packet Type | Value | Meaning |
| --- | --- | --- |
| Call | 0x01 | Downwards procedure invocation. |
| Data | 0x02 | Hook output or ongoing hook traffic. |
| Fault | 0xFF | Upstream protocol failure reporting for a hook. |

The canonical archived representation of packet type identifiers MUST be:

#[repr(u8)]
#[derive(Archive, Serialize, Deserialize, Debug, Clone, PartialEq)]
pub enum PacketType {
    Call = 0x01,
    Data = 0x02,
    Fault = 0xFF,
}

Rationale: Fault is separated from Data so ordinary application output does not need to share semantics with protocol failure signaling. A receiver can distinguish successful hook traffic from protocol failure immediately from packet_type, without inspecting procedure_id or the payload contract.

10. Packet Header

Normative

| Field | Meaning |
| --- | --- |
| packet_type | Selects packet semantics. |
| src_path | Path of the sending endpoint. |
| dst_path | Path of the destination endpoint. |
| dst_leaf | Target leaf for a Call, if any. |
| hook_id | Hook identifier scoped to the calling endpoint that declared the hook, for hook-associated packets. |

Header rules:

  • src_path and dst_path MUST be present on all packets
  • the immediate receiver MUST validate that src_path matches the registered path of the peer on the connection from which the packet arrived; a packet whose src_path does not match MUST be discarded
  • dst_leaf MUST be None on Data and Fault
  • hook_id MUST be None on Call
  • hook_id MUST appear on Data and Fault

A packet whose header violates these rules MUST be discarded.

The canonical archived header layout MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct PacketHeader {
    pub packet_type: PacketType,
    pub src_path: Vec<String>,
    pub dst_path: Vec<String>,
    pub dst_leaf: Option<String>,
    pub hook_id: Option<u64>,
}
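
The header rules can be expressed as a single predicate. The sketch below mirrors the header fields with plain standard-library types (the rkyv derives are omitted so the example stays self-contained) and interprets "matches the registered path" as exact equality, which is an assumption. The function name is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PacketType {
    Call,
    Data,
    Fault,
}

/// Plain mirror of the header fields, without the rkyv derives.
pub struct PacketHeader {
    pub packet_type: PacketType,
    pub src_path: Vec<String>,
    pub dst_path: Vec<String>,
    pub dst_leaf: Option<String>,
    pub hook_id: Option<u64>,
}

/// Returns true if the header passes the rules; a false result means discard.
/// `peer_path` is the registered path of the peer on the arrival connection.
/// Assumption: "matches" is interpreted as exact equality.
pub fn header_is_valid(h: &PacketHeader, peer_path: &[String]) -> bool {
    if h.src_path.as_slice() != peer_path {
        return false;
    }
    match h.packet_type {
        // hook_id MUST be None on Call; dst_leaf is optional.
        PacketType::Call => h.hook_id.is_none(),
        // dst_leaf MUST be None and hook_id MUST appear on Data and Fault.
        PacketType::Data | PacketType::Fault => h.dst_leaf.is_none() && h.hook_id.is_some(),
    }
}
```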

11. Routing Rules

Normative

11.1 Path Routing

All protocol routing is path-based.

Each registered endpoint path in the tree MUST be unique.

At a local routing boundary, an implementation MUST NOT maintain two registered child routes with the same claimed endpoint path.

An endpoint's local subtree consists of the endpoint's own path and every descendant path whose segment sequence begins with the endpoint's path as a prefix.

A path A lies within the subtree of path B if and only if B is a prefix of A.

The root endpoint's path is the empty path, and its subtree contains all paths.

When forwarding a packet, an implementation MUST evaluate the following steps in order, stopping at the first that applies:

  1. If a registered child path is a prefix of dst_path, forward toward the child with the longest matching prefix.
  2. If dst_path identifies the local endpoint, deliver the packet locally.
  3. If dst_path lies outside the local endpoint's subtree, forward the packet upward toward the direct parent connection.
  4. Otherwise, drop the packet silently.

Because evaluation stops at the first applicable step, a packet that matches step 1 is never evaluated against steps 2 through 4.

The protocol defines no mandatory error packet for unresolved destinations.

Rationale: Longest-prefix routing is defined as a path-selection rule, not as a way to resolve duplicate ownership. The tree model assumes each endpoint path names exactly one place in the topology. If two child routes claim the same path, the local routing table is already invalid.
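
The four forwarding steps can be sketched as a pure function over paths. The `Route` type and `route` function are hypothetical names; `children` is assumed to hold the full registered paths of direct child routes, already unique as required above.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    /// Forward toward the child registered under this path.
    Child(&'a [String]),
    /// Deliver to the local endpoint.
    Local,
    /// Forward upward toward the direct parent connection.
    Parent,
    /// Drop silently.
    Drop,
}

/// A path lies within the subtree of `prefix` if `prefix` is a prefix of it.
fn is_prefix(prefix: &[String], path: &[String]) -> bool {
    path.starts_with(prefix)
}

pub fn route<'a>(local: &[String], children: &'a [Vec<String>], dst: &[String]) -> Route<'a> {
    // Step 1: the registered child with the longest path that prefixes dst_path.
    if let Some(child) = children
        .iter()
        .filter(|c| is_prefix(c.as_slice(), dst))
        .max_by_key(|c| c.len())
    {
        return Route::Child(child.as_slice());
    }
    // Step 2: dst_path identifies the local endpoint.
    if dst == local {
        return Route::Local;
    }
    // Step 3: dst_path lies outside the local subtree.
    if !is_prefix(local, dst) {
        return Route::Parent;
    }
    // Step 4: inside the local subtree, but no registered child covers it.
    Route::Drop
}
```

At the root, whose path is empty, step 3 never applies because the empty path is a prefix of every path.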

11.2 Call Enforcement

When forwarding or receiving a Call, an endpoint MUST apply the local-authority rules defined in Section 7.1 at the boundary where the packet arrives.

11.3 Data and Fault Routing

Data and Fault packets are routed by dst_path using the same path-routing rules as Call packets.

The sender of a Data packet MUST set dst_path to the path of the peer endpoint for that hook packet.

The sender of a Fault packet MUST set dst_path to the path of the hook host recorded in the active hook context or pending call context.

An implementation MAY maintain an internal fastpath keyed by locally validated hook state for performance, provided it remains behaviorally equivalent to path-based routing. hook_id is scoped to the calling endpoint and is not globally routable, so path remains the canonical routing key.

12. Call Definition

Normative

| Field | Meaning |
| --- | --- |
| procedure_id | Identifier of the invoked procedure contract. |
| data | Application-defined procedure input payload. |
| response_hook | Optional hook declaration for returned data, fault delivery, and follow-on bidirectional hook traffic. |

Rules:

  • the receiver MUST interpret procedure_id as the identifier of the procedure being invoked
  • the protocol does not define argument encoding beyond raw bytes in data
  • a Call without response_hook will receive no response; the receiver MAY execute the procedure but MUST NOT fabricate an implicit response path
  • if response_hook is present, response_hook.return_path MUST equal src_path

The canonical archived payload of a Call packet MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct CallMessage {
    pub procedure_id: String,
    pub data: Vec<u8>,
    pub response_hook: Option<HookTarget>,
}

12.1 Required Introspection Procedure

The empty string "" is reserved as the required introspection procedure_id.

Every endpoint MUST implement procedure_id == "".

Behavior:

  • when dst_leaf is None, the call requests endpoint introspection
  • when dst_leaf is set, the call requests introspection for that specific leaf

The result MUST be returned through the declared response hook.

A Call with procedure_id == "" MUST include response_hook.

12.2 Failure Behavior

If the destination endpoint does not exist, the packet is dropped during routing.

If the destination endpoint exists but the Call cannot be executed — because dst_leaf names no local leaf, or because procedure_id is unknown or unsupported — the endpoint MUST send a Fault upstream using the declared response_hook if one is present. If no response_hook is present, the endpoint MUST discard the Call silently.

Rationale: Fault reporting for an invalid call would be self-defeating if the callee first had to prove that the application procedure was valid before it could use the declared hook. The hook exists to carry either normal returned data or a protocol fault explaining why normal execution could not proceed.
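
The failure behavior reduces to a small decision table. In this sketch the two booleans stand in for hypothetical local lookups, the enum is a local subset of the fault kinds, and checking the leaf before the procedure is an assumption; the text above does not order the two checks.

```rust
/// Local subset of the protocol fault kinds relevant to Call rejection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolFault {
    UnknownLeaf = 0x01,
    UnknownProcedure = 0x02,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CallOutcome {
    /// The Call can be executed normally.
    Execute,
    /// Send a Fault upstream on the declared response hook.
    SendFault { hook_id: u64, fault: ProtocolFault },
    /// No response hook was declared, so the Call is discarded silently.
    DiscardSilently,
}

/// `leaf_known` and `procedure_known` are the results of hypothetical lookups.
pub fn classify_call(
    leaf_known: bool,
    procedure_known: bool,
    response_hook_id: Option<u64>,
) -> CallOutcome {
    let fault = if !leaf_known {
        Some(ProtocolFault::UnknownLeaf)
    } else if !procedure_known {
        Some(ProtocolFault::UnknownProcedure)
    } else {
        None
    };
    match (fault, response_hook_id) {
        (None, _) => CallOutcome::Execute,
        (Some(fault), Some(hook_id)) => CallOutcome::SendFault { hook_id, fault },
        (Some(_), None) => CallOutcome::DiscardSilently,
    }
}
```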

13. Hook Definition

Normative

Hooks are declared only inside CallMessage.response_hook.

There is no standalone hook-open packet.

| Field | Meaning |
| --- | --- |
| hook_id | Identifier scoped to the calling endpoint that declared the hook. |
| return_path | Endpoint path to which returned Data or Fault packets are sent. |

Pending call context is local transient state that an endpoint creates when it receives a Call declaring response_hook. It persists until that Call has been accepted into active hook state, rejected with Fault, or discarded. It MUST be keyed by (return_path, hook_id) and MUST retain enough information to emit an upstream Fault for that call if needed.

Rules:

  • hook_id MUST be unique across all hooks at the calling endpoint — active, pending, and inactive — for the lifetime of the endpoint
  • return_path MUST name the calling endpoint that hosts the hook
  • a hook is declared by response_hook inside a Call
  • at the callee, a pending call context MUST NOT be used to forward or process application data; it exists solely to validate and emit an upstream Fault for that received Call
  • a hook becomes active when the destination endpoint accepts that Call and allocates local hook state for it
  • at the hook host, an outbound pending hook MAY be promoted to active by the first valid returned Data or Fault packet from the expected peer, because the protocol defines no separate acceptance acknowledgment packet
  • when a Call is accepted, its pending call context MUST transition into active hook state
  • when a Call is rejected with Fault or discarded, its pending call context MUST be removed
  • once active, either side MAY send Data packets associated with that hook until the interaction ends
  • all protocol faults associated with the call MUST use that same hook_id

Rationale: Pending call context exists because some failures are discovered before normal application execution begins. The callee still needs enough validated state to attribute an upstream Fault to the declared hook without pretending that the hook was fully active for ordinary bidirectional traffic.

The canonical archived hook target layout MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct HookTarget {
    pub hook_id: u64,
    pub return_path: Vec<String>,
}
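
The hook lifecycle rules can be sketched as two small pieces of local state: a callee-side table keyed by (return_path, hook_id) and a host-side identifier allocator. All names are hypothetical. The monotonic counter is one possible way to keep hook_id unique for the lifetime of the endpoint; the rules above do not prescribe it.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookPhase {
    Pending,
    Active,
}

/// Callee-side table keyed by (return_path, hook_id).
#[derive(Default)]
pub struct CalleeHooks {
    entries: HashMap<(Vec<String>, u64), HookPhase>,
}

impl CalleeHooks {
    /// Records pending call context when a Call with response_hook arrives.
    pub fn on_call(&mut self, return_path: Vec<String>, hook_id: u64) {
        self.entries.insert((return_path, hook_id), HookPhase::Pending);
    }

    /// Accepting the Call moves its pending context into active hook state.
    pub fn accept(&mut self, return_path: &[String], hook_id: u64) -> bool {
        if let Some(phase) = self.entries.get_mut(&(return_path.to_vec(), hook_id)) {
            if *phase == HookPhase::Pending {
                *phase = HookPhase::Active;
                return true;
            }
        }
        false
    }

    /// Rejecting with Fault or discarding removes the context.
    pub fn reject(&mut self, return_path: &[String], hook_id: u64) {
        self.entries.remove(&(return_path.to_vec(), hook_id));
    }

    /// Application Data is processed only for active hooks, never pending ones.
    pub fn may_carry_data(&self, return_path: &[String], hook_id: u64) -> bool {
        self.entries.get(&(return_path.to_vec(), hook_id)) == Some(&HookPhase::Active)
    }
}

/// Host-side allocator: a counter that never reuses an identifier.
#[derive(Default)]
pub struct HookIdAllocator {
    next: u64,
}

impl HookIdAllocator {
    /// Returns None once the counter would overflow, rather than reuse an id.
    pub fn allocate(&mut self) -> Option<u64> {
        let id = self.next;
        self.next = self.next.checked_add(1)?;
        Some(id)
    }
}
```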

14. Data Definition

Normative

| Field | Meaning |
| --- | --- |
| procedure_id | Identifier of the procedure contract to which this returned payload belongs. |
| data | Application-defined output payload. |
| end_hook | When true, this is the sender's final packet for this hook. No further Data packets may follow from this side. |

Rules:

  • the router MUST NOT inspect or validate procedure_id
  • the receiver MUST validate that procedure_id matches the procedure_id of the Call that established the hook
  • for hook-associated Data, the receiver MUST validate src_path against the expected hook peer recorded in local hook state

Rationale: Ordinary hook traffic is part of the same procedure contract that created the hook, so the returned procedure_id stays anchored to the originating Call. This keeps hook validation simple and avoids treating a response as a separate contract lookup. Introspection therefore uses "" on both the Call and the Data it produces. Protocol faults are separate packets and therefore do not need to overload Data semantics.

The canonical archived payload of a Data packet MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct DataMessage {
    pub procedure_id: String,
    pub data: Vec<u8>,
    pub end_hook: bool,
}

14.1 Hook Data and Continuation

A hook MAY carry multiple Data packets in either direction if the application requires chunking, phased output, or prolonged bidirectional interaction. There is no protocol-level requirement that the callee send the first Data packet.

Every Data packet for a hook MUST set dst_path to the path of the peer endpoint for that hook packet.

A Data packet that arrives for a hook_id not yet in active hook state MUST be discarded, except that the hook host MAY treat the first valid returned Data packet from the expected peer as the activation point for its outbound pending hook and then process that same packet as the first active Data packet.

Rationale: The protocol allows symmetric hook traffic after activation but does not introduce a readiness or acknowledgment packet to synchronize the first Data frame. Callee-side pending context still never carries application data. The one exception is the hook host's first valid returned packet, which can safely serve as the observable proof that the remote side accepted the call. Higher-layer protocols that need stricter startup guarantees should still define their own first-packet discipline inside the hook.

14.2 Hook End

A sender SHOULD set end_hook = true on its final Data packet for that hook. A sender MUST NOT send further Data packets on a hook after sending a packet with end_hook = true. A hook closes when both sides have sent end_hook = true, or when either side sends or receives a Fault.

Rationale: Making end_hook = true a hard final marker rather than a soft hint removes ambiguity about whether the hook is still open. Both sides can close cleanly once they have each signaled completion, without needing a separate close packet or higher-layer shutdown sequence.
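
The closing rule can be tracked with three flags per hook. This is a sketch with hypothetical names. Returning false for Data received after the peer's end_hook is an assumption: the text above forbids sending such a packet but does not say how a receiver should react to one.

```rust
/// Per-hook completion state kept by each side.
#[derive(Debug, Default)]
pub struct HookEndState {
    local_ended: bool,
    remote_ended: bool,
    faulted: bool,
}

impl HookEndState {
    /// Returns false if sending is no longer allowed on this hook.
    pub fn on_send_data(&mut self, end_hook: bool) -> bool {
        if self.local_ended || self.faulted {
            return false;
        }
        self.local_ended = end_hook;
        true
    }

    /// Returns false if the peer already ended its side or the hook faulted.
    pub fn on_recv_data(&mut self, end_hook: bool) -> bool {
        if self.remote_ended || self.faulted {
            return false;
        }
        self.remote_ended = end_hook;
        true
    }

    /// A Fault sent or received closes the hook immediately.
    pub fn on_fault(&mut self) {
        self.faulted = true;
    }

    /// Closed once both sides have sent end_hook = true, or after a Fault.
    pub fn is_closed(&self) -> bool {
        self.faulted || (self.local_ended && self.remote_ended)
    }
}
```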

14.3 Fault Definition

Fault is a distinct packet type used for protocol-level failure reporting associated with a hook.

Protocol faults are upstream-only. An endpoint MUST NOT send a Fault packet to a subordinate endpoint.

The Fault payload is the following enum identified by fixed byte discriminants:

| Fault | Value | Meaning |
| --- | --- | --- |
| UnknownLeaf | 0x01 | The addressed dst_leaf does not exist on the destination endpoint. |
| UnknownProcedure | 0x02 | The destination does not support the requested procedure_id. |
| InvalidSourcePath | 0x03 | The packet src_path was invalid for the connection on which it arrived. |
| InvalidHookPeer | 0x04 | The Data or Fault sender did not match the expected peer recorded in hook state. |
| InternalError | 0x05 | The endpoint encountered an internal protocol-processing failure. |

The canonical archived payload of a Fault packet MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct FaultMessage {
    pub fault: ProtocolFault,
}

#[repr(u8)]
#[derive(Archive, Serialize, Deserialize, Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolFault {
    UnknownLeaf = 0x01,
    UnknownProcedure = 0x02,
    InvalidSourcePath = 0x03,
    InvalidHookPeer = 0x04,
    InternalError = 0x05,
}

Rules:

  • a Fault packet MUST carry hook_id
  • a receiver of a Fault packet MUST validate src_path against the expected subordinate hook peer recorded in local hook state or pending call context

When an endpoint can attribute a protocol-level failure to a specific hook or declared response_hook, it MUST send a Fault packet upstream using:

  • dst_path set to the path of the hook host recorded in the active hook context or pending call context
  • the same hook_id
  • a ProtocolFault payload describing the condition

Sending a Fault packet closes the hook immediately for both sides. After sending or receiving a Fault packet, an implementation MUST remove that hook from active state.

If an endpoint receives a fault value it does not recognize, it MUST still treat the packet as a protocol fault and close the hook.

Rationale: Protocol faults are part of interoperability, so they need a fixed canonical payload contract rather than a free-form error blob. A small enum with stable byte discriminants is cheap to encode, easy to evolve, and avoids coupling core protocol behavior to human-readable messages. Receivers can make deterministic decisions from the fault kind alone.

Rationale: The fault set is intentionally small. Silent drop remains the canonical behavior for traffic that cannot be safely attributed to a valid call or hook, such as an unknown hook_id, malformed returned traffic, or a routing miss discovered by an intermediate router. Fault is reserved for failures that a receiver can attribute to a specific call flow and report upstream deterministically.

Rationale: An unrecognized protocol fault still means the application contract has failed and the hook can no longer continue safely. Requiring unknown fault values to terminate the hook preserves forward compatibility: newer peers may introduce additional fault kinds without causing older peers to accidentally keep a broken hook alive.

If an endpoint receives Data or Fault with an unknown or expired hook_id, it MUST discard the packet.
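
Receiver-side fault handling can be sketched as follows. The example assumes the fault discriminant is available as a raw byte and that active hooks are tracked in a simple set; both are simplifications, and the names are hypothetical. An unrecognized value still closes the hook, and a Fault for an unknown or expired hook_id is discarded.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultKind {
    UnknownLeaf,
    UnknownProcedure,
    InvalidSourcePath,
    InvalidHookPeer,
    InternalError,
    /// Any discriminant this implementation does not recognize.
    Unrecognized(u8),
}

/// Maps a raw discriminant byte to a fault kind without rejecting unknown values.
pub fn classify_fault(discriminant: u8) -> FaultKind {
    match discriminant {
        0x01 => FaultKind::UnknownLeaf,
        0x02 => FaultKind::UnknownProcedure,
        0x03 => FaultKind::InvalidSourcePath,
        0x04 => FaultKind::InvalidHookPeer,
        0x05 => FaultKind::InternalError,
        other => FaultKind::Unrecognized(other),
    }
}

/// Returns None when the packet is discarded (unknown or expired hook_id).
/// Otherwise removes the hook from active state and reports the fault kind.
pub fn handle_fault(
    active: &mut HashSet<u64>,
    hook_id: u64,
    discriminant: u8,
) -> Option<FaultKind> {
    if !active.remove(&hook_id) {
        return None;
    }
    Some(classify_fault(discriminant))
}
```

The src_path validation required by the rules above is omitted here to keep the sketch short.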

15. Introspection Payloads

Normative

Introspection is a machine-readable discovery mechanism for hosted leaves and supported procedure_id values.

Introspection MUST NOT include human-readable descriptions, parameter definitions, or serialized current state.

The caller is expected to know the meaning of each discovered procedure_id from the pre-shared contract identified by that procedure_id.

When the required blank introspection procedure is called, it MUST return one of the following payloads through the declared hook.

15.1 Endpoint Introspection

Returned when procedure_id == "" and dst_leaf == None.

| Field | Meaning |
| --- | --- |
| sub_endpoints | The path segment identifiers of directly registered child endpoints. Each entry is the single path segment that distinguishes the child from the local endpoint. The full path of a child can be inferred by appending its segment to the local endpoint's path. |
| leaves | List of introspection summaries for the endpoint's hosted leaves. |

Each LeafIntrospectionSummary contains:

| Field | Meaning |
| --- | --- |
| leaf_name | The leaf's canonical name, following the org.product.vN.part.name scheme. |
| procedures | Full canonical procedure_id values supported by the leaf. |

The canonical archived payload of endpoint introspection MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct EndpointIntrospection {
    pub sub_endpoints: Vec<String>,
    pub leaves: Vec<LeafIntrospectionSummary>,
}

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct LeafIntrospectionSummary {
    pub leaf_name: String,
    pub procedures: Vec<String>,
}

15.2 Leaf Introspection

Returned when procedure_id == "" and dst_leaf names a specific leaf.

| Field | Meaning |
| --- | --- |
| leaf_name | The leaf's canonical name, following the org.product.vN.part.name scheme. |
| procedures | Full canonical procedure_id values supported by the leaf. |

The canonical archived payload of leaf introspection MUST be:

#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct LeafIntrospection {
    pub leaf_name: String,
    pub procedures: Vec<String>,
}

Rules:

  • each listed procedure MUST be identified by its full canonical procedure_id, not by a leaf-local short name
  • sub_endpoints MUST list only the direct children registered at this endpoint; it MUST NOT enumerate deeper descendants

Rationale: Returning full procedure_id values avoids forcing the caller to reconstruct contract names from leaf-local fragments. Endpoint introspection and leaf introspection deliberately share the same leaf record shape so the endpoint-wide form is just a list of the leaf-specific form. sub_endpoints returns only immediate child identifiers rather than a full subtree description because the tree topology is not assumed to be globally known; callers that need deeper discovery can issue further introspection calls toward each discovered child.
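
A caller that walks the tree can rebuild each child's full path from the sub_endpoints segments by appending them to the path of the endpoint it queried. The helper below is a hypothetical illustration of that step.

```rust
/// Builds the full path of each direct child from the queried endpoint's path
/// and the single-segment identifiers returned in `sub_endpoints`.
pub fn child_paths(local: &[String], sub_endpoints: &[String]) -> Vec<Vec<String>> {
    sub_endpoints
        .iter()
        .map(|segment| {
            let mut path = local.to_vec();
            path.push(segment.clone());
            path
        })
        .collect()
}
```

Deeper discovery would then issue a further introspection Call toward each returned path.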

16. Protocol Description

Non-Normative

The UnShell protocol keeps its core narrow: path addressing, downwards Call, hook-backed Data, and upstream Fault. procedure_id is the main semantic anchor, so callers and callees are expected to share knowledge of each procedure contract without relying on a protocol-level registry.

17. Security Considerations

Non-Normative

Although security is not defined by the protocol itself, implementations should treat the Unregistered state as a strict quarantine boundary.

Recommended behavior:

  • authenticate or otherwise validate a peer before moving it to Registered
  • rate-limit or expire idle unregistered peers
  • avoid disclosing topology before admission
  • avoid detailed admission failure reasons
  • invalidate hooks on disconnect unless a higher-layer session mechanism exists

18. Serialization and Implementation Notes

Non-Normative

This document uses Rust-like rkyv struct notation to describe fields because it matches the current implementation language. The notation is explanatory, but the on-wire byte format is normatively fixed in Section 8.

Recommended implementation limits:

| Item | Recommended limit |
| --- | --- |
| header length | 64 KiB |
| payload length | 64 MiB |