# UnShell Network Protocol Specification
**Version:** 0.1.0
**Status:** Draft — implementation in progress
**Last updated:** 2026-04-20
---
## Overview
The UnShell protocol is a **tree-addressed, message-passing protocol** for command
and control (C2) operations. It is designed around a homogeneous node model: every
participant (payload, operator, router) is structurally identical from the protocol's
perspective. Each node owns a set of **paths** in a global tree and responds to
requests addressed to those paths.
```
/agents/abc123/shell/exec ← a path owned by payload node "abc123"
/agents/abc123/files/read ← another path on the same payload
/operator/sess1 ← operator node's own registration path
/router/nodes ← router's built-in endpoint
```
A **router** is a dumb relay. It reads the destination path from a packet header and
forwards the packet body to whichever node registered that path. It has no application
logic. It does not interpret payloads. Think of it as a post office: it reads the
address on the envelope and delivers the contents without opening them.
---
## Design Goals
1. **Minimal footprint on the payload.** The payload binary must stay small. The
protocol must work in a `no_std + alloc` environment.
2. **Transport independence.** TCP is the first transport, but the protocol must not
assume TCP. HTTPS, ICMP, and other transports will be added later. The protocol
layer sits above the transport layer via a `Transport` trait.
3. **Router-opaque payloads.** The router only reads the packet header (destination
path, source path, packet type). The payload body is forwarded as opaque bytes.
This means the protocol can evolve without touching router code.
4. **Forward compatibility.** Adding new fields to message types must not break
existing implementations. Use rkyv's archived format, which supports this.
5. **Operator experience.** The operator CLI is a first-class node, not a special
client. It connects and registers like any payload, just with a terminal attached.
---
## Node Types
```
┌─────────────────┐ ┌─────────────────────────────────────────────┐
│ Payload Node │ │ Router Node │
│ │ │ │
│ - Registers at │ │ - Accepts TCP from all node types │
│ /agents/<id> │ │ - Maintains: node_id → (paths, tx_channel) │
│ - Hosts modules│ │ - Routes packets by longest-prefix match │
│ as endpoints │ │ - Has own endpoints at /router/... │
│ - no_std + alloc│ │ - NO application logic beyond routing │
└────────┬────────┘ └─────────────────────────────────────────────┘
│ TCP (reverse connect: payload → router)
┌────────▼────────┐
│ Operator Node │
│ (ush-cli) │
│ │
│ - Registers at │
│ /operator/<n>│
│ - Interactive │
│ REPL shell │
│ - Issues Tree │
│ Requests to │
│ any path │
└─────────────────┘
```
**Path conventions:**
- Payload nodes: `/agents/<node_id>/` prefix (e.g., `/agents/abc123/shell/exec`)
- Operator nodes: `/operator/<session_id>/` prefix
- Router built-ins: `/router/` prefix (e.g., `/router/nodes`, `/router/ping`)
**NodeType enum (v1):**
```rust
pub enum NodeType {
Payload,
Operator,
// Router variant added when multi-hop/pivoting is implemented
}
```
---
## Wire Format
Every transmission uses a **two-part framed message**:
```
┌──────────────────────────────────────────────────────────────────────┐
│ Part 1: Header │ Part 2: Payload │
│ │ │
│ [u32 big-endian length] │ [u32 big-endian length] │
│ [rkyv-serialised PacketHeader bytes] │ [rkyv payload bytes] │
│ │ │
│ Router reads this to determine routing │ Router forwards opaque │
└──────────────────────────────────────────┴───────────────────────────┘
```
Both length fields are **big-endian `u32`**, so the format can express up to ~4GB per
part. Implementations enforce much lower limits (see Frame Size Limits), and in practice
packets should be far smaller. A future streaming extension will allow chunked payloads
for large data transfers.
### Why two parts?
The router needs to know where to send a packet. With a single rkyv blob, the router
would have to deserialise the entire packet just to read the destination path. With a
separate header, the router deserialises only the small header (typically < 100 bytes)
and forwards the payload bytes untouched. This is efficient and keeps the protocol
transport-agnostic at the router level.
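
The two-part layout can be sketched with the standard library alone. This is a minimal illustration that assumes the header and payload have already been serialised to bytes (rkyv is left out so the example has no dependencies); `encode_frame`, `take_part`, and `decode_frame` are hypothetical names, not part of the spec.

```rust
use std::convert::{TryFrom, TryInto};

/// Build one wire frame: [u32 BE header_len][header][u32 BE payload_len][payload].
/// Returns None if either part is too long for a u32 length prefix.
pub fn encode_frame(header: &[u8], payload: &[u8]) -> Option<Vec<u8>> {
    let header_len = u32::try_from(header.len()).ok()?;
    let payload_len = u32::try_from(payload.len()).ok()?;
    let mut out = Vec::with_capacity(8 + header.len() + payload.len());
    out.extend_from_slice(&header_len.to_be_bytes());
    out.extend_from_slice(header);
    out.extend_from_slice(&payload_len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Read one length-prefixed part from the front of `buf`.
/// Returns (part, remaining bytes), or None if `buf` is truncated.
fn take_part(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let len_bytes: [u8; 4] = buf.get(..4)?.try_into().ok()?;
    let len = u32::from_be_bytes(len_bytes) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Split a complete frame into (header bytes, payload bytes).
/// A router would deserialise only the first slice and forward the second as-is.
pub fn decode_frame(frame: &[u8]) -> Option<(&[u8], &[u8])> {
    let (header, rest) = take_part(frame)?;
    let (payload, _trailing) = take_part(rest)?;
    Some((header, payload))
}
```

Because the payload slice is never interpreted, the same helper works regardless of what message type the payload carries.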
### PacketHeader
```rust
/// The packet header that every node sends before the payload.
/// The router reads ONLY this to determine routing.
/// The payload body is opaque to the router.
#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct PacketHeader {
/// Destination path in the global tree.
/// The router does a longest-prefix match against registered node paths.
/// Example: "/agents/abc123/shell/exec"
pub dst_path: String,
/// Source path of the sending node.
/// Used by the destination to know where to send the response.
/// Example: "/operator/sess1"
pub src_path: String,
/// Discriminates between handshake and protocol messages.
pub packet_type: PacketType,
}
/// Discriminates the payload type so the receiver knows how to deserialise it.
#[derive(Archive, Serialize, Deserialize, Debug, Clone, PartialEq)]
pub enum PacketType {
/// Sent by a newly connected node to register itself.
Handshake,
/// Sent by the router in response to a handshake.
HandshakeAck,
/// An application-level request (the main protocol message).
Request,
/// An application-level response.
Response,
}
```
**Why `String` for paths instead of `Vec<String>`?**
A single `/`-delimited string serialises smaller (one allocation, no Vec overhead)
and is easier for the router to do prefix matching on. Components are split at the
application layer, not at the wire level.
---
## Handshake Protocol
When any node connects to the router, it must complete a handshake before sending
application messages. The handshake registers the node's identity and the paths it
owns.
```
Node Router
│ │
│──── TCP connect ────────────>│
│ │
│──── HandshakeMessage ───────>│ (PacketType::Handshake)
│ node_id: "abc123" │
│ node_type: Payload │
│ registered_paths: [...] │
│ platform: "linux-x86_64" │
│ │
│<─── HandshakeAck ────────────│ (PacketType::HandshakeAck)
│ accepted: true │
│ assigned_base_path: "..." │
│ │
│ [now registered, can send │
│ and receive Requests] │
```
**Handshake timeout:** If the node does not receive a `HandshakeAck` within **5
seconds**, it closes the connection and retries.
**Router timeout:** If the router does not receive a `HandshakeMessage` within **10
seconds** of a TCP connect, it closes the connection.
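
The "handshake before anything else" rule can be modelled as a small per-connection state machine on the router. The sketch below is illustrative only: `ConnState`, `Action`, and `on_packet` are hypothetical names, the handshake is assumed to be accepted, and closing the connection on an out-of-order packet is an assumption (the spec does not say what the router does in that case). `PacketType` is redeclared locally so the example compiles on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PacketType {
    Handshake,
    HandshakeAck,
    Request,
    Response,
}

/// Router-side state of one connection (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConnState {
    AwaitingHandshake,
    Registered,
}

/// What the router should do with the packet it just read (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Validate the HandshakeMessage and reply with a HandshakeAck.
    Register,
    /// Forward the packet by dst_path.
    Route,
    /// Protocol violation: drop the connection (assumed behaviour).
    Close,
}

pub fn on_packet(state: ConnState, packet_type: PacketType) -> (ConnState, Action) {
    match (state, packet_type) {
        // The first packet on a fresh connection must be a handshake.
        (ConnState::AwaitingHandshake, PacketType::Handshake) => {
            (ConnState::Registered, Action::Register)
        }
        // Application traffic is only routed after registration.
        (ConnState::Registered, PacketType::Request)
        | (ConnState::Registered, PacketType::Response) => (ConnState::Registered, Action::Route),
        // Anything else (early Request, repeated Handshake, Ack from a node) is rejected.
        _ => (state, Action::Close),
    }
}
```

The 10-second router timeout would sit outside this function, as a read deadline that applies while the connection is still in `AwaitingHandshake`.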
### HandshakeMessage
```rust
#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct HandshakeMessage {
/// Node identifier. For payloads: baked at compile time (base62).
/// For operator CLI: random per session (UUID or random base62).
pub node_id: String,
/// Whether this node is a payload or an operator shell.
pub node_type: NodeType,
/// The path prefixes this node owns. The router registers these.
/// Example: ["/agents/abc123"]
/// All sub-paths are implicitly owned by this prefix.
pub registered_paths: Vec<String>,
/// Human-readable platform string for operator visibility.
/// Example: "linux-x86_64", "windows-x86_64", "operator"
pub platform: String,
}
```
### HandshakeAck
```rust
#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct HandshakeAck {
/// Whether the router accepted this node's registration.
pub accepted: bool,
/// The canonical base path assigned by the router (usually matches
/// the first registered_path the node sent, but the router may adjust it).
/// Empty string if rejected.
pub assigned_base_path: String,
/// Human-readable rejection reason if accepted == false.
pub rejection_reason: Option<String>,
}
```
**Rejection reasons (v1):**
- `"duplicate_node_id"` — a node with this ID is already registered
- `"invalid_path"` — a registered path is malformed or conflicts with a reserved prefix
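
A rough sketch of how a router might produce these two rejection reasons. `Registry` and `is_valid_path` are hypothetical, the validity rules (absolute path, no empty components, nothing under the reserved `/router` prefix, at least one path) are assumptions rather than spec text, and `HandshakeAck` is redeclared without the rkyv derives so the example compiles on its own.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct HandshakeAck {
    pub accepted: bool,
    pub assigned_base_path: String,
    pub rejection_reason: Option<String>,
}

/// Hypothetical router registry: node_id -> registered path prefixes.
#[derive(Default)]
pub struct Registry {
    nodes: HashMap<String, Vec<String>>,
}

/// Assumed validity rules: absolute, no empty components, not under /router.
fn is_valid_path(path: &str) -> bool {
    let trimmed = path.trim_end_matches('/');
    if !trimmed.starts_with('/') {
        return false;
    }
    let mut components = trimmed[1..].split('/').peekable();
    if components.peek() == Some(&"router") {
        return false; // reserved for the router's built-in endpoints
    }
    components.all(|c| !c.is_empty())
}

impl Registry {
    pub fn register(&mut self, node_id: &str, paths: &[String]) -> HandshakeAck {
        let reject = |reason: &str| HandshakeAck {
            accepted: false,
            assigned_base_path: String::new(),
            rejection_reason: Some(reason.to_string()),
        };
        if self.nodes.contains_key(node_id) {
            return reject("duplicate_node_id");
        }
        if paths.is_empty() || !paths.iter().all(|p| is_valid_path(p)) {
            return reject("invalid_path");
        }
        self.nodes.insert(node_id.to_string(), paths.to_vec());
        HandshakeAck {
            accepted: true,
            // This sketch simply echoes the first registered path.
            assigned_base_path: paths[0].clone(),
            rejection_reason: None,
        }
    }
}
```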
---
## Application Protocol: TreeRequest / TreeResponse
After the handshake, nodes communicate using `TreeRequest` / `TreeResponse` pairs.
A request travels: **sender → router → destination node**
A response travels: **destination → router → original sender** (using `src_path` from the request header as the destination path for the response)
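
The return route can be derived mechanically from the request header. A minimal sketch, with `reply_header` as a hypothetical helper and the header types redeclared without rkyv derives; setting the response's `src_path` to the path the request was addressed to is an assumption, since the spec only fixes the response's destination.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum PacketType {
    Handshake,
    HandshakeAck,
    Request,
    Response,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PacketHeader {
    pub dst_path: String,
    pub src_path: String,
    pub packet_type: PacketType,
}

/// Build the header for the response to `request`.
pub fn reply_header(request: &PacketHeader) -> PacketHeader {
    PacketHeader {
        // The response goes back to whoever sent the request.
        dst_path: request.src_path.clone(),
        // Assumed: the responder identifies itself by the path it was addressed at.
        src_path: request.dst_path.clone(),
        packet_type: PacketType::Response,
    }
}
```

The router needs no per-request state for this: the response is routed by `dst_path` like any other packet.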
### TreeRequest
```rust
#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct TreeRequest {
/// Unique ID for this request, generated by the sender.
/// The responder echoes this back in TreeResponse.request_id.
/// Enables correlation when multiple requests are in-flight.
pub request_id: u64,
/// The operation type.
pub request_type: RequestType,
/// Content-type string describing how to interpret `data`.
/// Convention: "core/None", "core/Utf8String", "core/Bytes", etc.
pub content_type: String,
/// The operation payload. Interpretation depends on content_type.
pub data: Vec<u8>,
}
#[derive(Archive, Serialize, Deserialize, Debug, Clone, PartialEq)]
pub enum RequestType {
/// Read a value at this path.
Read = 0,
/// List available sub-paths and procedures at this path.
GetProcedures = 1,
/// Write a value to this path.
Write = 2,
/// Invoke a named procedure at this path.
CallProcedure = 3,
}
```
### TreeResponse
```rust
#[derive(Archive, Serialize, Deserialize, Debug, Clone)]
pub struct TreeResponse {
/// Echoed from the corresponding TreeRequest.request_id.
pub request_id: u64,
/// Whether the operation succeeded or failed.
pub status: ResponseStatus,
/// Content-type of the response data.
pub content_type: String,
/// Response payload. Empty if status is an error with no data.
pub data: Vec<u8>,
}
#[derive(Archive, Serialize, Deserialize, Debug, Clone, PartialEq)]
pub enum ResponseStatus {
/// Operation completed successfully.
Ok = 0,
/// The requested path does not exist at the destination node.
NoBranchError = 1,
/// The requested operation is not supported at this path.
UnsupportedOperation = 2,
/// The destination node encountered an error executing the request.
ExecutionError = 3,
/// The request payload was malformed.
ProtocolError = 4,
}
```
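
On the sending side, `request_id` correlation amounts to a map of in-flight requests. The sketch below is one possible shape, not the project's implementation: `InFlight` is a hypothetical type, and a simple incrementing counter is assumed for ID generation (the spec only requires that the sender generate a unique ID).

```rust
use std::collections::HashMap;

/// Tracks requests that have been sent but not yet answered (hypothetical).
#[derive(Default)]
pub struct InFlight {
    next_id: u64,
    /// request_id -> destination path the request was sent to.
    pending: HashMap<u64, String>,
}

impl InFlight {
    /// Allocate a fresh request_id and remember what it was for.
    pub fn begin(&mut self, dst_path: &str) -> u64 {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.pending.insert(id, dst_path.to_string());
        id
    }

    /// Match an incoming TreeResponse.request_id against the pending set.
    /// Returns None for unknown or already-completed IDs.
    pub fn complete(&mut self, request_id: u64) -> Option<String> {
        self.pending.remove(&request_id)
    }
}
```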
---
## Content Type Convention
The `content_type` field in requests and responses follows a namespaced string
convention, similar to MIME types but simpler:
| Content type | Meaning |
|---|---|
| `"core/None"` | No data (empty payload) |
| `"core/Utf8String"` | Raw UTF-8 string in `data` |
| `"core/Bytes"` | Raw bytes (no specific interpretation) |
| `"core/ProcedureList"` | Response to `GetProcedures`: rkyv-serialised `Vec<ProcedureDescriptor>` |
| `"shell/Output"` | Shell command output (UTF-8 stdout + stderr) |
| `"files/Bytes"` | Raw file contents |
Custom module content types should use the module name as the namespace:
`"mymodule/MyType"`.
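
A receiver can dispatch on `content_type` before touching `data`. The following is a small sketch covering only the content types that need no rkyv decoding; `Decoded` and `decode` are hypothetical names, and passing unrecognised types through as raw bytes is an assumption about how a receiver might behave.

```rust
#[derive(Debug, PartialEq)]
pub enum Decoded<'a> {
    /// "core/None": no data expected.
    Empty,
    /// "core/Utf8String": validated UTF-8 text.
    Text(&'a str),
    /// "core/Bytes" or "files/Bytes": raw bytes.
    Bytes(&'a [u8]),
    /// Any other content type: left for the owning module to interpret.
    Unrecognised(&'a [u8]),
}

pub fn decode<'a>(
    content_type: &str,
    data: &'a [u8],
) -> Result<Decoded<'a>, std::str::Utf8Error> {
    Ok(match content_type {
        "core/None" => Decoded::Empty,
        "core/Utf8String" => Decoded::Text(std::str::from_utf8(data)?),
        "core/Bytes" | "files/Bytes" => Decoded::Bytes(data),
        _ => Decoded::Unrecognised(data),
    })
}
```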
---
## Path Routing
The router uses **longest-prefix match** to route packets to nodes.
```
Registered paths: Incoming dst_path: Routes to:
/agents/abc123 /agents/abc123/shell/exec → node "abc123"
/agents/xyz456 /agents/xyz456/files/read → node "xyz456"
/router /router/nodes → router's built-in handler
```
**Rules:**
1. Split `dst_path` by `/` and find every node with a registered path that is a prefix of `dst_path` on whole components (so `/agents/abc` does not match `/agents/abc123`).
2. Choose the node with the longest matching prefix (most specific).
3. If no match, return a `TreeResponse { status: NoBranchError, ... }` to the sender.
4. If multiple nodes match with equal prefix length (should not happen if registration is correct), route to the most recently registered node and log a warning.
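
The rules above can be sketched as a linear scan over the registered prefixes. This is an illustrative implementation, not the router's actual data structure: `Route`, `is_component_prefix`, and `route` are hypothetical names, and the slice is assumed to be in registration order so that later entries win ties (rule 4).

```rust
pub struct Route {
    pub prefix: String,
    pub node_id: String,
}

/// True if `prefix` matches `path` on whole components, so
/// "/agents/abc" is NOT treated as a prefix of "/agents/abc123".
fn is_component_prefix(prefix: &str, path: &str) -> bool {
    let prefix = prefix.trim_end_matches('/');
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}

/// Return the node_id owning the longest matching prefix, or None
/// (the caller would then answer with NoBranchError).
pub fn route<'a>(routes: &'a [Route], dst_path: &str) -> Option<&'a str> {
    let mut best: Option<&'a Route> = None;
    for candidate in routes {
        if !is_component_prefix(&candidate.prefix, dst_path) {
            continue;
        }
        let len = candidate.prefix.trim_end_matches('/').len();
        // `>=` lets a more recently registered node win an equal-length tie.
        let better = best.map_or(true, |b| len >= b.prefix.trim_end_matches('/').len());
        if better {
            best = Some(candidate);
        }
    }
    best.map(|r| r.node_id.as_str())
}
```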
---
## Router Built-in Endpoints
The router itself hosts a small set of endpoints at `/router/`:
| Path | RequestType | Returns |
|---|---|---|
| `/router/nodes` | `GetProcedures` | List of all connected nodes with their paths and types |
| `/router/ping` | `Read` | `"pong"` (latency check) |
---
## Real-World Scenario Analysis
This section stress-tests the protocol against conditions you'll actually encounter
on an engagement or in the wild.
### Scenario 1: Flaky Network / Payload Reconnect
**Situation:** A payload is behind a NAT and its TCP connection to the router drops
(firewall timeout, network hiccup, target rebooted).
**What happens:**
1. Payload's `recv()` call returns `TransportError::Disconnected` (EOF) or `TransportError::Io`.
2. Payload closes the TcpStream, waits **5 seconds**, attempts reconnect.
3. Router's node thread for this connection receives EOF, removes the `NodeInfo` entry from the registry, exits cleanly.
4. Payload reconnects, sends a new `HandshakeMessage` with the **same** `node_id`.
5. Router re-registers it. The operator runs `list` and sees the payload appear again.
**Operator experience:** The operator may see the payload disappear from `list` briefly
during the reconnect window. Sessions associated with that payload become temporarily
unresponsive. After reconnect they work again.
**Session continuity:** If the operator side stores the payload's `node_id` as persistent
session state, the session should survive the reconnect without the operator re-typing `use`.
**Protocol requirement:** The router must handle re-registration of a node ID that was
previously registered. The old entry is already gone (thread exited), so this is a
clean re-registration.
---
### Scenario 2: Operator Disconnects Mid-Session
**Situation:** The operator closes the CLI (`Ctrl+C`, terminal crash) while a payload
is still connected.
**What happens:**
1. Router's operator node thread receives EOF. Removes `/operator/sess1` from registry.
2. Any `TreeRequest` from that operator that is still in flight is orphaned: the payload
sends its `TreeResponse` as usual, the router tries to route it to `/operator/sess1`,
finds no registered node, discards the response, and logs a warning.
3. Payloads remain connected. The payload's modules keep running (persistence).
**Operator experience:** When the operator reconnects, it gets a **new session ID**
(`/operator/sess2`). It runs `list` to see what payloads are still connected. Background
operations on payloads that were running continue.
**Key insight:** The payload is the persistent state. The operator is ephemeral.
This is the "background services without another process" design — payload modules
keep running even when no operator is connected.
---
### Scenario 3: Multiple Operators
**Situation:** Two operators connect simultaneously (e.g., red team lead and junior
analyst).
**What happens:**
1. Both connect, get unique session IDs: `/operator/sess1` and `/operator/sess2`.
2. Both can send requests to any payload path.
3. Responses go back to the requesting operator's `src_path`.
4. There is no access control in v1. Both operators have full access to all paths.
**Collision scenario:** Both operators call `/agents/abc123/shell/exec "ls"` at the
same time. The payload processes requests sequentially (single-threaded recv loop).
It sends two responses, each echoing the correct `request_id`. Each response routes
to the operator that sent the matching request (via `src_path` in the request header).
**Failure mode in v1:** No locking on the payload side. If a `Write` and a `Read` to
the same resource are sent at nearly the same time, the outcome depends on the order in
which the payload receives them. This is acceptable for v1 red team use, where multiple
operators are unlikely to stomp on each other on the same target simultaneously.
**Future:** Add an optional exclusive-lock request type for sensitive operations.
---
### Scenario 4: Large Data Transfer (File Exfiltration)
**Situation:** Operator requests a large file (100MB) from a target.
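**Problem with current design:** The `u32` length prefix can express up to 4GB per part,
but the maximum payload length is 64 MB (see Frame Size Limits), so a 100MB file cannot
be sent as a single packet. Buffering that much in RAM on the payload before sending is
also problematic on constrained targets.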
**V1 approach:** Accept this limitation. Files up to ~50MB should be fine in practice
for most engagements. The `TreeRequest.data` field holds the serialised request;
the `TreeResponse.data` field holds the file bytes. For v1, the payload reads the
entire file into a `Vec<u8>` and sends it.
**Future (chunked streaming):** Add `PacketType::Stream` and `PacketType::StreamEnd`
to support chunked transfers. The router passes stream packets through without buffering.
The operator reassembles chunks. This requires a stream ID in the header to demultiplex
concurrent streams.
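
To make the chunking idea concrete, here is a speculative sketch of splitting and reassembling one stream. Nothing here is specified yet: `Chunk`, its `stream_id`/`seq`/`last` fields, `split_into_chunks`, and `reassemble` are all hypothetical, with `last` standing in for what a `PacketType::StreamEnd` packet would signal, and in-order delivery over a single connection is assumed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    /// Identifies the stream so concurrent transfers can be demultiplexed.
    pub stream_id: u32,
    /// Position of this chunk within the stream, starting at 0.
    pub seq: u32,
    /// True on the final chunk (the role StreamEnd would play).
    pub last: bool,
    pub data: Vec<u8>,
}

/// Split `data` into chunks of at most `chunk_size` bytes.
/// An empty input still produces one (empty) final chunk.
pub fn split_into_chunks(stream_id: u32, data: &[u8], chunk_size: usize) -> Vec<Chunk> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let pieces: Vec<&[u8]> = if data.is_empty() {
        vec![&data[..0]]
    } else {
        data.chunks(chunk_size).collect()
    };
    let total = pieces.len();
    pieces
        .into_iter()
        .enumerate()
        .map(|(i, piece)| Chunk {
            stream_id,
            seq: i as u32,
            last: i + 1 == total,
            data: piece.to_vec(),
        })
        .collect()
}

/// Reassemble chunks received in order. Returns None if the sequence is
/// empty, out of order, mixes streams, or lacks a final chunk.
pub fn reassemble(chunks: &[Chunk]) -> Option<Vec<u8>> {
    let first = chunks.first()?;
    let mut out = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        let is_final = i + 1 == chunks.len();
        if chunk.stream_id != first.stream_id || chunk.seq as usize != i || chunk.last != is_final {
            return None;
        }
        out.extend_from_slice(&chunk.data);
    }
    Some(out)
}
```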
---
### Scenario 5: AV / EDR Detection via Network Traffic
**Situation:** The payload is on a monitored network. The router is a VPS. Plain TCP
connections from the target to an unknown IP may trigger alerts.
**V1 limitation:** Plaintext TCP. Easy to detect.
**Transport abstraction payoff:** The `Transport` trait makes this the router's and
payload's responsibility, not the protocol's. To switch to HTTPS:
1. Implement `HttpsTransport: Transport` for the payload.
2. Have the payload connect to a domain name (baked at compile time) on port 443.
3. The router terminates TLS and speaks the same framing protocol underneath.
4. From the network's perspective: an HTTPS connection to what looks like a CDN.
Nothing in the protocol spec changes. Only the `Transport` implementation swaps.
---
### Scenario 6: Router Crash / Restart
**Situation:** The router process crashes or is restarted (e.g., VPS reboot).
**What happens:**
1. All node TCP connections drop simultaneously.
2. All nodes (payloads and operators) receive `Disconnected` errors.
3. All nodes enter reconnect loops.
4. Once the router restarts and starts accepting connections, nodes reconnect and
re-register in whatever order their reconnect loops fire.
5. The router comes back to a clean state (no session persistence across restarts in v1).
**Failure mode:** In-flight requests at the time of crash are lost. The operator may
see commands that appear to hang. The operator should use a timeout on requests.
**V1 mitigation:** Request timeouts are not yet implemented in the operator CLI (they are
on its TODO list). For now, the operator can detect a crash by the payload disappearing
from `list`.
**Future:** The router could persist its node registry to disk and recover after restart.
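
One way the operator-side request timeout could look, sketched with injectable timestamps so it can be tested without sleeping. `Deadlines` is a hypothetical type and the timeout value is left to the caller, since the spec does not define one.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks when each in-flight request was sent (hypothetical).
pub struct Deadlines {
    timeout: Duration,
    sent_at: HashMap<u64, Instant>,
}

impl Deadlines {
    pub fn new(timeout: Duration) -> Self {
        Deadlines { timeout, sent_at: HashMap::new() }
    }

    /// Record that `request_id` was sent at `now`.
    pub fn track(&mut self, request_id: u64, now: Instant) {
        self.sent_at.insert(request_id, now);
    }

    /// A response arrived; returns false if the request was unknown or already expired.
    pub fn resolve(&mut self, request_id: u64) -> bool {
        self.sent_at.remove(&request_id).is_some()
    }

    /// Remove and return (sorted) every request that has waited at least `timeout`.
    pub fn expire(&mut self, now: Instant) -> Vec<u64> {
        let timeout = self.timeout;
        let mut expired: Vec<u64> = self
            .sent_at
            .iter()
            .filter(|(_, sent)| now.saturating_duration_since(**sent) >= timeout)
            .map(|(id, _)| *id)
            .collect();
        expired.sort_unstable();
        for id in &expired {
            self.sent_at.remove(id);
        }
        expired
    }
}
```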
---
### Scenario 7: Malformed Packet / Bad Actor
**Situation:** Something sends a malformed packet to the router (fuzzer, compromised
node, network corruption).
**Defense layers:**
1. **Length prefix:** If the announced frame length exceeds the configured maximum (64 KB
for headers, 64 MB for payloads; see Frame Size Limits), the router closes the connection
with `TransportError::FrameTooLarge` before allocating a buffer for the frame.
2. **rkyv deserialisation:** If the header bytes don't decode to a valid `PacketHeader`,
`rkyv::access` returns an error. The router closes the connection.
3. **Unknown `dst_path`:** Routes to no node, sends back `NoBranchError`.
4. **No authentication in v1:** Any node can send to any path. This is acceptable for
v1 where the router address is only known to the operator. Authentication (shared
secret or challenge-response) is a v2 concern.
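
The first defense layer, checking the announced length before allocating, might look like the sketch below. It is written against `std::io::Read` so it can be exercised with an in-memory buffer; `read_part` and `read_exact_or_disconnect` are hypothetical helpers, the `TransportError` here is a simplified std-only mirror of the enum in the Transport Trait section, and treating a mid-frame EOF the same as a clean close is an assumption.

```rust
use std::io::{self, Read};

#[derive(Debug)]
pub enum TransportError {
    Io(io::Error),
    /// (announced length, maximum allowed)
    FrameTooLarge(usize, usize),
    Disconnected,
}

/// Fill `buf` completely, mapping EOF to Disconnected.
fn read_exact_or_disconnect<R: Read>(reader: &mut R, buf: &mut [u8]) -> Result<(), TransportError> {
    reader.read_exact(buf).map_err(|e| match e.kind() {
        io::ErrorKind::UnexpectedEof => TransportError::Disconnected,
        _ => TransportError::Io(e),
    })
}

/// Read one [u32 BE length][bytes] part, refusing oversized frames
/// before any buffer for the body is allocated.
pub fn read_part<R: Read>(reader: &mut R, max_len: usize) -> Result<Vec<u8>, TransportError> {
    let mut len_buf = [0u8; 4];
    read_exact_or_disconnect(reader, &mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > max_len {
        // Reject based on the prefix alone; the body is never read.
        return Err(TransportError::FrameTooLarge(len, max_len));
    }
    let mut body = vec![0u8; len];
    read_exact_or_disconnect(reader, &mut body)?;
    Ok(body)
}
```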
---
### Scenario 8: Pivot / Multi-Hop (Future)
**Situation:** A payload on an internal network can only reach another internal host,
not the external router. A "pivot" payload acts as a relay.
**How the tree model enables this:**
1. Pivot payload registers at `/agents/pivot1/` on the external router.
2. Pivot payload also acts as a *local router* for sub-agents.
3. Sub-agents connect to the pivot payload's local listener and register.
4. The pivot payload's `/agents/pivot1/agents/` prefix forwards packets to sub-agents.
5. From the external operator's perspective: `/agents/pivot1/agents/sub1/shell/exec`
is just a deeper path. The routing is recursive.
**Protocol requirement to enable this:** Add `NodeType::Router` to the enum. A pivot
payload registers as a `Router` node, not a `Payload` node. The external router
knows to forward any path with `/agents/pivot1/` prefix to the pivot connection,
and the pivot routes further from there.
This does not require changes to the v1 wire format or routing rules. Only the `NodeType`
enum needs the `Router` variant added.
---
## Transport Trait
All transports implement this interface:
```rust
/// A bidirectional framed transport.
///
/// Implementations are responsible for framing: the two-part header+payload format
/// described in the wire format spec. Each `send` call transmits exactly one
/// logical packet (header + payload). Each `recv` call receives exactly one.
///
/// Implementations MUST use `read_exact`-style loops (not single `read` calls)
/// because TCP is a stream protocol and may deliver partial frames.
///
/// # Example
///
/// ```rust
/// // TCP implementation skeleton
/// impl Transport for TcpTransport {
/// fn send(&mut self, header: &PacketHeader, payload: &[u8]) -> Result<(), TransportError> {
/// // 1. Serialise header to bytes
/// // 2. Write [u32 header_len][header bytes][u32 payload_len][payload bytes]
/// // 3. Use write_all() to ensure complete write
/// }
/// fn recv(&mut self) -> Result<(PacketHeader, Vec<u8>), TransportError> {
/// // 1. read_exact 4 bytes → header length
/// // 2. read_exact N bytes → header bytes
/// // 3. Deserialise header
/// // 4. read_exact 4 bytes → payload length
/// // 5. read_exact M bytes → payload bytes
/// // 6. Return (header, payload)
/// }
/// }
/// ```
pub trait Transport: Send {
/// Send a packet (header + payload) over this transport.
/// Blocks until all bytes are written.
fn send(&mut self, header: &PacketHeader, payload: &[u8]) -> Result<(), TransportError>;
/// Receive one packet from this transport.
/// Blocks until a complete header+payload pair is received.
fn recv(&mut self) -> Result<(PacketHeader, Vec<u8>), TransportError>;
}
#[derive(Debug, thiserror::Error)]
pub enum TransportError {
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("frame too large: {0} bytes (max {1})")]
FrameTooLarge(usize, usize),
#[error("connection closed cleanly")]
Disconnected,
#[error("rkyv deserialisation failed")]
DeserialiseError,
}
```
### Reconnect Policy
**Payloads:** On `Disconnected` or `Io(_)` from `recv()` or `send()`:
1. Close the transport.
2. Wait 5 seconds.
3. Attempt to create a new transport connection.
4. If connect fails, wait 5 more seconds, retry. No maximum retry limit.
5. On connect success, run the handshake again.
**Operator CLI:** On disconnect, print a message and exit. The operator restarts the
CLI manually. (In a future version, the CLI could auto-reconnect and restore session.)
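
The payload's retry loop can be sketched generically. This is an illustration under assumptions, not the payload's actual code: `reconnect` is a hypothetical helper, and the connect and sleep steps are passed in as closures so the loop can be tested without a network or real delays. In a real payload, `connect` would open the transport and `sleep` would be `std::thread::sleep`.

```rust
use std::time::Duration;

/// Retry `connect` forever, waiting `delay` before every attempt.
/// Returns the first successful connection; the caller then runs the handshake.
pub fn reconnect<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
    delay: Duration,
) -> T {
    loop {
        // Steps 2 and 4: wait before the first attempt and after each failure.
        sleep(delay);
        // Step 3: try to establish a new transport connection.
        match connect() {
            Ok(transport) => return transport,
            // No maximum retry limit: swallow the error and go around again.
            Err(_) => continue,
        }
    }
}
```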
---
## Frame Size Limits
| Limit | Value | Reason |
|---|---|---|
| Max header length | 64 KB | Headers should never be this large; anything bigger is a bug or attack |
| Max payload length | 64 MB | Sufficient for most file transfers; larger files need chunked streaming (future) |
| Handshake timeout | 10 s (router) | Prevent resource exhaustion from hanging connections |
| Handshake ack timeout | 5 s (node) | Keep reconnect loops responsive |
---
## Version Compatibility
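This spec assumes that message types can gain new fields without breaking older readers
(for example via a `#[rkyv(default)]`-style attribute that fills in fields missing from
older messages). Verify this against the rkyv version in use before relying on it: rkyv
archives are read in place using a fixed per-type layout, so added fields may need an
explicit versioning scheme. Under that assumption: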
- New fields can be added to any message type without breaking existing implementations.
- Removing or renaming fields IS a breaking change.
- The `PacketType` enum should only gain variants, never lose them.
When breaking changes are necessary, bump the protocol version (future: add a version
field to the framing format).
---
## Implementation Checklist
- [ ] `src/protocol/mod.rs` — re-exports all protocol types
- [ ] `src/protocol/types.rs` — PacketHeader, PacketType, TreeRequest, TreeResponse, HandshakeMessage, HandshakeAck
- [ ] `src/protocol/content_types.rs` — content type constants
- [ ] `src/transport/mod.rs` — Transport trait, TransportError
- [ ] `src/transport/tcp.rs` — TcpTransport implementing Transport
- [ ] `src/tree/mod.rs` — Tree, Endpoint trait (new implementation with correct routing)
- [ ] `ush-router/` — router binary
- [ ] `ush-payload/` — payload binary with transport layer
- [ ] `ush-cli/` — operator REPL binary
- [ ] Unit tests for framing round-trips, tree routing correctness
- [ ] Integration test: two nodes through a real router