Architecture¶
This document explains the design decisions and architecture of nauyaca, a Gemini protocol implementation in Python. Understanding these choices will help you appreciate why the code is structured the way it is and how to effectively extend it.
Why asyncio.Protocol/Transport?¶
Nauyaca uses Python's low-level asyncio.Protocol and asyncio.Transport pattern instead of the higher-level Streams API (asyncio.open_connection() / StreamReader / StreamWriter). This decision is fundamental to the project's architecture.
The Callback-Based Design¶
The Protocol/Transport pattern is callback-based rather than async/await-based:
class GeminiServerProtocol(asyncio.Protocol):
def connection_made(self, transport):
"""Called when connection established - not async"""
self.transport = transport
def data_received(self, data):
"""Called when data arrives - not async"""
self.buffer += data
def connection_lost(self, exc):
"""Called when connection closes - not async"""
self.cleanup()
These methods are synchronous callbacks invoked by the event loop. This design has several advantages:
Performance Benefits¶
-
Lower Overhead: No async/await context switching overhead. Callbacks execute directly in the event loop.
-
Efficient Buffering: Direct control over buffering means you can optimize memory usage and avoid unnecessary copies.
-
High Throughput: Better for servers handling many concurrent connections. Each connection's callbacks are lightweight.
Control Benefits¶
-
Fine-Grained Flow Control: Access to
pause_writing()andresume_writing()for backpressure handling when serving large files. -
Connection Lifecycle: Explicit hooks for every phase of connection lifecycle (established, data, EOF, closed).
-
Buffer Management: Complete control over how you accumulate and process incoming data.
Protocol-Oriented Design¶
The Protocol/Transport pattern naturally fits protocol implementations:
- Protocols are state machines: Each connection has state (buffering partial data, rate limiting, etc.)
- Transports are I/O primitives: Write bytes, close connection, get peer info
- Clear separation: Protocol logic separate from I/O mechanism
Trade-offs¶
The main trade-off is complexity: callback-based code is harder to reason about than linear async/await code. We bridge this gap with asyncio.Future when needed:
# In high-level async code:
loop = asyncio.get_running_loop()
result_future = loop.create_future()
# Create connection with protocol
transport, protocol = await loop.create_connection(
lambda: GeminiClientProtocol(result_future),
host, port, ssl=ssl_context
)
# Protocol sets result when done:
def connection_lost(self, exc):
if not self.result_future.done():
self.result_future.set_result(self.response)
# Back in async code:
response = await result_future # Bridged to async/await
This pattern lets us keep high-level APIs clean while maintaining low-level performance.
Module Structure¶
Nauyaca is organized into six top-level packages, each with clear responsibilities:
src/nauyaca/
├── protocol/ # Core protocol types (request, response, status codes)
├── server/ # Server implementation (protocol, handlers, routing, middleware)
├── client/ # Client implementation (protocol, session, high-level API)
├── security/ # TLS configuration, certificates, TOFU
├── content/ # Gemtext parsing, MIME detection, templates
└── utils/ # URL parsing, logging, helpers
protocol/ - Core Protocol Types¶
The protocol/ package contains protocol-agnostic definitions:
constants.py: Status codes, MIME types, protocol limits (1024 byte request limit)status.py:StatusCodeenum with semantic methods (is_success(),is_redirect())request.py:GeminiRequest- URL parsing, validation, normalizationresponse.py:GeminiResponse- Response building, header formatting
These types are used by both server and client. They encode the Gemini specification without assuming which side you're on.
Design decision: Separating protocol types from implementation lets us: - Test protocol logic independently - Reuse validation code between server and client - Swap out server/client implementations without changing protocol types
server/ - Server Implementation¶
The server/ package implements the server side:
protocol.py:GeminiServerProtocol(asyncio.Protocol)- connection handling, request parsinghandler.py:RequestHandlerbase class,StaticFileHandler, futureCGIHandlerrouter.py: URL pattern matching, route registration to handlersmiddleware.py: Rate limiting, access control, request filteringconfig.py:ServerConfigwith TOML configuration supportserver.py: High-level server startup functions
Layering: The server has three layers:
- Protocol layer (
protocol.py): Manages TLS connection, buffers data, parses request - Middleware layer (
middleware.py): Filters/validates requests before they reach handlers - Application layer (
handler.py,router.py): Business logic - what response to send
This separation allows: - Testing each layer independently - Swapping middleware without touching protocol code - Adding new handler types without modifying protocol parsing
client/ - Client Implementation¶
The client/ package implements the client side:
protocol.py:GeminiClientProtocol(asyncio.Protocol)- connection handling, response parsingsession.py:GeminiClient- high-level async API, TOFU integration, redirect following
The client is simpler than the server (no routing or middleware), but includes important security:
- TOFU validation: Certificate fingerprinting and trust-on-first-use checking
- Redirect loop detection: Maximum 5 redirects to prevent infinite loops
- Rate limit respect: Handles status 44 (SLOW DOWN) responses
security/ - Security Primitives¶
The security/ package centralizes all security concerns:
tls.py: SSL context creation with correct TLS 1.2+ configurationcertificates.py: Certificate generation, loading, fingerprinting (SHA-256)tofu.py: SQLite-backed TOFU database, certificate validation, import/export
Why separate? Security is critical and needs focused attention. By isolating it: - Security code is easier to audit - TLS configuration is consistent across server and client - TOFU implementation can be swapped out (e.g., for Redis-backed storage)
content/ - Content Handling¶
The content/ package handles content types:
gemtext.py: Gemtext parser/renderer, line type detection, directory listing generationtemplates.py: Error page templates, directory listing templates
Design decision: Content handling is separate because: - Gemtext parsing is complex enough to deserve its own module - Templates might be user-customizable in the future - Adding new content types (e.g., Markdown rendering) is easier
utils/ - Shared Utilities¶
The utils/ package contains helpers used across multiple packages:
url.py: URL parsing, validation, normalization (handlesgemini://scheme)logging.py: Logging configuration and formatters
Request/Response Flow¶
Understanding the data flow through nauyaca helps you know where to hook in custom logic.
Server Request Flow¶
graph TD
A[Client connects] --> B[TLS handshake]
B --> C[connection_made]
C --> D[data_received - buffer data]
D --> E{Complete request?}
E -->|No| D
E -->|Yes| F[Parse request URL]
F --> G[Run middleware chain]
G --> H{Middleware allows?}
H -->|No| I[Send error response]
H -->|Yes| J[Route to handler]
J --> K[Handler generates response]
K --> L[Send response]
I --> M[transport.close]
L --> M
M --> N[connection_lost]
Key points:
-
Buffering:
data_received()may be called multiple times for a single request. Always buffer until you see\r\n. -
Middleware runs first: Before routing, middleware can reject requests (rate limiting, IP filtering, certificate validation).
-
One request per connection: Gemini mandates closing the connection after each response. No keep-alive.
-
Transport writes: The protocol writes bytes to the transport, which handles actual network I/O.
Client Request Flow¶
graph TD
A[create_connection] --> B[TLS handshake]
B --> C[connection_made]
C --> D[Write request URL + CRLF]
D --> E[data_received - buffer response]
E --> F{Complete response?}
F -->|No| E
F -->|Yes| G[Parse response header]
G --> H{More data?}
H -->|Yes| E
H -->|No| I[Server closes connection]
I --> J[connection_lost]
J --> K[Set future result]
K --> L[Caller awaits future]
Key points:
-
Future-based: Client protocol uses
asyncio.Futureto bridge callbacks to async/await. -
Server controls closure: Server always closes connection. Client detects this in
connection_lost(). -
Response may be chunked: Large responses arrive in multiple
data_received()calls. Buffer until connection closes. -
TOFU validation: After TLS handshake, check certificate fingerprint against database.
Middleware Architecture¶
Nauyaca uses a chain of responsibility pattern for middleware.
Middleware Protocol¶
All middleware implements the Middleware protocol:
class Middleware(Protocol):
async def process_request(
self,
request_url: str,
client_ip: str,
client_cert_fingerprint: str | None = None,
) -> tuple[bool, str | None]:
"""
Returns:
(True, None) - allow request to proceed
(False, gemini_response) - reject with response
"""
...
This simple interface lets any component decide whether to allow a request.
Middleware Chain¶
Multiple middleware are composed into a chain:
class MiddlewareChain:
def __init__(self, middlewares: list[Middleware]):
self.middlewares = middlewares
async def process_request(self, request_url, client_ip, client_cert_fp):
for middleware in self.middlewares:
allow, response = await middleware.process_request(
request_url, client_ip, client_cert_fp
)
if not allow:
return False, response # First rejection wins
return True, None # All middleware allowed
Execution order matters: Middleware runs in the order registered. Typically:
- Rate limiting first (reject quickly to save resources)
- IP access control second (deny/allow lists)
- Certificate validation third (for paths requiring client certs)
- Custom middleware last
Built-in Middleware¶
Nauyaca provides three middleware out of the box:
RateLimiter - Token bucket algorithm per IP:
rate_limiter = RateLimiter(
config=RateLimitConfig(capacity=10, refill_rate=1.0)
)
# Allows burst of 10 requests, then 1 per second
AccessController - IP allow/deny lists with CIDR support:
ClientCertificateMiddleware - Require client certificates for paths:
Why Middleware?¶
The middleware pattern provides several benefits:
-
Composability: Mix and match security features without modifying server code.
-
Testability: Each middleware can be unit tested independently.
-
Order control: You decide which checks run first (important for performance).
-
Extensibility: Add your own middleware without touching nauyaca's code.
Example: Custom Middleware¶
Here's how to write custom middleware:
class PathBlockerMiddleware:
"""Block requests to specific paths."""
def __init__(self, blocked_paths: set[str]):
self.blocked_paths = blocked_paths
async def process_request(
self, request_url: str, client_ip: str, client_cert_fp: str | None
) -> tuple[bool, str | None]:
from urllib.parse import urlparse
path = urlparse(request_url).path
if path in self.blocked_paths:
return False, "51 Not Found\r\n"
return True, None
# Use it:
middleware_chain = MiddlewareChain([
RateLimiter(),
PathBlockerMiddleware({"/secret", "/hidden"}),
])
Extension Points¶
Nauyaca is designed to be extended without modifying core code. Here are the main extension points:
Custom Handlers¶
The most common extension is adding new request handlers:
from nauyaca.server.handler import RequestHandler
from nauyaca.protocol.request import GeminiRequest
from nauyaca.protocol.response import GeminiResponse
class DynamicContentHandler(RequestHandler):
"""Generate dynamic content based on request."""
def handle(self, request: GeminiRequest) -> GeminiResponse:
# Your logic here
content = self.generate_content(request)
return GeminiResponse.success(content, mime_type="text/gemini")
def generate_content(self, request: GeminiRequest) -> str:
# Generate gemtext based on request
return f"# Dynamic Page\n\nYou requested: {request.path}"
# Register with router:
router = Router()
router.register(r"^/dynamic/.*", DynamicContentHandler())
When to use: Any time you need to generate content programmatically instead of serving static files.
Custom Middleware¶
As shown above, middleware is the extension point for request filtering:
class LoggingMiddleware:
"""Log all requests."""
async def process_request(
self, request_url: str, client_ip: str, client_cert_fp: str | None
) -> tuple[bool, str | None]:
import logging
logging.info(f"{client_ip} -> {request_url}")
return True, None # Always allow
When to use: Authentication, authorization, logging, metrics, custom rate limiting.
TOFU Callbacks¶
The TOFU implementation supports callbacks for certificate events:
from nauyaca.security.tofu import TOFUDatabase
def on_new_certificate(hostname: str, port: int, fingerprint: str):
"""Called when seeing a host for the first time."""
print(f"New host: {hostname}:{port}")
def on_certificate_change(hostname: str, port: int, old_fp: str, new_fp: str):
"""Called when a known host's certificate changes."""
print(f"Certificate changed for {hostname}:{port}")
# Could send alert, log to security system, etc.
db = TOFUDatabase(
db_path="tofu.db",
on_new_certificate=on_new_certificate,
on_certificate_change=on_certificate_change,
)
When to use: Security monitoring, alerting, custom trust policies.
Configuration Extension¶
Server configuration uses Python dataclasses and can be extended:
from dataclasses import dataclass
from nauyaca.server.config import ServerConfig
@dataclass
class ExtendedConfig(ServerConfig):
"""Add custom configuration fields."""
enable_metrics: bool = False
metrics_port: int = 9090
@classmethod
def from_toml(cls, path: str):
# Load base config
base = super().from_toml(path)
# Add your custom fields from TOML
# ...
return cls(**base.__dict__, enable_metrics=True)
When to use: Adding new server features that need configuration.
Design Principles¶
Understanding these principles helps you work with nauyaca's architecture:
1. Protocol Correctness¶
Gemini has a simple but strict specification. Nauyaca prioritizes correctness:
- Always use
\r\n(CRLF) for line endings - Enforce 1024 byte request limit
- Require TLS 1.2+ minimum
- Close connection after each response
- Validate UTF-8 encoding
Rationale: A protocol implementation must be correct before it's fast or feature-rich.
2. Security by Default¶
Security features are enabled by default where possible:
- TLS is mandatory (no plaintext mode)
- Path traversal protection in
StaticFileHandler - Request size limits enforced
- TOFU validation in client
- Rate limiting available out of the box
Rationale: Users shouldn't have to remember to enable security.
3. Explicit Over Implicit¶
Nauyaca favors explicit code over magic:
- Middleware order is explicit (not auto-sorted)
- Route registration is explicit (not auto-discovered)
- Configuration is explicit (not convention-based)
Rationale: When debugging, explicit code is easier to trace and reason about.
4. Test-Driven Architecture¶
The architecture is shaped by testing needs:
- Protocol types are separate (easy to test without network)
- Handlers are dependency-injected (easy to mock)
- Middleware uses
async def(easy to test with pytest-asyncio)
Rationale: If it's hard to test, it's probably poorly designed.
Common Patterns¶
Here are patterns you'll see throughout the codebase:
Pattern: Data Buffering¶
Because data_received() can be called multiple times, always buffer:
def __init__(self):
self.buffer = b""
def data_received(self, data: bytes):
self.buffer += data
if b'\r\n' in self.buffer:
line, self.buffer = self.buffer.split(b'\r\n', 1)
self.process_line(line)
Why: Network data arrives in unpredictable chunks. Buffering ensures you process complete messages.
Pattern: Future Bridging¶
To use callback-based protocols from async code:
# Create future in async context
loop = asyncio.get_running_loop()
future = loop.create_future()
# Pass to protocol
protocol = MyProtocol(future)
# Protocol sets result in callback
def connection_lost(self, exc):
self.future.set_result(self.data)
# Await in async context
result = await future
Why: Bridges the gap between callback world (protocol) and async/await world (application).
Pattern: Fail-Fast Validation¶
Validate input as early as possible:
def handle(self, request: GeminiRequest) -> GeminiResponse:
# Validate first
if not request.path:
return GeminiResponse.bad_request("Path required")
# Then process
return self.process_request(request)
Why: Failing fast prevents wasted work and makes debugging easier.
Pattern: Context Managers for Cleanup¶
Use context managers for resources that need cleanup:
async with TOFUDatabase("tofu.db") as db:
is_valid = await db.verify_certificate(host, port, cert)
# Database automatically closed when context exits
Why: Ensures cleanup happens even when exceptions occur.
See Also¶
- API Reference: Server Protocol - Technical details of
GeminiServerProtocol - API Reference: Middleware - Complete middleware API
- How-To: Write Custom Middleware - Step-by-step guide
- How-To: Create Custom Handlers - Handler development guide