Example Input
functional_description: 'Classify short customer support messages (≤128 tokens) as: billing_issue, technical_fault, general_inquiry, or escalation_required. Must run on a Raspberry Pi 4 with no internet.'
target_runtime: Python/ONNX
size_budget_mb: 22
precision: int8
Example Output
## STEP 1 — SEMANTIC DECOMPOSITION
Primary transformation: text → 4-class categorical label
Latency class: interactive (<50ms acceptable on Pi 4)
Complexity class: shallow-nonlinear (4-class sentiment-style classification over short sequences)
Edge constraints: 4GB RAM ceiling, ARM Cortex-A72, no GPU, single-threaded inference preferred
## STEP 2 — ARCHITECTURE SELECTION
Chosen architecture: Embedded Token MLP (ET-MLP)
- Embedding layer: vocab=8192, dim=64 → 524,288 params; ~2.0MB at fp32, ~512KB at int8
- Mean-pool over sequence → 64-dim vector
- FC(64→128) + ReLU
- FC(128→64) + ReLU
- FC(64→4) + Softmax
Total parameters: ~541K | fp32≈2.2MB | int8≈0.5MB ✓ well within 22MB budget
Activations: ReLU (hidden), Softmax (output)
Objective: CrossEntropy (training reference)
Size verdict: PASS — leaves room for tokenizer vocab table (~3MB BPE)
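The parameter count and size claims above can be double-checked with a few lines of arithmetic (layer shapes taken directly from the architecture list; the embedding dominates):

```python
# Parameter-count and size-budget check for the ET-MLP above.
VOCAB_SIZE, EMBED_DIM = 8192, 64
HIDDEN1, HIDDEN2, NUM_CLASSES = 128, 64, 4

embedding = VOCAB_SIZE * EMBED_DIM            # 524,288
fc1 = EMBED_DIM * HIDDEN1 + HIDDEN1           # weights + bias = 8,320
fc2 = HIDDEN1 * HIDDEN2 + HIDDEN2             # 8,256
fc3 = HIDDEN2 * NUM_CLASSES + NUM_CLASSES     # 260
total = embedding + fc1 + fc2 + fc3           # 541,124 parameters

fp32_mb = total * 4 / 1e6                     # ~2.16 MB at 4 bytes/param
int8_mb = total * 1 / 1e6                     # ~0.54 MB at 1 byte/param
```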
## STEP 3 — CODE SYNTHESIS
**A. Model Definition**
```python
import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
VOCAB_SIZE = 8192
EMBED_DIM = 64
HIDDEN1 = 128
HIDDEN2 = 64
NUM_CLASSES = 4
SEED = 42
rng = np.random.default_rng(SEED)
def make_weights():
    # Spec clause: 'classify short messages' → embedding over token ids
    emb = rng.normal(0, 0.02, (VOCAB_SIZE, EMBED_DIM)).astype(np.float32)
    w1 = rng.normal(0, 0.1, (EMBED_DIM, HIDDEN1)).astype(np.float32)
    b1 = np.zeros(HIDDEN1, dtype=np.float32)
    w2 = rng.normal(0, 0.1, (HIDDEN1, HIDDEN2)).astype(np.float32)
    b2 = np.zeros(HIDDEN2, dtype=np.float32)
    w3 = rng.normal(0, 0.1, (HIDDEN2, NUM_CLASSES)).astype(np.float32)
    b3 = np.zeros(NUM_CLASSES, dtype=np.float32)
    return emb, w1, b1, w2, b2, w3, b3
```
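As a sanity check on the architecture, the forward pass implied by these shapes can be run directly in NumPy (a sketch of the same computation the ONNX graph will encode, with the weight construction reproduced inline so the block is self-contained; the token ids are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(42)
emb = rng.normal(0, 0.02, (8192, 64)).astype(np.float32)
w1 = rng.normal(0, 0.1, (64, 128)).astype(np.float32); b1 = np.zeros(128, np.float32)
w2 = rng.normal(0, 0.1, (128, 64)).astype(np.float32); b2 = np.zeros(64, np.float32)
w3 = rng.normal(0, 0.1, (64, 4)).astype(np.float32);   b3 = np.zeros(4, np.float32)

def forward(token_ids):
    # Gather embeddings, mean-pool over the sequence, then run the MLP head.
    x = emb[token_ids].mean(axis=0)       # (64,)
    h1 = np.maximum(x @ w1 + b1, 0.0)     # FC(64→128) + ReLU
    h2 = np.maximum(h1 @ w2 + b2, 0.0)    # FC(128→64) + ReLU
    logits = h2 @ w3 + b3                 # FC(64→4)
    z = np.exp(logits - logits.max())     # numerically stable Softmax
    return z / z.sum()

probs = forward(np.array([17, 912, 4401]))  # arbitrary token ids
```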
**B–F sections continue with full ONNX graph construction, int8 quantization via onnxruntime quantization API, serialization to support_classifier.onnx, a 15-line loader, and 3 self-test assertions covering each ambiguous label boundary.**
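The int8 step in B–F relies on standard symmetric per-tensor quantization; the underlying math can be illustrated in isolation (a sketch of the arithmetic only, not the onnxruntime quantization API):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scheme: map [-max|w|, +max|w|] onto [-127, 127].
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(0, 0.1, (64, 128)).astype(np.float32)
q, scale = quantize_int8(w)
# Round-to-nearest keeps the reconstruction error within half a quantization step.
err = float(np.abs(dequantize(q, scale) - w).max())
```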
## STEP 4 — SPEC VERIFICATION MATRIX
| Spec Clause | Implemented By |
|---|---|
| 'short messages ≤128 tokens' | Sequence truncation in tokenizer harness, line 14 |
| 'billing_issue … escalation_required' | Output head FC(64→4), label map dict line 67 |
| 'Raspberry Pi 4' | int8 quantization, ONNX CPU EP, single-thread session option |
| 'no internet' | All weights bundled in .onnx; tokenizer vocab embedded as JSON sidecar |
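The truncation and label-map rows above can be sketched as a minimal harness (the names `truncate` and `LABELS` are hypothetical illustrations, not identifiers from the synthesized code):

```python
MAX_LEN = 128  # spec clause: 'short messages ≤128 tokens'
LABELS = ["billing_issue", "technical_fault", "general_inquiry", "escalation_required"]

def truncate(token_ids, max_len=MAX_LEN):
    # The model never sees a sequence longer than max_len tokens.
    return token_ids[:max_len]

clipped = truncate(list(range(200)))  # an over-length message gets clipped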
## STEP 5 — FOOTPRINT REPORT
Serialized size: ~3.7MB (model ~0.5MB int8 + tokenizer vocab 3.2MB)
CPU inference (Pi 4, 1 thread): ~8ms per message
RAM at runtime: ~48MB (ONNX Runtime overhead + model)
Dependencies: onnxruntime==1.17.0, numpy==1.26 — both pip-installable offline via wheel