# TERA: Terrestrial Extendable Retrieval & Addressing

*A semantic content discovery network with cryptographic integrity guarantees.*
TERA is a peer-to-peer network that enables similarity-based content discovery while preventing spam through cryptographic verification. Unlike traditional content-addressed networks (like IPFS) that require exact content hashes, TERA allows you to find "similar" content with provable integrity.
Traditional DHTs have a fundamental limitation: they can only verify exact content identity. TERA introduces a dual-output hash function that provides:
- `H_crypto`: a homomorphic hash supporting O(1) incremental extensions
- `H_semantic`: universal neural kernels for multi-modal similarity
This enables integrity-gated semantic search: nodes can verify that content legitimately extends a root hash while filtering spam based on relevance.
- Universal: Works with any content type (text, images, code, audio, binary)
- Extendable: Add content to a collection in O(1) time without recomputing the entire hash
- Verifiable: Cryptographically prove that content B extends content A
- Spam-resistant: Invalid extensions are automatically rejected by the network
- Parameterized: Users define their own notion of "similarity" via runtime kernel parameters
- Model-agnostic: Neural kernels are content-addressed and exchangeable (like codecs)
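To make the extendable and verifiable properties concrete, here is a minimal sketch of one classic construction, a multiplicative multiset hash where H(S) = Π g^h(item) mod p. The actual scheme in `crypto/` may differ; what matters is that `Extend` touches only the new item (O(1) work) and that a claimed extension is publicly checkable from the root alone:

```go
// Sketch of an extendable hash; illustrative parameters, not the real H_crypto.
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

var (
	// p = 2^255 - 19, a well-known prime (illustrative, not a vetted group choice).
	p = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 255), big.NewInt(19))
	g = big.NewInt(2)
)

// itemExponent maps content bytes to a group exponent via SHA-256.
func itemExponent(item []byte) *big.Int {
	sum := sha256.Sum256(item)
	return new(big.Int).SetBytes(sum[:])
}

// Extend folds one more item into the running hash: O(1) work,
// independent of how many items the collection already holds.
func Extend(h *big.Int, item []byte) *big.Int {
	term := new(big.Int).Exp(g, itemExponent(item), p)
	return new(big.Int).Mod(new(big.Int).Mul(h, term), p)
}

// VerifyExtension checks that extended == root extended by exactly `added`.
func VerifyExtension(root, extended *big.Int, added []byte) bool {
	return Extend(root, added).Cmp(extended) == 0
}

func main() {
	empty := big.NewInt(1) // hash of the empty collection
	root := Extend(empty, []byte("doc A"))
	ext := Extend(root, []byte("doc B"))
	fmt.Println(VerifyExtension(root, ext, []byte("doc B"))) // true
	fmt.Println(VerifyExtension(root, ext, []byte("spam")))  // false
}
```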
Traditional IPFS:

```
Content → SHA-256 → Exact lookup → Retrieve
```

TERA:

```
Content → (H_crypto, H_semantic) → Similarity search + Verification → Discover
```
Example:

```go
// Extract universal features (works for text, images, code, etc.)
features, _ := semantic.ExtractFeatures(content, filename)
// → FeatureVector{Modality: "text", Data: [512]float32, Hash: "..."}

// Load a neural kernel by content ID (cached locally like a codec)
registry := semantic.NewKernelRegistry("~/.tera/kernels")
kernel, _ := registry.Get("bafyreisemantic...") // IPLD CID

// Compute similarity with runtime parameters
params := semantic.KernelParams{
    WeightSemantic:   0.7,
    WeightLexical:    0.3,
    WeightStructural: 0.1,
    Threshold:        0.6,
}
similarity, _ := kernel.ComputeSimilarity(featuresA, featuresB, params)

// Create extendable content with cryptographic proof
root := tera.NewContent(features)
extended := root.Extend(newFeatures) // O(1) verification

// Network forwards only if:
// 1. H_crypto verifies (legitimate extension)
// 2. Neural kernel similarity exceeds threshold (relevant)
```

TERA solves the heterogeneous model problem in distributed systems. In a P2P network, different nodes may use different LLMs (Claude, GPT-4, local models), which produce incompatible embeddings. TERA's solution:
Content-Addressed Kernels (like audio/video codecs):
- Small neural networks (100KB-1MB) for computing similarity
- Referenced by IPLD CID (e.g., `bafyreisemantic...`)
- Cached locally, downloaded on-demand
- Multiple kernels coexist (like having H.264, VP9, AV1)
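A hypothetical sketch of the codec-style caching this implies; the `fetchFromNetwork` hook and byte-level API are illustrative assumptions, not the real `KernelRegistry`:

```go
// Illustrative kernel cache: blobs keyed by CID, fetched once, reused like codecs.
package semanticsketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// fetchFromNetwork stands in for retrieving kernel bytes by CID
// (e.g., over IPFS or the gossip network); not part of the real API.
type fetchFromNetwork func(cid string) ([]byte, error)

type KernelRegistry struct {
	dir   string
	fetch fetchFromNetwork
}

// Get returns the raw kernel bytes for a CID, using the on-disk cache first.
func (r *KernelRegistry) Get(cid string) ([]byte, error) {
	path := filepath.Join(r.dir, cid)
	if data, err := os.ReadFile(path); err == nil {
		return data, nil // cache hit: behaves like an installed codec
	}
	data, err := r.fetch(cid)
	if err != nil {
		return nil, fmt.Errorf("fetch kernel %s: %w", cid, err)
	}
	// Best-effort cache write; a real implementation would re-hash the
	// bytes and confirm they match the CID before trusting them.
	_ = os.WriteFile(path, data, 0o644)
	return data, nil
}
```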
Universal Features (model-agnostic):
- Fixed-size vectors (512 floats) for all content types
- Extracted deterministically (no training needed)
- Text: TF-IDF + n-grams, Images: color/edge histograms, Code: token frequency
- Lightweight enough to transmit over network
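As a rough illustration of "extracted deterministically, no training needed", here is a minimal hashing-trick extractor for the text case: character trigrams hashed into a fixed 512-float vector, then L2-normalized. The real `semantic/features_universal.go` adds TF-IDF weighting and the other modalities:

```go
// Sketch of deterministic, training-free text features via the hashing trick.
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

const Dim = 512

func ExtractTextFeatures(text string) [Dim]float32 {
	var v [Dim]float32
	runes := []rune(text)
	for i := 0; i+3 <= len(runes); i++ {
		h := fnv.New32a()
		h.Write([]byte(string(runes[i : i+3]))) // character trigram
		v[h.Sum32()%Dim]++                      // bucket by hash
	}
	// L2-normalize so cosine similarity reduces to a dot product.
	var norm float64
	for _, x := range v {
		norm += float64(x) * float64(x)
	}
	if norm > 0 {
		inv := float32(1 / math.Sqrt(norm))
		for i := range v {
			v[i] *= inv
		}
	}
	return v
}

func main() {
	a := ExtractTextFeatures("neural networks and deep learning")
	b := ExtractTextFeatures("deep learning with neural networks")
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	fmt.Printf("cosine similarity: %.3f\n", dot)
}
```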
Runtime Parameters (highly tunable):
- Users tune kernel behavior at query time
- No need to recompute features or retrain models
- Example: Adjust semantic vs lexical focus per query
This design means nodes can share a "taste" (kernel) without forcing everyone to use the same LLM.
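A small sketch of what query-time tuning can look like, assuming the kernel exposes per-channel similarity scores that are blended by the caller's weights; the field names mirror the `KernelParams` example above, but the real `ComputeSimilarity` may combine channels differently:

```go
// Sketch of query-time parameter blending over precomputed channel scores.
package main

import "fmt"

// KernelParams mirrors the fields used in the example above.
type KernelParams struct {
	WeightSemantic, WeightLexical, WeightStructural, Threshold float32
}

// blend mixes per-channel similarities with the caller's runtime weights.
// Changing params re-scores existing features; nothing is re-extracted.
func blend(semantic, lexical, structural float32, p KernelParams) (float32, bool) {
	score := p.WeightSemantic*semantic + p.WeightLexical*lexical + p.WeightStructural*structural
	return score, score >= p.Threshold
}

func main() {
	p := KernelParams{WeightSemantic: 0.7, WeightLexical: 0.3, Threshold: 0.6}
	score, ok := blend(0.8, 0.4, 0.0, p)
	fmt.Printf("%.2f %v\n", score, ok) // 0.68 true
}
```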
```
┌─────────────────────────────────────────┐
│            Application Layer            │
│ (Queries, Publications, Subscriptions)  │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│      Gossip Protocol + Gatekeeping      │
│ (Forward if: valid_extension ∧ similar) │
└─────────────────────────────────────────┘
                    ↓
┌──────────────────┬──────────────────────┐
│     H_crypto     │      H_semantic      │
│  (Homomorphic)   │   (Neural Kernels)   │
│   Integrity ✓    │     Discovery ✓      │
└──────────────────┴──────────────────────┘
                    ↓
┌──────────────────┬──────────────────────┐
│  Storage Layer   │   Kernel Registry    │
│   (BadgerDB)     │     (IPLD Cache)     │
└──────────────────┴──────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│         libp2p Transport Layer          │
└─────────────────────────────────────────┘
```
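The forwarding rule in the gossip layer ("forward if: valid_extension ∧ similar") can be expressed in a few lines. The types below are illustrative stand-ins, not the actual `core/` API:

```go
// Sketch of integrity-gated forwarding; illustrative types only.
package gatekeep

type Features []float32

// Verifier proves that extended legitimately extends root by the added
// bytes (the H_crypto check from the architecture diagram above).
type Verifier interface {
	VerifyExtension(root, extended, added []byte) bool
}

// Kernel scores similarity between two feature vectors (H_semantic).
type Kernel interface {
	ComputeSimilarity(a, b Features) (float32, error)
}

type Message struct {
	Root, Extended, Added []byte
	Features              Features
}

type Node struct {
	Verifier  Verifier
	Kernel    Kernel
	Interests Features
	Threshold float32
}

// ShouldForward implements "forward if valid_extension ∧ similar".
func (n *Node) ShouldForward(m Message) bool {
	// Integrity gate: drop anything that is not a provable extension.
	if !n.Verifier.VerifyExtension(m.Root, m.Extended, m.Added) {
		return false
	}
	// Relevance gate: drop verified-but-irrelevant content (spam control).
	sim, err := n.Kernel.ComputeSimilarity(n.Interests, m.Features)
	if err != nil {
		return false
	}
	return sim >= n.Threshold
}
```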
Phase 1: Primitives (Complete ✓)
- Homomorphic hash implementation (`crypto/`)
- Universal feature extraction (`semantic/features_universal.go`)
- Native neural kernels (`semantic/neural.go`)
- IPLD kernel descriptors (`semantic/kernel_model.go`)
- Gatekeeping logic (`core/`)
- Working demo (`examples/demo.go`)
Phase 2: Network (Complete ✓)
- libp2p integration (`network/`)
- Gossip protocol with pubsub
- Basic node implementation
- CLI tool (`tera-node`)
Phase 3: Storage & Kernels (In Progress)
- Extension graph storage (`storage/`)
- BadgerDB persistence layer
- Kernel registry with caching
- Multi-modal feature extraction
- Content discovery API
- Interest subscription mechanism
- IPFS integration for content
- DHT for peer discovery
- Kernel training utilities
```bash
# Run tests
go test ./...

# Run demo (local simulation)
go run examples/demo.go

# Build and run network node
go build -o tera-node ./cmd/tera-node

# Start bootstrap node
./tera-node -port 9000 -interests "machine learning"

# Start second node (in another terminal)
# Replace <PEER-ID> with the peer ID from the bootstrap node
./tera-node -port 9001 \
  -bootstrap "/ip4/127.0.0.1/tcp/9000/p2p/<PEER-ID>" \
  -interests "machine learning"
```

In the node shell, publish content:

```
> publish neural networks and deep learning algorithms
> stats
> peers
```

It's a pun. IPFS is the "InterPlanetary" File System, so naturally this is the "Terrestrial" version. The priorities are deliberately inverted: we're focused on Earth-based problems like spam, discovery, and semantic search rather than interplanetary data transfer.
- IPFS: Content-addressed storage (exact matching only)
- DHT2VEC (2020): Early exploration of semantic DHTs
- Kademlia: XOR-metric DHT routing (no semantic awareness)
TERA combines ideas from content-addressed networks, homomorphic cryptography, and kernel methods to create something new: semantic content addressing with integrity.
BSD 3-Clause License (see LICENSE file)
This project is in early development. Contributions welcome once the core primitives are stable.