This document provides a comprehensive mathematical treatment of Sparse Distributed Memory (SDM), including theoretical foundations, proofs, and analysis of key properties.
- Introduction
- Mathematical Framework
- Geometric Interpretation
- Storage and Recall Operations
- Capacity Analysis
- Noise Tolerance
- Convergence Properties
- Information Theory Perspective
- Statistical Properties
- Theoretical Bounds
- Comparison with Other Models
- Advanced Topics
Sparse Distributed Memory operates on the principle that high-dimensional binary spaces have unique geometric properties that can be exploited for robust information storage and retrieval. The key insight is that random points in high-dimensional spaces are almost always far apart, allowing for distributed storage with minimal interference.
Let's establish the fundamental mathematical objects:
- Address Space: 𝔹ⁿ = {0,1}ⁿ, the n-dimensional binary hypercube
- Data Space: 𝔹ᵐ = {0,1}ᵐ, typically m = n
- Hard Locations: H = {h₁, h₂, ..., h_M} ⊂ 𝔹ⁿ, |H| = M
- Hamming Distance: d(x,y) = Σᵢ|xᵢ - yᵢ| for x,y ∈ 𝔹ⁿ
For random vectors x,y ∈ 𝔹ⁿ, the Hamming distance follows a binomial distribution:
P(d(x,y) = k) = C(n,k) × 2⁻ⁿ
With mean μ = n/2 and variance σ² = n/4.
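These concentration figures are easy to check empirically; a quick simulation (the dimension n = 256 and the sample size are illustrative choices, not from the text):

```python
import random

def hamming(x, y):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y))

random.seed(0)
n, trials = 256, 2000
dists = [hamming([random.getrandbits(1) for _ in range(n)],
                 [random.getrandbits(1) for _ in range(n)])
         for _ in range(trials)]
mean = sum(dists) / trials
var = sum((d - mean) ** 2 for d in dists) / trials
# Theory predicts mean = n/2 = 128 and variance = n/4 = 64
```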
The critical distance r* is defined as the activation radius that maximizes storage capacity while maintaining acceptable noise tolerance:
r* ≈ n/2 - α√(n/4)
Where α is set by the desired activation probability via the normal approximation; α ≈ 3.09 gives P(activation) ≈ 0.001, and for n = 1000:
r* = 451 ≈ 0.451n (Kanerva's canonical example)
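The radius can also be obtained exactly, without the normal approximation, by accumulating the binomial tail until the target activation probability is reached (`critical_radius` is our illustrative helper; the 0.001 target reproduces the 0.451n figure for n = 1000):

```python
from math import comb

def critical_radius(n, target_p):
    """Smallest r such that a Hamming ball of radius r covers a fraction
    >= target_p of {0,1}^n -- an exact version of r* ~ n/2 - alpha*sqrt(n/4)."""
    total, ball = 2 ** n, 0
    for r in range(n + 1):
        ball += comb(n, r)
        if ball / total >= target_p:
            return r

r_star = critical_radius(1000, 0.001)   # -> 451, i.e. r* ~ 0.451n
```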
The number of points within Hamming distance r from a given point:
V(n,r) = Σₖ₌₀ʳ C(n,k)
For a random hard location h and random address x:
P(d(x,h) ≤ r) = V(n,r) / 2ⁿ
The fraction of space covered by hyperspheres of radius r around M randomly placed centers:
Coverage ≈ 1 - exp(-M × P(activation))
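Combining the ball volume with the coverage estimate (the helper name and the choices of M are ours; r = 451 is the canonical radius for n = 1000):

```python
from math import comb, exp

def activation_prob(n, r):
    """P(activation) = V(n, r) / 2^n: the fraction of the space within
    Hamming distance r of a random hard location."""
    return sum(comb(n, k) for k in range(r + 1)) / 2 ** n

p = activation_prob(1000, 451)                       # ~0.001
coverage = {M: 1 - exp(-M * p) for M in (1_000, 10_000, 100_000)}
# With M = 1,000 locations an address activates at least one hard location
# only ~66% of the time; by M = 10,000 coverage is essentially complete.
```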
For each hard location hᵢ within distance r of address x, storing data d:
C[i,j] ← C[i,j] + (2d[j] - 1)
Where C[i,j] is the counter for bit j at location i.
The recalled bit j is determined by:
d̂[j] = sgn(Σᵢ∈A C[i,j])
Where A = {i : d(x,hᵢ) ≤ r} is the set of activated locations.
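The two rules above fit in a short autoassociative sketch; the class interface and the parameters n = 64, M = 600, r = 26 are illustrative choices, sized so each pattern activates roughly fifty locations:

```python
import random

class SDM:
    """Minimal Sparse Distributed Memory: counters plus sign-rule recall."""

    def __init__(self, n, M, r, seed=0):
        rng = random.Random(seed)
        self.n, self.r = n, r
        self.hard = [[rng.getrandbits(1) for _ in range(n)] for _ in range(M)]
        self.C = [[0] * n for _ in range(M)]     # one counter per (location, bit)

    def _activated(self, x):
        """Indices of hard locations within Hamming distance r of x."""
        return [i for i, h in enumerate(self.hard)
                if sum(a != b for a, b in zip(x, h)) <= self.r]

    def store(self, addr, data):
        for i in self._activated(addr):
            for j, bit in enumerate(data):
                self.C[i][j] += 2 * bit - 1      # +1 for a 1 bit, -1 for a 0 bit

    def recall(self, addr):
        A = self._activated(addr)
        return [1 if sum(self.C[i][j] for i in A) > 0 else 0
                for j in range(self.n)]          # sign rule; ties fall to 0

rng = random.Random(1)
mem = SDM(n=64, M=600, r=26)
patterns = [[rng.getrandbits(1) for _ in range(64)] for _ in range(5)]
for p in patterns:
    mem.store(p, p)                              # autoassociative storage
recalled = mem.recall(patterns[0])
```

With only five stored patterns the load is far below capacity, so the sign rule recovers the stored pattern exactly.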
After storing S patterns, the expected signal at recall of a target pattern is the sum of its own ±1 contributions over the activated locations:
Signal = |A|
The noise variance comes from the other S - 1 patterns, each of which writes a ±1 into a given activated counter with probability P(activation):
Noise² ≈ 4(S - 1) × |A| × P(activation) × p(1-p)
Where p is the probability of a 1 bit in the data (for unbiased data, p = 1/2 and 4p(1-p) = 1).
The number of patterns that can be stored with acceptable error rate ε:
C_max = (M × ln(1/ε)) / (2 × r × ln(n))
Accounting for overlapping activation patterns:
C_practical ≈ 0.15 × M for r = r*
The average number of patterns stored per hard location:
λ = (C × |A|) / M
Where |A| is the expected number of activated locations.
Given address x corrupted by noise level ρ (fraction of flipped bits):
P(correct recall) = Φ((μ_signal - μ_noise) / σ_total)
Where:
- μ_signal = |A ∩ A'|, the locations activated by both the clean and the noisy address
- μ_noise = |A ∆ A'| / 2, the scale of the contribution from locations activated by only one of the two
- A' = activated set for the noisy address
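The shrinking overlap |A ∩ A'| can be measured directly by simulation; the parameters n = 256, M = 5000, r = 110 and the 10% flip rate are illustrative:

```python
import random

rng = random.Random(0)
n, M, r = 256, 5000, 110
locations = [rng.getrandbits(n) for _ in range(M)]

def activated(x):
    """Indices of hard locations within Hamming distance r of address x."""
    return {i for i, h in enumerate(locations) if bin(x ^ h).count("1") <= r}

x = rng.getrandbits(n)
noisy = x
for b in rng.sample(range(n), k=n // 10):     # flip ~10% of the address bits
    noisy ^= 1 << b

A, A_noisy = activated(x), activated(noisy)
shared = len(A & A_noisy)                     # |A ∩ A'|: these carry the signal
```

At 10% address noise a substantial fraction of the activated set still survives, which is why recall degrades gracefully rather than failing outright.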
The maximum noise level for reliable recall coincides with the critical distance of the memory, beyond which iterative recall diverges:
ρ_max ≈ ρ_critical
This gives approximately 10-15% noise tolerance for optimal parameters.
The probability of recovering the stored pattern when up to k address bits are corrupted, decomposed over the number of actual errors i:
P(correction) = Σᵢ₌₀ᵏ C(n,i) ρⁱ(1-ρ)ⁿ⁻ⁱ × P(recall succeeds | i errors)
Using recalled data as a new address:
x_{t+1} = recall(x_t)
Converges to a fixed point when:
||x_{t+1} - x_t|| < θ
The expected number of iterations to convergence:
E[T] ≈ log(n) / log(1/ρ)
Where ρ is the initial error rate.
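The iteration rule and the expected-iterations estimate can be sketched as follows; `iterate_recall` is a generic fixed-point loop, and `toy_recall` is a stand-in that corrects one wrong bit per step, not a real SDM recall:

```python
from math import log

def expected_iterations(n, rho):
    """Heuristic from the text: E[T] ~ log(n) / log(1/rho)."""
    return log(n) / log(1 / rho)

def iterate_recall(recall_fn, x, max_iters=20):
    """Feed recalled output back in as the next address until it stops changing."""
    for t in range(1, max_iters + 1):
        nxt = recall_fn(x)
        if nxt == x:                          # fixed point reached
            return x, t
        x = nxt
    return x, max_iters

# Toy stand-in for SDM recall: corrects the first mismatched bit per step.
target = [1, 0, 1, 1, 0, 0, 1, 0]
def toy_recall(v):
    for i, (a, b) in enumerate(zip(v, target)):
        if a != b:
            return v[:i] + [b] + v[i + 1:]
    return v

fixed, steps = iterate_recall(toy_recall, [0, 0, 1, 1, 0, 0, 1, 1])
est = expected_iterations(1000, 0.10)         # ~3 iterations at 10% initial error
```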
The number of stable fixed points:
N_attractors ≈ C × (1 - e^(-λ))
SDM as a noisy channel:
Capacity = max I(X;Y) = n × (1 - H(p_error))
Where H is the binary entropy function.
Between stored and recalled patterns:
I(stored; recalled) = H(recalled) - H(recalled|stored)
The redundancy factor for distributed storage:
R = |A| ≈ M × P(activation)
The overlap between activation patterns for different addresses:
E[|A₁ ∩ A₂|] = M × P(activation)²
Var[|A₁ ∩ A₂|] = M × P(activation)² × (1 - P(activation)²)
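The mean overlap E[|A₁ ∩ A₂|] = M × P(activation)² can be checked against a Monte Carlo estimate (all parameters here are illustrative):

```python
from math import comb
import random

n, r, M = 64, 26, 4000
p = sum(comb(n, k) for k in range(r + 1)) / 2 ** n      # exact P(activation)
expected = M * p * p                                    # E[|A1 ∩ A2|] = M·p²

rng = random.Random(0)
locations = [rng.getrandbits(n) for _ in range(M)]

def activated(x):
    """Indices of hard locations within Hamming distance r of address x."""
    return {i for i, h in enumerate(locations) if bin(x ^ h).count("1") <= r}

trials = 200
total = sum(len(activated(rng.getrandbits(n)) & activated(rng.getrandbits(n)))
            for _ in range(trials))
mean_overlap = total / trials                            # should track `expected`
```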
After storing S random patterns, each counter is a sum of roughly S × P(activation) independent ±1 increments, so:
C[i,j] ≈ N(0, S × P(activation))
For large S, the counters approach a normal distribution.
The interference between stored patterns:
Crosstalk = (Σᵢ≠ⱼ |⟨pᵢ, pⱼ⟩|) / (C × (C-1))
For storing C patterns with error rate ε:
M ≥ (C × log(m/ε)) / P(activation)
Information-theoretic limit:
C ≤ (M × n × log(2)) / (m × H(p_error))
The fundamental trade-off between capacity, noise tolerance, and fidelity:
C × ρ_max × (1-ε) ≤ K
Where K is a constant depending on system parameters.
| Property | SDM | Hopfield |
|---|---|---|
| Capacity | O(M) | O(n/log n) |
| Recall Time | O(M) | O(n²) iterative |
| Noise Tolerance | ~15% | ~10% |
| Storage Time | O(M) | O(n²) |
SDM can be viewed as a generalization of Bloom filters:
- Bloom filters: Binary membership testing
- SDM: Associative recall of vector data
Locality-sensitive hashing (LSH) shares the same underlying principle of distance-preserving randomization, but:
- LSH: Optimized for similarity search
- SDM: Optimized for associative memory
Using activation radius that scales with dimension:
r(n) = n/2 - β√n
Optimal β depends on desired sparsity.
Multiple levels with different dimensionalities:
Level_k: n_k = n₀ × α^k
Allows multi-resolution storage and recall.
Exploiting quantum superposition:
|ψ⟩ = Σᵢ αᵢ|hᵢ⟩
Theoretical capacity: O(2ⁿ) with quantum parallelism.
Extending to continuous vectors:
d(x,y) = ||x - y||₂ for x,y ∈ ℝⁿ
Activation function: a(d) = exp(-d²/2σ²)
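A minimal sketch of this continuous variant, assuming Gaussian hard locations and the soft activation above; the kernel-regression-style normalization in `recall` is our choice, not prescribed by the text:

```python
import math
import random

def activation(d, sigma):
    """Soft activation a(d) = exp(-d^2 / (2 sigma^2)) replacing the hard radius."""
    return math.exp(-d * d / (2 * sigma * sigma))

rng = random.Random(0)
n, M, sigma = 16, 300, 2.0
locations = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(M)]
counters = [[0.0] * n for _ in range(M)]

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def store(addr, data):
    for h, row in zip(locations, counters):
        w = activation(dist(addr, h), sigma)
        for j in range(n):
            row[j] += w * data[j]             # weighted write to every location

def recall(addr):
    num, den = [0.0] * n, 0.0
    for h, row in zip(locations, counters):
        w = activation(dist(addr, h), sigma)
        den += w
        for j in range(n):
            num[j] += w * row[j]
    return [v / den for v in num]             # weighted average of counters

data = [rng.gauss(0, 1) for _ in range(n)]
addr = [rng.gauss(0, 1) for _ in range(n)]
store(addr, data)
out = recall(addr)                            # same direction as `data`
```

With a single stored pattern the recalled vector is an exact positive scalar multiple of the stored one; interference between patterns only appears as more are stored.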
Theorem: The optimal activation radius r* maximizes expected signal-to-noise ratio.
Proof (sketch):
- Signal strength: Signal ∝ |A| = M × P(activation)
- Noise variance: Noise² ∝ M × P(activation)
- SNR = Signal / Noise ∝ √(M × P(activation))
- Maximize subject to the interference constraint (activation must stay sparse)
- Result: r* ≈ 0.451n for the canonical parameters □
Theorem: SDM capacity scales linearly with number of hard locations.
Proof:
- Each pattern activates |A| ≈ M·P(activation) locations
- Each location can distinguish ~√S patterns
- Total capacity: C ≈ M/|A| × √S
- Solving for equilibrium: C ≈ 0.15M □
Theorem: Iterative recall converges to a fixed point for ρ < ρ_critical.
Proof (sketch):
- Define energy function E(x) = -⟨x, recall(x)⟩
- Show E decreases with each iteration
- Bounded below → convergence
- Basin analysis gives ρ_critical ≈ 0.15 □
Based on theoretical analysis:
- Dimension: n ≥ 1000 for good properties
- Hard Locations: M set by the resource budget, typically 10⁴-10⁶ (vastly fewer than the 2ⁿ possible addresses)
- Activation Radius: r = ⌊0.451n⌋
- Counters: 8-bit sufficient for most applications
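These recommendations can be bundled into a small helper; the function and dictionary keys are ours, not a standard API:

```python
def sdm_parameters(n, M):
    """Bundle the design recommendations above; names and defaults illustrative."""
    return {
        "activation_radius": (451 * n) // 1000,   # integer-exact floor(0.451·n)
        "counter_bits": 8,                        # 8-bit counters suffice in practice
        "expected_capacity": (15 * M) // 100,     # C_practical ~ 0.15·M
    }

params = sdm_parameters(n=1000, M=10_000)         # the worked example from the text
```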
For a system with n=1000, M=10,000:
- Capacity: ~1,500 patterns
- Noise tolerance: ~15%
- Recall accuracy: >95% within capacity
- Storage time: O(10,000) operations
- Recall time: O(10,000) operations
As dimension increases:
- Capacity: ~M (constant factor)
- Noise tolerance: ~15% (constant)
- Precision: the probability of confusing distinct patterns falls exponentially in n
- Computational cost: linear in M
The mathematical theory of SDM reveals a robust and efficient memory system based on fundamental properties of high-dimensional spaces. The key insights are:
- Sparsity in high dimensions allows distributed storage
- Critical distance optimizes capacity and noise tolerance
- Linear capacity scaling with resources
- Graceful degradation from theoretical properties
These properties make SDM suitable for applications requiring robust, scalable, and biologically-plausible memory systems.
- Kanerva, P. (1988). Sparse Distributed Memory. MIT Press.
- Kanerva, P. (1993). "Sparse Distributed Memory and Related Models." Associative Neural Memories, 50-76.
- Jaeckel, L. A. (1989). "An Alternative Design for a Sparse Distributed Memory." RIACS Technical Report 89.28.
- Rogers, D. (1989). "Statistical Prediction with Kanerva's Sparse Distributed Memory." NIPS.
- Anwar, A., & Franklin, S. (2003). "Sparse Distributed Memory for 'Conscious' Software Agents." Cognitive Systems Research, 4(4), 339-354.
- Snaider, J., & Franklin, S. (2014). "Modular Composite Representation." Cognitive Computation, 6(3), 510-527.
- Kelly, M. A., et al. (2013). "Holographic Declarative Memory." Cognitive Science, 37(4), 659-697.