◈ Open Research · Python Library
Six competing mathematical models. One synthetic social simulator. Bayesian parameter inference. Real AI-agent simulation data. Intervention testing and resilience analysis. All compared head-to-head - and all limitations documented openly.
The Project
Human groups form around shared interests, ideologies, and communication patterns - but which mechanism dominates? CFT pits six formal theories against each other on identical data, scores their predictions with information-theoretic metrics, and exposes every methodological assumption openly.
Generate random, clustered, polarized, and hierarchical populations - all from numpy, no external dependencies.
Load data from MiroFish-Offline (OASIS format): LLM-backed agents with MBTI personalities on a synthetic social platform.
PAS, NMI, DFI, CTAI - quantitative metrics score each theory's ability to predict held-out community structure.
MCMC over interaction weights and theory parameters. Marginal likelihood ranks theories by evidence, not just fit.
The Theories
Each theory offers a different answer to "why do people cluster?" All six are implemented with the same interface, scored against the same ground truth, and given the same affinity data.
CFT · Groups form when all pairwise affinities exceed a threshold θ - binary membership, no overlap.
Parameters: threshold ∈ [0.1, 0.99] · Math: graph theory + combinatorics
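The threshold rule can be sketched in a few lines. This is a toy illustration of the idea, not the library's CFT implementation; `threshold_groups` is a hypothetical helper:

```python
# Toy threshold-based grouping: an agent joins a group only if its
# affinity to EVERY current member clears theta (binary, no overlap).
import numpy as np

def threshold_groups(affinity: np.ndarray, theta: float = 0.6):
    groups = []
    for i in range(affinity.shape[0]):
        placed = False
        for g in groups:
            # all pairwise affinities must exceed the threshold
            if all(affinity[i, j] > theta for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])  # start a new singleton group
    return groups

# toy affinity matrix: two tight blocks
A = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
print(threshold_groups(A, theta=0.6))  # → [[0, 1], [2, 3]]
```

Raising θ fragments the population: at θ = 0.95 every agent above ends up in its own singleton group.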
GFT · Individuals move in behavioral space following affinity gradients - continuous positions, smooth drift.
Parameters: k, sigma · Math: ODEs, potential energy
QST · Individuals exist in a superposition of behavioral states until "measured" through interaction.
Parameters: n_states · Math: mean-field quantum (O(n²))
ICT · Group structure is determined by information flow and communication bandwidth limits.
Parameters: bandwidth · Math: information theory
TST · Agents are spins in a Potts model - group structure is the ground state of a social energy function.
Parameters: temperature · Math: statistical mechanics
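The Metropolis idea behind this can be sketched as a toy Potts sweep. This is an illustration under assumed conventions (q labels, full-energy recomputation), not TST's actual code:

```python
import numpy as np

def energy(labels, affinity):
    # lower energy when high-affinity pairs share a group label
    same = labels[:, None] == labels[None, :]
    return -np.sum(affinity * same) / 2

def metropolis_sweep(labels, affinity, q=3, T=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    labels = labels.copy()
    E = energy(labels, affinity)
    for i in range(len(labels)):
        proposal = labels.copy()
        proposal[i] = rng.integers(q)  # propose a new label for agent i
        E_new = energy(proposal, affinity)
        # Metropolis rule: accept downhill always, uphill with exp(-dE/T)
        if E_new <= E or rng.random() < np.exp(-(E_new - E) / T):
            labels, E = proposal, E_new
    return labels
```

At low temperature repeated sweeps settle into the ground state, which is exactly the partition that puts high-affinity pairs in the same group; at high temperature labels stay disordered - the phase transition tested in the results table.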
DCT · Agents exist in two coupled spaces - proximity (who you're near) and alignment (what you believe). Groups form only where both layers agree.
Parameters: mu (seeking), lam (conformity), trait_map · Math: coupled ODEs + spectral embedding
| Theory | Group boundary | Time evolution | Observer effects | Long-range | Best scale |
|---|---|---|---|---|---|
| CFT | Binary, sharp | Rapid → equilibrium | None | None | n < 50 |
| GFT | Continuous, fuzzy | Smooth trajectories | None | Decays with distance | Any |
| QST | Probabilistic | Unitary until measured | Fundamental | Entanglement | n < 20 (mean-field) |
| ICT | Communication-limited | Punctuated cascades | Indirect | Network-dependent | Medium |
| TST | Statistical, energy-based | Stochastic (MH sweeps) | None | Mean-field | n > 100 |
| DCT | Dual-layer (proximity + alignment) | Two coupled timescales | None | Via alignment layer | Any |
Experimental Results
These are real results from real runs - not cherry-picked. The methodology has known flaws (see Limitations), which we're actively fixing.
6 AI agents (qwen2.5:14b) · 144 rounds · MBTI-typed · Reddit-style platform
Ground truth: two clusters - {CFT, TST, GFT} and {ICT, QST, Thermodynamic}.
⚠ These scores use circular ground truth (the same data produced both the affinity matrix and the GT partition) - see limitation #23.
4 scenarios · n=40 agents · seed=42 · default parameters
⚠ Single run per scenario - no statistical significance testing on these yet. Use n_runs=10 for valid comparisons.
| Claim | Status | Evidence |
|---|---|---|
| CFT produces fewer groups than GFT on polarized data | ✓ Supported | CFT converges to 2 groups; GFT produces 4-6 metastable clusters |
| TST exhibits a phase transition (group count varies with temperature) | ✓ Supported | Variance in group count > 0.5 across T ∈ [0.1, 3.0] |
| All theories agree on strongly clustered data (CTAI > 0.6) | ⚠ Conditional | True at β ≥ 6; fails at β < 2 (weak clustering) |
| CFT outperforms all others on the MiroFish AI-agent run | ✓ Supported | NMI 0.716 vs 0.231 (GFT) - but single seed, circular GT |
| Theories can predict future group structure from past interactions | ○ Preliminary | temporal_prediction() implemented; needs multi-run validation |
Intervention System
Apply perturbations mid-simulation and measure how groups respond. Who breaks? Who recovers? Which traits predict resilience?
Remove agents, shift features, add infiltrators, inject noise shocks, modify affinity between pairs - all at precise simulation times.
Model ongoing propaganda (SustainedShift), prolonged instability (SustainedNoise), or platform algorithm bias (SustainedAffinityBias) over time ranges.
InterventionReport tracks stability curves, fracture/merge events, group survival rates, per-agent vulnerability rankings, and recovery metrics.
Derive seeking and conformity rates from personality (MBTI), influence scores, or custom features. Different people respond differently to the same pressure.
from cft import DCT, TheoryParameters, Agent, InterventionRunner
from cft import RemoveAgents, SustainedShift, SustainedNoise
import numpy as np

# Set up theory with per-agent traits
# (`params`, `agents`, `leader_id`, `target_ids` are assumed to be
# defined earlier in the session)
theory = DCT(params, trait_map="influence", noise=0.05)
theory.initialize_agents(agents)

# Remove the leader at t=5, apply sustained propaganda t=3-8
runner = InterventionRunner(
    theory,
    interventions=[RemoveAgents(time=5.0, agent_ids=[leader_id])],
    sustained=[
        SustainedShift(start=3.0, end=8.0, agent_ids=target_ids,
                       delta_per_step=np.array([0.0, 0.0, 0.1, 0.0])),
    ],
)
report = runner.run(t_max=15.0, dt=0.5)
print(report.resilience_scores)
print(report.vulnerability_ranking()[:5])  # most vulnerable agents
Methodology
Each methodological choice is documented, justified, and - where flawed - flagged as a known limitation with a proposed fix.
Scoring
Combines group count accuracy, partition similarity (NMI), and size distribution accuracy into a single score ∈ [0, 1].
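A hedged sketch of such a composite score - the weights and the sub-score definitions here are assumptions for illustration, not the library's exact formula:

```python
def composite_score(nmi, n_pred, n_true, size_overlap,
                    w=(0.25, 0.25, 0.5)):
    """Blend three sub-scores, each already in [0, 1], into one score.

    nmi          - partition similarity vs. ground truth
    size_overlap - how well predicted group sizes match true sizes
    """
    # group-count accuracy: 1.0 on exact match, shrinking with the gap
    count_acc = min(n_pred, n_true) / max(n_pred, n_true)
    # convex combination of the three components stays in [0, 1]
    return w[0] * count_acc + w[1] * size_overlap + w[2] * nmi
```

Because the weights sum to 1 and each component is bounded in [0, 1], the composite is too, matching the score ∈ [0, 1] described above.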
Agreement
Mean pairwise NMI across all theory pairs. High CTAI means theories converge - low means the data is genuinely ambiguous.
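The agreement computation can be sketched with a from-scratch NMI in plain Python (the library's metric implementation may differ):

```python
import math
from collections import Counter
from itertools import combinations

def nmi(a, b):
    # normalized mutual information between two labelings of n agents
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    H = lambda c: -sum((v / n) * math.log(v / n) for v in c.values())
    # mutual information: sum p(x,y) log(p(x,y) / (p(x) p(y)))
    I = sum((v / n) * math.log(n * v / (ca[x] * cb[y]))
            for (x, y), v in cab.items())
    denom = math.sqrt(H(ca) * H(cb))
    return I / denom if denom else 0.0

def ctai(partitions):
    # cross-theory agreement: mean NMI over all theory pairs
    pairs = list(combinations(partitions, 2))
    return sum(nmi(p, q) for p, q in pairs) / len(pairs)

# three theories whose partitions agree up to relabeling
print(ctai([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]))  # → ≈ 1.0
```

NMI is invariant to relabeling, so theories that find the same split under different group IDs still score 1; genuinely conflicting partitions pull the mean down.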
Temporal
Early interactions build the affinity matrix; late interactions define ground-truth communities. Breaks the circular evaluation loop where both are derived from the same data.
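The split itself is simple. A sketch assuming interactions are `(timestamp, source, target, kind)` tuples - the adapter's real schema may differ:

```python
def temporal_split(interactions, train_frac=0.5):
    # sort by timestamp, then cut: early events feed the affinity
    # matrix, late events define the ground-truth communities
    ordered = sorted(interactions, key=lambda e: e[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

events = [(3, "a", "b", "like"), (1, "b", "c", "follow"),
          (2, "a", "c", "repost"), (4, "c", "a", "like")]
train, test = temporal_split(events)
print(train)  # → [(1, 'b', 'c', 'follow'), (2, 'a', 'c', 'repost')]
```

Since the affinity matrix never sees the evaluation-window events, a theory must genuinely predict future structure rather than reconstruct what it was fitted on.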
Inference: interaction weights
Metropolis-Hastings over interaction weights (follow, like, repost…). Posterior mean replaces hardcoded defaults. Fold-over reflection keeps samples in bounds.
Fixes #21 - Arbitrary weights
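The fold-over reflection can be sketched as follows; the step scale and the [0, 1] bounds are illustrative assumptions:

```python
# Fold-over reflection keeps a Metropolis proposal inside [lo, hi] by
# reflecting any overshoot back into the interval (a sketch, not the
# library's proposal code).
import numpy as np

def reflect(x, lo=0.0, hi=1.0):
    width = hi - lo
    # map onto a 2*width sawtooth, then fold the descending half back
    y = (x - lo) % (2 * width)
    return lo + (y if y <= width else 2 * width - y)

def propose(x, scale=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return reflect(x + rng.normal(0.0, scale))

print(reflect(1.2))  # → 0.8  (overshoot above 1 reflects back down)
```

Unlike clipping, reflection preserves detailed balance for a symmetric proposal, so no acceptance-ratio correction is needed at the boundary.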
Inference: theory parameters
Same MCMC machinery infers theory-specific parameters (CFT threshold, GFT k/sigma, TST temperature). Replaces manual tuning with a posterior distribution.
Complexity
Log marginal likelihood = logsumexp(log_liks) − log(n) over the MCMC chain gives an implicit Occam's razor. Theories with more free parameters must fit better to win.
Fixes #24 - Ignores complexity
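The evidence estimator above, written out with a numerically stable logsumexp (a sketch of the formula, not the library's code):

```python
import math

def log_evidence(log_liks):
    # log(mean likelihood) = logsumexp(log_liks) - log(n),
    # computed stably by factoring out the max log-likelihood
    m = max(log_liks)
    return (m + math.log(sum(math.exp(l - m) for l in log_liks))
            - math.log(len(log_liks)))

print(log_evidence([math.log(2.0)] * 10))  # → log(2) ≈ 0.693
```

Averaging in likelihood space (rather than log space) is what penalises complexity: a flexible theory whose chain wanders through many poorly fitting parameter settings drags its mean likelihood down, even if its best fit is excellent.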
Statistics
Run the full pipeline n_runs times with offset seeds. Report mean ± std NMI. For n ≥ 5, Wilcoxon signed-rank test determines whether the top-ranked theory is significantly better.
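The seed-offset protocol looks roughly like this; `run_pipeline` is a hypothetical stand-in for the real comparison call, and the Wilcoxon step is omitted:

```python
import statistics

def multi_run(run_pipeline, base_seed=42, n_runs=10):
    # same pipeline, offset seeds; report mean ± std of the NMI score
    scores = [run_pipeline(seed=base_seed + i) for i in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)

# deterministic fake pipeline, purely for illustration
mean, std = multi_run(lambda seed: 0.7 + 0.01 * (seed % 3),
                      base_seed=0, n_runs=3)
print(mean, std)
```

If the std overlaps the gap between two theories' means, a single-run ranking between them is noise - which is exactly what the Wilcoxon test then checks formally.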
Data
MiroFishAdapter.from_oasis_dir() converts MiroFish-Offline / camel-oasis CSV + JSONL logs directly into the normalized format CFT expects. No hand-written conversion scripts.
Known Limitations
Publishing limitations is part of the point. Every item below has a GitHub issue, a proposed fix, and a status. Transparency over appearance.
#23 Circular ground truth · Affinity matrix and Louvain ground truth were both derived from the same interaction data - measuring reconstruction, not prediction. Fixed via temporal train/eval split in compare_theories(use_temporal_split=True).
#21 Arbitrary interaction weights · follow=0.3, like=0.2, repost=0.4 were chosen by intuition. Now inferred via MCMC: MCMCInference(adapter, CFT).infer_weights().
#22 Hand-tuned theory parameters · CFT threshold=0.6, GFT k=0.1, TST temperature=1.0 were set manually. Now inferred: mcmc.infer_theory_params(DEFAULT_THEORY_PARAM_SPECS["CFT"]).
#24 Ignores complexity · A theory with more free parameters could win by overfitting. Fixed by marginal likelihood comparison: compare_theories_by_evidence() applies an implicit Occam's razor.
#27 Run-to-run variance · Stochastic theories (TST, QST) and stochastic Louvain meant rankings could vary. Fixed: compare_theories(n_runs=10) reports mean ± std NMI with a Wilcoxon significance test.
#26 OASIS format mismatch · Real MiroFish produces OASIS-format CSV + JSONL that didn't match the adapter's expected format. Fixed: MiroFishAdapter.from_oasis_dir() handles conversion automatically.
#25 Binary MBTI features · MBTI types are encoded as four {−1, +1} dimensions; all INTJs are identical, so within-type variation is lost. Proposed fix: support continuous Big-Five (OCEAN) scores or learn a continuous embedding from interaction data.
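The current encoding can be sketched as follows; the axis order and letter mapping here are assumptions based on the standard MBTI convention, not necessarily the library's exact layout:

```python
def mbti_vector(mbti: str):
    # four binary axes: E/I, S/N, T/F, J/P → one {-1, +1} value each
    axes = ["EI", "SN", "TF", "JP"]
    return [-1 if c == axis[0] else 1
            for c, axis in zip(mbti.upper(), axes)]

# every INTJ collapses to the same point - the limitation above
print(mbti_vector("INTJ"))  # → [1, 1, -1, -1]
```

A continuous OCEAN vector (five floats in [0, 1]) would give each agent a distinct position, which is why #25 proposes it as the replacement.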
Dev Log
What's been shipped, what's in progress, what's next.
TraitMap derives per-agent seeking/conformity rates from personality features or metadata (presets: "mbti", "influence").
Three sustained intervention types model ongoing pressure over time ranges.
DCT now accepts separate proximity and alignment data sources via spectral embedding.
371 passing tests.
Full intervention framework: 7 point-in-time perturbations (RemoveAgents, ShiftFeatures, AddAgent, NoiseShock, ModifyAffinity, ShiftProximity, ShiftAlignment) plus InterventionRunner with resilience analysis. DCT adds a 6th theory with two coupled layers - proximity and alignment - with per-agent behavioral parameters.
Temporal split in compare_theories() breaks circular ground truth (#23).
MCMC inference module (cft/inference.py) infers weights and theory parameters from data (#21, #22) and estimates marginal likelihoods for complexity-penalised comparison (#24).
n_runs + Wilcoxon signed-rank test for statistical validation (#27).
MiroFishAdapter.from_oasis_dir() for OASIS format direct import (#26).
Ran a 144-round simulation with 6 LLM-backed agents (qwen2.5:14b, nomic-embed-text) on a synthetic Reddit-style platform. Ground truth: two clusters emerged. CFT won with NMI=0.716 but methodology was compromised by circular ground truth - opened issues #21-#27 to document and fix.
Built-in synthetic social simulator (no external deps) with four scenarios. HypothesisTester wraps the full pipeline: compare, sweep, temporal prediction, named claims. 276 passing tests.
CFT, GFT, QST (mean-field), ICT, TST (Potts/Metropolis-Hastings) - all sharing the same BehaviorTheory base class, the same affinity interface, and the same history format. MiroFish adapter + PredictionTournament with PAS/DFI/PSS/CTAI.
Created the cft/ package from scratch: pyproject.toml, base classes, theory interface, affinity computation, comparator, visualization, notebooks. pip-installable with optional extras.
Issue #25 - replace binary MBTI features with continuous Big-Five (OCEAN) scores.
Intervention scenario library - pre-built scenarios for common social dynamics questions.
DCT + TraitMap integration with MiroFish data - derive seeking/conformity from real agent interactions.
More MiroFish simulation runs with the temporal split enabled to get clean NMI scores.
Code
Pure Python, no external LLM or database required. Optional pandas + networkx for real-data workflows.
git clone https://github.com/rivirside/cft.git
cd cft
pip install -e ".[dev]" # or .[mirofish] or .[all]
pytest # 371 tests
from cft import SocialSimulator, HypothesisTester
sim = SocialSimulator(
    n_agents=40, scenario="clustered",
    k=3, T=30, seed=42,
)
ht = HypothesisTester(simulator=sim)
# Single run
result = ht.compare_theories(use_temporal_split=True)
print(result["rankings"])
# Statistical validation (n ≥ 5 for Wilcoxon)
result = ht.compare_theories(n_runs=10)
print(result["mean_similarity"])
print(result["wilcoxon_pvalue"])
from cft import MCMCInference, CFT
from cft.integrations.mirofish import MiroFishAdapter
adapter = MiroFishAdapter("sim_dir")
adapter.load_agents()
adapter.load_interactions()
mcmc = MCMCInference(adapter, CFT, seed=42)
# Infer interaction weights from data
weights = mcmc.infer_weights(n_samples=2000)
print(weights.map_estimate)
print(weights.posterior_std)
# Complexity-penalised theory comparison
from cft import compare_theories_by_evidence, GFT
log_ml = compare_theories_by_evidence(
    adapter, {"CFT": CFT, "GFT": GFT}
)
print(log_ml) # higher = better evidence
from cft.integrations.mirofish import MiroFishAdapter
adapter = MiroFishAdapter.from_oasis_dir(
    "/path/to/oasis_sim"
)
# agents + interactions pre-loaded
affinity = adapter.compute_affinity_matrix()
groups = adapter.extract_ground_truth_groups()
adapter.cleanup_oasis()