◈ Open Research · Python Library

Which theory best explains how groups form and fracture?

Six competing mathematical models. One synthetic social simulator. Bayesian parameter inference. Real AI-agent simulation data. Intervention testing and resilience analysis. All compared head-to-head - and all limitations documented openly.

6
Theories
371
Passing tests
7
Known limitations
NMI 0.716
Best theory on MiroFish run

A framework for testing theories of collective behavior

Human groups form around shared interests, ideologies, and communication patterns - but which mechanism dominates? CFT pits six formal theories against each other on identical data, scores their predictions with information-theoretic metrics, and exposes every methodological assumption openly.

⚗️

Synthetic Scenarios

Generate random, clustered, polarized, and hierarchical populations - all from numpy, no external dependencies.

🤖

AI-Agent Simulations

Load data from MiroFish-Offline (OASIS format): LLM-backed agents with MBTI personalities on a synthetic social platform.

📐

Head-to-Head Scoring

PAS, NMI, DFI, CTAI - quantitative metrics score each theory's ability to predict held-out community structure.

🎲

Bayesian Inference

MCMC over interaction weights and theory parameters. Marginal likelihood ranks theories by evidence, not just fit.

Six mechanisms - one dataset

Each theory offers a different answer to "why do people cluster?" All six are implemented with the same interface, scored against the same ground truth, and given the same affinity data.

CFT

Consensus-Fracture Theory

Groups form when all pairwise affinities exceed a threshold θ - binary membership, no overlap.

  • Sharp, stable group boundaries
  • Fast convergence (< 20 steps)
  • Rare but catastrophic splits
  • Best for ideological / coalition data
Key param: threshold ∈ [0.1, 0.99]  ·  Math: graph theory + combinatorics
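The clique rule above can be sketched in a few lines of numpy. This is a hypothetical `cft_groups` helper for illustration, not the library's actual implementation: an agent joins a group only if its affinity with every current member clears the threshold.

```python
import numpy as np

def cft_groups(affinity: np.ndarray, threshold: float = 0.6) -> list[set[int]]:
    """Greedy grouping: an agent joins a group only if its affinity
    with *every* current member exceeds the threshold (clique rule)."""
    groups: list[set[int]] = []
    for i in range(affinity.shape[0]):
        for g in groups:
            if all(affinity[i, j] > threshold for j in g):
                g.add(i)
                break
        else:
            groups.append({i})  # no group fits - start a new singleton
    return groups

# Two tight blocks with weak cross-block affinity
aff = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
print(cft_groups(aff, threshold=0.6))  # [{0, 1}, {2, 3}]
```

Because membership is all-or-nothing, a single pairwise affinity dipping below θ is enough to keep an agent out - which is exactly why CFT produces sharp boundaries and rare, catastrophic splits.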
GFT

Gradient Field Theory

Individuals move in behavioral space following affinity gradients - continuous positions, smooth drift.

  • Gradual cluster formation
  • Overlapping memberships possible
  • Continuous reorganization
  • Multiple metastable states
Key params: k, sigma  ·  Math: ODEs, potential energy
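The drift mechanism can be sketched as a mean-shift-style update - a hypothetical `gft_step`, not the package API: each agent moves a fraction k toward the affinity-weighted centroid of its neighbours, with a Gaussian kernel of width sigma.

```python
import numpy as np

def gft_step(pos: np.ndarray, k: float = 0.1, sigma: float = 1.0) -> np.ndarray:
    """One drift step: each agent moves a fraction k toward the
    affinity-weighted centroid of its neighbours (Gaussian kernel)."""
    diff = pos[None, :, :] - pos[:, None, :]          # diff[i, j] = pos[j] - pos[i]
    w = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)                          # ignore self-affinity
    pull = (w[:, :, None] * diff).sum(axis=1) / (w.sum(axis=1)[:, None] + 1e-12)
    return pos + k * pull

rng = np.random.default_rng(0)
pos = rng.normal(size=(20, 2))
start_spread = np.linalg.norm(pos - pos.mean(0), axis=1).mean()
for _ in range(50):
    pos = gft_step(pos)
end_spread = np.linalg.norm(pos - pos.mean(0), axis=1).mean()
# Drift condenses the population: end_spread < start_spread
```

Because positions are continuous, agents near two clusters feel pull from both - overlapping membership and metastable configurations fall out of the dynamics rather than being imposed.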
QST

Quantum Social Theory

Individuals exist in superposition of behavioral states until "measured" through interaction.

  • Probabilistic group membership
  • Observation affects outcomes
  • Long-range entanglement correlations
  • Decoherence → classical at scale
Key param: n_states  ·  Math: mean-field quantum (O(n²))
ICT

Information Cascade Theory

Group structure is determined by information flow and communication bandwidth limits.

  • Optimal group sizes from bandwidth
  • Splits when information overloads
  • Cascade-driven reorganization
  • Task-dependent configurations
Key param: bandwidth  ·  Math: information theory
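The bandwidth-limit intuition can be sketched with a hypothetical `split_by_bandwidth` helper (not the library's implementation): if every member must communicate with every other, a group overloads once its size minus one exceeds the bandwidth, and it splits until every fragment fits.

```python
def split_by_bandwidth(groups: list[set[int]], bandwidth: int) -> list[set[int]]:
    """ICT-style rule of thumb: a group overloads when (size - 1) exceeds
    the communication bandwidth; oversized groups split in half until
    every group fits."""
    out: list[set[int]] = []
    stack = list(groups)
    while stack:
        g = stack.pop()
        if len(g) - 1 <= bandwidth:
            out.append(g)
        else:
            members = sorted(g)
            mid = len(members) // 2
            stack += [set(members[:mid]), set(members[mid:])]  # cascade split
    return out

groups = split_by_bandwidth([set(range(10))], bandwidth=3)
# Every resulting group fits the bandwidth: size <= bandwidth + 1
```

The optimal group size thus emerges from the communication constraint rather than from affinity alone - the signature ICT prediction.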
TST

Thermodynamic Social Theory

Agents are spins in a Potts model - group structure is the ground state of a social energy function.

  • Phase transition at critical temperature
  • Power-law group sizes near Tc
  • High T → fluid, Low T → frozen
  • Metropolis-Hastings dynamics
Key param: temperature  ·  Math: statistical mechanics
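The Potts dynamics can be sketched with a minimal Metropolis-Hastings sweep - a toy `potts_sweep`, not the library's implementation - where the energy rewards sharing a label with high-affinity neighbours:

```python
import numpy as np

def potts_sweep(spins: np.ndarray, J: np.ndarray, T: float, q: int,
                rng: np.random.Generator) -> np.ndarray:
    """One Metropolis-Hastings sweep. Energy E = -sum_{i<j} J[i,j]*delta(s_i, s_j),
    so agents prefer sharing a label with high-affinity neighbours."""
    for i in rng.permutation(len(spins)):
        new = rng.integers(q)
        # Energy change from relabelling agent i (diagonal of J is zero)
        dE = (J[i] * (spins == spins[i])).sum() - (J[i] * (spins == new)).sum()
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i] = new
    return spins

rng = np.random.default_rng(1)
J = np.full((20, 20), 0.05)          # weak background affinity
J[:10, :10] = J[10:, 10:] = 1.0      # two tight communities
np.fill_diagonal(J, 0.0)
spins = rng.integers(4, size=20)     # q = 4 possible group labels
for _ in range(200):
    spins = potts_sweep(spins, J, T=0.5, q=4, rng=rng)
# Well below the critical temperature, each block freezes onto one label
```

Raising T past the critical point melts this order into fluid, fluctuating labels - the phase transition the TST card describes.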
DCT

Dual-Context Theory

Agents exist in two coupled spaces - proximity (who you're near) and alignment (what you believe). Groups form only where both layers agree.

  • Two timescales: fast proximity, slow alignment
  • Per-agent seeking and conformity rates
  • Layer tension predicts instability
  • Models code-switching and forced coexistence
Key params: mu (seeking), lam (conformity), trait_map  ·  Math: coupled ODEs + spectral embedding

Theory comparison at a glance

| Theory | Group boundary | Time evolution | Observer effects | Long-range | Best scale |
|--------|----------------|----------------|------------------|------------|------------|
| CFT | Binary, sharp | Rapid → equilibrium | None | None | n < 50 |
| GFT | Continuous, fuzzy | Smooth trajectories | None | Decays with distance | Any |
| QST | Probabilistic | Unitary until measured | Fundamental | Entanglement | n < 20 (mean-field) |
| ICT | Communication-limited | Punctuated cascades | Indirect | Network-dependent | Medium |
| TST | Statistical, energy-based | Stochastic (MH sweeps) | None | Mean-field | n > 100 |
| DCT | Dual-layer (proximity + alignment) | Two coupled timescales | None | Via alignment layer | Any |

What we've found so far

These are real results from real runs - not cherry-picked. The methodology has known flaws (see Limitations), which we're actively fixing.

MiroFish-Offline Run #1

6 AI agents (qwen2.5:14b) · 144 rounds · MBTI-typed · Reddit-style platform

1
CFT
NMI score
0.716
2
GFT
NMI score
0.231
3
TST
NMI score
0.082

Ground truth: two clusters - {CFT, TST, GFT} and {ICT, QST, Thermodynamic}.
⚠ These scores use circular ground truth (same data for affinity + GT). See limitation #23.

Synthetic Scenario Performance

4 scenarios · n=40 agents · seed=42 · default parameters

Clustered (k=3, β=4)
CFT leads - sharp affinity threshold aligns well with hard cluster boundaries.
Polarized (2 camps)
CFT and TST competitive - binary group structure matches threshold and Potts phases.
Random (no structure)
All theories converge - CTAI high, no discriminating power. Expected result.
Hierarchical (5 influencers)
ICT shows stronger relative performance - bandwidth limits model hub-spoke structure.

⚠ Single run per scenario. No statistical significance testing yet on these. Use n_runs=10 for valid comparisons.

Hypotheses tested

| Claim | Status | Evidence |
|-------|--------|----------|
| CFT produces fewer groups than GFT on polarized data | ✓ Supported | CFT converges to 2 groups; GFT produces 4-6 metastable clusters |
| TST exhibits a phase transition (group count varies with temperature) | ✓ Supported | Variance in group count > 0.5 across T ∈ [0.1, 3.0] |
| All theories agree on strongly clustered data (CTAI > 0.6) | ⚠ Conditional | True at β ≥ 6; fails at β < 2 (weak clustering) |
| CFT outperforms all others on the MiroFish AI-agent run | ✓ Supported | NMI 0.716 vs 0.231 (GFT) - but single seed, circular GT |
| Theories can predict future group structure from past interactions | ○ Preliminary | temporal_prediction() implemented; needs multi-run validation |

Perturb, measure, understand

Apply perturbations mid-simulation and measure how groups respond. Who breaks? Who recovers? Which traits predict resilience?

💥

Point-in-time interventions

Remove agents, shift features, add infiltrators, inject noise shocks, modify affinity between pairs - all at precise simulation times.

📈

Sustained interventions

Model ongoing propaganda (SustainedShift), prolonged instability (SustainedNoise), or platform algorithm bias (SustainedAffinityBias) over time ranges.

🛡

Resilience analysis

InterventionReport tracks stability curves, fracture/merge events, group survival rates, per-agent vulnerability rankings, and recovery metrics.

🧬

TraitMap - per-agent behavior

Derive seeking and conformity rates from personality (MBTI), influence scores, or custom features. Different people respond differently to the same pressure.

Example: leader removal + sustained propaganda

from cft import DCT, TheoryParameters, Agent, InterventionRunner
from cft import RemoveAgents, SustainedShift, SustainedNoise
import numpy as np

# Set up theory with per-agent traits
# (params: a TheoryParameters instance; agents: a list of Agent;
#  leader_id and target_ids are agent IDs - all defined earlier)
theory = DCT(params, trait_map="influence", noise=0.05)
theory.initialize_agents(agents)

# Remove the leader at t=5, apply sustained propaganda t=3-8
runner = InterventionRunner(theory,
    interventions=[RemoveAgents(time=5.0, agent_ids=[leader_id])],
    sustained=[
        SustainedShift(start=3.0, end=8.0, agent_ids=target_ids,
                       delta_per_step=np.array([0.0, 0.0, 0.1, 0.0])),
    ],
)
report = runner.run(t_max=15.0, dt=0.5)

print(report.resilience_scores)
print(report.vulnerability_ranking()[:5])  # most vulnerable agents

How we validate theories

Each methodological choice is documented, justified, and - where flawed - flagged as a known limitation with a proposed fix.

Scoring

PAS - Prediction Accuracy Score

Combines group count accuracy, partition similarity (NMI), and size distribution accuracy into a single score ∈ [0, 1].
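One plausible way to combine the three components - an illustrative sketch only, since the library's actual weighting isn't shown here - is an equal-weight average, with size-distribution accuracy measured as histogram overlap:

```python
import numpy as np

def pas(pred_sizes, true_sizes, nmi_score: float) -> float:
    """Hypothetical PAS sketch: equal-weight average of group-count
    accuracy, partition similarity (NMI), and size-distribution overlap.
    The library's actual weighting may differ."""
    count_acc = min(len(pred_sizes), len(true_sizes)) / max(len(pred_sizes), len(true_sizes))
    p = np.sort(pred_sizes) / np.sum(pred_sizes)   # normalised size spectra
    q = np.sort(true_sizes) / np.sum(true_sizes)
    m = max(len(p), len(q))
    p = np.pad(p, (m - len(p), 0))                 # align by zero-padding the small side
    q = np.pad(q, (m - len(q), 0))
    size_acc = np.minimum(p, q).sum()              # histogram overlap in [0, 1]
    return (count_acc + nmi_score + size_acc) / 3

print(pas([5, 5], [5, 5], nmi_score=1.0))          # 1.0 - perfect prediction
print(round(pas([10], [5, 5], nmi_score=0.0), 3))  # 0.333
```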

Agreement

CTAI - Cross-Theory Agreement Index

Mean pairwise NMI across all theory pairs. High CTAI means theories converge - low means the data is genuinely ambiguous.
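CTAI is simple to state precisely. A from-scratch sketch with a numpy NMI (the hypothetical `nmi` and `ctai` helpers below are for illustration, not the package's internals):

```python
import numpy as np
from itertools import combinations

def nmi(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized mutual information between two label partitions."""
    n = len(a)
    ca, cb = np.unique(a), np.unique(b)
    # Contingency table of joint label counts
    cont = np.array([[np.sum((a == x) & (b == y)) for y in cb] for x in ca])
    p = cont / n
    pa, pb = p.sum(1), p.sum(0)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return 2 * mi / (ha + hb) if ha + hb > 0 else 1.0

def ctai(partitions: list[np.ndarray]) -> float:
    """Cross-Theory Agreement Index: mean pairwise NMI across theories."""
    return float(np.mean([nmi(p, q) for p, q in combinations(partitions, 2)]))

p1 = np.array([0, 0, 1, 1])
p2 = np.array([1, 1, 0, 0])  # same split, relabelled: NMI = 1
p3 = np.array([0, 1, 0, 1])  # orthogonal split: NMI = 0
print(round(ctai([p1, p2, p3]), 3))  # 0.333
```

Note that NMI is invariant to label permutation, so two theories that find the same groups under different names still score full agreement.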

Temporal

Temporal train/eval split

Early interactions build the affinity matrix; late interactions define ground-truth communities. Breaks the circular evaluation loop where both are derived from the same data.

Fixes #23 - Circular ground truth
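The split itself is just an ordered cut over time-stamped events. A minimal sketch, assuming a hypothetical list-of-dicts event format (the real adapter's format may differ):

```python
def temporal_split(interactions, frac: float = 0.7):
    """Sort time-stamped interaction events and cut: the early portion
    builds the affinity matrix, the late portion defines ground truth."""
    ordered = sorted(interactions, key=lambda e: e["time"])
    cut = round(len(ordered) * frac)
    return ordered[:cut], ordered[cut:]

# Hypothetical event records - the real adapter format may differ
events = [{"time": t, "src": t % 5, "dst": (t + 1) % 5} for t in range(10)]
train, held_out = temporal_split(events, frac=0.7)
print(len(train), len(held_out))  # 7 3
```

Because the held-out events never touch the affinity matrix, a high NMI now reflects genuine prediction of future community structure, not reconstruction of the same data.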

Inference

MCMC weight inference

Metropolis-Hastings over interaction weights (follow, like, repost…). Posterior mean replaces hardcoded defaults. Fold-over reflection keeps samples in bounds.

Fixes #21 - Arbitrary weights
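The fold-over reflection trick is worth seeing concretely. A toy random-walk Metropolis sampler - a sketch of the technique, not the `MCMCInference` internals - where out-of-bounds proposals are bounced back into [0, 1]:

```python
import numpy as np

def reflect(x: np.ndarray, lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Fold-over reflection: bounce out-of-range proposals back into [lo, hi]."""
    span = hi - lo
    y = np.mod(x - lo, 2 * span)
    return lo + np.where(y > span, 2 * span - y, y)

def metropolis(log_lik, x0, n_samples=2000, step=0.05, seed=0):
    """Random-walk Metropolis; reflection keeps weight samples in bounds."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    ll = log_lik(x)
    samples = []
    for _ in range(n_samples):
        prop = reflect(x + rng.normal(0.0, step, size=x.shape))
        ll_prop = log_lik(prop)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept rule
            x, ll = prop, ll_prop
        samples.append(x.copy())
    return np.array(samples)

# Toy posterior concentrated near hypothetical weights (0.3, 0.2, 0.4)
target = np.array([0.3, 0.2, 0.4])
chain = metropolis(lambda w: -50.0 * ((w - target) ** 2).sum(),
                   x0=np.full(3, 0.5))
# Posterior mean after burn-in sits near the target
```

Reflection preserves detailed balance for a symmetric proposal, so no Jacobian correction is needed - which is why it is a popular way to keep bounded parameters in range.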

Inference

MCMC parameter inference

Same MCMC machinery infers theory-specific parameters (CFT threshold, GFT k/sigma, TST temperature). Replaces manual tuning with a posterior distribution.

Fixes #22 - Arbitrary theory params

Complexity

Marginal likelihood comparison

Log marginal likelihood = logsumexp(log_liks) − log(n) over the MCMC chain gives an implicit Occam's razor. Theories with more free parameters must fit better to win.

Fixes #24 - Ignores complexity
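The estimator stated above - the log of the mean likelihood over the chain - can be computed stably in log space. A small sketch of that formula (illustrative values, not real run data):

```python
import numpy as np

def log_marginal_likelihood(log_liks: np.ndarray) -> float:
    """logsumexp(log_liks) - log(n): the log of the mean likelihood
    over the chain, computed stably by factoring out the max."""
    m = log_liks.max()
    return float(m + np.log(np.exp(log_liks - m).sum()) - np.log(len(log_liks)))

# A flexible theory that only sometimes fits well loses to a simple
# theory that fits consistently, even though its best sample is better
steady = np.array([-10.0, -10.1, -9.9])
flexible = np.array([-9.5, -14.0, -13.0])
print(log_marginal_likelihood(steady) > log_marginal_likelihood(flexible))  # True
```

Averaging likelihoods (not log-likelihoods) is what penalises spread: a theory whose fit varies wildly across the posterior pays for that variance, giving the implicit Occam's razor.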

Statistics

Multi-run + Wilcoxon test

Run the full pipeline n_runs times with offset seeds. Report mean ± std NMI. For n ≥ 5, Wilcoxon signed-rank test determines whether the top-ranked theory is significantly better.

Fixes #27 - Single seed instability

Data

OASIS event-log import

MiroFishAdapter.from_oasis_dir() converts MiroFish-Offline / camel-oasis CSV + JSONL logs directly into the normalized format CFT expects. No hand-written conversion scripts.

Fixes #26 - Manual OASIS conversion

What we know is wrong

Publishing limitations is part of the point. Every item below has a GitHub issue, a proposed fix, and a status. Transparency over appearance.

#23 Fixed

Circular ground truth

Affinity matrix and Louvain ground truth were both derived from the same interaction data - measuring reconstruction, not prediction. Fixed via temporal train/eval split in compare_theories(use_temporal_split=True).

#21 Fixed

Arbitrary interaction weights

follow=0.3, like=0.2, repost=0.4 were chosen by intuition. Now inferred via MCMC: MCMCInference(adapter, CFT).infer_weights().

#22 Fixed

Arbitrary theory parameters

CFT threshold=0.6, GFT k=0.1, TST temperature=1.0 were set manually. Now inferred: mcmc.infer_theory_params(DEFAULT_THEORY_PARAM_SPECS["CFT"]).

#24 Fixed

No penalty for model complexity

A theory with more free parameters could win by overfitting. Fixed by marginal likelihood comparison: compare_theories_by_evidence() applies an implicit Occam's razor.

#27 Fixed

Single simulation run - unstable rankings

Stochastic theories (TST, QST) and stochastic Louvain meant rankings could vary. Fixed: compare_theories(n_runs=10) reports mean ± std NMI with Wilcoxon significance test.

#26 Fixed

MiroFish OASIS output required manual conversion

Real MiroFish produces OASIS-format CSV + JSONL that didn't match the adapter's expected format. Fixed: MiroFishAdapter.from_oasis_dir() handles conversion automatically.

#25 Open

MBTI features are binary and coarse

MBTI types are encoded as four {−1, +1} dimensions. All INTJs are identical; within-type variation is lost. Proposed fix: support continuous Big-Five (OCEAN) scores or learn a continuous embedding from interaction data.

Building in public

What's been shipped, what's in progress, what's next.

2026-03-20

DCT advanced features: TraitMap, sustained interventions, separate sources

TraitMap derives per-agent seeking/conformity rates from personality features or metadata (presets: "mbti", "influence"). Three sustained intervention types model ongoing pressure over time ranges. DCT now accepts separate proximity and alignment data sources via spectral embedding. 371 passing tests.

TraitMap sustained interventions separate sources
2026-03-19

Intervention system + DCT (Dual-Context Theory)

Full intervention framework: 7 point-in-time perturbations (RemoveAgents, ShiftFeatures, AddAgent, NoiseShock, ModifyAffinity, ShiftProximity, ShiftAlignment) plus InterventionRunner with resilience analysis. DCT adds a 6th theory with two coupled layers - proximity and alignment - with per-agent behavioral parameters.

interventions DCT resilience
2026-03-18

Fixed all 5 priority methodological issues

Temporal split in compare_theories() breaks circular ground truth (#23). MCMC inference module (cft/inference.py) infers weights and theory parameters from data (#21, #22) and estimates marginal likelihoods for complexity-penalised comparison (#24). n_runs + Wilcoxon signed-rank test for statistical validation (#27). MiroFishAdapter.from_oasis_dir() for OASIS format direct import (#26).

fix #23 MCMC #21 #22 #24 stats #27 oasis #26
2026-03-17

First MiroFish-Offline run completed

Ran a 144-round simulation with 6 LLM-backed agents (qwen2.5:14b, nomic-embed-text) on a synthetic Reddit-style platform. Ground truth: two clusters emerged. CFT won with NMI=0.716 but methodology was compromised by circular ground truth - opened issues #21-#27 to document and fix.

experiment analysis
2026-03-17

SocialSimulator + HypothesisTester shipped

Built-in synthetic social simulator (no external deps) with four scenarios. HypothesisTester wraps the full pipeline: compare, sweep, temporal prediction, named claims. 276 passing tests.

simulator hypothesis testing
2026-03-17

All five theories implemented

CFT, GFT, QST (mean-field), ICT, TST (Potts/Metropolis-Hastings) - all sharing the same BehaviorTheory base class, same affinity interface, same history format. MiroFish adapter + PredictionTournament with PAS/DFI/PSS/CTAI.

theories scoring
2026-03-17

Repository scaffolded

Created cft/ package from scratch: pyproject.toml, base classes, theory interface, affinity computation, comparator, visualization, notebooks. pip-installable with optional extras.

scaffolding

What's next

  • Issue #25 - replace binary MBTI features with continuous Big-Five (OCEAN) scores.
  • Intervention scenario library - pre-built scenarios for common social dynamics questions.
  • DCT + TraitMap integration with MiroFish data - derive seeking/conformity from real agent interactions.
  • More MiroFish simulation runs with the temporal split enabled to get clean NMI scores.

Use it yourself

Pure Python, no external LLM or database required. Optional pandas + networkx for real-data workflows.

Install
git clone https://github.com/rivirside/cft.git
cd cft
pip install -e ".[dev]"   # or .[mirofish] or .[all]
pytest                     # 371 tests
Quick comparison
from cft import SocialSimulator, HypothesisTester

sim = SocialSimulator(
    n_agents=40, scenario="clustered",
    k=3, T=30, seed=42
)
ht = HypothesisTester(simulator=sim)

# Single run
result = ht.compare_theories(use_temporal_split=True)
print(result["rankings"])

# Statistical validation (n ≥ 5 for Wilcoxon)
result = ht.compare_theories(n_runs=10)
print(result["mean_similarity"])
print(result["wilcoxon_pvalue"])
MCMC parameter inference
from cft import MCMCInference, CFT
from cft.integrations.mirofish import MiroFishAdapter

adapter = MiroFishAdapter("sim_dir")
adapter.load_agents()
adapter.load_interactions()

mcmc = MCMCInference(adapter, CFT, seed=42)

# Infer interaction weights from data
weights = mcmc.infer_weights(n_samples=2000)
print(weights.map_estimate)
print(weights.posterior_std)

# Complexity-penalised theory comparison
from cft import compare_theories_by_evidence, GFT
log_ml = compare_theories_by_evidence(
    adapter, {"CFT": CFT, "GFT": GFT}
)
print(log_ml)  # higher = better evidence
Load a MiroFish-Offline (OASIS) run
from cft.integrations.mirofish import MiroFishAdapter

adapter = MiroFishAdapter.from_oasis_dir(
    "/path/to/oasis_sim"
)
# agents + interactions pre-loaded

affinity = adapter.compute_affinity_matrix()
groups = adapter.extract_ground_truth_groups()
adapter.cleanup_oasis()