◈ Open Research · Python Library
Six competing mathematical models. One synthetic social simulator. Bayesian parameter inference. Real AI-agent simulation data. Intervention testing and resilience analysis. All compared head-to-head - and all limitations documented openly.
The Project
Human groups form around shared interests, ideologies, and communication patterns - but which mechanism dominates? CFT pits six formal theories against each other on identical data, scores their predictions with information-theoretic metrics, and exposes every methodological assumption openly.
Generate random, clustered, polarized, and hierarchical populations - all from numpy, no external dependencies.
Load data from MiroFish-Offline (OASIS format): LLM-backed agents with MBTI personalities on a synthetic social platform.
PAS, NMI, DFI, CTAI - quantitative metrics score each theory's ability to predict held-out community structure.
MCMC over interaction weights and theory parameters. Marginal likelihood ranks theories by evidence, not just fit.
The Theories
Each theory offers a different answer to "why do people cluster?" All six are implemented with the same interface, scored against the same ground truth, and given the same affinity data.
CFT · Groups form when all pairwise affinities exceed a threshold θ - binary membership, no overlap.
Parameters: threshold ∈ [0.1, 0.99] · Math: graph theory + combinatorics
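The threshold rule can be sketched in a few lines. This is a toy illustration of the idea, not the library's CFT implementation; `threshold_groups` is a hypothetical helper:

```python
# Toy threshold-based grouping: an agent joins a group only if its
# affinity to EVERY current member clears theta (binary, no overlap).
import numpy as np

def threshold_groups(affinity: np.ndarray, theta: float = 0.6):
    groups = []
    for i in range(affinity.shape[0]):
        placed = False
        for g in groups:
            # all pairwise affinities must exceed the threshold
            if all(affinity[i, j] > theta for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])  # start a new singleton group
    return groups

# toy affinity matrix: two tight blocks
A = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
print(threshold_groups(A, theta=0.6))  # → [[0, 1], [2, 3]]
```

Raising θ fragments the population: at θ = 0.95 every agent above ends up in its own singleton group.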
GFT · Individuals move in behavioral space following affinity gradients - continuous positions, smooth drift.
Parameters: k, sigma · Math: ODEs, potential energy
QST · Individuals exist in a superposition of behavioral states until "measured" through interaction.
Parameters: n_states · Math: mean-field quantum (O(n²))
ICT · Group structure is determined by information flow and communication bandwidth limits.
Parameters: bandwidth · Math: information theory
TST · Agents are spins in a Potts model - group structure is the ground state of a social energy function.
Parameters: temperature · Math: statistical mechanics
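The Metropolis idea behind this can be sketched as a toy Potts sweep. This is an illustration under assumed conventions (q labels, full-energy recomputation), not TST's actual code:

```python
import numpy as np

def energy(labels, affinity):
    # lower energy when high-affinity pairs share a group label
    same = labels[:, None] == labels[None, :]
    return -np.sum(affinity * same) / 2

def metropolis_sweep(labels, affinity, q=3, T=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    labels = labels.copy()
    E = energy(labels, affinity)
    for i in range(len(labels)):
        proposal = labels.copy()
        proposal[i] = rng.integers(q)  # propose a new label for agent i
        E_new = energy(proposal, affinity)
        # Metropolis rule: accept downhill always, uphill with exp(-dE/T)
        if E_new <= E or rng.random() < np.exp(-(E_new - E) / T):
            labels, E = proposal, E_new
    return labels
```

At low temperature repeated sweeps settle into the ground state, which is exactly the partition that puts high-affinity pairs in the same group; at high temperature labels stay disordered - the phase transition tested in the results table.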
DCT · Agents exist in two coupled spaces - proximity (who you're near) and alignment (what you believe). Groups form only where both layers agree.
Parameters: mu (seeking), lam (conformity), trait_map · Math: coupled ODEs + spectral embedding
| Theory | Group boundary | Time evolution | Observer effects | Long-range | Best scale |
|---|---|---|---|---|---|
| CFT | Binary, sharp | Rapid → equilibrium | None | None | n < 50 |
| GFT | Continuous, fuzzy | Smooth trajectories | None | Decays with distance | Any |
| QST | Probabilistic | Unitary until measured | Fundamental | Entanglement | n < 20 (mean-field) |
| ICT | Communication-limited | Punctuated cascades | Indirect | Network-dependent | Medium |
| TST | Statistical, energy-based | Stochastic (MH sweeps) | None | Mean-field | n > 100 |
| DCT | Dual-layer (proximity + alignment) | Two coupled timescales | None | Via alignment layer | Any |
Experimental Results
These are real results from real runs - not cherry-picked. The methodology has known flaws (see Limitations), which we're actively fixing.
6 AI agents (qwen2.5:14b) · 144 rounds · MBTI-typed · Reddit-style platform
Ground truth: two clusters - {CFT, TST, GFT} and {ICT, QST, Thermodynamic}.
⚠ These scores use circular ground truth (the same data produced both the affinity matrix and the GT partition) - see limitation #23.
4 scenarios · n=40 agents · seed=42 · default parameters
⚠ Single run per scenario - no statistical significance testing on these yet. Use n_runs=10 for valid comparisons.
| Claim | Status | Evidence |
|---|---|---|
| CFT produces fewer groups than GFT on polarized data | ✓ Supported | CFT converges to 2 groups; GFT produces 4-6 metastable clusters |
| TST exhibits a phase transition (group count varies with temperature) | ✓ Supported | Variance in group count > 0.5 across T ∈ [0.1, 3.0] |
| All theories agree on strongly clustered data (CTAI > 0.6) | ⚠ Conditional | True at β ≥ 6; fails at β < 2 (weak clustering) |
| CFT outperforms all others on the MiroFish AI-agent run | ✓ Supported | NMI 0.716 vs 0.231 (GFT) - but single seed, circular GT |
| Theories can predict future group structure from past interactions | ○ Preliminary | temporal_prediction() implemented; needs multi-run validation |
Intervention System
Apply perturbations mid-simulation and measure how groups respond. Who breaks? Who recovers? Which traits predict resilience?
Remove agents, shift features, add infiltrators, inject noise shocks, modify affinity between pairs - all at precise simulation times.
Model ongoing propaganda (SustainedShift), prolonged instability (SustainedNoise), or platform algorithm bias (SustainedAffinityBias) over time ranges.
InterventionReport tracks stability curves, fracture/merge events, group survival rates, per-agent vulnerability rankings, and recovery metrics.
Derive seeking and conformity rates from personality (MBTI), influence scores, or custom features. Different people respond differently to the same pressure.
from cft import DCT, TheoryParameters, Agent, InterventionRunner
from cft import RemoveAgents, SustainedShift, SustainedNoise
import numpy as np

# Set up theory with per-agent traits
# (`params`, `agents`, `leader_id`, `target_ids` are assumed to be
# defined earlier in the session)
theory = DCT(params, trait_map="influence", noise=0.05)
theory.initialize_agents(agents)

# Remove the leader at t=5, apply sustained propaganda t=3-8
runner = InterventionRunner(
    theory,
    interventions=[RemoveAgents(time=5.0, agent_ids=[leader_id])],
    sustained=[
        SustainedShift(start=3.0, end=8.0, agent_ids=target_ids,
                       delta_per_step=np.array([0.0, 0.0, 0.1, 0.0])),
    ],
)
report = runner.run(t_max=15.0, dt=0.5)
print(report.resilience_scores)
print(report.vulnerability_ranking()[:5])  # most vulnerable agents
Methodology
Each methodological choice is documented, justified, and - where flawed - flagged as a known limitation with a proposed fix.
Scoring
Combines group count accuracy, partition similarity (NMI), and size distribution accuracy into a single score ∈ [0, 1].
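A hedged sketch of such a composite score - the weights and the sub-score definitions here are assumptions for illustration, not the library's exact formula:

```python
def composite_score(nmi, n_pred, n_true, size_overlap,
                    w=(0.25, 0.25, 0.5)):
    """Blend three sub-scores, each already in [0, 1], into one score.

    nmi          - partition similarity vs. ground truth
    size_overlap - how well predicted group sizes match true sizes
    """
    # group-count accuracy: 1.0 on exact match, shrinking with the gap
    count_acc = min(n_pred, n_true) / max(n_pred, n_true)
    # convex combination of the three components stays in [0, 1]
    return w[0] * count_acc + w[1] * size_overlap + w[2] * nmi
```

Because the weights sum to 1 and each component is bounded in [0, 1], the composite is too, matching the score ∈ [0, 1] described above.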
Agreement
Mean pairwise NMI across all theory pairs. High CTAI means theories converge - low means the data is genuinely ambiguous.
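The agreement computation can be sketched with a from-scratch NMI in plain Python (the library's metric implementation may differ):

```python
import math
from collections import Counter
from itertools import combinations

def nmi(a, b):
    # normalized mutual information between two labelings of n agents
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    H = lambda c: -sum((v / n) * math.log(v / n) for v in c.values())
    # mutual information: sum p(x,y) log(p(x,y) / (p(x) p(y)))
    I = sum((v / n) * math.log(n * v / (ca[x] * cb[y]))
            for (x, y), v in cab.items())
    denom = math.sqrt(H(ca) * H(cb))
    return I / denom if denom else 0.0

def ctai(partitions):
    # cross-theory agreement: mean NMI over all theory pairs
    pairs = list(combinations(partitions, 2))
    return sum(nmi(p, q) for p, q in pairs) / len(pairs)

# three theories whose partitions agree up to relabeling
print(ctai([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]))  # → ≈ 1.0
```

NMI is invariant to relabeling, so theories that find the same split under different group IDs still score 1; genuinely conflicting partitions pull the mean down.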
Temporal
Early interactions build the affinity matrix; late interactions define ground-truth communities. Breaks the circular evaluation loop where both are derived from the same data.
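The split itself is simple. A sketch assuming interactions are `(timestamp, source, target, kind)` tuples - the adapter's real schema may differ:

```python
def temporal_split(interactions, train_frac=0.5):
    # sort by timestamp, then cut: early events feed the affinity
    # matrix, late events define the ground-truth communities
    ordered = sorted(interactions, key=lambda e: e[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

events = [(3, "a", "b", "like"), (1, "b", "c", "follow"),
          (2, "a", "c", "repost"), (4, "c", "a", "like")]
train, test = temporal_split(events)
print(train)  # → [(1, 'b', 'c', 'follow'), (2, 'a', 'c', 'repost')]
```

Since the affinity matrix never sees the evaluation-window events, a theory must genuinely predict future structure rather than reconstruct what it was fitted on.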
Inference: interaction weights
Metropolis-Hastings over interaction weights (follow, like, repost…). Posterior mean replaces hardcoded defaults. Fold-over reflection keeps samples in bounds.
Fixes #21 - Arbitrary weights
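The fold-over reflection can be sketched as follows; the step scale and the [0, 1] bounds are illustrative assumptions:

```python
# Fold-over reflection keeps a Metropolis proposal inside [lo, hi] by
# reflecting any overshoot back into the interval (a sketch, not the
# library's proposal code).
import numpy as np

def reflect(x, lo=0.0, hi=1.0):
    width = hi - lo
    # map onto a 2*width sawtooth, then fold the descending half back
    y = (x - lo) % (2 * width)
    return lo + (y if y <= width else 2 * width - y)

def propose(x, scale=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return reflect(x + rng.normal(0.0, scale))

print(reflect(1.2))  # → 0.8  (overshoot above 1 reflects back down)
```

Unlike clipping, reflection preserves detailed balance for a symmetric proposal, so no acceptance-ratio correction is needed at the boundary.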
Inference: theory parameters
Same MCMC machinery infers theory-specific parameters (CFT threshold, GFT k/sigma, TST temperature). Replaces manual tuning with a posterior distribution.
Complexity
Log marginal likelihood = logsumexp(log_liks) − log(n) over the MCMC chain gives an implicit Occam's razor. Theories with more free parameters must fit better to win.
Fixes #24 - Ignores complexity
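The evidence estimator above, written out with a numerically stable logsumexp (a sketch of the formula, not the library's code):

```python
import math

def log_evidence(log_liks):
    # log(mean likelihood) = logsumexp(log_liks) - log(n),
    # computed stably by factoring out the max log-likelihood
    m = max(log_liks)
    return (m + math.log(sum(math.exp(l - m) for l in log_liks))
            - math.log(len(log_liks)))

print(log_evidence([math.log(2.0)] * 10))  # → log(2) ≈ 0.693
```

Averaging in likelihood space (rather than log space) is what penalises complexity: a flexible theory whose chain wanders through many poorly fitting parameter settings drags its mean likelihood down, even if its best fit is excellent.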
Statistics
Run the full pipeline n_runs times with offset seeds. Report mean ± std NMI. For n ≥ 5, Wilcoxon signed-rank test determines whether the top-ranked theory is significantly better.
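The seed-offset protocol looks roughly like this; `run_pipeline` is a hypothetical stand-in for the real comparison call, and the Wilcoxon step is omitted:

```python
import statistics

def multi_run(run_pipeline, base_seed=42, n_runs=10):
    # same pipeline, offset seeds; report mean ± std of the NMI score
    scores = [run_pipeline(seed=base_seed + i) for i in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)

# deterministic fake pipeline, purely for illustration
mean, std = multi_run(lambda seed: 0.7 + 0.01 * (seed % 3),
                      base_seed=0, n_runs=3)
print(mean, std)
```

If the std overlaps the gap between two theories' means, a single-run ranking between them is noise - which is exactly what the Wilcoxon test then checks formally.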
Data
MiroFishAdapter.from_oasis_dir() converts MiroFish-Offline / camel-oasis CSV + JSONL logs directly into the normalized format CFT expects. No hand-written conversion scripts.
Known Limitations
Publishing limitations is part of the point. Every item below has a GitHub issue, a proposed fix, and a status. Transparency over appearance.
#23 Circular ground truth · Affinity matrix and Louvain ground truth were both derived from the same interaction data - measuring reconstruction, not prediction. Fixed via temporal train/eval split in compare_theories(use_temporal_split=True).
#21 Arbitrary interaction weights · follow=0.3, like=0.2, repost=0.4 were chosen by intuition. Now inferred via MCMC: MCMCInference(adapter, CFT).infer_weights().
#22 Hand-tuned theory parameters · CFT threshold=0.6, GFT k=0.1, TST temperature=1.0 were set manually. Now inferred: mcmc.infer_theory_params(DEFAULT_THEORY_PARAM_SPECS["CFT"]).
#24 Ignores complexity · A theory with more free parameters could win by overfitting. Fixed by marginal likelihood comparison: compare_theories_by_evidence() applies an implicit Occam's razor.
#27 Run-to-run variance · Stochastic theories (TST, QST) and stochastic Louvain meant rankings could vary. Fixed: compare_theories(n_runs=10) reports mean ± std NMI with a Wilcoxon significance test.
#26 OASIS format mismatch · Real MiroFish produces OASIS-format CSV + JSONL that didn't match the adapter's expected format. Fixed: MiroFishAdapter.from_oasis_dir() handles conversion automatically.
#25 Binary MBTI features · MBTI types are encoded as four {−1, +1} dimensions; all INTJs are identical, so within-type variation is lost. Proposed fix: support continuous Big-Five (OCEAN) scores or learn a continuous embedding from interaction data.
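The current encoding can be sketched as follows; the axis order and letter mapping here are assumptions based on the standard MBTI convention, not necessarily the library's exact layout:

```python
def mbti_vector(mbti: str):
    # four binary axes: E/I, S/N, T/F, J/P → one {-1, +1} value each
    axes = ["EI", "SN", "TF", "JP"]
    return [-1 if c == axis[0] else 1
            for c, axis in zip(mbti.upper(), axes)]

# every INTJ collapses to the same point - the limitation above
print(mbti_vector("INTJ"))  # → [1, 1, -1, -1]
```

A continuous OCEAN vector (five floats in [0, 1]) would give each agent a distinct position, which is why #25 proposes it as the replacement.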
Dev Log
What's been shipped, what's in progress, what's next.
TraitMap derives per-agent seeking/conformity rates from personality features or metadata (presets: "mbti", "influence").
Three sustained intervention types model ongoing pressure over time ranges.
DCT now accepts separate proximity and alignment data sources via spectral embedding.
371 passing tests.
Full intervention framework: 7 point-in-time perturbations (RemoveAgents, ShiftFeatures, AddAgent, NoiseShock, ModifyAffinity, ShiftProximity, ShiftAlignment) plus InterventionRunner with resilience analysis. DCT adds a 6th theory with two coupled layers - proximity and alignment - with per-agent behavioral parameters.
Temporal split in compare_theories() breaks circular ground truth (#23).
MCMC inference module (cft/inference.py) infers weights and theory parameters from data (#21, #22) and estimates marginal likelihoods for complexity-penalised comparison (#24).
n_runs + Wilcoxon signed-rank test for statistical validation (#27).
MiroFishAdapter.from_oasis_dir() for OASIS format direct import (#26).
Ran a 144-round simulation with 6 LLM-backed agents (qwen2.5:14b, nomic-embed-text) on a synthetic Reddit-style platform. Ground truth: two clusters emerged. CFT won with NMI=0.716 but methodology was compromised by circular ground truth - opened issues #21-#27 to document and fix.
Built-in synthetic social simulator (no external deps) with four scenarios. HypothesisTester wraps the full pipeline: compare, sweep, temporal prediction, named claims. 276 passing tests.
CFT, GFT, QST (mean-field), ICT, TST (Potts/Metropolis-Hastings) - all sharing the same BehaviorTheory base class, the same affinity interface, and the same history format. MiroFish adapter + PredictionTournament with PAS/DFI/PSS/CTAI.
Created the cft/ package from scratch: pyproject.toml, base classes, theory interface, affinity computation, comparator, visualization, notebooks. pip-installable with optional extras.
Issue #25 - replace binary MBTI features with continuous Big-Five (OCEAN) scores.
Intervention scenario library - pre-built scenarios for common social dynamics questions.
DCT + TraitMap integration with MiroFish data - derive seeking/conformity from real agent interactions.
More MiroFish simulation runs with the temporal split enabled to get clean NMI scores.
Code
Pure Python, no external LLM or database required. Optional pandas + networkx for real-data workflows.
git clone https://github.com/rivirside/cft.git
cd cft
pip install -e ".[dev]" # or .[mirofish] or .[all]
pytest # 371 tests
from cft import SocialSimulator, HypothesisTester
sim = SocialSimulator(
    n_agents=40, scenario="clustered",
    k=3, T=30, seed=42,
)
ht = HypothesisTester(simulator=sim)
# Single run
result = ht.compare_theories(use_temporal_split=True)
print(result["rankings"])
# Statistical validation (n ≥ 5 for Wilcoxon)
result = ht.compare_theories(n_runs=10)
print(result["mean_similarity"])
print(result["wilcoxon_pvalue"])
from cft import MCMCInference, CFT
from cft.integrations.mirofish import MiroFishAdapter
adapter = MiroFishAdapter("sim_dir")
adapter.load_agents()
adapter.load_interactions()
mcmc = MCMCInference(adapter, CFT, seed=42)
# Infer interaction weights from data
weights = mcmc.infer_weights(n_samples=2000)
print(weights.map_estimate)
print(weights.posterior_std)
# Complexity-penalised theory comparison
from cft import compare_theories_by_evidence, GFT
log_ml = compare_theories_by_evidence(
    adapter, {"CFT": CFT, "GFT": GFT}
)
print(log_ml) # higher = better evidence
from cft.integrations.mirofish import MiroFishAdapter
adapter = MiroFishAdapter.from_oasis_dir(
    "/path/to/oasis_sim"
)
# agents + interactions pre-loaded
affinity = adapter.compute_affinity_matrix()
groups = adapter.extract_ground_truth_groups()
adapter.cleanup_oasis()