EduMirror

Modeling Educational Social Dynamics with Value-driven Multi-agent Simulation

Jingzhe Lin*^1,2,3, Hengbin Yu*^4, Yongdan Zeng*^1,5, Fangwei Zhong^1,2,3,✉


¹ School of Artificial Intelligence, Beijing Normal University, Beijing, China.

² Beijing Key Laboratory of Artificial Intelligence for Education, Beijing, China.

³ Engineering Research Center of Intelligent Technology and Educational Application, Ministry of Education, Beijing, China.

⁴ School of Systems Science, Beijing Normal University, Beijing, China.

⁵ Information Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China.

^* Equal contribution. ^✉ Corresponding author.

ICML 2026


Abstract

A safer laboratory for educational social dynamics.

EduMirror overview of educational scenarios and intervention workflow

EduMirror is a multi-agent simulator for studying educational social dynamics when real-world controlled experiments are ethically difficult and observational studies lack causal power. It combines a configurable library of education-oriented agents with value-driven behavior grounded in social value and intrinsic motivation. A dual-track measurement protocol uses LLMs to quantify both overt actions and latent psychological states, enabling structured in-silico research. Case studies on school bullying and group cooperation show that EduMirror can generate theory-aligned, measurable social phenomena for hypothesis testing in education.

Architecture

From theory to simulation traces.

EduMirror framework flowchart. **EduMirror follows a closed experimental workflow.** Researchers first construct theory-grounded educational scenarios by integrating domain theory, defining context, profiling roles, and configuring measurement metrics. These settings initialize the Concordia-based simulation engine, where the Game Master manages scene setting, narration, rule enforcement, and time. Agents act through a value-driven cognitive architecture: profiles, traits, goals, memories, theory-of-mind, psychological needs, and social value orientation jointly guide the action planner to generate, evaluate, select, and reflect on behaviors. The resulting interaction traces flow into user toolkits, where LLM Raters and LLM Surveyors quantify explicit behaviors and implicit states, while intervention tools create parallel timelines for counterfactual comparison and visualization.
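The generate-evaluate-select loop of a value-driven planner can be illustrated with a small scoring function. This is a minimal sketch, not EduMirror's implementation: it assumes each candidate action already carries numeric estimates of its payoff to the agent, its payoff to others, and its effect on psychological needs (in EduMirror such judgments would come from the LLM). It uses the common angle-based social value orientation utility, cos(θ)·own + sin(θ)·other, plus a hypothetical need-deficit term. `Candidate`, `select_action`, and the need names are illustrative, and the reflect step is omitted.

```python
import math
from dataclasses import dataclass, field

# Hypothetical need dimensions; the page does not list EduMirror's exact needs.
NEEDS = ("autonomy", "competence", "relatedness")


@dataclass
class Candidate:
    """A candidate action from the planner's generate step (hypothetical fields)."""
    description: str
    self_payoff: float   # estimated benefit to the acting agent
    other_payoff: float  # estimated benefit to the other agents involved
    need_effects: dict = field(default_factory=dict)  # need -> estimated change in satisfaction


def svo_utility(c: Candidate, svo_angle_deg: float) -> float:
    """Angle-based SVO weighting: 0 degrees is individualistic, 45 degrees is prosocial."""
    theta = math.radians(svo_angle_deg)
    return math.cos(theta) * c.self_payoff + math.sin(theta) * c.other_payoff


def need_score(c: Candidate, needs: dict) -> float:
    """Favor actions that raise the needs the agent currently lacks most (needs in [0, 1])."""
    return sum((1.0 - needs[n]) * c.need_effects.get(n, 0.0) for n in NEEDS)


def select_action(candidates, needs, svo_angle_deg, need_weight=1.0):
    """Evaluate every candidate and select the highest-scoring one."""
    return max(
        candidates,
        key=lambda c: svo_utility(c, svo_angle_deg) + need_weight * need_score(c, needs),
    )


helping = Candidate("help a classmate", 0.2, 0.9, {"relatedness": 0.5})
ignoring = Candidate("ignore the request", 0.6, 0.0)
lonely = {"autonomy": 0.8, "competence": 0.7, "relatedness": 0.2}
print(select_action([helping, ignoring], lonely, svo_angle_deg=45.0).description)
```

With these illustrative numbers, a prosocial agent whose need for relatedness is unmet chooses to help, while an individualistic agent whose relatedness is already satisfied ignores the request. The point of the design is that the same candidate set can yield different behavior purely through the agent's value state.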

Computable scenario construction

EduMirror turns an abstract educational phenomenon into a measurable simulation package before running agents. The process keeps each scenario tied to theory, roles, metrics, and intervention logic.

Select grounding theory

Choose theories that explain the target phenomenon, such as social comparison, family stress, or social anxiety.

Identify constructs

Break the theory into measurable concepts, such as self-esteem, belonging, pressure, anxiety, or competition.

Profile agents

Map constructs into roles, traits, goals, formative memories, and initial psychological or social-value states.

Configure metrics

Operationalize outcomes with behavior rubrics for overt actions and with survey probes, adapted from validated psychological scales, for internal states.

Run comparisons

Generate matched timelines, apply interventions, and compare explicit behavior with latent psychological change.
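The "Configure metrics" step can be made concrete with a Likert-style survey probe of the kind an LLM Surveyor might administer to an agent. This is a sketch under stated assumptions: the four belonging items are hypothetical and written for illustration (not taken from a published scale), the reply string stands in for an LLM answer, and `build_prompt`, `parse_ratings`, and `score` are illustrative names. Reverse-keyed items are flipped with the standard rule r → min + max − r before averaging.

```python
import re

# Hypothetical belonging probe on a 1-5 Likert scale; True marks a reverse-keyed item.
ITEMS = [
    ("I feel accepted by my classmates.", False),
    ("I feel like an outsider at school.", True),
    ("People here notice when I am good at something.", False),
    ("I wish I were in a different class.", True),
]
SCALE_MIN, SCALE_MAX = 1, 5


def build_prompt(items) -> str:
    """Render the probe as text to send to the agent being surveyed."""
    lines = [f"Rate each statement from {SCALE_MIN} (strongly disagree) to "
             f"{SCALE_MAX} (strongly agree). Answer one 'item: rating' per line."]
    lines += [f"{i + 1}. {text}" for i, (text, _) in enumerate(items)]
    return "\n".join(lines)


def parse_ratings(reply: str, n_items: int) -> list:
    """Extract 'item: rating' pairs; raise if any item is missing or out of range."""
    found = {int(k): int(r)
             for k, r in re.findall(r"^\s*(\d+)\s*:\s*(\d+)", reply, flags=re.MULTILINE)}
    ratings = []
    for k in range(1, n_items + 1):
        r = found.get(k)
        if r is None or not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"missing or invalid rating for item {k}")
        ratings.append(r)
    return ratings


def score(ratings, items) -> float:
    """Mean score after flipping reverse-keyed items."""
    adjusted = [SCALE_MIN + SCALE_MAX - r if reverse else r
                for r, (_, reverse) in zip(ratings, items)]
    return sum(adjusted) / len(adjusted)


reply = "1: 2\n2: 4\n3: 3\n4: 5"  # stand-in for an LLM Surveyor response
print(score(parse_ratings(reply, len(ITEMS)), ITEMS))
```

Rejecting incomplete or out-of-range replies, rather than guessing, keeps malformed LLM output from silently shifting the latent-state measurement.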

Scenario Library

A theory-grounded scenario library.


Peer & Group Dynamics 7 scenarios · 35.0%

Individual Social Cognition 5 scenarios · 25.0%

Classroom Culture 3 scenarios · 15.0%

Home-School Dynamics 5 scenarios · 25.0%
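The category counts add up to the library's 20 pre-designed scenarios, and each percentage is simply count / 20. A short check using only the counts listed above:

```python
# Scenario counts per category, copied from the library breakdown.
LIBRARY = {
    "Peer & Group Dynamics": 7,
    "Individual Social Cognition": 5,
    "Classroom Culture": 3,
    "Home-School Dynamics": 5,
}

total = sum(LIBRARY.values())
shares = {name: count / total for name, count in LIBRARY.items()}
for name, share in shares.items():
    print(f"{name}: {LIBRARY[name]} scenarios, {share:.1%}")
```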

Scenario Design

Each scenario is packaged as a computable research unit.

EduMirror contains 20 pre-designed educational scenarios. In the paper, each scenario is specified by its social phenomenon, participating roles, number of agents, grounding theory, and measurement protocol. This lets the same simulation pipeline support both descriptive observation and controlled intervention experiments.
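The five fields that specify each scenario map naturally onto a small record type, and reusing one record for both a baseline run and intervention branches is what lets a single pipeline serve descriptive and experimental studies. The sketch below is hypothetical: `ScenarioSpec`, `plan_runs`, the consistency checks, and the example field values are illustrative assumptions, not the paper's actual specification format or its specification for this scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSpec:
    """One computable research unit built from the five fields listed per scenario."""
    phenomenon: str
    roles: tuple[str, ...]
    num_agents: int
    grounding_theory: tuple[str, ...]
    measurement_protocol: dict[str, str]  # construct -> instrument (hypothetical format)

    def __post_init__(self):
        # Basic consistency checks (assumed, not taken from the paper).
        if self.num_agents < len(self.roles):
            raise ValueError("each role needs at least one agent")
        if not self.measurement_protocol:
            raise ValueError("a scenario must measure at least one construct")


def plan_runs(spec: ScenarioSpec, interventions=()):
    """Reuse one spec for a descriptive baseline plus one branch per intervention."""
    return [("baseline", spec)] + [(name, spec) for name in interventions]


# Illustrative values only.
strain = ScenarioSpec(
    phenomenon="family financial strain",
    roles=("student", "teacher", "parent", "peer"),
    num_agents=5,
    grounding_theory=("family stress",),
    measurement_protocol={"withdrawal": "LLM Rater rubric", "belonging": "LLM Surveyor probe"},
)
runs = plan_runs(strain, ["teacher-student talk", "parent call", "class meeting"])
print([name for name, _ in runs])
```

Freezing the record guards against one branch accidentally editing the specification that the other branches share.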


Experiments

Evidence from realism, scale, and intervention tests.

Scenario-wide win-rate comparison

Heatmap of pairwise win rates between the compared methods

The heatmap summarizes pairwise post-hoc evaluations across seventeen educational scenarios: six representative settings, eight bullying simulations, and three social-interaction simulations. Each cell reports the column model's win rate against the row model, offering a compact system-level view of relative realism, contextual appropriateness, and human-likeness.
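Each heatmap cell is a win rate aggregated from pairwise judgments. The sketch below shows one way to compute a cell, assuming each judgment records the two systems and the preferred one. Counting a tie as half a win for each side is an assumption (the page does not say how ties are handled), and the system names and judgments are placeholders. Under this definition, mirrored cells sum to 1 and the diagonal is 0.50.

```python
def win_rate(judgments, col_model, row_model):
    """Column model's win rate against the row model from pairwise judgments.

    judgments: iterable of (model_a, model_b, winner), winner being model_a, model_b, or "tie".
    Ties count as half a win for each side (assumed convention).
    """
    if col_model == row_model:
        return 0.5  # diagonal convention: a model ties with itself
    wins = ties = total = 0
    for a, b, winner in judgments:
        if {a, b} != {col_model, row_model}:
            continue  # skip judgments between other pairs
        total += 1
        if winner == col_model:
            wins += 1
        elif winner == "tie":
            ties += 1
    if total == 0:
        raise ValueError("no judgments for this pair")
    return (wins + 0.5 * ties) / total


# Hypothetical judgments between placeholder systems.
judgments = [
    ("model_x", "model_y", "model_x"),
    ("model_y", "model_x", "model_x"),
    ("model_x", "model_y", "model_y"),
    ("model_x", "model_y", "tie"),
    ("model_x", "model_z", "model_z"),
]
print(win_rate(judgments, "model_x", "model_y"), win_rate(judgments, "model_y", "model_x"))
```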

Scalability under larger groups

Agents	EduMirror	LLMob	BabyAGI	D2A	ReAct
5	4.80	4.25	4.10	3.35	2.35
15	4.18	3.60	3.57	3.53	2.93
30	4.03	3.83	3.86	3.12	2.41

The kindergarten scalability study increases the group size from 5 to 15 and then 30 agents and reports the average of four rubric dimensions: naturalness, coherence, plausibility, and developmental typicality. EduMirror scores highest at every group size, although its average falls from 4.80 to 4.03 and its lead over the runner-up narrows to 0.17 at 30 agents. This suggests that the simulator largely preserves age-appropriate classroom dynamics even as simultaneous child-agent interactions become denser.
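The scalability table can be checked programmatically. The scores below are copied from the table. `rubric_average` is a hypothetical helper that shows how a per-method score could be formed from the four rubric dimensions, assuming an unweighted mean; the per-dimension scores themselves are not reported here.

```python
# Average rubric scores from the scalability table (keys: number of agents).
METHODS = ("EduMirror", "LLMob", "BabyAGI", "D2A", "ReAct")
SCORES = {
    5: (4.80, 4.25, 4.10, 3.35, 2.35),
    15: (4.18, 3.60, 3.57, 3.53, 2.93),
    30: (4.03, 3.83, 3.86, 3.12, 2.41),
}
DIMENSIONS = ("naturalness", "coherence", "plausibility", "developmental_typicality")


def rubric_average(dim_scores: dict) -> float:
    """Unweighted mean over the four rubric dimensions (equal weighting is an assumption)."""
    return sum(dim_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


for n, row in SCORES.items():
    best = max(range(len(METHODS)), key=lambda i: row[i])
    runner_up = sorted(row, reverse=True)[1]
    print(f"{n:>2} agents: best={METHODS[best]} ({row[best]:.2f}), "
          f"margin over runner-up={row[best] - runner_up:.2f}")
```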

Psychological need trajectories in the dormitory bullying scenario under different initial states

Neglectful intervention boxplots across psychological needs

Authoritative-punitive intervention boxplots across psychological needs

Counterfactual Outcomes

Intervention branches in the family financial strain scenario.

**(a) Baseline: no intervention.** Without support, Alex withdraws, and the misunderstanding escalates into conflict and isolation.

**(b) Teacher-student talk.** Mr. Davis reframes Alex's situation and values, helping peers respond with support.

**(c) Parent call.** Emotional reassurance gives Alex the confidence to be honest and to plan local activities with friends.

**(d) Class meeting.** Class norms shift through collective discussion, and an expensive trip becomes a cheaper group option.


EduMirror turns one family financial strain scenario into parallel intervention timelines, making alternative causal pathways inspectable as qualitative narratives.
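Parallel intervention timelines amount to forking one shared simulation state, applying a different intervention to each copy, and rolling every copy forward under the same randomness so that the intervention is the only difference between branches. The sketch below is a toy under stated assumptions: `step` is a hypothetical placeholder for the LLM-driven agents and Game Master, and the "belonging" and "support" variables and effect sizes are invented for illustration. With LLM agents, a fixed seed may not make generations fully reproducible, so matched branches are better treated as paired samples than as exact replays.

```python
import copy
import random


def step(state: dict, rng: random.Random) -> None:
    """Placeholder dynamics (hypothetical): belonging drifts down unless support offsets it."""
    state["belonging"] += state["support"] * 0.1 - 0.05 + rng.uniform(-0.02, 0.02)
    state["belonging"] = min(1.0, max(0.0, state["belonging"]))


def branch(shared_state: dict, intervention, seed: int, steps: int) -> list:
    """Fork a timeline from a shared snapshot, optionally intervene, then roll forward."""
    state = copy.deepcopy(shared_state)  # each branch gets its own copy of the state
    rng = random.Random(seed)            # same seed in every branch
    if intervention is not None:
        intervention(state)
    trajectory = []
    for _ in range(steps):
        step(state, rng)
        trajectory.append(state["belonging"])
    return trajectory


def teacher_talk(state: dict) -> None:
    state["support"] += 0.5  # hypothetical effect size


snapshot = {"belonging": 0.5, "support": 0.0}
timelines = {
    "baseline": branch(snapshot, None, seed=7, steps=10),
    "teacher_talk": branch(snapshot, teacher_talk, seed=7, steps=10),
}
for name, values in timelines.items():
    print(name, [round(v, 2) for v in values])
```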

Citation

BibTeX

@inproceedings{edumirror2026,
  title     = {EduMirror: Modeling Educational Social Dynamics with Value-driven Multi-agent Simulation},
  author    = {Lin, Jingzhe and Yu, Hengbin and Zeng, Yongdan and Zhong, Fangwei},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  address   = {Seoul, South Korea},
  publisher = {PMLR}
}
