Research preview · April 26, 2026 · Closed-loop evolution

Escher-Loop

Intelligence is the ability to improve task agents and the optimizers that improve them.

Escher-Loop turns optimization ability from a fixed human prior into an evolving population. Task feedback remains the only grounded signal: every task-agent score is reused to evaluate optimizer agents almost for free.

Ziyang Liu, Xinyan Guo, Xuchen Wei, Han Hao, Liu Yang
Shenzhen X-Institute, Soochow University, HIT Shenzhen, Tsinghua University, NUS
Contact: newzil1225@gmail.com
Core claim
Optimization ability is not optimized directly. It emerges from the loop.

Task agents receive absolute scores from execution. Optimizer agents receive relative scores by comparing the task agents they produce. The extra step is self-referential optimizer evolution; the measurement signal is already paid for by task evaluation.

Define intelligence as recursive improvement of solutions and optimizers.
Measure optimizer quality through the task agents it generates.
Reuse task scores as optimizer win-loss signals without extra evaluator calls.
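The reuse step can be sketched as a standard Elo update driven purely by task scores. The K-factor and the pairwise win/loss convention here are illustrative assumptions, not the paper's exact accounting:

```python
# Hypothetical sketch: update two optimizers' Elo ratings from the task
# scores of the task agents they just produced. No extra evaluator call
# is needed -- the task scores are already paid for by execution.

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, score_b: float,
               k: float = 32.0) -> tuple[float, float]:
    """Treat 'my task agent scored higher' as a win for optimizer A."""
    outcome = 1.0 if score_a > score_b else 0.0 if score_a < score_b else 0.5
    e_a = expected(r_a, r_b)
    return (r_a + k * (outcome - e_a),
            r_b + k * ((1.0 - outcome) - (1.0 - e_a)))

# Optimizer A's task agent scored 0.82, optimizer B's scored 0.75.
r1, r2 = elo_update(1500.0, 1500.0, score_a=0.82, score_b=0.75)
```

The update is zero-sum, so the optimizer population's total rating stays fixed while relative rank tracks who produces stronger task agents.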
Motivation
The central object is the ability to keep improving.

Following the paper, Escher-Loop treats intelligence as recursive improvement: an intelligent system improves task agents while also improving the optimizer that generates future task agents.

The key move is operational rather than philosophical. If an optimizer repeatedly generates task agents with stronger empirical scores, then its optimization capability has improved. The optimizer is no longer an invisible workflow around the agent; it becomes an agent population with scores, competition, and evolution.

Task feedback is the bridge: one execution improves the task population and evaluates the optimizer population.
Definition
Optimization ability is defined by the improvements it causes.

Escher-Loop keeps two scored populations: task agents with absolute task scores, and optimizer agents with relative scores induced by the task agents they generate.

Two scored populations

Task agents and optimizer agents share the same evolutionary accounting.

The task population stores executable candidates and task scores; the optimizer population stores optimizer programs or prompts and relative Elo scores.

\[ \mathcal{T}=\{(t_j,s_j^t)\}_{j\in J},\qquad \mathcal{O}=\{(o_i,s_i^o)\}_{i\in I} \]
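A minimal data layout for the two populations might look like the sketch below; the field names are illustrative and not taken from any released code:

```python
# Hypothetical sketch of the two scored populations T and O.
from dataclasses import dataclass, field

@dataclass
class TaskAgent:
    program: str          # executable candidate (code or prompt), t_j
    score: float          # absolute task score s_j^t from execution

@dataclass
class OptimizerAgent:
    program: str          # editable optimizer program or prompt, o_i
    elo: float = 1500.0   # relative score s_i^o induced by its task agents

@dataclass
class Populations:
    tasks: list[TaskAgent] = field(default_factory=list)
    optimizers: list[OptimizerAgent] = field(default_factory=list)

pops = Populations()
pops.tasks.append(TaskAgent(program="candidate_v0", score=0.5))
pops.optimizers.append(OptimizerAgent(program="optimizer_v0"))
```

The key asymmetry is in the score fields: task agents carry grounded absolute scores, while optimizer agents carry only a relative rating.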
Operational test

An optimizer improves when its generated task agent improves.

Given scored task agents, an optimizer proposes a new task agent. The task function supplies the grounded score.

\[ t_{\mathrm{new}}=o((t_1,s_1^t),\ldots,(t_n,s_n^t)), \qquad s_{\mathrm{new}}^t=f(t_{\mathrm{new}}) \]
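In code, the operational test is one function application followed by one execution. `run_optimizer` and `f` below are toy stand-ins for the LLM-driven optimizer and the real task function:

```python
# Hypothetical sketch of the operational test: an optimizer maps scored
# task agents to a new candidate, and the task function f supplies the
# grounded score. Both helpers are toy stand-ins, not the real system.

def run_optimizer(optimizer: str, scored: list[tuple[str, float]]) -> str:
    # Toy stand-in: "mutate" the best-scoring candidate by extending it.
    best, _ = max(scored, key=lambda pair: pair[1])
    return best + "+"

def f(candidate: str) -> float:
    # Toy grounded task score: longer candidates score higher.
    return float(len(candidate))

scored = [("abc", 3.0), ("abcd", 4.0)]
t_new = run_optimizer("opt-v1", scored)
s_new = f(t_new)   # exceeds every input score: the optimizer improved
```

If `s_new` repeatedly exceeds the input scores across many such calls, that is the operational evidence that this optimizer's capability has improved.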
Self-referential step

The optimizer becomes a target of optimization.

Because optimizer agents are represented as editable programs or prompts, a strong optimizer can rewrite the optimizer population itself. This extra self-referential step is what turns optimization logic into an evolving object.

\[ o_{\mathrm{new}}=o^{\ast}((o_1,s_1^o),\ldots,(o_m,s_m^o)) \]
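Because optimizers are plain editable strings, the same call shape applies one level up. The sketch below is a toy illustration of o* rewriting the optimizer population; `rewrite` stands in for an LLM edit:

```python
# Hypothetical sketch of the self-referential step: the strongest
# optimizer o* is applied to the scored optimizer population itself,
# producing a new optimizer. `rewrite` is a toy stand-in for an LLM edit.

def rewrite(o_star: str, scored_optimizers: list[tuple[str, float]]) -> str:
    # Toy edit: tag the highest-Elo optimizer prompt with its editor.
    best, _ = max(scored_optimizers, key=lambda pair: pair[1])
    return best + " | edited-by:" + o_star

optimizers = [("opt-A", 1520.0), ("opt-B", 1480.0)]
o_star = max(optimizers, key=lambda pair: pair[1])[0]
o_new = rewrite(o_star, optimizers)   # a new optimizer joins the population
```

The output is just another string in the optimizer population, so it is scored and evolved by exactly the same machinery as its parents.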
Mechanism
One execution creates two feedback signals.

The loop samples both populations, generates task agents, reuses task scores for optimizer Elo, then lets optimizers rewrite the optimizer population.

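Put together, one iteration of the loop might be sketched as follows. Sampling, scoring, and rewrite rules here are illustrative assumptions, with toy stand-ins for every LLM-driven component:

```python
import random

# Hypothetical sketch of one Escher-Loop iteration: a single round of
# task execution both grows the task population and scores the
# optimizer population, then the best optimizer rewrites the optimizers.

def f(candidate: str) -> float:
    return float(len(candidate))          # toy grounded task score

def run_optimizer(opt: dict, tasks: list) -> str:
    best, _ = max(tasks, key=lambda p: p[1])
    return best + opt["tag"]              # toy "mutation" of the best candidate

def elo_update(r_a, r_b, s_a, s_b, k=32.0):
    e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    out = 1.0 if s_a > s_b else 0.0 if s_a < s_b else 0.5
    return r_a + k * (out - e_a), r_b - k * (out - e_a)

def loop_step(tasks: list, optimizers: list) -> None:
    o_a, o_b = random.sample(optimizers, 2)
    # One execution per proposal: the only grounded signal in the system.
    t_a, t_b = run_optimizer(o_a, tasks), run_optimizer(o_b, tasks)
    s_a, s_b = f(t_a), f(t_b)
    tasks += [(t_a, s_a), (t_b, s_b)]
    # Reuse the same task scores as the optimizer win/loss signal.
    o_a["elo"], o_b["elo"] = elo_update(o_a["elo"], o_b["elo"], s_a, s_b)
    # Self-referential step: the current best optimizer spawns a rewrite.
    best = max(optimizers, key=lambda o: o["elo"])
    optimizers.append({"tag": best["tag"] + "*", "elo": 1500.0})

tasks = [("seed", 4.0)]
optimizers = [{"tag": "+", "elo": 1500.0}, {"tag": "++", "elo": 1500.0}]
loop_step(tasks, optimizers)
```

Note that steps 3 and 4 consume no additional task executions: the two calls to `f` in step 2 fund both the task-population update and the optimizer Elo update.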
Figure: the Escher-Loop mechanism, highlighting in turn the two scored populations, task-agent optimization and execution, dynamic benchmarking with relative optimizer scoring, and self-referential optimizer evolution.
Two populations: task agents and optimizer agents are both explicit evolving populations.
Evidence
Matched-compute runs across three optimization landscapes.

Under a 10M equivalent-token budget per task, Escher-Loop pushes past static baselines, and the full dynamic population outperforms any single isolated best optimizer.

Kissing Number, D=11

Raw score: 582 points. Reference: 593 points.

Circle Packing, N=26

Raw score: 2.6352223118. Reference: 2.6350.

Heilbronn Triangle, N=11

Raw score: 0.0365253447. Reference: 0.0365298899.

All trajectories come from the same experimental batch; the plot is not a post-hoc selection of the best-looking runs. Read these results as matched-budget empirical evidence, not as a claim about the absolute performance ceiling of either Escher-Loop or the static baseline.

Comparison of Escher-Loop and baseline trajectories.
Matched-compute trajectories: full Escher-Loop compared with the handcrafted OpenEvolve-style baseline under the same equivalent-token budget.
Comparison of a single best optimizer and the full loop.
Dynamic evolution advantage: dynamic evolution continually updates the optimizer population during search, outperforming a frozen single optimizer even when that optimizer transfers useful behavior.
What evolves
Optimizer agents learn search policy, not just better wording.

Evolved optimizers add diagnostic feedback, adaptive search control, stage-aware prompts, and reference-program mining.

Diagnostic feedback: regression, plateau, and low-diversity failures become explicit optimization signals.
Adaptive search radius: the optimizer can decide when to refine locally and when to mutate the architecture.
Reference-program mining: evolution history becomes reusable material for future candidates.
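A diagnostic-feedback signal of the kind described above can be sketched as a simple check over the recent score trajectory. The window size and threshold are illustrative; evolved optimizers fold richer signals into their prompts:

```python
# Toy sketch of diagnostic feedback: classify the recent score
# trajectory so the optimizer can react to regressions and plateaus.
# The window and epsilon are illustrative assumptions.

def diagnose(scores: list[float], window: int = 5, eps: float = 1e-6) -> str:
    """Return 'regression', 'plateau', or 'progress' for recent scores."""
    recent = scores[-window:]
    if len(recent) >= 2 and recent[-1] < recent[-2] - eps:
        return "regression"   # the last step made things worse
    if len(recent) == window and max(recent) - min(recent) < eps:
        return "plateau"      # no meaningful movement across the window
    return "progress"
```

A "plateau" verdict might, for example, trigger a wider search radius or an architecture mutation, while "regression" argues for reverting to a reference program mined from the evolution history.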
Representative evolved optimizer code snippets.
Evolved optimizer policy: representative evolved optimizer code segments show how the search policy changes.
Authors
The team behind Escher-Loop.

Escher-Loop is maintained by the paper authors. For questions about the website, code release, or paper links, contact the authors below.
