Research Models & Releases · arXiv cs.CL · May 16

Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

A Gemma-3-27b-based system won the LLM track at CRAC 2026 by combining multilingual adapter tuning with iterative document annotation, reaching 74.32 CoNLL F1 across diverse languages and document structures. The two-stage fine-tuning approach, which pairs a shared multilingual base adapter with task-specific refinements, points to a practical pattern for scaling coreference resolution across languages. The result matters because coreference remains a bottleneck for downstream NLP tasks, and the adapter-based strategy offers a replicable blueprint for practitioners who must balance model scale against multilingual robustness without full retraining.
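For readers less familiar with the headline metric: by the standard CoNLL-2012 convention, CoNLL F1 is the unweighted mean of the MUC, B³, and CEAF-e F1 scores, usually reported ×100. Below is a minimal pure-Python sketch of the three metrics, assuming each clustering is a list of sets of mention IDs. The function names are hypothetical, the CEAF-e alignment is brute force (only feasible for a handful of clusters), and the summary does not say which scorer variant CRAC 2026 used (for example, head matching or singleton handling). For real evaluation, use the official shared-task scorer.

```python
from itertools import permutations

# A cluster is a frozenset of mention IDs; a document's clustering is a list of clusters.
Clustering = list[frozenset]


def f1(precision: float, recall: float) -> float:
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def muc(key: Clustering, response: Clustering) -> float:
    """Link-based MUC F1."""
    def score(k: Clustering, r: Clustering) -> float:
        num = den = 0
        for cluster in k:
            # Partition the cluster by the other clustering; unmatched mentions stand alone.
            parts = set()
            for m in cluster:
                owner = next((i for i, c in enumerate(r) if m in c), None)
                parts.add(owner if owner is not None else ("alone", m))
            num += len(cluster) - len(parts)
            den += len(cluster) - 1
        return num / den if den else 0.0
    # Recall partitions key clusters by the response; precision swaps the roles.
    return f1(score(response, key), score(key, response))


def b_cubed(key: Clustering, response: Clustering) -> float:
    """Mention-based B-cubed F1."""
    def score(k: Clustering, r: Clustering) -> float:
        total, count = 0.0, 0
        for cluster in k:
            for m in cluster:
                other = next((c for c in r if m in c), frozenset())
                total += len(cluster & other) / len(cluster)
                count += 1
        return total / count if count else 0.0
    return f1(score(response, key), score(key, response))


def ceaf_e(key: Clustering, response: Clustering) -> float:
    """Entity-based CEAF-e F1 with a brute-force optimal one-to-one alignment."""
    def phi4(a: frozenset, b: frozenset) -> float:
        return 2 * len(a & b) / (len(a) + len(b))
    small, large = (key, response) if len(key) <= len(response) else (response, key)
    best = max(
        (sum(phi4(a, b) for a, b in zip(small, perm))
         for perm in permutations(large, len(small))),
        default=0.0,
    )
    recall = best / len(key) if key else 0.0
    precision = best / len(response) if response else 0.0
    return f1(precision, recall)


def conll_f1(key: Clustering, response: Clustering) -> float:
    """Unweighted mean of MUC, B-cubed and CEAF-e F1 (0-1 scale; papers report x100)."""
    return (muc(key, response) + b_cubed(key, response) + ceaf_e(key, response)) / 3
```

Each metric penalizes a different kind of error: MUC counts missing or spurious links, B³ scores every mention, and CEAF-e rewards a clean one-to-one match between gold and predicted entities. Averaging them keeps a system from gaming any single view.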

Modelwire context

Explainer

The paper's core contribution is not the 74.32 CoNLL F1 score itself but the demonstration that a shared multilingual adapter can be decoupled from task-specific refinements without catastrophic forgetting across languages. This separation of concerns is what makes the approach portable: new languages or domains can be targeted without retraining the base model.
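The separation described above can be pictured with LoRA-style low-rank adapters. This is a minimal sketch that assumes both stages add a low-rank update to a frozen base weight; the summary does not say which adapter type the authors used. Stage one trains a shared multilingual adapter, and stage two freezes it and trains a task-specific one. `LowRankAdapter`, `effective_weight`, and `start_stage_two` are hypothetical names, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Returns a new matrix; neither input is modified.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


@dataclass
class LowRankAdapter:
    """Hypothetical LoRA-style adapter: delta W = scale * (B @ A), B is d x r, A is r x k."""
    B: Matrix
    A: Matrix
    scale: float = 1.0
    trainable: bool = True

    def delta(self) -> Matrix:
        return [[self.scale * x for x in row] for row in matmul(self.B, self.A)]


def effective_weight(base: Matrix, adapters: list[LowRankAdapter]) -> Matrix:
    # The frozen base weight is left untouched; each adapter adds its own low-rank update.
    w = base
    for adapter in adapters:
        w = add(w, adapter.delta())
    return w


def start_stage_two(shared: LowRankAdapter, task: LowRankAdapter) -> list[LowRankAdapter]:
    # Stage 1 (not shown) trains `shared` on multilingual data.
    # Stage 2 freezes it so only the task-specific adapter receives updates.
    shared.trainable = False
    task.trainable = True
    return [shared, task]
```

Swapping in a different task adapter changes only the last term of the sum. The base weights and the shared adapter are reused unchanged, which is the portability argument in miniature.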

This work fits the broader shift toward parameter-efficient adaptation seen across recent papers. The SkillTTA work from mid-May showed how retrieval and skill synthesis can replace fine-tuning for agent task specialization. Here, adapters replace full-model retraining for multilingual robustness. Both avoid the deployment cost of updating and storing full sets of model weights. The difference is that SkillTTA operates at inference time with no parameter changes, while the CRAC system still fine-tunes but confines the updates to adapter layers. The practical implication is similar, though: practitioners get customization without maintaining multiple full model copies.
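The storage argument is easy to quantify. This back-of-the-envelope sketch assumes LoRA-style adapters of rank r and counts only the weight matrices being adapted. The paper's actual adapter placement, rank, and layer shapes are not given in the summary, so every input here is hypothetical. Each extra full copy costs d·k parameters per d×k matrix, while each extra adapter costs r·(d + k).

```python
Shape = tuple[int, int]  # (d, k) of one adapted weight matrix


def full_copies_params(shapes: list[Shape], n_variants: int) -> int:
    """Parameters stored if every variant keeps its own full copy of the adapted matrices."""
    return n_variants * sum(d * k for d, k in shapes)


def adapter_params(shapes: list[Shape], n_variants: int, rank: int,
                   shared_adapters: int = 1) -> int:
    """Parameters stored with one frozen base, shared adapter(s), and one rank-r adapter per variant."""
    base = sum(d * k for d, k in shapes)
    per_adapter = sum(rank * (d + k) for d, k in shapes)
    return base + (shared_adapters + n_variants) * per_adapter
```

For a single hypothetical 4096×4096 matrix at rank 16, each additional adapter costs 16 × (4096 + 4096) = 131,072 parameters, compared with 16,777,216 for another full copy of that matrix.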

If the two-stage adapter strategy keeps its F1 gains on a held-out language family (e.g., Dravidian languages, if the CRAC 2026 test set was Indo-European-heavy), that would be evidence the approach generalizes. If performance drops sharply on unseen languages, the multilingual adapter may be overfitted to the training language distribution rather than learning a truly language-agnostic representation.
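The held-out test proposed above reduces to a simple comparison. This minimal sketch assumes per-document CoNLL F1 scores tagged with a language family are available. The helpers, the family labels, and the 2-point tolerance are hypothetical choices, not part of the paper or the CRAC 2026 setup. Averaging per-document scores also differs from corpus-level CoNLL F1, which pools counts across documents, and a serious analysis would add a significance test.

```python
from collections import defaultdict
from statistics import mean


def family_means(scores: list[tuple[str, float]]) -> dict[str, float]:
    """Average per-document F1 within each language family."""
    by_family: dict[str, list[float]] = defaultdict(list)
    for family, f1 in scores:
        by_family[family].append(f1)
    return {family: mean(vals) for family, vals in by_family.items()}


def generalization_gap(seen: list[tuple[str, float]],
                       held_out: list[tuple[str, float]]) -> float:
    # Macro-average over families so one large family cannot dominate.
    # A positive gap means held-out families score lower than training families.
    return mean(family_means(seen).values()) - mean(family_means(held_out).values())


def generalizes(seen: list[tuple[str, float]], held_out: list[tuple[str, float]],
                tolerance: float = 2.0) -> bool:
    # Tolerance is in CoNLL F1 points and is an arbitrary illustrative threshold.
    return generalization_gap(seen, held_out) <= tolerance
```

Macro-averaging over families keeps an Indo-European-heavy training mix from masking a weak result on a single unseen family.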

Coverage we drew on

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Gemma-3-27b · CRAC 2026 · CoNLL F1

Read the full story at arXiv cs.CL (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.