About me

Xiangxiang Chu (初祥祥)

Senior Director & Head of AMAP-ML, Alibaba Group

I lead AMAP-ML at Alibaba AMAP, a 100+ member product-facing AI team building foundation systems for spatial intelligence and generative intelligence. My research traces an arc from efficient neural architecture design to multimodal foundation models, LLM reasoning, world models, agent systems, and large-scale AI products serving hundreds of millions of users. The thread that connects all of it: making AI systems more efficient, more capable, and more broadly useful.

110+ Publications
14,000+ Citations
100+ Team Members
10,000+ GitHub Stars

LLM Reasoning · GPG
A simple and strong reinforcement learning baseline for model reasoning — no critic, no reference model, no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
ICLR 2026 · First Author
World Model · DreamX-World
A general-purpose interactive world model that creates diverse, high-fidelity virtual environments with camera-controlled navigation and prompt-driven world events.
Generation · USP
Unified self-supervised pretraining that bridges image generation and understanding within a single framework.
ICCV 2025 · First Author
Agent · SkillClaw
Agentic skill evolution from real interaction traces, turning reusable skills into collective libraries across sessions, devices, and agents.
1,300+ Stars
Detection · YOLOv6
Industrial-grade real-time object detection framework. Widely deployed in production across the industry, with a full training-to-deployment toolchain.
5,700+ Stars
Vision-Language · MobileVLM
The first vision-language assistant that runs in real time on mobile devices (Snapdragon 888). The 1B/3B models are benchmarked against Gemini Nano.
First Author
Architecture · Twins
Revisiting spatial attention in Vision Transformers. Outperforms Swin Transformer with a simpler design and better deployment properties.
NeurIPS 2021 · First Author · Most Influential Paper
Foundation · VisionLLaMA
A unified LLaMA-style backbone for vision tasks. Pioneered auto-scaling 2D RoPE for multimodal Transformers — the approach was later adopted by Qwen-VL and others. Surpasses ViT across generation, classification, segmentation, and detection.
ECCV 2024 · First Author

Research Journey

2024 – Present · Alibaba AMAP
Spatial Intelligence, Generative Intelligence, Reasoning Agents & World Models
Leading a 100+ member product-facing AI team across two AMAP product anchors: spatial intelligence and generative intelligence. The technical stack spans LLM reasoning (GPG, adopted by ByteDance's VERL; Tree-GRPO; CoEvolve), world models (DreamX-World, Code2World), AI agents (SkillClaw, 1,300+ Stars), generative AI (DCW, S2-Guidance, FluxText), multimodal understanding, and intelligent mobility (MobilityBench, GenMRP). Research ships directly to production: multimodal technology powers the Saojie Bang (扫街榜) pipeline, and a large-scale industrial agent system drives AMAP's AI Companion (AI 伴行), both serving hundreds of millions of users. Published 45+ papers at top venues; open-sourced 30+ AMAP-ML projects with 10,000+ cumulative GitHub stars.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS'21), CPVT (ICLR'23), VisionLLaMA (ECCV'24); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6 (5,700+ Stars); shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works — FairNAS (ICCV'21), FairDARTS (ECCV'20), DARTS- (ICLR'21), FALSR — establishing new standards for fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the National Science and Technology Progress First Prize (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.

Recognition

  • Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
  • National Science and Technology Progress First Prize, 2018 — contributed 20 invention patents
  • 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
  • Area Chair: ICLR, NeurIPS  |  Senior Program Committee: AAAI, IJCAI
  • 40+ domestic and 7 international invention patents

Core Technical Directions

Spatial Intelligence — Route-planning agents (MobilityBench), map-augmented geolocalization (Thinking-with-Map), autonomous-driving VLA reasoning (AutoDrive-R2), urban scene understanding, and industrial mobility systems

Generative Intelligence — Scene-text editing (FluxText), diffusion-model optimization (DCW, S2-Guidance), video virtual try-on (Eevee), 3D editing (RL3DEdit), and controllable visual effects (Omni-Effects)

Reasoning Agents — Reinforcement learning for LLM reasoning (GPG, MathForge), tree-search agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), and collective skill evolution (SkillClaw)

World Models & Interactive AI — Interactive world simulation (DreamX-World), GUI world models (Code2World), and benchmarks for dynamic 4D response capabilities (Omni-WorldBench)

Multimodal Understanding — Vision-language reasoning, visual policy optimization, spatial intelligence evaluation (SpatialGenEval), and multimodal in-context learning (STV)

Foundation Architectures — Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), and diffusion LLMs (AR-MAP)


Team & Opportunities

I lead the AMAP-ML team at Alibaba Group, a 100+ member product-facing AI team with over half recruited from top AI labs globally, including multiple Google PhD Fellowship recipients.

Our philosophy: We build systems where research quality, engineering discipline, open-source reproducibility, and product deployment reinforce each other. Every core paper ships with reproducible code, and our work directly powers products serving hundreds of millions of users.

Open Source

We maintain 30+ projects on GitHub spanning spatial intelligence, generative intelligence, reasoning agents, world models, and multimodal AI, with 10,000+ cumulative stars.

Hiring

We are always looking for talented interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI. Drop me an email if you are interested.


Education

  • M.S. in Electrical Engineering, Tsinghua University, 2012
  • B.S. in Electrical Engineering, Southeast University, 2010