Xiangxiang Chu (初祥祥)

Senior Director & Head of AMAP-ML, Alibaba Group

I build foundation AI systems that move from original research to reproducible open source and large-scale map, mobility, and interactive AI products. I lead AMAP-ML at Alibaba AMAP, a 100+ member product-facing AI team working on spatial intelligence, generative intelligence, reasoning agents, and world models for AMAP products serving 300M+ users every day.

14,000+ Total Citations
6,000+ As First Author
120+ Publications
300M+ Daily Users

Updated June 2026 · Citation metrics from Google Scholar.


Recent Updates

2026.05.18 MobilityBench accepted to KDD 2026, benchmarking route-planning agents in real-world mobility scenarios.
2026.05.12 CoEvolve and Thinking-with-Map accepted to ACL 2026, covering agent-data co-evolution and map-augmented reasoning.
2026.05.11 DreamX-World released the 5B-Cam model and inference code for interactive world simulation.
2026.05.01 AMAP-ML had four papers accepted to ICML 2026, covering unified multimodal generation, data-efficient RL, long-video generation, and preference optimization.



Current Focus

LLM Reasoning
GPG: a simple, strong reinforcement learning baseline for model reasoning that needs no critic, no reference model, and no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
ICLR 2026 · First Author
World Model
DreamX-World: a general-purpose interactive world model that creates diverse, high-fidelity virtual environments with camera-controlled navigation and prompt-driven world events.
AMAP-ML · 2026
Spatial AI
MobilityBench: a scalable benchmark for evaluating route-planning agents in real-world mobility scenarios, connecting agent research with AMAP's spatial intelligence anchor.
KDD 2026 · AMAP-ML
Agent
SkillClaw: agentic skill evolution from real interaction traces, turning reusable skills into collective libraries shared across sessions, devices, and agents.

Earlier Impact

Detection
YOLOv6: an industrial-grade real-time object detection framework with a full training-to-deployment toolchain, broad open-source adoption, and follow-up deployment work on RepVGG-style quantization.
Vision-Language
MobileVLM: a compact vision-language assistant designed for real-time on-device deployment, with 1B/3B models evaluated on mobile hardware such as the Snapdragon 888.
First Author
Architecture
Twins: revisits the design of spatial attention in Vision Transformers, outperforming Swin Transformer with a simpler design and better deployment properties.
NeurIPS 2021 · First Author · Most Influential Paper
Foundation
VisionLLaMA: a unified LLaMA-style backbone for vision tasks that introduces auto-scaling 2D RoPE for multimodal Transformers and reports strong results across generation, classification, segmentation, and detection.
ECCV 2024 · First Author

Research Journey

2024 – Present · Alibaba AMAP
Spatial Intelligence, Generative Intelligence, Reasoning Agents & World Models
Leading a 100+ member product-facing AI team across two AMAP product anchors: spatial intelligence and generative intelligence. The technical stack spans LLM reasoning (GPG, adopted by ByteDance's VERL; Tree-GRPO; CoEvolve), world models (DreamX-World, Code2World), AI agents (SkillClaw), generative AI (DCW, S2-Guidance, FluxText), multimodal understanding, and intelligent mobility (MobilityBench, GenMRP). Several research lines feed into production systems for AMAP products serving 300M+ users every day: multimodal technology supports the Saojie Bang (扫街榜) pipeline, and large-scale industrial agent work contributes to AMAP's AI Companion (AI 伴行). Published 45+ papers at top venues and open-sourced 30+ AMAP-ML projects.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS 2021), CPVT (ICLR 2023), VisionLLaMA (ECCV 2024); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6; shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works — FairNAS (ICCV 2021), FairDARTS (ECCV 2020), DARTS- (ICLR 2021), FALSR — establishing new standards for fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the First Prize of the National Science and Technology Progress Award (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.

Recognition

  • Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
  • 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
  • Area Chair: ICLR, NeurIPS  |  Senior Program Committee: AAAI, IJCAI
  • 40+ domestic and 7 international invention patents

Core Technical Directions

Spatial Intelligence — Route-planning agents (MobilityBench), map-augmented geolocalization (Thinking-with-Map), autonomous-driving VLA reasoning (AutoDrive-R2), urban scene understanding, and industrial mobility systems

Generative Intelligence — Scene-text editing (FluxText), diffusion-model optimization (DCW, S2-Guidance), video virtual try-on (Eevee), 3D editing (RL3DEdit), and controllable visual effects (Omni-Effects)

Reasoning Agents — Reinforcement learning for LLM reasoning (GPG, MathForge), tree-search agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), and collective skill evolution (SkillClaw)

World Models & Interactive AI — Interactive world simulation (DreamX-World), GUI world models (Code2World), and benchmarks for dynamic 4D response capabilities (Omni-WorldBench)

Multimodal Understanding — Vision-language reasoning, visual policy optimization, spatial intelligence evaluation (SpatialGenEval), multimodal in-context learning (STV)

Foundation Architectures — Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), and diffusion LLMs (AR-MAP)


Team & Opportunities

I lead the AMAP-ML team at Alibaba Group, a 100+ member product-facing AI team with strong research and engineering backgrounds across foundation models, agents, multimodal learning, spatial intelligence, and generative AI.

Our philosophy: We build systems where research quality, engineering discipline, open-source reproducibility, and product deployment reinforce each other. Many core projects ship with reproducible code, and our work contributes to AMAP products serving 300M+ users every day.

Open Source

We maintain 30+ projects on GitHub spanning spatial intelligence, generative intelligence, reasoning agents, world models, and multimodal AI.

Hiring

We are always looking for talented interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI. Drop me an email if interested.


Education

  • M.S. in Electrical Engineering, Tsinghua University, 2012
  • B.S. in Electrical Engineering, Southeast University, 2010