Xiangxiang Chu (初祥祥)
Senior Director & Head of AMAP-ML, Alibaba Group
I lead AMAP-ML at Alibaba AMAP, a 100+ member product-facing AI team building foundation systems for spatial intelligence and generative intelligence. My research traces an arc from efficient neural architecture design to multimodal foundation models, LLM reasoning, world models, agent systems, and large-scale AI products serving hundreds of millions of users. The thread that connects all of it: making AI systems more efficient, more capable, and more broadly useful.
110+ Publications
14,000+ Citations
100+ Team Members
10,000+ GitHub Stars
Featured Projects
LLM Reasoning: A simple and strong reinforcement learning baseline for model reasoning, with no critic, no reference model, and no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
World Model: A general-purpose interactive world model that creates diverse, high-fidelity virtual environments with camera-controlled navigation and prompt-driven world events.
Generation: Unified self-supervised pretraining that bridges image generation and understanding within a single framework.
Agent: Agentic skill evolution from real interaction traces, turning reusable skills into collective libraries across sessions, devices, and agents.
Detection: Industrial-grade real-time object detection framework with a full training-to-deployment toolchain, widely deployed in production across the industry.
Vision-Language: The first vision-language assistant that runs in real time on mobile devices (Snapdragon 888). The 1B/3B models are benchmarked against Gemini Nano.
Architecture: Revisits spatial attention in Vision Transformers. Outperforms Swin Transformer with a simpler design and better deployment properties.
Foundation: A unified LLaMA-style backbone for vision tasks. Pioneered auto-scaling 2D RoPE for multimodal Transformers, an approach later adopted by Qwen-VL and others. Surpasses ViT across generation, classification, segmentation, and detection.
Research Journey
2024 – Present · Alibaba AMAP
Spatial Intelligence, Generative Intelligence, Reasoning Agents & World Models
Leading a 100+ member product-facing AI team across two AMAP product anchors: spatial intelligence and generative intelligence. The technical stack spans LLM reasoning (GPG, adopted by ByteDance's VERL; Tree-GRPO; CoEvolve), world models (DreamX-World, Code2World), AI agents (SkillClaw, 1,300+ stars), generative AI (DCW, S2-Guidance, FluxText), multimodal understanding, and intelligent mobility (MobilityBench, GenMRP). Research ships directly to production: multimodal technology powers the Saojie Bang (扫街榜) pipeline, and a large-scale industrial agent system drives AMAP's AI Companion (AI 伴行), both serving hundreds of millions of users. Published 45+ papers at top venues; open-sourced 30+ AMAP-ML projects with 10,000+ cumulative GitHub stars.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS'21), CPVT (ICLR'23), VisionLLaMA (ECCV'24); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6 (5,700+ Stars); shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works — FairNAS (ICCV'21), FairDARTS (ECCV'20), DARTS- (ICLR'21), FALSR — establishing new standards for fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the National Science and Technology Progress First Prize (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.
Recognition
- Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
- National Science and Technology Progress First Prize, 2018 — contributed 20 invention patents
- 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
- Area Chair: ICLR, NeurIPS | Senior Program Committee: AAAI, IJCAI
- 40+ domestic and 7 international invention patents
Core Technical Directions
Spatial Intelligence: Route-planning agents (MobilityBench), map-augmented geolocalization (Thinking-with-Map), autonomous-driving VLA reasoning (AutoDrive-R2), urban scene understanding, and industrial mobility systems
Generative Intelligence: Scene-text editing (FluxText), diffusion-model optimization (DCW, S2-Guidance), video virtual try-on (Eevee), 3D editing (RL3DEdit), and controllable visual effects (Omni-Effects)
Reasoning Agents: Reinforcement learning for LLM reasoning (GPG, MathForge), tree-search agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), and collective skill evolution (SkillClaw)
World Models & Interactive AI: Interactive world simulation (DreamX-World), GUI world models (Code2World), and benchmarks for dynamic 4D response capabilities (Omni-WorldBench)
Multimodal Understanding: Vision-language reasoning, visual policy optimization, spatial intelligence evaluation (SpatialGenEval), and multimodal in-context learning (STV)
Foundation Architectures: Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), and diffusion LLMs (AR-MAP)
Team & Opportunities
I lead the AMAP-ML team at Alibaba Group, a 100+ member product-facing AI team with over half recruited from top AI labs globally, including multiple Google PhD Fellowship recipients.
Our philosophy: We build systems where research quality, engineering discipline, open-source reproducibility, and product deployment reinforce each other. Every core paper ships with reproducible code, and our work directly powers products serving hundreds of millions of users.
Open Source
We maintain 30+ projects on GitHub spanning spatial intelligence, generative intelligence, reasoning agents, world models, and multimodal AI, with 10,000+ cumulative stars.
Hiring
We are always looking for talented interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI. Drop me an email if interested.
Education
- M.S. in Electrical Engineering, Tsinghua University, 2012
- B.S. in Electrical Engineering, Southeast University, 2010