About me

Xiangxiang Chu (初祥祥)

Senior Director & Head of AMAP-ML, Alibaba Group

My research traces an arc from efficient neural architecture design to multimodal large models and generative AI. Starting with neural architecture search at Xiaomi, I moved to Vision Transformer design (Twins, CPVT) and multimodal foundation models (VisionLLaMA, MobileVLM) at Meituan, and now lead a 100+ member team at Alibaba AMAP building LLM reasoning, generative models, and intelligent mobility systems. The thread that connects all of it: making AI systems more efficient, more intelligent, and more broadly useful.

110+ Publications
14,000+ Citations
100+ Team Members
10,000+ GitHub Stars

LLM Reasoning
GPG: a simple and strong reinforcement learning baseline for model reasoning — no critic, no reference model, no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
ICLR 2026 · First Author
Detection
YOLOv6: an industrial-grade real-time object detection framework, widely deployed in production across the industry with a full training-to-deployment toolchain.
5,700+ GitHub Stars
Vision-Language
MobileVLM: the first vision-language assistant that runs in real time on mobile devices (Snapdragon 888). The 1B/3B models are benchmarked against Gemini Nano.
First Author
Architecture
Twins: revisiting spatial attention in Vision Transformers. Outperforms Swin Transformer with a simpler design and better deployment properties.
NeurIPS 2021 · First Author · Most Influential Paper
Architecture
VisionLLaMA: a unified LLaMA-style backbone for vision tasks that surpasses ViT across image generation, classification, segmentation, and detection.
ECCV 2024 · First Author
Generation
USP: unified self-supervised pretraining that bridges image generation and understanding within a single framework.
ICCV 2025 · First Author
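The "no critic, no reference model, no KL penalty" design in the LLM Reasoning card above boils down to a plain policy gradient with a group-level baseline. The sketch below is not the published algorithm; it is an illustrative toy, assuming each prompt is answered by several sampled responses with scalar rewards, showing how group-relative advantages replace a learned critic.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each sampled response relative to the mean reward of its
    group (all samples for the same prompt). The group mean acts as the
    baseline, so no learned critic or reference model is required."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards - rewards.mean()

def policy_gradient_loss(logprobs, rewards):
    """REINFORCE-style loss -E[A * log pi(response)], with the
    group-mean baseline and no KL penalty term."""
    adv = group_relative_advantages(rewards)
    return float(-(adv * np.asarray(logprobs, dtype=float)).mean())

# Toy usage: four sampled responses to one prompt, binary rewards.
loss = policy_gradient_loss(
    logprobs=[-1.2, -0.8, -2.0, -1.5],
    rewards=[1.0, 0.0, 1.0, 0.0],
)
```

Because the baseline is just the in-group mean, the advantages always sum to zero within a group, which keeps the gradient centered without any extra value network.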

Research Journey

2024 – Present · Alibaba AMAP
LLM Reasoning, Generative AI & Intelligent Mobility
Leading a 100+ member team building LLM reasoning systems (GPG — adopted by ByteDance's VERL framework, Tree-GRPO, CoEvolve), image/video generation (DCW, S-Guidance, Eevee), foundation architectures (FASA, EPG), and AI-powered navigation (MobilityBench, GenMRP). 30+ top-venue papers in the first year.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS'21), CPVT (ICLR'23), VisionLLaMA (ECCV'24); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6 (5,700+ Stars); shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works — FairNAS (ICCV'21), FairDARTS (ECCV'20), DARTS- (ICLR'21), FALSR — establishing new standards for fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the National Science and Technology Progress First Prize (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.

Recognition

  • Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
  • National Science and Technology Progress First Prize, 2018 — contributed 20 invention patents
  • 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
  • Area Chair: ICLR, NeurIPS  |  Senior Program Committee: AAAI, IJCAI
  • 40+ domestic and 7 international invention patents

Current Research Directions

LLM Reasoning & Agents — Reinforcement learning for LLM reasoning (GPG, MathForge), tree search for agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), autonomous driving VLA (AutoDrive-R)

Image & Video Generation — Diffusion model optimization (DCW, S-Guidance), video virtual try-on (Eevee), long video narrative (NarrLV), motion generation benchmarks (VMBench)

Foundation Architectures — Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), diffusion LLMs (AR-MAP)

Intelligent Mobility — Route-planning agent benchmarks (MobilityBench), generative multi-route navigation (GenMRP), map-augmented geolocalization reasoning, integrated search-recommendation


Team & Opportunities

I lead the AMAP-ML team at Alibaba Group, a 100+ member research team with over half recruited from top AI labs globally, including multiple Google PhD Fellowship recipients.

Our philosophy: We believe in the tight coupling of academic research and industrial impact. Every core paper ships with reproducible open-source code, and our research directly powers products serving hundreds of millions of users.

Open Source

We maintain 20+ projects on GitHub spanning LLM reasoning, generative models, and intelligent mobility, with 10,000+ cumulative stars.

Hiring

We are always looking for talented interns and full-time researchers in LLM reasoning, multimodal models, generative AI, and intelligent mobility. Drop me an email if interested.


Education

  • M.S. in Electrical Engineering, Tsinghua University, 2012
  • B.S. in Electrical Engineering, Southeast University, 2010