About me

Short Bio

He received a Bachelor of Science in Electrical Engineering from Southeast University in 2010 and a Master of Science in Electrical Engineering from Tsinghua University in 2012. He was named one of the top 100 AI scholars by AMiner in 2023.

  1. Research and Innovation: Published 40+ papers at AI conferences, 20 of them first-authored; 3 first-authored papers (FairNAS, Twins, CPVT) were selected for PaperDigest’s most influential papers list; holds over 40 domestic and 7 international invention patents.

  2. Research Directions: LLM pre-training, multimodal large models, reinforcement learning and generative models, foundation model design, NAS, model compression, self-supervised learning, 2D perception, 3D detection, etc., with publications or production deployments in each of these directions.

  3. Achievements: Extensive deployment experience across AI middle platforms, autonomous driving, and edge cloud; excels at creating value for the company and its customers through technology, especially at overcoming hard technical problems; skilled at coordinating resources for win-win collaboration.

  4. Team Building and Leadership: Experience managing a technical team of more than 30 people; excels at building influential technical teams, with over 50% of members coming from well-known domestic and international AI labs; good at attracting top industry talent; open-sourced a detection framework widely known in the industry.

  5. Influence: Multiple invited industry talks, presentations at top conferences, coverage in the technology media, and repeated public praise from Lei Jun and Xiaomi.

We are always looking for talented interns/full-time researchers with strong coding skills and research experience. Please drop me an email if interested.

Work experience

  • May 2020 - Present, Meituan, Visual Intelligence Department, Senior Technical Manager
    • Team direction: Multimodal large models, large language models, perception (2D+3D), foundation models, self-supervised pre-training, model compression, MLOps.
    • Large foundation models (LLM and multi-modal):
      • Responsible for building a general-purpose multimodal understanding large model with open-set understanding, and for deploying it in Meituan business scenarios.
      • Compression and efficient deployment of the company’s trillion-parameter LLM, reaching an industry-leading level.
      • Training and deployment of the LLM base model for general intelligent robot (embodied intelligence) scenarios.
      • Reproduced Meta’s LLaMA 7B base model and SFT from scratch, and independently developed the 1B and 3B MobileVLM models, benchmarked against Gemini Nano; MobileVLM V2 outperforms many popular larger models.
      • Foundation models: VisionLLaMA unifies the vision and language architectures and outperforms ViT on both understanding and generation tasks; Twins outperforms Swin and offers greater deployment advantages in production environments.
    • Core technology breakthroughs in autonomous driving and drones:
      • Built the offline 3D system for autonomous delivery vehicles, saving the business millions in annotation costs annually.
      • Improved online perception algorithms (3D obstacle detection and tracking), helping the autonomous vehicle team achieve industry-leading real-time perception, and improved the efficiency of building high-precision maps.
      • Shipped several core perception modules on the third- and fourth-generation drones.
    • Broad AI business support and MLOps construction:
      • Built the vision code base and AutoML tools, along with model compression and deployment (device, edge, cloud), covering most vision applications such as face recognition, OCR, content security review, and image understanding, and saving millions in vision service costs annually.
    • Team building:
      • Built the team from 0 to 1, interviewing multiple Google PhD Fellowship winners and attracting top talent recognized in the industry.
    • Influence building:
      • Open-sourced and maintain YOLOv6, a widely known detection library.
      • As a core project member, undertook Project 3.3, “Research on Key Technologies of Artificial Intelligence Foundation Models,” under the National Science and Technology Innovation 2030 Major Program.
  • March 2017 - May 2020, Xiaomi, Artificial Intelligence Department, Senior Technical Manager
    • Team direction: AutoML, base model design, machine learning.
    • Initiated the AutoML project and built the AutoML team from 0 to 1; responsible for end-to-end technical research, algorithm development, and deployment. The work landed in multiple businesses such as scene recognition, segmentation, acoustic scene classification, and recommendation.
    • Led the team to second place (“Automated Neural Network Design”) in Xiaomi’s first “Million Dollar Prize” competition.
  • June 2013 - March 2017, Beijing KingStar System Control Co., Ltd., Deputy Director
    • As a technical backbone, participated in the “Complex Power Grid Autonomous-Collaborative Automatic Voltage Control: Key Technologies, System Development, and Engineering Application” project, which won the 2018 National Science and Technology Progress First Prize (contributing 20 invention patents).
  • July 2012 - May 2013, IBM China Research Laboratory (CRL), Research Scientist

Publications

First-author papers

  1. Twins: Revisiting the design of spatial attention in vision transformers, NeurIPS21
  2. Conditional Positional Encodings for Vision Transformers, ICLR23
  3. VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  4. MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
  5. MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  6. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search, ICCV21
  7. Fair DARTS: Eliminating unfair advantages in differentiable architecture search, ECCV20
  8. DARTS-: robustly stepping out of performance collapse without indicators, ICLR21
  9. ROME: Robustifying memory-efficient NAS via topology disentanglement and gradient accumulation, ICCV23
  10. Make RepVGG Greater Again: A Quantization-aware Approach, AAAI24
  11. MixPath: A unified approach for one-shot neural architecture search, ICCV23
  12. Noisy differentiable architecture search, BMVC21
  13. A Unified Mixture-View Framework for Unsupervised Representation Learning, BMVC22
  14. Multi-objective reinforced evolution in mobile neural architecture search, ECCVW20
  15. Fast, accurate and lightweight super-resolution with neural architecture search, ICPR20
  16. MoGA: Searching beyond MobileNetV3, ICASSP20
  17. Scarlet-NAS: bridging the gap between stability and scalability in weight-sharing neural architecture search, ICCVW21
  18. Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning
  19. Improved crowding distance for NSGA-II
  20. Policy optimization with penalized point probability distance: An alternative to proximal policy optimization

Other Collaborative Papers

  1. PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution, CVPR24
  2. LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection, ICLR24
  3. Norm Tweaking: High-performance Low-bit Quantization of Large Language Models, AAAI24
  4. YOLOv6: A single-stage object detection framework for industrial applications (5.3k GitHub stars)
  5. A Speed Odyssey for Deployable Quantization of LLMs
  6. FPTQ: Fine-grained Post-Training Quantization for Large Language Models
  7. Lenna: Language Enhanced Reasoning Detection Assistant
  8. SCTNet: Single Branch CNN with Transformer Semantic Information for Real-time Segmentation, AAAI24
  9. PromptDet: Towards open-vocabulary detection using uncurated images, ECCV22
  10. SegViT: Semantic segmentation with plain vision transformers, NeurIPS22
  11. Fully convolutional one-stage 3D object detection on LiDAR range images, NeurIPS22
  12. Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation, CVPR22
  13. AeDet: Azimuth-invariant multi-view 3D object detection, CVPR23
  14. EAPruning: Evolutionary Pruning for Vision Transformers and CNNs, BMVC22
  15. AutoKWS: Keyword Spotting with Differentiable Architecture Search, ICASSP21
  16. Neural Architecture Search on Acoustic Scene Classification, InterSpeech20
  17. Accurate and efficient single image super-resolution with matrix channel attention network, ACCV20
  18. STRETCH meat grinder with ICCOS, IEEE Transactions on Plasma Science
  19. Comparisons of three inductive pulse power supplies, IEEE Transactions on Plasma Science
  20. FastPillars: A Deployment-friendly Pillar-based 3D Detector
  21. LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge

Selected Related Reports

  1. Comprehensively surpassing ViT: Meituan, Zhejiang University, et al. propose VisionLLaMA, a unified architecture for vision tasks
  2. Real-time on-device inference, with 3B rivaling 7B! Meituan, Zhejiang University, et al. propose MobileVLM V2, a faster and stronger on-device vision-language model
  3. Running in real time on a Snapdragon 888: Meituan, Zhejiang University, et al. build MobileVLM, an end-to-end mobile multimodal large model
  4. Meituan proposes a Transformer based on implicit conditional positional encodings, outperforming ViT and DeiT
  5. Twins: Rethinking the design of efficient vision attention models
  6. A more accurate and faster YOLOv6 arrives, developed and open-sourced by Meituan
  7. Research highlights from the Xiaomi AI Lab
  8. Strongly endorsed by Lei Jun: Xiaomi builds a top super-resolution algorithm, now open-sourced
  9. Surpassing MnasNet and Proxyless: Xiaomi proposes FairNAS, a new neural architecture search algorithm
  10. Three results in two months, benchmarking against Google! An exclusive conversation with Xiaomi’s AutoML team on making model search fairer