Publications

You can also find my articles on my Google Scholar profile.


First-Author Papers

  1. Twins: Revisiting the design of spatial attention in vision transformers, NeurIPS21
  2. Conditional Positional Encodings for Vision Transformers, ICLR23
  3. VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks, ECCV24
  4. GPG: A simple and strong reinforcement learning baseline for model reasoning, ICLR26
  5. MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
  6. MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  7. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search, ICCV21
  8. Fair DARTS: Eliminating unfair advantages in differentiable architecture search, ECCV20
  9. DARTS-: Robustly stepping out of performance collapse without indicators, ICLR21
  10. ROME: Robustifying memory-efficient NAS via topology disentanglement and gradients accumulation, ICCV23
  11. Make RepVGG Greater Again: A Quantization-aware Approach, AAAI24
  12. MixPATH: A unified approach for one-shot neural architecture search, ICCV23
  13. USP: Unified self-supervised pretraining for image generation and understanding, ICCV25
  14. Noisy differentiable architecture search, BMVC21
  15. A Unified Mixture-View Framework for Unsupervised Representation Learning, BMVC22
  16. Multi-objective reinforced evolution in mobile neural architecture search, ECCVW20
  17. Fast, accurate and lightweight super-resolution with neural architecture search, ICPR20
  18. MoGA: Searching beyond MobileNetV3, ICASSP20
  19. Scarlet-NAS: Bridging the gap between stability and scalability in weight-sharing NAS, ICCVW21
  20. Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning
  21. Improved crowding distance for NSGA-II
  22. Policy optimization with penalized point probability distance: An alternative to PPO

Collaborative Papers

Image Generation & Editing

  1. Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers, CVPR26
  2. Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
  3. From Scale to Speed: Adaptive Test-Time Scaling for Image Editing, CVPR26
  4. Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
  5. From editor to dense geometry estimator, CVPR26
  6. RAGSR: Regional attention guided diffusion for image super-resolution
  7. S-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models, ICLR26
  8. LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling, ICCV25
  9. FLUX-Text: A simple and advanced diffusion transformer baseline for scene text editing
  10. Preference Alignment for Diffusion Model via Explicit Denoised Distribution Estimation
  11. FlowDreamer: exploring high fidelity text-to-3D generation via rectified flow
  12. TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution, ICASSP26
  13. Accurate and efficient single image super-resolution with matrix channel attention network, ACCV20

Video Generation & Understanding

  1. Video-CoE: Reinforcing Video Event Prediction via Chain of Events, CVPR26
  2. Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
  3. Eevee: Towards Close-up High-resolution Video-based Virtual Try-on, CVPR26 Findings
  4. ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints, AAAI26
  5. Video-STAR: Reinforcing open-vocabulary action recognition with tools, ICLR26
  6. Omni-Effects: Unified and spatially-controllable visual effects generation, AAAI26
  7. NarrLV: Towards a comprehensive narrative-centric evaluation for long video generation models, ICLR26
  8. VMBench: A Benchmark for Perception-Aligned Video Motion Generation, ICCV25
  9. FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos, ACM MM25
  10. Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V, ICASSP26
  11. Artifact-Aware Evaluation for High-Quality Video Generation, ICASSP26
  12. Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation, CVPR22

LLM Reasoning & Agents

  1. Code2World: A GUI World Model via Renderable Code Generation
  2. Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
  3. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation, ICLR26
  4. AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting, AAAI26
  5. Tree search for LLM agent reinforcement learning, ICLR26
  6. AutoDrive-R: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving, ICLR26
  7. Position bias mitigates position bias: Mitigate position bias through inter-position knowledge distillation, EMNLP25 oral
  8. HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation, EMNLP25 oral
  9. Ranking-aware Reinforcement Learning for Ordinal Ranking, ICASSP26

Multimodal & Vision-Language

  1. What if Agents Could Imagine? Reinforcing Open-Vocabulary HOI Comprehension through Generation
  2. Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment
  3. Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models, ICLR26
  4. Urban Socio-Semantic Segmentation with Vision-Language Reasoning, ICLR26
  5. Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning, AAAI26
  6. UniVG-R1: Reasoning guided universal visual grounding with reinforcement learning
  7. Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
  8. MMGenBench: Evaluating the limits of LMMs from the text-to-image generation perspective
  9. Lenna: Language Enhanced Reasoning Detection Assistant, ICASSP25

Detection, Segmentation & 3D Perception

  1. UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement, ICCV25
  2. PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus, CVPR25
  3. SCTNet: Single Branch CNN with Transformer Semantic Information for Real-time Segmentation, AAAI24
  4. FastPillars: A Deployment-friendly Pillar-based 3D Detector, IEEE TCSVT
  5. YOLOv6 v3.0: A full-scale reloading
  6. AeDet: Azimuth-invariant multi-view 3D object detection, CVPR23
  7. SegViT: Semantic segmentation with plain vision transformers, NeurIPS22
  8. YOLOv6: A single-stage object detection framework for industrial applications, arXiv
  9. Fully convolutional one-stage 3D object detection on LiDAR range images, NeurIPS22
  10. PromptDet: Towards open-vocabulary detection using uncurated images, ECCV22
  11. CCTrans: Simplifying and improving crowd counting with transformer

Foundation Model Architectures

  1. FASA: Frequency-Aware Sparse Attention, ICLR26
  2. AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?
  3. Semantic Context Matters: Improving Conditioning for Autoregressive Models
  4. There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training, ICLR26
  5. Scalar: Scale-wise controllable visual autoregressive learning, AAAI26
  6. Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition, ECCV24
  7. Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness, ICML24
  8. PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution, CVPR24
  9. EfficientRep: An efficient RepVGG-style ConvNets with hardware-aware neural network design

Model Compression & AutoML

  1. LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection, ICLR24
  2. Masked Autoencoders Are Robust Neural Architecture Search Learners
  3. A Speed Odyssey for Deployable Quantization of LLMs
  4. Norm Tweaking: High-performance Low-bit Quantization of Large Language Models, AAAI24
  5. FPTQ: Fine-grained Post-Training Quantization for Large Language Models
  6. EAPruning: Evolutionary Pruning for Vision Transformers and CNNs, BMVC22
  7. DAAS: Differentiable architecture and augmentation policy search
  8. AutoKWS: Keyword Spotting with Differentiable Architecture Search, ICASSP21
  9. Neural Architecture Search on Acoustic Scene Classification, InterSpeech20

Maps, Mobility & Recommendation

  1. MobilityBench: A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
  2. IntRR: A Framework for Integrating SID Redistribution and Length Reduction for Generative Recommendation
  3. IntTravel: A Real-World Dataset and Generative Framework for Integrated Multi-Task Travel Recommendation
  4. GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation
  5. SCASRec: A Self-Correcting and Auto-Stopping Model for Generative Route List Recommendation
  6. Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
  7. IntSR: An integrated generative framework for search and recommendation
  8. Comprehensive Comparison Network: a framework for locality-aware, routes-comparable and interpretable route recommendation
  9. Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion
  10. DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking, WWW25