Publications

You can also find my articles on my Google Scholar profile.


First-Author Papers

  1. Twins: Revisiting the design of spatial attention in vision transformers, NeurIPS21
  2. Conditional Positional Encodings for Vision Transformers, ICLR23
  3. VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks, ECCV24
  4. GPG: A simple and strong reinforcement learning baseline for model reasoning, ICLR26
  5. MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
  6. MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  7. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search, ICCV21
  8. Fair DARTS: Eliminating unfair advantages in differentiable architecture search, ECCV20
  9. DARTS-: Robustly stepping out of performance collapse without indicators, ICLR21
  10. ROME: Robustifying memory-efficient NAS via topology disentanglement and gradients accumulation, ICCV23
  11. Make RepVGG Greater Again: A Quantization-aware Approach, AAAI24
  12. MixPATH: A unified approach for one-shot neural architecture search, ICCV23
  13. USP: Unified self-supervised pretraining for image generation and understanding, ICCV25
  14. Noisy differentiable architecture search, BMVC21
  15. A Unified Mixture-View Framework for Unsupervised Representation Learning, BMVC22
  16. Multi-objective reinforced evolution in mobile neural architecture search, ECCVW20
  17. Fast, accurate and lightweight super-resolution with neural architecture search, ICPR20
  18. MoGA: Searching beyond MobileNetV3, ICASSP20
  19. Scarlet-NAS: Bridging the gap between stability and scalability in weight-sharing NAS, ICCVW21
  20. Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning
  21. Improved crowding distance for NSGA-II
  22. Policy optimization with penalized point probability distance: An alternative to PPO

Collaborative Papers

  1. Code2World: A GUI World Model via Renderable Code Generation
  2. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation, ICLR26
  3. Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models, ICLR26
  4. There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training, ICLR26
  5. Video-STAR: Reinforcing open-vocabulary action recognition with tools, ICLR26
  6. Tree search for LLM agent reinforcement learning, ICLR26
  7. AutoDrive-R: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving, ICLR26
  8. S-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models, ICLR26
  9. NarrLV: Towards a comprehensive narrative-centric evaluation for long video generation models, ICLR26
  10. Ranking-aware Reinforcement Learning for Ordinal Ranking, ICASSP26
  11. Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V, ICASSP26
  12. Artifact-Aware Evaluation for High-Quality Video Generation, ICASSP26
  13. TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution, ICASSP26
  14. Urban Socio-Semantic Segmentation with Vision-Language Reasoning
  15. Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
  16. Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
  17. Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
  18. Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
  19. Semantic Context Matters: Improving Conditioning for Autoregressive Models
  20. Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning, AAAI26
  21. SCALAR: Scale-wise controllable visual autoregressive learning, AAAI26
  22. Omni-Effects: Unified and spatially-controllable visual effects generation, AAAI26
  23. AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting, AAAI26
  24. ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints, AAAI26
  25. IntSR: An integrated generative framework for search and recommendation
  26. From editor to dense geometry estimator
  27. RAGSR: Regional attention guided diffusion for image super-resolution
  28. Comprehensive Comparison Network: a framework for locality-aware, routes-comparable and interpretable route recommendation
  29. UniVG-R1: Reasoning guided universal visual grounding with reinforcement learning
  30. Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion
  31. FLUX-Text: A simple and advanced diffusion transformer baseline for scene text editing
  32. Position bias mitigates position bias: Mitigate position bias through inter-position knowledge distillation, EMNLP25 oral
  33. HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation, EMNLP25 oral
  34. UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement, ICCV25
  35. VMBench: A Benchmark for Perception-Aligned Video Motion Generation, ICCV25
  36. LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling, ICCV25
  37. FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos, ACM MM25
  38. Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition, ECCV24
  39. Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness, ICML24
  40. PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution, CVPR24
  41. LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection, ICLR24
  42. Norm Tweaking: High-performance Low-bit Quantization of Large Language Models, AAAI24
  43. YOLOv6: A single-stage object detection framework for industrial applications, arXiv
  44. A Speed Odyssey for Deployable Quantization of LLMs
  45. FPTQ: Fine-grained Post-Training Quantization for Large Language Models
  46. Lenna: Language Enhanced Reasoning Detection Assistant, ICASSP25
  47. SCTNet: Single Branch CNN with Transformer Semantic Information for Real-time Segmentation, AAAI24
  48. PromptDet: Towards open-vocabulary detection using uncurated images, ECCV22
  49. SegViT: Semantic segmentation with plain vision transformers, NeurIPS22
  50. Fully convolutional one-stage 3D object detection on LiDAR range images, NeurIPS22
  51. Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation, CVPR22
  52. AeDet: Azimuth-invariant multi-view 3D object detection, CVPR23
  53. EAPruning: Evolutionary Pruning for Vision Transformers and CNNs, BMVC22
  54. AutoKWS: Keyword Spotting with Differentiable Architecture Search, ICASSP21
  55. Neural Architecture Search on Acoustic Scene Classification, InterSpeech20
  56. Accurate and efficient single image super-resolution with matrix channel attention network, ACCV20
  57. STRETCH meat grinder with ICCOS, IEEE Transactions on Plasma Science
  58. Comparisons of three inductive pulse power supplies, IEEE Transactions on Plasma Science
  59. FastPillars: A Deployment-friendly Pillar-based 3D Detector, IEEE TCSVT
  60. Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
  61. Preference Alignment for Diffusion Model via Explicit Denoised Distribution Estimation
  62. MMGenBench: Evaluating the limits of LMMs from the text-to-image generation perspective
  63. FlowDreamer: exploring high fidelity text-to-3D generation via rectified flow
  64. PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus, CVPR25
  65. AdaFedFR: Federated face recognition with adaptive inter-class representation learning
  66. DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking, WWW25
  67. Masked Autoencoders Are Robust Neural Architecture Search Learners
  68. EfficientRep: An efficient RepVGG-style convnets with hardware-aware neural network design
  69. YOLOv6 v3.0: A full-scale reloading
  70. DAAS: Differentiable architecture and augmentation policy search
  71. CCTrans: Simplifying and improving crowd counting with transformer