Ji Shiyu

CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis

arXiv preprint arXiv:2508.02322, 2025.

Xu, Yuzhuang and Han, Xu and Zhang, Yuanchi and Wang, Yixuan and Liu, Yijun and Ji, Shiyu and Zhu, Qingfu and Che, Wanxiang

Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction

arXiv preprint arXiv:2509.10798, 2025.

Liu, Yijun and Wang, Yixuan and Xu, Yuzhuang and Ji, Shiyu and Xu, Yang and Zhu, Qingfu and Che, Wanxiang

Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 34146–34162, 2025.

Wang, Yixuan and Ji, Shiyu and Liu, Yijun and Xu, Yuzhuang and Xu, Yang and Zhu, Qingfu and Che, Wanxiang

CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs

arXiv preprint arXiv:2412.09282, 2024.

Xu, Yuzhuang and Ji, Shiyu and Zhu, Qingfu and Che, Wanxiang
