TY - GEN
T1 - An Improved Scheme of Victim Replication in Tiled Chip Multiprocessors
AU - Wu, Qianqian
AU - Ji, Zhenzhou
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - A last-level cache (LLC) in the shared configuration increases effective cache capacity by disallowing replication, but it incurs long on-chip access latency when the requested data resides on a remote tile. The previously proposed victim replication (VR) scheme replicates victims evicted from L1 into the local LLC slice to reduce the on-chip access latency of subsequent L1 misses. However, VR overlooks how locality at every cache level and the L1 victim re-reference interval affect replication, so it creates many useless replicas and yields only limited performance improvement. In this paper, we propose a novel victim selective replication scheme based on L1 temporal locality, LLC reuse locality, and the L1 victim re-reference interval (VSR-TRV). We selectively replicate a victim that is detected as a reuse within a short L1 victim re-reference interval, or that is recognized as a first-time LLC access but receives only one L1 access, and we filter out the replication of a victim that is recognized as a first-time LLC access. Experimental results show that VSR-TRV improves performance by 4.67% on average and by up to 11.22% over the previously proposed VR scheme. In addition, our proposal incurs only 1.48% storage overhead relative to the baseline system.
KW - chip multiprocessors (CMPs)
KW - reuse locality
KW - shared last level caches (SLLC)
KW - temporal locality
KW - victim re-reference interval
KW - victim replication
UR - https://www.scopus.com/pages/publications/85073161586
U2 - 10.1109/ICCSD.2019.8842919
DO - 10.1109/ICCSD.2019.8842919
M3 - Conference contribution
AN - SCOPUS:85073161586
T3 - 2019 IEEE 3rd International Conference on Circuits, Systems and Devices, ICCSD 2019
SP - 16
EP - 20
BT - 2019 IEEE 3rd International Conference on Circuits, Systems and Devices, ICCSD 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Circuits, Systems and Devices, ICCSD 2019
Y2 - 23 August 2019 through 25 August 2019
ER -