Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets

Carazo, Pablo; Apollini, Ruben; Castro Rodríguez, Fernando; Chaver Martínez, Daniel Ángel; Piñuel Moreno, Luis; Tirado Fernández, José Francisco

doi:10.1142/S0218126612500570

dc.contributor.author	Carazo, Pablo
dc.contributor.author	Apollini, Ruben
dc.contributor.author	Castro Rodríguez, Fernando
dc.contributor.author	Chaver Martínez, Daniel Ángel
dc.contributor.author	Piñuel Moreno, Luis
dc.contributor.author	Tirado Fernández, José Francisco
dc.date.accessioned	2023-06-20T03:48:07Z
dc.date.available	2023-06-20T03:48:07Z
dc.date.issued	2012-11
dc.description	This work has been supported in part by the Spanish government through the research contracts CICYT-TIN 2008/508, TIN2012-32180, and Consolider Ingenio-2010 CSD2007-0050, and by the HIPEAC-3 European Network of Excellence.
dc.description.abstract	The first-level data cache in modern processors has become a major consumer of energy due to its increasing size and high access frequency. To reduce this energy consumption, we propose a straightforward filtering technique based on a highly accurate forwarding predictor. Specifically, a simple structure predicts whether a load instruction will obtain its data via forwarding from the load-store structure, thus avoiding the data cache access, or whether the data will be provided by the data cache. This mechanism reduces data cache energy consumption by an average of 21.5% with a negligible performance penalty of less than 0.1%. Furthermore, we also target static cache energy consumption by disabling a portion of the sets of the associative L2 cache. Overall, when both proposals are combined, the total L1 and L2 energy consumption is reduced by an average of 29.2% with a performance penalty of just 0.25%.
dc.description.department	Sección Deptal. de Arquitectura de Computadores y Automática (Físicas)
dc.description.faculty	Fac. de Ciencias Físicas
dc.description.refereed	TRUE
dc.description.sponsorship	Spanish government
dc.description.sponsorship	HIPEAC-3 European Network of Excellence
dc.description.status	pub
dc.eprint.id	https://eprints.ucm.es/id/eprint/28484
dc.identifier.doi	10.1142/S0218126612500570
dc.identifier.issn	0218-1266
dc.identifier.officialurl	http://dx.doi.org/10.1142/S0218126612500570
dc.identifier.relatedurl	http://oa.upm.es/22360/1/INVE_MEM_2012_152095.pdf
dc.identifier.uri	https://hdl.handle.net/20.500.14352/44467
dc.issue.number	7
dc.journal.title	Journal of Circuits, Systems and Computers
dc.language.iso	eng
dc.publisher	World Scientific Publishing Co Pte Ltd
dc.relation.projectID	CICYT-TIN 2008/508
dc.relation.projectID	TIN2012-32180
dc.relation.projectID	CSD2007-0050
dc.rights.accessRights	open access
dc.subject.cdu	004
dc.subject.keyword	Energy consumption
dc.subject.keyword	Filtering
dc.subject.keyword	Forwarding predictor
dc.subject.keyword	Cache hierarchy
dc.subject.ucm	Computer science
dc.subject.unesco	1203.17 Computer science
dc.title	Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets
dc.type	journal article
dc.volume.number	21
dspace.entity.type	Publication
relation.isAuthorOfPublication	9aac3e41-2993-45aa-b0e1-7bae1dacd982
relation.isAuthorOfPublication	6b8b1488-47cc-441e-921b-c1e8042d627c
relation.isAuthorOfPublication	2ce782af-0e05-45eb-b58a-d2efffec6785
relation.isAuthorOfPublication	1356616c-9e69-4852-8415-62fd0b8e7cfc
relation.isAuthorOfPublication.latestForDiscovery	9aac3e41-2993-45aa-b0e1-7bae1dacd982
