Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets
dc.contributor.author | Carazo, Pablo | |
dc.contributor.author | Apolloni, Ruben | |
dc.contributor.author | Castro Rodríguez, Fernando | |
dc.contributor.author | Chaver Martínez, Daniel Ángel | |
dc.contributor.author | Piñuel Moreno, Luis | |
dc.contributor.author | Tirado Fernández, José Francisco | |
dc.date.accessioned | 2023-06-20T03:48:07Z | |
dc.date.available | 2023-06-20T03:48:07Z | |
dc.date.issued | 2012-11 | |
dc.description | This work has been supported in part by the Spanish government through the research contract CICYT-TIN 2008/508, TIN2012-32180, Consolider Ingenio-2010 CSD2007-0050 and the HIPEAC-3 European Network of Excellence | |
dc.description.abstract | The first-level data cache in modern processors has become a major consumer of energy due to its increasing size and high access frequency. To reduce this energy consumption, we propose a straightforward filtering technique based on a highly accurate forwarding predictor. Specifically, a simple structure predicts whether a load instruction will obtain its data via forwarding from the load-store structure, thus avoiding the data cache access, or whether the data will be provided by the data cache. This mechanism reduces data cache energy consumption by an average of 21.5% with a negligible performance penalty of less than 0.1%. Furthermore, we also address static cache energy consumption by disabling a portion of the sets of the associative L2 cache. Overall, when both proposals are combined, the total L1 and L2 energy consumption is reduced by an average of 29.2% with a performance penalty of just 0.25%. | |
dc.description.department | Sección Deptal. de Arquitectura de Computadores y Automática (Físicas) | |
dc.description.faculty | Fac. de Ciencias Físicas | |
dc.description.refereed | TRUE | |
dc.description.sponsorship | Spanish government | |
dc.description.sponsorship | HIPEAC-3 European Network of Excellence | |
dc.description.status | pub | |
dc.eprint.id | https://eprints.ucm.es/id/eprint/28484 | |
dc.identifier.doi | 10.1142/S0218126612500570 | |
dc.identifier.issn | 0218-1266 | |
dc.identifier.officialurl | http://dx.doi.org/10.1142/S0218126612500570 | |
dc.identifier.relatedurl | http://oa.upm.es/22360/1/INVE_MEM_2012_152095.pdf | |
dc.identifier.uri | https://hdl.handle.net/20.500.14352/44467 | |
dc.issue.number | 7 | |
dc.journal.title | Journal of Circuits, Systems and Computers | |
dc.language.iso | eng | |
dc.publisher | World Scientific Publ Co Pte Ltd | |
dc.relation.projectID | CICYT-TIN 2008/508 | |
dc.relation.projectID | TIN2012-32180 | |
dc.relation.projectID | CSD2007-0050 | |
dc.rights.accessRights | open access | |
dc.subject.cdu | 004 | |
dc.subject.keyword | Energy consumption | |
dc.subject.keyword | Filtering | |
dc.subject.keyword | Forwarding predictor | |
dc.subject.keyword | Cache hierarchy | |
dc.subject.ucm | Informática (Informática) | |
dc.subject.unesco | 1203.17 Informática | |
dc.title | Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets | |
dc.type | journal article | |
dc.volume.number | 21 | |
dcterms.references | 1. F. Bower, D. Sorin and L. Cox, The impact of dynamically heterogeneous multicore processors on thread scheduling, IEEE Micro 28 (2008) 17-25. 2. M. D. Hill and M. R. Marty, Amdahl's law in the multicore era, IEEE Computer 41 (2008) 33-38. 3. J. L. Aragon, J. Gonzalez and A. Gonzalez, Power-aware control speculation through selective throttling, Proc. HPCA (2003), pp. 103-112. 4. J. Dai and L. Wang, Way-tagged cache: An energy-efficient L2 cache architecture under write-through policy, Proc. Int. Symp. Low Power Electronics and Design (ISLPED) (2009), pp. 159-164. 5. T. V. Kalyan and M. Mutyam, Word-interleaved cache: An energy efficient data cache architecture, Proc. Int. Symp. Low Power Electronics and Design (ISLPED) (2008), pp. 265-270. 6. V. Kontorinis, A. Shayan, D. M. Tullsen and R. Kumar, Reducing peak power with a table-driven adaptive processor core, Proc. 42nd Annual IEEE/ACM International Symp. Microarchitecture (MICRO 42) (2009), pp. 189-200. 7. IBM Home page, Available at http://researcher.ibm.com/view_project.php?id=1515 (accessed January 2012). 8. M. Monchiero, R. Canal and A. Gonzalez, Power/performance/thermal design-space exploration for multicore architectures, IEEE Trans. Parallel Distrib. Syst. 19 (2008) 666-681. 9. Y. Etsion and D. G. Feitelson, L1 cache filtering through random selection of memory references, Proc. 16th Int. Conf. Parallel Architecture and Compilation Techniques (PACT '07) (2007), pp. 235-244. 10. D. Nicolaescu, A. Veidenbaum and A. Nicolau, Reducing data cache energy consumption via cached load/store queue, Proc. Int. Symp. Low Power Electronics and Design (2003), pp. 252-257. 11. P. Racunas and Y. N. Patt, Partitioned first-level cache design for clustered microarchitectures, Proc. ICS (2003), pp. 22-31. 12. J. Kin, M. Gupta and W. Mangione-Smith, The filter cache: An energy efficient memory structure, Proc. Micro (1997), pp. 184-193. 13. D. Albonesi, Selective cache ways: On-demand cache resource allocation, J. Instruction-Level Parallelism 2 (2000) 1-6. 14. H. Lee, M. Smelyanskiy, C. Newburn and G. Tyson, Stack value file: Custom microarchitecture for the stack, Proc. HPCA (2001), pp. 5-14. 15. L. Jin and S. Cho, Reducing cache traffic and energy with macro data load, Proc. ISLPED (2006), pp. 147-150. 16. P. Carazo, R. Apolloni, F. Castro, D. Chaver, L. Pinuel and F. Tirado, L1 data cache power reduction using a forwarding predictor, Lecture Notes in Computer Science, Vol. 6448, Springer-Verlag (2011), pp. 116-125. 17. S. Subramaniam and G. Loh, Store vectors for scalable memory dependence prediction and scheduling, Proc. HPCA (2006), pp. 65-76. 18. I. Park, C. Ooi and T. Vijaykumar, Reducing design complexity of the load/store queue, Proc. of Micro (2003), pp. 411-422. 19. M. Powell, S. H. Yang, B. Falsafi, K. Roy and T. N. Vijaykumar, Gated-Vdd: A circuit technique to reduce leakage in deep-submicron cache memories, Proc. Int. Symp. Low Power Electronics and Design (ISLPED), Rapallo, Italy (2000), pp. 90-95. 20. S. Kaxiras, Z. Hu and M. Martonosi, Cache decay: Exploiting generational behavior to reduce cache leakage power, Proc. Int. Symp. Computer Architecture (ISCA) (2001), pp. 240-251. 21. K. Flautner, N. S. Kim, S. Martin, D. Blaauw and T. Mudge, Drowsy caches: Simple techniques for reducing leakage power, Proc. Int. Symp. Computer Architecture (ISCA) (2002), pp. 148-157. 22. F. Castro, D. Chaver, L. Pinuel, M. Prieto, M. Huang and F. Tirado, A load-store queue design based on predictive state filtering, J. Low Power Electronics 2 (2006) 27-36. 23. B. Bloom, Space/time trade-offs in hash coding with allowable errors, Commun. ACM 13 (1970) 422-426. 24. S. McFarling, Combining branch predictors, Technical report tn-36, Western Research Laboratory, Digital Equipment Corporation (1993). 25. S. Sethumadhavan, R. Desikan, D. Burger, C. Moore and S. Keckler, Scalable hardware memory disambiguation for high ILP processors, Proc. of IEEE/ACM International Symposium on Microarchitecture (2003), pp. 399-410. 26. Cacti page at HP labs home page, Available at http://www.hpl.hp.com/research/cacti/. 27. M. T. Yourst, PTLsim: A cycle accurate full system x86-64 microarchitectural simulator, Proc. of ISPASS (2007), pp. 23-34. 28. G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker and P. Roussel, The microarchitecture of the Pentium 4, Intel Technol. J. 5 (2001) 1-13. 29. Copenhagen University College of Engineering, The Microarch of Intel and amd cpu's: An optimization guide for assembly programmers and compiler makers (2009). 30. SPEC 2006 Home page, Available at http://www.spec.org/cpu2006 (accessed November 2011). 31. Gprof home page, Available at http://www.cs.utah.edu/dept/old/texinfo/as/gprof.toc.html (accessed November 2011). 32. Simpoint home page, Available at http://cseweb.ucsd.edu/~calder/simpoint/ (accessed November 2011). 33. Xen home page, Available at http://www.xen.org (accessed November 2011). 34. A. Gonzalez, F. Latorre and G. Magklis, Processor microarchitecture: An implementation perspective, Synthesis Lectures on Computer Architecture, Vol. 5, Morgan & Claypool Publishers (2010), pp. 1-116. 35. D. Grunwald, A. Klauser, S. Manne and A. Pleszkun, Confidence estimation for speculation control, Proc. of ISCA (1998), pp. 122-131. | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 9aac3e41-2993-45aa-b0e1-7bae1dacd982 | |
relation.isAuthorOfPublication | 6b8b1488-47cc-441e-921b-c1e8042d627c | |
relation.isAuthorOfPublication | 2ce782af-0e05-45eb-b58a-d2efffec6785 | |
relation.isAuthorOfPublication | 1356616c-9e69-4852-8415-62fd0b8e7cfc | |
relation.isAuthorOfPublication.latestForDiscovery | 9aac3e41-2993-45aa-b0e1-7bae1dacd982 |