
Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets

dc.contributor.author: Carazo, Pablo
dc.contributor.author: Apolloni, Ruben
dc.contributor.author: Castro Rodríguez, Fernando
dc.contributor.author: Chaver Martínez, Daniel Ángel
dc.contributor.author: Piñuel Moreno, Luis
dc.contributor.author: Tirado Fernández, José Francisco
dc.date.accessioned: 2023-06-20T03:48:07Z
dc.date.available: 2023-06-20T03:48:07Z
dc.date.issued: 2012-11
dc.description: This work has been supported in part by the Spanish government through research contracts CICYT-TIN 2008/508, TIN2012-32180 and Consolider Ingenio-2010 CSD2007-0050, and by the HIPEAC-3 European Network of Excellence.
dc.description.abstract: The first-level data cache in modern processors has become a major consumer of energy due to its increasing size and high access frequency. To reduce this energy consumption, we propose a straightforward filtering technique based on a highly accurate forwarding predictor. Specifically, a simple structure predicts whether a load instruction will obtain its data via forwarding from the load-store structure, thus avoiding the data cache access, or whether the data cache will provide it. This mechanism reduces the data cache energy consumption by an average of 21.5% with a negligible performance penalty of less than 0.1%. Furthermore, we also address the static energy consumption of the cache by disabling a portion of the sets of the L2 associative cache. Overall, when both proposals are combined, the total L1 and L2 energy consumption is reduced by an average of 29.2% with a performance penalty of just 0.25%.
dc.description.department: Sección Deptal. de Arquitectura de Computadores y Automática (Físicas)
dc.description.faculty: Fac. de Ciencias Físicas
dc.description.refereed: TRUE
dc.description.sponsorship: Spanish government
dc.description.sponsorship: HIPEAC-3 European Network of Excellence
dc.description.status: pub
dc.eprint.id: https://eprints.ucm.es/id/eprint/28484
dc.identifier.doi: 10.1142/S0218126612500570
dc.identifier.issn: 0218-1266
dc.identifier.officialurl: http://dx.doi.org/10.1142/S0218126612500570
dc.identifier.relatedurl: http://oa.upm.es/22360/1/INVE_MEM_2012_152095.pdf
dc.identifier.uri: https://hdl.handle.net/20.500.14352/44467
dc.issue.number: 7
dc.journal.title: Journal of Circuits Systems and Computers
dc.language.iso: eng
dc.publisher: World Scientific Publ co Pte LTD
dc.relation.projectID: CICYT-TIN 2008/508
dc.relation.projectID: TIN2012-32180
dc.relation.projectID: CSD2007-0050
dc.rights.accessRights: open access
dc.subject.cdu: 004
dc.subject.keyword: Energy consumption
dc.subject.keyword: Filtering
dc.subject.keyword: Forwarding predictor
dc.subject.keyword: Cache hierarchy
dc.subject.ucm: Computer science (Computer science)
dc.subject.unesco: 1203.17 Informatics
dc.title: Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets
dc.type: journal article
dc.volume.number: 21

Original bundle

Name: piñuel02preprint.pdf
Size: 6.21 MB
Format: Adobe Portable Document Format