Commercial Off-the-Shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors

Participant: Rasha Faqeh  

Home Institution: Technische Universität Dresden

Home Country: Germany

Host: Prof. Osman Unsal

Host Institution: Barcelona Supercomputing Center 
Host Country: Spain

Start Date:

End Date: 2014-11-21

The Technische Universität Dresden (TUD) and the Barcelona Supercomputing Center (BSC) originally intended to collaborate on building a cache extension that implements reliable, targeted hardware transactional memory (HTM). The idea was to extend the processor’s L1 data cache to track written addresses locally at the granularity of a cache line. During the speculative execution of a transaction, any modification would take place in the cache and not in main memory, so the cache acts as a write-back log. However, due to time limitations and the similarity of this design to existing commercial off-the-shelf hardware (in particular Intel Haswell TSX), the STSM plan was modified. Instead of building the HTM hardware from scratch, we redirected our investigation to study to what extent transient fault recovery can be implemented by leveraging the abort mechanism of commodity HTM (e.g. Intel Haswell TSX, IBM Power8).
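The write-back-log idea and abort-based recovery described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the design studied in the mission and not how TSX or Power8 are implemented: the class `SpeculativeCache`, the function `run_transaction`, the 64-byte line size, and the `fault_detected` callback are all hypothetical names introduced for illustration. In particular, how a transient fault is detected is outside the scope of the sketch and is represented by a caller-supplied predicate.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed line size, for illustration only


class SpeculativeCache:
    """Toy model of a cache that buffers transactional writes per cache line."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        # Write set: cache-line index -> private copy of that line holding
        # the speculative data. Main memory is untouched until commit.
        self.write_set = {}

    def write(self, addr: int, value: int) -> None:
        line = addr // CACHE_LINE_SIZE
        if line not in self.write_set:
            base = line * CACHE_LINE_SIZE
            self.write_set[line] = bytearray(
                self.memory[base:base + CACHE_LINE_SIZE]
            )
        self.write_set[line][addr % CACHE_LINE_SIZE] = value

    def read(self, addr: int) -> int:
        line = addr // CACHE_LINE_SIZE
        if line in self.write_set:
            # A transaction sees its own speculative writes.
            return self.write_set[line][addr % CACHE_LINE_SIZE]
        return self.memory[addr]

    def commit(self) -> None:
        # Write the buffered lines back to memory, then clear the log.
        for line, data in self.write_set.items():
            base = line * CACHE_LINE_SIZE
            self.memory[base:base + CACHE_LINE_SIZE] = data
        self.write_set.clear()

    def abort(self) -> None:
        # Discarding the log restores the pre-transaction state,
        # because memory was never modified.
        self.write_set.clear()


def run_transaction(cache, body, fault_detected, max_retries=3) -> bool:
    """Run body speculatively; abort and retry if a fault is reported.

    fault_detected(attempt) is a hypothetical stand-in for whatever
    error-detection mechanism triggers the abort.
    """
    for attempt in range(max_retries):
        body(cache)
        if fault_detected(attempt):
            cache.abort()
            continue
        cache.commit()
        return True
    return False
```

For example, with `mem = bytearray(128)` and `cache = SpeculativeCache(mem)`, calling `run_transaction(cache, lambda c: c.write(3, 7), lambda attempt: attempt == 0)` aborts the first attempt, re-executes the body, and commits on the second attempt, after which `mem[3]` holds 7. The point of the model is that recovery needs no explicit undo step: aborting simply drops the buffered lines.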

