It is eighteen years since Maurice Herlihy and Eliot Moss's first paper about transactional memory (TM). Those eighteen years have seen a massive amount of work on the subject: over five hundred papers covering different kinds of TM system, different kinds of language abstraction built over TM, and different kinds of programming problem being tackled using TM. In this short talk I'll highlight some of these differences, and argue that we should be more explicit about exactly which problems a TM system is, and is not, intended to solve.
This talk will address the question of whether there are still interesting research problems around transactional memory.
JPaxos and Paxos STM
In this talk, I briefly describe two tools that we have developed for implementing robust network services in Java: JPaxos, an implementation of State Machine Replication using Lamport's MultiPaxos protocol, and Paxos STM, an object-oriented distributed transactional memory built on top of JPaxos. I will outline the architecture of these systems and present some evaluation results. Our research within this project spans a range of topics, including distributed algorithms, semantics, and static analysis.
On the Cost of Concurrency in Transactional Memory
The crux of software transactional memory (STM) is to combine an
easy-to-use programming interface with an efficient utilization of the
concurrent computing abilities provided by modern machines. But does
this combination come with an inherent cost?
We evaluate the cost of
concurrency by measuring the amount of expensive synchronization that
must be employed in an STM implementation that ensures positive
concurrency, i.e., allows for concurrent transaction processing in some
executions. We focus on two popular progress conditions that provide
positive concurrency: progressiveness and permissiveness.
We show that in permissive STMs, arguably providing the maximal degree of
concurrency, the number of expensive synchronization patterns needed to
implement a transaction is proportional to the size of the transaction's
data set. In contrast, we show that progressive STMs that only
guarantee minimal positive concurrency can be implemented using just a
constant amount of synchronization. We then show, however, that even in
progressive STMs a transaction has to "protect" (e.g., by
using locks or strong synchronization primitives) an amount of
data linear in its write-set size.
Relaxed transactional access models
Transactional memory (TM) has shown potential to simplify the task of writing
concurrent programs. Recent studies have shown that the conflict
detection mechanisms for TM do not allow the performance of TM to scale
to that of fine-grained locking in concurrent data structures. This has
led researchers to relax the conflict detection for transactions, using
mechanisms like early release and elastic transactions. We formalize the
notion of relaxed transactional access models and their desired
properties. We introduce a new transactional access model, peek-quake,
which adds two new commands: peek (a lighter form of read) and its
dual, quake (a heavier form of write), enabling finer conflict
detection in STM. In our augmented model, a normal read conflicts with
every write, while a peek conflicts only with quakes. We compare the
peek-quake access model to existing access models in our formalism. We
use the peek-quake access model to implement concurrent data structures
like linked-list-based sets and red-black trees.
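The conflict rules stated above can be made concrete as a small conflict relation. The sketch below is inferred from the abstract's description (a read conflicts with every write, a peek conflicts only with quakes); the treatment of write/write and quake/quake pairs is an assumption based on conventional access semantics, and all names are illustrative.

```java
// Conflict relation for the peek-quake access model described above.
// The rules for write/write and quake/quake pairs are assumptions based on
// conventional semantics; the abstract only states the read and peek rules.
enum Access { READ, WRITE, PEEK, QUAKE }

final class PeekQuakeConflicts {
    // Returns true if two accesses to the same object conflict.
    static boolean conflicts(Access a, Access b) {
        if (a == Access.PEEK)  return b == Access.QUAKE;                      // peek conflicts only with quakes
        if (a == Access.READ)  return b == Access.WRITE || b == Access.QUAKE; // read conflicts with every write
        if (a == Access.WRITE) return b != Access.PEEK;                       // a write conflicts with reads, writes, quakes
        return true;                                                          // a quake conflicts with everything
    }
}
```

Note that the relation is symmetric, as a conflict relation should be: a peek commutes with normal writes, which is exactly what allows finer-grained conflict detection in the data structures mentioned above.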
On Multi-Versioning in Transactional Memory
An effective way to reduce the number of aborts in STM is to keep
multiple versions of transactional objects. We study inherent properties
of STMs that use multiple versions to guarantee successful commits of
all read-only transactions.
We show that some other desirable
properties must be sacrificed if multi-versioning is used to ensure that
all read-only transactions commit successfully. In particular, we show
that managing the versions (deciding which versions to keep and which
can be garbage collected) is a major challenge.
We solve this
challenge in a real system implementation -- we build a selective
multi-versioning (SMV) algorithm. We demonstrate that the approach of
keeping a constant number of versions wastes too much memory (exponential
memory growth in the worst case) while reaping too little benefit. In
contrast, SMV keeps old object versions as long as they might be useful.
Our algorithm succeeds in doing so while keeping read-only transactions
invisible, by leveraging the automatic garbage collection provided by Java.
This way, SMV achieves up to 11× better throughput
than a single-version STM, while operating successfully in systems with
Based on joint work with Dmitri Perelman and Rui Fan.
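As a rough illustration of how read-only transactions can stay invisible while old versions are retained exactly as long as needed, consider the sketch below. It is not SMV's actual code: the names are invented, and the point is only that a version pinned by a reader's snapshot handle survives later writes, while unpinned versions become garbage that Java's collector reclaims automatically, with no explicit version-management logic.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch (invented names, not actual SMV code): each committed
// write publishes a fresh version object, and an old version stays reachable
// only through the snapshot handles held by the read-only transactions that
// may still need it. Once readers drop their handles, the JVM's garbage
// collector reclaims the version automatically.
final class VersionedBox<T> {
    static final class Version<T> {
        final T value;
        Version(T value) { this.value = value; }
    }

    private final AtomicReference<Version<T>> current = new AtomicReference<>();

    VersionedBox(T initial) { current.set(new Version<>(initial)); }

    // Writers never touch old versions; they simply publish a new one.
    void write(T value) { current.set(new Version<>(value)); }

    // A read-only transaction pins the version it started with by holding
    // this handle; it remains readable regardless of later writes.
    Version<T> snapshot() { return current.get(); }
}
```

A reader that took a snapshot before a write still sees the old value through its handle; dropping the handle is all it takes to make the old version collectible.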
On Using Separation Logic to Detect Snapshot Isolation Anomalies in Software Transactional Memory
Many performance issues of software transactional memory are caused by
unnecessary abort situations, where non-serializable and yet non-conflicting
transactions are scheduled to execute concurrently. By
conflicting transactions are scheduled to execute concurrently. By
smartly relaxing the isolation properties of transactions, transactional
systems may overcome these performance issues and attain considerable
gains. However, it is known that relaxing isolation restrictions may
lead to execution anomalies causing programs to yield unexpected
results. Some database management systems make that compromise and allow
the option to operate with relaxed isolation mechanisms. In these
cases, developers must accept the fact that they must explicitly deal
with anomalies. In software transactional memory systems, where
transactions protect the volatile state of programs, execution anomalies
may have severe and unforeseen semantic consequences. In this setting,
the balance between a relaxed isolation level and the ability to enforce
the necessary correctness in general purpose language programs is
harder to achieve.
The solution we devise in this work is to
statically analyze programs, using Separation Logic, to detect the kind
of execution anomalies that emerge under snapshot isolation. Our
approach allows a compiler to either warn the developer about the
possible snapshot isolation anomalies in a given program, or to
inform automatic correctness strategies that ensure execution under full
serializability.
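To make the kind of anomaly concrete, the snippet below simulates the classic write-skew pattern that snapshot isolation admits (the balances and amounts are invented for illustration):

```java
// Classic "write skew" anomaly of the kind such an analysis must detect.
// Two transactions each read both accounts from the same snapshot, observe
// that the combined balance would stay non-negative after their own
// withdrawal, and write to *different* accounts. Under snapshot isolation
// their write sets are disjoint, so neither sees a write-write conflict,
// both commit, and the invariant x + y >= 0 is violated.
final class WriteSkewDemo {
    static int run() {
        int x = 50, y = 50;                 // shared state, invariant: x + y >= 0

        // Both transactions start from the same snapshot.
        int snapX = x, snapY = y;

        // T1 withdraws 80 from x: allowed, since snapX + snapY - 80 >= 0.
        boolean t1Ok = snapX + snapY - 80 >= 0;
        // T2 withdraws 80 from y: also allowed on the same snapshot.
        boolean t2Ok = snapX + snapY - 80 >= 0;

        // Disjoint write sets -> no write-write conflict, so SI commits both.
        if (t1Ok) x = snapX - 80;
        if (t2Ok) y = snapY - 80;

        return x + y;                       // negative: the invariant is broken
    }
}
```

A serializable execution would have aborted or serialized one of the two withdrawals; this is precisely the dependency pattern the separation-logic analysis looks for.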
On Proving Isolation Properties for Software Transactional Memory
An algorithm for STM is correct if it guarantees a proclaimed degree of isolation between concurrently executing transactions. A correctness proof requires explicit modeling of the effects of transaction bodies and the non-deterministic scheduling of their operations. We provide a formalization of STM that is explicit about all aspects required for a correctness proof: the effects of operations, non-determinism, and the modeling of rollback. We then prove a concrete STM algorithm correct by showing that it implements opacity.
Towards a universal construction for transaction-based multiprocess programs
The aim of a Software Transactional Memory (STM) system is to discharge the programmer from the explicit management of synchronization issues. The programmer's job resides in the design of multiprocess programs in which processes are made up of transactions, each transaction being an atomic execution unit that accesses concurrent objects. The important point is that the programmer has to focus her/his efforts only on the parts of code which have to be atomic execution units, without worrying about the way the corresponding synchronization has to be realized. Non-trivial STM systems allow transactions to execute concurrently and rely on the notion of commit/abort of a transaction in order to solve their conflicts on the objects they access simultaneously. In some cases, the management of aborted transactions is left to the programmer. In other cases, the underlying system scheduler is appropriately modified or an underlying contention manager is used in order that each transaction be (``practically always'' or with high probability) eventually committed. This talk will present a deterministic STM system in which (1) every invocation of a transaction is executed exactly once and (2) the notion of commit/abort of a transaction remains unknown to the programmer. This system, which imposes restrictions neither on the design of processes nor on their concurrency pattern, can be seen as a step in the design of a deterministic universal construction to execute transaction-based multiprocess programs on top of a multiprocessor. Interestingly, the proposed construction is lock-free (in the sense that it uses no lock).
Talks for WG3
TMBox: A Configurable 16-core Hybrid TM FPGA prototype
In this talk, I will introduce TMBox, our open-source 16-core FPGA hybrid prototype. TMBox is built from a publicly available MIPS R2000 soft-core, which we heavily modified to make it suitable for a multiprocessor environment. These modifications include TLB and MMU support, floating-point units, exception support, LL/SC synchronization primitives, snoopy caches, MSI cache coherence, performance counters, and system libraries to support string and I/O operations. We then implemented a per-core hardware TM unit by extending the cache finite state machine, adding a tx-cache and new TM instruction extensions. Our design closely follows the TinySTM-ASF proposal. I will mainly talk about the choices we made while designing TMBox and conclude with our initial set of results.
The Velox Transactional Memory Stack
The adoption of multi- and many-core architectures for mainstream computing undoubtedly brings profound changes in the way software is developed. In particular, the use of fine-grained locking as the multi-core programmer's coordination methodology is considered by more and more experts as a dead end. The transactional memory (TM) programming paradigm is a strong contender to become the approach of choice for replacing locks and implementing atomic operations in concurrent programming. Combining sequences of concurrent operations into atomic transactions allows a great reduction in the complexity of both programming and verification, by making parts of the code appear to execute sequentially without the need to program using fine-grained locking. Transactions remove from the programmer the burden of figuring out the interaction among concurrent operations that happen to conflict when accessing the same locations in memory. The EU-funded FP7 VELOX project designs, implements and evaluates an integrated TM stack, spanning from the programming language to the hardware support, and including runtime and libraries, compilers, and application environments. This talk will give an overview of the VELOX TM stack and its associated challenges and contributions.
Talks for WG4
MUTS: Native Scala Constructs for Software Transactional Memory
In this paper we argue that the current approaches to implementing
transactional memory in Scala, while very clean, adversely affect the
programmability, readability and maintainability of transactional
code. These problems arise from a desire to avoid making modifications
to the Scala compiler. As an alternative we introduce Manchester
University Transactions for Scala (MUTS), which instead adds keywords
to the Scala compiler to allow for the implementation of transactions
through traditional block syntax such as that used in “while”
statements. This allows for transactions that do not require a change of
syntax style and do not restrict their granularity to whole classes or
methods. While implementing MUTS does require some changes to the
compiler’s parser, no further changes are required to the compiler. This
is achieved by the parser describing the transactions in terms of
existing constructs of the abstract syntax tree, and by the use of Java
Agents to rewrite the resulting class files once the compiler has
completed.
Atomicity and Structured Concurrency
Concurrent programming is notoriously difficult. By its nature,
concurrency introduces an entirely new set of possible program behaviors
about which the programmer must reason. Current programming
abstractions have not been adequate to manage this complexity. Threads
with shared state are at present the dominant approach to concurrent
programming. Under this model, the dynamic structure of the concurrent
execution is not at all apparent from the program's lexical structure.
Message passing models alleviate many of these difficulties, but the
global model is still unstructured. Since the program is factored into
individual actors and their interactions, it remains difficult to see
the overall structure of the execution.
The Orc programming language
changes the conceptual model of programming to be inherently concurrent
in a structured way. The concurrent structure of an Orc execution is
evident from the lexical structure of the program, improving on both the
shared state model and the message passing model. Orc uses sites to
communicate with the external world. Sites are numerous and diverse;
they can have nontrivial semantics, and in principle they each represent
some kind of shared resource. Thus, the difficulties inherent in using
shared state still recur frequently even when writing small Orc
programs. At present, Orc programs make use of locks, semaphores, and
other mutual exclusion disciplines to safeguard access to shared state
at sites. This gives rise to all of the difficulties inherent in
locking. Additionally, Orc sites may represent external services, which
may be concurrently interacting with other clients that might not be
aware of the locking discipline. Thus, program-level locks are
insufficient to ensure safety.
The use of language-level
transactions, especially in the context of transactional memory, has
been explored as an alternative to locks. Inspired by this approach, I
propose Orca, an extension of Orc with a new 'atomic' combinator.
'atomic' executes an expression in such a way that: (1) from the
expression's perspective, no concurrent events take place outside of it
during its execution, and (2) from any external execution's perspective,
the expression evaluates in its entirety in one atomic step. I will
show how this notion of atomicity can be formalized simply,
symmetrically, and without appeal to linearization or traces. The
'atomic' combinator has two critical characteristics. First, it ensures
atomicity for all events, not just memory accesses; every site called
within 'atomic' is invoked in an atomic context, and each such site must
determine internally how to maintain the atomicity of each such call
with respect to its other clients, including external ones.
Second, 'atomic' can be recursively nested under any of the
combinators, such as '|' (parallel evaluation), '<<' (fork-join),
or even 'atomic' itself. This allows unlimited concurrent evaluation
inside of transactions, as well as unbounded nesting of such parallel
evaluations. I will also briefly discuss the comparative
expressivity of Orca, and the pi-calculus with guarded choice,
introducing a pattern called "atomic choice" using the atomic
combinator.
Integration of transactional memory support into a data-flow extension of OpenMP
The combination of data-flow programming with transactional memory is
an ambitious challenge addressed by the TERAFLUX European project,
aiming at increased expressiveness and performance while preserving the
paradigms' properties. To explore different semantics, compilation and
runtime methods, we extend the OpenMP specification and implement
prototypes based on GCC. With the maturation of these prototypes, the
time has come to experiment with them alongside the emerging transactional
memory branch of GCC. This project combines development and research
components and can be decomposed into five phases.
1. Study the
compatibility of the transactional memory and OpenMP constructs in the
transmem development branch of GCC and propose solutions to the possible
incompatibilities.
2. Study the design and implementation of
both data-flow (streaming) and transmem branches of GCC, interacting
with their maintainers.
3. Address the infrastructure limitations,
software engineering and version management issues related to the
integration of both code bases into a single experimental branch.
4. Contribute to the design and implementation of a semantics for
transactions nested within OpenMP tasks, currently being defined in the
project.
5. Write and rewrite relevant benchmarks leveraging
the new programming model, performing systematic evaluations and
detailed characterizations of the performance behavior. The first two
phases would be assigned as a research project during Ismail Kuru's
curriculum at T.U. München, while the last three phases would constitute
the subject of his Master's thesis work during the summer, working at
INRIA in Paris.
Programmer and Runtime Policies for Adaptive Many-Core Parallelism
This project investigates a novel extension to the Java programming
language, providing support for flexible parallelism on next-generation
many-core processors. The flexibility of our system is based on: (1)
encoding descriptions of the level of accuracy provided by alternative
implementations of a program module, allowing trade-offs between
computational cost and precision; and (2) enabling the runtime system to
determine adaptively when new parallel threads should be spawned. I
argue that this kind of programming extension fits naturally in a
transactional memory system setting.
Methods and Strategies to Rate and Optimize Transactional Memory Applications
This talk will give insights into the goals and ongoing work of the DFG-funded
project ``TM-Opt''. The main goal of this project is to develop
strategies that enable the rating and optimization of transactional
memory applications. The research in this project aims to complement and
enrich ongoing transactional memory research by adopting the view of
the programmer. Through tool support, the project aims to close the gap
between low-level TM research, such as TM models and conflict detection
algorithms, and the high-level view of the programmer.
These strategies make it possible to rate and optimize the run-time behavior of TM
applications. As a first step, the run-time behavior needs to be
captured and preserved. Second, a post-processing step allows this
behavior to be analyzed and rated. Through a customized analysis,
bottlenecks and repeatedly conflicting transactions are identified. The
TM tools highlight such performance-critical situations and guide the
programmer in resolving them.
Crafting a High-Performance Ready-to-Go STM
The goal of this talk is to show that an STM having all the necessary
real-world features can be built without compromising performance.
To that end we will present the following directions: a Java STM based
on state-of-the-art algorithms, built as an open research platform for
new algorithms and methods; static analysis optimizations to the
platform; hand-crafting the most-used libraries to be STM-ready; and
benchmarking the STM with real-world applications. The basis of our
research initiative is Deuce, our novel open-source Java framework for
transactional memory. Deuce has several desired features not found in
earlier Java STM frameworks.
Currently there does not exist an efficient Java STM framework that
delivers a full-featured STM that can be added to an existing
application without changes to its compiler or libraries. It was not
clear if one could build an efficient Java STM without compiler
support. Deuce is intended to fill this gap. It is non-intrusive in the
sense that no modifications to the Java virtual machine (JVM) or
extensions to the language are necessary.
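Usage in Deuce is intended to look roughly like the sketch below: a method is marked as atomic and a Java agent instruments its bytecode at class-load time, with no JVM or language changes. The @Atomic annotation here is a local placeholder standing in for the library's own annotation (org.deuce.Atomic in the actual framework) so that the snippet is self-contained:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch of Deuce-style usage. @Atomic below is a local placeholder for the
// library's annotation so this compiles without Deuce on the classpath; with
// the real framework, a Java agent rewrites annotated methods at load time
// so that each one runs as a single transaction.
@Retention(RetentionPolicy.RUNTIME)
@interface Atomic {}

final class Counter {
    private int value;

    @Atomic                 // with Deuce, this body executes as one transaction
    public void increment() { value++; }

    @Atomic
    public int get() { return value; }
}
```

The key point is that the class is ordinary Java: no changed syntax, no modified compiler, no custom JVM.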
JVSTM and its applications
The purpose of my presentation is to give an overview of the work that
we have been doing since 2004 in the development of JVSTM, a Java
library that implements a multi-version STM, and our experience in
applying it to various domains. JVSTM was extended to support
persistence and has been used in a large real-world web application
since 2005, the first known use of an STM in a production
environment. Moreover, we have applied it to several domains, ranging
from automatic memoization and dynamic software upgrades to the
automatic parallelization of programs through thread-level speculation,
which I will discuss briefly.
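The central abstraction in JVSTM is the versioned box, which keeps a history of values tagged with commit timestamps so that read-only transactions always see a consistent snapshot. The toy class below sketches that idea under simplifying assumptions (a single-threaded global clock and invented names); it illustrates multi-versioning and is not JVSTM's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy versioned box in the spirit of JVSTM's VBox: each committed write is
// tagged with a global clock value, and a reader sees the newest version not
// exceeding the clock value it observed when it started. Single-threaded
// illustration only; names and structure are invented.
final class MiniVBox<T> {
    private static int clock = 0;                  // global commit clock (toy)

    static final class Body<T> {
        final int version; final T value;
        Body(int version, T value) { this.version = version; this.value = value; }
    }

    private final Deque<Body<T>> history = new ArrayDeque<>();  // newest first

    MiniVBox(T initial) { history.push(new Body<>(clock, initial)); }

    // A transaction records the clock value at which it starts.
    static int begin() { return clock; }

    // Committing a write advances the clock and adds a new version.
    void put(T value) { history.push(new Body<>(++clock, value)); }

    // A reader sees the newest version no newer than its start timestamp.
    T get(int readTimestamp) {
        for (Body<T> b : history)
            if (b.version <= readTimestamp) return b.value;
        throw new IllegalStateException("no visible version");
    }
}
```

A transaction that began before a write keeps reading the old version, which is exactly why read-only transactions in a multi-version STM never need to abort.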
Talks for WG5
CloudTM project: a distributed transactional memory platform for the Cloud
Cloud Computing has emerged as a new paradigm for deploying, managing
and offering services through a shared infrastructure. The foreseen
benefits of Cloud Computing are very compelling both from a cloud
consumer and from a cloud services provider perspective: freeing
corporations from large IT capital investments via usage-based pricing
schemes; leveraging the economies of scale for both services providers
and users of the cloud; easing the deployment of services. One of
the main challenges to materializing these perceived benefits is to
identify innovative distributed programming models that simplify the
development of Cloud-based services, allowing for ordinary programmers
to take full advantage of the seemingly unbounded amount of
computational power and storage available on demand in large-scale Cloud
platforms. This project aims at designing, building, and
evaluating an innovative middleware platform for the implementation
of Cloud-based services: Cloud-TM (Cloud Transactional Memory). Cloud-TM
offers a simple and intuitive programming model for large scale
distributed applications that integrates the familiar notion of atomic
transaction as a first-class programming language construct, sparing
programmers from the burden of implementing low level, error-prone
mechanisms (e.g. locking, persistence and fault-tolerance) and
permitting major reductions in the time and cost of the development
process. Cloud-TM will embed a set of autonomic mechanisms to
simplify service monitoring and administration, a major source of costs
in dynamic and elastic environments such as the cloud. These mechanisms
aim at ensuring the achievement of user-defined Quality of Service
levels at minimum operational costs by automating the provisioning of
resources from the cloud and self-tuning the middleware platform to
achieve optimal efficiency in the utilization of resources.
Transactional Resources in Cloud Environment
Resource provisioning, that is, how resources are organized, allocated and managed in order to deliver high-quality services, is a very important issue in cloud computing. Clouds make wide use of virtual resources, which are usually managed in terms of Virtual Machines (VMs) and/or Virtual Clusters (VCs). In a cloud environment, the entity that performs resource provisioning, called the Resource Manager (RM), has to allocate resources for applications/services on VMs or VCs, which may be deployed on several physical servers in the same cloud, or even in different clouds. Then, the RM has to maintain access to such resources until the applications/services are terminated, executing several tasks such as:
- collecting and indexing all the resources available from cloud providers;
- evaluating the most suitable VMs or VCs for the application needs;
- allocating specific resources to each user/application request in order to ensure the agreed QoS requirements;
- monitoring the usage of allocated resources in order to verify the SLA with the user;
- dynamically reallocating resources, so as to use the available resources more efficiently;
- checking, when applications/services request the allocation of (extra) resources, whether they are authorized to do so;
- managing VM migration among different physical sites.
We believe that Transactional Memories represent a very helpful technology for implementing the functionalities necessary in a cloud architecture to manage virtual resources. In particular, our interest is focused on Software TM to support the implementation of the RM entity.
Transactions can provide a lightweight mechanism to synchronize all the activities on the virtual resources, alleviating many of the problems associated with the locking of resources. In particular, the RM can very efficiently deal with different requests from users or applications, since the resource allocation tasks are executed atomically. They are able to guarantee the consistency of the shared virtual resources, thereby reducing the complexity of both programming and verification.
Using transactions, there is no need to lock the virtual resources during the decision tasks, before allocating them, or to unlock them if the user request cannot be satisfied. In fact, the RM could spend a lot of time before deciding how to settle a specific request, due to several factors, such as the discovery of resources able to fulfill the request, priorities in request queues, and renegotiation of SLAs.
Furthermore, in classic lock-based parallel programming, the order of the locks is very important to avoid deadlocks and livelocks. Hence transactional schemes can be used to develop more robust and flexible code.
Transactions can help to improve the efficiency of functionalities inside the RM, but also to synchronize the different entities that affect the composition of virtual resources. The availability of cloud providers and their related resources can change over time, according to agreements between the cloud manager and the cloud providers themselves. This is especially true in cloud environments based on volunteered resources. In fact, volunteer cloud systems use computers voluntarily offered by their owners. So, user resources are not only a passive interface to cloud-based services, but can actively participate with other Clouds to create a larger infrastructure. However, in this type of scenario, the availability of resources cannot be guaranteed over time and the RM has to interact with a very dynamic virtual infrastructure. For example, if, while the RM is allocating a specific resource, the related cloud provider modifies its availability in resource provisioning, or the related volunteer provider abruptly leaves the system, the allocation process in the RM will be affected by inconsistency problems. By using transactions, however, the allocation process in the RM will abort and start again, solving the inconsistency problem.
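The abort-and-retry behavior described above can be sketched as an optimistic allocation loop. This is an illustration only: the pool structure, version counter, and method names are all invented, and a real RM would run this inside an STM transaction rather than with hand-rolled validation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the optimistic allocation loop described above: the RM reads a
// snapshot of the resource state, makes its (possibly slow) placement
// decision, and commits only if the state is unchanged; otherwise the
// "transaction" aborts and starts again. All names are illustrative.
final class ResourcePool {
    private final AtomicInteger version = new AtomicInteger(0);
    private volatile int freeUnits;

    ResourcePool(int freeUnits) { this.freeUnits = freeUnits; }

    boolean allocate(int units) {
        while (true) {                          // retry on abort
            int v = version.get();
            int observed = freeUnits;           // read the snapshot
            if (observed < units) return false; // request cannot be satisfied
            // ... slow decision making here (SLA checks, queue priorities) ...
            synchronized (this) {               // commit: validate, then apply
                if (version.get() == v) {
                    freeUnits = observed - units;
                    version.incrementAndGet();
                    return true;
                }
            }                                   // validation failed: abort, retry
        }
    }

    int free() { return freeUnits; }
}
```

If a provider changes the pool while the RM is deliberating, the version check fails and the allocation simply restarts on fresh state, which is the consistency behavior the paragraph above relies on.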
As an additional advantage, transaction priorities offer a very natural way to manage requests characterized by different SLAs and different priority levels. In fact, they serve as a contention management strategy that allows the RM to determine which of two transactions should abort if they encounter a conflict on a virtual resource: lower-priority transactions should defer to higher-priority ones.
The application of transactional memories in a cloud environment is not trivial, and several challenging issues need to be investigated. First, an appropriate representation of virtual resources is necessary, since the introduction of transactions to an application can have negative, counterintuitive consequences for performance if data structures are badly designed. Second, due to the high number and complexity of functionalities that operate on virtual resources, it is very hard to specify how to divide them into atomic transactions. Specific atomization strategies should be developed, because threads with very long execution times cause a lot of redundancy and overhead if they abort.
With regard to transaction aborts, we know that detecting conflicts early increases transaction throughput, as transactions do not perform useless work. Commit-time locking may help to avoid some read-write conflicts, but in general conflicts discovered at commit time cannot be solved without aborting at least one transaction. We conjecture that in a cloud environment conflicts on virtual resources are less probable, since a successful deployment of a Cloud environment implies the collection of a huge amount of virtual resources. However, due to the high dynamism of the cloud environment, we have to plan a strategy to overcome this potential situation. Thus, a run-time conflict density evaluation has to be performed, in order to assess how many events are likely to be involved in a virtual resource conflict. Then, specific policies should be implemented to limit the number of concurrently executing transactions at a given time if the conflict density is very high.
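Such a policy could be sketched as a simple gate that shrinks the number of concurrently admitted transactions when the measured conflict density crosses a threshold. The threshold (0.5) and the reduction factor below are invented for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Sketch of a conflict-density throttling policy: when measured conflict
// density is high, cap the number of transactions allowed to run at once.
// The 0.5 threshold and the divide-by-4 reduction are illustrative choices.
final class TransactionThrottle {
    private final Semaphore permits;

    TransactionThrottle(double conflictDensity, int maxConcurrent) {
        // High conflict density -> admit fewer concurrent transactions.
        int allowed = conflictDensity > 0.5
                ? Math.max(1, maxConcurrent / 4)
                : maxConcurrent;
        this.permits = new Semaphore(allowed);
    }

    // Runs a transaction body only when a concurrency slot is available.
    <T> T runTransaction(Callable<T> body) throws Exception {
        permits.acquire();                  // wait for a slot
        try { return body.call(); }         // execute the transaction body
        finally { permits.release(); }
    }

    int availableSlots() { return permits.availablePermits(); }
}
```

A production variant would re-evaluate the density periodically and resize the gate, but the core idea, trading concurrency for fewer wasted aborts, is the one argued for above.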
We are investigating all these issues with reference to a specific cloud architecture, namely Cloud@Home (https://cloudathome.unime.it). It is a new volunteer computing paradigm, where the infrastructure is obtained by merging heterogeneous resources offered by different domains and/or providers, such as other Clouds, Grid farms, clusters, and datacenters, down to single desktops.
Dynamic replicas of parallel stream applications in interconnection networks
The recent architectural trend that has led to the widespread adoption of multicore CPUs has fostered a huge research interest towards Software Transactional Memory (STM). As STMs start to face the high availability and scalability requirements of real-world production environments, it is natural to foresee the need for replication solutions specifically tailored to STMs. It has been demonstrated that the Motion JPEG2000 application can be parallelized, formally represented as a Directed Acyclic Graph (DAG), and adaptively mapped onto an interconnection network. It is worth noting that the proposed technique has been validated by simulation of a 1000-core 2D mesh topology for random patterns in the presence of failures of up to 10%. A hierarchical model, based on the DAG (fork-join) representation of the parallel streaming application (Motion JPEG), is used for dynamically keeping replicas of the children in the stream leader(s). This model drastically reduces the latency for spreading replicas in distributed systems (in memory or on hard drives). Based on a learning technique, an "explicit path" is created for each stream (atomic transaction). The method tolerates multiple failures (of stream leaders and children) in the DAG graph. In-network cache coherence, utilizing a Virtual Tree Cache, will be exploited for reducing the communication latency.
Performance Modelling and Replication of Software Transactional Memories
Pierangelo Di Sanzo and Roberto Palmieri
Software Transactional Memories (STMs) are emerging as a potentially disruptive
programming model. Within this context, the activities of our research
group are focused on two aspects. The first deals with the dependability
of STM systems via replication, and the main target of our work has been
to maximize the overlap between communication/coordination and local
processing, essentially by means of speculative processing techniques.
The key idea behind the proposed approaches is to exploit the optimistic
delivery event offered by the Optimistic Atomic Broadcast (OAB) group
communication protocol, to speculatively process transactions in
tentative orders, and propagate uncommitted data versions along
conflicting transaction chains. The second deals with the performance
evaluation of STM systems. As for this aspect, these systems are based
on (and require) innovative design and development approaches, where the
optimization focus is shifted on aspects that historically had less
importance in traditional transactional systems (e.g. DBMSs). Following
these considerations, our main research target has been to cope
with such peculiarities by providing performance evaluation tools that
fit them well.
Distributed Transactional Memory for Clusters and Grids
This talk will present distributed transactional memory (DTM) experiences
from two projects and give an outlook on perspectives for DTM in cloud
computing applications. The Plurix project (1996-2004) implements a DTM
for PC clusters. The central abstraction of this language-based
operating system (OS) is a DTM storing data and code (even the OS code
and drivers are stored in the DTM). Applications are based on an
event-driven architecture where events are handled using speculative
transactions. Furthermore, we have implemented another DTM within the
Object Sharing Service (OSS) part of the XtreemOS project funded by the
EU (FP6, 2006-2010). OSS implements a shared library for the Linux-based
XtreemOS grid system with transparent data access detection, adaptive
conflict unit size management, and various commit protocols. The
algorithms use explicitly marked transaction boundaries and have been
evaluated using an MMVE, a ray tracer and micro-benchmarks. Currently, we
are adopting DTM concepts for an extended MapReduce framework and MMVEs
running in cloud environments.