Learning from Mistakes
A Comprehensive Study on
Real World Concurrency
Bug Characteristics
Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan
Zhou
Presented By Vehbi Esref Bayraktar
( aka Benjamin Turk )
Introduction
Multicore hardware has made concurrent
programs ( CP ) pervasive.
Not only for high-end servers but desktop
machines
The difficulty of hitting the entire SW
environment.
Requirement of good quality.
Introduction
Roadblocks to a correct CP
Most of programmers think
sequentially
Non-determinism makes bugs hard to
repeat.
Addressing those roadblocks will require
different efforts from multiple aspects
Efforts
Concurrency bug detection
Deadlock bugs:
Two or more operations waiting for each other to release the
acquired resources.
Non-deadlock bugs:
Data-race bugs (Order violation)
Multiple confliction accesses to one shared variable
Atomicity violation bugs
Some operations needed to be done without interruption.
Other bugs
Efforts
CP testing and model checking
Exposing SW bugs before release
Model checking refers to test automatically whether the
model meets a given specification.
Exponential interleaving space for CPs to record in most
of current testing approaches
Even testing a small representative interleaving may still
expose most of the concurrency bugs
ConTest, Application of Synchronization coverage,
PPOPP,2005
The manifestation conditions of concurrency bugs
Efforts
Concurrent programming language
design
Good programming languages help
programmers correctly express their
intentions
Transactional memory (one approach to it)
Atomicity
def transfer_money(from_account,
Consistency
to_account, amount):
Isolation with transaction():
from_account -= amount
to_account += amount
Outline
Introduction
Case Study
Bug pattern study
Bug manifestation study
Bug fix strategies study
Conclusions
CP Characteristics
Bug Pattern
Deadlock bugs
Non-deadlock bugs
Order-violation
Data race
Happened-before property
Atomicity-violation
Others
CP Characteristics
Bug Manifestation
Manifestation conditions
Bug fix strategy
Patches
Transactional memory (TM)
Sources
MySQL
Database server
Apache
Web server
Mozilla
Browser suite
Open Office
Office suite
Bug Pattern Study
Bug Pattern Study
Bug Pattern Study
Bug Pattern Study
Bug Manifestation Study
How many threads are involved?
Bug Manifestation Study
How many threads are involved?
Bug Manifestation Study
How many variables are involved?
Bug Manifestation Study
How many variables are involved?
Bug Manifestation Study
How many accesses are involved?
Bug fix Study
Fix strategies
Mistakes during bug fixing
Bug avoidance
Bug fix Study
Fix Strategies
Adding / changing locks
Condition check
Code switch
Algorithm/Data-structure design change
Others
Bug fix Study
Fix Strategies
Adding / Changing Locks
Bug fix Study
Fix Strategy
Condition check
Problems cant be solved by the same level
of thinking that created them!!!
Bug fix Study
Fix Strategies
Bug fix Study
Mistakes during bug fixing
Buggy patches
Every fix should be re-tested and analyzed.
Bug fix Study
Bug avoidance
Other Characteristics
Bug impacts
34 cause program crashes
37 of them cause program hangs
Non-determinism
Some concurrency bugs are very difficult to repeat
Mozilla and occur once in a day bug
Test cases are critical to bug diagnosis
Good test inputs for repeating bugs.
Programmers lack diagnosis tools & training
Conclusions
Comprehensive study of the real world concurrency
bugs
Give us alternative views for
Concurrency bug detection
Concurrent program testing and model checking
Concurrent programming language design
Questions
Dont hesitate to ask!!!
EXTRA - I
EXTRA-II
Actors
Bound to a single thread (at a time)
No mutable Shared State (All mutable states are private)
EXTRA-III
Languages let us express ourselves against a model!!!!
EXTRA-IV
LOCKS
Should be used to protect logical invariants, not memory
locations
Rules while using them:
Avoid them by implicit replication
Try to at most one lock at a time : never call other people's code
while holding a lock.
Always acquire locks on in the same order (Deadlock problem)
Stratify locks, Assign each lock a level such that 2 locks on the same level
are never locked at the same time, then always acquire locks in level order.
Sort the locks to be locked
Back off : acquiring a set of locks, if any lock cant be acquired
immediately, release all locks already acquired.
Dont contend on a lock a lot, Strangled Scaling
EXTRA-V
Snapshot means:
Write any array element
Read multiple array elements atomically
EXTRA-VI
Wait-Free : each method call takes a finite number of steps
to finish
Nicer guarantee, no starvation
Lock-free : infinitely often some method call finishes
Practically good enough
Gambling on non-determinism
Consensus is the synchronization power
Universal, meaning from n-thread consensus
Wait-free/Lock-free
Linearizable
N-threaded
Implementation
Of any sequentially specified object