E. Douglas  Jensen
  • Boston, MA
  • +1 508 728 0809
This paper compares several parametric and adaptive failure detection schemes in terms of their respective QoS. We introduce improvements over existing methods and evaluate their benefits. First, we propose an optimization to enhance the adaptation of Chen's FD, which significantly improves QoS, especially in the aggressive range and when the network is unstable. Second, we address the problem of
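For readers unfamiliar with Chen's FD, a minimal sketch of its basic heartbeat arrival-time estimator follows. It assumes the monitored process sends sequence-numbered heartbeats every `eta` seconds and that the detector adds a constant safety margin `alpha`; it shows only the baseline estimator, not the optimized adaptation proposed in this paper, and the class and method names are hypothetical.

```python
from collections import deque


class ChenFailureDetector:
    """Sketch of Chen's heartbeat failure detector (basic estimator only).

    eta:    heartbeat sending interval, in seconds (assumed known)
    alpha:  constant safety margin added to the expected arrival time
    window: number of most recent heartbeats used for the estimate
    """

    def __init__(self, eta: float, alpha: float, window: int = 100):
        self.eta = eta
        self.alpha = alpha
        # Sliding window of (sequence number, local arrival time) pairs.
        self.samples = deque(maxlen=window)

    def heartbeat(self, seq: int, arrival: float) -> None:
        self.samples.append((seq, arrival))

    def expected_arrival(self) -> float:
        # EA_{l+1} = (1/n) * sum(A_i - eta * s_i) + (l + 1) * eta
        n = len(self.samples)
        offset = sum(a - self.eta * s for s, a in self.samples) / n
        next_seq = self.samples[-1][0] + 1
        return offset + next_seq * self.eta

    def freshness_point(self) -> float:
        # tau_{l+1} = EA_{l+1} + alpha
        return self.expected_arrival() + self.alpha

    def suspects(self, now: float) -> bool:
        # Suspect the process if the next heartbeat has not arrived by tau.
        if not self.samples:
            return False
        return now > self.freshness_point()
```

Shrinking `alpha` makes the detector more aggressive (faster detection, more false suspicions), which is the regime in which the abstract reports the largest QoS gains.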
ABSTRACT In a decentralized network system, an authenticated node is referred to as a Byzantine node if it is fully controlled by a traitor or an adversary and can perform destructive behavior to disrupt the system. Typically, Byzantine nodes, individually or together, attack point-to-point information propagation by denying or faking messages. In this paper, we assume that Byzantine nodes can protect themselves from being identified by authentication mechanisms. We present an authentication-free, gossip-based application-level propagation mechanism called LASIRC, in which "healthy" nodes utilize Byzantine features to defend against Byzantine attacks. We show that LASIRC is robust against message-denying and message-faking attacks. Our experimental studies verify LASIRC's effectiveness.
We consider scheduling real-time distributable threads in the presence of node/link failures, message losses, and dynamic node joins and departures. We present a distributed scheduling algorithm called RTMG. The algorithm uses gossip-based communication for discovering eligible nodes. Traditionally, gossip protocols incur high message overhead; we explain why this problem is not as serious as it may seem. We present a hybrid message propagation protocol
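To make the message-overhead concern concrete, here is a minimal simulation of generic push gossip. It is not RTMG's hybrid propagation protocol, whose details are not given here; it assumes synchronous rounds, a fixed `fanout`, uniformly random peer selection, no message loss, and a single initial sender (node 0). `push_gossip` is a hypothetical helper.

```python
import random
from typing import Tuple


def push_gossip(num_nodes: int, fanout: int, seed: int = 0,
                max_rounds: int = 1000) -> Tuple[int, int, int]:
    """Simulate push gossip from node 0.

    Returns (rounds executed, messages sent, nodes informed).
    Assumes fanout <= num_nodes.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = messages = 0
    while len(informed) < num_nodes and rounds < max_rounds:
        rounds += 1
        newly_informed = set()
        # Every node informed at the start of the round pushes to `fanout`
        # peers chosen uniformly at random (iterated in sorted order so the
        # run is reproducible for a given seed).
        for node in sorted(informed):
            peers = rng.sample(range(num_nodes), fanout)
            messages += fanout
            newly_informed.update(p for p in peers if p != node)
        informed |= newly_informed
    return rounds, messages, len(informed)
```

In this model each round costs (number of informed nodes) × `fanout` messages, so most of the traffic is redundant pushes to nodes that already have the message; that redundancy is the overhead a hybrid protocol would aim to reduce.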
ABSTRACT We consider scheduling distributable real-time threads that are subject to dependencies (e.g., due to mutual exclusion constraints) in ad hoc networks, in the presence of node and link failures, message losses, and dynamic node joins and departures. We present a gossip-based distributed scheduling algorithm, called RTG-D. We prove that thread blocking times under RTG-D are probabilistically bounded, thereby probabilistically bounding the satisfaction of thread time constraints. Our simulation results validate RTG-D's effectiveness.
Session summary
High availability, sometimes referred to as fault tolerance, can be considered to comprise several classes of activities, e.g., fault detection, fault diagnosis, fault confinement, fault recovery, fault repair, fault reporting, and restart (if necessary). This topic has been the focus of much research in centralized computer systems and, more recently, in the context of distributed systems such as networks and distributed computers. The strong interest in high availability and fault tolerance in distributed systems stems not just from their inherently greater fault susceptibility in certain ways (e.g., data inconsistency) but also from their potential for improved availability over centralized systems (e.g., physical isolation). However, in general the user of a distributed system takes neither of these perspectives: (s)he has a need which seems best filled by a distributed system, and must overcome availability obstacles and take advantage of availability opportunities. In this session, four invited speakers addressed the topic from different viewpoints. The first two speakers presented projects where the principal mechanism for achieving fault tolerance is atomic transactions supported in the kernel. The third speaker discussed language-based tools for dynamic reconfiguration of distributed systems. The last speaker presented the fault-tolerance aspects of a network operating system based on actors, messages, and ports. These four presentations are briefly synopsized below.
ArchOS
E. Douglas Jensen (Carnegie-Mellon University, USA) outlined a large and long-term project performing research on "decentralized computers", in which a system-wide but physically replicated OS manages all the global resources through teams which negotiate, compromise, and reach a best-effort consensus based on inaccurate and incomplete information.
This is supported by a general atomic transaction facility in each instance of the kernel, which provides "compound" nonserializable transactions on distributed objects and "failure safety" (both of which are supported by a new formal theory of consistency and correctness), as well as the conventional nested serializable transactions and failure atomicity, which are special cases. Separate prototypes of the transaction kernel and a best-effort resource management kernel (initially confined to time-driven placement and scheduling of real-time processes) are expected to be operational on approximately ten Ethernet-connected Sun Microsystems nodes before the end of the year. Each node is a multiprocessor, to keep OS processing from burdening the application processor; special-purpose OS support hardware is being designed. A complete global decentralized operating system named ArchOS is taking the unusual approach (for a research project) of proceeding through all the …
We consider the problem of scheduling exception handlers in real-time systems that operate under runtime uncertainties, including those on execution times, activity arrivals, and failure occurrences. The application/scheduling model includes activities and their exception handlers that are subject to time/utility function (TUF) time constraints and a utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of “best-effort”, where high-importance activities are always favored over low-importance ones, irrespective of activity urgency. (This is in contrast to classical admission control models, which favor feasible completion of admitted activities over admitting new ones, irrespective of activity importance.) We consider a transactional-style activity execution paradigm, where handlers that are released when their activities fail (e.g., due to time constraint violations) abort the failed activities after performing recovery actions. We present a scheduling algorithm called Handler-assured Utility accrual Algorithm (or HUA) for scheduling activities and their handlers. We show that HUA’s properties include bounded-time completion for handlers and bounded loss of the best-effort property. Our implementation experience on an RTSJ (Real-Time Specification for Java) Virtual Machine demonstrates the algorithm’s effectiveness.
A New Generation Real-Time Decentralized Operating System. E. Douglas Jensen, Concurrent Computer Corporation, Westford, MA ... References: Northcutt, J. D., Clark, R. K., Shipman, S. E., Maynard, D. P., Lindsay, D. C., Jensen, E. D., Smith, J. M., ...
Alpha is a non-proprietary experimental operating system kernel which extends the real-time domain to encompass distributed applications, such as for telecommunications, factory automation, and defense. Distributed real-time systems are inherently asynchronous, dynamic, and non-deterministic, and yet are nonetheless mission-critical. The increasing complexity and pace of these systems preclude the historical reliance solely on human operators for assuring system dependability under uncertainty. Traditional real-time OS technology is based on attempting to assert or impose determinism of not just the ends but also the means, for centralized low-level sampled-data monitoring and control, with an insufficiency of hardware resources. Conventional distributed OS technology is primarily based on two-party client/server hierarchies for explicit resource sharing in networks of autonomous users. These two technological paradigms are special cases which cannot be combined and scaled up cost-effectively to accommodate distributed real-time systems. Alpha’s new paradigm for real-time distributed computing is founded on best-effort management of all resources directly with computation completion time constraints which are expressed as benefit functions; and multiparty, peer-structured, trans-node computations for cooperative mission management.
ABSTRACT In this paper, we present a reliable real-time data delivery (communication) mechanism for ad hoc networks, called RTRD. The mechanism uses a proactive wireless routing protocol (DSDV) for path finding and maintenance, and delivers data in a timely manner through a priori bandwidth reservation. In addition, to be robust to network failures, or to deliver large data chunks, it delivers data over multiple paths simultaneously. Simulation results obtained with NS-2 validate RTRD's effectiveness.
ABSTRACT The desirability of increased synergism between the hardware and software of computer systems has become a cliché, but unfortunately without being significantly reflected in practice. One of the principal aspects of our research in distributed computer systems has been to actually apply these arguments in the implementations and explore their ramifications. The examples herein were largely derived from that experience.
The benefit accrual model, a framework for specifying, attaining, and evaluating timeliness, is presented. It generalizes the traditional special cases of deadlines as time constraints and unanimous optimum as the scheduling criterion. Consequently, this framework scales to encompass nondeterministic asynchronous decentralized systems.
ABSTRACT Work is beginning on the Distributed Real-Time Specification for Java, as part of Sun's Java Community Process. This paper summarizes some ideas about an initial approach to the specification. The approach is based on providing a natural and minimal mechanistic extension to Remote Method Invocation (RMI) to support the end-to-end timeliness (and other) properties of distributed (in the sense of trans-node) behaviors. These timeliness properties must be preserved for any distributed real-time computing system, regardless of its application programming model, whether RPC, mobile objects, or any other. The proposed extension also facilitates real-time distributed programming with control flow programming models in particular. A similar facility has proven effective in several other distributed real-time operating system and middleware contexts, and is a primary feature of the unified proposal to OMG for Dynamic Real-Time CORBA.


Updated May 24, 2020

A Time/Utility (née Time/Value) Function (TUF) specifies an action's (e.g., task's) application-specific utility as a function of its completion time (C). By convention, a TUF is concave. It has a critical time (even if it is linear) after which its utility does not increase. A conventional deadline (d) is a simple special case: a downward step TUF having utility values {1, 0}. More generally, a TUF permits downward (and upward) step functions to have any appropriate utilities {u1, u2}. Tardiness is a simple special case whose non-zero utility is the linear function C - d. More generally, a TUF allows non-zero earliness and tardiness to be non-linear. Thus, one useful interpretation of utility can be timeliness, providing a rich generalization of traditional action completion time constraints in real-time systems. TUF utility may include negative values. TUFs and their utility scales and values are derived from domain-specific subject matter knowledge. The optimality criteria for scheduling TUFs are maximal utility accrual (UA), which can be interpreted as actions' collective timeliness, and predictability of that accrued utility (while respecting dependencies and resource constraints). The scheduler performs application-specific trade-offs between accrued utility and its predictability. The TUF/UA paradigm is intended for (but not limited to) open-world systems, so imperfections in the scheduling parameters are inevitable. Some of these imperfections may be amenable to stochastic scheduling. Others are too major and complex for orthodox (e.g., additive) probability theory. Imprecision may call for using fuzzy set theory. Epistemic uncertainties (e.g., ignorance of, or conflicts among, scheduling parameters) may be present and require an appropriate kind and degree of resilience in the UA algorithmic techniques.
Thus, UA schedulers may base utility accrual and its predictability on more general uncertainty models, such as a (potentially "fuzzified" version of a) belief-based theory (e.g., the Transferable Belief Model). Online UA scheduling efficiency is often enhanced by implementing the scheduler in hardware (e.g., custom RISC-V processors, GPUs, FPGAs, ASICs). The TUF/UA paradigm has been particularly successful in military combat systems, because of the extreme uncertainties in those environments.
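The TUF shapes described above can be sketched as plain functions from completion time C to utility. This is an illustrative sketch only: the helper names are hypothetical, treating completion exactly at the critical time as "on time" is an assumption, and real TUFs are derived from application-specific knowledge rather than chosen from a fixed menu.

```python
from typing import Callable

# A TUF maps an action's completion time C to an application-specific utility.
TUF = Callable[[float], float]


def deadline_tuf(d: float) -> TUF:
    """Conventional deadline: a downward step TUF with utility values {1, 0}."""
    return lambda c: 1.0 if c <= d else 0.0


def step_tuf(critical_time: float, u1: float, u2: float) -> TUF:
    """Generalized step: utility u1 up to the critical time, u2 afterwards.

    u2 < u1 gives a downward step and u2 > u1 an upward one; either may be negative.
    """
    return lambda c: u1 if c <= critical_time else u2


def linear_decay_tuf(critical_time: float, u_max: float, slope: float) -> TUF:
    """Concave TUF: constant until the critical time, then decreasing linearly.

    slope is assumed non-negative, so utility never increases after the
    critical time and becomes negative (a penalty) for very late completion.
    """
    return lambda c: u_max if c <= critical_time else u_max - slope * (c - critical_time)
```

For example, `linear_decay_tuf(5.0, 10.0, 2.0)` yields utility 10 for any completion up to t = 5, 4 at t = 8, and a penalty of -20 at t = 20.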
This Introduction describes the TUF/UA paradigm and its default system model in more detail than previously documented. The default system model is based on the author's extensive experience with applications which use this paradigm, and on generality. Any instantiation of the paradigm is application-specific and normally has a system model different from (a subset of) the default one. The default system model also serves to illuminate research opportunities in this area. This paradigm has the side effect of highlighting the greater generality and applicability of "real-time" than is conventionally perceived, especially by the computing community. A Time/Utility Function (TUF), originally [Jensen 77] [Jensen+ 85] (and still often by others) called a Time/Value Function, expresses the timeliness (both urgency and worth) of completing an action (such as a computational task or a device physical motion) in an application-specific utility ratio scale [Prasad+ 03], as an application-specific function of when that action's operation completes. TUFs and their utility scales and values are representations derived from system- and application-specific subject matter knowledge (e.g., [Clark+ 99] [Theys+ 91]).
Extended Abstract (with References)

This Introduction describes the TUF/UA paradigm and its default system model in more detail than previously documented. The default system model is based on application experience and generality. Any instantiation of the paradigm is application-specific and normally has a system model different from (a subset of) the default one. This paradigm has the side effect of illuminating the greater generality and applicability of "real-time" than is conventionally perceived.

A Time/Utility Function (TUF), originally [Jensen 77] [Jensen+ 85] (and still often by others) called a Time/Value Function, expresses the timeliness (both urgency and worth) of completing an action (such as a computational task or a device physical motion) in an application-specific utility ratio scale [Prasad+ 03], as an application-specific function of when (usually* in physical time) that action's operation completes. TUFs and their utility scales and values are derived from system- and application-specific subject matter knowledge (e.g., [Clark+ 99]). A framework for assigning TUF utility values has been proposed for a class of very limited cases [Burns+ 2000], but most cases necessitate specialized tools and techniques (e.g., [Lee+ 07])*.

Utility functions are normally non-convex, i.e., concave or linear (cf. Figure 2 in [Jensen 04]). Constant ones can either represent priority or be one way to represent relative importance. A (non-constant) TUF may have a "critical time" (a deadline is a special case), after which its utility does not increase [Locke 86]. A utility function's range may include negative values (penalties). A TUF's shape may be dynamically adapted, e.g., at an action or application mode change (such as for ballistic missile flight phases) [Maynard+ 88]; cf. Figures 8 and 9 in [Jensen 04].

Actions which are being scheduled collectively may be any mix of aperiodic, sporadic, and periodic, but periodic actions do not receive traditional preferential treatment.
Collectively scheduled actions (or perhaps a subset of them) are made to have consonant scales. At each of one or more scheduling instants, the scheduling algorithm considers the TUFs of all ready actions (and other system model information about the current situation), and schedules a set (from one to all) of those actions for operation. Specific UA scheduling algorithms are devised to provide specific acceptable timeliness and predictability (real-time QoS) assurances for specific system models and application situations.

The scheduling is primarily according to two application-specific real-time QoS criteria: Utility Accrual (UA), which is the (commonly, expected) polynomial sum of the actions' collective and individual utilities; and the (usually non-deterministic) predictability of that sum (i.e., of system* timeliness). Thus, UA scheduling is generally not greedy (and intentionally not fair), so actions may be either preemptible or not. A schedule normally also considers actions' constraints, such as precedence and resource dependencies [Clark 90] [Li+ 06], and properties such as energy [Wu 05] and relative importance.

In this paradigm, importance is application-specific (e.g., in terms of track quality or weapon spherical error probable) and dynamic*. It is distinct from, and may be orthogonal to, an action's timeliness (urgency), both
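The UA criterion can be illustrated with a deliberately naive sketch: on a single processor, with non-preemptive actions, known execution times, and accrued utility taken as the plain sum of each action's TUF value at its completion time, exhaustively search all orderings for the one with maximal accrued utility. These assumptions are ours; the sketch ignores predictability, dependencies, and importance, is exponential in the number of actions, and is not any published UA algorithm. `best_ua_schedule` is a hypothetical name.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

# Maps completion time C to utility.
TUF = Callable[[float], float]


def best_ua_schedule(actions: Dict[str, Tuple[float, TUF]],
                     start: float = 0.0) -> Tuple[List[str], float]:
    """Return the ordering of ready actions that maximizes accrued utility.

    actions maps an action name to (execution_time, tuf). Assumes one
    processor, no preemption, and exactly known execution times.
    """
    best_order: List[str] = []
    best_utility = float("-inf")
    for order in permutations(actions):
        t, accrued = start, 0.0
        for name in order:
            exec_time, tuf = actions[name]
            t += exec_time          # completion time of this action
            accrued += tuf(t)       # utility accrued at that completion time
        if accrued > best_utility:
            best_order, best_utility = list(order), accrued
    return best_order, best_utility
```

With action A (execution time 3, utility 10 if it completes by t = 3, else 0) and action B (execution time 1, utility 1 if it completes by t = 1, else 0), an earliest-deadline-first order runs B first and accrues only 1, whereas the search above picks A then B and accrues 10: favoring the more important action over the more urgent one, as the best-effort notion described in this section requires.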
ABSTRACT

Informally, a system is a “real-time” one if its core properties of timeliness and predictability of timeliness are integral to its logic, not only performance measures. In general, those properties are dynamic due primarily to intrinsic epistemic uncertainties (e.g., ignorance, inaccuracy, non-determinism) in the system and its application environment. Despite such uncertainties, dynamically real-time systems have multiple application-specific kinds and degrees of criticality, including even the most extreme safety-critical systems (e.g., for warfare). Traditional real-time computing systems are a special case whose core properties are predominantly static and presumed to be known a priori, thus greatly limiting those systems’ applicability. Many dynamically real-time systems exist, built by application domain experts outside the real-time field, but without the benefits of a coherent foundation for real-time per se. The design, implementation, and application of real-time systems can be extended and strengthened by creating such a foundation based on principles for those core properties. This book introduces one approach to that. Timeliness can be expressed dynamically using time/utility functions. Uncertainty of timeliness predictability can be reasoned about using various mathematical theories of evidence (belief functions). The foundation has been successfully employed in different real-time contexts having wickedly dynamic core properties, where traditional static real-time perspectives and techniques were insufficient.
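As a small illustration of reasoning with belief functions, the sketch below uses the classic Dempster-Shafer definitions on a hypothetical two-element frame {on_time, late} for one action's completion. Mass assigned to the whole frame expresses ignorance, so belief and plausibility bound the degree of support for "on time" from below and above. The frame, the helper names, and the choice of Dempster's normalized combination rule are our assumptions; other theories of evidence (for example, the Transferable Belief Model) combine evidence differently, e.g., by keeping conflicting mass on the empty set.

```python
from typing import Dict, FrozenSet

# A mass function assigns belief mass to subsets (focal sets) of the frame.
Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    # Bel(A): total mass committed to non-empty subsets of A.
    return sum(v for s, v in m.items() if s and s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    # Pl(A): total mass of focal sets that intersect A (not contradicting A).
    return sum(v for s, v in m.items() if s & hypothesis)


def combine_dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: conjunctive combination, then renormalize by 1 - conflict."""
    combined: Mass = {}
    conflict = 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```

For instance, with masses 0.6 on {on_time}, 0.1 on {late}, and 0.3 on {on_time, late}, belief in on-time completion is 0.6 and its plausibility is 0.9; the gap of 0.3 is the unresolved ignorance a scheduler would have to be resilient to.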
This is the Introduction page of my preview of a work-in-progress book, "Introduction to Fundamental Principles of Dynamically Real-Time Systems" [Jensen 2018]. The Abstract page is in another of my sessions here.