Software Technologies For Developing Distributed Systems: Objects and Beyond
Software Technologies For Developing Distributed Systems: Objects and Beyond
Douglas C. Schmidt Vanderbilt University Abstract A distributed system is a computing system in which a number of components cooperate by communicating over a network. The explosive growth of the Internet and the World Wide Web in the mid-1990's moved distributed systems beyond their traditional application areas, such as industrial automation, defense, and telecommunication, and into nearly all domains, including e-commerce, financial services, health care, government, and entertainment. This article describes the key characteristics and challenges of developing distributed systems and evaluates key software technologies that have emerged to resolve these challenges, including distributed object computing middleware, component middleware, publish/subscribe and message-oriented middleware, and web services.
1 Introduction Computer software traditionally ran in stand-alone systems, where the user interface, application business processing, and persistent data resided in one computer, with peripherals attached to it by buses or cables. Few interesting systems, however, are still designed this way. Instead, most computer software today runs in distributed systems, where the interactive presentation, application business processing, and data resources reside in looselycoupled computing nodes and service tiers connected together by networks. Despite the increasing ubiquity and importance of distributed systems, however, developers of software for distributed systems face a number of hard challenges [POSA2], including: Inherent complexities, which arise from fundamental domain challenges: E.g., components of a distributed system often reside in separate address spaces on separate nodes, so inter-node communication needs different mechanisms, policies, and protocols than those used for intra-node communication in a stand-alone systems. Likewise, synchronization and coordination is more complicated in a distributed system since components may run in parallel and network communication can be asynchronous and non-deterministic. The networks that connect components in distributed systems introduce additional forces, such as latency, jitter, transient failures, and overload, with corresponding impact on system efficiency, predictability, and availability [VKZ04]. Accidental complexities, which arise from limitations with software tools and development techniques, such as nonportable programming APIs and poor distributed debuggers. Ironically, many accidental complexities stem from deliberate choices made by developers who favor low-level languages and platforms, such as C and C-based operating system APIs and libraries, that scale up poorly when applied to distributed systems. As the complexity of application requirements increases, moreover, new layers of distributed infrastructure are conceived and released, not all of which are equally mature or capable, which complicates development, integration, and evolution of working systems. Inadequate methods and techniques. Popular software analysis methods and design techniques [DWT04] [SDL05] have focused on constructing single-process, single-threaded applications with best-effort quality of service (QoS) requirements. The development of high-quality distributed systemsparticularly those with stringent performance requirements, such as video-conferencing or air traffic control systemshas been left to the expertise of skilled software architects and engineers. Moreover, it has been hard to gain experience with software techniques for distributed systems without spending much time wrestling with platform-specific details and fixing mistakes by costly trial and error. Continuous re-invention and re-discovery of core concepts and techniques. The software industry has a long history of recreating incompatible solutions to problems that have already been solved. There are dozens of general-purpose and real-time operating systems that manage the same hardware resources. Similarly, there are dozens of incompatible operating system encapsulation libraries, virtual machines, and middleware that provide slightly different APIs that implement essentially the same features and services. If effort had instead been focused on
On Distributed Systems
enhancing a smaller number of solutions, developers of distributed system software would be able to innovate more rapidly by reusing common tools and standard platforms and components.
2 Technologies for Supporting Distributed Computing To address the challenge described above, therefore, three levels of support for distributed computing were developed: ad hoc network programming, structured communication, and middleware. Ad hoc network programming includes interprocess communication (IPC) mechanisms, such as shared memory, pipes, and sockets, that allow distributed components to connect and exchange information. These IPC mechanisms help address a key challenge of distributed computing: enabling components from different address spaces to cooperate with one another. Certain drawbacks arise, however, when developing distributed systems only using ad hoc network programming support. For instance, using sockets directly within application code tightly couples this code to the socket API. Porting this code to another IPC mechanism or redeploying components to different nodes in a network thus becomes a costly manual programming effort. Even porting this code to another version of the same operating system can require code changes if each platform has slightly different APIs for the IPC mechanisms [POSA2] [SH02]. Programming directly to an IPC mechanism can also cause a paradigm mismatch, e.g., local communication uses object-oriented classes and method invocations, whereas remote communication uses the function-oriented socket API and message passing. The next level of support for distributed computing is structured communication, which overcomes limitations with ad hoc network programming by not coupling application code to low-level IPC mechanisms, but instead offering higherlevel communication mechanisms to distributed systems. Structured communication encapsulates machine-level details, such as bits and bytes and binary reads and writes. Application developers are therefore presented with a programming model that embodies types and a communication style closer to their application domain. Historically significant examples of structured communication are remote procedure call (RPC) platforms, such as Sun RPC and the Distributed Computing Environment (DCE). RPC platforms allow distributed applications to cooperate with one another much like they would in a local environment: they invoke functions on each other, pass parameters along with each invocation, and receive results from the functions they called. The RPC platform shields them from details of specific IPC mechanisms and low-level operating system APIs. Another example of structured communication is ACE [SH02] [SH03], which provides reusable C++ wrapper facades and frameworks that perform common structured communication tasks across a range of OS platforms. Despite its improvements over ad hoc network programming, structured communication does not fully resolve the challenges described above. In particular, components in a distributed system that communicate via structured communication are still aware of their peers remotenessand sometimes even their location in the network. While location awareness may suffice for certain types of distributed systems, such as statically configured embedded systems whose component deployment rarely changes, structured communication does not fulfill the following the properties needed for more complex distributed systems: Location-independence of components. Ideally, clients in a distributed system should communicate with collocated or remote services using the same programming model. Providing this degree of location-independence requires the separation of code that deals with remoting or location-specific details from client and service application code. Even then, of course, distributed systems have failure modes that local systems do not have [WWWK96]. Flexible component (re)deployment. The original deployment of an applications services to network nodes could become suboptimal as hardware is upgraded, new nodes are incorporated, and/or new requirements are added. A redeployment of distributed system services may therefore be needed, ideally without breaking code and or shutting down the entire system. Mastering these challenges requires more than structured communication support for distributed systems. Instead it requires dedicated middleware [ScSc02], which is distribution infrastructure software that resides between an application and the operating system, network, or database underneath it. Middleware provides the properties described above so that application developers can focus on their primary responsibility: implementing their domain-specific functionality. Realizing the need for middleware has motivated companies, such as Microsoft, IBM, and Sun, and consortia, such as the Object Management Group (OMG) and the World Wide Web Consortium (W3C), to develop
technologies for distributed computing. Below, we describe a number of popular middleware technologies, including distributed object computing, component middleware, publish/subscribe middleware, and service-oriented architectures and Web Services [Vin04a]. 2.1 Distributed Object Computing Middleware A key contribution to distributed system development was the emergence of distributed object computing (DOC) middleware in the late 1980s and early 1990s. DOC middleware represented the confluence of two major information technologies: RPC-based distributed computing systems and object-oriented design and programming. Techniques for developing RPC-based distributed systems, such as DCE, focused on integrating multiple computers to act as a unified scalable computational resource. Likewise, techniques for developing object-oriented systems focused on reducing complexity by creating reusable frameworks and components that reify successful patterns and software architectures. DOC middleware therefore used object-oriented techniques to distribute reusable services and applications efficiently, flexibly, and robustly over multiple, often heterogeneous, computing and networking elements. CORBA 2.x and Java RMI are examples of DOC middleware technologies for building applications for distributed systems. These technologies focus on interfaces, which are contracts between clients and servers that define a locationindependent means for clients to view and access object services provided by a server. Standard DOC middleware technologies like CORBA also define communication protocols and object information models to enable interoperability between heterogeneous applications written in various languages running on various platforms. Despite its maturity and performance, however, DOC middleware had key limitations, including: Lack of functional boundaries. The CORBA 2.x and Java RMI object models treat all interfaces as client/server contracts. These object models do not, however, provide standard assembly mechanisms to decouple dependencies among collaborating object implementations. For example, objects whose implementations depend on other objects need to discover and connect to those objects explicitly. To build complex distributed applications, therefore, application developers must explicitly program the connections among interdependent services and object interfaces, which is extra work that can yield brittle and non-reusable implementations. Lack of software deployment and configuratoin standards. There is no standard way to distribute and start up object implementations remotely in DOC middleware. Application administrators must therefore resort to in-house scripts and procedures to deliver software implementations to target machines, configure the target machine and software implementations for execution, and then instantiate software implementations to make them ready for clients. Moreover, software implementations are often modified to accommodate such ad hoc deployment mechanisms. The need of most reusable software implementations to interact with other software implementations and services further aggravates the problem. The lack of higher-level software management standards results in systems that are harder to maintain and software component implementations that are much harder to reuse. 2.2 Component Middleware Starting in the mid to late 1990s, component middleware emerged to address the limitations of DOC middleware described above. In particular, to address the lack of functional boundaries, component middleware allows a group of cohesive component objects to interact with each other through multiple provided and required interfaces and defines standard runtime mechanisms needed to execute these component objects in generic applications servers. To address the lack of standard deployment and configuration mechanisms, component middleware specifies the infrastructure to package, customize, assemble, and disseminate components throughout a distributed system. Enterprise JavaBeans and the CORBA Component Model (CCM) are examples of component middleware that define the following general roles and relationships: A component is an implementation entity that exposes a set of named interfaces and connection points that components use to collaborate with each other. Named interfaces service method invocations that other components call synchronously. Connection points are joined with named interfaces provided by other components to associate clients with their servers. Some component models also offer event sources and event sinks, which can be joined together to support asynchronous message passing.
On Distributed Systems
A container provides the server runtime environment for component implementations. It contains various predefined hooks and operations that give components access to strategies and services, such as persistence, event notification, transaction, replication, load balancing, and security. Each container defines a collection of runtime strategies and policies, such as transaction, persistence, security, and event delivery strategies, and is responsible for initializing and providing runtime contexts for the managed components. Component implementations often have associated metadata written in XML that specify the required container strategies and policies. In addition to the building blocks outlined above, component middleware also typically automates aspects of various stages in the application development lifecycle, notably component implementation, packaging, assembly, and deployment, where each stage of the lifecycle adds information pertaining to these aspects via declarative metadata. These capabilities enable component middleware to create applications more rapidly and robustly than their DOC middleware predecessors. 2.3 Publish/Subscribe and Message-Oriented Middleware RPC platforms, DOC middleware, and component middleware are largely based on a request/response communication model, where requests flow from client to server and responses flow back from server to client. Certain types of distributed applications, particularly those that react to external stimui and events, such as control systems and online stock trading systems, are not well-suited certain aspects of the request/response communication model. These aspects include synchronous communication between the client and server, which can underutilize the parallelism available in the network and endsystems, designated communication, where the client must know the identity of the server, which tightly couples it to a particular recipient, and point-to-point communication, where a client talks with just one server at a time, which can limit its ability to convey its information to all interested recipients. An alternative approach to structuring communication in certain types of distributed systems is therefore to use message-oriented middleware, which is supported by IBMs MQ Series, BEAs MessageQ, and TIBCOs Rendezvous, or publish/subscribe middleware, which is supported by the Java Messaging Service (JMS), the Data Distribution Service (DDS), and WS-NOTIFICATION. The main benefits of message-oriented middleware include its support for asynchronous communication, where senders transmit data to receivers without blocking to wait for a response. Many message-oriented middleware platforms provide transactional properties, where messages are reliably queued and/or persisted until consumers can retrieve them. Publish/subscribe middleware augments this capability with anonymous communication, where publishers and subscribers are loosely coupled and thus do not know about each other existence since the address of the receiver is not conveyed along with the event data, and group communication, where multiple subscribers can receive events sent by a publisher. The elements of publish/subscribe middleware are separated into the following roles: Publishers are sources of events, that is, they produce events on certain topics that are then propagated through the system. Depending on architecture implementation, publishers may need to describe the type of events they generate a priori. Subscribers are the event sinks of the system, that is, they consume data on topics of interest to them. Some architecture implementations require subscribers to declare filtering information for the events they require. Event channels are components in the system that propagate events from publishers to subscribers. These channels can propagate events across distribution domains to remote subscribers. Event channels can perform various services, such as filtering and routing, QoS enforcement, and fault management. The events passed from publishers to consumers can be represented in various ways, ranging from simple text messages to richly-typed data structures. Likewise, the interfaces used to publish and subscribe the events can be generic, such as send and recv methods that exchange arbitrary dynamically typed XML messages in WSNOTIFICATION, or specialized, such as a data writer and data readers that exchange statically typed event data in DDS. 2.4 Service-Oriented Architectures and Web Services Service-Oriented Architecture (SOA) is a style of organizing and utilizing distributed capabilities that may be controlled by different organizations or owners. It therefore provides a uniform means to offer, discover, interact with
and use capabilities of loosely coupled and interoperable software services to support the requirements of the business processes and application users. The ubiquity of the World Wide Web (WWW) and the lessons learned from earlier forms of middleware were leveraged to create SOAP, which is a protocol for exchanging XML-based messages over a computer network, normally using HTTP. The introduction of SOAP spawned a popular new variant of SOA called Web Services that is being standardized by the World Wide Web Consortium (W3C). Web Services allow developers to package application logic into services whose interfaces are described with the Web Service Description Language (WSDL). WSDL-based services are often accessed using standard higher-level Internet protocols, such as SOAP over HTTP. Web Services can be used to build an Enterprise Service Bus (ESB), which is a distributed computing architecture that simplifies interworking between disparate systems. Mule and Celtix are open-source examples of the ESB approach to melding groups of heterogeneous systems into a unified distributed application. Despite some highly publicized drawbacks [Bell06] [Vin04b], Web Services have established themselves as the technology of choice for most enterprise business applications. This does not mean, however, that Web Services will completely displace earlier middleware technologies, such as EJB and CORBA. Rather, Web Services complements these earlier successful middleware technologies and provides standard mechanisms for interoperability. For example, the Microsoft Windows Communication Foundation (WCF) platform and the Service Component Architecture (SCA) being defined by IBM, BEA, IONA, and others combine aspects of component-based development and Web technologies. Like components, WCF and SCA platforms provide black-box functionality that can be described and reused without concern for how a service is implemented. Unlike traditional component technologies, however, WCF and SCA are not accessed using the object model-specific protocols defined by DCOM, Java RMI, or CORBA. Instead, Web services are accessed using Web protocols and data formats, such as HTTP and XML, respectively. Rather than trying to replace older approaches, todays Web Services technologies are instead focusing on middleware integration, thereby adding value to existing middleware platforms. WSDL allows developers to abstractly describe Web Service interfaces while also defining concrete bindings, such as the protocols and transports required at runtime to access the services. By providing these common communication mechanisms between diverse middleware platforms, Web Services allow component reuse across an organizations entire application set, regardless of their implementation technologies. For example, projects such as the Apache Web Services Invocation Framework (WSIF) [Apache06], Mule, and CeltiXfire, aim to allow applications to access Web Services transparently via EJB, JMS, or the SCA. This move towards integration allows services implemented in these different technologies to be integrated into an ESB and made available to a variety of client applications. Middleware integration is thus a key focus of Web Services applications for the foreseeable future [Vin03]. By focusing on integration, Web Services increases reuse and reduces middleware lock-in, so developers can use the right middleware to meet their needs without precluding interoperability with existing systems.
3 Understanding Distributed Systems Software Technologies via Patterns Although the various middleware technologies described in Section 2 differ widely in their programming interfaces and language mappings they share many of the same patterns [VKZ04]. Design-focused patterns provide a vocabulary for expressing architectural visions, as well as examples of representative designs and detailed implementations that are clear and concise. Presenting pieces of software in terms of their constituent patterns also allows developers to communicate more effectively, with greater conciseness and less ambiguity. Distributed computing has been a popular focus for pattern authors for many years. For example, [POSA2] and [VKZ04] present collections of patterns for developing distributed object computing middleware. Likewise, [HOHPE03] and [FOW02] present collections of patterns for enterprise message-oriented middleware and serviceoriented architectures. Most recently, [POSA4] has captured an extensive pattern language for building distributed software systems that connects over 250 patterns addressing topics ranging from defining and selecting an appropriate baseline architecture and communication infrastructure, to specifying component interfaces, their implementations, and their interactions. Together, the patterns covered in these books address key technical aspects of distributed computing,
On Distributed Systems
such as adaptation and extension, concurrency, database access, event handling, synchronization, and resource management. As software is integrated into mission-critical systems there is an increasing need for robust techniques to meet user dependability requirements. Patterns on fault tolerance and fault management have therefore been an active focus over the past decade. Several recent books [UTAS05] [HAN07] contain patterns and pattern languages that address fault tolerance and fault management for systems with stringent operational requirements. Likewise, developing high-quality distributed real-time and embedded (DRE) systems that provide predictable behavior in networked environments is also increasingly crucial to support mission-critical systems. Patterns that guide the development of resource management algorithms and architectures for DRE software appear in [DIP07] and [POSA3].
4 Concluding Remarks Software for distributed systems has historically been developed largely from scratch. This development process has been applied many times in many companies, by many projects in parallel. Even worse, it has been applied by the same teams in a series of projects. Regrettably, this continuous rediscovery and reinvention of core concepts and code has kept costs unnecessarily high throughout the software development life cycle. This problem is exacerbated by the diversity of todays hardware, operating systems, compilers, and communication platforms, which keep shifting the foundations of software development for distributed systems. In todays competitive, time-to-market-driven environments, it is increasingly infeasible to develop custom solutions manually from scratch. Such solutions are hard to customize and tune, because so much effort is spent just trying to make the software operational. Moreover, as requirements change over time, evolving custom software solutions becomes prohibitively expensive. End-users expector at least desiresoftware to be affordable, robust, efficient, and agile, which is hard to achieve without the solid architectural underpinnings achievable via systematic reuse of the middleware technologies described in this article. The past decade has yielded significant progress in the reuse of software for distributed systems stemming from the systematic documentation of patterns and pattern languages that help simplify the development and use of middleware.
Acknowledgements Thanks to Frank Buschmann, Krishnakumar Balasubramanian, Kevlin Henney, and Steve Vinoski for providing insightful comments that helped improve this article.
References [Bell06] A.E. Bell: Software Development Amidst the Whiz of Silver Bullets..., ACM Queue vol. 4, no. 5 - June 2006 {DIP07] Dipippo et al, Design Patterns for Distributed Real-Time Embedded Systems, Springer 2007 [DWT04] A. Dennis, B. Haley Wixom, D. Tegarden: Systems Analysis and Design with UML Version 2.0: An ObjectOriented Approach, John Wiley & Sons, 2004 [FOW02] M. Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley 2002 [GoF95] E. Gamma, et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addision Wesley, 1995 [HAN07] R.S. Hanmer: Patterns For Fault Tolerant Software, John Wiley and Sons, 2007 [HOHPE03] G. Hohpe, et al., Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Addison-Wesley 2003 [POSA2] D.C. Schmidt, M. Stal, H. Rohnert, F. Buschmann: Pattern-Oriented Software ArchitecturePatterns for Concurrent and Networked Objects, John Wiley and Sons, 2000 [POSA3] M. Kircher and P. Jain: Pattern-Oriented Software ArchitecturePatterns for Resource Management, John
Wiley and Sons, 2003 [POSA4] F. Buschmann, K. Heneny, D.C. Schmidt: Pattern-Oriented Software ArchitectureA Pattern Language for Distributed Computing, John Wiley and Sons, 2007 [ScSc02] R.E. Schantz, D.C. Schmidt: Middleware for Distributed Systems: Evolving the Common Structure for Network-centric Applications, Encyclopedia of Software Engineering, J. Marciniak, G. Telecki (eds.), John Wiley & Sons, New York, 2002 [SDL05] A. Prinz, R. Reed, J. Reed (edts.): Conference Proceedings of SDL 2005: Model Driven: 12th International SDL Forum, Grimstad, Norway, June 20-23, 2005, Proceedings, Springer Verlag, 2005 [SH02] D.C. Schmidt, S.D. Huston: C++ Network Programming, Volume 1: Mastering Complexity with ACE and Patterns, Addison Wesley, 2002 [SH03] D.C. Schmidt, S.D. Huston: C++ Network Programming, Volume 2: Systematic Reuse with ACE an Frameworks, Addison Wesley, 2003 [UTAS05] G. Utas, Robust Communications Software: Extreme Availability, Reliability and Scalability for CarrierGrade Systems, Wiley 2005 [VKZ04] M. Vlter, M. Kircher, U. Zdun: Object-Orieted Remoting Patterns Foundations of Real-Time, Internet, and Enterprise Distribution Infrastructures, John Wiley and Sons, 2004 [Vin03] S. Vinoski: Toward Integration with Web Services, IEEE Internet Computing, November / December 2003, IEEE, 2003 [Vin04a] S. Vinoski: An Overview of Middleware, 9th International Conference on Reliable Software Technologies Ada-Europe 2004, Palma de Mallorca, 14-18 June 2004 [WWWK96] J. Waldo, G. Wyant, A. Wollrath, S.C. Kendall (eds.): A Note On Distributed Computing, Lecture Notes of Computer Science, vo. 1222, Springer Verlag, 1996
About the Author Dr. Douglas C. Schmidt is a Professor of Computer Science at Vanderbilt University. His expertise focuses on distributed computing middleware, object-oriented patterns and frameworks, and distributed real-time and embedded (DRE) systems. He has authored over 400 publications in IEEE, ACM, IFIP, and USENIX technical journals and conferences, and 9 books that cover high-performance communication software systems, real-time distributed computing, and object-oriented patterns for concurrent and distributed systems.