Until recently there has been little research effort directed towards software deployment. Software deployment includes activities such as releasing, configuring, installing, updating, adapting, de-installing, and even de-releasing a software system. Modern software systems are increasing the complexity of these tasks as more sophisticated architectural models, such as systems of systems and coordinated distributed systems, become commonplace. Our research focuses on two main areas of support for software deployment. The first is the definition and implementation of an architecture that supports software deployment activities in a global, uniform manner. The second is the use of this architecture to create standard, generic schemas and processes for performing software deployment activities. This standardization will allow software producers to deploy their software more fully and with less effort than current ad hoc methods allow. The goal of this research is to make software deployment an integral part of software development.
Traditional configuration management systems have focused on development activities, such as source code control, and until recently few mechanisms were in place to support software deployment activities. Solutions to the most common deployment activities, configuring and installing a software system, have received the most development effort, but these efforts have failed to produce an adequate general-purpose solution. One factor contributing to this failure is that the information required to configure and install various software systems at a particular site is generally not accessible, complete, or accurate. Even when the information is available, nonstandard methods of accessing it make it difficult to configure the system being deployed automatically. Tools such as Autoconf [12] and Ship [6] attempt to obtain the information on a per-installation, ad hoc basis by using scripts and heuristics. The Microsoft Registry [9] stores some configuration information at a site, but that information is only partially standardized and is specific to Microsoft Windows. Compounding these configuration and installation problems is the fact that many development organizations do not make their software system's resource dependencies an explicit part of the system's definition. As a result, deployment of the software system fails, completely or partially, because there is no way to ensure that the resource dependencies are met. This type of failure can be characterized as the "missing component" problem.
The introduction of new software systems and the continual re-engineering of old software systems to take advantage of, or adhere to, component technologies such as CORBA [18] or JavaBeans [24] will make the "missing component" problem more common. The personal computing notion that software systems are self-contained, such that copies of all components needed for an installation are included on a CD-ROM, is overly simplistic. For example, the plug-ins and helper applications used with Web browsers are not components of the browsers, but independently developed and maintained systems in their own right. Even if one could construct a monolithic installation, this approach still fails when components are shared among systems and the versions of those shared components may not be consistent.
Support for deployment activities other than configuration and installation is essentially nonexistent. The installed system typically becomes a static entity, detached from its producers and poorly understood by its consumers. For example, a consumer might modify the configuration of his site in a way that prevents an existing software system at the site from operating properly. The consumer has no way to determine how his changes may affect the software systems at his site. This difficulty arises because constraints and dependencies are not explicit, and the software is not able to adapt automatically. Instead, as the environment changes, it becomes the consumer's burden to ensure that the system continues to function properly. As enhancements and bug fixes are released, there is no standard way for the consumer to become aware of them, to upgrade the installed system automatically, or even to locate the installed system.
Clearly, there is a need to bring all of the deployment activities under the umbrella of software development and configuration management. This can be accomplished by developing a powerful new generation of configuration management technologies that account for post-development activities. To be effective, these new technologies must:
As a contribution to the new generation of configuration management technologies, we propose the Software Dock, an architecture for supporting the software deployment process. By analogy to the hardware docking stations used with portable notebook computers, the Software Dock provides a context in which to situate a software system at a site. As with its hardware counterpart, "docking" a software system involves protocols for interrogating the local environment for its properties and adapting the software to that environment. A significant difference from hardware docking, however, is that both the environment and the software system are malleable. For example, if a required component is not found at a site, then that component can be added dynamically to the site to satisfy the needs of the software being docked. Alternatively, a more appropriate version of the software system itself can be obtained dynamically and installed. This allows installation to become a process of negotiation between a producer and a consumer. Moreover, the mutual adaptation process can continue beyond the initial docking to provide a perpetually evolving combination of system and environment.
The Software Dock is a system of loosely-coupled, cooperating, distributed components that are bound together by a wide-area messaging and event system. The components include field docks for maintaining site-specific configuration information by consumers, release docks for managing the configuration and release of software systems by producers, and a variety of agents for automating the deployment process. Both the information about releases and the information about field sites are represented as hierarchies of data that, when combined, form a federated software deployment registry with a conceptually global name space. Events generated by operations on the hierarchies propagate throughout the federated registry and are received by interested agents. The agent technology enables concomitant actions to be automatically performed in response to those events.
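To make the registry-and-event structure concrete, the following is a minimal Python sketch. It assumes a single in-process event bus standing in for the wide-area messaging system, and every class and method name (EventBus, Dock, subscribe, set) is hypothetical rather than taken from the Software Dock prototype. Agents register interest in a subtree of the global name space and are notified of operations within it.

```python
# Minimal sketch of a federated deployment registry (hypothetical names).
# Each dock owns one subtree of a global name space; operations on entries
# publish events that reach every agent subscribed to a matching prefix.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass
class Event:
    kind: str      # the operation, e.g. "set"
    path: str      # global name, e.g. "/field/site1/os/version"
    value: object  # new value of the entry


Handler = Callable[[Event], None]


class EventBus:
    """Stands in for the wide-area messaging and event system."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        self._subscribers[prefix].append(handler)

    def publish(self, event: Event) -> None:
        for prefix, handlers in list(self._subscribers.items()):
            # Deliver only if the event's path lies in the subscribed subtree.
            base = prefix.rstrip("/")
            if event.path == base or event.path.startswith(base + "/"):
                for handler in handlers:
                    handler(event)


class Dock:
    """A field dock or release dock: one subtree of the federated registry."""

    def __init__(self, root: str, bus: EventBus) -> None:
        self.root = root.rstrip("/")
        self.bus = bus
        self.entries: Dict[str, object] = {}

    def _path(self, relative: str) -> str:
        return f"{self.root}/{relative.strip('/')}"

    def set(self, relative: str, value: object) -> None:
        path = self._path(relative)
        self.entries[path] = value
        self.bus.publish(Event("set", path, value))

    def get(self, relative: str) -> object:
        return self.entries.get(self._path(relative))


# Usage: an agent watches one consumer site's operating-system properties.
bus = EventBus()
field_dock = Dock("/field/site1", bus)
release_dock = Dock("/release/acme/editor", bus)
seen: List[tuple] = []
bus.subscribe("/field/site1/os", lambda e: seen.append((e.path, e.value)))
field_dock.set("os/version", "5.6")           # delivered to the agent
release_dock.set("versions/2.0", "released")  # outside the watched subtree
```

A real federated registry would distribute these subtrees across sites; the sketch only illustrates how hierarchical names allow events to be routed to the agents interested in them.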
This paper is organized as follows. Section 2 introduces the basic terminology and background of the research area. Section 3 presents factors that are motivating research in the area of software deployment. Section 4 summarizes the characteristics and capabilities that are required in a software deployment solution, while Section 5 summarizes related work. Section 6 presents the initial results of our research, including a discussion of the proposed Software Dock architecture and a prototype of the architecture. Section 7 presents the research plan for further exploring the Software Dock as a means of supporting software deployment, together with a plan for evaluating it. Section 8 presents our conclusions.
This definition is intentionally vague in order to capture the full scope of software deployment. The assembly and maintenance can be thought of as the specific deployment process being performed, such as installation or activation. The collection of all software deployment processes or activities forms the basic deployment life-cycle of a software system. [Figure 1] These processes are, in fact, instantiated from a specific deployment policy. A policy can be thought of as a parameterized or generic process. In general, a process is concerned with what actually needs to be done, whereas a policy is concerned with how it is done.
A resource is anything that is needed to enable the use of the system. Examples include shared libraries, disk space, and component systems. A system generically refers to an artifact or collection of artifacts to be made available at a site. Some basic artifacts are binary executables, data files, and documentation. A version of a system refers both to time-ordered versions of an evolving system and to platform-specific and functional variants.
Once a system has been deployed, it is available for use at a particular site. The meaning of "use" depends on the type of system that was deployed: an executable will be executed, a collection of Web pages will be viewed with a browser, or a complex distributed system may have servers that need to be started. The site that is the target of the deployment is generally referred to as the consumer or field site. The site where the system originates is generally referred to as the producer or release site. It is generally assumed that deployment will involve some form of transfer or copying of resources from a producer site to a consumer site. The resources in this instance may be the actual system or just the knowledge of how to access the system. By and large, the term site refers to a single node connected to a network, but it is not limited to this usage and may refer to a collection of nodes working in a coordinated fashion.
Release: The release process is the interface between the development process and the deployment process. It encompasses all the activities needed to prepare and advertise a system so that it can be assembled correctly at some consumer site. Advertising includes disseminating sufficient information to interested parties and providing access in some form so that they can perform the follow-on installation activity.
The release process must, in some form, package all of the knowledge about the software system to be deployed, the processes that perform deployment tasks, and the actual system components. The deployment processes may be specific to the software system being deployed, or they may be carried out by some generic processing engine, such as a scripting engine. The information in this package should include a description of the system, including its dependencies and constraints, so that the deployed software can be managed at the consumer site.
Installation: The installation activity covers the initial deployment of a software system onto the consumer site. It is usually the most complex of the deployment activities because it must find and assemble all the resources necessary to use a system. The installation process uses the package created by the release process. Given a package, the installation process interprets the knowledge encoded in the package and then examines the target site in order to determine how to configure the software system properly for that specific site. Once installation is complete, the deployed software system is ready to be activated.
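As an illustration, the following minimal Python sketch shows how an installer might interpret a package description against properties discovered at the target site. The package format (minimum-version dependencies plus per-platform file variants) and the site property names are hypothetical assumptions, not a format defined by any existing tool. The installer either reports the unmet dependencies or returns the files to install.

```python
# Minimal sketch of an installer interpreting a package description
# (hypothetical fields) against properties discovered at the target site.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Package:
    name: str
    version: str
    files: List[str]
    requires: Dict[str, str]  # resource name -> minimum version
    variants: Dict[str, List[str]] = field(default_factory=dict)  # platform -> extra files


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "1.1.5" into (1, 1, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan_install(pkg: Package, site: Dict[str, str]) -> Dict[str, object]:
    """Return the files to install at this site, or the unmet dependencies."""
    missing = [
        resource for resource, minimum in pkg.requires.items()
        if resource not in site or version_tuple(site[resource]) < version_tuple(minimum)
    ]
    if missing:
        return {"ok": False, "missing": missing}
    extra = pkg.variants.get(site.get("platform", ""), [])
    return {"ok": True, "files": pkg.files + extra}


editor = Package("editor", "2.0", ["bin/editor", "doc/README"],
                 requires={"jre": "1.1"},
                 variants={"solaris": ["lib/libedit.so"]})
```

For example, `plan_install(editor, {"platform": "win32"})` reports `jre` as missing, which is exactly the information needed to start resolving the "missing component" problem rather than failing partway through.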
Activation: Activation refers to the activity of starting up those components of a system that must execute in order for the system to be usable. For a simple tool, activation involves establishing some form of command (or clickable graphical icon) for executing the binary component of the tool. For a complex system, there may be components that must run continuously in order for the system to be usable. Examples of the latter might be various servers and database systems needed by other parts of the system. Note that the installation process may actually use other systems and may therefore need to activate various tools in order to complete the installation of its system. For example, if a system has been packaged as an archive file, the installer must be able to activate the unarchiver tool to extract the pieces of the system to be installed. If an unarchiver is not available, then a recursive installation may be required to obtain and install the unarchiver tool.
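The recursive installation described above can be sketched as a depth-first traversal of prerequisites. In this minimal Python sketch, the catalog mapping each system to the systems it needs is hypothetical, and actually fetching or activating a tool such as the unarchiver is reduced to appending its name to a log. A guard detects circular prerequisites, which would otherwise recurse forever.

```python
# Minimal sketch of recursive installation. The catalog (system -> systems it
# needs in order to be installed) is hypothetical; actual transfer and
# activation of each tool are reduced to appending its name to a log.
from typing import Dict, List, Optional, Set


def install(name: str, catalog: Dict[str, List[str]], installed: Set[str],
            log: List[str], in_progress: Optional[Set[str]] = None) -> None:
    in_progress = set() if in_progress is None else in_progress
    if name in installed:
        return
    if name in in_progress:
        raise ValueError(f"circular installation dependency on {name!r}")
    in_progress.add(name)
    for prerequisite in catalog[name]:
        # A missing prerequisite, such as the unarchiver, triggers its own installation.
        install(prerequisite, catalog, installed, log, in_progress)
    in_progress.discard(name)
    installed.add(name)
    log.append(name)


catalog = {"editor": ["unarchiver", "jre"], "jre": ["unarchiver"], "unarchiver": []}
installed: Set[str] = set()
log: List[str] = []
install("editor", catalog, installed, log)
print(log)  # → ['unarchiver', 'jre', 'editor']
```

Because the unarchiver is recorded as installed the first time it is needed, the second request for it (from `jre`) is satisfied without repeating the work.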
De-activation: De-activation is the inverse of activation and refers to the activity of shutting down any executing components of an installed system. De-activation may be required in order to perform other deployment activities; for example, before an update can be performed, the software system may need to be de-activated. In that case, any servers started or tasks performed during activation must be stopped or undone so that the system is returned to a mutable state.
Update: The update process involves modifying a software system that has been previously installed at a consumer site. The update may be the result of the release of a new version of the software that fixes a bug or adds new functionality. From an abstract perspective, installation is a special case of the update process in which there are no existing components at the consumer site and, thus, everything must be updated. Update is often less complex than installation because it can rely on the fact that many of the needed resources were already obtained during the installation process. Typically the deployment life-cycle includes a repeated sequence in which a system is de-activated, an updated version of the system is installed, and then the system is reactivated. For some systems, de-activation may not be necessary, and the update can be performed while a previous version is still active.
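The view of installation as a special case of update can be sketched by comparing component versions. This minimal Python sketch assumes, hypothetically, that both the installed system and a new release are described as maps from component name to version.

```python
# Minimal sketch (hypothetical data model): an installed system and a new
# release are each described as a map from component name to version.
from typing import Dict, List


def plan_update(current: Dict[str, str], release: Dict[str, str]) -> Dict[str, List[str]]:
    """Decide which components to fetch, remove, or keep to reach `release`."""
    return {
        "fetch": sorted(c for c, v in release.items() if current.get(c) != v),
        "remove": sorted(c for c in current if c not in release),
        "keep": sorted(c for c, v in release.items() if current.get(c) == v),
    }


release_2 = {"editor": "2.0", "spell": "1.3", "help": "2.0"}
# Update: some components are already present at the consumer site.
print(plan_update({"editor": "1.0", "spell": "1.3", "tutorial": "1.0"}, release_2))
# → {'fetch': ['editor', 'help'], 'remove': ['tutorial'], 'keep': ['spell']}
# Installation: nothing is present, so every component must be fetched.
print(plan_update({}, release_2))
# → {'fetch': ['editor', 'help', 'spell'], 'remove': [], 'keep': []}
```

The same function handles both cases; the update is cheaper only because `keep` is non-empty.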
Adapt: The adapt process involves modifying a software system that has been previously installed on a consumer site. Adapt differs from update in that updates are instigated by remote events whereas adaptations are instigated by local events. For example, if the configuration at the consumer site changes in a way that affects the deployed software system it may be necessary for the deployed software system to take some sort of corrective action. In such a situation the software system is an active participant in its own management, adapting to its environment as it changes.
De-installation: At some point, a system as a whole is no longer required at a given consumer site and the system will be de-installed. De-installation is not necessarily a trivial process. Special attention has to be paid to shared resources, such as data files and libraries, in order to prevent dangling references to those resources. De-installation is therefore not simply the process of undoing everything that was done during installation; rather, it involves examining the current state of the system, its dependencies, and its constraints, and then removing the specific software package in such a way that these dependencies and constraints are not violated.
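The difference between replaying an installation in reverse and examining the current state can be sketched as follows. In this minimal Python sketch, the `uses` map from each installed system to the resources it depends on is a hypothetical model of site state; a resource is reported as removable only when no remaining system still depends on it.

```python
# Minimal sketch: `uses` maps each installed system to the resources it depends
# on (hypothetical data model). De-installation consults this current state
# rather than replaying the installation in reverse.
from typing import Dict, List, Set


def deinstall(system: str, uses: Dict[str, Set[str]]) -> List[str]:
    """Remove `system` and return the resources that are now safe to delete."""
    released = uses.pop(system)  # updates the site state in place
    still_needed = set().union(*uses.values())
    return sorted(released - still_needed)


uses = {"editor": {"libc", "libedit", "dict.dat"}, "mailer": {"libc", "dict.dat"}}
print(deinstall("editor", uses))  # → ['libedit']  (libc and dict.dat are shared)
print(deinstall("mailer", uses))  # → ['dict.dat', 'libc']
```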
De-release: Ultimately, a system is marked as obsolete and support by the producer is withdrawn. De-release is distinct from de-installation in the sense that it makes the software system no longer available for installation at consumer sites, but it does not remove it from consumer sites that are using the software. Consumers of the software may continue to use the software without knowing that it has been marked as obsolete, but at the very least the de-release process should attempt to notify current users that support for the software has been withdrawn.
As networking, whether over local-area networks, intranets, or the Internet, becomes the norm, software system complexity continues to grow. Distributed technologies are being combined with component technologies, such as CORBA and JavaBeans, creating potentially unmanageable relationships and dependencies among subsystem components. Locating all necessary system dependencies, or even ensuring that they are in place, is a complex task that current tools do not fully support. Additionally, if a deployed system is a cooperating, distributed system, there is little if any support for deploying it when coordination of many possible servers over a collection of nodes is required.
The widespread popularity of the Internet has forced software producers to rethink their deployment activities, and new issues have arisen as a result. Most of these issues concern the support of electronic commerce over the Internet. Providing secure distribution, licensing, and billing of software and services is a growing concern because the Internet has created a virtual marketplace. This virtual marketplace demands a software solution to what used to be largely a physical-world scenario. Users want to purchase, install, maintain, and support their software systems via the Internet. As a result, it is becoming increasingly difficult for software producers to develop their software systems without taking these issues into consideration.
A byproduct of the connectivity offered by the Internet is a new set of interaction possibilities between software producers and consumers. Through this connectivity, producers can receive feedback throughout a software system's life-cycle and fashion responses to this feedback. This requires software deployment systems to support an aspect of deployment that was not directly available in the past. With this level of connectivity, consumers' expectations are rising, and they require a new level of quality of service from software producers. For this reason, it is imperative that deployment activities become an integral part of the software development process and, as such, have a full set of tools and infrastructure to support them.
Systems such as rsync [25] use content mirroring techniques to deploy a set of files on multiple machines, with the specific purpose of keeping them synchronized with the release site. Other systems, like Castanet [13], provide various content delivery mechanisms based on multicast distribution through a publish-and-subscribe paradigm. In other cases, deployment systems such as installation systems have been extended to integrate with Web browsers and use the browser's built-in content delivery mechanisms [1].
As alluded to above, system installation and update have traditionally been treated as two distinct processes, and update has not received the same attention as installation. The popularity of the Internet has changed this focus, though. As many installation systems were extended to install software over networks, it became apparent that installation was just a special case of update: an install is simply an update in which none of the software system's components are present on the target site. This realization has led systems such as Castanet and OpenWEB netDeploy [20] to address both the install and update processes explicitly in their solutions.
As efforts in other areas of software deployment continue to grow and succeed, network management will likely reduce its concentration on software deployment activities and focus more clearly on issues that fall outside the software deployment life-cycle. These areas may include various run-time system management requirements, software monitoring, load balancing, and error recovery.
The vision of this proposal is to describe and provide a complete, unified software deployment framework. A complete, unified software deployment framework is a missing link in the current state of affairs in the software producing and consuming communities. The goal is to redefine the current, implicit limits of software development to include the notion of the software deployment life-cycle as an integral part of the software development process. By combining the software development life-cycle with the software deployment life-cycle a complete software life-cycle is created.
It is a natural progression to extend the responsibilities of the software producer to include software deployment activities; it is the software producer who has the required knowledge of the software system in the first place. Just as any other manufacturer is responsible for the ongoing proper functioning and repair of the items it produces, the software producer is responsible for ensuring the proper functioning and repair of the software systems it produces. In the past this burden was left to the consumer, but consumers are starting to demand a higher level of quality of service. Recent advances in connectivity and enabling technologies have made it possible for software producers to consider such an increase in quality of service.
We believe that it is necessary to create an environment in which the concerns of both producers and consumers are brought together. This environment should facilitate communication and open negotiation between producers and consumers to achieve the common goal of making software deployable. As such, this environment sets a direction for creating the deployable software systems of the future.
Given such an analysis, it is clear that there are three main components that can be abstracted by a software deployment system: consumer, software system, and process. By using these different levels of abstraction, a means for evaluating the level of support provided by a particular deployment system can be defined.
Mechanisms such as GNU Autoconf [12] and the Microsoft Registry [9] are two examples of consumer abstraction. Autoconf is used to produce a single program, "configure", which dynamically computes a consumer abstraction. The Registry, in contrast, is a passive repository containing the consumer abstraction. In either case, the deployment process is simplified because a producer can construct installation scripts that are parameterized by common information available from the abstraction. It is important to note, though, that these two examples target different operating system platforms and, as such, do not provide a single, common interface.
The consumer abstraction largely provides querying or discovery mechanisms for a site. These particular mechanisms provide access to information about the configuration and resources at a site. The consumer abstraction is not limited to only obtaining descriptive information and may indeed abstract processes that have common functionality across all targets, such as file system access.
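The value of a single, common interface can be sketched in Python with two interchangeable backends: one that computes site properties on demand (loosely in the spirit of a configure script) and one that reads a passive store (loosely in the spirit of a registry). The interface, property names, and classes are hypothetical; only the standard-library calls (`platform.system`, `platform.release`, `platform.machine`, `os.path.exists`) are real. The `file_exists` method stands for the kind of common process, such as file system access, that the abstraction may also cover.

```python
# Minimal sketch of a consumer abstraction with one interface and two backends.
# ConsumerSite, ProbingSite, RepositorySite, and the property names are hypothetical.
import os
import platform
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ConsumerSite(ABC):
    """Common interface to a consumer site's properties and resources."""

    @abstractmethod
    def query(self, prop: str) -> Optional[str]:
        ...

    def file_exists(self, path: str) -> bool:
        # A process with common functionality across all targets.
        return os.path.exists(path)


class ProbingSite(ConsumerSite):
    """Computes properties on demand, like a configure script."""

    def query(self, prop: str) -> Optional[str]:
        probes = {
            "os.name": platform.system,
            "os.release": platform.release,
            "cpu": platform.machine,
        }
        probe = probes.get(prop)
        return probe() if probe else None


class RepositorySite(ConsumerSite):
    """Reads properties from a passive store, like a registry."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store

    def query(self, prop: str) -> Optional[str]:
        return self.store.get(prop)


def describe(site: ConsumerSite) -> str:
    # Producer-side logic written once against the common interface.
    return f"{site.query('os.name')} on {site.query('cpu')}"
```

For example, `describe(RepositorySite({"os.name": "Windows", "cpu": "x86"}))` returns `"Windows on x86"`, while `describe(ProbingSite())` returns whatever the local machine reports; the producer's code does not change between the two.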
In the simplest case, where a software system is a single executable and possibly some data files, describing it completely is straightforward. The software system description is nothing more than an inventory of files, documentation pointers, contact pointers, and platform requirements.
Complex software systems pose the biggest challenge and have the greatest need for the software system abstraction. A complex system may be composed of multiple, distributed components with explicit subsystem dependencies among them. This type of dependency and constraint information can have a direct impact on how some deployment processes, such as activation, are performed. A software system may also have variant configurations that depend on the resources available at the target site, all of which must be captured in the software system abstraction.
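When subsystem dependencies are explicit, an activation order can be computed rather than hand-coded. The following minimal sketch uses Python's standard `graphlib` module (Python 3.9 or later) for a topological sort; the component names are hypothetical, and a circular dependency would raise `graphlib.CycleError`.

```python
# Minimal sketch: activation order derived from explicit subsystem
# dependencies with Python's standard graphlib module (3.9+).
# Component names are hypothetical.
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def activation_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Start each component only after every component it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def deactivation_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Shut components down in the reverse of their activation order."""
    return activation_order(depends_on)[::-1]


system = {
    "web_ui": {"app_server"},
    "app_server": {"database", "name_server"},
    "database": set(),
    "name_server": set(),
}
order = activation_order(system)
```

The database and name server are started before the application server, which is started before the Web interface; de-activation simply reverses that order.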
The software system abstraction is not limited to executable software systems. Software systems solely based on data must also be covered by the abstraction. A good example of this type of system is a collection of Web pages or any other document-based system.
For example, the update process has a distinct set of steps that it must take to actually perform the update. These steps include examining the target site's configuration, retrieving the necessary updated artifacts, and properly modifying the target site. Every update process will perform something similar to these basic steps. A policy could be used to determine how these steps are carried out. A policy might indicate that updates should only occur during non-business hours or that someone in the system administration department must first approve updates. Given this distinction, processes can be thought of as being parameterized by policy decisions.
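The separation between process and policy can be sketched as a generic update process whose gating decisions come from a policy object. In this minimal Python sketch, `UpdatePolicy` and its fields are hypothetical and cover only the two example policies in the text: a time window for updates and required administrator approval.

```python
# Minimal sketch: one generic update process parameterized by a policy.
# UpdatePolicy and its fields are hypothetical; the process fixes *what* is
# done and the policy decides *how* (here, when and with whose approval).
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

UPDATE_STEPS = (
    "examine target site configuration",
    "retrieve updated artifacts",
    "modify target site",
)


@dataclass(frozen=True)
class UpdatePolicy:
    allowed_hours: range = range(0, 24)  # hours of the day when updates may run
    requires_approval: bool = False      # e.g. system administration must approve


def run_update(policy: UpdatePolicy, now: datetime,
               approved: Callable[[], bool]) -> List[str]:
    """Return the steps performed, or an empty list if the policy defers the update."""
    if now.hour not in policy.allowed_hours:
        return []
    if policy.requires_approval and not approved():
        return []
    return list(UPDATE_STEPS)


after_hours = UpdatePolicy(allowed_hours=range(18, 24), requires_approval=True)
```

Under `after_hours`, an update requested at 10:00 is deferred, while one requested at 20:00 proceeds only if `approved()` returns true; the default policy runs the same three steps at any time without approval.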
At this time there are no known examples of a system that performs this level of abstraction in the general case.
For evaluation purposes, the broader the process coverage provided by a software deployment system, the better the solution. It should be noted, though, that process coverage is not completely orthogonal to the abstractions described above. Generally speaking, a very well-defined set of abstractions is necessary to provide generic, broad process coverage.
Internet-scalability: The explosive popularity of the Internet has created a new, lucrative environment for software deployment. It is imperative that any proposed software deployment solution explicitly support large numbers of producers and consumers distributed over large geographical distances. The Internet is many orders of magnitude larger than local-area networks and organizational intranets, and its requirements therefore demand special attention.
Raise Abstraction Level for Software Deployment: Many current software deployment systems perform only a specific process of the software deployment life-cycle, or a subset of its processes. Unfortunately, these systems are usually limited to the specific task or set of tasks originally intended by the deployment system developer. Support for deployment tasks other than those considered by the deployment system producer usually takes the form of access to the Turing machine (i.e., the underlying computer). This is not an adequate level of support. A software deployment solution must provide a framework that raises the level of abstraction for performing deployment-related activities. Raising the level of abstraction will enable efficient and timely solutions to other deployment activities without requiring the reinvention of tedious, error-prone infrastructure.
Provide Unified Access to Procedural Resources: Most software deployment activities require more than just declarative information to be accomplished. Nearly all deployment activities require some sort of processing to be performed on the consumer site. Therefore a software deployment system cannot be considered complete unless it makes some attempt to provide controlled access to procedural resources as well as declarative information. Not only is consumer-side processing of this form necessary for completing most deployment tasks, but a unified approach to information and resource access can provide a great deal of security by limiting access to consumer site resources.
Explicit Bi-directional, Semi-continuous Communication: The bi-directional, semi-continuous communication between producers and consumers that the Internet makes possible must be exploited. The connectivity afforded by the Internet enables producers and consumers to participate in a symbiotic relationship in which information flows between the two participants. This level of cooperation has not been possible in the past. Incorporating bi-directional, semi-continuous communication enables support for the full software deployment life-cycle by allowing producers to monitor changes to a consumer site that may affect their software and, thus, to take corrective action. In turn, the consumer is able to receive direct notification of announcements, such as bug fixes, generated at the producer's site, and to give feedback to the producer.
Autonomy: Since organizational boundaries and cultures are very distinct, it is of paramount importance that a software deployment system create an environment in which those differences can coexist. In particular, these differences exist from consumer to consumer and from producer to producer. Consumers should be able to control how their site is accessed and how deployment processes are performed. Producers should be able to define their deployment processes without regard to how the organizations producing the components they depend on perform their deployment processes.
Platform Independence: Many systems address particular aspects of software deployment, and some even come close to addressing the entire software deployment life-cycle, but none attempts to do so in a platform-independent manner. Platform independence is a necessary prerequisite, largely inspired by the global marketplace created by the Internet. To take advantage of such a market, it is necessary to avoid limiting assumptions about who or what the consumer may be. A consumer-side abstraction is directly related to platform independence, though the two are not equivalent.
Additional consideration will be given to those systems that actually raise the level of abstraction for software deployment. Some systems claim to provide support and coverage for the entire software deployment life-cycle, but in reality they merely provide access to the underlying machine. The claim that such a system actually supports software deployment is suspect, since it is a variant of the "Turing tar-pit" argument, in which one can claim to do anything if one can execute an arbitrary program (i.e., a Turing machine).
Another consideration is that many software deployment systems have created a standard consumer abstraction by restricting themselves to a single kind of target site. That is, many systems target only Windows 95 systems, for example, and assume all such systems are effectively identical, at least with respect to deployment activities. As an aside, the fact that these systems are often not identical leads to many software deployment problems. A software deployment system that relies on such a limitation cannot be characterized as having a generic consumer-side abstraction.
Point-Cast and ZIP delivery provide news multicasting services and are, in some sense, an evolution of the Internet News system [11]. Unlike Internet News, they do not support bi-directional communication, but they provide news and advertisements through a programmable, active receiver application that can be configured to poll the news server for new information. The receiver application also has a library of local display capabilities that enable a graphical presentation of the information received.
A consumer can determine which data he wants to receive by subscribing to a number of "channels", possibly from different producers. The subscription or some configuration on the consumer site determines how often or in response to which event the channel has to be updated.
This same publish/subscribe protocol is adopted by Castanet. Castanet is another content deployment system that has some additional features to deal with applications rather than news. A Castanet channel is in essence a set of files. On a regular basis, depending on the configuration of the channel on the consumer site, the consumer pulls an update for that channel. In addition, Castanet enables the producer to customize the channel with a channel plug-in; a channel plug-in is an application that manages the communication with the consumer and interacts with the tuner at the consumer site.
A fat web page [4] is an HTML document that contains embedded information intended to be installed in a database running on a local machine. The embedded data can be program logic, text, outlines, user interface elements, or arbitrary binary information. The goal of fat pages is to simplify the goal of distributing and installing small artifacts. Unfortunately, fat pages are only good for small artifacts and provide no real support for performing any of the deployment processes.
Rsync is a file synchronizer that synchronizes a set of files. The rsync operation involves two machines, the source machine, that has the files or the new versions, and the target machine, where the set of files must be deployed. Rsync can be invoked by either the source or the target.
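The decision a file synchronizer must make can be sketched by comparing manifests of content digests. This minimal Python sketch is a deliberate simplification and not rsync's algorithm: rsync additionally uses rolling checksums so that only the changed blocks within a file are sent. The function names are hypothetical.

```python
# Simplified sketch of deciding what a mirror must transfer, comparing
# whole-file SHA-256 digests. This is NOT rsync's algorithm: rsync also uses
# rolling checksums to send only the changed blocks within a file.
import hashlib
from typing import Dict, List


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each relative path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def files_to_transfer(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Paths whose contents are missing or different on the target."""
    return sorted(p for p, digest in source.items() if target.get(p) != digest)


def extraneous_files(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Paths present only on the target, which a strict mirror would delete."""
    return sorted(p for p in target if p not in source)


source = manifest({"bin/tool": b"v2", "doc/README": b"same"})
target = manifest({"bin/tool": b"v1", "doc/README": b"same", "old.txt": b"x"})
print(files_to_transfer(source, target))  # → ['bin/tool']
print(extraneous_files(source, target))   # → ['old.txt']
```

Note that nothing here consults the target's configuration: the same files are delivered regardless of site properties, which is the limitation of content delivery systems noted in the discussion that follows.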
These systems rank low with respect to the level of abstraction they provide on either the consumer or the producer side. For example, a software system may have complex dependencies on consumer-side parameters or on other software systems. None of these approaches directly supports such dependencies; each delivers the same set of files under all circumstances. What configuration does exist on the consumer side deals with the communication mechanisms rather than providing a model of the consumer site.
Content delivery may be treated as the simplest possible form of installation, in which no target- or product-specific computation is carried out. It is also worth noting that these content delivery systems have adopted quite different technologies for the data transfer function. In each case the aim is to make transfer scalable and efficient, especially in a low-bandwidth or costly network environment.
NET-Install thus provides a simple consumer-side abstraction through a standard mechanism for obtaining some target site information. The target sites, though, are limited to those running Microsoft Windows. Additionally, the files, dependencies, and constraints listed in the package definition provide a limited deployment system abstraction. The usefulness of these abstractions is severely limited by their simplicity. They cover most simple installations, but their coverage degrades as the complexity of the installation increases, for instance when installing a distributed system.
OpenWEB netDeploy [20] is a deployment system that supports the release, installation, update, and de-installation activities of software deployment. It creates a deployment package that is merely a list of the files (either embedded or referenced by URL) that make up the system to be deployed. Update is supported at execution time: whenever the deployed system is executed, the latest version is retrieved if it has changed. OpenWEB netDeploy is enabled through browser helper application/viewer technology. A Launcher utility on the consumer site is used for updating and executing the deployed system. Conditional configurations are possible based on limited querying of the consumer site configuration and of file existence.
OpenWEB netDeploy provides a consumer site abstraction in the form of its Launcher/browser helper application combination. This abstraction is very limited in completeness, but it does support multiple operating systems. The deployment system abstraction is much more constricting than the consumer site abstraction, because all deployed systems are viewed as independent, finished file sets. Recent extensions have added the ability to specify dependencies between file sets.
InstallShield [10] is a deployment system that supports only the installation and de-installation activities of software deployment, and it does not necessarily support Internet-based deployment. Generally speaking, InstallShield is a tool for building scripts that install Microsoft Windows-based software systems. It can also create a single executable installation package that can be distributed over the Internet. Consumer site abstraction is provided through various mechanisms to query and interact with the target site, though these are limited to Microsoft Windows platforms. The deployment system package describes the system to be deployed and allows for conditional components, constraints, and dependencies.
Oil Change [19] is a system for delivering software updates to a consumer's computer via the Internet. Oil Change examines a consumer's site to determine all the installed software and its versions. Using this list, Oil Change consults a "master list" of software and available updates; this master list is maintained by CyberMedia, Oil Change's producer. Both automatic installation of updates and notification of new updates are supported. Oil Change does not support deployment processes other than update, and its centralized architecture is a clear scalability issue.
The FreeBSD porting system [7] supports the FreeBSD user community by organizing freely available software into a carefully constructed hierarchy known as the "ports collection." It uses various forms of heuristics to determine a site's state and employs the results in building and installing a software package. The primary flaw in the system is that it embeds dependencies and other knowledge into Makefiles. This makes it difficult to locate and manage information about software systems. Its deployment process support is also limited to installation and de-installation.
There are also a number of formalisms for describing systems and sites for deployment purposes. The Desktop Management Task Force (DMTF) is the major organizational force here. It is promoting a standard called the Management Information Format [3] (MIF) for specifying various properties of both hardware and software. Tivoli's Application Management Specification [27] (AMS) is derived from the MIF and specifically targets the description of application software systems. The Simple Network Management Protocol [2] (SNMP) defines a standard for describing schema information about network components, primarily hardware components. In terms of abstraction, these formalisms are used to specify both the site abstraction and the software system abstraction. None of them specifically covers a particular deployment process.
SMS from Microsoft, TME-10 [26] from Tivoli, Netview from IBM, and OpenView [8] from Hewlett-Packard are representative of a number of complex network management systems. Their original purpose was to support the management of corporate local-area networks. They had specific capabilities such as detecting hardware failures and network disruptions and reporting problems for examination.
Recently, these systems have ventured beyond hardware and have begun addressing the problems of software management, including some parts of the deployment process. As a rule, these systems assume a homogeneous set of target sites within the corporate local-area network. Additionally, there is usually a logically centralized "producer", which is some designated central administration site (possibly multi-machine) for all officially approved system releases.
With respect to deployment life-cycle support, these systems support essentially all of the life-cycle activities. This is tempered by noting that they are oriented to the deployment of more-or-less standalone tools with few inter-system dependencies and no complex activation requirements. These systems provide little support in the way of producer-side abstractions, again because the products are mostly standalone. Their consumer-side abstraction is of medium complexity because of the imposed homogeneity.
A specific capability of note is inventory, which may be considered a subpart of installation. These systems are capable of scanning a target site and determining the set of installed systems, and sometimes even the installed version of systems. This information is then brought back to a repository at a central site. Another capability of note is their ability to deploy software to a large number of targets. This is important for organizations that have networks of thousands of machines. Of course, this is partly made possible by the homogeneity of the targets.
There can be many field and release docks representing the interests of the many possible participants in the deployment process. Tying them together is WAM/E, which provides bi-directional communication pathways. Agent technology is used to distribute functionality dynamically and to enable consumer-side processing of events on behalf of producers. The following subsections describe each of these components in more detail.
It is important to point out that this proposed system, the Software Dock, is not intended as a legacy solution to software deployment problems. While current, real-world examples and systems will be used to illustrate the software deployment dilemma, applying the Software Dock to legacy systems is not necessarily where the computing community (i.e., producers and consumers) will see the greatest benefit. Rather it is when new software systems are designed to be Software Dock aware that the greatest benefits will be seen. Therefore the Software Dock proposes a direction for creating deployable software systems and facilitates the inclusion of software deployment as an integral part of software development.
The registry is organized as an n-ary tree. The tree model was chosen mainly for its simplicity, but also because it subsumes relevant existing models, such as the DMTF MIF format [3], the Microsoft Registry [9], the X resources model [23], and most file systems.
The schema of the registry is kept consistent across sites to ease the development of agents that access the information. Each tree node is a collection of name-attribute pairs. The names associated with an attribute are associated with the parent node and not the attribute itself. An attribute can be a primitive scalar type, such as a character string or an integer, or it can be a collection of attributes. Custom collection types can be created to facilitate schema definitions. The type of a collection is exploited to specify the structure of the sub-tree under that node. For example, a collection of type "Application" adheres to a specific sub-tree schema definition that is used to fully describe a software application.
Custom collection types are described as minimal schema definitions because they can be arbitrarily extended with additional attributes at run-time for proprietary reasons without affecting the type of the collection. This allows for schema augmentation without disrupting the behavior of agents.
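The following Python sketch makes the registry model concrete. It represents a registry node as a typed collection of name-attribute pairs. It then checks a collection against a minimal schema while tolerating extra attributes added at run-time. This is an illustration under stated assumptions, not the Software Dock implementation. The class `Collection`, the table `MINIMAL_SCHEMAS`, and the attributes required of the "Application" type are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical minimal schemas: the attributes a collection of a given type
# must have. The "Application" entry is illustrative, not the real schema.
MINIMAL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "Application": {"name": str, "version": str},
}

# An attribute is a primitive scalar or a nested collection.
Attribute = Union[str, int, "Collection"]


@dataclass
class Collection:
    """A registry tree node: a typed collection of name-attribute pairs."""
    ctype: str = "Collection"
    children: Dict[str, Attribute] = field(default_factory=dict)

    def set(self, name: str, value: Attribute) -> None:
        # The name is kept by the parent node, not by the attribute itself.
        self.children[name] = value

    def get_path(self, path: str) -> Attribute:
        """Resolve a slash-separated path from this node downwards."""
        node: Attribute = self
        for part in path.split("/"):
            if not isinstance(node, Collection):
                raise KeyError(path)
            node = node.children[part]
        return node

    def conforms(self) -> bool:
        """True if every attribute required by the minimal schema is present
        with the right type; extra attributes never break conformance."""
        schema = MINIMAL_SCHEMAS.get(self.ctype, {})
        return all(isinstance(self.children.get(n), t) for n, t in schema.items())


root = Collection()
app = Collection("Application")
app.set("name", "example-app")
app.set("version", "1.0")
app.set("vendor-extension", "added at run-time")  # proprietary extension
root.set("example-app", app)

incomplete = Collection("Application", {"name": "no-version"})
```

Extending `app` with an unlisted attribute leaves `app.conforms()` true. Omitting a required attribute, as in `incomplete`, makes it false. This mirrors the claim that minimal schemas can be augmented without disrupting agents.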
Schema definition is critical to the Software Dock. Schema definition provides the backbone for platform independence, and standardized schemas make it possible to create standard process definitions for performing generic deployment tasks. Initial schema development will build on work done in DMTF's MIF [3] and Tivoli's AMS [27], but the end result must define a direction for deployable software systems to pursue. Thus the goal of the registry schemas is to define how software systems should be described to be deployable. It is not to create schemas for deploying current software systems.
The registry typing system, as described in the introduction of this section, facilitates standard schema definitions. The registry typing system, based on the sub-tree structure definition of a registry collection, can be leveraged as part of the solution to the software deployment dilemma. The Software Dock system will include classes of standard registry types or schema definitions. In particular the field dock registry will include schemas for describing a target consumer site including its configuration, resources, and constraints. The release dock registry will include schemas for describing software systems including their components, the semantics of their components, constraints of the system, and dependencies of the system.
Event Propagation. The process of local event delivery and propagation is the responsibility of the field dock. When an event is generated the dock sends the event to any local agent that has subscribed to that event. In order to subscribe to an event, an agent uses the dock's event interface to tell the dock the type and the name of the event in which it is interested.
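The subscription mechanism can be sketched as a table keyed by event type and event name. This minimal Python sketch is assumed rather than taken from the Software Dock. In it, the hypothetical `FieldDock.put` operation generates an event for every registry change. Each event is delivered only to agents subscribed to that exact type and name. The event type strings are also illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    type: str                     # hypothetical type names, e.g. "registry.add"
    name: str                     # here: the registry path that changed
    attributes: Dict[str, object] = field(default_factory=dict)


Handler = Callable[[Event], None]


class FieldDock:
    """Sketch of local event delivery only; not the Software Dock's API."""

    def __init__(self) -> None:
        self.registry: Dict[str, object] = {}
        self._subscribers: Dict[Tuple[str, str], List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, event_name: str, handler: Handler) -> None:
        """An agent states the type and the name of the event it wants."""
        self._subscribers[(event_type, event_name)].append(handler)

    def generate(self, event: Event) -> None:
        """Deliver the event to every local agent subscribed to it."""
        for handler in list(self._subscribers.get((event.type, event.name), [])):
            handler(event)

    def put(self, path: str, value: object) -> None:
        """Registry operation that generates an event as a side effect."""
        kind = "registry.update" if path in self.registry else "registry.add"
        self.registry[path] = value
        self.generate(Event(kind, path, {"value": value}))


dock = FieldDock()
seen: List[str] = []
dock.subscribe("registry.add", "apps/editor", lambda e: seen.append(e.type))
dock.put("apps/editor", {"version": "2"})  # delivered: matching type and name
dock.put("apps/editor", {"version": "3"})  # not delivered: an update event
dock.put("apps/viewer", {})                # not delivered: different name
print(seen)  # ['registry.add']
```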
Controlled Site Access and Abstraction. The final function performed by the field dock is controlled access to the underlying site. All operations that can be performed on a site are directly exported in the field dock's interface or they are indirectly exported through specific agents performing a defined task in response to the occurrence of a specific event or event pattern. Examples of the former include the registry interfaces and the event interfaces of the field dock described above. An example of the latter is an agent that resides at a site and adds an icon to the desktop whenever an application is added to the site. This agent provides an indirect interface to the user's desktop. The indirect interfaces created by the field dock's registry and event system can be quite sophisticated. For example, an agent could create an indirect interface to the site's file system by registering for specific events that semantically denote file operations, and then map these events into the file system itself.
As a result of these controlled interfaces the consumer site is afforded some level of security. Agents that come from external, untrusted sources will only be allowed access to these controlled interfaces. The consumer has full control over which interfaces are made available and therefore can control what operations agents can or cannot perform.
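One way to realize such consumer-controlled access is a handle that exposes only the interfaces the consumer has granted. This Python sketch assumes that interfaces are named callables and that trust is a simple boolean. `ControlledSite`, `AgentHandle`, and the interface names are hypothetical.

```python
from typing import Callable, Dict, Set


class AccessDenied(Exception):
    """Raised when an agent calls an interface it has not been granted."""


class AgentHandle:
    """What an agent actually receives: only the interfaces it may use."""

    def __init__(self, allowed: Dict[str, Callable[..., object]]) -> None:
        self._allowed = allowed

    def call(self, name: str, *args: object) -> object:
        if name not in self._allowed:
            raise AccessDenied(name)
        return self._allowed[name](*args)


class ControlledSite:
    """The consumer decides which interfaces external agents may call;
    internal (trusted) agents receive every interface."""

    def __init__(self, interfaces: Dict[str, Callable[..., object]],
                 granted_to_external: Set[str]) -> None:
        self._interfaces = interfaces
        self._granted = granted_to_external

    def handle_for(self, trusted: bool) -> AgentHandle:
        names = set(self._interfaces) if trusted else self._granted & set(self._interfaces)
        return AgentHandle({n: self._interfaces[n] for n in names})


registry: Dict[str, object] = {}
site = ControlledSite(
    interfaces={
        "registry.read": lambda path: registry.get(path),
        "registry.write": lambda path, value: registry.__setitem__(path, value),
        "shell.exec": lambda cmd: f"ran {cmd}",  # stands in for raw site access
    },
    granted_to_external={"registry.read", "registry.write"},
)

external = site.handle_for(trusted=False)
external.call("registry.write", "apps/demo", "installed")
try:
    external.call("shell.exec", "echo hi")
    blocked = False
except AccessDenied:
    blocked = True
```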
A software release in the release dock's registry is a collection of artifacts, such as executables, libraries, documentation, dependencies, and constraints. It also includes the agents responsible for all of the deployment activities, including configuring, installing, maintaining, and de-installing the software.
Like the field dock, the release dock generates events when operations are performed on its registry. These events are used to indicate changes in the state of releases, such as the release of a new software system, a new version of an existing system, or a patch to an existing system.
Organizations use a release dock much like current FTP sites are used for distributing software, though the release dock is more sophisticated. The release dock may provide a user interface, perhaps through a Web page, to allow consumers to browse the available releases. The release dock, however, does not distribute software releases directly. Instead, when the consumer initiates a download, the release dock sends an agent to the consumer's site. This installation agent is responsible for installing the requested software release by interacting with the consumer site's field dock to obtain the appropriate configuration information. Once the configuration information is obtained, the agent retrieves the properly configured components from its release dock and installs them at the field site.
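The download interaction can be summarized as three steps. First, the agent queries the field dock for configuration. Second, it fetches the components configured for that site from its release dock. Third, it records the installation at the field dock. The following Python sketch walks through these steps. Keying components by operating system, and all class and method names, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Release:
    name: str
    version: str
    # Components keyed by one configuration value (the target OS here);
    # this keying scheme is an illustrative assumption.
    components: Dict[str, List[str]]


class ReleaseDock:
    def __init__(self) -> None:
        self.releases: Dict[str, Release] = {}

    def publish(self, release: Release) -> None:
        self.releases[release.name] = release

    def fetch(self, name: str, os_name: str) -> List[str]:
        """Return the components configured for the given platform."""
        return self.releases[name].components[os_name]


class FieldDock:
    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config
        self.installed: Dict[str, Dict[str, object]] = {}

    def query(self, key: str) -> str:
        return self.config[key]

    def record(self, name: str, entry: Dict[str, object]) -> None:
        self.installed[name] = entry


class InstallationAgent:
    """Sent by a release dock to a consumer site (hypothetical sketch)."""

    def __init__(self, origin: ReleaseDock, name: str) -> None:
        self.origin, self.name = origin, name

    def run(self, site: FieldDock) -> List[str]:
        os_name = site.query("os")                     # 1. obtain site configuration
        files = self.origin.fetch(self.name, os_name)  # 2. retrieve configured parts
        site.record(self.name, {                       # 3. install at the field site
            "version": self.origin.releases[self.name].version,
            "files": files,
        })
        return files


producer = ReleaseDock()
producer.publish(Release("demo", "1.0", {
    "linux": ["bin/demo", "lib/libdemo.so"],
    "windows": ["bin/demo.exe", "bin/demo.dll"],
}))
consumer = FieldDock({"os": "windows"})
agent = InstallationAgent(producer, "demo")  # the release dock "sends" the agent
agent.run(consumer)
print(consumer.installed["demo"]["files"])  # ['bin/demo.exe', 'bin/demo.dll']
```

The agent, not the release dock, decides what to retrieve. This is what lets a single release serve differently configured consumer sites.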
From the perspective of a particular site, there are two classes of agents: internal and external. Internal agents extend the functionality of a local dock and, to some extent, are trusted at that site. External agents are obtained from remote sites to perform some particular function on behalf of a remote organization. Therefore, external agents are not trusted to the same extent as internal agents.
An external agent that comes from a release dock is generically referred to as a deployment agent. The most common deployment agent is an agent responsible for installing a software system. This installation agent is typically downloaded from a release dock directly by a consumer or indirectly by another agent. The downloaded installation agent then proceeds to install the software on behalf of the consumer, using the mechanisms provided by the Software Dock. When an installation agent installs a software system, it may additionally install other deployment agents at the site to perform tasks such as updating the software when updates become available.
In contrast to external agents, which mainly perform deployment activities, internal agents provide three major capabilities: viewing, abstraction, and isolation. A viewing agent provides a user interface for accessing, browsing, or manipulating a site's registry. Examples include an agent that provides a graphical interface for accessing the applications installed at the site. Another is an agent that adds an entry to the Windows 95 Start menu whenever an application is added to the site. To perform these tasks, an agent needs only to register with the field dock for the relevant application events.
Other internal agents define abstract interfaces for operations whose implementation requires site-specific knowledge. They accomplish this by subscribing to selected events and performing site-specific actions in response. In effect, they provide an extended, controlled interface to the underlying site for use by other agents, typically deployment agents. A good example is an internal agent that creates an indirect interface to the local site's file system. Such an agent registers with the field dock for events on a registry node that semantically represents part of the file system, such as an application's root directory. Events that occur under this registry node are semantically equivalent to creating and updating sub-directories and files in the site's file system.
In effect, an internal agent that provides file system access allows external agents to write to the local site's file system without having direct access. Thus, internal agents support the isolation of potentially untrusted external agents from the local site's resources. This adds an important degree of security and access control to the whole deployment process.
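The isolation idea can be sketched as an internal agent that translates registry events under an application's root-directory node into file operations confined to a single directory. In this Python sketch, the event types (`dir.create`, `file.write`), the node path convention, and `FileSystemAgent` are hypothetical. The point is that the external agent never names a real path directly, and paths that would escape the sandbox are rejected.

```python
import tempfile
from pathlib import Path


class FileSystemAgent:
    """Hypothetical internal agent: maps registry events under one
    application's root-directory node onto file operations in a sandbox."""

    def __init__(self, root_node: str, sandbox: Path) -> None:
        self.root_node = root_node.rstrip("/") + "/"
        self.sandbox = sandbox.resolve()

    def on_event(self, event_type: str, node_path: str, content: str = "") -> Path:
        if not node_path.startswith(self.root_node):
            raise PermissionError(f"outside {self.root_node}: {node_path}")
        relative = node_path[len(self.root_node):]
        target = (self.sandbox / relative).resolve()
        # Reject paths such as "../.." that would leave the sandbox.
        if target != self.sandbox and self.sandbox not in target.parents:
            raise PermissionError(f"escapes sandbox: {node_path}")
        if event_type == "dir.create":
            target.mkdir(parents=True, exist_ok=True)
        elif event_type == "file.write":
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        else:
            raise ValueError(f"unsupported event type: {event_type}")
        return target


with tempfile.TemporaryDirectory() as tmp:
    fs_agent = FileSystemAgent("apps/demo/root", Path(tmp))
    fs_agent.on_event("dir.create", "apps/demo/root/docs")
    written = fs_agent.on_event("file.write", "apps/demo/root/docs/README", "hello")
    text = written.read_text()
    try:
        fs_agent.on_event("file.write", "apps/demo/root/../../etc/passwd", "x")
        escaped = True
    except PermissionError:
        escaped = False
```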
Field and release dock implementations are accompanied by a variety of predefined agents for performing specific tasks. Generic agent classes are also provided for performing simple software system installations and updates by interpreting the standard schema in both the field and release dock registries. It is also possible to create agents from scratch using the interfaces provided by the field and release dock servers.
WAM/E is the component responsible for propagating events across a wide-area network to sites interested in global events. Its general operation is similar to that of the local event propagation mechanism. The difference is that agents subscribe to locally propagated events, whereas docks subscribe to globally propagated events and in turn propagate them to their local agents. Thus, WAM/E provides an interface by which a dock can register interest in events. It also provides an interface by which a dock can inject an event consisting of a type, a name, and an attribute list.
It is also likely that WAM/E will need to participate in artifact propagation. Certain events, such as update notifications, have an associated artifact (i.e., the update patch) that may need to be delivered along with the event. Options range from physically including the artifact with the event to caching the artifact within the WAM/E infrastructure, and these remain to be explored. WAM/E also needs to address the issues of event visibility and scoping. It is necessary to limit the visibility of events for performance as well as privacy reasons. WAM/E is the subject of further research [22].
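The two-level subscription scheme can be sketched in a single process. Docks register interest with a wide-area component, and each dock re-delivers the events it receives to its own subscribed agents. This is a hypothetical stand-in for WAM/E, whose design remains open. It ignores networking, artifact propagation, and event scoping, and all names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# An event consists of a type, a name, and an attribute list.
Event = Tuple[str, str, Dict[str, str]]
Agent = Callable[[Event], None]


class Dock:
    """A field or release dock: global subscriber that re-propagates locally."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.local: Dict[Tuple[str, str], List[Agent]] = defaultdict(list)

    def subscribe_local(self, event_type: str, event_name: str, agent: Agent) -> None:
        self.local[(event_type, event_name)].append(agent)

    def deliver(self, event: Event) -> None:
        for agent in self.local.get((event[0], event[1]), []):
            agent(event)


class WideAreaEvents:
    """Hypothetical in-process stand-in for WAM/E; a real one spans a network."""

    def __init__(self) -> None:
        self.interest: Dict[Tuple[str, str], List[Dock]] = defaultdict(list)

    def register(self, dock: Dock, event_type: str, event_name: str) -> None:
        """A dock (not an agent) registers interest in a global event."""
        self.interest[(event_type, event_name)].append(dock)

    def inject(self, event_type: str, event_name: str, attributes: Dict[str, str]) -> int:
        """Forward an event to every interested dock; return how many got it."""
        event: Event = (event_type, event_name, attributes)
        docks = self.interest.get((event_type, event_name), [])
        for dock in docks:
            dock.deliver(event)
        return len(docks)


wame = WideAreaEvents()
consumer_dock = Dock("field dock at a consumer site")
versions: List[str] = []
consumer_dock.subscribe_local("release.update", "demo",
                              lambda e: versions.append(e[2]["version"]))
wame.register(consumer_dock, "release.update", "demo")
delivered = wame.inject("release.update", "demo", {"version": "1.1"})
print(delivered, versions)  # 1 ['1.1']
```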
The registries have been sparsely implemented in order to support only the most basic requirements to prove the concepts described in this proposal. Simple support for registry manipulation and event generation is included. The schemas of the registries have not been rigorously defined and only include basic declarative information that pertains solely to the demonstration scenario that was used as a motivating example. There is no implementation of a WAM/E-like component in the current prototype.
This prototype has been successfully used to demonstrate the deployment of an actual software system called OLLA, Online Learning Academy. OLLA is an HTML content-based system occupying 45 megabytes in over 1700 files. OLLA is intended to be installed locally at a consumer site and is functionally dependent upon two other software systems. To be fully installed, OLLA requires various client-side configuration and processing steps. In addition to installing OLLA, the Software Dock prototype was used to demonstrate a system update cycle. These initial results indicate that the Software Dock framework is a viable approach, though further exploration is needed to verify the full extent of its utility.
Additionally, an initial version of WAM/E must be introduced to connect the field and release docks at the Internet level. This initial version will be implemented from scratch, adapted from an existing event system, or taken from a related research effort [22]. The implementation of a WAM/E-like component is a research issue in its own right. Its inclusion in the Software Dock architecture is nevertheless critical to support Internet-scale, bi-directional, semi-continuous communication between producers and consumers.
Further validation of the Software Dock as a software deployment solution will come from building Software Dock solutions to real-world software deployment scenarios. These solutions can then be used to gauge the scalability and performance of the Software Dock by varying certain parameters in the problem space. The recorded results will show whether the Software Dock solutions perform within a reasonable range for such a deployment task. They will also show whether the solutions scale at some reasonable, non-exponential rate.
To gather such metrics, each activity in the software deployment life-cycle must be treated separately. The parameters of the problem space that affect one deployment activity are not necessarily those that affect the others. For example, the size and number of artifacts to deploy directly affect the install and update activities. The amount of event traffic, by contrast, directly affects the update and adapt activities but not the install activity. Therefore, by varying specific parameters for specific deployment activities, it can be determined whether the overhead associated with the Software Dock solution grows no faster than the parameters themselves.
A more complete comparison of the Software Dock with other deployment solutions will be provided by critically evaluating each of them against the characterizations and required capabilities described in Section 4. The issues described in Section 4 are critical for any software deployment solution that is to meet the future needs of producers and consumers.
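One simple way to check whether overhead "scales at a pace no greater than the parameters" is to assume a power law. Under that assumption, overhead is roughly proportional to a parameter raised to some exponent k. The exponent k can be estimated from measurements with a least-squares fit in log-log space, and k at or below 1 then indicates acceptable scaling. This Python sketch is one such check under that assumption, with a hypothetical tolerance. The numbers in it are synthetic and are not results.

```python
import math
from typing import Sequence


def scaling_exponent(params: Sequence[float], overheads: Sequence[float]) -> float:
    """Least-squares slope of log(overhead) versus log(parameter value).

    Assumes overhead ~ c * param**k and estimates k: k near 1 means the
    overhead grows in proportion to the parameter, k > 1 means faster.
    """
    if len(params) != len(overheads) or len(params) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log(p) for p in params]
    ys = [math.log(o) for o in overheads]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def scales_acceptably(params: Sequence[float], overheads: Sequence[float],
                      tolerance: float = 0.1) -> bool:
    """True if overhead grows no faster than the parameter (within tolerance)."""
    return scaling_exponent(params, overheads) <= 1.0 + tolerance


# Synthetic numbers for illustration only; these are not measurements.
artifact_counts = [100, 200, 400, 800, 1600]
proportional = [0.5 * n for n in artifact_counts]
quadratic = [0.001 * n * n for n in artifact_counts]
```

Each deployment activity would get its own fit against the parameters that affect it. Examples are artifact count for install and update, and event traffic for update and adapt.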
These issues will be used as a sort of litmus test to eliminate certain software deployment solutions as viable alternatives. For example, systems that provide limited software deployment process coverage, offer a very simple abstraction for a deployable artifact, or assume homogeneity of platforms shall be eliminated as viable approaches since they do not meet the requirements set forth in Section 4. Any systems that have not been eliminated through the critical evaluation phase will be further evaluated to determine if there are any other limitations or shortcomings that have not been exposed by the issues of Section 4. The systems that remain after this evaluation phase will be considered viable software deployment solutions.
The Software Dock is an example of a framework to support software deployment.
The Software Dock creates servers, the release and field docks, that provide
abstractions for the two participants in software deployment, the producers
and the consumers, respectively. Agent-based technology leverages the
connectivity between these producer and consumer abstractions, serving as a
mediator between the two participants to perform software deployment
activities. Software deployment thus becomes a mutually evolving process of
negotiation and cooperation between the producer and the consumer.