Saburo: A Framework for Development of Concurrency and I/O in Servers | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Joint work between: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Gautier Loyauté | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Rémy Forax | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Gilles Roussel | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Introduction | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This work focused on the development of usable development tools for Internet servers and middlewares. The development of such applications become more and more complex because of the increasing demand for scalability (tens of thousands of simultaneous client connections), effectiveness (minimization of latency and maximization of bandwith), dynamicity (elements can be added or removed at runtime) and dynamic variability (the ability for elements to evolve or change at runtime). The main goal of this project is to build an environment capable of helping development of such applications, trying to take into consideration previous demands, avoiding the concurrent development problems (deadlocks, data races) and finally, verifying abstracted version of an Internet server using formal verification method (like model checking). Our approach is based on the decomposition of Internet servers into communicating automata represented as directed graphs. Breaking a complex software down into communicating pieces may dramatically diminish its complexity. The vertices of the graph, described with Java code, specifies a small part of an Internet server leading by zero or one I/O call (such as reading on a socket). The vertices interact by sending and receiving communication events in order to realize the expected global behavior. The edges of the graph, automatically generated, specifies the way to reach successor(s) in the graph. Our current prototype of Saburo development model is implemented entirely in Java and uses the JDK 1.4 java.nio package to provide nonblocking I/O support and JDK 1.5 java.util.concurrent package for utility classes commonly useful in concurrent programming. The Saburo software and several examples are available for download below. We have built a number of applications to demonstrate the usability and robustness of the Saburo framework. Ichimonji is a high-performance HTTP server using Tatoo, an innovative parser generator to decode requests from clients. Other applications include an Echo, Daytime and UpperCase servers. The best place to start for more informations is the IWJPDC 2006 paper on Saburo and the corresponding talk slides. If you have any questions, comments or are interested in collaborations, please feel free to contact my be e-mail at loyaute[at]univ-mlv.fr. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Approach | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The main idea of Saburo is to model a server as a directed graph. The vertices of this graph, called stages in reference to the Staged Event-Driven Architecture, correspond to atomic units of treatment. They consist of a sequence of instructions without blocking code leading to zero or one I/O call. The edges, called context, is equivalent to the way to reach successor(s) in the graph. In other words, the contexts are the channels used to connect the stages between them and may be function calls, local blocking queues or networks connections. In order to exchange data, the stages must define their input and output interfaces, respectively used for receiving and sending data. More exactly, there are three kinds of stages:
The construction of this graph relies on enumeration of all I/O and synchronization calls that a server must perform to provide a service. This concept of blocking graph was already used for thread scheduling at runtime vonb03b. Further works use a communicating graph to develop Java concurrent applications bern05 or an I/O automata for developing distributed systems garl00. The directed graph used in Saburo is a combination of these approaches to design and build concurrent and distributed applications. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Architectures available in Saburo | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In order to minimize latency and to maximize throughput, a server must interleave the handling of several requests. The strategy used by the server, to overlap disc, network and processor(s) activities induced by several requests, is determined by its concurrency model. In the following, we briefly describe those available in Saburo. Iterative Architecture:
The iterative architecture uses a single process and blocking I/O calls to handle several requests. Then, it carries out all the steps required by a request before accepting a new one. It respects the policy first in, first out and the network is used as a request buffer. ![]() This model does not exhibit real concurrency, because it does not interlace the handling of several requests. But it is very easy to develop and may be used for servers sporadically accessed. Single-Process Event-Driven Architecture:
Multi-Process Architecture
The multi-process architecture assigns one process to each request. Each process carries out sequentially all steps necessary to serve the client request before accepting a new one. Since several processes may be used on modern operating systems, many requests can be served at the ``same time'' on the server. ![]() Moreover, the management of the overlapping between disk activity, processor and network access is provided, in a transparent way for the programmer, by the underlying operating system. Indeed, it manages the concurrent accesses to these resources and schedules processes (i.e it switches to a runnable process when the active process blocks). In this model, the processes have their own memory space that usually avoids the use of the synchronization primitives but it induces difficulties to share information. However, it may be more difficult to perform optimizations that rely on global information such as shared cache and require some synchronization mechanisms to control their accesses. These synchronizations involve time overhead, greater difficulty of programming and safety problems in servers. It also implies a containment of the processes and thus greater safety of the server. Process creations induce time overhead for each request and require techniques, such as pre-created processes at the initial time (process pool), to alleviate the costs of founding. Multi-Threaded Architecture
Similarly to the use of processes in previous architecture, in the multi-threaded architecture a thread is in charged of one request before handling a new one. ![]() The primary difference between the two architectures, is that all threads can share their address spaces and it is very easy to pass information through global variables. However, some synchronization mechanisms are required to control their accesses. These synchronizations involve time overhead, greater difficulty of programming and safety problems (deadlocks) in servers. Moreover, the cost of thread creation, although less important than process one, and techniques to address it remain. Also when the number of threads reaches a particular threshold, heap size, cache misses, lock contention and scheduling overhead can lead to an important fall of performances \cite{wels01}. Another problem of multi-threaded architecture is related to thread implementation library which depends on the underlying operating system. Indeed, when one thread blocks (mutual exclusion, I/O operation), other runnable threads within the same address space must remain eligible for execution. Consequently, underlying operating system must provide kernel-space thread library to allow this behavior. In the other case, i.e operating system only providing user-space thread library, multi-threaded architecture cannot be effectively and efficiency supported. Asymmetric Multi-Process Event-Driven Architecture
The Asymmetric Multi-Process Event-Driven (AMPED) architecture tries to answer previous problems of disk I/O which are not always asynchronous \cite{pai99}. This architecture mixes the SPED architecture with threads in charge of performing the blocking disk I/O. ![]() In AMPED, the main process performs all instructions except disk I/O and delegates them, via an inter-process communication (IPC) channel, to dedicated processes (I/O handler)). Through the same way, the I/O handler returns a notification event to the main process. Moreover, the selection mecanism selects, like any other I/O completion, these notification events. This architecture preserves the effectiveness of SPED architecture while avoiding the blocking disk I/O problem. In addition, the number of processes involved in I/O must minimize cache misses, context switching and scheduling overhead and depend on potentially managed requests. Pipeline Architecture:
Staged Event-Driven Architecture:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Development process | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In order to implement an Internet server using Saburo, the developer
must provide the directed graph and the business code, i.e the stages
of an Internet server. Following a developer request, the framework
produces two different files. The first one is the Internet server
implementation itself generated according to the concurrency model
chosen by developer either in Java or directly in bytecode. The second
one is the Promela translation also generated according to the
concurrency model. The user may check the specification, by using the
Promela file as input to the SPIN
model checker. The model checking step is done in a separate
process and could be omitted by the user. The figure below depicts the
development process: ![]() The Saburo framework provides an API that simplifies I/O implementation and defines various objects like as thread pool and cache to help the servers and middlewares implementation. It also relies on several external tools: Apache Velocity, a Java-based template engine, on ASM framework, a tiny and efficient bytecode manipulation framework and on SPIN, a widely disseminated model checking tool. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Examples | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In order to illustrate the Saburo concepts, we describe the specification of a simple Echo server, a Daytime server and finally an UpperCase server. A simple Echo server
This introductory example is composed of three stages: the first one accepts new clients from network and establishes a connection with them, the second one reads data received on a connection and, the last one writes them back to the client. The directed graph below models the interconnection of these stages: ![]() And the following Java code is specified to implement this graph:
Events are used in order to exchange data between stages. They are defined using Java interfaces that only declare methods to set (for output events) and get (for input events) these data. According to the position of each stage in the previous graph, the developper has to define the interfaces for the stage input/ouput events. For the accept stage, which is initial, only one interface for output events is defined:
For the read stage, which is unspecified, an input and output interfaces should be defined:
For the write stage, which is final, an interface is only defined for input events:
Once previous interfaces defined, the developer must implement the stages. Each stage contains a method handle(...) which corresponds to instructions performed by the stage. The parameters of this method are a context, used to reach successors, an input event and an output event. The accept stage may be implemented by:
For the read stage, the handle method may be:
Lastly, the write stage can be implemented by:
A Daytime server
The Daytime example is composed of three stages: the first one accepts new clients from network and establishes a connection with them, the second one gives a specific instant in time, with millisecond precision and, the last one writes back the instant to the client. The directed graph below models the interconnection of these stages: ![]() And the following Java code is specified to implement this graph:
Events are used in order to exchange data between stages. They are defined using Java interfaces that only declare methods to set (for output events) and get (for input events) these data. According to the position of each stage in the previous graph, the developper has to define the interfaces for the stage input/ouput events. For the accept stage, which is initial, only one interface for output events is defined:
For the daytime stage, which is unspecified, an input and output interfaces should be defined:
Denote that InputDaytimeEvent is empty, because no data is needed from the accept stage. In the practice, it is not necessary to specify the empty interfaces. For the write stage, which is final, an interface is only defined for input events:
Once previous interfaces defined, the developer must implement the stages. Each stage contains a method handle(...) which corresponds to instructions performed by the stage. The parameters of this method are a context, used to reach successors, an input event and an output event. The accept stage may be implemented by:
For the daytime stage, the handle method may be:
Lastly, the write stage can be implemented by:
An UpperCase server
The UpperCase example is composed of four stages: the first one accepts new clients from network and establishes a connection with them, the second one reads data received on a connection. The third one takes a string and returns it with all alphabetic characters in uppercase, and finally, the last one writes back the upper string to the client. ![]() And the following Java code is specified to implement this graph:
Events are used in order to exchange data between stages. They are defined using Java interfaces that only declare methods to set (for output events) and get (for input events) these data. According to the position of each stage in the previous graph, the developper has to define the interfaces for the stage input/ouput events. For the accept stage, which is initial, only one interface for output events is defined:
For the read stage, which is unspecified, an input and output interfaces should be defined:
Like the read stage, the upper stage is unspecified, an input and output interfaces should be defined:
For the write stage, which is final, an interface is only defined for input events:
Once previous interfaces defined, the developer must implement the stages. Each stage contains a method handle(...) which corresponds to instructions performed by the stage. The parameters of this method are a context, used to reach successors, an input event and an output event. The accept stage may be implemented by:
For the read stage, the handle method may be:
The upper stage can be implemented by:
Lastly, for the write stage, the handle method may be:
Concurrency generation
A stage with successors use its context to send output event(s) using the dispatchToSuccessor() method. The context is automatically generated according to the concurrency model. If only one process is used to handle a request (competitive strategy), the generated code is only function call(s). If several processes are used (cooperative strategy) it is necessary to introduce blocking queues. Finally, for distributed applications, this method will establish the connection between peers. Saburo relies on several "front-end" generators (bytecode and Java code) to translate a directed graph into a specalized concurrent server according to a concurrent model. To obtain code described by the previous strategies download the latest Saburo "core" release and examples given above and play with. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Papers, posters and talks | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Refeered paper
Saburo, A tool for I/O and concurrency management in servers. Gautier Loyauté, Rémi Forax and Gilles Roussel. Eighteen International Workshop on Java for Parallel and Distributed Computing (JavaPDC'06), April 25-29, 2006, Rhodes Island, Greece (paper). Poster and presentation
Multi Single-Process Event-Driven: A new Internet server architecture. Gautier Loyauté. Second European Conference on System (EuroSys07), March 21-23, 2007, Lisbon, Portugal, (abstract, poster). Saburo, A tool for I/O and concurrency management in servers. Presented at the Eighteen International Workshop on Java for Parallel and Distributed Computing (JavaPDC'06), April 25-29, 2006, Rhodes Island, Greece (presentation). A framework for development of concurrency and I/O in servers. Gautier Loyauté. First European Conference on System (EuroSys06), April 18-21, 2006, Leuven, Belgium (abstract, poster). Thesis work Outils pour le développement de serveurs Internet en mode non-bloquant. Gautier Loyauté. Master's thesis, Université de Marne la Vallée, September, 2004 (pdf, gzipped postscript). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Software downloads | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
File downloads are actually hosted by Université de Marne la Vallée. You may either download the Saburo software as a set of pre-packaged "official" releases (.tgz format, source code included), or use CVS to access the "live" tree. The CVS tree will be updated more frequently than the "official" releases, which are meant to represent stable, tested versions of Saburo. The CVS tree is the "live code" that is under constant development. See the file README in each release for information on compilation and usage of Saburo. Note that all of the Saburo code is covered under a LGPL license. Official releases
We have packaged more stable version of the code. All files are available from this Web page, you may click on one of the links below:
CVS access CVS access is also available for those users who want to maintain a "live" source tree. To check out the Saburo tree using CVS, use the following commands: Javadoc API documentation You can browse the Javadoc documentation for Saburo framework. |
Copyright © 2003 - 2004 Gautier Loyauté, Last modified: Sat Aug 28 18:20:35 CEST 2004 |