The Holumbus-Distribution library consists of modules and tools for implementing distributed systems in general. Besides common data types and small helper functions, the library provides a mailbox-based data transfer mechanism. It offers an easy-to-use interface for inter- and intra-process communication, as in Erlang or Mozart/Oz.
This site provides only a small overview of the ideas and concepts implemented in this library. For more detail, please take a look at Stefan Schmidt's Master's Thesis (available as PDF).
For data exchange between two threads, Haskell provides the Chan data type from the module Control.Concurrent.Chan. Unfortunately, a channel can only be used between threads that share the same address space, not between two processes or two computers. Other programming languages like Erlang or Mozart/Oz have built-in mechanisms for message passing between threads or programs. Our goal is to implement a similar technology for Haskell.
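As a point of reference, the following minimal sketch shows the kind of in-process exchange that Chan already supports. It uses only the base library; the helper name roundTrip is made up for this illustration and is not part of Holumbus.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)

-- roundTrip is a hypothetical helper for this sketch.
-- A forked thread writes one message, the calling thread reads it.
-- Both threads live in the same process, which is the limit of Chan.
roundTrip :: String -> IO String
roundTrip msg = do
  chan <- newChan
  _ <- forkIO (writeChan chan msg)
  readChan chan

main :: IO ()
main = roundTrip "Hello World" >>= putStrLn
```

Nothing in this program can be reached from a second process, which is the gap the library aims to close.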
The current implementation and naming are inspired by the Mozart/Oz concept of using stream-port objects for data transmission.
A stream in this context has the role of a mailbox. It collects all incoming messages and hands them to a thread for further processing. Port objects encapsulate the sending of messages to a stream. Every port is linked to a specific stream. The sender can determine whether the stream is located in the same address space or on a different machine. Depending on the result, the message is sent to the stream directly or via a network connection. The following figure illustrates the use of streams and ports to solve the common producer-consumer problem. The consumer owns a stream and the two producers send their messages to it via ports. The ports decide whether a message can be transferred directly or via the socket interface.
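The producer-consumer arrangement described above can be approximated with a small in-process model. This is only a sketch under stated assumptions: Mailbox, PortRef, deliver and receive are hypothetical stand-ins, not the Holumbus types or functions, and the network path is deliberately left unimplemented.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (replicateM)
import Data.List (sort)

-- Hypothetical stand-in for a stream: a mailbox backed by a Chan.
newtype Mailbox a = Mailbox (Chan a)

-- Hypothetical stand-in for a port: either a direct reference to a
-- mailbox in the same address space, or the address of a remote one.
data PortRef a
  = LocalPort (Mailbox a)
  | RemotePort String Int -- host name and socket number

newMailbox :: IO (Mailbox a)
newMailbox = Mailbox <$> newChan

receive :: Mailbox a -> IO a
receive (Mailbox c) = readChan c

-- Sending chooses between direct delivery and the network path.
deliver :: PortRef a -> a -> IO ()
deliver (LocalPort (Mailbox c)) msg = writeChan c msg
deliver (RemotePort host sock) _ =
  ioError (userError ("network path to " ++ host ++ ":" ++ show sock
                      ++ " is not modelled in this sketch"))

-- Two producers send to one consumer, which collects both messages.
-- The result is sorted because the arrival order is not fixed.
producerConsumer :: IO [String]
producerConsumer = do
  box <- newMailbox
  let port = LocalPort box
  _ <- forkIO (deliver port "from producer 1")
  _ <- forkIO (deliver port "from producer 2")
  sort <$> replicateM 2 (receive box)

main :: IO ()
main = producerConsumer >>= mapM_ putStrLn
```

The point of the model is that producers only ever call deliver; whether the message travels directly or over a socket is hidden behind the port.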
To be independent of hardware changes, streams can be given a unique string identifier. The mapping between stream names and their physical addresses is maintained by a registry application, the port-registry. This registry works like a DNS server for stream names. Every time a message is sent to a stream, its hardware address is resolved. If the hardware configuration changes, only the mapping has to be updated. The sender applications remain unchanged.
The following figure shows the two main functions of the port-registry: adding new entries on behalf of streams and handing them out to requesting ports.
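The name-to-address mapping can be pictured as a small in-memory table. The following is a minimal sketch, assuming an address is a host name plus a socket number; Address, Registry, register, unregister and lookupStream are hypothetical names for this illustration and do not reflect how the port-registry is actually implemented.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical address: host name plus socket number.
data Address = Address String Int deriving (Eq, Show)

-- Hypothetical registry: a mutable map from stream name to address.
type Registry = IORef (Map.Map String Address)

newRegistry :: IO Registry
newRegistry = newIORef Map.empty

-- Add or overwrite the entry for a stream name.
register :: Registry -> String -> Address -> IO ()
register reg name addr = modifyIORef' reg (Map.insert name addr)

-- Remove the entry for a stream name, if present.
unregister :: Registry -> String -> IO ()
unregister reg name = modifyIORef' reg (Map.delete name)

-- Resolve a stream name, as a port would do before sending.
lookupStream :: Registry -> String -> IO (Maybe Address)
lookupStream reg name = Map.lookup name <$> readIORef reg

main :: IO ()
main = do
  reg <- newRegistry
  register reg "global" (Address "host-a" 9000)
  lookupStream reg "global" >>= print
  -- The stream moves to another machine: only the mapping changes.
  register reg "global" (Address "host-b" 9000)
  lookupStream reg "global" >>= print
```

Senders keep using the name "global" throughout; only the registry entry is updated when the stream moves.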
As described so far, all streams are able to receive messages from the same process or from different processes. Even if a stream is only used for internal data exchange, external processes might send data to it and manipulate the internal program state. In applications that deal with confidential data, such a scenario represents a severe security violation. The programmer needs a way to control outside access to a stream. Therefore, three different types of streams are introduced:
- Global streams
- Local streams
- Private streams
Global streams accept messages from all internal and external sources and send their name to the registry.
Local streams do not register their name with the port-registry but, like global streams, they accept messages from all sources. When transmitting messages to a local stream, the sender needs to know the name of the destination computer and the socket number in addition to the stream name.
Private streams can only be used to communicate between two threads in the same address space. They do not accept messages from foreign processes and thus solve the security problem described above. These streams provide the same functionality as the existing Chan data type in Haskell. The advantage of using a private stream is that it can be converted to a more open form of communication with little effort.
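The differences between the three stream types can be summarised as two small predicates. This is a sketch derived from the descriptions above; StreamType, Origin, registersName and acceptsFrom are hypothetical names chosen for the illustration, not definitions taken from the library.

```haskell
-- Hypothetical model of the three stream types described above.
data StreamType = Global | Local | Private deriving (Eq, Show)

-- Where a message comes from, relative to the stream's owner.
data Origin = SameProcess | ForeignProcess deriving (Eq, Show)

-- Only global streams announce their name to the port-registry.
registersName :: StreamType -> Bool
registersName Global = True
registersName _      = False

-- Private streams reject messages from foreign processes;
-- global and local streams accept messages from all sources.
acceptsFrom :: StreamType -> Origin -> Bool
acceptsFrom Private ForeignProcess = False
acceptsFrom _       _              = True

main :: IO ()
main = mapM_ row [Global, Local, Private]
  where
    row t = putStrLn (show t
                      ++ ": registers name = " ++ show (registersName t)
                      ++ ", accepts foreign = " ++ show (acceptsFrom t ForeignProcess))
```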
Currently, a better implementation of the request-response handling is being investigated. Besides this, the following topics need to be addressed in the future:
- introduce caches in ports
- re-register at port-registry
- introduce command line parameters for port-registry
After the installation is completed, the Holumbus-Distribution library can be used to build distributed systems. However, global streams can only be accessed from other programs if an instance of the port-registry program is running inside the network. Therefore, the port-registry has to be started with the following command before all other components of the distributed system:
The port-registry is a command line program and accepts all the instructions shown in the following listing.
    exit       - exits the program
    help       - prints this help
    lookup     - gets the socket id for a port
    ports      - lists all ports
    register   - registers a port manually
    unregister - unregisters a port manually
    version    - prints the version
When the port-registry is started, it creates the file registry.xml in the system's tmp directory. The file contains information on how to access the port-registry and has to be copied to every machine that will be part of the distributed system.
The components that want to access the port-registry need to load the XML file by calling the function newPortRegistryFromXmlFile at program startup. After this, the internal data structures have to be updated by calling setPortRegistry.
    import Holumbus.Network.PortRegistry.PortRegistryPort
    import Holumbus.Network.Port

    main :: IO ()
    main = do
      -- load the registry description and install it
      reg <- newPortRegistryFromXmlFile "/tmp/registry.xml"
      setPortRegistry reg
      -- create a global stream named "global"
      gS <- (newGlobalStream "global") :: IO (Stream String)
      -- wait for one incoming message, then print it
      msg <- readStream gS
      putStrLn msg
      closeStream gS
The program in the listing above sets up a global stream with the name "global". Then it waits for an incoming message, prints out its content and terminates. The following listing shows how to send messages to the stream.
    import Holumbus.Network.PortRegistry.PortRegistryPort
    import Holumbus.Network.Port

    main :: IO ()
    main = do
      -- load the registry description and install it
      reg <- newPortRegistryFromXmlFile "/tmp/registry.xml"
      setPortRegistry reg
      -- create a port for the stream named "global" and send one message
      gP <- (newGlobalPort "global") :: IO (Port String)
      send gP "Hello World"