Welcome to the Holumbus project!
Holumbus is a Haskell library which provides the basic building blocks for creating powerful indexing and search applications. This includes a framework for distributed crawling and indexing as well as distributed query processing. Additionally, a full-fledged distributed Map-Reduce framework is included.
To explore the power of Holumbus, have a look at Hayoo!, a Haskell API search engine, or download Holumbus (see below) and check out some of the included examples.
The Holumbus framework is split into four sub-packages, which build upon each other but may also be used independently:
- The Holumbus Distribution Library
- The Holumbus Storage System
- The Holumbus MapReduce Framework
- The Holumbus Search Engine
Some extensive documentation can be found in the following cookbooks:
Currently, Holumbus is under heavy development and should not be used to build applications for production environments. You can follow the development on our blog and on this page. We plan to release a first stable version of Holumbus during 2009.
If you want to try Holumbus, you can get a current development snapshot using Git:
$ git clone git://github.com/fortytools/holumbus.git
You can also browse the repository (see link in navigation above) and of course submit patches via GitHub ;)
Distribution packages are available from Hackage for the following Holumbus libraries:
The Holumbus library is distributed under the terms of the MIT license. Please have a look at the LICENSE file in the package and the copyright note at the top of every source file.
Holumbus is developed and tested with GHC 6.8 and 6.10 but will probably work with GHC 6.6 after minor adjustments. In addition to the libraries coming with GHC, Holumbus needs Binary, BZip, HDBC, HDBC-sqlite3, HXT, Regex-Compat, UTF8-String and PureMD5. To run the websearch example, the Janus application server is required. Running the test suite requires QuickCheck and HUnit.
For each sub-project, a Cabal file is provided, so Holumbus can be installed the standard Cabal way:
$ runhaskell Setup.hs configure
$ runhaskell Setup.hs build
$ runhaskell Setup.hs install # with root privileges
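If you install several sub-packages, it can be convenient to wrap the three Cabal steps in a small shell function that stops at the first failing step. The following is only a sketch and not part of the Holumbus distribution: the function name run_cabal_steps and the RUNHASKELL variable are hypothetical names chosen for this illustration, and it assumes you run it from a directory that contains a Setup.hs.

```shell
#!/bin/sh
# Sketch only: run the three Cabal steps in order and stop at the first failure.
# RUNHASKELL is a hypothetical override (not a Holumbus or Cabal setting);
# set it to "echo" to print the steps instead of running them.
RUNHASKELL="${RUNHASKELL:-runhaskell}"

run_cabal_steps() {
    for step in configure build install; do
        # The install step usually needs root privileges, as noted above.
        $RUNHASKELL Setup.hs "$step" || return 1
    done
}
```

Calling run_cabal_steps inside a sub-package directory performs configure, build and install in sequence. Setting RUNHASKELL=echo beforehand gives a dry run that only prints the arguments of each step.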
For those more familiar with make, a Makefile is provided which has some shortcuts for the commands above:
$ make configure
$ make build
$ make install # with root privileges
To run the Holumbus test suite, you can use the alltests target:
$ make alltests
The standalone examples can be built using the allexamples target (also see the README file included with each example):
$ make allexamples
To build the package with profiling capabilities, the prof target can be used:
$ make prof
$ make install # with root privileges
Have a look at the API documentation for the searchengine, the mapreduce framework, the storage system and the distribution library generated by Haddock. You will also find some examples in the examples directory of the distribution.
More in-depth information about the Holumbus framework is available in the master's thesis The Holumbus Framework: Creating fast, flexible and highly customizable search engines with Haskell by Timo B. Hübel (available as PDF) as well as in the master's thesis The Holumbus Framework: Creating scalable and highly customized crawlers and indexers by Sebastian M. Schlatt (available as PDF). The Holumbus MapReduce System, the distribution library and the storage system were developed in the master's thesis The Holumbus Framework: Distributed computing with MapReduce in Haskell by Stefan Schmidt (available as PDF).
Holumbus is developed and maintained by Sebastian M. Schlatt, Timo B. Hübel, Stefan Schmidt and Sebastian Reese in cooperation with Dr. Uwe Schmidt and FH Wedel University of Applied Sciences.
Holumbus in Action
Full text search of the public internet pages at FH-Wedel
Hayoo! Haskell API Search
- Janus - A Dynamic Webserver with Servlet Functionality in Haskell representing all internal data by means of XML (thesis)
- HXT - Haskell XML Toolbox