Geographical Information System Application of Multiprocessor Multidisk Image Servers

Hypermedia interfaces increase the need for large image and media servers capable of storing and reading large amounts of compressed or uncompressed data. To fulfill these requirements, commercial systems are based on RAID arrays of disks connected to a high-speed network. As an alternative, the Peripheral Systems Laboratory (LSP, EPFL) proposes multiprocessor-multidisk (MPMD) image-server architectures. An MPMD architecture consists of an array of intelligent disk nodes, each composed of one processor and one or more disks. Such an architecture supports both data storage and processing, while minimizing costly data transfers. Large images are divided into extents (image parts with good locality), which are allocated to the various disks of the architecture so as to balance the load when arbitrary parts of the image are accessed and processed. The LSP has implemented one version of the MPMD architecture, based on a transputer array, and is implementing two new versions: one based on a shared-memory multiprocessor workstation cluster (MT-MDFS), and one based on a network of workstations running the PVM software (PVMDFS).
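
The extent allocation described above can be illustrated with a minimal sketch. The layout rule used here (round-robin along a row of extents, shifted by a fixed offset from one row to the next) is an assumption chosen for illustration, not necessarily the allocation scheme used by the LSP servers; the function names and parameters are hypothetical.

```python
from collections import Counter


def extent_disk(ex, ey, n_disks, row_offset):
    """Disk index for the extent at column ex, row ey (hypothetical layout rule)."""
    return (ex + ey * row_offset) % n_disks


def window_load(x0, y0, w, h, ext_w, ext_h, n_disks, row_offset):
    """Count how many extents each disk must serve for a pixel window."""
    first_ex, last_ex = x0 // ext_w, (x0 + w - 1) // ext_w
    first_ey, last_ey = y0 // ext_h, (y0 + h - 1) // ext_h
    load = Counter()
    for ey in range(first_ey, last_ey + 1):
        for ex in range(first_ex, last_ex + 1):
            load[extent_disk(ex, ey, n_disks, row_offset)] += 1
    return [load[d] for d in range(n_disks)]


# A tall, narrow window (1 extent wide, 4 extents high) on 4 disks:
# without a row offset every extent lands on the same disk,
# with an offset of 1 the four extents are spread over the four disks.
unshifted = window_load(0, 0, 256, 1024, 256, 256, n_disks=4, row_offset=0)
shifted = window_load(0, 0, 256, 1024, 256, 256, n_disks=4, row_offset=1)
```

Under these assumptions, the per-disk extent counts returned by `window_load` give a rough measure of how well a given layout balances the load for a given window.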

This contribution analyzes the respective behavior of the three MPMD architectures running a GIS (geographical information system) application. GIS applications require a large range of data types, classified into two major categories: vectors (explicit topological structure) and grids or rasters (implicit topological structure). Grid data (2-D and 3-D fields, mostly scalar, such as average temperatures or rainfall) are stored as pixmaps. The best storage format for vector data depends on the size of the vector database, on its organization and on the application. Very large detailed vector maps (topographic maps, for example) are best stored as pixmaps for fast visualization. These maps are geographically referenced and have varying scales (1:25000 for topographic maps; 1:500 for cadastral maps; 1 pixel per 100 m by 100 m square for land-utilization maps). Typical processing operations on the spatially referenced data are queries that either extract data from a single map (filtering operations) or combine data from multiple maps to generate new maps or histograms.
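
As a small illustration of the two kinds of queries mentioned above, the sketch below filters one grid map and then combines it with a second map to produce a histogram. It assumes that both maps are already resampled to the same grid; the maps, class codes, threshold and function names are invented for the example.

```python
from collections import Counter


def filter_map(grid, predicate):
    """Filtering: keep the cells that satisfy the predicate, mask the others with None."""
    return [[v if predicate(v) else None for v in row] for row in grid]


def class_histogram(class_map, masked_map):
    """Combining: count, per class of class_map, the cells left unmasked in masked_map."""
    hist = Counter()
    for class_row, value_row in zip(class_map, masked_map):
        for cls, value in zip(class_row, value_row):
            if value is not None:
                hist[cls] += 1
    return hist


# Hypothetical land-utilization classes and rainfall values (mm) on the same 2x3 grid.
land_use = [[1, 1, 2],
            [2, 3, 3]]
rainfall = [[800, 1200, 1300],
            [1500, 1100, 700]]

wet = filter_map(rainfall, lambda mm: mm >= 1000)
wet_cells_per_class = class_histogram(land_use, wet)
```

In a real query the two maps would generally have different scales, so a resampling step would precede the combination.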

This contribution analyzes how to distribute the data from various maps across multiple disk nodes of the GigaView, so as to minimize data transfers and maximize the parallelism of both data access and processing. Our approach is to measure, through experiments on single-processor single-disk workstations, the performance of elementary operations (e.g. reading data from the disks, resampling or filtering an extent, merging two maps), and to use the resulting performance parameters to build simulation models of MPMD architectures. We validate the models for operations that are already implemented (reading a pixmap window from the disks). We then extrapolate, through simulation, the performance of the modeled MPMD architectures for new operations (read-resample-merge operations).
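
The idea of deriving a model from measured elementary parameters can be sketched as follows. This is a deliberately simplified analytic estimate, not the simulation model of the paper: it assumes that disk nodes work fully in parallel, that each node handles its extents one after another, and that all results then cross a single shared link without overlapping the disk accesses. All parameter names and numeric values are hypothetical, not measurements.

```python
def node_time(n_extents, extent_bytes, disk_latency_s, disk_bytes_per_s,
              cpu_s_per_extent):
    """Time for one disk node to read and process its extents, one after another."""
    read_s = n_extents * (disk_latency_s + extent_bytes / disk_bytes_per_s)
    return read_s + n_extents * cpu_s_per_extent


def window_time(extents_per_node, extent_bytes, disk_latency_s, disk_bytes_per_s,
                cpu_s_per_extent, link_latency_s, link_bytes_per_s):
    """Estimated time to serve a window: slowest node, then one shared transfer."""
    slowest_s = max(
        node_time(n, extent_bytes, disk_latency_s, disk_bytes_per_s,
                  cpu_s_per_extent)
        for n in extents_per_node
    )
    total_bytes = sum(extents_per_node) * extent_bytes
    return slowest_s + link_latency_s + total_bytes / link_bytes_per_s


# Made-up component parameters, for illustration only.
params = dict(extent_bytes=1_000_000, disk_latency_s=0.01,
              disk_bytes_per_s=1_000_000, cpu_s_per_extent=0.0,
              link_latency_s=0.0, link_bytes_per_s=10_000_000)

balanced = window_time([4, 4, 4, 4], **params)   # 16 extents spread over 4 nodes
skewed = window_time([16, 0, 0, 0], **params)    # same 16 extents on a single node
```

Even in this crude form, comparing `balanced` with `skewed` shows how the estimate depends on the distribution of extents across nodes as well as on the component parameters.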

Through simulation, we show the benefits that can be obtained from an adequate distribution of data and tasks across disk nodes, as a function of the available component performance (processor performance, transputer link throughput, PVM network throughput and latency, disk throughput and latency). The results demonstrate that, despite lower individual component performance, distributed transputer- and PVM-based architectures compete favorably with shared-memory multiprocessor workstation-cluster architectures.


<basile.schaeli@epfl(add: .ch)>

Last modified: 2007/09/26 21:26:05