J. Castro, M. Georgiopoulos, and R. F. DeMara, "Data Partitioning with Fuzzy
ARTMAP using the Hilbert Space Filling Curves: Effect on the Speed of
Convergence of Fuzzy ARTMAP for Large Database Problems," revision pending to
Neural Networks, revised and resubmitted on November 21, 2004.
Abstract
The Fuzzy ARTMAP algorithm has been shown to be one of the premier neural
network architectures for classification problems. One of the properties of
Fuzzy ARTMAP, which can be both an asset and a liability, is its capacity to
produce new nodes (templates) on demand to represent classification categories.
This property allows Fuzzy ARTMAP to adapt automatically to the database without
having to specify its network size a priori. On the other hand, it has the
undesirable side effect that large databases can produce a large network
that dramatically slows down the training of the algorithm. To address
the slow convergence of Fuzzy ARTMAP on large database problems, we
propose the use of space-filling curves, specifically the Hilbert space-filling
curve (HSFC). The HSFC allows us to divide the problem into smaller
sub-problems, each focusing on a smaller subset of the original data set. A
separate Fuzzy ARTMAP network is used to learn each data partition.
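As a rough illustration of the partitioning step, the Python sketch below orders points along a Hilbert curve and cuts that ordering into contiguous groups, one per Fuzzy ARTMAP network. It is a minimal sketch under stated assumptions, not the authors' implementation: the Hilbert index is computed with Skilling's transpose algorithm (a standard n-dimensional encoding), features are assumed to be scaled to [0, 1] as is usual for Fuzzy ARTMAP inputs, each feature is quantized to a grid of `bits` bits (a hypothetical parameter), and partitions are assumed to be equal-count runs of the curve. The names `hilbert_index` and `hsfc_partition` are illustrative only.

```python
import random


def hilbert_index(coords, bits):
    """Position of an integer point (each coordinate in [0, 2**bits)) along an
    n-dimensional Hilbert curve, via Skilling's transpose algorithm."""
    x = [int(c) for c in coords]
    n = len(x)
    m = 1 << (bits - 1)
    # Undo excess work: invert or exchange low-order bits level by level
    q = m
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                 # invert low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p     # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray-encode the transposed index
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = m
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    # Interleave the bits (most significant level first) into one integer
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


def hsfc_partition(points, n_parts, bits=8):
    """Split point indices into n_parts contiguous runs of the Hilbert order
    (assumption: equal-count runs; features assumed to lie in [0, 1])."""
    top = (1 << bits) - 1
    keys = []
    for p in points:
        grid = [min(max(int(v * (top + 1)), 0), top) for v in p]
        keys.append(hilbert_index(grid, bits))
    order = sorted(range(len(points)), key=keys.__getitem__)
    # Chunk sizes differ by at most one
    size, extra = divmod(len(order), n_parts)
    parts, start = [], 0
    for k in range(n_parts):
        end = start + size + (1 if k < extra else 0)
        parts.append(order[start:end])
        start = end
    return parts


random.seed(0)
data = [[random.random() for _ in range(16)] for _ in range(1000)]
parts = hsfc_partition(data, n_parts=4)
print([len(p) for p in parts])  # [250, 250, 250, 250]
```

Because consecutive positions on a Hilbert curve correspond to neighbouring grid cells, each contiguous run tends to cover a fairly compact region of feature space, so each network is trained on a localized subset of the data.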
Through this divide-and-conquer approach we avoid the node proliferation
problem and consequently speed up Fuzzy ARTMAP's training. Results have been
produced for 2-class, 16-dimensional Gaussian data and for the Forest
database, available at the UCI repository. Our results indicate that the Hilbert
space-filling curve approach reduces the time it takes to train Fuzzy
ARTMAP without affecting the generalization performance attained by Fuzzy ARTMAP
trained on the original large data set. Because the smaller data sets produced
by the HSFC approach can be learned independently by different
Fuzzy ARTMAP networks, we have also implemented and tested a parallel
version of this approach on a Beowulf cluster of workstations, which
further reduces the time it takes to train and test Fuzzy ARTMAP on large
database problems.
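The on-demand node creation described above, and the one-network-per-partition training it motivates, can be illustrated with a simplified Fuzzy ARTMAP in Python. This is a hedged sketch, not the authors' code: it assumes the common fast-learning variant (learning rate 1) with complement coding I = (a, 1 - a), choice function T_j = |I ^ w_j| / (alpha + |w_j|) where ^ is the component-wise minimum, vigilance test |I ^ w_j| / |I| >= rho, and match tracking by a small increment. The class name `FuzzyARTMAP` and the parameter values `rho_bar`, `alpha`, `eps`, and `epochs` are illustrative choices.

```python
class FuzzyARTMAP:
    """Simplified Fuzzy ARTMAP classifier (fast learning, match tracking).

    Inputs are assumed to lie in [0, 1]; each node stores a template
    (weight vector) and the class label it predicts.
    """

    def __init__(self, rho_bar=0.0, alpha=0.01, eps=0.001):
        self.rho_bar = rho_bar  # baseline vigilance
        self.alpha = alpha      # choice parameter
        self.eps = eps          # match-tracking increment
        self.w = []             # one template per node
        self.labels = []        # class label attached to each node

    @staticmethod
    def _complement_code(a):
        # I = (a, 1 - a), so |I| always equals len(a)
        return list(a) + [1.0 - v for v in a]

    def _ranked_nodes(self, I):
        # Sort nodes by the choice function T_j = |I ^ w_j| / (alpha + |w_j|)
        ranked = []
        for j, w in enumerate(self.w):
            match = sum(min(x, y) for x, y in zip(I, w))
            ranked.append((match / (self.alpha + sum(w)), match, j))
        ranked.sort(reverse=True)
        return ranked

    def train_pattern(self, a, label):
        I = self._complement_code(a)
        norm_I = len(a)
        rho = self.rho_bar
        for _, match, j in self._ranked_nodes(I):
            if match / norm_I < rho:
                continue  # fails vigilance: reset this node, try the next one
            if self.labels[j] == label:
                # Fast learning: template becomes I ^ w_j
                self.w[j] = [min(x, y) for x, y in zip(I, self.w[j])]
                return
            # Wrong label: raise vigilance just above this node's match
            rho = match / norm_I + self.eps
        # No existing node accepted the pattern: create a new node on demand
        self.w.append(I)
        self.labels.append(label)

    def fit(self, X, y, epochs=10):
        for _ in range(epochs):
            snapshot = [list(w) for w in self.w]
            for a, label in zip(X, y):
                self.train_pattern(a, label)
            if snapshot == self.w:
                break  # no template changed and no node was added
        return self

    def predict(self, a):
        ranked = self._ranked_nodes(self._complement_code(a))
        return self.labels[ranked[0][2]] if ranked else None


net = FuzzyARTMAP().fit([[0.1, 0.1], [0.9, 0.9], [0.15, 0.2]], [0, 1, 0])
print(len(net.w))                # 2
print(net.predict([0.12, 0.1]))  # 0
print(net.predict([0.95, 0.85]))  # 1
```

Every search over existing nodes scales with the number of nodes, which is why node proliferation on a large data set slows training. Since each HSFC partition would be learned by its own instance with no shared state, the instances could in principle be trained in separate processes or on separate cluster nodes; this sketch runs sequentially for simplicity.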