TITLE: A Dimensionality
Reduction Technique for Classification and Similarity Searches of
Region Data in Spatial Databases
Talk by: Despina Kontos, Ph.D. student, CIS Department -
Temple University
Talk by: Dr. Marcus J. Sobel, Statistics Department - Temple
University
Abstract: In most attempts to characterize
data (images, signals, text, etc.), the prime concern is to extract
descriptive features that provide significant information. One characterization
approach is to map the data to points in a k-dimensional space,
where k is the number of features extracted. Dimensionality reduction
can further be used to select the most discriminative features,
improving classification, indexing and retrieval. Here, we focus
on characterizing spatial Regions of Interest (ROIs). We propose
a novel statistical approach based on a supervised framework for
reducing the dimensionality of the feature space, when distinct
classes of data are present. The method employs a Markov Chain Monte
Carlo (MCMC) algorithm designed to select the most informative features,
according to their discriminative power across distinct classes
of data. This reduces the dimensionality of the initial feature
space and also improves the classification of the ROIs, since attributes
providing irrelevant information with respect to class membership
are discarded. We reinforce this effect by also introducing a weighted
Euclidean distance designed to classify the ROIs effectively. We
demonstrate the effectiveness of the proposed technique by applying
it to 2D and 3D spatial ROIs, test its scalability on large datasets,
and perform similarity searches. Finally, we compare the proposed
approach with other dimensionality reduction techniques (Singular
Value Decomposition, Karhunen-Loève transform) and present
classification performance using Neural Networks, Decision Trees
and Euclidean Distance measurements.
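
As an illustration of the weighted Euclidean distance mentioned in the abstract, the following is a minimal sketch and not the speakers' implementation. It assumes each ROI is already a feature vector and that per-feature weights are given (a weight of 0 discards a feature). The function names, weights, and sample vectors are hypothetical, and a simple nearest-neighbor rule stands in for the classifier.

```python
import math

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def nearest_class(query, labeled_points, w):
    """Return the label of the labeled point closest to query under weights w."""
    best_label, _ = min(
        ((label, weighted_euclidean(query, point, w)) for point, label in labeled_points),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical 3-feature ROI vectors. The third feature is assumed to be
# irrelevant to class membership, so its weight is set to 0.
weights = [1.0, 0.5, 0.0]
uniform = [1.0, 1.0, 1.0]
training = [([0.0, 0.0, 9.0], "A"), ([4.0, 4.0, 0.0], "B")]
query = [1.0, 1.0, 0.0]

print(nearest_class(query, training, weights))  # irrelevant feature ignored
print(nearest_class(query, training, uniform))  # plain Euclidean distance
```

In this toy data the two calls return different labels, because the plain distance is dominated by the third feature while the weighted distance ignores it.
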
We will also discuss clustering methods, e.g.,
AdaBoost, which minimize clustering bias and variance when the number
and characterization of region features are known. Additionally,
we will discuss the relevance of random trees, multiscale analysis,
wavelets and fractal methodologies to properly choosing the number
and characterization of region features when they are not known.