A simple coding for cross-domain matching with dimension reduction via spectral graph embedding

29 Dec 2014 · Hidetoshi Shimodaira

Data vectors are obtained from multiple domains; they may be, for example, feature vectors of images or vector representations of words. Domains may have different numbers of data vectors with different dimensions. These data vectors from multiple domains are projected to a common space by linear transformations in order to search for closely related vectors across domains. We would like to find projection matrices that minimize the distances between closely related data vectors. This formulation of cross-domain matching is regarded as an extension of spectral graph embedding to the multi-domain setting, and it includes several multivariate analysis methods from statistics, such as multiset canonical correlation analysis, correspondence analysis, and principal component analysis. Similar approaches have recently become very popular in pattern recognition and computer vision. In this paper, instead of proposing a novel method, we introduce an embarrassingly simple idea of coding the data vectors that explains all of the approaches mentioned above. A data vector is concatenated with zero vectors from all the other domains to make an augmented vector. Cross-domain matching is then solved by applying the single-domain version of spectral graph embedding to these augmented vectors from all the domains. An interesting connection to the classical associative memory model of neural networks is also discussed by noting a coding for association. A cross-validation method for choosing the dimension of the common space and a regularization parameter is discussed in an illustrative numerical example.
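To illustrate the coding idea, here is a minimal sketch, assuming NumPy/SciPy; the function name `cross_domain_embedding` and the arguments `W`, `dim`, and `reg` are illustrative, not from the paper. It stacks the domain matrices block-diagonally, so each vector is padded with zeros for the other domains, and then solves the generalized eigenvalue problem of single-domain spectral graph embedding; the weighting and normalization are simplified relative to the paper.

```python
import numpy as np
from scipy.linalg import block_diag, eigh

def cross_domain_embedding(X_list, W, dim, reg=1e-6):
    """Project vectors from multiple domains into a common space (sketch).

    X_list : list of (n_d, p_d) data matrices, one per domain.
    W      : (N, N) symmetric nonnegative matching weights between all
             vectors, N = sum of n_d, ordered as the domains are stacked.
    dim    : dimension of the common space.
    reg    : ridge-type regularization added to the metric.
    """
    # Augmented coding: each data vector is concatenated with zero vectors
    # for all other domains, so the stacked data matrix is block diagonal.
    X = block_diag(*X_list)            # (N, P), P = sum of p_d

    d = W.sum(axis=1)                  # degrees (total matching weight)
    M = np.diag(d)
    L = M - W                          # graph Laplacian of the match graph

    # Single-domain spectral graph embedding on the augmented vectors:
    # minimize trace(A^T X^T L X A) subject to A^T X^T M X A = I,
    # i.e. a generalized eigenproblem; take the smallest eigenvalues
    # (trivial constant directions may need to be discarded depending
    # on how the data are centered).
    S = X.T @ L @ X
    T = X.T @ M @ X + reg * np.eye(X.shape[1])
    _, vecs = eigh(S, T)               # eigenvalues in ascending order

    A = vecs[:, :dim]                  # (P, dim) joint projection matrix
    Z = X @ A                          # embedded vectors in the common space

    # Split A back into one projection matrix per domain.
    sizes = [x.shape[1] for x in X_list]
    A_list = np.split(A, np.cumsum(sizes)[:-1], axis=0)
    return A_list, Z
```

For instance, with two domains such as images and words, `X_list = [X_img, X_word]` and `W` marking image–caption pairs, the returned per-domain matrices project both sets of vectors into the same `dim`-dimensional space, where nearest-neighbor search performs the cross-domain matching.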
