Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high-dimensional vectors. To perform similarity search on these data, existing works mainly create customized indexes for different data types. Because these customized indexes are so diverse, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support approximate nearest neighbor search under different similarity measures by exploiting Locality Sensitive Hashing schemes, as well as similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
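To make the idea of similarity search over a generic inverted index concrete, the following is a minimal CPU-only sketch, not the GENIE implementation. It assumes each object is encoded as a set of keywords (here hypothetical `(attribute, value)` pairs, which could equally be LSH signatures) and that similarity is approximated by the number of keywords an object shares with the query. The function names `build_index` and `top_k` are illustrative only.

```python
from collections import Counter, defaultdict


def build_index(objects):
    """Map each keyword to the list of object ids that contain it."""
    index = defaultdict(list)
    for obj_id, keywords in objects.items():
        for kw in set(keywords):  # ignore duplicate keywords within one object
            index[kw].append(obj_id)
    return index


def top_k(index, query, k):
    """Return the k objects sharing the most keywords with the query."""
    counts = Counter()
    for kw in set(query):
        # Scan the postings list of every query keyword and count matches.
        for obj_id in index.get(kw, ()):
            counts[obj_id] += 1
    # Highest count first; ties broken by object id for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy relational data: three objects with attributes a, b, c (assumed values).
objects = {
    0: [("a", 1), ("b", 2), ("c", 3)],
    1: [("a", 1), ("b", 5), ("c", 3)],
    2: [("a", 9), ("b", 2), ("c", 7)],
}
index = build_index(objects)
result = top_k(index, [("a", 1), ("b", 2), ("c", 3)], k=2)
print(result)
```

In this sketch each query keyword is processed independently, so the postings-list scans for one query, and for many queries at once, are natural candidates for parallel execution. How that work is actually mapped onto GPU threads is not specified here.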