Title
Semantic hashing
Abstract
We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
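The lookup procedure the abstract describes, probing every memory address within a few bit flips of the query document's binary code, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 8-bit codes, and the toy hash table are all made up for the example.

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every address within `radius` bit flips of `code`."""
    yield code
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= 1 << b  # flip one chosen bit
            yield flipped

# Toy hash table: binary address -> documents stored at that address.
# (In the paper the codes come from the deep model's binary top layer.)
table = {
    0b00001111: ["doc_a"],
    0b00001101: ["doc_b"],   # 1 bit away from doc_a's address
    0b11110000: ["doc_c"],   # 8 bits away from doc_a's address
}

def query(code: int, n_bits: int = 8, radius: int = 2):
    """Return all documents whose address differs from `code` by at most `radius` bits."""
    hits = []
    for addr in hamming_ball(code, n_bits, radius):
        hits.extend(table.get(addr, []))
    return hits
```

With a 32-bit code and a small radius, as in the paper, the number of probed addresses stays tiny compared to the document set, which is why the lookup time is independent of corpus size.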
Year
2009
DOI
10.1016/j.ijar.2008.11.006
Venue
Int. J. Approx. Reasoning (International Journal of Approximate Reasoning)
Keywords
Information retrieval, large set, approximate matching, graphical models, unsupervised learning, deepest layer, Latent Semantic Analysis, deep graphical model, entire document set, semantically similar document, better representation, query document
DocType
Journal
Volume
50
Issue
7
ISSN
Citations
248
PageRank
17.09
References
18
Authors
2
Name                    Order   Citations   PageRank
Ruslan Salakhutdinov    1       12190       764.15
Geoffrey E. Hinton      2       404354      751.69