Abstract |
---|
We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality-sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set. |
Field | Value
---|---
Year | 2009
DOI | 10.1016/j.ijar.2008.11.006
Venue | Int. J. Approx. Reasoning
Keywords | Information retrieval, Approximate matching, Graphical models, Unsupervised learning, Latent Semantic Analysis, Deep graphical model
DocType | Journal
Volume | 50
Issue | 7
ISSN | 0888-613X
Citations | 248
PageRank | 17.09
References | 18
Authors | 2
Name | Order | Citations | PageRank
---|---|---|---
Ruslan Salakhutdinov | 1 | 12190 | 764.15
Geoffrey E. Hinton | 2 | 40435 | 4751.69