Using gensim's LSI model to calculate document similarity

    from pprint import pprint

    from gensim import corpora, models, similarities

    # Load the dictionary and bag-of-words corpus saved earlier
    # (the Deerwester example files from gensim's corpus tutorial)
    dictionary = corpora.Dictionary.load('/tmp/deerwester.dict')
    corpus = corpora.MmCorpus('/tmp/deerwester.mm')
    print(corpus)

    # Train a 2-topic LSI model on the corpus
    lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

    # Convert the query to bag-of-words, then project it into LSI space
    doc = "human computer interaction"
    vec_bow = dictionary.doc2bow(doc.lower().split())
    vec_lsi = lsi[vec_bow]
    print(vec_lsi)

    # Transform the whole corpus into LSI space and index it
    index = similarities.MatrixSimilarity(lsi[corpus])
    index.save('/tmp/deerwester.index')

    # Similarity of the query to every document, most similar first
    sims = index[vec_lsi]
    sims = sorted(enumerate(sims), key=lambda item: -item[1])
    pprint(sims)
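In the listing, `index[vec_lsi]` returns one score per document. `MatrixSimilarity` computes the cosine similarity between the query and each indexed document in LSI space, and the `sorted(...)` call orders those scores from highest to lowest. The sketch below reproduces only that scoring-and-sorting step in plain Python, without gensim or NumPy. It assumes vectors in gensim's sparse format (a list of `(id, weight)` pairs). `cosine` and `rank_by_similarity` are hypothetical helper names, not part of gensim's API, and the toy vectors are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A sparse vector in gensim's format: a list of (feature_id, weight) pairs.
SparseVec = Sequence[Tuple[int, float]]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either has zero norm)."""
    da: Dict[int, float] = dict(a)
    db: Dict[int, float] = dict(b)
    # Dot product over the ids present in `a`; missing ids in `b` count as 0
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: SparseVec, docs: List[SparseVec]) -> List[Tuple[int, float]]:
    """Return (doc_index, similarity) pairs, most similar first (hypothetical helper)."""
    sims = [cosine(query, d) for d in docs]
    # Same sort as the listing: descending by score
    return sorted(enumerate(sims), key=lambda item: -item[1])


if __name__ == "__main__":
    # Made-up 2-topic vectors, for illustration only (not real LSI output)
    docs = [[(0, 1.0), (1, 0.0)], [(0, 0.0), (1, 1.0)], [(0, 1.0), (1, 1.0)]]
    query = [(0, 2.0), (1, 0.0)]
    print(rank_by_similarity(query, docs))
```

Because cosine similarity ignores vector length, scaling the query (here by 2) does not change the scores. Only the direction in topic space matters, which is why `MatrixSimilarity` can normalize every vector once and then compare with plain dot products.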
