germanjke :
I have a string of text, corpus_test, which I convert to a list by splitting it into words.
I need to count all of the words, but my algorithm only counts the unique ones:
from collections import defaultdict

corpus_test = 'cat dog tiger tiger tiger cat dog lion'
corpus_test = [[word.lower() for word in corpus_test.split()]]
word_counts = defaultdict(int)
for rowt in corpus_test:
    for wordt in rowt:
        word_counts[wordt] += 1
v_count = len(word_counts.keys())
words_list = list(word_counts.keys())
word_index = dict((word, i) for i, word in enumerate(words_list))
index_word = dict((i, word) for i, word in enumerate(words_list))
Here are the outputs from this algorithm:
v_count
#4
words_list
#['cat', 'dog', 'tiger', 'lion']
word_counts
#defaultdict(int, {'cat': 2, 'dog': 2, 'tiger': 3, 'lion': 1})
word_index
#{'cat': 0, 'dog': 1, 'tiger': 2, 'lion': 3}
index_word
#{0: 'cat', 1: 'dog', 2: 'tiger', 3: 'lion'}
I need to have:
index_word
#{0: 'cat', 1: 'dog', 2: 'tiger', 3: 'tiger', 4: 'tiger', 5: 'cat', 6: 'dog', 7: 'lion'}
and
v_count
#8
Serkan Arslan :
With the existing algorithm, you can try this (rowt still holds the row from the loop above, so every word is enumerated, duplicates included):
index_word = dict((i, word) for i, word in enumerate(rowt))
v_count = len(index_word)
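If the corpus could contain more than one row, rowt would only hold the last one. A minimal sketch that flattens all rows first may be safer; the names rows and tokens are my own for illustration and are not from the question:

```python
from itertools import chain

corpus_test = 'cat dog tiger tiger tiger cat dog lion'
# Same nested shape as in the question: a list of rows, each a list of words.
rows = [[word.lower() for word in corpus_test.split()]]

# Flatten every row into one list, keeping duplicates and original order.
tokens = list(chain.from_iterable(rows))

# Position -> word for every token, not just the unique ones.
index_word = dict(enumerate(tokens))
v_count = len(tokens)

print(index_word)
print(v_count)
```

Note that a word -> position mapping cannot be built the same way, because a dict keeps only one entry per repeated word.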
Origin http://43.154.161.224:23101/article/api/json?id=360690&siteId=1