I have the number of words per sentence; how do I get the mean word length?

md2614 :

I have a function that should return, as a list of tuples, (a) the number of words per sentence and (b) the mean length of the words in each sentence. I can get (a). For (b) I can get the total number of characters per sentence but not the mean.

I've reviewed a few posts (this, that and another) but can't wrap my head around this last piece.

I've included a couple of failed attempts, commented out.

import statistics

def sentence_num_and_mean(text):
    """ Output list of, per sentence, number of words and mean length of words """
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    # Number of words per sentence
    num_words_per_sent =  [len(element) for element in (element.split() for element in text.split("."))]

    # Mean length of words per sentence

    # This gets sum of characters per sentence, so on the right track
    mean_len_words_per_sent = [len(w) for w in text.split('.')]

    # This gives me "TypeError: unsupported operand type(s) for /: 'int' and 'list'" error
    # when trying to get the denominator for the mean
    # A couple efforts
    #mean_len_words_per_sent = sum(num_words_per_sent) / [len(w) for w in text.split('.')]
    #mean_len_words_per_sent = [(num_words_per_sent)/statistics.mean([len(w) for w in text.split()])]

    # Return list zipped together
    return list(zip(num_words_per_sent, mean_len_words_per_sent))

Driver program:

split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)

which prints

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 33), (7, 35), (2, 15), (2, 17), (3, 15)]

For one, I need to strip out spaces and periods, but ignoring that for now, if I did the simple math right it should be:

[(6, 5.5), (7, 5), (2, 7.5), (2, 8.5), (3, 5)]

Ed Ward :

If you only want letters, this should work:

def sentence_num_and_mean(text):
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    output = []
    sentences = text.split(".")
    for sentence in sentences:
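        words = [x for x in sentence.split(" ") if x]
        if not words:
            # Skip empty pieces (e.g. after a trailing period) to avoid dividing by zero
            continue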
        word_count = len(words)
        word_length = sum(map(len, words))
        word_mean = word_length / word_count
        output.append((word_count, word_mean))

    return output


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)

Output:

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]
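
Since the question already imports statistics, here is a rough sketch of the same idea using statistics.mean, with re.split doing the sentence splitting so the replace step is not needed. The function name sentence_stats is made up for this example, and it assumes that sentences end in ., ! or ? and that words are separated by whitespace. Pieces with no words (for instance, after a trailing period) are skipped so there is never a mean over an empty list.

```python
import re
import statistics


def sentence_stats(text):
    """Return (word count, mean word length) for each non-empty sentence."""
    results = []
    # Split on runs of ., ! or ? instead of replacing ! and ? first
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()  # split() with no argument drops extra whitespace
        if not words:
            # Nothing between two terminators, or after the last one
            continue
        word_lengths = [len(w) for w in words]
        results.append((len(words), statistics.mean(word_lengths)))
    return results


print(sentence_stats("Another period. Then exclamation! Blah blah blah."))
```

The means should match the ones in the output above for those three sentences, even though this input ends with a period.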
