Spam Classifier - Create Transformer to Convert Word Counts to Vectors

Now that we have the word counts, we need to convert them to vectors. For this, we will build another transformer whose fit() method builds the vocabulary (an ordered list of the most common words) and whose transform() method uses that vocabulary to convert word counts to vectors. Words that are not in the vocabulary are all counted together in column 0. The output is a sparse matrix with one row per input item.
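To make the two steps concrete, here is a minimal sketch of the same idea in plain Python, using dense lists instead of a sparse matrix. The toy word counts and the vocabulary size of 3 are made up for illustration and are not part of the exercise data.

```python
from collections import Counter

# Hypothetical toy input: one Counter of word counts per email.
word_counts = [
    Counter({"free": 3, "money": 2, "now": 1}),
    Counter({"meeting": 1, "now": 2}),
    Counter({"free": 12, "offer": 1}),
]
vocabulary_size = 3  # assumed value, chosen small for readability

# "fit" step: sum the counts over all emails, capping each email's
# contribution to a word at 10 so one repetitive email cannot dominate.
total_count = Counter()
for word_count in word_counts:
    for word, count in word_count.items():
        total_count[word] += min(count, 10)

# Index 0 is reserved for out-of-vocabulary words, so real words start at 1.
vocabulary = {
    word: index + 1
    for index, (word, _) in enumerate(total_count.most_common(vocabulary_size))
}

# "transform" step: one vector per email, with vocabulary_size + 1 columns.
vectors = []
for word_count in word_counts:
    vector = [0] * (vocabulary_size + 1)
    for word, count in word_count.items():
        # Unknown words fall back to column 0 and are accumulated there.
        vector[vocabulary.get(word, 0)] += count
    vectors.append(vector)

print(vocabulary)
print(vectors)
```

Note that the cap of 10 only applies while building the vocabulary. The vectors themselves keep the raw counts, which matches the transformer in the instructions below.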

INSTRUCTIONS
  • Create a transformer WordCounterToVectorTransformer to convert word counts to vectors.

    from collections import Counter
    
    from scipy.sparse import csr_matrix
    from sklearn.base import BaseEstimator, TransformerMixin
    
    class << your code goes here >>(BaseEstimator, TransformerMixin):
        def __init__(self, vocabulary_size=1000):
            self.vocabulary_size = vocabulary_size
        def fit(self, X, y=None):
            # Sum word counts over all samples, capping each sample's
            # contribution to a word at 10.
            total_count = Counter()
            for word_count in X:
                for word, count in word_count.items():
                    total_count[word] += min(count, 10)
            most_common = total_count.most_common()[:self.vocabulary_size]
            self.most_common_ = most_common
            # Column 0 is reserved for out-of-vocabulary words, so indices start at 1.
            self.vocabulary_ = {word: index + 1 for index, (word, count) in enumerate(most_common)}
            return self
        def transform(self, X, y=None):
            rows = []
            cols = []
            data = []
            for row, word_count in enumerate(X):
                for word, count in word_count.items():
                    rows.append(row)
                    # Words missing from the vocabulary map to column 0.
                    cols.append(self.vocabulary_.get(word, 0))
                    data.append(count)
            # Entries that share the same (row, col) position are summed.
            return csr_matrix((data, (rows, cols)), shape=(len(X), self.vocabulary_size + 1))
    
  • Now try the transformer we just created:

    vocab_transformer = WordCounterToVectorTransformer(vocabulary_size=10)
    X_few_vectors = vocab_transformer.fit_transform(X_few_wordcounts)
    X_few_vectors
    
  • Finally, convert the sparse output matrix to a dense array:

    << your code goes here >>.toarray()
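
When reading the dense array, it can help to map column indices back to words. The sketch below assumes a small fitted vocabulary and one output row. Both are hypothetical values for illustration, not the actual result of the exercise.

```python
# Hypothetical fitted vocabulary (word -> column) and one dense output row.
vocabulary = {"free": 1, "now": 2, "money": 3}
row = [1, 12, 0, 0]

# Invert the mapping; column 0 has no word of its own because it
# collects every word that is missing from the vocabulary.
column_to_word = {index: word for word, index in vocabulary.items()}
column_to_word[0] = "<out-of-vocabulary>"

# Keep only the non-zero entries, labelled by word.
labelled = {column_to_word[col]: count for col, count in enumerate(row) if count}
print(labelled)
```

With the transformer from the instructions, the same inversion could be applied to its vocabulary_ attribute after fitting.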
    
