models.logentropy_model – LogEntropy model

class gensim.models.logentropy_model.LogEntropyModel(corpus, id2word=None, normalize=True)

Objects of this class realize the transformation of a word-document co-occurrence matrix (integers) into a locally/globally weighted matrix (positive floats).

This is done by a log entropy normalization, optionally normalizing the resulting documents to unit length. The following formulas explain how to compute the log entropy weight for term i in document j:

local_weight_{i,j} = log(frequency_{i,j} + 1)

P_{i,j} = frequency_{i,j} / sum_j frequency_{i,j}

                      sum_j P_{i,j} * log(P_{i,j})
global_weight_i = 1 + ----------------------------
                      log(number_of_documents + 1)

final_weight_{i,j} = local_weight_{i,j} * global_weight_i
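The formulas above can be sketched in plain Python. This is an illustration on a hypothetical toy count matrix, not gensim's implementation; natural log is used here (the global weight is base-independent, since the two logs in its ratio cancel):

```python
import math

# Hypothetical toy corpus: one row per document, one column per term.
counts = [
    [2, 0, 1],
    [1, 3, 0],
    [0, 1, 1],
]
n_docs = len(counts)
n_terms = len(counts[0])

# global_weight_i = 1 + sum_j P_{i,j} * log(P_{i,j}) / log(n_docs + 1)
global_weight = []
for i in range(n_terms):
    total = sum(doc[i] for doc in counts)
    entropy = 0.0
    for doc in counts:
        if doc[i] > 0:
            p = doc[i] / total
            entropy += p * math.log(p)
    global_weight.append(1 + entropy / math.log(n_docs + 1))

# final_weight_{i,j} = log(frequency_{i,j} + 1) * global_weight_i
weighted = [
    [math.log(doc[i] + 1) * global_weight[i] for i in range(n_terms)]
    for doc in counts
]
```

A term that is spread evenly across documents (maximum entropy) gets a global weight near 0, while a term concentrated in few documents keeps a weight near 1.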

The main methods are:

  1. the constructor, which calculates the global weighting for all terms in
     a corpus,
  2. the [] method, which transforms a simple count representation into the
     log-entropy normalized space.

>>> log_ent = LogEntropyModel(corpus)
>>> print(log_ent[some_doc])
>>> log_ent.save('/tmp/foo.log_ent_model')

Model persistence is achieved via its load/save methods.

normalize dictates whether the resulting vectors will be set to unit length.
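As an illustration of that unit-length normalization, a minimal sketch on a hypothetical weighted document given as (term_id, weight) pairs (not gensim's internal code):

```python
import math

# Hypothetical weighted document: sparse (term_id, weight) pairs.
doc = [(0, 3.0), (2, 4.0)]

# With normalize=True, the vector is scaled to unit Euclidean length.
length = math.sqrt(sum(w * w for _, w in doc))
unit_doc = [(term_id, w / length) for term_id, w in doc]
# (3, 4) has length 5, so the normalized weights are (0.6, 0.8).
```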

initialize(corpus)

Initialize internal statistics based on a training corpus. Called automatically from the constructor.

classmethod load(fname)

Load a previously saved object from file (also see save).

save(fname)

Save the object to file via pickling (also see load).
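The save/load pair is pickle-based. A minimal sketch of the same round trip using the standard library directly, on a hypothetical stand-in for the model's state (illustration only, not gensim's SaveLoad code):

```python
import os
import pickle
import tempfile

# Hypothetical model state: per-term global weights and document count.
model_state = {"entr": {0: 0.54, 1: 0.7}, "n_docs": 3}

# save(fname): serialize the object to disk.
path = os.path.join(tempfile.mkdtemp(), "foo.log_ent_model")
with open(path, "wb") as f:
    pickle.dump(model_state, f)

# load(fname): restore an equivalent object from disk.
with open(path, "rb") as f:
    restored = pickle.load(f)
```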