models – Package for transformation models

This package contains algorithms for extracting document representations from their raw bag-of-word counts.

class gensim.models.VocabTransform(old2new, id2token=None)

Remap feature ids to new values.

Given a mapping from old ids to new ids (old ids missing from the mapping mark features to be discarded), this wraps a corpus so that iterating over VocabTransform[corpus] yields the same vectors with the new ids.

Old features with no counterpart among the new ids are discarded. This can be used to filter the vocabulary of a corpus “online”:

>>> from gensim.models import VocabTransform
>>> old2new = {oldid: newid for newid, oldid in enumerate(ids_you_want_to_keep)}
>>> vt = VocabTransform(old2new)
>>> for vec_with_new_ids in vt[corpus_with_old_ids]:
...     pass  # process each remapped vector here
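
To make the remapping concrete, here is a minimal plain-Python sketch of the behaviour described above. It does not use gensim and is not its implementation. The helper `remap_ids` and the sample data are hypothetical, and the sketch assumes each document is a list of `(id, weight)` pairs:

```python
def remap_ids(bow, old2new):
    """Return a bag-of-words vector with old ids replaced by new ids.

    Pairs whose old id is absent from old2new are dropped.
    (Hypothetical helper for illustration; not part of gensim.)
    """
    return sorted(
        (old2new[oldid], weight) for oldid, weight in bow if oldid in old2new
    )


# Assumed sample data: keep only old ids 2 and 5.
ids_you_want_to_keep = [2, 5]
old2new = {oldid: newid for newid, oldid in enumerate(ids_you_want_to_keep)}

corpus_with_old_ids = [
    [(0, 1.0), (2, 3.0), (5, 1.0)],
    [(1, 2.0), (5, 4.0)],
]

corpus_with_new_ids = [remap_ids(bow, old2new) for bow in corpus_with_old_ids]
print(corpus_with_new_ids)  # [[(0, 3.0), (1, 1.0)], [(1, 4.0)]]
```

Old ids 0 and 1 have no entry in `old2new`, so they disappear from the output. Old ids 2 and 5 become 0 and 1.
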
classmethod load(fname)

Load a previously saved object from a file (see also save).

save(fname)

Save the object to a file via pickling (see also load).
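
The save/load pairing can be sketched with the standard-library pickle module. The class `PicklableMapping` below is a hypothetical stand-in written for illustration. It only assumes that saving means pickling the object's state to the named file and that loading reverses that; the library's actual implementation may differ.

```python
import os
import pickle
import tempfile


class PicklableMapping:
    """Hypothetical stand-in with a pickle-based save/load pair."""

    def __init__(self, old2new):
        self.old2new = old2new

    def save(self, fname):
        # Pickle the instance state (its attribute dictionary) to fname.
        with open(fname, "wb") as fout:
            pickle.dump(self.__dict__, fout, protocol=pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, fname):
        # Rebuild an instance without calling __init__, then restore its state.
        obj = cls.__new__(cls)
        with open(fname, "rb") as fin:
            obj.__dict__.update(pickle.load(fin))
        return obj


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "mapping.pkl")
    PicklableMapping({2: 0, 5: 1}).save(fname)
    restored = PicklableMapping.load(fname)

print(restored.old2new)  # {2: 0, 5: 1}
```

Because unpickling can execute arbitrary code, pickled files should only be loaded from trusted sources.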