Changes in BigARTM CLI
v0.9.0
- added option --cooc-window to set the width of the window in which tokens are considered to occur together
- added option --cooc-min-tf to set the minimal value of absolute co-occurrence of tokens
- added option --cooc-min-df to set the minimal value of document frequency of token co-occurrence
- added option --write-cooc-tf to set the path of the output file with absolute co-occurrences
- added option --write-cooc-df to set the path of the output file with document frequencies of token co-occurrences
- added option --write-ppmi-tf to set the path of the output file with PPMI values calculated from absolute co-occurrences
- added option --write-ppmi-df to set the path of the output file with PPMI values calculated from document frequencies of token co-occurrences
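
The difference between absolute co-occurrence (TF), document frequency (DF) and PPMI (positive pointwise mutual information) can be illustrated with a small Python sketch. This is not BigARTM's implementation: the function names `cooc_counts` and `ppmi`, the treatment of the window as "at most `window` positions apart", and the normalization used for the probabilities are all assumptions made for illustration, since this changelog does not specify them.

```python
import math
from collections import Counter


def cooc_counts(docs, window):
    """Count unordered token pairs that occur within `window` positions.

    Returns (tf, df):
      tf[pair] - total number of co-occurrences over the whole collection
      df[pair] - number of documents in which the pair co-occurs at least once
    Assumption: a pair co-occurs when the tokens are at most `window` apart.
    """
    tf, df = Counter(), Counter()
    for tokens in docs:
        seen_in_doc = set()
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u == v:
                    continue  # skip pairs of identical tokens
                pair = tuple(sorted((u, v)))
                tf[pair] += 1
                seen_in_doc.add(pair)
        df.update(seen_in_doc)  # each pair counted once per document
    return tf, df


def ppmi(pair_counts, min_count=1):
    """Standard PPMI = max(0, log(p(u, v) / (p(u) * p(v)))).

    Pairs with a count below `min_count` are dropped first, which mimics
    the role of a minimal-value threshold (an assumption, not CLI behaviour).
    """
    kept = {p: n for p, n in pair_counts.items() if n >= min_count}
    marginal = Counter()
    for (u, v), n in kept.items():
        marginal[u] += n
        marginal[v] += n
    # Total mass of the symmetric co-occurrence matrix.
    total = 2 * sum(kept.values())
    result = {}
    for (u, v), n in kept.items():
        pmi = math.log(n * total / (marginal[u] * marginal[v]))
        result[(u, v)] = max(0.0, pmi)
    return result


docs = [["a", "b", "a", "b"], ["a", "b", "c"], ["c", "d"]]
tf, df = cooc_counts(docs, window=1)
ppmi_tf = ppmi(tf)  # PPMI based on absolute co-occurrences
ppmi_df = ppmi(df)  # PPMI based on document frequencies
```

In this toy collection the pair `("a", "b")` co-occurs four times in total but only in two documents, so its TF and DF counts differ, and so do the PPMI values derived from each.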
v0.8.2
- added option --rand-seed to initialize the random number generator; without this option, the RNG is seeded from the system time
- added option --write-vw-corpus to convert batches into a plain text file in Vowpal Wabbit format
- changed the naming scheme of the batches saved with the --save-batches option. Previously file names were GUID-based, while the new format looks like this: aabcde.batch. The new format ensures that the ordering of the documents in the collection is preserved, provided that the user scans batches alphabetically.
- added switch --guid-batch-name to enable the old naming scheme of batches (GUID-based names). This option is useful if you launch multiple instances of BigARTM CLI to generate batches concurrently.
- sped up parsing of large files in Vowpal Wabbit format
- when --use-modality is specified, the batches saved with --save-batches will only include tokens from these modalities. Other tokens will be ignored during parsing. This option is implemented for both VW and UCI BOW formats.
- implemented TopicSelection, LabelRegularization, ImproveCoherence and Biterms regularizers in BigARTM CLI
- added option --dictionary-size to give users an easy way to limit the dictionary size
- added more diagnostic information about dictionary size (before and after filtering)
- added strict verification of scores and regularizers; for example, BigARTM CLI will raise an exception for this input: bigartm -t obj:10,back:5 --regularizer "0.5 SparsePhi #obj*". There shouldn’t be a star sign in #obj*.
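
The ordering property of the alphabetical batch names mentioned above can be illustrated with a short Python sketch. The encoding used here (a batch index written as a fixed-width string of lowercase letters) and the helper name `batch_name` are hypothetical; the changelog only states that names look like aabcde.batch, not how BigARTM CLI derives them.

```python
import string
import uuid

ALPHABET = string.ascii_lowercase


def batch_name(index, width=6):
    """Hypothetical scheme: encode a batch index as `width` lowercase letters."""
    if index < 0 or index >= len(ALPHABET) ** width:
        raise ValueError("index out of range for the given width")
    letters = []
    for _ in range(width):
        index, digit = divmod(index, len(ALPHABET))
        letters.append(ALPHABET[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(letters)) + ".batch"


# Fixed-width names sort alphabetically in the order the batches were created,
# so scanning the folder alphabetically preserves document order.
ordered_names = [batch_name(i) for i in range(1000)]
is_order_preserved = ordered_names == sorted(ordered_names)

# GUID-based names carry no ordering information, but independent processes
# can generate them concurrently without coordinating on a shared counter.
guid_names = [f"{uuid.uuid4()}.batch" for _ in range(3)]
```

The trade-off is the one the changelog describes: sequential names keep the collection order recoverable, while GUID-based names suit several CLI instances writing batches at the same time.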
v0.8.0
- renamed --passes to --num-collection-passes
- renamed --num-inner-iterations to --num-document-passes
- removed --model-v06 option
- removed --use-dense-bow option