BigARTM command line utility
This document provides an overview of the bigartm
command-line utility shipped with BigARTM.
For a detailed description of the bigartm
command-line interface, refer to the
bigartm.exe notebook (in Russian).
In brief, you need to download some input data (a textual collection represented in bag-of-words format).
We recommend downloading the vocab and docword files via the links provided in the Downloads section of the tutorial.
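If you want to experiment before downloading a real collection, the expected UCI bag-of-words layout can be sketched with a toy pair of files (file names here are illustrative, not part of BigARTM): the docword file starts with three header lines — D (number of documents), W (vocabulary size), and NNZ (number of non-zero counts) — followed by "docID wordID count" triples, while the vocab file lists one token per line, with line i defining wordID i (1-based).

```shell
# Toy two-document collection in UCI bag-of-words format.
# Header: D=2 documents, W=2 tokens, NNZ=3 non-zero counts.
cat > docword.toy.txt <<'EOF'
2
2
3
1 1 2
1 2 1
2 2 3
EOF

# Vocabulary: line 1 is wordID 1 ("cat"), line 2 is wordID 2 ("dog").
cat > vocab.toy.txt <<'EOF'
cat
dog
EOF
```

Such a pair can then be passed to bigartm via the -d and -v options described below.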
Then you can use bigartm
as described by bigartm --help:
>bigartm --help
BigARTM - library for advanced topic modeling (http://bigartm.org):
Input data:
-c [ --read-vw-corpus ] arg Raw corpus in Vowpal Wabbit format
-d [ --read-uci-docword ] arg docword file in UCI format
-v [ --read-uci-vocab ] arg vocab file in UCI format
--batch-size arg (=500) number of items per batch
--use-batches arg folder with batches to use
Dictionary:
--dictionary-min-df arg filter out tokens present in less than N
documents / less than P% of documents
--dictionary-max-df arg filter out tokens present in more than N
documents / more than P% of documents
--use-dictionary arg filename of binary dictionary file to use
Model:
--load-model arg load model from file before processing
-t [ --topics ] arg (=16) number of topics
--use-modality arg modalities (class_ids) and their weights
Learning:
-p [ --passes ] arg (=10) number of outer iterations
--inner-iterations-count arg (=10) number of inner iterations
--update-every arg (=0) [online algorithm] requests an update of
the model every update_every documents
--tau0 arg (=1024) [online algorithm] weight option from
online update formula
--kappa arg (=0.699999988) [online algorithm] exponent option from
online update formula
--reuse-theta reuse theta between iterations
--regularizer arg regularizers (SmoothPhi,SparsePhi,SmoothT
heta,SparseTheta,Decorrelation)
--threads arg (=0) number of concurrent processors (default:
auto-detect)
Output:
--save-model arg save the model to binary file after
processing
--save-batches arg batch folder
--save-dictionary arg filename of dictionary file
--write-model-readable arg output the model in a human-readable
format
--write-predictions arg write prediction in a human-readable
format
--score-level arg (=2) score level (0, 1, 2, or 3)
--score arg scores (Perplexity, SparsityTheta,
SparsityPhi, TopTokens, ThetaSnippet, or
TopicKernel)
--final-score arg final scores (same as scores)
Other options:
-h [ --help ] display this help message
--response-file arg response file
--paused start paused and wait for a keystroke
(allows attaching a debugger)
--disk-cache-folder arg disk cache folder
--disable-avx-opt disable AVX optimization (gives similar
behavior of the Processor component to
BigARTM v0.5.4)
--use-dense-bow use dense representation of bag-of-words
data in processors
Examples:
bigartm -d docword.kos.txt -v vocab.kos.txt
set GLOG_logtostderr=1 & bigartm -d docword.kos.txt -v vocab.kos.txt
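As a fuller sketch, the flags above can be combined into a single run: the options and score names are taken from the help text, while the topic count, pass count, and output file names are illustrative choices, not defaults.

```shell
# Train a 32-topic model with 20 outer passes on the kos collection,
# tracking perplexity and saving the result in binary and readable form.
# All flags appear in `bigartm --help`; file names are illustrative.
bigartm -d docword.kos.txt -v vocab.kos.txt \
        -t 32 -p 20 \
        --score Perplexity \
        --save-model kos.model \
        --write-model-readable kos_model.txt
```

The saved binary model can later be picked up with --load-model for further passes, and --write-model-readable produces a plain-text dump suitable for inspection.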