C++ interface

This document describes the C++ interface of the BigARTM library.

In addition to this page, consider looking at the Plain C interface of BigARTM, the Python interface, or Messages. These documents are also relevant to the C++ interface to some degree, because the C++ interface closely resembles the Python interface and shares the same Protobuf messages.

MasterComponent

class MasterComponent
MasterComponent(const MasterComponentConfig &config)

Creates a master component with the configuration defined by the MasterComponentConfig message.

void Reconfigure(const MasterComponentConfig &config)

Updates the configuration of the master component.

const MasterComponentConfig &config() const

Returns the current configuration of the master component.

MasterComponentConfig *mutable_config()

Returns a mutable configuration of the master component. Remember to call Reconfigure() to propagate your changes to the master component.
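The reason Reconfigure() is still needed after editing the mutable configuration can be illustrated with a small stand-alone sketch. DemoConfig and DemoComponent below are hypothetical stand-ins written for this example only; they are not BigARTM types, and the sketch assumes that mutable_config() exposes a local copy that takes effect only once Reconfigure() is called.

```cpp
#include <cassert>

// Hypothetical stand-in for a protobuf configuration message (not a BigARTM type).
struct DemoConfig {
  int value = 1;  // hypothetical setting
};

// Hypothetical stand-in that mimics the config() / mutable_config() / Reconfigure() trio.
class DemoComponent {
 public:
  explicit DemoComponent(const DemoConfig &config)
      : config_(config), applied_(config) {}

  const DemoConfig &config() const { return config_; }
  DemoConfig *mutable_config() { return &config_; }

  // Stores the given configuration and makes it the live one.
  void Reconfigure(const DemoConfig &config) {
    config_ = config;
    applied_ = config;
  }

  // The configuration the component actually runs with (demonstration only).
  const DemoConfig &applied() const { return applied_; }

 private:
  DemoConfig config_;   // local copy exposed through mutable_config()
  DemoConfig applied_;  // live state, updated only by Reconfigure()
};
```

In this sketch, writing through mutable_config() changes only the local copy; calling Reconfigure(component.config()) afterwards is what makes the change live.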

void InvokeIteration(int iterations_count = 1)

Invokes the given number of iterations (iterations_count, one by default).

bool AddBatch(const Batch &batch, bool reset_scores)

Adds a batch to the processing queue.

bool WaitIdle(int timeout = -1)

Waits for the iterations to complete. Returns true if BigARTM finished processing before the specified timeout, and false otherwise.

std::shared_ptr<TopicModel> GetTopicModel(const std::string &model_name)

Retrieves the Phi matrix of a specific topic model. The resulting TopicModel message contains the distribution of token weights across topics.

std::shared_ptr<TopicModel> GetTopicModel(const GetTopicModelArgs &args)

Retrieves the Phi matrix based on extended parameters, specified in the GetTopicModelArgs message. The resulting TopicModel message contains the distribution of token weights across topics.

std::shared_ptr<ThetaMatrix> GetThetaMatrix(const std::string &model_name)

Retrieves the Theta matrix of a specific topic model. The resulting ThetaMatrix message contains the distribution of items across topics. Remember to set MasterComponentConfig.cache_theta before the last iteration so that the Theta matrix is gathered.

std::shared_ptr<ThetaMatrix> GetThetaMatrix(const GetThetaMatrixArgs &args)

Retrieves the Theta matrix based on extended parameters, specified in the GetThetaMatrixArgs message. The resulting ThetaMatrix message contains the distribution of items across topics.

std::shared_ptr<T> GetScoreAs<T>(const Model &model, const std::string &score_name)

Retrieves the given score for a specific model. The template argument must match the specific ScoreData type of the score (for example, PerplexityScore).

Model

class Model
Model(const MasterComponent &master_component, const ModelConfig &config)

Creates a topic model defined by ModelConfig inside the given MasterComponent.

void Reconfigure(const ModelConfig &config)

Updates the configuration of the model.

const std::string &name() const

Returns the name of the model.

const ModelConfig &config() const

Returns the current configuration of the model.

ModelConfig *mutable_config()

Returns a mutable configuration of the model. Remember to call Reconfigure() to propagate your changes to the model.

void Overwrite(const TopicModel &topic_model, bool commit = true)

Updates the model with a new Phi matrix, defined by topic_model. This operation can be used to provide an explicit initial approximation of the topic model, or to adjust the model between iterations.

Depending on the commit flag, the change is either applied immediately (commit = true) or queued (commit = false). The default is commit = true. You may want to use commit = false if your model is too big to be updated in a single protobuf message. In that case, split your model into parts, each containing a subset of all tokens, and submit each part in a separate Overwrite operation with commit = false. After that, remember to call MasterComponent::WaitIdle() and Synchronize() to propagate your change.
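The splitting step described above can be sketched independently of the library. SplitIntoParts is a hypothetical helper written for this example, not part of BigARTM; it only partitions a token list, and building a TopicModel message from each part is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits tokens into consecutive parts that hold
// at most part_size tokens each. The last part may be shorter.
std::vector<std::vector<std::string>> SplitIntoParts(
    const std::vector<std::string> &tokens, std::size_t part_size) {
  assert(part_size > 0);
  std::vector<std::vector<std::string>> parts;
  for (std::size_t begin = 0; begin < tokens.size(); begin += part_size) {
    const std::size_t end = std::min(begin + part_size, tokens.size());
    // Copy the tokens in the half-open range [begin, end) into a new part.
    parts.emplace_back(tokens.begin() + begin, tokens.begin() + end);
  }
  return parts;
}
```

Each resulting part would then be turned into its own TopicModel message and passed to Overwrite with commit = false, followed by a single MasterComponent::WaitIdle() and Synchronize() once all parts have been submitted.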

void Initialize(const Dictionary &dictionary)

Initializes the topic model based on the Dictionary. Each token from the dictionary is included in the model with a randomly generated weight.
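As a rough illustration of random initialization, the following stand-alone sketch assigns each token one random weight per topic. RandomInitialWeights is a hypothetical helper; the uniform distribution on [0, 1), the explicit seed, and the absence of normalization are assumptions made for this example, not statements about how BigARTM generates its weights.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: builds a token -> per-topic weights table
// filled with pseudo-random values drawn uniformly from [0, 1).
std::map<std::string, std::vector<double>> RandomInitialWeights(
    const std::vector<std::string> &dictionary_tokens,
    std::size_t topics_count, unsigned int seed) {
  std::mt19937 generator(seed);
  std::uniform_real_distribution<double> distribution(0.0, 1.0);
  std::map<std::string, std::vector<double>> weights;
  for (const std::string &token : dictionary_tokens) {
    std::vector<double> &row = weights[token];
    row.resize(topics_count);
    for (double &value : row) {
      value = distribution(generator);
    }
  }
  return weights;
}
```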

void Export(const string &file_name)

Exports the topic model to a file.

void Import(const string &file_name)

Imports the topic model from a file.

void Synchronize(double decay_weight, double apply_weight, bool invoke_regularizers)

Synchronizes the model.

This operation updates the Phi matrix of the topic model with all model increments collected since the last call to Synchronize(). The weights in the Phi matrix are set according to the decay_weight and apply_weight values (refer to SynchronizeModelArgs.decay_weight for more details). Depending on the invoke_regularizers parameter, this operation may also invoke all regularizers.

Remember to call Model::Synchronize() every time after calling MasterComponent::WaitIdle().
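The role of the two weights can be made concrete with a small stand-alone sketch. It assumes that the old counters and the accumulated increments are combined linearly, as new = decay_weight * old + apply_weight * increment; treat this formula as an assumption and check SynchronizeModelArgs.decay_weight for the authoritative definition. CombineCounters is a hypothetical helper, not a BigARTM function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: combines old counters with accumulated increments as
//   result[i] = decay_weight * old_counters[i] + apply_weight * increments[i]
// (assumed formula, see the lead-in).
std::vector<double> CombineCounters(const std::vector<double> &old_counters,
                                    const std::vector<double> &increments,
                                    double decay_weight, double apply_weight) {
  assert(old_counters.size() == increments.size());
  std::vector<double> result(old_counters.size());
  for (std::size_t i = 0; i < result.size(); ++i) {
    result[i] = decay_weight * old_counters[i] + apply_weight * increments[i];
  }
  return result;
}
```

Under this assumption, decay_weight = 0 with apply_weight = 1 replaces the old counters with the increments, while decay_weight = 1 with apply_weight = 1 accumulates the increments on top of the old counters.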

void Synchronize(const SynchronizeModelArgs &args)

Synchronizes the model based on the extended arguments in SynchronizeModelArgs.

Regularizer

class Regularizer
Regularizer(const MasterComponent &master_component, const RegularizerConfig &config)

Creates a regularizer defined by RegularizerConfig inside the given MasterComponent.

void Reconfigure(const RegularizerConfig &config)

Updates the configuration of the regularizer.

const RegularizerConfig &config() const

Returns the current configuration of the regularizer.

RegularizerConfig *mutable_config()

Returns a mutable configuration of the regularizer. Remember to call Reconfigure() to propagate your changes to the regularizer.

Dictionary

class Dictionary
Dictionary(const MasterComponent &master_component, const DictionaryConfig &config)

Creates a dictionary defined by DictionaryConfig inside the given MasterComponent.

void Reconfigure(const DictionaryConfig &config)

Updates the configuration of the dictionary.

const std::string name() const

Returns the name of the dictionary.

const DictionaryConfig &config() const

Returns the current configuration of the dictionary.

Utility methods

void SaveBatch(const Batch &batch, const std::string &disk_path)

Saves a Batch into the specified folder. The name of the resulting file is autogenerated, and its extension is set to .batch.

std::shared_ptr<DictionaryConfig> LoadDictionary(const std::string &filename)

Loads the DictionaryConfig message from a specific file on disk. filename must be the full disk path to the dictionary file.

std::shared_ptr<Batch> LoadBatch(const std::string &filename)

Loads the Batch message from a specific file on disk. filename must be the full disk path to the batch file, including the .batch extension.

std::shared_ptr<DictionaryConfig> ParseCollection(const CollectionParserConfig &config)

Parses a text collection as defined by the CollectionParserConfig message. Returns an instance of DictionaryConfig that carries all unique words in the collection and their frequencies.
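To make "unique words and their frequencies" concrete, here is a stand-alone sketch that counts whitespace-separated words over a few in-memory documents. CountWords is a hypothetical helper; it does not read the formats that CollectionParserConfig describes and does not produce a DictionaryConfig message, and whitespace tokenization is an assumption made for this example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every distinct whitespace-separated word
// found in the documents together with its total number of occurrences.
std::map<std::string, int> CountWords(const std::vector<std::string> &documents) {
  std::map<std::string, int> frequencies;
  for (const std::string &document : documents) {
    std::istringstream stream(document);
    std::string word;
    while (stream >> word) {
      ++frequencies[word];  // operator[] starts each new word at zero
    }
  }
  return frequencies;
}
```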