C++ interface
This document describes the C++ interface of the BigARTM library.
In addition to this page, consider looking at the Plain C interface of BigARTM, the Python interface, and the Messages reference. These pages are also relevant to the C++ interface to some degree, because the C++ interface closely mirrors the Python interface and shares the same Protobuf messages.
MasterComponent
class MasterComponent
MasterComponent(const MasterComponentConfig &config)
Creates a master component with the configuration defined by the MasterComponentConfig message.
void Reconfigure(const MasterComponentConfig &config)
Updates the configuration of the master component.
const MasterComponentConfig &config() const
Returns the current configuration of the master component.
MasterComponentConfig *mutable_config()
Returns a mutable pointer to the configuration of the master component. Remember to call Reconfigure() to propagate your changes to the master component.
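The edit-then-Reconfigure pattern can be illustrated with a small self-contained sketch that does not link against BigARTM. MockConfig and MockMasterComponent are hypothetical stand-ins for MasterComponentConfig and MasterComponent, and the field names are illustrative only. The mock assumes that changes made through mutable_config() are invisible to the running component until Reconfigure() is called, which is the reason the text asks you to call it.

```cpp
#include <string>

// Hypothetical stand-in for MasterComponentConfig (NOT the real protobuf
// message); field names are illustrative only.
struct MockConfig {
  int processors_count = 1;
  std::string disk_path;
};

// Hypothetical stand-in for MasterComponent. It keeps two copies of the
// configuration: the one the caller edits, and the one the component runs with.
class MockMasterComponent {
 public:
  explicit MockMasterComponent(const MockConfig& config)
      : config_(config), active_(config) {}

  // Mirrors config(): the caller-visible configuration.
  const MockConfig& config() const { return config_; }

  // Mirrors mutable_config(): edits are only staged here.
  MockConfig* mutable_config() { return &config_; }

  // Mirrors Reconfigure(): propagates the staged configuration.
  void Reconfigure(const MockConfig& config) {
    config_ = config;
    active_ = config;
  }

  // Configuration the (mock) engine is currently using.
  const MockConfig& active_config() const { return active_; }

 private:
  MockConfig config_;
  MockConfig active_;
};
```

Typical usage under these assumptions: `master.mutable_config()->processors_count = 4;` followed by `master.Reconfigure(master.config());`. The text documents the same mutable_config()/Reconfigure() pairing for Model and Regularizer.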
void InvokeIteration(int iterations_count = 1)
Invokes the given number of iterations (one by default).
bool AddBatch(const Batch &batch, bool reset_scores)
Adds a batch to the processing queue.
bool WaitIdle(int timeout = -1)
Waits for the iterations to complete. Returns true if BigARTM finished before the specified timeout, otherwise false.
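The return-value contract of WaitIdle (true if work finished in time, false on timeout) maps naturally onto a condition variable. The sketch below is not BigARTM code: IdleWaiter, Begin() and End() are hypothetical, and it assumes the timeout is in milliseconds with a negative value (the default -1) meaning "wait indefinitely"; check the library for the actual unit.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper illustrating WaitIdle-style semantics; not BigARTM code.
// Assumptions: timeout is in milliseconds, and a negative timeout means
// "wait until idle, however long it takes".
class IdleWaiter {
 public:
  // Called when a unit of work (for example, an iteration) is scheduled.
  void Begin() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++pending_;
  }

  // Called when a unit of work finishes; wakes up waiters once idle.
  void End() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (pending_ > 0 && --pending_ == 0) idle_.notify_all();
  }

  // Returns true if all work completed before the timeout, otherwise false.
  bool WaitIdle(int timeout = -1) {
    std::unique_lock<std::mutex> lock(mutex_);
    auto is_idle = [this] { return pending_ == 0; };
    if (timeout < 0) {
      idle_.wait(lock, is_idle);
      return true;
    }
    return idle_.wait_for(lock, std::chrono::milliseconds(timeout), is_idle);
  }

 private:
  std::mutex mutex_;
  std::condition_variable idle_;
  int pending_ = 0;
};
```

The predicate overload of wait_for guards against spurious wake-ups and reports whether the idle condition actually held when the wait ended.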
std::shared_ptr<TopicModel> GetTopicModel(const std::string &model_name)
Retrieves the Phi matrix of a specific topic model. The resulting TopicModel message contains the distribution of token weights across topics.
std::shared_ptr<TopicModel> GetTopicModel(const GetTopicModelArgs &args)
Retrieves the Phi matrix based on the extended parameters specified in the GetTopicModelArgs message. The resulting TopicModel message contains the distribution of token weights across topics.
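A common use of a retrieved Phi matrix is listing the heaviest tokens of each topic. The sketch below does not read the TopicModel message; it assumes a hypothetical, simplified layout (PhiMatrix: one weight per topic for every token) that you would first fill from the message yourself.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of a Phi matrix: for every token, one weight
// per topic. This is NOT the TopicModel protobuf message, only an illustration.
using PhiMatrix = std::map<std::string, std::vector<double>>;

// Returns up to n tokens with the largest weight in the given topic,
// ordered from heaviest to lightest (ties broken alphabetically).
std::vector<std::string> TopTokens(const PhiMatrix& phi, std::size_t topic,
                                   std::size_t n) {
  std::vector<std::pair<double, std::string>> scored;
  for (const auto& entry : phi) {
    if (topic < entry.second.size())
      scored.emplace_back(entry.second[topic], entry.first);
  }
  // Sort by descending weight; equal weights keep alphabetical order.
  std::sort(scored.begin(), scored.end(),
            [](const std::pair<double, std::string>& a,
               const std::pair<double, std::string>& b) {
              if (a.first != b.first) return a.first > b.first;
              return a.second < b.second;
            });
  std::vector<std::string> result;
  for (std::size_t i = 0; i < scored.size() && i < n; ++i)
    result.push_back(scored[i].second);
  return result;
}
```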
std::shared_ptr<ThetaMatrix> GetThetaMatrix(const std::string &model_name)
Retrieves the Theta matrix of a specific topic model. The resulting ThetaMatrix message contains the distribution of items across topics. Remember to set MasterComponentConfig.cache_theta prior to the last iteration in order to gather the Theta matrix.
std::shared_ptr<ThetaMatrix> GetThetaMatrix(const GetThetaMatrixArgs &args)
Retrieves the Theta matrix based on the extended parameters specified in the GetThetaMatrixArgs message. The resulting ThetaMatrix message contains the distribution of items across topics.
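Since Theta describes how each item is distributed across topics, a natural post-processing step is picking the dominant topic of every item. This sketch assumes a hypothetical, simplified layout (Theta: item id mapped to one weight per topic) rather than reading the ThetaMatrix message directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical, simplified view of a Theta matrix: for every item (document)
// id, one weight per topic. Not the ThetaMatrix protobuf message itself.
using Theta = std::map<int, std::vector<float>>;

// For each item, returns the index of the topic with the largest weight.
// Items with an empty weight vector are skipped; on ties the lowest
// topic index wins (std::max_element returns the first maximum).
std::map<int, std::size_t> DominantTopics(const Theta& theta) {
  std::map<int, std::size_t> result;
  for (const auto& item : theta) {
    const std::vector<float>& weights = item.second;
    if (weights.empty()) continue;
    auto best = std::max_element(weights.begin(), weights.end());
    result[item.first] =
        static_cast<std::size_t>(std::distance(weights.begin(), best));
  }
  return result;
}
```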
std::shared_ptr<T> GetScoreAs<T>(const Model &model, const std::string &score_name)
Retrieves the given score for a specific model. The template argument must match the ScoreData type of the score (for example, PerplexityScore).
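Why the template argument must match the stored score type can be shown with a small typed registry. This is only an illustration and not how BigARTM implements GetScoreAs: ScoreBase, ScoreRegistry and SparsityScore are hypothetical, and PerplexityScore here merely borrows the name from the text. In this sketch a mismatched type yields a null pointer.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical score types; PerplexityScore only borrows the name from the
// text above and is not the BigARTM protobuf message.
struct ScoreBase {
  virtual ~ScoreBase() = default;
};
struct PerplexityScore : ScoreBase {
  double value = 0.0;
};
struct SparsityScore : ScoreBase {
  double value = 0.0;
};

// Hypothetical registry of computed scores, keyed by score name.
class ScoreRegistry {
 public:
  void Put(const std::string& name, std::shared_ptr<ScoreBase> score) {
    scores_[name] = std::move(score);
  }

  // Returns the score as T, or nullptr if the name is unknown or the stored
  // score has a different type than the requested template argument.
  template <typename T>
  std::shared_ptr<T> GetScoreAs(const std::string& name) const {
    auto it = scores_.find(name);
    if (it == scores_.end()) return nullptr;
    return std::dynamic_pointer_cast<T>(it->second);
  }

 private:
  std::map<std::string, std::shared_ptr<ScoreBase>> scores_;
};
```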
Model
class Model
Model(const MasterComponent &master_component, const ModelConfig &config)
Creates a topic model defined by ModelConfig inside the given MasterComponent.
void Reconfigure(const ModelConfig &config)
Updates the configuration of the model.
const std::string &name() const
Returns the name of the model.
const ModelConfig &config() const
Returns the current configuration of the model.
ModelConfig *mutable_config()
Returns a mutable pointer to the configuration of the model. Remember to call Reconfigure() to propagate your changes to the model.
void Overwrite(const TopicModel &topic_model, bool commit = true)
Updates the model with a new Phi matrix, defined by topic_model. This operation can be used to provide an explicit initial approximation of the topic model, or to adjust the model between iterations.
Depending on the commit flag, the change is either applied immediately (commit = true, the default) or queued (commit = false). Use commit = false if your model is too big to be updated in a single protobuf message: split the model into parts, each containing a subset of all tokens, and submit each part in a separate Overwrite operation with commit = false. After that, remember to call MasterComponent::WaitIdle() and Synchronize() to propagate your change.
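The first step of the commit = false workflow, splitting the token set of a large model into parts, can be sketched as follows. SplitTokens and max_tokens_per_part are hypothetical names; packing each part into a TopicModel message is deliberately left out because it depends on the message layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits the token list of a large model into parts of at most
// max_tokens_per_part tokens each. In the workflow described above, each part
// would be packed into its own TopicModel message and passed to
// Model::Overwrite(part, /*commit=*/false); the packing step is not shown.
std::vector<std::vector<std::string>> SplitTokens(
    const std::vector<std::string>& tokens, std::size_t max_tokens_per_part) {
  std::vector<std::vector<std::string>> parts;
  if (max_tokens_per_part == 0) return parts;  // avoid an infinite loop
  for (std::size_t begin = 0; begin < tokens.size();
       begin += max_tokens_per_part) {
    std::size_t end = std::min(tokens.size(), begin + max_tokens_per_part);
    parts.emplace_back(tokens.begin() + static_cast<std::ptrdiff_t>(begin),
                       tokens.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return parts;
}
```

Once every part has been submitted, the text's instructions still apply: call MasterComponent::WaitIdle() and then Synchronize().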
void Initialize(const Dictionary &dictionary)
Initializes the topic model based on the Dictionary. Each token from the dictionary is included in the model with a randomly generated weight.
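Random initialization can be pictured with the sketch below. It is not BigARTM's algorithm: the uniform distribution, the per-topic normalization and the RandomPhi function are all assumptions chosen for illustration. The point is only that every dictionary token receives a random weight in every topic.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical Phi layout: for every token, one weight per topic.
using PhiMatrix = std::map<std::string, std::vector<double>>;

// Builds a randomly initialized Phi matrix for the given dictionary tokens.
// Assumptions for illustration: weights are drawn uniformly from [0, 1) and
// each topic column is normalized to sum to 1. Duplicate tokens are ignored.
PhiMatrix RandomPhi(const std::vector<std::string>& tokens,
                    std::size_t topics_count, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  PhiMatrix phi;
  std::vector<double> column_sums(topics_count, 0.0);
  for (const std::string& token : tokens) {
    if (phi.count(token) != 0) continue;  // keep the first occurrence only
    std::vector<double>& weights = phi[token];
    weights.resize(topics_count);
    for (std::size_t t = 0; t < topics_count; ++t) {
      weights[t] = dist(gen);
      column_sums[t] += weights[t];
    }
  }
  // Normalize each topic so that its weights over all tokens sum to 1.
  for (auto& entry : phi)
    for (std::size_t t = 0; t < topics_count; ++t)
      if (column_sums[t] > 0.0) entry.second[t] /= column_sums[t];
  return phi;
}
```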
void Export(const string &file_name)
Exports the topic model into a file.
void Import(const string &file_name)
Imports the topic model from a file.
void Synchronize(double decay_weight, double apply_weight, bool invoke_regularizers)
Synchronizes the model.
This operation updates the Phi matrix of the topic model with all model increments collected since the last call to Synchronize(). The weights in the Phi matrix are set according to the decay_weight and apply_weight values (refer to SynchronizeModelArgs.decay_weight for more details). Depending on the invoke_regularizers parameter, this operation may also invoke all regularizers.
Remember to call Model::Synchronize() every time after calling MasterComponent::WaitIdle().
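The role of decay_weight and apply_weight can be sketched as an element-wise blend of the current counters with the collected increments. The rule used here, new = decay_weight * old + apply_weight * increment, is an assumption for illustration; SynchronizeModelArgs.decay_weight is the authoritative definition. SynchronizeCounters is a hypothetical function operating on a flattened matrix.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Applies one synchronization step to a flattened matrix of counters.
// Assumed update rule (see SynchronizeModelArgs.decay_weight for the exact
// definition): new = decay_weight * old + apply_weight * increment.
std::vector<double> SynchronizeCounters(const std::vector<double>& current,
                                        const std::vector<double>& increment,
                                        double decay_weight,
                                        double apply_weight) {
  if (current.size() != increment.size())
    throw std::invalid_argument("matrix shapes differ");
  std::vector<double> result(current.size());
  for (std::size_t i = 0; i < current.size(); ++i)
    result[i] = decay_weight * current[i] + apply_weight * increment[i];
  return result;
}
```

Under this assumed rule, decay_weight = 1 and apply_weight = 1 simply accumulate increments, while a decay_weight below 1 gradually down-weights older counters.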
void Synchronize(const SynchronizeModelArgs &args)
Synchronizes the model based on the extended arguments in SynchronizeModelArgs.
Regularizer
class Regularizer
Regularizer(const MasterComponent &master_component, const RegularizerConfig &config)
Creates a regularizer defined by RegularizerConfig inside the given MasterComponent.
void Reconfigure(const RegularizerConfig &config)
Updates the configuration of the regularizer.
const RegularizerConfig &config() const
Returns the current configuration of the regularizer.
RegularizerConfig *mutable_config()
Returns a mutable pointer to the configuration of the regularizer. Remember to call Reconfigure() to propagate your changes to the regularizer.
Dictionary
class Dictionary
Dictionary(const MasterComponent &master_component, const DictionaryConfig &config)
Creates a dictionary defined by DictionaryConfig inside the given MasterComponent.
void Reconfigure(const DictionaryConfig &config)
Updates the configuration of the dictionary.
const std::string name() const
Returns the name of the dictionary.
const DictionaryConfig &config() const
Returns the current configuration of the dictionary.
Utility methods
void SaveBatch(const Batch &batch, const std::string &disk_path)
Saves a Batch into a specific folder. The name of the resulting file is autogenerated, and its extension is set to .batch.
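The shape of the path SaveBatch produces (a folder, an autogenerated name, the .batch extension) can be imitated as below. The random 32-character hexadecimal name is an assumption, since the text does not specify the naming scheme, and MakeBatchPath is a hypothetical helper, not part of BigARTM.

```cpp
#include <random>
#include <string>

// Builds a path of the form "<disk_path>/<random name>.batch".
// Assumption: the autogenerated name is a random hexadecimal string; the
// actual naming scheme used by SaveBatch is not specified in this document.
std::string MakeBatchPath(const std::string& disk_path, std::mt19937& gen) {
  static const char kHex[] = "0123456789abcdef";
  std::uniform_int_distribution<int> digit(0, 15);
  std::string name;
  for (int i = 0; i < 32; ++i) name += kHex[digit(gen)];
  std::string path = disk_path;
  if (!path.empty() && path.back() != '/') path += '/';
  return path + name + ".batch";
}
```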
std::shared_ptr<DictionaryConfig> LoadDictionary(const std::string &filename)
Loads the DictionaryConfig message from a specific file on disk. filename must be the full disk path to the dictionary file.
std::shared_ptr<Batch> LoadBatch(const std::string &filename)
Loads the Batch message from a specific file on disk. filename must be the full disk path to the batch file, including the .batch extension.
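Because LoadBatch expects the .batch extension to be part of filename, a small guard before the call can catch a common mistake. HasBatchExtension and WithBatchExtension are hypothetical helpers written for this sketch; they only manipulate strings and do not touch the file system.

```cpp
#include <string>

// Returns true if the file name ends with the ".batch" extension.
bool HasBatchExtension(const std::string& filename) {
  const std::string ext = ".batch";
  return filename.size() >= ext.size() &&
         filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: appends ".batch" when it is missing, so the result
// matches what LoadBatch() expects (full path including the extension).
std::string WithBatchExtension(const std::string& filename) {
  return HasBatchExtension(filename) ? filename : filename + ".batch";
}
```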
std::shared_ptr<DictionaryConfig> ParseCollection(const CollectionParserConfig &config)
Parses a text collection as defined by the CollectionParserConfig message. Returns an instance of DictionaryConfig that carries all unique words in the collection and their frequencies.
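The information ParseCollection returns, unique words and their frequencies, amounts to a word count over the collection. This sketch only illustrates that idea on a plain string: CountWords is hypothetical, it splits on whitespace, and it ignores the collection formats and options that CollectionParserConfig controls.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Counts how often each whitespace-separated word occurs in the text.
// Illustrates the kind of data ParseCollection returns (unique words and
// their frequencies); real parsing is driven by CollectionParserConfig.
std::map<std::string, std::size_t> CountWords(const std::string& text) {
  std::map<std::string, std::size_t> frequencies;
  std::istringstream stream(text);
  std::string word;
  while (stream >> word) ++frequencies[word];
  return frequencies;
}
```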