Plain C interface of BigARTM
This document explains all public methods of the low level BigARTM interface.
Introduction
The low-level BigARTM interface exposes all functionality of the library as a set of simple functions written in good old plain C. This makes BigARTM easy to consume from various programming environments. For example, the Python Interface of BigARTM uses the ctypes module to call the low-level BigARTM interface. Most programming environments offer similar functionality: PInvoke in C#, loadlibrary in MATLAB, etc.
Note that most methods in this API accept a serialized binary representation of a Google Protocol Buffer message. Please refer to Messages for more details about each particular message.
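A serialized message is binary data and can legitimately contain zero bytes, so the `length` argument must be the real byte count of the buffer, not the result of `strlen()`. The minimal illustration below uses a hand-encoded message with field 1 set to 0 and field 2 set to "abc". This message is hypothetical and is not any particular BigARTM message.

```c
#include <string.h>

/* Hypothetical serialized protobuf message: field 1 (varint) = 0,
   field 2 (length-delimited) = "abc". Note the 0x00 byte at index 1. */
static const char example_message[] = {0x08, 0x00, 0x12, 0x03, 'a', 'b', 'c'};

/* Wrong: strlen() stops at the first zero byte and truncates the message. */
static int length_via_strlen(void) { return (int)strlen(example_message); }

/* Right: use the actual number of serialized bytes held in the buffer. */
static int length_via_sizeof(void) { return (int)sizeof example_message; }
```

In practice, pass the size reported by your protobuf serializer as the `length` argument, together with the buffer pointer.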
All methods in this API return an integer value. Negative return values represent an error code; see error codes for the list of all error codes. To get the corresponding error message as a string, use ArtmGetLastErrorMessage().
Non-negative return values indicate success, and for some API methods they also carry a result. For example, ArtmCreateMasterComponent() returns the ID of the newly created master component, and ArtmRequestTopicModel() returns the length of the buffer that should be allocated before calling ArtmCopyRequestResult().
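Because some calls return a positive ID or length on success, the portable success test is `rc >= 0`. Comparing with `ARTM_SUCCESS` is correct only for calls documented to return exactly that value. The helper name `artm_ok` below is hypothetical and not part of the API.

```c
#define ARTM_SUCCESS 0

/* Negative: error code. Non-negative: success, possibly carrying a result
   (a master component ID, a buffer length, ...). */
static int artm_ok(int rc) { return rc >= 0; }

/* Pitfall: `rc == ARTM_SUCCESS` is only a valid success test for calls that
   are documented to return ARTM_SUCCESS. For ArtmCreateMasterComponent() or
   the ArtmRequest*() calls, a positive rc is a successful result. */
```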
ArtmCreateMasterComponent
int ArtmCreateMasterComponent(int length, const char* master_component_config)
Creates a master component.
Parameters:
- master_component_config (const char*) – Serialized MasterComponentConfig message, describing the configuration of the master component.
- length (int) – The length in bytes of the master_component_config message.
Returns: In case of success, a non-negative ID of the master component; otherwise one of the error codes.
The ID returned by this operation is required by most methods in this API. Several master components may coexist in the same process. In that case, any two master components with different IDs cannot share any common data, and thus they are completely independent from each other.
ArtmReconfigureMasterComponent
int ArtmReconfigureMasterComponent(int master_id, int length, const char* master_component_config)
Changes the configuration of the master component.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- master_component_config (const char*) – Serialized MasterComponentConfig message, describing the new configuration of the master component.
- length (int) – The length in bytes of the master_component_config message.
Returns: A zero value (ARTM_SUCCESS) if the operation succeeded, otherwise one of the error codes.
ArtmDisposeMasterComponent
int ArtmDisposeMasterComponent(int master_id)
Disposes a master component together with all its models, regularizers and dictionaries.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
Returns: This operation always returns ARTM_SUCCESS.
This operation releases memory and other unmanaged resources used by the master component. After this operation, the master_id value becomes invalid and must not be used in other operations.
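The ID lifecycle described for ArtmCreateMasterComponent() and ArtmDisposeMasterComponent() can be sketched as a small holder that clears the ID after disposal, so a stale ID cannot be reused by accident. This is a minimal sketch under stated assumptions. `master_handle`, `master_open` and `master_close` are hypothetical helpers. The calls go through function pointers with the documented signatures (in a real program, pass `ArtmCreateMasterComponent` and `ArtmDisposeMasterComponent`). The `fake_*` functions are stand-ins that let the sketch run without the library.

```c
#define ARTM_SUCCESS 0

/* Function-pointer types matching the documented signatures. */
typedef int (*create_master_fn)(int length, const char *master_component_config);
typedef int (*dispose_master_fn)(int master_id);

/* Hypothetical holder for a master ID; -1 means "no live master component". */
typedef struct { int master_id; } master_handle;

/* Creates a master component. On failure the handle stays empty and the
   negative error code is returned. */
static int master_open(master_handle *h, create_master_fn create,
                       const char *config, int length) {
  int rc = create(length, config);
  h->master_id = rc >= 0 ? rc : -1;
  return rc;
}

/* Disposes the master component and clears the ID, which is invalid
   afterwards and must not be passed to any other call. */
static void master_close(master_handle *h, dispose_master_fn dispose) {
  if (h->master_id >= 0) {
    dispose(h->master_id); /* documented to always return ARTM_SUCCESS */
    h->master_id = -1;
  }
}

/* Stand-ins so the sketch runs without linking BigARTM. */
static int fake_next_id = 0, fake_disposed = -1;
static int fake_create(int length, const char *config) {
  (void)length; (void)config;
  return fake_next_id++;
}
static int fake_dispose(int master_id) {
  fake_disposed = master_id;
  return ARTM_SUCCESS;
}
```

Two handles opened this way hold different IDs and, as described above, refer to fully independent master components.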
ArtmCreateModel
int ArtmCreateModel(int master_id, int length, const char* model_config)
Defines a new topic model.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- model_config (const char*) – Serialized ModelConfig message, describing the configuration of the topic model.
- length (int) – The length in bytes of the model_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
Note that this method only defines the configuration of a topic model; it does not train it. Use the ArtmInvokeIteration() method to process the collection of textual documents, and then ArtmRequestTopicModel() to retrieve the resulting topic model.
The model_config message must have a unique value in the ModelConfig.name field. This name identifies the model in later calls (for example, in ArtmRequestTopicModel()).
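The order described above (define the model, process the collection, request the result) can be sketched as one function that stops at the first failing call. ArtmCreateModel, ArtmInvokeIteration and ArtmRequestTopicModel share the documented shape `int f(int master_id, int length, const char* message)`, so one function-pointer type covers all three. `train_and_request`, `artm_msg` and the `fake_*` stand-ins are hypothetical. Building the serialized messages is left out. `get_model_args` would carry the same name that was set in ModelConfig.name. If iterations run asynchronously in your setup, ArtmWaitIdle() (which awaits ongoing iterations) would go before the request. This sketch omits that step.

```c
#include <string.h>

/* Documented shape shared by ArtmCreateModel, ArtmInvokeIteration and
   ArtmRequestTopicModel. */
typedef int (*artm_call_fn)(int master_id, int length, const char *message);

/* A serialized protobuf message and its length in bytes. */
typedef struct { const char *data; int length; } artm_msg;

/* Returns the buffer size to pass to ArtmCopyRequestResult(), or the
   negative error code of the first call that failed. */
static int train_and_request(int master_id,
                             artm_call_fn create_model, artm_msg model_config,
                             artm_call_fn invoke_iteration, artm_msg iteration_args,
                             artm_call_fn request_topic_model, artm_msg get_model_args) {
  int rc = create_model(master_id, model_config.length, model_config.data);
  if (rc < 0) return rc;
  rc = invoke_iteration(master_id, iteration_args.length, iteration_args.data);
  if (rc < 0) return rc;
  return request_topic_model(master_id, get_model_args.length, get_model_args.data);
}

/* Stand-ins that record the call order; replace them with the real API. */
static char call_log[8];
static int fake_create_model(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "C"); return 0;
}
static int fake_invoke(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "I"); return 0;
}
static int fake_request(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "R"); return 128; /* pretend length */
}
```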
ArtmReconfigureModel
int ArtmReconfigureModel(int master_id, int length, const char* model_config)
Updates the configuration of a topic model.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- model_config (const char*) – Serialized ModelConfig message, describing the new configuration of the topic model.
- length (int) – The length in bytes of the model_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmDisposeModel
int ArtmDisposeModel(int master_id, const char* model_name)
Explicitly deletes a specific topic model. All topic models within a master component are also deleted automatically by ArtmDisposeMasterComponent().
After ArtmDisposeModel(), the model_name becomes invalid and must not be used in ArtmRequestScore(), ArtmRequestTopicModel(), ArtmRequestThetaMatrix(), or any other method (or protobuf message) that requires a model_name.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- model_name (const char*) – A string identifier of the model that should be deleted.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmCreateRegularizer
int ArtmCreateRegularizer(int master_id, int length, const char* regularizer_config)
Creates a new regularizer.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- regularizer_config (const char*) – Serialized RegularizerConfig message, describing the configuration of a new regularizer.
- length (int) – The length in bytes of the regularizer_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
This operation only creates the regularizer so that topic models can use it. To actually apply the regularizer, include its name in the ModelConfig.regularizer_name list of the model configuration.
ArtmReconfigureRegularizer
int ArtmReconfigureRegularizer(int master_id, int length, const char* regularizer_config)
Updates the configuration of the regularizer.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- regularizer_config (const char*) – Serialized RegularizerConfig message, describing the new configuration of the regularizer.
- length (int) – The length in bytes of the regularizer_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmDisposeRegularizer
int ArtmDisposeRegularizer(int master_id, const char* regularizer_name)
Explicitly deletes a specific regularizer. All regularizers within a master component are also deleted automatically by ArtmDisposeMasterComponent().
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- regularizer_name (const char*) – A string identifier of the regularizer that should be deleted.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmCreateDictionary
int ArtmCreateDictionary(int master_id, int length, const char* dictionary_config)
Creates a new dictionary.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- dictionary_config (const char*) – Serialized DictionaryConfig message, describing the configuration of a new dictionary.
- length (int) – The length in bytes of the dictionary_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmReconfigureDictionary
int ArtmReconfigureDictionary(int master_id, int length, const char* dictionary_config)
Updates the dictionary.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- dictionary_config (const char*) – Serialized DictionaryConfig message, describing the new configuration of the dictionary.
- length (int) – The length in bytes of the dictionary_config message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmDisposeDictionary
int ArtmDisposeDictionary(int master_id, const char* dictionary_name)
Explicitly deletes a specific dictionary. All dictionaries within a master component are also deleted automatically by ArtmDisposeMasterComponent().
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- dictionary_name (const char*) – A string identifier of the dictionary that should be deleted.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmAddBatch
int ArtmAddBatch(int master_id, int length, const char* add_batch_args)
Adds a batch for processing.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- add_batch_args (const char*) – Serialized AddBatchArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the add_batch_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmInvokeIteration
int ArtmInvokeIteration(int master_id, int length, const char* invoke_iteration_args)
Invokes several iterations over the collection.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- invoke_iteration_args (const char*) – Serialized InvokeIterationArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the invoke_iteration_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmSynchronizeModel
int ArtmSynchronizeModel(int master_id, int length, const char* sync_model_args)
Synchronizes a topic model.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- sync_model_args (const char*) – Serialized SynchronizeModelArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the sync_model_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
This operation updates the Phi matrix of the topic model with all model increments collected since the last call to ArtmSynchronizeModel(). In addition, this operation invokes all Phi regularizers for the requested topic model.
ArtmInitializeModel
int ArtmInitializeModel(int master_id, int length, const char* init_model_args)
Initializes the Phi matrix of a topic model with a random initial approximation.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- init_model_args (const char*) – Serialized InitializeModelArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the init_model_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmExportModel
int ArtmExportModel(int master_id, int length, const char* export_model_args)
Exports the Phi matrix into a file.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- export_model_args (const char*) – Serialized ExportModelArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the export_model_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmImportModel
int ArtmImportModel(int master_id, int length, const char* import_model_args)
Imports the Phi matrix from a file.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- import_model_args (const char*) – Serialized ImportModelArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the import_model_args message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmWaitIdle
int ArtmWaitIdle(int master_id, int length, const char* wait_idle_args)
Waits for ongoing iterations.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- wait_idle_args (const char*) – Serialized WaitIdleArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the wait_idle_args message.
Returns: ARTM_SUCCESS if the operation succeeded, ARTM_STILL_WORKING if the library is still processing the collection, otherwise one of the error codes.
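Because ArtmWaitIdle() can report ARTM_STILL_WORKING, callers typically retry it. The sketch below shows a bounded polling loop. `wait_until_idle`, `max_attempts` and the `fake_wait_idle` stand-in are hypothetical. The sketch also assumes that calling ArtmWaitIdle() repeatedly with the same serialized WaitIdleArgs is acceptable, which the text above does not state explicitly.

```c
#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1

/* Matches the documented signature of ArtmWaitIdle(). */
typedef int (*wait_idle_fn)(int master_id, int length, const char *wait_idle_args);

/* Calls wait_idle until it stops returning ARTM_STILL_WORKING or
   max_attempts calls have been made. Returns the last return code. */
static int wait_until_idle(wait_idle_fn wait_idle, int master_id,
                           const char *args, int length, int max_attempts) {
  int rc = ARTM_STILL_WORKING;
  for (int i = 0; i < max_attempts && rc == ARTM_STILL_WORKING; ++i)
    rc = wait_idle(master_id, length, args);
  return rc;
}

/* Stand-in: reports "still working" twice, then success. */
static int fake_calls = 0;
static int fake_wait_idle(int master_id, int length, const char *args) {
  (void)master_id; (void)length; (void)args;
  return ++fake_calls < 3 ? ARTM_STILL_WORKING : ARTM_SUCCESS;
}
```

A return value of ARTM_STILL_WORKING after the loop means the attempt budget ran out, not that an error occurred.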
ArtmOverwriteTopicModel
int ArtmOverwriteTopicModel(int master_id, int length, const char* topic_model)
Schedules an update of an entire topic model or of its subpart.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- topic_model (const char*) – Serialized TopicModel message, describing the new topic model.
- length (int) – The length in bytes of the topic_model message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
Note that this operation only schedules the update of a topic model. To make sure the update is completed, you must call ArtmWaitIdle() and ArtmSynchronizeModel(). Remember that by default ArtmSynchronizeModel() calculates all regularizers enabled in the configuration of the topic model. This may result in a different topic model than the one you passed as the topic_model parameter. To avoid this behavior, set SynchronizeModelArgs.invoke_regularizers to false.
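The three calls required to apply an overwrite can be chained as follows, in the order the note lists them. `overwrite_and_apply`, `artm_msg` and the `fake_*` stand-ins are hypothetical. The real functions all share the documented shape `int f(int master_id, int length, const char* message)`. To keep the passed topic model unchanged, `sync_args` should be a SynchronizeModelArgs message with `invoke_regularizers` set to false.

```c
#include <string.h>

/* Documented shape shared by ArtmOverwriteTopicModel, ArtmWaitIdle and
   ArtmSynchronizeModel. */
typedef int (*artm_call_fn)(int master_id, int length, const char *message);

/* A serialized protobuf message and its length in bytes. */
typedef struct { const char *data; int length; } artm_msg;

/* Schedules the overwrite, waits for it, then synchronizes the model.
   Any negative code (including ARTM_STILL_WORKING from the wait step) is
   returned to the caller, who may retry. */
static int overwrite_and_apply(int master_id,
                               artm_call_fn overwrite, artm_msg topic_model,
                               artm_call_fn wait_idle, artm_msg wait_args,
                               artm_call_fn synchronize, artm_msg sync_args) {
  int rc = overwrite(master_id, topic_model.length, topic_model.data);
  if (rc < 0) return rc;
  rc = wait_idle(master_id, wait_args.length, wait_args.data);
  if (rc < 0) return rc;
  return synchronize(master_id, sync_args.length, sync_args.data);
}

/* Stand-ins recording the call order: O(verwrite), W(ait), S(ynchronize). */
static char call_log[8];
static int fake_overwrite(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "O"); return 0;
}
static int fake_wait(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "W"); return 0;
}
static int fake_sync(int m, int n, const char *d) {
  (void)m; (void)n; (void)d; strcat(call_log, "S"); return 0;
}
```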
ArtmRequestThetaMatrix
int ArtmRequestThetaMatrix(int master_id, int length, const char* get_theta_args)
Requests the theta matrix. Use ArtmCopyRequestResult() to copy the resulting message.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- get_theta_args (const char*) – Serialized GetThetaMatrixArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the get_theta_args message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method, which populates the buffer with a ThetaMatrix message carrying the requested information. In case of a failure, one of the error codes.
ArtmRequestTopicModel
int ArtmRequestTopicModel(int master_id, int length, const char* get_model_args)
Requests a topic model.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- get_model_args (const char*) – Serialized GetTopicModelArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the get_model_args message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method, which populates the buffer with a TopicModel message carrying the requested information. In case of a failure, one of the error codes.
ArtmRequestRegularizerState
int ArtmRequestRegularizerState(int master_id, const char* regularizer_name)
Requests the state of a specific regularizer.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- regularizer_name (const char*) – A string identifier of the regularizer.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method, which populates the buffer with a RegularizerInternalState message carrying the requested information. In case of a failure, one of the error codes.
ArtmRequestScore
int ArtmRequestScore(int master_id, int length, const char* get_score_args)
Requests the result of a score calculation.
Parameters:
- master_id (int) – The ID of a master component, returned by the ArtmCreateMasterComponent() method.
- get_score_args (const char*) – Serialized GetScoreValueArgs message, describing the arguments of this operation.
- length (int) – The length in bytes of the get_score_args message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method, which populates the buffer with a ScoreData message carrying the requested information. In case of a failure, one of the error codes.
ArtmRequestParseCollection
int ArtmRequestParseCollection(int length, const char* collection_parser_config)
Parses a text collection into a set of batches and stores them on disk. Returns a DictionaryConfig message that lists all tokens occurring in the collection. Check the description of the CollectionParserConfig message for more details about this operation.
Parameters:
- collection_parser_config (const char*) – Serialized CollectionParserConfig message, describing the configuration of the collection parser.
- length (int) – The length in bytes of the collection_parser_config message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain a DictionaryConfig message that lists all unique tokens from the parsed collection. In case of a failure, one of the error codes.
Warning
The following error most likely indicates that you are trying to parse a very large file with the 32-bit version of BigARTM.
InternalError : failed mapping view: The parameter is incorrect
Use the 64-bit version of BigARTM to work around this issue.
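A native library always runs in a process of the same bitness. A caller can therefore check its own pointer width before parsing a large collection to see which BigARTM build it is using. This is a minimal sketch. `process_bits` and `warn_if_32bit` are hypothetical helpers, and the warning text is illustrative.

```c
#include <limits.h>
#include <stdio.h>

/* Pointer width of the running process in bits (32 or 64 on common
   platforms). The loaded BigARTM library has the same bitness. */
static int process_bits(void) { return (int)(sizeof(void *) * CHAR_BIT); }

/* Hypothetical pre-flight check before calling ArtmRequestParseCollection()
   on a very large file. Returns 1 if a warning was printed. */
static int warn_if_32bit(void) {
  if (process_bits() >= 64) return 0;
  fprintf(stderr, "warning: 32-bit process; parsing a very large file may fail "
                  "with \"failed mapping view\"\n");
  return 1;
}
```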
ArtmRequestLoadDictionary
int ArtmRequestLoadDictionary(const char* filename)
Loads a DictionaryConfig message from disk.
Parameters:
- filename (const char*) – The full name of a file that contains a serialized DictionaryConfig message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain the resulting DictionaryConfig message. In case of a failure, one of the error codes.
This method can be used to load the CollectionParserConfig.dictionary_file_name or CollectionParserConfig.cooccurrence_file_name dictionaries saved by the ArtmRequestParseCollection() method.
ArtmRequestLoadBatch
int ArtmRequestLoadBatch(const char* filename)
Loads a Batch message from disk.
Parameters:
- filename (const char*) – The full name of a file that contains a serialized Batch message.
Returns: In case of success, the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain the resulting Batch message. In case of a failure, one of the error codes.
This method can be used to load batches saved by the ArtmRequestParseCollection() or ArtmSaveBatch() methods.
ArtmCopyRequestResult
int ArtmCopyRequestResult(int length, char* address)
Copies the result of the last request.
Parameters:
- address (char*) – Target memory location to copy the data to.
- length (int) – The length in bytes of the address buffer.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
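The request/copy protocol shared by all ArtmRequest*() methods can be wrapped in one helper. It takes the value returned by the request, allocates that many bytes, and copies the result in. Because ArtmCopyRequestResult() copies the result of the *last* request, call the helper right after the request, with no other request in between. `fetch_request_result` and the `fake_copy` stand-in are hypothetical. In real code, pass `ArtmCopyRequestResult`, whose documented signature matches `copy_result_fn`.

```c
#include <stdlib.h>

#define ARTM_SUCCESS 0

/* Matches the documented signature of ArtmCopyRequestResult(). */
typedef int (*copy_result_fn)(int length, char *address);

/* rc is the value returned by an ArtmRequest*() call. Returns a malloc'ed
   buffer of rc bytes holding the serialized result message (binary, not
   NUL-terminated; the caller frees it), or NULL on any failure. */
static char *fetch_request_result(int rc, copy_result_fn copy, int *out_length) {
  if (rc < 0) return NULL; /* the request itself failed */
  char *buffer = malloc(rc > 0 ? (size_t)rc : 1);
  if (buffer == NULL) return NULL;
  if (copy(rc, buffer) != ARTM_SUCCESS) {
    free(buffer);
    return NULL;
  }
  *out_length = rc;
  return buffer;
}

/* Stand-in for ArtmCopyRequestResult() that fills the buffer with 'x'. */
static int fake_copy(int length, char *address) {
  for (int i = 0; i < length; ++i) address[i] = 'x';
  return ARTM_SUCCESS;
}
```

With the real library, a call such as `fetch_request_result(ArtmRequestTopicModel(master_id, len, args), ArtmCopyRequestResult, &n)` yields the bytes of a TopicModel message for your protobuf library to parse.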
ArtmSaveBatch
int ArtmSaveBatch(const char* disk_path, int length, const char* batch)
Saves a Batch message to disk.
Parameters:
- disk_path (const char*) – A folder in which to save the batch.
- batch (const char*) – Serialized Batch message to save.
- length (int) – The length in bytes of the batch message.
Returns: ARTM_SUCCESS if the operation succeeded, otherwise one of the error codes.
ArtmGetLastErrorMessage
const char* ArtmGetLastErrorMessage()
Retrieves the textual error message of the last failed request.
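The message describes the *last* failed request, so read it right after the failing call and before making another one. The sketch below formats the return code and the message together. `describe_failure` and the `fake_last_error` stand-in (with made-up text) are hypothetical. In real code, pass `ArtmGetLastErrorMessage`.

```c
#include <stdio.h>

/* Matches the documented signature of ArtmGetLastErrorMessage(). */
typedef const char *(*last_error_fn)(void);

/* Writes "<call> failed (rc=<code>): <message>" into out. Returns what
   snprintf returns: the length of the full text, even if it had to be
   truncated to fit into out. */
static int describe_failure(char *out, size_t size, const char *call, int rc,
                            last_error_fn get_last_error) {
  const char *msg = get_last_error();
  return snprintf(out, size, "%s failed (rc=%d): %s", call, rc,
                  msg != NULL ? msg : "(no message)");
}

/* Stand-in for ArtmGetLastErrorMessage(); the text is made up. */
static const char *fake_last_error(void) { return "example error text"; }
```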
Error codes
#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1
#define ARTM_INTERNAL_ERROR -2
#define ARTM_ARGUMENT_OUT_OF_RANGE -3
#define ARTM_INVALID_MASTER_ID -4
#define ARTM_CORRUPTED_MESSAGE -5
#define ARTM_INVALID_OPERATION -6
#define ARTM_DISK_READ_ERROR -7
#define ARTM_DISK_WRITE_ERROR -8
- ARTM_SUCCESS – The API call succeeded.
- ARTM_STILL_WORKING – This error code is applicable only to ArtmWaitIdle(). It indicates that the library is still processing the collection. Try to retrieve results later.
- ARTM_INTERNAL_ERROR – The API call failed due to an internal error in the BigARTM library. Please collect steps to reproduce this issue and report it to the BigARTM issue tracker.
- ARTM_ARGUMENT_OUT_OF_RANGE – The API call failed because one or more values of an argument are outside the allowable range of values as defined by the invoked method.
- ARTM_INVALID_MASTER_ID – An API call that requires a master_id parameter failed because a MasterComponent with the given ID does not exist.
- ARTM_CORRUPTED_MESSAGE – Unable to deserialize a protocol buffer message.
- ARTM_INVALID_OPERATION – The API call is invalid in the current state or due to the provided parameters.
- ARTM_DISK_READ_ERROR – The required files could not be read from disk.
- ARTM_DISK_WRITE_ERROR – The required files could not be written to disk.
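For logging, it helps to map a return code back to the name of its constant. This table-driven sketch lists the codes defined above and supports lookup in both directions. `artm_codes`, `artm_code_name` and `artm_code_from_name` are hypothetical helpers, not part of the API.

```c
#include <string.h>

#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1
#define ARTM_INTERNAL_ERROR -2
#define ARTM_ARGUMENT_OUT_OF_RANGE -3
#define ARTM_INVALID_MASTER_ID -4
#define ARTM_CORRUPTED_MESSAGE -5
#define ARTM_INVALID_OPERATION -6
#define ARTM_DISK_READ_ERROR -7
#define ARTM_DISK_WRITE_ERROR -8

/* One entry per documented return code. */
static const struct { int code; const char *name; } artm_codes[] = {
  {ARTM_SUCCESS, "ARTM_SUCCESS"},
  {ARTM_STILL_WORKING, "ARTM_STILL_WORKING"},
  {ARTM_INTERNAL_ERROR, "ARTM_INTERNAL_ERROR"},
  {ARTM_ARGUMENT_OUT_OF_RANGE, "ARTM_ARGUMENT_OUT_OF_RANGE"},
  {ARTM_INVALID_MASTER_ID, "ARTM_INVALID_MASTER_ID"},
  {ARTM_CORRUPTED_MESSAGE, "ARTM_CORRUPTED_MESSAGE"},
  {ARTM_INVALID_OPERATION, "ARTM_INVALID_OPERATION"},
  {ARTM_DISK_READ_ERROR, "ARTM_DISK_READ_ERROR"},
  {ARTM_DISK_WRITE_ERROR, "ARTM_DISK_WRITE_ERROR"},
};
#define ARTM_CODE_COUNT (sizeof artm_codes / sizeof artm_codes[0])

/* Returns the constant name for rc. Positive values are successful results
   (IDs or buffer lengths), not codes, so they get a generic label. */
static const char *artm_code_name(int rc) {
  for (size_t i = 0; i < ARTM_CODE_COUNT; ++i)
    if (artm_codes[i].code == rc) return artm_codes[i].name;
  return rc > 0 ? "success (ID or length)" : "unknown error code";
}

/* Reverse lookup, e.g. for tests or tooling. Returns 1 and stores the code
   if the name is known, 0 otherwise. */
static int artm_code_from_name(const char *name, int *code) {
  for (size_t i = 0; i < ARTM_CODE_COUNT; ++i)
    if (strcmp(artm_codes[i].name, name) == 0) {
      *code = artm_codes[i].code;
      return 1;
    }
  return 0;
}
```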