Plain C interface of BigARTM

This document explains all public methods of the low level BigARTM interface.

Introduction

The goal of the low-level BigARTM interface is to expose all functionality of the library as a set of simple functions written in good old plain C. This makes it easier to consume BigARTM from various programming environments. For example, the Python interface of BigARTM uses the ctypes module to call the low-level BigARTM interface. Most programming environments have similar functionality: PInvoke in C#, loadlibrary in Matlab, etc.

Note that most methods in this API accept a serialized binary representation of some Google Protocol Buffer message. Please refer to Messages for more details about each particular message.

All methods in this API return an integer value. Negative return values represent an error code; see Error codes for the list of all error codes. To get the corresponding error message as a string, use ArtmGetLastErrorMessage(). Non-negative return values represent success, and for some API methods they also carry useful information. For example, ArtmCreateMasterComponent() returns the ID of the newly created master component, and ArtmRequestTopicModel() returns the length of the buffer that should be allocated before calling ArtmCopyRequestResult().
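The return-value convention can be wrapped in a small helper that passes successful results through and reports failures. This is only a sketch: artm_check is a hypothetical name, and ArtmGetLastErrorMessage() is replaced by a mock stand-in so the snippet compiles without linking BigARTM. In a real program you would remove the mock and use the library's own declaration.

```c
#include <stdio.h>

/* Mock stand-in so this sketch compiles without BigARTM.
   In a real program, remove it and link against the library instead. */
static const char* ArtmGetLastErrorMessage(void) {
  return "mock error message";
}

/* Hypothetical helper: passes a non-negative result through unchanged
   (it may carry an ID or a buffer length) and reports negative error codes
   together with the library's last error message. */
static int artm_check(const char* call_name, int rc) {
  if (rc < 0) {
    fprintf(stderr, "%s failed with code %d: %s\n",
            call_name, rc, ArtmGetLastErrorMessage());
  }
  return rc;
}
```

For example, wrapping ArtmCreateMasterComponent() in artm_check keeps the returned ID on success and prints the library's message on failure, while the caller still decides how to react to the negative code.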

ArtmCreateMasterComponent

int ArtmCreateMasterComponent(int length, const char* master_component_config)

Creates a master component.

Parameters:
  • master_component_config (const char*) – Serialized MasterComponentConfig message, describing the configuration of the master component.
  • length (int) – The length in bytes of the master_component_config message.
Returns:

In case of success, a non-negative ID of the master component, otherwise one of the error codes.

The ID returned by this operation is required by most methods in this API. Several master components may coexist in the same process. In such a case, master components with different IDs cannot share any common data and are completely independent of each other.

ArtmReconfigureMasterComponent

int ArtmReconfigureMasterComponent(int master_id, int length, const char* master_component_config)

Changes the configuration of the master component.

Parameters:
  • master_id (int) – The ID of a master component returned by ArtmCreateMasterComponent() method.
  • master_component_config (const char*) – Serialized MasterComponentConfig message, describing the new configuration of the master component.
  • length (int) – The length in bytes of the master_component_config message.
Returns:

A zero value if operation succeeded, otherwise one of the error codes.

ArtmDisposeMasterComponent

int ArtmDisposeMasterComponent(int master_id)

Disposes master component together with all its models, regularizers and dictionaries.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
Returns:

This operation always returns ARTM_SUCCESS.

This operation releases memory and other unmanaged resources, used by the master component.

After this operation the master_id value becomes invalid and must not be used in other operations.
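The lifecycle of a master component (create, use the ID, dispose) can be sketched as follows. The two library functions are mock stand-ins with the signatures documented above, so the sketch compiles without BigARTM; demo_session is a hypothetical name, and the empty configuration buffer is only a placeholder for a real serialized MasterComponentConfig.

```c
#include <stdio.h>

#define ARTM_SUCCESS 0

/* Mock stand-ins with the documented signatures, so this sketch compiles
   without BigARTM. Remove them and link against the real library instead. */
static int next_master_id = 0;
static int live_master_components = 0;

static int ArtmCreateMasterComponent(int length, const char* master_component_config) {
  (void)length;
  (void)master_component_config;
  live_master_components++;
  return next_master_id++;
}

static int ArtmDisposeMasterComponent(int master_id) {
  (void)master_id;
  live_master_components--;
  return ARTM_SUCCESS;
}

/* Hypothetical example: create a master component, use its ID, dispose it. */
static int demo_session(const char* config, int config_length) {
  int master_id = ArtmCreateMasterComponent(config_length, config);
  if (master_id < 0) {
    return master_id;  /* creation failed: an error code, nothing to dispose */
  }
  printf("created master component %d\n", master_id);
  /* ... create models, add batches, invoke iterations using master_id ... */
  ArtmDisposeMasterComponent(master_id);
  /* master_id is invalid from this point on and must not be reused. */
  return ARTM_SUCCESS;
}
```

Disposing on every successful path matters because the master component owns unmanaged resources that are only released by ArtmDisposeMasterComponent().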

ArtmCreateModel

int ArtmCreateModel(int master_id, int length, const char* model_config)

Defines a new topic model.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • model_config (const char*) – Serialized ModelConfig message, describing the configuration of the topic model.
  • length (int) – The length in bytes of the model_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

Note that this method only defines the configuration of a topic model, but does not tune it. Use the ArtmInvokeIteration() method to process the collection of textual documents, and then ArtmRequestTopicModel() to retrieve the resulting topic model.

Note that model_config must have a unique value in the ModelConfig.name field, which can then be used to identify the model (for example, in an ArtmRequestTopicModel() call).

ArtmReconfigureModel

int ArtmReconfigureModel(int master_id, int length, const char* model_config)

Updates the configuration of the topic model.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • model_config (const char*) – Serialized ModelConfig message, describing the new configuration of the topic model.
  • length (int) – The length in bytes of the model_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmDisposeModel

int ArtmDisposeModel(int master_id, const char* model_name)

Explicitly deletes a specific topic model. All topic models within a master component are also deleted automatically by ArtmDisposeMasterComponent().

After ArtmDisposeModel() the model_name becomes invalid and must not be used in ArtmRequestScore(), ArtmRequestTopicModel(), ArtmRequestThetaMatrix() or any other method (or protobuf message) that requires model_name.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • model_name (const char*) – A string identifier of the model that should be deleted.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmCreateRegularizer

int ArtmCreateRegularizer(int master_id, int length, const char* regularizer_config)

Creates a new regularizer.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • regularizer_config (const char*) – Serialized RegularizerConfig message, describing the configuration of a new regularizer.
  • length (int) – The length in bytes of the regularizer_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

This operation only creates the regularizer so that it can be used by topic models. To actually apply the regularizer you should include its name in ModelConfig.regularizer_name list of a model config.

ArtmReconfigureRegularizer

int ArtmReconfigureRegularizer(int master_id, int length, const char* regularizer_config)

Updates the configuration of the regularizer.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • regularizer_config (const char*) – Serialized RegularizerConfig message, describing the new configuration of the regularizer.
  • length (int) – The length in bytes of the regularizer_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmDisposeRegularizer

int ArtmDisposeRegularizer(int master_id, const char* regularizer_name)

Explicitly deletes a specific regularizer. All regularizers within a master component are also deleted automatically by ArtmDisposeMasterComponent().

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • regularizer_name (const char*) – A string identifier of the regularizer that should be deleted.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmCreateDictionary

int ArtmCreateDictionary(int master_id, int length, const char* dictionary_config)

Creates a new dictionary.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • dictionary_config (const char*) – Serialized DictionaryConfig message, describing the configuration of a new dictionary.
  • length (int) – The length in bytes of the dictionary_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmReconfigureDictionary

int ArtmReconfigureDictionary(int master_id, int length, const char* dictionary_config)

Updates the dictionary.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • dictionary_config (const char*) – Serialized DictionaryConfig message, describing the new configuration of the dictionary.
  • length (int) – The length in bytes of the dictionary_config message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmDisposeDictionary

int ArtmDisposeDictionary(int master_id, const char* dictionary_name)

Explicitly deletes a specific dictionary. All dictionaries within a master component are also deleted automatically by ArtmDisposeMasterComponent().

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • dictionary_name (const char*) – A string identifier of the dictionary that should be deleted.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmAddBatch

int ArtmAddBatch(int master_id, int length, const char* add_batch_args)

Adds a batch for processing.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • add_batch_args (const char*) – Serialized AddBatchArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the add_batch_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmInvokeIteration

int ArtmInvokeIteration(int master_id, int length, const char* invoke_iteration_args)

Invokes several iterations over the collection.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • invoke_iteration_args (const char*) – Serialized InvokeIterationArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the invoke_iteration_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmSynchronizeModel

int ArtmSynchronizeModel(int master_id, int length, const char* sync_model_args)

Synchronizes a topic model.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • sync_model_args (const char*) – Serialized SynchronizeModelArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the sync_model_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

This operation updates the Phi matrix of the topic model with all model increments collected since the last call to ArtmSynchronizeModel(). In addition, this operation invokes all Phi-regularizers for the requested topic model.

ArtmInitializeModel

int ArtmInitializeModel(int master_id, int length, const char* init_model_args)

Initializes the Phi matrix of a topic model with a random initial approximation.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • init_model_args (const char*) – Serialized InitializeModelArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the init_model_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmExportModel

int ArtmExportModel(int master_id, int length, const char* export_model_args)

Exports the Phi matrix into a file.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • export_model_args (const char*) – Serialized ExportModelArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the export_model_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmImportModel

int ArtmImportModel(int master_id, int length, const char* import_model_args)

Imports the Phi matrix from a file.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • import_model_args (const char*) – Serialized ImportModelArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the import_model_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmWaitIdle

int ArtmWaitIdle(int master_id, int length, const char* wait_idle_args)

Waits for ongoing iterations to complete.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • wait_idle_args (const char*) – Serialized WaitIdleArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the wait_idle_args message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmOverwriteTopicModel

int ArtmOverwriteTopicModel(int master_id, int length, const char* topic_model)

This operation schedules an update of an entire topic model or of its subpart.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • topic_model (const char*) – Serialized TopicModel message, describing the new topic model.
  • length (int) – The length in bytes of the topic_model message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

Note that this operation only schedules the update of a topic model. To make sure the update is completed, you must call ArtmWaitIdle() and ArtmSynchronizeModel(). Remember that by default ArtmSynchronizeModel() calculates all regularizers enabled in the configuration of the topic model. This may result in a topic model that differs from the one you passed as the topic_model parameter. To avoid this behavior, set SynchronizeModelArgs.invoke_regularizers to false.
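The call sequence described above (schedule the overwrite, wait, then synchronize) can be sketched as one helper. The three library functions are mock stand-ins with the documented signatures that only record the call order; overwrite_and_sync is a hypothetical name. Building the serialized SynchronizeModelArgs with invoke_regularizers set to false is assumed to happen elsewhere, with a Protocol Buffers runtime of your choice.

```c
#include <string.h>

#define ARTM_SUCCESS 0

/* Mock stand-ins with the documented signatures; they only record the call
   order in call_log. Replace them with the real BigARTM library. */
static char call_log[64];

static int ArtmOverwriteTopicModel(int master_id, int length, const char* topic_model) {
  (void)master_id; (void)length; (void)topic_model;
  strcat(call_log, "O");
  return ARTM_SUCCESS;
}

static int ArtmWaitIdle(int master_id, int length, const char* wait_idle_args) {
  (void)master_id; (void)length; (void)wait_idle_args;
  strcat(call_log, "W");
  return ARTM_SUCCESS;
}

static int ArtmSynchronizeModel(int master_id, int length, const char* sync_model_args) {
  (void)master_id; (void)length; (void)sync_model_args;
  strcat(call_log, "S");
  return ARTM_SUCCESS;
}

/* Hypothetical helper: schedules the overwrite, waits for it, then applies it.
   sync_args should be a serialized SynchronizeModelArgs message with
   invoke_regularizers set to false, so that regularizers do not change the
   model passed in topic_model. Any negative result (including
   ARTM_STILL_WORKING from ArtmWaitIdle) stops the sequence. */
static int overwrite_and_sync(int master_id,
                              const char* topic_model, int topic_model_length,
                              const char* wait_args, int wait_args_length,
                              const char* sync_args, int sync_args_length) {
  int rc = ArtmOverwriteTopicModel(master_id, topic_model_length, topic_model);
  if (rc < 0) return rc;
  rc = ArtmWaitIdle(master_id, wait_args_length, wait_args);
  if (rc < 0) return rc;
  return ArtmSynchronizeModel(master_id, sync_args_length, sync_args);
}
```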

ArtmRequestThetaMatrix

int ArtmRequestThetaMatrix(int master_id, int length, const char* get_theta_args)

Requests theta matrix. Use ArtmCopyRequestResult() to copy the resulting message.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • get_theta_args (const char*) – Serialized GetThetaMatrixArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the get_theta_args message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. This will populate the buffer with a ThetaMatrix message carrying the requested information. In case of failure, returns one of the error codes.
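The two-phase pattern (request, allocate a buffer of the returned length, copy) applies to every ArtmRequest* method. A minimal sketch for ArtmRequestThetaMatrix() follows; the library functions are mock stand-ins with the documented signatures, the mock result is a fixed byte string rather than a real ThetaMatrix message, and request_theta is a hypothetical name. Reporting an allocation failure as ARTM_INTERNAL_ERROR is an arbitrary choice of this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define ARTM_SUCCESS 0
#define ARTM_INTERNAL_ERROR -2
#define ARTM_ARGUMENT_OUT_OF_RANGE -3

/* Mock stand-ins with the documented signatures, so the sketch compiles
   without BigARTM. The mock "result" is a fixed byte string, not a real
   serialized ThetaMatrix message. */
static const char mock_result[] = "serialized-theta";

static int ArtmRequestThetaMatrix(int master_id, int length, const char* get_theta_args) {
  (void)master_id;
  (void)length;
  (void)get_theta_args;
  return (int)(sizeof(mock_result) - 1);  /* size of the pending result */
}

static int ArtmCopyRequestResult(int length, char* address) {
  if (length != (int)(sizeof(mock_result) - 1)) {
    return ARTM_ARGUMENT_OUT_OF_RANGE;
  }
  memcpy(address, mock_result, (size_t)length);
  return ARTM_SUCCESS;
}

/* Hypothetical helper for the two-phase pattern: request, allocate, copy.
   On success returns the message length and stores a malloc'ed buffer in
   *out_buffer (the caller must free it). On failure returns a negative code
   and leaves *out_buffer NULL. */
static int request_theta(int master_id, const char* args, int args_length,
                         char** out_buffer) {
  *out_buffer = NULL;
  int length = ArtmRequestThetaMatrix(master_id, args_length, args);
  if (length < 0) {
    return length;  /* error code from the request itself */
  }
  char* buffer = malloc(length > 0 ? (size_t)length : 1);
  if (buffer == NULL) {
    return ARTM_INTERNAL_ERROR;  /* arbitrary choice for this sketch */
  }
  int rc = ArtmCopyRequestResult(length, buffer);
  if (rc < 0) {
    free(buffer);
    return rc;
  }
  *out_buffer = buffer;
  return length;
}
```

The returned buffer holds raw protobuf bytes; it still has to be parsed into a ThetaMatrix message with a Protocol Buffers runtime before use.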

ArtmRequestTopicModel

int ArtmRequestTopicModel(int master_id, int length, const char* get_model_args)

Requests topic model.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • get_model_args (const char*) – Serialized GetTopicModelArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the get_model_args message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. This will populate the buffer with a TopicModel message carrying the requested information. In case of failure, returns one of the error codes.

ArtmRequestRegularizerState

int ArtmRequestRegularizerState(int master_id, const char* regularizer_name)

Requests the state of a specific regularizer.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • regularizer_name (const char*) – A string identifier of the regularizer.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. This will populate the buffer with a RegularizerInternalState message carrying the requested information. In case of failure, returns one of the error codes.

ArtmRequestScore

int ArtmRequestScore(int master_id, int length, const char* get_score_args)

Requests the result of score calculation.

Parameters:
  • master_id (int) – The ID of a master component, returned by ArtmCreateMasterComponent() method.
  • get_score_args (const char*) – Serialized GetScoreValueArgs message, describing the arguments of this operation.
  • length (int) – The length in bytes of the get_score_args message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. This will populate the buffer with a ScoreData message carrying the requested information. In case of failure, returns one of the error codes.

ArtmRequestParseCollection

int ArtmRequestParseCollection(int length, const char* collection_parser_config)

Parses a text collection into a set of batches and stores them on disk. Returns a DictionaryConfig message that lists all tokens occurring in the collection.

Check the description of CollectionParserConfig message for more details about this operation.

Parameters:
  • collection_parser_config (const char*) – Serialized CollectionParserConfig message, describing the configuration of the collection parser.
  • length (int) – The length in bytes of the collection_parser_config message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain a DictionaryConfig message that lists all unique tokens from the collection being parsed. In case of failure, returns one of the error codes.

Warning

The following error most likely indicates that you are trying to parse a very large file with the 32-bit version of BigARTM.

InternalError :  failed mapping view: The parameter is incorrect

Use the 64-bit version of BigARTM to work around this issue.
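A process can only load a shared library built for the same architecture, so using the 64-bit BigARTM library also requires a 64-bit host program. A minimal sketch for checking how your own program is compiled (is_64bit_build is a hypothetical helper name):

```c
/* Returns 1 when this program is compiled as 64-bit (8-byte pointers),
   0 otherwise. A 64-bit program is needed to load a 64-bit BigARTM library. */
static int is_64bit_build(void) {
  return sizeof(void*) == 8;
}
```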

ArtmRequestLoadDictionary

int ArtmRequestLoadDictionary(const char* filename)

Loads a DictionaryConfig message from disk.

Parameters:
  • filename (const char*) – The full name of a file that contains a serialized DictionaryConfig message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain the resulting DictionaryConfig message. In case of failure, returns one of the error codes.

This method can be used to load the CollectionParserConfig.dictionary_file_name or CollectionParserConfig.cooccurrence_file_name dictionaries saved by the ArtmRequestParseCollection() method.

ArtmRequestLoadBatch

int ArtmRequestLoadBatch(const char* filename)

Loads a Batch message from disk.

Parameters:
  • filename (const char*) – The full name of a file that contains a serialized Batch message.
Returns:

In case of success, returns the length in bytes of a buffer that should be allocated on the caller's side and then passed to the ArtmCopyRequestResult() method. The buffer will contain the resulting Batch message. In case of failure, returns one of the error codes.

This method can be used to load batches saved by the ArtmRequestParseCollection() or ArtmSaveBatch() methods.

ArtmCopyRequestResult

int ArtmCopyRequestResult(int length, char* address)

Copies the result of the last request.

Parameters:
  • address (char*) – Target memory location to copy the data to.
  • length (int) – The length in bytes of the address buffer.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmSaveBatch

int ArtmSaveBatch(const char* disk_path, int length, const char* batch)

Saves a Batch message to disk.

Parameters:
  • disk_path (const char*) – A folder where the batch will be saved.
  • batch (const char*) – Serialized Batch message to save.
  • length (int) – The length in bytes of the batch message.
Returns:

Returns ARTM_SUCCESS value if operation succeeded, otherwise returns one of the error codes.

ArtmGetLastErrorMessage

const char* ArtmGetLastErrorMessage()

Retrieves the textual error message that occurred during the last failed request.

Error codes

#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1
#define ARTM_INTERNAL_ERROR -2
#define ARTM_ARGUMENT_OUT_OF_RANGE -3
#define ARTM_INVALID_MASTER_ID -4
#define ARTM_CORRUPTED_MESSAGE -5
#define ARTM_INVALID_OPERATION -6
#define ARTM_DISK_READ_ERROR -7
#define ARTM_DISK_WRITE_ERROR -8

ARTM_SUCCESS

The API call succeeded.

ARTM_STILL_WORKING

This error code is applicable only to ArtmWaitIdle(). It indicates that the library is still processing the collection. Try to retrieve results later.
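A caller that wants to block until processing finishes can retry ArtmWaitIdle() while it returns ARTM_STILL_WORKING. In this sketch ArtmWaitIdle() is a mock stand-in (with the documented signature) that reports "still working" three times before succeeding, and wait_until_idle is a hypothetical helper with an explicit retry limit so it cannot loop forever.

```c
#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1

/* Mock stand-in: reports ARTM_STILL_WORKING a few times, then success.
   Replace it with the real ArtmWaitIdle() from BigARTM. */
static int mock_busy_calls = 3;

static int ArtmWaitIdle(int master_id, int length, const char* wait_idle_args) {
  (void)master_id; (void)length; (void)wait_idle_args;
  if (mock_busy_calls > 0) {
    mock_busy_calls--;
    return ARTM_STILL_WORKING;
  }
  return ARTM_SUCCESS;
}

/* Hypothetical helper: calls ArtmWaitIdle() up to max_attempts times while it
   reports ARTM_STILL_WORKING and returns the last result code. wait_args is
   a serialized WaitIdleArgs message. */
static int wait_until_idle(int master_id, const char* wait_args,
                           int wait_args_length, int max_attempts) {
  int rc = ARTM_STILL_WORKING;
  for (int attempt = 0; attempt < max_attempts && rc == ARTM_STILL_WORKING; ++attempt) {
    rc = ArtmWaitIdle(master_id, wait_args_length, wait_args);
  }
  return rc;
}
```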

ARTM_INTERNAL_ERROR

The API call failed due to an internal error in the BigARTM library. Please collect steps to reproduce this issue and report it to the BigARTM issue tracker.

ARTM_ARGUMENT_OUT_OF_RANGE

The API call failed because one or more values of an argument are outside the allowable range of values as defined by the invoked method.

ARTM_INVALID_MASTER_ID

An API call that requires the master_id parameter failed because a MasterComponent with the given ID does not exist.

ARTM_CORRUPTED_MESSAGE

Unable to deserialize protocol buffer message.

ARTM_INVALID_OPERATION

The API call is invalid in current state or due to provided parameters.

ARTM_DISK_READ_ERROR

The required files could not be read from disk.

ARTM_DISK_WRITE_ERROR

The required files could not be written to disk.
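Because the codes are contiguous from 0 down to -8, turning a return value into the name of its constant only needs a small table indexed by the negated code. A sketch using the values listed above; artm_code_names and artm_code_name are hypothetical names, not part of the BigARTM API.

```c
#include <stddef.h>

#define ARTM_SUCCESS 0
#define ARTM_STILL_WORKING -1
#define ARTM_INTERNAL_ERROR -2
#define ARTM_ARGUMENT_OUT_OF_RANGE -3
#define ARTM_INVALID_MASTER_ID -4
#define ARTM_CORRUPTED_MESSAGE -5
#define ARTM_INVALID_OPERATION -6
#define ARTM_DISK_READ_ERROR -7
#define ARTM_DISK_WRITE_ERROR -8

/* Names of the codes above, indexed by the negated code value. */
static const char* const artm_code_names[] = {
  "ARTM_SUCCESS", "ARTM_STILL_WORKING", "ARTM_INTERNAL_ERROR",
  "ARTM_ARGUMENT_OUT_OF_RANGE", "ARTM_INVALID_MASTER_ID",
  "ARTM_CORRUPTED_MESSAGE", "ARTM_INVALID_OPERATION",
  "ARTM_DISK_READ_ERROR", "ARTM_DISK_WRITE_ERROR",
};

/* Returns the constant name for a documented error code, "ARTM_SUCCESS" for
   any non-negative value (which may carry an ID or a buffer length), or NULL
   for a negative value that is not listed above. */
static const char* artm_code_name(int code) {
  if (code >= 0) return artm_code_names[0];
  if (code < ARTM_DISK_WRITE_ERROR) return NULL;  /* not a documented code */
  return artm_code_names[-code];
}
```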