Changes in Protobuf Messages
v0.8.2
- added CollectionParserConfig.num_threads to control the number of threads that perform parsing. At the moment the feature is only implemented for the VW format.
- added CollectionParserConfig.class_id (repeated string) to control which modalities should be parsed. If a token's class_id is not in this list, the token is excluded from the resulting batches. When the list is empty, all modalities are included (this is the default behavior, as before).
- added CollectionParserInfo message to export diagnostic information from ArtmParseCollection
- added FilterDictionaryArgs.max_dictionary_size to give the user an easy option to limit the dictionary size
- added MergeModelArgs.dictionary_name to define the set of tokens in the resulting matrix
- added ThetaMatrix.num_values, TopicModel.num_values to define the number of non-zero elements in sparse format
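The filtering rule described for CollectionParserConfig.class_id can be illustrated with a small sketch. This is not BigARTM code: the function filter_by_class_id, the (keyword, class_id) tuple representation, and the sample class_id values are hypothetical and exist only to show the rule "an empty list keeps every modality, otherwise only listed modalities survive".

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a parsed token: (keyword, class_id).
Token = Tuple[str, str]


def filter_by_class_id(tokens: Iterable[Token], class_ids: List[str]) -> List[Token]:
    """Keep tokens whose class_id is listed; an empty list keeps everything."""
    if not class_ids:
        # Default behavior: no filtering, all modalities are included.
        return list(tokens)
    allowed = set(class_ids)
    return [tok for tok in tokens if tok[1] in allowed]


tokens = [("cat", "@default_class"), ("Alice", "@author"), ("2016", "@year")]
print(filter_by_class_id(tokens, []))
print(filter_by_class_id(tokens, ["@default_class", "@author"]))
```

With an empty list the call returns all three tokens; with two class ids listed, the "@year" token is dropped.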
v0.8.0
Warning
New batches created in BigARTM v0.8 CANNOT be used in previous versions of the library. Old batches created prior to BigARTM v0.8 can still be used. See below for details.
- added token_id and token_weight fields in Item message, and obsoleted Item.field. Internally the library will merge the content of Field.token_id and Field.token_weight across all fields, and store the result back into Item.token_id, Item.token_weight. The new Item message is as follows:

      message Item {
        optional int32 id = 1;
        repeated Field field = 2;  // obsolete in BigARTM v0.8.0
        optional string title = 3;
        repeated int32 token_id = 4;
        repeated float token_weight = 5;
      }
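The merge of Field.token_id and Field.token_weight into the item-level lists can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's implementation: the dataclasses below only mirror the message fields, merge_fields is a hypothetical helper, and it assumes that "merge" means concatenating the fields in order and leaving the obsolete field list empty afterwards.

```python
from dataclasses import dataclass, field as dc_field
from typing import List


@dataclass
class Field:
    """Mirror of the old per-field container (obsolete in v0.8.0)."""
    token_id: List[int] = dc_field(default_factory=list)
    token_weight: List[float] = dc_field(default_factory=list)


@dataclass
class Item:
    """Mirror of the new Item message layout."""
    id: int = 0
    field: List[Field] = dc_field(default_factory=list)
    title: str = ""
    token_id: List[int] = dc_field(default_factory=list)
    token_weight: List[float] = dc_field(default_factory=list)


def merge_fields(item: Item) -> Item:
    """Return a copy of item with all Field contents moved to the item level.

    Assumption: fields are concatenated in order; the library may differ.
    """
    merged = Item(id=item.id, title=item.title,
                  token_id=list(item.token_id),
                  token_weight=list(item.token_weight))
    for f in item.field:
        if len(f.token_id) != len(f.token_weight):
            raise ValueError("token_id and token_weight must have equal length")
        merged.token_id.extend(f.token_id)
        merged.token_weight.extend(f.token_weight)
    return merged


old_item = Item(id=7, field=[Field([1, 5], [2.0, 1.0]), Field([9], [3.0])])
new_item = merge_fields(old_item)
print(new_item.token_id, new_item.token_weight)
```

The sketch returns a new object rather than mutating its input, so running it twice on the same old-style item does not duplicate tokens.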
- renamed topics_count into num_topics across multiple messages (TopicModel, ThetaMatrix, etc)
- renamed inner_iterations_count into num_document_passes in ProcessBatchesArgs
- renamed passes into num_collection_passes in FitOfflineMasterModelArgs
- renamed threads into num_processors in MasterModelConfig
- renamed topic_index field into topic_indices in TopicModel and ThetaMatrix messages
- added messages ScoreArray, GetScoreArrayArgs and ClearScoreArrayCacheArgs to bring score tracking functionality down into BigARTM core
- added messages BackgroundTokensRatioConfig and BackgroundTokensRatio (new score)
- moved model_name from GetScoreValueArgs into ScoreConfig; this is done to support score tracking functionality in BigARTM core; each Phi score needs to know which model to use in calculation
- removed topics_count from InitializeModelArgs; users must specify topic names in InitializeModelArgs.topic_name field
- removed topic_index from GetThetaMatrixArgs; users must specify topic names to retrieve in GetThetaMatrixArgs.topic_name
- removed batch field in GetThetaMatrixArgs and GetScoreValueArgs messages; users should use ArtmRequestTransformMasterModel or ArtmRequestProcessBatches to process new batches and calculate theta scores
- removed reset_scores flag in ProcessBatchesArgs; users should use new API ArtmClearScoreCache
- removed clean_cache flag in GetThetaMatrixArgs; users should use new API ArtmClearThetaCache
- removed MasterComponentConfig; users should use ArtmCreateMasterModel and pass MasterModelConfig
- removed obsolete fields in CollectionParserConfig; same arguments can be specified at GatherDictionaryArgs and passed to ArtmGatherDictionary
- removed Filter message in InitializeModelArgs; same arguments can be specified at FilterDictionaryArgs and passed to ArtmFilterDictionary
- removed batch_name from ImportBatchesArgs; the field is no longer needed; batches will be identified via their Batch.id identifier
- removed use_v06_api in MasterModelConfig
- removed ModelConfig message
- removed SynchronizeModelArgs, AddBatchArgs, InvokeIterationArgs, WaitIdleArgs messages; users should use new APIs based on MasterModel
- removed GetRegularizerStateArgs, RegularizerInternalState, MultiLanguagePhiInternalState messages
- removed model_name and model_name_cache in ThetaMatrix, GetThetaMatrixArgs and ProcessBatchesArgs; the code of master component is simplified to only handle one theta matrix, so there is no longer any reason to identify theta matrix with model_name
- removed Stream message, MasterComponentConfig.stream field, and all stream_name fields across several messages; train/test streaming functionality is fully removed; users are expected to manage their train and test collections (for example as separate folders with batches)
- removed use_sparse_bow field in several messages; the computation mode with dense matrices is no longer supported
- renamed item_count into num_items in ThetaSnippetScoreConfig
- add global enum ScoreType as a replacement for enums Type from ScoreConfig and ScoreData messages
- add global enum RegularizerType as a replacement for enum Type from RegularizerConfig message
- add global enum MatrixLayout as a replacement for enum MatrixLayout from GetThetaMatrixArgs and GetTopicModelArgs messages
- add global enum ThetaMatrixType as a replacement for enum ThetaMatrixType from ProcessBatchesArgs and TransformMasterModelArgs messages
- renamed enum Type into SmoothType in SmoothPtdwConfig to avoid conflicts in C# messages
- renamed enum Mode into SparseMode in SpecifiedSparsePhiConfig to avoid conflicts in C# messages
- renamed enum Format into CollectionFormat in CollectionParserConfig to avoid conflicts in C# messages
- renamed enum NameType into BatchNameType in CollectionParserConfig to avoid conflicts in C# messages
- renamed field transform_type into type in TransformConfig to avoid conflicts in C# messages
- remove message CopyRequestResultArgs; this is a breaking change; please check that
  - all previous calls to ArtmCopyRequestResult are changed to ArtmCopyRequestedMessage
  - all previous calls to ArtmCopyRequestResultEx with request types GetThetaSecondPass and GetModelSecondPass are changed to ArtmCopyRequestedObject
  - all previous calls to ArtmCopyRequestResultEx with DefaultRequestType are changed to ArtmCopyRequestedMessage
- remove field request_type in GetTopicModelArgs; to request only topics and/or tokens users should set GetTopicModelArgs.matrix_layout to MatrixLayout_Sparse, and GetTopicModelArgs.eps = 1.001 (any number greater than 1.0).
- change optional FloatArray into repeated float in field coherence of TopTokensScore
- change optional DoubleArray into repeated double in fields kernel_size, kernel_purity, kernel_contrast and coherence of TopicKernelScore
- change optional StringArray into repeated string in field topic_name of TopicKernelScore
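The eps = 1.001 recommendation for GetTopicModelArgs works because, when the matrix holds probabilities, no entry can exceed 1.0, so a threshold above 1.0 leaves only the token and topic names. The sketch below illustrates that reasoning only. The function to_sparse and the dictionary layout are hypothetical, and it assumes the sparse layout keeps entries that are greater than or equal to eps; the exact comparison the library uses is not stated here.

```python
from typing import Dict, List, Tuple

# Hypothetical dense matrix: token -> per-topic probabilities.
Dense = Dict[str, List[float]]
# Hypothetical sparse form: token -> list of (topic_index, value) pairs.
Sparse = Dict[str, List[Tuple[int, float]]]


def to_sparse(rows: Dense, eps: float) -> Sparse:
    """Keep only entries with value >= eps (assumed thresholding rule)."""
    return {
        token: [(topic, value) for topic, value in enumerate(values) if value >= eps]
        for token, values in rows.items()
    }


phi = {
    "cat": [0.7, 0.0, 0.1],
    "dog": [0.3, 1.0, 0.0],
    "fish": [0.0, 0.0, 0.9],
}

print(to_sparse(phi, 0.05))   # small entries dropped, large ones kept
print(to_sparse(phi, 1.001))  # every value dropped, only token names remain
```

With eps above 1.0 each token maps to an empty list, which corresponds to retrieving the tokens (and topic names) without any matrix values.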