Changes in Protobuf Messages
v0.8.2
- added CollectionParserConfig.num_threads to control the number of threads that perform parsing. Currently this feature is implemented only for the VW (Vowpal Wabbit) format.
- added CollectionParserConfig.class_id (repeated string) to control which modalities should be parsed. If a token's class_id is not in this list, the token is excluded from the resulting batches. When the list is empty, all modalities are included (this is the default behavior, as before).
- added the CollectionParserInfo message to export diagnostic information from ArtmParseCollection
- added FilterDictionaryArgs.max_dictionary_size to give users an easy way to limit the dictionary size
- added MergeModelArgs.dictionary_name to define the set of tokens in the resulting matrix
- added ThetaMatrix.num_values and TopicModel.num_values to define the number of non-zero elements in the sparse format
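The class_id filtering rule described above can be modeled in a few lines of plain Python. This is a sketch of the documented semantics only, not BigARTM's parser: the Token class and the filter_by_class_id function are hypothetical, and the modality names used in the example are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    keyword: str
    class_id: str  # modality the token belongs to


def filter_by_class_id(tokens: Sequence[Token], class_ids: Sequence[str]) -> List[Token]:
    """Keep tokens whose class_id is listed; an empty list keeps every modality."""
    if not class_ids:
        return list(tokens)
    allowed = set(class_ids)
    return [t for t in tokens if t.class_id in allowed]


tokens = [Token("football", "@default_class"), Token("sport", "@labels")]
print([t.keyword for t in filter_by_class_id(tokens, ["@labels"])])  # ['sport']
print(len(filter_by_class_id(tokens, [])))  # 2
```

Treating the empty list as "no filter" keeps the default behavior unchanged for existing configurations.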
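A minimal sketch of what a dictionary size cap does, assuming that the most frequent tokens are the ones kept when the limit set by FilterDictionaryArgs.max_dictionary_size is exceeded (the changelog does not state the ranking criterion). limit_dictionary_size is a hypothetical helper that works on a plain dict of token frequencies, not on a BigARTM dictionary.

```python
from typing import Dict


def limit_dictionary_size(token_freq: Dict[str, int], max_dictionary_size: int) -> Dict[str, int]:
    """Keep at most max_dictionary_size tokens.

    Assumption: higher frequency wins; ties are broken alphabetically
    so that the result is deterministic.
    """
    if max_dictionary_size < 0:
        raise ValueError("max_dictionary_size must be non-negative")
    ranked = sorted(token_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:max_dictionary_size])


print(limit_dictionary_size({"the": 50, "topic": 7, "model": 7, "rare": 1}, 2))
# {'the': 50, 'model': 7}
```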
v0.8.0
Warning
New batches created in BigARTM v0.8 CANNOT be used in previous versions of the library. Old batches created prior to BigARTM v0.8 can still be used. See below for details.
- added token_id and token_weight fields in the Item message, and obsoleted Item.field. Internally the library will merge the content of Field.token_id and Field.token_weight across all fields, and store the result back into Item.token_id and Item.token_weight. The new Item message is as follows:

      message Item {
        optional int32 id = 1;
        repeated Field field = 2;  // obsolete in BigARTM v0.8.0
        optional string title = 3;
        repeated int32 token_id = 4;
        repeated float token_weight = 5;
      }
- renamed topics_count into num_topics across multiple messages (TopicModel, ThetaMatrix, etc.)
- renamed inner_iterations_count into num_document_passes in ProcessBatchesArgs
- renamed passes into num_collection_passes in FitOfflineMasterModelArgs
- renamed threads into num_processors in MasterModelConfig
- renamed the topic_index field into topic_indices in the TopicModel and ThetaMatrix messages
- added messages ScoreArray, GetScoreArrayArgs and ClearScoreArrayCacheArgs to bring score tracking functionality down into the BigARTM core
- added messages BackgroundTokensRatioConfig and BackgroundTokensRatio (new score)
- moved model_name from GetScoreValueArgs into ScoreConfig; this is done to support score tracking functionality in the BigARTM core, because each Phi score needs to know which model to use in the calculation
- removed topics_count from InitializeModelArgs; users must specify topic names in the InitializeModelArgs.topic_name field
- removed topic_index from GetThetaMatrixArgs; users must specify the topic names to retrieve in GetThetaMatrixArgs.topic_name
- removed the batch field from the GetThetaMatrixArgs and GetScoreValueArgs messages; users should use ArtmRequestTransformMasterModel or ArtmRequestProcessBatches to process new batches and calculate theta scores
- removed the reset_scores flag in ProcessBatchesArgs; users should use the new API ArtmClearScoreCache
- removed the clean_cache flag in GetThetaMatrixArgs; users should use the new API ArtmClearThetaCache
- removed MasterComponentConfig; users should use ArtmCreateMasterModel and pass MasterModelConfig
- removed obsolete fields in CollectionParserConfig; the same arguments can be specified in GatherDictionaryArgs and passed to ArtmGatherDictionary
- removed the Filter message in InitializeModelArgs; the same arguments can be specified in FilterDictionaryArgs and passed to ArtmFilterDictionary
- removed batch_name from ImportBatchesArgs; the field is no longer needed, because batches are identified via their Batch.id identifier
- removed use_v06_api in MasterModelConfig
- removed the ModelConfig message
- removed the SynchronizeModelArgs, AddBatchArgs, InvokeIterationArgs and WaitIdleArgs messages; users should use the new APIs based on MasterModel
- removed the GetRegularizerStateArgs, RegularizerInternalState and MultiLanguagePhiInternalState messages
- removed model_name and model_name_cache in ThetaMatrix, GetThetaMatrixArgs and ProcessBatchesArgs; the code of the master component is simplified to handle only one theta matrix, so there is no longer any reason to identify the theta matrix with model_name
- removed the Stream message, the MasterComponentConfig.stream field, and all stream_name fields across several messages; train/test streaming functionality is fully removed, and users are expected to manage their train and test collections themselves (for example, as separate folders with batches)
- removed the use_sparse_bow field in several messages; the computation mode with dense matrices is no longer supported
- renamed item_count into num_items in ThetaSnippetScoreConfig
- added global enum ScoreType as a replacement for the Type enums from the ScoreConfig and ScoreData messages
- added global enum RegularizerType as a replacement for the Type enum from the RegularizerConfig message
- added global enum MatrixLayout as a replacement for the MatrixLayout enum from the GetThetaMatrixArgs and GetTopicModelArgs messages
- added global enum ThetaMatrixType as a replacement for the ThetaMatrixType enum from the ProcessBatchesArgs and TransformMasterModelArgs messages
- renamed enum Type into SmoothType in SmoothPtdwConfig to avoid conflicts in C# messages
- renamed enum Mode into SparseMode in SpecifiedSparsePhiConfig to avoid conflicts in C# messages
- renamed enum Format into CollectionFormat in CollectionParserConfig to avoid conflicts in C# messages
- renamed enum NameType into BatchNameType in CollectionParserConfig to avoid conflicts in C# messages
- renamed field transform_type into type in TransformConfig to avoid conflicts in C# messages
- removed message CopyRequestResultArgs; this is a breaking change; please check that:
  - all previous calls to ArtmCopyRequestResult are changed to ArtmCopyRequestedMessage
  - all previous calls to ArtmCopyRequestResultEx with request types GetThetaSecondPass and GetModelSecondPass are changed to ArtmCopyRequestedObject
  - all previous calls to ArtmCopyRequestResultEx with DefaultRequestType are changed to ArtmCopyRequestedMessage
  - all previous calls to
- removed the request_type field in GetTopicModelArgs; to request only topics and/or tokens, users should set GetTopicModelArgs.matrix_layout to MatrixLayout_Sparse and GetTopicModelArgs.eps = 1.001 (any number greater than 1.0)
- changed optional FloatArray into repeated float in the coherence field of TopTokensScore
- changed optional DoubleArray into repeated double in the kernel_size, kernel_purity, kernel_contrast and coherence fields of TopicKernelScore
- changed optional StringArray into repeated string in the topic_name field of TopicKernelScore
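The Item change listed above can be illustrated with plain Python stand-ins for the Field and Item messages. This sketch assumes that "merge" means concatenating each field's token_id and token_weight lists in field order, keeping duplicate token ids as separate entries; the library's actual behavior may differ, and merge_fields is a hypothetical helper rather than a BigARTM function.

```python
from dataclasses import dataclass, field as dc_field
from typing import List


@dataclass
class Field:
    # Per-field token lists, as used before v0.8.0.
    token_id: List[int] = dc_field(default_factory=list)
    token_weight: List[float] = dc_field(default_factory=list)


@dataclass
class Item:
    id: int = 0
    field: List[Field] = dc_field(default_factory=list)  # obsolete in BigARTM v0.8.0
    title: str = ""
    token_id: List[int] = dc_field(default_factory=list)
    token_weight: List[float] = dc_field(default_factory=list)


def merge_fields(item: Item) -> None:
    """Store the concatenated tokens of all fields into item.token_id / item.token_weight.

    Assumption: plain concatenation in field order; duplicates are not summed.
    """
    merged_ids: List[int] = []
    merged_weights: List[float] = []
    for f in item.field:
        if len(f.token_id) != len(f.token_weight):
            raise ValueError("token_id and token_weight must have equal length")
        merged_ids.extend(f.token_id)
        merged_weights.extend(f.token_weight)
    item.token_id = merged_ids
    item.token_weight = merged_weights


item = Item(id=1, field=[Field([0, 2], [1.0, 3.0]), Field([5], [2.0])])
merge_fields(item)
print(item.token_id, item.token_weight)  # [0, 2, 5] [1.0, 3.0, 2.0]
```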
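When porting code written against pre-v0.8.0 messages, a lookup table of the renamed fields can help. The sketch below collects the field renames listed above for the messages that are named explicitly (topics_count was also renamed in other messages, which are not included) and applies them to a plain dict of field values. migrate_fields and V080_FIELD_RENAMES are hypothetical and do not operate on real protobuf objects.

```python
from typing import Any, Dict, Tuple

# Field renames from the v0.8.0 list, keyed by (message name, old field name).
# Partial by design: topics_count was renamed in more messages than these two.
V080_FIELD_RENAMES: Dict[Tuple[str, str], str] = {
    ("TopicModel", "topics_count"): "num_topics",
    ("ThetaMatrix", "topics_count"): "num_topics",
    ("ProcessBatchesArgs", "inner_iterations_count"): "num_document_passes",
    ("FitOfflineMasterModelArgs", "passes"): "num_collection_passes",
    ("MasterModelConfig", "threads"): "num_processors",
    ("TopicModel", "topic_index"): "topic_indices",
    ("ThetaMatrix", "topic_index"): "topic_indices",
    ("ThetaSnippetScoreConfig", "item_count"): "num_items",
    ("TransformConfig", "transform_type"): "type",
}


def migrate_fields(message_name: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `fields` with the v0.8.0 field renames applied."""
    migrated: Dict[str, Any] = {}
    for name, value in fields.items():
        new_name = V080_FIELD_RENAMES.get((message_name, name), name)
        if new_name in migrated:
            # Both the old and the new spelling of the same field were present.
            raise ValueError(f"{message_name}: field {new_name!r} set twice")
        migrated[new_name] = value
    return migrated


print(migrate_fields("MasterModelConfig", {"threads": 4}))  # {'num_processors': 4}
```

Fields that were removed rather than renamed (for example topics_count in InitializeModelArgs) are deliberately absent from the table, because they have no one-to-one replacement.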
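The eps = 1.001 setting works because Phi entries are probabilities no greater than 1.0, so a sparse layout that drops small values keeps no values at all, while the token and topic lists are still returned. The sketch below models this with a dense Phi matrix (rows are tokens, columns are topics). It assumes that values below eps are the ones dropped, which is an assumption about the exact comparison; the number of returned entries corresponds to what the num_values fields describe. to_sparse is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[int, int, float]  # (token index, topic index, value)


def to_sparse(phi: Sequence[Sequence[float]], tokens: Sequence[str],
              topics: Sequence[str], eps: float) -> Tuple[List[str], List[str], List[Entry]]:
    """Model of a sparse Phi export: names are always returned, small values are dropped."""
    entries: List[Entry] = []
    for i, row in enumerate(phi):
        for j, value in enumerate(row):
            if value >= eps:  # assumption: values below eps are omitted
                entries.append((i, j, value))
    return list(tokens), list(topics), entries


phi = [[0.7, 0.1],   # token "cat"
       [0.3, 0.9]]   # token "dog"
names, topic_names, entries = to_sparse(phi, ["cat", "dog"], ["topic_0", "topic_1"], eps=1.001)
print(names, topic_names, len(entries))  # ['cat', 'dog'] ['topic_0', 'topic_1'] 0
```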
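The CopyRequestResultArgs migration rules listed above amount to a small lookup, which can be handy in a migration script. The function below encodes exactly the three documented rules; replacement_call is hypothetical, and any other combination raises an error instead of guessing a replacement.

```python
from typing import Optional


def replacement_call(old_call: str, request_type: Optional[str] = None) -> str:
    """Return the v0.8.0 replacement for a removed ArtmCopyRequestResult* call.

    request_type is only meaningful for ArtmCopyRequestResultEx.
    """
    if old_call == "ArtmCopyRequestResult":
        return "ArtmCopyRequestedMessage"
    if old_call == "ArtmCopyRequestResultEx":
        if request_type in ("GetThetaSecondPass", "GetModelSecondPass"):
            return "ArtmCopyRequestedObject"
        if request_type == "DefaultRequestType":
            return "ArtmCopyRequestedMessage"
    raise ValueError(f"no documented replacement for {old_call} with {request_type}")


print(replacement_call("ArtmCopyRequestResultEx", "GetModelSecondPass"))  # ArtmCopyRequestedObject
```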