Scores

This page describes the *Score classes.

See the detailed description of scores to understand what each one measures.

class artm.SparsityPhiScore(name=None, class_id=None, topic_names=None, model_name=None, eps=None)
__init__(name=None, class_id=None, topic_names=None, model_name=None, eps=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_id (str) – class_id to score
  • topic_names (list of str) – names of the topics to score; all topics are scored if not specified
  • model_name – phi-like matrix to be scored (typically ‘pwt’ or ‘nwt’); ‘pwt’ if not specified
  • eps (float) – tolerance constant; any value < eps is treated as zero
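To make the role of eps concrete, the following is a minimal sketch of a sparsity measure: the fraction of matrix entries that fall below the tolerance. It is an illustration under stated assumptions, not BigARTM's implementation; the function name sparsity and the toy matrix are hypothetical.

```python
def sparsity(matrix, eps):
    """Fraction of entries treated as zero, i.e. entries with value < eps."""
    values = [v for row in matrix for v in row]
    if not values:
        return 0.0
    zeros = sum(1 for v in values if v < eps)
    return zeros / len(values)


# Hypothetical 3-token x 2-topic phi-like matrix (rows: tokens, columns: topics).
phi = [
    [0.7, 0.0],
    [0.3, 1e-40],
    [0.0, 1.0],
]

# Three of the six entries are below eps, so half of the matrix counts as zero.
print(sparsity(phi, eps=1e-37))
```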
class artm.ItemsProcessedScore(name=None)
__init__(name=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
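As a rough usage sketch, score objects can be constructed with the signatures documented on this page and collected in a list. The helper name build_scores and the score names are hypothetical, and the import is placed inside the function so the snippet can be loaded even where the artm package is not installed.

```python
def build_scores():
    """Return a list of score objects; calling this requires the artm package."""
    import artm  # imported lazily so that defining the helper does not need BigARTM

    return [
        artm.SparsityPhiScore(name='sparsity_phi', eps=1e-6),
        artm.ItemsProcessedScore(name='items_processed'),
        artm.PerplexityScore(name='perplexity'),
    ]
```

Such a list would typically be handed to the model when it is created, for example through a scores argument of artm.ARTM; that argument is not described on this page and is mentioned here as an assumption.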
class artm.PerplexityScore(name=None, class_ids=None, topic_names=None, dictionary=None, use_unigram_document_model=None)
__init__(name=None, class_ids=None, topic_names=None, dictionary=None, use_unigram_document_model=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_ids (list of str) – class_ids to score; if not specified, tokens of all class_ids are used
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary; no dictionary is used if not specified
  • use_unigram_document_model (bool) – use the unigram document/collection model if the token’s counter == 0
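For orientation, perplexity is commonly defined as the exponent of the negative average log-likelihood per token, with p(w|d) obtained by summing p(w|t) p(t|d) over topics. The sketch below follows that textbook definition. It is not BigARTM's code: it ignores the dictionary and use_unigram_document_model options, and it raises an error if a token that occurs in a document has zero probability. All names are hypothetical.

```python
import math


def perplexity(counts, phi, theta):
    """Textbook perplexity of a topic model.

    counts[d][w] is the count of token w in document d,
    phi[w][t] is p(w|t), and theta[t][d] is p(t|d).
    """
    log_likelihood = 0.0
    total = 0
    for d, doc in enumerate(counts):
        for w, n_dw in enumerate(doc):
            if n_dw == 0:
                continue
            # p(w|d) = sum over topics of p(w|t) * p(t|d)
            p_wd = sum(phi[w][t] * theta[t][d] for t in range(len(theta)))
            log_likelihood += n_dw * math.log(p_wd)
            total += n_dw
    return math.exp(-log_likelihood / total)


# One document, two tokens, one topic that spreads its mass evenly.
value = perplexity(counts=[[1, 1]], phi=[[0.5], [0.5]], theta=[[1.0]])
print(value)
```

A model that is uniform over two tokens has a perplexity of 2 (up to floating-point rounding), which matches the intuition of "choosing between two equally likely tokens".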
class artm.SparsityThetaScore(name=None, topic_names=None, eps=None)
__init__(name=None, topic_names=None, eps=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • topic_names (list of str) – names of the topics to score; all topics are scored if not specified
  • eps (float) – tolerance constant; any value < eps is treated as zero
class artm.ThetaSnippetScore(name=None, item_ids=None, num_items=None)
__init__(name=None, item_ids=None, num_items=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • item_ids (list of int) – ids of the items to show, default=None
  • num_items (int) – number of theta vectors to show, counted from the beginning (ignored if item_ids is given)
class artm.TopicKernelScore(name=None, class_id=None, topic_names=None, eps=None, dictionary=None, probability_mass_threshold=None)
__init__(name=None, class_id=None, topic_names=None, eps=None, dictionary=None, probability_mass_threshold=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_id (str) – class_id to score
  • topic_names (list of str) – names of the topics to score; all topics are scored if not specified
  • probability_mass_threshold (float) – threshold on p(t|w) for a token to be included in the topic kernel; should be in (0, 1)
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary; no dictionary is used if not specified
  • eps (float) – tolerance constant; any value < eps is treated as zero
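The kernel membership rule for probability_mass_threshold can be sketched as follows: p(t|w) is derived from p(w|t) and p(t) by Bayes' rule, and a token joins the kernel of every topic whose p(t|w) exceeds the threshold. This is a minimal illustration, not BigARTM's implementation; the strict comparison, the function name, and the toy data are assumptions.

```python
def topic_kernels(phi, p_t, threshold):
    """Return, for each topic, the indices of tokens with p(t|w) > threshold.

    phi[w][t] is p(w|t) and p_t[t] is the topic prior p(t).
    """
    n_topics = len(p_t)
    kernels = [[] for _ in range(n_topics)]
    for w, row in enumerate(phi):
        # p(w) = sum over topics of p(w|t) * p(t)
        p_w = sum(row[t] * p_t[t] for t in range(n_topics))
        if p_w == 0:
            continue  # token never generated by any topic
        for t in range(n_topics):
            if row[t] * p_t[t] / p_w > threshold:
                kernels[t].append(w)
    return kernels


# Hypothetical 3-token x 2-topic matrix with equally likely topics.
phi = [
    [0.8, 0.1],
    [0.1, 0.8],
    [0.1, 0.1],
]
kernels = topic_kernels(phi, p_t=[0.5, 0.5], threshold=0.6)
print(kernels)
```

Token 2 is equally likely under both topics, so its p(t|w) is 0.5 for each and it enters neither kernel.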
class artm.TopTokensScore(name=None, class_id=None, topic_names=None, num_tokens=None, dictionary=None)
__init__(name=None, class_id=None, topic_names=None, num_tokens=None, dictionary=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_id (str) – class_id to score
  • topic_names (list of str) – names of the topics to score; all topics are scored if not specified
  • num_tokens (int) – number of highest-probability tokens to return for each topic
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary; no dictionary is used if not specified
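The meaning of num_tokens can be illustrated with a short sketch that picks, for each topic, the tokens with the largest p(w|t). The function name, vocabulary, and matrix are hypothetical, and the snippet only mirrors the idea of the score rather than the library's code.

```python
def top_tokens(phi, vocab, num_tokens):
    """For each topic, return the num_tokens tokens with the highest p(w|t).

    phi[w][t] is p(w|t); vocab[w] is the text of token w.
    """
    n_topics = len(phi[0])
    result = []
    for t in range(n_topics):
        # Rank token indices by their probability in topic t, highest first.
        ranked = sorted(range(len(vocab)), key=lambda w: phi[w][t], reverse=True)
        result.append([vocab[w] for w in ranked[:num_tokens]])
    return result


vocab = ["cat", "dog", "fish"]
phi = [
    [0.6, 0.1],
    [0.3, 0.2],
    [0.1, 0.7],
]
tops = top_tokens(phi, vocab, num_tokens=2)
print(tops)
```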
class artm.TopicMassPhiScore(name=None, class_id=None, topic_names=None, model_name=None, eps=None)
__init__(name=None, class_id=None, topic_names=None, model_name=None, eps=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_id (str) – class_id to score
  • topic_names (list of str) – names of the topics to score; all topics are scored if not specified
  • model_name – phi-like matrix to be scored (typically ‘pwt’ or ‘nwt’); ‘pwt’ if not specified
  • eps (float) – tolerance constant; any value < eps is treated as zero
class artm.BackgroundTokensRatioScore(name=None, class_id=None, delta_threshold=None, save_tokens=None, direct_kl=None)
__init__(name=None, class_id=None, delta_threshold=None, save_tokens=None, direct_kl=None)
Parameters:
  • name (str) – identifier of the score; auto-generated if not specified
  • class_id (str) – class_id to score
  • delta_threshold (float) – threshold on the KL divergence between p(t|w) and p(t) for a token to be counted as background; should be non-negative
  • save_tokens (bool) – whether to save the background tokens; they are saved if not specified
  • direct_kl (bool) – if true, use KL(p(t) || p(t|w)); otherwise use KL(p(t|w) || p(t)); true if not specified
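The direct_kl flag matters because KL divergence is not symmetric. The sketch below computes both directions for one token using the standard definition of KL divergence. It deliberately stops short of applying delta_threshold, since the exact comparison used by the library is not stated on this page. The function name and the distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p_t = [0.5, 0.5]          # hypothetical topic prior p(t)
p_t_given_w = [0.9, 0.1]  # hypothetical p(t|w) for one token

direct = kl_divergence(p_t, p_t_given_w)   # KL(p(t) || p(t|w))
reverse = kl_divergence(p_t_given_w, p_t)  # KL(p(t|w) || p(t))
print(direct, reverse)
```

For these inputs the two directions give different positive values, so the same delta_threshold can classify a token differently depending on direct_kl.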