diff --git a/doc/modules/naive_bayes.rst b/doc/modules/naive_bayes.rst
index b0b32c28e455a..05ca928dfae0b 100644
--- a/doc/modules/naive_bayes.rst
+++ b/doc/modules/naive_bayes.rst
@@ -69,11 +69,15 @@ On the flip side, although naive Bayes is known as a decent classifier,
 it is known to be a bad estimator, so the probability outputs from
 ``predict_proba`` are not to be taken too seriously.
 
-.. topic:: References:
+|details-start|
+**References**
+|details-split|
 
- * H. Zhang (2004). `The optimality of Naive Bayes.
-   `_
-   Proc. FLAIRS.
+* H. Zhang (2004). `The optimality of Naive Bayes.
+  `_
+  Proc. FLAIRS.
+
+|details-end|
 
 .. _gaussian_naive_bayes:
 
@@ -147,8 +151,13 @@ that is particularly suited for imbalanced data sets. Specifically, CNB uses
 statistics from the *complement* of each class to compute the model's weights.
 The inventors of CNB show empirically that the parameter estimates for CNB are
 more stable than those for MNB. Further, CNB regularly outperforms MNB (often
-by a considerable margin) on text classification tasks. The procedure for
-calculating the weights is as follows:
+by a considerable margin) on text classification tasks.
+
+|details-start|
+**Weights calculation**
+|details-split|
+
+The procedure for calculating the weights is as follows:
 
 .. math::
 
@@ -173,12 +182,18 @@ classification rule is:
 i.e., a document is assigned to the class that is the *poorest* complement
 match.
 
-.. topic:: References:
+|details-end|
+
+|details-start|
+**References**
+|details-split|
 
- * Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003).
-   `Tackling the poor assumptions of naive bayes text classifiers.
-   `_
-   In ICML (Vol. 3, pp. 616-623).
+* Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003).
+  `Tackling the poor assumptions of naive bayes text classifiers.
+  `_
+  In ICML (Vol. 3, pp. 616-623).
+
+|details-end|
 
 .. _bernoulli_naive_bayes:
 
@@ -190,7 +205,7 @@ algorithms for data that is distributed according to multivariate Bernoulli
 distributions; i.e., there may be multiple features but each one is assumed
 to be a binary-valued (Bernoulli, boolean) variable.
 Therefore, this class requires samples to be represented as binary-valued
-feature vectors; if handed any other kind of data, a ``BernoulliNB`` instance
+feature vectors; if handed any other kind of data, a :class:`BernoulliNB` instance
 may binarize its input (depending on the ``binarize`` parameter).
 
 The decision rule for Bernoulli naive Bayes is based on
@@ -205,24 +220,28 @@ that is an indicator for class :math:`y`, where the multinomial variant
 would simply ignore a non-occurring feature.
 
 In the case of text classification, word occurrence vectors (rather than word
-count vectors) may be used to train and use this classifier. ``BernoulliNB``
+count vectors) may be used to train and use this classifier. :class:`BernoulliNB`
 might perform better on some datasets, especially those with shorter documents.
 It is advisable to evaluate both models, if time permits.
 
-.. topic:: References:
+|details-start|
+**References**
+|details-split|
+
+* C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to
+  Information Retrieval. Cambridge University Press, pp. 234-265.
 
- * C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to
-   Information Retrieval. Cambridge University Press, pp. 234-265.
+* A. McCallum and K. Nigam (1998).
+  `A comparison of event models for Naive Bayes text classification.
+  `_
+  Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.
 
- * A. McCallum and K. Nigam (1998).
-   `A comparison of event models for Naive Bayes text classification.
-   `_
-   Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.
+* V. Metsis, I. Androutsopoulos and G. Paliouras (2006).
+  `Spam filtering with Naive Bayes -- Which Naive Bayes?
+  `_
+  3rd Conf. on Email and Anti-Spam (CEAS).
 
- * V. Metsis, I. Androutsopoulos and G. Paliouras (2006).
-   `Spam filtering with Naive Bayes -- Which Naive Bayes?
-   `_
-   3rd Conf. on Email and Anti-Spam (CEAS).
+|details-end|
 
 .. _categorical_naive_bayes:
 
@@ -239,6 +258,10 @@ For each feature :math:`i` in the training set :math:`X`,
 of X conditioned on the class y. The index set of the samples is defined as
 :math:`J = \{ 1, \dots, m \}`, with :math:`m` as the number of samples.
 
+|details-start|
+**Probability calculation**
+|details-split|
+
 The probability of category :math:`t` in feature :math:`i` given class
 :math:`c` is estimated as:
 
@@ -253,6 +276,8 @@ to class :math:`c`, :math:`N_{c} = |\{ j \in J\mid y_j = c\}|` is the number
 of samples with class c, :math:`\alpha` is a smoothing parameter and
 :math:`n_i` is the number of available categories of feature :math:`i`.
 
+|details-end|
+
 :class:`CategoricalNB` assumes that the sample matrix :math:`X` is encoded (for
 instance with the help of :class:`~sklearn.preprocessing.OrdinalEncoder`) such
 that all categories for each feature :math:`i` are represented with numbers
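
The patched paragraphs mention two usage details that a short sketch may make concrete: :class:`BernoulliNB` thresholds non-binary input through its ``binarize`` parameter, and :class:`CategoricalNB` expects ordinally encoded features such as those produced by :class:`~sklearn.preprocessing.OrdinalEncoder`. The snippet below is a minimal illustration and not part of the patch; the toy data and variable names are invented for the example::

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB, CategoricalNB
    from sklearn.preprocessing import OrdinalEncoder

    # BernoulliNB: non-binary input is thresholded via the ``binarize`` parameter.
    X_counts = np.array([[3, 0, 1], [0, 2, 0], [1, 1, 4]])  # toy count data
    y = np.array([0, 1, 0])
    bnb = BernoulliNB(binarize=0.0)  # values > 0.0 are treated as 1, else 0
    bnb.fit(X_counts, y)

    # CategoricalNB: categories must be encoded as integers 0, ..., n_i - 1 per
    # feature, e.g. with OrdinalEncoder, as the CategoricalNB paragraph describes.
    X_raw = np.array([["red", "small"], ["blue", "large"], ["red", "large"]])
    enc = OrdinalEncoder()
    cnb = CategoricalNB(alpha=1.0)  # alpha is the smoothing parameter
    cnb.fit(enc.fit_transform(X_raw), y)

    print(bnb.predict(X_counts[:1]), cnb.predict(enc.transform(X_raw[:1])))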