CountVectorizer does not lowercase() entries in vocabulary when lowercase is set to True

Describe the bug

The default value of lowercase in CountVectorizer is True, so the content of all documents is lowercased by default. However, the entries of a user-supplied vocabulary are not lowercased. If the vocabulary contains uppercase characters, those entries never match the lowercased document content.
I think CountVectorizer should either

lowercase the vocabulary as well when lowercase is True, or
not allow uppercase characters in the vocabulary when lowercase is True
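
Until one of these is implemented, the mismatch can be avoided on the caller's side. The following is a minimal sketch, not a proposed patch: the vocabulary and document are made up for illustration, and the terms are deliberately longer than one character because the default token_pattern ignores single-character tokens.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical vocabulary and document, chosen only for illustration.
# Every term has two or more characters, since the default token_pattern
# drops single-character tokens.
vocabulary = ["Apple", "Banana", "Cherry"]
documents = ["Apple Banana Cherry"]

# Workaround 1: lowercase the vocabulary so it matches the lowercased documents.
lowered = CountVectorizer(vocabulary=[term.lower() for term in vocabulary])
counts_lowered = lowered.fit_transform(documents).toarray()

# Workaround 2: keep the vocabulary as-is and disable lowercasing of documents.
case_sensitive = CountVectorizer(vocabulary=vocabulary, lowercase=False)
counts_case_sensitive = case_sensitive.fit_transform(documents).toarray()

# Behaviour described in this report: a mixed-case vocabulary combined with
# the default lowercase=True matches nothing.
unmatched = CountVectorizer(vocabulary=vocabulary)
counts_unmatched = unmatched.fit_transform(documents).toarray()
```

Workaround 1 makes matching case-insensitive, while workaround 2 makes it case-sensitive, so which one fits depends on whether "Apple" and "apple" should count as the same term.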

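Steps/Code to Reproduce

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def test_count_vectorizer():
    voc = ["A", "B", "C"]
    documents = ["A B C"]

    count_model = CountVectorizer(
        ngram_range=(1, 1),
        vocabulary=voc,
        # Also match single-character tokens. The default token_pattern only
        # keeps tokens of two or more characters, which would hide the
        # lowercase mismatch being reported here.
        token_pattern=r"(?u)\b\w+\b",
    )
    x = count_model.fit_transform(documents).toarray()
    # Fails: the documents are lowercased to "a b c", but the vocabulary is not.
    assert np.array_equal(x, [[1, 1, 1]])

Expected Results

x is [[1, 1, 1]]: each vocabulary entry is matched once.

Actual Results

x is [[0, 0, 0]]: no vocabulary entry is matched, so the assertion fails.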
Versions

   setuptools: 51.0.0
      sklearn: 0.23.2
        numpy: 1.19.4
        scipy: 1.5.4
       Cython: 0.29.21
       pandas: 1.1.5
   matplotlib: 3.3.3
       joblib: 1.0.0
threadpoolctl: 2.1.0
Built with OpenMP: True
