[RFC] [Validator] adding a new WordCount constraint validator · Issue #6813 · symfony/symfony · GitHub

Closed
hhamon opened this issue Jan 20, 2013 · 10 comments · Fixed by #57716

Comments

@hhamon
Contributor
hhamon commented Jan 20, 2013

Hi guys,

I was thinking of adding a new constraint validator similar to the Length constraint but for counting words in a string. Sometimes, when writing a text or a description, we're asked to write at least x words or at most x words.

Do you think it would be needed in the validator component?

@fabpot
Member
fabpot commented Jan 20, 2013

-1 for a lot of reasons but the main two are:

  • Seems too specific
  • What is a word?

@hhamon
Contributor Author
hhamon commented Jan 20, 2013

I was expecting the "too specific" argument. That's why I also opened the PR: to hear what the community thinks.

Defining what a word is, is tricky I guess. I was thinking of using the native str_word_count() function. The documentation of the function says:

'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.

I guess it only works with Latin alphabets. I don't know whether it works with languages like Chinese or Russian.
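To make the quoted definition concrete, here is a minimal sketch of how str_word_count() treats apostrophes and hyphens in plain English text. The sample sentence is made up for illustration. The result for non-Latin input depends on the locale and encoding, so no expected value is given for that call.

```php
<?php

// Illustrative input only: an apostrophe and a hyphen inside words.
$english = "Don't count stop-words twice";

// Apostrophes and hyphens inside a word do not split it.
$englishCount = str_word_count($english);

// Format 1 returns the words themselves, which is handy for inspection.
$englishWords = str_word_count($english, 1);

var_dump($englishCount); // int(4)
var_dump($englishWords);

// str_word_count() works on bytes and relies on the current locale to decide
// what is alphabetic, so the result for UTF-8 Cyrillic text is not portable.
var_dump(str_word_count('Привет мир'));
```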

@dlsniper
Contributor

This is going to be off-topic, sorry.
I think this has been asked a few times but why don't we have a ValidatorExtras / ValidatorExtensions / something that's maintained on a separate release cycle and can be used to add validators like this?

They should be compatible with the currently active versions of Symfony2, and they would make it easier to create various validators and share them with other users without touching the main repository.

If I've missed the reason for it, sorry; just tell me to look for it and I will.

@javiereguiluz
Member

It's true that the proposed constraint seems too specific, but in my opinion Symfony already includes some very specific (and perhaps not used much) constraints such as IpValidator.

About the "What is a word?" consideration: this problem is of course impossible to solve in general, but there are some good-enough solutions. Besides the str_word_count() function suggested by @hhamon, there are some well-known lists of word-separator characters, such as the one used by Sublime Text:

// Characters that are considered to separate words
"word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",

Last but not least, keep in mind that the word constraint is absolutely mandatory for journalists, professional bloggers and other technical writers. These people use CMS tools and Symfony is the technology selected by lots of CMS-related projects (Drupal, Symfony CMF, eZ Publish, Fork CMS, etc.)
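As a rough illustration of the separator-list approach mentioned above, the sketch below splits a string on whitespace plus the quoted Sublime Text separator characters and counts what remains. The helper name countWordsBySeparators() is hypothetical and is not part of Symfony or Sublime Text. Because the apostrophe and the hyphen are in the list, "It's" and "well-known" each count as two words.

```php
<?php

/**
 * Counts words by splitting on whitespace and a fixed separator list.
 * Hypothetical helper for this sketch, not an existing API.
 */
function countWordsBySeparators(string $text): int
{
    // The separator characters quoted above, as a PHP single-quoted string.
    $separators = './\\()"\'-:,.;<>~!@#$%^&*|+=[]{}`~?';

    // preg_quote() escapes regex metacharacters so the list can be placed
    // inside a character class.
    $pattern = '/[\s' . preg_quote($separators, '/') . ']+/u';

    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    return $parts === false ? 0 : count($parts);
}

var_dump(countWordsBySeparators('Hello, world! Nice day.')); // int(4)
var_dump(countWordsBySeparators("It's a well-known issue")); // int(6)
```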

@webda2l
webda2l commented Jan 30, 2013

As @dlsniper mentioned, a policy about *Extensions repositories would be welcome.

There are currently:
https://github.com/beberlei/DoctrineExtensions/tree/master/lib/DoctrineExtensions
https://github.com/fabpot/Twig-extensions

A more or less official FormExtensions repository would be great, and some publicity for these extensions on symfony.com or elsewhere would be even better.

@fabpot
Member
fabpot commented Apr 28, 2014

Closing as I fear we won't agree on what a word is.

@fabpot fabpot closed this as completed Apr 28, 2014
@damienalexandre
Contributor

ICU / Intl provides a BreakIterator to detect words, sentences, etc.: https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary

So we don't have to agree on what a word is, that's not for Symfony to decide. We can leverage the BreakIterator like this:

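<?php

/**
 * Splits $content into word-level parts using ICU's word break rules.
 *
 * @return list<string>
 */
function getWords(string $content): array
{
    $bi = \IntlBreakIterator::createWordInstance();
    $bi->setText($content);

    // Each part is the text between two consecutive word boundaries.
    $words = iterator_to_array($bi->getPartsIterator());
    // Whitespace-only parts become empty strings once trimmed.
    $words = array_map('trim', $words);

    // Drop the empty parts; compare strictly so that a part like "0" is kept.
    return array_values(array_filter($words, static fn (string $word): bool => $word !== ''));
}

var_dump(getWords('次のバスは遠いです')); // Expected: 6 parts
var_dump(getWords('Ton père je suis !')); // Expected: 5 parts (the "!" is its own part)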

https://3v4l.org/lv95t#v8.3.8

So I would like to suggest we reopen this issue to explore this path, maybe even add a method in the String component?

Thanks!
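The getWords() example above also returns punctuation such as "!" as a part. One possible refinement, sketched here as an assumption and not as what Symfony ended up implementing, is to check the rule status of each boundary and skip segments that ICU tags as spaces or punctuation. The helper name countWordsWithIcu() is hypothetical, and the code requires the intl extension.

```php
<?php

/**
 * Counts word-like segments, skipping whitespace and punctuation.
 * Hypothetical helper for this sketch; requires ext-intl.
 */
function countWordsWithIcu(string $text, ?string $locale = null): int
{
    $bi = \IntlBreakIterator::createWordInstance($locale);

    // getRuleStatus() is defined on IntlRuleBasedBreakIterator, which is
    // what the word instance is expected to be at runtime.
    if (!$bi instanceof \IntlRuleBasedBreakIterator) {
        throw new \RuntimeException('A rule-based break iterator is required.');
    }

    $bi->setText($text);

    $count = 0;
    $bi->first();
    while ($bi->next() !== \IntlBreakIterator::DONE) {
        // Statuses below WORD_NONE_LIMIT describe spaces and punctuation;
        // anything at or above it is a number, letter, kana, or ideograph run.
        if ($bi->getRuleStatus() >= \IntlBreakIterator::WORD_NONE_LIMIT) {
            ++$count;
        }
    }

    return $count;
}

var_dump(countWordsWithIcu('Ton père je suis !', 'fr')); // int(4)
```

With this filter the French sample yields 4 instead of 5, because the trailing "!" is no longer counted.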

@fabpot fabpot reopened this Jun 18, 2024
@manuelsucco

Hi,

I'm new to contributing to Symfony, but I work with it daily. We use word counts to limit the size of paper abstracts and the like. I don't think this validator is too specific, and with the IntlBreakIterator class it seems grounded enough. There could even be a SentenceCount validator without much more work.

I offer to help.
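For the SentenceCount idea mentioned above, a minimal sketch under the same assumptions (intl extension available) could count the segments produced by ICU's sentence break iterator. The helper name countSentences() is hypothetical and does not refer to an existing Symfony constraint.

```php
<?php

/**
 * Counts sentences by walking ICU sentence boundaries.
 * Hypothetical helper for this sketch; requires ext-intl.
 */
function countSentences(string $text, ?string $locale = null): int
{
    $bi = \IntlBreakIterator::createSentenceInstance($locale);
    $bi->setText($text);

    $count = 0;
    $bi->first();
    // Each successful next() moves past exactly one sentence.
    while ($bi->next() !== \IntlBreakIterator::DONE) {
        ++$count;
    }

    return $count;
}

var_dump(countSentences('Hello world. How are you? Fine!', 'en')); // int(3)
```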

@kriskoch

Yeah, I don't think 'too specific' can be a valid argument anymore when we have Bic, ISBN, etc.

@xabbuh xabbuh closed this as completed in e63495e Jul 17, 2024
@xabbuh
Member
xabbuh commented Jul 17, 2024

Implemented in #57716
