english language detect. snippet · beernlinux/python_reference@3f81ead · GitHub

Commit 3f81ead

english language detect. snippet
1 parent 1499636 commit 3f81ead

1 file changed

+57
-1
lines changed

python_patterns/patterns.ipynb

Lines changed: 57 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
{
22
"metadata": {
33
"name": "",
4-
"signature": "sha256:0c9d8c8b65b0eec5bb7c2a2790f08a1e49daf27dac2c9dcfe8d85ce958046a2c"
4+
"signature": "sha256:714a46a359c5b1c3e7e7bd4d19d73221f9def5bcb806840be82541070041d29e"
55
},
66
"nbformat": 3,
77
"nbformat_minor": 0,
@@ -57,6 +57,7 @@
5757
"- [Differences between 2 files](#Differences-between-2-files)\n",
5858
"- [Differences between successive elements in a list](#Differences-between-successive-elements-in-a-list)\n",
5959
"- [Doctest example](#Doctest-example)\n",
60+
"- [English language detection](#English-language-detection)\n",
6061
"- [File browsing basics](#File-browsing-basics)\n",
6162
"- [File reading basics](#File-reading-basics)\n",
6263
"- [Indices of min and max elements from a list](#Indices-of-min-and-max-elements-from-a-list)\n",
@@ -595,6 +596,61 @@
595596
"<br>"
596597
]
597598
},
599+
{
600+
"cell_type": "heading",
601+
"level": 2,
602+
"metadata": {},
603+
"source": [
604+
"English language detection"
605+
]
606+
},
607+
{
608+
"cell_type": "markdown",
609+
"metadata": {},
610+
"source": [
611+
"[back to top](#Table-of-Contents)"
612+
]
613+
},
614+
{
615+
"cell_type": "code",
616+
"collapsed": false,
617+
"input": [
618+
"import nltk\n",
619+
"\n",
620+
"def eng_ratio(text):\n",
621+
" ''' Returns the fraction of unique alphabetic words in a text that are not in the NLTK English word list '''\n",
622+
"\n",
623+
" english_vocab = set(w.lower() for w in nltk.corpus.words.words()) \n",
624+
" text_vocab = set(w.lower() for w in text.split() if w.lower().isalpha()) \n",
625+
" unusual = text_vocab.difference(english_vocab)\n",
626+
" diff = len(unusual)/len(text_vocab)\n",
627+
" return diff\n",
628+
" \n",
629+
"text = 'This is a test fahrrad'\n",
630+
"\n",
631+
"print(eng_ratio(text))"
632+
],
633+
"language": "python",
634+
"metadata": {},
635+
"outputs": [
636+
{
637+
"output_type": "stream",
638+
"stream": "stdout",
639+
"text": [
640+
"0.2\n"
641+
]
642+
}
643+
],
644+
"prompt_number": 1
645+
},
646+
{
647+
"cell_type": "markdown",
648+
"metadata": {},
649+
"source": [
650+
"<br>\n",
651+
"<br>"
652+
]
653+
},
598654
{
599655
"cell_type": "heading",
600656
"level": 2,

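Note on the added `eng_ratio` cell: as committed, it rebuilds the vocabulary set on every call, raises `ZeroDivisionError` when the text contains no alphabetic words, and silently drops tokens with attached punctuation such as `test.`. The following is a minimal sketch of one way to handle those cases, not the author's implementation. The name `non_english_ratio`, the `english_vocab` parameter, the `demo_vocab` stand-in set, and the choice to return `0.0` for empty input are all assumptions made for illustration.

```python
import string


def non_english_ratio(text, english_vocab):
    """Return the fraction of unique alphabetic words in `text`
    that do not appear in `english_vocab` (a set of lowercase words)."""
    # Strip surrounding punctuation so that "test." is counted as "test".
    words = (w.strip(string.punctuation).lower() for w in text.split())
    text_vocab = {w for w in words if w.isalpha()}
    if not text_vocab:
        # Assumption: a text with no alphabetic words has no unusual words.
        return 0.0
    unusual = text_vocab - english_vocab
    return len(unusual) / len(text_vocab)


# Hypothetical stand-in vocabulary so the sketch runs without NLTK data.
# The notebook builds the real one once with:
#   set(w.lower() for w in nltk.corpus.words.words())
demo_vocab = {"this", "is", "a", "test"}

print(non_english_ratio("This is a test fahrrad", demo_vocab))
```

Passing the vocabulary in as an argument lets the caller build the set once and reuse it across many texts. Under Python 3 the call above gives the same `0.2` as the notebook output, since one of the five unique words is outside the vocabulary.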