CS 11.0 official release · ChatScript/ChatScript@1c12647 · GitHub

Commit 1c12647

CS 11.0 official release
1 parent 55446da commit 1c12647

File tree

166 files changed

+6843
-6182
lines changed


BINARIES/ChatScriptMssql.exe

-23.5 KB
Binary file not shown.

BINARIES/ChatScriptMysql.exe

-23 KB
Binary file not shown.

BINARIES/ChatScriptmongo.exe

-17.5 KB
Binary file not shown.

BINARIES/ChatScriptpg.exe

-17.5 KB
Binary file not shown.

BINARIES/LinuxChatScript64

8.43 KB
Binary file not shown.

BINARIES/chatscript.dll

-18 KB
Binary file not shown.

BINARIES/chatscript.exe

-18 KB
Binary file not shown.

BINARIES/chatscript.lib

0 Bytes
Binary file not shown.

HTMLDOCUMENTATION/ChatScript-Advanced-Pattern-Manual.html

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,17 @@
99
</head>
1010
<body>
1111
<h1 id="chatscript-advanced-pattern-manual">ChatScript Advanced Pattern Manual</h1>
12-
<p>copyright Bruce Wilcox, mailto:gowilcox@gmail.com <br> <br>Revision 10/18/2020 cs10.7</p>
12+
<p>copyright Bruce Wilcox, mailto:gowilcox@gmail.com <br> <br>Revision 1/1/2021 cs11.0</p>
1313
<h1 id="advanced-patterns">ADVANCED PATTERNS</h1>
14+
<h2 id="unlimited-wildcards">UNLIMITED WILDCARDS</h2>
15+
<p>When you use <code>*</code> you want everything matching until the next significant token, eg</p>
16+
<pre><code>u: (I love * tomorrow) matches all words between love and tomorrow.
17+
u: (I love * ) matches all words after love and up to the implied &gt; (end of sentence)</code></pre>
18+
<p>and similarly *~5 wants to match up to 5 words between here and the next pattern word. But this breaks down if the next token after the wildcard is not a word or the end. CS must resolve the gap involved on the next pattern token. For example you can't do this:</p>
19+
<pre><code>u: ( I love _* $var:=_0 )</code></pre>
20+
<p>For capturing the rest of sentence you can do this:</p>
21+
<pre><code>u: ( I love _* &gt; $var:=_0 )</code></pre>
22+
<p>But you can't intrude pattern assignments or function calls or whatever between actual words and concepts, a wildcard, and then more words and concepts.</p>
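The reason a wildcard needs a word (or the end of the sentence) after it can be sketched in a few lines of Python. This is only an illustration of the idea, not ChatScript's actual matcher: the function name `capture_wildcard` is hypothetical, matching is case-sensitive, only one `*` is supported, and a pattern that ends in `*` is treated as running to the implied end of sentence.

```python
def capture_wildcard(pattern, words):
    """Return the words swallowed by the single '*' in pattern, or None.

    pattern and words are lists of tokens. The wildcard's extent is decided
    by the literal tokens that follow it; with nothing after it, the gap
    runs to the end of the sentence.
    """
    star = pattern.index("*")
    before, after = pattern[:star], pattern[star + 1:]

    for start in range(len(words) - len(before) + 1):
        if words[start:start + len(before)] != before:
            continue
        gap_start = start + len(before)
        if not after:
            # Nothing follows the wildcard: it takes the rest of the sentence.
            return words[gap_start:]
        # Otherwise the gap ends where the following literal tokens begin.
        for gap_end in range(gap_start, len(words) - len(after) + 1):
            if words[gap_end:gap_end + len(after)] == after:
                return words[gap_start:gap_end]
    return None


middle = capture_wildcard(["I", "love", "*", "tomorrow"],
                          "I love going out tomorrow".split())
rest = capture_wildcard(["I", "love", "*"], "I love you".split())
```

In this sketch the gap is only known once the tokens after `*` have been located, which is why a non-word token such as an assignment cannot sit directly after the wildcard.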
1423
<h2 id="keyword-phrases">Keyword Phrases</h2>
1524
<p>You cannot make a concept with a member whose string includes starting or trailing blanks, like &quot; X &quot;. Such a word could never match as a pattern, since spaces are skipped over. But you can make it respond to idiomatic phrases and multiple words. Just put them in quotes, e.g.</p>
1625
<pre><code>concept: ~remove ( &quot;take away&quot; remove )</code></pre>
@@ -32,7 +41,7 @@ <h2 id="dictionary-keyword-sets">Dictionary Keyword sets</h2>
3241
<pre><code>concept: ~buildings [ shelter~1 living_accomodations~1 building~3 ]</code></pre>
3342
<p>The concept <code>~buildings</code> represents 760 general and specific building words found in the WordNet dictionary: any word which is a child of definition 1 of shelter, definition 1 of accommodations, or definition 3 of building in WordNet's ontology.</p>
3443
<p>How would you be able to figure out creating this? This is described under <code>:up</code> in Word Commands later.</p>
35-
<p><code>Building~3</code> and <code>building~3n</code> are equivalent.</p>
44+
<p><code>Building~3</code> and <code>building~3n</code> are equivalent. Note, however, that CS does not compile your named meaning into the script as is. You are naming a meaning, so CS will find the corresponding master meaning and use that instead (if different). This is because the inheritance hierarchy from below will only come thru the master meaning. For example: if you write prison-break~1, the system will compile break~1 because that is the master, and covers all other specific word meanings that are equivalent including breakout~1.</p>
3645
<p>The first is what you might say to refer to the 3rd meaning of building. Internally <code>building~3n</code> denotes the 3rd meaning and it's a <em>noun</em> meaning.</p>
3746
<p>You may see that in printouts from ChatScript. If you write <code>building~3n</code> yourself, the system will strip off the <code>n</code> marker as superfluous.</p>
3847
<p>Similarly you can invoke parts of speech classes on words. By default you get all of them. If you write:</p>
@@ -79,6 +88,7 @@ <h3 id="pattern-macros">Pattern macros</h3>
7988
<p>If you call a patternmacro with a string argument, like <em>&quot;scuba dive&quot;</em> above, the system will convert that to its internal single-token format just as it would have had it been part of a normal pattern. Quoted strings to output macros are treated differently and left in string form when passed.</p>
8089
<p>You can declare a patternmacro to accept a variable number of arguments. You define the macro with the maximum and then put &quot;variable&quot; before the argument list. All missing arguments will be set to <code>null</code> on the call.</p>
8190
<pre><code>patternmacro: ^myfn variable (^arg1 ^arg2 ^arg3 ^arg4)</code></pre>
91+
<p>Patterns process a token at a time. A token is characters with no white space (generally). But the system recognizes direct function calls from patterns and the arguments and parens surrounding them may have spaces. But you cannot do assignment statements from a function call. I.e., <code>$tmp:=^foo(a)</code> is not legal. You can get the effect you want with <code>$tmp:=^&quot; ^foo(a) &quot;</code> because the active string protects the function call.</p>
8292
<h3 id="dual-macros">Dual macros</h3>
8393
<p>You can also declare something dualmacro: which means it can be used in both pattern and output contexts.</p>
8494
<p>A patternmacro cannot be passed a factset name. These are not legal calls:</p>

HTMLDOCUMENTATION/ChatScript-Advanced-User-Manual.html

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
</head>
1010
<body>
1111
<h1 id="chatscript-advanced-users-manual">ChatScript Advanced User's Manual</h1>
12-
<p>Copyright Bruce Wilcox, gowilcox@gmail.com www.brilligunderstanding.com<br> <br>Revision 11/26/2020 cs10.8</p>
12+
<p>Copyright Bruce Wilcox, gowilcox@gmail.com www.brilligunderstanding.com<br> <br>Revision 1/1/2021 cs11.0</p>
1313
<ul>
1414
<li><a href="ChatScript-Advanced-User-Manual.html#review-overview-of-how-cs-works">Review</a></li>
1515
<li><a href="ChatScript-Advanced-User-Manual.html#advanced-tokenization">Advanced Tokenization</a></li>
@@ -149,6 +149,8 @@ <h4 id="call-by-reference">Call by reference</h4>
149149
<p>Of course, had you tried to do <code>^argument2 += 1</code> then that would be the illegal <code>1 += 1</code> and the assignment would fail.</p>
150150
<h1 id="advanced-tokenization">ADVANCED TOKENIZATION</h1>
151151
<p>The CS natural language workflow consists of taking the user's input text, splitting it into tokens and stopping each time at a perceived sentence boundary. It continues with the input after processing that &quot;sentence&quot;. That leaves two tricky bits: what is a token and what is a sentence boundary. The <code>$cs_token</code> variable gives you some control over how these work. The naive definition of a token is a sequence of letters terminating in a space or end of input. But there are exceptions: sentence punctuation (comma, period, colon, exclamation) is not part of a bigger token. The sentence punctuation notion has exceptions of its own, like the period within a floating point number or as part of an abbreviation or web address. And hyphens with more letters on the other side are generally not punctuation either. And normally we consider bracketing things like parens not part of a word (except in emoticons). So CS will normally break things apart as it believes they should be done. If you need to actually allow a token to have embedded punctuation in it, you can list the token in the LIVEDATA/SUBSTITUTES/abbreviations.txt file and the tokenizer will respect it.</p>
152+
<h1 id="continuation-lines">Continuation lines</h1>
153+
<p>When a line of file or live user input ends in ^, CS erases the ^ and joins the line with the next line read.</p>
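The joining behaviour can be pictured with a short Python sketch. It is not taken from the CS source: the function name `join_continuations` is hypothetical, and it assumes the ^ must be the very last character of the line (how CS treats trailing white space after the ^ is not stated here).

```python
def join_continuations(lines):
    """Join each line ending in '^' with the line that follows it."""
    joined = []
    pending = ""
    for line in lines:
        text = pending + line
        if text.endswith("^"):
            # Erase the caret and hold the text until the next line arrives.
            pending = text[:-1]
        else:
            joined.append(text)
            pending = ""
    if pending:
        # Input ended while a continuation was still open.
        joined.append(pending)
    return joined


result = join_continuations(["hello ^", "world", "second line"])
```

Here the first two input lines come back as one line, and the third is left untouched.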
152154
<h1 id="system-functions">System Functions</h1>
153155
<p>There are many system functions to perform specific tasks. These are enumerated in the <a href="ChatScript-System-Functions-Manual.html">ChatScript System Functions Manual</a> and the <a href="ChatScript-Fact-Manual.html">ChatScript Fact Manual</a>.</p>
154156
<h1 id="out-of-band-communication">Out of band Communication</h1>
@@ -314,7 +316,9 @@ <h2 id="dict-files">DICT files</h2>
314316
<p>The <code>facts0.txt</code> file contains hierarchy relationships in wordnet. You are unlikely to edit these.</p>
315317
<p>The <code>dict.bin</code> file is a compressed dictionary which is faster to read. If you edit the actual dictionary word files, then erase this file. It will regenerate anew when you run the system again, revised per your changes. The actual dictionary files themselves… you might add a word or alter the type data of a word. The type information is all in <code>dictionarySystem.h</code></p>
316318
<h2 id="livedata-files">LIVEDATA files</h2>
317-
<p>The substitutions files consistof pairs of data per line. The first is what to match. Individual words are separated by underscores, and you can request sentence boundaries <code>&lt;</code> and <code>&gt;</code> .</p>
319+
<p>These files are dynamically read per language.</p>
320+
<h3 id="substitutions">SUBSTITUTIONS</h3>
321+
<p>The SUBSTITUTES folder files consist of pairs of data per line. The first is what to match. Individual words are separated by underscores, and you can request sentence boundaries <code>&lt;</code> and <code>&gt;</code> .</p>
318322
<p>The output can be missing (delete the found phrase) or words separated by plus signs (substitute these words) or a <code>%word</code> which names a system flag to be set (and the input deleted). The output can also be prefixed with <code>![…]</code> where inside the brackets are a list of words separated by spaces that must not follow this immediately. If one does, the match fails. You can also use <code>&gt;</code> as a word, to mean that this is NOT at the end of the sentence. The files include:</p>
319323
<table>
320324
<colgroup>
@@ -375,6 +379,14 @@ <h2 id="livedata-files">LIVEDATA files</h2>
375379
</tbody>
376380
</table>
377381
<p>Processing done by various of these files can be suppressed by setting <code>$cs_token</code> differently. See Control over Input.</p>
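As a rough illustration of the substitution line format described above, the following Python sketch splits one line into its match part and its output part. It is a reading of the prose only, not the CS loader: the function name `parse_substitution`, the returned dictionary keys, and the sample lines are all hypothetical, and it assumes `<` and `>` appear at the very start and end of the match part.

```python
def parse_substitution(line):
    """Split one substitution line into match words and an action."""
    parts = line.split(None, 1)
    match = parts[0]
    output = parts[1].strip() if len(parts) > 1 else ""

    # Sentence boundary requests on the match side.
    at_start = match.startswith("<")
    at_end = match.endswith(">")
    words = [w for w in match.strip("<>").split("_") if w]

    # Optional ![...] prefix: words that must not follow the match.
    blocked = []
    if output.startswith("!["):
        closing = output.index("]")
        blocked = output[2:closing].split()
        output = output[closing + 1:].strip()

    if not output:
        action, value = "delete", None          # no output: drop the phrase
    elif output.startswith("%"):
        action, value = "set_flag", output[1:]  # %word names a system flag
    else:
        action, value = "replace", output.split("+")

    return {"words": words, "at_start": at_start, "at_end": at_end,
            "blocked": blocked, "action": action, "value": value}


# Hypothetical sample lines, made up for this sketch.
replace_rule = parse_substitution("<how_are_you hello+there")
delete_rule = parse_substitution("you_know")
flag_rule = parse_substitution("i_mean ![it that] %SAMPLE_FLAG")
```

Keeping the blocked-word list separate from the action makes it easy to see that the `![…]` prefix only vetoes a match and does not change what the substitution produces.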
382+
<h3 id="dictionary-augmentation-files">Dictionary Augmentation Files</h3>
383+
<div style="white-space: pre-line;"><code>plurals.txt</code> | is a list of word pairs, singular and plural form
384+
<code>canonicals.txt</code> | is a list of word pairs, original and canonical form, that override what CS might have decided.
385+
<code>currencies.txt</code> | map currency words to currency concepts it defines
386+
<code>months.txt</code> | lines of month names and abbreviations
387+
<code>numbers.txt</code> | lines of words that have numeric value (see below)
388+
<code>systemfacts.txt</code> | lines of system concepts, declaring them as concepts</div>
389+
<p>Numbers.txt entries will list the word, give its value, and define how to interpret its type. REAL_NUMBER is a word that directly represents a number, like two. WORD_NUMBER is a word that implies a number value, like dozen. FRACTION_NUMBER is a word that implies a fraction value like half.</p>
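A minimal Python sketch of reading such an entry follows. The exact layout of numbers.txt is not shown here, so the sketch assumes three white-space separated fields in the order word, value, type; the function name `parse_number_entry` and the sample lines are hypothetical.

```python
NUMBER_KINDS = {"REAL_NUMBER", "WORD_NUMBER", "FRACTION_NUMBER"}


def parse_number_entry(line):
    """Parse an assumed 'word value TYPE' entry into a tuple."""
    word, value, kind = line.split()
    if kind not in NUMBER_KINDS:
        raise ValueError("unknown number type: " + kind)
    return word, float(value), kind


# Hypothetical sample entries, one per type named above.
entries = [parse_number_entry(text) for text in (
    "two 2 REAL_NUMBER",
    "dozen 12 WORD_NUMBER",
    "half 0.5 FRACTION_NUMBER",
)]
```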
378390
<h1 id="common-script-idioms">Common Script Idioms</h1>
379391
<h2 id="selecting-specific-cases-refine">Selecting Specific Cases <code>^refine</code></h2>
380392
<p>To be efficient in rule processing, I often catch a lot of things in a rule and then refine it.</p>
