This file contains subfile # 14
(subfile 14: the utility of collapse & some mechanics of concatenation)
It is better if the idea “to go home” becomes a unit expressed as a
single word-object, rather than always existing as three separate ones.
Remaining separate would require parsing all three words each time the
phrase is seen. (Such clumping is known to be one of the ways “expert”
thinking differs from a novice's.) On the other hand, once a
conversation has reached a certain point of completion and it has been
decided “to go home”, the definition of “home”, previously irrelevant,
might become crucial, requiring that the 3-word entity once again be
separated into its constituents.
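The collapse and later re-separation described above can be sketched
roughly as follows. This is only an illustration under assumptions: the
names Unit, collapse and expand are hypothetical and do not come from
the program itself, and a word-object is modeled here as nothing more
than a record that remembers its constituent words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A collapsed word-object that still remembers its constituents."""
    parts: tuple

    @property
    def label(self):
        # A single name for the whole phrase, e.g. "to_go_home".
        return "_".join(self.parts)


def collapse(tokens, phrase):
    """Replace every occurrence of `phrase` (a tuple of words) with one Unit."""
    out, i, n = [], 0, len(phrase)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == phrase:
            out.append(Unit(phrase))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def expand(tokens):
    """Separate every Unit back into its constituent words."""
    out = []
    for token in tokens:
        if isinstance(token, Unit):
            out.extend(token.parts)
        else:
            out.append(token)
    return out


sentence = "we decided to go home early".split()
collapsed = collapse(sentence, ("to", "go", "home"))
print([t.label if isinstance(t, Unit) else t for t in collapsed])
print(expand(collapsed) == sentence)
```

Keeping the constituents inside the unit is what makes the later
separation cheap: nothing has to be parsed again to get “home” back.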
Many of the word-series that are useful to collapse into single points
would be referred to as “phrases” in the usual grammar-lingo. When
examining input, there are some signals to the pre-processing parser
that a collection of words is a phrase: for instance, the presence of an
“extra” verb in a sentence or the appearance of a preposition. Beyond
these signals, the algorithm has a variety of ways to determine the
boundaries of phrases: repetition of the collection itself (in
association with a number of other words, all of the same part of
speech) and the use of clusters (p.18).
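One possible reading of the repetition signal can be sketched as below.
It is an assumption-laden illustration, not the program's actual method:
the POS table is a made-up toy lexicon, the names candidate_phrases and
min_followers are hypothetical, and clusters (p.18) are not modeled at
all. The sketch flags a word sequence as a candidate phrase when it
recurs in front of several different words that all share one part of
speech.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> part of speech), for illustration only.
POS = {
    "a": "article", "an": "article", "the": "article",
    "sandwich": "noun", "soup": "noun", "apple": "noun",
}


def candidate_phrases(sentences, n=3, min_followers=2):
    """Find n-word sequences that recur before different words of one part of speech."""
    followers = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n):
            gram = tuple(words[i:i + n])
            followers[gram].add(words[i + n])

    result = {}
    for gram, nexts in followers.items():
        tags = {POS.get(word) for word in nexts}
        # Require several distinct followers, all known and all of the same type.
        if len(nexts) >= min_followers and len(tags) == 1 and None not in tags:
            result[gram] = tags.pop()
    return result


samples = [
    "I would like a sandwich",
    "I would like the soup",
    "I would like an apple",
]
print(candidate_phrases(samples))
# {('i', 'would', 'like'): 'article'}
```

Sequences such as “would like a” are rejected here because each is seen
with only one follower, so nothing suggests where their boundary lies.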
The collection “I would like”, for example, appears over and over,
followed by an article and a noun. The repeated phrase should be
treated as one entity whose definition includes the analysis performed
earlier by the program. (The repetition is itself an excellent
indicator of phrase-hood.) We do this all the time, and sometimes
there’s hardly any need ever to decompose the phrase again. We even
make jokes about this; jests of the form: say a silly sound, then
establish a context that explains what the sound means.
“Jeet jet?”
“Did you eat yet?”
“Juwannago?”
“Would you like to go?”
It is noteworthy that the use of clusters to delineate phrases operates
as the sentence is spoken, rather than through analysis performed after
the whole sentence is available, such as traditional “diagramming”.