This file contains the following subfiles:
15 - collapse of sentences without information loss
15.25 - a limit on summing functions
16 - phrases in all three environments
16.5 - two methods for calculating phrase boundaries: examples
(subfile 15: collapse of sentences without information loss)
It is not necessary to lose any information when words are put together
into summed objects. To avoid such losses it is necessary
1) to retain the information contained in the order in which the
addends appear, and 2) to take care of axis duplications. We want to
arrive at unique MS locations for sentences that 1) contain the same
set of words in differing orders, or 2) have overlapping axis sets.
The first problem is trivially solved by inserting locator codes (e.g.
“first word in second sentence”) for the various elements as they are
added. Such codes ensure that the summations of the two sentences “Men
like women” and “Women like men” have different MS locations.
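A minimal sketch of why locator codes are enough, assuming a summed object can be modelled as a multiset of axis IDs. The lexicon, the axis IDs, and the function names below are all hypothetical, invented for illustration; they are not the program's actual data structures.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to the axis IDs of its definition.
# The IDs are invented purely for illustration.
LEXICON = {
    "men": [10, 11],
    "like": [20],
    "women": [10, 12],
}

def sum_without_locators(sentence):
    """Sum a sentence as an unordered multiset of axis IDs."""
    return Counter(axis for word in sentence for axis in LEXICON[word])

def sum_with_locators(sentence, sentence_number):
    """Tag each addend with a locator code (sentence number, word number).

    The locator (2, 1) reads as "first word in second sentence".
    """
    return Counter(
        ((sentence_number, word_number), axis)
        for word_number, word in enumerate(sentence, start=1)
        for axis in LEXICON[word]
    )

s1 = ["men", "like", "women"]
s2 = ["women", "like", "men"]
print(sum_without_locators(s1) == sum_without_locators(s2))  # True: order is lost
print(sum_with_locators(s1, 2) == sum_with_locators(s2, 2))  # False: order is kept
```

Note that axis 10 still occurs twice in each sum; that is the duplication problem taken up next.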
There are two ways to address the second problem.
1) Originally I used some sleight-of-hand. If some of the words being
combined include common axes in their definitions, the program is
confronted with the need to store, or combine, two values for a single
axis; unfortunately, calculations that require each axis to have a
single value must constantly be done. Combining the two values (say, by
averaging) loses some information.
The saving grace is that this algorithm operates primarily by
retrieving items stored in a content-addressable fashion: most
functions require the storage and manipulation of objects whose
location is thus encoded. The program doesn't depend on being sensible
or intelligible to us. Therefore a solution that seems messy to us may
be perfectly adequate for everything the program has to do.
In the case of this second problem, the question is "how can we set up
storage of a summed object when multiple addends share axes?" Axes have
properties (held in their word-form definition), they exist at
differing angles with other axes, and they are identified by an ID
number. The answer is to add an axis: one of the duplicates is replaced
with a dummy axis that has the same definition and angles as the
original but a different ID number. This kluge satisfies all the
requirements of unique location in the MS content-addressable memory.
In any kind of "real" space it would result in two points at the same
place, but the two still map to different memory locations perfectly
well.
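The dummy-axis kluge can be sketched as follows, assuming a summed object is a dictionary keyed by axis ID. The `Axis` class, the `add_to_sum` helper, and the pool of fresh IDs starting at 1000 are hypothetical stand-ins, not the program's own code.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Axis:
    axis_id: int
    definition: str   # stands in for the axis' word-form definition
    angles: tuple     # hypothetical: angles (degrees) to other axes

# Hypothetical pool of ID numbers not used by any real axis.
_fresh_ids = count(1000)

def add_to_sum(summed, axis, value):
    """Store value in a summed object (a dict keyed by axis ID).

    If the ID is already taken, a dummy axis is created with the same
    definition and angles but a new ID, so neither value is averaged away.
    Returns the axis actually used.
    """
    if axis.axis_id in summed:
        axis = Axis(next(_fresh_ids), axis.definition, axis.angles)
    summed[axis.axis_id] = (axis, value)
    return axis

size = Axis(7, "size", (90.0, 45.0))
summed = {}
add_to_sum(summed, size, 0.2)          # value of "size" from the first word
dummy = add_to_sum(summed, size, 0.9)  # value of "size" from the second word
print(len(summed), dummy.axis_id, dummy.definition == size.definition)  # 2 1000 True
```

Both values survive at distinct keys, while the dummy is indistinguishable from the original in every property except its ID.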
2) As the proposed data structures converged on a single common
structure, it became clear that axes could have meaning added by the
same procedure that is used for verb-forms, plurals, possessives, and
homonyms. In those cases, new words are temporarily created that
consist of the dictionary definition of the root form plus a
minimal word-structure (called a "plus-word") containing the
information needed to transform the ur-form into the inflected form.
Since axes are defined in the same way as words, it is sensible to
inflect the duplicated axis by adding a plus-word for "also" to its
definition. This has an effect very similar to that of method 1, but is
more readable and more consistent.
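A sketch of the plus-word approach, assuming an entry (word or axis) can be modelled as a root definition plus a tuple of plus-words. The `Entry` class and `add_axis` helper are hypothetical illustrations of the idea, not the program's actual representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """A word or an axis: a root definition plus any plus-words applied to it."""
    root: str
    plus_words: tuple = ()

    def inflect(self, plus_word):
        # The root definition is left untouched; the transformation is
        # recorded as one more plus-word.
        return Entry(self.root, self.plus_words + (plus_word,))

def add_axis(summed, axis, value):
    """Store value under axis, inflecting a duplicated axis with "also"."""
    while axis in summed:
        axis = axis.inflect("also")
    summed[axis] = value
    return axis

walked = Entry("walk").inflect("past")   # a verb-form built the same way
size = Entry("size")
summed = {}
add_axis(summed, size, 0.2)
also_size = add_axis(summed, size, 0.9)
print(also_size)   # Entry(root='size', plus_words=('also',))
```

Unlike an opaque new ID number, the duplicate's key reads as "size, also", which is the readability gain the text describes.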
(subfile 15.25: a limit on summing functions)
It is important to remember that "definitions" change according to
context. Carried to its fullest extent, allowing definitions to change
in this way makes a word equivalent to a frame, and the summing of
frames is much more complicated and more likely to involve the
difficulties of incompatible and duplicated axes. Thus summing is a
function that works better when it takes local, context-delimited
definitions as arguments, rather than full definitions.
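A small illustration of why context-delimited definitions make summing easier, assuming definitions can be modelled as dictionaries from axis ID to value. The words, axis IDs, values, and the context set are all invented; the point is only that restricting each addend to context-relevant axes leaves fewer duplicated axes to resolve.

```python
# Hypothetical full definitions (axis ID -> value); every number is invented.
FULL = {
    "bank":  {1: 0.9, 2: 0.4, 3: 0.7, 4: 0.1},
    "river": {2: 0.8, 3: 0.2, 5: 0.6},
}

def local_definition(word, context_axes):
    """Keep only the axes of a word's definition that matter in this context."""
    return {a: v for a, v in FULL[word].items() if a in context_axes}

def shared_axes(definitions):
    """Axes that occur in more than one addend, i.e. candidate duplicates."""
    seen, duplicated = set(), set()
    for d in definitions:
        keys = set(d)
        duplicated |= seen & keys
        seen |= keys
    return duplicated

full = [FULL["bank"], FULL["river"]]
local = [local_definition(w, {1, 2, 5}) for w in ("bank", "river")]
print(shared_axes(full), shared_axes(local))
```

Here the full definitions collide on axes 2 and 3, while the local ones collide only on axis 2.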
Elsewhere I describe the expansion of a word into an object that is
similar to the classical AI concept of frame: pointer-operators can
exist within the levels of a word's definition, and if such a pointer
exists without its associated value, it is equivalent to a "frame
terminal". Such terminals are also easy to associate with default
values, lists of exceptions, etc. (See the third example in subfile 29,
subfile 49, and "Parts-of-thought and classical A.I. frames", p.20,
main file.)
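A sketch of a pointer-operator that behaves as a frame terminal when it has no value, assuming a definition can be modelled as a list of levels, each holding pointers. The `Pointer` class, its fields, and the "bird" definition are hypothetical; defaults and exceptions are attached in the way the paragraph suggests.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Pointer:
    """A pointer-operator inside one level of a word's definition.

    With no value bound, it acts like a frame terminal.
    """
    name: str
    value: Optional[Any] = None
    default: Optional[Any] = None
    exceptions: list = field(default_factory=list)

    def is_terminal(self):
        return self.value is None

    def resolve(self, instance=None):
        # A bound value wins; otherwise use the default unless the
        # instance is listed as an exception.
        if not self.is_terminal():
            return self.value
        if instance in self.exceptions:
            return None
        return self.default

# Hypothetical definition of "bird": a list of levels, each holding pointers.
bird = [
    [Pointer("covering", value="feathers")],
    [Pointer("locomotion", default="flies", exceptions=["penguin", "ostrich"])],
]

terminals = [p.name for level in bird for p in level if p.is_terminal()]
print(terminals)                       # ['locomotion']
print(bird[1][0].resolve("robin"))     # flies
print(bird[1][0].resolve("penguin"))   # None
```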
(subfile 16: phrases in all three environments)
In European, Indian, and Arabian monophonic music, there exist small
clusters of events that recur as units. In classical North Indian
music, each Raga includes, as part of its definition, numerous such
entities, and, very similarly, each mode in Gregorian Chant has its own
characteristic set of short phrases. Likewise, the control of a robot
arm entails many collections of primitive operations: collections that
are learned once and then never again used in their decomposed form.
Whole sequences of navigational commands were learned by ROBOT (p.42,
main file) in single trials, and were then available forever as single
"words".
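The idea of a sequence that, once learned, is used only as a single "word" can be sketched as greedy chunk substitution. This is a generic illustration, not ROBOT's method; the `chunk` function, the primitive names, and the learned "GRASP" chunk are hypothetical.

```python
def chunk(sequence, chunks):
    """Rewrite a sequence of primitives, replacing each learned chunk with its name.

    chunks maps a chunk name to the tuple of primitives it stands for.
    Longer chunks are tried first so they win over their own prefixes;
    empty chunks are ignored.
    """
    ordered = sorted(
        (item for item in chunks.items() if item[1]),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    out, i = [], 0
    while i < len(sequence):
        for name, prims in ordered:
            if tuple(sequence[i:i + len(prims)]) == prims:
                out.append(name)
                i += len(prims)
                break
        else:
            out.append(sequence[i])   # no chunk starts here: keep the primitive
            i += 1
    return out

# Hypothetical robot-arm primitives and one chunk learned in a single trial.
learned = {"GRASP": ("open", "lower", "close", "raise")}
cmds = ["rotate", "open", "lower", "close", "raise", "rotate"]
print(chunk(cmds, learned))   # ['rotate', 'GRASP', 'rotate']
```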
(subfile 16.5: two methods for calculating phrase boundaries)
I am not referring to complete, formal grammatical entities such as
“noun phrase” or “predicate phrase”, but rather to those repeated word
sequences that the program might usefully concatenate into single units
of meaning. Consider the following sentence fragments, each of which
is likely to recur in the training of a program like this. In each case
the phrase I wish to address comes first, and a possible completion of
the sentence (not relevant right now) follows in parentheses.
Do you mean (“genes” or “jeans”)
I don’t know (George.)
When we were talking (before, you said...)
In each of these cases, a running sum of the axes presented by the
words would encounter no repetition of axes until the part of the
sentence in parentheses. The three words “do you mean” share no axes of
definition, but the noun that arrives next (“genes”) does share some
with the pronoun “you”. Thus as soon as “genes” arrives, the program can
sense an articulation in the sentence structure. It amounts to an
exclusive-or operation on axis ID numbers: not exactly rocket science.
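A sketch of the running-sum test, assuming each word's axes are encoded as bits in an integer mask. The axis IDs below are invented, chosen only so that "genes" shares one axis with "you". XOR cancels shared bits, so the bit count of `a ^ b` falls short of the two masks' combined bit counts exactly when they share an axis.

```python
# Hypothetical axis IDs per word; "genes" shares axis 3 with "you".
AXES = {
    "do": {1},
    "you": {2, 3},
    "mean": {4},
    "genes": {3, 5},
}

def mask(word):
    """Encode a word's axis IDs as bits of an integer."""
    m = 0
    for axis_id in AXES[word]:
        m |= 1 << axis_id
    return m

def bits(x):
    return bin(x).count("1")

def overlaps(a, b):
    # XOR cancels shared bits: fewer bits survive iff a and b share an axis.
    return bits(a ^ b) < bits(a) + bits(b)

def boundaries(words):
    """Indices of words that repeat an axis already in the running sum."""
    running, cuts = 0, []
    for i, word in enumerate(words):
        m = mask(word)
        if overlaps(running, m):
            cuts.append(i)   # articulation: start a new running sum here
            running = 0
        running |= m
    return cuts

print(boundaries(["do", "you", "mean", "genes"]))   # [3]
```

Resetting the running sum at each articulation is a design assumption; the text does not say what happens after a boundary is sensed.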
This sort of calculation provides ways of suggesting points of division
within a sentence. There exists a way to test these functions before
run-time (see the fifth mode of learning, p.33, main file).
The second method is less connected to word definitions, and is a
simple example of the use of a minimal running cluster in this program.
The enhanced Puss module developed in the 1980s includes a field that
stores the number of times a datum gets stored. An extremely simple,
low-level Puss whose window holds the two most recent words of input
would exhibit a useful behavior that could trigger more sophisticated
phrase-detection methods. Consider the sentence "I was reading to the
little girl." The number of times a datum gets stored would be
= small after the window "reading to"
= very large after the words "to the"
= very small again after "the little".
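The stored-count field can be imitated with a two-word window counter, assuming each two-word window is one "datum". The functions and the miniature corpus below are hypothetical; the Puss module's real storage is not shown here.

```python
from collections import Counter

def window_counts(corpus, n=2):
    """How many times each n-word window has been stored, over a whole corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def profile(sentence, counts, n=2):
    """The stored-count of each successive window as the sentence is scanned."""
    words = sentence.lower().split()
    windows = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [(" ".join(w), counts[w]) for w in windows]

# Hypothetical miniature training corpus (invented for illustration).
corpus = [
    "I was reading to the little girl",
    "she went to the store",
    "we talked to the man",
    "he walked to the door",
]
prof = dict(profile("I was reading to the little girl", window_counts(corpus)))
for window, n in prof.items():
    print(f"{window!r}: {n}")
```

On this corpus the count runs 1 for "reading to", 4 for "to the", and 1 for "the little": the small, large, small pattern described above.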
These two large changes would frequently occur at phrase boundaries,
and it is trivially simple to set up a program module that graphically
displays this sort of quantity as a "real" conversation is scanned.
Such a module allows the program designer to evaluate quickly the
utility of specific clustering functions.
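A text-mode stand-in for such a display, assuming the input is a list of (window, count) pairs in scanning order. The `display` function, its `ratio` threshold, and the counts in the usage line are hypothetical; a real module would presumably draw a graph rather than rows of `#`.

```python
def display(profile, ratio=3.0):
    """Render stored-counts as text bars, marking large jumps between neighbours.

    profile is a list of (window, count) pairs in scanning order. A jump is
    marked when one of two neighbouring counts is at least `ratio` times the
    other; the default threshold is arbitrary.
    """
    lines, prev = [], None
    for window, n in profile:
        mark = ""
        if prev is not None and max(n, prev) >= ratio * max(min(n, prev), 1):
            mark = "  <- large change"
        lines.append(f"{window:<14}{'#' * n}{mark}")
        prev = n
    return "\n".join(lines)

# Invented counts shaped like the example in the text.
chart = display([("reading to", 2), ("to the", 12), ("the little", 1)])
print(chart)
```

Varying `ratio` or the window size and rerunning over a transcript is the kind of quick evaluation of a clustering function the paragraph has in mind.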