(Subfile #32: example)
One of the model conversations I enjoy considering is
1) Want an umbrella?
2) Is it raining?
3) Well, duh.
Sentence one requires preprocessing with transforms, and is sufficient
for this example.
For sentence 1) to be useful it must be changed into a) below. This is
partly the function of the internal conversation, but is also necessary
for other unspoken calculations to take place.
a) Do you want to be given an umbrella for your walk outside?
We understand that fragment X below is an idiom that can mean b), c),
or d).
X: "Want a <>?"
b) "I have one here for you" or
c) "We can acquire one" or
d) "We should spend time getting one".
The first consideration for programming is seeing to it that there is a
mechanism that can learn from experience that these choices exist, and
that both ends of the process are appropriately general. Second,
transforming "Want an ...." from 1) into "Do you want to be given
an...." in a) requires deciding which of the available contexts b), c),
or d) to use. Finally, the words actually to be substituted for "Want
a" must be assembled.
Generalizing Input
A number of common phrases have the same content as X:
Would you like to have…
Does madam care for a…
Do you want…
Is it your wish to acquire a…
And slightly less similar:
Can I get you a…
Does George want a…
Did you want some…
Each of these can accept the same objects ("Would you like to have a cup
of tea?"). Were the program to see these seven similar phrases as
completely different objects, it would have to learn to associate
every object with each of the seven phrases separately. Relating all
seven to a single point in MS, however, means that this unacceptable
situation does not arise.
To bring about this common association the internal conversation of the
program needs to extract from each fragment the same primitive elements
and form them into a single feature vector with which to make
associations. "Extract" is what this process looks like to a human, but
it is in fact nothing more than other associations. For example, one of
the primitives common to each fragment is the "question" plus-word. In
each case the word order of auxiliary verbs has been used as the
LH-side for the prediction "question". This process also can look like
a transformation, since it re-orders words like so:
Do you want              becomes   Question: you (do) want
Would you like to have   becomes   Question: you want to have
Does Madam care for a    becomes   Question: Madam wants
We like to have these various ways of speaking, but the program uses
just one process.
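The collapse of several openers onto one feature vector can be sketched in Python. This is only an illustration under stated assumptions: the table OPENERS, the primitive names "question" and "want", and the function to_feature_vector are hypothetical, and in the program the table would be a set of learned associations rather than something written by hand.

```python
# Hypothetical association table: each known opener (the words that start a
# fragment) is stored with the primitives it predicts. Hand-written here only
# for illustration; the program would learn these associations.
OPENERS = {
    ("want",): {"question", "want"},
    ("do", "you", "want"): {"question", "want"},
    ("would", "you", "like", "to", "have"): {"question", "want"},
    ("does", "madam", "care", "for"): {"question", "want"},
    ("is", "it", "your", "wish", "to", "acquire"): {"question", "want"},
}


def to_feature_vector(fragment):
    """Return (primitives, remaining words) for the longest matching opener."""
    words = fragment.lower().rstrip("?.… ").split()
    # Try the longest prefix first so that "do you want" wins over shorter keys.
    for n in range(len(words), 0, -1):
        key = tuple(words[:n])
        if key in OPENERS:
            return frozenset(OPENERS[key]), words[n:]
    # No known opener: no primitives are predicted.
    return frozenset(), words
```

With this table, "Want an umbrella?" and "Do you want an umbrella?" yield the same set of primitives, so any object learned with one opener is available to the others.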
Generalizing Output
In transforming “Want a…” into one of the phrases above, we understand
the involvement of a central point in MS to which all possible
transformations are near. Any of these nearby phrases, if seen in a
conversation as input, could be treated by the program the same way
(i.e., to produce the same output).
The output of the transformation also needs to be general. It would be
unacceptably inefficient if the program had to repeat
every aspect of its learning for each possible object that could be
wanted. The same list can be considered to be output. If some transform
resulted in the use of any of these closely related phrases, all of its
partners would then become equally available. (Other aspects of the
current situation would have to provide reason to choose one over the
other: if we knew that we were addressing an honored, older female
guest we would use “Madam, would you care for a…”.)
Learning from experience that a set of choices exists
For the kind of learning we want to occur, the seven similar phrases
(“want an umbrella?”, “would you like to have an umbrella”, etc.)
each would have to appear in the experience of the program - either in
input or in internal calculation. Although it is certainly not
necessary for the program to hear something before it can be produced
as output, that’s not the question here; we want to be certain that the
program CAN learn about a set of choices, regardless of their origin.
The phrases can be collapsed to points like any other object;
these in fact exist at different coordinates in MS. The points are
close, as such points go, but not exactly coincident. For this reason
it is necessary for the program to have ways of deciding if such a
group of phrases is interrelated. The mechanisms by which the program
learns to associate such nearby points are all discussed in detail
elsewhere:
1) overlap of fuzzy data
(points are actually clouds - <fuzzy data: p.21>)
2) crawler-optimized search
(searching for nearby objects is taken care of outside of
run-time, or by a separate computer - <pp.36 & 22>)
3) dark-cycle-created bombs
(associating pointers connect regions of MS and are also
created by crawlers - <p.10>)
4) resonance
(two objects associated in common to a third are said
to resonate - <p.17>)
Each of these methods requires no heuristic cleverness, and each
results in an easily calculated numerical measure of relatedness.
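As one illustration of such a numerical measure, resonance can be sketched in Python as the share of associates two objects have in common. The choice of the Jaccard index, the function name resonance, and the sample association table are assumptions made for this sketch; the text says only that a measure exists and is easy to calculate.

```python
def resonance(a, b, associations):
    """Score how strongly a and b resonate: shared associates / all associates.

    The Jaccard index used here is an assumption of this sketch, not a
    measure taken from the text. The result lies between 0.0 and 1.0.
    """
    assoc_a = associations.get(a, set())
    assoc_b = associations.get(b, set())
    union = assoc_a | assoc_b
    if not union:
        return 0.0
    return len(assoc_a & assoc_b) / len(union)


# Hypothetical associations for three phrases.
associations = {
    "want an": {"question", "want", "object-pointer"},
    "do you want a": {"question", "want", "object-pointer", "addressee"},
    "well, duh": {"affirmation"},
}
```

Here the two "want" phrases share three of four associates, while "well, duh" shares none with either, so the score separates the related pair from the unrelated phrase without any special-case logic.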
Learning that a set of phrases all may serve equivalent purposes
requires that they somehow be labeled, or stored on a list, or - as is
the Purr-Puss way - associated as RH-sides with some point in MS. There
may be groups of phrases (held together by their common ability to
communicate a single meaning) that overlap with other groups: a phrase
might serve different idiomatic purposes under different circumstances.
These "different circumstances" are just other feature vectors,
however, and a particular cluster type can be associated with them to
limit and precisely define regions of MS that can be considered
equivalent, such as the members of our list ("would you like a...",
"do you want a...", etc.). See "region clusters" under
"Cluster Types", p.27 in the main file.
Deciding on a context
1) Want an umbrella?
b) "I have one here for you" or
c) "We can acquire one" or
d) "We should spend time getting one".
Currently I am not considering writing a program that lies - that is,
this will not be a program that "acts as if" it were a person. It will
converse as if it assumes that it is, itself, conscious. Among other
things, this means it cannot "have" any physical objects (unless it gets
connected to a real robot arm - and then it can really only have a few
things...). Therefore the first context, b), is only going to appear
when a human says it, so it's not worth worrying about yet. The program can,
however, have knowledge about the whereabouts of things.
Sentence 1) tells us what object we're concerned with - it will
therefore be in the current-objects buffer (see "BigBuffer", main file
p.25). One of the activities of this buffer is amassing all learned
associations immediately upon the appearance of an object, and if the
location of the object has been discussed recently enough, it will thus
be indicated by a pointer in the buffer. The presence of a pointer to
<object> will
unquestionably be related to the idea of acquisition very early in
learning; therefore c) is a natural result of the sequence:
sentence 1) --> place umbrella in buffer
--> acquire "umbrella"'s associations --> comment on them
It is at this point that the defTrans would become active. There is a
word in the input that’s quite rare (umbrella) and a phrase that’s
quite common. The common phrase is a pointer to an object ("an")
and an interrogative ("Want"(?)) - the summed object formed from this
pair provides the content of the defTrans. It is this content
that becomes the context (puss-window, LH-side, feature vector) at
which the grammatical patterns of the sentences that followed are
stored. Each of the contexts b), c), and d) corresponds to one of those
patterns. In this way it is learned that the grammatical
structure of part of the input (i.e. of "Want an...") can be associated
with any of the structures of b), c), or d) when the input appears
again later.
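The storage step described above can be sketched in Python as a table keyed by context. The class ContextStore, its method names, and the feature names in the usage lines are hypothetical; the sketch shows only that follow-up patterns stored at one LH-side become available together when the same context recurs.

```python
from collections import Counter, defaultdict


class ContextStore:
    """Stores follow-up patterns (RH-sides) at the context (LH-side) they followed."""

    def __init__(self):
        self._followers = defaultdict(Counter)

    def observe(self, context, follower):
        # Record that `follower` was seen after `context` once more.
        self._followers[frozenset(context)][follower] += 1

    def predict(self, context):
        # Return all stored followers, most frequently seen first.
        counts = self._followers.get(frozenset(context), Counter())
        return [follower for follower, _ in counts.most_common()]


store = ContextStore()
want_a = {"question", "want", "object-pointer"}  # hypothetical summed object
store.observe(want_a, "c) We can acquire one")
store.observe(want_a, "c) We can acquire one")
store.observe(want_a, "d) We should spend time getting one")
store.observe(want_a, "b) I have one here for you")
```

Calling store.predict(want_a) then returns all three contexts, with c) first because it has been observed most often; choosing among them is left to other aspects of the current situation.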
Transforming the sentence fragment
Above we have considered the transform type in which the structure of
words’ definitions is used to learn associations to other structures. It
is understood that most of these structures could underlie multiple
meanings - probably too many to allow them to be useful directly in
predicting specific content. If the input to the transform exists only
in the realm of structure, the process is trapped in that one level.
Fortunately there is no reason to limit the variety of data types used
in inputs (freedom in such combinations is clearest in the simple
context of ECONOMIST <main file p.44>.) In this discussion,
"structure" means "axis ID, pointers, frame terminals", etc., but no
values of the axes: that is "content". The information in the
structure of some of the words can be combined with specific content of
others, reducing the generality of the result in different ways
according to the meaning chosen for input, and reducing it by an amount
proportional to the amount of content chosen.
In the example above, the structural information in “Want a…” can
be used to predict structures appropriate for the next sentence, while
the content of the definition of "umbrella" can relate to appropriate
contexts. These two types of information can combine to form the
LH-side of Purr-Puss production rules that act as transforms - that is,
rules that allow the learning/linkage of a) grammatical and
definitional structure, plus some content, to b) future content. This
procedure is one of those similar to the use of intersecting MS surfaces
<pp. 15 and 31>.
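A small Python sketch can show how adding content to a structural LH-side narrows what it matches. The function names features and matches, and the content features "rain", "portable", "drink" and "hot", are assumptions chosen for illustration; subset matching stands in for whatever comparison the program actually uses.

```python
def features(structure, content=()):
    """Tag structural and content features so the two kinds cannot collide."""
    tagged_structure = frozenset(("structure", s) for s in structure)
    tagged_content = frozenset(("content", c) for c in content)
    return tagged_structure | tagged_content


def matches(lh_side, situation):
    """An LH-side applies when every one of its features is present."""
    return lh_side <= situation


want_a = ["question", "object-pointer"]           # structure of "Want a…"
umbrella = features(want_a, ["rain", "portable"])  # hypothetical content
tea = features(want_a, ["drink", "hot"])           # hypothetical content

general_rule = features(want_a)             # structure only
specific_rule = features(want_a, ["rain"])  # structure plus some content
```

The structure-only rule matches both the umbrella and the tea situations, while the rule carrying the "rain" content matches only the umbrella: each added piece of content removes some generality.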
------------
Finally, imagine that a series of deftrans labels has succeeded at
predicting a reply-formation-transform (rft), and that the result of
using that rft receives a positive reinforcement. A typical process
would then be to store the deftrans label(s) at a location provided by
the rft itself. This reverse-storage is the natural way, at some later
point when back-chaining is going on, for the program to infer from
known rft's that some particular definitional structure should be
relevant. Knowing that something should be relevant provides another
inroad on the problem of choosing sensible replies. More interestingly,
this "prediction" of what grammatical structure should have preceded
the current sentence, provides the program with a way to evaluate the
progress of this conversation as opposed to previous ones. Such means
of numerically evaluating arguments must be found if internal
conversation is successfully to be managed.
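The reverse-storage step can be sketched in Python as follows. The class RftStore, its method names, and in particular the progress score (the fraction of expected labels actually observed) are assumptions of this sketch; the text asks only that some numerical evaluation be possible.

```python
from collections import defaultdict


class RftStore:
    """Reverse-stores deftrans labels at the rft they successfully predicted."""

    def __init__(self):
        self._labels_by_rft = defaultdict(set)

    def reinforce(self, rft, deftrans_labels, reward):
        # Store the labels only when the rft's result was positively reinforced.
        if reward > 0:
            self._labels_by_rft[rft].update(deftrans_labels)

    def expected_structures(self, rft):
        # Back-chaining: which definitional structures should have preceded rft?
        return set(self._labels_by_rft.get(rft, set()))

    def progress(self, rft, observed_labels):
        # Hypothetical score: fraction of expected labels that were observed.
        expected = self.expected_structures(rft)
        if not expected:
            return 0.0
        return len(expected & set(observed_labels)) / len(expected)
```

A usage example: after reinforce("offer-to-fetch", ["question+object-pointer"], reward=1), the call expected_structures("offer-to-fetch") returns that label, and progress compares it with what the current conversation actually contained. The rft and label names here are invented for the example.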
(Subfile #32.2: deciding what the program should do)
Where do we draw the lines between
correctness of communication (factual)
efficiency of communication (bits per concept?)
grammatical-correctness of reply construction (adherence to rules of grammar)
elegance of language use (artistic composition)
language as art (poetry)
Should a program start with correctness and build through the four
stages to elegance? This does not seem to me to be a realistic model
for most actual conversation; people abbreviate, leave out "understood"
elements, use redundancy for emphasis, etc. These first two depend
on a person's internal model of the other person in the conversation -
something as yet well beyond the current task - and the third requires
that there be some active mechanism of social motivation; such
motivation, like hunger and pain, is something computers just don't do
<see subfile #65>.
(subfile 32.9: examples of internal and external transforms)
Internal transforms:
Input ........................................ Output
<word> arrives as input ...................... command: retrieve definition
previous sentence negatively reinforced ...... execute script: decrement
                                               reinforcement of fuzzily
                                               related sentences
External transforms:
Input ........................................ Output
Sentence from Teacher ........................ Pre-processed path for
                                               beginning calculations
Content from calculations .................... Grammatical output for human
                                               listeners
(Subfile 33: philosophical problems)