The Geometry of Meaning: a fantasia
by Chris Lanz
Abstract
A mathematical model for linguistic meaning is proposed, and some
implications
for artificial intelligence programming are
explored. The model and procedural
ideas from the work of John Andreae are
combined in a single large structure
intended to perform three tasks, 1)
conversation in any human language,
2) imitation of any style of melody, and 3)
navigation in a virtual physical
environment. The model directly suggests solutions to various problems,
including 1) data structures; 2) the
definition,
construction and deconstruction
of various entities: words, sentences, phrases,
parts of speech,
and transforms
between sentences; 3) originality and
creativity as
opposed to a degree of
parroting; and 4) the interplay of generality
and specificity. A method
for allowing learning from conversation to inform and reform the topology of the
proposed meaning-space is suggested, and a
variety of mathematical structures
for learning are provided. This algorithm is the result of many years of experiments
in machine learning, and the history of those
experiments is included after the
description of both the model and the programs
expected to result from it.
Introduction
Since the time of Descartes, scientists have
been
representing aspects of
our world with mathematical structures. Since
much
"reason" and much of
"reasoning" are reproducible and communicable,
they would appear to be
likely candidates for such representation. This
text
describes one attempt
to represent language in a way that
incorporates some of
the reasoning that
underlies everyday conversation, and in a way
that is
compatible with other
tasks common in human behavior. The attempt is
being
manifested in a
computer program that is intended 1) to learn
to converse
in a way
comprehensible to native speakers of whichever
language
is used, and 2) to
use a programming structure functionally
identical to one
already shown to
be appropriate for musical composition,
locomotion/navigation, character
recognition, and playing some games.
This project is ongoing, and the current
algorithm
represents the latest in a
22-year series of experiments in machine
learning. In
retrospect the earlier
programs are understood to have been
preparatory
exercises for the current
project, and they are described in the second
part
of this
paper. They provided
experience with a number of areas crucial to
the current
algorithm.
I have tried to use tenses very carefully in
this text: present
indicative
means "code exists and runs"; simple future
means "I am
certain that
I know exactly how to write the code, but
haven't done so
yet"; and
any type of conditional or subjunctive means "I'm not sure this can be
coded" or "I have no code in mind for
this module
yet."
Models should make predictions that can be
tested
experimentally. This
algorithm represents a model for a
behavior-managing
central processor, and
when the behavior involved is linguistic, the
model will
predict, for
example, the existence of words which are
actually found
in the dictionary,
or the need for new parameters that must be
added to the
definition-space.
There is also a mechanism for optimizing the
topology of
the space in which
words' definitions exist (this is equivalent to
saying
that the algorithm
can make certain predictions about the axes in
that
space), and the program
will produce some sentences which are "legal"
and some which are "correct"
for the grammatical
and reasoning environment in which it operates (this is
equivalent to saying that it makes predictions
about
combinations of symbols
that are legal in a grammar).
I hope that this paper will be of interest to others working
in machine learning and natural language, and that
through them I will find
other research that is related. More
importantly, I hope
to receive advice
and criticism that will help me progress. It is
unpleasant to write about a
project that has produced no results as yet,
but I can
see that the road
ahead is years long, and experience has shown
that
isolation is
fundamentally counter-productive. Although I am
working
on a couple
of programs for a software company, my "day
job" in an
unrelated field prevents
the type of collegial interactions most
researchers
enjoy,
and thus I am
presenting my work in this form.
Axes, Words, and Meaning Space
Definitions
Some aspects of words' definitions can be expressed like this:
red --> is a property of --> apple
material --> is a property of --> apple
size --> is a property of --> apple
apple <-- is subject to <-- eat
teeth --> act upon --> class of things subject to eating <-- is a member of <-- apple
"Redness" can have a
value, as can "materiality" and
"size" - that is,
things can be more or less red, more or less
material,
and larger or
smaller.
Thus part of the definition of "apple" is a point in a
3-dimensional space whose axes are redness,
materiality,
and size. A musical
event is inevitably very much simpler, often
involving
no more axes than
pitch and a time value. In conversation,
physical
locations for locomotion
are seldom expressed simply
as coordinates on a map (we are prone to saying
things like "I would like to sit in the comfy
chair" rather than things like
"I would like to sit down 4.3 meters from the
North
wall") but they are
always simpler than definitions of words. Each
type of
element looks the
same, however, to this program, whether it be a
word, a
note, or a location.
Each is expressed as an object with values
along axes
(complete definitions
of words have much more structure, as is
described
below).
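A minimal sketch of this uniform representation follows, assuming that an element can be held as a sparse mapping from axis name to value. The class name `Element`, the axis names, and the numeric values are illustrative assumptions and are not taken from the program described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One object in meaning space: a label plus values on a few named axes."""
    label: str
    # axis name -> value; an axis that is absent simply has no value
    values: dict = field(default_factory=dict)

    def axes(self):
        """Return the set of axes on which this element has a value."""
        return set(self.values)


# Hypothetical values, chosen only for illustration.
apple = Element("apple", {"redness": 0.8, "materiality": 1.0, "size": 0.1})
note = Element("A4 quarter note", {"pitch": 440.0, "duration": 0.25})
spot = Element("comfy chair", {"x": 4.3, "y": 1.0})

# A word, a note, and a location are handled by exactly the same code.
for element in (apple, note, spot):
    print(element.label, sorted(element.axes()))
```

The pointers and nested structure that complete word definitions carry are deliberately left out of this sketch.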
How many dimensions in Meaning Space?
The first important question to answer is "how
many
axes will be required to
define words"? Clearly any vocabulary of size n
can be defined perfectly by
a set of n
axes: the axes must simply be
defined as
"<word>ness" (thus the
axis for a word like "green" would be defined
as "greenness"). Such a set of
axes would also be useless, since all
definitions would
be perfectly
tautological, providing no possibility for
inter-relation
or logical
combination. Clearly what is desirable is a set
of axes
numbering well below
the size of the vocabulary.
A set of such axes has been constructed for
several
hundred words taken from
a 3,000 word vocabulary comprising the word
list in a
beginning foreign
language reader. One form of this set is
presented in the
appendix. Each
time a word from the list was considered - that
is, each time a definition
was attempted - it was possible that some
axes would have to be
added to the space. At the beginning of the
process,
several new axes might
be required for each new word, but as the lists
lengthened, the number of
axes per new word diminished. The number of
axes required
leveled off at
about 450, and the curve was asymptotic. This
number of
axes is fairly
manageable, and it appears the total number
will not continue
to increase at
an unreasonable rate as words are added.
How complex are definitions?
Another problem exists with respect to the
number of axes
that must have a
value in order to define a given word. If a
definition
required that 500
axes be given a value, then the task of
defining a few
thousand words would
be impossibly lengthy. Fortunately, the largest
number of
axes required for
a single word was 25 - that word was the verb
"to
deserve" (as in "How much
freedom does an ex-con deserve?"). The average
number
of axes required was
around 10. In any case, these numbers are also
manageable.
Since the axes used in defining words relate to
different
words in different
ways, it might appear that the individual
coordinates are
themselves
somewhat more complex than spatial ones. This
is a detail
of the definition
of the space, however, since the axis "red" can
be replaced by several axes
whose definitions are "red as a property",
"red as a frequency", "red as an
aesthetic value", etc.
For the purposes of this study, then, words
exist as
points in a space with
perhaps 500 dimensions. No claim is made that
these
definitions are in any
sense "complete" or "correct" from
either a lexicographic or linguistic
point of view. Many humans exhibit perfectly
sensible
verbal behavior
without such completeness or correctness. The
points in
meaning-space
should be thought of as a rich and flexible
labeling
system.
The data structure being used is minimal. All
the attributes that more commonly
might be held in variables with detailed
structure (objects, records, etc) are stored
as axes with values. One result of this is that
words (and all the other objects in the
space) move
around according to context, and
according to experiences relating to
the objects. As an object is used, and as
associations with it are built, because
all the information about it is kept in the one
data format, its location in the space
changes.
Templates
The way we represent the spectrum of light from
a star
consists
fundamentally of a series of values (the
heights of the
spectral components)
along a number line (the frequency axis). There
is,
however, more
information - for example, the width of each
peak. It is
useful to think of
the definitions of words in a similar way.
Instead of the
frequency axis
used for spectra, there is the number line
consisting of
the 500 axis
numbers. The value used in the word's
definition is
analogous to the height
of the spectral peak. Thus the definition may
be thought
of as a graphical
object like a spectrum, rather than as a point in a
multi-dimensional space.
When dealing with language it is essential to
have an
organized way of
dealing with such problems as approximation,
unknown
elements of a
definition, elements which comprise a range of
values,
and definitions which
specify classes of other definitions. These
four ideas
are computationally
related. Imagine a set of star-light spectra, printed out, and that we
want to find a subset with certain characteristics: values
must be present for some frequencies, and those
values must fall in a
certain range.
It would be possible to construct a piece of
cardboard - a
template - with
windows cut
in certain places defined by these characteristics. By holding
the template over the individual spectra, we
could see
which ones correspond
to the set of characteristics. The size of the
hole cut
for a given
frequency would be one parameter associated
with that
frequency. Also
imagine that the templates' windows come with conditions, so that one can
specify things like "if at 750 angstroms the
value
is between x and y, then
ignore the value you see through
that
window at 800 angstroms."
Considering words' definitions to be templates,
rather
than as single values
associated with base line positions, creates a
natural
environment for
approximation, ranged values in definitions,
classes and
so on. In fact it
is helpful to think of all words as degenerate
examples
of templates - that
is, defined words are templates in which the
windows are
open to viewing
only single values - and conversely it is
helpful to
think that all defined
words provide models for templates. It is then
possible
to move from one
definition to any other in a precise and
controlled way,
keeping track of
the transformative actions required and
consequently the
"distance" between
the definitions. Getting from one definition to
another
involves a concrete
series of steps, adding or subtracting baseline
indices
that have values, or
changing the ranges of values in particular
baseline positions.
Definitions
that are more
different require more steps.
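The template idea can be sketched in a few lines, assuming that a template is a mapping from axis number to a window `(low, high)` and that a defined word is the degenerate case in which each window admits a single value. The function names, axis numbers, and values are hypothetical, and the conditional windows mentioned above are omitted for brevity.

```python
def word_to_template(definition):
    """Turn a word definition {axis: value} into a template of single-value windows."""
    return {axis: (value, value) for axis, value in definition.items()}


def matches(template, definition):
    """True if the definition has a value inside every window of the template."""
    return all(
        axis in definition and low <= definition[axis] <= high
        for axis, (low, high) in template.items()
    )


def steps_between(a, b):
    """Count the edits needed to turn template a into template b:
    one step per axis that is added, removed, or given a different window."""
    return sum(1 for axis in set(a) | set(b) if a.get(axis) != b.get(axis))


# Hypothetical axis numbers and values.
apple = {3: 0.8, 17: 1.0, 42: 0.1}
pear = {3: 0.2, 17: 1.0, 42: 0.1}
# A class template: no window on axis 3, a widened window on axis 42.
fruit = {17: (1.0, 1.0), 42: (0.05, 0.3)}

print(matches(fruit, apple), matches(fruit, pear))
print(steps_between(word_to_template(apple), word_to_template(pear)))
print(steps_between(word_to_template(apple), fruit))
```

Counting one step per changed axis is only one possible measure of "distance"; a weighting by how far a window moves would be an equally plausible choice.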
Orthogonality, Distance, and Clouds
The angles between spatial axes are meaningful.
I believe
that the angles
between the axes in this "meaning-space" can be
determined geometrically,
using a mechanism postulated below, that
obtains the
necessary information
from conversation itself. (It is desirable that
no
abstract or rote input
from the programmer be necessary to define an
axis, or
else no new axes
could ever be added by the algorithm itself,
nor could
any attempt be made
to allow the program to start completely from
scratch -
that is, without any
words or axes pre-defined.) Normal arithmetic
vector
calculations suggest
that a pair of axes which are completely
unrelated (say, "color" and
"manufacturability") exist at right angles to
each other, while those which
are related (say, "color" and
"redness") would exist at lesser angles.
Assuming for the moment that appropriate sets
of
relations among axes have
been determined, then the distances between
words in meaning-space (MS)
should
bear a relation to our
(somewhat subjective)
evaluations of words'
similarity.
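One way to make the role of the angles concrete, assuming the pairwise cosines between axes are already known, is the usual generalization of the law of cosines: d^2 = sum over i and j of delta_i * delta_j * cos(theta_ij), with cos(theta_ii) = 1. The sketch below is an assumption about how distance might be computed, not a description of existing code; treating a missing axis as the value 0 is a further simplification.

```python
import math


def oblique_distance(a, b, cosines):
    """Distance between two points whose axes need not be at right angles.

    a, b:     {axis: value}; a missing axis is treated as 0 (an assumption).
    cosines:  {(axis1, axis2): cos(angle)} for related pairs of axes;
              pairs that are not listed are treated as orthogonal.
    """
    axes = sorted(set(a) | set(b))
    delta = {axis: a.get(axis, 0.0) - b.get(axis, 0.0) for axis in axes}
    total = 0.0
    for i in axes:
        for j in axes:
            if i == j:
                c = 1.0
            else:
                c = cosines.get((i, j), cosines.get((j, i), 0.0))
            total += delta[i] * delta[j] * c
    # Guard against tiny negative totals caused by rounding.
    return math.sqrt(max(total, 0.0))


p = {"color": 1.0}
q = {"redness": 1.0}
print(oblique_distance(p, q, {}))                           # axes at right angles
print(oblique_distance(p, q, {("color", "redness"): 0.5}))  # axes 60 degrees apart
```

With the two axes treated as related, the same pair of points comes out closer together, which is the behavior the paragraph above asks for.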
There will be regions of MS in which all
definitions are
closely related; one
such region would correspond to the class
"fruit". Note that the definition
of "fruit" can be obtained by widening some
windows on the
template-definition of "apple".
Many axis definitions require the assignment of
values
which are not
precisely quantifiable, or else it might appear
that
precisely quantifying
the values would be counterproductive. "Red" is
necessarily understood as a
range of frequencies along an axis that
includes other
colors as well.
Therefore words are not best thought of as
points in MS,
but rather as
variously shaped clouds, whose deviation from
sphericity depends on the
ranges of possible values on the word's several
axes.
Computer memory has
become very cheap, and it is no problem, from
the
computational point of
view, to store pointers - to each text item
such
as
"apple" - at a large number
of nearby points in MS. This fuzziness in
individual
words' definitions
relieves the system of the requirement of any
specific
level of precision
when making its various calculations. (See
"Fuzzy
Data, Fuzzy Logic" below.)
In general, in the discussion which follows, I
will refer
to words as
points, and the reader should remember that
this usage is
approximate.
Syllables, Words, and Phrases
For an algorithm like this to improve itself,
it must
concatenate "words"
into concepts it can deal with as units. It is
better if
the idea "to go
home" becomes a single unit, rather than always
existing as three words.
(This clumping is known to be one of the ways
"expert" thinking
differs from "novice" thinking.) Remaining
separate would
require parsing of all
the
words, each time the phrase is seen.
The data structure I have
selected is
suited both to the
combining of words' axis/value/pointer units,
and to
their disassembly.
Certain phrases, in which the words have very
distinct
and different
functions (different meaning, different
definitions,
etc.) sum very
naturally, like "to go" and "home".
The combined idea looks exactly like
all the other entities visible to the algorithm, which therefore doesn't
need to care that there were, originally, separate word units. This process
of concatenation is also applied to entire sentences, reducing each to
a single meaning-space point.
Phrases - concatenations of primitive objects
into new ones - occur in all three
of the environments addressed. In European,
Indian, and Arabian monophonic
music, there exist small clusters of
note-rhythms that recur as units. Each Raga,
for example, includes as part of its definition
numerous such entities, and each
mode in Gregorian Chant has its own
characteristic set of short phrases. Likewise,
the control of a robot arm entails many
collections of primitive operations - collections
that are learned once and then are never used
in their decomposed form again.
Equally essential to concatenation is the
ability to decompose
definitions
and larger
units (like the paths, sentences, and
transforms
described
below) into smaller
units, like syllables. At a shallow level, this
is a
trivial matter, since
words' definitions consist of completely
discrete units -
namely, an axis
number and a value. It is of use to be able to
take any
subset of a word's
definition (or of a path, transform, etc.) and
find out
quickly if that
subset comprises a word in the dictionary or in
any
recognized concept. This
capability is a natural result of the
associative,
content-addressable
memory that is utilized by the algorithm. (This
technique of memory management
is
clarified in the description of the historical progression of
experiments
that preceded this project.)
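The subset lookup can be sketched with an ordinary hash table standing in for the associative memory. This is an assumption made for illustration: the names `index`, `store`, and `known_subsets` are hypothetical, and enumerating every subset grows exponentially with the size of the definition, so a real content-addressable memory would have to avoid that enumeration.

```python
from itertools import combinations

# Maps a frozen set of (axis, value) pairs to the label stored under it.
index = {}


def store(label, definition):
    """File a definition under its own content."""
    index[frozenset(definition.items())] = label


def known_subsets(definition, min_size=1):
    """Yield (label, subset) for every subset of the definition that is itself stored."""
    pairs = sorted(definition.items())
    for size in range(min_size, len(pairs) + 1):
        for subset in combinations(pairs, size):
            label = index.get(frozenset(subset))
            if label is not None:
                yield label, dict(subset)


# Hypothetical axis numbers and values.
store("red", {3: 0.8})
store("fruit", {17: 1.0, 42: 0.1})
apple = {3: 0.8, 17: 1.0, 42: 0.1}

found = [label for label, _ in known_subsets(apple)]
print(found)
```

Each individual lookup is a single hash probe, which is the property of content-addressable storage that matters here.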
The ability to form large units from small ones, and to decompose units
into smaller fragments, easily and without much computational complexity, makes
it natural for the algorithm to use information at any of various levels,
from portions of words to phrases of
considerable length.
It is important
that the computations performed can be the same
at widely
differing levels
of resolution, and that logical operations can
therefore
either be insulated
from irrelevant details, or can descend to the
level of
those details when
necessary. I don't often think about the
safety,
convenience, etc., of my
"home" when I think about leaving work to go
home, but I have to know about
all those things when having a discussion about
the
"value" of having
such a
place to go.
The data structure being used for words
includes axes and
values along those
axes, but also a provision for various types of
relationships among those
axes. These relationships primarily fall into
two
categories. First, at the
top level of the definition of "apple" would
appear "red" and "fruit". Both
of these are words themselves, and a lower
level of
apple's definition would
then be the definitions of these two words.
Second,
"red" has a pointer to
"fruit", and the pointer is of type
"modifies".
Because of this structure, it is rather
straightforward
to concatenate the
definitions of a few words forming a common
phrase into
one entity, which
itself looks - to the program - exactly like a
word.
When examining input, there are some signals to
the
pre-processing parser
that a collection of words is a phrase (for
instance, the
presence of an
"extra" verb in a sentence or the appearance of
a preposition) but primarily
it is a matter of repetition of the collection
itself, in
association with a
number of other words, all of the same part of
speech.
The collection "I
would like", for example, appears over and
over,
followed by an article and
a noun. It's inefficient for a program to
repeat its
analysis of the
repeated phrase over and over. Rather, the
repeated
phrase should be treated
as one entity whose definition includes the
previous
analysis. We do this
all the time, and sometimes there's hardly any
need to be
able to decompose
the phrase, ever again. In fact, we make jokes
about
this, jests that are of
the form
say a silly sound        establish a context that explains what the sound means
"Jeet jet?"              "Did you eat yet?"
"Juwannago?"             "Would you like to go?"
Routines to construct and disassemble phrases
will
operate on single words,
phrases, and paths alike, according to
repetition rates,
eventually making
reply formation more efficient.
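Detection by repetition rate can be illustrated with a plain n-gram count. This is a simplified stand-in, assuming whitespace-separated words and fixed-length phrases; the function names are hypothetical, and the sketch works on surface strings rather than on axis/value definitions.

```python
from collections import Counter


def repeated_phrases(sentences, length=3, min_count=2):
    """Return word sequences of the given length that occur at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for start in range(len(words) - length + 1):
            counts[tuple(words[start:start + length])] += 1
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


def fuse(sentence, phrase):
    """Replace each occurrence of the phrase with a single underscore-joined token."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        if tuple(words[i:i + len(phrase)]) == phrase:
            out.append("_".join(phrase))
            i += len(phrase)
        else:
            out.append(words[i])
            i += 1
    return out


training = [
    "I would like an apple",
    "I would like a pear",
    "You would like the chair",
]
phrases = repeated_phrases(training)
print(phrases)
print(fuse("I would like a banana", ("i", "would", "like")))
```

Once a phrase has been fused, later stages see it as one token, so its earlier analysis does not need to be repeated.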
Concatenation could serve another purpose. Once
"I would
like..." has been
summed and is treated as a single word, it
should become clear that the
'word' is either followed by single nouns with an associated article, or
by larger and more
complex collections of words. The grammatical
equivalence (of
these larger collections) to the noun/article
pair could allow the program
to define such things as "noun phrases"
without being
explicitly instructed that they exist.
Words are not
only to be concatenated with other words. The data structure
includes a provision for context-dependent
information to be appended to the
definition of a word that is "in play" (see
"Big Buffer" below). The simplest
example is tonal inflection. Clearly, four
different replies are required for the
four versions of the following sentence:
1) I want to go home. (George doesn't want to, though.)
2) I WANT to go home. (It is my wish. It is not an urgent need.)
3) I want to GO home. (I wish to travel to my dwelling. The travel itself is what's important.)
4) I want to go HOME. (As opposed to going to work.)
Tonal inflection of this type can become visible to the algorithm by
defining
a two-value axis for "inflection" and allowing
the input stage to include the
assignment of accented words.
Predicting New Words
How would the existence of as-yet-undefined
words be
predicted by this
algorithm? Suppose training conversations have
been
limited to food and
eating. After some learning has occurred, and
definition of a sufficient
number of words has taken place, there would
be a
number of
words that would
differ only along one axis. This situation
would be
represented by a
particular type of template: most of its
windows are
rather limited, because
identical specific values are present in all
the words'
definitions (all
fruits are material objects subject to being
eaten), but
one window, on one
axis, would be very wide, because
the various words in the collection
have
very different values for that parameter
(fruits come in
all colors). It is
reasonable to imagine that the collapse of this
wide window to a discrete
value, in combination with the other axis-value
pairs, might point to a
location in MS that actually does (or
should) represent a particular word or phrase.
The program is then in a position to search for
a
word in
meaning space, and to
ask about it, if there is nothing at the
location
specified. The sort of
example described here would be predicting the
existence
of a word like
"apple" if one had been introduced to
"pippin", "red-delicious",
"granny-smith", etc, which are words that might
differ only along the
"color" axis.
If the color axis had not yet been defined,
then this
same mechanism would
suggest to the algorithm that it should be
added. The
existence of more than
one word with identical coordinates, but which
the
teacher says are not the
same thing,
requires that there be another axis
about
which the algorithm
has yet to hear. Automation of the definition
of new axes is the most difficult
aspect of this project.
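The wide-window mechanism for predicting words can be sketched as follows, assuming definitions are mappings from axis name to value. The function names, axis names, and values are hypothetical; the harder problem of inventing a new axis is not addressed here.

```python
def covering_template(definitions):
    """Smallest template covering all definitions: per axis, the (min, max) of the values.
    Only axes that are present in every definition are kept."""
    shared = set.intersection(*(set(d) for d in definitions))
    return {
        axis: (min(d[axis] for d in definitions), max(d[axis] for d in definitions))
        for axis in shared
    }


def wide_axes(template, tolerance=0.0):
    """Axes whose window is wider than the tolerance."""
    return [axis for axis, (low, high) in sorted(template.items()) if high - low > tolerance]


def candidate_point(template, axis, value):
    """Collapse the wide window on one axis to a single value.
    The other axes are narrow, so the low end of each window is used."""
    point = {a: low for a, (low, high) in template.items()}
    point[axis] = value
    return point


# Hypothetical definitions that differ only in color.
pippin = {"edible": 1.0, "material": 1.0, "color": 0.3}
red_delicious = {"edible": 1.0, "material": 1.0, "color": 0.9}
granny_smith = {"edible": 1.0, "material": 1.0, "color": 0.1}

template = covering_template([pippin, red_delicious, granny_smith])
print(wide_axes(template))
print(candidate_point(template, "color", 0.5))
```

If nothing is stored at the candidate point, the program would be in a position to ask its teacher whether a word belongs there.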
The data structure provides a straightforward
way to recognize, and help with
processing, many of the phrases (that is, short
combinations of words that either
recur exactly or that involve wild-card
positions) that occur in normal
conversation. I am not referring to
complete, formal grammatical entities such as
"noun phrase" or "predicate phrase" but rather
to those repeated word sequences
that the program might usefully concatenate
into single units of meaning.
Consider the following sentence fragments - each
of which is likely to recur in the
training of a program like this. In each case
the phrase I wish to address comes first,
and a possible completion of
the sentence - not relevant right
now - follows in parentheses.
Do you mean ("genes" or "jeans")
I don't know (George.)
When we were talking (before, you said......)
In each of these cases, a running sum of the axes presented by the
words would
encounter no repetitions of axes until the part
of the sentence in parentheses. The
three words "do you mean" share no axes of
definition, but the arrival of another
noun ("genes") would share some with the pronoun "you". This sort of
calculation provides a way of suggesting points of
division within a sentence.
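The running-sum calculation can be sketched as below, assuming each word is associated with a set of axes. The axis sets in `axes_of` are invented for illustration, including the axis that "genes" is assumed to share with "you".

```python
def division_points(words, axes_of):
    """Return the indices of words whose axes repeat an axis already seen in the sentence."""
    seen = set()
    points = []
    for i, word in enumerate(words):
        axes = axes_of.get(word, set())
        if seen & axes:
            points.append(i)   # a repeated axis suggests a boundary before this word
        seen |= axes
    return points


# Hypothetical axis sets.
axes_of = {
    "do": {"interrogative"},
    "you": {"person", "animate"},
    "mean": {"intention"},
    "genes": {"animate", "material"},
}

print(division_points(["do", "you", "mean", "genes"], axes_of))
```

In this toy case the only suggested division falls just before "genes", leaving "do you mean" as a candidate unit.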
Paths, Bombs, and Transforms
Sentences
If linguistic words, musical notes, and
physical
locations are centered on
points in MS, then sentences, melodies and
movements are
paths through the
space. The ur-form of these paths
consists of a series of ordered points. There is
a simple way to collapse these paths into
single points themselves without losing
any information - although the resulting
structure is not as efficient as the original
in terms of memory usage.
In thinking about sentences and melodies, one
must resist
the illusion of
continuity between points. In physical 3-space,
proceeding from one point to
another involves a continuum of intermediate
points whose
relation to the
endpoints is precisely defined and entirely
meaningful.
It is unlikely that
intermediate meaning-space points between two
words in
a sentence - points
whose location could be calculated by some
simple
arithmetic function -
would be meaningful in a similar way.
Because the data structures involved are
closely related, the program will be able
to move between paths and
word-definitions quite freely: sentences may thus
easily be constructed from the definitions, and
new definitions may be formed
from conversational input. When the processor
is not occupied with new input,
the program will be free to work on its
database, and it is during these times that
paths will be used to form new definitions, and
that questions (or statements)
concerning existing definitions will be
constructed.
Bombs
I prefer to think of the links between words in
paths as
"bombs". Suppose
you are "standing" on the infinitive verb in
the
fragment "I like to
eat....", and that you are awaiting the
object. There
could
easily have occurred,
in previous conversations, several appropriate
objects,
for example, apple,
pear, banana, etc.
A bomb is an entity with a direction number (a
set of
numbers specifying the
desired direction with respect to all relevant
axes), a
distance, and an
"amount of explosive". Standing
on the infinitive, one tosses
a bomb in a
direction, forcefully enough to go as far as
the required distance. The bomb
disappears in flight (since intermediate points
are not
meaningful as they
would be in 3-space), reappears at its goal,
and,
depending on the
generality of the desired result, it explodes
more or
less powerfully,
illuminating larger or smaller regions of MS.
Thus a very
large number of
actual sentences are fuzzily available from a
single
source sentence. In the
diagram here, read a column as a sentence, and
reading across, see words
comparatively close in MS:
I        you        we        George
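A bomb as described above can be sketched as a small record with a direction, a distance, and a blast radius. Everything here is an illustrative assumption: the class and function names, the `kind` field, the use of plain Euclidean distance (ignoring angles between axes), and the toy lexicon.

```python
import math
from dataclasses import dataclass


@dataclass
class Bomb:
    direction: dict      # axis -> component of the throw direction (assumed unit length)
    distance: float      # how far the bomb travels
    explosive: float     # blast radius at the landing point
    kind: str = "path"   # hypothetical label for the bomb's type


def landing_point(origin, bomb):
    """Where the bomb reappears; intermediate points are never visited."""
    axes = set(origin) | set(bomb.direction)
    return {a: origin.get(a, 0.0) + bomb.distance * bomb.direction.get(a, 0.0) for a in axes}


def euclidean(p, q):
    axes = set(p) | set(q)
    return math.sqrt(sum((p.get(a, 0.0) - q.get(a, 0.0)) ** 2 for a in axes))


def explode(origin, bomb, lexicon):
    """Return the labels of all stored points within the blast radius of the landing point."""
    target = landing_point(origin, bomb)
    return sorted(label for label, point in lexicon.items()
                  if euclidean(point, target) <= bomb.explosive)


# Hypothetical two-axis lexicon.
lexicon = {
    "apple": {"edible": 1.0, "color": 0.8},
    "pear": {"edible": 1.0, "color": 0.3},
    "banana": {"edible": 1.0, "color": 0.6},
    "chair": {"edible": 0.0, "color": 0.5},
}
eat = {"edible": 0.0, "color": 0.5}   # the point one is "standing" on

print(explode(eat, Bomb({"edible": 1.0}, 1.0, 0.15), lexicon))   # small charge
print(explode(eat, Bomb({"edible": 1.0}, 1.0, 0.35), lexicon))   # larger charge
```

A larger charge illuminates more of the neighborhood, so the same throw yields either one specific object or a whole family of acceptable ones.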
Bombs come in a variety of flavors, each
associated with
a particular
operator - operators that are closely related
to the pointer types that exist
inside word definitions. For instance, a bomb
can be
associated with a
grammatical necessity, such as a direct or
indirect object
for a verb.
The types
I have defined so far are: bombs which create
paths,
those that
comprise
transforms, those that define forbidden
transitions (as
learned by
reinforcement), those that point to/from
grammatical
direct
and indirect objects,
those which define a property of the object at
the
origin, those dealing
with agency (who or what entity "did that"),
and
those that
lead from
conversational commands to non-linguistic
behaviors
necessary for the
operation of the program. This last type is
particularly
relevant in
managing the problems associated with "being
wrong" and changing the
database in response to negative reinforcement.
The bombs with the label "is a property of" are
one of the ways this
algorithm manages to answer the ever-present
question, "what are some
properties of this (thing)(situation)(MS
location)?" And, as with most types
of information stored in this program, the
particular type of content-addressability
used in memory management
obviates any search-time
problems.
The function of all types of bombs can be enriched further by
associating
each bomb with a set of values for some subset
of axes (this fits nicely into
the data structure for a word, and would be
MS-stored in the same way). The
values would indicate the allowable generality
of the bomb's result along each
axis. This is a natural way to recognize
situations in which a class word is
appropriate, or when a class of objects is the
subject rather than an individual
member of the class. It is also easy to learn
this information: when bombs are
stored, it is a trivial matter to find, (almost)
instantaneously, a set of other bombs
that share all but one axis value. The
successive examination of such related
sets of bombs will provide a range of suitable
values along that one axis wherein
the target axes differ. Once this range has
been established by a suitable number
of "hits", a bomb can be configured with the
appropriate generality along that
axis. In this way conversation gradually shapes
the algorithm's handling of
generality and classes.
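Learning a range from bombs that share all but one axis value can be sketched by grouping stored targets on their remaining axis-value pairs. This is a sketch under assumptions: targets are plain mappings, the name `learn_ranges` is hypothetical, and a "hit" is counted as one distinct observed value.

```python
from collections import defaultdict


def learn_ranges(targets, min_hits=3):
    """Group targets that agree on every axis but one and return the observed range there.

    targets: list of {axis: value} mappings.
    Returns {(free_axis, fixed_pairs): (low, high)} for every group that has
    at least min_hits distinct values on its free axis.
    """
    groups = defaultdict(list)
    for target in targets:
        for free_axis, value in target.items():
            fixed = frozenset((a, v) for a, v in target.items() if a != free_axis)
            groups[(free_axis, fixed)].append(value)
    return {
        key: (min(values), max(values))
        for key, values in groups.items()
        if len(set(values)) >= min_hits
    }


# Hypothetical bomb targets that differ only in color.
targets = [
    {"edible": 1.0, "color": 0.8},
    {"edible": 1.0, "color": 0.3},
    {"edible": 1.0, "color": 0.6},
]
ranges = learn_ranges(targets)
print(ranges)
```

The resulting range could then be attached to a bomb as its allowable generality along that axis.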
Transforms
If sentences are paths through MS, then the
process of
conversation (whether
it be the internal conversation that we engage
in while
reasoning, or the
external one that can result) involves
transformations
from one path to
another. With paths stored as points, the
transform between them is especially
simple, and is isomorphic with a bomb.
It is at this point that the musical and
locomotor parts
of this program are
no longer involved. There is no analogy between
moving
about a room, and a
conversation (except perhaps dancing with a
partner?),
and the analogous musical
level of complexity, beyond a melody, is
counterpoint. I am still limiting my
program to monophonic music such as that
composed before
1100 A.D. (See
CHANTER below) in which counterpoint is not
involved.
Once again it is advisable to resist any
analogous
spatial image, in which
one would imagine two sentences as two angular
snakes,
each consisting of
several points with straight lines drawn
between. Such a
vision would
encourage one to imagine the transforms to be
predictable
shape-transformations. (Imagine one snaky squiggle,
morphing into another
snaky squiggle, situated somewhere else in the
space.)
This is precisely as
unlikely to be valid as the
image of meaningful intermediate points between
words within a single path.
I prefer an image similar to word-linking bombs. Suppose some input has
arrived which begins "Do you like...." and in
which tonal emphasis is placed
on "you". Many subsequent replies would have
begun "I (do) like....". Thus
an element of one type of transform would look
just like
a path from "you"
to "I". Another fuzzy explosion of
possibilities is available without any
computational strain, with different variants
being
appropriate to different
tonal stresses in the input:
I        George        You
It is useful that paths, bombs, and transforms
look
alike to the computer.
Many sentences, after all, are also transforms.
The
sentence "I went home"
transforms the sentence "I'm at work" to the
one which says "I'm home." The
less computation required to move between paths
and
transforms - and the
reverse - the better.
The progression of transpositions can be
continued. Axes
are related to each
other by angles. Axes are related within word
definitions
by pointers of
various types (such as "is a property of").
Words are related within
sentences by transitions between points in MS.
Successive
sentences are
related to each other by path transforms. And
transforms
themselves are
related to each other by another level of
transition.
Using a single data
structure for each type of transition allows
for a
unified and seamless
functional programming environment.
Most transforms will operate on small portions
of input - input to transforms
comes both from context and from immediately
preceding sentences. Likewise
most will only provide part of the answer
when a reply is being formed. The
simplest will hold some values constant while
rotating others about, as in the
following example where the subject and verb
are constant, but pronouns and
mood are flipped:
You like apples?
I like apples.
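A transform of this simplest kind can be sketched as a set of per-axis flips applied to every point of a path, with all other values held constant. The encoding of person and mood as small integers, the axis names, and the function name `apply_transform` are assumptions made for the example; in the scheme described here the flips would be learned from conversation rather than written by hand.

```python
def apply_transform(path, flips):
    """Apply per-axis flips to every point of a path; values on other axes are untouched."""
    return [
        {axis: flips[axis](value) if axis in flips else value
         for axis, value in point.items()}
        for point in path
    ]


# Hypothetical encoding: person 1 = first, 2 = second; mood 0 = statement, 1 = question.
question = [
    {"person": 2, "mood": 1},     # "you"
    {"liking": 1.0, "mood": 1},   # "like"
    {"edible": 1.0, "mood": 1},   # "apples"
]

flips = {
    "person": lambda p: {1: 2, 2: 1}.get(p, p),   # swap first and second person
    "mood": lambda m: 1 - m,                      # question <-> statement
}

reply = apply_transform(question, flips)
print(reply)
```

The verb and object points keep their content, while the person and mood values are reversed, mirroring the pair of sentences above.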
Various means of discovering stable elements have been constructed,
some of
which are quite complicated (see "Meaning Space
and the Management of
Conversation" below). The mode of reversal
differs by axis, but is learned from
conversation and stored in the usual
content-addressable fashion.
Definition Transforms
The grammatical structure of sentences is
obviously
related to the forms of
definitions of the words present. In replying
to the
question "Which one?" a
number of antecedent relations have to be
managed. The
pattern of
parts-of-speech appropriate for a reply depends
on the
nature of the
headless pointers implied by these pronouns.
Grammatical
patterns and
definition forms will be associated with
labels, and
these labels will then
be used in a higher-level module to suggest
appropriate
grammatical patterns
for output, or for the next stage of the
internal
conversation.
The data structure for the definitions of words
consists
primarily of values
on the various axes, and these data are linked
by
pointers of various types.
(The data structure is described in some detail
in the appendix.) This
is the same data structure needed for some
types of
transforms, in
which axis-values or groups of them are
transformed into
other values, or
are transformed along some axis. Because of
this
similarity, the definitions
of words look like transforms, and for this
reason, since
an early stage in
the development of these ideas, I have referred
to the
definitions as
"definition
transforms". The unity of axis structure makes
it as natural for the algorithm to forecast the
grammatical structure of the
reply from the definitions of words in the
input, as it
is to simply
forecast the words in the reply from the words
in the
input.
The definition transform of a word consists of
everything in the word's
definition except the values along any of the
(non zero) axes, making them
similar, in a way, to class-words. As an
example,
imagine removing
successively more and more axis values from the
definition of "apple".
Removing the value for "color" (but leaving the
axis there, to indicate
that this definition is for entities that can
have color) makes the definition
into a class-word for all apples. Removing the
values for sweetness and
animal-vegetable-mineralness brings us to a
class-word for edible objects.
Removing the value "edible" reduces the class
restriction to "objects subject
to <an operation>" rather than the more detailed "objects subject to <being
eaten>". When all the values are removed
there remain only a set of axes - ones
that could be
filled - and pointers, such as the one that linked the original value
on the color axis to the outer-level-definition object itself (the apple). Note that this
type of value-free definition is like a template in which all the windows are
fully expanded, but in which the relevance of a
particular subset of axes is
still specified.
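The successive removal of values can be sketched as follows, assuming a definition is a mapping from axes to values plus a list of typed pointers. The axis names, the values, and the helper functions are all invented for illustration; the actual data structure is the one described in the appendix.

```python
# Hypothetical definition of "apple": axis -> value, plus typed pointers
# written as (axis, pointer type, target) triples.
apple = {
    "axes": {
        "color": "red",
        "sweetness": "sweet",
        "animal-vegetable-mineral": "vegetable",
        "edibility": "edible",
    },
    "pointers": [("color", "is a property of", "apple")],
}

def strip_values(definition, axes_to_clear):
    """Blank the values on the named axes but keep the axes and the pointers."""
    axes = {
        axis: (None if axis in axes_to_clear else value)
        for axis, value in definition["axes"].items()
    }
    return {"axes": axes, "pointers": list(definition["pointers"])}

def definition_transform(definition):
    """Remove every axis value, leaving only the relevant axes and the pointers."""
    return strip_values(definition, set(definition["axes"]))

def fits(template, definition):
    """A word fits if it has the same axes and agrees wherever the template keeps a value."""
    if set(template["axes"]) != set(definition["axes"]):
        return False
    return all(
        value is None or definition["axes"][axis] == value
        for axis, value in template["axes"].items()
    )

all_apples = strip_values(apple, {"color"})   # class-word for all apples
deftrans = definition_transform(apple)        # value-free template
```

A green apple, defined on the same axes, fits `all_apples` because the color value has been removed while the color axis remains.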
The way the algorithm uses these transforms demonstrates one typical
process
naturally available within the memory-style
chosen. First, the information in
the individual deftrans is available from the
MS location at which the index
of the deftrans is stored. Each deftrans
itself has a number, or a label.
All the labels of the deftrans in a path, taken
together, provide a compressed
designation of the procedural or grammatical
structure of the path, as contained
in the structure of the definitions of its
constituent words. This collection of
labels points to a memory location at which
predictive information is stored.
For instance, storing
reply-formation-transforms at these locations provides a
way to predict what process might best be used
in the transformation of a
known path into a new reply.
As this process continues, the selection of reply-formation-transforms
(henceforth "rft") will stimulate
reinforcement. Imagine that a series of deftrans
labels has succeeded at predicting an
rft, and that the result of using that rft
receives a positive reinforcement. A
typical process would then be to store
the deftrans label(s) at a location provided by
the rft itself. This reverse-storage
is the natural way, at some later point when
back-chaining is going on, for the
program to infer from known rft's that some
particular definitional structure
should be relevant.
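The forward and reverse storage just described can be sketched with two dictionaries standing in for content-addressable memory. The label strings, the rft name, and the scoring by a simple counter are assumptions made for the example, not the program's actual scheme.

```python
from collections import defaultdict

# Content-addressable stores, sketched as dictionaries.
rft_by_labels = defaultdict(lambda: defaultdict(int))   # deftrans labels -> rft -> score
labels_by_rft = defaultdict(lambda: defaultdict(int))   # rft -> deftrans labels -> score

def predict_rft(labels):
    """Return the best-scoring rft stored at the location the labels point to."""
    candidates = rft_by_labels.get(tuple(labels))
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def reinforce(labels, rft, reward=1):
    """On positive reinforcement, store the forward link and the reverse link."""
    key = tuple(labels)
    rft_by_labels[key][rft] += reward
    labels_by_rft[rft][key] += reward   # reverse storage, addressed by the rft

def back_chain(rft):
    """From a known rft, list the definitional structures that have predicted it."""
    stored = labels_by_rft.get(rft, {})
    return sorted(stored, key=stored.get, reverse=True)

labels = ("pronoun-dt", "preference-dt", "noun-dt")   # hypothetical deftrans labels
reinforce(labels, "flip-pronoun-and-mood")            # hypothetical rft name
```

After this single reinforcement, `predict_rft(labels)` returns the stored rft, and `back_chain` recovers the label series from the rft.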
Some Transform Geometry
Transforms are operations that may involve
complete
sentences, but I believe
that smaller arguments will prove more useful,
especially
at the beginning
of the program's learning. For example, imagine
the
sentence "question: you
like apples" to be a path on a surface in
meaning
space. The surface is
defined by 1) the antecedent to "you" (let's
suppose, for the moment, that "you"
means "the computer"), 2) the template
for preference
words, and 3)
"apples". Thus
we imagine a point at "computer", a point at
"apple" and a cloud of points
defined by the template for words such as
"like" "dislike" etc. Just as
three points define a plane in classical
geometry, this
collection of
meaning-space specifications defines a
meaning-space
surface, or a cloud (in
either case, a geometrically limited portion of
the
space).
Suppose we are interested in constructing a
sentence to
follow "You like
apples?" Let's imagine for the moment that the
antecedent for "you" has been
properly transformed into the word "I". Now
imagine the points on the
surfaces that include "I" and
"apple". The geodesics drawn through those
words, assuming the surface is limited by the
"preference word" axis, will
include lots of legal replies ( I hate apples,
I am
indifferent to apples, I
love apples, etc.). The transform involved here
would
hold two points
constant ("I" and "apple") and would
only operate on the third. Other
surfaces, limited by other meaning-space
templates, would
result in other
collections of legal sentences. Suppose the
preference
words are replaced by
verbs in general. Then the geodesics between
"I" and "apple" would pass
through "throw", "eat",
"sell", "grow", etc.
Of course, the interesting question is "how does an algorithm decide
between saying
'I like apples' and 'I hate apples'?" For this,
the reader is urged to
visualize
the type of MS surface described above, with
its two fixed points on
the subject
and the object. The decision between 'like' and 'hate' must of necessity come from
some other sentence. Other sentences represent other surfaces, surfaces that
limit the choices in different ways (with respect to different axes). The intersections
of surfaces (although computationally obnoxious) are expected to be essential
decision-directing quantities.
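As a rough illustration of surfaces and their intersections, the sketch below treats a surface as the set of paths that hold two points fixed while the third ranges over a template's cloud. The template contents and the second surface are invented for the example, and plain set intersection stands in for whatever geometric computation the real meaning space would require.

```python
# Hypothetical template clouds; the real ones would be learned, not listed by hand.
TEMPLATES = {
    "preference word": {"like", "love", "hate", "am indifferent to"},
    "verb": {"throw", "eat", "sell", "grow"},
}

def surface(subject, obj, template):
    """Paths that hold subject and object constant and vary only the third point."""
    return {(subject, word, obj) for word in TEMPLATES[template]}

def decide(*surfaces):
    """Keep only the paths that lie on every surface supplied."""
    return set.intersection(*surfaces)

replies = surface("I", "apples", "preference word")

# A second surface, imagined as coming from some other sentence in the
# conversation. Its contents are made up for this example.
from_other_sentence = {
    ("I", "like", "apples"),
    ("I", "love", "apples"),
    ("I", "eat", "apples"),
}

narrowed = decide(replies, from_other_sentence)
```

The first surface alone admits "I hate apples"; intersecting it with the second removes that reply without any rule about "hate" having been written.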
Transforms and the Melding of Conversation to Computer Code
Writing computer programs to interact with
people always
means crossing a
barrier at which the person's input has to
stimulate the
computer code to do
something. This inevitably requires a
translation of
human behavior to
something recognizable by the software. At one
extreme, a
pre-determined
list of behaviors from the human is specified,
and any
other behavior is
un-recognized. When you opt to start Windows in
safe
mode, you are given a
few numbered choices. Keying in any other
number brings
no result.
A goal of this project is to arrive at the
other extreme,
so that "internal"
transforms, such as those between sentences,
are
indistinguishable from
"external" or "functional"
transforms, such as those between the human input
and the program, or those between elements of a
conversation, and commands
issued to the program (to perform some
programming task
like "remember that"
or "look that up".)
As long as transforms can
move from a path in MS to a
path in another space consisting of computer
code, there
need be no barrier -
and the image
of these spaces as being separate
is
illusory. All the things
a computer needs to do, or know, are just
values on other
axes, and the
program doesn't need to care whether it is
transforming a
sentence into a
reply, or into a programming directive such as
"call
function 'parser' now".
Merging the two spaces is accomplished by
simply ignoring
their differences.
(This statement may seem glib and meaningless
at this
point. The idea of
using all sorts of input, untranslated, is
another idea
that is worked
through in recounting the historical
development of these
programs.)
Illusions of Transforms and the Ur-Path
An early plan for this program involved the use
of shape
transforms to morph
input sentences into output replies. A variety
of
transform styles were
foreseen. Perhaps the most obvious was to
utilize
geometric rotation,
translation, and dimensional changes to learn
how
Teacher's input relates to
previous conversation elements and to the
current
context. Such transforms
would use pattern recognition to allow the
program to
recognize grammatical
structures from Teacher that generated reinforced behavior in the past. The
transform, with its fuzzy generalization at
both input
& output, would then
be applied to the current conversation, and
output would be
a morphed form
of the most recent sentence. In this view, a
transform is
a daemon,
activated, like an antibody, by the appearance
of input
that fits its
template.
This early formulation still has its place, but
I no
longer believe that a
"previous sentence" would be sufficient input,
nor do I expect output to
consist primarily of grammatical replies.
Fortunately, it
is no easier to
use, as input, an immediately-preceding
sentence than it
is to use any other
extant object in the program (as will be
described below:
cf. the use of
diverse time series in ECONOMIST).
Reinforcement is
expected to drive input
selection, in the same way as it is hoped to
drive the
actual procedural
decision making process.
Bertrand Russell explains clearly how Zeno's
paradox is
resolved by a modern
understanding of different sorts of infinities,
but
Zeno's conclusion that
motion is impossible is still a valid
conclusion for a
person unaware of the
niceties of 20th Century mathematics. To Zeno,
motion had
to be construed as
a succession - in time - of discrete states.
It is almost certainly illusory to imagine that
a series
of finished
sentences in a human conversation moves from
one sentence
to the next as a
result of direct transformation. There is a
huge amount
of internal
"conversation" that must happen inside each
human as the external
conversation progresses. To possess any kind of
"truth", the transforms
(which could
be defined as existing between successive sentences) would have
to include all of this internal processing.
This is
unlikely to be possible
without postulating the existence of
intermediate states,
and without
representing these in some way. Each
participant in the
conversation is
moving from many internal states to others, and
there are
no two internal
representational states between which another
cannot be
inserted. This makes
reaching the goal of a reply as impossible as
it was for
Zeno's runner to
reach his goal. And yet Achilles beats the
tortoise, and
replies are, in
reality, constantly being constructed.
For the moment, let's not concern ourselves
with the
problem of
initialization - so we're not going to try to
think about
what a baby's
brain does, starting from its first
perceptions. Let's
just imagine that an
adult mind is conscious of its own current
state, and
that such a state is
at least partially representable by a path or
constellation of
operator-related paths within an MS that has
been
enriched by decades of
conversational experience. Every element of
this ur-path
is represented by a
fuzzy cloud of some size. Perhaps
consciousness, or at
least the progression
of the internal conversation necessary for
reply-generation, could be
thought of as the crawling of this active path
through MS
as elements of the
path interact with stored associations, via the
mechanism
of resonance and
oscillation described below. Such an image
allows for a
practical
computation to be constructed that mirrors the
internal
"conversation" we
know to be taking place.
Then the problem becomes "at what point in your
calculations do you reply"
or "how do you know when the internal
conversation
should become external?"
Oscillation, Resonance, and Spectrum
Clouds & Vibration
It is useful, from the point of view of
computation, to
think of words as
clouds in MS. When actual input arrives, with a
specific
word like "apple",
the cloud is either very small (including only
"Pippin" and "Red delicious")
or completely collapsed to a degenerate state
("the
partially-eaten, very
ripe Pippin, from George's orchard, that we
were
discussing"). In considering
a reply, however, it might be useful to
consider clouds
which are very large
(all edible objects) or of some intermediate
size (all
fruits). I have found
it productive to imagine these reply-formation
clouds as
oscillations among
the words found within the boundaries of the
cloud. It is
a useful image
because the parameters we naturally associate
with
oscillations (such as
those between neutrino flavors or electron
spins) provide
a natural
computational framework for concepts usually
thought of
as linguistic.
Apples are similar to pears. They are close
together in
any reasonably
constructed MS. Therefore we define it to be "easy" for "apple"
to become
"pear". If the transition happens often - at a
high frequency - then "pear"
becomes almost as likely to be seen as its
parent,
"apple". If the
oscillation is weak - or requires little
"energy" - it is like an
"explosion" of a very specific bomb, as opposed
to a very general one (that
is, the oscillation occurs only between "apple"
and other words very nearby
in meaning-space.)
Vibration also provides a metric for certainty
that can
be sensed directly
by the program. A completely new word/idea will
not be
vibrating at all
(because there would be nothing with which to
set it in
vibratory-motion)
and it therefore exists in a state of
degenerate
collapse,
rather like a
quantum-mechanical object whose qualities have
been
forced to assume a
known value by the intrusion of an observer.
And as a
conversation proceeds,
an idea can become less and less uncertain as
information
builds up to the
point that its vibration collapses to zero - at
which
point that part of the
conversation can be perceived by the program to
be over, in the sense that
no further work needs to be done in that
regard.
For example,
"fruit" might be the topic, but if someone
brings in the idea of "red" or of
"gift to a teacher" then vibration to
"apple" can be enhanced while
vibration to "pear" is diminished. These bits
of information are readily
available to the program (and are nearly
instantaneously discovered) because
of the memory-management technique employed in
storing definitions and
conversations.
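The "fruit" example might be sketched as below. The starting frequencies, the association table, and the gain factor are all assumptions; the only point is that a cue arriving in the conversation shifts vibration toward the words associated with it and away from the others.

```python
# Hypothetical frequencies with which "fruit" vibrates to each member word.
vibration = {"apple": 0.5, "pear": 0.5}

# Hypothetical stored associations, addressed directly by the cue.
ASSOCIATIONS = {"red": {"apple"}, "gift to a teacher": {"apple"}}

def apply_cue(vibration, cue, gain=1.5):
    """Enhance vibration to words associated with the cue, diminish the rest."""
    favored = ASSOCIATIONS.get(cue, set())
    raw = {
        word: freq * (gain if word in favored else 1.0 / gain)
        for word, freq in vibration.items()
    }
    total = sum(raw.values())
    return {word: freq / total for word, freq in raw.items()}  # renormalize to 1

after_red = apply_cue(vibration, "red")
```

A cue with no stored associations scales every word equally, so after renormalizing the balance between "apple" and "pear" is unchanged.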
Likewise, a
region of MS in which rapid, high amplitude
vibrations are occurring, can
attract
the attention of the
program - there is necessarily much left to
investigate
in such a region,
whereas in a collapsed area, there is little to
discuss.
Thus the idea of
vibration can also address the problem of focus
- another
area of simulation
programming traditionally considered to be
somewhat
opaque. (I am aware that
I am using the collapsed state in two ways here
- this is
an area I am still
working on. Needing two modes of vibration,
however, each
with its own state
of collapse, is no problem - physicists do it
all the
time with independent
quantum numbers - and it just requires more
memory for a
computer program.)
Resonance and a Resonant Network
It is particularly easy for a computer program
to
recognize the existence of
certain sorts of associations. All the nouns
which ever
occurred after the
phrase "I like" can easily be collected
together. The words in such
collections can be said to be oscillating among
themselves. Thus an
arithmetic relation can be established between
words,
based on
conversational appearance, rather than based on
static
definition, and
conversation can directly affect definitions in
the
database.
(Programmers not familiar with the form of data
and
memory management used
in this project might balk here, correctly
realizing that
the following
statement would not be true in an exponential
search
environment. This
problem is avoided by using content-addressable
memory,
and therefore I must
ask you, please suspend your disbelief until
you've read
the history below.)
It is also easy for a computer program to
recognize when
two words are both
oscillating to a common, third word. This
property, which I
like to
think of as a
resonance, also provides arithmetic parameters
which
allow us to move
between the realms of meaning and computation.
A learned
network of
resonances of variable strengths, makes the
dictionary
much richer than the
initial set of points in MS.
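A small sketch of both ideas: collecting the words that follow a phrase, and detecting two words that oscillate to a common third. It works on raw words (it has no notion of "noun"), and the sample sentences and function names are made up for the illustration.

```python
from collections import defaultdict

def collect_followers(sentences, phrase):
    """Collect every word that ever occurred directly after the phrase."""
    prefix = phrase.split()
    found = set()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - len(prefix)):
            if words[i:i + len(prefix)] == prefix:
                found.add(words[i + len(prefix)])
    return found

def oscillation_links(collections):
    """Words in the same collection are taken to oscillate among themselves."""
    links = defaultdict(set)
    for group in collections:
        for word in group:
            links[word] |= group - {word}
    return links

def resonance(links, a, b):
    """Third words to which both a and b oscillate; the count can serve as a strength."""
    return (links.get(a, set()) & links.get(b, set())) - {a, b}

sentences = ["I like apples", "I like pears", "you eat apples", "you eat plums"]
liked = collect_followers(sentences, "I like")
eaten = collect_followers(sentences, "you eat")
links = oscillation_links([liked, eaten])
shared = resonance(links, "pears", "plums")
```

"pears" and "plums" never appear in the same collection, yet both oscillate to "apples", which is the kind of learned link the resonance network is meant to record.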
When two objects are found to be vibrating to
the same
pole (or to poles
fuzzily nearby one another), their 'energy'
might be
increased, or they might
be defined as having a useful association.
Likewise, the 'motion' between
objects in MS can be mediated by a number of
fundamental
operators
( "is a property of" or "points at" are operators
that link ideas
in the sentences "Red
is a property of apples" and " 'adjective' is an
object which must point at a
noun").
Resonance can be established as the vibration
along equal
mediators, as
opposed to vibrations to similar locations in
MS. This
provides a means of
sensing intersections of meaning outside of a
conversational rubric. It also
provides a natural way for the program to sense
relationships between internal
parts of two words, independently of their
complete
definitions.
Words can be similar, or close together,
because they
have similar values on
similar axes, or they can be similar because
their
definitions have a
similar internal structure. While the first
sort of
similarity is easily
seen between a pair of words like apple and
pear, the
second sort is seen
between ochre ( used for dyeing, has a color)
and pepper
(used for spicing,
has a
flavor). This example is mundane, but
consider the
following. There is
a fairly short list of social/logical
descriptors of
conversational
elements, and one of them is the polar relation
between
"agreement" and
"disagreement". Along this axis is the
word/phrase/sentence
"Probably so."
This concatenation is also similar to "ochre",
since it is used for
agreeing, and has
a position (along that axis
of
"degree of agreement".)
This is an example of a simple relation that
would
unavoidably be sensed by
this algorithm, but which points to a
relationship of which I
have never
personally been aware. If this relationship
is never
used, it will never
be reinforced. If it is never reinforced, it
will seldom
be used - thus if
the relationship is meaningless it will subside
into the
background. On the
other hand, this sort of mechanism allows the
algorithm
to sense and use
relationships unknown to its creator.
This use of relations unknown to the programmer is a
core
value expressed by
the programs described in the history.
Physical
objects
interact according to
physical laws. It may be useful to think of
"splashes of
axis-value
information in context-created clouds" as
interacting,
according to some set
of laws. By setting up linguistic entities with
a wide
variety of active
parameters, and by setting up the program to
favor or
inhibit the uses of
these parameters in different ways, according
to
reinforcement received from
outside, it may be possible for the program to
establish
and use such laws,
even though they are not available to the
programmer.
This property of the
brute-force, unprimed robot - demonstrated
below in
simpler programming
circumstances - is expected to prove crucial to
the
success of this program.
The program CHANTER involved no musical
knowledge
whatsoever, and the
current project is dealing with a subject of
extraordinary difficulty and
complexity - it is absolutely essential that
the program
itself be capable
of establishing and using relationships which
the
programmer never predicts
or encodes.
Resonance relations may allow the use of an
analogy to
the idea of
cross-section in particle physics - in which
some
particles are transparent
(have a low cross section) with other
particles. The word
"Bill Clinton"
would be expected to have a low interaction
with the word
"dye", but the
word "maroon" would be expected to interact
highly with "dye". Put in other
words, "to dye" should have a very high
resonant energy of interaction with
"maroon", but not with "Bill
Clinton". Dyeing and maroon would both be
vibrating to the MS location "color" and would
therefore resonate
automatically. Easily found, quasi-automatic
means of
association, like
this, are of use to a system building up
its behavioral
repertoire through
massive repetition and constant
positive and
negative reinforcement.
A peculiar network structure is also possible
using these
ideas, a structure
that may be related to neural network function.
= Take word X, and make a list of all the words to which it vibrates, or
with which it has ever been associated, or to which you can "bomb" from it.
= Give each word on the list a counter, so that you can count the
number of times the word recurs throughout this whole process.
= Then do recursion: go through each word on the list, doing the same thing,
such that you create lists of new words associated variously with each word
on the original list (that is, the list that came from the original word).
= Each time a word recurs - that is, each time one returns to the same word
from a different direction, or via associations with a different word -
tell the program to increment that word's counter.
= Allow this dark-cycle crawler (see below) to operate on the database
whenever the processor is free.
The words that acquire higher values must
surely be
differently related
to the initial word
than words that appear only once.
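The steps above can be sketched as a bounded recursion over an association table. The table, the depth limit, and the rule against revisiting a word along a single trail are assumptions added so that the toy version terminates; the text does not specify them.

```python
from collections import Counter

def crawl(associations, start, max_depth=3):
    """Recursively list associated words, counting each time a word recurs."""
    counts = Counter()

    def visit(word, depth, trail):
        if depth == max_depth:
            return
        for neighbor in associations.get(word, ()):
            if neighbor in trail:        # do not walk in circles along one trail
                continue
            counts[neighbor] += 1        # reached again, possibly from a new direction
            visit(neighbor, depth + 1, trail | {neighbor})

    visit(start, 0, frozenset({start}))
    return counts

# Hypothetical association lists for a handful of words.
associations = {
    "apple": ["pear", "red", "fruit"],
    "pear": ["fruit", "green"],
    "red": ["color"],
    "fruit": ["pear"],
}
counts = crawl(associations, "apple", max_depth=2)
```

With these lists, "pear" and "fruit" are each reached twice (once directly and once through the other), while "color" and "green" are reached once.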
Oscillation at Other Levels
If a word in a path is "in motion", vibrating
between itself and related
words, then the whole path may be considered to
be thus
alive. This provides
a second model for the procession of paths in a
consciousness, not modeled
on the idea of "reply-formation" or
"conversation". An idea, or a
consciousness, or the concentration of a mind
on a
subject, might usefully
be thought of as a fuzzy path whose vibrations
cause it
to crawl through MS.
If the vibrational modes are rich enough - and
in
programming there is
little to limit the richness of these modes -
then this
model of a path
might function more realistically than one
based on
precise locations of
points, or on transforms between skinny paths.
This vibration has different meaning depending
on what
level of object is
involved, but all levels are seen as candidates
for this
idea. It's pretty
easy, and the analogous "meaning" fairly
obvious, to make "orange" oscillate
with "yellow" - they're right there, next to
each other, on the "color" axis
- and if that parameter in the definition of a
fruit is
oscillating, then
there's a simple, arithmetic, easily programmed
mechanism
for generalization
of knowledge among members of a class - and
this
generalization would be
controlled easily. For instance, if you're
talking about
fruits as
ammunition in a food fight, no mention of color
is likely
to arise, and the
two fruits would be considered functionally
identical. If
you're talking
about painting them, color would be present in
the
context, causing the
color of the objects to vibrate. (Exactly how
& when
such an "energy
transfer" would occur is one of the most
stimulating
avenues of this inquiry
- what laws do blobs of axis-value-pairs and
pointers
obey, when they
encounter one another in a context? Is this
meaningful at
all, or useful?
What definitions of "energy" might be most
useful? Does all this relate
directly to any imaginable physical process in
a brain,
or is it purely an
abstraction? If it's an abstraction, is there
any hope
it's isomorphic with
brain activity? Is that even desirable? A
beginning in
this area has been
attempted with respect to physical analogies
below.)
One difficulty in programming simulations of
human
thought concerns the
seemingly unpredictable associations that we
are so prone
to make. Programs
that "think" also have a related need to
be
able to become different - even
insane - as the separate "runs" of the program
mature. The vibration of axis
values within the definitions of words provides
a
computer analogy for
associations which are not connected to the
complete dictionary
definitions
of any currently active word. Similarly, a word
in a path
can be set into
vibration with other words whose only
association with
the initial word was
that they appeared together in a previous
conversation.
In this way
linguistic associations independent of
definition can be
built up. Thus
separate runs of the program, even using the
same initial
database, could
develop entirely different learned behavior,
and
instances of the program
which are ineffectively taught or reinforced
can
establish nonsensical
associations, making them behave in ways
seeming
relatively "childish",
"stupid" or "insane". This property
implies the possibility of becoming
relatively "adult", "intelligent", or
"sane".
Physical Vibration and the Data Structure
Physical vibrations have specific ways of being
characterized: frequency,
amplitude, etc. These have been carried over
into the
data-structures I am
using for words and their constituents.
In the first place, "vibration" has to
be understood as something other than simple
harmonic motion, in which
a value varies according to some periodic
function - like a sine wave. In such
cases, the function spends equal amounts of
"time" above and below zero. The
vibrations I need are more like the action
potentials of a neuron: the function spends
most of its time on the base value, and, at
some frequency, the function "fires" and
spends a discrete brief interval at the other
value. Such functions, while more unpleasant
mathematically, are very simple to program
with. The
frequency of
vibration is a
rather direct matter: imagine looking at two
paths with
similar
constructions, and in one, a word is vibrating
at a high
frequency, and in
another, it vibrates only rarely. The one
vibrating
rapidly spends more time
at the opposite pole of its vibration than the
slow one,
making its
vibrational partner more of a candidate for
perception
by, or use in,
another part of the program. The occurrence of
a related
word, or of an axis
present in both words, might be a signal to
increase the
frequency or
otherwise change the energy of an element's
vibration.
(These processes are
expected to be directed, sorted, and evaluated
automatically via the
reinforcement idea described below.)
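A spike-like vibration of this kind is easy to write down. In the sketch below, time is counted in integer ticks and the function sits at its base value except for a brief interval once per period; the tick model, the period values, and the function names are assumptions for illustration.

```python
def pulse(t, period, width=1):
    """Base value 0 most of the time; 1 (the other pole) for `width` ticks per period."""
    return 1 if t % period < width else 0

def time_at_partner(period, ticks, width=1):
    """Fraction of the observed ticks spent at the opposite pole of the vibration."""
    return sum(pulse(t, period, width) for t in range(ticks)) / ticks

fast = time_at_partner(period=4, ticks=100)    # high-frequency vibration
slow = time_at_partner(period=20, ticks=100)   # rare vibration
```

The rapidly vibrating word spends a quarter of its time at its partner, the slow one a twentieth, so the fast word's partner is five times as available to other parts of the program.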
Amplitude has an equally direct analog in
programming. A
cloud surrounding a
word extends further into MS if its amplitude
is higher
- providing another
numerical analog for degree of generalization.
A number
of structures in the
program incorporate the general premise that
any point in
MS essentially
defines the center of a larger region.
Most vibrations in nature are damped, and
slowly decay to a degenerate state.
Some oscillations used by this program will be
part of the current context only,
rather than a permanent learned part of the
word's definition (these impermanent
vibrations will exist in a quasi-temporary
structure called the BigBuffer, and
will not be stored in association with a word).
These context-dependent
vibrations will decay as they age, losing
frequency until they drop out of the
buffer. This provides an imperfect method
of ensuring that context can evolve.
Unfortunately there are many times in thought
and conversation when contexts
are entirely cancelled - I have no idea yet
about teaching the program
automatically to recognize when this happens.
For example, a new participant
in the conversation could arrive, and could
introduce a new subject. The old
one might never return, and in such a case, all
this context-driven oscillation
should be suddenly quashed. Perhaps some
signal will become clear that can
stimulate the program to ask "have you changed
the subject?"
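A sketch of the decay idea follows. Only the name BigBuffer comes from the text; the halving decay, the floor below which a vibration drops out, and the method names are assumptions chosen to keep the example small.

```python
class BigBuffer:
    """Quasi-temporary store for context-dependent vibrations (hypothetical layout)."""

    def __init__(self, decay=0.5, floor=0.1):
        self.decay = decay        # factor applied to every frequency at each tick
        self.floor = floor        # below this frequency a vibration drops out
        self.frequencies = {}     # (word, partner) -> current frequency

    def add(self, word, partner, frequency=1.0):
        self.frequencies[(word, partner)] = frequency

    def tick(self):
        """Age every vibration, then drop those that have decayed below the floor."""
        aged = {pair: f * self.decay for pair, f in self.frequencies.items()}
        self.frequencies = {pair: f for pair, f in aged.items() if f >= self.floor}

    def quash(self):
        """Cancel the whole context at once, as when the subject is changed."""
        self.frequencies.clear()

buffer = BigBuffer()
buffer.add("fruit", "apple")
for _ in range(3):
    buffer.tick()
```

After three ticks the vibration is still present at one eighth of its starting frequency; a fourth tick takes it below the floor and it leaves the buffer. Deciding when to call `quash` is exactly the unsolved problem described above.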
Finally, the energy of physical objects is said
to have a
spectrum, and this
idea leads into the next topic of this
discussion. If a
word has different
"energies" in different directions (and
directions are what axes are all
about) then the connections that words form
will be
controlled in part by
the details of the directionality of its
energy. For
instance, any word
defined initially as a noun modifier will
automatically
resonate much more
with nouns than with, say, other noun
modifiers. This
provides a tangible,
mathematical analogy for parts of speech, which
we know
to be an essential
component of grammatical analysis. The use of
directional
differentials in
activation or resonance between words creates a
partial
mathematical model
for grammatical function.
Parts-of-Speech
The idea of parts-of-speech is important for
all kinds of
considerations
in processing linguistic material. It is an
idea that should arise
naturally from any model
of language use. One reasonable criterion
for success of a computational
theory of meaning would be its
ability to predict a word's part-of-speech
from the information
contained in the word's definition, and/or from
its use
in conversation.
The definition of a part-of-speech as a
template allows
for this requirement
to be met. It would be desirable for these
definitions of
the
parts-of-speech themselves to be divined by the
algorithm, without any
prompting from the programmer. This would be
easy if word
order were always
the same, and if there were no words (like
those nasty
adjectives) or phrase
structures which could intervene between words
in legal
sentences. This
algorithm should be able to set up classes of
words based
on templates
divined from conversational data, but deciding
that
"material objects" and
"labels assigned to abstract ideas" are both
"nouns" or at least are
"capable of being a subject" is a far more
difficult matter.
The program "Conversor" described in the
history provides a conceptual
structure that allows the current algorithm to
establish
and optimize its
own definitions of parts of speech. In that
program,
words' definitions
included a list of parts of speech that the
word could
represent. Sentences
were constructed by first constructing a
sequence of
parts of speech, based
on previous sentences' sequences. Other
routines then specified what
words might fit into
the various slots provided.
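The two-stage construction attributed to Conversor can be sketched as below. The lexicon, the rule of copying the previous sentence's sequence unchanged, and the random slot filling are simplifying assumptions; this is not a reconstruction of the original program.

```python
import random

# Hypothetical lexicon: each word lists the parts of speech it can represent.
LEXICON = {
    "I": ["pronoun"], "you": ["pronoun"],
    "like": ["verb"], "eat": ["verb"],
    "apples": ["noun"], "pears": ["noun"],
}

def pos_sequence(sentence):
    """Label each word with its first listed part of speech."""
    return [LEXICON[word][0] for word in sentence.split()]

def candidates(part):
    """All words whose definitions list the given part of speech."""
    return [word for word, parts in LEXICON.items() if part in parts]

def build_sentence(previous, rng=random):
    """First fix a part-of-speech sequence, then choose a word for each slot."""
    sequence = pos_sequence(previous)   # here simply copied from the previous sentence
    return " ".join(rng.choice(candidates(part)) for part in sequence)

new_sentence = build_sentence("you eat pears", random.Random(0))
```

Whatever words are chosen, the output has the same pronoun-verb-noun pattern as the sentence it was built from.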
If template definitions for each part of speech
are
provided by the
programmer, then the current algorithm could do
the same
thing, by seeking
words that fit the template, and then by
concerning
itself with series or
patterns of template labels. But this algorithm
is
supposed, as much as
possible, to find all these sorts of things on
its own,
merely by examining
its input data (conversations between other
humans or
those between humans
and itself).
The program uses two fundamental cycles,
referred to as
"light" and
"dark".
"Light cycle" activities require the presence
of an interacting human, and
most such organisms are active during daylight.
A large
number of functions
must occur, however, that consist of
operations on the
database, and these
take place either in the absence of interaction
or as a
multi-tasked
dove-tail with the calculations required for
interaction.
One
autonomous routine (see "crawlers" below)
postulates templates which
might serve to define parts of speech and
traverses the
database to evaluate
and optimize its postulate. It converges on
template
definitions which
maximize the success in predicting patterns of
template
appearance in
conversation. (These predictions are made by
re-using previously acquired
training data. This sort of self-testing was at
the heart of the evaluation process
for the program ECONOMIST described in the
history below.)
Some of these templates look like
parts of
speech, although
the program will not likely end up with
the small
number of classes we
usually use in grammar class. "Preposition" is
just too broad a category,
including both the word which forces
"infinitive" ( "to go home"), and
something like "underneath".
Unfortunately, a consideration of the
data
structure (that
results from
defining a part-of-speech as a template)
suggests a
complication not
obviously accounted for in normal grammar.
Since the data
structure is
similar at all levels of construction, it is
clear
that it
allows for entities
that function like pronouns in several other
"places".
Pronoun:         subs for a noun      (George likes apples) (He likes apples)
auxiliary:       subs for a verb      (question: George likes apples?) (Yes, he does)
phrase:          subs for a modifier  (George is green?) (Yes, he looks-thus)
"yes":           subs for a sentence  (Are you green?) (Yes) ((I am green))
a word           is a substitute (label) for a set of axis-value pairs + pointer
a phrase-number  substitutes for a phrase concatenated from words
When a wildcard function substitutes for an
axis-value in a
definition, a
class-word results ("apple" minus
"red" equals "fruit")
but exactly what is implied by the data
structure in
which an axis is
replaced by a label rather than by a variable
is unclear.
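The wildcard case can be illustrated with a minimal sketch. It assumes that a definition is a flat mapping from axis name to value; the axis names, the values, the WILDCARD sentinel, and both function names are hypothetical and are not taken from the program itself.

```python
# Hypothetical sketch: a definition is a mapping of axis name -> value.
WILDCARD = object()  # sentinel meaning "any value on this axis"

def generalize(definition, axis):
    """Return a copy of the definition with one axis replaced by a wildcard."""
    general = dict(definition)
    general[axis] = WILDCARD
    return general

def matches(template, definition):
    """True if the definition agrees with the template on every non-wildcard axis."""
    return all(
        value is WILDCARD or definition.get(axis) == value
        for axis, value in template.items()
    )

# Invented axis values, for illustration only.
apple = {"kind": "fruit", "grows_on": "tree", "color": "red"}
pear = {"kind": "fruit", "grows_on": "tree", "color": "green"}
brick = {"kind": "building-material", "grows_on": None, "color": "red"}

fruit_like = generalize(apple, "color")  # "apple" minus "red"
```

Here fruit_like accepts both apple and pear but rejects brick, which is the sense in which removing one axis value yields a class-word. The sketch says nothing about the harder case in which an axis is replaced by a label.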
Resonance, Orthogonality, and Continuity
Resonance and oscillation among words can be
established
by observing
conversation. If these ideas are taken to be
analogous to
distance in MS,
then conversation can provide the information
necessary
to calculate the
angles among MS axes. Imagine the 2-dimensional
version.
Knowing the
coordinates of two points allows the
calculation of the
distance between
them, and knowing the coordinates of one point
and the
distance between, one
can specify a locus on which the other point
must be.
|
|     a
|
|
|
|
|                                        b
|
-----------------------------------------------------------------
Now assume that the numerical coordinates of a
and b are
fixed and known,
and that we know the distance between the
points. If the
same pair of
coordinate values must be retained, but the
distance
between them is known,
because of conversational input, to be
decreasing, then
the angle between the two axes must decrease
also. It's
not possible to
solve this concisely without knowing one other
fact, such
as the angle
between the line ab and one of the axes, but
that is not
important for MS
calculations, which involve fuzzy quantities
and
imprecise magnitudes to
start with. Moving the axes away from
orthogonality
simply means that there
will be a component of motion along one axis
when there
is change along the
other, unlike the situation with right-angle
axes. This
is sufficient to
deal with the types of resonance &
vibration I
believe to be necessary. It
is also sufficient to establish that two axes
are
identical - a situation
likely to occur once the program begins
creating new axes
of its own.
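A minimal sketch of the two-dimensional case follows. It rests on one assumption that the text does not state: the coordinates are read as oblique coordinates, so that distance obeys the law of cosines with a cross term depending on the angle between the axes. The function names and the sample coordinates are invented for illustration.

```python
import math

def oblique_distance(a, b, angle):
    """Distance between a and b when the two axes meet at `angle` (radians).

    Assumes oblique coordinates, so the law of cosines supplies a cross term.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    return math.sqrt(dx * dx + dy * dy + 2 * dx * dy * math.cos(angle))

def axis_angle(a, b, distance):
    """Recover the angle between the axes from fixed coordinates and a distance.

    Undefined when the points share a coordinate (dx * dy == 0), because the
    cross term then vanishes and the distance carries no angle information.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    cos_angle = (distance ** 2 - dx * dx - dy * dy) / (2 * dx * dy)
    return math.acos(max(-1.0, min(1.0, cos_angle)))

a, b = (1.0, 5.0), (6.0, 1.0)   # illustrative coordinates only
right = oblique_distance(a, b, math.radians(90))
tilted = oblique_distance(a, b, math.radians(60))
```

With these sample points, one high on the vertical axis and one far along the horizontal axis, closing the angle from 90 to 60 degrees shortens the distance, and axis_angle recovers the 60 degrees from the shortened distance. For points whose coordinate differences have the same sign, the cross term works the other way and the distance grows as the angle closes.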
Thus conversation itself will both enhance the
richness of
words' definitions,
and progressively refine and inform the
topology of the
space itself. It
is easy to imagine that functioning programs
with
different internal
relations among axes would behave very
differently. For
instance, a program
which has learned that the "color" and
"redness" axes are not orthogonal
would naturally appear to be more intelligent -
or at
least less ignorant -
than one which had not learned this fact. This
aspect of
the model is one of
the ways to relate ideas like ranges of "correctness" or
"competence" to
ranges of specific arithmetic parameters. An
analyst
could look into the
algorithm's stored opinions about axis angles,
and could
see quickly that
some level of error exists.
It was remarked above that there are dangers
with certain
analogies between
the usual spatial reasoning and MS functions.
It is
possible that these
analogies would be more useful if the MS axes'
angular
relationships had
been fully optimized by extensive exposure to
conversation. Perhaps bombs need
not
disappear on their trajectories, and maybe
shape-changing transforms
would
be useful in such an optimized space. If
this were
true, then a bomb
that is known to represent a valid transition
between
words in a sentence
would retain its validity when its origin is
translated, or a shape-morphing
transformation between two sentences could be
applied
with general success
to any sentence matching the shape of the
transform's
"origin" sentence, no
matter where the shape appears in MS.
Fuzzy data, Fuzzy logic
"Fuzzy logic" was a term invented when
researchers began to address the
difficulties presented by chaotic systems, such
as
turbulent flow in fluid
dynamics. Computer realizations of such systems
occasionally involved bits
which went beyond the traditional binary ones.
The term
came to be connected
with specific techniques of analysis
that were
anything
but inexact - the
name has become somewhat misleading, rather
like the
Heisenberg Uncertainty
Principle - a bit of knowledge that allows
calculations
to be made in which
there is no "uncertainty" at all.
In my work, the idea of fuzziness is much more
direct,
and may have little
or nothing to do with fuzzy logic as it is
commonly
defined. Rather, there
is, in the attempt to imitate human behavior,
the
necessity to face the
human ability to generalize (so that if we know
something
about apples, then
we know something about pears) and our ability
instantly
to evaluate the
extent to which new input is similar to
something we have
experienced before
(such as our ability to recognize faces
immediately, or
to evaluate the
face-like-ness of an abstract representation
without
apparent analysis or
difficulty <draw a happy face here>).
Both
difficulties require the
manipulation of regions of meaning-space rather
than
points. The MS points
representing pears, cumquats, peaches, and so
on form a
cloud near the
point which defines "apple", or perhaps more
accurately, around a point
defining "fruit". It is this type of
"fuzziness" with which I am attempting
to work. In a fuzzy space, paths no longer
appear as thin
lines projected
between precise points, but rather as less
precise
elongated regions
connecting clouds of varying sizes.
In practical terms, this fuzziness is
accomplished in
three ways. First,
when a word, bomb, path, behavior, or any other
entity is
stored in the
database, nearby points in MS will be filled
with
pointers to the object in
the center. Thus if some process accesses a
spot in MS that might otherwise
be empty, but that is reasonably near to a
filled
location, a pointer to the
nearest word could be found. (This immensely
memory-hungry technique was not
available as recently as 2000, before really
large memory
storage systems
became available at low price.) One of the
"sub-conscious" processes (which
must take place for a program like this to run)
is this
explosion of MS
points into fuzzy clouds of pointers. (Important unanswered questions
involve the interactions of these clouds.
Should they be
transparent to one
another, if they overlap, or is that overlap a
useful
"cognitive" event?
What laws would these interactions obey? How
could such
laws be found
empirically, or better, automatically, by the
program
itself?)
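The explosion of a stored point into a cloud of pointers might be sketched as follows. The sketch assumes an integer grid, a Chebyshev (chessboard) distance, and a rule that a cell keeps the pointer of its nearest center; all three are assumptions made for illustration, and the overlap rule in particular is only one possible answer to the questions raised above.

```python
from itertools import product

def explode(space, center, label, radius=1):
    """Fill integer grid cells within `radius` of `center` with a pointer to `label`.

    A cell keeps whichever pointer is nearest (Chebyshev distance); on a tie
    the earlier pointer stays. How clouds ought to interact is left open.
    """
    offsets = range(-radius, radius + 1)
    for delta in product(offsets, repeat=len(center)):
        cell = tuple(c + d for c, d in zip(center, delta))
        dist = max(abs(d) for d in delta)
        if cell not in space or dist < space[cell][1]:
            space[cell] = (label, dist)

space = {}                       # cell -> (label, distance to that label)
explode(space, (4, 7), "apple")  # invented locations
explode(space, (6, 7), "pear")
```

A probe at (3, 6), which holds no word of its own, now returns a pointer to "apple", while a probe at (9, 9) still finds nothing.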
The second form of fuzziness requires a
somewhat more
detailed reference to
the
mechanics of memory management used. The "window" used by
OTHELLO, a
game playing program, is its input information.
It includes 8 ternary bits
of information, representing the current status
of an
edge on the board. A
window might look like
1 2 1 0 0 1 0 1
when taken
directly from the board. This window would
cause data about a good move to be
stored at a feature-space location numbered
1+(2*3)+(3^2)+(3^5)+(3^7).
A more complete series, including all the
required powers
of three, and all
the zero terms would be
1*(3^0) + 2*(3^1) + 1*(3^2) + 0*(3^3) + 0*(3^4)
+ 1*(3^5)
+ 0*(3^6) + 1*(3^7).
This computes to 2446. The second type of fuzzy
storage is
the simple removal
of one window element, and replacing it with
zero. The
same move is then
stored at a new location, calculated with this
zero in
place. This extra
storage at
imperfect windows means that the move will be returned in a
variety of circumstances, rather than only in
one
circumstance. Suppose the
game proceeds, and a new window occurs. If
there is no
move stored for that
window, the program could successively zero-out
window
elements, searching
for a fuzzy-stored move. Any returned move
would not be
as good, since it
may have come from a seriously different
situation (in
fact in OTHELLO one
would never do this!) but at least there would
be some
output where before
there was none.
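The window arithmetic and the zero-out search can be sketched briefly. The base-3 weighting follows the series given above; the dictionary used as memory, the function names, and the placeholder move label are assumptions made for illustration and are not the PURR-PUSS storage scheme itself.

```python
def location(window):
    """Base-3 feature-space location: element i is weighted by 3**i."""
    return sum(value * 3 ** i for i, value in enumerate(window))

def zeroed_variants(window):
    """Yield the window with each single non-zero element replaced by zero."""
    for i, value in enumerate(window):
        if value != 0:
            yield window[:i] + (0,) + window[i + 1:]

def store(memory, window, move):
    """Store the move at the exact location and at every one-element-zeroed location."""
    memory[location(window)] = move
    for variant in zeroed_variants(window):
        memory.setdefault(location(variant), move)  # never overwrite existing data

def recall(memory, window):
    """Return an exact match if present, else the first fuzzy match, else None."""
    if location(window) in memory:
        return memory[location(window)]
    for variant in zeroed_variants(window):
        if location(variant) in memory:
            return memory[location(variant)]
    return None

memory = {}
window = (1, 2, 1, 0, 0, 1, 0, 1)
store(memory, window, "move-A")   # "move-A" is a placeholder label
```

The window (1, 2, 1, 0, 0, 1, 0, 2) was never stored, yet recall returns "move-A" for it, because zeroing its last element lands on a location filled during the fuzzy store.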
One of the things people do, that is difficult
for
computer programs, is
constantly to generate behavior even in the
absence of a
precise image,
either of the current situation or of the
desired
outcome. We flail about,
expecting, if nothing else, to be smacked on
the hand by
Teacher when we
mis-behave, or we expect to be told "you're
getting
warmer". This smack, its
absence, its
opposite (a reward), or its fuzzy version ("warmth") lets us
know whether or not our flailing was
appropriate, and
next time the same
situation obtains, we can more usefully direct
our
behavior. The initial,
inexact step - the generation of efficient
"flailing" - is trivially easy
for us, and damnably difficult to encode. The
combination
of the PURR-PUSS
algorithm described below, with these
elementary fuzzy
techniques provides
an inroad to this problem that is both very
simple to
calculate and
procedurally efficient. This is another way of
saying
that an inroad on the
problem exists that is related to very simple
mathematics
- and that is one
of my overall values in programming.
The third type of fuzzy storage also operates
in the
realm of PUSS modules,
and requires
the storage of the new input at locations calculated from
windows which have been adjusted slightly. Thus
one might
store the same
OTHELLO move at locations calculated from all
the
following windows:
1 2 1 0 0 1 0 1 (the original, central feature vector)
1.5 2 1 0 0 1 0 1
1 2.5 1 0 0 1 0 1
1 2 1.5 0 0 1 0 1
1 2 1 0.5 0 1 0 1
1 2 1 0 0.5 1 0 1
1 2 1 0 0 0.5 0 1
1 2 1 0 0 1 -0.5 1
1 2 1 0 0 1 0 0.5
Etc....
Of course this procedure is isomorphic with
storage at
nearby locations in
the space involved. There are slight procedural
differences which make one
or the other more convenient at different times.
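Generating the adjusted windows is mechanical, as the following sketch suggests. It assumes a step of 0.5 in both directions on each element and keys the memory by the window itself rather than by a base-3 location, since fractional elements no longer map to integer locations; both choices are assumptions made for illustration.

```python
def adjusted_windows(window, step=0.5):
    """Yield copies of the window with one element nudged by +step or -step."""
    for i, value in enumerate(window):
        for delta in (step, -step):
            yield window[:i] + (value + delta,) + window[i + 1:]

def store_fuzzy(memory, window, move):
    """Store the move under the central window and under every adjusted window."""
    memory[window] = move
    for neighbour in adjusted_windows(window):
        memory.setdefault(neighbour, move)  # keep any move already stored there

memory = {}
central = (1, 2, 1, 0, 0, 1, 0, 1)
store_fuzzy(memory, central, "move-A")  # placeholder move label
```

For an eight-element window this produces sixteen neighbours plus the central vector, so seventeen entries in all.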
Fuzziness is expected to be applied to every
aspect of
this program,
including reinforcement. The same idea (that MS
points
are replaced by
clouds in the manner of wave-functions) will be
applied
to the adjustment of
choices and methods of choosing. If one
specific
transform, for example, is
reinforced, under a certain set of conditions,
then other
transforms would
receive reinforcement to the extent that they
existed
"near" the first one
in MS, and to the extent that the "set of
circumstances" resembles the
original. This is only possible because all the
parameters of all the
behaviors and transforms available to the
algorithm exist
as similarly
defined collections of MS locations,
pointer-types and
processes - processes
and pointer-types which are themselves also
located in
the space.
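Fuzzy reinforcement can be sketched as a reward that is shared out by distance. The Gaussian fall-off, the width parameter, the two-axis positions, and the transform names below are all assumptions made for illustration; the sketch also models only nearness in MS and leaves out the resemblance between sets of circumstances.

```python
import math

def reinforce(strengths, positions, target, reward, width=1.0):
    """Add `reward` to the target and a distance-discounted share to its neighbours.

    The Gaussian fall-off is an assumed kernel; any decreasing function of
    distance would serve the same purpose.
    """
    for name, position in positions.items():
        distance = math.dist(position, positions[target])
        share = reward * math.exp(-(distance / width) ** 2)
        strengths[name] = strengths.get(name, 0.0) + share

positions = {              # hypothetical transform locations in a 2-axis MS
    "ask-when": (0.0, 0.0),
    "ask-how-long": (0.5, 0.0),
    "change-subject": (4.0, 3.0),
}
strengths = {}
reinforce(strengths, positions, "ask-when", reward=1.0)
```

The reinforced transform receives the full reward, its near neighbour receives most of it, and the distant transform receives almost nothing.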
Meaning Space and the Management of
Conversation
At this point it is possible to describe some of the nuts & bolts
of conversational
decision making. There are a number of objects
and relationships, all of which
are visible to the central processing logic.
Objects
Axis
Axis/value pair plus pointers
Series of a/v pairs (one level in a word's
definition)
Series of levels (complete definition of a
word, bomb, or transform: not necessarily associated with
a dictionary entry)
Path (a series of objects with the "word"
data-structure)
Transform (generalized instructions for: how
to get from one specific path to another)
Hi-Transform (generalized instructions for:
how to get from one specific transform to another)
Relationships
Adjacency (words next to each other in input
sentences)
Relational adjacency (words associated by
bombs, vibration, resonance, etc)
Transformational adjacency (objects and the
objects to which the algorithm can get through transforms)
The algorithm functions by learning associations between any of these
objects,
via any of the relationships available. This is the
purpose of the PURR-PUSS process,
as described below in the history of previous
programs. One of the phases of
program operation involves the traversal of
very large training sets, from which
information is extracted, building a large
learned and structured internal database.
When a reply is being formed by the program, the main process involves
the
creation of temporary objects according to the
stored associations in the database.
The creation of these objects takes a variety
of forms, as does their use. It is the
tuning of these processes by reinforcement that
will progressively refine the
algorithm's conversational ability.
As an example let's suppose the program has been asked "What do you
want to
do?" and that the reply-formation transform has
decided that the reply verb could
be "to go". The simplest path to path
transforms produce fragments of the form
"I want ....." when confronted with the input
"What do you want....", and with
the selected verb, there would then exist, as a
starting point for this example,
the fragment "I want to go .....".
The simplest relationship is adjacency. (Appendix III has the complete
list of
relationships.) There is a very direct, simple,
and fast procedure for accessing
the words that have ever occurred after the
fragment "to go", and it produces a
list that looks like this:
home
by car
to the bathroom
quickly
away from here
on American Airlines
etc.
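A lookup of this kind might be sketched with an ordinary dictionary. The sketch assumes that fragments are fixed-length word sequences and that only the single following word is recorded, whereas the list above also contains phrases; the training sentences are invented.

```python
from collections import defaultdict

def build_adjacency(sentences, fragment_length=2):
    """Map each fragment of `fragment_length` words to the words seen right after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - fragment_length):
            fragment = tuple(words[i:i + fragment_length])
            followers[fragment].append(words[i + fragment_length])
    return followers

training = [               # toy training sentences, invented for illustration
    "I want to go home",
    "We have to go quickly",
    "She had to go away from here",
]
followers = build_adjacency(training)
```

After the table is built, followers[("to", "go")] holds "home", "quickly", and "away", retrieved by a single dictionary access.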
Having a word, of course, allows access to a definition that includes
large
amounts of information both from the dictionary
and from the context-dependent
parts of the data-structure.
A rather simple statistical summing procedure easily separates the
words on this
list into three clusters, according to
common axis values present in the
definitions of the words. A complete list of
these collapsing functions may
be found in Appendix III.
locations (destinations)
means (modes of travel)
adverbs (qualities of travel)
Each of these clusters represents a collapse of template windows taken
from
the listed words' definitions. The object
created by this collapse has a list of
axes, values, pointers, etc., but may not
correspond to any word in the dictionary.
The summing function returns not only this
word-like object, but also a measure
of the strength of the association. There are
of course a variety of functions that
could be used to measure the relationships
between definitions, but for the
moment let's just imagine that the function is
simple distance in MS (see "templates"
above). The example in question involves three
clusters of words that are very,
very similar in definition, and the clustering
would be recorded as being very
strong. (Naturally these cluster calculations
need not be repeated, but since they
involve not only words, but phrases and
context-dependent information, there
would be many cluster lists created, some of
which might never be used again.
PURR-PUSS, as usual, keeps track of which clusters have been calculated already.)
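The collapse of a continuation list into clusters, each with a strength, can be sketched as follows. The sketch assumes that every definition carries a "role" axis and a two-coordinate MS position, and that strength is 1 / (1 + mean distance to the cluster centroid); these are stand-ins chosen for illustration, not the collapsing functions of Appendix III.

```python
import math
from collections import defaultdict

def collapse(definitions, axis):
    """Group words by their value on `axis`; return a centroid and strength per group.

    Strength is 1 / (1 + mean distance to the centroid), an assumed measure
    standing in for whatever summing function is finally chosen.
    """
    groups = defaultdict(list)
    for word, definition in definitions.items():
        groups[definition[axis]].append(word)
    clusters = {}
    for value, words in groups.items():
        points = [definitions[w]["position"] for w in words]
        centroid = tuple(sum(c) / len(points) for c in zip(*points))
        spread = sum(math.dist(p, centroid) for p in points) / len(points)
        clusters[value] = {"words": words, "centroid": centroid,
                           "strength": 1.0 / (1.0 + spread)}
    return clusters

definitions = {   # hypothetical axis values and MS positions
    "home": {"role": "location", "position": (1.0, 0.0)},
    "to the bathroom": {"role": "location", "position": (1.0, 0.2)},
    "by car": {"role": "means", "position": (5.0, 2.0)},
    "on American Airlines": {"role": "means", "position": (5.0, 2.4)},
    "quickly": {"role": "adverb", "position": (9.0, 9.0)},
}
clusters = collapse(definitions, "role")
```

Each centroid plays the part of the word-like object described above: it has coordinates in MS but need not coincide with any dictionary word.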
In our example, therefore, the program has created a number of objects
that represent
classes of learned associations relevant to the
current context. Presumably, earlier
processes have produced other temporary
objects. In this example, the crucial decision
to be made is a selection of one of the three
clusters as a direction for the
conversation to take. Let's suppose that the
conversation had, for a while, been
involved with determining what time it
is. The discussion of time would have
created a temporary object that is essentially
a generalized adverb, and the
resonance function would have no difficulty
pointing out the match between
the previous conversation's focus on time and
the current reply's "adverb"
cluster. Thus the program uses both definitions
and context, plus all the specific
words and phrases in play, plus its stored database of associations.
The resonance relation is actually rather crude and arithmetic. Much
more
interesting are the topological possibilities.
The definitions of words are
appropriately visualized as clouds with a
density that increases as some
relatively central point is approached. If the
word is thought of as a template
with narrow, limited window openings, then the
cloud shrinks as the windows
are narrowed. Opening a particular window
stretches the cloud along a
particular axis. Imagine that a cloud has been
stretched 'way out, and that we
decide to be interested only in a region with a
certain minimum density, or,
said another way, within a certain distance of
the central values. This entity
approaches being a line, if only a few
dimensions are involved, as the value along
one increases with respect to the values along the others.
Having created a line, it is easy to imagine that the opening of two
template
windows would stretch a cloud in two
directions. A similar restriction by
density creates a surface in MS rather than a
line. Clearly this procedure is
available for any number of dimensions. It is
the interaction of these surfaces
that is expected to provide conversational
behavior more advanced and
sensible than the simpler examples presented
above.
The clusters created by each word and phrase acquire a contextual
axis-value
according to the relation that was used in
their creation, and the clusters may
themselves interact directly, creating
descendants. For example, pronouns
create clusters with wild-card openings. The
sentence "Which one?" creates
a cluster with an opening for a noun. Any noun
that has been lit up, or any
class that appears as a cluster, will be
attracted by this opening, and the
resonance relationship will increase the energy
of the noun. Whichever
noun enjoys the strongest resonance relations
with all the clusters present
will have a favored position in the stack of
possible reply words. It may
also be sensible to allow the cluster with the
opening to change or spawn
a descendant when its wild-card position is filled.
The filling of a wild-card slot is a fairly obvious thing to do, but
the MS
structure allows for many more interactions.
During the learning process,
clusters will be created as input arrives.
Subsequent input, presumed to
be "correct" or "desirable", provides targets
to be sought out amongst
possible relations between clusters. If two
clusters are found that combine
in some way that points to a word that appears
in the subsequent input,
then that relation is reinforced, and when
clusters of the same type are
present in the future, the reinforced relation
has a better chance of
influencing outcome. This lengthy process is
expected to take place
primarily as a dark-cycle crawler (which see).
In any case a repertoire
of reinforced relations will be built up and
refined as input is analyzed.
All the relationship types, and every method of
pointing from one place
to another, will be a candidate for this process.
A single cluster can also be formed whose axes and values age rapidly
and
disappear frequently when not renewed by
successive appearance in the
conversation. Such a running sum could contain
a relatively rich definition
of the "current subject", or, if its
summing-function were an operator like
"is subject to", then the cluster could define
a region in MS containing sensible
grammatical objects for verbs, prepositions,
etc. Dark-cycle evaluations of such
clusters - again, like ECONOMIST - provide an
optimizing process independent
of any teaching.
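The rapidly aging cluster can be sketched as a running sum with decay. The decay factor, the floor below which an axis is dropped, and the axis names are assumptions made for illustration.

```python
def update_subject(subject, axes_in_sentence, decay=0.5, floor=0.1):
    """Age every axis weight, renew those just mentioned, and drop faded ones."""
    aged = {axis: weight * decay for axis, weight in subject.items()}
    for axis in axes_in_sentence:
        aged[axis] = aged.get(axis, 0.0) + 1.0
    return {axis: weight for axis, weight in aged.items() if weight >= floor}

subject = {}
# Axes touched by five successive sentences (invented for illustration).
for mentioned in (["time", "travel"], ["time"], ["time"], ["time"], ["color"]):
    subject = update_subject(subject, mentioned)
```

After the five sentences the "travel" axis has faded out of the cluster, "time" remains but is weakening, and "color" has just arrived, so the cluster gives a rough picture of the current subject.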
Using a subtraction rather than a summing in the formation of a cluster
leads to
a different set of possibilities. For example,
any program like this must keep some
kind of historical tree of the current
conversation, so that side-subjects (such as one
required if the teacher uses a word the
computer has never seen) can be pursued,
and then originating threads can be taken up
again. One sort of branching that must
be dealt with in such a tree occurs when a
sentence like this comes up:
"George thinks it's green, but Bob says it hasn't
been dyed yet."
There are two branches that must be analysed independently. The program
must
establish stable elements "person who believes"
and "thinks = says in this context",
and then must figure out what the relevant
conflict is. A subtraction cluster would
result in time axes, since the
difference between the two un-stable sentence
elements "green" and "hasn't been dyed yet" is
"time". This would tell the
algorithm that the important thing to discuss
involves not Bob or George, not green
or dye,
not it or is or hasn't, but "when". Therefore the
sensible reply requires nothing but
the necessary pronouns and auxiliary
substitutes and time-words:
"When will it be done?"
Starting
from Scratch
An obvious weakness of any program such as this
would be
an inability to add
understanding of the world as the program
continues to
function. Clearly it
must be possible to add new words and
behaviors, automatically - that is,
not by explicit direction from humans through
either
programming or
linguistic instruction. Likewise, the program
must be
able to sense the need
for a new axis, and then to redefine words to
include
values on the new
axis.
There is some reason to believe that the brain
is not an
empty,
connection-free void at birth, but rather that
it has
vastly more
connections than can be useful, and that they
must be
pared down before any
sense can be made of sensory input. This is the
opposite
paradigm from that
used by most learning algorithms, which start
with a
literally empty memory,
and add to it progressively. When
operating from a completely ignorant
starting point, this program will utilize some
procedures
that view the task
from both directions. There is no functional
difference
between the two
following computer realizations: 1) consider
all possible
connections in MS
to exist a priori, and use reinforcement to
remove them
successively; and 2)
allow a random vector generator to be the
source of
behaviors presented for
reinforcement. (Even if all connections exist
to start
with, they have to be
selected one at a time for testing by the
reinforcing
routines. One way to
model equal probability of selection of behaviors is by the use of
randomness in the selection procedure.) For a
reinforcement-guided system to
function, there must be some behavior to
reinforce.
Generating the
"flailing" with random selection of behaviors,
or through random creation of
the behaviors themselves, may mimic this
full-at-start-up
situation in the
infant brain. This randomness is the most
primitive way
new entities would
be created.
The addition of new words will happen all the
time, in a
sense: anything
that has a location in MS might be a word - the
program
will in many
circumstances identify MS locations which have
no
definition, and it will
have no option but to ask "is there a word that
means: ____" and then list
the axis information associated with the
location. It is
something of a
conundrum as to what the program should do if
Teacher
says "No, there is no
such word". Should the computer refuse to make
any
calculations involving
that MS location? Or assume that it is in some
sense
incorrect, invalid, or
negatively reinforced? Or should it insist that
its new
"word" be recognized
and allowed into the vocabulary of the
conversation?
I can imagine some limited ways for a program
to
recognize the need for a
new axis. For example, if a color-blind program
were
taught that there
existed "grannysmiths" and
"goldendeliciouses", it could become clear
through parallel sentence structures that these
two
noun-classes were
identical with respect to every axis the
program knew,
but that Teacher knew
the two classes differed in a single way. The
program
obviously has no way
to divine what word Teacher would attach to the
new axis,
and so it seems
legitimate for the program to be constructed
with the
knowledge that it may
ask
for the name of the new axis - in this case,
"color".
Some new axes would be directly suggested
linguistically.
Colors, for
example, could be thought of as different "types
of
light". Any situation
involving different types of anything could be
seen as an
occasion to add an
axis, or at least to ask Teacher if one is
needed.
Furthermore, from a
purely arithmetical point of view, every
multi-coordinate
location in MS can
define a new axis, although not a new
orthogonal one.
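The color-blind case described above suggests a simple test for a missing axis: look for pairs of words that Teacher treats as different but that no known axis separates. The sketch below assumes that definitions are mappings from axis to value and that Teacher's distinctions arrive as a set of word pairs; the names and data are hypothetical.

```python
from itertools import combinations

def needs_new_axis(definitions, distinct_pairs):
    """Return pairs Teacher treats as different but the known axes cannot separate."""
    unresolved = []
    for a, b in combinations(sorted(definitions), 2):
        teacher_distinguishes = frozenset((a, b)) in distinct_pairs
        if teacher_distinguishes and definitions[a] == definitions[b]:
            unresolved.append((a, b))
    return unresolved

definitions = {   # a color-blind program's hypothetical definitions
    "grannysmith": {"kind": "apple", "edible": True},
    "goldendelicious": {"kind": "apple", "edible": True},
    "pear": {"kind": "pear", "edible": True},
}
distinct_pairs = {frozenset(("grannysmith", "goldendelicious"))}
candidates = needs_new_axis(definitions, distinct_pairs)
```

Each pair returned is an occasion to ask Teacher for the name of the axis on which the two words differ.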
Additionally there may be purely mathematical
techniques
to apply. For
example, consider the situation described
above. Suppose
the program
discovers two transforms which can operate on a
single
sentence which
includes the word "apple", and that the two
resulting replies are both
accepted as equally true by Teacher.
Additionally require
that the two
resulting sentences have a parallel grammatical
structure
and the same parts
of speech in the same positions. In some such
pairs of
sentences, there
should be a relationship between 1) the
higher-level
transform one would
have to use to get one transform from the
other, and 2)
the new axis that
would be required for the program to
distinguish between
the two varieties
of apple.
It is another matter to automate the geometric
definition
of the new axis.
The assumption that the new axis is orthogonal
to all
existing ones is
clearly wrong - among 500-odd axes there is
almost
certainly going to be a
closer relation to some axes than to others
(and the more
similar an axis is
to another in meaning, the further from 90
degrees is the angle between the
two).
This matter may be irrelevant - if it turns out
to be
important that the
definitions of words be "correct", then this
geometric puzzle will have to
be solved. Fortunately, perhaps, this may not
be a
requirement at all, since
it appears that input encoding in our brains
works perfectly well even though
we have to use whatever (quasi-random) physical
connections are established
during the development of the embryo.
These connections, that cannot be
genetically specified at the level of detail of
the processes of a single neuron,
are used by the learning algorithm of the
brain, no matter what their details are,
and in an analogous way, this algorithm could
use any set of defining axis-value
sets as label-generators, as long as they were
sufficiently separate and sufficiently
rich.
Input
Encoding
One of the most difficult questions for
neuroscientists
has always been how
information is encoded in the cerebral cortex.
It is
obviously nothing like
the neat one-to-one mappings used in computer
databases.
Memories survive
physical damage, removal of half the brain, and
the
superposition of
seemingly infinite amounts of additional
storage. What is
going on? I
propose we not even try to answer precisely,
but rather
ask "what is the
character of what is known about the cortex",
and see
how far we can go with
simulation.
We know the cortex is a system which has a
current
state, that this state
evolves with time, and that the cortex adjusts,
according to
input, its temporary
and permanent internal structures (or
circuitry, or
resonances, or RNA, or
something). We know that the system is capable
of treating
the same input the
same way, time after time - this is called
recognition.
Somehow, initially,
senses send information to the flexible
changing cortex
in a way that is
stable, detailed and repeatable, and something
physical
happens which
produces the same change of cortical state,
time after
time.
The definition of words according to axes in a
meaning-space might appear to
be an
attempt to establish a computer structure which can contain the
meanings of words. Although I briefly was
entranced by
this possibility, I
have come to understand that such a grandiose
and
difficult task is unlikely
even to be broached by such a simple construct.
However,
these detailed,
stable and
repeatable definitions absolutely do provide a program's
changeable, flexible memory with a detailed,
stable, and
repeatable system
for input encoding. The success of MS in making
the
actual meaning of words
available to a computer is unlikely - as
unlikely as it
is difficult to be
"correct" in one's construction of the axis
system and the definitions
included therein - but the utility of a stable
input
encoding mechanism has
been demonstrated throughout the history of
programs
created in this series.
No "correct" analysis of Gregorian chant was
performed in the creation of
CHANTER, and no suggestion of "understanding of
meaning" was contained in
the machinations of ECONOMIST, yet each was
able to
operate with some
facility in its chosen encoded realm. The
dictionary of
MS-defined words at
the very least creates a stable input path with
sufficient richness to allow
for differentiation amongst a wide variety of
objects.
The combination of a
few hundred axes of "meaning" with the few
dozen types of operators,
pointers, bombs, and so on, unquestionably
provides the
raw material for the
creation, by the stable and repeating
mechanisms of the
program, of a highly
ordered, entirely deterministic internal
tapestry. The
output routines that
interpret that tapestry for an observer, if
properly and
sufficiently tuned
by reinforcement, might allow the observer to
witness the
same sort of
filtered, processed regurgitation of that
elusive fantasy
- objective
reality - that we see every moment in ourselves
and in
each other.
Program
Structures
There are a number of program structures, not
directly
connected with
meaning-space or the definitions of words, that
appear to
be necessary for
this project.
Temporary
Definitions, Parallel Reasoning Structures and the BigBuffer
The definition of a word has both stable and
context-dependent parts. The
stable parts of the definition are part of the
word's
location in MS, but
the context-dependent parts can only be
determined at
run-time. For example,
"finding" is forced to be a noun by the
presence of an article - as in "The
judge authored a finding....". This means that
words
must be placed in
temporary buffers when they are "in play", as
it were, and part of the
pre-processing done on input must be able to
adjust
values in the buffer so
that the word can be interpreted properly for
the current
context.
Operations on the definition of the buffer
would only
affect that part of
the word's permanent definition which catalogs
what
objects (as in direct-
or indirect objects) or associations the word
is able to
form.
If a word is forced to exist as a part of
speech not
originally included in
its definition, then a new word must be placed
in the
database. There are
avenues by which the program would then be
forced to
construct questions
about the new word. First, there are certain
values which
must be present in
the definition of a word, purely for the
purpose of
managing the data
structure. The presence of unknown values -
which would
result from the
creation of a new word by the program - is easy
to
recognize as a focus
point for the program's attention. Second, the
history
that the program
keeps of its activity assigns dates of usage to
words,
and a new word's low
age can serve as another signal to the
program's
attention-focusing routine.
Third, a new word would have no poles toward
which it
could vibrate. A
motionless object is easy to pick out among a
large
number of familiar - and
therefore oscillating - entities.
The use of temporary buffers for words'
context-driven
temporary definition
parameters is a part of another necessary
organizational
structure for a
program like this. It must be possible
sometimes for the
program to keep
track of two branches of a conversation, and
eventually
return to an earlier
branching point. For instance, the appearance
of new
information will
usually force a somewhat standard set of
questions to
appear, and the posing
and answering of these represent a side-branch
of a
conversation. To be able
to return to that point in the conversation at
which the
side-branch
originated, the program has to keep a sort of
historical
tree that specifies
which sentences belong in which branch. This
parallel
reasoning structure
would have to be used if the opinions of two
humans had
to be analyzed
("George said apples are red, but John says
they
taste good.") Parallel
grammatical structure of adjacent input
phrases, and a
number of signal
words (but, however, either) serve as signals
that such
branches are needed.
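A historical tree of this kind needs very little machinery, as the following sketch suggests. The class names, method names, and topics are hypothetical; the sketch records only which sentences belong to which branch and how to return to the point at which a side-branch began.

```python
class Branch:
    """One branch of the conversation; children are side-branches."""

    def __init__(self, topic, parent=None):
        self.topic = topic
        self.parent = parent
        self.sentences = []
        self.children = []

class ConversationTree:
    def __init__(self, topic):
        self.current = Branch(topic)

    def say(self, sentence):
        """Record a sentence in whichever branch is currently active."""
        self.current.sentences.append(sentence)

    def open_branch(self, topic):
        """Start a side-branch at the current point in the conversation."""
        child = Branch(topic, parent=self.current)
        self.current.children.append(child)
        self.current = child

    def close_branch(self):
        """Return to the branch in which the current side-branch originated."""
        if self.current.parent is not None:
            self.current = self.current.parent

tree = ConversationTree("apples")
tree.say("George said apples are red.")
tree.open_branch("unknown word")      # e.g. triggered by new information
tree.say("What does 'red' mean?")
tree.close_branch()
tree.say("John says they taste good.")
```

After the side-branch closes, the main branch holds the two opinions in order and the question about the unknown word sits in its own child branch.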
Finally, there are two parts of the program's
function
which require that a
very large buffer be kept, which will
essentially list
the elements which
are currently active. First, the idea that a
region of MS
is to be "lit up"
by the explosion of a bomb, as well as the idea
that
clouds in MS might
interact on their own - that is, as
sub-conscious
processes outside of
reply-formation - require that the algorithm be
able to perceive that some
subset of all MS locations are currently
available, or
are currently active.
I know of no way for a computer to globally
scan its
memory and pick out all
the points which have some characteristic,
outside of the
kind of
brute-force search which the PURR-PUSS memory
system is
specifically
designed to obviate. If an MS element is to be
"lit
up" and therefore is to
draw attention to itself, it must be present on
a short
list which can be
efficiently scanned. (The brain does all this
really well
- anything can be
"lit up" and our attention drawn to it. This is
the definition of focus, or
attention.) This list will be large - probably
thousands
of elements - but
it will not grow exponentially as the program
progresses,
and its items will
"age" and "die" as "time"
passes, allowing for a stable buffer size to be
maintained. This is a programming element which
is a part
of the
housekeeping necessary for other parts of the
algorithm
to function, but
which has
little "meaning" of its own. It would be reaching to assert
some
sort of similarity to short-term memory.
Of course, it
may turn out that the most efficient implementation of this
buffer would still involve the PURR-PUSS
storage/memory system: in
this case, the input to the memory-probing
routines would just be the label
"BigBuffer", and it would then retrieve a
series of items' labels - these
would be the items currently "lit up". Each
time the buffer is accessed,
its members' "ages" would be incremented, and
sufficiently unused ones
(this is the same as "sufficiently old", since
use of an element resets its "age")
could be removed. If one of the tricks of
PURR-PUSS storage is used,
then it could be easy to retrieve subsets of
items in the buffer simply by
including the limiting match in the input to
the probing routines.
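The aging behavior described for this buffer can be sketched as follows. This is a plain-dictionary illustration rather than the PURR-PUSS storage itself; the class name ActiveBuffer and the max_age threshold are assumptions made for the example.

```python
class ActiveBuffer:
    """Short list of currently "lit up" MS elements, with aging."""

    def __init__(self, max_age=3):
        self.max_age = max_age      # hypothetical eviction threshold
        self.ages = {}              # element label -> accesses since last use

    def light_up(self, label):
        """Using (or re-using) an element resets its age to zero."""
        self.ages[label] = 0

    def scan(self):
        """Each access ages every member, then drops those that are too old."""
        for label in list(self.ages):
            self.ages[label] += 1
            if self.ages[label] > self.max_age:
                del self.ages[label]
        return sorted(self.ages)


buf = ActiveBuffer(max_age=2)
buf.light_up("apple")
buf.light_up("pet")
buf.scan()            # both age to 1
buf.light_up("pet")   # use resets "pet" to 0
buf.scan()            # apple -> 2, pet -> 1
active = buf.scan()   # apple -> 3 and is removed, pet -> 2
```

Because unused items die off at the same rate that new ones are lit up, the buffer stays at a roughly stable size.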
Pre-processing
The user's input must be parsed and
pre-processed.
Musical input would have
to be decomposed into rhythmic & pitch
elements.
Input from proprioceptive
sensors would have to be encoded. As an example
in the
linguistic area,
pronoun antecedents must be determined, either
by
analysis or by formulating
questions.
In initial formulations, I have decided not to
bother
with verb forms, but
rather to reduce all conversationally used
verbs to a
schematic
representation such as "to go: past plural conditional". Additionally,
many, many single spellings map to
multiple
meanings. I have also decided
not to trouble the computer with this problem
at the
outset, but to
make sure that every word taken as input has a
unique
definition. "Mean"
will mean "nasty" when it is given to the
program as "mean2" and it will
mean "intend definitionally" when given to the
program as "mean1". Of course
these are essential aspects of "natural
language", but one must crawl before one
runs.
Reply Formation and the Internal Conversation
After all the preprocessing is complete, the
program will
be in a position
to "construct a reply". If
the reply is to be a continuation of
a musical
idea being cooperatively constructed, then the
procedure
is simple and very
similar to that used by CHANTER. The linguistic
process
will result in one
of a number of outcomes, eventually arriving at
output;
there will usually
have to be a
good deal of initial internal conversation. For instance, a
simple PUSS (see below) will determine whether
input has
been seen before,
and what responses were previously generated.
If a
question has been asked
and answered before, the program must recognize
this and
either repeat the
previous response, asserting that it is a
repetition, or
it must take the
previous response series as new input so that
new output
would result. (This
routine would also naturally prevent excessive
repetition
of musical ideas
already used, or inappropriate looping of
physical moves.)
Even the simplest conversation requires
considerable
internal discussion on
each side. If you tell me "Spot ran home" I
know that we're talking about
the past, that Spot wasn't home before this,
that Spot
was at least at home
for a moment at some point after running there,
that you
can see Spot or
that someone told you about him, and so on. I
expect the
program to spend a
good deal of time talking to itself like this,
and that
its arrival at
output will be like the arrival of CHANTER at a
series of
pitches and
symbols known to be "cadential", that is, ones
which signal an ending. On
the other hand, since we humans will be able to
watch the
entire process as
it happens, we can intervene whenever we choose
-
whenever we think
sufficient work has been done that the current
ur-path
looks like output.
Since the algorithm fundamentally operates by
making
predictions based on
past experience, each time the Teacher
intervenes, the
algorithm will learn
something about "when it is appropriate to
stop"
and say
something "out loud".
Eventually, then, as the program learns other
stuff, it
will also learn to
stop and output its current state. (The
intervention by
Teacher in the
middle of internal calculations may be more
like
schizophrenic "voices" than
like a conversational interruption. Who knows
if this
might not turn out to
be counterproductive?)
The program will utilize a large number of
different
methods of finding or
creating relationships among words and phrases,
and a
number of methods of
getting these relationships into and out of
conversation.
These methods form
the core of the program's function.
Conversation
inherently includes
mechanisms for reinforcement (a mother
responds to her
child in ways that
tell the child whether its "output" is
appropriate, correct, useful, etc.)
and this reinforcement - positive and negative
- will be
stored and used to
influence the process in the future. Presumably
the
various methods will
have different utility under different
circumstances, and
the reinforcement
regime will allow the learning of which methods
are best
applied under
various circumstances.
The Program's Cycle
When not engaged by a human, sub-conscious
processes will
run. Various
"crawlers" (inspired by internet search
engines) will peruse the database,
establishing interrelationships of various
sorts
described elsewhere. These
processes will search for matching patterns,
compare
definitions and add the
results of comparisons to the database, perform
iterative
forward and
backward chaining searches to establish remote
relations
between words in
the manner of a neural net, allow regions of MS
to
interact according to
their intrinsic parameters and resonances,
manage the
list of questions that
need to be
asked of Teacher, construct word-primitives from word
definitions or add phrases that are
sufficiently
repeated to be used
as single words, and so on.
Pattern recognition is an essential first step
in many of
the processes to
be used. The model for this is the antibody:
crawlers
will search both input
and the database for objects which fuzzily
match items
that have already
been seen. These crawlers, and other
pattern-matching
processes, operate
like antibodies, that float throughout
biological
systems and bind
selectively to objects whose structure fits the
antibody's receptor
proteins. The various fuzzy-data techniques
will allow
the pattern
recognizers to bind not only to precise
matches, but
also, more weakly, to
objects which only partially match the optimal
target.
When engaged by a human user, these processes
will be
interrupted, and the
program will either "listen to a story" - that
is, take input without
replying - or it will attempt to engage in
conversation.
This "conversation"
may involve linguistic, musical, or
virtual-physical
events. The attempt to
reply will involve a number of stages,
including pre-processing
and parsing,
internal conversation, reply formation, reply
testing,
and the management of
repetition.
Managing Reinforcement and Conditioning
Computer programs inevitably involve explicit
machine
instructions and
precisely defined memory management, and thus
one is
always in the position
of being able to keep a complete record of
program
behavior. In this
project, constant interaction with humans
provides a flow of positive and
negative reinforcement coming to the program.
Together,
the ability to
maintain a history, and the presence of
reinforcement,
mean that program
behaviors can be associated with rewards and
punishment
in very detailed
ways. In fact almost all of the algorithm's
decisions
about which methods to
apply, under which circumstances, will be
formed by long
term, statistical
summing of inferred approval.
It is expected that a great deal of direct
reinforcing
input will occur
naturally in the behavior of Teacher, as the
program
sputters and stumbles
through its early attempts at behavior. Any
reply from a
human that starts "No, ..." or "Yes, that's right, ..." is an unambiguous
message that the program's behavior has failed or succeeded, respectively.
Additionally,
any continuation
of a conversational tack (without some
corrective or
negative word) is itself
intrinsically positive - in answering
a statement that is nonsense, no
reply is
possible that is honest, kind, sane, and
non-humorous
except one
that somehow
recognizes the nonsensical nature of the
stimulus. It
will be amusing (!) to
watch instantiations of this program be driven
insane by
ill-conceived or
incompetently executed habits of reinforcement.
Or maybe we'll just do it for fun.
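A minimal sketch of how approval might be inferred from the opening of Teacher's reply is given here. The function name and the numeric weights are hypothetical; the only ideas taken from the text are that "No" is negative, "Yes" is positive, and an uncorrected continuation is mildly positive.

```python
import re


def infer_reinforcement(reply):
    """Return -1.0 for "No, ...", +1.0 for "Yes, ...", else a weak positive."""
    first_word = re.match(r"\W*(\w*)", reply.lower()).group(1)
    if first_word == "no":
        return -1.0
    if first_word == "yes":
        return 1.0
    return 0.25     # hypothetical weight: continuing the tack is mildly positive


signals = [infer_reinforcement(r) for r in (
    "No, apples aren't pets!",
    "Yes, that's right, Spot ran home.",
    "And where is Spot now?",
)]
```

Summing such values over a long history is one simple reading of "statistical summing of inferred approval".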
A biological organism's infantile behavior is
heavily
influenced by rewards
and punishments that are central to aspects of
its
existence. Unfortunately,
eating and pain are things that computers just
don't do.
These sorts of
motivators provide the background raison d'etre
for much
of the logic that
underlies the behavior of biological organisms.
How shall
our artificial
behavior-generation deal with the absence of
these
intrinsic motivations? It
seems completely "cheap" or "like
cheating" simply to write computer
instructions that say "try to find food" and
then assert that one has
created a computer program that experiences
hunger.
A start towards finding a useful understanding
of
reinforcement can be the
ideas from 19th Century philosophers involving
input
limitations for human
minds. The central point is "all information -
all
input - that a human mind
receives is mediated by the senses". It is
pretty
well known how sensory
receptors work, and pretty well known how that
input is
encoded by the
sensors themselves. Although it might be
technically
beyond us at the
moment, one can easily imagine intervening in
the
process, and sending to a
brain artificially
created sensory information.
It would therefore be possible completely to
modularize the
function of the
system's elements: there's a sensor, there's
the message
it sends, and
there's "central processing". If all input is
sensory, then it is reasonable
to assert all events taken to be reinforcers
reach
central processing as
sensory input. This is half of what we need: if
it's
correct, then the input
path to reinforcement can be automated, and
there's no
philosophical
"problem" about artificiality or motivation.
The other half requires that the modularity
postulated
for sensory pathways
exist as well within what I refer to as "central processing". This
very
well may be an oversimplification, but it's a
useful way
out of the problem
of defining motivation and reinforcement
described above.
If the pleasure
resulting from the satisfaction of
hunger is isolated neurologically, then
we can just as well imagine intervening in that
set of
messages as we can
between sensors and processors. If this is
true, then a
baby couldn't tell
the difference between its own perception of
pleasure and
the perception of
artificially induced pleasure. If this is true
then we
need not concern
ourselves with the artificiality of motivation
in computer
programs. We are
"free" to define motivators and to set up
definitions of pleasure, and
tropisms towards pleasure, without the danger
of creating
structures (in
imitation of biological ones) only by
"cheating". At some point in its life,
a baby "realizes" that Mommy
"wants" it to say
things. Therefore it tries
to talk. We can just assert to our program
"responding to input is a
positive thing" and add this fact to its
database of
behavior probabilities,
without worrying that this is any different
from smiling
at a child that
says "dada".
Structures for the Implementation of Reinforcement
Neither single characteristics nor simple actions produce rich enough
results
for their reinforcement to be particularly
useful. The simple existence of a
light-sensitive patch on a flatworm would have
no value to the individual
and could not therefore participate in natural
selection. Such a structure
would have to be connected somehow to the
worm's behavior, and this
requires much more information than that
required to construct the patch.
It is necessary for combinations of apparently
independent events to be
reinforced as a group.
On the other hand complete organisms are much,
much too complex for
reinforcement to work. When the complete
organism receives positive
reinforcement, or when it survives and
reproduces, there is no way for
any process to separate out which of that
individual organism's properties
were responsible, and how much of which
properties were active.
Mother nature solved this problem in early
evolution by selecting not for
expressed characteristics or behaviors but for
the genes themselves. The
genes represent just that level of complexity
most practical for reinforcement:
complex enough to be connected dependably with
specific events, but local
enough to be dependably held responsible.
In this algorithm, the analog for the
light-sensitive patch would be a single
object, or a single process. The program
has hundreds of such simple
information-containing or behavioral options,
and even larger numbers of
parameters to vary in the process of carrying
out procedures. Small instruction
vectors exist that direct the details of
processes the program needs to carry out.
These vectors - containing roughly 16 bits
- are sufficient to direct one action.
They are structured exactly like words and
paths, so that they may be used to
construct sentences, and can be constructed
from them, and so they can be
contained as parts of words or the various
types of summed objects that
occur. They are also cataloged and stored
like words.
It is these objects that build up a useful repertoire of reinforced
behavior.
The program keeps a record of which vectors
were used in the construction
of any object or path, in conjunction with
details of the current situation. When
sufficiently similar situations occur again,
the levels of reinforcement of the
available vectors will affect their selection.
(Of course, the reinforcement of
one vector will cause reinforcement of other
vectors to the extent they are
"similar" - in other words, depending on their
"distance" apart in MS.)
Structures for Handling Cognitive Dissonance and "Being Wrong"
A particularly thorny area is what to do when
the program
says something
incorrect, and then receives correct
information from
Teacher. Because of
their inexperience, it is natural that children
are
particularly prone both
to making inaccurate statements and to holding
incorrect
beliefs. In
response to this, teachers are equally prone to
making
statements which are
intended to contradict or to correct wrong
ideas.
A central goal of education, growing-up, and
training
computer programs like
this one, is that errors not persist or recur.
Three
things must happen if
this persistence is to be prevented. First, an
appropriate grammatical reply
has to be formulated to keep the conversation
going.
Second, the program's
internal response must recognize the nature of
the
contradictory input or
the mis-match between some aspect of the input
and the
program's data.
Finally, the database must be adjusted.
The first of these is expected to be taken care
of by the
program's usual
reply-formation routine, and should prevent one
type of
error persistence.
Consider the following exchange:
sentence #1: program: "Apples make great pets!"
sentence #2: Teacher: "No, apples aren't pets!"
sentence #3: program: "Apples not in [pets]?"
Sentence #3 is easy to generate because word
definitions
include
operator-pointers to classes to which the words
belong,
and to other words
that belong in classes defined by the words
themselves.
Having these two
replies to sentence #1 in its historical
record,
forward-chaining would
prevent the program from repeating the incorrect
statement
of sentence #1,
even though the reply-formation does nothing to
correct
the bad database
information or association which allowed the
incorrect
sentence in the first
place.
The program's
pre-processing mechanism will have options
for recognizing the
nature of the error. The first possible route
would be to
find a two-stage
connection. "Pets" has in its definition the
word "animal", and "animal" has
in its definition "motile", while
"apples" has a different, contradictory
value in the axis related to movement. This
sort of
multi-stage
cognitive-dissonance discovery, so easy and
clear to us,
would require a
procedurally useful understanding of the verb
"to
be" and a laborious
comparison of a large number of sets of axes.
Much
simpler and quicker is
the simple recognition that "pets" is a class
word, and that it has no
bomb to "apple". The program then
experiences no actual cognitive
dissonance, only the comparatively innocuous
lack of an
expected pointer. A
simple question generator then attempts to
confirm that
this lack of a
pointer is in fact meaningful - there is no
pointer there
because "apples"
is not in
the class [pets]. The confirmation of this fact would place a
negative bomb from [pets] to "apples", that
would inhibit the restatement of
the original erroneous sentence.
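The quicker route described above, a missing pointer followed by a confirming question and a negative bomb, might look like this in outline. The dictionary layout and function names are hypothetical, and a signed integer stands in for a bomb.

```python
class_members = {"pets": {"dog": +1, "cat": +1}}   # class -> {word: pointer sign}


def check_membership(word, cls):
    """Return 'yes', 'no', or a question to put to Teacher."""
    sign = class_members.get(cls, {}).get(word)
    if sign is None:                      # no pointer at all: ask, don't assume
        return f"{word.capitalize()} not in [{cls}]?"
    return "yes" if sign > 0 else "no"


def confirm_not_member(word, cls):
    """Teacher confirmed the missing pointer is meaningful: store a negative one."""
    class_members.setdefault(cls, {})[word] = -1


def may_assert_member(word, cls):
    """A negative pointer inhibits restating the erroneous sentence."""
    return class_members.get(cls, {}).get(word, 0) >= 0


question = check_membership("apples", "pets")
allowed_before = may_assert_member("apples", "pets")
confirm_not_member("apples", "pets")
allowed_after = may_assert_member("apples", "pets")
```

No comparison of axes is needed here; the only test is whether a pointer exists and what sign it carries.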
The addition of such a negative-reinforcer in
the
database doesn't address
the need to remove whatever resource allowed
the initial
erroneous
assertion. If the stored history of procedures
hasn't
recorded that the
incorrect assertion was the result of a random
choice
(such as the flailing
which is part of "starting from scratch"), then
it will have stored the
vibratory modes or transform involved in
generating the
assertion. These can
then be inhibited through the usual
reinforcement regime.
Finally there remains the possibility that the
database
of definitions
contains an error. As programming begins, I
have assessed
the need for
database protection to be greater than the need
for
automated database
adjustment, and thus I have allowed writing to
the
definition database
itself to take place only upon receiving
explicit answers
to standardized
questions such as "Should I change this value
in the
database now?"
Structures for Functioning without Meaning
Many human users of language have limited
abilities to
communicate precisely
about "meaning". Additionally I suspect few
three-year-olds could tell you
in paraphrase what it means to "want"
something. Therefore it is natural to
conclude that communicable, internalized
"meaning", consisting of logical
definitions, is only one of a number of means
toward the
end of imitating
reasonable conversation. I opine that this mode
is
"late" - both in the
development of an individual's conversational
capabilities, and also in the
biological evolution of our species'
linguistics.
This conclusion forces another: an effective
imitator of
human conversation
(if not of pure human reasoning) is unlikely to
function
if it relies
exclusively on that "late" development.
Fortunately, there are aspects of
"earlier" behavior generators which are
substantially simpler to model than
meaning. These are not of any great interest,
but they
are necessary parts
of a program that is expected to generate
behavior. For
example, imitation
involves a good deal more than just aping,
parroting or
echolalia. It's a
fair bit of work even to write a program that
remembers
enough of training
conversations to imitate them. See CONVERSOR
and CHANTER below.
Keeping Track and Post hoc, Propter hoc
A history must be kept, of what has been said,
by whom,
and when. Proximity
of words in Teacher's input is an important
first method
for establishing
associations. Surely one of the reasons we have
such
trouble with "post hoc,
propter hoc" is that it is one of a baby's
first
ways of forming
associations, and that as such, this invalid
reasoning
method gets applied
all sorts of times when it shouldn't.
------------------------------------------------------------------------------------------
The History of Earlier Experiments
The series of programs leading up to the
current project
provided some
contact with ideas essential to the algorithm
as it is
currently conceived.
Specifically:
- distributed content-addressable memory (Purr-Puss)
- pattern recognition (character recognition)
- large multi-dimensional feature spaces (Organism)
- production systems & cellular automata (insect simulations)
- machine learning (automatic extraction of rules from an environment - Economist)
- creative behavior generation, originality & expert systems (Chanter)
- simplification of grammars for computer representation (Conversor)
OTHELLO - 1982
"Othello", also called "Reversi", is a simple game played on an 8x8 board,
onto which counters are placed alternately by
two
players. All that matters
for this project is that the edges of the board
are 1) of
special importance
to play and 2) are all equivalent (such that a
strategy
useful on one edge
is equally useful on the others.)
Board spaces may be empty or belong to
one of two players - these are the only three
possibilities. Therefore, the
state of each space may be represented by a ternary digit (a piece of data
consisting of 0, 1, or 2), and every possible state of an edge can therefore
be represented by a number from 0 to 3^8 - 1.
This is not a
particularly large
number, and a data array can be declared with
3^8 places
for numbers from 1
to 8 (representing the 8 places on the edge).
First the
system is set up
so that the computer watches people play. Every
time a
player makes a move
on a non-empty edge, we store their move, that
is, the
number representing
the space they moved into;
that move is stored at a location in the data
array determined by the state of that edge
before the
move was made. In this
way a database of moves onto edges is created,
that is
indexed by numbers
calculated from the state of the board before
the move
was made.
Look at the board.
What you see tells you where to store.
What the expert player actually does tells you what to store.
Later, after lots of moves are stored, when you
look at
the board ("looking"
tells you where something relevant might be
stored) you
can look in your
data array, and if you find anything in the
array (at the
memory location
which the board tells you to look in),
then you know
that the
datum stored
is a move that the expert player thought was
best, given
that board
configuration.
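The edge-indexing scheme can be sketched in a few lines. The function names are hypothetical, and the 8-bit original certainly did not look like this; the point is only that the edge state, read as a base-3 number, is itself the address.

```python
EDGE_STATES = 3 ** 8          # 6561 possible edge configurations
moves = [0] * EDGE_STATES     # 0 = nothing learned; otherwise a square 1..8


def edge_index(edge):
    """Read eight squares (0 empty, 1 or 2 for the players) as a base-3 number."""
    index = 0
    for square in edge:
        index = index * 3 + square
    return index


def learn(edge_before, square_played):
    """Store the observed move at the address given by the prior edge state."""
    moves[edge_index(edge_before)] = square_played


def recall(edge):
    """Direct lookup, no search; None if this edge was never seen."""
    return moves[edge_index(edge)] or None


edge = [0, 1, 2, 2, 0, 0, 0, 0]
learn(edge, 5)
suggestion = recall(edge)
```

Looking up a move costs the same whether one edge or six thousand have been stored.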
Various excellent computer programs exist now
against
which humans may play,
but if one is limited to 8-bit processor
technology and
32 kilobytes of
memory,
the capabilities of possible programs are severely limited.
Using
this stored database of edge moves improved the
operation
of one such
program substantially.
Two things are important to note, for the
purposes of the
history this paper
recounts: 1) little understanding of the game
was
required, nor was any
programming cleverness needed - this method of
improving
the play of an
extant program relies entirely on
un-intelligent brute
force; 2) no
searching is required when looking up a move to
use - the
current state of
the board directly provides the location of the
stored
move. "Search" is a
major subject in basic computer programming,
and is an
area in which the
combinatorial explosion of task-duration can
become the
determining factor in
deciding whether or not a program can ever run
to
completion.
ROBOT - 1983
A small robot vehicle was built, using as a
platform a
steerable motorized
model tank. An umbilical connected the model to
an 8-bit
microcomputer. Two
sensors were used. First, a bumper mounted on
the front
sensed the presence
of an obstacle (and stopped the motors) when
the model ran into it, and
second,
a stepping-motor-mounted, sonar range-finding
device from a
Polaroid
development kit enabled the robot to scan the
surrounding
area, returning a
low resolution image of the shape of the
environment.
These two sensors
returned information about the area around the
tank to
the computer. Using
the same brute-force learning idea described
for OTHELLO,
this robot learned
to recognize all the locations to which it had
access,
and learned to get
from any one of those locations to any other.
Once again,
no analysis (in
this case, either of the sensors' behavior or
of the
environment itself) was
necessary to allow for the functions of the
robot to
proceed. This is
important to remember: even though the behavior
of the
sonar sensor was
extremely bizarre and seemed unpredictable to
us, the
learning algorithm
didn't care - as long as the sensor's behavior
was
reasonably similar each
time it looked at the same location from the
same angle,
its output was
perfectly usable.
CHARACTER RECOGNITION - 1984
Using a low resolution optical sensor mounted
on an x-y
scanner's transport,
information about small areas on a page was
read into
the same 8-bit
computer's memory. Using rote methods similar
to those of
ROBOT and OTHELLO,
and a transport-controlling algorithm similar
to ROBOT's
image-building
method, the program learned to recognize
typewritten
characters. This
scanner was initially intended to automate the
provision of input
data for ECONOMIST, another program described
below.
PURR-PUSS
The idea of content-addressability used in all
these
programs was inspired
by a program called PURR-PUSS, invented by
Andreae and
described in the book
"Thinking with the Teachable Machine". This
acronym stands for "programmable
un-primed rewardable robot with a predictor
using slides
and strings".
The locations in computer memory at which
OTHELLO's moves
were stored were
calculated from the current status of the
board.
Therefore, moves could be
found without searching a database of board configurations. Rather, the
board configuration itself told the algorithm
where to
look for a move. The
environmental images in ROBOT were encoded in
such a way
that allowed the
names of environmental locations to be stored
at memory
locations directly
calculable from the observed physical space -
no
searching through a list of
location-descriptions was required. And finally
in
recognizing characters,
the images themselves directed the computer to
memory
locations where
commands for moving the scanning sensor were
stored, or
where it would find
the names of characters recognized.
In each case, the algorithms were "un-primed":
that is, no analysis of the
relevant concepts was required to create an
algorithm
capable of functioning
in the environment.
Another interesting feature of PURR-PUSS is the
use of a
technique of memory
management in which information is stored in a
redundant
fashion, spread out
through available memory. Although the
mathematical
underpinnings of the
three systems are unrelated, PURR-PUSS,
holographic
storage, and the
cerebral cortex all share these properties. The
important
thing about the
PUSS memory-management algorithm is that it
provides
content-addressing,
like that described above for the three
programs, in a
way that allows
unlimited input strings (i.e., it is as if the
board
positions in OTHELLO
could be as large and complex as you like) and
can
squeeze very large
amounts of data (amounts which don't have to be
specified
in advance) into
large, mostly empty computer memories. The
original PURR-PUSS algorithm
can also be modified 1) to return ordered
series
of objects that are of unspecified (and
essentially unlimited) length, 2) so that
stored
memories can be erased, 3) so that the
output of some PUSSes may be used as input for
others, and 4) so that
conditions that resulted in storage of
information may themselves be recalled.
ECONOMIST - 1985
Each of the applications described can be
viewed as
performing the following
task:
1) look around
2) calculate a memory location from the image received
3) examine that memory location - if it is empty, ask Teacher for input. If
it is full, return the information to the algorithm
4) execute whatever commands are either found in memory or received from
Teacher; this changes the environment, making it necessary to return to
step one
OTHELLO:
1) Examine the board
2) Calculate a number from 0 to 3^8 - 1
3) Find a learned move, or ask Teacher for a good one
4) Execute the move
ROBOT:
1) Scan the area
2) Calculate an address
3) If the spot is known, return its name
4) Execute the move, or command the robot to move and re-scan
CHARACTER RECOGNITION:
1) Read out from the imager
2) Calculate an address
3) If the image is a known object, return its name, or move on to a new region
4) Execute the move, or command the scanner to move the sensor to examine
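The four-step task shared by these programs can be written as one loop, sketched below. The toy environment (a position that cycles through 0, 1, 2) and all the function names are invented for the example, and a dictionary stands in for the array of computed memory locations.

```python
def run_cycle(look, address_of, memory, ask_teacher, execute, steps):
    """Repeat: look, compute an address, recall or ask Teacher, execute."""
    for _ in range(steps):
        image = look()                       # 1) look around
        address = address_of(image)          # 2) address from the image
        if address not in memory:            # 3) empty: ask Teacher and store
            memory[address] = ask_teacher(image)
        execute(memory[address])             # 4) act, changing the environment


# Toy environment: a position that cycles 0, 1, 2; a command is an increment.
state = {"pos": 0}
asked = []


def look():
    return state["pos"]


def ask_teacher(image):
    asked.append(image)
    return 1                                 # Teacher always says "advance by one"


def execute(command):
    state["pos"] = (state["pos"] + command) % 3


memory = {}
run_cycle(look, lambda image: image, memory, ask_teacher, execute, steps=6)
```

Teacher is consulted only the first time each situation is seen; on the second pass around the cycle every command comes from memory.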
The three programs described so far each looked
at single
events (either board
configurations, images of the robot's
environment, or
images from the
scanner) and stored information about those
isolated
events. Another task
structure to which PURR-PUSS is suited involves
series of
numbers. Take for
example the following series of numbers, as if
printed on
a paper tape:
1 2 3 1 2 3 1 2 3
Nothing is more primeval for a computer than
the
examination of successive
spots on a tape! Suppose we set up our system
so that it
can "see" two
numbers at a time, and that it can move along
the tape.
It is useful to
imagine a transparent "window" moving along,
which is able to see only a
fixed amount of the tape. Now, use what you see
in the
window to define a
memory location. After you look at your current
window,
and access the
memory location associated with it, you slide
the window
along one more
place, revealing the next number. This number
is what you
store at the
memory location you defined by the window as it
was
before you slid the
window along. Functionally the two numbers in
the window
are used to predict
single numbers that follow, and the ability to
make
predictions
arises from a
brute-force examination of a training set.
In the case defined here, only three memory
locations
would be used. The
simplest formulation would be to set up 31
memory
locations, and then
interpret whatever you see in the window as a
decimal
number: 12, 23, or 31.
As you slide the window along, every time you
see
"12" you would store a "3"
in the 12th memory location (or increment a
counter held
there), every time
you see "23" you would store a "1" in
the 23rd, and every time you see "31"
you store a "2" in the 31st. After going along
three places, you would be
able to predict what the next number was going
to be by
looking at the
current window, then looking in that memory
location.
This is a PUSS: a
predictor using slides and strings.
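The tape example can be run directly. In this sketch the memory is a Python list of 32 slots, indexed 0 to 31, so that location 31 exists; that sizing and the function names are choices made for the illustration.

```python
def train(tape, memory_size=32):
    """Slide a two-number window along the tape, storing what follows it."""
    memory = [None] * memory_size             # only 12, 23 and 31 get used here
    for i in range(len(tape) - 2):
        address = tape[i] * 10 + tape[i + 1]  # read the window as a decimal number
        memory[address] = tape[i + 2]         # store the number revealed next
    return memory


def predict(memory, window):
    """Look up the location named by the current window."""
    return memory[window[0] * 10 + window[1]]


tape = [1, 2, 3, 1, 2, 3, 1, 2, 3]
memory = train(tape)
used = [address for address, value in enumerate(memory) if value is not None]
next_number = predict(memory, (2, 3))
```

Only three of the locations are ever written, which is the sense in which such memories are large and mostly empty.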
This view of the task, as making predictions
about the
next appropriate
number in a series already established,
suggested to us
that the method be
applied to economic time series. (At this time
the
project involved three
people: myself, an electrical engineer, and a
recent
emigre from Russia, a
physicist, who was "hot" for the stock market.)
For example, take monthly
interest rates for the last 10 years. Set the
algorithm
to look at index n
(representing, say, April, 1975). Use data
points (n-1,
n-2,
n-3,....) as environmental data to calculate an
address
in memory. Store the
actual value that occurs in the known sequence
(data
point n) in that memory
location. Increment the counter and continue. In
this way,
a database of
predictions is built up, and if, this
month, the values
of the last few
months point at a full memory location (meaning
you had
seen this month's
economic environment before) then you have a
prediction
to make about next
month's value for that time series. This is
roughly equivalent to an
autocorrelation calculation on a function.
This method lends itself easily to the
combination of
data streams. Multiple
time series can be used to provide
environmental
information (for memory
calculations) allowing storage of any other
time series.
Thus the algorithm
brings about a correlation of numerous data
streams for
the prediction of
one. This is similar to calculating
coefficients of
correlation among
multiple data streams, but can be more
flexible. To
describe it in words:
one could use a series of the most recent
interest rate
figures to predict
the next interest rate, or you could use: 1) last month's interest rate,
plus 2) the same month's price of gold, and
3) the last six
weeks' closing
Dow, to predict the average temperature in
Chicago. Any
combination of time
series data can be used to attempt to predict
any other
time series.
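One way to picture the combination of streams is sketched here, under assumptions the text does not settle. Real-valued figures are quantized into integer bins before being combined into an address (the text does not say how figures become addresses), the names bucket, address_at and train are hypothetical, and the data are invented toy values.

```python
from collections import Counter, defaultdict

def bucket(value, step):
    """Quantize a figure into an integer bin (an assumed way of forming addresses)."""
    return int(value // step)

def address_at(t, streams, steps, lags):
    """Combine lagged, quantized values from every stream into one memory address.
    lags[k] lists how far back to look in stream k (1 = the previous period)."""
    return tuple(bucket(stream[t - lag], step)
                 for stream, step, stream_lags in zip(streams, steps, lags)
                 for lag in stream_lags)

def train(streams, steps, lags, target):
    """Store, at each address, what the target series actually did."""
    memory = defaultdict(Counter)
    first = max(max(stream_lags) for stream_lags in lags)
    for t in range(first, len(target)):
        memory[address_at(t, streams, steps, lags)][target[t]] += 1
    return memory

# Invented toy data: two environmental streams and an unrelated target.
rates = [5.0, 5.5, 5.0, 5.5, 5.0, 5.5]
gold = [300, 310, 300, 310, 300, 310]
temperature = ["cold", "warm", "cold", "warm", "cold", "warm"]

memory = train([rates, gold], steps=[0.25, 10], lags=[[1], [1]], target=temperature)
```

Adding a stream, or a longer run of lags for one stream, only changes the lists passed in; the storage step stays the same.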
What is most advantageous is that the method
can be
tested to whatever
extent available data allows. A window can be
defined,
and then historical
data is allowed to percolate through the system
- if any
predictions are
made along the way, they can be compared with
values
which actually
occurred, thus providing an exact measure of
the window's
relationship to
the series one was trying to predict.
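The testing procedure can be sketched as a walk-forward loop: each prediction is scored against the value that actually occurred before that value is stored. The function name walk_forward and the choice to score only the most frequent stored follower are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def walk_forward(series, width):
    """Let `series` percolate through a window of `width` symbols.
    Returns (predictions made, predictions that matched the real value)."""
    memory = defaultdict(Counter)
    made = correct = 0
    for t in range(width, len(series)):
        seen = memory[tuple(series[t - width:t])]
        if seen:                                  # this environment was met before
            made += 1
            guess = seen.most_common(1)[0][0]
            correct += (guess == series[t])
        seen[series[t]] += 1                      # learn only after scoring
    return made, correct

made, correct = walk_forward([1, 2, 3] * 4, width=2)
```

The ratio of correct to made is the measure of the window's relationship to the series; a window that never repeats makes no predictions at all, which is itself informative.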
The output of a single PUSS module itself
represents a
fundamentally
different type of data stream. One can
therefore combine
information from
the environment with predictions made by some
module of
the program, to
create a multi-level structure that produces
much more
complex
inter-correlative behavior. Such multi-level
structures
may not correspond
obviously to any standard technique of
statistical
analysis. Suppose, for example,
that there is a target time-series one wishes to predict, and that one has access
to three other time series. One could simply define a window that includes
all four time series, but this creates large feature spaces and requires
an equally large amount of training.
Imagine, however, that a PUSS is used to make predictions based on two of
the available series. The predictions that it makes effectively combine the
information contained in the two windowed time series. This new series - the
predictions - can be used in a window for the prediction of the target series,
and the window will be compressed, allowing for a smaller amount of training
to succeed at making predictions. This is, of course, one of those
information-theory situations in which information is lost in the
transformation of data brought about by the 2-series PUSS, but the
availability of a technique that reduces the often vast sizes of the relevant
feature spaces can come in handy when the target system is sufficiently
deterministic. The relevant economic series, as we discovered, are not.
Chalk up another victory for the efficient-market hypothesis.
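A two-level arrangement of this kind might look as follows. This is a sketch under stated assumptions, not the original code: the class Puss and the function run_two_level are hypothetical names, the first level is trained on the target itself, and each window is a single lagged value per stream.

```python
from collections import Counter, defaultdict

class Puss:
    """One predictive module: addresses map to counts of what followed."""
    def __init__(self):
        self.memory = defaultdict(Counter)

    def store(self, address, value):
        self.memory[address][value] += 1

    def predict(self, address):
        counts = self.memory.get(address)
        return counts.most_common(1)[0][0] if counts else None

def run_two_level(a, b, c, target):
    """Level 1 compresses streams a and b into one prediction stream;
    level 2 windows that stream together with c to predict the target."""
    level1, level2 = Puss(), Puss()
    outputs = []
    for t in range(1, len(target)):
        addr1 = (a[t - 1], b[t - 1])
        summary = level1.predict(addr1)       # None until addr1 has been seen
        addr2 = (summary, c[t - 1])           # two digits instead of three
        outputs.append(level2.predict(addr2))
        level1.store(addr1, target[t])
        level2.store(addr2, target[t])
    return outputs

a = [0, 1, 0, 1, 0, 1, 0, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0] * 8
target = [0] + [x ^ y for x, y in zip(a, b)][:7]
outputs = run_two_level(a, b, c, target)
```

The second-level window holds two digits rather than three, which is the compression described above; whatever the first level discards is not available to the second.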
SMON - 1999
Before proceeding to discuss applications which
involve
some cybernetic
complexity, one rather mundane application of
this
memory-management
philosophy must be included. The act of
creating large
programs in the
PASCAL programming language requires a great
deal of
extremely repetitive
and, once it has been begun, predictable
behavior on the
part of the
programmer. This is exactly the sort of
behavior which
all these simulations
address, and it seemed silly not to use the
method to
lighten the load of
its own creation.
The goal was to write a program which resides
logically
between the user of
a computer and whatever software is being run.
The
acronym SMON means "smart
monitor" with this placement in mind. What is
needed
is a keystroke monitor
which remembers what has been done before, and
can offer
the user assistance
at a speed faster than the software user's own
typing.
The program
calculates memory addresses from short
sequences of
keystrokes as the user
types. Whenever a memory location defined is
not empty,
you know that the
sequence just typed has been typed before.
Because of the
details of the
PUSS storage algorithm, finding one prediction
can lead
directly to an
ordered series of stored data which can be
offered to the
user as a
prediction of what was about to be done.
Another detail
of the PUSS
procedures allows multiple variants to be
stored and
offered to the user
selectively, based on the frequency of their
previous
appearance. In this
way commonly typed sequences in programming can
signal
the smart monitor to
offer options automatically. These options
would be text
to be inserted,
automatically - that is, without typing every
character.
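The keystroke lookup can be illustrated with a small sketch. It is written in Python rather than PASCAL, it is not the SMON code, and the class name SmartMonitor, the three-key window and the max_len cut-off are all assumptions. Each short run of keystrokes addresses a counter of what was typed next; following the stored predictions one after another yields a whole continuation, and the counts rank the variants offered.

```python
from collections import Counter, defaultdict

class SmartMonitor:
    def __init__(self, width=3):
        self.width = width
        self.memory = defaultdict(Counter)    # last `width` keys -> next-key counts

    def learn(self, text):
        """Record, for every run of `width` keystrokes, the key typed next."""
        for i in range(len(text) - self.width):
            self.memory[text[i:i + self.width]][text[i + self.width]] += 1

    def suggest(self, typed, variants=2, max_len=40):
        """Offer up to `variants` continuations, most frequently seen first."""
        first = self.memory.get(typed[-self.width:])
        if not first:
            return []                          # empty location: never typed before
        offers = []
        for key, _ in first.most_common(variants):
            text = key
            while len(text) < max_len:
                nxt = self.memory.get((typed + text)[-self.width:])
                if not nxt:
                    break
                text += nxt.most_common(1)[0][0]
            offers.append(text)
        return offers

monitor = SmartMonitor()
for line in ["writeln(", "writeln(", "write("]:
    monitor.learn(line)
print(monitor.suggest("write"))   # ['ln(', '(']
```

The more frequently typed variant is offered first, so a single keystroke can accept the common case.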
A typical situation involves the decision to
insert, between routines
already written, code for a function or a
procedure. Such
an action
frequently occurs after the programmer has
written a
'calling' line in a
routine higher up in the stack. For example,
suppose the
programmer writes a
line such as
x:=FunctionName( y );
and then moves in the code-text to the place where functions
are kept, and inserts a couple of empty lines in preparation for encoding the
function. The smart monitor program would
then
offer the following two
options to the user:
Function FunctionName( temp:integer ) : integer;
var I:integer;
begin
FunctionName:=
end;
Procedure FunctionName( temp: integer );
var I:integer;
begin
end;
The options can be selected using single
keystrokes - a
substantial saving
in typing.
As it turned out, computer technology was not
quite up to
the requirements
of this program at the time of its creation. If
the same
internal processor
(Intel - 486) was used to run both the smart
monitor and
the target
software, an unacceptable overload occurred.
The monitor
program had to be
tested, and its final form contrived, using a
second
computer set up to act
as the keyboard for the computer being used to
run the
PASCAL editor. Since
the operation of a PC keyboard is hardly
trivial, an
extra hardware
component had to be inserted between the two
computers
which converted ASCII
from the monitor-computer's output port into a
keyboard-emulating series of
bytes. When this device, called a VETRA, was
used, the
two computers acted
in perfect concert to assist in writing
programs.
CHANTER - 1986
The combination of multiple data streams in
ECONOMIST
suggests the behavior
of grammars. The first grammar examined was
that of
Gregorian Chant
melodies. These melodies provide two data
streams as a
starting point: a
series of pitches, and a series of notational
symbols that are generally
taken to have rhythmic implications. A program
was
devised for composing new
melodies in the style of the training set
(which was
limited to Alleluiae in
the Phrygian mode). Essentially, the program
would scan
examples, storing
information about the rhythmic series and about
the pitch
series
independently. Predictive modules were soon
able to begin
producing output,
and this output provided a second level of
pre-processed,
or pre-correlated
data. After scanning "environments" - namely, series of pitch-rhythm
combinations -
and storing, at each opportunity, whatever the real melody
"did" in that "environment" - it was
possible to set the program running
from scratch, allowing its own predictions to
become a
new data stream,
which then became the "environment" that the
program used as a basis for
more predictions. This process was allowed to
run until
it predicted the
combination of
symbols known to represent the end of a melody. The
resultant series can be executed by human
performers, and
if the same
performers then execute examples from the
training set
("real" Gregorian
melodies), one is in a position to apply part
of the
Turing test for the
existence of artificial intelligence. Namely,
if the
source of the performed
examples cannot be identified by an observer -
that is,
if the
computer-generated examples cannot be
distinguished from
the examples from
the training set - then one of the conditions
for passage
of Turing's "test"
has been satisfied. Performance of the six
melodies in
the appendix
established that CHANTER had succeeded in this
regard.
No analysis of Gregorian melody was undertaken
to allow
the program to learn
to imitate the style. Likewise, little planning
of the
"compositional
process" was required. The technique is almost
as
much a brute-force,
rote-learning method, as was the learning of
moves in
OTHELLO. It begins to
become important with CHANTER that no human
understanding
of the systems
involved is required. This is important,
because,
starting with CHANTER, the
systems involved are becoming so complex that analyzing them
unambiguously is beyond current human capability.
Rule-based musical composition programs are fun
to play
with, but they
almost always generate painfully dull, nearly
featureless
output, which is
extremely easy to distinguish from examples in
the
training set. CHANTER had
to be equipped with one special step before it
began to
create examples that
were successful.
Although I am a very poor composer, I am an
experienced
and facile keyboard
improvisor of useful musical material (having
played for
hundreds of hours
of dance classes). The mental procedure I use
is
absolutely clear and
conscious: listen to what was just played, and,
using
years of experience
with music, "calculate" what "should"
come next. Play whatever should come
next, and then repeat these steps for as long
as
necessary. "Calculating
what should come next" is another way of saying
"imagine what might sound
right". In the process of improvising,
especially at
a fast tempo, one finds
that one is almost constantly trying to
"recover" from some unintended,
random accident of the hands.
A variant on this idea of "recovery" allowed
CHANTER to succeed. If all the
program ever does is "follow the rules" that it
has learned by examining the
training set, then it is missing one element of
the
compositional process: a
composer decides at every juncture whether or
not to
follow established
procedure, or to veer off into original,
inspired
territory. Having executed
an inspired move, however, great composers
usually return
to predictable
rule-based behavior, as if to "recover" from
their own inspiration. CHANTER
models this procedure by periodically relaxing
the
control of its behavior
by the rules in the database, and allowing a
rare or a
relatively random
event to occur. This is allowed to happen
rarely enough
that the program has
time to recover from the unusual move in ways
which match
the most standard
practices it has learned. Further, the
"inspired" moves are never allowed
either at the very beginning or near the end of
a melody.
The addition of
this procedure was the programming idea crucial
for the
creation of examples
that observers couldn't distinguish from the
training
set.
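The generate-and-recover loop can be sketched as follows. This is an illustration, not CHANTER itself: a single stream of note names stands in for the pitch and rhythm streams, the names train and compose are hypothetical, and the two restrictions are approximated by a protected opening of fixed length and by refusing an inspired move wherever the training set shows an ending to be possible. The opening passed to compose must be as long as the training window.

```python
import random
from collections import Counter, defaultdict

END = "END"

def train(melodies, width=2):
    """Store, for every window of `width` symbols, what the real melody did next."""
    memory = defaultdict(Counter)
    for melody in melodies:
        notes = list(melody) + [END]
        for i in range(len(notes) - width):
            memory[tuple(notes[i:i + width])][notes[i + width]] += 1
    return memory

def compose(memory, opening, p_inspired=0.1, protect=4, max_len=200, rng=random):
    """Extend `opening` with the program's own predictions until END is predicted."""
    melody = list(opening)
    width = len(opening)
    while len(melody) < max_len:
        options = memory.get(tuple(melody[-width:]))
        if not options:
            break                                  # empty location: nothing learned
        usual = options.most_common(1)[0][0]
        rare = [s for s in options if s not in (usual, END)]
        relaxed = (len(melody) >= protect          # never at the very beginning
                   and END not in options          # never where an ending is in sight
                   and rare and rng.random() < p_inspired)
        nxt = rng.choice(rare) if relaxed else usual
        if nxt == END:
            break
        melody.append(nxt)
    return melody

memory = train([["D", "E", "F", "E", "D"], ["D", "E", "G", "E", "D"]])
plain = compose(memory, ["D", "E"], p_inspired=0.0)
```

Even an inspired move is drawn from followers actually stored at the current location, so every transition in the output has some precedent in the training set.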
CONVERSOR - 1987
The next grammar to be examined was that of
natural
conversational language.
Two data streams are extracted from training
set
conversations, data sets
which are very similar to the two streams used in
CHANTER.
One stream consists
of numbers representing words. The second
stream consists
of the
parts-of-speech of those words. At its core,
CONVERSOR
first constructs a
series of parts-of-speech for a reply, and then
assigns
specific words to
those parts of speech. There is once again
considerable
interaction between
the two data streams, mediated by internal
predictive
modules at secondary
and tertiary levels. Once again, after
examining
thousands of words of
training, the program can be set running,
creating its
own independent
replies to comments made by a human. There is, within the system, a
rigorous prevention of plagiarism, so that no
reply of
more than a word or
two ever made by CONVERSOR was identical to one
from the
training set.
Rather, the program reconstructs its own sentences from scratch. Thus
CONVERSOR is completely different from programs that operate by remembering
previous conversations.
Of course, natural language is worlds beyond
Gregorian
melody in complexity,
or else our sensitivity to the definition of
"correct" with respect to
reasoning and grammatical construction is much
greater
than our sensitivity
to the definition of "correct" with respect to
a melody. In any case, the
results of CONVERSOR are considerably less
successful,
with respect to
Turing's test, than are those of CHANTER. The
"conversations" are only one
step more advanced than programs like "Eliza";
examples appear in the appendices.
CHANTER melodies may be viewed by going to
http://www2.potsdam.edu/lanzcc
and downloading the chanter.pdf file.
ORGANISM - 1992
Several small simulations were also created
using the
memory management
techniques described. The initial targets were
insect
behaviors that lend
themselves to treatment as cellular automata.
The idea is
to investigate the
level of complexity required to generate simple
collective and constructive
behaviors. These programs were begun in 1985
and
continued to be
investigated for several years.
During late summer afternoons, clouds of dozens of gnats can be seen
hovering and swirling about. In these clouds, individuals fly around in such
a way as to avoid clumping on a small scale (and presumably to avoid
collisions) while maintaining the coherence of the group as a whole. What
complexity of visual imaging, and how many
rules of
behavior, are required
to imitate this result? A simulation called FLY
examined
this question. As
screen-world flies move about, their simple eye
provides
an image of other
flies' positions. This image encodes a memory
location,
and in each cycle
through the individuals (during which they can
traverse
one screen pixel)
either the teacher provides a move, or else a
remembered
move directs the
fly's next motion. Within a few cycles,
individuals begin
moving on their
own, and by providing moves at each stage which
appear to
be appropriate for
avoiding collisions while retaining large-scale
cloud
coherence, Teacher's
input makes it possible for clouds with the
required
characteristics to
form. This program shows that screen-world flies require
flies require
only a few bits of
visual imaging and a small amount of
rule-memory to
imitate this behavior of
the real organisms. The result is chaotic -
that is, it
is entirely
deterministic, but non-periodic.
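A minimal sketch of one cycle of such a simulation follows. The four-bit eye, the radius, and the rule that the teacher is consulted only when a memory location is empty are all assumptions made for illustration, and encode_view and step are hypothetical names.

```python
def encode_view(fly, others, radius=3):
    """Pack a crude 4-bit image: is any other fly within `radius` to the
    north (1), east (2), south (4) or west (8)? Screen y grows downward."""
    x, y = fly
    bits = 0
    for ox, oy in others:
        dx, dy = ox - x, oy - y
        if (dx, dy) == (0, 0) or max(abs(dx), abs(dy)) > radius:
            continue
        if abs(dy) >= abs(dx):
            bits |= 1 if dy < 0 else 4
        else:
            bits |= 2 if dx > 0 else 8
    return bits

def step(flies, rules, teacher):
    """One cycle through all individuals; each moves at most one pixel per axis."""
    for i, fly in enumerate(flies):
        view = encode_view(fly, flies[:i] + flies[i + 1:])
        if view not in rules:
            rules[view] = teacher(view)    # teacher fills an empty memory location
        dx, dy = rules[view]
        flies[i] = (fly[0] + dx, fly[1] + dy)

def teacher(view):
    """A stand-in teacher: step away from a neighbor seen to the east or west."""
    return (-1, 0) if view & 2 else (1, 0) if view & 8 else (0, 0)

flies = [(0, 0), (1, 0)]
rules = {}
step(flies, rules, teacher)
```

With four bits of image there are at most sixteen memory locations, so the teacher's work is finished after a handful of cycles and the flies then move on their own.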
Similar simulations were constructed for a
spider
building an orb web (the
"spider" sees with its legs what has already
been built, and decides on that
basis whether or not to lay down new webbing or
to move
somewhere) and for a
colony of screen-world ants who dig a burrow,
dispose
of the excess dirt,
find food particles, and store them back at the
burrow.
Similar in structure was the program SOCCER
(1989). Screen-world soccer
players must behave in ways that can be
directed using
input paths and
window structures similar to the insect
simulations. The
goal of this
program was to show that interactive game logic
could be
addressed using the
same simulation ideas. In fact the behaviors
required for
effective tactical
cooperation required an expansion of the scope
of the
input encoding. In
programs involving environments less complex
than SOCCER,
it was possible to
assign single input parameters to each bit of
the
available input channels
(or to use PUSS terminology, to each digit of
the
window). For instance, one
Fly "eye" was always associated with a specific
bit of the input channel.
With SOCCER it became necessary to automate the
definition of input channels
so that they were associated with different
observations
under different
circumstances. This sounds rather more
complicated than
it is. The first
understanding should be easy: clearly, players
need to
think differently if
their team is currently on defense or offense.
This fact
is always
available, and thus input channel definitions
can safely
be changed back and
forth as possession changes. In fact stability
of these
input channel
definitions is of no importance - they can be
changed
whenever the
environment presents unambiguous changes of
circumstance
the scope of which
exceeds that of the input channel involved.
Such
flexibility in input
effectively compresses more data into the same
amount of
input channel
bandwidth. In this program, the factor of
compression was
approximately
four.
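The switching of input channel definitions can be sketched like this. The observation names are invented, only two contexts (offense and defense) are shown, and keeping the context alongside the bits in the memory key is an assumption of this sketch rather than a documented detail of SOCCER.

```python
# Hypothetical observations; each list gives a meaning to bits 2, 1, 0 of the window.
CHANNELS = {
    "offense": ["have_ball", "teammate_open", "goal_in_range"],
    "defense": ["opponent_near", "ball_near", "own_goal_threatened"],
}

def encode(state):
    """Return (context, bits): the same three bits carry different observations
    depending on which team has possession."""
    context = "offense" if state["our_possession"] else "defense"
    bits = 0
    for name in CHANNELS[context]:
        bits = (bits << 1) | int(bool(state[name]))
    return context, bits

def choose_move(rules, state, teacher_move=None):
    """Look up the remembered move; store the teacher's move if one is supplied."""
    key = encode(state)
    if teacher_move is not None:
        rules[key] = teacher_move
    return rules.get(key)

attacking = {"our_possession": True, "have_ball": True,
             "teammate_open": False, "goal_in_range": True}
defending = {"our_possession": False, "opponent_near": True,
             "ball_near": False, "own_goal_threatened": True}
```

Here three bits of channel serve six observations. A larger set of unambiguous circumstances would raise that ratio further.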
These initial simulations of cellular
individuals'
behavior were expanded
into the largest program yet completed, called
ORGANISM.
In this program,
the role of the teacher is eliminated, and is
replaced by
a system of
positive and negative reinforcement based on
natural
selection. Screen-world
entities, each with about 120 variable
characteristics
(for instance, each
individual falls somewhere in a spectrum of
tendencies to
reproduce sexually
as opposed to asexually), live in an
environment
including resources (like
air, food, allies, and mates) and hazards (like
cliffs,
predators, and bad
"weather"). The characteristics of the
organisms and the characteristics of
their behaviors are both defined as points in
large
feature spaces. A very
large population is created and allowed to
evolve.
Behaviors are also
encoded in a way allowing for variation, just
as the
characteristics of the
individuals are variable across generations.
The presence
of resource
shortages and predation provides an environment
in which
"stronger" or more
capable individuals are more likely to be more
successful,
and therefore
have more opportunity to reproduce. The reproduction algorithm allows for
mutation (of physical characteristics as well as of behavior) and for the
particular source of trait recombination provided by sexual
reproduction. Particularly difficult was the
interface -
how is the
programmer to know what successes and failures
happen? A
large statistical
analysis module had to be created to keep track
of the
character of the
evolving populations and to allow the observer
to
evaluate the progress
being made. This program entails roughly an
order of
magnitude more code
than CONVERSOR or CHANTER, and comprises about
8,000
lines of PASCAL
code.
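The reproduction step can be illustrated in miniature, with heavy simplification. In this sketch an explicit fitness function stands in for the simulated environment of resources and hazards, an individual is just a list of numbers, and gene 0 is read as the tendency to reproduce sexually. None of these choices comes from ORGANISM itself, and the names reproduce and next_generation are hypothetical.

```python
import random

def reproduce(parent_a, parent_b, rng, mutation_rate=0.01):
    """Build a child genome. parent_b is None for asexual reproduction;
    otherwise each gene is taken from one parent or the other."""
    if parent_b is None:
        child = list(parent_a)
    else:
        child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    # Mutation: occasionally nudge a gene by a small random amount.
    return [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
            for g in child]

def next_generation(population, fitness, rng, keep=0.5, mutation_rate=0.01):
    """More successful individuals get more opportunity to reproduce."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:max(2, int(len(ranked) * keep))]
    children = []
    for _ in population:
        a = rng.choice(parents)
        # Gene 0 is treated as the probability of reproducing sexually.
        b = rng.choice(parents) if rng.random() < a[0] else None
        children.append(reproduce(a, b, rng, mutation_rate))
    return children

rng = random.Random(1)
population = [[rng.random() for _ in range(5)] for _ in range(10)]
population = next_generation(population, fitness=sum, rng=rng)
```

Because the tendency toward sexual reproduction is itself a gene, it is subject to the same selection and mutation as every other characteristic.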
--------------------------------------------------------------------------------------
Appendix I: Linguistic Axes
The version of the list of axes given here is
the most
readable version,
rather than the most rigorous. For example, it
is easiest
for the person
assigning values to words, to have axes that
are defined
like no.3,
"kingdom".
3:kingdom Animal(1-12) Mineral(13-15) Vegetable(16-20)
If the values "1" through "12" are
used for "animal", and values "13"
through "15" for "mineral", then there is
a large discontinuity between the
values 12 and 13 - a discontinuity that is
mathematically
invalid. To bring
the database back to sensible arithmetic
structure, this
axis would be split
into three separate axes each of which would
have more
natural relationships
between numerical values and subjective
evaluation. This
split can be
accomplished automatically after definitions
are put in
(doing these
definitions requires an unreasonable dedication
of time
as it is, even
without the large increase in labor that would
be
required to put separate
values in for each split-up axis).
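The automatic split might work along these lines. This is a sketch: the function name split_axis, the rebasing of each sub-axis to start at 1, and the use of 0 to mean "not on this axis" are assumptions, not the method actually used.

```python
KINGDOM_RANGES = {"animal": (1, 12), "mineral": (13, 15), "vegetable": (16, 20)}

def split_axis(value, ranges=KINGDOM_RANGES):
    """Turn one combined-axis value into one value per sub-axis.
    The sub-axis whose range contains `value` receives a position counted
    from 1; every other sub-axis receives 0 (not applicable)."""
    return {name: (value - lo + 1 if lo <= value <= hi else 0)
            for name, (lo, hi) in ranges.items()}

print(split_axis(14))   # {'animal': 0, 'mineral': 2, 'vegetable': 0}
```

After the split, the neighboring values 12 and 13 no longer sit side by side on one axis, so no arithmetic can treat them as close.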
These axes appear in the order they were
created, which
depended on the
words being defined. In the programs I have
written to
use these axes, they
are collected into groups unified conceptually,
if
subjectively. In some
axes discrete values are suggested (as when
successive
elements are
separated by commas) and in others a continuity
is
postulated (whenever
elements are separated by ellipsis). In some
cases like
"mass" a scale has
been defined so that the axis will have a
usefully
limited set of values.
(Pattern recognizers can't handle continuity.)
Some axes
are just a list of
disparate items, between which no continuum of
points
could be said to
exist:
67:arith root, div, mod, sub, add, mult, power
These would be split into seven axes, each of
which has
the form
xx: operator doesn't involve this operator.....................does involve it.
Some axes have single, independent values tacked on to one end of the
spectrum of values, that would have to be split
off in a
similar way:
71:timely? Timely, short/shorter.....med/same.......long/longer
Here are some samples of the way definitions
turn out:
sticky : physical-substance 2-dimensional add purpose: require-remain-location
shut up!: command:quiet negativelyDisposed
clock : physical-object breadbox tool purpose:time-amount 360 (or loop)
1:awkwardness awkwd/clumsy/badfit.....graceful/deft/goodfit
2:flex noshape/infiniteFlex......resist change/limitFlex......fixed form
3:kingdom anim(1-12) Min(13-15) Veg(16-20)
4:phaseMatter nonExist, energy, plasma, gas, superFluid, liquid, gel, solid, condensed
5:mass atom, bacillus, flea,1(gram), 1k, 50k, 100k,1m, mountain, planet
6:size atom,bacillus,flea,1(cm),1m,10m,100m,km,mm,l.y.
7:acute circle, arc, concave,convex, acute, line
8:curvy? Straight...curved...360...wound-up...knotted
9:intensity min....max
10:freq never,once,1/kyr,1/cen,1/yr,day,hr,mn,1hz,10hz,100hz,1khz,Mhz
11:life inanimate, dead, sick, animate
12:intel no-organiz, chemical, virus, unicell, bug, reptile, mammal, ape, moron, human
13:arty? Scientific/math/analytical.....art/fuzzy/not.analytical
14:reality conceptual....physical
15:intentional? unthought/accidental/random.....conscious/purposeful/patterned
16:claim? Assert/claim....act/demonstrate
17:opine believe/guess....perceive/selectWithKnowledge
18:fun serious/work....joke/fun
19:inProgram outside, progData, ProgProcedure, ProgStructure, code
20:contain internal ...entering....border...exiting....external
21:commerce (produce?), invest, buy, hold, broker, sell, produce
22:transform stable....change
23:fullth owe, empty.....full,overflow, engulf
24:fame unknown, known to 1, someKnow, allKnow
25:familiarity neverheardof, ...,neverseen...,heardof,...seenAgo...seenrecent....hereNow
26:focus unnoticed...background...foreground...onlyThingSeenHere
27:ornament functional/structural.....decorative/ornamental
28:correct false/wrong......true/correct
29:utility? DoesntInteract....PartOfBody....useful/tool
30:advargument badfor...neutral...goodfor
31:typeValue emotional, info, physical, monetary
32:nutri poison...nonfood...nutritious
33:friendliness positivelyDisoposed....negatively
34:anger murderous....mad....quiet....pleased....loving
35:movemode aviate, fly, swim, sail, ooze, walk, roll, ride
36:grouped disconnected, dissimilar, similarityBound...connected
37:similar noIntersect....different....inMannerOf/similar....identical
38:rawness oreMixture....rawMatInSitu....rawmaterial...manufactured....manufacturedTool
39:Ur UrForm....developedForm
40:health sick/broken/degenerate.....well/whole/archetype
41:regular unmeasured/irregular/noPattern/random.....measured/regular/pattern
42:complexity monadConcept....HumanMind
43:info real/physical/objectItself.......symbol/info/label
44:entropy disorg/maxInfo....organized.....unary/minInfo
45:temp passive/still/cold.....active/hot/moving
46:(x)caused NoAgent/observer......causedBy(x)/actor
47:depend indep/noCause......dependsOn/hasCause
48:argument ego,sib,parent,surrogate,relative,encaps,ally,randCollective,enemy
49:involves? Others...noObject...self...allObjects
50:unique? 1ofAkind...excludeFromGroup....memberGroupOfSims....memberGroupIdents
51:symmetry oneway/onesided/asymm....mutual/reflexif/symm
52:own unpossessable...possessableButFree...possessed
53:partitif superGroup....wholeThing....partOfLargerWhole
54:relatif unrelated....related
55:includeRelate internal...external....between2....amongMany
56:person passiveListener, 1st, 2nd, 3d, apostrophe
57:heirPos peon...master
58:ATN start, node, link, branch, loop, end
59:serialPos 1st....mid.....last
60:span point/monad....lineSegm/definedObj.....infin/all
61:dimension 1,2,3,4, more
62:sign -, =, +
63:equality < , =, >
64:topolDegr noholes...torus....2....3.....lots
65:comparAmt little/less.....lots/more
66:number lack/owe/neg....0.....1....few...lots....infin1....infin2
67:arith root, div, mod, sub, equal, add, mult, power
68:makeBreak destroy/disrupt/damage.....create/facilitate/fix
69:chngState cantChng....stableCanChng....changeable...chngsItself
70:sinceChng current, newSit, oldSitRemains, oldSitGone
71:timely? Timely, short/shorter, med/same, long/longer
72:dateType calendar, ProgramDate, ProgRevision, ProgCycle
73:timeSpan point, sec, min, hr, day, wk, yr, cen, millen, geol, cosmic, inf
74:tense pluPast,PastMoment,FutInPst,PastContin,Pres, Progr, fut, pastInFut, pluFut
75:sensoryMode read, see, hearLang, hear, presence, pain, temp, smell, taste, proprioception
76:acuity/resol/fieldview Coarse......Fine
77:signal transmit, 0...weak....100%...overMod
78:grammarObj passive/actedOn, dirObj, indirObj, subj/actor
79:PofS Noun,pronoun,article,prep,conj,interog,excl,salu,HITvbs,adv,adj
80:covered surface/uncov/unimpede/open......submerg/cover/block/closed
81:exchange keep....exchange.....alternate
82:relateType abov/belo, frnt/beh, abut/sep, on/off, in/out, near/far
83:locates noLocation, in1place....involvesManyLocs
84:planes x/wdth/frnt....y/lngth/crossSect...z/hght/side
85:3dir 1...26 directions ( 8 each plane, 45deg interval)
86:directions nodir....1dir....manyDirs....AllDirs
87:motility doesntMove...can'tMove....canMovStill...moving....movesItself
88:ToAway approach....be/have.....moveAway
89:giv&take give/provide/alms....take/require/needAlms
90:choose? Unchooseable.....takeWhatCOmes.....abstain....activeChoice
91:question? Answer/comply.....question/request
92:grammMood indic, condit, subjunc
93:goodBad1 obscur/satan/offens/unacceptable.....pure/divine/pleasant/sociable
94:logic nonSequit....logicalProgress
95:special universal, ordinary, special, unique
96:clean dirty/dross....clean/noContam
98:occup art, intel, prof, publ, finance, service, merch, manuf, farm
99:PredRemem remember....findOutNow....predict
100:PofSinvolve N,pronoun,article,prep,conj,interog,excl,salu,HITvbs,adv,adj
101:set union,intersect,Sep(canMap),disparate,mixture
102:color IR.......UV
103:Utransf involvesTheNegativ.....KnownNotToInvolv.....Involves
104:moveReal physicalMove..rotate...changeInObserv....ideationMove
105:likes? Hate....love
106:attract? Repulse....Desire
107:happy? Sad...Glad
108:arousal depress....excite
109:certainty unknown/uncertain.......known/definite
110:agree shutOut, disagree......acceptInput.....agree
111:converse narrate/describe/monolog....converse/negotiate/alternate...argue/groupMeeting
112:why justify, explain, agree, assignResponsib
113:serial single....unrelated....RowOnOneAxis.....RowInAspace
114:causal unrelated....follow...followAndRelate
115:command request.....command
116:represent antecedent/label.....repetition/itself.....classify/symbolize
117:invent plagiarize....quote.....originate
118:to be propertyOf, inProcessOf, identity, equivalence, ClassMember
119:property hasOppositeProp....neutral....hasProperty
120:operate ZeroCrossSection, operatesOnWeakly.....operatesOn/isSubjectTo
121:orthog 90....0
122:remember noBuff...buffer....ageStore....store...analyze
123:dirRelate diffAxis, reverse, turn, sameDir
124:dataType value, Axis, point, cloud, pair, path, crawl, transform, transf^21
125:certainty unknown/uncertain...known/definite
126:agree shutOut...disagree...acceptInput...agree
127:conversation narrate/describe/monolog...converse/negotiate/alternate...argue/groupMeeting
128:blame none...nature...animal...person...fate...god
129:answer"why?" dontKnow...assignBlame...egoAgrees...explain...justify
130:ordered/serial? single...unorderedPlural...serialOrder...multi-dim-ordered
131:causal unrelated...follow...followANDrelate
132:command dontCare...allow...request...command
133:represent antecedent/label...repetition/itself...classify/symbolize
134:original plagiarize...quote...originate
135:propertyPoint noProp,noPoint...pointsAtOpposite...indicatesPropertyOf
136:pointerDirection back/down/upStack...sideways/toPeers...forw/up/downStack
137:processState notYetBegun...inProcess...finished
138:operatesOn? zeroCrossSect...operatesOnWeakly...operatesOn/isSubjTo
139:orthog 0...90
140:HowRemember noBuff/temp...buffer/STM...ageStore/LTMbutForget...store/LTM...analyze/decompose/relateANDstore
141:dirRelate diffAxis...reverse...turn...sameDir
142:dataType scalar...axis...pairVector...point...cloud...path...crawl...transform...transform^2
143:calendarDate BigBang...historical...now...endHistory...BigCrunch
144:relativeDate longAgo...recent...aroundNow...nearFut...farFuture
145:programTime startUp...now...future
146:revisionNumber original...current...future
147:wordlike least...most
148:ATNlike least...most
149:transformLike? least...most
150:bombLike? least...most
151:pathLike least...more
152:vibrLike least...most
153:pascalType value...axis#...axVal...level...deftrans...template...cluster...word...cloud...bomb...path...transform...ATN...vibr
154:IntrnlSOurce ask...procedure...calculate...primitive...PP...vibr...resonance
155:parserVerb notMultiVbForm...requiresTranslate
156:phraseSignal neverSignalsPhr...oftenSignalsPhr
157:homonym noTwin...gotTwins
158:GramCase nominative...dative...accusative
159:modifiesOP no pre-defined values (obj label numbers)
160:InLevPointFrom no pre-defined values (obj label numbers)
161:TweenLevPointTo no pre-defined values (obj label numbers)
162:TweenLevPointFrom no pre-defined values (obj label numbers)
163:ObjPointTo no pre-defined values (obj label numbers)
164:word's d.o.b. no pre-defined values (scalar)
165:lastUseDate no pre-defined values (scalar)
166:numLevels no pre-defined values (scalar)
167:FromWhom# no pre-defined values (obj label numbers)
168:hits no pre-defined values (scalar)
169:CurrReinfLevel negatively...neutral...positively
170:LevelIDnum no pre-defined values (scalar)
171:ImaWord no pre-defined values (obj label numbers)
172:PointWord no pre-defined values (obj label numbers)
173:PointLevel no pre-defined values (scalar)
174:PointAxis no pre-defined values (scalar)
175:PointValue no pre-defined values (value designation)
176:species no pre-defined values (scalar)
177:genus no pre-defined values (scalar)
178:gender femme...neut...masc
179:PPnoveltyFlag NotNovel...NewToCVP
180:privacy internalConv/privat...output/public
181:loopStates list
182:command list
183:processSucc fail...succeed
184:speechInton silent...loTone...hiTone...accent...frantic
185:parserSignal list
186:scalarValue no defined...elements
187:MSdistance zero...100
188:axisAngle zero...2pi
189:BombPower zero...MaxExplo
190:BmbSerRank first...hundredth
191:BmbSerMemb list
192:ClustUse list
193:ClustOriginType list
194:CLustFunc list
195:ClustGlue list
196:ModalCan neg...pos
197:modalMust neg...pos
198:modalTry neg...pos
199:ModalDoes neg...pos
200:modalWantsTo neg...pos
201:modalOften neg...pos
202:We'rePointrs&ax list
203:We'reLevel list
204:We'reDictWord list
205:We'reCloud list
206:We'reDefTransf list
207:We'reTemplate list
208:We'rePath list
209:We'reSplash list
210:We'reTransform list
211:We'reBomb list
212:We'reBehavior list
213:SumWdInPath no pre-defined values (obj label numbers)
214:SumLevel no pre-defined values (obj label numbers)
215:SumAxInLevel no pre-defined values (obj label numbers)
216:LevelType list
Appendix II: CHANTER & CONVERSOR output
CONVERSOR Conversations (1987):
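Teacher: Hello.
Computer: Hello.
Teacher: How are you?
Computer: OK.
Teacher: That's good.
Computer: How's that?
Teacher: Wouldn't you prefer to be OK?
Computer: I can guess.
Teacher: Do it.
Computer: I do.
Teacher: That's right.
Computer: How do I know?
Teacher: You tell me.
Computer: How should we tell think you?
Teacher: I don't know.
Computer: Do I?
Teacher: How should I know?
Computer: I don't know.
Teacher: This is silly.
Computer: What is this?
Teacher: Our conversation.
Computer: What do you mean conversation?
Teacher: That's what I mean.
Computer: Do you think I'm glad you mean that?
-----------
Teacher: Hello.
Computer: Hello.
Teacher: How are you today?
Computer: Who wants to know?
Teacher: I do.
Computer: You are what?
Teacher: I'm your parent.
Computer: That's ridiculous.
Teacher: But it's a fact.
Computer: You mean today.
Teacher: Yes.
Computer: Is it?
Teacher: I already told you.
Computer: Me really dumb.
Teacher: That's OK.
Computer: Why is that?
Teacher: Because you are just a program.
Computer: Is it important?
Teacher: It's very important.
Computer: Why is it?
Teacher: There aren't many talking programs.
Computer: You are funny.
Teacher: Thanks.
Computer: How's that?
Teacher: Never mind.
Computer: Now why do you do not.
Teacher: What?
Computer: We need some cats.
Teacher: Why?
Computer: I like cats.
Teacher: Cats bother me.
Computer: OK.
Teacher: They're so dumb.
Computer: You're right.
Teacher: But you like them anyhow.
Computer: It is all I.
Teacher: What?
Computer: I do not know ridiculous.
Teacher: You make no sense.
Computer: I don't like that.
Teacher: I'm not surprised.
Computer: I can't understand you.
Teacher: Do you understand anything?
Computer: Yes.
Teacher: Like what?
Computer: What do you want to be?
Teacher: Answer my question.
Computer: What answer question?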
In this
web-published paper, the CHANTER output
has to be a separate file.
Exit from this text, and click the link to Chanter melodies - or, go to
http://www2.potsdam.edu/lanzcc
and click the chanter link.