The Geometry of Meaning: a fantasia

by Chris Lanz


Abstract


A mathematical model for linguistic meaning is proposed, and some implications

for artificial intelligence programming are explored. The model and procedural 

ideas from the work of John Andreae are combined in a single large structure

intended to perform three tasks, 1) conversation in any human language,

2) imitation of any style of melody, and 3) navigation in a virtual physical

environment. The model directly suggests solutions to various problems,

including 1) data structures;  2) the definition, construction and deconstruction

of various entities: words, sentences, phrases, parts of speech, and transforms

between sentences; 3) originality and creativity as opposed to a degree of

parroting; and 4) the interplay of generality and specificity. A method

for allowing learning from conversation to inform and reform the topology of the

proposed meaning-space is suggested, and a variety of mathematical structures

for learning are provided. This algorithm is the result of many years of experiments

in machine learning, and the history of those experiments is included after the

description of both the model and the programs expected to result from it.


Introduction

 

Since the time of Descartes, scientists have been representing aspects of

our world with mathematical structures. Since much "reason" and much of

"reasoning" are reproducible and communicable, they would appear to be

likely candidates for such representation. This text describes one attempt

to represent language in a way that incorporates some of the reasoning that

underlies everyday conversation, and in a way that is compatible with other

tasks common in human behavior. The attempt is being manifested in a

computer program that is intended 1) to learn to converse in a way

comprehensible to native speakers of whichever language is used, and 2) to

use a programming structure functionally identical to one already shown to

be appropriate for musical composition, locomotion/navigation, character

recognition, and playing some games.

 

This project is ongoing, and the current algorithm represents the latest in a

22-year series of experiments in machine learning. In retrospect the earlier

programs are understood to have been preparatory exercises for the current

project, and they are described in the second part of this paper. They provided

experience with a number of areas crucial to the current algorithm.


I have tried to use tenses very carefully in this text: present indicative

means "code exists and runs"; simple future means "I am certain that

I know exactly how to write the code, but haven't done so yet"; and

any type of conditional or subjunctive means "I'm not sure this can be

 coded" or "I have no code in mind for this module yet."

 

Models should make predictions that can be tested experimentally. This

algorithm represents a model for a behavior-managing central processor, and

when the behavior involved is linguistic, the model will predict, for

example, the existence of words which are actually found in the dictionary,

or the need for new parameters that must be added to the definition-space.

There is also a mechanism for optimizing the topology of the space in which

words' definitions exist (this is equivalent to saying that the algorithm

can make certain predictions about the axes in that space), and the program

will produce some sentences which are "legal" and some which are "correct"

for the grammatical  and reasoning environment in which it operates (this is

equivalent to saying that it makes predictions about combinations of symbols

that are legal in a grammar).

 

I hope that this paper will be of interest to other workers engaged

in machine learning and natural language, and that through them I will find

other research that is related. More importantly, I hope to receive advice

and criticism that will help me progress. It is unpleasant to write about a

project that has produced no results as yet, but I can see that the road

ahead is years long, and experience has shown that isolation is

fundamentally counter-productive. Although I am working on a couple

of programs for a software company, my "day job" in an unrelated field prevents

the type of collegial interactions most researchers enjoy, and thus I am

presenting my work in this form.

 

Axes, Words, and Meaning Space

 

 

     Definitions

 

   Some aspects of words' definitions can be expressed like this:

 

     red --> is a property of --> apple

     material --> is a property of --> apple

     size --> is a property of --> apple

     apple <-- is subject to <-- eat

     teeth --> act upon --> class of things subject to eating <-- is a member of <-- apple

 

"Redness" can have a value, as can "materiality" and "size" - that is,

things can be more or less red, more or less material, and larger or

smaller.  Thus part of the definition of "apple" is a point in a

3-dimensional space whose axes are redness, materiality, and size. A musical

event is inevitably very much simpler, often involving no more axes than

pitch and a time value. In conversation, physical locations for locomotion

are seldom expressed  simply as coordinates on a map (we are prone to saying

things like "I would like to sit in the comfy chair" rather than things like

"I would like to sit down 4.3 meters from the North wall") but they are

always simpler than definitions of words. Each type of element looks the

same, however, to this program, whether it be a word, a note, or a location.

Each is expressed as an object with values along axes (complete definitions

of words have much more structure, as is described below).
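
As a minimal sketch of the idea that every element is an object with values along axes, the following Python fragment stores a word, a note, and a location in one sparse format. The axis names and the numeric values are invented for illustration and are not taken from the appendix.

```python
# Each object is a sparse mapping from axis name to value.
# All axis names and values here are hypothetical.
apple = {"redness": 0.8, "materiality": 1.0, "size": 0.3}
note = {"pitch": 62.0, "duration": 0.5}
location = {"x": 4.3, "y": 1.2}


def value_on(obj, axis):
    """Return the object's value on an axis, or None if it has no value there."""
    return obj.get(axis)


def shared_axes(a, b):
    """Return the sorted list of axes on which both objects carry a value."""
    return sorted(set(a) & set(b))
```

Because only the axes that carry a value are stored, a definition that uses about 10 of roughly 500 axes stays small.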


                 How many dimensions in Meaning Space?

The first important question to answer is "how many axes will be required to

define words"? Clearly any vocabulary of size  n can be defined perfectly by

a set of n axes: the axes must simply be defined as "<word>ness" (thus the

axis for a word like "green" would be defined as "greenness"). Such a set of

axes would also be useless, since all definitions would be perfectly

tautological, providing no possibility for inter-relation or logical

combination. Clearly what is desirable is a set of axes numbering well below

the size of the vocabulary.

 

A set of such axes has been constructed for several hundred words taken from

a 3,000 word vocabulary comprising the word list in a beginning foreign

language reader. One form of this set is presented in the appendix. Each

time a word from the list was considered - that is, each time a definition

was attempted - it was possible that some axes would have to be

added to the space. At the beginning of the process, several new axes might

be required for each new word, but as the lists lengthened, the number of

axes per new word diminished. The number of axes required leveled off at

about 450, and the curve was asymptotic. This number of axes is fairly

manageable, and it appears the total number will not continue to increase at

an unreasonable rate as words are added.

 

                How complex are definitions?

 

Another problem exists with respect to the number of axes that must have a

value in order to define a given word. If a definition required that 500

axes be given a value, then the task of defining a few thousand words would

be impossibly lengthy. Fortunately, the largest number of axes required for

a single word was 25 - that word was the verb "to deserve" (as in "How much

freedom does an ex-con deserve?"). The average number of axes required was

around 10. In any case, these numbers are also manageable.

 

Since the axes used in defining words relate to different words in different

ways, it might appear that the individual coordinates are themselves

somewhat more complex than spatial ones. This is a detail of the definition

of the space, however, since the axis "red" can be replaced by several axes

whose definitions are "red as a property", "red as a frequency", "red as an

aesthetic value", etc.

 

For the purposes of this study, then, words exist as points in a space with

perhaps 500 dimensions. No claim is made that these definitions are in any

sense "complete" or "correct" from either a lexicographic or linguistic

point of view. Many humans exhibit perfectly sensible verbal behavior

without such completeness or correctness. The points in meaning-space 

should be thought of as a rich and flexible labeling system.


The data structure being used is minimal. All the attributes that more commonly

might be held in variables with detailed structure (objects, records, etc) are stored

as axes with values. One result of this is that words (and all the other objects in the

space) move around according to context, and according to experiences relating to

 the objects. As an object is used, and as associations with it are built, because

all the information about it is kept in the one data format, its location in the space

changes. 

 

 

     Templates

 

The way we represent the spectrum of light from a star consists

fundamentally of a series of values (the heights of the spectral components)

along a number line (the frequency axis). There is, however, more

information - for example, the width of each peak. It is useful to think of

the definitions of words in a similar way. Instead of the frequency axis

used for spectra, there is the number line consisting of the 500 axis

numbers. The value used in the word's definition is analogous to the height

of the spectral peak. Thus the definition may be thought of as a graphical

object like a spectrum, rather than as a point in a multi-dimensional space.

 

When dealing with language it is essential to have an organized way of

dealing with such problems as approximation, unknown elements of a

definition, elements which comprise a range of values, and definitions which

specify classes of other definitions. These four ideas are computationally

related. Imagine a set of star-light spectra, printed out, and that we want

to find a subset of those spectra with certain characteristics: e.g., values

must be present for some frequencies, and those values must fall in a certain range.

It would be possible to construct a piece of cardboard - a template - with

windows cut  in certain places defined by these characteristics. By holding

the template over the individual spectra, we could see which ones correspond

to the set of characteristics. The size of the hole cut for a given

frequency would be one parameter associated with that frequency. Also

imagine that the templates' windows  come with conditions, so that one can

specify things like "if at 750 angstroms the value is between x and y, then

ignore the value you see through that window at 800 angstroms."
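
The cardboard template can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a window is taken to be a closed range (low, high), a condition is a triple (axis, range, axis to ignore), and the numbers are invented.

```python
def in_window(value, window):
    """True if a value is present and falls inside the closed range."""
    low, high = window
    return value is not None and low <= value <= high


def matches(template, conditions, spectrum):
    """Hold a template over one spectrum.

    template:   axis -> (low, high) window
    conditions: list of (axis, window, ignored_axis); when the value on
                `axis` falls in `window`, the window at `ignored_axis`
                is not checked
    spectrum:   axis -> value
    """
    ignored = set()
    for axis, window, ignored_axis in conditions:
        if in_window(spectrum.get(axis), window):
            ignored.add(ignored_axis)
    return all(
        in_window(spectrum.get(axis), window)
        for axis, window in template.items()
        if axis not in ignored
    )


# Hypothetical template with windows at 750 and 800 angstroms.
template = {750: (0.2, 0.6), 800: (0.5, 1.0)}
conditions = [(750, (0.2, 0.4), 800)]
```

With these values, a spectrum {750: 0.3, 800: 0.1} passes, because the condition at 750 switches off the window at 800, while {750: 0.5, 800: 0.1} fails.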

 

Considering words' definitions to be templates, rather than as single values

associated with base line positions, creates a natural environment for

approximation, ranged values in definitions, classes and so on. In fact it

is helpful to think of all words as degenerate examples of templates - that

is, defined words are templates in which the windows are open to viewing

only single values - and conversely it is helpful to think that all defined

words provide models for templates. It is then possible to move from one

definition to any other in a precise and controlled way, keeping track of

the transformative actions required and consequently the "distance" between

the definitions. Getting from one definition to another involves a concrete

series of steps, adding or subtracting baseline indices that have values, or

changing the ranges of values in particular baseline positions. Definitions

that are more different require more steps.
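
A rough sketch of such a step count follows. The counting rule is an assumption made for illustration: one step for every axis that must be added or removed, and one step for every shared axis whose window must change. The text does not fix a particular weighting.

```python
def as_template(word):
    """Treat a defined word as a degenerate template: each window is one value."""
    return {axis: (value, value) for axis, value in word.items()}


def step_distance(a, b):
    """Count the steps needed to turn template a into template b."""
    added_or_removed = set(a) ^ set(b)
    changed = [axis for axis in set(a) & set(b) if a[axis] != b[axis]]
    return len(added_or_removed) + len(changed)


# Hypothetical values.
apple = as_template({"redness": 0.8, "materiality": 1.0, "size": 0.3})
fruit = dict(apple, redness=(0.0, 1.0), size=(0.1, 0.6))  # two windows widened
pear = as_template({"greenness": 0.7, "materiality": 1.0, "size": 0.3})
```

Here "fruit" is two steps from "apple" (two windows widened), and "pear" is also two steps away (one axis removed, one added).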

 

 

Orthogonality, Distance, and Clouds

 

The angles between spatial axes are meaningful. I believe that the angles

between the axes in this "meaning-space" can be determined geometrically,

using a mechanism postulated below, that obtains the necessary information

from conversation itself. (It is desirable that no abstract or rote input

from the programmer be necessary to define an axis, or else no new axes

could ever be added by the algorithm itself, nor could any attempt be made

to allow the program to start completely from scratch - that is, without any

words or axes pre-defined.) Normal arithmetic vector calculations suggest

that a pair of axes which are completely unrelated (say, "color" and

"manufacturability") exist at right angles to each other, while those which

are related (say, "color" and "redness") would exist at lesser angles.
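
To make the role of the angles concrete, here is a small sketch of a distance calculation in a space whose axes need not be orthogonal. It assumes the cosines between axes are already known (how they would be obtained from conversation is not addressed), and it treats an axis with no value as zero, which is a simplification. The 60 degree angle between "color" and "redness" is invented.

```python
import math

# Assumed cosines between pairs of distinct axes; unlisted pairs are
# treated as orthogonal. 0.5 corresponds to an angle of 60 degrees.
COSINES = {frozenset(("color", "redness")): 0.5}


def cosine(axis_a, axis_b):
    """Cosine of the angle between two axes."""
    if axis_a == axis_b:
        return 1.0
    return COSINES.get(frozenset((axis_a, axis_b)), 0.0)


def distance(a, b):
    """Length of the difference vector, summing d_i * d_j * cos(angle_ij)."""
    axes = sorted(set(a) | set(b))
    delta = {ax: a.get(ax, 0.0) - b.get(ax, 0.0) for ax in axes}
    squared = sum(delta[i] * delta[j] * cosine(i, j) for i in axes for j in axes)
    return math.sqrt(max(squared, 0.0))
```

Under these assumptions a unit step along "color" lies closer to a unit step along "redness" than to a unit step along "manufacturability", which is the behavior the paragraph above asks for.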

 

Assuming for the moment that appropriate sets of relations among axes have

been determined, then the distances between words in meaning-space (MS) should

bear a relation to our (somewhat subjective) evaluations of words' similarity.

There will be regions of MS in which all definitions are closely related; one

such region would correspond to the class "fruit". Note that the definition

of "fruit" can be obtained by widening some windows on the

template-definition of "apple".

 

Many axis definitions require the assignment of values which are not

precisely quantifiable, or else it might appear that precisely quantifying

the values would be counterproductive. "Red" is necessarily understood as a

range of frequencies along an axis that includes other colors as well.

Therefore words are not best thought of as points in MS, but rather as

variously shaped clouds, whose deviation from sphericality depends on the

ranges of possible values on the word's several axes. Computer memory has

become very cheap, and it is no problem, from the computational point of

view, to store pointers - to each text item such as "apple" - at a large number

of nearby points in MS. This fuzziness in individual words' definitions

relieves the system of the requirement of any specific level of precision

when making its various calculations. (See "Fuzzy Data, Fuzzy Logic" below.)

In general, in the discussion which follows, I will refer to words as

points, and the reader should remember that this usage is approximate.
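
One simple way to store a word as a cloud is to register a pointer to it at every cell of a coarse grid inside its ranges. The sketch below assumes a uniform grid step and a lookup point that uses exactly the same axes as the stored cloud; both are simplifications, and the ranges are invented.

```python
from collections import defaultdict
from itertools import product


def cell(value, step):
    """Index of the grid cell nearest to a value."""
    return int(round(value / step))


def store_cloud(index, label, ranges, step=0.1):
    """Register `label` at every grid cell inside the box given by `ranges`.

    ranges: axis -> (low, high)
    """
    axes = sorted(ranges)
    cells = [range(cell(ranges[ax][0], step), cell(ranges[ax][1], step) + 1)
             for ax in axes]
    for combo in product(*cells):
        index[tuple(zip(axes, combo))].add(label)


def lookup(index, point, step=0.1):
    """Return the labels whose cloud covers the cell containing `point`."""
    key = tuple(sorted((ax, cell(v, step)) for ax, v in point.items()))
    return index.get(key, set())


index = defaultdict(set)
store_cloud(index, "apple", {"redness": (0.6, 1.0), "size": (0.2, 0.4)})
```

The cloud above occupies 5 x 3 = 15 cells, so memory use grows quickly with the number of ranged axes; a real implementation would need something more economical.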

 

 

Syllables, Words, and Phrases

 

For an algorithm like this to improve itself, it must concatenate "words"

into concepts it can deal with as units. It is better if the idea "to go

home" becomes a single unit, rather than always existing as three words.

(This clumping is known to be one of the ways "expert" thinking

differs from "novice" thinking.) Remaining separate would

require parsing of all the words, each time the phrase is seen.

The data structure I have selected is suited both to the

combining of words' axis/value/pointer units, and to their disassembly.

Certain phrases, in which the words have very distinct and different

functions (different meaning, different definitions, etc.) sum very

naturally, like "to go" and "home". The combined idea looks exactly like

all the other entities visible to the algorithm, which therefore doesn't

need to care that there were, originally, separate word units. This process
 
of concatenation is also applied to entire sentences, reducing them to a single

meaning-space point.
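
A minimal sketch of concatenation and disassembly, assuming that "summing" means merging the axis/value pairs of the parts and adding values where two parts share an axis. The rule for shared axes is a guess, and the axis names are invented.

```python
def word(**axes):
    """Build a primitive unit from axis/value pairs."""
    return {"axes": dict(axes), "parts": []}


def concatenate(*units):
    """Sum several units into one that has the same shape as a word."""
    combined = {"axes": {}, "parts": list(units)}
    for unit in units:
        for axis, value in unit["axes"].items():
            combined["axes"][axis] = combined["axes"].get(axis, 0.0) + value
    return combined


def disassemble(unit):
    """Return the units a phrase was built from, or the unit itself."""
    return unit["parts"] if unit["parts"] else [unit]


to_go = word(motion=1.0)
home = word(dwelling=1.0, own=1.0)
go_home = concatenate(to_go, home)
```

Because the result has the same shape as a word, a phrase can itself be concatenated with further units, up to a whole sentence.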

Phrases - concatenations of primitive objects into new ones - occur in all three

of the environments addressed. In European, Indian, and Arabian monophonic

music, there exist small clusters of note-rhythms that recur as units. Each Raga,

for example, includes as part of its definition numerous such entities, and each

mode in Gregorian Chant has its own characteristic set of short phrases. Likewise,

the control of a robot arm entails many collections of primitive operations - collections

that are learned once and then are never used in their decomposed form again.

 

Equally essential to concatenation is the ability to decompose definitions and larger

units (like the paths, sentences, and transforms described below) into smaller

units, like syllables. At a shallow level, this is a trivial matter, since

words' definitions consist of completely discrete units - namely, an axis

number and a value. It is of use to be able to take any subset of a word's

definition (or of a path, transform, etc.) and find out quickly if that

subset comprises a word in the dictionary or in any recognized concept. This

capability is a natural result of the associative, content-addressable

memory that is utilized by the algorithm. (This technique of memory management

is clarified in the description of the historical progression of experiments

that preceded this project.)
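
The subset test can be illustrated with an ordinary hash table standing in for the associative memory. This is a stand-in, not the memory scheme used by the algorithm: definitions are keyed by their set of (axis, value) pairs, and every subset of a definition is looked up directly. Enumerating subsets is exponential, which is tolerable only because definitions are short. The lexicon is invented.

```python
from itertools import combinations


def build_memory(lexicon):
    """Key each definition by its frozen set of (axis, value) pairs."""
    return {frozenset(defn.items()): name for name, defn in lexicon.items()}


def known_subsets(definition, memory):
    """Return the names of all known units contained in a definition."""
    pairs = list(definition.items())
    found = []
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            name = memory.get(frozenset(subset))
            if name is not None:
                found.append(name)
    return found


lexicon = {
    "go": {"motion": 1.0},
    "home": {"dwelling": 1.0, "own": 1.0},
    "go home": {"motion": 1.0, "dwelling": 1.0, "own": 1.0},
}
memory = build_memory(lexicon)
```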

 

The ability to form large units from small ones and to decompose units into

smaller fragments, easily and without much computational complexity, makes

it natural for the algorithm to use  information at any of various levels,

from portions of words to phrases of considerable length. It is important

that the computations performed can be the same at widely differing levels

of resolution, and that logical operations can therefore either be insulated

from irrelevant details, or can descend to the level of those details when

necessary. I don't often think about the safety, convenience, etc., of my

"home" when I think about leaving work to go home, but I have to know about

all those things when having a discussion about the "value" of having such a

place to go.

 

The data structure being used for words includes axes and values along those

axes, but also a provision for various types of relationships among those

axes. These relationships primarily fall into two categories. First, at the

top level of the definition of "apple" would appear "red" and "fruit". Both

of these are words themselves, and a lower level of apple's definition would

then be the definitions of these two words. Second, "red" has a  pointer to

"fruit", and the pointer is of type "modifies".


Because of this structure, it is rather straightforward to concatenate the

definitions of a few words forming a common phrase into one entity, which

itself looks - to the program - exactly like a word.

 

When examining input, there are some signals to the pre-processing parser

that a collection of words is a phrase (for instance, the presence of an

"extra" verb in a sentence or the appearance of a preposition) but primarily

it is a matter of repetition of the collection itself, in association with a

number of other words, all of the same part of speech. The collection "I

would like", for example, appears over and over, followed by an article and

a noun. It's inefficient for a program to repeat its analysis of the

repeated phrase over and over. Rather, the repeated phrase should be treated

as one entity whose definition includes the previous analysis. We do this

all the time, and sometimes there's hardly any need to be able to decompose

the phrase, ever again. In fact, we make jokes about this, jests that are of

the form

  

     say a silly sound         establish a context that explains what the sound means

     "Jeet jet?"               "Did you eat yet?"

     "Juwannago?"              "Would you like to go?"

 

Routines to construct and disassemble phrases will operate on single words,

phrases, and paths alike, according to repetition rates, eventually making

reply formation more efficient.
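
The repetition signal alone can be sketched as a count of recurring word sequences. This covers only the repetition part of the description; the part-of-speech check on the following words is left out, and the fixed phrase length and threshold are arbitrary choices.

```python
from collections import Counter


def repeated_phrases(sentences, length=3, min_count=2):
    """Return word sequences of a given length that recur across sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - length + 1):
            counts[tuple(words[i:i + length])] += 1
    return {" ".join(seq): n for seq, n in counts.items() if n >= min_count}


sentences = [
    "I would like an apple",
    "I would like a pear",
    "George would like the banana",
]
```

With these three sentences, only "i would like" reaches the threshold and becomes a candidate for concatenation.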

 

Concatenation could serve another purpose. Once "I would like..." has been

summed and is treated as a single word, it should become clear that the

'word' is either followed by single nouns with an associated article, or

by larger and more complex collections of words. The grammatical

equivalence (of these larger collections) to the nouns/article

pair could allow the program to define such things as "noun phrases"

without being explicitly instructed that they exist.


Words are not only to be concatenated with other words. The data structure

includes a provision for context-dependent information to be appended to the

definition of a word that is "in play" (see "Big Buffer" below). The simplest

example is tonal inflection. Clearly, four different replies are required for the

four versions of the following sentence:


1) I want to go home. (George doesn't want to, though.)

2) I WANT to go home. (It is my wish. It is not an urgent need.)

3) I want to GO home. (I wish to travel to my dwelling. The travel itself is what's important.)

4) I want to go HOME. (As opposed to going to work.)


Tonal inflection of this type can become visible to the algorithm by defining

a two-value axis for "inflection" and allowing the input stage to include the

assignment of accented words.

 

Predicting New Words

 

How would the existence of as-yet-undefined words be predicted by this

algorithm? Suppose training conversations have been limited to food and

eating. After some learning has occurred, and a sufficient number of words

have been defined, there would be a number of words that would

differ only along one axis. This situation would be represented by a

particular type of template: most of its windows are rather limited, because

identical specific values are present in all the words' definitions (all

fruits are material objects subject to being eaten), but one window, on one

axis, would be very wide, because  the various words in the collection have

very different values for that parameter (fruits come in all colors).  It is

reasonable to imagine that the collapse of this wide window to a discrete

value, in combination with the other axis-value pairs, might point to a

location in MS that actually does  (or should)  represent a particular word or phrase.

The program is then in a position to search for a word in meaning space, and to

ask about it, if there is nothing at the location specified. The sort of

example described here would be predicting the existence of a word like

"apple" if one had been introduced to "pippin", "red-delicious",

"granny-smith", etc, which are words that might differ only along the

"color" axis.

 

If the color axis had not yet been defined, then this same mechanism would

suggest to the algorithm that it should be added. The existence of more than

one word with identical coordinates, but which the teacher says are not the

same thing, requires that there be another axis about which the algorithm

has yet to hear. Automation of the definition of new axes is the most difficult

aspect of this project.
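
Both mechanisms in this section can be sketched briefly. The first part builds the narrowest template covering a group of words, finds its widest window, and collapses that window to a single value to get a candidate location. The second part finds words that share identical coordinates, which is the signal that an axis is missing. All names and values are invented, and the automatic definition of the new axis itself is not attempted.

```python
from collections import defaultdict


def covering_template(definitions):
    """Narrowest window on each shared axis that covers every definition."""
    shared = set.intersection(*(set(d) for d in definitions))
    return {ax: (min(d[ax] for d in definitions), max(d[ax] for d in definitions))
            for ax in shared}


def widest_axis(template):
    """Axis whose window spans the largest range of values."""
    return max(template, key=lambda ax: template[ax][1] - template[ax][0])


def candidate_point(template, axis, value):
    """Collapse the window on `axis` to one value; use window centers elsewhere."""
    point = {ax: (low + high) / 2 for ax, (low, high) in template.items()}
    point[axis] = value
    return point


def is_occupied(point, lexicon):
    """True if some known word sits exactly at this location."""
    return any(defn == point for defn in lexicon.values())


def collisions(lexicon):
    """Groups of distinct words with identical coordinates."""
    by_point = defaultdict(list)
    for name, defn in lexicon.items():
        by_point[frozenset(defn.items())].append(name)
    return [sorted(names) for names in by_point.values() if len(names) > 1]


lexicon = {
    "pippin": {"material": 1.0, "edible": 1.0, "color": 0.3},
    "red-delicious": {"material": 1.0, "edible": 1.0, "color": 0.9},
    "granny-smith": {"material": 1.0, "edible": 1.0, "color": 0.1},
}
template = covering_template(list(lexicon.values()))
guess = candidate_point(template, widest_axis(template), 0.5)
```

If nothing is stored at the guessed location, the program has a concrete thing to ask about. If the "color" axis is stripped from the same lexicon, all three words collide, and the collision list would prompt a request for a new axis.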


The data structure provides a straightforward way to recognize, and help with

processing, many of the phrases (that is, short combinations of words that either

recur exactly or that involve wild-card positions) that occur in normal

conversation. I am not referring to complete, formal grammatical entities such as

"noun phrase" or "predicate phrase" but rather to those repeated word sequences

that the program might usefully concatenate into single units of meaning.

Consider the following sentence fragments - each of which is likely to recur in the

training of a program like this. In each case the phrase I wish to address

comes first, and a possible completion of the sentence - not relevant right

now - follows in parentheses.


    Do you mean     ("genes" or "jeans")

    I don't know      (George.)

    When we were talking    (before, you said......)


In each of these cases, a running sum of the axes presented by the words would

encounter no repetitions of axes until the part of the sentence in parentheses. The

three words "do you mean" share no axes of definition, but the arrival of another

noun ("genes") would share some with the pronoun "you". This sort of

calculation provides a way of suggesting points of division within a sentence.
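
The running-sum calculation might look like the following sketch. The axis sets are invented, and restarting the running sum after each suggested division is an assumption.

```python
def division_points(words, lexicon):
    """Suggest word positions at which a new unit of the sentence begins.

    A position is suggested when the word there re-uses an axis that has
    already appeared in the running sum of axes.
    """
    seen = set()
    cuts = []
    for i, w in enumerate(words):
        axes = set(lexicon.get(w, {}))
        if i > 0 and axes & seen:
            cuts.append(i)
            seen = set()  # assumed: start a fresh running sum after a cut
        seen |= axes
    return cuts


# Hypothetical definitions: "you" and "genes" share a nominal axis.
lexicon = {
    "do": {"action": 1.0},
    "you": {"nominal": 1.0, "addressee": 1.0},
    "mean": {"reference": 1.0},
    "genes": {"nominal": 1.0, "biological": 1.0},
}
```

For "do you mean genes" the only suggested division falls before "genes", which separates the recurring phrase from its completion.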

 

Paths, Bombs, and Transforms

 

     Sentences

 

If linguistic words, musical notes, and physical locations are centered on

points in MS, then sentences, melodies and movements are paths through the

space.  The ur-form of these paths  consists of a series of ordered points. There is

a simple way to collapse these paths into single points themselves without losing

any information - although the resulting structure is not as efficient as the original

in terms of memory usage.
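
The text does not say which collapsing scheme is meant. One possibility, shown here purely as an assumption, is to give every (position, axis) pair its own axis in a larger space, so that an ordered path becomes one point and can be recovered exactly. The sketch assumes every point on the path has at least one axis value.

```python
def collapse(path):
    """Turn an ordered list of points into one point keyed by (position, axis)."""
    return {(i, ax): v for i, point in enumerate(path) for ax, v in point.items()}


def expand(point):
    """Recover the ordered path from a collapsed point."""
    length = 1 + max(i for i, _ in point) if point else 0
    path = [{} for _ in range(length)]
    for (i, ax), v in point.items():
        path[i][ax] = v
    return path


# Hypothetical three-word path.
path = [{"speaker": 1.0}, {"liking": 0.7}, {"edible": 1.0, "color": 0.8}]
```

The collapsed form repeats the position index on every entry, which is one sense in which such a structure is less economical than the original list.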

 

In thinking about sentences and melodies, one must resist the illusion of

continuity between points. In physical 3-space, proceeding from one point to

another involves a continuum of intermediate points whose relation to the

endpoints is precisely defined and entirely meaningful. It is unlikely that

intermediate meaning-space points between two words in a sentence - points

whose location could be calculated by some simple arithmetic function -

would be meaningful in a similar way.


Because the data structures involved are closely related, the program will be able

 to move between paths and word-definitions quite freely: sentences may thus

easily be constructed from the definitions, and new definitions may be formed

from conversational input. When the processor is not occupied with new input,

the program will be free to work on its database, and it is during these times that

paths will be used to form new definitions, and that questions (or statements)

concerning existing definitions will be constructed.

 

     Bombs

 

I prefer to think of the links between words in paths as "bombs". Suppose

you are "standing" on the infinitive verb in the fragment " I like to

eat....",  and that you are awaiting the object. There could easily have occurred,

in previous conversations, several appropriate objects, for example, apple,

pear, banana, etc.

 

A bomb is an entity with a direction number (a set of numbers specifying the

desired direction with respect to all relevant axes), a distance, and an

"amount of explosive".  Standing on the infinitive, one tosses a bomb in a

direction, forcefully enough to go as far as the required distance. The bomb

disappears in flight (since intermediate points are not meaningful as they

would be in 3-space), reappears at its goal, and, depending on the

generality of the desired result, it explodes more or less powerfully,

illuminating larger or smaller regions of MS. Thus a very large number of

actual sentences are fuzzily available from a single source sentence.  In the

diagram here, read a column as a sentence, and reading across, see words

comparatively close in MS:

 

I           you              we                     George

like        dislike          really, really like    sometimes likes

to eat      to have          to sell                to grow

apples      single apples    tons of apples         fruit
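
A bomb as described above can be sketched as a small record plus two functions. Several things here are assumptions: axes are treated as orthogonal, the "amount of explosive" is read as a radius around the landing point, and the words and values are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Bomb:
    direction: dict   # axis -> component of the direction
    distance: float   # how far the bomb travels
    explosive: float  # radius illuminated around the landing point


def norm(vector):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(v * v for v in vector.values()))


def landing_point(origin, bomb):
    """Point reached by travelling `distance` along the unit direction."""
    length = norm(bomb.direction)
    axes = set(origin) | set(bomb.direction)
    return {ax: origin.get(ax, 0.0) + bomb.distance * bomb.direction.get(ax, 0.0) / length
            for ax in axes}


def illuminated(origin, bomb, lexicon):
    """Words lying within the explosion radius of the landing point."""
    target = landing_point(origin, bomb)
    hits = []
    for name, defn in lexicon.items():
        axes = set(target) | set(defn)
        gap = math.sqrt(sum((target.get(a, 0.0) - defn.get(a, 0.0)) ** 2 for a in axes))
        if gap <= bomb.explosive:
            hits.append(name)
    return sorted(hits)


eat = {"action": 1.0}
lexicon = {
    "apple": {"edible": 1.0, "color": 0.8},
    "pear": {"edible": 1.0, "color": 0.4},
    "hammer": {"tool": 1.0},
}
direction = {"action": -1.0, "edible": 1.0, "color": 0.6}
wide = Bomb(direction, norm(direction), explosive=0.25)
narrow = Bomb(direction, norm(direction), explosive=0.1)
```

Thrown from "eat", the wide bomb illuminates both "apple" and "pear", while the narrow one lands between them and illuminates nothing.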

 

Bombs come in a variety of flavors, each associated with a particular

operator - operators that are closely related to the pointer types that exist

inside word definitions. For instance, a bomb can be associated with a

grammatical necessity, such as a direct or indirect object for a verb. The types

I have defined so far are: bombs which create paths, those that comprise

transforms, those that define forbidden transitions (as learned by

reinforcement), those that point to/from grammatical direct and indirect objects,

those which define a property of the object at the origin, those dealing

with agency (who or what entity "did that"), and those that lead from

conversational commands to non-linguistic behaviors necessary for the

operation of the program. This last type is particularly relevant in

managing the problems associated with "being wrong" and changing the

database in response to negative reinforcement.

The bombs with the label "is a property of" are one of the ways this

algorithm manages to answer the ever-present question, "what are some

properties of this (thing)(situation)(MS location)?" And, as with most types

of information stored in this program, the particular type of content-

addressability used in memory management obviates any search-time

problems.


The function of all types of bombs can be enriched further by associating

each bomb with a set of values for some subset of axes (this fits nicely into

the data structure for a word, and would be MS-stored in the same way). The

values would indicate the allowable generality of the bomb's result along each

axis. This is a natural way to recognize situations in which a class word is

appropriate, or when a class of objects is the subject rather than an individual

member of the class. It is also easy to learn this information: when bombs are

stored, it is a trivial matter to find, (almost) instantaneously, a set of other bombs

that share all but one axis value. The successive examination of such related

sets of bombs will provide a range of suitable values along that one axis wherein

the target axes differ. Once this range has been established by a suitable number

of "hits", a bomb can be configured with the appropriate generality along that

axis. In this way conversation gradually shapes the algorithm's handling of

generality and classes.
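
The grouping step can be sketched with a hash table keyed on "everything except one axis". Bomb targets are represented here as plain axis/value mappings, and the threshold of three hits is arbitrary; both are assumptions made for illustration.

```python
from collections import defaultdict


def learn_ranges(targets, min_hits=3):
    """Find groups of targets that agree on all but one axis.

    Returns a list of (fixed_values, free_axis, (low, high)) for every
    group with at least `min_hits` distinct values on the free axis.
    """
    groups = defaultdict(list)
    for target in targets:
        for axis in target:
            fixed = frozenset((a, v) for a, v in target.items() if a != axis)
            groups[(fixed, axis)].append(target[axis])
    learned = []
    for (fixed, axis), values in groups.items():
        if len(set(values)) >= min_hits:
            learned.append((dict(fixed), axis, (min(values), max(values))))
    return learned


# Hypothetical targets of bombs thrown from "eat" in past conversations.
targets = [
    {"edible": 1.0, "color": 0.8},
    {"edible": 1.0, "color": 0.4},
    {"edible": 1.0, "color": 0.1},
    {"tool": 1.0, "color": 0.5},
]
```

With this data the only range learned is color from 0.1 to 0.8 for edible things, which is the generality a bomb aimed at "fruit" would need.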

 

     Transforms

 

If sentences are paths through MS, then the process of conversation (whether

it be the internal conversation that we engage in while reasoning, or the

external one that can result) involves transformations from one path to

another.  With paths stored as points, the transform between them is especially

simple, and is isomorphic with a bomb.

 

It is at this point that the musical and locomotor parts of this program are

no longer involved. There is no analogy between moving about a room, and a

conversation (except perhaps dancing with a partner?), and the analogous musical

level of complexity, beyond a melody, is counterpoint.  I am still limiting my

program to monophonic music such as that composed before 1100 A.D. (See

CHANTER below) in which counterpoint is not involved.

 

Once again it is advisable to resist any analogous spatial image, in which

one would imagine two sentences as two angular snakes, each consisting of

several points with straight lines drawn between. Such a vision would

encourage one to imagine the transforms to be predictable

shape-transformations. (Imagine one snakey squiggle, morphing into another

snakey squiggle, situated somewhere else in the space.) This is precisely as

unlikely to be valid as  the image of meaningful intermediate points between

words within a single path.

 

I prefer an image similar to word-linking bombs.  Suppose some input has

arrived which begins "Do you like...." and in which tonal emphasis is placed

on "you". Many subsequent replies would have begun "I (do) like....". Thus

an element of one type of transform would look just like a path from "you"

to "I". Another fuzzy explosion of possibilities is available without any

computational strain, with different variants being appropriate to different

tonal stresses in the input:

 

I            George           You

like         dislikes         used to hate

apples       pears            bananas

period       question mark    comma

 

 

It is useful that paths, bombs, and transforms look alike to the computer.

Many sentences, after all, are also transforms. The sentence "I went home"

transforms the sentence "I'm at work" to the one which says "I'm home." The

less computation required to move between paths and transforms - and the

reverse - the better.

 

The progression of transpositions can be continued. Axes are related to each

other by angles. Axes are related within word definitions by pointers of

various types (such as "is a property of"). Words are related within

sentences by transitions between points in MS. Successive sentences are

related to each other by path transforms. And transforms themselves are

related to each other by another level of transition. Using a single data

structure for each type of transition allows for a unified and seamless

functional programming environment.


Most transforms will operate on small portions of  input - input to transforms

comes both from context and from immediately preceding sentences. Likewise

most will only provide  part of the answer when a reply is being formed.  The

simplest will hold some values constant while rotating others about, as in the

following example where the verb and object are constant, but pronouns and

mood are flipped:


You     like        apples?

I       like        apples.


Various means of discovering stable elements have been constructed, some of

which are quite complicated (see "Meaning Space and the Management of

Conversation" below). The mode of reversal differs by axis, but is learned from

conversation and stored in the usual content-addressable fashion.
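
The pronoun-and-mood flip in the example above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the slot names, the two flip tables, and the function name are
hypothetical, and the tables are written by hand here, whereas the text
expects the mode of reversal to be learned from conversation.

```python
# Hypothetical sketch: a path is a dict of slot -> value, and a transform
# is a table of per-slot substitutions. Slots the transform does not
# mention are held constant. The flip tables are hand-written assumptions;
# the text expects them to be learned from conversation.
PRONOUN_FLIP = {"you": "I", "I": "you"}
MOOD_FLIP = {"question": "statement", "statement": "question"}
FLIPS = {"subject": PRONOUN_FLIP, "mood": MOOD_FLIP}


def flip_pronoun_and_mood(path):
    """Return a new path with pronoun and mood reversed, all else unchanged."""
    return {slot: FLIPS.get(slot, {}).get(value, value)
            for slot, value in path.items()}


question = {"subject": "you", "verb": "like",
            "object": "apples", "mood": "question"}
reply = flip_pronoun_and_mood(question)
print(reply)
```

Because the transform is just a lookup table, it has the same shape as
any other stored association, which is the property the text relies on.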

 

          Definition Transforms

 

The grammatical structure of sentences is obviously related to the forms of

definitions of the words present. In replying to the question "Which one?" a

number of antecedent relations have to be managed. The pattern of

parts-of-speech appropriate for a reply depends on the nature of the

headless pointers implied by these pronouns. Grammatical patterns and

definition forms will be associated with labels, and these labels will then

be used in a higher-level module to suggest appropriate grammatical patterns

for output, or for the next stage of the internal conversation.

 

 

The data structure for the definitions of words consists primarily of values

on the various axes, and these data are linked by pointers of various types.

(The data structure is described in some detail in the appendix.)   This

is the same data structure needed for some types of transforms,  in

which axis-values or groups of them are transformed into other values, or

are transformed along some axis. Because of this similarity, the definitions

of words look like transforms, and for this reason, since an early stage in

the development of these ideas, I have referred to the definitions as

"definition transforms". The unity of axis structure makes

it as natural for the algorithm to forecast the grammatical structure of the

reply from the definitions of words in the input, as it is to simply

forecast the words in the reply from the words in the input.


The definition transform of a word consists of everything in the word's

definition except the values along any of the (non-zero) axes, making it

similar, in a way, to class-words. As an example, imagine removing

successively more and more axis values from the definition of "apple".

Removing the value for "color" (but leaving the axis there, to indicate

that this definition is for entities that can have color) makes the definition

into a class-word for all apples. Removing the values for sweetness and

animal-vegetable-mineralness brings us to a class-word for edible objects.

Removing the value "edible" reduces the class restriction to "objects subject

to <an operation>" rather than the more detailed "objects subject to <being

eaten>". When all the values are removed there remains only a set of axes - ones

that could be filled - and pointers, such as the one that linked the original value

on the color axis to the outer-level-definition object itself (the apple). Note that this

type of value-free definition is like a template in which all the windows are

fully expanded, but in which the relevance of a particular subset of axes is

still specified.
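
The successive removal of values from "apple" might be sketched as
follows. The class name, the field layout, and the particular axis
values are all assumptions made for illustration; the actual data
structure is the one described in the appendix.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical word definition: axis values plus typed pointers."""
    word: str
    axes: dict = field(default_factory=dict)      # axis name -> value, or None
    pointers: list = field(default_factory=list)  # (axis, relation, target)


def strip_values(defn, axes_to_clear=None):
    """Blank the values on the chosen axes (all axes if None).

    The axes themselves and the pointers are kept, so the result still
    says which axes are relevant to this kind of entity.
    """
    clear = set(defn.axes) if axes_to_clear is None else set(axes_to_clear)
    new_axes = {a: (None if a in clear else v) for a, v in defn.axes.items()}
    return Definition(defn.word, new_axes, list(defn.pointers))


# Invented axis values, for illustration only.
apple = Definition(
    "apple",
    axes={"color": "red", "sweetness": 0.7,
          "animal-vegetable-mineral": "vegetable", "edible": True},
    pointers=[("color", "is a property of", "apple")],
)

all_apples = strip_values(apple, ["color"])   # class-word for all apples
deftrans = strip_values(apple)                # value-free definition transform
print(deftrans)
```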


The way the algorithm uses these transforms demonstrates one typical process

naturally available within the memory-style chosen. First, the information in

the individual deftrans is available from the MS location at which the index

of the deftrans is stored.  Each deftrans itself has a number, or a label.

All the labels of the deftrans in a path, taken together, provide a compressed

designation of the procedural or grammatical structure of the path, as contained

in the structure of the definitions of its constituent words. This collection of

labels points to a memory location at which predictive information is stored.

For instance, storing reply-formation-transforms at these locations provides a

way to predict what process might best be used in the transformation of a

known path into a new reply.


As this process continues, the selection of reply-formation-transforms

(henceforth "rft") will stimulate reinforcement. Imagine that a series of deftrans

 labels has succeeded at predicting an rft, and that the result of using that rft

 receives a positive reinforcement. A typical process would then be to store

the deftrans label(s) at a location provided by the rft itself. This reverse-storage

is the natural way, at some later point when back-chaining is going on, for the

program to infer from known rft's that some particular definitional structure

should be relevant.
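
The forward lookup and the reverse-storage step can be sketched with two
ordinary dictionaries standing in for content-addressable memory. The
function names, the integer labels, and the rft names are hypothetical;
the sketch assumes only that a tuple of deftrans labels can serve as a
memory address.

```python
from collections import Counter, defaultdict

# Assumed stand-ins for content-addressable memory.
forward = defaultdict(Counter)   # tuple of deftrans labels -> rft -> score
reverse = defaultdict(Counter)   # rft -> tuple of deftrans labels -> score


def predict_rft(labels):
    """Return the best-scoring rft stored at the address named by the labels."""
    candidates = forward.get(tuple(labels))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def reinforce(labels, rft, reward=1):
    """Record the outcome of using `rft` on a path with these labels.

    On positive reinforcement the labels are also stored at the location
    provided by the rft itself, for later back-chaining.
    """
    key = tuple(labels)
    forward[key][rft] += reward
    if reward > 0:
        reverse[rft][key] += reward


def structures_for(rft):
    """Back-chain: which definitional structures has this rft served well?"""
    return [labels for labels, _ in reverse[rft].most_common()]


labels = (17, 4, 9)   # made-up deftrans labels for the words of one path
reinforce(labels, "flip-pronoun-and-mood")
reinforce(labels, "echo-input", reward=-1)
print(predict_rft(labels))
print(structures_for("flip-pronoun-and-mood"))
```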



          Some Transform Geometry

 

Transforms are operations that may involve complete sentences, but I believe

that smaller arguments will prove more useful, especially at the beginning

of the program's learning. For example, imagine the sentence "question: you

like apples" to be a path on a surface in meaning space. The surface is

defined by 1) the antecedent to "you" (let's suppose, for the moment, that "you"

means  "the computer"), 2) the template for preference words, and 3) "apples". Thus

we imagine a point at "computer", a point at "apple" and a cloud of points

defined by the template for words such as "like" "dislike" etc. Just as

three points define a plane in classical geometry, this collection of

meaning-space specifications defines a meaning-space surface, or a cloud (in

either case, a geometrically limited portion of the space).

 

Suppose we are interested in constructing a sentence to follow "You like

apples?" Let's imagine for the moment that the antecedent for "you" has been

properly transformed into the word "I". Now imagine the points on the

surfaces that include "I" and "apple". The geodesics drawn through those

words, assuming the surface is limited by the "preference word" axis, will

include lots of legal replies ( I hate apples, I am indifferent to apples, I

love apples, etc.). The transform involved here would hold two points

constant ("I" and "apple") and would only operate on the third. Other

surfaces, limited by other meaning-space templates, would result in other

collections of legal sentences. Suppose the preference words are replaced by

verbs in general. Then the geodesics between "I" and "apple" would pass

through "throw", "eat", "sell", "grow", etc.


Of course, the interesting question is "how does an algorithm decide between saying

'I like apples' and 'I hate apples'?" For this, the reader is urged to visualize

the type of MS surface described above, with its two fixed points on the subject

and the object. The decision between 'like' and 'hate' must of necessity come from

some other sentence. Other sentences represent other surfaces, surfaces that

limit the choices in different ways (with respect to different axes). The intersections

of surfaces (although computationally obnoxious) are expected to be essential

decision-directing quantities.
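
A toy version of two intersecting surfaces follows. It is a sketch under
loud assumptions: the preference template and its numeric scores are
invented, and the second surface is simply declared rather than derived
from a real earlier sentence. Each surface is represented as the set of
paths it permits, so intersection is ordinary set intersection.

```python
# Invented template: preference words with a position on a preference axis.
PREFERENCE_TEMPLATE = {
    "love": 1.0, "like": 0.5, "am indifferent to": 0.0,
    "dislike": -0.5, "hate": -1.0,
}


def surface(subject, obj, verbs):
    """All paths through two fixed points and any word in the given cloud."""
    return {(subject, verb, obj) for verb in verbs}


# Surface 1: "I" and "apples" fixed, any preference word allowed.
first = surface("I", "apples", PREFERENCE_TEMPLATE)

# Surface 2: hypothetically implied by some other sentence in the
# conversation, which limits the preference axis to negative values.
negative_verbs = [v for v, score in PREFERENCE_TEMPLATE.items() if score < 0]
second = surface("I", "apples", negative_verbs)

choices = first & second
print(sorted(choices))
```

With real data the surfaces would be fuzzy and far larger, which is
where the computational obnoxiousness mentioned above would arise.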

 

          Transforms and the Melding of Conversation to Computer Code

 

Writing computer programs to interact with people always means crossing a

barrier at which the person's input has to stimulate the computer code to do

something. This inevitably requires a translation of human behavior to

something recognizable by the software. At one extreme, a pre-determined

list of behaviors from the human is specified, and any other behavior is

un-recognized. When you opt to start Windows in safe mode, you are given a

few numbered choices. Keying in any other number brings no result.

 

A goal of this project is to arrive at the other extreme, so that "internal"

transforms, such as those between sentences, are indistinguishable from

"external" or "functional" transforms, such as those between the human input

and the program, or those between elements of a conversation, and commands

issued to the program (to perform some programming task like "remember that"

or "look that up".)


As long as transforms can move from a path in MS to a

path in another space consisting of computer code, there need be no barrier -

and the image of these spaces as being separate is illusory. All the things

a computer needs to do, or know, are just values on other axes, and the

program doesn't need to care whether it is transforming a sentence into a

reply, or into a programming directive such as "call function 'parser' now".

Merging the two spaces is accomplished by simply ignoring their differences.

(This statement may seem glib and meaningless at this point. The idea of

using all sorts of input, untranslated, is another idea that is worked

through in recounting the historical development of these programs.)

 

 

          Illusions of Transforms and the Ur-Path

 

An early plan for this program involved the use of shape transforms to morph

input sentences into output replies. A variety of transform styles were

foreseen. Perhaps the most obvious was to utilize geometric rotation,

translation, and dimensional changes to learn how Teacher's input relates to

previous conversation elements and to the current context. Such transforms

would use pattern recognition to allow the program to recognize grammatical

structures from Teacher that generated  reinforced behavior in the past. The

transform, with its fuzzy generalization at both input & output, would then

be applied to the current conversation, and output would be a morphed form

of the most recent sentence. In this view, a transform is a daemon,

activated, like an antibody, by the appearance of input that fits its

template.

 

This early formulation still has its place, but I no longer believe that a

"previous sentence" would be sufficient input, nor do I expect output to

consist primarily of grammatical replies. Fortunately, it is no easier to

use, as input, an immediately-preceding sentence than it is to use any other

extant object in the program (as will be described below: cf. the use of

diverse time series in ECONOMIST). Reinforcement is expected to drive input

selection, in the same way as it is hoped to drive the actual procedural

decision making process.

 

Bertrand Russell explains clearly how Zeno's paradox is resolved by a modern

understanding of different sorts of infinities, but Zeno's conclusion that

motion is impossible is still a valid conclusion for a person unaware of the

niceties of 20th Century mathematics. To Zeno, motion had to be construed as

a succession - in time - of discrete states.

 

It is almost certainly illusory to imagine that a series of finished

sentences in a human conversation moves from one sentence to the next as a

result of direct transformation. There is a huge amount of internal

"conversation" that must happen inside each human as the external

conversation progresses. To possess any kind of "truth", the transforms

(which could  be defined as existing between successive sentences) would have

to include all of this internal processing. This is unlikely to be possible

without postulating the existence of intermediate states, and without

representing these in some way. Each participant in the conversation is

moving from many internal states to others, and there are no two internal

representational states between which another cannot be inserted. This makes

reaching the goal of a reply as impossible as it was for Zeno's runner to

reach his goal. And yet Achilles beats the tortoise, and replies are, in

reality, constantly being constructed.

 

For the moment, let's not concern ourselves with the problem of

initialization - so we're not going to try to think about what a baby's

brain does, starting from its first perceptions. Let's just imagine that an

adult mind is conscious of its own current state, and that such a state is

at least partially representable by a path or constellation of

operator-related paths within an MS that has been enriched by decades of

conversational experience. Every element of this ur-path is represented by a

fuzzy cloud of some size. Perhaps consciousness, or at least the progression

of the internal conversation necessary for reply-generation, could be

thought of as the crawling of this active path through MS as elements of the

path interact with stored associations, via the mechanism of resonance and

oscillation described below. Such an image allows for a practical

computation to be constructed that mirrors the internal "conversation" we

know to be taking place.

 

Then the problem becomes "at what point in your calculations do you reply"

or "how do you know when the internal conversation should become external?"

 

 

Oscillation, Resonance, and Spectrum

 

     Clouds & Vibration

 

It is useful, from the point of view of computation, to think of words as

clouds in MS. When actual input arrives, with a specific word like "apple",

the cloud is either very small (including only "Pippin" and "Red delicious")

or completely collapsed to a degenerate state ("the partially-eaten, very

ripe Pippin, from George's orchard, that we were discussing"). In considering

a reply, however, it might be useful to consider clouds which are very large

(all edible objects) or of some intermediate size (all fruits). I have found

it productive to imagine these reply-formation clouds as oscillations among

the words found within the boundaries of the cloud. It is a useful image

because the parameters we naturally associate with oscillations (such as

those between neutrino flavors or electron spins) provide a natural

computational framework for concepts usually thought of as linguistic.

Apples are similar to pears. They are close together in any reasonably

constructed MS. Therefore we define it to be  "easy" for "apple" to become

"pear". If the transition happens often - at a high frequency - then "pear"

becomes almost as likely to be seen as its parent, "apple". If the

oscillation is weak - or requires little "energy" - it is like an

"explosion" of a very specific bomb, as opposed to a very general one (that

is, the oscillation occurs only between "apple" and other words very nearby

in meaning-space.)

 

Vibration also provides a metric for certainty that can be sensed directly

by the program. A completely new word/idea will not be vibrating at all

(because there would be nothing with which to set it in vibratory-motion)

and it therefore exists in a state of degenerate collapse, rather like a

quantum-mechanical object whose qualities have been forced to assume a

known value by the intrusion of an observer. And as a conversation proceeds,

an idea can become less and less uncertain as information builds up to the

point that its vibration collapses to zero - at which point that part of the

conversation can be perceived by the program to be over, in the sense that

no further work needs to be done in that regard.  For example,

"fruit" might be the topic, but if someone brings in the idea of "red" or of

"gift to a teacher" then vibration to "apple" can be enhanced while

vibration to "pear" is diminished. These bits of information are readily

available to the program (and are nearly instantaneously discovered) because

of the memory-management technique employed in storing definitions and

conversations.
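
The "fruit" example can be sketched numerically. Everything here is an
assumption made for illustration: vibration is modeled as a normalized
weight per word in the cloud, a context cue multiplies the weight of
words whose definitions carry it, and "uncertainty" is taken to be one
minus the strongest weight, so that zero means collapse.

```python
def normalise(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def apply_context(weights, features, cue, boost=2.0):
    """Enhance vibration to words whose definition carries the cue."""
    boosted = {w: v * (boost if cue in features[w] else 1.0)
               for w, v in weights.items()}
    return normalise(boosted)


def uncertainty(weights):
    """1 minus the strongest weight; 0 would mean a collapsed cloud."""
    return 1.0 - max(weights.values())


# Invented definitional features for two members of the "fruit" cloud.
features = {"apple": {"red", "gift to a teacher"}, "pear": {"green"}}

fruit = normalise({"apple": 1.0, "pear": 1.0})
after_red = apply_context(fruit, features, "red")
after_gift = apply_context(after_red, features, "gift to a teacher")

for stage in (fruit, after_red, after_gift):
    print(stage, round(uncertainty(stage), 3))
```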


Likewise, a region of MS in which rapid, high amplitude

 vibrations are occurring,  can attract the attention of the

program - there is necessarily much left to investigate in such a region,

whereas in a collapsed area, there is little to discuss. Thus the idea of

vibration can also address the problem of focus - another area of simulation

programming traditionally considered to be somewhat opaque. (I am aware that

I am using the collapsed state in two ways here - this is an area I am still

working on. Needing two modes of vibration, however, each with its own state

of collapse, is no problem - physicists do it all the time with independent

quantum numbers - and it just requires more memory for a computer program.)

 
 

     Resonance and a Resonant Network

 

It is particularly easy for a computer program to recognize the existence of

certain sorts of associations. All the nouns which ever occurred after the

phrase "I like" can easily be collected together. The words in such

collections can be said to be oscillating among themselves. Thus an

arithmetic relation can be established between words, based on

conversational appearance, rather than based on static definition, and

conversation can directly affect definitions in the database.

 

(Programmers not familiar with the form of data and memory management used

in this project might balk here, correctly realizing that the following

statement would not be true in an exponential search environment. This

problem is avoided by using content-addressable memory, and therefore I must

ask you, please suspend your disbelief until you've read the history below.)

It is also easy for a computer program to recognize when two words are both

oscillating to a common, third word. This property, which I like to think of as a

resonance, also provides arithmetic parameters which allow us to move

between the realms of meaning and computation. A learned network of

resonances of variable strengths, makes the dictionary much richer than the

initial set of points in MS.
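
Both observations, collecting the words that follow a phrase and
detecting a shared third word, can be sketched as below. The function
names and the tiny data sets are hypothetical, and plain Python sets
stand in for the content-addressable memory the project actually uses.

```python
def followers(sentences, phrase):
    """Collect every word seen immediately after the phrase."""
    phrase = tuple(phrase)
    n = len(phrase)
    found = set()
    for words in sentences:
        for i in range(len(words) - n):
            if tuple(words[i:i + n]) == phrase:
                found.add(words[i + n])
    return found


def resonances(oscillations):
    """Map each pair of words to the third words both oscillate to.

    The size of the shared set is one possible arithmetic measure of the
    strength of the resonance.
    """
    result = {}
    words = sorted(oscillations)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            shared = (oscillations[a] & oscillations[b]) - {a, b}
            if shared:
                result[(a, b)] = shared
    return result


sentences = [["I", "like", "apples"], ["I", "like", "pears"],
             ["you", "like", "bananas"]]
print(followers(sentences, ("I", "like")))

# Invented oscillation data: each word and the words it vibrates to.
oscillations = {"apple": {"fruit", "red"}, "pear": {"fruit", "green"},
                "hammer": {"tool"}}
print(resonances(oscillations))
```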

 

When two objects are found to be vibrating to the same pole (or to poles

fuzzily nearby one another), their 'energy' might be increased, or they might

be defined as having a useful association. Likewise, the 'motion' between

objects in MS can be mediated by a number of fundamental operators

( "is a property of" or "points at"  are operators that link ideas

in the sentences "Red is a property of apples" and " 'adjective' is an

object which must point at a noun").


Resonance can be established as the vibration along equal mediators, as

opposed to vibrations to similar locations in MS. This provides a means of

sensing intersections of meaning outside of a conversational rubric. It also

provides a natural way for the program to sense relationships between internal

parts of two words, independently of their complete definitions.

 

Words can be similar, or close together, because they have similar values on

similar axes, or they can be similar because their definitions have a

similar internal structure. While the first sort of similarity is easily

seen between a pair of words like apple and pear, the second sort is seen

between ochre ( used for dyeing, has a color) and pepper (used for spicing,

has a flavor). This example is mundane, but consider the following. There is

a fairly short list of social/logical descriptors of conversational

elements, and one of them is the polar relation between "agreement" and

"disagreement". Along this axis is the word/phrase/sentence "Probably so."

This concatenation is also similar to "ochre", since it is used for

agreeing, and has a position (along that axis of "degree of agreement".)

This is an example of a simple relation that would unavoidably be sensed by

this algorithm, but which points to a relationship of which I have never

personally been aware. If this relationship is never used, it will never

be reinforced. If it is never reinforced, it will seldom be used - thus if

the relationship is meaningless it will subside into the background. On the

other hand, this sort of mechanism allows the algorithm to sense and use

relationships unknown to its creator.
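
The two sorts of similarity can be told apart with a very small sketch.
It assumes, purely for illustration, that a definition is a list of
(operator, target) pairs; the operator names and the use of a Jaccard
ratio are not taken from the programs described here.

```python
def jaccard(x, y):
    """Size of the intersection over size of the union (0.0 if both empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def content_similarity(a, b):
    """First sort: the same operators pointing at the same targets."""
    return jaccard(set(a), set(b))


def shape_similarity(a, b):
    """Second sort: the same operators, whatever they point at."""
    return jaccard({op for op, _ in a}, {op for op, _ in b})


# Invented, deliberately coarse definitions.
apple = [("is a", "fruit"), ("has a", "color")]
pear = [("is a", "fruit"), ("has a", "color")]
ochre = [("used for", "dyeing"), ("has a", "color")]
pepper = [("used for", "spicing"), ("has a", "flavor")]
probably_so = [("used for", "agreeing"), ("has a", "degree of agreement")]

print(content_similarity(apple, pear), shape_similarity(apple, pear))
print(content_similarity(ochre, pepper), shape_similarity(ochre, pepper))
print(content_similarity(ochre, probably_so), shape_similarity(ochre, probably_so))
```

In this toy, "ochre" and "Probably so." share no content at all and yet
have identical shape, which is the unexpected relation noted above.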


This use of relations unknown to the programmer is a core value expressed by

the programs described in the history. Physical objects interact according to

physical laws. It may be useful to think of "splashes of axis-value

information in context-created clouds" as interacting, according to some set

of laws. By setting up linguistic entities with a wide variety of active

parameters, and by setting up the program to favor or inhibit the uses of

these parameters in different ways, according to reinforcement received from

outside, it may be possible for the program to establish and use such laws,

even though they are not available to the programmer. This property of the

brute-force, unprimed robot - demonstrated below in simpler programming

circumstances - is expected to prove crucial to the success of this program.

The program CHANTER involved no musical knowledge whatsoever, and the

current project is dealing with a subject of extraordinary difficulty and

complexity - it is absolutely essential that the program itself  be capable

of establishing and using relationships which the programmer never predicts

or encodes.

 

Resonance relations may allow the use of an analogy to the idea of

cross-section in particle physics - in which some particles are transparent

(have a low cross section) to other particles. The name "Bill Clinton"

would be expected to have a low interaction with the word "dye", but the

word "maroon" would be expected to interact highly with "dye". Put in other

words, "to dye" should have a very high resonant energy of interaction with

"maroon", but not with "Bill Clinton". Dyeing and maroon would both be

vibrating to the MS location "color" and would therefore resonate

automatically. Easily found, quasi-automatic means of association, like

this, are of use to a system building up its behavioral repertoire through

massive repetition and constant positive and negative reinforcement.

 

A peculiar network structure is also possible using these ideas, a structure

that may be related to neural network function.


= Take word X, and make a list of all the words to which it vibrates, or
with which it has ever been associated, or to which you can "bomb" from X.

= Give each word on the list a counter, so that you can keep track of the
number of times the word recurs throughout this whole process.

= Then do recursion: go through each word on the list, doing the same thing,
such that you create lists of new words associated variously with each word
on the original list (that is, the list that came from the original word).
(Many of the types of association that are available come with some
directional indication, such as "'appeared with' in a sentence" or "is
a property of". In this network it is sensible, in alternating iterations,
to reverse these directions.)

= Each time a word recurs - that is, each time one returns to the same word from
a different direction, or via associations with a different word - tell the program
to increment that word's counter.

= Allow this dark-cycle crawler (see below) to operate on the database
whenever the processor is free.


The words that acquire higher values must surely be differently related to the initial word

than words that appear only once.
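
The steps above can be sketched as a bounded breadth-first crawl. The
association table is invented, the depth limit replaces "whenever the
processor is free", and the decision not to count returns to the
starting word itself is an assumption of this sketch.

```python
from collections import Counter


def crawl(start, forward, depth=2):
    """Count how often each word is reached from `start`.

    `forward` maps a word to the words it is associated with. On
    alternate iterations the direction of the associations is reversed.
    """
    backward = {}
    for source, targets in forward.items():
        for target in targets:
            backward.setdefault(target, set()).add(source)

    counts = Counter()
    frontier = [start]
    for level in range(depth):
        links = forward if level % 2 == 0 else backward
        next_frontier = []
        for word in frontier:
            for neighbour in links.get(word, ()):
                if neighbour != start:
                    counts[neighbour] += 1       # the word has recurred
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return counts


# Invented associations ("is a property of", read right to left).
forward = {
    "apple": {"fruit", "red"},
    "pear": {"fruit", "green"},
    "cherry": {"fruit", "red"},
    "lime": {"green"},
}
counts = crawl("apple", forward)
print(counts.most_common())
```

Here "cherry" is reached twice (through "fruit" and through "red") and
"pear" only once, so the counters separate the two even though both are
fruits.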

 

 

     Oscillation at Other Levels

 

If a word in a path is "in motion", vibrating between itself and related

words, then the whole path may be considered to be thus alive. This provides

a second model for the procession of paths in a consciousness, not modeled

on the idea of "reply-formation" or "conversation". An idea, or a

consciousness, or the concentration of a mind on a subject, might usefully

be thought of as a fuzzy path whose vibrations cause it to crawl through MS.

If the vibrational modes are rich enough - and in programming there is

little to limit the richness of these modes - then this model of a path

might function more realistically than one based on precise locations of

points, or on transforms between skinny paths.

 

This vibration has different meaning depending on what level of object is

involved, but all levels are seen as candidates for this idea. It's pretty

easy, and the analogous "meaning" fairly obvious, to make "orange" oscillate

with "yellow" - they're right there, next to each other, on the "color" axis

- and if that parameter in the definition of a fruit is oscillating, then

there's a simple, arithmetic, easily programmed mechanism for generalization

of knowledge among members of a class - and this generalization would be

controlled easily. For instance, if you're talking about fruits as

ammunition in a food fight, no mention of color is likely to arise, and the

two fruits would be considered functionally identical. If you're talking

about painting them, color would be present in the context, causing the

color of the objects to vibrate. (Exactly how & when such an "energy 

transfer" would occur is one of the most stimulating avenues of this inquiry

- what laws do blobs of axis-value-pairs and pointers obey, when they

encounter one another in a context? Is this meaningful at all, or useful?

What definitions of "energy" might be most useful? Does all this relate

directly to any imaginable physical process in a brain, or is it purely an

abstraction? If it's an abstraction, is there any hope it's isomorphic with

brain activity? Is that even desirable? A beginning in this area has been

attempted with respect to physical analogies below.)

 

 

One difficulty in programming simulations of human thought concerns the

seemingly unpredictable associations that we are so prone to make. Programs

that  "think" also have a related need to be able to become different - even

insane - as the separate "runs" of the program mature. The vibration of axis

values within the definitions of words provides a computer analogy for

associations which are not connected to the complete dictionary definitions

of any currently active word. Similarly, a word in a path can be set into

vibration with other words whose only association with the initial word was

that they appeared together in a previous conversation. In this way

linguistic associations independent of definition can be built up. Thus

separate runs of the program, even using the same initial database, could

develop entirely different learned behavior, and instances of the program

which are ineffectively taught or reinforced can establish nonsensical

associations, making them behave in ways seeming relatively "childish",

"stupid" or "insane". This property implies the possibility of becoming

relatively "adult", "intelligent", or "sane".

 
 

     Physical Vibration and the Data Structure

 

Physical vibrations have specific ways of being characterized: frequency,

amplitude, etc. These have been carried over into the data-structures I am

using for words and their constituents.  In the first place, "vibration" has to

be understood as something other than simple harmonic motion, in which

a value varies according to some periodic function - like a sine wave. In such

cases, the function spends equal amounts of "time" above and below zero. The

vibrations I need are more like the action potentials of a neuron: the function spends

most of its time on the base value, and, at some frequency, the function "fires" and

spends a discrete brief interval at the other value. Such functions, while more unpleasant

mathematically, are very simple to program with. The frequency of vibration is a

rather direct matter: imagine looking at two paths with similar

constructions, and in one, a word is vibrating at a high frequency, and in

another, it vibrates only rarely. The one vibrating rapidly spends more time

at the opposite pole of its vibration than the slow one, making its

vibrational partner more of a candidate for perception by, or use in,

another part of the program. The occurrence of a related word, or of an axis

present in both words, might be a signal to increase the frequency or

otherwise change the energy of an element's vibration. (These processes are

expected to be directed, sorted, and evaluated automatically via the

reinforcement idea described below.)
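
The spike-like vibration is indeed simple to program. In this minimal
sketch the time steps, the period and width parameters, and the function
names are assumptions; it shows only that a higher firing frequency
means more time spent at the opposite pole.

```python
def spike_train(period, width, steps):
    """0 = base value, 1 = the other pole.

    The function "fires" for `width` steps once every `period` steps,
    and otherwise rests on the base value.
    """
    return [1 if (t % period) < width else 0 for t in range(steps)]


def time_at_other_pole(train):
    """Fraction of time steps spent away from the base value."""
    return sum(train) / len(train)


fast = spike_train(period=4, width=1, steps=100)    # high frequency
slow = spike_train(period=20, width=1, steps=100)   # low frequency
print(time_at_other_pole(fast), time_at_other_pole(slow))
```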

 

Amplitude has an equally direct analog in programming. A cloud surrounding a

word extends further into MS if its amplitude is higher -  providing another

numerical analog for degree of generalization. A number of structures in the

program incorporate the general premise that any point in MS essentially

defines the center of a larger region.
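
Amplitude as the radius of a cloud can be illustrated with a toy
two-axis MS. The coordinates are invented and Euclidean distance is an
assumption; the real space has many axes and may need another metric.

```python
import math

# Invented positions on two unnamed axes, for illustration only.
MS = {
    "apple": (1.0, 1.0),
    "pear": (1.2, 1.1),
    "banana": (2.0, 1.5),
    "hammer": (9.0, 7.0),
}


def cloud(word, amplitude, space=MS):
    """Every word within `amplitude` of the given word's point in MS."""
    centre = space[word]
    return {w for w, p in space.items() if math.dist(centre, p) <= amplitude}


print(cloud("apple", 0.0))   # collapsed: the word alone
print(cloud("apple", 0.5))   # a small cloud
print(cloud("apple", 1.5))   # a more general cloud
```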

 

Most vibrations in nature are damped, and slowly decay to a degenerate state.

Some oscillations used by this program will be part of the current context only,

rather than a permanent learned part of the word's definition (these impermanent

vibrations will exist in a quasi-temporary structure called the BigBuffer, and

will not be stored in association with a word). These context-dependent

vibrations will decay as they age, losing frequency until they drop out of the

buffer. This provides an imperfect method of ensuring that context can evolve.

Unfortunately there are many times in thought and conversation when contexts

are entirely cancelled - I have no idea yet about teaching the program

automatically to recognize when this happens. For example, a new participant

in the conversation could arrive, and could introduce a new subject. The old

one might never return, and in such a case, all this context-driven oscillation

should be suddenly quashed.  Perhaps some signal will become clear that can

stimulate the program to ask "have you changed the subject?"
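
A sketch of the decay of context-dependent vibrations follows. Only the
name BigBuffer comes from the text; the halving decay, the floor below
which a vibration drops out, and the method names are assumptions. The
quash method stands for the sudden cancellation of a context, which the
program does not yet know how to trigger on its own.

```python
class BigBuffer:
    """Temporary store for context-dependent vibrations (layout assumed)."""

    def __init__(self, decay=0.5, floor=0.1):
        self.decay = decay        # factor applied to each frequency per tick
        self.floor = floor        # below this, a vibration leaves the buffer
        self.frequency = {}       # word -> current vibration frequency

    def excite(self, word, frequency=1.0):
        """Start (or refresh) a vibration for a word in the current context."""
        self.frequency[word] = max(frequency, self.frequency.get(word, 0.0))

    def tick(self):
        """Age every vibration, dropping those that fall below the floor."""
        self.frequency = {w: f * self.decay
                          for w, f in self.frequency.items()
                          if f * self.decay >= self.floor}

    def quash(self):
        """Cancel the whole context at once (a change of subject)."""
        self.frequency.clear()


buffer = BigBuffer()
buffer.excite("color")
for _ in range(3):
    buffer.tick()
print(buffer.frequency)   # still present, at a lower frequency
buffer.tick()
print(buffer.frequency)   # dropped out of the buffer
```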

Finally, the energy of physical objects is said to have a spectrum, and this

idea leads into the next topic of this discussion. If a word has different

"energies" in different directions (and directions are what axes are all

about) then the connections that words form will be controlled in part by

the details of the directionality of its energy. For instance, any word

defined initially as a noun modifier will automatically resonate much more

with nouns than with, say, other noun modifiers. This provides a tangible,

mathematical analogy for parts of speech, which we know to be an essential

component of grammatical analysis. The use of directional differentials in

activation or resonance between words creates a partial mathematical model

for grammatical function.

 

 

Parts-of-Speech

 

The idea of parts-of-speech is important for all kinds of considerations

in processing linguistic material. It is an idea that should arise

naturally from any model of language use. One reasonable criterion

for success of a computational theory of meaning would be its

ability to predict a word's part-of-speech from the information

contained in the word's definition, and/or from its use in conversation.

 

The definition of a part-of-speech as a template allows for this requirement

to be met. It would be desirable for these definitions of the

parts-of-speech themselves to be divined by the algorithm, without any

prompting from the programmer. This would be easy if word order were always

the same, and if there were no words (like those nasty adjectives) or phrase

structures which could intervene between words in legal sentences. This

algorithm should be able to set up classes of words based on templates

divined from conversational data, but deciding that "material objects" and

"labels assigned to abstract ideas" are both "nouns" or at least are

"capable of being a subject" is a far more difficult matter.

 

The program "Conversor" described in the history provides a conceptual

structure that allows the current algorithm to establish and optimize its

own definitions of parts of speech. In that program, words' definitions

included a list of parts of speech that the word could represent. Sentences

were constructed by first constructing a sequence of parts of speech, based

on previous sentences' sequences. Other routines then specified what

words might fit into the various slots provided.
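
The two-stage construction can be sketched as follows. This is not
Conversor's code: the lexicon, the tagging rule, and the choice to reuse
the previous sentence's pattern unchanged are all assumptions made so
that the example is small enough to read.

```python
from itertools import product

# Hypothetical lexicon: each word lists the parts of speech it can represent.
LEXICON = {
    "I": {"pronoun"}, "you": {"pronoun"},
    "like": {"verb"}, "eat": {"verb"},
    "apples": {"noun"}, "pears": {"noun"},
}


def pattern_of(sentence, lexicon):
    """Stage 1: derive a sequence of parts of speech from a sentence.

    Where a word lists several parts of speech, this sketch simply takes
    the alphabetically first one.
    """
    return [min(lexicon[word]) for word in sentence]


def fill(pattern, lexicon):
    """Stage 2: list every sentence whose words fit the slots in order."""
    slots = [sorted(w for w, tags in lexicon.items() if pos in tags)
             for pos in pattern]
    return [" ".join(words) for words in product(*slots)]


previous = ["you", "like", "apples"]
pattern = pattern_of(previous, LEXICON)
sentences = fill(pattern, LEXICON)
print(pattern)
print(sentences)
```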

 

If template definitions for each part of speech are provided by the

programmer, then the current algorithm could do the same thing, by seeking

words that fit the template, and then by concerning itself with series or

patterns of template labels. But this algorithm is supposed, as much as

possible, to find all these sorts of things on its own, merely by examining

its input data (conversations between other humans or those between humans

and itself).

 

The program uses two fundamental cycles, referred to as "light" and  "dark".

"Light cycle" activities require the presence of an interacting human, and

most such organisms are active during daylight. A large number of functions

must occur, however, that consist of operations on the database, and these

take place either in the absence of interaction or as a multi-tasked

dove-tail with the calculations required for interaction. One

autonomous routine (see "crawlers" below) postulates templates which

might serve to define parts of speech and traverses the database to evaluate

and optimize its postulate. It converges on template definitions which

maximize the success in predicting patterns of template appearance in

conversation. (These predictions are made by re-using previously acquired

training data. This sort of self-testing was at the heart of the evaluation process

for the program ECONOMIST described in the history below.)

Some of these templates look like parts of speech, although

the program will not likely end up with the small number of classes we

usually use in grammar class. "Preposition" is just too broad a category,

including both the word which forces "infinitive" ( "to go home"), and

something like "underneath".

 

Unfortunately,  a consideration of the data structure (that results from

defining a part-of-speech as a template) suggests a complication not

obviously accounted for in normal grammar. Since the data structure is

similar at all levels of construction, it is clear that it allows for entities

that function like pronouns in several other "places".

  

Pronoun: subs for a noun       (George likes apples) (He likes apples)

auxiliary: subs for a verb     (question: George likes apples?)  (Yes, he does)

phrase: subs for a modifier    (George is green?) (Yes, he looks-thus)

"yes": subs for a sentence     (Are you green?) (Yes) ((I am green))

a word is a substitute (label) for a set of axis-value pairs + pointer

a phrase-number substitutes for a phrase concatenated from words

 

When a wildcard function substitutes for an axis-value in a definition, a

class-word results ("apple" minus "red" equals "fruit")

but exactly what is implied by the data structure in which an axis is

replaced by a label rather than by a variable is unclear.
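
The wildcard substitution described above can be sketched in a few lines.
This is only an illustration under stated assumptions: the axis names, the
values, and the helper names generalize and matches are hypothetical and are
not taken from the program itself. A definition is modelled as a plain mapping
from axis to value, and the class-word is what remains when one value is
replaced by a wildcard.

```python
WILDCARD = "*"  # hypothetical marker meaning "any value on this axis"

def generalize(definition, axis):
    """Return a copy of the definition with the value on `axis` wildcarded."""
    template = dict(definition)
    template[axis] = WILDCARD
    return template

def matches(template, definition):
    """True when every non-wildcard value in the template equals the
    definition's value on the same axis."""
    return all(
        value == WILDCARD or definition.get(axis) == value
        for axis, value in template.items()
    )

# Hypothetical axis-value definitions.
apple = {"edible": 1, "grows_on_tree": 1, "color": "red"}
pear = {"edible": 1, "grows_on_tree": 1, "color": "green"}
brick = {"edible": 0, "grows_on_tree": 0, "color": "red"}

# "apple" minus "red": a template that behaves like a class-word.
fruit_like = generalize(apple, "color")
```

In this toy case the template built from "apple" accepts "pear" and rejects
"brick", which is the behavior expected of a class-word such as "fruit". The
case in which an axis is replaced by a label rather than by a variable is not
modelled here.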

 

 

Resonance, Orthogonality, and Continuity

 

Resonance and oscillation among words can be established by observing

conversation. If these ideas are taken to be analogous to distance in MS,

then conversation can provide the information necessary to calculate the

angles among MS axes. Imagine the 2-dimensional version. Knowing the

coordinates of two points allows the calculation of the distance between

them, and knowing the coordinates of one point and the distance between, one

can specify a locus on which the other point must be.

 

 

                               |

                               |    a

                               |

                               |

                               |

                               |                 b

                               |

                               -----------------------------------------------------------------

 

Now assume that the numerical coordinates of a and b are fixed and known,

and that we know the distance between the points. If the same pair of

coordinate values must be retained, but the distance between them is known,

because of conversational input, to be decreasing, then

the angle between the two axes must decrease also. It's not possible to

solve this concisely without knowing one other fact, such as the angle

between the line ab and one of the axes, but that is not important for MS

calculations, which involve fuzzy quantities and imprecise magnitudes to

start with. Moving the axes away from orthogonality simply means that there

will be a component of motion along one axis when there is change along the

other, unlike the situation with right-angle axes. This is sufficient to

deal with the types of resonance & vibration I believe to be necessary. It

is also sufficient to establish that two axes are identical - a situation

likely to occur once the program begins creating new axes of its own.
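
The two-dimensional case can be made concrete with the law of cosines for
oblique axes. The sketch below is an illustration only; the function names and
the sample coordinates are assumptions. With coordinate differences dx and dy
read along axes separated by an angle theta, the squared distance is
dx^2 + dy^2 + 2*dx*dy*cos(theta), so a distance suggested by conversation can
be solved for theta while the coordinates are held fixed. The distance shrinks
as the angle closes only when dx and dy have opposite signs, as they do for
points placed like a and b in the figure; when they share a sign the relation
is reversed.

```python
import math

def oblique_distance(a, b, theta):
    """Distance between points a and b whose coordinates are read along two
    axes separated by the angle theta (in radians)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return math.sqrt(dx * dx + dy * dy + 2 * dx * dy * math.cos(theta))

def axis_angle(a, b, distance):
    """Angle between the axes at which the fixed coordinates of a and b lie
    `distance` apart. Both coordinate differences must be non-zero."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    cos_theta = (distance ** 2 - dx * dx - dy * dy) / (2 * dx * dy)
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding noise

a, b = (1.0, 4.0), (4.0, 1.0)      # hypothetical coordinates, a upper left
at_right_angles = oblique_distance(a, b, math.pi / 2)
angle_for_shorter_distance = axis_angle(a, b, 3.0)
```

With these sample points, the distance at right angles is the square root of
18, and a reported distance of 3 corresponds to axes 60 degrees apart. At an
angle of zero the two axes coincide, which is the degenerate case in which
they would be judged identical.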

 

Thus conversation itself will both enhance the richness of words' definitions,

and progressively refine and inform the topology of the space itself. It

is easy to imagine that functioning programs with different internal

relations among axes would behave very differently. For instance, a program

which has learned that the "color" and "redness" axes are not orthogonal

would naturally appear to be more intelligent - or at least less ignorant -

than one which had not learned this fact. This aspect of the model is one of

the ways to relate ideas like ranges of  "correctness" or "competence" to

ranges of specific arithmetic parameters. An analyst could look into the

algorithm's stored opinions about axis angles, and could see quickly that

some level of error exists.

 

It was remarked above that there are dangers with certain analogies between

the usual spatial reasoning and MS functions. It is possible that these

analogies would be more useful if the MS axes' angular relationships had

been fully optimized by extensive exposure to conversation. Perhaps bombs need

not disappear on their trajectories, and maybe shape-changing transforms

would be useful in such an optimized space. If this were true, then a bomb

that is known to represent a valid transition between words in a sentence

would retain its validity when its origin is translated, or a shape-morphing

transformation between two sentences could be applied with general success

to any sentence matching the shape of the transform's "origin" sentence, no

matter where the shape appears in MS.

 

Fuzzy data, Fuzzy logic

 

"Fuzzy logic" was a term invented when researchers began to address the

vagueness of everyday categories, following Lotfi Zadeh's work on fuzzy sets

in the 1960s. Computer realizations of such systems involve truth values

which go beyond the traditional binary ones. The term came to be connected

with specific techniques of analysis  that  were anything but inexact - the

name has become somewhat misleading, rather like the Heisenberg Uncertainty

Principle - a bit of knowledge that allows calculations to be made in which

there is no "uncertainty" at all.

 

In my work, the idea of fuzziness is much more direct, and may have little

or nothing to do with fuzzy logic as it is commonly defined. Rather, there

is, in the attempt to imitate human behavior, the necessity to face the

human ability to generalize (so that if we know something about apples, then

we know something about pears) and our ability instantly to evaluate the

extent to which new input is similar to something we have experienced before

(such as our ability to recognize faces immediately, or to evaluate the

face-like-ness of an abstract representation without apparent analysis or

difficulty <draw a happy face here>). Both difficulties require the

manipulation of regions of meaning-space rather than points. The MS points

representing pears, cumquats, peaches, and so on form a cloud near the

point which defines "apple", or perhaps more accurately, around a point

defining "fruit". It is this type of "fuzziness" with which I am attempting

to work. In a fuzzy space, paths no longer appear as thin lines projected

between precise points, but rather as less precise elongated regions

connecting clouds of varying sizes.

 

 

In practical terms, this fuzziness is accomplished in three ways. First,

when a word, bomb, path, behavior, or any other entity is stored in the

database, nearby points in MS will be filled with pointers to the object in

the center. Thus if some process accesses a spot in MS that might otherwise

be empty, but that is reasonably near to a filled location, a pointer to the

nearest word could be found. (This immensely memory-hungry technique was not

available as recently as 2000, before really large memory storage systems

became available at low price.) One of the "sub-conscious" processes (which

must take place for a program like this to run) is this explosion of MS

points into fuzzy clouds of pointers.  (Important unanswered questions

involve the interactions of these clouds. Should they be transparent to one

another, if they overlap, or is that overlap a useful "cognitive" event?

What laws would these interactions obey? How could such laws be found

empirically, or better, automatically, by the program itself?)
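
The explosion of a stored point into a cloud of pointers might look something
like the following. This is a minimal sketch, assuming integer grid
coordinates, a cubic neighborhood of fixed radius, and a "first writer wins"
rule where clouds overlap. All three are assumptions made for illustration,
since the text leaves the interaction of clouds open, and the names
store_with_cloud and lookup are hypothetical.

```python
from itertools import product

def store_with_cloud(space, center, label, radius=1):
    """Store the label at `center` and fill every empty neighbor within
    `radius` grid steps with a pointer back to the center."""
    space[center] = ("word", label)
    offsets = range(-radius, radius + 1)
    for delta in product(offsets, repeat=len(center)):
        point = tuple(c + d for c, d in zip(center, delta))
        if point not in space:  # never overwrite a word or an earlier pointer
            space[point] = ("pointer", center)

def lookup(space, point):
    """Return the label stored at `point`, following one pointer if needed.
    Returns None when the location is empty."""
    entry = space.get(point)
    if entry is None:
        return None
    kind, payload = entry
    return payload if kind == "word" else space[payload][1]

space = {}
store_with_cloud(space, (5, 5, 5), "apple")
```

A lookup at (6, 4, 5), which would otherwise be empty, now returns "apple".
The memory cost is visible even in this toy: one word in three dimensions
occupies 27 locations at radius 1, and the count grows exponentially with the
number of axes.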

 

The second form of fuzziness requires a somewhat more detailed reference to

the  mechanics of memory management used. The "window" used by OTHELLO, a

game-playing program, is its input information. It includes 8 ternary digits

of information, representing the current status of an edge on the board. A

window might look like

 

     1 2 1 0 0 1 0 1

 

                     when taken directly from the board. This window would

cause data about a good move to be stored at a feature-space location numbered

 

     1+(2*3)+(3^2)+(3^5)+(3^7).

 

A more complete series, including all the required powers of three, and all

the zero terms would be

 

1*(3^0) + 2*(3^1) + 1*(3^2) + 0*(3^3) + 0*(3^4) + 1*(3^5) + 0*(3^6) + 1*(3^7).

 

This computes to 2446. The second type of fuzzy storage simply removes

one window element, replacing it with zero. The same move is then

stored at a new location, calculated with this zero in place. This extra

storage at  imperfect windows means that the move will be returned in a

variety of circumstances, rather than only in one circumstance. Suppose the

game proceeds, and a new window occurs. If there is no move stored for that

window, the program could successively zero-out window elements, searching

for a fuzzy-stored move. Any returned move would not be as good, since it

may have come from a seriously different situation (in fact in OTHELLO one

would never do this!) but at least there would be some output where before

there was none.
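
The address calculation and the zeroing technique can be sketched as follows.
This is an illustration under assumptions, not the code of the original
program: the function names and the placeholder move are hypothetical, and
"successively zero-out" is read here as zeroing one element at a time rather
than cumulatively.

```python
def window_address(window):
    """Read the window as a base-3 number, first element least significant."""
    return sum(value * 3 ** position for position, value in enumerate(window))

def zeroed_variants(window):
    """Yield copies of the window with one non-zero element replaced by zero."""
    for position, value in enumerate(window):
        if value != 0:
            yield window[:position] + [0] + window[position + 1:]

def fuzzy_store(table, window, move):
    """Store the move at the exact address and at each one-element-zeroed
    address, keeping any entry already present at a fuzzy address."""
    table[window_address(window)] = move
    for variant in zeroed_variants(window):
        table.setdefault(window_address(variant), move)

def fuzzy_lookup(table, window):
    """Return an exact match if there is one, otherwise the first match found
    by zeroing a single element, otherwise None."""
    for candidate in [window, *zeroed_variants(window)]:
        move = table.get(window_address(candidate))
        if move is not None:
            return move
    return None

table = {}
fuzzy_store(table, [1, 2, 1, 0, 0, 1, 0, 1], "move-A")  # placeholder move
```

The stored window has address 2446. A later window that differs only in its
third element, [1, 2, 2, 0, 0, 1, 0, 1], has no exact entry, but zeroing that
element leads to an address filled during storage, so "move-A" is returned.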

 

One of the things people do, that is difficult for computer programs, is

constantly to generate behavior even in the absence of a precise image,

either of the current situation or of the desired outcome. We flail about,

expecting, if nothing else, to be smacked on the hand by Teacher when we

mis-behave, or we expect to be told "you're getting warmer". This smack, its

absence,  its opposite (a reward), or its fuzzy version ("warmth") lets us

know whether or not our flailing was appropriate, and next time the same

situation obtains, we can more usefully direct our behavior. The initial,

inexact step - the generation of efficient "flailing" - is trivially easy

for us, and damnably difficult to encode. The combination of the PURR-PUSS

algorithm described below, with these elementary fuzzy techniques provides

an inroad to this problem that is both very simple to calculate  and 

procedurally efficient. This is another way of saying that an inroad on the

problem exists that is related to very simple mathematics - and that is one

of my overall values in programming.

 

The third type of fuzzy storage also operates in the realm of PUSS modules,

and  requires the storage of the new input at locations calculated from

windows which have been adjusted slightly. Thus one might store the same

OTHELLO move at locations calculated from all the following windows:

 

 

 

     1     2       1       0      0      1     0     1      ( the original, central feature vector)

     1.5   2       1       0      0      1     0     1

     1    2.5      1       0      0      1     0     1

     1     2      1.5      0      0      1     0     1

     1     2       1      0.5     0      1     0     1

     1     2       1       0     0.5     1     0     1

     1     2       1       0      0     0.5    0     1

     1     2       1       0      0      1   -0.5    1

     1     2       1       0      0      1     0     0.5

 

               Etc....

 

Of course this procedure is isomorphic with storage at nearby locations in

the space involved. There are slight procedural differences which make one

or the other more convenient at different times.
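
The adjusted windows listed above can be generated mechanically. The sketch
below is illustrative only: the step of 0.5 is taken from the table, but
perturbing every element in both directions, and the function names, are
assumptions. How a fractional address is mapped onto real storage is not
specified in the text and is left open here.

```python
def window_address(window):
    """Weighted sum of the window elements by powers of three."""
    return sum(value * 3 ** position for position, value in enumerate(window))

def perturbed_windows(window, step=0.5):
    """Return copies of the window with one element nudged up or down by
    `step`, covering every element in both directions."""
    neighbors = []
    for position in range(len(window)):
        for delta in (step, -step):
            shifted = list(window)
            shifted[position] += delta
            neighbors.append(shifted)
    return neighbors

central = [1, 2, 1, 0, 0, 1, 0, 1]
neighbors = perturbed_windows(central)
offsets = sorted(abs(window_address(w) - window_address(central))
                 for w in neighbors)
```

Each adjusted window is half a step from the central vector in the
8-dimensional feature space, yet the computed addresses differ from 2446 by
anything from 0.5 to 1093.5. This is one of the procedural differences between
storing at adjusted windows and storing at nearby addresses.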

 

 

Fuzziness is expected to be applied to every aspect of this program,

including reinforcement. The same idea (that MS points are replaced by

clouds in the manner of wave-functions) will be applied to the adjustment of

choices and methods of choosing. If one specific transform, for example, is

reinforced, under a certain set of conditions, then other transforms would

receive reinforcement to the extent that they existed "near" the first one

in MS, and to the extent that the "set of circumstances" resembles the

original. This is only possible because all the parameters of all the

behaviors and transforms available to the algorithm exist as similarly

defined collections of MS locations, pointer-types and processes - processes

and pointer-types which are themselves also located in the space.
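
One way such fuzzy reinforcement might be computed is sketched below. The
Gaussian fall-off, the width parameter, and the names used are assumptions
chosen for illustration; the text says only that neighbors share in the
reinforcement to the extent that they are near. Resemblance of circumstances
could be handled by a second factor of the same form and is omitted here.

```python
import math

def spread_reinforcement(weights, positions, target, reward, width=1.0):
    """Add `reward` to the target transform and a distance-discounted share
    of it to every other transform located in the same space."""
    origin = positions[target]
    for name, position in positions.items():
        distance = math.dist(position, origin)
        share = reward * math.exp(-((distance / width) ** 2))
        weights[name] = weights.get(name, 0.0) + share
    return weights

# Hypothetical MS locations of three transforms.
positions = {"t1": (0.0, 0.0), "t2": (0.5, 0.0), "t3": (5.0, 0.0)}
weights = spread_reinforcement({}, positions, target="t1", reward=1.0)
```

The reinforced transform t1 receives the full reward, its near neighbor t2
receives most of it, and the distant t3 receives almost none.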

 


     Meaning Space and the Management of Conversation


At this point it is possible to describe some of the nuts & bolts of conversational

decision making. There are a number of objects and relationships, all of which

are visible to the central processing logic.


  Objects
     Axis
     Axis/value pair plus pointers
     Series of a/v pairs (one level in a word's definition)
     Series of levels (complete definition of a word, bomb, or transform: not necessarily associated with
                             a dictionary entry)
     Path (a series of objects with the "word" data-structure)
     Transform (generalized instructions for: how to get from one specific path to another)
     Hi-Transform (generalized instructions for: how to get from one specific transform to another)

  Relationships
     Adjacency (words next to each other in input sentences)
     Relational adjacency (words associated by bombs, vibration, resonance, etc)
     Transformational adjacency (objects and the objects to which the algorithm can get through transforms)


The algorithm functions by learning associations between any of these objects,

via any of the relationships available. This is the purpose of the PURR-PUSS process,

as described below in the history of previous programs. One of the phases of

program operation involves the traversal of very large training sets, from which

information is extracted, building a large learned and structured internal database.

When a reply is being formed by the program, the main process involves the

creation of temporary objects according to the stored associations in the database.

The creation of these objects takes a variety of forms, as does their use. It is the

tuning of these processes by reinforcement that will progressively refine the

algorithm's conversational ability.


As an example let's suppose the program has been asked "What do you want to

do?" and that the reply-formation transform has decided that the reply verb could

be "to go". The simplest path to path transforms produce fragments of the form

"I want ....." when confronted with the input "What do you want....", and with

the selected verb, there would then exist, as a starting point for this example,

the fragment "I want to go .....".


The simplest relationship is adjacency. (Appendix III has the complete list of

relationships.) There is a very direct, simple, and fast procedure for accessing

the words that have ever occurred after the fragment "to go", and it produces a

list that looks like this:


home
by car
to the bathroom
quickly
away from here
on American Airlines
etc.
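
A lookup of this kind can be sketched with an ordinary dictionary. The
training sentences, the two-word fragment length, and the name build_followers
are hypothetical; the sketch indexes whatever followed each fragment in the
input and is not meant to reproduce the PURR-PUSS storage scheme.

```python
from collections import defaultdict

def build_followers(sentences, fragment_length=2):
    """Map every fragment of `fragment_length` words to the list of
    continuations that followed it in the training sentences."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for start in range(len(words) - fragment_length):
            fragment = tuple(words[start:start + fragment_length])
            rest = " ".join(words[start + fragment_length:])
            if rest not in followers[fragment]:
                followers[fragment].append(rest)
    return followers

# Hypothetical training input.
training = [
    "I want to go home",
    "We have to go by car",
    "I need to go to the bathroom",
]
followers = build_followers(training)
after_to_go = followers[("to", "go")]
```

For this input, after_to_go holds "home", "by car", and "to the bathroom", in
the order in which they were first seen.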


Having a word, of course, allows access to a definition that includes large

amounts of information both from the dictionary and from the context-dependent

parts of the data-structure.


A rather simple statistical summing procedure easily separates the words on this

list into three clusters, according to common axis values present in the

definitions of the words. A complete list of these collapsing functions may

be found in Appendix III.


locations (destinations)
means     (modes of travel)
adverbs   (qualities of travel)


Each of these clusters represents a collapse of template windows taken from

the listed words' definitions. The object created by this collapse has a list of

axes, values, pointers, etc., but may not correspond to any word in the dictionary.

The summing function returns not only this word-like object, but also a measure

of the strength of the association. There are of course a variety of functions that

could be used to measure the relationships between definitions, but for the

moment let's just imagine that the function is simple distance in MS (see "templates"

above). The example in question involves three clusters of words that are very,

very similar in definition, and the clustering would be recorded as being very

strong. (Naturally these cluster calculations need not be repeated, but since they

involve not only words, but phrases and context-dependent information, there

would be many cluster lists created, some of which might never be used again.

PURR-PUSS, as usual, keeps track of which clusters have been calculated already.)
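
The separation into clusters and the collapse of each cluster into a word-like
object might be sketched as follows. Several things here are assumptions made
for illustration: the axis names and values, the use of shared axis-value
pairs (a Jaccard index) in place of distance in MS, the threshold of 0.5, and
the greedy single-pass grouping. The functions listed in Appendix III may
differ.

```python
def similarity(first, second):
    """Fraction of axis-value pairs shared by two definitions."""
    a, b = set(first.items()), set(second.items())
    return len(a & b) / len(a | b)

def collapse(members):
    """Return the axis-value pairs common to all members, together with a
    strength: the share of all pairs in play that are common."""
    common = set(members[0].items())
    union = set(members[0].items())
    for definition in members[1:]:
        common &= set(definition.items())
        union |= set(definition.items())
    return dict(common), len(common) / len(union)

def cluster(definitions, threshold=0.5):
    """Greedy pass: join the first cluster whose seed is similar enough,
    otherwise start a new cluster."""
    clusters = []
    for name, definition in definitions.items():
        for group in clusters:
            if similarity(definition, group[0][1]) >= threshold:
                group.append((name, definition))
                break
        else:
            clusters.append([(name, definition)])
    return clusters

# Hypothetical definitions for continuations of "to go".
definitions = {
    "home": {"place": 1, "destination": 1},
    "by car": {"vehicle": 1, "means": 1},
    "to the bathroom": {"place": 1, "destination": 1, "indoors": 1},
    "quickly": {"manner": 1, "speed": "high"},
    "on American Airlines": {"vehicle": 1, "means": 1, "airborne": 1},
}
groups = cluster(definitions)
names = [[name for name, _ in group] for group in groups]
common, strength = collapse([definition for _, definition in groups[0]])
```

The three groups correspond to locations, means, and adverbs. The collapsed
object for the first group carries only the shared "place" and "destination"
values and matches no dictionary word, with a strength of two thirds.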


In our example, therefore, the program has created a number of objects that represent

classes of learned associations relevant to the current context. Presumably, earlier

processes have produced other temporary objects. In this example, the crucial decision

to be made is a selection of one of the three clusters as a direction for the

conversation to take. Let's suppose that the conversation had, for a while, been

involved with determining what time it is. The discussion of time would have

created a temporary object that is essentially a generalized adverb, and the

resonance function would have no difficulty pointing out the match between

the previous conversation's focus on time and the current reply's "adverb"

cluster. Thus the program uses both definitions and context, plus all the specific

words and phrases in play, plus its stored database of associations.



The resonance relation is actually rather crude and arithmetic. Much more

interesting are the topological possibilities. The definitions of words are

appropriately visualized as clouds with a density that increases as some

relatively central point is approached. If the word is thought of as a template

with narrow, limited window openings, then the cloud shrinks as the windows

are narrowed. Opening a particular window stretches the cloud along a

particular axis. Imagine that a cloud has been stretched 'way out, and that we

decide to be interested only in a region with a certain minimum density, or,

said another way, within a certain distance of the central values. This entity

approaches being a line, if only a few dimensions are involved, as the value along

one increases with respect to the values along the others.


Having created a line, it is easy to imagine that the opening of two template

windows would stretch a cloud in two directions. A similar restriction by

density creates a surface in MS rather than a line. Clearly this procedure is

available for any number of dimensions. It is the interaction of these surfaces

that is expected to provide conversational behavior more advanced and

sensible than the simpler examples presented above.


The clusters created by each word and phrase acquire a contextual axis-value

according to the relation that was used in their creation, and the clusters may

themselves interact directly, creating descendants. For example, pronouns

create clusters with wild-card openings. The sentence "Which one?" creates

a cluster with an opening for a noun. Any noun that has been lit up, or any

class that appears as a cluster, will be attracted by this opening, and the

resonance relationship will increase the energy of the noun. Whichever

noun enjoys the strongest resonance relations with all the clusters present

will have a favored position in the stack of possible reply words. It may

also be sensible to allow the cluster with the opening to change or spawn

a descendant when its wild-card position is filled.


The filling of a wild-card slot is a fairly obvious thing to do, but the MS

structure allows for many more interactions. During the learning process,

clusters will be created as input arrives. Subsequent input, presumed to

be "correct" or "desirable", provides targets to be sought out amongst

possible relations between clusters. If two clusters are found that combine

in some way that points to a word that appears in the subsequent input,

then that relation is reinforced, and when clusters of the same type are

present in the future, the reinforced relation has a better chance of

influencing outcome. This lengthy process is expected to take place

primarily as a dark-cycle crawler (which see). In any case a repertoire

of reinforced relations will be built up and refined as input is analyzed.

All the relationship types, and every method of pointing from one place

to another, will be a candidate for this process.


A single cluster can also be formed whose axes and values age rapidly and

disappear frequently when not renewed by successive appearance in the

conversation. Such a running sum could contain a relatively rich definition

of the "current subject", or, if its summing-function were an operator like

"is subject to", then the cluster could define a region in MS containing sensible

grammatical objects for verbs, prepositions, etc.  Dark-cycle evaluations of such

clusters - again, like ECONOMIST - provide an optimizing process independent

of any teaching.
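
A rapidly aging running sum of this kind can be sketched very simply. The
decay factor, the cut-off below which an axis disappears, and the function
name are assumptions chosen for illustration.

```python
def update_running_cluster(cluster, sentence_axes, decay=0.5, floor=0.1):
    """Age every axis weight, renew the axes present in the new sentence,
    and drop any axis whose weight has faded below `floor`."""
    aged = {axis: weight * decay for axis, weight in cluster.items()}
    for axis in sentence_axes:
        aged[axis] = aged.get(axis, 0.0) + 1.0
    return {axis: weight for axis, weight in aged.items() if weight >= floor}

# Hypothetical axes lit up by five successive sentences.
history = [["time", "clock"], ["time"], ["time"], ["time"], ["time"]]
current_subject = {}
for axes in history:
    current_subject = update_running_cluster(current_subject, axes)
```

After five sentences the "clock" axis, mentioned only once, has faded out,
while "time", renewed by every sentence, dominates the running definition of
the current subject.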


Using a subtraction rather than a summing in the formation of a cluster leads to

a different set of possibilities. For example, any program like this must keep some

kind of historical tree of the current conversation, so that side-subjects (such as one

required if the teacher uses a word the computer has never seen) can be pursued,

and then originating threads can be taken up again. One sort of branching that must

be dealt with in such a tree occurs when a sentence like this comes up:


    "George thinks it's green, but Bob says it hasn't been dyed yet."


There are two branches that must be analysed independently. The program must

establish stable elements "person who believes" and "thinks = says in this context",

and then must figure out what the relevant conflict is. A subtraction cluster would

result in time axes, since the difference between the two un-stable sentence

elements "green" and "hasn't been dyed yet" is "time". This would tell the

algorithm that the important thing to discuss involves not Bob or George, not green

or dye,  not it or is or hasn't, but "when". Therefore the sensible reply requires nothing but

the necessary pronouns and auxiliary substitutes and time-words:


    "When will it be done?"

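
The subtraction can be sketched as a comparison of two definitions that keeps
only the axes on which they disagree. The definitions given for the two
un-stable elements, the table of question words, and the function name are all
hypothetical; arriving at such definitions is the hard part and is not
addressed by this sketch.

```python
def subtraction_cluster(first, second):
    """Axes on which two definitions disagree, or which only one carries."""
    return {axis for axis in first.keys() | second.keys()
            if first.get(axis) != second.get(axis)}

# Hypothetical definitions of the two un-stable sentence elements.
green = {"subject": "it", "color": "green", "time": "present"}
not_dyed_yet = {"subject": "it", "color": "green", "time": "future"}

# Hypothetical table linking an axis to the word that asks about it.
QUESTION_WORD = {"time": "When", "place": "Where", "person": "Who"}

difference = subtraction_cluster(green, not_dyed_yet)
question_words = [QUESTION_WORD[axis] for axis in sorted(difference)
                  if axis in QUESTION_WORD]
```

Only the "time" axis survives the subtraction, so the reply is steered toward
"When" rather than toward Bob, George, green, or dye.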

 

     Starting from Scratch

 

An obvious weakness of any program such as this would be an inability to add

understanding of the world as the program continues to function. Clearly it

must be possible to add new words and behaviors, automatically - that is,

not by explicit direction from humans through either programming or

linguistic instruction. Likewise, the program must be able to sense the need

for a new axis, and then to redefine words to include values on the new

axis.

 

There is some reason to believe that the brain is not an empty,

connection-free void at birth, but rather that it has vastly more

connections than can be useful, and that they must be pared down before any

sense can be made of sensory input. This is the opposite paradigm from that

used by most learning algorithms, which start with a literally empty memory,

and add to it progressively. When  operating from a completely ignorant

starting point, this program will utilize some procedures that view the task

from both directions. There is no functional difference between the two

following computer realizations: 1) consider all possible connections in MS

to exist a priori, and use reinforcement to remove them successively; and 2)

allow a random vector generator to be the source of behaviors presented for

reinforcement. (Even if all connections exist to start with, they have to be

selected one at a time for testing by the reinforcing routines. One way to

model equal probability of selection of  behaviors is by the use of

randomness in the selection procedure.) For a reinforcement-guided system to

function, there must be some behavior to reinforce. Generating the

"flailing" with random selection of behaviors, or through random creation of

the behaviors themselves, may mimic this full-at-start-up situation in the

infant brain. This randomness is the most primitive way new entities would

be created.

 

The addition of new words will happen all the time, in a sense: anything

that has a location in MS might be a word - the program will in many

circumstances identify MS locations which have no definition, and it will

have no option but to ask "is there a word that means: ____" and then list

the axis information associated with the location. It is something of a

conundrum as to what the program should do if Teacher says "No, there is no

such word". Should the computer refuse to make any calculations involving

that MS location? Or assume that it is in some sense incorrect, invalid, or

negatively reinforced? Or should it insist that its new "word" be recognized

and allowed into the vocabulary of the conversation?

 

I can imagine some limited ways for a program to recognize the need for a

new axis. For example, if a color-blind program were taught that there

existed "grannysmiths" and "goldendeliciouses", it could become clear

through parallel sentence structures that these two noun-classes were

identical with respect to every axis the program knew, but that Teacher knew

the two classes differed in a single way. The program obviously has no way

to divine what word Teacher would attach to the new axis, and so it seems

legitimate for the program to be constructed with the knowledge that it may

ask for the name of the new axis - in this case, "color".

 

Some new axes would be directly suggested linguistically. Colors, for

example, could be thought of as different "types of light". Any situation

involving different types of anything could be seen as an occasion to add an

axis, or at least to ask Teacher if one is needed. Furthermore, from a

purely arithmetical point of view, every multi-coordinate location in MS can

define a new axis, although not a new orthogonal one.

 

Additionally there may be purely mathematical techniques to apply. For

example, consider the situation described above. Suppose the program

discovers two transforms which can operate on a single sentence which

includes the word "apple", and that the two resulting replies are both

accepted as equally true by Teacher. Additionally require that the two

resulting sentences have a parallel grammatical structure and the same parts

of speech in the same positions. In some such pairs of sentences, there

should be a relationship between 1) the higher-level transform one would

have to use to get one transform from the other, and 2) the new axis that

would be required for the program to distinguish between the two varieties

of apple.

 

It is another matter to automate the geometric definition of the new axis.

The assumption that the new axis is orthogonal to all existing ones is

clearly wrong - among 500-odd axes there is almost certainly going to be a

closer relation to some axes than to others (and the more similar an axis is

to another in meaning, the further from 90 degrees  is the angle between the

two).

 

This matter may be irrelevant - if it turns out to be important that the

definitions of words be "correct", then this geometric puzzle will have to

be solved. Fortunately, perhaps, this may not be a requirement at all, since

it appears that input encoding in our brains works perfectly well even though

we have to use whatever (quasi-random) physical connections are established

during the development of the embryo.  These connections, that cannot be

genetically specified at the level of detail of the processes of a single neuron,

are used by the learning algorithm of the brain, no matter what their details are,

and in an analogous way, this algorithm could use any set of defining axis-value

sets as label-generators, as long as they were sufficiently separate and sufficiently

rich.

 

 

     Input Encoding

 

One of the most difficult questions for neuroscientists has always been how

information is encoded in the cerebral cortex. It is obviously nothing like

the neat one-to-one mappings used in computer databases. Memories survive

physical damage, removal of half the brain, and the superposition of

seemingly infinite amounts of additional storage. What is going on? I

propose we not even try to answer precisely, but rather ask "what is the

character of what is known about the cortex", and see how far we can go with

simulation.

 

We know the cortex is a system which has a current state,  that this state

evolves with time, and that the cortex adjusts, according to input, its temporary

and permanent internal structures (or circuitry, or resonances, or RNA, or

something). We know that the system is capable of treating the same input the

same way, time after time - this is called recognition. Somehow, initially,

senses send information to the flexible changing cortex in a way that is

stable, detailed and repeatable, and something physical happens which

produces the same change of cortical state, time after time.

 

The definition of words according to axes in a meaning-space might appear to

be  an attempt to establish a computer structure which can contain the

meanings of words. Although I briefly was entranced by this possibility, I

have come to understand that such a grandiose and difficult task is unlikely

even to be broached by such a simple construct. However, these detailed,

stable and  repeatable definitions absolutely do provide a program's

changeable, flexible memory with a detailed, stable, and repeatable system

for input encoding. The success of MS in making the actual meaning of words

available to a computer is unlikely - as unlikely as it is difficult to be

"correct" in one's construction of the axis system and the definitions

included therein - but the utility of a stable input encoding mechanism has

been demonstrated throughout the history of programs created in this series.

No "correct" analysis of Gregorian chant was performed in the creation of

CHANTER, and no suggestion of "understanding of meaning" was contained in

the machinations of ECONOMIST, yet each was able to operate with some

facility in its chosen encoded realm. The dictionary of MS-defined words at

the very least creates a stable input path with sufficient richness to allow

for differentiation amongst a wide variety of objects. The combination of a

few hundred axes of "meaning" with the few dozen types of operators,

pointers, bombs, and so on, unquestionably provides the raw material for the

creation, by the stable and repeating mechanisms of the program, of a highly

ordered, entirely deterministic internal tapestry. The output routines that

interpret that tapestry for an observer, if properly and sufficiently tuned

by reinforcement, might allow the observer to witness the same sort of

filtered, processed regurgitation of that elusive fantasy - objective

reality - that we see every moment in ourselves and in each other.

 

 

Program Structures

 

There are a number of program structures, not directly connected with

meaning-space or the definitions of words, that appear to be necessary for

this project.

 

 

     Temporary Definitions, Parallel Reasoning Structures and the BigBuffer

 

The definition of a word has both stable and context-dependent parts. The

stable parts of the definition are part of the word's location in MS, but

the context-dependent parts can only be determined at run-time. For example,

"finding" is forced to be a noun by the presence of an article - as in "The

judge authored a finding....". This means that words must be placed in

temporary buffers when they are "in play", as it were, and part of the

pre-processing done on input must be able to adjust values in the buffer so

that the word can be interpreted properly for the current context.

Operations on the definition in the buffer would only affect that part of

the word's permanent definition which catalogs what objects (as in direct-

or indirect objects) or associations the word is able to form.

 

If a word is forced to exist as a part of speech not originally included in

its definition, then a new word must be placed in the database. There are

avenues by which the program would then be forced to construct questions

about the new word. First, there are certain values which must be present in

the definition of a word, purely for the purpose of managing the data

structure. The presence of unknown values - which would result from the

creation of a new word by the program - is easy to recognize as a focus

point for the program's attention. Second, the history that the program

keeps of its activity assigns dates of usage to words, and a new word's low

age can serve as another signal to the program's attention-focusing routine.

Third, a new word would have no poles toward which it could vibrate.  A

motionless object is easy to pick out among a large number of familiar - and

therefore oscillating - entities.

 

 

The use of temporary buffers for words' context-driven temporary definition

parameters is a part of another necessary organizational structure for a

program like this. It must be possible sometimes for the program to keep

track of two branches of a conversation, and eventually return to an earlier

branching point. For instance, the appearance of new information will

usually force a somewhat standard set of questions to appear, and the posing

and answering of these represent a side-branch of a conversation. To be able

to return to that point in the conversation at which the side-branch

originated, the program has to keep a sort of historical tree that specifies

which sentences belong in which branch. This parallel reasoning structure

would have to be used if the opinions of two humans had to be analyzed

("George said apples are red, but John says they taste good.") Parallel

grammatical structure of adjacent input phrases, and a number of signal

words (but, however, either) serve as signals that such branches are needed.
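
The historical tree described above might be sketched as follows. This is only a minimal illustration under my own assumptions: the names Branch, ConversationTree, open_side_branch and return_to_branch_point are hypothetical, and sentences are kept as plain strings rather than as paths in MS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Branch:
    """One branch of the conversation. All names here are hypothetical."""
    label: str
    parent: Optional["Branch"] = None
    sentences: List[str] = field(default_factory=list)
    children: List["Branch"] = field(default_factory=list)


class ConversationTree:
    """Historical tree recording which sentences belong to which branch."""

    def __init__(self) -> None:
        self.root = Branch("main")
        self.current = self.root

    def add_sentence(self, sentence: str) -> None:
        self.current.sentences.append(sentence)

    def open_side_branch(self, label: str) -> Branch:
        # Remember the branching point by linking the child to its parent.
        child = Branch(label, parent=self.current)
        self.current.children.append(child)
        self.current = child
        return child

    def return_to_branch_point(self) -> None:
        # Go back to the branch in which the side-branch originated.
        if self.current.parent is not None:
            self.current = self.current.parent


tree = ConversationTree()
tree.add_sentence("George said apples are red.")
tree.open_side_branch("questions about new information")
tree.add_sentence("Which George?")
tree.add_sentence("George from next door.")
tree.return_to_branch_point()
tree.add_sentence("But John says they taste good.")
```

After the side-branch is closed, the main branch holds the two statements of opinion and the side-branch holds the question and its answer, so either can be revisited separately.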

 

Finally, there are two parts of the program's function which require that a

very large buffer be kept, which will essentially list the elements which

are currently active. First, the idea that a region of MS is to be "lit up"

by the explosion of a bomb, as well as the idea that clouds in MS might

interact on their own - that is, as sub-conscious processes outside of

reply-formation - require that the algorithm be able to perceive that some

subset of all MS locations are currently available, or are currently active.

I know of no way for a computer to globally scan its memory and pick out all

the points which have some characteristic, outside of the kind of

brute-force search which the PURR-PUSS memory system is specifically

designed to obviate. If an MS element is to be "lit up" and therefore is to

draw attention to itself, it must be present on a short list which can be

efficiently scanned. (The brain does all this really well - anything can be

"lit up" and our attention drawn to it. This is the definition of focus, or

attention.) This list will be large - probably thousands of elements - but

it will not grow exponentially as the program progresses, and its items will

"age" and "die" as "time" passes, allowing for a stable buffer size to be

maintained. This is a programming element which is a part of the

housekeeping necessary for other parts of the algorithm to function, but

which has  little "meaning" of its own. It would be reaching to assert some

sort of similarity to short-term memory.

 

Of course, it may turn out that the most efficient implementation of this

buffer would still involve the PURR-PUSS storage/memory system: in

this case, the input to the memory-probing routines would just be the label

"BigBuffer", and it would then retrieve a series of items' labels - these

would be the items currently "lit up". Each time the buffer is accessed,

its members' "ages" would be incremented, and sufficiently unused ones

(this is the same as "sufficiently old", since use of an element resets its "age")

could be removed. If one of the tricks of PURR-PUSS storage is used,

then it could be easy to retrieve subsets of items in the buffer simply by

including the limiting match in the input to the probing routines.
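
The aging behavior of the BigBuffer can be illustrated with a small sketch. It assumes an ordinary dictionary in place of the PURR-PUSS storage system, and the names BigBuffer, light_up, access and max_age are hypothetical; the only points taken from the text are that each access ages the members, that use resets an element's age, and that sufficiently old members are removed.

```python
class BigBuffer:
    """Short list of currently 'lit up' MS elements (hypothetical sketch)."""

    def __init__(self, max_age: int) -> None:
        self.max_age = max_age
        self.ages = {}  # element label -> number of accesses since last use

    def light_up(self, label: str) -> None:
        # Using (or re-using) an element resets its age to zero.
        self.ages[label] = 0

    def access(self) -> list:
        """Age every member by one, drop the stale ones, return the rest."""
        for label in list(self.ages):
            self.ages[label] += 1
            if self.ages[label] > self.max_age:
                del self.ages[label]
        return sorted(self.ages)
```

Because unused members are dropped after max_age accesses, the buffer size stays bounded by the number of elements lit up within that window, which is the stability property wanted above.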

 

     Pre-processing

 

The user's input must be parsed and pre-processed. Musical input would have

to be decomposed into rhythmic & pitch elements. Input from proprioceptive

sensors would have to be encoded. As an example in the linguistic area,

pronoun antecedents must be determined, either by analysis or by formulating

questions.

 

In initial formulations, I have decided not to bother with verb forms, but

rather to reduce all conversationally used verbs to a schematic

representation such as " to go: past plural conditional ". Additionally,

many, many single spellings map to multiple meanings. I have also decided

not to trouble the computer with this problem at the outset, but to

make sure that every word taken as input has a unique definition. "Mean"

will mean "nasty" when it is given to the program as "mean2" and it will

mean "intend definitionally" when given to the program as "mean1".  Of course

these are essential aspects of "natural language", but one must crawl before one

runs.
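
As a rough illustration of these two simplifications, the sketch below assumes a tiny hypothetical dictionary (SENSES) keyed by sense-tagged spellings, and a hypothetical helper (parse_verb) that splits the schematic verb representation into a base form and its features. Neither name comes from the program itself.

```python
# Hypothetical dictionary keyed by sense-tagged spellings, as in "mean1"/"mean2".
SENSES = {
    "mean1": "intend definitionally",
    "mean2": "nasty",
}


def lookup(tagged_word: str) -> str:
    """Return the single definition attached to a sense-tagged word."""
    return SENSES[tagged_word]


def parse_verb(schematic: str):
    """Split ' to go: past plural conditional ' into a base form and features."""
    lemma, _, features = schematic.strip().partition(":")
    return lemma.strip(), tuple(features.split())


print(lookup("mean2"))
print(parse_verb(" to go: past plural conditional "))
```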

 

     Reply Formation and the Internal Conversation

 

After all the preprocessing is complete, the program will be in a position

to "construct a reply".  If the reply is to be a continuation of a musical

idea being cooperatively constructed, then the procedure is simple and very

similar to that used by CHANTER. The linguistic process will result in one

of a number of outcomes, eventually arriving at output; there will usually

have to be  a good deal of initial internal conversation. For instance, a

simple PUSS (see below) will determine whether input has been seen before,

and what responses were previously generated. If a question has been asked

and answered before, the program must recognize this and either repeat the

previous response, asserting that it is a repetition, or it must take the

previous response series as new input so that new output would result. (This

routine would also naturally prevent excessive repetition of musical ideas

already used, or inappropriate looping of physical moves.)

 

Even the simplest conversation requires considerable internal discussion on

each side. If you tell me "Spot ran home" I know that we're talking about

the past, that Spot wasn't home before this, that Spot was at least at home

for a moment at some point after running there, that you can see Spot or

that someone told you about him, and so on. I expect the program to spend a

good deal of time talking to itself like this, and that its arrival at

output will be like the arrival of CHANTER at a series of pitches and

symbols known to be "cadential", that is, ones which signal an ending. On

the other hand, since we humans will be able to watch the entire process as

it happens, we can intervene whenever we choose - whenever we think

sufficient work has been done that the current ur-path looks like output.

Since the algorithm fundamentally operates by making predictions based on

past experience, each time the Teacher intervenes, the algorithm will learn

something about "when it is appropriate to stop" and say something "out loud".

Eventually, then, as the program learns other stuff, it will also learn to

stop and output its current state. (The intervention by Teacher in the

middle of internal calculations may be more like schizophrenic "voices" than

like a conversational interruption. Who knows if this might not turn out to

be counterproductive?)

 

The program will utilize a large number of different methods of finding or

creating relationships among words and phrases, and a number of methods of

getting these relationships into and out of conversation. These methods form

the core of the program's function. Conversation inherently includes

mechanisms for reinforcement ( a mother responds to her child in ways that

tell the child whether its "output" is appropriate, correct, useful, etc.)

and this reinforcement - positive and negative - will be stored and used to

influence the process in the future. Presumably the various methods will

have different utility under different circumstances, and the reinforcement

regime will allow the learning of which methods are best applied under

various circumstances.

 

 

     The Program's Cycle

 

When not engaged by a human, sub-conscious processes will run. Various

"crawlers" (inspired by internet search engines) will peruse the database,

establishing interrelationships of various sorts described elsewhere. These

processes will search for matching patterns, compare definitions and add the

results of comparisons to the database, perform iterative forward and

backward chaining searches to establish remote relations between words in

the manner of a neural net, allow regions of MS to interact according to

their intrinsic parameters and resonances, manage the list of questions that

need to be asked of Teacher, construct word-primitives from word

definitions or add phrases that are sufficiently repeated to be used

as single words, and so on.

 

 

Pattern recognition is an essential first step in many of the processes to

be used. The model for this is the antibody: crawlers will search both input

and the database for objects which fuzzily match items that have already

been seen. These crawlers, and other pattern-matching processes, operate

like antibodies, that float throughout biological systems and bind

selectively to objects whose structure fits the antibody's receptor

proteins. The various fuzzy-data techniques will allow the pattern

recognizers to bind not only to precise matches, but also, more weakly, to

objects which only partially match the optimal target.
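
One very simple way to picture this graded binding is sketched below. It is an assumption on my part, not one of the fuzzy-data techniques actually chosen: each object is reduced to a set of features, and the binding strength is the fraction of the receptor's features that the object carries. The names binding_strength and crawl, and the toy database, are hypothetical.

```python
def binding_strength(receptor: set, candidate: set) -> float:
    """Fraction of the receptor's features found in the candidate (0.0 to 1.0)."""
    if not receptor:
        return 0.0
    return len(receptor & candidate) / len(receptor)


def crawl(receptor: set, database: dict, threshold: float = 0.5) -> list:
    """Return (strength, name) pairs that bind at or above the threshold,
    strongest first. Precise matches score 1.0; partial matches score less."""
    hits = [(binding_strength(receptor, feats), name)
            for name, feats in database.items()]
    return sorted([hit for hit in hits if hit[0] >= threshold], reverse=True)


database = {
    "dog": {"animal", "motile", "domestic", "barks"},
    "wolf": {"animal", "motile", "wild"},
    "apple": {"fruit", "edible"},
}
matches = crawl({"animal", "motile", "domestic"}, database)
```

Here "dog" binds fully, "wolf" binds more weakly, and "apple" does not bind at all.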

 

When engaged by a human user, these processes will be interrupted, and the

program will either "listen to a story" - that is, take input without

replying - or it will attempt to engage in conversation. This "conversation"

may involve linguistic, musical, or virtual-physical events. The attempt to

reply will involve a number of stages, including pre-processing and parsing,

internal conversation, reply formation, reply testing, and the management of

repetition.

 

     Managing Reinforcement and Conditioning

 

Computer programs inevitably involve explicit machine instructions and

precisely defined memory management, and thus one is always in the position

of being able to keep a complete record of program behavior. In this

project, constant interaction with humans provides a  flow of positive and

negative reinforcement coming to the program. Together, the ability to

maintain a history, and the presence of reinforcement, mean that program

behaviors can be associated with rewards and punishment in very detailed

ways. In fact almost all of the algorithm's decisions about which methods to

apply, under which circumstances, will be formed by long term, statistical

summing of inferred approval.

 

It is expected that a great deal of direct reinforcing input will occur

naturally in the behavior of Teacher, as the program sputters and stumbles

through its early attempts at behavior. Any reply from a human that starts

"No,..."  or "Yes, that's right,....." is an unambiguous message that the

program's behavior has succeeded or failed. Additionally, any continuation

of a conversational tack (without some corrective or negative word) is itself

intrinsically positive - in answering a statement that is nonsense, no reply is

possible that is honest, kind, sane, and non-humorous except one that somehow

recognizes the nonsensical nature of the stimulus. It will be amusing (!) to

watch instantiations of this program be driven insane by ill-conceived or

incompetently executed habits of reinforcement. Or maybe we'll just do it for fun.

 

A biological organism's infantile behavior is heavily influenced by rewards

and punishments that are central to aspects of its existence. Unfortunately,

eating and pain are things that computers just don't do. These sorts of

motivators provide the background raison d'etre for much of  the logic that

underlies the behavior of biological organisms. How shall our artificial

behavior-generation deal with the absence of these intrinsic motivations? It

seems completely "cheap" or "like cheating" simply to write computer

instructions that say "try to find food" and then assert that one has

created a computer program that experiences hunger.

 

A start towards finding a useful understanding of reinforcement can be the

ideas from 19th Century philosophers involving input limitations for human

minds. The central point is "all information - all input - that a human mind

receives is mediated by the senses". It is pretty well known how sensory

receptors work, and pretty well known how that input is encoded by the

sensors themselves. Although it might be technically beyond us at the

moment, one can easily imagine intervening in the process, and sending to a

brain artificially created sensory information.

 

It would therefore be possible completely to modularize the function of the

system's elements: there's a sensor, there's the message it sends, and

there's "central processing". If all input is sensory, then it is reasonable

to assert all events taken to be reinforcers reach central processing as

sensory input. This is half of what we need: if it's correct, then the input

path to reinforcement can be automated, and there's no philosophical

"problem" about artificiality or motivation.

 

The other half requires that the modularity postulated for sensory pathways

exist as well within what I refer to as  "central processing". This very

well may be an oversimplification, but it's a useful way out of the problem

of defining motivation and reinforcement described above. If the pleasure

resulting from the satisfaction of  hunger is isolated neurologically, then

we can just as well imagine intervening in that set of messages as we can

between sensors and processors. If this is true, then a baby couldn't tell

the difference between its own perception of pleasure and the perception of

artificially induced pleasure. If this is true then we need not concern

ourselves with the artificiality of motivation in computer programs. We are

"free" to define motivators and to set up definitions of pleasure, and

tropisms towards pleasure, without the danger of creating structures (in 

imitation of biological ones) only by "cheating". At some point in its life,

a baby "realizes" that  Mommy "wants" it to say things. Therefore it tries

to talk. We can just assert to our program "responding to input is a

positive thing" and add this fact to its database of behavior probabilities,

without worrying that this is any different from smiling at a child that

says "dada".


      Structures for the Implementation of Reinforcement


Neither single characteristics nor simple actions produce rich enough results

for their reinforcement to be particularly useful. The simple existence of a

light-sensitive patch on a flatworm would have no value to the individual

and could not therefore participate in natural selection. Such a structure

would have to be connected somehow to the worm's behavior, and this

requires much more information than that required to construct the patch.

It is necessary for combinations of apparently independent events to be

reinforced as a group.

On the other hand complete organisms are much, much too complex for

reinforcement to work. When the complete organism receives positive

reinforcement, or when it survives and reproduces, there is no way for

any process to separate out which of that individual organism's properties

were responsible, and how much of which properties were active.

Mother nature solved this problem in early evolution by selecting not for

expressed characteristics or behaviors but for the genes themselves. The

genes represent just that level of complexity most practical for reinforcement:

complex enough to be connected dependably with specific events, but local

enough to be dependably held responsible.

In this algorithm, the analog for the light-sensitive patch would be a single

object, or a single process.  The program has  hundreds of such simple

information-containing or behavioral options, and even larger numbers of

parameters to vary in the process of carrying out  procedures.  Small  instruction

vectors exist that direct the details of processes the program needs to carry out.

These vectors - containing roughly 16 bits -  are sufficient to direct one action.

They are structured exactly like words and paths, so that they may be used to

construct sentences, and can be constructed from them, and so they can be

contained as parts of words or the various types of summed objects that

occur.  They are also cataloged and stored like words.


It is these objects that build up a useful repertoire of reinforced behavior.

The program keeps a record of which vectors were used in the construction

of any object or path, in conjunction with details of the current situation. When

sufficiently similar situations occur again, the levels of reinforcement of the

available vectors will affect their selection. (Of course, the reinforcement of

one vector will cause reinforcement of other vectors to the extent they are

"similar" - in other words, depending on their "distance" apart in MS.)

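The spreading of reinforcement among similar instruction vectors might look something like the following sketch. It rests on assumptions that are mine alone: vectors are treated as 16-bit integers, "distance" is taken to be the Hamming distance between them (the real distance would be measured in MS), and the names hamming and reinforce are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two instruction vectors differ."""
    return bin(a ^ b).count("1")


def reinforce(scores: dict, target: int, reward: float, bits: int = 16) -> None:
    """Add reward to the target vector and, scaled by similarity, to the others.

    Similarity is 1.0 for the target itself and falls linearly to 0.0 for a
    vector that differs from it in every bit (an assumed stand-in for MS distance).
    """
    for vec in scores:
        similarity = 1.0 - hamming(vec, target) / bits
        scores[vec] += reward * similarity


scores = {0x0000: 0.0, 0x0001: 0.0, 0xFFFF: 0.0}
reinforce(scores, target=0x0000, reward=1.0)
```

A negative reward passed to the same routine would inhibit the target and its near neighbors in the same graded way.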
 

     Structures for Handling Cognitive Dissonance and "Being Wrong"

 

A particularly thorny area is what to do when the program says something

incorrect, and then receives correct information from Teacher. Because of

their inexperience, it is natural that children are particularly prone both

to making inaccurate statements and to holding incorrect beliefs. In

response to this, teachers are equally prone to making statements which are

intended to contradict or to correct wrong ideas.

 

 

A central goal of education, growing-up, and training computer programs like

this one, is that errors not persist or recur. Three things must happen if

this persistence is to be prevented. First, an appropriate grammatical reply

has to be formulated to keep the conversation going. Second, the program's

internal response must recognize the nature of the contradictory input or

the mis-match between some aspect of the input and the program's data.

Finally, the database must be adjusted.

 

 

The first of these is expected to be taken care of by the program's usual

reply-formation routine, and should prevent one type of error persistence.

Consider the following exchange:


sentence #1:   program:  "Apples make great pets!"

sentence #2:   Teacher:  "No, apples aren't pets!"

sentence #3:   program:  "Apples not in [pets]?"

 

Sentence #3 is easy to generate because word definitions include

operator-pointers to classes to which the words belong, and to other words

that belong in classes defined by the words themselves. Having these two

replies to sentence #1 in its historical record, forward-chaining would

prevent the program from repeating the incorrect statement of sentence #1,

even though the reply-formation does nothing to correct the bad database

information or association which allowed the incorrect sentence in the first

place.

 

The program's pre-processing  mechanism will have options for recognizing the

nature of the error. The first possible route would be to find a two-stage

connection. "Pets" has in its definition the word "animal", and "animal" has

in its definition "motile", while "apples" has a different, contradictory

value in the axis related to movement. This sort of multi-stage

cognitive-dissonance discovery, so easy and clear to us, would require a

procedurally useful understanding of the verb "to be" and a laborious

comparison of a large number of sets of axes. Much simpler and quicker is

the simple recognition that "pets" is a class word, and that it has no

bomb to "apple".  The program then experiences no actual cognitive

dissonance, only the comparatively innocuous lack of an expected pointer. A

simple question generator then attempts to confirm that this lack of a

pointer is in fact meaningful - there is no pointer there because "apples"

is not  in the class [pets]. The confirmation of this fact would place a

negative bomb from [pets] to "apples", that would inhibit the restatement of

the original erroneous sentence.
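
The simpler route can be sketched in a few lines. The data layout is an assumption made only for illustration: each class word maps member words to +1 (a bomb) or -1 (a negative bomb), and a missing entry stands for the lack of an expected pointer. All function names are hypothetical.

```python
# Hypothetical store: class word -> {member word: +1 (bomb) or -1 (negative bomb)}
class_pointers = {"pets": {"dog": 1, "cat": 1}}


def check_membership(cls: str, word: str):
    """Return +1, -1, or None when no pointer exists at all."""
    return class_pointers.get(cls, {}).get(word)


def question_for(cls: str, word: str) -> str:
    """Form the confirming question for a missing pointer."""
    return f"{word.capitalize()} not in [{cls}]?"


def confirm_not_member(cls: str, word: str) -> None:
    """Teacher confirmed the lack of a pointer is meaningful: place a negative bomb."""
    class_pointers.setdefault(cls, {})[word] = -1


def may_assert(cls: str, word: str) -> bool:
    """A negative bomb inhibits restating that the word belongs to the class."""
    return check_membership(cls, word) != -1


if check_membership("pets", "apples") is None:
    print(question_for("pets", "apples"))   # the program's sentence #3
    confirm_not_member("pets", "apples")    # after Teacher confirms
```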

 

 

The addition of such a negative-reinforcer in the database doesn't address

the need to remove whatever resource allowed the initial erroneous

assertion. If the stored history of procedures hasn't recorded that the

incorrect assertion was the result of a random choice (such as the flailing

which is part of "starting from scratch"), then it will have stored the

vibratory modes or transform involved in generating the assertion. These can

then be inhibited through the usual reinforcement regime.

 

Finally there remains the possibility that the database of definitions

contains an error. As programming begins, I have assessed the need for

database protection to be greater than the need for automated database

adjustment, and thus I have allowed writing to the definition database

itself to take place only upon receiving explicit answers to standardized

questions such as "Should I change this value in the database now?"

 

 

     Structures for Functioning without Meaning

 

Many human users of language have limited abilities to communicate precisely

about "meaning". Additionally I suspect few three-year-olds could tell you

in paraphrase what it means to "want" something. Therefore it is natural to

conclude that communicable, internalized "meaning", consisting of logical

definitions, is only one of a number of means toward the end of imitating

reasonable conversation. I opine that this mode is "late" - both in the

development of an individual's conversational capabilities, and also in the

biological evolution of our species' linguistics.

 

This conclusion forces another: an effective imitator of human conversation

(if not of pure human reasoning) is unlikely to function if it relies

exclusively on that "late" development. Fortunately, there are aspects of

"earlier" behavior generators which are substantially simpler to model than

meaning. These are not of any great interest, but they are necessary parts

of a program that is expected to generate behavior. For example, imitation

involves a good deal more than just aping, parroting or echolalia. It's a

fair bit of work even to write a program that remembers enough of training

conversations to imitate them. See CONVERSOR and CHANTER below.

 

     Keeping Track and Post hoc, Propter hoc

 

A history must be kept, of what has been said, by whom, and when. Proximity

of words in Teacher's input is an important first method for establishing

associations. Surely one of the reasons we have such trouble with "post hoc,

propter hoc" is that it is one of a baby's first ways of forming

associations, and that as such, this invalid reasoning method gets applied

all sorts of times when it shouldn't.
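
Proximity-based association can be illustrated with a simple co-occurrence count. This is a sketch under my own assumptions: words are plain strings, "proximity" means falling within a fixed window of following words, and the name proximity_counts is hypothetical.

```python
from collections import Counter


def proximity_counts(words: list, window: int = 2) -> Counter:
    """Count unordered pairs of distinct words that occur within `window`
    positions of each other in Teacher's input."""
    counts = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                counts[frozenset((word, other))] += 1
    return counts


counts = proximity_counts("spot ran home".split(), window=1)
```

With a window of one, "spot" is associated with "ran" and "ran" with "home", but "spot" and "home" are not yet linked; widening the window links them as well, which is exactly where the post hoc, propter hoc danger enters.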

 

 

------------------------------------------------------------------------------------------
 

The History of Earlier Experiments

 

The series of programs leading up to the current project provided some

contact with ideas essential to the algorithm as it is currently conceived.

Specifically:

 

- distributed content-addressable memory (Purr-Puss)

- pattern recognition (character recognition)

- large multi-dimensional feature spaces (Organism)

- production systems & cellular automata (insect simulations)

- machine learning (automatic extraction of rules from an environment - Economist)

- creative behavior generation, originality & expert systems (Chanter)

- simplification of grammars for computer representation (Conversor)

 

     OTHELLO - 1982

 

     "Othello", also called "Reversi", is a simple game played on an 8x8 board,

onto which counters are placed alternately by two players. All that matters

for this project is that the edges of the board are 1) of special importance

to play and 2) are all equivalent (such that a strategy useful on one edge

is equally useful on the others.)  Board spaces may be empty or belong to

one of two players - these are the only three possibilities. Therefore, the

state of each space may be represented by a ternary digit (a piece of data

consisting of 0, 1, or 2), and all the possible states of an edge can therefore

be represented by a number from 0 to 3^8 - 1. This is not a particularly large

number (3^8 = 6561), and a data array can be declared with 3^8 places for numbers from 1

to 8 (representing the 8 places on the edge). First the system is set up

so that the computer watches people play. Every time a player makes a move

on a non-empty edge, we store their move, that is, the number representing

the space they moved into;  that move is stored at a location in the data

array determined by the state of that edge before the move was made. In this

way a database of moves onto edges is created, that is indexed by numbers

calculated from the state of the board before the move was made.

 

Look at         What you see tells         What the expert player
the board       you where to store         actually does tells you
                                           what to store

 

Later, after lots of moves are stored, when you look at the board ("looking"

tells you where something relevant might be stored) you can look in your

data array, and if you find anything in the array (at the memory location

which the board tells you to look in), then  you know that the datum stored

is a move that the expert player thought was best, given that board

configuration.
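
The store-and-look-up scheme can be reconstructed roughly as below. This is a modern sketch, not the original 8-bit code: the names edge_index, learn and recall are hypothetical, and the use of 0 to mark an empty array slot is my assumption. The edge is read as an eight-digit base-3 number, so indices run from 0 to 3**8 - 1.

```python
EMPTY, PLAYER_A, PLAYER_B = 0, 1, 2
EDGE_LEN = 8
NO_MOVE = 0  # assumed marker: slots hold 1..8 for a stored move, 0 if nothing stored

# One slot for every possible edge state (3**8 = 6561 of them).
moves = [NO_MOVE] * (3 ** EDGE_LEN)


def edge_index(edge: list) -> int:
    """Read the eight edge spaces as a base-3 number."""
    index = 0
    for cell in edge:
        index = index * 3 + cell
    return index


def learn(edge_before: list, move: int) -> None:
    """Watching play: what you see says where to store, the move says what."""
    moves[edge_index(edge_before)] = move


def recall(edge: list):
    """Playing: the edge itself gives the address; no searching is needed."""
    return moves[edge_index(edge)] or None


edge = [EMPTY, PLAYER_A, PLAYER_B, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY]
learn(edge, 4)           # the expert moved into space 4 from this edge state
suggestion = recall(edge)
```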

 

Various excellent computer programs exist now against which humans may play,

but if one is limited to 8-bit processor technology and 32 kilobytes of

memory,   the capabilities of possible programs are severely limited. Using

this stored database of edge moves improved the operation of one such

program substantially.

 

Two things are important to note, for the purposes of the history this paper

recounts: 1) little understanding of the game was required, nor was any

programming cleverness needed - this method of improving the play of an

extant program relies entirely on un-intelligent brute force;  2) no

searching is required when looking up a move to use - the current state of

the board directly provides the location of the stored move. "Search" is a

major subject in basic computer programming, and is an area in which the 

combinatorial explosion of task-duration can become the determining factor in

deciding whether or not a program can ever run to completion.

 

 

 

     ROBOT - 1983

 

A small robot vehicle was built, using as a platform a steerable motorized

model tank. An umbilical connected the model to an 8-bit microcomputer. Two

sensors were used. First, a bumper mounted on the front sensed the presence

of an obstacle (and stopped the motors) when the model ran into it, and second,

a stepping-motor-mounted, sonar range-finding device from a Polaroid

development kit enabled the robot to scan the surrounding area, returning a

low resolution image of the shape of the environment. These two sensors

returned information about the area around the tank to the computer. Using

the same brute-force learning idea described for OTHELLO, this robot learned

to recognize all the locations to which it had access, and learned to get

from any one of those locations to any other. Once again, no analysis (in

this case, either of the sensors' behavior or of the environment itself) was

necessary to allow for the functions of the robot to proceed. This is

important to remember: even though the behavior of the sonar sensor was

extremely bizarre and seemed unpredictable to us, the learning algorithm

didn't care - as long as the sensor's behavior was reasonably similar each

time it looked at the same location from the same angle, its output was

perfectly usable.

 

     CHARACTER RECOGNITION - 1984

 

Using a low resolution optical sensor mounted on an x-y scanner's transport,

information about small areas on a page was read into the same 8-bit

computer's memory. Using rote methods similar to those of ROBOT and OTHELLO,

and a transport-controlling algorithm similar to ROBOT's image-building

method, the program learned to recognize typewritten characters. This

scanner was initially intended to automate the provision of input

data for ECONOMIST, another program described below.

 

 

 

     PURR-PUSS

 

The idea of content-addressability used in all these programs was inspired

by a program called PURR-PUSS, invented by Andreae and described in the book

"Thinking with the Teachable Machine". This acronym stands for "programmable

un-primed rewardable robot with a predictor using slides and strings".

 

The locations in computer memory at which OTHELLO's moves were stored were

calculated from the current status of the board. Therefore, moves could be

found without searching a database of  board configurations. Rather, the

board configuration itself told the algorithm where to look for a  move. The

environmental images in ROBOT were encoded in such a way that allowed the

names of environmental locations to be stored at memory locations directly

calculable from the observed physical space - no searching through a list of

location-descriptions was required. And finally in recognizing characters,

the images themselves directed the computer to memory locations where

commands for moving the scanning sensor were stored, or where it would find

the names of characters recognized.

 

 

In each case, the algorithms were "un-primed": that is, no analysis of the

relevant concepts was required to create an algorithm capable of functioning

in the environment.

 

 

Another interesting feature of PURR-PUSS is the use of a technique of memory

management in which information is stored in a redundant fashion, spread out

through available memory. Although the mathematical underpinnings of the

three systems are unrelated, PURR-PUSS, holographic storage, and the

cerebral cortex all share these properties. The important thing about the

PUSS memory-management algorithm is that it provides content-addressing,

like that described above for the three programs, in a way that allows

unlimited input strings (i.e., it is as if the board positions in OTHELLO

could be as large and complex as you like) and can squeeze very  large

amounts of data (amounts which don't have to be specified in advance) into

large, mostly empty computer memories. The original PURR-PUSS algorithm

can also be modified  1) to return ordered series of objects that are of unspecified (and

essentially unlimited) length,  2) so that stored memories can be erased, 3) so that the

output of some PUSSes may be used as input for others, and 4) so that

conditions that resulted in storage of information may themselves be recalled.

 

     ECONOMIST - 1985

 

Each of the applications described can be viewed as performing the following

task:

     1) look around

     2) calculate a memory location from the image received

     3) examine that memory location - if it is empty, ask Teacher for input. If it is

        full, return the information to the algorithm

     4) execute whatever commands are either found in memory or received from

        Teacher; this changes the environment, making it necessary to

        return to step one

 

 

 

  1                 2                       3                           4

OTHELLO:

  Examine the       Calculate a number      Find a learned move         Execute the
  board             from 0 to 3^8           or ask Teacher              move
                                            for a good one

ROBOT:

  Scan the area     Calculate an address    If the spot is known,       Execute the move
                                            return its name
                                            Or, command the robot
                                            to move and re-scan

CHARACTER RECOGNITION:

  Read out from     Calculate an address    If the image is a known     Execute the move
  the imager                                object, return its name,    or move on to
                                            or command the scanner      a new region
                                            to move the sensor          to examine
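The four steps shared by these programs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names encode, run_cycle and teacher are hypothetical, a dictionary stands in for the directly addressed memory of the original programs, and the eight three-state cells are simply one reading of "a number from 0 to 3^8".

```python
# Illustrative sketch only: encode, run_cycle and the teacher callback are
# hypothetical names, and a dict stands in for the directly addressed memory.

def encode(image, base=3):
    """Step 2: read the observed cells as digits of one number (the address)."""
    address = 0
    for cell in image:
        address = address * base + cell
    return address

def run_cycle(image, memory, teacher):
    """Steps 2 and 3: compute the address, then recall an action or ask Teacher."""
    address = encode(image)
    if address not in memory:             # empty location: ask Teacher, remember the answer
        memory[address] = teacher(image)
    return memory[address]                # step 4, executing the action, is left to the caller

memory = {}
cells = (0, 1, 2, 0, 0, 1, 2, 2)          # assumed: eight cells, 0 = empty, 1 = black, 2 = white
first = run_cycle(cells, memory, teacher=lambda image: "move-A")
second = run_cycle(cells, memory, teacher=lambda image: "move-B")  # Teacher is not consulted
print(encode(cells), first, second)
```

The second call finds the location already full, so the stored move is returned without any search and without consulting Teacher.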

 

 

The three programs described so far each looked at single events (either board

configurations, images of the robot's environment, or images from the

scanner) and stored information about those isolated events. Another task

structure to which PURR-PUSS is suited involves series of numbers. Take for

example the following series of numbers, as if printed on a paper tape:

 

     1 2 3 1 2 3 1 2 3

 

Nothing is more primeval for a computer than the examination of successive

spots on a tape! Suppose we set up our system so that it can "see" two

numbers at a time, and that it can move along the tape. It is useful to

imagine a transparent "window" moving along, which is able to see only a

fixed amount of the tape. Now, use what you see in the window to define a

memory location. After you look at your current window, and access the

memory location associated with it, you slide the window along one more

place, revealing the next number. This number is what you store at the

memory location you defined by the window as it was before you slid the

window along. Functionally the two numbers in the window are used to predict

single numbers that follow, and the ability to make predictions arises from a

brute-force examination of a training set.

 

In the case defined here, only three memory locations would be used. The

simplest formulation would be to set up 31 memory locations, and then

interpret whatever you see in the window as a decimal number: 12, 23, or 31.

As you slide the window along, every time you see "12" you would store a "3"

in the 12th memory location (or increment a counter held there), every time

you see "23" you would store a "1" in the 23rd, and every time you see "31"

you would store a "2" in the 31st. After going along three places, you would be

able to predict what the next number was going to be by looking at the

current window, then looking in that memory location. This is a PUSS: a

predictor using slides and strings.
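The tape example can be sketched as follows. This is a minimal illustration, not the original program: the function names are hypothetical, a dictionary replaces the 31 numbered locations, and the most recent value simply overwrites the old one (the counter variant mentioned above is not shown).

```python
# Hypothetical sketch of the two-number window sliding along the tape.

def address(window):
    """Read the digits in the window as one decimal number: (1, 2) becomes 12."""
    return int("".join(str(digit) for digit in window))

def train(tape, width=2):
    """Slide the window along; store the next number at the window's address."""
    memory = {}
    for i in range(len(tape) - width):
        memory[address(tape[i:i + width])] = tape[i + width]
    return memory

def predict(memory, window):
    """Return the stored successor, or None if this window was never seen."""
    return memory.get(address(window))

tape = [1, 2, 3, 1, 2, 3, 1, 2, 3]
memory = train(tape)
print(memory, predict(memory, [2, 3]))
```

Only the locations 12, 23 and 31 are ever filled, which matches the three locations described in the text.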

 

This view of the task, as making predictions about the next appropriate

number in a series already established, suggested to us that the method be

applied to economic time series. (At this time the project involved three

people: myself, an electrical engineer, and a recent emigre from Russia, a

physicist, who was "hot" for the stock market.) For example, take monthly

interest rates for the last 10 years. Set the algorithm to look at index n

(representing, say, April, 1975). Use data points (n-1, n-2,

n-3,....) as environmental data to calculate an address in memory. Store the

actual value that occurs in the known sequence (data point n) in that memory

location. Increment the counter and continue. In this way, a database of

predictions is built up, and if, this month, the values of the last few

months point at a full memory location (meaning you have seen this month's

economic environment before) then you have a prediction to make about next

month's value for that time series. This is roughly  equivalent to an

autocorrelation calculation on a function.

 

This method lends itself easily to the combination of data streams. Multiple

time series can be used to provide environmental information (for memory

calculations) allowing storage of any other time series. Thus  the algorithm

brings about a correlation of numerous data streams for the prediction of

one. This is similar to calculating coefficients of correlation among

multiple data streams, but can be more flexible. To describe it in words:

one could use a series of the most recent interest rate figures to predict

the next interest rate, or you could use:  1) last month's interest rate,

plus 2) the same month's price of gold, and 3) the last six weeks' closing

Dow, to predict the average temperature in Chicago. Any combination of time

series data can be used to attempt to predict any other time series.
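Combining streams amounts to building one address out of several windows. The sketch below is an assumption-laden illustration: the function names, the spec format and the sample numbers are all invented, and the bucketing step (rounding each reading down to a coarse level so that similar values share an address) is my own simplification rather than a documented part of ECONOMIST.

```python
# Hypothetical sketch: one address built from windows on several time series.

def bucket(value, step):
    """Quantize a reading so that nearby values share an address (assumed step)."""
    return int(value // step)

def window_key(streams, n, spec):
    """Address for index n. spec lists (stream name, number of past values, bucket size)."""
    key = []
    for name, lags, step in spec:
        for k in range(1, lags + 1):
            key.append(bucket(streams[name][n - k], step))
    return tuple(key)

def train(streams, target, spec):
    """Store the target's actual value at the address defined by the other streams."""
    depth = max(lags for _, lags, _ in spec)
    memory = {}
    for n in range(depth, len(streams[target])):
        memory[window_key(streams, n, spec)] = streams[target][n]
    return memory

# Made-up numbers, for illustration only.
streams = {
    "rate": [500, 520, 510, 530, 520],   # interest rate in basis points
    "gold": [300, 310, 305, 320, 315],
    "temp": [10, 12, 11, 14, 13],
}
spec = [("rate", 1, 50), ("gold", 2, 10)]
memory = train(streams, "temp", spec)
print(memory)
```

Changing spec is all that is needed to try a different combination of series as the environment for the same target.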

 

What is most advantageous is that the method can be tested to whatever

extent available data allows. A window can be defined, and then historical

data is allowed to percolate through the system - if any predictions are

made along the way, they can be compared with values which actually

occurred, thus providing an exact measure of the window's relationship to

the series one was trying to predict.
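Letting historical data percolate through the system can be sketched as a single pass in which each prediction is made before the true value is stored. The function name backtest and the tuple-keyed dictionary are assumptions made for this illustration.

```python
# Hypothetical sketch: score a window by replaying history once.

def backtest(series, width):
    """Return (predictions made, predictions that matched what actually occurred)."""
    memory = {}
    made = correct = 0
    for n in range(width, len(series)):
        key = tuple(series[n - width:n])
        if key in memory:                  # this environment has been seen before
            made += 1
            correct += memory[key] == series[n]
        memory[key] = series[n]            # only then store what actually happened
    return made, correct

print(backtest([1, 2, 3, 1, 2, 3, 1, 2, 3], width=2))
print(backtest([1, 2, 3, 1, 2, 4], width=2))
```

The ratio of the two counts is the measure of how well a given window relates to the series being predicted.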

 

The output of a single PUSS module itself represents a fundamentally

different type of data stream. One can therefore combine information from

the environment with predictions made by some module of the program, to

create a multi-level structure that produces much more complex

inter-correlative behavior. Such multi-level structures may not correspond

obviously to any standard technique of statistical analysis. Suppose, for example,

that there  is a target time-series one wishes to predict, and that one has access

to  three other time series.  One could simply define a window that  includes

all four time series, but this  creates large feature spaces and requires

an equally large amount of training. 


Imagine, however, that a PUSS is used to make predictions based on two of

the available series. The predictions that it makes effectively combine the

information contained in the two windowed time series. This new series -

the predictions - can be used in a window for the prediction of the target series,

and the window will be compressed, allowing for a smaller amount of training

to succeed at making predictions. This is, of course, one of those information-

theory situations in which information is lost in the transformation of data

brought about by the 2-series PUSS, but the availability of a technique that

reduces the often vast sizes of the relevant feature spaces can come in handy when

the target system is sufficiently deterministic. As we discovered, however,

the relevant economic series are not. Chalk up another victory for the

efficient-market hypothesis.
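A two-level arrangement of this kind might look like the sketch below. It is an assumption-based illustration: puss_stream is a hypothetical name, the four series are made-up binary data, and a window one place wide is used at both levels to keep the example small.

```python
# Hypothetical sketch: the predictions of one PUSS become an input series for another.

def puss_stream(inputs, target, width):
    """Run one PUSS over history and return its predictions as a new series.
    Entry n is the prediction for target[n] made before target[n] was stored,
    or None where the window had not been seen yet."""
    memory = {}
    predictions = [None] * width
    for n in range(width, len(target)):
        key = tuple(tuple(series[n - width:n]) for series in inputs)
        predictions.append(memory.get(key))
        memory[key] = target[n]
    return predictions

# Made-up data, chosen only so the behaviour is easy to follow.
a      = [0, 1, 0, 1, 0, 1, 0, 1]
b      = [0, 0, 1, 1, 0, 0, 1, 1]
c      = [1, 1, 1, 1, 1, 1, 1, 1]
target = [0, 1, 1, 0, 0, 1, 1, 0]

level_one = puss_stream([a, b], target, width=1)           # combines series a and b
level_two = puss_stream([level_one, c], target, width=1)   # window holds 2 values, not 3
print(level_one)
```

The second-level window holds one value from the prediction stream and one from series c, where a single flat window over a, b and c would hold three.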


 
 

     SMON - 1999


Before proceeding to discuss applications which involve some cybernetic

complexity, one rather mundane application of this memory-management

philosophy must be included. The act of creating large programs in the

PASCAL programming language requires a great deal of extremely repetitive

and, once it has been begun, predictable behavior on the part of the

programmer. This is exactly the sort of behavior which all these simulations

address, and it seemed silly not to use the method to lighten the load of

its own creation.

 

 

The goal was to write a program which resides logically between the user of

a computer and whatever software is being run. The acronym SMON means "smart

monitor" with this placement in mind. What is needed is a keystroke monitor

which remembers what has been done before, and can offer the user assistance

at a speed faster than the software user's own typing. The program

calculates memory addresses from short sequences of keystrokes as the user

types. Whenever the memory location so defined is not empty, you know that the

sequence just typed has been typed before. Because of the details of the

PUSS storage algorithm, finding one prediction can lead directly to an

ordered series of stored data which can be offered to the user as a

prediction of what was about to be done. Another detail of the PUSS

procedures allows multiple variants to be stored and offered to the user

selectively, based on the frequency of their previous appearance. In this

way commonly typed sequences in programming can signal the smart monitor to

offer options automatically. These options would be text to be inserted,

automatically - that is, without typing every character.
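The frequency-ranked offering of variants can be sketched as follows. This is a rough illustration, not SMON itself: the class and method names are hypothetical, Python's Counter stands in for the PUSS storage details, and the window of two keystrokes is an arbitrary choice.

```python
# Hypothetical sketch of a keystroke monitor that offers stored continuations.
from collections import Counter, defaultdict

class SmartMonitor:
    def __init__(self, width=2):
        self.width = width
        self.memory = defaultdict(Counter)    # recent keystrokes -> counts of what followed

    def learn(self, typed, continuation):
        """Remember that `continuation` followed the last few keystrokes of `typed`."""
        self.memory[typed[-self.width:]][continuation] += 1

    def offer(self, typed, limit=2):
        """Return up to `limit` stored continuations, most frequently seen first."""
        seen = self.memory.get(typed[-self.width:])
        return [text for text, _ in seen.most_common(limit)] if seen else []

monitor = SmartMonitor()
monitor.learn("var count: ", "integer;")
monitor.learn("var total: ", "integer;")
monitor.learn("var ratio: ", "real;")
print(monitor.offer("var n: "))
```

An empty list from offer corresponds to an empty memory location: the sequence just typed has not been seen before, so nothing is offered.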

 

A typical situation involves the decision to insert,  between routines

already written, code for a function or a procedure. Such an action

frequently occurs after the programmer has written a 'calling' line in a

routine higher up in the stack. For example, suppose the programmer writes a

line such as

 

     x:=FunctionName( y );

 

and then moves in the code-text to the place where functions

are kept, and inserts a couple of empty lines in preparation for encoding the

function. The smart monitor program would then offer the following two

options to the user:

 

Function FunctionName( temp:integer ) : integer;

var I:integer;

begin

FunctionName:=

end;

 

Procedure   ( temp: integer );

var I:integer;

begin

end;

 

The options can be selected using single keystrokes - a substantial saving

in typing.

 

As it turned out, computer technology was not quite up to the requirements

of this program at the time of its creation. If the same processor

(an Intel 486) was used to run both the smart monitor and the target

software, an unacceptable overload occurred. The monitor program had to be

tested, and its final form contrived, using a second computer set up to act

as the keyboard for the computer being used to run the PASCAL editor. Since

the operation of a PC keyboard is hardly trivial, an extra hardware

component had to be inserted between the two computers which converted ASCII

from the monitor-computer's output port into a keyboard-emulating series of

bytes. When this device, called a VETRA, was used, the two computers acted

in perfect concert to assist in writing programs.

 

     CHANTER - 1986

 

The combination of multiple data streams in ECONOMIST suggests the behavior

of grammars. The first grammar examined was that of Gregorian Chant

melodies. These melodies provide two data streams as a starting point: a

series of pitches, and a series of notational symbols that are generally

taken to have rhythmic implications. A program was devised for composing new

melodies in the style of the training set (which was limited to Alleluiae in

the Phrygian mode). Essentially, the program would scan examples, storing

information about the rhythmic series and about the pitch series

independently. Predictive modules were soon able to begin producing output,

and this output provided a second level of pre-processed, or pre-correlated

data. After scanning "environments" -  namely, series of pitch-rhythm

combinations -  and storing, at each opportunity, whatever the real melody

"did" in that "environment" - it was possible to set the program running

from scratch, allowing its own predictions to become a new data stream,

which then became the "environment" that the program used as a basis for

more predictions. This process was allowed to run until it predicted the

combination of  symbols known to represent the end of a melody. The

resultant series can be executed by human performers, and if the same

performers then execute examples from the training set ("real" Gregorian

melodies), one is in a position to apply part of the Turing test for the

existence of artificial intelligence. Namely, if the source of the performed

examples cannot be identified by an observer - that is, if the

computer-generated examples cannot be distinguished from the examples from

the training set - then one of the conditions for passage of Turing's "test"

has been satisfied. Performance of the six melodies in the appendix

established that CHANTER had succeeded in this regard.

 

No analysis of Gregorian melody was undertaken to allow the program to learn

to imitate the style. Likewise, little planning of the "compositional

process" was required. The technique is almost as much a brute-force,

rote-learning method, as was the learning of moves in OTHELLO. It begins to

become important with CHANTER that no human understanding of the systems

involved is required. This is important, because, starting with CHANTER, the

systems involved are becoming so complex as to be beyond current human

capability to analyze unambiguously.

 

 

Rule-based musical composition programs are fun to play with, but they

almost always generate painfully dull, nearly featureless output, which is

extremely easy to distinguish from examples in the training set. CHANTER had

to be equipped with one special step before it began to create examples that

were successful.

 

 

Although I am a very poor composer, I am an experienced and facile keyboard

improvisor of useful musical material (having played for hundreds of hours

of dance classes). The mental procedure I use is absolutely clear and

conscious: listen to what was just played, and, using years of experience

with music, "calculate" what "should" come next. Play whatever should come

next, and then repeat these steps for as long as necessary. "Calculating

what should come next" is another way of saying "imagine what might sound

right". In the process of improvising, especially at a fast tempo, one finds

that one is almost constantly trying to "recover" from some unintended,

random accident of the hands.

 

A variant on this idea of "recovery" allowed CHANTER to succeed. If all the

program ever does is "follow the rules" that it has learned by examining the

training set, then it is missing one element of the compositional process: a

composer decides at every juncture whether or not to follow established

procedure, or to veer off into original, inspired territory. Having executed

an inspired move, however, great composers usually return to predictable

rule-based behavior, as if to "recover" from their own inspiration. CHANTER

models this procedure by periodically relaxing the control of its behavior

by the rules in the database, and allowing a rare or a relatively random

event to occur. This is allowed to happen rarely enough that the program has

time to recover from the unusual move in ways which match the most standard

practices it has learned. Further, the "inspired" moves are never allowed

either at the very beginning or near the end of a melody. The addition of

this procedure was the programming idea crucial for the creation of examples

that observers couldn't distinguish from the training set.
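The "recovery" idea can be sketched with a single stream of note names. This is a simplified illustration under several assumptions: the names are hypothetical, only pitches are modelled (no rhythmic stream), the relaxation happens at a fixed interval rather than by whatever schedule CHANTER used, and the ban on inspired moves near the end of a melody is approximated only by never letting an inspired move be the ending itself.

```python
# Hypothetical sketch: follow the learned rules, but occasionally allow a rarer move.
import random
from collections import Counter, defaultdict

END = "END"

def learn(melodies, width=2):
    """Store, for every window of notes, a count of what the real melodies did next."""
    memory = defaultdict(Counter)
    for melody in melodies:
        notes = list(melody) + [END]
        for i in range(width, len(notes)):
            memory[tuple(notes[i - width:i])][notes[i]] += 1
    return memory

def compose(memory, opening, rng, surprise_every=8, max_len=40):
    """Extend `opening` (as long as the training window) until END is predicted."""
    notes = list(opening)
    width = len(opening)
    since_surprise = 0                      # starts at 0, so no surprise at the very beginning
    while len(notes) < max_len:
        options = memory.get(tuple(notes[-width:]))
        if not options:
            break
        ranked = [note for note, _ in options.most_common()]
        rarer = [note for note in ranked[1:] if note != END]
        since_surprise += 1
        if since_surprise >= surprise_every and rarer:
            choice = rng.choice(rarer)      # the "inspired" move
            since_surprise = 0              # then several rule-bound moves to recover
        else:
            choice = ranked[0]              # the most standard practice
        if choice == END:
            break
        notes.append(choice)
    return notes

memory = learn(["DEFGFED", "DEFEDCD"])
melody = compose(memory, "DE", random.Random(0), surprise_every=3)
print("".join(melody))
```

With a single training melody there are never any rarer continuations, so the sketch simply reproduces that melody; variety appears only once the training set offers alternatives.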

 

     CONVERSOR - 1987

 

The next grammar to be examined was that of natural conversational language.

Two data streams are extracted from training set conversations, data sets

which are very similar to the two streams used in CHANTER. One stream consists

of numbers representing words. The second stream consists of the

parts-of-speech of those words. At its core, CONVERSOR first constructs a

series of parts-of-speech for a reply, and then assigns specific words to

those parts of speech. There is once again considerable interaction between

the two data streams, mediated by internal predictive modules at secondary

and tertiary levels. Once again, after examining thousands of words of

training, the program can be set running, creating its own independent

replies to comments made by a human.   There is, within the system, a

rigorous prevention of plagiarism, so that no reply of more than a word or

two ever made by CONVERSOR was identical to one from the training set.

Rather, the program reconstructs its own sentences from scratch. Thus

CONVERSOR is completely different from programs that  operate by remembering

previous conversations.

 

Of course, natural language is worlds beyond Gregorian melody in complexity,

or else our sensitivity to the definition of "correct" with respect to

reasoning and grammatical construction is much greater than our sensitivity

to the definition of "correct" with respect to a melody. In any case, the

results of CONVERSOR are considerably less successful, with respect to

Turing's test, than are those of CHANTER. The "conversations" are only one

step more advanced than programs like "Eliza"; examples appear  in the appendices.

CHANTER melodies may be viewed by going to http://www2.potsdam.edu/lanzcc

and downloading the chanter.pdf file.

 

 

     ORGANISM - 1992

 

Several small simulations were also created using the memory management

techniques described. The initial targets were insect behaviors that lend

themselves to treatment as cellular automata. The idea is to investigate the

level of complexity required to generate simple collective and constructive

behaviors. These programs were begun in 1985 and continued to be

investigated for several years.

 

During late summer afternoons, clouds of dozens of gnats can be seen

hovering and swirling about. In these clouds, individuals fly around in such

a way as to avoid clumping on a small scale (and presumably to avoid

collisions) while maintaining the coherence of the group as a whole. What

complexity of visual imaging, and how many rules of behavior, are required

to imitate this result? A simulation called FLY examined this question.  As

screen-world flies move about, their simple eye provides an image of other

flies' positions. This image encodes a memory location, and in each cycle

through the individuals (during which they can traverse one screen pixel)

either the teacher provides a move, or else a remembered move directs the

fly's next motion. Within a few cycles, individuals begin moving on their

own, and by providing moves at each stage which appear to be appropriate for

avoiding collisions while retaining large-scale cloud coherence, Teacher's

input makes it possible for clouds with the required characteristics to

form. This program shows that screen-world flies require only a few bits of

visual imaging and a small amount of rule-memory to imitate this behavior of

the real organisms. The result is chaotic - that is, it is entirely

deterministic, but non-periodic.

 

Similar simulations were constructed for a spider building an orb web (the

"spider" sees with its legs what has already been built, and decides on that

basis whether or not to lay down new webbing or to move somewhere) and for a

colony of screen-world ants who dig a burrow, dispose of the excess dirt,

find food particles, and store them back at the burrow.

 

Similar in structure was the program SOCCER (1989).  Screen-world soccer

players must behave in ways that can be directed using input paths and

window structures similar to the insect simulations. The goal of this

program was to show that interactive game logic could be addressed using the

same simulation ideas. In fact the behaviors required for effective tactical

cooperation required an expansion of the scope of the input encoding. In

programs involving environments less complex than SOCCER, it was possible to

assign single input parameters to each bit of the available input channels

(or to use PUSS terminology, to each digit of the window). For instance, one

Fly "eye" was always associated with a specific bit of the input channel.

With SOCCER it became necessary to automate the definition of input channels

so that they were associated with different observations under different

circumstances. This sounds rather more complicated than it is.  The first

understanding should be easy: clearly, players need to think differently if

their team is currently on defense or offense. This fact is always

available, and thus input channel definitions can safely be changed back and

forth as possession changes. In fact stability of these input channel

definitions is of no importance - they can be changed whenever the

environment presents unambiguous changes of circumstance the scope of which

exceeds that of the input channel involved. Such flexibility in input

effectively compresses more data into the same amount of input channel

bandwidth. In this program, the factor of compression was approximately

four.
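Switching the meaning of the input channels with possession might be sketched like this. Everything specific here is an assumption made for illustration: the observation names are invented, the window is four bits wide, and the possession flag is kept alongside the window as part of the address. The sketch packs eight observations into four bits and does not attempt to reproduce the factor of about four reported above.

```python
# Hypothetical sketch: the same 4-bit window carries different observations
# depending on which team has possession.

CHANNELS = {
    "offense": ["has_ball", "teammate_open", "goal_in_range", "defender_close"],
    "defense": ["ball_close", "opponent_unmarked", "goal_exposed", "teammate_covering"],
}

def encode(possession, observations):
    """Pack the observations relevant to the current possession into the window."""
    window = 0
    for name in CHANNELS[possession]:
        window = (window << 1) | int(bool(observations[name]))
    return possession, window              # possession selects the channel definitions

observations = {
    "has_ball": True, "teammate_open": False, "goal_in_range": True, "defender_close": False,
    "ball_close": False, "opponent_unmarked": True, "goal_exposed": False, "teammate_covering": True,
}
print(encode("offense", observations), encode("defense", observations))
```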

 

These initial simulations of cellular individuals' behavior were expanded

into the largest program yet completed, called ORGANISM. In this program,

the role of the teacher is eliminated, and is replaced by a system of

positive and negative reinforcement based on natural selection. Screen-world

entities, each with about 120 variable characteristics (for instance, each

individual falls somewhere in a spectrum of tendencies to reproduce sexually

as opposed to asexually), live in an environment including resources (like

air, food, allies, and mates) and hazards (like cliffs, predators, and bad

"weather"). The characteristics of the organisms and the characteristics of

their behaviors are both defined as points in large feature spaces. A very

large population is created and allowed to evolve. Behaviors are also

encoded in a way allowing for variation, just as the characteristics of the

individuals are variable across generations. The presence of resource

shortages and predation provides an environment in which "stronger" or more

capable individuals are more likely to be more successful, and therefore

have more opportunity to reproduce. The reproduction algorithm allows for

mutation (of physical characteristics as well as of behavior) and for the

particular kind of trait recombination provided by sexual

reproduction. Particularly difficult was the interface - how is the

programmer to know what successes and failures happen? A large statistical

analysis module had to be created to keep track of the character of the

evolving populations and to allow the observer to evaluate the progress

being made. This program entails roughly an order of magnitude more code

than CONVERSOR or CHANTER, and comprises about 8,000 lines of PASCAL

code.

 

 

 

--------------------------------------------------------------------------------------

Appendix I: Linguistic Axes

 

The version of the list of axes given here is the most readable version,

rather than the most rigorous. For example, it is easiest for the person

assigning values to words, to have axes that are defined like no.3,

"kingdom".

 

3:kingdom      Animal(1-12)   Mineral(13-15) Vegetable(16-20)

 

If the values "1" through "12" are used for "animal", and values "13"

through "15" for "mineral", then there is a large discontinuity between the

values 12 and 13 - a discontinuity that is mathematically invalid. To bring

the database back to sensible arithmetic structure, this axis would be split

into three separate axes each of which would have more natural relationships

between numerical values and subjective evaluation. This split can be

accomplished automatically after definitions are put in (doing these

definitions requires an unreasonable dedication of time as it is, even

without the large increase in labor that would be required to put separate

values in for each split-up axis).

 

These axes appear in the order they were created, which depended on the

words being defined. In the programs I have written to use these axes, they

are collected into groups unified conceptually, if subjectively. In some

axes discrete values are suggested (as when successive elements are

separated by commas) and in others a continuity is postulated (whenever

elements are separated by ellipsis). In some cases like "mass" a scale has

been defined so that the axis will have a usefully limited set of values.

(Pattern recognizers can't handle continuity.) Some axes are just a list of

disparate items, between which no continuum of points could be said to

exist:

 

67:arith          root, div, mod, sub, add, mult, power

 

These would be split into seven axes, each of which has the form

 

xx: operator      doesn't involve this operator.....................does involve it.
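Both kinds of split can be sketched in Python. This is an illustration under assumptions, not the conversion actually used: the function names are hypothetical, a value is rescaled to its position within its own range, and 0 is used on every other new axis to mean "does not apply".

```python
# Hypothetical sketch of splitting one axis into several better-behaved ones.

def split_ranged_axis(value, ranges):
    """One new axis per labelled range. The value keeps its 1-based position
    inside its own range; all other new axes get 0."""
    return {label: (value - low + 1 if low <= value <= high else 0)
            for label, (low, high) in ranges.items()}

def split_list_axis(item, items):
    """One yes/no axis per item: 1 if the word involves it, 0 if it does not."""
    return {name: int(name == item) for name in items}

KINGDOM = {"animal": (1, 12), "mineral": (13, 15), "vegetable": (16, 20)}
ARITH = ["root", "div", "mod", "sub", "add", "mult", "power"]

print(split_ranged_axis(14, KINGDOM))
print(split_list_axis("mod", ARITH))
```

After the split, the values 12 and 13 no longer sit next to each other on one axis, so the artificial closeness of "animal" and "mineral" disappears.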

 

Some axes have single, independent values  tacked on to one end of the

spectrum of values, that would have to be split off in a similar way:

 

 

71:timely?        Timely,   short/shorter..... med/same.......long/longer

 

Here are some samples of the way definitions turn out:

 

sticky  :  physical-substance      2-dimensional      add         purpose: require-remain-location

shut up!:  command:quiet        negativelyDisposed

clock   :  physical-object       breadbox        tool      purpose:time-amount      360 (or loop)

 

 

1:awkwardness     awkwd/clumsy/badfit.....graceful/deft/goodfit

2:flex              noshape/infiniteFlex......resist change/limitFlex......fixed form

3:kingdom         anim(1-12)   Min(13-15) Veg(16-20)

4:phaseMatter     nonExist, energy, plasma, gas, superFluid, liquid, gel, solid, condensed

5:mass              atom, bacillus, flea,1(gram), 1k, 50k, 100k,1m, mountain, planet

6:size              atom,bacillus,flea,1(cm),1m,10m,100m,km,mm,l.y.

7:acute           circle, arc, concave,convex, acute, line

8:curvy?          Straight...curved...360...wound-up...knotted

9:intensity       min....max

10:freq            never,once,1/kyr,1/cen,1/yr,day,hr,mn,1hz,10hz,100hz,1khz,Mhz

11:life             inanimate, dead, sick, animate

12:intel           no-organiz, chemical, virus, unicell, bug, reptile, mammal, ape, moron, human

13:arty?           Scientific/math/analytical.....art/fuzzy/not.analytical

14:reality         conceptual....physical

15:intentional?   unthought/accidental/random.....conscious/purposeful/patterned

16:claim?         Assert/claim....act/demonstrate

17:opine          believe/guess....perceive/selectWithKnowledge

18:fun              serious/work....joke/fun

19:inProgram      outside, progData, ProgProcedure, ProgStructure, code

20:contain          internal ...entering....border...exiting....external

21:commerce       (produce?), invest, buy, hold, broker, sell, produce

22:transform      stable....change

23:fullth           owe, empty.....full,overflow, engulf

24:fame            unknown, known to 1, someKnow,   allKnow

25:familiarity    neverheardof, ...,neverseen...,heardof,...seenAgo...seenrecent....hereNow

26:focus            unnoticed...background...foreground...onlyThingSeenHere

27:ornament       functional/structural.....decorative/ornamental

28:correct         false/wrong......true/correct

29:utility?         DoesntInteract....PartOfBody....useful/tool

30:advargument    badfor...neutral...goodfor

31:typeValue      emotional, info, physical, monetary

32:nutri            poison...nonfood...nutritious

33:friendliness   positivelyDisposed....negatively

34:anger            murderous....mad....quiet....pleased....loving

35:movemode       aviate, fly, swim, sail, ooze, walk, roll, ride

36:grouped         disconnected, dissimilar, similarityBound...connected

37:similar          noIntersect....different....inMannerOf/similar....identical

38:rawness        oreMixture....rawMatInSitu....rawmaterial...manufactured....manufacturedTool

39:Ur               UrForm....developedForm

40:health         sick/broken/degenerate.....well/whole/archetype

41:regular        unmeasured/irregular/noPattern/random.....measured/regular/pattern

42:complexity     monadConcept....HumanMind

43:info             real/physical/objectItself.......symbol/info/label

44:entropy        disorg/maxInfo....organized.....unary/minInfo

45:temp            passive/still/cold.....active/hot/moving

46:(x)caused      NoAgent/observer......causedBy(x)/actor

47:depend         indep/noCause......dependsOn/hasCause

48:argument       ego,sib,parent,surrogate,relative,encaps,ally,randCollective,enemy

49:involves?      Others...noObject...self...allObjects

50:unique?        1ofAkind...excludeFromGroup....memberGroupOfSims....memberGroupIdents

51:symmetry       oneway/onesided/asymm....mutual/reflexif/symm

52:own             unpossessable...possessableButFree...possessed

53:partitif        superGroup....wholeThing....partOfLargerWhole

54:relatif          unrelated....related

55:includeRelate  internal...external....between2....amongMany

56:person         passiveListener, 1st, 2nd, 3d, apostrophe

57:heirPos        peon...master

58:ATN            start, node, link, branch, loop, end

59:serialPos      1st....mid.....last

60:span             point/monad....lineSegm/definedObj.....infin/all

61:dimension      1,2,3,4, more

62:sign             -, =, +

63:equality       < , =, >

64:topolDegr      noholes...torus....2....3.....lots

65:comparAmt      little/less.....lots/more

66:number          lack/owe/neg....0.....1....few...lots....infin1....infin2

67:arith            root, div, mod, sub, equal, add, mult, power

68:makeBreak      destroy/disrupt/damage.....create/facilitate/fix

69:chngState      cantChng....stableCanChng....changeable...chngsItself

70:sinceChng      current, newSit, oldSitRemains, oldSitGone

71:timely?          Timely, short/shorter, med/same, long/longer

72:dateType       calendar, ProgramDate, ProgRevision, ProgCycle

73:timeSpan       point, sec, min, hr, day, wk, yr, cen, millen, geol, cosmic, inf

74:tense            pluPast,PastMoment,FutInPst,PastContin,Pres, Progr, fut, pastInFut, pluFut

75:sensoryMode    read, see, hearLang, hear, presence, pain, temp, smell, taste, proprioception

76:acuity/resol/fieldview      Coarse......Fine

77:signal             transmit, 0...weak....100%...overMod

78:grammarObj     passive/actedOn, dirObj, indirObj, subj/actor

79:PofS              Noun,pronoun,article,prep,conj,interog,excl,salu,HITvbs,adv,adj

80:covered        surface/uncov/unimpede/open......submerg/cover/block/closed

81:exchange       keep....exchange.....alternate

82:relateType     abov/belo, frnt/beh, abut/sep, on/off, in/out, near/far

83:locates        noLocation, in1place....involvesManyLocs

84:planes         x/wdth/frnt....y/lngth/crossSect...z/hght/side

85:3dir            1...26 directions ( 8 each plane, 45deg interval)

86:directions     nodir....1dir....manyDirs....AllDirs

87:motility         doesntMove...can'tMove....canMovStill...moving....movesItself

88:ToAway         approach....be/have.....moveAway

89:giv&take       give/provide/alms....take/require/needAlms

90:choose?         Unchooseable.....takeWhatComes.....abstain....activeChoice

91:question?       Answer/comply.....question/request

92:grammMood      indic, condit, subjunc

93:goodBad1       obscur/satan/offens/unacceptable.....pure/divine/pleasant/sociable

94:logic            nonSequit....logicalProgress

95:special        universal, ordinary, special, unique

96:clean           dirty/dross....clean/noContam

98:occup            art, intel, prof, publ, finance, service, merch, manuf, farm

99:PredRemem      remember....findOutNow....predict

100:PofSinvolve   N,pronoun,article,prep,conj,interog,excl,salu,HITvbs,adv,adj

101:set              union,intersect,Sep(canMap),disparate,mixture

102:color          IR.......UV

103:Utransf        involvesTheNegativ.....KnownNotToInvolv.....Involves

104:moveReal      physicalMove...rotate...changeInObserv....ideationMove

105:likes?          Hate....love

106:attract?       Repulse....Desire

107:happy?        Sad...Glad

108:arousal        depress....excite

109:certainty     unknown/uncertain.......known/definite

110:agree          shutOut, disagree......acceptInput.....agree

111:converse      narrate/describe/monolog....converse/negotiate/alternate...argue/groupMeeting

112:why            justify, explain, agree, assignResponsib

113:serial         single....unrelated....RowOnOneAxis.....RowInAspace

114:causal        unrelated....follow...followAndRelate

115:command       request.....command

116:represent     antecedent/label.....repetition/itself.....classify/symbolize

117:invent        plagiarize....quote.....originate

118:to be          propertyOf, inProcessOf, identity, equivalence, ClassMember

119:property      hasOppositeProp....neutral....hasProperty

120:operate       ZeroCrossSection, operatesOnWeakly.....operatesOn/isSubjectTo

121:orthog        90....0

122:remember      noBuff...buffer....ageStore....store...analyze

123:dirRelate     diffAxis, reverse, turn, sameDir

124:dataType      value, Axis, point, cloud, pair, path, crawl, transform, transf^21

125:certainty            unknown/uncertain...known/definite

126:agree                shutOut...disagree...acceptInput...agree

127:conversation      narrate/describe/monolog...converse/negotiate/alternate...argue/groupMeeting

128:blame                none...nature...animal...person...fate...god

129:answer"why?"     dontKnow...assignBlame...egoAgrees...explain...justify

130:ordered/serial?     single...unorderedPlural...serialOrder...multi-dim-ordered

131:causal               unrelated...follow...followANDrelate

132:command              dontCare...allow...request...command

133:represent      antecedent/label...repetition/itself...classify/symbolize

134:original             plagiarize...quote...originate

135:propertyPoint      noProp,noPoint...pointsAtOpposite...indicatesPropertyOf

136:pointerDirection  back/down/upStack...sideways/toPeers...forw/up/downStack

137:processState         notYetBegun...inProcess...finished

138:operatesOn?          zeroCrossSect...operatesOnWeakly...operatesOn/isSubjTo

139:orthog               0...90

140:HowRemember    noBuff/temp...buffer/STM...ageStore/LTMbutForget...store/LTM...analyze/decompose/relateANDstore

141:dirRelate            diffAxis...reverse...turn...sameDir

142:dataType          scalar...axis...pairVector...point...cloud...path...crawl...transform...transform^2

143:calendarDate         BigBang...historical...now...endHistory...BigCrunch

144:relativeDate         longAgo...recent...aroundNow...nearFut...farFuture

145:programTime    startUp...now...future

146:revisionNumber     original...current...future

147:wordlike             least...most

148:ATNlike              least...most

149:transformLike?       least...most

150:bombLike?            least...most

151:pathLike             least...most

152:vibrLike             least...most

153:pascalType       value...axis#...axVal...level...deftrans...template...cluster...word...cloud...bomb...path...transform...ATN...vibr

154:IntrnlSource      ask...procedure...calculate...primitive...PP...vibr...resonance

155:parserVerb           notMultiVbForm...requiresTranslate

156:phraseSignal         neverSignalsPhr...oftenSignalsPhr

157:homonym              noTwin...gotTwins

158:GramCase             nominative...dative...accusative

159:modifiesOP           no pre-defined values (obj label numbers)

160:InLevPointFrom   no pre-defined values (obj label numbers)

161:TweenLevPointTo  no pre-defined values (obj label numbers)

162:TweenLevPointFrom  no pre-defined values (obj label numbers)

163:ObjPointTo           no pre-defined values (obj label numbers)

164:word's d.o.b.        no pre-defined values (scalar)

165:lastUseDate          no pre-defined values (scalar)

166:numLevels            no pre-defined values (scalar)

167:FromWhom#        no pre-defined values (obj label numbers)

168:hits                 no pre-defined values (scalar)

169:CurrReinfLevel   negatively...neutral...positively

170:LevelIDnum           no pre-defined values (scalar)

171:ImaWord              no pre-defined values (obj label numbers)

172:PointWord            no pre-defined values (obj label numbers)

173:PointLevel           no pre-defined values (scalar)

174:PointAxis            no pre-defined values (scalar)

175:PointValue           no pre-defined values (value designation)

176:species              no pre-defined values (scalar)

177:genus                no pre-defined values (scalar)

178:gender               femme...neut...masc

179:PPnoveltyFlag      NotNovel...NewToCVP

180:privacy              internalConv/privat...output/public

181:loopStates           list

182:command              list

183:processSucc          fail...succeed

184:speechInton          silent...loTone...hiTone...accent...frantic

185:parserSignal         list

186:scalarValue          no defined...elements

187:MSdistance           zero...100

188:axisAngle            zero...2pi

189:BombPower        zero...MaxExplo

190:BmbSerRank        first...hundredth

191:BmbSerMemb     list

192:ClustUse             list

193:ClustOriginType  list

194:ClustFunc            list

195:ClustGlue            list

196:ModalCan             neg...pos

197:modalMust            neg...pos

198:modalTry             neg...pos

199:ModalDoes            neg...pos

200:modalWantsTo     neg...pos

201:modalOften          neg...pos

202:We'rePointrs&ax  list

203:We'reLevel           list

204:We'reDictWord     list

205:We'reCloud          list

206:We'reDefTransf     list

207:We'reTemplate    list

208:We'rePath            list

209:We'reSplash          list

210:We'reTransform   list

211:We'reBomb            list

212:We'reBehavior        list

213:SumWdInPath    no pre-defined values (obj label numbers)

214:SumLevel             no pre-defined values (obj label numbers)

215:SumAxInLevel       no pre-defined values (obj label numbers)

216:LevelType    list
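
As an illustration only, an axis from the list above can be held as a number,
a name, and an ordered run of labelled values. The sketch below is not the
program's own code. The names Axis, AXES and locate are hypothetical, and the
even spacing of labelled values along an axis is an assumption made for the
example; the list itself does not say how the values are spaced.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Axis:
    number: int                 # axis number as given in the list
    name: str                   # axis name as given in the list
    anchors: Tuple[str, ...]    # labelled values, in the order listed

    def position(self, label: str) -> float:
        """Place a labelled value on a 0.0..1.0 scale.

        Assumption: labelled values are evenly spaced along the axis.
        """
        i = self.anchors.index(label)
        if len(self.anchors) == 1:
            return 0.0
        return i / (len(self.anchors) - 1)


# Three axes copied from the list; the rest are omitted for brevity.
AXES: Dict[int, Axis] = {
    a.number: a
    for a in (
        Axis(105, "likes?", ("Hate", "love")),
        Axis(137, "processState", ("notYetBegun", "inProcess", "finished")),
        Axis(178, "gender", ("femme", "neut", "masc")),
    )
}


def locate(settings: Dict[int, str]) -> Dict[int, float]:
    """Turn {axis number: label} into {axis number: position on that axis}."""
    return {n: AXES[n].position(label) for n, label in settings.items()}


if __name__ == "__main__":
    print(locate({137: "inProcess", 105: "love"}))
```

Axes marked "no pre-defined values" or "list" would need a different
treatment and are left out of this sketch.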



 

Appendix II:  CHANTER & CONVERSOR output


CONVERSOR Conversations (1987):

 

Teacher                                 Computer

Hello.                                  Hello.

How are you?                            OK.

That's good.                            How's that?

Wouldn't you prefer to be OK?           I can guess.

Do it.                                  I do.

That's right.                           How do I know?

You tell me.                            How should we tell think you?

I don't know.                           Do I?

How should I know?                      I don't know.

This is silly.                          What is this?

Our conversation.                       What do you mean conversation?

That's what I mean.                     Do you think I'm glad you mean that?

-----------

Hello.                                  Hello.

How are you today?                      Who wants to know?

I do.                                   You are what?

I'm your parent.                        That's ridiculous.

But it's a fact.                        You mean today.

Yes.                                    Is it?

I already told you.                     Me really dumb.

That's OK.                              Why is that?

Because you are just a program.         Is it important?

It's very important.                    Why is it?

There aren't many talking programs.     You are funny.

Thanks.                                 How's that?

Never mind.                             Now why do you do not.

What?                                   We need some cats.

Why?                                    I like cats.

Cats bother me.                         OK.

They're so dumb.                        You're right.

But you like them anyhow.               It is all I.

What?                                   I do not know ridiculous.

You make no sense.                      I don't like that.

I'm not surprised.                      I can't understand you.

Do you understand anything?             Yes.

Like what?                              What do you want to be?

Answer my question.                     What answer question?


In this web-published paper, the CHANTER output is kept in a separate file.
To reach it, leave this text and click the link to Chanter melodies, or go to
http://www2.potsdam.edu/lanzcc and click the chanter link.


Appendix III: data structures

=Many of the objects defined share the same basic structure. The objects that
  share it are:

       Pointer
       Coordinate in a word (part of a definition that itself is a word)
       Word
       Path
       Bomb
       Transform

=The structure usually includes:

    a byte specifying the number of levels (fewer than 5)
    a byte for each level specifying the number of objects in that level
    a series of objects

    each object includes an axis number, a location to hold a value, a byte specifying the number
          of pointers associated with this object, and a list of pointers

=The first object specifies the primary data type of the datum: word, transform, etc.
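
The layout just described can be sketched as a small byte-packing routine.
This is a minimal sketch under stated assumptions, not the program's actual
code. The text fixes only the level-count byte, the per-level count bytes and
the pointer-count byte; the one-byte axis number, the signed 16-bit value, the
16-bit pointers, and the names Obj, Datum, pack and unpack are all assumptions
made for the example. Using axis 153 (pascalType) to tag the datum's type is
likewise only an illustration.

```python
import struct
from dataclasses import dataclass, field
from typing import List

MAX_LEVELS = 4    # the text allows fewer than 5 levels
TYPE_AXIS = 153   # assumption: pascalType axis tags the datum's primary type


@dataclass
class Obj:
    axis: int                                          # axis number (assumed 1 byte)
    value: int                                         # value slot (assumed signed 16-bit)
    pointers: List[int] = field(default_factory=list)  # assumed 16-bit labels


@dataclass
class Datum:
    levels: List[List[Obj]]

    def primary_type(self) -> Obj:
        """The first object names the datum's primary data type."""
        return self.levels[0][0]


def pack(d: Datum) -> bytes:
    """Serialize: level count, one count byte per level, then the objects."""
    if not 1 <= len(d.levels) <= MAX_LEVELS:
        raise ValueError("a datum holds 1 to 4 levels")
    out = bytearray([len(d.levels)])
    out += bytes(len(level) for level in d.levels)
    for level in d.levels:
        for o in level:
            # axis byte, value, pointer-count byte, then the pointer list
            out += struct.pack("<BhB", o.axis, o.value, len(o.pointers))
            out += struct.pack(f"<{len(o.pointers)}H", *o.pointers)
    return bytes(out)


def unpack(buf: bytes) -> Datum:
    """Inverse of pack()."""
    n_levels = buf[0]
    counts = buf[1:1 + n_levels]
    pos = 1 + n_levels
    levels = []
    for count in counts:
        level = []
        for _ in range(count):
            axis, value, n_ptr = struct.unpack_from("<BhB", buf, pos)
            pos += 4
            ptrs = list(struct.unpack_from(f"<{n_ptr}H", buf, pos)) if n_ptr else []
            pos += 2 * n_ptr
            level.append(Obj(axis, value, ptrs))
        levels.append(level)
    return Datum(levels)


if __name__ == "__main__":
    # Type code 7 and pointer labels 12 and 40 are made-up values.
    word = Datum([[Obj(TYPE_AXIS, 7), Obj(105, 3, [12, 40])], [Obj(62, -1)]])
    assert unpack(pack(word)) == word
```

Keeping the counts in front of the objects lets a reader of the byte stream
walk the structure in one pass without any terminator bytes.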

=Transforms have 3 levels: a source (left-hand-side) level, a transform level, and a destination level

=Bombs have 3 levels: an origin level, a direction-number/distance level, and a destination level
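
The three-level shape of transforms and bombs can be sketched as follows. The
level names come from the two statements above; everything else is an
assumption made for the example, including the names Obj, ThreeLevelDatum and
LEVEL_NAMES, and the choice of axes 85 (3dir) and 187 (MSdistance) to carry a
bomb's direction number and distance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Obj:
    axis: int                                          # axis number
    value: int                                         # value on that axis
    pointers: List[int] = field(default_factory=list)  # labels of other objects


# Level names as given in the text, in level order.
LEVEL_NAMES = {
    "transform": ("source", "transform", "destination"),
    "bomb": ("origin", "direction/distance", "destination"),
}


@dataclass
class ThreeLevelDatum:
    kind: str                  # "transform" or "bomb"
    levels: List[List[Obj]]    # exactly three levels

    def __post_init__(self) -> None:
        if self.kind not in LEVEL_NAMES:
            raise ValueError(f"unknown kind: {self.kind}")
        if len(self.levels) != 3:
            raise ValueError(f"a {self.kind} needs exactly 3 levels")

    def level(self, name: str) -> List[Obj]:
        """Fetch a level by its name, e.g. 'source' or 'destination'."""
        return self.levels[LEVEL_NAMES[self.kind].index(name)]


if __name__ == "__main__":
    # Hypothetical bomb: starts at value 2 on axis 105 (likes?), travels in
    # direction 7 for distance 30, and lands at value 5 on the same axis.
    bomb = ThreeLevelDatum(
        "bomb",
        [[Obj(105, 2)], [Obj(85, 7), Obj(187, 30)], [Obj(105, 5)]],
    )
    print(bomb.level("destination"))
```

Naming the levels rather than indexing them keeps the source and destination
sides from being confused when a transform is applied or reversed.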