The Treatment of Language: Computing vs. the Natural Sciences
 1987 - 2017
   Greg Bryant

This story begins with a fairly straightforward bit of math in 1987, which slowly
revealed fundamental problems with computer science as it is seen today by its
practitioners and academics.

---

I'd finished five years of ambitious consulting projects in Berkeley, Silicon Valley, 
and Japan, and I decided to retire to my own projects, and address a certain feeling
that I had about computing. 

It was obvious to everyone that the computer industry was a bloated marketing machine, 
where truth counted for very little, and power counted for everything ... 

... but that was not the feeling I was addressing. I felt that no one had the slightest 
idea how to define what computation IS in the natural world. I'd asked all the people
who should know, and they more or less had no rational answer -- only self-contained models, 
human artifacts, which didn't raise anything that even sounded like a research question.

I started re-reading and picking at a famous introduction to theoretical computer science 
text, by Aho, Hopcraft, and Ullman. I was reading what they described as the 'Chomsky
Hierarchy', and what sounded like their interpretation of it, which could not be correct.

At this point, I should say that Noam Chomsky in 1956 made some observations about insufficient
formal models of natural language which had a huge effect upon computer scientists. 

The first thing that computer people learned from him, was that formal languages and 
natural languages are just very different. We don't know all the ways that they are
different, but we know enough important differences to know that one should be very
careful not to confuse them.

He drove home the point by showing inadequacies of describing even formal languages 
with a kind of state machine known as a Markov chain. These state machines were
considered helpful in natural language work ... until Chomsky proved that they
were far too weak to capture much.

He added that one could create more sophisticated formal languages with Context
Free Grammars, and a kind of machine that he called a pushdown automata. But he then
proved, in the same paper, that these were still not sufficient to describe some very 
simple constructions that could be found in natural languages.

Computer scientists grabbed this notion of a context-free grammar and pushdown
automata, and used them to create all subsequent generations of formal programming 
languages. Many famous computer scientists that emerged after Chomsky's paper credited
him with their careers.

But some of them were giving the impression that Chomsky was saying "all languages
fit into this hierarchy." He was not saying that. He was saying that natural languages
are much more complicated than computer people imagine, and here are a set of
increasingly difficult examples to prove it.

Although for many decades most computer people continued to understand the distinction 
between natural and formal languages (now it depends on their level of exposure to these
ideas) the understanding of Chomsky's argument degenerated into a kind of typology
of computational sets. This was part of a kind of natural history of machine idealizations
and their highly abstracted relationships known as the "complexity zoo". Chomsky's complexity 
results were consequently taught as a hierarchy, implying that all languages fit cleanly into it. 
Regular expressions were finite. Context-free, context-sensitive, and the equivalent attribute 
grammars were not.

That didn't seem right to me. Little known to me, the idea was resisted by computer
people as soon as this misinterpretation emerged in the literature, apparently in
the early 60's, when the entire point of Chomsky's paper was still fresh in people's
minds.

Time for another aside ... I have positivistic friends who seem to think that somehow
'computing' is 'progressing', whether linearly, geometrically, or exponentially. 
I see no evidence for ANY intellectual progress, at all. Good ideas fall by the side 
of the road all the time. Bad ideas re-emerge continually. If computer science were 
framed as a natural science, this might be less likely to happen. Perhaps if it were 
understood only as engineering, this wouldn't happen. Perhaps if it were understood 
as a formal science, such as mathematics, it also wouldn't happen. But, unfortunately, 
individuals simply recruit what they need to suit their purposes, and there's no 
collective drive towards anything but individual success.

Back to the story.

So I circulated a paper that demonstrated that there was another sort of machine,
a multi-head finite automata or MFA, defining a set of languages, MFAL, which weren't
equivalent to regular expressions / finite automata nor to context-free grammars / 
pushdown automata. Everyone said I was right, and I started to prepare drafts for 
publication. 1 2 3 4

But then I found that this was published as a result in 1965. But that wasn't the
thing that bothered me the most. Neither was the loss of Chomsky's point and the 
propagation by good computer scientists of the misleading idea of of a Hierarchy
of formal languages.

No, I was bothered by a deeper problem. This was just the first of the deeper problems
that would emerge.

Pushdown automata  are not "finite machines". They have an infinitely extendable 
memory, used subsequently to great effect in computing circles: the state of the stack.

But if these MFA cut across the partition between finite-state automata and pushdown
automata, are they "finite" or "infinite"? Are the languages they recognize "finite"
or "infinite"? 

Something smelled fishy. Is there something wrong with our notion of finite and infinite?

Oh yes. Very wrong. But that's just the beginning.

First of all, let's take our intuition about finite machines, and imagine creating
limits for them. Let's make a rule like: you can create a formal system that 'operates'
only in the sense of the application of some rules. It is NOT trying to be a model of 
the real world. It is literally something with rules that you can implement inside of 
a closed box. That seems like one way to define a finite machine.

What's wrong with that? Many things.

For example, these rules are all being carried out by your brain. You're bound to let
a little boundlessness in when you do that.

Ok then: let's build a machine to follow the rules. This is computing after all. If we can
get a machine to follow the rules, and it all happens in a closed box, and we read the 
results, then it's finite.

Is it? We're providing input. We're protecting the box from interference. We're reading
the output and interpreting it. Sounds rather indefinite or 'infinite' to me.

There's a single problem here. 

And this problem should be taught in every beginning programming classes the way that 
"there is no contact-mechanical model for gravity" is taught to every first-year physics 
student.

The "computer" is not "out there". "Infinite" and "finite" machines can not be discovered in
the world. There is no "meter" that can be built to detect them. The machines, and their many 
properties, are simply not there.

We are creating them.

We are using these words, and these terms.

These words represent ideas in our brains.

If we want to understand these terms, we need to look at ourselves, and not be confused about
our imaginary worlds. They are not real. 

Of course we build real stuff. We agree on what our terms mean. But if you look beyond that, 
you'll find humans, making stuff up. You can examine those things, to find out more about the 
human mind, and how it creates 'objects' and 'connections' and 'networks' and 'parts' and 
'models' and many other ideas. And you'll also discover how people are confused about the 
reality of these ideas, confused into thinking them to be mind-external. Confused -- upon hearing 
that they are mind-internal -- about why they don't understand the simplest things about themselves.

If computer science is ever to have a foundation, we need to start with this simple idea.
A 'computer' is just a word, and an idea, which we really don't know, but which we can
easily see has no external reality, outside of agreement about what 'is' and 'isn't' a 
computer by people who are sufficiently indoctrinated about it. We need then to work harder
at making these terms independent of the minds of their users. That's the role of science.
We can start by actually creating a different set of terms to more explicitly refer to
things we are trying to find externally. That's natural science.

All this started with a question about formal language sets. But you could start from 
anywhere to come to the same conclusion.



Multiple-head Finite Automata

Greg Bryant, 1987

Abstract:

A new generalization of Finite Automata (FA) called Multiple-head Finite Automata (MFA) is described. MFA accepts a set of languages (MFAL) that do not fit the proper hierarchy of mappings between machines and language classes, often known as the Chomsky hierarchy. MFAL includes regular sets (RS), often written as regular expressions (RE). MFAL are a proper subset of context sensitive languages (CSL). But MFAL cut across context-free language (CFL) partitions: there are CFL that cannot be recognized by a MFA, and there are MFA languages that are not context-free, i.e. cannot be recognized by a standard pushdown automata. Since standard FA are considered 'Finite Machines', and pushdown automata are considered constrained 'Infinite Machines', this anomalous set implies that the distinctions between finite and infinite machines are not sufficiently well-defined.

[Note, 1988: MFAL were discovered independently in 1965. But the result never entered standard theoretical or pedagogical works. The implications here are the work of the present author.]

Keywords: automata, languages, finite automata, finite-state machines, grammars, translators, parallel processing

I. FA that accepts more than regular sets

In the finite control model of FA, the machine comes in contact with a tape, representing a string of symbols, by means of a single read head, or input.

Consider instead a finite state machine with a finite number of read-only heads, all reading the same tape. This is essentially a finite number of finite control models linked together.

A two-head machine of this type can accept palindromes. A one-head FA cannot. Palindromes were previously thought to be outside the recognition ability of finite automata.

Here is how these FA machines can accept palindromes. Place one end of the tape on the leftmost head, and the other end on the rightmost head.

The algorithm is to compare the symbols read by both heads. If they match, the tape advances to the left, on the leftmost head, and to the right on the rightmost head. When the tape can no longer move (the heads are spaced as adjacent records), the string is a palindrome. It is otherwise rejected.

This automaton is finite, in the sense that the heads and the machine have the same number of states regardless of the length of the tape. The interaction with the tape, reading and moving/manipulating, is mechanically as finite as the old finite control model.

The finite state diagram for this machine is simple. Here we use a two-symbol alphabet consisting of 'a' and 'b'.

The ordered pairs represent the symbols read at any step by the left and right heads.

This palindrome, whose context-free grammar production is S-> aSa | bSb | {empty}, cannot be defined using FA regular expressions. If you accept that the machine is indeed finite, then there is no bijection between finitie automata and regular sets. This is of course a matter of definition, dealt with further in section V.

II. FA that accepts some non-CFL languages

These Multi-head finite auomata (MFA) can also accept languages that are not context-free. For example:

a^nb^nc^n

where n is a natural number.

The algorithm requires a three-head FA. With the initial configuration like so:

The MFA then needs to advance the tape over the rightmost head until it reaches a 'c', and the tape of the center head until it reaches a 'b'. From there it is sufficient to use the machine defined below (where 'rm' is the right end-marker of the tape, and with the tape always moving to the right over each head).

III. Machine and class notation

One might think that the notation of regular expressions with superscript extensions like the variables above, R for palindromes or mirror case, and perhaps a substitution notation for strings like "abbabaab" would be sufficient to express all languages acceptable to an MFA. However one further example will show these extensions to be insufficient and/or inappropriate. A notation representing the machine behavior turns out to be a direct way of describing a given language.

The task is to accept strings with a prime number of 1's. This too can be accomplished with a finite automaton.

The algorithm uses three read heads (or pointers). The first pointer increments through possible divisors, 2 through n-1 (though the program has no representation of n in it, being a finite state machine) [foreshadowing there]. Subsequent to each increment the third pointer counts to the divisor and resets iterativelywhile paralleld by the second pointer stepping through the entire string. Remainderless division sends the machine into a reject state; otherwise, after all the divisors are stepped through, the machine accepts the tape.

The notation below contains a few bits of shorthand to reduce the number of states to those pivotal to the algorithm. Eliminating these conveniences one has a language that provides a bijection to the set of possible MFA and a mapping to the languages accepted by them.

Standard notations

[ , , ] -- action triplet directing tape to move over three heads
+ or - -- move tape to left or right
( , , ) -- values over read heads; conditioon for state transition function
lm or rm -- left or right end-marker.
. -- any value
l -- lone member of our alphabet
(condition) -> state -- transition function

Shorthand

lm within [ , , ] -- move tape to the left endmarker (treated as a position and in this shorthand as though many heads could be there at one time)
lm+1 -- move tape to the first symbol
(pl=p3) -- condition when heads 1 and 3 are in the same virtual position (?)
rm-1 within ( , , ) -- when head i on rightmost symbol
Note: multiple conditions are tested in order listed

1. initial state

[lm + 1, lm, lm}

(.,.,.) -> 2

2. increment divisor

[+, lm, lm]

(rm - 1, ., .) -> 3
(1, .,.) -> 4

3. accept state

4. dividing loop, first increment for third head

[,+,+]

(.,rm,.) -> 7

(pl = p3) -> 6

(.,1,1) -> 5

5. dividing loop, subsequent increments

[,+,+]

(.,rm,.) -> 7

(pl = p3) -> 6

(.,1,1) -> 5

6. reset dividing counter

[ , ,lm]

(.,.,.) -> 4

7. reject state

MFA are capable of quite varied behavior. One wonders what languages a Multi-head Pushdown Automata (MPDA) might accept with one and many stacks and finite control. More generally we need to ask what the characteristics of finite and infinite tasks [or behavior?] really are.

IV. MFA not LBA

Despite the power of these machines, there are context-free languages that none of them are able to accept. Since context sensitive languages (CSL) include CFL, and linear bounded automata (LBA) have been shown equivalent to CSL, this would imply that MFA are not LBA. I claim that this makes MFA finite.

To demonstrate that the context-free grammar

S -> aSSa | bSSb | c

canot be accepted by a finite number of states and heads, consider the form of such strings.

      cc
     a  ac        cc
    b      bc  ca   a        cc  cc
   b         ba      a      a  aa  a
  b                   bc  cb
        b
 a                      aa           a
b                                     b

Except for the single case of string 'c', every string generated by this grammar has the substring 'cc', occuring one or more times.

To accept a string in S one must first accept some substring, say q. Unless q is 'cc'. one must match it with its counterpart q' elsewhere in the string. The distance between these parts is arbitrary, so a finite state machine needs two heads to match them.

However, whichever 'direction' one matches and assimilates, either towards or away from a 'cc' (the centerpoint for the matching, otherwise another pair of heads will be necessary elsewhere in the string), one may run into a new 'peak' (i.e. another application of the aSSa or bSSb productions) an arbitrary length string that must be accepted before one can match further.

                 .....        ..  ..
      cc        b     bc     a  aa  a
     b  bc    ca        a  cb        b
   ca     a  b           ba           a
  a        aa                          a
 b
                                     bc
a                                         a

Again, two heads are needed to accept this substring, but the current two are not available, since without them the finite state machine can have no idea what it has already matched, and will repeat itself. This problem will occur an arbitrary number of times.

Similarly, if one begins at a 'cc' (beginning at all of them implies an arbitrary number of heads) and works to accept the surrounding string, the same difficulty arises. This demonstrates by cases how no MFA can recognize an arbitrary string of S.

A class of automata that does not accept some CFL is certainly not equivalent to LBA, which could accept this tape by writing markers in the appropriate places. The Multi-head Finite Automata Languages (MFAL) therefore don't fit into the RS, CFL and CSL hierarchy, as shown below.

V. But is it finite?

Certainly MFA would be considered finite automata if we chose to call them such.

For example, the finite control model is not finite in some ver yimportant ways. Any arbitrary length tape can be placed on its single read head, making the number of possible states it can enter, including the tape as part of the machine, infinite. And the number of states, again included in the tape, given any particular tape is multiplied by the number of possible positions on that tape.

Also, the behavior of the single-head model is 'infinite' over time ... a 2-way finite automata can keep on reading indefinitely. And the combinationsof all the states an FA goes through over time are as limitless as the number of tapes it can read.

Yet we've agreed to call them finite. The do not fit our criteria for calling an automaton infinite.

Similarly an LBA's behavious of marking its tape and then interpreting it is not acceptable behavior for a finite machine, although it is certainly finite in many ways.

The MFA is an odd case. Being built like a single-head finite control is a strong case for its being considered finite, as is its limited ability. As a generalization of a finite automaton with many senses interacting with its environment, it's a very appealing model of a deterministic system in the real world.


[I leave the following 2002 observations here, even though they don't suffficiently incorporate the internalist/externalist distinction I descibed in the intro to this page. On first re-reading the 1987 essay after many years, I was curious about the external problem. Even though I sensed that it was entirely imaginary and mind-internal, like every logical contradiction, ever. So here's how a smart engineer can confuse themselves with their own imagined worlds when they don't consistently remain within the worldview of the natural sciences, and don't consistently ask themselves where the subject under investigation may actually exist.]

Notes, 2002.

The main point is that MFAs, machines with no internal variables, which would be considered to follow the finite control model, are more powerful than PDAs in some cases. Yet a PDA is considered an infinite machine, and an MFA is not. The point is that a PDA may have an infinite number of internal states, but an MFA externalizes states -- because taken as a whole, the tape adds significantly to the number of states available to the MFA.

But we don't usually take machines as a whole. We tend to isolate the machine's behavior, and separate its internal states from its environment, creating a logically consistent fantasy space, and partitioning data sets accordingly.

This paper still goes to the heart of a number of problems ... where does the machine begin and end? If we consider the machine to be a center, with the tape a supporting center, and the environment the tape flops around in to be yet another, then there's no need for such strict divisions. The notion of a finite machine which takes input from the real world is a very strange one, and should be tossed, in favor of a more straightfoward description of the machines -- finite internal states, infinite tape, externalized memory via the shape of the tape, etc.

It's interesting how this is reminiscent of our difficulty in understanding the central behavior in quantum mechanical systems -- how does the photon in the two slit experiment know about the existence of the other slit? Clearly an MFA is capable of more than meets the eye -- including some Universal Turing Machine-only level functionality. But, it does this by plodding away through finite states. How does the addition of more heads give it an infinite storage of states? It actually uses the outside world to do work, without being aware of it.

The reason this is intriguing, is that in quantum mechanics we have a behavioral theory not a contact mechanical one. And this is certainly a contact mechanical model. Probably we simply do not understand the systems we choose to define.