RSS

Daily Archives: March 13, 2012

A Tale of Three Languages

If I had to name one winner amongst the thousands of programming languages that have been created over the last 60 years, the obvious choice would be C. Developed by Dennis Ritchie from 1969 as the foundation of the Unix operating system, C remains one of the most commonly used languages even today; the Linux kernel, for example, is implemented in C. Yet that only tells part of the story. Dozens of other languages have borrowed the basic syntax of C while adding bells and whistles of their own. This group includes the most commonly used languages in computing, such as Java, C++, and Perl; quickly growing upstarts like C# and Objective C; and plenty of more esoteric domain-specific languages, like the interactive-fiction development system TADS 3. For a whole generation of programmers, C’s syntax, so cryptic and off-putting to newcomers with its parenthesis, curly braces, and general preference for mathematical symbols in lieu of words, has become a sort of comfort food. “This new language can’t be that bad,” we think. “After all, it’s really just C with…” (“these new things called classes that hold functions as well as variables”; “a bunch of libraries to make text-adventure development easy”; etc.). For obvious reasons, “C-like syntax” always seems to be near the top of the feature list of new languages that have it. (And for those who don’t: congratulations on sticking to your aesthetic guns, but you’ve chosen a much harder road to acceptance. Good luck!)

When we jump back 30 years, however, we find in this domain of computing like in so many others a very different situation. In this time C was the standard language of the fast-growing institutional operating system Unix, but had yet to really escape the Unix ghetto to join the top tier of languages in the computing world at large. Microcomputers boasted only a few experimental and/or stripped-down C compilers, and the language was seldom even granted a mention when magazines like Byte did one of their periodic surveys of the state of programming. The biggest buzz in Byte went instead to Niklaus Wirth’s Pascal, named after the 17th-century scientist, inventor, and philosopher who invented an early mechanical calculating machine. Even after C arrived on PCs in strength, Pascal, pushed along by Borland’s magnificent Turbo Pascal development environment, would compete with and often even overshadow it as the language of choice for serious programmers. Only in the mid-1990s did C finally and definitively win the war and become the inescapable standard we all know today.

While I was researching this post I came across an article by Chip Weems of Oregon State University. I found it kind of fascinating, so much that I’m going to quote from it at some length.

In the early days of the computer industry, the most expensive part of owning a computer was the machine itself. Of all the components in such a machine, the memory was the most costly because of the number of parts it contained. Early computer memories were thus small: 16 K was considered large and 64 K could only be found in supercomputers. All of this meant that programs had to take advantage of what little space was available.

On the other hand, programs had to be written to run as quickly as possible in order to make the most efficient use of the large computers. Of course these two goals almost always contradicted each other, which led to the concept of the speed versus space tradeoff. Programmers were prized for the ability to write tricky, efficient code which took advantage of special idiosyncrasies in the machine. Supercoders were in vogue.

Fortunately, hardware evolved and became less expensive. Large memories and high speed became common features of most systems. Suddenly people discovered that speed and space were no longer important. In fact roles had reversed and hardware had become the least expensive part of owning a computer.

The costliest part of owning a computer today is programming it. With the advent of less expensive hardware, the emphasis has shifted from speed versus space to a new tradeoff: programmer cost versus machine cost. The new goal is to make the most efficient use of a programmer’s time, and program efficiency has become less important — it’s easier to add more hardware.

If you know something about the history of the PC, you’re probably nodding along right now, as we’re seemingly on very familiar ground. If you’re a crotchety old timer, you may even be mulling over a rant about programmers today who solve all their problems just by throwing more hardware at them. (When old programmers talk about the metaphorical equivalent of having to walk both ways uphill in the snow to school every morning, they’re actually pretty much telling the truth…) Early Apple II magazines featured fawning profiles of fast-graphics programming maestros like Nasir Gebelli (so famous everyone just knew him by his first name), Bill Budge, and Ken Williams, the very picture of Weems’s “supercoders” who wrote “tricky, efficient code which took advantage of special idiosyncrasies in the machine.” If no one, including themselves after a few weeks, could quite understand how their programs did their magic, well, so be it. It certainly added to the mystique.

Yet here’s the surprising thing: Weems is not describing PC history at all. In fact, the article predates the fame of the aforementioned three wizards. It appeared in the August, 1978, issue of Byte, and is describing the evolution of programming to that point on the big institutional systems. Which leads us to the realization that the history of the PC is in many ways a repeat of the history of institutional computing. The earliest PCs being far too primitive to support the relatively sophisticated programming languages and operating systems of the institutional world, early microcomputer afficionados were thrown back into a much earlier era, the same that Weems is bidding a not-very-fond farewell to above. Like the punk-rock movement that was exploding just as the trinity of 1977 hit the market, they ripped it up and started again, only here by necessity rather than choice. This explains the reaction, somewhere between bemused contempt and horror, that so many in the institutional world had to the tiny new machines. (Remember the unofficial motto of MIT’s Dynamic Modeling Group: “We hate micros!”) It also explains the fact that I’m constantly forced to go delving into the history of computing on the big machines to explain developments there that belatedly made it to PCs. In fact, I’m going to do that again, and just very quickly look at how institutional programming got to the relatively sophisticated place at which it had arrived by the time the PC entered the scene.

The processor at the heart of any computer can ultimately understand only the most simplistic of instructions. Said instructions, known as “opcodes,” do such things as moving a single number from memory into a register of the processor; or adding a number already stored in a register to another; or putting the result from an operation back into memory. Each opcode is identified by a unique sequence of bits, or on/off switches. Thus the first programmers were literally bit flippers, laboriously entering long sequences of 1s and 0s by hand. (If they were lucky, that is; some early machines could only be programmed by physically rewiring their internals.) Assemblers were soon developed, which allowed programmers to replace 1s and 0s with unique textual identifiers: “STO” to store a number in memory, “ADD” to do the obvious, etc. After writing her program using this system of mnemonics, the programmer just had to pass it through the assembler to generate the 1s and 0s the computer needed. That was certainly an improvement, but still, programming a computer at the processor level is very time consuming. Sure, it’s efficient in that the computer does what you tell it to and only what you tell it to, but it’s also extremely tedious. It’s very difficult to write a program of real complexity from so far down in the weeds, hard to keep track of the forest of what you’re trying to accomplish when surrounded by trees made up of endless low-level STOs and ADDs. And even if you’re a supercoder who’s up to the task, good luck figuring out what you’ve done after you’ve slept on it. And as for others figuring it out… forget about it.

And so people started to develop high-level languages that would let them program at a much greater level of abstraction from the hardware, to focus more on the logic of what they were trying to achieve and less on which byte they’d stuck where 2000 opcodes ago. The first really complete example of such a language arrived in 1954. We’ve actually met it before on this blog: FORTRAN, the language Will Crowther chose to code the original Adventure more than 20 years later. LISP, the ancestor of MIT’s MDL and Infocom’s ZIL, arrived in 1958. COBOL, language of a million dull-but-necessary IBM mainframe business programs, appeared in 1959. And they just kept coming from there, right up until the present.

As the 1960s wore on, increasing numbers of people who were not engineers or programmers were beginning to make use of computers, often logging on to timesharing systems where they could work interactively in lieu of the older batch-processing model, in which the computer was fed some data, did its magic, and output some result at the other end without ever interacting with the user in between. While they certainly represented a huge step above assembly language, the early high-level languages were still somewhat difficult for the novice to pick up. In addition, they were compiled languages, meaning that the programmer wrote and saved them as plain text files, then passed them through another program called a compiler which, much like an assembler, turned them into native code. That was all well and good for the professionals, but what about the students and other amateurs who also deserved a chance to experience the wonder of having a machine do their bidding? For them, a group of computer scientists at Dartmouth University led by John Kemeny and Thomas Kurtz developed the Beginner’s All-Purpose Symbolic Instruction Code: BASIC. It first appeared on Dartmouth’s systems in 1964.

As its name would imply, BASIC was designed to be easy for the beginner to pick up. Another aspect, somewhat less recognized, is that it was designed for the new generation of time-sharing systems: BASIC was interactive. In fact, it wasn’t just a standalone language, but rather a complete computing environment which the would-be programmer logged into. Within this environment, there was no separation between statements used to accomplish something immediately, like LISTing a program or LOADing one, and those used within the program itself. Entering “PRINT ‘JIMMY'” prints “JIMMY” to the screen immediately; put a line number in front of it (“10 PRINT ‘JIMMY'”) and it’s part of a program. BASIC gave the programmer a chance to play. Rather than having to type in and save a complete program, then run it through a compiler hoping she hadn’t made any typos, and finally run the result, she could tinker with a line or two, run her program to see what happened, ad infinitum. Heck, if she wasn’t sure how a given statement worked or whether it was valid, she could just type it in by itself and see what happened. Because BASIC programs were interpreted at run-time rather than compiled beforehand into native code, they necessarily ran much, much slower than programs written in other languages. But still, for the simple experiments BASIC was designed to facilitate that wasn’t really so awful. It’s not like anyone was going to try to program anything all that elaborate in BASIC… was it?

Well, here’s where it all starts to get problematic. For very simple programs, BASIC is pretty straightforward and readable, easy to understand and fun to just play with. Take everybody’s first program:

10 PRINT "JIMMY RULES!"
20 GOTO 10

It’s pretty obvious even to someone who’s never seen a line of code before what that does, it took me about 15 seconds to type it in and run it, and in response I get to watch it fill the screen with my propaganda for as long as I care to look at it. Compared to any other contemporary language, the effort-to-reward ratio is off the charts. The trouble only starts if we try to implement something really substantial. By way of example, let’s jump to a much later time and have a look at the first few lines of the dungeon-delving program in Richard Garriott’s Ultima:

0 ONERR GOTO 9900
10 POKE 105, PEEK (30720): POKE 106, PEEK (30721): POKE 107, PEEK (30722): POKE 108, PEEK (30723): POKE 109, PEEK (30724): POKE 110, PEEK (30725): POKE 111, PEEK (30726): POKE 112, PEEK (30727)
20 PRINT "BLOAD SET"; INT (IN / 2 + .6)
30 T1 = 0:T2 = 0:T3 = 0:T4 = 0:T5 = 0:T6 = 0:T7 = 0:T8 = 0:T9 = 0: POKE - 16301,0: POKE - 16297,0: POKE - 16300,0: POKE - 16304,0: SCALE= 1: ROT= 0: HCOLOR= 3: DEF FN PN(RA) = DNG%(PX + DX * RA,PY + DY * RA)
152 DEF FN MX(MN) = DN%(MX(MN) + XX,MY(MN)): DEF FN MY(MN) = DN%(MX(MN),MY(MN) + YY): DEF FN L(RA) = DNG%(PX + DX * RA + DY,PY + DY * RA - DX) - INT (DN%(PX + DX * RA + DY,PY + DY * RA - DX) / 100) * 100: DEF FN R(RA) = DNG%(PX + DX * RA - DY,PY + DY * RA + DX) - INT (DN%(PX + DX * RA - DY,PY + DY * RA + DX) / 100) * 100
190 IF PX = 0 OR PY = 0 THEN PX = 1:PY = 1:DX = 0:DY = 1:HP = 0: GOSUB 500
195 GOSUB 600: GOSUB 300: GOTO 1000
300 HGR :DIS = 0: HCOLOR= 3

Yes, given the entire program so that you could figure out where all those line-number references actually lead, you could theoretically find the relatively simple logic veiled behind all this tangled syntax, but would you really want to? It’s not much fun trying to sort out where all those GOTOs and GOSUBs actually get you, nor what all those cryptic one- and two-letter variables refer to. And because BASIC is interpreted, comments use precious memory, meaning that a program of real complexity like the one above will probably have to dispense with even this aid. (Granted, Garriott was also likely not interested in advertising to his competition how his program’s logic worked…)

Now, everyone can probably agree that BASIC was often stretched by programmers like Garriott beyond its ostensible purpose, resulting in near gibberish like the above. When you have a choice between BASIC and assembly language, and you don’t know assembly language, necessity becomes the mother of invention. Yet even if we take BASIC at its word and assume it was intended as a beginner’s language, to let a student play around with this programming thing and get an idea of how it works and whether it’s for her, opinions are divided about its worth. One school of thought says that, yes, BASIC’s deficiencies for more complex programming tasks are obvious, but if used as a primer or taster of sorts for programming it has its place. Another is not only not convinced by that argument but downright outraged by BASIC, seeing it as an incubator of generations of awful programmers.

Niklaus Wirth was an early member of the latter group. Indeed, it was largely in reaction to BASIC’s deficiencies that he developed Pascal between 1968 and 1970. He never mentions BASIC by name, but his justification for Pascal in the Pascal User Manual and Report makes it pretty obvious of which language he’s thinking.

The desire for a new language for the purpose of teaching programming is due to my dissatisfaction with the presently used major languages whose features and constructs too often cannot be explained logically and convincingly and which too often defy systematic reasoning. Along with this dissatisfaction goes my conviction that the language in which the student is taught to express his ideas profoundly influences his habits of thought and invention, and that the disorder governing these languages imposes itself into the programming style of the students.

There is of course plenty of reason to be cautious with the introduction of yet another programming language, and the objection against teaching programming in a language which is not widely used and accepted has undoubtedly some justification, at least based on short-term commercial reasoning. However, the choice of a language for teaching based on its widespread acceptance and availability, together with the fact that the language most taught is thereafter going to be the one most widely used, forms the safest recipe for stagnation in a subject of such profound pedagogical influence. I consider it therefore well worthwhile to make an effort to break this vicious cycle.

If BASIC, at least once a program gets beyond a certain level of complexity, seems to actively resist every effort to make one’s code readable and maintainable, Pascal swings hard in the opposite direction. “You’re going to structure your code properly,” it tells the programmer, “or I’m just not going to let you compile it at all.” (Yes, Pascal, unlike BASIC, is generally a compiled language.) Okay, that’s not quite true; it’s possible to write ugly code in any language, just as it’s at least theoretically possible to write well-structured BASIC. But certainly Pascal works hard to enforce what Wirth sees as proper programming habits. The opinions of others on Wirth’s approach have, inevitably, varied, some seeing Pascal and its descendants as to this day the only really elegant programming languages ever created and others seeing them as straitjackets that enforce a certain inflexible structural vision that just isn’t appropriate for every program or programmer.

For my part, I don’t agree with Wirth and so many others that BASIC automatically ruins every programmer who comes into contact with it; people are more flexible than that, I think. And I see a bit of both sides of the Pascal argument, finding myself alternately awed by its structural rigorousness and infuriated by it every time I’ve dabbled in the language. Since I seem to be fond of music analogies today: Pascal will let you write a beautiful programming symphony, but it won’t let you swing or improvise. Still, when compared to a typical BASIC listing or, God forbid, an assembly-language program, Pascal’s clarity is enchanting. Considering the alternatives, which mostly consisted of BASIC, assembly, and (on some platforms) creaky old FORTRAN, it’s not hard to see why Byte and many others in the early PC world saw it as the next big thing, a possible successor to BASIC as the lingua franca of the microcomputer world. Here’s the heart of a roulette game implemented in Pascal, taken from another article in that August 1978 issue:

begin 
     askhowmany  (players); 
     for  player :  =  1  to players do 
          getname  (player ,  playerlist) ; 
     askif (yes); 
     if  yes  then  printinstructions; 
     playersleft : =  true ; 
     while  playersleft do 
          begin  
          for  player :  =  1  to players do 
          repeat 
               getbet (player,  playerlist);
               scanbet (player, playerlist); 
               checkbet  (player, playerlist, valid);
          until valid; 
          determine (winningnumber); 
          for  player : =  1 to  players do 
               begin  
               if  quit (player, playerlist) 
                    then  processquit  (player, playerlist, players, playersleft); 
               if  pass  (player, playerlist) 
                    then  processpass (player, playerlist); 
               if  bet  (player , playerlist) 
                    then  processbet  (player, playerlist, winningnumber)
               end
     end  
end.

The ideal of Wirth was to create a programming language capable of supporting self-commenting code: code so clean and readable that comments became superfluous, that the code itself was little more difficult to follow than a simple textual description of the program’s logic. He perhaps didn’t quite get there, but the program above is nevertheless surprisingly understandable even if you’ve never seen Pascal before. Just to make it clear, here’s the pseudocode summary which the code extract above used as its model:

Begin program. 
     Ask how many  players. 
     For  as many players as there are, 
          Get each player's name. 
     Ask if instructions are needed. 
     If  yes, output  the  instructions. 
     While there are still any players left, 
          For as many  players as there are, 
               Repeat until a valid bet is obtained: 
                    Get the player's bet. 
                         Scan the bet. 
                         Check bet for validity. 
          Determine the winning number. 
          For as many players as there are, 
               If player quit, process  the quit. 
               If  player passed , process the  pass. 
               If  player bet, 
                    Determine whether player won or lost. 
                    Process  this accordingly.
End program.

Yet Pascal’s readability and by extension maintainability was only part of the reason that Byte was so excited. We’ll look at the other next time… and yes, this tangent will eventually lead us back to games.

 
 

Tags: ,