Historical Background of Spreadsheets

by Christopher Browne

1. How Spreadsheets Came to Us

Based on the history of personal computers, spreadsheets may be argued to be the most important application area on personal computers.

What is not in question is that personal computers entered business as a direct result of the advent of spreadsheet software.

Businesses bought Apple II computers because they wanted to use VisiCalc. (Which, interestingly to operating system aficionados, was originally developed on the Multics platform, according to VisiCalc coauthor Bob Frankston. Yet another argument for the notion that virtually all important modern computing innovations took place 20 years ago on the Multics platform...) More details may be found at VisiCalc: Information from its creators, Dan Bricklin and Bob Frankston.

Thanks are in order to Bob Frankston for some corrections he has provided to this history.

When Lotus 123 became available for IBM PCs, the cycle continued, and IBM PC sales took off. Then Microsoft got into the picture, and things have unfortunately degraded considerably...

Here is another Brief History of Spreadsheets. The straight facts are pretty accurate; I'll just take a bit of issue with a couple points that represent ``editorial opinion'' that we can probably agree to disagree over...

The authors suggested that Lotus 123's ``A1'' referencing system was ``more intuitive'' than the ``R1C1'' system used by various other spreadsheets, notably including Microsoft's Multiplan.

I suggest instead that neither is particularly more intuitive than the other; ``A1'' simply happens to be shorter.

	The A1 notation originated in VisiCalc not 1-2-3. The reasoning for it was simple: it was simple to type. Reversing standard notation, we used the letters for the columns because there were fewer columns than rows.
-- Bob Frankston
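
To make the comparison concrete, here is a minimal sketch of translating between the two notations. The function names are my own inventions for illustration, and it handles only plain absolute references (no ranges, no relative ``R[1]C[1]'' forms). The point it illustrates is that the two systems carry exactly the same information; ``A1'' is merely shorter.

```python
import re


def column_letters_to_number(letters):
    """Convert column letters (A, B, ..., Z, AA, ...) to a 1-based column number."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_number_to_letters(number):
    """Convert a 1-based column number back to column letters."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_to_r1c1(ref):
    """Translate an absolute A1-style reference such as 'B3' into 'R3C2'."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, row = match.groups()
    return f"R{row}C{column_letters_to_number(letters)}"


def r1c1_to_a1(ref):
    """Translate an absolute R1C1-style reference such as 'R3C2' into 'B3'."""
    match = re.fullmatch(r"[Rr]([1-9][0-9]*)[Cc]([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an R1C1-style reference: {ref!r}")
    row, column = match.groups()
    return f"{column_number_to_letters(int(column))}{row}"


print(a1_to_r1c1("B3"))       # R3C2
print(r1c1_to_a1("R10C27"))   # AA10
```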

They stated that: The spreadsheet instantly became easier to use than the archaic interface of PC-DOS products...

The text-based user interfaces were hardly ``archaic'' at the time; they were as up to date at the time as anything could be. It is indeed fairly convenient to select ``blocks'' using a mouse, and it is fair to claim this is easier to learn than the (quite functional) keyboard-based ways for doing this that were and likely still are faster. (Except with Excel, where the keyboard interface appears to have been made deliberately arcane, but I digress...)

Spreadsheets have provided for the large-scale deployment of Cellular Automata, without most of their users having any conscious awareness of this. The FAQ on Cellular Automata defines Cellular Automata thus:

A cellular automaton is a discrete dynamical system. Space, time, and the states of the system are discrete. Each point in a regular spatial lattice, called a cell, can have any one of a finite number of states. The states of the cells in the lattice are updated according to a local rule. That is, the state of a cell at a given time depends only on its own state one time step previously, and the states of its nearby neighbors at the previous time step. All cells on the lattice are updated synchronously. Thus the state of the entire lattice advances in discrete time steps.

Spreadsheets satisfy these requirements, with a few "bits of fuzziness," notably that:

"Nearby neighbors" don't have to be terribly nearby with a spreadsheet.
There are some questions of synchronicity.

Spreadsheet packages have often provided some configurability to indicate update policies that are not synchronous, but which rather define some ordering of updates.
Pure CA systems tend to apply a single update rule to many cells in the lattice; in contrast, the spreadsheet software in common use attaches individual formulae to each and every cell.
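
The synchronicity question can be sketched in a few lines. What follows is purely illustrative (the function names and the choice of rule are mine): a one-dimensional automaton on a ring of cells, updated once synchronously, as a pure CA would be, and once in left-to-right order, roughly the way a spreadsheet recalculating in a fixed row or column order might behave. The same local rule gives different results under the two policies.

```python
def step_synchronous(cells, rule):
    """Pure CA update: every cell reads only the previous generation."""
    n = len(cells)
    return [
        rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
        for i in range(n)
    ]


def step_in_order(cells, rule):
    """Ordered update: cells are recomputed left to right, so each cell
    sees the already-updated value of its left neighbour."""
    n = len(cells)
    updated = list(cells)
    for i in range(n):
        updated[i] = rule(updated[(i - 1) % n], updated[i], updated[(i + 1) % n])
    return updated


def xor_of_neighbours(left, centre, right):
    """A single local rule applied to every cell: XOR of the two neighbours."""
    return left ^ right


start = [0, 0, 0, 1, 0, 0, 0]
print(step_synchronous(start, xor_of_neighbours))  # [0, 0, 1, 0, 1, 0, 0]
print(step_in_order(start, xor_of_neighbours))     # [0, 0, 1, 1, 1, 1, 1]
```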

Improv and related packages would apply rules to different regions of cells; it seems to me that a more intelligent use of spreadsheets should involve some sort of "pattern matching" to help discover such rules for the user.

ZigZag seeks to transform things from the "globally Euclidean" space used by spreadsheets to that of "locally euclidean" directions that more resemble the hyperlinking of Nelson's Xanadu.

I've been using spreadsheets of various sorts on a wide variety of platforms since the mid-'80s. The first one which I used any substantial amount was SuperCalc, running under ZR-DOS (an enhanced CP/M ``clone'' that grew into ZCPR). Others have included (in rough chronological order):

Lotus 123
Lotus Symphony, which integrated a word processor of dubious functionality, a simple ``form'' processor, and a telecom module into the mix;
A 6502 Atari 8-bit based spreadsheet called SpeedCalc that was published in Compute! Magazine;
LDW Power, a Lotus 123 ``clone'' for the Atari ST;
Borland's Quattro Pro, used in grad school primarily for its linear programming capabilities;
The spreadsheet built into the TRS-80 Model 100 portable computer (a very small, early version of Microsoft Multiplan that eventually grew into the monster now known as Excel)
As Easy As (guess who they were cloning?);

MS-DOS-based and available as shareware, this is still a featureful spreadsheet. It doesn't allow heavy-duty GUI formatting of spreadsheets, but only people with far too much time on their hands do so...
Lucid 3D;

The first version of Lucid ran on TRS-80 Model 100 laptop computers, and the necessity for frugality on that platform resulted in a design that was sparse, frugal, and indeed, extremely lucid.

The MS-DOS version was my favorite MS-DOS program of any sort; its user interface is a wonderful model of integration of powerful use of both keyboard and mouse; the program was still frugal in its use of disk/RAM and yet provided excellent overall functionality.
SC ("Spreadsheet Calculator");
VC (Enhanced, "more visual" version of SC);
The much despised memory hog, Microsoft Excel;
The spreadsheet built into the Psion 3 handheld computer;
Angoss SmartWare
Xess;
Wingz
Teapot!

with bits of playing around with others such as Lotus Improv, the Microsoft Works spreadsheet for MS-DOS, the FSF's Oleo, an entirely-custom one that I wrote in LISP (mostly just as a programming exercise), and sundry fiddling around with a pretty wide variety of MS-DOS, MS-Windows, and Unix-based ``integrated packages.'' The most notable spreadsheet of which I have never made significant use is VisiCalc, the program that popularized the whole idea of software that allows interactive entry of numbers, text, and formulae arranged in rows and columns.

Tasks I've done with spreadsheets have included:

Preparation of accounting working papers
Preparation of financial statements
Economic simulations via difference equations/"cellular automata"
Mathematical modelling, solving linear and nonlinear programs
Loan modelling
Database conversions
Statistical analysis

In short, I've done enough work of enough various types with a large enough variety of spreadsheet packages that I figure I'm entitled to rant a little bit about their proper use.

Here are some further useful resources on the ancient and more modern history of spreadsheets:

The newsgroup comp.apps.spreadsheets has a FAQ.
CS 130.12 - Spreadsheets - History and Introduction
The Spreadsheet Page

2. Problems with Modern Spreadsheet Developments

``Enhancements'' of spreadsheets over the last few years have not involved any substantive improvements in functionality, but have primarily just involved enhancing their ``typesetting'' capabilities, that is, the ability to change fonts, insert special formatting, and to otherwise make tables look ``pretty.''

I put ``enhancements'' in quotes because I am skeptical that this actually represents a true improvement of either the quality of the information or user efficiency in finding and using information.

These so-called improvements gloss over the continuing problems that plague spreadsheet users:

Spreadsheet models encourage the use of ``spaghetti'' logic, where cells point to cells that point to cells, and can grow into random networks of calculation logic;
They permit lots of easy off-by-one errors;
They generally are difficult to verify/audit;
They do not provide good tools for managing data either in terms of consolidation or searching for specific detail;
Perhaps most importantly, despite their convenience, spreadsheets are not a robust repository for information.

I have seen one multinational enterprise that (believe it or not) built a budgeting system atop sets of dozens of departmental spreadsheets that they would roll up into a master budget; while it's a neat extension of the technology, only a fool would try to use this to run a large enterprise. One bad link in one subsheet, and the whole house of cards could fall down. (And the ``top'' vendor these days, Microsoft, isn't noted for building products that are of industrial grade robustness.)

The last few points point towards where I would like to see spreadsheets go. They have been, and are, very good at producing ad-hoc, one-off reports. This is a proper use of spreadsheets.

They are often being used instead as repositories for information that really ought to be managed by a database management system of some sort.

What spreadsheets should do is to allow, nay encourage, the use of data extracts from external sources, notably relational databases. The use of named ranges (which are a venerable feature from at least as early as Lotus 123 v2.01) is of assistance; Lotus Improv was a rather complex-to-use test platform for improved "modelling" whose functionality included database extraction.

Using external repositories permits the benefits of:

A single repository that can be kept correct, rather than a multitude of mutually incompatible data stores;
Data synchronization (a restatement of the last);
All the good RDBMS "stuff" like:
- Field validation,
- Maintaining field relationships,
- Transaction logging,
- Centralized backups,
and perhaps even more sophisticated things such as
- Data modelling and
- Stored Procedures/Triggers

In effect, the real point I would propose is that the task of building a spreadsheet should involve some data modelling, with thought not just about the report at hand, but also about where the data comes from and perhaps should go to.
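
As a rough sketch of what I mean, the following uses Python's standard sqlite3 module as a stand-in for the external repository. The table, the column names, and the notion of a ``named_ranges'' dictionary are all assumptions made up for the example; no particular spreadsheet package works this way. The data lives in one validated place, and the sheet merely receives an extract.

```python
import sqlite3


def build_example_database():
    """Create an in-memory database standing in for the external repository."""
    connection = sqlite3.connect(":memory:")
    # Field validation lives in the repository, not in the sheet.
    connection.execute(
        "CREATE TABLE sales ("
        " country TEXT NOT NULL,"
        " month TEXT NOT NULL,"
        " amount REAL NOT NULL CHECK (amount >= 0))"
    )
    connection.executemany(
        "INSERT INTO sales (country, month, amount) VALUES (?, ?, ?)",
        [
            ("Canada", "2001-01", 120.0),
            ("Canada", "2001-02", 80.0),
            ("France", "2001-01", 200.0),
        ],
    )
    connection.commit()
    return connection


def extract_range(connection, query, parameters=()):
    """Run a query and return the rows, ready to be placed in a named range."""
    return connection.execute(query, parameters).fetchall()


connection = build_example_database()

# The "sheet" holds only extracts, keyed by hypothetical range names.
named_ranges = {
    "SalesByCountry": extract_range(
        connection,
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country",
    ),
}
print(named_ranges["SalesByCountry"])
```

A bad value (a negative amount, say) is rejected by the database before it can ever reach a report, which is precisely the kind of protection a free-standing spreadsheet lacks.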

2.1. Lotus Improv - An Attempt to Improve

I would suggest that what happened in the history of the developments is that, for ``political'' reasons, the advances attributable to Lotus Improv (originally developed using NeXTstep) were abandoned, and that its better model of spreadsheet construction/management was thereby lost.

Improv provided an interface that actively encouraged, nay required, the user to add additional structure to spreadsheet models.

It provided the ability to define a variety of ``categories'' to provide multidimensional analysis, as well as ``groups'' to allow the grouping of data that is not so readily decomposed.

Items

Every time some form of categorization is defined, this defines a sequence of ``items,'' whether that be a list of months (``January, February, March, ...''), a list of countries (``Canada, United States, United Kingdom, France, Germany, ...''), continents (``North America, South America, Asia, Europe, Africa, Australia, Antarctica''), or whatever.

Each ``item'' represents a row or column, and thus may contain many cells.

Categories

In a company selling things internationally, it would be unambiguously valuable to set up Country as a category, as you would certainly need to analyze data based on that. Currency exchange rates are based on countries' currencies, the set of laws that apply depends on the country, and so forth.

Periods of time, such as months, quarters, and years, also tend to unambiguously reflect a ``dimension,'' in this case that of time.

Groups

On the other hand, reports might need to group countries together into ``regions'' or ``continents,'' depending on who is looking at the data. Those groupings are likely to be fuzzier, whether we're talking about grouping several countries together to represent a Continent or Sales Region, or if there is need to have smaller regions (such as states, provinces, or counties, or shires) to decompose the activities within a country.

In both cases, it would be fairly appropriate to define a less-structured ``group'' that does not add an extra dimension to the hierarchy, and thereby to the complexity of the data model. Thus, a set of related items is collected together to represent a ``Group.''

Formulae

The behaviour of formulae in Improv is exceedingly different from that of traditional spreadsheets.

In a traditional spreadsheet, a formula is associated with a cell, and in order to have a particular formula apply to many cells, you must copy the formula into that range of cells.

In Improv, on the other hand, formulae are not associated with cells, but are "first class" objects associated with the spreadsheet, and rather than representing a mere single cell, are applied to an entire range/group of cells, and may thereby operate on items, groups, and categories.

Instead of operating on cryptic ``cell ranges,'' they operate on named ranges, and thereby tend to be more readable than the traditional spreadsheet alternative.
Improv formulae almost always represent vector operations, providing values for multiple cells at once. Thus, a formula that computes monthly totals across category ``Fruit'' might look like Total = SUM(Fruit). The crosscheck formula, computing annual totals for each variety of fruit, might be Annual = SUM(Months).

This example displays the ``overlap'' issue; the pair of formulae overlap in the cell that contains the total for all fruit for the whole year. Improv discloses this overlap, and allows formulae to be placed into a priority order where ``later'' formulae override ``earlier'' formulae.
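
Here is a small sketch of the idea; it is not Improv's engine or syntax, and every name in it is hypothetical. Formulae are objects attached to the sheet, each covering a whole set of cells; the overlapping cell is reported, and the formula that appears later in the list wins.

```python
fruits = ["Apples", "Pears"]
months = ["Jan", "Feb", "Mar"]

# Raw data, keyed by (item of category Fruit, item of category Months).
data = {
    ("Apples", "Jan"): 10, ("Apples", "Feb"): 12, ("Apples", "Mar"): 9,
    ("Pears", "Jan"): 4, ("Pears", "Feb"): 6, ("Pears", "Mar"): 5,
}


def total_over_fruit(get, row, column):
    """Total = SUM(Fruit): sum down the fruit items of one column."""
    return sum(get((fruit, column)) for fruit in fruits)


def annual_over_months(get, row, column):
    """Annual = SUM(Months): sum across the month items of one row."""
    return sum(get((row, month)) for month in months)


# Each formula: (name, cells it covers, function). Later entries take priority.
formulae = [
    ("Total = SUM(Fruit)",
     [("Total", column) for column in months + ["Annual"]], total_over_fruit),
    ("Annual = SUM(Months)",
     [(row, "Annual") for row in fruits + ["Total"]], annual_over_months),
]


def evaluate(data, formulae):
    """Return (computed values, overlaps) for the formula-covered cells."""
    claims = {}
    for name, cells, function in formulae:
        for cell in cells:
            claims.setdefault(cell, []).append((name, function))
    cache = dict(data)

    def get(cell):
        if cell not in cache:
            _name, function = claims[cell][-1]  # the later formula overrides
            cache[cell] = function(get, *cell)
        return cache[cell]

    values = {cell: get(cell) for cell in claims}
    overlaps = {cell: [name for name, _ in owners]
                for cell, owners in claims.items() if len(owners) > 1}
    return values, overlaps


values, overlaps = evaluate(data, formulae)
print(values[("Total", "Annual")])  # 46
print(overlaps)
```

Two formulae describe the whole sheet, where a traditional spreadsheet would need one copied formula per total cell.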

2.1.1. Links Related to Improv

Here are links to historical information about Improv as well as about other packages that might be considered to be ``successors.''

Story of Improv
The Story of Improv versus PowerStep (another NeXTstep spreadsheet)
Quantrix - Would-be Improv Successor
Advance Planning Solutions: Advance had a spreadsheet-like system rather like Improv.

The company was acquired by PeopleSoft in 2000, so parts of this may have been integrated into their applications, but it is not likely still available as a separate product.
The following observation came to me via email:

... It's worth noting that Improv flattered an earlier program, Javelin (by a company of the same name located in Cambridge MA).

-- Bob Frankston

I was aware of the release of Javelin; it was an MS-DOS-based software package that had an unfortunately-brief flash of fame.

See also the web page of one of Javelin's authors, John R. Levine, as well as Probert Encyclopaedia on Javelin.

It appears that Javelin may have been an early victim of Microsoft Predatory Marketing; an InfoWorld article indicates that when Javelin won the InfoWorld ``Product of the Year'' award, beating out Excel, Bill Gates ``got up and stomped out of the room in front of everybody in a spectacularly rude manner.''

Another report suggests that they got overambitious, planning to try to dominate Lotus 123 when they really needed to grow their niche, and were then "done in" by bad timing on an IPO, scheduled just a week after the market crash of October 1987.

Other comments suggest that the failure had to do with the software being difficult to use. It was powerful, but business GUI software was in its infancy at the time, and the implication is that Javelin did not have a sufficiently "user-friendly" interface to permit widespread adoption.

These may all have been contributing factors.

You may still be able to get copies of Javelin; I'm told that it was bought out by a "venture capital" group; they integrated it into some data retrieval tools, and you may be able to get a copy of World Bank Indicators - World*Data 1995 which included a copy of Javelin.
Lotus Support File Library for Improv

2.2. Recent Spreadsheet Research: Model Master

A paper was recently presented on a system called Model Master. I have excerpted the following:

Spreadsheet models can be difficult to read and maintain. Spreadsheets provide few facilities for documentation, and although the structure of a spreadsheet program is implicit in the cell equations, it cannot be made explicit as it would if programmed in a conventional programming language. To make spreadsheets easier to use, we are developing Model Master (MM), a compiler that generates spreadsheet equations from textual specifications of models.

An MM program consists of one or more object specifications. To specify single objects, the user describes their attributes or properties, together with equations stating how these depend on one another's present and past values. To specify a complete model, the user describes how these objects are to be connected together, by writing extra equations that say how their attributes are interrelated. MM compiles these specifications into cell equations. It automatically allocates attributes to columns and time points to rows: the user can override these allocations, but will not usually need to do so.

MM is based on a new programming paradigm, System Limit Programming, also used in the development of the Web authoring tool Web-O-Matic.

Further information on Model Master: an object-oriented spreadsheet front end

--Jocelyn Paine

Furthermore, the paper references some of the preliminary research that has been done on the issue of the correctness of spreadsheets. Because spreadsheets are highly dependent on user input, spreadsheet tools suffer from several serious vulnerabilities. The use of a tool like Model Master to construct a spreadsheet allows conscious validation of more of the spreadsheet model, which can't but be helpful.
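
To give a feel for what ``compiling a specification into cell equations'' might involve, here is a toy sketch. It is emphatically not Model Master's language or implementation: the ``name[t]'' and ``name[t-1]'' notation, the function, and the example model are all invented for illustration. Like the system described in the excerpt, it allocates attributes to columns and time points to rows.

```python
import re

# Matches hypothetical tokens such as balance[t] or balance[t-1].
TOKEN = re.compile(r"([A-Za-z_]+)\[t(-1)?\]")


def compile_model(attributes, equations, initial, periods):
    """Return {cell reference: value or formula text} for the model.

    attributes: ordered attribute names, one spreadsheet column each
                (this sketch supports at most 26 of them).
    equations:  attribute -> equation over name[t] / name[t-1] tokens.
    initial:    attribute -> starting value used in the first period.
    periods:    number of time points, one spreadsheet row each.
    """
    columns = {name: chr(ord("A") + i) for i, name in enumerate(attributes)}
    cells = {f"{columns[name]}1": name for name in attributes}  # header row

    for t in range(periods):
        row = t + 2  # row 1 holds the headers
        for name in attributes:
            target = f"{columns[name]}{row}"
            if t == 0 and name in initial:
                cells[target] = initial[name]
                continue

            def to_cell(match, row=row):
                # name[t] is this row; name[t-1] is the row above.
                lag = 1 if match.group(2) else 0
                return f"{columns[match.group(1)]}{row - lag}"

            cells[target] = "=" + TOKEN.sub(to_cell, equations[name])
    return cells


cells = compile_model(
    attributes=["balance", "interest"],
    equations={
        "balance": "balance[t-1] + interest[t-1]",
        "interest": "balance[t] * 0.05",
    },
    initial={"balance": 1000},
    periods=3,
)
for reference in sorted(cells):
    print(reference, cells[reference])
```

Two equations stand in for six cells here, and the ratio only improves as the number of periods grows. Note that the sketch does no checking: an attribute that refers to its previous period without an initial value would end up pointing at the header row.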

There have been other presentations on Model Master, including Ensuring Spreadsheet Integrity with Model Master, presented to the European Spreadsheet Risks Interest Group. More recently, Model Master has been augmented to include a decompiler so that a spreadsheet may be turned into a concise set of equation specifications. Several interesting things pop out of that:

The Model Master program may be a more attractive interchange format than raw spreadsheets themselves.

This has the various merits that:
- What is transferred is effectively a description of the spreadsheet model; that may be more usefully readable than the spreadsheet itself;
- It is likely to be more compact than the "binary dumps" that commercial spreadsheets generate.
- The model program can't contain the "macro viruses" that MS Office has been plagued with of late.
- Any "nefarious" calculations will be visibly described in the model's text.
  
  For instance, suppose I were to have a special formula for the line calculating my payroll amounts, that would show up.
The decompiler can readily pick up on which calculations are being run "hard-coded," and which are using formulae.
The "tough part," at this point, which warrants considerable additional research, is the notion of doing some searching to find repeated patterns of formulae.

For instance, it is very common for there to be a column of cells that computes some sort of "cost," by multiplying a quantity cell by a price cell, perhaps adding in taxes or other costs, coming up with a "total cost."

It would be very valuable to recognize the repetition of that formula, and essentially present the formula only once in the model. Note that this is exactly the way Improv treated formulae; they were not defined merely for a cell, but rather for a whole range of cells.

In a traditional spreadsheet, the repetition is done by hand, which is one of the major areas where modelling errors creep in. By "pattern searching," such errors may both be found (when decompiling) and avoided altogether (when compiling).
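
One simple way such a search might work (a sketch of my own, not a description of how Model Master's decompiler does it) is to rewrite every cell reference relative to the cell holding the formula, in the manner of R1C1 notation, and then group cells whose rewritten formulae are identical. A lone cell whose pattern differs from its neighbours is a candidate error.

```python
import re
from collections import defaultdict

# Plain A1-style references only; absolute ($A$1) references and function
# names that end in digits are not handled by this sketch.
REFERENCE = re.compile(r"\b([A-Z]+)([0-9]+)\b")


def column_number(letters):
    """Convert column letters (A, B, ..., AA, ...) to a 1-based number."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def normalize(cell, formula):
    """Rewrite the references in `formula` as offsets from `cell`."""
    home = REFERENCE.fullmatch(cell)
    home_column, home_row = column_number(home.group(1)), int(home.group(2))

    def relative(match):
        column_offset = column_number(match.group(1)) - home_column
        row_offset = int(match.group(2)) - home_row
        return f"R[{row_offset}]C[{column_offset}]"

    return REFERENCE.sub(relative, formula)


def group_by_pattern(formulas):
    """Map each normalized formula to the list of cells that use it."""
    groups = defaultdict(list)
    for cell, formula in formulas.items():
        groups[normalize(cell, formula)].append(cell)
    return dict(groups)


# A "total cost" column where one row was mistyped by hand.
sheet = {
    "C2": "=A2*B2",
    "C3": "=A3*B3",
    "C4": "=A4*B3",  # should have been =A4*B4
    "C5": "=A5*B5",
}
for pattern, members in group_by_pattern(sheet).items():
    print(pattern, members)
```

The three correct rows collapse into one pattern that can be presented once, and the mistyped row stands out on its own.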

Further development has built a web-based front end for Model Master, The Spreadsheet Autopublisher

Here's a Page with Excel-versus-Access Spreadsheet-vs-DB test. Worth looking at.

2.3. Idle Thoughts About Fundamental Improvements

A number of "dead ends" have been encountered in the ongoing development of spreadsheets.

The "traditional" spreadsheet systems went through a process of "racing for successive refinements" through the late 1980s and early 1990s, largely seeking to add "feature points" to win the contests for "most features counted in the reviews."

Unfortunately, this means that the current code bases are not terribly amenable to more radical evolution, and many of the "improvements" are merely cosmetic.

The major problem with the "traditional" spreadsheet system is that it does not provide much in the way of "structuring tools" to recognize and enforce the structure of the data model, as described in Problems with Modern Spreadsheet Developments.
Improv proposed better ways of building models, and essentially mandated constructing spreadsheets as a process of constructing a system model.

This had the unfortunate effect of preventing the "free form" spreadsheet construction that traditional spreadsheets encouraged.
Model Master provides considerable power in defining models, provides all sorts of "strong typing" options, and provides the logical extension of having the language specify access to robust data sources like relational databases, but has two substantial demerits:
- It altogether rules out "free form" construction of data into sheets
  
  Although this is changing, as construction of a "decompiler" is underway.
- It mandates using a declarative programming language to describe the model.
  
  The implicit "programmability" of the cellular automata means that the average user doesn't need to know about programming; unfortunately, Model Master pushes programming in their face.

It seems to me that a "step forward" is to try to take the merits of each of these approaches, whilst seeking to avoid the demerits.

Robert Monfera <monfera@fisec.com> pointed out to me what he described as "uniform structural unification;" basically the notion of taking spreadsheets in the traditional "free form" pioneered with VisiCalc, and then, rather than starting by trying to enforce structure (as was the case with Improv), instead searching for structure.

It's not a mechanism of infinite analytical power; it likely will only be helpful to find some limited bits of structure. Of course, "limited" may still be sufficient to actually provide some useful added functionality to relatively unsophisticated users, and forcing people to start from a data modelling perspective, as with Improv, hasn't proven too terribly popular.

2.3.1. A Possible Architecture

2.3.1.1. Start Out With a Free-Form Sheet

The strength of the traditional spreadsheet is in providing a "free form" medium where users may construct models without directly having to program.

So, we start with a front end that is a very "traditional" sort of spreadsheet. Rows, columns, cells, formulae.

2.3.1.2. Attaching "Rules" Via Pattern Wizards

It would be nice to get the benefits of Model Master, in providing the ability to attach fairly strong "rules" to portions of the spreadsheet, whether enforcing the use of common formulae or in enforcing "strong typing" of the data types used in those regions. For instance, a region that represents "dates" should contain dates, and only dates.

The route to this is to use some "artificial intelligence-like" techniques to search for patterns in the data, and to write up rules to propose to the user. I will call these "Pattern Wizards."

This could include patterns such as the following:

Detecting sequences of cell contents that look, for instance, like dates.

The proposal would then offer to:
- Give the region a name indicating that it is a "date" region;
- Attach "type" information to the region, requiring that all cells contain dates;
- If a clear sequence is indicated, offer a formula that would compute the contents of the cells.
Detecting that a region contains a Price/Quantity formula.

For instance, there may be a column that multiplies the contents of a cell in another column of the current row (indicating quantity?) by either a specific value, or by the contents of one cell somewhere in the sheet (perhaps containing a constant price?), or by the contents of a cell in another column (indicating a price for the current row?).

Such a pattern is suggestive of a price/quantity relationship, and the system could offer to:
- Establish all three regions (price, quantity, total cost) as having names;
- Attach "type" information to all three regions;
- Name and attach the single formula to all of the formulae cells.
Detecting a "running balance" formula.

If a sequence of cells each add together the "cell above" with "values to the left," this looks like a running balance.

A similar set of proposals could be generated, to attach "type" information, to attach names, and to attach the single formula across all the "balance" cells.
In order to provide goodly flexibility in offering "abstractions" that make for convenient formulae, the system would need to allow construction of "user-defined" functions.

My preference would be for this to be a dynamic language such as Lisp; a critical factor is for the language to be quite functional, where cells receive one value that is solely based on the input parameters. Other interesting alternatives would include:
- ML, which makes major use of the notion of strong static typing; this would address the issue of the way cells can contain different "types" of data, and prevent many classes of runtime errors.
- Haskell, characterized by being purely functional, and providing lazy evaluation.
  
  Lazy evaluation is a particularly useful notion for a spreadsheet, as it allows deferring calculations until they are actually needed.
- It would be valuable for the "extension language" to be the language in which the "pattern wizards" described above are constructed so that it is possible to augment the patterns without a need to deploy a whole new system.
  
  The point here is not to expect users to write their own patterns, as most won't be able to cope with this. The "average user" is not going to be writing "Pattern Wizards," but will rather receive "canned" ones.
  
  However, an organization might hire a programmer who looks around to find the organization's "favorite patterns," and creates Wizards to detect them.
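
To illustrate how modest a "canned" Pattern Wizard could be, here is a sketch of the date-detecting one. I use Python purely for illustration (it is not one of the languages suggested above), and the function name, the proposal format, and the assumption that cells hold ISO-style "YYYY-MM-DD" text are all mine.

```python
from datetime import date


def propose_date_rule(cells):
    """Inspect a region's cell texts and return a proposal, or None.

    The proposal offers a name and a type for the region and, when the
    dates advance by a constant step, a formula that could generate them.
    """
    if len(cells) < 3:
        return None  # too little evidence to call it a pattern
    try:
        dates = [date.fromisoformat(text) for text in cells]
    except ValueError:
        return None  # at least one cell does not look like a date

    proposal = {"name": "DateRegion", "type": "date", "formula": None}
    steps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}
    if len(steps) == 1:
        proposal["formula"] = f"previous cell + {steps.pop()} days"
    return proposal


print(propose_date_rule(["2001-01-01", "2001-01-08", "2001-01-15"]))
print(propose_date_rule(["2001-01-01", "apples", "2001-01-15"]))
```

The wizard only proposes; it is up to the user to accept the name, the type constraint, or the generating formula.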

2.3.1.3. Attaching Additional Attributes to Regions

Note that the "pattern wizards" are given the ability to decide that certain regions of the spreadsheet are to have certain names, and are to use common types/formats/formulae.

A logical extension to this would be to allow attaching database tables to regions, so that you might have columns that look up database information based on either static SQL SELECTs, or looking up data based on other cells.

2.3.2. ZigZag: Another Possible Approach

ZigZag, an invention of Ted Nelson, is a new type of data structure. For mathematicians, the key words would be discrete, multidimensional, locally euclidean, with global directions (coordinate axes). Nonmathematicians can find an explanation in the FAQ at the GZigZag project website, but as a short and very inadequate summary, it is a spreadsheet on steroids. Or a database on acid. Or a filesystem on ... whatever.

There is a free implementation, GZigZag, written in Java using Swing.

Major properties of the system are thus:

Discrete

the information is stored in cells, kind of like a spreadsheet.

Multidimensional

instead of two dimensions, X and Y, that a spreadsheet has, a ZigZag space can have any number of dimensions which are distinguished by strings.

Locally Euclidean

a spreadsheet is globally euclidean, i.e. it is a lattice. ZigZag is only locally euclidean, so the neighborhood of (i.e. the cells next to) a given cell looks euclidean: if you go up and come down, you are back where you were before.

But if you start from location 1, and go up, left, down and right, you might not get back to where you were; let's say you are at location 2. But if you then go left, up, right and down from location 2, retracing those steps in reverse, you get back to location 1.

The connections in ZigZag are user-alterable so you can connect any two cells along any given dimension, but because of the local euclidean constraint, each cell can be connected to only one cell in the positive and one cell in the negative direction on each dimension.
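
The constraint is easy to sketch as a data structure. This is only my own illustration of the description above, not GZigZag's actual code, and the class, function, and dimension names are invented: each cell holds at most one positive and one negative neighbour per named dimension, and making a new connection displaces whatever occupied those slots.

```python
class Cell:
    """A ZigZag-like cell: at most one neighbour per direction per dimension."""

    def __init__(self, content):
        self.content = content
        self.positive = {}  # dimension name -> neighbour in the positive direction
        self.negative = {}  # dimension name -> neighbour in the negative direction


def connect(first, second, dimension):
    """Make `second` the positive neighbour of `first` along `dimension`.

    Any link previously occupying either slot is broken first, which is
    what keeps the structure locally euclidean.
    """
    displaced = first.positive.get(dimension)
    if displaced is not None:
        del displaced.negative[dimension]
    displaced = second.negative.get(dimension)
    if displaced is not None:
        del displaced.positive[dimension]
    first.positive[dimension] = second
    second.negative[dimension] = first


def step(cell, dimension, direction):
    """Move one cell along a dimension; returns None if nothing is connected."""
    links = cell.positive if direction > 0 else cell.negative
    return links.get(dimension)


def walk(cell, moves):
    """Follow a sequence of (dimension, direction) moves from `cell`."""
    for dimension, direction in moves:
        cell = step(cell, dimension, direction)
    return cell


UP, DOWN = ("vertical", +1), ("vertical", -1)
RIGHT, LEFT = ("horizontal", +1), ("horizontal", -1)

one, a, b, c, two = (Cell(name) for name in ["1", "a", "b", "c", "2"])
connect(one, a, "vertical")    # a sits above location 1
connect(b, a, "horizontal")    # b sits to the left of a
connect(c, b, "vertical")      # c sits below b
connect(c, two, "horizontal")  # location 2 sits to the right of c

print(walk(one, [UP, LEFT, DOWN, RIGHT]).content)  # 2
print(walk(two, [LEFT, UP, RIGHT, DOWN]).content)  # 1
```

In a lattice, the first walk would have ended back at location 1. Here it ends at location 2, yet undoing the same steps in reverse order always leads back to the start.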

Computer scientists might note that ZigZag is an interesting special case of graphs.

3. Crossreferences

Scott Nealy on "From Spreadsheet to Websheet"
Information on Excel's Pivot Table facility.

Pivot tables are roughly analogous to database views, and are doubtless essentially copied from the functionality of Lotus Improv.
Nathan's site on Developing Excel Sheets
Designing a spreadsheet
