MUMPS | |
Paradigm: | Imperative, procedural |
Designer: | Neil Pappalardo, Curt Marble, Robert A. Greenes |
Latest Release Version: | ANSI X11.1-1995 |
Typing: | Typeless |
Influenced By: | JOSS |
Influenced: | PSL, Caché ObjectScript, GT.M |
Operating System: | Cross-platform |
MUMPS ("Massachusetts General Hospital Utility Multi-Programming System"), or M, is an imperative, high-level programming language with an integrated transaction processing key–value database. It was originally developed at Massachusetts General Hospital for managing patient medical records and hospital laboratory information systems.
MUMPS technology has since expanded as the predominant database for health information systems and electronic health records in the United States. MUMPS-based information systems, such as Epic Systems', provide health information services for over 78% of patients across the U.S.[1]
A unique feature of the MUMPS technology is its integrated database language, allowing direct, high-speed read-write access to permanent disk storage.
MUMPS was developed by Neil Pappalardo, Robert A. Greenes, and Curt Marble in Dr. Octo Barnett's lab at the Massachusetts General Hospital (MGH) in Boston during 1966 and 1967. It grew out of frustration, during a National Institutes of Health (NIH)-supported hospital information systems project at MGH, with development in assembly language on a time-shared PDP-1 by the primary contractor, Bolt Beranek & Newman, Inc. (BBN). MUMPS came out of an internal "skunkworks" project at MGH by Pappalardo, Greenes, and Marble to create an alternative development environment. Following an initial demonstration of its capabilities, Dr. Barnett's 1967 proposal to NIH for renewal of the hospital computer project grant took the bold step of proposing that the system be built in MUMPS going forward, rather than relying on the BBN approach. The project was funded, and serious implementation of the system in MUMPS began.
The original MUMPS system was, like Unix a few years later, built on a DEC PDP-7. Octo Barnett and Neil Pappalardo obtained a backward compatible PDP-9, and began using MUMPS in the admissions cycle and laboratory test reporting. MUMPS was then an interpreted language, yet even then, incorporated a hierarchical database file system to standardize interaction with the data and abstract disk operations so they were only done by the MUMPS language itself. MUMPS was also used in its earliest days in an experimental clinical progress note entry system and a radiology report entry system.
Some aspects of MUMPS can be traced from RAND Corporation's JOSS through BBN's TELCOMP and STRINGCOMP. The MUMPS team chose to include portability between machines as a design goal.
An advanced feature of the MUMPS language not widely supported in operating systems or in computer hardware of the era was multitasking. Although time-sharing on mainframe computers was increasingly common in systems such as Multics, most mini-computers did not run parallel programs and threading was not available at all. Even on mainframes, the most common form of multi-programming in an operating system was the variant of batch processing in which each program was run to completion.
Unix would not be developed for a few more years. The lack of memory management hardware also meant that all multi-processing carried the risk that a stray memory pointer in one process could alter another process. MUMPS programs, in contrast to programs written in C, have no standard way to refer to memory directly at all. Because multitasking was enforced by the language itself, not by any program written in the language, MUMPS avoided the risk that existed on other systems.
Dan Brevik's DEC MUMPS-15 system was adapted to a DEC PDP-15, where it lived for some time. It was first installed at Health Data Management Systems of Denver in May 1971. The portability proved to be useful, and MUMPS was awarded a government research grant; as grants required, MUMPS was released to the public domain. MUMPS was soon ported to a number of other systems, including the popular DEC PDP-8, the Data General Nova, the DEC PDP-11, and the Artronix PC12 minicomputer. Word about MUMPS spread mostly through the medical community, and it came into widespread use, often being locally modified to suit users' own needs.
Versions of the MUMPS system were rewritten by technical leaders Dennis "Dan" Brevik and Paul Stylos of DEC in 1970 and 1971. By the early 1970s, there were many and varied implementations of MUMPS on a range of hardware platforms. Other noteworthy platforms were Paul Stylos' DEC MUMPS-11 on the PDP-11 and MEDITECH's MIIS. In the fall of 1972, many MUMPS users attended a conference in Boston which set out to standardize the then-fractured language, and created the MUMPS Users Group and MUMPS Development Committee (MDC) to do so. These efforts proved successful; a standard was complete by 1974 and was approved on September 15, 1977, as ANSI standard X11.1-1977. At about the same time, DEC launched DSM-11 (Digital Standard MUMPS) for the PDP-11. This quickly dominated the market and became the reference implementation of the time. InterSystems also sold ISM-11 for the PDP-11 (which was identical to DSM-11).
During the early 1980s several vendors brought MUMPS-based platforms that met the ANSI standard to market. The most significant were:
This period also saw considerable MDC activity. The second revision of the ANSI standard for MUMPS (X11.1-1984) was approved on November 15, 1984.
The chief executive of InterSystems disliked the name MUMPS and felt that it represented a serious marketing obstacle. Thus, favoring M to some extent became identified as alignment with InterSystems. The 1990 ANSI Standard was open to both M and MUMPS and after a "world-wide" discussion in 1992 the Mumps User Groups officially changed the name to M. The dispute also reflected rivalry between organizations (the M Technology Association, the MUMPS Development Committee, the ANSI and ISO Standards Committees) as to who determines the "official" name of the language.
As of 2020, the ISO still mentions both M and MUMPS as officially accepted names.[8]
Massachusetts General Hospital registered "MUMPS" as a trademark with the USPTO on November 28, 1971, and renewed it on November 16, 1992, but let it expire on August 30, 2003.[9]
MUMPS is a language intended for and designed to build database applications. Secondary language features were included to help programmers make applications using minimal computing resources. The original implementations were interpreted, though modern implementations may be fully or partially compiled. Individual "programs" run in memory "partitions". Early MUMPS memory partitions were limited to 2048 bytes so aggressive abbreviation greatly aided multi-programming on severely resource limited hardware, because more than one MUMPS job could fit into the very small memories extant in hardware at the time. The ability to provide multi-user systems was another language design feature. The word "Multi-Programming" in the acronym points to this. Even the earliest machines running MUMPS supported multiple jobs running at the same time. With the change from mini-computers to micro-computers a few years later, even a "single user PC" with a single 8-bit CPU and 16K or 64K of memory could support multiple users, who could connect to it from (non-graphical) video display terminals.
Since memory was tight originally, the language design for MUMPS valued very terse code. Thus, every MUMPS command or function name could be abbreviated to between one and three letters, e.g. Quit (exit program) as Q, $P = $Piece function, R = Read command, $TR = $Translate function. Spaces and end-of-line markers are significant in MUMPS because line scope promoted the same terse language design. Thus, a single line of program code could express, with few characters, an idea for which other programming languages could require 5 to 10 times as many characters. Abbreviation was a common feature of languages designed in this period (e.g., FOCAL-69 and early BASICs such as Tiny BASIC). An unfortunate side effect of this, coupled with the early need to write minimalist code, was that MUMPS programmers routinely did not comment code and used extensive abbreviations. This meant that even an expert MUMPS programmer could not just skim through a page of code to see its function but would have to analyze it line by line.
Database interaction is transparently built into the language. The MUMPS language provides a hierarchical database made up of persistent sparse arrays, which is implicitly "opened" for every MUMPS application. All variable names prefixed with the caret character use permanent (instead of RAM) storage, will maintain their values after the application exits, and will be visible to (and modifiable by) other running applications. Variables using this shared and permanent storage are called Globals in MUMPS, because the scoping of these variables is "globally available" to all jobs on the system. The more recent and more common use of the name "global variables" in other languages is a more limited scoping of names, coming from the fact that unscoped variables are "globally" available to any programs running in the same process, but not shared among multiple processes. The MUMPS Storage mode (i.e. globals stored as persistent sparse arrays), gives the MUMPS database the characteristics of a document-oriented database.[10]
All variable names which are not prefixed with the caret character are temporary and private. Like global variables, they also have a hierarchical storage model, but are only "locally available" to a single job, thus they are called "locals". Both "globals" and "locals" can have child nodes (called subscripts in MUMPS terminology). Subscripts are not limited to numerals: any ASCII character or group of characters can be a subscript identifier. While this is not uncommon for modern languages such as Perl or JavaScript, it was a highly unusual feature in the late 1970s. This capability was not universally implemented in MUMPS systems before the 1984 ANSI standard, as only canonically numeric subscripts were required by the standard to be allowed.[11] Thus, the variable named 'Car' can have subscripts "Door", "Steering Wheel", and "Engine", each of which can contain a value and have subscripts of their own. The variable could have a nested variable subscript of "Color", for example. Thus, you could say SET Car("Door","Color")="BLUE" to modify a nested child node of Car.
In MUMPS terms, "Color" is the 2nd subscript of the variable Car (both the names of the child nodes and the child nodes themselves are likewise called subscripts). Hierarchical variables are similar to objects with properties in many object-oriented languages. Additionally, the MUMPS language design requires that all subscripts of variables are automatically kept in sorted order. Numeric subscripts (including floating-point numbers) are stored from lowest to highest. All non-numeric subscripts are stored in alphabetical order following the numbers. In MUMPS terminology, this is canonical order. By using only non-negative integer subscripts, the MUMPS programmer can emulate the array data type from other languages. Although MUMPS does not natively offer a full set of DBMS features such as mandatory schemas, several database management systems have been built on top of it that provide application developers with flat-file, relational, and network database features.
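The storage model described above can be pictured with a short Python sketch. It is an illustration only, not how any MUMPS implementation stores data: the class `Node`, its method names, and the sample values are hypothetical, and canonical order is simplified to "Python numbers first by value, then strings in text order".

```python
def canonical_key(subscript):
    """Sort key: numbers first by value, then all other subscripts as text."""
    if isinstance(subscript, (int, float)):
        return (0, subscript, "")
    return (1, 0, str(subscript))


class Node:
    """One node of a sparse hierarchical array: an optional value plus children."""

    def __init__(self):
        self.value = None
        self.children = {}

    def set_node(self, subscripts, value):
        """Create any missing intermediate nodes, then store the value."""
        node = self
        for subscript in subscripts:
            node = node.children.setdefault(subscript, Node())
        node.value = value

    def get_node(self, subscripts):
        """Return the stored value, or None if the node is absent or valueless."""
        node = self
        for subscript in subscripts:
            node = node.children.get(subscript)
            if node is None:
                return None
        return node.value

    def subscripts(self):
        """First-level subscripts in (simplified) canonical order."""
        return sorted(self.children, key=canonical_key)


car = Node()
car.set_node(("Door", "Color"), "BLUE")   # nested child node
car.set_node(("Engine",), "V6")           # hypothetical sample value
car.set_node((10,), "ten")
car.set_node((2,), "two")
car.set_node((1.5,), "one and a half")

print(car.subscripts())                   # [1.5, 2, 10, 'Door', 'Engine']
print(car.get_node(("Door", "Color")))    # BLUE
```

Only the nodes that were assigned exist, which is what makes the array sparse; the "Door" node here has a child but no value of its own.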
Additionally, there are built-in operators which treat a delimited string (e.g., comma-separated values) as an array. Early MUMPS programmers would often store a structure of related information as a delimited string, parsing it after it was read in; this saved disk access time and offered considerable speed advantages on some hardware.
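The delimited-string technique can be modeled in a few lines of Python. This is a sketch of the idea rather than MUMPS code: the functions `piece` and `set_piece` are hypothetical stand-ins for reading and assigning a 1-based piece of a string, and the sample record is invented for illustration.

```python
def piece(text, delimiter, n):
    """Return the n-th (1-based) delimited piece, or "" if it does not exist."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def set_piece(text, delimiter, n, value):
    """Return a copy of text with the n-th piece replaced, padding if needed."""
    if n < 1:
        return text
    parts = text.split(delimiter)
    while len(parts) < n:
        parts.append("")          # missing pieces become empty strings
    parts[n - 1] = value
    return delimiter.join(parts)


record = "DOE^JANE^1970"          # hypothetical record: surname^given name^year
print(piece(record, "^", 2))      # JANE
print(set_piece(record, "^", 5, "BOSTON"))   # DOE^JANE^1970^^BOSTON
```

Keeping a whole record in one string means a single read fetches every field, which is the disk-access saving the paragraph above refers to.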
MUMPS has no data types. Numbers can be treated as strings of digits, or strings can be treated as numbers by numeric operators (coerced, in MUMPS terminology). Coercion can have some odd side effects, however. For example, when a string is coerced, the parser turns as much of the string (starting from the left) into a number as it can, then discards the rest. Thus the statement IF 20<"30 DUCKS" is evaluated as TRUE in MUMPS.
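The coercion rule can be approximated in Python as follows. This is a simplified model, not the standard's full numeric-interpretation algorithm: the function name `to_number` is hypothetical, and exponent notation is deliberately left out.

```python
import re

# Leading signs, then digits with an optional decimal part (no exponent support).
_NUMERIC_PREFIX = re.compile(r"[+-]*(\d+\.?\d*|\.\d+)")


def to_number(value):
    """Use the longest leading numeric prefix; anything else coerces to 0."""
    match = _NUMERIC_PREFIX.match(str(value))
    if match is None:
        return 0
    digits = match.group(1)
    signs = match.group(0)[: len(match.group(0)) - len(digits)]
    magnitude = float(digits)
    if signs.count("-") % 2 == 1:     # an odd number of minus signs negates
        magnitude = -magnitude
    return int(magnitude) if magnitude == int(magnitude) else magnitude


print(to_number("30 DUCKS"))          # 30
print(20 < to_number("30 DUCKS"))     # True
print(to_number("DUCKS"))             # 0
```

A string with no leading digits coerces to zero, so a comparison such as 20 against "DUCKS" would come out false under the same rule.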
Other features of the language are intended to help MUMPS applications interact with each other in a multi-user environment. Database locks, process identifiers, and atomicity of database update transactions are all required of standard MUMPS implementations.
In contrast to languages in the C or Wirth traditions, some space characters between MUMPS statements are significant. A single space separates a command from its argument, and a space, or newline, separates each argument from the next MUMPS token. Commands which take no arguments (e.g., ELSE) require two following spaces. The concept is that one space separates the command from the (nonexistent) argument, and the next separates the "argument" from the next command. Newlines are also significant; an IF, ELSE or FOR command processes (or skips) everything else until the end of the line. To make those statements control multiple lines, you must use the DO command to create a code block.
A simple "Hello, World!" program in MUMPS might consist of the label hello() on the first line, followed by the indented commands write "Hello, World!",! and quit on the next two lines. It would be run with the command do ^hello after it has been saved to disk. For direct execution of the code, a kind of "label" (any alphanumeric string) in the first position of the program line is needed to tell the MUMPS interpreter where to start execution. Since MUMPS allows commands to be strung together on the same line, and since commands can be abbreviated to a single letter, this routine could be made more compact: hello() w "Hello, World!",! q. The ',!' after the text generates a newline. This code would return to the prompt.
ANSI X11.1-1995 gives a complete, formal description of the language; an annotated version of this standard is available online.[12]
Language features include:
Booleans: a<b yields 1 if a is less than b, 0 otherwise.
Postconditionals: SET:N<10 A="FOO" sets A to "FOO" if N is less than 10; DO:N>100 PRINTERR performs PRINTERR if N is greater than 100. This construct provides a conditional whose scope is less than a full line.
Abbreviation: MUMPS code can be made more obfuscated by using the contracted command syntax; for example, SET:N<10 A="FOO" can be written S:N<10 A="FOO".
Global arrays: array names that begin with a caret, such as ^abc and ^def. These are stored on disk, are available to all processes, and persist when the creating process terminates. Very large globals (for example, hundreds of gigabytes) are practical and efficient in most implementations. This is MUMPS' main "database" mechanism. It is used instead of calling on the operating system to create, write, and read files.
Indirection: in many contexts, @VBL can be used, and effectively substitutes the contents of VBL into another MUMPS statement. SET XYZ="ABC" SET @XYZ=123 sets the variable ABC to 123. SET SUBROU="REPORT" DO @SUBROU performs the subroutine named REPORT. This substitution allows for lazy evaluation and late binding, and is effectively the operational equivalent of "pointers" in other languages.
Piece function: $PIECE(STRINGVAR,"^",3) means the "third caret-separated piece of STRINGVAR." The piece function can also appear as an assignment (SET command) target. $PIECE("world.std.com",".",2) yields std. After X has been set to an address of the form "name@world.std.com", SET $P(X,"@",1)="office" causes X to become "office@world.std.com" (note that $P is equivalent to $PIECE and could be written as such).
Order function: for iterating the database, the Order function returns the next key to use. Given a local array stuff with subscripts 6, 10, and 15, $Order(stuff("")) yields 6, $Order(stuff(6)) yields 10, $Order(stuff(8)) yields 10, $Order(stuff(10)) yields 15, and $Order(stuff(15)) yields "". A typical loop is Set i="" For  Set i=$Order(stuff(i)) Quit:i=""  Write !,i,?10,stuff(i). Here, the argumentless For repeats until stopped by a terminating Quit. This line prints a table of i and stuff(i) where i is successively 6, 10, and 15.
MUMPS supports multiple simultaneous users and processes even when the underlying operating system does not (e.g., MS-DOS). Additionally, there is the ability to specify an environment for a variable, such as by specifying a machine name in a variable (as in SET ^|"DENVER"|A(1000)="Foo"), which can allow you to access data on remote machines.
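The behavior of the Order function listed above can be modeled in Python. This is a sketch under stated assumptions, not an implementation of the MUMPS function: `order` is a hypothetical helper, the array is an ordinary dict, only numeric subscripts are handled, and the stored values are placeholders.

```python
import bisect


def order(array, subscript=""):
    """Return the next subscript after the given one, or "" at the end.

    The given subscript does not need to exist in the array.
    An empty-string argument asks for the first subscript.
    """
    keys = sorted(array)
    if subscript == "":
        return keys[0] if keys else ""
    position = bisect.bisect_right(keys, subscript)
    return keys[position] if position < len(keys) else ""


stuff = {6: "xyz", 10: 26, 15: ""}    # placeholder values

# The traversal idiom: start from "", stop when "" comes back.
visited = []
i = ""
while True:
    i = order(stuff, i)
    if i == "":
        break
    visited.append(i)

print(visited)            # [6, 10, 15]
print(order(stuff, 8))    # 10
```

Because the lookup works from any starting value, a traversal can resume after a subscript that has been deleted in the meantime.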
Some aspects of MUMPS syntax differ strongly from that of more modern languages, which can cause confusion, although those aspects vary between different versions of the language. On some versions, whitespace is not allowed within expressions, as it ends a statement: 2 + 3 is an error, and must be written 2+3. All operators have the same precedence and are left-associative (2+3*10 evaluates to 50). The operators for "less than or equal to" and "greater than or equal to" are '> and '< (that is, the Boolean negation operator ' plus a strict comparison operator in the opposite direction), although some versions allow the use of the more standard <= and >= respectively. Periods (.) are used to indent the lines in a DO block, not whitespace. The ELSE command does not need a corresponding IF, as it operates by inspecting the value in the built-in system variable $test.
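The equal-precedence, left-to-right rule is easy to check with a small Python evaluator. This is a sketch for illustration: `evaluate_left_to_right` is a hypothetical helper that handles only unsigned numbers and the four basic arithmetic operators.

```python
import operator
import re

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_left_to_right(expression):
    """Apply each binary operator in the order written, ignoring precedence."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/]", expression)
    result = float(tokens[0])
    # Tokens alternate: number, operator, number, operator, ...
    for symbol, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[symbol](result, float(operand))
    return result


print(evaluate_left_to_right("2+3*10"))   # 50.0
print(2 + 3 * 10)                         # 32
```

The second line shows Python's own answer under conventional precedence, which is the result programmers coming from other languages tend to expect.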
MUMPS scoping rules are more permissive than other modern languages. Declared local variables are scoped using the stack. A routine can normally see all declared locals of the routines below it on the call stack, and routines cannot prevent routines they call from modifying their declared locals, unless the caller manually creates a new stack level (do) and aliases each of the variables they wish to protect (. new x,y) before calling any child routines. By contrast, undeclared variables (variables created by using them, rather than declaration) are in scope for all routines running in the same process, and remain in scope until the program exits.
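The stack-based protection described above can be modeled with a shared symbol table in Python. This is a loose sketch, not MUMPS semantics in full: `SymbolTable`, `call`, and `new` are hypothetical names, and each call pushes one frame that records which variables to restore on exit.

```python
class SymbolTable:
    """One variable table shared by every routine, plus a stack of saved values."""

    _MISSING = object()

    def __init__(self):
        self.variables = {}
        self.frames = []              # one list of (name, saved value) per call

    def call(self, routine):
        """Run a routine in a new stack level, then undo its `new` operations."""
        self.frames.append([])
        try:
            routine(self)
        finally:
            for name, saved in reversed(self.frames.pop()):
                if saved is self._MISSING:
                    self.variables.pop(name, None)
                else:
                    self.variables[name] = saved

    def new(self, name):
        """Hide the current value of a variable until this stack level exits."""
        saved = self.variables.pop(name, self._MISSING)
        self.frames[-1].append((name, saved))


def child(table):
    table.variables["x"] = 99         # overwrites whatever x is visible


def protected_block(table):
    table.new("x")                    # protect the caller's x first
    table.call(child)


table = SymbolTable()
table.variables["x"] = 1
table.call(child)
unprotected_result = table.variables["x"]

table.variables["x"] = 1
table.call(protected_block)
protected_result = table.variables["x"]

print(unprotected_result, protected_result)   # 99 1
```

Without the protective step the child's assignment leaks into the caller; with it, the original value is restored when the stack level exits.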
Because MUMPS database references differ from internal variable references only in the caret prefix, it is dangerously easy to unintentionally edit the database, or even to delete a database "table".
The US Department of Veterans Affairs (formerly the Veterans Administration) was one of the earliest major adopters of the MUMPS language. Their development work (and subsequent contributions to the free MUMPS application codebase) was an influence on many medical users worldwide. In 1995, the Veterans Affairs' patient Admission/Tracking/Discharge system, Decentralized Hospital Computer Program (DHCP) was the recipient of the Computerworld Smithsonian Award for best use of Information Technology in Medicine. In July 2006, the Department of Veterans Affairs (VA) / Veterans Health Administration (VHA) was the recipient of the Innovations in American Government Award presented by the Ash Institute of the John F. Kennedy School of Government at Harvard University for its extension of DHCP into the Veterans Health Information Systems and Technology Architecture (VistA). Nearly the entire VA hospital system in the United States, the Indian Health Service, and major parts of the Department of Defense CHCS hospital system use MUMPS databases for clinical data tracking.
Other healthcare IT companies using MUMPS include:
Many reference laboratories, such as DASA, Quest Diagnostics,[14] and Dynacare, use MUMPS software written by or based on Antrim Corporation code. Antrim was purchased by Misys Healthcare (now Sunquest Information Systems) in 2001.[15]
MUMPS is also widely used in financial applications. MUMPS gained an early following in the financial sector and is in use at many banks and credit unions. It is used by the Bank of England and Barclays Bank.[16] [17] [18]
Since 2005, the most popular implementations of MUMPS have been Greystone Technology MUMPS (GT.M) from Fidelity National Information Services, and Caché, from InterSystems Corporation. The European Space Agency announced on May 13, 2010, that it would use the InterSystems Caché database to support the Gaia mission. This mission aims to map the Milky Way with unprecedented precision.[19] InterSystems is in the process of phasing out Caché in favor of IRIS.[20]
Other current implementations include: