The iron penguin, Part 1
Linux takes to big iron within virtual machines
By Neale Ferguson
So what is this thing called the S/390? What are VM/ESA and LPAR? Where did such a port come from? For those unfamiliar with the S/390 system but interested in hearing about this Linux port, this three-part series explains the platform and how Linux came to run on it.
Introduction to S/390
The S/390
(System/390) architecture evolved from the S/360 (System/360) of the 1960s. IBM,
and Thomas Watson, Jr., in particular, risked the family jewels in undertaking
the development of the S/360. It was the largest private venture in American
history, with $5 billion spent on five new plants and 60,000 additional
employees. S/360 was first to employ instruction microprogramming to facilitate
derivative designs and create the concept of a family architecture. The family
originally consisted of six computers that could each use the same software and
peripherals. The system also popularized remote computing, with terminals
communicating to the host via phone lines (see Resources,
Data General, 1997).
Since that time there have been radical changes and enhancements, but a programmer from that era would recognize many of the facilities of S/390. S/360, S/370, and S/390 (or ESA/390 as it is now known) are upwardly compatible. The S/360 was originally designed to allow programs written for earlier IBM hardware to migrate to the new platform. This required that S/360 use the IBM EBCDIC character set rather than the standard ASCII system. Above all else, this feature is what has set the S/360 and its successors apart from the rest of the computing world.
Currently, the three leading vendors that offer mainframes are IBM, Hitachi Data Systems (HDS), and Amdahl, with IBM leading sales by a good margin. The S/390 uses a custom 31-bit processor, compared to the more common 32-bit systems; the 31 bits apply only to memory addressing, not to the general processor architecture. A 64-bit version of the hardware is rumored to be in the works for release in the near future.
Basic architecture
S/390 uses 31
bits to address 2 GB of physical memory. Like many other processor platforms
(e.g., i386, PowerPC), the S/390 uses a two-tier paging scheme
(segments and pages) as opposed to the three-tier mechanism
defined in Linux. The good news is that the three-tier mechanism has already
been built for these other environments, helping ease some of the porting tasks.
In addition, ESA/390 allows for multiple address spaces of 2 GB each and
multiple translation lookaside buffers (TLBs) for mapping each separate address
space to the physical memory. Theoretically, up to 16 terabytes of address
spaces can be controlled by the hardware. We exploited this feature in the Linux
for S/390 port, simplifying complex memory processes like
copy_to_user()
to a couple of instructions.
SMP support
The ESA/390 architecture
is implemented on processors that range from a card that slips into your laptop
to a 16-way SMP configuration not much larger than a refrigerator that sits in a
corner of the machine room. IBM's largest model is a 12-way SMP system. HDS
currently ships a 13-way and has a 16-way system on the way. Amdahl already
offers a 16-way model.
Processor partitioning
Processor
partitioning goes by various names according to manufacturer: Amdahl calls it
Multiple Domain Facility (MDF); Hitachi calls it Multiple Logical Partition
Feature (MLPF); and IBM calls it Logical Partitioning (LPAR). Whatever the name,
the intent is the same: It divides a single machine into multiple virtual
systems or images, each of which appears to the operating system
running in it as a complete and isolated processor. Partitioning allows you to
share all processing resources selectively. The number of partitions you can
create depends on the manufacturer and the machine type, but typically the
maximum is in the range of 10 to 15 images.
Partitioning can also be achieved using the hypervisor VM/ESA, which I'll discuss in greater detail in the next part of this series. VM/ESA divides a processor into virtual machines, and there the limit is measured in the hundreds to tens of thousands.
I/O subsystem
One of the
distinguishing features of S/390 is its channel subsystem. S/390 defines a
unified means of accessing its I/O subsystem. It does this by defining a channel
subsystem that is, in effect, a collection of sophisticated, independent outboard
processors that take complete responsibility for I/O operations off the CPU.
A System/390 operating system has only to issue a
single instruction to get an I/O operation initiated. The channel subsystem and
the I/O devices will perform all the support actions, such as memory access,
path selection, and connection, and handle conditions such as RPS miss, caching,
and error recovery.
Computers are often rated for speed in terms of MIPS, sometimes (correctly) referred to as "meaningless indicators of processor speed." This is especially true of S/390. Any true estimate of MIPS must include the work performed by the channel subsystem. Each component of the subsystem may have considerable processing power that is equivalent to a standalone server. Bear this in mind when you see comparisons of CPU performance.
The I/O subsystem, as it affects the implementation of Linux on S/390, will be explained in more detail in part two of this series.
Early operating systems
In the early
days, computing was batch oriented, and the operating systems first used on the
S/390 architecture reflected this. They had names like Basic Operating System,
Tape Operating System, Disk Operating System, and (my favorite acronym) PCP
(Primary Control Program).
These evolved into the predecessors of the OS/390 and VSE/ESA that are available today. As they evolved, significant and robust timesharing and realtime transaction processing capabilities were added.
A brief history of IBM, S/360, and Unix
In her treatise "VM, Past, Present, and Future," Melinda
Varian (see Resources)
of Princeton University describes some interesting machinations involving the
development of System/360, MIT, timesharing, and Unix. This passage is
reproduced here with permission.
At the time IBM was embarking on its "make-or-break" development of System/360 (the grandfather of S/390), MIT was committed to timesharing and was providing timesharing services to several other New England universities as well as to its own users. At MIT, it was "no longer a question of the feasibility of a timesharing system, but rather a question of how useful a system [could] be produced". The IBMers in the MIT Liaison Office and the Cambridge Branch Office, being well aware of what was happening at MIT, had become strong proponents of timesharing and were making sure that the System/360 designers knew about the work that was being done at MIT. They arranged for several of the leading System/360 architects to visit MIT and talk with the faculty. However, inside IBM at that time there was a strong belief that timesharing would never amount to anything and that what the world needed was faster batch processing. MIT and other leading-edge customers were dismayed, and even angered, on April 7, 1964, when IBM announced System/360 without address relocation capability.
The previous fall, MIT had founded Project MAC to design and build an even more useful timesharing system based on the CTSS prototype. Within Project MAC, MIT were to draw on the lessons they had learned from CTSS to build the Multics system. The basic goal of the Multics project "was to develop a working prototype for a computer utility embracing the whole complex of hardware, software, and users that would provide a desirable, as well as feasible, model for other system designers to study." At the outset, Project MAC purchased a second modified 7094 on which to run CTSS while developing Multics. It then requested bids for the processor on which Multics would run.
One of the first jobs for the staff of the new center was to put together IBM's proposal to Project MAC. In the process, they brought in many of IBM's finest engineers to work with them to specify a machine that would meet Project MAC's requirements, including address translation. They were delighted to discover that one of the lead S/360 designers, Gerry Blaauw, had already done a preliminary design for address translation on System/360. Address translation had not been incorporated into the basic System/360 design, however, because it was considered to add too much risk to what was already a very risky undertaking. It must be remembered that IBM was placing the entire future of its business on the line with System/360.
The machine that IBM proposed to Project MAC was a System/360 that had been modified to include the "Blaauw Box." This machine was also bid to Bell Labs at about the same time. It was never built, however, because both MIT and Bell Labs chose another vendor. MIT's stated reason for rejecting IBM's bid was that it wanted a processor that was a mainline product, so that others could readily acquire a machine on which to run Multics. It was generally believed, however, that displeasure with IBM's attitude toward timesharing was a factor in Project MAC's decision.
Losing Project MAC and Bell Labs had important consequences for IBM. Seldom after that would IBM processors be the machines of choice for leading-edge academic computer science research. Project MAC would go on to implement Multics on a GE 645 and would have it in general use at MIT by October 1969. Also in 1969, the system that was to become Unix would be begun at Bell Labs as an offshoot and elegant simplification of both CTSS and Multics, and that project, too, would not make use of IBM processors.
So started a period of long estrangement between System/360 and its descendants and the world of Unix. How different things might have been!
In the late '80s and early '90s, IBM made attempts to get back into the Unix game on its mainframes with the introduction of AIX/370 and AIX/ESA. Unfortunately, these birds would not fly, and they were quickly retired to the operating system graveyard. Fortunately for IBM, AIX on the RT and RS/6000 platforms did take off and has been a great line of business for the company.
The proliferation of business applications appearing in the Unix world prompted IBM to try a different approach to making the Unix APIs available to System/390 programmers. This time IBM came up with OpenEdition for OS/390 (later called Unix System Services, or USS) and VM/ESA. The premise behind these offerings was to add a set of Unix APIs to the base operating systems, allowing vendors to port their Unix applications to System/390 without rewriting the programs.
Both USS and OpenEdition still have an important, and even growing, role to play within an enterprise as a result of the advent of Linux for S/390. Their chief problem is that they are both EBCDIC implementations. The beauty of Linux for S/390 for software vendors is that it is an ASCII implementation that should look, feel, and act the same in all important respects as any other port of Linux.
Enter VM
So where did VM come from
and why was it created? Again, Melinda Varian's history of VM is the canonical
source for this material:
In the fall of 1964, the folks in Cambridge suddenly found themselves in the position of having to cast about for something to do next. A few months earlier, before Project MAC was lost to GE, they had been expecting to be in the center of IBM's timesharing activities. Now, inside IBM, "timesharing" meant TSS, and that was being developed in New York State. However, Norm Rasmussen (who had headed IBM's bid for Project MAC) was very dubious about the prospects for TSS and knew that IBM must have a credible timesharing system for the S/360. He decided to go ahead with his plan to build a timesharing system, with Bob Creasy leading what became known as the CP-40 Project. The official objectives of the CP-40 Project were the following:
- The development of means for obtaining data on the operational characteristics of both systems and application programs;
- The analysis of this data with a view toward more efficient machine structures and programming techniques, particularly for use in interactive systems;
- The provision of a multiple-console computer system for the center's computing requirements; and
- The investigation of the use of associative memories in the control of multiuser systems.
The project's real purpose was to build a timesharing system, but the other objectives were genuine, too, and they were always emphasized in order to disguise the project's "counter-strategic" aspects.
Bob Creasy and Les Comeau spent the last week of 1964 joyfully brainstorming the design of CP-40, a new kind of operating system, a system that would provide not only virtual memory, but also virtual machines. They had seen that the cleanest way to protect users from one another (and to preserve compatibility as the new System/360 design evolved) was to use the System/360 Principles of Operations manual to describe the user's interface to the Control Program. Each user would have a complete System/360 virtual machine (which at first was called a "pseudo-machine"). (The term virtual machine has been attributed to Dave Sayre at IBM Research.)
This skunk-works project (which seems to be paralleled 30 years later by the Linux for S/390 effort) resulted in CP-40, which became CP-67, VM/370, VM/SP, and VM/XA and had been transformed by the early '90s into VM/ESA. The internals are probably unrecognizable to the original developers, but the underlying principles remain the same.
Virtual machines
Virtual machines have seen renewed interest of late in the form of VMware and
the Java Virtual Machine. A VM/ESA virtual machine can run anything that could
be run on the bare iron, including a copy of VM/ESA itself (and a copy running
in that copy, and so on).
Virtual machines provide a "padded-cell environment" that isolates one user from
another while also allowing all users access to both the real resources of the
machine and the virtual resources of the VM operating system. You can, for
example, define more virtual CPUs than actually exist in the real machine, or
virtual disks that may or may not correspond to real hardware.
So, why virtual machines? R. P. Goldberg, in the March 1973 Proceedings of ACM SIGARCH-SIGOPS Workshop on Virtual Computer Systems, describes the rationale:
The development of interest in virtual computer systems can be traced to a number of causes. First, there has been a gradual understanding by the technical community of certain limitations inherent in conventional timeshared multiprogramming operating systems. While these systems have proved valuable and quite flexible for most ordinary programming activities, they have been totally inadequate for system programming tasks. Virtual machine systems have been developed to extend the benefits of modern operating system environments to system programmers. This has greatly expedited operating system debugging and has also simplified the transporting of system software. Because of the complexity of evolving systems, this is destined to be an even more significant benefit in the future.
As a second point, a number of independent researchers have begun to propose architectures that are designed to directly support virtual machines, i.e., virtualizable architectures. These architectures trace their origins to an accumulated body of experience with earlier virtual machines, plus a set of principles taken from other areas of operating system analysis. They also depend upon a number of technical developments, such as the availability of low-cost associative memories and very large control stores, which now make proposals of innovative architectures feasible.
A third reason for the widespread current interest in virtual machines stems from its proposed use in attacking some important new problems and applications such as software reliability and system privacy/security. A final point is that IBM has recently announced the availability of VM/370 as a fully supported software product on System/370. With this action, IBM has officially endorsed the virtual machine concept and transformed what had been regarded as an academic curiosity into a major commercial product.
VM/ESA is a hypervisor; that is, it presents to the entities running on it the same interface definition that the real hardware provides. This means that the logical entities we call virtual machines are idealized simulations of a computer. The Control Program (CP) component of VM/ESA operates the real machine hardware and multiplexes the physical resources of the computing system among the virtual machines.
The System/390 architecture allows VM to do this because it separates its instruction set into privileged (aka Supervisor State) and nonprivileged (aka Problem State) groups. In the Supervisor State, all instructions are valid. In the Problem State, only those instructions are valid that provide meaningful information to the problem program and that cannot affect system integrity; such instructions are called unprivileged instructions. The instructions that are never valid in the Problem State are called privileged instructions. When a CPU in the Problem State attempts to execute a privileged instruction, a privileged-operation exception is recognized. A CPU executes another group of instructions, called semiprivileged instructions, in the Problem State only if specific authority tests are met; otherwise, a privileged-operation exception or a special-operation exception is recognized.
An operating system uses these privileged operations to schedule resources between competing applications running under it. CP dispatches a virtual machine running the operating system in the nonprivileged Problem State and then traps any privileged operations the virtual machine performs. When it traps such an operation, CP can simulate it against the virtual machine's virtual resources and then resume the guest as though the instruction had run on the real hardware.
Similarly, when interrupts occur on the real machine, CP will determine if the interrupt needs to be reflected to a particular virtual machine, such as when an I/O operation that had been initiated by a Linux virtual machine has just completed.
Much of the workload for intercepting and simulating instructions and interrupts for a virtual machine has been lifted from CP by the inclusion of hardware assist functions built into the processor complexes. These hardware assists provide significant performance boosts for the virtual machine.
VM and open source
VM started out
within IBM but was soon adopted by the user community, which then started
providing new functions, enhancements, and fixes to the operating system. The
code was a licensed program product of IBM but was free of charge and came with
complete source. The philosophy of the development team is best described in the
words of one of the chief architects, Bob Creasy:
"The design of CP/CMS by a small and varied software research and development group for its own use and support was, in retrospect, a very important consideration. It was to provide a system for the new IBM System/360 hardware. It was for experimenting with timesharing system design. It was not part of a formal product development. Schedules and budgets, plans and performance goals did not have to be met. It drew heavily on past experience. New features were not suggested before old ones were completed or understood. It was not supposed to be all things to all people. We did what we thought was best within reasonable bounds. We also expected to redo the system at least once after we got it going. For most of the group, it was meant to be a learning experience. Efficiency was specifically excluded as a software design goal, although it was always considered. We did not know if the system would be of practical use to us, let alone anyone else. In January 1965, after starting work on the system, it became apparent from presentations to outside groups that the system would be controversial. This is still true today." (Varian, p. 97)
However, gradually what had been public started to become more and more private. On February 8, 1983, IBM announced its Object Code Only (OCO) policy. The VM community made an enormous effort to convince IBM's management that the OCO policy was a mistake. Many people contributed to the effort in SHARE (an IBM user group) and in the other user groups.
In February 1985, the SHARE VM Group presented IBM with a White Paper that concluded with the sentence, "We hope that IBM will decide not to kill the goose that lays the golden eggs." IBM chose not to reply to it.
A few months after the announcement of the OCO policy, IBM released the first OCO version of VM, VM/PC. VM/PC had a number of problems, including poor performance and incorrect, missing, or incompatible functions. Without source, users were unable to correct or compensate for these problems, so nobody was surprised when VM/PC fell flat.
IBM continued throughout the decade to divert much of its energy to closing up its systems, not noticing until too late that the rest of the industry (and many of its customers) was moving rapidly toward open systems. By 1991, around the time Linus Torvalds was releasing his first Linux efforts, IBM had made major parts of VM Object Code Only (OCO: no source) and Object Code Maintained (OCM: source available, but fixes shipped as object code only). IBM was doing the exact opposite of what Richard Stallman was advocating with regard to open source.
This is a salutary lesson for devotees of open source software: The price of open source is eternal vigilance.
VM has always been the bastard child of IBM. It is extremely efficient, which means that you do not need as much hardware to run it. This does not please those who sell hardware. Every so often IBM attempts to kill it off, but it has proven resilient:
"Throughout 1967 and very early 1968, IBM's Systems Development Division, the guys who brought you TSS/360 and OS/360, continued its effort to have CP-67 killed, sometimes with the help of some IBM Research staff. Substantial amounts of Norm Rasmussen's, John Harmon's, and my time was spent participating in technical audits which attempted to prove we were leading IBM's customers down the wrong path and that for their (the customers'!) good, all work on CP-67 should be stopped and IBM's support of existing installations withdrawn." (R. U. Bayles quoted in Varian, p. 97).
Now with Linux for S/390, VM is again coming into its own. VM has a lot to offer Linux in the S/390 environment. Think of it as a highly intelligent BIOS that relieves Linux of distractions such as dynamic sparing and hardware recovery, as well as supporting the concurrent operation of thousands of virtual machines.
Finally, after years of working its way through the beast that is the IBM
bureaucracy (helped along by a bottom line that was starting to hurt), IBM
rediscovered open source.
About the author
Neale Ferguson is a long-time IBM S/390 system administrator with over nineteen years of experience with VM/ESA. He worked on the non-IBM port of Linux to S/390 and jumped to the IBM-sponsored effort as soon as it was released. Formerly with TAB Limited in Sydney, Australia, he currently works as a consultant at Computer Associates in Reston, Virginia.