The Internet is not TV: Web Publishing

Communications of the ACM, Vol 38, No 3, March, 1995, pp 17-23


Don't turn the page. This is not yet-another article on how great the World Wide Web is, and how fast it is growing. By now you must have read many of those, and surfed the Web a bit. But, the Internet is not television -- are you ready to become a Web author? This article describes tools and considerations for publishing on the Web, and looks toward the future.{footnote 1} Let's start with server platforms.

Server Platforms

I have tried two Web servers -- the WebMaster Starter Kit (WSK) [11] from Enterprise Integration Technologies (EIT) on a Silicon Graphics Indy running Unix, and HTTPD [3], written by Bob Denny, on a small Dell PC running Microsoft Windows 3.1. Both are distributed at no cost to single users.{footnote 2}

WSK installation is guided by a series of Web pages. The first is a form asking for a few values like your email address and operating system. Next you are told to create a WSK subdirectory, the software is downloaded to that directory, and you run "sh" to unpack it. You are also given the option of downloading some utilities, like a utilization analysis program. In five minutes you have a home page that can be accessed by anyone with a Web browser anywhere in the world.

This foreshadows electronic software distribution. The WSK is free, but it would be easy to integrate charging, registration, and upgrades. Support could also be provided on-line, with remote technicians accessing customer machines. EIT will market their software distribution system, and for a comprehensive discussion of current commercial developments in electronic software distribution, see [13].

There are versions of the WSK for most commercial unix platforms, and for Linux, an Intel-based, public domain unix named for its author, Linus Torvalds. There is an ambitious Linux developer community (for example, software to execute Windows applications is under way), and it is supported in active news groups. If you know unix, Linux on a PC would be an excellent platform for a low-cost Web server.

Linux is free on the Internet, but buying a CD-ROM is more convenient. Morse Telecommunications bundles ready-to-run binaries, 300 megabytes of compressed source code, TCP/IP, X Window, compilers, interpreters, applications, even games on a CD-ROM. The package also includes a printed manual and 30 days of technical support. A minimal installation takes only 7 MB, and can reside in the DOS file system. The bulk of the software remains on the CD-ROM drive.

Bob Denny's HTTPD is for Windows, not Unix, and is a bit harder to install than the WSK. You download it, and unpack it, creating a directory structure. A few minor modifications must be made to your autoexec.bat and win.ini files, and you are ready to go. Within an hour (mostly spent reading on-line documentation), I was up and running.

After working with WSK and HTTPD, I have concluded that Windows is fine for experimentation and light support of informal groups, but, today, unix is the platform of choice for a production Web server. New tools are developed first for unix; the commercial unix platforms are faster than PCs; and protection and multitasking make for reliability and speed. The public domain Web servers are fast and robust, and commercial unix servers are developing rapidly -- particularly in the areas of transaction processing and security.

That is today, but tomorrow will bring other options. Developers will continue unix support, but Microsoft Windows NT (NT) and Windows 95 will bloom.{footnote 3}

Industry and marketing considerations compel a serious look at NT as a server for the Web and other applications. Since it offers a way out of the Intel X86 straitjacket, NT is strategic for Microsoft, and while unix is divided, Microsoft controls NT development. They will also continue to control most desktop clients with Windows, Windows 95, and eventually NT-Workstation. Microsoft is pressuring developers to make Windows applications available for both NT and Windows 95, and with their muscle, they will succeed (benefiting both themselves and users). Web application developers will also be assured of compatibility with Microsoft's low-cost development tools like Visual Basic, Visual C++, Access, and Excel.

Microsoft plans an NT Resource Kit, which will include Web, Gopher, and WAIS servers. This software is from the European Microsoft Windows NT Academic Centre (EMWAC) at the University of Edinburgh. EMWAC is a consortium funded by Microsoft, DEC, Sequent, Datalink Computers, and Research Machines, and current versions of the EMWAC servers are available as shareware [4]. Although Microsoft does not recommend it, the EMWAC Web server will run on an NT client as well as a server, providing an upgrade path for small applications. Commercial server companies, including Mosaic Communications and EIT, also plan NT support.

It is no surprise that Microsoft likes NT, but they are not alone. A recent Forrester Research survey of the plans of 50 Fortune 1000 companies predicts NT-based servers will be selling over 300,000 units per year in 1998, about double their unix prediction [1]. Forrester's respondents feel unix will continue as a high-end application server, Netware will continue as a file server, but not an application server, and OS/2 servers are going nowhere. A recent review of NT [9] concludes "NT 3.5 is certainly a worthy rival to Netware and VINES as a network operating system platform and with unix as a premiere application server platform. And it costs much less."

Web Desktop Publishing

Unix and NT may be the platforms of choice for high volume servers, but they may not be necessary for small applications. The Web may usher in an era of network-based desktop publishing in which individuals, committees and project teams, grade schools, little league teams, homes, and so forth have Web home pages on their desktops. For example, I set up a home page for a single class on a small Dell PC. That home page lists the students' names, and has links to pages in their directories. (Their directories are on a server, but they could be on the desktop PC.) The students used their home pages for personal statements and class projects like a prototype of the university catalog, and home pages for an art gallery, a catalog business, the Los Angeles YMCA, Kia Autos, and Hilton Hotels.

Small, local applications like this can run on a client operating system, and Bob Denny is developing a new, commercial version of his HTTPD for Windows 95. It will be bundled with a book from O'Reilly and Associates and Web document-management tools from EIT [5]. The EIT tools are designed to manage collections of Web documents -- tracking their locations and verifying links between them. Denny also plans to include server management tools (remotely operable) and tools to simplify application development. This package sounds like it will be a good solution for the small publisher, and it has room to grow since the software will be compatible with both Windows 95 and NT.

Will we have home-based Web servers? I attended the 1994 Western Cable Show, run for cable TV system operators. The mainstream show featured CATV equipment and programming vendors, but there was also an extensive CableNET section in which 51 companies set up a cable headend, and demonstrated cable-based access to the Internet, telephone system, on-line services, video servers, educational programs, and so forth.{footnote 4} I surfed the Web using a system with a 2 Mbps burst upstream data rate -- surely sufficient for the Press Family server.

Cable companies, phone companies, and wireless services will eventually bring high-speed connectivity to our homes, but first there are regulatory and technical hurdles to be overcome, and system operators must see this as a viable business. A channel devoted to Internet connectivity will have to pay a better return than alternative uses of the bandwidth. Putting family albums on a Web server will not generate as much revenue as the Playboy Channel, but telecommuting will, and the family server will be on-line.

Web Development and Tools

If non-technical students and people at home are to publish Web pages, they need tools. Web pages are created using HTML, the Hypertext Markup Language. HTML defines paired opening and closing tags to delimit document elements. In HTML, simple applications are simple, and complex applications possible. Starting simply, I give students template home pages, with a place for their names and photos, and some examples showing HTML tags for constructs like titles, headings, text paragraphs, enumerated lists, links to other pages, links within a page, and included images.
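
As a rough sketch of what such a template might contain, the following Python function assembles a small home page using a title, a heading, an included image, a paragraph, an enumerated list, and links to other pages. The function name, its arguments, and the file names in the usage note are illustrative assumptions; they are not taken from the templates used in the class.

```python
from html import escape


def home_page(name, photo_url, interests, links):
    """Return a minimal HTML home page as a string.

    All argument names are hypothetical; `links` is a list of
    (label, url) pairs.
    """
    # One list item per interest, inside an enumerated (ordered) list.
    items = "\n".join(f"<li>{escape(i)}</li>" for i in interests)
    # One paragraph per link to another page.
    anchors = "\n".join(
        f'<p><a href="{escape(url)}">{escape(label)}</a></p>'
        for label, url in links
    )
    return f"""<html>
<head><title>{escape(name)}'s Home Page</title></head>
<body>
<h1>{escape(name)}</h1>
<img src="{escape(photo_url)}" alt="Photo of {escape(name)}">
<p>Some of my interests:</p>
<ol>
{items}
</ol>
{anchors}
</body>
</html>"""
```

Calling `home_page("Pat Student", "pat.gif", ["Sailing", "Chess"], [("Class page", "class.html")])` returns a page that a student could save to a file and then edit by hand.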

While HTML pages can be created with any text editor, specialized HTML editors partially automate tag insertion. They show either a WYSIWYG view of your work or a view with the tags exposed, and can link to a Web browser to show an evolving page as a user would see it. Editors may also include document templates, file import filters, and syntax checkers to verify HTML conformance (although today's browsers generally ignore unrecognized tags). Another approach to HTML editing is taken by Microsoft, which extended Word to insert tags when a document is "saved as" type HTML.

Learning HTML basics and an HTML editor is not trivial, but it is closer in complexity to learning a desktop-publishing program than learning to build Hypercard stacks. As tools improve, it will become easier.

One can do interesting, professional work with HTML tags, but interactive applications require more sophistication. The most common interaction formats are clickable maps and forms. Clickable maps provide graphical hyperlinks. The developer designates hot regions on an image saved in the Graphics Interchange Format (GIF), and directs the server to respond differently depending on which image region the user clicks on.
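
The server-side logic behind a clickable map can be sketched in a few lines. The Python below assumes rectangular hot regions and a hypothetical `resolve_click` helper; real servers have their own map-file formats and may support other shapes, so this is only an illustration of the idea.

```python
from typing import NamedTuple


class HotRegion(NamedTuple):
    """A rectangular hot region in image pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int
    url: str


def resolve_click(regions, x, y, default_url=None):
    """Return the URL of the first region containing (x, y), else default_url."""
    for region in regions:
        inside_x = region.left <= x <= region.right
        inside_y = region.top <= y <= region.bottom
        if inside_x and inside_y:
            return region.url
    return default_url
```

With two side-by-side regions, for example `HotRegion(0, 0, 99, 49, "/gallery.html")` and `HotRegion(100, 0, 199, 49, "/catalog.html")`, a click at (150, 20) resolves to the catalog page, and a click outside both regions falls back to the default.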

Forms can have text fields, scrollable text, checkboxes, radio buttons, scrollable lists, passwords, and push buttons. Figure 1 shows a simple form. When the user clicks on "add the numbers," the field names, values, and information about the client system are passed to the server using the Common Gateway Interface (CGI) protocol.{footnote 5} The developer writes a program that returns either an HTML page it composes on the fly or a link to another location. The program can also have side effects like updating a database or sending an email message. CGI programs can be written in any programming or scripting language. On unix systems, perl, an interpreted language, is commonly used, and on Windows-based servers, Visual Basic is common.{footnote 6}
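
To make the form-handling step concrete, here is a minimal sketch, in Python rather than perl or Visual Basic, of a program in the spirit of the "add the numbers" example. It assumes the form data arrives as a URL-encoded string and that the two fields are named `first` and `second`; both the field names and the function name are assumptions, not details of the form in Figure 1. Because every field arrives as a character string, the program does its own type checking.

```python
from urllib.parse import parse_qs


def add_numbers(query_string):
    """Compose a CGI-style response (header, blank line, HTML body).

    The field names "first" and "second" are hypothetical.
    """
    fields = parse_qs(query_string)
    try:
        # Every value arrives as a string, so convert and validate here.
        a = float(fields["first"][0])
        b = float(fields["second"][0])
    except (KeyError, ValueError):
        body = "<p>Please enter two numbers.</p>"
    else:
        body = f"<p>{a:g} + {b:g} = {a + b:g}</p>"
    # A CGI program writes a content-type header, a blank line, then the page.
    return ("Content-Type: text/html\r\n\r\n"
            f"<html><body>{body}</body></html>")
```

A real CGI program would read the encoded string from the environment or standard input supplied by the server and print the result; passing the string as an argument keeps the sketch easy to try by hand.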

Clickable maps and forms suggest database applications, but they can be used more creatively. For example, the Virtual Frog server [6] at Lawrence Berkeley Labs, Figure 2, uses clickable maps and forms to direct the "dissection" of a frog. The user can select organs to be displayed, see organ descriptions, and rotate the frog. Form and clickable-map input is passed to a server, which generates requests to a back-end processor, which accesses a 10,000,000-point data set and renders the appropriate GIF image. The image is returned to the server and passed back to the client.

The Robotic Telescope [2] at The University of Bradford, England is another innovative application. The user fills in a form (see Figure 3) requesting an observation, and the server passes the request to a PC controlling a telescope at a remote observatory. When the image is recorded, it is sent from the telescope to the server, and the user is informed via email. Web-based control over a scanning electron microscope is also being developed at California State University, Hayward.{footnote 7}

Looking Forward

The thousands of Web sites are evidence of the variety of things one can accomplish using today's tools, but they are just the beginning. Change is already under way.

HTML limits control over document appearance. A character-oriented browser like Lynx eliminates all graphics.{footnote 8} With graphically-oriented browsers like Mosaic or Netscape, the user, not the developer, determines fonts, window size, and other display characteristics. There are proposals for HTML extensions to allow things like style sheets, display-time interpretation of formatting rules, display hints from authors, and flowing text around graphics. A proposal by Sperberg-McQueen and Goldstein [10] calls for the generality of SGML on the Web.

SGML is a metalanguage for defining markup languages, and HTML is an instance, specified as a document-type definition (DTD), of such a language. An SGML-based browser could download a DTD and style sheet (if they were not already cached) and interpret any SGML document. This would give the author more control over appearance, and would also allow semantic interpretation. For example, a browser could know an item was a "table that can be imported into a spreadsheet" or "an equation that can be imported into Mathematica." The browser could also use structural information, for example, allowing an outline view, or showing only the abstract of an article. For further discussion of SGML in relation to the Web, see [7] and http://www.sil.org/sgml/sgml.html.
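
The outline-view idea can be illustrated with today's HTML alone. The Python sketch below uses the standard library's `html.parser` to pull out heading elements and indent them by level. It is only a stand-in for what an SGML-based browser might do with a full DTD, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Collect (level, text) pairs for heading elements h1 through h6."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, heading text)
        self._level = None   # level of the heading being read, if any
        self._text = []      # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = self.HEADINGS[tag]
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside a heading element.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline_view(document):
    """Return the document's headings as an indented outline."""
    builder = OutlineBuilder()
    builder.feed(document)
    builder.close()
    return "\n".join("  " * (level - 1) + text
                     for level, text in builder.outline)
```

Body paragraphs are ignored, so a long article collapses to its section structure; richer markup would let a browser apply the same approach to abstracts, tables, or equations.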

Adobe Systems' portable document format (PDF) is a proprietary means of controlling document appearance. PDF is a platform- and resolution-independent format for document viewing, linking, and searching. PDF documents can have hot links to Web documents, and when the user clicks on them, a Web browser is invoked and the document is found and displayed. Similarly, if you come across a PDF document while using a Web browser, Adobe's PDF viewer is invoked.

There is also a need to adapt old electronic documents, legacy documents, to the Web. This can be done using Acrobat, Adobe's authoring system, which can read documents created by many word and page processing programs and convert them to PDF. Interleaf's Cyberleaf is another program for converting documents written using word processors or desktop publishing systems to Web pages. It is not just a collection of filters, but an environment for managing linked, distributed Web documents. It is available now for various unix systems, and will soon be available for Windows.

I have not tested Acrobat or Cyberleaf, but regardless of their features and speed, a key difference is their output -- PDF or HTML. PDF has the advantage of providing control over document appearance today, but, unlike HTML, it is proprietary. Adobe's PostScript page-description language became a de facto standard, and time will tell if PDF will do the same.

Full-text search is another development area. It is possible to pass Web search requests to public domain WAIS servers, and other tools for indexing and retrieving text are becoming available. For example, Verity is adapting their text retrieval software to the Web; Adobe's PDF files are searchable; and WAIS, Inc. has a commercial gateway that allows Web servers to initiate full text searches.

Interaction using Web forms is still clumsy and slow. Users demand responsiveness, and direct-manipulation interfaces for database searching and browsing are needed. We need protocols for downloading data and searching with tools like Ben Shneiderman's dynamic query engine [8, 12].
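
The appeal of a dynamic query is that the data is downloaded once and every later adjustment is answered locally, with no round trip to the server. The Python sketch below shows only that filtering step, under the assumption of a small in-memory list of home listings; the `Home` record, its fields, and the function name are hypothetical and are not taken from [12].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Home:
    """A hypothetical listing record held on the client after one download."""
    address: str
    price: int
    bedrooms: int


def dynamic_query(homes, max_price, min_bedrooms):
    """Return the listings that satisfy the current slider settings.

    This runs locally each time a slider moves, so no server request
    is needed between adjustments.
    """
    return [home for home in homes
            if home.price <= max_price and home.bedrooms >= min_bedrooms]
```

In a direct-manipulation interface, this function would be called on every slider movement and the display redrawn from its result.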

Utility programs are needed. I do not want to enter the hot-region coordinates of clickable bitmaps manually; I want to draw them on the image and have the corresponding Web directives generated automatically. Similarly, I want to lay out forms using a graphical editor, and generate the corresponding HTML. Such tools will hide HTML from the developer, which will be necessary as HTML becomes more complex. (The first Web browser had an integrated editor which hid HTML.)
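
The generating half of such a tool is simple to sketch. The Python function below takes a description of the fields, which a graphical editor might produce, and emits the corresponding HTML form. The function name, the field description format, and the `/cgi-bin/add` action in the usage note are assumptions made for illustration.

```python
from html import escape


def build_form(action, fields, submit_label):
    """Generate an HTML form from a list of (label, name, kind) tuples.

    `kind` is an input type such as "text", "password", or "checkbox".
    """
    rows = []
    for label, name, kind in fields:
        rows.append(
            f'<p>{escape(label)}: '
            f'<input type="{escape(kind)}" name="{escape(name)}"></p>'
        )
    # The push button that submits the form to the server.
    rows.append(f'<p><input type="submit" value="{escape(submit_label)}"></p>')
    return (f'<form method="POST" action="{escape(action)}">\n'
            + "\n".join(rows)
            + "\n</form>")
```

For example, `build_form("/cgi-bin/add", [("First number", "first", "text"), ("Second number", "second", "text")], "add the numbers")` produces a two-field form with one push button, and the developer never types a tag.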

One does not need the flexibility of being able to represent every type of page ever seen in print. The familiar elements of print documents -- footnotes, abstracts, indented quotes, 8 1/2 by 11 sheets (in the US), and so forth -- evolved over time. Since displays are different from paper, electronic documents will evolve different conventions. For example, varying degrees of linearity will become customary in different applications.

Many Web documents are highly non-linear -- collections of many screen-sized frames or links to dozens of other documents. While a link-intensive format has a place in some applications, relatively linear formats may be appropriate in others. In scholarly or technical writing, I would prefer semi-linear text with an explicit table of contents and three types of link: {ital}asides{/ital}, {ital}elaborations{/ital}, and {ital}related-documents{/ital}. An aside would be something like a footnote which the author expects the reader to either skip or quickly visit. Elaborations would include self-contained sidebars, tables, figures, or executable programs which would be visited for a longer time, before returning to the main document. Related-documents would be things like references and further reading, from which a return is not expected. If constructs like aside, elaboration, and related-document become common, they will be made explicit in future versions of HTML.{footnote 9}

The Web will also evolve into a platform for collaboration and communication. When I read an article, I jot many marginal notes, and want to talk with the author and other readers. For now, forms are the primary means of feedback from users, but they are limited. Writeable documents will allow for conversation, joint authorship, and other forms of collaboration. Synchronous audio and video conversation will also become possible (but probably not free).

Writeable documents coupled with low-cost accessibility to the Web may very well be the start of something big. One wonders what meta behavior might emerge from a global network of millions of writeable Web servers. Whatever emerges, I cannot predict it any better than a bee can predict that its local activity produces honey.


Footnotes

1. An earlier column, [8], discussed non-Web electronic publishing.

2. For pointers to many public domain Web servers, see: http://info.cern.ch/hypertext/Web/Daemon/Overview.html.

3. Novell has promised a Web server for NetWare, but it will not be available for a year, after they make improvements in their operating system and development environment.

4. This is reminiscent of the network connecting the booths at Interop Conferences. Merging CableNET with an Interop show net would demonstrate interoperability.

5. All fields are treated as character strings. Type checking by the browser would be a valuable extension.

6. There are several public domain perl interpreters for Windows, but the author of one recommended that I back up my hard disk before trying it out, so I decided to pass.

7. For information, contact Nancy Fegan, nfegan@csuhayward.edu or Nancy Smith, nsmith@csuhayward.edu or Richard Tullis, rtullis@csuhayward.edu.

8. While we speak of graphical developments, character-oriented viewers are fast, economical, and will be with us for many years, particularly in developing nations and poor communities.

9. Meanwhile, browsers should implement an option to linearize and download documents following only links on the home server. This would give the reader the choice of following hyperlinks on the screen or downloading and printing a hard copy.

References

1. Buchanan, Richard D. and McCarty, John C., "Server Operating System Shootout," The Computing Strategy Report, Forrester Research, Cambridge, MA, Vol 11, No 12, pp 2-12, October, 1994.

2. Cox, Mark J. and Baruch, E. F., Robotic Telescopes: An Interactive Exhibit on the World Wide Web, http://www.eia.brad.uk/.

3. Denny, Bob, http://www.alisa.com/win-httpd/

4. EMWAC, http://emwac.ed.ac.uk/html/top.html.

5. Glicksman, Jay, Kramer, Glen A., and Mayer, Niels P., "Internet Publishing via the World Wide Web," Proceedings of Groupware '94, San Jose, August, 1994, pp. 431-442, http://www.eit.com/papers/gpware94/paper.html.

6. Johnston, W. E., "The Whole Frog Project," http://george.lbl.gov/ITG.hm.pg.docs/Whole.Frog/Whole.Frog.html.

7. Michalski, Jerry, "Content in Context: The Future of SGML and HTML," Release 1.0, pp 1-20, September 27, 1994.

8. Press, L., "Emerging Dynabase Tools," Communications of the ACM, March, 1994.

9. Robertson, Bruce, "Microsoft Windows NT Version 3.5," Network Computing, December 1, 1994, pp 72-79.

10. Sperberg-McQueen, C. M. and Goldstein, Robert F., "HTML to the Max: A Manifesto for Adding SGML Intelligence to the World-Wide Web," Proceedings of the Second Web Conference, Chicago, October, 1994, http://www.ncsa.uiuc.edu/SDG/IT94/IT94Info.html.

11. Weber, Jay C., "Webmaster's Starter Kit," Proceedings of the Second Web Conference, Chicago, October, 1994, http://www.eit.com/papers/wsk/www94.html, the software is at http://www.eit.com/

12. Williamson, C. and Shneiderman, B., "The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System," Proceedings of the ACM SIGIR '92, Copenhagen, June, 1992, pp. 338-346.

13. --, "Money in the Middle," ComputerLetter, Vol 10, Number 36, October 31, 1994, Technologic Partners, 371.2412@mcimail.com, 212-696-9330


Pointers

O'Reilly and Associates, http://gnn.com.ora/, is the Web provider's best friend. In addition to Bob Denny's server software and companion book, the following are of interest to unix-based Web developers:

Larry Wall and Randal L. Schwartz, "Programming Perl," and Randal L. Schwartz, "Learning Perl." Perl, an interpreted language with string-handling capability, is frequently used for writing CGI programs.
Olaf Kirch, "The Linux Network Administrator's Guide." Get it if you are basing your server on Linux.
Cricket Liu, Tony Sanders, Bryan Buus, and Jerry Peek, "Managing Internet Information Services." This book is confined to unix, but covers Gopher, WAIS, and Web. The Web chapters document system and administration as well as development topics like clickable images and CGI programming.

Each of these excellent HTML editors has a public domain version which may be sufficient for what you do:

HTML Assistant, a Windows-based editor that includes unix text import and a generalized page template. Preview the public domain version at http://cs.dal.ca/ftp/htmlasst/htmlafaq.html.
HotMetal has Windows and Unix versions which include HTML syntax checking and tag and WYSIWYG views. Preview the public domain version at: http://www.sq.com/.
HTML Writer, a Windows-based editor with dialog boxes for creating forms and inserting images. The shareware program is at: http://wwf.et.byu.edu/~nosackk/html-writer/get_copy.html

The Proceedings of the Second Web Conference, October 17-20, 1994, is an excellent source of information and ideas. There are papers on applications, products, Web design and management, the future of Web standards and technology, and so forth. The proceedings are at: http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings. Future conferences will be in Darmstadt, Germany (April, 1995) and Boston, Massachusetts (Fall, 1995). For information on these see http://www.osf.org:8001/ri/announcements/Web_Conf_F94.html.

Two industry newsletters are recommended for those with an interest in the topics covered here:

Release 1.0, EDventure Holdings, (212)924-8800, 511.3705@mcimail.com.
Seybold Report on Desktop Publishing, (800) 325-3830.

Companies mentioned in this article

Interleaf: http://www.ileaf.com/

Morse Telecommunications: http://www.morse.net/

EIT: http://wsk.eit.com/wsk/doc/

Mosaic Communications: http://home.mcom.com/home/welcome.html

Adobe: http://www.adobe.com/

Verity: http://www.verity.com/

WAIS Inc.: http://www.wais.com/


Figure Captions

Figure 1. A Simple Form Example. The user supplies two numbers (a), the values are sent to the server, and a developer-written program returns the result (b).

Figure 2. Virtual Frog Dissection

a. The user specifies the organs to display using this form. A choice of Spanish or English is also allowed.

b. The appropriate image is returned, and the user can either display the name and a short description of an organ by clicking on it or rotate the image.

c. Here the image has been rotated.

Figure 3. Robotic Telescope Operation. A user request form and the corresponding image.

