What HTML Is

Submitted by Syscrusher on Mon, 2005/06/06 - 23:04.

Most people are familiar with creating documents and memos in a word processor or text editor. Creating a web page is very similar to this, but instead of just typing your text you also need to describe how the text should look and work in a web browser.

In traditional publishing, a document written in unadorned text is marked up to show how it should look in final published form. There are special symbols known to proofreaders that indicate that text should be boldface, italicized, underlined, and so on. Documents have titles, chapter headings, section headings, footnotes, and paragraph boundaries. They may include tables of information consisting of rows and columns, pictures or diagrams, logos, and specialized layout design to help the reader understand how the information is organized.

On the web, all of these things are still true, but there is the additional complexity of document hyperlinks. These are the special areas in text (or sometimes in a graphic) which the viewer can select in order to jump to another page, or perhaps to another section of the same page. For example, selecting the words "Table of Contents" at the bottom of this page will cause your browser to display that page.

Web pages use an internationally-standardized format called HyperText Markup Language, or HTML, to describe the appearance and structure of each page, and the relationships (links) between pages. An HTML document consists of all of the text you create (the content) plus the special markup tags which instruct the web browser how to interpret and display your content.
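
As a small preview of what this markup looks like, the fragment below sketches the two kinds of hyperlink described above: one that jumps to another page and one that jumps to another section of the same page. It is only an illustration. The file name "contents.html" and the section name "details" are made up for this sketch, and the tags themselves are explained in later lessons.

```html
<!-- A link to another page; "contents.html" is a made-up file name -->
<a href="contents.html">Table of Contents</a>

<!-- A link to another section of this page; "details" is a made-up name -->
<a href="#details">Jump to the details</a>

<!-- The spot on this page that the second link jumps to -->
<h2 id="details">Details</h2>
```

The words between the opening and closing tags are the content the viewer sees and selects. The "href" part is markup that tells the browser where to go.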

The entire web document, both content and tags, is really just plain text -- there is nothing magical about an HTML file, and you can open and edit it in your system's standard text editor. The tags are just text that appears in such an unusual form that it would seldom occur in an ordinary person's writing (and if it does, there are ways to handle those special cases, as discussed in a later lesson).

A Simple Example

Perhaps the best way to illustrate HTML is to show a very simple example. What follows is an actual -- albeit trivial -- HTML document.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<!-- I created this with a text editor -->
<head>
<title>My First Document</title>
</head>
<body>
<h1>My First Document</h1>
<p>This is just a little document. It contains only
two sentences, and is of <i>very little</i> use.</p>
</body>
</html>

See how easy it is to tell what is a tag and what isn't? Without knowing any HTML at all, you can easily see that things like <html>, <body>, </i>, and so forth are special, and not part of the regular text. Everything inside the <> symbols is a tag.

So what does all that mean? Let's take it step-by-step. The explanation that follows is long, but if you spend a little time reading it carefully, you will understand most of the fundamentals of writing HTML. Take the time to understand this page before you move on, and this knowledge will serve you well later. The rest is easy!

  • The first line, which begins with "<!DOCTYPE", is a document type declaration. It names the document type definition (DTD), the formal description of the version of HTML that the document follows. Basically, this line should appear exactly as shown in each of your HTML documents, and it simply informs the browser that this is indeed a standards-compliant HTML document. HTML standards are defined by a group called the World Wide Web Consortium, abbreviated as "the W3C". "IETF" stands for Internet Engineering Task Force, the nonprofit organization that defines how the Internet itself works in a technical sense. DTD and HTML you now know, and "EN" stands for English, the language in which the DTD itself is written. For now, just copy and paste this line into each of your documents, as the very first line.
  • The "<html>" is your first tag. HTML tags are easy to spot because they begin with a less-than sign "<" and end with a greater-than sign ">". Notice that the actual document (not counting the DOCTYPE line) always begins with <html> and always ends with </html>. The slash ("/") in "</html>" means "end" or "close". Most HTML tags are matched this way, with the tag and its closing tag exactly corresponding to one another. There are a few exceptions, which will be pointed out later.
  • The next line, beginning with "<!--" and ending with "-->", is a comment. Comments are text that you put into the document for your own benefit, perhaps as a reminder of something that isn't quite finished yet and which you intend to change later. Comments do not display in a web browser, but the viewer can see them by asking his or her browser to show the HTML source of the page. Adding comments to pages is a good idea, to help you remember things about the page, but comments are never required. You can put as much text as you like inside a comment, and it doesn't have to fit all on one line. Comments can go anywhere in your document but not inside a single tag (for example, "<body <!-- some comment -->>" is not valid, but "<body>....<!-- some comment -->..." is perfectly fine).
  • The next few lines are the head section and the title of the document. Now, it may seem redundant because the title would appear to be all that is in the head, but that's only because this is a very simple document. Later on you will learn how to put other things into the head, such as keywords that help search engines to find your page.
  • Notice how the head section, that is, everything from <head> through </head>, completely contains the title (everything from <title> through </title>)? This inclusion of one tag set completely inside another is called nesting and it is a very important concept. Nested tags must always, always, always be done this way, with one tag set completely inside the other. For example, <head>.....<title>....</title>...</head> is valid because the title tag set is completely within the head tag set. But <head>...<title>...</head>...</title> is not correct because the title begins inside the head but tries to end outside the head. If you do this in your HTML, the results may look okay in some browsers but are not guaranteed to work in all of them.
  • After the head section comes the document's body, which contains everything that is actually visible on the viewer's browser. You will learn later how to add some optional things to the <body> tag to control the document's color, background texture, and other features. All of the HTML tags that relate to actual content, such as paragraphs, section headings, fonts, graphics, and hyperlinks, are only used within the body section.
  • The first thing in the body is the top-level heading, marked by the <h1>...</h1> tag set. There are six heading levels available to you, and as you might suspect their tags are <h1>...</h1>, <h2>...</h2>, and so on through <h6>...</h6>. The HTML standard from W3C does not specify exactly how each heading level will look, but you can be assured that a level-one heading will be considerably more emphasized than a level-six heading. These headings should be used like outline levels. Headings are optional, not required. A later lesson will talk about some recommended design practices, but you can use headings in whatever [reasonable] way makes sense for your content.
  • Here's a question: If we have a level-one heading <h1>My First Document</h1>, then why also bother to specify the same text as a title <title>My First Document</title>? What's the difference? The answer is that the heading displays on the actual web page, like anything else in the body of the document. The title is for the browser's use only, and the W3C does not specify how it is to be used. Most modern graphical browsers, such as Netscape or Internet Explorer, put the title in the top bar (also called the "title bar") of the browser window itself. The W3C made the title tag separate from the heading tags so that you, the web designer, could choose to have different text displayed in these two locations.
  • DESIGN TIP: If someone visits your page and chooses to add it to his or her "Bookmarks" or "Favorites", it is the title text and not the heading that will be the name of the remembered entry. A lot of people title their home page as "Home Page" (or something equally generic), but that's bad because the person who bookmarks the page will not be able to easily tell whose home page that bookmark represents. So choose your page title with care!
  • Following the heading is a paragraph of ordinary text, within a <p>....</p> tag set. The </p> tag is actually optional, but adding it is good design practice and will make your HTML easier to maintain because you can clearly see where the paragraph ends. Part of the text, specifically the words "very little", is italicized using the <i>...</i> tag set. There are a number of similar tags for things like boldfacing, underlining, strikethrough, and typeface (font) selection.
  • Finally, the </body> and </html> tags indicate the end of the body and the end of the document, respectively. Notice how, looking back at the document, every tag is correctly nested? Here are all the tags, in order:

    <html><head><title>...</title></head><body><h1>...</h1><p>...<i>...</i></p></body></html>

    It may help, if your document isn't working the way you want it to, to actually print out your HTML source code (from the text editor, not the browser) and use a pen to draw brackets around each tag set, connecting the opening of a tag with its closing tag. If you have bracket lines crossing each other, that usually indicates a problem.
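
The nesting rule can be sketched with two versions of the paragraph from the example document. This is an illustration only, and the second version is deliberately wrong:

```html
<!-- Correct: <i> opens and closes inside the paragraph -->
<p>It contains only two sentences, and is of <i>very little</i> use.</p>

<!-- Incorrect: <i> opens inside the paragraph but closes outside it -->
<p>It contains only two sentences, and is of <i>very little use.</p></i>
```

If you drew brackets on the second version, the line joining <p> to </p> would cross the line joining <i> to </i>, which is exactly the warning sign to look for.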

This little page doesn't contain much content, but it illustrates all of the fundamental elements of HTML. Click here to see what the page actually looks like as rendered by your browser. To see a more complex example, ask your browser to show you the HTML source code of this page (or any page of the tutorial).
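
As a possible next step, the sketch below extends the example with a few of the features mentioned in the walkthrough: a comment that spans more than one line, a title that differs from the top-level heading, and a second heading level. The name "Pat Example" and all of the text are invented for illustration. Treat it as one reasonable way to arrange these pieces, not the only one.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<!-- A comment does not have to fit on one line.
     This one is a reminder to add photos later. -->
<head>
<!-- The title is what a bookmark will be named, so make it specific -->
<title>Pat Example's Home Page</title>
</head>
<body>
<!-- The heading appears on the page itself and may differ from the title -->
<h1>Welcome</h1>
<p>This paragraph sits under the top-level heading.</p>
<!-- A level-two heading, used like the next level of an outline -->
<h2>About This Site</h2>
<p>This paragraph sits under the level-two heading.</p>
</body>
</html>
```

Every tag set in this sketch is closed before the tag set that contains it is closed, so the bracket-drawing check described above would show no crossing lines.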
