What HTML Is
Submitted by Syscrusher on Mon, 2005/06/06 - 23:04.
Most people are familiar with creating documents and memos in a word processor or text editor. Creating a web page is very similar to this, but instead of just typing your text you also need to describe how the text should look and work in a web browser.
In traditional publishing, a document written in unadorned text is marked up to show how it should look in final published form. There are special symbols known to proofreaders that indicate that text should be boldface, italicized, underlined, and so on. Documents have titles, chapter headings, section headings, footnotes, and paragraph boundaries. They may include tables of information consisting of
rows and columns, pictures or diagrams, logos, and specialized layout design to help the reader understand how the information is organized.
On the web, all of these things are still true, but there is the additional complexity of document hyperlinks. These are the special areas in text (or sometimes in a graphic) which the viewer can select in order to jump to another page, or perhaps to another section of the same page. For example, selecting the words "Table of Contents" at the bottom of this page will cause your browser to display that page.
Web pages use an internationally-standardized format called HyperText Markup Language, or HTML, to describe the appearance and structure of each page, and the relationships (links) between pages. An HTML document consists of all of the text you create (the content) plus the special markup tags which instruct the web browser how to interpret and display your content.
The entire web document, both content and tags, is really just plain text -- there is nothing magical about an HTML file, and you can open and edit it in your system's standard text editor. The tags are just text that appears in such an unusual form that it would seldom occur in an ordinary person's writing (and if it does, there are ways to handle those special cases, as discussed in a later lesson).
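As a brief preview of those special cases, here is a minimal sketch. It assumes the standard HTML character references &lt;, &gt;, and &amp;, which stand in for the less-than sign, the greater-than sign, and the ampersand. The sentence is made up for illustration, and the later lesson may present this differently. The <p> tag that wraps the sentence is explained in the example that follows.

```html
<!-- The browser displays: In arithmetic, 3 < 5 & 5 > 3. -->
<p>In arithmetic, 3 &lt; 5 &amp; 5 &gt; 3.</p>
```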
A Simple Example
Perhaps the best way to illustrate HTML is to show a very simple example. What follows
is an actual -- albeit trivial -- HTML document.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<!-- I created this with a text editor -->
<head>
<title>My First Document</title>
</head>
<body>
<h1>My First Document</h1>
<p>This is just a little document. It contains only
two sentences, and is of <i>very little</i> use.</p>
</body>
</html>
See how easy it is to tell what is a tag and what isn't? Without knowing any HTML at
all, you can easily see that things like <html>, <body>, </i>, and so
forth are special, and not part of the regular text. Everything inside the <>
symbols is a tag.
So what does all that mean? Let's take it step-by-step. The explanation that follows is
long, but if you spend a little time reading it carefully, you will understand most of the
fundamentals of writing HTML. Take the time to understand this page before you move on,
and this knowledge will serve you well later. The rest is easy!
- The first line, which begins with "<!DOCTYPE", is a document type
declaration. It tells the browser which version of HTML the document follows by
naming a document type definition (DTD), the formal description of the tags that
version allows. HTML standards are defined by a group called the World Wide Web Consortium, abbreviated as
"the W3C". "IETF" stands for Internet Engineering Task Force, the nonprofit organization that defines
how the Internet itself works in a technical sense, and which published the early HTML
specification named here. DTD and HTML you already know, and
"EN" stands for English, the language in which the DTD itself (not your page) is written. For
now, just copy and paste this line into each of your documents, as the very first line.
- The "<html>" is your first tag. HTML tags are easy to
spot because they begin with a less-than sign "<" and end with a greater-than
sign ">". Notice that the actual document (not counting the DOCTYPE line) always
begins with <html> and always ends with </html>. The slash ("/") in
"</html>" means "end" or "close". Most HTML tags are
matched this way, with the tag and its closing tag exactly corresponding to one another.
There are a few exceptions, which will be pointed out later.
- The next line, beginning with "<!--" and ending with "-->", is
a comment. Comments are text that you put into the document for your own
benefit, perhaps as a reminder of something that isn't quite finished yet and which you
intend to change later. Comments do not display in a web browser, but the viewer can
see them by asking his or her browser to show the HTML source of the page. Adding comments
to pages is a good idea, to help you remember things about the page, but comments are
never required. You can put as much text as you like inside a comment, and it doesn't have
to fit all on one line. Comments can go anywhere in your document but not inside a single
tag (for example, "<body <!-- some comment -->>" is not valid, but
"<body>....<!-- some comment -->..." is perfectly fine).
- The next few lines are the head section (the document's header) and the title
of the document. Now, it may seem redundant because the title would appear to be all that
is in the head, but that's only because this is a very simple document. Later on you
will learn how to put other things into the head, such as keywords that help search
engines to find your page.
- Notice how the head section, that is, everything from <head> through
</head>, completely contains the title (everything from <title> through
</title>)? This inclusion of one tag set completely inside another is called nesting
and it is a very important concept. Nested tags must always, always, always be
done this way, with one tag completely inside the other. For example,
<head>.....<title>....</title>...</head> is valid because the
title tag set is completely within the head tag set. But
<head>...<title>...</head>...</title> is not correct
because the title begins inside the head but tries to end outside the head. If you do this
in your HTML, some browsers may still display the page acceptably, but the markup is
invalid and other browsers may handle it differently.
- After the head section comes the document's body, which contains
everything that is actually visible in the viewer's browser. You will learn later how to
add some optional things to the <body> tag to control the document's color,
background texture, and other features. All of the HTML tags that relate to actual
content, such as paragraphs, section headings, fonts, graphics, and hyperlinks, are only
used within the body section.
- The first thing in the body is the top-level heading, marked by the
<h1>...</h1> tag set. There are six heading levels available to you, and as
you might suspect their tags are <h1>...</h1>, <h2>...</h2>, and
so on through <h6>...</h6>. The HTML standard from W3C does not specify
exactly how each heading level will look, but you can be assured that a level-one heading
will be considerably more emphasized than a level-six heading. These headings should be
used like outline levels. Headings are optional, not required. A later lesson
will talk about some recommended design practices, but you can use headings in whatever
[reasonable] way makes sense for your content.
- Here's a question: If we have a level-one heading <h1>My First
Document</h1>, then why also bother to specify the same text as a title
<title>My First Document</title>? What's the difference? The answer is that the
heading displays on the actual web page, like anything else in the body of the
document. The title is for the browser's use only, and the W3C does not specify
how it is to be used. Most modern graphical browsers, such as Netscape or Internet
Explorer, put the title in the top bar (also called the "title bar") of the
browser window itself. The W3C made the title tag separate from the heading tags so that
you, the web designer, could choose to have different text displayed in these two
locations.
- DESIGN TIP: If someone visits your page
and chooses to add it to his or her "Bookmarks" or "Favorites", it is
the title text and not the heading that will be the name of the remembered entry. A lot of
people title their home page as "Home Page" (or something equally generic), but
that's bad because the person who bookmarks the page will not be able to easily tell whose
home page that bookmark represents. So choose your page title with care!
- Following the heading is a paragraph of ordinary text, within a <p>....</p>
tag set. The </p> tag is actually optional, but adding it is good design practice
and will make your HTML easier to maintain because you can clearly see where the paragraph
ends. Part of the text, specifically the words "very little", is italicized
using the <i>...</i> tag set. There are a number of similar tags for things
like boldfacing, underlining, strikethrough, and typeface (font) selection.
- Finally, the </body> and </html> tags indicate the end of the body and the
end of the document, respectively. Notice how, looking back at the document, every tag is
correctly nested? Here are all the tags, in order:
<html><head><title>...</title></head><body><h1>...</h1><p>...<i>...</i></p></body></html>
It may help, if your document isn't working the way you want it to, to actually print out
your HTML source code (from the text editor, not the browser) and use a pen to draw
brackets around each tag set, connecting the opening of a tag with its closing tag. If you
have bracket lines crossing each other, that usually indicates a problem.
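To make the bracket-drawing idea concrete, here is a small sketch that contrasts correct and incorrect nesting. It assumes the <b> (boldface) tag, which this page mentions but does not show, and the sentences are invented for illustration.

```html
<!-- Correct: <i> opens and closes entirely inside <b>,
     so brackets drawn around each tag set never cross. -->
<p>This is <b>bold and <i>bold italic</i></b> text.</p>

<!-- Incorrect: <i> opens inside <b> but closes outside it,
     so the brackets would cross. Avoid writing this:
     <p>This is <b>bold and <i>bold italic</b> text.</i></p> -->
```

The incorrect line is kept inside a comment so that the fragment as a whole stays valid.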
This little page doesn't contain much content, but it illustrates all of the
fundamental elements of HTML. Click here to see what the page
actually looks like as rendered by your browser. To see a more complex example, ask your
browser to show you the HTML source code of this page (or any page of the tutorial).
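For a slightly larger example that stays close to what this page has covered, the sketch below uses a title that differs from the level-one heading and adds two level-two headings as outline levels. The page content and the name "Pat" are hypothetical. The keywords line assumes the commonly used meta tag, which this tutorial has not introduced yet and may cover differently in a later lesson.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<!-- The title names the page in bookmarks, so it is more
     descriptive than the on-page heading below. -->
<title>Pat's Bread Recipes</title>
<!-- Assumption: keywords are commonly supplied with a meta tag;
     the tutorial covers head contents in a later lesson. -->
<meta name="keywords" content="bread, baking, recipes">
</head>
<body>
<h1>Bread Recipes</h1>
<p>These recipes are <i>very simple</i>.</p>
<h2>White Bread</h2>
<p>The white bread recipe goes here.</p>
<h2>Rye Bread</h2>
<p>The rye bread recipe goes here.</p>
</body>
</html>
```

Saving this text in a file with a name ending in .html and opening it in a browser should show the heading and paragraphs in the page, with the title text in the browser's title bar.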