Designing a blog with html5 | HTML5 Doctor

This article is an edited version of two articles published by Opera Web Evangelist, Bruce Lawson, reproduced with permission. All rights reserved.

This article was updated 27 May 2014 to use the <main> element to denote the main content of the page (was previously <div>).

Much of HTML 5’s feature set involves JavaScript APIs that make it easier to develop interactive web pages but there are a slew of new elements that allow you extra semantics in your conventional Web 1.0 pages. In order to investigate these, let’s look at marking up a blog.

Firstly what we’ll do is use the header, footer, main and nav elements to mark up the broad structure of the page. Doing this will make your site more accessible to real people who use some assistive technologies.

Next, we’ll make the blog comments form much smarter by using the new data types and built-in validation available in HTML 5-aware browsers.

Then we’ll do some work on the guts of the page, using HTML 5’s article elements to better mark up blog posts and comments and show how to use the sectioning elements to better structure accessible hierarchical headings on sites that are CMS-driven. As blogs are chronologically ordered, we’ll see what HTML 5 offers us for representing dates and times.

So take the phone of the hook, and make a cup of tea and we’ll get started.

Setting the DOCTYPE

HTML 5, when used as plain HTML rather than its XHTML 5 sibling doesn’t need a DOCTYPE. But all browsers do, otherwise they go into Quirksmode, which you don’t want: the collision of HTML 5 and Quirksmode is like matter and anti-matter meeting, and will cause a negative reality inversion that will make your underwear catch fire.

You’ve been warned, so at the top of your document, you need the line <!DOCTYPE html>.

Some sites “use” HTML 5, when in actual fact all they’ve done is take their existing code and change the DOCTYPE. That’s fine and dandy if you’ve been using valid, semantic code as HTML 5 is very similar to valid HTML 4.01. Eric Meyer mentions small differences like “not permitting a value attribute on an image submit”, and there are a few differences between the languages, summarised in the document HTML 5 differences from HTML 4.

However, I don’t want simply to rebadge my existing code, but to use some of the new structural elements now.

Using some new structural elements

My blog – like millions of others – has a header denoted by <div id="header">, a footer <div id="footer">, some articles (wrapped by an area called “content”, <div id="content">) and some navigation (wrapped up in an area called “sidebar” <div id="sidebar">). Most sites on the Web have similar constructs, although depending on your choice they might be called “branding” or “info” or “menu”, or you might use the equivalent words in your own language.

Although these all have very different functions within the page, they use the same generic div in the markup. as HTML 4 has no other way to code them. HTML 5 has new elements for distinguishing these logical areas: header, nav, footer and friends. (There’s more on this in an artice by Lachlan Hunt on A List Apart: A Preview of HTML 5.)

The overall aim is to replace this structure:

structure chart before redesign

with this:

structure chart after redesign

It’s a simple matter to change the HTML divs into the new elements. The only difficulty I had was deciding which element to use to mark up my sidebar, as it also includes a search field and “colophon” information as well as site-wide navigation. I decided against the aside element, as the spec says it “represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content”, but it’s nevertheless content rather than navigation. So I decided to go for the nav element as the closest fit.

I’ve wrapped the main content in a <main> element, which maps to the ARIA role=main so that screenreaders and other assistive technologies can easily find the main content. Note that you should only use this element once per page.

New form attributes

As well as the main structural item on the page, I’ve added some new attributes on input elements in the comments form.

HTML 5 was designed to reflect what developers actually do rather than to reflect a philosophical purity, and that’s very clearly manifested in the new attributes which mean the browser takes up much of the work currently done by developers re-inventing validation routines in JavaScript. (Extending forms functionality was how the HTML 5 spec began).

So I amended the email input field in the comments to be input type="email" which means that when the user submits the form, an HTML 5-aware browser will use built-in validation routines to check that it’s in the correct format, without any JavaScript. Check it out in the current version of Opera, as that the only full implementation at the time of writing (June 2009), and note that it also adds a “mail” icon in the input field as a cue to the user.

There are similar input validation routines triggered by some of the new input types, such as url (which I use on the field that asks for the user’s web address), number and pattern which compares the input with an arbitrary regular expression.

Another good example is input type="date", which pops up a calendar widget/ date picker when the user focusses on the input field. You can test the date picker, or here’s a screenshot from Opera 9.6:

date picker pops up next to input field

A very useful new attribute I used on both input fields for commenter’s name and email address, and the comment textarea was type="required" which generates a message on submission if the fields are left blank. Each different localisation of a browser would need to have a different message, and it’s not (so far) customisable by the developer.

red-bordered browser message 'You have to specify a value' next to unfilled required field

The good thing with all this new form fabulousness is it’s all formed around new attributes on an exisiting element, so people using older browsers just see a plain old input field.

A note on screenreaders and HTML 5

I hope screenreaders are ready for these new interactions; I asked the html 5 group to formally invite screenreader vendors to participate in the specification; to my knowledge, none have done so. However, using <header<, <footer>, <nav> and <main> gives real benefits to some assistive technology users.

Laying out the new elements

It’s not too hard to layout the new elements. In all the non-IE browsers, you can lay out anything using CSS – even a nonsense element. One gotcha is that that the current crop of browsers have no “knowledge” of these elements, although support is improving all the time.

All browsers have default settings for the elements they “know about”—how much padding, margin they element gets, does it display as a block or inline?. (Here’s a sample default stylesheet.) Unless over-ridden by CSS, these defaults apply. But the browsers don’t know about header, nav and the like, so have no defaults to apply to them.

I got terrible rendering oddities until I explicitly told the browsers

header, footer, nav, article, main {display:block;}

IE layout

There’s one gotcha about styling HTML 5 pages in IE: it doesn’t work. You can force it to quite easily with a JavaScript hack document.createElement('element name'). (Lachlan Hunt gets to the bottom of why IE needs this.)

For your convenience, Remy Sharp has written an HTML 5 enabling script which I use in the header to conjure all the missing elements into existence all at once.

But let’s be clear: this won’t work if your IE user doesn’t have JavaScript turned on. How much of that’s a big deal that is for you to decide. Pragmatically, it seems to me that the sites that will benefit the most from HTML 5’s new “Ajax”-style features, such as drag-and-drop, are the sites that currently have no hope of working without JavaScript.

Firefox 2 and Camino 1 layout

Firefox 2 and Camino 1 both use Gecko 1.9 which has a bug and so gets grumpy if you don’t serve the page as XHTML. (Go figure; that’s almost enough to trigger a negative reality inversion and you know how bad that can be). However, Firefox and Camino users upgrade frequently so Firefox is in version 3, while Camino 2 beta is out, so the problem will soon cease to exist. (Read more at How to get HTML5 working in IE and Firefox 22 by Remy Sharp.)

What’s the point of those new structural elements?

Well, they add semantics to the page. The browser now knows which area of your site is the header or the footer because there are header and footer elements, whereas div might be called “branding” and “legal”, or even “en-tete” and “pied-de-page” or “kopfzeile” and “fusszeile“.

But so what?

Robin Berjon expressed it beautifully in a comment on A List Apart:

Pretty much everyone in the Web community agrees that “semantics are yummy, and will get you cookies”, and that’s probably true. But once you start digging a little bit further, it becomes clear that very few people can actually articulate a reason why. So before we all go another round on this, I have to ask: what’s it you wanna do with them darn semantics?

The general answer is “to repurpose content”. That’s fine on the surface, but you quickly reach a point where you have to ask “repurpose to what?” … I think HTML should add only elements that either expose functionality that would be pretty much meaningless otherwise (e.g. canvas) or that provide semantics that help repurpose for Web browsing uses.

In my view, there are a couple of things I want to do with those semantics. The first is for search engine use; it’s easy to imagine Messrs Google or Yahoo! giving lower weighting to content in footer elements, or extra weight to content in the header. The second reason is to make the site navigable for people with disabilities. People with learning difficulties might instruct their browser always to put the articles before the navigation, for example. User agents might very well have a keyboard shortcut which jumped straight to the nav for example, or jumped straight past the navigation, in a programmatic implementation of “skip links“.

Which brings me to…

Further refining the HTML 5 structure

The blog home page

An interesting thing about a blog homepage is that there are generally the last 5 or so posts, each with a heading, a “body” and data about the post (time, who wrote it, how many comments etc.) and usually a link to another page that has the full blog post (if the homepage just showed an excerpt) and its comments.

HTML 5 has an article element which I use to wrap each story:

The article element represents a section of a page that consists of a composition that forms an independent part of a document, page, or site. This could be a forum post, a magazine or newspaper article, a Web log entry, a user-submitted comment, or any other independent item of content.

Let’s look in more detail at the guts of how I mark up each blogpost.

Anatomy of a blog post

diagram of article structure; explanation follows

The wrapper is no longer a generic div but an article. Within that is a header, comprising a heading (the title of the blogpost) and then the time of publication, marked up using the time element.

Then there are the pearls of wit and wisdom that consitute each of my posts, marked up as paragraphs, blockquotes etc., and is pulled unchanged out of the database. Following that is data about the blog post (category, how many comments) marked up as a footer and, in the case of pages that show a single blogpost, there are comments expressing undying admiration and love. Finally, there may be navigation from one article to the next.

Data about the article

Following the content there is some “metadata” about the post: what category it’s in, how many comments there are. I’ve marked this up as footer. I previously used aside which “represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content” but decided that it was too much of a stretch; data about a post is intimately related.

footer is a much better fit: “A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.” I was initially thrown off-course by the presentational name of the element; my use here isn’t at the bottom of the page, or even at the bottom of the article, but it certainly seems to fit the bill – it’s information about its section, containing author name, links to related documents (comments) and the like. There’s no reason that you can’t have more than one footer on page; the spec’s description says “the footer element represents a footer for the section it applies to” and a page may have any number of sections. The spec also says “Footers don’t necessarily have to appear at the end of a section, though they usually do.”

Comments

I’ve marked up comments as articles, too, as the spec says that an article could be “a user-submitted comment”, but nested these inside the parent article. The spec says

When article elements are nested, the inner article elements represent articles that are in principle related to the contents of the outer article. For instance, a Web log entry on a site that accepts user-submitted comments could represent the comments as article elements nested within the article element for the Web log entry.

These are headed with the date and the time of the comment and name of its author—if you want, you can wrap these in a header, too, but to me it seems like markup for the sake of it.

Times and dates

Most blogs, news sites and the like provide dates of article publication.

Microformats people, the most vocal advocates of marking up dates and times, believe that computer-formatted dates are best for people: their wiki says “the ISO8601 YYYY-MM-DD format for dates is the best choice that is the most accurately readable for the most people worldwide, and thus the most accessible as well”. I don’t agree (and neither do candidates in my vox pop of non-geeks, my wife, brother and parents).

I do agree with the microformats people that hidden metadata is not as good as visible, human-readable data and therefore elected not to use the pubdate attribute of article.

Therefore I’ve used the HTML 5 time element to give a machine parsable date to computers, while giving people a human-readable date. Blog posts get the date, while comments get the date and time.

The spec is quite hard to understand, in my opinion, but the format you use is 2004-02-28T15:19:21+00:00, where T separates the date and the time, and the + (or a -) is the offset from UTC. Dates on their own don’t need a timezone; full datetimes do. Oddly, the spec suggests that if you use a time without a date, you don’t need a timezone either.

There’s considerable controversy over the time element at the moment. Recently one of the inner circle, Henri Sivonen, wrote that it’s for marking up future events only and not for timestamping blogs or news items: “The expected use cases of hCalendar are mainly transferring future event entries from a Web page into an application like iCal.” This seems very silly to me; if there is a time element, why not allow me to mark up any time or date?

The spec for time does not mention the future event-only restriction: “The time element represents a precise date and/or a time in the proleptic Gregorian calendar” and gives three examples, two of which are about the past and none of which are “future events”. Although the spec doesn’t (currently) limit use of the element, it does limit format to precise dates in “the proleptic Gregorian calendar”. This means I can mark up an archive page for “all blog posts today” using time, but not “all July 2008 posts” as that’s not a full YYYY-MM-DD date. Neither can you mark up precise, but ancient dates, so the date of Julius Ceasar’s assassination, 15 March 44 BC is not compatible.

I don’t expect this to be resolved. If you think as I do, feel free to mail the Working Group to express your feeling!

Hopefully, this brief article (geddit?) has given you a quick overview of how to use some of the new semantic elements. Let me know what you think!