Build A The Web

Introduction

This is Build A The Web, a book devoted to taking us, people with basic programming and computer knowledge, and upgrading us into web programmers. I am Cube Drone and I'll be your host.

The best kind of textbook about the web is made out of HTML and filled with media and links to other content. That is what the web is all about, not dusty ol' textbooks that are no longer valid because they cover a version of the web that went obsolete nine years ago.

This book was last updated May 1, 2020. If that's a long time ago, I might have died. Avenge me!

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This site, and all of the resources used to generate this site, including class materials and examples, are available at https://github.com/cube-drone/buildatheweb. That's also a good place to comment, suggest updates, file bugs, and generally improve the book if you feel that it needs some attention.

What are we going to learn?

Becoming a web developer involves a lot of skills! This handy Web Developer Roadmap lays them out in a way that's pretty sensible, and we're going to try our best to carve a path through many of these skills that leaves us ready to build some cool websites.

Okay, let's get started. Are you ready?

Chapter 1: Request, Response

Our first chapter is going to focus on what happens behind the scenes when we make a web request. What that means is a crash course in computer networking.

Uniform Resource Locators

Let's start by looking at what happens when we crack open a web browser and type in:

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

This is a URL, which stands for Uniform Resource Locator.

This URL uniquely identifies a document somewhere on someone else's computer that we are going to request from that computer.

This divides into protocol, domain name, path, parameters, fragment, locus, and spindle. Memorize all of these terms now.

Dissecting a URL

Protocol

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

The protocol describes how to connect.

Domain Name

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

The domain name describes where to connect to.

Path

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

The path describes what is being requested.

Parameters

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

The parameters describe extra arguments for the thing being requested.

Fragment

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

The fragment describes a specific part of the document that we want to look at.

Locus & Spindle

the locus and spindle were in your heart the entire time
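For the five parts that do exist, here is a rough sketch of the same dissection using Python's standard `urllib.parse` module. The function name `dissect` and the dictionary keys are made up for this example; only `urlsplit` and `parse_qs` come from the library.

```python
from urllib.parse import urlsplit, parse_qs

URL = "https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology"

def dissect(url):
    """Split a URL into the five parts described above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain name": parts.hostname,
        "path": parts.path,
        "parameters": parse_qs(parts.query),  # each value comes back as a list
        "fragment": parts.fragment,
    }

for name, value in dissect(URL).items():
    print(f"{name}: {value}")
```

No locus or spindle turns up in the output, which is about what we'd expect.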

This is just a bunch of words, describing where we think our document is. How do we actually get that document?

Transmission Control Protocol / Internet Protocol

Communicating with a far away computer is a process fraught with interesting problems. Problems like:

How do we make sure that messages reliably arrive?
How do we make sure that messages arrive in the right order?
How does our computer communicate with our router?
How does our router communicate with our modem?
How do we get messages to travel across a thin strip of copper, or fiberoptic cable, at all?

Most of these problems are quite complicated — and the lower level we get, the more likely it is that we'll need to consult an electrical engineer to explain signal processing theory to us. Believe me, that is the last thing that we want.

Fortunately, smart people have already solved most of these problems for us. The solutions to these problems stack up on top of one another: at the bottom, electrical engineers figuring out how to send messages across wires; at the top, math PhDs figuring out how to make sure that messages arrive reliably in a fixed order.

We have two protocols at the very top of the stack that define how we communicate between computers: IP, the Internet Protocol, which defines how we send messages across the network, and TCP, the Transmission Control Protocol, which makes sure that our messages arrive completely, in the right order, and uncorrupted.

Protocol & Stack

I'm going to say the word protocol a lot, and it's probably important that I establish what that means. In Computing Science, we learn the difference between an algorithm and a program — an algorithm describes a specific way of solving a problem, whereas a program is the actual code that we need to run the algorithm. We could have five different programs, all implementing the same algorithm.

A protocol is an algorithm for communication. It delineates the rules of communication.

In the same way that a program is an implementation of an algorithm, a stack is an implementation of a protocol. A protocol is an algorithm, and a stack is a program to implement a protocol. So, in order to run TCP/IP, our computer runs the TCP/IP stack, which implements the TCP/IP protocol.

IP Address & Sockets

The abstraction presented by TCP/IP is simple: every computer has an IP address. An IP address looks like this: 192.0.2.0 — or, like this: 2001:DB80:c501:17ef:a063:a37f:3803:5c1a. These are just identifiers that communicate a unique identity for the computer in question.

If we know our IP address, and the IP address of the computer that we want to communicate with, and that other computer is online, we can trust that IP will get the message to that computer. We can use TCP to open a socket to a specific port on our target computer.

Giant Walls of Plugs

Imagine these computers like giant walls of plugs — or ports — and when we open a connection, TCP creates a two way communication link between two ports with a socket on each end.

There are thousands of these ports — they're numbered from 1 to 65,535. In order to keep things tidy, each different protocol that runs on top of TCP usually runs on a different port. Of course, most computers aren't communicating on all of these ports at once — in fact, there are six ports that, on most computers, get more use than all of the rest of them combined:

25, for the Simple Mail Transfer Protocol (SMTP)
53, for the Domain Name System (DNS)
67 and 68 for the Dynamic Host Configuration Protocol (DHCP)
80 for the HyperText Transfer Protocol (HTTP)
443, for the HyperText Transfer Protocol over Transport Layer Security (HTTPS)

We're going to cover all of these protocols in detail at one point or another. editor's note: no, we're not

For every port that we could contact on a remote computer, there's a program on the remote computer that's running, listening, and potentially responding to our requests.
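To make the wall of plugs a little less abstract, here is a minimal sketch in which one Python program plays both roles: a tiny server listening on a port, and a client opening a socket to it. Everything stays on the loopback address 127.0.0.1, so no real network is involved. The function names `shout_back` and `demo` are invented for this example.

```python
import socket
import threading

def shout_back(listener):
    """Server side: accept one connection and reply in upper case."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())

def demo():
    # Port 0 asks the operating system for any free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=shout_back, args=(listener,))
    server.start()

    # Client side: open a socket to that IP address and port.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello, server")
        reply = client.recv(1024)

    server.join()
    listener.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

A real server would loop forever and handle many connections at once; this one answers a single request and goes home.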

Server vs. Client

A computer that stays connected to the internet all of the time and responds to these requests is called a server, and server programming is half of the battle of web programming. The second half is client programming, which describes the parts of the transaction that occur on the customer's side of things.

DHCP

In order to communicate with a remote server, then, we need three things — our IP address, their IP address, and a port number, to communicate with.

First of all, let's talk about how we got our IP Address.

Well, the short answer is, our computer already knows its IP address. We just ask.

How did our computer get its IP address? Well, when we connected it to the router — either via WiFi or by plugging it in — it communicated to the router, using DHCP, the Dynamic Host Configuration Protocol, where our computer asked the router to assign it an IP Address.

Then, how did the router that gave us an IP address get its own IP address? Well, when we connected the router to the internet, either by plugging it in to a modem or by plugging it in to another link in the network, it also communicated to a router, and communicated using DHCP. It asked "What is my IP?".

Network Address Translation

If we ask our computer to tell us its IP address, it'll probably report something that starts with 10.0 or 192.168 — but, if we go to Google and ask "what is my IP?", it'll tell us a completely different IP address.

What gives? How can our computer have more than one IP address?

As part of our deal with our ISPs, we usually get just one IP address. Just the one. Presumably, we have more than one device in our home: a computer, a cell phone, a laptop, a second computer, a smart TV, a toaster that connects to the internet for some reason, a third computer, a toothbrush that connects to the internet for some reason. Things have really gone out of control lately.

All of these devices need to share the one IP address, so, our router creates a little private network, just for us, in our home. In this private network, any computer can have any IP address that it wants. By convention, the IP addresses for use in private networks start with 192.168 or 10.0. Then, when we're connecting to the outside world, our router translates our IP address in the private network into our public IP address.

This is NAT, Network Address Translation.

The router creates a private network for all of our devices and assigns them local IP addresses.

When our devices make requests to the internet, the router translates them into the public-facing IP address.

a diagram of a request being made through the router

When the internet responds, the router remembers who requested the content and forwards the response back to that IP address.

a diagram of the router returning the request's response to the source computer
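Here is a toy model of that bookkeeping, assuming the router tells conversations apart by handing each one its own public port number (a common approach, though real routers track more than this). The `Router` class, the public address 198.51.100.7, and the starting port 40000 are all invented for the sketch.

```python
import ipaddress

class Router:
    """A toy NAT table: many private addresses share one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}          # public port -> (private ip, private port)
        self.next_port = 40000   # arbitrary starting point for this sketch

    def outbound(self, private_ip, private_port):
        """Translate a private source address into the public one."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port):
        """Look up which private device a response belongs to."""
        return self.table[public_port]

router = Router("198.51.100.7")
print(router.outbound("192.168.1.20", 51000))  # what the internet sees
print(router.inbound(40000))                   # where the reply gets forwarded

# The standard library knows which addresses are reserved for private networks.
print(ipaddress.ip_address("192.168.1.20").is_private)
print(ipaddress.ip_address("8.8.8.8").is_private)
```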

Domain Name System

In order to communicate with a server, we need both our own IP address and the IP address of the computer that we want to communicate with.

We have our own IP address — now we need to find the address of the server that we want to talk to.

Let's look at the link we're trying to access.

https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology

There's no IP address anywhere in this link. There is a domain name, en.wikipedia.org

In order to find the IP address for this server, we're going to have to start by consulting a DNS server. DNS stands for Domain Name System, and the process of converting a domain name into an IP address is called name resolution.

How do we know where the DNS server is? When we use DHCP to connect to WiFi, it also provides us with the IP address of the nearest DNS server, which is usually being maintained by our ISP. Acronyms!

Turkish Protesters

Turkish protestors have written Google's DNS information, in spraypaint, on a building

Google also maintains a public DNS server at the address 8.8.8.8, which is good to know in case our local DNS server ever goes down or is interfered with by a totalitarian government.

So our computer sends a request to the DNS server, asking where to find en.wikipedia.org.

If the server already knows where en.wikipedia.org is, then it responds with the IP address. Let's imagine, though, that the server doesn't know.

DNS Root servers are distributed all over the globe, and they keep track of exactly one thing: the IP addresses of the computers responsible for the recordkeeping of top level domains, like

.com
.net
.org
.photo
.click
.ninja
.unicorn
.fun
.ooo
.plumbing
.oh my god top level domains are just getting dumber and dumber

So, the DNS server looks at the domain name we've given it — en.wikipedia.org — sends a request to the root server, and asks "which servers can I ask about .org records?"

The root server will reply with a list of IP addresses responsible for .org records. These are the addresses of Top Level Domain Servers, which are maintained by domain registries. We can pay a Domain Registrar, a reseller of sorts, about ten US dollars a year to create and maintain a record for us, so long as nobody else has claimed that domain name already. For a pittance, I now own http://lassam.net.

Next, the DNS server asks one of those .org servers which nameservers are responsible for wikipedia.org. Finally, it queries the wikipedia nameservers, asking them where it can find en. If we paid a registrar to put up a domain name for us, they'll usually throw in a nameserver for free; all we have to do is write some DNS rules that tell the nameserver what IP address we want to point at.

DNS rules are written in a cryptic language that contains records with names like A, MX, AAAA, and AAAAAAAAAAAAAAAAHH SPIDERS — I'm sorry, there was a spider next to the keyboard.

So, Wikipedia's nameservers report that en.wikipedia.org is located at, say, 203.0.113.98. Finally, after that entire protracted process, we know where wikipedia is.
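The whole chain of questions can be sketched with a few dictionaries standing in for the servers. Nothing here touches the real DNS: the server names are invented, and the only address is the example one from the paragraph above. The sketch also assumes the registered domain is always the last two labels, which isn't true for every top level domain.

```python
# Hypothetical, hard-coded stand-ins for real DNS servers.
ROOT_SERVER = {"org": "org-tld-server"}
TLD_SERVERS = {"org-tld-server": {"wikipedia.org": "wikipedia-nameserver"}}
NAMESERVERS = {"wikipedia-nameserver": {"en.wikipedia.org": "203.0.113.98"}}

def resolve(domain, cache):
    """Resolve a domain name the long way, remembering the answer."""
    if domain in cache:
        return cache[domain]            # the server already knows

    labels = domain.split(".")
    tld = labels[-1]                    # "org"
    registered = ".".join(labels[-2:])  # "wikipedia.org"

    tld_server = ROOT_SERVER[tld]                      # ask the root server
    nameserver = TLD_SERVERS[tld_server][registered]   # ask the .org server
    address = NAMESERVERS[nameserver][domain]          # ask Wikipedia's nameserver

    cache[domain] = address
    return address

cache = {}
print(resolve("en.wikipedia.org", cache))
print(resolve("en.wikipedia.org", cache))  # second time comes from the cache
```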

HyperText Transfer Protocol

Our next step is to use TCP/IP to create a connection between our IP address, and the IP address that we just resolved from DNS.

There's only one thing left that we need — a port number. We also didn't specify a port number as part of the URL, but we did specify a protocol, HTTPS — the HyperText Transfer Protocol feat. Transport Layer Security — and when we specify a protocol without a port number, our connection automatically goes to the default port for that protocol. In the case of HTTPS, that's 443.
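As a small sketch of that rule: use the port in the URL if there is one, otherwise fall back to the protocol's default. The `DEFAULT_PORTS` table and the `port_for` function are invented for this example and only cover the two protocols we care about here.

```python
from urllib.parse import urlsplit

# Just the two defaults mentioned in this chapter.
DEFAULT_PORTS = {"http": 80, "https": 443}

def port_for(url):
    """Return the explicit port from the URL, or the protocol's default."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS[parts.scheme]

print(port_for("https://en.wikipedia.org/w/index.php?title=Blinkenlights"))
print(port_for("http://example.com:8080/"))
```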

The HyperText Transfer Protocol (HTTP) is the protocol responsible for moving documents around. Request a document? Get a document. The rules for that are laid out in the HyperText Transfer Protocol, which is the protocol that powers pretty much the entire web as we know it.

Our URL's protocol is HTTPS, though, not just HTTP. The difference is slight but important — HTTPS is the same as HTTP, but over a connection encrypted with Transport Layer Security (TLS). This prevents J. Random Hacker from watching every HTTP request that goes by.

Once we've created this encrypted communication path between our computer and the faraway server, we need to construct a HTTP Request. It'll look something like this:

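GET /w/index.php?title=Blinkenlights&action=edit HTTP/1.1
Host: en.wikipedia.org

This is a request to GET whatever's at the path of the URL we provided to our browser, parameters included. The fragment, #Etymology, doesn't make the trip: the browser keeps it and uses it to scroll to the right part of the document once it arrives.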

It also includes Headers with the request — sets of key and value that communicate extra information to the server. In this case, the only header we've included is "Host".
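Here is a rough sketch of how a request like that could be assembled from a URL. The function name `build_get_request` is made up, and a real browser sends many more headers than just Host. Note that the sketch leaves the fragment out of the request line, since browsers keep that part to themselves.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build the text of a bare-bones HTTP/1.1 GET request."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "",   # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)  # HTTP separates lines with CRLF

url = "https://en.wikipedia.org/w/index.php?title=Blinkenlights&action=edit#Etymology"
print(build_get_request(url))
```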

The server will receive this request, and respond with a HTTP Response containing the sweet webpage we've been looking for this entire time.

Virtually everything in web programming happens in the space between the HTTP request and the HTTP response. Figuring out how to respond, quickly, with the right stuff is the meat and potatoes of web programming. This bit, right here. It's all the marbles. Empires have risen and fallen, all dependent on the simple gap of how a server converts this HTTP request into a HTTP response.

And then, Wikipedia responds. The full HTTP response is several pages long; we can look at it here. In order to keep my book neat and tidy, though, I'm going to concoct a fake response for the sake of example:

HTTP/1.1 200 OK
Content-language: en
Content-type: text/html; charset=UTF-8
X-Clacks-Overhead: GNU Terry Pratchett

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Definitely Wikipedia</title>
</head>
<body>
    <h1>This is totally Wikipedia.</h1>
    <p> Hi there. I am Bob Wikipedia and you are at my website.
    It’s still under construction but I am pretty sure it will be done by 1998.</p>
    <img src="https://media0.giphy.com/media/K5Yn9JCXcrXr2/giphy.gif">
</body>
</html>

It opens with the version of HTTP that's currently running, as well as a HTTP Status Code. So long as the HTTP Status Code is 200 OK we're good to go.

A Brief And Mostly Inaccurate Reference Chart for HTTP Status Codes

A more accurate look at HTTP status codes can be found here.

Alternatively, pictures of cats matching every HTTP status code exist at http.cat.

After the HTTP version and status code, there are Headers again. These headers describe important properties of the file that's been returned.

Content-language: en
Content-type: text/html; charset=UTF-8
X-Clacks-Overhead: GNU Terry Pratchett

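Most headers are defined in the protocol itself, but we can add any headers that we want. The long-standing convention is to start custom headers with X-, which is how we can sneak in the clacks. (That convention was officially deprecated in 2012 by RFC 6648, but it's still everywhere.)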

After the headers, we get into a big patch of HTML. How do we know that it's HTML? Well, the Content-type header referred to this as text/html, so we can be pretty sure that we've got a big handful of HTML.
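To see how those three pieces come apart, here is a minimal sketch that splits a shortened copy of the fake response into its status line, headers, and body. The `parse_response` function is invented for this example and ignores plenty of real-world wrinkles, like repeated headers and chunked bodies.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-language: en\r\n"
    "Content-type: text/html; charset=UTF-8\r\n"
    "X-Clacks-Overhead: GNU Terry Pratchett\r\n"
    "\r\n"
    "<!DOCTYPE html>\n<h1>This is totally Wikipedia.</h1>\n"
)

def parse_response(raw):
    """Split a raw HTTP response into status, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names ignore case

    return version, int(code), reason, headers, body

version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)
print(headers["content-type"])
```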

Hypertext Markup Language

I'm going to do a dramatic reading of a passage from "In The Beginning Was The Command Line". Despite being almost 20 years old, it's a stonkingly accurate diatribe on computer culture and a fun historical record from the before times.

This crud is called HTML (HyperText Markup Language) and it is basically a very simple programming language instructing your web browser how to draw a page on a screen. Anyone can learn HTML and many people do. The important thing is that no matter what splendid multimedia web pages they might represent, HTML files are just telegrams.

When Ronald Reagan was a radio announcer, he used to call baseball games by reading the terse descriptions that trickled in over the telegraph wire and were printed out on a paper tape. He would sit there, all by himself in a padded room with a microphone, and the paper tape would eke out of the machine and crawl over the palm of his hand printed with cryptic abbreviations. If the count went to three and two, Reagan would describe the scene as he saw it in his mind's eye: "The brawny left-hander steps out of the batter's box to wipe the sweat from his brow. The umpire steps forward to sweep the dirt from home plate." and so on. When the cryptogram on the paper tape announced a base hit, he would whack the edge of the table with a pencil, creating a little sound effect, and describe the arc of the ball as if he could actually see it. His listeners, many of whom presumably thought that Reagan was actually at the ballpark watching the game, would reconstruct the scene in their minds according to his descriptions.

This is exactly how the World Wide Web works: the HTML files are the pithy description on the paper tape, and your Web browser is Ronald Reagan.
Neal Stephenson, "In The Beginning Was The Command Line"

So, the HTML that Wikipedia has returned to us contains a description of the content that we're looking at, and then our web browser renders it into a webpage.

It would seem like, here, our journey is complete. We've made a round trip between our device and the server, and we're done — but, not quite! When the browser renderer gets to this part of the HTML:

<img src="https://media0.giphy.com/media/K5Yn9JCXcrXr2/giphy.gif"/>

This image tag references content that exists at another URL. And so, we kick off this entire process again, from start to finish, to get whatever it is at that new address. The trick is, though, instead of sending us HTML, this server will respond with an animated image. Modern web-pages may require dozens of requests to various images and scripts before they're completely rendered.
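Here is a sketch of how a program might find those follow-up URLs, using Python's built-in `html.parser`. The `ImageFinder` class and `find_images` function are invented for this example, and they only look at image tags; a real browser also chases down scripts, stylesheets, fonts, and more.

```python
from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    """Collect the src of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

def find_images(html):
    finder = ImageFinder()
    finder.feed(html)
    finder.close()
    return finder.sources

PAGE = """
<h1>This is totally Wikipedia.</h1>
<img src="https://media0.giphy.com/media/K5Yn9JCXcrXr2/giphy.gif">
"""

# Each of these URLs would kick off a brand new request.
print(find_images(PAGE))
```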

Chapter 1 Summary

Every browser request–response goes a little something like this:

Our browser uses DNS to resolve the URL's domain into an IP address.
Our browser uses TCP to create a two-way connection with the server at that IP address.
If the URL's protocol is HTTPS, then a TLS connection is made to the server.
Our browser sends a HTTP request to the server.
The server does some stuff.
A HTTP response is sent back to us, containing a document.
If that document is HTML, the browser will render that HTML into a webpage.
If that document contains other URLs, the browser will create new requests for each of them.

Chapter 2: HyperText

We've talked about HyperText Transfer Protocol and HyperText Markup Language, so what in the blue blazes is HyperText?

Garden of Forking Paths

In 1945, Vannevar Bush described a hypothetical device called the memex which allowed its users to organize content in a way that allowed for it to be indexed, searched, and linked to other content.

While a memex, as described, was never built, the concept of a network of interlinked documents captured the public imagination.

In the 1960s, Ted Nelson popularized the term "Hypertext" for the same concept. Engineers spent the next 30 years working on various implementations until Tim Berners-Lee finally produced a popular implementation in 1989.

Tim Berners-Lee

The clever dude who figured this out was Tim Berners-Lee, who was looking for a way to take the peanut butter of 'hypertext' and the chocolate of 'a world-wide network of computers' and mash them together.

He developed HTTP, HTML, and the URL, technologies that worked in concert to produce a world-spanning network of interlinked content that he called the "WorldWideWeb".

That means that when he lists his job title as "Web Developer", this is both literally true and a wonderful example of subtle understatement.

As with most things, there's a lot of history and nuance behind the term, and also as with most things, the way that most people understand the term completely discards all of that history and nuance: HyperText is text with a whack of extra stuff strapped on, links and images and style and fonts and videos of cats and scripts.

Links and URLs

There are two big questions of hypertext:

How do we represent a link to another document in a document?
How do we represent a diverse array of multimedia content in a simple file format?

The answer to both of these questions is the humble URL. A URL can represent a link to any sort of content.

So HTML documents are littered with URLs, both as links to other documents and as a means to embed multimedia content in a wide variety of formats.

Text Files and Markup Language

Under the hood, all file formats are just binary thinly disguised as something else.

When we think of file formats for text, our first thought might be of Word, from Microsoft Office — a format that also contains information about how to properly render the document, fonts, document layout details, and much much more. Unfortunately, Word file formats — of which there are several at this point — are legendary tarpits of horrendous complexity, closed off so that they can only be understood by Microsoft engineers, a hideous conglomeration of text and binary that make Word the file format equivalent of the last 15 minutes of Akira.

Pictured: An OpenOffice developer trying to make sense of Word's file format.

A common way to manage the complexity associated with complicated documents is to represent them as simple plain-text files. It's not quite as simple and efficient as storing documents in pure binary representations, but the file formats end up being human-readable by anybody with a text editor, which is great for the clarity and interoperability of the format.

But the problem of plaintext becomes quickly clear:
it doesn't support presentational elements very well.
How do you include a list of items? A table? An image?
How do you define font sizes and margin widths?
There need to be rules that a computer can follow
to take a plaintext file and turn it into
a proper document.

Furthermore, we shouldn't show these rules to
the user.

A markup language allows us to mark
the text with additional information that isn't shown
to the user, but indicates additional information
like emphasis, color, and font.

or, to look at that last line, again, in Hypertext Markup Language:

A <strong>markup language</strong> allows us to mark
the text with additional information that isn't shown
to the user, but indicates additional information
like <em>emphasis</em>,
<span style='color:red;'>color</span>, and
<span style='font-family:chunk'>font</span>.

HTML — Hypertext Markup Language — is not the only markup language — many of them exist.

HTML is based on a language called SGML, which was based on a language called GML, which was created at IBM in the 1970s as a documentation language.

Technically, HTML qualified as a subclass of SGML until the introduction of HTML5, which finally cut all remaining ties with the old standard.

Some Other Markup Languages

Another child of SGML is XML. While HTML was intended to be a representation of documents and the sort of data that would commonly appear in documents — tables, images and such — XML is a similar format that was designed as a general-purpose markup language for any kind of data.

Some XML might look like this:

    
<employee>
    <name>Barry Fudgechampion</name>
    <age>27</age>
    <salary>Low</salary>
    <odor>Potent</odor>
</employee>

Markdown is a markup language that's designed to be interoperable with HTML and easy for humans to read and edit, even when it's just plain text.

Some Markdown might look like this:

    

# Laziness, Impatience, and Hubris

    "We will encourage you to develop the three great virtues of a programmer:
    laziness, impatience, and hubris." -- Larry Wall, Programming Perl (1st edition)

Take a look at [this link](http://wiki.c2.com/?LazinessImpatienceHubris)
for more information.

## Laziness

The quality that makes you go to great effort to reduce overall energy expenditure.
It makes you write labor-saving programs that other people will find useful,
and document what you wrote so you don't have to answer so many questions about it.
Hence, the first great virtue of a programmer.
Also hence, this book. See also _impatience and hubris_.

## Impatience

The anger you feel when the computer is being lazy.
This makes you write programs that don't just react to your needs,
but actually anticipate them.
Or at least pretend to. Hence, the second great virtue of a programmer.
See also _laziness and hubris_.

## Hubris

Excessive pride, the sort of thing Zeus zaps you for.
Also the quality that makes you write (and maintain) programs that other people won't
want to say bad things about. Hence, the third great virtue of a programmer.
See also _laziness and impatience_.

JSON (JavaScript Object Notation) is a quite simple markup language that's also valid JavaScript, which makes it very easy for JavaScript to handle.

Some JSON might look like this:

    
{"name": "Barry Fudgechampion",
 "age": 27,
 "salary": "Low",
 "odor": "Potent"}

YAML (originally Yet Another Markup Language, now officially YAML Ain't Markup Language) is a markup language that's designed to represent complex objects while staying easy for humans to read and edit.

Some YAML might look like this:

    
- name: Barry Fudgechampion
  age: 27
  salary: Low
  data: |
     There once was a short man from Ealing
     Who got on a bus to Darjeeling
         It said on the door
         "Please don't spit on the floor"
     So he carefully spat on the ceiling

Some of these markup languages are easier for humans to read. Some of them are easier to code with. Some of them are more strict and some of them are less strict. Some of them are intended to represent data and some of them are intended to represent documents. But they are all a way to use plain text to represent something more complicated than plain text.

All of these different markup languages are good for different purposes.

HTML is good for representing multimedia documents that link to one another.
JSON is good for transmitting data in a simple format between systems.
YAML is good for human-written data, like configuration files.
Markdown is good for human-written multimedia documents that are still readable in their text format.
XML is good for representing objects that have a strictly defined format.
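To show that these really are interchangeable ways of writing down the same thing, here is a small sketch that reads Barry's record from XML and writes it back out as JSON, using only Python's standard library. The helper name `xml_to_dict` is invented, and it only handles flat records like this one.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = """
<employee>
    <name>Barry Fudgechampion</name>
    <age>27</age>
    <salary>Low</salary>
    <odor>Potent</odor>
</employee>
"""

def xml_to_dict(text):
    """Turn a flat XML record into a dictionary of tag -> text."""
    root = ET.fromstring(text.strip())
    return {child.tag: child.text for child in root}

record = xml_to_dict(XML_TEXT)
as_json = json.dumps(record, indent=1)
print(as_json)

# Reading the JSON back gives us the same record we started with.
print(json.loads(as_json) == record)
```

One wrinkle worth noticing: XML has no idea that 27 is a number, so the age comes through as the text "27". JSON could store it as a real number, which is one reason it's popular for shipping data between systems.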

Bad news: You're probably going to have to learn all of these markup languages at one point or another.

Good news: They're not very difficult to learn!

Character Encoding

It's possible, through the use of these different markup languages, to represent just about anything as a plain-text file.

"A text file" is a little bit disingenuous, though. There's no such thing as just a "text file". A text file is, under the hood, a series of characters represented as binary data.

How, then, do we represent a text file as binary data?

There are numerous schemes for converting character data into binary data, and these schemes are known as character encodings.

ASCII

An early common character encoding is ASCII, which is simple: 7 bits can make 128 different combinations. That's more than enough room for all 26 Latin alphabet characters in both lowercase and uppercase, and all of the numbers and symbols on a standard US keyboard, with room left over for some control codes like "new line", "tab", "backspace", and "bell".

Here's a segment of the ASCII table:

Decimal	Hex	Binary	Character
96	60	01100000	`
97	61	01100001	a
98	62	01100010	b
99	63	01100011	c
100	64	01100100	d

The full table, as well as more detail than you could possibly imagine, is available at Wikipedia.

hello, world in ASCII would read as:

1101000   h
1100101   e
1101100   l
1101100   l
1101111   o
0101100   ,
0100000  
1110111   w
1101111   o
1110010   r
1101100   l
1100100   d
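That table doesn't have to be looked up by hand. As a quick sketch, Python can encode the string as ASCII and print each byte as 7 binary digits; the function name `ascii_bits` is made up for this example.

```python
def ascii_bits(text):
    """Encode text as ASCII and show each character as 7 bits."""
    return [format(byte, "07b") for byte in text.encode("ascii")]

for bits, ch in zip(ascii_bits("hello, world"), "hello, world"):
    print(bits, " ", ch)
```

Asking for the ASCII encoding of a character that isn't in the table, like ñ, raises a `UnicodeEncodeError`, which is the 128-character limit making itself known.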

ASCII was enormously popular for a very long time — but the big problem with ASCII is that 128 characters are not enough characters to encompass the entire character set of all available human languages.

ISO-8859

7 bits is all well and good, but 8 bits would double the amount of character space available to us — and 8 bits makes a byte, which makes it really easy to remember "A character is about a byte".

What characters do we cram into those extra 128 spaces on the table, though? Even English contains a bouquet of words that use diacritical marks.

How do we decide whether to include the Spanish ñ, so that we can say "jalapeño", or the French ç, so that we can say "soupçon" and "façade"?

And what about characters that are used extensively by unimportant languages like every other human language?

One idea was to use ASCII for the first 128 characters, and then, in the other 128 characters, fit all of the extra bits required for groups of different languages. Group 1 would contain a bevy of characters for Western European languages like English, French, German, Italian, and Portuguese. Group 2 would have characters useful for Central European languages like Polish, Czech, and Hungarian. This scheme would be extensible, allowing new groups to be added, and, to date, they are numbered up to sixteen.

This scheme was called ISO-8859, and it enjoyed common usage through most of the 90s and 00s, and group 1, ISO-8859-1, commonly known as latin-1, became one of the world's most common character encodings.

Here's ñello, world in latin-1:

11110001   ñ
01100101   e
01101100   l
01101100   l
01101111   o
00101100   ,
00100000  
01110111   w
01101111   o
01110010   r
01101100   l
01100100   d

The only character that stands out is ñ, which lives in the last half of the encoding, so it starts with a one. All of the other characters are identical to their ASCII encodings.
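The same trick works for latin-1, this time with 8 bits per character. This is a sketch using Python's built-in "latin-1" codec; the function name `latin1_bits` is invented for the example.

```python
def latin1_bits(text):
    """Encode text as latin-1 and show each character as 8 bits."""
    return [format(byte, "08b") for byte in text.encode("latin-1")]

for bits, ch in zip(latin1_bits("ñello, world"), "ñello, world"):
    print(bits, " ", ch)

# Everything from the ASCII half of the table encodes to the same bytes.
print("ello, world".encode("latin-1") == "ello, world".encode("ascii"))
```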

� � � � what even

When downloading a HTML file from the internet using HTTP, the HTTP server is supposed to accurately report the character encoding that the text is using.

Sometimes, though, the HTTP server sends the wrong character encoding. Stupid HTTP server. When most browsers can't understand a character, they'll display the � character, which is the browser equivalent of going ¯\_(ツ)_/¯.

It's getting harder and harder to find webpages with serious encoding errors but fortunately Eric S. Raymond hasn't updated his Jargon File in a long while, so we can still find some.

There's a bug that used to be common: in the Windows 98 era, Windows used an encoding called Windows-1252 that was commonly confused for latin-1.

Webpages would report that they were returning latin-1 when they were actually returning windows-1252, and the result would be web-pages lined with the � character. It doesn't happen much any more, because the world has been standardizing on a new encoding: UTF-8.
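Here is a small sketch of both kinds of mix-up, using Python's built-in codecs ("cp1252" is Python's name for Windows-1252). The first part decodes latin-1 bytes as if they were UTF-8, which produces the � character. The second part shows a spot where Windows-1252 and latin-1 disagree about what a byte means.

```python
word = "jalapeño"
latin1_bytes = word.encode("latin-1")

# Decoding with the wrong encoding: the ñ byte isn't valid UTF-8 on its own,
# so the decoder substitutes the replacement character.
wrong = latin1_bytes.decode("utf-8", errors="replace")
right = latin1_bytes.decode("latin-1")
print(wrong)
print(right)

# Windows-1252 puts curly quotes at bytes 0x93 and 0x94.
# latin-1 thinks those same bytes are invisible control codes.
quoted = "\u201chi\u201d".encode("cp1252")
print(quoted)
print(repr(quoted.decode("latin-1")))
```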

But before we get to UTF-8 we have to talk about Unicode.

Unicode

The problem with all of these character schemes is that they have to be efficient.

Nobody is interested in a character encoding scheme that has room for all of the different possible characters, because it's just not practical for every single character to take several bytes of space to represent.

Character encoding is hard, though. For one thing, there are a lot of characters that are very easy to confuse for one another.

"Do you know what would be nice", I imagine someone saying. "It would be nice if we just had a list of every single character and gave each one of them a number."

"That's not a character encoding scheme", said someone else in this hypothetical scenario. "There's not even any instructions for how to convert that into a binary representation, that's just a big list of all of the possible characters, and each of them have a unique number."

"It doesn't need to be a whole character encoding scheme. It'd just be nice if we had a unique number for each character."

So, that's what Unicode is — not a binary encoding scheme at all, really, just a list of all of the characters and a unique number for each of them. This unique number is called a code point.

The entire Unicode character table is enormous. "A" is 65, "Â" is 194, "௵" is 3061. "♞" is 9822. There's space for everything, including every character in Chinese, Japanese, and Korean, and there are over 120,000 characters in the complete table.
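Python's built-in `ord` and `chr` functions convert between characters and code points, so we can look these numbers up ourselves. Code points are often written in hexadecimal with a `U+` prefix, which is why "A" (65 in decimal) is usually listed as U+0041.

```python
# ord() gives the code point of a character; chr() goes the other way.
for char in ["A", "Â", "௵", "♞"]:
    code_point = ord(char)
    print(char, code_point, f"U+{code_point:04X}")

knight = chr(9822)
print(knight)
```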

Unicode is not a Binary Encoding Scheme

Because Unicode is not itself a binary encoding scheme, just a list of characters, that brings the obvious follow-up question: how do we represent Unicode in binary, then?

And the answer: there are lots of different schemes! UCS-2, UTF-32, UTF-16, UTF-1, UTF-8, UTF-7, UTF-EBCDIC, there are many, many standards for representing Unicode characters in binary, most of them with various clever tricks to keep their size under control.

It's hard, especially because Unicode is a giant, ever-growing table. New characters are being added to Unicode all the time.

If we're not interested in efficiency at all, we can just set aside 32 bits for each character. So long as Unicode stays under roughly 4 billion characters, this should work fine. This is known as UTF-32, and it's very simple and easy to understand. It also wastes a lot of space. If UTF-32 were common, it would quadruple the size of most web requests.
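A quick sketch of that size difference, using Python's built-in codecs. The `utf-32-be` codec name picks the big-endian byte order and skips the byte-order mark, so the arithmetic stays clean.

```python
text = "hello, world"

utf32 = text.encode("utf-32-be")   # four bytes for every character
utf8 = text.encode("utf-8")        # one byte for each of these ASCII characters

print(len(utf8), len(utf32))

# The first character, "h", padded out to four bytes.
first_char = utf32[:4]
print(first_char)
```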

Just Use UTF-8

The sneaky trick is that, as much as we sometimes need special characters, most of our day-to-day communication and code come out of the 128 characters that we originally used for ASCII.

UTF-8 is a scheme that uses 8 bits (a single byte) for the most common characters, but uses extra bytes when necessary to encode special characters.

This means that UTF-8 is as efficient as latin-1 most of the time, while also being able to represent the entirety of Unicode.

Here's ñello, world ಠ_ಠ in UTF-8:

11000011 10110001   ñ
01100101   e
01101100   l
01101100   l
01101111   o
00101100   ,
00100000  
01110111   w
01101111   o
01110010   r
01101100   l
01100100   d
00100000  
11100000 10110010 10100000   ಠ
01011111   _
11100000 10110010 10100000   ಠ

Just like latin-1, all of the ASCII characters look the same as they did in ASCII. It's not until we get into special characters that we see more bits start to appear.
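The table above can be reproduced with a few lines of Python. This is a sketch for poking at the encoding ourselves, nothing more.

```python
text = "ñello, world ಠ_ಠ"

for char in text:
    encoded = char.encode("utf-8")
    # One group of eight bits per byte; special characters get several groups.
    bits = " ".join(format(byte, "08b") for byte in encoded)
    print(bits, repr(char))

# How many bytes does each kind of character take?
widths = {char: len(char.encode("utf-8")) for char in "ñeಠ"}
print(widths)
```

ASCII characters take one byte, ñ takes two, and ಠ takes three.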

UTF-8 is slowly consuming the entire internet — if we ask anybody what character encoding to use today, they'll probably say "just use UTF-8".

HTML

Okay, so, now that we know about hypertext, markup languages, and character encodings, let's look at the Hypertext Markup Language.

I think the best way to start, here, might be to start by looking at a completely empty HTML page.

Okay, that's not funny. Let's put some stuff in there.

    
<!DOCTYPE html>
<html>
</html>

So, this document doesn't contain anything except for a Doctype and an HTML element.

Creating a HTML Page

To get started, we need a file to work on. Which is very easy. Just open a file, anywhere on your system, with a text editor, and name it sample.html. Now we can open this file with a browser and it's an empty webpage. Easy peasy.

Doctype

HTML has many versions at this point (HTML5, HTML 4.01, XHTML 1.0) and in order for the browser to render the HTML properly, it needs to know which version of HTML it is looking at.

Until HTML5, HTML was technically based on a language called SGML, and SGML required that documents come with a "Document Type Definition" which would lay out all of the rules for structuring the document. Any HTML document released before HTML5, then, would have a doctype that looks like:

    
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

HTML5 dropped SGML compatibility, so it doesn't have to follow these rules anymore. It was decided that, after the change, the doctype wouldn't need a link to a document type definition any longer. Now it's just <!DOCTYPE html>. All HTML5 documents start this way, to let browsers know that this is an HTML5 document.

HTML Element

HTML is strictly hierarchical, which means that every element in the document must be enclosed by another element, except for the "root" element which wraps everything in the entire document. This is the <html> element, the outermost element that contains everything else.

HTML elements usually start with an opening tag and end with a closing tag — in this case, <html> is the opening tag and </html> is the closing tag. Everything inside the <html> tag is considered to be inside the document. Everything outside the <html> tag is ... outside the document?

Attributes

    
<!DOCTYPE html>
<html lang="en">
</html>

HTML elements can also have attributes, which are attached to the opening tag and look like attribute="value".

In this case, we've attached the lang="en" attribute to the base HTML element to let the world know that this HTML document is in English.

HTML Comments

    
<!-- Nobody can see this, no matter where you put it in the HTML -->

<!-- Comments are important in any language, even HTML -->

<!--
Comments
can
span
multiple
lines
-->

<!-- Sometimes you want to say stuff that nobody can see in the final page output. -->

<!-- Watch out, though - people can still see the comments in your HTML source. -->

Head & Body

    
<!DOCTYPE html>
<html lang="en">
    <head>
    </head>
    <body>
    </body>
</html>

Inside the HTML element come two key elements that divide the document into "the part you see" (the body) and "the part you don't see" (the head).

The head element is loaded before the rest of the page. It contains metadata about the content in the body like the document's title, as well as any style information that the page needs to load.

Elements What Go In The Head

The elements that go in the <head> element are important metadata about the page.

It's possible to completely skip the head and go right on to the body.

We shouldn't, though — a lot of stuff in the head is important to the proper operation of the page.

Charset

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
    </head>
    <body>
    </body>
</html>

This next part we've added, <meta charset="UTF-8">, defines the character encoding of the document as UTF-8.

The browser reads this tag when it loads the document and, if the HTTP server hasn't reported an encoding of its own, uses it to work out what binary encoding scheme this document is using.
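On the HTTP side, the encoding is reported in the Content-Type response header. As a rough sketch, here is how we could pull the charset out of such a header with Python's standard library. The header value is a typical example, not one captured from a real server.

```python
from email.message import Message

# A Content-Type header value as an HTTP server might send it.
header = Message()
header["Content-Type"] = "text/html; charset=UTF-8"

media_type = header.get_content_type()   # the MIME type, without parameters
charset = header.get_content_charset()   # the charset parameter, lowercased

print(media_type, charset)
```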

The <meta> tag is a special tag that doesn't wrap any content, so it doesn't require a closing tag.

Title

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Definitely Wikipedia</title>
    </head>
    <body>
    </body>
</html>

The <title> element contains the title of the document. In this case, it is Definitely Wikipedia.

Hey, it's a thing that we can see! Look at that sweet title.

Author & Description

    
<meta name="description" content="This is a description of the content on the page" />
<meta name="author" content="Andrea Authorface" />

The description meta tag is picked up by Google and Twitter when we share a link to our html page somewhere.

The content of the description meta tag shows up here.

Best I can tell, the author meta tag isn't used anywhere and is completely optional.

Favicon

This is one of my favourite HTML tricks!

Let's take a look at my browser's tabs.

It almost seems like the icon is more important than the title, for most of these!

Also, and this is a fun thing to note, the Google Calendar favicon is accurate to the day that I'm looking at it which is a very cool touch.

These icons are called "favicons".

It's possible to link a favicon to our page by including a link to a very small image file.

A basic favicon link looks like this:

    
<link rel="shortcut icon" href="http://buildatheweb.cube-drone.com/favicon.ico" type="image/x-icon">

The type option here needs to match the type of the file that we're linking to. In the case of an .ico file, it's image/x-icon, but if we were to provide a .png file, it would be image/png. This format is a standard called MIME Type, and a list of file-extension to MIME Type mappings can be found here.
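Many languages ship with a table of these mappings. As a sketch, Python's `mimetypes` module guesses a MIME Type from a file extension. The filenames here are examples, and the table can vary a little between systems, so it may report a different name for `.ico` files than the `image/x-icon` used above.

```python
import mimetypes

# guess_type returns a (MIME type, encoding) pair based on the file extension.
png_type, _ = mimetypes.guess_type("favicon.png")
jpg_type, _ = mimetypes.guess_type("hot_dog.jpg")

print(png_type)
print(jpg_type)
```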

It's possible to really go to town optimizing our favicons — providing different icon sizes and formats for every device that could possibly display our webpage, but just a 32x32 .png file should be a serviceable option.

Style & Scripts

We're not going to talk about these yet! Wait until we get to our chapters on CSS and JavaScript respectively.

Body Elements

All of the parts of the document we've built so far? They're hiding just under the surface of our document. Now we look at the exciting parts that people can actually see!

Paragraphs

Let's start by just dropping a bundle of Charles Stross quotes in the <body>:

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Definitely Wikipedia</title>
        <meta name="description" content="This is a description of the content on the page" />
        <meta name="author" content="Andrea Authorface" />
        <link rel="shortcut icon" href="http://buildatheweb.cube-drone.com/favicon.ico" type="image/x-icon">
    </head>
    <body>
Like the famous mad philosopher said, when you stare into the void, the void stares also;
but if you cast into the void, you get a type conversion error.
(Which just goes to show Nietzsche wasn't a C++ programmer.)

Helpfiles are traditionally outnumbered by no-help files, which superficially resemble a helpfile
in form but not in content because they don't actually tell you anything you don't already know,
or they answer every question except the one you're asking, or you open them and a giant
animated paper clip leaps out and cheerfully asks where you want to go today. And wikis are worse.
    </body>
</html>

When we open this HTML file in a browser, we get something that looks like this:

We wanted those paragraphs to be separate! The browser just up and ignored our helpful spacing in the HTML file.

What we need is to indicate that those words are in separate paragraphs. We can do this with the <p> tag.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Definitely Wikipedia</title>
        <meta name="description" content="This is a description of the content on the page" />
        <meta name="author" content="Andrea Authorface" />
        <link rel="shortcut icon" href="http://buildatheweb.cube-drone.com/favicon.ico" type="image/x-icon">
    </head>
    <body>
        <p>
            Like the famous mad philosopher said, when you stare into the void, the void stares also;
            but if you cast into the void, you get a type conversion error.
            (Which just goes to show Nietzsche wasn't a C++ programmer.)
        </p>

        <p>
            Helpfiles are traditionally outnumbered by no-help files, which superficially resemble a helpfile
            in form but not in content because they don't actually tell you anything you don't already know,
            or they answer every question except the one you're asking, or you open them and a giant
            animated paper clip leaps out and cheerfully asks where you want to go today. And wikis are worse.
        </p>
    </body>
</html>

There we go.

Is The Indentation Really Necessary?

As we discovered just now, the extra whitespace in our HTML file doesn't matter: the browser collapses any run of spaces, tabs, and line breaks into a single space.

So the indentation is just there to make the source code easier to read — it doesn't have any effect on the rendering of the page.
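To get a feel for what the browser does with our spacing, here is a rough approximation in Python. The function name `collapse_whitespace` is made up for this sketch, and real browsers have more rules than this (some tags and CSS settings preserve whitespace), so it only models the default behaviour for ordinary text.

```python
def collapse_whitespace(text: str) -> str:
    """Approximate default rendering: runs of spaces, tabs, and
    newlines collapse into a single space."""
    return " ".join(text.split())


source = """
    Like the famous mad philosopher said,
    when you stare into the void,

    the void stares also;
"""

rendered = collapse_whitespace(source)
print(rendered)
```

The blank line and the indentation both vanish, which is why we needed the <p> tag to get separate paragraphs.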

Headers

You've been reading this document (presumably), so you've noticed that it's divided into segments, by Headers, like the one that says "Headers" that I used to open this segment.

Headers come in six different "levels", from <h1> all the way up to <h6>.

Conventionally, <h1>, the largest header, is used for the document's title. <h2> is used for headings, <h3> for sub-headings, and so on, through the document.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Definitely Wikipedia</title>
        <meta name="description" content="This is a description of the content on the page" />
        <meta name="author" content="Andrea Authorface" />
        <link rel="shortcut icon" href="http://buildatheweb.cube-drone.com/favicon.ico" type="image/x-icon">
    </head>
    <body>
        <h1>Header Level One</h1>
        <p>
            There is a philosophy by which many people live their lives,
            and it is this: life is a shit sandwich, but the more bread you've got,
            the less shit you have to eat.
        </p>

        <h2>Header Level Two</h2>
        <p>
            These people are often selfish brats as kids, and they don't get better
            with age: think of the shifty-eyed smarmy asshole from the sixth form
            who grow up to be a merchant banker, or an estate agent, or one of the
            Conservative Party funny-handshake mine's a Rolex brigade.
        </p>

        <h3>Header Level Three</h3>
        <p>
            (This isn't to say that all estate agents, or merchant bankers, or conservatives
            are selfish, but that these are ways of life that provide opportunities
            of a certain disposition to enrich themselves at the expense of others. Bear with me.)
        </p>

        <h4>Header Level Four</h4>
        <p>
            There is another philosophy by which people live their lives,
            and it goes thus: You will do as I say or I will hurt you.
        </p>

        <h5>Header Level Five</h5>
        <p>
            Let me draw you a Venn diagram with two circles on it, denoting sets of individuals.
            They overlap: the greedy ones and the authoritarian ones.
            Let's shade in the intersecting area in a different color and label it: dangerous.
        </p>

        <h6>Header Level Six</h6>
        <p>
            Greed isn't automatically dangerous on its own, and petty authoritarians aren't
            usually dangerous outside their immediate vicinity &mdash; but when you combine the two,
            you get gangsters and dictators and hate-spewing preachers.
        </p>
    </body>
</html>

Lists

Lists can either be numbered or simply bulleted, and they can also contain sub-lists.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Definitely Wikipedia</title>
        <meta name="description" content="This is a recipe for sausages on a bun" />
        <meta name="author" content="Andrea Authorface" />
        <link rel="shortcut icon" href="http://buildatheweb.cube-drone.com/favicon.ico" type="image/x-icon">
    </head>
    <body>
        <h1> Tasty Sausage Recipe </h1>

        <h2> Ingredients </h2>
        <!-- ul stands for 'unordered list' -->
        <ul>
            <!-- li stands for 'list item' -->
            <li>Sausages (Bratwurst or Italian Sausage)</li>
            <li>Onions</li>
            <li>Beer</li>
            <li>Sausage Bun</li>
            <li>Pickle Relish</li>
            <li>Mustard</li>
        </ul>

        <h2> Instructions </h2>
        <!-- ol stands for 'ordered list' -->
        <ol>
            <li>
                Put the sausages and onions in a pan with a few cups full of water or beer
                and crank the heat until the liquid starts to boil.
            </li>
            <li>
                Turn the temperature down and cover the pan with a lid. Let the
                sausages and onions steam for 10 minutes.
            </li>
            <li>
                Remove the lid, set aside the onions, and turn the heat up to medium-high,
                browning the exterior of the sausages.
                <em>A safe internal temperature for sausages is 165&#8457; </em>
            </li>
            <li>
                <!-- it's possible to put lists inside of lists -->
                Serve on a toasted sausage bun with
                <ul>
                    <li>the onions</li>
                    <li>a chunky pickle relish</li>
                    <li>grainy mustard</li>
                </ul>
            </li>
        </ol>
    </body>
</html>

So Boring. Dying.

Do we really have to go over every single HTML tag?

Yes. Yes we do.

It's important. Every webpage is built out of this stuff.

These Basic HTML Sites Look Nothing Like Real Websites

We're going to get to that, soon! HTML is only part of the whole story, here — in order to look good, we also need CSS files to apply style and pizzazz to our page.

This is what the New York Times looks like without style:

Okay, that was a cheap shot. I mean, this is what the New York Times looks like without a CSS stylesheet:

That's not much, right? Now let's look at what it looks like with a stylesheet:

Wow! Much classier! We're going to be covering CSS in our next chapter, but we need to understand vanilla HTML first!

One of the rules of good web design is that the HTML of a page should be as clean to read without CSS as it is with it — some users, especially the visually impaired, use the raw HTML of a page without any additional style.

Tables

Sometimes we need to display a table of data. Tables have rows, columns, and headers.

    

        <table>
            <tr> <!-- tr means "table row" -->
                <th>Vegetable</th> <!-- th means "table header" -->
                <th>Fries</th>
                <th>Chips</th>
                <th>Smoothies</th>
                <th>Real</th>
            </tr>
            <tr>
                <td>Potato</td> <!-- td means "table data" -->
                <td>Yes</td>
                <td>Yes</td>
                <td>No</td>
                <td>Yes</td>
            </tr>
            <tr>
                <td>Carrot</td>
                <td>No</td>
                <td>No</td>
                <td>Yes</td>
                <td>Yes</td>
            </tr>
            <tr>
                <td>Beet</td>
                <td>No</td>
                <td>Yes</td>
                <td>Yes</td>
                <td>Yes</td>
            </tr>
            <tr>
                <td>Brotato</td>
                <td>Yes</td>
                <td>Yes</td>
                <td>Yes</td>
                <td>No</td>
            </tr>
        </table>

Tables are pretty flexible about how you align them, but td and th elements always go inside tr elements.

    
        <table>
            <tr>
                <th>Name:</th>
                <td>Curtis</td>
            </tr>
            <tr>
                <th>Class:</th>
                <td>Software Developer</td>
            </tr>
            <tr>
                <th>Blood Type:</th>
                <td>Double Plus Good</td>
            </tr>
            <tr>
                <th>Sandwich:</th>
                <td>BLT</td>
            </tr>
            <tr>
                <th>Weaknesses:</th>
                <td>Myriad</td>
            </tr>
        </table>

Special Characters

Certain characters, like & and >, are used in HTML markup.

We cannot simply type the < character into our HTML — the browser would confuse it for the start of an HTML tag.

Instead, we need to use a special escaping syntax: HTML Entities.

The ampersand character, &, can be represented with &amp;. The "greater than" character, >, can be represented with &gt;. The "less than" character, <, can be represented with &lt;. With these three entities, we can represent any HTML characters that we come across.

Special characters can represent more than just HTML characters, though! There are many special typographic characters that just don't exist on the keyboard, like "—" (&mdash;), "©" (&copy;), and "€" (&euro;).

On top of that, if we don't want to enter Unicode characters, we can always reference them directly by their Unicode code point, like "❤" (&#10084;), "ൠ" (&#3424;) or "🤘" (&#129304;).
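We rarely have to escape text by hand. As a sketch, Python's standard `html` module converts in both directions. The sample strings are made up for illustration.

```python
import html

# escape() turns markup characters into entities
# (quote=False leaves quotation marks alone).
escaped = html.escape("if a < b && b > c", quote=False)
print(escaped)

# unescape() turns named and numeric entities back into characters.
named = html.unescape("&mdash; &copy; &euro;")
numeric = html.unescape("&#10084; &#3424; &#129304;")
print(named)
print(numeric)
```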

Emphasis and Strong

This one's going to be really quick. Sometimes, you want to emphasize text or really, really emphasize text.

This is where the <em> (for emphasis) and <strong> (for strong emphasis) tags shine.

Figure them out on your own time. I'm not yer' daddy.

Links

Do you know what would make a lot of sense? If the HTML tag for links between pages was <link>.

But <link> was already taken, so instead, a link to another page is represented with the <a> tag. Which stands for anchor.

So, if we wanted to construct a link to, say, a video about the death of Flash from the point of view of some characters who were animated in Flash, I'd need to use an <a> tag, like so:

    
        <a href="https://www.youtube.com/watch?v=L0nuQ5o2DYU"> a video about the
            death of Flash from the point of view of some
            characters who were animated in Flash </a>

Yeah, that's right, the URL is encoded in an attribute called href. This whole deal is just a cascade of nonsense. It's possible to connect all of these details with minutiae in the history of HTML, but I think it's easier just to remember "links are a href=" without trying to justify it.

Absolute vs. Relative Links

There are two different ways to link to content. These linking rules apply to anything that we might link to — other web pages, images, videos, stylesheets — the rules are the same for all of them.

An absolute link is a full URL, like the one that we saw in the first chapter, with protocol, domain, and path.

One of these: http://buildatheweb.cube-drone.com/images/chapter3/hot_dog.jpg

As we've established, this refers to /images/chapter3/hot_dog.jpg on the domain buildatheweb.cube-drone.com using the http protocol.

A relative link is a link that contains less information, and depends on context to resolve the full URL. The rules of resolving relative links can be a little bit complicated, but it's almost always a better idea to use relative links rather than absolute ones when we are building webpages, because if we leave information like the domain name and protocol out of our links, we can change these things more easily.

This is particularly useful because when we are developing our web sites, we usually do not have a domain name, yet. Even if we do, it's common practice to develop our site on our personal computer, without a domain name, before launching it to our production server, where it does have a domain name — so, relative links are important.

The first rule of relative links is that they always resolve to the same domain as the source page. I cannot, for example, create a relative link to a youtube.com page from buildatheweb.cube-drone.com.

That means, though, that if we wanted to link to /images/chapter3/hot_dog.jpg, but we know that this file is going to be on the same domain and protocol as the place that we are linking from, we can just leave out the domain and protocol entirely, and include a link to /images/chapter3/hot_dog.jpg.

    
        <!-- these links go to the same place -->
        <a href='http://buildatheweb.cube-drone.com/images/chapter3/hot_dog.jpg'>Hot Dog</a>
        <a href='/images/chapter3/hot_dog.jpg'>Hot Dog</a>

The / at the beginning of the relative link communicates the web root — in the case of this page, the web root is buildatheweb.cube-drone.com/.

If we leave out the web root (represented by the initial / character) we instead ask the browser to resolve the link starting at the folder that we are currently in.

Which means that, because we are already at the web root, we could link to images/chapter3/hot_dog.jpg and it, again, would refer to the same file.

    
        <!-- these links go to the same place -->
        <a href='http://buildatheweb.cube-drone.com/images/chapter3/hot_dog.jpg'>Hot Dog</a>
        <a href='/images/chapter3/hot_dog.jpg'>Hot Dog</a>
        <a href='images/chapter3/hot_dog.jpg'>Hot Dog</a>

However, let's now imagine that our HTML file is located in the /images directory. The first two links would stay the same — the absolute link and the relative link that references the web root — but the relative link that didn't start with / would no longer be rooted at the same directory. It would become invalid.

    
        <!-- these links go to the same place -->
        <a href='http://buildatheweb.cube-drone.com/images/chapter3/hot_dog.jpg'>Hot Dog</a>
        <a href='/images/chapter3/hot_dog.jpg'>Hot Dog</a>

        <!-- because we are already in the images directory,
                this link would go to the wrong place: /images/images/chapter3/hot_dog.jpg -->
        <a href='images/chapter3/hot_dog.jpg'>Hot Dog</a>

        <!-- this would fix the problem -->
        <a href='chapter3/hot_dog.jpg'>Hot Dog</a>
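These resolution rules can be tried out with `urljoin` from Python's standard library, which takes the URL of the current page plus a link and returns the full URL. The `index.html` page names below are assumptions made for this sketch.

```python
from urllib.parse import urljoin

# Hypothetical pages: one at the web root, one inside /images.
root_page = "http://buildatheweb.cube-drone.com/index.html"
images_page = "http://buildatheweb.cube-drone.com/images/index.html"

# A leading / always resolves from the web root.
from_root = urljoin(root_page, "/images/chapter3/hot_dog.jpg")

# Without the leading /, the link resolves from the current folder.
same_folder = urljoin(root_page, "images/chapter3/hot_dog.jpg")
wrong_place = urljoin(images_page, "images/chapter3/hot_dog.jpg")
fixed = urljoin(images_page, "chapter3/hot_dog.jpg")

for url in (from_root, same_folder, wrong_place, fixed):
    print(url)
```

Only the third link goes astray, landing on /images/images/chapter3/hot_dog.jpg.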

Images

Links are all well and good, but let's get our hands dirty with some honest-to-goodness multimedia content. Images!

Images can be rendered with the <img> tag, which works a lot like the <a> tag, but with src instead of href.

    
        <!-- images work like links but use 'src' instead of 'href' -->
        <img src='http://buildatheweb.cube-drone.com/images/chapter3/hot_dog.jpg'>
        <img src='/images/chapter3/hot_dog.jpg'>
        <img src='images/chapter3/hot_dog.jpg'>

Pictured: A Picture

The title attribute can be used to provide pop-over text for an image.

The alt attribute should be used to provide screen-reader text for an image, to help the visually impaired.

    
<img src="images/chapter3/hot_dog.jpg"
    alt="A picture of a hot dog"
    title="Do I look like I know what a JPEG is? I just want a picture of a god-dang hot dog.">

Image Formats And You

If we're going to post pictures to the internet, we're quickly going to run up against the myriad different image formats that exist, the most popular of which include BMP, JPEG, PNG, and GIF.

Each format has its strengths and weaknesses — JPEG files are great at reproducing photographic images but terrible for text content, GIF files can be animated but are limited to 256 colors at best, BMP files are great when it's 1995...

When in doubt, just use a PNG file. It's the most versatile mix of compression and clarity for most purposes.

For much, much more detail, check out this article.

Pictured: A Picture With Pop-Over Text and Alt Text

Divisions, Sections, and Asides

We're not really going to talk about divisions and sections until we get to the chapter on CSS.

For now, I'm just going to say that the div, section, nav, and aside tags are all invisible tags that we wrap around other content.

"Why even bother with invisible tags?" — well, they're useful for grouping content together for styling!

They're important, but they're not really visible until we start to style things.

Audio & Video

It's possible to embed both audio and video content in the browser. This is a relatively new feature.

Let's look at some!

    

<audio controls>
  <source src="horse.ogg" type="audio/ogg">
  <source src="horse.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

<video width="320" height="240" controls>
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.webm" type="video/webm">
  Your browser does not support the video tag.
</video>

The audio and video tags work a little bit differently than the img tag. Instead of having just one src attribute, these tags wrap multiple source tags that each have their own src attribute.

The reason for this is that the browser can cycle through the various sources until it finds a source type that it supports — not all browsers support all of the different audio and video types. If a browser is too old to understand the audio or video tag, it will just display the text contained within the tag: "Your browser does not support the video tag".
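The "cycle through the sources" idea can be modelled in a few lines. This is a toy sketch, not how any real browser is implemented: the `pick_source` function and the sets of supported types are invented for illustration.

```python
from typing import Optional


def pick_source(sources: list[tuple[str, str]], supported: set[str]) -> Optional[str]:
    """Return the first source whose type is supported, or None."""
    for src, mime_type in sources:
        if mime_type in supported:
            return src
    return None  # nothing playable: the fallback text would be shown


audio_sources = [("horse.ogg", "audio/ogg"), ("horse.mp3", "audio/mpeg")]

mp3_only = pick_source(audio_sources, {"audio/mpeg"})
both = pick_source(audio_sources, {"audio/ogg", "audio/mpeg"})
neither = pick_source(audio_sources, set())

print(mp3_only, both, neither)
```

Order matters: when several sources are playable, the one listed first wins.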

Like with images, there are lots of different potential file formats for both audio and video files — although mostly we just need to remember to use mp3 for audio and mp4 for video if we want every browser to be able to play our files.

iFrames

The frame tag was once a tag that allowed for webpages to be constructed out of parts of other pages.

Frames aren't a thing anymore, though. They're no longer part of the HTML specification. Frames are dead.

What remains is the concept of the <iframe> — the inline frame.

It allows for a webpage to be embedded within a webpage. We've been looking at them every time I've embedded a sample webpage into this one!

    
 <!-- this is the iframe-->
 <iframe src="ex10.html"> </iframe>

    
<!-- this is the code that's inside the iframe-->
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>iFrame Example</title>
        <meta name="description" content="ifraaaaaame">
    </head>
    <body>

        <h3>This is an iFrame</h3>
        <p>It's an entirely different webpage, but it exists inline on this webpage. </p>

    </body>
</html>

Many More Tags

I feel like we've covered enough HTML tags that we can get a good start on building a webpage, but there are lots of other tags, like abbr, form, and input that we'll have to touch on in later chapters.

Here's a list of tags.

HTML Validation

How do we know if we've built our HTML page correctly? There are so many rules and we don't even know half of them yet.

One helpful way to know if we've built our websites properly is to run them through a validator, which will check our HTML for common errors. This one, from the W3C, is a good choice.

Here's the validation report for this site. If I've done my job, this should be completely empty, although I can guarantee you that it is probably not.
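To get a taste of what a validator does, here is a toy tag-balance checker built on Python's standard `html.parser`. It is nowhere near a real validator: the `TagBalanceChecker` class and `check` function are made up for this sketch, the list of tags that never close is partial, and it will complain about some legal HTML (the real rules allow certain closing tags to be left out).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, for illustration).
VOID_ELEMENTS = {"meta", "link", "img", "br", "hr", "input", "source"}


class TagBalanceChecker(HTMLParser):
    """Toy checker: reports closing tags that don't match the open element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if not self.open_tags:
            self.errors.append(f"found </{tag}> with nothing open")
            return
        expected = self.open_tags.pop()
        if expected != tag:
            self.errors.append(f"found </{tag}> but expected </{expected}>")


def check(markup: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> was never closed" for tag in checker.open_tags]
    return checker.errors + unclosed


print(check("<tr><th>Vegetable</th><td>Potato</td></tr>"))
print(check("<tr><th>Vegetable</th><td>Potato</th></tr>"))
```

The first row is clean. The second one closes a td with a th tag, which is exactly the kind of slip that is easy to miss by eye.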

Let's Put it All Together

We've learned a lot of stuff, put some tags together, but now, I think it's time to put together an HTML curriculum vitae for my friend, Dr. Frampton Q. Fakenamington.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Dr. Frampton Q. Fakenamington</title>
        <meta name="description" content="You should hire him, he is pretty great.">
    </head>
    <body>

    <h1>Dr. Frampton Q. Fakenamington</h1>

    <img src="images/chapter3/professor.jpg" alt="A picture of Dr. Frampton Q. Fakenamington">

    <p>
        <a href="mailto:frampton@sample.org">frampton@sample.org</a>
    </p>
    <p>
        <a href="tel:+16045555555">604-555-5555</a>
    </p>
    <address>
        123 Fake St.
        Surrey, BC, Canada
        V1V 1V1
    </address>

    <h2>Professional Skills</h2>

    <ul>
        <li>Doctoring, but not the medical kind, the kind where mostly you write grant proposals</li>
        <li>Office (Word/Excel/PowerPoint)</li>
        <li>Some Kung-Fu</li>
        <li>Attentoin to Detail</li>
    </ul>

    <h2>Education</h2>

    <h3>Learnington University</h3>

    <p> Surrey, B.C. </p>

    <p> 2000-2017 </p>

    <p> PhD, Theoretical Mustard Sciences </p>

    <p>
        For the past 17 years, I've been advancing the field of theoretical mustard sciences, mostly
        by consuming a variety of delicious mustards. All of my results are published in the
        New England Journal of Various Mustards, a journal that I am the sole contributor and editor for.
    </p>

    <h3>Publications</h3>

    <table>
        <thead>
            <tr>
                <th>Title</th>
                <th>Authors</th>
                <th>Journal</th>
                <th>Date</th>
            </tr>
        </thead>

        <tbody>
            <tr>
                <td>A Simple Taxonomy Of Grainy Mustards</td>
                <td>Frampton Fakenamington</td>
                <td>New England Journal of Various Mustards</td>
                <td>2016</td>
            </tr>

            <tr>
                <td>Ballpark or Dijon: The Controversy Rages</td>
                <td>Frampton Fakenamington, Hurgen Jurgen</td>
                <td>New England Journal of Various Mustards</td>
                <td>2015</td>
            </tr>
        </tbody>
    </table>

    <h2>Interests</h2>

    <ul>
        <li>Mustard</li>
    </ul>

    <h2>Class Schedules</h2>
    <p>
        For any students looking for the syllabus for
        <strong>IAT 208 &mdash; Introduction to Hot &amp; Spirit Mustards</strong>,
        you can find it
        <a href="http://www.seriouseats.com/2014/05/mustard-manual-guide-different-types-mustard-varieties-dijon-brown-spicy-yellow-hot-whole-grain.html">here</a>.
    </p>

</body>
</html>

This produces the final product:

Good ol' Dr. Fakenamington now has a website that would look at home in any modern CS department.

Use Developer Tools to Take a Look Under The Hood

Do we want to see that HTML that sits underneath our favourite websites? Even this website? Yes. Yes we do.

Every modern browser now includes an "Inspect Element" or Developer Tools option.

We can look at all of the HTML that comprises any page that we visit.

It's also possible to use the HTML editor in the Developer Tools to change any page we visit so that it reads whatever we like. The changes only exist in our own browser and vanish when we reload, but as it turns out, this is great fun.

The Developer Tools are a Swiss Army Knife of useful tools and techniques for doing Web Things, and we'll be returning to them frequently for sweet tricks and tips.

Chapter 2 Summary

In this chapter, we've:

Spent entirely too much time learning about Unicode and character encodings.
Learned about markup languages in general, and HyperText Markup Language in particular.
Created an HTML page, with a
- Root HTML Tag
- Head & Body
- Charset
- Title
- Favicon
Learned about a whole whack of HTML tags, like
- Paragraphs
- Headers
- Lists
- Tables
- Special Characters
- Emphasis
- Links
- Images
- Audio & Video
- iFrames
Cracked open a page using developer tools

Chapter 2 Resources

Chapter 3: Cascading Style Sheets

It's all well and good to be able to author webpages, but we've come to expect a little more graphical acuity from the modern internet. A little more punch. A little more pizzazz. A touch more razzamatazz.

The CSS language is all about styling things.

The Content Stays the Same

One of the foundational rules of CSS design is that the content should be structured so that it's easy to style, and so that the theme could change completely without touching the HTML at all.

The site CSS Zen Garden has the exact same content rendered a few hundred different ways, using nothing but a succession of different stylesheets.

CSS is Made of Many Small Rules

CSS is composed of many small rules, each one controlling how a handful of elements are displayed.

A CSS rule is composed of two parts: the selector and properties.

The selector is a part of the CSS rule that defines which HTML tags we are interested in targeting — for example, all header tags or just the h1 tag or paragraph text.

The properties are a set of rules that define how to display the object — for example, should be blue or should be really big.

Including CSS In Our Page

So, let's create a CSS rule that defines that just the h1 tag should be really blue and really big.

    
h1 {
    color: blue;
    font-size: 300%;
}

Here, the h1 is the selector, and everything between the curly braces is the list of properties.

Let's add this rule to our HTML file and watch it in action. We can do this with the <style> tag.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>CSS Example</title>
        <meta name="description" content="CSS">
        <style>
            h1{
                color: blue;
                font-size: 300%;
            }
        </style>
    </head>
    <body>
        <h1> Hello, world. </h1>
        <p> Well would you look at that? That h1 element is big and blue! </p>

    </body>
</html>

Success! We've applied color:blue; and font-size:300% to the h1 tag!

There are three ways to apply CSS styles to our pages:

By including it within the style tag, which we've just seen
By attaching a CSS file to the page
By attaching CSS properties directly to an HTML element using the style attribute.

CSS File

If we have a lot of style to define — and we probably do — we can keep all of these style definitions in a separate .css file.

        
            <link rel="stylesheet" type="text/css" href="style.css">

Here, we link to the style.css file. Any CSS rules that we write in this file will be applied to this page.

Our style.css file might contain this:

        
            h1{
                color: blue;
                font-size: 300%;
            }

Success once more!

Directly Applying CSS Properties

Finally, it's possible to apply CSS properties directly to HTML elements using the style attribute.

    
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>CSS Example</title>
        <meta name="description" content="CSS">
    </head>
    <body>
        <h1 style="color:blue; font-size:300%;"> Hello, world. </h1>
        <p> Well would you look at that? That h1 element is big and blue! </p>

    </body>
</html>

When we do this, we don't need to bother with selectors — we've already got the element that we want to apply this rule to, right here!

Success once again! Any way we apply these styles to the HTML, the result is always the same.

Selectors

The selector is a part of the CSS rule that defines which HTML tags we are interested in targeting — so let's dig in a bit and learn how to select elements.

Classes & IDs

Sometimes, we want to be able to select a very specific HTML tag, or a specific group of tags, and simple selection rules just don't cut it.

Let's take a look at this code from Dr. Fakenamington's CV:

    
    <p>
        <a href="tel:+16045555555">604-555-5555</a>
    </p>

This is very clearly a telephone number. We might want to apply a special style to just telephone numbers all across the page!

A class allows us to apply a style to a group of elements.

What we can do is mark it with the telephone class:

    
    <p>
        <a class="telephone" href="tel:+16045555555">604-555-5555</a>
    </p>

Then, we can write a CSS rule that makes all telephone numbers bold and brown.

Class selectors in CSS start with the . character.

    
    .telephone{
        font-weight: bold;
        color: brown;
    }

That should do it. It's also possible to apply more than one class to the same object!

    
    <p>
        <a class="telephone bigger" href="tel:+16045555555">604-555-5555</a>
    </p>
    <p>
        <a class="telephone smaller" href="tel:+17785555555">778-555-5555</a>
    </p>

    
    .telephone{
        font-weight: bold;
    }
    .bigger{
        font-size: 120%;
    }
    .smaller{
        font-size: 80%;
    }

A class works for an object that might appear again and again on a page — like telephone numbers, or dates — but sometimes we want to identify a single, unique object. Let's imagine, for example, we had more than one telephone number, and we wanted to name them different things.

    
    <p>
        <a class="telephone" id="office_number" href="tel:+16045555555">604-555-5555</a>
    </p>
    <p>
        <a class="telephone" id="cel_number" href="tel:+17785555555">778-555-5555</a>
    </p>

We can select these identifiers using the # character.

    
    .telephone{
        font-weight: bold;
    }
    #office_number{
        color: fuchsia;
    }
    #cel_number{
        color: red;
    }

Fragments

Why in the world would we ever use IDs? — It seems like everything an ID can do, a class can do better!

The ID serves a double purpose, though! It's also possible to link directly to an ID within a webpage. On this page, we can navigate directly to the paragraph on CSS Is Made of Many Small Rules by navigating to http://buildatheweb.cube-drone.com/#css-is-made-of-many-small-rules.

The reason that we can do this? The header element has the ID css-is-made-of-many-small-rules.

Children

In Chapter 2, we talked about how HTML is hierarchical, with every element (other than the root) enclosed by another element.

This means that we can think of our entire HTML document like a tree data structure.

There's a thicket of terminology we're going to borrow, then, from the world of tree-shaped data.

Node	A single object within the tree. `html` is a node. `tr` is a node.
Root	The node at the base of the tree. In the case of HTML, this is always the `html` tag.
Child	A node directly connected to another node, moving away from the root. In our diagram, `body` is the child of `html`, and `table` is the child of `body`.
Descendant	A node reachable by repeatedly moving from parent to child. In our diagram, every node is a descendant of `html`, and `table`'s descendants are all of the `tr` and `td` elements.
Parent	A node directly connected to another node, moving towards the root. The opposite of a child. In our diagram, the `body` is the parent of `table`, which is the parent of `tr`.
Ancestor	A node reachable by repeatedly moving from child to parent. In our diagram, `body` has only `html` as its ancestor, whereas `td` has `tr`, `table`, `body`, and then `html`.
Leaf	Nodes that do not have any children. In our diagram, `title`, `meta`, `h1`, `p`, and `td` are all leaf nodes.

Now let's look at a new problem from Dr. Fakenamington's CV:

        
    <tbody>
        <tr id="a-simple-taxonomy-of-grainy-mustards" >
            <td>A Simple Taxonomy Of Grainy Mustards</td>
            <td>Frampton Fakenamington</td>
            <td>New England Journal of Various Mustards</td>
            <td>2016</td>
        </tr>

        <tr id="ballpark-or-dijon">
            <td>Ballpark or Dijon: The Controversy Rages</td>
            <td>Frampton Fakenamington, Hurgen Jurgen</td>
            <td>New England Journal of Various Mustards</td>
            <td>2015</td>
        </tr>
    </tbody>

Let's imagine that we want to apply a property to each of the td elements in a-simple-taxonomy-of-grainy-mustards, but not in ballpark-or-dijon. If we tried to use the selector td, our property would be applied to both groups!

We could take what we've learned so far about classes, applying a taxonomy class to each child of a-simple-taxonomy-of-grainy-mustards:

        
    <tbody>
        <tr id="a-simple-taxonomy-of-grainy-mustards" >
            <td class="taxonomy">A Simple Taxonomy Of Grainy Mustards</td>
            <td class="taxonomy">Frampton Fakenamington</td>
            <td class="taxonomy">New England Journal of Various Mustards</td>
            <td class="taxonomy">2016</td>
        </tr>

        <tr id="ballpark-or-dijon">
            <td>Ballpark or Dijon: The Controversy Rages</td>
            <td>Frampton Fakenamington, Hurgen Jurgen</td>
            <td>New England Journal of Various Mustards</td>
            <td>2015</td>
        </tr>
    </tbody>

But there's an easier way! Without creating the taxonomy class at all, we can just select all td elements that are descendants of the element with the ID a-simple-taxonomy-of-grainy-mustards.

We can do this by just writing one selector after the other, separated by a space:

        
    #a-simple-taxonomy-of-grainy-mustards td{
        font-weight: bold;
        font-family: Verdana, sans-serif;
        color: purple;
    }

This selects all td elements that are descendants of #a-simple-taxonomy-of-grainy-mustards, which here means the four cells of that one row.
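The space between two selectors matches descendants at any depth, not only direct children. CSS also has a stricter > combinator that matches direct children only. Here's a small sketch of the difference; the publications class is made up for illustration and assumes the CV's table were written as <table class="publications">.

```css
/* Hypothetical class name: assumes <table class="publications">. */

/* Descendant combinator (a space): matches every td anywhere
   inside the table, no matter how deeply it is nested. */
.publications td {
    color: purple;
}

/* Child combinator (>): matches only td elements whose parent is
   exactly this row. For this row the result is the same as using
   a space, because each td sits directly inside its tr. */
#a-simple-taxonomy-of-grainy-mustards > td {
    font-weight: bold;
}

/* This matches nothing in the CV: every td there sits inside a tr
   inside tbody, so none is a direct child of the table. */
.publications > td {
    color: red;
}
```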

Wildcards

Let's imagine we want to select every element inside a table, or every element on the entire page, regardless of what type of element that it is.

The wildcard operator, *, selects... everything.

        
    table * {
        font-weight: bold;
        font-family: Verdana, sans-serif;
    }

    * {
        font-size: xx-large;
    }

M-M-M-Multi-Selectors

Let's imagine that we have a rule that we want to apply to a group of different selectors at the same time. We can do this by chaining together selectors with a , character.

For example, we might want all of our header elements, all of our links, and all of our journal table entries to be a nice cool teal color.

        
    h1, h2, h3, h4, h5, h6, a, .journal td {
        color: teal;
    }

Hover And Pseudo-Classes

Alongside the classes that we define, like telephone, the browser will add and remove its own classes to HTML elements. These automatic classes are called "pseudo-classes".

For example, an a link will have the pseudo-class visited, when the client has already visited that link. Any element will have the pseudo-class hover, when the client's cursor is hanging over it.

The selector for pseudo-classes uses a : instead of a . — so, a pseudo-class selector might look like this:

        
    :hover{
        background-color: pink;
    }

That's good, but that CSS rule will apply that hover property to anything on the entire page! We can glom the hover property on to any other selector, though, by combining them: p:hover or .telephone:hover. Let's look at an example:

        
    <div class="big-ol-block">
        <p>Put your mouse over this</p>
    </div>

        
    .big-ol-block{
        width: 300px;
        height: 200px;
        background-color: blue;
    }
    .big-ol-block p{
        color: white;
    }
    .big-ol-block:hover{
        background-color: green;
        width: 500px;
    }
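The visited pseudo-class mentioned earlier attaches to a selector in exactly the same way. As a rough sketch (the colors are arbitrary choices, not anything this book's stylesheet uses):

```css
/* Every link, by default. */
a {
    color: teal;
}

/* Links the browser remembers visiting. */
a:visited {
    color: gray;
}

/* Any link the cursor is currently over. This sets a different
   property, so it combines with whichever color rule applies. */
a:hover {
    background-color: pink;
}
```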

A Common Mistake With Pseudo Classes

Here's a mistake that I frequently make with these elements:

        
    <div class="big-ol-block">
        <p>Put your mouse over this</p>
    </div>

        
    .big-ol-block{
        width: 300px;
        height: 200px;
        background-color: blue;
    }
    .big-ol-block p{
        color: white;
    }
    .big-ol-block:hover{
        background-color: green;
        font-size: xx-large;
        width: 500px;
    }
    .big-ol-block :hover{
        background-color: red;
        font-family: monospace;
    }

Looking at this example, what's the difference between .big-ol-block:hover and .big-ol-block :hover?

It's the space, right? .big-ol-block:hover selects the big-ol-block element itself while the cursor is hovering over it, but .big-ol-block :hover selects any descendant of a big-ol-block element that the cursor is hovering over. Here, that's the p element contained within.
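One way to keep the two straight is to remember that a pseudo-class standing on its own has an implied wildcard in front of it. Spelled out, the selectors from the example look like this:

```css
/* These two selectors mean the same thing: any hovered element
   that is a descendant of .big-ol-block. */
.big-ol-block :hover {
    font-family: monospace;
}
.big-ol-block *:hover {
    font-family: monospace;
}

/* No space: the block itself, while it is being hovered. */
.big-ol-block:hover {
    background-color: green;
}
```

Writing the * explicitly makes the intent obvious to the next reader, which might well be us.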

With a Little Help From Our Friends

Selectors let us pick out all sorts of things on a webpage so that we can apply different styles to them. We've talked about some of their features, but the best way to learn this topic, I think, is interactively.

So, check out this interactive CSS selector tutorial. It covers everything I've covered, and much more!

Properties

The other half of the CSS rule is the list of properties that define what we want the selected element to look like, like font-family: Arial or color: blue.

Inheritance

When we set some CSS properties in a rule, they apply not just to the selected element, but to all descendants of that element.

For example, let's imagine that we want to change the font for the entire document at once. We could use the wildcard selector:

        
    * {
        font-family: "Courier New", monospace;
    }

But if we just set the font for the body element, which wraps all content on the page, all of the children of body will inherit that font.

        
    body {
        font-family: "Courier New", monospace;
    }

That makes it clearer when we want to use a different font for a different part of the page.

        
    body {
        font-family: "Courier New", monospace;
    }
    h1, h2, h3, h4, h5, h6{
        font-family: "Arial", sans-serif;
    }

        
<h1>Hello, world</h1>

<p>
    We're a lazy, inattentive chef, so a sous vide is probably
    a better purchase than a pressure cooker.
</p>

<h2>Sous Vide</h2>

<p>
    Sous vide is a cooking technique where we leave a
    vacuum-sealed bag of our food in hot water for a long time.
    It makes it really hard to overcook our food, because we
    can forget it for hours and it doesn't matter.
</p>

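Some CSS properties inherit. Others do not. As a rule of thumb, properties that describe text (like font-family, color, and line-height) inherit, while properties that describe boxes (like width, border, and background-color) do not. It's decided on a case-by-case basis, though, so the only way to really learn which is which is a little bit of trial and error. Hopefully, with a bit of experience, this will become intuitive.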
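Here's a small sketch of one property that inherits next to one that doesn't. The note class is hypothetical, and the example assumes a <div class="note"> with a few paragraphs inside it.

```css
/* Hypothetical markup: <div class="note"> containing some <p> elements. */
.note {
    color: navy;            /* inherits: the paragraphs inside turn navy */
    border: 1px solid navy; /* does not inherit: only the div gets a border */
}

/* The inherit keyword opts in to inheritance for a property that
   doesn't inherit by default, so each paragraph now copies the
   border of its parent. */
.note p {
    border: inherit;
}
```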

Inheritance Overwritten

An inherited property is always overwritten by an actual rule that sets the property. If, for example, we were to have:

        
    body {
        font-family: "Courier New", monospace;
    }
    p {
        font-family: Arial, sans-serif;
    }

In this case, the p element would inherit the font-family: "Courier New", monospace; property from body — but immediately write over it with the font-family: Arial, sans-serif; property of its own.
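This holds even when the overriding rule is about as weak as a rule can be. A sketch of a classic surprise:

```css
body {
    color: navy;
}

/* The wildcard is the least specific selector there is, but it
   still targets every element directly. Each element inside body
   gets its own color: black, which beats the navy it would
   otherwise have inherited. */
* {
    color: black;
}
```

With both rules in place, the paragraphs and headings render black, not navy: a value set directly on an element always beats a value that merely trickles down from an ancestor.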

The Cascade

What happens when two different rules both select the same element and contain the same property? How do we resolve overlapping properties?

Well, the rules that define how overlapping properties collide are called the cascade, and they're the reason that CSS stands for "Cascading Stylesheets", rather than just "Stylesheets".

For each property applied to an element, the priority of that property is calculated based on its importance then specificity then order. Higher priority rules win out over lower priority rules, but the final set of properties that are applied to an element can be a combination of a set of properties from a variety of different CSS rules.

Importance

The first rule of the cascade is that you do not talk about the cascade.

The second rule of the cascade is that a property with the !important tag always wins over a property that doesn't have an !important tag.

        
    p {
        font-family: "Courier New", monospace;
        color: blue !important;
        font-size: small;
    }
    p {
        font-family: Arial, sans-serif !important;
        color: red;
        font-size: medium;
    }
    p {
        font-family: "Times New Roman", serif;
        color: green;
        font-size: large !important;
    }

        
    <p> This text should be large, blue, and in the Arial font </p>
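Because importance is checked before anything else, an !important property also beats a more specific selector. A minimal sketch, assuming a hypothetical paragraph written as <p id="intro">:

```css
/* Hypothetical markup: <p id="intro">Hello</p> */
p {
    color: blue !important;
}

/* An ID selector is far more specific than p, but importance is
   checked first, so the paragraph stays blue. */
#intro {
    color: red;
}
```

This is also why !important is best used sparingly: once it's in play, the only way to override it is with another !important.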

Specificity

The next rule of the cascade is that more specific selectors always win out over less specific selectors.

Properties applied directly to an HTML element with the style attribute have the highest specificity.
An ID selector (#this) is more specific than a class (.this)
A class is more specific than a type selector (td)
Selectors that chain children together (table tr td) are more specific than selectors that don't (td).

        
    p {
        font-family: "Courier New", monospace;
        color: blue;
        font-size: small;
    }
    /* an ID has the highest specificity */
    #argle {
        font-family: Arial, sans-serif;
        color: red;
        font-size: medium;
    }
    .bargle {
        font-family: "Times New Roman", serif;
        color: green;
        font-size: x-large;
    }
    /* because it is part of a chain, p .bargle has higher specificity than just .bargle */
    p .bargle {
        font-family: Verdana, sans-serif;
        color: pink;
        font-size: xx-large;
    }

        
    <p id="argle" class="bargle"> This text should be medium-sized,
            red, and in the Arial font </p>
    <p class="bargle"> This text should be extra large, green,
            and in the Times New Roman font </p>
    <p> This text should be small, blue,
            and Courier New </p>
    <p> <span class="bargle"> This text should be extra-extra large,
             pink, and in the Verdana font </span></p>
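One way to reason about which selector wins is to count what's in it: the number of IDs, then the number of classes, then the number of type selectors, compared left to right like a version number. The sketch below annotates the selectors from this example with those counts; the last rule is an extra, made-up one to show that the columns never carry over.

```css
/* Specificity as (IDs, classes, types), compared left to right. */
p           { color: blue; }   /* (0, 0, 1) */
.bargle     { color: green; }  /* (0, 1, 0) */
p .bargle   { color: pink; }   /* (0, 1, 1) */
#argle      { color: red; }    /* (1, 0, 0) */

/* Three type selectors beat a plain p, but a single class still
   outranks any number of type selectors. */
html body p { color: black; }  /* (0, 0, 3): loses to .bargle */
```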

Order

The final rule of the cascade is that, with all other things being equal, the last rule defined wins.

        
    p {
        font-family: "Courier New", monospace;
        color: blue;
        font-size: small;
    }
    p {
        font-family: Arial, sans-serif;
        color: red;
        font-size: medium;
    }
    p {
        font-family: "Times New Roman", serif;
        color: green;
        font-size: large;
    }

        
    <p> This text should be large, green, and in the Times New Roman font </p>

Web Typography

Fonts! They're a huge part of design and we're going to talk about them for a bit!

The typographic choices that we make surrounding our site's body text — the text that we display when we're not displaying anything else — set the design tone for the whole rest of the document, and the four choices that matter the most are:

Font Choice
Font Size
Line Spacing
Line Length

Font Choice & The Web Safe Fonts

We can choose a font for our website using the font-family property.

        
    body {
        font-family: Helvetica;
    }

What do we do if our website visitor doesn't have Helvetica installed on their computer? The font-family property allows us to list a whole group of fonts. The browser will try to use each one of these fonts in turn until it finds one that works.

        
    body {
        font-family: Helvetica, Arial, sans-serif;
    }

While some fonts, like Arial, Times New Roman, Verdana, Georgia, Courier New, Arial Black, and Impact exist on virtually every computer, the browser also provides default fallback fonts called serif, sans-serif, monospace, cursive, and fantasy.

It's best to include one of these at the end of every font-family declaration — that way, even if our webpage is opened by a browser running on a toaster that doesn't have any of the default fonts installed, it'll be able to render something.

Using Strange Fonts

It's possible to embed fonts directly in to our webpages, allowing us to use a typeface that our users don't have installed. Right now, I'm using "Averia Serif Libre" for this page's headings.

Currently, the best resource for this is Google Fonts, which provides both an enormous library of useful fonts and a helpful wizard to help install and configure these fonts in our pages.
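Under the hood, embedding a font comes down to an @font-face rule, which services like Google Fonts generate for us. Here's a minimal sketch of what that looks like if we host the font file ourselves. The file path is a placeholder, not a real location on this site.

```css
/* Tell the browser where to download the font from.
   The URL is a hypothetical path to wherever the file is hosted. */
@font-face {
    font-family: "Averia Serif Libre";
    src: url("fonts/averia-serif-libre.woff2") format("woff2");
}

/* Use it like any other font, with fallbacks in case the
   download fails. */
h1, h2, h3 {
    font-family: "Averia Serif Libre", Georgia, serif;
}
```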

Not Using Strange Fonts

It's important to note, though, that adding a large web font or two can significantly increase the size of a webpage.

One of the major pitfalls of non-system web fonts is that they generally do not render well below about 12pt. System fonts like Arial or Georgia are designed in such a way that when they get very small, they can be crammed into a tiny pixel grid to still remain readable. Most web fonts are not designed with this in mind, and the process of hammering them into tiny pixel containers can be pretty rough on the final product.

For this reason, it's probably for the best only to use strange fonts for large display text, like headers.

When In Doubt, Just Use Georgia

It's a good font. Look at it. Crisp. Readable. Neutral. Kind. Sassy. Georgia is whatever we want it to be.

Unit	Description	Type	Example
%	A percentage relating to another value, typically taken from the enclosing element. For example, `body{ width: 80% }` would set the `body` to be 80% of the viewport width, whereas `p{ font-size: 120% }` would set the font-size to be 120% of the font-size inherited from the parent element.	Relative	`width: 80%;`
px	A CSS pixel. The specification pins this at 1/96th of an inch, but thanks to variable pixel density screens, one `px` can cover a different number of physical pixels (and a different physical size) depending on the device that it is displayed on.	Absolute-ish	`width: 500px;`
em	A measurement equal to the font-size of the current element. (The name comes from the width of a capital `M` in traditional typography.) If the font-size changes, so does the `em`.	Relative	`width: 25em;`
rem	A measurement equal to the font-size of the page's root element, `html`.	Relative	`width: 25rem;`
vw	1% of viewport width.	Relative	`width: 20vw;`
vh	1% of viewport height.	Relative	`width: 100%; height: 10vh;`
vmin	Whichever is smaller of `vh` and `vw`	Relative
vmax	Whichever is larger of `vh` and `vw`	Relative

Then there's also in, cm, mm, pt, and pc, which are all based on real-life measurements, most of which are kind of meaningless when applied to a computer monitor, so they're all quietly just mapped to various amounts of px.

The question then becomes but which measurements should I use?

But Which Measurements Should I Use?

Use % whenever we want to define something relative to something else.
Use em if we're working with fonts.
Use px if we know the exact pixel dimensions of the thing we're working with.
Use rem for everything else.
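Put together, those four guidelines might look something like this. All of the class names are hypothetical, one per guideline.

```css
/* Hypothetical class names, one per guideline above. */
.sidebar {
    width: 30%;        /* relative to the enclosing element */
}
.fine-print {
    font-size: 0.8em;  /* fonts: relative to the surrounding text size */
}
.avatar {
    width: 64px;       /* we know the image is exactly 64 pixels square */
    height: 64px;
}
.card {
    width: 20rem;      /* everything else: scales with the root font size */
}
```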

Line Height & Line Width

Lines should be between 45 and 90 characters long (including spaces), and in order for lines to be readable, they should be spaced comfortably far apart from one another.

We can set these using the width and line-height properties, respectively, along with the measurements that we just learned about.

Let's take this unstyled HTML content:

and add our four CSS rules.

        
    body {
        font-family: Georgia, serif; /* font choice */
        font-size: 120%; /* font size */
        line-height: 1.5em; /* line spacing */
        width: 35em; /* line length */
    }

I like it! It's no mystery that this textbook uses very similar settings.
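One refinement that's worth knowing about, offered as a suggestion rather than a correction: line-height also accepts a plain number with no unit, and it behaves differently when inherited. A sketch of the difference:

```css
body {
    font-size: 120%;

    /* With a unit, the line height is computed once, right here,
       and that fixed length is what descendants inherit. A big
       heading would get the same line height as the body text,
       and its lines could end up crowding each other. */
    /* line-height: 1.5em; */

    /* Without a unit, descendants inherit the multiplier itself,
       so every element gets 1.5 times its own font size. */
    line-height: 1.5;
}
```

For a page that's all body text the two are indistinguishable; the unitless form only starts to matter once elements with different font sizes share the same inherited value.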

Smart Quotes and Special Symbols

One pet peeve of Design Folks is when writers use the wrong characters for things. It bothers them a lot.

For example, this badly written scenario:

The crimes are countless. Well, not countless. But there are definitely at least eight crimes!

The quote "I'm Spartacus" uses straight quotes instead of angled quotes.
And a straight apostrophe instead of an angled apostrophe.
That's a hyphen (-), not an em-dash — which is what we would expect to see, here.
45C instead of 45°C? Blasphemy.
Three periods instead of an ellipsis character…
(C) is no substitute for ©
1/2 could easily read ½
Some people might consider replacing the ?!?!? with a simple interrobang, ‽, but actually both of these would be incorrect. You get one punctuation mark, writers.
I'm not sure that Bad Writing ever incorporated into a full company.

Let's update this HTML so that we aren't embarrassed by the bad typography!

        
<p>
    &ldquo;I&rsquo;m Spartacus&rdquo;.
</p>
<p>
    Well, that was all well and good &mdash; but would he survive
       &hellip; temperatures of over 45&deg;C?
</p>
<p>
    &ldquo;I can definitely survive exactly 45 degrees of heat!&rdquo;
</p>
<p>
    Darn! But would he survive&hellip; &frac12; of a degree of extra heat?
</p>
<p>
    &copy; 2017 Bad Writing Inc.
</p>

That's much better! Well, the writing is still terrible. I don't think using ½ instead of "half" in a sentence is even remotely valid.

That HTML is pretty ugly, though!

There are two potential solutions to this problem:

If our HTML is encoded properly in UTF-8, we can just include the special characters directly in the file. Many code editors still won't automatically insert curly quote characters for us, though.
We can attach a JavaScript script to our site that intelligently converts quotes to smart quotes. This script works like magic and I highly recommend it.

Thin, Bold, Uppercase, Small-Caps, and Italics

I'm just going to show you some code and trust you to figure it out.

        
    .bold{
        font-weight: bold;
    }
    .uppercase{
        text-transform: uppercase;
    }
    .caps{
        text-transform: capitalize;
    }
    .lowercase{
        text-transform: lowercase;
    }
    .smallcaps{
        font-variant: small-caps;
    }
    .italic{
        font-style: italic;
    }
    .mixed-bag{
        font: bold italic 150% Georgia, serif;
    }

Colorful Text

Black is always in — it's trendy, and so slimming — but sometimes we want our text to be a different color. Links, for example, default to a rather hideous blue that we often want to change to match our color scheme.

Let's imagine we wanted to make every link on our page red, instead of blue. We could do that like so:

        
    a {
        color: red;
    }

Red is a named color. CSS supports over 140 different named colors, including white, blue, aquamarine, snow, and burlywood.

140 is not really a lot of colors, though! Our monitors can display over 16 million different colors, most of which don't have names.

To tell CSS about a color that isn't one of the 140 named colors, we have to define that color another way.

        
    a {
        color: #FF0000;
    }

CSS Colors

Colors, as reproduced on a monitor, are represented by three lights — a red light, a green light, and a blue light. When all of the lights are turned all the way up, we get a white pixel. When all of the lights are turned all the way down, we get a black pixel. Blue is created by turning the blue lights all the way up, and leaving the other two lights off. This is Additive Color.

Each light has 256 different possible settings, numbered 0 to 255, with 0 meaning "off" and 255 meaning "maximum power".

We can set colors by setting these numbers — here's how we would turn our links red using an RGB code:

        
    a {
        color: rgb(255, 0, 0);
    }

This takes up a lot of space, though. The hex format is a handy format for relaying these values quickly — we convert each number to its hexadecimal representation, mash them all together, and display them like this:

        
    a {
        color: #FF0000;
    }
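To see the conversion at work on a color that isn't all-or-nothing, here's an orange written both ways. The warning class is made up for the example.

```css
/* Each pair of hex digits is one light: #RRGGBB.
   FF = 15*16 + 15 = 255
   A5 = 10*16 + 5  = 165
   00 = 0 */
.warning {
    color: rgb(255, 165, 0);
}

/* The same orange, written as hex. */
.warning {
    color: #FFA500;
}
```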

A Quick Gray

One thing that's easy to remember is that shades of gray always have the same R, G, and B values. So we can create very dark grays like #222222 and very light grays like #EEEEEE without having to think very hard.
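A quick sketch of those two grays in use, with hypothetical class names. It also shows the three-digit shorthand that CSS allows whenever each pair of digits is a doubled digit.

```css
.charcoal {
    color: #222222;            /* same as rgb(34, 34, 34) */
}
.smoke {
    background-color: #EEEEEE; /* same as rgb(238, 238, 238) */
}

/* Three-digit shorthand: each digit is doubled, so #222 is #222222. */
.also-charcoal {
    color: #222;
}
```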

Hex Color Tools

When we're looking to create colors that exist beyond the realm of grays, the relationship between hex values and colors can start to get a bit muddy.

Fortunately, countless tools exist to help us with this problem, from simple color pickers to full-on palette generators and complex color wheels.

Hue, Saturation, and Lightness

One common criticism of RGB is that it's very tied to the actual raw implementation of color on a technical level — which produces a scheme that doesn't map easily to our understanding of color.

In an attempt to produce a system that we can understand more easily, RGB can be mapped to Hue, Saturation, and Lightness or Hue, Saturation, and Value — schemes designed to be easier to manage because they more closely approximate how our brains process color.

In the HSL scheme:

Hue is a number between 0 and 360 that selects the color from a wheel.
Saturation is a percentage between 0 and 100 that determines how intense the color is, with 0 being "just a shade of gray" and 100 being "dazzling full-color".
Lightness is a percentage between 0 and 100 that determines how light or dark the color is, with 0 being "black" and 100 being "white"

CSS supports HSL (but not HSV) through the hsl() function, although it is not as commonly used, as it is a newer feature than hex codes and most tooling still revolves around the old hex-RGB based system.

        
    a {
        color: hsl(0, 100%, 50%);
        /* 0 is "red",
           100% is "full red", and
           50% is "not any lighter or darker than standard red" */
    }
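Where HSL tends to earn its keep is in building a family of related colors: we pick one hue and then only nudge the saturation and lightness. A sketch, with made-up class names and an arbitrarily chosen blue:

```css
/* Hypothetical class names. The hue stays at 210 (a blue)
   throughout; only saturation and lightness change. */
.button {
    background-color: hsl(210, 80%, 45%);
}
.button:hover {
    background-color: hsl(210, 80%, 55%); /* a little lighter */
}
.button-muted {
    background-color: hsl(210, 20%, 45%); /* same blue, grayer */
}
.button-dark {
    background-color: hsl(210, 80%, 20%); /* same blue, much darker */
}
```

Doing the same thing in hex would mean changing all three channels at once and hoping the result still looks like the same blue.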

Opacity

One element of color that's not as frequently considered is the color's opacity.

This determines how transparent the color is. Opacity is the opposite of transparency — so an object with 100% opacity is 0% transparent, and an object that is 100% transparent has 0% opacity. If this seems opaque, your capacity for opacity is limited by your lack of perspicacity, but with tenacity your incapacity can be overcome.

The term for opacity in color is "alpha" — so we can set the RGBA value to 255, 0, 0, 0.9 to create a color that's entirely red and 10% transparent.

        
    a {
        color: rgba( 255, 0, 0, 0.9 );
    }

This also works within HSL:

        
    a {
        color: hsla( 0, 100%, 50%, 0.9 );
    }

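We can also do this using the opacity property, although that fades the whole element (its text, background, border, and everything inside it) rather than just one color.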

        
    a {
        color: red;
        opacity: 0.9;
    }

Web Typography Summary

Seriously, just use Georgia.

The Box Model

Now that we've defeated the basics of web typography, it's time to start looking into layout, and layout starts with the box model.

What is the box model, you ask? In CSS, every element is contained within an imaginary box, and the rules for adjusting the size, color, and appearance of those boxes are the box model.

Height & Width

First of all, every element (except for inline elements, which we'll talk about later) has a width and a height associated with it.

To make these boxes easier to see, our examples are also going to use background-color, which is a property that sets… well, the background color.

We can set the width and height using any of the CSS Measurements we talked about before: em, px, %.

        
    .block-one {
        background-color: pink;
        width: 300px;
        height: 200px;
    }
    .block-two {
        background-color: green;
        width: 90%;
        height: 5px;
    }

        
<div class="block-one"></div>
<div class="block-two"></div>

Max Height & Max Width

If we're not sure how big the content is going to be, we can set max-width, min-width, max-height and min-height instead — they'll get bigger or smaller with their content, but they won't get any bigger or smaller than the bounds we set.

In order to help me illustrate this, let me introduce this picture of me, which is exactly 371 pixels wide and 455 pixels tall.

Say hi, me picture!

Looks like he's being shy.

Now, we're going to create a class with a maximum and minimum width and height, and display it with and without content:

        
    .block-one {
        background-color: pink;
        max-width: 300px;
        max-height: 300px;
        min-width: 30px;
        min-height: 30px;
    }

        
<div class="block-one"></div>
<div class="block-one"> <img src="images/classam.png"> </div>

What can we divine from this example?

Elements want to be as wide as they can. The block with no content has a min-width of 30px and a max-width of 300px, and it's 300px wide.
Elements do not want to be tall. The block with no content has a min-height of 30px and a max-height of 300px, and it's 30px tall.
The image is bigger than 300px in both directions, so it doesn't fit in its container. Instead of being cut off, it's leaking out and taking up all kinds of space.

Wait, what?? How? I don't understand how that picture is bigger than its containing element. Is it stretching it?

We can understand this better by making that image partially transparent.

        
    .block-one {
        background-color: pink;
        max-width: 300px;
        max-height: 300px;
        min-width: 30px;
        min-height: 30px;
    }
    .block-one img{
        opacity: 0.5;
    }

        
<div class="block-one"> <img src="images/classam.png"> </div>

The image isn't stretching its containing element — it just hits the edge and keeps on going!

Overflow!

Can we arrest this rogue element? Contain it within our mighty block-one element?

Yes, of course we can! We can tell it how to deal with elements that overflow using the... overflow property.

        
    .block-one {
        background-color: pink;
        max-width: 200px;
        max-height: 200px;
        overflow: hidden;
    }
    .block-two {
        background-color: pink;
        max-width: 200px;
        max-height: 200px;
        overflow: scroll;
    }
    .block-three {
        background-color: pink;
        max-width: 200px;
        max-height: 200px;
        overflow-x: hidden;
        overflow-y: scroll;
    }

        
<div class="block-one"> <img src="images/classam.png"> </div>
<div class="block-two"> <img src="images/classam.png"> </div>
<div class="block-three"> <img src="images/classam.png"> </div>

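overflow: hidden takes any content that ventures outside of the containing element and hides it, whereas overflow: scroll gives the containing element the ability to scroll around in the large element. overflow-x and overflow-y do the same jobs, but for the horizontal and vertical directions separately.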
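
There's one more value worth knowing about, sketched below: overflow: auto. Where scroll asks for scrollbars whether or not they're needed, auto only adds them when the content actually overflows. The block-four class is made up to continue the numbering above.

```css
/* .block-four is a hypothetical class, continuing the example above. */
.block-four {
    background-color: pink;
    max-width: 200px;
    max-height: 200px;
    overflow: auto; /* scrollbars appear only if the content doesn't fit */
}
```

With our oversized picture inside, this should behave much like overflow: scroll. With a few words of text inside instead, we'd expect no scrollbars at all.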

Those pictures are totally under control!

The Background Of A Box

Sometimes images are part of the content of our page, and if that's the case, those images should be included as an img HTML tag — but sometimes images are part of the style of our page, and if that's the case, those images should be included as part of the page's stylesheet.

Instead of styling a box with a flat background color, we can style it with a background image, allowing us to drop an image into our stylesheet.

        
    .my-amazing-face {
        background-image: url('images/classam.png');
        width: 371px;
        height: 455px;
    }

        
<div class="my-amazing-face"></div>

We can also background-repeat images — like this handsome ring pattern.

With background-repeat set to repeat, even if the background image is smaller than the containing element, it will tile itself all the way to the edges.

        
    body {
        background-image: url('images/chapter3/circles-and-roundabouts.png');
        background-repeat: repeat;
    }
    div {
        background-color: white;
        width: 200px;
    }

        
<div> Hey there. </div>

Padding

Our content extends to the very edge of our box — which is sort of ugly, honestly. Fortunately, we can use padding to extend the box out beyond the edges of the content.

        
    body{
        background-color: #333333;
    }
    blockquote{
        background-color: #DDDDDD;
        width: 250px;
        padding-top: 20px;
        padding-left: 20px;
        padding-right: 20px;
        padding-bottom: 20px;
    }
    .no-padding{
        padding: 0;
    }

        
<blockquote>
    Most software today is very much like an Egyptian pyramid with millions
    of bricks piled on top of each other, with no structural integrity,
    but just done by brute force and thousands of slaves.
    &mdash; Alan Kay
</blockquote>

<blockquote class='no-padding'>
    if you ever code something that "feels like a hack but it works,"
    just remember that a CPU is literally a rock that we tricked into thinking
    &mdash; @daisyowl
</blockquote>

Look at that sweet padding!

Looks like TRBL

Entering padding-top, padding-right, padding-bottom and padding-left every single time that we want to set the padding on an element can be a little wordy. Fortunately, there are some shortcuts that we can use to set the padding quickly.

We can fold all four of these into one padding property, so long as we remember the order: Top, Right, Bottom, Left.

        
    body{
        background-color: #333333;
    }
    blockquote{
        background-color: #DDDDDD;
        padding: 100px 200px 30px 0; /* top right bottom left */
        width: 250px;
    }

        
<blockquote>
    Java is to JavaScript what Car is to Carpet. &mdash; Chris Heilmann
</blockquote>

Of course, in many situations, we want to set the same padding in all four directions.

For that, there's an even easier shortcut: we just pass one measurement to the padding property.

        
    body{
        background-color: #333333;
    }
    blockquote{
        background-color: #DDDDDD;
        padding: 30px;
        width: 250px;
    }

        
<blockquote>
    "What one programmer can do in one month,
     two programmers can do in two months." — Fred Brooks
</blockquote>
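
There's a middle ground between one value and four, too. With two values, the first sets the top and bottom, and the second sets the left and right. Here's a quick sketch, reusing the blockquote styling from above:

```css
blockquote{
    background-color: #DDDDDD;
    padding: 20px 50px; /* top and bottom: 20px, left and right: 50px */
    width: 250px;
}
```

This is the same as writing padding: 20px 50px 20px 50px; with a little less typing.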

Borders

Beyond the edge of the padding lives the border.

The border is an impenetrable wall between the background-color and the rest of the page.

We must never go beyond the wall. It is cold there.

By default, the border is invisible, but we can make it visible with the awesome power of various border properties.

        
    body{
        background-color: white;
    }
    blockquote{
        width: 250px;
        background-color: #DDDDDD;
        padding: 30px;
        border-top: 2px solid #222222;
        border-left: 5px solid blue;
        border-right: 4px dashed black;
        border-bottom: 10px dotted teal;
    }

Each of those border properties came with a width (in the CSS measurement of our choice), a line style, and a color. Strictly speaking only the style is required, since the width and color have defaults, but spelling out all three saves us from surprises.

Like with padding, if we know all four directions are going to be the same, we can save ourselves some typing, like so:

        
    blockquote{
        width: 250px;
        background-color: #DDDDDD;
        padding: 30px;
        border: 2px solid black;
    }

A Gentle Roundness About The Exterior

We can also gently round off the corners of our border with border-radius.

        
    blockquote{
        width: 250px;
        background-color: #DDDDDD;
        padding: 30px;
        border: 4px solid black;
        border-radius: 10px;
    }

The larger the measurement in the border-radius, the more rounded the edges will be, until the element itself is totally round.

        
    blockquote{
        width: 75px;
        height: 75px;
        background-color: #DDDDDD;
        padding: 30px;
        border: 4px solid black;
        border-radius: 200px;
    }
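
If a circle is what we're after, a percentage saves us from guessing at a large-enough number. As a sketch: on a box that's as wide as it is tall, border-radius: 50% should round it all the way.

```css
blockquote{
    width: 75px;
    height: 75px;
    background-color: #DDDDDD;
    padding: 30px;
    border: 4px solid black;
    border-radius: 50%; /* each corner's radius is half the size of the box */
}
```

If the box were wider than it is tall, the same rule would give us an ellipse instead of a circle.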

Margins

Beyond the padding and the border wall lie the margins. The margins are for spacing — they define a no-man's land between this element and other elements.

        
    blockquote{
        background-color: #DDDDDD;
        padding: 30px;
        border: 4px solid black;
        margin-top: 50px;
        margin-left: 50px;
        margin-right: 50px;
        margin-bottom: 50px;
    }

Of course, like with padding, we can present these arguments in TRBL order or combine them into one argument.

        
    blockquote{
        margin: 10px 50px 10px 50px;
    }

        
    blockquote{
        margin: 50px;
    }

Centering Content With Auto Margins

If we know the width of an element, and we want to horizontally center it, we can set the margin-left and margin-right to auto, and the browser will automatically figure out how to set the margins such that the element is horizontally centred.

I can't recommend this technique enough — one of the easiest website layouts is simply a single column of readable text, and this trick gives it to us easily. It's how this text stays in the center of the page.

        
    .sassy{
        width: 500px;
        margin: auto;
        background-color: #ccc;
        border: 2px solid black;
        padding: 20px;
    }
    blockquote{
        width: 300px;
        margin: 20px auto 20px auto;
        padding: 20px;
        background-color: #333;
        color: white;
    }

        
<div class="sassy">
    <blockquote>
        "The best programs are the ones written when
        the programmer is supposed to be working on something else."
        &mdash; Melinda Varian
    </blockquote>
    <blockquote>
        "One accurate measurement is worth a thousand expert opinions."
        &mdash; Grace Hopper
    </blockquote>
</div>

Remember when we talked about how elements want to be as wide as they can be horizontally, but they want to be as short as they can be vertically? That's why we can use margin: auto; to center an element horizontally but not vertically. Using margin: auto; vertically is the equivalent of setting margin: 0;.

Margin Collapse

One of the tricky things about margins is that the vertical margins of adjacent elements can overlap with one another.

Let's imagine that we have two elements, one after the other, one with margin: 25px and one with margin: 35px. How much space would we guess is between them?

The first answer that would come to mind is 60px — the sum of the two, but this would be incorrect, because they overlap. In fact, the elements would only be 35px apart.

A Diagram Of That Thing I Just Said

While "Margin Overlap" would be a good name for this, apparently it is called "Margin Collapse" instead.
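
Here's that example as a sketch. The class names are invented, and it assumes two ordinary block elements sitting one after the other on the page, like a pair of div or p elements.

```css
/* Hypothetical classes for two elements stacked one after the other. */
.first {
    margin: 25px;  /* includes a bottom margin of 25px */
}
.second {
    margin: 35px;  /* includes a top margin of 35px */
}
/* The bottom margin of .first and the top margin of .second overlap,
   so the vertical gap between them is 35px (the larger of the two),
   not 60px. */
```

Only the vertical margins collapse like this. The left and right margins of elements sitting side by side still add up the way we'd expect.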

Divisions, Sections, & Asides (Again)

The box model is a really powerful abstraction for putting boxes around things, which is, at least for now, our most powerful layout technique.

Sometimes, though, we need to put a box around a group of HTML elements that aren't grouped together naturally.

Fortunately, for cases like that, we have our old friend div, an element that, like vodka or porridge, does not have any flavor of its own, instead existing just so that we can add our own flavors to it.

        
<div class="sassy">
    <blockquote>
        "That’s what’s cool about working with computers. They don’t argue,
        they remember everything and they don’t drink all your beer." &mdash; Paul Leary
    </blockquote>
    <blockquote>
        "Think about it; and think about it carefully.
        Nothing happens in our society without software. Nothing." &mdash; Uncle Bob Martin
    </blockquote>
</div>

<blockquote>
    "As developers, we are often one of the last lines of
    defense against potentially dangerous and unethical practices." &mdash; Bill Sourour
</blockquote>

At first glance, it may appear like that div has done nothing. And… it hasn't!

It won't do anything until we apply some style to it.

        
    .sassy{
        width: 350px;
        font-family: Georgia, serif;
        font-size: large;
        margin: auto;
    }

So, div is great for grouping stuff together.

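There's also section and aside. As far as styling goes, they're… just other names for div, although they do tell browsers and other tools a little more about what the content is.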

Inline vs. Block Elements

In the ~~criminal justice~~ cascading style system, the ~~people~~ elements are represented by two separate yet equally important groups. The block elements which are stacked on top of one another, and the inline elements which are stacked next to one another.

Inline elements are things like links and emphasis: things that are expected to be within a paragraph. Some properties, like width, don't work on most inline elements (images and form inputs are the exceptions).

Block Elements	`display: block;`	`div`, `h1-h6`, `p`, `ul`, `ol`, `li`, `dl`, `table`, `blockquote`, `form`, `pre`
Inline Elements	`display: inline;`	`a`, `strong`, `em`, `img`, `br`, `input`, `code`, `span`
Invisible Elements	`display: none;`	`style`, `link`, `meta`

If we have an inline element that we want to make into a block element, we can use display: block.

If we have a block element that we want to make into an inline element, we can use display: inline.

span is to inline elements what div is to block elements — a featureless element to apply our own styles to.

Let's use display:block and the box model to turn an a link into a button!

        
    a{
        display: block;
        width: 200px;
        font-family: Georgia, serif;
        font-size: large;
        font-weight: bold;
        padding: 20px 50px 20px 50px;
        border: 2px solid #333;
        border-radius: 50px;
        color: #333;
        text-decoration: none;
        text-align: center;
    }
    a:hover{
        border: 2px solid darkblue;
        color: darkblue;
    }

        
    <a href="https://twitter.com/classam">@classam</a>

Let's look at that again, but without display:block.
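
As a sketch, that means the same rule as before, with the first line taken out:

```css
a{
    /* display: block; has been removed, so the link goes back to being inline */
    width: 200px; /* ignored: width doesn't apply to an inline link */
    font-family: Georgia, serif;
    font-size: large;
    font-weight: bold;
    padding: 20px 50px 20px 50px;
    border: 2px solid #333;
    border-radius: 50px;
    color: #333;
    text-decoration: none;
    text-align: center;
}
a:hover{
    border: 2px solid darkblue;
    color: darkblue;
}
```

Without display: block, we'd expect the button to shrink down to hug its text, because the width no longer has any say in the matter.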

Floats

Boxes in CSS — div, p — usually stack on top of one another.

This is a parade float that is also floated to the right

Sometimes, though, we want to have an element sit to the left or to the right of other elements.

This image, for example, is a parade float that has been floated to the right, with float: right;

We can also float this image to the left, with float: left;
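
As a sketch of what those rules might look like in a stylesheet (the class names here are made up, and the margins are only there to keep the text from touching the image):

```css
/* Hypothetical classes for floated images. */
.float-right {
    float: right;
    margin: 0 0 10px 20px; /* top right bottom left */
}
.float-left {
    float: left;
    margin: 0 20px 10px 0; /* top right bottom left */
}
```

An img with the float-right class slides over to the right edge of its container, and the paragraphs that come after it wrap around its left side. The float-left class does the same thing in mirror image.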

CSS Comments

Wait wait wait! Hold the presses! We didn't want the element that we floated to the left to push that section header! We wanted it to push the p text, but not the h2 of the section header.

In order to stop floated elements from pushing elements around, we can use clear: both on the element that we want to be un-pushable.

Let's try this again, but this time we'll set clear:both on the h2 header for CSS Comments.
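
In the stylesheet, that might look something like this sketch. Applying it to every h2 is an assumption made to keep the example short; a class on just the one header would work as well.

```css
h2 {
    clear: both; /* start below any floated elements, on either side */
}
```

There are also clear: left and clear: right, for when we only care about floats on one side.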

CSS Comments Again

Are we safe? Can we talk about comments, now? Yeah? Okay, let's continue.

Sometimes we want to comment out large blocks of CSS code, or add comments to our CSS.

The only way to comment in CSS is /* like this */

        
    /*
    a{
        display: block;
        width: 200px;
        font-family: Georgia, serif;
        font-size: large;
        font-weight: bold;
        padding: 20px 50px 20px 50px;
        border: 2px solid #333;
        border-radius: 50px;
        color: #333;
        text-decoration: none;
        text-align: center;
    }
    */
    a:hover{
        border: 2px solid darkblue; /* this is a comment */
        color: darkblue;
    }

Chapter 3 Summary

In this chapter we learned about:

Stylesheets
CSS selectors
CSS rules
The Box Model
Web typography
Basic layout

There's lots more to learn about CSS, but I think it's best to save that for a future chapter.

Chapter 4: Building A Web Server

We need to get our website online, somehow! While there are lots of online services that make this easy, we're going to start the unnecessarily hard way and run a web server on our own computer.

By tradition, when computering, our first program is always a program to print out 'hello, world'.

You could, of course, broadcast the message "hello, world" by tattooing it on your arm, as I have, but perhaps we should start with a less permanent solution to the problem.

This chapter is going to cover a lot of topics very loosely in order to get us to a workable initial state. Don't worry — we'll be covering everything, in depth, eventually.

We're going to be doing everything — absolutely everything — from a Linux command line.

Both of these choices — Linux, and the command line — might seem a little strange. Let's dig into them a bit.

Linux

Fully half of web programming is server programming. In order to program a server, we're going to need a server — a computer that stays online all the time and responds to client requests.

We could use our home computer as a server, but:

Windows and Mac operating systems have an awkward tendency to reboot themselves to install security updates and add features, and this is not optional.
A home computer has all of its security defaults set in a way that would be appropriate for a computer on a home network, not a computer exposed to the tireless onslaught of the public internet.
Unless we pay our ISP extra for a static IP address, our home's IP address changes regularly, which means that it will be difficult to find our computer.
Our home computer likely contains our own, personal, private information.
Doing other stuff on our home computer, like opening 80 browser tabs while playing video games, could affect the performance of our server.

Unless we happen to have a home fibre connection, it's likely that we're on an asymmetric connection to the internet — with downloads much faster than uploads. For a server, though, it is the upload speed that matters; serving a lot of traffic from our home internet connection would be like trying to operate a Denny's through a foot-wide hole — possible, but frustrating once we have more than one or two customers.

These are not insurmountable problems, and building a home server out of a spare computer is a fun and instructive project that I highly recommend.

With the home computer out as a server option, that leaves us renting server time from someone else.

There are lots of reasons that we might choose Linux over Windows if we're planning on running a server for a long time. Linux is lightweight. Linux is secure. Linux is able to stay online for a very long time. The reason for Linux's popularity as a server is very simple, though: Linux is free, which generally makes a Linux server the most cost-effective option.

Free or Four

In software, there are two meanings of free.

The first meaning is the obvious one, and the correct one: free means "we don't have to pay for it".

But there's the other meaning of free, meaning the code is wild and free like an unbroken stallion. This is an alternative definition of free that has to do with our freedom to legally take the code, modify it as we please, and share it with others. Linux is also free in this way.

Many systems are one kind of free but not another - GitHub, for example, is don't-have-to-pay free but not freeeeeeedom!-free, whereas Red Hat Linux is freeeedom!-free but we still have to pay money dollars for it.

"It's a UNIX system! I know this!"

Linux, like Mac OS X, is based on Unix, an operating system with an almost half-century of history, only some of which involves velociraptors. That half-century can be a double-edged sword — the reason for strange design decisions is often shrouded in some ancient obscure keyboard layout.

Bell Labs & Unix

If we're at all interested in computing history — and why wouldn't we be, it's the best history — we're going to hear Bell Labs come up a lot.

Its roots go back to the Volta Laboratory, a private research laboratory set up by Alexander Graham Bell in the 1880s. Bell Telephone Laboratories itself was formed in 1925, and it has changed hands numerous times since then.

Between 1937 and 2014, work done at the laboratory earned eight different Nobel Prizes, which gives it more Nobel laureates than the University of British Columbia.

Bell Labs invented the transistor in the forties, discovered the cosmic background radiation that supported the Big Bang theory in the sixties, and invented the CCD in 1969, an invention that earned a Nobel Prize in 2009.

Oh, and during a brief lull in the seventies, Bell Labs created Unix and C, the most successful operating system and programming language in the world.

Way to go, Bell Labs.

Linux Distributions

One confusing thing about Linux is that we can't just crack open a laptop and start running Linux on it — in fact, Linux really refers to The Linux Kernel, which is the tiny beating heart of the operating system.

There are lots of things that we think of as part of an operating system (the window system, a desktop environment, the software that comes pre-installed on the system, installers, Minesweeper), but the Linux Kernel doesn't come with any of these things.

So, the complete kit required to make a computer actually do computer things is called a Linux Distribution.

Popular distributions include Ubuntu, Red Hat/Fedora, Mint, Slackware, and Arch.

Different distributions focus on different priorities. Some, like Mint, focus on being easy to use as a desktop OS. Arch is highly configurable and optimizable, and popular with the sort of people who drive a 1996 Nissan 300ZX Turbo with a gigantic aftermarket spoiler and undercar lights.

Ubuntu started out resolutely trying to conquer the desktop PC market, but after a decade of mostly failing to do so, has seemingly embraced its position as a competitive and pretty comprehensive server operating system. I like Ubuntu, a lot, and it's the distribution I'm going to be using in our examples.

The Command Line & The Secure Shell

Most systems work is still done with the command line.

While I'd love to say that this is because the command line is inherently superior to all other methods of interaction with a computer, the actual reasons are simple:

It's easier to write a command line program than a GUI program.
SSH

SSH, or Secure Shell, is one of the most important tools in our web programming arsenal. SSH is a tool allowing for remote control of computers.

Most Linux distributions run the SSH daemon in the background, all the time, and it's possible to log in to the computer directly with a username and password.

But the SSH interface to a Linux computer doesn't come with any fancy doodads or graphical user interfaces. Nope, dad-gummit, we have to control the system by pulling up our suspenders, tying an onion to our belts, walking uphill both ways, and using the command line interface.

Let's Get This Terrible Party Started

First and foremost, we are going to need access to a command line.

If we're running Mac OS X, we can get to a command line, right now, by opening the "Terminal" program — although downloading iTerm2 gets us tabs and splits.

If we're running Windows, we can download Git for Windows. We're going to need Git anyway, since it's the most widely used source control tool on the market, but it also comes with Git Bash, the easiest-to-configure Unix-like console available in Windows. Move over, Cygwin.

If we're not in a hurry, we also might spend some time configuring ConEmu to work with Git Bash. This gets us the tabs and the splits.

Code Directory & Project Root

Somewhere on our computer is our home directory. This is the root for all of our personal files. In Windows, my home directory lives at C:\Users\Curtis, whereas in Mac OS X, the home directory lives in /Users/curtis.

I'm fond of creating a directory, code, under the home root, as my project root. Every different project I'm working on lives in this directory. Then, I configure my terminal to open this directory when it starts up.

Let's open our terminal — be it Terminal, iTerm2, Git Bash, or ConEmu.

Shell Game

Technically, the terminal is the window that shows you the shell, which is the program that actually handles your interactions with the command prompt.

Bash is the Bourne Again Shell, the most common shell for Linux systems.

SSH is the Secure Shell, a shell to remote systems.

sh stands for "Shell", and it refers to whatever is the default shell on a system.

dash is a lightweight shell without the features and functionality of Bash. In Debian and Ubuntu, /bin/sh points to dash instead of bash, but bash is still the default shell for users.

dat's some good terminal

We're staring a command prompt in the face. That $ means that the terminal is ready for us to enter a command.

Let's Putter Around In The Terminal A Bit

I'm excited! I get to tell you about the most important shell commands!

Let's start by asking where we are, using the print working directory command, pwd.

curtis@SMOKESTACK ~/code
$ pwd
/c/Users/curtis/code

Tilde Swinton

So, we're at /c/Users/curtis/code, but the terminal says that we're at ~/code. What's the deal?

~ (tilde) is shorthand for "home". Since our home is /c/Users/curtis, ~/code is shorthand for /c/Users/curtis/code.

MSYS

Follow-up question: why is it /c/Users/curtis/code and not C:\Users\curtis\code?

Simulating Bash on Windows requires a layer to translate Windows concepts into vaguely-Linux-shaped concepts. This is provided by MinGW — Minimalist GNU for Windows — and MSYS, which is a collection of utilities like Bash ported to run on MinGW.

Linux doesn't support lettered-drive roots like C:, instead preferring to have a single root for all drives, /. Linux also uses forward slashes instead of backslashes.

MSYS and MinGW do this conversion for us, but the differences between Windows and Linux are going to continue to bite us all the way throughout this journey.

Mac OS X, being built on top of a Unix system called "Darwin", doesn't suffer from these problems.

Terminal Commands

Using the change directory command, cd, we can change our directory.

curtis@SMOKESTACK ~/code
$ cd ~

curtis@SMOKESTACK ~
$ pwd
/c/Users/curtis

Look, we're home, now!

We can navigate to a specific directory:

curtis@SMOKESTACK ~
$ cd /

curtis@SMOKESTACK /
$ cd /etc

curtis@SMOKESTACK /etc
$

And we can use ls to display the contents of a directory:

curtis@SMOKESTACK /etc
$ cd ~

curtis@SMOKESTACK ~
$ ls
AppData
Application Data
ConEmu
Contacts
Cookies
Creative Cloud Files
Embarrassing Pornography
Even More Embarrassing Pornography
Seriously, Goats and Stuff
Desktop
Documents
Music
NTUSER.DAT
_vim
_viminfo
_vimrc
code
pip
vimfiles

Whoa. There's a lot of stuff in my home directory.

If there's no code directory in here, yet, we can make one, with mkdir.

curtis@SMOKESTACK ~
$ mkdir code
mkdir: cannot create directory 'code': File exists

That doesn't work, because in my case, the code directory already exists.

Here's a good trick. Type in cd c and then hit the tab key.

curtis@SMOKESTACK ~
$ cd code

curtis@SMOKESTACK ~/code
$

If only one thing starts with a 'c', the terminal should auto-complete to it. If several things do, a default Bash setup completes as far as it can, and pressing tab a second time lists all of the candidates. This is a very useful tool.

All this computer hacking is making me thirsty. I think I'll order a TAB.

Let's create a directory for our hello world program.

curtis@SMOKESTACK ~/code
$ mkdir hello_world

curtis@SMOKESTACK ~/code
$ ls
hello_world

Now let's delete that directory and then recreate it, just to be contrary.

curtis@SMOKESTACK ~/code
$ rmdir hello_world

curtis@SMOKESTACK ~/code
$ ls

curtis@SMOKESTACK ~/code
$ mkdir hello_world

curtis@SMOKESTACK ~/code
$ ls
hello_world

We can use cd to jump into the folder we've just created, and we can also use cd with the special shortcut .. to go up a level.

curtis@SMOKESTACK ~/code
$ cd hello_world

curtis@SMOKESTACK ~/code/hello_world
$ cd ..

curtis@SMOKESTACK ~/code
$ cd ..

curtis@SMOKESTACK ~
$

That feels like a pretty solid introduction to the command line. With these simple tools we can zip around the filesystem creating directories like a pro!

Curtis's Big Sexy Terminal Cheat Sheet

`ls`	list the contents of a directory
`cd`	change the directory
`pwd`	tell me what directory I'm in
`mkdir`	make a directory
`rmdir`	remove a directory
`touch`	make a file
`cat`	print a file
`rm`	remove a file
`cp`	copy
`mv`	move
`sudo`	i am an admin, let me do this thing

For a more in-depth introduction to Unix/Linux, you could always visit this Unix Tutorial for Beginners.

Installing Ubuntu

This used to be a process that involved either dual-booting (running both Ubuntu and Windows/Mac OS X on our primary computer, and switching from one to the other when we boot up the computer) or finding a junky old computer from around the house and installing Ubuntu on that.

With Virtual Machines, though, it's possible to run Ubuntu from within our primary operating system. So we should definitely do that. In fact, the toolset for doing this has become so easy to use that I've started running a different virtual machine for every project I'm working on.

The tools that make this easy and free are VirtualBox and Vagrant. Download and install these programs.

VirtualBox & Vagrant

VirtualBox is software to run virtual machines. We could get away with just using VirtualBox in our quest to install Ubuntu, but that would take a lot of effort and clicking.

Vagrant is a wrapper around VirtualBox that provides a very hackable command line interface and a gigantic repository of pre-created operating system images. Installing an operating system with Vagrant can be as easy as just picking it out of a catalogue and modifying a text file.

One of the things I like most about Vagrant is that it's possible to configure the entire operating system automatically. Instead of laboriously hand-configuring the operating system every time we move to a new machine or accidentally bork a database, we can just incinerate the virtual machine and summon a clean new one.

Let's go back to our new 'hello_world' directory and type in vagrant init.

curtis@SMOKESTACK ~/code
$ cd hello_world

curtis@SMOKESTACK ~/code/hello_world
$ vagrant init
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please
read the comments in the Vagrantfile as well as documentation
on `vagrantup.com` for more information on using Vagrant.

vagrant init creates for us a Vagrantfile. This file describes all of the important details about the operating system we are about to create — which Linux distribution we want, networking details, how much RAM it gets to use, and even, if we configure it right, the instructions for setting up all of the software that runs on the operating system.

Now it's time to open up the Vagrantfile in a text editor.

    
# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure(2) do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://atlas.hashicorp.com/search.
  config.vm.box = "base"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  #
  # config.vm.provider "virtualbox" do |vb|
  #   # Display the VirtualBox GUI when booting the machine
  #   vb.gui = true
  #
  #   # Customize the amount of memory on the VM:
  #   vb.memory = "1024"
  # end
  #
  # View the documentation for the provider you are using for more
  # information on available options.

  # Define a Vagrant Push strategy for pushing to Atlas. Other push strategies
  # such as FTP and Heroku are also available. See the documentation at
  # https://docs.vagrantup.com/v2/push/atlas.html for more information.
  # config.push.define "atlas" do |push|
  #   push.app = "YOUR_ATLAS_USERNAME/YOUR_APPLICATION_NAME"
  # end

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  # config.vm.provision "shell", inline: <<-SHELL
  #   sudo apt-get update
  #   sudo apt-get install -y apache2
  # SHELL
end

Picking a Box

In order to boot up an operating system, first, we have to pick a base box — this is the blank, untouched, pristine operating system that we are going to be working with.

A catalogue of base boxes exists here.

We're going to choose ubuntu/xenial64.

Xenial Xerus

Every release of Ubuntu has been paired with an Alliterative Animal name, and they've been in alphabetical order from 5.10 (Breezy Badger) all the way up to 17.04 (Zesty Zapus).

Every 2 years, Ubuntu also releases an LTS ("Long Term Support") version, which is scheduled to keep getting attention and security updates for 5 years after it launches. Xenial Xerus, 16.04, was the most recent LTS version of Ubuntu when this chapter was written, which is why we picked it.

Now, they're rolling around to the front of the alphabet again.

One tradition when rolling through the alphabet is to go "X", "Y", "Z", "AA", "BB".

They've managed to find an "AA" animal, the "Artful Aardvark".

"BB" might B a little harder.

The line in the Vagrantfile pertaining to boxes will read:

    
# Every Vagrant development environment requires a box. You can search for
# boxes at https://atlas.hashicorp.com/search.
config.vm.box = "ubuntu/xenial64"

Forwarding Port 80

The virtual machine is going to be running a web server (spoiler alert) on port 80. The only problem with that? The ports on the virtual machine aren't accessible by default. We need to write a rule to connect a port on our outer machine to a port on the virtual machine.

By convention, 8080 is a popular port for developing HTTP services, so we're going to use that.

The Vagrantfile already had this option ready for us. All we have to do is remove the # character to turn the line from a comment into real code.

    
# Create a forwarded port mapping which allows access to a specific port
# within the machine from a port on the host machine. In the example below,
# accessing "localhost:8080" will access port 80 on the guest machine.
config.vm.network "forwarded_port", guest: 80, host: 8080

Syncing a Folder

This is a more advanced feature, but one that's going to allow us to edit our codebase on the virtual machine with a graphical text editor on our host machine, and that seems like it's going to be worth the effort.

First of all, let's create a folder to sync.

curtis@SMOKESTACK ~/code/hello_world
$ mkdir html

Then, let's sync that folder to a location within the virtual machine.

We could choose just about any location, but if we were to do something like "accidentally sync over a whole lot of important system files", we would be in for a world of trouble.

One spot that's generally safe is /home/ubuntu/something. Vagrant boxes usually come with one user operating on them, already. This one comes with a user named ubuntu, and that user's home directory is /home/ubuntu, which means we can put anything in there that we want.


# Share an additional folder to the guest VM. The first argument is
# the path on the host to the actual folder. The second argument is
# the path on the guest to mount the folder. And the optional third
# argument is a set of non-required options.
config.vm.synced_folder "html", "/home/ubuntu/html"

Pull the Lever, Kronk

Save the Vagrantfile and we're ready to get started!

Let's turn the Virtual Machine on, with vagrant up.

curtis@SMOKESTACK ~/code/hello_world
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Checking if box...

Eventually a mountain of output will go by.

If that process went off without a hitch, we are good to go.

Logging In

Finally, we can log in to our newly created Ubuntu operating system with the command vagrant ssh.

curtis@SMOKESTACK ~/code/hello_world
$ vagrant ssh
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-78-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.


Last login: Thu May 18 02:31:20 2017 from 10.0.2.2
ubuntu@ubuntu-xenial:~$

This is a whole new command line. The command line of the Ubuntu Virtual Machine. We've got an Ubuntu server running inside our computer!

Take a moment for a brief celebration.

Installing a Web Server

We're not done, yet. Our server is running, yes, and we've made sure that, if it listens on port 80, we can connect to it.

One problem, though — the server isn't listening on port 80. This is just a server, right now. We need to install a web server on our server server.

We can do that with the command sudo apt-get install nginx.

Whoa. There's a lot of stuff happening in that command. Let's unpack that a bit!

sudo

sudo stands for "superuser do".

Linux is, by nature, a multi-user operating system. It's designed to have many users logged in at once.

In order to keep users from being able to access each other's files, each file has a set of permissions that defines who is able to see it, and many system functions are locked down so that common users can't touch them.

sudo is a way of saying "I am the administrator and I can do whatever I like".

The 'sudo make me a sandwich' comic from XKCD

Not everybody has access to sudo — only admin users.

Superuser access is required to install software, which is what we wanted to do here.
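To make the idea of file permissions a little more concrete, here's a small sketch that should run on any Linux command line. The scratch file and its contents are made up for the example, and nothing here touches system files or needs sudo.

```shell
# Create a throwaway file to experiment on (mktemp picks the name for us).
secret_file=$(mktemp)
echo "my diary" > "$secret_file"

# Allow only the file's owner to read and write it; nobody else gets access.
chmod 600 "$secret_file"

# The first ten characters of `ls -l` show the permissions: the file type,
# then read/write/execute flags for the owner, the group, and everyone else.
permissions=$(ls -l "$secret_file" | cut -c1-10)
echo "$permissions"          # prints: -rw-------

# Loosen the rules: the owner can read and write, everyone else can only read.
chmod 644 "$secret_file"
permissions_after=$(ls -l "$secret_file" | cut -c1-10)
echo "$permissions_after"    # prints: -rw-r--r--

# Tidy up after ourselves.
rm -f "$secret_file"
```

System files are typically owned by the administrator account, root, so a regular user can look at many of them but needs sudo to change them.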

apt-get install

apt-get is the command line interface to the advanced packaging tool, which — and this is going to seem pretty magical if you're used to the way that Windows does things — allows you to install any Ubuntu program, automatically, with a single command.

Seriously. Want vim? Type sudo apt-get install vim. Want to install a program that pipes your command output through ASCII-art cows? sudo apt-get install cowsay. Want a steam locomotive? sudo apt-get install sl.
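Before installing something, it can be handy to check whether it's already there. The sketch below uses the shell's built-in command -v to look a program up on the PATH. The helper name is_installed is invented for this example, and the list of programs is just the three mentioned above.

```shell
# is_installed is a hypothetical helper: it succeeds if the named program
# can be found on the PATH, and fails otherwise.
is_installed() {
    command -v "$1" > /dev/null 2>&1
}

for program in vim cowsay sl; do
    if is_installed "$program"; then
        echo "$program is already installed"
    else
        echo "$program is missing; try: sudo apt-get install $program"
    fi
done
```

This only checks whether a command by that name can be found; it doesn't ask apt which packages are installed.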

nginx

Nginx is a fast, lightweight HTTP server. It can also be configured to do a few other fun tricks, like act as a load balancer or reverse proxy, but right now we're interested in just using Nginx to serve a web page.

sudo apt-get install nginx

Well, that's enough explanation. Let's get to it!

ubuntu@ubuntu-xenial:~$ sudo apt-get install nginx

This will ask us to confirm it by pressing y, then chug along and eventually stop chugging. At this point, we have a web server!

Prove it

Don't believe me? Open a browser and visit localhost:8080

Egads! It worked!

localhost

This is a shortcut! localhost is a domain that always resolves to 127.0.0.1, which is a special IP address that always refers to this computer, the one that I am using right now.

:8080

Earlier, I mentioned that, unless told otherwise, your browser connects to the default port for the protocol; in the case of HTTP, that's 80. However, when we created the Virtual Machine, we specified that port 80 on the Virtual Machine should correspond with port 8080 on the host machine.

When we enter a domain name, we can always add a colon and the port number to specify that we're connecting to a non-standard port.
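As a rough illustration, here's how a script might split an address like localhost:8080 into its two halves, falling back to port 80 when no port is given. split_address is a made-up helper, and it only handles plain names and IPv4 addresses (IPv6 addresses contain colons of their own, so they'd need more care).

```shell
# split_address is a hypothetical helper that prints "host port" for an address.
split_address() {
    address=$1
    case "$address" in
        *:*)
            host=${address%%:*}   # everything before the first colon
            port=${address##*:}   # everything after the last colon
            ;;
        *)
            host=$address
            port=80               # no port given, so use the default for HTTP
            ;;
    esac
    echo "$host $port"
}

split_address "localhost:8080"   # prints: localhost 8080
split_address "localhost"        # prints: localhost 80
```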

Creating a Basic HTML Document

Now that we've got a server running an HTTP server, we need something to serve.

We're going to go into more detail about HTML later, but for now, let's create a very simple file, index.html, and put it in our ~/code/hello_world/html directory on our host computer.

index.html

index.html is a special filename.

Technically, when we navigate to a website, we are always supposed to indicate the file that we are asking for — our URL should end with /whatever.html or /anything.jpg to indicate the file that we're looking for.

By convention, however, many web servers are configured to look for index.html by default when no filename is specified. It's called an index because, when we enter a directory, the index file would provide us with a list of the files in that directory.

So, remember: the index.html file is what the web server will serve if you don't request a specific file.
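Here's a toy model of that convention in a few lines of shell. It's only a sketch of the idea, not how nginx is actually implemented; the function name resolve_request and the scratch directory are invented for the example.

```shell
# resolve_request is a hypothetical helper: given a document root and the path
# from a URL, it prints the file that a server following the index.html
# convention would send back.
resolve_request() {
    doc_root=$1
    request_path=$2
    if [ -d "$doc_root$request_path" ]; then
        # The request names a directory, so fall back to its index.html.
        candidate="${doc_root%/}/${request_path#/}"
        candidate="${candidate%/}/index.html"
    else
        candidate="$doc_root$request_path"
    fi
    if [ -f "$candidate" ]; then
        echo "$candidate"
    else
        echo "404 Not Found"
    fi
}

# Build a tiny site in a scratch directory to try it out.
site=$(mktemp -d)
echo "<h1>hello, world</h1>" > "$site/index.html"

resolve_request "$site" "/"              # prints the path to index.html
resolve_request "$site" "/index.html"    # prints the same path
resolve_request "$site" "/missing.jpg"   # prints: 404 Not Found
```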

    
<!-- I go in ~/code/hello_world/html/index.html -->
<html>
<head>
  <title>hello, world</title>
</head>
<body>
  <h1>hello, world</h1>
</body>
</html>

This file should now exist at /home/ubuntu/html/index.html on our virtual machine.

Serving our HTML

By default, nginx serves the contents of a system directory, /var/www/html, which contains the "hey, you're running nginx" message we saw earlier.

We could just copy our file into /var/www/html, but that's not the satisfying solution that we want. No, we want to tell nginx to serve from /home/ubuntu/html rather than /var/www/html by changing nginx's configuration files!

One problem: nginx's configuration files are deep within the Ubuntu virtual machine, not synced to anywhere on our host machine. We have to edit these configuration files from within the virtual machine.

A Different Kind of Text Editor

Presumably, we're far enough along in the craft of programming to know that a text editor is the software developer's multi-tool, and one's choice of text editor is approximately as emotionally charged and contentious as one's choice of video game console.

But there's a problem: because we're working over SSH, and SSH gives us a command shell with no mouse to speak of, we're going to need a text editor that operates entirely from within the shell.

This leaves us, then, with about three different options:

vim, my favourite, which is cryptic and difficult, but powerful
emacs, which is even more cryptic and difficult, but even more powerful
nano, which is quite easy to learn, but offers limited functionality

Nano is Like a Spork

As much as I want to push my pigheaded vim lust on to a new generation of developers, nano is the easiest editor to get started with.

Configuring Nginx

ubuntu@ubuntu-xenial:~$ sudo nano /etc/nginx/sites-available/default

sudo again

Why sudo? Well, /etc/nginx/sites-available/default is a system file, and we wouldn't normally have access to modify it.

/etc is for config

/etc is the directory that contains the configuration files for most Linux systems. This is not a very good name. They might as well have called this directory /you/know/whatever or /idfk — or maybe, and I'm just spit-balling, here, /config, but the insane tradition of calling this directory /etc lives on.

/sites-available/default

While the core configuration for nginx lives at /etc/nginx/nginx.conf, all it does is reference a multitude of other files.

Sometimes I like to throw out all of the default configuration and just dump all of my configuration directly into nginx.conf, but for this example we're going to play ball with the rules and instead edit this other configuration file.

Here's the configuration file:

    
## /etc/nginx/sites-available/default
##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# http://wiki.nginx.org/Pitfalls
# http://wiki.nginx.org/QuickStart
# http://wiki.nginx.org/Configuration
#
# Generally, you will want to move this file somewhere, and start with a clean
# file but keep this around for reference. Or just disable in sites-enabled.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##

# Default server configuration
#
server {
        listen 80 default_server;
        listen [::]:80 default_server;

        # SSL configuration
        #
        # listen 443 ssl default_server;
        # listen [::]:443 ssl default_server;
        #
        # Note: You should disable gzip for SSL traffic.
        # See: https://bugs.debian.org/773332
        #
        # Read up on ssl_ciphers to ensure a secure configuration.
        # See: https://bugs.debian.org/765782
        #
        # Self signed certs generated by the ssl-cert package
        # Don't use them in a production server!
        #
        # include snippets/snakeoil.conf;

        root /var/www/html;

        # Add index.php to the list if you are using PHP
        index index.html index.htm index.nginx-debian.html;

        server_name _;

        location / {
                # First attempt to serve request as file, then
                # as directory, then fall back to displaying a 404.
                try_files $uri $uri/ =404;
        }

        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
        #
        #location ~ \.php$ {
        #       include snippets/fastcgi-php.conf;
        #
        #       # With php7.0-cgi alone:
        #       fastcgi_pass 127.0.0.1:9000;
        #       # With php7.0-fpm:
        #       fastcgi_pass unix:/run/php/php7.0-fpm.sock;
        #}

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        #
        #location ~ /\.ht {
        #       deny all;
        #}
}


# Virtual Host configuration for example.com
#
# You can move that to a different file under sites-available/ and symlink that
# to sites-enabled/ to enable it.
#
#server {
#       listen 80;
#       listen [::]:80;
#
#       server_name example.com;
#
#       root /var/www/example.com;
#       index index.html;
#
#       location / {
#               try_files $uri $uri/ =404;
#       }
#}

This is an awful lot of file to look at, but there's only one line that we're concerned about. We need to change root /var/www/html; to read root /home/ubuntu/html;, and then save the file.
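The same edit can also be made without opening an editor at all, which comes in handy once we start automating things. The sketch below practises on a scratch file rather than the real configuration: the scratch file is a stand-in containing just the one line we care about. On the virtual machine the target would be /etc/nginx/sites-available/default, and the commands would need sudo.

```shell
# Make a scratch file standing in for the nginx configuration.
config=$(mktemp)
printf 'server {\n        root /var/www/html;\n}\n' > "$config"

# Keep a backup, then write a new version with the root directive replaced.
# Using | as the sed delimiter means the slashes in the paths need no escaping.
cp "$config" "$config.bak"
sed 's|root /var/www/html;|root /home/ubuntu/html;|' "$config.bak" > "$config"

# Show the line we changed.
grep 'root' "$config"
```

On the virtual machine, sudo nginx -t checks the configuration files for syntax errors, which is a cheap way to catch a typo before nginx is restarted.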

Restarting nginx

That change should do it, but none of our configuration changes will take effect until we restart nginx.

ubuntu@ubuntu-xenial:~$ sudo service nginx restart

We Did It!

Let's go back to localhost:8080 in a browser.

this is what the browser window should look like if we have succeeded

Please attempt to contain your excitement.

Who would have thought that a simple "hello, world" program would take so much effort?

Cleaning Up & Starting Over

We're done, but our Virtual Machine is still running. It'll keep on running, forever. That's the whole point of a server.

We can quit the terminal.

ubuntu@ubuntu-xenial:~$ exit
logout
Connection to 127.0.0.1 closed.

curtis@SMOKESTACK ~/code/hello_world
$

But if we check localhost:8080, the server's still running.

The only way to truly kill the server is to ~~plunge a stake through its heart~~ turn it off, which we can do with vagrant halt.

curtis@SMOKESTACK ~/code/hello_world
$ vagrant halt
==> default: Attempting graceful shutdown of VM...

That killed it.

Let's imagine, though, that we made a serious mistake. While we were working on the operating system we accidentally made a change that ruined everything forever and we don't know how to fix it.

We want to go back to a completely clean slate, to the good times before we had a Virtual Machine at all. This is the situation where vagrant destroy is here to purify our hard drive with its scouring light.

curtis@SMOKESTACK ~/code/hello_world
$ vagrant destroy
default: Are you sure you want to destroy the 'default' VM? [y/N] y
==> default: Destroying VM and associated drives...

Never again will you trouble our shores with your foul presence, Virtual Machine.

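We can always bring a fresh one back with vagrant up when we need it again, although a fresh machine won't have nginx installed or configured until we redo those steps.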
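A freshly created machine is a blank Ubuntu, so the steps from this chapter would have to be repeated. Here's a sketch of how they might be collected into a provisioning script. The file name provision.sh, the DRY_RUN switch and the run helper are all invented for this example, and by default the script only prints what it would do. It leaves out sudo on the assumption that it runs as root, which is how Vagrant's shell provisioner runs scripts by default.

```shell
#!/bin/sh
# provision.sh (hypothetical name): the manual steps from this chapter in one place.

# Dry-run by default: print each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# run is a made-up helper that either prints or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

provision() {
    # Refresh the package list, then install nginx without the y/n prompt.
    run apt-get update
    run apt-get install -y nginx

    # Point nginx at the synced folder instead of /var/www/html.
    run sed -i 's|root /var/www/html;|root /home/ubuntu/html;|' /etc/nginx/sites-available/default

    # Restart nginx so that the new configuration takes effect.
    run service nginx restart
}

provision
```

Running it with DRY_RUN=0 inside the virtual machine, as root, would carry out the steps for real. To have Vagrant run it automatically, the Vagrantfile can point at the script with a line like config.vm.provision "shell", path: "provision.sh".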

Chapter 4 Summary

We've done a lot to get our web server up and running and serving "hello, world"!

We installed a command line interface and learned some basics about how to use it.
We used Vagrant and VirtualBox to install an Ubuntu Linux Virtual Machine on our computer.
We used sudo and apt-get to install nginx on our server.
We created a basic HTML file.
We used nano to change nginx's configuration.
We restarted nginx.
We destroyed the Virtual Machine.

Oh no! This is as far as I've written! Keep track of the project on GitHub for updates!