Dashes: - vs. – vs. —
The grammar school version (for English usage) is:
- `-` (known as a hyphen) between the elements of compound words
- `--` (known as an en-dash) for ranges: "3–7"
- `---` (known as an em-dash) punctuation for digressions in a sentence—though how it differs from a parenthetical comment I have never known—which is why you don't see it much
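As a minimal sketch of how these three inputs look in a LaTeX source (assuming a standard document class and a font with the usual TeX ligatures; the sample sentences are made up for illustration):

```latex
\documentclass{article}
\begin{document}
% Hyphen: a single "-" joins the elements of a compound word.
a well-known result

% En-dash: two hyphens "--", here used for a numeric range.
pages 3--7

% Em-dash: three hyphens "---", setting off a digression.
The proof---as far as I can tell---is correct.
\end{document}
```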
As Charles notes in the comments, you should probably consult whatever style guide you use (or are required to use) for a more comprehensive and detailed treatment, especially for the tricky cases.
The answers and comments I have seen so far are incomplete and, in some regards, not entirely accurate. Rather than hiding all the information in comments, here is a summary. Note that I'm listing only the cases that occur most frequently and that seem to raise the most questions; I am not claiming that my answer is exhaustive. (For example, I'm omitting the various uses of dashes in dialogue or quoted speech because (1) they rarely come up in the (La)TeX world, being mostly relevant to copyeditors of fiction (who can be assumed to have learned the rules elsewhere), and (2) their usage is rare and thus not necessarily governed by hard-and-fast rules.)
The em-dash (`---`) has very few recommended use cases nowadays.
- The most generally accepted use case is before a name in a quotation that is attributed only to a person (that is, not cited bibliographically), say, at the beginning of a paper or chapter.
- In US typography, parenthetical phrases are traditionally set off with em-dashes without surrounding spaces. (Some sources recommend "hair spaces" around the em-dash for this practice, but other sources recommend against that, and I don't see this commonly done by newspapers and magazines in the US.) This practice is recommended against by some style guides these days (typographer Robert Bringhurst indirectly calls the em-dash "Victorian"), and I agree. While you could describe this as a question of visual taste, one definite problem with em-dashes without surrounding spaces is that tracking (aka letter-spacing aka "stretching" in some TeX lingo) doesn't play well with this: interword spacing will increase uniformly for words separated by spaces but will remain fixed for words separated by an em-dash without surrounding spaces. Of course that would not apply to em-dashes surrounded by spaces (the NYTimes style guide recommends this, for this reason), but keep in mind that this makes for really fat typographic separation.
(Note that usage of either em- or en-dashes for parenthetical purposes includes many cases where you visually see only one em/en-dash, simply because of the implicit orthographic rule that the beginning or end of a sentence "eats" one member of the "dash-dash parenthesis symbol pair".)
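For illustration, here is a rough sketch of the two em-dash uses in LaTeX. The epigraph text and name are placeholders, and using `\,` (a thin space) as a stand-in for a hair space is my own assumption, not something the style guides prescribe:

```latex
\documentclass{article}
\begin{document}
% Attribution by name only: the em-dash precedes the name.
\begin{quote}
  Text of the epigraph goes here.\par
  \raggedleft --- Author Name
\end{quote}

% US-style parenthetical em-dashes, set closed (no surrounding spaces).
The result---to our surprise---held in every case.

% The same sentence with thin spaces (\,) approximating hair spaces.
The result\,---\,to our surprise\,---\,held in every case.
\end{document}
```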
The en-dash (`--`) has two frequent usages:
- As a modern punctuation mark for parenthetical phrases. The en-dash in this usage is always surrounded by spaces. This also works well with tracking (see above). For this parenthetical usage, the em-dash is more common in US typography, while the en-dash is more common in UK typography.
- As a hyphen replacement, functioning as a semantic linking element in a compound word (most frequently a compound noun) that binds two elements together more loosely than an ordinary space within that same compound does. This is very important: when I have a compound such as "pre-World War II", I really want it to appear as `pre–World␣War␣II` (too bad Unicode en-dashes don't render correctly on this forum!), with `␣` standing in for a fixed (non-stretchable) space (this question is about how to produce such fixed-width spaces), because semantically the compound has the structure [pre [World War II]]: the "pre" modifies the entire expression "World War II", not just the word "World". That is: don't look at simple rules telling you about hyphenated adjectives, prefixes, etc. for nouns (these are too complex); instead pay attention to the semantic structure, because this is the only thing that ultimately counts. We want a separator that separates orthographic words (= clusters of contiguous letters) a tad more than an ordinary space while still indicating that things belong together. If people wrote "pre-World War II" directly (with a hyphen, which is normally (though not clearly on this forum) shorter than or equal in width to an ordinary space), the immediate visual parse would be [[pre World] War II], which is semantically incorrect. Note that, linguistically, the fact that English orthography (unlike German orthography) doesn't treat all linguistic words as orthographic words ("swimming pool" is one linguistic word but two English orthographic words) is what essentially necessitates this usage of an en-dash, which is entirely absent from German orthography. For example, in German we write things like "Konrad-Adenauer-Stiftung"; here ordinary hyphens represent all levels of inner-compound connections. Even though the connection between "Konrad" and "Adenauer" is tighter than the one between "Adenauer" and "Stiftung" (the semantic structure here is [[Konrad Adenauer]-Stiftung]), there is at least no illusion that "Adenauer" binds closer to "Stiftung" than to "Konrad" (which there would be in English if you used a space for the first gap and only a hyphen instead of an en-dash for the second gap). Note that the en-dash has this effect only because it is (and must be) visually slightly wider than an ordinary space.
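Both en-dash usages can be sketched in LaTeX roughly as follows. Wrapping "World War II" in `\mbox` is just one possible way to get non-stretching spaces (my assumption, not necessarily what the linked question recommends): the box is set at its natural width, so its spaces do not stretch during justification, and as a side effect no line break can occur inside it.

```latex
\documentclass{article}
\begin{document}
% Parenthetical en-dashes, always with a space on both sides.
The result -- to our surprise -- held in every case.

% En-dash as a looser link than the spaces inside the compound.
% \mbox keeps the spaces in "World War II" at their natural width.
the pre--\mbox{World War II} economy
\end{document}
```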
Q: Are there constraints against line-breaking around dashes?
- A: parenthetical dashes: I believe I have seen line breaks both before and after parenthetical dashes in the past, but when I recently checked a bunch of US magazines (such as the New Yorker, the Atlantic, and Harper's), I saw only line-final dashes, in every column. This came as a surprise to me; either there is variation or I was previously wrong about this. I don't see any reason against line-initial parenthetical dashes, but if they don't occur or occur only rarely, let's note that observation here. If other people have further insight on this, please contribute.
- A: en-dashes functioning as a hyphen-replacement: These of course cannot occur line-initially.
(Now you might ask: Why would anyone want to forbid line breaks on either or both sides of a dash? I can think of two possible reasons: (1) Perhaps someone would want to treat all dashes like hyphens, by visual analogy. (2) One could think of the two dashes in a parenthetical pair as an "opening dash" and a "closing dash" and, by analogy with other parenthesis types, forbid the former line-finally and the latter line-initially. But such a distinction is never made in practice.)
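If you do want to control breaks around dashes in (La)TeX, a sketch along these lines should work. It assumes classic TeX behavior, where a break is permitted after `-`, `--` and `---` but not directly before them unless a space precedes; behavior may differ with other engines or font setups.

```latex
\documentclass{article}
\begin{document}
% A tie (~) before each spaced en-dash: the dash can end a line
% but can never start one.
The result~-- to our surprise~-- held in every case.

% Hiding the dash in an \mbox removes the break point after it,
% so the range cannot be split across lines.
pp.~100\mbox{--}200
\end{document}
```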
Wikipedia claims en-dash usage for "relationships and connections", but Chicago does not recognize that; this "relationships and connections" usage is IMO newfangled and not generally accepted. But then, note as a disclaimer that I don't agree with all of Chicago, because many of its statements are prescriptivist recommendations, whereas I am trying to be descriptivist. (Prescriptivism can be good, but many recommendations I've read – here and elsewhere – are neither descriptively accurate nor backed up by argument.)
Hyphens:
- Wherever most people think hyphens are used. (I could attempt to make an exhaustive list, but I don't think it belongs here.)
Some thoughts and observations about page ranges:
- I've read in various places that an en-dash is prescribed for page ranges (e.g. `pp.~100--200`), but note that such usage is by no means universal even in the US. Subjectively speaking, I think it is used more in English than in German, though. With this in mind, German usage could be changing, and I also wouldn't be surprised if the literature in technical subjects is influenced by (La)TeX conventions. (The (La)TeX community tends to recommend an en-dash for page ranges.)
- While some sources recommend an en-dash for page ranges, many sources are silent on this matter. Because an ordinary person might not notice an en-dash (vs a hyphen) in page ranges, I think that de facto far more people use a hyphen (though this admittedly doesn't prove that it's better to do so).
- For Germany I can say for sure that an ordinary hyphen is predominantly used for page ranges (though this admittedly leaves open the question of what "ideal" typographic practice would be).
- While DIN 5008 supposedly prescribes an en-dash ("Halbgeviertstrich") for ranges, there is no mention of such usage in either the "Grammatik" volume of Duden or the official rules for the reformed orthography. Descriptively speaking, en-dashes between digits are overwhelmingly not used.
- I believe that DIN 1505-2 doesn't mention or require an en-dash between digits.
- The en-dash is not taught as the mark for juxtaposing words of equal category (and in English this is also definitely fringe usage), so it is not obvious that it should be used between digit groups either.
- Note that Unicode provides a so-called figure dash (U+2012), which is designed to be of the same width as a digit. According to some people, this is what everybody "should" be using, but I would like to remind everyone that usage ultimately depends on the information level and acceptance of the various target communities. While I don't know how widespread the figure dash is (I am fairly sure it is not widespread as of right now), I'd be curious to learn about this community's stance on this.
- The Chicago Manual of Style (16th ed.) does recommend an en-dash for ranges (6.78) [I disagree with this aesthetically], but its recommendations are consistent with the view that en-dashes (or figure dashes) do not belong between numbers in other cases: "A hyphen is used to separate numbers that are not inclusive, such as telephone numbers, social security numbers, and ISBNs." (6.77)
Note that Wikipedia offers some "prescriptivist poppycock" (that expression is taken from the Language Log blog, which I highly recommend): "The figure dash is used when a dash must be used within numbers (e.g. phone number 555‒0199). It does not indicate a range, for which the en dash is used [...]." It is actually quite unclear where these prescriptions come from. I can definitely say that such usage is not widespread, and I do not see an obvious reason for it: the hyphen doesn't have the same semantic status as a digit, neither in a digit string (where it acts as a separator) nor in a range (of page numbers, units in physics, ...). Semantically, the only intuitive rule is that the separator should be wider than the separation between digits (that is, wider than zero) and that it should ideally be narrower than other surrounding orthographic divisions, to make clear that the digit group is a close unit (on the phrase structure level). That is, it would be odd to use an em-dash in a page range (since the largest visual separator token in a sentence containing that page range would then be just that em-dash), but by the same reasoning an en-dash is also not appropriate. So a hyphen seems entirely fine, and I am somewhat surprised that the statement, oft heard in the (La)TeX community, that ranges necessitate an en-dash isn't supported by either solid references or a good linguistic argument that talks about tokens, orthography, and phrase structure. As for non-range digit groupings: dots and colons can also function as separating elements in a sequence of digits, but no one would ever think of wanting to set them in a monospaced font with equal-to-digit width.
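For comparison, the competing inputs look like this in a LaTeX source. This is only a sketch: the figure dash line is commented out because, as far as I know, it needs a Unicode engine (XeLaTeX or LuaLaTeX) and a font that actually contains U+2012, which is an assumption about your setup.

```latex
\documentclass{article}
\begin{document}
pp.~100--200   % en-dash, as recommended by Chicago (6.78)

pp.~100-200    % plain hyphen, common in practice

555-0199       % non-inclusive numbers: hyphen, per Chicago (6.77)

% Figure dash, Unicode engines only (assumes the font has the glyph):
% 555\symbol{"2012}0199
\end{document}
```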
Usage of the em dash, since that's what dmckee's explanation doesn't tackle: it's nicest to think of the em dash as a dramatic pause in a sentence, and it need not be used parenthetically. Chicago (6.87) calls it "the most commonly used and versatile of the dashes", and lists among its uses:
- Amplifying or explaining, e.g., Chicago's example: "She secured the strategy — a strategy that would, she hoped, secure the peace."
- Separating subject from pronoun: "TeX — that was the typesetting program they would use."
- Indicating sudden breaks: "Now we are finished — no, wait, what's this?"
- Used in place of other punctuation: this is the idea that it is a low-precedence substitute for hyphens, commas, colons or parentheses. It also has other nice properties, such as that it can be used like a comma together with an exclamation or question mark. Chicago's example: "Only if — heaven forbid! — you lose your passport should you call home."
- To introduce speech, as per Brent's answer.
Don't overuse it! Em dashes stand out more than any other piece of normal punctuation and too many on a page can make the text look absurd. Use it only to add force to your writing, or to solve tricky to parse punctuation problems — and those are often best solved by finding a less complex way of making your point.