Wikipedia talk:Manual of Style/Archive 207

Rhyme scheme patterns

A rhyme scheme is a pattern that appears in the lines of a poem, and generally letters are used to notate the pattern, for example "ABAB CDCD". Sometimes these sequences are very long, the longest I could find on Wikipedia being "abacabadabacabaeabacabadabacabafabacabadabacabaeabacabadabacaba". There is considerable variation in how this notation is capitalized and punctuated:

  • ABAB
  • "ABAB"
  • abab
  • "abab"
  • "A,B,A,B"
  • "AB AB"

In some cases, the main article says that the notation requires a specific capitalization. For example, "aBaBccDDeFFeGG" distinguishes masculine and feminine rhymes with lower vs. upper case. I think it would be nice, for human readability reasons, to settle on a consistent style for this notation. To me the sequences are nicely distinguished from sentence prose when they are either in quotes, all caps, or both. I'm open to whatever poetry-editing editors want to do, but for the sake of having a starting point for discussion, how about the below? I have asked for input from Wikipedia talk:WikiProject Poetry. -- Beland (talk) 22:00, 25 July 2018 (UTC)


Unless otherwise required by a specific notation, rhyme schemes should generally be written when appearing in prose:

  • In all capital letters
  • Enclosed in double quotation marks
  • Without italics
  • Using spaces to separate groups (not commas or other punctuation)

Example: This poem uses the "ABAB" rhyming pattern.


Why the quotation marks? What information do you think you are conveying by including this extra baggage in the notation? (Also, I suspect that when spaces are used it may often be because they are meaningful, e.g. to separate stanzas. And there are more styles currently in use than you list; e.g. a-b-a, b-c-b, c-d-c.) —David Eppstein (talk) 23:28, 25 July 2018 (UTC)
I'm not attached to the quote marks for all-caps sequences. If you parse the sequences as proper nouns, they make sense without them. If you parse them as a sequence of symbols not part of the sentence but being quoted from somewhere else, I would think about them like MOS:WORDSASWORDS, where quote marks are one of the options (and prettier than italics, which I don't see often used for this purpose in poetry articles). If we were doing all-lowercase, then the sequences would blend in with sentences much more easily and I think there'd be a stronger argument for quote marks. Yes, the spaces are meaningful, though without a lot of research I'm not sure I could catalog all the ways people use them, so I tried to be generic. Do you think "stanzas" is a better word than "groups"? I did some searches for the style with dashes and didn't find attestation for that, but that may merely be a limitation or my own misuse of the search engines. I'm open to that or other styles as well if people like them better, though my personal opinion is that dashes just bulk up the string without adding clarity. -- Beland (talk) 19:45, 26 July 2018 (UTC)
Pinging Phil wink who has put a lot of thinking and work into this issue. Also pointing out as of relevance: WP:POETRY#Scansion, Wikipedia talk:WikiProject Shakespeare/Archive 5#Scansion and meter in the sonnets. --Xover (talk) 07:47, 26 July 2018 (UTC)
The linked discussions are beautiful examples of knowledgeable editors working out article content for themselves in a specific and important topic area. Why does MOS need to overbear that? EEng 10:18, 26 July 2018 (UTC)
Well, I'm asking those knowledgeable editors if we can agree on a single style for a given notation, rather than having different styles on different articles due to smaller discussions coming up with different answers. I'm not sure what you mean by the MOS overbearing; if there's a preferred style for something, this is the place to document it (either by explaining in detail or pointing to WikiProject guidelines). The point of project-wide consistency is to make article content easier to digest (e.g. when reading a bunch of different articles about rhyming poems) and to make the project look polished, professional, and credible. -- Beland (talk) 19:45, 26 July 2018 (UTC)
  • I strongly suspect that there are already-published norms about this sort of thing, "out there". If there's consensus among editors who work on poetry material a lot that there's a preferred way to do this (and I'd bet that David Eppstein's suspicion that spacing can be semantically meaningful is correct), maybe we could add something about it in a poetry and lyrics section at WP:Manual of Style/Writing about fiction. This above stuff seems pretty half-baked at this point, though. And scansion and rhyme schemes are not the same thing. The two scansion threads pointed out do seem consistent with each other, probably because WP:POETRY#Scansion is basically a de facto guideline. So, it could be the start of MOS:FIC section on poetry. PS: I agree that adding quotation marks around such markup is pointless. — Preceding unsigned comment added by SMcCandlish (talk • contribs) 16:40:00, 26 July 2018 (UTC)
  • As always:
A. It is an axiom of mine that something belongs in MOS only if (as a necessary, but not sufficient test) either:
  • 1. There is a manifest a priori need for project-wide consistency (e.g. "professional look" issues such as consistent typography, layout, etc. -- things which, if inconsistent, would be noticeably annoying, or confusing, to many readers); OR
  • 2. Editor time has, and continues to be, spent litigating the same issue over and over on numerous articles, either
  • (a) with generally the same result (so we might as well just memorialize that result, and save all the future arguing), or
  • (b) with different results in different cases, but with reason to believe the differences are arbitrary, and not worth all the arguing -- a final decision on one arbitrary choice, though an intrusion on the general principle that decisions on each article should be made on the Talk page of that article, is worth making in light of the large amount of editor time saved.
B. There's a further reason that disputes on multiple articles should be a gating requirement for adding anything to MOS: without actual situations to discuss, the debate devolves into the "Well, suppose an article says this..."–type of hypothesizing -- no examples of which, quite possibly, will ever occur in the real life of real editing. An analogy: the US Supreme Court (like the highest courts of many nations) refuses to rule on an issue until multiple lower courts have ruled on that issue and been unable to agree. This not only reduces the highest court's workload, but helps ensure that the issue has been "thoroughly ventilated", from many points of view and in the context of a variety of fact situations, by the time the highest court takes it up. I think the same thinking should apply to any consideration of adding a provision to MOS.

I'd like to see, at the least, evidence for A1 or A2 before we even think about embarking on such a debate, because if MOS does not need to have a rule on something, then it needs to not have a rule on that thing. EEng 21:55, 26 July 2018 (UTC)

For A1, Rhyme scheme uses both lowercase-in-quotes and uppercase-no-quote styles in its prose, but mostly uses uppercase when explaining the different notations. I think this looks ugly and unprofessional because it is inconsistent, and the inconsistency continues when compared to other poetry pages, found by a moss scan:

-- Beland (talk) 02:23, 28 July 2018 (UTC)

I don't understand what that list is supposed to demonstrate. EEng 02:31, 28 July 2018 (UTC)
Yeah, is this supposed to indicate a consistent stylistic preference, or is this just a partial list, of a style Beland is objecting to?  — SMcCandlish ¢ 😼  00:27, 31 July 2018 (UTC)
@EEng: This was just to demonstrate that different articles use inconsistent conventions, since you didn't want to make a rule unless there was "evidence" for a need for consistency. Is this what you were asking for? -- Beland (talk)
What are you talking about? All your examples use a single consistent format. EEng 22:40, 6 August 2018 (UTC)
@EEng: Well, as listed above, all the examples are inconsistent with the article rhyme scheme, which is internally inconsistent. And actually it's more complicated than that; the scan downcased all the contents for de-duplication purposes. (Sorry, I forgot it was doing that.) So for example Ballad of Eric actually uses "ababC". If you only read the article, it is a bit unclear why the "C" is capitalized, but rhyme scheme notes sometimes that is done to indicate verbatim repetition of an entire line. But in other cases capitalization is used to indicate gender, so it might help to be more explicit. Some further irregularities: Nachtlied (Reger) uses italics. Ottava rima is internally inconsistent, using both "abababcc" and "a-b-a-b-a-b-c-c". -- Beland (talk) 07:32, 7 August 2018 (UTC)
Beyond EEng's A1/A2/B prerequisites, we're also missing any evidence that Beland has surveyed professional best practices in writing rhyme schemes and has come to the proposal above as a condensation of those best practices, or has brought any subject expertise at all to bear on the issue. The comment above suggesting an unfamiliarity with the word "stanza" is not promising. —David Eppstein (talk) 02:00, 31 July 2018 (UTC)
WT:WikiProject Poetry has been notified of the discussion (by Beland). I've added more notices, to WT:MOSFICT, and to the talk pages of the songs, music, and classical music wikiprojects, to try to attract more participants.  — SMcCandlish ¢ 😼  12:41, 1 August 2018 (UTC)
@David Eppstein: I have not researched professional practices beyond a quick search for what seems to be most popular on other web sites that discuss rhyme schemes, and I have no interest in doing any further research. In general, I have no interest in reading poetry or reading about poetry, so I'll take the grade "not promising" with pride. I just happened to notice that Wikipedia uses this notation inconsistently, and I need some guidance from folks who do care about this stuff so I can correctly program my database scanner and advise other editors how to fix this type of problem. I chose my proposed style based on what looks good to me, as a mostly arbitrary starting point for discussion. If following a professional standard is important to you, feel free to cite one you want us to use. -- Beland (talk) 22:34, 6 August 2018 (UTC)
You are proud of your ignorance of professional practice, and you want to use that ignorance as the basis for making policy decisions here? Do you think that the professionals might somehow have failed to address issues of how to notate this kind of information? Or that the best practices of experts are something to be avoided? That being uninformed makes your aesthetic judgements purer? That as someone not even interested in the subject of poetry, you are going to recognize and address the informational needs that such a notation should provide? That seems an amazingly backwards and anti-intellectual position for a Wikipedia editor to take. —David Eppstein (talk) 22:40, 6 August 2018 (UTC)
You are proud of your ignorance of professional practice, and you want to use that ignorance as the basis for making policy decisions here? – Possibly he's a Trump cabinet secretary editing Wikipedia in his spare time. EEng 02:52, 14 August 2018 (UTC)
@David Eppstein: I am not arguing that ignorance gives me any greater insight into what style should be used; if anything, it gives me less. I'm just saying that as a non-poetry person, I don't really care which style is chosen, as long as we are consistent. I'm providing a default option if no one else cares either. It sounds like you want the style chosen to reflect professional practice of some kind; if so, whose practices do you want us to follow? -- Beland (talk) 23:46, 6 August 2018 (UTC)
I poked through List of style guides; the MHRA Style Guide for example does not address this issue at all. I don't know if MLA Style Manual does, but I couldn't download it for free. If you do a web search for "abab cdcd" you'll see both abab cdcd and ABAB CDCD used in about equal measure. I don't think this comes up often enough to have a generally accepted style for a general audience. If there's a particular journal or something you care to follow, I'm open to suggestions. -- Beland (talk) 00:36, 7 August 2018 (UTC)
If you think that general-purpose style guides were what I meant by the best practices of experts, then all I can say is that you're so ignorant that you don't even know you're ignorant. What I meant (as I thought would have been obvious) were the writings of literary scholars when writing about rhyme schemes. You know, the sort of writing that we should be using as references in our articles about rhyme schemes? Maybe we could find something like this in the sort of textbook that would be used for undergraduate-level poetry classes? As a random and probably too-technical example (not intended to be in any way definitive) this paper on Glück writes "lower case letters stand for masculine clausulae and upper case letters for feminine clausulae. X and x indicate respectively blank feminine and blank masculine clausulae." If we're going to write about style guides for rhyme schemes at all (something I'm still not convinced we need), these sorts of intricacies are something we need to understand, so that we don't end up writing a lobotomized style guide that can only be used for lobotomized articles. —David Eppstein (talk) 01:09, 7 August 2018 (UTC)
@David Eppstein: Well, there's no need to be insulting; I checked general-purpose style guides because they were easy to find and I'm trying to put in some effort to help get this discussion to some sort of resolution. I assumed you were talking about academic journals, which is why I asked if there were any in particular you wanted to argue in favor of. I don't read poetry journals because I hate poetry. Yes, rhyme scheme clearly documents that some notations use a mix of uppercase and lowercase letters. I read that, and I mentioned it in the preface to my proposal, and that's why I included in the proposal "Unless otherwise required by a specific notation". Maybe it wasn't clear what I was getting at? I expect there are a lot of different academic notations (and actually that article lists quite a few) but what Wikipedia seems to use most of the time is the dead-simple notation we used in high school freshman English class, which can be freely styled either in all-uppercase or all-lowercase. If you look at reference 5 on Summum Bonum (poem), you'll see that it uses all-lowercase with no spaces and no punctuation. But if you search Google Books for ababababcd as found in Christis Kirk on the Green you'll see that some academic authors use spaces and some don't. In general, I think the way Wikipedia deals with cases like this where there are heterogeneous external conventions is simply to declare a house style and adapt all content it's quoting to that style. We even do this for direct quotes to some degree: Wikipedia:Manual_of_Style#Typographic_conformity. We don't use single quote marks for articles about British things even though British sources almost always do, because that doesn't fit the house style. 
Because there are several different variants of this notation, some of which use capitalization and spacing in a meaningful way, an entirely different way to handle this would be to require articles to follow the convention for the particular subnotation listed at rhyme scheme. That would require making that master article internally consistent; for example, right now it uses both "abab" and ABAB CDCD EFEF GHGH to describe traditional rhymes, which should be using the same subnotation. We could also just say something like "Unless differences between uppercase and lowercase letters are being used in a meaningful way, all letters should be uppercase" and "Unless spacing or punctuation is being used in a meaningful way (such as to separate stanzas), patterns should be written without spaces or dashes; spaces are preferred over dashes when indicating groups of lines." -- Beland (talk) 07:32, 7 August 2018 (UTC)
  • I need some guidance from folks who do care about this stuff so I can correctly program my database scanner and advise other editors how to fix this type of problem – How about if you just don't do that? That would save us all a huge amount of trouble. EEng 22:44, 6 August 2018 (UTC)
@EEng: Because that would leave an ugly inconsistency that makes articles harder to read. This should not be a lot of trouble; we just need to make an arbitrary choice between multiple available styles already in use. If this is taking a lot of work, maybe we're over-thinking it. -- Beland (talk) 23:46, 6 August 2018 (UTC)
We also need some way to resolve these cases so that we don't waste editor time examining the same ones over and over again because they keep getting detected as spelling errors. If we resolve them inconsistently, then it seems to me we're wasting the effort that was put into detecting them in the first place and manually resolving them. -- Beland (talk) 00:21, 7 August 2018 (UTC)
{{sic}} handles the spelling error thing. I'm still waiting for evidence there's any problem here MOS needs to solve. Several people here have expressed exasperation at your stirring so many pots. This project of yours to automate something or other -- is it going to beget an endless stream of these bids to regiment everything big and small? EEng 00:27, 7 August 2018 (UTC)
If we put {{sic}} on all of these before deciding on a style, then once we eventually do decide on a style, we'll have to go back and change half of them. That seems like a lot of wasted work in comparison to just deciding on the style sooner rather than later. If we happen to decide to capitalize them, we won't need {{sic}} at all, and we will also be able to ensure we haven't missed any. I'm sure we'll have more questions as we make progress fixing more known spelling problems. But it won't be endless; there shouldn't be any more style questions to decide than the staff of a print encyclopedia would have to decide, and the MoS and WikiProjects have already decided a very large number of questions. I don't know why having two outstanding style rule questions on this page, for a project with 5.6 million articles, would be unreasonable, especially as there are 10 other style question discussions also on this page. I don't expect all the same people to care enough about all possible questions, as different people are subject matter experts in different areas. As for "evidence", do you not consider the existing style inconsistencies I've pointed out as a problem that should be solved? -- Beland (talk) 00:53, 7 August 2018 (UTC)

Someone else is going to have to take over. This is hopeless. EEng 03:21, 7 August 2018 (UTC)

@David Eppstein: Did you have any thoughts on the alternatives raised in my comment from 07:32, 7 August 2018, above? -- Beland (talk) 17:52, 12 August 2018 (UTC)

You have made it clear that your proposals in this area are premature and ill-informed. My thoughts are that you should stop pushing this. Why don't you go try creating some actual content instead of approaching style issues for content that you have no experience editing and no input from the content creators on? —David Eppstein (talk) 18:00, 12 August 2018 (UTC)
OK, that felt like a personal attack and not discussion of the merits of the proposal or when a house style should or should not follow sources or any means by which the proposal could be matured. -- Beland (talk) 01:23, 13 August 2018 (UTC)
Just give it up, will you? None of us, including you, has the domain knowledge to even consider imposing some project-wide standard for this; and none of us, except you, thinks there's any need for it. You yourself notified the poetry project and there's been zero response. EEng 01:38, 13 August 2018 (UTC)
[edit conflict] Every time I or anyone else suggests that the whole idea of making this proposal might be premature and a bad idea, you either blow it off or make cosmetic changes and then ping everyone with a comment asking "how about now? do you like it now?". How clearly do I have to say this is a bad idea before you are convinced that I think it's a bad idea and that tweaking it won't make it anything other than still a bad idea? —David Eppstein (talk) 01:40, 13 August 2018 (UTC)
User:Beland, you may find WP:1AM to be helpful. --Guy Macon (talk) 02:49, 13 August 2018 (UTC)
@Beland:, I think a lot of times proposals get better results when the person making the proposal puts in the work to actually make it a good, well-researched proposal. Your strategy was to show up with a not-at-all-baked proposal and then try to get others to do the actual work of building it out. That's often going to be an uphill battle. The fact that other editors don't seem to be convinced that this is an issue worth putting in the MOS does not help at all. CapitalSasha ~ talk 03:03, 13 August 2018 (UTC)
I honestly don't think this sort of thing needs all that much work put into it, especially if no one has strong feelings about it. My main concern was running into a poetry editor community that had strong feelings and would revert unless we followed an already-favored style, or that had contradictory feelings we'd need to hash out, but that doesn't seem to exist. We don't need to specify an outcome for every conceivable circumstance; it's perfectly possible to say "here's what to do for the elementary school cases" but "use your judgment and consider sources for anything other than that". It doesn't seem very fair to me for folks also not part of a poetry-editing community to say "I think we need a much more complicated guideline that follows academic conventions" of which there are multiple, refuse to engage in discussion of which academic conventions we might want, and derail a much simpler guideline. If people don't want this to be part of the MoS, would they prefer it be handled at Wikipedia:WikiProject Poetry? Would they prefer I edited a hundred poetry articles, see if anyone objects, and then come back and summarize what I did? -- Beland (talk) 06:15, 13 August 2018 (UTC)

Well, I have a practical problem in front of me I'm trying to solve, which will require investing some work, regardless of the outcome of this discussion. David, you expressed a concern that any guideline Wikipedia puts out would need to follow academic sources. That's an actionable concern. First, it's a debatable question that has pros and cons that can be explored. We also started to look at potential sources to follow, which I think are inconsistent enough to say we can set our own style to some degree. We currently disagree about how to proceed; the only way I know how to come to consensus is to discuss arguments pro and con on their merits and compare opinions and try to find compromise. There are also alternatives to modifying the Manual of Style; for example, I could advise editors who are looking at these instances to check articles for sources and see if there is a particular style being used for a particular reason, and we could develop practices bottom-up rather than top-down. I was hoping to get at least local consensus before posting an RFC, but if neither of you wish to attempt to reach consensus among us, I can just go ahead and open that now and see what other editors have to say. -- Beland (talk) 03:09, 13 August 2018 (UTC)

Pardon me, what "practical problem" do you "have" to solve? EEng 06:00, 13 August 2018 (UTC)
The problem I'm trying to (not "have to") solve is classifying these sequences as correctly or incorrectly spelled words, correctly or incorrectly styled rhyme schemes, or something else, and taking appropriate action. I've managed to find a way to separate them out algorithmically, and it turns out there are less than 300. I will take a look at the articles individually myself. -- Beland (talk) 15:34, 15 August 2018 (UTC)
Sorry I misquoted you. Classify them as "something else" and move on. You are getting a lot of people pretty goddam angry with your OCD cluelessness. If you insist on trying to bring 300 articles into line with your idea of a required consistency, I recommend you wait until July 4 of next year since the fireworks will be entertaining. EEng 02:33, 16 August 2018 (UTC)
Leaving things as they are creates several problems, including being unable to reliably search for what Wikipedia knows about a given rhyme scheme and poems that use it (e.g. searching for a-b-a-b won't find ABAB or A,B,A,B), and confusion for readers about what the notation means (as I've noticed on talk pages and external forums discussing poetry and Wikipedia articles about rhyme schemes). I think it would be insensitive to anyone with the condition to compare myself to someone diagnosed with obsessive–compulsive disorder, which has really seriously ruined some people's lives. I hope editors will find more joy than anger in collaboratively producing a more polished, professional, useful encyclopedia. -- Beland (talk) 07:12, 17 August 2018 (UTC)
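The search problem described above (a-b-a-b not matching ABAB or A,B,A,B) can be illustrated with a minimal sketch. It assumes a purely hypothetical helper, `search_key`, that is not part of the moss scanner or of any Wikipedia search feature. It is deliberately lossy: it discards case and spacing, so it would only suit matching variants of the simple notation, not notations where capitalization or grouping carries meaning.

```python
import re

def search_key(scheme):
    """Collapse a rhyme-scheme string to a lowercase run of letters.

    Hypothetical helper for illustration only. It throws away case and
    separators, so it must not be used to rewrite article text.
    """
    # Drop everything that is not a letter (dashes, commas, spaces, quotes).
    letters = re.sub(r"[^A-Za-z]", "", scheme)
    return letters.lower()

# Styles mentioned in this discussion for the same simple pattern.
variants = ["a-b-a-b", "ABAB", "A,B,A,B", '"AB AB"', "abab"]
keys = {search_key(v) for v in variants}
print(keys)
```

All five spellings reduce to one key here, which is the sense in which they are "the same" for search purposes. A scheme such as "aBaBccDDeFFeGG" would lose its masculine/feminine distinction under this key, which is why the sketch is limited to lookup.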
FTR, I made rhyme scheme internally consistent based on the preferences expressed here so far, and added an explanation of the notation there so readers aren't confused by what it means. The explanation ended up being complicated by notations for internal rhyme and rhyming refrains (for which I imported superscript notation (Villanelle)). There may be further evolution as I slowly sync up articles that link to rhyme scheme and essentially use it as a legend, or if any of the editors of those articles express preferences. If a stable style emerges, I'll follow up with WikiProject Poetry. Feel free to ping me from Talk:Rhyme scheme or whatever if you're following along. -- Beland (talk) 07:12, 17 August 2018 (UTC)
Don't call us, we'll call you. EEng 23:03, 17 August 2018 (UTC)

HTML entities

Greetings all, I'm currently updating the style-checking code that reports to Wikipedia:Typo Team/moss, and I need some clarity on which HTML character entity references (things like &amp;) are allowed or preferred. Variations that are not allowed or which are disfavored would be brought to the attention of human editors, along with other suspected style and spelling errors. There are occasional mentions of such entities in the Manual of Style, but no general rules that I could find. I would propose the following:

HTML character entity references

(edited to reflect the below comments)

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, "&euro;" is the same as "&#8364;", "&#x20AC;", or including the character "€" directly. For a comprehensive list, see List of XML and HTML character entity references. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
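As a quick check of the equivalence described above, here is a minimal sketch using Python's standard `html.unescape`. The variable names are illustrative only, and this shows how a generic HTML decoder treats the three forms rather than anything specific to MediaWiki.

```python
import html

# Named, decimal, and hexadecimal references, plus the literal character.
forms = ["&euro;", "&#8364;", "&#x20AC;", "€"]

# Decode each form the way an HTML parser would.
decoded = [html.unescape(form) for form in forms]

# Every form resolves to the single code point U+20AC (EURO SIGN).
all_same = all(ch == "\u20ac" for ch in decoded)
print(decoded, all_same)
```

The decimal 8364 and hexadecimal 20AC are the same number, which is why the two numeric forms are interchangeable.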

  • In general, it is preferable to write characters directly instead of using an HTML entity reference. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system.
  • Numeric references should not be used when there is a named reference available. For example, &minus; should be used instead of &#8722;.
  • References must be used when the character itself cannot be used for technical reasons. For example, "]" cannot appear in wikilinks that use "[[" and "]]" to mark the start and end. The <nowiki> tag can also be used to prevent interpretation of special characters as wiki markup.
  • Named references are preferred when the characters themselves are easily confused. This includes:
    • Whitespace. The regular ASCII space " " should be typed directly, but entities should be used for others like "&nbsp;" and "&ensp;".
    • Dashes and similar characters. The regular ASCII hyphen-minus "-" should be entered directly, but other characters might be entered with entities. For example, &minus; is generally preferred because "−" looks very similar to "-" in some web browsers. See Wikipedia:Manual of Style § Dashes for more usage guidelines.
    • Prime (′) and related symbols that resemble quote marks
  • Other guidelines ask that the Unicode characters not be used at all (except when the character itself is being discussed):

Initial discussion

What do folks think? -- Beland (talk) 19:39, 14 July 2018 (UTC)

  • Another set of characters to avoid are the superscript-digits (at least when used with a mathematical meaning). See MOS:MATH#Superscripts and subscripts. —David Eppstein (talk) 19:46, 14 July 2018 (UTC)
  • I disagree that mdash isn't easily confused -- in some fonts it definitely is. I'd pretty much advocate that everything not on a standard English keyboard (whatever the "standard English keyboard" is) should be symbolically represented by either a & form or a template. And I'm a little worried that the typo team link at the start of the OP talks about flagging "violations of the Wikipedia:Manual of Style"; I fear this will slide all too easily into a project to blindly "fix violations". EEng 20:06, 14 July 2018 (UTC)
    @EEng: OK, I'll drop the emdash example. As for scope...well, this is already a project to fix violations of the Manual of Style and English spelling and grammar, though it's never done blindly. In some cases it would be safe to make a bot to make certain substitutions (like converting numerical to named references), but that would require approval by Wikipedia:Bot requests to make sure it didn't have any unwanted side effects. Not sure why that is something to be afraid of; if we think a certain form is better for editors, that seems useful. We don't do that for spelling mistakes because there could be a good reason to keep the misspelling. Could you explain a bit why you feel it's better for an editor to come across say, &trade; instead of ™ when opening an article for editing? -- Beland (talk) 21:00, 14 July 2018 (UTC)
    I'm fine with replacing numerical refs and &trade; and so on; in fact I welcome it because, as I mentioned, I generally think everything not on standard keyboards should be expressed symbolically in the wiki source. It's the vague statement at Wikipedia:Typo_Team/moss that you're gonna find "violations of the Wikipedia:Manual of Style" that worries me. I don't mind automatically identifying apparent "violations", but what worries me is that that might slide into automatic "fixes" – worried because MOS isn't rigid, it needs to be applied with common sense, exceptions apply, etc. EEng 21:24, 14 July 2018 (UTC)
    Re replacing characters with entities or the reverse: what I don't want to see is slow-motion edit wars where one group of editors or bots regularly replace characters by entities and a different group regularly replace entities by characters. That sort of thing just clutters watchlists for no good reason. So I'd rather either see a very clear specification of which things should be expanded and which should be left as unicode (probably difficult to attain consensus for) or (more likely) something like WP:RETAIN where edits of this type are discouraged. —David Eppstein (talk) 21:32, 14 July 2018 (UTC)
    Absolutely agree. A hard-won consensus in advance will consume 1/1000 the editor time and energy wasted on a zillion skirmishes and rage-reverts all over the project. And certainly some part of that consensus might be that some things come under RETAIN (though honestly the less RETAIN stuff we have the better). EEng 21:35, 14 July 2018 (UTC)
    An explicit list would be great for me, since I have to code that into software anyway. I'll whip up a table. FTR, as of April there were a grand total of 7 numerical references the moss software could find, and I changed all of them just now. -- Beland (talk) 01:15, 15 July 2018 (UTC)

The proposal should be revised to make it clear how it relates to the advice already in the MOS at WP:MOS#Keep markup simple,

An HTML character entity is sometimes better than the equivalent Unicode character, which may be difficult to identify in edit mode; for example, &Alpha; is explicit whereas Α (the upper-case form of Greek α) may be misidentified as the Latin A.

Also the proposal should indicate where this addition would go into the MOS; context matters.

The proposal contains the statement "The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards." That's only partially true; in the version I use, there are a variety of special characters to choose from, but when I hover over them, there isn't any little hint that pops up telling me what the name of the character is. So it is hard to be sure if a character is an n dash or a minus. In another case, it's hard to tell a prime from an apostrophe. I've learned to tell an n dash from a hyphen, but I'll bet there's lots of editors who can't. Jc3s5h (talk) 22:18, 14 July 2018 (UTC)
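
For editors who cannot tell such characters apart by eye, one workaround is to look up their Unicode names. The following is a minimal Python sketch, not part of the proposal; the helper name `describe` is hypothetical, and only the standard-library `unicodedata` module is assumed.

```python
import unicodedata

def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in chars]

# Hyphen-minus, en dash, em dash, minus sign, apostrophe, prime
sample = "-\u2013\u2014\u2212'\u2032"

for char, code, name in describe(sample):
    print(code, name)
```

Pasting a doubtful character into `describe` gives its code point and official name, which is enough to separate a prime from an apostrophe or an en dash from a minus sign.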

Hmm, thumbnails for special characters would make a great feature improvement for the web UI. I agree it's a bit of a pain; I always have to paste characters into a search engine to figure out what they are. If we're making a big table of what should be which, maybe it would need to be on its own subpage? I'm agnostic as to where this goes, and I'm open to suggestions; I don't think it matters as long as it's easy to find. -- Beland (talk) 01:15, 15 July 2018 (UTC)
FTR, I have filed a feature request for the popup text to include the character name at [1] for anyone who wants to comment or follow along at home. Thanks for the suggestion! -- Beland (talk) 06:57, 16 July 2018 (UTC)

Second draft

(Edited to reflect the below discussion)

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#x20AC;, &#8364;, or including the character directly. For a comprehensive list, see List of XML and HTML character entity references [2].
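
The equivalence of the three reference styles can be checked with a short Python sketch. It assumes only the standard-library `html` module; the variable names are illustrative.

```python
import html

# Named, hexadecimal, decimal, and direct forms of the euro sign
forms = ["&euro;", "&#x20AC;", "&#8364;", "\u20ac"]

# html.unescape resolves character references to the characters they denote
decoded = [html.unescape(form) for form in forms]

# All four forms should decode to the same single character, U+20AC
all_same = len(set(decoded)) == 1
print(all_same)
```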

In choosing between the numeric reference, named reference, and direct character methods, Wikipedia never uses the numeric reference when a named reference is available, and it usually prefers direct character input over named references (and edits in this direction are made by semi-automated systems like AutoWikiBrowser). For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup. These preferences are detailed in the table below, and some instances where a given character is preferably not used at all (except where that character is itself the topic of discussion) are noted. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
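
The "never numeric when a name exists" rule could be sketched as follows. This is an illustration under stated assumptions, not the code used by AutoWikiBrowser or the moss project: the function name `numeric_to_named` is hypothetical, and the standard-library table `html.entities.codepoint2name` covers only the HTML 4 names.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#8722;) and hexadecimal (&#x2212;) character references
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def numeric_to_named(wikitext):
    """Replace numeric references with named ones where an HTML 4 name exists."""
    def swap(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        # Leave the reference untouched when no named form is known
        return f"&{name};" if name else match.group(0)
    return NUMERIC_REF.sub(swap, wikitext)

print(numeric_to_named("5 &#8722; 3 costs &#x20AC;2 &#91;sic&#93;"))
```

References such as &#91; and &#93; have no HTML 4 name, so the sketch leaves them alone. It makes no attempt to skip <nowiki> blocks or other contexts where a substitution would be unwanted, so any real use would need human review.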

Category Preferred forms Exceptions and notes
ASCII characters ! " % & ' + < = > [ ] Sometimes proximity to other characters causes misinterpretation of &, <, >, [, ], or ' as part of HTML markup or wiki markup. In these cases, use &amp;, &lt;, &gt;, &#91;, &#93; or &apos;.
Latin and Germanic letters À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ Œ œ Š š Ÿ Instead of ligatures (Æ, æ, Œ, œ) write two separate letters, except in proper names and in text in languages in which they are standard – see Wikipedia:Manual of Style § Ligatures.
Greek letters Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϑ ϒ ϖ When written standalone (not part of a Greek word with other Greek characters), the following can be used to reduce confusion with similar-looking Latin alphabet letters: &Alpha; &Beta; &Epsilon; &Zeta; &Eta; &Iota; &Kappa; &Mu; &Nu; &Omicron; &Rho; &Tau; &Upsilon; &Chi; &kappa; &omicron; &rho;. μ (mu) and Σ (sigma) are nearly identical to µ (micro) and ∑ (sum), but the other characters are not used in Wikipedia so there is no potential for confusion.
Quote marks &lsquo; &rsquo; &sbquo; &ldquo; &rdquo; &bdquo; &acute; &prime; &Prime; ASCII quote marks are generally preferred. Wikipedia:Manual of Style/Dates and numbers § Specific units says not to use &prime; and &Prime; for inches and feet.
Dashes –/&ndash; —/&mdash; &horbar; &shy; &horbar; is not used by Wikipedia. For more info on &shy; (optional hyphen) see MOS:SHY.
Whitespace and non-printing &nbsp; &ensp; &emsp; &thinsp; &zwnj; &zwj; &lrm; &rlm; &ensp;, &emsp;, &zwnj;, and &zwj; are generally unnecessary. For more info on text direction, see MOS:RTL.
Math × ÷ √ ∝ ∝ ¬ ± ∂ ∇ ℵ ℜ ℑ ℘ ∀ ∃ ∈ ∉ ∋ ∅ ∏ ∑ ∠ &and; (∧ confused with ^) &or; (∨ confused with v) ∩ ∪ ∫ ∴ ∼ ≅ ≈ ≠ ≡ ≤ ≥ ⊂ ⊃ ⊄ ⊆ ⊇ ⊕ ⊗ ⊥ ⌈ ⌉ ⌊ ⌋ &lang; (⟨ confused with <) &rang; (⟩ confused with >) In some cases TeX markup is preferred to Unicode characters; see Wikipedia:Manual of Style/Mathematics § Typesetting of mathematical formulae. × (&times;) is used in article titles and also for hybrid species. ∑ (sum) should not be used; Wikipedia uses the nearly identical Σ (sigma).
Currency ¢ £ ¤ ¥ € $
Non-English punctuation ¿ ¡ « » &lsaquo; &rsaquo; &lsaquo; and &rsaquo; are not used by Wikipedia; < and > can be used instead.
Dots &middot; &bull; &sdot; "..." is preferred to "…" - see MOS:ELLIPSIS. Wiki markup should be used instead of these for lists; see Wikipedia:Manual of Style/Lists § List layout.
Diacritics ¨ ¸ ‾ ˜ ˆ
Arrows ← ↑ → ↓ ↔ ↵ ⇐ ⇑ ⇒ ⇓ ⇔
Other symbols ¦ § © ® ™ ° µ ¶ † ‡ ƒ ‰ ◊ ♠ ♣ ♥ ♦ µ (micro) is not used by Wikipedia; use μ (lowercase Greek letter mu) instead - see Wikipedia:Manual of Style/Dates and numbers § Specific units
Superscript and subscript ¹ ² ³ ª º Do not use Unicode subscripts and superscripts like these for numbers, per Wikipedia:Manual of Style/Superscripts and subscripts; use <sup> and <sub> instead.
Fractions ¼ ½ ¾ &frasl; These are not used unless discussing the characters themselves; for alternatives, see Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios


Above is a draft of a definitive list of whether the HTML reference or the character itself should be used, as suggested by other editors above. I noticed a few things:

  • Both the characters and the references are widely used for endash and emdash; allow both for now?
  • mu and micro are rarely if ever used in the same context; the direct form seems preferable? Same for sum and sigma?
  • ∼ (&sim;) and ~ (ASCII tilde) seem to be used interchangeably but &sim; itself is used very rarely.

-- Beland (talk) 08:12, 15 July 2018 (UTC)

  • usually prefers direct character input over named references – That's too sweeping. I can see this is gonna take a lot of discussion. For starters, pinging David Eppstein for his thoughts on literal or symbolic for math symbols (not meaning to imply there's one simple answer to that). Not pinging SM because he'll find his way here without doubt and his user name is too hard to get right and it's late and I'm tired. EEng 08:32, 15 July 2018 (UTC)
    • I think it's very important to spell out &minus; as otherwise it's too difficult to distinguish from &ndash;. Otherwise I don't feel strongly but I know I have seen legions of random AWB users replace &times; (e.g.) by its unicode character. So we should not encourage replacements that go the other way. —David Eppstein (talk) 16:30, 15 July 2018 (UTC)
    • @EEng: Well, if I'm counting right, out of the 252 named references, in 28 instances (11.1%), the proposal is recommending to use the reference over the character itself, and in 27 instances (10.7%) it's either not making a recommendation or different options are used in different circumstances. That leaves 78.2% of the time where the character itself is being recommended over the named reference. That seems to qualify as "usually"; am I missing something? -- Beland (talk) 21:55, 15 July 2018 (UTC)
      You're counting entries in the table; I'm counting occurrences in the wild i.e. I'd wager that the population of ndash + mdash in articles is greater than that of all those other characters put together, and those two should always be coded by name or template, IMHO. EEng 02:37, 16 July 2018 (UTC)
      @EEng: Ah, would it make more sense to say "for most characters prefers" rather than "usually prefers"? -- Beland (talk) 02:42, 16 July 2018 (UTC)
      At this point I don't know if anything needs to be said at all. I'm a bit unclear about something. Right now much or most of this advice, to the extent it's somewhere in MOS, is distributed among the various relevant sections. You're not proposing to insert this giant table somewhere, are you? Because then it will be in two places which will need to be kept in sync. EEng 03:36, 16 July 2018 (UTC)
  • WP:MOSNUM always uses the Greek letter mu or the html entity &mu; as the metric prefix for micro. I know some Unicode characters were created for obscure reasons such that Wikipedia has no interest in using those characters; I infer from its low numerical code value that &micro; (U+00B5, µ) exists as a way of coding the micro symbol that was used in some pre-Unicode character codes that didn't provide for most Greek letters, to permit round-tripping between those older character codes and Unicode. According to the Unicode Consortium, the Greek letter character is preferred.[1] Maybe use the Greek letter mu directly, whether in a Greek word, the archaic stand-alone symbol for micrometer, or the metric prefix, and explicitly encourage editors to replace µ (U+00B5) with μ (U+03BC). Jc3s5h (talk) 10:31, 15 July 2018 (UTC)
  • As a comment, is convenient in templates when you want a whitespace. --Izno (talk) 21:58, 15 July 2018 (UTC)
    • Ah, this points out to me that the regular space (which is U+0020) actually doesn't have a named reference, so it probably doesn't belong on this chart.
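
The µ-to-μ replacement suggested above can be illustrated with a small Python sketch. The function name `replace_micro_sign` is hypothetical and only the standard-library `unicodedata` module is assumed. The sketch also shows that Unicode compatibility normalization (NFKC) maps the micro sign to the Greek letter.

```python
import unicodedata

MICRO_SIGN = "\u00b5"   # legacy micro sign
GREEK_MU = "\u03bc"     # Greek small letter mu

def replace_micro_sign(text):
    """Replace every micro sign (U+00B5) with Greek small letter mu (U+03BC)."""
    return text.replace(MICRO_SIGN, GREEK_MU)

# The two characters look alike but carry different Unicode names
print(unicodedata.name(MICRO_SIGN))
print(unicodedata.name(GREEK_MU))

# NFKC normalization folds the micro sign into the Greek letter
print(unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)
```

A targeted `replace` is used rather than normalizing whole pages, because NFKC also changes many unrelated characters (superscripts and ligatures, for example).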

References

  1. ^ Beeton, Barbara; Freytag, Asmus; Sargent, Murray III (30 May 2017). "Unicode® Technical Report #25". Unicode Technical Reports. Unicode Consortium. p. 11. {{cite web}}: Cite has empty unknown parameter: |dead-url= (help)

EEng made a good find, that &dollar; was missing. It turns out that this is because List of XML and HTML character entity references only goes up to HTML 4, and HTML 5 has a ton more, listed here. Given the length of the resulting table if we include all of them, maybe we should just say "use the character itself except for those listed below" and list the ones where named references should be used? (And maybe continue to list the characters that should not be used at all?) -- Beland (talk) 03:53, 16 July 2018 (UTC)

I still don't understand why, to a first approximation, we're not saying that everything other than a-zA-Z0-9`~!@#$%^&*()-_=+[]{};':",./<>? should be given via &foo; or {some template}. Also, the table mixes advice on how to express various characters with advice on whether and when to use various characters. Not saying that's bad, just worth noting. EEng 04:12, 16 July 2018 (UTC)
I think accented Roman letters should certainly be written as e.g. á not &aacute;. More generally I am in favor of using unicodes over html entities or templates in most cases, with exceptions for characters like &amp; (when written next to something that would cause it to expand to a different entity) or &minus; (because there is too much possibility for confusion with other dash-like characters). Also, as an aside, the text above about avoiding ligatures is too strong; when these characters occur in the standard spelling of a name (e.g.), we should write them that way even when we are writing in English. —David Eppstein (talk) 04:25, 16 July 2018 (UTC)
Re accented Romans, I did say "to a first approximation". Re ligatures, the text says "except proper names" -- is that not enough? EEng 05:05, 16 July 2018 (UTC)
I did a quick database check, and as of April 2018, – is more popular than &ndash; by a ratio of about 10.6:1.
My thought on combining "how" and "whether" is that it's entirely likely the answer to the question "how do I put this character into Wikipedia?" is "please don't, use this other one", so having it all in one place is handy. -- Beland (talk) 05:28, 16 July 2018 (UTC)
The fact that literal ndash is 10X as common as symbolic just shows how much work we have to do -- in my edit window it's very hard to tell ndash from hyphen or mdash unless they're next to each other. I'm fine with combining both kinds of advice, though (again) I'm not sure exactly where this big table is gonna go. EEng 06:38, 16 July 2018 (UTC)
Well, you were using your guess that the numbers were the other way around as an argument for a wording change. The current preponderance might be evidence that most editors prefer the raw characters, or maybe it's just what people do because the UI is designed to encourage that. The fact that the UI is the way that it is may be an indication that there is not great support for using &ndash; and friends. I can generally tell the difference between dashes of different lengths, though if some people can't, that may be an indication that it just doesn't matter that much. In any case, given the lack of consensus on this, the current proposal is to remain neutral on the choice for ndash and mdash, and let editors decide on a page-by-page basis. In contrast, for other characters like ∀ and °, which can be clearly distinguished by everyone, I haven't heard a good argument for why those shouldn't just be used directly. -- Beland (talk) 23:26, 17 July 2018 (UTC)
  • Well, you were using your guess that the numbers were the other way around – No, you're mixing up two different things. I conjectured that ndashes and mdashes, together, make up the bulk (counting each use separately) of all these not-on-the-keyboard characters; that was without regard to how those characters were expressed (literal vs. symbolic).
  • that the UI is the way that it is may be an indication that there is not great support for using &ndash; and friends – WP's facilities and interfaces are full of debris that's little used or even "impossible" to use (e.g. template parameters that want to present information that an RfC has determined should never be presented). Trying to infer how things are spozed to be based on things you see in the UI will get you way off track very, very fast.
  • I can generally tell the difference between dashes of different lengths – So can I, easily in the rendered page, but in the wikisource only with a bit of effort, if I make a point of looking. It's that last bit that's the rub: in the rendered page an ndash vs. mdash look like – vs. —, but in the wikisource they're much more similar i.e. – vs. —. (What you see in that sentence may depend on your skin, so your mileage may vary.) Thus it's easy in copyediting to not notice that the wrong one is present, and that's why symbolic names should be used instead. (If we really cared we'd suggest that hyphens be rendered as &hyp; as well. I actually tried that once in an article but got laughed off the stage, so we'll just have to live with using the literal -. What I usually do is when I see e.g. a date range like 1899-1920, I just change the literal hypheny-dashy thing that's there to &ndash;, so that I know it's the right thing.)
  • I haven't heard a good argument for why those shouldn't just be used directly – Clearly a quotation in a language using a non-Roman script should just present that text literally. For everything else, there are a lot of pros and cons relating to how many different special symbols are used (in a given article), the extent to which each one is used repeatedly, how potentially confuse-able they are for one another or for something else not even used on the page, the likely sophistication of editors who might work on the article, and a lot more. Here's a random example: WP:MOSNUM says arcminutes should be denoted by a prime and not an apostrophe or a single quote i.e. ′ but not ‘ or ' . Once again, you have to be looking to notice if the wrong one is there; thus MOSNUM suggests that the markup &prime; be used to save editors squinting. Unfortunately different considerations come into play for different symbols, so separate analyses are needed in each case. That's why I predicted this discussion would take a long time.
EEng 03:32, 18 July 2018 (UTC)

As for the general direction of the advice, using characters directly seems to be the recommended best practice for web development generally. It's more WYSIWYG and easier for web editors to read and think about. It also fits the goal of not forcing editors to learn HTML in order to be able to use Wikipedia; they can just input and edit these characters in the same way they do elsewhere like Word or phone apps or other web sites. We also have a UI right below the text-being-edited box which encourages people to add the characters directly; it would be weird if the advice is to generally use the references because that's not what the system is designed to encourage. The escaping system was originally designed to allow input of special characters that were part of SGML or HTML itself (like angle brackets). Later it became a way to work around the limitations of ASCII. But modern web sites all use Unicode now, as does Wikipedia, so it's a bit of an obsolete workaround. I think any system where you have to learn a special language for telling a computer something is less user-friendly than a system where you can express your intention in the way you would express it to other humans. -- Beland (talk) 06:29, 16 July 2018 (UTC)

I think we should treat it like citations: citations are hard, both inside Wikipedia and outside. Just see what happens in any university freshman humanities class where citation expectations are rigorously enforced for the first time in most students' lives. So at Wikipedia we're satisfied if the first editor gives some way to find the source; gnomes can improve the citation format later. And the tools to do the improvement exist.
Similarly, editors who are not skilled with markup can do the best they can with the visual editor and other editors can improve it. The editors who make the improvements need the tools to do so, and bots must not overrule their contributions by converting html entities to characters.
The idea that you can write documents and web pages with purely WYSIWYG tools is only true if you're writing something simple, or you're a slob. That's why Microsoft Word has a little paragraph symbol so you can turn on the display of paragraph marks. That's why WordPress has two editing tabs, WYSIWYG view, and HTML view. The Wikipedia editors are quite primitive, hence the need for HTML entities continues. Jc3s5h (talk) 10:54, 16 July 2018 (UTC)
I agree contributions of new editors should be welcomed whether or not they follow this sort of guideline; I added language to that effect in the draft. -- Beland (talk) 23:26, 17 July 2018 (UTC)

General comment This discussion may affect WP:CHECKWIKI error 11. The error is currently deactivated. -- 11:10, 16 July 2018 (UTC)

  • A couple of quick responses:
    1. Wrap the table's characters-as-such, not just the HTML character entities, with <code>...</code> or perhaps with {{kbd}}, whatever looks better (semantically, it can be either – it's code when viewed in the wikitext but also input when you're entering it). If we don't like any of the faint-background effects, use bare <kbd>...</kbd>, which just uses monospace. I would go with <code> because the table already uses a light grey and it blends in well, while also not requiring any template calls.
    2. That for which we're providing entity codes should also be shown as characters.
    3. That for which we're showing characters but recommending/allowing entity codes should also be shown as those codes.
    4. "ASCII characters": Present the characters in the same order as the codes in the later column.
    5. "Greek letters": Change "but the other characters are not used" to "but these latter two characters are not used".
    6. "Dashes": This is a misuse of the slash character and results in confusing typographical gibberish: "–/&ndash; —/&mdash;". Try: "– (&ndash;), — (&mdash;),". Also, "For more info on ­ (optional hyphen) see MOS:SHY" is a misuse of parentheses (round brackets), seemingly for some kind of emphasis. Should just remove them.
    7. "Whitespace and non-printing": should also include &hairsp;; like &thinsp; it is generally only used for kerning in templates and such; there is usually not any reason to manually insert either into an article.
    8. "&lsaquo; and &rsaquo; are not used by Wikipedia; < and > can be used instead" is wrong; they are not the same character and should not be confused. If we need to illustrate French quotation style, etc., use the correct characters, not less-than and greater-than, which serve an entirely different purpose. This is pretty much exactly like hyphen vs. dash vs. minus.
 — SMcCandlish ¢ 😼  07:28, 17 July 2018 (UTC)
The weird "shy" line was due to a typo preventing &shy; from showing up at all. I fixed that. You're right about lsaquo; I must have messed up something when scanning the database for it. I'll change that and other points you mention in the next draft, as applicable. Thanks for reading! -- Beland (talk) 00:37, 18 July 2018 (UTC)

Third draft

Posted to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references

Proposed as new subsection titled "HTML character entity references" under Wikipedia:Manual of Style § Miscellaneous, replacing the second paragraph of "Keep markup simple".

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#x20AC;, &#8364;, or including the character directly.

On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Edits favoring these conventions are made by semi-automated systems like AutoWikiBrowser. For a comprehensive list of available named references, see [3].

Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.

Characters to avoid |
Avoid Instead use Note
… (&hellip;) ... (i.e. 3 periods) See MOS:ELLIPSIS.
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ Latin letters equivalent (I II i ii) MOS:ROMANNUM
Unicode fractions like ¼ ½ ¾ &frasl; {{frac}}, {{sfrac}} See MOS:FRAC.
Unicode subscripts and superscripts like ¹ <sup></sup> <sub></sub> See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate.
µ (&micro;) μ (&mu;) See MOS:NUM#Specific units
Ligatures like Æ æ Œ œ Separate letters (AE ae OE oe) Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES.
∑ (&sum;) ∏ (&#8719;) ― (&horbar;) Σ (&Sigma;) Π (&Pi;) — (&mdash;) (Not to be confused with \sum and \prod, which are used within <math> blocks.)
‘ (&lsquo;) ’ (&rsquo;) ‚ (&sbquo;) “ (&ldquo;) ” (&rdquo;) „ (&bdquo;) ´ (&acute;) ′ (&prime;) ″ (&Prime;) ` (&#96;) Straight quotes (" and ') Use {{coord}}, {{prime}} and {{pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS.
‹ (&lsaquo;) › (&rsaquo;) « (&laquo;) » (&raquo;) Use &lang; and &rang; for math notation. In non-English quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT.
&ensp; &emsp; &thinsp; &hairsp; Normal space These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP.
In vertical lists

• (&bull;) · (&middot;) ⋅ (&sdot;)

* Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics.
&zwj; &zwnj; see note Used in certain non-English language words, see zero-width joiner/zero-width non-joiner. Should be avoided elsewhere.
£ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) MOS:CURRENCY; find broken instances
Potentially confusing or technically problematic characters |
Category coded form (direct form) Notes
Miscellany &amp; (&) &lt; (<) &gt; (>) &#91; ([) &#93; (]) &apos; (') &#124; (|) Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{'}} and {{!}} or {{pipe}}. See also character-substitution templates and WP:ENCODE.
Greek letters &Alpha; (Α) &Beta; (Β) &Epsilon; (Ε) &Zeta; (Ζ) &Eta; (Η) &Iota; (Ι) &Kappa; (Κ) &Mu; (Μ) &Nu; (Ν) &Omicron; (Ο) &Rho; (Ρ) &Tau; (Τ) &Upsilon; (Υ) &Chi; (Χ) &kappa; (κ) &omicron; (ο) &rho; (ρ) In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters.
Quotes &lsquo; (‘) &rsquo; (’) &sbquo; (‚) &ldquo; (“) &rdquo; (”) &bdquo; („) &acute; (´) &prime; (′) &Prime; (″) &#96; (`) Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters.
Apostrophe-like ' ` ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ
Dashes, minuses, hyphens &ndash; (–) &mdash; (—) &minus; (−) - (hyphen) &shy; (soft hyphen) Can be confused with one another. For dashes and minuses, both forms are used (as well as {{endash}} and {{emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines.
Whitespace &nbsp; &emsp; &ensp; &thinsp; &hairsp; &zwj; &zwnj; In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP.
Non-printing &lrm; &rlm; In direct form these are nearly impossible to identify. See MOS:RTL.
Mathematics-related &and; (∧) &or; (∨) &lang; (⟨) &rang; (⟩) Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{angbr}} instead of ) / ()
Dots &sdot; (⋅) &middot; (·) &bull; (•) Can be confused with one another. Interpuncts (&middot;) are common in horizontal lists and to indicate syllables in words. Multiplication dots (&sdot;) are used for math. In practice, the dots are used directly instead of the HTML entities.
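
As an illustration of the "Greek letters" row, the rule "coded form in isolation, direct form inside Greek text" might be sketched in Python as below. This is only a sketch under assumptions: the function name `encode_isolated` is hypothetical, "in isolation" is approximated as "not adjacent to another character in the Greek or Greek Extended blocks", and the entity names come from the standard-library `html.entities` tables.

```python
import re
from html.entities import name2codepoint

# Entity names for the look-alike letters listed in the table row
LOOKALIKE_NAMES = [
    "Alpha", "Beta", "Epsilon", "Zeta", "Eta", "Iota", "Kappa", "Mu", "Nu",
    "Omicron", "Rho", "Tau", "Upsilon", "Chi", "kappa", "omicron", "rho",
]
NAME_BY_CHAR = {chr(name2codepoint[name]): name for name in LOOKALIKE_NAMES}

GREEK = "\u0370-\u03ff\u1f00-\u1fff"  # Greek and Greek Extended blocks
ISOLATED = re.compile(
    f"(?<![{GREEK}])([{''.join(NAME_BY_CHAR)}])(?![{GREEK}])"
)

def encode_isolated(text):
    """Replace stand-alone look-alike Greek letters with named references."""
    return ISOLATED.sub(lambda m: f"&{NAME_BY_CHAR[m.group(1)]};", text)

print(encode_isolated("the capital letter \u0391 is not Latin A"))
```

A letter inside a Greek word has a Greek neighbour, so the lookbehind and lookahead leave it as a direct character. Whether this heuristic matches what editors mean by "in isolation" in every case is an open question.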

Discussion of third draft

FTR, as of the July 1, 2018 database dump, &lsqb; is used about 329 times and &lbracket; is used about 91 times, so I picked the more common one. -- Beland (talk) 15:04, 18 July 2018 (UTC)

  • While I still have my reservations about where this is going and the amount of effort it will take to iron all the bugs out, I'm warming up to this. EEng 15:35, 18 July 2018 (UTC)
  • The table asserts the &Prime; html entity resembles the ASCII backtick (`), and even has something displayed that looks like a backtick. But this is the real result of the &Prime; html entity: ″. The table is just a mass of stuff and I wouldn't be able to find anything in there to make corrections. Jc3s5h (talk) 16:46, 18 July 2018 (UTC)
    • @Jc3s5h: Sorry, the backtick was missing from the second table; I just fixed that. It was rather exhausting to catalog everything and try to format it properly, so I didn't get a chance to double-check things. You're right about it being hard to read, so I also put each character in the second table on its own line, to make matching up characters and references easier. Is that clear enough now? Is it making the table too long? -- Beland (talk) 23:21, 18 July 2018 (UTC)
      • In the table, as rendered, &Prime; appears twice. Each time the character next to it is `, which is U+0060 and is named GRAVE ACCENT. But this is wrong; it should look like a double prime and is U+2033. It is used to mark seconds of time or seconds of arc; a backtick is completely wrong for that. Jc3s5h (talk) 00:04, 19 July 2018 (UTC)
Mostly looking good. I would put this at the bottom of MOS:TEXT, probably. Maybe in a section called "Unicode characters". We could see about cross-referencing it in various places.  — SMcCandlish ¢ 😼  02:07, 19 July 2018 (UTC)
Gave the boxes a spinshine/reorganization. Headbomb {t · c · p · b} 18:29, 19 July 2018 (UTC)

I posted this to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references (there's another section there that talks about Unicode PUA and RTL characters) and cross-referenced from Wikipedia:Manual of Style § Miscellaneous. Feel free to edit the live version as needed. -- Beland (talk) 05:56, 20 July 2018 (UTC)

And thanks to everyone for greatly improving this section from the initial draft! It will be a great help to me in writing the code that will flag less-than-clear usage. -- Beland (talk) 05:57, 20 July 2018 (UTC)

Might be worth adding a comment in the Greek notes that the same sort of thing applies to Cyrillic letters that look like Latin and Greek ones; use the entity codes for clarity when discussing particular characters, but use the Unicode in actual Russian, Ukrainian, etc. words. We probably needn't dwell on the details, since there's another proposal open for centralizing all the scattered Cyrillic-related material to one page. Then again, that's mostly to be about transliteration, so maybe the Greek section in the table should be Greek and Cyrillic?  — SMcCandlish ¢ 😼  04:11, 22 July 2018 (UTC)

Instances of character references for Cyrillic letters seem to be relatively rare. I don't see any on a casual skim through this report, though I'd have to go through the entire alphabet to definitively say they are never used. Unlike Greek letters, they aren't in common use for scientific and mathematical purposes. I think it would be simpler and probably more user-friendly just to say to use the Cyrillic characters directly, which is what the draft is currently proposing. -- Beland (talk) 07:58, 22 July 2018 (UTC)
Works for me.  — SMcCandlish ¢ 😼  15:47, 26 July 2018 (UTC)

Reversion of addition of third draft

So after I posted the tables proposed above, David Eppstein reverted, with the edit summary "what part of "I think you should be more patient"..."Try proposing something narrower and more specific" do you not understand?".

I think I did not see those remarks by David Eppstein and SMcCandlish because they were posted in the discussion ("Fraction slash" below) about the "Slashes" section of the main MOS page, which I did not check for comments before updating the "Text formatting" MOS subpage. SMcCandlish wanted a one-word change to the "Slashes" section, which he implemented. I think David Eppstein was commenting on the change he reverted, as he then wrote:

I'm not convinced that the html section is needed at all. It is more material for a guidebook on html than style guidance for Wikipedia editors. And you appear to have the purpose of using the new section as a bludgeon to begin a massive project of automatically reformatting characters in Wikipedia, which I think is a bad idea (watchlist clutter for no visible change to articles).

"Bludgeon" sounds pretty ugly and mean. I started a project to spell-check all Wikipedia, which is intended to improve its readability and credibility. Along the way I noticed that editors have also occasionally misspelled HTML character entity references. I thought as long as we're cleaning up the misspellings, we might as well clean up any undesirable forms, because right now we don't seem to be representing them consistently. I started this discussion because I couldn't find any guidance in the Manual of Style to help me write the code to correctly flag undesirable forms vs. ignore desirable forms.
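
As a rough illustration of the kind of check described here, the following Python sketch flags named character references that are not defined HTML5 entities. It is only a sketch under stated assumptions: the function name `unknown_entity_names` and the regular expression are hypothetical, numeric references are ignored, and the actual spell-checking code may work quite differently.

```python
import re
from html.entities import html5  # standard-library table of HTML5 named entities

# Matches named references such as &atilde; (numeric forms like &#227; are not matched).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def unknown_entity_names(wikitext):
    """Return the names of named references that are not defined HTML5 entities.

    Hypothetical helper for illustration only.
    """
    return [
        match.group(1)
        for match in NAMED_REF.finditer(wikitext)
        if match.group(1) + ";" not in html5
    ]

# "&atlide;" and "&mdahs;" are deliberate misspellings used as sample input.
print(unknown_entity_names("S&atilde;o Paulo &atlide; 5&nbsp;km &mdahs;"))
```

A check like this only finds names that do not exist at all; deciding which valid references are undesirable would still need the style guidance under discussion.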

Mediawiki markup uses this part of HTML syntax, and if we have a preferred form for these things we'd want to communicate that to editors, and the Manual of Style is the place to document choices of style rather than technical how-to for the benefit of editors, so I don't understand the criticism that this is not the right place for this sort of guideline. Especially since Wikipedia:Manual of Style#Keep markup simple already discusses exactly this point, and the other sections linked from the proposed tables also address which characters are preferred.

We already encourage editors to make edits that have no reader-visible changes but do have editor-visible changes intended to make wikitext easier to read and thus articles easier to edit. That's the whole point of Wikipedia:WikiProject Wikify and wikification. I do agree there are some edits that don't improve readability all that much that aren't that worthwhile on their own, like changing "==xx==" to "== xx ==". This seems less trivial than that. I'd also note we have Wikipedia:HTML5, a project which is doing nothing but replacing obsolete HTML tags with newer ones, with hopefully no user-visible changes.

There are less than 20,000 articles that even have HTML character entity references at all, less than 3.5% of all articles. Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal, and in reality it will probably take months or years to manually change all the instances, if that's what we want to do. At worst, editors who notice these changes happening will be educated about the desired way of doing things, and be more likely to input characters that way when adding new text.

Given that editors seem to use characters a lot more than references, and given that characters are built into the Wikipedia UI, it seems a lot less disruptive to move toward characters than away from them.

To illustrate the difference it makes to editors, consider an editor who comes across "S&atilde;o Paulo" in wikitext. To most people who are not web developers, that looks like a typographical error. Some English-speaking people might correct it to "Sao Paulo" which is often seen in English, or, getting the idea there might be an accent there, to "Sáo Paulo", which is incorrect. "São Paulo" is what Portuguese speakers are expecting to see - it's what they type with their keyboards, and it's what appears in Word docs and on the Portuguese Wikipedia and on Google Translate, and in the readable parts of other web sites. With "São Paulo", everyone knows exactly what's going on, and there's no need to waste time doing a search on the meaning of "atilde" or "&atilde" or whatnot.
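
A minimal Python sketch of the conversion being described, from the entity form to the literal character, might look like the following. The function name `to_literal` and the `KEEP_CODED` set are hypothetical; the set stands in for whatever exceptions the guideline ends up listing (for example, invisible characters such as the non-breaking space).

```python
import html
import re

# Hypothetical exception list: references assumed to stay in coded form.
KEEP_CODED = {"nbsp", "amp", "lt", "gt"}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_literal(wikitext):
    """Replace named character references with literal characters,
    leaving the references listed in KEEP_CODED untouched."""
    def replace(match):
        if match.group(1) in KEEP_CODED:
            return match.group(0)
        # html.unescape leaves unknown names unchanged.
        return html.unescape(match.group(0))
    return NAMED_REF.sub(replace, wikitext)

print(to_literal("S&atilde;o Paulo is 5&nbsp;km away"))
```

With these assumptions the accented letter becomes a literal character while the non-breaking space stays visible in the wikitext.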

If I were making the rules, I think I'd keep it simple and say to use characters directly except for otherwise invisible characters and those that cause technical problems when used directly. I'd actually be fine if we used ASCII hyphens for all of our dashes, but I'm not complaining if people who can see the difference on their monitors want to upgrade some of them to emdashes to make things look pretty as in the golden years of paper typography. That would make a much smaller table than the one proposed above, but given that other editors seem to feel more strongly about making it easy to tell the difference between certain lookalike characters, I think that table now represents a pretty good compromise. Leaving dashes and quotes as they are takes the biggest chunks of potential work off the table, anyway.

Given that this is proposing a simple general rule and then listing all the desirable exceptions to it, I'm not sure that a narrower proposal would make sense. The volume of comments has been relatively small, so having multiple discussions about the same topic it seems would just burn more editor time. I am, however, open to actionable suggestions. -- Beland (talk) 08:03, 22 July 2018 (UTC)

@David Eppstein: Did you have any thoughts in response? -- Beland (talk) 18:46, 23 July 2018 (UTC)
I don't think we should be setting up automatic processes that make neither a visible change to article content nor a semantic difference to the markup of the articles. And I don't think we should be prescribing such things in the MoS and by doing so encouraging such processes. —David Eppstein (talk) 18:54, 23 July 2018 (UTC)
@David Eppstein: OK, would you be happy if the guideline said that all such changes be made manually? -- Beland (talk) 20:27, 23 July 2018 (UTC)
Still not strong enough. I would prefer that such changes be made only as part of other substantive changes to articles (more or less what usually happens now with AWB users; see WP:AWBRULES #4). —David Eppstein (talk) 20:35, 23 July 2018 (UTC)
OK, I think that will lead to undesirable forms lingering around for a long time for no particularly good reason. -- Beland (talk) 20:56, 23 July 2018 (UTC)
(And I think leaving those forms around would generate higher cognitive load and more work for editors than the messages generated by removing them.) -- Beland (talk) 21:01, 23 July 2018 (UTC)
(ec with D.E.) Way TLDR. I warned you that this would take a LOT of work and patience before it would be ready to become part of MOS. Your table, without question, inadvertently trods on a lot of toes in the form of established ways various groups of editors do things in various topic areas. It would be wonderful to systematize and summarize and centralize all this but, like I said, it's gonna be a lot of work. And it's one thing to come up with a guide for future editing; it's quite a different one to use it for some mass-change project. To be blunt, if you think that Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal then there are some things you really don't understand; if you made changes like this to 3% of articles in one day, or one week, or even one month, you'd be strung up by your URLs.

I haven't been following that last week of discussion so I don't know where we are and what the open issues are, but if you want this to see the light of day you need to be prepared to keep plugging for quite some time to work through all the details with all interested parties (not that I even know how to find them). I've gone through an effort like this myself elsewhere in MOS and it can be an exhausting task, though you will be quite rightly congratulated by all in the end if you can pull it off, because it will be a very useful achievement for the project. EEng 19:05, 23 July 2018 (UTC)

What does "ec with D.E." mean? If you think I should consult more people, but don't know how to go about doing that, that's not really an actionable suggestion. -- Beland (talk) 20:25, 23 July 2018 (UTC)
It means "edit conflict"; EEng and I wrote our comments in parallel. —David Eppstein (talk) 20:36, 23 July 2018 (UTC)
@EEng: As far as I know, the only open issue is whether these improvements would justify their own systematic edits. To a large degree, this is just codifying current practice so we can clean up stragglers, so I don't expect very many objections. -- Beland (talk) 07:11, 24 July 2018 (UTC)
This will need much wider exposure before you can have that kind of confidence. EEng 07:17, 24 July 2018 (UTC)
@EEng: I was only referring to issues that had been raised by editors who have already heard of the proposal. But how would you like to see me go about getting wider exposure? -- Beland (talk) 21:27, 25 July 2018 (UTC)

How do other editors feel about David Eppstein's proposal for a rule that "such changes be made only as part of other substantive changes to articles"? Personally, I don't see the need for that, given the arguments I made above, but of course I'll implement whatever the consensus is. -- Beland (talk) 20:46, 23 July 2018 (UTC)

This is a whole lot of stuff being discussed at once. I'll cover it in the order in which I'm seeing it come up above:

  1. My "Try proposing something narrower and more specific" (and David Eppstein's "I think you should be more patient", from what I can tell) were from the discussion below, on fraction-slash, and have nothing to do with the discussion above about having a handy quick-reference table on characters and their entities and what to do with them on WP. (Well, my comment didn't; I can't read David's mind.) That should be restored, toward the bottom of Wikipedia:Manual of Style/Text formatting I would think.
  2. "I'm not convinced that the html section is needed at all" no longer seems to have a referent. The table version 3 has no such sectioning.
  3. This point by Beland is correct: "Mediawiki markup uses this part of HTML syntax, and if we have a preferred form for these things we'd want to communicate that to editors, and the Manual of Style is the place to document [it]".
  4. Beland's entire "We already encourage editors to make edits that have no reader-visible changes but do have editor-visible changes ..." paragraph and the two that follow it are correct.
  5. David says: "I don't think we should be setting up automatic processes that make neither a visible change to article content nor a semantic difference to the markup of the articles." I can't find anywhere that this has been suggested, and it would already be governed by WP:COSMETICBOT. Beland seems to want to use this for AWB/GENFIXES purposes, but that's not automated. It's semi-automated, and entirely permissible when done in the course of more substantive edits.
  6. Consequently, "I don't think we should be prescribing such things in the MoS and by doing so encouraging such processes" doesn't really track. A) We do in fact have preferences, recorded willy-nilly throughout MoS (e.g. use ... not … or &hellip;, at MOS:ELLIPSIS; and use μ or &mu; not &micro;, at MOS:UNITSYMBOLS; and so on), so the idea that it's off-topic or out-of-scope for MoS doesn't fly. B) MoS has already been updated with a footnote against automated "enforcement" of MoS stuff, including cross-references to the COSMETICBOT policy and to ArbCom decisions about it. The fact that someone could go on a bot-mediated enforcement rampage is not an argument against MoS having line-items about various stuff; the fact that we have rules against doing that is already sufficient to address the rare problem. The fact that someone just lost their AWB access as a result of doing something like that should discourage a repeat. Rules do not need 100% compliance to be useful, nor does failure to achieve 100% compliance mean they're insufficient; otherwise civil society would be impossible.
  7. David's "I would prefer that such changes be made only as part of other substantive changes to articles": We can include something about this, but not making up a new rule just for this, only pointing out the existing ones. MoS is not an editing or behavioral policy nor a dispute resolution board. This is already covered by WP:MEATBOT policy and WP:AWBRULES, and is just how WP:GENFIXES works. The aforementioned footnote can simply be recycled from the main MoS page to wherever this table will live.
  8. EEng says: "Your table, without question, inadvertently trods on a lot of toes in the form of established ways various groups of editors do things in various topic areas." That's not "without question"; prove it, please. Then we can integrate whatever tweaks are necessary. And sometimes toes have to be stepped on, anyway. Not everything some gaggle of people at a wikiproject are doing is a good idea, nor do they get to just make up their own rules and force others to comply; site-wide concerns override local ones (WP:CONLEVEL policy).

    And what was once an okay idea can become a poor one over time as circumstances change. E.g., the cutover last month to a new HTML linter for the parser broke all kinds of stuff that used to be "okay" or "we don't care", but which is no longer okay, and thus we now do care. The most obvious of these is that unclosed inline elements used to be forcibly closed at the opening of a block element and this is no longer the case, resulting in badly broken, mis-rendering HTML in at least tens of thousands of pages. People have been cleaning this up, including with semi-automation tools like AWB and JWB, yet no one is having a shit-fit about it. People will have shit-fits about such activity if it's PoV pushing (e.g. changing all "U.S." to "US", or changing all unspaced em-dash parenthesizing to use spaced en dashes), but they don't lose it over technical cleanup. Another example is that <br> breaks the output of at least two of the available edit-mode syntax highlighters, and needs to be changed to <br />; I've already fixed one "Help:"-namespace page from the 2000s that was recommending <br>, and there are probably some others that need fixing in this regard.

  9. The obvious way to proceed is for EEng to document these "toes" he says are being stepped on; for discussion to ensue, with any needed adjustments being made to the table; and then – if we really think it's necessary – do an adoption RfC on table version 4.

 — SMcCandlish ¢ 😼  16:38, 26 July 2018 (UTC)

  • My point about the toes is simply that, from experience, people tend to be very set in their ways about low-level details such as direct (literal) pasting in of characters vs. coded form (and, where a coded form is used, both the & forms and template forms have their enthusiastic adherents). So the wider this is advertised and discussed the better, to save WP:WHINE-ing down the road.
  • I think the table needs to recognize that there are much-used template forms e.g. {ndash}
  • If we're going to all this trouble, I'd like to see a shift to a preference for coded forms of mdash and ndash, instead of the current even-handed statement. It's just crazy-making that you can't tell if the right character is present (depending on your font and platform of course). This of course would be a stepping on of some toes.
  • I'm still hoping to get an explanation of why Whitespace other than the non-breaking &nbsp; and regular space should be avoided in prose.
EEng 05:27, 27 July 2018 (UTC)
Because they cause copy-pasting errors/oddities, clashes with find/replace searches, don't play nice with screen readers, mess with alignment/justification, and there's pretty much no point to them in any sort of prose. "James&thinsp;Dean was an actor." is pure nonsense. Headbomb {t · c · p · b} 11:06, 27 July 2018 (UTC)
Yeah, we only use these for special kerning purposes. If there's some case where we're regularly using thin and hair spaces and it's not spacing tweaks in tight material in template output, feel free to point out where we're doing it, and it can be accounted for (if it's a good idea). Other stuff:
  • As for the "set in their ways about ... direct (literal) pasting in of characters", that's irrelevant, because MoS doesn't constrain editors in any way as to adding new material. You can edit WP without ever complying with anything MoS says, as long as you're following WP:CCPOL, and not a) changing guideline-compliant material to be non-compliant, or b) reverting people making non-compliant material be compliant.
  • Re "the table needs to recognize that there are much-used template forms e.g. {ndash}" – sure. That's not an objection to the table, it's an expansion suggestion.
  • On changing to &ndash;: I actually proposed that several years ago for the same reason, and did not get consensus. Apparently the average editor, with their fonts, can see the difference clearly, and people were dismissive of the idea because the editing tools below the edit window provide a button for directly inserting the Unicode character. I think, therefore, this is a lost cause. Editors having trouble seeing the difference between —, –, −, and - need to use WP:User CSS or their browser's font settings to use a font for editing that works better for them. I wrote instructions on how to do this at Help:User style#User CSS for a monospaced coding font. It's not absolutely perfect; the minus and hyphen are still hard to distinguish. If I find a better, free coding font than Roboto Mono I'll put it at the front of the font stack.
 — SMcCandlish ¢ 😼  13:07, 27 July 2018 (UTC)
Or you can use WP:WIKIED WP:WIKED which marks them as different in the edit window. Headbomb {t · c · p · b} 13:18, 27 July 2018 (UTC)
I'm assuming you didn't really mean m:Wiki Education Foundation. EEng 17:44, 27 July 2018 (UTC)
Yes, my bad, fixed. I meant WP:WIKED. Headbomb {t · c · p · b} 18:18, 27 July 2018 (UTC)
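
For anyone who cannot tell the lookalike horizontal marks apart by eye in the edit window, one font-independent check is to print their code points and Unicode names. This is a small Python sketch using only the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(chars):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

# Hyphen-minus, en dash, em dash, and minus sign, written as escapes
# so the sample does not depend on how the characters render here.
for code, name in describe("-\u2013\u2014\u2212"):
    print(code, name)
```

This does not help with reading wikitext in place, but it settles which character is actually present when a pasted dash is in doubt.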
  • Obviously no one's suggesting James&thinsp;Dean so that isn't helpful, and BTW I just checked and text search on Chrome has no problem understanding that thin space is a space. Now and then I've used hsp to adjust "something in italics"[5] to "something in italics"[5] (your mileage may vary, of course) and I'm sure I've used thinsp now and then though I can't recall where. Take a look at this change [4].
  • I'm not so sure that the evidence is that Apparently the average editor, with their fonts, can see the difference clearly. I suspect instead that the great majority of editors don't even know there is a difference (and just use hyphen), most of those who know the difference are inserting directly using the click-to-insert gizmo but don't really notice or care what it looks like in the edit window since they never look back, and the very small number of us who are copyediting and checking these things have learned to deal somehow with the difficulty of distinguishing them – in my case, wherever I see a direct/literal character which I know should be an ndash but I'm not sure, I just change it to {ndash} so I know it's right. But I'd rather we encouraged editors to use a coded form in the first place to save that trouble. Unfortunately that would create a new flashpoint for my next point, which is...
  • MoS doesn't constrain editors in any way as to adding new material – You know that and I know that, but as sure as day follows night someone's gonna paste in a direct rho, someone else is gonna change that to &rho; (as recommended in the table), and the first guy's gonna change it back, saying "I like it this way." Having said that, looking over the whole table now I don't see very many cases where that might happen (unless we adopt a recommendation to use coded forms of ndash and mdash) but I still think the wider this is advertised for comment in advance the less trouble there will be.
EEng 17:44, 27 July 2018 (UTC)
Do you have any example of where thinsp/ensp/emsp/hairsp should be used in prose? Because you have none, and no one can come up with any use for them in prose. Until you have such counter examples, the avoid them in prose has consensus, and the allow them is simply your own preference to not disallow them because of reasons which are never explained. Headbomb {t · c · p · b} 18:41, 27 July 2018 (UTC)
I guess you didn't read my post above because the first bullet point gives one. I've been very up-front about my wish that we could recommend coded dashes over direct dashes, instead of just trying to force it into the table. Please have the same courtesy about your apparent wish to flatly forbid thinsp and hsp. Is such a provision already present in MOS? EEng 18:45, 27 July 2018 (UTC)
Such a provision is the current state of Wikipedia. No one writes "something in italics"[5], and they shouldn't start to do so either. Not sure what that has to do with dashes. Headbomb {t · c · p · b} 18:49, 27 July 2018 (UTC)
Is a blanket ban on thinsp and hsp already in MOS or not? EEng 18:56, 27 July 2018 (UTC)
The only use I can recall for which I manually employ thin space is between § and the section number that follows it, to split the difference between "§ 1.2.3" and "§1.2.3" styles. This is just a personal habit of mine; there's no rule about it. The only use I've ever had for hair space, outside of a template, is between em dash and an author name when attributing a quotation: "Humor is Mankind's greatest blessing." — Mark Twain. Also not a rule; it just looks better. Neither of these uses is vital. But they're not objectionable. So, we have a handful of use cases we can document, and then discourage it otherwise. Put it in a footnote, probably. I'm a big fan of footnoting "there are some geeky exceptions" stuff instead of clouding the central advice. On horizontal marks: Well, you can try proposing glyph-to-code conversion if you want, but don't hold your breath. With my font tweaking solution, I have no difficulty at all telling en dashes and hyphens apart, in rendered or source view. "The wider this is advertised": Sure, but not while we're still banging on it just with 3 or 4 people. Iron out the obvious kinks, or even more surely than day follows night, people will "strongly oppose" the whole thing on the basis of some nitpick we should have already anticipated.  — SMcCandlish ¢ 😼  20:58, 27 July 2018 (UTC)
Obviously I meant we elite should get it in the best form we can before inviting the hoi polloi to look at it. EEng 21:29, 27 July 2018 (UTC)
Thin space is needed for the correct typography of some mathematics formulas. E.g. (from something off-wiki I was working on today) without thin space: <math>\phi!2^\phi</math>; with thin space: <math>\phi!\,2^\phi</math>. The thin space makes it much more clear that this is a product of two subformulas rather than some strange binary-operator usage of the exclamation point. —David Eppstein (talk) 21:18, 27 July 2018 (UTC)
All great use cases (though I'm sure there are more we're not thinking of) so you see why I objected to These are sometimes used for precision positioning in templates but should not be used in prose. Use either non-breaking (&nbsp;) or regular spaces. So who's OK with my formulation These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking &nbsp; and regular space are normally sufficient (with or without a footnote as suggested by SM)? I'm fine with the rest of what SM has said. EEng 21:27, 27 July 2018 (UTC)

I feel like keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and §, since either a regular space or no space works just fine. And I agree with that general approach; HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters I don't think we should fuss about that sort of small thing. This sort of layout issue may be better addressed by making web browsers render text more beautifully than by throwing in a bunch of site-specific directives.

If we were to start putting &thinsp around, say, emdashes, then I think that would be a good argument for doing that in an {{emdash}} template, since we'd want it everywhere consistently. I don't think it's a good idea to do that sort of fine-control typography on an article-by-article basis, since then it will not be done consistently.

If {{endash}}, &ndash;, and – all do exactly the same thing with no fancy spacing, I can see an argument for having two different ways to do it (one HTML-free and one for easier identification), but three ways seems like too many, when two of them serve almost exactly the same purpose.

That said, I'd rather publish the new tables with some of the rows marked as disputed/under discussion than hold the whole thing until there's consensus on every single part, so at least we can start making progress on the items that everyone agrees on, which seems like 95% of it. -- Beland (talk) 02:16, 28 July 2018 (UTC)

  • keeping to the spirit of Wikipedia:Manual of Style#Keep markup simple means saying that &thinsp and &hairsp should not be used around italics, dashes, and § – No, what the linked guideline says is "Other things being equal, keep markup simple... Use HTML and CSS markup sparingly". That's not "should not be used".
  • HTML is not well-suited to pixel-perfect character control, and as long as there are no horribly ugly problems like actually-overlapping characters – It may not be well-suited, but at times we need to do the best we can, and we're not talking about "pixel-perfect". David Eppstein's example is an excellent one in which neither regular space nor no space is at all acceptable.
  • I'd rather publish the new tables with some of the rows marked as disputed/under discussion – Well, I think we have our hands full just coming up with tables which faithfully and uncontroversially centralize what is now scattered all over creation. And that would be quite an achievement. Changes to what's being recommended should be a follow-on effort.
EEng 03:47, 28 July 2018 (UTC)
David Eppstein's example doesn't use &thinsp; in the wikitext, so it seems to be out of scope of what I'm proposing. It is rendered with <math>\phi!\,2^\phi</math>. Though wouldn't that be a good place to use the dot operator if that's appropriate - surely a very subtle spacing difference isn't the best way to clarify the notation? -- Beland (talk) 17:22, 28 July 2018 (UTC)
Dot operator is for noobs. Writing for noobs may be appropriate in some Wikipedia articles but it can be condescending in other contexts. —David Eppstein (talk) 19:28, 28 July 2018 (UTC)
I know, how about × or * ? I larnd bout them in algbra. EEng 00:18, 29 July 2018 (UTC)
@EEng: Are you arguing that the additional complexity of using &thinsp and &hairsp in prose is worthwhile, and if so in what situations? -- Beland (talk) 17:22, 28 July 2018 (UTC)
I'm arguing that this isn't the time or place ...
... to get into the weeds of changing the current guidelines, rather than just summarizing and centralizing them.
... to tell a practicing mathematician what notation he should use.
EEng 18:12, 28 July 2018 (UTC)
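
For reference, the two spacings under discussion can be compared side by side in LaTeX. The second line is the markup quoted by Beland above; the first line, with the thin space simply removed, is an assumption about what the "without thin space" version looked like.

```latex
% Without the thin space (assumed form of the "without" example)
\[ \phi!2^\phi \]
% With the thin space \, between the factorial and the power, as quoted above
\[ \phi!\,2^\phi \]
```

The `\,` command is math-mode spacing inside the `<math>` markup rather than a `&thinsp;` reference in the wikitext.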

@David Eppstein: @EEng: Given the above discussion, do either of you have any remaining objections to posting the revised guidelines? -- Beland (talk) 00:07, 5 August 2018 (UTC)

Yes. I have the same objections I have already discussed. Why do you think that would have changed? —David Eppstein (talk) 00:08, 5 August 2018 (UTC)
Honestly, I've completely lost track of where we are. I'd be happy to trust SMcCandlish to recapitulate what the outstanding objections (by me, by DE, or by anyone else) seem to be. EEng 05:46, 5 August 2018 (UTC)
I'm mostly on wikibreak. Eppstein's example is a very good one, and I thought of another the other day but then promptly forgot it. Anyway, we can probably arrive at a suggestion to not use these unusual space characters in lieu of regular spaces, nor for decorative effect, and have a footnote with some programmatic examples of appropriate use. I like pushing MoS back towards being advisory and providing examples instead of issuing do/don't commandments where practical. We need a "WP:Don't feed the wikilawyers" page.  — SMcCandlish ¢ 😼  15:21, 12 August 2018 (UTC)

A germane (if insane) Help page

Assuming we're still doing this, let's while we're at it do something about this insane pile of technical minutiae: WP:How_to_make_dashes. EEng 23:48, 4 August 2018 (UTC)

I tidied that up so it's a shorter read if you don't want to learn how to type by keyboard "shortcut". I'll add a link to the fourth draft. -- Beland (talk) 22:55, 6 August 2018 (UTC)

Exception for superscripts/subscripts in titles

@Beland: Issue to add to the resolution stack: WP:Manual of Style/Titles#Typographic effects specifically advises use of Unicode superscripts and subscripts and such when available for use in titles of works, because they copy-paste correctly (that is, the output of E=mc<sup>2</sup> copy pastes as E=mc2), and can be used in citation templates without boogering the COinS output. I'm wondering if this conflicts with anything in MOS:NUM and MOS:TM, and the main MoS page. If so, we need to figure out how to reconcile that.  — SMcCandlish ¢ 😼  22:56, 27 July 2018 (UTC)
Well, Wikipedia:Manual of Style/Superscripts and subscripts is marked as inactive, but I resolved what little conflict there seems to be by adding an exception for titles on that page, with a cross-reference to Wikipedia:Manual of Style/Titles § Typographic effects. I added the same cross-reference and exception to the proposed table. -- Beland (talk) 02:00, 28 July 2018 (UTC)
User:Headbomb reverted the table change with the edit summary "that page has nothing that contradicts the advice given here. This also applies to titles too via {{DISPLAYTITLE}}". Before my edit, I read that row as recommending not to use Unicode superscripts and subscripts at all, and after the edit to recommend not using them except when needed in titles. The linked page in fact says: "To ensure correct copy-pasting, it is preferable to use Unicode superscript or subscript characters when possible, rather than HTML or wiki markup, which are purely typographic (Unicode ² is not the same character as 2 with superscript markup). Special characters can be used in citation templates." which to me contradicts the "don't use, ever" advice before the edit. Actually my edit was incorrect, the exception is not for Wikipedia article titles, but for titles of works generally, so I'd have to reword it if restoring. (SMcCandlish mentioned that but I was reading too quickly.) But does that at least make sense as I explained it, or am I missing something? -- Beland (talk) 02:30, 28 July 2018 (UTC)
The only exception should be for an article on the unicode characters themselves. Everything else should be done via a DISPLAYTITLE, e.g. (−1)F, or AC0. Titles of works are no exceptions there. Something like H2O: The Book should be located at H20: The Book and formatted via {{DISPLAYTITLE:''H<sub>2</sub>O: The Book''}}, not located at H₂O: The Book, and then formatted as H2O: The Book throughout the rest of the article. Headbomb {t · c · p · b} 02:47, 28 July 2018 (UTC)
H20? Holy heavy hydrogen, Batman! EEng 03:00, 28 July 2018 (UTC)
Well, there is ISBN 1492615323. But the same would apply to H2O (American band) / H2O (Scottish band), etc... Headbomb {t · c · p · b} 03:18, 28 July 2018 (UTC)
OK, I made another go at noting exceptions to the general "don't use" rule in the table. Does that look better? -- Beland (talk) 17:36, 28 July 2018 (UTC)
Reverted and clarified. Unicode superscripts shouldn't be used anywhere in titles, except for articles dealing with the Unicode characters themselves. Copy-pasting issues are irrelevant to how things should be properly formatted and displayed, and copy-pasting H<sub>2</sub>O is no harder than copy-pasting H₂O. This is also an accessibility concern, as screen-readers will often choke on Unicode superscripts. Headbomb {t · c · p · b} 14:16, 5 August 2018 (UTC)

Please someone step in to resolve the most stupid revert war ever

EEng keeps messing with the table layout, forcing them to take huge amounts of vertical space, breaking consistency, scaling/zoom functionality, and forcing unnatural breaks for AFAICT, no real reason but personal preferences. What looks better, [5] + [6] (inline) or [7] + [8] (random vertical breaks)? Headbomb {t · c · p · b} 10:52, 27 July 2018 (UTC)

Works better allowed to naturally flow; viewport sizes vary radically. The version with forced line breaks does waste a bunch of vertical space on my big-ass monitor. When I reduce window width sharply to simulate a mobile device, it wraps awkwardly, because the browser wraps as needed, plus there are forced line breaks, and they're at cross purposes.  — SMcCandlish ¢ 😼  12:02, 27 July 2018 (UTC)
Also note to EEng (talk · contribs) (posting this here since your userpage is too slow to use), when you refer to collective things, they take the plural form. The hyphen is considered... but Hyphens are considered, not Hyphen is considered.... Headbomb {t · c · p · b} 14:48, 27 July 2018 (UTC)
I have to go do my laundry but perhaps when I get back we can talk about this calmly and without the self-certainty. EEng 15:03, 27 July 2018 (UTC)
OK, that's the whites done, so I have a minute. Look, we've all been through this, where we're seeing different things on different platforms, and it's not helpful to say simply "looks horrible" without thinking about what the other person is seeing and what they're trying to achieve. While in general (all other things being equal) the conservation of a table's horizontal and vertical space is a priority in order to make it easier for the reader to absorb its content, in the present example (or one of them) there was the competing desire to present the various dashes and so on in a stacked form to allow the reader to see how confusing they can be. That may or may not have been worth the slight additional vertical space consumed, but it's not ridiculous either, and Headbomb simply ignored my repeated explanations of that instead of engaging in a discussion of the competing desiderata.
As for plural and so on, "Hyphen is considered" is simply a telegraphic form of "The hyphen is considered", and is just as correct as "Q is considered the hardest letter to use in Scrabble." You're more concerned with strict formalism than is appropriate outside article space.
I've been many times thanked for my careful reforms of previously incomprehensible tables such as those at MOSNUM and WP:PROTECTION, so I do know what I'm doing even if you're not able to always see what I'm aiming at. But I'm not sufficiently interested in these minutiae to worry about them, at least until this proposal goes live and its content is in final form. EEng 18:25, 27 July 2018 (UTC)
Replace "hyphens/minuses/dashes" with "car/turnip/leaf" and see how it doesn't make any sense. "Car should always be..." makes no sense. "Cars should always be..." does. Headbomb {t · c · p · b} 18:44, 27 July 2018 (UTC)
Replace it by Q to see there must be more to it than you seem to think: "Q is usually followed by u" makes complete sense – or would you insist on "The Q is usually followed by the u"? Or, God forbid, "Q's are usually followed by u's"? Can't you just let anything go? EEng 19:07, 27 July 2018 (UTC)
First, using "The" makes this singular. But if you remove it and have "Q is usually followed by u" you're using a mention, and the analogous situation would be something like "- is usually followed by ;", not "hyphen is usually followed by semicolon" (the grammatically correct way of having a use would be "Hyphens are usually followed by semicolons"). Headbomb {t · c · p · b} 19:13, 27 July 2018 (UTC)
Apparently the answer to my question is No. Oh, and see WP:MISSSNODGRASS. EEng 19:38, 27 July 2018 (UTC)
To get back to the original question, the different versions of the tables look the same to me when I use a narrow window. With a wide window, I prefer the ones with the explicit breaks; I think it makes the markup examples clearer to break them into lines like that. The extra vertical space doesn't bother me; if you have a wide window, you probably also have a tall window. —David Eppstein (talk) 20:15, 27 July 2018 (UTC)

Fourth draft

Proposed for posting to Wikipedia:Manual of Style/Text formatting § HTML character entity references and replacing the second paragraph of "Keep markup simple" at Wikipedia:Manual of Style § Miscellaneous with a link to this new section.

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#x20AC;, &#8364;, or including the character directly.
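
As a rough illustration of the equivalence described above, the following Python sketch decodes the named, hexadecimal, and decimal references for the euro sign and compares them with the character itself. It uses the standard library's `html.unescape`; the variable names are only illustrative, and the sketch says nothing about how MediaWiki itself processes references.

```python
import html

# Three reference forms for the euro sign, plus the character itself.
forms = ["&euro;", "&#x20AC;", "&#8364;", "\u20ac"]

# Decode each form the way an HTML parser would.
decoded = [html.unescape(form) for form in forms]

# Every form should collapse to the same single character, U+20AC.
all_same = len(set(decoded)) == 1
print(decoded, all_same)
```

The point is only that the choice between these forms affects the page source, not the character that is ultimately rendered.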

On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. For a comprehensive list of available named references, see [9].
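
A minimal sketch of the "named over numeric" part of this rule, assuming Python's `html.entities.codepoint2name` table (which covers HTML 4 entity names only) is an acceptable stand-in for the list of available named references. The function name `prefer_named` is hypothetical, and the sketch deliberately ignores the exceptions discussed in the tables below.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#8722;) and hexadecimal (&#x2212;) references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def prefer_named(text):
    """Replace numeric references with named ones where a name exists."""
    def swap(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        # Leave the reference untouched when no HTML 4 name is known.
        return f"&{name};" if name else match.group(0)
    return _NUMERIC_REF.sub(swap, text)

sample = "5 &#8722; 3 costs &#x20AC;2 &#91;sic&#93;"
print(prefer_named(sample))
```

In this sample the minus and euro references gain names, while `&#91;` and `&#93;` are left alone because the HTML 4 table has no name for the square brackets.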

Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.

Please note: It is always OK, whether using manual or semi-automated means, to fix broken HTML entities by replacing them with characters or correct HTML entities (whichever is preferred in the specific case). (Fully automated fixes would need bot approval.) However, when changing existing text from a disfavored to favored form, especially when making large numbers of changes, WP:MEATBOT asks that editors making manual edits please pay attention to the context and be aware of exceptions to the guidelines. When using automated and semi-automated tools, remember that WP:COSMETICBOT and WP:AWBRULES ask that these tools not be used to make changes of this type unless accompanied by a more substantive (reader-visible) change. Check Wikipedia error 11 is disabled for this reason.

Characters to avoid
Avoid Instead use Note
… (&hellip;) ... (i.e. 3 periods) See MOS:ELLIPSIS.
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ Latin letters equivalent (I II i ii) MOS:ROMANNUM
Unicode fractions like ¼ ½ ¾ &frasl; {{frac}}, {{sfrac}} See MOS:FRAC.
Unicode subscripts and superscripts like ¹ <sup></sup> <sub></sub> See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate.
µ (&micro;) μ (&mu;) See MOS:NUM#Specific units
Ligatures like Æ æ Œ œ Separate letters (AE ae OE oe) Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES.
∑ (&sum;) ∏ (&#8719;) ― (&horbar;) Σ (&Sigma;) Π (&Pi;) — (&mdash;) (Not to be confused with \sum and \prod, which are used within <math> blocks.)
‘ (&lsquo;) ’ (&rsquo;) ‚ (&sbquo;) “ (&ldquo;) ” (&rdquo;) „ (&bdquo;) ´ (&acute;) ′ (&prime;) ″ (&Prime;) ` (&#96;) Straight quotes (" and ') Use {{coord}}, {{prime}} and {{pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS.
‹ (&lsaquo;) › (&rsaquo;) « (&laquo;) » (&raquo;) Use &lang; and &rang; for math notation. In non-English quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT.
&ensp; &emsp; &thinsp; &hairsp; Normal space These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP.
In vertical lists

• (&bull;) · (&middot;) ⋅ (&sdot;)

* Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics.
&zwj; &zwnj; see note Used in certain non-English language words, see zero-width joiner/zero-width non-joiner. Should be avoided elsewhere.
£ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) MOS:CURRENCY; find broken instances
Potentially confusing or technically problematic characters
Category coded form (direct form) Notes
Miscellany &amp; (&) &lt; (<) &gt; (>) &#91; ([) &#93; (]) &apos; (') &#124; (|) Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{'}} and {{!}} or {{pipe}}. See also character-substitution templates and WP:ENCODE.
Greek letters &Alpha; (Α) &Beta; (Β) &Epsilon; (Ε) &Zeta; (Ζ) &Eta; (Η) &Iota; (Ι) &Kappa; (Κ) &Mu; (Μ) &Nu; (Ν) &Omicron; (Ο) &Rho; (Ρ) &Tau; (Τ) &Upsilon; (Υ) &Chi; (Χ) &kappa; (κ) &omicron; (ο) &rho; (ρ) In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters.
Quotes &lsquo; (‘) &rsquo; (’) &sbquo; (‚) &ldquo; (“) &rdquo; (”) &bdquo; („) &acute; (´) &prime; (′) &Prime; (″) &#96; (`) Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters.
Apostrophe-like ' ` ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ
Dashes, minuses, hyphens &ndash; (–) &mdash; (—) &minus; (−) - (hyphen) &shy; (soft hyphen) Can be confused with one another. For dashes and minuses, both forms are used (as well as {{endash}} and {{emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines.
Whitespace &nbsp; &emsp; &ensp; &thinsp; &hairsp; &zwj; &zwnj; In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP.
Non-printing &lrm; &rlm; In direct form these are nearly impossible to identify. See MOS:RTL.
Mathematics-related &and; (∧) &or; (∨) &lang; (⟨) &rang; (⟩) Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{angbr}} instead of ) / ()
Dots &sdot; (⋅) &middot; (·) &bull; (•) Can be confused with one another. Interpuncts (&middot;) are common in horizontal lists and to indicate syllables in words. Multiplication dots (&sdot;) are used for math. In practice, the dots are used directly instead of the HTML entities.
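
To illustrate why the Greek letters row recommends coded forms in isolation, here is a small Python sketch that prints the code points and Unicode names of a few Greek/Latin lookalike pairs. The choice of pairs is only an example, and the sketch relies solely on the standard `unicodedata` module.

```python
import unicodedata

# Pairs of visually similar letters: (Greek, Latin).
pairs = [("\u0391", "A"), ("\u0392", "B"), ("\u039f", "O"), ("\u03bf", "o")]

for greek, latin in pairs:
    # The glyphs look alike, but the code points and names differ.
    print(
        f"U+{ord(greek):04X} {unicodedata.name(greek)}  vs  "
        f"U+{ord(latin):04X} {unicodedata.name(latin)}  "
        f"equal={greek == latin}"
    )
```

In most fonts the two members of each pair are indistinguishable on screen, which is the confusion the coded form is meant to avoid.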

Discussion of fourth draft

@David Eppstein: I thought your opinions might have changed or been refined in response to the comments by SMcCandlish in the discussion of the third draft. SMcCandlish said some interesting things about how to formulate advice against disruptive editing, which I think helped evolve my position. I've tried to integrate both your views in the new paragraph in the above fourth draft. How does that sound to you? -- Beland (talk) 23:35, 6 August 2018 (UTC)

My opinion is still that we should not make invisible and semantics-neutral changes to articles except as part of more substantive edits to the same articles, and that your suggestions here seemed aimed at doing that. If you reassure me that no such automation is intended, and that your proposed tables are intended purely for the use of human editors, I may be willing to take your proposals more seriously. But even then, I don't see this as something that is so important to standardize that it should be codified in the MOS. —David Eppstein (talk) 00:50, 7 August 2018 (UTC)
@David Eppstein: Pretty sure no one in WP:BAG would ever approve a bot whose only purpose is changing &euro; to € or &ndash; to – (or vice versa) without strong consensus to do so per WP:COSMETICBOT. Some of those could end up as minor WP:GENFIXES (and only when "Unicodify page" is manually enabled), but that's already an option for a lot of things. I think (not sure) AWB exposes invisible characters (non-breaking spaces to &nbsp; for instance), and I know for a fact that WP:WikED does it. I'd support changing obscure hex (&#x20AC;) and dec (&#8364;) codes to their regular (€) or readable (&euro;) equivalents on a character-per-character basis though. Headbomb {t · c · p · b} 02:28, 7 August 2018 (UTC)
I definitely don't mind turning hexes into unicodes if it's part of more substantive edits. But we don't need complicated tables to do that. —David Eppstein (talk) 02:45, 7 August 2018 (UTC)
@David Eppstein: Well, given that there's a complicated guideline that's being enforced, it seems like it has to be written down somewhere? It took some work to figure out what generally desirable practices are, and what the exceptions are that we need to watch out for. Is there somewhere else you think it would be more appropriate to codify this? -- Beland (talk) 06:32, 7 August 2018 (UTC)
"I worked hard on it" is never a valid reason for accepting something. And I think it's important to put more attention into learning what our actual practices are, than into making high-handed decisions about what they should be. —David Eppstein (talk) 06:35, 7 August 2018 (UTC)
@David Eppstein: That's not really what I was getting at. The reason it's worth documenting is not that I personally put work into it, it's that someone else would have to do the same work over again (for any given character) to figure out "which way should I write this?" or "is this the right style or should I change it?" later on. Documenting the preferred practice should save time in both looking for that information, in being confused by disfavored styles, and in changing disfavored to favored styles. I think we have done a good job learning what our actual practices are; I have looked at frequencies in database dumps, and we've done a consultation where people have pointed out what various groups consider errors vs. acceptable style. What's the argument against putting this in the MoS? -- Beland (talk) 15:45, 7 August 2018 (UTC)
@David Eppstein: I just confirmed that searching for the character itself (with ‰ for example: [10]) can miss articles that use the entity (for example articles using &permil; can be found with a different search: [11]). Given that and what Headbomb points out about automation, do you feel the proposed language in the fourth draft is a reasonable compromise? -- Beland (talk) 06:31, 10 August 2018 (UTC)
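
The mismatch described here can be modelled with a plain substring search over raw page source. This is only a sketch: the two page texts are hypothetical, and a simple `in` test is an assumption standing in for the real search engine, whose indexing is not described in this discussion.

```python
import html

# Hypothetical wikitext snippets; the second stores the sign as an entity.
pages = {
    "Direct": "a gradient of 0.75 \u2030",
    "Entity": "a gradient of 0.75 &permil;",
}
needle = "\u2030"  # the per mille sign

# Searching the raw source only finds the page that uses the character.
raw_hits = [title for title, text in pages.items() if needle in text]

# Decoding entities before matching finds both pages.
decoded_hits = [
    title for title, text in pages.items() if needle in html.unescape(text)
]
print(raw_hits)
print(decoded_hits)
```

Under this model, either the index has to decode entities or the source has to use the character directly for both pages to be found.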

I find what Headbomb says about this sort of misbehavior being disallowed more convincing than your rationales for why you think it should happen, but whatever. There are also still specific problems with your draft.

  • Why does the first table confuse summation signs and the letter capital sigma, and product signs with the letter capital pi? They are semantically different, and requiring that summations and products always use the same glyph as the letters would be wrong.
  • Why do the curly quote characters appear in the second table when they're already adequately treated in the first table? Also, as far as I know only the single and double prime are used in math notation. And can we spell out mathematics, to avoid offending the Brits?
  • What does "Soft hyphens should always be coded" mean? Coded as entities? Coded using templates? What?
  • Does anyone use the "Non-English punctuation" lower quotes in math notation? I think you're confusing them with the angle brackets of the previous table entry, which are used (more in physics than in math) and should not be substituted by less-than greater-than even in html-coded formulas.

David Eppstein (talk) 06:49, 10 August 2018 (UTC)

@David Eppstein:

  • Jc3s5h pointed out that WP:MOSNUM requires using mu instead of micro, even where micro is more semantically correct. In practice, editors seem to use sum very rarely and just use sigma instead, even where sum would be more semantically correct. It seems like Wikipedia is saying "thanks but no thanks" to the Unicode consortium for those somewhat redundant characters. Since humans interpret them the same way regardless of which character is used, I'm not sure there's a point in fighting for using them as originally intended. Certainly "always use Σ" is an easier rule to enforce automatically than "sometimes use Σ and sometimes use ∑ depending on the context".
  • I just changed "math" to "mathematical" and "mathematics" as requested.
  • Quote mark characters appearing in both tables means, "avoid using these, but if you have to use them, use the entity and not the character".
  • I just changed the text to clarify soft hyphens should always use the HTML entity or a template.
  • Yes, there are some math articles in the search results for lsaquo and rsaquo. If you are wanting to recommend that &lsaquo and &rsaquo be changed to &lang and &rang in math contexts, that makes sense to me, and those characters are more commonly used in math articles (results for lang and rang). That would be a pretty easy changeover given the relatively small number of affected articles.

-- Beland (talk) 07:22, 10 August 2018 (UTC)

In practice, sums and products are formatted within <math> (because the html/template alternative coding works too poorly) and invariably are coded as \sum and \prod not as \Sigma and \Pi. And they look different from the Sigma and Pi characters. Your proposal could be read as implying that even within <math> one should use \Sigma and \Pi for sums and products instead of the correct coding. That is just wrong and will never be accepted by anyone with any mathematical literacy. —David Eppstein (talk) 07:28, 10 August 2018 (UTC)
This proposal is only trying to make recommendations about HTML entities and certainly not about <math> markup. I'll clarify the text. -- Beland (talk) 07:35, 10 August 2018 (UTC)
@David Eppstein: That's done. Any further action to take on any of these points? Should I make the lsaquo/rsaquo to lang/rang for math articles? -- Beland (talk) 08:14, 10 August 2018 (UTC)
@David Eppstein: Since there were only two articles affected, I dropped lsaquo and rsaquo from all math articles, and noted in the table to use lang/rang instead for math purposes. Have all your objections been satisfied? -- Beland (talk) 07:40, 12 August 2018 (UTC)
I don't know, have you promised not to make edits that don't make visible changes to articles yet? —David Eppstein (talk) 07:54, 12 August 2018 (UTC)
@David Eppstein: Do you consider edits that affect search results to be invisible? -- Beland (talk) 15:32, 12 August 2018 (UTC)
@Beland: Do you have evidence that changes to character encoding affect Google search results? –David Eppstein (talk) 06:59, 13 August 2018 (UTC)
@David Eppstein: Google seems to be generally broken when searching with special characters, so even with cleaner wiki source, we're not necessarily going to get good results there. I did find some cases where using HTML entities vs. characters directly does matter to Wikipedia's built-in search engine results. (Discussed in detail in the "Ahem" section below.) -- Beland (talk) 07:06, 13 August 2018 (UTC)

FTR, I just advertised this proposal on Wikipedia talk:WikiProject Mathematics and Wikipedia talk:WikiProject Science, since these are the topic areas with the most articles that would eventually have to be changed to reach full compliance with this new recommendation. -- Beland (talk) 06:54, 10 August 2018 (UTC)

Any attempts at style guidance for mathematics which elevate the aesthetic character-usage preferences of non-mathematicians over how mathematics is actually formatted by mathematicians are going to be doomed to failure. Meaning, either they won't be approved, or if you somehow ram them through they will be ignored and rebelled against. So if you think that many mathematics articles are going to need to be changed by your proposal, that's a big red flag that your proposal is bad. —David Eppstein (talk) 07:02, 10 August 2018 (UTC)
@David Eppstein: Well, I think it's much more natural for mathematicians who don't know HTML to write ∈ than &isin;. There may be reasons why there is currently a mix of both of those in use which are unrelated to people having an actual preference for the HTML entity. There are also more instances where characters are already in use and things do not need to be changed, which is evidence to me that math people do not actually have a strong preference for HTML entities. It may just be a matter of people not knowing about the other way to do it, which is exactly the sort of reason a published convention would address. So, I'm not going to pre-judge the outcome and I'd like to hear from the math folks themselves. -- Beland (talk) 07:32, 10 August 2018 (UTC)
What is actually much more natural is to use \in within math mode. Anything else is a workaround for Wikipedia's historically-really-bad and now merely subpar rendering of math mode. —David Eppstein (talk) 07:46, 10 August 2018 (UTC)
OK, that's orthogonal to this proposal, which is not attempting to make or change any recommendations about <math> markup or when it should be used. -- Beland (talk) 08:14, 10 August 2018 (UTC)
And actually it links to MOS:FORMULA which talks about how and when to do <math> markup, noting that is sometimes preferred. -- Beland (talk) 08:20, 10 August 2018 (UTC)
  • Since I'm invited to comment... I'm squarely in camp "unicode for everything", as that is where generally the web will move to. Let's not teach people arcane XML/HTML entities that they will never have to use anywhere else. Also I think spending time on this is a complete waste, people should be less anal, be less annoyed with each other, more forgiving and let go of things that 99% of the editors will never consciously adhere to anyway. —TheDJ (talk • contribs) 08:25, 10 August 2018 (UTC)

@David Eppstein: You requested evidence above; evidence was given. What are the implications of those findings for you? -- Beland (talk) 08:32, 15 August 2018 (UTC)

Every time I express my objections to your proposals, several days later I get a ping from you like this saying "Have you changed your mind yet? How about now? Really? Have you changed your mind yet? Now have you? What about now?" Stop it. It is irritating and does not help build support for your proposals. I don't want to see pings from you, will not respond substantively to them any more, am still unsupportive of your proposals, and don't see much likelihood of that ever changing. Don't ping me any more. —David Eppstein (talk) 17:03, 15 August 2018 (UTC)
Well, Beland, you've managed to piss David Eppstein off, and that's not easy so I guess you've accomplished something. The idea of centralizing this kind of advice is still a useful one, but you've got to stop this relentless sea-lioning. SMcCandlish, I invited you to do this once before, and you didn't take the bait, but I'm going to try again: you're our policy wonk supreme, so can you bullet-list the points still at issue so we can find a way forward? Take your time since I think tempers need to cool for a few weeks before we take this up again. EEng 02:52, 16 August 2018 (UTC)
I'm very sorry I've irritated you. When someone presents information or argumentation in a debate that isn't something I've thought of yet, there's a strong possibility that I'll change my mind, and that's how consensus-building is supposed to work. If there's no possibility of those things changing anyone's mind, then we might as well just take an immediate vote instead of discussing, which is something we're not supposed to do. On occasion I or someone else has presented new information or argumentation which seems like a convincing counter-argument to your objection, and I haven't been able to tell if you also found it convincing because you haven't responded, which is why I've been using pings. Personally I welcome pings, because I have too many pages on my watchlist to be able to notice a discussion happening. I'll certainly honor your request not to ping you if you don't wish to receive messages that way. -- Beland (talk) 01:20, 19 August 2018 (UTC)
If Beland was a nurse. EEng 02:33, 19 August 2018 (UTC)

Ahem!

Beland, I thought you said you weren't going to go about making trivial changes that don't alter what the reader sees. [12] EEng 22:30, 7 August 2018 (UTC)

@EEng: I have not said that. I have proposed some text above which points out that existing rules for AWB prohibit doing that with AWB, and that fully automated bots are unlikely to get approval to do that. But I'm not using AWB or a bot; I'm using the built-in Wikipedia search engine and manual editing. WP:MEATBOT says that editors have to pay attention and not make errors caused by manual speed-editing; I have been paying attention, and as far as I know I haven't messed up any of the conversions. I've also been doing some repairs to broken HTML entities that are reader-visible, in addition to this edit which isn't directly. BTW, a close reading of WP:COSMETICBOT reveals that edits which change search engine results (which these do) are considered substantial, so perhaps the objection that these edits are insubstantial should not apply at all, whether automated or manual, since they are indirectly reader-visible. -- Beland (talk) 21:15, 8 August 2018 (UTC)
OK, well I'm saying what David Eppstein said above: we should not make invisible and semantics-neutral changes to articles except as part of more substantive edits to the same articles. I continue to be concerned that you seem to be on some kind of uniformity-for-uniformity's-sake crusade, and that your machinery for "automatically find misspellings, mistakes in English grammar, and violations of the Wikipedia:Manual of Style" (WP:Typo_Team/moss) will lead to mindless gnomish "corrections" of things that were right in the first place or simply don't need to be changed. EEng 11:54, 12 August 2018 (UTC)
@EEng: I think this is only uniformity for uniformity's sake if you don't care about usability for editors and don't care about searches with special characters getting all the results the reader is looking for. I would disagree; usability is important for editor retention and productivity, and full search engine recall is a good goal. -- Beland (talk) 15:37, 12 August 2018 (UTC)
Please give an example of a plausible search that doesn't find an actual article now that it should, and will work properly under this "reform". Meanwhile, if you think any of this has even the tiniest bit to do with editor retention then you're completely out of touch with reality. EEng 16:12, 12 August 2018 (UTC)
@EEng: OK, say a teacher is reading a student report and they come across the text "The maximum gradient between Paddington and Didcot is 1 in 1320 (0.75 ‰ or 0.075 %)". They might think to themselves, "Hmm, ‰ is not a symbol I've introduced my students to or that I've seen any of them use. I wonder if this sentence is plagiarized?" If they do a search for "0.75 ‰" like so, they will not see Great Western main line in the first 500 results, but that is where I copied this text from. And the reason is that article is using &permil instead of ‰.
Help me understand your understanding - are you saying that using characters instead of HTML entities has no impact on usability, that usability has no impact on editor retention, or something else? -- Beland (talk) 17:45, 12 August 2018 (UTC)
  • But if the teacher searches gradient paddington didcot, which is what a sensible person would do, then they will find it, so I'm going to have to ask for another example before I'm convinced there's a realistic problem.
  • And that's assuming they use the Wikipedia search box. If they use Google, which is what any sensible person would do, then searching your original string The maximum gradient between Paddington and Didcot is 1 in 1320 (0.75 ‰ or 0.075 %) brings up the article as the first result, so now I really have to ask for another example before I'm convinced there's a realistic problem.
  • As a side note, use of that sentence wouldn't be plagiary, but that's neither here nor there.
  • A vanishingly tiny proportion of editors will ever encounter any of this, other than the dashes, and it's also unclear whether all this effort has any significant effect on "usability" from the point of view of these very few editors.
  • But most of all I very much dislike this assumption that regimentation in all things, no matter how minor, is necessarily a benefit. Let me quote something the wise Herostratus wrote in a similar context (WT:Manual_of_Style/Archive_188#How to indicate which person is which in a caption):
This is certainly something that should be left up to the individual editor, for various good reasons.
  • One good reason is that... there is no one clear correct or better way.
  • A second good reason is that adding another needless rule bogs down the MOS with more detail and makes it harder to learn and harder to use.
  • A third good reason is that creating a rule means enforcement, it puts interactions about the matter into an enforcement mode where editors are playing rules cop with other editors and this is not as functional as peer-to-peer interactions.
  • A fourth good reason is that there's zero evidence that it matters to the reader.
  • A fifth good reason is that micromanaging editors to this level is demoralizing and not how you attract and nurture a staff of volunteer editors – for instance we have a stupid micromanaging rule that I have to write "in June 1940" and not "in June of 1940" which is how I naturally write, and every stupid micromanaging rule like this is just another reason to just say screw it. As the Bible says "Thou shalt not muzzle the ox that treadeth out the corn" (1 Timothy 5:18, paraphrased from Deuteronomy 25:4) which updated means "Let the editor who did the actual work of looking up the refs and writing the friggen thing -- you know, the actual work of the project -- be at least allowed the satisfaction of presenting it as she thinks best, within reasonable constraints"...
This means different articles will do it differently. This annoys a certain type of editor. Oh well...

Please read that carefully and think about it. EEng 23:19, 12 August 2018 (UTC)

@EEng: I don't think we should blame the teacher in this example for failing to find the phrase in question because they have not chosen the words a "sensible" person would. A lot of times it makes sense to choose the rarest thing to search on because TFIDF ranks that highly. A sensible person would expect the search engine to find the phrase in question regardless of whether the searcher picked your words or mine.

In real life, I'm a programmer, and I often need to look up operators, which are usually punctuation. For privacy reasons and because they are a potential competitor, I generally prefer to avoid using Google and if I'm searching Wikipedia I use Wikipedia's internal search engine. In the case of special characters, that is often mandatory. For example, if you search Google for "site:en.wikipedia.org 0.75 ‰" you will not see Great Western main line in the top search results either, because Google drops ‰ from that search entirely. I'm also a linguist, and sometimes I need to research symbols in various languages. For example, if I'm doing machine translation work with French, I might need to know more about how « is used by that language. Right now if I do a full-text search for that on the Wikipedia site, I only get one article in the search results. It's Guillemet and that's very helpful, but if I search for &laquo there are dozens more results. If I search for "«" or "« site:en.wikipedia.org" on Google, I don't get Guillemet at all. I can file a bug report with Google that they may or may not do anything about, but I can fix Wikipedia's search engine right now by converting all the &laquo to «.

If the number of editors this affects is "vanishingly small" in comparison to the size of the project, then the number of changes needed to implement the proposed guidelines is similarly small, and thus the amount of disruption is similarly small. If we have consensus that such changes are either neutral or small improvements (opinions range) but no one thinks they are negative or would want to undo them, and I'm willing to put in the work to do them, then what's the problem?

As for Herostratus' wisdom, I think I agree with most or all those points applied to the situation upon which they are commenting. English is complicated and people can tell the difference between clear and unclear prose without having an enormous rulebook. But most of these arguments don't jibe for me with this case, which is not about how to phrase English prose.

  • The first reason - By definition, there is only one best way to do this. I'm not touching the cases where we've identified multiple possible ways to represent a character.
  • The third reason - I have a small set of volunteers that are enforcing good spelling; I personally don't mind being an "enforcer" of HTML entities. I don't think it's going to cause any arguments, since the cases I'd want to deal with would be pretty self-evident improvements. Rather I expect it'll be "oh hey I changed it to this other way of doing it for clarity" and most people won't care and some people will be educated about a better way to do it and maybe they'll adopt it and maybe they won't.
  • The fourth reason - Most readers won't notice, but I think the search examples above show it does matter in an important way to a small number of readers. It matters a lot more to editors who are trying to make improvements to the article and who would be expected to interpret and occasionally use the HTML entity syntax despite mostly not being web developers.
  • The fifth reason - This seems more like a technical how-to than a style micro-management to me. Besides, all contributed text is edited mercilessly, and that's a community standard. None of the articles I've started have any text recognizably from me left in them. Authors have to be open to small tweaks from editors - that's what editors do: edit. Wikipedia authors are discouraged from claiming ownership over a particular expression of an idea.

As for the second reason, it's a valid concern that the Manual of Style not get too long. I don't think anyone actually reads it end-to-end, though, so this is not my biggest worry. When I use it, I tend to be looking for the section that answers a particular question that's come up. It seems to me like the most logical place to put this information, but I'm open to putting it somewhere else if it's considered too obscure for a general audience. Would that be preferable? -- Beland (talk) 02:43, 13 August 2018 (UTC)

Look, Mr. Programmer-Linguist having "a small set of volunteers that are enforcing good spelling", we are not here to make it possible for you to look up operators for your work. EEng 06:08, 13 August 2018 (UTC)
P.S. for my fellow editors: this is beginning to remind me of the periodic harangues we receive about how we mustn't do this or that in citation templates "because it pollutes the CoINS output". EEng 06:20, 13 August 2018 (UTC)
@EEng: If Wikipedia isn't here to help people find useful and interesting information, then what is it here for?? Making search engines give useful results is part of the reason the MoS requires double quote marks, and much of the reason for creating an army of redirects. -- Beland (talk) 13:37, 13 August 2018 (UTC)
The straight-quotes guideline is a bunch of nonsense from the stone age, and you've failed to give a plausible example of what's not working now which all this will fix. EEng 02:42, 16 August 2018 (UTC)
Re "Making search engines give useful results is part of the reason the MoS requires double quote marks": [Citation needed]. —David Eppstein (talk) 02:48, 16 August 2018 (UTC)
It's this nonsense: [13]. EEng 02:54, 16 August 2018 (UTC)
Stone age or not, the motivation to improve results is the same. What's implausible about wanting to look up programming language operators or punctuation like "«"? The same problems could easily be happening for a hundred other characters, from ∃ to ∞, where again, I only get one result when I do a full-text search instead of the dozens of results that I know use those characters via HTML entities. -- Beland (talk) 21:09, 17 August 2018 (UTC)
You've still failed to give a plausible example of what's not working now which all this will fix. EEng 23:06, 17 August 2018 (UTC)
I honestly don't understand why you say that. Do you consider the examples immediately above to be implausible, or is it unclear how making these changes would improve those cases? -- Beland (talk) 23:31, 17 August 2018 (UTC)
Neither Wikipedia search nor Google search can distinguish "\exists" in math mode (by far the most common way of formatting this symbol) or &exists; from the word "exists" used in English prose (I just tried it to be sure). And Wikipedia's search can't find articles containing the "∃" character either: standard Wikipedia search returns only the main article on that topic, and insource:"∃" finds nothing. Google search for "∃" site:en.wikipedia.org works better, but doesn't care about the difference between unicode characters and html entities. So replacing some html-entities by the ∃ character is pointless, and your supposed improvements from making this replacement are, yes, implausible. And replacing math-mode by html is more than pointless; it is wrongheaded, bad, and likely to get you blocked if you persist in that sort of edit. So your proposed changes seem to be completely pointless to me, a waste of your time as well as ours. As I have said before, if you want to spend time actually improving Wikipedia, you are much more likely to have an impact doing so by creating content than by fiddling with character encodings. —David Eppstein (talk) 00:31, 18 August 2018 (UTC)
Ah, the Wikipedia search engine definitely has a bug in it, which I will report. You're right that it doesn't pick up articles that have a literal ∃ in them when doing a full-text search for that. Even worse when I search for "∃n", the "∃" gets stripped out, leaving just a search for "n", which is obviously not helpful. The "insource" syntax appears to need regular expression delimiters. I do successfully find a ton of articles with "insource:/∃/" and I do get a different set when I search for "insource:/&∃/". So technically the changes are visible to searches, but yeah, we shouldn't expect readers to know that they would need to use that syntax to find what they are looking for. Changing <math> markup has never been part of this proposal, though it is a good point that the search engine is not handling that well right now. Are there search engines that specialize in math that would be a good model for how to solve that? I know there are specialized search engines for chemistry that know about 3-D structures in a useful way. Obviously programming language operators, punctuation, and letters from non-English languages don't use <math> markup, so I think normalizing representation would matter more for those, especially if the bug was fixed. -- Beland (talk) 01:41, 19 August 2018 (UTC)
The "insource" syntax does not require regular expression delimiters; quotation marks (insource:"search_term") work for an exact search (some exceptions apply, mostly punctuation), whereas the regular expression delimiters will actually execute a regex search (which take longer). Probably the bug you've run into is a result of search folding, were I a guessing man. --Izno (talk) 02:58, 19 August 2018 (UTC)
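The two "insource" forms contrasted above (a quoted exact search and a slash-delimited regex search) can be sketched as query strings for the MediaWiki search API. This is only an illustration under assumptions: the helper names are hypothetical, no request is sent, and the regex helper assumes the pattern contains no "/" that would need escaping.

```python
from urllib.parse import urlencode

# English Wikipedia's MediaWiki API endpoint.
API = "https://en.wikipedia.org/w/api.php"


def insource_exact(term: str) -> str:
    """Quoted form: an indexed exact-phrase search of the page source."""
    return f'insource:"{term}"'


def insource_regex(pattern: str) -> str:
    """Slash-delimited form: a slower regular-expression scan of the page source.

    Assumes the pattern contains no "/" characters.
    """
    return f"insource:/{pattern}/"


def search_url(query: str, limit: int = 10) -> str:
    """Build (but do not send) a list=search API request URL for the query."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url(insource_exact("«")))
    print(search_url(insource_regex("∃")))
```

As noted in the thread, the quoted form may still miss punctuation-only terms, so the regex form is the one to reach for when the search target is a single symbol.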

Capitalization of reference titles

Hi all, not sure where the best place to ask this question is, but quickly: If we find a reference like this, how are we supposed to present the title of the article in our reference formatting?

A) Jennifer Winget and Harshad Chopda's Bepannaah makes a smashing debut; rakes in ratings
B) Jennifer Winget and Harshad Chopda's Bepannaah Makes a Smashing Debut; Rakes in Ratings

The broad question is, how much (if any) work do we do to conform the reference's title to our MOS? Thanks, Cyphoidbomb (talk) 14:43, 19 August 2018 (UTC)

This is covered by Wikipedia:Citing sources and may be discussed on the associated talk page. There you will find a Wikipedia article may follow any consistent citation style. Both approaches to capitalization mentioned by Cyphoidbomb can be found in various external style manuals, and either approach may be used in Wikipedia, as long as any particular Wikipedia article is consistent. Jc3s5h (talk) 14:59, 19 August 2018 (UTC)

Dog breed capitalisation?

Reading through Italian Greyhound, I noticed that both Italian and Greyhound were capitalised. I see why 'Italian' would be, but 'Greyhound' didn't seem right, so I started changing them. I just checked pages for other dog breeds though, and noted that they are the same - see Whippet, Greyhound, Sloughi. I've looked at MOS:COMMONNAMES and it does not give any justification for this - are there any other rules I'm missing, or should I edit these pages to conform with normal capitalisation rules? Girth Summit (talk) 18:23, 23 August 2018 (UTC)

More or less, breeds are capitalized at this time. NCFauna mentions breeds and the current state of things, though there is nothing formal. There is a discussion at the ideas village pump regarding a draft RFC to either solidify this as a !rule or to deprecate it in favor of lower case naming. --Izno (talk) 19:30, 23 August 2018 (UTC)
Thanks - I didn't know that was going on. I'll revert my changes at Italian Greyhound and go over to the RfC. Girth Summit (talk) 07:26, 24 August 2018 (UTC)
It is a draft--so the user who opened that discussion is looking to ensure that he has presented ideas fairly, rather than the full RFC. Take that into account. --Izno (talk) 12:17, 24 August 2018 (UTC)

Weird punctuation in song titles

Radiohead have a song called Go to Sleep, which is listed as "Go To Sleep." on some tracklists. (In fact, this goes for every track on their album Hail to the Thief.) Should the period be included when we mention the song on Wikipedia? My vote would be no, because I see it as a stylization, but I can't find anything in the MOS that specifically backs me up or shoots me down. MOS:CONFORMTITLE might apply but it doesn't seem black and white. Popcornduff (talk) 09:39, 25 August 2018 (UTC)

Does the MOS say anything about introductory paragraphs of sections (before its subsections)?

In an article, the main intro (lede) summarizes the article overall.

But it may have a subsection with its own intro (lede) and subsections. For example:

==Political activity controversies==

[Summary of section/section lede]

== Involvement in X==
...

== Donations to Y==
...

== Consultancy to Z==
...

My question is hard to express exactly so I've worded it a few different ways in the hope it expresses the underlying concern.

  • An article has a significant section made up of a lede covering some (but not all) issues, and rebuttals to them, in overview, followed by subsections covering various of them in more detail.
  • How should one judge whether the section header is appropriate?
  • Should the section intro/lede, be expected to summarize the subsections (like the article lede summarizes the article content)? What guidance is there about such section headers?
  • Should the section's lede text be considered like an article lede is considered - that is to say, 'does it summarize the subsections for a reader who reads the section lede but not the detailed subsections'? if not, what are the criteria for a good section lede?

Thanks for any help and insight. FT2 (Talk | email) 11:00, 25 August 2018 (UTC)

Use of inner subsections should cover this:
==Political activity controversies==
General information on political activity and anything not covered in the subsections below.

=== Involvement in X===
X specific material
====2016 exposé====
What he did re X
====2017 impeachment attempt====
Moves to impeach him for the exposé
=== Donations to Y===
Y specific material

HTH, Martin of Sheffield (talk) 12:00, 25 August 2018 (UTC)
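The nesting in the example above can be checked mechanically. The following is a rough Python sketch, not an existing Wikipedia tool: `outline` and `skipped_levels` are hypothetical helper names, and the regular expression only covers plain `==Heading==` lines.

```python
import re

# A heading line: 2 to 6 "=" signs, a title, then the same run of "=" signs.
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def outline(wikitext: str) -> list[tuple[int, str]]:
    """Return (level, title) for every heading line in the wikitext."""
    headings = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def skipped_levels(headings: list[tuple[int, str]]) -> list[str]:
    """Return titles of headings that jump more than one level deeper than the previous one."""
    problems = []
    previous = 1  # the page title is treated as level 1
    for level, title in headings:
        if level > previous + 1:
            problems.append(title)
        previous = level
    return problems


if __name__ == "__main__":
    sample = """==Political activity controversies==
General information on political activity.
=== Involvement in X===
====2016 exposé====
====2017 impeachment attempt====
=== Donations to Y===
"""
    for level, title in outline(sample):
        print("  " * (level - 2) + title)
    print(skipped_levels(outline(sample)))
```

Text placed between a level-2 heading and its first level-3 heading is the section introduction that the question asks about.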

In general, only the lead (which has no sub-sections) is expected to summarize the article. But sometimes, a summary of one part of the article may be followed by individual incidents, examples or whatever. Whether these should be given sub-headers largely depends on length. Many editors over-use sub-headers, at least imo. In general, avoid a string of sub-headers for bits one or two sentences long. On the other hand, if there is a string of distinct bits with more than a para each, that should probably be divided with sub-headers. I don't think anything more specific can be said. If we all had the same desktop screens, and pictures didn't complicate the issue, I'd be tempted to say as a rule of thumb that the average screen view at any point should include at least two headings, or parts of two sections, but no more than ?4 or 5. Hope that helps. Johnbod (talk) 14:36, 25 August 2018 (UTC)

Where a large number of articles are very closely related, and contain sections which are interchangeable, or nearly so, and are duplicated across those articles, should Wikipedia encourage use of a boilerplate system instead of copying the same section over three hundred or more articles? I think the "template" system might end up being too rigid? Any suggestions? Can this be done? Or should we keep on cut-and-pasting such sections? Thanks! Collect (talk) 18:55, 3 August 2018 (UTC)

Example? Sounds like exactly what templates are for. We also now have sectional transclusion.  — SMcCandlish ¢ 😼  20:10, 3 August 2018 (UTC)
Sectional transclusion – is that one of those gender reassignment surgeries? I just can't keep up with these modern developments. EEng 01:14, 4 August 2018 (UTC)
Nah it's when the foetus is taken out by C-section, then put into the father, so he can lug it around for a while.  — SMcCandlish ¢ 😼  02:37, 4 August 2018 (UTC)
The cookie-cutter/templated approach isn’t wrong ... it is just a bit lazy in my mind. The important thing for me is that we also continue to allow articles to NOT follow the mold. We should not fall into the trap of thinking that all articles in the topic area have to follow the pattern just because most do. Blueboar (talk) 20:19, 3 August 2018 (UTC)
  • An example of such interchangeable sections? EEng 21:26, 3 August 2018 (UTC)
    It's a paragraph rather than a section, but one example is {{Johnson solid}}. I'm not convinced this is a good idea, but having it consolidated and properly sourced in one place like this is at least better than its creator's habit of spewing this material over hundreds of related articles by copy and paste without proper sourcing. —David Eppstein (talk) 21:29, 3 August 2018 (UTC)
    And this would be used in each of the 92 articles, or something? And that's really needed, instead of just saying, in each of the 92 articles, "A dimorphic protoplasm is a Johnson solid", and letting the reader click to learn what that is? EEng 21:34, 3 August 2018 (UTC)
    Is it really needed? I doubt it, but the editor who pushes this material is difficult to rein in, and as I said, tends to copying and pasting of sort-of-related material on lots of sort-of-related articles. At least this way the copyedits and sourcing get propagated to all copies at once. —David Eppstein (talk) 22:05, 3 August 2018 (UTC)
    Here's another example: Look at List of birds of Gibraltar. There's a paragraph after each family name that could be turned into a template. Similar paragraphs are used throughout the "List of birds of ..." series. There was once an effort to set them up that way, but it is not now used. @Basar: was involved in that and I think Template:Bird list header was part of it. The system had not been maintained and had only been partially implemented, so now each list has similar material, but of course individual editors have made changes and now there's a lot of variation. I'm not sure using templates would be an improvement, but that's an example.  SchreiberBike | ⌨  04:01, 4 August 2018 (UTC)
Also 461 cases for "qualified for the 1980 U.S. Olympic team but was unable to compete due to the 1980 Summer Olympics boycott. (She) did however receive one of 461 Congressional Gold Medals created especially for the spurned athletes." as one of the shorter examples. There are scads of other examples, of course, but I was unsure that "template" results would make the material appear "of a piece" of the rest of the articles or simply make it stand out as most templates now appear to do. Collect (talk) 14:25, 4 August 2018 (UTC).
I think most people commenting here will be familiar with the idea of using templates and the fact that they may transclude text into articles. However they are [nearly] always placed in a box of some sort that makes it clear that they are not usual text. Back in 2015 I ran a script that placed a {{unreferenced section}} on lots of Family tree templates. In those cases I also included instructions on how those templates could contain self-contained inline citations using {{efn-lr}} and {{notelist-lr}} (see for example Template:Kennedy family tree).
At the moment I am running a script that searches for "1911encyclopedia.org" (a dead web site) and replacing the dead source with Wikisource links. This means that I am straying out of my watch-list area and viewing other articles.
A "1911encyclopedia.org" citation was included in the article House of York and while fixing it I noticed that there were two distinct citation styles used and that the short-inline-citations did not have corresponding supporting long-citations in a bullet pointed references section. This is usually a red-flag that text has been copied from one article into another (see Wikipedia:Copying within Wikipedia#Other reasons for attributing text) sometimes without the correct attribution in the edit history. I checked back through the history: the change in style happened with a series of edits made by user:JMvanDijk on 5 June 2017. This included copying text from fr:Armorial des Plantagenêts and White Rose of York; so far, so normal.
But there was also a transcluded section/piece that was not a template. The section/piece was taken from List of coats of arms of the House of Plantagenet and it appears in House of York#Coats of Arms; the transclusion was created by user:JMvanDijk on 5–7 June 2017.
Problems:
  • The lack of edit history for the transcluded section in the child article. This may be a copyright violation (not sure).
  • Citation style: In this case the article House of York was using long-inline-citations while the List of coats of arms of the House of Plantagenet uses short-inline-citations.
  • Maintenance issue with short-inline-citations and long-citation in the reference section. A citation requested to the section in the parent article, followed by a new short-inline-citation with a long-citation in the reference section, will not propagate with the transclusion.
  • Editing issues for new editors who probably will not understand what transcluded sections are: they will be confused and will not be able to access them to fix problems in the text (see how the people who hate citation templates use WP:CITEVAR to stop the change to citation templates; one of their arguments for not using them is that they are complicated).
  • Which talk page?: If someone fixes a problem and it is reverted, on which talk page does one discuss the reversion? For example the change/revert may be done for good reasons in the child article, but the actual change takes place in the parent where it is not so relevant. --PBS (talk) 10:23, 26 August 2018 (UTC)

Names of chemical elements

@Deacon Vorbis: This came about as a result of a discussion at Talk:List of chemical elements#spelling, after I queried why aluminium and sulfur appeared in the same article. I was referred to WP:ALUM. That made it clear that the internationally accepted spellings should be used. It doesn't just refer to article titles, as it states "... even if they conflict with the other national spelling varieties used in the article." As I'm sure I'm not the only person who's not aware of that convention, I believe it should be listed as an exception to ENGVAR. Voice of Clam (formerly Optimist on the run) (talk) 14:20, 24 August 2018 (UTC)

Okay, I see what you're going for, but I think it was unclear originally. Others might still disagree, but I wouldn't object if it were re-added with something a bit clearer/firmer, like along the lines of "use x, y, and z regardless of the variety used in the article; see here". Since that page only lists 3 cases, there's no reason to cut it down to just 2. –Deacon Vorbis (carbon • videos) 14:41, 24 August 2018 (UTC)
However it's done, I do agree that it should be made clear that this is an exception to the usual ENGVAR guidelines. "Sulfur" in a British English article looks as odd to me as doubtless "aluminium" does in a US English article, and I remember 'fixing' it when I was a new editor here and being surprised to find it was the norm. Peter coxhead (talk) 15:54, 24 August 2018 (UTC)
If it's just a wording issue, how about the following:
For articles about chemistry-related topics, the international standard spellings of aluminium, sulfur, caesium and derivative terms should be used, even if they conflict with the other national spelling varieties used in the article. See WP:Naming conventions (chemistry)#Element names.
Voice of Clam (formerly Optimist on the run) (talk) 19:46, 24 August 2018 (UTC)
Let me suggest instead
For articles about chemistry-related topics, the international standard spellings of aluminium, sulfur, caesium (and derivative terms) should be used regardless of the national English variant employed in the article generally. See WP:Naming conventions (chemistry)#Element names.
EEng 20:03, 24 August 2018 (UTC)
Fine by me. Voice of Clam (formerly Optimist on the run) (talk) 20:29, 24 August 2018 (UTC)
The problem with writing for the MoS is the prevalence of nit picking and wiki-lawyering. "Chemistry-related topics" isn't clear enough, I think. Does it mean that the whole article has to be chemistry related? Does it apply to sections on biochemistry in medical articles? Or in articles about organisms? Why include this qualification? Peter coxhead (talk) 20:49, 24 August 2018 (UTC)
(Also fine by me). As far as what counts as chemistry-related, that was already just as open to interpretation from the naming conventions guideline, so this at least doesn't introduce anything new in that regard. –Deacon Vorbis (carbon • videos) 21:54, 24 August 2018 (UTC)
That was my thinking exactly. EEng 02:06, 25 August 2018 (UTC)

In the context of automobile articles, we would be pushing it uphill to get Americans to talk about aluminum wheels. Very few of the editors on automobile articles would know anything about chemistry or know about the WP policies for chemistry. Indeed, I only found out about WP:ALUM today and I have been contributing for over 10 years, have been working in various engineering related fields for 30 years and remember most of my high school chemistry.  Stepho  talk  03:58, 25 August 2018 (UTC)

we would be pushing it uphill to get Americans to talk about aluminum wheels – say what? EEng 04:37, 25 August 2018 (UTC)
Apologies, I meant to say 'we would be pushing it uphill to get Americans to talk about aluminium wheels'. My point still stands that editors with little knowledge of chemistry are not going to know about WP:ALUM. Having WP:ALUM trump WP:ENGVAR will lead to endless edit wars.  Stepho  talk  22:19, 25 August 2018 (UTC)
Please stop dragging Trump into everything. I agree that the vagueness of chemistry-related will almost certainly lead to trouble sooner or later, but I'll leave that to others. Probably that means we'll just kick the can down the road. EEng 22:34, 25 August 2018 (UTC)
Suggestion: What if we allowed local spelling in articles with strong regional ties? For example, it would feel off to me to use "aluminium" in this sentence from Alcoa, Tennessee: "As its name suggests, Alcoa is the site of a large aluminum smelting plant owned and operated by the Alcoa corporation." Note that "aluminum smelting" in the sentence is wikilinked to aluminium smelting, which feels fine to me. In other words, MOS:TIES > WP:ALUM, but WP:ALUM > MOS:RETAIN. Tdslk (talk) 00:24, 26 August 2018 (UTC)
That's exactly what I would like to see. Well, I'd actually like to see everybody use Australian English but, of course, that won't happen. WP:TIES for most articles, WP:ALUM for the articles directly related to chemistry (eg mining, refining, smelting, moulding, reactions), WP:RETAIN for most existing articles (to avoid edit wars) and WP:ALUM for new articles (although I can live with it not being policed well for new entries).  Stepho  talk  08:51, 26 August 2018 (UTC)

What does IUPAC mean when it says "The alternative spelling 'aluminum' is commonly used." and "The alternative spelling 'cesium' is commonly used." in its Recommendations (Table I of the Red Book)? DrKay (talk) 07:41, 25 August 2018 (UTC)

It's bureaucratese for "don't be surprised if you see these spellings and don't bother trying to have them changed".--Khajidha (talk) 15:30, 25 August 2018 (UTC)

See also sections in articles -- the Health and appearance of Michael Jackson article

Opinions are needed at Talk:Health and appearance of Michael Jackson#Structure. The latest discussion in that section concerns whether or not articles should have see also sections and whether what MOS:MED states at Wikipedia:Manual of Style/Medicine-related articles#Standard appendices should apply to what to do with this particular article's See also section. A permalink for the discussion is here. Flyer22 Reborn (talk) 23:43, 29 August 2018 (UTC)

Input-needing RM re en-dashes/hyphens in –30– (The Wire)

In the hyphen–dash wars, no one is neutral.

A requested move of –30– (The Wire) to -30- (The Wire) was just relisted due to lack of input. Regulars of this page are the only people I can think of who are knowledgeable of and interested in en-dashes vis-a-vis hyphens, so I notify you of the RM. (I assume my notice is neutral because I am neutral and don't care which line is used here.) -sche (talk) 16:02, 30 August 2018 (UTC)

Bold in section headings

Do we have a rule about using bold, capital letters, exclamation points, color, font size, etc., to give one section on a talk page more prominence? Because whatever I want to say is obviously far more important than anything anyone else has to say...  (:   --Guy Macon (talk) 20:20, 23 August 2018 (UTC)

MOS:PSEUDOHEAD has "try to avoid using bold markup", but it's hard to see what reasonable justification someone could give for using bold in section headings. --tronvillain (talk) 20:51, 23 August 2018 (UTC)

  I PERSONALLY FIND IT TO BE REALLY ANNOYING!!!!!  

...but is it worth adding a specific rule forbidding it? I hear little birds, and they are chirping WP:CREEP, WP:CREEP. On the other hand, look at the current table of contents at Wikipedia:Village pump (technical)... --Guy Macon (talk) 23:34, 23 August 2018 (UTC)

Possibly for bold, color and font size. Capital letters and punctuation would need exceptions for initialisms, acronyms, tradenames etc; still, in principle it's an improvement. All-caps headings are annoying in drafts. Have there been a few live examples recently? I've only ever seen this mistake made by new editors, quietly fixed and not questioned, so I don't know if it's worth the effort of getting a policy change agreed. On the other hand, it's sad that policy is seen so often as untouchable, so it'd be nice to see a concrete improvement like this. › Mortee talk 00:36, 24 August 2018 (UTC)
Wikipedia:Talk page guidelines#Technical and format standards: "Avoid excessive use of color and other font gimmicks." Wikipedia:Talk page guidelines#Editing others' comments: "Because threads are shared by multiple editors (regardless how many have posted so far), no one, including the original poster, "owns" a talk page discussion or its heading. It is generally acceptable to change headings when a better heading is appropriate, e.g., one more descriptive of the content of the discussion or the issue discussed, less one-sided, more appropriate for accessibility reasons, etc." DrKay (talk) 07:25, 24 August 2018 (UTC)
OK, so there is already a guideline against this kind of thing, or sufficient to remove/change it (and hence less/no need to add something to the MOS), that's good. I agree that styling using font color/size/weight gimmicks in section titles is inappropriate. Thanks, Mandruss, for changing the mentioned example. -sche (talk) 16:09, 30 August 2018 (UTC)
Fixed. [14]Mandruss  23:06, 24 August 2018 (UTC)

Julian or Gregorian dates for medieval England births and deaths in year articles?

WP:JG doesn't seem to have a strong opinion one way or the other, if I'm reading it right. I just added a Japanese figure whose death date I converted to the Gregorian calendar using an online tool. I don't actually know whether all the other dates included in the list are Julian or Gregorian. Is there a rule here? It seems like lists like that should be internally consistent. The only solid example I could think of off the top of my head, where English Wikipedia would definitely list two specific people in different countries who died on the same date, according to different calendars so that they actually died several days apart, was the famous Cervantes/Shakespeare mess here, which explicitly notes that the Shakespeare date is OS. But are all dates for 11th-century Europe assumed to be Julian? Or what? Should I change the Japanese date to Julian? Hijiri 88 (やや) 11:59, 1 September 2018 (UTC)

For 11th century the applicable advice would be:
  • Dates before 15 October 1582 (when the Gregorian calendar was first adopted in some places) are normally given in the Julian calendar. The Julian day and month should not be converted to the Gregorian calendar, but the start of the Julian year should be assumed to be 1 January...
For England after 15 October 1582 but before 1752, the applicable advice would be:
The dating method used should follow that used by reliable secondary sources...
The Dictionary of National Biography and American National Biography (many people important in Revolutionary America were born in the UK) use Julian dates for UK and American events when that calendar was in effect in the relevant area. Therefore I suggest following these sources and using Julian when and where that calendar was in effect, but changing the beginning of year to 1 January where applicable. The article should have a footnote on the first Julian date indicating this approach.
In lists that include events in the transition period, and the events occur in several countries, it would be best to indicate the calendar used for each entry. Jc3s5h (talk) 12:39, 1 September 2018 (UTC)
See also {{OldStyleDate}}, {{OldStyleDateDY}} and {{OldStyleDateNY}}. There don't seem to be the equivalent NewStyle templates, and in any case you need to do the conversions manually. For instance the Battle of Reading (1688) occurred on 19 December [O.S. 9 December] 1688. ({{OldStyleDateNY|19 December|9 December}} 1688). HTH, Martin of Sheffield (talk) 13:14, 1 September 2018 (UTC)
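Since the conversion has to be done by hand, a small worked example may help. The sketch below converts a Julian calendar date to the Gregorian calendar by way of the Julian Day Number, using the standard integer-arithmetic formulas. The function names are illustrative, and it assumes the Julian year is already counted from 1 January (it does not handle Old Style years that begin on 25 March).

```python
def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    """(year, month, day) in the Gregorian calendar for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461         # years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153          # March = 0 ... February = 11
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Convert a Julian calendar date to the equivalent Gregorian date."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


if __name__ == "__main__":
    # Battle of Reading: 9 December 1688 (O.S.)
    print(julian_to_gregorian(1688, 12, 9))   # (1688, 12, 19)
```

The result for the Battle of Reading matches the 19 December [O.S. 9 December] 1688 example given above. Per the advice quoted earlier in this thread, dates before 15 October 1582 would normally be left in the Julian calendar rather than converted.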

Exact dates or not

In cases like this, does MOS have any guidance on what is "better"? Gråbergs Gråa Sång (talk) 08:31, 3 September 2018 (UTC)

The parenthesis says "two days before investigation findings were made public", so yes, you should not be over-vague. --Redrose64 🌹 (talk) 08:52, 3 September 2018 (UTC)

City, state, and zip code

Many addresses, for places large and small, have the city, state, and zip code based on the post office delivering mail. It is not so unusual that this disagrees with the actual city boundary. Sometimes this matters, such as indications in articles about specific cities. Gah4 (talk) 06:42, 4 September 2018 (UTC)

What does this have to do with MOS? EEng 12:41, 4 September 2018 (UTC)