Template talk:Municipalities of Slovenia

Sorting diacritics

This template has been re-sorted again according to some mysterious and probably completely irrelevant argument about "this being an English Wikipedia". The English language doesn't use the letters č, š and ž, so English has no rules for sorting them. In that case they should be sorted according to the only rules that do exist: those of the language they come from. You cannot just treat them as letters without diacritics, because they are distinct letters, pronounced very differently from their "plain" counterparts (i.e. the diacritic doesn't just signify a detail like inflection; you pronounce "č" as [ch]). At least that's my reasoning. If rules do exist, please provide a reference (this of course refers to User:Gene Nygaard as the only person who knows them, apparently); otherwise undo your recent "fix". Thank you, — Yerpo Eh? 08:28, 17 February 2010 (UTC)

Yes, there are rules for sorting them in English grammar. They are sorted as c, s, and z. See the sorting rules in Wikipedia:Categorization, for example. Then look at Category:Living people, with nearly a half-million entries. I'm the only one who knows these rules? Get real.
English sorting rules don't have anything sorted after Z. Nobody is going to look for anything there. That includes the sorting of the second letter, or whatever. English sorting rules don't have anything sorted in between C and D. English sorting rules don't have anything sorted in between S and T.
Furthermore, it would be totally ridiculous to have to know the sorting rules of the language in which a letter is used in order to be able to find something here. For one thing, the same letter isn't sorted the same way in every language. In Norwegian, Å is sorted after Ö (which is sorted with Ø, not with O), but in Swedish, Å is sorted before Ö. In German, Ö is sorted with O, but in Norwegian and Swedish, it comes somewhere after Z.
So we'd not only have to know a few hundred different sorting rules for each language, but also we'd have to know which of the thousands of different letters used on Wikipedia are included in each language. Then, when they are used, we'd have to know which of the various languages in which each could be used is being used at that time, and we'd also have to know which language a particular word derives from, in order to be able to find it. That would be a ludicrous situation.
We don't do that. We sort in accordance with the English alphabet. If these letters can't be used with the English alphabet, transliterate them into English before we sort them here on the English Wikipedia. As it stands now, there are probably several of these in this template which are mis-named according to our naming conventions, and shouldn't have those letters in the article names in the first place. Rather they should be like Munich and Germany and Copenhagen.
So if you want to edit-war about it, okay. Otherwise, leave it fixed. Gene Nygaard (talk) 11:25, 17 February 2010 (UTC)
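The point about the same letter sorting differently in each language can be illustrated with a small Python sketch. The alphabet strings, the equivalence tables, and the helper `make_key` are hand-written simplifications assumed for this example only (in particular, mapping å to a for German is an arbitrary choice); they are not taken from any official collation data.

```python
import unicodedata

def make_key(alphabet, equivalents=None):
    """Build a sort key for one language (hypothetical helper).

    alphabet    -- the language's letters in its own alphabetical order
    equivalents -- letters that are filed as another letter in that language
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}
    equivalents = equivalents or {}

    def key(word):
        # Use precomposed letters so that "Ö" is one character, then lowercase.
        lowered = unicodedata.normalize("NFC", word).lower()
        # Letters outside the alphabet are ranked after every known letter.
        return [rank.get(equivalents.get(ch, ch), len(alphabet)) for ch in lowered]

    return key

# Simplified, assumed alphabets for illustration.
swedish = make_key("abcdefghijklmnopqrstuvwxyzåäö")
norwegian = make_key("abcdefghijklmnopqrstuvwxyzæøå", {"ö": "ø", "ä": "æ"})
german = make_key("abcdefghijklmnopqrstuvwxyz",
                  {"ä": "a", "ö": "o", "ü": "u", "å": "a"})

places = ["Örebro", "Århus", "Oslo", "Zagreb"]

print(sorted(places, key=swedish))    # ['Oslo', 'Zagreb', 'Århus', 'Örebro']
print(sorted(places, key=norwegian))  # ['Oslo', 'Zagreb', 'Örebro', 'Århus']
print(sorted(places, key=german))     # ['Århus', 'Örebro', 'Oslo', 'Zagreb']
```

The same four names come out in three different orders, which is the lookup problem described above: a reader would need to know which table applies before knowing where to look.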
AFAIK, category sorting comes from a feature of the software (i.e. it follows Unicode code point order), which automatically sorts all letters with diacritics and other "funny" characters at the very end. So they are sorted as "flat" letters to avoid lumping everything together at the end of the category, which would be even more wrong. The argument "English sorting rules don't have anything sorted after Z." is completely irrelevant, too, as the English alphabet doesn't have anything else similar to Z anyway.
Which brings us to transliteration and naming conventions, where your "argument" is flatly false. There is no established English usage for the names of obscure Slovenian towns and villages, so Wikipedia:Article titles applies: "If there are too few English-language sources to constitute an established usage, follow the conventions of the language appropriate to the subject."
The only even remotely credible argument you gave, and the one for which I'm prepared to leave this as it is now (at least for the time being), is the confusion about different sorting rules in different alphabets. You could've just said that without resorting to false logic that almost made me question your good faith. — Yerpo Eh? 12:21, 17 February 2010 (UTC)
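The difference between sorting by Unicode code point and sorting on a "flat" key can be sketched in Python as follows. This is only an illustration of the idea and not the wiki software's actual code; the helper name `fold_diacritics` is made up for the example, and the list is a small sample of municipality names.

```python
import unicodedata

def fold_diacritics(text):
    """Return text with combining marks removed (č -> c, š -> s, ž -> z).

    Hypothetical helper: decompose each letter (NFD) and drop the marks.
    Letters with no decomposition, such as ø or đ, are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

names = ["Žalec", "Celje", "Črnomelj", "Sevnica",
         "Šentjur", "Zagorje ob Savi", "Domžale"]

# Plain code point order: Č, Š and Ž all land after Z.
by_code_point = sorted(names)

# "Flat" order: compare the folded form first, and fall back to the
# original spelling only to break ties.
by_folded_key = sorted(names, key=lambda n: (fold_diacritics(n).casefold(), n))

print(by_code_point)
# ['Celje', 'Domžale', 'Sevnica', 'Zagorje ob Savi', 'Črnomelj', 'Šentjur', 'Žalec']
print(by_folded_key)
# ['Celje', 'Črnomelj', 'Domžale', 'Šentjur', 'Sevnica', 'Zagorje ob Savi', 'Žalec']
```

Note that the folded order is still not Slovenian order: Slovenian rules would place Šentjur after Sevnica, because š follows every s.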
Now that you've figured out a little about this, you could go help out the poor people on the Slovenian Wikipedia, who don't know that š is sorted between s and t in Slovenian sorting rules, according to you. A primitive, rudimentary system which sorts by Unicode number doesn't fit anybody's sorting rules. That's why we fix them here, to sort in accordance with the 26 letters of the English alphabet. That brings up a couple more factors:
  1. We'd also have to have some readily available source to specify the sorting rules applicable to each of the hundreds of languages from which the names used on Wikipedia come. We don't have that.
  2. There isn't necessarily one set of sorting rules applicable for a particular language. Different publications might have different rules. This is especially evident in things such as "Mc" and "Mac" names in English, and "St." abbreviations for "Saint".
Not all of the possible ways that some things can be sorted have been hashed out on Wikipedia, but the basic framework is there. A simpler way to get true case-insensitive sorting in almost all categories would be a big help. But we're still stuck with a primitive sort engine, and likely will be for a long time. Gene Nygaard (talk) 14:05, 17 February 2010 (UTC)
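As a rough Python illustration of the last two points (case-insensitive sorting, and publications that disagree about "Mc" names), here is a minimal sketch. The sample names and the helper `mc_as_mac_key` are invented for the example, and filing "Mc" as "Mac" is only one of the conventions mentioned above, not a rule adopted anywhere in this discussion.

```python
names = ["McAdam", "Mabry", "MacArthur", "Mason", "de Gaulle", "Zola"]

# Code point order: every capital letter sorts before every lowercase one,
# so "de Gaulle" ends up after "Zola".
print(sorted(names))
# ['Mabry', 'MacArthur', 'Mason', 'McAdam', 'Zola', 'de Gaulle']

# Case-insensitive order: compare case-folded copies of the names.
print(sorted(names, key=str.casefold))
# ['de Gaulle', 'Mabry', 'MacArthur', 'Mason', 'McAdam', 'Zola']

def mc_as_mac_key(name):
    """Sort key that files names starting with "Mc" as if spelled "Mac"."""
    folded = name.casefold()
    if folded.startswith("mc"):
        folded = "mac" + folded[2:]
    return folded

# One possible house style: "McAdam" is filed as "Macadam".
print(sorted(names, key=mc_as_mac_key))
# ['de Gaulle', 'Mabry', 'McAdam', 'MacArthur', 'Mason', 'Zola']
```

Both keyed orders are defensible, which is why a single agreed sort key per entry is simpler than trying to encode every publication's rules.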