
Wiktionary:Beer parlour

Definition from Wiktionary, a free dictionary


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to the relevant policy page, or a brand new one may be created. See Category:Policies - Wiktionary Top Level for identified policy pages. Some of these may be inactive. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!






I was wondering what people's thoughts would be about replacing all etymon templates (e.g. {{L.}}, {{AGr.}}) with {{etyl}}. The advantage would be that we would have a consistent, fairly intuitive format for etymology templates (just one set of codes to memorize). Additionally, this would also allow us to make widespread changes in format, or allow users to customize their experience (perhaps we could allow users to link to SIL's site, instead of the 'pedia, or no links at all). Finally, and most importantly in my opinion, this will allow us to sync up our allowed L2 headers with etymon languages. The disadvantage would be that we'd be putting a lot of eggs in a single basket. {{etyl}} is under sysop-only protection, which is about as safe as we can make it, but if I turn out to be another Wonderfool, a couple of edits to the template after such a switch is made could severely backlog the server. A couple of caveats: First, {{etyl}} is not yet capable of handling dialects, although this is something I would like to change in the near future (see User_talk:Robert_Ullmann#Standardizing_dialects for a glimpse of my lack of progress on the issue; any bright ideas would be most welcome). Because of this, we would want to keep dialect-specific templates, such as {{LL.}}, {{VL.}}, etc. Also, {{etyl}} can only handle languages with ISO codes, which excludes reconstructed languages, such as Proto-Indo-European, as well as macro-languages, such as Germanic. Proto-langs are no problem, as {{proto}} should be used for them anyway. Btw, I've gone through a bunch of the old-fashioned proto-lang templates, such as {{PIE.}}, switched them to {{proto}}, and nominated them for deletion here, if anyone would like to comment. As for macro-languages, I figure we can leave those for the time being, at least until a solution is decided. So, if anyone agrees with this, but has some issues with the current implementation of {{etyl}}, they should be noted now.
Also, if any of our bot owners would be willing to take up this task, should the community response be positive, that would be appreciated. -Atelaes λάλει ἐμοί 02:13, 11 August 2008 (UTC)

That's what I've been trying to do with the etymologies I've added or edited, at least if I knew enough to look for the code. (Who knew that Anglo-Norman had a code?) I steer clear of all reconstructed languages. Do all of the Germanic languages used in Webster 1913 etymologies have ISO 639 codes or are they like the various Latins? DCDuring TALK 02:24, 11 August 2008 (UTC)
I have not come upon any Germanic languages which do not have ISO codes. Could you give an example of one you could not find? -Atelaes λάλει ἐμοί 02:30, 11 August 2008 (UTC)
It would be Middle and Old (Dutch, Frisian, Saxon) that I'd be concerned with because Webster 1913 uses them. I guess the real scope of my concern is with Webster 1913, so that we don't lose that information needlessly and so that we don't waste time searching for codes that don't exist. DCDuring TALK 03:09, 11 August 2008 (UTC)
{{dum}}, {{odt}}, {{ofs}}, {{gml}} (Middle Low German is another name for Middle Saxon), {{osx}}. Couldn't find Middle Frisian. And, if we do this, step 1 would certainly be to switch those templates which currently have an ISO counterpart, and worry about the rest later. -Atelaes λάλει ἐμοί 18:21, 11 August 2008 (UTC)
There isn't a code for Middle Frisian because almost nothing was ever written down in Middle Frisian. Old Frisian continues into the 16th century, then modern Frisian languages appear in the nineteenth. There is almost nothing available for the intervening period. --EncycloPetey 18:57, 13 August 2008 (UTC)
Sounds good; I use etyl whenever possible. The only problem I remember off-hand is Afrikaans, for which I couldn't locate a code to use in laager. For Canadian French, I simply created {{fr-ca}} and used {{etyl|fr-ca}} in Canadien. Michael Z. 2008-08-11 02:36 z
Afrikaans code = afr. Canadian French = fre. "afr" has worked for me. But there are many codes that don't work with etyl, possibly because they are language families (e.g. today pes (Persian) didn't work with etyl). With Afrikaans and South African English you can also find African languages that don't yet work. DCDuring TALK 03:09, 11 August 2008 (UTC)
I can't find a reference that supports Canadian French = fre. ISO 639-2 has French = fre/fra,[1] and ISO 639-3 doesn't have fre at all.[2]. It looks like the region code fr-ca is necessary. Michael Z. 2008-08-11 06:56 z
Shouldn't that be fr-CA (uppercase country code)? —RuakhTALK 13:29, 11 August 2008 (UTC)
Yup, fr-CA would be the recommended style and that's how I should have created it. (But, technically, it ought to be case insensitive.) Anyway, as Robert reminds us, it's probably not a good idea to multiply the number of dialects with codes by the number of possible regions. Michael Z. 2008-08-12 15:51 z
We have language code templates for the exact set of languages we use as L2 headers. Creating templates like this for an entirely open-ended set of regional dialects (potentially hundreds of thousands) would be an absolute disaster area. Canadian French is a context label {{Canada}} on definitions. In etymologies, we are always going to need descriptive qualifiers; it is not possible to just add code after code. etyl should be used if and only if the language is accepted as an L2 header and thus is coded. If you simply must use etyl for everything, forcing them into it, then etyl needs a qualifier parameter ({{etyl|fr|Canadian}} or some such syntax). (The point made above about sync'ing L2 languages with Ety languages is valid, as long as one keeps in mind that one is syncing L2 languages (with code templates) with a small subset of the huge number of dialects and variants in Etys.) Robert Ullmann 14:04, 12 August 2008 (UTC)
Is there a master list of L2 headers? Can I assume that we can use any valid ISO code, but not introduce regional variants without discussion?
(cutting into the middle of this comment) see WT:CFI, where the list of languages allowed uses ISO 639-(1,3) as the presumptive starting point, with some disallowed (Klingon), and other allowed with some discussion. Robert Ullmann 18:11, 17 August 2008 (UTC)
There ought to be some mechanism to handle this, for example to preserve the etymological information when my dictionary reference says that the source of a word is “Canadian French.” I will quote you at template talk:etyl#Regional language tags. Thanks. Michael Z. 2008-08-12 15:44 z
You can say "From Canadian French language", putting only the "French" in with a template, but noting Canadian as text. --EncycloPetey 18:59, 13 August 2008 (UTC)
I tried that first: From Canadian {{etyl|fr}}, in the entry Canadien. It works, but it is lacking in a couple of ways.
  1. The link text is awkward, particularly in this entry: “From Canadian French Canadien (“Canadian”)” Splitting up the single noun “Canadian French” into linked and unlinked text makes it a bit confusing.
  2. The raw text “Canadian” doesn't add any semantic information into the database that is Wiktionary. There is no category to consistently find French Canadian derivations.
 Michael Z. 2008-08-13 22:55 z
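
To make the qualifier idea concrete, here is a minimal Python sketch of what a call like {{etyl|fr|Canadian}} might produce. It is only an illustration: the function name, the small language-name table, and the "... derivations" category naming are assumptions made for this example, not the behaviour of the actual template.

```python
# Hypothetical name table; a real one would cover every coded L2 language.
LANGUAGE_NAMES = {"fr": "French", "la": "Latin"}


def render_etyl(code, qualifier=None):
    """Return (display_text, categories) for an etymology language tag.

    The category names are an assumption made for illustration only.
    """
    name = LANGUAGE_NAMES[code]
    categories = [f"{name} derivations"]
    if qualifier is None:
        return name, categories
    # Show the qualifier and the language as one name, and record an
    # extra category so that qualified derivations can be found later.
    categories.append(f"{qualifier} {name} derivations")
    return f"{qualifier} {name}", categories
```

With this shape, render_etyl("fr", "Canadian") yields the single name "Canadian French" plus a category for the qualified derivation, which would address both points in the list above without creating a separate code for every regional variant.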

I agree with this; I've been using {{etyl}} exclusively where possible in all the ety sections I add or edit. As far as I know, all the language codes that work with {{etyl}} are listed at Wiktionary:Etymology/Templates (WT:ETY/TEMP) and those languages, families, etc. that don't are listed at Wiktionary:Languages without ISO codes. Obviously feel free to update either of these. Thryduulf 10:56, 11 August 2008 (UTC)

I support standardizing on {{etyl}}; and if there are any etymology languages that don't have codes that we don't want to create pseudocodes for, I'd advocate using a name like {{etyl-fake-lang}} rather than {{FL.}} for its etymology template. As for admin vandals — well, that hasn't stopped us from using {{context}}, {{infl}}, {{en-noun}}, {{term}}, {{a}}, and so on. —RuakhTALK 13:29, 11 August 2008 (UTC)

I do also support this idea, since I reckon all forms of standardization would help on this Wiktionary. It's already hard enough for newbies to contribute, so I will support all ideas that can ease this. Also, I reckon this is a job a bot could help out with? If needed, I may run my bot to change templates. --Eivind (t) 18:11, 11 August 2008 (UTC)

Here is a list of etyl codes, although it's not complete, it does help finding them for languages like Old Irish or Low German. Nadando 00:22, 12 August 2008 (UTC)

There are cases, however, where we cannot (currently) use the {{etyl}} template, as there are a few old languages without codes, and there are words whose etymological origin is not known with enough specificity, such as words known to originate in a southern Slavic language or a Mayan language or a Tupí language, but where the specific language of origin is not known. For these situations, we have broader etymological categories and may have to use the older style templates. --EncycloPetey 19:02, 13 August 2008 (UTC)
Agreed (see my intro for this thread). But would you support changing all old-style etymon language templates which have etyl counterparts? That way we can see what's left and what needs work. -Atelaes λάλει ἐμοί 19:36, 13 August 2008 (UTC)
Seeing what wasn't available for use with {{etyl}} so we could work out how we want to deal with them was the reason I created Wiktionary:Languages without ISO codes - see its talk page. Thryduulf 21:35, 13 August 2008 (UTC)
Right, but lots of those common language families or macrolanguages like Slavic, Germanic, or even Indo-European (ine) do have their own code which can effectively be used in {{etyl}} and thus deprecate old-style templates. Wiktionary:Etymology/language templates already contains {{etyl}}-style alternatives for some of them. I would also support the idea of AutoFormat changing old templates to {{etyl}} and {{proto}} in etymologies. --Ivan Štambuk 19:21, 13 August 2008 (UTC)
We do run into one problem using those templates though, and this is that it becomes harder for the bots to know when a code represents a valid language and when it doesn't. We ought to tag the macrolanguage templates in some way for the benefit of Robert's bots, or else we'll end up having AF treating "Berber" and "Slavic" as valid language headers. --EncycloPetey 02:31, 16 August 2008 (UTC)
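
One way to picture the tagging suggested here is a code table that records whether each code names a single language or a family. The Python sketch below is hypothetical: the table, the "kind" field, and the function names are invented for illustration and do not describe how AutoFormat or the code templates actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeInfo:
    name: str
    kind: str  # "language" or "family" (illustrative tag, not a real template field)


# Small illustrative table; codes follow ISO 639.
CODES = {
    "fr": CodeInfo("French", "language"),
    "osx": CodeInfo("Old Saxon", "language"),
    "sla": CodeInfo("Slavic", "family"),
    "ber": CodeInfo("Berber", "family"),
    "ine": CodeInfo("Indo-European", "family"),
}


def usable_in_etymology(code):
    """Any known code may appear in an etymology section."""
    return code in CODES


def valid_l2_header(code):
    """Only codes tagged as single languages may become L2 headers."""
    info = CODES.get(code)
    return info is not None and info.kind == "language"
```

A bot consulting such a table could accept "sla" inside an etymology while still refusing to treat "Slavic" as a language header.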

Relevant to this discussion are three sections on WT:RFDO, WT:RFDO#Template:Haw., WT:RFDO#Template:Icel. and WT:RFDO#Manx.. These are the specific etymology templates for Hawaiian, Icelandic and Manx respectively and have all been orphaned and replaced with {{etyl}}. The Hawaiian template currently has a consensus to delete, the other two have so far not garnered any responses. Thryduulf 21:39, 13 August 2008 (UTC)

Well, I'm counting six supports (besides myself) and no opposes. Should I go ahead and put the request to AF, or should this be put to a vote? Anyone? -Atelaes λάλει ἐμοί 05:26, 21 August 2008 (UTC)

I don't see the need for a vote. I'm happy (and apparently nobody else has any massive objections) for either AF or another bot to do the work as described here and at WT:RFDO#Dotted etymology templates. Thryduulf 09:25, 29 August 2008 (UTC)
While this is an old discussion, I wanted to point people’s attention to Connel’s remark at: Wiktionary_talk:Etymology#New_template, namely that the old forms are useful for Webster 1913 import.
Presumably automated conversion would address this issue – any thoughts?
Nils von Barth (nbarth) (talk) 02:44, 28 September 2008 (UTC)

I'd like to know what the result of this was. The Webster-1913 style templates are retained as redirects (since there is no actual naming conflict...they all have a period in the actual template name.) --Connel MacKenzie 16:07, 13 November 2008 (UTC)

The result is that AutoFormat is working through converting all the {{F.}} style uses to {{etyl|fr}} ones. It isn't currently doing all of them, just those listed at User:AutoFormat/Ety temps. I believe the intention currently is to add all the others that translate to a single ISO code later.
As far as I am aware there has been no decision yet about what to do regarding templates that do not correspond to an ISO code (e.g. {{LL.}}). I can't remember what the state of play is regarding dotted templates that correspond to ISO codes for things other than a single language (e.g. {{Gael.}}).
I personally don't have a problem with retaining the dotted templates for ease of inputting/importing if others see a value in this. As it's been quite a while since any of these orphaned templates were deleted, I think it's fair that any more should be brought to WT:RFDO rather than be deleted under the previous consensus. I know Robert Ullmann has a report detailing the usage of these templates, but I can't find it at the moment. Thryduulf 20:35, 15 November 2008 (UTC)
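
As an illustration of the kind of conversion described in this thread, here is a minimal Python sketch. Only the {{F.}} to {{etyl|fr}} pairing is stated in the discussion; the other table entries pair template names mentioned in the thread with the ISO codes of the languages they name and are assumptions, as are the function and variable names. This is not AutoFormat's actual code.

```python
import re

# Hypothetical mapping from dotted templates to etyl codes.
# Only "F." -> "fr" is confirmed by the discussion; the rest are assumed.
DOTTED_TO_ETYL = {
    "F.": "fr",
    "L.": "la",
    "AGr.": "grc",
    "Haw.": "haw",
    "Icel.": "is",
}

# Matches a parameterless template whose name is letters ending in a period.
DOTTED_TEMPLATE = re.compile(r"\{\{([A-Za-z]+\.)\}\}")


def convert_dotted_templates(wikitext):
    """Replace known dotted templates with {{etyl|code}}; leave others alone."""

    def replace(match):
        code = DOTTED_TO_ETYL.get(match.group(1))
        if code is None:
            # No single ISO code (e.g. {{LL.}}), so keep the old template.
            return match.group(0)
        return "{{etyl|" + code + "}}"

    return DOTTED_TEMPLATE.sub(replace, wikitext)
```

Templates without an entry in the table, such as {{LL.}}, pass through unchanged, which matches the point that no decision has been made for them yet.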

As of when I'm writing this, the English Wiktionary is 300 words away from the size of the French one. Soon we'll be in the #1 spot! -Oreo Priest talk 17:46, 20 August 2008 (UTC)

But they'll be back from vacation soon. DCDuring TALK 18:25, 20 August 2008 (UTC)
We've been slowly gaining ground on them for a while now; we just happen to have made a sudden gain as a result of high activity here and reduced activity there. They were ahead of us once before, and we then surpassed them for a long time. "This has all happened before, and it will all happen again." --EncycloPetey 18:53, 20 August 2008 (UTC)

Update: we now trail by 72 entries. --EncycloPetey 23:03, 20 August 2008 (UTC)

Neck and neck now! Congrats, everyone. Conrad.Irwin 23:46, 20 August 2008 (UTC)
...and now we're nearly 500 ahead of them. --EncycloPetey 18:01, 21 August 2008 (UTC)
But according to the homepage of Wiktionary.org as of the present moment, they should have gained the upper hand with 3000 words ahead. When exactly did the French Wiktionary surpass anew the English one? (I presume between 21 August and 28 August, right?) Bogorm 07:48, 28 August 2008 (UTC)
It's out of date, see Special:Statistics and. The number we compare against is "pages qui sont probablement de véritables articles" and "pages that are probably legitimate content pages.". It's not really important at all, but some competitiveness is inherent in human nature, and it's better to get rid of it on an "external" foe than to bicker among ourselves. :) Conrad.Irwin 08:06, 28 August 2008 (UTC)

Now the French have taken the lead again by about 1000 (revised from 350). --EncycloPetey 15:35, 31 August 2008 (UTC)

Well, but now I see on wiktionary.org the French Wiktionary behind the English. And you thought that they would take up expanding after their holidays are over. Your prognosis has obviously not been corroborated by these developments... What is the reason? Bogorm 07:06, 6 September 2008 (UTC)
On wiktionary.org, they're definitely ahead now. Teh Rote 23:30, 18 September 2008 (UTC)
Yes! We're ahead again! Teh Rote 14:56, 6 October 2008 (UTC)
 ??? www.wiktionary.org shows 919 000 for English W. and 922 000 for French, how did you conclude, that they had been surpassed? On what evidence? Bogorm 16:07, 6 October 2008 (UTC)
Recent changes count. The one on Wiktionary.org isn't updated too often. Teh Rote 14:27, 14 October 2008 (UTC)


Hi. Can we come to a consensus about phrase entries with someone('s) somebody('s) one('s) in the title? I keep finding duplicates such as twist someone's arm and twist somebody's arm. Perhaps there is already a consensus? If so, can someone/body place it in the "News for editors" (what a really useful idea, btw!) so we all know. Cheers. -- ALGRIF talk 14:55, 2 October 2008 (UTC)

No matter what standard we might set, new editors and anons are liable to use one of the other possibilities. To discourage the creation of redundant entries like this, when I create a term (usually an idiom) containing this word, I use "someone" (because it is one letter shorter than "somebody") but also create a redirect with the "somebody" variant (and another redirect with the "one" variant if it makes sense for that term). When I find an existing duplicate "someone"/"somebody" entry I usually combine them into the "someone" entry and change the "somebody" to a redirect. -- WikiPedant 15:17, 2 October 2008 (UTC)
Good question... Two thoughts: 1. I would distinguish "someone/somebody" from "one." "Twist one's arm" would suggest to me sentences like "I twisted my arm" or "He twisted his arm." "Twist someone's arm" would imply sentences like "He twisted my arm," as is correct for this idiom... 2. I prefer "someone" to "somebody," but in any case we should have a redirect from one to the other. I wonder if we could get a list of these titles? -- Visviva 15:20, 2 October 2008 (UTC)
We would benefit from a consensus. The redirect idea is obviously a good one. It just involves a little work to standardize and police, much of which would be facilitated by a list and templates or executed by a bot.
I agree with Visviva's distinction between "one" and "someone" and with his preference for "someone". The reflexive restriction on many phrases cannot be conveyed otherwise. For example, the addition of "own" in "twist one's own arm" produces a phrase that does not convey the right meaning. I don't think that these points had been well discussed previously, however, so we need to see if they are widely accepted. If we come to agreement, it needs to be memorialized at WT:CFI#Idiomatic phrases or somewhere in WT:ELE. DCDuring TALK 15:49, 2 October 2008 (UTC)
Long ago, Paul G, Muke and I think Dvortygirl used the Wiktionary-community standard of "someone's" for all idiomatic entry headings. Other forms are supposed to be hard or soft redirects to the main entry. --Connel MacKenzie 16:04, 13 November 2008 (UTC)

Or are they lexicon categories and should be treated differently? I am proposing to rename:

__meco 16:09, 2 October 2008 (UTC)

No. They are properly named. They are not topics. The equivalents for other languages would be (e.g.) Category:French words with negative connotations, and so on. Robert Ullmann 17:05, 2 October 2008 (UTC)
There's a problem in that its only parent category is in the topical hierarchy, i.e. Category:Emotions. What to do? __meco 18:11, 2 October 2008 (UTC)
There are some cases where a topical category resides within a grammatical category or vice versa. This is especially true when it comes to the numbers, which are classified grammatically, but also in the topical Mathematics categories. Such situations are rare, and should be avoided when possible, but I don't think we can avoid them altogether. --EncycloPetey 18:23, 2 October 2008 (UTC)
There isn't any reason to "avoid" them, the category structure is hierarchical, but isn't a tree. A category doesn't have to be "in" one "parent" cat. No reason at all why these can't be in the cat for topic "Emotions" (which is useful; also note that that cat is "en:Emotions" except that we don't use the code prefix for English), and also be in the English language cat (directly or indirectly), where they "belong". Robert Ullmann 18:32, 2 October 2008 (UTC)
Shouldn't there be a category linking these to the lexicon hierarchy? __meco 18:52, 3 October 2008 (UTC)
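
The point that the category structure is hierarchical without being a tree can be sketched as a graph in which a category lists several parents. The Python example below is illustrative only: the category names are modelled on those mentioned in this thread, and the two root categories are invented for the example.

```python
# Each category maps to a list of parents, so it can sit in several
# hierarchies at once. Names are illustrative, not actual category titles.
PARENTS = {
    "English words with negative connotations": ["Emotions", "English language"],
    "Emotions": ["All topics"],
    "English language": ["All languages"],
}


def ancestors(category):
    """Collect every category reachable by following parent links upward."""
    seen = set()
    stack = [category]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Here the same category is reachable from both the topical side and the language side, which is what a link into the lexicon hierarchy would amount to.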

I think it is a bad idea not to use the Romanized version of for instance Greek words (or Chinese). I tried to change this for pornography but was promptly reverted. If this is policy we need to change it. It is most unhelpful to the majority of users to show preference for the original script and forcing those who would be interested in seeing (and understanding) what the origins of a word are to open more pages just to be able to see what that word is. __meco 16:52, 2 October 2008 (UTC)

The template accepts tr=, which displays the Romanisation in addition to the native script. The problem with your edit was that you were hiding the native script. See the entry for comoedia for an example where both the Greek script and romanisation are used. --EncycloPetey 16:59, 2 October 2008 (UTC)
Very good. I added tr= items to the term template at pornography. That solves the problem as far as I'm concerned (unless somebody decides to remove them again). __meco 17:30, 2 October 2008 (UTC)
"original script"? WTF does that mean? That the word used to be written in Greek and is now written in Latin? No, we write words (in etymologies and elsewhere), as they are written. If someone wants some derived attribute, they can follow the link. Seeing it in the script it is written in is "seeing what the word is". Robert Ullmann 17:02, 2 October 2008 (UTC)
As to what the fuck I meant by "original script", that is the script which is primarily associated with the language the word is classified as. For a Greek word, that would be Greek script. I do not agree with your blunt conclusion of "how we do things". The reason I bring this up is because I think we should pay attention to our raison d'etre and our customers and their needs. It is much less useful for someone who can not decipher a foreign script to "see what the word is" than to be able to read the transliteration and perhaps experience some recognition. As for relegating anyone who is interested in experiencing recognition thusly to follow the link, I addressed this aspect expressly in my first post. I also perceive your tone to be not overly conducive to congenial dialogue. __meco 17:22, 2 October 2008 (UTC)
I see no reason why we're having this argument. Our policy is already quite clear (and has been for some time). We should have both the native script and a romanization. No one is in disagreement here. -Atelaes λάλει ἐμοί 17:56, 2 October 2008 (UTC)

I have been accused by EncycloPetey of removing a link to Wikipedia when I replaced the latter by the former (which is simply not true because the box contains the same link surrounded by different text). It may be debatable if this was a good action but ultimately it is a matter of taste (we seem to agree on that) and I was working on hawthorn anyway. EncycloPetey reasons on my talk page further that it is community consensus that both should be used for linking - and obviously it is his opinion that this means in effect that it is a perfectly good idea that both templates are to be used in the same article and for the same language. I disagree. Community, can we agree on that it is not a good idea to use both templates in the same article within the same language section, and that I did therefore not do anything harmful? -- Gauss 18:57, 2 October 2008 (UTC)

Replacing standard text links with a courtesy box does not mean that the link wasn't removed. There are many, many pages where the text links are necessary and where the pediabox is still useful as a courtesy. Consider the entry for Afar, where the box link is to the disambiguation page on Wikipedia and the text links are to the individual articles. This is not an uncommon phenomenon. The box is a visual courtesy to alert users quickly that WP articles exist, but it cannot replace the links to the separate relevant articles, or we would have multiple boxes, which has in the past been rejected as visual clutter. --EncycloPetey 19:03, 2 October 2008 (UTC)
The argument with Afar would apply only if the section External links at hawthorn, which is the subject of this incident, contained a pedialite link other than to the same disambiguation page w:Hawthorn. That section could, and maybe should, contain a direct link to w:Crataegus but it didn't and doesn't. And to the other matter: A link is a link, whether it is in a box or not, and I did not change the link in any way! (In hindsight, I should have used the dab= parameter in {{wikipedia}}. My bad. I can admit mistakes.) Constructive opinions from a third party? -- Gauss 19:18, 2 October 2008 (UTC)
Some editors, including myself, prefer {{pedialite}} (or the equivalent {{projectlink|pedia}}); some editors prefer {{wikipedia}}; having both seems like a decent compromise. A lot of readers won't notice the {{wikipedia}} box, due to banner blindness. I'm not sure if the same is true of {{pedialite}}. —RuakhTALK 22:29, 2 October 2008 (UTC)

Hello everyone. I'm not sure if what's discussed at s:User talk:Giggy#Spanish - English Dictionary for Beginners is the sort of thing wanted here, but if it is (see also the page linked to in the header there), then please leave a note on my Wikisource talk page and I can copy across the content. Its author has agreed to release it under GFDL. Giggy 01:34, 3 October 2008 (UTC)

As readers of WT:GP know, I've been working on dumps to replace the AWOL WMF XML dumps. It is going quite well, you can find them at http://devtionary.info/w/dump/xmlu/ with a new dump added daily. Each one is up to the minute with edits and deletes when run, except that as of now, there are ~70K entries and edits from the last 4 months missing; it is adding 30-40K a day, so in 2-3 days will be current. In any case, these are a large improvement on 13 June.

There are two purposes for these dumps; one to make it possible to run our internal reports and lists again, the other to make our content available to other people re-using it in various ways. Sites like Ninjawords haven't been able to get new content from us since June 13th. (I suspect they have started doing dynamic mirroring, which isn't allowed, but under the circumstances, what were they to do?)

The dumps presently include namespace 0 and templates (namespace 10); this is what is needed for our internal stuff. For other users, they were using the primary WMF dump, which is all even-numbered namespaces except 2 (User). (FYI: the odd-numbered namespaces are the corresponding Talk pages.) So to make my dumps more useful, I should probably add some others back in. (At the moment it drops them, who needs a 4-month old copy of Beer Parlour? ;-)

If and when WMF gets their dumps running again (any day now, but it has been any day now since June ...) we will still be able to run daily spins, and not have to wait weeks for each. I was thinking of something like this:

namespace     ns    include   comments
(main)        0     yes       possibly excluding a few remaining oddities
User          2     no
Wiktionary    4     no        not content
Image         6     no        all of our images are from commons
Mediawiki     8     no        not content
Template      10    yes       possibly excluding "User ..."
Help          12    no        not content
Category      14    yes       possibly excluding "User ..."
Appendix      100   yes
Concordance   102   no        not our content, used for finding new entries
Index         104   yes
Rhymes        106   yes
Transwiki     108   no        will appear if moved to mainspace
Wikisaurus    110   no        not for now
WT            112   no        not content, internal shortcuts
Citations     114   yes
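
The proposal in the table can be sketched as a simple filter. The Python below is a minimal illustration, assuming pages arrive as (namespace number, title) pairs with the namespace prefix already stripped from the title; that input format and all function names are assumptions for this example, not a description of the actual dump code.

```python
# Namespaces marked "yes" in the proposal above.
INCLUDED_NAMESPACES = {0, 10, 14, 100, 104, 106, 114}

# Namespaces where the proposal says "possibly excluding 'User ...'".
USER_PREFIX_NAMESPACES = {10, 14}


def keep_page(namespace, title, skip_user_prefixed=True):
    """Decide whether a page belongs in the dump.

    Assumes `title` has no namespace prefix (hypothetical input format).
    """
    if namespace not in INCLUDED_NAMESPACES:
        return False
    if skip_user_prefixed and namespace in USER_PREFIX_NAMESPACES:
        return not title.startswith("User ")
    return True


def filter_pages(pages, skip_user_prefixed=True):
    """Return the (namespace, title) pairs that pass keep_page, in input order."""
    return [
        (namespace, title)
        for namespace, title in pages
        if keep_page(namespace, title, skip_user_prefixed)
    ]
```

Keeping the rule in one small table makes it easy to flip a namespace such as Wikisaurus (110) to "yes" later without touching the rest of the code.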

Technical discussion, gory details in WT:GP. Any thoughts? Do you use the dumps? Robert Ullmann 15:25, 4 October 2008 (UTC)

I do not run any processes that use the dumps directly. As I understand it, certain lists depend on the dumps to be current. Of those, the ones I use regularly are "Missing", "Not Counted", and the L3 header list. Most of the other lists I use are based on categorization, often by bots. Since the dumps started getting irregular (March '08?), I have not devoted any time to thinking up analyses and maintenance lists that seemed to depend on the dumps. What other lists are run from the dumps? I think that my use is solely dependent on NS:0.

The daily dumps are useful from my point of view only if those lists are updated from the daily dumps. DCDuring TALK 16:14, 4 October 2008 (UTC)

I use the dumps and will be sure to grab the "spin" in a couple of days once it's current. Thank you for all the work. Besides the basic pages XML, I also use the category links file. Any plans for that too? --Bequw¢τ 20:03, 4 October 2008 (UTC)

There are separate categories:

AFAICT, Slavonic refers either to a language family (Category:Slavic languages), or to the protolanguage (Proto-Slavic language).

Should there be two separate categories? (I don’t understand the distinction, if one is intended.)

The language template {{Sla.}} generates Slavonic links, and, suspiciously:

…so I suspect that {{Sla.}} should be deprecated and replaced with {{proto}} Slavic links – does this analysis seem correct?

Nils von Barth (nbarth) (talk) 02:13, 5 October 2008 (UTC)

You are mostly correct. {{Sla.}} needs to be analyzed on an entry by entry basis. Most of the entries which use it should be replaced with {{proto}}, as it is a reference to the proto-lang. However, in some cases, the template is used to refer to the language family, and not the proto-lang. We don't have a solid policy on how derivations from language families should work just yet, and so those should probably be left as they are right now. The same problem exists with {{Ger.}}. -Atelaes λάλει ἐμοί 03:16, 6 October 2008 (UTC)
The terms Slavic and Slavonic are synonymous. {{Sla.}} is used for borrowings from an unknown Slavic language (most prehistorical Slavicisms in Albanian, Baltic, Romanian and Hungarian are such, as it doesn't make much sense to speak of individual Slavic "languages" in the period of the 6th-12th centuries). Proto-Slavic should be used exclusively for Slavic words that descend from Proto-Slavic reconstructions. Sometimes the Proto-forms are added when Slavic words are added in the etymologies of other families' words, so you get e.g. an English word being "derived" from Proto-Slavic, but that practice is IMHO wrong and should not be encouraged (or at least practised with blank lang= in {{proto}}). --Ivan Štambuk 15:27, 6 October 2008 (UTC)

  1. Neskaya
  2. I am requesting a bot flag for Neskbot
  3. The bot is using the pywikipediabot framework. A link to the specific script for it can be found at User:Neskbot/uploadscript.
  4. The bot will be used to semi-automatedly add Hiligaynon entries, in batches of no more than 250 at a time. Each edit is manually reviewed before submission. This bot will also add Hiligaynon language sections to existing entries. The bot serves to perform the edits that I would otherwise be performing under my own account, and the bot flag will allow me to process significantly more edits at a time.

The vote for this can be found here.

Thank you. --Neskaya kanetsv 02:19, 5 October 2008 (UTC)

I'm confused, sorry. Are you studying Hiligaynon? Where are you finding these words? —RuakhTALK 02:27, 6 October 2008 (UTC)
Yes, I'm studying Hiligaynon. I began learning Hiligaynon from a classmate in early high school, and am continuing to learn from the caregiver who is taking care of my grandparents. I am getting the words from a copyright-free dictionary that said classmate's grandmother gave me (it had no publishing information, else it would be cited under references) that I used OCR software to produce a PDF of. If you have any other questions please let me know. --Neskaya kanetsv 19:54, 6 October 2008 (UTC)

See: Template talk:l#Links to words in head of phrase?

The template {{l}} (which generates language links w/o italicizing the entry) is v. useful for linking to individual words or subphrases in the head of a (idiomatic) phrase, as in {{infl|xx|phrase|head=...}}.

It’s useful for this as one does not want to italicize the words – is this an “approved use”?

I’ve started a discussion at Template talk:l#Links to words in head of phrase? & wanted to flag it here so people could weigh in.

E.g., current version of de gustibus non est disputandum has as head:

{{infl|la|phrase|head={{l|la|de}} {{l|la|gustibus}} {{l|la|non}} {{l|la|est}} {{l|la|disputandum}}}}

…which generates correctly formatted links

Does this seem ok? A good idea, even?

Nils von Barth (nbarth) (talk) 14:25, 5 October 2008 (UTC)

Two issues: (1) There is already a language parameter included in {{term}}, so if the template doesn't already link to the correct language section when "head=" is included, shouldn't there be a way to fix the existing template function to do that? (2) That's problematic for Latin, because the head form should include macrons, which your example above doesn't. I think for Latin, it would actually be simpler to use explicit wikilinks. For languages that do not have optional diacriticals (the way that Latin, Arabic, and Hebrew do), your suggestion might be feasible. However, it won't work universally, so it might be better to look for a universal solution. --EncycloPetey 19:36, 5 October 2008 (UTC)
You can use an additional unnamed parameter for {{l}} for forms with diacritics, just like {{term}}, that will be used for display and not for wikilinking, e.g. {{l|la|gustus|gustūs}}, which will link to gustus but display gustūs. --Ivan Štambuk 15:16, 6 October 2008 (UTC)
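To make the link-versus-display distinction concrete, here is a minimal Python sketch that assembles a head= value from a list of words. The helper names `l_template` and `build_head` are hypothetical and exist only for illustration; the output is plain wikitext in the {{l}} and {{infl}} syntax quoted in this thread, and nothing here reflects how the templates themselves are implemented.

```python
def l_template(lang, target, display=None):
    """Return wikitext for {{l}}; the optional third parameter is the display form."""
    parts = ["l", lang, target]
    if display is not None and display != target:
        parts.append(display)
    return "{{" + "|".join(parts) + "}}"


def build_head(lang, pos, words):
    """Build an {{infl}} line whose head= links every word of the phrase.

    `words` is a list of (target, display) pairs; display may be None.
    """
    head = " ".join(l_template(lang, target, display) for target, display in words)
    return "{{infl|" + lang + "|" + pos + "|head=" + head + "}}"


# The phrase discussed above, without macrons (no display forms supplied).
phrase = [("de", None), ("gustibus", None), ("non", None),
          ("est", None), ("disputandum", None)]
print(build_head("la", "phrase", phrase))

# Link to the macron-less page title while displaying the macron form.
print(l_template("la", "gustus", "gustūs"))
```

Supplying a display form for each word would address the macron concern raised in point (2) while keeping the links pointed at the macron-less page titles.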
Ivan, thanks for informing us of the optional parameter for {{l}}!
I’ve documented it there and I believe this addresses EncycloPetey’s concern (2).
EncycloPetey, I don’t follow your issue (1) – could you elaborate?
The issue I have is in linking the words in the head of a foreign language phrase:
{{infl}} doesn’t link words automatically, hence one must do so manually, and {{l}} seems the best way to do it; I don’t see the relevance of {{term}}.
Nils von Barth (nbarth) (talk) 23:14, 14 October 2008 (UTC)

At put the pedal to the metal I have removed a pronunciation section that consisted entirely of an {{rfp}} template. My reasoning for doing this is that the pronunciation of this multi-word idiom is completely predictable from the component words (which are already wikilinked). As such I don't feel it is worth the time required to add the pronunciation to these entries, especially as there are many possible permutations of stress pattern, slurring of word boundaries, etc, for each accent.

Does anyone object to this? Thryduulf 17:06, 6 October 2008 (UTC)

I object. The pronunciation is not entirely predictable for two reasons in this case: (1) The word the has two pronunciations. A non-native speaker will not know which pronunciation is to be used in this phrase. (2) The placement of stress in the overall idiom is not predictable from the component words. Multi-word idioms sometimes move stress to new syllables, or emphasize certain components over others. This cannot be predicted from the components. --EncycloPetey 17:14, 6 October 2008 (UTC)
(after edit conflict) The stress pattern of idioms like this is not fixed, depending on the speaker and the context; likewise the pronunciation of words such as "the" in idioms such as this is context (and stress) dependent and not crucial.
In the case of words with differing pronunciations depending on the part of speech or meaning, these are predictable once the meaning of the idiom is known. Thryduulf 17:19, 6 October 2008 (UTC)
O boy did you pick the wrong example (:-) Yes, the pronunciation is usually predictable from the words, but in this case rather spectacularly not! It is often pronounced with either "pedal" as "petal" or "metal" as "medal", matching the consonantal phoneme one way or the other. (and concur with EP, idioms sometimes have particular stress patterns etc) So sometimes a pronunciation section is warranted, either with IPA and so forth, or notes. Robert Ullmann 17:30, 6 October 2008 (UTC)
I can't recall hearing this idiom with the non-standard pronunciation of either "pedal" or "metal", which is why I described it the way I did. Thryduulf 17:35, 6 October 2008 (UTC)
I would not normally request a pronunciation for a multi-word expression. I think that Stephen made the point that this one is not pronounced exactly as one would expect, unless one happened to be a linguist and possibly a specialized one at that. I am not sure that I understand WP's point which may extend beyond this case.
In my experience in the US this is pronounced almost always as "pedal to the medal" (56 bgc hits for this spelling!) or "petal to the metal" (72 bgc hits for this spelling!) (possibly something in between) and hardly ever "pedal to the metal" (700 bgc hits). (For the curious, the truly nonsensical "petal to the medal" gets 5 bgc hits, predictable from the "error" rates for the individual words.) That the pronunciation should leave strong traces in edited works is pretty good evidence I would think. If a non-native speaker is in the advanced stages of simulating the speech of a native speaker, this would help. Phonetic lookup would be a help if we had it for many not familiar with this who first encounter it in speech. DCDuring TALK 17:39, 6 October 2008 (UTC)
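The remark that the count for the doubly altered spelling is "predictable" from the individual error rates can be checked with a little arithmetic. The sketch below assumes the two substitutions are independent and treats the four quoted hit counts as the whole sample; both are simplifying assumptions made for illustration, not claims about the data.

```python
# Google Books hit counts quoted in the comment above.
hits = {
    ("pedal", "metal"): 700,
    ("pedal", "medal"): 56,
    ("petal", "metal"): 72,
    ("petal", "medal"): 5,
}

total = sum(hits.values())

# Share of all hits that spell the first noun "petal" / the second noun "medal".
p_petal = sum(n for (first, _), n in hits.items() if first == "petal") / total
p_medal = sum(n for (_, second), n in hits.items() if second == "medal") / total

# Under independence, the expected number of hits with both substitutions.
expected_double = total * p_petal * p_medal
print(total, round(expected_double, 2))
```

Under those assumptions the expected count comes out between 5 and 6, which is in line with the 5 hits observed.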
I'm confused by your comment. In my dialect, "pedal" and "petal" are homophones, as are "metal" and "medal": all four use the alveolar flap IPA: [ɾ]. I thought this was true for almost all forms of U.S. English. (In hyper-enunciated speech people will distinguish them based on spelling, but that's not really relevant to this idiom.) So including a US pronunciation that's strictly SOP, plus three UK pronunciations that are strictly SOP except that two of them each have one phoneme switched, is really not going to help the reader. Are we O.K. with a free-form, usage-note-style pronunciation section, something like, "This idiom is normally stressed on the nouns pedal and metal. In dialects where pedal and metal do not ordinarily rhyme, speakers will sometimes modify one or the other in order to make them rhyme."? —RuakhTALK 17:54, 6 October 2008 (UTC)
I'm not in the habit of listening consciously to such things. Treating my own pronunciation as a reflection of what I've heard, I think that I make a small distinction, but it's hard to tell when I'm being conscious of it. MWOnline and Cambridge Dict. of Amer Eng. show different pronunciations for "pedal" and "petal". DCDuring TALK 18:04, 6 October 2008 (UTC)
I, too, hear a distinct difference in the two words. ...But I also pronounce the "c" in distinct. That's what you get for learning your vocabulary by reading. Amina (sack36) 00:32, 7 October 2008 (UTC)
Doesn't everyone pronounce the c in distinct?—msh210 19:58, 7 October 2008 (UTC)
Actually, no, not everyone does. At the very least, when I'm speaking fast the c slips out of it. --Neskaya kanetsv 00:12, 8 October 2008 (UTC)
Re: MWOnline and Cambridge Dict. of Amer Eng.: Well, I said almost all forms. :-P   Also, I should clarify that flapping is a phonological phenomenon, so I'm not sure whether "pedal" and "petal" are actually phonemically distinct. [Disclaimer: I am not a linguist, and the remainder of this comment should be taken with a large grain of salt.] In an illiterate society, I think unconditioned phonological merging probably implies phonemic merging. (Flapping is conditioned, but in ordinary speech, the condition is always met in pedal/petal and metal/medal, so it's "almost unconditioned" for the purposes of this discussion.) In our society, our literacy shapes our phonemic awareness. It's not perfect — as you point out, people sometimes write "petal" when they mean "pedal" and so on — but it has an effect. So even if flapping were universal to all forms of English, I think dictionary pronunciations might distinguish "petal" and "pedal" at the phonemic level. But to apply that to "put the pedal to the metal", we'd have to somehow determine that speakers are using the "wrong" phoneme. Your spelling comparisons are evidence of this, but I don't think they're terribly compelling. More compelling would be evidence from speakers who don't practice flapping (either because they never do, or because they're really enunciating). Thryduulf is presumably one such, since I don't think flapping occurs in the UK, and he says that he hasn't heard the "wrong" phone; Robert Ullmann might be another, since he now lives in another region where flapping might not occur (no clue), and he says that he has. (But if he's since grown unused to flapping, then maybe he would mis-interpret a flapped speaker's pronunciation as "put the pedal to the medal"?) We don't have a large sample of editors from every region of the Earth, and AFAIK we don't have independent references to use for verification of something like this. 
I think that whatever claims we make in our entry, they should be very cautious. —RuakhTALK 21:55, 7 October 2008 (UTC)
Then, all things considered, perhaps we have to concede that #Pronunciation of multi-word idioms is beyond our capabilities at this time or not worthwhile. It is not as if we already have all the single-word pronunciations covered. DCDuring TALK 23:17, 7 October 2008 (UTC)
Seems to me that if someone actually requested pronunciation be added by use of {{rfp}} then it must not have obvious pronunciation, and the pronunciation is worth adding.—msh210 19:57, 7 October 2008 (UTC)
But like many such requests it's us talking to each other. I placed the request, because Stephen's comment about "flapping" had made me notice or believe that there was something funny about it. If the funny pronunciation is only in some parts of the US, as it seems, and our pronunciations are sourced from the UK (mostly Thryduulf lately), and print sources don't usually cover multi-word pronunciations, the current result seems to be the outcome. DCDuring TALK 20:55, 7 October 2008 (UTC)
Oik! Does that mean no pronunciation guide for "Don't you know"? Where I come from it's "DOAN cha no" and you'd get sniggered at for the correct pronunciation. Perhaps "put the pedal to the metal" wasn't the best example? Amina (sack36) 10:03, 9 October 2008 (UTC)
Nobody would delete a pronunciation provided, I think. Nobody can force a volunteer to provide a pronunciation. The only question, I suppose, is whether we want to permit the removal of request for pronunciation tags to neaten up the entries and the request lists. DCDuring TALK 11:51, 9 October 2008 (UTC)

This reminds me of a phenomenon I have observed in my own use of lists (requests, missing, etc). At some point a list becomes "clogged" with items that I personally can't or won't fix. The motivating factor of the "empty inbox" is vitiated. I'm thinking that in some cases it may be better for me to copy such a list to my own user space and delete items as they are resolved or determined to be beyond my capability or inclination. There might be a technical solution to eliminate the need for the personal userspace solution, but only worth requesting if the underlying problem is shared by many. DCDuring TALK 12:14, 9 October 2008 (UTC)

I've also had this problem (the remnants-I-can't-fix problem). I don't have a solution to offer, but would be willing to help implement a solution you came up with. —RuakhTALK 16:36, 11 October 2008 (UTC)

What is this? is this a typical application of translations these days? Just so I'm aware, you understand... I'm having a very hard time keeping up with all the template-policy and -practice initiatives. - Amgine/talk 17:16, 6 October 2008 (UTC)

It appears to be an experiment for transcluding translations when there is more than one spelling of a word. In this case, there is a hyphenated and non-hyphenated form. --EncycloPetey 17:18, 6 October 2008 (UTC)
Yes, I understood that... wouldn't "alternative spelling of..." be more suitable, less likely to create a bazillion new templates? How does this define which sense is being translated? etc. It seems likely to result in complexification without real benefit, but that's merely my personal opinion. - Amgine/talk 19:15, 6 October 2008 (UTC)
I agree. I was merely noting that it seemed to be an old experiment, since you asked what this was. --EncycloPetey 19:22, 6 October 2008 (UTC)

I've started a vote on substituting {{es-verb}} in place of the existing {{es-verb-ar}}, {{es-verb-er}}, and {{es-verb-ir}}, so that we use a single template consistently for all Spanish verb lemmata.

A full description of the template's function with examples appears on the template's talk page. Discussion is located at Wiktionary talk:About Spanish#Template:es-verb. --EncycloPetey 00:11, 7 October 2008 (UTC)

Recently, while looking through several articles, I observed in the translation section Manchu translations in Latin script (exempli gratia: here or here). There is even an entry about a Manchu noun, again in Latin script - aniya. I was urged insistently to input entries in the native script (at least as regards Gothic) and I would like to ask: are there objections against creating a [[Category:Articles which need Manchu script]], so that knowledgeable editors would be able to input the script where necessary and to widen the Manchu entries here (there is still no Manchu Wiktionary). The script in question is on the right.
I had thought until recently that digitalisation was not yet possible, until I beheld at this Wikipedia entry the name of the language in digitalised form, but just as with Gothic until recently, I am unable to see anything but question marks (ᠮᠠᠨᠵᡠ ᡤᡳᠰᡠᠨ). If anyone sees something meaningful, then this proposal about delivering Manchu words in the proper script without inputting images should be accepted and the expression would be bound to become the first Manchu word in the Manchu language here. If so, I am ineffably eager to know whether the digitalisation has preserved the original top-bottom writing or whether it is an adjustment to the horizontal writing prevailing in the digital world. Furthermore: has anyone any idea about whether the Manchu script is expected to be included in Unicode just like Gothic? Bogorm 13:33, 8 October 2008 (UTC)

FYI, the characters are supported in Unicode 3.0, from 1999. They are supported by five fonts which come with the Mac, and the above text specimen is readable on my Mac. The script renders horizontally left-to-right in both Safari and Firefox on the Mac. It is also supported by the free w:Code2000 font, but it appears to me that the characters are rendered sideways in that font (e.g., so that the rows look correct if you turn your computer screen sideways). Michael Z. 2008-10-09 15:26 z

There is some information about the script, but despite the explanation of the script here, in the next lesson they proceed with the Latin script... If there is a digitalisation, one should apprise the contributors on Wikibooks about it so that it is written properly, right? Bogorm 13:44, 8 October 2008 (UTC)

The script is Unicode range U+1800 to U+18AF, script code is Mong. Characters specific to Manchu are included in this range. Mostly what you need is a font.

The coding just puts the characters in order; it is up to the rendering (e.g. browser) to display them vertically if desired.

So the answer is, yes, already there, everything coded. Find a font to download from somewhere ;-) Robert Ullmann 13:52, 8 October 2008 (UTC)
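As a quick illustration of the range given above, the following Python sketch tests whether a string consists only of code points from U+1800 to U+18AF. The function name `in_mongolian_block` is hypothetical, and the sample is just a handful of code points from that block written as escapes so that it does not depend on an installed font.

```python
# U+1800..U+18AF inclusive, the range given above.
MONGOLIAN_BLOCK = range(0x1800, 0x18B0)


def in_mongolian_block(text):
    """True if every non-space character of `text` lies in U+1800..U+18AF."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(ord(ch) in MONGOLIAN_BLOCK for ch in chars)


# A few code points from the block, written as escapes.
sample = "\u182e\u1820\u1828\u1835\u1860 \u1864\u1873\u1830\u1860\u1828"
print(in_mongolian_block(sample))   # characters from the block
print(in_mongolian_block("aniya"))  # Latin transliteration
```

A check like this only says how the text is encoded; whether it is displayed vertically is, as noted, up to the font and the renderer.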

Mongolian sounds pretty strange, since Mongolian is written in Cyrillic script and Mongolian is not a Tungusic language like Manchu. Following your elucidation, I conclude that a category [[Category:Articles which need Manchu script]] similar to the categories for Gothic, Cyrillic and so forth is exigent and I am going to create it and put aniya there. Are there any objections? Bogorm 14:11, 8 October 2008 (UTC)
Mongolian is now usually written in Cyrillic. But it was written in a Uyghur script as is Manchu. (Why the script block is named "Mongolian" rather than "Uyghur" I don't know; probably just the more familiar name.) See w:Mongolian script for a larger explanation. Robert Ullmann 14:23, 8 October 2008 (UTC)
Well, but Mongolian and Turkic languages have nothing to do with Tunguso-Manchu languages, which are a completely separate family. The Manchu script descends from the Jurchen script, which according to Wikipedia descends from Khitani script. And Khitan people are by far different from Mongolians... Some speculators believe that the Tunguso-Manchu group of languages and the Japanese language were allegedly Altaic languages, but this goes over the top (not according to me, but to the huge contesting group of venerable linguists) and I find it highly dubitable, they three are just in a neighbourhood area. So - Mongolians and Turkic people are similar, Japanese and Tunguso-Manchu - something entirely different. Bogorm 14:36, 8 October 2008 (UTC)

If the script can be written as in this template about the language on German Wiki, then we should also adhere to the proper script, should not we? Is there something visible for anyone? Bogorm 13:57, 8 October 2008 (UTC)

I can see the characters, but as the template itself notes, the lines are all backwards. This is a Right-to-Left language script, but the lines are displayed as Left-to-Right because the reversal that would display them correctly is not yet supported. --EncycloPetey 17:53, 8 October 2008 (UTC)
In any case, I absolutely fully support tagging such entries with a script request. Many of the entries currently tagged for various scripts will be waiting for some time, as we have no one who can handle the script, and some are tagged with a script which is not even Unicode supported yet. However, I see no problem as this creates a convenient worklist for when we have the skills/technology at our disposal. Please feel quite free to use {{rfscript|Mongolian}} on anything you see which should have Manchu script but doesn't. In general, this should be true for any and all situations where a native script is not present. -Atelaes λάλει ἐμοί 19:01, 8 October 2008 (UTC)
EP: the script isn't RTL: it is top-to-bottom vertical, with the lines then ordered left-to-right. (And yes, it/they had an RTL origin, then rotated to vertical; so if one was to "force" horizontal presentation, RTL makes more sense. But it should be vertical, and hence is not in the RTL part of Unicode/UCS.) Robert Ullmann 14:47, 9 October 2008 (UTC)

At the moment a term can appear in Wiktionary after one occurrence in a "well-known work". The interpretation of this is very subjective, and I would like to propose that we remove this proviso from the CFI. With templates such as {{only in}}, there is no need to fear that removing an entry from the dictionary will remove it completely; so we could use our "Concordance" or "Appendix" namespaces to include terms from any published work - as indeed is already started, see Appendix:Harry Potter terms or Concordance:A Clockwork Orange. The main thing to be decided, if there is agreement on this being a sensible course of action, is on what format these non-main-namespace pages should take; however I don't intend to wrangle too much about that until we've decided this would be a good thing to try. Conrad.Irwin 14:28, 9 October 2008 (UTC)

While "well-known" may be technically subjective, in practice it works extremely well: we have not had any dispute I can recall about whether a given work is "well-known". There are a small number that clearly qualify, a vast number that of course do not, and little grey area. This part of CFI is not broken, and I thus object to any attempt to "fix it".
The issue was raised at Talk:bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk in which a contributor referred to our policy as "ridiculous". This instant case in fact establishes the contrary: Finnegans Wake is unquestionably a "well-known work", not in any grey area, and thus reinforces the reasonableness and validity of our policy. Robert Ullmann 14:41, 9 October 2008 (UTC)
There is some disagreement over Harry Potter, I am not suggesting its removal purely because of subjectivity, but more because I feel it inappropriate to distinguish some authors from others. Conrad.Irwin 14:47, 9 October 2008 (UTC)
In my opinion, we should follow the appendix practice if:
  • A very large number of terms were coined and most have not fallen into common usage (like A Clockwork Orange)
  • or most of the terms do not have translations; they are not synonyms for existing concepts but rather names for nonexistent concepts
Otherwise, the well-known work addition practice works fine. Teh Rote 14:51, 9 October 2008 (UTC)
We would miss no valuable principal namespace entry if we were to eliminate the well-known work exception to our attestation criteria. Finnegans Wake always struck me as a well-known title, not a well-known work. Pynchon, Nabokov, Burgess, and Tolkien also have a penchant for coinage and for the resurrection of rare words that are usually not taken up more widely. The distinction we make favoring them over George Lucas, Gene Roddenberry, J. K. Rowling, George Herbert, and Neal Stephenson is a throwback to a more elitist kind of reference work than a wiki-based work at WMF should be, I think. DCDuring TALK 15:02, 9 October 2008 (UTC)
I will highly object to changing this aspect of CFI. We exist to define "all words in all languages". I think we leave too much out, personally (place names, etc.); this would exclude far too many terms. sewnmouthsecret 21:05, 9 October 2008 (UTC)
I like that clause and have no desire to eliminate it, but I would certainly hear out a proposal to clarify or otherwise improve it. —RuakhTALK 16:41, 11 October 2008 (UTC)
I agree entirely with Ruakh. The point about words such as the above is that they could possibly be used in another work, or article, at any time. And when that happens, Wikt should be there to help the reader understand it. However, OTOH, I think that "well-known work" should come with a start date. For instance, Shakespeare goes without saying. Edward Lear has given us runcible spoon which was once a nonce word. beamish similarly has come to have normal usage. But if we get too recent, then we are faced with all the Harry Potter garbage (IMHO) and similar, just because a lot of people know the work. If in, say, 30 years time Harry Potter is still well-known, then it would fall into the same category as Lord of the Rings and The Hobbit; books that are only just out of the grey area now. -- ALGRIF talk 10:05, 13 October 2008 (UTC)
In practice the exemption of words from certain favored well-known works allows us to venerate words from antiquarian or obscurantist literary English by including a modest number of words which were coined in select literary works, but that were not taken up widely. It serves no other purpose, AFAICT. It does not enable us to anticipate the popularity of words from such sources. "[[Beamish]]" and "[[runcible spoon]]" are examples of words that would not be excluded by the elimination of the exemption. Nor would many of the Jabberwocky nonces. The various dead nonce words and bits of eye-dialect in Finnegans Wake would be. An appendix for Finnegans Wake nonces and the use of "only in" tags to direct English majors, graduate students, and other antiquarians to that appendix should be a perfectly adequate solution. It has been deemed good enough for many categories of live words that are hard to cite but with specialized usage such as Military Slang and, yes, Harry Potter.
If I had pursued the well-known work exemption for [[Brer Rabbit]], would that have been permitted? DCDuring TALK 10:47, 13 October 2008 (UTC)
Are you then advocating no entry until an appendix is built? Or normal dictionary entry moved to appendix when (if) one is built? -- ALGRIF talk 11:50, 13 October 2008 (UTC)
"Only in" with a redlink to the appendix to be created would be one way. We could have the citation entry, perhaps integrated with the only in and with the appendix in some way (vague hand-waving). We could let our contributors suggest appendices and have some page that listed the appendices and the redlinks to "wanted" Appendices. We could have templates to facilitate the creation of such appendices. It is just a question of making sure that all second-class citizens are treated the same, until we come up with third- and fourth-class citizenship status. DCDuring TALK 15:04, 13 October 2008 (UTC)

I think the Wiktionary:News for editors page is a very good idea, but suspect that it is overkill for it to display at the top center of every page. Assuming that most WT visitors are, or soon will be, users looking things up rather than editors, wouldn't it be more appropriate to have this link display just on project pages and on the editing page? -- WikiPedant 20:36, 9 October 2008 (UTC)

Once it is on a user's watch list, it would be better if they didn't have to see it elsewhere at all. Why on editing? Perhaps it could be put on everyone's watch list by default, permitting each user to unwatch it at will. DCDuring TALK 21:09, 9 October 2008 (UTC)
Or just before the "log out" link? -- ALGRIF talk 13:07, 12 October 2008 (UTC)

Why is there SEO spam at the beginning of Wiktionary:Main Page (see the source)? Can any admin remove it, please? Wiktionary does not need this kind of nasty cheating; besides, it makes the page less usable for people with screen readers, text browsers, stylesheets turned off etc. because of seeing/hearing unnecessary garbage. Thanks.

Danny B. 19:25, 12 October 2008 (UTC)

I think the point of them originally was to make our <meta keywords=""> be useful (which is not cheating, it's actually good practice), but that seems to have been broken at some point by software updates. It'd be nice if we got https://bugzilla.wikimedia.org/show_bug.cgi?id=14882 but I see no reason not to remove the links. Conrad.Irwin 08:36, 13 October 2008 (UTC)

Out of habit, gained after I have seen some models, I have been putting a period at the end of captions of pictures, such as "Navy pea coat." instead of "Navy pea coat". But I have also seen captions at Wiktionary formatted without a period. Is there any shared convention on Wiktionary for putting or not putting periods there? Any recommendations for me? Thanks. --Dan Polansky 07:33, 13 October 2008 (UTC)

As far as I know we have no specific convention, as I have seen images both with and without a period for my entire time here. However, someone more familiar with policy may say differently than I do. --Neskaya kanetsv 17:47, 13 October 2008 (UTC)
I don't think there's a standard convention for it, but seeing a caption end with a period when it's a simple declaration drives me up the wall. :) EVula // talk // 18:54, 13 October 2008 (UTC)
Periods (full stops) should be used at the end of sentences (and nowhere else). SemperBlotto 18:58, 13 October 2008 (UTC)
It's not that simple. Periods are legitimately used in a number of places besides at the ends of sentences (e.g. after abbreviations or numbers in a list). In many dictionaries (e.g., the OED, the Random House, the 1928 and 1913 Webster's, American Heritage), there is indeed a period at the conclusion of each sense's definition, even if that defn is not a grammatically complete sentence.
The MLA Style Sheet says that periods should be used between the elements of references in academic writing, and gives lots of examples.
As for captions, my Chicago Manual of Style (14th ed.) says that periods are not used in legends (caption headings) or caption text consisting of partial sentences, unless the legend immediately precedes the caption text, and gives this e.g.:
Fig. 21. Augustus addressing his troops. This portrayal of the emperor deliberately harks back to old times.
I tend to follow the Chicago Manual of Style and do not use periods in simple captions which are not full sentences, but I do use periods if there is more than one element in the caption (even if the elements are not full sentences). -- WikiPedant 20:12, 13 October 2008 (UTC)
For what it's worth, I have been doing a similar thing with etymologies. When there is a single element (which is almost never a complete sentence, e.g. "From French foo"), I leave it without a period. However, when there are multiple elements (e.g. From French foo. Compare German fooer, Norwegian fö.) I end each statement with a period, for the sake of clarity. -Atelaes λάλει ἐμοί 22:42, 13 October 2008 (UTC)

Wow, thanks a lot, especially to WikiPedant. --Dan Polansky 07:03, 14 October 2008 (UTC)

A good rule of thumb is not to add redundant punctuation to any element whose nature is exposed by design or typography. For example, bolded headings shouldn't be followed by colons, bulleted list items shouldn't end with commas, semicolons, or periods, and captions tied to images don't need terminal periods, especially when they are both bounded by a box. Another clue is that these things are often sentence fragments standing alone.
If such an element gets more complex, then it requires some written structure from commas, semicolons, colons or full stops, maybe just separators, or maybe a terminator too. This includes bibliographic references and complex list items. The Augustus example of a sentence fragment followed by a sentence is just fine by me.
Of course, dictionary entries have an even more complex structure defined by typography and punctuation, but we're mostly doing okay so far. Michael Z. 2008-10-17 20:42 z

As you may know, the statistics counter is significantly off. We have passed one million entries. I've done a bit of analysis while doing XML dumps. Count is all namespace zero, not redirects, and not "misspelling of" or "only in".
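The counting rule just described can be sketched roughly as follows in Python. This is only an illustration of the three filters, not the script actually used: the function name `is_countable`, the namespace list, and the sample pages are all made up for the example, and it assumes that a page's namespace can be read off its title prefix and that a redirect page begins with `#REDIRECT`.

```python
import re

# Pages whose text uses these templates are left out of the count.
EXCLUDED_TEMPLATES = re.compile(r"\{\{\s*(misspelling of|only in)\s*[|}]", re.IGNORECASE)
REDIRECT = re.compile(r"^\s*#redirect\b", re.IGNORECASE)


def is_countable(title, text, namespaces):
    """Apply the filters: namespace zero, not a redirect, not an excluded template."""
    prefix, sep, _ = title.partition(":")
    if sep and prefix in namespaces:  # page is outside namespace zero
        return False
    if REDIRECT.match(text):
        return False
    if EXCLUDED_TEMPLATES.search(text):
        return False
    return True


# Hypothetical namespace prefixes and sample (title, text) pairs.
namespaces = {"Wiktionary", "Appendix", "Template", "Category", "User", "Talk"}
pages = [
    ("allanaste", "==Spanish==\n===Verb==="),
    ("Wiktionary:Beer parlour", "Welcome, all"),
    ("colour-blind", "#REDIRECT [[colorblind]]"),
    ("recieve", "==English==\n# {{misspelling of|receive}}"),
    ("muggle", "{{only in|Appendix:Harry Potter terms}}"),
]

count = sum(is_countable(title, text, namespaces) for title, text in pages)
print(count)
```

Run over a full dump, the same predicate would be applied to each page in turn; only the first sample page above passes all three filters.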

It is said that if you ask a question of 10 economists, you will get 10 different answers; if one is from Harvard, you will get eleven different answers. In that spirit, and in the tradition of judge's decisions in Wiktionary contests being self-appointed, arbitrary, capricious, and final:

Bot entry:

allanaste, Spanish verb form, by BuchmeierBot

Excepting bots:

, Symbol, by Bequw

Excepting bots and symbols and the like:

SFP, Acronym/Noun, by anonymous IP 64.28.25.201

First actual word:

svetlost‎, Serbian noun, by Dijan

At 7:57 UTC 16 October. Robert Ullmann 15:49, 17 October 2008 (UTC)

This is so cool. Now, where's the press release? :-P Teh Rote 22:03, 17 October 2008 (UTC)
But does this count include "bad" entries, i.e. those without wikilinks? Such pages are usually excluded from the statistics. --EncycloPetey 00:42, 18 October 2008 (UTC)
The script that was used to update the stats after the "pause" due to DB problems counts all outgoing links, not just brackets in entries, while the ongoing increments are based on brackets. So the stat counter will double-count an entry when a link is added, while missing others. It is basically crap; it doesn't mean anything. Yes, people pay attention to it, not having anything else ... Robert Ullmann 02:30, 18 October 2008 (UTC)
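The drift described in this reply can be illustrated with a toy model. Everything here is an assumption built only from the description in this thread: a reset that counts pages with any outgoing link (including links produced by templates), followed by per-edit updates that look only for literal "[[". The class name, the template list, and the sample pages are hypothetical, and this is not the MediaWiki code.

```python
# Hypothetical set of templates assumed to expand into at least one link.
TEMPLATES_THAT_LINK = ("plural of",)


def has_brackets(wikitext):
    """The per-edit test as described in the thread: literal '[[' in the text."""
    return "[[" in wikitext


def has_outgoing_links(wikitext):
    """The reset test: literal brackets, or a template that expands to a link."""
    return has_brackets(wikitext) or any(
        "{{" + name in wikitext for name in TEMPLATES_THAT_LINK
    )


class HybridCounter:
    """Reset from outgoing links, then updated from brackets only."""

    def __init__(self, pages):
        self.value = sum(has_outgoing_links(text) for text in pages.values())

    def on_edit(self, old_text, new_text):
        if not has_brackets(old_text) and has_brackets(new_text):
            self.value += 1
        elif has_brackets(old_text) and not has_brackets(new_text):
            self.value -= 1


pages = {
    "cats": "# {{plural of|cat}}",  # links only through a template
    "dog": "# A domestic [[animal]].",
}
counter = HybridCounter(pages)
after_reset = counter.value

# Someone adds a literal link to "cats": the page is counted a second time.
counter.on_edit(pages["cats"], "# {{plural of|cat}} See [[cat]].")
after_link_added = counter.value

# A new page that links only through a template is missed entirely.
counter.on_edit("", "# {{plural of|dog}}")
after_new_page = counter.value
```

In this model there are three pages with outgoing links at the end and the counter also reads three, but only because one double-count and one miss happen to cancel; neither error is corrected.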
And I'd like to say right off the bat that if, for whatever reason, it turns out that Wikimedia officially declares fr to have gotten to a million first, it would look pretty petty if we made a big stink about it. -Atelaes λάλει ἐμοί 00:44, 18 October 2008 (UTC)
If we think the count of one million is correct, then we should post at Wikimedia News. I haven't seen the data, so I'm following the (probably flawed) statistics counter for now. --EncycloPetey 00:50, 18 October 2008 (UTC)
We can blow that away quickly too, but as noted, it means jack shit. Robert Ullmann 02:30, 18 October 2008 (UTC)
So what kind of data do you want? It takes running through the entire db, counting the entries. Take the Oct 17th XML dump and do it yourself? Or what? (I'm not being snarky, just wondering what you are looking for?) Robert Ullmann 02:33, 18 October 2008 (UTC)
Having never worked with a dump (and not knowing how), I'm not really sure. I'd prefer that you post our milestone at Wikimedia News yourself, since you're the one that did the analysis and therefore would be able to confidently answer critics who point out that our site-based page count doesn't say one million yet. Maybe this will prompt someone to fix that... Supposedly XML dumps are now possible and "on the way", at least according to a conversation I had a couple of days ago on #wikimedia-tech. --EncycloPetey 03:17, 18 October 2008 (UTC)
So here's a question: How have these sorts of things been done in the past? Did someone simply watch RC and declare the winning entry to be the one which turned the RC counter to whatever number? Certainly there must be a more systematic way of doing it than that. How did they decide the ten millionth all-lang wikipedia article? It seems to me that we should follow the system that every other project has been using, whatever that is. On an almost related note, some projects seem to be noting everything (e.g. number of articles, number of edits, number of users) on that news page. Maybe we should note our millionth block, when the time comes (if we haven't done it already). That's something I would be proud of. -Atelaes λάλει ἐμοί 06:48, 18 October 2008 (UTC)
When I've done it in the past, I've noticed the counter just over the mark, and counted backwards. Now that we have XML dumps current (EP: look at http://devtionary.info/w/dump/xmlu/ even if you don't want to grab any; we have dailies) it is easier to simply count from the data. We probably should wait until that counter {NUMBEROFARTICLES} goes over 1 mil before any announcement; should only be a few hours anyway. (Oh, BTW, the WM XML dumps are running, but Brion has failed to fix the queueing problem; there are 5 threads running, all stuck on huge projects; the en.wp dump has an ETA in February ;-) Robert Ullmann 12:10, 18 October 2008 (UTC)

one million on the counter at 15:57 UTC, entry was good job ... fr.wikt had 995,002 at that moment. Robert Ullmann 15:59, 18 October 2008 (UTC)

Darn it, I already posted it to Wikinews before you mentioned that! I'll have to correct it. Teh Rote 18:36, 18 October 2008 (UTC)
Remember that that counter is bogus. Counting all the entries in NS:0 that we want to count, the millionth entr(ies) are as I listed above. But the counter was reset recently using actual outgoing links (which is closer to the way we'd like to count). Counting with the brackets algorithm that the counter uses for updates, we are at 995,606 right now .... Robert Ullmann 16:23, 19 October 2008 (UTC)

Kudos to everyone involved! Really nice achievement. I think it is time to remove the rather strange claim that Wiktionary has around 300,000 articles. "Excluding these 163,000 entries, the English Wiktionary would have about 137,000 entries" (c) Wikipedia TestPilottalk to me! 12:18, 20 October 2008 (UTC)

one million if the counter was correct (;-) If the counter was doing what it should, and not off by ~7000 ... counting entries with [[ in them, the million entry is

  • (including bot): insertad
  • (excluding bot): stuntwoman created as gibberish by an IP-anon, and turned into a proper entry by Nandando

at about 23:52 UTC 20 October. So there are two more candidates ... (smirk) Robert Ullmann 12:12, 21 October 2008 (UTC)


When Wikipedia surpassed 1,000,000 articles a long time ago, they had a banner on the front page commemorating it. Why doesn't Wiktionary? --Takamatsu 04:29, 28 October 2008 (UTC)

I think that when half of the entries are soft redirects written by bots it sort of makes the entry count a bit less significant. It would probably be a short order to write an inflection bot to write thirty thousand grc inflected forms. While Wiktionary would be a better dictionary because of it, the numbers somewhat inflate the importance. -Atelaes λάλει ἐμοί 07:29, 28 October 2008 (UTC)
Our entries are also quite a bit smaller in the average case (whether form-of or not); if one imported M-W (not, because it is copyright, but if one did), that would be 736,000 entries at a go. (we have about 1/2 now, as a WAG) "million" just doesn't mean as much. Now, when we get to (say) 1M headwords (counting entries for each language, not pages, but not counting form-of unless they have defs and examples) then we'd have something to crow about. As of March, that metric was about 437K, depending on details. Robert Ullmann 08:03, 28 October 2008 (UTC)

I wrote a script to extract the missing plurals of English nouns from the XML dump. There are about 3,500 of these at the moment. I ran a totally supervised bot (I had to manually approve each entry before submission) on a hundred and thirty-odd of these in the pre-million dash and the accuracy seems to be pretty high. I ended up deleting three singular pages that should never have existed, and made two further pages uncountable. I don't have the time to do the google books search for so many words (besides which, they blocked me from searching when running manually for acting like a bot). Would there be an interest in me running this as a proper bot? This may mean that a few entries get created which should not exist; however, I would still check the output for any obvious errors. Conrad.Irwin 11:54, 19 October 2008 (UTC)

Yikes! I undercounted just slightly. With a much-improved parser I now find 25,000 missing plural entries. Conrad.Irwin 13:16, 19 October 2008 (UTC)
Hmm ... I was going to suggest we look at the list of 3500. You are finding 25K for English? or a number of languages? A list of English words should be easy for a fluent native speaker to review in advance, likewise for other languages. Robert Ullmann 15:42, 19 October 2008 (UTC)
24K now, with the User:Conrad.Irwin/bad_plurals gone - leaving User:Conrad.Irwin/good_plurals. You'd need a larger vocab than I to make much of a dent in it; but it would certainly be possible to do a quick sanity check. Conrad.Irwin 23:22, 19 October 2008 (UTC)
Doesn't look terribly hard. But where are you getting, say, "albatrosss" from? Robert Ullmann 23:36, 19 October 2008 (UTC)
From neglecting to remember that the pl= parameter cancels out the other parameters (as the pl2= and pl3= don't). Interesting that there are still a number of words ending in 'sss', so maybe I just got lucky with my first sample. Down to 23,000 now. Conrad.Irwin 01:00, 20 October 2008 (UTC)
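The "albatrosss" slip can be reproduced with a simplified model of the en-noun parameters. The only rule taken from this thread is that pl= cancels the other parameters while pl2= and pl3= do not; treating a lone positional parameter as an ending appended to the page title, and "-" as uncountable, is an assumption made for illustration, and the function names and sample pages are hypothetical.

```python
import re

# Matches a simple {{en-noun}} call without nested templates.
EN_NOUN = re.compile(r"\{\{en-noun(\|[^{}]*)?\}\}")


def parse_params(raw):
    """Split '|a|pl=b' into positional and named parameters."""
    positional, named = [], {}
    for part in (raw or "").split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return positional, named


def expected_plurals(title, wikitext):
    """Plural forms the (simplified) template would display for this page."""
    match = EN_NOUN.search(wikitext)
    if match is None:
        return []
    positional, named = parse_params(match.group(1))
    if positional[:1] == ["-"]:
        return []  # marked uncountable
    if "pl" in named:
        plurals = [named["pl"]]  # pl= replaces the generated form
    else:
        ending = positional[0] if positional else "s"
        plurals = [title + ending]
    for key in ("pl2", "pl3"):  # these add to the list instead of replacing
        if key in named:
            plurals.append(named[key])
    return plurals


def missing_plurals(pages):
    """Expected plurals that have no page of their own."""
    missing = set()
    for title, text in pages.items():
        for plural in expected_plurals(title, text):
            if plural not in pages:
                missing.add(plural)
    return sorted(missing)


# Made-up sample pages keyed by title.
pages = {
    "albatross": "{{en-noun|pl=albatrosses}}",
    "cat": "{{en-noun}}",
    "cats": "# {{plural of|cat}}",
    "replayability": "{{en-noun|-}}",
}
result = missing_plurals(pages)
```

A parser that forgot the pl= rule would fall through to the default branch and produce "albatross" + "s", which is exactly the bad form quoted above.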

Along similar lines: I was running some code to add links to various "form of" entries, from the list at User:Robert Ullmann/Not counted/Fof list. This was to increase the pages counted by the s/w. I stopped when we went past 1M, as I had caught a couple of errors, and need to look at the code a bit closer. Should I continue? Robert Ullmann 15:42, 19 October 2008 (UTC)

It might be more beneficial to convert the entries not using the form of templates to use them - or maybe to do both in parallel; they're both low priority. I can get you a list of 12,000 entries that look like form-ofs but don't have templates if you want - but deciding which "form of" template to use will require a bit of parsing fun. Conrad.Irwin 23:22, 19 October 2008 (UTC)
I added two lines to AF to link in some of the form-of cases; the edit summary says "make page count: ...". It only does this if the page doesn't already count. Not terribly useful, but is our general practice. Where can I get your list (which sounds more interesting ;-)? Robert Ullmann 08:07, 28 October 2008 (UTC)
If you're still following this thread it's wikt_cleanup. Conrad.Irwin 01:56, 16 November 2008 (UTC)

What should I do with plurals of words like replayability, where the plural clearly can exist (when comparing the replayabilities of several games) but doesn't have any use. Similar problem with exothelium and exothelia - but as the singular has such low use I'd be more inclined to include the plural "on faith". (I'm running in manual mode in my bot account for a short time now and then) Conrad.Irwin 01:13, 24 October 2008 (UTC)

My personal preference would be for a "plural not attested" option for en-noun (and analogous templates). Or maybe "plural form Xs not attested," generated by {{spec=Xs}}, so that we could show the presumptive plural without linking it. -- Visviva 03:22, 24 October 2008 (UTC)
replayabilities has (very limited) usage on the web in the way Conrad suggests, but not in instances that we accept as durably archived. It seems to me that we only have one relatively permanent and definite state for plurals: attested. The attestation can be challenged, but let's ignore that. Let's also ignore multiple plurals. Very few plurals have been attested and the effort to do so should probably await the development of the "attestor's workbench". At the lemma (singular) entry, we now have five states shown: "uncountable", "blue plural", "red plural", "black plural", "no plural" (omission). Unfortunately none of these at present has any procedure that assures the user of the attestation or other kind of validation of the plural.
I see merit in blue showing only for attested plurals. "uncountable" is sometimes used because a contributor is unsure how to insert the proper spelling of the plural, even in simple cases ("es"). Perhaps en-noun should default to show no plural, with it entering a maintenance category ("no plural shown"). Once a plural is shown by a selection of a particular form, perhaps it should be shown as black until an attested plural entry is entered. Uncountability has its own special problems, but its use when a user is ignorant of template mechanics is not acceptable. If a plural fails RfV (or a similar attestation process), but the plural would follow common rules, a blue-linked asterisk or other superscript would seem good enough.
The obvious biggest problems are the work and the change in contributor habits. En-noun would change. Many existing plural entries, some with non-vacuous content, would be orphaned. There must be other problems, too, but I will leave their discovery to others. DCDuring TALK 09:13, 24 October 2008 (UTC)
I don't think that would be particularly constructive. For the vast majority of English nouns (and other regularly-inflected words in living languages), there is no problem with our usual practice; the plural exists, or occasionally has good reason not to exist, and for most words that's all there is to say. We don't want to complicate garden-variety entries because of a handful of corner cases. On the other hand, I think we do want to do a somewhat better job of handling these (especially because it will give us a better handle on classical and poorly-recorded languages, where these cases are much more common). -- Visviva 12:36, 24 October 2008 (UTC)
Replayability is an uncountable noun. In the first 30 hits at google books:replayability, I find plenty of unambiguously uncountable uses, and no unambiguously countable ones. Granted, all uncountable nouns can be countified, but if the plural isn't even attested, then I don't see what's wrong with {{en-noun|-}}. It drives me crazy when editors use {{en-noun|-}} to mean “I don't know what the plural is or whether it exists”, under the (IMHO wrong) impression that an erroneous "this noun is uncountable" is better than an erroneous "this noun's plural is _____"; but I think it's perfectly fine to use {{en-noun|-}} to mean “this noun is uncountable, but English grammar allows uncountable nouns to be countified”. —RuakhTALK 13:25, 24 October 2008 (UTC)
See Appendix:Unverified plurals. Teh Rote 04:04, 16 November 2008 (UTC)

Can someone please explain the meaning of the table headers in Wiktionary:Statistics? 1. Number of entries 2. Number of definitions 3. Gloss definitions 4. Form-of definitions. How come the total Number of entries (1,003,409) and Number of total pages (1,117,298) are different? Thanks. --Panda10 00:56, 20 October 2008 (UTC)

Pages is a literal count of all pages (including talk pages, wiktionary pages, etc.etc.) entries (in this context) is the number of pages in the main namespace that contain "[[" (the fun of software). Conrad.Irwin 01:02, 20 October 2008 (UTC)
Except that "entries" is not number of pages (not including redirects) that contain [[. It is the number of pages with actual outgoing links as of about two weeks ago plus the number of pages since added or updated to include [[. That is, it means practically nothing. The number of pages containing [[, what it is supposed to be, was 996,337 as of a few hours ago. Is the MW software broken? Yes, seriously. Do they care? No. Robert Ullmann 01:16, 20 October 2008 (UTC)
What is the number of definitions then? There are approximately 1,485,304 definitions in total. How could it be that the literal count of all pages is smaller than the number of definitions? TestPilottalk to me! 12:11, 20 October 2008 (UTC)
"definitions" is not "number of words that have definitions", it is the number of definitions. Some words have more than one, and some pages have more than one word (homographs, in 1 or more languages). Robert Ullmann 15:11, 20 October 2008 (UTC)
More specifically it is the number of lines in the redirect-free main-namespace that start with a "#" that don't continue with a "#*:;" and don't contain {{rfdef}} or {{defn}}. Conrad.Irwin 15:20, 20 October 2008 (UTC)
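Read literally, the rule in this reply can be sketched as below. The interpretation of "#*:;" as "the character after the first # is one of #, *, : or ;" is an assumption, as are the function names, the prefix match on the two template names, and the sample text; restricting the pages to the main namespace is left to the caller.

```python
SKIPPED_SECOND_CHARS = ("#", "*", ":", ";")


def is_definition_line(line):
    """A top-level '#' line that is not a sub-item, quotation or example."""
    if not line.startswith("#"):
        return False
    if line[1:2] in SKIPPED_SECOND_CHARS:
        return False
    # Placeholder templates mean a definition has been requested, not given.
    return "{{rfdef" not in line and "{{defn" not in line


def count_definitions(pages):
    """Count definition lines over all non-redirect pages (title -> wikitext)."""
    total = 0
    for text in pages.values():
        if text.lstrip().upper().startswith("#REDIRECT"):
            continue
        total += sum(is_definition_line(line) for line in text.splitlines())
    return total


# Made-up sample: two real definitions, plus lines the rule should skip.
sample_pages = {
    "head": "\n".join([
        "==English==",
        "# The part of the body containing the brain.",
        "#* A quotation line",
        "#: An example sentence",
        "## A subsense",
        "# {{rfdef}}",
        "# The leader of a group.",
    ]),
    "hed": "#REDIRECT [[head]]",
}
definition_total = count_definitions(sample_pages)
```

This also shows why the definition count can exceed the page count: one page contributes as many lines as it has senses.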

I just noticed that {{t}} is not mentioned in WT:ELE, but it is in Wiktionary:Translations. I think the template is sufficiently mature now to include it there. Shall I start a vote on this? Something like: include in the dos: use {{t}} and adapt the example below accordingly? H. (talk) 09:29, 20 October 2008 (UTC)

Since nobody objected or commented on this, I have been bold. H. (talk) 11:45, 6 November 2008 (UTC)
That's not how it works. We want explicit discussion before making major changes to policy documents. Further, you changed all the quotation marks to smart quotes, which do not display correctly in some browsers or on some platforms. I have reverted your changes, since they were not approved. You should know better. --EncycloPetey 18:31, 6 November 2008 (UTC)
Re: smart-quotes: really? I dislike them myself, but had been given to understand that they're de facto policy, so have started using them myself (well, when I bother). They're enforced by templates such as {{term}}, and encouraged by their presence in the edit-tools. But if we can get a push to use normal-person quotes, I'm totally behind it. :-)   —RuakhTALK 20:07, 6 November 2008 (UTC)
I support putting the {{t}} into WT:ELE as the recommended method, and as far as I am concerned H. was right to be bold after more than two weeks. I agree with EP though that "smart" quotes shouldn't be used. Thryduulf 19:51, 6 November 2008 (UTC)
Don't misunderstand. I agree that {{t}} ought to be explained in ELE, but we've held for a longer time now that major changes to the ELE text require a vote. This would be a major change, and it would certainly be preferable to hash out any discussion of problems with the particular wording before altering our primary policy document. --EncycloPetey 20:09, 6 November 2008 (UTC)

Is there a template or a template parameter to let an entry link to other Wikimedia project only by inserting an inconspicuous link at the left, without creating any other visual element? What I mean is such a link that can be seen at commons:dog, where it links to Wikipedia.

I would like to use such a template for linking to Commons. The conspicuous links to Wikipedia created by {{wikipedia}} and {{pedialite}} are already common and got used to, so I would keep the practice of using {{wikipedia}} for Wikipedia. --Dan Polansky 19:05, 20 October 2008 (UTC)

I don't think there is one, and personally, I don't think I'd like there to be. What's wrong with {{projectlink|commons}}? —RuakhTALK 21:01, 20 October 2008 (UTC)
The template {{projectlink|commons}} is okay, but it takes away four lines, uses icons, and uses boldface, so the result it produces is quite conspicuous, IMHO anyway. By contrast, the links that the very same template creates at the left are inconspicuous, and easy to find once the user gets used to looking there for them. (An example of use of the template: United States.) The template is placed under the "External links" heading, which is the only place it should be, and yet the links are external to Wiktionary while internal to the group of Mediawiki projects, unlike links to, say, Webster's dictionary.
Also, what I find strange is that some of the items created by the template are sentences, not terms denoting external resources. What I mean is that the link reads "Wikimedia Commons has media related to “United States”." and not "Media related to “United States” at Wikimedia Commons". For comparison, we have at knowledge: ""knowledge" at The Century Dictionary, The Century Co., New York, 1911.". But that is a different topic. --Dan Polansky 07:03, 21 October 2008 (UTC)

According to Alexa we (wiktionary.org) are now in the top 1000 sites. en.wikt is almost half the overall traffic for the wikts.

(note that the rank displayed at the top is a 3 month average; look at the graph and the daily numbers ;-)

About 1 in 1000 of all global Internet users use the Wiktionaries in the course of a given day. Not bad. Robert Ullmann 17:38, 23 October 2008 (UTC)

42% of the traffic is to enwikt. M-w.com gets more than four times as much traffic as enwikt. 27% of wiktionary.org users come from the US, UK, Canada, and Australia. More than 75% of M-W.com users come from those 4 countries. English-only M-W.com has more traffic from every English-speaking country that I can see stats for than all wiktionary.org. DCDuring TALK 22:35, 23 October 2008 (UTC)
Yeah, one of the huge advantages of en.Wiktionary over M-W is that it has translations, plus it can define words for a variety of languages. Plus, if a person speaks more than one language, Wiktionary can sometimes provide word definitions in other languages. TestPilottalk to me! 19:40, 25 October 2008 (UTC)
We seem to have backed into a de facto market strategy of serving non-native speakers. The other side of offering the translations and prominent pronunciation section is to make the site less attractive for most native speakers. This would be particularly true for the very large number of mono-lingual English speakers in the US. Such users would find MW good for "serious" use and Urban Dictionary for trendy slang. No wonder that I have to explain Wiktionary to my friends. DCDuring TALK 20:15, 25 October 2008 (UTC)
Native speakers outside the United States (which is the majority), find translations and such very useful, as they are often using several languages. UK and Aussie/NZ might be more mono-lingual, but native English speakers in India will pretty much always also speak Hindi or whatever; English speakers here all speak Swahili as well (the evening news switches back and forth whenever someone is interviewed in the other language, with no captions or pause :-). I like the fact that we are much "bigger" than M-W and such. Robert Ullmann 21:49, 25 October 2008 (UTC)
You are looking in the wrong direction. The reason en.Wikt is not as popular as M-W is that it is not as good. Yet. It is really easy to see that M-W has more definitions and even more headwords (for English). But Wiktionary is catching up, and it is a matter of time until it becomes the best dictionary around, just as happened with Wikipedia vs. Britannica. But multilingual capabilities do bring more users, native speakers included. Without them, not even half of those 27% would be around, not to mention the rest. TestPilottalk to me! 09:29, 26 October 2008 (UTC)
We are certainly not as consistent as M-W, but why are we relatively more successful among those located outside of English-speaking countries, especially the US? Is it just because of interwiki links?

I doubt that it is merely completeness that sets us back among native English speakers. Is it our layout (prominence of pronunciation, incredibly long tables of contents forcing users to page/scroll down before finding English definitions)? (Note what OneLook does, providing a list of definitions down the right-hand side in addition to the list of entries from different sources.) Is it the obsolete wording of some of the old Webster entries? Is it lack of consistency in presentation? Is it unreliability of entries? DCDuring TALK 14:26, 28 October 2008 (UTC)

Let's get the statistics straight. M-W gets more than 71% of its traffic from North America, primarily from the USA, with Great Britain plus Australia (just above 4% in total) being insignificant markets. Why is Merriam-Webster so focused on the States? My guess: since they are oriented toward making money, they go where the most money is, the first economy in the world. Now let's look at enWikt. It has a 42% share of total Wiktionary traffic. 27% of total Wiktionary traffic comes from US+UK+CA+AU. That means enWikt gets 61+% from those 4 English-language countries (if we assume that the majority, 99+%, of those who come from US/UK/etc. go to enWikt). That is a 14% difference compared to m-w.com. Now let's look at the North American share of Wiktionary: only ~47% for the US, or ~53% for US+CA! A far cry from M-W's 71%. Now check the English-speaking population of the US and CA in comparison to the English-speaking population of the rest of the world. You can clearly see that Wiktionary traffic is distributed more evenly. Should enWikt primarily target Americans? It might make sense donation-wise. But I would rather say NO. It is doing fine as it is so far.
The reason for not being on top is not "merely completeness" of Wiktionary. It is the overall quality of definitions, plus lack of trust, the huge brand recognition of the Merriam-Webster trademark (100+ years old), the lack of examples/pronunciations/etymologies sections for a lot of Wikt entries, and so forth.
As for "long tables of contents", that could be fixed really easily. Let's make it collapsed by default for anonymous users and remember the state for registered users. As simple as that. I doubt, though, that it is a real problem. Prominence of pronunciation? You must be kidding. There are dozens of projects around that take a minimalist approach toward the user interface of dictionaries. The best web-based one, IMHO, is ninjawords.com. I personally doubt that it has many users. Writing something like NinjaDic with a Wiktionary backend could be done in no time, and most likely is out there somewhere. I myself, by coincidence, happen to have one such project. Download WikiLook, and it will define words for you using enWikt without pronunciations, translations, tables of contents and so forth. WikiLook even goes one step further: it lets you check definitions without opening any extra tabs or new windows in your web browser, with easy access (just a click) to the full definition page in case you need it. And that is just a couple of the gazillion possible ways of approaching the end user (keep in mind, they have different needs by definition). With open source communities catching up, there will be even more options around. TestPilottalk to me! 20:05, 28 October 2008 (UTC)
Here are the simple facts of wikt market share relative to mw:
US: 20%; UK: 82%; Canada: 46%; India: 46%.
In the UK, cambridge.uk is a popular site (more popular than wiktionary) that gets 82% of its reach from its dictionaries.
MW has a high portion of its traffic (79%) from these 4 countries because it only offers English. Some of wiktionary.org's traffic in these countries is attributable to offerings from de.wikt, fr.wikt, etc.
In English wiktionary faces competition from encarta.msn.com, bartleby.com, dictionary.com, answers.com, wordnet, freedictionary.com, artfl, and others.
Our multi-lingual offering ("all words in all languages"), together with whatever traffic the other wiktionaries get in English-speaking countries, don't seem to help us gain share against the monolingual dictionaries in any English-speaking country. I don't know what the implications of this are for us, but it might make it easier to understand why we don't seem to get much attention from WMF developers or from the US media and the public. DCDuring TALK 01:18, 29 October 2008 (UTC)
I agree, and I think it is looking very positive. These are huge figures, and we're not anywhere close to a “finished” dictionary yet.
By the way, do we have any way of gauging the completeness of basic vocabulary in Wiktionary? I wonder how we compare to, for example, a grade-school dictionary or small college dictionary. Michael Z. 2008-10-28 02:41 z
List of defined entries from big editions. You can see how many are redlinks here. On the other hand, I would not call most of them "basic". From time to time I come across real-world words that are not defined here but are defined in Google or M-W.com. And, for comparison, in the year 2004 there were 50K entries here in total, and you could hardly score a hit unless you were searching for truly basic words. TestPilottalk to me! 12:59, 28 October 2008 (UTC)
I'd like to see lists from some small and medium editions, although I suppose those might not have been deemed as valuable to compile. They could give us some milestones to shoot for on the way to the OED. Michael Z. 2008-10-28 16:16 z
I guess large word lists are a good start: simple:category:Wordlists, WT:FREQ. Michael Z. 2008-10-28 17:26 z

There are two categories for English proverbs: Category:Proverbs and Category:English proverbs. If {infl|en|proverb} is added to the inflection line, the entry will be automatically in Category:English proverbs. Can I move all entries from Category:Proverbs to Category:English proverbs? --Panda10 22:15, 24 October 2008 (UTC)

Why not the other way? DCDuring TALK 20:43, 25 October 2008 (UTC)
It would make no difference to me, but there are a couple of reasons: Proverb is a POS, so it is a grammatical category just like nouns and verbs. Also, the category for foreign languages is "<lang> Proverbs", not "xx:Proverbs" (see Category:Hungarian proverbs vs. Category:hu:Proverbs). It would be helpful to keep it consistent. Plus using {infl} automatically puts the entry in English proverbs. --Panda10 20:52, 25 October 2008 (UTC)
Yes, it is like other POS. Should be Category:English proverbs and in Category:Proverbs by language. Which it is. Robert Ullmann 21:42, 25 October 2008 (UTC)
Agreed. Panda10, please do. —RuakhTALK 05:37, 26 October 2008 (UTC)
Actually, I don't think "Proverb" is really a part of speech in the grammar of the English language at all. I prefer just to use "phrase" as the POS and add the category Category:English proverbs manually for the English phrases which clearly have proverbial status (which can sometimes be a bit of a judgment call). -- WikiPedant 18:54, 26 October 2008 (UTC)
"Phrase" isn't a "true" or "classical" PoS either. Both "Phrase" and "Proverb" are in use as Wiktionary PoS headers. Phrase seems to have higher status, but Proverb seems superior to me in specificity. Most occurrences of "Phrase" could be replaced with a "classical" PoS, but not any properly applied Proverb headers. However, it is probably more important to have an entry in the Proverb category than under a Proverb PoS header. DCDuring TALK 20:17, 26 October 2008 (UTC)
I agree that "proverb" isn't really a grammatical POS, but to me "phrase" implies "not a clause". I wouldn't consider "It takes all kinds to make a world" to be a "phrase", for example. Even something like "Don't count your chickens before they're hatched", which technically is a single verb phrase with don't at its head, isn't really well described by the "phrase" POS IMHO. —RuakhTALK 22:53, 26 October 2008 (UTC)
Personally I try to avoid using "phrase" as POS, and would advocate deprecating it. I can't even remember the last time I used it. I like to use "proverb" if it really is one, but the category seems to have a number of entries that are not in the form of the set proverb e.g. one who hesitates is lost (anyone searching would start this with "He") and some simply are not proverbs, full stop. Back to the main point, I believe that Category:English proverbs is the correct heading for the reasons outlined above, and also because many of them have direct translations to similarly worded proverbs in other languages. --ALGRIF talk 10:16, 28 October 2008 (UTC)
"Phrase" has quite a range of meanings. In its broadest sense, "phrase" means "a particular choice or combination of words used to express an idea, sentiment, etc., in an effective manner" (OED). Sometimes, "phrase" is understood in a more limited way as any syntactic unit which does not contain both a subject and a verb, ruling out clauses and complete sentences. And there are, of course, a number of other valid senses of "phrase" (some of them quite technical). I still see "phrase" (understood in the broadest sense) as the best generic term we have for any group of words (including complete sentences) which cannot be classified under any other part of speech. Even if we allow the use of "Proverb" as a POS (more like a pseudo-POS, I'd say), there are still some entries which defy all other POS's except "Phrase" -- such as that's the way the cookie crumbles, so far so good, that's the way the ball bounces, or ladies first, all of which are currently categorized as proverbs, but, as Algrif says, "simply are not proverbs, full stop." But, to return to my original point, I'm still wary of using "Proverb" as a (pseudo-)POS and prefer to use "Phrase" with the manual addition of Category:English proverbs for that subset of phrases which are incontrovertibly proverbial. -- WikiPedant 18:33, 28 October 2008 (UTC)

The move is completed, thanks Dan. I'd like to delete the now empty Proverbs category and its talk page. I checked the What links here page, and modified a couple of pages. The rest of the links are talk pages. --Panda10 17:56, 28 October 2008 (UTC)

Wiktionary's Word of the day currently just points out "interesting" words, for expanding one's vocabulary; we have no equivalent to Wikipedia's Featured articles, which are not necessarily the most interesting articles, but which are comprehensive, well-written and referenced. Would other editors support the creation of a Featured Words project to do something similar for Wiktionary? --Ptcamn 21:35, 25 October 2008 (UTC)

I seem to remember that when Word of the day started, we said that all words chosen were to be of a reasonably good quality, with an etymology and at least some translations. Or is my memory even worse than I thought (quite likely)? SemperBlotto 21:59, 25 October 2008 (UTC)
I don't know how it started, but right now, it seems to be the other way 'round: whoever's doing WOTD (IIRC: EncycloPetey until recently, and currently Circeus) will select an entry because the word is "interesting", and then bring it to a good quality before the date arrives. That's my impression, anyway. —RuakhTALK 05:41, 26 October 2008 (UTC)
Even the earliest entries, for the months before I got involved, were selected as "interesting" words rather than as quality articles. Part of my rationale for keeping WOTD that way is that there is a tradition (at least in English-speaking countries) that a "word of the day" should be such an interesting and vocabulary-building word. Calendars, newspapers, and even other on-line dictionary sites do it that way because that's what a reader expects it to be. This is a draw for people who enjoy learning. In contrast, the Wikipedia Featured article can feature their best articles, rather than just something offbeat or unusual, because they are an encyclopedia web site, and so have articles on topics rather than entries on words. No matter how good our entry on the name Cyrus became, it would never draw in the number of views that WP could get featuring an article on Miley Cyrus. A topic can have currency, breadth, and immediacy (even on a historical topic) that just isn't easy to put into a dictionary entry, nor should we try to change that. Second, our entries are, by necessity, much heavier on adherence to format than WP articles, and so what makes our entries "featurable" is fully expanded format. That's something that's just not as interesting to the general public. Yes, it's worth having our entries on head, walk, and big as rich and well-done as we can, but a well-done article on such a word, even if rich with useful content, carries no general appeal. Third, a Wikipedia article, because it covers a topic, can begin with a summary section, and it is from this summary section that their main page copy portion is produced. What would we place on the Main Page to draw in reader interest? We can't summarize the content of one of our entries, because it's partitioned into discrete sections, each one of which is intended to contain and present a specific kind of information. You can't summarize the Quotations, Synonyms, Derived terms, Translations, etc. the way that Wikipedia is able to summarize the subsections of one of their articles, so there just wouldn't be a way to present such an article on the Main Page. The underlying functions of a dictionary and an encyclopedia are very, very different. The way in which WP's featured articles differ from our WOTD is just one reflection of those differences. --EncycloPetey 18:50, 6 November 2008 (UTC)
I'm not convinced that the time wasted to work out what makes an entry featurable and then to debate whether pages conform to the criteria would be best spent thusly. I do think that to have a list of a few well constructed entries with varying levels of detail would be useful to point newbies at, and that's pretty much the same thing. Conrad.Irwin 08:41, 26 October 2008 (UTC)
I tend to agree with Conrad.Irwin. Wiktionary editing tends to be rather more distributed than Wikipedia (when was the last time you saw anyone spend a week on a single entry........and thought it time well spent). Additionally, it seems like few, if any, people are good at everything. Some folks write amazing defs, some verify esoteric words, some write code, some put out fires and generally make up for the lack of diplomacy most of us have, etc. I have yet to see a single editor whom I would think capable of writing that perfect entry all by themselves. Couple that with our general inability to coordinate and stick to tasks (I run out of fingers and toes counting off failed projects, and simply run out of ideas trying to come up with one which involves more than one person and is completed in a timely fashion), and I think it unlikely that we're capable of creating a new such entry every day. Speaking for myself, I'm far too self absorbed to get bogged down by deadlines and such. I think that having a few entries brought up to amazing standards as examples for.....well.....everyone (not just newbs) would be an excellent idea (and it might not be a bad idea to make a habit of getting one for every language) and time well spent. Just a point of clarification, I don't think that our project sucks, no matter how much my preceding comments might seem to indicate.  :P -Atelaes λάλει ἐμοί 09:04, 26 October 2008 (UTC)
Very true, and very well put. —RuakhTALK 17:33, 26 October 2008 (UTC)
Likewise. That was one of the goals behind two of my personal pet projects:
  1. The Model Pages project, which is more modest right now than when I first conceived it, but which I maintain and for which I continue to seek help now and then. The result is that we have really good models for simple situations on a common noun (parrot), proper noun (Central Europe), and verb (listen), as well as one slightly more complicated case (hinder). I've also made a start on a few non-lemmata to show a little of how to take these pages beyond the bare minimum (which is something that even an inexperienced new user can contribute greatly to).
  2. The Substantive nouns primer for Latin. I took a variety of Latin nouns chosen as if I had been looking to make an ABC-book for children, so I selected everyday nouns (well, everyday for Caesar's Rome) which could be illustrated easily. I also made sure to represent various genders and declension patterns so those would be modelled for editors. Each of these has been expanded as much as I've (so far) been able to manage. I went one step further and also beefed up the corresponding entries on Victionarium at the same time, thus ensuring that people who went looking first on the Latin edition of Wiktionary, but who desired an English explanation, would be able to follow a trail here.
There is ample opportunity for similar projects using additional articles or in additional languages, but my experience is that you should expect to tackle them almost single-handedly. At best, you'll be getting support only when you make a direct personal appeal to an expert or specialist on a scale that they can manage easily (such as asking for a translation into a particular language). Ultimate success will depend on your own personal unflagging enthusiasm and effort. --EncycloPetey 19:09, 6 November 2008 (UTC)
Not disagreeing, but we should somehow encourage folks to tackle revising the big entries, many of which would be an embarrassment as WoTD, not being better than the Webster 1913 entries they started from. What VisViva did for head is something that needs to be done for many entries to serve our core language-learner user base. A single person may need to tackle each, but will certainly need assistance. DCDuring TALK 20:27, 26 October 2008 (UTC)
I completely concur with the proposal for "featured word" - after I encountered some adversities when expanding the etymologies, I was forced to insert references and it has become my habit ever since. It would be better if referencing in the "etymology" section augmented and leads it to "featured" provided that the article is circumstantially and diligently dealt with. Bogorm 11:06, 27 October 2008 (UTC)
It seems to me that you are advocating reactivation of {{COW}} If so, I would agree. I'm not sure why it is deactivated in the first place. -- ALGRIF talk 10:07, 28 October 2008 (UTC)
I think it's because no one was laboring except for EncycloPetey, and "solilaboration of the week" didn't have the same ring to it. —RuakhTALK 19:01, 28 October 2008 (UTC)
The COW died from serial solitary laboring. Connel tried to keep it going for a while, and around the time he burned out on it (for not getting any sustained community involvement), I stepped in. Then I burned out for the same reason, so Davilla stepped in and subsequently was just as disillusioned. The French Wiktionnaire has had the same poor results. In late 2005, they started an Articles de qualité, which has grown to include only 14 articles since that time. While the idea is a good one in principle, in practice it just hasn't worked. Chalk it up as one other way in which Wikipedia and Wiktionary substantially differ. --EncycloPetey 18:38, 6 November 2008 (UTC)

Category:Male given names and Category:Female given names have over a thousand names each. A few hundred names are listed in the subcategories by origin. It's a confusing set-up. You have to keep clicking pages to find secret subcategories that are almost empty. I would like to sort all English names into subcategories (why have them at all otherwise?) and for that, I would create the following new subcategories:

  • Diminutives of male/female given names. Pet forms are quite separate from formal given names in many languages.
  • Male/female given names from surnames. The place name Shirley derives from Old English, but it wasn't a personal name then, so it's misleading to list it with Edith and Mildred.
  • Male/female given names from place names (when they are not surnames). Brittany, Erin, Shannon etc.
  • Male/female given names from English. April, Heather, Pearl, Earl. Where the word originally comes from is beside the point.
  • Male/female names of artificial origin. Vanessa, Belinda, Jayden, Deshawn etc.
  • Male/female given names used in India (or: from India). They cannot be called romanizations since the original language often isn't mentioned. They may be used in several Indian languages. Hopefully some day somebody comes along who can sort them into sub-subcategories.
  • Romanizations of Russian/Greek/etc male/female given names. See the example Nikita. I think it's an error to call them English. The subcategories should go to the original languages.

Robert Ullmann wants to change the names of all given name categories from topics into parts of speech, so "English" would be prefixed to the above. It would be a good chance to get rid of erratic categories. E.g. Category:Male given names from Greek was originally a mistake for Ancient Greek and is now used for romanizations. --Makaokalani 09:33, 27 October 2008 (UTC)

Sounds like the right direction.
Do endearing names correspond to diminutives in all languages? (They mostly do in Slavic languages.)
I'm not so sure that's the right way to treat the “romanizations”. The way we treat most terms, the Russian entry would have Никита in Cyrillic. Not sure where Nikita in Latin letters belongs—perhaps under several language headings or “translingual”, but it's not in Russian orthography, and it is used by people who don't speak a word of Russian. Michael Z. 2008-10-28 02:34 z
You are right. A romanization is a part of speech (it IS a romanization, not ABOUT romanizations) so it should have a language and Nikita isn't a Russian romanization. Can parts of speech be translingual? "Category:Translingual romanizations of Russian male given names", and a parent category "Translingual romanizations of given names"? Does anybody except a Wiktionary editor understand what a translingual romanization is? Each time I try to bring order to the given name categories the matter gets more complicated.
A diminutive would be any endearing or pejorative name derived from a person's official given name. They are different in each language. In Finnish almost anything beginning with the first letters of the given name will do, and only the commonest ones are worth recording. It's a much needed category. Right now Spanish Pepe and English Jim are defined as given names instead of diminutives. When an English name is both it can be explained in the entry. Betty = "A diminutive of Elizabeth, also used as a formal given name."--Makaokalani 11:13, 29 October 2008 (UTC)
But are romanizations common for all languages using the Latin alphabet? (Or am I now confusing romanizations with transliterations again?) I was just thinking of the Russian name Юрий which, in en:wp, seems to be transliterated/romanized variously as Yuri, Yury and Yuriy, but which in Swedish is spelled Jurij, and according to w:Yuri Gagarin is also spelled Joeri, Juri, Iuri, Youriy,.... depending on the target language. Hence I don't see how you mean there is a translingual romanization of this particular name. \Mike 11:51, 29 October 2008 (UTC)
I wish there weren't any romanizations in the Wiktionary. I'm just trying to sort them somewhere. "Translingual" wouldn't mean it appears in all languages, only in several of them. I could also define Nikita as an English romanization. But then we'd need an entry for every romanized language: Nikita in Finnish, Swedish, French, Kiswahili... or could we make a rule that only English romanizations are allowed? Maybe I'll just call them English and worry about it later. --Makaokalani 12:04, 29 October 2008 (UTC)

I'm eager to add some Czech verb forms here, but I don't know which way to do it. I had a look at the entries in Category:Spanish verb forms, Category:Portuguese verb forms, Category:French verb forms, Category:Latin verb forms, Category:Finnish verb forms, and Category:Italian verb forms, to see which looked nicer, and I think that the Portuguese, Spanish, Latin and Finnish verb-form entries are the best, all nicely templated. If I want to add these Czech verb forms, which style would you recommend that I use. --Ro-manB 17:16, 27 October 2008 (UTC)

Also, most of these were added by a bot - who can use their bot for Czech entries? --Ro-manB 17:16, 27 October 2008 (UTC)
When you're looking at formats, just remember that the Spanish verb forms have become over categorized. Ideally we'd have them all under one category- just Spanish verb forms. So if you add forms, try to keep the categorization simple. Nadando 23:35, 27 October 2008 (UTC)
There are already some Czech conjugation templates. I don't have many resources on Czech grammar, so I can't judge them properly. However, if they are correct (for example, see nést), there are only three tenses (past, present and future), two numbers (singular and plural) and three persons (first, second and third). That sounds fine for me, if you want to create one category for each of the 18 possible forms, or just keep all of them at Category:Czech verb forms. If the conjugation templates are incorrect or lacking important details, I suggest you first edit them. Then, you could create entries for verb forms using the {{form of}} template, as explained at Wiktionary:About Czech; or create a new template specifically for Czech entries that should generate simple definitions (e.g. "First person singular present tense of doufat.") and add the entry to the corresponding categories. Daniel. 01:48, 28 October 2008 (UTC)
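A minimal sketch of the arithmetic in the comment above, assuming (as that comment does, without verifying it) that the existing conjugation templates are right about three tenses, two numbers and three persons. The name form_definitions is purely illustrative; it is not an existing Wiktionary template or bot, and real Czech conjugation may need more forms than this.

```python
from itertools import product

# Grammatical axes as described in the comment above (an assumption, not verified Czech grammar).
TENSES = ("past", "present", "future")
NUMBERS = ("singular", "plural")
PERSONS = ("First", "Second", "Third")


def form_definitions(lemma):
    """Build one plain definition line per tense/number/person combination."""
    return [
        f"{person} person {number} {tense} tense of {lemma}."
        for tense, number, person in product(TENSES, NUMBERS, PERSONS)
    ]


if __name__ == "__main__":
    definitions = form_definitions("doufat")
    print(len(definitions))  # 3 tenses x 2 numbers x 3 persons
    for line in definitions:
        print(line)
```

Each generated line has the same shape as the sample definition quoted above, so the same list could feed either a single Category:Czech verb forms or one category per combination.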
Re: over-categorization: Sorry, I think that was my bad. I've never understood the point of these ginormous categories, and apparently I didn't even understand how they were supposed to work. —RuakhTALK 15:11, 28 October 2008 (UTC)
I think that the point of these ginormous categories is simply to label an entry: to generate a small text (Galician verb forms | Portuguese verb forms | Spanish verb forms | Swedish nouns) that ought to summarize all languages and parts of speech involved. This is good for one reading an entry, but I don't think that this user will ever click on these category links and find much more information than "Wiktionary has 186,484 entries for Italian verb forms up to date". If someone is, for example, studying Spanish conjugation, he or she could want to see a category full of entries ending in -ríamos. The problem with the current Category:Spanish verb forms is: this user probably won't find anything useful easily either. There are many categories, often describing parts of more subcategories leading to a main fully-detailed one, as in: Spanish first-person verb forms > Spanish first-person future verb forms > Spanish first-person future subjunctive verb forms > Spanish first-person singular future subjunctive verb forms (compare with this: Spanish first-person forms > Spanish first-person future forms > Spanish first-person singular future indicative forms). There are also many discrepancies between them, like "Spanish verb forms"/"Spanish:Conjugated verb forms" and "Spanish present participles"/"Spanish gerunds", but even if all categories matched completely, this still doesn't seem to me a good system: We would need 71 categories just for the indicative persons, numbers and tenses (including conditional); if the subjunctive mood is included, the number would be 201, and so on. And, to prevent over-subcategorization (e.g., a category "Spanish present verb forms" pointing directly to all the present-related categories), there would be up to four empty categories leading to every populated category. In my opinion, a better way to fix this situation would be to include all populated categories (Spanish first-person future indicative...) directly on Category:Spanish verb forms, i.e., to delete all its empty subcategories; or do as it is done for Category:Portuguese verb forms, i.e., to organize them just by mood, number and person. Daniel. 04:43, 4 November 2008 (UTC)
I disagree. No one studying verb forms would find a category full of "entries ending in -ríamos" helpful. Our usual policy is not to have any of these subcategories, and to just have Category:Spanish verb forms. Spanish verb forms are over-categorized as a result of an early bot run that resulted in many incorrect and badly formatted entries that we are still cleaning up many years later. --EncycloPetey 08:07, 4 November 2008 (UTC)
Most of these entries are automatically included in the categories by templates, including Spanish informal second-person plural conditional forms of -ar verbs and more. So I'd like to ask how you editors can be cleaning up badly formatted entries if the source of the problem is still there; instead, I will focus on the main problem of incorrectness. After seeing some dozens of entries, most seem correct, but not uniform (Why are some third-person forms labeled in separate definitions as dialects used only with "usted", while others are a single definition "also used with usted"? Why are some imperative forms lacking when identical to the present subjunctive, and others not?), so could someone please say whether there is any policy for how to format all Spanish conjugation entries? As for the usual policy [...] not to have any of these subcategories, does this apply just to languages with more than twenty variants? If not, Category:English simple past forms and Category:English archaic third-person singular forms should not exist either. Daniel. 15:08, 9 November 2008 (UTC)

RuakhTALK 22:04, 28 October 2008 (UTC)

I think we really could use some more wording in etymology sections. Cryptic stuff like ‘short + cut’ really isn’t very helpful. I also do not like the use of ‘<’ (or was it ‘>’?) to indicate inheritance and so on. What do other people think of this and should we work out a consensus here?

The impetus is {{suffix}} and related templates which indirectly promote this terseness. If including some more wording in these templates is not wanted, then at least we should update their usage to instruct people to put some verbiage around it, but I fear that will make the templates less usable. H. (talk) 16:23, 3 November 2008 (UTC)

I find the terseness to make the etymologies easier to read. The use of + and < allows the etyma to stand out more clearly. Quite frankly, what more do you want to put down than "short + cut"? Seems to me like unnecessary fluff. If you can show me a wordier ety that I like, I might change my mind. -Atelaes λάλει ἐμοί 18:25, 3 November 2008 (UTC)
Brevity is only good if it is unambiguous and gives all the necessary information. As we don't have that much information to give, these abbreviated forms work - I'm sure examples can be found where too much information has been packed too tightly. However, if we persist in having the Etymology bit before the useful</troll> parts of the entry, then they will be kept brief. Conrad.Irwin 18:38, 3 November 2008 (UTC)
I always put "From", e.g. From {{term|short}} + {{term|cut}}., but when another editor removes it, I don't revert. I do think the "From" is important, because not all of our readers know what "etymology" means. Similarly, I don't use <, because I don't think the casual reader will recognize it, and while in some cases I think the idea comes across anyway, in some cases I think it does not. (I view it as analogous to the various abbreviations, F. and so on, that are found in other dictionaries but that we don't use.) However, I'm fine with +, as it seems crystal clear to me. —RuakhTALK 20:05, 3 November 2008 (UTC)
"not all of our readers know what "etymology" means" Which editors do you mean? Etymology is a loanword from Greek present in almost every Indo-European language (and other languages who are not so reluctant to accept borrowings - エチモロジー ), therefore virtually all editors from Europe, South, Central and North America must know what etymology is, must not they? Bogorm 20:39, 3 November 2008 (UTC)
Just because a person is a native speaker of a language does not mean they know every word in that language, and certainly not that they understand every concept described by that language. To know what "etymology" means requires at least a rudimentary understanding of language evolution, etc. Those of us who are interested in a discipline should not take for granted such knowledge, as plain as it may seem to us. Up until a few days ago, I had no idea what the term "liquidity" meant (and to say that I have an exceptionally solid grasp of the term now would be deceptive), as I have almost no background in economics. I would not consider myself stupid nor generally uneducated (though others are free to disagree :)). -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)
Starting with “From” is good form. It may help a new reader who has never heard the term etymology understand what he is looking at, on his very first page view of Wiktionary. It in no way detracts. Michael Z. 2008-11-04 17:09 z
I agree that starting with "From" is good form. --EncycloPetey 17:15, 4 November 2008 (UTC)
Etymologies for compound words don't warrant much more than we have, unless we decide to include dates for first attested usage of one or more senses. I wonder if we might not expose ourselves to an endless supply of folk etymologies if we make wordy etymologies, especially for compounds. Long, discursive, or disputed etymologies should certainly not consume too much space, especially if they force definitions off the first screen. Such etymologies and two or more lines of cognates especially should normally only appear under a show-hide bar, if cognates be retained at all. DCDuring TALK 20:48, 3 November 2008 (UTC)
I am very adamant that cognates remain, when appropriate. While I share your concern about ten page theses blocking out the heart of the entry, I don't think it prudent to trim the preceding content to a minimum. The answer lies, rather, in altering our formatting/presentation. -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)
What about using show/hide bars for etymological material such as cognate lists or discussions of disputed etymologies that, in total, take more than 3 lines and push definitions off the initial screen (with right-hand Toc)? As you know I like etymologies, including long chains through Middle and Old English and French; Anglo-Norman; Vulgar, Medieval, Late, and New Latin, the loss of visibility of which would greatly sadden me. DCDuring TALK 23:26, 3 November 2008 (UTC)
Since I too like etymological chains through Old Norse, Gothic and Sanskrit, it would sadden me as well. (That was facetious, I support every etymological information) I do not embrace the proposal for the hide bars, since I am firmly convinced that etymology is one of the most important parts of the articles, and however disputed it is, expounding the diverse linguistic theories without concealing any of them is indispensable for a thorough comprehension of the entry's meaning. Bogorm 16:36, 4 November 2008 (UTC)
If we were just running this for our own benefit, I could agree. I'm looking for ways to make this site more useful for ordinary (unregistered, non-contributing, non-linguist) users, by getting onto the initial screen more of the info that is, I think, most commonly sought: 1. definitions and 2. a guide to the definitions and other material that don't fit on the first screen (the Table of contents). Registered users ought to be given the power to have show/hide bars in the sections that they select be expanded by default (a feasible option, BTW). DCDuring TALK 20:08, 4 November 2008 (UTC)
Ever since we introduced Show/Hide bars, my experience has been that average users aren't aware of their function, and overlook their contents entirely. Time and again, I have seen comments made from ordinary users who were surprised when they finally discovered them, and that's just the fraction who discover them. I've been following the addition of Translations to WOTD entries since before the introduction of Show/Hide bars. When these were introduced to the Translations sections, addition of translations by average users plummeted, and this drop has never recovered to its former levels. In short, your position that these tables benefit average users isn't supportable. If we do choose to use them, then they should be expanded by default, and collapsible only as a customization feature available to registered users. --EncycloPetey 20:17, 4 November 2008 (UTC)
On the use of "<" and conciseness: I like the use of "<" to mean "from". Century 1911 uses "<" and "+" in its etymology markup. Unlike Wiktionary, the etymology in Century 1911 is not introduced by "Etymology" heading, and is placed in "[" and "]" instead; and yet the readers of Century 1911 must have managed to learn to read its entries. A new reader of Wiktionary sees a content under an "Etymology" heading, so can quickly look up the word "etymology" in Wiktionary to find out what it means. --Dan Polansky 21:09, 6 November 2008 (UTC)
Headers (of all kinds) take up more space on enwikt than on some other wiktionaries, and other online dictionaries, let alone print dictionaries. The space taken by "from" is almost negligible by comparison. DCDuring TALK 22:20, 6 November 2008 (UTC)
It's not the space taken by "from" for which I prefer "<". I prefer it for its faster showing me the derivation chain: my eye locates the individual elements separated by "<" faster than when they are separated using "from". I understand that one of the reasons why printed dictionaries use terse markup are the space constraints, which are absent in an electronic dictionary such as Wiktionary.
The issue seems to me to be at least remotely similar to the mathematician's preference of symbols to wordy sentences. Mathematical formulas can be phrased in the words of natural language, but when it is done, patterns and structures are obscured. --Dan Polansky 12:38, 7 November 2008 (UTC)

There are many Portuguese words with spelling variations between two main dialects: Brazilian Portuguese and European Portuguese (e.g., "contato"/"contacto", "registro"/"registo", "elétron"/"eléctrão" etc.). So, following the examples of "metre"/"meter", "parlour"/"parlor" etc., I have been adding the related entries and context templates. However, I didn't see up to this date, any context for conjugated or inflected forms. That is, "liters" should be obviously related to "liter", and therefore mainly used in US, right? The problem is: there are a huge number of Portuguese words with more than one inflected or conjugated variants according to each dialect.

  • An "European man" is an homem europeu; but an "European woman" can be both "mulher européia" (in Brazil) or "mulher europeia" (in Portugal).
  • "To love" means amar; but "we loved" means "nós amamos" (in Brazil) or "nós amámos" (in Portugal).

So, should I add context to these inflected or conjugated forms, as I did for amamos? And that leads me to another question: Should I add context to all forms of a word used mainly in just one dialect, as I did for golos, relating to golo? A section "Alternative spellings" at an entry without context for definitions seems incomplete, and an entry with no qualifier seems incomplete at all. Daniel. 20:24, 4 November 2008 (UTC)

Ideally, yes, the context you've added is desirable for all forms of words, including inflections. The alternative spellings and qualifiers are wanted on all such entries. --EncycloPetey 20:28, 4 November 2008 (UTC)
Ok, then I'll continue to use them. And, in this case and based on existing English entries, I suppose that other contexts related to when and where to use specific words ({{archaic}}, {{colloquial}}, etc.) are also desirable, so I will use them too. Daniel. 14:40, 9 November 2008 (UTC)

Currently Wiktionary has the same favicon as Wikipedia. It's not the best solution: Wiktionary needs to be recognised as a separate project and gain recognition, and the current situation may only strengthen the false opinion that Wiktionary is a part of Wikipedia. Apart from that, it's simply inconvenient to have a browser with many open tabs and have trouble telling which come from Wikipedia and which from Wiktionary.

We could use the tiles icon, already used in some situations to represent Wiktionary (like on German Wikipedia or on French Wiktionary). There's a general agreement about such a change on Polish Wiktionary, but we don't want to take unilateral steps, as we would like to see Wiktionaries having a consistent look. I have also written on the German Wiktionary beer parlour and if we decide to go for it, we could write a request on Bugzilla for our three projects together. --Derbeth talk 12:18, 7 November 2008 (UTC)

This has been discussed extensively before. There is no consensus to use the tiles here, although I personally prefer it. My opinion is that Wikipedia should change theirs to the mini globe - but trying to change them would be amusing. Conrad.Irwin 23:42, 7 November 2008 (UTC)
I would personally prefer the tiles to the existing W to differentiate us from Wikipedia. Thryduulf 14:43, 8 November 2008 (UTC)
I would prefer that Wikipedia switched their favicon to a little globe (their standard icon), leaving the "W" for Wiktionaries. The tiles are icky. --EncycloPetey 19:32, 8 November 2008 (UTC)
I like the tile, but also think that WP should change to the globe... -- IrishDragon 02:31, 19 November 2008 (UTC)

See https://bugzilla.wikimedia.org/show_bug.cgi?id=16315 for the bugzilla request for that. --- Best regards, Melancholie 00:21, 21 November 2008 (UTC)

Hello everyone! I had a good idea (tm). The idea behind this bit of javascript is to create "form-of" entries semi-automatically. The basic workflow is that users who have this ticked in WT:PREFS (it's not there yet, because there are some technical details to sort out; it won't be on by default - at least not for many moons) will see green links instead of red links in places where the contents of the non-extant entry can be worked out automatically. They then click on the green link, and click "Save", instead of having to type out a form-of entry. This is intended for use in situations where running a bot is not desirable, but where a little work could be saved anyway. This is ideal for templates like {{en-noun}}, as the creator of a noun page can create its plural in two clicks and no typing. It is less good for words with large inflection sets, but these are probably worth creating a bot for anyway.

Feel free to move this paragraph to WT:GP, I just wanted to group it all together. In order to make this work, I would like to add some markup to the inflection templates that will aid the javascript in working out the format of the entry. Each potentially creatable link will be wrapped in a span with three class names. (These are negotiable, just the ideas I first came up with atm). class="form-of plural-form-of lang-en" (I envisage adding a few more parameters, such as gender). The "form-of" allows a quick lookup of all potentially creatable entries, the "plural-form-of" tells us which form we are to create, and the "lang-en" (although optional for english) tells us which language we are working in. From that it then does a lookup to find an entry-creation template (example for plurals at User:Conrad.Irwin/test) and invokes mw:Extension:AutoEdit to substitute the values of the parameters. I know there are several problems with the code as it is at the moment, and most of them I have commented in there - but if you can see other issues, or a better way of doing this, please let me know. The main reason for this post is to ask permission to add this meta-data to some of our inflection templates. Would anyone mind if I merged {{en-noun/test}} back into {{en-noun}}? Conrad.Irwin 03:22, 8 November 2008 (UTC)
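As a rough illustration of how the three class names described above could be read, here is a Python sketch of the idea only. It is not the actual JavaScript; the function name parse_form_of_classes and the fallback to English when no "lang-" class is present are assumptions based on the remark that "lang-en" is optional for English.

```python
def parse_form_of_classes(class_attr, default_lang="en"):
    """Return (form, lang) for a creatable link, or None if it is not one.

    class_attr is the value of the span's class attribute,
    e.g. "form-of plural-form-of lang-en".
    """
    names = class_attr.split()
    if "form-of" not in names:
        # Not marked as a potentially creatable entry.
        return None

    form = None
    lang = default_lang
    for name in names:
        if name.startswith("lang-"):
            lang = name[len("lang-"):]
        elif name.endswith("-form-of"):
            # "plural-form-of" -> "plural"; the bare "form-of" marker does not match.
            form = name[: -len("-form-of")]

    if form is None:
        return None
    return form, lang


if __name__ == "__main__":
    print(parse_form_of_classes("form-of plural-form-of lang-en"))
    print(parse_form_of_classes("new"))
```

The returned pair is what a lookup of the entry-creation template would need: which form to create and which language section it belongs in.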

I'm all for changes that make editing easier and more efficient. Polyglot 11:17, 8 November 2008 (UTC)

Sounds good to me. Thryduulf 14:37, 8 November 2008 (UTC)

I have put this into WT:PREFS, simply tick "Make red-links to some form-ofs fill out entries automatically." and visit a page with {{en-noun}} and no plural form. A list of such missing plurals can be found at User:Conrad.Irwin/good plurals. I'll update that list to link to the singulars too :). I've tested this under IE6, Konqueror, Opera, Firefox3. Conrad.Irwin 19:38, 8 November 2008 (UTC)
Now works for {{en-verb}} too. Conrad.Irwin 01:54, 9 November 2008 (UTC)
Thanks for this. It appears to work in Firefox 2/Kubuntu in addition to the browsers you've tested in. Thryduulf 04:27, 9 November 2008 (UTC)
This is great, thanks! -- Visviva 06:46, 9 November 2008 (UTC)

When the target page exists is there a way this could detect whether it has the template it would add if it didn't exist? For example, skins is an English plural, English third person singular and Dutch plural. Until I added it earlier today, the skins entry did not have a Dutch section, but unlike the diminutive skinnetje I could not use the "accelerated" method. Thryduulf 23:55, 11 November 2008 (UTC)

That looks like a job for parsing the dump. RJFJR 00:45, 12 November 2008 (UTC)
In theory it is possible, but it would require a lot of effort, both in programming, and also when running (as it would have to load each potential "form-of" page and then parse it to determine which forms were present). Parsing the XML dump would also be quite tricky, though possibly slightly easier, and should probably be the task for a bot. (If someone else wants to implement it, I'm happy for them to integrate it into the script, but I don't have the inclination to implement it myself). Conrad.Irwin 00:56, 12 November 2008 (UTC)
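To make the cost discussed above concrete, here is a minimal Python sketch of the check that would have to run for each existing target page: split the page text into language sections and see whether the relevant section already contains the form-of template. The function names and the template string passed in are illustrative assumptions; the only thing taken as given is that language sections are level-2 headings such as `==Dutch==`.

```python
import re


def language_sections(wikitext):
    """Split page text into {language name: section text} on level-2 headings."""
    sections = {}
    current = None
    for line in wikitext.splitlines():
        # Matches "==Dutch==" but not "===Noun===".
        match = re.fullmatch(r"==\s*([^=].*?)\s*==", line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def needs_entry(wikitext, language, template_call):
    """True if the language section is missing or lacks the given template call.

    template_call is whatever text the accelerated edit would add (an assumption here).
    """
    return template_call not in language_sections(wikitext).get(language, "")
```

Each call needs the full text of one candidate page, which is why this is expensive to do live and is arguably better suited to a dump scan or a bot, as suggested above.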

We have quite a few User pages that either describe the person concerned, or link to an external site (e.g. Facebook). In cases where this is the only edit by the user, is it OK to delete the user page (after an interval to allow for slow editing)? SemperBlotto 08:36, 8 November 2008 (UTC)

I have no problem with deleting userpages which link to personal websites from users whose sole edit is that userpage (such as the user Gauss pointed out). However, to take some examples, I have absolutely no problem with Ruakh noting his educational background nor Robert Ullmann noting his history with RFC's. Perhaps more controversially, I don't have a problem with SemperBlotto's and Alifshinobi's links to their personal webpages, nor User:ArielGlenn/Personal, nor Connel putting his personal email on his userpage. While I adamantly support our editors' rights to anonymity, those who choose to disclose their real names and whatever else could be considered to be doing Wiktionary a service in providing relevant information. People might rightfully wonder who is writing the dictionary they're reading. The key difference between the first user and the latter group is that every editor I've noted by name is an important editor here, with a significant contribution to this project. If we simply said that users without substantial contributions were prohibited from linking to personal webpages, I think that our admins generally have the good sense to make judgment calls on this sort of thing. If we wanted a more concrete rule, we could perhaps set a requirement of 100 edits to the main namespace which do not get reverted. I rather doubt that folks looking to use us as a Myspace would go to the trouble of fulfilling that requirement. -Atelaes λάλει ἐμοί 08:59, 8 November 2008 (UTC)
OK - If someone knows how to generate a list of such pages (let's say zero other edits apart from User and initial User talk page, and over a month old (as a start)) then I'll see about pruning them. SemperBlotto 11:30, 8 November 2008 (UTC)
In this case I concur completely with User:Atelaes that Wiktionary should not imitate w:Myspace - for many reasons, one of which is that the latter is a regional network from Septentrional America and Wiktionary is meant to have a world-wide audience. Therefore I support the proposal about 100 edits in the main (and Citations ? Why not?) space or a bit more. Bogorm 13:35, 8 November 2008 (UTC)
I'd say about 100 constructive edits to the content (main, citations, rhymes, Appendix, Wikisaurus, concordance*, transwiki*) or Wiktionary namespaces (or their talks) would be a useful guide. I wouldn't be completely rigid on it though, for example if a user starts with their userpage and then steadily (but not necessarily quickly) makes good edits then I would let it stand. I'm not completely certain about those namespaces I've asterisked though.
I think that all users (except obvious vandals) should be given a month's grace, and after that time I think it should be easy to classify them into three groups that we should handle differently:
  1. Good contributors - those who've reached the 100 edit mark or look set to shortly. These users should be afforded the leeway given to the established users namechecked above.
  2. Non-contributors - those who've clearly displayed a lack of interest in significant contribution. The proposal by SB should be applied to these users.
  3. Others - those who don't fit into either category. An individual approach is probably best with these users, maybe giving them more time to see if it becomes clearer or talk to them. Thryduulf 14:35, 8 November 2008 (UTC)
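For anyone generating the list of pages requested earlier, the three groups above could be sketched roughly as follows. This is only an illustration in Python: the 100-edit mark and the month's grace come from the discussion, while the function name, the `on_track` flag, the 30-day figure for "a month", and treating "no constructive edits at all" as the test for a non-contributor are assumptions.

```python
GRACE_DAYS = 30          # "a month's grace" (30 days is an assumption)
GOOD_EDIT_THRESHOLD = 100  # "about 100 constructive edits"


def classify_user(days_since_registration, constructive_edits, on_track=False):
    """Sort a user into one of the groups proposed above.

    on_track is a human judgment ("look set to [reach 100] shortly"), not computed here.
    """
    if days_since_registration < GRACE_DAYS:
        return "grace period"
    if constructive_edits >= GOOD_EDIT_THRESHOLD or on_track:
        return "good contributor"
    if constructive_edits == 0:
        # Assumption: zero constructive edits stands in for "clearly displayed a lack of interest".
        return "non-contributor"
    return "other"
```

Anything that lands in "other" would still need the individual approach described in point 3.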
While I can follow and agree with the reasoning given, there is one point about which I'm hesitant. I've seen some situations where an editor goes a month or more between bouts of editing. In some of these cases, it is useful to attempt to contact the editor regarding some bit of information to be verified. For some of these editors, the best way to resolve the situation is to have an alternative means to contact them, which these outside links do provide. While this situation doesn't happen all that often, it has come up on at least three occasions for me, and there are others where it would have been really nice to have a user page with contact information. So, I'm not sure I would set the number of edits as high as 100 in cases where the editor has added new entries. --EncycloPetey 19:31, 8 November 2008 (UTC)
If they haven't contributed at all, they should be deleted. If they've contributed, even only a bit, let them keep some reward. Conrad.Irwin 19:57, 8 November 2008 (UTC)
I agree, except where the userpage is excessive.—msh210 08:11, 14 November 2008 (UTC)
I haven't had any particular issue noting whether or not a page is essentially SPAM or someone contributing. Seems rather obvious most of the time. If you want to count contributions, either count globally, or look at their "home" project; they may be very active elsewhere, and simply have a copy of their user page here.
Rather than deleting the page, just add the wikimagicword __NOINDEX__ to it; Google and friends will ignore it, and not count links from it. Robert Ullmann 10:49, 9 November 2008 (UTC)
I have made a start by creating {{vanitypage}} and added it to User:Magwizshiz. It really needs to add the page to some sort of category so that we can easily keep track of them. (p.s. That list of pages would be nice) SemperBlotto 13:02, 12 November 2008 (UTC)
I suggest giving users fair warning before deleting any User pages. Maybe give the warning in their discussion pages? If someone deleted my user page without warning, I would be offended. --AZard 02:50, 22 November 2008 (UTC)

I'd like to propose separating derived terms (i.e. terms that vary the spelling of the original words) from derived phrases (i.e. set phrases and/or idioms that include the original word). By way of example, the current listing of derived terms in beauty is:

Everything up to beautify is a different word, a word that takes part of beauty and adds a different suffix or suffixes to it. Everything after that is a phrase, "beauty X" or "X beauty". I think these should get different treatment in definitions because the process of forming a true derivation is so different from the process of making a phrase. bd2412 T 09:46, 8 November 2008 (UTC)

I'd keep them together. Separating the lion cub from the lioness would be too cruel. And, very often, both spellings exist, with and without space. There is not so much of a difference... However, complete sentences (beauty is in the eye of the beholder, beauty is only skin deep) are a different case, and should be elsewhere, in my opinion. Lmaltier 09:53, 8 November 2008 (UTC)
What is the proposal?
  1. to add a new "Derived phrases" header at the same level as "Derived terms"?
  2. to add a new "Derived phrases" subhead below "Derived terms"?
  3. to group the terms under existing headers, possibly using {{rel-top}}?
The last costs the least amount of vertical screen space and requires no Vote AFAICT. There are sometimes principles other than what you mention that afford useful bases for grouping related and derived terms, many of which could be (are now!!!) accommodated under option 3. Why isn't option 3 sufficient? DCDuring TALK 10:45, 8 November 2008 (UTC)
There was a time when I would have agreed with this proposal, but since that time I've seen too many odd problems and cases like what Lmaltier has noted. Look at the derived terms (first section) for time, where a single term may be variously spelled with a space, without a space, or with a hyphen. This proposal would put timescale and time scale on separate lists, which makes no sense lexically. --EncycloPetey 19:26, 8 November 2008 (UTC)
A solution to the requirement of putting timescale and time scale on one list is to define the second list as those terms that contain at least one additional stem. Thus, both timescale and time scale end up in the second list, while timely and timeless in the first list. This rule still separates lion cub from lioness, though. A good heading title for the second list is unclear to me; what about "Compound terms"? (If the headword is already a compound term, the heading is inexact a bit.). --Dan Polansky 21:54, 8 November 2008 (UTC)
(after edit conflict; mostly a dup of Dan's comment) Is it possible to separate {words derived using affixes and whatnot} from {phrases and compounds that come from other words as well as this one}? I think that should give "timely", "timeliness", "untimely", "betimes", "timing", etc. pride of place, while appropriately demoting "time[ ]scale", "lunch[-]time", etc. —RuakhTALK 21:56, 8 November 2008 (UTC)
But that's only a partial solution, because it doesn't consider all the other ways in which derivations happen, such as shortening by means of abbreviations, contractions, elision, etc. Please look at the whole gamut of possibilities. --EncycloPetey 22:03, 8 November 2008 (UTC)
So a different take: group A: all derived terms; group B: group A minus group C; group C: all the terms that can be obtained by appending or prepending a word or more words to the headword, regardless of whether the separation sign between the headword and the newly affixed word or words is (i) absent, (ii) hyphen, or (iii) space. I do admit that I am unaware of the full spectrum of derivation possibilities, so this may possibly be a rather naive proposal. Still, AFAICS one way of derivation is assigned to the group C, and all the rest to the group B. --Dan Polansky 22:27, 8 November 2008 (UTC)
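The group C rule above can be stated as a small test. The following Python sketch is only an illustration under stated assumptions: `known_words` is a hypothetical set standing in for "is this a word?", and the function names are made up. It checks whether a term is the headword with one or more words appended or prepended, with the separator absent, a hyphen, or a space.

```python
import re


def _all_words(rest, known_words):
    """True if the leftover part is one or more known words (split on hyphens/spaces)."""
    rest = rest.strip("- ")
    if not rest:
        return False
    return all(piece in known_words for piece in re.split(r"[-\s]+", rest))


def is_group_c(term, headword, known_words):
    """Group C: headword plus appended or prepended word(s), joined by '', '-' or ' '."""
    term, headword = term.lower(), headword.lower()
    if term.startswith(headword) and _all_words(term[len(headword):], known_words):
        return True  # e.g. "timescale", "time scale", "time-scale"
    if term.endswith(headword) and _all_words(term[: -len(headword)], known_words):
        return True  # e.g. "lunchtime", "lunch-time"
    return False
```

Under this sketch "timescale" and "time scale" both fall in group C, while "timely" and "betimes" fall through to group B, since "ly" and "be...s" are not words in the assumed list. It says nothing about abbreviations, contractions, or the other kinds of derivation raised above.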
But you've only modified your proposal to accommodate the one specific problem I mentioned without looking for any others. Here is another situation your proposal does not deal with: how Related terms would be affected. Consider replacement of one affix with another. On the entry for timely, where would timeless go relative to timeliness? And would time be in a separate list of its own, since it would be the only entry related strictly by removal of a suffix? I used this Related terms example mostly because I can't offhand think of a Derived terms example, but I know this situation exists for Derived terms as well. And this isn't the only additional problem. Before making a sweeping proposal for formatting Derived terms, I'd want to know that the proposed solution has been thought through first, which your proposal has not. Otherwise, we end up having to make many further revisions to work already done. --EncycloPetey 00:39, 9 November 2008 (UTC)
Re: "But you've only modified your proposal to accommodate the one specific problem I mentioned without looking for any others." I am afraid that is correct. I do not understand the problem with related terms that you have just mentioned, but I guess I had better stay out of the discussion at this point, as the knowledge that I have of the topic of derivation of words is too limited to allow me to show that all kinds of not-yet-mentioned problems that could possibly arise have been considered. --Dan Polansky 10:46, 9 November 2008 (UTC)
Let me add: If your objection was that this proposal artificially creates a dedicated heading for one class of derivations while leaving all the other classes of derivation unmentioned, assigning them the implicit class "miscellaneous", then my answer was off the track, and I do not have any reply to this objection. Just like some other people, my experience is that the class labeled by me as C typically gets much longer than the class B, the miscellaneous derivations, so its separation could be worth it. --Dan Polansky 22:46, 8 November 2008 (UTC)
How about creating two lists: one for the longer "sentence-like" phrases, and another for the rest. It's easy to find what we are looking for if they are sorted alphabetically. --Panda10 22:56, 8 November 2008 (UTC)
Well, certainly it is a stretch to say that beauty is in the eye of the beholder and beauty is only skin deep are "derived terms". I would agree with an additional level of subheaders below a "Derivations", with separate between (a) true derivations, (b) derived phrases including compound words which may or may not use a space, and (c) idioms which necessarily include the headword. bd2412 T 04:06, 9 November 2008 (UTC)
I think this conversation would be helped by someone experimenting slightly with some entries that have many "Derived/Related Terms". New subheadings and stuffing previously-visible terms in hidden boxes would be too much, but be bold otherwise. Tinker with table column headers or create separate (vertically stacked) {{top2}} tables for the different proposed groupings. --Bequw¢τ 10:21, 10 November 2008 (UTC)
Keep it simple! Lmaltier 21:11, 14 November 2008 (UTC)

There are several words I've seen recently with alphagrams, correctly placed in the anagrams section. As there is no standard presentation, however, they are shown in several different ways.

I've just created {{alphagram}} to try and resolve this. The formatting is simple, e.g. for word:

* {{alphagram|dorw}} 

gives

The ELE says that alphagrams should not be linked, unless it is also a word, e.g. the alphagram of tar is "art". In this case just wikilink the parameter:

* {{alphagram|[[art]]}} 

gives

Does anyone object to using/mandating this formatting?

Also, I think that for consistency, the alphagram should always be placed at the end of the anagrams section. If the template is used, this should be a trivial task for AF. Does anyone have any comments? Thryduulf 15:30, 8 November 2008 (UTC)
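For reference, an alphagram is just the letters of the word in alphabetical order, so the template parameter can be computed mechanically. Here is a minimal Python sketch; the function names and the `known_words` set are hypothetical, and the wikilinking rule follows the ELE description quoted above (link the alphagram only when it is itself a word).

```python
def alphagram(word):
    """Letters of the word sorted alphabetically."""
    return "".join(sorted(word.lower()))


def alphagram_wikitext(word, known_words):
    """Build the template call, wikilinking the alphagram only if it is a word.

    known_words is an assumed stand-in for "an entry exists for this spelling".
    """
    letters = alphagram(word)
    if letters in known_words:
        return "* {{alphagram|[[" + letters + "]]}}"
    return "* {{alphagram|" + letters + "}}"
```

With these definitions, `alphagram("word")` is `"dorw"` and `alphagram("tar")` is `"art"`, matching the two examples in the post.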

I like everything, but think the alphagram should be placed first. It will, after all, always be first alphabetically by its very nature, and in some cases will be a word itself. --EncycloPetey 19:21, 8 November 2008 (UTC)
I wondered about that, but thought that real words should be given preference - the alphagram is only a real word in a tiny minority of cases. What do others think? Thryduulf 22:44, 8 November 2008 (UTC)
If the method is place it at the end, then you have the case where it is a word, and placed first, and when it isn't, it is at the end. Should just be first. (however, the whole thing seems rather pointless to me) Robert Ullmann 10:42, 9 November 2008 (UTC)
i like the alphagram placed first since that was the format in the ELE. either way is fine. confession: i changed the alphagram example to the template format before i realized that a vote was needed to change the ELE. sorry about that. --AZard 04:25, 13 November 2008 (UTC)
what is AF? --AZard 15:18, 14 November 2008 (UTC) (signing after the fact.)
User:AutoFormat.—msh210 08:04, 14 November 2008 (UTC)
so, after a decision is made on the location of the alphagrams, an AF bot will make the changes and avoid human manual effort. am i understanding correctly? --AZard 15:18, 14 November 2008 (UTC)

When editing pages, we should be able to go to another language's analogue. As well as discussion pages and history pages. This goes for all wiki projects (wikimedia/wiki media projects). 96.53.149.117 22:52, 8 November 2008 (UTC)

I don't understand what you're saying. What situation prompted this comment? --EncycloPetey 23:01, 8 November 2008 (UTC)
My guess is that you are asking for interwiki links to be displayed in the sidebar when editing a page as well as when viewing it (for example if you are editing elastic then you should see the interwikis to fr:elastic, de:elastic, pl:elastic, etc.)? If this is the case, then there is nothing we can do about this here, and you will need to make a feature request at https://bugzilla.wikimedia.org/. If you want your request to have any chance of being acted on, then I recommend that you explain clearly what it is you want and why you want it (if English isn't your first language, then I suggest asking someone to translate it for you). Even this though will be no guarantee that it will get done, as what is acted upon and what isn't is seemingly entirely down to the developers' whim. Thryduulf 04:02, 9 November 2008 (UTC)

{{nrm}} (a language template) is currently set as "Norman." However, nrm is the SIL code for Narom. Now, we have a Norman Wikipedia (with the prefix nrm, note). So, I'm assuming that the Wikimedia language council designated nrm as the code for Norman, as it did/does not have an official such code. My assumption is that we would give SIL's codes precedence over WM's internal code-set, and so {{nrm}} should be changed to Narom, and Norman be made an orphan language (i.e. not having a code). We do have {{xno}}, but that's not the same thing. Thoughts? -Atelaes λάλει ἐμοί 06:55, 9 November 2008 (UTC)

The "language committee" is screwed up. I don't know why. They set up Swiss-Deutsch as als, which is Tosk Albanian, instead of gsw. They (and in this case specifically Gerard) refused a Jèrriais request, saying it didn't have any ISO code, and allowed Aromanian as roa-rup in spite of its having a perfectly good ISO code rup (which they must have known, why else would they come up with "rup" as a code?!) And now they create Norman as nrm, which as noted is an allocated code, instead of roa-nrm, which would make sense. I don't follow this at all; short of assuming actual brain-damage, what can be going on?
We should use nrm=Narom, and ignore them. If they create a wikt, it would get a little more complicated, but we already compensate for a few other idiocies. (yue->zh-yue, etc) If there is a wikt, and we have Narom entries, it will take a little care. In the mean time, anyone want to ask WMF WTF? Robert Ullmann 10:35, 9 November 2008 (UTC)
I agree. The Wikipedia article for Norman language used to indicate nrm as the ISO code, until I pointed out that this was not, in fact, the case. I don't know where the idea came from but we should definitely not be perpetuating it. (I just wish SIL would actually assign Norman a code.) Ƿidsiþ 10:41, 9 November 2008 (UTC)
I'm afraid that fr.wiktionary is perpetuating nrm as the code for Norman. Something should be done. For roa-rup, I think I can answer: at the time, rup was proposed as the ISO code, but was not an official ISO code yet. This is why they chose roa-rup. Lmaltier 09:13, 10 November 2008 (UTC)
Similarly, als.wp was NEVER "Swiss German"; at first it was Alsatian, and was later enlarged to all Alemannic dialects. However, the dialects in question are currently spread across more than one code by ISO-639-3 (gsw is Swiss German, wae is Walser German and swg is Swabian; it's not clear if gsw alone can represent them as a whole). A Swabian Wikipedia request exists, but is not on the requests page, for some reason. Circeus 22:34, 15 November 2008 (UTC)

Empire State Building, New Brunswick, purple gas and many other compound terms often have their components linked in either or both the etymology (Empire State + building) and the declension line's headword (Empire State Building).

I suggest that linking only the etymology is preferred:

  1. Redundant links reduce the clarity of a web page, so it is better to choose one preferred linking site
  2. The headword in the declension line is the heart of the entry, and should draw the eye by remaining black (it's the reader's destination, not a jumping-off point)
  3. The nature of simple and compound links is clearer when they are separated by unlinked punctuation (+), which is done in the etymology
  4. Link targets are clearer when they can be linked without pipes, which is done in the etymology, e.g., lowercase “building” in the above example
  5. Affixes, etc, can be linked in the etymology, e.g., “New Brunswick + -er” (in New Brunswicker)

Linking components of the headword is a poor substitute for even the most rudimentary etymology.

Whatever we decide, it should be incorporated into WT:ELE and WT:LINKS. Michael Z. 2008-11-09 18:20 z

Both. If links don't cost much in performance terms, then why not have them both? In the case of an entry like [[Empire State]] [[building|Building]], I would not think that we gain much from having a separate "etymology", but certainly wouldn't object and might add it.
  1. Redundancy. Users may have the cursor or their attention near one or the other. Users simply may develop the habit of not looking at Etymology (or alt spellings, or pronunciation, or translations, or, even, definitions) because the section doesn't meet their needs. I don't personally experience the lack of clarity, but would welcome evidence or authority on the question.
  2. Bold, black heart of entry; destination vs. jumping-off place. After a user has landed on an entry and confirmed that it is right place for which purpose bold must help, the entry stands on its own. I would argue that much bold is then irrelevant or distracting. But then, as the user generates further questions, it becomes a jumping-off place for further answers. Then the bluelinks or redlinks provide good information and proximity to attention or cursor helps.
  3. Clarity of links. True.
  4. Pipes bad. We aren't going to be forbidding them. They just take a few keystrokes. What harm? Etymologies have a problem too, on occasion. The ability to link to lemmas in etymologies is sometimes only achieved by having two terms, one for the headword, another for the stem. (eg record < Template:.... < cord-, stem of heart DCDuring TALK 19:39, 9 November 2008 (UTC)
  5. Affixes. I would put in an Etymology every time if the affix is at all interesting. But in cases where it is just the common senses of -s, -es, -ed, -ing, -er, we don't miss much by having them "lost" in pipes, afaict.
Both. As a general rule I agree with you that it's a bad idea to have redundant links, but that is somewhat mitigated by our page naming scheme and linking style: a link that says "book" is a link to our entry for book, etc. (I think the big problem with redundant links on other Web sites is that it's not obvious that they're redundant.) It's also mitigated by our use of distinct colors for visited and unvisited links (which all sites should have, but many don't). I realize that "somewhat mitigated" isn't a strong argument for something, but I think it's good to be somewhat consistent. We basically always linkify components of multiword headwords, and I like that we do. —RuakhTALK 21:00, 9 November 2008 (UTC)
1. “...why not have them both?”—why not link every word? When an interface does everything, any added redundancy makes it a bit worse. “Choice is good” leads to cluttered interfaces which overwhelm and confuse the reader (e.g.)—take that too far, and you get Microsoft Word (good functionality, but how many users feel they comfortably understand its interface?). Avoiding redundancy is a good design principle in general, and also specifically in web page/software interface design.
4. I didn't necessarily mean that pipes are bad for editors, but that they make the link target less clear to the reader, especially when e.g. building and Building are absolutely different. The etymology already clarifies the roots overtly, so why not rely on those links going to exactly the same terms?
New Brunswicker derives directly from New Brunswick + -er, Empire State Building from Empire State + building: the etymological expression makes that clear, and you know exactly what the links lead to. So why would we link New, Brunswicker, and Empire in another context? Is this an alternate etymology? If the reader has developed the habit of not looking at the etymology, do we consider this an adequate substitute?
What exactly is the function of links in the headword? If we insist that it is a standard element of the entry, then we should be able to clearly define it. Michael Z. 2008-11-10 03:26 z
The function is to provide a convenient way for a user to link to components of the headword (if they exist) that is near the definitions (which I assume to be a focal point for users) whether or not there is a proper etymology header. If there is no meaningful etymology other than the components, then it enables about a three-line reduction in the vertical space taken by the etymology in the precious screen real estate above the definitions, which space may make more usable information visible on the initial screen for the entry.
The "New Brunswicker" instance is the one that I have no good way to handle with bluelinks. The link to "Brunswicker" in the inflection line seems undesirable. DCDuring TALK 03:48, 10 November 2008 (UTC)
Are you saying that the etymology section should be left out for most compound words? If so, then that should be clearly spelled out in WT:ELE, but I think a clear etymology would be preferable, and it is likely to be added eventually for the sake of other details like attestation date, etc.
Anyway, this aspect of the interface needs some focus. I'd rather see links preferred in the etymology, and only present in the headword as a temporary expedient if the etymology isn't present. Michael Z. 2008-11-10 06:58 z
I agree with this. IMO headword links are at best a necessary evil; they are distracting and opaque to most users (particularly since only a small fraction of entries will have them). As I see it, headword links should be used only a) when there is no Etymology (or plausible basis for one), and perhaps also b) when it is constructive to link the constituents in a way other than is done in the etymology. -- Visviva 09:24, 10 November 2008 (UTC)
Just the headword, in general. However there may be some value in including an etymology section with entries like Empire State Building to make clear that it's {Empire State} Building and not Empire {State Building}. Ƿidsiþ 07:37, 10 November 2008 (UTC)
Both - In addition to the redundancy points already raised, I'll note that in languages with inflections, the etymology section typically links to the lemma form of the etymological origin, while the inflection line links to the specific forms used to construct the word. I suspect something similar happens in some Asian languages. Additionally, the etymology often includes additional text or other information. Where it doesn't (yet), it probably should have that information added. The inflection line is much cleaner visually, so it is easier to see and follow component links. --EncycloPetey 04:10, 12 November 2008 (UTC)
Can you link some examples? Michael Z. 2008-11-12 06:31 z
Multi-"word" entries are a minority to begin with, so the only example I can find at the moment is homo nulli coloris, which doesn't have the best formatting to begin with, but which may serve as an example. --EncycloPetey 05:42, 13 November 2008 (UTC)
Yikes, this is a good example (of bad page usability). Redundant links are best avoided, because a reader can click a link, go back, then click a different link and momentarily wonder why he's not at a different place. Piped links can make this worse, because the same text linking to different pages, or different text linking to the same page, can be downright confusing. This example has it all, including links to nullus, nulli, and “nulli” > nullus.
I'd still suggest that the best way to avoid this situation is to discourage inflection-line links when there is an etymology (where one can write inflections or lemmas, and so there is less temptation to pipe links). A less desirable alternative might be to discourage pipes except where they avoid red links.
Page editing happens in a piecemeal fashion, so it is easy for situations like this to arise. We need simple guidelines to help prevent this. Michael Z. 2008-11-13 17:50 z
But de-linking the inflection line hurts all the inflected forms, which usually do not have an etymology section, and even for the singular the etymology is often so trivial that it isn't added: see number line and number lines for examples of this. I'm not sure whether simple guidelines can be drawn up that will cover more than just English. The needs of various languages are so disparate. --EncycloPetey 18:46, 13 November 2008 (UTC)
But the inflected forms have a prominent link to the lemma, which is where the actual etymology lives. Linking the inflection-line components here actually distracts the reader from this and other detailed information in the lemma entry. Instead of a cogently-written and relevant etymology, the reader is being presented with more opportunities for random dictionary surfing. (E.g. number lines has exactly one valuable link to the lemma with full definition, and potentially an etymology—clicking number or, take your pick based on an editor's whim of, line or lines here skips the useful information.)
Apart from this, I'm glad to recommend linking the inflection-line components in entries lacking an etymology. However, I think in every case an etymology which is written by an editor will be superior to an “etymology” composed of links added by default.
The homo nulli coloris example also shows how the writer of an etymology tends to explicitly name the relevant forms, and routinely link them. In this case the inflection-line links are not only completely redundant, but also less clear; they simply detract from the entry's quality. Michael Z. 2008-11-13 19:09 z
You're entitled to your opinion, but at least three of us have explicitly disagreed with you on that point. It's all very well to claim that it's simply distracting, but that's merely an unsupported opinion. I find such circumventing links positively helpful for some situations, for exactly the same reasons that you dislike them. Ultimately then, this comes down merely to preference. --EncycloPetey 19:31, 13 November 2008 (UTC)
That we disagree doesn't mean that both options are equally good, and it certainly doesn't mean that we should just keep doing it both ways in random entries, instead of agreeing on some rationale for this.
What it does mean is that the form of the central element of every Wiktionary entry remains unresolved as a part of the page design. Michael Z. 2008-11-14 19:33 z
That is why I pointed out that the majority opinion disagrees with you. --EncycloPetey 19:43, 14 November 2008 (UTC)
We make decisions by consensus, no? I pointed out a number of concrete strengths of linking in the etymology. Several counter arguments seem to be along the line of “I agree that redundant links and pipes can be bad, but I like these links, and in this case they might be good.” I don't see a cogent argument in favour of this alternative, or a well-stated rationale for it, so I'd like to continue the discussion until we can agree on something. I'll try to examine the various examples systematically to help formulate a realistic picture of the pros and cons of each. Michael Z. 2008-11-14 20:49 z
Here is an example of a Latin proverb entry, where the links are positively better done in the inflection line than attempting to do so in the Etymology section: tantum religio potuit suadere malorum --EncycloPetey 02:14, 14 November 2008 (UTC)
This “etymology” is only an English translation of the Latin proverb, and it is somewhat redundant with both the translation in the definition, and the identical gloss in the quotation. A more comprehensive etymology would source and gloss each word, making the inflection-line links constitute still more redundancy in this entry. Michael Z. 2008-11-14 19:33 z
Look again, the etymology is giving the literal meaning, and because this is the English Wiktionary, that is most easily done with English. Explaining the discrete meaning and grammar of each word is not the etymology of a proverb; the etymology is the context and literary origin. Also, you did not read carefully, as the translation of the quotation is not the same as the literal translation in the etymology. A literal translation is not necessarily appropriate for translating passages of text. Please re-examine the entry. --EncycloPetey 19:42, 14 November 2008 (UTC)
But wouldn't an etymology of a foreign proverb at least have to explain the component terms? Does this apply to other phrases or expressions? Are there any guidelines or external references about how to formulate etymologies for proverbs? Not sure about proverbs, but in such a case the inflection-line links would seem to simply emphasize a sum-of-parts picture of a phrase, rather than a technical etymology. Michael Z. 2008-11-14 20:49 z
Why not show me how you would write the etymology for let the cat out of the bag? This isn't a proverb, so the proverb issue won't confound the question, and it currently has no etymology section to bias the discussion. --EncycloPetey 21:01, 14 November 2008 (UTC)
I'm not sure whether that would make sense for such a clear phrase, so I made an initial attempt for die Katze aus dem Sack lassen (compare). I wouldn't exactly call it elegant, but with the glosses it is significantly clearer than clicking on each inflection-line link and then trying to locate the correct sense. This is probably best demonstrated with a language one can't read at all (while the German is fairly obvious to me on its own). Michael Z. 2008-11-14 21:51 z
Where would an explication of the idiom go? (Not that we consistently have such explications.) DCDuring TALK 21:59, 14 November 2008 (UTC)
Mzajac, the etymology you've set up tells the meanings of the individual words, but doesn't give any etymology for the idiom. --EncycloPetey 22:24, 14 November 2008 (UTC)
Well, then I don't know the etymology of this idiomatic phrase, and I'm not familiar with how this is done. But isn't this a better way to explain the components, and wouldn't it be a useful supplement to an overall etymology of the phrase? Michael Z. 2008-11-14 23:12 z
Now you're beginning to see: The meanings of the component words aren't actually etymological information in many cases, but are instead supplementary information. There is no compelling reason to put such non-etymological information in the Etymology section. This is one of the unstated reasons why I don't like trying to shoehorn this information into the etymology section. Yes, the etymology section has room to explain grammar, translations, and to list both inflected and lemma forms, but this makes it visually hard to follow. Links from the inflection line directly to lemmata cut through all that. Additionally, explaining the component terms as a substitute for the actual etymology may discourage editors from adding the Etymology, since they will see that a complicated section of text already exists. --EncycloPetey 23:19, 14 November 2008 (UTC)
I am starting to better understand etymologies of longer expressions.
But the notion that some useful information should be left out of the etymology, because it will discourage editors from improving it, seems highly speculative. If an editor doesn't have a better notion of the expression's etymology, then he won't add it, regardless of what's there.
And the nature of the wiki is such that it's all going to be added sooner or later, next week, or next year, or long after we're gone. Isn't it better to work out how to add this information gracefully, rather than ignore the possibility? Michael Z. 2008-11-15 21:38 z

So, I know this conversation's been had before, but I can't for the life of me find it. If I missed the conversation where we came to a definite conclusion on this topic, please direct me to it, and I'll shut up. Otherwise, here goes: A number of editors are taking citations and dumping them all on the citations pages of entries. I am very much opposed to this, and I know that a number of others share this view. I would like to change our policy, something to the effect of:

"In addition to providing evidence of usage and existence through time, quotations also provide the ideal example sentences. In general, the ideal state is that each sense of a word be followed by a quotation which illustrates the sense in question. Additional quotations should be placed on the citations page, in order to maintain a focused entry. Note that quotations used as example sentences can be duplicated in the citations page. Pages which are very simple and would not be bogged down by additional quotes (e.g. only a single sense, few or no translations, derived terms, etc.) may have up to five citations listed in the entry itself. This is especially true of words which are rare, archaic, new, or otherwise can benefit from their existence being proven by quotations."

Additionally, I think the example given (mauve) should be made to conform to this. Of course it goes without saying that I'm not set on the details of the above proposal, but would like to see something that retains some quotes within the entries (but then, why did I say it?). Additionally, there's an interesting little convo on the talk page of WT:CITE concerning inflected forms that interested parties might care to peruse (whichever sense of the word works for you, the hypercorrect one, or the other :-)) -Atelaes λάλει ἐμοί 09:42, 10 November 2008 (UTC)

I am in general agreement with this. I think there is a useful distinction to be made between illustrative citations and probative citations. I expect just about everyone would agree that illustrative citations -- those which are chosen as representative of actual usage in context -- should be kept on the entry page, unless perhaps they are being replaced by even better citations. On the other hand, probative citations -- those needed to prove that an entry meets CFI, or perhaps to test some hypothesis about a word's history and range of use -- are a bit trickier, particularly if the cites in question are long and messy and add little new information. Personally I would still prefer to keep probative cites in the entry unless they are really obstructive (too long, too numerous), for the simple reason that our definitions of senses are always in flux, and Citations pages for non-monosemous words will inevitably get out of sync with their entries over time. -- Visviva 10:35, 10 November 2008 (UTC)
I find that adding too many quotes to the entry pages makes them harder to read, as there is that much more irrelevant text. I don't mind a quote per definition, to illustrate usage, but any more than that should be moved to the cites page where those who are interested can find it. If people aren't looking there, then maybe we should reword {{seeCites}}. Conrad.Irwin 10:55, 10 November 2008 (UTC)
We do still want to serve ordinary users, don't we? From that perspective, it seems to me that the best probative citations are those that are also good usage examples. Unfortunately most probative citations are not especially good as usage examples. However, all but the very worst (eg, some from Usenet) are better than no usage example at all. Often, the at-least-two-lines-long nature of citations (in contrast to the one-line usage examples) pushes important content off the initial screen. Also, citations often miss specific problems that users have that can be better addressed with constructed examples. Perhaps what we need is a link format that goes directly to sense-specific citations-page sections from the sense. (I know. I know. Synchronisation issues. No technology magic for that?) DCDuring TALK 13:19, 10 November 2008 (UTC)
I too would like to emphasise the distinction between usage examples from ordinary sources and literary, hence longer, citations (especially when stemming from poetic works). That is why I advocate keeping the concise usage examples in the main entry and putting the citations on the appropriate page, as in mauve (although in this entry there are no ordinary non-literary usage examples). I find the structure given (mauve) exemplary and support the current Wiktionary:Citations policy. Bogorm 13:46, 10 November 2008 (UTC)
I tried to clarify what I thought of the situation at Help:Citations, Quotations, References - does that ring true to anyone else, if not could it be fixed? Conrad.Irwin 16:10, 10 November 2008 (UTC)
  1. One idea not expressed on that page is that, if an entry is short enough so that no (English) definition is pushed off the page thereby, attestation-type quotations could (should!) be on the main page, not the citation page.
  2. Another is that no citation of reasonable quality should be removed from a sense if doing so would leave the sense without any usage example. The requests for usage examples persist on talk pages, feedback, and elsewhere. DCDuring TALK 16:52, 10 November 2008 (UTC)
That page makes it sound like {citations} ∩ {quotations} = Ø, but I always took it as {citations} = {quotations} or perhaps {citations} = {quotations} ∪ {references}. —RuakhTALK 17:53, 10 November 2008 (UTC)
Yes, Atelaes, the removal of quotations to those damned citations pages has become a major peeve of mine as well, and more than once I've considered leaving the project because of it. I've added thousands of quotations and am careful to pick quotations which illuminate the meaning. It really rankles me to see good quotations exiled to those gulag subpages (which are sloppily maintained and which require redundant maintenance of the senses when there are multiple senses, and which are never going to stay in sync with their main pages). The real solution is to do it the way the online OED does it and to implement collapsible quotations boxes which work between senses on the main entry page (the code for which was developed by the ever-capable Ruakh about a year ago). -- WikiPedant 17:01, 10 November 2008 (UTC)
Hear, hear. Ƿidsiþ 05:55, 13 November 2008 (UTC)
Why not? DCDuring TALK 09:13, 13 November 2008 (UTC)
Agreed. It is inexplicable that those quotation boxes still haven’t been instituted; their use would:
  1. Remove the problem of Citations-page and Quotations-section synchronisation;
  2. Create clearly-visible grey bars between the definitions, demarcating each sense and drawing the reader’s eye to them;
  3. Cut down the amount of space taken up (or, as their detractors would say, wasted) by our citations to about one line per sense; and,
  4. Allow the simple categorisation of all pages with citations, and those without.
In light of the above, why is there so little enthusiasm for their use?  (u):Raifʻhār (t):Doremítzwr﴿ 20:46, 16 November 2008 (UTC)
In the online OED, clicking the "quotations" button at the top of the entry toggles the visibility of all quotations. IMO the ideal system for us would be similar: a JS button on every page (or at least every page with citations) that toggles the visibility of all interlinear citations on the page. The default setting could be partial visibility -- showing the quotation but not the source, so that the default user gets examples without a lot of ISBN numbers and whatnot; the user could then choose to either show all info or hide everything. I was fiddling with this a while back, but didn't get far (AFAIR, I could make it work in Firefox but nowhere else). -- Visviva 00:18, 11 November 2008 (UTC)
Right, Visviva, this is very much how I would envision it too. The current "Citations" button could be the toggle switch. -- WikiPedant 00:39, 11 November 2008 (UTC)
Further on this line, I'm delighted to report that my show/hide citations button is finally working in IE6, IE7, Opera and FF. Yay! It currently assumes that any unordered list nested in an ordered list is a citation -- I haven't been able to think of a counterexample in mainspace. Anyway, if you'd care to take it for a spin, copy the first section of User:Visviva/monobook.js to Special:Mypage/monobook.js, and put {{cites-button}} on a suitable test page. Discussion of if/how to fix this up for general use should probably go to the WT:GP; I just wanted to mention it here. -- Visviva 04:47, 12 November 2008 (UTC)
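The heuristic described here (treat any unordered list nested in an ordered list as a citation) can also be tried on wikitext source rather than on the rendered page. What follows is only a rough sketch in Python, not the monobook.js script mentioned above: it assumes citation lines are exactly those starting with one or more `#` followed by `*`, and the function name `hide_citations` is made up for illustration.

```python
import re

# Assumption: a citation is any wikitext line that starts with one or more
# '#' (ordered list) immediately followed by '*' (nested unordered list).
# Continuation lines such as '#*:' also match, so they are hidden too.
CITATION_LINE = re.compile(r"^#+\*")


def hide_citations(wikitext):
    """Return the wikitext with every citation line removed.

    Definition lines ('#') and usage-example lines ('#:') are kept.
    """
    kept = [line for line in wikitext.splitlines()
            if not CITATION_LINE.match(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "\n".join([
        "# First sense.",
        "#* 2008, Some Author, ''Some Book''",
        "#*: A quoted sentence.",
        "#: ''A usage example.''",
        "# Second sense.",
    ])
    print(hide_citations(sample))
```

A partial-visibility mode of the kind suggested earlier (quotation shown, source hidden) would need to tell the `#*` source line apart from the `#*:` text line, which this sketch does not attempt.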
Useful, illustrative quotations should absolutely stay in the entry. The citations page is for everything else. (Note that "quotation" and "citation" are not the same thing.) Anyone moving a good quotation to the citations page "because it is supposed to be there" should be immediately trouted. To the contrary, the citations page is a useful resource for quotations that might be added to the entry. (And no, we don't need more collapsible box magic; senses should have 1-2, maybe 3 useful examples and/or quotations that illustrate use. If there are more interesting things, they can go on the citations page.) Robert Ullmann 17:16, 10 November 2008 (UTC)
How many per sense? What makes them useful? DCDuring TALK 17:31, 10 November 2008 (UTC)
That drives me crazy, too, as you know. Though I think 5 quotations under a sense is kind of a lot, even if there's just one sense, because then it starts to put a bit too much distance between a sense-line and the various attached onyms, translations, etc. (The OED gets away with it because it doesn't have any of that stuff, but we do, and we should be proud of it.) —RuakhTALK 17:53, 10 November 2008 (UTC)
How many per sense? Depends. For straightforward terms, 1 or 2 good quotations suffice. For slightly harder terms, I like to find one telling quotation from each of the 19th-, 20th-, and 21st-centuries. For difficult terms, I like to find one telling quotation from each century, going back as far as I can. For terms that have been around a long time, this can produce 5 or more. For ambiguous terms, I like to find enough quotations to give a clear sense of the range of usages. For idioms, lengthier lists can also be appropriate, since the defn of an idiom is greatly enriched by seeing the idiom used in context. -- WikiPedant 19:23, 10 November 2008 (UTC)
The question is how many should appear in the main entry page, as opposed to citations. Are you saying that there ought be no guidelines for this?
Idioms are actually an easy case, because the entire English section (the literal definition plus the figurative, plus the first few citations [unfortunately, the oldest]) appears on the first screen. This is because there are no long etymology or pronunciation sections yet. We are not pushing more valuable content (except more recent quotes) off the first screen by inserting another quote.
Following such a practice at an entry like head or set will almost certainly drive users to seek dictionaries with more straightforward layouts for their most common dictionary needs. I already use OneLook in this way, because it provides a few definitions on the landing page as well as providing links to various dictionaries and other references, including WP and Wikt, each with its own characteristic strengths and weaknesses.
Perhaps Wikt should serve the neglected function of offering filtered usage examples. This would serve certain scholars, writers, language learners, and many others sometimes not well served by the various other tools. DCDuring TALK 19:50, 10 November 2008 (UTC)
I had always thought that around 3 was the maximum number wanted per sense of a word, but some people here seem to be clamoring for a higher max. Would it be fine to move to Citations: some of the 7 serial quotations under the first sense of verb#Verb? --Bequw¢τ 10:23, 29 November 2008 (UTC)
verb#Verb is a good test case. I would hope that only one or two of the citations would remain on the mainspace page. Perhaps all of the citations should be moved to citation space to allow a simple view of the attestation and usage history with just one or two left behind. DCDuring TALK 12:34, 29 November 2008 (UTC)
I would vote for keeping the 1981 (anon), 1997 (Griffiths), and 2005 (Mattison) cites. 1981 because it is the first we have, and therefore significant; the others because they are the simplest, and where possible a good citation should also be a good example, free of unnecessary complications. As an added bonus this would give us one cite per decade.
While we're pet-peeving here, I would like to point out that all of the cites for verb#Verb are borderline worthless at present, since the citer has not provided any links to the source material (there are no URLs, DOIs, or ISBNs). For all the user can tell, these examples could all have been made up out of whole cloth. -- Visviva 15:28, 29 November 2008 (UTC)
My ears are burning. I have hardly ever inserted the links in my citations. At least any individual cite can be googled. I've only recently started using the wonderful quote templates, which make citation easier. But, you are right and I will henceforth insert the url, though the greater work may make me cite less. The citations definitely look like attestation cites rather than good usage examples. Sometimes actual usage falls short of one's aesthetic standards. Of course a show/hide for citations obviates the space problem so selectivity is a little less critical. Perhaps good-only-for-attestation citations should be commented out, so as to make our entries more uplifting. DCDuring TALK 16:30, 29 November 2008 (UTC)
I agree with DCDuring that at most two of them must remain. The more citations are moved to the Citation space, the less encumbered the main entry is. Bogorm 15:41, 29 November 2008 (UTC)

There has, for many years, been discussion about whether to indent subsenses in long lists of definitions (cf. generator, death grip, ward, et al.). Is this something we like or hate? As far as I can see it improves the clarity of the entry, as well as the logical flow, at the possible expense of breaking the {{quote-book}}-like templates that have the indentation hard-wired in.

Please, for the love of God, yes. Subsenses absolutely make the definitions more meaningful and easier to understand. This is standard practice in almost every professional dictionary I've ever seen. I think that this will require more rethinking than just quotebook, and it will certainly make for some long, tedious, and excruciating Tea room discussions, but I absolutely think it is worth it. Yay for subsenses! -Atelaes λάλει ἐμοί 00:03, 11 November 2008 (UTC)
I'm down with it. The quote templates can easily be retooled. Some code should be added to MediaWiki:Monobook.css so that the subsenses display properly; I guess the question is whether they should be numeric ("1.2") or alphabetical ("1.b"). -- Visviva 00:41, 11 November 2008 (UTC)
To be honest, my first preference would be for senses to be numbered with Arabic numerals, and senses grouped under Latin numerals (I. 2). However, I would still prefer entries with non-grouped senses to simply be Arabic numerals, as we currently have it. Can the software be made smart enough to do that? My second preference would be for alphabetical (1. b.). -Atelaes λάλει ἐμοί 05:54, 11 November 2008 (UTC)
A good direction for improving the quality of the big, complex entries. Will we need to have explicit umbrella senses for the subsenses? Not all dictionaries seem to find that necessary, eg MW3, MW Online. DCDuring TALK 01:18, 11 November 2008 (UTC)
That is an excellent question. I wonder if we might want to consider starting out with an option for either or. I realize this will introduce a certain lack of consistency, but I think it would be good to try them both out, and see which one is more practical. My intuition is that generally it will work best to simply have senses grouped together, without a "super sense", but I wonder if sometimes an explanatory note about how and why such senses are grouped might be nice.
A further question is: How many levels we should be prepared for? I think MW has 4 or 5, but no less than 3. This might affect the implementation and the level labeling. A kludge that got us two levels but couldn't go beyond that might not be desirable. Would it be easier to have labels like "1.2.4.2."? DCDuring TALK 10:20, 11 November 2008 (UTC)
I have serious reservations with indented subdefinitions. These reservations stem from the way we code and edit entries, rather than from the concept itself. If a simple, friendly, and easily maintainable structure could be devised, I might support its implementation. However, the three examples presented in support of such a structure don't seem particularly suited to such a complex structure. I see such a structure as useful only for really messy entries like head or set. A list of only three to six definitions doesn't pay off with the added complexity to format it and ensure that additional edits don't destroy the structure. --EncycloPetey 04:01, 12 November 2008 (UTC)
  • I love them. It was me that worked on ward, and I still miss my early version of "mark", before Connel nixed it. I think subsenses are very intuitive, and I think they allow for a good compromise between whether our definitions are ordered on historical principles or in terms of related definitions. This allows us to show the order in which various broad senses have emerged, while still keeping similar definitions in close proximity. Ƿidsiþ 05:53, 13 November 2008 (UTC)

The wiki format for lists encourages entering main senses—it just seems weird to add sub-senses under an empty bullet point (octothorp point).

Changing the numbering format is easy, and you can already do it in your own user style sheet. Example from my style sheet (User:Mzajac/monobook.css), which accounts for four levels of nested lists:

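/* nested ordered lists: one numbering style per nesting level */
ol {
  margin-left: 1.6em;
  list-style-type: decimal;      /* level 1: 1, 2, 3 */
}
ol ol {
  list-style-type: lower-alpha;  /* level 2: a, b, c */
}
ol ol ol {
  list-style-type: lower-roman;  /* level 3: i, ii, iii */
}
ol ol ol ol {
  list-style-type: decimal;      /* level 4: 1, 2, 3 again */
}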

I think OED uses something like A. 2. c. IV., but sometimes skips a level. Unfortunately, we can't currently set the numbers' font style or weight using CSS.

The style sheet should account for 4 or 5 labels, for those cases where someone goes overboard. I suggest we should discourage more than a level of nesting, perhaps by setting an unattractive style for all lower levels. Michael Z. 2008-11-13 18:09 z

  • The primary objection to subsense formats was that all initial examples given violated copyright. Many more entries that use that subsense format remain suspect. The second objection (and more important, IMHO) is that derivative uses are confounded by subsense syntax. Only the most elaborate parsers have even the slightest chance of interpreting such entries. The third objection is aesthetic: splitters already have far too much latitude here - encouraging this syntax encourages more specious splitting of definitions. The resulting definitions themselves are more difficult to read, so we alienate readers. Split definitions also make translation tables more numerous, less accurate and more complex. If the goal is to devise an unusable, incomprehensible system, then subsenses are attractive. But if the goal is to provide a reusable, extensible dictionary to the world, artificial subsenses add unnecessary complication. --Connel MacKenzie 14:23, 15 November 2008 (UTC)
    Can you give an example of the parsing problem? I would have thought that nested ordered lists are well structured, and relatively easy to parse (unlike the unfortunate hash the wiki parser makes of nested headings, which makes it impossible to select a section in CSS). Michael Z. 2008-11-15 21:08 z
  • In point of fact, subsenses have nothing to do with "splitting definitions". It does not change the number of definitions, only how they are organised on the page. The number of translation tables is exactly the same. It also seems bizarre to describe this as "an unusable, incomprehensible system" when all major print dictionaries do it. Ƿidsiþ 21:06, 16 November 2008 (UTC)
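As a rough illustration of the parsing question raised in this thread, nested definition lists can be walked with very little code. The following is a minimal sketch in Python, assuming each definition line begins with one `#` per nesting level and that lines beginning `#*` or `#:` are quotations or usage examples. The function name `label_senses`, the `max_depth` parameter and the sample lines are invented for illustration and are not part of any existing Wiktionary tool. It produces dotted labels of the "1.2" kind discussed above.

```python
def label_senses(lines, max_depth=4):
    """Return (label, text) pairs for wikitext definition lines.

    Assumptions: depth is the number of leading '#' characters, and
    lines whose marker continues with '*' or ':' are quotations or
    examples, not senses. A skipped level shows up as a 0 in the label.
    """
    counters = []  # counters[i] is the running number at depth i + 1
    labelled = []
    for line in lines:
        stripped = line.lstrip()
        depth = len(stripped) - len(stripped.lstrip("#"))
        if depth == 0:
            continue  # not a list line at all
        rest = stripped[depth:]
        if rest[:1] in ("*", ":"):
            continue  # quotation or usage example, not a sense
        if depth > max_depth:
            raise ValueError("nesting deeper than %d levels" % max_depth)
        while len(counters) < depth:
            counters.append(0)   # open any new, deeper levels
        del counters[depth:]     # close levels deeper than this line
        counters[depth - 1] += 1
        label = ".".join(str(n) for n in counters)
        labelled.append((label, rest.strip()))
    return labelled


if __name__ == "__main__":
    sample = [
        "# Umbrella sense one.",
        "## First subsense.",
        "#* 1999, a quotation line",
        "## Second subsense.",
        "# Umbrella sense two.",
        "## Only subsense.",
    ]
    for label, text in label_senses(sample):
        print(label, text)
```

Swapping the dotted join for letters or Roman numerals at a given depth would give the "1. b." or "I. 2" styles also proposed here; the tree walk itself would not change.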

Seeking feedback to come up with a good method for entries that take inflected objects.

  • For English entries, this problem appears in the form of prepositions: think about, against, of, on, out, to, up - what is the best way to highlight all these prepositions so it draws the eye to the one the user is looking for?
  • For FL entries, the same may be a case ending: gondol -ra/-re. Where should we list the required case endings - in the beginning of the definition line? Bolded, so it's easy to see?
  • Sometimes the object can only be a person or a non-living entity, other times both. What words should indicate this: somebody, something, or an abbreviated form of these?

Your thoughts, please. --Panda10 00:12, 11 November 2008 (UTC)

An example can be seen at tartozik. The abbreviations indicate the case ending: vmvel = valamivel - with something. The labels may be generated by templates that include the entry in categories, for example Category:Hungarian words taking "valamivel". --Panda10 23:43, 11 November 2008 (UTC)

I created a page User:Panda10/Sandbox to compare options. Would you please take a look and let me know if any of them are acceptable? --Panda10 15:12, 15 November 2008 (UTC)

Since there is a great divergence between entries of past participle variants here in Wiktionary, I ask: for languages where a past participle can be inflected, how should we handle them? Here are some examples:

  • Asturian escribir: escribíu m., escribida f., escribío n., escribíos m. and n. pl., escribíes f. pl.
  • Catalan escriure: escrit m., escrita f., escrits m. pl., escrites f. pl.
  • French écrire: écrit m., écrite f., écrits m. pl., écrites f. pl.
  • Italian scrivere: scritto m., scritta f., scritti m. pl., scritte f. pl.
  • Portuguese escrever: escrito m., escrita f., escritos m. pl., escritas f. pl.
  • Spanish escribir: escrito m., escrita f., escritos m. pl., escritas f. pl.

If these words exist and are recognized as verb forms (i.e., not as adjectival derivations or other interpretations), I propose that they should be labeled correctly, linked to the lemma form of the verb and included in conjugation tables. Daniel. 05:31, 11 November 2008 (UTC)

Some of how we handle these may have to vary by language. I am not sure what you are asking about: Format of a complete entry, just the inflection line, how such participles appear within the lemma page, how to set up conjugation tables for the lemma, or something else. Even in the examples you've provided, you haven't presented the whole picture: The Spanish past participle escrito is not used in that form in all regions that speak Spanish; Argentina and Uruguay prefer the spelling escripto instead. It also doesn't address the issue of what interlinking (if any) ought to exist between participles of different tenses. Nor have you expressed an opinion about what part of speech you consider these words for the specific languages listed.
In Latin, participles function as a separate part of speech, with characteristics of both adjective and verb. There is also no "past participle" in Latin. There is a "perfect passive participle", so called because Latin has both active and passive senses for most verbs and because Latin has more than one "past" tense. An example of how these are being handled in Latin exists at the entry for amatus. Some of this page's content and formatting will not apply in other languages, but some of it may be helpful.
However, as I said at the outset, I'm not sure we can decide what to do for all languages, or even if a uniform setup is possible for just the Romance languages listed above. We don't even have that kind of consistency with certain groups of words in English. --EncycloPetey 03:54, 12 November 2008 (UTC)
Some of these examples were taken directly from Wiktionary itself, both here at the English version and from other languages; these were commonly referred to as "Adjectives" (derived from participles) or "Verbs" (conjugated verb forms) without a clear distinction, and most did not even appear in conjugation tables. They caught my attention. To be honest, I don't have sufficient knowledge to decide how we should handle a word in Asturian or Italian, but they seem to have similar rules of conjugation - that is, the part of speech of all participles should be "verb form" and they should link to the infinitive form of the verb; that's why I am asking for other opinions. One language I know in detail is Portuguese, which I speak fluently, and the answer to my own question is: Yes! In Portuguese, there are such variants. Here is an example: A garota foi embelezada por mim. (The girl was embellished by me.) This is clearly a use of the feminine singular past participle of the verb embelezar, which cannot be replaced by an adjective: A garota foi "bonita" por mim. (The girl was "beautiful" by me.) makes no sense. Daniel. 16:49, 23 November 2008 (UTC)

How do we handle words that are replaced in a descendant language?

If a term evolves into an etymologically related term, we put it in “Descendant terms” (and it is linked back in the “Etymology” of the descendant term), but if it is replaced, there doesn’t seem to be a natural place to put it – it can sorta fit in “Synonyms” as the terms likely coexisted for a time, but this doesn’t capture the relationship. (The following examples are “as far as I can tell”.)

For instance, in Classical Latin, the term for “cheese” was cāseus, which was replaced in Vulgar Latin by formaticum, leading to, for instance, French fromage. Currently there’s no standard way to see the corresponding terms in descendant languages, as opposed to only the terms etymologically descended from a given term.

A more familiar example is perhaps Middle English they, where Old Norse þeir displaced þæt. (Here they are cognate, which confuses explanation a bit further.)

This seems a common language process, and worthy of some systematic treatment – any suggestions?

Perhaps:

  • a note in the “Etymology” section if a term replaced an older term, and
  • a “Replacement terms” section, similar to “Descendant terms”?

Nils von Barth (nbarth) (talk) 17:23, 12 November 2008 (UTC)

A section in the Usage notes would also be a good idea. This would allow a description of when the word was used, and when it began to be replaced. For modern languages, this could also be expressed on the definition line with a context label like {{archaic}} or {{dated}}. Note also that your Latin cheese example is a bit simplistic, since some Latin descendant languages retained caseus and have modern words descended from it rather than from formaticum. In other words, sometimes replacement is incomplete, or is limited geographically, such as replacement in Galician of poñer with pór, which is another incomplete and geographically-determined case of change. In this case, both forms exist in modern Galician, but each form has regions where it is the norm. --EncycloPetey 05:39, 13 November 2008 (UTC)
  • =See also= seems the natural place. Sometimes I will also mention this in an Etymology section where it seems relevant/interesting. Ƿidsiþ 05:48, 13 November 2008 (UTC)

w:Birthday wishes in other languages is up for deletion at Wikipedia (and rightly so). However, this resource is available to us for a few days (longer, if really necessary — I can transwikify it upon request.). If there are any translations there that are missing from happy birthday, we could do worse than incorporate them (as unchecked translations if necessary, or on the talk page). I've encouraged the article's author to come here to do that sort of work. Uncle G 13:03, 13 November 2008 (UTC)

For the mean while, I've transwikied it to transwiki:Birthday wishes in other languages, so it can be deleted from WP.—msh210 04:50, 16 November 2008 (UTC)

Can I initiate some serious bikeshedding by proposing that we should edit the {{suffix}} template so that it does not include quotation marks in the name of the category? I believe this would be better because people could then easily type the category name manually, and it avoids the nested quotation marks in Entries in category “English words suffixed with “-ness””. It will be necessary to do some category renaming whatever we decide, as some languages (e.g. Hungarian) use the category name without the quotes. Would people object if I got User:Conrad.Bot to move the categories that have already been created, and then update the template, then find any manual categorisations left over, and then (possibly) delete the old categories? Conrad.Irwin 01:05, 15 November 2008 (UTC)

OK with me. DCDuring TALK 01:32, 15 November 2008 (UTC)
sounds good to me. —RuakhTALK 02:40, 15 November 2008 (UTC)
Sounds very good to me. :) --Panda10 02:44, 15 November 2008 (UTC)
I've created all the respective categories without quotation marks, and Daniel. has updated the {{suffix}} template to use them. I'll now slowly delete all the empty categories containing quotation marks. Conrad.Irwin 14:25, 15 November 2008 (UTC)
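
The rename discussed in this thread amounts to dropping the quotation marks around the suffix in each category name. A minimal sketch in Python, assuming the old names look like `English words suffixed with "-ness"` (with either straight or curly quotes); the function name and sample titles are illustrative and are not taken from User:Conrad.Bot:

```python
# Straight and curly double quotes that may wrap the suffix in old category names.
QUOTE_CHARS = '"\u201c\u201d'

# Translation table that deletes every character in QUOTE_CHARS.
_STRIP_QUOTES = str.maketrans("", "", QUOTE_CHARS)


def unquoted_category_name(old_name: str) -> str:
    """Return the category name with the quotation marks removed."""
    return old_name.translate(_STRIP_QUOTES)


if __name__ == "__main__":
    for title in ('English words suffixed with "-ness"',
                  "Hungarian words suffixed with -ban"):
        print(title, "->", unquoted_category_name(title))
```

Names that never contained quotation marks pass through unchanged, so the same function could be run over every existing category to work out which ones need moving.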

I have been unable to find any Wiktionary policy statement on whether fair use images are permitted on Wiktionary. As local image uploads are restricted to sysops, and Commons does not allow fair use images, such pictures are de facto not permitted here. Should we make this explicit?

It should be noted that this Wikimedia Foundation board resolution requires all projects that allow fair use content to have an "Exemption Doctrine Policy" (EDP) (see link for definition, I don't know how to make interwiki links to the foundation wiki). (I do. ☺ Uncle G 12:03, 20 November 2008 (UTC))

There appears to be no requirement for projects not hosting fair use content to have a policy about or otherwise explicitly state this (presumably as it is the default position). Making an explicit statement (or not) about fair use images is therefore a purely local decision. Thryduulf 01:56, 16 November 2008 (UTC)

  • I have no qualms about using images under fair-use, but as we don't have many pages that are about proprietary things I'm not sure we need them (and it sounds as though saying no is simpler, as we can ignore our Image: space completely). I certainly would like to keep upload sysop-restricted, which probably limits us to using Commons. Conrad.Irwin 02:01, 16 November 2008 (UTC)

There has been some discussion at Wiktionary:Requests for verification#Cheerios about allowing fair use images. I personally do not think that it is worth all the hassle for the handful of entries that would benefit. IANAL, but going by what I know from Wikipedia it is worth bearing in mind the following:

  • If a copyrighted image can be replaced by a free content one, then using the copyrighted image is not fair use. Even if no free image currently exists, only if one cannot exist is it fair use.
  • Every fair use image must be accompanied by a detailed fair use rationale, explaining why every use of it on Wiktionary is fair use (i.e. if it is used on three pages, it must be accompanied by three fair use rationales).
  • If a fair use image is no longer used, it must be deleted. I believe that Wikipedia uses a bot to flag unused fair use images.
  • Using such images outside of the main namespace is unlikely to be fair use.
  • If we as a community choose to allow fair use, we will need to work with the foundation to create an "Exemption Doctrine Policy". AIUI this needs to be in place before we accept fair use images. Thryduulf 19:25, 16 November 2008 (UTC)
  • Unlike many conventional encyclopedias, most conventional dictionaries are typically not full of pictures. -- Gauss 19:37, 16 November 2008 (UTC)
    • I have several dictionaries with pictures in them. However, I cannot recall ever seeing one (that wasn't trying to be an encyclopaedia as well) that employed a copyrighted work for which the dictionary didn't have publication rights. Translated into Wiktionary terms, this is equivalent to not having any images in Wiktionary that are not free content. Uncle G 12:03, 20 November 2008 (UTC)
    • The Wikipedia EDP at w:Wikipedia:Non-free content is long. We could copy it or shorten it. In either case, someone who has some familiarity with our likely usage pattern and the IP law field should read it. I also don't think we can do fair use casually. As Thryduulf suggests it may not be worth the effort to do so at all. DCDuring TALK 19:31, 16 November 2008 (UTC)
      • If we do decide to go down the route of fair use, the foundation site (see link above) makes it clear that they will help projects create an EDP, and presumably they will need to approve it if we go it alone. It doesn't say how to request that help, but presumably there is somewhere on Meta to contact the relevant people. Thryduulf 22:21, 16 November 2008 (UTC)
        • I see no problem with that. Even if we don't end up using any fair use imagery, there is no reason for us to foreclose ourselves from doing so if there is any potential it might be useful to illustrate a definition. bd2412 T 04:47, 17 November 2008 (UTC)
  • Per Thryduulf 19:25, 16 November 2008 (UTC) and Gauss 19:37, 16 November 2008 (UTC), I think allowing fair-use images will be more bother than it'll be worth.—msh210 06:09, 17 November 2008 (UTC)
    • Amen to that. This is trouble we don't need. Wikipedia wouldn't have fair-use either, if they/we had dealt with the issue before the project was cursed with popularity. -- Visviva 03:52, 19 November 2008 (UTC)

Should I create a vote on this issue to settle it? Thryduulf 12:19, 19 November 2008 (UTC)

  • Go for it. --EncycloPetey 07:11, 20 November 2008 (UTC)
    • Since you are interwiki linking, here's another one for you: Don't vote on everything. ☺ Wait until someone can come up with a cogent reason that Wiktionary would ever need a non-free image, and only then put forward a proposal.

      Here's a thought to ponder upon: The "preamble purposes" in the fair use doctrine are "criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research". We are not Wikinews, Wikipedia, Wikibooks, or Wikiversity. Teaching, criticism, comment, and news reporting are not our domain. And you'll be hard pressed to explain how a dictionary needs to copy copyright-protected images in order to perform lexicographic scholarship or research.

      Here's another thought to ponder upon: Our quoting short excerpts of copyrighted works for the purposes of illustrating their use is something that we do under the fair use doctrine. We don't use images and media under fair use, but we do use text quite a lot. But we already have that covered in our copyright policy. Uncle G 12:03, 20 November 2008 (UTC)

  • Well, Cheerios has been mentioned as an entry that might require a fair-use image for illustration. The same reasoning would probably apply to many other terms derived from trade names. While I don't think we should have them, I think the case for fair-use images on Wiktionary is roughly as good (or as bad) as the case on Wikipedia; in both cases, the primary purpose of fair use is "teaching," i.e. illustrating the concept in question. Our need for this type of illustration is not really any less than Wikipedia's; indeed it is arguably greater since our format does not allow lengthy verbal descriptions. But again, I don't think we should have them and for my part I am happy to continue with our Vote-free, EDP-free status quo. -- Visviva 12:13, 20 November 2008 (UTC)
  • I can't think of any case where an image is really needed, but a free use image is not possible. I'd prefer to say illustrate with images at commons (easy and practical way to illustrate). Can someone give me more examples of where a copyrighted image is needed? RJFJR 14:31, 20 November 2008 (UTC)
  • Darth Vader could really use one - it would enhance the explanation of why, exactly, this name has come to stand for a person attributed with that sort of brooding malevolence. There are probably a few other examples of that type. bd2412 T 04:56, 25 November 2008 (UTC)
    Would it not be possible to use a Commons self-made image of a person in a self-made Darth Vader costume? There are some such images there currently, from conventions, but unfortunately Vader is blurry in all of them. --EncycloPetey 20:01, 25 November 2008 (UTC)
    Would anything short of an actual image of Darth Vader from the movies really capture the look? And in any event, Lucasfilm undoubtedly owns whatever copyrightable elements exist in the costume (although costumes are generally not copyrightable, masks generally are). The point is, we would be on utterly safe fair use grounds to use a 250px cropped scene from the film, and I don't think there is a substitute that would have the authenticity of the real thing. Same thing, perhaps, with Death Star, hobbit, Ringwraith, Klingon, Hogwarts. I'd also reiterate that to the extent we allow brand names of packaged goods like Rice Krispies (which we include because they may be used without context in writing), we ought to have pictures of the packaging as well. bd2412 T 06:49, 26 November 2008 (UTC)
    Since packaging changes periodically, I definitely disagree with you on that point. A bowl filled with Rice Krispies would be more effective than a picture of any package, and could be uploaded on Commons with no copyright problems. --EncycloPetey 06:23, 26 November 2008 (UTC)
    Interesting point. But in many such products there are certain elements of the packaging that are familiar across generations, such that a picture of a box of cereal from, say, the mid-80s would be instantly recognizable as that cereal. bd2412 T 06:49, 26 November 2008 (UTC)

Good news: linking to Wiktionary from Wikimedia projects using the "d:" prefix works now. --Dan Polansky 08:58, 16 November 2008 (UTC)

Great. Thanks. DCDuring TALK 19:31, 16 November 2008 (UTC)

"d:"?! Why "d:"!?? Ƿidsiþ 21:00, 16 November 2008 (UTC)

"Dictionary". Seems to be the best anyone could come up with... \Mike 21:06, 16 November 2008 (UTC)
Ah, I see. So they picked the one letter from dictionary which "Wiktionary" doesn't use.... Ƿidsiþ 21:08, 16 November 2008 (UTC)
"C" isn't in "Wiktionary" either, and doesn't have any relevance at all. "W" and "N" are in use for Wikipedia and Wikinews respectively. None of the others have any special meaning for "Wiktionary", and we are a dictionary. Makes sense to me. Thryduulf 22:38, 16 November 2008 (UTC)
The only other option would have been "T" (as the other projects strip the wiki prefix) but it was suggested that that could be confused with "Template:" - and is counter intuitive for beginners. meta:Talk:Interwiki map/Archives/2008-08#Wiktionary has the discussion. Conrad.Irwin 02:03, 17 November 2008 (UTC)
Thanks! Conrad.Irwin 02:03, 17 November 2008 (UTC)
Thanks, Conrad, for taking the issue of creating a single-letter prefix for Wiktionary to Meta, thus getting it done in the first place. --Dan Polansky 06:05, 17 November 2008 (UTC)
Oddly, d: links to English Wiktionary, not Wiktionary in general. Contrast http://he.wikipedia.org/wiki/d:foo with http://he.wikipedia.org/wiki/wikt:foo .—msh210 07:06, 17 November 2008 (UTC)
Hmm, that seems like a bug to me... Conrad.Irwin 09:07, 17 November 2008 (UTC)
I suspect rather that it means that d:foo was added as an interwiki link like any other (like doi:foo, e.g.), and was added to link to enwikt (as any other interwiki link links to some specific URL), rather than having been added as the counterpart to w:, n:, q:, et al., which are, somehow, treated specially by the MW software or the WM configuration thereof. Not a bug, just a misunderstanding by the implementer of what was to be implemented.—msh210 17:28, 17 November 2008 (UTC)
I take that back. http://en.wiktionary.org/wiki/w:foo works but http://en.wiktionary.org/wiki/doi:foo does not, but http://en.wikipedia.org/wiki/d:foo does, so d: is sorta like the w: and n: prefixes.—msh210 06:41, 18 November 2008 (UTC)

I'd like to ask about the views on how to best categorize nouns, and whether at all. Recently, the following categories have been created:

It seems to me that creating further categories along these lines would lead to the creation of the following categories:

  • Czech feminine nouns
  • Czech masculine nouns
  • Czech neuter nouns
  • Czech animate nouns
  • Czech inanimate nouns
  • Czech nouns with declension pattern pán
  • Czech nouns with declension pattern hrad
  • Czech nouns with declension pattern ...

I'd think these categories are unneeded. --Dan Polansky 08:29, 17 November 2008 (UTC)

Grammatical categories are tricky, as they can't really be decided for all languages, like topical categories can (and even that's just a bit....sticky. How about dividing up Ancient Greek words between Category:Greece and Category:Ancient Greece :-)). We have a number of subcats inside Category:Ancient Greek nouns, and I'm somewhat apathetic about them. They add little....but they subtract little as well. Ultimately, this is something which I think should be decided on a case by case basis by people who are involved with the language in question. Editors from closely related languages may have useful input, as their languages probably have similar issues. Thus, I would suggest that you and Romanb have a discussion about the merits of such categories, and perhaps get other Slavic folks involved (e.g. Stephen, Ivan, etc.) if a resolution cannot be reached. Ultimately, bring the issue back here, with each side's primary arguments for larger community assessment if that all fails. Personally, I see no merits to the gender cats (although "Czech nouns with declension pattern pán" might have some merit). However, as currently presented, I think that I (and most of the other folks reading this) am utterly unqualified to judge such things, as I don't know Czech, and thus don't really understand how people might want to sort Czech nouns. I hope this does not come off as me simply brushing you off. -Atelaes λάλει ἐμοί 08:53, 17 November 2008 (UTC)
Useless categories, should all be deleted. We shouldn't follow Wikipedia's over-categorizing mentality of "if it can't hurt, leave it". I cannot imagine in what circumstances one would be interested in inspecting an ever-growing category populated with thousands of entries whose only connection is some trivial property such as animacy or gender. Much more interesting would be closed categories, e.g. n-stems in Russian (only a handful of PIE consonant-stem nouns have been preserved in all Slavic languages), nouns whose meanings can be both animate and inanimate (and thus have dual forms in some cases), or that have defective inflection, or that represent an exception from normal gender assignment based on a suffix, or are in any other way "interesting". --Ivan Štambuk 13:40, 18 November 2008 (UTC)
They seem useful and interesting to me (that is, I would expect to find them of interest if I were learning Czech, which I currently am not). Certainly if a particular entry leaves the user guessing about some aspect of inflection, checking other entries with the same inflectional pattern can be helpful. Also, fine-grained categories can help with maintenance. As Wiktionary develops, there will be approximately n occasions (where n is an arbitrarily large number) when we realize that all entries for Foovian words with certain properties have inaccurate or missing information. Suppose we have entries for 10,000 Foovian nouns, among which there are 500 5th-declension Foovian nouns... to clean up a problem specific to 5th-declension nouns, it would be vastly easier to go through the category for 5th-declension nouns than to sift through the entire "Foovian nouns" category by hand. If you see what I mean... :-) -- Visviva 03:45, 19 November 2008 (UTC)
The gender-based subcategories of nouns have some merit, but as far as I'm concerned, any further subcategorization is a horde of hobgoblins waiting to eat up valuable time that should be spent on more productive pursuits. --EncycloPetey 07:08, 20 November 2008 (UTC)
Like the creator of these gender-noun categories, I think they are useful for learners of the languages, who can see the similarities between such nouns. I was considering whether or not to create a category for animate and inanimate nouns, and probably I will later. Comparing the Czech nouns categories and all the different sorts of nouns in Category:English nouns, I see no reason not to include these. However, I'm not so fussed either way. --Romanb 18:24, 24 November 2008 (UTC)

I want to bring attention to this request: http://en.wiktionary.org/wiki/Template_talk:en-noun#Pluralia_tantum. I'd like to see this happen too. - dougher 05:53, 18 November 2008 (UTC)

I just created {{en-plural-noun}} for this, as {{en-noun}} is complicated enough already. Conrad.Irwin 09:27, 18 November 2008 (UTC)
How does this differ from {{plurale tantum}} and {{pluralonly}}? Thryduulf 11:57, 18 November 2008 (UTC)
According to {{en-noun}}, we ought to enter a singular for the word "entrails", that is the problem. Circeus 18:15, 18 November 2008 (UTC)
  1. There is a {now-obsolete} singular word [[entrail]] that appeared in Webster's 1913.
  2. There are 500+ raw b.g.c. hits for "entrail", often with "one" or "an" (indicating countability).
This is not an unusual pattern of usage (or [[abusage]]). DCDuring TALK 18:38, 18 November 2008 (UTC)
I was just citing this particular example because it was the first fairly clear-cut that came to mind, how about bagpipes, physics, acoustics, brass knuckles, Y-fronts, feces, longjohns, memorabilia, northern lights, smithereens? Clearly using the "uncountable" option of the template is inappropriate (it's explicitly intended for mass nouns). Circeus 19:02, 18 November 2008 (UTC)
My sole interest is in indicating the complexities not yet addressed. The template does not accommodate some real entries. Also it is not obvious from the label "plurale tantum" what the practical implications are for normal users or language learners. "Bagpipes" (almost always) takes "are", but "physics" (or "acoustics") takes "is". "Memorabilia" seems to accept both. Presenting a label without presenting how to properly use the word doesn't seem adequate. DCDuring TALK 19:21, 18 November 2008 (UTC)
Personally, I prefer to use {{infl|en|noun}} when {{en-noun}} is being uncooperative. (Of course, this doesn't accommodate pluralization at all). -- Visviva 03:48, 19 November 2008 (UTC)

The Wikipedia article w:Serbo-Croatian language says that the two-letter ISO 639-1 code "sh" for Serbo-Croatian is deprecated, while the three-letter ISO 639-3 code "hbs" is not.

Wiktionary currently only supports the "sh" code, should we continue to use the deprecated code or should we switch to the active code? Thryduulf 13:15, 18 November 2008 (UTC)

hbs is for what Ethnologue calls "macrolanguage" (their own neologism), not individual language code. SC is not used on Wiktionary, so {sh}/{lang:sh} and {hbs}/{lang:hbs} should not be normally transcluded anywhere unless in special circumstances. --Ivan Štambuk 13:31, 18 November 2008 (UTC)
sh should and must be available in Wiktionary for every person from the former SFRY who does not deny the existence of this language (there are such contributors) just because of political (separatist) convictions, unless he speaks the Slovenian language or the Bulgarian language in Macedonia. And this must apply not because I am an adherent of the language, but because of the existence of the ISO code. Those who dislike it either make no use of it, or write sh-0, am I right? Bogorm 13:43, 18 November 2008 (UTC)
Bogorm, I thought I explained this to you a while ago. There was never such a thing as the "Serbo-Croatian language" or "Serbo-Croatian languages" before it was invented by Communists in SFRJ. Croats and Serbs have had separate literatures for centuries (some more than the others..), and the fact that in the 19th century the same dialect (stylised Neoštokavian, but different subdialects) was chosen for a literary language of both Serbs and Croats does not "prove" anything. Dialects of this "Serbo-Croatian" area do not form a genetic clade (there was never "Proto-Serbo-Croatian"; their last common reconstructable ancestor was Proto-West-South-Slavic). --Ivan Štambuk 13:49, 18 November 2008 (UTC)
And I too explained that there are exactly three South-Slavic languages - Slovenian, Serbo-Croatian and Bulgarian. I suggest we stop here since our argument is of no avail for people outside the Balkan peninsula to decide on the ISO matter. You also explained that there was a "Czechoslovakian" language, also invented by Communists, when in the whole БСЭ there is not a single word about it. Until the political radicalisation in the Balkans in the 1990s this term was tranquilly in circulation and prevailing, and that is why people still feel a predilection for it - because it is traditional linguistics that endorses it, while modern linguistics is not politically impartial.
Enough with the most widespread South-Slavic language, let's concentrate on its code. So, either preserve sh, or introduce hbs? In my opinion this ought to be decided on Meta, so that scores of native speakers are able to take part, ok? (I mean, discussions on Meta are regularly announced on local Wikipedias, whereas these here are not) Bogorm 13:59, 18 November 2008 (UTC)
All who are interested in the dispute between me, User:Dijan (with whom I ardently agreed) and User:Ivan Štambuk, and in the arguable Czechoslovakian language, may look here. Bogorm 14:06, 18 November 2008 (UTC)
Look Bogorom, I personally don't give a flying f*** what some Communist encyclopedia says about "Czecho-Slovak", or "Serbo-Croatian", or "Serbo-Croato-Slovene" language (did you hear about this latter one? It was supposed to be the official language of the SHS kingdom, but it failed to be codified). No "traditional linguistics" endorse it but that of ignorance and laziness. Croats called their language Croatian for centuries before Communists decided to sanction that practice (entire book editions were burned just because they had "Croatian" and not "Serbo-Croatian" in the title), and utilize it to systematically Serbify Croatian speech in all semantic spheres. If you think of Communist linguistics which imprisoned people for using the "wrong words" or published dictionaries stripped of very-much-alive words which were not "acceptable" just because they were Croat-only as "politically impartial", and modern democratic peer-reviewed-journal-published views as a result of 1990s radicalisation in the Balkans, then you are very much deluded. I know it's simple for outsiders to "simplify" things, imagining that "SC" somehow "disintegrated" in parallel with SFRJ, a view which is still cherished by some Serbs (because it gives them the right to claim Croatian cultural heritage), and some Communist-sympathising Yugonostalgics, but the issues are far, far more delicate than that, and please don't raise them here because every time you mention them I feel called upon to explain why you are wrong.
As for the code - this has nothing to do with Meta, but with a set of templates such as {{sh}} which are used by other templates to convert an ISO code to a language name. Wiktionary already uses different sets of codes than those "intelligently" chosen by Meta Language Committee. --Ivan Štambuk 14:25, 18 November 2008 (UTC)
Do not distort/deride my name, ok? Bogorm 17:42, 18 November 2008 (UTC)

Now you've had your rants at each other, please can we get back to the matter in hand. There are 48 words in four languages that are included in Category:Serbo-Croatian derivations or a subcategory thereof, either manually or through one of {{SH.}} or {{etyl|sh}}. It would not surprise me to find other words so derived but which are not categorised as such. The only question is whether we use the deprecated code "sh" or the active code "hbs" to denote these. {{SH.}}, like all dotted etymology templates, is already deprecated in favour of {{etyl|xx}}; if we choose to continue to use the deprecated code then xx will continue to equal sh in the case of Serbo-Croatian derivations; if we choose to use the active code then xx will equal hbs. It will be trivial to convert existing {{etyl|sh}} entries to use {{etyl|hbs}} instead. If I read special:whatlinkshere/template:sh correctly, there are precisely 4 pages that would need amending in this manner. 1 page in the Wiktionary namespace and 1 page in Robert Ullmann's userspace may also need changing.
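
To illustrate how small the conversion mentioned above would be, here is a hedged sketch in Python of swapping the language code inside {{etyl|sh}} calls. The function name is hypothetical, and it assumes the code is always the first parameter of the template; it only rewrites wikitext strings and does not touch the wiki itself:

```python
import re

# Matches "{{etyl|sh" only when the code ends there, i.e. is followed by "|" or "}".
# This avoids rewriting other codes that merely begin with "sh".
ETYL_SH = re.compile(r"\{\{etyl\|sh(?=[|}])")


def switch_etyl_code(wikitext: str, new_code: str = "hbs") -> str:
    """Replace the language code in every {{etyl|sh...}} call with new_code."""
    return ETYL_SH.sub(lambda match: "{{etyl|" + new_code, wikitext)


if __name__ == "__main__":
    sample = "From {{etyl|sh}} {{term|slava}}."
    print(switch_etyl_code(sample))
```

Whether "hbs" is the right target is exactly what the thread is debating; passing a different `new_code` (for example "sr" or "hr") would work the same way for entries converted by hand to an individual language.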

This is not about politics, what native speakers call their language, or what $encyclopaedia calls the language. Nor is it a proposal to alter the status of the language on Wiktionary, or to do anything at all to any entry in or referring to Croatian, Serbian, Bosnian or any other language. Please can we keep this civilised and rational. Thryduulf 15:13, 18 November 2008 (UTC)

Let's try to follow published technical standards, rather than formulating our own through political and ethnic discussions. Michael Z. 2008-11-18 17:31 z
Unfortunately, hbs is a macrolanguage code (as Ivan Štambuk (talkcontribs) points out), so by my understanding, we can't use {{etyl|sh}} (because it's deprecated) or {{etyl|hbs}} (because it's not a language code). The question is then what attitude we want to take toward {{SH.}}: do we consider it mandatory, desirable, optional, undesirable, or forbidden for it to be replaced with {{etyl|sr}}/{{etyl|hr}}/etc.? My vote is for "desirable": {{SH.}} is acceptable but not ideal. —RuakhTALK 20:03, 18 November 2008 (UTC)
Agree completely with Ruakh, every word. -Atelaes λάλει ἐμοί 21:26, 18 November 2008 (UTC)
Ack, it's never as simple as we'd like. I'd like to stick with standards.
But I need to know what to type into an etymology when my source says “Serbo-Croatian”. Isn't hbs a synonym for sh? Michael Z. 2008-11-19 06:33 z
That's easy. The best practice for now is, when your source says "Serbo-Croatian", use {{SH.}}. The procedure for macrolanguages/language families is, as yet, undecided. Serbo-Croatian is in the same limbo as Germanic ({{Ger.}}). These will undoubtedly get sorted out in time. -Atelaes λάλει ἐμοί 06:56, 19 November 2008 (UTC)

{{SH.}} must be obsoleted for individual language codes. From a cursory look of the etymons that use it in their respective etymologies, all of them are either Croatian or Serbian-specific, are shared with Slovenian and/or Bulgaro-Macedonian, or are even Common Slavic (like slava). The act of borrowing of all of them predates the conception of "Serbo-Croatian" by centuries. Defective and unreliable sources are no excuse to push the usage of {{SH.}}. --Ivan Štambuk 15:23, 19 November 2008 (UTC)

Okay, but I do have access to etymological dictionaries which give “Serbo-Croatian” (or “Serbian-Croatian”). How would I now enter the lang attribute in a template, where I would previously have typed {{term|...|lang=sh}}? I'd prefer to keep doing it the old way and keep the data structured, rather than enter it without the template and then be unable to locate and update it later in the wikitext soup. When the standard is updated, we're not obligated to adopt it overnight before we can work out a way to deal with it. Michael Z. 2008-11-19 16:01 z
You can't use {term} with lang=sh, because there are no L2 SC entries you can link to. If you prefer the no-brainer path, feel free to use it that way (or {SH.} when in etymologies), and it will be taken care of sooner or later. --Ivan Štambuk 16:12, 19 November 2008 (UTC)

As a newcomer, I want to suggest the following. Using this main "Project page" for an actual list of guidelines and policies...moving all discussions to the discussion page. Which should pave the way for grouping items and then moving whole groups onto separate pages...? -- IrishDragon 03:33, 19 November 2008 (UTC)

The Beer Parlour is a general discussion forum, comparable to the Village Pump on Wikipedia. I think you are looking for something like Wiktionary:Policies and guidelines. -- Visviva 04:47, 19 November 2008 (UTC)
Yes, but why is it so hard to find...and why isn't it a "real" page? (This redirects to the "Talk" page??) -- IrishDragon 03:25, 20 November 2008 (UTC)
This is not Wikipedia. You are coming here with Wikipedia-based expectations. We don't use our policies or guidelines in the same ways that Wikipedia does. We have only a very few core policy pages, not a library-full of them with supplementary guidelines, essays, and ramblings the way that Wikipedia does. --EncycloPetey 07:02, 20 November 2008 (UTC)

OK, I am now the proud owner of a 2.3-megabyte text file containing basic entries for all Unicode Hangul syllables. For examples of the output, see and . Once created, I do not intend to edit these entries again, ever (excepting the handful that are real words), and I would sincerely hope that no one else has to edit them either. With that in mind, are there any final thoughts about the layout of these entries? -- Visviva 04:44, 19 November 2008 (UTC) P.S. If one of our resident wizards could find a way to make Template:ko-symbol-nav a bit less squirrely, that would be wonderful; however, since it's templated it's not urgent.

Looks good. I like that all the elements are in flexible templates. Quibbling details:
Can we link to Revised and Yale transliteration info? (for applications like these, it would be nice to have unobtrusive links, like the context labels in pl.wiktionary—cf. “geogr.” in pl:Korea).
ko-symbol-nav seems cluttered by all of the hyphen separators. Dots would be less obtrusive if you insist on character separators, but I think the table arrangement and spacing is sufficient. I would also link both the arrow and character for previous and next. I'd be glad to rework the template.
Wording: does ko-usage-keystroke need the word standard?—(to differentiate it from a common non-standard dubeolsik keyboard?) Can we link dubeolsik keyboard to an explanation or Wikipedia? ko-usage-unicode: Unicode standard notation is U+AD6B, with no need to explain that it's hex. You could reduce the wordiness to “Unicode representation U+AD6B.” Michael Z. 2008-11-24 17:36 z
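
As a small illustration of the U+ notation suggested above, the following Python sketch formats a character's code point in that style. The function name is made up for the example; the only thing taken from the discussion is the sample code point U+AD6B:

```python
def unicode_notation(char: str) -> str:
    """Format a single character's code point as U+XXXX (at least four hex digits)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(char):04X}"


if __name__ == "__main__":
    # chr(0xAD6B) is the Hangul syllable used as the example in the thread.
    print(unicode_notation(chr(0xAD6B)))  # prints U+AD6B
```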
Thanks for this. I think I have implemented all of your suggestions above (except for the "unobtrusive links" part; I'm not sure of the current state of consensus on that). Please feel free to edit the templates further if you are so inclined. -- Visviva 03:11, 25 November 2008 (UTC)
Neat! Let's do it! bd2412 T 04:52, 25 November 2008 (UTC)
The only thing that bothers me is that the "Usage notes" aren't actually notes about usage. Rather, they're notes about typing and encoding. Does anyone know of a better header for this section? --EncycloPetey 03:07, 28 November 2008 (UTC)
I don't see the problem there (taking "usage" with a very broad definition). Maybe just "notes" would avoid any such problem, but I don't think readers will be confused or misled by the header as it is. bd2412 T 05:50, 28 November 2008 (UTC)
Good point. I would have just called them "Notes" if that weren't proscribed. How about "Technical notes"? That might come in handy for many of our Translingual entries as well. -- Visviva 05:53, 28 November 2008 (UTC)
It is of a technical nature. Whatever answer is chosen, the same method should be used on the Chinese/Han/CJK(V) entries and the Korean syllable entries, and from there on any entries about letters, symbols or characters which include information of a technical nature.
An example of an actual usage note for Korean syllables might be to note those which don't actually occur in Korean writing. If I'm not mistaken I believe I've heard or read that Unicode includes Korean syllables which are technically possible but linguistically impossible. Is this correct? — hippietrail 06:40, 28 November 2008 (UTC)
It's a bit difficult to prove a negative. Syllables that don't exist in standard Korean may still turn up in eye dialect and internet 외계어 ("Martian", the language of the PC-bang generation). There are also some syllabic blocks that can never represent a syllable, but which are nonetheless common in written Korean (in fact that would apply to any syllable with an aspirated or compound batchim, such as 읊 in 읊다 or 없 in 없다). If someone wants to compile this information, it would do no harm, but I ain't volunteering. :-) -- Visviva 07:40, 28 November 2008 (UTC)
As a comparison, here is a typical Han character entry:
(radical 187 馬+2, 12 strokes, cangjie input 戈一尸手火 (IMSQF) four-corner 31127)
References
KangXi: page 1433, character 11
Dai Kanwa Jiten: character 44579
Dae Jaweon: page 1958, character 5
Hanyu Da Zidian: volume 7, page 4540, character 3
Unihan data for U+99AE
A lot of the technical stuff is in the "inflection line". Skipping definitions, which syllables don't have, we then have a "References" heading, which tells us where to find this character in several well-known character dictionaries, followed by a link to the Unicode site, which is where the Unicode codepoint is given. — hippietrail 06:49, 28 November 2008 (UTC)
I don't think the situations are really comparable. The CJK characters are real units of meaning, with real entries in real dictionaries; Hangul syllabic blocks, for the most part, have no independent meaning, and no existence outside of the realm of digital possibility. (This is why I tried to have them deleted, but failing in that I figure the next-best thing is to create a complete set of consistent entries for them.)
But yes, I could see putting the keyboard input and composition in the inflection line -- though I have to say that seems a bit odd, even for the CJK entries -- and putting the Unicode data under "References". Anyone else have thoughts on this? -- Visviva 07:40, 28 November 2008 (UTC)
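For reference, the arithmetic behind the "technically possible" syllables and the U+AD6B notation mentioned in this thread can be sketched in a few lines. This is only an illustration, assuming the standard Unicode Hangul Syllables layout (block starting at U+AC00, with 19 initial consonants, 21 vowels, and 28 final positions including "no final"); the function names are made up for this example and are not part of any Wiktionary template.

```python
# Sketch of Hangul syllable arithmetic, assuming the standard Unicode layout.
S_BASE = 0xAC00                            # first precomposed syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28     # initials, vowels, finals (0 = none)
S_COUNT = L_COUNT * V_COUNT * T_COUNT      # every combinable block

def code_point_label(ch):
    """Return the conventional 'U+XXXX' label for a single character."""
    return f"U+{ord(ch):04X}"

def decompose(ch):
    """Split a precomposed syllable into (initial, vowel, final) indices."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    final = index % T_COUNT
    return initial, vowel, final

def compose(initial, vowel, final=0):
    """Inverse of decompose: build the syllable from its three indices."""
    return chr(S_BASE + (initial * V_COUNT + vowel) * T_COUNT + final)
```

Because the block simply enumerates every initial/vowel/final combination, it contains 19 × 21 × 28 = 11,172 syllables whether or not a given combination is ever written in Korean, which is the point raised above about syllables that are possible but unused.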

These given name categories must all be renamed. Names are not male or female, they are masculine or feminine. --EncycloPetey 18:53, 21 November 2008 (UTC)

Yikes, between Category:Female given names by language and Category:Male given names by language there's ~100 subcategories written that way. Interestingly, b.g.c. has similar usage statistics between the two forms (when searching w/o "given" as well) while the web clearly prefers the "male/female" usage (~5:1). IANAG (I am not a grammarian) but could common usage be employing "male/female" attributively here? --Bequw¢τ 23:33, 23 November 2008 (UTC)
No, it's not used attributively. If the names themselves actually were male and female, they'd be getting together to produce baby names. --EncycloPetey 19:55, 25 November 2008 (UTC)
This seems to me like a perfectly ordinary use of male#Noun as a noun adjunct: "male given name" = "given name of a male". I can see where "masculine" might be somewhat better, but then again given the various additional meanings of masculine/feminine it could be misleading. I don't see a problem with Makaokalani's system. -- Visviva 13:56, 26 November 2008 (UTC)
You'd have to check all the 8000+ given name entries and move hundreds of names into the new categories.
"Masculine/Feminine given names" are correct for languages with exactly two genders, m and f. For all other languages, it would be "Given names borne by persons of the male/female sex". English names cannot have a grammatical gender. And what about "Cecil is such a feminine name for a man"? "Male/Female given names" sounds like a reasonable compromise, short and easy to understand.
The old names of given name categories (fr:Male given names) sounded fine to me, but Robert Ullmann insisted they must be changed into POS ( French male given names). There is a rule about it, so I created nearly 200 new categories and now I'm busy cleaning up the old categories and adding templates to hundreds of names. Very boring and quite unproductive work. I refuse to start all over because of a male/masculine controversy. The only thing that really matters is the content of the entries. --Makaokalani 14:07, 24 November 2008 (UTC)
Male/female is not a "compromise"; it is incorrect. Modern English usage places emphasis on the biological traits for "male/female". The terms "masculine/feminine" are applied to gender roles. Yes, it would mean changing all the entries, but they've already been changed before, haven't they? And they're supposed to be done through templates, which makes the task much easier. It is also incorrect to say the English names "cannot have a grammatical gender". Modern English does have vestiges of gender as it existed in Old English, and the separate third-person pronouns for he/she and him/her are plain testament to that fact. These words evoke a connotation of gender, and most English given names do exactly the same.
The fact that you decided to proceed with a major change to category structure on the basis of a private conversation, without seeking input from the community, is your responsibility. If you really believed that "the only thing that really matters is the content of the entries", then you wouldn't have started changing the category names in the first place, would you? --EncycloPetey 19:53, 25 November 2008 (UTC)
"Private conversation?" I discussed this in Wiktionary:Requests for deletion/Others#Category:Armenian names and Wiktionary:Beer parlour#New subcategories for English given names. I take responsibility for the change from topic to POS categories, but not for using the male/female words just like they had been used before. Nobody had ever mentioned that it could be a problem. All the entries haven't been changed yet, and several hundred names don't have templates in the new system either, in categories "--- diminutives of male/female given names", "--- male/female given name parts". If you promise to change them all, that's fine with me.
What about having a vote, if you feel so strongly about this?--Makaokalani 13:34, 26 November 2008 (UTC)
I would be willing to make those changes, although I can't promise to do them quickly. I am currently editing (by hand) the inflection line of more than 3000 Spanish verb entries, in part because I'm the one who proposed a unified template for Spanish verbs, and because the variation in the verb inflection patterns and in the entries themselves necessitates that each one be seen by a person. Sorry that I missed that October discussion. I took a wiki-break during October and did not edit so much that month. It seems I missed that BP discussion, or I'd have commented. --EncycloPetey 03:03, 28 November 2008 (UTC)
Makaokalani is entirely correct, the names are names for males and females. This makes sense for all languages. "Masculine" and "feminine" only make sense for languages where nouns (and proper nouns) have grammatical gender; for other languages "masculine" and "feminine" are utter nonsense. "Robert" is not a "masculine" name in English, it is a male name. Full stop. "Male" and "female" we can and should use for all languages, just as it is. Robert Ullmann 15:18, 1 December 2008 (UTC)
That's not true. "Masculine" and "feminine" don't pertain only to grammatical gender, but also to social gender; a transman (a man who's biologically female) will go by a male/masculine name such as "Robert". But the issue is complicated; fairly few societies have really accepted transgender people, and names are generally assigned long before a transgender person is capable of articulating such … if a transman's birth-name was "Jessica", was that his name because he's biologically female and it's a female name, or was that his name because it's a feminine name and his family assumed he would be a woman? If some people continue to call him "Jessica" after he's transitioned, is that because it's a female name and he's female, or because it's a feminine name and those people are failing to recognize him as a man? Neither male/female nor masculine/feminine seems totally accurate — from a purely descriptivist standpoint, it seems that such names are both male/female and masculine/feminine — and I don't see how you or EP can see this in such black and white. (And that's even ignoring intersex people, genderqueer people, and random people with arguably gender-inappropriate names.) —RuakhTALK 15:33, 1 December 2008 (UTC)

I know this is a bit off-topic, but could someone knowledgeable about it check the Lithuanian word (semens - seed) which I added in Appendix:Proto-Slavic_*sěmę? In my etymological dictionary it was written with ě, but when I looked up the Lithuanian letters in the Wikipedia article it turned out that this kind of e is not there, so I wrote simply e. Unfortunately, I understand not a whit of this language. Which letter is the correct one? Bogorm 15:48, 1 December 2008 (UTC)

It's probably sė́menys (flaxseed, linseed). You've mistaken the acute tone mark on <ė> for a caron <ˇ> ^_^. LKZ is an excellent source for finding obscure Lith. words, as well as for accentuation paradigms. --Ivan Štambuk 16:34, 3 December 2008 (UTC)
There are also variant forms sė́menės, sė́mens. Your source probably referred to the second one. --Ivan Štambuk 16:40, 3 December 2008 (UTC)

What has happened to Webster's Unabridged Revised Dictionary - yesterday I could not open any entry (in two browsers) and neither can I today. For example this does not open and here an innumerable amount of entries are dependent on it. Is anyone seeing something meaningful in the link? Any information? Bogorm 15:55, 23 November 2008 (UTC)

I have the same problem, but this link just now worked for 1828 and this one for 1913. Onelook.com is a useful gateway to multiple dictionaries as well. DCDuring TALK 18:34, 23 November 2008 (UTC)
It seems to have slipped by. Bogorm 12:31, 28 November 2008 (UTC)

belter a term used in reference to the individual characteristics displayed commonly found in dundonian female personality. in a collective format the reference to being a belter implies that a person is lacking in social skills and intelligence, often associated with monday books and social loan repeat applicants. —This unsigned comment was added by Tangerine queen (talkcontribs) 17:00, 23 November 2008.

He/she posted this on many irrelevant pages. I moved it to Requests for entries at the time. Equinox 19:44, 29 November 2008 (UTC)
Strike as well-covered elsewhere. DCDuring TALK 19:59, 29 November 2008 (UTC)

Category:fa:Male given names has been deleted and replaced by Category:Persian male given names. This creates a problem insomuch as Category:fa:Male given names did contain quite rightly the Arabic names in Persian, yet now Arabic names such as علی Ali and جمیل Jamil are in a wrongly-named category: 'Category:Persian male given names'. 'Persian given names' are those such as کوروش Kurosh/Cyrus and داریوش Dariush/Darius. How can this be resolved? Kaixinguo 06:09, 24 November 2008 (UTC)

This is just the kind of misunderstanding I was afraid of. Only the name of the category has changed, it still applies to the same names: any male given name in the Persian language, given by Persian speaking parents to their son, including those derived from Arabic. Just like Cyrus is an English male given name, though it derives from Persian. You may create a subcategory if you wish, by the template {{given name|male|from=Arabic|lang=fa}}.
A real problem is Category:English male given names from Persian. Besides Cyrus, it contains names like Behrouz and Ehsan, which are not given by English-speaking parents to their children. They should be classified as "English transliterations of Persian male given names", or "Persian male given names transliterated into English", or... I cannot think of a good name. But as long as this isn't resolved, it's better to keep them in the wrong category than to leave them outside all categories.
By the way, your entries on Persian given names were exciting. I learned a lot:)--Makaokalani 14:23, 24 November 2008 (UTC)
How about "Persian male/female given names in Roman script"? These are at least partially independent of language (that's what makes them so headache-inducing), but the script is a straightforward criterion. We could then have a matching category "English given names in Persian script" and so forth.
But then again, are we sure that names like "Behrouz" are never given to the children of English-speaking parents? Even if they are of Iranian ancestry? Seems difficult to prove either way. -- Visviva 15:01, 24 November 2008 (UTC)
On the one hand, our main taxonomy is by language, but on the other classifying these names by transliterated script would greatly limit the proliferation of categories, and category membership for many names. But it will clump together foreign-sounding names transcribed for French, German, Slovak, Hungarian, Filipino, etc. Maybe start with script categories, then subdivide them into languages if/when their membership grows large.
Is “given to children of anglophone parents” a useful criterion at all? Many people's romanized names are assigned by passport-issuing authorities or immigration authorities, including children's, and including many people who speak English or are learning it, or are becoming citizens of an English-speaking country.
Some countries have official romanization schemes used for passports, and it may be useful to note this in etymologies. Michael Z. 2008-11-24 17:05 z
"Persian male/female given names in Roman script" sounds fine, and is suitable to the Template:given name. The language statement is a problem, I would so much like to call such names "Translingual", even when they don't appear in all languages. But the language can be changed later, and subcategories added.
"Given to children of (several) native speaker parents" is an essential criterion for the language statement of a given name. People immigrate and intermarry so much that if the names of first and second generation immigrants were counted, all names would occur in all languages. Another good criterion is the pronunciation. There is no standard English way of pronouncing Behrouz or Ehsan. You'd have to refer to the Persian pronunciation, in the original entry. --Makaokalani 14:55, 25 November 2008 (UTC)
I'm skeptical about the native-speaker parents criterion. 1. some “foreign” names are used in countries where English is used as an official language or by a large number of second-language speakers, like Pakistan or the Philippines; 2. if someone immigrates to an English-speaking country and makes the news, or starts a company, etc., then their name may be widely used in English. Whether standard or not, their name is pronounced somehow in English, and this can be determined by research. I think names in English should simply be included if they are attested, like any other term. Michael Z. 2008-11-25 18:42 z
I think if Behrouz is attested in English (e.g. (not i.e.), if there are three independent people (i.e., not named after one another) whose names appear on official papers in English-speaking countries), it's an English name; otherwise it's a transliteration, and AFAIK our policy is not to include transliterations.—msh210 19:18, 25 November 2008 (UTC)
Thanks to User:Makaokalani for spending so much time to make the name entries in Wiktionary work.
I want to clarify something for myself : what is the status quo on names not of English origin in Roman script which are not that commonly seen in English-speaking countries? Also, am I correct in thinking that, as things stand, a name such as 'Michael' could have an entry for numerous languages using Roman script? How about a name such as 'Mohammad' which has entered numerous languages? Thanks. Kaixinguo 22:58, 25 November 2008 (UTC)
Also, I think it is a given that almost every single Persian name would be able to be attested in English.Kaixinguo 23:02, 25 November 2008 (UTC)
Maybe we should create "Wiktionary:About given names"? Wiktionary rules are made for words that mean something, and some rules make no sense when applied to names. An example: Wolfgang is the first name of Mozart and Goethe, and of thousands of other German speakers. If the CFI is "three citations in three years" - and about different persons, as msh210 suggests - Wolfgang is a word in practically all languages that use Roman script. How many such languages are there - a thousand, certainly? A thousand identical explanations for Wolfgang, and for every name that is reasonably common in any major language? ( And for many place names.)
If Behrouz is called "English", then it should also have an entry in hundreds of other languages using Roman script. That's why I want to call it "Translingual". You could add a list of the languages where this particular transliteration is used.
Surely we could think of some special rules for given names? For names used in India, Pakistan, Philippines, we could decide every country separately. Mohammad should naturally be translated into every language. The present entries for "Michael" mean only that it's a common name in those languages - a very common name in Denmark, for example.--Makaokalani 13:59, 26 November 2008 (UTC)
I agree that an "About given names" page would be an excellent idea (as would an "About surnames" page). The argument that I have seen in favor of not using Translingual for proper nouns is because the inflection and pronunciation will differ from one language to another ... for example, if I'm not mistaken, the genitive of Wolfgang in Finnish would be "Wolfgangin". I think we could really do without pronunciation info for given names in languages where they are not "native" -- there are at least three different ways that Anglophones pronounce "Wolfgang", but only the German pronunciation is "correct" in any meaningful sense. IMO we could do without inflectional information too, though it does have some use.
This has been an area of contention for some time, so a WT:VOTE will probably be necessary to end our current (bad) practice. -- Visviva 12:57, 29 November 2008 (UTC)

I wonder about the preferred formatting of disambiguating glosses at non-English entries. I have seen the following variants, and have been entering all the Czech entries using the first one.

  1. car (nonpowered unit in a railroad train) -- using {{i}}
  2. car (nonpowered unit in a railroad train)
  3. car (nonpowered unit in a railroad train) -- using (''gloss'')

User:Tbot uses the second variant when creating non-English entries.

The reasoning behind my choice of italic was that glosses that refer to a sense in the synonyms section of English entries are typeset in italic by {{sense}}, and that the text entered using the template {{sense}} serves a similar role to the glosses accompanying the translation at non-English entries.

Is there any mention in WT:ELE to that direction that I have overlooked?

Thanks for your input.

--Dan Polansky 16:59, 24 November 2008 (UTC)

My reflex is to avoid applying any formatting or other design element unless it serves a clear purpose. I don't see the italics adding any meaning, or meaningfully distinguishing here, so I would simply omit them. And if a bot is making hundreds of these, we may as well remain consistent. Michael Z. 2008-11-24 17:11 z
I like templates like {{gloss}}, they make it easier (slightly) to solve the otherwise impossible linking definitions problem. Whether it should be italics or not is just user-preference, and can be toggled at WT:PREFS. Conrad.Irwin 17:27, 24 November 2008 (UTC)
Sounds good to me, but AFAIK we currently have no template serving the purpose. {{gloss}} is now a redirect to {{gloss-stub}}, which says that gloss is missing.
What about turning {{gloss}} from a redirect to a template serving exactly that purpose of marking up disambiguating glosses in non-English entries? That should be practicable, as currently less than 30 entries are referring directly to {{gloss}}; I could change them manually to refer to {{gloss-stub}}. --Dan Polansky 21:32, 24 November 2008 (UTC)
I agree with your last comment Dan. I don't think we should be using {{i}} (or {{i-c}}, etc) if there is something more specific available to allow the greatest amount of customisability. Thryduulf 22:19, 24 November 2008 (UTC)

Just want to pipe in that for the "final" version, we would want the entry to be formatted as an entry without the parenthesized gloss. I'd format this as "A railroad car." myself. Circeus 23:41, 24 November 2008 (UTC)

I agree, but I don't think we're ever going to convince everyone else. —RuakhTALK 01:09, 25 November 2008 (UTC)
I don't mind any specific format for the bot-created entries (except to say that the {{i}} classes should NOT be used). But I do have issues with parentheses used as disambiguation inside definitions, which is why I believe such should be automatically considered non-final entries (in addition to the various more obvious issues, e.g. automated légal or pal would be incapable of accounting for these words' inflections). Not to mention the definitions most often require some proper fine-tuning, or the parentheses may be superfluous (cf. lire entre les lignes). Circeus 01:35, 25 November 2008 (UTC)
I think this gets to the distinction between definitions and translations. Stephen was noting this distinction elsewhere recently (on RFD I think), and while I was busy disagreeing with him at the time, I think it is an important point. There are good reasons why bilingual dictionaries generally list translations rather than giving "full" definitions (except for those words where no simple translation is possible). If we want to break from that tradition, we can, but it merits some serious thought. ... At any rate, in the example in question, I think "railroad car" works neatly as both a translation and a definition. That is the ideal solution. -- Visviva 03:07, 25 November 2008 (UTC)
This turns the discussion from formatting of glosses to whether the glosses should be there at all, and about the overall preferable format of non-English entries. I find it preferable, and have understood it to be the common practice, to indicate a single word or phrase serving as a translation, or a list of these. That is, to find such a word that a translator could actually use when translating text. In the context of railroads, "car" mostly unambiguously refers to a railroad car (unless it is a car carrying automobiles), and the translation would sound weird if it achieved formal unambiguity by invoking "railroad car" all the time. That is, from the following two options, I prefer the first one (disregarding now the question about the preferred formatting of the gloss).
  1. car (railroad car)
  2. A railroad car.
Sometimes, disambiguation can be achieved by listing synonymous translations instead of using a gloss, as used in vůz:
  1. car, automobile
  2. car, train car
Another example that could serve as a test case is výraz, currently formatted in the first sense as:
  1. expression (facial appearance usually associated with an emotion)
The entry clearly indicates the most useful translation in the given sense, accompanied by a gloss indicating the sense. The following alternative makes the "výraz" entry almost non-distinguishable from an English entry, practically destroying the rule that non-English entries should avoid definitions.
  1. An expression; facial appearance usually associated with an emotion.
--Dan Polansky 09:40, 25 November 2008 (UTC)
re "destroying the rule that non-English entries should avoid definitions" um, what? I've heard other people say this, but there is no such "rule". Quite the reverse: WT:ELE says unequivocally that all entries have definitions. It is often convenient to simply use the appropriate English term (disambiguated as appropriate), but it is still required to be a definition. And not limited to being only a "translation". Our mission statement (main page) says: "This is the English Wiktionary: it aims to describe all words of all languages using definitions and descriptions in English." (and has always said something similar)
I don't mean to suggest that (say) jicho be defined as "An organ that is sensitive to light, which it converts to electrical signals passed to the brain, by which means animals see." (someone did that for some language, I don't recall ;-). Defining it as "eye (organ of vision)" (in whatever detailed syntax) is sufficient, and provides the one-word translation. But it is still, as required, a definition, not just a translation. Robert Ullmann 10:21, 25 November 2008 (UTC)
I agree. (I might prefer "An eye: an organ of vision in humans and many animals.", but it's the same idea.) Put another way, let's pretend that jicho were a rare English word meaning "eye". Obviously we wouldn't just copy and paste the definition of "eye"; the clearest and simplest definition would be something like "(rare) An eye: an organ of vision in humans and many animals." I think the same logic applies to foreign words. —RuakhTALK 19:16, 26 November 2008 (UTC)
The example of jicho says "eye (organ of vision)". The definition of eye says: "An organ that is sensitive to light, which it converts to electrical signals passed to the brain, by which means animals see." So I would think that "eye" is the target term, "organ of vision" is a gloss, and the long text that I have just quoted is a definition. Put differently, while "organ of vision" would probably be unsatisfactory as a definition in the English entry, it is perfectly okay as a gloss. So AFAICS the common practice is to avoid definitions and aim at the format shown at jicho. This can change, if we agree to do so, but that has not been the practice so far.
Quoting Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English, boldface mine:
"Entries for terms in other languages should follow the standard format as closely as possible regardless of the language of the word. However, a translation into English should normally be given instead of a definition, including a gloss to indicate which meaning of the English translation is intended."
--Dan Polansky 10:47, 25 November 2008 (UTC)
normally. normally. normally. In other words, "eye (organ of vision)" (a translation) is normally adequate as the definition, one doesn't write out "An organ that is sensitive ...". As I said. It (the ELE text) does not prohibit providing an adequate definition, and many, many, many terms require more than a one-word "translation" (1-1 translations between languages being a mostly mythical concept anyway). (Can you tell I am heartily sick of people insisting that entries be severely dumbed-down in one way or another because of whinging that "we aren't supposed to do that"? By all means include anything useful for the "translation", even though someone will claim it is a "definition" and therefore must be fixed and thus rendered useless. Argh.) Robert Ullmann 14:19, 25 November 2008 (UTC)
Okay, agreed. But then, let that additional, "useful" information be in the gloss, and let the best available translation be clearly indicated as standing in the first place at the entry. --Dan Polansky 16:12, 25 November 2008 (UTC)
On location of this topic in policy documents: The formatting of glosses in brackets and italics is mentioned at
and was added to that document on 14 February 2007. The mentioned document is just a help document; I can't find the relevant policy document. I assume that whether to format in italics or in roman has been considered a matter of taste so far.
The use of glosses in brackets has been codified in the mentioned document at least since 18 October 2006, using the example "[[man|Man]] (adult male)".
Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English has one paragraph devoted to the topic of formatting of non-English entries.
--Dan Polansky 10:47, 25 November 2008 (UTC)
To assume that you can go for a single-word translation is at best simplistic 90%. Anybody who's ever done translation knows that for most purposes bilingual dictionaries are actually a Bad Thing. Words have connotations, collocations, they are used with specific referents that other languages don't have. They refer to things other languages don't name. There is no such thing as a centre local de services communautaires or a cégep in English (and there is in fact no proper way to translate "U.S. Route X" in other languages). soulier and chaussure have different connotations in Quebec and France (the former is literary in France, usual in Quebec). breuvage does not readily translate to any English concept (AND has differing usages regionally) etc. etc. Circeus 14:29, 25 November 2008 (UTC)
I do understand that single-word translations are approximations. But so are many definitions. I am not arguing against glosses and against usage notes. I am arguing in favor of finding the best available short terms that can serve as a translation. The claim that bilingual dictionaries are a bad thing is an overstatement, to say the least; they are hugely useful as compared to not having them at all.
The examples you give are exceptions to the general rule that direct translations that are good enough mostly can be found. When we give up the effort of finding the best direct translation, we may end up with more definitions than needed, because it is so much easier to describe the meaning around than to search for the single word that does the job.
Taking the second sense in breuvage that you mention--"Any liquid that can be drunk, especially nonalcoholic ones.", the translation that I would have entered is
  1. drink; beverage (especially non-alcoholic one)
The current breuvage entry does not tell me that the straightforward, even if ambiguous, translation is "drink".
Taking quite a different example, I find it desirable to remove any additional information that someone could want to add to:
  1. cat (animal)
in an entry for Katze. So the contention is not about one big yes or big no for definitions, but about whether definitions are tolerated in those cases at which they are superfluous.
--Dan Polansky 16:12, 25 November 2008 (UTC)
Perhaps it would be constructive to assemble short lists of example entries, where the various proposed formats are used and where we (or most of us) agree that the particular choice of format is appropriate. I know that for some Latin entries, I have used single word translations on the definition line, sometimes using just one word but other times using more than one translation when the shade of meaning is not quite the same or the English translation is slightly ambiguous. There are also times where no suitable English translation existed, and I gave a full definition. If we assemble a collection of illustrative examples, we can then write text to accompany them and have a guide for editors into the bargain. --EncycloPetey 19:34, 25 November 2008 (UTC)

Brief comment, only to the first question: I'm using invariably variant #2, like Tbot. I've never pondered about it, probably for the same reason Mzajac mentioned (to avoid applying any formatting or other design element unless it serves a clear purpose; the formatting in this bracket is used to indicate a quotation). -- Gauss 00:07, 27 November 2008 (UTC)

Re Ruakh's comment several paragraphs above, from 19:16, 26 November 2008: AFAICS the use of long format without brackets versus the use of list of terms followed by a gloss in brackets is exactly the point of contention, not something for a sidenote. The mentioned (a) "An eye: an organ of vision in humans and many animals." is AFAICS not compatible with what it says at Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English, unlike the alternative (b) "eye (organ of vision)". In the example (a), there is no gloss; there is a translation + colon + definition + period.
In this discussion, the word "definition" is used ambiguously. There is such a thing as definition by synonym, but that is not what is meant by "definition" in the contrast set consisting of translation, gloss and definition. Sure the combination of translation and gloss effectively provides a definition, but that is not what is meant by "definition" in this distinction. To say that translation and gloss should be there instead of definition is to say that the variant (b) is preferred to (a), not that (b) does not effectively define the foreign-language term.
I'd like to add that I am not necessarily in favor of one formatting or the other. It is just that I have understood the single paragraph devoted to non-English entries in WT:ELE in certain way, and applied it that way, and have seen it applied the same way in many non-English entries. This discussion shows that there is a need for clarification.
I'd think that this discussion must have been lead before, but I do not know where and when, and with what results and opposing parties. --Dan Polansky 07:07, 27 November 2008 (UTC)

Is it normal practice to have pages for every form of a Latin (or any other language) verb? Cos I've been creating them and somehow it feels wrong. Examples: detegis, detegit. Thanks. LGF1992UK 18:20, 24 November 2008 (UTC)

Yes it is. Ultimately we would like for those to be generated automatically, though, because their creation is tedious. Circeus 20:25, 24 November 2008 (UTC)
I have begun creating verb form pages by bot. A very good reason to do it this way is uniformity of the content. It is nice to see someone else interested in helping with this, but the Latin verb form entries you have created lack some of the desirable formatting for such pages. --EncycloPetey 19:29, 25 November 2008 (UTC)

This is a request for comments on this idea. This can be useful to people trying to solve cryptograms. (I suspect the last space in the category's name should be removed, but that's neither here nor there.)—msh210 20:52, 24 November 2008 (UTC)

I don't see that much point, but there's no reason why not. Conrad.Irwin 21:39, 24 November 2008 (UTC)
Given we have alphagrams, I can't see any reason to object to this. Thryduulf 22:14, 24 November 2008 (UTC)
Yes, there is a reason why not: for this to be useful to someone solving cryptogram puzzles, pretty much all words have to be categorized this way. You want this on every word? It is simple (trivial) to generate a cryptogram index from a word list. Adding the categories here doesn't add any information. (see for example [3] for more information) (and, yes, I don't think alphagrams are useful either, but to the extent that they are, they are useful for individual words; simple substitution cipher pattern lists are only useful with a complete index, and we don't want or need that here.) Robert Ullmann 10:36, 25 November 2008 (UTC)
Agree. These don't need to use the category system or be linked from entries. On the other hand, if someone wants to create and maintain pages in Appendix-space that list these (a la our Rhymes pages), I don't think anyone would have a problem with that. -- Visviva 16:35, 25 November 2008 (UTC)
Appendicize, per above. Appendix:English words with ABBCBB structure. bd2412 T 05:46, 28 November 2008 (UTC)
Yes, my intention was that this category be added to every English entry, or, at least, every English entry with repeated letters (not necessarily consecutively, so tot would be so categorized). Since people are objecting, and no one is enthusiastic about it but me, I'll delete the prototype.—msh210 18:07, 3 December 2008 (UTC)
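
As a rough illustration of the point made above that a cryptogram index is simple to generate from a word list, here is a minimal Python sketch. The function names and the sample words are assumptions made for this example, not anything that exists on Wiktionary; the pattern notation follows the "ABBCBB" style used in the proposed appendix title.

```python
from collections import defaultdict


def letter_pattern(word):
    """Return the substitution-cipher pattern of a word, e.g. 'tot' -> 'ABA'.

    Each distinct character gets the next unused capital letter, in order of
    first appearance. Assumes at most 26 distinct characters per word.
    """
    mapping = {}
    pattern = []
    for ch in word.lower():
        if ch not in mapping:
            mapping[ch] = chr(ord("A") + len(mapping))
        pattern.append(mapping[ch])
    return "".join(pattern)


def build_pattern_index(words):
    """Group words by pattern; the word list is supplied by the caller."""
    index = defaultdict(list)
    for word in words:
        index[letter_pattern(word)].append(word)
    return dict(index)


# Hypothetical sample list; a real index would need a complete word list.
sample_index = build_pattern_index(["teepee", "voodoo", "tot"])
```

With this sample list, "teepee" and "voodoo" both fall under the ABBCBB pattern, which is the kind of grouping an Appendix page would list.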

Wikipedia is offering some etymology. See w:Wikipedia:Articles for deletion/Pay through the nose. Uncle G 16:07, 25 November 2008 (UTC)

This is a follow-up on a recent discussion about formatting of translations, glosses, and definitions in non-English entries.

Would anyone object if I repurpose {{gloss}} for the formatting of disambiguating glosses of definition lines in non-English entries? The {{gloss}} template is currently a redirect to {{gloss-stub}}, which is used to indicate that a gloss is missing.

The proposed default formatting of the template: "(gloss in roman)", based on the choice made by Tbot. The formatting can be changed in the template later, when we decide to do so.--Dan Polansky 07:31, 26 November 2008 (UTC)

It could also be made customizable, which would keep everyone happy regardless of the formatting they'd like to see. --EncycloPetey 17:09, 26 November 2008 (UTC)
I have set up the {{gloss}} template for the task, and placed it in jicho, and some other entries. --Dan Polansky 09:14, 28 November 2008 (UTC)

I've just come in at the tail end of a conversation on a talker I'm active on that included the words "Oh, btw, Wiktionary sucks". I eventually got out of them why they thought this, and they came up with two reasons -

  1. "It includes things that aren't proper words" - by which they meant words that aren't in the OED.
  2. "It doesn't mark words that aren't universal as US only, etc" - addicting (adjective sense) and nihilarian (noun sense) were the two examples they gave (both now on the tearoom).

Adding more regional context labels where appropriate is obviously the way to deal with the second of these, but is there any way we could do better at knowing which words should be so labelled?

Regarding the first point, again I think more regional labels would help. However I think the main point here is that our descriptivist ethos of recording all words as used, not recording which words an "authority" says should be used, is not getting through.

Discuss. Thryduulf 13:43, 26 November 2008 (UTC)

Having just engaged in a long discussion related to these very points, I'm not really sure what our "descriptivist ethos" should be. Languages evolve. They always have and always will. But in times past, the evolution made sense. New words were introduced to describe new things, like "radar", which began as a military acronym and came into common usage. But now, the "Slangists" (if I may coin a word) seem to be in some sort of contest to see how much garble they can create. They make up new words, create new and often vulgar usages for old words, and just generally make a mess! How can we decide, (indeed, are we even qualified to decide), what constitutes valid language? I have no answer, but I will watch this discussion carefully to see if someone else does! -- Pinkfud 14:08, 26 November 2008 (UTC)
Personally, I'm not inclined to waste my time arguing with anyone about #1. It's not like there is some central authority on what "real" English words are. The only meaningful authority comes from the language itself. To put it another way, I don't think it's that the person doesn't realize Wiktionary is descriptive, but that they don't consider descriptivism to be a valid approach. Whatever, their loss.
On #2, if someone can find a general list of words that should be regionally tagged, it would be fairly trivial to go through them and check that this has been done. But I'm not sure where we would find such a list, and I expect most obvious cases have been tagged already. The ones that haven't are the really non-obvious cases like "addicting". Ultimately all entries need to be audited against their coverage in other dictionaries, which will turn up many such gaps, but coverage auditing is a very slow process and it would take a very long time to get to the kind of long-tail words that this person seems to be interested in.
The ideal course of action is for the person in question to join us and fix the problems themself.  :-) -- Visviva 14:21, 26 November 2008 (UTC)
Would we let them? I suspect there would be a few arguments to endure. By the way, did you realize you can take the Latin excīdō (to cut out) and cornu (horn) and create a definition for a certain "word" no one wants to see? LOL! (Now there's another "word" to ponder). Where does it all end? My point is, there's simply no way to keep up anymore. Is it real, or is it tosh? Only your hairdresser knows for sure. -- Pinkfud 14:44, 26 November 2008 (UTC)
Nihilarian is an old (c. 1708) Bishop Berkeley coinage. I'm not sure about any valid current use. Addicting is interesting. I had engaged a user on its talk page. The discussant definitely took a prescriptivist stance. It hadn't occurred to me that it might be regional. We do need more tags if we are to be truly descriptive. DCDuring TALK 16:04, 26 November 2008 (UTC)
More use of context tags is definitely part of the solution, but encouraging the use of supporting citations is another key component. An anon user can't come in and successfully argue "This word doesn't exist" when there is a list of durably archived citations supporting its existence. Also, do we have a place where we explain clearly that we are descriptivist and what that entails? That would be a GoodThing to have. --EncycloPetey 17:08, 26 November 2008 (UTC)

How about adding a dictionary checklist to entries? We can choose a list of comprehensive (OED, Webster's 3rd), regional (CanOD), and specialized dictionaries (slang, jargon, etc). Any of our terms can get tagged as to whether it is present or absent in each of the dictionaries.

This would help comfort dictionary users who want a sense of authority. It would also serve as a simple reference citation for the term. It may even help weed out plagiarism which is entered here.

Is there any disadvantage? The only thing I can envision is that if this can be used to generate complete wordlists from copyrighted dictionaries, then it might constitute infringement. Michael Z. 2008-11-26 23:50 z

Well, it also places emphasis on authority over citations. From time to time non-words do appear in dictionaries, such as through hoaxes. There are also phobia "dictionaries" which contain words that (apparently) have never been used outside of those dictionaries. Many people wrongly believe that a word must be in a dictionary before it is "real". When these people proclaim that a word is "not in the dictionary", they mean it is somehow invalid, inferior, or to be avoided (even if the word actually is in some major dictionaries). I therefore feel uncomfortable about feeding into this erroneous way of thinking. --EncycloPetey 02:57, 28 November 2008 (UTC)
But we already do include references to dictionaries (see category:Reference templates), as well as occasional “Dictionary notes” sections. One advantage of doing this in a systematic way and citing non-mentions would be that such non-words would be identified as missing from all the other dictionaries (rather than being accepted as real because they have a reference). I don't see how adding more factual information can be bad. Michael Z. 2008-11-28 15:59 z
Information good. But it should be noted that many non-words appear in multiple dictionaries, because the lexicographers simply copied from each other instead of doing their own research (perhaps an understandable failing in the pre-computer days). -- Visviva 16:11, 28 November 2008 (UTC)
Cool. So that means some of them may be suitably attested for inclusion in Wiktionary! Is there a name for such words? Has anyone written an article about them? Michael Z. 2008-11-28 17:04 z
They're called ghost words. See for example dord. Equinox 17:17, 28 November 2008 (UTC)
Aren't we essentially doing that a lot of the time by avoiding "inferior" words that aren't in Google Books or Usenet? Equinox 15:01, 28 November 2008 (UTC)
No, that is examination of actual usage, and not through appeal to authority of other dictionaries. --EncycloPetey 15:44, 28 November 2008 (UTC)
Nope. It's true that we omit such words, but we don't take the view that omission is a mark of inferiority. (Selective description is not prescription.) —RuakhTALK 16:00, 28 November 2008 (UTC)

I attempted to write an essay at Wiktionary:Descriptivism about this, but it really needs someone who understands it better than I (and, more importantly, can write better than I) to redraft it. Our CFI are arbitrarily chosen to reflect the set of words we are interested in; we could change them to be stricter (perhaps to books only) or looser (to include the entire internet) without a change to our basic philosophy of describing what we see. However, in my opinion, and no doubt in other people's, the dictionary would be less useful with such changed CFI. The problem I can foresee with a "dictionary checklist" type of idea is that it encourages competition - either we try and force in as many words as possible so that we beat other dictionaries, or we start removing entries for fear that we might not be as right as we had assumed. We could have a set of links automatically added to every page that look up the word in other online dictionaries - this would make us more useful, as people would be able to find information that we don't have, and increase people's faith in the definitions we give - at the expense of encouraging fewer people to add new information here, and possibly those who do will just copy from the other dictionaries. I don't see the absence of dialectal labels as a huge problem compared to that of missing definitions and words - but again, the presence of accurate ones in many entries would make Wiktionary yet more useful. Conrad.Irwin 16:39, 28 November 2008 (UTC)

I do appreciate both sides of the coin, and I'm not sure which side's advantages outweigh the other's. I guess we're mostly speculating here. I do find that after I've created an entry, I do like to compare it to as many references as is practical, to ensure that I haven't made any flubs, and that I haven't inadvertently created an entry which appears to plagiarize one of them. Michael Z. 2008-11-28 17:13 z
That approach is perfectly valid, in my opinion. Making comparisons with published sources for the purposes of refinement is good. It is in using solely the authority of other dictionaries to determine relative merit of an entry where I have a problem. True, inclusion of a word in a major dictionary is persuasive for including an entry for which we might have trouble finding citations, but this is not a justification in and of itself. Rather, it is a stop-gap argument until supporting evidence can be found. Neither is absence from major dictionaries an argument for omitting an entry. "Absence of evidence is not evidence of absence." It is the argument from authority where I take issue. Ideally, entries should be evaluated on the basis of their own merits or flaws, independently of their publication history in dictionaries, and that's the point I think many detractors fail to realize. --EncycloPetey 19:05, 28 November 2008 (UTC)

This year's Christmas Competition is announced and is open to all contributors!
--EncycloPetey 02:52, 28 November 2008 (UTC)

Hi. I just registered here on en.wikt and got a nice welcome message which led me to this: Template talk:wikipedia. This is of course an extreme case, but all the "wikipedia" links in the left-hand menu bar called "in other projects" are not very intuitive in my opinion. What do you think about using my proposal on w:Wikipedia:Village_pump_(proposals)#Sister Projects here on Wiktionary? Prillen 10:55, 28 November 2008 (UTC)

If you have the javascript that implements this, I see no reason not to, but I feel that as we shouldn't be linking to more than one wikipedia article anyway (they have the disambiguation pages), that it's not that relevant. It also would bring problems for pages with long titles. Conrad.Irwin 14:53, 28 November 2008 (UTC)
Well, a page could link to more than one Wikipedia article legitimately, but it would be linking to articles on different language editions of Wikipedia. For example, if pruba is a Spanish word meaning "foobar" and is also a Polish word meaning "doo-jigger", and if each has an article on their respective Wikipedias, then both of those WP articles would be linked from that page. It's uncommon (relatively speaking), but it does happen. --EncycloPetey 15:42, 28 November 2008 (UTC)

The template's talk page is full of tests, so this is not a fair demonstration of a problem needing to be solved. Can we see a real example where there is a need for additional icons?

Do we have a guideline which recommends single or multiple project links? Michael Z. 2008-11-28 19:40 z

Wiktionary:Links is the page where linking issues are covered. --EncycloPetey 19:49, 28 November 2008 (UTC)

I'd like to create a subcategory in Category:Geography to collect words like city, city state, village, town, suburb, megacity, metropolis, megalopolis, country. Would Category:Settlements be a good name? --Panda10 14:27, 28 November 2008 (UTC)

I'd say they are types of settlement. Actual settlements could be geographical places like London. Equinox 14:41, 28 November 2008 (UTC)
How about Category:Communities? BTW, Roget's has a city concept under Geography. --Panda10 18:48, 28 November 2008 (UTC)
Will it include suburbs, rural municipalities, counties, shires, districts, provinces, states, countries, federations? Only political subdivisions, or other kinds of regions too?
Wikipedia has a whole category tree for this: see w:category:Settlements and w:category:Country subdivisions. Perhaps we can start with a simple subset of it. Michael Z. 2008-11-28 19:11 z
The Wikipedia categories contain proper nouns in the end. The proposed category would not contain proper nouns. It is for the actual common words where people may live (but not buildings, we already have a category for that). I can actually just put them in Category:Geography. --Panda10 20:57, 28 November 2008 (UTC)

A related question: is there a word in English for words such as road, street, avenue, lane, crescent... i.e. kinds of odonyms? This would make a very useful category. odotype would be a good word but, unfortunately, it means something else. Lmaltier 21:57, 29 November 2008 (UTC)

These words are currently in Category:Roads. --Panda10 22:06, 29 November 2008 (UTC)

We seem to have pretty much settled the question of short combining forms such as m'y, s'y, m'en, and m'a in the RfD discussions of those terms. I think we can also agree that the arguments in favor of keeping those do not apply to longer combining forms that are not, for example, likely to be mistaken for similarly spelled English words. Examples brought up in the discussion include m'était and m'arriveront. While the argument for having entries such as these would be pretty weak, I see no reason why we ought not redirect the commonly occurring combining forms to their uncombined verb, i.e., redirect j'était, m'était, t'était, s'était, l'était, d'était, c'était, n'était to était. Is there any particular reason why we ought not do so? bd2412 T 12:50, 29 November 2008 (UTC)

In any case, there is a need to be very careful: all forms you mention exist, except j'était (the right form is j'étais) and d'était (this form simply does not exist).
Possible issues with such redirects are:
  • this would imply the creation of a very large number of redirects: for many (or most) French verbs beginning with a vowel, there would be a need for more than 100 or 200 redirects (for each verb); for most French nouns and adjectives beginning with a vowel, there would be a need for 2 redirects; for many English nouns, there would be a need for 1 redirect (e.g. mother's), etc. Is this a real issue? I don't know.
  • the same form might exist in different languages,
  • the same form might imply different redirects, even for a given language (I have no example, but this might happen).
This is an interesting question, because words including a ' are very numerous in some languages (e.g. Breton). Even in French, there are a few examples which are proper words, such as presqu'île (a combining form, but an actual word), périph' (abbreviated word) or chem'not (a regional word).

Therefore, there is a need to define a policy for helping readers when they find such a word. A proposal might be to add a rule in some 'Help' page:

For searching something such as m'y, proceed in several steps:
  • if you feel this might be a word, first try to search m'y; if you don't find it, try to search m (or m'?) and y.
  • if you feel this might be a combination of several words, first try to search m (or m'?) and y; if you don't find them, try to search m'y.
Lmaltier 13:56, 29 November 2008 (UTC)
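
The search procedure proposed above could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `entries` stands in for the set of existing page titles, the function names are hypothetical, only the plain ASCII apostrophe is handled, and only the first ordering (whole word first, then the parts) is shown.

```python
def split_elided(term):
    """Split a term such as "m'appelle" at its first apostrophe.

    Returns ["m'", "appelle"], or an empty list when there is nothing to
    split (no apostrophe, or an apostrophe at either end, as in "périph'").
    """
    head, sep, tail = term.partition("'")
    if not sep or not head or not tail:
        return []
    return [head + "'", tail]


def lookup(term, entries):
    """Try the whole term first, then fall back to its parts.

    `entries` is a hypothetical set of existing page titles. For the first
    part, both "m'" and "m" are tried, mirroring "m (or m'?)" above.
    """
    if term in entries:
        return [term]
    found = []
    for part in split_elided(term):
        if part in entries:
            found.append(part)
        elif part.rstrip("'") in entries:
            found.append(part.rstrip("'"))
    return found


# Hypothetical title set for illustration only.
titles = {"m'", "appelle", "presqu'île"}
parts = lookup("m'appelle", titles)   # falls back to the two parts
whole = lookup("presqu'île", titles)  # found as a word in its own right
```

A help page would describe these steps for a human reader; the sketch only shows that the fallback is mechanical when the apostrophe marks the boundary.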
Oops, j'était and d'était came up as false positives when I was googling for combined forms. In any event, it may be 100 or 200 redirects counted by a particular lemma, but it would be no more than a dozen for any individual article (since each conjugation gets its own entry here). The system can easily handle that. I'm not advocating redirecting 's forms for English words. Presumably, people who use an English dictionary will know what the 's signifies (if they don't know enough English to know that, they might be better served with a dictionary in another language). But an English speaker might not know that "n'était" is a combined form, as English doesn't have forms that combine from the front (unless you want to count 'tis type words, for which we have entries). bd2412 T 15:21, 29 November 2008 (UTC)
But this affects many other languages that do something similar, but worse. In Galician, as in Spanish, some pronouns may be suffixed to the verb, but (unlike Spanish) certain articles appearing after infinitive verb forms with an enclitic pronoun undergo a spelling change and are attached to the previous verb, which also undergoes a spelling change. I have no idea how to add such combined forms as entries. There is no apostrophe or other mark indicating it is a contraction. There is a severe spelling change, so that one cannot look up the components unless one already knows it's a combined form (and this can be very hard to recognize). Worse, the resulting combination after contraction is not a word. It's a scribal form representing what happens in speech when a noun's article comes immediately after a verb. It just happens that in Galician, the pronunciation change is reflected in writing. It would be like "eatwethe" or "walktheythe" if English did such things. --EncycloPetey 16:28, 29 November 2008 (UTC)
Indeed. And Spanish, though not as bad, does add accent marks in many such cases (dándole, dármelo, etc.), and contracts -os + nos into -onos. And a different kind of problem: in Hebrew, all one-letter words are written solid with whatever word comes after them, which can produce a long chain of many one-letter words followed by a single two-letter word — how to handle something like וְשֶׁכְּשֶׁמֵּהַפֶּה (v'-she-k'she-mei-ha-pe, "and that when from the mouth")? We might not find one solution that works for all languages; for French (where apostrophes always mark the boundaries) and Hebrew (where there are infinitely many possibilities, bounded only by the extremes of syntax, and an infinite subset of them are attested), I advocate simply not having such entries, and perhaps adding fancy JavaScript to redlink pages to help users find what they're looking for. We're a dictionary, and while we've stretched the bounds of that in some respects, there are limits. For Spanish and Italian, maybe we want to develop an "only in"-type non-entry that has a link to each component word, plus a link to an appendix? (Ideally, this non-entry would be able to co-occur with a real entry for an actual word that happens to be identical.) And for Inuktitut, I advocate abandoning all hope. :-P   —RuakhTALK 17:42, 29 November 2008 (UTC)
For the contracted forms that I am immediately concerned about, this is not a problem. bd2412 T 17:55, 29 November 2008 (UTC)
For the “contracted forms” that you are immediately concerned about, there is no problem; we don't need entries for them. They're all sum-of-parts, with parts that are instantly discernible to anyone with any knowledge of French. —RuakhTALK 06:00, 30 November 2008 (UTC)
The problem is that there are plenty of people out there who speak English (and would use an English dictionary) but have no knowledge of French. They may still come across a French phrase in a book or magazine and turn to us in bewilderment. If they are unaware of the significance of an m' or j', they might not know to strip these when looking up the verb, and may be unable to find our entry on the word. bd2412 T 06:27, 30 November 2008 (UTC)
Yes, but they might also have no knowledge of French syntax, and thus how to decipher subject and object. Do you also advocate including entries for every single French clause (I suppose doing so could boost our entry count considerably :P)? Simply put, we cannot have entries for everything someone might want to look up. There comes a point where someone who wants to understand a language must learn some things about that language. We are a dictionary, not a translator engine. -Atelaes λάλει ἐμοί 07:11, 30 November 2008 (UTC)
I am proposing redirects (not entries) for words formed by the contraction of two terms with an apostrophe (i.e. no spaces). The problem is that someone unfamiliar with French may see "m'appelle", look it up, and find nothing here even though we have separate entries on m' and appelle. Clauses are different because the reader can at least find the individual words in the clause. bd2412 T 07:31, 30 November 2008 (UTC)
I think the best solution is to include common compounds of this sort in the respective entries, so that appeler or whatever has an example that includes "m'appelle." This would make the relevant entry show up in a search. -- Visviva 02:37, 3 December 2008 (UTC)
That would have us coming up with 5-6 example sentences for each French verb that starts with a vowel. Redirects would just be easier on the brain. Maybe we could have the combined terms on the page under ====Derived terms====. bd2412 T 06:24, 3 December 2008 (UTC)
Generalizing from this language-specific instance, I really like Lmaltier's idea about a Help: page that lists the bounds of our dictionary (because we do/will have bounds). It could be divided by language (or with subpages) and would include information like the outcome of this debate and the vote to disallow English forms with 's. Maybe including it at Help:Searching (which I just realized existed)? We could link to it from the page users get when their searches fail. --Bequw¢τ 07:53, 30 November 2008 (UTC)
Not sure what level to indent this on. As I've mentioned before w.r.t. the Hebrew forms Ruakh mentions above, I think these deserve entries, or at least redirects, so that people can find them who don't know where to split the word(s), as bd2412 argues above.—msh210 19:01, 1 December 2008 (UTC)
Wouldn't that lead to an infinite number of such entries? I'm not a technical guy, but that seems problematic. Last I checked, our application for an infinite amount of server space was still "under consideration". ;-) -- Visviva 02:37, 3 December 2008 (UTC)
I agree both with Msh210 and Visviva: I think these entries should be limited to attested forms (with quotes) and created only by contributors feeling these entries are useful (not by bots). I would propose the same rule for numbers and all such infinite lists. But this is a different discussion. Lmaltier 17:09, 3 December 2008 (UTC)
Would it help if I narrow my proposal to words spelled in the Roman alphabet? An English speaker confronted with a completely different alphabet is not likely to confuse those words for a comparable English word, or believe that the combined form is a single word of the type commonly found in a dictionary. bd2412 T 06:27, 3 December 2008 (UTC)
  • In Sanskrit there exists mandatory and highly-elaborate sandhi at word boundaries which joins words in chunks such as "mahāsmṛtidharastattvaścatuḥsmṛtisamādhirāṭ", and which can often be very difficult to decode esp. by beginners who are too lazy to memorize sandhi grid completely, and if you don't know many of the words in the text you're studying (i.e. you have to guess where one word ends and the other one begins to look it up in the dictionary). Obviously, in cases such as this, and in the cases of polysynthetic/highly-agglutinative languages (someone mentioned Inuktitut), it would be pointless to resolve this mechanically by redirects or entries whose number would exceed that of the corresponding lemmas by several orders of magnitude. The best way to solve this IMHO would be to generate all (relevant/common, to the extent that the generation of this sort is feasible) possible outputs of this sort by means of well-defined templates and include them in the corresponding entries. Take a look at the possessive declension of Hungarian nouns at entries such as szerv. Technically "my organ" in whatever language would be SoP not meriting a namespace entry, but it nevertheless shows up in search results and it's "there" for interested users to look it up. --Ivan Štambuk 06:08, 3 December 2008 (UTC)

The Kangxi Zidian that used to be at www.kangxizidian.com and is currently linked to from 21,000 pages as a source seems to have disappeared. Is there an alternative available, or do we need to remove every link? -- Prince Kassad 19:29, 29 November 2008 (UTC)

Hopefully, the links were made using a template so that we can fix all the links by simply updating the template. --EncycloPetey 19:38, 29 November 2008 (UTC)
That would be {{Han ref}}. Nadando 19:52, 29 November 2008 (UTC)
Of course, modifying only the template (and not the individual entries) would leave each entry with some junk syntax. Now I understand where junk DNA comes from. bd2412 T 21:33, 29 November 2008 (UTC)

français is in the French Wiktionary

A while back, Conrad.Irwin created a preference (see Wiktionary:Preferences) to "[t]rial the javascript prominent interwiki links." What it does is, when you visit an entry for a foreign word that has an interwiki link to its language's Wiktionary (for example, when you visit français, since fr:français exists), it adds a little link under the L2 header. I've been using it for a while now, and I've neither experienced nor heard mention of any problems with it. I think it's a great feature, and I'd like to make it standard. (Technical details: I'd copy User:Conrad.Irwin/iwiki.js to MediaWiki:prominent interwikis.js, modify MediaWiki:Common.js to import it, and remove it from the preferences list at User:Connel MacKenzie/custom.js.) Does anyone object to my doing so? (I'll wait a few days to give people a chance to try it out a bit and make sure no one has any concerns or objections.) Thanks! —RuakhTALK 18:26, 1 December 2008 (UTC)
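
To make the described behaviour concrete, here is a small Python sketch of the decision the feature makes. It is not the actual script at User:Conrad.Irwin/iwiki.js (which is JavaScript and is not reproduced here); the function name, its parameters, and the language-name-to-code mapping are all assumptions made for illustration.

```python
def prominent_links(title, section_languages, interwiki_codes, language_codes):
    """Decide which language sections would get a prominent interwiki link.

    title             -- the page title, e.g. "français"
    section_languages -- L2 header names on the page, e.g. ["French"]
    interwiki_codes   -- codes of the Wiktionaries the page links to
    language_codes    -- hypothetical mapping of language names to codes

    A section gets a link only when the Wiktionary in that section's own
    language has an interwiki link from this page.
    """
    links = {}
    for language in section_languages:
        code = language_codes.get(language)
        if code is not None and code in interwiki_codes:
            links[language] = f"{code}:{title}"
    return links


# Hypothetical data: the page has a French section and links to fr and de.
example = prominent_links(
    "français",
    ["French"],
    {"fr", "de"},
    {"French": "fr", "German": "de"},
)
```

In this example only the French section qualifies, so the German interwiki link stays in the sidebar alone.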

I just turned it on and refreshed a few times, but I don't see the link (in Safari/Mac).
I'm not convinced that a link to the entry in the word's native language deserves extra prominence at all, and I don't think it should be added to the body of the English-language entry. It is not guaranteed to have a better-quality entry than the English or any other Wiktionary, and, a priori, the majority of such links will be useless to the majority of readers. Michael Z. 2008-12-01 18:51 z
Re: not seeing the link: That's odd. Do other preferences work for you?
Re: lack of guarantees: Well, there's also no guarantee that a given Wikipedia entry will be terribly helpful, yet we include links to those — indeed, links that IMHO are significantly more prominent than the proposed FL-wikt links. But, a few points:
  • While it's not guaranteed to have a better-quality entry than ours, there are things that we exclude as a matter of policy, such as translations to other foreign languages, which can only be found at the native-language entry.
  • While it's not guaranteed to have a better-quality entry than that of some random Wiktionary, it is almost-guaranteed to be more useful to the typical reader. If I'm looking up a French word on the English Wiktionary, chances are much higher that I know some French than that I know some Spanish, and chances are that I know much more French than Spanish.
Can you clarify your statement that "a priori, the majority of such links will be useless to the majority of readers"? Are you saying that most readers looking up a French word don't know any French?
RuakhTALK 19:31, 1 December 2008 (UTC)
Now WT:PREFS seems broken completely. I checked a couple of items and hit “Save settings”, but they didn't show up in other pages. So I went back and reloaded the prefs page, and now they don't show up: “Category: Wiktionary pages with shortcuts” appears directly below “Save settings to refresh view.” (This happened once before, and I couldn't fix it, but yesterday I found that it had reset itself sometime.) I have at least 12 cookies from wiktionary.org, and don't know which ones to delete to reset the prefs.
I'm having the same problem with Chrome on Windows Vista (Firefox 3.0.4 works fine though). --Bequw¢τ 10:53, 2 December 2008 (UTC)
It's the WiktPrefs cookie, the one whose value is a long hyphenated string of ones and zeros. —RuakhTALK 15:21, 2 December 2008 (UTC)
Thanks. Refreshed, but it still won't work in my Safari. Michael Z. 2008-12-02 17:25 z
Regarding usefulness: there are 172 Wiktionaries. I can maybe read about 5 of those, and perhaps make use of information on the page in a dozen more. All of this information is already linked from the sidebar.
Now you are proposing duplicating one of these standard links in the body of the English-language page—this will give me ready access to about 150 web sites which I cannot use and don't intend to try. I can't even distinguish characters in Chinese, Devanagari, or Arabic, so I don't need the English-language interface being cluttered with tens of thousands of redundant links to these and many other sites.
It's great that some editors like this feature, and that you can add it to the interface. But the majority of readers can't benefit from 172 dictionaries, so there's no point in watering down an already very dense and complex default page design with this. Michael Z. 2008-12-01 21:18 z
Well, how often do you look up words in those scripts? You won't come across these links unless you're looking up these words. And remember that a large part of our target audience got here from Google, doesn't understand the WMF interface, and won't notice the interwiki links no matter how useful they'd find them. (And "tens of thousands"? It's true that there are many affected pages, but any given page will have only a small number of extra links, if any at all. Why are you opposed to this, but not to the "upload file" link that's on every single page of the entire site?) —RuakhTALK 13:10, 2 December 2008 (UTC)
If you're saying that I, or some average reader, rarely looks up non-Roman words, then please show some statistics to support this. I think it's a mistake to make assumptions about how people use Wiktionary.
If you're saying that the interwiki links are a bad interface, then let's get rid of them or improve them, rather than adding more interfaces to the page.
The upload file link is not hanging awkwardly from a top-level heading in the page body. Michael Z. 2008-12-02 17:25 z
Wait, so you can make the assumption that most people looking up a Foo-language word would be unable to derive any use from the Foo Wiktionary, but when I express doubts, you ask me for statistics, and tell me it's a mistake for me to make assumptions?
I'm not sure if the interwiki links are a bad interface; having them is certainly better than not having them. If you have any suggestions for how to improve them, I'm all ears. If your suggestions are such an improvement that these "prominent interwiki links" aren't useful any more, even better.
If you dislike the appearance, we can change it. What do you propose? IIRC, my original suggestion — the one that prompted Conrad to create this preference — was that we link to fr.wikt the same way we link to fr.wiki, using {{projectlink|wikt|lang=fr}} or the like. Would you like that better?
RuakhTALK 18:36, 2 December 2008 (UTC)
Different assumptions. You've implied that any reader looking up, e.g., a French word can make use of a French dictionary—I am skeptical, and I wouldn't build any interfaces based on this assumption unless I had some evidence supporting it. Or I might infer from your comment that English-language readers only look up English words, which I also doubt. On the other hand, I will categorically state that a very large majority of our readers will not want to look at many of the 170-odd other Wiktionaries at all.
Now that I can see it, I don't think it's terrible-looking, although I rather dislike the indentation, which zigs the left margin of the page right at the mainest (not comparable) heading. Also, the vertical space “above the fold” is very precious to us, and I object to wasting even a little of it on foreign-language content.
We do have the sidebar links on the left, but they have the problem of being disconnected from the content, and in this case we want to relate the two. Wikipedia puts a lot of context-relevant content in an implied right column: infoboxes and a majority of images. Since our content is less block-oriented than Wikipedia articles, I think we could make even better use of this, perhaps even with a full-height third column. Right now we only sporadically hang the somewhat awkward Wikipedia and other project link boxes on the right. I think that “see also” links and the TOC belong there too. Michael Z. 2008-12-04 01:46 z
Support I've had it on for awhile now, and I think it a very useful feature. While Mzajac is right that the entry is not guaranteed to be of higher quality, I think that in many cases it is nonetheless (and generally has the advantage of being reviewed by native speakers). The link is subtle, and I wonder if it might be helpful to a number of readers (although certainly not all). -Atelaes λάλει ἐμοί 19:40, 1 December 2008 (UTC)
The link is already in the sidebar. Why not emphasize it with a different list bullet, or with bold link text. From an interface point of view, this would be better than adding a redundant link with different link text, while showing its status in the context of the full list of foreign-language entries. Its function would be analogous to en.Wikipedia's interface for foreign-language featured articles, which many readers may already be familiar with.
In short, add this functionality to the familiar existing interface, instead of adding a second, heterogeneous interface for the identical function. Michael Z. 2008-12-01 21:26 z
Someone has written code for the method you propose, Mzajac: w:Template:Link FA, used with w:MediaWiki:Common.js to highlight (for other purposes) specific entries in the language-interwiki list. Maybe we should use similar?—msh210 22:00, 1 December 2008 (UTC)
I prefer Mzajac's suggestion as well. --EncycloPetey 02:37, 2 December 2008 (UTC)
The problem with just highlighting sidebar links is that they aren't often located near the FL entries on the page. Go to the#Swedish and you won't see the sidebar link several screens up. An additional advantage is that Conrad's solution uses the English names (though the sidebar langs can be translated also), which is good on the English Wiktionary. --Bequw¢τ 10:53, 2 December 2008 (UTC)
Yeah, but this feature is meant for readers of foreign languages, no?—it should be in French, etc. If I could only read English, then I wouldn't be interested in it. If I can make any use at all of the linked page, then I should at least be able to work out “français is in the French Wiktionary” in French. Michael Z. 2008-12-02 17:25 z
That's not a bad idea. Currently we only store "fr"->"French" etc., but there's no reason we couldn't store "fr"->"%s est au Wiktionnaire français" or the like. —RuakhTALK 18:36, 2 December 2008 (UTC)
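A rough sketch of the two lookups described above, in Python rather than whatever the preference script actually uses. The table names, the `link_text` helper and the English fallback wording are all made up for illustration; only the "fr" entries come from this thread.

```python
# Hypothetical tables: code -> English language name, and
# code -> native-language link text with a %s slot for the headword.
LANGUAGE_NAMES = {"fr": "French", "es": "Spanish"}
LINK_TEMPLATES = {"fr": "%s est au Wiktionnaire français"}


def link_text(lang_code: str, word: str) -> str:
    """Return native-language link text if stored, else an English fallback."""
    template = LINK_TEMPLATES.get(lang_code)
    if template is not None:
        return template % (word,)
    # Fallback wording is an assumption, not the current output.
    name = LANGUAGE_NAMES.get(lang_code, lang_code)
    return f"{word} is in the {name} Wiktionary"
```

With this shape, languages that have no stored sentence yet would simply keep the English text until someone supplies one.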

Update: I've now added a sample link to the top of this section, so y'all can see what it looks like. (That's what I see under the ==French== header at français.) As you can see, "prominent" might be too strong a word: it's not exactly a 72pt blinking marquee in bright magenta. —RuakhTALK 15:13, 2 December 2008 (UTC)

Okay, I finally see what it looks like—also found that it works in Firefox, but after refreshing the cookie it still doesn't show up in my Safari/Mac. Do we know if this is broken in Safari and some other browsers, or is it just something wrong on my machine?
Why is the text indented away from the left margin, which every other element on the page aligns with? This makes the language heading look awkward. Michael Z. 2008-12-02 17:25 z
Re: Safari: I don't know, I'll try to figure that out. (But I don't have Safari on this machine, so I can't do that right now.)
Re: alignment: If people prefer a different display, that's easy to change. For that matter, we can also add a class="" that would let users customize its appearance or remove it entirely.
RuakhTALK 18:36, 2 December 2008 (UTC)

I've recently found myself adding countable senses to our entries for leukemia and nitrogen, among others, and I'm wondering if this is the right way to handle these cases. There are an awful lot of these... as someone noted recently, most "uncountable" nouns can be counted in certain contexts. However, these contexts often differ greatly from one case to another. For example, "milks" can refer to either types or containers of milk... "nitrogens", on the other hand, can refer to either atoms or isotopes of nitrogen. ("Nitrogens" may also be used to refer to containers of nitrogen, but this doesn't seem to have much currency in print.)

I wasn't able to get much help from my usual references; the OED, WordNet, Macquarie and Webster's-1913 all ignore countability entirely. The MW3 asserts (tacitly, by providing a plural form) that both "leukemia" and "nitrogen" are countable... the Longman Dictionary of Contemporary English labels both words uncountable. I wasn't able to find any dictionary that tackles the issue head-on.

There seem to be four main approaches we can take:

1. Ignore the distinction entirely, and keep all such words as "uncountable" with an uncountable definition. This is the status quo for most entries.
2. Label the inflection line "countable and uncountable", with a single multi-part definition.
3. Label the inflection line "countable and uncountable", with separate definitions for the countable and uncountable uses.
4. Ignore the distinction entirely, and change all such entries to use the default, countable inflection line.

I prefer #3, since it is the most transparent for the user. On the other hand, it is also more work for editors, and may make our entries more complex (and harder to maintain) than necessary. What do y'all think? -- Visviva 02:28, 2 December 2008 (UTC)

I prefer #3 as well, although it is less than ideal for some situations. Many plants have this problem (e.g. corn/corns, wheat/wheats), where the plural applies only when more than one variety is meant. For multiple individuals of a single strain or variety, the noun is uncountable. Reflecting this in an additional definition is awkward. Perhaps for these, we could include the information as a standard template used in a Usage notes section. --EncycloPetey 02:36, 2 December 2008 (UTC)
A fifth option is to exclude countability from the inflection line and mark only uncountable senses. I would rather that the "-" option was considered as "plural form to be added" instead of treated as uncountable. Option 3 is satisfactory in its completeness, but not in its esthetics, IMHO. If the uncountable tag provided a blue link to a useful article in an Appendix or at WP, that would be wonderful. Any large classes of nouns that had similar characteristics in this regard, were conveniently identifiable, and had mostly regular contributors might merit templates.—This unsigned comment was added by DCDuring (talkcontribs).
Remember that entries will continue to be refined. If we go with no. 3, then will 90% or 99% of all “uncountable” entries end up marked “countable and uncountable?”—if so, then the distinction will lose its meaning.
Aesthetically, I like entries which combine the senses, and say something like sausage (1): “a type of food, or a length of it, or an example of one.” The countability is self-evident from the elements of the definition. But again, I suppose this will eventually get refined and subdivided into two or three senses.
I remain a proponent of the term mass noun, which to me means “not usually counted (but, like every other English noun, sometimes counted).” Michael Z. 2008-12-02 05:31 z
I go with option 5 as proposed by DCDuring. In many cases this would allow us to give the normal definition as uncountable and to describe what might be counted with a countable definition line. Conrad.Irwin 09:26, 2 December 2008 (UTC)
Option 3 does that as well. The problem with option five is that it leaves the plural in the inflection line without noting there (for normally uncountable nouns) that the plural form listed is unusual. --EncycloPetey 10:32, 2 December 2008 (UTC)
Option 5 leaves it the same as normal, we have no way of indicating which definitions are more common. (Not a counter-argument, just an observation) Conrad.Irwin 19:27, 2 December 2008 (UTC)
I'm not a huge fan of "countable and uncountable", because to me it suggests that the word is both at once, when in fact it's one at a time. I'm also not a huge fan of stretching to include rare countable senses like "a specific variety of milk", because you can do that with any uncountable noun, and it seems nonce-y to me. (Note: that may not be true with this specific example; it may well be that milks (varieties of milk) is a common term in some field. Hopefully my point stands whether or not it applies to this case.) How about:
6. Have the inflection line say “milk (usually uncountable; plural milks)”, and don't bother defining the term countably unless there are particularly common or non-obvious countable uses. (In the case of milk, I think "an order of milk" is probably warranted, since IME at McDonald's you ask for "a milk" rather than "milk" even though the latter would work fine.)
 ? —RuakhTALK 13:03, 2 December 2008 (UTC)
This looks like an improvement to me, although a comma will suffice instead of the semicolon. It still has to be relatively simple to enter, and I think we still have to account for the (rare?) cases which are only uncountable. Michael Z. 2008-12-02 17:30 z
I'd be down with a comma. And yes, we need to account for only-uncountable cases, -slash- cases where the plural isn't well-enough attested to merit an entry. —RuakhTALK 19:32, 2 December 2008 (UTC)
6 sounds good. Of course, we can't automatically change {{en-noun|s|-}} to read that, as it currently gives no preference to the uncountable, merely saying the noun appears as both types, so we'd need to go through them by hand.—msh210 18:42, 2 December 2008 (UTC)
Oh, darn, that's a good point. :-/   —RuakhTALK 19:27, 2 December 2008 (UTC)
We could, however, change {{en-noun}} to give this output for {{en-noun|-|s}}, which currently just gives "uncountable". -- Visviva 01:54, 3 December 2008 (UTC)
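To make the parameter orders under discussion concrete, here is a toy Python model of the inflection line. It is not the real {{en-noun}} template: `inflection_line` and `plural_of` are invented names, the "countable and uncountable" wording for {{en-noun|s|-}} is a guess at the current output, and the "usually uncountable" branch is only the proposed option 6 (with the comma).

```python
def plural_of(headword: str, marker: str) -> str:
    """Treat "s" and "es" as suffixes; anything else is taken as the full plural."""
    return headword + marker if marker in ("s", "es") else marker


def inflection_line(headword: str, *params: str) -> str:
    """Render a simplified inflection line for an English noun."""
    if not params:
        # Default: an ordinary countable noun.
        return f"{headword} (plural {plural_of(headword, 's')})"
    if params == ("-",):
        return f"{headword} (uncountable)"
    if params[0] == "-":
        # Proposed reading of {{en-noun|-|s}}: uncountable is the usual case.
        plural = plural_of(headword, params[1])
        return f"{headword} (usually uncountable, plural {plural})"
    if len(params) > 1 and params[1] == "-":
        # {{en-noun|s|-}}: both types, with no stated preference.
        plural = plural_of(headword, params[0])
        return f"{headword} (countable and uncountable; plural {plural})"
    return f"{headword} (plural {plural_of(headword, params[0])})"
```

Under this model only the `-|s` order changes its output, so entries already using `s|-` would read as before.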
Yes, but first would need to check how many entries have {{en-noun|-|s}}, placed there by editors who meant {{en-noun|s|-}}, and who therefore didn't mean to give preference to the uncountable. I know I've done that (though I think I've fixed each time I've done so).—msh210 18:02, 3 December 2008 (UTC)
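The check described above could be roughed out along these lines, assuming the page texts are already available as a plain mapping of title to wikitext (how they are obtained is left open). The regular expression and the label names are illustrative only; calls with no parameters, such as a bare {{en-noun}}, are deliberately not matched.

```python
import re
from collections import Counter

# Matches {{en-noun|...}} calls that have at least one parameter.
EN_NOUN = re.compile(r"\{\{en-noun\|([^{}]*)\}\}")


def classify(wikitext: str) -> list[str]:
    """Label each parameterised en-noun call by the order of its first two parameters."""
    labels = []
    for match in EN_NOUN.finditer(wikitext):
        params = [p.strip() for p in match.group(1).split("|")]
        if len(params) >= 2 and params[0] == "-" and params[1] != "-":
            labels.append("uncountable-first")  # e.g. {{en-noun|-|s}}
        elif len(params) >= 2 and params[0] != "-" and params[1] == "-":
            labels.append("countable-first")    # e.g. {{en-noun|s|-}}
        else:
            labels.append("other")
    return labels


def tally(pages: dict[str, str]) -> Counter:
    """Count labels across a mapping of page title -> wikitext."""
    counts = Counter()
    for text in pages.values():
        counts.update(classify(text))
    return counts
```

A count like this only says how many entries use each order; whether a given `-|s` was intentional would still need a human look.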
Option 5 would be my personal preference (because long inflection lines tend to distract too much attention from the definitions below them, an important detail neglected in the general trend to put as much information into inflection lines as could possibly fit in there) but #6 looks like a fair compromise. -- Gauss 20:01, 2 December 2008 (UTC)

Is there a template and/or guidelines for "previous" and "next" words in a list? Entries for 1, a, and b place the "previous" and "next" under See also. Entries for 2 and 3 place the "previous" and "next" right under the Symbol header. --AZard 20:49, 2 December 2008 (UTC)

I'd prefer it under "See also", unless there is good reason not to (e.g., it is attached to an illustration, or would just look weird at the bottom of the entry). -- Visviva 12:20, 3 December 2008 (UTC)
Agreed. I think placing them on the inflection line is a bad idea (aesthetically and logically). --Bequw¢τ 20:34, 3 December 2008 (UTC)
There are several other instances of this kind of infobox - see helium and terzo for a couple of examples. SemperBlotto 17:09, 3 December 2008 (UTC)
There are infoboxes in use on the signs of the Zodiac (see Virgo, e.g.) and there are infobox templates available for cardinal and ordinal numbers (using {{cardinalbox}} and {{ordinalbox}}). These numerical boxes are already in wide use for Hungarian, Italian, and Latin entries. A similar template could be created for alphabetical symbols, provided that the infobox is used within a language section, and not in a Translingual section. Alphabetical order, even for Roman-letter European languages, is not the same between languages. Polish dictionaries, for example, alphabetize with the sequence "a ą b c ć ...". Although there are no words that I know of beginning with a-ogonek (ą), there are Polish words that begin with "ba-" and "bą-", and these are traditionally alphabetized separately in dictionaries with all the "ba-" words coming first. The Serbian alphabet begins with "a b c č ć ..." when written in Roman letters, but begins with "а б (=b) в (=v) г (=g) ..." when written in Cyrillic script. So even the sequence of the first few letters isn't consistent within Europe. --EncycloPetey 20:44, 3 December 2008 (UTC)
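As a small illustration of why a previous/next box has to be per-language, here is a Python sketch that sorts by the Polish letter sequence instead of by code point. `POLISH_ALPHABET` and `polish_key` are names invented for this example, and a real implementation would more likely rely on a proper collation library than on a hand-written table.

```python
# The Polish alphabet in dictionary order; letters outside it sort after it.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoóprsśtuwyzźż"
_RANK = {letter: index for index, letter in enumerate(POLISH_ALPHABET)}


def polish_key(word: str) -> list[int]:
    """Sort key that ranks each letter by its position in the Polish alphabet."""
    return [
        _RANK.get(ch, len(POLISH_ALPHABET) + ord(ch))
        for ch in word.lower()
    ]


words = ["dom", "ćma", "cel"]
polish_order = sorted(words, key=polish_key)  # ['cel', 'ćma', 'dom']
codepoint_order = sorted(words)               # ['cel', 'dom', 'ćma']
```

The two orders disagree about where "ćma" goes, so "next" after "cel" would differ depending on which sequence the box assumes.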

The surname categories are to be changed into parts of speech, like the given names. For most languages it is quite simple: "Category:fr:Surnames" becomes "Category:French surnames". English surnames are a problem. Until now, "Irish/English/Scottish surnames" have meant "English surnames used in, or typical of, Ireland/England/Scotland". That's the usage in many surname dictionaries, too. But in the Wiktionary, "Category:Irish surnames" must now mean surnames in the Irish (Gaelic) language, such as Ó Murchadha. "Category:English surnames" must mean any surname in the English language, including Murphy, McDonald, Wong, Patel.

I propose to create subcategories for English surnames by language of origin. "English surnames from Irish" will include most surnames typical of Ireland. Many surnames of England may be found in "English surnames from Middle English", etc. This is a dictionary, not an atlas. The categories "Scottish surnames", "American surnames", "Jewish surnames", and "Islamic surnames" must be abolished because these are not languages. Jewish surnames are a distinctive group, not defined by language of origin, so "English surnames of Jews/of Jewish usage" might be created. There is nothing to stop anyone from making additional categories, e.g. "English surnames of US" (= typical of the US and rarely found anywhere else). But they shouldn't replace the language subcategories, and if you only add one surname, it's pointless to make a new category. Usage can be explained in the entry. There is a new Template:surname.
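For anyone who ends up doing the renaming by bot, the mapping could look roughly like this Python sketch. The helper names and the small code table are invented for illustration; only the "fr" example and the "English surnames from ..." pattern are taken from the proposal.

```python
import re

# Hypothetical lookup from language code to English language name.
LANGUAGE_NAMES = {"fr": "French", "de": "German"}

OLD_STYLE = re.compile(r"^Category:([a-z-]+):Surnames$")


def new_category(old_name: str) -> str:
    """Map 'Category:fr:Surnames' to 'Category:French surnames'.

    Names that do not match, or codes that are not in the table,
    are returned unchanged so they can be handled by hand.
    """
    match = OLD_STYLE.match(old_name)
    if match is None or match.group(1) not in LANGUAGE_NAMES:
        return old_name
    return f"Category:{LANGUAGE_NAMES[match.group(1)]} surnames"


def english_origin_category(origin_language: str) -> str:
    """Build the proposed origin subcategory, e.g. for 'Irish'."""
    return f"Category:English surnames from {origin_language}"
```

Anything the function leaves unchanged, such as "Category:Scottish surnames", is exactly the set that needs a decision rather than a mechanical rename.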

If you have objections, or new ideas, please tell them now. I mean to start creating the categories next week.--Makaokalani 14:36, 3 December 2008 (UTC)

Objection to the "of Jews/of Jewish usage", as it gets into w:who is a Jew and, even discounting that, is pretty much useless, as many, many names have been used by Jews. (My own is an example: I'm a Jew by anyone's standard, I think, but have a surname, Hamm, shared, among Jews, by only my immediate family as far as I'm aware; most people with my surname are of German extraction or African-American.) In other words, I disagree with what you wrote, that "Jewish surnames are a distinctive group". Perhaps you're thinking of "English names from Yiddish"? Yiddish is a language, of course.—msh210 17:59, 3 December 2008 (UTC)
I think we could allow a category for "Jewish" surnames, but not for American, Scottish, etc. This is the one exception I can think of that might actually be useful to our users, since it is not going to be mistaken for a language. There is the question of what to do with "last" names that aren't technically surnames, but I don't think that problem is significant enough to worry about. We can call them surnames, and the few people to whom the difference matters will understand why we've lumped them together. The biggest question in my mind is how to handle surnames used in India, when names may be written in either a native or Roman script. Yes, Patel could be classified as an "English" surname, but that's a bit misleading. I think the proposed solution of regional origin subcategories for English surnames would be a good solution for this. So, "English surnames from Hindi", "English surnames from Tamil", "English surnames from Punjabi", etc. For India, it might be worth also having a region overcategory for "English surnames from languages of India". (with that phrasing to avoid the ambiguity of Indian) --EncycloPetey 18:06, 3 December 2008 (UTC)
It's true that not all Jews have Jewish surnames, and also (incidentally) that not all people with Jewish surnames are Jews, but that doesn't mean there's no such thing as Jewish surnames, just that they form a fuzzy set. Based on my own experience, some probable members include Green, Gold, Cohen, Katz, Levi, Goldwasser, Finkelstein, and Gur. Indeed, to some extent it's possible to distinguish Ashkenaz surnames from Sefardi and Mizrakhi ones. Is it worthwhile? Maybe. There are definitely novels where the reader is supposed to infer a character's Jewishness from such things as his/her surname. I'm not sure to what extent we can help with that, and to what extent we can only hinder. (BTW, if you care, you're not completely alone: google:site:il Hamm does pull up some Jews.) —RuakhTALK 19:05, 3 December 2008 (UTC)
This is a fascinating idea, but I think there will be a lot of incorrect classifications. My own surname (Million) is a fine example, as even I am not completely sure of its origin. Family tradition held that we were French, but extensive searching through genealogical sources has revealed that my family probably came from Ireland in the 12th century with a surname something like O'Mallion. Unfortunately, when you get that far back in the records, you tend to find surnames that mean very little - "son of", "from", etc. People with the same surname may not be related in any way other than having come from the same village, while two close relatives may have unrelated surnames simply because one was born in a different place. But I'm for trying this anyway. It sounds difficult but worthwhile. -- Pinkfud 19:27, 3 December 2008 (UTC)
A category for "English surnames of uncertain origin" (or something similar) is possible. --EncycloPetey 20:23, 3 December 2008 (UTC)
Would this idea also be applied to other peoples with a significant diaspora, or recognizable names? Mennonites in Canada and elsewhere, for example, are associated with a set of surnames of mostly Germanic but also Slavic origin. I'm sure there must be other examples. Michael Z. 2008-12-04 01:15 z

I think we have a fundamental problem with senses vs. translations. I mean that the translations appear in a separate table after all of the individual senses, and if a sense gets added or removed, the translations table remains a separate entity, so it can get badly out of date. (See e.g. abate.) The "obvious" solution would be to attach translations to each individual sense (as we do with citations/quotations); has this been discussed before? Is there any good reason for the separate table of translations? Equinox 23:15, 3 December 2008 (UTC)

Offhand, I can think of two stumbling blocks.
  1. A graceful interface for the reader. Perhaps each of quotations and translations needs a tabbed reveal interface. I'd like to see something more elegant than the current collapsible translation bar.
  2. A usable interface in wikitext, which still outputs reasonably structured HTML. For example, dog has 10 senses, and the first has 180 translations. How does an editor deal with this in the edit box?
These can be overcome, but I don't think we have a solution at hand. Michael Z. 2008-12-04 01:26 z
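To show what "attaching translations to each sense" would mean as data, here is a minimal Python sketch. The `Sense` and `Entry` classes are hypothetical, the glosses and translations in the usage lines are placeholders rather than real dictionary content, and it says nothing about how the wikitext or the reader interface would look.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str
    # Language code -> list of translations for this sense only.
    translations: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Entry:
    headword: str
    senses: list[Sense] = field(default_factory=list)

    def remove_sense(self, index: int) -> Sense:
        """Remove a sense; its translations leave with it."""
        return self.senses.pop(index)


# Placeholder data, not real glosses or translations.
entry = Entry("example", [
    Sense("placeholder sense A", {"fr": ["traduction-a"]}),
    Sense("placeholder sense B", {"fr": ["traduction-b"]}),
])
entry.remove_sense(0)
remaining = [s.translations for s in entry.senses]
```

Because each translation list lives inside its sense, deleting or rewording a sense cannot leave an orphaned translation table behind, which is the staleness problem raised above. The two stumbling blocks (reader interface and edit-box usability) are untouched by this.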

Content available under the GNU Free Documentation License
