Wikipedia talk:Article size/Archive 4


download timings for large articles

Here are my timings using a fast PC with 2464 kilobits per second ADSL broadband in Europe. I first used Web Page Analyzer but I suspect its time model does not reflect actual browsers (too pessimistic), and I also tried OctaGate SiteTimer whose model appears too optimistic and sometimes fails to complete. I have now installed Firebug for Firefox and I am using its Network monitoring tool.

I clear the cache before each timing and nothing else is using the network (except some 256 byte packet every 5 seconds), and in any case the network never approached even 40% usage. All figures are seconds. All accesses made as an anonymous IP, unless "as user" is shown (which was noticeably slower).

Web page                      Manual timing #  Firebug            OctaGate ## 
----------------------------------------------------------------------------
de:Barack_Obama               4 then 3         1.77 then 3.46     1.95 & 2.2   
en:Barack_Obama               6 then 4-9       5.45 then 3.62     1.4  & 5.0
en:Barack_Obama May 18        7 then 5         6.41 then 4.42
en:Barack_Obama as user       9 then 5-9       8.8  then 5.78   
fr:Barack_Obama               2 then 2         1.78 then 2.04     1.35 & 1.35

en:Hillary campaign           6 then 5         5.44 then 5.12     OctaGate fails
en:Hillary campaign repeat    13 then 6        8.54 then 5.42
en:Russia                     20 then 8        15.7 then 8.4
de:Lisa del Giocondo          3 then 2         1.49 then 2.19     0.95 & 1.2
en:Lisa del Giocondo          3 then 2         2.41 then 1.38     1.2  & 1.2
en:Lisa del Giocondo as user  3 then 3         3.71 then 3.14
fr:Lisa del Giocondo          8 then 5         7.3  then 3.18     1.4  & 1.7
citizendium Barack_Obama      3 then 5 !       2.14 - 7.21
                                               then 4.97 - 4.55

info.britannica.co.uk         4 then 3         2.42 then 1.79     1.5 & 2.0
  • "4 then 3" means 4 seconds to load with empty cache, then 3 seconds to reload (with objects already in cache).
  • # Manual Timing - first figure is from empty cache (or freshly sandboxed Firefox), "then" the time to reload. n-m indicates I encountered a range of times, "!" means I was surprised by reload taking longer than fresh load
  • ## OctaGate - first figure is time to download the html,css and javascript, and the second is after all images are downloaded
  • Lisa del Giocondo is a small featured article in the English Wikipedia, redirects to La Joconde in the French Wikipedia, but is a mere stub in the German one.
  • citizendium's Barack Obama article is 56504 bytes, with 16641 characters readable prose.
  • Opera was considerably faster than Firefox: it took only 3 seconds to fully render Barack_Obama, and 1 second if images were disabled —Preceding unsigned comment added by 84.223.78.86 (talk) 00:20, 15 May 2008 (UTC)
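For anyone wanting to reproduce rough timings like those above, here is a minimal Python sketch; it is only an assumption-laden stand-in for Firebug, measuring the raw HTML fetch (no images, stylesheets, or rendering time), and the URLs and User-Agent string are just examples:

    import time
    import urllib.request

    # Example article URLs; this measures only the raw HTML download,
    # not images, stylesheets, scripts, or browser rendering time.
    urls = [
        "https://en.wikipedia.org/wiki/Barack_Obama",
        "https://en.wikipedia.org/wiki/Lisa_del_Giocondo",
    ]

    for url in urls:
        req = urllib.request.Request(url, headers={"User-Agent": "timing-sketch/0.1"})
        start = time.monotonic()
        with urllib.request.urlopen(req) as response:
            body = response.read()
        elapsed = time.monotonic() - start
        print(f"{url}: {len(body) / 1024:.0f} KB in {elapsed:.2f} s")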

Oops, I forgot to sign the above - I, User:84user, am the same as 84.223.78.86. -84user (talk) 00:35, 15 May 2008 (UTC)

  • Note Firefox loads Russia in 20 seconds while Opera takes 18 seconds.

(To the table I added Russia timings, and noted Barack Obama now takes 6.4 seconds)-84user (talk) 17:46, 18 May 2008 (UTC)

  • I'm on dial-up. My timings are probably 30 times slower. In minutes instead of seconds. Oakwillow (talk) 07:57, 20 May 2008 (UTC)
  • My older PC locks up for a long time when trying to navigate all these long articles. Can't something be done to encourage people to be considerate? I'm also on dial-up, making matters much, much worse - not everyone can afford high-speed. —Preceding unsigned comment added by 172.130.155.82 (talk) 21:07, 12 June 2009 (UTC)

GFDL compliance

I have added the following to the spinout section:

I believe this is necessary to secure the GFDL rights of content creators, since Wikipedians do not release their material into fair use but retain authorship credit. This is based on the language at Help:Merging and moving pages, where similar issues of separating text from contribution history exist. The link back to the source is essential at the new article, but it is also important to note the separation at the old article to help guard against history deletion. --Moonriddengirl (talk) 12:38, 2 June 2008 (UTC)

It's actually rather more complicated than that. It's important to note the specific revision where the material was deleted from the original article, because otherwise it may be difficult to find. I usually do this with a specific revision link. It's also important when deleting articles to ensure that any article history relevant to content moved at any time to other articles is copied to the talk pages of those articles. This is currently, frankly, infeasible - I hope a technical solution will make it more practical in the future. Dcoetzee 00:46, 13 June 2008 (UTC)

Unnecessary change

This edit introduces unnecessary error, as admitted by Oakwillow, and isn't needed. We have a means of determining word count (Dr pda's script); we don't need to introduce a 22% error when we have accurate means of determining word count. SandyGeorgia (Talk) 15:00, 18 June 2008 (UTC)

Agreed. This is software, folks; we can count words directly, we don't need estimates or rules of thumb. Wasted Time R (talk) 15:03, 18 June 2008 (UTC)
As is pointed out above, only very few editors have prosesize installed (not even everyone here), which isn't even possible for IP editors, yet everyone has access to the edit byte count. It does not "introduce" an error, it produces an estimate, and clearly states that it is only an estimate. Oakwillow (talk) 00:22, 19 June 2008 (UTC)
And why exactly does 12 produce the correct, or even approximately correct, estimate? 1/12 ≈ 8.3%, so a 22% error suggests 12 is the wrong number. Do you have an analysis? Franamax (talk) 00:53, 19 June 2008 (UTC)
The large majority of editors, including virtually all IPs, never have to know about WP:Article size, because they either don't work on large articles at all or aren't adding big chunks of material to potentially large articles. For those relatively few editors who do need to know about WP:Article size, we can instruct them to install the prosesize tool, so that they have an accurate method of measuring article size rather than a crude guesstimation one. Wasted Time R (talk) 01:08, 19 June 2008 (UTC)
I also agree that a 22% error makes the estimate worthless. Even if it were accurate, it would not provide useful information for the article since there is still no consensus for changing the article away from readable prose and the current numbers regarding readable prose. Tom (North Shoreman) (talk) 01:38, 19 June 2008 (UTC)

It might be worthwhile to take a step back and ask why this article exists. Is it here just for amusement? Does anyone ever read it? Is it here to actually provide a guideline? What a novel idea. Let's assume the latter. If so, I would suggest that were it correctly written, and correctly followed, there would be no complaints about articles being too long; and if you are getting complaints about an article you have worked on being too long, it is probably at least twice as long as it should be, because people don't complain about such things until they become a major problem. So take a look at any of the articles that have ever received a complaint about length, divide by two, and that is probably about what the guideline should give for size recommendations.

One common rule of thumb when looking for cut-off points is to look at percentiles: how many articles exceed the 85th, 90th, 95th and 99th percentile? If 30% of your articles are bigger than your guidelines, your guidelines are too low; if 1%, they are clearly too high. When you don't need accuracy better than 50%, an estimate that gives the word count to within 22% is more than adequate.

The rule was established by looking at the 50 random articles above and doing a statistical comparison between byte count and word count, then rounding the result to an easily remembered number. The actual result was something like 12.86; you can look at the table above to see what it was (multiply bytes/character by characters/word to get bytes/word). It's better to err on the plus side than the minus side, so it was rounded down instead of up. Many articles in the table have a ratio greater than 12, and many have a ratio less than 12.

There are many more reasons for wanting to know the approximate word count than that you are working on one of the 100 or so articles that are horrendously long, by the way. Since there is in fact a strong correlation between word count and byte count, why not tell everyone what the ratio is? Whether I am working on a 500 byte stub or a 200,000 byte article, I can get a good estimate just by dividing by 12. Oakwillow (talk) 05:07, 20 June 2008 (UTC)

Well, if I'm looking at the right table, it's a little weird, Whitstable at 1.89 bytes / word is particularly impressive. Ignoring that though, the flaw is that the byte ratio doesn't take into account the overload of WML, for instance reference-heavy and table-formatted articles (such as the list of snakes). The item of interest might be the standard deviation of those ratios, just as much as the mean - wouldn't s.d. tell us how "error-prone" the rule of thumb is? I tried putting it into Excel, but I couldn't even get your mean value, so my s.d. of 7.4 is likely wrong too. In fact I'm probably reading the whole thing wrong lol, but I'd still be interested in the s.d. Franamax (talk) 00:01, 21 June 2008 (UTC)
Bytes = edit byte count. The Whitstable figure was a typo; it actually has a very respectable ratio of 11.89 bytes/word (edit byte count / readable word count). I fixed two other typos as well. I used a linear regression, which in effect throws out the lists (Hamburg and Snakes). If I do a linear average without the two list articles I get 13.48 with a standard deviation of 4.34. The linear regression is more accurate, because it uses least squares and in effect throws out the two that were at 26.8 and 30.5 as well. But remember, I rounded down from 12.84 to 12 just to give a rule of thumb. It does not need to be very accurate, and can't be, because of the variations in other factors. Oakwillow (talk) 07:13, 21 June 2008 (UTC)
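As a sketch of the least-squares arithmetic being described (the byte and word counts below are invented placeholders, not the figures from the table above):

    # Least-squares fit of edit byte count against readable word count,
    # with the line forced through the origin, so the slope is the
    # bytes-per-word ratio. The sample data are invented placeholders.
    byte_counts = [24000, 61000, 95000, 38000, 52000]
    word_counts = [2000, 4800, 7300, 3100, 4100]

    num = sum(b * w for b, w in zip(byte_counts, word_counts))
    den = sum(w * w for w in word_counts)
    bytes_per_word = num / den
    print(f"bytes per word: {bytes_per_word:.2f}")  # the rule of thumb rounds this to 12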

Size Considerations

I'd like to point out here four size factors that I can easily see (well, 3-1/2 maybe):

  1. Prose size - this is the number of words in the article and is a primary consideration for readability
  2. Byte size - the number displayed with the edit box, which gives an indication of the total raw text size of the article, including ref text, interwiki links, wiki links, etc.
  3. Download size - byte size minus WML plus HTML, and most significantly including the size of images, which can easily double or dwarf the text size. Popping in an image is easy - it takes about 40 bytes of wikitext - but images can quite often be 100 KB apiece. This is a major factor when the user first downloads a page.
  4. Rendering size - the number of templates and footnotes comes into play here, though I'm not confident as to the precise effects. Have a look at WT:WPSPAM for a heavily template-loaded page (count the seconds 'til you can scroll it properly); I've also seen anecdotal reports of problems when pages exceed 120-150 footnotes. These manifest as delays to final page rendering and mouse-ability.

In summary, beyond the primary consideration of article brevity, it's also important to think about overall download time (especially considering dialup users, of whom there are many around the world); "time to fully render page in browser", which encompasses raw text size; image size and number; and complications of the wiki software in translating and outputting the entire HTML page. Franamax (talk) 01:54, 19 June 2008 (UTC)
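To make the gap between measures 1 and 2 concrete, here is a very rough Python sketch that strips the most common wiki markup from raw wikitext to approximate readable prose; a real parser (or the prosesize tool) is far more thorough, so treat this as illustration only:

    import re

    def readable_prose_estimate(wikitext):
        """Very rough: strip refs, simple templates, links and quote
        markup to approximate readable prose. A real parser is far
        more thorough (nested templates, tables, etc. are ignored)."""
        text = re.sub(r"<ref[^>]*>.*?</ref>", "", wikitext, flags=re.DOTALL)
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)                     # non-nested templates
        text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # wiki links
        text = re.sub(r"'{2,}", "", text)                              # bold/italic marks
        return text

    raw = "The '''[[Mona Lisa|La Joconde]]''' hangs in [[Paris]].<ref>Louvre</ref>"
    prose = readable_prose_estimate(raw)
    print(len(raw.encode()), "raw bytes ->", len(prose.encode()), "prose bytes")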

Concentrating purely on the figures...

Starting a new section because the above discussion continually veers off into discussing alternative units of measurement.

Going back to Compromise table #2:

Readable prose size    What to do
---------------------------------------------------------------------------
> 50 KB                Almost certainly should be divided up
> 30 KB                Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 20 KB                May eventually need to be divided (likelihood goes up with size)
< 15 KB                Length alone does not justify division

I think this table is much closer to the values which lead to articles of acceptable length. There was discussion above on this table, including the comment from User:HermanHiddema:

I do not think it is right to say that about 30% of FA class article "probably should be divided"

It was quite rightly pointed out that circumstances vary with article quality, and that a majority of 30k+ articles are not FA quality (and are probably in need of trimming for readability).

The current values are really too large for average articles. More than 30k of readable prose is very difficult to follow unless it's exceedingly well-written. Right now the upper limit is 100k of readable prose, which is absolutely gargantuan, and the warning bells don't kick in until 60k (which is still a massive amount of readable prose and twice as long as most FAs).

I'm proposing that these values be reassessed. The guidelines could perhaps make allowance for quality ("GA or FA articles may justify being longer than 30k on the grounds that their material has been confirmed to be well-written and pertinent to the article"), but certainly the "fine below this line" mark has to be moved below 60k of readable prose, which should really be the upper cap - only 19 FAs overstep this, which is enough to justify an "almost certainly" warning on it. Chris Cunningham (not at work) - talk 10:23, 24 June 2008 (UTC)

There is already a discussion of the numbers presented in table number 2. I’m not sure what the purpose is in creating another section to discuss it when the previous discussion did not reach a consensus. I do not agree with the need to make the drastic cuts in the guidelines that you advocate, and I have stated the reasons above.

You seem to suggest that somehow the solution to a large poorly written article is to divide it into two or more poorly written articles. I would suggest that the FIRST recourse for moderately sized articles (based on the current standards) that are not well-written is to edit them until they are well-written.
Part of making an article readable, as at least one editor above has pointed out, is using adequate section and sub-section headings. This is an important part of enabling the reader to navigate (skim) through an article and find the information that they are looking for. You and others seem to assume that the purpose of an encyclopedia is to provide articles that will be read all the way through from the first word to the last -- that simply is not the case.
If the reader is looking for just a general description of the topic, then this should be provided in an appropriately sized lede. Editing the lede is a better alternative than arbitrarily dividing the article. Based on WP:LS and current size guidelines, most articles in the range where size is a factor should have ledes of three or four paragraphs.
Many readers are looking for specific information. Assuming an adequate lede and adequate section and subsection headings, it is easier for the reader to skim through one article for that information than to be forced to skim through two or more articles. Let's not forget that when an article is split, there is a requirement (and a necessary one) to provide duplicate information in both articles (see WP:SS).
This leads to a final concern about premature splitting of an article. Let's assume that two or more well written, accurate articles are actually created out of one. Keeping those articles well written and accurate against efforts to vandalize or submit POV edits is now more difficult -- it is easier to monitor and maintain one 60 KB article than three 25 KB articles (the need to provide summaries means that the total text actually increases).
Bottom line -- the best way to create quality articles is for interested editors to do the necessary research and writing. The article on Summary style, and particularly Wikipedia:Summary style#Levels of desired details, provides the most important guidelines to follow as far as splitting articles. Creating a lower threshold for splitting based purely on arbitrary numbers, on the assumption that "one size fits all", accomplishes very little. If an article meets the Summary style criteria for splitting, then make the case based on content -- artificial guidelines can too easily become simply a crutch for editors inclined to wiki-lawyering who can't otherwise make their case because they don't know the subject matter.
As far as the stats on FA articles, I believe someone else has made the point that it is easier for smaller articles to reach FA status simply because their limited scope makes them less controversial and less likely to attract competing POVs and edit wars -- you mistake correlation for cause and effect. Tom (North Shoreman) (talk) 12:23, 24 June 2008 (UTC)
I disagree that it's easier to maintain a single article: for articles which receive heavy editing, it's difficult to stay on top of large articles simply through watchlisting, and the anti-splitting philosophy necessarily widens the scope of articles substantially (to the point where individual editors aren't going to be able to maintain the whole thing anyway). Arguing that sections and clear summaries eliminate length concerns doesn't appeal to me; that one might not wish to read a whole article should not be assumed to mean that no one wishes to read the whole thing.
These are not iron rules: they are guidelines, based on current usage patterns. That most FAs are around 30-50k of readable prose does indicate that this is a length which suits most articles, and we should fit the MoS around prevailing good practice. The current guideline is essentially useless, as its limits are set ludicrously high - thus discouraging editors from splitting articles into logical chunks in the summary style, and encouraging them to build monolithic titans which favour detail over readability.
You point to Wikipedia:Summary style#Levels of desired details as if it backs you up, but the only metric that section provides is that articles which are split should maintain a summary section roughly twice as long as the split lede in the parent article - and the section directly above agrees with me in saying that "What constitutes 'too long' is largely based on the topic, but generally 30KB of prose is the starting point where articles may be considered too long".
As for why this is a new section, the discussion above is rambling, and several times threads have been ended with withering dismissals that such-and-such was "already discussed" or such. Well, as consensus can change, these supposed conclusions should be directly challenged from time to time, rather than casually brought up and immediately dismissed in tangential sub-threads. Chris Cunningham (not at work) - talk 13:04, 24 June 2008 (UTC)
I’m not sure how you can claim that “but generally 30KB of prose is the starting point where articles may be considered too long” (from WP:SS) supports your proposed change that articles >30KB “probably should be divided” -- you have replaced the starting point with the end point.
Everything else you say simply substitutes your opinion for mine without really adding anything that hasn't been discussed before. I really don't have much faith in assurances that these are just guidelines. If you would agree to replacing such phrases as “Almost certainly should be divided up” or “Probably should be divided” with language such as that below, then maybe the actual numbers could be subject to compromise:
Some useful rules of thumb for determining when to initiate discussions aimed at achieving consensus for splitting articles and combining small pages:
> 100 KB There almost certainly should be discussions aimed at determining whether and how the article should be divided.
> 60 KB There probably should be discussions aimed at determining, based on the scope of the topic, whether and how the article should be divided.
> 40 KB It may be appropriate to initiate discussions on future expansion of the article and possible areas that may be moved to new articles.
< 30 KB Length is not a consideration for dividing the article. Tom (North Shoreman) (talk) 15:02, 24 June 2008 (UTC)
Well, we're both putting forward our opinions, and I know I'm not saying anything particularly original: the point is that sometimes it's worth having these discussions again in order to gauge whether consensus has changed. Anyway, yes, I'm definitely willing to discuss the use of the language you've provided together with a more sensible set of accompanying size markers. 20, 30, 50 and 60 would seem appropriate, seeing as the language you've given doesn't mandate splitting but simply encourages discussion of such. We could even add in exceptions for GA/FA, to allow them more leeway (given that length will obviously be brought up in review anyway). Chris Cunningham (not at work) - talk 15:19, 24 June 2008 (UTC)
So let's see if anyone else buys in to the basic concept before we start debating numbers or refining language. Tom (North Shoreman) (talk) 19:22, 24 June 2008 (UTC)
I'm definitely more in line with Tom than Chris here, particularly in regards to setting up a different standard for FA and GA articles. The solution to a poorly written larger article is not to split it up into two smaller poorly written articles, but rather to rewrite the current article. Especially since one of the main reasons why articles are identified as "poorly written" is that they have a tendency to go down bunny trails and include extraneous info, and could do with a lot of tightening up. Also, limiting the exemption to just FA and GA articles would have a tendency to prevent non-FA/GA articles from exceeding the size restrictions prior to getting the FA/GA tag, even if they are well written. I can understand the desire to reduce the amount of readable text, but whatever the "Rule of Thumb" is, it needs to be applied equally across all articles and in a way that doesn't excessively limit how comprehensive the coverage of the major aspects of the subject is. --Bobblehead (rants) 21:11, 24 June 2008 (UTC)
Part of the problem here is the focus on splitting as opposed to simply keeping within a certain length. Most long bad articles could do with simply being shortened. And it's not a case of exempting FA / GA articles, it's a case of making the point that the limits are guidelines and that the "few and far between" exemptions allowed for in the wording are by definition going to be articles which the community has assessed as being well-written even where they go beyond the usual size boundaries. Frankly, I don't think that any article should be expanded beyond 40k of readable prose before a comprehensive peer review is carried out to guide its future, and I don't think that we should be tweaking our guidelines for some hypothetical 60k-readable rough diamond. Chris Cunningham (not at work) - talk 01:47, 25 June 2008 (UTC)

I agree that 60k readable characters is way too long. Out of 2106 FA articles, here are the break points:

Readable words    Percentile
----------------------------
15,000            100.0
12,000             99.6
10,000             98.9
 8,000             94.9
 6,000             84.2
 5,000             70.9
 4,000             54.0
 3,000             32.7
 2,000             12.2
 1,000              0.3

As the number of articles drops off very rapidly above 8,000 words, I would pick that for "Almost certainly should be divided". This corresponds to an edit byte count of about 100,000 bytes, which sounds about right. The size warning kicks in at 80 KB ("This page is 81 kilobytes long. It may be appropriate to split this article into smaller, more specific articles."), which corresponds to about 6,000 words, so I would pick that for the "Probably should be divided" category. As you can see, that covers only about 15% of all FA articles. 5,000 would probably be a good spot for "May eventually need to be divided", and 4,000 for the "Size alone" category. Readable characters should not be used because it is not an industry standard and is constantly confused with edit byte count. Oakwillow (talk) 23:47, 29 June 2008 (UTC)
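A break-point table like the one above can be generated along these lines; the word counts here are invented stand-ins, not the real figures for the 2106 FAs:

    # Percentage of articles at or below each word-count break point.
    # word_counts is an invented stand-in for the real FA statistics.
    word_counts = [1200, 2500, 3400, 4100, 4800, 5600, 6900, 8200, 9500, 11000]
    breaks = [15000, 12000, 10000, 8000, 6000, 5000, 4000, 3000, 2000, 1000]

    n = len(word_counts)
    for b in breaks:
        pct = 100 * sum(1 for w in word_counts if w <= b) / n
        print(f"{b:>6} words: {pct:5.1f}th percentile")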

Article size

Why shouldn't Article size link to this article? I can see this redirect has been created and deleted a number of times. I think it would be a more convenient way to find this article. Eazyskankin (talk) 19:48, 18 July 2008 (UTC)

For the reasons given in the discussion, which you apparently already saw. Chris Cunningham (not at work) - talk 09:24, 19 July 2008 (UTC)

Word count

This version provides fairly generous word counts and also avoids the confusion of using bytes to mean one of two things. In the current version bytes is used often and rarely is it clear whether it refers to readable characters or edit byte count. Instead of adding in text to clarify at each instance it is better to just use word count for readable prose and bytes for edit byte count. It also is consistent with the existing edit warnings, which never refer to readable prose. It also provides a rule of thumb, which is very accurate, for estimating word count. Oakwillow (talk) 04:08, 21 July 2008 (UTC)
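As a sketch of the rule of thumb in question (the factor of 12 is the one proposed above, and it is only an estimate):

    def estimate_word_count(edit_byte_count, bytes_per_word=12):
        """Rule-of-thumb estimate: readable words ~ edit bytes / 12.
        Reference- and table-heavy pages will skew the ratio upward."""
        return edit_byte_count // bytes_per_word

    print(estimate_word_count(60000))   # ~5000 words
    print(estimate_word_count(100000))  # ~8333 words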

There still is no consensus for this change -- or any change -- to the article. You are repeating arguments that you have made over and over -- my objections (as well as my proposed compromise) remain as stated elsewhere on this discussion page. Tom (North Shoreman) (talk) 19:30, 24 July 2008 (UTC)
Obviously the changes are needed. You really need to examine why you wish to maintain a broken guideline. Oakwillow (talk) 20:29, 30 July 2008 (UTC)

Hardware/software

I'm wondering about the sentence, "However, the early iMac G3 (1998 to 1999) is affected by the 32 KB limit; for example, an iMac G3 with OS 8.6 using Internet Explorer 5.1 can only copy and paste or display 32 KB in an editing box". This sounds like purely a software issue to me—probably Internet Explorer, conceivably something to do with OS 8.6. If so, it has nothing to do with the iMac or the G3 processor. Rivertorch (talk) 17:18, 12 August 2008 (UTC)

Online translators

I've noticed some online translators (Google Translate, at the very least) have problems processing bigger pages (Check this page, for example - the point at which the translation stops varies, but, at least during the tests I made with my internet connection, the translator was never able to finish it properly). I believe this might be a valid consideration to include in the "Technical Issues" section of this article. Squeal (talk) 10:13, 6 September 2008 (UTC)

I use Google translations a lot, usually by cutting and pasting text into the translate window, and I am aware that the translation stops at some point (24,000 chars?); I just note where the translation stops and paste in the next section. When translating a whole web page you can't do that as easily, so it would be nice to know how many words it accepts, but I don't think that it is very many. However, I'm more likely to do this translation (I sure hope that people are not using Google translations to create articles in languages they neither read nor speak). Oakwillow (talk) 07:30, 29 September 2008 (UTC)
Considering how inaccurate online translators are, particularly in regards to grammar, I would certainly hope that people are using the article written in that language rather than using an online translator to read up on a subject. --Bobblehead (rants) 08:07, 29 September 2008 (UTC)
Agreed! Machine translation is a measure of last resort. Wasted Time R (talk) 12:15, 29 September 2008 (UTC)
I was talking about writing articles, not reading them. This is, if anyone didn't know, the English Wikipedia, and the example given was to translate an English article into Polish. My counter example was to translate the same article from the Polish Wikipedia into English. I use translations into English a lot to see what the "foreign writers" are saying about a subject. But no, I don't cut and paste the results into en:wiki. It's more to understand what they are saying. Since there are millions more en:wiki articles than any other wiki, it is a good bet that many who speak another language are relying on google translations to read up on a subject. It is probably equally safe to recommend that articles above 3500 words might not be translatable using Google. Any more suggestions on specifying the break point? Oakwillow (talk) 19:21, 29 September 2008 (UTC)

Too long

This page on pages that are too long is too long! --121.45.127.71 (talk) 02:46, 18 October 2008 (UTC)

I've archived most of the closed discussion. I'll do more as the page expands. Chris Cunningham (not at work) - talk 11:36, 4 November 2008 (UTC)

Symbol characters

Do symbol characters and other non-English symbols take up extra space? I just noticed the History of Vietnam article came out at 83kb which seemed very high, until looking through the text I noticed a lot of Vietnamese characters. I mean it's still a pretty long article, but would that weight of symbols make a difference? Mdw0 (talk) 03:32, 3 November 2008 (UTC)

Yup. Unicode can use more than one byte to represent a character. Chris Cunningham (not at work) - talk 11:15, 4 November 2008 (UTC)
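A quick illustration (Python shown, but any UTF-8 handling behaves the same way):

    # ASCII letters take 1 byte each in UTF-8; accented Vietnamese
    # letters typically take 2 or 3 bytes.
    for ch in ["a", "é", "ế"]:
        print(repr(ch), len(ch.encode("utf-8")), "bytes")
    # 'a' -> 1 byte, 'é' -> 2 bytes, 'ế' -> 3 bytes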

Why should we worry about the readability of an article due to length?

This is an online encyclopedia; it seems rather self-defeating to complain about articles that are "too" long just because they are "too" long. If you read an article about WWII, then you should expect to read a very long, extended article. Why should we worry if it bores the reader? It's information without biased sources, not Entertainment Weekly. If someone is on a page, chances are they chose to read that page for a reason. I think it just comes down to the policy creators fearing just how lazy people are. Well, if you go to an online encyclopedia you had better expect to read a lot! For that reason, I propose that we remove and ignore article complaints because of size. I can understand if they're so large that someone could have a hard time finding everything, but that's why there's a table of contents at the front. And jumping from page to page can become annoying and take more time than scrolling up and down a page. Maybe we should focus on re-structuring articles instead of making them shorter or splitting them up. Thank you. —Preceding unsigned comment added by 76.216.84.28 (talk) 18:22, 25 March 2009 (UTC)

If you are reading an article about World War II, in this encyclopedia or any other, you should expect to get a general overview that leads you to detailed articles and sources. SultrySuzie (talk) 19:56, 25 March 2009 (UTC)
Agreed. No one consults an encyclopedia in order to get book-length analyses of single topics. That's what Wikibooks is for. YobMod 13:22, 26 March 2009 (UTC)

Rule of thumb

Could someone explain how, in the "Rule of Thumb" section, it says "30 kb - size alone does not justify splitting" and then "30 kb - May need to be divided"? Even 50kb, nowadays, is hard to justify a split for.  BIGNOLE  (Contact me) 03:43, 7 April 2009 (UTC)

It actually says, " < 30 KB ", which means "less than 30 kb" and " > 30 KB", which means "greater than 30 kb." (The signs, of course, denote Inequality. :)) I don't know that 30 is the best number, but it isn't redundant; it's a dividing line. Sorry for missing your question before. --Moonriddengirl (talk) 19:44, 20 April 2009 (UTC)
I know, but it's too close and too vague. 31 kb is greater than 30, but 1 kb of information is virtually imperceptible both to the naked eye and to a server trying to download the page. 30kb of readable prose is a huge amount of information to sit between that and ">60kb". There's too much vagueness in saying "anything less than 30 shouldn't be split" and then "anything between 30 and 60 should be thought about being split". 35kb isn't even that large for an article, especially not nowadays. But following the "greater than" rule, the suggestion is that 35kb of readable prose "may need to be divided". I cannot think of a single 35kb article that would warrant a split for size reasons. Clark Kent (Smallville) is approximately 31kb of readable prose. Another 8kb would be equivalent to just about the size of the television appearance section (and would actually put it close to 40kb). 40kb is getting to be a pretty long read, and depending on whether you have a lot in one section or just a lot of sections, you might consider splitting. But 30-35kb isn't that large, and I wouldn't even presume to say that an article in that range should "be divided". It's too vague. The Rule of Thumb was changed back in May 08, from this discussion, which appeared to take place between the same three people. Ironically, I don't see a single option in their 3 proposals that matches OakWillow's edits "per discussion". Just about any standard FA article is going to be upwards of 40kb in order to meet the comprehensiveness criteria (exceptions being subtopics that are generally small to begin with, like TV episode articles), so I disagree with Oak's change (as it was 40kb in that particular spot beforehand) and as such with the current "rule of thumb" layout.  BIGNOLE  (Contact me) 20:24, 20 April 2009 (UTC)
Well, that would seem to be a separate issue. Personally, I think it's a good idea to have the numbers close, or the same (with, perhaps, the ≥ symbol being used for one), since it clearly indicates "Up to here is okay; after here, consider splitting." Changing one, but not the other, leaves us with a bit of a gap in our recommendations. What the number should be, I don't know. I'm afraid that I'm not very cognizant of size issues with articles...I use a computer, but I scarcely know how it works. :) Load times and such are well beyond me, so I'm afraid I'm not going to be much good for talking about a good division point. By the time a page is slow-loading for my set-up, it's probably way too big for dial-up. Maybe 40 or 45 kb, in line with what it used to say, would be a good suggestion? --Moonriddengirl (talk) 00:03, 21 April 2009 (UTC)
I changed it to 45kb because that was halfway between 30 and 60 (I didn't think about removing the < and > signs), but the page originally had 40kb. From the discussion in the archive, I really don't see how "30" was determined for use on both of those cases, because I cannot even find where that was an option suggested during the discussion.
You know what, I just started looking more closely at those links. The original one I provided, where OakWillow made the first change is not a reflection of this current page. I was only looking at the diffs, and not the actual page. The actual page is exactly what they were proposing on the talk page (though there was apparently some issue over consensus, as there was a revert war about changing the page). Here is where the change actually occurred, and I don't think any consensus was ever reached on this. If it was, I think we should open a new discussion about these numbers, because they were making arguments that "No FA article is 30kb, and if it is then it's in dire need of trimming". Jason Voorhees is FA and well over 30kb. Most FA articles are well over 30kb nowadays, so I think we should probably re-evaluate all of these numbers.  BIGNOLE  (Contact me) 00:16, 21 April 2009 (UTC)
Given that, it should be fairly uncontroversial to change the number back. However, I agree with him that, given the inequality notes, the number should be consistent to avoid an unaddressed gap. Proceeding on that assumption, I'll change the top & bottom to 40 (until and unless somebody disagrees :)) as a start. And then maybe it's time to consider possible ranges for re-evaluating the articles? That would take care of the immediate problem, but also allows time to draw more widely from other contributors who might have ideas about proper size values. --Moonriddengirl (talk) 10:58, 21 April 2009 (UTC)
Having the dual 40s might be better (I understand the point about the gap between 30 and 40). We'll need someone to do another FA size chart (not sure what program they are using to calculate all of those numbers) so we can get a good idea of the numbers.  BIGNOLE  (Contact me) 11:16, 21 April 2009 (UTC)
Do you have any idea who to approach about that? --Moonriddengirl (talk) 11:21, 21 April 2009 (UTC)
OakWillow seemed to put up a percentile for the word counts above, so they might know how to calculate that stuff. According to them, 6,000 words is around 80kb....don't know if that's readable prose or what. But they do not seem to be active (last edit was Oct. 08). We could try asking some of the editors (active) that participated in the previous discussions to see if any of them know how to generate a sample.  BIGNOLE  (Contact me) 11:30, 21 April 2009 (UTC)
Unless you'd rather try them, I think I would prefer to just start with asking a tech-savvy contributor who sometimes does wizardry for me. This sounds like something he should be able to manage. --Moonriddengirl (talk) 11:36, 21 April 2009 (UTC)

If you already know someone, great. That would save us a lot of time. Right now I'm headed to work. Hope to hear some good news on that front. Cheers.  BIGNOLE  (Contact me) 11:53, 21 April 2009 (UTC)

Quicker than I expected. :) As I was going to ask him, I found it's already done. :) As of 18 March 2009: User:Dr pda/Featured article statistics. The mean is 25.9 kB; the median is 23.579 kB. After the 30s, the numbers do decline sharply, but this 40 kB article, Autism, doesn't seem too long for readability. (Not that the rule of thumb says it is, but unless there are page loading issues, I would agree that 40 seems a little conservative on the split range; a higher number would seem more appropriate.) --Moonriddengirl (talk) 11:56, 21 April 2009 (UTC)
Home for lunch, so I have some time to look at this list. It looks like a good sample to use. What number would you suggest then? I mean, it's always going to come down to an article by article basis, but the RoT is a guide for that decision.  BIGNOLE  (Contact me) 16:00, 21 April 2009 (UTC)
I've been trying to get a grasp on download times which, as I said above, are pretty well out of my experience. :) I'm thinking to ask for feedback on this at Wikipedia:Village pump#Technical first, since I would imagine that the people who frequent that forum would be the kind to know. The basic idea here, I gather, is to be sure that even dial-up contributors find our articles usable without too onerous a download time. But evidently this is more complicated than simple "size" issues, by such factors as the number & size of images on a page, for instance. --Moonriddengirl (talk) 16:53, 21 April 2009 (UTC)
The problem that this guideline has is that there are multiple different ways to measure the size of an article and whether or not the article is "readable", the least accurate of which is the number that appears at the top of an edit section when you edit an article over 32k in editable text size. The current RoT uses readable prose, but as Moonriddengirl notes, this is often not the biggest hurdle readers have with articles. As an example, Barack Obama only has 30kB of readable prose, but thanks to the multitude of images, templates, references, etc. that are on the article, the total file size is 801kB. For those of us that don't use dial-up, an 801kB file is only a few seconds' load time, so not too onerous, but it is not uncommon for readers/editors on dial-up and/or older computers to appear on the article's talk page to complain that the article either fails to load at all or takes an interminable amount of time to load. So for those readers/editors, the issue is not the readable prose but rather the total file size that is preventing them from reading the article. --Bobblehead (rants) 20:59, 21 April 2009 (UTC)
This is true, but I feel like those are things that would have no bearing on splitting an article. I mean, if an article is say 25kb (which is well under the "no need to split" criteria), but has so many "justified" images and templates that it makes it difficult for dial-up users to load the page there really isn't a way to "split" the article in order to make it easier for them. If all of the images, banners, templates, etc. are necessary then in those cases the dial-up user, IMO, is probably SOL when it comes to making the page easier to load.  BIGNOLE  (Contact me) 21:17, 21 April 2009 (UTC)
In my opinion the Obama article is as long as we should really need for any single page. We should take articles like that as the upper bound on size - the Obama article is only tolerable because it is very high-quality. I don't think we should be suggesting that pages can be 33% wordier than that before even considering splitting them. Chris Cunningham (not at work) - talk 08:55, 22 April 2009 (UTC)

Well, according to the sample list, Jason Voorhees is 50kb. There really isn't any place to split that article. Just to point out, I opened the Obama article and the Jason article at the same time, and the Obama article took 2 to 3 times longer to load, and I'm using a high-speed cable connection.  BIGNOLE  (Contact me) 11:18, 22 April 2009 (UTC)

Strange. According to the edit screen, it's 80kb. Perhaps it's expanded since then? Given the evident sharp decline in the number of articles once we hit 40, maybe the current 40, which reflects former consensus, remains a good rule of thumb. The rule of thumb allows for exceptions even at the largest of articles, as it never definitively says this "must" be divided, and at 40 only tepidly indicates that it "may" need division. --Moonriddengirl (talk) 11:46, 22 April 2009 (UTC)
I'm referring to the readable prose in Jason (based on that sample page you provided from Dr. pda), not the character size that has all the code in it. All of those pictures in the Obama article are what make it load slower than the Jason article, which is almost twice its size in readable prose. I think 40 is a better number, because the 40 range (a little under and a little over) generally makes for a more complete article for primary topics (e.g., a TV episode article is a subtopic of another page, and is typically around 5-10kb of readable prose). Film articles especially are getting closer to that range, because of all of the areas one must cover (production alone can sometimes take up 20-30kb of readable prose if it was a high-profile film, or at least an extensively chronicled production) to meet the guidelines set forth by the film community. I mean, copy editing is always helpful, but even a terse article can have so much information (that couldn't be split) that it's going to be longer than 30kb.  BIGNOLE  (Contact me) 12:23, 22 April 2009 (UTC)
I'm prepared to regard that article as an outlier given the fictional / pop culture nature of the subject matter: while it is indeed very cohesive and well-written, it is rather a treatment of the subject than a description of it, and I don't feel that its length is so much a function of the amount of data on the subject as of how comprehensively examined it is. For the sake of it, here are the figures:
  • File size: 226 kB
  • Prose size (including all HTML code): 74 kB
  • References (including all HTML code): 113 kB
  • Wiki text: 78 kB
  • Prose size (text only): 50 kB (8718 words) "readable prose size"
  • References (text only): 15 kB
By contrast, here's the Obama article:
  • File size: 801 kB
  • Prose size (including all HTML code): 63 kB
  • References (including all HTML code): 448 kB
  • Wiki text: 136 kB
  • Prose size (text only): 30 kB (4867 words) "readable prose size"
  • References (text only): 70 kB
Chris Cunningham (not at work) - talk 14:15, 22 April 2009 (UTC)
That's kind of my point. You have two featured, comprehensive articles. The readable prose is completely different, as are the sizes (download time) of the articles. To say that the Obama article is the way it should be is like saying that anything other than that should be split. There should be a happy medium to this. Clearly, if we're talking about just being able to download the page then the Jason article is better. But if we're talking about being able to read the page without feeling exhausted, then the Obama article is better. 30kb is a good size, but really it isn't that much information (not to suggest that it isn't "enough" information, because depending on the structure of the article it could very well be). There are so many factors here that applying arbitrary numbers is extremely difficult.  BIGNOLE  (Contact me) 16:13, 22 April 2009 (UTC)
I'm not sure we're disagreeing, but I'll try to rephrase my argument. Taking 40k as a "start thinking about splitting limit" means regarding the Obama article as the median here, whereas in fact the only reason it's as huge as it is is because of the importance of the subject and the huge amount of QA the article gets. Our larger FAs have already gone through the "might be time to think about a split" phase and the consensus has been that the quality of the article warrants an exception; therefore, the "start thinking about a split" level should be lower than that of existing high-quality FAs, because for articles which aren't high-quality FAs such length often leads to unbalanced articles. Chris Cunningham (not at work) - talk 09:05, 23 April 2009 (UTC)
I think what we disagree on is the number. I think 30 is too low. If you remove all of the crap that is making the Obama article long (and slow to download), it's a short article. I mean, not as short as those TV episode articles, but as far as primary subjects go, it's a short read. The first problem I have is, again, setting an arbitrary number as the cut-off point, especially one I feel is too low. The way you had changed it, there was a 30kb gap between the third and fourth level (and I'm not even sure I like the 40kb gap between the final two levels). I think ranges would be better, where we had something like "5-40: Length alone does not justify division"; "41-60: May need to be divided"; "61-80kb: Probably should be divided"; "81-100+: Almost certainly should be divided", or something similar. I think 30 is a bit too small, and we need to break up the huge gap between 60 and 100; 40kb is a whole article, and to go from "Probably should" to "Almost certainly" is a big leap considering that 60 is already a lot, and just 20kb beyond that can add almost another Obama article to a topic.  BIGNOLE  (Contact me) 11:24, 23 April 2009 (UTC)

Criteria for article splitting

The current guideline is not comprehensive and there is a focus by some editors on the readable prose size. I would like to see the following added to the guideline:

Reasons for splitting an article include:
  • The subject of the new article is notable in its own right and is therefore of a higher level of interest to some readers than the original article
  • Having the new article allows for better linking from, and population of, lists, portals, categories, topic outlines and indexes
  • Where an article has sections for different countries, splitting out any country section that is out of balance with the others avoids systemic bias issues to some degree
  • Splitting an article may reduce any clutter of External links, See also links and Further reading references; in the new article they will be more applicable to the topic
  • Wikipedia continues to grow, and it is easier to split articles early and let the individual articles expand than to attempt to extricate information from a lengthy article at a later date

This is based on my experiences and feedback from other editors after splitting a fair few articles -- Alan Liefting (talk) - 07:20, 16 May 2009 (UTC)

You still have to worry about size issues, only this time the reverse. If the parent article is only like 15kb, and the splitting article is 5kb, there really isn't a need to split.  BIGNOLE  (Contact me) 11:10, 16 May 2009 (UTC)
There are cases where splitting of small articles is necessary. For example Manchild 4/Pick Your Battles, a 1.4kb article, should be split (if they are not merged or deleted). I have split the Shiva (disambiguation) page into 2kb and 1kb pages. The merit of article splitting should rest on whether it makes WP more usable for readers. -- Alan Liefting (talk) - 19:58, 16 May 2009 (UTC)
The first one seems like a poor article creation. If they are two different subjects then they should be separate to begin with. I don't know how you split a disambiguation page. I'm not sure what your ultimate goal is, though. Because, by your reasoning, we could split the reception section of a film page just because reading it by itself might make it more usable for readers. I think it's a lot easier to split articles later. If you split articles too soon then you're left having to pick up the pieces if the articles never expand. This is a major problem with TV episode articles that are split from list-of-episodes and season pages as soon as the episode airs, yet contain nothing but a plot summary for years afterward.  BIGNOLE  (Contact me) 22:37, 16 May 2009 (UTC)

According to this page: "You can set your preferences to make links to pages smaller than a certain size appear in a different colour. "Size" in this context means the size of the source text seen in the edit box."

Is this accurate? If so, can someone point me to the Preferences setting that allows this? More specific direction in the page itself would probably be nice as well. Propaniac (talk) 17:36, 1 October 2009 (UTC)

Yes. Go to My Preferences -> Click on the Appearance tab -> Scroll down to Advanced settings -> Change the "Threshold for stub link formatting" to something other than 0 bytes. Links to pages whose source text size is less than this threshold will then show as a dark red. Dr pda (talk) 20:57, 1 October 2009 (UTC)
Ah, thanks (and it does work in categories, too, as I had hoped!). I'm going to insert a note in the article because it would have taken me a very long time to realize that "Threshold for stub link formatting" is the same thing as "Make links to small pages look different." Much longer than it took to decide I was looking in the wrong place or the option didn't actually exist. Propaniac (talk) 13:04, 2 October 2009 (UTC)

Exception for controversial subjects?

Under "Occasional exceptions", you do not mention controversial subjects. The article Abortion is under 100Kb because some of the volume is covered in Abortion debate, Pro-choice, and Pro-life. Is this a good thing? No, it is a clear example of content forking. On the other hand, George W. Bush, a nice example of the NPOV style, is 170Kb long as it should be.

I say that NPOV is more important than article size and that some controversial subjects cannot be adequately covered within 100Kb. This exception should be explicitly added to this page. Emmanuelm (talk) 11:56, 11 October 2009 (UTC)

Possible size problem with a list

See the discussion at Wikipedia:Featured list candidates/List of Knight's Cross recipients of the Waffen-SS/archive1. Dabomb87 (talk) 02:55, 30 October 2009 (UTC)

overlong plot

hello. new and can't find the right {{ }} to indicate that the Lovers in Paris plot section needs to be edited for content and length. thanks. GrammarEdits (talk) 07:28, 7 December 2009 (UTC)

Template {{Plot}} works for me. —Aladdin Sane (talk) 08:11, 7 December 2009 (UTC)
thank you! been trying to find an appropriate "summarize" one. GrammarEdits (talk) 08:14, 7 December 2009 (UTC)

As long as WP:SUMMARY is a style while WP:NPOV is a guideline, articles will grow in size

Having worked on several summary & main article pairs, I can tell you that most editors do not understand this concept. Unfortunately, because WP:Summary is a style, not a guideline, there is no way to enforce it. Worse, there is nothing in WP against content duplication; WP:Duplication is a red link.

If WP wants to enforce a limit on article size, WP:summary style must be promoted to a guideline. Until then, long articles will remain an unavoidable consequence of the NPOV guideline. Emmanuelm (talk) 14:53, 12 April 2010 (UTC)

MoS naming style

There is currently an ongoing discussion about the future of this and other MoS naming styles. Please consider the issues raised in the discussion and vote if you wish. GnevinAWB (talk) 20:50, 25 April 2010 (UTC)

Section on Web Browsers

The section "Web browsers which have problems with large articles" seems very weaselly and unclear about who is actually affected.

If in 2010 we are preventing pre-XP computers running IE 5 from accessing Wikipedia fully then I don't think that is anywhere near as big a deal as if we are preventing IE 6 users on XP from accessing the site for example. I therefore think the section should be clarified. -- Eraserhead1 <talk> 10:47, 9 May 2010 (UTC)

Agreed. And even if WP article sizes did cause problems on IE 6 on Windows XP, we'd be doing those users a favor by forcing them to upgrade their browser, due to all the security and other hazards the Internet Explorer 6 article describes. A number of other major websites have ended support for IE 6. Wasted Time R (talk) 12:04, 9 May 2010 (UTC)
I've re-read the section and to clarify my argument I think probably all the content apart from "Under certain environments, Firefox 2.0 and Internet Explorer 6 are known to have difficulty loading articles over about 400 KB. For notes on unrelated problems that various web browsers have with MediaWiki sites, and for a list of alternative browsers you can download, see Wikipedia:Browser notes." should be removed. EDIT: And maybe even the comment about IE 6 and Firefox 2 should go as well. -- Eraserhead1 <talk> 12:18, 9 May 2010 (UTC)
 Done I've removed all the content apart from that on FF2 and IE6. -- Eraserhead1 <talk> 21:54, 15 May 2010 (UTC)