Wikipedia:Bots/Requests for approval/KuduBot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: KuduIO (talk · contribs)
Time filed: 00:01, Monday September 12, 2011 (UTC)
Automatic or Manual: Automatic unsupervised
Programming language(s): Python and regular expressions
Source code available: Standard pywikipedia, regular expression / parameters may be available on request
Function overview: Move all hatnotes to the very top of the articles per the Manual of Style.
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 43#Request for hatnote bot
Edit period(s): One-time run, then daily
Estimated number of pages affected: ? (articles)
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): N
Function details: Affects only article lead section.
Discussion
Are there *any* cases where the top is not the best place for a hatnote? Are some used in sections, perhaps? - Jarry1250 [Weasel? Discuss.] 17:23, 12 September 2011 (UTC)
- They definitely could have been used there, so limiting to lead section is probably smart. — HELLKNOWZ ▎TALK 17:41, 12 September 2011 (UTC)
- Okay, that's one exception. Are there others? I assume we're limiting to article space here for a start? - Jarry1250 [Weasel? Discuss.] 18:02, 12 September 2011 (UTC)
- I somehow assumed this applies to articles by default; it definitely should, article layout guidelines do not apply to other namespaces. — HELLKNOWZ ▎TALK 18:07, 12 September 2011 (UTC)
- It probably applies to project space too, but limiting to article space for now seems wise. Yes, I'll add an exception for sections in the form of a lookahead. — Kudu ~I/O~ 20:12, 12 September 2011 (UTC)
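Note: the restriction discussed above (move hatnotes only within the lead section, leave hatnotes inside sections alone) could be sketched roughly as follows. This is a minimal illustration and not the operator's actual regular expression, which was not published. It splits the page at the first section heading instead of using a lookahead. The template list `HATNOTE_NAMES` and the function name `move_hatnotes_to_top` are assumptions made for the example, and the pattern only handles hatnotes that start a line and contain no nested templates.

```python
import re

# Hypothetical, partial list of hatnote template names (assumption; the
# bot's real list is not given in the request).
HATNOTE_NAMES = ("about", "for", "other uses", "redirect", "distinguish", "hatnote")

# One non-nested template call at the start of a line whose name is listed above.
HATNOTE_RE = re.compile(
    r"^\{\{\s*(?:%s)\s*(?:\|[^{}]*)?\}\}[ \t]*\n?"
    % "|".join(re.escape(name) for name in HATNOTE_NAMES),
    re.IGNORECASE | re.MULTILINE,
)

# A section heading line such as "== History ==".
HEADING_RE = re.compile(r"^==.*==[ \t]*$", re.MULTILINE)


def move_hatnotes_to_top(wikitext):
    """Return wikitext with lead-section hatnotes moved to the very top."""
    heading = HEADING_RE.search(wikitext)
    split = heading.start() if heading else len(wikitext)
    lead, rest = wikitext[:split], wikitext[split:]

    hatnotes = [match.strip() for match in HATNOTE_RE.findall(lead)]
    if not hatnotes:
        return wikitext

    body = HATNOTE_RE.sub("", lead).lstrip("\n")
    new_text = "\n".join(hatnotes) + "\n" + body + rest

    # Skip changes that would only alter whitespace or newlines.
    if "".join(new_text.split()) == "".join(wikitext.split()):
        return wikitext
    return new_text
```

Anything after the first heading is passed through untouched, so a hatnote placed at the top of a section stays where it is.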
- How do you intend to find articles that suffer from this problem? Are you going to randomly crawl through every article, or is there a database report somewhere? —SW— comment 18:47, 13 September 2011 (UTC)
- Presumably by processing a dump (AWB can handle this). - Jarry1250 [Weasel? Discuss.] 20:52, 13 September 2011 (UTC)
- Right, just want to ensure that the operator is willing/able to download and process a database dump. —SW— confabulate 21:27, 13 September 2011 (UTC)
- This will be done by accessing the database directly from the toolserver. — Kudu ~I/O~ 20:35, 15 September 2011 (UTC)
- This is not possible. The toolserver database does not include page text. —SW— verbalize 22:30, 15 September 2011 (UTC)
- Right. Perhaps I can write a separate tool which uses WikiProxy and dumps a list of pages to a file, and then feed that to pywikipedia's replace.py. — Kudu ~I/O~ 14:03, 18 September 2011 (UTC)
- What is WikiProxy? How will it generate a list of problematic pages if it does not analyse a dump? (Or does it?) - Jarry1250 [Weasel? Discuss.] 15:35, 18 September 2011 (UTC)
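Note: the dump-based approach suggested in this thread (scan an XML dump offline and produce a list of candidate pages) might look roughly like the sketch below. It is an illustration under stated assumptions, not the tool the operator had in mind. It assumes a MediaWiki XML export whose `<page>` elements contain `<title>`, `<ns>` and `<revision><text>`, treats a missing `<ns>` as article space, and uses a hypothetical, partial list of hatnote names. The function names `has_misplaced_hatnote` and `misplaced_hatnote_titles` are invented for the example.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical, partial list of hatnote names (assumption, not the bot's list).
HATNOTE_LINE = re.compile(
    r"\s*\{\{\s*(?:about|for|other uses|redirect|distinguish|hatnote)\s*[|}]",
    re.IGNORECASE,
)
HEADING = re.compile(r"^==.*==[ \t]*$", re.MULTILINE)


def local(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def has_misplaced_hatnote(text):
    """True if a hatnote line in the lead follows any non-hatnote content."""
    heading = HEADING.search(text)
    lead = text[: heading.start()] if heading else text
    seen_other = False
    for line in lead.splitlines():
        if not line.strip():
            continue
        if HATNOTE_LINE.match(line):
            if seen_other:
                return True
        else:
            seen_other = True
    return False


def misplaced_hatnote_titles(dump_file):
    """Yield titles of article-space pages with a misplaced lead hatnote."""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, ns, text = None, "0", ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "text":
                text = child.text or ""
        if ns == "0" and has_misplaced_hatnote(text):
            yield title
        elem.clear()  # drop the page's text once it has been checked
```

The resulting titles could be written to a file, one per line, and used as the page list for the editing step. Because the dump may be a few days old, each page would still need to be re-checked against its current text before saving.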
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's at least see how it performs and then (hopefully) wait for some feedback. Per WP:COSMETICBOT, be careful to not make edits that only affect whitespace and newlines, as is often the case with misformatted lead templates. — HELLKNOWZ ▎TALK 09:37, 22 September 2011 (UTC)
- One minute: Please don't use WikiProxy. If you're going to be scanning 3.5 million articles, please use a dump. It just makes sense. - Jarry1250 [Weasel? Discuss.] 18:17, 22 September 2011 (UTC)
- I agree. Sending mass queries through wikiproxy will still consume massive resources at toolserver, which is not good (and is probably a violation of toolserver policies). There is no reason that this task can't work from a database dump that is a few days old. The task doesn't require up-to-the-second versions of articles. You could also consider asking the maintainer of this tool to add a report for misplaced hatnotes if you don't want to deal with database dumps yourself. —SW— spout 19:02, 22 September 2011 (UTC)
- Comment. I'm trying to see if there are dumps available on the toolserver already, since I have a rather small quota myself. Anybody more experienced feel free to help. — Kudu ~I/O~ 12:08, 23 September 2011 (UTC)
- If you have a fast internet connection and a moderately good processor, it is far easier to download one onto your home PC and process it with AWB. - Jarry1250 [Weasel? Discuss.] 12:20, 23 September 2011 (UTC)
- I use Mac OS and Linux, so no AWB for me. However, I'll consider running pywikipedia with a dump from my own computer. Nothing is urgent, so I'll set it up over the next few days. — Kudu ~I/O~ 12:27, 23 September 2011 (UTC)
- Doing... Downloading an XML dump to the toolserver. — Kudu ~I/O~ 19:38, 23 September 2011 (UTC)
- Here's the update: I finished downloading and extracting the dump, and now I'm running the script in a screen session. It's still analyzing the dump. — Kudu ~I/O~ 22:42, 23 September 2011 (UTC)
- Withdrawn by operator. Bad support from pywikipedia. It'd be easier for someone to file a new BRFA using AWB. — Kudu ~I/O~ 21:59, 4 October 2011 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.