User talk:Nickj/Link Suggester

For giving feedback on the Link Suggester / LinkBot, please use one of the following pages:

If you're not sure which page to use, just pick the one that seems closest.


Linkbot Procedure Suggestion / Proposal

I hope this isn't TOO radical, but I see a rather simplistic front end for linkbot. The suggestion of {{linkthis}} is JUST a suggestion. The category in the template is just an idea too ...

  1. Editor saves page with {{linkthis}} tag that indicates it needs to be parsed.
  2. On the next 'parse' run, have linkbot poll the 'what links here' list for the {{linkthis}} template to build a list of pages ('targetlist'). (every hour? 15 minutes? ...)
  3. For every page in 'targetlist';
    • parse page to build suggestions for target
    • save changes to talk:target
    • clear {{linkthis}} tag from target page
    • add note to user's talk page that requested {{linkthis}}
    • Log page was parsed by linkbot
  4. For end run:
    • move old entries from /todaylog to /archivelog
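
The steps above can be sketched as follows. This is only an illustration in Python, not LinkBot's actual code: the `Wiki` class is a hypothetical in-memory stand-in for the real wiki, `pages_with_tag` stands in for polling "what links here", and the `suggest` callback is assumed to return a list of suggestion lines.

```python
from dataclasses import dataclass, field

TAG = "{{linkthis}}"


@dataclass
class Wiki:
    """Hypothetical in-memory stand-in for the wiki; not a real API."""
    pages: dict = field(default_factory=dict)       # title -> wikitext
    requesters: dict = field(default_factory=dict)  # title -> user who added the tag

    def pages_with_tag(self):
        # Stands in for polling "what links here" for the template.
        return [title for title, text in self.pages.items()
                if TAG in text and not title.startswith(("Talk:", "User talk:"))]


def run_once(wiki, suggest, today_log):
    """One 'parse' run: steps 2 and 3 of the proposal."""
    for title in wiki.pages_with_tag():
        suggestions = suggest(wiki.pages[title])
        talk = "Talk:" + title
        # Save the suggestions to the target's talk page.
        wiki.pages[talk] = wiki.pages.get(talk, "") + "\n".join(suggestions) + "\n"
        # Clear the tag from the target page.
        wiki.pages[title] = wiki.pages[title].replace(TAG, "").strip()
        # Leave a note for the user who requested the run, if known.
        user = wiki.requesters.get(title)
        if user:
            user_talk = "User talk:" + user
            note = f"LinkBot has parsed [[{title}]]; see [[{talk}]].\n"
            wiki.pages[user_talk] = wiki.pages.get(user_talk, "") + note
        # Log that the page was parsed.
        today_log.append(title)


def end_run(today_log, archive_log):
    """Step 4: move entries from /todaylog to /archivelog."""
    archive_log.extend(today_log)
    today_log.clear()
```

A scheduler would call `run_once` at whatever interval is chosen (hourly, every 15 minutes) and `end_run` once at the end of the day.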

[[:Template:Linkthis]] idea draft;

For more, please see User:Dbroadwell/php.

Linkbot - suggestion for implementation

Hi Nickj - Good work on the linkbot. I think it's a great idea. I have a suggestion for how the notices could be displayed though. I agree with the suggestion about using a sub-page to store the linkbot data, but why not have a standard sub-page e.g. Talk:Example Article/Linkbot Suggestions, which could be automatically updated on subsequent passes of the bot?

Originally it did exactly that, but not everyone liked the suggestions being part of the main article talk namespace (see the negative feedback page). The compromise that seemed acceptable was for LinkBot to put the suggestions in its own userspace.

I would suggest that this page has a link to category:LinkBot (or similar), so all such pages are easy to locate,

Good idea, I'll see if I can do something with this.

and a notice advising people that any edits to the page will be lost when LinkBot is next run.

Done!

You could also store meta-data as comments if this would be useful, or include a mechanism of flagging bad suggestions on a per-page basis!

Getting feedback on the suggestions in an automated way gets kind of tricky - you have to build another bot to retrieve the page contents, look at what people have added, parse this (of course some people would get the syntax wrong!), work out what it means, and then take the appropriate action. I'm not saying it can't be done, because I think it can, I just think it's not easy, and I know that I'm unlikely to have the time to implement it in addition to doing this :-( I'm wondering whether having an external site that lists the suggestions, and maybe allows them to be implemented by selecting Yes / No in a form would be the best way to do this.

On the talk page you could add a link at the top saying 'Linkbot has suggested some possible links that could be added to this article, see this page for details' the first time it is run. Alternatively (or maybe as well) you could add a link to the bottom saying 'linkbot found new links' and the date every time you run the bot. This depends how often it is likely to be run.

I'm not sure how often it should be run. Maybe once every six months? It also depends on the response when it's applied to all or most of the Wikipedia.

Anyway, just some late-night thoughts - what do you think of them? --HappyDog 01:50, 30 Mar 2005 (UTC)

They're good thoughts! Thank you for sharing them. -- All the best, Nickj (t) 05:19, 5 May 2005 (UTC)

The words "this page" normally refer to the page on which they appear, and the Web use of links reading not here but click here supports that. Wiki is in any case not ordinary Web material, but seeks to avoid lks that don't work grammatically and logically in their contexts. Good examples should be set in this regard, e.g. follow the lk at the start of this sentence to see how incoherent an otherwise careful editor can become by failure to adhere to that principle.

The otherwise wonderful Link Suggester violates the principle especially egregiously by using the words "this link" twice in the sense i advocate after having used them in a link in the sense i object to! It left me rereading the text to see whether the boilerplate *urges* removing suggestions after acting on them, since it appeared to me that someone had removed them all without discarding the boilerplate. I would instead suggest something like (using the case i was looking at) a handy list.

And in case i haven't yet worn out my welcome, please note that

  • On WP, we have been exhorted to not use "this page" or "this article" as plain text at all, so that copying or moving them does not disrupt their context. That is, any plain text reading "this page" should (if meant literally) be replaced on this page by a selflink that will (until copied or moved) render bold but linkless; when it is meant as referring to the page this talks about, namely this talk page's user sub-page, a lk is appropriate (even when one forgoes making the pipe as literal as i prefer).
  • One reason this matters is that some of us have gotten used to expecting that, and editors may not bother looking at what "this page" lks point at (by hovering or following the lk) because they have gotten used to lks whose role in the sentence is the same as if they were not lks (and forget momentarily that the orthodox literal "this page" markup doesn't look like it's linked). Your lks to suggestions may get followed more often if they are more wiki-compliant.

Thanks, --Jerzy (t) 21:14, 2005 Apr 18 (UTC)

Thank you! The message LinkBot leaves on the talk page never quite felt right to me, and I could never really work out why - but you've expressed it very succinctly. I've updated the message that will get left on talk pages, so that it will now look like the one on Talk:Alfred Nobel - i.e. shorter, and without the "this link" stuff. -- All the best, Nickj (t) 05:19, 5 May 2005 (UTC)

Is there an established procedure to mark the linkbot's suggestion page after some of the changes have been implemented in the article? There should be. -- 19:19, 4 May 2005 (UTC)

How do you mean? Do you mean a way to say "this suggestion should be implemented", "this suggestion should not be implemented", and "never suggest this link any more"? If so, I'm wondering about whether a web form would be the best way to achieve this. I don't think it's possible for a standard user to create HTML forms inside the Wikipedia, so it would probably have to be an external site, unfortunately :-( -- All the best, Nickj (t) 05:19, 5 May 2005 (UTC)
Close. Sorry I wasn't more clear in my original post. I mean something like "this suggestion has been implemented in the article" and provide a standard way to edit the linkbot's comments to show this (or similar comments like, "this suggestion shouldn't be implemented because", etc.) Basically I mean to say it would be sloppy to go and edit the article using suggestions that the linkbot provided and then leave those (now superfluous) suggestions on the talk page without making a comment to explain it has been implemented. It would also give some indication of how often the linkbot suggestions were being heeded... Paul 19:37, 8 May 2005 (UTC)
Hi Paul, I'm leaning over time more towards a web interface (either something like the Wikipedia-integrated one described below, or alternatively running on a separate non-Wikipedia server), rather than leaving messages on talk pages. That would then allow the suggestions to be generated at run-time, rather than pregenerated in a huge batch, and it also wouldn't clutter up the Wikipedia and talk pages with lists of suggestions and links to lists of suggestions. If this could be done at run-time, it would mean that we were only making suggestions that hadn't been implemented yet. It could also be possible to build in statistics (i.e. to track links that usually get made, and maybe suggest those more; track links that people don't make, and maybe suggest those less or not at all; and maybe also do stats on how many links people are making overall). That would be the ideal situation I think, but lately I just don't seem to have the time to do it, so I think the best approach now is to provide whatever information and code I can that would allow other people to implement these things. All the best, Nickj (t) 06:51, 9 Jun 2005 (UTC)
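
The statistics idea in the reply above (suggest often-made links more, rarely-made links less or not at all) could be sketched roughly as below. This is a hypothetical Python illustration, not part of LinkBot; the class name and the thresholds `min_shown` and `min_rate` are invented for the example.

```python
from collections import defaultdict


class SuggestionStats:
    """Tracks how often each suggested link target is accepted by editors."""

    def __init__(self, min_shown=5, min_rate=0.2):
        self.shown = defaultdict(int)     # target -> times suggested
        self.accepted = defaultdict(int)  # target -> times the link was made
        self.min_shown = min_shown        # assumed: decisions needed before suppressing
        self.min_rate = min_rate          # assumed: minimum acceptance rate to keep suggesting

    def record(self, target, accepted):
        """Record one editor decision (True = link made, False = declined)."""
        self.shown[target] += 1
        if accepted:
            self.accepted[target] += 1

    def acceptance_rate(self, target):
        """Fraction of suggestions accepted, or None if never suggested."""
        shown = self.shown.get(target, 0)
        if shown == 0:
            return None
        return self.accepted.get(target, 0) / shown

    def should_suggest(self, target):
        """Keep suggesting until there is enough evidence that people decline it."""
        if self.shown.get(target, 0) < self.min_shown:
            return True
        return self.acceptance_rate(target) >= self.min_rate
```

Summing `accepted` over all targets would give the overall count of links people are making.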

What, performance-wise, would restrict Linkbot's implementation as an article editing feature/option?

I believe that your Linkbot idea could be feasibly implemented as a step in the wikipedia editing process.

The same way we've got buttons for "Save Page" and "Show Preview" there could be a button for "Suggest Links" which would parse the article and suggest links as your linkbot does.

This way it's not an automated process that would auto-change articles, it doesn't clog up the discussion page (or even its own separate discussion page) with bot-generated suggestions, and nobody has to have anything to do with it if they don't want to--they just wouldn't click the button. The challenge would be, of course, convincing the kind folks at wikipedia that such a programmatic change would not impact system performance to any great extent.

The existing wikipedia search functionality returns highlighted contextual results for each search. If the performance hit for doing one of your "good link" suggestions is less than the performance hit for a search, I would think that the wikipedia folks would be inclined to at least look at your idea...

I really like the idea of linking to anything ending in "-ism." I also like not suggesting a link to a generic term like "government" but instead to specific terms like "democracy" or "communism."

Where can I get a copy of the wikipedia data? What format is it in? -Jared81 03:40, Jun 7, 2005 (UTC)


Hi Jared81,

I would very much like to see the idea implemented as a part of the Wikipedia itself. I would be happy to release the source under the GPL if it would allow this to come about (note: it's a bit of a mess, because it combines three projects into one script, and it's fairly slow, but it does work). I tend to be fairly short of time lately though, so I could release the current code, describe what it's trying to do, describe the problem, etc, but honestly I'm pretty unlikely to have the time to learn the MediaWiki codebase and create a patch that implements this.

The criteria for determining whether to suggest a link is:

  1. There is an existing article or redirect with that name.
  2. The link is not in the blacklist of "bad links".
  3. The link is "a good link".

The code for determining whether something is "a good link" is quite simple, and quick, namely:

/**
 * @desc: returns whether something is a good link or not.
 * Uses preg_match(), because ereg() and eregi() were removed in PHP 7.
 */
function isGoodLink($link_text) {

    $link_text = trim($link_text);

    // string contains two capital letters separated only by lowercase letters or spaces
    if (preg_match('/[A-Z][a-z ]*[A-Z]/', $link_text)) {
        return true;
    }

    // contains one or more spaces (but if only one space, must not start with 'the')
    $num_spaces = substr_count($link_text, " ");
    if ($num_spaces >= 2 || ($num_spaces == 1 && !preg_match('/^the /i', $link_text))) {
        return true;
    }

    // contains a dash
    if (strpos($link_text, "-") !== false) {
        return true;
    }

    // string ends in "ism" (case-insensitive)
    if (preg_match('/ism$/i', $link_text)) {
        return true;
    }

    // otherwise assume is not a good link
    return false;
}
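
As an illustration of how the three criteria listed above fit together, here is a minimal Python sketch. It ports the `isGoodLink` heuristic and adds a `should_suggest` wrapper; the wrapper, and the plain sets it uses for the title list and the blacklist, are assumptions made for this example and not LinkBot's actual data structures.

```python
import re


def is_good_link(link_text):
    """Python port of the isGoodLink heuristic shown above."""
    text = link_text.strip()

    # Two capital letters separated only by lowercase letters or spaces.
    if re.search(r"[A-Z][a-z ]*[A-Z]", text):
        return True

    # One or more spaces (but if only one space, must not start with 'the').
    num_spaces = text.count(" ")
    if num_spaces >= 2 or (num_spaces == 1 and not re.match(r"the ", text, re.IGNORECASE)):
        return True

    # Contains a dash.
    if "-" in text:
        return True

    # Ends in "ism".
    if re.search(r"ism$", text, re.IGNORECASE):
        return True

    return False


def should_suggest(link_text, existing_titles, blacklist):
    """Apply the three criteria: title exists, not blacklisted, is a good link."""
    return (link_text in existing_titles
            and link_text not in blacklist
            and is_good_link(link_text))
```

For example, with both "communism" and "government" present as titles, `should_suggest` accepts "communism" (it ends in "ism") and rejects "government" (a single generic lowercase word).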

Determining whether there is already an article of the appropriate name is quite memory intensive, as the fastest way is to keep an index of current article and redirect names in memory. There are also case-sensitivity issues to consider, as article names are case-sensitive on the Wikipedia, so sometimes you get two articles with the same name, but different capitalisation, and you should try to suggest the correct one. So the main question is how much spare memory the servers have, and whether they have some memory persistence (as you don't want to have to recreate the memory index every time a new article is checked - better to have an index that's long-lived).
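
A rough sketch of such a long-lived in-memory index, written in Python for illustration. The `TitleIndex` class and its tie-breaking rule (prefer an exact-case match, otherwise accept a case-insensitive match only when it is unambiguous) are assumptions for this example, not a description of how LinkBot resolves titles.

```python
from collections import defaultdict


class TitleIndex:
    """In-memory index of article and redirect titles, built once and reused."""

    def __init__(self, titles):
        self._exact = set(titles)
        # Case-folded title -> every real title that differs only by capitalisation.
        self._by_folded = defaultdict(list)
        for title in self._exact:
            self._by_folded[title.casefold()].append(title)

    def resolve(self, phrase):
        """Return the title to link to for this phrase, or None if there is none."""
        # An exact-case match always wins.
        if phrase in self._exact:
            return phrase
        # Otherwise only accept a case-insensitive match when it is unambiguous.
        candidates = self._by_folded.get(phrase.casefold(), [])
        if len(candidates) == 1:
            return candidates[0]
        return None
```

Building the index costs one pass over all titles, so it only pays off if the process holding it stays alive between article checks.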

When you say "get a copy of the Wikipedia data", do you mean the latest copy of the Wikipedia itself? If so, that comes from here (the one you're looking for is the "en.wikipedia" 'cur' database dump, which is about 900 Mb compressed and without images and without old versions); You would then load this into MySQL. Or did you mean the current suggestions? You can have these if you like, but they'd be quite large (maybe 300 Mb, maybe - I don't really know though, that's just a wild guess, it could be more, or it could be much less). They're stored in a MySQL database at the moment, so the suggestions would also be in MySQL database dump format (i.e. the same format as for the Wikipedia database dump downloads). You'd also need to give me somewhere to put the file (e.g. an FTP site).

Also, if you were going to add this functionality (and it's a very good idea adding it in this way, in my opinion), then it would also be worth considering adding checking of wiki syntax that would work in the same kind of way (could even combine them into a "suggest links/ check syntax" combo step). There's already some GPL source code available for this here, and I could get you some slightly updated source code if you were to try and get this added into the MediaWiki software, but again I'm unlikely to have the time to do this by my lonesome (i.e. same basis as above - I'll provide current source, and information, etc, but not the MediaWiki patches).

I'd suggest that probably the next step would be to ask the MediaWiki developers if they're interested in this idea (because if they're not then forget it, but if they are then it could work).

Hope that helps! All the best, Nickj (t) 06:32, 9 Jun 2005 (UTC)

This is an automated message to all bot operators

Please take a few moments and fill in the data for your bot on Wikipedia:Bots/Status. Thank you. Betacommand (talk • contribs • Bot) 19:39, 12 February 2007 (UTC)

Automated message to bot owners

As a result of discussion on the village pump and mailing list, bots are now allowed to edit up to 15 times per minute. The following is the new text regarding bot edit rates from Wikipedia:Bot Policy:

Until new bots are accepted they should wait 30-60 seconds between edits, so as to not clog the recent changes list and user watchlists. After being accepted and a bureaucrat has marked them as a bot, they can edit at a much faster pace. Bots doing non-urgent tasks should edit approximately once every ten seconds, while bots who would benefit from faster editing may edit approximately once every four seconds.
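
A bot could respect these rates with a simple throttle between edits. The following Python sketch is only an illustration: the `EditThrottle` class and the constant names are invented for the example, and the delay values are taken from the policy text quoted above.

```python
import time

# Delays in seconds, from the quoted policy text.
UNAPPROVED_DELAY = 30.0   # lower bound of the 30-60 second range for new bots
NON_URGENT_DELAY = 10.0   # approved bots doing non-urgent tasks
FAST_DELAY = 4.0          # approved bots that benefit from faster editing


class EditThrottle:
    """Enforces a minimum interval between consecutive edits."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the throttle can be tested without waiting
        self._sleep = sleep
        self._last_edit = None

    def wait(self):
        """Block until at least min_interval seconds have passed since the last edit."""
        now = self._clock()
        if self._last_edit is not None:
            remaining = self.min_interval - (now - self._last_edit)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_edit = now
```

Calling `throttle.wait()` immediately before each save keeps the bot at or below the chosen rate; a four-second interval corresponds to the 15 edits per minute mentioned in the message.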

Also, to eliminate the need to spam the bot talk pages, please add Wikipedia:Bot owners' noticeboard to your watchlist. Future messages which affect bot owners will be posted there. Thank you. --Mets501 04:21, 22 February 2007 (UTC)

Linkbot

Hi,

I'm interested in implementing your bot on my wiki (sorry can't provide a link). I'm using MW 1.6.7. What are the steps to have it working?

thx in advance, Regards, --Aretai 15:28, 20 April 2007 (UTC)

Step 5

Regarding Phase 5, I'd like to ask what you think about implementing something like the InterWiki Link Checker, possibly using User:LinkBot to make the changes. Waldir talk 14:32, 25 November 2007 (UTC)

Even though you haven't yet responded to my previous message, another suggestion: detect disambiguation pages, and either don't suggest them, or (better, i think) present a dropdown list with the options :)
Another one: When linking dates like March 1978, use the form [[1978#March|March 1978]], instead of just [[1978|March 1978]].
Regards and thanks for the excellent tool. Waldir talk 23:37, 1 December 2007 (UTC)
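
The date-linking suggestion above could look something like this. It is a minimal Python sketch, assuming the tool has already isolated a "Month Year" phrase; the function name `month_year_link` is hypothetical.

```python
import re

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

# Matches exactly "<Month> <year>", e.g. "March 1978".
_MONTH_YEAR = re.compile(r"^(%s) (\d{1,4})$" % "|".join(MONTHS))


def month_year_link(text):
    """Return a link to the month's section of the year article, or None."""
    match = _MONTH_YEAR.match(text.strip())
    if match is None:
        return None
    month, year = match.groups()
    # Link to the section, e.g. [[1978#March|March 1978]], not just [[1978|March 1978]].
    return f"[[{year}#{month}|{month} {year}]]"
```

Whether the year article actually has a section named after the month would still need checking before the link is suggested.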

Escaping problem

You might want to take a look at the quote character escaping. I just ran the tool on the article and it suggested (among other links) [[Giuseppe Marc\'Antonio Baretti|Giuseppe Baretti]] instead of [[Giuseppe Marc'Antonio Baretti|Giuseppe Baretti]]. Cheers, Waldir talk 16:40, 8 March 2009 (UTC)
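
One possible cause, offered only as a guess, is that the title was backslash-escaped for storage in the database and the escaping was never undone before the link was built. Under that assumption, a fix could be sketched in Python as below; `strip_slashes` and `make_link` are hypothetical names, not functions from the tool.

```python
import re


def strip_slashes(text):
    """Undo backslash escaping of quotes and backslashes."""
    # A backslash followed by ', " or \ is replaced by that character alone.
    return re.sub(r"\\(['\"\\])", r"\1", text)


def make_link(target, label):
    """Build a wiki link, piping only when the label differs from the target."""
    target = strip_slashes(target)
    label = strip_slashes(label)
    if target == label:
        return f"[[{target}]]"
    return f"[[{target}|{label}]]"
```

If the stray backslash comes from somewhere else in the pipeline, unescaping at output time would only mask the problem, so it would be worth finding where the escaping is applied first.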