Wikipedia talk:STiki/Archive 3
Archive 1 | Archive 2 | Archive 3 | Archive 4 | Archive 5 | → | Archive 10 |
STiki needs some sort of coding so it stops if a revert has been made.
Both Twinkle and Huggle will not make a revert if one has already been made by ClueBot or another editor. I have noticed numerous situations where STiki will undo that reversion and re-add the vandalism. Ryan Vesey (talk) 15:38, 26 May 2011 (UTC)
- Can you pass along some revisions where this has taken place? STiki fetches its rollback/edit token when it gets the edit text. Thus, if an edit conflict (i.e., someone else reverts the edit in the meantime) occurs, the STiki revert will fail and report that fact in the GUI. If you get me some RIDs, I can do a little deeper digging in software. Thanks, West.andrew.g (talk) 18:19, 26 May 2011 (UTC)
- Yes, I am still trying to find someone who will show me how to create a link to a specific revision but the edits by Smalljim on Daisy and Beatrix Potter had problems due to STiki. The edit by Methecooldude on Eros had problems due to STiki. Ryan Vesey (talk) 18:25, 26 May 2011 (UTC)
- http://en.wikipedia.org/w/index.php?title=Daisy&diff=429919372&oldid=429747259, http://en.wikipedia.org/w/index.php?title=Eros&diff=431016345&oldid=430950708, http://en.wikipedia.org/w/index.php?title=Beatrix_Potter&diff=429919558&oldid=429845604... to copy a diff, you go to the history page, find the line you want, right-click Prev, and copy the link. p.s. AGW, I keep seeing Cobi around IRC, have you had a chance to figure out what's going on with CBNG? Ocaasi c 18:34, 26 May 2011 (UTC)
- @Ryan: I'll certainly agree that those reverts-to-vandalism shouldn't have been made. However, it seems the most likely culprit is operator error (although both editors in question seem to be very experienced). The prior edits didn't come seconds before the STiki revert -- but hours before. All edits displayed in STiki are checked for recency (to confirm they are the most recent on the page) -- thus the users in question couldn't have seen the "bad" edits and simply been "beaten to the revert." The edits in question were pulled from the CBNG queue (why CBNG is scoring its own reverts so poorly -- or even bothering with them at all -- is beyond me). Maybe the humans were just moving too fast -- or accidentally made a mistake -- but, I've got no evidence this is software-based. Thanks, West.andrew.g (talk) 19:44, 26 May 2011 (UTC)
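(For illustration only: a recency check of this kind can be done with a single MediaWiki API query. The sketch below is hypothetical -- it is not STiki's actual code -- and just shows comparing the displayed revision against the page's current top revision before attempting a revert.)
```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: confirm a revision is still the newest on its page
// before attempting a revert (not STiki's actual implementation).
public class RecencyCheck {

    public static boolean isStillTopRevision(String pageTitle, long displayedRevId) throws Exception {
        String url = "https://en.wikipedia.org/w/api.php?action=query&prop=revisions"
                + "&rvprop=ids&rvlimit=1&format=json&titles="
                + URLEncoder.encode(pageTitle, StandardCharsets.UTF_8);

        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Crude extraction of the top "revid" from the JSON reply;
        // a real client would use a JSON parser.
        String body = resp.body();
        int idx = body.indexOf("\"revid\":");
        if (idx < 0) return false;
        int end = idx + 8;
        while (end < body.length() && Character.isDigit(body.charAt(end))) end++;
        long topRevId = Long.parseLong(body.substring(idx + 8, end));

        return topRevId == displayedRevId;  // false => someone edited in the meantime
    }

    public static void main(String[] args) throws Exception {
        // Revision ID taken from the diff discussed above, purely as a test value.
        System.out.println(isStillTopRevision("Daisy", 429919372L));
    }
}
```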
- @Ocaasi: The CBNG feed has been back up and running for a while. Our interface is good. Thanks, West.andrew.g (talk) 19:44, 26 May 2011 (UTC)
- I also remember coming across a number of such false positives in the cluebot ng queue, including cases where cluebot's own reversion of vandalism was flagged as potential vandalism and presented for review. It can be a little confusing to the person reviewing; it is easy to think ' I see vandalism in the diff -> click revert', without realizing that the vandalism was in the old revision instead of the new revision. Arthena(talk) 09:04, 27 May 2011 (UTC)
- Hmm, perhaps I could add a rule so that CBNG doesn't score its own edits (or at least these don't make it into the queue) to ease this confusion. The "STiki (metadata)" queue does this already. As an algorithmic side note, this is probably happening because CBNG has a very poor "reputation", before it is normalized by the huge number of edits it makes. CBNG gets reverted many times daily, be it false positives or otherwise -- which for an ordinary user would be an indication of poor behavior. Thanks, West.andrew.g (talk) 16:13, 27 May 2011 (UTC)
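(Side illustration of the normalization point, using made-up numbers rather than any actual CBNG/STiki formula: a raw revert count makes any prolific editor look bad, while dividing by total edits corrects for volume.)
```java
// Hypothetical illustration of why a raw revert count mis-scores a prolific
// bot, while normalizing by edit volume does not. Not an actual CBNG/STiki formula.
public class ReputationSketch {

    static double rawBadness(long revertsReceived) {
        return revertsReceived;                       // a busy bot: large, looks "bad"
    }

    static double normalizedBadness(long revertsReceived, long totalEdits) {
        return (double) revertsReceived / totalEdits; // same bot: tiny fraction, looks fine
    }

    public static void main(String[] args) {
        // Made-up numbers purely for illustration.
        System.out.println(rawBadness(5_000));                    // 5000.0
        System.out.println(normalizedBadness(5_000, 2_000_000));  // 0.0025
    }
}
```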
- You should have told me you were reporting this here, Ryan! I can confirm it was user error on three of my first four edits with the tool: I must have made over a hundred edits with STiki since the 19th and it hasn't happened again (AFAIK). Being used to Huggle and its whitelist, I didn't expect to be shown edits made by ClueBot NG or rollbackers as possible vandalism. My eye was drawn to the clear vandalism at the left of the screen, briefly forgetting the standard layout of diff pages, just as Arthena explained above. So, Andrew, there's no apparent problem with STiki presenting vandalism that has already been reverted, but I do think there are improvements that could be made to the design of the interface. I'll raise those under a new heading if you think my comments might be useful. —SMALLJIM 20:34, 27 May 2011 (UTC)
- I always have an open ear about how the tool might be improved. I should be getting around to some improvements in the next week or two, after some deadlines are cleared. Thanks, West.andrew.g (talk) 22:39, 27 May 2011 (UTC)
- Even if not your fault, some way to keep CBNG out of the queue sounds smart. If you made that error, others likely have too. Ocaasi c 21:41, 27 May 2011 (UTC)
- Seeing if the CBNG folks will handle this on their end before addressing it directly in STiki (see CBNG's talk page).
- Went ahead and implemented a fix. CBNG will no longer enqueue its own revert actions (though some residual ones might exist in the queue). Thanks, West.andrew.g (talk) 15:46, 4 June 2011 (UTC)
Ideas for improvement
Following on from the section above, here's a bit of user feedback on STiki. —SMALLJIM 12:03, 1 June 2011 (UTC)
For ease of discussion I've refactored the following to add sections and sigs, hope you guys don't object. —SMALLJIM 09:04, 2 June 2011 (UTC)
Diff pages
- I think the greatest improvement that could be made to the GUI would be to conform with standard WP diff pages and Huggle, and include the details of the old and new edits at the top of each column. This information is valuable in deciding whether or not we have a vandal's edit and the nearer it is to where the eye is focusing the better. The present "EDIT PROPERTIES" only provide part of the information and their location at the bottom of the window involves too much eye movement. (Perhaps someone should write a set of HIGs for Wiki AV applications?) —SMALLJIM 12:03, 1 June 2011 (UTC)
- I understand your desire to move the properties box, but which properties do you feel are missing? Maybe the fact I am only presenting new edit information, and not the metadata for the previous edit? West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- Yes, the lack of previous edit metadata does make it more difficult to decide the correct action to take. For instance if I can see that the previous edit was by ClueBot a week ago, say, then I know it's most likely OK to revert to it. But if the previous edit was made recently by another IP, then it's worth further investigation because that edit might be vandalism too. See this diff, for example (which was presented to me in STiki yesterday) where without checking that the previous edit was by an IP, I would have incompletely reverted the vandalism (not hugely bad vandalism in this case, but it illustrates the principle).
- Reverting to earlier vandalism is a regular problem that Huggle's clever use of colour-coded boxes has mitigated (see below). In STiki, since all edits are rated by the STiki/CBNG/(WikiTrust?) back-ends, would it be possible to mark or colour code the previous edit in some way according to its score (and, ideally, presence on a whitelist too)? Some way of showing and reverting to earlier diffs in-program would be handy too, instead of having to switch to the browser.
- And just to reiterate: anyone who's edited WP for a while is familiar with the standard diff format and expects to see the metadata above the columns, so it would be more user-friendly to adopt that format. —SMALLJIM 10:26, 2 June 2011 (UTC)
Revert msgs
- An easy way of selecting between several different "REVERT COMM" messages would be useful - I've come across many edits that ought to be undone, but don't warrant a test/vandalism warning. To classify them as "Innocent" seems like a missed opportunity (especially as many of them are hours or days old and may not be spotted again for some time, if ever) and rewriting the warning message each time is too much fuss. —SMALLJIM 12:03, 1 June 2011 (UTC)
- This has been on my radar for a long time -- I just need to get around to it and do it elegantly. I have always believed a "good faith revert" option in some form would be helpful in borderline cases. Maybe a drop-down box of possible edit messages, perhaps edit-able via a persistent configuration file? West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- I think a hotkey might be enough: Vandalism, Unconstructive/Undesirable/Unwarned, Pass, Innocent, with U for the new revert-without-warning option. You could just add one more square to the VPI buttons, or a note above the warning (To revert without warning, use U) -- Ocaasi c 00:50, 2 June 2011 (UTC)
- Yup, anything along those lines would be good to try out. —SMALLJIM 10:33, 2 June 2011 (UTC)
Recent warnings
- Similarly, some way of showing what recent warnings the editor has been given would be useful, even if it's done via a separate button to be clicked when required. In general it's useful to know what level of warning you're about to issue, and in particularly bad cases it's appropriate to skip a warning level. —SMALLJIM 12:03, 1 June 2011 (UTC)
- I really like this idea. It's a little bandwidth intensive (the user page needs to be fetched and processed in whole for all edits), but the logic is already in place (since it gets tripped when the "vandalism" button is pressed). A small piece of data like this could be presented in many places. West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
Recent history
- Some indication of the article's recent version history would be useful too: it's not that uncommon to revert an edit back to vandalism done by the previous editor. I very much like Huggle's method of showing previous edits both by the editor and to the current article. —SMALLJIM 12:03, 1 June 2011 (UTC)
- This seems tougher. Can you show me how Huggle does this? West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- Isn't it just a version history from both the last editor and the editor before that? You could theoretically present 3 diffs, but the version metadata would probably be enough, maybe in the infobox at the bottom. -- Ocaasi c 00:50, 2 June 2011 (UTC)
- I seem to have dealt with this above (Diff pages) - some form of marking earlier edits depending on score/whitelist may be feasible. You really ought to take Huggle for a spin if you can (there's no need to actually make any edits with it). It's much easier to see its coloured squares system in action than it is to describe, but see Wikipedia:Huggle/Manual#Other. —SMALLJIM 10:49, 2 June 2011 (UTC)
Admin users
- The program should detect if the logged-in user is an admin and if so either give the option of blocking the vandal instead of reporting to AIV, or, for programming simplicity, provide a link to the Special:Block page in the browser. Or if that's not possible, at least a "Submit to AIV?" yes/no msg box. —SMALLJIM 12:03, 1 June 2011 (UTC)
- This never even crossed my mind, not being an admin myself. Shouldn't a report to AIV be made regardless, for logging purposes? Or is it permissible for an admin to skip this step? It seems easy to implement, but I'd be interested in knowing more about the formalities of how these things are handled... West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- Since admins can do it themselves, they typically wouldn't want to add any load to AIV. Admins don't need to log blocks in any particular noticeboard, to my understanding. -- Ocaasi c 00:50, 2 June 2011 (UTC)
- Ocaasi has it right. All admin actions are logged automatically - see my block log, for instance.
- Incidentally, Andrew - shooting off at a tangent - are you using the available data on previous blocks (such as this) as an input to STiki's user reputation? As that example shows, the existence of {{schoolblock}}s in particular is a strong indicator for continuing vandalism. —SMALLJIM 11:46, 2 June 2011 (UTC)
Whitelist
A few other thoughts:
- I've seen your note to the CBNG folks, but would it be possible to incorporate Huggle's whitelist to stop other safe edits from showing up for possible reversion? —SMALLJIM 12:03, 1 June 2011 (UTC)
- Just curious, how often do these users show up in the queues currently? West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- After I remembered to keep a record, I saw these yesterday: [1] (EmausBot), [2] (revert by Bento00), [3] (ClueBot NG). I didn't record when I got them, or which queue I was using either (which I guess would be important to know) - I did use both the CBNG and STiki queues yesterday. There were also the three that were presented almost consecutively that I reverted wrongly when I started using the tool (as already discussed). As a very rough guess I'd say around 1 in 50, though their appearance may be clumpy. I'll keep an eye out for any more. —SMALLJIM 14:19, 2 June 2011 (UTC)
Vandals
- How do you stop vandals using STiki to mark dodgy edits as innocent? —SMALLJIM 12:03, 1 June 2011 (UTC)
- This is a complicated issue. STiki is essentially a "collaborative tool" just as Wikipedia is. All I've done is throw an additional level of software onto the collaborative infrastructure. Any attack against Wikipedia has an analog against STiki. We can limit the user-rights of those allowed to use STiki, but that reduces the available user pool. At present, I have many triggers in place that email me if suspicious STiki behavior is taking place, so I can block users or IP addresses as needed. West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- Do we have any evidence this has happened? Bad reverts seem to get reported. Someone clicking innocent on obvious vandalism might be more of a hindrance. Maybe some diffs with particularly high scores could be recycled for confirmation and editors who consistently flag obvious vandalism as innocent could be put on watch. Also, an account is required, which is a significant hurdle for vandalism. -- Ocaasi c 00:50, 2 June 2011 (UTC)
- It's different from the normal WP interface, though, because once an edit is marked as "innocent", then as far as I am aware no other STiki users get the chance to consider it. But my concern actually arose after reading that "The ML models are trained over classifications provided on the STiki frontend." By the time you've spotted the bad behaviour, the damage to the training data will have been done, or at least need undoing. I reckon STiki use ought to be limited at the very least to autoconfirmed users. Ocaasi makes good points too. —SMALLJIM 14:44, 2 June 2011 (UTC)
- I raised a similar point at Wikipedia talk:STiki/TalkArchive02#Only for rollbackers? and Andrew seemed receptive to the idea of restricting usage to rollbackers. Yaris678 (talk) 23:16, 5 June 2011 (UTC)
- I think rollback is easy enough to get that it wouldn't be a bad idea to require it. If someone can't get it, perhaps they shouldn't be reverting based on diffs. Although slightly complicated, it might also be interesting to have new users go through a pre-picked set of diffs: a few that are obvious vandalism, a few that are obviously unconstructive but not vandalism, a few that are clearly innocent, and a few in between. I'm also interested in intermixing a feed of gold standard diffs, as a check for inattentive or mischievous users. Ocaasi t | c 00:32, 6 June 2011 (UTC)
Maybe use Huggle
- Since STiki is a non-real time reversion tool, how about running your own CBNG back-end just to classify and store the edits and their scores. Then when CBNG is down, AV work using its intelligence can continue manually. Actually CBNG's VDA feeding into a Huggle GUI sounds like the best of all worlds, but I don't suppose this is the place to suggest that!
- An STiki back-end feeding into a Huggle GUI sounds like the best of all worlds, and if you could also run a copy of the CBNG VDA, AV work using its intelligence could continue manually when CBNG, the bot, is down. —SMALLJIM 12:51, 1 June 2011 (UTC)
- If that's the suggestion, why not just have the CBNG feed pumped directly into a Huggle GUI? Similarly, STiki scores have an IRC feed as well. As far as the STiki GUI is concerned, I am resistant to becoming too "Huggle like." That tool already exists, why duplicate effort? It should be no real issue for the Huggle developer(s) (Gurch?) to utilize our existing feeds. I've always strived to (1) be minimalist, and (2) use reservation blocks, so there is no "race" like the one that exists in Huggle. West.andrew.g (talk) 22:50, 1 June 2011 (UTC)
- I understand your desire for minimalism, but it's not one I share - at least not in a GUI that's helping do something as potentially complex as deciding what is vandalism and what's best to do about it. A minimalist front-end leads the user to take the software's decisions on trust whereas I'd rather have lots of data easily available so I can make well-informed decisions that I can back up if they're queried. So I'd suggest that if you want to differentiate STiki from Huggle then STiki, with its non-real-time approach, should provide the facilities to support more in-depth exploration of the circumstances surrounding the edits that it presents - thereby improving the chances of discovering patterns of vandalism that are often missed in Huggle's rapid-fire shootouts. —SMALLJIM 16:38, 2 June 2011 (UTC)
Thanks
Hope this is of some use - I must say what a good idea STiki is. So far it's presented me with plenty of vandalism to revert, without either the boredom of ploughing through tons of good edits or the over-excitement of trying to beat ClueBot NG all the time. —SMALLJIM 12:03, 1 June 2011 (UTC)
Slightly problematic user
Hey everyone. I've been watching the daily statistics scroll by, and a recent trend caught my attention. The ClueBot-NG queue normally has about 40% accuracy, but that has dropped dramatically to the 10% range. There has also been an unusually large quantity of edits churning through.
My focus turned to the STiki user User:Lam Kin Keung. This individual seems to have a very narrow view of what constitutes vandalism. All of the reverts seem appropriate, but he/she tends to click "innocent" for many instances that I would consider vandalism. This individual does work at reasonable speeds, and would seem to be well-intentioned in nature -- so I don't want to start blocking or throwing out accusations. Perhaps the problem stems from this individual's English-language difficulty (which he/she admits to on the user page) -- but the possible sockpuppet claims others have made don't seem to pan out.
These inaccuracies are harming the system's accuracy and forcing other STiki users to see edits that are less likely to be vandalism. Does anyone have suggestions on how this situation should be handled? Thanks, West.andrew.g (talk) 22:50, 30 June 2011 (UTC)
- If the accepted diffs could be down to a poor understanding of the English language then one approach would be to bring it to the user's attention and suggest that they use the "pass" button if they are at all unsure.
- Or perhaps you could suggest that the user wait until they have improved their English before using the tool any more.
- Of course, you wouldn't have this problem if you restricted the tool to Rollbackers, as you said you would. Can I take it that you have decided that the extra pool of users makes it worth the lower quality of results?
- Yaris678 (talk) 20:57, 10 July 2011 (UTC)
- That could still be implemented. However, FWIW, I think this user would be granted the rollback right if they were to request it. Consider that all the vandalism they do report is appropriate. There is no way a rollback-granting admin could ever view the edits the user chose not to undo. Thanks, West.andrew.g (talk) 21:29, 10 July 2011 (UTC)
- Yes... I was kind of assuming that if a user's English was that bad then they wouldn't be given Rollbacker rights on the English Wikipedia... but I could be wrong. I guess the issue here is that we have different requirements for the two roles. As illustrated below.
- The last two don't really apply to STiki because of its client-server model. The second one is the issue under consideration here. In general, they imply a similar level of wiki experience, but there will be cases which qualify for one and not the other. Hmmm.
- I've not listed them like this before, but I have been contemplating the second requirement quite a bit. The thing that most concerns me is stealth vandalism. Ideally, STiki users will have a certain amount of awareness of it and ability and willingness to investigate the possibility of it.
- Yaris678 (talk) 11:47, 11 July 2011 (UTC)
- Is it still going on? I guess the simplest solution would be to retain any edits marked as "innocent" in the queue until two (or more) users have considered them. —SMALLJIM 12:26, 11 July 2011 (UTC)
- Simple... but would generate a lot of rework. I suppose the simplest solution to that would be to have a white list of people whose declarations of innocence are taken as fact.
- This relates to something I was thinking about the other day... if a lot of people have passed on a given diff, then the probability of anyone saying that it is vandalism seems to be lower, so could you make STiki adjust the score given to the diff... so that others can still come across it, but only after they have removed all the obvious candidates? Similarly, you could make it so that a declaration of innocence merely gives a diff a lower vandalism score, rather than removing it from the list altogether. The amount that the score is adjusted could be varied from person to person... either manually, based on your own assessment, or in response to the vandalism rate amongst a user's passed and "innocented" edits that are re-examined.
- Yaris678 (talk) 17:53, 11 July 2011 (UTC)
- I don't see rework as a problem - after all Recent Changes watchers and Huggle users do it all the time without noticing. Whitelists are best used to decide which edits are presented in the first place, and unless determined automatically or as a simple one-click process, the issue of maintaining them has to be considered. However, I do like the idea of reducing the vandalism score by a certain amount each time an edit is marked as innocent until it drops below a threshold set by STiki. That way edits that ClueBot NG thinks are the worst would get more attention than those with a lower rating. The data generated could be useful for CBNG training too, as several people would have said that edit x, though marked highly by CBNG, wasn't in fact vandalism. What about it Andrew? —SMALLJIM 18:34, 11 July 2011 (UTC)
- First off, the percentages have seemed to be more in line with the norm lately, so perhaps this particular situation has passed. Nonetheless, I like the Yaris+Jim solution of decreases to the queue status/score based on presses of the "innocent" button. Perhaps different user rights could also carry varying weights (i.e., users w/o rollback only carry small weight, rollbackers more so, and admins even more so). Thanks, West.andrew.g (talk) 04:12, 12 July 2011 (UTC)
- Andrew, varying the weight according to user rights seems to be a good starting point. You could also look at number of edits. Once a user has built up a decent history of using STiki then I still think that record should be the most significant factor… but I appreciate that examining that record in a rigorous way would require more programming by you.
- Jim, I think the main strength of STiki is that its queue-client-server model reduces rework. I know some people are happy to do it, but we will be more effective in fighting vandalism if we concentrate our time where it is needed.
- Yaris678 (talk) 09:56, 12 July 2011 (UTC)
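(For illustration, a rough sketch of the weighted-decay idea being discussed -- the class names, weights, and threshold are invented, and this is not STiki's actual queue code: each "innocent" click subtracts a weight tied to the classifier's user rights, and the edit only leaves the queue once its score falls below a threshold.)
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "weighted decay" idea: instead of removing an
// edit from the queue on the first "innocent" click, lower its score by an
// amount that depends on who clicked. Not STiki's actual implementation.
public class QueueDecaySketch {

    enum Rights { PLAIN, ROLLBACKER, ADMIN }

    static final double DROP_THRESHOLD = 0.30;   // made-up threshold
    static final Map<Rights, Double> WEIGHT = Map.of(
            Rights.PLAIN, 0.10,
            Rights.ROLLBACKER, 0.25,
            Rights.ADMIN, 0.50);                 // made-up weights

    // revision-ID -> current vandalism score
    private final Map<Long, Double> queue = new HashMap<>();

    void enqueue(long rid, double score) {
        queue.put(rid, score);
    }

    /** Called when a user classifies an edit as innocent (or passes on it). */
    void recordInnocent(long rid, Rights rightsOfClassifier) {
        Double score = queue.get(rid);
        if (score == null) return;
        double newScore = score - WEIGHT.get(rightsOfClassifier);
        if (newScore < DROP_THRESHOLD) {
            queue.remove(rid);                   // finally considered innocent
        } else {
            queue.put(rid, newScore);            // stays in queue, but ranked lower
        }
    }

    public static void main(String[] args) {
        QueueDecaySketch q = new QueueDecaySketch();
        q.enqueue(431016345L, 0.90);
        q.recordInnocent(431016345L, Rights.PLAIN);      // 0.80 -- still queued
        q.recordInnocent(431016345L, Rights.ADMIN);      // 0.30 -- still queued
        q.recordInnocent(431016345L, Rights.ROLLBACKER); // 0.05 -- dropped
        System.out.println(q.queue);                     // {}
    }
}
```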
- Maybe I'm misunderstanding your use of the term "rework", Yaris. All I mean by the term is that any recent edit that's presented in an anti-vandalism context is available for consideration by more than one person, until either someone reverts it or it drops off the list (usually due to age). It's the default process and a proven concept, and the fact that STiki currently doesn't encourage it is a weak point. It's especially bad for STiki because the edits that it presents are typically several hours old, and if marked "innocent" they're unlikely to be looked at again by someone with anti-vandalism primarily in mind. Getting several opinions on the innocence of an edit is a good, collaborative activity.
- Incidentally, the fact that STiki can pick up as many bad edits as it does seems to show up Huggle's main problem: in its default configuration (which most users employ) it's biased towards vandals who have already been reverted/warned in the current session. Their edits appear at the top of the list, so many edits by vandals who haven't been reverted in the session only appear lower down, where they're not usually spotted. The problem is caused by the simplicity of Huggle's default input filters (an IP or a non-whitelisted registered user), which lets the bad edits be swamped by the flood of good ones. Hence my earlier suggestion that a CBNG feed providing the input to Huggle would be beneficial if the Huggle list was ordered by the CBNG score. Though of course, it would then "compete" with STiki... —SMALLJIM 13:55, 12 July 2011 (UTC)
- Yes. That is what I meant by rework. I suppose that whether or not you see it as a bad thing depends on your feeling about the probability of someone missing an act of vandalism they see versus the probability of no one seeing an act of vandalism further down the queue because they never got that far. A system like we have been discussing, that pushes items further down the queue, rather than remove them from the queue, could be seen as an attempt to balance those two possibilities.
- Your info on Huggle is interesting. Does it only work with vandals identified in Huggle? Or is it anyone that has been reverted? When I revert someone in STiki I almost always check their edit history, and sometimes I find more vandalism.
- Yaris678 (talk) 15:49, 12 July 2011 (UTC)
- Well, any user's copy of Huggle can only work with the info it has available to it. I think that's just the RC feed, the whitelist and the user's own actions. The RC feed includes edit summaries - so page blankings, reverts done and any vandalism warnings and blocks issued by other users (using any process that issues the standard warnings) can be picked out, and any of these that occur during the session bump any later edits by the vandal towards the top of the queue - the actual order depends on the warning level issued. Huggle's "info" button will provide its user with details of any recent warnings issued to any editor, and that data is then taken into account in list position too. (All TTBOMK) —SMALLJIM 18:33, 12 July 2011 (UTC)
STiki @ Wikimania 2011
Hello, everyone. As a bit of a PSA, I'd like to announce that STiki figures heavily into my two submissions to Wikimania 2011:
- "Anti-Vandalism Research: The Year in Review" -- Looking at both practical and academic anti-vandalism progress over the last year. On the practical side, this includes the evolution of the STiki tool, the inclusion of third-party queues (such as ClueBot NG), and recent proposals to integrate anti-vandalism algorithms into "pending changes" and/or "smart watchlists."
- "Autonomous Detection of Collaborative Link Spam" -- This is a queue which is in development for STiki, targeting external link spam. The theoretical work is largely complete (and in submission to WikiSym 2011), and I'm currently working (with my co-authors) to leverage the technique in a live/online fashion.
I'd encourage my supporters to visit the Wikimania site (it can be done under unified login) and indicate your interest in these presentations (only if you feel so inclined). Though I realize few (if any) of you will be in Haifa in August -- you can still benefit. There are plans to tape/stream the presentations and make them available to the entire Wikipedia community. Thanks, West.andrew.g (talk) 04:01, 23 April 2011 (UTC)
Remove the innocent
If the idea of reducing scores is implemented, I think STiki's "Innocent" button could be removed. Clicking "Pass" (maybe renamed "Next") would reduce the score, and unreverted edits would eventually drop off by falling below some score threshold, instead of by becoming too old. Neat! —SMALLJIM 16:47, 12 July 2011 (UTC)
- Interesting idea. But I think this conversation has highlighted the possibility that different users will use STiki in different ways. I tend to only press "vandalism" or "innocent" if I am pretty certain. This means that I can end up doing a lot of looking at the history of the article and Google searching. It also means that I probably press "pass" more often than some people. It sounds like you (Jim) are happier to press "innocent" if it doesn't look like vandalism to you. I think this explains why I wouldn't want someone to redo what I have done, to see that something is innocent, whereas for you the idea of someone else looking at a diff is no biggy.
- If this variation in user behaviour is true then that supports the idea of STiki varying the amount that a diff goes down a queue according to the user. An alternative/additional idea would be to have four buttons, labelled "innocent", "probably innocent", "no idea" and "vandalism". The first three buttons move a diff different amounts down the queue.
- Yaris678 (talk) 08:02, 13 July 2011 (UTC)
- Ah - I see! You spend a lot of time looking at some less obvious cases of vandalism and don't see any benefit in anyone else "reworking" that edit. Makes sense. An alternative idea to having more buttons would be to link a percentage-based score reduction to the time elapsed before the "Next" button is pressed. Since the cases that take a long time to consider would tend to be those with lower scores, spending 15 minutes on one edit might reduce its score by, say 25%, probably causing it to be dropped from the list entirely. Edits that are more obvious vandalism would tend to have higher scores and would take less time to consider, hence the percentage-based score reduction. Lots of parameters to tweak! Implementing this would have the additional benefit of reducing the effect of anyone trying to subvert STiki - stabbing away at the "Next" button would only have a small effect on the scores, whereas doing it slowly could obviously only affect a few edits. —SMALLJIM 10:08, 13 July 2011 (UTC)
- Cool. Looks like we now "get" each other on the rework thing.
- Hmmm... I can see that relating the score modifier to time would work well in a lot of cases... but not so well in others. Most obvious is the case where someone has a break from STiki but leaves it running, thus giving the software the impression that they are really looking into it. Less obviously, someone may spend a long time on something in an area they are unfamiliar with, whereas in a familiar area they can answer quickly.
- Having multiple buttons allows the user to specify how certain they are, rather than having the software estimate it.
- Yaris678 (talk) 11:57, 13 July 2011 (UTC)
- Well it's not a perfect solution, of course - that doesn't exist. Implement a timeout after half an hour. To my mind, rating STiki users by their status is not a particularly good idea because not everyone (admins excepted, of course!) takes as much care as you do. And having a button that says you're very certain that an edit is not vandalism does nothing to solve the problem that started this whole thread.
- There is a way to indicate that you're certain an edit isn't vandalism: make a dummy edit to the article with an appropriate edit summary. So... perhaps a button in STiki to do that automatically (maybe disabled for one minute) would work - the suspect edit would immediately drop out of the queue because it's no longer the top edit, and anyone abusing the facility would incriminate themselves! —SMALLJIM 13:08, 13 July 2011 (UTC)
- I'll respond briefly to the issue of "measuring how quickly the decision is made as a measure of confidence." I don't find it a good one. Personally, I rarely spend more than 5 seconds inspecting an edit diff. However, in the past I had an issue with STiki's edit "reservation blocks" timing out for one particular user -- because it wasn't unusual for him to spend 10+ minutes an edit. This individual is well-trusted (an admin!) and was using STiki as a launching pad for fixing all types of article issues (while also correctly classifying vandalism). I think it is inappropriate to read too much into how a user interacts with the tool. West.andrew.g (talk) 17:09, 13 July 2011 (UTC)
- I agree that it's probably not implementable in the real world - it was just a spot of brainstorming. But I think Yaris678 and I have come up with some other interesting ideas above - I hope they're worthy of consideration. Is there a new version of STiki on the horizon, BTW? —SMALLJIM 22:09, 13 July 2011 (UTC)
- Most of my recent efforts have been on the back-end algorithms, not anything which is visible in the GUI interface (these things get changed all the time, and rather transparently to end users). The link spam queue is still in progress. The generic "metadata" algorithm should be seeing some large performance improvements soon as a result of: http://www.uni-weimar.de/medien/webis/research/events/pan-11/wikipedia-vandalism-detection.html. Thanks, West.andrew.g (talk) 04:17, 14 July 2011 (UTC)
- Thanks for the update. I think it's a shame you're not working on the UI - maybe soon? —SMALLJIM 08:52, 2 August 2011 (UTC)
New record?
I just reverted vandalism from 50 days ago using Stiki. [4] Arthena(talk) 10:50, 26 August 2011 (UTC)
- Good work! A DB query produced [5], which comes in at ~88 days. West.andrew.g (talk) 14:31, 26 August 2011 (UTC)
- 94 days! Though I was rather helped by the fact that ClueBot NG's server is down. Think I'll go back to the original STiki queue now. Yaris678 (talk) 21:32, 4 December 2011 (UTC)
- Yes, with CBNG down, you should be using the STiki queue (depending on which version of STiki you are using, the STiki queue will now be selected as a default). Since CBNG isn't doing anything, its accuracy percentages are horrible while users dig deep into an inactive queue. The STiki system, meanwhile, has been showing hit-rates of 66%+. Thanks, West.andrew.g (talk) 22:42, 5 December 2011 (UTC)
- Hmmm... when I went onto the STiki queue my hit rate was much lower. Could this be due to someone else having just been through and finding most of the vandalism? Yaris678 (talk) 09:55, 7 December 2011 (UTC)
- That's the only explanation I can think of. Since being down, the ClueBot-NG queue is scoring less than 10% in terms of accuracy in the aggregate. STiki has been over 60%. This sample comprises several thousand revisions, so you must have just hit a bad streak. ACTUALLY, I am a little surprised to hear you report this. Looking at my statistics, it seems that only 64 CBNG revisions have been viewed in the past 24 hours -- and NONE of these were classified as vandalism (and therefore it would have been impossible for CBNG to do "better"). Thanks, West.andrew.g (talk) 14:40, 7 December 2011 (UTC)
- I've not been on STiki since late on the 4th of December, so I can't really speak about the last 24 hours. Yaris678 (talk) 15:16, 7 December 2011 (UTC)
"Stealth" vandalism
Are there any plans for STiki to tackle one of the most obvious forms of "stealth" vandalism? I am thinking of the type where the vandal uses an account or IP to perform some act of vandalism and then uses a different account or IP to make another change to the article, possibly half undoing the original vandalism, but leaving some intact.
I have seen it many times. I wouldn't have thought it would be that hard to detect. In terms of metadata, I would have thought both edits would look like vandalism... and they would have to be a relatively short time apart. I suppose the biggest issue might be the GUI... depending on how clever you try to be. You could just present the net diff but warn the user that it includes the edits of more than one editor and they should probably check them all individually.
I guess that in the longer term, the issue will be edits like this where the second editor is seemingly an experienced one!
Yaris678 (talk) 17:50, 31 August 2011 (UTC)
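(For illustration, a hypothetical sketch of such detection -- thresholds, names, and the time window are invented, and this is not anything planned for STiki: when two recent edits by different editors both score suspiciously and fall within a short window, propose the net diff spanning both for review.)
```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: detect the "two suspicious edits, short time apart,
// different editors" pattern and propose a net diff spanning both.
// Not STiki's actual code; thresholds are invented for illustration.
public class StealthPairSketch {

    record Edit(long revId, long parentId, String editor, Instant when, double score) {}

    static final double SUSPICIOUS = 0.6;                    // made-up score threshold
    static final Duration WINDOW = Duration.ofMinutes(30);   // made-up time window

    /** Returns [oldid, newid] pairs whose net diff should be reviewed as one unit. */
    static List<long[]> netDiffsToReview(List<Edit> recentEditsNewestFirst) {
        List<long[]> result = new ArrayList<>();
        for (int i = 0; i + 1 < recentEditsNewestFirst.size(); i++) {
            Edit newer = recentEditsNewestFirst.get(i);
            Edit older = recentEditsNewestFirst.get(i + 1);
            boolean bothSuspicious = newer.score() >= SUSPICIOUS && older.score() >= SUSPICIOUS;
            boolean closeInTime = Duration.between(older.when(), newer.when()).compareTo(WINDOW) <= 0;
            boolean differentEditors = !newer.editor().equals(older.editor());
            if (bothSuspicious && closeInTime && differentEditors) {
                // Diff from the revision before the older edit to the newer edit,
                // so the reviewer sees the combined (net) change.
                result.add(new long[]{older.parentId(), newer.revId()});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2011-08-31T17:00:00Z");
        List<Edit> history = List.of(
                new Edit(102, 101, "198.51.100.9", t0.plusSeconds(600), 0.8),  // newest
                new Edit(101, 100, "203.0.113.4", t0, 0.7));                   // older
        for (long[] pair : netDiffsToReview(history)) {
            System.out.println("review net diff oldid=" + pair[0] + " -> newid=" + pair[1]);
        }
    }
}
```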
Is it down?
Is Stiki down? I have not been able to connect yesterday and today. I get an error message starting with 'Unable to connect to the stiki back-end'. Arthena(talk) 07:45, 13 October 2011 (UTC)
- And it's back up. Arthena(talk) 19:07, 14 October 2011 (UTC)
- Yep, a network outage where STiki's server resides seems to have been the culprit; and the network interface did not come back up automatically when things came back online. A trip to the office fixed matters, and there is no issue with STiki itself. Thanks, West.andrew.g (talk) 02:26, 16 October 2011 (UTC)
UI suggestions
Hey AGW, I have been back using STiki for a few days and noticed some interface tweaks that might improve efficiency.
- Selectable text. Right now the diff-text is not copyable. That would occasionally help when wanting to search Google or Wikipedia for a phrase. Is there a way to render the diff so it's click-able?
- I'll look into this. It wasn't too long ago that I made the text copy-paste-able in the "metadata" panel. I am doing so many little tricks to keep the formatting and line-breaking clean -- it seems to break this functionality out of the box. But yes, this seems reasonable.
- Red highlighting. The diffs show which paragraphs have changed but they don't show which words within the paragraph are different, which Wikipedia's diff does show for me. It would help with longer paragraphs as well as with subtle changes within a paragraph.
- This should have been fixed in the very minor release of 12/2 (just yesterday, or two days ago). A MW upgrade broke this. West.andrew.g (talk) 06:49, 4 December 2011 (UTC)
- View prior diff. If there were a way to query the last 2-5 diffs it would help identify prior vandalism and avoid reverting to a vandalized version. You currently offer the view page history link, but that requires leaving the interface, which reduces efficiency.
- This would be more significant work. I'll think about it, but it's not on my immediate radar.
- Integrated feedback/comments. It would be nice if there were a way to leave feedback from within the program and have it post to this page (again, without having to leave the program). That's a much lower priority suggestion, but it might encourage suggestions/learning.
- It would be simple to integrate, but what's the point? If you know how to use STiki, surely it isn't too much of a burden to visit this talk page.
Hope your research/talks have been going well. Cheers, Ocaasi t | c 22:09, 3 December 2011 (UTC)
- View prior diff: In the special case where the same editor makes several consecutive changes, these should all be lumped together on the diff browser, as if they were one edit. Otherwise, most of the vandalism is hidden. If I am not mistaken, STiki already includes all of the consecutive changes when it reverts. Peter Chastain (talk) 15:12, 22 December 2011 (UTC)
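(For illustration, a hypothetical sketch of that lumping logic -- not necessarily how STiki implements its revert handling: walk back from the top revision while the editor stays the same, then diff against the revision just before the run, which is also the revision a revert would restore.)
```java
import java.util.List;

// Hypothetical sketch: when the same editor has made several consecutive
// edits at the top of the history, show one diff covering the whole run.
// Not necessarily how STiki implements its revert logic.
public class ConsecutiveRunSketch {

    record Revision(long revId, String editor) {}

    /**
     * Given the page history, newest first, returns {baseRevId, topRevId}:
     * the diff to display (and the revision a revert would restore).
     */
    static long[] runToDisplay(List<Revision> historyNewestFirst) {
        String topEditor = historyNewestFirst.get(0).editor();
        int i = 0;
        while (i + 1 < historyNewestFirst.size()
                && historyNewestFirst.get(i + 1).editor().equals(topEditor)) {
            i++;                                   // extend the run of consecutive edits
        }
        if (i + 1 >= historyNewestFirst.size()) {
            throw new IllegalStateException("run spans the whole visible history");
        }
        long base = historyNewestFirst.get(i + 1).revId();  // last revision by someone else
        long top = historyNewestFirst.get(0).revId();
        return new long[]{base, top};
    }

    public static void main(String[] args) {
        List<Revision> hist = List.of(
                new Revision(105, "203.0.113.7"),
                new Revision(104, "203.0.113.7"),
                new Revision(103, "SomeOtherUser"),
                new Revision(102, "SomeOtherUser"));
        long[] span = runToDisplay(hist);
        System.out.println(span[0] + " -> " + span[1]);  // 103 -> 105
    }
}
```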
Algorithm suggestions
Hey, I'm not sure if these are already implemented in the STiki algo or whether they'd be helpful/feasible.
- Addition of text between ref tags--typically a sign of very trustworthy users or very advanced vandalism.
- Introduction of new spelling errors--spellcheck prior and altered versions and compare number of misspellings
- Single word changes within a paragraph
- Text added to the very end of a paragraph
- Text added to the very beginning of a paragraph
- Timestamps in articles--from someone signing their name
- The words gay, cock, penis, and ass--seriously
- Exclamation points
BTW, just watched your Haifa talk. Very cool. Glad you're still working on all this. Cheers, Ocaasi t | c 04:39, 7 December 2011 (UTC)
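(For illustration, a hypothetical sketch of how a few of the suggested features could be computed cheaply from the text a diff adds -- the word list, patterns, and names are invented, and this is not STiki's or CBNG's actual feature code.)
```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a few of the lightweight text features suggested
// above, computed from the text a diff adds. Not STiki's or CBNG's actual
// feature extraction; the word list and patterns are illustrative only.
public class DiffFeatureSketch {

    // Signature-style timestamp, e.g. "04:39, 7 December 2011 (UTC)"
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{2}:\\d{2}, \\d{1,2} \\w+ \\d{4} \\(UTC\\)");

    private static final List<String> BAD_WORDS = List.of("gay", "cock", "penis", "ass");

    static boolean addsTextInsideRefTags(String addedText) {
        return addedText.contains("<ref") && addedText.contains("</ref>");
    }

    static boolean containsTimestamp(String addedText) {
        return TIMESTAMP.matcher(addedText).find();
    }

    static long badWordCount(String addedText) {
        String lower = addedText.toLowerCase();
        return BAD_WORDS.stream().filter(lower::contains).count();
    }

    static long exclamationCount(String addedText) {
        return addedText.chars().filter(c -> c == '!').count();
    }

    public static void main(String[] args) {
        String added = "John is so gay!!! 04:39, 7 December 2011 (UTC)";
        System.out.println(addsTextInsideRefTags(added));  // false
        System.out.println(containsTimestamp(added));      // true
        System.out.println(badWordCount(added));           // 1
        System.out.println(exclamationCount(added));       // 3
    }
}
```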
- Cool! Is there a link to a video of the Haifa talk?
- In terms of your suggestions, my understanding is that these are the sort of thing that ClueBot NG looks for. The STiki algorithm looks at metadata. However, I am not 100% sure what things Andrew classes as metadata, so maybe I should let him answer the question.
- Of course, when ClueBot NG is working, the STiki system can use its data through the IRC feed.
- Yaris678 (talk) 09:49, 7 December 2011 (UTC)
- Hi guys, I have edited Ocaasi's original posting above and placed check-boxes next to the things that STiki already does. I now respond to individual points:
- I believe it starts somewhat into my presentation, but video exists at [6]. Now everyone can see what I look like in real life.
- STiki originated as an algorithm built purely on "metadata". However, this has been relaxed to include things like "bad word lists", checking for Wiki formatting, and other "language/text" features. I'll now use anything that is lightweight.
- I have better algorithms than the one "STiki (the queue)" is currently using. Namely, the one described in [7] (if you view the PDF, there is a large table of the features used). I do not have an 'online' or 'live' implementation of this technique, yet. By and large, I had stopped algorithmic development because the CBNG folks were performing so strongly on this front and everyone preferred their queue.
- Thanks, West.andrew.g (talk) 16:30, 7 December 2011 (UTC)
- Cool.
- Just watched the presentation. It was excellent.
- The paper was also very interesting. Between the two of them, I feel like I have a good overview of the state of play.
- I am guessing Ocaasi's other suggestions would be fairly lightweight to calculate. Has anyone tested those features to see if they add any information? (i.e. what would be their ranking on Table 4 of your 2011 Multilingual Vandalism Detection paper?)
- I really like the idea of using ex-post-facto evidence. Especially USR_BLK_EVER. I assume USR_HAS_RB is reverts before that edit. What about reverts after that edit? I would have thought that would be a very good indicator.
- Are there any plans for a STiki that can look deeper into an article history? i.e. not just display the top diff. That could be useful with ex-post-facto evidence... and with stealth vandalism... and more. Obviously one issue you would have is knowing if an edit persists in an article. Another is how to display the diffs. I have some ideas in both areas, if you are interested.
- I normally prefer the STiki queue to the CBNG queue! (My stint of the 4th of December is an obvious exception.) The STiki queue seems to provide me with more vandalism (and sometimes more interesting vandalism). I guess some of it depends on who else has been on STiki recently and which queue they have been using.
- Is anyone currently working on an implementation of a combined-method algorithm? The graph on page 19 of your Wikimania 11 presentation is very impressive.
- I really like the AV clearing house approach on page 28 of your Wikimania 11 presentation. Is there any work going on towards that? I can imagine that a lot more people will use the STiki queues if they can access them through Huggle. The approach would also obviously allow user-interface people to develop new UIs and algorithm people to develop new algorithms, safe in the knowledge that someone else was taking care of the other stuff.
- Thanks, Yaris678 (talk) 20:03, 7 December 2011 (UTC)
User stats (feedback/gamification)
I think STiki's biggest issues are external rather than internal; in other words, getting people to use it. One thing that might help is giving a more quantized dashboard to the user which could track their patrolling and usage. For example, I'd like to know my:
- Total diffs reviewed (Vandalism or Innocent)
- Pass percentage
- Average time per Vandalism/Pass/Innocent classification
I'd avoid stats that would encourage literal 'gaming' such as category totals (total Vandalism classifications), since that might lead users to try and inflate those stats artificially.
You also might consider a leaderboard based on any of those stats, to give some external validation to the more prolific users. Another suggestion would be handing out barnstars to users who meet some benchmark (e.g. 1000s of classifications, X classifications per month, lowest Pass percentage). Thanks again,
Cheers, Ocaasi t | c 08:08, 7 December 2011 (UTC)
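(For illustration, a hypothetical sketch of computing the dashboard statistics suggested above -- class and field names are invented, and this is not STiki's statistics code.)
```java
import java.util.List;

// Hypothetical sketch of the per-user dashboard stats suggested above
// (totals, pass percentage, average decision time). Names are invented;
// this is not STiki's statistics code.
public class UserStatsSketch {

    enum Classification { VANDALISM, INNOCENT, PASS }

    record Decision(Classification label, double secondsSpent) {}

    static void printDashboard(List<Decision> decisions) {
        long vandalism = decisions.stream().filter(d -> d.label() == Classification.VANDALISM).count();
        long innocent  = decisions.stream().filter(d -> d.label() == Classification.INNOCENT).count();
        long passes    = decisions.stream().filter(d -> d.label() == Classification.PASS).count();
        double avgTime = decisions.stream().mapToDouble(Decision::secondsSpent).average().orElse(0);

        System.out.println("Diffs reviewed (V or I): " + (vandalism + innocent));
        System.out.printf("Pass percentage:         %.1f%%%n", 100.0 * passes / decisions.size());
        System.out.printf("Avg time per decision:   %.1f s%n", avgTime);
    }

    public static void main(String[] args) {
        // Made-up classification log purely for illustration.
        printDashboard(List.of(
                new Decision(Classification.VANDALISM, 4),
                new Decision(Classification.INNOCENT, 12),
                new Decision(Classification.PASS, 30)));
    }
}
```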
- Hi again. I like this idea of a leaderboard. I already maintain all the statistics to create it, and it could just be dumped nightly to some page in my user space (or STiki's space). Do you think there are any privacy issues in publishing this information (something the WMF is very sensitive to)? Maybe you could create an attractive template/mock-up of how you think the leaderboard should look, and then I could populate it with actual data? This is much easier to do on-wiki than in-STiki. I once gave barnstars to my largest users, though it was a manual process.
- I like that you point out that one of STiki's problems is "we need to get people to use it." With CBNG currently down and lots of vandalism hanging around -- this seems like a good time to attack the issue. Can you think of any venues by which to get the program some fresh press? Thanks, West.andrew.g (talk) 16:30, 7 December 2011 (UTC)
Here are some ideas:
- Leaderboard - Make it opt-out, so anyone with privacy concerns can easily be excluded. Also, some thought has to go into the right metric, maybe V+I, perhaps multiplied by 2-P% or something clever. Here are some examples: 1, 2, 3, 4, 5. If you have a spreadsheet with the data, there's an easy way to convert Word documents to Wiki-text (Help:WordToWiki, Excel converter), which is especially helpful for tables. There are also ways to auto-populate tables, but it'd probably be much simpler to just manually update them every couple of weeks.
- Now done, per below. West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
- Barnstars - Give out monthly awards, milestone awards, and superlatives. I think this is really important to keep up the regular user base. If it's too time-consuming for you, the task could be delegated to STiki 'ambassadors', people familiar with the program, the basic algorithm differences, the role of machine-assisted review, etc. Then have them do some of these tasks.
- I'd be happy to bestow the title of "ambassador" upon you, if you'd like to help out with these sort of things. You've been one of the most consistent STiki users throughout its history. I think the leaderboard will prove especially helpful in handing out barnstars -- because we can target both "new" and "long inactive" users with the goal of retention. West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
- Implement user-statistics in the STiki interface
- A bit redundant with the leaderboard. Not high on my priority list. West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
- Place the STiki userboxes more prominently on the WP:STiki project page, like at the very top. Currently they're hidden in a collapse box. No one will see them down there.
- Write an op-ed for the Signpost, describing your experience from the initial controversy through STiki's adoption, and looking forward to future developments
- Again, I think we should see what can be done with the draft you sent me. It would be very good promotion if the Signpost would take something like this directly. Maybe a bit too wiki-focused, though ...
- Ask to be profiled, along with other STiki users for the Wikiproject report (last edition)
- ... and if the above effort fails, ask the Signpost if they'd do something like this. Maybe you could contact HaeB (editor? former editor?) directly regarding your draft, and indicate you have my endorsement. Maybe we could get an interview considering both the user (you) and programmer (me) perspectives. West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
- Any of those newspaper ideas can be suggested at this page
- Send an email to Jimmy Wales introducing yourself/updating him and asking for any ideas with adoption. Also email some WMF folks like Maggie Dennis, Jorm, Sue Gardener. Tell them what the program is capable of, and ask if they can help or know someone who can.
- Post an update/request at User talk:Jimbo Wales. Thousands will see it.
- Request help brainstorming ways to improve adoption/recruit new users at Village Pump Idea Lab or Village Pump Miscellaneous
- Move STiki to the toolserver? I'm not sure if that would make it more 'mainstream'. Probably not, but just an idea.
- Ugh, no thanks. My server seems to be more stable than anything the toolserver or CBNG folks are using. No affect West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
- Post an update/request for users at the Anti-vandalism project page.
- Run a STiki recruitment drive, asking existing users to tell 1-3 other editors about their experience. Or run a drive trying to get 50 new users.
- Recruit on irc://irc.freenode.net/#wikipedia-en, which I've been doing a bit.
- Create a banner ad to run at Wikipedia Ads
- I read the instructions. Seems straightforward. Do you have any graphical skills, or does anyone else who watches this page? Once we have a banner, it can go straight into rotation. Seems silly not to do it. Can someone draft and add a new section for it below? West.andrew.g (talk) 23:17, 28 December 2011 (UTC)
Obviously this all needs to be done in a non-gamey, non-spammy way. I'd start with the features that reward existing users and branch out from there. Word of mouth, userboxes, and visibility in STiki-reverted edit comments will do most of the work from there. A Signpost article would be nice, too. Cheers, Ocaasi t | c 07:48, 8 December 2011 (UTC)
- OK. Here's what I think: Andrew currently has a few tools he uses to check that people aren't abusing STiki. We don't want to bring in a load of novice editors who would be difficult to keep track of. Promoting through the Signpost is a good idea because I think it's mostly experienced editors who read that. We could also hang out at Wikipedia:Requests for permissions/Rollback and invite people who we think look suitable. Maybe we should create a template that we can use for this invitation process... Yaris678 (talk) 10:39, 10 December 2011 (UTC)
- I hope to come up with an auto-updating leader board sometime over the holidays. Thanks, West.andrew.g (talk) 17:06, 16 December 2011 (UTC)
- And now we have a LEADERBOARD! Let me know about any suggested changes, and give it some press if you know of any appropriate venues (I will place a link on the main STiki page). Any edits you try to make to the leader-board will get over-written nightly -- so maybe any suggestions should be lodged here. Thanks, West.andrew.g (talk) 22:48, 28 December 2011 (UTC)
AGF reversions
As a new STiki user, still working out how best to use it, I sometimes find a change that needs to be reverted for reasons other than vandalism, e.g., the change isn't really relevant, or it is wrong but probably not made maliciously. So, I wonder if it is better to revert it outside of STiki, or to use STiki (making sure to uncheck the Warn Offending Editor box and change the edit summary). It is a bit disconcerting to press a button labeled Vandalism (Undo) to revert good-faith edits, and I wonder whether this skews the vandalism statistics, labels innocent editors as vandals, etc. Also, from a user-interface viewpoint, it is easy to forget to change the edit summary or uncheck the warning box, or to forget to change them back to defaults after the reversion. Either mistake could lead to bad feelings, embarrassment, and confusion. Should we have an additional button called "Undo (assume good faith)"? Peter Chastain (talk) 17:21, 22 December 2011 (UTC)
- This has been one of the more hotly contested points across STiki's history (the "good faith revert" one). Ultimately, everyone's definition of vandalism is subjective. I would tend to suggest that for unconstructive edits that are done in good faith, off-STiki reversion is the best course of action. However, one could have a custom edit summary that accounts more substantially for this case, and then edit as normal, unchecking the "vandalism warning" checkbox as appropriate. The subtleties here should not introduce much bias/consequence into the training process, as the dataset is substantial, and also relies heavily on external corpora. I tend to believe that no matter how many buttons we have (unless we turn into Huggle), there will always be some quantity of borderline/corner cases. Thanks, West.andrew.g (talk) 02:33, 28 December 2011 (UTC)
Pass versus Innocent
I have been getting messages that I use the Pass button too often and suggesting that I default to Innocent, so I am wondering what Innocent should mean. Does it mean, "I have not found clear evidence that this edit was vandalism"? Or does it mean, "I have reviewed this change and am reasonably confident that it is good"? Since our goal is to improve the quality of Wikipedia, I tend to lean toward the second interpretation. A lot of edits involve changing small details (e.g., sports statistics), which I have no idea how to verify on the web, so I prefer to kick the can down the road, so someone else can look at it. (A counter-argument is that other reviewers will probably have the same problem.) Peter Chastain (talk) 09:17, 23 December 2011 (UTC)
- I have also leaned towards the second interpretation, which is why I installed the warning message in the first place (after 15 "passes" in one session, I think). Some things are so specific, 99.99% of STiki editors would not be able to verify their veracity. Why keep passing these edits on to other STiki users (and probably to the top of their queue)? These are the types of issues that an article watch-lister should deal with. Being quite small, the STiki user-base should focus on a broad coverage of the Wikipedia space, rather than a multiplicative one -- which I tend to believe is one of Huggle's big weaknesses. Thanks, West.andrew.g (talk) 02:40, 28 December 2011 (UTC)
- I have been mulling over that conundrum myself. I notice that I often come across a big block of hard-to-assess edits. I think that comes from when another user has already been through them, taking out the easy-to-assess ones. I normally end up passing on roughly half of them myself... which combined with others that I may pass means that the next person is going to have exactly the same problem. I think it may be worth creating a special queue for edits that have been passed twice (or some other number of times). That way, if an editor fancies a challenge and doesn't mind doing some research to check if someone is adding false info, they can go to that queue; meanwhile, if you just want to keep ploughing on, trying to find more obvious vandalism, you can. You may even find that people are more willing to put more effort into the hard cases if that is what they are anticipating. Yaris678 (talk) 18:41, 23 December 2011 (UTC)
- Good thought, I'll run a script on some interval that achieves just this kind of logic. Moving RIDs with multiple passes out of the primary queues, and into a special one that contains these tough situations. Be patient with me as I get through the holidays and all of this new feature implementation. Adding a new queue is not the easiest thing in the world (not really difficult either, just a touch laborious). Thanks, West.andrew.g (talk) 02:44, 28 December 2011 (UTC)
- Hmmph. I thought adding a queue for this purpose would be a pretty genius idea; saving most editors from dealing with these cases, and creating an interesting queue for those who want to deal with the tricky cases. Unfortunately, after analyzing my database traces, it seems like this is not the best idea. So few edits are "passed" by 2+/3+ editors, that it wouldn't even be worth my time to implement this. Certainly, some users are a little "pass heavy", but it seems like the next person along usually takes care of it. A queue of this nature would sit empty or near-empty a very large portion of the time. Therefore, I am pushing this to somewhere between "null" and "very low" priority. Thanks, West.andrew.g (talk) 07:22, 29 December 2011 (UTC)
Editing off-STiki
A related question is what to do after I go off-STiki to fix a problem. In one common scenario, I see that more than one editor has vandalized an article recently, so I go into Wikipedia and revert to the last good edit. Now, to advance to the next edit in the STiki queue, should I press Innocent (since there is nothing more for STiki to do), or Pass (assuming that STiki will detect my subsequent change and not requeue the questionable edit)? Peter Chastain (talk) 10:25, 23 December 2011 (UTC)
- In that situation, I normally press "vandalism". STiki tells you that it was beaten to the revert. I don't know if it records the edit as vandalism in its machine-learning thingumy though. Yaris678 (talk) 18:41, 23 December 2011 (UTC)
- Yes, press "vandalism." STiki will be beaten to the revert, but it will record the RID as vandalism for learning purposes. Thanks, West.andrew.g (talk) 02:46, 28 December 2011 (UTC)
- What if, say, I go off-STiki to revert a "good-faith" edit, or make the edit conform with Wikipedia's policies? Should I hit innocent? Jargon777 Talk 04:05, 4 January 2012 (UTC)
- Yes, press "vandalism." STiki will be beaten to the revert, but it will record the RID as vandalism for learning purposes. Thanks, West.andrew.g (talk) 02:46, 28 December 2011 (UTC)
STiki new version on 12/29 (CHANGELOG)
A minor release tonight, mostly affecting the "last revert panel". Per the CHANGELOG:
- Though not code-specific, a LEADERBOARD now exists with STiki statistics
- Some changes were made to the "last revert panel":
- The panel now contains a link to the "article", so one can now easily visit the page which was just affected by an undo/RB.
- Previously, the "0 edits undone... beaten to revert? ... check page hist" message was displayed anytime an edit attempt did not succeed. Now, there are separate messages for the "beaten to revert" (which is known to be the outcome) and "error" cases. They display in bold-red for prominence.
- If an "error" edit outcome occurs, and the user has started STiki from the command-prompt/terminal, there should be output about the error. Please post this to the talk-page, so we can discover which error conditions are occuring in practice.
- A small GUI error was fixed where the panel would report that an edit was successfully made, when in fact, the STiki user had been beaten to the revert. This was a result of the MW-API returning a "success" code, but unknowingly also noting "nochange". This only affected a minority of users without native rollback.
- Backend: Even when CBNG scores its own edits poorly, these edits will never be enqueued; i.e., a CBNG revert will never be popped from the CBNG queue.
- Backend: A script has been included in the [/utilities] directory such that one can de-queue any edit that has 'x+' pass actions, in order to increase the efficiency of other editors and aim for broader coverage. Debating whether to use on the main en-wiki project itself.
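A rough idea of what such a pass-based de-queue utility might look like (a sketch only -- the table and column names below are assumptions, not the actual STiki schema, and the real utility script may not be written this way at all):

```java
// Hypothetical sketch of a pass-based de-queue step (assumed schema, not
// STiki's actual back-end code): remove any enqueued RID that has accumulated
// PASS_THRESHOLD or more "pass" classifications from STiki users.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DequeuePassedEdits {
    private static final int PASS_THRESHOLD = 2; // the "x+" pass count

    public static void main(String[] args) throws Exception {
        // Assumes a MySQL driver on the classpath and two assumed tables:
        // 'queue' (rid, score, ...) and 'feedback' (rid, action, ...),
        // where action = 'PASS' records a pass click.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/stiki", "user", "pass")) {
            String sql =
                "DELETE FROM queue WHERE rid IN (" +
                "  SELECT rid FROM feedback WHERE action = 'PASS'" +
                "  GROUP BY rid HAVING COUNT(*) >= ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, PASS_THRESHOLD);
                int removed = ps.executeUpdate();
                System.out.println("De-queued " + removed + " heavily-passed RIDs");
            }
        }
    }
}
```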
Thanks for everyone's continued support. West.andrew.g (talk) 09:04, 29 December 2011 (UTC)
Reverting multiple edits
A suggestion: In the case where clicking the Vandalism button would revert more than one edit (i.e., the same editor has made several consecutive edits), can the browser window show the cumulative difference for all of them? That is, what I see on the left side of the browser window should be what the article will look like after reversion. Currently, it shows only the most recent edit. Thanks, Peter Chastain (talk) 11:30, 30 December 2011 (UTC)
- I agree that this would be really helpful. Yaris678 (talk) 19:03, 5 January 2012 (UTC)
- Acknowledged. No promises on the release time-line, though. Thanks, West.andrew.g (talk) 03:00, 6 January 2012 (UTC)
- Yes, this would be a great addition. Thanks for your work, too. Ocaasi t | c 03:45, 16 January 2012 (UTC)
Whitelist(s)
Hello - I am curious if there are any 'whitelist' settings for STiki; i.e., If a user has more than 95,000 edits (or is a bot-flagged account)? An edit by such a user is most likely not vandalism. Avicennasis @ 03:10, 7 Tevet 5772 / 03:10, 2 January 2012 (UTC)
- Hello. First, we should distinguish that the "STiki" GUI tool is the unification of multiple anti-vandalism algorithms (what we would call "queues"). The "Cluebot-NG" and "WikiTrust" algorithms are largely outside my control. I have independently (with my co-authors) written the "metadata" and "link spam" queues. Each operates in its own unique way.
- Generally, however, features most of these algorithms integrate are "is the editor registered?" and "how many edits have they made?". On the basis that registered and prolific editors rarely commit vandalism, the algorithms are likely to score their subsequent edits "well" (i.e., unlikely to be vandalism). I'd be surprised to see too many edits by these types of editors appear at the top of any queue (creating a kind of de facto whitelist).
- Speaking about the "metadata (STiki)" queue in particular. I can say that bot edits should never appear in the queue, per a hard-coded rule -- and that any editor with that many edits should be quite exempt on the basis of probability. Have you had any experiences to the contrary? As noted above (or perhaps on my talk page) CBNG has tended to score itself quite poorly (but again, outside my sphere of influence). Thanks, West.andrew.g (talk) 07:29, 2 January 2012 (UTC)
- Thanks for the quick response. It seems bot edits are still coming in through STiki - e.g., like this. It's also interesting to note that the message says the change was reverted, however it seems that is also not the case. Avicennasis @ 12:30, 7 Tevet 5772 / 12:30, 2 January 2012 (UTC)
- Using what queue? How often does the latter happen? Thanks, West.andrew.g (talk) 04:32, 3 January 2012 (UTC)
- I have gotten a few today, using the Cluebot-NG queue. I'm pretty sure they were from bots other than CBNG. Peter Chastain (talk) 06:35, 3 January 2012 (UTC)
- I guess the changes discussed at User talk:West.andrew.g#STiki queues edits from approved bots? only related to the STiki queue. That would explain the recent change to filter out CBNG's edits from its own stream. It might be worth expanding that filter to exclude all approved bots. (I guess that false positives of approved bots have not been an issue for CBNG coders because that is filtered out by the whitelist that it uses. Since STiki takes the scores irrespective of the whitelist, it might be worth applying some filtering.) Yaris678 (talk) 20:27, 4 January 2012 (UTC)
- I have argued here and here that the goal of recent changes patrolling (RCP) should be quality control, rather than just the detection and reversion of vandalism. Editors who would never vandalize can still make mistakes, and I always appreciate an extra pair of eyes looking at what I have done. STiki, with its ability to route changes to individual reviewers, could be a great tool for this kind of peer review. If, as I hope will eventually happen, there were enough RC patrollers, it might be possible for most changes (not just the high-risk ones) to be reviewed. STiki's use of multiple queues would allow those who are interested in just vandalism to see edits with a high probability of vandalism, while those of us who want to do quality control could see a broader range of edits, with no explicit whitelisting. If I understand the STiki (metadata) queue correctly, it scores editors by the persistence of their edits and prioritizes new changes accordingly. The only new thing that STiki would need to do would be to look at categories and reviewers' interests (via some sort of profile) so that it could match each article to an appropriate reviewer. Peter Chastain (talk) 06:49, 5 January 2012 (UTC)
- Some interesting thoughts by Peter there. Here are my reactions to them:
- Expanding STiki to cover more general RCPing sounds like a great idea. Perhaps STiki could have a "problematic but not vandalism" button (or maybe two such buttons, one that reverts edits and one that doesn't). This could be used to teach the machine-learning algorithm what such edits look like and so inform a new "problematic but not vandalism" queue.
- We could go further and have multiple new buttons and queues: "POV pushing", "Unclear writing" etc. However, such complexity may be more confusing than it is helpful.
- Persistence of edits by an editor is actually how the WikiTrust algorithm and queue work. Having a high score on WikiTrust is a good indicator that you are a trustworthy editor. Having a low score probably indicates that you are a newbie, but it doesn't mean you are a vandal, which is why the vandalism density in that queue is usually lower.
- Feeding the WikiTrust score of an editor into an algorithm means the algorithm doesn't need a separate whitelist. (Arguably a new approved bot is an exception... but then again... maybe a few extra eyes on its edits would be a good thing.)
- The STiki algorithm is based on metadata, which includes the categories that the page is in and the geographic location of IP editors. More info is at Wikipedia:STiki#Metadata scoring and origins
- Directing edits towards editors according to areas they know about sounds like it would be really powerful if we had a large user base. It might also take a lot of programming effort (although Andrew would be a better judge of that). The leaderboard says we have 32 editors who have edited in the last 30 days so it might not be at the top of Andrew's priority list... but it would be really cool to see. You could make the profiles self-selecting... or you could make them another thing that is learnt by the algorithm. i.e. it could look at the sort of edits you are most and least likely to pass on... probably mostly based on the category that the page is in.
- Thanks, Yaris678 (talk) 18:59, 5 January 2012 (UTC)
- Just had a thought: If you want to use STiki for general RCPing now, the best way is to use the WikiTrust queue. It will give you lots of edits by newbie users that have not been looked at already by any STiki user. It’s not quite as good as a full problematic-but-not-vandalism feature, but it’s a start.
- On a similar note, if Andrew did want to create a special problematic-but-not-vandalism queue, then starting the machine-learning algorithm off by looking mainly at the WikiTrust score of the user is probably a good bet.
- Yaris678 (talk) 10:17, 6 January 2012 (UTC)
- Yaris, thanks for your thoughts. From the talk pages, I think you might be the most experienced STiki user here. Looking at your numbered points:
- 1. STiki is already useful for general RCPing. Despite the fact that I usually work in the Cluebot queue, I spend most of my time looking at pages, fact-checking, fixing things that look like vandalism but are really poor writing, discussion etc. The question is how to make STiki more usable for that.
- 2. STiki is already a bit crowded. I would not like to see the browser window get much smaller. I like the approach of one of the other tools (Huggle???) which pops up a menu of different kinds of problems, each generating its own warning message. I would want one choice to be "none of the above; I will write my own." It is much too easy, and tempting, to "bite the newbies" with STiki's single "Vandalism" button.
- 4. I agree, WikiTrust probably makes explicit whitelists unnecessary. As new bots establish their own reputation, their edits will get much less priority. (An exception could be bots like the one that inserts dates when editors forget to put them into a tag: they are never mistaken, but their edits would not persist after the tag has been removed.)
- 6. Intelligent routing (routing by subject) is the most important part of what I need. A lot of vandalism is subtle. I like to fact-check, but I am not the best person to look at, e.g., television series, football clubs, etc. Even with 25 regulars, intelligent routing would probably be helpful, but that number should increase. I spend much more of my time RCPing with STiki than I did before I started using it, because my increased productivity with STiki makes me feel that my time is well spent. So, STiki could be an important part of changing the Wikipedia culture toward increasing the number of RCPers in particular and quality-control people in general. (I often notice poor writing in the unchanged parts of an edit, and go off to fix that, so STiki is useful there, too.) Getting more people to RCP is essential for Wikipedia to be useful as a source of information.
- I agree that the WikiTrust queue is probably the best source of bad edits that are not necessarily vandalism. On the other hand, I find myself using the Cluebot queue, simply because I like the reinforcement of finding and fixing a lot of problems in a short time.
- Thanks, Peter Chastain (talk) 15:52, 6 January 2012 (UTC)
- An additional thought on user messages: It would be nice if there were a way to send them, even when the "innocent" button is pushed. "Please provide an edit summary" is one common example, but others could be "welcome, newbie" or even "kudos!" Sure, this can all be done outside STiki, but the goal here is productivity, and it is nice to have the article title filled in (especially since STiki currently does not allow us to copy into the clipboard from the browser window). Peter Chastain (talk) 16:39, 6 January 2012 (UTC)
User warnings for unconstructive editing
I'm not clear on why STiki warns some users and not others. 117.203.70.73 received no warning for their edit, but 92.237.169.16 did. Is this intentional? — Preceding unsigned comment added by Wrathkind (talk • contribs) 22:19, 6 January 2012 (UTC)
- Whether or not a warning is issued is entirely up to the editor who is running STiki. For each suspect edit, there is an option to warn or not to warn, and the editor running STiki can choose to not issue warnings for edits which are merely dubious (such as valid but inappropriate commentary as in the example for 117.203.70.73). Johnuniq (talk) 00:27, 7 January 2012 (UTC)
- Well I did have STiki set to warn 117.203.70.73, which is why I was confused as to why no warning showed up. The checkbox for "Warn Offending Editor" is selected by default, and I've not deselected it during my use (and it appropriately warned 92.237.169.16 as expected). So, sometimes it warns; sometimes it doesn't? — Wrathkind (talk) 00:56, 7 January 2012 (UTC)
- Oops, sorry: I missed that you are the person who was running STiki. I have not used STiki much, but when I did I quite often asked it to not warn, and it always (I think) did what I wanted. That is, I never noticed it failing to warn when asked. By the way, I would not warn an editor for this edit. If I had unlimited time, I might offer a friendly suggestion, but not a warning. Johnuniq (talk) 02:07, 7 January 2012 (UTC)
- Yes, I understand. I suppose I should take the time to toggle the warning option per user. I tend to just let STiki do its thing once I notice an inappropriate edit, so I've never bothered to select/deselect the warning option.
- As for the intermittent blip in user warnings, it happens occasionally and I couldn't find a pattern to it. Perhaps it's an unintended result of network issues. I was just curious anyway. — Wrathkind (talk) 03:24, 7 January 2012 (UTC)
- After taking your advice to purposefully select the warning toggle with each edit, I discovered the answer to my question. It seems that if the vandalism is a certain number of days old, STiki will abstain from sending out a warning (presumably because it's pointless to do so after a certain amount of time). The message it gives me is along the lines of "Undid 1 edit - no warning given - (edit(s) too old)". That solves my confusion; thanks! — Wrathkind (talk) 04:56, 7 January 2012 (UTC)
- Correct. If an edit is a day or two old (I don't remember the exact threshold) and the editor is an IP address, then even if the "warning" checkbox is checked, a warning will not be issued. This is due to concerns about DHCP (dynamic IP addresses). We don't want collateral damage (i.e., someone receiving a message intended for an earlier user of the computer/IP). Thanks, West.andrew.g (talk) 17:59, 7 January 2012 (UTC)
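For anyone curious, the rule described above boils down to something like the following sketch (the method name and the exact two-day cutoff are assumptions for illustration, not STiki's actual code):

```java
// Minimal sketch (assumed names and threshold, not STiki's real code) of the
// rule above: skip the user-talk warning when the reverted edit is older than
// some cutoff and the offender is an anonymous (IP) editor, to avoid warning
// whoever now holds a dynamically re-assigned address.
public class WarningPolicy {

    /** Assumed cutoff; the real threshold in STiki may differ. */
    private static final long MAX_WARN_AGE_MS = 2L * 24 * 60 * 60 * 1000; // ~2 days

    /**
     * @param warnBoxChecked  state of the "Warn Offending Editor" checkbox
     * @param editTimestampMs timestamp of the offending edit (epoch millis)
     * @param editorIsIp      true if the offending editor is anonymous
     */
    public static boolean shouldWarn(boolean warnBoxChecked,
                                     long editTimestampMs,
                                     boolean editorIsIp) {
        if (!warnBoxChecked)
            return false;
        long age = System.currentTimeMillis() - editTimestampMs;
        if (editorIsIp && age > MAX_WARN_AGE_MS)
            return false; // "edit(s) too old" -- possible DHCP reassignment
        return true;
    }
}
```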
Channelling edits to the most appropriate user
A collaboration opportunity?
I was thinking about how we would channel edits to the most appropriate user, if STiki were to go that way. One thing that occurred to me was that it would make sense to collaborate with the makers of SuggestBot. I looked into it, and it turns out that they are GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota.
Andrew, maybe there is an opportunity for a collaboration between Minnesota and Pennsylvania.
Yaris678 (talk) 15:03, 10 January 2012 (UTC)
Red text in diffbrowser under Linux/Windows
When I run STiki under Linux, I see red text in the diff-browser to indicate what changed between the two versions. But when I run STiki under Windows, I don't get any red text in the diff-browser, making it hard to tell the difference between the versions. Can anyone confirm this? I'm wondering if it is a bug, or if it is just something on my computer. Edit: I'm using Windows 7. Arthena(talk) 09:13, 27 January 2012 (UTC)
- I get red text on my machine, which is Windows XP. Yaris678 (talk) 13:37, 27 January 2012 (UTC)
- When Mediawiki updated a while back, the devs changed some style-sheet classes about how this was represented, breaking my parser. I pushed an update of STiki to address the issue. Might your STiki version on Linux be newer than the Windows one? If so, see if an update fixes the issue. If not, let me know, I have some Windows machines I can play around on. Thanks, West.andrew.g (talk) 02:20, 28 January 2012 (UTC)
- Ok, I updated to the latest version and it fixed it :). Both STiki_2011_08_01 and STiki_2012_01_17 are "version 2.0", so I thought I already had the latest version, but I didn't. Thanks. Arthena(talk) 16:13, 28 January 2012 (UTC)
- Yeah, it's not the most elegant system, but the big version number, i.e., "2.0", is hiding in a lot of tricky places that are not easily updated. Thus, when minor bug fixes are pushed, the versions are just denoted by the build date. Thanks, West.andrew.g (talk) 17:19, 28 January 2012 (UTC)
Flexibility
I use STiki regularly and I must say I find it extremely good and useful. However, I face issues when it comes to warnings. Let’s say an editor makes an edit that calls for ((subst:uw-unsourced1)) or any other warning besides ((subst:uw-vandalism1)); to issue the appropriate warning, we have to go outside the tool and do it manually (unless I did not understand the functionality). Because of this, a user either has to falsely mark the edit as “Pass” or “Innocent”, OR identify it as “Vandalism”, OR revert it manually. This can be time-consuming, and a STiki user might look for other available tools to facilitate these reverts. Another suggestion is that the user should be given an option to load edits either by time (oldest to newest, but NOT the other way around, else everyone will only review the most recent edits) or at random. Can someone please consider this? Many thanks and once again, great job in developing this tool. Cheers AKS 18:44, 4 February 2012 (UTC)
- My suggestion is that instead of just the 3 buttons 'Vandalism(undo)', 'Pass', and 'Innocent', there could be additional buttons like 'Need References(Tag without undo)', 'Need References(Undo)', 'Spam(Tag without undo)', and 'Spam(undo)'. Also, is it possible to add a button called 'Read Article'? Clicking it would automatically open the article in the default web browser. This would be very useful if the user is not able to decide without knowing the context. It is because of these problems that I mistakenly thought user AKS was wrongly reverting articles. --Anbu121 (talk me) 14:49, 5 February 2012 (UTC)
- I'll address several of these points in greater depth later. In the immediate, I'll note that in the "metadata panel" next to the article name there is a link which leads directly to the article in question. This is the equivalent of your "read article" button request.
- For example, please see this. The diff makes it look like test/vandalism, but if you read the article, you will find that it is a perfectly good edit with appropriate sourcing. --Anbu121 (talk me) 14:59, 5 February 2012 (UTC)
- Yes, there is a universal need to understand the context of an edit, regardless of how you want to tag/revert it. This is why there are humans in the pipeline. If it were truly straightforward, algorithms would revert ALL vandalism autonomously, instead of around ~40%. Thanks, West.andrew.g (talk) 16:01, 5 February 2012 (UTC)
- STiki intends to be an anti-vandalism tool, not a general-purpose tool for undoing diverse damage. No matter how many classification buttons there are, it is up to humans to use them correctly. Per the 2012-04-11 release, there is a "good-faith revert" button which aims to handle some of the murky cases. It is not my intention to add any more buttons (i.e., unsourced, non-notable, etc.) and this matter will be considered closed. Thanks, West.andrew.g (talk) 18:18, 11 April 2012 (UTC)
Following redirects
Is it possible to have STiki follow redirects, like Twinkle does, when warning users? Here, a message was left on a redirected talk page that probably should have been on the target of the redirect. Logan Talk Contributions 22:14, 12 February 2012 (UTC)
- This can be done, no issue (placed on my TODO list). But out of personal curiosity, I'd also like to know: (1) why, and (2) how prevalent this is? I can understand why a seasoned editor might change user-names and want to redirect their talk page. Why would a vandal (or vandal to be) get involved in such low-level username stuff? Thanks, West.andrew.g (talk) 16:58, 13 February 2012 (UTC)
- This bug report has been filed as ticket number #T002 and included in the bug-tracking table on the main talk page (Wikipedia_talk:STiki). It is my intention to fix this bug in an upcoming release. Thanks, West.andrew.g (talk) 18:20, 11 April 2012 (UTC)
Multiple warnings
What made STiki do this? SD5 23:50, 14 March 2012 (UTC)
- Took me a second, but I figured it out. Essentially it is because there are two sections named "March 2012". When STiki scans a talk-page it extracts only the section corresponding to the current month, i.e., "March 2012" (if it does not exist, it will simply append a new section). Then it scans that section for the highest warning level issued. The problem here is that when I tell it to get the section named "March 2012" it gets only the first one, where a warn-level 1 has been issued, and decides a warn-level 2 is appropriate (multiple times). It appends this warning to the "March 2012" section (which it seems the append action interprets as the "last" section with that name). Moral of the story: don't have sections with the same name. I will try to think of a way to hack around this, but it ultimately comes down to CBNG duplicating section headers. It has had issues with this in the past, and in fact, it screws up its own warning system incrementation when it does this. Thanks, West.andrew.g (talk) 03:01, 15 March 2012 (UTC)
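To illustrate the failure mode described above, here is a small sketch (not STiki's actual parser; class and method names are invented) showing how a naive "first match" section lookup behaves when a talk page carries two identically-named month headers:

```java
// Illustrative sketch only: a naive lookup finds the *first* "March 2012"
// header, reads the level-1 warning there, and so escalates to level 2 --
// while the append lands in the *last* section with that name.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TalkPageSections {

    /** Return the start offsets of every "== heading ==" matching 'title'. */
    public static List<Integer> findSections(String talkPage, String title) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("^==\\s*" + Pattern.quote(title) + "\\s*==\\s*$",
                Pattern.MULTILINE).matcher(talkPage);
        while (m.find())
            offsets.add(m.start());
        return offsets;
    }

    public static void main(String[] args) {
        String page = "== March 2012 ==\n(level-1 warning here)\n"
                    + "== March 2012 ==\n(later warnings appended here)\n";
        List<Integer> hits = findSections(page, "March 2012");
        if (hits.size() > 1)
            System.out.println("Duplicate month headers: the warning level may be "
                             + "read from one section but appended to another.");
    }
}
```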
- Many thanks for taking the time to write that explanation. SD5 03:10, 15 March 2012 (UTC)
- Perhaps it would be possible for STiki to remove duplicate header sections before deciding what warn-level to give the editor. After using Twinkle, I sometimes find myself, upon reading a talk-page, having to manually remove the duplicate header myself. I'm seriously considering switching to STiki after reading this article and discussion. — Glenn L (talk) 06:47, 15 March 2012 (UTC)
- When you say remove, I assume you mean merge. That would be preferable. That would be a cool feature if STiki did that. It may seem that the ideal solution is for CBNG to sort out their own code. However:
- The CBNG team haven't fixed it yet, despite being alerted to the problem a while ago.
- If STiki does the fix then we know that STiki isn't going to get confused by this issue, which could (in theory) arise for other reasons.
- BTW Glenn, Yes... you should switch to STiki! As a mathematician, I'm sure you will appreciate its use of alternating decision tree methods.
- Yaris678 (talk) 17:52, 15 March 2012 (UTC)
Other Wikipedias
Are there any plans to adapt STiki into other Wikipedias? Can I help adapt it to the Portuguese Wikipedia? Chico Venancio (talk) 16:40, 17 March 2012 (UTC)
- It is not an immediate priority of mine. It would not be terribly difficult to do, though; mostly translation would be involved. Do you have any programming experience? The question becomes where it would be hosted (it needs to always be running, and have a database to store data). Thanks, West.andrew.g (talk) 21:18, 17 March 2012 (UTC)
- I have some programming experience. I can program in Python (see my bot at ptwiki), and do understand a bit of Java. I don't know where to host it though. Where is it hosted now? Can toolserver be used? Chico Venancio (talk) 01:52, 18 March 2012 (UTC)
- The current version is hosted on a machine at the University of Pennsylvania. However, my forthcoming dissertation makes that an unreliable place for any new project to get started. The toolserver is a good idea, and I should probably investigate migrating the en.wiki version there sometime soon. Thanks, West.andrew.g (talk) 14:23, 18 March 2012 (UTC)
- So, what do you need translating? Chico Venancio (talk) 16:43, 19 March 2012 (UTC)
- The internationalization has not been formalized. You would just need to take every English string in the STiki source-code and translate it to Portuguese. Most of this would be general translation (and I think calls to the Mediawiki API remain in English) -- but some parts are very specific (for example, comments that indicate where vandalism was undone). This would bring the GUI into Portuguese. The more challenging part is the algorithm, not because of translation, but because it would need a labelled corpus of Portuguese vandalism/non-vandalism over which to learn. What anti-vandal tools currently exist for Portuguese? Does Huggle? Thanks, West.andrew.g (talk) 19:40, 21 March 2012 (UTC)
- West.andrew.g: I've edited the source a bit and sent you an email to the git repository. Almost all of the GUI interface has been externalised and is ready to be translated. I didn't include any links to the specific project in the language file; I thought that could be a separate option (translation of the tool, and version of WP). Also included were a few fixes that were bothering me, mainly persistent user properties and hiding GUI options that were greyed-out. Feel free to use any or all of my edits (AKA I'll let you look over everything and release the language file yourself). –meiskam (talk•contrib) 17:49, 22 March 2012 (UTC)
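For reference, externalised GUI strings along these lines are typically handled with a locale-keyed resource bundle; the sketch below is purely illustrative (the bundle name, key, and properties files are assumptions, not the actual STiki language file):

```java
// Hedged illustration of externalised GUI strings (assumed bundle/key names,
// not the actual STiki language file): English defaults in one properties
// file, with a Portuguese file selected by locale.
import java.util.Locale;
import java.util.ResourceBundle;

public class Messages {
    public static void main(String[] args) {
        // Assumed files on the classpath:
        //   gui_text.properties    -> button.vandalism=Vandalism (Undo)
        //   gui_text_pt.properties -> button.vandalism=Vandalismo (Desfazer)
        ResourceBundle msgs = ResourceBundle.getBundle("gui_text", new Locale("pt"));
        System.out.println(msgs.getString("button.vandalism"));
    }
}
```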
- Huggle does work in ptwiki and is the main anti-vandal tool by far. I'll take a look at the code and start translating the string literals, should be done by the end of next week (or sooner). Chico Venancio (talk) 23:26, 22 March 2012 (UTC)
- Chico Venancio, I've already put the string literals into an external file for easy translating .. just waiting for the go-ahead from Andrew. –meiskam (talk•contrib) 23:31, 22 March 2012 (UTC)
- Sure, I'll wait. Chico Venancio (talk) 01:11, 23 March 2012 (UTC)
Perhaps it is best if Andrew addresses this, but I guess he is busy right now, so here is what I think:
Facts:
- Like ClueBot NG, STiki uses a corpus of edits (identified as either vandalism or innocent) to work out a way to determine the probability of a new edit being vandalism.
- Unlike ClueBot NG, STiki mostly uses metadata to determine the probability of an edit being vandalism. This includes things like the name of the article, the categories the article is in and the location of an IP editor.
- STiki does use a small number of linguistic clues to inform the probability calculation, things like the inclusion of exclamation marks and words like “gay” and “cock”.
- Unlike ClueBot NG, STiki probabilities are not based purely on a fixed corpus. It also learns from the information provided by users when they click “vandalism” or “innocent”.
- Unlike ClueBot NG, STiki requires users do the actual reverting. This means that it is not disastrous if the probabilities are calculated inaccurately… although obviously the more accurate the better because it makes things more productive for the users.
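To make the "metadata" idea in the facts above concrete, here is a toy scoring sketch; the feature names and weights are invented for illustration and bear no relation to STiki's actual alternating decision tree:

```java
// Toy illustration only: a hand-weighted "metadata" score. The real STiki
// back-end learns an alternating decision tree from a labelled corpus and
// live user feedback; none of these weights or feature names are real.
public class MetadataFeatures {

    public static double score(boolean anonymousEditor, int priorEdits,
                               boolean pageInRiskyCategory,
                               int exclamationMarks, boolean containsSlur) {
        double s = 0.0;
        if (anonymousEditor)      s += 0.4;  // IP editors vandalise more often
        if (priorEdits < 10)      s += 0.2;  // brand-new accounts are higher risk
        if (pageInRiskyCategory)  s += 0.2;  // e.g. a frequently-vandalised topic
        s += 0.05 * Math.min(exclamationMarks, 4);  // crude linguistic clue
        if (containsSlur)         s += 0.3;         // another linguistic clue
        return Math.min(s, 1.0);  // a rough probability-like score in [0, 1]
    }

    public static void main(String[] args) {
        // An anonymous, brand-new editor adding "!!!" to a risky page scores high.
        System.out.println(score(true, 0, true, 3, false));
    }
}
```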
Opinion:
- The best solution may be to identify a Portuguese corpus that someone has already created. Failing that, our options are:
- Create a Portuguese corpus (which will take time and effort)
- Create a Portuguese STiki with “best guess” probabilities, based on a translation of the English STiki alternating decision tree. The Portuguese STiki would be given a high rate of learning so that it could work out the correct pt.wp probabilities, based on its own user responses.
- It would be very easy to provide a translation of the linguistic clues that STiki uses. This will be required whichever of the above options is taken. It may even be moderately easy to identify similar clues in pt.wp that are not direct translations of en.wp clues.
- It would be very straightforward to use the same IP address info as a starting point for option 2.
- It would be more difficult to provide translations of article and category titles and the associated probabilities would, most-likely, be very different in many cases.
I’m sure Andrew will have an opinion on the feasibility of option 2. Would it be possible to do a Portuguese STiki, along the lines of option 2, which does not take any data from the English version in relation to articles and categories? I think that would involve recalculating decision trees based on the English fixed and user-generated corpuses and only using such information as IP address, IP location, age of user account and linguistic clues.
Yaris678 (talk) 12:55, 26 March 2012 (UTC)
- Sorry for the very slow response; indeed, I have been away a while. I'll try to be more responsive on the talk pages, but not a significant amount of programming is going to get done any time in the immediate future. If I were to do this, I'd advocate a hybrid approach of the #1 and #2 proposed above. First, I would translate all of the English strings into Portuguese (including the back-end stuff). Then, I would get the server-side stuff running (feature collection, IRC listening) but instead of using any logic or queueing, I would issue random edits for the first several thousand classifications. This would take care of the corpus building step, and one could build an ADTree on top of this.
- I am not against a very simple, hand-written scoring system (like the original ClueBot, but rooted more in metadata). However, I don't really see the point in simplifying STiki to the level of Huggle or AWB given that these tools already exist. Either way, I don't think a manual translation of the English ADTree to a Portuguese one will turn out very well. I'd love to consult on this, but we need someone with coding ability and a server to host the back-end. Can someone let me know where we stand in terms of this, and the time commitment they'd be willing to invest? Thanks, West.andrew.g (talk) 02:33, 29 March 2012 (UTC)
Some more thoughts:
- If you need somewhere to install the STiki server software, Toolserver may be the way to go.
- There is value in using the STiki client-server model, even if you are only using a basic system to identify suspicious edits. The client-server model prevents the edit clash that can happen with Huggle.
- I think WikiTrust is up and running on pt.wp. At least... it is mentioned in their news on 28th January 2011. WikiTrust is pretty sophisticated.... but it will give you a lot more non-vandalism than other methods. I think it is fair to say that it calculates accumulated trust, rather than accumulated distrust, and so fails to distinguish between a vandal and any other newish user.
- If you could have a WikiTrust queue and a queue based on simple calculation of distrust (such as in Huggle) then that might be a worthwhile start. (Not as good as a STiki "metadata" queue or ClueBot NG queue, but better than nothing)
Yaris678 (talk) 21:00, 1 April 2012 (UTC)
- Closing this thread, the need for localization has been added to the bug-reporting and feature-request table atop the main talk page (T#003). I don't speak any other languages. In order to move forward, I need a user to come forward who has the knowledge and resources to get this done for another language edition -- only then will I do the heavy lifting to support localization. Nonetheless, this thread is a nice reference for how such a setup might be bootstrapped. Thanks, West.andrew.g (talk) 18:24, 11 April 2012 (UTC)
Bots on Whitelists
Has this been done yet? This is getting somewhat bothersome. Avicennasis @ 01:45, 10 Nisan 5772 / 01:45, 2 April 2012 (UTC)
- This is a function of queue choice. The metadata queue (of my design) will never pop an edit from a bot. The popular CBNG queue, I believe, does process bot edits -- and therefore the STiki back-end will enqueue them for display. I can (and will) write a filter that makes sure no bot edits ever enter any queue. Regardless, this seems to be a bit of an isolated issue? A single user repeatedly reverting one unusual redirect? Have you had problems with STiki users in the past? Thanks, West.andrew.g (talk) 20:35, 4 April 2012 (UTC)
- I've had four different users get false positives and leave messages for AvicBot. I have no idea whether other bots get hit by this or not. Avicennasis @ 21:46, 12 Nisan 5772 / 21:46, 4 April 2012 (UTC)
- I've seen (and hit Innocent for) a number of bot edits, most notably AvicBot and another one that was commenting out deleted pictures; this has included seeing slightly different edits several times in a row. Cluebot seems to have a problem figuring out such edits as being innocent; I suspect it isn't doing reverts itself because of after-the-NN filtering. Allens (talk | contribs) 22:05, 4 April 2012 (UTC)
- Alright. Regardless, this on my TODO list (which is due for a good clearing over the weekend or early next week). Thanks, West.andrew.g (talk) 01:05, 6 April 2012 (UTC)
- STiki will no longer enqueue edits made by bots into the CBNG queue per the 2012-04-11 release. Some older bot edits may remain in the queue, but these should be flushed in due time. This matter is considered closed. Thanks, West.andrew.g (talk) 18:27, 11 April 2012 (UTC)
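An enqueue-time bot check of the kind described above might look roughly like the sketch below (assumed method and queue names, not the actual back-end code; the MediaWiki users/groups API query is real, but the JSON handling here is deliberately naive):

```java
// Hedged sketch of an enqueue-time bot filter: look up the author's user
// groups via the MediaWiki API and skip enqueueing if the "bot" flag is set.
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Scanner;

public class BotFilter {

    /**
     * Rough check (assumed approach): query the MediaWiki API for the user's
     * groups and look for the "bot" flag. A naive substring test is enough for
     * a sketch; real code would parse the JSON properly.
     */
    public static boolean isBot(String username) throws Exception {
        String api = "https://en.wikipedia.org/w/api.php?action=query&list=users"
                   + "&usprop=groups&format=json&ususers="
                   + URLEncoder.encode(username, "UTF-8");
        try (InputStream in = new URL(api).openStream();
             Scanner sc = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            String json = sc.hasNext() ? sc.next() : "";
            return json.contains("\"bot\"");
        }
    }

    /** Enqueue an RID only if its author is not a flagged bot. */
    public static void maybeEnqueue(long rid, String author, double score) throws Exception {
        if (isBot(author))
            return; // never enqueue bot edits, regardless of the CBNG score
        // enqueue(rid, score);  // hypothetical call into the queue back-end
    }
}
```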
STiki vs. Huggle
Any benefit to using STiki over Huggle? Dan653 (talk) 18:00, 3 April 2012 (UTC)
- They're both very different tools, which have a host of varying advantages. I use both. Orphan Wiki (talk) 00:17, 4 April 2012 (UTC)
- For me, the two big advantages are:
- The client-server model prevents me from wasting time looking at an edit that someone else is looking at... and prevents there being an "edit clash" when we both try to revert it.
- The more advanced methods of estimating the probability of an edit being vandalism mean my attention gets directed to more vandalism.
- Yaris678 (talk)
- Stiki has a 'memory': it can bring to your attention vandalism that is hours or even days old. Stiki is useful for finding and reverting vandalism that slipped through the cracks of recent changes patrol. Arthena(talk) 12:52, 4 April 2012 (UTC)
- Yes. And a related benefit is that the vandal has probably got bored and gone away by the time you revert the vandalism. Yaris678 (talk) 15:55, 4 April 2012 (UTC)
Others have summed up the basic advantages. I myself have never used Huggle extensively. For those that have, I am curious what the hit-rate is? i.e., if you look at X edits, what percentage tend to be vandalism? I know Huggle does some prioritization based on some elementary rules. FWIW, STiki (driven primarily by use of the CBNG queue) tends to be hanging around the 50-60% area as of late, even on days when ~2000 edits are being classified. Thanks, West.andrew.g (talk) 20:40, 4 April 2012 (UTC)
- The percentage of edits coming through Huggle that are vandalism is, more often than not, not very great. In periods of quiet, HG is hardly worth bothering with. Orphan Wiki (talk) 18:33, 6 April 2012 (UTC)
- I have put the ideas brought up here into the section Wikipedia:STiki#Comparison to other tools. Everyone should feel free to add to and tweak this table, in the usual wiki way. Or, if you are having trouble understanding the markup of the table, you could suggest an idea here and I will add it myself. Yaris678 (talk) 07:25, 11 April 2012 (UTC)