User:ClueBot NG/FAQ

This page contains Frequently Asked Questions about ClueBot NG. Please make sure your question is not listed here before asking.

Core Algorithm

Why did ClueBot NG classify this edit as vandalism or constructive?

ClueBot NG works by generating a probability that a given edit is vandalism. Edits with a probability above a certain (threshold) score are considered vandalism. It is often difficult to tell exactly where a score comes from. ClueBot NG examines statistics from the edit, and calculates relationships between the statistics and the output. These relationships are not always apparent to a human. A human usually looks primarily at the meaning of content, while ClueBot NG looks at a large set of numbers. It is usually very difficult or impossible for a human, looking only at that set of numbers, to figure out why ClueBot NG did what it did.

In many cases, this leads to correct classifications that could easily be missed by a human, or extremely rapid reverts where it could take a human some time to do research. On the flip side, it also means that it can be very difficult to determine why a false positive occurred. For more information, see False Positives.

Why don't you decrease/increase the weight of exclamation points/shouting/<Insert Metric Here>?

ClueBot NG does not use hand-set weights. It uses an algorithm called an artificial neural network that not only determines optimal weights on its own, but can also discover more complex patterns and relationships. There are no static weights that can be modified.
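
As a rough illustration of weights being learned rather than set by hand, the sketch below trains a single logistic unit, the simplest building block of a neural network. It is not ClueBot NG's code: the two features (uppercase fraction and exclamation-mark density), the four training examples, and the function names are all made up for this example, and the real network is larger and uses many more statistics.

```python
import math

def predict(weights, bias, features):
    """Probability-like score in (0, 1) from one logistic unit."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, epochs=2000, learning_rate=0.5):
    """Fit weights and bias by gradient descent; nothing is hand-tuned."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical per-edit features: [uppercase fraction, exclamation density]
samples = [[0.9, 0.8], [0.8, 0.6], [0.1, 0.0], [0.2, 0.1]]
labels = [1, 1, 0, 0]  # 1 = vandalism, 0 = constructive

weights, bias = train_logistic(samples, labels)
print(predict(weights, bias, [0.9, 0.8]))  # high score
print(predict(weights, bias, [0.1, 0.0]))  # low score
```

Changing how much "shouting" matters here would mean changing the training examples, not editing a number in the code.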

Where is the bad words list?

There is no bad words list. The bot uses Bayesian classification to automatically generate a list of words with precise statistical probabilities.
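
A minimal sketch of how word probabilities can be derived from labelled examples with Bayes' rule follows. The tokenization, the Laplace smoothing, the sample texts, and the function name are assumptions for illustration; the bot's actual Bayesian classifier may differ in every one of these details.

```python
from collections import Counter

def word_vandalism_probabilities(vandal_texts, constructive_texts, smoothing=1.0):
    """Estimate P(vandalism | word) for every word seen in the examples."""
    # Count in how many texts of each class a word appears.
    vandal_counts = Counter(w for t in vandal_texts for w in set(t.lower().split()))
    good_counts = Counter(w for t in constructive_texts for w in set(t.lower().split()))
    n_vandal, n_good = len(vandal_texts), len(constructive_texts)
    prior_vandal = n_vandal / (n_vandal + n_good)

    probabilities = {}
    for word in set(vandal_counts) | set(good_counts):
        # Laplace-smoothed likelihoods P(word | class).
        p_word_vandal = (vandal_counts[word] + smoothing) / (n_vandal + 2 * smoothing)
        p_word_good = (good_counts[word] + smoothing) / (n_good + 2 * smoothing)
        # Bayes' rule.
        numerator = p_word_vandal * prior_vandal
        probabilities[word] = numerator / (numerator + p_word_good * (1 - prior_vandal))
    return probabilities

probs = word_vandalism_probabilities(
    ["you are stupid", "stupid page lol"],
    ["the river flows north", "the page was updated"],
)
print(probs["stupid"], probs["page"], probs["the"])  # 0.75 0.5 0.25
```

No word is declared "bad" in advance; each word's probability falls out of how often it appears in each kind of edit.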

What happens if someone legitimately uses bad words inside of quotes?

The bot handles quoted text differently from regular text. Simply quoting a range of text is not enough to "fool" the bot if the edit is vandalism, but in constructive edits, bad words that are sometimes acceptable may be ignored when they appear inside quotes.

Dataset

What is the dataset and how is it used?

ClueBot NG is different from other anti-vandal bots in that it is not preprogrammed to recognize vandalism. Instead, it learns what is considered vandalism by reviewing a very large number of both vandalism and constructive edits. This very large set of edits is called the dataset.

Why don't you automatically generate the dataset?

We use some edits that are automatically generated, but this is far from ideal, for two reasons. First, generated edits are always biased to some degree, depending on whatever statistics are used to generate them. Second, generated edits are often inaccurate, and small inaccuracies in the dataset can cause large decreases in accuracy.

How can I help?

We need people to manually review edits for the dataset. If you'd like to help out, please see here.

Edits

Why don't ClueBot NG's edits show up as bot edits?

Anti-vandal bot edits usually aren't tagged with the bot flag. This is intentional, and is not specific to ClueBot NG – it applies to all anti-vandal bots.

There are two reasons for this:

  • Since anti-vandal bots are doing a steady stream of edits that would otherwise (usually and eventually) be done by a human, unflagged edits do not increase the volume of edits that show up in a feed, nor do they increase clustering.
  • Anti-vandal bots do not perform precise, exact work like most other bots do. They act more like humans, with most edits correct and good, but a small percentage of mistakes. Bot edits show up as (unflagged) human edits so they can be reviewed for possible mistakes if necessary, like other human edits.

False Positives

Why does ClueBot NG have false positives?

ClueBot NG works by generating a probability that a given edit is vandalism. Edits with a probability above a certain (threshold) score are considered vandalism. The higher the threshold is set, the fewer false positives there are, but also the fewer vandalism edits are caught. To catch a large amount of vandalism, the threshold must be set at a level where there are very few, but some, false positives. More information is available in the algorithms and false positives sections.
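
The trade-off can be seen on a toy example. The scores and labels below are invented, and the helper name is hypothetical; the sketch only shows how raising the threshold lowers the false positive rate and the catch rate together.

```python
def rates_at_threshold(scores, is_vandalism, threshold):
    """Return (false positive rate, catch rate) when edits scoring above
    the threshold are reverted."""
    false_pos = sum(1 for s, v in zip(scores, is_vandalism) if s > threshold and not v)
    caught = sum(1 for s, v in zip(scores, is_vandalism) if s > threshold and v)
    n_constructive = sum(1 for v in is_vandalism if not v)
    n_vandalism = sum(1 for v in is_vandalism if v)
    return false_pos / n_constructive, caught / n_vandalism

# Invented scores: four constructive edits, then four vandalism edits.
scores = [0.05, 0.10, 0.30, 0.60, 0.55, 0.80, 0.95, 0.99]
is_vandalism = [False, False, False, False, True, True, True, True]

for threshold in (0.5, 0.7, 0.9):
    fp_rate, catch_rate = rates_at_threshold(scores, is_vandalism, threshold)
    print(f"threshold={threshold:.1f}  FP rate={fp_rate:.2f}  catch rate={catch_rate:.2f}")
```

On this data, a threshold of 0.5 catches every vandalism edit but reverts one of the four constructive edits; 0.7 removes the false positive and catches three of four; 0.9 catches only two of four.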

Why did this false positive happen?

See Why did ClueBot NG classify this edit as vandalism or constructive?

I think ClueBot NG has too many false positives. What do I do about it?

ClueBot NG's false positive rate is set by the operators at runtime. The current setting can be found in the statistics section. It has jumped around since the start, at times being 0.5% (1 in 200), 0.25% (1 in 400), and 0.1% (1 in 1000).

Before complaining about the false positive rate, please read the section on false positives to make sure you fully understand their role in the bot's operation, and the implications of adjusting the rate. Also keep in mind that the set false positive percentage is a maximum, and the actual rate will probably be lower due to post-processing filters.

If you feel you have a compelling reason that the false positive rate is too high, and your reason has not already been addressed on the talk page (please look through the archives – this topic has already been discussed at length several times), post your concerns and suggested false positive rate on the talk page. If your concerns have already been addressed by past discussions, you suggest an unreasonable false positive rate, or you do not suggest a false positive rate at all, your post will likely not receive a thorough response.

What is the false positive rate and how is it calculated?

The false positive rate is the percentage of not-vandalism edits that are incorrectly reverted as vandalism. It is not the percentage of bot reverts that are incorrect.
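
The two quantities are easy to confuse, so here is a worked example with made-up counts (they are not measurements of the bot): 10,000 constructive edits, of which 10 are wrongly reverted, alongside 400 correct reverts of vandalism.

```python
def false_positive_rate(false_positives, constructive_total):
    """Share of constructive edits that were wrongly reverted."""
    return false_positives / constructive_total

def incorrect_revert_share(false_positives, correct_reverts):
    """Share of all reverts that were wrong (a different quantity)."""
    return false_positives / (false_positives + correct_reverts)

# Made-up counts for illustration only.
constructive_total = 10_000
false_positives = 10
correct_reverts = 400

print(f"{false_positive_rate(false_positives, constructive_total):.2%}")  # 0.10%
print(f"{incorrect_revert_share(false_positives, correct_reverts):.2%}")  # 2.44%
```

With these counts the false positive rate is 0.1% (1 in 1000), even though about 2.4% of the reverts are mistaken.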

Many people probably don't even report false positives, so how can you be sure the false positive rate is accurate?

The false positive (FP) rate calculations do not involve the number of reported false positives. A series of edits, known as the trial dataset, is used. These edits are known to be correctly classified (verified by humans) and randomly selected (this is important). They are run through the bot offline, and these results form the basis of the FP rate calculation. This ensures that the FP rate is an accurate maximum.
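
One way such an offline calculation could work is sketched below: given the scores the bot assigns to a human-verified trial set, pick the lowest threshold that keeps the false positive rate within a chosen maximum. The function name, the tie handling, and the sample data are assumptions for this example, not the bot's actual procedure.

```python
import math

def calibrate_threshold(scores, is_vandalism, max_fp_rate):
    """Lowest threshold whose false positive rate on the trial set does not
    exceed max_fp_rate, when edits scoring above it are reverted."""
    constructive = sorted(
        (s for s, v in zip(scores, is_vandalism) if not v), reverse=True
    )
    # Number of constructive edits we may misclassify (epsilon guards
    # against floating-point rounding just below a whole number).
    allowed = math.floor(max_fp_rate * len(constructive) + 1e-9)
    if allowed >= len(constructive):
        return float("-inf")  # every edit may be reverted
    # Only constructive edits scoring strictly above this value are
    # reverted, and there are at most `allowed` of them.
    return constructive[allowed]

# Invented trial set: four constructive edits, then four vandalism edits.
scores = [0.05, 0.10, 0.30, 0.60, 0.55, 0.80, 0.95, 0.99]
is_vandalism = [False, False, False, False, True, True, True, True]

print(calibrate_threshold(scores, is_vandalism, max_fp_rate=0.25))  # 0.3
print(calibrate_threshold(scores, is_vandalism, max_fp_rate=0.0))   # 0.6
```

Because the threshold comes from verified, randomly chosen edits rather than from user reports, unreported false positives do not affect it.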

How do I report a false positive?

Go here.

What do the developers do with reported false positives?

We submit them to the review interface for verification, then add them to the dataset. This should improve bot operation as long as it does not introduce significant dataset bias.

In our spare time, we manually review false positives. If we come across a false positive that could be prevented by some method other than dataset expansion, we look into changing the code. If we see a false positive that occurred for a reason that has not already been addressed in previous discussions, we usually leave a comment to that effect.

Why don't you just use a page where users can post freeform responses for false positive reporting?

We used to do this, and the result was chaotic. Without explicit directions, users would leave malformed responses, responses without the required data (such as the link to the edit in question), and unrelated comments. Even with templates and comments to guide users, the result was a mess that was very difficult and time-consuming to review by hand, and impossible to automatically scan to import into the dataset.

If you believe the instructions for using the false positive reporting interface are unclear, feel free to modify and improve them. If you dislike the entire concept of the false positive reporting interface in general, feel free to suggest an alternative method on the talk page. But first, please review the archives to make sure your idea has not already been discussed, and also keep in mind that we do not have the time to both manually sort through chaotic freeform reporting and improve the bot. If you do post on the talk page, be sure to actually suggest an alternative method – complaints without suggestions do not help us improve anything.

If there's room for improvement, why don't you shut down the bot until false positives are reduced or eliminated?

Because ClueBot NG's algorithm learns from a dataset instead of following fixed rules, there will always be room for improvement. As the bot runs, people contribute to the dataset, and it learns more and more. But even as it learns and improves, the number of false positives won't necessarily decrease: see the above section on why ClueBot NG has false positives. As long as the false positive rate is below an acceptable level, the bot can legitimately operate as an asset to Wikipedia, and improvements to the bot will result in an increased vandalism catch rate. Additionally, even humans make mistakes, and bots may even make proportionally fewer mistakes than humans. Bot mistakes are often noticed more because there is a greater raw volume of them (because the bot makes many more edits in total).

Feedback

I love ClueBot NG! How can I show my appreciation?

We always appreciate hearing that we do a good job. Please add praise to the praise page. Barnstars and other awards go on the awards page.

I have a complaint. Where can I register it?

Complaints are filed according to the following criteria:

  1. If you have a suggestion for how to make the bot better, see below.
  2. If you have a problem with the bot's false positive rate or reporting interface, see False Positives.
  3. If you have found a bug in the bot (other than false positives), please leave a note on the talk page and we'll look into fixing it.
  4. If the bot is not operating within the expected parameters as stated on the user page (this is very unlikely, as machines do not reprogram themselves), and the bot is causing severe problems, you may use the emergency shut-off.
  5. If you feel threatened by a bot doing a human's job, you can build a time machine and go back in time a few decades.
  6. If you have a complaint about the bot's operation, and you do not have a helpful, useful or practical suggestion about how to solve the problem, we would prefer not to hear your complaint.

I have a suggestion. How do I let you know?

We love helpful suggestions, and they can be left on the talk page. Before posting, please make sure your suggestion is realistic, practical, and makes sense. Also, please glance through the archives to make sure the suggestion has not already been submitted and/or implemented.

We try to respond to and potentially implement helpful suggestions as soon as possible, particularly if they have a real chance of improving the bot's operation.

Origins

How does ClueBot NG relate to the original ClueBot?

There is very little relation, besides the name. The core and algorithms used are written by different people, and use entirely different concepts. The only code shared between ClueBot and ClueBot NG is the interface to Wikipedia, and even that was refactored for ClueBot NG.

NaomiAmethyst wrote the original ClueBot in its entirety, and maintained it for three years. Crispy wrote the core (vandalism detection algorithm) of ClueBot NG. Crispy and NaomiAmethyst were part of the same organization called ClueNet, which is the origin of both bot names. In light of the new core, and greatly increased edit rate, NaomiAmethyst largely refactored the ClueBot interface code and wrote the currently-used dataset collection scripts, as well as the review interface, for ClueBot NG.

What does the "NG" stand for?

It stands for "Next Generation".

Other wikis

How do I run ClueBot NG on my own MediaWiki installation?

Setting up ClueBot NG can be a complex process. You'll need to talk to the development team for instructions and help with setting it up. To talk with the dev team, join the IRC Channel.

Before trying to set it up, you need to consider a few things. Most importantly, ClueBot NG learns what is vandalism from a dataset, which often needs to be very large to be effective. To be effective on Wikipedia, it needs at least 10,000 constructive edits and 10,000 vandalism edits. This is impractical for most smaller wikis to generate.

It may be possible to use the dataset generated from Wikipedia on your wiki, if the content and vandalism trends are sufficiently similar. To test this, you will need to come up with at least a trial dataset: say, about a hundred constructive edits and fifty vandalism edits, at a minimum. More is better. With a trial dataset, you can evaluate whether or not the Wikipedia training set can be used on your wiki.

If your wiki is very large and has enough edits to generate a full training set, that is preferable, as the bot can then be tailored to your wiki's content and trends. However, generating the necessary datasets is up to you. It is important that they be a random, representative (i.e., unbiased) sample. The trial dataset in particular must be random in order to accurately calculate a threshold and gauge effectiveness.

You will also need a Linux/UNIX system to run the bot on, and sufficient knowledge to compile and install the various dependencies.
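
One way to avoid hand-picking edits for a trial dataset is to draw revisions uniformly at random and only then have humans label them. The sketch below assumes you already have a list of revision IDs from your wiki; the function name, the ID range, and the sample size are placeholders, not part of ClueBot NG's tooling.

```python
import random

def draw_review_sample(revision_ids, sample_size, seed=None):
    """Pick revisions uniformly at random, without replacement,
    to be labelled by human reviewers."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(revision_ids), sample_size)

# Placeholder IDs; in practice these would come from your wiki's history.
all_revisions = range(1, 5001)
to_review = draw_review_sample(all_revisions, 150, seed=42)
print(len(to_review), to_review[:5])
```

A uniform draw will contain vandalism only in proportion to how often it occurs on your wiki, so reaching fifty vandalism edits may require reviewing a much larger sample.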