User:Ohconfucius/script/Sources
Objectives
Main objectives, as applied to reference sections or otherwise within citation templates, are as follows:
- make each source name match the title of the WP article about that source
- italicisation is applied in accordance with WP:ITALICS
- Wiki-link neutral: links will usually not be removed, although they may be piped in certain cases where necessary
- Space neutral – there should be no impact on the disposition of spaces before or after parameters in edit mode
- clean up superfluous data, parameter miscategorisations, etc. from data trawling by Reflinks
- retraining of redirecting (indirect) piped links, where these impact the working of the script
- remove unpopulated parameters within citation templates
- remove hyperlinks within |journal=, |website=, |work= and |publisher= fields (CS1 errors)
- where the contents of |work= and |publisher= are identical, the two are merged (i.e. one of them is discarded).
- unification: ensure uniqueness of each of |work=, |publisher= and |location=; please check that the desired one is retained.
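The last few objectives can be illustrated with a minimal sketch. It is not taken from Sources.js: the function name `tidyCitation` and its regexes are assumptions made for illustration, and it only handles simple parameter values (a piped wikilink inside a value would confuse it).

```javascript
// Hypothetical helper, not the script's own code: tidy one citation template.
function tidyCitation(wikitext) {
  // Drop parameters that have a name but no value, e.g. "|page= ".
  let out = wikitext.replace(/\|\s*[\w-]+\s*=\s*(?=\||\}\})/g, '');

  // If |work= and |publisher= hold the same text, keep |work= only.
  // Naive: values are read up to the next "|" or "}".
  const work = /\|\s*work\s*=\s*([^|}]*)/.exec(out);
  const publisher = /\|\s*publisher\s*=\s*([^|}]*)/.exec(out);
  if (work && publisher && work[1].trim() === publisher[1].trim()) {
    out = out.replace(publisher[0], '');
  }
  return out;
}
```

Which of the two identical fields survives is a design choice; this sketch keeps |work=, so the result should still be checked by hand as the list above advises.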
General principles
The rationale and principles applied are as follows:
- urls situated within |url= are protected; this protection extends to any linking text (e.g.: whether "http://time.com", or "[http://money.cnn.com/2008/02/18/news/newsmakers/siklos_calhoun.fortune/index.htm Siklos, Richard. “Made to Measure” ''Fortune Magazine'', February 20, 2008]");
- sources cited are to be retrained where a journal is traditional media (e.g. The Times) and its online version (e.g. Times Online or times.co.uk) is cited
- the terms 'online', 'magazine' or 'newspaper' are dropped unless their use conforms with the Wiki naming conventions of the traditional source (e.g. Time and not Time Magazine; The Guardian and not Guardian Unlimited)
- the traditional journal name (e.g. The New York Times) should reflect the article namespace, with attention being paid to the article in the subject name (e.g. The New York Times); similarly, consistent stylisation should also be ensured (e.g. The Globe and Mail - without the ampersand); 'AFP' will be expanded to 'Agence France-Presse'
- italicisation will be done on an 'opt-in' basis, although an 'intuitive basis' will also be applied
- sites with names sounding like traditional media or that contain words like 'Daily', 'Weekly', 'Monthly', 'Magazine', 'Times', 'Observer' are italicised.
- new media sources will be non-italicised by default; names suffixed .com, .org, .net, etc. are classed as 'publisher' and unitalicised
- In line with convention, television channels (e.g. BBC1, Fox News) and networks (particularly US TV and radio stations that use 4-lettered call signs beginning with a "K" or "W") remain unitalicised, whilst only programmes (e.g. Newsnight or Today) are considered 'works'
- Portals (e.g., Yahoo!, Google, ESPN, etc), as well as their individual channels (e.g., Yahoo! Music, Google News, ESPNcricinfo, etc), are unitalicised
- news agencies (e.g., Reuters, AFP etc) will be classed as 'agencies' within citation templates even though they may also be acting as publishers in certain cases. They remain unitalicised.
- |via= is used for self-published sources such as YouTube or Vimeo
- functionally, correct italicisation will be performed by switching to an appropriate parameter (to or from |work=, |newspaper= or |journal= <–> |publisher=); |work= is used to achieve italicisation when switching from |publisher=, as the script cannot customise to the citation template being used.
- Citations to primary sources (social media sites such as Twitter, Facebook) are tagged {{Primary source inline}}
- as |title= renders the title with double quote marks, extra double quote marks bounding the title will be removed.
- |journal=, |work=, |newspaper=, |periodical=, where correctly used to denote journals or other works that ought to render as italicised (per WP:ITALIC) will not be disturbed.
- publication locations
- are not given for e-sources; but they are generally not removed either
- are unlinked
- may be used to disambiguate names that are used for publications of different places (e.g. The Sun may refer to unrelated publications in Hong Kong, Malaysia, Nigeria and the United Kingdom)
- In general, linking status will be respected by the main function unless such preservation involves complex piping that cannot be easily scripted for; a separate button is provided for unlinking all sources.
- Where sources are news reports, publisher name is unnecessary – per documentation at {{citation}} – the cited publications themselves are often better-known than their publishers. Thus some publisher fields and publisher names are removed outright to reduce template clutter (e.g. "|publisher=The New York Times Company" is removed for |work=The New York Times, "|publisher=Time Inc." is removed for |work=Time).
- as indicated on the doc to the {{citation}} templates, publication locations are given only where the source is not well-known (i.e. not BBC or CNN) or this isn't obvious from the journal name (San Francisco Chronicle vs The Telegraph);
- Citations to internal articles (even in other non-English language WPs) and certain deprecated sources may be removed. Care should therefore be exercised when the script is used on articles for The Epoch Times and Daily Mail, as use is permitted under WP:SELF.
- some unpopulated fields within citation templates may be removed
- Correction of CS1 errors:
- Removal of external link in any of the CS1 or CS2 citation title-holding parameters;
- Where the "|title=" mistakenly contains a URL, it will be blanked with a commented <!--ACTUAL ARTICLE TITLE BELONGS HERE! original text: [url]-->;
- Where parameters other than |url= (e.g. chapter, journal, magazine, newspaper, publisher, title, work, via) contain hyperlinked text, the URL part is removed, leaving only the text; the strings http:// and www. are systematically removed in any event;
- Removal of italic ('') or bold (''') wikimarkup in: |<param>n= publisher and periodical parameters.
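A rough sketch of three of the clean-ups described above (unlinking hyperlinked text, removing italic or bold markup, and dropping extra quote marks around a title) follows. The helper names and regexes are hypothetical and are not copied from Sources.js; they only show the kind of substitution involved.

```javascript
// Hypothetical helpers, not the script's own code.

// "[http://www.time.com Time]" -> "Time": keep the label, drop the URL.
function stripExternalLink(value) {
  return value.replace(/\[(?:https?:)?\/\/\S+\s+([^\]]+)\]/g, '$1');
}

// Remove '' (italic) and ''' (bold) wikimarkup from a parameter value.
// A single apostrophe, as in a possessive, is left alone.
function stripItalicBold(value) {
  return value.replace(/'{2,}/g, '');
}

// Drop one pair of double quote marks (straight or curly) bounding a title.
// Naive: a title made of several separately quoted phrases would be mangled.
function stripBoundingQuotes(title) {
  return title.replace(/^\s*["“](.*)["”]\s*$/, '$1');
}
```

For instance, `stripExternalLink('[http://www.time.com Time]')` yields `Time`, which could then be written back into |work= or |publisher=.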
CITE name function
This function attempts to generate unique names for citations and adds "name=<string>" to the <ref> tag. The unique name is generated in two possible ways and in the following order:
- The regex searches the url of the citation for the first numerical string of 6 digits or more, and suffixes it with the domain name.
- The regex looks up the |date= within the url of the citation and suffixes it to the domain name in the format; it further appends the first "word" (alphabetical string) found after the date string such that the string is <domainname>yyyymmmdd-<word>.
It will therefore not work if no unique identifier strings or dates can be found.
When faced with citations without names where the |date= is populated, the script will prefix the domain name with the date.
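The two naming strategies can be sketched roughly as follows. This is an illustration only: the function `citeName`, its regexes, the order in which domain and identifier are joined, and the numeric yyyymmdd date stamp are all assumptions, and the actual behaviour of Sources.js may differ.

```javascript
// Hypothetical sketch of the naming idea; the real regexes in Sources.js may differ.
function citeName(url) {
  // Domain name without scheme or leading "www." (assumed form).
  const host = /^(?:https?:)?\/\/(?:www\.)?([^\/:?#]+)/i.exec(url);
  if (!host) return null;
  const domain = host[1].toLowerCase();
  const path = url.slice(host[0].length);

  // Way 1: first run of six or more digits in the rest of the URL.
  const id = /\d{6,}/.exec(path);
  if (id) return domain + id[0];

  // Way 2: a yyyy/mm/dd date in the path, plus the first alphabetical word after it.
  const dated = /(\d{4})\/(\d{1,2})\/(\d{1,2})[^a-z]*([a-z]+)?/i.exec(path);
  if (dated) {
    const [, y, m, d, word] = dated;
    const stamp = y + m.padStart(2, '0') + d.padStart(2, '0');
    return domain + stamp + (word ? '-' + word : '');
  }
  return null; // no identifier or date found, so no name can be generated
}
```

The returned string would then be placed in the <ref> tag as name=<string>; a `null` result corresponds to the case where the function does not work because nothing unique can be found.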
Fill DOMAIN_NAME function
- The regex looks at the url, extracts the domain name and populates the |publisher= field.
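A minimal sketch of this step, assuming the citation is handled as a single string of wikitext. The name `fillDomainName` and the regexes are hypothetical; only an empty |publisher= that is already present gets filled.

```javascript
// Hypothetical sketch, not the script's own code.
function fillDomainName(citation) {
  // Capture the domain from |url=, ignoring the scheme and a leading "www.".
  const url = /\|\s*url\s*=\s*(?:https?:)?\/\/(?:www\.)?([^\/\s|}]+)/i.exec(citation);
  if (!url) return citation;

  // Fill |publisher= only when it is present but empty; keep the spacing after it.
  return citation.replace(
    /(\|\s*publisher\s*=)(\s*)(?=\||\}\})/,
    (match, name, space) => name + url[1] + space
  );
}
```

A populated |publisher= is left untouched, so running the function twice gives the same result as running it once.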
Installing the script
- Open your common.js in edit mode (alternatively, go to your user page and append "/common.js" to the end of the URL and open the page in edit mode).
- If you prefer to load this only on a specific skin, such as monobook, open your monobook.js in edit mode.
- If you make a straight copy of this script instead of "importing" it, you may not benefit from the enhancements and bug-fixes that are made from time to time. If you do copy it, you may choose to watchlist this page so you will know when to update your copy for modifications to this script.
- Copy the following code onto the JavaScript page you have chosen in the previous step:
importScript('User:Ohconfucius/script/Sources.js'); // [[User:Ohconfucius/script/Sources.js]]
- Save the page and (re-)load it – refresh the cache by following the instructions at the top of your JavaScript page.
- Bookmark the script page. This will be your cue to purge the cache on your browser for any updates to take effect.
Disclaimer: Use at your own risk and make sure you check the edit changes before you save.
- If you have automatic userscript installation enabled, you can simply visit User:Ohconfucius/script/Sources.js and click "Install" at the top of the page.
Actions and test
Link to script code: User:Ohconfucius/script/Sources.js
Speed of script execution may vary depending on browser.
Should the script stall when working on large articles, press <continue> on the pop-up menu – once is usually sufficient.
Some examples of what the script does on its own follow: [1][2][3][4][5][6][7][8][9][10][11][12]
Once you are in edit mode, there are [FOUR] buttons from this script in the toolbox in the left margin:
- 'Fix SOURCES' ('New source module' in the current version);
- 'Add REFTAGS' (Insert missing ref tags – use when the article contains bare urls);
- 'CITE name' (gives names to all citations)
- 'Fill DOMAIN_NAME' (imports domain names to publisher field; requires the existence of an empty |publisher=)
Known limitations or contraindications
- The script renames certain parameters so duplications may occur, for example with aliases. (see the citations in 1, and , 4 for example)
- Journals with similar or shared names may cause false negatives: for example, where journals differ only in the definite article in the name, the script may fail to detect and correct (e.g. The Daily Star vs Daily Star).
- a publication (using |publisher=) which was italicised may lose italicisation due to automatic removal of the toggle if it is not included in the dictionary of journals and periodicals within the script.
Disclaimer
Users are expected to exercise careful judgement in the context of each article in which they run this script. Use at your own risk and make sure you check the edit changes before you save. It's not my fault if someone misuses this script.
Test page
- User:Ohconfucius/script/Sources/test (Year-2020 version).
- User:Ohconfucius/test/Sourcestest