Eisspeedway

Wikipedia:Manual of Style/Computer science

This manual contains some suggestions which aim to contribute towards writing clear, pleasant looking, and hopefully interesting computer science articles. This guide is a complement to the general Manual of Style.

Suggested structure of a computer science article

Probably the hardest part of writing any technical article is the difficulty of addressing the level of technical knowledge on the part of the reader. A general approach is to start simple, and then move toward more formal and technical statements as the article proceeds. The following structure is merely recommended; editorial discretion and consensus might find an alternative structure more appropriate for some subjects.

Article introduction

The article should start with an lead section of length appropriate to the article content, which describes the subject in general terms and properly summarizes the article. This opening material should give the casual reader a quick understanding of the concept. Name the field(s) of computer science this concept belongs to and describe the context in which the term is used. Write the article title and any alternative names in bold. Include the historical motivation, provide some names and dates, etc. Here is an example:

In computer science, communicating sequential processes (CSP) is a formal language for describing patterns of interaction in concurrent systems. It is a member of the family of mathematical theories of concurrency known as process algebras, or process calculi, based on message passing via channels. CSP was highly influential in the design of the occam programming language and also influenced the design of programming languages such as Limbo, RaftLib, Erlang, Go, Crystal, and Clojure's core.async. CSP was first described in a 1978 article by Tony Hoare, but has since evolved substantially. CSP has been practically applied in industry as a tool for specifying and verifying the concurrent aspects of a variety of different systems, such as the T9000 Transputer, as well as a secure ecommerce system. The theory of CSP itself is also still the subject of active research, including work to increase its range of practical applicability (e.g., increasing the scale of the systems that can be tractably analyzed).

Background and application

It is a good idea to begin the main part of the article with an informal introduction, commonly titled "Background" though "Overview" has also been used, that gives the non-technical reader a basic understanding of the fundamental concepts of the topic being presented. If the concept in question is somewhat theoretical, it is helpful to describe its purpose or applied use(s) toward the top of the article, either concisely in the lead section or in a post-lead introductory section. It is also useful to have some representative examples, which could serve both to expand on the concept in question and to provide some context as to why one might want to use that concept.

A picture is a great way of bringing a point home, and often it could even precede the technical discussion of a concept. WP:How to create graphs for Wikipedia articles has some hints on how to create graphs and other pictures, and how to include them in articles.

If not included in the introductory material, a section about the history of the concept is often useful and can provide additional insight. This often forms its own "History" section.

Structuring different kinds of articles

Computer science is a broad field which encompasses a number of different kinds of ideas. The structure of the main part of an article will vary with the type of article. Here are some general guidelines for structuring a few different classes of computer science articles. Where possible, these guidelines include one or two examples of a "good" article, i.e. an article demonstrates the style we're aiming for in that particular class of article. Always keep in mind that these are guidelines, not hard-and-fast rules: some articles will need to deviate from this structure; some articles will be hard to classify; some won't fit into these classifications at all. Use common sense in applying these guidelines.

Algorithms and data structures

An article on an algorithm should generally consist of:

  1. A description of the algorithm (including pseudocode)
  2. A formal discussion of the algorithm's time and space complexity
  3. A discussion of any implementation and performance issues

A good example is binary search algorithm, a featured article.

An article on a data structure should consist of:

  1. A description of the structure, and any operations that can be performed on the structure
  2. An overview of the applications for the structure in question
  3. A discussion of any implementation and performance issues

Classic problems

An article describing a classic problem should generally consist of:

  1. A statement of the problem.
  2. A discussion of the relevance of the problem.
  3. A discussion of the history of the problem if notable.
  4. A description of solutions to the problem if any exist.

An example is the dining philosophers problem.

Design patterns

The structure of an article describing a software design pattern or other design pattern in computer science (including creational patterns, structural patterns, behavioral patterns, and concurrency patterns) should draw on industry best-practice models provided by reliable sources on the subject. Some of these include:

Fields of study

An article describing a field of study within computer science should include:

  1. A brief overview of the history of the field
  2. A basic introduction to the key concepts around which the field revolves
  3. A description of the important principles, theorems, or results produced by the field
  4. A discussion of relationships with other fields of study (inside and outside CS)
  5. Some pointers to articles that describe particular aspects of the field in greater detail

Formalisms

An article describing some kind of formalism should contain:

  1. A brief outline of the history of the formalism (originator, major developments, etc.)
  2. An informal introduction to the basic concepts involved (preferably including some simple examples)
  3. A formal definition
  4. Discussion of important applications or implementations
  5. Description of major tools that support the formalism
  6. Brief overview of related or derived formalisms

A good example is lambda calculus.

Programming constructs

An article describing a programming construct should generally include:

  1. A brief overview of what the construct is, and how it is used (perhaps including an informal operational semantics)
  2. A listing of alternative names for the construct
  3. Pseudocode or a small code sample demonstrating the construct in use
  4. Description of any equivalences between the construct and other constructs
  5. A discussion of any variations in the semantics of the construct
  6. A discussion of any disadvantages to use of the construct

Examples include continuation, closure (computer programming), and exception handling.

Programming languages

An article describing a programming language should generally include at least:

  1. A brief outline of the history of the language
  2. An overview of the language features
    • Programming paradigm(s) that the language supports, and how well it supports them
    • Style of type-checking, support for design by contract or other specification techniques
    • Memory management style
  3. A basic introduction to the language syntax (including some code samples)
  4. An overview of the formal semantics of the language (if one exists)
  5. A list of available implementations and supported platforms

A good example is the Rust (programming language) article.

Theorems and conjectures

An article describing theorems or conjectures should generally contain at least:

  1. A brief informal description of the theorem or conjecture.
  2. A formal statement of the theorem or conjecture.
  3. A discussion of the history of the theorem or conjecture.
  4. A discussion of impacts, consequences, or implications of the theorem or conjecture.

As many computer science theorems and conjectures are often stated informally in popular literature it may also be beneficial to provide some discussion of common misconceptions or misinterpretations of the theorem or conjecture.

The names of such things are not capitalized except where they contain a proper name (e.g. a surname).

Good examples include halting problem and Church–Turing thesis.

Tools

An article describing a class of tools should generally contain:

  1. A brief overview of the purpose of the tool
  2. A brief history of the development of the class of tool
  3. Discussion of any major subclasses of the tool class
  4. Brief description of the key underlying algorithms or implementation techniques

Examples include compiler, text editor, and automated theorem proving.

Concluding matters

See WP:Manual of Style/Layout § Standard appendices and footers for use of "See also", "Notes", "References", "External links" sections, and navigation templates.

Style guidelines

Code samples

Samples of actual sources get included in articles for a variety of reasons, although the most typical reasons are to demonstrate the "look" of a particular language, to provide examples of language-specific constructs or features, and to provide examples of algorithms not easily expressed in pseudocode. While there's nothing inherently wrong with including sample code, excessive amounts of it can detract from the content of the article itself; avoid writing sample code unless it contributes significantly to a fundamental understanding of the encyclopedic content.

Wikipedia is not a source code repository. Code that is not relevant to encyclopedic content should be moved out of Wikipedia. WikiBooks is the appropriate Wikimedia project for existing GFDL-compatible code; in particular Wikibooks:Algorithm Implementation. The external LiteratePrograms and Rosetta Code wikis are appropriate places to put new sample implementations, along with descriptions of how those implementations work. Important note: existing implementations on Wikipedia cannot be transferred to the LiteratePrograms wiki because Wikipedia content is licensed under the GFDL and/or CC-BY-SA, while LiteratePrograms uses the more liberal MIT/X11 license; relicensing existing code from GFDL/CC-BY-SA to MIT/X11 would require the permission of all copyright holders

Some general guidelines on code samples:

  • Sample implementations of algorithms are fine, but every algorithm article should include a pseudocode description of the core algorithm when possible, so that anyone can understand how the algorithm works. See below for guidelines on pseudocode.
  • The sample should use a language that clearly illustrates the algorithm to a reader who is relatively unfamiliar with the language— even if you believe that the language is well-known. To remain language-neutral, choose languages based on clarity, not popularity. Languages that emphasize readability, such as Python and Ruby, are good choices. Avoid esoteric or language-specific operations.
  • Source code implementations must be compatible with the GFDL and CC-BY-SA licenses (which are incompatible with the GPL).
  • Multiple source code implementations are not appropriate unless they contrast specific aspects of the code and that contrast is important to the encyclopedic content of the article. If possible, accentuate differences by providing the alternate implementation in the same language as the original.
  • All code samples should be marked up using one of the following methods:
    • Surrounding a short, inline code sample in <code>...</code> or <syntaxhighlight lang="x" inline>
    • Surrounding a block of code in <syntaxhighlight lang="x"> or <pre>...</pre> tags
    • Indenting each line of a block of code by one or more spaces (unlike the above method, this allows you to use basic text formatting like italic, bold, etc. in the sample) This typesets them in a monospaced typeface to ensure that spacing is preserved and provides additional information to screen readers and users with customized CSS. Doing this is particularly important for languages where whitespace has syntactic significance— ideally we'd like people to be able to copy and paste sample code into a text editor or IDE. For example,
int main(void) {
    printf("hello, world\n");
    return 0;
}
  • Syntax highlighting may be obtained by using SyntaxHighlight MediaWiki tag instead of <pre>...</pre>, with a specific lang attribute specifying the language name (the value wikitext is used for MediaWiki markup). See Extension:SyntaxHighlight for supported languages. The following is syntax-highlighted Java code, for example, using <syntaxhighlight lang="java">:
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!"); // Print "Hello World!"
    }
}
  • Since code listings are not line-wrapped, and many people read Wikipedia in space-constrained environments, ensure that code listings have a maximum line length of 60 chars. Apply manual word wrap if necessary.

Algorithms

There are no universally accepted standards for presenting algorithms on Wikipedia. A past attempt at standardized pseudocode is archived at User:Dcoetzee/Wikicode, though "[t]he author advises that such a proposal not be advanced again, as it is unlikely to gain consent". Within WikiProject Computer science, the consensus has generally been that where possible, algorithms should be presented in pseudocode. The use of pseudocode is completely language agnostic, and is more NPOV with respect to programming languages in general. Pseudocode also provides far more flexibility with regard to the level of implementation detail, allowing algorithms to be presented at however high a level is required to focus on the algorithm and its core ideas, rather than the details of how it is implemented. Finally, suitably high-level pseudocode provides the most accessible presentation of an algorithm, particularly for non-programmers.

Example

algorithm ford-fulkerson is
    input: Graph G with flow capacity c, 
           source node s, 
           sink node t
    output: Flow f such that f is maximal from s to t

    (Note that f(u,v) is the flow from node u to node v, and c(u,v) is the flow capacity from node u to node v)

    for each edge (u, v) in GE do
        f(u, v) ← 0
        f(v, u) ← 0

    while there exists a path p from s to t in the residual network Gf do
        let cf be the flow capacity of the residual network Gf
        cf(p) ← min{cf(u, v) | (u, v) in p}
        for each edge (u, v) in p do
            f(u, v)f(u, v) + cf(p)
            f(v, u) ← −f(u, v)

    return f

General guidelines for writing pseudocode

  • Make sure the algorithm is understandable without having to read this page first.
  • Try to focus on outlining the algorithm, and where possible keep discussion and explanation of the algorithm outside of the pseudocode.
  • All assignment, comparison, and other mathematical operators should be rendered with proper mathematical symbols wherever possible:
Symbols for use in pseudocode
Operator Result Entity Example
Assignment ← or := &larr;, := c ← 2πr, c := 2πr
Comparison =, ≠, <, >, ≤, ≥ =, &ne;, <, >, &le;, &ge;
Arithmetic +, −, ×, /, mod +, &minus;, ×, /, mod
Floor/ceiling ⌊, ⌋, ⌈, ⌉ &lfloor;, &rfloor;, &lceil;, &rceil; a ← ⌊b⌋ + ⌈c
Logical and, or '''and''', '''or''' ab and ac
Sums ∑ ∏ &sum; &prod; h ← ∑aA 1/a
Low-level pseudocode
  • All control structure keywords should be bolded, and comments should be in italics (in addition to whatever other manner for denoting comments is used).
  • Try to keep the algorithm description sufficiently high level so as to avoid most implementation specific details.
  • Try to avoid using structures or techniques that are idiomatic to a particular language or programming paradigm.
  • If idiomatic structures or techniques are unavoidable try to present them in a generic high-level way in line with the style outlined below:
  • The preferred conditional structure is
if condition then
    code path
else if condition then
    code path
else
    code path
end if
  • The preferred looping constructs are
while condition do
    code block
repeat
and
for loop_control do
    code block
repeat
where loop_control is any suitable clause to control a for loop, such as item in list or 1 ≤ i ≤ n etc.
  • The preferred function definition structure is
function function_name is
    code block
    return variable
end function
where function_name is any reasonable format of function name and arguments. Alternatively, inputs and outputs can be specified within the function block:
function function_name is
    input: description of inputs
    output: description of outputs
    code block
    return variable
end function
  • Code blocks should be indented. If sufficiently clear, block closing keywords (end, repeat etc.) may be omitted.

An example of pseudocode roughly hewing to these guidelines is provided as the example on the Algorithm page, and in Bucket sort, Topological sort, and Pollard's rho algorithm.

High-level pseudocode
Inputs description of input arguments
Output description of outputs
  1. (description of a step in the algorithm)
  2. (the next step in the algorithm)
    1. (substep)
  3. (etc.)
using markup:
:'''Inputs''' ''description of input arguments''
:'''Output''' ''description of outputs''
:# ''(description of a step in the algorithm)''
:# ''(the next step in the algorithm)''
:## ''(substep)''
:# ''(etc.)''
  • Descriptions of steps should be high level, and may simply be English sentences.
  • Substeps of the algorithm, due to branch conditions or loop structures should be indented and subnumbered.
  • Termination of the algorithm with a return value should be denoted using the keyword return

Examples of algorithms in this format include Buchberger's algorithm, Pohlig–Hellman algorithm, Itoh–Tsujii inversion algorithm, Baby-step giant-step, and Pollard's p − 1 algorithm.

Including references

It is essential for article content to be verifiable with reliable sources, a well-chosen list of references and pointers to the literature. Any quotation, any material challenged or likely to be challenged, and any claim that involves a living person must be supported by an inline citation to a reliable secondary source. Some additional reasons for citing high-quality sources are the following:

  • Providing proper source citations enables other editors and our readers to verify the given information and evaluate the sources. This is especially important in any science topic, since the state of the art is always advancing.
  • A reader may desire more details, but Wikipedia is not a substitute for a textbook (that is what Wikibooks is for). Many research papers and books of benefit to topical newcomers are now freely available online.
  • Some notions are defined differently depending on context or author. Articles require references that support the given usage, and should identify conflicting usages (giving due weight to the sources for them).
  • Important algorithms, theorems, or definitions should cite the concept-originating papers as historical and technical information, in addition to later secondary reference works relied on for modern, applied context.

Primary sources must be used with care. They are generally permissible for non-controversial and non-promotional claims about the subject or its author(s). Examples: referencing a blog post from the Rust programming language development team is sufficient for a point about the internal architecture of the Rust compiler, or the motivation behind a specific decision about the Rust language design. However, such a source cannot be used for providing Rust benchmarks, or contrasting Rust's features versus those of another language, since such claims from the developers may have a promotional element and involve analysis, evaluation, interpretation, or synthesis of facts and evidence, which requires a secondary source.

Wikipedia does not enforce a specific reference and citation style; just use one consistently within a particular article.

See WP:Citing sources for how to cite sources properly.

See also

Further reading

  • Zobel, Justin (2004). Writing for Computer Science (2nd ed.). Springer. ISBN 1-85233-802-4.