How to distribute the source of programs used in a paper?

Include the source code as auxiliary files when you submit to the arXiv. That's a more permanent and safer location than your own webpage. In the article itself, just put pseudo-code and a link to the arXiv source.

If you have a lot of code, that approach won't work, and you'll have to host it yourself.

See http://arxiv.org/abs/1007.1730 for an example of how Scott and I dealt with this issue. It includes a rather extensive worked example so that people can figure out how to check the program's output on their own machines. In that case the program depends on a huge package we've written, so we set up a server from which anyone can download the package.

You should make all three forms publicly available: pseudo-code, simplified code, and working code, with as much as possible at the arXiv or another publicly maintained site. However, I don't think permanent storage of the working code is as central an issue as it is for research articles, since computing environments evolve quickly. The most important thing to worry about is that the working code stays available for, say, 10 years; the pseudo-code should be kept more permanently.

Online storage is extraordinarily cheap. I don't see a rationale for skimping. The only consideration is organizing it so people understand what's there, and have guidance as to which version (if any) they might want. The three forms serve different purposes. Even if nobody ever downloads the full version, it only costs you or somebody pennies, and it may even reduce your time and trouble just to provide it.

Sometimes there's a computational task someone wants to perform that is just a step in a bigger project. Programming tends to consume a fair amount of time per idea, even if you are very clear on what you're doing, and programming skill and speed vary widely between different people. You should make it as easy as possible for such people to make use of your work.

Sometimes people are thinking about solving related problems where the code won't be directly helpful, but the ideas may be, and sometimes people just want to check whether what you're doing is correct. Pseudocode is much preferable in a case like this.

Sometimes people may want to actually use the code, but they don't have your infrastructure installed. For such people, it's good to provide a stripped-down form that doesn't require much to get started. Once they have it working in some baby form, they can add in external libraries, or perhaps optimize it in different ways. Maybe they'll even improve it.

My preference is detailed pseudocode, at a high-enough level of abstraction to allow understanding the algorithm.

Of course, as pointed out in Ryan Budney's comment, it depends strongly on which journal you publish in and what its requirements are. However, I feel strongly that the complete code-set which you use should be available from some resource, either through the journal article's publisher, or from your own website, your academic website, or via the arXiv.

If the pseudo-code is detailed enough to allow reimplementing the algorithm straightforwardly by another mathematician, then that should be sufficient.

If the pseudo-code has to leave out certain details which are germane to the computation, then you should provide the interpreted code which implements the algorithm in a numerical computational package. Examples are Maple, Matlab, and Sage, or Octave and Scilab, which are free open-source software packages capable of running code similar to or equivalent to Matlab code.

Why not provide both? If you can provide a link to your own webpage for the paper, or for its supporting supplemental materials, I don't see why you couldn't provide both the interpreted code and the compilable C or C++ code there, unless there are copyright issues involved, for example if you did not write all of the code yourself and do not have the right to release all of the source. I am a supporter of free open-source software and of the GNU Project's GPL licensing, which would allow others to benefit from your code and to contribute back to it via incremental improvements.

I suggest that you specify which version of software package, operating system, compiler, and/or library you used in running your program or in creating the binary application from your code. This is necessary because different versions of Octave (2.3 vs. 3.0) or Matlab (R10, R13, etc.) or any software package may implement or include different routines and may not be capable of correctly running your software program.
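If your code happens to be in Python rather than Octave or Matlab, one way to record this information is to have a small script write it out next to your results. The following is only a minimal sketch under that assumption; the function name `describe_environment` and the package names `numpy` and `scipy` are placeholders, not anything taken from a particular paper.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the OS, the interpreter, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed: record that fact instead of failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list whatever your own code imports.
    env = describe_environment(["numpy", "scipy"])
    json.dump(env, sys.stdout, indent=2)
    print()
```

Redirecting the output into a text file shipped with the code gives readers a concrete record of the versions you actually ran.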

If particular packages are needed to run the interpreted code in Octave or Matlab, I would recommend listing them. In the same vein, if your C or C++ code requires particular libraries such as LAPACK or BLAS, make sure to list them in a text file or in a header file. If you know how to use the make program, you can create a makefile to help others compile your software.
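A list of requirements is more useful if readers can check it mechanically. As a rough sketch, assuming the build needs only command-line tools that live on the PATH, a short Python script can report which ones are missing. The helper name `missing_tools` and the tool names `make` and `gcc` are illustrative assumptions. Libraries such as LAPACK or BLAS cannot be detected this way and still need to be listed by hand.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical requirements; replace with what your build actually uses.
    required = ["make", "gcc"]
    missing = missing_tools(required)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```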

The make program, the GNU Compiler Collection, and many other development tools are all standard parts of GNU/Linux distributions, such as Debian.
