Why are CS researchers reluctant to share code and what techniques can I use to encourage sharing?

Why researchers might be reluctant to share their code: In my experience, there are two common reasons why many researchers do not share their code.

First, the code may give the researchers an important advantage for follow-on work. It may help them get a step ahead of other researchers and publish follow-on research faster. If the researchers have plans to do follow-on research, keeping their code secret gives them a competitive advantage and helps them avoid getting scooped by someone else. (This may be good, or it may be bad; I'm not taking a position on that.)

Second, a lot of research code is, well, research-quality. The researchers probably thought it was good enough to test the paper's hypotheses, but that's all. It may have many known problems; it may not have any documentation; it might be tricky to use; it might compile on only one platform; and so forth. All of these may make it hard for someone else to use. Or, it may take a fair amount of work to explain to someone else how to use the code. Also, the code might be a prototype, not production-quality. It's not unusual to take shortcuts while coding: shortcuts that don't affect the research results and are fine in the context of a research paper, but that would be unacceptable in deployed, production-quality code. Some people are perfectionists and don't like the idea of sharing code with known weaknesses or shortcuts; they don't want to be embarrassed when others see the code.

The second reason is probably the more important one; it is very common.

How to approach researchers: My suggestion is to re-focus your interactions with those researchers. What is your real goal? Your real goal is to understand their algorithms better. So, start from that perspective, and act accordingly. If some parts of the paper are hard to follow or ambiguous, start by reading and re-reading the paper to see whether there are details you might have missed. Think hard about how to fill in any gaps. Make a serious effort on your own, first.

If you are working at a research level, and you've put in a serious effort to understand, and you still don't understand ... email the authors and ask them for clarification on the specific point(s) that you think are unclear. Don't bother authors unnecessarily -- but if you show interest in their work and have a good question, many authors are happy to respond. They're often just grateful that someone is reading their papers and interested enough to study the work carefully and ask insightful questions.

But do make sure you are asking good questions. Don't be lazy and ask the authors to clear up something that you could have figured out on your own with more thought. Authors can sense that, and will write you off as a pest, not a valued colleague.

Very important: Please understand that my answer explaining why researchers might not share their code is intended as a descriptive answer, not a prescriptive answer. I am emphatically not making any judgements about whether their reasons are good ones, or whether researchers are right (or wrong) to think this way. I'm not taking a position on whether researchers should share their code or not; I'm just describing how some researchers do behave. What they ought to do is an entirely different ball of wax.

The original poster asked for help understanding why many researchers do not share their code, and that's what I'm responding to. Arguments about whether these reasons are good ones are subjective and off-topic for this question; if you want to have that debate, post a separate question.

And please, I urge you to use some empathy here. Regardless of whether you think researchers are right or wrong not to share their code in these circumstances, please understand that many researchers do have reasons that feel valid and appropriate to them. Try to understand their mindset before reflexively criticizing them. I'm not saying that their reasons are necessarily right or good for the field. I'm just saying that, if you want to persuade people to change their practices, it's important to first understand the motivations and structural forces that have shaped their current behavior, before you launch into trying to browbeat them into acting differently.


Appendix: I definitely second Jan Gorzny's recommendation to read the article in SIAM News that he cites. It is informative.


Stephen, I have had much the same experience as you, and my explanation is that the benefit-to-cost ratio is too low.

Packaging a piece of software so that another person can use it is difficult - often even more difficult than writing it in the first place. It requires, among other things:

  • writing documentation and installation instructions,
  • making sure the code runs on a variety of computers and operating systems (I code on Ubuntu, but you may code on Windows, so I have to set up a Windows virtual machine to make sure it works there too; a minimal sketch of the kind of environment check this leads to appears after the list),
  • answering maintenance questions of the form "why do I get this compilation error when I build your program on the new version of Ubuntu?" (go figure -- maybe the new version of Ubuntu dropped some library the code requires? who knows),
  • taking care of 3rd-party dependencies (my code may work fine, but it depends on some 3rd-party jar file that its author decided to remove from the web).
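
To make the second and fourth points concrete, here is a minimal, hypothetical sketch of the kind of environment-checking scaffolding one ends up writing just so that a stranger's machine fails with a readable message instead of a cryptic one. It is in Python purely for illustration (the original post's jar-file example suggests the real code might be Java), and the version number and tool names are placeholders, not anything from the post.

    import platform
    import shutil
    import sys

    # Placeholders: whatever the research code actually needs.
    MIN_PYTHON = (3, 9)
    REQUIRED_TOOLS = ["gcc", "make"]

    def check_environment() -> None:
        """Fail early with a readable message instead of a cryptic crash later."""
        if sys.version_info < MIN_PYTHON:
            sys.exit("Python %d.%d+ required, found %s"
                     % (MIN_PYTHON[0], MIN_PYTHON[1], platform.python_version()))
        missing = [t for t in REQUIRED_TOOLS if shutil.which(t) is None]
        if missing:
            sys.exit("Missing required tools: " + ", ".join(missing))
        print("Environment looks OK on", platform.system(), platform.release())

    if __name__ == "__main__":
        check_environment()

Even this trivial script is extra work that contributes nothing to the paper itself, which is exactly the cost being described here.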

Additionally, I am expected to be available to answer questions and fix bugs several years after I graduate, when I already work full-time somewhere else and have small kids.

And all this without any special payment or academic credit for the effort.

One possible solution I recently thought of is to create a new journal, Journal of Reproducible Computer Science, that will accept only publications whose experiments can be repeated easily. Here are some of my thoughts about such a journal:

Submitted papers must have a detailed reproduction section, with (at least) the following sub-sections:

  • pre-requisites - what systems, 3rd-party software, etc., are required to repeat the experiment;
  • instructions - detailed instructions on how to repeat the experiment;
  • licenses - either an open-source or a closed-source license, but it must allow free usage for research purposes.
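
As an illustration of what the "instructions" sub-section could boil down to in practice, here is a hypothetical, minimal reproduce.py entry point. The helper scripts, the --out flag, and the dataset URL are all invented for this sketch and are not part of any real submission format.

    import subprocess
    import sys
    from pathlib import Path

    DATA_URL = "https://example.org/dataset.tar.gz"  # placeholder URL
    RESULTS_DIR = Path("results")

    def run(cmd):
        """Run one step of the pipeline, aborting with a clear message on failure."""
        print("+", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit("Step failed: " + " ".join(cmd))

    def main():
        RESULTS_DIR.mkdir(exist_ok=True)
        # The three scripts below are hypothetical stand-ins for the paper's code.
        run([sys.executable, "download_data.py", DATA_URL])
        run([sys.executable, "run_experiment.py", "--out", str(RESULTS_DIR)])
        run([sys.executable, "make_figures.py", "--out", str(RESULTS_DIR)])
        print("Done; compare the figures in", RESULTS_DIR, "with those in the paper.")

    if __name__ == "__main__":
        main()

Ideally, a reviewer on a different operating system should be able to run this single entry point and nothing else.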

The review process requires each of three reviewers, from different backgrounds, to work through this section on different computers and operating systems.

After the review process, if the paper is accepted for publication, there will be another pre-publication step lasting a year. During this step, the paper will be available to all readers, who will have the option to repeat the experiment and to contact the author in case of any problems. Only after this year will the paper be finally published.

This journal will enable researchers to get credit for the difficult and important work of making their code usable by others.

EDIT: I now see that someone has already thought of this! https://www.scienceexchange.com/reproducibility

"Science Exchange, PLOS ONE, figshare, and Mendeley have launched the Reproducibility Initiative to address this problem. It’s time to start rewarding the people who take the extra time to do the most careful and reproducible work. Current academic incentives place an emphasis on novelty, which comes at the expense of rigor. Studies submitted to the Initiative join a pool of research, which will be selectively replicated as funding becomes available. The Initiative operates on an opt-in basis because we believe that the scientific consensus on the most robust, as opposed to simply the most cited, work is a valuable signal to help identify high quality reproducible findings that can be reliably built upon to advance scientific understanding."


This article in SIAM News sheds some light on the first question, so it is worth a look. Writing for a mathematical audience, it argues why researchers ought to publish their source code, and it does so through a clever analogy that compares the sharing of mathematical proofs to the sharing of source code. It also gives quite an extensive list of reasons researchers might prefer not to share their source code (as well as responses arguing that those reasons are not good ones).

Here's a citation:

Top Ten Reasons To Not Share Your Code (and why you should anyway). Randall J. LeVeque. SIAM News, April 1, 2013.