Reference on availability of source code used in computer science research articles?
Here's a relevant study on computer science systems research that addresses your first question, "What percentage of research articles are provided with their source code?". The study is described in a tech report:
"Measuring Reproducibility in Computer Systems Research." Christian Collberg, Todd Proebsting, Gina Moraila, Akash Shankaran, Zuoming Shi, Alex M Warren. March 21, 2014.
The authors of this study observed the following protocol to determine code availability:
We downloaded 613 papers from the latest incarnations of eight ACM conferences (ASPLOS’12, CCS’12, OOPSLA’12, OSDI’12, PLDI’12, SIGMOD’12, SOSP’11, VLDB’12) and five journals (TACO’9, TISSEC’15, TOCS’30, TODS’37, TOPLAS’34), all with a practical orientation. For each paper we determined whether the published results appeared to be backed by source code or whether they were purely theoretical. Next, we examined each non-theoretical paper to see whether it contained a link to downloadable code. If not, we examined the authors’ websites, did a web search, examined popular code repositories such as github and sourceforge, to see if the relevant code could be found. In a final attempt, we emailed the authors of each paper for which code could not be found, asking them to direct us to the location of the source. In cases when code was eventually recovered, we also attempted to build and execute it. At this point we stopped: we did not go as far as to attempt to verify the correctness of the published results.
Here is a summary of their findings:
- Total papers examined: 613
- Papers that appeared to be backed by source code (not purely theoretical): 515
Of these 515 papers, 105 were excluded from consideration so that the resulting set of papers had no overlapping author lists. That leaves 410 papers, with results as follows:
- Papers with link to source in the paper: 85
- Papers not in above category, where source was found via web search: 65
- Papers where author shared source following email request: 81
- Papers where author declined to share source following email request: 149
- Papers where author did not respond to email requests for source: 30
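
To turn these counts into the percentages the question asks about, here is a minimal Python sketch. The counts are copied from the list above. The two derived groupings (`public` for code findable without contacting the authors, `obtained` for code recovered by any means) are my own labels for illustration, not categories defined in the tech report.

```python
# Counts copied from the summary above (410 papers with non-overlapping author lists).
counts = {
    "link in paper": 85,
    "found via web search": 65,
    "shared after email request": 81,
    "declined after email request": 149,
    "no response to email": 30,
}

total = sum(counts.values())


def pct(n, denom=total):
    """Return n as a percentage of denom, rounded to one decimal place."""
    return round(100 * n / denom, 1)


# Per-category share of the 410 papers.
for label, n in counts.items():
    print(f"{label:30s} {n:4d}  {pct(n):5.1f}%")

# Illustrative groupings (assumed here, not taken from the report):
# code that could be found without emailing the authors ...
public = counts["link in paper"] + counts["found via web search"]
# ... and code that was recovered by any of the three routes.
obtained = public + counts["shared after email request"]

print(f"{'findable without email':30s} {public:4d}  {pct(public):5.1f}%")
print(f"{'obtained by any means':30s} {obtained:4d}  {pct(obtained):5.1f}%")
```

By this arithmetic, roughly a fifth of the 410 papers linked to their source directly, a bit over a third had code that could be found without emailing the authors, and somewhat more than half yielded code once email requests are included.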
More details of methodology and results, as well as all the data and other materials used in this study, may be found at this web site.
There is an "Anecdotes" section appended to this tech report, which I think you may find very interesting, as it relates to some of the other points in your question. It documents the authors' struggles to get other researchers to share their source code :)