Public dataset without a license: what is allowed?

This may vary by country, but in the US at least, the default terms (if no alternative is specified) are "all rights reserved". This means you are not allowed to reproduce the work (the dataset), prepare derivative works from it, sell, rent, or lease it, or publicly perform or display it. In particular, you may not redistribute it to others. In practice, the authors may be fine with that, but legally speaking you are not allowed to redistribute the dataset in the form in which it was published without explicit permission (which could be granted by a Creative Commons license, for example).

If you wanted to reproduce the dataset in a different form that conveys the same information, that may or may not be allowed. It would be up to a court to decide whether that counts as a derivative work or just a use of the underlying ideas, and a copyright lawyer could advise you better on whether your desired use is legally acceptable.

You are allowed to read the published dataset and use the ideas contained within it (i.e., the data) to draw conclusions. Copyright law does not let the owner restrict those uses. I think analyzing the data and publishing the results of that analysis (but not the data itself), as you would when writing a paper, is generally considered fine.


I am also speaking from a US perspective; I am not very familiar with the law in other countries.

Generally, data are not covered by copyright because they are not creative expression. You cannot copyright the fact that the temperature in this room at this moment is 32 degrees. It is just a fact; it is not creative. Yes, creating a dataset can take a lot of work, but copyright does not protect hard work, it protects creative expression. In some cases, datasets can be creative. For example, if your research data are photographs, those photographs will probably be covered by copyright. Likewise, if you select or arrange data in an original, creative way, the compilation could be copyrighted. Even then, what copyright would protect is the creative selection or arrangement, not the data itself.

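Scientists and data creators have other ways to protect their datasets besides copyright law, for example contracts or terms of use that govern access to the data. (Patents, however, cover inventions, not collections of facts, so they are generally not an option for datasets.)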

If a dataset is publicly available and contains only facts rather than creative work, then it is in the public domain. You can redistribute it and make derivatives as you like, and you are not legally obliged to give attribution. However, in the academic world you are expected to give attribution regardless of whether the dataset is in the public domain. If you don't, using the dataset may be considered plagiarism, which is research misconduct and may get you in trouble.

This is a good question to ask a librarian at an academic library. Many academic libraries have data management or copyright specialists who will know how to answer these questions.