What security measures should one implement before executing user-uploaded files?
It is impossible in general to analyze a program and determine whether it will do anything malicious; this follows from the undecidability of nontrivial program properties (Rice's theorem). That is true regardless of whether you attempt to analyze the source code or the compiled binary.
The way to do what you are asking is to compile and run the code in a sandbox. Once the program has terminated (or after a timeout you have decided upon), you destroy the sandbox.
Such a construction is only as secure as the sandbox you are using. Depending on the requirements of the code you need to run, the sandbox could be something simple like Linux secure computing mode (seccomp), or something complicated like a full-blown virtual machine, ideally without network connectivity.
The more complicated the sandbox you need, the larger the risk of a security vulnerability in the sandbox itself undermining an otherwise good design.
Some languages can safely be compiled outside a sandbox, but in others even compilation can consume an unpredictable amount of resources. This question on a sister site shows some examples of how a small source file can blow up into an enormous compiler output.
If the compiler itself is free of vulnerabilities, it may be sufficient to set limits on the amount of CPU time, memory, and disk space it is allowed to consume. For better security you can run the compiler inside a virtual machine.
Obviously these methods can be combined for additional layers of security. If I were to construct such a system, I would start a virtual machine and, inside it, use ulimit to limit the resource usage of the compiler. Then I would link the compiled code against a wrapper that runs it in secure computing mode. Finally, still inside the virtual machine, I would run the linked executable.
This is a really hard problem, and one every online code judge has to solve. Basically, you are asking how you can prevent somebody who can execute arbitrary code on your machine from taking it over.
I have been working on an online judge (Kattis) for a decade or so, and here are some of my experiences from building the security solutions for this kind of scenario:
- Very early versions were based on a Solaris jail. It turns out that you can cause quite a lot of havoc inside a jail, and it does not give you the granularity you need.
- We then implemented a system-call filtering solution using ptrace. This adds a large overhead (several context switches) to every system call, and keeping the security profile in sync as compilers and runtimes change is a nightmare. The final nail in the coffin for this solution was threading: if you allow threads, an application can use one thread to rewrite a system call's arguments between the inspection and the execution, and Java, for example, requires threading.
- These days we use a combination of cgroups and namespaces. This gives a surprisingly low overhead, and since these are part of the security primitives in the Linux kernel, they are robust. Have a look at MOE's Isolate for an idea of how this can be done; Isolate most likely solves your problem.
Note that while containers (such as Docker) and virtual machines are popular, they may not be the best choice for a security solution in this kind of scenario. It is hard to get the fine-grained control and resource monitoring you probably want, it is hard to prevent a malicious process from screwing around inside the container, and starting and destroying containers has a lot of overhead.
In the particular case of a puzzle website, consider the alternative: don't bother. Ask participants to upload the output so you don't have to run untrusted code. This saves you computing power, avoids a security risk, and allows people to compete in any language. If there's a prize at stake, you can verify the winning entry later manually.
If the form of your puzzle allows, you can frustrate copy-and-paste solutions by generating random inputs and writing a verifier. This is how Google Code Jam works. See https://code.google.com/codejam/problem-preparation.html#iogen