Logs for Beam application in Google Cloud Dataflow
You are correct: the main program does not run on Google Cloud - it only constructs the pipeline and submits it to the Dataflow service.
You can easily confirm this by stepping through your main program in a debugger: it is a regular Java program, and one of the things that happens during its execution is the pipeline.run() call, which under the hood packages the steps of the pipeline built so far into an HTTP request to the Dataflow service saying, in effect, "here's a specification of a pipeline, please run it". If this call never happened, or if the network was down, Dataflow would never even learn that your program exists.
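A minimal sketch of such a main program (the bucket paths are placeholders, and the runner is assumed to be chosen via command-line options such as --runner=DataflowRunner):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class Main {
  public static void main(String[] args) {
    // Parse options such as --runner=DataflowRunner, --project=..., etc.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

    // This all runs locally: it only builds an in-memory graph of pipeline steps.
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read().from("gs://your-bucket/input.txt"))  // placeholder path
     .apply(TextIO.write().to("gs://your-bucket/output"));     // placeholder path

    // Only here does anything leave your machine: run() translates the graph
    // into a job specification and sends it to the chosen runner's service.
    p.run();
  }
}
```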
Dataflow is just that: a service that responds to HTTP requests. It is not a different way to run Java programs, so it has no way of knowing about anything in your program that your program isn't explicitly sending to it; for example, it has no way of knowing about the log statements in your main program.
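To illustrate the difference, here is a minimal sketch using the SLF4J logging that the Beam SDK integrates with: a log statement inside a DoFn is part of the pipeline code that gets shipped to and executed on Dataflow workers, so it shows up in the job's worker logs in Cloud Logging, whereas a log statement in main() appears only wherever you ran main().

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingFn extends DoFn<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingFn.class);

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Executes on Dataflow workers, so it appears in the job's worker logs.
    LOG.info("Processing element: {}", c.element());
    c.output(c.element());
  }
}
```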
Moreover, if you use templates, the execution of your main program is completely decoupled from the execution of the pipeline: the main program submits the pipeline template and finishes, and you can request to run the template with different parameters later, possibly multiple times or not at all.
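With classic templates, for example, this decoupling is visible in the code: when a template location is set, run() stages the job specification to Cloud Storage instead of starting a job. A minimal sketch (the GCS path is a placeholder, and options like project and staging location are assumed to come from the command line):

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StageTemplate {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    // With a template location set, run() uploads the pipeline specification
    // to GCS and returns: no Dataflow job is started by this program.
    options.setTemplateLocation("gs://your-bucket/templates/my-template"); // placeholder

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline here ...
    p.run();
  }
}
```

Anyone with access can then launch jobs from the staged template later, e.g. from the Cloud Console or with gcloud dataflow jobs run, each time supplying different runtime parameters.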