How to create a Dataflow pipeline from Pub/Sub to GCS in Python
I ran into this same error, and found a workaround, but not a fix:
TypeError: Cannot convert GlobalWindow to apache_beam.utils.windowed_value._IntervalWindowBase [while running 'test-file-out/Write/WriteImpl/WriteBundles']
running locally with DirectRunner and on Dataflow with DataflowRunner.
Reverting to apache-beam[gcp]==2.9.0 allows my pipeline to run as expected.
I have had a lot of trouble trying to figure out this error:
TypeError: Cannot convert GlobalWindow to apache_beam.utils.windowed_value._IntervalWindowBase [while running 'generatedPtransform-1090']
There seems to be an issue with WriteToText after Beam 2.9.0 (I am using Beam 2.14.0, Python 3.7):
| "Output" >> beam.io.WriteToText("<GCS path or local path>"))
What made it work for me was removing that WriteToText step from the pipeline and adding a custom DoFn instead:
class WriteToGCS(beam.DoFn):
    def __init__(self):
        self.outdir = "gs://<project>/<folder>/<file>"

    def process(self, element):
        # import inside process so it is available on the Dataflow workers
        from apache_beam.io.filesystems import FileSystems

        # FileSystems.create returns a binary writer, so encode str elements
        writer = FileSystems.create(self.outdir + '.csv', 'text/plain')
        writer.write(element.encode('utf-8') if isinstance(element, str) else element)
        writer.close()
and in the pipeline add:
| 'Save file' >> beam.ParDo(WriteToGCS())
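Put together, the workaround pipeline looks roughly like this (a sketch assuming the WriteToGCS class above; the topic and output path are placeholders for your own):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromPubSub(topic="projects/<project>/topics/<topic>")
         | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
         | 'Save file' >> beam.ParDo(WriteToGCS()))

Note that as written the DoFn opens the same object path for every element, so each write replaces the previous file; in practice you probably want to build a unique filename inside process (for example from a timestamp or the element's window).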