Can we consider AWS Glue as a replacement for EMR?
AWS Glue does not let us configure many things, such as executor memory or driver memory. It is a fully managed service with 5 GB as the default driver memory and 5 GB as the default executor memory. AWS EMR, on the other hand, is not a fully managed service: we have to configure the cluster ourselves, which makes it better suited to experienced engineers.
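For contrast, on EMR you set these values yourself. The following is a minimal sketch that assumes you launch the cluster with a list of configuration classifications, for example through the Configurations parameter of boto3's EMR run_job_flow call. The spark-defaults classification carries the driver and executor memory. The helper name emr_spark_defaults and the sizes in the usage line are hypothetical.

```python
import json
from typing import Any, Dict, List


def emr_spark_defaults(driver_memory: str, executor_memory: str) -> List[Dict[str, Any]]:
    """Build an EMR 'spark-defaults' classification that sets Spark memory.

    Hypothetical helper: the memory strings are passed through unchanged,
    so size them for your own instance types.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.driver.memory": driver_memory,
                "spark.executor.memory": executor_memory,
            },
        }
    ]


if __name__ == "__main__":
    # Example sizes only; not a recommendation.
    print(json.dumps(emr_spark_defaults("10g", "8g"), indent=2))
```

The same list shape can be extended with other classifications, which is the kind of per-cluster tuning Glue hides from you.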
As per my understanding, Glue cannot be a replacement for EMR; it really depends on your use case. There are some limitations with Glue ETL:
- It does not support the --packages option (spark-submit's way of pulling in Maven dependencies).
- There is no internal storage for temporary data.
With the Glue Data Catalog you can query the data in Athena, but Athena also has a few limitations, for example it cannot run CREATE TABLE AS SELECT or CREATE VIEW. You can use the Glue Data Catalog from EMR to work around these limitations of Athena.
So, for now, Glue can serve as a replacement for a persistent metadata store (such as a Hive metastore), not for EMR itself.
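To use the Glue Data Catalog from EMR, you point Hive and Spark SQL at it through configuration classifications. This is a minimal sketch assuming the hive-site and spark-hive-site classifications and the AWSGlueDataCatalogHiveClientFactory factory class described in the EMR documentation; the helper name glue_catalog_configurations is hypothetical.

```python
from typing import Any, Dict, List

# Client factory class that makes Hive/Spark on EMR use the Glue Data Catalog
# as their metastore.
GLUE_CATALOG_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations() -> List[Dict[str, Any]]:
    """Return EMR classifications that route Hive and Spark SQL to Glue."""
    props = {"hive.metastore.client.factory.class": GLUE_CATALOG_FACTORY}
    return [
        # Hive (and tools that read hive-site) use the Glue Data Catalog.
        {"Classification": "hive-site", "Properties": dict(props)},
        # Spark SQL's Hive support uses the same catalog.
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]
```

With this in place, tables created from EMR are visible to Athena and Glue jobs, which is what makes the catalog work as a shared persistent metadata store.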
BTW, you can also override many of Spark's built-in configuration properties by passing them as job parameters to the Glue job.
For example (key / value):
--conf / spark.yarn.executor.memoryOverhead=1024
--conf / spark.driver.memory=10g
This makes Glue jobs more flexible.
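One catch: Glue job parameters are a key/value map, so the key --conf can appear only once. A common unofficial workaround is to chain further settings inside the single value with " --conf ". This is a minimal sketch of that workaround; the helper name glue_conf_arguments is hypothetical, and since AWS's list of Glue job parameters marks --conf as internal to Glue, treat the behavior as something to verify on your Glue version. The resulting dict could be passed as DefaultArguments (create_job) or Arguments (start_job_run) in boto3.

```python
from typing import Dict, Mapping


def glue_conf_arguments(spark_conf: Mapping[str, str]) -> Dict[str, str]:
    """Pack Spark settings into one '--conf' Glue job argument.

    Unofficial workaround (assumption): the first setting is the value of
    '--conf', and each further setting is appended as ' --conf key=value'.
    """
    pairs = [f"{key}={value}" for key, value in spark_conf.items()]
    # No settings means no '--conf' argument at all.
    return {"--conf": " --conf ".join(pairs)} if pairs else {}


if __name__ == "__main__":
    args = glue_conf_arguments(
        {
            "spark.yarn.executor.memoryOverhead": "1024",
            "spark.driver.memory": "10g",
        }
    )
    print(args)
```

Running it prints {'--conf': 'spark.yarn.executor.memoryOverhead=1024 --conf spark.driver.memory=10g'}, i.e. both settings from the example above under a single key.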