How to list all databases and tables in AWS Glue Catalog?
The boto3 api also supports pagination, so you could use the following instead:
import boto3
glue = boto3.client('glue')
paginator = glue.get_paginator('get_tables')
page_iterator = paginator.paginate(
DatabaseName='database_name'
)
for page in page_iterator:
print(page['TableList'])
That way you don't have to mess with while loops or the next token.
I spend several hours trying to find some info about CatalogConnection class but haven't found anything. (Even in the aws-glue-lib repository https://github.com/awslabs/aws-glue-libs)
In my case I needed table names in Glue Job Script console
Finally I used boto library and retrieved database and table names with Glue client:
import boto3
client = boto3.client('glue',region_name='us-east-1')
responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
databaseName = databaseDict['Name']
print '\ndatabaseName: ' + databaseName
responseGetTables = client.get_tables( DatabaseName = databaseName )
tableList = responseGetTables['TableList']
for tableDict in tableList:
tableName = tableDict['Name']
print '\n-- tableName: '+tableName
Important thing is to setup the region properly
Reference: get_databases - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_databases
get_tables - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_tables
Glue returns back one page per response. If you have more than 100 tables, make sure you use NextToken
to retrieve all tables.
def get_glue_tables(database=None):
next_token = ""
while True:
response = glue_client.get_tables(
DatabaseName=database,
NextToken=next_token
)
for table in response.get('TableList'):
print(table.get('Name'))
next_token = response.get('NextToken')
if next_token is None:
break