How Do You "Permanently" Delete An Experiment In Mlflow?
I am adding SQL commands if you want to delete permanently Trash of MLFlow if you are using PostgreSQL as backend storage.
Change to your MLFlow Database, e.g. by using: \c mlflow
and then:
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid=ANY(
SELECT run_uuid FROM runs where experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
The difference is, that I added the 'params' Table SQL Delete command there.
As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS]
.
From the documentation, mlflow gc
will
Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.
Extending @Lee Netherton's answer, you can use PyMySQL to execute those queries and remove all metadata from MLFlow tracking server after deleting the experiment from the MLFlow tracking client.
import pymysql
def perm_delete_exp():
connection = pymysql.connect(
host='localhost',
user='user',
password='password',
db='mlflow',
cursorclass=pymysql.cursors.DictCursor)
with connection.cursor() as cursor:
queries = """
USE mlflow;
DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
DELETE FROM experiments where lifecycle_stage="deleted";
"""
for query in queries.splitlines()[1:-1]:
cursor.execute(query.strip())
connection.commit()
connection.close()
You can (perhaps should) execute the entire query at once, but I found debugging it easier this way.
Unfortunately it seems there is no way to do this via the UI or CLI at the moment :-/
The way to do it depends on the type of backend file store that you are using.
Filestore:
If you are using the filesystem as a storage mechanism (the default) then it is easy. The 'deleted' experiments are moved to a .trash
folder. You just need to clear that out:
rm -rf mlruns/.trash/*
As of the current version of the documentation (1.7.2), they remark:
It is recommended to use a cron job or an alternate workflow mechanism to clear
.trash
folder.
SQL Database:
This is more tricky, as there are dependencies that need to be deleted. I am using MySQL, and these commands work for me:
USE mlflow_db; # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";