Extract filename without extension from the absolute location
Edit:
Even though my first solution answered the question as asked, I saw @DavidBoho 's answer and he made several good points. He suggested that if the filename is my_file.tar.gz
then the return value should be my_file.tar
and also that my solution would fail in the event that the file had no extension at all. All of the code here is available on this fiddle.
Given the table and data as follows:
CREATE TABLE with_filename
(
file_id INTEGER,
file_name VARCHAR (256)
);
populate:
INSERT INTO with_filename
VALUES
(1, '/users/mcm1/ualaoip2/vmm/file1.pdf'),
(2, '/users/mcm1/ualaoip2/vmm/file2.py'),
(3, '/users/mcm1/ualaoip2/vmm/file3.pdf'),
(4, '/users/mcm1/ualaoip2/vmm/file4.c'),
(5, '/users/mcm1/ualaoip2/vmm/file5.java'),
(6, '/users/mcm1/ualaoip2/vmm/file6.class'),
(7, '/users/mcm1/ualaoip2/vmm/file7'),
(8, '/users/mcm1/ualaoip2/vmm/file8.tar.gz'),
(9, '/users/mcm1/my_prog.cpp');
My original solution:
SELECT LEFT(
RIGHT(file_name, POSITION('/' IN REVERSE(file_name)) - 1),
POSITION('.' IN
RIGHT(file_name, POSITION('/' IN REVERSE(file_name)) - 1)) - 1
) AS my_file
FROM with_filename;
gives the result:
my_file
file1
file2
file3
file4
file5
file6
file -- << should be file7
file8 -- << should be file8.tar
my_prog
After reading @DavidBoho's post, he used the SPLIT_PART
function to resolve the problems with files 7 & 8 - see the fiddle. I decided to look again at my own SQL and I came up with this (perhaps more traditional?):
SELECT
REPLACE(SUBSTRING(file_name, (LENGTH(file_name) + 2) - POSITION('/' IN REVERSE(file_name))),
RIGHT(file_name, POSITION('.' IN LEFT(REVERSE(file_name), POSITION('/' IN REVERSE(file_name)) - 1))),
'') AS the_files
FROM with_filename
Result:
the_files
file1
file2
file3
file4
file5
file6
file7
file8.tar
my_prog
which is also the correct answer!
While I was grasping for a solution, I became interested in regular expressions as a means of solving this problem. Even though I (we) was/were able to solve this using "traditional" SQL, it became clear to me that regexes are extremely powerful and even though SQL is now Turing complete, it could rapidly become extremely convoluted for relatively simple string manipulation problems, so I decided to investigate.
I found two regex solutions - in fairness, I can't claim to have done this myself, the solutions are as a result of a question I asked on StackOverflow. So, the regex solutions are as follows:
The best one is this one -
SELECT
file_name,
REGEXP_REPLACE(file_name, '^.*/([^/]*?)(\.[^/.]+)?$', '\1') AS filename
FROM with_filename;
There is a second one, but IMHO (and that of the original author) it's a bit of a hack - it involves two nested REGEXP_REPLACE
s
SELECT
file_name,
REGEXP_REPLACE(REGEXP_REPLACE(file_name, '^.*/(.*)$', '\1'), '\.[^.]+$', '') AS filename
FROM with_filename
Finally, there may be a solution possible using the UNNEST and the STRING_TO_ARRAY functions together - I came up with this code:
SELECT fn,
LEFT(fn, POSITION('.' IN fn) - 1) AS lef
FROM with_filename w,
UNNEST(STRING_TO_ARRAY(w.file_name, '/')) AS fn
GROUP BY fn
HAVING COUNT(fn) = 1
ORDER BY lef;
which gives the result:
fn lef
file7 file -- << should be file7
file1.pdf file1
file2.py file2
file3.pdf file3
file4.c file4
file5.java file5
file6.class file6
file8.tar.gz file8 -- << should be file8.tar
my_prog.cpp my_prog
I tried lots of different permutations with this, but couldn't get it to work. Would be grateful for any input! :-)
Another interesting function is REGEXP_SPLIT_TO_TABLE.
SELECT
fn,
COUNT(fn)
FROM
(
SELECT REGEXP_SPLIT_TO_TABLE(w.file_name, '/') AS fn
FROM with_filename w
) AS sq
GROUP BY fn
HAVING COUNT(fn) = 1
ORDER BY fn
Result:
fn count
file1.pdf 1
file2.py 1
file3.pdf 1
file4.c 1
file5.java 1
file6.class 1
file7 1
file8.tar.gz 1
my_prog.cpp 1
Again, this might be worth pursuing - didn't have time.
Example DML/DDL
CREATE TABLE with_filename
(
file_id INTEGER,
file_name VARCHAR (256)
);
INSERT INTO with_filename
VALUES
(1, '/users/mcm1/ualaoip2/vmm/file1.pdf'),
(2, '/users/mcm1/ualaoip2/vmm/file2.py'),
(3, '/users/mcm1/ualaoip2/vmm/file3.pdf'),
(4, '/users/mcm1/ualaoip2/vmm/file4.c'),
(5, '/users/mcm1/ualaoip2/vmm/file5.java'),
(6, '/users/mcm1/ualaoip2/vmm/file6.class'),
(7, '/users/mcm1/ualaoip2/vmm/file7'),
(8, '/users/mcm1/ualaoip2/vmm/file8.tar.gz'),
(9, '/users/mcm1/my_prog.cpp');