Replace pattern between two characters
.* is greedy: it matches the longest possible string. You need the shortest match instead, applied globally across the whole line. Try:
sed 's/-[^:-]*:/:/g' 1.file > 2.file
The character class [^:-] matches any character except a colon or a dash (excluding only the colon may be enough), so the regexp says "a dash, followed by any number of non-dash, non-colon characters, followed by a colon". The match is replaced with a colon (since you wanted to keep that), and the trailing g applies the replacement to every match on the line. If you omit the g, only the first match on each line is replaced.
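To make the difference concrete, here is a small sketch that runs the greedy pattern, the negated character class without g, and the negated class with g on one line. The sample line is hypothetical (it is not taken from 1.file), and the variable names are only for illustration.

```shell
# Hypothetical sample line; the names and numbers are made up for illustration.
line='A_sp-x1:0.1,B_hom-y2:0.2'

# Greedy: .* runs from the first dash to the LAST colon on the line.
greedy=$(printf '%s\n' "$line" | sed 's/-.*:/:/')

# Negated class without g: only the first dash-to-colon run is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/-[^:-]*:/:/')

# Negated class with g: every dash-to-colon run is replaced.
all=$(printf '%s\n' "$line" | sed 's/-[^:-]*:/:/g')

printf '%s\n' "$greedy" "$first_only" "$all"
# A_sp:0.2
# A_sp:0.1,B_hom-y2:0.2
# A_sp:0.1,B_hom:0.2
```

The greedy version swallows everything between the first dash and the last colon, which is why the character class is needed.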
Awk solution:
awk -F',' '{ for(i=1;i<=NF;i++) sub(/-[^:-]+/,"",$i) }1' OFS=',' 1.file
-F',' - field separator
for(i=1;i<=NF;i++) - iterating through all fields of the record
sub(/-[^:-]+/,"",$i) - substitute the needed sequence (between - and : including - but keeping :)
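As a quick check of the awk approach, the sketch below feeds it a single hypothetical line (not taken from 1.file). It sets OFS with -v rather than as a trailing assignment; that is an assumption made here so the command reads from a pipe without a file argument, and it should behave the same way.

```shell
# Hypothetical sample line; the names and numbers are made up for illustration.
line='A_sp-x1:0.1,B_hom-y2:0.2'

# Split on commas, strip the first "-..." run (up to a colon or dash) from
# each field, then print the record rejoined with commas (the trailing 1).
result=$(printf '%s\n' "$line" |
  awk -F',' -v OFS=',' '{ for (i = 1; i <= NF; i++) sub(/-[^:-]+/, "", $i) } 1')

printf '%s\n' "$result"
# A_sp:0.1,B_hom:0.2
```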
The output:
Staphylococcus_sp_HMSC14C01:0.00371647154267842634,Staphylococcus_hominis_VCU122:0.00124439639436691308)69:0.00227646100249620856,(Staphylococcus_sp_HMSC072E01:0.00288325234399461859,(((Staphylococcus_hominis_793_SHAE:0.00594391769091206796,Staphylococcus_pettenkoferi_1286_SHAE:0.00594050248317441135)