Replace pattern between two characters

.* is a greedy regexp: it matches the longest possible string. What you need is the shortest match, applied globally across the whole line. Try:

sed 's/-[^:-]*:/:/g' 1.file > 2.file

The bracket expression [^:-] matches any character except a colon or a dash (arguably it only needs to exclude the colon). So the regexp reads "a dash, followed by any number of characters that are neither dashes nor colons, followed by a colon". Each match is replaced with a single colon, because you wanted to keep the colon. The trailing g applies the substitution to every match on the line. Without the g, only the first match would be replaced.
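One way to see the difference is to run three substitutions on the same line: the greedy pattern, the negated-class pattern, and the negated-class pattern without g. The sample line below is made up for illustration and is not the asker's data. The sketch assumes a POSIX sh and sed.

```shell
#!/bin/sh
# Hypothetical sample line, not taken from the original question
line='A-x1:0.1,B-y2:0.2'

# Greedy: -.*: starts at the first dash and runs to the LAST colon on the line
greedy=$(printf '%s\n' "$line" | sed 's/-.*:/:/g')

# Negated class: each match stops at the first colon after its dash
negated=$(printf '%s\n' "$line" | sed 's/-[^:-]*:/:/g')

# Same pattern without g: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/-[^:-]*:/:/')

printf 'greedy:     %s\n' "$greedy"      # A:0.2
printf 'negated+g:  %s\n' "$negated"     # A:0.1,B:0.2
printf 'no g flag:  %s\n' "$first_only"  # A:0.1,B-y2:0.2
```

The greedy version swallows everything between the first dash and the last colon, including the B label. The negated class keeps each match inside a single label.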


Awk solution:

awk -F',' '{ for(i=1;i<=NF;i++) sub(/-[^:-]+/,"",$i) }1' OFS=',' 1.file

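  • -F',' - sets the input field separator to a comma

  • for(i=1;i<=NF;i++) - iterates over all fields of the record

  • sub(/-[^:-]+/,"",$i) - deletes the first dash in the field together with the non-dash, non-colon characters that follow it (everything from the - up to the :, so the : is kept)

  • 1 - an always-true pattern whose default action prints the (modified) record

  • OFS=',' - sets the output field separator, so the fields are joined back with commas when awk rebuilds the record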
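Here is a small sketch that runs the awk command on the same made-up line twice: once as written, and once without OFS=','. It shows why the output separator is set. Changing a field makes awk rebuild the whole record using OFS, which defaults to a single space. The sample line is hypothetical, and the sketch assumes a POSIX sh and awk.

```shell
#!/bin/sh
# Hypothetical sample line, not taken from the original question
line='A-x1:0.1,B-y2:0.2'

# The command as given: split on commas, strip "-..." in each field, rejoin with commas
with_ofs=$(printf '%s\n' "$line" |
  awk -F',' '{ for(i=1;i<=NF;i++) sub(/-[^:-]+/,"",$i) }1' OFS=',')

# Same command without OFS=',': the modified record is rejoined with spaces
without_ofs=$(printf '%s\n' "$line" |
  awk -F',' '{ for(i=1;i<=NF;i++) sub(/-[^:-]+/,"",$i) }1')

printf '%s\n' "$with_ofs"     # A:0.1,B:0.2
printf '%s\n' "$without_ofs"  # A:0.1 B:0.2
```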


The output:

Staphylococcus_sp_HMSC14C01:0.00371647154267842634,Staphylococcus_hominis_VCU122:0.00124439639436691308)69:0.00227646100249620856,(Staphylococcus_sp_HMSC072E01:0.00288325234399461859,(((Staphylococcus_hominis_793_SHAE:0.00594391769091206796,Staphylococcus_pettenkoferi_1286_SHAE:0.00594050248317441135)
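One caveat is worth checking against your real data: the two commands are not strictly equivalent. The sed class [^:-] can run across commas, and it needs a colon to close the match. The awk version works one comma-separated field at a time and does not require a colon after the dash. On a hypothetical line where a dash in one label has no colon before the next comma, they give different results. The sample line below is invented to show this.

```shell
#!/bin/sh
# Hypothetical line: the first label has a dash but no colon before the comma
line='A-x1,B:0.2'

# sed: the match runs from the dash across the comma to the next colon ("-x1,B:")
sed_out=$(printf '%s\n' "$line" | sed 's/-[^:-]*:/:/g')

# awk: only "-x1" inside the first field is removed; the second field is untouched
awk_out=$(printf '%s\n' "$line" |
  awk -F',' '{ for(i=1;i<=NF;i++) sub(/-[^:-]+/,"",$i) }1' OFS=',')

printf 'sed: %s\n' "$sed_out"  # sed: A:0.2
printf 'awk: %s\n' "$awk_out"  # awk: A,B:0.2
```

If labels like this can occur, adding the comma to the excluded set in the sed version (for example [^:,-]) keeps each match inside one label.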