How can I split columns with regex to move trailing CAPS into a separate column?

You may extract the data into two columns by using a regex like ^(.*?)([A-Z]+)$ or ^(.*[^A-Z])([A-Z]+)$:

df[['Name','Team']] = df['Name'].str.extract('^(.*?)([A-Z]+)$', expand=True)

This will keep all up to the last char that is not an uppercase letter in Group "Name" and the last uppercase letters in Group "Team".

See regex demo #1 and regex demo #2

Details

  • ^ - start of a string
  • (.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible
    or
  • (.*[^A-Z]) - any zero or more chars other than line break chars, as many as possible, up to the last char that is not an ASCII uppercase letter (granted the subsequent patterns match) (note that this pattern implies there is at least 1 char before the last uppercase letters)
  • ([A-Z]+) - Capturing group 2: one or more ASCII uppercase letters
  • $ - end of string.

I have made a few alterations in the functions, You might need to add re package.

Its a bit manual, But I hope this will suffice. Have a great day!

df_obj_skel = dict()
df_obj_skel['Name'] = list()
df_obj_skel['Team'] = list()
for index,row in df.iterrows():
    Name = row['Name']
    Findings = re.search('[A-Z]{2,4}$', Name)
    Refined_Team = Findings[0]
    Refined_Name = re.sub(Refined_Team + "$", "", Name)
    df_obj_skel['Team'].append(Refined_Team)
    df_obj_skel['Name'].append(Refined_Name)
df_final = pd.DataFrame(df_obj_skel)
print(df_final)