How do I cut a string based on digit first certain digit and the rest

Create new columns by indexing with str, replace for change empty strings and for new column use Series.str.contains with casting to integers:

df['actual_pattern'] = df['actual_pattern'].astype(str)

df['cut_pattern1'] = df['actual_pattern'].str[:4]
df['cut_pattern2'] = df['actual_pattern'].str[4:].replace('','0')
df['binary_cut2'] = df['cut_pattern2'].str.contains('1').astype(int)
print (df)
   Id actual_pattern cut_pattern1 cut_pattern2  binary_cut2
0   1         100101         1001           01            1
1   2          10101         1010            1            1
2   3        1010101         1010          101            1
3   4            101          101            0            0

EDIT:

Solution for @Rick Hitchcock from comments:

df['actual_pattern'] = df['actual_pattern'].astype(str)

df['cut_pattern1'] = df['actual_pattern'].str[:4]
df['cut_pattern2'] = df['actual_pattern'].str[4:].replace('','0')
df['binary_cut2'] = df['cut_pattern2'].str.contains('1').astype(int)
print (df)
  Id actual_pattern cut_pattern1 cut_pattern2  binary_cut2
0  1         100101         1001           01            1
1  2          10101         1010            1            1
2  3        1010101         1010          101            1
3  4       00001111         0000         1111            1

Here's how I'd approach this:

s = df.actual_pattern.astype(str).str
# Split into 2 lists, the first containing the first 4 digits
out = s.split(r'(\d{4})').str[-2:].values.tolist()
# [['1001', '01'], ['1010', '1'], ['1010', '101'], ['101']]
# build a dataframe from the lists
out = pd.DataFrame(out, columns=['cut_pattern1', 'cut_pattern2'])
# fill missing values (absense of string in list) with 0
out['cut_pattern2'] = out.cut_pattern2.fillna('0')
out['binary_cut2'] = out.cut_pattern2.str.contains('1').view('i1')
print(out)

     cut_pattern1 cut_pattern2 binary_cut2
0         1001           01           1
1         1010            1           1
2         1010          101           1
3          101            0           0

Using some regex and string extract here:

m=df.actual_pattern.str.extract('(?P<cut_pattern1>.{,4})(?P<cut_pattern2>.*)').replace('',0)

   cut_pattern1 cut_pattern2
0         1001           01
1         1010            1
2         1010          101
3          101            0

Then do:

m.assign(binary_cut2=m.cut_pattern2.str.contains('1',na=False).astype(int))

  cut_pattern1 cut_pattern2  binary_cut2
0         1001           01            1
1         1010            1            1
2         1010          101            1
3          101            0            0

Finally concat this to the original df:

m=df.actual_pattern.str.extract('(?P<cut_pattern1>.{,4})(?P<cut_pattern2>.*)').replace('',0)
m=m.assign(binary_cut2=m.cut_pattern2.str.contains('1',na=False).astype(int))
pd.concat([df,m],axis=1)

  Id actual_pattern cut_pattern1 cut_pattern2  binary_cut2
0  1         100101         1001           01            1
1  2          10101         1010            1            1
2  3        1010101         1010          101            1
3  4            101          101            0            0

How do I cut a string based on digit first certain digit and the rest

Tags:

Python

Pandas

String

Dataframe

Related

Recent Posts