Unmelt only part of a column from pandas dataframe
set_index
, unstack
, then melt
df.set_index(['RecordID', 'DisplayLabel']).Value.unstack().reset_index() \
.melt(['RecordID', 'Source', 'Test'], var_name='Result', value_name='Value') \
.sort_values('RecordID').dropna(subset=['Value'])
RecordID Source Test Result Value
0 1 Web Logic Value 1 S
5 1 Web Logic Value 2 I
10 1 Web Logic Value 3 Complete
1 2 Person Voice Value 1 >20
6 2 Person Voice Value 2 P
2 3 Mail OCR Value 1 A
7 3 Mail OCR Value 2 I
3 4 Dictation Understandable Value 1 S
8 4 Dictation Understandable Value 2 I
4 5 Web Logic Value 1 R
9 5 Web Logic Value 2 S
Custom function for groupby
def f(t):
name, df = t
d = dict(zip(df['DisplayLabel'], df['Value']))
source = d.pop('Source')
test = d.pop('Test')
result, value = zip(*d.items())
return pd.DataFrame(
dict(RecordID=name, Source=source, Test=test, Result=result, Value=value)
)
pd.concat(map(f, df.groupby('RecordID')))
RecordID Source Test Result Value
0 1 Web Logic Value 1 S
1 1 Web Logic Value 2 I
2 1 Web Logic Value 3 Complete
0 2 Person Voice Value 1 >20
1 2 Person Voice Value 2 P
0 3 Mail OCR Value 1 A
1 3 Mail OCR Value 2 I
0 4 Dictation Understandable Value 1 S
1 4 Dictation Understandable Value 2 I
0 5 Web Logic Value 1 R
1 5 Web Logic Value 2 S
Setup
df = pd.DataFrame(data={
'RecordID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
'DisplayLabel': [
'Source', 'Test', 'Value 1', 'Value 2', 'Value 3',
'Source', 'Test', 'Value 1', 'Value 2',
'Source', 'Test', 'Value 1', 'Value 2',
'Source', 'Test', 'Value 1', 'Value 2',
'Source', 'Test', 'Value 1', 'Value 2'
],
'Value': [
'Web', 'Logic', 'S', 'I', 'Complete',
'Person', 'Voice', '>20', 'P',
'Mail', 'OCR', 'A', 'I',
'Dictation', 'Understandable', 'S', 'I',
'Web', 'Logic', 'R', 'S'
]
})
We can achieve your result by applying logic and pivotting, we split your data by checking if DisplayLabel
contains Value
and then we join
them back together:
mask = df['DisplayLabel'].str.contains('Value')
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')
dfpiv = (
df[mask].rename(columns={'DisplayLabel':'Result'})
.set_index('RecordID')
.join(df2)
.reset_index()
)
RecordID Result Value Source Test
0 1 Value 1 S Web Logic
1 1 Value 2 I Web Logic
2 1 Value3 Complete Web Logic
3 2 Value 1 >20 Person Voice
4 2 Value 2 P Person Voice
5 3 Value 1 A Mail OCR
6 3 Value 2 I Mail OCR
7 4 Value 1 S Dictation Understandable
8 4 Value 2 I Dictation Understandable
9 5 Value 1 R Web Logic
10 5 Value 2 S Web Logic
If you want the exact column order as your example, use DataFrame.reindex
:
dfpiv.reindex(columns=['RecordID', 'Source', 'Test', 'Result', 'Value'])
RecordID Source Test Result Value
0 1 Web Logic Value 1 S
1 1 Web Logic Value 2 I
2 1 Web Logic Value3 Complete
3 2 Person Voice Value 1 >20
4 2 Person Voice Value 2 P
5 3 Mail OCR Value 1 A
6 3 Mail OCR Value 2 I
7 4 Dictation Understandable Value 1 S
8 4 Dictation Understandable Value 2 I
9 5 Web Logic Value 1 R
10 5 Web Logic Value 2 S
In detail - step by step:
# mask all rows where "Value" is in column DisplayLabel
mask = df['DisplayLabel'].str.contains('Value')
0 False
1 False
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 False
10 False
11 True
12 True
13 False
14 False
15 True
16 True
17 False
18 False
19 True
20 True
Name: DisplayLabel, dtype: bool
# select all rows which do NOT have "Value" in DisplayLabel
df[~mask]
RecordID DisplayLabel Value
0 1 Source Web
1 1 Test Logic
5 2 Source Person
6 2 Test Voice
9 3 Source Mail
10 3 Test OCR
13 4 Source Dictation
14 4 Test Understandable
17 5 Source Web
18 5 Test Logic
# pivot the values in DisplayLabel to columns
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')
DisplayLabel Source Test
RecordID
1 Web Logic
2 Person Voice
3 Mail OCR
4 Dictation Understandable
5 Web Logic
df[mask].rename(columns={'DisplayLabel':'Result'}) # rename the column DisplayLabel to Result
.set_index('RecordID') # set RecordId as index so we can join df2
.join(df2) # join df2 back to our dataframe based RecordId
.reset_index() # reset index so we get RecordId back as column
RecordID Result Value Source Test
0 1 Value 1 S Web Logic
1 1 Value 2 I Web Logic
2 1 Value3 Complete Web Logic
3 2 Value 1 >20 Person Voice
4 2 Value 2 P Person Voice
5 3 Value 1 A Mail OCR
6 3 Value 2 I Mail OCR
7 4 Value 1 S Dictation Understandable
8 4 Value 2 I Dictation Understandable
9 5 Value 1 R Web Logic
10 5 Value 2 S Web Logic