Why must I put the read command into a subshell when using a pipeline?

The main problem here is grouping the commands correctly. Subshells are a secondary issue.

x|y will redirect the output of x to the input of y

Yes, but x | y; z isn't going to redirect the output of x to both y and z.

In df . | read a; read a b; echo "$a", the pipeline only connects df . and read a; the other commands have no connection to that pipeline. You have to group the reads together, as df . | { read a; read a b; } or df . | (read a; read a b), for the pipeline to be connected to both of them.

However, now comes the subshell issue: in bash, the commands in a pipeline run in subshells, so setting a variable in them doesn't affect the parent shell. The echo command therefore has to be in the same subshell as the reads: df . | { read a; read a b; echo "$a"; }.
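To see the effect without depending on a real df, here is a minimal sketch. fake_df is a hypothetical stand-in with made-up output, not real df data, and the result assumes bash's default behavior (shells such as zsh or ksh93 run the last pipeline element in the current shell and would behave differently).

```shell
# fake_df is a hypothetical stand-in for `df .`: a header plus one made-up data line.
fake_df() {
    printf 'Filesystem 1K-blocks Used\n/dev/sda1 1000 400\n'
}

a=parent_value

# The group runs in a subshell, so the value read into a is lost afterwards.
fake_df | { read -r a; read -r a b; }
after_pipeline=$a

# Keeping echo inside the group uses the variable while it still exists.
inside_group=$(fake_df | { read -r a; read -r a b; echo "$a"; })

echo "after pipeline: $after_pipeline"
echo "inside group:   $inside_group"
```

In bash the first line should still show parent_value, while the second shows the device name read from the second line.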

Now whether you use ( ... ) or { ...; } makes no particular difference here since the commands in a pipeline are run in subshells anyway.


An alternative is to use a process substitution:

{ read header; read filesystem rest; } < <(df .)
echo "$filesystem"

The <(...) process substitution executes the contained script (in a subshell), but it acts like a filename, so you need the first < to redirect the contents (which is the output of the script) into the braced script. The grouped commands are executed in the current shell.
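As a rough check that the variables really do survive, here is a sketch that again uses a hypothetical fake_df function with made-up output in place of df. Note that process substitution is a bash feature (also available in ksh and zsh), not part of POSIX sh.

```shell
# fake_df is a hypothetical stand-in for `df .` with made-up output.
fake_df() {
    printf 'Filesystem 1K-blocks Used\n/dev/sda1 1000 400\n'
}

filesystem=parent_value

# The braces run in the current shell; only fake_df runs in a subshell.
{ read -r header; read -r filesystem rest; } < <(fake_df)

# The assignment made by read is still visible here.
echo "$filesystem"
```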

It can be tricky to get this readable, but you can put arbitrary whitespace into the braces and the process substitution.

{
    read header
    read filesystem rest
} < <(
    df .
)
echo "$filesystem"

And it might be easier to use an external tool to extract the filesystem:

filesystem=$( df . | awk 'NR == 2 {print $1}' )
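To illustrate what the awk program selects, here is a small sketch run against a hypothetical fake_df function with made-up output rather than a real df. Because the command substitution assigns in the current shell, no grouping is needed.

```shell
# fake_df is a hypothetical stand-in for `df .` with made-up output.
fake_df() {
    printf 'Filesystem 1K-blocks Used\n/dev/sda1 1000 400\n'
}

# NR == 2 selects the second line (the first one after the header);
# $1 is the first whitespace-separated field on that line.
filesystem=$(fake_df | awk 'NR == 2 {print $1}')

echo "$filesystem"
```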

Your first command

df . | read a; read a b; echo "$a"

effectively gets interpreted as

( df . | read a ) ; read a b; echo "$a"

So the pipeline only feeds into the read a command.
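A small sketch can make this visible. It assumes a hypothetical fake_df stand-in with made-up output, and it feeds a here-document to the surrounding group as its standard input, so you can see which source the second read actually consumes.

```shell
# fake_df is a hypothetical stand-in for `df .` with made-up output.
fake_df() {
    printf 'Filesystem 1K-blocks Used\n/dev/sda1 1000 400\n'
}

# Ungrouped: only the first read is attached to the pipe. The second read
# takes its input from the enclosing standard input (the here-document).
ungrouped=$(
    { fake_df | read -r a; read -r a b; echo "$a"; } <<'EOF'
from-stdin rest
EOF
)

echo "$ungrouped"
```

The value printed comes from the here-document, not from fake_df, which is why the original command appears to wait for keyboard input in an interactive shell.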

Since you want multiple reads from the pipeline, you need to group the commands together.

Now it doesn't have to be a subshell; it could be a brace group:

bash-4.2$ df | { read a ; read a b ; echo $a ; }
devtmpfs

More commonly you might want a loop:

bash-4.2$ df | while read a
> do
> read a b
> echo $a
> done
devtmpfs
tmpfs
/dev/vda3
/dev/vdb

There's a secondary issue with bash: the right side of a pipeline is run in a subshell, so the values of $a and $b aren't accessible outside of the while loop, but that's a different problem!
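If you do need the values after the loop, one common approach in bash is to feed the loop from a process substitution instead of a pipe, so that the loop runs in the current shell. A minimal sketch, where fake_df is a hypothetical stand-in with made-up output and the variable names are only illustrative:

```shell
# fake_df is a hypothetical stand-in for `df` with made-up output.
fake_df() {
    printf 'Filesystem 1K-blocks Used\n'
    printf 'devtmpfs 1000 0\n'
    printf '/dev/vda3 5000 2500\n'
}

count=0
last=
{
    read -r header                 # consume the header line
    while read -r fs rest; do
        count=$((count + 1))       # counts data lines only
        last=$fs
    done
} < <(fake_df)

# Both variables were set in the current shell, so they are usable here.
echo "$count filesystems, last: $last"
```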