Remove known audio output from microphone input

Yes, this is possible. Two methods:

Time Domain

If you can guarantee that the mixed audio is sample-accurate to the timing of the original stream1, then you can simply negate the original stream1 and add it to the mix. You might have to scale that waveform first, since the level of each stream is usually reduced when audio is mixed.

If anything else has been done to the audio (such as level compression), the mix no longer contains an exact scaled copy of stream1, and this sort of subtraction will not remove it cleanly.
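As a rough sketch of the time-domain subtraction, assuming NumPy and two mono float arrays that are already sample-aligned: the function name `remove_known_stream` is made up for illustration, and estimating the mix gain by least squares is my own assumption (it only works well if the rest of the mix is uncorrelated with stream1).

```python
import numpy as np

def remove_known_stream(mix, stream1):
    """Subtract a sample-aligned, scaled copy of stream1 from mix.

    Returns (cleaned, gain). The gain is a least-squares estimate of the
    level stream1 was mixed at; this is an assumption, not something the
    mixer tells you.
    """
    mix = np.asarray(mix, dtype=np.float64)
    stream1 = np.asarray(stream1, dtype=np.float64)
    if mix.shape != stream1.shape:
        raise ValueError("mix and stream1 must have the same length")
    energy = np.dot(stream1, stream1)
    # Projection of the mix onto stream1 gives the best-fitting scale factor.
    gain = np.dot(mix, stream1) / energy if energy > 0 else 0.0
    # Negate-and-add is the same as subtracting the scaled waveform.
    return mix - gain * stream1, gain

# Synthetic check: stream1 mixed in at half level with an unrelated tone.
rate = 8000
t = np.arange(rate) / rate
stream1 = np.sin(2 * np.pi * 440 * t)
other = 0.3 * np.sin(2 * np.pi * 1000 * t)
cleaned, gain = remove_known_stream(0.5 * stream1 + other, stream1)
```

If you already know the exact gain the mixer applied, use that instead of the estimate.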

Frequency Domain

While normal PCM-encoded audio is just a sampling of pressure many times per second, this is not how sound is fully perceived. We hear different frequencies. If you use a Fourier transform (normally done with an FFT algorithm), you convert audio samples from the time domain to the frequency domain, giving you the level of sound in each of a number of frequency buckets.
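To make the "frequency buckets" idea concrete, here is a small sketch, assuming NumPy; `frequency_levels` is a hypothetical helper name. It runs a real-input FFT over a block of samples and returns the center frequency and level of each bucket.

```python
import numpy as np

def frequency_levels(samples, sample_rate):
    """Return (frequencies in Hz, level per frequency bucket) for a block."""
    samples = np.asarray(samples, dtype=np.float64)
    spectrum = np.fft.rfft(samples)              # complex value per bucket
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)               # magnitude = level

# One second of a 440 Hz tone: the loudest bucket sits at 440 Hz.
rate = 8000
t = np.arange(rate) / rate
freqs, levels = frequency_levels(np.sin(2 * np.pi * 440 * t), rate)
loudest_hz = freqs[np.argmax(levels)]
```

The number of buckets is set by the block length: a block of N samples yields N/2 + 1 buckets spaced `sample_rate / N` Hz apart.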

If you convert both stream1 and the mix to the frequency domain, subtract the level of stream1 from the level of the mix in each bucket, and then convert back to the time domain for output, you can effectively remove much of stream1 from the mix. The more frequency buckets you use, the more CPU is needed, but the more accurate this removal will be. Note that while this means you don't have to be quite sample-accurate, it does typically hurt the quality of the sound that remains in the mix.
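One way to read the frequency-domain approach above is as magnitude spectral subtraction on short overlapping frames. The sketch below assumes NumPy and mono float arrays; the name `spectral_subtract`, the Hann window with 50% overlap, reusing the mix's phase, and the `frame_len` and `scale` parameters are all my own illustrative choices rather than anything prescribed by the answer.

```python
import numpy as np

def spectral_subtract(mix, stream1, frame_len=1024, scale=1.0):
    """Remove stream1's per-bucket level from mix, frame by frame.

    frame_len must be even. scale is the (assumed known) level at which
    stream1 appears in the mix.
    """
    mix = np.asarray(mix, dtype=np.float64)
    stream1 = np.asarray(stream1, dtype=np.float64)
    n = min(len(mix), len(stream1))
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1,
    # so overlap-adding the frames reconstructs the signal.
    window = np.hanning(frame_len + 1)[:-1]
    # Zero-pad so every real sample is covered by two overlapping frames.
    pad = lambda x: np.concatenate([np.zeros(hop), x[:n], np.zeros(frame_len)])
    pmix, pref = pad(mix), pad(stream1)
    out = np.zeros(len(pmix))
    for start in range(0, len(pmix) - frame_len + 1, hop):
        m = np.fft.rfft(pmix[start:start + frame_len] * window)
        r = np.fft.rfft(pref[start:start + frame_len] * window)
        # Subtract levels per bucket; a level cannot go below zero.
        level = np.maximum(np.abs(m) - scale * np.abs(r), 0.0)
        # Keep the phase of the mix, since only levels were subtracted.
        cleaned = level * np.exp(1j * np.angle(m))
        out[start:start + frame_len] += np.fft.irfft(cleaned, n=frame_len)
    return out[hop:hop + n]
```

A larger `frame_len` gives more frequency buckets per frame at the cost of more work per FFT and coarser time resolution. Because only magnitudes are compared, a small timing offset between stream1 and the mix matters less than in the time-domain version, but reusing the mix's phase and clipping levels at zero is also where the audible artifacts tend to come from.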

Many audio editing programs use this method to remove background noise.