Can you explain the concept of streams?
A stream is already a metaphor, an analogy, so there's really no need to provide another one. You can think of it basically as a pipe with a flow of water in it where the water is actually data and the pipe is the stream. I suppose it's kind of a 2-way pipe if the stream is bi-directional. It's basically a common abstraction that is placed upon things where there is a flow or sequence of data in one or both directions.
In languages such as C#, VB.Net, C++, Java etc., the stream metaphor is used for many things. There are file streams, in which you open a file and can read from the stream or write to it continuously; There are network streams where reading from and writing to the stream reads from and writes to an underlying established network connection. Streams for writing only are typically called output streams, as in this example, and similarly, streams that are for reading only are called input streams, as in this example.
A stream can perform transformation or encoding of data (an SslStream in .Net, for example, will eat up the SSL negotiation data and hide it from you; A TelnetStream might hide the Telnet negotiations from you, but provide access to the data; A ZipOutputStream in Java allows you to write to files in a zip archive without having to worry about the internals of the zip file format.
Another common thing you might find is textual streams that allow you to write strings instead of bytes, or some languages provide binary streams that allow you to write primitive types. A common thing you'll find in textual streams is a character encoding, which you should be aware of.
Some streams also support random access, as in this example. A network stream, on the other hand, for obvious reasons, wouldn't.
- MSDN gives a good overview of streams in .Net.
- Sun also have an overview of their general OutputStream class and InputStream class.
- In C++, here is the istream (input stream), ostream (output stream) and iostream (bidirectional stream) documentation.
UNIX like operating systems also support the stream model with program input and output, as described here.
The word "stream" has been chosen because it represents (in real life) a very similar meaning to what we want to convey when we use it.
Let's forget about the backing store for a little, and start thinking about the analogy to a water stream. You receive a continuous flow of data, just like water continuously flows in a river. You don't necessarily know where the data is coming from, and most often you don't need to; be it from a file, a socket, or any other source, it doesn't (shouldn't) really matter. This is very similar to receiving a stream of water, whereby you don't need to know where it is coming from; be it from a lake, a fountain, or any other source, it doesn't (shouldn't) really matter.
That said, once you start thinking that you only care about getting the data you need, regardless of where it comes from, the abstractions other people talked about become clearer. You start thinking that you can wrap streams, and your methods will still work perfectly. For example, you could do this:
int ReadInt(StreamReader reader) { return Int32.Parse(reader.ReadLine()); }
// in another method:
Stream fileStream = new FileStream("My Data.dat");
Stream zipStream = new ZipDecompressorStream(fileStream);
Stream decryptedStream = new DecryptionStream(zipStream);
StreamReader reader = new StreamReader(decryptedStream);
int x = ReadInt(reader);
As you see, it becomes very easy to change your input source without changing your processing logic. For example, to read your data from a network socket instead of a file:
Stream stream = new NetworkStream(mySocket);
StreamReader reader = new StreamReader(stream);
int x = ReadInt(reader);
As easy as it can be. And the beauty continues, as you can use any kind of input source, as long as you can build a stream "wrapper" for it. You could even do this:
public class RandomNumbersStreamReader : StreamReader {
private Random random = new Random();
public String ReadLine() { return random.Next().ToString(); }
}
// and to call it:
int x = ReadInt(new RandomNumbersStreamReader());
See? As long as your method doesn't care what the input source is, you can customize your source in various ways. The abstraction allows you to decouple input from processing logic in a very elegant way.
Note that the stream we created ourselves does not have a backing store, but it still serves our purposes perfectly.
So, to summarize, a stream is just a source of input, hiding away (abstracting) another source. As long as you don't break the abstraction, your code will be very flexible.
A stream represents a sequence of objects (usually bytes, but not necessarily so), which can be accessed in sequential order. Typical operations on a stream:
- read one byte. Next time you read, you'll get the next byte, and so on.
- read several bytes from the stream into an array
- seek (move your current position in the stream, so that next time you read you get bytes from the new position)
- write one byte
- write several bytes from an array into the stream
- skip bytes from the stream (this is like read, but you ignore the data. Or if you prefer it's like seek but can only go forwards.)
- push back bytes into an input stream (this is like "undo" for read - you shove a few bytes back up the stream, so that next time you read that's what you'll see. It's occasionally useful for parsers, as is:
- peek (look at bytes without reading them, so that they're still there in the stream to be read later)
A particular stream might support reading (in which case it is an "input stream"), writing ("output stream") or both. Not all streams are seekable.
Push back is fairly rare, but you can always add it to a stream by wrapping the real input stream in another input stream that holds an internal buffer. Reads come from the buffer, and if you push back then data is placed in the buffer. If there's nothing in the buffer then the push back stream reads from the real stream. This is a simple example of a "stream adaptor": it sits on the "end" of an input stream, it is an input stream itself, and it does something extra that the original stream didn't.
Stream is a useful abstraction because it can describe files (which are really arrays, hence seek is straightforward) but also terminal input/output (which is not seekable unless buffered), sockets, serial ports, etc. So you can write code which says either "I want some data, and I don't care where it comes from or how it got here", or "I'll produce some data, and it's entirely up to my caller what happens to it". The former takes an input stream parameter, the latter takes an output stream parameter.
Best analogy I can think of is that a stream is a conveyor belt coming towards you or leading away from you (or sometimes both). You take stuff off an input stream, you put stuff on an output stream. Some conveyors you can think of as coming out of a hole in the wall - they aren't seekable, reading or writing is a one-time-only deal. Some conveyors are laid out in front of you, and you can move along choosing whereabouts in the stream you want to read/write - that's seeking.
As IRBMe says, though, it's best to think of a stream in terms of the operations it offers (which vary from implementation to implementation, but have a lot in common) rather than by a physical analogy. Streams are "things you can read or write". When you start connecting up stream adaptors, you can think of them as a box with a conveyor in, and a conveyor out, that you connect to other streams and then the box performs some transformation on the data (zipping it, or changing UNIX linefeeds to DOS ones, or whatever). Pipes are another thorough test of the metaphor: that's where you create a pair of streams such that anything you write into one can be read out of the other. Think wormholes :-)