What are Pascal Strings?

Pascal strings were made popular by one specific but hugely influential Pascal implementation, UCSD Pascal, so "UCSD strings" is a better term. This is the same implementation that made bytecode interpreters popular.

In general it is not one specific type but a basic principle: the size is stored as a prefix in front of the character data. This makes getting the length a constant-time operation (O(1)) instead of a scan of the character data for a NUL character.
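As a rough illustration of that principle, here is a minimal C sketch. The PStr layout (one length byte followed by up to 255 characters) and the function names are assumptions made for this example, not the definition used by any particular Pascal compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical short-string layout: one length byte, then the characters. */
typedef struct {
    unsigned char len;        /* number of characters in use (0..255) */
    char          data[255];  /* character data, not NUL-terminated   */
} PStr;

/* Build a PStr from a C string, truncating to 255 characters. */
PStr pstr_from_cstr(const char *s)
{
    PStr p = {0};
    size_t n = strlen(s);
    if (n > sizeof p.data)
        n = sizeof p.data;
    p.len = (unsigned char)n;
    memcpy(p.data, s, n);
    return p;
}

/* O(1): the length is simply read from the prefix. */
size_t pstr_length(const PStr *p)
{
    return p->len;
}

/* O(n): a NUL-terminated string has to be scanned, as strlen does. */
size_t cstr_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

With a one-byte prefix the maximum length is 255 characters; wider prefixes trade a few bytes per string for a larger limit.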

Not all Pascals used this concept. IIRC, the original (seventies) convention was to space-pad an allocation and scan backwards for a non-space character (making it impossible for strings to have a trailing space). Moreover, since software was mostly used in isolation, all kinds of schemes were used, often based on what was advantageous for that implementation/architecture.
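The space-padding convention can be sketched in C roughly as follows. FIELD_WIDTH and both function names are made up for this example; the point is only that the length is recovered by scanning backwards, so trailing spaces cannot be told apart from padding.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_WIDTH 16  /* hypothetical fixed allocation size */

/* Store s in a fixed-width field, padding the remainder with spaces. */
void padded_set(char field[FIELD_WIDTH], const char *s)
{
    size_t n = strlen(s);
    if (n > FIELD_WIDTH)
        n = FIELD_WIDTH;
    memcpy(field, s, n);
    memset(field + n, ' ', FIELD_WIDTH - n);
}

/* The length is the index just past the last non-space character. */
size_t padded_length(const char field[FIELD_WIDTH])
{
    size_t n = FIELD_WIDTH;
    while (n > 0 && field[n - 1] == ' ')
        n--;
    return n;
}
```

Storing "abc" and storing "abc" followed by two spaces yield the same field contents, which is exactly the limitation mentioned above.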

While the construct is not part of Standard Pascal, the most popular dialects (Borland's Turbo Pascal and Delphi, and Free Pascal) generally base themselves on the UCSD dialect, and thus have Pascal strings. Delphi currently has 5 such string types (short/ansi/wide/unicode/open).

On the other hand, this means that in a loop you need an additional index- or count-based check to detect the end of the string.

So instead of copying a string using

while (p2^ <> #0) do begin p^ := p2^; inc(p); inc(p2); end; p^ := #0;

which is wholly equivalent to

while (*s++ = *t++);

in C when using an optimizing compiler, you need to do e.g.

while (len>0) do begin p^:=p2^; inc(p); inc(p2); dec(len); end;

or even

i:=1;
while (i<=len) do begin p[i]:=p2[i]; inc(i); end;

This makes the number of instructions in a Pascal string loop slightly larger than in the equivalent zero-terminated loop, and adds one more live value. Additionally, UCSD Pascal was a bytecode (p-code) interpreted language, and the latter, index-based code on Pascal strings is "safe".

On an architecture with built-in post-increment (++) addressing (like the PDP-11 that C was originally developed for), the pointer version was even cheaper, especially without optimization. Nowadays optimizing compilers can easily detect any of these constructs and convert them to whatever is best.

More importantly, security has become more important since the early nineties, and relying solely on the null-terminated property of strings is generally frowned upon, because small errors in validation can cause potentially exploitable buffer overflows. C practice therefore moved away from the old string routines toward the "-n-" versions (strncpy etc.) that need a maximum length to be passed. This adds the same extra live value as a manually managed Pascal string: the programmer must take care of passing the length (or the maximum buffer size for C's -n- functions) around. Pascal strings still have the advantage of reaching the last occupied char in an O(1) operation, and of having no forbidden chars.
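To show what carrying the extra value around looks like in C, here is a small sketch of a bounded copy built on strncpy. The wrapper name bounded_copy and its return convention are assumptions for this example. Note that strncpy itself does not write a terminator when the source is too long, so the wrapper adds one explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which holds dst_size bytes.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                     /* no room even for a terminator */
    strncpy(dst, src, dst_size - 1);  /* copies at most dst_size - 1 chars */
    dst[dst_size - 1] = '\0';         /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size;
}
```

The caller has to pass the buffer size alongside the pointer on every call, which is the same bookkeeping a manually managed length prefix requires.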

Length-prefixed strings are also used extensively in file formats because, obviously, it is useful to know the number of bytes to read up front.
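A sketch of how a reader might consume such a field, in C. The record layout here (a 2-byte little-endian length followed by that many payload bytes) and the name read_record are invented for the example and do not describe any specific file format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: 2-byte little-endian length, then the payload.
   On success, stores the payload length in *out_len and returns a pointer
   to the payload. Returns NULL if the buffer is too short for the record. */
const uint8_t *read_record(const uint8_t *buf, size_t buf_len, size_t *out_len)
{
    if (buf_len < 2)
        return NULL;                       /* prefix itself is incomplete */
    size_t n = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (n > buf_len - 2)
        return NULL;                       /* declared length exceeds the data */
    *out_len = n;
    return buf + 2;
}
```

Because the length is known before the payload is touched, the reader can validate or allocate up front instead of scanning for a terminator.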

It's an old name dating back to the days when "C language versus Pascal language" was actually a comparison people made. Depending on who you ask, it either means specifically storing the length in the first byte, or refers to any length prefix (two bytes, four bytes). Other memory management details are not included; they are implementation-dependent and not a fundamental difference from C strings.

Pascal strings excel in... everything. NUL-terminated strings save one to three bytes on short strings, which may have been useful in 1970 but isn't even worth mentioning today in virtually all circumstances. Aside from not being able to store a zero byte (which isn't too bad for text but rules out any kind of binary data), NUL-terminated strings don't let you determine the string length efficiently. This affects a good portion of string algorithms negatively. One example, in the comment you link to, is string comparison: if you have the length, you can instantly return false when comparing strings of different lengths. There are also many other downsides not related to performance.
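The comparison example can be sketched in C like this. CountedStr and counted_equal are hypothetical names for this illustration; the struct simply pairs a length with a pointer to the characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a length plus the characters. */
typedef struct {
    size_t      len;
    const char *data;
} CountedStr;

/* Returns 1 if a and b hold the same bytes, 0 otherwise. */
int counted_equal(CountedStr a, CountedStr b)
{
    if (a.len != b.len)
        return 0;                              /* different lengths: no scan needed */
    return memcmp(a.data, b.data, a.len) == 0; /* works with embedded zero bytes too */
}
```

Since the comparison is driven by the stored length rather than a terminator, strings containing zero bytes compare correctly as well.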

For these reasons, virtually every language implementation newer than about 1980 uses length prefixes for strings. This is another reason why the "Pascal string" name is outdated.

