If char*s are read only, why can I overwrite them?
The presented code snippet does not change the string literals themselves. It only changes the value stored in the pointer fruit.
You can imagine these lines
char* fruit = "banana";
fruit = "apple";
the following way
char unnamed_static_array_banana[] = { 'b', 'a', 'n', 'a', 'n', 'a', '\0' };
char *fruit = &unnamed_static_array_banana[0];
char unnamed_static_array_apple[] = { 'a', 'p', 'p', 'l', 'e', '\0' };
fruit = &unnamed_static_array_apple[0];
These statements do not change the arrays that correspond to the string literals.
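As a minimal sketch of the point above (the helper name repoint_fruit is made up for illustration and is not part of the original snippet), the following function repoints fruit and then checks that the first literal still holds its original contents:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration: repoints `fruit` and checks
   that the first literal's contents did not change. Returns the final
   pointer value. */
const char *repoint_fruit(void)
{
    const char *fruit = "banana";
    const char *old = fruit;             /* remember where "banana" lives */

    fruit = "apple";                     /* only the pointer value changes */

    assert(strcmp(old, "banana") == 0);  /* the first array is untouched */
    return fruit;
}
```

Calling repoint_fruit() from main and printing the result with %s should show the second literal; the first one is still intact, just no longer pointed to by fruit.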
On the other hand if you tried to write
char* fruit = "banana";
printf("fruit is %s\n", fruit);
fruit[0] = 'h'; /* <-- attempts to modify the string literal */
printf("fruit is %s\n", fruit);
that is, if you tried to change a string literal through a pointer that points to it (to the first character of the string literal), then the program would have undefined behavior.
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
In your program, the expression "banana" denotes a string literal object in the program image, a character array. In this context the array expression is converted to a value of type char *, or "pointer to character". The pointer points to the first byte of that array, the character 'b'.
Your char *fruit variable also has type "pointer to character" and takes its initial value from this expression: it is initialized to a copy of the pointer to the data, not the data itself; it merely points to the b.
When you assign "apple" to fruit, you're just replacing its pointer value with another one, so it now points to a different literal array.
To modify the data itself, you need an expression such as:
char *fruit = "banana";
fruit[0] = 'z'; /* try to turn "banana" into "zanana" */
According to the ISO C standard, the behavior of this is undefined. The "banana" array may be read-only, but that is not required.
C implementations can make string literals writable, or make it an option.
(If you are able to modify a string literal, that doesn't mean that all is well. Firstly, your program is still not well defined according to ISO C: it is not portable. Secondly, the C compiler is allowed to merge literals which have common content into the same storage. This means that two occurrences of "banana" in the program could in fact be exactly the same array. Furthermore, the string literal "nana" occurring somewhere in the program could be the suffix of the array "banana" occurring elsewhere; in other words, share the same storage. Modifying a literal can have surprising effects; the modification can appear in other literals.)
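Whether two identical literals are merged can be observed by comparing their addresses. The sketch below uses a made-up helper name, banana_literals_share_storage, and the result is unspecified: either answer is allowed, and it may change with the compiler or its options.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration: returns 1 if this implementation
   placed the two identical literals in the same storage, 0 otherwise.
   The standard permits either result. */
int banana_literals_share_storage(void)
{
    const char *a = "banana";
    const char *b = "banana";

    assert(strcmp(a, b) == 0);   /* the contents are equal either way */
    return a == b;               /* equal addresses mean one shared array */
}
```

If this returns 1, a write through one pointer (already undefined behavior) would also appear to change the "other" literal, which is the kind of surprise described above.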
Also "static" and "read-only" aren't synonymous. Most static storage in C is in fact modifiable. We can create a modifiable static character array which holds a string like this:
/* at file scope, i.e. outside of any function */
char fruit[] = "banana";
Or:
{
  /* in a function */
  static char fruit[] = "banana";
  /* ... */
}
If we leave out the array size, it is automatically sized from the initializing string literal, and includes space for the terminating null byte. In the function, we need static to put the array into static storage, otherwise we get a local variable.
These arrays can be modified; fruit[0] = 'z' is well-defined behavior.
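A small sketch of both forms follows. The names file_scope_fruit and make_zanana are invented for this example; the point is only that writing to these arrays is allowed because they are ordinary objects, not string literals.

```c
#include <assert.h>
#include <string.h>

/* At file scope: static storage, and modifiable. */
char file_scope_fruit[] = "banana";

/* Hypothetical helper for illustration: modifies both arrays and
   returns a pointer to the function-local static one. */
const char *make_zanana(void)
{
    static char fruit[] = "banana";   /* static storage, initialized once */

    assert(sizeof fruit == 7);        /* six letters plus the terminating null */

    fruit[0] = 'z';                   /* well-defined: fruit is an array object */
    file_scope_fruit[0] = 'z';        /* likewise for the file-scope array */
    return fruit;
}
```

Because the local array has static storage duration, the pointer returned by make_zanana() remains valid after the function returns, and the modification persists across calls.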
Also, in these situations, "banana" doesn't denote a character array. The array is the variable fruit; the "banana" expression is just a piece of syntax which indicates the array's initial value:
char *fruit = "banana"; // "banana" is an object in program image
// initial value is a pointer to that object
char fruit_array[] = "apple"; // "apple" is syntax giving initial value
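One way to see the difference between the two declarations is sizeof. This is only a sketch, and apple_array_size is a hypothetical name: the pointer has the size of a pointer, while the array has the size of its contents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration: returns the size of an array
   initialized from "apple". */
size_t apple_array_size(void)
{
    const char *fruit = "banana";   /* pointer to a separate literal object */
    char fruit_array[] = "apple";   /* the array itself holds the characters */

    /* A pointer's size does not depend on the length of the string. */
    assert(sizeof fruit == sizeof(const char *));
    (void)fruit;

    return sizeof fruit_array;      /* five letters plus the terminating null */
}
```

The returned value is 6, the number of bytes in the array, not the size of a pointer.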
The fruit object is writable - it can be set to point to a different string literal.
The string literals "banana" and "apple" are not writable. You can modify fruit to point to a string literal, but if you do so then you should not attempt to modify the thing that fruit points to:
char *fruit = "banana"; // fruit points to first character of string literal
fruit = "apple"; // okay, fruit points to first character of different string literal
*fruit = 'A'; // not okay, attempting to modify contents of string literal
fruit[1] = 'P'; // not okay, attempting to modify contents of string literal
Attempting to modify the contents of a string literal results in undefined behavior - your code may work as expected, or you may get a runtime error, or something completely unexpected may happen. For safety's sake, if you're defining a variable to point to a string literal, you should declare it const:
const char *fruit = "banana"; // can also be written char const *
You can still assign fruit to point to different strings:
fruit = "apple";
but if you try to modify what fruit points to, the compiler will yell at you.
If you want to define a pointer that can only point to one specific string literal, then you can const-qualify the pointer as well:
const char * const fruit = "banana"; // can also be written char const * const
This way, if you try to either write to what fruit points to, or try to set fruit to point to a different object, the compiler will yell at you.
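Putting the two const forms side by side, as a rough sketch (pick_fruit and fixed are names invented for this example): the lines that the compiler would reject are left in comments so the block still compiles.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration: returns "apple" if want_apple
   is nonzero, otherwise "banana". */
const char *pick_fruit(int want_apple)
{
    const char *fruit = "banana";         /* pointee is const, pointer is not */
    const char *const fixed = "banana";   /* pointee and pointer are both const */

    if (want_apple)
        fruit = "apple";                  /* fine: repointing fruit is allowed */

    /* fruit[0] = 'z';     rejected: writes through a pointer to const */
    /* fixed = "apple";    rejected: fixed itself is const */

    assert(strcmp(fixed, "banana") == 0);
    return fruit;
}
```

Uncommenting either of the two marked lines should produce a compile-time diagnostic rather than undefined behavior at run time, which is the benefit of the const qualifiers.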