What is EOF and how to trigger it?
Tl;dr
You can generally "trigger EOF" in a program running in a terminal with a CTRL+D keystroke right after the last input flush.
What does EOF mean? How can I trigger it?
EOF means End-Of-File.
"Triggering EOF" in this case roughly means "making the program aware that no more input will be sent".
In this case, since getchar() will return a negative number if no character is read, the execution is terminated.
But this doesn't only apply to your specific program, it applies to many different tools.
In general "triggering EOF" can be done with a CTRL+D keystroke right after the last input flush (i.e. by sending an empty input).
For example with cat:
% cat >file # Hit ENTER
foo # Hit ENTER and CTRL+D
%
What's happening under the hood when hitting CTRL+D is that the input typed since the last input flush is flushed; when this happens to be an empty input, the read() syscall called on the program's STDIN returns 0, getchar() returns a negative number (-1 in the GNU C library), and this is in turn interpreted as EOF [1].
1 - https://stackoverflow.com/a/1516177/4316166
TL;DR: EOF is not a character; it's a macro for the negative value that input-reading functions return when there is nothing left to read. One can use Ctrl+D, which is bound to the EOT character, to make such a function return -1.
Every programmer must RTFM
Let us refer to "C A Reference Manual", by Harbison and Steele, 4th ed. from 1995, page 317:
The negative integer EOF is a value that is not an encoding of a "real character" . . . For example fgetc (section 15.6) returns EOF when at end-of-file, because there is no "real character" to be read.
Essentially, EOF isn't a character but an integer constant defined in stdio.h, where it has the value -1. Thus kos's answer is correct as far as that goes, but it's not about receiving "empty" input. The important point is that EOF serves here as a value to compare the return value of getchar() against, not as an actual character. man getchar supports that:
RETURN VALUE
fgetc(), getc() and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
ungetc() returns c on success, or EOF on error.
Consider the while loop: its primary purpose is to repeat an action as long as the condition in the brackets is true. Look again:
while ((c = getchar ()) != EOF)
It basically says: keep doing stuff as long as c = getchar() yields a real character. On success getchar() returns the character read as an unsigned char converted to int, that is, a value of 0 or above; on failure it returns EOF, which is defined as -1. (Signalling failure through a special return value is a common thing, by the way: try running a successful command, then echo $?, then a failing command and echo $? again, and see the numbers they return.) Therefore, when the condition becomes -1 != -1, which is false, the loop stops. And when will that happen? When there is no more character to get, that is, when getchar() fails. You could write while ((c = getchar ()) != -1) and it would still work wherever EOF is -1, as it is in the GNU C library.
Also, let's go back to the actual code, here's an excerpt from stdio.h
/* End of file character.
Some things throughout the library rely on this being -1. */
#ifndef EOF
# define EOF (-1)
#endif
ASCII codes and EOT
Although EOF is not an actual character, there does exist an EOT (End of Transmission) character, which has the ASCII decimal value 4; it is linked to the Ctrl+D shortcut (also represented as the meta character ^D). The End of Transmission character was used to signal the closing of a stream of data way back when computers were used to control telephone connections, hence the "end of transmission" naming.
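One can try to send that ASCII value to the program like so; note the $'\04', which is bash's notation for EOT:
skolodya@ubuntu:$ ./a.out <<< "a,b,c $'\04'"
digits = 1 0 0 0 1 0 0 0 0 0, white space = 2, other = 9
Note that inside double quotes bash does not expand $'\04', so this run counts the literal characters $, ', \, 0, 4 and ' (hence the two digits reported); to send a real EOT byte, place $'\04' outside the double quotes. Either way, we can say that EOT does exist, but it is not printable.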
Side Note
We often forget that in the past computers weren't as versatile: designers had to make use of every keyboard key available. Thus, sending the EOT character with Ctrl+D is still "sending a character", not unlike typing a capital A with Shift+A; you still give the computer an input with the available keys. Thus EOT is a real character in the sense that it does come from the user, it is readable by the computer (though not printable, not visible to humans), and it exists in computer memory.
Byte Commander's comment
If you try to read from /dev/null, that should return an EOF as well, right? Or what do I get there?
Yes, exactly right, because in /dev/null there is no actual character to be read; hence c = getchar() will return -1, and the program will quit right away. Again, no EOF character is returned. EOF is just a constant equal to -1, which we use to compare against the return value of the getchar function. EOF doesn't exist as a character; it's just a fixed value defined in stdio.h.
Demo:
# cat /dev/null shows there are no readable chars
DIR:/xieerqi
skolodya@ubuntu:$ cat /dev/null | cat -A
# Below is a simple program that opens /dev/null for reading. Note the use of the literal -1
DIR:/xieerqi
skolodya@ubuntu:$ cat readNull.c
#include <stdio.h>
int main(void)
{
    int c; /* int, so that every byte value and -1 can be told apart */
    FILE *file;
    file = fopen("/dev/null", "r");
    if (file)
    {
        printf ("Before while loop\n");
        while ((c = getc(file)) != -1)
            putchar(c);
        printf("After while loop\n");
        fclose(file);
    }
    return 0;
}
DIR:/xieerqi
skolodya@ubuntu:$ gcc readNull.c -o readNull
DIR:/xieerqi
skolodya@ubuntu:$ ./readNull
Before while loop
After while loop
Another nail in the coffin
Sometimes people try to prove that EOF is a character with code like this:
#include <stdio.h>
int main(void)
{
    printf("%c", EOF);
    return 0;
}
The problem with that is that the char datatype can be signed or unsigned. In addition, char is the smallest addressable datatype, which makes it very useful on microcontrollers, where memory is limited. So instead of declaring int foo = 25; it is common to see char foo = 25; or something similar on microcontrollers with small memory.
One can verify the sizes in bytes with a program like this:
#include <stdio.h>
int main(void)
{
    printf("Size of int: %zu\n", sizeof(int));
    printf("Size of char: %zu\n", sizeof(char));
    //printf("%s", EOF);
    return 0;
}
skolodya@ubuntu:$ ./EOF
Size of int: 4
Size of char: 1
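What exactly is the point? The point is that EOF is defined as the int -1, and the %c conversion will accept any integer value: it converts its argument to an unsigned char and writes that byte. So being printable with %c does not make EOF a character.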
OK... so what if we try to print EOF as a string?
#include <stdio.h>
int main(void)
{
    printf("%s", EOF);
    return 0;
}
Obviously that's an error, but nonetheless the compiler's warning tells us something interesting:
skolodya@ubuntu:$ gcc EOF.c -o EOF
EOF.c: In function ‘main’:
EOF.c:4:5: warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat=]
printf("%s", EOF);
Hex values
Printing EOF as a hex value gives FFFFFFFF, a 32-bit (4-byte) value, the two's complement representation of -1.
#include <stdio.h>
int main(void)
{
    printf("This is EOF: %X\n", EOF);
    printf("This is Z: %X\n",'Z');
    return 0;
}
Output:
DIR:/xieerqi
skolodya@ubuntu:$ ./EOF
This is EOF: FFFFFFFF
This is Z: 5A
Another curious thing occurs with the following code:
#include <stdio.h>
int main(void)
{
    char c;
    if (c = getchar())
        printf ("%x",c);
    return 0;
}
If one presses Shift + A, we get the hex value 41, obviously the same as in the ASCII table. But for Ctrl + D we get ffffffff: again, the return value of getchar() (EOF, that is -1) stored in c.
DIR:/xieerqi
skolodya@ubuntu:$ gcc EOF.c -o ASDF.asdf
DIR:/xieerqi
skolodya@ubuntu:$ ./ASDF.asdf
A
41
DIR:/xieerqi
skolodya@ubuntu:$ ./ASDF.asdf
ffffffff
Refer to other languages
Notice that other languages avoid this confusion, because they test the result of a function (for example a boolean "is there more input?") rather than comparing a return value with a macro. How does one read a file in Java?
File inputFile = new File (filename);
Scanner readFile = new Scanner(inputFile);
while (readFile.hasNext())
{
    // more code below
}
How about python?
with open("/etc/passwd") as file:
    for line in file:
        print(line)
EOF stands for end of file. While I do not know how to trigger it from the keyboard, you can run your program with its input coming through a pipe; the program sees end of file once the writing end of the pipe is closed:
echo "Some sample text" | ./a.out
where a.out is your compiled source