Why create heap when creating a linked list when we can simply do this?

When you declare 'struct node xyz;' in a function, it exists only so long as that function exists. If you add it to a linked list and then exit the function, that object no longer exists, but the linked list still has a reference to it. On the other hand, if you allocate it from the heap and add it to the linked list, it will still exist until it is removed from the linked list and deleted.

This mechanism allows an arbitrary number of nodes to be created at various times throughout your program and inserted into the linked list. The method you show above only allows a fixed number of specific items to be placed in the list for a short duration. You can do that, but it serves little purpose, since you could have just accessed the items directly outside the list.


But what if you had a file containing an unknown number of entries, and you needed to iterate over them, adding each one to the linked list? Think about how you might do that without malloc.

You would have a loop, and in each iteration you need to create a completely new "instance" of a node, different to all the other nodes. If you just had a bunch of locals, each loop iteration they would still be the same locals.


Of course you can do like that. but how far ? how many nodes are you going to create ? We use linkedlists when we don't know how many entries we need when we create the list. So how can you create nodes ? How much ? That's why we use malloc() (or new nodes).


Let's say you create a variable of type node called my_node:

struct node my_node;

You can access its members as my_node.data and my_node.next because it is not a pointer. Your code, however, will only be able to create 3 nodes. Let's say you have a loop that asks the user for a number and stores that number in the linked list, stopping only when the user types in 0. You don't know when the user will type in 0, so you have to have a way of creating variables while the program is running. "Creating a variable" at runtime is called dynamic memory allocation and is done by calling malloc, which always returns a pointer. Don't forget to free the dynamically allocated data after it is no longer needed, to do so call the free function with the pointer returned by malloc. The tutorial you mentioned is just explaining the fundamental concepts of linked lists, in an actual program you're not going to limit yourself to a fixed number of nodes but will instead make the linked list resizable depending on information you only have at runtime (unless a fixed-sized linked list is all you need).

Edit:

"Creating a variable at runtime" was just a highly simplified way of explaining the need for pointers. When you call malloc, it allocates memory on the heap and gives you an address, which you must store in a pointer.

int var = 5;
int * ptr = &var;

In this case, ptr is a variable (it was declared in all its glory) that holds the address of another variable, and so it is called a pointer. Now consider an excerpt from the tutorial you mentioned:

struct node* head = NULL;
head = (struct node*)malloc(sizeof(struct node));

In this case, the variable head will point to data allocated on the heap at runtime.

If you keep allocating nodes on the heap and assigning the returned address to the next member of the last node in the linked list, you will be able to iterate over the linked list simply by writing pointer_to_node = pointer_to_node->next. Example:

struct node * my_node = head; // my_node points to the first node in the linked list
while (true)
{
    printf("%d\n", my_node->data); // print the data of the node we're iterating over
    my_node = my_node->next; // advance the my_node pointer to the next node
    if (my_node->next == NULL) // let's assume that the 'next' member of the last node is always set to NULL
    {
        printf("%d\n", my_node->data);
        break;
    }
}

You can, of course, insert an element into any position of the linked list, not just at the end as I mentioned above. Note though that the only node you ever have a name for is head, all the others are accessed through pointers because you can't possibly name all nodes your program will ever have a hold of.