Best ways of parsing a URL using C?
Personally, I steal the HTParse.c module from the W3C's libwww (it is used in the Lynx Web browser, for instance). Then you can do things like:
strncpy(hostname, HTParse(url, "", PARSE_HOST), size)
The important thing about using a well-established and well-debugged library is that you avoid the typical traps of URL parsing (many regexps fail when the host is an IP address, especially an IPv6 one, for instance).
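As a rough sketch, assuming HTParse.c and HTParse.h from libwww (or the lynx sources) are compiled into the project; the exact prototype and the memory-management macros vary a little between versions:

#include <stdio.h>
#include <stdlib.h>
#include "HTParse.h"   /* ships with libwww / the lynx sources */

int main(void)
{
    char url[] = "http://[2001:db8::1]:8080/index.html";

    /* HTParse() returns a newly allocated string holding the requested part */
    char *host = HTParse(url, "", PARSE_HOST);
    char *path = HTParse(url, "", PARSE_PATH);

    printf("host = %s\n", host);
    printf("path = %s\n", path);

    /* libwww wraps this in HT_FREE(), which maps to free() by default */
    free(host);
    free(path);
    return 0;
}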
I wrote some simple code using sscanf that can parse very basic URLs.
#include <stdio.h>

int main(void)
{
    const char text[] = "http://192.168.0.2:8888/servlet/rece";
    char ip[100];
    int port = 80;   /* default if no port is given */
    char page[100];

    /* host up to the ':', then the port, then everything after the first '/' */
    if (sscanf(text, "http://%99[^:]:%d/%99[^\n]", ip, &port, page) != 3) {
        fprintf(stderr, "URL not in the expected http://host:port/path form\n");
        return 1;
    }

    printf("ip = \"%s\"\n", ip);
    printf("port = \"%d\"\n", port);
    printf("page = \"%s\"\n", page);
    return 0;
}
./urlparse
ip = "192.168.0.2"
port = "8888"
page = "servlet/rece"
Maybe a bit late, but what I have used is the http_parser_parse_url() function and the required macros separated out from the Joyent http-parser library; that worked well, at roughly 600 LOC.
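For reference, a minimal sketch of how that function is typically called, assuming http_parser.h (or just the extracted URL-parsing pieces) from the Joyent http-parser sources is available to the build; http_parser_url_init() exists only in newer versions, older ones can zero the struct with memset instead:

#include <stdio.h>
#include <string.h>
#include "http_parser.h"   /* from the Joyent http-parser sources */

int main(void)
{
    const char url[] = "http://192.168.0.2:8888/servlet/rece?x=1";
    struct http_parser_url u;

    http_parser_url_init(&u);   /* older versions: memset(&u, 0, sizeof u); */
    if (http_parser_parse_url(url, strlen(url), 0, &u) != 0) {
        fprintf(stderr, "failed to parse URL\n");
        return 1;
    }

    /* each field comes back as an offset/length pair into the original buffer */
    if (u.field_set & (1 << UF_HOST))
        printf("host = %.*s\n", u.field_data[UF_HOST].len, url + u.field_data[UF_HOST].off);
    if (u.field_set & (1 << UF_PORT))
        printf("port = %d\n", u.port);
    if (u.field_set & (1 << UF_PATH))
        printf("path = %.*s\n", u.field_data[UF_PATH].len, url + u.field_data[UF_PATH].off);
    return 0;
}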