How to separate domain and path from URL in C?

Emmm :

I am trying to read a URL (which may or may not contain a path, e.g. http://google.com and http://www.google.com/abc/dfg) from a command-line argument and then split it into the domain name and the path, but the results seem incorrect. Running the program normally shows no error, but valgrind reports errors. I just cannot figure out how to fix it. Can somebody help? Thanks a lot!

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    char* domain;
    char* path;

    sscanf(argv[1], "http://%[^/]/%[^\n]", domain, path);
    printf("domain: %s\n", domain);
    printf("path: %s\n", path);
}

bruno :

valgrind shows errors

and valgrind is right: you never made domain and path point to valid (large enough) storage, so sscanf writes through uninitialized pointers.

For instance, do:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
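    char domain[32] = ""; /* start empty so printing is safe if sscanf reads nothing */
    char path[64] = "";   /* stays empty when the URL has no path */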

    sscanf(argv[1], "http://%31[^/]/%63[^\n]", domain, path);
    printf("domain: %s\n", domain);
    printf("path: %s\n", path);
}

Note that I limited the number of characters read so that sscanf cannot write past the end of the arrays (each width is the array size minus 1, to leave room for the terminating null character).
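The widths in the format string and the array sizes have to be kept in sync by hand. As a minimal sketch of one way to avoid that, the limits can be defined once and pasted into the format with the usual two-step stringification macros. The names DOMAIN_MAX, PATH_MAX_CHARS, STR, XSTR and split_http_url are hypothetical and only serve the illustration:

```c
#include <stdio.h>

/* Hypothetical limits: longest domain/path stored, excluding the null character. */
#define DOMAIN_MAX 31
#define PATH_MAX_CHARS 63

/* Two-step stringification, so the macro's value (31) is pasted, not its name. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: splits an http:// URL into domain and path.
   Returns the sscanf result: 2 if both were read, 1 if only the domain
   was read, 0 or EOF if the input does not start with http://. */
int split_http_url(const char *url,
                   char domain[DOMAIN_MAX + 1],
                   char path[PATH_MAX_CHARS + 1])
{
    /* Start with empty strings so both outputs are always printable. */
    domain[0] = '\0';
    path[0] = '\0';

    /* The format expands to "http://%31[^/]/%63[^\n]". */
    return sscanf(url,
                  "http://%" XSTR(DOMAIN_MAX) "[^/]/%" XSTR(PATH_MAX_CHARS) "[^\n]",
                  domain, path);
}
```

With this, changing DOMAIN_MAX or PATH_MAX_CHARS adjusts both the buffers (declared with size DOMAIN_MAX + 1 and PATH_MAX_CHARS + 1 by the caller) and the sscanf widths at the same time.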

I also encourage you to check argc (so that argv[1] exists) and the value returned by sscanf, which tells you how many elements were read, and to return a value from main:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    char domain[32];
    char path[64];

    if (argc < 2) {
      fprintf(stderr, "Usage: %s <URL>\n", *argv);
      return -1;
    }

    switch (sscanf(argv[1], "http://%31[^/]/%63[^\n]", domain, path)) {
    case 2:
      printf("path: %s\n", path);
      /* no break */
    case 1:
      printf("domain: %s\n", domain);
      break;
    default:
      fputs("this is not a valid url\n", stderr);
    }

    return 0;
}

Of course, you can also initialize domain[0] and path[0] to 0, so that both are always valid (possibly empty) strings.

Compilation and execution:

pi@raspberrypi:/tmp $ gcc -Wall a.c
pi@raspberrypi:/tmp $ ./a.out
Usage: ./a.out <URL>
pi@raspberrypi:/tmp $ ./a.out aze
this is not a valid url
pi@raspberrypi:/tmp $ ./a.out http://google.com
domain: google.com
pi@raspberrypi:/tmp $ ./a.out http://google.com/abc/dfg
path: abc/dfg
domain: google.com
pi@raspberrypi:/tmp $ ./a.out http://google.com/a_too_long_path_to_be_memorized_in_only_64_characters_so_it_is_cut
path: a_too_long_path_to_be_memorized_in_only_64_characters_so_it_is_
domain: google.com
pi@raspberrypi:/tmp $ 

Note that a secure URL (https) is considered invalid, because the format only matches the http:// prefix.
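If https should be accepted too, one possible approach is to check the scheme prefix separately and then run sscanf on the rest of the string. This is only a sketch under that assumption, not part of the answer's code; the helper name split_url and the list of accepted schemes are hypothetical, and the comparison is case-sensitive:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: accepts both http:// and https:// URLs.
   Returns 2 if domain and path were read, 1 if only the domain was read,
   and 0 if the URL does not start with an accepted scheme or has no domain. */
int split_url(const char *url, char domain[32], char path[64])
{
    static const char *const schemes[] = { "http://", "https://" };
    size_t i;

    /* Start with empty strings so both outputs are always printable. */
    domain[0] = '\0';
    path[0] = '\0';

    for (i = 0; i < sizeof schemes / sizeof schemes[0]; i++) {
        size_t len = strlen(schemes[i]);

        if (strncmp(url, schemes[i], len) == 0) {
            /* Same width limits as before: array size minus 1. */
            int n = sscanf(url + len, "%31[^/]/%63[^\n]", domain, path);

            return n > 0 ? n : 0; /* map EOF (nothing after the scheme) to 0 */
        }
    }
    return 0;
}
```

The return value can be used in the same switch as above: 2 means a domain and a path were read, 1 means only a domain, and 0 means the argument is not an accepted URL.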
