Apple - extract the top-level domain and the second-level domain from a URL
As you are already using awk and are looking for a simple solution:
awk -F/ '{n=split($3, a, "."); printf("%s.%s", a[n-1], a[n])}' <<< 'http://www.example.com/index.php'
    ^   ^ ^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^
    |   | |                                    |
    |   | |                                    last two elements
    |   | |
    |   | +--- Split the 3rd field (aka the part after //) into
    |   |      the array 'a', using '.' as the separator for splitting.
    |   |      Returns the number of created array elements in 'n'.
    |   |
    |   +-------------- The awk code between the '' gets run once for every
    |                   input line, with the fields split by -F/ stored in
    |                   $1, $2 etc. In our case $1 contains "http:", $2 is
    |                   empty, $3 contains "www.example.com" and $4 etc. the
    |                   various path elements (if there are any)
    |
    +---------------- Split the input lines into fields, separated by '/'
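If you want to reuse this, the one-liner can be wrapped in a small function. This is only a sketch: the name base_domain is made up here, the input is piped in with printf so it also works in shells without <<<, and a "\n" is added so each result ends its line. It assumes the URL has a scheme (so the host is field 3) and no port or user info in the host part. It simply keeps the last two labels, so a host such as www.example.co.uk would come out as co.uk.

```shell
#!/bin/sh
# base_domain: hypothetical helper wrapping the awk one-liner above.
# Assumes a URL of the form scheme://host/path, so the host is field 3.
base_domain() {
    printf '%s\n' "$1" | awk -F/ '{
        n = split($3, a, ".")      # a[1..n] holds the dot-separated host labels
        if (n >= 2)
            printf("%s.%s\n", a[n-1], a[n])
        else
            print $3               # single-label host such as "localhost"
    }'
}

base_domain 'http://www.example.com/index.php'
base_domain 'https://example.org'
```

The single-label branch is there only so that hosts without a dot are printed unchanged instead of with a leading ".".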
Parsing URLs with Bash
The following questions should provide a good starting point:
- Parse URL in shell script
- Parse below URL in bash
@pjz's answer breaks apart a URL into more manageable parts:
#!/bin/sh
INPUT_URL="https://www.amazon.com/gp/product/B007B60SCG/ref=ox_sc_act_title_1?smid=ATVPDKIKX0DER&psc=1"
# extract the protocol (including the trailing ://)
proto="$(echo "$INPUT_URL" | grep '://' | sed -e 's,^\(.*://\).*,\1,g')"
# remove the protocol
url="$(echo "$INPUT_URL" | sed -e "s,$proto,,g")"
# extract the user and password (if any)
userpass="$(echo "$url" | grep @ | cut -d@ -f1)"
pass="$(echo "$userpass" | grep : | cut -d: -f2)"
if [ -n "$pass" ]; then
    user="$(echo "$userpass" | grep : | cut -d: -f1)"
else
    user="$userpass"
fi
# extract the host and port
hostport="$(echo "$url" | sed -e "s,$userpass@,,g" | cut -d/ -f1)"
port="$(echo "$hostport" | grep : | cut -d: -f2)"
if [ -n "$port" ]; then
    host="$(echo "$hostport" | grep : | cut -d: -f1)"
else
    host="$hostport"
fi
# extract the path (if any)
path="$(echo "$url" | grep / | cut -d/ -f2-)"
echo "$hostport"
Given $hostport, you should now be able to strip it back to the domain as desired.
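One way that last step might look, as a rough sketch rather than a definitive implementation: the function name strip_to_domain and the example value of hostport are assumptions for illustration. It drops an optional ":port" suffix and then keeps the last two dot-separated labels, which is not enough for IPv6 literals or multi-part suffixes.

```shell
#!/bin/sh
# strip_to_domain: hypothetical helper; takes a "host" or "host:port" string.
strip_to_domain() {
    host=${1%%:*}    # drop ":port" if present (not suitable for IPv6 literals)
    # Print the last two dot-separated labels, or the host itself if it has no dot.
    printf '%s\n' "$host" |
        awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }'
}

# Example value; in the script above, hostport is derived from INPUT_URL.
hostport="www.amazon.com"
strip_to_domain "$hostport"
```

With the example value this prints amazon.com.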