How to parse a URL and extract the required substring

With URI.parse you can get the individual components of a URL:

require "uri"

uri = URI.parse("http://localhost:3000")
uri.scheme # => "http"
uri.host   # => "localhost"
uri.port   # => 3000

Well, you can use regular expressions. Something like /http:\/\/([^\.]+)/, which captures the first run of characters other than '.' after http://. Note that this pattern matches only http URLs, not https ones.
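As a rough sketch of the regular-expression approach, the snippet below applies a slightly widened version of that pattern (it also accepts https). The sample URL and the variable names are assumptions made for illustration.

```ruby
# Hypothetical sample URL, chosen for illustration.
url = "http://something.example.com/directory/"

# Anchor at the start of the string, accept http or https,
# then capture everything up to the first dot.
match = url.match(%r{\Ahttps?://([^.]+)})

# match is nil when the string does not start with http:// or https://.
subdomain = match && match[1]

puts subdomain
```

Be aware that for a host without any dot, such as localhost:3000, the capture group runs on past the host and picks up the port and path as well, so this is only suitable when the URLs are known to contain a dotted host name.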

Check out http://rubular.com/. You can test your regular expressions against a set of test strings there, and it's great for learning regular expressions.


You could use URI like this:

uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints: something.example.com

and you could then just work on the host.
Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain              # => "pauldix"
url.subdomain           # => "foo.bar"
url.path                # => "/asdf.html?q=arg"
url.canonical           # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

and you could just take the subdomain.


I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first # => "something"

URI is built into Ruby. It's not the most full-featured but it's plenty capable of doing this task for most URLs. If you have IRIs then look at Addressable::URI.
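If the input is not guaranteed to be a well-formed absolute URL, the one-liner above can fail: uri.host is nil when the string has no scheme, and URI.parse raises URI::InvalidURIError on malformed input. A minimal sketch that guards against both follows; first_host_label is a hypothetical helper name, not part of URI.

```ruby
require 'uri'

# Hypothetical helper (not part of URI): returns the first dot-separated
# label of the host, or nil when the URL has no host or cannot be parsed.
def first_host_label(url)
  host = URI.parse(url).host
  host && host.split('.').first
rescue URI::InvalidURIError
  nil
end

first_host_label('http://something.example.com/directory/') # => "something"
first_host_label('something.example.com/directory/')        # => nil (no scheme, so no host)
first_host_label('http://exa mple.com/')                    # => nil (invalid URI)
```

Keep in mind that the first label is not always a subdomain: for http://example.com/ this returns "example". When that distinction matters, a public-suffix-aware library such as domainatrix is the safer choice.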

Tags: Ruby, Parsing