Scala: split string via pattern matching
Starting Scala 2.13
, it's possible to pattern match a String
s by unapplying a string interpolator:
val s"$user@$domain.$zone" = "[email protected]"
// user: String = "user"
// domain: String = "domain"
// zone: String = "com"
If you are expecting malformed inputs, you can also use a match statement:
"[email protected]" match {
case s"$user@$domain.$zone" => Some(user, domain, zone)
case _ => None
}
// Option[(String, String, String)] = Some(("user", "domain", "com"))
Yes, you can do this with Scala's Regex functionality.
I found an email regex on this site, feel free to use another one if this doesn't suit you:
[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}
The first thing we have to do is add parentheses around groups:
([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})
With this we have three groups: the part before the @
, between @
and .
, and finally the TLD.
Now we can create a Scala regex from it and then use Scala's pattern matching unapply to get the groups from the Regex bound to variables:
val Email = """([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})""".r
Email: scala.util.matching.Regex = ([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z] {2,4})
"[email protected]" match {
case Email(name, domain, zone) =>
println(name)
println(domain)
println(zone)
}
// user
// domain
// com
In general regex is horribly inefficient, so wouldn't advise.
You CAN do it using Scala pattern matching by calling .toList on your string to turn it into List[Char]. Then your parts name
, domain
and zone
will also be List[Char], to turn them back into Strings use .mkString. Though I'm not sure how efficient this is.
I have benchmarked using basic string operations (like substring, indexOf, etc) for various use cases vs regex and regex is usually an order or two slower. And of course regex is hideously unreadible.
UPDATE: The best thing to do is to use Parsers, either the native Scala ones, or Parboiled2