How can pattern matching be done on text?
If you want to pattern match on the head of a charlist, you need to make one small change to your second code snippet. 'a' is actually a charlist with one element, so comparing it with the head of a charlist will always be false. A charlist is really a list of integer values:
iex> 'abcd' == [97, 98, 99, 100]
true
The char a equates to integer 97. You can get the integer code of a character in Elixir by preceding it with a ?, so:
iex> ?a == 97
true
iex> ?a == hd('a')
true
So in your guard clause, you'll want to match head == ?a, or more simply:
defmodule MatchStick do
  def doMatch([?a | _tail]), do: 1
  def doMatch(_), do: 0
end
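The guard form mentioned above (comparing the head with ?a) can also be written out explicitly. This is only a minimal sketch: the module name HeadGuard and both function names are hypothetical and not taken from the question.

```elixir
defmodule HeadGuard do
  # Guard version: bind the head, then compare it with the code point ?a (97).
  def starts_with_a?([head | _tail]) when head == ?a, do: true
  def starts_with_a?(_other), do: false

  # A guard can also test a range of code points, which a literal pattern cannot.
  def starts_with_lowercase?([head | _tail]) when head in ?a..?z, do: true
  def starts_with_lowercase?(_other), do: false
end

IO.inspect(HeadGuard.starts_with_a?([?a, ?b, ?c]))     # => true
IO.inspect(HeadGuard.starts_with_lowercase?([?X, ?y])) # => false
```

The literal pattern [?a | _tail] is shorter when you test for a single character; the guard is mostly useful when the condition is a range or a set of characters.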
In Elixir, single quoted strings are quite different from double quoted strings. Single quoted strings are basically lists of integers, where each integer represents a character. Therefore, they are also called character lists. They are mainly used for compatibility with Erlang, because that's how Erlang strings work. You can use single quoted strings just like you would use lists:
iex> hd('a')
97
iex> [97 | rest] = 'abcd'
'abcd'
iex> rest
'bcd'
iex> 'ab' ++ rest = 'abcd'
'abcd'
iex> rest
'cd'
The match function for single quoted strings would look like this:
def match('a' ++ _rest), do: 1
def match(_), do: 0
Elixir will hide the list from you and display it as a string when all of the integers are printable characters. To trick Elixir into showing you the internal representation of a character list, you can append a 0, which is not a printable character:
iex> string = 'abcd'
'abcd'
iex> string ++ [0]
[97, 98, 99, 100, 0]
However, one would typically use double quoted strings in Elixir, because they handle UTF-8 correctly, are much easier to work with, and are what Elixir's own modules expect (for example the useful String module). Double quoted strings are binaries, so you can treat them like any other binary:
iex> <<97, 98, 99, 100>>
"abcd"
iex> <<1256 :: utf8>>
"Ө"
iex> <<97>> <> rest = "abcd"
"abcd"
iex> rest
"bcd"
iex> "ab" <> rest = "abcd"
"abcd"
iex> rest
"cd"
The match function for double quoted strings would look like this:
def match("a" <> _rest), do: 1
def match(_), do: 0
Elixir will hide the internal representation of binary strings as well. To reveal it, you can again append a 0:
iex> string = "abcd"
"abcd"
iex> string <> <<0>>
<<97, 98, 99, 100, 0>>
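As an alternative to appending a 0, inspect/2 takes options that force the raw representation for both kinds of string. A small sketch using the charlists: and binaries: options from Inspect.Opts:

```elixir
# Force a charlist to be rendered as a list of integers.
IO.puts(inspect([97, 98, 99, 100], charlists: :as_lists))
# prints [97, 98, 99, 100]

# Force a double quoted string to be rendered as a raw binary.
IO.puts(inspect("abcd", binaries: :as_binaries))
# prints <<97, 98, 99, 100>>
```

This leaves the value itself unchanged, which the trick with the extra 0 does not.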
Lastly, to convert between single quoted strings and double quoted strings you can use the functions to_string and to_charlist:
iex> to_string('abcd')
"abcd"
iex> to_charlist("abcd")
'abcd'
To tell them apart, you can use is_list and is_binary. Both also work in guard clauses:
iex> is_list('abcd')
true
iex> is_binary('abcd')
false
iex> is_list("abcd")
false
iex> is_binary("abcd")
true
For example, to make the double quoted version compatible with single quoted strings:
def match(str) when is_list(str), do: match(to_string(str))
def match("a" <> _rest), do: 1
def match(_), do: 0
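Put into a module, the three clauses above can be tried out directly. This is a sketch only; the module name Matcher is hypothetical, and the charlist is written as a list of integers so it reads the same on every Elixir version:

```elixir
defmodule Matcher do
  # Charlists are converted first, then handled by the binary clauses below.
  def match(str) when is_list(str), do: match(to_string(str))
  # A binary whose first byte is "a".
  def match("a" <> _rest), do: 1
  # Anything else.
  def match(_other), do: 0
end

IO.inspect(Matcher.match("abcd"))            # => 1
IO.inspect(Matcher.match([97, 98, 99, 100])) # => 1
IO.inspect(Matcher.match("xyz"))             # => 0
```

The is_list clause has to come first, because the catch-all clause would otherwise swallow every charlist and return 0.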
In case anyone needs it: if the part you want to match sits in the middle of the string and you know its length in bytes, you can use binary matching:
iex(1)> <<"https://", locale::binary-size(2), ".wikipedia.com" >> = "https://en.wikipedia.com"
"https://en.wikipedia.com"
iex(2)> locale
"en"
defmodule MatchStick do
  def doMatch("a" <> _rest), do: 1
  def doMatch(_), do: 0
end
You need to use the string concatenation operator, <>.
Example:
iex> "he" <> rest = "hello"
"hello"
iex> rest
"llo"