Lua - convert string to table
You could use the string.gsub function with a callback that appends each character to a table:
t={}
str="text"
str:gsub(".",function(c) table.insert(t,c) end)
Just index each character and store it at the same position in a table:
local str = "text"
local t = {}
for i = 1, #str do
  t[i] = str:sub(i, i)
end
The built-in string library treats Lua strings as byte arrays, so both approaches above split a multibyte character into its individual bytes. An alternative that works on multibyte (Unicode) characters is the unicode library that originated in the Selene project (slnunicode). Its main selling point is that it can be used as a drop-in replacement for the string library, making most string operations “magically” Unicode-capable.
If you prefer not to add a Unicode library as a dependency, the task can easily be implemented using LPeg. Here is an example splitter:
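To illustrate the byte-array point, here is a minimal sketch in plain Lua. It assumes the input is already well-formed UTF-8 and does no validation; the pattern simply groups each non-continuation byte with the continuation bytes (128 to 191) that follow it.

```lua
local str = "a\195\169z"  -- "aéz": in UTF-8, é is the two bytes 195 and 169

-- Byte-wise split: é ends up as two separate one-byte strings.
local bytes = {}
for i = 1, #str do
  bytes[i] = str:sub(i, i)
end

-- Character-wise split: a non-continuation byte plus any continuation bytes.
-- No validation is performed; malformed input yields meaningless pieces.
local chars = {}
for c in str:gmatch("[^\128-\191][\128-\191]*") do
  chars[#chars + 1] = c
end

print(#bytes, #chars)  --> 4       3
```

This is only suitable when the string is known to be valid UTF-8.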
local lpeg = require "lpeg"
local C, Ct, R = lpeg.C, lpeg.Ct, lpeg.R
local lpegmatch = lpeg.match
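local split_utf8 do
  local utf8_x = R"\128\191"                             -- continuation byte
  local utf8_1 = R"\000\127"                             -- 1-byte (ASCII)
  local utf8_2 = R"\194\223" * utf8_x                    -- 2-byte sequence
  local utf8_3 = R"\224\239" * utf8_x * utf8_x           -- 3-byte sequence
  local utf8_4 = R"\240\244" * utf8_x * utf8_x * utf8_x  -- 4-byte sequence
  local utf8 = utf8_1 + utf8_2 + utf8_3 + utf8_4
  -- Capture every character into a table; "* -1" requires end of input,
  -- so the match fails (returns nil) if anything is left over.
  local split = Ct (C (utf8)^0) * -1
  split_utf8 = function (str)
    str = str and tostring (str)
    if not str then return end
    return lpegmatch (split, str)
  end
end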
This snippet defines the function split_utf8(), which creates a table of UTF-8 characters (as Lua strings) but returns nil if the string is not a valid UTF-8 sequence.
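For comparison, the same contract (a table of characters, or nil on invalid input) can be approximated without LPeg. The sketch below is a hypothetical plain-Lua counterpart, not part of the answer's code; the name split_utf8_plain is made up, and it uses the same lead-byte ranges as the grammar above, so like that grammar it does not reject overlong encodings or surrogates.

```lua
-- Hypothetical pure-Lua counterpart of split_utf8 (illustration only).
local function split_utf8_plain(str)
  local chars, i, n = {}, 1, #str
  while i <= n do
    local b = str:byte(i)
    local len
    if b <= 127 then len = 1                        -- ASCII
    elseif b >= 194 and b <= 223 then len = 2
    elseif b >= 224 and b <= 239 then len = 3
    elseif b >= 240 and b <= 244 then len = 4
    else return nil end                             -- invalid lead byte
    if i + len - 1 > n then return nil end          -- truncated sequence
    for j = i + 1, i + len - 1 do
      local cb = str:byte(j)
      if cb < 128 or cb > 191 then return nil end   -- not a continuation byte
    end
    chars[#chars + 1] = str:sub(i, i + len - 1)
    i = i + len
  end
  return chars
end

local ok = split_utf8_plain("a\208\187z")      -- "aлz"
local bad = split_utf8_plain(">\255< invalid") -- byte 255 never occurs in UTF-8
print(ok and #ok, bad)
```

It is likely to be slower than the LPeg version, since every byte is inspected from Lua code.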
You can run this test code:
tests = {
en = [[Lua (/ˈluːə/ LOO-ə, from Portuguese: lua [ˈlu.(w)ɐ] meaning moon; ]]
.. [[explicitly not "LUA"[1]) is a lightweight multi-paradigm programming ]]
.. [[language designed as a scripting language with "extensible ]]
.. [[semantics" as a primary goal.]],
ru = [[Lua ([лу́а], порт. «луна») — интерпретируемый язык программирования, ]]
.. [[разработанный подразделением Tecgraf Католического университета ]]
.. [[Рио-де-Жанейро.]],
gr = [[Η Lua είναι μια ελαφρή προστακτική γλώσσα προγραμματισμού, που ]]
.. [[σχεδιάστηκε σαν γλώσσα σεναρίων με κύριο σκοπό τη δυνατότητα ]]
.. [[επέκτασης της σημασιολογίας της.]],
XX = ">\255< invalid"
}
-------------------------------------------------------------------------------
local limit = 14
for lang, str in next, tests do
  io.write "\n"
  io.write (string.format ("<%s %3d> ->", lang, #str))
  local chars = split_utf8 (str)
  if not chars then
    io.write " INVALID!"
  else
    io.write (string.format (" <%3d>", #chars))
    for i = 1, #chars > limit and limit or #chars do
      io.write (string.format (" %q", chars [i]))
    end
  end
end
io.write "\n"
By the way, building a table with LPeg is significantly faster than calling table.insert() repeatedly.
Here are stats for splitting the whole of Gogol’s Dead Souls (in Russian, 1023814 bytes raw, 571395 UTF-8 characters) on my machine:
library      method               time in ms
string       table.insert()              380
string       t [#t + 1] = c              310
string       gmatch & for loop           280
slnunicode   table.insert()              220
slnunicode   t [#t + 1] = c              200
slnunicode   gmatch & for loop           170
lpeg         Ct (C (...))                 70
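A rough way to reproduce the string-library rows is sketched below. The exact benchmark code is not shown in the answer, so the three variants are a guess at what the method names mean, and the sample text and its size are arbitrary assumptions. Absolute timings will differ by machine and Lua version.

```lua
-- Arbitrary ASCII sample text (an assumption, not the text used above).
local sample = string.rep("The quick brown fox. ", 5000)

local function with_insert(s)        -- gsub callback + table.insert()
  local t = {}
  s:gsub(".", function(c) table.insert(t, c) end)
  return t
end

local function with_append(s)        -- gsub callback + t[#t + 1] = c
  local t = {}
  s:gsub(".", function(c) t[#t + 1] = c end)
  return t
end

local function with_gmatch(s)        -- gmatch iterator in a for loop
  local t = {}
  for c in s:gmatch(".") do t[#t + 1] = c end
  return t
end

-- Time one call with os.clock() (CPU time) and report the item count.
local function time_it(label, f)
  local start = os.clock()
  local result = f(sample)
  local ms = (os.clock() - start) * 1000
  print(string.format("%-14s %8.1f ms  %d items", label, ms, #result))
  return result
end

local r1 = time_it("table.insert", with_insert)
local r2 = time_it("t[#t + 1]", with_append)
local r3 = time_it("gmatch loop", with_gmatch)
```

A single run is noisy; repeating each variant several times and taking the minimum gives steadier numbers.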