Galois Field Fourier Transform
Presumably your $\alpha$ is a primitive element satisfying the equation $\alpha^3=\alpha+1$ (as opposed to a primitive element satisfying the equation $\alpha^3=\alpha^2+1$, which is the other alternative) - at least that choice allowed me to reproduce your results for both the generator polynomial $g(x)$ as well as the polynomial $c_{nonsys}(x)=m(x)g(x)$.
However, there seems to be a mistake in your formula for the second definition. With this type of RS-codes, you do not evaluate the message polynomial at zero, only at the elements of the multiplicative group. The Wikipedia article concurs, so you should calculate $$ c_{tp}(x)=m(\alpha^6)x^6+m(\alpha^5)x^5+m(\alpha^4)x^4+m(\alpha^3)x^3+m(\alpha^2)x^2+m(\alpha)x+m(1), $$ which is equal to (unless I made a mistake) $$ c_{tp}(x)=\alpha^6x^6+\alpha x^5+\alpha x^4+\alpha^4 x^3+\alpha^6 x^2+\alpha^4x+1. $$
But I couldn't find the relation between the two encodings from the Wikipedia article at all. Are you sure that it was supposed to go like that? What it says (and also holds) is that the sequence of coefficients of the polynomial $c_{tp}(x)$ is the DFT of the sequence of the coefficients of the message polynomial $m(x)$. Therefore we should be able to recover the message as the inverse DFT of the coefficients of $c_{tp}(x)$. Let's try!
$$ c_{tp}(\alpha^{7-0})=c_{tp}(1)=\alpha^6+\alpha+\alpha+\alpha^4+\alpha^6+\alpha^4+1=1, $$ $$ c_{tp}(\alpha^{7-1})=c_{tp}(\alpha^6)=1+\alpha^3+\alpha^4+\alpha+\alpha^4+\alpha^3+1=\alpha, $$ $$ c_{tp}(\alpha^{7-2})=c_{tp}(\alpha^5)=\alpha+\alpha^5+1+\alpha^5+\alpha^2+\alpha^2+1=\alpha. $$ It does look like we, indeed, recovered the coefficients of $m(x)$. The coefficient of degree $i$ term is gotten as $c_{tp}(\alpha^{7-i})$. Because $c_{tp}(x)$ is a valid codeword, we know in advance that $c_{tp}(\alpha^j)=0$, for $j=1,2,3,4$ which is such as well, because those values are the coefficients of $x^6, x^5, x^4, x^3$ respectively, and the message polynomial is quadratic.
The equivalence proof in the Wikipedia article is about showing that the resulting set of vectors (= the RS-code) is the same for both methods of encoding messages. It is not attempting to say anything about transforming the codeword gotten by encoding a message $m(x)$ encoded in the spirit of the first definition to another codeword that would correspond to the same message $m(x)$, when encoded with the second method. I'm fairly sure that is all that the argument in Wikipedia is attempting to say.
Mind you, there should be a way of achieving the transformation that you were looking for. Unfortunately I don't remember right now, how it goes. It would be based on the fact that the polynomial multiplication that gave us the polynomial $c_{nonsys}(x)$ is more or less like a convolution. So when we take the DFT into account that corresponds to pointwise multiplication. Converting this idea into a useful formula that we could test takes more time and space than I can invest on this at the moment, so I stop at this point for now.
With $m(x)$ the message and $g(x) = (x-\alpha)(x-\alpha^2)(x-\alpha^3)(x-\alpha^4)$ the generator polynomial. Then $$ c_{\text{sys}}(x) = x^4m(x) + (x^4 m(x) \bmod g(x)) = d(x)g(x) $$ is a multiple of $g(x)$. On the other hand, $c_{\text{nonsys}}(x) = m(x)g(x)$ is another encoding of $m(x)$.
For any polynomial $f(x)$ of degree $6$ or less, define $F_i = f(\alpha^{-i})$, $0 \leq i \leq 6,$ and $F(z) = \sum_{i=0}^6 F_i z^i$ as the Fourier transform of $f(x)$. Then, $f_j = F(\alpha^j) = \sum_{j=0}^6 F_j\alpha^j$ and $f(x)$ is called the inverse Fourier transform of $F(z)$. (The factor $(1/N)$ that shows up in Fourier transforms happens to have value $1$ here). So what is the Fourier transform of $c_{\text{nonsys}}(x)$? We have $$C_{\text{nonsys}, i} = c_{\text{nonsys}}(\alpha^{-i}) = \begin{cases}0, & i = 3, 4, 5, 6,\\ m(\alpha^{-i})g(\alpha^{-i}), & i = 0, 1, 2\end{cases}$$ since $(\alpha^{-3}, \alpha^{-4},\alpha^{-5},\alpha^{-6}) = (\alpha^{4},\alpha^{3},\alpha^{3},\alpha^{1})$ and $c_{\text{nonsys}}(x)$, being a multiple of $g(x)$ has $\alpha, \alpha^2, \alpha^3, \alpha^4$ as roots. Thus, we have $$\begin{align*} C_{\text{nonsys}}(z) &= \sum_{i=0}^6 C_{\text{nonsys}, i}z^i\\ &= C_{\text{nonsys}, 0} + C_{\text{nonsys}, i}z + C_{\text{nonsys}, i}z^2\\ &= [m(1)g(1)] + [m(\alpha^{-1})g(\alpha^{-1})]z + [m(\alpha^{-2})g(\alpha^{-2})]z^2 \end{align*} $$ as the "message polynomial" of degree $2$ (or less) which gets encoded into $c_{\text{nonsys}}(x)$ as the "transmit points" codeword.
On Wikipedia there is written that we can transform from one definition to second by using Fourier transform.
Turns out it's not true and that section was removed. I show an example of actual codeword transform below. Fourier transform considers a set of values as a set of coefficients to a polynomial, and then uses that polynomial to evaluate a set of input values {1, α, α2, ..., αn-1} to produce a new set of values. Inverse Fourier transform or more generally, Lagrange interpolation, reverses the process. Consider Fourier transform as a mapping and Lagrange interpolation as the inverse mapping of a set of values to another set of values. A Fourier transform can be applied repeatedly, and then Lagrange interpolation can be applied the same number of times or vice versa to end up with the original set of values. A Fourier transform or it's inverse will not transform a codeword from one encoding scheme to another.
The encoding schemes for an RS(n,k) code linearly map a k element original message into an n element codeword, and can be implemented by multiplying the original message by an encoding matrix. Whether the elements of a codeword are considered as values or coefficients is based on the algorithms used in encoders or decoders.
how to convert BCH scheme to transmitting point scheme
Note that the following only works for (error free) codewords. I'm not aware of a way to map "codewords" with errors other than to correct them using an appropriate scheme base decoder, and then mapping the error free codeword.
To transform a codeword from one encoding scheme to another, a mapping matrix that can map one scheme's encoding matrix into another schemes encoding matrix can be used to map a codeword from one scheme to another (assuming that such a matrix can be created).
The following example uses original view evaluation points and BCH generator polynomial roots: {1, α, ... , αn-1}, and systematic encoding RS(n,k) where the leading k elements of a codeword are the original message, followed by (n-k) parity elements. This example uses the OP's example of RS(7,3), GF(2^3).
Note - I couldn't figure out how to format matrices side by side. What works for wiki articles isn't working here at SE.
Naming the message as a matrix m:
\begin{bmatrix} m0 & m1 & m2 \end{bmatrix}Naming the BCH encoding matrix BCH:
\begin{bmatrix} 1 & 0 & 0 & 2 & 3 & 5 & 5 \\ 0 & 1 & 0 & 1 & 6 & 4 & 2 \\ 0 & 0 & 1 & 4 & 7 & 7 & 5 \end{bmatrix}Naming the transmitting point encoding matrix RS:
\begin{bmatrix} 1 & 0 & 0 & 3 & 2 & 1 & 3 \\ 0 & 1 & 0 & 5 & 5 & 1 & 4 \\ 0 & 0 & 1 & 7 & 6 & 1 & 6 \end{bmatrix}Then encoding is a matrix multiply: $m \cdot BCH$ or $m \cdot RS$
The mapping matrix $M$ is the solution to $BCH \cdot M = RS$ .
M is a N x N matrix with too many potential variables, unless it's filled with 0's and 1 so only subsets of K by K+1 augmented matrices are solved using Gaussian elimination and filled into M:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 4 & 1 & 6 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 3 & 7 & 1 & 4 \\ 0 & 0 & 0 & 7 & 3 & 1 & 1 \end{bmatrix}For $RS \cdot M' = BCH$ , M':
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 & 2 & 4 \\ 0 & 0 & 0 & 3 & 1 & 6 & 2 \\ 0 & 0 & 0 & 4 & 2 & 1 & 5 \\ 0 & 0 & 0 & 1 & 4 & 3 & 1 \end{bmatrix}Since these are systematic encoders, the original message terms are not modified by the encoders or by M or M', so M and M' only need the bottom right (n-k) by (n-k) part of the matrices to map the (n-k) parity terms.
Using the OP's example, the BCH encoded message = bch = {2,2,1,2,6,5,0}, the transmitting point encoded message = rs = {2,2,1,0,3,1,3}, and $bch \cdot M$ = {2,2,1,0,3,1,3} = rs and $rs \cdot M'$ = {2,2,1,2,6,5,0} = bch .
I don't know if a mapping matrix that deals with a non-systematic encoding scheme is possible. A non-systematic encoding matrix has k $\cdot$ n variables, while a systematic encoding matrix has a k by k identity matrix on the left part and k $\cdot$ (n-k) variables. Using the OP's example such a matrix would have to map 12 variables into 21 variables and vice versa.
In this case RS(7,3), GF(2^3), BCH type encoding can be made identical to transmitting point encoding by setting the first consecutive root in the generator polynomial to αk
(x-3)(x-6)(x-7)(x-5) = 1 x^4 + 7 x^3 + 6 x^2 + 1 x + 6
I've confirmed that this method does not always work, and depends on RS(n,k) and/or GF().
Note - the actual equation for BCH systematic equation should be a "subtraction" of the "remainder":
$c_{sys}(x)=x^4m(x) - (x^4m(x)) \bmod g(x)=\alpha x^6 + \alpha x^5 + x^4 + \alpha^5 x^3 + x^2 + \alpha^2 x + \alpha^5$.
Whether addition or subtraction should be used is masked when using GF(2^n), since both addition and subtraction are effectively xor. If using GF(p,n) with p > 2 and n >= 1, then it's important to keep track of when addition or subtraction should be used.
The wiki article now mentions practical (polynomial time) decoders for the original transmission point type of encoding (Berlekamp Welch, Gao, ...). These generate an error polynomial E(x) and F(x), where F(x) will be the original encoding polynomial if the error correction succeeds. However, these decoders are significantly slower than BCH decoders. Berlekamp Welch (Gaussian elimination) is O(n^3), and while Gao original view decoder (extended Euclid) may be O(n^2), it's 3 to 12 times slower than Sugiyama BCH decoder (also extended Euclid) in the cases I've tested.