What is the meaning of the exclamation mark in indexing a Julia DataFrame?

Quoting the documentation of DataFrames.jl:

Columns can be directly (i.e. without copying) accessed via df.col or df[!, :col]. [...] Since df[!, :col] does not make a copy, changing the elements of the column vector returned by this syntax will affect the values stored in the original df. To get a copy of the column use df[:, :col]: changing the vector returned by this syntax does not change df.

An example might make this clearer:

julia> using DataFrames

julia> df = DataFrame(x = rand(5), y=rand(5))
5×2 DataFrame
│ Row │ x        │ y         │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.937892 │ 0.42232   │
│ 2   │ 0.54413  │ 0.932265  │
│ 3   │ 0.961372 │ 0.680818  │
│ 4   │ 0.958788 │ 0.923667  │
│ 5   │ 0.942518 │ 0.0428454 │
# `a` is a copy of `df.x`: modifying it will not affect `df`
julia> a = df[:, :x]
5-element Array{Float64,1}:
 0.9378915597741728
 0.544130347207969
 0.9613717853719412
 0.958788066884128
 0.9425183324742632

julia> a[2] = 1;

julia> df
5×2 DataFrame
│ Row │ x        │ y         │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.937892 │ 0.42232   │
│ 2   │ 0.54413  │ 0.932265  │
│ 3   │ 0.961372 │ 0.680818  │
│ 4   │ 0.958788 │ 0.923667  │
│ 5   │ 0.942518 │ 0.0428454 │
# `b` is the column vector stored in `df` itself (not a copy): any change made to it will be reflected in `df`
julia> b = df[!, :x]
5-element Array{Float64,1}:
 0.9378915597741728
 0.544130347207969
 0.9613717853719412
 0.958788066884128
 0.9425183324742632

julia> b[2] = 1;

julia> df
5×2 DataFrame
│ Row │ x        │ y         │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.937892 │ 0.42232   │
│ 2   │ 1.0      │ 0.932265  │
│ 3   │ 0.961372 │ 0.680818  │
│ 4   │ 0.958788 │ 0.923667  │
│ 5   │ 0.942518 │ 0.0428454 │


Note that since indexing with ! does not copy any data, it is generally faster and allocates less memory than indexing with :, which creates a new vector on every call.
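To check the no-copy behaviour without reading through printed tables, here is a minimal sketch that compares object identity directly. The variable names (same_object, fresh_copy, written_through) are made up for illustration, and it assumes a reasonably recent DataFrames.jl release.

```julia
using DataFrames

df = DataFrame(x = [1.0, 2.0, 3.0])

# `df.x` and `df[!, :x]` both return the stored column itself,
# so the two results are the very same object
same_object = df.x === df[!, :x]

# `df[:, :x]` allocates a fresh vector, so it is a different object
fresh_copy = df[:, :x] !== df.x

# Mutating the non-copying result writes through to `df`
col = df[!, :x]
col[1] = 10.0
written_through = df.x[1] == 10.0
```

All three flags should come out true: the first two show which syntax copies, the third shows why that matters when you mutate the result.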


! in indexing is specific to DataFrames and signals that you want a reference to the underlying vector storing the data, rather than a copy of it. You can read all about indexing DataFrames here. In your example they are both == because all values are identical, but they are not === since df[:, :Treatment] gives you a copy of the underlying data.

Example:

julia> using DataFrames

julia> df = DataFrame(y = [1, 2, 3]);

julia> df[:, :y] == df[!, :y] # true because all values are equal
true

julia> df[:, :y] === df[!, :y] # false because they are not the same vector
false
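The same distinction applies when ! or : appears on the left-hand side of an assignment. The sketch below is not part of the original answers and assumes DataFrames.jl 1.x semantics; old, v, in_place and replaced are illustrative names.

```julia
using DataFrames

df = DataFrame(y = [1, 2, 3])
old = df[!, :y]            # reference to the stored column

# `:` on the left-hand side writes the new values into the existing column vector
df[:, :y] = [4, 5, 6]
in_place = df[!, :y] === old    # the column is still the same vector object

# `!` on the left-hand side replaces the column with `v` itself, without copying
v = [7, 8, 9]
df[!, :y] = v
replaced = df[!, :y] === v
```

So with :, the existing vector is reused and must be able to hold the new values, while with ! the DataFrame simply starts pointing at the vector you supplied.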
