How can I make slot to be filled with multiple same-type objects in R?

Remembering that R works well on vectors, a first step is to think of 'Words' rather than 'Word'

## constructor, accessors, subset (also need [[, [<-, [[<- methods)
.Words <- setClass("Words",
    representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})

## validity
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

@nicola's suggestion that one have a list of words has been formalized in the IRanges package (actually, S4Vectors in the 'devel' / 3.0 branch of Bioconductor), where a 'SimpleList' takes the 'naive' approach of requiring all elements of the list to have the same class, whereas a 'CompressedList' has similar behavior but actually is implemented as a vector-like object (one with a length(), [, and [[ methods) that is 'partitioned' (either by end or width) into groups.

library(IRanges)
.Sentences = setClass("Sentences",
    contains="CompressedList",    
    prototype=c(elementType="Words"))

One would then write a more user-friendly constructor, but the basic functionality is

## 0 Sentences
.Sentences()
## 1 sentence of 0 words
.Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
## 3 sentences of 2, 0, and 3 words
s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]), 
    partitioning=PartitioningByEnd(c(2, 2, 5)))

leading to

> s3[[1]]
An object of class "Words"
Slot "word":
[1] "a" "b"

Slot "part":
[1] "A" "B"

> s3[[2]]
An object of class "Words"
Slot "word":
character(0)

Slot "part":
character(0)

> s3[[3]]
An object of class "Words"
Slot "word":
[1] "c" "d" "e"

Slot "part":
[1] "C" "D" "E"

Notice that some typical operations are fast because they can operate on the 'unlisted' elements without creating or destroying S4 instances, e.g., coercing all 'words' to upper case

setMethod(toupper, "Words", function(x) { x@word <- toupper(x@word); x })
setMethod(toupper, "Sentences", function(x) relist(toupper(unlist(x)), x))

This is 'fast' for large collections of sentences because unlist / relist is really on a slot access and creation of a single instance of 'Words'. Scalable Genomics with R and Bioconductor outlines this and other strategies.

In an answer @nicola says 'R is not perfectly suited for OO programming style' but it's probably more helpful to realize that R's S4 object oriented style differs from C++ and Java, just as R differs from C. In particular it's really valuable to continue thinking in terms of vectors when working with S4 -- Words rather than Word, People rather than Person...


I suggest just a work-around for this class of problems. Keep in mind that R is not perfectly suited for OO programming style and every solution will hardly show the solidity of other languages like Java or C++. However, you can declare your Sentence class with a words slot as a list. Then you define your constructor as such:

   Sentence<-function(words,stats) {
     #check for the components' class of words argument
     if (!is.list(words) || !all(sapply(words,function(x) class(x)=="Word"))) stop("Not valid words argument")
     #create the object
      new("Sentence", words=words, stats=stats)
   }

An example of such constructor can be find in the sp package for the Polygons class. You can see the body of that function.

If you want to avoid that user sets incorrectly the words slot, you can redefine the @<- operator such like:

    "@<-.Sentence"<-function(sentence,...) invisible(sentence)

I don't think that the last step is necessary. No matter what you do, user can always mess things up. For instance, he could directly call the new function bypassing your constructor. Or he could set the Word class to an arbitrary object and then pass it to Sentence. As I said, R is not perfect for this style of programming, so you should often adopt some kind of non-optimal solution.

Tags:

Object

Class

R

Slot