SPARQL Optional query
This question is old, but the answer is still hard to understand clearly. Allow me to try in plain English, with thanks to SPARQL_Order_Matters.
When OPTIONALs appear at the beginning of a query, they either:
- Don't match, and nothing happens
- Do match, and their matches become the starting set of results against which the rest of the query must match
When OPTIONALs appear after some pattern has already matched some data, they either:
- Don't match, and nothing happens
- Do match, and new variable bindings are added to the existing results
So the really non-obvious behavior happens when an OPTIONAL comes first and matches some triples: every query result must then be compatible with what that OPTIONAL matched.
The ordering is important here.
The semantics of SPARQL queries are expressed via the SPARQL algebra, and the two queries here produce very different algebra. I used the SPARQL Query Validator provided by the Apache Jena project (disclaimer: I am a committer on that project) to generate the algebra.
Your first query produces the following algebra:
(base <http://example/base/>
  (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
    (project (?first ?last)
      (leftjoin
        (leftjoin
          (bgp (triple ?s ab:lastName ?last))
          (bgp (triple ?s ab:nick ?first)))
        (bgp (triple ?s ab:firstName ?first))))))
And your second query produces the following algebra:
(base <http://example/base/>
  (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
    (project (?first ?last)
      (join
        (leftjoin
          (leftjoin
            (table unit)
            (bgp (triple ?s ab:nick ?first)))
          (bgp (triple ?s ab:firstName ?first)))
        (bgp (triple ?s ab:lastName ?last))))))
As you can see, the triple patterns appear in a different order and the operators differ. Importantly, your second query has a join, which only preserves compatible solutions from both sides, whereas the first query uses only leftjoin, which preserves LHS solutions as-is if there are no compatible solutions on the RHS.
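If it helps to see the difference between the two operators concretely, here is a minimal Python sketch. It is a toy model, not how Jena evaluates queries: solutions are plain dicts from variable names to values, there is no FILTER inside the OPTIONAL, and the helper names (compatible, join, leftjoin) and the sample solutions are hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """SPARQL join: keep only merged pairs of compatible solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def leftjoin(left, right):
    """SPARQL leftjoin (no filter): like join, but a LHS solution with no
    compatible RHS solution is kept as-is instead of being dropped."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out


# Hypothetical solutions: two people with last names, only one with a nick
lhs = [{"?s": "alice", "?last": "Smith"}, {"?s": "bob", "?last": "Jones"}]
rhs = [{"?s": "bob", "?first": "Bobby"}]

inner = join(lhs, rhs)
outer = leftjoin(lhs, rhs)
print(inner)  # [{'?s': 'bob', '?last': 'Jones', '?first': 'Bobby'}]
print(outer)  # [{'?s': 'alice', '?last': 'Smith'}, {'?s': 'bob', '?last': 'Jones', '?first': 'Bobby'}]
```

Alice disappears from the join because nothing on the right is compatible with her, but the leftjoin keeps her row unchanged.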
So in the first query you first find things with an ab:lastName and then optionally add the ab:nick or ab:firstName if present, hence you get all the people in your data (everyone with an ab:lastName) returned.
In the second query you first find things with an ab:nick, then optionally add an ab:firstName, before requiring that everything has an ab:lastName. Therefore you can only get the person with a last name returned.
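To check that reading, here is a sketch that evaluates both algebra expressions above by hand in Python. It assumes made-up data (not the data from the question), treats prefixed names such as ab:lastName as plain strings, and only supports bgps with a single triple pattern; all helper names are hypothetical.

```python
def compatible(a, b):
    """Solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """SPARQL join: merged pairs of compatible solutions only."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def leftjoin(left, right):
    """SPARQL leftjoin (no filter): keep a LHS solution as-is if nothing on the RHS is compatible."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out


def bgp(pattern, data):
    """Match ONE triple pattern against (s, p, o) tuples; '?x' terms are variables."""
    solutions = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


def project(solutions, variables):
    """Keep only the selected variables (None stands in for unbound)."""
    return [tuple(s.get(v) for v in variables) for s in solutions]


TABLE_UNIT = [{}]  # a single empty row
# Hypothetical data: everyone has a last name, only Bob has a nick
DATA = [
    ("alice", "ab:firstName", "Alice"),
    ("alice", "ab:lastName", "Smith"),
    ("bob", "ab:nick", "Bobby"),
    ("bob", "ab:lastName", "Jones"),
    ("carol", "ab:firstName", "Carol"),
    ("carol", "ab:lastName", "Lee"),
]

# First query: lastName first, then two leftjoins
q1 = leftjoin(
    leftjoin(bgp(("?s", "ab:lastName", "?last"), DATA),
             bgp(("?s", "ab:nick", "?first"), DATA)),
    bgp(("?s", "ab:firstName", "?first"), DATA),
)

# Second query: leftjoins starting from table unit, then a join with lastName
q2 = join(
    leftjoin(
        leftjoin(TABLE_UNIT, bgp(("?s", "ab:nick", "?first"), DATA)),
        bgp(("?s", "ab:firstName", "?first"), DATA),
    ),
    bgp(("?s", "ab:lastName", "?last"), DATA),
)

print(project(q1, ["?first", "?last"]))  # [('Alice', 'Smith'), ('Bobby', 'Jones'), ('Carol', 'Lee')]
print(project(q2, ["?first", "?last"]))  # [('Bobby', 'Jones')]
```

With this made-up data only Bob survives the second query: the leading OPTIONAL binds ?s to the nick holders, and the final join can only extend those rows, never add new people.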
I thought the period in SPARQL query is the same as "and" operator.
No. It merely terminates a triple pattern and may optionally follow other clauses (but is not required to); it is not an "and" operator.
Adjacent basic graph patterns are joined unless an alternative join operator (e.g. leftjoin or minus) is implied by the presence of an OPTIONAL or MINUS clause.
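As a small illustration that the period only separates triple patterns while the combining is done by join, here is a Python sketch. Assuming the patterns contain no blank nodes, evaluating { p1 . p2 } directly (pick one data triple per pattern, keep the choice if the bindings agree) gives the same solutions as joining p1 and p2 evaluated separately. The data and helper names are hypothetical.

```python
from itertools import product


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict.
    Terms starting with '?' are variables."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if b.get(term, value) != value:
                return None
            b[term] = value
        elif term != value:
            return None
    return b


def eval_bgp(patterns, data):
    """Evaluate a BGP directly: choose one data triple per pattern and keep
    the choice when every pattern matches with consistent bindings."""
    solutions = []
    for triples in product(data, repeat=len(patterns)):
        binding = {}
        for pattern, triple in zip(patterns, triples):
            binding = match(pattern, triple, binding)
            if binding is None:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(a, b):
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def canonical(solutions):
    """Order-independent form so two solution lists can be compared."""
    return sorted(sorted(s.items()) for s in solutions)


DATA = [
    ("alice", "ab:firstName", "Alice"),
    ("alice", "ab:lastName", "Smith"),
    ("bob", "ab:lastName", "Jones"),
]
p1 = ("?s", "ab:lastName", "?last")
p2 = ("?s", "ab:firstName", "?first")

together = eval_bgp([p1, p2], DATA)                        # { p1 . p2 }
joined = join(eval_bgp([p1], DATA), eval_bgp([p2], DATA))  # join of p1 and p2
print(together)  # [{'?s': 'alice', '?last': 'Smith', '?first': 'Alice'}]
print(canonical(together) == canonical(joined))  # True
```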
Edit: What is table unit?
table unit is a special operator that corresponds to the empty graph pattern in a SPARQL query. For example, SELECT * WHERE { } would produce the algebra (table unit).
It produces a single empty row, which in the semantics of SPARQL means it can be joined to anything and the result is simply the other side, so in essence it acts as a join identity. In many cases a SPARQL engine can simplify the algebra to remove table unit, since in most cases it has no effect on the semantics of the query.
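In a toy Python model where each solution is a dict and a result is a list of solutions, table unit is just a list holding one empty dict. This sketch (hypothetical helper names, made-up data) shows it acting as a join identity, in contrast to a pattern that matched nothing, which is an empty list and wipes out any join.

```python
def compatible(a, b):
    """Solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """SPARQL join of two lists of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


TABLE_UNIT = [{}]  # one row, no bindings
EMPTY = []         # zero rows, e.g. a pattern with no matches

people = [{"?s": "alice", "?last": "Smith"}, {"?s": "bob", "?last": "Jones"}]

print(join(TABLE_UNIT, people) == people)  # True: unit on the left is a no-op
print(join(people, TABLE_UNIT) == people)  # True: unit on the right is a no-op
print(join(EMPTY, people))                 # []: zero rows absorb everything
```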
In your first query there is technically another join, between table unit and the first bgp, but in the case of a normal join the presence of table unit has no effect (it is the join identity), so it can be, and is, simplified out.
However, with an OPTIONAL the SPARQL specification requires that the algebra produced is a left join of whatever the preceding clause was (on the left) with the pattern inside the OPTIONAL (on the right). In the case of your second query there is no preceding clause before your first OPTIONAL (technically there is an implicit empty graph pattern there), so the first leftjoin generated has table unit on its left-hand side. Unlike with a normal join, the table unit has to be preserved in this case, because the semantics of leftjoin say that the results from the LHS are preserved if there are no compatible solutions from the RHS.
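Here is a hedged Python sketch of why the engine cannot drop table unit from a leftjoin. It uses a toy model (solutions as dicts, hypothetical helper names, made-up data): when the OPTIONAL matches, leftjoin(unit, X) happens to equal X, but when it matches nothing the single empty row survives, and that row is what lets the rest of the query still produce results.

```python
def compatible(a, b):
    """Solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """SPARQL join: merged pairs of compatible solutions only."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def leftjoin(left, right):
    """SPARQL leftjoin (no filter): keep a LHS solution as-is if nothing on the RHS is compatible."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out


TABLE_UNIT = [{}]
nick_matches = [{"?s": "bob", "?first": "Bobby"}]  # OPTIONAL body matched
no_matches = []                                    # OPTIONAL body matched nothing
rest = [{"?s": "alice", "?last": "Smith"}]         # results of a later pattern

# When the OPTIONAL matches, the unit looks removable ...
print(leftjoin(TABLE_UNIT, nick_matches) == nick_matches)  # True

# ... but when it matches nothing, the empty row is preserved.
print(leftjoin(TABLE_UNIT, no_matches))  # [{}]

# Keeping it lets the rest of the query produce results; "simplifying"
# leftjoin(unit, X) to X would wrongly wipe them out.
print(join(leftjoin(TABLE_UNIT, no_matches), rest))  # [{'?s': 'alice', '?last': 'Smith'}]
print(join(no_matches, rest))                        # []
```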
We can illustrate this with a more trivial query:
SELECT *
WHERE
{
  OPTIONAL { ?s a ?type }
}
Produces the algebra:
(base <http://example/base/>
  (leftjoin
    (table unit)
    (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type))))
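To see what that algebra returns in practice, here is a sketch that evaluates it in a toy Python model (solutions as dicts, hypothetical helper names) against two made-up datasets, one with an rdf:type triple and one without.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def compatible(a, b):
    """Solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def leftjoin(left, right):
    """SPARQL leftjoin (no filter): keep a LHS solution as-is if nothing on the RHS is compatible."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out


def bgp(pattern, data):
    """Match ONE triple pattern against (s, p, o) tuples; '?x' terms are variables."""
    solutions = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


def trivial_query(data):
    """(leftjoin (table unit) (bgp (triple ?s rdf:type ?type)))"""
    return leftjoin([{}], bgp(("?s", RDF_TYPE, "?type"), data))


typed = [("alice", RDF_TYPE, "Person"), ("alice", "ab:lastName", "Smith")]
untyped = [("alice", "ab:lastName", "Smith")]

print(trivial_query(typed))    # [{'?s': 'alice', '?type': 'Person'}]
print(trivial_query(untyped))  # [{}]
```

So with no rdf:type triples, SELECT * still returns one row with ?s and ?type unbound rather than no rows at all, which is exactly the preserved table unit.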