How to calculate these totient summation sums efficiently?
For the case $j=0$, you can define some auxiliary counting functions and use them to formulate an algorithm that runs in $O(n^{3/4})$ time:
$$F(N) = \lvert \{ a,b : 0 < a < b \le N \} \rvert$$
$$R(N) = \lvert \{ a,b : 0 < a < b \le N, \gcd(a,b) = 1 \} \rvert$$
You can see that what we are looking for, the totient sum $\sum_{k=1}^{N}\phi(k)$, is exactly $R(N) + 1$: for each $b \ge 2$, $\phi(b)$ counts the pairs $(a,b)$ with $0 < a < b$ and $\gcd(a,b) = 1$, and the extra $1$ accounts for $\phi(1)$. Also, $F(N)$ is $\dfrac{N(N-1)}{2}$.
Now observe something nice:
$$R\left( \Big\lfloor\dfrac{N}{m}\Big\rfloor \right) = \lvert \{ a,b : 0 < a < b \le N, \gcd(a,b) = m \} \rvert$$
Why? Because multiplying each coprime pair $(a,b)$ with $0 < a < b \le \Big\lfloor\dfrac{N}{m}\Big\rfloor$ by $m$ gives exactly the pairs up to $N$ with $\gcd$ equal to $m$, and dividing by $m$ goes back the other way.
This fact lets you write $F$ in terms of $R$:
$$F(N) = \displaystyle\sum_{m=1}^N{ R\left(\Big\lfloor\dfrac{N}{m}\Big\rfloor\right) }$$
Since we are looking for $R(N)$, we solve for the $m=1$ term of the sum on the right:
$R(N) = F(N) - \displaystyle\sum_{m=2}^N{ R\left(\Big\lfloor\dfrac{N}{m}\Big\rfloor\right) } $
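For instance, with $N = 4$ we have $F(4) = 6$, $R(2) = 1$ and $R(1) = 0$, so
$$R(4) = F(4) - R(2) - R(1) - R(1) = 6 - 1 - 0 - 0 = 5,$$
matching the five coprime pairs $(1,2),(1,3),(1,4),(2,3),(3,4)$.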
Note this interesting property of the floor function: $\Big\lfloor\dfrac{N}{m}\Big\rfloor$ stays constant over a whole range of $m$, which lets us calculate the summation in chunks. For example, $\Big\lfloor\dfrac{1000}{m}\Big\rfloor$ is constant (equal to $1$) for $m$ in the range $[501,1000]$.
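In general the block on which $\Big\lfloor\dfrac{N}{m}\Big\rfloor$ takes a fixed value $q$ ends at $m = \Big\lfloor\dfrac{N}{q}\Big\rfloor$, so the next block starts at $\Big\lfloor\dfrac{N}{\lfloor N/m\rfloor}\Big\rfloor + 1$, and there are only $O(\sqrt N)$ blocks in total. For $N = 1000$ the first few blocks from the top are $[501,1000] \mapsto 1$, $[334,500] \mapsto 2$, $[251,333] \mapsto 3$.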
Here's a program I wrote in C++ that caches $R$ values, trading $O(\log n)$ memory for a large speedup.
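A minimal sketch of such a memoized computation (not the original program; it assumes a hash-map cache and evaluates the inner sum in chunks of constant $\lfloor N/m\rfloor$, with names chosen for illustration):

```cpp
#include <cstdint>
#include <unordered_map>

using u64 = std::uint64_t;

// R(N): number of coprime pairs 0 < a < b <= N.
u64 R(u64 N) {
    static std::unordered_map<u64, u64> cache;
    if (N < 2) return 0;                      // R(0) = R(1) = 0
    auto it = cache.find(N);
    if (it != cache.end()) return it->second;
    u64 result = N * (N - 1) / 2;             // F(N)
    for (u64 m = 2, next; m <= N; m = next) {
        u64 q = N / m;                        // common value of floor(N/m) on this chunk
        next = N / q + 1;                     // first m with a smaller quotient
        result -= (next - m) * R(q);          // subtract the whole chunk at once
    }
    return cache[N] = result;
}
```

The totient sum $\phi(1) + \cdots + \phi(N)$ is then $R(N) + 1$, as noted above.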
A sublinear algorithm, given by daniel.is.fischer, starts from the identity $$F(N) = \sum_{m=1}^N R \left( \Big \lfloor \frac N m \Big \rfloor \right)$$
and rewrites $$ R(N) = F(N) - \displaystyle\sum_{m=2}^N{ R\left(\Big\lfloor\dfrac{N}{m}\Big\rfloor\right) } $$ as $$ R(N) = F(N) - F\left( \Big\lfloor \frac{N}{2} \Big\rfloor \right) - \sum_{k=1}^{\lfloor(N-1)/2\rfloor}{ R\left(\Big\lfloor\frac{N}{2k+1}\Big\rfloor\right) }, $$ which runs in $O(N^{3/4})$ time "if storing and retrieving the values are constant time operations." (The even terms $m = 2j$ collapse into $F\left( \big\lfloor \frac N 2 \big\rfloor \right)$ because $\big\lfloor \frac{N}{2j} \big\rfloor = \big\lfloor \lfloor N/2 \rfloor / j \big\rfloor$.) From Project Euler PDF 73 (73.10).
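A rough sketch of this variant (not daniel.is.fischer's code; it again groups equal values of $\lfloor N/m\rfloor$, but counts only the odd denominators in each group, and memoizes in a hash map):

```cpp
#include <cstdint>
#include <unordered_map>

using u64 = std::uint64_t;

u64 F(u64 n) { return n * (n - 1) / 2; }       // number of pairs 0 < a < b <= n

// Same value as R(N), via R(N) = F(N) - F(N/2) - sum over odd m >= 3 of R(floor(N/m)).
u64 R_odd(u64 N) {
    static std::unordered_map<u64, u64> memo;
    if (N < 2) return 0;
    auto it = memo.find(N);
    if (it != memo.end()) return it->second;
    u64 result = F(N) - F(N / 2);
    for (u64 m = 3, last; m <= N; m = last + 1) {
        u64 q = N / m;
        last = N / q;                          // largest m with floor(N/m) == q
        u64 odd = (last + 1) / 2 - m / 2;      // odd denominators in [m, last]
        result -= odd * R_odd(q);
    }
    return memo[N] = result;
}
```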
All the other answers are based on this Project Euler problem, which deals with the totient sum in terms of the Farey sequence. There is a geometric way to view this (partially based on the explanation by philliplu in problem 625). Starting with the identity $\sum_{d|n} \phi(d) = n$,
we sum it over $a = 1, \dots, n$ to get a double sum equal to the $n$th triangular number $T(n) = \frac{n(n+1)}{2}$: $$T(n) = \sum_{a=1}^n \sum_{b|a} \phi(b) = \sum_{a=1}^n \sum_{b|a} \phi \left(\frac a b \right)$$
Writing out this sum in rows for $n=6$, where the entry in row $a$ and column $b$ (one for each divisor $b$ of $a$) is $\phi(a/b)$ (we can make a very similar triangle for reduced fractions): $$\begin{matrix} a=1: & \phi(1) & &\\ a=2: & \phi(2) & \phi(1) & \\ a=3: & \phi(3) & & \phi(1) \\ a=4: & \phi(4) & \phi(2) & & \phi(1) \\ a=5: & \phi(5) & & & & \phi(1) \\ a=6: & \phi(6) & \phi(3) & \phi(2) & & & \phi(1) \end{matrix}$$
The first column doesn't skip any values. The next column has a totient value every two numbers, the column after that has totient values every three numbers, and so on.
Let $\Phi(n) = \phi(1) + \cdots + \phi(n)$. By summing over the columns we can see $$T(n) = \sum_{x=1}^n \Phi \left(\Big \lfloor \frac n x \Big \rfloor \right)$$
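For example, with $n=6$ as above, $T(6) = 21$ and
$$\Phi(6) + \Phi(3) + \Phi(2) + \Phi(1) + \Phi(1) + \Phi(1) = 12 + 4 + 2 + 1 + 1 + 1 = 21.$$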
Rearrange to solve for $\Phi(n)$: $$\Phi(n) = T(n) - \sum_{x=2}^n \Phi \left(\Big \lfloor \frac n x \Big \rfloor \right)$$
The key observation is that for large $x$ (say $x \ge \sqrt n$), $\lfloor n/x \rfloor$ stays constant over long ranges of $x$. (A similar idea is used in my previous question for calculating $\sum \sigma(n)$.) We can calculate precisely how many times each $\Phi(k)$ value occurs: for $x \in (\lfloor n/2 \rfloor, n]$ we have $\lfloor n/x \rfloor = 1$; for $x \in (\lfloor n/3 \rfloor, \lfloor n/2 \rfloor]$ we have $\lfloor n/x \rfloor = 2$; and so on.
Using this observation, we arrive at our $O(n^{3/4})$ formula: $$\Phi(n) = T(n) - \sum_{x=2}^{\lfloor \sqrt n \rfloor} \Phi \left(\Big \lfloor \frac n x \Big \rfloor \right) - \sum_{y=1}^{y_{max}} \left(\Big \lfloor \frac n y \Big \rfloor - \Big \lfloor \frac n {y+1} \Big \rfloor \right) \Phi(y)$$
The summation bound $y_{max}$ has to be adjusted slightly in one edge case, to avoid counting the term at $x = \lfloor \sqrt n \rfloor$ twice. More precisely:
$$ y_{max} = \begin{cases} \lfloor \sqrt{n} \rfloor - 1, & \text{if} \ \lfloor \sqrt{n} \rfloor = \left \lfloor \frac{n}{\lfloor \sqrt{n} \rfloor} \right\rfloor \\ \lfloor \sqrt{n} \rfloor, & \text{otherwise} \end{cases}$$
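For example, $y_{max} = 3$ for $n = 16$ (since $\lfloor \sqrt{16} \rfloor = 4 = \lfloor 16/4 \rfloor$), while $y_{max} = 4$ for $n = 20$ (since $\lfloor 20/4 \rfloor = 5 \ne 4$).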
For implementation, we can memoize: pre-sieve the totient values for $k \le n^{2/3}$ so that small $\Phi(k)$ are available at almost linear cost, and store the large $\Phi(\lfloor n/x \rfloor)$ values in a dictionary or in an array indexed by $x$, for an overall cost of almost $O(n^{2/3})$.
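A sketch of this memoized computation (the sieve limit, the function names, and the use of a hash map for the large values are illustrative choices; for brevity the two sums of the formula above are handled by a single loop over blocks of equal $\lfloor n/x \rfloor$):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using u64 = std::uint64_t;

std::vector<u64> Phi_small;                    // Phi_small[k] = phi(1) + ... + phi(k)
std::unordered_map<u64, u64> Phi_large;        // cache for the large arguments floor(n/x)

// Sieve phi(k) for k <= limit (choose limit around n^{2/3}) and take prefix sums.
void presieve(u64 limit) {
    std::vector<u64> phi(limit + 1);
    for (u64 i = 0; i <= limit; ++i) phi[i] = i;
    for (u64 p = 2; p <= limit; ++p)
        if (phi[p] == p)                       // p untouched so far, hence prime
            for (u64 m = p; m <= limit; m += p)
                phi[m] -= phi[m] / p;          // apply the factor (1 - 1/p)
    Phi_small.assign(limit + 1, 0);
    for (u64 k = 1; k <= limit; ++k)
        Phi_small[k] = Phi_small[k - 1] + phi[k];
}

// Phi(n) = T(n) - sum over x >= 2 of Phi(floor(n/x)), grouped by equal quotients.
u64 Phi(u64 n) {
    if (n < Phi_small.size()) return Phi_small[n];
    auto it = Phi_large.find(n);
    if (it != Phi_large.end()) return it->second;
    u64 result = n * (n + 1) / 2;              // T(n)
    for (u64 x = 2, next; x <= n; x = next) {
        u64 q = n / x;                         // common value of floor(n/x)
        next = n / q + 1;                      // first x with a smaller quotient
        result -= (next - x) * Phi(q);
    }
    return Phi_large[n] = result;
}
```

Calling presieve with a limit of roughly $n^{2/3}$ before Phi(n) gives the almost-$O(n^{2/3})$ behaviour described above; without the pre-sieve the same code still works, just at the $O(n^{3/4})$ cost.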
Any formula that can express $h(n)$ as a Dirichlet convolution $$h(n) = (f * g) (n) = \sum_{d|n} f(d) g(n/d)$$ has a sub-linear algorithm using the Dirichlet hyperbola method.
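Concretely, writing $S_f(x) = \sum_{k \le x} f(k)$ and $S_g(x) = \sum_{k \le x} g(k)$, the hyperbola method gives $$\sum_{n \le N} h(n) = \sum_{d \le \sqrt N} f(d)\, S_g\!\left(\Big\lfloor \frac N d \Big\rfloor\right) + \sum_{d \le \sqrt N} g(d)\, S_f\!\left(\Big\lfloor \frac N d \Big\rfloor\right) - S_f\!\left(\lfloor \sqrt N \rfloor\right) S_g\!\left(\lfloor \sqrt N \rfloor\right),$$ which uses only $O(\sqrt N)$ values of $S_f$ and $S_g$; for the totient one can take $\phi = \mu * \mathrm{Id}$.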
You should be able to calculate all of the values $\phi(1),\dots,\phi(N)$ simultaneously in time $O(N\log\log N)$, assuming you have sufficient memory.
To do so, set up a Sieve of Eratosthenes-type calculation, but instead of only recording whether every integer is prime or not, keep track of each step in the sieve that "crosses off" a given integer. The result is that you will have stored the list of all primes dividing $n$, for all $1\le n\le N$. (You can modify the sieve to get the complete factorization, but it's not important for this problem.)
Once you have this list in storage, you can calculate all the $\phi(n)$ by the formula $\phi(n)=n\prod_{p\mid n}(1-1/p)$. The total number of multiplications is $\sum_{n\le N} \omega(n)$, where $\omega(n)$ is the number of distinct prime factors of $n$; this sum is known to be $O(N\log\log N)$. (I'm sloppily counting a multiplication of two rational numbers as $1$ step.)
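A short C++ sketch of this idea: instead of storing the lists of prime divisors explicitly, this variant applies the factor $(1-1/p)$ to every multiple of $p$ as soon as the sieve finds the prime $p$, which is the same $\sum_{n\le N}\omega(n) = O(N\log\log N)$ amount of work (the limit $N$ here is just an illustrative value).

```cpp
#include <cstdio>
#include <vector>

int main() {
    const long long N = 100;                         // upper limit; adjust as needed
    std::vector<long long> phi(N + 1);
    for (long long i = 0; i <= N; ++i) phi[i] = i;   // start from phi(n) = n
    for (long long p = 2; p <= N; ++p) {
        if (phi[p] != p) continue;                   // phi[p] already reduced => p composite
        for (long long m = p; m <= N; m += p)
            phi[m] -= phi[m] / p;                    // phi(m) *= (1 - 1/p), exact in integers
    }
    long long total = 0;
    for (long long n = 1; n <= N; ++n) total += phi[n];
    std::printf("sum of phi(n) for n <= %lld is %lld\n", N, total);
    return 0;
}
```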
A similar setup will allow you to compute the values of any multiplicative function over an interval in time not much longer than the length of the interval.