Swift Combine: Buffer upstream values and emit them at a steady rate?

This is an interesting problem. I played with various combinations of Timer.publish, buffer, zip, and throttle, but I couldn't get any combination to work quite the way you want. So let's write a custom subscriber.

What we'd really like is an API where, when we get an input from upstream, we also get the ability to control when the upstream delivers the next input. Something like this:

extension Publisher {
    /// Subscribe to me with a stepping function.
    /// - parameter stepper: A function I'll call with each of my inputs, and with my completion.
    ///   Each time I call this function with an input, I also give it a promise function.
    ///   I won't deliver the next input until the promise is called with a `.more` argument.
    /// - returns: An object you can use to cancel the subscription asynchronously.
    func step(with stepper: @escaping (StepEvent<Output, Failure>) -> ()) -> AnyCancellable {
        ???
    }
}

enum StepEvent<Input, Failure: Error> {
    /// Handle the Input. Call `StepPromise` when you're ready for the next Input,
    /// or to cancel the subscription.
    case input(Input, StepPromise)

    /// Upstream completed the subscription.
    case completion(Subscribers.Completion<Failure>)
}

/// The type of callback given to the stepper function to allow it to continue
/// or cancel the stream.
typealias StepPromise = (StepPromiseRequest) -> ()

enum StepPromiseRequest {
    // Pass this to the promise to request the next item from upstream.
    case more

    // Pass this to the promise to cancel the subscription.
    case cancel
}

With this step API, we can write a pace operator that does what you want:

extension Publisher {
    func pace<Context: Scheduler, MySubject: Subject>(
        _ pace: Context.SchedulerTimeType.Stride, scheduler: Context, subject: MySubject)
        -> AnyCancellable
        where MySubject.Output == Output, MySubject.Failure == Failure
    {
        return step {
            switch $0 {
            case .input(let input, let promise):
                // Send the input from upstream now.
                subject.send(input)

                // Wait for the pace interval to elapse before requesting the
                // next input from upstream.
                scheduler.schedule(after: scheduler.now.advanced(by: pace)) {
                    promise(.more)
                }

            case .completion(let completion):
                subject.send(completion: completion)
            }
        }
    }
}

This pace operator takes pace (the required interval between outputs), a scheduler on which to schedule events, and a subject on which to republish the inputs from upstream. It handles each input by sending it through subject, and then using the scheduler to wait for the pace interval before asking for the next input from upstream.

Now we just have to implement the step operator. Combine doesn't give us too much help here. It does have a feature called “backpressure”, which means a publisher cannot send an input downstream until the downstream has asked for it by sending a Subscribers.Demand upstream. Usually you see downstreams send an .unlimited demand upstream, but we're not going to. Instead, we're going to take advantage of backpressure. We won't send any demand upstream until the stepper completes a promise, and then we'll only send a demand of .max(1), so we make the upstream operate in lock-step with the stepper. (We also have to send an initial demand of .max(1) to start the whole process.)

Okay, so need to implement a type that takes a stepper function and conforms to Subscriber. It's a good idea to review the Reactive Streams JVM Specification, because Combine is based on that specification.

What makes the implementation difficult is that several things can call into our subscriber asynchronously:

The upstream can call into the subscriber from any thread (but is required to serialize its calls).
After we've given promise functions to the stepper, the stepper can call those promises on any thread.
We want the subscription to be cancellable, and that cancellation can happen on any thread.
All this asynchronicity means we have to protect our internal state with a lock.
We have to be careful not to call out while holding that lock, to avoid deadlock.

We'll also protect the subscriber from shenanigans involving calling a promise repeatedly, or calling outdated promises, by giving each promise a unique id.

Se here's our basic subscriber definition:

import Combine
import Foundation

public class SteppingSubscriber<Input, Failure: Error> {

    public init(stepper: @escaping Stepper) {
        l_state = .subscribing(stepper)
    }

    public typealias Stepper = (Event) -> ()

    public enum Event {
        case input(Input, Promise)
        case completion(Completion)
    }

    public typealias Promise = (Request) -> ()

    public enum Request {
        case more
        case cancel
    }

    public typealias Completion = Subscribers.Completion<Failure>

    private let lock = NSLock()

    // The l_ prefix means it must only be accessed while holding the lock.
    private var l_state: State
    private var l_nextPromiseId: PromiseId = 1

    private typealias PromiseId = Int

    private var noPromiseId: PromiseId { 0 }
}

Notice that I moved the auxiliary types from earlier (StepEvent, StepPromise, and StepPromiseRequest) into SteppingSubscriber and shortened their names.

Now let's consider l_state's mysterious type, State. What are all the different states our subscriber could be in?

We could be waiting to receive the Subscription object from upstream.
We could have received the Subscription from upstream and be waiting for a signal (an input or completion from upstream, or the completion of a promise from the stepper).
We could be calling out to the stepper, which we want to be careful in case it completes a promise while we're calling out to it.
We could have been cancelled or have received completion from upstream.

So here is our definition of State:

extension SteppingSubscriber {
    private enum State {
        // Completed or cancelled.
        case dead

        // Waiting for Subscription from upstream.
        case subscribing(Stepper)

        // Waiting for a signal from upstream or for the latest promise to be completed.
        case subscribed(Subscribed)

        // Calling out to the stopper.
        case stepping(Stepping)

        var subscription: Subscription? {
            switch self {
            case .dead: return nil
            case .subscribing(_): return nil
            case .subscribed(let subscribed): return subscribed.subscription
            case .stepping(let stepping): return stepping.subscribed.subscription
            }
        }

        struct Subscribed {
            var stepper: Stepper
            var subscription: Subscription
            var validPromiseId: PromiseId
        }

        struct Stepping {
            var subscribed: Subscribed

            // If the stepper completes the current promise synchronously with .more,
            // I set this to true.
            var shouldRequestMore: Bool
        }
    }
}

Since we're using NSLock (for simplicity), let's define an extension to ensure we always match locking with unlocking:

fileprivate extension NSLock {
    @inline(__always)
    func sync<Answer>(_ body: () -> Answer) -> Answer {
        lock()
        defer { unlock() }
        return body()
    }
}

Now we're ready to handle some events. The easiest event to handle is asynchronous cancellation, which is the Cancellable protocol's only requirement. If we're in any state except .dead, we want to become .dead and, if there's an upstream subscription, cancel it.

extension SteppingSubscriber: Cancellable {
    public func cancel() {
        let sub: Subscription? = lock.sync {
            defer { l_state = .dead }
            return l_state.subscription
        }
        sub?.cancel()
    }
}

Notice here that I don't want to call out to the upstream subscription's cancel function while lock is locked, because lock isn't a recursive lock and I don't want to risk deadlock. All use of lock.sync follows the pattern of deferring any call-outs until after the lock is unlocked.

Now let's implement the Subscriber protocol requirements. First, let's handle receiving the Subscription from upstream. The only time this should happen is when we're in the .subscribing state, but .dead is also possible in which case we want to just cancel the upstream subscription.

extension SteppingSubscriber: Subscriber {
    public func receive(subscription: Subscription) {
        let action: () -> () = lock.sync {
            guard case .subscribing(let stepper) = l_state else {
                return { subscription.cancel() }
            }
            l_state = .subscribed(.init(stepper: stepper, subscription: subscription, validPromiseId: noPromiseId))
            return { subscription.request(.max(1)) }
        }
        action()
    }

Notice that in this use of lock.sync (and in all later uses), I return an “action” closure so I can perform arbitrary call-outs after the lock has been unlocked.

The next Subscriber protocol requirement we'll tackle is receiving a completion:

    public func receive(completion: Subscribers.Completion<Failure>) {
        let action: (() -> ())? = lock.sync {
            // The only state in which I have to handle this call is .subscribed:
            // - If I'm .dead, either upstream already completed (and shouldn't call this again),
            //   or I've been cancelled.
            // - If I'm .subscribing, upstream must send me a Subscription before sending me a completion.
            // - If I'm .stepping, upstream is currently signalling me and isn't allowed to signal
            //   me again concurrently.
            guard case .subscribed(let subscribed) = l_state else {
                return nil
            }
            l_state = .dead
            return { [stepper = subscribed.stepper] in
                stepper(.completion(completion))
            }
        }
        action?()
    }

The most complex Subscriber protocol requirement for us is receiving an Input:

We have to create a promise.
We have to pass the promise to the stepper.
The stepper could complete the promise before returning.
After the stepper returns, we have to check whether it completed the promise with .more and, if so, return the appropriate demand upstream.

Since we have to call out to the stepper in the middle of this work, we have some ugly nesting of lock.sync calls.

    public func receive(_ input: Input) -> Subscribers.Demand {
        let action: (() -> Subscribers.Demand)? = lock.sync {
            // The only state in which I have to handle this call is .subscribed:
            // - If I'm .dead, either upstream completed and shouldn't call this,
            //   or I've been cancelled.
            // - If I'm .subscribing, upstream must send me a Subscription before sending me Input.
            // - If I'm .stepping, upstream is currently signalling me and isn't allowed to
            //   signal me again concurrently.
            guard case .subscribed(var subscribed) = l_state else {
                return nil
            }

            let promiseId = l_nextPromiseId
            l_nextPromiseId += 1
            let promise: Promise = { request in
                self.completePromise(id: promiseId, request: request)
            }
            subscribed.validPromiseId = promiseId
            l_state = .stepping(.init(subscribed: subscribed, shouldRequestMore: false))
            return { [stepper = subscribed.stepper] in
                stepper(.input(input, promise))

                let demand: Subscribers.Demand = self.lock.sync {
                    // The only possible states now are .stepping and .dead.
                    guard case .stepping(let stepping) = self.l_state else {
                        return .none
                    }
                    self.l_state = .subscribed(stepping.subscribed)
                    return stepping.shouldRequestMore ? .max(1) : .none
                }

                return demand
            }
        }

        return action?() ?? .none
    }
} // end of extension SteppingSubscriber: Publisher

The last thing our subscriber needs to handle is the completion of a promise. This is complicated for several reasons:

We want to protect against a promise being completed multiple times.
We want to protect against an older promise being completed.
We can be in any state when a promise is completed.

Thus:

extension SteppingSubscriber {
    private func completePromise(id: PromiseId, request: Request) {
        let action: (() -> ())? = lock.sync {
            switch l_state {
            case .dead, .subscribing(_): return nil
            case .subscribed(var subscribed) where subscribed.validPromiseId == id && request == .more:
                subscribed.validPromiseId = noPromiseId
                l_state = .subscribed(subscribed)
                return { [sub = subscribed.subscription] in
                    sub.request(.max(1))
                }
            case .subscribed(let subscribed) where subscribed.validPromiseId == id && request == .cancel:
                l_state = .dead
                return { [sub = subscribed.subscription] in
                    sub.cancel()
                }
            case .subscribed(_):
                // Multiple completion or stale promise.
                return nil
            case .stepping(var stepping) where stepping.subscribed.validPromiseId == id && request == .more:
                stepping.subscribed.validPromiseId = noPromiseId
                stepping.shouldRequestMore = true
                l_state = .stepping(stepping)
                return nil
            case .stepping(let stepping) where stepping.subscribed.validPromiseId == id && request == .cancel:
                l_state = .dead
                return { [sub = stepping.subscribed.subscription] in
                    sub.cancel()
                }
            case .stepping(_):
                // Multiple completion or stale promise.
                return nil
            }
        }

        action?()
    }
}

Whew!

With all that done, we can write the real step operator:

extension Publisher {
    func step(with stepper: @escaping (SteppingSubscriber<Output, Failure>.Event) -> ()) -> AnyCancellable {
        let subscriber = SteppingSubscriber<Output, Failure>(stepper: stepper)
        self.subscribe(subscriber)
        return .init(subscriber)
    }
}

And then we can try out that pace operator from above. Since we don't do any buffering in SteppingSubscriber, and the upstream in general isn't buffered, we'll stick a buffer in between the upstream and our pace operator.

    var cans: [AnyCancellable] = []

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        let erratic = Just("A").delay(for: 0.0, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher()
            .merge(with: Just("B").delay(for: 0.3, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher())
            .merge(with: Just("C").delay(for: 0.6, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher())
            .merge(with: Just("D").delay(for: 5.0, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher())
            .merge(with: Just("E").delay(for: 5.3, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher())
            .merge(with: Just("F").delay(for: 5.6, tolerance: 0.001, scheduler: DispatchQueue.main).eraseToAnyPublisher())
            .handleEvents(
                receiveOutput: { print("erratic: \(Double(DispatchTime.now().rawValue) / 1_000_000_000) \($0)") },
                receiveCompletion: { print("erratic: \(Double(DispatchTime.now().rawValue) / 1_000_000_000) \($0)") }
        )
            .makeConnectable()

        let subject = PassthroughSubject<String, Never>()

        cans += [erratic
            .buffer(size: 1000, prefetch: .byRequest, whenFull: .dropOldest)
            .pace(.seconds(1), scheduler: DispatchQueue.main, subject: subject)]

        cans += [subject.sink(
            receiveCompletion: { print("paced: \(Double(DispatchTime.now().rawValue) / 1_000_000_000) \($0)") },
            receiveValue: { print("paced: \(Double(DispatchTime.now().rawValue) / 1_000_000_000) \($0)") }
            )]

        let c = erratic.connect()
        cans += [AnyCancellable { c.cancel() }]

        return true
    }

And here, at long last, is the output:

erratic: 223394.17115897 A
paced: 223394.171495405 A
erratic: 223394.408086369 B
erratic: 223394.739186984 C
paced: 223395.171615624 B
paced: 223396.27056174 C
erratic: 223399.536717127 D
paced: 223399.536782847 D
erratic: 223399.536834495 E
erratic: 223400.236808469 F
erratic: 223400.236886323 finished
paced: 223400.620542561 E
paced: 223401.703613078 F
paced: 223402.703828512 finished

Timestamps are in units of seconds.
The erratic publisher's timings are, indeed, erratic and sometimes close in time.
The paced timings are always at least one second apart even when the erratic events occur less than one second apart.
When an erratic event occurs more than one second after the prior event, the paced event is sent immediately following the erratic event without further delay.
The paced completion occurs one second after the last paced event, even though the erratic completion occurs immediately after the last erratic event. The buffer doesn't send the completion until it receives another demand after it sends the last event, and that demand is delayed by the pacing timer.

I've put the the entire implementation of the step operator in this gist for easy copy/paste.

Could Publishers.CollectByTime be useful here somewhere?

Publishers.CollectByTime(upstream: upstreamPublisher.share(), strategy: Publishers.TimeGroupingStrategy.byTime(RunLoop.main, .seconds(1)), options: nil)

EDIT

There's an even simpler approach to the original one outlined below, which doesn't require a pacer, but instead uses back-pressure created by flatMap(maxPublishers: .max(1)).

flatMap sends a demand of 1, until its returned publisher, which we could delay, completes. We'd need a Buffer publisher upstream to buffer the values.

// for demo purposes, this subject sends a Date:
let subject = PassthroughSubject<Date, Never>()
let interval = 1.0

let pub = subject
   .buffer(size: .max, prefetch: .byRequest, whenFull: .dropNewest)
   .flatMap(maxPublishers: .max(1)) {
      Just($0)
        .delay(for: .seconds(interval), scheduler: DispatchQueue.main)
   }

ORIGINAL

I know this is an old question, but I think there's a much simpler way to implement this, so I thought I'd share.

The idea is similar to a .zip with a Timer, except instead of a Timer, you would .zip with a time-delayed "tick" from a previously sent value, which can be achieved with a CurrentValueSubject. CurrentValueSubject is needed instead of a PassthroughSubject in order to seed the first ever "tick".

// for demo purposes, this subject sends a Date:
let subject = PassthroughSubject<Date, Never>()

let pacer = CurrentValueSubject<Void, Never>(())
let interval = 1.0

let pub = subject.zip(pacer)
   .flatMap { v in
      Just(v.0) // extract the original value
        .delay(for: .seconds(interval), scheduler: DispatchQueue.main)
        .handleEvents(receiveOutput: { _ in 
           pacer.send() // send the pacer "tick" after the interval
        }) 
   }

What happens is that the .zip gates on the pacer, which only arrives after a delay from a previously sent value.

If the next value comes earlier than the allowed interval, it waits for the pacer. If, however, the next value comes later, then the pacer already has a new value to provide instantly, so there would be no delay.

If you used it like in your test case:

let c = pub.sink { print("\($0): \(Date())") }

subject.send(Date())
subject.send(Date())
subject.send(Date())

DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
   subject.send(Date())
   subject.send(Date())
}

DispatchQueue.main.asyncAfter(deadline: .now() + 10.0) {
   subject.send(Date())
   subject.send(Date())
}

the result would be something like this:

2020-06-23 19:15:21 +0000: 2020-06-23 19:15:21 +0000
2020-06-23 19:15:21 +0000: 2020-06-23 19:15:22 +0000
2020-06-23 19:15:21 +0000: 2020-06-23 19:15:23 +0000
2020-06-23 19:15:22 +0000: 2020-06-23 19:15:24 +0000
2020-06-23 19:15:22 +0000: 2020-06-23 19:15:25 +0000
2020-06-23 19:15:32 +0000: 2020-06-23 19:15:32 +0000
2020-06-23 19:15:32 +0000: 2020-06-23 19:15:33 +0000

Swift Combine: Buffer upstream values and emit them at a steady rate?

Tags:

Ios

Swift

Combine

Related

Recent Posts