API Reference

Functional interface specification and memory-ordering model for libcortlet-upgradesched. The runtime exposes a lock-free C11 API for low-latency task execution in user space on worker threads pinned to CPU cores.

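The scheduler is accessed through an opaque handle (cortlet_sched_t*) that owns the worker threads and their CPU affinities, the task ingestion path, and the per-core ring buffers. Task queues are manipulated with C11 atomic operations instead of mutexes, so the push, pop, and steal paths avoid kernel-mediated lock contention. Two paths still make system calls: a push to a full queue calls sched_yield(), and cortlet_sched_wait() sleeps with usleep().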


Initialization & Lifecycle

cortlet_sched_init

C
cortlet_sched_t* cortlet_sched_init(void);

Reads the number of online logical processors via sysconf(_SC_NPROCESSORS_ONLN) and starts a pool of worker threads, each pinned to a core. Note that this count includes SMT siblings (hardware threads), not only physical cores.

Parameters

  • None.

Return Value

  • cortlet_sched_t* — Pointer to a fully initialized scheduler.
  • NULL — Memory allocation or worker thread setup failed.
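The processor query and core pinning described above can be sketched as follows. The helper names cortlet_online_cpus and cortlet_pin_self are hypothetical, not part of the library's API, and pthread_setaffinity_np is a GNU extension, so this sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: how many workers to start. sysconf() reports
 * online *logical* processors, so SMT siblings count separately. */
static long cortlet_online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1; /* fall back to one worker if the query fails */
}

/* Hypothetical helper: bind the calling thread to a single CPU.
 * Returns 0 on success or an error number from pthread_setaffinity_np. */
static int cortlet_pin_self(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```

A worker thread could call cortlet_pin_self(i % cortlet_online_cpus()) at startup. Pinning by index assumes online CPUs are numbered contiguously from 0, which does not hold when CPUs are offlined or restricted by a cpuset; sched_getaffinity() returns the allowed set in that case.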

Hardware Alignment & Verification

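The library allocates its internal cortlet_worker_t structures with posix_memalign, so the allocation starts on a 64-byte boundary, the cache-line size on most x86-64 and many ARM64 processors.

Aligning the base address prevents false sharing only if each cortlet_worker_t also occupies a whole number of cache lines, so that fields written by one worker's core never share a line with another worker's fields. False sharing forces a line to bounce between cores' L1/L2 caches on unrelated writes, which would otherwise slow the lock-free queue operations.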
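Both halves of the requirement, an aligned base and whole-cache-line elements, can be sketched with C11 alignas plus posix_memalign. demo_worker_t and demo_alloc_workers are hypothetical stand-ins; the real cortlet_worker_t layout is internal and not documented.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64

/* Hypothetical layout. alignas(CACHE_LINE) on a member raises the struct's
 * alignment to 64, which also rounds sizeof up to a multiple of 64.
 * Giving head and tail separate lines keeps producer and consumer
 * writes apart inside one worker as well. */
typedef struct {
    alignas(CACHE_LINE) atomic_size_t head; /* advanced by consumers */
    alignas(CACHE_LINE) atomic_size_t tail; /* advanced by the producer */
} demo_worker_t;

static_assert(sizeof(demo_worker_t) % CACHE_LINE == 0,
              "each worker must occupy whole cache lines");

/* Allocate n workers on a 64-byte boundary; returns NULL on failure. */
static demo_worker_t *demo_alloc_workers(size_t n)
{
    void *p = NULL;
    if (n == 0 || n > SIZE_MAX / sizeof(demo_worker_t))
        return NULL;
    /* posix_memalign reports errors via its return value, not errno. */
    if (posix_memalign(&p, CACHE_LINE, n * sizeof(demo_worker_t)) != 0)
        return NULL;
    demo_worker_t *w = p;
    for (size_t i = 0; i < n; i++) {
        atomic_init(&w[i].head, 0);
        atomic_init(&w[i].tail, 0);
    }
    return w;
}
```

Memory from posix_memalign is released with plain free().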


cortlet_sched_destroy

C
void cortlet_sched_destroy(cortlet_sched_t* sched);

Synchronously tears down the scheduler: its worker threads are shut down and its resources released before the call returns.

Parameters

  • sched — Scheduler handle returned by cortlet_sched_init. The handle is invalid after this call returns.

Return Value

  • void
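A synchronous teardown typically has two phases: signal every worker to stop, then join each thread so that none is still running when shared memory is freed. The sketch below assumes that shape; demo_pool_t, demo_pool_start, and demo_pool_stop are hypothetical names, and the idle loop stands in for real task processing.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_WORKERS 8

/* Hypothetical pool state; not the library's internal layout. */
typedef struct {
    atomic_bool stop;  /* set once by the thread tearing down */
    atomic_int exited; /* workers that have left their loop */
    pthread_t threads[DEMO_MAX_WORKERS];
    size_t n;          /* threads successfully started */
} demo_pool_t;

static void *demo_worker(void *p)
{
    demo_pool_t *pool = p;
    /* A real worker would pop or steal tasks here; this one only idles. */
    while (!atomic_load_explicit(&pool->stop, memory_order_acquire))
        sched_yield();
    atomic_fetch_add_explicit(&pool->exited, 1, memory_order_relaxed);
    return NULL;
}

/* Returns 0 on success. On failure, pool->n still counts the threads
 * that did start, so demo_pool_stop can shut them down. */
static int demo_pool_start(demo_pool_t *pool, size_t n)
{
    assert(n <= DEMO_MAX_WORKERS);
    atomic_init(&pool->stop, false);
    atomic_init(&pool->exited, 0);
    pool->n = 0;
    for (size_t i = 0; i < n; i++) {
        if (pthread_create(&pool->threads[i], NULL, demo_worker, pool) != 0)
            return -1;
        pool->n++;
    }
    return 0;
}

/* Synchronous teardown: raise the stop flag, then join every worker. */
static void demo_pool_stop(demo_pool_t *pool)
{
    atomic_store_explicit(&pool->stop, true, memory_order_release);
    for (size_t i = 0; i < pool->n; i++)
        pthread_join(pool->threads[i], NULL);
}
```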

Task Dispatch & Coordination

cortlet_sched_push

C
int cortlet_sched_push(
    cortlet_sched_t* sched,
    cortlet_task_fn func,
    void* arg
);

Enqueues a task on one worker's queue. The target queue is chosen round-robin using an atomic counter, which spreads consecutive pushes evenly across workers.
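The round-robin choice can be sketched with a single atomic fetch-and-add on a shared counter. pick_queue and its counter argument are hypothetical, since the library's internal fields are not documented. memory_order_relaxed is assumed to suffice here because the counter only spreads load and publishes no task data; publishing the task itself is a separate release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical: map each push to a worker index in turn. fetch_add gives
 * every caller a distinct ticket even when many threads push at once.
 * size_t wrap-around is well defined and only causes a one-time skip in
 * the rotation when nworkers is not a power of two. */
static size_t pick_queue(atomic_size_t *next, size_t nworkers)
{
    assert(nworkers > 0);
    size_t ticket = atomic_fetch_add_explicit(next, 1, memory_order_relaxed);
    return ticket % nworkers;
}
```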

Parameters

  • sched — Active scheduler handle.

  • func — Task routine, of type:

C
typedef void (*cortlet_task_fn)(void *arg);

  • arg — Opaque payload passed unchanged to func when the task runs.

Return Value

  • 0 — Task successfully pushed.
  • -1 — Invalid scheduler or parameters.
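Because cortlet_sched_push returns before the task runs, arg must stay valid until func executes, so pointers into the caller's stack frame are unsafe unless the caller waits first. One common pattern, sketched below with a hypothetical payload type, heap-allocates the payload and lets the task free it. Only the cortlet_task_fn typedef mirrors the documented signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*cortlet_task_fn)(void *arg); /* documented callback shape */

/* Hypothetical payload: sum a slice and store the result. */
struct sum_job {
    const int *data;
    size_t len;
    long *out; /* must also outlive the task */
};

static void sum_task(void *arg)
{
    struct sum_job *job = arg;
    assert(job != NULL);
    long s = 0;
    for (size_t i = 0; i < job->len; i++)
        s += job->data[i];
    *job->out = s;
    free(job); /* the task owns its heap-allocated payload */
}

/* Caller side (sketch):
 *   struct sum_job *job = malloc(sizeof *job);
 *   if (job) {
 *       *job = (struct sum_job){ data, len, &result };
 *       if (cortlet_sched_push(sched, sum_task, job) != 0)
 *           free(job); // push failed, so the task will never run
 *   }
 */
```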

Ingestion Flow Control & Backpressure

Each worker queue holds at most 4096 pending tasks (MAX_QUEUE_SIZE).

If the target queue is full, cortlet_sched_push() does not return an error. It retries in a loop, re-reading the queue indices with memory_order_acquire and calling sched_yield() between attempts, until a consumer frees a slot. A producer therefore stalls, without taking a lock, whenever the workers fall behind.
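The full-queue behavior can be sketched as a try-push plus a yield loop. This assumes a single producer per ring and monotonically increasing head and tail indices masked into a power-of-two slot array; demo_queue_t, demo_try_push, and demo_push are hypothetical names, not the library's internals.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_CAP  4096 /* mirrors MAX_QUEUE_SIZE; must be a power of two */
#define DEMO_MASK (DEMO_CAP - 1)

typedef struct { void (*fn)(void *); void *arg; } demo_task_t;

/* Hypothetical SPMC ring: head and tail only ever increase, so
 * tail - head is the number of pending tasks. */
typedef struct {
    atomic_size_t head; /* next task to claim (consumers) */
    atomic_size_t tail; /* next free slot (producer only) */
    _Atomic(demo_task_t *) slots[DEMO_CAP];
} demo_queue_t;

/* Returns 0 on success, -1 if the ring is full. */
static int demo_try_push(demo_queue_t *q, demo_task_t *task)
{
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed); /* only we write tail */
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire); /* consumers' progress */
    if (t - h >= DEMO_CAP)
        return -1;
    atomic_store_explicit(&q->slots[t & DEMO_MASK], task, memory_order_relaxed);
    /* Release: a consumer that acquire-loads the new tail also sees the slot. */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return 0;
}

/* Backpressure: never fail on a full ring; yield and retry until a
 * consumer advances head. */
static void demo_push(demo_queue_t *q, demo_task_t *task)
{
    assert(task != NULL);
    while (demo_try_push(q, task) != 0)
        sched_yield();
}
```

The acquire load of head pairs with the consumers' release when they advance it, so the producer cannot overwrite a slot before its previous task has been read.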


cortlet_sched_wait

C
void cortlet_sched_wait(cortlet_sched_t* sched);

Acts as a synchronization barrier. The caller blocks until all outstanding tasks complete.

Parameters

  • sched — Active scheduler context.

Return Value

  • void

Memory Sync Strategy

C
while (atomic_load_explicit(
           &sched->tasks_in_flight,
           memory_order_acquire) > 0) {
    usleep(100);
}

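The loop polls the shared tasks_in_flight counter every 100 µs without taking a lock. The acquire load makes the results of completed tasks visible to the caller only if workers decrement tasks_in_flight with release (or acq_rel) ordering after each task returns, and only if the counter is incremented before a task is enqueued; otherwise the waiter can observe zero while a task is still pending. The price of polling is up to about 100 µs of added latency before the caller notices completion, more if usleep() oversleeps.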
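The counter protocol can be sketched end to end with a single task. tasks_in_flight matches the field name used in the loop above, but run_task, wait_all, and result are hypothetical, and a plain pthread stands in for a scheduler worker.

```c
#define _DEFAULT_SOURCE /* usleep() */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int tasks_in_flight; /* incremented BEFORE the task is enqueued */
static int result;                 /* plain data, published by the release below */

static void *run_task(void *unused)
{
    (void)unused;
    result = 42; /* the task's side effect */
    /* Release decrement: orders the write above before the count drops. */
    atomic_fetch_sub_explicit(&tasks_in_flight, 1, memory_order_release);
    return NULL;
}

static void wait_all(void)
{
    /* Acquire pairs with each release decrement, so once 0 is observed,
     * every completed task's writes are visible here. */
    while (atomic_load_explicit(&tasks_in_flight, memory_order_acquire) > 0)
        usleep(100);
}
```

With several tasks, each fetch_sub is a read-modify-write that continues the release sequence of the earlier ones, so observing the final zero synchronizes with every decrement.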


Low-Level Concurrency Architecture Matrix

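Function               Thread Safety           Complexity              Memory Ordering
cortlet_sched_init     Not thread-safe         O(N)                    Plain heap initialization
cortlet_sched_push     Lock-free, thread-safe  O(1) if queue not full  memory_order_release
cortlet_sched_wait     Barrier-safe            Poll every 100 µs       memory_order_acquire
cortlet_sched_destroy  Not thread-safe         O(N)                    Thread teardown sequence

N is the number of worker threads. A push to a full queue waits for free capacity, so its O(1) bound holds only while the target queue has space.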

Memory Visibility Ordering Specs

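The task pipeline is built on per-worker single-producer, multi-consumer (SPMC) queues with work stealing: tasks are appended at a queue's tail, the owning worker consumes them locally, and idle workers steal from the head of sibling queues.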

Local Queue Extraction (queue_pop)

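A worker taking tasks from its own queue loads the queue's tail index with memory_order_acquire, which makes the task slots published by the producer visible before they are read. It publishes its own index updates with memory_order_release, so the producer sees a slot as free only after the task in it has been read.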
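A minimal consumer-side pop consistent with this description is sketched below. It is an assumption, not the library's queue_pop: the layout is a hypothetical SPMC ring with atomic pointer slots, and because thieves advance head with a CAS, this version has the owner claim tasks with the same CAS, so the owner and a thief can never take the same slot. The release half of that CAS is what tells the producer a slot is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_CAP  4096 /* power of two, mirrors MAX_QUEUE_SIZE */
#define DEMO_MASK (DEMO_CAP - 1)

typedef struct { void (*fn)(void *); void *arg; } demo_task_t;

/* Hypothetical ring; indices grow monotonically and head never passes tail. */
typedef struct {
    atomic_size_t head; /* next task to claim */
    atomic_size_t tail; /* next free slot (producer only) */
    _Atomic(demo_task_t *) slots[DEMO_CAP];
} demo_queue_t;

/* Returns the oldest task, or NULL if the queue is empty. */
static demo_task_t *demo_pop(demo_queue_t *q)
{
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    for (;;) {
        /* Acquire on tail: slots published before this tail are visible. */
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h == t)
            return NULL;
        demo_task_t *task =
            atomic_load_explicit(&q->slots[h & DEMO_MASK], memory_order_relaxed);
        /* Claim slot h; acq_rel releases it back to the producer on success. */
        if (atomic_compare_exchange_strong_explicit(&q->head, &h, h + 1,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return task;
        /* Lost the race: h now holds the current head, so retry from there. */
    }
}
```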

Cross-Core Work Stealing (queue_steal)

When a worker's local queue is empty, it tries to steal from sibling queues by claiming the task at the victim queue's head with a compare-and-swap (CAS) on the head index:

C
atomic_compare_exchange_strong_explicit(
    &q->head,
    &h,
    h + 1,
    memory_order_acq_rel,
    memory_order_acquire
);

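Only one thread can win the CAS for a given value of h, so each task is extracted at most once; a losing thread gets the current head written back into h and can retry. On success, memory_order_acq_rel orders the thief's accesses relative to the other consumers that advanced head before it; on failure, memory_order_acquire makes the freshly loaded head safe to use. Atomic operations do not make writes "immediately" visible; they guarantee that once a value is observed, the writes ordered before it are observed too.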
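The at-most-once property can be checked in isolation. In the sketch below, which uses hypothetical names, several threads race to advance one shared head index with the same CAS and memory orderings as the steal above, and the program records how many times each index is claimed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITEMS   10000
#define THIEVES 4

static atomic_size_t shared_head;  /* next unclaimed index */
static atomic_int claimed[ITEMS];  /* times each index was taken */

static void *thief(void *unused)
{
    (void)unused;
    size_t h = atomic_load_explicit(&shared_head, memory_order_acquire);
    while (h < ITEMS) {
        if (atomic_compare_exchange_strong_explicit(&shared_head, &h, h + 1,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire)) {
            /* This thread alone advanced head past h, so it owns index h. */
            atomic_fetch_add_explicit(&claimed[h], 1, memory_order_relaxed);
            h++; /* best guess at the new head; a stale guess just fails */
        }
        /* On failure the CAS already wrote the current head into h. */
    }
    return NULL;
}
```

After all thieves are joined, every index should have been claimed exactly once and shared_head should equal ITEMS.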