API Reference

Functional interface specification and memory-ordering architecture for libcortlet-upgradesched. The runtime exposes a lock-free C11 API for low-latency, user-space task scheduling across core-pinned worker threads.

The scheduler context is an opaque handle (cortlet_sched_t*) that owns the worker threads and their CPU affinities, the task ingestion path, and the per-core ring buffers. Queue operations use C11 atomic primitives instead of mutexes, so the hot path avoids kernel-level lock contention between threads.

Initialization & Lifecycle

cortlet_sched_init

cortlet_sched_t* cortlet_sched_init(void);

Queries the operating system for the number of processors currently online (sysconf(_SC_NPROCESSORS_ONLN)) and starts a pool of worker threads, one pinned to each core. Note that this value counts logical processors as seen by the OS, which can exceed the number of physical cores when SMT is enabled.
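The processor query described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name demo_online_cpu_count and the fallback to 1 on failure are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical helper (not part of libcortlet): returns the number of
 * processors currently online, or 1 if the query fails. */
long demo_online_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

Falling back to a single worker keeps initialization usable on systems where the query is unsupported; a real implementation might instead report the failure to the caller.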

Parameters

None.

Return Value

cortlet_sched_t* — Valid pointer to an initialized scheduler execution block context.
NULL — Returned if heap allocation or worker thread creation fails.

Hardware Alignment & Verification

The library allocates the internal cortlet_worker_t structures with posix_memalign so that their base address falls on a 64-byte boundary.

Because 64 bytes is the cache-line size on most current x86-64 and many ARM cores, this keeps state owned by different workers on separate cache lines (provided each structure's size is also a multiple of 64 bytes). That avoids false sharing between cores and reduces cache-coherence traffic on the scheduling hot path.
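A minimal sketch of cache-line-aligned allocation follows. The structure demo_worker_t and the function demo_alloc_workers are hypothetical stand-ins; the real cortlet_worker_t layout is not shown in this reference.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CACHE_LINE 64

/* Hypothetical per-worker state. _Alignas pads the struct so that its
 * size is a multiple of the cache-line size. */
typedef struct {
    _Alignas(DEMO_CACHE_LINE) atomic_size_t head;
    atomic_size_t tail;
} demo_worker_t;

_Static_assert(sizeof(demo_worker_t) % DEMO_CACHE_LINE == 0,
               "each worker must occupy whole cache lines");

/* Allocates `count` workers starting on a 64-byte boundary.
 * Returns NULL on failure; release the result with free(). */
demo_worker_t *demo_alloc_workers(size_t count) {
    void *mem = NULL;
    if (count == 0)
        return NULL;
    if (posix_memalign(&mem, DEMO_CACHE_LINE, count * sizeof(demo_worker_t)) != 0)
        return NULL;
    demo_worker_t *workers = mem;
    for (size_t i = 0; i < count; i++) {
        atomic_init(&workers[i].head, 0);
        atomic_init(&workers[i].tail, 0);
    }
    return workers;
}
```

Aligning only the base address is not enough on its own: the static assertion checks that every array element, not just the first, starts on its own cache line.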

cortlet_sched_destroy

void cortlet_sched_destroy(cortlet_sched_t* sched);

Performs a clean, synchronous teardown of the specified scheduler. The call returns only after teardown is complete.

Parameters

sched — Opaque control structure pointer returned from cortlet_sched_init.

Return Value

void

Task Dispatch & Coordination

cortlet_sched_push

int cortlet_sched_push(
    cortlet_sched_t* sched,
    cortlet_task_fn func,
    void* arg
);

Enqueues a task (a function pointer and its argument) onto one worker's per-core queue. The target queue is selected by an atomic round-robin counter, which spreads submissions evenly across workers.
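The round-robin selection can be illustrated with a single atomic counter. This is a sketch under assumptions: the type demo_rr_t and the function demo_rr_pick are invented for the example, and the library may choose its target queue differently.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical round-robin state: a shared ticket counter plus the
 * number of worker queues. */
typedef struct {
    atomic_size_t next;
    size_t        nworkers;
} demo_rr_t;

/* Returns the index of the queue that should receive the next task.
 * fetch_add hands every caller a distinct ticket, so concurrent
 * producers never need a lock. Relaxed ordering is sufficient here
 * because the counter guards no other data. */
size_t demo_rr_pick(demo_rr_t *rr) {
    size_t ticket = atomic_fetch_add_explicit(&rr->next, 1,
                                              memory_order_relaxed);
    return ticket % rr->nworkers;
}
```

With three workers, successive calls yield indices 0, 1, 2, 0, and so on. If the counter ever wraps around, the distribution skews slightly unless the worker count is a power of two.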

Parameters

sched — Active scheduler instance.
func — Worker routine matching:

void (*cortlet_task_fn)(void*);

arg — Opaque context payload passed directly to the worker callback.

Return Value

0 — Task successfully pushed.
-1 — Invalid scheduler or parameters.

Ingestion Flow Control & Backpressure

Each queue has a hard capacity of 4096 pending tasks (MAX_QUEUE_SIZE).

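If the target queue is full, cortlet_sched_push() does not return an error. It enters a retry loop that takes no locks, re-checking the queue indices with memory_order_acquire loads and calling sched_yield() between attempts until a slot becomes available. A producer can therefore be delayed for as long as the consumers take to drain the queue.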
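The backpressure behavior above can be sketched with a bounded ring buffer. The example is deliberately simplified to one producer and one consumer, and every name except MAX_QUEUE_SIZE (demo_queue_t, demo_try_push, demo_push_blocking, demo_try_pop) is hypothetical rather than part of the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE_SIZE 4096

/* Simplified single-producer, single-consumer ring buffer. */
typedef struct {
    void         *slots[MAX_QUEUE_SIZE];
    atomic_size_t head; /* next index to consume */
    atomic_size_t tail; /* next index to fill */
} demo_queue_t;

void demo_queue_init(demo_queue_t *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Attempts one push; returns false when the queue is full. */
bool demo_try_push(demo_queue_t *q, void *item) {
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h >= MAX_QUEUE_SIZE)
        return false;
    q->slots[t % MAX_QUEUE_SIZE] = item;
    /* Release: the slot write above becomes visible before the new tail. */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Backpressure: retry until space appears, yielding the CPU each time. */
void demo_push_blocking(demo_queue_t *q, void *item) {
    while (!demo_try_push(q, item))
        sched_yield();
}

/* Removes one item; returns false when the queue is empty. */
bool demo_try_pop(demo_queue_t *q, void **out) {
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t)
        return false;
    *out = q->slots[h % MAX_QUEUE_SIZE];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}
```

Calling demo_push_blocking on a full queue from the only thread that could drain it would spin forever, which is why callers of a blocking push should not also be the sole consumer.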

cortlet_sched_wait

void cortlet_sched_wait(cortlet_sched_t* sched);

Acts as a synchronization barrier. The caller blocks until all outstanding tasks complete.

Parameters

sched — Active scheduler context.

Return Value

void

Memory Sync Strategy

while (atomic_load_explicit(
           &sched->tasks_in_flight,
           memory_order_acquire) > 0) {
    usleep(100);
}

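This loop polls the global in-flight task counter with memory_order_acquire loads and sleeps for 100 microseconds between checks; it takes no locks. For the acquire load to make a completed task's side effects visible to the waiting thread, the operation that decrements tasks_in_flight must use at least memory_order_release.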
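The acquire/release pairing behind this barrier can be shown with a small self-contained sketch. The names demo_sched_t, demo_task, demo_wait and demo_run are hypothetical, and the release decrement in the task is an assumption about how the counter is maintained. The sketch uses nanosleep in place of usleep, which was removed from POSIX.1-2008.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

typedef struct {
    atomic_int tasks_in_flight;
    int        result; /* plain (non-atomic) data written by the task */
} demo_sched_t;

/* Task body: write the result, then decrement the counter with release
 * ordering so the write is published together with the decrement. */
void *demo_task(void *p) {
    demo_sched_t *s = p;
    s->result = 42;
    atomic_fetch_sub_explicit(&s->tasks_in_flight, 1, memory_order_release);
    return NULL;
}

/* Barrier: poll with acquire loads, sleeping 100 microseconds per check. */
void demo_wait(demo_sched_t *s) {
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 };
    while (atomic_load_explicit(&s->tasks_in_flight,
                                memory_order_acquire) > 0) {
        nanosleep(&pause, NULL);
    }
}

/* Runs one task on a separate thread and returns its result, or -1 if
 * the thread could not be created. */
int demo_run(void) {
    demo_sched_t s;
    pthread_t thread;
    atomic_init(&s.tasks_in_flight, 1);
    s.result = 0;
    if (pthread_create(&thread, NULL, demo_task, &s) != 0)
        return -1;
    demo_wait(&s);
    /* Safe to read: the acquire load that observed 0 synchronizes with
     * the release decrement performed after the write to result. */
    int r = s.result;
    pthread_join(thread, NULL);
    return r;
}
```

A fixed sleep interval trades wake-up latency for CPU time; a waiter that needs faster notification would typically use a condition variable or futex instead of polling.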

Low-Level Concurrency Architecture Matrix

| Function | Thread Safety | Complexity | Memory Ordering |
|---|---|---|---|
| `cortlet_sched_init` | Non-Thread-Safe | O(N) | Standard heap initialization |
| `cortlet_sched_push` | Lock-Free Thread Safe | O(1) | `memory_order_release` |
| `cortlet_sched_wait` | Barrier Safe | Bounded Poll | `memory_order_acquire` |
| `cortlet_sched_destroy` | Non-Thread-Safe | O(N) | Thread teardown sequence |

Memory Visibility Ordering Specs

The underlying pipeline uses a Single-Producer Multi-Consumer (SPMC) work-stealing architecture.

Local Queue Extraction (`queue_pop`)

Loads the queue's tail index with memory_order_acquire and publishes updated queue indices with memory_order_release stores, so that a task written before the release is visible to any thread that observes the new index.

Cross-Core Work Stealing (`queue_steal`)

When a worker exhausts its local queue, it attempts to steal work from sibling queues using hardware-level Compare-And-Swap (CAS) operations.

atomic_compare_exchange_strong_explicit(
    &q->head,
    &h,
    h + 1,
    memory_order_acq_rel,
    memory_order_acquire
);

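For any given head value h, the CAS can succeed for only one thread, so a task is never extracted twice. On success, memory_order_acq_rel makes the CAS both an acquire and a release operation on head. On failure, h is overwritten with the current head value, loaded with memory_order_acquire, so the steal can be retried. Memory ordering constrains what a thread may observe once it sees the new head; it does not make the update visible on other cores instantly.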
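To show how the CAS fits into a complete steal attempt, here is a minimal sketch. The queue type demo_steal_queue_t, its capacity, and both functions are hypothetical; the slots are declared atomic so that a thief's speculative read never races with the producer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CAP 8

typedef struct {
    _Atomic(void *) slots[DEMO_CAP];
    atomic_size_t   head; /* advanced by consumers and thieves */
    atomic_size_t   tail; /* advanced by the single producer */
} demo_steal_queue_t;

/* Single-producer publish; returns false when the queue is full. */
bool demo_queue_publish(demo_steal_queue_t *q, void *item) {
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h >= DEMO_CAP)
        return false;
    atomic_store_explicit(&q->slots[t % DEMO_CAP], item, memory_order_relaxed);
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Steal attempt: read the candidate slot, then claim it by advancing
 * head with CAS. Only the thread whose CAS succeeds keeps the item. */
bool demo_queue_steal(demo_steal_queue_t *q, void **out) {
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    for (;;) {
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h >= t)
            return false; /* nothing left to steal */
        void *item = atomic_load_explicit(&q->slots[h % DEMO_CAP],
                                          memory_order_relaxed);
        if (atomic_compare_exchange_strong_explicit(
                &q->head, &h, h + 1,
                memory_order_acq_rel, memory_order_acquire)) {
            *out = item;
            return true;
        }
        /* CAS failed: h now holds the current head, so retry. */
    }
}
```

The slot is read before the CAS on purpose: once head has advanced past an index, the producer is free to reuse that slot, so the value must be captured while the index is still unclaimed.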
