API Reference
Interface specification and memory-ordering notes for libcortlet-upgradesched. The runtime exposes a lock-free C11 API for low-latency, user-space task scheduling across core-pinned worker threads.
The scheduler context is an opaque handle (cortlet_sched_t*) that owns the
worker thread affinities, the task ingestion paths, and the core-bound ring
buffers. Queue operations use lock-free atomic primitives instead of mutexes,
which avoids kernel-level lock contention between threads.
Initialization & Lifecycle
cortlet_sched_init
cortlet_sched_t* cortlet_sched_init(void);
Queries the operating system for the number of processors currently online
(_SC_NPROCESSORS_ONLN as reported by sysconf, which counts logical processors
rather than physical cores) and starts a core-bound worker pool.
Parameters
- None.
Return Value
- cortlet_sched_t*: Valid pointer to an initialized scheduler context.
- NULL: Returned if heap allocation or worker-thread setup fails.
Hardware Alignment & Verification
The library allocates the internal cortlet_worker_t tracking structures with
posix_memalign so that the allocation base address falls on a 64-byte
boundary.
Aligning to 64 bytes, a common cache-line size, reduces false sharing between cores, provided each structure also occupies a whole multiple of 64 bytes. This keeps one worker's hot fields from sharing a cache line with another worker's.
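The alignment scheme described above can be sketched as follows. This is a minimal illustration, assuming a 64-byte cache line; demo_worker_t and demo_workers_alloc are hypothetical stand-ins and not the library's internal definitions. The alignas specifier pads the structure to a multiple of 64 bytes, so every array element starts on its own cache line, not only the first.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign under strict C11 */
#endif
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Hypothetical stand-in for the internal cortlet_worker_t. */
typedef struct {
    alignas(CACHE_LINE) uint64_t head;  /* forces 64-byte struct alignment */
    uint64_t tail;
} demo_worker_t;

/* Allocate n workers on a 64-byte boundary; returns NULL on failure. */
static demo_worker_t *demo_workers_alloc(size_t n)
{
    void *mem = NULL;

    assert(n > 0);
    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&mem, CACHE_LINE, n * sizeof(demo_worker_t)) != 0)
        return NULL;
    return mem;  /* release with free() */
}
```

Memory obtained this way is released with free(), the same as a malloc allocation.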
cortlet_sched_destroy
void cortlet_sched_destroy(cortlet_sched_t* sched);
Synchronously tears down the specified scheduler and its worker threads.
Parameters
- sched: Opaque scheduler handle returned from cortlet_sched_init.
Return Value
void
Task Dispatch & Coordination
cortlet_sched_push
int cortlet_sched_push(
    cortlet_sched_t* sched,
    cortlet_task_fn func,
    void* arg
);
Enqueues a task on one worker's queue. The target queue is selected by an atomic round-robin counter, which spreads submissions evenly across workers.
Parameters
- sched: Active scheduler instance.
- func: Worker routine matching:
typedef void (*cortlet_task_fn)(void*);
- arg: Opaque context payload passed directly to the worker callback.
Return Value
- 0: Task successfully pushed.
- -1: Invalid scheduler or parameters.
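The round-robin distribution used by cortlet_sched_push can be sketched as below. This is an illustration under assumptions, not the library's code: demo_rr_cursor and demo_next_queue are hypothetical names, and the relaxed ordering is a choice made for this sketch because the ticket only selects a queue and publishes no data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cursor shared by all producers; static storage starts at 0. */
static atomic_size_t demo_rr_cursor;

/* Returns the index of the next target queue among n_workers queues. */
static size_t demo_next_queue(size_t n_workers)
{
    assert(n_workers > 0);
    /* fetch_add hands every caller a distinct ticket, even under contention. */
    size_t ticket = atomic_fetch_add_explicit(&demo_rr_cursor, 1,
                                              memory_order_relaxed);
    return ticket % n_workers;
}
```

Because each caller receives a unique ticket, concurrent producers never need a lock to agree on whose turn it is.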
Ingestion Flow Control & Backpressure
Each queue has a hard capacity of 4096 pending tasks (MAX_QUEUE_SIZE).
If the target queue is full, cortlet_sched_push() does not return an error.
Instead it retries in a lock-free loop, re-checking capacity with
memory_order_acquire loads and calling sched_yield() between attempts until a
slot becomes available. Callers should therefore expect the call to stall while consumers fall behind.
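A minimal sketch of this backpressure loop follows, assuming a single producer per queue and monotonically increasing head and tail indices. bp_queue_t, demo_try_push, and demo_push_blocking are hypothetical names introduced for illustration; the library's internal queue layout is not documented here.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE_SIZE 4096

/* Hypothetical bounded ring; indices grow without wrapping. */
typedef struct {
    _Atomic size_t head;            /* advanced by consumers */
    _Atomic size_t tail;            /* advanced by the single producer */
    void *slots[MAX_QUEUE_SIZE];
} bp_queue_t;

/* One non-blocking attempt; returns false if the queue is full. */
static bool demo_try_push(bp_queue_t *q, void *item)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* Acquire pairs with the consumer's update of head, so a slot is
       reused only after the consumer has finished with it. */
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head >= MAX_QUEUE_SIZE)
        return false;
    q->slots[tail % MAX_QUEUE_SIZE] = item;
    /* Release publishes the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Retry until the push succeeds, yielding the CPU between attempts. */
static void demo_push_blocking(bp_queue_t *q, void *item)
{
    assert(q != NULL);
    while (!demo_try_push(q, item))
        sched_yield();
}
```

Yielding keeps the producer off the CPU while consumers drain the queue, at the cost of unbounded latency if no consumer makes progress.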
cortlet_sched_wait
void cortlet_sched_wait(cortlet_sched_t* sched);
Acts as a synchronization barrier. The caller blocks until all outstanding tasks complete.
Parameters
- sched: Active scheduler context.
Return Value
void
Memory Sync Strategy
while (atomic_load_explicit(
           &sched->tasks_in_flight,
           memory_order_acquire) > 0) {
    usleep(100);
}
This loop polls the global tasks_in_flight counter with memory_order_acquire
loads and sleeps for 100 microseconds between checks. It takes no locks.
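The acquire load in the wait loop is only useful if the worker side decrements the counter with release semantics. The sketch below shows one way the two sides could pair up; demo_sched_t, demo_task_submitted, and demo_task_finished are hypothetical names, and the orderings on the worker side are an assumption, since only the waiting side is documented.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes usleep under strict C11 */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical stand-in holding only the in-flight counter. */
typedef struct {
    atomic_int tasks_in_flight;
} demo_sched_t;

/* Producer side: count the task before handing it to a worker. */
static void demo_task_submitted(demo_sched_t *s)
{
    atomic_fetch_add_explicit(&s->tasks_in_flight, 1, memory_order_relaxed);
}

/* Worker side: release orders the task's writes before the decrement. */
static void demo_task_finished(demo_sched_t *s)
{
    int prev = atomic_fetch_sub_explicit(&s->tasks_in_flight, 1,
                                         memory_order_release);
    assert(prev > 0);  /* a finish must match an earlier submit */
    (void)prev;
}

/* Waiter side: acquire makes the finished tasks' writes visible on return. */
static void demo_wait(demo_sched_t *s)
{
    while (atomic_load_explicit(&s->tasks_in_flight,
                                memory_order_acquire) > 0)
        usleep(100);
}
```

With this pairing, a caller that returns from the wait can safely read results written by the completed tasks.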
Low-Level Concurrency Architecture Matrix
| Function | Thread Safety | Complexity | Memory Ordering |
|---|---|---|---|
| cortlet_sched_init | Non-Thread-Safe | Standard heap initialization | |
| cortlet_sched_push | Lock-Free Thread Safe | | memory_order_release |
| cortlet_sched_wait | Barrier Safe | Bounded Poll | memory_order_acquire |
| cortlet_sched_destroy | Non-Thread-Safe | Thread teardown sequence | |
Memory Visibility Ordering Specs
The underlying pipeline uses a Single-Producer Multi-Consumer (SPMC) work-stealing architecture.
Local Queue Extraction (queue_pop)
Loads the queue tail index with memory_order_acquire and pairs that load with
memory_order_release stores on the queue indices, so a slot's contents are
visible before the index that exposes it.
Cross-Core Work Stealing (queue_steal)
When a worker exhausts its local queue, it attempts to steal work from sibling queues using hardware-level Compare-And-Swap (CAS) operations.
atomic_compare_exchange_strong_explicit(
    &q->head,
    &h,
    h + 1,
    memory_order_acq_rel,
    memory_order_acquire
);
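Only one thread's CAS can succeed for a given head value, so two workers cannot extract the same task; the thread that loses the race finds the current head written back into h and can retry. The orderings do not make the update instantly visible on other cores. On success, memory_order_acq_rel makes the stealing thread's earlier writes visible to any thread that later acquires head, and lets it observe writes released by the previous updater.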
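The CAS call shown above is one step of a larger retry loop. The following sketch places it in context, assuming monotonically increasing indices and a producer that publishes tail with release semantics. steal_queue_t, DEMO_CAP, and demo_steal are hypothetical names for illustration and do not reflect the library's actual queue_steal implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CAP 8  /* small capacity, for illustration only */

/* Hypothetical queue; slots are atomic so racing reads are well defined. */
typedef struct {
    _Atomic size_t head;
    _Atomic size_t tail;
    _Atomic(void *) slots[DEMO_CAP];
} steal_queue_t;

/* Try to take the oldest item; returns false if the queue looked empty. */
static bool demo_steal(steal_queue_t *q, void **out)
{
    assert(q != NULL && out != NULL);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);

    for (;;) {
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h == t)
            return false;  /* nothing to steal */

        /* Read the candidate first; it is kept only if the CAS succeeds. */
        void *item = atomic_load_explicit(&q->slots[h % DEMO_CAP],
                                          memory_order_relaxed);
        if (atomic_compare_exchange_strong_explicit(
                &q->head, &h, h + 1,
                memory_order_acq_rel, memory_order_acquire)) {
            *out = item;
            return true;
        }
        /* CAS failed: h now holds the current head, so try again. */
    }
}
```

Reading the slot before the CAS is safe in this sketch because the producer can overwrite slot h only after head has moved past h, in which case the CAS fails and the stale value is discarded.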