- Data Structures Used by the Scheduler
- struct task_struct
Type | Name | Description
long | state | TASK_RUNNING, TASK_(UN)INTERRUPTIBLE, ...
int | prio | dynamic priority based on static_prio and sleep_avg
int | static_prio | static priority
unsigned long | rt_priority | real-time priority
unsigned long | policy | SCHED_NORMAL, SCHED_FIFO, SCHED_RR, SCHED_BATCH
unsigned int | time_slice | ticks left in the time quantum of the process
unsigned int | first_time_slice | 1 if the process has never exhausted its quantum, otherwise 0
unsigned long | sleep_avg | average sleep time
unsigned long long | timestamp | time of the last context switch in which the process was replaced, or time of its last insertion in the runqueue
unsigned long long | last_ran | time of the last context switch in which the process was replaced
struct prio_array * | array | pointer to the runqueue's priority array that includes the process
struct list_head | run_list | pointers to the next and previous elements in the runqueue list to which the process belongs
gid_t | gid, egid, sgid | group IDs of the process
- struct rq
Type | Name | Description
spinlock_t | lock | Only one task can modify the runqueue at any time
unsigned long | nr_running | Number of runnable tasks in the runqueue
unsigned long | expired_timestamp | Time at which a task last ran out of its time quantum
unsigned long long | timestamp_last_tick | Time of the last scheduler tick
int | best_expired_prio | The highest priority of any expired task
struct task_struct * | curr | Pointer to the currently running process
struct task_struct * | idle | Pointer to the idle process
struct prio_array * | active | Pointer to the lists of active processes
struct prio_array * | expired | Pointer to the lists of expired processes
struct prio_array [2] | arrays | The two sets of active and expired processes
- struct prio_array
Type | Name | Description
unsigned int | nr_active | number of tasks in the array
unsigned long [5] | bitmap | priority bitmap
struct list_head [MAX_PRIO] | queue | an array of 140 priority queues (if MAX_PRIO = 140)
- Question: How should p->array->queue + p->prio be understood,
and which tasks does p->run_list point to?
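One way to explore this question is with a small user-space model. The sketch below is not kernel code: `struct task`, `init_prio_array()`, and `task_of()` are hypothetical stand-ins, the list handling is written out by hand instead of using the kernel's list macros, and the priority bitmap is left out. It only illustrates that p->array->queue + p->prio is the address of the list head for the task's priority, and that p->run_list links the task to its neighbours on that same list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIO 140

/* Circular doubly linked list node, modeled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified priority array: the bitmap field is omitted in this sketch. */
struct prio_array {
    unsigned int nr_active;            /* number of tasks in the array */
    struct list_head queue[MAX_PRIO];  /* one list head per priority level */
};

/* Hypothetical stand-in for the scheduler fields of task_struct. */
struct task {
    int prio;
    struct prio_array *array;
    struct list_head run_list;
};

static void init_prio_array(struct prio_array *a)
{
    a->nr_active = 0;
    for (int i = 0; i < MAX_PRIO; i++)
        a->queue[i].next = a->queue[i].prev = &a->queue[i]; /* empty list points to itself */
}

/* Append p to the tail of the list that holds tasks of priority p->prio. */
static void enqueue_task(struct task *p, struct prio_array *a)
{
    assert(p->prio >= 0 && p->prio < MAX_PRIO);
    struct list_head *head = a->queue + p->prio;  /* same address as &a->queue[p->prio] */

    p->run_list.prev = head->prev;
    p->run_list.next = head;
    head->prev->next = &p->run_list;
    head->prev = &p->run_list;

    p->array = a;
    a->nr_active++;
}

/* Recover the task that embeds a given run_list node. */
static struct task *task_of(struct list_head *node)
{
    return (struct task *)((char *)node - offsetof(struct task, run_list));
}
```

In this model, after two tasks with prio 120 are enqueued, the first task's run_list.next leads to the second task, and the second task's run_list.next leads back to the list head queue[120], which is not embedded in any task.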
- Functions Used by the Scheduler
- schedule()
- scheduler_tick()
- effective_prio()
- How is time_slice changed?
- In sched_fork(), the parent's remaining time_slice is split between parent and child.
- In scheduler_tick(), time_slice is decremented. If it reaches
0, a new time_slice is calculated according to the task's scheduling
policy, and the task may be moved to a different priority queue.
- In sched_exit(), when a process exits, its remaining time_slice is returned
to its parent.
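The three steps above can be sketched in user space as follows. This is a simplified model, not the kernel's implementation: `struct task`, `fork_split()`, `tick()`, and `exit_return()` are hypothetical names, the rounding in the split and the rule that only a child still in its first quantum gives ticks back are assumptions modeled on the 2.6 sources, and the policy-specific handling in scheduler_tick() is reduced to a single refill value passed in by the caller.

```c
#include <assert.h>

/* Hypothetical stand-in for the two time-slice fields of task_struct. */
struct task {
    unsigned int time_slice;        /* ticks left in the current quantum */
    unsigned int first_time_slice;  /* 1 until the first quantum is exhausted */
};

/* Fork: the parent's remaining ticks are divided between parent and child. */
static void fork_split(struct task *parent, struct task *child)
{
    child->time_slice = (parent->time_slice + 1) >> 1;  /* child gets the rounded-up half */
    child->first_time_slice = 1;
    parent->time_slice >>= 1;                           /* parent keeps the rounded-down half */
}

/* Timer tick: consume one tick; on expiry, refill and report it with 1. */
static int tick(struct task *p, unsigned int new_slice)
{
    assert(p->time_slice > 0);
    if (--p->time_slice > 0)
        return 0;
    p->time_slice = new_slice;      /* the caller supplies the recalculated quantum */
    p->first_time_slice = 0;        /* the task has now exhausted a quantum */
    return 1;
}

/* Exit: a child still in its first quantum hands its unused ticks back,
 * capped so that the parent never holds more than `cap` ticks. */
static void exit_return(struct task *parent, const struct task *child,
                        unsigned int cap)
{
    if (!child->first_time_slice)
        return;
    parent->time_slice += child->time_slice;
    if (parent->time_slice > cap)
        parent->time_slice = cap;
}
```

For example, a parent holding 101 ticks that forks keeps 50 and gives 51 to the child, so forking does not create extra CPU time.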
- How is static_prio used?
- The scheduler never changes it on its own; it changes only when the nice value is set (set_user_nice()).
- It is used to calculate the nice value (TASK_NICE(p),
TASK_USER_PRIO(p), set_user_nice()), the time slices
(task_timeslice()), the interactivity (TASK_INTERACTIVE(p)),
dynamic priority (__normal_prio()).
- task_timeslice() calculates the time slice value based on
static_prio:
- if static_prio < 120, it returns (140-static_prio) * 20 milliseconds
- if static_prio >= 120, it returns (140-static_prio) * 5 milliseconds
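The two rules above can be written as a small function. This is a sketch in milliseconds rather than timer ticks; `task_timeslice_ms()` is a hypothetical name, and the check that static_prio lies in 100..139 is an assumption about the valid range for this field.

```c
#include <assert.h>

#define MAX_RT_PRIO 100
#define MAX_PRIO    140

/* Time slice in milliseconds for a given static priority (100..139). */
static unsigned int task_timeslice_ms(int static_prio)
{
    assert(static_prio >= MAX_RT_PRIO && static_prio < MAX_PRIO);
    if (static_prio < 120)
        return (MAX_PRIO - static_prio) * 20;  /* higher-priority half: 20 ms per level */
    return (MAX_PRIO - static_prio) * 5;       /* lower-priority half: 5 ms per level */
}
```

With these rules, static_prio 100 yields 800 ms, 119 yields 420 ms, 120 yields 100 ms, and 139 yields 5 ms.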
- How is prio (dynamic priority) used?
- It determines which priority queue of the priority array a task is added to or removed from.
Related functions: dequeue_task(), enqueue_task(), requeue_task(), enqueue_task_head()
- It is calculated based on the static_prio but is
modified by bonuses/penalties according to sleep_avg:
prio = max(100, min(static_prio - bonus + 5, 139))
Related functions: __normal_prio(), normal_prio(), effective_prio(), recalc_task_prio()
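The formula above can be checked with a few values. In this sketch, `dynamic_prio()` is a hypothetical helper, and the bonus is assumed to be an integer between 0 and 10 that has already been derived from sleep_avg; how the kernel derives it is not modeled here.

```c
#include <assert.h>

/* prio = max(100, min(static_prio - bonus + 5, 139)) */
static int dynamic_prio(int static_prio, int bonus)
{
    assert(bonus >= 0 && bonus <= 10);  /* assumed bonus range */
    int prio = static_prio - bonus + 5;
    if (prio > 139)
        prio = 139;                     /* never below the lowest priority */
    if (prio < 100)
        prio = 100;                     /* never into the real-time range */
    return prio;
}
```

Under these assumptions, a task with static_prio 120 ends up anywhere from 115 (bonus 10, mostly sleeping) to 125 (bonus 0, mostly running), and a bonus of 5 leaves the priority unchanged.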
- likely/unlikely macros: defined in
<include/linux/compiler.h>, used to tell the compiler which branch is expected, so that it can optimize for that branch.
    if (likely(x))       // equivalent to "if (x)"
        { A; }           // A is more probable
    else
        { B; }

    if (unlikely(x))     // equivalent to "if (x)"
        { A; }
    else
        { B; }           // B is more probable
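For reference, the macros can be reproduced in user space on top of the compiler builtin __builtin_expect, which requires GCC or Clang. The function `checked_div()` below is a hypothetical example and not kernel code; it marks the error path as the rare one.

```c
#include <assert.h>

/* __builtin_expect(value, expected) returns value and hints that it
 * usually equals expected. The !! turns any nonzero value into 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: division by zero is treated as the unusual case. */
static int checked_div(int a, int b, int *out)
{
    assert(out);
    if (unlikely(b == 0))
        return -1;       /* rare path */
    *out = a / b;        /* common path */
    return 0;
}
```

The hint affects only how the compiler lays out and optimizes the code; the result of the condition is the same with or without the macro.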
- HZ/jiffies: used to measure time in Linux.
- System timers interrupt the processor at a certain frequency.
- HZ is the number of timer ticks per second, or, the
frequency of timer interrupts. It is defined in
<include/asm/param.h>. On x86 systems, it is set to 1000 in
the 2.6 kernel, so there are 1000 timer interrupts per second,
i.e., a timer interrupt happens every millisecond. n*HZ/1000
is the number of timer ticks in n milliseconds.
- jiffies is the number of timer interrupts since the
system booted. If HZ is 1000, jiffies is incremented every
millisecond, i.e., one jiffy lasts 1 millisecond.
- In sched.h, MIN_TIMESLICE is defined as
max(5 * HZ / 1000, 1), which is actually 5ms, DEF_TIMESLICE
is defined as (100 * HZ / 1000), which is 100 milliseconds.
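The tick arithmetic above can be tried for different values of HZ. In this sketch HZ is passed as a parameter instead of being a compile-time constant, and `ms_to_ticks()`, `min_timeslice()`, and `def_timeslice()` are hypothetical helper names that mirror the two definitions quoted above.

```c
#include <assert.h>

/* Number of whole timer ticks in ms milliseconds: ms * HZ / 1000. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return ms * hz / 1000;
}

/* Mirrors max(5 * HZ / 1000, 1): at least one tick. */
static unsigned long min_timeslice(unsigned long hz)
{
    unsigned long t = 5 * hz / 1000;
    return t > 1 ? t : 1;
}

/* Mirrors (100 * HZ / 1000). */
static unsigned long def_timeslice(unsigned long hz)
{
    return 100 * hz / 1000;
}
```

With HZ = 1000 the results are 5 ticks (5 ms) and 100 ticks (100 ms). With HZ = 100 the integer division 5 * 100 / 1000 gives 0, so the max() raises the minimum time slice to 1 tick, which then lasts 10 ms.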