
@c glibc/2.7/manual/resource.texi

@node Resource Usage And Limitation, Non-Local Exits, Date and Time, Top
@c %MENU% Functions for examining resource usage and getting and setting limits
@chapter Resource Usage And Limitation
This chapter describes functions for examining how much of various kinds of
resources (CPU time, memory, etc.) a process has used, and for getting and
setting limits on future usage.

@menu
* Resource Usage::              Measuring various resources used.
* Limits on Resources::         Specifying limits on resource usage.
* Priority::                    Reading or setting process run priority.
* Memory Resources::            Querying available memory resources.
* Processor Resources::         Learn about the processors available.
@end menu


@node Resource Usage
@section Resource Usage

@pindex sys/resource.h
The function @code{getrusage} and the data type @code{struct rusage}
are used to examine the resource usage of a process.  They are declared
in @file{sys/resource.h}.

@comment sys/resource.h
@comment BSD
@deftypefun int getrusage (int @var{processes}, struct rusage *@var{rusage})
This function reports resource usage totals for processes specified by
@var{processes}, storing the information in @code{*@var{rusage}}.

On most systems, @var{processes} has only two valid values:

@table @code
@comment sys/resource.h
@comment BSD
@item RUSAGE_SELF
Just the current process.

@comment sys/resource.h
@comment BSD
@item RUSAGE_CHILDREN
All child processes (direct and indirect) that have already terminated.
@end table

In the GNU system, you can also inquire about a particular child process
by specifying its process ID.

The return value of @code{getrusage} is zero for success, and @code{-1}
for failure.  The following @code{errno} error condition is defined for
this function:

@table @code
@item EINVAL
The argument @var{processes} is not valid.
@end table
@end deftypefun

One way of getting resource usage for a particular child process is with
the function @code{wait4}, which returns totals for a child when it
terminates.  @xref{BSD Wait Functions}.
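Here is a minimal sketch of that approach, assuming a glibc system on which @code{wait4} is exposed by the @code{_DEFAULT_SOURCE} feature-test macro (older releases use @code{_BSD_SOURCE}).  The function name @code{run_child_and_measure} and the child's busy loop are hypothetical, chosen only so that the child accumulates some user CPU time before it exits.

```c
#define _DEFAULT_SOURCE   /* assumed to expose the BSD function wait4 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that burns a little CPU time and then exits with
   EXIT_CODE.  Wait for it with wait4, which stores the child's exit
   status in *STATUS and its resource usage in *USAGE.  Returns the
   child's process ID, or -1 if fork or wait4 fails.  */
pid_t
run_child_and_measure (int exit_code, int *status, struct rusage *usage)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      /* Child: spin so that ru_utime is likely to be nonzero.  */
      volatile unsigned long sink = 0;
      for (unsigned long i = 0; i < 20000000UL; i++)
        sink += i;
      _exit (exit_code);
    }
  /* Parent: unlike waitpid, wait4 also reports the child's usage.  */
  return wait4 (pid, status, 0, usage);
}
```

A caller would check @code{WIFEXITED} on the returned status and then read, for example, the @code{ru_utime} member of the filled-in structure for the child's user CPU time.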

@comment sys/resource.h
@comment BSD
@deftp {Data Type} {struct rusage}
This data type stores various resource usage statistics.  It has the
following members, and possibly others:

@table @code
@item struct timeval ru_utime
Time spent executing user instructions.

@item struct timeval ru_stime
Time spent in operating system code on behalf of @var{processes}.

@item long int ru_maxrss
The maximum resident set size used, in kilobytes.  That is, the maximum
number of kilobytes of physical memory that @var{processes} used
simultaneously.

@item long int ru_ixrss
An integral value expressed in kilobytes times ticks of execution, which
indicates the amount of memory used by text that was shared with other
processes.

@item long int ru_idrss
An integral value expressed the same way, which is the amount of
unshared memory used for data.

@item long int ru_isrss
An integral value expressed the same way, which is the amount of
unshared memory used for stack space.

@item long int ru_minflt
The number of page faults which were serviced without requiring any I/O.

@item long int ru_majflt
The number of page faults which were serviced by doing I/O.

@item long int ru_nswap
The number of times @var{processes} was swapped entirely out of main memory.

@item long int ru_inblock
The number of times the file system had to read from the disk on behalf
of @var{processes}.

@item long int ru_oublock
The number of times the file system had to write to the disk on behalf
of @var{processes}.

@item long int ru_msgsnd
Number of IPC messages sent.

@item long int ru_msgrcv
Number of IPC messages received.

@item long int ru_nsignals
Number of signals received.

@item long int ru_nvcsw
The number of times @var{processes} voluntarily invoked a context switch
(usually to wait for some service).

@item long int ru_nivcsw
The number of times an involuntary context switch took place (because
a time slice expired, or another process of higher priority was
scheduled).
@end table
@end deftp
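The following sketch reads a few of these members for the calling process.  The function name @code{self_usage} is hypothetical; it converts the @code{struct timeval} members to seconds and returns @code{ru_maxrss} unchanged, in whatever unit the system reports (kilobytes, per the description above).

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval to seconds.  */
static double
timeval_seconds (struct timeval tv)
{
  return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Store the user and system CPU time used so far by the calling
   process, in seconds, and its peak resident set size as reported in
   ru_maxrss.  Returns 0 on success, -1 if getrusage fails.  */
int
self_usage (double *user_s, double *sys_s, long int *maxrss)
{
  struct rusage ru;

  if (getrusage (RUSAGE_SELF, &ru) != 0)
    return -1;
  *user_s = timeval_seconds (ru.ru_utime);
  *sys_s = timeval_seconds (ru.ru_stime);
  *maxrss = ru.ru_maxrss;
  return 0;
}
```

Calling @code{self_usage} before and after a piece of code and subtracting the results gives a rough measure of the CPU time that code consumed.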

@code{vtimes} is a historical function that does some of what
@code{getrusage} does.  @code{getrusage} is a better choice.

@code{vtimes} and its @code{struct vtimes} data structure are declared in
@file{sys/vtimes.h}.
@pindex sys/vtimes.h
@comment sys/vtimes.h

@deftypefun int vtimes (struct vtimes *@var{current}, struct vtimes *@var{child})

@code{vtimes} reports resource usage totals for a process.

If @var{current} is non-null, @code{vtimes} stores resource usage totals for
the invoking process alone in the structure to which it points.  If
@var{child} is non-null, @code{vtimes} stores resource usage totals for all
past children (which have terminated) of the invoking process in the structure
to which it points.

@deftp {Data Type} {struct vtimes}
This data type contains information about the resource usage of a process.
Each member corresponds to a member of the @code{struct rusage} data type
described above.

@table @code
@item vm_utime
User CPU time.  Analogous to @code{ru_utime} in @code{struct rusage}.
@item vm_stime
System CPU time.  Analogous to @code{ru_stime} in @code{struct rusage}.
@item vm_idsrss
Data and stack memory.  The sum of the values that would be reported as
@code{ru_idrss} and @code{ru_isrss} in @code{struct rusage}.
@item vm_ixrss
Shared memory.  Analogous to @code{ru_ixrss} in @code{struct rusage}.
@item vm_maxrss
Maximum resident set size.  Analogous to @code{ru_maxrss} in
@code{struct rusage}.
@item vm_majflt
Major page faults.  Analogous to @code{ru_majflt} in @code{struct rusage}.
@item vm_minflt
Minor page faults.  Analogous to @code{ru_minflt} in @code{struct rusage}.
@item vm_nswap
Swap count.  Analogous to @code{ru_nswap} in @code{struct rusage}.
@item vm_inblk
Disk reads.  Analogous to @code{ru_inblock} in @code{struct rusage}.
@item vm_oublk
Disk writes.  Analogous to @code{ru_oublock} in @code{struct rusage}.
@end table
@end deftp

The return value is zero if the function succeeds; @code{-1} otherwise.
@end deftypefun

@node Limits on Resources
@section Limiting Resource Usage
@cindex resource limits
@cindex limits on resource usage
@cindex usage limits

You can specify limits for the resource usage of a process.  When the
process tries to exceed a limit, it may get a signal, or the system call
by which it tried to do so may fail, depending on the resource.  Each
process initially inherits its limit values from its parent, but it can
subsequently change them.

There are two per-process limits associated with a resource:
@cindex limit

@table @dfn
@item current limit
The current limit is the value the system will not allow usage to
exceed.  It is also called the ``soft limit'' because the process being
limited can generally raise the current limit at will, up to the
maximum limit.
@cindex current limit
@cindex soft limit

@item maximum limit
The maximum limit is the maximum value to which a process is allowed to
set its current limit.  It is also called the ``hard limit'' because
there is no way for a process to get around it.  A process may lower
its own maximum limit, but only the superuser may increase a maximum
limit.
@cindex maximum limit
@cindex hard limit
@end table

@pindex sys/resource.h
The symbols for use with @code{getrlimit}, @code{setrlimit},
@code{getrlimit64}, and @code{setrlimit64} are defined in
@file{sys/resource.h}.

@comment sys/resource.h
@comment BSD
@deftypefun int getrlimit (int @var{resource}, struct rlimit *@var{rlp})
Read the current and maximum limits for the resource @var{resource}
and store them in @code{*@var{rlp}}.

The return value is @code{0} on success and @code{-1} on failure.  The
only possible @code{errno} error condition is @code{EFAULT}.

When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit system, this function is in fact @code{getrlimit64}.  Thus, the
LFS interface transparently replaces the old interface.
@end deftypefun

@comment sys/resource.h
@comment Unix98
@deftypefun int getrlimit64 (int @var{resource}, struct rlimit64 *@var{rlp})
This function is similar to @code{getrlimit} but its second parameter is
a pointer to a variable of type @code{struct rlimit64}, which allows it
to read values which wouldn't fit in the members of a @code{struct
rlimit}.

If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, this function is available under the name
@code{getrlimit} and so transparently replaces the old interface.
@end deftypefun

@comment sys/resource.h
@comment BSD
@deftypefun int setrlimit (int @var{resource}, const struct rlimit *@var{rlp})
Set the current and maximum limits for the resource @var{resource} to
the values stored in @code{*@var{rlp}}.

The return value is @code{0} on success and @code{-1} on failure.  The
following @code{errno} error condition is possible:

@table @code
@item EPERM
@itemize @bullet
@item
The process tried to raise a current limit beyond the maximum limit.

@item
The process tried to raise a maximum limit, but is not superuser.
@end itemize
@end table

When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit system, this function is in fact @code{setrlimit64}.  Thus, the
LFS interface transparently replaces the old interface.
@end deftypefun

@comment sys/resource.h
@comment Unix98
@deftypefun int setrlimit64 (int @var{resource}, const struct rlimit64 *@var{rlp})
This function is similar to @code{setrlimit} but its second parameter is
a pointer to a variable of type @code{struct rlimit64}, which allows it
to set values which wouldn't fit in the members of a @code{struct
rlimit}.

If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, this function is available under the name
@code{setrlimit} and so transparently replaces the old interface.
@end deftypefun

@comment sys/resource.h
@comment BSD
@deftp {Data Type} {struct rlimit}
This structure is used with @code{getrlimit} to receive limit values,
and with @code{setrlimit} to specify limit values for a particular process
and resource.  It has two fields:

@table @code
@item rlim_t rlim_cur
The current limit.

@item rlim_t rlim_max
The maximum limit.
@end table

For @code{getrlimit}, the structure is an output; it receives the current
values.  For @code{setrlimit}, it specifies the new values.
@end deftp
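As an illustration, a program can report both limits for a resource with a single @code{getrlimit} call.  The helpers @code{describe_limit} and @code{format_rlim} below are hypothetical, not part of the library, and the @code{unsigned long long} cast assumes that @code{rlim_t} fits in that type, as it does on common glibc targets.  @code{RLIM_INFINITY} is described later in this section.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Format one limit value, spelling RLIM_INFINITY as "unlimited".  */
static void
format_rlim (rlim_t value, char *buf, size_t size)
{
  if (value == RLIM_INFINITY)
    snprintf (buf, size, "unlimited");
  else
    snprintf (buf, size, "%llu", (unsigned long long) value);
}

/* Write the current and maximum limits of RESOURCE into BUF as
   "current/maximum".  Returns 0 on success, -1 if getrlimit fails.  */
int
describe_limit (int resource, char *buf, size_t size)
{
  struct rlimit rl;
  char cur[32], max[32];

  if (getrlimit (resource, &rl) != 0)
    return -1;
  format_rlim (rl.rlim_cur, cur, sizeof cur);
  format_rlim (rl.rlim_max, max, sizeof max);
  snprintf (buf, size, "%s/%s", cur, max);
  return 0;
}
```

For example, @code{describe_limit (RLIMIT_STACK, buf, sizeof buf)} fills @code{buf} with a string of the form @samp{@var{current}/@var{maximum}}; the actual numbers depend on the system and on the limits inherited from the parent process.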

For the LFS functions, a similar type is defined in @file{sys/resource.h}.

@comment sys/resource.h
@comment Unix98
@deftp {Data Type} {struct rlimit64}
This structure is analogous to the @code{rlimit} structure above, but
its components have wider ranges.  It has two fields:

@table @code
@item rlim64_t rlim_cur
This is analogous to @code{rlimit.rlim_cur}, but with a different type.

@item rlim64_t rlim_max
This is analogous to @code{rlimit.rlim_max}, but with a different type.
@end table

@end deftp

Here is a list of resources for which you can specify a limit.  Memory
and file sizes are measured in bytes.

@table @code
@comment sys/resource.h
@comment BSD
@item RLIMIT_CPU
@vindex RLIMIT_CPU
The maximum amount of CPU time the process can use.  If it runs for
longer than this, it gets a signal: @code{SIGXCPU}.  The value is
measured in seconds.  @xref{Operation Error Signals}.

@comment sys/resource.h
@comment BSD
@item RLIMIT_FSIZE
@vindex RLIMIT_FSIZE
The maximum size of a file the process can create.  Trying to write a
larger file causes a signal: @code{SIGXFSZ}.  @xref{Operation Error
Signals}.

@comment sys/resource.h
@comment BSD
@item RLIMIT_DATA
@vindex RLIMIT_DATA
The maximum size of data memory for the process.  If the process tries
to allocate data memory beyond this amount, the allocation function
fails.

@comment sys/resource.h
@comment BSD
@item RLIMIT_STACK
@vindex RLIMIT_STACK
The maximum stack size for the process.  If the process tries to extend
its stack past this size, it gets a @code{SIGSEGV} signal.
@xref{Program Error Signals}.

@comment sys/resource.h
@comment BSD
@item RLIMIT_CORE
@vindex RLIMIT_CORE
The maximum size of a core file that this process can create.  If the
process terminates and would dump a core file larger than this, then no
core file is created.  So setting this limit to zero prevents core files
from ever being created.

@comment sys/resource.h
@comment BSD
@item RLIMIT_RSS
@vindex RLIMIT_RSS
The maximum amount of physical memory that this process should get.
This parameter is a guide for the system's scheduler and memory
allocator; the system may give the process more memory when there is a
surplus.

@comment sys/resource.h
@comment BSD
@item RLIMIT_MEMLOCK
@vindex RLIMIT_MEMLOCK
The maximum amount of memory that can be locked into physical memory (so
it will never be paged out).

@comment sys/resource.h
@comment BSD
@item RLIMIT_NPROC
@vindex RLIMIT_NPROC
The maximum number of processes that can be created with the same user ID.
If you have reached the limit for your user ID, @code{fork} will fail
with @code{EAGAIN}.  @xref{Creating a Process}.

@comment sys/resource.h
@comment BSD
@item RLIMIT_NOFILE
@vindex RLIMIT_NOFILE
@itemx RLIMIT_OFILE
@vindex RLIMIT_OFILE
The maximum number of files that the process can open.  If it tries to
open more files than this, its open attempt fails with @code{errno}
@code{EMFILE}.  @xref{Error Codes}.  Not all systems support this limit;
GNU does, and 4.4 BSD does.

@comment sys/resource.h
@comment Unix98
@item RLIMIT_AS
@vindex RLIMIT_AS
The maximum size of total memory that this process should get.  If the
process tries to allocate memory beyond this amount with, for
example, @code{brk}, @code{malloc}, @code{mmap} or @code{sbrk}, the
allocation function fails.

@comment sys/resource.h
@comment BSD
@item RLIM_NLIMITS
@vindex RLIM_NLIMITS
The number of different resource limits.  Any valid @var{resource}
operand must be less than @code{RLIM_NLIMITS}.
@end table
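The effect of @code{RLIMIT_NOFILE} is easy to observe: lower the current limit, then open files until @code{open} fails.  This sketch assumes a POSIX system that provides @file{/dev/null}; the function name @code{open_until_limit} and the cap of 64 descriptors are arbitrary choices for illustration, and the original limits are restored before the function returns.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_PROBE 64   /* arbitrary cap on descriptors opened here */

/* Temporarily lower the current RLIMIT_NOFILE limit to SOFT, then open
   /dev/null until open fails.  Returns the errno value of the failing
   open (EMFILE is expected), 0 if MAX_PROBE opens all succeeded, or -1
   if the limit could not be read or changed.  The descriptors are
   closed and the original limits restored before returning.  */
int
open_until_limit (rlim_t soft)
{
  struct rlimit saved, lowered;
  int fds[MAX_PROBE];
  int n = 0, err = 0;

  if (getrlimit (RLIMIT_NOFILE, &saved) != 0)
    return -1;
  lowered = saved;
  lowered.rlim_cur = soft;   /* lowering the current limit needs no privilege */
  if (setrlimit (RLIMIT_NOFILE, &lowered) != 0)
    return -1;

  while (n < MAX_PROBE)
    {
      int fd = open ("/dev/null", O_RDONLY);
      if (fd < 0)
        {
          err = errno;
          break;
        }
      fds[n++] = fd;
    }

  while (n > 0)
    close (fds[--n]);
  setrlimit (RLIMIT_NOFILE, &saved);   /* restore the previous limits */
  return err;
}
```

A call such as @code{open_until_limit (16)} would be expected to return @code{EMFILE}, because the process runs out of descriptor slots well before the cap of 64 is reached.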

@comment sys/resource.h
@comment BSD
@deftypevr Constant rlim_t RLIM_INFINITY
This constant stands for a value of ``infinity'' when supplied as
the limit value in @code{setrlimit}.  @code{getrlimit} likewise reports
an unlimited resource with this value.
@end deftypevr
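A common use of these functions is to raise the current limit as far as the maximum limit, which needs no privilege.  This is a minimal sketch with a hypothetical function name.  If the maximum limit is @code{RLIM_INFINITY}, copying it makes the current limit unlimited too; some systems may refuse that for particular resources, in which case @code{setrlimit} reports the failure.

```c
#include <sys/resource.h>

/* Raise the current (soft) limit of RESOURCE to its maximum (hard)
   limit.  Only the current limit changes, so no privilege is needed.
   Returns 0 on success, -1 if getrlimit or setrlimit fails.  */
int
raise_soft_to_hard (int resource)
{
  struct rlimit rl;

  if (getrlimit (resource, &rl) != 0)
    return -1;
  if (rl.rlim_cur == rl.rlim_max)
    return 0;                       /* already at the maximum */
  rl.rlim_cur = rl.rlim_max;        /* may be RLIM_INFINITY */
  return setrlimit (resource, &rl);
}
```

For instance, a server that expects to hold many descriptors open might call @code{raise_soft_to_hard (RLIMIT_NOFILE)} once at startup.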


The following are historical functions to do some of what the functions
above do.  The functions above are better choices.

@code{ulimit} and the command symbols are declared in @file{ulimit.h}.
@pindex ulimit.h

@comment ulimit.h
@comment BSD
@deftypefun {long int} ulimit (int @var{cmd}, @dots{})

@code{ulimit} gets the current limit or sets the current and maximum
limit for a particular resource for the calling process according to the
command @var{cmd}.

If you are getting a limit, the command argument is the only argument.
If you are setting a limit, there is a second argument:
@code{long int} @var{limit} which is the value to which you are setting
the limit.

The @var{cmd} values and the operations they specify are:
@table @code

@item UL_GETFSIZE
Get the current limit on the size of a file, in units of 512 bytes.

@item UL_SETFSIZE
Set the current and maximum limit on the size of a file to @var{limit} *
512 bytes.

@end table

There are also some other @var{cmd} values that may do things on some
systems, but they are not supported.

Only the superuser may increase a maximum limit.

When you successfully get a limit, the return value of @code{ulimit} is
that limit, which is never negative.  When you successfully set a limit,
the return value is zero.  When the function fails, the return value is
@code{-1} and @code{errno} is set according to the reason:

@table @code
@item EPERM
A process tried to increase a maximum limit, but is not superuser.
@end table


@end deftypefun
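For comparison with @code{getrlimit}, here is how the historical interface reads the file size limit.  This sketch assumes the command constant is spelled @code{UL_GETFSIZE}, as in POSIX and in glibc's @file{ulimit.h}; the function name is hypothetical.  Because a successful get never returns a negative value, @code{errno} is cleared first so that a @code{-1} result can be recognized as an error.

```c
#include <errno.h>
#include <ulimit.h>

/* Return the current file size limit in 512-byte units, as reported by
   the historical ulimit interface, or -1 on failure.  */
long int
fsize_blocks_via_ulimit (void)
{
  errno = 0;
  long int blocks = ulimit (UL_GETFSIZE);
  if (blocks == -1 && errno != 0)
    return -1;
  return blocks;
}
```

New code should prefer @code{getrlimit} with @code{RLIMIT_FSIZE}, which reports the limit in bytes and represents ``no limit'' explicitly as @code{RLIM_INFINITY}.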

@code{vlimit} and its resource symbols are declared in @file{sys/vlimit.h}.
@pindex sys/vlimit.h

@comment sys/vlimit.h
@comment BSD
@deftypefun int vlimit (int @var{resource}, int @var{limit})

@code{vlimit} sets the current limit for a resource for a process.

@var{resource} identifies the resource:

@table @code
@item LIM_CPU
Maximum CPU time.  Same as @code{RLIMIT_CPU} for @code{setrlimit}.
@item LIM_FSIZE
Maximum file size.  Same as @code{RLIMIT_FSIZE} for @code{setrlimit}.
@item LIM_DATA
Maximum data memory.  Same as @code{RLIMIT_DATA} for @code{setrlimit}.
@item LIM_STACK
Maximum stack size.  Same as @code{RLIMIT_STACK} for @code{setrlimit}.
@item LIM_CORE
Maximum core file size.  Same as @code{RLIMIT_CORE} for @code{setrlimit}.
@item LIM_MAXRSS
Maximum physical memory.  Same as @code{RLIMIT_RSS} for @code{setrlimit}.
@end table

The return value is zero for success, and @code{-1} with @code{errno} set
accordingly for failure:

@table @code
@item EPERM
The process tried to set its current limit beyond its maximum limit.
@end table

@end deftypefun

@node Priority
@section Process CPU Priority And Scheduling
@cindex process priority
@cindex cpu priority
@cindex priority of a process

When multiple processes simultaneously require CPU time, the system's
scheduling policy and process CPU priorities determine which processes
get it.  This section describes how that determination is made and the
GNU C library functions that control it.

It is common to refer to CPU scheduling simply as scheduling and a
process' CPU priority simply as the process' priority, with the CPU
resource being implied.  Bear in mind, though, that CPU time is not the
only resource a process uses or that processes contend for.  In some
cases, it is not even particularly important.  Giving a process a high
``priority'' may have very little effect on how fast a process runs with
respect to other processes.  The priorities discussed in this section
apply only to CPU time.

CPU scheduling is a complex issue and different systems do it in wildly
different ways.  New ideas continually develop and find their way into
the intricacies of the various systems' scheduling algorithms.  This
section discusses the general concepts, some specifics of systems
that commonly use the GNU C library, and some standards.

For simplicity, we talk about CPU contention as if there is only one CPU
in the system.  But all the same principles apply when a system has
multiple CPUs, and knowing that the number of processes that can run at
any one time is equal to the number of CPUs, you can easily extrapolate
the information.

The functions described in this section are all defined by the POSIX.1
and POSIX.1b standards (the @code{sched@dots{}} functions are POSIX.1b).
However, POSIX does not define any semantics for the values that these
functions get and set.  In this chapter, the semantics are based on the
Linux kernel's implementation of the POSIX standard.  As you will see,
the Linux implementation is quite the inverse of what the authors of the
POSIX syntax had in mind.

@menu
* Absolute Priority::               The first tier of priority.  Posix
* Realtime Scheduling::             Scheduling among the process nobility
* Basic Scheduling Functions::      Get/set scheduling policy, priority
* Traditional Scheduling::          Scheduling among the vulgar masses
* CPU Affinity::                    Limiting execution to certain CPUs
@end menu



@node Absolute Priority
@subsection Absolute Priority
@cindex absolute priority
@cindex priority, absolute

Every process has an absolute priority, and it is represented by a number.
The higher the number, the higher the absolute priority.

@cindex realtime CPU scheduling
On systems of the past, and most systems today, all processes have
absolute priority 0 and this section is irrelevant.  In that case,
see @ref{Traditional Scheduling}.  Absolute priorities were invented to
accommodate realtime systems, in which it is vital that certain processes
be able to respond to external events happening in real time, which
means they cannot wait around while some other process that @emph{wants
to}, but doesn't @emph{need to}, run occupies the CPU.

@cindex ready to run
@cindex preemptive scheduling
When two processes are in contention to use the CPU at any instant, the
one with the higher absolute priority always gets it.  This is true even if the
process with the lower priority is already using the CPU (i.e., the
scheduling is preemptive).  Of course, we're only talking about
processes that are running or ``ready to run,'' which means they are
ready to execute instructions right now.  When a process blocks to wait
for something like I/O, its absolute priority is irrelevant.

@cindex runnable process
@strong{Note:}  The term ``runnable'' is a synonym for ``ready to run.''

When two processes are running or ready to run and both have the same
absolute priority, it's more interesting.  In that case, who gets the
CPU is determined by the scheduling policy.  If the processes have
absolute priority 0, the traditional scheduling policy described in
@ref{Traditional Scheduling} applies.  Otherwise, the policies described
in @ref{Realtime Scheduling} apply.

You normally give an absolute priority above 0 only to a process that
can be trusted not to hog the CPU.  Such processes are designed to block
(or terminate) after relatively short CPU runs.

A process begins life with the same absolute priority as its parent
process.  Functions described in @ref{Basic Scheduling Functions} can
change it.

Only a privileged process can change a process' absolute priority to
something other than @code{0}.  Only a privileged process or the
target process' owner can change its absolute priority at all.

POSIX requires absolute priority values used with the realtime
scheduling policies to be consecutive with a range of at least 32.  On
Linux, they are 1 through 99.  The functions
@code{sched_get_priority_max} and @code{sched_get_priority_min} portably
tell you what the range is on a particular system.
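For example, a program that wants the lowest realtime priority for a policy, rather than a hard-coded number such as 1, can ask for the range.  This is a minimal sketch; @code{realtime_priority_range} is a hypothetical helper.

```c
#include <sched.h>

/* Store the lowest and highest absolute priorities accepted for
   POLICY (for example SCHED_FIFO or SCHED_RR) in *MIN and *MAX.
   Returns 0 on success, -1 if either query fails.  */
int
realtime_priority_range (int policy, int *min, int *max)
{
  *min = sched_get_priority_min (policy);
  *max = sched_get_priority_max (policy);
  return (*min == -1 || *max == -1) ? -1 : 0;
}
```

On Linux this would be expected to report 1 through 99 for @code{SCHED_FIFO}, matching the range given above.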


@subsubsection Using Absolute Priority

One thing you must keep in mind when designing real time applications is
that having higher absolute priority than any other process doesn't
guarantee the process can run continuously.  Two things that can wreck a
good CPU run are interrupts and page faults.

Interrupt handlers live in that limbo between processes.  The CPU is
executing instructions, but they aren't part of any process.  An
interrupt will stop even the highest priority process.  So you must
allow for slight delays and make sure that no device in the system has
an interrupt handler that could cause too long a delay between
instructions for your process.

Similarly, a page fault causes what looks like a straightforward
sequence of instructions to take a long time.  The fact that other
processes get to run while the page faults in is of no consequence,
because as soon as the I/O is complete, the high priority process will
kick them out and run again, but the wait for the I/O itself could be a
problem.  To neutralize this threat, use @code{mlock} or
@code{mlockall}.
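A realtime program typically locks its memory once during startup, before entering its time-critical loop.  This is a minimal sketch with a hypothetical function name; @code{MCL_CURRENT} and @code{MCL_FUTURE} are the @file{sys/mman.h} flags that lock the pages mapped now and those mapped later.  Whether the call succeeds depends on privilege and on @code{RLIMIT_MEMLOCK}.

```c
#include <errno.h>
#include <sys/mman.h>

/* Lock all current and future pages of the process into physical
   memory so that later code is not delayed by page faults.  Returns 0
   on success; otherwise returns the errno value from mlockall, which
   is typically EPERM without privilege or ENOMEM when RLIMIT_MEMLOCK
   is too small.  */
int
lock_all_memory (void)
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    return errno;
  return 0;
}
```

A program that locks its memory this way can release the locks again with @code{munlockall} when the time-critical phase is over.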

There are a few ramifications of the absoluteness of this priority on a
single-CPU system that you need to keep in mind when you choose to set a
priority and also when you're working on a program that runs with high
absolute priority.  Consider a process that has higher absolute priority
than any other process in the system and that, due to a bug in its
program, gets into an infinite loop.  It will never cede the CPU.  You
can't run a command to kill it because your command would need to get
the CPU in order to run.  The errant program is in complete control.  It
controls the vertical, it controls the horizontal.

There are two ways to avoid this: 1) keep a shell running somewhere with
a higher absolute priority; 2) keep a controlling terminal attached to
the high priority process group.  All the priority in the world won't
stop an interrupt handler from running and delivering a signal to the
process if you hit Control-C.

Some systems use absolute priority as a means of allocating a fixed
percentage of CPU time to a process.  To do this, a super high priority
privileged process constantly monitors the process' CPU usage and raises
its absolute priority when the process isn't getting its entitled share
and lowers it when the process is exceeding it.

@strong{Note:}  The absolute priority is sometimes called the ``static
priority.''  We don't use that term in this manual because it misses the
most important feature of the absolute priority:  its absoluteness.


@node Realtime Scheduling
@subsection Realtime Scheduling
@cindex realtime scheduling

Whenever two processes with the same absolute priority are ready to run,
the kernel has a decision to make, because only one can run at a time.
If the processes have absolute priority 0, the kernel makes this decision
as described in @ref{Traditional Scheduling}.  Otherwise, the decision
is as described in this section.

If two processes are ready to run but have different absolute priorities,
the decision is much simpler, and is described in @ref{Absolute
Priority}.

Each process has a scheduling policy.  For processes with absolute
priority other than zero, there are two available:

@enumerate
@item
First Come First Served
@item
Round Robin
@end enumerate
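A process can check which policy it is running under with @code{sched_getscheduler}, a POSIX.1b function.  The mapping below is a minimal sketch; the function name and descriptive strings are hypothetical, and @code{SCHED_OTHER} is assumed to be the constant for the traditional policy, as it is in glibc's @file{sched.h}.

```c
#include <sched.h>
#include <sys/types.h>

/* Return a descriptive name for the scheduling policy of process PID
   (0 means the calling process), or NULL if sched_getscheduler fails.
   SCHED_OTHER is the traditional policy for absolute priority 0;
   any other value, such as Linux's SCHED_BATCH, is reported as
   "other".  */
const char *
policy_name (pid_t pid)
{
  switch (sched_getscheduler (pid))
    {
    case SCHED_FIFO:
      return "First Come First Served (SCHED_FIFO)";
    case SCHED_RR:
      return "Round Robin (SCHED_RR)";
    case SCHED_OTHER:
      return "traditional (SCHED_OTHER)";
    case -1:
      return NULL;
    default:
      return "other";
    }
}
```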

The most sensible case is where all the processes with a certain
absolute priority have the same scheduling policy.  We'll discuss that
first.

In Round Robin, processes share the CPU, each one running for a small
quantum of time (``time slice'') and then yielding to another in a
circular fashion.  Of course, only processes that are ready to run and
have the same absolute priority are in this circle.

In First Come First Served, the process that has been waiting the
longest to run gets the CPU, and it keeps it until it voluntarily
relinquishes the CPU, runs out of things to do (blocks), or gets
preempted by a higher priority process.

First Come First Served, along with maximal absolute priority and
careful control of interrupts and page faults, is the one to use when a
process absolutely, positively has to run at full CPU speed or not at
all.

Judicious use of @code{sched_yield} function invocations by processes
with First Come First Served scheduling policy forms a good compromise
between Round Robin and First Come First Served.
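That compromise can be sketched as follows, assuming @code{sched_setscheduler} and @code{sched_get_priority_min} (POSIX.1b functions of the kind covered in @ref{Basic Scheduling Functions}) are available.  @code{work_in_batches} and @code{run_fifo_worker} are hypothetical names, the loop body stands in for real work, and the switch to @code{SCHED_FIFO} only succeeds for a sufficiently privileged process; otherwise the worker simply runs under its existing policy.

```c
#include <sched.h>

/* Process ITEMS units of work, calling sched_yield after every BATCH
   units (BATCH must be greater than 0).  Under SCHED_FIFO, yielding
   moves the caller to the tail of the ready-to-run queue for its
   absolute priority, so other runnable processes at that priority get
   a turn; if there are none, the caller keeps running.  */
static unsigned long
work_in_batches (unsigned long items, unsigned long batch)
{
  volatile unsigned long done = 0;
  for (unsigned long i = 0; i < items; i++)
    {
      done++;                      /* stand-in for real work */
      if ((i + 1) % batch == 0)
        sched_yield ();
    }
  return done;
}

/* Try to switch the caller to SCHED_FIFO at the lowest realtime
   priority, run the worker, then switch back to SCHED_OTHER.  Sets
   *FIFO_OK to 1 if the switch succeeded and 0 otherwise.  Returns the
   number of work units processed.  */
unsigned long
run_fifo_worker (unsigned long items, unsigned long batch, int *fifo_ok)
{
  struct sched_param fifo = { .sched_priority = sched_get_priority_min (SCHED_FIFO) };
  struct sched_param normal = { .sched_priority = 0 };

  *fifo_ok = sched_setscheduler (0, SCHED_FIFO, &fifo) == 0;
  unsigned long done = work_in_batches (items, batch);
  if (*fifo_ok)
    sched_setscheduler (0, SCHED_OTHER, &normal);
  return done;
}
```

The batch size is the tuning knob: a small batch behaves much like Round Robin, while a very large one approaches plain First Come First Served.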

To understand how scheduling works when processes of different scheduling
policies occupy the same absolute priority, you have to know the nitty
gritty details of how processes enter and exit the ready to run list:

Under both policies, the ready to run list is organized as a true queue,
where a process gets pushed onto the tail when it becomes ready to run
and is popped off the head when the scheduler decides to run it.  Note
that ready to run and running are two mutually exclusive states.  When
the scheduler runs a process, that process is no longer ready to run and
no longer in the ready to run list.  When the process stops running, it
may go back to being ready to run again.

The only difference between a process that is assigned the Round Robin
scheduling policy and a process that is assigned First Come First Served
is that in the former case, the process is automatically booted off the
CPU after a certain amount of time.  When that happens, the process goes
back to being ready to run, which means it enters the queue at the tail.
The time quantum we're talking about is small.  Really small.  This is
not your father's timesharing.  For example, with the Linux kernel, the
round robin time slice is a thousand times shorter than its typical
time slice for traditional scheduling.

A process begins life with the same scheduling policy as its parent process.
Functions described in @ref{Basic Scheduling Functions} can change it.

Only a privileged process can set the scheduling policy of a process
that has absolute priority higher than 0.

@node Basic Scheduling Functions
@subsection Basic Scheduling Functions

This section describes functions in the GNU C library for setting the
absolute priority and scheduling policy of a process.

@strong{Portability Note:}  On systems that have the functions in this
section, the macro @code{_POSIX_PRIORITY_SCHEDULING} is defined in
@file{unistd.h}.
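A program can test for these functions at compile time and, if necessary, at run time.  The sketch below assumes the usual POSIX convention for option macros: a positive value means the option is supported, @code{0} means you must ask @code{sysconf} (here with @code{_SC_PRIORITY_SCHEDULING}), and @code{-1} or no definition means it is not available.  The function name is hypothetical.

```c
#include <unistd.h>

/* Return 1 if the POSIX priority scheduling functions appear to be
   available, 0 otherwise.  A positive macro value settles the question
   at compile time; a value of 0 defers it to sysconf at run time.  */
int
have_priority_scheduling (void)
{
#if defined _POSIX_PRIORITY_SCHEDULING && _POSIX_PRIORITY_SCHEDULING > 0
  return 1;
#elif defined _POSIX_PRIORITY_SCHEDULING && _POSIX_PRIORITY_SCHEDULING == 0
  return sysconf (_SC_PRIORITY_SCHEDULING) > 0;
#else
  return 0;
#endif
}
```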