stap - systemtap script translator/driver
stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
stap [ OPTIONS ] - [ ARGUMENTS ]
stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
The stap program is the front-end to the Systemtap tool. It accepts probing instructions (written in a simple scripting language), translates those instructions into C code, compiles this C code, and loads the resulting kernel module into a running Linux kernel to perform the requested system trace/probe functions. You can supply the script in a named file, from standard input, or from the command line. The program runs until it is interrupted by the user, until the script voluntarily invokes the exit() function, or until a sufficient number of soft errors have occurred.
The language, which is described in a later section, is strictly typed, declaration free, procedural, and inspired by awk. It allows source code points or events in the kernel to be associated with handlers, which are subroutines that are executed synchronously. It is somewhat similar conceptually to "breakpoint command lists" in the gdb debugger.
This manual corresponds to version 0.5.14.
Any additional arguments on the command line are passed to the script parser for substitution. See below.
The systemtap script language resembles awk. There are two main outermost constructs: probes and functions. Within these, statements and expressions use C-like operator syntax and precedence.
In addition, script arguments given at the end of the command line may be expanded as literals. Use $1 ... $<NN> to expand an argument as a numeric literal and @1 ... @<NN> to expand it as a string literal. The number of arguments may be accessed through $# (as a numeric literal) or through @# (as a string literal). These may be used in all contexts where literals are accepted, including the preprocessing stage. Referring to an argument number beyond what was actually given is an error.
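As a minimal sketch of argument substitution, the script below assumes a hypothetical file args.stp and hypothetical argument values; only the $N, @N, and $# forms come from the description above.

```systemtap
# args.stp (hypothetical name); run as: stap args.stp 42 hello
probe begin {
  # $1 is pasted in as the numeric literal 42,
  # @2 as the string literal "hello", and $# as the argument count.
  printf("first=%d second=%s count=%d\n", $1, @2, $#)
  exit()
}
```

Running the same script with fewer than two arguments would be rejected, since @2 would then refer to an argument that was not given.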
%( CONDITION %? TRUE-TOKENS %)
%( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
If the first part of the CONDITION is the identifier kernel_vr or kernel_v, it refers to the kernel version number with ("2.6.13-1.322FC3smp") or without ("2.6.13") the release code suffix, respectively. The second part is then one of the six standard numeric comparison operators <, <=, ==, !=, >, and >=, and the third part is a string literal that contains an RPM-style version-release value. The condition is deemed satisfied if the version of the target kernel (as optionally overridden by the -r option) compares to the given version string as the operator specifies. The comparison is performed by the glibc function strverscmp.
If, on the other hand, the first part is the identifier arch to refer to the processor architecture, then the second part is one of the two string comparison operators == or !=, and the third part is a string literal for matching it. This comparison is simple string (in)equality.
Otherwise, the CONDITION is expected to be a comparison between two string literals or two numeric literals. In this case, the arguments are the only variables usable.
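The literal-comparison form can be sketched as follows. The file name mode.stp and the argument value "verbose" are assumptions made for illustration; the script expects at least one argument.

```systemtap
# mode.stp (hypothetical name); run as: stap mode.stp verbose
probe begin {
  # @1 expands to a string literal, so this compares two string literals.
  # Only one of the two log() calls reaches the parser.
  %( @1 == "verbose" %? log("verbose mode enabled") %: log("quiet mode") %)
  exit()
}
```

Because the conditional works on tokens, it can select single statements inside a handler as well as whole probe definitions.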
The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser
tokens (possibly including nested preprocessor conditionals), and are
pasted into the input stream if the condition is true or false. For
example, the following code induces a parse error unless the target
kernel version is newer than 2.6.5:
%( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
probe kernel.function (
  %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
     %( kernel_vr == "2.6.13-1.8273FC3smp" %? "do_page_fault" %: UNSUPPORTED %) %)
) { /* ... */ }

%( arch == "ia64" %?
   probe syscall.vliw = kernel.function("vliw_widget") {}
%)
Scalar variables are implicitly typed as either string or integer.
Associative arrays also have a string or integer value, and a
tuple of strings and/or integers serving as a key. Here are a
few basic expressions.
var1 = 5
var2 = "bar"
array1 [pid()] = "name"                    # single numeric key
array2 ["foo",4,i++] += 5                  # vector of string/num/num keys
if (["hello",5,4] in array2) log ("yes")   # membership test
The translator performs type inference on all identifiers, including array indexes and function parameters. Inconsistent type-related use of identifiers signals an error.
Variables may be declared global, so that they are shared amongst all probes and live as long as the entire systemtap session. There is one namespace for all global variables, regardless of which script file they are found within. A global declaration may be written at the outermost level anywhere, not within a block of code. The following declaration marks a few variables as global. The translator will infer for each its value type, and if it is used as an array, its key types. Optionally, scalar globals may be initialized with a string or number literal.
Arrays are limited in size by the MAXMAPENTRIES variable -- see the SAFETY AND SECURITY section for details. Optionally, global arrays may be declared with a maximum size in brackets, overriding MAXMAPENTRIES for that array only. Note that this doesn't indicate the type of keys for the array, just the size.
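A short sketch of global declarations follows. The variable names and the size 100 are hypothetical; the forms shown (plain global, initialized scalar, size-limited array) are the ones described above.

```systemtap
global total             # scalar; integer type inferred from its use below
global label = "demo"    # scalar initialized with a string literal
global seen[100]         # array limited to 100 entries instead of MAXMAPENTRIES

probe begin {
  total = 3
  seen["a", 1] = total   # key types (string, integer) inferred from this use
  printf("%s: %d\n", label, seen["a", 1])
  exit()
}
```

The bracketed size on seen says nothing about its key or value types; those still come from how the array is used.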
probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
Events are specified in a special syntax called "probe points". There are several varieties of probe points defined by the translator, and tapset scripts may define further ones using aliases. These are listed in the stapprobes(5) manual pages.
The probe handler is interpreted relative to the context of each event. For events associated with kernel code, this context may include variables defined in the source code at that spot. These "target variables" are presented to the script as variables whose names are prefixed with "$". They may be accessed only if the kernel's compiler preserved them despite optimization. This is the same constraint that a debugger user faces when working with optimized code. Some other events have very little context.
New probe points may be defined using "aliases". A probe point alias looks similar to a probe definition, but instead of activating a probe at the given point, it just defines a new probe point name as an alias to an existing one. There are two types of alias, the prologue style and the epilogue style, which are identified by "=" and "+=" respectively.
For a prologue style alias, the statement block that follows the alias definition is implicitly added as a prologue to any probe that refers to the alias. For an epilogue style alias, the statement block that follows the alias definition is implicitly added as an epilogue to any probe that refers to the alias. For example:
probe syscall.read = kernel.function("sys_read") { fildes = $fd }
probe syscall.read += kernel.function("sys_read") { fildes = $fd }
Another probe definition
may use the alias like this:
probe syscall.read { printf("reading fd=%d\n", fildes) }
function thisfn (arg1, arg2) { return arg1 + arg2 }
function thatfn:string (arg1:long, arg2) { return sprint(arg1) . arg2 }
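As a brief usage sketch, the two functions above could be called from a probe handler as shown here. The argument values are arbitrary, and the definitions are repeated so the script is self-contained.

```systemtap
function thisfn (arg1, arg2) { return arg1 + arg2 }
function thatfn:string (arg1:long, arg2) { return sprint(arg1) . arg2 }

probe begin {
  # thisfn has no annotations; its types are inferred as integers from "+".
  # thatfn declares a string result; arg2 is inferred to be a string from ".".
  printf("%d %s\n", thisfn(2, 3), thatfn(7, " samples"))
  exit()
}
```

Passing a string to thisfn, or an integer as the second argument of thatfn, would be reported by the translator as a type mismatch.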
The printf formatting directives are similar to those of C, except
that they are fully type-checked by the translator.
x = sprintf("take %d steps forward, %d steps back\n", 3, 2)
printf("take %d steps forward, %d steps back\n", 3+1, 2*2)
bob = "bob"
alice = "alice"
print(bob)
print("hello")
print(10)
printf("%s phoned %s %.4x times\n", bob, alice . bob, 3456)
printf("%s except after %s\n",
       sprintf("%s before %s", sprint(1), sprint(3)),
       sprint("C"))
The aggregation operator is <<<, and resembles an assignment, or a
C++ output-streaming operation. The left operand specifies a scalar
or array-index lvalue, which must be declared global. The right
operand is a numeric expression. The meaning is intuitive: add the
given number to the pile of numbers over which statistics are
computed. (The specific list of statistics to gather is given
separately, by the extraction functions.)
foo <<< 1
stats[pid()] <<< memsize
The extraction functions are also special. For each appearance of a distinct extraction function operating on a given identifier, the translator arranges to compute a set of statistics that satisfy it. The statistics system is thereby "on-demand". Each execution of an extraction function causes the aggregation to be computed for that moment across all processors.
Here is the set of extractor functions. The first argument of each is the same style of lvalue used on the left hand side of the accumulate operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v) extractor functions compute the number/total/minimum/maximum/average of all accumulated values. The resulting values are all simple integers.
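The accumulate-then-extract pattern might look like the following sketch. It assumes that the probed kernel has a sys_read function with a parameter named count whose debugging information is available, and it uses a foreach loop and a timer probe that are not described in this section. The five-second run time is arbitrary.

```systemtap
global sizes                      # aggregates must be declared global

probe kernel.function("sys_read") {
  sizes[pid()] <<< $count         # $count: assumed target variable of sys_read
}

probe timer.ms(5000) { exit() }   # stop after about five seconds

probe end {
  # Each extractor below is computed on demand for the given array element.
  foreach (p in sizes)
    printf("pid %d: n=%d total=%d min=%d max=%d avg=%d\n", p,
           @count(sizes[p]), @sum(sizes[p]), @min(sizes[p]),
           @max(sizes[p]), @avg(sizes[p]))
}
```

Only processes that issued at least one read during the run appear in the output, since an array element exists only once a value has been accumulated into it.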
Histograms are also available, but are more complicated because they
have a vector rather than scalar value. @hist_linear(v,L,H,W)
represents a linear histogram whose low/high/width parameters are
given by the following three literal numbers. Similarly,
@hist_log(v,N) represents a base-2 logarithmic histogram with the
given number of buckets. N may be omitted, and defaults to 64.
Printing a histogram with the print family of functions renders a
histogram object as a tabular "ASCII art" bar chart.
probe foo { x <<< $value }
probe end {
  printf ("avg %d = sum %d / count %d\n",
          @avg(x), @sum(x), @count(x))
  print (@hist_log(x))
}
The other place where embedded code is permitted is as a function body. In this case, the script language body is replaced entirely by a piece of C code enclosed again between %{ and %} markers. This C code may do anything reasonable and safe. There are a number of undocumented but complex safety constraints on atomicity, concurrency, resource consumption, and run time limits, so this is an advanced technique.
The memory locations set aside for input and output values
are made available to it using a macro THIS.
Here are some examples:
function add_one (val) %{
  THIS->__retvalue = THIS->val + 1;
%}
function add_one_str (val) %{
  strlcpy (THIS->__retvalue, THIS->val, MAXSTRINGLEN);
  strlcat (THIS->__retvalue, "one", MAXSTRINGLEN);
%}
In pass 2, the translator analyzes the input script to resolve symbols and types. References to variables, functions, and probe aliases that are unresolved internally are satisfied by searching through the parsed tapset scripts. If any tapset script is selected because it defines an unresolved symbol, then the entirety of that script is added to the translator's resolution queue. This process iterates until all symbols are resolved and a subset of tapset scripts is selected.
Next, all probe point descriptions are validated against the wide variety supported by the translator. Probe points that refer to code locations ("synchronous probe points") require the appropriate kernel debugging information to be installed. In the associated probe handlers, target-side variables (whose names begin with "$") are found and have their run-time locations decoded.
Next, all probes and functions are analyzed for optimization opportunities, in order to remove variables, expressions, and functions that have no useful value and no side-effect. Embedded-C functions are assumed to have side-effects unless they include the magic string /* pure */. Since this optimization can hide latent code errors such as type mismatches or invalid $target variables, it sometimes may be useful to disable the optimizations with the -u option.
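The effect of the /* pure */ marker can be sketched as below. The function name twice is hypothetical, and the script would need guru mode (stap -g) because it contains embedded C.

```systemtap
# The /* pure */ comment tells the translator that this embedded-C
# function has no side-effects, so unused calls may be removed.
function twice:long (val:long) %{ /* pure */
  THIS->__retvalue = THIS->val * 2;
%}

probe begin {
  twice(21)                  # result unused: a candidate for removal
  printf("%d\n", twice(21))  # result used: the call is kept
  exit()
}
```

Without the marker, the translator would assume that the first call has side-effects and would keep it.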
Finally, all variable, function, parameter, array, and index types are inferred from context (literals and operators). Stopping the translator after pass 2 causes it to list all the probes, functions, and variables, along with all inferred types. Any inconsistent or unresolved types cause an error.
In pass 3, the translator writes C code that represents the actions of all selected script files, and creates a Makefile to build that into a kernel object. These files are placed into a temporary directory. Stopping the translator at this point causes it to print the contents of the C file.
In pass 4, the translator invokes the Linux kernel build system to create the actual kernel object file. This involves running make in the temporary directory, and requires a kernel module build system (headers, config and Makefiles) to be installed in the usual spot /lib/modules/VERSION/build. Stopping the translator after pass 4 is the last chance before running the kernel object. This may be useful if you want to archive the file.
In pass 5, the translator invokes the systemtap auxiliary program staprun for the given kernel object. This program arranges to load the module and then communicates with it, copying trace data from the kernel into temporary files, until the user sends an interrupt signal. Any run-time error encountered by the probe handlers, such as running out of memory, division by zero, or exceeding nesting or runtime limits, results in a soft error indication. Soft errors in excess of MAXERRORS block all subsequent probes and terminate the session. Finally, staprun unloads the module and cleans up.
The translator asserts certain safety constraints. It aims to ensure that no handler routine can run for very long, allocate memory, perform unsafe operations, or unintentionally interfere with the kernel. Use of script global variables is suitably locked to protect against manipulation by concurrent probe handlers. Use of guru mode constructs such as embedded C can violate these constraints, leading to a kernel crash or data corruption.
The resource use limits are set by macros in the generated C code. These may be overridden with the -D flag. A selection of these is as follows:
In case something goes wrong with stap or staprun after a probe has already started running, one may safely kill both user processes, and remove the active probe kernel module with rmmod. Any pending trace messages may be lost.
In addition to the methods outlined above, the generated kernel module also uses overload processing to make sure that probes can't run for too long. If more than STP_OVERLOAD_THRESHOLD cycles (default 500000000) have been spent in all the probes on a single cpu during the last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes have overloaded the system and an exit is triggered.
By default, overload processing is turned on for all modules. If you would like to disable overload processing, define STP_NO_OVERLOAD.