Ftrace is one of those useful “kernel debugging” tools which you turn to when other debugging tools fail to reveal the underlying problem. Ftrace is a Linux kernel internals tracing tool that was first included in the 2.6.27 kernel in 2008. The main developer was (and still is) Steven Rostedt who is currently a Red Hat employee with responsiblity for the real-time patches in the Linux kernel.
The name ftrace comes from the term function tracer, which was the original purpose of the tool, but nowadays it can do a lot more than just trace function calls. Over the years, various additional tracers have been added to look at things like context switches, how long interrupts are disabled, how long it takes for high-priority tasks to run after they have been woken up, and more. Ftrace includes a plugable framework that allows new tracers to be added relatively easily. Currently over 700 events throughout the kernel can be traced.
For my examples in this blog post, I am using ftrace on Fedora 20. Linux kernels must be built with tracing enabled. As you can see, Fedora 20 was built with tracing enabled.
# grep TRACER /boot/config-3.14.2-200.fc20.x86_64 # CONFIG_ATH5K_TRACER is not set CONFIG_NOP_TRACER=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_TRACER_MAX_TRACE=y CONFIG_CONTEXT_SWITCH_TRACER=y CONFIG_GENERIC_TRACER=y CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y # CONFIG_IRQSOFF_TRACER is not set CONFIG_SCHED_TRACER=y CONFIG_TRACER_SNAPSHOT=y # CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set CONFIG_STACK_TRACER=y
For some unknown reason, however, not all tracers were enabled. Red Hat Enterprise Linux 6.4 has the same config settings.. By way of contrast, Red Hat MRG provides the following tracers: events wakeup preemptirqsoff preemptoff irqsoff ftrace sched_switch none, once the trace variant of the MRG Realtime kernel is installed and in use.
The main ftrace documentation is available in the Linux kernel source at …/root/Documentation/trace/ftrace.txt. It was last updated for the 3.10 kernel. In the same directory, there are a number of other interesting documents including ftrace-design.txt by Mike Frysinger which focuses on architecture implementation details. The actual tracemoint implementation mechanism is detailed by Mathieu Desnoyers in tracepoints.txt.
A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be “on” (a probe is connected to it) or “off” (no probe is attached). When a tracepoint is “off” it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a tracepoint is “on”, the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site).
The default location for ftrace is /sys/kerenel/debug. You may have to mount the debug filesystem.
# mount -t debugfs nodev /sys/kernel/debug # cat /sys/kernel/debug/tracing/available_tracers blk function_graph wakeup_dl wakeup_rt wakeup function nop # cd /sys/kernel/debug # ls acpi cleancache dynamic_debug hid mce regmap tracing xen bdi clk extfrag ieee80211 pinctrl sched_features usb bluetooth dma_buf frontswap kprobes pkg_temp_thermal sleep_time wakeup_sources boot_params dri gpio lpfc pstate_snb suspend_stats x86 # cd tracing # ls available_events instances set_ftrace_notrace trace_options available_filter_functions kprobe_events set_ftrace_pid trace_pipe available_tracers kprobe_profile set_graph_function trace_stat buffer_size_kb max_graph_depth set_graph_notrace tracing_cpumask buffer_total_size_kb options snapshot tracing_max_latency current_tracer per_cpu stack_max_size tracing_on dyn_ftrace_total_info printk_formats stack_trace tracing_thresh enabled_functions README stack_trace_filter uprobe_events events saved_cmdlines trace uprobe_profile free_buffer set_event trace_clock function_profile_enabled set_ftrace_filter trace_marker
The ftrace framework is one of the many improvements to come out of the Linux kernel realtime effort. Unlike SystemTap, it does not attempt to be a comprehensive, scriptable facility. There is a set of virtual files in a debugfs directory which can be used to enable specific tracers and see the results. The function call tracer after which ftrace is named simply outputs each kernel function called as it happens. Other tracers can look at wakeup latency, events, enabling and disabling interrupts, preemption, task switches, and more.
Kernel function call tracing (the original purpose of ftrace) is enabled as follows:
# echo function > /sys/kernel/debug/tracing/current_tracer # echo 1 > /sys/kernel/debug/tracing/tracing_on
Note that older ftrace documentation examples refer to tracing_enabled rather than tracing_on
Events which are available for tracing are listed in /sys/kernel/debug/tracing/available_events. To enable a particular event, such as sched_wakeup, simply echo it to /sys/kernel/debug/tracing/set_event. For example:
# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
The append (“>>”) operator is required to avoid overwriting existing events.
To disable an event, echo the event name to the set_event file prefixed with an exclamation point:
# echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
To disable all events, echo an empty line to the set_event file:
# echo > /sys/kernel/debug/tracing/set_event
To enable all events, echo ‘*:*’ or ‘*:’ to the set_event file:
# echo *:* > /sys/kernel/debug/tracing/set_event
Events are organized into subsystems, such as ext4, irq, or sched, and a full event name looks like this: <subsystem>:<event>. The subsystem name is optional, but it is displayed in the available_events file. All of the events in a subsystem can be specified via the syntax <subsystem>:*. For example, to enable all irq events:
# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
You can drill down and enable or disable individual events under the events directory:
# pwd /sys/kernel/debug/tracing/events # ls block gpio kmem pagemap scsi vmscan btrfs hda kvm power signal vsyscall cfg80211 hda_intel kvmmmu printk skb workqueue compaction header_event mce random sock writeback drm header_page migrate ras sunrpc xen enable i915 module raw_syscalls swiotlb xhci-hcd exceptions iommu napi rcu syscalls ext4 irq net regmap task filemap irq_vectors nmi rpm timer ftrace jbd2 oom sched udp # cd sched # ls enable sched_process_exit sched_stat_wait filter sched_process_fork sched_stick_numa sched_kthread_stop sched_process_free sched_swap_numa sched_kthread_stop_ret sched_process_wait sched_switch sched_migrate_task sched_stat_blocked sched_wait_task sched_move_numa sched_stat_iowait sched_wakeup sched_pi_setprio sched_stat_runtime sched_wakeup_new sched_process_exec sched_stat_sleep # cd sched_wakeup # ls -ltotal 0 -rw-r--r--. 1 root root 0 May 8 11:23 enable -rw-r--r--. 1 root root 0 May 8 11:23 filter -r--r--r--. 1 root root 0 May 8 11:23 format -r--r--r--. 1 root root 0 May 8 11:23 id -rw-r--r--. 1 root root 0 May 8 11:23 trigger
For example to enable the sched_wakeup event:
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
To disable the sched_wakeup event:
# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
There is lots more to events than I have room to detail in this blog. For example, you can control the output format of events, filter specific events, and invoke specific scriptable actions when events are triggered. Read the ftrace documentation for all the details.
Ftrace uses a custom kernel ring buffer to store event information. The ring buffer can be used in either an overwrite mode or in producer/consumer mode. Producer/consumer mode is where if the producer were to fill up the ring buffer before the consumer could free up anything, the producer will stop writing to the buffer. This will cause the most recent events to be lost. Overwrite mode is where if the producer were to fill up the buffer before the consumer could free up anything, the producer will
overwrite the older data. This will cause loss of the oldest events. The size of the ring buffer is configurable.
# pwd /sys/kernel/debug/tracing # cat buffer_total_size_kb 28 (expanded: 5632) ]# cat buffer_size_kb 7 (expanded: 1408)
A trace marker enables some coordination between what is happening in user space and inside the kernel by providing a way to write into the ftrace kernel ring buffer from user space. This marker will then appear in the trace output to give a location where a specific event occurred. Here is an example of setting a trace marker:
# echo "FINN" > /sys/kernel/debug/tracing/trace_marker
One of the annoying things about ftrace for most first time users of the tool is that it does not provide an easy mechanism to invoke an executable and just trace that executable. Here is a simple shell script which provides that functionality:
#!/bin/bash DEBUGFS=`grep debugfs /proc/mounts | awk '{ print $2; }'` echo $$ > $DEBUGFS/tracing/set_ftrace_pid echo function > $DEBUGFS/tracing/current_tracer echo 1 > $DEBUGFS/tracing/tracing_on exec $* echo 0 > $DEBUGFS/tracing/tracing_on
The function call tracer (function) traces entry into all kernel functions.
Here is what is recorded in /sys/kernel/debug/tracing/trace when the above script is used to invoke ls ( ./script ls):
# tracer: function # # entries-in-buffer/entries-written: 364/27769 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | ls-5080 [003] .... 42387.930906: SyS_rt_sigaction < -system_call_fastpath ls-5080 [003] .... 42387.930907: do_sigaction <-SyS_rt_sigaction ls-5080 [003] .... 42387.930907: _raw_spin_lock_irq <-do_sigaction ls-5080 [003] d... 42387.930907: rm_from_queue_full <-do_sigaction ls-5080 [003] d... 42387.930907: rm_from_queue_full <-do_sigaction ls-5080 [003] .... 42387.930907: __audit_syscall_exit <-sysret_audit ls-5080 [003] .... 42387.930907: path_put <-__audit_syscall_exit ls-5080 [003] .... 42387.930907: dput <-path_put ls-5080 [003] .... 42387.930907: mntput <-path_put ls-5080 [003] .... 42387.930907: unroll_tree_refs <-__audit_syscall_exit ls-5080 [003] .... 42387.930907: kfree <-__audit_syscall_exit ls-5080 [003] .... 42387.930908: __audit_syscall_entry <-auditsys ls-5080 [003] .... 42387.930908: current_kernel_time <-__audit_syscall_entry ls-5080 [003] .... 42387.930908: SyS_execve <-stub_execve ls-5080 [003] .... 42387.930908: getname <-SyS_execve ls-5080 [003] .... 42387.930908: getname_flags <-getname ls-5080 [003] .... 42387.930908: kmem_cache_alloc <-getname_flags ls-5080 [003] .... 42387.930909: _cond_resched <-kmem_cache_alloc ls-5080 [003] .... 42387.930909: do_execve_common.isra.24 <-SyS_execve ls-5080 [003] .... 42387.930909: unshare_files <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930909: unshare_fd <-unshare_files ls-5080 [003] .... 42387.930909: kmem_cache_alloc_trace <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930909: _cond_resched <-kmem_cache_alloc_trace ls-5080 [003] .... 42387.930909: __slab_alloc <-kmem_cache_alloc_trace ls-5080 [003] .... 42387.930910: prepare_bprm_creds <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930910: mutex_lock_interruptible <-prepare_bprm_creds ls-5080 [003] .... 42387.930910: _cond_resched <-mutex_lock_interruptible ls-5080 [003] .... 42387.930910: prepare_exec_creds <-prepare_bprm_creds ls-5080 [003] .... 42387.930910: prepare_creds <-prepare_exec_creds ls-5080 [003] .... 42387.930910: kmem_cache_alloc <-prepare_creds ls-5080 [003] .... 42387.930910: _cond_resched <-kmem_cache_alloc ls-5080 [003] .... 42387.930911: security_prepare_creds <-prepare_creds ls-5080 [003] .... 42387.930911: selinux_cred_prepare <-security_prepare_creds ls-5080 [003] .... 42387.930911: kmemdup <-selinux_cred_prepare ls-5080 [003] .... 42387.930911: __kmalloc_track_caller <-kmemdup ls-5080 [003] .... 42387.930911: kmalloc_slab <-__kmalloc_track_caller ls-5080 [003] .... 42387.930911: _cond_resched <-__kmalloc_track_caller ls-5080 [003] .... 42387.930911: key_put <-prepare_exec_creds ls-5080 [003] .... 42387.930911: key_put <-prepare_exec_creds ls-5080 [003] .... 42387.930911: _raw_spin_lock <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930911: do_open_exec <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930912: do_filp_open <-do_open_exec ls-5080 [003] .... 42387.930912: path_openat <-do_filp_open ls-5080 [003] .... 42387.930912: get_empty_filp <-path_openat ls-5080 [003] .... 42387.930912: kmem_cache_alloc <-get_empty_filp ls-5080 [003] .... 42387.930912: _cond_resched <-kmem_cache_alloc ls-5080 [003] .... 42387.930912: __slab_alloc <-kmem_cache_alloc ls-5080 [003] .... 42387.930912: security_file_alloc <-get_empty_filp ls-5080 [003] .... 42387.930912: selinux_file_alloc_security <-security_file_alloc ls-5080 [003] .... 42387.930912: kmem_cache_alloc_trace <-selinux_file_alloc_security ls-5080 [003] .... 42387.930913: _cond_resched <-kmem_cache_alloc_trace ls-5080 [003] .... 42387.930913: __mutex_init <-get_empty_filp ls-5080 [003] .... 42387.930913: path_init <-path_openat ls-5080 [003] .... 42387.930913: link_path_walk <-path_openat ls-5080 [003] .... 42387.930913: inode_permission <-link_path_walk ls-5080 [003] .... 42387.930913: __inode_permission <-inode_permission ls-5080 [003] .... 42387.930913: btrfs_permission <-__inode_permission ls-5080 [003] .... 42387.930913: generic_permission <-btrfs_permission ls-5080 [003] .... 42387.930913: security_inode_permission <-__inode_permission ls-5080 [003] .... 42387.930913: selinux_inode_permission <-security_inode_permission ls-5080 [003] .... 42387.930913: lookup_fast <-link_path_walk ls-5080 [003] .... 42387.930914: __d_lookup_rcu <-lookup_fast ls-5080 [003] .... 42387.930914: unlazy_walk <-link_path_walk ls-5080 [003] .... 42387.930914: legitimize_mnt <-unlazy_walk ls-5080 [003] .... 42387.930914: _raw_spin_lock <-unlazy_walk ls-5080 [003] .... 42387.930914: path_get <-unlazy_walk ls-5080 [003] .... 42387.930914: mntget <-path_get ls-5080 [003] .... 42387.930914: mntget <-link_path_walk ls-5080 [003] .... 42387.930914: _cond_resched <-link_path_walk ls-5080 [003] .... 42387.930914: touch_atime <-link_path_walk ls-5080 [003] .... 42387.930914: current_fs_time <-touch_atime ls-5080 [003] .... 42387.930914: current_kernel_time <-current_fs_time ls-5080 [003] .... 42387.930914: security_inode_follow_link <-link_path_walk ls-5080 [003] .... 42387.930915: selinux_inode_follow_link <-security_inode_follow_link ls-5080 [003] .... 42387.930915: inode_has_perm.isra.32 <-selinux_inode_follow_link ls-5080 [003] .... 42387.930915: avc_has_perm <-inode_has_perm.isra.32 ls-5080 [003] .... 42387.930915: page_follow_link_light <-link_path_walk ls-5080 [003] .... 42387.930915: page_getlink.isra.28.constprop.37 <-page_follow_link_light ls-5080 [003] .... 42387.930915: read_cache_page <-page_getlink.isra.28.constprop.37 ls-5080 [003] .... 42387.930915: do_read_cache_page <-read_cache_page ls-5080 [003] .... 42387.930915: find_get_page <-do_read_cache_page ls-5080 [003] .... 42387.930915: mark_page_accessed <-do_read_cache_page ls-5080 [003] .... 42387.930915: wait_on_page_read <-read_cache_page ls-5080 [003] .... 42387.930915: _cond_resched <-page_getlink.isra.28.constprop.37 ls-5080 [003] .... 42387.930915: link_path_walk <-link_path_walk ls-5080 [003] .... 42387.930915: inode_permission <-link_path_walk ls-5080 [003] .... 42387.930916: __inode_permission <-inode_permission ls-5080 [003] .... 42387.930916: btrfs_permission <-__inode_permission ls-5080 [003] .... 42387.930916: generic_permission <-btrfs_permission ls-5080 [003] .... 42387.930916: security_inode_permission <-__inode_permission ls-5080 [003] .... 42387.930916: selinux_inode_permission <-security_inode_permission ls-5080 [003] .... 42387.930916: lookup_fast <-link_path_walk ls-5080 [003] .... 42387.930916: __d_lookup <-lookup_fast ls-5080 [003] .... 42387.930916: _raw_spin_lock <-__d_lookup ls-5080 [003] .... 42387.930916: follow_managed <-lookup_fast ls-5080 [003] .... 42387.930916: dput <-link_path_walk ls-5080 [003] .... 42387.930917: inode_permission <-link_path_walk ls-5080 [003] .... 42387.930917: __inode_permission <-inode_permission ls-5080 [003] .... 42387.930917: btrfs_permission <-__inode_permission ls-5080 [003] .... 42387.930917: generic_permission <-btrfs_permission ls-5080 [003] .... 42387.930917: security_inode_permission <-__inode_permission ls-5080 [003] .... 42387.930917: selinux_inode_permission <-security_inode_permission ls-5080 [003] .... 42387.930917: lookup_fast <-link_path_walk ls-5080 [003] .... 42387.930917: __d_lookup <-lookup_fast ls-5080 [003] .... 42387.930917: _raw_spin_lock <-__d_lookup ls-5080 [003] .... 42387.930917: follow_managed <-lookup_fast ls-5080 [003] .... 42387.930917: dput <-link_path_walk ls-5080 [003] .... 42387.930917: page_put_link <-link_path_walk ls-5080 [003] .... 42387.930917: put_page <-page_put_link ls-5080 [003] .... 42387.930918: dput <-link_path_walk ls-5080 [003] .... 42387.930918: _raw_spin_lock <-lockref_put_or_lock ls-5080 [003] .... 42387.930918: btrfs_dentry_delete <-dput ls-5080 [003] .... 42387.930918: mntput <-link_path_walk ls-5080 [003] .... 42387.930918: mntput_no_expire <-mntput ls-5080 [003] .... 42387.930918: inode_permission <-link_path_walk ls-5080 [003] .... 42387.930918: __inode_permission <-inode_permission ls-5080 [003] .... 42387.930918: btrfs_permission <-__inode_permission ls-5080 [003] .... 42387.930918: generic_permission <-btrfs_permission ls-5080 [003] .... 42387.930918: security_inode_permission <-__inode_permission ls-5080 [003] .... 42387.930918: selinux_inode_permission <-security_inode_permission ls-5080 [003] .... 42387.930918: do_last <-path_openat ls-5080 [003] .... 42387.930919: lookup_fast <-do_last ls-5080 [003] .... 42387.930919: __d_lookup <-lookup_fast ls-5080 [003] .... 42387.930919: _raw_spin_lock <-__d_lookup ls-5080 [003] .... 42387.930919: follow_managed <-lookup_fast ls-5080 [003] .... 42387.930919: mntget <-do_last ls-5080 [003] .... 42387.930919: complete_walk <-do_last ls-5080 [003] .... 42387.930919: may_open <-do_last ls-5080 [003] .... 42387.930919: inode_permission <-may_open ls-5080 [003] .... 42387.930919: __inode_permission <-inode_permission ls-5080 [003] .... 42387.930919: btrfs_permission <-__inode_permission ls-5080 [003] .... 42387.930920: generic_permission <-btrfs_permission ls-5080 [003] .... 42387.930920: security_inode_permission <-__inode_permission ls-5080 [003] .... 42387.930920: selinux_inode_permission <-security_inode_permission ls-5080 [003] .... 42387.930920: finish_open <-do_last ls-5080 [003] .... 42387.930920: do_dentry_open <-finish_open ls-5080 [003] .... 42387.930920: path_get <-do_dentry_open ls-5080 [003] .... 42387.930920: mntget <-path_get ls-5080 [003] .... 42387.930920: try_module_get <-do_dentry_open ls-5080 [003] .... 42387.930920: security_file_open <-do_dentry_open ls-5080 [003] .... 42387.930920: selinux_file_open <-security_file_open ls-5080 [003] .... 42387.930921: avc_policy_seqno <-selinux_file_open ls-5080 [003] .... 42387.930921: inode_has_perm.isra.32 <-selinux_file_open ls-5080 [003] .... 42387.930921: avc_has_perm <-inode_has_perm.isra.32 ls-5080 [003] .... 42387.930921: __fsnotify_parent <-security_file_open ls-5080 [003] .... 42387.930921: fsnotify <-security_file_open ls-5080 [003] .... 42387.930921: generic_file_open <-do_dentry_open ls-5080 [003] .... 42387.930921: file_ra_state_init <-do_dentry_open ls-5080 [003] .... 42387.930921: open_check_o_direct <-do_last ls-5080 [003] .... 42387.930922: dput <-do_last ls-5080 [003] .... 42387.930922: mntput <-do_last ls-5080 [003] .... 42387.930922: mntput_no_expire <-mntput ls-5080 [003] .... 42387.930922: terminate_walk <-do_last ls-5080 [003] .... 42387.930922: dput <-terminate_walk ls-5080 [003] .... 42387.930922: mntput <-terminate_walk ls-5080 [003] .... 42387.930922: mntput_no_expire <-mntput ls-5080 [003] .... 42387.930922: dput <-path_openat ls-5080 [003] .... 42387.930922: mntput <-path_openat ls-5080 [003] .... 42387.930922: mntput_no_expire <-mntput ls-5080 [003] .... 42387.930922: __fsnotify_parent <-do_open_exec ls-5080 [003] .... 42387.930922: fsnotify <-do_open_exec ls-5080 [003] .... 42387.930923: sched_exec <-do_execve_common.isra.24 ls-5080 [003] .... 42387.930923: _raw_spin_lock_irqsave <-sched_exec ls-5080 [003] d... 42387.930923: select_task_rq_fair <-sched_exec ls-5080 [003] d... 42387.930923: source_load <-select_task_rq_fair ls-5080 [003] d... 42387.930923: target_load <-select_task_rq_fair ls-5080 [003] d... 42387.930923: target_load <-select_task_rq_fair ls-5080 [003] d... 42387.930924: target_load <-select_task_rq_fair ls-5080 [003] d... 42387.930924: _raw_spin_unlock_irqrestore <-sched_exec ls-5080 [003] .... 42387.930924: stop_one_cpu <-sched_exec ls-5080 [003] .... 42387.930924: cpu_stop_init_done <-stop_one_cpu ls-5080 [003] .... 42387.930924: __init_waitqueue_head <-cpu_stop_init_done ls-5080 [003] .... 42387.930924: cpu_stop_queue_work <-stop_one_cpu ls-5080 [003] .... 42387.930924: _raw_spin_lock_irqsave <-cpu_stop_queue_work ls-5080 [003] d... 42387.930924: wake_up_process <-cpu_stop_queue_work ls-5080 [003] d... 42387.930925: try_to_wake_up <-wake_up_process ls-5080 [003] d... 42387.930925: _raw_spin_lock_irqsave <-try_to_wake_up ls-5080 [003] d... 42387.930925: select_task_rq_stop <-try_to_wake_up ls-5080 [003] d... 42387.930925: _raw_spin_lock <-try_to_wake_up ls-5080 [003] d... 42387.930925: ttwu_do_activate.constprop.95 <-try_to_wake_up ls-5080 [003] d... 42387.930925: activate_task <-ttwu_do_activate.constprop.95 ls-5080 [003] d... 42387.930925: enqueue_task <-activate_task ls-5080 [003] d... 42387.930925: update_rq_clock.part.83 <-enqueue_task ls-5080 [003] d... 42387.930925: enqueue_task_stop <-enqueue_task ls-5080 [003] d... 42387.930926: ttwu_do_wakeup <-ttwu_do_activate.constprop.95 ls-5080 [003] d... 42387.930926: check_preempt_curr <-ttwu_do_wakeup ls-5080 [003] d... 42387.930926: resched_task <-check_preempt_curr ls-5080 [003] dN.. 42387.930926: _raw_spin_unlock_irqrestore <-try_to_wake_up ls-5080 [003] dN.. 42387.930926: _raw_spin_unlock_irqrestore <-cpu_stop_queue_work ls-5080 [003] .N.. 42387.930926: wait_for_completion <-stop_one_cpu ls-5080 [003] .N.. 42387.930926: _cond_resched <-wait_for_completion ls-5080 [003] .N.. 42387.930927: __schedule <-_cond_resched ls-5080 [003] .N.. 42387.930927: rcu_note_context_switch <-__schedule ls-5080 [003] .N.. 42387.930927: _raw_spin_lock_irq <-__schedule ls-5080 [003] dN.. 42387.930927: put_prev_task_fair <-__schedule ls-5080 [003] dN.. 42387.930927: update_curr <-put_prev_task_fair ls-5080 [003] dN.. 42387.930927: update_min_vruntime <-update_curr ls-5080 [003] dN.. 42387.930927: cpuacct_charge <-update_curr ls-5080 [003] dN.. 42387.930927: __enqueue_entity <-put_prev_task_fair ls-5080 [003] dN.. 42387.930928: update_curr <-put_prev_task_fair ls-5080 [003] dN.. 42387.930928: update_min_vruntime <-update_curr ls-5080 [003] dN.. 42387.930928: __enqueue_entity <-put_prev_task_fair ls-5080 [003] dN.. 42387.930928: pick_next_task_stop <-__schedule ##### CPU 0 buffer started #### ls-5080 [000] d... 42387.933572: update_min_vruntime <-update_curr ls-5080 [000] d... 42387.933572: cpuacct_charge <-update_curr ls-5080 [000] d... 42387.933573: update_cfs_rq_blocked_load <-dequeue_entity ls-5080 [000] d... 42387.933573: clear_buddies <-dequeue_entity ls-5080 [000] d... 42387.933573: account_entity_dequeue <-dequeue_entity ls-5080 [000] d... 42387.933573: update_min_vruntime <-dequeue_entity ls-5080 [000] d... 42387.933573: update_cfs_shares <-dequeue_entity ls-5080 [000] d... 42387.933573: update_curr <-update_cfs_shares ls-5080 [000] d... 42387.933573: update_min_vruntime <-update_curr ls-5080 [000] d... 42387.933573: account_entity_dequeue <-update_cfs_shares ls-5080 [000] d... 42387.933573: account_entity_enqueue <-update_cfs_shares ls-5080 [000] d... 42387.933573: dequeue_entity <-dequeue_task_fair ls-5080 [000] d... 42387.933573: update_curr <-dequeue_entity ls-5080 [000] d... 42387.933574: update_cfs_rq_blocked_load <-dequeue_entity ls-5080 [000] d... 42387.933574: clear_buddies <-dequeue_entity ls-5080 [000] d... 42387.933574: account_entity_dequeue <-dequeue_entity ls-5080 [000] d... 42387.933574: update_min_vruntime <-dequeue_entity ls-5080 [000] d... 42387.933574: update_cfs_shares <-dequeue_entity ls-5080 [000] d... 42387.933574: hrtick_update <-dequeue_task_fair ls-5080 [000] d... 42387.933574: put_prev_task_fair <-sched_move_task ls-5080 [000] d... 42387.933574: task_wants_autogroup <-sched_move_task ls-5080 [000] d... 42387.933575: task_move_group_fair <-sched_move_task ls-5080 [000] d... 42387.933575: set_curr_task_fair <-sched_move_task ls-5080 [000] d... 42387.933575: enqueue_task <-sched_move_task ls-5080 [000] d... 42387.933575: update_rq_clock.part.83 <-enqueue_task ls-5080 [000] d... 42387.933575: enqueue_task_fair <-enqueue_task ls-5080 [000] d... 42387.933575: enqueue_entity <-enqueue_task_fair ls-5080 [000] d... 42387.933575: update_curr <-enqueue_entity ls-5080 [000] d... 42387.933575: update_min_vruntime <-update_curr ls-5080 [000] d... 42387.933575: cpuacct_charge <-update_curr ls-5080 [000] d... 42387.933575: update_cfs_rq_blocked_load <-enqueue_entity ls-5080 [000] d... 42387.933575: account_entity_enqueue <-enqueue_entity ls-5080 [000] d... 42387.933575: update_cfs_shares <-enqueue_entity ls-5080 [000] d... 42387.933576: hrtick_update <-enqueue_task_fair ls-5080 [000] d... 42387.933576: _raw_spin_unlock_irqrestore <-sched_move_task ls-5080 [000] .... 42387.933576: smp_call_function_single <-task_function_call ls-5080 [000] .... 42387.933577: __put_css_set <-cgroup_exit ls-5080 [000] .... 42387.933577: module_put <-do_exit ls-5080 [000] .... 42387.933577: proc_exit_connector <-do_exit ls-5080 [000] .... 42387.933578: flush_ptrace_hw_breakpoint <-do_exit ls-5080 [000] .... 42387.933578: unregister_hw_breakpoint <-flush_ptrace_hw_breakpoint ls-5080 [000] .... 42387.933578: unregister_hw_breakpoint <-flush_ptrace_hw_breakpoint ls-5080 [000] .... 42387.933578: unregister_hw_breakpoint <-flush_ptrace_hw_breakpoint ls-5080 [000] .... 42387.933578: unregister_hw_breakpoint <-flush_ptrace_hw_breakpoint ls-5080 [000] .... 42387.933578: _raw_write_lock_irq <-do_exit ls-5080 [000] d... 42387.933578: exit_ptrace <-do_exit ls-5080 [000] d... 42387.933578: task_active_pid_ns <-do_exit ls-5080 [000] .... 42387.933578: _raw_write_lock_irq <-do_exit ls-5080 [000] d... 42387.933578: kill_orphaned_pgrp <-do_exit ls-5080 [000] d... 42387.933579: will_become_orphaned_pgrp <-kill_orphaned_pgrp ls-5080 [000] d... 42387.933579: do_notify_parent <-do_exit ls-5080 [000] d... 42387.933579: task_active_pid_ns <-do_notify_parent ls-5080 [000] d... 42387.933579: __task_pid_nr_ns <-do_notify_parent ls-5080 [000] d... 42387.933579: from_kuid_munged <-do_notify_parent ls-5080 [000] d... 42387.933579: map_id_up <-from_kuid_munged ls-5080 [000] d... 42387.933580: jiffies_to_clock_t <-do_notify_parent ls-5080 [000] d... 42387.933580: jiffies_to_clock_t <-do_notify_parent ls-5080 [000] d... 42387.933580: _raw_spin_lock_irqsave <-do_notify_parent ls-5080 [000] d... 42387.933580: send_signal <-do_notify_parent ls-5080 [000] d... 42387.933580: __send_signal <-send_signal ls-5080 [000] d... 42387.933580: prepare_signal <-__send_signal ls-5080 [000] d... 42387.933581: __sigqueue_alloc <-__send_signal ls-5080 [000] d... 42387.933581: kmem_cache_alloc <-__sigqueue_alloc ls-5080 [000] d... 42387.933581: complete_signal <-__send_signal ls-5080 [000] d... 42387.933581: __wake_up_parent <-do_notify_parent ls-5080 [000] d... 42387.933581: __wake_up_sync_key <-__wake_up_parent ls-5080 [000] d... 42387.933581: _raw_spin_lock_irqsave <-__wake_up_sync_key ls-5080 [000] d... 42387.933581: __wake_up_common <-__wake_up_sync_key ls-5080 [000] d... 42387.933582: child_wait_callback <-__wake_up_common ls-5080 [000] d... 42387.933582: default_wake_function <-child_wait_callback ls-5080 [000] d... 42387.933582: try_to_wake_up <-default_wake_function ls-5080 [000] d... 42387.933582: _raw_spin_lock_irqsave <-try_to_wake_up ls-5080 [000] d... 42387.933582: task_waking_fair <-try_to_wake_up ls-5080 [000] d... 42387.933583: select_task_rq_fair <-try_to_wake_up ls-5080 [000] d... 42387.933583: source_load <-select_task_rq_fair ls-5080 [000] d... 42387.933583: target_load <-select_task_rq_fair ls-5080 [000] d... 42387.933583: effective_load.isra.40 <-select_task_rq_fair ls-5080 [000] d... 42387.933583: effective_load.isra.40 <-select_task_rq_fair ls-5080 [000] d... 42387.933583: idle_cpu <-select_task_rq_fair ls-5080 [000] d... 42387.933583: cpus_share_cache <-select_task_rq_fair ls-5080 [000] d... 42387.933583: idle_cpu <-select_task_rq_fair ls-5080 [000] d... 42387.933583: _raw_spin_lock <-try_to_wake_up ls-5080 [000] d... 42387.933584: ttwu_do_activate.constprop.95 <-try_to_wake_up ls-5080 [000] d... 42387.933584: activate_task <-ttwu_do_activate.constprop.95 ls-5080 [000] d... 42387.933584: enqueue_task <-activate_task ls-5080 [000] d... 42387.933584: update_rq_clock.part.83 <-enqueue_task ls-5080 [000] d... 42387.933584: enqueue_task_fair <-enqueue_task ls-5080 [000] d... 42387.933584: enqueue_entity <-enqueue_task_fair ls-5080 [000] d... 42387.933584: update_curr <-enqueue_entity ls-5080 [000] d... 42387.933585: __compute_runnable_contrib.part.58 <-enqueue_entity ls-5080 [000] d... 42387.933585: update_cfs_rq_blocked_load <-enqueue_entity ls-5080 [000] d... 42387.933585: account_entity_enqueue <-enqueue_entity ls-5080 [000] d... 42387.933585: update_cfs_shares <-enqueue_entity ls-5080 [000] d... 42387.933585: place_entity <-enqueue_entity ls-5080 [000] d... 42387.933585: __enqueue_entity <-enqueue_entity ls-5080 [000] d... 42387.933585: enqueue_entity <-enqueue_task_fair ls-5080 [000] d... 42387.933585: update_curr <-enqueue_entity ls-5080 [000] d... 42387.933586: update_cfs_rq_blocked_load <-enqueue_entity ls-5080 [000] d... 42387.933586: account_entity_enqueue <-enqueue_entity ls-5080 [000] d... 42387.933586: update_cfs_shares <-enqueue_entity ls-5080 [000] d... 42387.933586: place_entity <-enqueue_entity ls-5080 [000] d... 42387.933586: __enqueue_entity <-enqueue_entity ls-5080 [000] d... 42387.933586: hrtick_update <-enqueue_task_fair ls-5080 [000] d... 42387.933586: ttwu_do_wakeup <-ttwu_do_activate.constprop.95 ls-5080 [000] d... 42387.933586: check_preempt_curr <-ttwu_do_wakeup ls-5080 [000] d... 42387.933586: resched_task <-check_preempt_curr ls-5080 [000] d... 42387.933586: _raw_spin_unlock_irqrestore <-try_to_wake_up ls-5080 [000] d... 42387.933587: _raw_spin_unlock_irqrestore <-__wake_up_sync_key ls-5080 [000] d... 42387.933587: _raw_spin_unlock_irqrestore <-do_notify_parent ls-5080 [000] .... 42387.933587: _raw_spin_lock <-do_exit ls-5080 [000] .... 42387.933587: exit_rcu <-do_exit ls-5080 [000] .... 42387.933587: schedule <-do_exit ls-5080 [000] .... 42387.933587: __schedule <-schedule ls-5080 [000] .... 42387.933587: rcu_note_context_switch <-__schedule ls-5080 [000] .... 42387.933587: _raw_spin_lock_irq <-__schedule ls-5080 [000] d... 42387.933587: deactivate_task <-__schedule ls-5080 [000] d... 42387.933588: dequeue_task <-deactivate_task ls-5080 [000] d... 42387.933588: update_rq_clock.part.83 <-dequeue_task ls-5080 [000] d... 42387.933588: dequeue_task_fair <-dequeue_task ls-5080 [000] d... 42387.933588: dequeue_entity <-dequeue_task_fair ls-5080 [000] d... 42387.933588: update_curr <-dequeue_entity ls-5080 [000] d... 42387.933588: update_min_vruntime <-update_curr ls-5080 [000] d... 42387.933588: cpuacct_charge <-update_curr ls-5080 [000] d... 42387.933588: update_cfs_rq_blocked_load <-dequeue_entity ls-5080 [000] d... 42387.933588: clear_buddies <-dequeue_entity ls-5080 [000] d... 42387.933588: account_entity_dequeue <-dequeue_entity ls-5080 [000] d... 42387.933588: update_min_vruntime <-dequeue_entity ls-5080 [000] d... 42387.933588: update_cfs_shares <-dequeue_entity ls-5080 [000] d... 42387.933589: hrtick_update <-dequeue_task_fair ls-5080 [000] d... 42387.933589: idle_balance <-__schedule ls-5080 [000] d... 42387.933589: update_blocked_averages <-idle_balance ls-5080 [000] d... 42387.933589: _raw_spin_lock_irqsave <-update_blocked_averages ls-5080 [000] d... 42387.933589: update_rq_clock <-update_blocked_averages ls-5080 [000] d... 42387.933589: update_rq_clock.part.83 <-update_rq_clock ls-5080 [000] d... 42387.933589: update_cfs_rq_blocked_load <-update_blocked_averages ls-5080 [000] d... 42387.933589: update_cfs_rq_blocked_load <-update_blocked_averages ls-5080 [000] d... 42387.933590: update_cfs_rq_blocked_load <-update_blocked_averages ls-5080 [000] d... 42387.933590: update_cfs_rq_blocked_load <-update_blocked_averages ls-5080 [000] d... 42387.933590: _raw_spin_unlock_irqrestore <-update_blocked_averages ls-5080 [000] d... 42387.933590: load_balance <-idle_balance ls-5080 [000] d... 42387.933590: find_busiest_group <-load_balance ls-5080 [000] d... 42387.933590: update_group_power <-find_busiest_group ls-5080 [000] d... 42387.933590: msecs_to_jiffies <-update_group_power ls-5080 [000] d... 42387.933591: arch_scale_freq_power <-update_group_power ls-5080 [000] d... 42387.933591: target_load <-find_busiest_group ls-5080 [000] d... 42387.933591: idle_cpu <-find_busiest_group ls-5080 [000] d... 42387.933591: source_load <-find_busiest_group ls-5080 [000] d... 42387.933591: idle_cpu <-find_busiest_group ls-5080 [000] d... 42387.933591: source_load <-find_busiest_group ls-5080 [000] d... 42387.933591: idle_cpu <-find_busiest_group ls-5080 [000] d... 42387.933592: source_load <-find_busiest_group ls-5080 [000] d... 42387.933592: idle_cpu <-find_busiest_group ls-5080 [000] d... 42387.933592: msecs_to_jiffies <-idle_balance ls-5080 [000] d... 42387.933592: _raw_spin_lock <-idle_balance ls-5080 [000] d... 42387.933592: put_prev_task_fair <-__schedule ls-5080 [000] d... 42387.933592: pick_next_task_fair <-__schedule ls-5080 [000] d... 42387.933592: pick_next_task_stop <-__schedule ls-5080 [000] d... 42387.933593: pick_next_task_dl <-__schedule ls-5080 [000] d... 42387.933593: pick_next_task_rt <-__schedule ls-5080 [000] d... 42387.933593: pick_next_task_fair <-__schedule ls-5080 [000] d... 42387.933593: pick_next_task_idle <-__schedule
As you can see, quite a lot of output is produced even for a single invocation of ls.
The volume of date outputted by ftrace can be enormous. Consider the following simple Hello World C language program compiled with GCC 4.8.2:
#include <stdio.h> int main() { printf("hello world\n"); }
When traced with the above ftrace script, over 11000 lines of tracing data was produced!
In the next example, the function call tracer is replaced with the function_graph call tracer. This tracer traces entry into and exit from kernel functions and provides the ability to draw a graph of kernal function calls that were invoked.
#!/bin/bash DEBUGFS=`grep debugfs /proc/mounts | awk '{ print $2; }'` echo $$ > $DEBUGFS/tracing/set_ftrace_pid echo function_graph > $DEBUGFS/tracing/current_tracer echo 1 > $DEBUGFS/tracing/tracing_on exec $* echo 0 > $DEBUGFS/tracing/tracing_on
Here is the new output:
# tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 0.038 us | mntput_no_expire(); 0) 0.310 us | } /* mntput */ 0) + 35.884 us | } /* path_openat */ 0) + 36.234 us | } /* do_filp_open */ 0) 0.038 us | __fsnotify_parent(); 0) 0.044 us | fsnotify(); 0) + 37.168 us | } /* do_open_exec */ 0) | sched_exec() { 0) 0.039 us | _raw_spin_lock_irqsave(); 0) | select_task_rq_fair() { 0) 0.043 us | source_load(); 0) 0.044 us | target_load(); 0) 0.044 us | target_load(); 0) 0.038 us | target_load(); 0) 1.990 us | } 0) 0.044 us | _raw_spin_unlock_irqrestore(); 0) | stop_one_cpu() { 0) | cpu_stop_init_done() { 0) 0.036 us | __init_waitqueue_head(); 0) 0.388 us | } 0) | cpu_stop_queue_work() { 0) 0.036 us | _raw_spin_lock_irqsave(); 0) | wake_up_process() { 0) | try_to_wake_up() { 0) 0.094 us | _raw_spin_lock_irqsave(); 0) 0.101 us | select_task_rq_stop(); 0) 0.042 us | _raw_spin_lock(); 0) | ttwu_do_activate.constprop.95() { 0) | activate_task() { 0) | enqueue_task() { 0) 0.066 us | update_rq_clock.part.83(); 0) 0.029 us | enqueue_task_stop(); 0) 0.660 us | } 0) 0.951 us | } 0) | ttwu_do_wakeup() { 0) | check_preempt_curr() { 0) 0.039 us | resched_task(); 0) 0.339 us | } 0) 0.753 us | } 0) 2.242 us | } 0) 0.046 us | _raw_spin_unlock_irqrestore(); 0) 4.537 us | } 0) 5.156 us | } 0) 0.044 us | _raw_spin_unlock_irqrestore(); 0) 6.171 us | } 0) | wait_for_completion() { 0) | _cond_resched() { 0) | __schedule() { 0) 0.049 us | rcu_note_context_switch(); 0) 0.038 us | _raw_spin_lock_irq(); 0) | put_prev_task_fair() { 0) | update_curr() { 0) 0.029 us | update_min_vruntime(); 0) 0.074 us | cpuacct_charge(); 0) 0.658 us | } 0) 0.052 us | __enqueue_entity(); 0) | update_curr() { 0) 0.036 us | update_min_vruntime(); 0) 0.326 us | } 0) 0.039 us | __enqueue_entity(); 0) 2.463 us | } 0) 0.053 us | pick_next_task_stop(); 1) 8.842 us | } /* enqueue_task */ 1) 9.424 us | } /* activate_task */ 1) | ttwu_do_wakeup() { 1) | check_preempt_curr() { 1) 0.152 us | resched_task(); 1) 0.617 us | } 1) 1.093 us | } 1) + 11.337 us | } /* ttwu_do_activate.constprop.95 */ 1) 0.068 us | _raw_spin_unlock_irqrestore(); 1) + 17.778 us | } /* try_to_wake_up */ 1) + 18.213 us | } /* default_wake_function */ 1) + 18.734 us | } /* child_wait_callback */ 1) + 19.313 us | } /* __wake_up_common */ 1) 0.076 us | _raw_spin_unlock_irqrestore(); 1) + 20.689 us | } /* __wake_up_sync_key */ 1) + 21.126 us | } /* __wake_up_parent */ 1) 0.067 us | _raw_spin_unlock_irqrestore(); 1) + 29.656 us | } /* do_notify_parent */ 1) 0.053 us | _raw_spin_lock(); 1) 0.052 us | exit_rcu(); 1) | schedule() { 1) | __schedule() { 1) 0.053 us | rcu_note_context_switch(); 1) 0.051 us | _raw_spin_lock_irq(); 1) | deactivate_task() { 1) | dequeue_task() { 1) 0.103 us | update_rq_clock.part.83(); 1) | dequeue_task_fair() { 1) | dequeue_entity() { 1) | update_curr() { 1) 0.056 us | update_min_vruntime(); 1) 0.058 us | cpuacct_charge(); 1) 0.933 us | } 1) 0.083 us | update_cfs_rq_blocked_load(); 1) 0.049 us | clear_buddies(); 1) 0.096 us | account_entity_dequeue(); 1) 0.072 us | update_min_vruntime(); 1) 0.051 us | update_cfs_shares(); 1) 3.705 us | } 1) 0.049 us | hrtick_update(); 1) 4.626 us | } 1) 5.519 us | } 1) 5.952 us | } 1) | idle_balance() { 1) | update_blocked_averages() { 1) 0.055 us | _raw_spin_lock_irqsave(); 1) | update_rq_clock() { 1) 0.114 us | update_rq_clock.part.83(); 1) 0.534 us | } 1) 0.089 us | update_cfs_rq_blocked_load(); 1) 0.114 us | update_cfs_rq_blocked_load(); 1) 0.100 us | update_cfs_rq_blocked_load(); 1) 0.199 us | update_cfs_rq_blocked_load(); 1) 0.080 us | update_cfs_rq_blocked_load(); 1) 0.063 us | _raw_spin_unlock_irqrestore(); 1) 5.311 us | } 1) | load_balance() { 1) | find_busiest_group() { 1) | update_group_power() { 1) 0.051 us | msecs_to_jiffies(); 1) 0.052 us | arch_scale_freq_power(); 1) 1.098 us | } 1) 0.056 us | target_load(); 1) 0.052 us | idle_cpu(); 1) 0.059 us | source_load(); 1) 0.102 us | idle_cpu(); 1) 0.068 us | source_load(); 1) 0.086 us | idle_cpu(); 1) 0.053 us | source_load(); 1) 0.058 us | idle_cpu(); 1) 6.578 us | } 1) 7.267 us | } 1) 0.054 us | msecs_to_jiffies(); 1) 0.051 us | _raw_spin_lock(); 1) + 14.520 us | } 1) 0.094 us | put_prev_task_fair(); 1) 0.077 us | pick_next_task_fair(); 1) 0.073 us | pick_next_task_stop(); 1) 0.094 us | pick_next_task_dl(); 1) 0.136 us | pick_next_task_rt(); 1) 0.073 us | pick_next_task_fair(); 1) 0.051 us | pick_next_task_idle();
To simplify the use of ftrace, Rostedt has written two tools – trace-cmd and kernelshark. These are available as standard repo RPMs in Fedora 20. Trace-cmd is a binary command line tool which acts as a user interface to ftrace whereas Kernelshark is a graphical consumer of trace-cmd output.
# trace-cmd record -e sched ls .... # ls -al trace.dat -rw-r--r--. 1 root root 4136960 May 8 16:44 trace.dat # wc -l trace.dat 94909 trace.dat
By default, output is written to a file called trace.dat in the current directory.
Here I show kernelsmark using the above trace.dat file to display the collected tracing data:
As you can see below, this data can be filtered in numerous ways to reduce screen clutter and eliminate extraneous output:
You can read a good tutorial on kernelshark here. It takes a bit of effort to understand either the raw output of ftrace or what is displayed using kernelshark but the effort is worth it when you are dealing with difficult kernel timing or locking issues.
Well, that is all for this particular blog post. Note that there is far more to ftrace than what I have managed to include in this blog. It is an incredibly powerful tool which most Linux system administrators and developers seem to be unaware of.