I am pleased to announce that I will be running my
"Concurrent Thinking in C++"
class at CppCon again this year. Here is the course description:
One of the most difficult issues around designing software with multiple
threads of execution is synchronizing data.
Whether you use actors, active objects, futures and continuations or mutable
shared state, every non-trivial system with multiple threads needs to transfer
data between them. This means thinking about which data needs to be processed
by which thread, and ensuring that the right data gets to the right threads in
the right order. It also means thinking about API design to avoid race
conditions.
In this workshop you will encounter a series of scenarios involving
multithreaded code, and be guided through identifying the problem areas and
the ways of handling them.
You will learn techniques for thinking about the scenarios to ease the
analysis, as well as details of the tools we have available in C++ to mitigate
the problems. You will also learn how to use the C++ standard library to help
enforce the requirements of each scenario in code.
C++17 is adding parallel overloads of most of the Standard Library
algorithms. There is a TS for Concurrency in C++ already published, and a TS
for Coroutines in C++ and a second TS for Concurrency in C++ in the works.
What does all this mean for programmers? How are they all related? How do
coroutines help with parallelism?
This session will attempt to answer these questions and more. We will look at
the implementation of parallel algorithms, and how continuations, coroutines
and work-stealing fit together. We will also look at how this meshes with the
Grand Unified Executors Proposal, and how you will be able to take advantage
of all this as an application developer.
My class is on 17th-18th September, and the main conference is running
19th-23rd, with my presentation on 20th September. If you haven't got your
ticket already, head on over
to CppCon Registration to get yours now.
The first one defines C::foo, the second defines A::foo and the third defines B::foo. This is valid because of an entity known as the injected-type-name:
A class-name is inserted into the scope in which it is declared immediately after the class-name is seen. The class-name is also inserted into the scope of the class itself; this is known as the injected-class-name. For purposes of access checking, the injected-class-name is treated as if it were a public member name. […]
Since A is a base class of C and the name A is injected into the scope of A, A is also visible from C.
The injected-class-name exists to ensure that the class is found during name lookup instead of entities in an enclosing scope. It also makes referring to the class name in a template instantiation easier. But since we’re awful people, we can use this perfectly reasonable feature do horribly perverse things like this:
Last time we took a look at the simulated annealing global minimisation algorithm which searches for points at which functions return their least possible values and which drew its inspiration from the metallurgical process of annealing which minimises the energy state of the crystalline structure of metals by first heating and then slowly cooling them.
Now as it happens, physics isn't the only branch of science from which we can draw inspiration for global optimisation algorithms. For example, in biology we have the process of evolution through which the myriad species of life on Earth have become extraordinarily well adapted to their environments. Put very simply this happens because offspring differ slightly from their parents and differences that reduce the chances that they will survive to have offspring of their own are less likely to be passed down through the generations than those that increase those chances.
Noting that extraordinarily well adapted is more or less synonymous with near maximally adapted, it's not unreasonable to suppose that we might exploit a mathematical model of evolution to search for global maxima of functions, being those points at which they return their greatest possible values.
I have submitted the following via the questionnaire:
 
The Oxpens bridge should move upstream to parallel the railway bridge with a path extension besides railway to station. Getting from Thames/railway station to canal towpath and up to Kidlington Airport and Kidlington new housing extension needed. Second bridge, again parallel to rail bridge needed to access Science Park.
 
I have used the Thames towpath for commuting to Osney Mead industrial estate for three months and for work on Cornmarket for three years. I also used the path for three months to commute to Kidlington (Oxford Airport) but it is too arduous and much to my sadness I now cycle on the road, so have trenchant views on the difficulty of cycle commuting in Oxford.
 
I understand that there is a plan to build many new houses in the Kidlington gap, between Summertown and Kidlington.
 
These new commuters would very much appreciate a functioning cycle route into the centre of town.
 
The new Oxford Parkway should be integrated into this cycle route.
 
The problem facing cyclists is not how to get from the towpath to the Centre, this is served by the pipe bridge, the pedestrian crossing down stream and Folley Bridge. What is needed is easy access to the railway station which could be easily achieved by a bridge parallel to the railway bridge and then along railway land to the station itself.
 
Cyclists wishing to get from the Thames to the canal have to continue to Osney, cross the road and then fiddle around the Thames onto the canal path, which is in a very poor state, then up to Kingsbridge and all the way through Kidlington to the Oxford Airport.
 
Finally the Thames path should connect the Science Park, and the planned new housing at Littlemore. This again could be achieved by a cycle bridge parallel to the existing railway bridge down stream from the bypass underpass.
 
I really welcome the proposals but would urge you to consider extending its scope and vision. This could be such a good route and would show that Oxford is a cycling city.
 
best regards Tim Pizey -- Tim Pizey - http://tim.pizey.uk/
We’re finally here at the last post of the series! This time I’ll be giving a high-level overview of some more advanced concepts in debugging: remote debugging, shared library support, expression evaluation, and multi-threaded support. These ideas are more complex to implement, so I won’t walk through how to do so in detail, but I’m happy to answer questions about these concepts if you have any.
Remote debugging is very useful for embedded systems or debugging the effects of environment differences. It also sets a nice divide between the high-level debugger operations and the interaction with the operating system and hardware. In fact, debuggers like GDB and LLDB operate as remote debuggers even when debugging local programs. The general architecture is this:
The debugger is the component which we interact with through the command line. Maybe if you’re using an IDE there’ll be another layer on top which communicates with the debugger through the machine interface. On the target machine (which may be the same as the host) there will be a debug stub, which in theory is a very small wrapper around the OS debug library which carries out all of your low-level debugging tasks like setting breakpoints on addresses. I say “in theory†because stubs are getting larger and larger these days. The LLDB debug stub on my machine is 7.6MB, for example. The debug stub communicates with the debugee process using some OS-specific features (in our case, ptrace), and with the debugger though some remote protocol.
The most common remote protocol for debugging is the GDB remote protocol. This is a text-based packet format for communicating commands and information between the debugger and debug stub. I won’t go into detail about it, but you can read all you could want to know about it here. If you launch LLDB and execute the command log enable gdb-remote packets then you’ll get a trace of all packets sent through the remote protocol. On GDB you can write set remotelogfile <file> to do the same.
As a simple example, here’s the packet to set a breakpoint:
$Z0,400570,1#43
$ marks the start of the packet. Z0 is the command to insert a memory breakpoint. 400570 and 1 are the argumets, where the former is the address to set a breakpoint on and the latter is a target-specific breakpoint kind specifier. Finally, the #43 is a checksum to ensure that there was no data corruption.
The GDB remote protocol is very easy to extend for custom packets, which is very useful for implementing platform- or language-specific functionality.
Shared library and dynamic loading support
The debugger needs to know what shared libraries have been loaded by the debuggee so that it can set breakpoints, get source-level information and symbols, etc. As well as finding libraries which have been dynamically linked against, the debugger must track libraries which are loaded at runtime through dlopen. To facilitate this, the dynamic linker maintains a rendezvous structure. This structure maintains a linked list of shared library descriptors, along with a pointer to a function which is called whenever the linked list is updated. This structure is stored where the .dynamic section of the ELF file is loaded, and is initialized before program execution.
A simple tracing algorithm is this:
The tracer looks up the entry point of the program in the ELF header (or it could use the auxillary vector stored in /proc/<pid>/aux)
The tracer places a breakpoint on the entry point of the program and begins execution.
When the breakpoint is hit, the address of the rendezvous structure is found by looking up the load address of .dynamic in the ELF file.
The rendezvous structure is examined to get the list of currently loaded libraries.
A breakpoint is set on the linker update function.
Whenever the breakpoint is hit, the list is updated.
The tracer infinitely loops, continuing the program and waiting for a signal until the tracee signals that it has exited.
I’ve written a small demonstration of these concepts, which you can find here. I can do a more detailed write up of this in the future if anyone is interested.
Expression evaluation
Expression evaluation is a feature which lets users evaluate expressions in the original source language while debugging their application. For example, in LLDB or GDB you could execute print foo() to call the foo function and print the result.
Depending on how complex the expression is, there are a few different ways of evaluating it. If the expression is a simple identifier, then the debugger can look at the debug information, locate the variable and print out the value, just like we did in the last part of this series. If the expression is a bit more complex, then it may be possible to compile the code to an intermediate representation (IR) and interpret that to get the result. For example, for some expressions LLDB will use Clang to compile the expression to LLVM IR and interpret that. If the expression is even more complex, or requires calling some function, then the code might need to be JITted to the target and executed in the address space of the debuggee. This involves calling mmap to allocate some executable memory, then the compiled code is copied to this block and is executed. LLDB does this by using LLVM’s JIT functionality.
The debugger shown in this series only supports single threaded applications, but to debug most real-world applications, multi-threaded support is highly desirable. The simplest way to support this is to trace thread creation and parse the procfs to get the information you want.
The Linux threading library is called pthreads. When pthread_create is called, the library creates a new thread using the clone syscall, and we can trace this syscall with ptrace (assuming your kernel is older than 2.5.46). To do this, you’ll need to set some ptrace options after attaching to the debuggee:
Now when clone is called, the process will be signaled with our old friend SIGTRAP. For the debugger in this series, you can add a case to handle_sigtrap which can handle the creation of the new thread:
case(SIGTRAP|(PTRACE_EVENT_CLONE<<8)): //get the new thread ID unsignedlongevent_message=0; ptrace(PTRACE_GETEVENTMSG,pid,nullptr,message);
//handle creation //...
Once you’ve got that, you can look in /proc/<pid>/task/ and read the memory maps and suchlike to get all the information you need.
GDB uses libthread_db, which provides a bunch of helper functions so that you don’t need to do all the parsing and processing yourself. Setting up this library is pretty weird and I won’t show how it works here, but you can go and read this tutorial if you’d like to use it.
The most complex part of multithreaded support is modelling the thread state in the debugger, particularly if you want to support non-stop mode or some kind of heterogeneous debugging where you have more than just a CPU involved in your computation.
The end!
Whew! This series took a long time to write, but I learned a lot in the process and I hope it was helpful. Get in touch on Twitter @TartanLlama or in the comments section if you want to chat about debugging or have any questions about the series. If there are any other debugging topics you’d like to see covered then let me know and I might do a bonus post.