Python 3 – large numbers of tasks with limited concurrency

Andy Balaam from Andy Balaam's Blog

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I am interested in running large numbers of tasks in parallel, so I need something like asyncio.as_completed, but taking an iterable instead of a list, and with a limited number of tasks running concurrently. First, let’s try to build something pretty much equivalent to asyncio.as_completed. Here is my attempt, but I’d welcome feedback from readers who know better:

# Note this is not a coroutine - it returns
# an iterator - but it crucially depends on
# work being done inside the coroutines it
# yields - those coroutines empty out the
# list of futures it holds, and it will not
# end until that list is empty.
def my_as_completed(coros):

    # Start all the tasks
    futures = [asyncio.ensure_future(c) for c in coros]

    # A coroutine that waits for one of the
    # futures to finish and then returns
    # its result.
    async def first_to_finish():

        # Wait forever - we could add a
        # timeout here instead.
        while True:

            # Give up control to the scheduler
            # - otherwise we will spin here
            # forever!
            await asyncio.sleep(0)

            # Return anything that has finished
            for f in futures:
                if f.done():
                    futures.remove(f)
                    return f.result()

    # Keep yielding a waiting coroutine
    # until all the futures have finished.
    while len(futures) > 0:
        yield first_to_finish()

The above can be substituted for asyncio.as_completed in the code that uses it in the first article, and it seems to work. It also makes a reasonable amount of sense to me, so it may be correct, but I’d welcome comments and corrections.

my_as_completed above accepts an iterable and returns a generator producing results, but inside it starts all tasks concurrently, and stores all the futures in a list. To handle bigger lists we will need to do better, by limiting the number of running tasks to a sensible number.

Let’s start with a test program:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1.0 / number)
    print("Finishing %d" % number)
    return str(number)

async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)

coros = [mycoro(i) for i in range(1, 101)]

loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This uses asyncio.as_completed to run 100 tasks and, because I adjusted the asyncio.sleep command to wait longer for earlier tasks, it prints something like this:

$ time python3 python-async.py
Starting 47
Starting 93
Starting 48
...
Finishing 93
Finishing 94
Finishing 95
...
Result 93
Result 94
Result 95
...
Finishing 46
Finishing 45
Finishing 42
...
Finishing 2
Result 2
Finishing 1
Result 1

real    0m1.590s
user    0m0.600s
sys 0m0.072s

So all 100 tasks were completed in 1.5 seconds, indicating that they really were run in parallel, but all 100 were allowed to run at the same time, with no limit.

We can adjust the test program to run using our customised my_as_completed function, and pass in an iterable of coroutines instead of a list by changing the last part of the program to look like this:

async def print_when_done(tasks):
    for res in my_as_completed(tasks):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

But we get similar output to last time, with all tasks running concurrently.

To limit the number of concurrent tasks, we limit the size of the futures list, and add more as needed:

from itertools import islice
def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

We start limit tasks at first, and whenever one ends, we ask for the next coroutine in coros and set it running. This keeps the number of running tasks at or below limit until we start running out of input coroutines (when next throws and we don’t add anything to futures), then futures starts emptying until we eventually stop yielding coroutine objects.

I thought this function might be useful to others, so I started a little repo over here and added it: asyncioplus/limited_as_completed.py. Please provide merge requests and log issues to improve it – maybe it should be part of standard Python?

When we run the same example program, but call limited_as_completed instead of the other versions:

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, 10):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

We see output like this:

$ time python3 python-async.py
Starting 1
Starting 2
...
Starting 9
Starting 10
Finishing 10
Result 10
Starting 11
...
Finishing 100
Result 100
Finishing 1
Result 1

real	0m1.535s
user	0m1.436s
sys	0m0.084s

So we can see that the tasks are still running concurrently, but this time the number of concurrent tasks is limited to 10.

See also

To achieve a similar result using semaphores, see Python asyncio.semaphore in async-await function and Making 1 million requests with python-aiohttp.

It feels like limited_as_completed is more re-usable as an approach but I’d love to hear others’ thoughts on this. E.g. could/should I use a semaphore to implement limited_as_completed instead of manually holding a queue?

Basic ideas of Python 3 asyncio concurrency

Andy Balaam from Andy Balaam's Blog

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

Python 3’s asyncio module and the async and await keywords combine to allow us to do cooperative concurrent programming, where a code path voluntarily yields control to a scheduler, trusting that it will get control back when some resource has become available (or just when the scheduler feels like it). This way of programming can be very confusing, and has been popularised by Twisted in the Python world, and nodejs (among others) in other worlds.

I have been trying to get my head around the basic ideas as they surface in Python 3’s model. Below are some definitions and explanations that have been useful to me as I tried to grasp how it all works.

Futures and coroutines are both things that you can wait for.

You can make a coroutine by declaring it with async def:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1)
    print("Finishing %d" % number)
    return str(number)

Almost always, a coroutine will await something such as some blocking IO. (Above we just sleep for a second.) When we await, we actually yield control to the scheduler so it can do other work and wake us up later, when something interesting has happened.

You can make a future out of a coroutine, but often you don’t need to. Bear in mind that if you do want to make a future, you should use ensure_future, but this actually runs what you pass to it – it doesn’t just create a future:

myfuture1 = asyncio.ensure_future(mycoro(1))
# Runs mycoro!

But, to get its result, you must wait for it – it is only scheduled in the background:

# Assume mycoro is defined as above
myfuture1 = asyncio.ensure_future(mycoro(1))
# We end the program without waiting for the future to finish

So the above fails like this:

$ python3 ./python-async.py
Task was destroyed but it is pending!
task: <Task pending coro=<mycoro() running at ./python-async:10>>
sys:1: RuntimeWarning: coroutine 'mycoro' was never awaited

The right way to block waiting for a future outside of a coroutine is to ask the event loop to do it:

# Keep on assuming mycoro is defined as above for all the examples
myfuture1 = asyncio.ensure_future(mycoro(1))
loop = asyncio.get_event_loop()
loop.run_until_complete(myfuture1)
loop.close()

Now this works properly (although we’re not yet getting any benefit from being asynchronous):

$ python3 python-async.py
Starting 1
Finishing 1

To run several things concurrently, we make a future that is the combination of several other futures. asyncio can make a future like that out of coroutines using asyncio.gather:

several_futures = asyncio.gather(
    mycoro(1), mycoro(2), mycoro(3))
loop = asyncio.get_event_loop()
print(loop.run_until_complete(several_futures))
loop.close()

The three coroutines all run at the same time, so this only takes about 1 second to run, even though we are running 3 tasks, each of which takes 1 second:

$ python3 python-async.py
Starting 3
Starting 1
Starting 2
Finishing 3
Finishing 1
Finishing 2
['1', '2', '3']

asyncio.gather won’t necessarily run your coroutines in order, but it will return a list of results in the same order as its input.

Notice also that run_until_complete returns the result of the future created by gather – a list of all the results from the individual coroutines.

To do the next bit we need to know how to call a coroutine from a coroutine. As we’ve already seen, just calling a coroutine in the normal Python way doesn’t run it, but gives you back a “coroutine object”. To actually run the code, we need to wait for it. When we want to block everything until we have a result, we can use something like run_until_complete but in an async context we want to yield control to the scheduler and let it give us back control when the coroutine has finished. We do that by using await:

import asyncio
async def f2():
    print("start f2")
    await asyncio.sleep(1)
    print("stop f2")
async def f1():
    print("start f1")
    await f2()
    print("stop f1")
loop = asyncio.get_event_loop()
loop.run_until_complete(f1())
loop.close()

This prints:

$ python3 python-async.py
start f1
start f2
stop f2
stop f1

Now we know how to call a coroutine from inside a coroutine, we can continue.

We have seen that asyncio.gather takes in some futures/coroutines and returns a future that collects their results (in order).

If, instead, you want to get results as soon as they are available, you need to write a second coroutine that deals with each result by looping through the results of asyncio.as_completed and awaiting each one.

# Keep on assuming mycoro is defined as at the top
async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)
coros = [mycoro(1), mycoro(2), mycoro(3)]
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This prints:

$ python3 python-async.py
Starting 1
Starting 3
Starting 2
Finishing 3
Result 3
Finishing 2
Result 2
Finishing 1
Result 1

Notice that task 3 finishes first and its result is printed, even though tasks 1 and 2 are still running.

asyncio.as_completed returns an iterable sequence of futures, each of which must be awaited, so it must run inside a coroutine, which must be waited for too.

The argument to asyncio.as_completed has to be a list of coroutines or futures, not an iterable, so you can’t use it with a very large list of items that won’t fit in memory.

Side note: if we want to work with very large lists, asyncio.wait won’t help us here – it also takes a list of futures and waits for all of them to complete (like gather), or, with other arguments, for one of them to complete or one of them to fail. It then returns two sets of futures: done and not-done. Each of these must be awaited to get their results, so:

asyncio.gather

# is roughly equivalent to:

async def mygather(*args):
    ret = []
    for r in (await asyncio.wait(args))[0]:
        ret.append(await r)
    return ret

I am interested in running very large numbers of tasks with limited concurrency – see the next article for how I managed it.

Are Refactoring Tools Less Effective Overall?

Chris Oldwood from The OldWood Thing

Prior to the addition of automatic refactoring tools to modern IDEs refactoring was essentially a manual affair. You would make a code change, hit build, and then fix all the compiler errors (at least for statically typed languages). This technique is commonly known as “leaning on the compiler”. Naturally the operation could be fraught with danger if you were far too ambitious about the change, but knowing when you could lean on the compiler was part of the art of refactoring safely back then.

A Hypothesis

Having lived through both eras (manual and automatic) and paired with developers far more skilled with the automatic approach I’ve come up with a totally non-scientific hypothesis that suggests automatic refactoring tools are actually less effective than the manual approach, overall.

I guess the basis of this hypothesis pretty much hinges on what I mean by “effective”. Here I’m suggesting that automatic tools help you easily refactor to a local minima but not to a global minima [1]; consequently the codebase as a whole ends up in a less coherent state.

Shallow vs Deep Refactoring

The goal of an automatic refactoring tool appears to be to not break your code – it will only allow you to use it to perform a simple refactoring that can be done safely, i.e. if the tool can’t fix up all the code it can see [2] it won’t allow you to do it in the first place. The consequence of this is that the tool constantly limits you to taking very small steps. Watching someone refactor with a tool can sometimes seem tortuous as they may need to use so many little refactoring steps to get the code into the desired state because you cannot make the leaps you want in one go unless you switch to manual mode.

This by itself isn’t a bad thing, after all making a safe change is clearly A Good Thing. No, where I see the problem is that by fixing up all the call sites automatically you don’t get to see the wider effects of the refactoring you’re attempting.

For example the reason you’d choose to rename a class or method is because the existing one is no longer appropriate. This is probably because you’re learned something new about the problem domain. However that class or method does not exist in a vacuum, it has dependencies in the guise of variable names and related types. It’s entirely likely that some of these may now be inappropriate too, however you won’t easily see them because the tool has likely hidden them from you.

Hence one of the “benefits” of the old manual refactoring approach was that as you visited each broken call site you got to reflect on your change in the context of where it’s used. This often led to further refactorings as you began to comprehend the full nature of what you had just discovered.

Blue or Red Pill?

Of course what I’ve just described could easily be interpreted as the kind of “black hole” that many, myself included, would see as an unbounded unit of work. It’s one of those nasty rabbit holes where you enter and, before you know it, you’re burrowing close to the Earth’s core and have edited nearly every file in the entire workspace.

Yes, like any change, it takes discipline to stick to the scope of the original problem. Just because you keep unearthing more and more code that no longer appears to fit the new model it does not mean you have to tackle it right now. Noticing the disparity is the first step towards fixing it.

Commit Review

It’s not entirely true that you won’t see the entire outcome of the refactoring – at the very least the impact will be visible when you review the complete change before committing. (For a fairly comprehensive list of the things I go through at the point I commit see my C Vu article “Commit Checklist”.)

This assumes of course that you do a thorough review of your commits before pushing them. However by this point, just as writing tests after the fact are considerably less attractive, so is finishing off any refactoring; perhaps even more so because the code is not broken per-se, it just might not be the best way of representing the solution.

It’s all too easy to justify the reasons why it’s okay to go ahead and push the change as-is because there are more important things to do. Even if you think you’re aware of technical debt it often takes a fresh pair of eyes to see how you’re living in a codebase riddled with inconsistencies that make it hard to see it’s true structure. One is then never quite sure without reviewing the commit logs what is the legacy and what is the new direction.

Blinded by Tools

Clearly this is not the fault of the tool or their vendors. What they offer now is far more favourable than not having them at all. However once again we need to be reminded that we should not be slaves to our tools but that we are the masters. This is a common theme which is regularly echoed in the software development community and something I myself tackled in the past with “Don’t Let Your Tools Pwn You”.

The Boy Scout Rule (popularised by Uncle Bob) says that we should always leave the camp site cleaner than we found it. While picking up a handful of somebody else’s rubbish and putting it in the bin might meet the goal in a literal sense, it’s no good if the site is acquiring rubbish faster than it’s being collected.

Refactoring is a technique for improving the quality of a software design in a piecewise fashion; just be careful you don’t spend so long on your hands and knees cleaning small areas that you fail to spot the resulting detritus building up around you.

 

[1] I wasn’t sure whether to say minima or maxima but I felt that refactoring was about lowering entropy in some way so went with the reduction metaphor.

[2] Clearly there are limits around published APIs which it just has to ignore.

Excel-style DDE Requests

Chris Oldwood from The OldWood Thing

Despite being over 2 decades old Microsoft’s Dynamic Data Exchange (DDE) in Windows still seems to be in use for Windows IPC by a not insignificant number of companies. At least, if the frequency of DDE questions in my inbox is anything to go by [1][2].

Earlier this year I got a question from someone who was trying to use my DDE Command tool (a command line tool for querying DDE servers) to get data out of the MetaTrader 4 platform. Finance is the area I first came across DDE in anger and it still seems to be a popular choice there even to this day.

Curious Behaviour

The problem was that when they used the ddecmdrequest” verb to send an XTYP_REQUEST message to the MetaTrader 4 DDE Server (MT4) for a symbol they always got an immediate result of “N/A”. As a workaround they tried using the “advise” verb, which sends an XTYP_ADVSTART, to listen for updates for a short period instead. This worked for symbols which changed frequently but missed those that didn’t change during the interval. Plus this was a dirty hack as they had to find a way to send a CTRL+C to my tool to stop it after this short interval.

Clearly the MetaTrader DDE server couldn’t be this broken, and the proof was that it worked fine with Microsoft Excel – the other stalwart of the finance industry. Hence the question posed to me was why Excel appeared to work, but sending a request from my tool didn’t, i.e. was there a bug in my tool?

Reproducing the Problem

Given the popularity of the MetaTrader platform and Microsoft Excel the application of Occam’s Razor would suggest a bug in my tool was clearly the most likely answer, so I investigated…

Luckily MetaTrader 4 is a free download and they will even give you a demo account to play with which is super welcome for people like me who only want to use the platform to fix interop problems in their own tools and don’t actually want to use it to trade.

I quickly reproduced the problem by sending a DDE request for a common symbol:

> DDECmd.exe request -s MT4 -t QUOTE -i COPPER
N\A

And then I used the DDE advise command to see it working for background updates:

> DDECmd.exe advise -s MT4 -t QUOTE -i COPPER
2.5525 2.5590
2.5520 2.5590
. . . 

I also tried it in Excel too to see that it was successfully managing to request the current value, even for slow ticking symbols.

How Excel Requests Data Via DDE

My DDE Command tool has a nice feature where it can also act as a DDE server and logs the different requests sent to it. This was originally added by me to help diagnose problems in my own DDE client code but it’s also been useful to see how other DDE clients behave.

As you can see below, when Excel opens a DDE link (=TEST|TEST!X) it actually sends a number of different XTYP_ADVSTART messages as it tries to find the highest fidelity format to receive the data in:

> DDECmd.exe listen –s TEST –t TEST
XTYP_CONNECT: 'TEST', 'TEST'
XTYP_CONNECT_CONFIRM: 'TEST', 'TEST'
XTYP_ADVSTART: 'T…', 'T…', 'StdDocumentName', '49157'
XTYP_ADVSTART: 'TEST', 'TEST', 'X', '50018'
. . .
XTYP_REQUEST: 'TEST', 'TEST', 'X', '50018'

After it manages to set-up the initial advise loop it then goes on to send a one-off XTYP_REQUEST to retrieve the initial value. So, apart from the funky data formats it asks for, there is nothing unusual about the DDE request Excel seems to make.

Advise Before Request

And then it dawned on me, what if the MetaTrader DDE server required an advise loop to be established on a symbol before you’re allowed to request it? Sure enough, I hacked a bit of code into my request command to start an advise loop first and the subsequent DDE request succeeded.

I don’t know if this is a bug in the MetaTrader 4 DDE server or the intended behaviour. I suspect the fact that it works with Excel covers the vast majority of users so maybe it’s never been a priority to support one-off data requests. The various other financial DDE servers I coded against circa 2000 never exhibited this kind of requirement – you could make one-off requests for data with a standalone XTYP_REQUEST message.

The New Fetch Command

The original intent of my DDE Command tool was to provide a tool that allows each XTYP_* message to be sent to a DDE server in isolation, mostly for testing purposes. As such the tools’ verbs pretty much have a one-to-one correspondence with the DDE messages you might send yourself.

To allow people to use my tool against the MetaTrader 4 platform to snapshot data would therefore mean making some kind of small change. I did consider adding various special switches to the existing request and advise verbs, either to force an advise first or to force a request if no immediate update was received but that seemed to go against the ethos a bit.

In the end I decided to add a new verb called “fetch” which acts just like “request”, but starts an advise loop for every item first, then sends a request message for the latest value, thereby directly mimicking Excel.

> DDECmd.exe fetch -s MT4 -t QUOTE -i COPPER -i SILVER
COPPER|2.6075 2.6145
SILVER|16.771 16.821

Hey presto it now works!

This feature was released in DDE Command v1.6.

 

[1] This is a bit of artistic licence :o), they are not a daily occurrence but once every couple of months wouldn’t be far off. So yes, “DDE Is Still Alive & Kicking”.

[2] Most recently it seems quite a few people are beginning to discover that Microsoft dropped NetDDE support way back in Windows Vista.

C++ iterator wrapping a stream not 1-1

Andy Balaam from Andy Balaam&#039;s Blog

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

Sometimes we want to write an iterator that consumes items from some underlying iterator but produces its own items slower than the items it consumes, like this:

ColonSep items("aa:foo::x");
// Prints "aa, foo, , x"
for(auto s : items)
{
    std::cout << s << ", ";
}

When we pass a 9-character string (i.e. an iterator that yields 9 items) to ColonSep, above, we only repeat 4 times in our loop, because ColonSep provides an iterable range that yields one value for each whole word it finds in the underlying iterator of 9 characters.

To do something like this I'd recommend consuming the items of the underlying iterator early, so it is ready when requested with operator*. We also need our iterator class to hold on to the end of the underlying iterator as well as the current position.

First we need a small container to hold the next item we will provide:

struct maybestring
{
    std::string value_;
    bool at_end_;

    explicit maybestring(const std::string value)
    : value_(value)
    , at_end_(false)
    {}

    explicit maybestring()
    : value_("--error-past-end--")
    , at_end_(true)
    {}
};

A maybestring either holds the next item we will provide, or at_end_ is true, meaning we have reached the end of the underlying iterator and we will report that we are at the end ourself when asked.

Like the simpler iterators we have looked at, we still need a little container to return from the postincrement operator:

class stringholder
{
    const std::string value_;
public:
    stringholder(const std::string value) : value_(value) {}
    std::string operator*() const { return value_; }
};

Now we are ready to write our iterator class, which always has the next value ready in its next_ member, and holds on to the current and end positions of the underlying iterator in wrapped_ and wrapped_end_:

class myit
{
private:
    typedef std::string::const_iterator wrapped_t;
    wrapped_t wrapped_;
    wrapped_t wrapped_end_;
    maybestring next_;

The constructor holds on the underlying iterator pointers, and immediately fills next_ with the next value by calling next_item passing in true to indicate that this is the first item:

public:
    myit(wrapped_t wrapped, wrapped_t wrapped_end)
    : wrapped_(wrapped)
    , wrapped_end_(wrapped_end)
    , next_(next_item(true))
    {
    }

    // Previously provided by std::iterator
    typedef int                     value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef int*                    pointer;
    typedef int&                    reference;
    typedef std::input_iterator_tag iterator_category;

next_item looks like this:

private:
    maybestring next_item(bool first_time)
    {
        if (wrapped_ == wrapped_end_)
        {
            return maybestring();  // We are at the end
        }
        else
        {
            if (!first_time)
            {
                ++wrapped_;
            }
            return read_item();
        }
    }

next_item recognises whether we've reached the end of the underlying iterator and saves the empty maybstring if so. Otherwise, it skips forward once (unless we are on the first element) and then calls read_item:

    maybestring read_item()
    {
        std::string ret = "";
        for (; wrapped_ != wrapped_end_; ++wrapped_)
        {
            char c = *wrapped_;
            if (c == ':')
            {
                break;
            }
            ret += c;
        }
        return maybestring(ret);
    }

read_item implements the real logic of looping through the underlying iterator and combining those values together to create the next item to provide.

The hard part of the iterator class is done, leaving only the more normal functions we must provide:

public:
    value_type operator*() const
    {
        assert(!next_.at_end_);
        return next_.value_;
    }

    bool operator==(const myit& other) const
    {
        // We only care about whether we are at the end
        return next_.at_end_ == other.next_.at_end_;
    }

    bool operator!=(const myit& other) const { return !(*this == other); }

    stringholder operator++(int)
    {
        assert(!next_.at_end_);
        stringholder ret(next_.value_);
        next_ = next_item(false);
        return ret;
    }

    myit& operator++()
    {
        assert(!next_.at_end_);
        next_ = next_item(false);
        return *this;
    }
}

Note that operator== is only concerned with whether or not we are an end iterator or not. Nothing else matters for providing correct iteration.

Our final bit of bookkeeping is the range class that allows our new iterator to be used in a for loop:

class ColonSep
{
private:
    const std::string str_;
public:
    ColonSep(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_), std::end(str_)); }
    myit end()   { return myit(std::end(str_),   std::end(str_)); }
};

A lot of the code above is needed for all code that does this kind of job. Next time we'll look at how to use templates to make it useable in the general case.

My Dislike of GOPATH

Chris Oldwood from The OldWood Thing

[This post was written in response to a tweet from Matt Aimonetti which asked “did you ever try #golang? If not, can you tell me why? If yes, what was the first blocker/annoyance you encountered?”]

I’m a dabbler in Go [1], by which I mean I know enough to be considered dangerous but not enough to be proficient. I’ve done a number of katas and even paired on some simple tools my team has built in Go. There is so much to like about it that I’ve had cause to prefer looking for 3rd party tools written in it in the faint hope that I might at some point be able to contribute one day. But, every time I pull the source code I end up wasting so much time trying to get the thing to build because Go has its own opinions about how the source is laid out and tools are built that don’t match the way I (or most other tools I’ve used) work.

Hello, World!

Go is really easy to get into, on Windows it’s as simple as:

> choco install golang

This installs the compiler and standard libraries and you’re all ready to get started. The obligatory first program is as simple as doing:

> pushd \Dev
> mkdir HelloWorld
> notepad HelloWorld.go
> go build HelloWorld.go
> HelloWorld

So far, so good. It’s pretty much the same as if you were writing in any other compiled language – create folder, create source file, build code, run it.

Along the way you may have run into some complaint about the variable GOPATH not being set. It’s easily fixed by simply doing:

> set GOPATH=%CD%

You might have bothered to read up on all the brouhaha, got side-tracked and discovered that it’s easy to silence the complaint without any loss of functionality by setting it to point to anywhere. After all the goal early on is just to get your first program built and running not get bogged down in tooling esoterica.

When I reached this point I did a few katas and used a locally installed copy of the excellent multi-language exercise tool cyber-dojo.org to have a play with the test framework and some of the built-in libraries. Using a tool like cyber-dojo meant that GOPATH problems didn’t rear their ugly head again as it was already handled by the tool and the katas only needed standard library stuff.

The first non-kata program my team wrote in Go (which I paired on) was a simple HTTP smoke test tool that also just lives in the same repo as the service it tests. Once again their was nary a whiff of GOPATH issues here either – still simple.

Git Client, But not Quite

The problems eventually started for me when I tried to download a 3rd party tool off the internet and build it [2]. Normally, getting the source code for a tool from an online repository, like GitHub, and then building it is as simple as:

> git clone https://…/tool.git
> pushd tool
> build

Even if there is no Windows build script, with the little you know about Go at this point you’d hope it to be something like this:

> go build

What usually happens now is that you start getting funny errors about not being able to find some code which you can readily deduce is some missing package dependency.

This is the point where you start to haemorrhage time as you seek in vain to fix the missing package dependency without realising that the real mistake you made was right back at the beginning when you used “git clone” instead of “go get”. Not only that but you also forgot that you should have been doing all this inside the “%GOPATH%\src” folder and not in some arbitrary TEMP folder you’d created just to play around in.

The goal was likely to just build & run some 3rd party tool in isolation but that’s not the way Go wants you to see the world.

The Folder as a Sandbox

The most basic form of isolation, and therefore version control, in software development is the humble file-system folder [3]. If you want to monkey with something on the side just make a copy of it in another folder and you know your original is safe. This style of isolation (along with other less favourable forms) is something I’ve written about in depth before in my C Vu In the Toolbox column, see “The Developer’s Sandbox”.

Unfortunately for me this is how I hope (expect) all tools to work out of the box. Interestingly Go is a highly opinionated language (which is a good thing in many cases) that wants you to do all your coding under one folder, identified by the GOPATH variable. The rationale for this is that it reduces friction from versioning problems and helps ensure everyone, and everything, is always using a consistent set of dependencies – ideally the latest.

Tool User, Not Developer

That policy makes sense for Google’s developers working on their company’s tools, but I’m not a Google developer, I’m a just a user of the tool. My goal is to be able to build and run the tool. If there happens to be a simple bug that I can fix, then great, I’d like to do that, but what I do not have the time for is getting bogged down in library versioning issues because the language believes everyone, everywhere should be singing from the same hymn sheet. I’m more used to a world where dependencies move forward at a pace dictated by the author not by the toolchain itself. Things can still move quickly without having to continually live on the bleeding edge.

Improvements

As a Go outsider I can see that the situation is definitely improving. The use of a vendor subtree to house a snapshot of the dependencies in source form makes life much simpler for people like me who just want to use the tool hassle free and dabble occasionally by fixing things here and there.

In the early days when I first ran into this problem I naturally assumed it was my problem and that I just needed to set the GOPATH variable to the root of the repo I had just cloned. I soon learned that this was a fools errand as the repo also has to buy into this and structure their source code accordingly. However a variant of this has got some traction with the gb tool which has (IMHO) got the right idea about isolation but sadly is not the sanctioned approach and so you’re dicing with the potential for future impedance mismatches. Ironically to build and install this tool requires GOPATH to still be working the proper way.

The latest version of Go (1.8) will assume a default location for GOPATH (a “go” folder under your profile) if it’s not set but that does not fix the fundamental issue for me which is that you need to understand that any code you pull down may be somewhere unrelated on your file-system if you don’t understand how all this works.

Embracing GOPATH

Ultimately if I am going to properly embrace Go as a language, and I would like to do more, I know that I need to stop fighting the philosophy and just “get with the programme”. This is hard when you have a couple of decades of inertia to overcome but I’m sure I will eventually. It’s happened enough times now that I know what the warnings signs are and what I need to Google to do it “the Go way”.

Okay, so I don’t like or agree with all of (that I know) the choices the Go language has taken but then I’m always aware of the popular quote by Bjarne Stroustrup:

“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

This post is but one tiny data point which covers one of the very few complaints I have about Go, and even then it’s just really a bit of early friction that occurs at the start of the journey if you’re not a regular. Still, better that than yet another language nobody uses.

 

[1] Or golang if you want to appease the SEO crowd :o).

[2] It was winrm-cli as I was trying to put together a bug report for Terraform and there was something going wrong with running a Windows script remotely via WinRM.

[3] Let’s put old fashioned technologies like COM to one side for the moment and assume we have learned from past mistakes.

Why I don’t like getter and setter functions in C++, reason #314.15

Timo Geusch from The Lone C++ Coder&#039;s Blog

This is a post I wrote several years ago and it’s been languishing in my drafts folder ever since. I’m not working on this particular codebase any more. That said, the problems caused by using Java-like getter and setter functions as the sole interface to an object in the context described in the post have […]

The post Why I don’t like getter and setter functions in C++, reason #314.15 appeared first on The Lone C++ Coder's Blog.