Adding a concurrency limit to Python’s asyncio.as_completed

Andy Balaam from Andy Balaam's Blog

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

In the previous post I demonstrated how the limited_as_completed method allows us to run a very large number of tasks concurrently, while capping the number of concurrent tasks at a sensible limit so that we don’t exhaust resources like memory or operating system file handles.

I think this could be a useful addition to the Python standard library, so I have been working on a modification to the current asyncio.as_completed function. My work so far is here: limited-as_completed.

To validate that the modified standard library version achieves the same goals as before, I ran tests similar to the ones from the last blog post.

I used an identical copy of timed from the previous post, and updated versions of the other files because I was using a much newer version of aiohttp along with the custom-built Python I was running.

server looked like:

#!/usr/bin/env python3

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

app = web.Application()
app.router.add_get('/{name}', handle)

web.run_app(app)

client-async-sem needed a custom TCPConnector to get around a new limit on the number of concurrent connections that was added to aiohttp in version 2.0. I also needed to move the ClientSession usage inside a coroutine to avoid a warning:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(r):
    async with ClientSession(connector=TCPConnector(limit=limit)) as session:
        url = "http://localhost:8080/{}"
        tasks = []
        # create instance of Semaphore
        sem = asyncio.Semaphore(limit)
        for i in range(r):
            # pass Semaphore and session to every GET request
            task = asyncio.ensure_future(
                bound_fetch(sem, url.format(i), session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.ensure_future(run(int(sys.argv[1]))))

My new code that uses my proposed extension to as_completed looked like:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done():
    async with ClientSession(connector=TCPConnector(limit=limit)) as session:
        tasks = (fetch(url.format(i), session) for i in range(r))
        for res in asyncio.as_completed(tasks, limit=limit):
            await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done())
loop.close()

and with these, we get similar behaviour to the previous post:

$ ./timed ./client-async-sem 10000
Memory usage: 73640KB	Time: 19.18 seconds
$ ./timed ./client-async-stdlib 10000
Memory usage: 49332KB	Time: 18.97 seconds

So the implementation I plan to submit to the Python standard library appears to work well. In fact, I think it is better than the one I presented in the previous post, because it uses on_complete callbacks to notice when futures have completed, which reduces the busy-looping we were doing to check for and yield finished tasks.
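To make the callback idea concrete, here is a minimal sketch of a concurrency-limited as_completed built only from pieces asyncio already has. It is not the code in the pull request: the name limited_as_completed and its async-generator interface are assumptions for illustration. Each finished future pushes itself onto a queue via add_done_callback, so the generator sleeps on the queue instead of polling, and the next awaitable is pulled from the (possibly lazy) iterable only when a slot frees up.

```python
import asyncio


async def limited_as_completed(aws, limit):
    """Hypothetical helper (not the stdlib API): yield results of the
    awaitables in `aws` in completion order, running at most `limit` at once."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    aws = iter(aws)              # works with lazy generators of coroutines
    finished = asyncio.Queue()
    running = 0

    def start_next():
        nonlocal running
        try:
            aw = next(aws)
        except StopIteration:
            return
        fut = asyncio.ensure_future(aw)
        running += 1
        # The done callback receives the future and queues it, so we are
        # woken when something completes rather than busy-looping.
        fut.add_done_callback(finished.put_nowait)

    for _ in range(limit):
        start_next()
    while running:
        fut = await finished.get()
        running -= 1
        start_next()             # top back up to `limit` before yielding
        yield fut.result()       # re-raises if the awaitable failed


# Usage: record the peak number of jobs in flight at any moment.
active = 0
peak = 0


async def job(i):
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01 * (5 - i))
    active -= 1
    return i


async def main():
    jobs = (job(i) for i in range(5))   # coroutines are created lazily
    return [r async for r in limited_as_completed(jobs, limit=2)]


results = asyncio.run(main())
print(sorted(results), peak)  # [0, 1, 2, 3, 4] 2
```

Because a new coroutine is only created when an earlier one has been collected from the queue, no more than `limit` futures exist at once, however long the input generator is.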

The Python issue is bpo-30782 and the pull request is #2424.

Note: at first glance, it looks like the aiohttp.ClientSession’s limit on the number of connections (introduced in version 1.0 and then updated in version 2.0) gives us what we want without any of this extra code. In fact, it only limits the number of connections, not the number of futures we are creating, so it has the same problem of unbounded memory use as the semaphore-based implementation.
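The same problem is visible without aiohttp at all. This is a minimal sketch that follows the shape of client-async-sem (the function names are illustrative, not from the post's scripts): the semaphore limits how many tasks do work at once, but every task object is created up front, so the number of pending tasks matches the number of requests rather than the limit. It assumes Python 3.7 or later for asyncio.all_tasks().

```python
import asyncio


async def work(sem):
    # Only `limit` of these can be inside the semaphore at once...
    async with sem:
        await asyncio.sleep(0)


async def semaphore_style(n, limit):
    sem = asyncio.Semaphore(limit)
    # ...but all n Task objects are created, and held in memory, right here.
    tasks = [asyncio.ensure_future(work(sem)) for _ in range(n)]
    # Count pending tasks, excluding the task running this coroutine.
    pending = len(asyncio.all_tasks()) - 1
    await asyncio.gather(*tasks)
    return pending


print(asyncio.run(semaphore_style(10_000, 100)))
```

With n set to 10,000 this reports all 10,000 tasks pending at once, even though only 100 can hold the semaphore.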

Call a dll from Python

Frances Buontempo from BuontempoConsulting

Letting VS2015 make a dll called SomeDLL for me with these implementations

// SomeDll.cpp : 
// Defines the exported functions for the DLL application.
//

#include "stdafx.h"
#include "SomeDll.h"


// This is an example of an exported variable
SOMEDLL_API int nSomeDll=0;

// This is an example of an exported function.
SOMEDLL_API int fnSomeDll(void)
{
    return 42;
}

// This is the constructor of a class that has been exported.
// see SomeDll.h for the class definition
CSomeDll::CSomeDll()
{
    return;
}
 

I then make a python script, using ctypes and loading the dll, using os to find it:

import os
import ctypes

os.chdir("C:\\Users\\sbkg525\\src\\SomeDll\\Debug")
SomeDll = ctypes.WinDLL("SomeDll.dll")


I can either use attributes of the library or use prototypes. I tried prototypes first. The function returns an int and takes no parameters:

proto = ctypes.WINFUNCTYPE(ctypes.c_int)
params = ()

answer = proto(("fnSomeDll", SomeDll), params)


Unfortunately this says

AttributeError: function 'fnSomeDll' not found

because C++ names are mangled. extern "C" FTW another time; for now, let's dump the exports:

link.exe /dump /exports Debug\SomeDll.dll

          1    0 0001114F ??0CSomeDll@@QAE@XZ =
                  @ILT+330(??0CSomeDll@@QAE@XZ)
          2    1 00011244 ??4CSomeDll@@QAEAAV0@$$QAV0@@Z =
                  @ILT+575(??4CSomeDll@@QAEAAV0@$$QAV0@@Z)
          3    2 0001100A ??4CSomeDll@@QAEAAV0@ABV0@@Z =
                  @ILT+5(??4CSomeDll@@QAEAAV0@ABV0@@Z)
          4    3 0001111D ?fnSomeDll@@YAHXZ =
                  @ILT+280(?fnSomeDll@@YAHXZ)
          5    4 00018138 ?nSomeDll@@3HA =
                  ?nSomeDll@@3HA (int nSomeDll)

Looks like we want entry 4:

answer = proto(("?fnSomeDll@@YAHXZ", SomeDll), params)

print answer

>>> <WinFunctionType object at 0x0248A7B0>


Of course: it's a function object. Let's call it:

print answer()
>>> 42
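WINFUNCTYPE creates prototypes for the stdcall calling convention and is only defined on Windows; CFUNCTYPE is its cdecl counterpart. As a sketch of the same prototype approach that can be tried on a POSIX machine (using the C library's abs purely as a stand-in for fnSomeDll, which is an assumption of this example), the optional paramflags tuple also gives the parameters names:

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be located.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Prototype for a cdecl function returning int and taking one int.
proto = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# paramflags: one (direction, name) tuple per parameter; 1 means "input".
c_abs = proto(("abs", libc), ((1, "x"),))

print(c_abs(-42))     # positional call
print(c_abs(x=-42))   # keyword call, made possible by paramflags
```

Both calls print 42. Like the fnSomeDll example, the first element of the tuple must be the exported (possibly mangled) symbol name.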


Done. I'll try attributes next. And functions which take parameters. Later, I'll look at other calling conventions.

First, attributes. The docs say, "foreign functions can be accessed as attributes of loaded shared libraries."
Given

extern "C"
{
    SOMEDLL_API void hello_world()
    {
        // no way of telling this has worked
    }

}

we call it like this

lib = ctypes.cdll.LoadLibrary('SomeDll.dll')
lib.hello_world()


ctypes assumes a return type of int unless told otherwise. For example,

extern "C"
{
    SOMEDLL_API double some_number()
    {
        return 1.01;
    }

}

called as follows

lib = ctypes.cdll.LoadLibrary('SomeDll.dll')
lib.hello_world()
val = lib.some_number()
print val, type(val)

Gives

-858993460, <type 'int'>


We need to specify the return type, since it's not an int:



lib.some_number.restype = ctypes.c_double
val = lib.some_number()
print val, type(val)


then we get what we want

1.01 <type 'float'>

We need to do likewise for parameters

extern "C"
{
    SOMEDLL_API double add_numbers(double x, double y)
    {
        return x + y;
    }
}


If we just try to call it with some floats we get an error

    total = lib.add_numbers(10.5, 25.7)
ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1


Stating the parameter type fixes this

lib.add_numbers.restype = ctypes.c_double
lib.add_numbers.argtypes = [ctypes.c_double, ctypes.c_double]
total = lib.add_numbers(10.5, 25.7)
print total, type(total)


36.2 <type 'float'>
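The same restype and argtypes declarations can be tried without Visual Studio. Here is a sketch assuming a POSIX system where ctypes.util.find_library can locate the C maths library, using its pow function in place of add_numbers. Unlike the snippets above, it uses Python 3 syntax, so print is a function:

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system with the C maths library (libm).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare double pow(double, double) before calling it; without restype,
# ctypes would read the result back as a C int.
libm.pow.restype = ctypes.c_double
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]

result = libm.pow(2.0, 10.0)
print(result, type(result))  # 1024.0 <class 'float'>
```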

Just one starter thought on strings. Returning a const char * seems to give a str in Python:

extern "C"
{
    SOMEDLL_API const char * speak()
    {
        return "Hello";
    }
}

called like this



lib.speak.restype = ctypes.c_char_p
print lib.speak()


prints "Hello". Using strings as parameters, particularly as references that can be changed, needs some investigation.
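One thing worth knowing before that investigation: in Python 3, a c_char_p return value comes back as bytes rather than str, and arguments declared as c_char_p must be passed as bytes. A sketch adapted from the strchr example in the ctypes documentation, assuming a POSIX C library:

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be located.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strchr(const char *s, int c): pointer into s, or NULL if not found.
libc.strchr.restype = ctypes.c_char_p
libc.strchr.argtypes = [ctypes.c_char_p, ctypes.c_char]

print(libc.strchr(b"abcdef", b"d"))  # b'def': the bytes from the match on
print(libc.strchr(b"abcdef", b"x"))  # None: a NULL pointer maps to None
```

Because c_char_p copies the result into an immutable bytes object, a buffer the C side is allowed to modify needs something like ctypes.create_string_buffer instead.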

Does a similar approach using attributes of the loaded library work for non-extern "C" functions? Can I use the proto approach on the extern "C" functions?

Let's see...







Codes of Conduct

Frances Buontempo from BuontempoConsulting

There's been yet another thread on Twitter about codes of conduct at conferences, and I wanted to compare them to the conditions of sale for gig or festival tickets. Here's an example:
Let's consider a sample:
1. I can't turn up before it starts? Why even say that? Oh, and 2. I have to leave when it's over. THIS IS PC GONE MAD. OK, not PC. But why do you need to spell this out? It's like stating the precondition that a string must be null-terminated before you can call strlen.
3. Yeah, rulez. Whatever. Surprised you didn't say it's illegal to break the law. What is this?
Blahblahblah.
9. Yeah, fair enough. But what about flame throwers?
No, no, no...
16. Does that mean I can bring *Real* weapons?
Blahblahblah.

25. There might be swear words?! Why tell me that?

Or maybe some people will bring kids with them and that might need a sane conversation about a time and a place for certain behaviours. I know the bands, I know what kind of music to expect. I know what I'm walking in to.

I also know there's a clearly marked welfare tent on site just in case.

And wardens on the campsite wearing obvious vests.

And hundreds of programmers listening to their favourite rock stars.
(OK, not all the punters are programmers, but many are).

Why did I ever go to a field full of metallers? (Knowing I'd probably be one of the few women there). I went with friends, which made me feel safer. I wanted to listen to bands I knew, and discover new music. If I'd gone by myself, knowing in advance about wardens and the welfare tent would have made me feel OK. Hey, the conditions of sale show the organisers have thought about things that might go wrong or concern people, and that makes me feel safer.

Do they make people think, "This is PC gone mad? If you need to tell me not to set fire to myself and drink myself to death then you are insulting me and I want nothing to do with this?"

Not by the looks of the number of people who turn up. And there have been more women and kids recently which is great.

What are codes of conduct for? Quite frankly, me. But not just me. They are not there to tell you how to behave because the organisers think white guys don't know how to conduct themselves, or that all such guys are potential rapists or murderers. Taken to the extreme, the argument that stating a CoC is pointless because we all know how to be nice would mean laws are pointless too. Why say "You aren't allowed to commit murder"? Surely that doesn't need pointing out? And yet most countries have such a law.

Conditions of sale and codes of conduct aren't laws, but they give a shared statement of expectation and make me feel OK about going to conferences/talks/festivals/gigs alone. And then I meet loads of new people and discover new music. And I can't wait till Bloodstock. Or the next tech talk/conference I go to.





Refactoring – Before or After?

Chris Oldwood from The OldWood Thing

I recently worked on a codebase where I had a new feature to implement but found myself struggling to understand the existing structure. Despite pairing a considerable amount, I realised that without other people to easily guide me I still got lost trying to find where I needed to make the change. I felt like I was walking through a familiar wood but the exact route eluded me without my usual guides.

I reverted the changes I had made and proposed that now might be a good point to do a little reorganisation. The proposal was met with a brief and light-hearted game of “Kent Beck Quote Tennis”: some suggested we do the refactoring before the feature whilst others preferred after. I felt there was a somewhat superficial conflict here that I hadn’t really noticed before, and wondered what the drivers might be for taking one approach over the other.

Refactor After

If you’re into Test Driven Development (TDD) then you’ll have the mantra “Red, Green, Refactor” firmly lodged in your psyche. When practicing TDD you first write the test, then make it pass, and finally finish up by refactoring the code to remove duplication or otherwise simplify it. Kent Beck’s Test Driven Development: By Example is probably the de facto read for adopting this practice.

The approach here can be seen as one where the refactoring comes after you have the functionality working. From a value perspective most of it comes from having the functionality itself – the refactoring step is an investment in the codebase to allow future value to be added more easily later.

Just after adding a feature is the point where you’ve probably learned the most about the problem at hand and so ensuring the design best represents your current understanding is a worthwhile aid to future comprehension.

Refactor Before

Another saying from Kent Beck that I’m particularly fond of is “make the change easy, then make the easy change” [1]. Here he is alluding to a dose of refactoring up-front to mould the codebase into a shape that is more amenable to allowing you to add the feature you really want.

At this point we are not adding anything new but are leaning on all the existing tests, and maybe improving them too, to ensure that we make no functional changes. The value here is about reducing the risk of the new feature by showing that the codebase can safely evolve towards supporting it. More importantly, it also gives others the earliest visibility of the new direction the code will take [2].

We know the least amount about what it will take to implement the new feature at this point but we also have a working product that we can leverage to see how it’s likely to be impacted.

Refactor Before, During & After

Taken at face value, the advice might appear contradictory about when the best time to refactor is. Of course this is really a straw man argument as the best time is in fact “all the time” – we should continually keep the code in good shape [3].

That said, the act of refactoring should not occur in a vacuum; it should be driven by a need to make a more valuable change. If the code never needed to change we wouldn’t be doing it in the first place, and this should be borne in mind when working on a large codebase where there might be a temptation to refactor purely for the sake of it. Seeing stories or tasks go on the backlog that solely amount to a refactoring is a smell, and they should be heavily scrutinised.

Emergent Design

That said, there are no absolutes and whilst I would view any isolated refactoring task with suspicion, that is effectively what I was proposing back at the beginning of this post. One of the side-effects of emergent design is that you can get yourself into quite a state before a cohesive design finally emerges.

Whilst on paper we had a number of potential designs all vying for a place in the architecture, we had gone with the simplest possible thing for as long as possible, in the hope that more complex features would arrive on the backlog and we would then have the forces we needed to evaluate one design over another.

Hence the refactoring decision became one between digging ourselves into an even deeper hole first, and then refactoring heavily once we had made the functional change, or doing some up-front preparation to solidify some of the emerging concepts first. There is the potential for waste if you go too far down the up-front route but if you’ve been watching how the design and feature list have been emerging over time it’s likely you already know where you are heading when the time comes to put the design into action.

 

[1] I tend to elide the warning from the original quote about the first part potentially being hard when saying it out loud because the audience is usually well aware of that :o).

[2] See “The Cost of Long-Lived Feature Branches” for a cautionary tale about storing up changes.

[3] See “Relentless Refactoring” for the changes in attitude towards this practice.

Pride Vibes 2017: Coventry Pride

Samathy from Stories by Samathy on Medium

Laura Tapp performs on Saturday evening at Coventry Pride

Pride Vibes: As a photographer for Gay Pride Pics, I see lots of Prides across the UK every year. Each Pride has a different feel. This series will describe what each Pride was like and what its vibe was.

The entire series is my opinion and mine only. Take it as you will. Note that this opinion comes from a twenty-something extroverted transwoman who is herself a Pride organiser.

I’m still working out what this series is going to be like. Bear with me.

Full Disclosure: I am a trustee and organiser of Coventry Pride. I work specifically on Press and Publicity. This post will be more biased than normal.

Previous: Birmingham Pride

Coventry Pride is in its 3rd year. It’s a smallish pride with a big personality. It brands itself as a ‘Community Pride’ and makes a huge effort to welcome all members of the extended LGBTQIA+ community.

Coventry is a completely free Pride, welcoming attendees into a new venue in the centre of the city, University Square.
The venue is much larger than the previous one at FarGo Village, allowing for massive expansion this year.

Situated right outside the iconic dual cathedrals, the new venue proved much better than last year’s.
It’s a much bigger space, and the event felt very open and a lot less crowded than last year’s Pride.
Although it didn't feel packed, it certainly buzzed with happy energy.

The main stage was fairly large and could be seen and heard throughout most of the main square area. Most of the acts proved to be a great success with the audience.
Most of them were musical acts with a smattering of comedy, with the drag acts largely kept to the cabaret stage inside Square One, the Coventry University Students’ Union bar and club.

The main square area also featured a bar, which, despite its central location did not result in large amounts of drunk people making me feel uncomfortable.
In fact, I don’t think I saw anyone I would be able to describe as anything other than sober. Which, for me, is a good thing.

The supporting stages in Square One featured smaller musical acts and community comedy performances.
They hosted a range of great acoustic musical performances, drag acts, and both small-time and big-time comedy.

The supporting stages welcomed smaller, intimate performances.

The diversity of the event was wonderful.
Being a free event, Coventry Pride is lucky to be able to be open to people of all ages and orientations.
There were lots of families and younger people attending, which was fantastic to see and really helps with the open and welcoming feel of the event.
We saw young and old people attending and having fun. Most importantly we saw people from so many identities! At least, as far as I could tell from the flags people were wearing.
This really helped me and a lot of my friends feel included in the event. The presence of such a diverse set of people really supported the comfortable feeling of a space where one could be themselves without worry.

As well as the outside main area, we also had an inside arrangement of community stalls.

The community stalls were a huge part of the event.

The stalls were as diverse as the attending crowd. We had everything from stalls selling art, offering counselling services, information about scouting, LGBT+ identities and so much more.
We’re really pleased with the number of community stalls we had, and they really helped to maintain the vibe that Coventry Pride aims for: people supporting other people.

The stalls were located inside The Herbert Art Gallery and Museum which proved to be a near perfect venue.

The Herbert provided their huge atrium space for the community stalls which offered a lot of space for attendees to peruse the stalls at their leisure.
The Herbert also offered their gender neutral toilets for Coventry Pride attendees, which was a great bonus to go along with the portaloos and toilets in the University SU.

One thing I noticed at the event was the policing, or lack of it.
There were officers at the event and around the surrounding area, but they felt much more like participants in the event than bystanders.
We saw officers taking photos, talking to stall holders and seemingly having a great time, just like everyone else.

The police were as much participants as everyone else.

The Sergeant on duty on Saturday was a great guy who was super interested in learning as much about the LGBT+ community as he could, asking polite questions when appropriate. He even requested to be posted at Coventry Pride on the Sunday too because he enjoyed the event so much.

A huge event at Coventry Pride 2017 was The Blessing of Haley Bridge and Claire Haines.

Haley and Claire were married earlier on the Sunday at the Guild Hall just up the road from University Square.
Following their wedding, we welcomed the beautiful brides to Coventry Pride and the Chair of Coventry Pride, Paul Desson-Baxter, blessed the wedding in front of hundreds of people.

The blessing was a beautiful event. It truly was wonderful to see two women married and celebrate their marriage at a Pride event.

All in all, Coventry Pride was a fantastic event. We’re super happy to have been able to offer an event welcoming such a diverse selection of people from our community.
We’re glad that we’re able to offer the event for free to everyone who wants to come, giving access to all those who need a Pride in their local area.

As always, the volunteers and organisers of Coventry Pride did an amazing job and put in an awful lot of effort to make it happen.

Know your hammer from your screwdriver: The right tool for the job

Paul Grenyer from Paul Grenyer

As software developers, we at Naked Element are skilled and experienced in a number of different programming languages and aware of many, many more. Choosing the right programming language for a piece of software is as important as choosing a hammer to knock in a nail, a flat-headed screwdriver for a flat-headed screw and a cross-headed screwdriver for a cross-headed screw. However, with software it’s far more complicated, as there isn’t always just one tool for the job.

It’s also important to consider the skills you have at hand. For example, you wouldn’t usually ask a plumber to fix your electrics or an electrician to fix your plumbing. However, given enough time most plumbers could learn to do electrics and vice versa. Generally people with a talent for practical things can easily pick up other practical skills. It’s the same with software developers, but you have to consider whether the investment in new skills will return sufficient results in an acceptable time frame, or whether to risk compromising your margins by bringing in already experienced outside help. It’s not an easy decision!

Software developers (the good ones at least) love learning new things, programming languages in particular, but there are divisions, of course. Some software developers are only interested in writing software for Microsoft Windows, for example, or for open source platforms such as Linux, and the tools they use are quite different. It’s even more pronounced with Android developers and iPhone developers! You don’t often get developers who like a bit of everything, but it does happen, and those are the sorts of developers we have at Naked Element.

It’s true that we’d happily write Java (a general-purpose programming language widely used in open source software development) all day long, but that wouldn’t allow us to develop complete pieces of software. We regularly use various combinations of Java, Ruby on Rails and JavaScript in order to get the best result. We’ve turned our hand to Python and, more recently, Microsoft .NET languages such as C# and VB.NET too. It depends on what our clients need and our assessment of the right tool for the job. Sometimes it’s not even about choosing a programming language. Sometimes it’s about choosing pre-built software, such as WordPress, and customising it to our client’s needs. We wouldn’t use WordPress for anything more complicated than a simple e-commerce system, but for websites, including ours, it’s the right tool for the job.

So when you’re choosing your software development partner, consider whether they’re using the right tools for your project or whether they’re just using the hammer they’re familiar with to knock in your screw.

Stack Overflow With Custom JsonConverter

Chris Oldwood from The OldWood Thing

[There is a Gist on GitHub that contains a minimal working example and summary of this post.]

We recently needed to change our data model so that what was originally a list of one type, became a list of objects of different types with a common base, i.e. our JSON deserialization now needed to deal with polymorphic types.

Naturally we googled the problem to see what support, if any, Newtonsoft’s JSON.Net had. Although it has some built-in support, like many built-in solutions it stores fully qualified type names, which we didn’t want in our JSON; we just wanted simple technology-agnostic type names like “cat” or “dog” that we would be happy to map manually somewhere in our code. We didn’t want to write all the deserialization logic manually, but were happy to give the library a leg-up with the mapping of types.

JsonConverter

Our searching quickly led to the following question on Stack Overflow: “Deserializing polymorphic json classes without type information using json.net”. The lack of type information mentioned in the question meant the exact .Net type (i.e. name, assembly, version, etc.), and so the answer describes how to do it where you can infer the resulting type from one or more attributes in the data itself. In our case it was a field unsurprisingly called “type” that held a simplified name as described earlier.
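The underlying idea, inferring the concrete type from a simple “type” field with the name-to-class mapping kept in our own code, is not specific to JSON.Net. As a rough Python analogue (not the C# solution; the Cat and Dog classes and TYPE_MAP are invented for illustration), the standard json module’s object_hook can perform the same dispatch:

```python
import json
from dataclasses import dataclass


@dataclass
class Animal:
    name: str


@dataclass
class Cat(Animal):
    lives: int = 9


@dataclass
class Dog(Animal):
    breed: str = ""


# Simple, technology-agnostic names mapped to classes by hand.
TYPE_MAP = {"cat": Cat, "dog": Dog}


def decode_animal(obj):
    """object_hook: called with every JSON object decoded as a dict."""
    kind = obj.pop("type", None)
    if kind is None:
        return obj  # not a tagged object, so leave it as a dict
    return TYPE_MAP[kind](**obj)


data = ('[{"type": "cat", "name": "Tom"},'
        ' {"type": "dog", "name": "Rex", "breed": "collie"}]')
animals = json.loads(data, object_hook=decode_animal)
print(animals)  # [Cat(name='Tom', lives=9), Dog(name='Rex', breed='collie')]
```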

The crux of the solution involves creating a JsonConverter and implementing the two methods CanConvert and ReadJson. If we follow that Stack Overflow post’s top answer we end up with an implementation something like this:

public class CustomJsonConverter : JsonConverter
{
  public override bool CanConvert(Type objectType)
  {
    return typeof(BaseType).
                       IsAssignableFrom(objectType);
  }

  public override object ReadJson(JsonReader reader,
           Type objectType, object existingValue,
           JsonSerializer serializer)
  {
    JObject item = JObject.Load(reader);

    if (item.Value<string>("type") == "Derived")
    {
      return item.ToObject<DerivedType>();
    }
    else
    . . .
  }
}

This all made perfect sense and even agreed with a couple of other blog posts on the topic we unearthed. However when we plugged it in we ended up with an infinite loop in the ReadJson method that resulted in a StackOverflowException. Doing some more googling and checking the Newtonsoft JSON.Net documentation didn’t point out our “obvious” mistake and so we resorted to the time honoured technique of fumbling around with the code to see if we could get this (seemingly promising) solution working.

A Blind Alley

One avenue that appeared to fix the problem was manually adding the JsonConverter to the list of Converters in the JsonSerializerSettings object instead of using the [JsonConverter] attribute on the base class. We went back and forth with some unit tests to prove that this was indeed the solution and even committed this fix to our codebase.

However I was never really satisfied with this outcome and so decided to write this incident up. I started to work through the simplest possible example to illustrate the behaviour but when I came to repro it I found that neither approach worked – attribute or serializer settings - I always got into an infinite loop.

Hence I questioned our original diagnosis and continued to see if there was a more satisfactory answer.

ToObject vs Populate

I went back and re-read the various hits we got with those additional keywords (recursion, infinite loop and stack overflow) to see if we’d missed something along the way. The two main candidates were “Polymorphic JSON Deserialization failing using Json.Net” and “Custom inheritance JsonConverter fails when JsonConverterAttribute is used”. Neither of these explicitly references the answer we initially found and what might be wrong with it – they give a different answer to a slightly different question.

However, these answers suggest deserializing the object in a different way: instead of using ToObject<DerivedType>() to do all the heavy lifting, you create the uninitialized object yourself and then use Populate() to fill in the details, like this:

{
  JObject item = JObject.Load(reader);

  if (item.Value<string>("type") == "Derived")
  {
    var @object = new DerivedType();
    serializer.Populate(item.CreateReader(), @object);
    return @object;
  }
  else
    . . .
}

Plugging this approach into my minimal example worked, and for both the converter techniques too: attribute and serializer settings.
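The difference between the two shapes can be modelled in a few lines. What follows is a toy Python model, not JSON.Net’s implementation, of one way such a loop can arise: the hypothetical ToySerializer picks the first converter whose can_convert accepts the requested type, and because that check (like CanConvert with IsAssignableFrom) also accepts subclasses, asking the serializer for a Derived from inside the converter selects the same converter again. Creating the instance directly and populating its fields never goes back through converter selection. It only sketches the shape of the problem and is not offered as a definitive explanation of what JSON.Net does.

```python
class Base:
    pass


class Derived(Base):
    pass


class ToySerializer:
    """Hypothetical mini-serializer, loosely shaped like JSON.Net's API."""

    def __init__(self, converters):
        self.converters = converters

    def to_object(self, data, cls):
        # Analogue of ToObject<T>(): consult converters first.
        for conv in self.converters:
            if conv.can_convert(cls):
                return conv.read(data, cls, self)
        obj = cls()
        self.populate(data, obj)
        return obj

    def populate(self, data, obj):
        # Analogue of Populate(): fill in fields, no converter lookup.
        for key, value in data.items():
            if key != "type":
                setattr(obj, key, value)


class NaiveConverter:
    def can_convert(self, cls):
        return issubclass(cls, Base)  # true for Derived as well

    def read(self, data, cls, serializer):
        if data.get("type") == "Derived":
            # Re-enters to_object, which picks this converter again.
            return serializer.to_object(data, Derived)
        raise ValueError("unknown type")


class PopulatingConverter(NaiveConverter):
    def read(self, data, cls, serializer):
        if data.get("type") == "Derived":
            obj = Derived()
            serializer.populate(data, obj)
            return obj
        raise ValueError("unknown type")


data = {"type": "Derived", "name": "Rex"}

recursed = False
try:
    ToySerializer([NaiveConverter()]).to_object(data, Base)
except RecursionError:
    recursed = True

obj = ToySerializer([PopulatingConverter()]).to_object(data, Base)
print(recursed, type(obj).__name__, obj.name)  # True Derived Rex
```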

Unanswered Questions

So I’ve found another technique that works, which is great, but I still lack closure around the whole affair. For example, how come the answer in the original Stack Overflow question “Deserializing polymorphic json classes” didn’t work for us? That answer has plenty of up-votes and so should be considered pretty reliable. Has there been a change to Newtonsoft’s JSON.Net library that has somehow caused this answer to now break for others? Is there a new bug that we’ve literally only just discovered (we’re using v10)? Why don’t the JSON.Net docs warn against this if it really is an issue, or are we looking in the wrong part of the docs?

As described right at the beginning I’ve published a Gist with my minimal example and added a comment to the Stack Overflow answer with that link so that anyone else on the same journey has some other pieces of the jigsaw to work with. Perhaps over time my comment will also acquire up-votes to help indicate that it’s not so cut-and-dried. Or maybe someone who knows the right answer will spot it and point out where we went wrong.

Ultimately though this is probably a case of not seeing the wood for the trees. It’s so easy when you’re trying to solve one problem to get lost in the accidental complexity and not take a step back. Answers on Stack Overflow generally carry a large degree of gravitas, but they should not be assumed to be infallible. All documentation can go out of date even if there are (seemingly) many eyes watching over it.

When your mind-set is one that always assumes the bugs are of your own making, unless the evidence is overwhelming, then those times when you might actually not be entirely at fault seem to feel all the more embarrassing when you realise the answer was probably there all along but you discounted it too early because your train of thought was elsewhere.

Sign on the Dotted Line – Why Contracts are Important

Paul Grenyer from Paul Grenyer

Contracts might seem like something only big business needs, and many small companies work without them, but if your work is important to you, it is vital to have a contract in place. A well put together contract can make a business relationship stronger and more successful, so it is worth investing some time and effort in getting a contract just right.

Contracts can often seem daunting, filled with complicated language only solicitors understand, fine print made to confuse the signatory and seemingly endless clauses that only apply in the most unusual of circumstances. Documents like this are off-putting, and occasionally detrimental to the business process, especially at the beginning of a new working relationship. Contracts don’t need to be pages and pages long, or contain lots of legal jargon or penalties; the most important thing is that all parties understand the content of the contract and agree on their own responsibilities. It is very important to make clear what is expected of each party and what will happen if either side fails to keep up their end of the agreement. Being clear on cost is essential too: what is included in the charge and, very importantly, what is not included. A good contract should only contain information relevant to that particular piece of work and should be written in simple, understandable language where possible. Having someone sign something they don’t understand is not a good way to begin!

For general terms of business that apply to every piece of work, a Master Service Agreement (MSA) can usefully accompany each specific work contract, and Naked Element agrees and signs an MSA with every client. This MSA does not oblige either party to work with the other; it merely details the quality of the service or product, each party’s availability throughout the business relationship and the responsibilities they have to each other. Only once a schedule is signed does it become a binding contract. The MSA defines the confidentiality clauses, copyright details, intellectual property rights, payment terms and scope of charges, as well as each party’s liability. These key details are indispensable for any business, whether the project is worth £500 or £5,000,000, as they are crucial if something goes wrong.

Contracts also shouldn’t be designed to catch someone out or tie them down unnecessarily; they should be an agreement put together for the benefit of all parties. Where possible, a clever business person should be open to discussing and amending a proposed contract before it is signed if the other party wishes to make changes. It is also often beneficial to include a clause allowing either party to revisit a contract for adjustment after a set period of time. Being flexible and open to future issues in this way increases trust between parties, making a successful business relationship more likely.

A good contract should:

  • Only include relevant information
  • Use simple language
  • Outline benefits of the contract to both parties
  • Be negotiable
  • Be adjustable where appropriate

With a proper contract the client will feel they can depend on the product or service they are paying for and can rely on the contract to ensure they will not be out of pocket if something goes awry. By the same token, the service provider is protected by the contract if a client should renege on something that was previously mutually agreed upon. A good contract that covers everyone’s interests equally makes a business seem more trustworthy and more professional and, if everything goes well, makes both parties more likely to do business together again.

Words by Lauren.

Making 100 million requests with Python aiohttp

Andy Balaam from Andy Balaam's Blog

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I’ve been working on how to make a very large number of HTTP requests using Python’s asyncio and aiohttp.

Paweł Miech’s post Making 1 million requests with python-aiohttp taught me how to think about this, and got us a long way, with 1 million requests running in a reasonable time, but I need to go further.

Paweł’s approach limits the number of requests that are in progress, but it uses an unbounded amount of memory to hold the futures that it wants to execute, because it creates a future for every request before any of them start.

We can avoid using unbounded memory by using the limited_as_completed function I outlined in my previous post.

Setup

Server

We have a server program “server”:

(Note that it differs from Paweł’s version because I am using an older version of aiohttp which has fewer convenient features.)

#!/usr/bin/env python3.5

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

async def init():
    app = web.Application()
    app.router.add_route('GET', '/{name}', handle)
    return await loop.create_server(
        app.make_handler(), '127.0.0.1', 8080)

loop = asyncio.get_event_loop()
loop.run_until_complete(init())
loop.run_forever()

This just responds “Hello, World!” to every request it receives, but after an artificial delay of 0-3 seconds.

Synchronous client

As a baseline, we have a synchronous client “client-sync”:

#!/usr/bin/env python3.5

import requests
import sys

url = "http://localhost:8080/{}"
for i in range(int(sys.argv[1])):
    requests.get(url.format(i)).text

This waits for each request to complete before making the next one. Like the other clients below, it takes the number of requests to make as a command-line argument.

Async client using semaphores

Copied mostly verbatim from Making 1 million requests with python-aiohttp, we have an async client “client-async-sem” that uses a semaphore to restrict the number of requests in progress at any time to 1000:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(session, r):
    url = "http://localhost:8080/{}"
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(limit)
    for i in range(r):
        # pass Semaphore and session to every GET request
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i), session))
        tasks.append(task)
    responses = asyncio.gather(*tasks)
    await responses

loop = asyncio.get_event_loop()
with ClientSession() as session:
    loop.run_until_complete(asyncio.ensure_future(run(session, int(sys.argv[1]))))
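To see both properties of the semaphore approach without a server, here is a minimal sketch that swaps fetch for a hypothetical simulated_fetch, which only records how many calls are in flight and then sleeps briefly. It assumes Python 3.7+ for asyncio.run. The semaphore caps concurrency at the limit, but every task object is still created before any work starts, which is where the unbounded memory comes from.

```python
import asyncio

async def simulated_fetch(i, state):
    """Hypothetical stand-in for fetch(): records concurrency, then sleeps."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.001)
    state["in_flight"] -= 1
    return i

async def bound(sem, i, state):
    # Same shape as bound_fetch: only `limit` callers get past the semaphore.
    async with sem:
        return await simulated_fetch(i, state)

async def run(n, limit):
    state = {"in_flight": 0, "peak": 0}
    sem = asyncio.Semaphore(limit)
    # Every task is created up front, so n Task objects exist at once,
    # even though at most `limit` of them are doing work.
    tasks = [asyncio.ensure_future(bound(sem, i, state)) for i in range(n)]
    results = await asyncio.gather(*tasks)
    return results, state["peak"], len(tasks)

results, peak, created = asyncio.run(run(500, 50))
print(peak, created)  # 50 500
```

Concurrency never exceeds 50, yet all 500 tasks exist for the whole run; with 100 million requests that list is what exhausts memory.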

Async client using limited_as_completed

The new client I am presenting here uses limited_as_completed from the previous post. This means it can make a generator that provides the futures to wait for as they are needed, instead of making them all at the beginning.

It is called “client-async-as-completed”:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
from itertools import islice
import sys

def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, limit):
        await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
with ClientSession() as session:
    coros = (fetch(url.format(i), session) for i in range(r))
    loop.run_until_complete(print_when_done(coros))
loop.close()

Again, this limits the number of requests in progress at any one time to 1000.
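The memory saving comes from the generator expression that builds coros: a coroutine is created only when limited_as_completed asks for one. This small sketch shows the same laziness with a hypothetical make_coro_stand_in function that just counts how many times it has been called, in place of fetch(url.format(i), session).

```python
from itertools import islice

created = 0

def make_coro_stand_in(i):
    """Hypothetical stand-in for fetch(url.format(i), session)."""
    global created
    created += 1
    return i

# Like `coros` in client-async-as-completed: nothing is created yet.
jobs = (make_coro_stand_in(i) for i in range(100_000_000))
print(created)  # 0

# limited_as_completed starts by pulling only `limit` items...
first = list(islice(jobs, 0, 1000))
print(created)  # 1000

# ...and then one more each time a future finishes.
next(jobs)
print(created)  # 1001
```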

Test setup

Finally, we have a test runner script called “timed”:

#!/usr/bin/env bash

./server &
sleep 1 # Wait for server to start

/usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" "$@"

# %e Elapsed real (wall clock) time used by the process, in seconds.
# %M Maximum resident set size of the process in Kilobytes.

kill %1

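This starts a fresh server, runs the given client command under /usr/bin/time and then kills the server, so the server is restarted for every run. It prints how long the client took (wall-clock time) and its peak memory usage (maximum resident set size).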
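For readers who would rather take the same two measurements from Python than from GNU time, here is a minimal sketch of the measurement half of timed (it does not start or stop the server). time_child is a hypothetical helper; it assumes a Unix system, since the resource module is Unix-only, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import subprocess
import sys
import time

def time_child(cmd):
    """Run cmd; return (wall-clock seconds, peak RSS of waited-for children).

    Hypothetical helper mirroring /usr/bin/time's %e and %M.
    """
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    # For RUSAGE_CHILDREN this is the peak RSS of the largest child waited
    # for so far: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak

# Allocate roughly 50 MB in a child Python so there is something to measure.
elapsed, peak = time_child([sys.executable, "-c", "x = b'a' * 50_000_000"])
print(f"Memory usage: {peak}KB\tTime: {elapsed:.2f} seconds")
```

Because RUSAGE_CHILDREN covers every child the process has waited for, the figure is only meaningful for a single client run per measuring process, which is one reason a fresh shell script per run is simpler.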

Results

When making only 10 requests, the async clients worked faster because they launched all the requests simultaneously and only had to wait for the longest one (3 seconds). The memory usage of all three clients was fine:

$ ./timed ./client-sync 10
Memory usage: 20548KB	Time: 15.16 seconds
$ ./timed ./client-async-sem 10
Memory usage: 24996KB	Time: 3.13 seconds
$ ./timed ./client-async-as-completed 10
Memory usage: 23176KB	Time: 3.13 seconds

When making 100 requests, the synchronous client was very slow, but all three clients worked eventually:

$ ./timed ./client-sync 100
Memory usage: 20528KB	Time: 156.63 seconds
$ ./timed ./client-async-sem 100
Memory usage: 24980KB	Time: 3.21 seconds
$ ./timed ./client-async-as-completed 100
Memory usage: 24904KB	Time: 3.21 seconds
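These timings match a back-of-the-envelope estimate. random.randint(0, 3) includes both ends, so each delay is 0, 1, 2 or 3 seconds with a mean of 1.5 seconds, and a client that waits for each request in turn should need about 1.5 seconds per request plus HTTP overhead. The async clients only wait for the slowest request, and with 10 requests the chance that none of them drew the 3-second delay is small.

```python
from statistics import mean

# random.randint(0, 3) is inclusive, so each delay is 0, 1, 2 or 3 seconds.
delays = [0, 1, 2, 3]
mean_delay = mean(delays)  # 1.5

def expected_sync_seconds(n):
    """Expected total wait for n sequential requests (ignores HTTP overhead)."""
    return n * mean_delay

def prob_no_max_delay(n):
    """Chance that none of n independent delays is the 3-second maximum."""
    return (3 / 4) ** n

print(expected_sync_seconds(10))         # 15.0
print(expected_sync_seconds(100))        # 150.0
print(round(prob_no_max_delay(10), 3))   # 0.056
```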

At this point let’s agree that life is too short to wait for the synchronous client.

When making 10000 requests, both async clients worked quite quickly and both used more memory, but the semaphore-based one used about 1.7 times as much memory as the limited_as_completed version:

$ ./timed ./client-async-sem 10000
Memory usage: 77912KB	Time: 18.10 seconds
$ ./timed ./client-async-as-completed 10000
Memory usage: 46780KB	Time: 17.86 seconds

For 1 million requests, the semaphore-based client took 25 minutes on my (32GB RAM) machine. It only used about 10% of my CPU, and it used a lot of memory (over 3GB):

$ ./timed ./client-async-sem 1000000
Memory usage: 3815076KB	Time: 1544.04 seconds

Note: Paweł’s version only took 9 minutes on his laptop and used all his CPU, so I wonder whether I have made a mistake somewhere, or whether my version of Python (3.5.2) is not as good as a later one.

The limited_as_completed version ran in a similar amount of time but used 100% of my CPU, and used a much smaller amount of memory (162MB):

$ ./timed ./client-async-as-completed 1000000
Memory usage: 162168KB	Time: 1505.75 seconds
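The limited_as_completed above finds a finished future by polling: first_to_finish loops over the futures and calls asyncio.sleep(0) between passes, so the event loop never gets to idle while requests are outstanding. That may contribute to the 100% CPU figure, though the timings above don't isolate it. A possible alternative, sketched below and not the version benchmarked here, lets asyncio.wait with return_when=FIRST_COMPLETED suspend until at least one task is done. It is an async generator (Python 3.6+, and asyncio.run needs 3.7+), so callers use async for instead of awaiting each yielded item; limited_as_completed_wait and job are hypothetical names, and job stands in for fetch.

```python
import asyncio
from itertools import islice

async def limited_as_completed_wait(coros, limit):
    """Yield results as tasks finish, keeping at most `limit` tasks pending.

    Variant sketch: asyncio.wait(..., FIRST_COMPLETED) suspends until a task
    is done, instead of polling with asyncio.sleep(0).
    """
    coros = iter(coros)
    pending = {asyncio.ensure_future(c) for c in islice(coros, limit)}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for finished in done:
            # Replace each finished task with the next coroutine, if any.
            nxt = next(coros, None)
            if nxt is not None:
                pending.add(asyncio.ensure_future(nxt))
            yield finished.result()

async def job(i, state):
    """Hypothetical stand-in for fetch(): tracks how many jobs are running."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.001 * (i % 3))
    state["in_flight"] -= 1
    return i

async def main(n, limit):
    state = {"in_flight": 0, "peak": 0}
    coros = (job(i, state) for i in range(n))
    results = [r async for r in limited_as_completed_wait(coros, limit)]
    return results, state["peak"]

results, peak = asyncio.run(main(200, 10))
print(len(results), peak)
```

Results arrive in completion order rather than submission order, just as with as_completed, and the number of live tasks never exceeds the limit.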

Now let’s try 100 million requests. The semaphore-based version lasted 10 hours before it was killed by Linux’s OOM Killer, but it didn’t manage to make any requests in this time, because it creates all its futures before it starts making requests:

$ ./timed ./client-async-sem 100000000
Command terminated by signal 9

I left the limited_as_completed version over the weekend and it managed to succeed eventually:

$ ./timed ./client-async-as-completed 100000000
Memory usage: 294304KB	Time: 150213.15 seconds

So its memory usage was still bounded (under 300MB), and it managed about 665 requests/second over an extended period (about 41.7 hours), which is almost identical to the throughput of the 1 million request run.
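The throughput figures can be checked directly from the reported timings by dividing each request count by its wall-clock time. The 1 million and 100 million runs both come out at roughly 664 to 666 requests/second, while the 10000-request run is lower, at about 560.

```python
# (requests, wall-clock seconds) from the client-async-as-completed runs above
RUNS = [
    (10_000, 17.86),
    (1_000_000, 1505.75),
    (100_000_000, 150213.15),
]

def throughput(runs):
    """Map each request count to its average requests per second."""
    return {n: n / seconds for n, seconds in runs}

for n, rate in throughput(RUNS).items():
    print(f"{n:>11,} requests: {rate:6.1f} requests/second")

print(f"100 million run: {RUNS[-1][1] / 3600:.1f} hours")
```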

Conclusion

Making a million requests is usually enough, but when we really need to do a lot of work while keeping memory usage bounded, an approach like limited_as_completed looks like a good way to go. I also think it’s slightly easier to understand.

In the next post I describe my attempt to get something like this added to the Python standard library.