
Registering Functions Against Object Methods in Python

My big side project right now is a music theory library in Python, called Ophis. Among many other concerns, I'm trying to make the API as natural and easy to use as possible. This often means finding ways of creating objects other than ClassName(args).

Ophis has the classes Chroma and Pitch. A chroma is a note name without an octave (the idea of C), while a pitch is a chroma with a specified octave (middle C).

The problem with this is that the conventional way of referring to a pitch would then be:

ophis.Pitch(ophis.C, 0)

You can see, Ophis has already initialized all the note names (chromae) you would need. We could do that with pitches...

C0 = Pitch(C, 0)
C1 = Pitch(C, 1)

# later, in user code...

ophis.C1

...but I think we all know the problem with that. It requires initializing several hundred pitch objects that may never be used. Most songs don't use every note. And every physical note has multiple names because of enharmonic spelling (F♯ == G♭).

So, what if the API looked like this?

ophis.C(1)

That's cool. Pretty easy to do, too.

class Chroma:

  #
  #
  #

  def __call__(self, octave):
    return Pitch(self, octave)
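Here is a minimal, self-contained sketch of that idea. The class bodies are stand-ins (the real Ophis classes carry far more), and the `name` attribute is an assumption made only so the example runs.

```python
class Pitch:
    """Stand-in for the real Pitch: a chroma plus an octave."""

    def __init__(self, chroma, octave=0):
        self.chroma = chroma
        self.octave = int(octave)


class Chroma:
    """Stand-in for the real Chroma: just a note name (assumed attribute)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, octave):
        # Calling a chroma instance builds a pitch on demand,
        # so no pitch objects need to exist ahead of time.
        return Pitch(self, octave)


C = Chroma("C")
middle_c = C(4)  # octave number chosen only for illustration
```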

What if we went deeper?

Once you realize this is a good idea, the next thing you realize is.... what about chords?

ophis.Chord(ophis.C, Major)

Well, that looks pretty similar, doesn't it?

So, um... okay...

class Chroma:

    #
    #
    #

    def __call__(self, x):
        try:
          return Pitch(self, x)
        except TypeError:
          return Chord(self, x)

There are problems with this.

  • Definitions for Pitch and Chord are in modules that get loaded after Chroma. This doesn't create any errors (because the function isn't run on load), but still feels wrong.
  • It is brittle. If I change the name of Pitch or Chord, I have to go back and change it here. The tightly-wound nature of music terminology means I have long since given up the idea of loose coupling, but I'm trying to make these types of dependencies only go up the conceptual ladder, not back down it.
  • What if I want to add more things to this method? Eventually I'm going to end up creating a series of type checks.

When I was working through this, I didn't see any way around a series of type checks, but I thought I could solve the first two problems with some creative coding.

I decided I could register functions into a dict, stored on the class. The keys for the dict would be types, and the values would be the functions to run when __call__ is called with that particular type as an argument. These functions could be registered at the point when the type that the function is supposed to return is created.

Something like...

class Chroma:

    #
    #
    #

    _callable_funcs = dict()

    def __call__(self, x, x_type=None):

        if callable(x):
            self.__class__._callable_funcs[x_type] = x
        else:
            return self.__class__._callable_funcs[type(x)](self, x)


# This code has not been tested.
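For reference, here is one way the registry idea can be made to run. It is a sketch, not the version from Ophis: the `register` classmethod and the stand-in `Pitch` class are assumptions, and registration is split out of `__call__` so each method does one job.

```python
class Chroma:
    # Maps an argument type to the function that handles it.
    _callable_funcs = {}

    def __init__(self, name):
        self.name = name

    @classmethod
    def register(cls, x_type, func):
        """Record func as the handler for arguments of type x_type (hypothetical helper)."""
        cls._callable_funcs[x_type] = func

    def __call__(self, x):
        # Look up the handler by the exact type of the argument.
        try:
            func = self._callable_funcs[type(x)]
        except KeyError:
            raise TypeError(f"no handler registered for {type(x).__name__}") from None
        return func(self, x)


class Pitch:
    """Stand-in for the real Pitch class."""

    def __init__(self, chroma, octave=0):
        self.chroma = chroma
        self.octave = int(octave)


# Registered where Pitch is defined, so Chroma never has to name Pitch.
Chroma.register(int, Pitch)

C = Chroma("C")
pitch = C(1)
```

Because the lookup uses the exact type, a subclass of int (such as bool) would not find the int handler. That is one of the details a real dispatch mechanism has to handle.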

I got (a version of) this to work, and I was feeling pretty darn proud of myself for thinking of this solution, and implementing it.

Then I had this feeling like this was all very familiar. Maybe I had read about this type of thing?

I quickly discovered three things:

Unfortunately, I have two problems:

  • The @singledispatch decorator only looks at the first argument of a function call. The first argument of a method call is always self. So, out of the box, this doesn't work for instance methods.
  • @singledispatch was added in v3.4, making it still a little newish. Since I'm writing a utility library for others to use, and not my own application, it seems unwise to rely on something that everyone might not have.
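The first limitation is easy to see in a small experiment. The `Greeter` class below is a hypothetical example, unrelated to Ophis; it only shows which argument the dispatcher inspects.

```python
from functools import singledispatch


class Greeter:
    """Hypothetical class used only to show the dispatch behaviour."""

    @singledispatch
    def greet(self, x):
        return "default"

    @greet.register(int)
    def _(self, x):
        return "int"


g = Greeter()
# Dispatch inspects the first positional argument, which is g (self),
# not 5, so the int implementation is never selected.
result = g.greet(5)
```

Here `result` is "default" even though an int was passed, because the dispatcher saw a Greeter.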

But, now I can do two things:

  • See if anyone has already figured out a way to apply @singledispatch to a method. (Someone has.)
  • Potentially re-implement @singledispatch myself, for backwards compatibility.

Right...

# oph_utils.py

from functools import update_wrapper

try:
    from functools import singledispatch
except ImportError:
    # A re-implementation of @singledispatch
    # has been left as an exercise for the reader
    # because I haven't done one yet.
    raise

def method_dispatch(func):
    """
    An extension of functools.singledispatch,
    which looks at the argument after self.
    """
    dispatcher = singledispatch(func)
    def wrapper(*args, **kw):
        return dispatcher.dispatch(args[1].__class__)(*args, **kw)
    wrapper.register = dispatcher.register
    update_wrapper(wrapper, func)
    return wrapper

# chroma.py

import oph_utils

class Chroma():

    #
    #
    #


    @oph_utils.method_dispatch
    def __call__(self, x):
        return self

# pitch.py


import chroma as ch

class Pitch:

    def __init__(self, chroma, octave=0):
          self.chroma = chroma
          self.octave = int(octave)


ch.Chroma.__call__.register(int, Pitch)


# In user code:

ophis.C(0) == ophis.Pitch(ophis.C, 0)
# True

And finally, to encourage this usage...

class Pitch:

    #
    #
    #


    def __repr__(self):
        return "".join([
            repr(self.chroma), "(",
            repr(self.octave), ")"
            ])


# At a terminal...

>>> ophis.Pitch(ophis.C, 0)
C(0)

Feels Pythonic, yes?
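If a project can require Python 3.8 or later, the standard library's functools.singledispatchmethod dispatches on the argument after self without a custom wrapper. The sketch below uses stand-in classes and registers the handler inside the class body, which means Chroma names Pitch directly. That is a different trade-off from the register-from-pitch.py approach above.

```python
from functools import singledispatchmethod  # available from Python 3.8


class Pitch:
    """Stand-in for the real Pitch class."""

    def __init__(self, chroma, octave=0):
        self.chroma = chroma
        self.octave = int(octave)


class Chroma:
    """Stand-in for the real Chroma class."""

    def __init__(self, name):
        self.name = name

    @singledispatchmethod
    def __call__(self, x):
        # Fallback for argument types with no registered handler.
        raise TypeError(f"cannot call a Chroma with {type(x).__name__}")

    @__call__.register(int)
    def _(self, octave):
        # Chosen when the argument after self is an int.
        return Pitch(self, octave)


C = Chroma("C")
pitch = C(0)
```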


Intersection of Non-Empty Sets in Python

Suppose you generate several sets on the fly, and you want to find the elements that are in all the sets. That's easy, it's the intersection of sets.

# One syntax option
result = set_one & set_two & set_three

# Another option
result = set.intersection(set_one, set_two, set_three)

But let's suppose that one or more of your sets is empty. The intersection of any set and an empty set is an empty set. But, that's not what you want. (Well, it wasn't what I wanted, anyway.)

Suppose you want the intersection of all non-empty sets.

List comprehension

If the sets are in a list, you can remove the empties. Then unpack the list into the set.intersection() function.

list_of_sets = [set_one, set_two, set_three]

# Empty sets evaluate to false,
# so will be excluded from list comp.
non_empties = [x for x in list_of_sets if x]

solution_set = set.intersection(*non_empties)

The asterisk before non_empties unpacks the list into a series of positional arguments. This is needed because set.intersection() takes an arbitrary number of sets, not an iterable full of sets. (It's the same asterisk as in *args in function definitions.)
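A worked example with concrete values (the set contents are made up for illustration). It also shows one edge case worth guarding: if every set is empty, the filtered list is empty too, and calling set.intersection() with no arguments raises TypeError.

```python
set_one = {"a", "b", "c"}
set_two = set()            # empty: should be ignored
set_three = {"b", "c", "d"}

list_of_sets = [set_one, set_two, set_three]
non_empties = [s for s in list_of_sets if s]

# Equivalent to set.intersection(set_one, set_three)
solution_set = set.intersection(*non_empties)

# If every set is empty there is nothing to unpack, so fall back
# to an empty set instead of letting set.intersection() raise.
all_empty = [s for s in [set(), set()] if s]
safe_result = set.intersection(*all_empty) if all_empty else set()
```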

(Note: You could use a filter instead of a list comprehension, but Guido thinks a list comprehension is better. I agree.)

With iterable unpacking (tuple unpacking)

In my case, I was generating the sets in my code, and the solution set always contained only one item. And I wanted the item, not a set with the item. So...

# initialize an empty list
list_of_sets = []

# each time I create a set,
# append set to list when it is created,
# instead of naming them individually
list_of_sets.append( thing_that_generates_a_set() )

# drop the empties, find the intersection
# and unpack the remaining single element
solution, = set.intersection(*[x for x in list_of_sets if x])

The comma after solution turns the assignment into a tuple unpacking. If you unpack a collection of one, you get the single item.
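Single-element unpacking is strict: it raises ValueError when the collection holds zero items or more than one. A small sketch, where `unpack_single` is a hypothetical helper name and the parenthesized `(item,) = ...` form is equivalent to `item, = ...`:

```python
(only,) = {"major third"}  # exactly one element: unpacks cleanly


def unpack_single(collection):
    """Return the sole element, or None if there is not exactly one."""
    try:
        (item,) = collection
    except ValueError:
        # Raised for both too few and too many elements.
        return None
    return item
```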

By the way, if you end up with more than one item in your collection, and only want the first item, you can do:

first_item, *_ = some_collection

The * collects any remaining items into a list (it's the same asterisk as in *args and in passing the list to set.intersection() above), and the underscore is used as a convention for "not using this stuff."

# you could have done this instead

first_item, *stuff_i_will_not_care_about = some_collection

I'll be using that *_ below, in the actual code.
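A few concrete cases of starred unpacking, with made-up values. Note that "first" only means something for ordered collections; a set has no defined order, which is why the real code sorts before unpacking.

```python
first_item, *rest = ["perfect", "major", "minor"]
# first_item is "perfect"; rest is the list ["major", "minor"]

# A single-element collection leaves the starred name as an empty list.
lone, *nothing = ("unison",)

# An empty collection still fails: the unstarred name needs a value.
try:
    missing, *_ = []
except ValueError:
    missing = None
```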

Why would you ever do this?

The generalized problem

From a pool of items, there are three attributes to select for. Specifying any two of them should produce one and only one result.

More specifically...

Musical intervals.

A musical interval has:

  • a quality (Major, Minor, Perfect, Augmented, or Diminished)
  • a number (Unison (1), Second (2), Third (3) ... Octave (8))
  • a distance of half_steps (for example, a major third is 4 half steps)

If you know any two of these, you can select the correct one.

Some actual code

class Interval():

  #####################################
  # ... all sorts of things removed ...
  #####################################


  instances = set()
  # all instances of Interval


  @classmethod
  def get_intervals(cls, *, quality=None, number=None, half_steps=None):
      """Return a set of intervals."""

      candidate_sets = []

      candidate_sets.append({x for x in cls.instances if x.quality == quality})

      candidate_sets.append({x for x in cls.instances if x.number == number})

      candidate_sets.append({x for x in cls.instances if x.half_steps == half_steps})

      candidate_sets = [x for x in candidate_sets if len(x) > 0]

      return set.intersection(*candidate_sets)

  @classmethod
  def get_interval(cls, quality=None, number=None, half_steps=None):
      """ Return a single interval."""

      try:
          interval, = cls.get_intervals(quality=quality, number=number, half_steps=half_steps)

      ## if there was not one and only one result
      except ValueError:

          # only select by half_steps
          candidates = [x for x in cls.instances if half_steps == x.half_steps]

          # select the first one,
          # based on quality priority:
          # Perfect, Major, Minor, Dim, Aug
          interval, *_ = sorted(candidates, key=lambda x: x.quality.priority)

      return interval

In the actual code, there's a bunch of other things going on, but this is the general idea.

Another approach

For my specific use case, another approach is simply to not create a set for the unspecified attribute.

if quality is not None:
    candidate_sets.append({x for x in cls.instances if x.quality == quality})

if number is not None:
    candidate_sets.append({x for x in cls.instances if x.number == number})

if half_steps is not None:
    candidate_sets.append({x for x in cls.instances if x.half_steps == half_steps})

In my working code, I actually do both. This allows for a potentially meaningful result even if something is specified incorrectly. I could have decided to let bad input cause explicit failure, but I think I'd rather not in this case.
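To make the combined approach concrete, here is a self-contained sketch. It is not the Ophis code: the namedtuple `Interval`, the module-level `INSTANCES` set, and the `PRIORITY` numbers are all stand-ins invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the real Interval class; the priority
# numbers are illustrative (lower sorts first).
Interval = namedtuple("Interval", "quality number half_steps")
PRIORITY = {"Perfect": 0, "Major": 1, "Minor": 2, "Diminished": 3, "Augmented": 4}

INSTANCES = {
    Interval("Major", 3, 4),
    Interval("Minor", 3, 3),
    Interval("Diminished", 4, 4),
    Interval("Perfect", 4, 5),
    Interval("Augmented", 4, 6),
    Interval("Diminished", 5, 6),
}


def get_intervals(quality=None, number=None, half_steps=None):
    """Intersect one candidate set per specified attribute."""
    wanted = {"quality": quality, "number": number, "half_steps": half_steps}
    candidate_sets = [
        {x for x in INSTANCES if getattr(x, attr) == value}
        for attr, value in wanted.items()
        if value is not None  # skip unspecified attributes
    ]
    candidate_sets = [s for s in candidate_sets if s]  # drop empties
    return set.intersection(*candidate_sets) if candidate_sets else set()


def get_interval(quality=None, number=None, half_steps=None):
    """Return one interval, falling back to half_steps alone."""
    try:
        (interval,) = get_intervals(quality, number, half_steps)
    except ValueError:
        candidates = [x for x in INSTANCES if x.half_steps == half_steps]
        # Still raises ValueError if nothing matches half_steps either.
        interval, *_ = sorted(candidates, key=lambda x: PRIORITY[x.quality])
    return interval
```

With this data, asking only for half_steps=4 matches both a major third and a diminished fourth, so the fallback picks the major third by priority.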

So... what's the point?

This post looks like a tutorial on list comprehension. Or maybe set operations. But really this post is about problem solving while writing code.

The code solution to this problem is really easy... but only if you've figured out the problem you need to solve.

I started with the following problem:

Find the intersection of all non-empty sets, from an arbitrary pool of sets, not knowing which ones would be empty.

So I started Googling variations on that theme. But there aren't any "intersection of just the good sets" functions. Then I tried to start writing a question for Stack Overflow, and as soon as I had written the title, I knew the answer.

Starting with a collection of sets, drop the empty sets and find the intersection of the remaining sets.

As soon as I broke my one problem into two steps, the problem was immediately solved:

  1. Create a new collection without the empties. (List comp.)
  2. Find the intersection of that list.

At the same moment I realized these steps, it also became clear that the original group of sets should be a collection, not just several unrelated objects.

So, the moral of the story is...

If you can't find the solution to your specific problem, restate your problem as a series of steps.

Verbiage

I hate the word verbiage.

First, we need to deal with the fact that it is the wrong word. Most of the time, when people say verbiage, they really mean verbage --- that is, the wording. Verbiage, properly, means excessive wordiness, not the specifics of word choice.

But this isn't what I hate about it. I would hate it just as much if it meant precisely what everyone uses it to mean. My problem is with the idea itself. I hate what people are saying when they say verbiage.

Every time I have ever heard the word verbiage, the person has been talking about the precise way that something is worded. The context is always about improving something.

  • Can you make this more clear by fixing up the verbiage?
  • After you get the first draft of the design done, ask Adam to help you clean up the verbiage.
  • Maybe we can change the verbiage on this form to make it more user friendly.

Without fail, a request to work on the verbiage is symptomatic of a deeply flawed design and engineering process. We got to this point because people were decorating, not designing, and now we are going to try to get out of it by changing the words the user sees.

This causes more problems, of course.

The reason the words aren't clear and precise in the UI is that the mental model developed by the engineering team is either confused or just plain wrong. In order to make the application easy to use, our Verbiage Specialist has to overlay a new mental model --- often, the one that should have been used in the first place. This new mental model, and the collection of verbages that go with it, will be imprecise and incomplete because the Verbiage Engineering Team can't tell the developers to restructure the database and rename all the application's variables. The result is that the UI becomes temporarily easier to use, but at the cost of taking on additional Verbiage Debt. Somewhere deep in an internal wiki or Confluence page is an OVM (Object-Verbiage Mapper) glossary telling you that dev:event_property => user:"Device Status". But nobody reads internal wiki pages, so the problem just gets worse.

You cannot fix an application by redecorating the UI. Fixing the verbiage is just redecorating + technical debt. If you find yourself fixing up the verbiage, the problems are much deeper.

So how do you avoid Verbiage Debt?

Stop treating writers as Verbiage Technicians and think of them as Verbiage Architects. (I'm sure there's a good word for this already.) Your Verbiage Team, along with your Pictures of Things Engineers, need to be involved from the beginning with the design of your application, and they need to be fully-fledged members of the engineering team --- not hired hands, consultants, helpers, or otherwise after-the-facters.

Building software has more to do with creating mental models than it does with writing code. Humans create mental models in language and pictures.

Your language and pictures people are as important as your coders.

Designing vs. Decorating

My wife spent some time in an Interior Design master's degree program. One of the things that frequently frustrated her was the conflation, by people outside the industry, of interior design and interior decorating.

  • "Oh, so like, you're learning how to pick out furniture and stuff."
  • "Can you help me pick paint colors in my bedroom?"
  • "That's cool, like that show on HGTV."

Decorating is primarily about aesthetics --- how things look. Design is about function --- how things work. There is certainly overlap between the professions, but their focus and concern is very different.

At least, though, nearly everyone in the industry --- and certainly everyone at her school --- understood the difference. Since my wife was there, the school has actually changed the name of the program to Interior Architecture, to make the focus more clear.

I'm not sure the software industry as a whole understands the difference between decorating and design. Part of the problem is that we don't use the word "decorator" to describe people with graphics skills and no sense of the underlying software. Everyone is a "designer." The best we have done is to try to make distinctions between "UX Design" and "Graphic Design."

In fact, I think the push in the last decade or so to use the word "UX" is an attempt to make the distinction. Unfortunately, I don't think it has helped. Like Tech Writers calling themselves "Documentation Specialists," the change in label has been driven as much by a desire for a cooler resume as by any real change in practices. The distinction we need to make is not between "graphics" and "UX," and certainly not between "UX" and "UI" (as if those are, you know, actually different things, really). The distinction we need to make is between design and decoration.

Have you ever sat in a redesign review that solved exactly none of the problems of the original design? The new thing looks better, but it functions the same. Decorating.

Have you ever been involved in a process where some non-engineer Product Manager drew pictures of screens and buttons, and then someone with Photoshop skills and no coding experience turned that into a mockup? Decorating.

Have you ever been asked, after the graphics person has completed an entire set of screen mockups, to "help with some of the verbiage" in order to make things more clear? Decorating.

Any process that separates out the work of contributors --- first the engineers do something and then hand it off to the graphics person and then the tech writer writes about it later --- will tend toward decorating. Design requires people to actually talk to each other, preferably in the same room. Design requires that a person drawing and labelling a form input understand the conceptual model the form is interacting with.

I suggest we stop futzing with labels for types of people and buzzwords that feel helpful but aren't. This problem cannot be solved by finding an even cooler replacement word for "UX," and then blogging about how "UX is dead, we're doing XZ now." Just keep "design" and "decoration" in your head as an evaluative tool. Look at how things are being done and ask yourself --- is this designing or is it decorating? Then, if there's too much decorating, don't spend a lot of energy convincing people about the difference. Just begin to change the process.

And don't let someone with Photoshop skills redesign an app they don't understand and have never used.

Docs as Code

Practices

  • Docs are written in plain text formats such as Markdown or reStructuredText.
  • Docs are stored as flat files, not database entries.
  • Docs are authored in a code editor of the writer's choice, not a monolithic authoring application.
  • Docs are kept under version control.
  • Doc versions are organized in parallel to product versions.
  • Docs are built and deployed from source in an automated process that mirrors product deployment.
  • Docs are automatically tested for internal consistency and compliance to style guides.
  • Whenever reasonable, writers use the same tools and processes as developers.
  • Writers are integrated into the development team.
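The automated-testing practice can start as a small script run on every commit. The sketch below assumes Markdown sources; `broken_relative_links` is a hypothetical helper written for this example, and a real project would more likely reach for an existing link checker or style linter.

```python
import re
from pathlib import Path

# Matches Markdown inline links: [text](target)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(docs_dir):
    """Return (file, target) pairs whose relative link target is missing."""
    docs_dir = Path(docs_dir)
    problems = []
    for md_file in sorted(docs_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            # Skip external URLs, mail links, and in-page anchors.
            if SCHEME_PATTERN.match(target) or target.startswith("#"):
                continue
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                problems.append((md_file.relative_to(docs_dir).as_posix(), target))
    return problems
```

A build step could fail whenever the returned list is non-empty, which is the docs equivalent of a failing unit test.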

Benefits

  • Writers have more control over their authoring environment.
  • Less friction in the authoring process.
  • Elimination of inconsistencies between docs and product.
  • Less need for human proofreading.
  • Coordinated releases of docs with product.
  • Developers are more likely to contribute to docs.
  • Writers and developers have more awareness of and respect for each others' work.
  • Authoring and deployment tools are mostly free; hosting requires less overhead.

DocOps Isn't Just the Fun Part

Somewhere in the last year I decided I was into DocOps.

What that really meant for me is that I am into Docs-as-code, which is a related trend, but not quite the same. I care about things like single-source documents (DRY), version control, plain text editing, style linting, and automated deployment. I write little Python or Bash scripts to pipe tools together and customize the output of static site generators. I'm learning a lot, having a lot of fun, and finally weaving together a number of different skill sets and interests I've picked up over the years (writing, coding, project management).

When I was the only writer at a startup, this was all really effective. I could fool myself into thinking I was doing DocOps. And maybe I was, but only in that particular context.

But now I work at a big, hulking enterprise company. And all of a sudden it is clear that DocOps isn't just the fun technology bits, just like how DevOps isn't just about knowing how to deploy Docker on Kubernetes. It's about dealing with people and dealing with organizations.

I just want to stand up my docs somewhere. "Give me SSH access to a directory with a public URL." At the startup I just made a decision and had live docs published my second or third day there. At the enterprise? Not so simple. My tooling has to go through security checks. Engineers have to sign off on deployment processes. Customer service has a vested interest in how documents are delivered. Can we integrate with the Salesforce knowledge base? How do I pip install from behind a firewall?

If I'm into DocOps, this is what I'm into. Not just hacking on writing tools (as much fun as that is), but also being effective in an organization. I was very effective in a startup, where hacking on things was how the organization operated. Now I have to level up and learn how to be effective at scale.

The Real Reason I Love Static Site Generators

There's a lot to like about static site generators like Jekyll, Nikola, and Sphinx.

  • Hosting is much simpler, and can usually be done for free.
  • Static sites are inherently more secure than dynamic ones.
  • Very fast page load times.
  • Authoring in a code editor that I have control over.
  • Markdown and reStructuredText are both faster to type than HTML or rich content in a WYSIWYG editor.
  • Version control.
  • The ability to manage the build and deploy process like code.

There are probably more benefits I'm not thinking of at the moment. When I first started using Jekyll, my main motivation was wanting to simplify hosting and exert control over authoring. I discovered the other benefits along the way, and they have really changed my professional life.

But I've realized there's one thing that has come to matter the most to me:

Static sites revive and make real the notion of a document on the web.

In database-backed CMSes, the pretty URL is a noble lie. Content is smeared around in a database and accessed through ?id=1234 parameters or internal query mechanisms. This is fine, and really the only way to handle massive amounts of content.

But the web was built to serve documents, not database results. In an age where content-as-data is on such hyperdrive that people think a single-page app blog system is a reasonable idea, it is calming to use a technology that works the way the web was always supposed to work.

And this has as much to do with the mental model as with the technology. (Maybe more.) The individual documents that make up a static site are handled as documents before being processed to HTML. If I want to change the content on some blog post, I edit a file on my local computer. I don't have to log in and use an application. It is transparent, and there's a direct relationship between a single file in my source and a single URI on my site. Now it feels like the URI actually identifies a resource, and is not just a cleverly-disguised search pattern.
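That one-file-to-one-URI relationship is simple enough to write down as a function. The sketch below is illustrative only: every generator has its own routing rules, and `source_to_url_path` and its `pretty` flag are hypothetical names, not any tool's API.

```python
from pathlib import PurePosixPath


def source_to_url_path(source_path, pretty=True):
    """Map a source file path (relative to the content root) to a URL path."""
    path = PurePosixPath(source_path)
    if path.stem == "index":
        # An index file stands for its directory.
        parent = path.parent.as_posix()
        return "/" if parent == "." else f"/{parent}/"
    if pretty:
        # "Pretty" URLs drop the extension and end in a slash.
        return f"/{path.with_suffix('').as_posix()}/"
    return f"/{path.with_suffix('.html').as_posix()}"
```

Under these assumed rules, posts/file-names.md maps to /posts/file-names/, and the mapping can be inverted by hand. That is exactly what a database-backed ?id=1234 lookup does not offer.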

I understand why we moved past the web of documents. But if you're producing documents, maybe it's the right model.

File Names

There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton

I cannot help you with cache invalidation.
-- Adam Michael Wood

I recently saw a question about file names in the Episcopal Communicators Facebook Group:

Question about file names.

This is a question about filenames for websites.

When we first developed our website, our consultant told me that when we put a file on there, it's important to give the file a date and a unique and descriptive name.

While that works for some files, it doesn't for others. It caused me to end up with a lot of old files on my website.

What I changed was that I stopped changing file names. So instead of mileage_rates_2016.pdf, I just call it mileage_rates.pdf. That way every link is correct, everywhere on the site.

However, when we link to outside websites, like the wider church's site, we end up with obsolete links. Case in point: the Manual of Business Methods:

We had full_manual_updated_09-30-2013.pdf.

And now the link is full_manual_updated_012815_0.pdf

Is there any need to give dates to files like this? It's important for the organization to archive old versions, but is there any need to have unique names so that websites like ours end up with older versions?

I summed up a few file name best practices, but... I have a lot to say about this topic. File naming is one of those weird little things I have irrationally strong feelings about, and the ubiquity of bad file naming practices is a constant source of rage in my life.


What (and how much) to learn?

I recently wrote that you don't need to attain a high-level of coding skill for learning to code to be useful. A technical writer can see substantial ROI from just learning enough to be dangerous.

Which raises (not begs!) the question: How much is enough? What topics should tech writers know? And how well should we know them?

To start answering that question, I'd like to expand what we mean by "learn to code" or "knowing how to code" to "developer skills" in general. Too much focus on coding overlooks the other highly useful things that developers know and do that tech writers can benefit from.


The Problem with GitHub Pages

I love GitHub Pages. I run this blog, my personal blog, and my music and liturgy blog on it. I used it to host documentation for my most recent tech writing gig.

For a single writer with moderate or better technical skills looking for a simple hosting solution, it's amazing. But, I've recently realized there's a problem with it that makes it ill-suited for multiple collaborators working on complicated documentation. (Or even, as I discovered, a single writer on more than one machine.)
