Intersection of Non-Empty Sets in Python
Suppose you generate several sets on the fly, and you want to find the elements that are in all the sets. That's easy, it's the intersection of sets.
# One syntax option result = set_one & set_two & set_three # Another option result = set.intersection(set_one, set_two, set_three)
But let's suppose that one or more of your sets is empty. The intersection of any set and an empty set is an empty set. But, that's not what you want. (Well, it wasn't what I wanted, anyway.)
Suppose you want the intersection of all non-empty sets.
List comprehension
If the sets are in a list, you can remove the empties. Then unpack the list into the set.intersection()
function.
list_of_sets = [set_one, set_two, set_three] # Empty sets evaluate to false, # so will be excluded from list comp. non_empties = [x for x in list_of_sets if x] solution_set = set.intersection(*non_empties)
The asterisk before non_empties
unpacks the list into a series of positional arguments. This is needed because set.intersection()
takes an arbitrary number of sets, not an iterable full of sets. (It's the same asterisk as in *args
in function definitions.)
(Note: You could use a filter instead of a list comprehension, but Guido thinks a list comprehension is better. I agree.)
With iterable unpacking (tuple unpacking)
In my case, I was generating the sets in my code, and the solution set always contained only one item. And I wanted the item, not a set with the item. So...
# initialize an empty list list_of_sets = [] # each time I create a set, # append set to list when it is created, # instead of naming them individually list_of_sets.append( thing_that_generates_a_set() ) # drop the empties, find the intersection # and unpack the remaining single element solution, = set.intersection(*[x for x in list_of_sets if x])
The comma after solution
turns the assignment into a tuple unpacking. If you unpack a collection of one, you get the single item.
By the way, if you end up with more than one item in your collection, and only want the first item, you can do:
first_item, *_ = some_collection
The *
indicates a variable number of positional arguments (it's the same asterisk as in *args
and in passing the list to set.intersection()
above), and the underscore is used as a convention for "not using this stuff."
# you could have done this instead first_item, *stuff_i_will_not_care_about = some_collection
I'll be using that *_
below, in the actual code.
Why would you ever do this?
The generalized problem
From a pool of items, there are three attributes to select for. Specifying any two of them should produce one and only one result.
More specifically...
Musical intervals.
A musical interval has:
- a quality (Major, Minor, Perfect, Augment, or Diminished)
- a number (Unison (1), Second (2), Third (3) ... Octave (8))
- a distance of half_steps (for example, a major third is 4 half steps)
If you know any two of these, you can select the correct one.
Some actual code
class Interval(): ##################################### # ... all sorts of things removed ... ##################################### instances = set() # all instances of Interval @classmethod def get_intervals(cls, *, quality=None, number=None, half_steps=None): """Return a set of intervals.""" candidate_sets = [] candidate_sets.append({x for x in cls.instances if x.quality == quality}) candidate_sets.append({x for x in cls.instances if x.number == number}) candidate_sets.append({x for x in cls.instances if x.half_steps == half_steps}) candidate_sets = [x for x in candidate_sets if len(x) > 0] return set.intersection(*candidate_sets) @classmethod def get_interval(cls, quality=None, number=None, half_steps=None): """ Return a single interval.""" try: interval, = cls.get_intervals(quality=quality, number=number, half_steps=half_steps) ## if there was not one and only one result except ValueError: # only select by half_steps candidates = [x for x in cls.instances if half_steps == x.half_steps] # select the first one, # based on quality priority: # Perfect, Major, Minor, Dim, Aug interval, *_ = sorted(candidates, key=lambda x: x.quality.priority) return interval
In the actual code, there's a bunch of other things going on, but this is the general idea.
Another approach
For my specific use case, another approach is simply to not create a set for the unspecified attribute.
if quality is not None: candidate_sets.append({x for x in cls.instances if x.quality == quality}) if number is not None: candidate_sets.append({x for x in cls.instances if x.number == number}) if half_steps is not None: candidate_sets.append({x for x in cls.instances if x.half_steps == half_steps})
In my working code, I actually do both. This allows for a potentially meaningful result even if something is specified incorrectly. I could have decided to let bad input cause explicit failure, but I think I'd rather not in this case.
So... what's the point?
This post looks like a tutorial on list comprehension. Or maybe set operations. But really this post is about problem solving while writing code.
The code solution to this problem is really easy... but only if you've figured out the problem you need to solve.
I started with the following problem:
Find the intersection of all non-empty sets, from an arbitrary pool of sets, not knowing which ones would be empty.
So I started Googling variations on that theme. But there aren't any "intersection of just the good sets" functions. Then I tried to start writing a question for Stack Overflow, and as soon as I had written the title, I knew the answer.
Starting with a collection of sets, drop the empty sets and find the intersection of the remaining sets.
As soon as I broke my one problem into two steps, the problem was immediately solved:
- Create a new collection without the empties. (List comp.)
- Find the intersection of that list.
At the same moment I realized these steps, it also become clear that the original group of sets should be a collection, not just several unrelated objects.
So, the moral of the story is...
If you can't find the solution to your specific problem, restate your problem as a series of steps.
Comments
Comments powered by Disqus