Using `sum(lst, [])` to flatten a list of lists has quadratic time
complexity. Use `chain.from_iterable()` instead. While not strictly
necessary to improve performance, convert to `map()`.
A test case writing out verilog for a 512k entry FIFO is 120x faster
with this applied.