Background
I inherited a test framework containing ~50 tests. It was broken, having received no attention for a year. I spent my first couple of months getting my feet wet, getting it running (even once the suite was running, most of the tests didn’t work), and adding tests. By December, we had 78 “tests” (an inflated number, since many of those “tests” were really setup/teardown steps) asserting 208 things. A full run took ~23.5 minutes, and given the way it was coded, that couldn’t be improved: tests were dependent on one another and the code was spaghetti.
At the urging of another developer, I decided to bite the bullet and build a new framework from the ground up. Less than two months later (20-ish focused days, accounting for other priorities and time off), I’ve come close to matching the old suite with 68 true, atomic tests performing ~180 assertions (the actual number is a bit higher due to some loops) in roughly the same amount of time.
But wait, not faster, you say? Ah-ha, my friend, you’ve asked a wonderful question. Aside from the new framework being far more maintainable, ridding ourselves of test dependencies allows for parallelization.
I won’t rewrite what others have written on the details of Xcode parallel testing, but I will dive into my results and how I optimized them.
Performance
Xcode allows you to run up to eight parallel simulated devices and also offers an “auto” option. I’m on a top-of-the-line MacBook Pro from late 2019 (16″, 2.3 GHz 8-core i9, 16 GB RAM). Is eight better than five? Is four better than five? Let’s dive in. All numbers are in seconds, approximate, and from a single run, so far from scientific.
Note: I did say above that my runs are ~23 minutes, yet the first result below is nearly 28. I did this testing with a “better” simulated device than I normally run; a “lesser” simulated device runs faster, which I’ll get into later.
Without parallelization: 1658
Eight devices: Things ground to a halt and there were timeouts/failures all over the place
Six devices: 740
Five devices: 754
Four devices: 796 (“auto” uses this number)
Three devices: 879
Five seems to be the sweet spot, but things felt a little laggy at that number; I think it’s wise to trust “auto,” as it’ll likely produce the best combination of reliability and speed.
55% improvement with parallelization alone.
How it Works
Xcode launches however many simulators you’ve asked for and gives each one a test class (a logical set of tests defined by you). When a simulator has run all the tests in that class, it’s given another one. Some test classes run longer than others, so you’ll find some simulators sitting idle once they’re through all their tests while another continues chugging along. More on that in the optimization section below.
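To make that concrete, here’s a minimal sketch of what “test class” means here (the class and test names are hypothetical): the unit of work a simulator receives is an XCTestCase subclass, not an individual test.

```swift
import XCTest

// Each XCTestCase subclass is a unit of distribution: a simulator runs
// ALL of a class’s tests before it’s handed another class.
final class LoginTests: XCTestCase {
    func testValidLogin() { /* ... */ }
    func testInvalidPassword() { /* ... */ }
}

final class SearchTests: XCTestCase {
    func testBasicSearch() { /* ... */ }
    func testEmptyResults() { /* ... */ }
}
```

With four simulators and, say, six classes, the first four classes are dispatched immediately and the remaining two go to whichever simulators free up first.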
Randomization
Worth noting: running the tests in a random order did not affect execution time. The reason is that Xcode still breaks the work down by class. The same classes get assigned to the same simulators regardless of the order in which tests run within a class, and so they take the same amount of time.
Device Differences
This was perhaps the most surprising thing to me. Unlike Android, where all emulated devices are more or less the same apart from screen size, there were noticeable differences in execution time depending on the simulated device used. It seems the more high-end the device, the slower the tests run; perhaps the system allots more resources to simulating the device itself and therefore has less left over to run the automation? Results (single device):
iPad Pro 12.9: 1658
iPad Pro 9.7: 1459
iPhone 11: 1250
A 25% improvement just from running on a “lesser” device (put another way, the top-end iPad takes 33% longer than the iPhone 11). Not to say we won’t continue running on other devices, but if we want the fastest sanity check possible, we’re going to use a small phone.
Optimization
As mentioned above, breaking tests down by class can leave one simulator running for minutes after all the others have completed their classes and there are no more to be had. Our suite suffered from this greatly, as I had lumped all our “edit the data” tests together into a single class. Three of the simulators would finish their work in five to seven minutes while one would continue chugging along for eleven. By breaking the tests down into smaller classes (all editing of one type of data in one class, all editing of another in another, and so on; see the sketch below), we get the following (in seconds):
Before optimization: 661
Optimized: 482
27% improvement from optimizing one class, and there’s still more I could do (the theoretical limit appears to be ~350 seconds: add up the seconds each simulator spends working and divide by four), but I’d have to make choices that defy logical organization in the name of speed, and I ain’t down with that.
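Here’s a minimal sketch of that refactor (the class names and setup steps are hypothetical, not our actual suite): shared setup moves into a base class, and each type of data gets its own small class that Xcode can schedule independently.

```swift
import XCTest

// BEFORE: one monolithic class. Whichever simulator draws this class
// is stuck running every edit test while the others go idle.
//
// final class EditDataTests: XCTestCase {
//     func testEditContact() { /* ... */ }
//     func testEditAppointment() { /* ... */ }
//     // …and dozens more
// }

// AFTER: common setup lives in a base class; each data type gets its
// own small class, so Xcode can spread them across simulators.
class EditDataTestCase: XCTestCase {
    override func setUp() {
        super.setUp()
        continueAfterFailure = false
        // shared app-launch/login steps would go here
    }
}

final class EditContactTests: EditDataTestCase {
    func testEditContactName() { /* ... */ }
}

final class EditAppointmentTests: EditDataTestCase {
    func testEditAppointmentTime() { /* ... */ }
}
```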
Another reason not to over-optimize at this point is that test cases will continue to get added to the suite, which will change how classes get scheduled and finish, and perhaps negate any gains made now. Theoretically, I have a free buffer for the next few test classes I add, since they’ll only fill in the idle gaps.
The short of it: smaller test classes will run faster, but don’t sacrifice logical organization for speed.
Conclusion
With the addition of parallelization, device benchmarking, and some light optimization, our test automation suite has gone from an unwieldy 23.5-minute behemoth to a svelte eight-minute thing of beauty, a 66% overall performance improvement (nearly 3x faster).
Special Considerations
- Atomic tests should not share data sets. When running in parallel, you will run into collisions. This required only very minor tweaking for our suite once I optimized; previously, the data-editing tests would never edit the same data at the same time, but now they can, so each test needs its own data to mess with (see the sketch after this list).
- This parallelization does not seem to be supported by all continuous-integration vendors. I’ve only run a quick test with CircleCI, but it did not work there. I also know fastlane doesn’t support parallelization at all, though I have hacked together my own solution with it for a proof of concept. I have a lot more investigating to do here. While we only run our full suite at night, when timing doesn’t really matter, it would be nice to be able to run more than a smoke test per commit.
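To illustrate that first point, here’s a minimal sketch of one way to keep test data atomic (the record name and helper calls are hypothetical): each test creates a uniquely named record in setUp rather than sharing a fixture.

```swift
import XCTest

// Hypothetical example of per-test data isolation: each test creates
// (and cleans up) a uniquely named record instead of sharing a fixture,
// so parallel simulators can never collide on the same data.
final class EditContactTests: XCTestCase {
    private var contactName = ""

    override func setUp() {
        super.setUp()
        // A UUID suffix guarantees uniqueness across parallel runs.
        contactName = "TestContact-\(UUID().uuidString)"
        // createContact(named: contactName) // hypothetical helper
    }

    override func tearDown() {
        // deleteContact(named: contactName) // hypothetical helper
        super.tearDown()
    }

    func testEditContactName() {
        // Edit only the record this test created; no other test touches it.
    }
}
```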