TL;DR CircleCI and XCUITest are not a match made in heaven, and you’re definitely not going to get parallelization. I spent a solid three working days figuring this out, and while I didn’t produce what I was hoping for, I learned a lot along the way.

Background

My company uses CircleCI, we make an iOS app, and my job is to automate the UI testing of that with Apple’s native framework, XCUITest.

What Works Locally

My 2019 16-inch MBP @ 2.3GHz with 16GB RAM can run our test suite of 125 test cases perfectly reliably in 25 minutes using four simulated devices at once. This is awesome and fast, but immutable infrastructure is great and the future; I shouldn’t have to maintain a physical machine.

Our Initial CircleCI Setup

Setting up our tests in Circle was fairly straightforward. Their documentation is good, and it really pushes Fastlane. The first step was our dev team getting the app building. They did, and this gave me a good starting point for plugging in the automated testing. Once set up, I found the tests were horribly unreliable.

After digging and connecting with support, a “large” resource was allocated to us: a bigger virtual machine whose specs (but not performance) match my laptop, and which requires approval on their end. Things ran much better, but not perfectly; I still get random hangs and failures.

This has been “fine” for a few months, but the execution time for a single simulator is over an hour. As such, we only run the tests at night.

How can I get this faster so we can run all the tests more frequently? Parallelization! Remember how I did that locally to great success? The next few sections will go into the various options for this and why they didn’t work.

Native Parallelization w/ Fastlane

Xcode supports parallelization out of the box! Tick a toggle in your project, fire the job off, and it does all the work for you. This increases the load on the testing node, though, and as you may have guessed, a node that is already running unreliably is not going to get any better under more load. With this solution half my tests were failing with timeouts.

Fastlane Parallelization

No such thing. Unfortunately, this would be the logical place to drive all of this as it’s our interface into issuing the actual commands to the system. Nope.

Native Parallelization w/o Fastlane

In short, get rid of Fastlane and use Xcode parallelization directly. This was riddled with build errors (module imports). Without fully rebuilding our CI in a way Circle does not suggest, I’m not going to resolve this. There is probably a good reason they recommend Fastlane, so I’m not going to bother. Even if I did get this working, it’d use a single test node that I already know to be working at its limit with a single thread of execution.
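For reference, this is roughly what a direct invocation looks like. It is a minimal sketch, not the command I actually ran: the workspace, scheme, and device names are borrowed from the snippets later in this post, the worker count of 4 is an assumption, and the helper name `xcodebuild_parallel_args` is made up for illustration. The `-parallel-testing-enabled` and `-parallel-testing-worker-count` flags are xcodebuild's own switches for parallel testing. None of this addresses the module import errors or the load on a single node.

```ruby
require "shellwords"

# Build the argument list for a direct xcodebuild run with Xcode's own
# parallel testing switched on. Workspace, scheme, and device mirror the
# examples in this post; the worker count is an assumption.
def xcodebuild_parallel_args(workers:, test_class: nil)
  args = [
    "xcodebuild", "test-without-building",
    "-workspace", "MyApp.xcworkspace",
    "-scheme", "MyAppUITests",
    "-destination", "platform=iOS Simulator,name=iPad Pro (9.7-inch)",
    "-derivedDataPath", "build",
    "-parallel-testing-enabled", "YES",
    "-parallel-testing-worker-count", workers.to_s
  ]
  # Optionally narrow the run to a single test class
  args << "-only-testing:MyAppUITests/#{test_class}" if test_class
  args
end

# Print a shell-safe command line that could be pasted into a CI step
puts Shellwords.join(xcodebuild_parallel_args(workers: 4))
```

Building the arguments as an array and joining them with `Shellwords` keeps the destination string, which contains spaces and parentheses, correctly escaped.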

CircleCI Parallelism

I tried this long ago so the details are foggy, but I recall it spawning a bunch of nodes that would build the app themselves (expensive to us), and presumably run as unreliably as my final solution. I also don’t think there was a clean mechanism for using this with Fastlane since all the actual test running logic is in it. I’d have to create and maintain a job step for each test class.

I do plan to revisit this as I learned a lot on this effort that may help here. Unfortunately, while this may make something reliable, I think we’re still looking at 40+ minutes to complete.
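When I do revisit it, one way to avoid a job step per test class might be to let each parallel container work out its own share. CircleCI sets `CIRCLE_NODE_INDEX` and `CIRCLE_NODE_TOTAL` in jobs that use `parallelism`, so a small script can pick classes by position. This is an untested sketch: the helper name `classes_for_node` and the class list are placeholders, and I have not run it on Circle.

```ruby
# Pick the test classes this container should run, round-robin by position.
def classes_for_node(all_classes, node_index, node_total)
  all_classes.each_with_index
             .select { |_, i| i % node_total == node_index }
             .map(&:first)
end

# Placeholder class names matching the ones used elsewhere in this post
all_classes = (1..22).map { |n| "MyTestClass#{n}" }

# CircleCI sets these in parallel jobs; the fallbacks let the script run locally
node_index = Integer(ENV.fetch("CIRCLE_NODE_INDEX", "0"))
node_total = Integer(ENV.fetch("CIRCLE_NODE_TOTAL", "1"))

# Print one Target/Class identifier per line for this container
classes_for_node(all_classes, node_index, node_total).each do |name|
  puts "MyAppUITests/#{name}"
end
```

Fastlane's `only_testing` option accepts an array of identifiers in this `Target/Class` form, so the output could feed a single lane per container. Each container would still need the build, which is the expensive part.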

Build Your Own

After many trials and some terribly redundant code, I’ve refactored this solution into a pretty neat little bundle. It uses Fastlane and runs a job for each test class, which is an array of strings I need to supply (could probably come up with a script to automate this, but my list of test classes doesn’t change too much so not much maintenance for now).
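The script I mention could be as small as the sketch below. It assumes the UI test sources live in a folder named `MyAppUITests` (an assumption, adjust to taste) and that every test class subclasses `XCTestCase` directly; classes that inherit from a shared base class would be missed. The helper names are hypothetical.

```ruby
require "json"

# Return the names of classes that directly subclass XCTestCase in Swift source text
def test_class_names(swift_source)
  swift_source.scan(/^\s*(?:final\s+)?class\s+(\w+)\s*:\s*XCTestCase\b/).flatten
end

# Walk a directory tree and collect the test class names from every Swift file
def discover_test_classes(dir)
  Dir.glob(File.join(dir, "**", "*.swift"))
     .flat_map { |path| test_class_names(File.read(path)) }
     .uniq
     .sort
end

# "MyAppUITests" is an assumed folder name; pass a different path as the first argument
puts JSON.generate(discover_test_classes(ARGV.fetch(0, "MyAppUITests")))
```

The output is a JSON array of strings, which is also a valid YAML flow sequence, so it can be pasted over the hand-maintained list.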

1. Build the app once for distribution to all the test nodes via a persisted workspace. Without this, each node would build the app itself, which would cost us money and only further slow things down. Debugging where your build is going when you can’t see inside the machine doing the work is a bit tricky, but this was accomplished:

config.yml

  build-for-ui-testing:
    <<: *defaults
    steps:
      - checkout
      - dependencies
      - run:
          name: Build for UI testing
          command: bundle exec fastlane build_for_ui_testing
      - persist_to_workspace:
          root: ~/project/build
          paths: Build/Products/Debug-iphonesimulator/*

fastfile

  lane :build_for_ui_testing do
    run_tests(
      workspace: "MyApp.xcworkspace",
      derived_data_path: "build",
      scheme: "MyAppUITests",
      build_for_testing: true
    )
  end

2. Attach the shared workspace to pick up the build, and install only a limited set of our dependencies since we’re not building:

config.yml

  ui-tests:
    <<: *defaults
    parameters:
      test_class:
        type: string
        default: 'all'
    steps:
      - run:
          name: Pre-start Simulator
          command: xcrun instruments -w "iPad Pro (9.7-inch)" || true
      - checkout
      - ui-test-dependencies
      - attach_workspace:
          at: ~/project/build
      - run:
          name: Fastlane - run_<< parameters.test_class >>_ui_tests
          command: bundle exec fastlane run_ui_tests ios_version:latest test_class:<< parameters.test_class >>
      - store_test_results:
          name: Store test results
          path: output/scan

fastfile

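  lane :run_ui_tests do |options|
    test_class = options[:test_class]
    # "all" is the config.yml default: leave only_testing unset so the whole target runs
    only_testing = test_class == "all" ? nil : "MyAppUITests/#{test_class}"
    run_tests(
      reset_simulator: true,
      derived_data_path: "build",
      test_without_building: true,
      scheme: "MyAppUITests",
      device: "iPad Pro (9.7-inch)",
      only_testing: only_testing
    )
  end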

3. Create a matrix workflow to fire off a job for each test class (this saved a lot of repetitive code):

config.yml

workflows:
  version: 2.1
  ui-testing:
    jobs:
      - build-for-ui-testing
      - ui-tests:
          requires:
            - build-for-ui-testing
          matrix:
            parameters:
              test_class: ["MyTestClass1", "MyTestClass2", "MyTestClass3",
                           "MyTestClass4", "MyTestClass5", "MyTestClass6",
                           "MyTestClass7", "MyTestClass8", "MyTestClass9",
                           "MyTestClass10", "MyTestClass11", "MyTestClass12",
                           "MyTestClass13", "MyTestClass14", "MyTestClass15", "MyTestClass16",
                           "MyTestClass17", "MyTestClass18", "MyTestClass19", "MyTestClass20",
                           "MyTestClass21", "MyTestClass22"]

And it was beautiful! Kind of…

You see all those purple jobs? They’re “not running.” I think by spinning up 22 nodes at once, I’ve brought Circle to their knees. Some of these will never start, some will hang completely, and others will run in a super flaky state (I’ve yet to have a fully successful run).

I did try setting back to a “medium” sized resource to see if I could put less load on Circle, but that only reminded me why we needed to be running on “large” nodes; so many timeouts for no good reason.

With this solution, our 69-minute, single-threaded, but mostly reliable run takes 35 minutes and isn’t reliable. Even if the 35-minute run were reliable, that’s too long to be running often. Remember, my laptop does this work in 25 minutes.

Had this worked better, the next step was to figure out how to get clean notifications out of this; our current setup Slacks a pass/fail per job, but with 24 total jobs, that’s noisy.

UPDATE: As soon as I posted this and pushed my final commit to this experimental branch, the entire workflow succeeded for the first time.

Conclusion

CircleCI doesn’t support XCUITest parallelization. Based on my searching for help on the topic, either nobody has done this, or they tried, failed, and didn’t bother writing about it. I’ll be sharing this write-up with Circle support in hopes they can tell me what I did wrong or they can improve their service to support this.

Continuing Thoughts Part 1

I’ve shared this with my team and a prevailing thought is that even though I’ve split the jobs to run in parallel, they may all still be running on the same machine, hence running so poorly. The question is if this is the case, how can we get things running across multiple machines? Obviously our build would need to persist to more than a workspace (maybe a storage bucket somewhere out there?), but what would that mean for signing? I worry we’re then more or less “distributing” a development build, which I don’t expect to work.

My brain is a bit fried from this work so I’m going to take a few days away from it (unless I hear from Circle support) and give the parallelism option another look. I’m not hopeful. I am very fortunate to work in a place that lets me experiment like this. I’ve probably not mentioned it, but we’re hiring!

Continuing Thoughts Part Deux

I spent more time on this, as I promised I wouldn’t… Every job appears to have been receiving its own VM to run on. It’s up to the service provider to spin those up in a timely fashion to support my usage. I can no longer entertain the suggestion that I should think about physical devices on their end; the whole point of giving this work to their systems is that they’re better equipped to handle maintenance and allocation of systems than I am. Funny enough, the first other person I’ve ever heard of attempting to deal with this popped up on my radar today. Two of my last three workflows have run flawlessly in 33 minutes, but a 67% success rate isn’t close to cutting it.

Part 3 of Thoughts

Towards the end of my experimentation above, I came across this Fastlane “Test Center” plugin. I was out of time at the moment and generally dislike adding layers to anything I do; this would have to wait. Fast forward to now: I had a free day to play with this plugin. It’s absolutely awesome, the developer is very active, and I got it doing what I wanted locally. Push it up to Circle… and it doesn’t work the same way… not even close. Parallel runs simply don’t work. I bounced around the plugin’s discussions, issue reports, and even chatted with the developer, and concluded… Circle does things weird and again I can’t do what I want to do without adding even more hack-like complexity. I’ll experiment with that kind of manual test batching next time I have time, but that’s likely far off as I need to build an Android test framework for now.
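For when I do get back to it, manual batching could start from something like the sketch below: group the test classes into a handful of batches of roughly equal duration, so a few nodes each run a batch instead of 22 nodes each running one class. The per-class durations here are invented for illustration, not measurements from our suite, and `balance_batches` is a hypothetical helper.

```ruby
# Greedy "longest first" batching: sort classes by duration, then always add
# the next class to whichever batch currently has the smallest total.
def balance_batches(durations, batch_count)
  batches = Array.new(batch_count) { { classes: [], total: 0 } }
  durations.sort_by { |name, minutes| [-minutes, name] }.each do |name, minutes|
    lightest = batches.min_by { |b| b[:total] }
    lightest[:classes] << name
    lightest[:total] += minutes
  end
  batches
end

# Hypothetical per-class durations in minutes; not measurements from this post
durations = { "MyTestClass1" => 8, "MyTestClass2" => 5, "MyTestClass3" => 4,
              "MyTestClass4" => 3, "MyTestClass5" => 2 }

balance_batches(durations, 2).each_with_index do |batch, i|
  puts "batch #{i}: #{batch[:classes].join(', ')} (#{batch[:total]} min)"
end
```

Real durations would have to come from previous runs, and the greedy approach only approximates an even split, but it is simple enough to maintain by hand.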