A UI testing driver is a tool that helps automate interactions with a UI. It is typically used in automated functional and regression test suites. Essentially, a driver is a tool that can automatically drive through a UI.
I have worked on Frankenstein, have discussed and worked with Ketan on Twist while he was developing SWTBot, and have brainstormed with Hakan on his new web testing driver Krypton (sadly, I cannot find any public repository of this yet; I will post it when I get hold of it). While doing these, I found that all UI testing drivers are fundamentally similar, and if you understand these basic ideas, you can very easily work with any driver or implement your own for some new platform.
I would like to formalize the concepts of a UI testing driver in this post. This should give you a good mental model to understand the working of a driver. Before we start, let's answer a basic question:
How would you test a UI manually?
Let's say you want to log in to a typical application. You need to enter your username and password into the text field and the password field respectively, and then click on the "Sign In" button. If you want to do it manually, you first search for or locate the username field. Then you interact with it, i.e. type your username into the field. You would do the same with the password field. You would then search for or locate the "Sign In" button and click it.
The two main tasks that you do manually are:
1) Locate what you want and
2) Interact with it
These form the basic operations of any UI testing driver. If you can automate these two operations, you can essentially test any UI automatically, i.e. have a driver for that UI.
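The login walkthrough above can be sketched as a toy driver that exposes only these two operations. This is a minimal illustration, not how any particular driver is built: the `Element` and `Driver` classes and their method names are hypothetical, and the "UI" is just a list of in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A hypothetical UI element: a kind, a name, and some state."""
    kind: str
    name: str
    text: str = ""
    clicked: bool = False


@dataclass
class Driver:
    """A hypothetical driver built on the two basic operations."""
    elements: list = field(default_factory=list)

    def locate(self, name):
        # Operation 1: find the element(s) that match a locator (here, a name).
        return [e for e in self.elements if e.name == name]

    def type_into(self, name, text):
        # Operation 2: interact with whatever was located.
        for element in self.locate(name):
            element.text = text

    def click(self, name):
        # Operation 2 again, with a different interaction.
        for element in self.locate(name):
            element.clicked = True


ui = [
    Element("text", "username"),
    Element("password", "password"),
    Element("button", "Sign In"),
]
driver = Driver(ui)
driver.type_into("username", "alice")
driver.type_into("password", "secret")
driver.click("Sign In")
```

Every interaction goes through `locate` first, which is the same order a person follows when doing the login by hand.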
1. Locate Elements
The first thing a driver needs to provide is a mechanism to locate or search for elements. Most drivers use a concept called a locator. A locator is like an address. Depending on how precise the locator is, a driver can get one or more UI elements that match it. I want to stick with the same term, locator, because it describes what it does pretty accurately. The actual syntax of locators is left up to the driver implementers.
Examples:
- .name could match an element whose CSS class is name.
- Table >> Chapter 1 >> Questions could match an element which is called "Questions" and is a list item of "Chapter 1" which in turn is a sub-list of "Table"
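To make the two example locators concrete, here is a rough sketch of how a driver might resolve them against a tree of elements. The `Node` class, the `locate` function and the parsing rules are assumptions made for illustration; real drivers define their own syntax and element model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical UI element with a label, CSS classes and children."""
    label: str
    css_classes: tuple = ()
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def locate(root, locator):
    """Return every node that matches one of two assumed locator forms."""
    if locator.startswith("."):
        # ".name" form: match on CSS class.
        wanted = locator[1:]
        return [n for n in walk(root) if wanted in n.css_classes]
    # "A >> B >> C" form: follow labels from parent to direct child.
    steps = [step.strip() for step in locator.split(">>")]
    candidates = [n for n in walk(root) if n.label == steps[0]]
    for step in steps[1:]:
        candidates = [
            child
            for parent in candidates
            for child in parent.children
            if child.label == step
        ]
    return candidates


questions_1 = Node("Questions")
questions_2 = Node("Questions")
name_field = Node("user", css_classes=("name",))
root = Node("window", children=[
    Node("Table", children=[
        Node("Chapter 1", children=[questions_1]),
        Node("Chapter 2", children=[questions_2]),
    ]),
    name_field,
])

by_path = locate(root, "Table >> Chapter 1 >> Questions")
by_class = locate(root, ".name")
loose = locate(root, "Questions")
```

The precise path returns a single element, while the looser locator `Questions` returns both list items, which is the "one or more elements" behaviour described above.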
Repeatability
An important property of locators is that, given a UI with some structure, a driver should return the same element or set of elements for a given locator every time, even when new elements are added to that UI or unrelated elements are deleted from it.
Example: If the locator text_with_label['Foo'] identifies a text field whose label is 'Foo', adding a new text field, say, after this field should not change what element the driver returns.
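The difference between a repeatable and a non-repeatable locator can be sketched as follows. Both functions are hypothetical stand-ins: `text_with_label` mimics the locator from the example, and `text_at_index` is a position-based locator added here only for contrast.

```python
def text_with_label(fields, label):
    """Locator keyed on a stable property: the field's label."""
    return [f for f in fields if f["label"] == label]


def text_at_index(fields, index):
    """Locator keyed on position, which shifts when the layout changes."""
    return [fields[index]]


foo = {"label": "Foo"}
bar = {"label": "Bar"}
original_ui = [foo, bar]
# The UI changes: one new field appears before 'Foo' and one after it.
changed_ui = [{"label": "Before"}, foo, {"label": "After"}, bar]

stable_before = text_with_label(original_ui, "Foo")
stable_after = text_with_label(changed_ui, "Foo")
fragile_before = text_at_index(original_ui, 0)
fragile_after = text_at_index(changed_ui, 0)
```

The label-based locator returns the same field in both versions of the UI, while the index-based one silently starts pointing at a different field.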
While this property is important for the sake of stability and non-flaky tests, it's not mandatory. Sometimes non-deterministic locators, which rely on nearness in terms of distance on the UI or on relative positioning, are used. This can be quite handy when testing a UI that is not so well written and cannot be changed easily.
Absolute Positions
Historically, tools like QTP used the absolute co-ordinates of a given UI component as a locator when recorded. This can be extremely flaky and should not be used as is. Most tools do not use this approach anymore and I have mentioned this just for the sake of legacy.
2. Interact with Elements
Once a driver knows what element it is dealing with, you can specify what you want to do with it. For example, you can click a button, choose an option from a drop-down, drag and drop an image onto a Trash Can icon, etc. In order to do these interactions, a driver needs to simulate what a user does. From simulating events to actually moving the mouse and sending keyboard events, a driver can choose to do it in a few different ways. The following are some ways of doing this:
- OS Native events - In this approach, the driver sends OS-level native events to the element identified by the locator. For example, in order to click on a button, the mouse pointer is actually moved to the button and a real mouse button click is sent at the OS level.
  - The good thing about this is that it is as close to what happens in the real world as you can get in automated testing.
  - The bad thing about this is that in order to implement it, you would need to write very low-level code or use libraries like Java's AWT Robot. Either way, the application under test needs to have focus, which means you cannot run multiple tests on the same box, and development becomes annoying.
  - Example: Frankenstein
- Framework Native events - In this approach, the driver programmatically sends to an element all the events that would make sense for a given interaction. For example, in SWTBot, in order to click on a button, the driver sends SWT events such as MouseIn, MouseButtonDown, MouseButtonUp, MouseOut, MouseButtonClick, etc. to the button in the order in which the real events would be sent.
  - The good thing about this approach is that it is very easy to develop. It can be run without giving the application under test focus. It works for the 98% case.
  - The bad thing about this is the absence of the perceived safety of doing the real thing. No one would be firing these events manually in a production environment, which makes this seem like a very high-level integration test. Though this should not matter, it is sometimes brought up as an issue. In the 2% case, the issue could be that the event listener is hooked onto a different element, maybe a container, while the driver is sending the events to the element matched by the locator.
  - Example: Selenium, Sahi, SWTBot
- Application Native events - In this approach, the driver sends events native to that application to an element. For example, inside a browser, you can send browser-specific native events such as COM events in IE, XPCOM events in Firefox, etc. You would be working at a fairly high level compared to OS-level events, but still get the benefits of native events. This can be thought of as a middle ground between the first and second approaches. WebDriver uses this approach.
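The framework native approach can be sketched as a driver that fires a fixed sequence of events directly at a widget's listeners. This is only an illustration: the `Widget` class is hypothetical, and the event names and their order are copied from the description above, not from the real SWT API.

```python
class Widget:
    """A hypothetical widget that keeps listeners per event name."""

    def __init__(self, name):
        self.name = name
        self.listeners = {}

    def on(self, event, callback):
        self.listeners.setdefault(event, []).append(callback)

    def notify(self, event):
        # Call listeners registered on this widget only; no bubbling.
        for callback in self.listeners.get(event, []):
            callback(event)


# Event names and order as listed in the text (assumed, not real SWT constants).
CLICK_SEQUENCE = [
    "MouseIn",
    "MouseButtonDown",
    "MouseButtonUp",
    "MouseOut",
    "MouseButtonClick",
]


def click(widget):
    """Simulate a click by sending each event programmatically, in order."""
    for event in CLICK_SEQUENCE:
        widget.notify(event)


button = Widget("Sign In")
container = Widget("form")

button_events = []
container_events = []
for name in CLICK_SEQUENCE:
    button.on(name, button_events.append)
container.on("MouseButtonClick", container_events.append)

click(button)
```

After `click(button)`, `button_events` holds the whole sequence, but the listener hooked onto `container` never fires. That is the 2% case: the events went to the located element, while the application was listening somewhere else.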
For the first and third approaches, the current position of the located element can be evaluated at the time of the interaction. Though this would be an absolute position, this is still OK as it won't be persisted. The event is then sent to the evaluated co-ordinate. This way, a driver would simulate a user's interaction.
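A rough sketch of evaluating the position just before sending the event might look like this. The `Bounds` class and the `send_native_click` function are hypothetical; the latter stands in for whatever OS-level or application-level event API a driver would really call, and here it only records the co-ordinates.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Hypothetical on-screen rectangle of a located element."""
    x: int
    y: int
    width: int
    height: int


def center(bounds):
    """Return the point in the middle of the element."""
    return (bounds.x + bounds.width // 2, bounds.y + bounds.height // 2)


sent = []


def send_native_click(x, y):
    # Stand-in for a real native event API; it only records the target point.
    sent.append((x, y))


def click(layout, locator):
    # The position is looked up on every call and never stored in the test.
    x, y = center(layout[locator])
    send_native_click(x, y)


layout = {"Sign In": Bounds(100, 200, 80, 20)}
click(layout, "Sign In")

# The UI is rearranged; the same locator now resolves to a new position.
layout["Sign In"] = Bounds(300, 50, 80, 20)
click(layout, "Sign In")
```

Because the test only ever mentions the locator `"Sign In"`, moving the button changes the co-ordinate the event is sent to without changing the test.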
You can pretty much map what most UI testing drivers do to the two basic operations above.