After saying last week that I should be able to control my iPad with my eyes, I have just discovered the “Enable Head Pointer” option on my Mac.
(Here’s a screenshot of the interface, for future reference.)
This allows my cursor to be controlled by the position of my head, as captured by my webcam.
I also have it set up to left-click when I raise my eyebrows. This uses an option called “Enable alternative pointer actions” that recognises various facial expressions.
It works surprisingly well! A little jittery maybe, and there’s a cognitive disconnect because the computer responds to the position of my head – but not the direction of my pupils.
Using the head pointer for a short while, I found that it worked well “leaning back,” but got confusing when I picked up the mouse again or started typing. So…
- There’s an option to activate the head pointer based on a facial expression. Now I have it set up to turn on/off based on scrunching my nose.
- I still need to enter text (a tweet, say, or a music search). So I’ve set Dictation to activate by double-tapping the Option key – I couldn’t see a way to do this with a facial expression.
Some observations:
As an input control it’s clunky, but nothing some machine learning wouldn’t sort out. For example, multitouch on smartphones is great at rejecting spurious input and working out where you intended to tap, rather than just taking the raw x/y coordinates of physical contact. (Try using your phone with the screen upside down. It’s next to impossible, because touch handling is built around the shape of capacitive contact and an assumed position of your eyes.) So if you got the software to consider both head movement and gaze direction, and trained it by looking at how users iterate towards intended targets, I’m sure you would end up with an almost magical “do what I mean” input mode.
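To make that idea concrete, here’s a minimal, entirely hypothetical sketch (in Python) of what that fusion step might look like: blend a head-pose estimate with a gaze estimate, smooth out the jitter, and snap to a nearby clickable target. The weights and the snap radius are made-up numbers standing in for what a real system would learn by watching users correct the cursor towards their intended targets – this isn’t how macOS does it, it’s just the shape of the idea.

```python
# Hypothetical "do what I mean" pointer: fuse head and gaze estimates,
# smooth the jitter, then snap to a nearby clickable target.
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def blend(head: Point, gaze: Point, w_gaze: float = 0.6) -> Point:
    """Weighted average of the two estimates (gaze is faster but noisier)."""
    return Point((1 - w_gaze) * head.x + w_gaze * gaze.x,
                 (1 - w_gaze) * head.y + w_gaze * gaze.y)


def smooth(prev: Point, new: Point, alpha: float = 0.3) -> Point:
    """Exponential smoothing to calm frame-to-frame jitter."""
    return Point(prev.x + alpha * (new.x - prev.x),
                 prev.y + alpha * (new.y - prev.y))


def snap(cursor: Point, targets: list[Point], radius: float = 50.0) -> Point:
    """Jump to the nearest known target if it's within `radius` pixels --
    the pointer equivalent of multitouch guessing which key you meant."""
    def dist2(t: Point) -> float:
        return (t.x - cursor.x) ** 2 + (t.y - cursor.y) ** 2
    nearest = min(targets, key=dist2, default=None)
    return nearest if nearest is not None and dist2(nearest) <= radius ** 2 else cursor


# One frame of the loop: fuse, smooth, snap.
cursor = Point(500, 300)
head_estimate, gaze_estimate = Point(520, 310), Point(560, 330)
buttons = [Point(555, 328), Point(100, 100)]

cursor = snap(smooth(cursor, blend(head_estimate, gaze_estimate)), buttons)
print(cursor)  # lands on the button: Point(x=555, y=328)
```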
It is so close to being something I would use in preference to a mouse… or rather, alongside one. What this tells me is that there is scope for an interface where you hop between mouse, gaze, speech, and back again. Why should I have to lean in just to open a calendar event, tap on a Zoom link, and join a call?
Nose scrunch, look, eyebrows, look, eyebrows, done. Try it if you can. It’s amazing.
Can I see an interface like this becoming standard? No, not on desktop computers… but I think it’s worth perfecting because of where it might lead. Might it be useful to control a smart TV – how would it work for a group? Or, when I was speculating last year about voice control for lightbulbs and stoves (but without sharing data with the cloud), maybe fluidly swapping between gaze and voice would be the ideal interface to the smart home.
Final observation: It’s worth noting that the “Head Pointer” is an Accessibility feature on the Mac. If you really sweat the details on accessibility, it turns out there is often broad applicability.
I’m a big fan of Microsoft’s Inclusive Design efforts – check out their diagram of the Inclusive Design approach. In a nutshell, their view is that disability is contextual. Somebody may permanently have one arm, temporarily have an injured arm, or situationally be holding a baby – solve the coffee shop door problem for any one of these groups and you solve it for all of them.
And as someone who is often holding a toddler, whose eyes are already not that great and are getting worse, who has trouble hearing, and who has an inability to recognise faces that honestly I should get checked out at some point, it’s a design approach I can get behind.