IAA
Customizable Touch-Less Command System
Concept






Motivation
Users' options for commanding and interacting with modern computing/smart devices (laptops, tablets, smartphones, etc.) have remained largely restricted in terms of customizability, as they are mostly pre-set/pre-defined by manufacturers. A command system in which users can customize/self-define commands has vast potential to improve usability and command efficiency. A touch-less version of such a concept may yield additional advantages, for example, a significant increase in productivity for the visually impaired population compared to traditional voice-directed command systems.
A customizable, touch-less command system may also be extended to more advanced use cases as the world moves into the era of VR/AR.
Data
Data are collected as coordinates of landmarks over the face, left hand, right hand, and pose, extracted with MediaPipe from webcam captures obtained via OpenCV.
Landmark extraction uses MediaPipe.solutions.holistic.
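A minimal sketch of this extraction step (the webcam index and confidence thresholds below are illustrative assumptions, not the project's actual settings):

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture(0)  # default webcam (index assumed)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    ret, frame = cap.read()
    # MediaPipe expects RGB input; OpenCV captures BGR
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = holistic.process(image)
    # results holds face_landmarks, pose_landmarks,
    # left_hand_landmarks, and right_hand_landmarks
cap.release()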
Specifically, for each command (three commands are currently tested), landmarks from 30 webcam captures (videos), each 30 frames long, are recorded and saved for training. Each captured frame yields a total of 1,662 landmark values (33 pose landmarks × 4 values, 468 face landmarks × 3 values, and 21 landmarks × 3 values per hand).
Creating directories for saving training data to be collected:
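A sketch of the directory setup, assuming a hypothetical root folder MP_Data and placeholder command names (the actual names are not given here):

import os

DATA_PATH = 'MP_Data'                              # root folder (name assumed)
actions = ['command_1', 'command_2', 'command_3']  # 3 tested commands (names assumed)
no_sequences = 30                                  # 30 videos per command
sequence_length = 30                               # 30 frames per video

for action in actions:
    for sequence in range(no_sequences):
        # one folder per command per video, e.g. MP_Data/command_1/0
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)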
Collecting training data for each command:
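A sketch of the collection loop, reusing the assumed names from the sketches above; the keypoint-flattening helper is an assumption about how the 1,662 values per frame are assembled:

import os
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic
DATA_PATH = 'MP_Data'
actions = ['command_1', 'command_2', 'command_3']
no_sequences, sequence_length = 30, 30

def extract_keypoints(results):
    # flatten all holistic landmarks into one 1,662-value vector,
    # zero-filling any part that was not detected in the frame
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    for action in actions:
        for sequence in range(no_sequences):
            for frame_num in range(sequence_length):
                ret, frame = cap.read()
                image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                results = holistic.process(image)
                # one .npy file per frame, e.g. MP_Data/command_1/0/0.npy
                np.save(os.path.join(DATA_PATH, action,
                                     str(sequence), str(frame_num)),
                        extract_keypoints(results))
cap.release()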
Algorithm
A Long Short-Term Memory (LSTM) neural network is currently leveraged for learning.
Please refer to the code file for details.
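As a rough illustration, such an LSTM classifier could be set up in Keras as below; the layer sizes and optimizer are assumptions, and only the input shape (30 frames × 1,662 values) and the three-command output follow from this page:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

num_commands = 3  # three currently tested commands

model = Sequential([
    # stacked LSTM layers read the 30-frame sequence of 1,662 landmark values
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(num_commands, activation='softmax'),  # one probability per command
])
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])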
Current Results
The current implementation is able to recognize actions that deviate little from the training cases. It is less accurate for use cases that deviate more, e.g., when the user is too close to or too far from the webcam.
Potential improvements to be implemented/tested are proposed in the "Future Versions" section (see below).
Version History
- IAA0.0: Implements a minimum working pipeline (utilities, data collection, modeling, training, evaluation, and live demo) for customized data collection and action recognition of hand gestures, with demonstrable but suboptimal performance
Future Versions
- IAA0.1: Implements image recognition of gestures on top of the action recognition from IAA0.0 (the system attempts recognition based on both actions and static images). This will potentially improve accuracy when the hands are at or near the end positions of an action/hand motion
- IAA0.2: Tunes the neural network(s) to improve accuracy and recognition speed
- IAA0.3: Implements additional steps to introduce noise based on real collected data to improve robustness against real use cases
- IAA1.0: Implements aesthetic refinements and marching-squares-inspired UI visuals
- IAA2.0: Introduces custom options for interactions based on the z-axis (distance to the webcam). This will allow more command variations with the same number of hand gestures