[Submitted on 8 Oct 2024]
Abstract: Text input is a crucial capability for any modern computing environment, and lightweight augmented reality (AR) glasses are no exception. Lightweight AR glasses are designed to be worn all day, but this form factor means that multiple cameras cannot be used for hand tracking. This constraint creates a need for an additional input device. We propose RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of a gesture trajectory and inertial measurement unit (IMU) sensors to track hand movement. The method offers an intuitive experience similar to the raycast-based mid-air gesture typing used in VR headsets, translating hand movement seamlessly into cursor movement. We also introduce Score Fusion, a novel deep-learning word prediction framework that combines three components to improve accuracy and input speed: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual language model. The framework fuses the scores of the three models to predict the most probable words with greater precision. We conducted comparative and longitudinal studies that demonstrate two key findings. First, RingGesture is effective: it achieves an average text entry speed of 27.3 words per minute (WPM) and a peak of 47.9 WPM. Second, Score Fusion outperforms a conventional word prediction framework, Naive Correction, with a 28.2% improvement in uncorrected Character Error Rate, which leads to a 55.2% increase in text entry speed. RingGesture also received a System Usability Scale score of 83, indicating excellent usability.
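The abstract says that Score Fusion fuses the scores of three models but does not give the fusion rule. As a rough illustration only, the sketch below assumes a weighted sum of log-probabilities. That rule is one common way to combine model scores and is not taken from the paper. The function name `fuse_scores`, the weights, and the candidate probabilities are all hypothetical.

```python
import math

def fuse_scores(candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate words by a weighted sum of log-probabilities.

    candidates maps each word to a tuple of three probabilities:
    (gesture decoder, spelling corrector, language model).
    The weighted log-sum rule is an assumption for illustration,
    not the fusion method described in the paper.
    """
    eps = 1e-12  # floor to avoid log(0)
    fused = {}
    for word, probs in candidates.items():
        fused[word] = sum(
            w * math.log(max(p, eps)) for w, p in zip(weights, probs)
        )
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores for three candidate words
candidates = {
    "hello": (0.60, 0.70, 0.50),
    "hells": (0.65, 0.20, 0.05),
    "jello": (0.30, 0.40, 0.10),
}
ranking = fuse_scores(candidates)
print(ranking)
```

In this toy example the gesture decoder alone slightly prefers "hells", but the spelling and language scores pull "hello" to the top, which is the kind of correction a fused score is meant to provide.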
Submission History
From Junxiao Shen DR
[v1] Tue 8 Oct 2024 12:15:30 (28,169KB)