
SoundSurf

October 6, 2024

I still remember the day my all-time favorite vocalist was in a tragic motorcycle accident after a car ignored a traffic sign. In 2011, he was left paralyzed from head to toe. Even speaking became a struggle because he could not control any part of his body, and no one expected him to sing again. But his father built a machine that manually applies pressure to his stomach, allowing him to produce sound. The process is painful and dangerous, as his ribs break repeatedly, yet he never stopped practicing.

Now he can sing the songs he used to perform, and he keeps improving every day. Beyond regaining his voice, he continued studying music, completed his master's and PhD, and became a university professor. His life has always inspired and motivated me to push harder, but I had never really considered his daily struggles until tragedy struck my own family.

About three years ago, my uncle suddenly collapsed in the middle of the road on his way to work and remained unconscious for weeks. At first we thought it was a heart attack, but it was not. The hospital could not determine what had happened and later attributed it to a rare disease caused by a bacterial infection.

Fortunately, he woke up, but being unconscious for so long caused nerve damage that left half of his body paralyzed. Watching his family suffer made me realize how close tragedy can be; it can happen to anyone. Then I remembered the singer using his mouth to press a button on his iPad while enduring pain, and I thought about how challenging life must be for people with limited mobility. Things most of us consider normal and easy, like using a computer or phone, are impossible for them without help. I also reflected on Elena Mukhina's story and how patients can end up in dangerous situations when their guardians are not around.

I wanted to create a solution to help people with disabilities access basic tools independently and protect themselves in emergencies. During StormHacks 2024, I finally found a team to bring my vision to life. Together, we developed SoundSurf, a voice-activated, AI-powered web browser designed to help people with physical disabilities access the Internet.

We chose to build a full browser with Electron.js instead of a Chrome extension, because a standalone browser does not require users to grant extension permissions, a step that can be a barrier for anyone who cannot use a mouse. The browser interface uses simple HTML and CSS to keep it user-friendly and accessible. On a laptop monitor, the interface looks small and does not fill the screen. This is intentional: the design is optimized for iPad dimensions, since some users may not have other devices and may be unable to adjust the screen size themselves. Sizing it for the iPad was meant to give our target users a consistent and accessible experience.

For speech-to-text, we integrated OpenAI's Whisper. Chromium does offer built-in speech recognition, but using it outside the Chromium browser requires a GCP API key, which we found restrictive. We also added audio filtering to suppress background noise so that voice commands still worked in a noisy hackathon environment.
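The post does not say which filter SoundSurf used, so the following is only a minimal sketch of one simple option: an energy-based noise gate that silences low-energy frames before audio is sent for transcription. It assumes 16 kHz mono samples normalized to [-1.0, 1.0]; `noise_gate`, `has_speech`, the 400-sample frame (25 ms at 16 kHz) and the 0.02 threshold are all hypothetical choices, not the project's code.

```python
import math
from typing import List, Sequence

# Hypothetical defaults: 400 samples = 25 ms at an assumed 16 kHz sample rate,
# and an RMS threshold that would need tuning for each microphone and room.
FRAME_SIZE = 400
THRESHOLD = 0.02


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(samples: Sequence[float],
               frame_size: int = FRAME_SIZE,
               threshold: float = THRESHOLD) -> List[float]:
    """Replace every frame whose RMS energy is below `threshold` with silence.

    Quiet frames are zeroed rather than dropped, so the timing of the
    remaining speech is preserved.
    """
    gated: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        if frame_rms(frame) >= threshold:
            gated.extend(frame)
        else:
            gated.extend([0.0] * len(frame))
    return gated


def has_speech(samples: Sequence[float],
               frame_size: int = FRAME_SIZE,
               threshold: float = THRESHOLD) -> bool:
    """True if at least one frame is loud enough to be worth transcribing."""
    return any(
        frame_rms(samples[i:i + frame_size]) >= threshold
        for i in range(0, len(samples), frame_size)
    )
```

A check like `has_speech` would also let a client skip the transcription request entirely when a recording contains only background noise. A real deployment would more likely use a trained voice-activity detector, but the gate shows the basic idea.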

For basic commands, we used custom functions without relying on a large language model. For more complex interactions, such as finding and interacting with a specific button or form field, we built a FastAPI server that uses OpenAI's GPT-4 to analyze the page's raw HTML and return the appropriate accessibility selectors for the browser to act on.
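The split between cheap local commands and LLM-assisted ones can be sketched as a small router that pattern-matches the transcript first and only falls back to the server when nothing matches. The command phrases, the `Action` type and `route_command` are hypothetical illustrations; the post does not list SoundSurf's actual command set.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    kind: str                       # e.g. "scroll", "history", "navigate", "search", "llm"
    argument: Optional[str] = None  # command argument, or the raw transcript for "llm"


# Hypothetical grammar for "basic" commands that need no language model.
_SIMPLE_COMMANDS = [
    (re.compile(r"^scroll (up|down)$"), "scroll"),
    (re.compile(r"^go (back|forward)$"), "history"),
    (re.compile(r"^open (\S+)$"), "navigate"),
    (re.compile(r"^search for (.+)$"), "search"),
]


def route_command(transcript: str) -> Action:
    """Map a speech transcript to a local action, or defer to the LLM server."""
    # Speech-to-text output often has capitals and trailing punctuation.
    text = transcript.strip().lower().rstrip(".!?")
    for pattern, kind in _SIMPLE_COMMANDS:
        match = pattern.match(text)
        if match:
            return Action(kind, match.group(1))
    # Anything else (e.g. "click the sign-up button") needs page context,
    # so it would be sent to the (hypothetical) LLM-backed endpoint.
    return Action("llm", transcript.strip())
```

For example, `route_command("Scroll down.")` returns `Action("scroll", "down")`, while `route_command("Click the blue sign-up button")` returns an `"llm"` action carrying the original text. Keeping common commands local avoids a network round trip and token cost for the most frequent requests.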

One challenge we faced was handling dynamic websites built with frameworks like React or Vue. Their large, deeply nested HTML consumed too many OpenAI tokens, which highlighted the need for a more efficient approach, potentially fine-tuned models tailored to web interactions.
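The post does not say whether the HTML was reduced before reaching GPT-4. One common mitigation, distinct from the fine-tuning idea in the paragraph, is to strip a page down to its interactive elements and a few identifying attributes before building the prompt. The sketch below uses Python's standard `html.parser`; the tag and attribute lists and the name `prune_html` are assumptions chosen for illustration, not part of SoundSurf.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed selection: elements a voice user would click or type into,
# and attributes that help a model build a selector for them.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = {"id", "name", "type", "role", "aria-label", "placeholder", "href"}
VOID_TAGS = {"input"}  # elements with no closing tag or text content


def _format_open(tag: str, summary: str) -> str:
    return f"<{tag} {summary}>" if summary else f"<{tag}>"


class InteractivePruner(HTMLParser):
    """Collects one compact line per interactive element, dropping everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []
        self._open: Optional[Tuple[str, str]] = None  # (tag, attribute summary)
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        summary = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v)
        if tag in VOID_TAGS:
            self.lines.append(_format_open(tag, summary))
        else:
            self._flush()  # a nested interactive element closes the previous one
            self._open = (tag, summary)

    def handle_data(self, data):
        # Visible text is only kept while inside an interactive element.
        if self._open is not None and data.strip():
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self._flush()

    def close(self) -> None:
        super().close()
        self._flush()  # emit an element left unclosed at the end of the input

    def _flush(self) -> None:
        if self._open is None:
            return
        tag, summary = self._open
        self.lines.append(f"{_format_open(tag, summary)}{' '.join(self._text)}</{tag}>")
        self._open = None
        self._text = []


def prune_html(html: str) -> str:
    """Return a much shorter description of the page's interactive elements."""
    parser = InteractivePruner()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Layout wrappers, class names, scripts, and styles, which make up much of a framework-rendered page, never reach the model, so the prompt shrinks roughly in proportion to how much of the page is non-interactive. The trade-off is that context outside the kept elements (headings, surrounding labels) is lost.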

Looking back, we realized that some of our browser logic was poorly organized: too much functionality sat in preload.js instead of being distributed to renderer.js. Going forward, we plan to streamline that logic, fine-tune models, and improve the browser's efficiency to make it even more accessible for our target users.

It was not an easy journey, and my team retired from hackathons after this project, but we are all proud of creating something meaningful. We won the hackathon, and our project earned the Best Social Good Hacks award. I will never forget when the Chief Operating Officer of Brave came up to me, held my hand, and told me how impressed she was with the project and how much she appreciated that we thought about helping others.
