Week 7 - June 22 - 26

Boom. Week 7 completed.

We started this week with a brief team meeting to set the week’s action plan and get Mike and me rolling. After discussion, we decided that this week, my focus would be getting the transcript up and running with no errors or bugs.

As I mentioned last week, the transcript code had an error in the text streaming for the RTT feature in WebRTC: every keystroke was sent as a separate line, which was not user friendly. So we needed to fix that to improve the UI. Before Week 6 ended, Norman drafted some code to fix it by setting a timer to detect inactivity - if there is no typing for 5 seconds, it prints all of the content that was streamed as one line. It had some bugs and errors, but it was something to work on. So I started Monday tackling and cleaning up the code to get the text to stream fluently. It didn’t take long - by the end of Monday, I had the code working beautifully (thanks to Norman for drafting it)!
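For anyone curious, here’s a rough sketch of the idea in TypeScript. All the names here are made up for illustration, not our actual code - each keystroke resets a 5-second countdown, and only when the sender has been idle that long does the buffered text get committed as one line:

```ts
// Sketch of the inactivity timer, with made-up names. Each RTT
// keystroke resets a 5-second countdown; only when the sender has been
// idle that long does the buffered text get committed as one line.
const IDLE_MS = 5000;
let buffer = "";
let idleTimer: ReturnType<typeof setTimeout> | undefined;

function onRttCharacter(ch: string): void {
  buffer += ch;
  if (idleTimer !== undefined) clearTimeout(idleTimer); // still typing, restart
  idleTimer = setTimeout(commitBuffer, IDLE_MS);
}

function commitBuffer(): void {
  if (buffer.length > 0) {
    const line = document.createElement("div");
    line.textContent = buffer; // one clean line instead of one per keypress
    document.getElementById("transcript")?.appendChild(line);
    buffer = "";
  }
  idleTimer = undefined;
}
```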

After I finished my major focus of the week - getting the transcript running - I went over to help Mike with his ASR. I touched on this a couple of times in previous blogs, but for the past 7 weeks my focus has been on Microsoft Azure and its ASR, while Mike was working on Google Cloud Platform and the Web Speech API (WSA). We were really doing the same things with our ASRs: connecting the ASR to WebRTC, getting it to stream fluently in the text caption area, and getting it connected to the transcript. So most of the time, we would take turns making breakthroughs and writing code for the same tasks - i.e. the transcript - and when that happened, we would share code and help each other out. This time it was my turn to help Mike and get his WSA ASR up and running for the transcript.

However, when working on his ASR, Mike and I realized that his captioning was not fully functional. The text stream from remote users didn’t work properly: the text would print repeatedly. So we had to take a step back in WSA’s progress and fix the problem before we could connect it to the transcript option. Luckily, I had worked a lot with the captioning and transcript features of the ASR in WebRTC, so I was able to use that experience to go through the code and comb out the errors. Eventually, we got the captions up and running and connected them to the transcript code that I had created for the MS Azure ASR. Check! Another big task down for the week.
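The fix boiled down to treating interim and final results differently. Here’s a sketch of the pattern with the Web Speech API (assuming a Chrome-style webkitSpeechRecognition; the element ids are placeholders): interim results overwrite a single caption element instead of being appended on every event, which was what caused the repeats.

```ts
// Sketch of the repeated-text fix using the Web Speech API.
const recognition = new (window as any).webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

recognition.onresult = (event: any) => {
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // Final text gets appended to the transcript exactly once.
      const line = document.createElement("div");
      line.textContent = result[0].transcript;
      document.getElementById("transcript")?.appendChild(line);
    } else {
      interim += result[0].transcript; // collect partials, don't append them
    }
  }
  const caption = document.getElementById("caption");
  if (caption) caption.textContent = interim; // overwrite, so nothing repeats
};

recognition.start();
```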

Wait. You thought I was done, huh? Nope. After getting that done, Mike and I met with Dr. Raja again to discuss where to go next. Dr. Raja brought up the testing/evaluation that needed to be done for our product development. See, as mentioned on my homepage, the goal of this project is to improve the usability and quality of captioning by finding the best possible choice of Automatic Speech Recognition and interface. So for the past 7 weeks, Mike and I were working on developing the technology product of our research project. We were trying to find the best ASR and make that the product, paired with the captioning/transcript interface of WebRTC. After 7 weeks of implementing the ASRs and dropping those that did not work, we discovered that MS Azure provided the best ASR experience for users.

So we had that figured out; next was to find the best interface for users to view captioning alongside the RTT feature that WebRTC already contained. Now it was time for us to create a test - an A/B test - to gather data/evaluations from users about our product and see what aspects of it are good and what needs to be improved. We decided we would create two environments/versions of our product - one with captioning directly in each participant’s window, and another with a transcript that overlays the videos. The two versions would go to group A and group B, and our testers would tell us what each interface experience is like.
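To give a feel for it, here’s a tiny sketch of how the two builds could differ behind a single flag. All names here are illustrative, not our real code:

```ts
// Version A shows captions on each participant's video tile;
// version B appends everything to one transcript overlay.
type CaptionMode = "per-window" | "transcript-overlay";
const CAPTION_MODE: CaptionMode = "per-window"; // "transcript-overlay" in version B

function showCaption(text: string, participantId: string): void {
  if (CAPTION_MODE === "per-window") {
    const tile = document.getElementById(`caption-${participantId}`);
    if (tile) tile.textContent = text; // version A: caption under each video
  } else {
    const line = document.createElement("div");
    line.textContent = text;
    document.getElementById("transcript-overlay")?.appendChild(line); // version B
  }
}
```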

Since I was finished with my tasks, and Mike had to work on another one, I was put in charge of creating the two versions. At first, I thought it would be easy, right? Just have one without the transcript option, and one with the transcript already on screen, without a button to show/hide it. Yeah, you would think, but nope. Apparently, in the process of creating the transcript option for all the ASRs, we had missed an error: there was a delay in receiving the remote users’ text stream. The code set up for MS Azure’s ASR had two functions for getting the STT service - the recognizing function and the recognized function. In the local user’s view, there was no problem, because we had the code print the recognizing text, then remove it and replace it with the final/recognized text with punctuation and capitalization. But for the remote users’ text, the code only sent the recognized text through the peer connection, which meant that if a user were to speak for 5 minutes straight, everyone else would see no text until the user stopped talking and the recognized function got called. That was a huge error, and we had to fix it before we could create the two versions for the A/B testing.
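To make the two functions concrete, here’s a sketch using the actual event names from the Azure Speech SDK for JavaScript (microsoft-cognitiveservices-speech-sdk). The caption and peer-sending helpers are placeholders I made up for illustration, not our actual code:

```ts
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

declare function updateLocalCaption(text: string): void; // hypothetical UI helper
declare function sendToPeers(msg: object): void;         // hypothetical data-channel send

const speechConfig = sdk.SpeechConfig.fromSubscription("YOUR_KEY", "YOUR_REGION");
const recognizer = new sdk.SpeechRecognizer(
  speechConfig,
  sdk.AudioConfig.fromDefaultMicrophoneInput()
);

// Fires repeatedly while the user speaks: partial, unpunctuated text.
recognizer.recognizing = (_sender, e) => {
  updateLocalCaption(e.result.text); // local view updates live
  // The bug: nothing was sent to peers here, so remote users saw
  // no text until the speaker finally paused.
};

// Fires once per utterance: final text with punctuation and capitalization.
recognizer.recognized = (_sender, e) => {
  if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
    updateLocalCaption(e.result.text);
    sendToPeers({ kind: "recognized", text: e.result.text }); // the only text peers got
  }
};

recognizer.startContinuousRecognitionAsync();
```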

Well, very lucky and grateful for me, Norman came to the rescue yet again! He was able to write up code that would send the recognizing text to everyone, and then once it detected recognized text, replace the recognizing text with the recognized text, so there would be no repeated text - just a final clean one. However, he only wrote the code for the captioning in each participant’s window, so I needed to extract the code and implement it in the transcript feature. Though, just like last week, Friday came by and the end of the week was here before I could really tackle the code for the transcription.
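Here’s a sketch of what the receive side of that approach looks like, with message shapes and element ids that are my assumptions for illustration: recognizing text keeps overwriting a single pending line, and the recognized text replaces it once at the end.

```ts
type CaptionMsg = { kind: "recognizing" | "recognized"; text: string };

let pendingLine: HTMLElement | null = null;

function onCaptionMessage(raw: string): void {
  const msg: CaptionMsg = JSON.parse(raw);
  const transcript = document.getElementById("transcript");
  if (!transcript) return;

  if (msg.kind === "recognizing") {
    if (!pendingLine) {
      pendingLine = document.createElement("div");
      transcript.appendChild(pendingLine);
    }
    pendingLine.textContent = msg.text; // keep overwriting while they speak
  } else if (pendingLine) {
    pendingLine.textContent = msg.text; // swap in the clean, punctuated text
    pendingLine = null;                 // next utterance starts a fresh line
  }
}

// Wired up to WebRTC like: dataChannel.onmessage = (e) => onCaptionMessage(e.data);
```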

Overall, the week was a great one. We made a lot of progress and took a step forward in our research project. I will start next week yet again working on getting the transcript up and running effectively. Hopefully by then there won’t be any errors with the ASR, and we can move forward with the evaluation part of this project. Talk to you all in a week! :)

Written on June 26, 2020