Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, November 19, 2017

Voice recognition software: Boon or Bane?


For me, functional voice recognition software has been one of the most elusive goals of my many years' fascination with computers. I always dreamed that I could simply talk and the computer would magically convert my speech into text, creating a more effortless way to write. But the reality has always been far from the imagined goal. Until quite recently, the transcriptions from voice recognition or VR software have been cranky and, most of the time, more trouble than they are worth.

From time to time over the years, I have written about my experiences with voice recognition software, and now it is time, once again, to return to the subject. The programs available today are just barely adequate. However, to achieve the present level of marginal functionality, both the hardware and the software had to reach a level of sophistication and speed that is only now becoming available. If you want to use voice recognition software, you will need the fastest computer you can possibly afford and a relatively expensive software program, and even then, the product will still be barely satisfactory.

There are a number of different levels of voice recognition software in common use today. The most basic level, almost a toy, involves recognizing voice commands and some speech. Good examples of these types of programs are Apple's iOS program Siri and Google's Android program Assistant. These programs are designed to provide vocal interaction with a computer but provide only marginal text recognition. We use these programs for dictating short text messages and have a good time laughing at the mistranscriptions and mistakes.

The next level includes programs such as the integrated voice recognition software in both Apple's macOS operating system and Microsoft's Windows operating system. Both of these programs do an adequate job of recognizing the spoken word, but both have only very rudimentary editing capabilities. From my experience, most people are not even aware that their computer can transcribe speech into a variety of existing programs. The lack of basic editing capabilities renders these programs useless for anything other than casual note-taking.

Many years ago, IBM initiated a program to develop speech recognition, which culminated in a product called ViaVoice. Eventually, IBM evidently abandoned the product and sold it off to Nuance. Nuance has very slowly improved its VR programs over the years, culminating in programs for both Windows and Mac, now called simply Dragon but previously known as Dragon NaturallySpeaking.

Unfortunately, several of the low-level, rudimentary programs, such as Siri and Google Assistant, are touted as voice recognition software. If a program produces text that requires more time to edit than it takes to type by hand, then it is useless. That is the case with Apple, Microsoft and Google at this point in time. The following is an example of using Apple's voice dictation program to dictate this paragraph.
Unfortunately,Several of the full level,Riddimentary program such as SiriAnd Google assistant,Are touted as a voice recognition software.If a program produces textThat requires more time toEditThen it takes to type by hand,Then it is useless.That is the case with Apple,MicrosoftAnd GoogleAt this point in timeThe following is an example of using apples voice dictation program to read this paragraph.
As you can see, the program interprets commas as periods and messes up the word spacing. To go back and "fix" the dictation is a waste of time. If I wanted to actually use this dictated text or make modifications, I would spend an inordinately large amount of time doing so. I have a hard enough time editing what I write without throwing in a bunch of time-consuming errors.

That brings us down to the only consumer-level product available today: Nuance's Dragon. First of all, it is a relatively expensive program. The Mac version is presently $300.00, and upgrades are usually nearly as expensive as re-purchasing the program. In addition, the program is buggy and needs to be restarted periodically to stop it from adding in random characters. The Mac version's bugs seem to persist from one upgrade to the next. But it is apparently almost the only game in town. To add insult to injury, the program is licensed to only one computer or device, so people like me who use two, three or more personally owned computers are limited to using the program on only one unless we want to spend another $300 to add another computer. Interestingly, the PC version starts at $59.00.

During the past few months, I have been using Dragon on my Mac to write many of my blog posts. I am certain that no one could tell when I am using the software and when I am not. For me, the increased level of productivity and speed is worth the price, but I am surprised that there is not a little bit more competition out there. Voice recognition is becoming ubiquitous, but until the editing capability catches up with the recognition, the programs will not replace Dragon.

One last note. There are a lot of different versions of Dragon on sale on Amazon.com. These are almost all older, even less useful versions of the program. Be careful when purchasing the program.

2 comments:

  1. Does Dragon require training? If so, it won't recognize random voices well.

    I had an hour-long recording of an interview of my uncle with two people there asking questions and commenting. So there were three voices. I tried Microsoft Dictate, spent several days looking online and tried some "state of the art" open source and university packages. None came even close to doing an adequate job, and none could recognize a change of voice, which is essential to record in an interview. I thought of trying Dragon, but I didn't want to pay for what I expected to be another failure.

    Yes, I agree, this is one field that can be hugely improved. Any young keen programmers out there who want to make a fortune? A working voice recognition algorithm is definitely needed.

    Replies
    1. I worked with one researcher who was recording interviews he was making in care centers. He found Dragon Naturally Speaking to work fairly well, even with a change in the people speaking. But, yes, it does require training and you have to add vocabulary items from time to time.
