Voice Recognition eBlock Proposal

Time Estimation:

Project Complexity and Design Description:

We want to create a voice eBlock that can interface with the rest of the eBlock set. This eBlock would take voice commands from either any user or one specified user, depending on the mode selected. In normal mode, the block would listen for a specific two-word phrase from any user. The first word would be its “name”, which is programmed by the user or set through DIP switches. Once the device hears its name, it awaits a command: either ON (device outputs a “Yes”) or OFF (device outputs a “No”). Because each device is named, the user does not need to worry about multiple voice eBlocks interfering with each other. All of this is done with no training and no need for a PC.
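
The two-word behavior in normal mode can be pictured as a small state machine. The following is only a sketch in Python for illustration, not the planned firmware: the class name `VoiceBlock`, the method `hear`, and the policy of returning to the name-waiting state after any second word are assumptions, and the recognizer is assumed to deliver one already-recognized word at a time.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_NAME = auto()     # listening for the block's own name
    WAIT_COMMAND = auto()  # name heard, waiting for "on" or "off"


class VoiceBlock:
    """Hypothetical two-word command handler: <name> then "on" or "off"."""

    def __init__(self, name, output="No"):
        self.name = name.lower()
        self.state = State.WAIT_NAME
        self.output = output  # eBlock output signal, "Yes" or "No"

    def hear(self, word):
        """Feed one recognized word and return the current output."""
        word = word.lower()
        if self.state is State.WAIT_NAME:
            if word == self.name:
                self.state = State.WAIT_COMMAND
        else:
            if word == "on":
                self.output = "Yes"
            elif word == "off":
                self.output = "No"
            # Assumed policy: any second word, valid or not, sends the
            # block back to waiting for its name.
            self.state = State.WAIT_NAME
        return self.output


# Two blocks with different names hear the same phrase.
lamp = VoiceBlock("lamp")
fan = VoiceBlock("fan")
for spoken in ["lamp", "on"]:
    lamp.hear(spoken)
    fan.hear(spoken)
print(lamp.output, fan.output)  # prints "Yes No"
```

Only the block named "lamp" changes its output, which is the non-interference property described above.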

In security mode, the device does the same thing, but only for a specified user. That user would first have to train the block on his or her voice, after which the block would function only for that user. Some security measures will be put in place to prevent someone from retraining the block to circumvent security mode.

The range of the device would be approximately 3-4 meters with a slightly raised voice. Background noise would not affect the device (within reason).

Possible Features (Time permitting):

Low-Power mode: Device would enter low-power mode after a set period of inactivity. To bring it out of this mode a “Yes” input would be required.
Extendable microphone: Extends range, enables concealment of microphone and easier placement of eBlock.
Adjustable Sensitivity: Range is adjustable, but accuracy drops at longer distances due to increased noise.
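
The low-power feature listed above amounts to an inactivity timer plus a wake-up input. A minimal Python sketch follows, assuming a hypothetical `PowerManager` class, a 60-second default timeout, and timestamps passed in by the caller; none of these are fixed by the proposal.

```python
class PowerManager:
    """Hypothetical inactivity timer for the low-power feature."""

    def __init__(self, timeout_s=60.0):
        self.timeout_s = timeout_s  # assumed value, not from the proposal
        self.last_activity = 0.0
        self.sleeping = False

    def on_activity(self, now):
        """Record that a word was recognized at time `now` (seconds)."""
        self.last_activity = now

    def tick(self, now, input_signal="No"):
        """Update the sleep state; return True while in low-power mode."""
        if self.sleeping:
            # Only a "Yes" on the block's input wakes it up again.
            if input_signal == "Yes":
                self.sleeping = False
                self.last_activity = now
        elif now - self.last_activity >= self.timeout_s:
            self.sleeping = True
        return self.sleeping
```

Passing the time in as an argument keeps the sketch independent of any particular clock or timer peripheral.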

Trade Off Analysis:

Technology	Technology Description	Cons	Pros	Design Decision
Speaker Independent Vs. Speaker Dependent	Speaker-independent voice recognition requires no training on the part of the user, whereas speaker-dependent recognition requires some initial training.	Speaker Independent: Cannot be used for security applications. Harder to implement in software (more complicated). Less accurate (97% on most ICs). Speaker Dependent: Requires training (more complicated for user). More memory required to hold training data.	Speaker Independent: Anyone can use it. Ease of use. Less complicated (no initial programming). Speaker Dependent: Can be used for security purposes. More accurate (99% on most ICs).	We decided to go with the speaker-independent technology because the product will be easier for the consumer to use. In addition, the time-to-market will not be greatly affected if we go with the ASSP (Application Specific Standard Part) option. In fact, it may be shorter because we would not have to worry about the extra logic and programming required for voice training.
Software Recognition Vs. Integrated Circuit Recognition	IC recognition uses a dedicated IC to handle voice recognition alongside a general-purpose processor for control functions. Software recognition uses a general-purpose processor for both voice recognition and control functions.	Software Recognition: Complicated to program, much more complex software. High NRE and maintenance costs. Requires additional processing power (a more powerful processor with a large multiplier). Software development is time consuming. IC Recognition: Higher per-unit costs (approx. $15-20 for low volumes). Additional cost of development kit (approx. $100-150). Additional power consumption from extra IC. Less customization possible.	Software Recognition: Cheaper in high volumes (able to amortize cost). Less hardware. Possibly lower power consumption. IC Recognition: Simpler to program, simpler software. Lower NRE cost (using off-the-shelf part). Requires a less powerful processor because of simplified duties. Shortened development time.	The decision was made to go with IC recognition due to the short time-to-market constraint. This is done at the expense of the power and cost constraints. However, we may be able to shut the PIC down when no words are being recognized, saving us some power.
Unique Block Vs. Common Block	Can multiple blocks be controlled independently when in close proximity? A unique block has an additional word to differentiate it from other blocks, whereas with the common block there would be no way to differentiate between different voice blocks.	Unique Block: More complicated to program. Requires more hardware. More complicated for the user. Possibly more expensive because of more complicated design. Common Block: Less functional (cannot have more than one voice block in proximity).	Unique Block: Possible to have more than one voice block in close proximity without interference. More control over functions. Common Block: Less complicated design. Possibly cheaper (because of less complicated design).	We decided to go with the unique block because it allows us to add more flexibility to the product without making the product extremely difficult to use. As far as engineering goes, we would have more complex control software and additional hardware (DIP switches).
Continuous Listening Vs. Non-Continuous Listening	With continuous listening, the voice block would listen constantly for commands, whereas with non-continuous listening it would only start to listen when given some “yes” input, such as a button press from a button block.	Continuous Listening: Requires more power. Non-Continuous Listening: Some input required to “alert” block. More complicated (because of button and additional logic).	Continuous Listening: No input required for action (more convenient). Increased usability. Non-Continuous Listening: Requires less power.	We decided to go with continuous listening; however, time permitting, we may add a low-power mode so that both are possible.
Multiple Word Recognition Vs. Isolated Word Recognition	With multiple word recognition, small phrases are possible, with more natural speech, whereas with isolated word recognition, the user would have to speak more slowly and could use only single words.	Multiple Word Recognition: Requires more complicated software. More effort for the device to get acclimated to the user. Isolated Word: Unnatural speech. More effort on the part of the user to get acclimated to the device.	Multiple Word Recognition: Natural speech. Easier for user to command block. Isolated Word: Requires simpler software. Easier for block to understand user.	We decided to go with multiple word recognition because it allows for more natural speech, which would make the block easier to use. Currently most voice ASSPs support multiple word recognition.
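
For the unique-block decision in the table above, the DIP switches could select the block's name from a small fixed vocabulary. The Python sketch below is one possible mapping under assumptions: the name list `NAMES`, the function `name_from_dip`, and the most-significant-switch-first ordering are all hypothetical and not part of the proposal.

```python
# Hypothetical name vocabulary; the real list would depend on what the
# chosen recognition IC supports.
NAMES = ["alpha", "bravo", "charlie", "delta"]


def name_from_dip(switches):
    """Map DIP switch positions to a block name.

    `switches` is a sequence of booleans, most significant switch first
    (an assumed ordering).
    """
    index = 0
    for is_on in switches:
        index = (index << 1) | int(bool(is_on))
    if index >= len(NAMES):
        raise ValueError("switch setting has no name assigned")
    return NAMES[index]


print(name_from_dip([False, True]))  # prints "bravo"
```

With two switches this gives four distinguishable blocks; each extra switch would double that, provided the vocabulary grows to match.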

Cost:

$150 for development kit and speech hardware.

Slight Update:

As of 01/16/04 we have two choices for the project:

1. Clapper-Type

This eBlock would recognize some unique sound (other than speech) and use that as the trigger for whether to output a "yes" or a "no".

Pros: Easier to implement, much simpler. Components are cheaper and probably easier to work with.

Cons: Not very "exciting". Not what we really wanted to do. Not much room to add to it later. Basic design.

2. Speech-Type

This eBlock would recognize the word (or words) specific to a command. Based on this command it would output a "yes" or a "no".

Pros: Much more "exciting" design. More versatile. Potentially more attractive in the market. More of a personal achievement.

Cons: Very complicated. I'm not sure if we'll have enough time to properly research and implement something like this. Expensive.

Decisions:

No decisions have been finalized yet. We're still waiting to see how tomorrow's (Saturday) meeting with the whole class goes.
Either way, there's a good chance that both designs will be implemented in some way, and depending on how the quarter goes,
the two designs may merge into a new design at the end.