Time Estimation:
Project Complexity and Design Description:
We want to create a voice eBlock that can interface with the rest of the eBlock set. Depending on the selected mode, this eBlock would take voice commands from any user or only from a specified user. In normal mode, the block would listen for a specific two-word phrase from any user. The first word would be its “name”, which the user sets either by programming it or through DIP switch settings. Once the device hears its name, it waits for a command: either On (the device outputs a “Yes”) or Off (the device outputs a “No”). Because each device is named, the user does not need to worry about multiple voice eBlocks interfering with each other. All this is done with no training and no need for a PC.
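The normal-mode behavior above amounts to a small two-state machine: ignore everything until the block's own name is heard, then map the next word to an output. The Python sketch below illustrates only that logic. It assumes the recognition IC reports each recognized word as a string label; the names `VoiceBlock`, `Output`, `hear`, and the example block names "lamp" and "fan" are hypothetical. Treating a repeated name as "keep waiting" and any other word as the end of the phrase are also assumptions, not decisions recorded in this plan.

```python
from enum import Enum

# Hypothetical command words; the real recognizer's vocabulary is an assumption.
ON, OFF = "on", "off"


class Output(Enum):
    NONE = "none"  # no change to the eBlock output
    YES = "yes"
    NO = "no"


class VoiceBlock:
    """Normal-mode listener (hypothetical): waits for its name, then On/Off."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.awaiting_command = False

    def hear(self, word: str) -> Output:
        """Process one recognized word and return the output to drive."""
        if not self.awaiting_command:
            # Ignore everything until this block's own name is heard.
            self.awaiting_command = word == self.name
            return Output.NONE
        if word == self.name:
            return Output.NONE  # name repeated: keep waiting for a command
        self.awaiting_command = False  # any other word ends the phrase
        if word == ON:
            return Output.YES
        if word == OFF:
            return Output.NO
        return Output.NONE  # unrecognized command: ignore the phrase


if __name__ == "__main__":
    lamp, fan = VoiceBlock("lamp"), VoiceBlock("fan")
    for word in ["lamp", "on", "fan", "off"]:
        # Every block hears every word; only the named block reacts.
        print(word, lamp.hear(word).value, fan.hear(word).value)
```

Because each block reacts only after hearing its own name, "fan off" leaves the lamp block untouched, which is the property the design relies on to keep several voice eBlocks from interfering.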
In security mode, the device behaves the same way, but responds only to a specified user. That user must first train the block on their voice; after that, the block will respond only to that user. Some security measures will be put in place to prevent someone from retraining the block to circumvent security mode.
The range of the device would be approximately 3-4 meters with a slightly raised voice. Background noise should not affect the device (within reason).
Possible Features (Time permitting):
Trade-Off Analysis:
Speaker Independent Vs. Speaker Dependent
Technology Description: Speaker-independent voice recognition requires no training on the part of the user, whereas speaker-dependent recognition requires some initial training.
Cons:
Speaker Independent:
Speaker Dependent:
Pros:
Speaker Independent:
Speaker Dependent:
Design Decision: We decided to go with speaker-independent technology because it makes the product easier for the consumer to use. In addition, time-to-market will not be greatly affected if we go with the ASSP (Application Specific Standard Part) option. In fact, it may be shorter, because we would not have to worry about the extra logic and programming required for voice training.
Software Recognition Vs. Integrated Circuit Recognition
Technology Description: IC recognition uses a dedicated IC to handle voice recognition, with a general-purpose processor handling the control functions. Software recognition instead uses a general-purpose processor to handle both voice recognition and control functions.
Cons:
Software Recognition:
Pros:
Software Recognition:
IC Recognition:
Design Decision: We decided to go with IC recognition because of the short time-to-market constraint. This comes at the expense of power and cost. However, we may be able to shut the PIC down when no words are being recognized, saving some power.
Unique Block Vs. Common Block
Technology Description: Can multiple blocks be controlled independently when in close proximity? A unique block has an additional word (its name) that differentiates it from other blocks, whereas with a common block there would be no way to tell different voice blocks apart.
Cons:
Unique Block:
Common Block:
Pros:
Unique Block:
Common Block:
Design Decision: We decided to go with the unique block because it adds flexibility to the product without making it extremely difficult to use. On the engineering side, this means more complex control software and additional hardware (DIP switches).
Continuous Listening Vs. Non-Continuous Listening
Technology Description: With continuous listening, the voice block listens constantly for commands, whereas with non-continuous listening it only starts listening when given a “yes” input, such as a button press from a button block.
Cons:
Continuous Listening:
Non-Continuous Listening:
Pros:
Continuous Listening:
Non-Continuous Listening:
Design Decision: We decided to go with continuous listening; however, time permitting, we may add a low-power mode so that both options are possible.
Multiple Word Recognition Vs. Isolated Word Recognition
Technology Description: Multiple word recognition allows short phrases and more natural speech, whereas with isolated word recognition the user would have to speak more slowly and could use only single words.
Cons:
Multiple Word Recognition:
Isolated Word:
Pros:
Multiple Word Recognition:
Isolated Word:
Design Decision: We decided to go with multiple word recognition because it allows more natural speech, which makes the block easier to use. Currently, most voice ASSPs support multiple word recognition.
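Choosing the unique-block option means each block needs a name, and one way DIP switches could select it is by reading the switches as a binary number that indexes a fixed list of names the recognition IC knows. The Python sketch below assumes three switches and an eight-word name list; the switch count, the words in `NAMES`, and the function `name_from_dip` are all hypothetical and not taken from any particular ASSP.

```python
# Hypothetical name vocabulary; the real word list and switch count are assumptions.
NAMES = ("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel")
NUM_SWITCHES = 3  # 2**3 = 8 selectable names


def name_from_dip(switches: tuple[bool, ...]) -> str:
    """Read the DIP switches as a binary number (switch 0 is the least significant bit)."""
    if len(switches) != NUM_SWITCHES:
        raise ValueError(f"expected {NUM_SWITCHES} switch positions, got {len(switches)}")
    index = 0
    for bit, closed in enumerate(switches):
        if closed:
            index |= 1 << bit  # a closed switch sets its bit
    return NAMES[index]


if __name__ == "__main__":
    # Two blocks side by side, set to different switch positions.
    print(name_from_dip((False, False, False)))
    print(name_from_dip((True, False, True)))
```

With distinct switch settings, the two example blocks answer to "alpha" and "foxtrot" respectively, so a command addressed to one is ignored by the other. The extra hardware is just the switches, and the extra control software is this lookup.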
Cost:
$150 for development kit and speech hardware.
Slight Update:
As of 01/16/04, we have two choices for the project:
1. Clapper-Type
This eBlock would recognize some unique sound (other than speech) and use that as the trigger to decide whether to output a "yes" or a "no".
Pros: Easier to implement and much simpler. Components are cheaper and probably easier to work with.
Cons: Not very "exciting". Not what we really wanted to do. Not much room to add to it later. Basic design.
2. Speech-Type
This eBlock would recognize the word (or words) specific to a command. Based on this command, it would output a "yes" or a "no".
Pros: Much more "exciting" design. More versatile. Potentially more attractive in the market. More of a personal achievement.
Cons: Very complicated. I'm not sure we'll have enough time to properly research and implement something like this. Expensive.
Decisions:
No decisions have been finalized yet. We're still waiting to see how tomorrow's (Saturday) meeting with the whole class goes.
Either way, there's a good chance that both designs will be implemented in some way, and depending on how the quarter goes, the two designs may merge into a new design at the end.