Hm. Seems like something that should have just jumped out at me. Oh well. I guess what I'd do then is have (the verb class instance) "give" look for two nouns, in the order they are given.
Although what exactly does BNF mean? This is really unfamiliar territory to me; I don't even know what a tokenizer is.
A tokenizer parses a string and splits it up by tokens, usually at whitespace; it has no idea what a verb or a noun is, or even what counts as a legitimate word. It used to be the de facto method for basic string parsing in Java (I mean, in the early days of Java, back when I was actually doing classes in it), but it is now considered a legacy class whose use is discouraged in favor of Java's regular expression library.
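For reference, here's what the old-style tokenizer usage looks like (the input phrase is just a made-up example):

```java
import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        // StringTokenizer splits on whitespace by default. It's a legacy
        // class kept for compatibility; regex-based splitting is the
        // recommended approach in new code.
        StringTokenizer st = new StringTokenizer("give food to man");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
```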
Regular expressions are currently the preferred method of parsing strings in Java. I'd recommend looking them up if you plan to do any string parsing at all. They look complicated, but they are actually quite simple: they are just a method of identifying arbitrary patterns in strings.
For example, the regular expression:
\b[a-zA-Z]+\b
matches all individual words consisting only of alphabetical characters.
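To see that pattern in action with Java's Pattern and Matcher classes (the input string is just an illustration; note how "2nd" is skipped because it contains a digit):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordMatchDemo {
    public static void main(String[] args) {
        // \b[a-zA-Z]+\b matches runs of letters bounded by word boundaries.
        // Backslashes are doubled because this is a Java string literal.
        Pattern wordPattern = Pattern.compile("\\b[a-zA-Z]+\\b");
        Matcher m = wordPattern.matcher("Take the 2nd lamp, quickly!");
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
```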
The simplest way to use regular expressions to break out text in Java is to use the String.split() method:
public String[] split(String regex)
which splits a string based on the specified regular expression and gives you back an array of all the resulting substrings. It's like the tokenizer, but the delimiters are specified in terms of regular expressions, so the code:
String someWords = "Here are some words.";
String[] splitWords = someWords.split("\\W");
ought to yield the array:
{"Here", "are", "some", "words"}
Since \W (written \\W inside a Java string literal) is the regular expression code for any non-word character, such as a space, tab, or punctuation mark, this splits the string up around the whitespace and drops the final period.
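One thing to watch for: because \W matches a single non-word character, a run of them (say, a comma followed by a space) would leave empty strings in the result array. Splitting on \W+ instead treats each whole run as one split point. A quick sketch:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String someWords = "Here are some words.";
        // \W+ matches one OR MORE non-word characters, so ", " or " - "
        // counts as a single delimiter and no empty strings appear.
        String[] splitWords = someWords.split("\\W+");
        System.out.println(Arrays.toString(splitWords));
    }
}
```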
BNF, or Backus-Naur Form, is a notation used to describe a type of formal grammar called a 'context-free' grammar. A formal grammar consists of two sets of symbols (a terminal set and a non-terminal set), a start symbol, and a series of replacement rules, called productions, which together define all the strings that can be generated using that grammar. For example, the productions:
S -> AB
A -> a | aa
B -> b | bb
begin at the start symbol S and can generate the strings "ab", "aab", "abb" and "aabb". Backus-Naur form is simply a particular method of writing productions which is commonly used because it is easy to interpret computationally. You don't necessarily need to use it, but it's good to understand how productions contribute to parsing. For example, the Backus-Naur form of a really simple English-like grammar with a small number of words could be:
<article> ::= "the" | "a"
<noun> ::= "dog" | "cat"
<verb> ::= "runs" | "swims"
<adverb> ::= "quickly"
<noun phrase> ::= <article> <noun>
<verb phrase> ::= <verb> | <verb> <adverb>
<sentence> ::= <noun phrase> <verb phrase> "."
Which would produce the sentences "the dog runs.", "the dog swims.", "a dog runs quickly.", "the cat swims quickly.", etc. Repeatedly applying the productions of a grammar results in a 'parse tree,' a tree where the root is the start symbol and the leaves are a series of 'terminals,' or symbols which cannot be replaced using any rule in the grammar. A non-terminal is any symbol which is not a terminal.
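Since the grammar is finite, you can enumerate everything it generates by walking the productions with nested loops. A quick sketch (the variable names are mine, not part of any standard):

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceGenerator {
    public static void main(String[] args) {
        // Terminals for each non-terminal in the toy grammar above.
        String[] articles = {"the", "a"};
        String[] nouns = {"dog", "cat"};
        String[] verbs = {"runs", "swims"};
        String[] adverbs = {"quickly"};

        List<String> sentences = new ArrayList<>();
        for (String art : articles)
            for (String noun : nouns)
                for (String verb : verbs) {
                    // <verb phrase> ::= <verb> | <verb> <adverb>
                    sentences.add(art + " " + noun + " " + verb + ".");
                    for (String adv : adverbs)
                        sentences.add(art + " " + noun + " " + verb + " " + adv + ".");
                }
        System.out.println(sentences.size()); // 2 articles * 2 nouns * 2 verbs * 2 verb-phrase forms
        System.out.println(sentences.get(0));
    }
}
```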
Parsing comes down to taking input and determining whether or not it can be generated by the grammar in question. Using the above productions, the input "a cat runs." would be given the green light but "runs the quickly cat." would fail. The two methods of parsing a string to see whether it can be generated using a given grammar are top down and bottom up.
Top down parsing begins at the start symbol and attempts to derive a valid parse tree which matches the input. Parsers like the LL parser work this way, attempting to match the left-most input symbol to the left-most symbol on the result side of a production, starting at the root of the tree and working toward the leaves. For example, a top down parser for the above grammar would begin analysis at the <sentence> level and try to work its way down the left hand side of the parse tree to match the first word of the specified phrase.
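The most direct way to hand-code a top down parser is recursive descent: one method per non-terminal, each consuming tokens left to right. Here's a minimal sketch for the toy grammar above (the tokenization is deliberately crude):

```java
import java.util.Arrays;

public class RecursiveDescent {
    private final String[] tokens;
    private int pos = 0;

    RecursiveDescent(String input) {
        // Crude tokenization: detach the period, then split on whitespace.
        this.tokens = input.replace(".", " .").trim().split("\\s+");
    }

    // Consume the current token if it is one of the given terminals.
    private boolean accept(String... options) {
        if (pos < tokens.length && Arrays.asList(options).contains(tokens[pos])) {
            pos++;
            return true;
        }
        return false;
    }

    // <sentence> ::= <noun phrase> <verb phrase> "."
    boolean sentence() {
        return nounPhrase() && verbPhrase() && accept(".") && pos == tokens.length;
    }

    // <noun phrase> ::= <article> <noun>
    boolean nounPhrase() { return accept("the", "a") && accept("dog", "cat"); }

    // <verb phrase> ::= <verb> | <verb> <adverb>
    boolean verbPhrase() {
        if (!accept("runs", "swims")) return false;
        accept("quickly"); // optional adverb
        return true;
    }

    public static void main(String[] args) {
        System.out.println(new RecursiveDescent("a cat runs.").sentence());
        System.out.println(new RecursiveDescent("runs the quickly cat.").sentence());
    }
}
```

The first input is accepted and the second rejected, matching the example above.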
Bottom up parsing begins at the leaves and attempts to combine terms from the input into productions, working up towards the start symbol. This kind of parsing is also called shift-reduce parsing because it functions by dropping symbols on to a stack and reducing them when the values on the stack match a production rule.
Like the above poster mentioned, both of these can be implemented as state machines. Of course, depending on the type of state machine in question, your state tables might end up being huge. For simple machines, a simple left to right top down approach would be sufficient. Here's some pseudocode:
1. Is the first word a verb? If so, go to state 2.
2. Is the second word a noun? If so, go to state 3.
3. Are there remaining words? If not, process phrase, otherwise go to state 4.
4. Is the next word a preposition? If so, go to state 5.
5. Is the next word a noun? If so, process phrase (with preposition).
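Those five states translate almost directly into a loop over a state variable. A sketch, assuming tiny made-up vocabulary lists (they're placeholders, not from the actual game):

```java
import java.util.Arrays;
import java.util.List;

public class CommandParser {
    // Illustrative vocabularies -- placeholders for whatever word lists the game uses.
    static final List<String> VERBS = Arrays.asList("eat", "give", "take");
    static final List<String> NOUNS = Arrays.asList("food", "man", "lamp");
    static final List<String> PREPOSITIONS = Arrays.asList("to", "with", "at");

    // Returns true if the input matches VERB NOUN or VERB NOUN PREP NOUN.
    static boolean parse(String input) {
        String[] w = input.toLowerCase().split("\\s+");
        int state = 1;
        for (int i = 0; i < w.length; i++) {
            switch (state) {
                case 1: if (!VERBS.contains(w[i])) return false; state = 2; break;
                case 2: if (!NOUNS.contains(w[i])) return false; state = 3; break;
                case 3: if (!PREPOSITIONS.contains(w[i])) return false; state = 4; break;
                case 4: return NOUNS.contains(w[i]) && i == w.length - 1; // state 5: process
                default: return false;
            }
        }
        return state == 3; // input ended right after the noun, e.g. "Eat food"
    }

    public static void main(String[] args) {
        System.out.println(parse("Eat food"));
        System.out.println(parse("Give food to man"));
        System.out.println(parse("Sleep")); // intransitive: not handled by these states
    }
}
```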
That covers you for "Eat food" and "Give food to man," although not for intransitive verb commands, like "Sleep," or intransitive verbs with prepositions, which would require a few additional states.
Instead of 'starting small and working up,' decide exactly what kinds of phrases you are going to cover and write explicitly how they are going to be assembled and from what kinds of words. That way you can code once, it will function to specification, and you will have a lower chance of introducing bugs.