The paper presents a framework and a set of synthetic toy question-answering tasks (classified into skill sets) for analyzing the performance of different machine learning algorithms.
- Single/Two/Three Supporting Facts: Questions where one or more supporting facts from the story provide the answer (a sample instance follows this list). The more supporting facts required, the harder the task.
- Two/Three Argument Relations: Requires differentiating between the subjects and objects of a relation.
- Yes/No Questions: True/False questions.
- Counting/List/Set Questions: Requires the ability to count or list objects having a certain property.
- Simple Negation and Indefinite Knowledge: Tests the ability to handle negation constructs and model sentences that describe a possibility and not a certainty.
- Basic Coreference, Conjunctions, and Compound Coreference: Requires ability to handle different levels of coreference.
- Time Reasoning: Requires understanding the use of time expressions in sentences.
- Basic Deduction and Induction: Tests basic deduction and induction via inheritance of properties.
- Position and Size Reasoning: Requires reasoning about relative positions (e.g., left of, above) and relative sizes (e.g., whether one object fits inside another).
- Path Finding: Find path between locations.
- Agent's Motivation: Why an agent performs an action, i.e., what the agent's state is.
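For concreteness, a Single Supporting Fact instance in the style of the released data format looks like the following (story sentences are numbered; the answer and the line number of the supporting fact follow the question, tab-separated):

```
1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary?	bathroom	1
```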
- The dataset and the source code to generate the tasks are publicly available.
- The different tasks are independent of each other.
- For strongly supervised training, the set of relevant supporting statements is provided along with the questions and answers.
- The tasks are available in English, in Hindi, and in a version with shuffled English words.
- The simulated world consists of entities of various types (locations, objects, persons, etc.) and of actions that operate on these entities.
- Each entity has an internal state and follows rules governing how it interacts with other entities.
- Basic simulation commands are of the form "Bob go school", which are then rendered as natural-language sentences (a toy sketch of such a pipeline follows this list).
- To add variation, synonyms are used for entities and actions.
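The paper's generator is more elaborate, but a minimal Python sketch of the simulate-then-verbalize idea might look like this (the entity lists, the single `go` action, and its synonyms are invented here purely for illustration):

```python
import random

# Hypothetical toy world: persons and locations, with one "go" action.
PERSONS = ["Bob", "Mary", "John"]
LOCATIONS = ["school", "kitchen", "park"]
GO_SYNONYMS = ["went to", "moved to", "travelled to"]  # surface variation

def simulate(num_events):
    """Generate (person, location) events; a person's state is their location."""
    state = {}
    events = []
    for _ in range(num_events):
        person = random.choice(PERSONS)
        # Rule: a person cannot "go" to the place they are already in.
        options = [loc for loc in LOCATIONS if state.get(person) != loc]
        location = random.choice(options)
        state[person] = location
        events.append((person, location))
    return events, state

def realize(events):
    """Turn internal commands like ('Bob', 'school') into natural language."""
    return [f"{p} {random.choice(GO_SYNONYMS)} the {l}." for p, l in events]

if __name__ == "__main__":
    events, state = simulate(3)
    print("\n".join(realize(events)))
    person = events[-1][0]  # ask about the most recent actor
    print(f"Where is {person}?\t{state[person]}")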
- N-gram classifier baseline (a toy sketch follows this list of models)
- LSTMs
- Memory Networks (MemNNs)
- Structured SVM incorporating externally labeled data
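As a rough illustration of the N-gram baseline, one can featurize the concatenated story and question with a bag of 1- to 3-grams and train a linear classifier over the answer vocabulary. This scikit-learn sketch is a simplification (toy data, placeholder hyperparameters), not the paper's exact feature construction:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (story + question, answer) pairs; the real tasks provide thousands.
train_x = [
    "Mary moved to the bathroom. John went to the hallway. Where is Mary?",
    "John went to the hallway. Mary moved to the garden. Where is Mary?",
    "Mary moved to the garden. John went to the hallway. Where is John?",
]
train_y = ["bathroom", "garden", "hallway"]

# Bag of 1- to 3-grams over the concatenated story and question,
# fed to a linear classifier over the answer vocabulary.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_x, train_y)
print(model.predict(["John went to the garden. Where is John?"]))
```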
- Adaptive Memories - learn the number of memory hops to perform instead of using a fixed value of 2 hops (a toy sketch follows this list).
- N-grams - Use a bag of 3-grams instead of a bag-of-words.
- Nonlinearity - Apply a 2-layer neural network with tanh nonlinearity in the matching function.
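The adaptive-memory idea can be sketched as scoring all memories plus a special STOP entry at each hop and halting once STOP wins. The bag-of-words embeddings and random vectors below are placeholders for learned parameters; this illustrates the control flow, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 16
TABLE = {}  # word -> random vector, a stand-in for learned embeddings

def embed(sentence):
    """Bag-of-words embedding: sum of (random, untrained) word vectors."""
    words = sentence.lower().rstrip(".?").split()
    for w in words:
        if w not in TABLE:
            TABLE[w] = rng.normal(size=EMBED_DIM)
    return np.sum([TABLE[w] for w in words], axis=0)

def adaptive_hops(memories, question, max_hops=5):
    """Select supporting memories hop by hop until the STOP entry wins."""
    stop_vec = rng.normal(size=EMBED_DIM)  # learned in a real model
    query = embed(question)
    remaining = list(memories)
    selected = []
    for _ in range(max_hops):
        if not remaining:
            break
        candidates = [embed(m) for m in remaining] + [stop_vec]
        scores = [c @ query for c in candidates]  # dot-product matching
        best = int(np.argmax(scores))
        if best == len(remaining):  # STOP selected: no more hops needed
            break
        query = query + candidates[best]  # refine the query for the next hop
        selected.append(remaining.pop(best))
    return selected

memories = ["Mary moved to the bathroom", "John went to the hallway"]
print(adaptive_hops(memories, "Where is Mary?"))
```

Removing a memory once selected keeps this toy loop from choosing the same fact twice; the trained model instead learns when to select the stop action.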
- The structured SVM uses coreference resolution and semantic role labeling (SRL), which are themselves trained on a large amount of external data.
- It is first trained with strong supervision to find the supporting statements, and then a similar SVM is used to find the response (a simplified sketch follows).
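A drastically simplified stand-in for this two-stage pipeline, using plain linear SVMs instead of a true structured SVM and omitting the coreference/SRL features (all data and the `||` separator are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1: does this statement support this question? (strong supervision)
pairs = [
    ("Where is Mary? || Mary moved to the bathroom", 1),
    ("Where is Mary? || John went to the hallway", 0),
    ("Where is John? || John went to the hallway", 1),
    ("Where is John? || Mary moved to the bathroom", 0),
]
support_clf = make_pipeline(CountVectorizer(), LinearSVC())
support_clf.fit([x for x, _ in pairs], [y for _, y in pairs])

# Stage 2: given the question plus the selected statements, predict the answer.
qa = [
    ("Where is Mary? || Mary moved to the bathroom", "bathroom"),
    ("Where is John? || John went to the hallway", "hallway"),
]
answer_clf = make_pipeline(CountVectorizer(), LinearSVC())
answer_clf.fit([x for x, _ in qa], [y for _, y in qa])

def answer(question, story):
    # Select the statements stage 1 deems supporting, then feed them,
    # together with the question, to the stage-2 answer classifier.
    supports = [s for s in story
                if support_clf.predict([f"{question} || {s}"])[0] == 1]
    return answer_clf.predict([f"{question} || {' '.join(supports)}"])[0]

story = ["Mary moved to the bathroom", "John went to the hallway"]
print(answer("Where is Mary?", story))
```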
- Standard MemNNs outperform the N-gram and LSTM baselines but still fail on a number of tasks.
- MemNNs with Adaptive Memory improve performance on the multiple-supporting-facts tasks and the basic induction task.
- MemNNs with N-gram modeling improve results on tasks where word order matters.
- MemNNs with Nonlinearity perform well on the Yes/No tasks and the indefinite knowledge tasks.
- The structured SVM outperforms vanilla MemNNs but falls short of the MemNNs with the above modifications.
- The structured SVM performs very well on the path finding task due to its non-greedy search approach.