J. Hernández-Orallo, M. Baroni, J. Bieger, N. Chmait, D. L. Dowe, K. Hofmann, F. Martínez-Plumed, C. Strannegård and K. R. Thórisson (2017), A New AI Evaluation Cosmos: Ready to Play the Game?, In: AI Magazine, AAAI
We report on a series of new platforms and events dealing with AI evaluation that may change the way in which AI systems are compared and their progress is measured. The introduction of a more diverse and challenging set of tasks in these platforms can feed AI research in the years to come, shaping the notion of success and the directions of the field. However, the playground of tasks and challenges presented there may misdirect the field without some meaningful structure and systematic guidelines for its organization and use. Anticipating this issue, we also report on several initiatives and workshops that are putting the focus on analyzing the similarity and dependencies between tasks, their difficulty, what capabilities they really measure and — ultimately — on elaborating new concepts and tools that can arrange tasks and benchmarks into a meaningful taxonomy.
J. Bieger, K. R. Thórisson, and B. R. Steunebrink (2017), The Pedagogical Pentagon: A Conceptual Framework for Artificial Pedagogy, In: T. Everitt, B. Goertzel, and A. Potapov (eds.), Proceedings of Artificial General Intelligence (AGI-17), 212–222, Springer-Verlag, Melbourne, Australia
Artificial intelligence (AI) and machine learning (ML) research has traditionally focused most energy on constructing systems that can learn from data and/or environment interactions. This paper considers the parallel science of teaching: Artificial Pedagogy (AP). Teaching provides us with a method—aside from programming—for imparting our knowledge to AI systems, and it facilitates cumulative, online learning — which is especially important in cases where the combinatorics of sub-tasks preclude enumeration or a-priori modeling, or where unforeseeable novelty is inherent and unavoidable in the learner's assignments. Teaching is a complex process not currently very well understood, and pedagogical theories proposed so far have exclusively targeted human learners. What is needed is a framework that relates the many facets of teaching, in a way that works for a range of learners including machines.
We present the Pedagogical Pentagon—a conceptual framework that identifies five core concepts of AP: learners, task-environments, testing, training and teaching. We describe these concepts, their interactions, and what we would need to know about them in the context of AP. The pentagon is meant to facilitate research in this complex new area by encouraging a structured and systematic approach organized around its five corners.
K. R. Thórisson, J. Bieger, T. Thorarensen, J. S. Sigurðardóttir, and B. R. Steunebrink (2016), Why Artificial Intelligence Needs a Task Theory—And What It Might Look Like, In: B. R. Steunebrink, P. Wang, and B. Goertzel (eds.), Proceedings of Artificial General Intelligence (AGI-16), 118–128, Springer-Verlag, New York, USA
The concept of “task” is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering, theoretical foundations allow thorough evaluation of designs by methodical manipulation of well-understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane’s performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial general intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A task theory would enable addressing tasks at the class level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.
J. Bieger, K. R. Thórisson, and P. Wang (2015), Safe Baby AGI, In: J. Bieger, B. Goertzel, and A. Potapov (eds.), Proceedings of Artificial General Intelligence (AGI-15), 46–49, Springer-Verlag, Berlin, Germany
Out of fear that artificial general intelligence (AGI) might pose a future risk to human existence, some have suggested slowing or stopping AGI research, to allow time for theoretical work to guarantee its safety. Since an AGI system will necessarily be a complex closed-loop learning controller that lives and works in semi-stochastic environments, its behaviors are not fully determined by its design and initial state, so no mathematico-logical guarantees can be provided for its safety. Until actual running AGI systems exist—and there is as of yet no consensus on how to create them—that can be thoroughly analyzed and studied, any proposal on their safety can only be based on weak conjecture. As any practical AGI will unavoidably start in a relatively harmless baby-like state, subject to the nurture and education that we provide, we argue that our best hope to get safe AGI is to provide it proper education.
K. R. Thórisson, J. Bieger, S. Schiffel, and D. Garrett (2015), Towards Flexible Task Environments for Comprehensive Evaluation of Artificial Intelligent Systems & Automatic Learners, In: J. Bieger, B. Goertzel, and A. Potapov (eds.), Proceedings of Artificial General Intelligence (AGI-15), 187–196, Springer-Verlag, Berlin, Germany
Evaluation of artificial intelligence (AI) systems is a prerequisite for comparing them on the many dimensions they are intended to perform on. Design of task-environments for this purpose is often ad-hoc, focusing on some limited aspects of the systems under evaluation. Testing on a wide range of tasks and environments would better facilitate comparisons and understanding of a system's performance, but this requires that manipulation of relevant dimensions cause predictable changes in the structure, behavior, and nature of the task-environments. What is needed is a framework that enables easy composition, decomposition, scaling, and configuration of task-environments. Such a framework would not only facilitate evaluation of the performance of current and future AI systems, but go beyond it by allowing evaluation of knowledge acquisition, cognitive growth, lifelong learning, and transfer learning. In this paper we list requirements that we think such a framework should meet to facilitate the evaluation of intelligence, and present preliminary ideas on how this could be realized.
D. Garrett, J. Bieger, and K. R. Thórisson (2014), Tunable and Generic Problem Instance Generation for Multi-objective Reinforcement Learning, Proceedings of the Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 1–8, IEEE, Orlando, Florida
A significant problem facing researchers in reinforcement learning, and particularly in multi-objective learning, is the dearth of good benchmarks. In this paper, we present a method and software tool enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. This tool, called Merlin (for Multi-objective Environments for Reinforcement LearnINg), provides the ability to control these features in predictable ways, thus allowing researchers to begin to build a more detailed understanding about what features of a problem interact with a given learning algorithm to improve or degrade the algorithm’s performance. We present this method and tool, and briefly discuss the controls provided by the generator, its supported options, and their implications on the generated benchmark instances.
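To make the idea of tunable instance generation concrete, here is a minimal sketch of a random multi-objective MDP generator. It is not Merlin's actual interface: the function name `random_momdp` and its parameters (`n_states`, `n_actions`, `n_objectives`, `branching`, `seed`) are assumptions made purely for illustration, and the only structural properties it controls are problem size, number of objectives, and the number of successor states per state-action pair.

```python
import random


def random_momdp(n_states, n_actions, n_objectives, branching, seed=0):
    """Generate a random multi-objective MDP (hypothetical helper, not Merlin's API).

    Returns (transitions, rewards) where
      transitions[s][a] is a list of (next_state, probability) pairs summing to 1
      rewards[s][a]     is a list of n_objectives floats in [-1, 1]
    """
    if not 1 <= branching <= n_states:
        raise ValueError("branching must be between 1 and n_states")
    rng = random.Random(seed)  # fixed seed makes instances reproducible
    transitions, rewards = [], []
    for _ in range(n_states):
        t_row, r_row = [], []
        for _ in range(n_actions):
            # Pick `branching` distinct successor states for this state-action pair.
            successors = rng.sample(range(n_states), branching)
            # Draw positive weights and normalize them into probabilities.
            weights = [rng.random() + 1e-9 for _ in successors]
            total = sum(weights)
            t_row.append([(nxt, w / total) for nxt, w in zip(successors, weights)])
            # One independent reward component per objective.
            r_row.append([rng.uniform(-1.0, 1.0) for _ in range(n_objectives)])
        transitions.append(t_row)
        rewards.append(r_row)
    return transitions, rewards


if __name__ == "__main__":
    t, r = random_momdp(n_states=5, n_actions=2, n_objectives=3, branching=2, seed=1)
    print(t[0][0], r[0][0])
```

Varying one parameter at a time, such as `branching`, while holding the seed and the others fixed is the kind of controlled manipulation the abstract describes, although the real tool offers its own set of controls.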
J. Bieger, K. R. Thórisson, and D. Garrett (2014), Raising AI: Tutoring Matters, In: B. Goertzel, L. Orseau and J. Snaider (eds.), Proceedings of Artificial General Intelligence (AGI-14), 1–10, Springer-Verlag, Quebec, Canada
Humans and other animals are often touted as examples of systems that possess general intelligence. However, rarely if ever do they achieve high levels of intelligence and autonomy on their own: they are raised by parents and caregivers in a society with peers and seniors, who serve as teachers and examples. Current methods for developing artificial learning systems typically do not account for this. This paper gives a taxonomy of the main methods for raising / educating naturally intelligent systems and provides examples for how these might be applied to artificial systems. The methods are heuristic rewarding, decomposition, simplification, situation selection, teleoperation, demonstration, coaching, explanation, and cooperation. We argue that such tutoring methods that provide assistance in the learning process can be expected to have great benefits when properly applied to certain kinds of artificial systems.
T. Tsoneva, J. Bieger, and G. Garcia Molina (2010), Towards error-free interaction, Proceedings of the 32nd EMBC
Human-machine interaction (HMI) relies on pattern recognition algorithms that are not perfect. To improve the performance and usability of these systems we can utilize the neural mechanisms in the human brain dealing with error awareness. This study aims at designing a practical error detection algorithm using electroencephalogram signals that can be integrated in an HMI system. Thus, real-time operation, customization, and operation convenience are important. We address these requirements in an experimental framework simulating machine errors. Our results confirm the presence of brain potentials related to processing of machine errors. These are used to implement an error detection algorithm emphasizing the differences in error processing on a per-subject basis. The proposed algorithm uses the individual best bipolar combination of electrode sites and requires short calibration. The single-trial error detection performance on six subjects, characterized by the area under the ROC curve, ranges from 0.75 to 0.98.
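For readers unfamiliar with the metric, the area under the ROC curve can be computed directly as the probability that a randomly chosen error trial receives a higher detector score than a randomly chosen correct trial (the Mann-Whitney formulation). The sketch below illustrates that definition only; the function name `roc_auc` and the toy scores are made up for the example and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    scores: detector outputs, higher means "more likely an error trial"
    labels: 1 for error trials, 0 for correct trials
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one trial of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # error trial ranked above correct trial
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (illustrative only): two error trials, two correct trials.
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly: 0.75
```

A value of 0.5 corresponds to chance-level detection and 1.0 to perfect separation of the two trial types.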
D. Zhu, J. Bieger, G. Garcia Molina, and R. M. Aarts (2009), A survey of stimulation methods used in SSVEP-based BCIs, Journal of Computational Intelligence and Neuroscience
Brain-computer interface (BCI) systems based on the steady-state visual evoked potential (SSVEP) provide higher information throughput and require shorter training than BCI systems using other brain signals. To elicit an SSVEP, a repetitive visual stimulus (RVS) has to be presented to the user. The RVS can be rendered on a computer screen by alternating graphical patterns, or with external light sources able to emit modulated light. The properties of an RVS (e.g., frequency, color) depend on the rendering device and influence the SSVEP characteristics. This affects the BCI information throughput and the levels of user safety and comfort. Literature on SSVEP-based BCIs does not generally provide reasons for the selection of the used rendering devices or RVS properties. In this paper, we review the literature on SSVEP-based BCIs and comprehensively report on the different RVS choices in terms of rendering devices, properties, and their potential influence on BCI performance, user safety and comfort.
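One concrete way the rendering device constrains RVS properties: on a fixed-refresh display, a stimulus whose cycle spans a whole number of frames can only flicker at the refresh rate divided by that number of frames. The sketch below lists those frequencies under this simplifying assumption. The function name `renderable_frequencies` and the example band of 10 to 30 Hz are illustrative choices, not values from the survey.

```python
def renderable_frequencies(refresh_rate_hz, min_hz, max_hz):
    """List (frames_per_cycle, frequency_hz) pairs a frame-locked stimulus can show.

    Assumes each stimulus cycle lasts a whole number of frames, with at least
    two frames per cycle (one per stimulus state).
    """
    if min_hz <= 0:
        raise ValueError("min_hz must be positive")
    result = []
    frames = 2
    while refresh_rate_hz / frames >= min_hz:
        freq = refresh_rate_hz / frames
        if freq <= max_hz:
            result.append((frames, freq))
        frames += 1
    return result


if __name__ == "__main__":
    for frames, freq in renderable_frequencies(60, 10, 30):
        print(f"{frames} frames per cycle -> {freq:.2f} Hz")
```

With a 60 Hz display this yields only 30, 20, 15, 12 and 10 Hz in that band, whereas an external light source driven by a signal generator is not limited to such a grid.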
J. Bieger, I. Sprinkhuizen-Kuyper, and I. van Rooij (2009), Meaningful Representations Prevent Catastrophic Interference, In: T. Calders, K. Tuyls, and M. Pechenizkiy (eds.), Proceedings of the 21st BNAIC
Artificial Neural Networks (ANNs) attempt to mimic human neural networks in order to perform tasks. In order to do this, tasks need to be represented in ways that the network understands. In ANNs these representations are often arbitrary, whereas in humans it seems that these representations are often meaningful. This article shows how using more meaningful representations in ANNs can be very beneficial. We demonstrate that by using our Static Meaningful Representation Learning (SMRL) technique, ANNs can avoid the problem of catastrophic interference when sequentially learning multiple simple tasks. We also discuss how our approach overcomes known limitations of other techniques for dealing with catastrophic interference.
J. Bieger and K. R. Thórisson (2017), Evaluating Understanding, In: Evaluation of General-Purpose AI Workshop at IJCAI-17, Melbourne, Australia
Understanding is an important aspect of intelligence that has taken a back seat in many approaches to AI. While results in automation can be achieved without it, we argue that understanding is especially important for general-purpose systems. Understanding goes beyond “good performance” on a range of dimensions: if we know that a system understands, we can trust that it will behave relatively robustly, reasonably and predictably—even in novel situations—and that it will be able to use previous understanding to facilitate the acquisition of new understanding. It is doubtful that we could classify systems as having general intelligence if they don’t really understand their tasks, environment, and world, and thus it is important for us to verify the level of understanding of any system intended to strive for generality and autonomy. But because understanding is a hard-to-define internal property of a system, evaluation can be difficult. To further our understanding of understanding and facilitate the development of understanding AI systems, we propose four kinds of tests: A system is said to understand a phenomenon if it can make predictions about it, achieve goals with respect to it, explain it and (re)create it.
J. Bieger, K. R. Thórisson, T. Thorarensen, J. S. Sigurðardóttir, and B. R. Steunebrink (2016), Evaluation of General-Purpose Artificial Intelligence: Why, What & How?, In: Evaluation of General-Purpose AI Workshop at ECAI 2016, The Hague, The Netherlands
System evaluation allows an observer to obtain information about a system’s behavior, and as such is a crucial aspect of any system research and design process. Evaluation in the field of artificial intelligence (AI) is mostly done by measuring a system’s performance on a specialized task. This is appropriate for systems targeted at narrow tasks and domains, but not for evaluating general-purpose AI, which must be able to accomplish a wide range of tasks, including those not foreseen by the system’s designers. Dealing with such novel situations requires general-purpose systems to be adaptive, learn and change over time, which calls for evaluation based on quite different principles. The unique challenges this brings remain largely unaddressed to date, as most evaluation methods either focus on the binary assessment of whether some level of intelligence (e.g. human) has been reached, or performance on a test battery at a particular point in time. In this paper we describe a wide range of questions which we would like to see new evaluation methods for. We take a look at various purposes for evaluation from the perspectives of different stakeholders (the why), consider the properties of adaptive systems that are to be measured (the what), and discuss some of the challenges for obtaining the desired information in practice (the how). While these questions largely still lack good answers, we nevertheless attempt to illustrate some issues that we believe are necessary (but perhaps not sufficient) to provide a strong foundation for evaluating general-purpose AI, and propose some ideas for directions in which such work could develop.
T. Thorarensen, K. R. Thórisson, J. Bieger, and J. S. Sigurðardóttir (2016), FraMoTEC: Modular Task-Environment Construction Framework for Evaluating Adaptive Control Systems, Evaluation of General-Purpose AI Workshop at ECAI 2016, The Hague, The Netherlands
While evaluation of specialized tools can be restricted to the task they were designed to perform, evaluation of more general abilities and adaptation requires testing across a large range of tasks. To be helpful in the development of general AI systems, tests should not just evaluate performance at a certain point in time, but also facilitate the measurement of knowledge acquisition, cognitive growth, lifelong learning, and transfer learning. No framework as of yet offers easy modular composition and scaling of task-environments for this purpose, where a wide range of tasks with variations can quickly be constructed, administered, and compared. In this paper we present a new framework in development that allows modular construction of physical task-environments for evaluating intelligent control systems. Our proto-task theory on which the framework is built aims for a deeper understanding of tasks in general, with a future goal of providing a theoretical foundation for all resource-bounded real-world tasks. The tasks discussed here that can currently be constructed in the framework are rooted in physics, allowing us to analyze the performance of control systems in terms of expended time and energy.
J. Bieger (2016), Artificial Pedagogy: A Proposal, HLAI 2016 Doctoral Consortium, New York, USA
General intelligence is the ability to perform complex new tasks in a wide range of large and dynamic environments using available knowledge and resources. The knowledge to perform these tasks must be constructed throughout an entire lifetime as situations change and new challenges arise. Outside of very narrow domains it is virtually intractable to learn a complex new thing from scratch without guidance. Human children are trained, educated and raised to help them learn the basics, transfer humanity’s highly sophisticated knowledge, and to grow cognitively. As far back as 1950 Alan Turing stressed the importance of teaching in “artificial intelligence” (AI) (Turing 1950). The way to get a system to operate at an adult level is not to program it directly, but rather to create a child machine and then educate it. Research in AI has mainly focused on developing algorithms with different learning mechanisms and much less on how to teach artificial systems. The goal of the proposed work is to study “artificial pedagogy” (AP): the science of how to teach an AI, with an emphasis on systems that aspire to reach or surpass human-level intelligence.
J. Bieger, D. Zhu, and G. Garcia Molina (2010), Effects of Stimulation Properties in Steady-State Visual Evoked Potential Based Brain-Computer Interfaces
Brain-Computer Interfaces (BCIs) enable people to control appliances without involving the normal output pathways of peripheral nerves and muscles. A particularly promising type of BCI is based on the Steady-State Visual Evoked Potential (SSVEP). Users can select commands by focusing on visual stimuli that alternate appearance with a certain frequency. The properties of these stimuli, such as size and color, as well as the device they are rendered on, can significantly affect the performance, comfort and safety of the system. However, the choice of stimulation properties is often ad-hoc or copied. In this paper we report our findings about the effects of rendering device, refresh rate, environmental illumination, contrast, color, spatial frequency and size of visual stimuli. In order to investigate these effects online, a high-performance BCI was developed. User comfort was measured using a questionnaire. The results suggest that high contrast stimulation works the best, while also being the least comfortable. However, maximum black/white contrast is often not needed and other stimuli (e.g. blue/green stimulation) are shown to work almost as well, while being far more comfortable. Knowledge of these effects can help to improve SSVEP-based BCIs.
Master thesis: Stimulation Effects in SSVEP-Based BCIs
Brain-Computer Interfaces (BCIs) enable people to control appliances without involving the normal output pathways of peripheral nerves and muscles. A particularly promising type of BCI is based on the Steady-State Visual Evoked Potential (SSVEP). Users can select commands by focusing their attention on repetitive visual stimuli (RVSi) that change one of their properties (e.g. color or pattern) with a certain frequency. These properties as well as the device the RVSi are rendered on, can greatly affect the performance, applicability, comfort and safety of the BCI.
Despite this fact, stimulation properties have received fairly little attention in the BCI literature to this date. Furthermore, a heavy emphasis is placed on BCI performance to the detriment of other important factors such as comfort and safety. The research reported in this document aims at studying the effects of stimulation properties on performance as well as comfort of SSVEP-based BCIs. Research was performed in both offline and online settings, using a custom made high-performance BCI. Comfort was measured using a custom questionnaire.
A large variability across subjects was found, but the results confirm that stimulation properties have a considerable impact on performance and comfort of SSVEP-based BCIs. In general, a large difference between stimulation states is beneficial for BCI performance, but detrimental to user comfort. A couple of configurations were found that provide a good compromise between comfort and performance.
Conclusions: Both the performance and comfort of SSVEP-based BCIs depend significantly on the properties of the RVSi employed in them. In general, more pronounced differences between stimulus states result in better performance, but less comfort. Some property combinations were found that provide a good compromise between comfort and performance. Color stimulation on a dark background seems especially promising.
These findings suggest that the choice of stimulation properties should be made with great care when designing an SSVEP-based BCI. More research is necessary to determine what settings of properties and combinations thereof generally provide the best results. Stimulation property optimization for individual users can also yield great advantages for the usefulness of a BCI.
Bachelor thesis: Sequentially Learning Multiple Meaningful Representations in Static Neural Networks: Avoiding Catastrophic Interference in Multi-layer Perceptrons
Artificial neural networks (ANNs) attempt to mimic human neural networks in order to solve problems and carry out tasks. However, in contrast to their human counterparts ANNs cannot generally learn to perform new tasks without forgetting everything they already know due to a phenomenon called catastrophic interference. This paper discusses this phenomenon, shows that it occurs in multi-layer perceptrons with arbitrary task representations and proposes and discusses the static meaningful representation learning method that uses meaningful task representations to circumvent this problem when learning to perform multiple tasks. The technique is powerful enough to enable the learning of several simple tasks without changing the weights of the network. It remains to be seen whether the technique scales to more interesting task domains. The real potential of using meaningful task representations lies in their combination with other techniques.
Superception (2013) at the AGI Summer School 2013
As part of the AGI Summer School 2013 in Beijing students were given the optional assignment to propose a relevant topic and give a presentation on it. In this presentation I discuss the issues of bridging the gap between perception and reasoning in AGI. While specific perception modalities shouldn't necessarily be included in a general architecture, a general architecture should be able to support any sensors that are deemed necessary. For some types of data (e.g. visual) sensors tend to produce very low-level, highly complex perceptual data at very high throughput rates. This may be tough to deal with for high-level reasoning architectures with the overhead of full generality. In this presentation I discuss six types of strategies for dealing with this and relate them back to the architectures covered in the summer school.
Curiosity (2013) at the AGI Summer School 2013
As part of the AGI Summer School 2013 in Beijing students were given the optional assignment to propose a relevant topic and give a presentation on it. In this presentation I discuss the importance of curiosity in a system with insufficient knowledge and resources. Time and energy should not be wasted when no obvious course of action presents itself, but can more effectively be used to acquire more knowledge. The presentation discusses Schmidhuber's theories of fun & creativity and active exploration & curiosity, Oudeyer & Kaplan's Intelligent Adaptive Curiosity model, and the Work-Play-Dream framework by Steunebrink, Koutník, Thórisson, Nivel & Schmidhuber, which would later be presented at the main AGI-13 conference by Eric Nivel. These theories are related primarily to AERA with an attempt to elicit discussion about curiosity in NARS and OpenCog (the cognitive architectures covered at the summer school).
Master presentation: Stimulation effects in SSVEP-Based BCIs
Most information in this presentation can be found in my master thesis and the unpublished "Effects of Stimulation Properties in Steady-State Visual Evoked Potential Based Brain-Computer Interfaces" paper referenced above. I gave this presentation once at Philips Research to report on my findings and once at the Radboud University as a defense for receiving my master of science degree.
Bachelor presentation: Sequentially Learning Multiple Tasks in ANNs
Most information in this presentation can be found in my bachelor thesis and the "Meaningful Representations Prevent Catastrophic Interference" publication referenced above. I gave this presentation to defend my research in order to receive my bachelor of science degree.
Artificial Intelligence in High School Education (2007) at the Dutch Computer Science Education Conference (NIOC)
I was asked to give a presentation on how Artificial Intelligence (AI) topics could be added to expanding high school Computer Science (CS) curricula. The presentation first introduced AI in terms of its various subfields and applications and made a case for its importance in society. Two incorporation approaches were suggested: 1) to teach an AI module at the end of the course when students had sufficiently strong CS skills, or 2) to integrate it throughout the course by using AI examples and assignments to illustrate and elaborate on CS concepts. The presentation was in Dutch.