AI
DeepMind papers at ICML 2018
The 2018 International Conference on Machine Learning will take place in Stockholm, Sweden from 10-15 July. For those attending and planning the week ahead, we are sharing a schedule of DeepMind presentations at ICML (you can download a pdf version here). We look forward to the many engaging discussions, ideas, and collaborations that are sure to arise from the conference!
Efficient Neural Audio Synthesis
Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Nouri, Norman Casagrande, Edward Lockhart, Sander Dieleman, Aaron van den Oord, Koray Kavukcuoglu
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. Efficient sampling for this class of models at the cost of little to no loss in quality has however remained an elusive task. With a focus on text-to-speech synthesis, we show that compact recurrent architectures, a remarkably high degree of weight sparsification and a novel reordering of the variables greatly reduce sampling latency while maintaining high audio fidelity. We first describe a compact single-layer recurrent neural network, the WaveRNN, with a novel dual softmax layer that matches the quality of the state-of-the-art WaveNet model. Persistent GPU kernels for the WaveRNN are able to synthesize 24kHz 16-bit audio 4 times faster than real time. We then apply a weight sparsification technique to the model, and show that, given a constant number of weights, large sparse networks perform better than small dense networks. Using a large Sparse WaveRNN, we demonstrate the first instance of real-time synthesis of high-fidelity audio on a low-power mobile phone CPU. Finally, we introduce a novel reordering of the variables in the factorization of the joint distribution. The reordering makes it possible to trade vacuous dependencies on samples from the distant future for the ability to generate in batches. The Batch WaveRNN produces up to 16 samples per step maintaining high quality and enables audio synthesis that is up to 40 times faster than real time.
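To make the dual softmax concrete: instead of a single softmax over all 65,536 values of a 16-bit sample, the WaveRNN predicts the high ("coarse") and low ("fine") 8 bits with two 256-way softmaxes. A minimal sketch of just the sample split (the surrounding recurrent network is omitted):

```python
def split_sample(sample_16bit):
    """Split an unsigned 16-bit audio sample into coarse and fine 8-bit parts,
    each of which gets its own 256-way softmax in the WaveRNN."""
    coarse = sample_16bit >> 8    # high 8 bits
    fine = sample_16bit & 0xFF    # low 8 bits
    return coarse, fine

def combine_sample(coarse, fine):
    """Recombine the two 8-bit parts into the original 16-bit sample."""
    return (coarse << 8) | fine

s = 43210
c, f = split_sample(s)
assert combine_sample(c, f) == s
# Two 256-way softmaxes (2 * 256 = 512 output units) replace a single
# 65,536-way softmax over the raw 16-bit sample space.
```

The coarse part is predicted first and fed into the prediction of the fine part, which is what keeps the factorization exact.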
Presentations:
 11:00 – 11:20 AM @ Victoria (Oral)
 06:15 – 09:00 PM @ Hall B #105 (Poster)
Learning to Search with MCTSnets
Authors: Arthur Guez*, Theophane Weber*, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, David Silver
Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimised to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.
Presentations:
 11:20 – 11:30 AM @ Victoria (Oral)
 06:15 – 09:00 PM @ Hall B #92 (Poster)
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
Authors: Gellert Weisz, Andras Gyorgy, and Csaba Szepesvari
We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and to do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LEAPSANDBOUNDS, an algorithm that tests configurations on randomly selected problem instances for longer and longer times. We prove that the capped expected runtime of the configuration returned by LEAPSANDBOUNDS is close to the optimal expected runtime, while our algorithm's running time is near-optimal. Our results show that LEAPSANDBOUNDS is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method that claims to have non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a public benchmark also stand witness to the superiority of our method.
Presentations:
 11:30 – 11:40 AM @ A6 (Oral)
 06:15 – 09:00 PM @ Hall B #165 (Poster)
Implicit Quantile Networks for Distributional Reinforcement Learning
Authors: Will Dabney*, Georg Ostrovski*, David Silver, Remi Munos
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
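The quantile regression at the heart of the method can be illustrated without the network: minimising the pinball loss at level τ drives a scalar estimate toward the τ-quantile of the target distribution. A toy sketch of just that mechanism (not the full IQN, which additionally embeds sampled τ values and trains a DQN-style network):

```python
import numpy as np

def pinball_loss(u, tau):
    """Quantile-regression (pinball) loss for error u at quantile level tau."""
    return u * (tau - (u < 0))

# Minimising the pinball loss by stochastic gradient descent recovers the
# tau-quantile; here tau = 0.5, so theta converges toward the median (~2.0).
rng = np.random.default_rng(0)
theta, tau, lr = 0.0, 0.5, 0.01
for x in rng.normal(loc=2.0, size=10_000):
    u = x - theta
    theta += lr * (tau - (u < 0))  # negative gradient of pinball_loss wrt theta
```

Sampling many τ values per update, rather than fixing one, is what lets IQN approximate the whole quantile function with a single network.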
Presentations:
 11:40 – 11:50 AM @ A1 (Oral)
 06:15 – 09:00 PM @ Hall B #3 (Poster)
Graph Networks as Learnable Physics Engines for Inference and Control
Authors: Alvaro Sanchez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models—based on graph networks—which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification from real and simulated data. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
Presentations:
 11:50 AM – 12:00 PM @ Victoria (Oral)
 06:15 – 09:00 PM @ Hall B #84 (Poster)
More Robust Doubly Robust Off-policy Evaluation
Authors: Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of a policy from data generated by one or more other policies. In particular, we focus on the doubly robust (DR) estimators that consist of an importance sampling (IS) component and a performance model, and utilize the low (or zero) bias of IS and the low variance of the model at the same time. Although the accuracy of the model has a huge impact on the overall performance of DR, most of the work on using the DR estimators in OPE has focused on improving the IS part, and not much on how to learn the model. In this paper, we propose alternative DR estimators, called more robust doubly robust (MRDR), that learn the model parameters by minimizing the variance of the DR estimator. We first present a formulation for learning the DR model in RL. We then derive formulas for the variance of the DR estimator in both contextual bandits and RL, such that their gradients w.r.t. the model parameters can be estimated from the samples, and propose methods to efficiently minimize the variance. We prove that the MRDR estimators are strongly consistent and asymptotically optimal. Finally, we evaluate MRDR in bandit and RL benchmark problems, and compare its performance with the existing methods.
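The DR estimator itself is easy to state in code for the contextual bandit case. A sketch with illustrative variable names (the paper's MRDR contribution is additionally learning q_hat to minimise this estimator's variance, which is omitted here):

```python
import numpy as np

def doubly_robust_value(actions, rewards, pi_e, pi_b, q_hat):
    """Doubly robust off-policy value estimate for contextual bandits.
    actions, rewards: length-n logged actions and rewards.
    pi_e, pi_b: (n, A) action probabilities under the evaluation and
    behaviour policies at each logged context.
    q_hat: (n, A) reward-model predictions at each logged context."""
    idx = np.arange(len(rewards))
    v_model = np.sum(pi_e * q_hat, axis=1)           # model-based term
    rho = pi_e[idx, actions] / pi_b[idx, actions]    # importance weights
    correction = rho * (rewards - q_hat[idx, actions])
    return float(np.mean(v_model + correction))

# With a perfect reward model the correction vanishes and the estimate is
# exact: the true value of pi_e here is 0.8 * 1 + 0.2 * 0 = 0.8.
actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 0.0])
pi_e = np.tile([0.8, 0.2], (4, 1))
pi_b = np.tile([0.5, 0.5], (4, 1))
q_hat = np.tile([1.0, 0.0], (4, 1))
print(doubly_robust_value(actions, rewards, pi_e, pi_b, q_hat))  # 0.8
```

The estimate stays unbiased when either the model or the importance weights are correct, which is what "doubly robust" refers to.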
Presentations:
 11:50 AM – 12:00 PM @ A1 (Oral)
 06:15 – 09:00 PM @ Hall B #62 (Poster)
Conditional Neural Processes
Authors: Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, S. M. Ali Eslami
Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion.
Presentations:
 02:10 – 02:20 PM @ Victoria (Oral)
 06:15 – 09:00 PM @ Hall B #130 (Poster)
Generative Temporal Models with Spatial Memory for Partially Observed Environments
Authors: Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola
In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
Presentations:
 02:30 – 02:50 PM @ A7 (Oral)
 06:15 – 09:00 PM @ Hall B #101 (Poster)
Disentangling by Factorising
Authors: Hyunjik Kim, Andriy Mnih
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon β-VAE by providing a better trade-off between disentanglement and reconstruction quality and being more robust to the number of training iterations. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.
Presentations:
 02:50 – 03:00 PM @ A7 (Oral)
 06:15 – 09:00 PM @ Hall B #90 (Poster)
Learning by Playing – Solving Sparse Reward Tasks from Scratch
Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tobias Springenberg
We propose Scheduled Auxiliary Control (SAC), a new learning paradigm in the context of Reinforcement Learning (RL). SAC enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To achieve this the agent is equipped with a set of general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
Read more on the DeepMind blog.
Presentations:
 04:20 – 04:40 PM @ A1 (Oral)
 06:15 – 09:00 PM @ Hall B #41 (Poster)
Fast Parametric Learning with Activation Memorization
Authors: Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap
Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which attends over recent activations. We explore a simplified architecture where we treat a subset of the model parameters as fast memory stores. This can help retain information over longer time intervals than a traditional memory, and does not require additional space or compute. In the case of image classification, we display faster binding of novel classes on an Omniglot image curriculum task. We also show improved performance for word-based language models on news reports (GigaWord), books (Project Gutenberg) and Wikipedia articles (WikiText-103) — the latter achieving state-of-the-art perplexity.
Presentations:
 04:40 – 04:50 PM @ Victoria (Oral)
 06:15 – 09:00 PM @ Hall B #121 (Poster)
Learning Implicit Generative Models with the Method of Learned Moments
Authors: Suman Ravuri, Shakir Mohamed, Mihaela Rosca, and Oriol Vinyals
We propose a method of moments (MoM) algorithm for training large-scale implicit generative models. Moment estimation in this setting encounters two problems: it is often difficult to define the millions of moments needed to learn the model parameters, and it is hard to determine which properties are useful when specifying moments. To address the first issue, we introduce a moment network, and define the moments as the gradient of the network's output with respect to its parameters and the network's hidden units. To tackle the second problem, we use asymptotic theory to highlight desiderata for moments – namely they should minimize the asymptotic variance of estimated model parameters – and introduce an objective to learn better moments. The sequence of objectives created by this Method of Learned Moments (MoLM) can train high-quality neural image samplers. On CIFAR-10, we demonstrate that MoLM-trained generators achieve significantly higher Inception Scores and lower Fréchet Inception Distances than those trained with gradient-penalty-regularized adversarial objectives. These generators also achieve nearly perfect Multi-Scale Structural Similarity Scores on CelebA, and can create high-quality samples of resolutions up to 128×128.
Presentations:
 04:40 – 04:50 PM @ A7 (Oral)
 06:15 – 09:00 PM @ Hall B #112 (Poster)
Automatic Goal Generation for Reinforcement Learning Agents
Authors: David Held, Xinyang Geng, Carlos Florensa, Pieter Abbeel
Reinforcement learning is a powerful technique to train an agent to perform a task. However, an agent that is trained using reinforcement learning is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing. We use a generator network to propose tasks for the agent to try to achieve, specified as goal states. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent. Our method thus automatically produces a curriculum of tasks for the agent to learn. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment. Our method can also learn to achieve tasks with sparse rewards, which traditionally pose significant challenges.
Presentations:
 04:40 – 04:50 PM @ A1 (Oral)
 06:15 – 09:00 PM @ Hall B #135 (Poster)
Machine Theory of Mind
Authors: Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick
Theory of mind (ToM) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters. The ToMnet learns a strong prior model for agents' future behaviour, and, using only a small number of behavioural observations, can bootstrap to richer predictions about agents' characteristics and mental states. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep RL agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test of recognising that others can hold false beliefs about the world.
Presentations:
 05:00 – 05:20 PM @ A3 (Oral)
 06:15 – 09:00 PM @ Hall B #208 (Poster)
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Authors: Ofir Nachum, Yinlam Chow, and Mohammad Ghavamzadeh
We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has nonzero probability for only a small number of actions. This addresses the main drawback of the standard Shannon-entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the suboptimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this suboptimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.
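For the standard Tsallis-entropy case, the sparse optimal policy described above has a closed form known as sparsemax (Martins & Astudillo, 2016): a Euclidean projection onto the simplex that zeroes out weak actions. A sketch of that projection (not the paper's sparse PCL algorithm itself):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax projection: the form of the optimal policy under Tsallis
    entropy regularisation; assigns exactly zero mass to weak actions."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum   # actions that keep nonzero mass
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

p = sparsemax([1.0, 0.9, -2.0])
print(p)  # [0.55 0.45 0.  ] — the clearly suboptimal action gets exactly 0
```

Contrast with softmax, which would still give the third action a small but nonzero probability.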
Presentations:
 05:20 – 05:40 PM @ A1 (Oral)
 06:15 – 09:00 PM @ Hall B #172 (Poster)
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
Authors: Andre Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Zidek, Remi Munos
The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we investigate the feasibility of combining SF&GPI with the representation power of deep learning. Since in deep RL we are interested in learning all the components of SF&GPI concurrently, the existing interdependencies between them can lead to instabilities. In this work we propose a solution for this problem that makes it possible to use SF & GPI online, at scale. In order to empirically verify this claim, we apply the proposed method to a complex 3D environment that requires hundreds of millions of transitions to be solved. We show that the transfer promoted by SF&GPI leads to reasonable policies on unseen tasks almost instantaneously. We also show how to build on the transferred policies to learn policies that are specialised to the new tasks, which can then be added to the agent’s set of skills to be used in the future.
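The GPI step is simple to sketch: given successor features ψ_i(s, a) for each previously learned policy and a weight vector w describing the new task's reward, act greedily with respect to the best of all the induced Q-values. A minimal illustration with made-up numbers:

```python
import numpy as np

def gpi_action(psi, w):
    """Generalised policy improvement over successor features.
    psi: (n_policies, n_actions, d) successor features psi_i(s, a) at the
    current state; w: (d,) reward weights describing the new task."""
    q = psi @ w                        # Q_i(s, a) = psi_i(s, a) . w
    return int(q.max(axis=0).argmax()) # best action under the best policy

# Two previously learned policies, two actions, two reward features:
psi = np.array([[[1.0, 0.0], [0.0, 0.0]],    # policy 0 is good at action 0
                [[0.0, 0.0], [0.0, 2.0]]])   # policy 1 is good at action 1
w = np.array([1.0, 1.0])                     # the new task values both features
print(gpi_action(psi, w))  # 1
```

Because the successor features are independent of w, the same ψ_i can be reused for any new task simply by swapping in a new weight vector, which is what makes the transfer nearly instantaneous.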
Presentations:
 05:20 – 05:40 PM @ A3 (Oral)
 06:15 – 09:00 PM @ Hall B #163 (Poster)
Been There, Done That: Meta-Learning with Episodic Recall
Authors: Samuel Ritter, Jane Wang, Sid Jayakumar, Zeb KurthNelson, Charles Blundell, Razvan Pascanu, Matt Botvinick
Meta-learning agents have demonstrated the ability to rapidly explore and exploit new tasks sampled from the task distribution on which they were trained. However, when these agents encounter situations that they explored in the distant past, they are not able to remember the results of their past exploration. Thus, instead of immediately exploiting previously discovered solutions, they must again explore from scratch. In this work, we argue that the necessity to remember the results of past exploration is ubiquitous in naturalistic environments. We propose a formalism for modeling this kind of recurring environment structure, then develop a meta-learning architecture for solving such environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of this episodic LSTM in four recurrent-state stochastic process environments: (1) episodic contextual bandits, (2) compositional contextual bandits, (3) the episodic two-step task, and (4) contextual watermaze navigation.
Presentations:
 05:40 – 05:50 PM @ A3 (Oral)
 06:15 – 09:00 PM @ Hall B #209 (Poster)
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
Authors: Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli.
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. The existence of adversarial examples in trained neural networks reflects the fact that expected risk alone does not capture the model's performance against worst-case inputs. We motivate the use of adversarial risk as an objective, although it cannot easily be computed exactly. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may be obscured to adversaries by optimizing this surrogate rather than the true adversarial risk. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
Presentations:
 05:50 – 06:00 PM @ A7 (Oral)
 06:15 – 09:00 PM @ Hall B #132 (Poster)
Source: https://deepmind.com/blog/announcements/deepmindpapersicml2018
How does it know?! Some beginner chatbot tech for newbies.
Most people will know by now what a chatbot or conversational AI is. But how does one design and build an intelligent chatbot? Let’s investigate some essential concepts in bot design: intents, context, flows and pages.
I like using Google’s Dialogflow platform for my intelligent assistants. Dialogflow has a very accurate NLP engine at a cost structure that is extremely competitive. In Dialogflow there are roughly two ways to build the bot tech. One is through intents and context, the other is by means of flows and pages. Both of these design approaches have their own version of Dialogflow: “ES” and “CX”.
Dialogflow ES is the older version of the Dialogflow platform which works with intents, context and entities. Slot filling and fulfillment also help manage the conversation flow. Here are Google’s docs on these concepts: https://cloud.google.com/dialogflow/es/docs/concepts
Context is what distinguishes ES from CX. It’s a way to understand where the conversation is headed. Here’s a diagram that may help understand how context works. Each phrase that you type triggers an intent in Dialogflow. Each response by the bot happens after your message has triggered the most likely intent. It’s Dialogflow’s NLP engine that decides which intent best matches your message.
What’s funny is that even though you typed ‘yes’ in exactly the same way twice, the bot gave you different answers. There are two intents that have been programmed to respond to ‘yes’, but only one of them is selected. This is how we control the flow of a conversation by using context in Dialogflow ES.
Unfortunately the way we program context into a bot on Dialogflow ES is not supported by any visual tools like the diagram above. Instead we need to type this context in each intent without seeing the connection to other intents. This makes the creation of complex bots quite tedious and that’s why we map out the design of our bots in other tools before we start building in ES.
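The context mechanism described above can be illustrated in a few lines of plain Python. This is a toy imitation of the matching behaviour, with invented intent names, not the Dialogflow API: two intents both match "yes", but only the one whose input context is currently active fires.

```python
# Toy Dialogflow-ES-style context matching (illustrative names, not real API):
intents = [
    {"name": "confirm_order", "phrases": {"yes"},
     "input_context": "awaiting_order_confirmation",
     "response": "Great, your order is placed!"},
    {"name": "confirm_cancel", "phrases": {"yes"},
     "input_context": "awaiting_cancel_confirmation",
     "response": "Okay, your order is cancelled."},
]

def match_intent(message, active_contexts):
    """Select the first intent whose phrases match AND whose required
    input context is currently active."""
    for intent in intents:
        if message in intent["phrases"] and intent["input_context"] in active_contexts:
            return intent["response"]
    return "Sorry, I didn't get that."

print(match_intent("yes", {"awaiting_order_confirmation"}))   # order branch
print(match_intent("yes", {"awaiting_cancel_confirmation"}))  # cancel branch
```

In real Dialogflow ES, the previous intent's output contexts become the active contexts for the next turn, which is exactly how the same "yes" gets routed to different responses.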
The newer Dialogflow CX allows for a more advanced way of managing the conversation. By adding flows and pages as additional control tools we can now visualize and control conversations easily within the CX platform.
This entire diagram is a ‘flow’ and the blue blocks are ‘pages’. This visualization shows how we create bots in Dialogflow CX. It’s immediately clear how the different pages are related and how the user will move between parts of the conversation. Visuals like this are completely absent in Dialogflow ES.
It then makes sense to use different flows for different conversation paths. A possible distinction in flows might be “ordering” (as seen here), “FAQs” and “promotions”. Structuring bots through flows and pages is a great way to handle complex bots and the visual UI in CX makes it even better.
At the time of writing (October 2020) Dialogflow CX only supports English NLP and its pricing model is surprisingly steep compared to ES. But bots are becoming critical tech for an increasing number of companies and the cost reductions and quality of conversations are enormous. Building and managing bots is in many cases an ongoing task rather than a single, roundedoff project. For these reasons it makes total sense to invest in a tool that can handle increasing complexity in an easytouse UI such as Dialogflow CX.
This article aims to give insight into the tech behind bot creation and Dialogflow is used merely as an example. To understand how I can help you build or manage your conversational assistant on the platform of your choice, please contact me on LinkedIn.
Who is chatbot Eliza?
Between 1964 and 1966 Eliza was born, one of the very first conversational agents. Discover the whole story.
Between 1964 and 1966 Eliza was born, one of the very first conversational agents. Its creator, Joseph Weizenbaum, was a researcher at the famous Artificial Intelligence Laboratory at MIT (Massachusetts Institute of Technology). His goal was to enable a conversation between a computer and a human user. More precisely, the program simulates a conversation with a Rogerian psychoanalyst, whose method consists in reformulating the patient's words to let them explore their own thoughts.
The program was rather rudimentary for its time. It worked by recognizing key words or expressions and displaying, in return, questions constructed from those key words. When the program did not have an answer available, it displayed an "I understand" that was quite effective, albeit laconic.
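The mechanism is simple enough to sketch in a few lines. This toy (with invented rules, not Weizenbaum's original script) shows the three ingredients: keyword matching, pronoun reflection, and the laconic fallback:

```python
import re

# A minimal ELIZA-style responder: pattern rules with pronoun reflection,
# and a fallback when nothing matches.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}
RULES = [
    (re.compile(r"i am (.*)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(fragment):
    """Swap first-person words for second-person ones ('my job' -> 'your job')."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def eliza(message):
    for pattern, template in RULES:
        m = pattern.search(message)
        if m:
            return template.format(reflect(m.group(1)))
    return "I understand."  # the laconic fallback the article mentions

print(eliza("I am sad about my job"))  # -> "Why do you say you are sad about your job?"
print(eliza("Nothing in particular"))  # -> "I understand."
```

The original ELIZA ranked keywords and decomposed sentences more elaborately, but the illusion of understanding rests on exactly this kind of substitution.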
Weizenbaum explains that his primary intention was to show the superficiality of communication between a human and a machine. He was very surprised when he realized that many users were getting caught up in the game, completely forgetting that the program had no real intelligence and was devoid of any feelings and emotions. He even said that his secretary would discreetly consult Eliza about her personal problems, eventually forcing the researcher to unplug the program.
Conversing with a computer while believing it to be a human being is one of the criteria of Turing's famous test: artificial intelligence is said to exist when a human cannot discern whether or not the interlocutor is human. Eliza, in this sense, passed the test brilliantly according to its users.
Eliza thus opened the way (or the voice!) to what has been called chatbots, an abbreviation of chatterbot, itself an abbreviation of chatter robot, literally “talking robot”.
Source: https://chatbotslife.com/whoischatbotelizabfeef79df804?source=rss—a49517e4c30b—4
FermiNet: Quantum Physics and Chemistry from First Principles
We've developed a new neural network architecture, the Fermionic Neural Network or FermiNet, which is well-suited to modeling the quantum state of large collections of electrons, the fundamental building blocks of chemical bonds.
Unfortunately, 0.5% error still isn’t enough to be useful to the working chemist. The energy in molecular bonds is just a tiny fraction of the total energy of a system, and correctly predicting whether a molecule is stable can often depend on just 0.001% of the total energy of a system, or about 0.2% of the remaining “correlation” energy. For instance, while the total energy of the electrons in a butadiene molecule is almost 100,000 kilocalories per mole, the difference in energy between different possible shapes of the molecule is just 1 kilocalorie per mole. That means that if you want to correctly predict butadiene’s natural shape, then the same level of precision is needed as measuring the width of a football field down to the millimeter.
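The ratios quoted here are easy to verify directly:

```python
# Checking the scale comparison in the text (approximate figures from above):
total_energy = 100_000      # kcal/mol, total electronic energy of butadiene
shape_difference = 1        # kcal/mol between possible molecular shapes
relative_precision = shape_difference / total_energy
print(relative_precision)   # 1e-05, i.e. 0.001% of the total energy

# The same ratio as measuring a ~100 m football field to the millimetre:
field_m, precision_m = 100, 0.001
print(precision_m / field_m)  # 1e-05
```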
With the advent of digital computing after World War II, scientists developed a whole menagerie of computational methods that went beyond this mean field description of electrons. While these methods come in a bewildering alphabet soup of abbreviations, they all generally fall somewhere on an axis that trades off accuracy with efficiency. At one extreme, there are methods that are essentially exact, but scale worse than exponentially with the number of electrons, making them impractical for all but the smallest molecules. At the other extreme are methods that scale linearly, but are not very accurate. These computational methods have had an enormous impact on the practice of chemistry – the 1998 Nobel Prize in chemistry was awarded to the originators of many of these algorithms.
Fermionic Neural Networks
Despite the breadth of existing computational quantum mechanical tools, we felt a new method was needed to address the problem of efficient representation. There’s a reason that the largest quantum chemical calculations only run into the tens of thousands of electrons for even the most approximate methods, while classical chemical calculation techniques like molecular dynamics can handle millions of atoms. The state of a classical system can be described easily – we just have to track the position and momentum of each particle. Representing the state of a quantum system is far more challenging. A probability has to be assigned to every possible configuration of electron positions. This is encoded in the wavefunction, which assigns a positive or negative number to every configuration of electrons, and the wavefunction squared gives the probability of finding the system in that configuration. The space of all possible configurations is enormous – if you tried to represent it as a grid with 100 points along each dimension, then the number of possible electron configurations for the silicon atom would be larger than the number of atoms in the universe!
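The silicon comparison above can be checked directly. A sketch of the counting argument, assuming (as in the text) a grid of 100 points per dimension and using the commonly quoted rough estimate of 10^80 atoms in the observable universe:

```python
# Size of a naive grid representation of the wavefunction for silicon.
grid_points = 100
n_electrons = 14            # silicon has 14 electrons
n_dims = 3 * n_electrons    # 42 coordinates in total

n_configs = grid_points ** n_dims    # 100**42 = 10**84 grid cells
atoms_in_universe = 10 ** 80         # rough order-of-magnitude estimate

print(n_configs > atoms_in_universe)
print(f"Configurations: 10^{len(str(n_configs)) - 1}")
```

Even this modest atom already overwhelms any grid-based storage scheme, which is why a compact learned representation of the wavefunction is attractive.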
This is exactly where we thought deep neural networks could help. In the last several years, there have been huge advances in representing complex, highdimensional probability distributions with neural networks. We now know how to train these networks efficiently and scalably. We surmised that, given these networks have already proven their mettle at fitting highdimensional functions in artificial intelligence problems, maybe they could be used to represent quantum wavefunctions as well. We were not the first people to think of this – researchers such as Giuseppe Carleo and Matthias Troyer have shown how modern deep learning could be used for solving idealised quantum problems. We wanted to use deep neural networks to tackle more realistic problems in chemistry and condensed matter physics, and that meant including electrons in our calculations.
There is just one wrinkle when dealing with electrons. Electrons must obey the Pauli exclusion principle, which means that they can’t be in the same place at the same time. This is because electrons are a type of particle known as fermions, which include the building blocks of most matter – protons, neutrons, quarks, neutrinos, etc. Their wavefunction must be antisymmetric – if you swap the positions of two electrons, the wavefunction gets multiplied by −1. That means that if two electrons are on top of each other, the wavefunction (and the probability of that configuration) will be zero.
This meant we had to develop a new type of neural network that was antisymmetric with respect to its inputs, which we have dubbed the Fermionic Neural Network, or FermiNet. In most quantum chemistry methods, antisymmetry is introduced using a function called the determinant. The determinant of a matrix has the property that if you swap two rows, the output gets multiplied by −1, just like a wavefunction for fermions. So you can take a bunch of singleelectron functions, evaluate them for every electron in your system, and pack all of the results into one matrix. The determinant of that matrix is then a properly antisymmetric wavefunction. The major limitation of this approach is that the resulting function – known as a Slater determinant – is not very general. Wavefunctions of real systems are usually far more complicated. The typical way to improve on this is to take a large linear combination of Slater determinants – sometimes millions or more – and add some simple corrections based on pairs of electrons. Even then, this may not be enough to accurately compute energies.
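The Slater-determinant construction can be sketched in a few lines of NumPy. This is a toy illustration of the determinant trick described above, not the FermiNet itself; the three single-electron "orbitals" are arbitrary smooth functions chosen only to make the antisymmetry visible:

```python
import numpy as np

def orbitals(r):
    """Evaluate 3 toy single-electron functions at a 3D position r."""
    x, y, _ = r
    envelope = np.exp(-np.dot(r, r))
    return np.array([envelope, x * envelope, y * envelope])

def slater_wavefunction(positions):
    """Antisymmetric 3-electron wavefunction via a determinant.

    Row i holds every orbital evaluated at electron i's position, so
    exchanging two electrons swaps two rows and negates the determinant.
    """
    matrix = np.stack([orbitals(r) for r in positions])
    return np.linalg.det(matrix)

rng = np.random.default_rng(0)
electrons = rng.normal(size=(3, 3))        # 3 electrons in 3D space

psi = slater_wavefunction(electrons)
psi_swapped = slater_wavefunction(electrons[[1, 0, 2]])  # swap electrons 0, 1
print(np.isclose(psi_swapped, -psi))       # antisymmetric under exchange

coincident = electrons.copy()
coincident[1] = coincident[0]              # two electrons at the same point
print(np.isclose(slater_wavefunction(coincident), 0.0))  # two equal rows
```

Placing two electrons at the same point gives the matrix two identical rows, so the determinant (and hence the probability of that configuration) vanishes, exactly as the Pauli exclusion principle demands.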