Deep name generator is a recurrent neural network for generating realistic-sounding (and sometimes even existing) names, with an interface written in JavaScript and running in the browser.

Figure (left): Architecture of the currently used model, with input and output dimensions

Parameters

  • Language model: selects the language model. Different models were trained on different datasets, some with slightly different network architectures.

  • Word start: letters that you want your word to begin with.

  • Randomness: coefficient determining how random the output will be; a higher value means more randomness. For randomness = 0, the output is purely deterministic, randomness around 1 corresponds to the standard probabilities learned by the model, and for high coefficients the outputs degenerate into random letter sequences.

Algorithm

The model generates the word sequentially, one character at each step, always taking into account all previously generated characters. At each step, the already generated characters are encoded into sparse vectors with 1 at the position of the character in the alphabet and 0 at all other positions (one-hot encoding). These vectors are fed into a neural network which outputs a vector \(P\) of character scores: the score \(P_i\) at position \(i\) can be interpreted as the modelled probability that the letter at the \(i\)-th position of the alphabet is the next letter of the word. Using the randomness coefficient \(r\), this vector is further processed element-wise by the following formula, where \(N\) is the size of the alphabet:

\[\tilde P_i = \frac{P_i^{1/r}}{\sum_{j=1}^{N} P_j^{1/r}}\]

These new probabilities \(\tilde P_i\) are then used to sample the next letter. Note that for \(r \rightarrow 0\), sampling degenerates into argmax and the output is deterministic. On the other hand, for \(r \rightarrow \infty\), every letter is sampled with the same probability, which results in random letter sequences with no resemblance to human language.
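To make this concrete, a minimal Python sketch of the reweighting and sampling step is shown below; the function name is hypothetical, and this is an illustration rather than the project's actual JavaScript implementation:

```python
import numpy as np

def sample_next_letter(P, r):
    """Sample the index of the next letter from the score vector P,
    using the randomness coefficient r."""
    if r == 0:
        # Deterministic limit: always pick the highest-scoring letter.
        return int(np.argmax(P))
    P_tilde = P ** (1.0 / r)      # reweight each score by the exponent 1/r
    P_tilde /= P_tilde.sum()      # renormalize into a probability vector
    return int(np.random.default_rng().choice(len(P), p=P_tilde))
```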

The neural network consists of one-dimensional convolutional layers with kernel size 1, which transform the sparse one-hot representation of a character into a dense character embedding and process it further. These are followed by one or more recurrent layers which integrate the information from the previous characters. The final layer uses softmax activation and outputs the score vector \(P\).
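As an illustration of this architecture, a Keras sketch follows; the layer widths, the choice of LSTM cells, and the alphabet size are assumptions, not the hyperparameters of the trained models:

```python
import tensorflow as tf

ALPHABET_SIZE = 30  # assumed: letters plus an end-of-word marker

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, ALPHABET_SIZE)),  # one-hot characters
    # Kernel-size-1 convolutions act on each character separately:
    # they map the sparse one-hot vector to a dense embedding.
    tf.keras.layers.Conv1D(64, kernel_size=1, activation="relu"),
    tf.keras.layers.Conv1D(64, kernel_size=1, activation="relu"),
    # The recurrent layer integrates information from previous characters.
    tf.keras.layers.LSTM(128, return_sequences=True),
    # Softmax over the alphabet yields the score vector P for the next letter.
    tf.keras.layers.Dense(ALPHABET_SIZE, activation="softmax"),
])
```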

Training procedure

The prediction process described above already suggests the training procedure: the model was simply trained to predict the next letter of incomplete words. For this, only a simple dataset consisting of words is needed.
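For illustration, training pairs for next-letter prediction could be built from a plain word list roughly as follows; the alphabet, the end-of-word marker `$`, and the helper name are assumptions:

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz$"   # "$" marks the end of a word
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}

def encode_word(word, max_len):
    """Build one training example: one-hot inputs x and next-letter targets y."""
    word = word + "$"
    x = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    y = np.zeros(max_len, dtype=np.int64)
    for t, (cur, nxt) in enumerate(zip(word[:-1], word[1:])):
        x[t, CHAR_TO_IDX[cur]] = 1.0   # input: one-hot of the current character
        y[t] = CHAR_TO_IDX[nxt]        # target: index of the following character
    # Positions past the end of the word stay zero; a real training loop
    # would mask them out of the loss.
    return x, y
```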

Implementation

The model was implemented and trained in Python TensorFlow with the Keras API and exported to TensorFlow.js. The inference is done purely in JavaScript, which means the neural network is currently running right in your browser. Source code can be found in this repo.
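As a sketch, the export step for a trained Keras model could look like this, assuming the `tensorflowjs` Python package; the output directory name is illustrative:

```python
import tensorflowjs as tfjs

# Convert the trained Keras model into the TensorFlow.js format,
# which the browser can then load and run for inference.
tfjs.converters.save_keras_model(model, "tfjs_model")
```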

Data sources