Build an AI Programmer using Recurrent Neural Network (3)

Recurrent Neural Networks (RNNs) have been gaining a lot of attention in recent years because they have shown great promise in many natural language processing tasks. Despite their popularity, there are only a limited number of tutorials that explain how to implement a simple and interesting application using state-of-the-art tools. In this series, we will use a recurrent neural network to train an AI programmer that can write Java code like a real programmer (hopefully). The following will be covered:

1. Building a simple AI programmer
2. Improving the AI programmer – Using tokens
3. Improving the AI programmer – Using different network structures (this post)

In the previous posts, we built a basic AI programmer using characters and tokens as training data, respectively. Both approaches use a simple 1-layer LSTM neural network with a many-to-one structure, as shown in the following diagram:

For sequence-to-sequence predictions, there are other structures such as one-to-many and many-to-many. In this post, we will implement a simple many-to-many network structure like the following. The code is pushed to the same repository on GitHub (link is provided at the end of this post).

Since most of the code is the same as in the previous post, I only highlight the differences here.

1. Prepare the training data

Since this time we predict a sequence instead of just the next token, the target y should also be a sequence: y is X shifted left by one token.

import numpy as np

NUM_INPUT_TOKENS = 10
step = 3
sequences = []

# Collect overlapping windows of NUM_INPUT_TOKENS + 1 tokens; each window
# will be split into an input sequence X and a left-shifted target y.
for i in range(0, len(tokenized) - NUM_INPUT_TOKENS - 1, step):
    sequences.append(tokenized[i: i + NUM_INPUT_TOKENS + 1])

print('# of training sequences:', len(sequences))

# One-hot encode each window, then split it into X (the first NUM_INPUT_TOKENS
# steps) and y (the same window shifted left by one step).
X_temp = np.zeros((len(sequences), NUM_INPUT_TOKENS + 1, len(uniqueTokens)), dtype=bool)
X = np.zeros((len(sequences), NUM_INPUT_TOKENS, len(uniqueTokens)), dtype=bool)
y = np.zeros((len(sequences), NUM_INPUT_TOKENS, len(uniqueTokens)), dtype=bool)

for i, sequence in enumerate(sequences):
    for t, token in enumerate(sequence):
        X_temp[i, t, token_indices[token]] = 1

for i, vec in enumerate(X_temp):
    y[i] = vec[1:]
    X[i] = vec[:-1]
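As a quick sanity check (this snippet is my own addition, not part of the original script), you can decode one training pair back into tokens and confirm that y is X shifted left by one. It assumes indices_token, the reverse mapping of token_indices from the previous post:

# Hypothetical sanity check: decode the first training pair back to tokens.
# Assumes indices_token is the reverse mapping of token_indices.
indices_token = {idx: tok for tok, idx in token_indices.items()}

x_tokens = [indices_token[np.argmax(step_vec)] for step_vec in X[0]]
y_tokens = [indices_token[np.argmax(step_vec)] for step_vec in y[0]]

print(x_tokens)  # tokens t0 .. t9
print(y_tokens)  # the same window shifted left by one: t1 .. t10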

2. Build a many-to-many recurrent neural network

Here is the code to build a many-to-many recurrent network.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed
from keras.optimizers import RMSprop

model = Sequential()
# return_sequences=True makes the LSTM emit an output at every time step,
# which TimeDistributed(Dense) then maps to a token distribution per step.
model.add(LSTM(128, input_shape=(NUM_INPUT_TOKENS, len(uniqueTokens)), return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))
optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
print(model.summary())

You can print the network structure:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 128)           670208    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 1180)          152220    
_________________________________________________________________
activation_1 (Activation)    (None, 10, 1180)          0         
=================================================================
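The training call itself is not shown in the post; a minimal sketch, assuming the X and y prepared above (the batch size and number of epochs below are illustrative values, not tuned settings):

# Minimal training sketch; hyperparameters are illustrative only.
model.fit(X, y, batch_size=128, epochs=50)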

As we did for the many-to-one structure, we can easily stack one more LSTM layer, like the following:

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(NUM_INPUT_TOKENS, len(uniqueTokens))))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))
optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
print(model.summary())

The network structure is like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 128)           670208    
_________________________________________________________________
lstm_2 (LSTM)                (None, 10, 128)           131584    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 1180)          152220    
_________________________________________________________________
activation_1 (Activation)    (None, 10, 1180)          0         
=================================================================

3. Results

The results look better than those of the previous many-to-one network after only a few iterations. I highly recommend running the code, making your own observations, and thinking about the reason. That would be a good exercise.

runattributes = numberelements [ i ] . offsets [ currindex ] ; 
patternentry ucompactintarray ; 
import sun . util . oldstart ; 
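The sampling loop that produced the output above is not listed in this post; the following is a rough sketch of how generation could work with a many-to-many model, and it is my own illustration rather than code from the repository. It again assumes indices_token, the reverse of token_indices: feed a seed window, take the prediction for the last time step, append the predicted token, and slide the window.

def generate(model, seed_tokens, num_generate=50):
    # Greedy sampling sketch; seed_tokens is a list of NUM_INPUT_TOKENS tokens.
    window = list(seed_tokens)
    generated = []
    for _ in range(num_generate):
        # One-hot encode the current window, the same way as the training data.
        x_pred = np.zeros((1, NUM_INPUT_TOKENS, len(uniqueTokens)), dtype=bool)
        for t, token in enumerate(window):
            x_pred[0, t, token_indices[token]] = 1
        preds = model.predict(x_pred, verbose=0)  # shape: (1, NUM_INPUT_TOKENS, vocab size)
        next_index = np.argmax(preds[0, -1])      # distribution for the last time step
        next_token = indices_token[next_index]
        generated.append(next_token)
        window = window[1:] + [next_token]        # slide the input window by one token
    return ' '.join(generated)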

4. What’s Next?

In this post, I used a many-to-many network structure to train the model, which predicts token sequences. For fun, you can also try a one-to-many network (a rough sketch is shown below). Check out this table to see other network structures. In addition, there are many other parameters we can tune to make training faster and the AI Programmer better.
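If you want to experiment with the one-to-many idea, a minimal sketch in Keras could use RepeatVector to turn a single input vector into a sequence. This is my own illustration of the structure, not code from the repository:

from keras.layers import RepeatVector

# One-to-many sketch: one token in, a sequence of NUM_INPUT_TOKENS tokens out.
model = Sequential()
model.add(Dense(128, input_shape=(len(uniqueTokens),)))  # encode the single input token
model.add(RepeatVector(NUM_INPUT_TOKENS))                # repeat it for every output time step
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))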

Source Code

1) The source code of this post is lstm_ai_coder_tokens_many2many.py, which is located at https://github.com/ryanlr/RNN-AI-Programmer
