Recurrent Neural Networks (RNNs) have gained a lot of attention in recent years because they have shown great promise in many natural language processing tasks. Despite their popularity, there are only a limited number of tutorials that explain how to implement a simple and interesting application using state-of-the-art tools. In this series, we will use a recurrent neural network to train an AI programmer that can write Java code like a real programmer (hopefully). The following will be covered:
1. Building a simple AI programmer
2. Improving the AI programmer – Using tokens
3. Improving the AI programmer – Using different network structures (this post)
In the previous posts, we built a basic AI programmer using characters and tokens as training data, respectively. Both approaches use a simple 1-layer LSTM neural network. More specifically, the network uses a many-to-one structure, as shown in the following diagram:
For sequence-to-sequence predictions, there are other structures such as one-to-many and many-to-many. In this post, we will implement a simple many-to-many network structure like the following. The code has been pushed to the same repository on GitHub (the link is provided at the end of this post).
Since most of the code is the same as in the previous post, I only highlight the differences here.
1. Prepare the training data
Since this time we will predict a sequence instead of just the next token, y should also be a sequence: y is X left-shifted by one token.
NUM_INPUT_TOKENS = 10
step = 3
sequences = []
for i in range(0, len(tokenized) - NUM_INPUT_TOKENS - 1, step):
    sequences.append(tokenized[i: i + NUM_INPUT_TOKENS + 1])
print('# of training sequences:', len(sequences))

X_temp = np.zeros((len(sequences), NUM_INPUT_TOKENS + 1, len(uniqueTokens)), dtype=np.bool)
X = np.zeros((len(sequences), NUM_INPUT_TOKENS, len(uniqueTokens)), dtype=np.bool)
y = np.zeros((len(sequences), NUM_INPUT_TOKENS, len(uniqueTokens)), dtype=np.bool)

for i, sequence in enumerate(sequences):
    for t, char in enumerate(sequence):
        X_temp[i, t, token_indices[char]] = 1

num_sequences = len(X_temp)
for i, vec in enumerate(X_temp):
    y[i] = vec[1:]
    X[i] = vec[:-1]
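To make the shift concrete, here is a tiny made-up example (the tokens below are hypothetical, not taken from the actual training data):

sequence = ['public', 'static', 'void', 'main', '(']   # a made-up 5-token sequence
X_example = sequence[:-1]   # ['public', 'static', 'void', 'main']
y_example = sequence[1:]    # ['static', 'void', 'main', '(']
# At every timestep t, y_example[t] is the token that follows X_example[t].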
2. Build a many-to-many recurrent neural network
Here is the code to build a many-to-many recurrent network.
model = Sequential()
model.add(LSTM(128, input_shape=(NUM_INPUT_TOKENS, len(uniqueTokens)), return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
print(model.summary())
You can print the network structure:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 10, 128)           670208
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 1180)          152220
_________________________________________________________________
activation_1 (Activation)    (None, 10, 1180)          0
=================================================================
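As a sanity check on the parameter counts (the formulas below are the standard Keras counts for LSTM and Dense layers, not code from this project):

vocab, units = 1180, 128                         # len(uniqueTokens) and the number of LSTM units
lstm_params = 4 * units * (vocab + units + 1)    # 4 gates x units x (input + recurrent + bias)
dense_params = units * vocab + vocab             # weights + biases, shared across all timesteps
print(lstm_params, dense_params)                 # 670208 152220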
As we did for the many-to-one structure, we can easily stack one more LSTM layer, like the following:
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(NUM_INPUT_TOKENS, len(uniqueTokens))))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
print(model.summary())
The network structure is like this:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 10, 128)           670208
_________________________________________________________________
lstm_2 (LSTM)                (None, 10, 128)           131584
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 1180)          152220
_________________________________________________________________
activation_1 (Activation)    (None, 10, 1180)          0
=================================================================
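The second LSTM layer follows the same parameter formula with an input dimension of 128 instead of 1180: 4 * 128 * (128 + 128 + 1) = 131,584. Training is unchanged from the previous posts; for completeness, here is a minimal fit call (the batch size and epoch count here are illustrative assumptions, not the exact values used in the series):

model.fit(X, y, batch_size=128, epochs=1)   # run repeatedly and sample in between to watch progress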
3. Results
The results look better than those from the previous many-to-one network after only a few iterations. I highly recommend running the code, making your own observations, and thinking about the reason. That would be a good exercise.
runattributes = numberelements [ i ] . offsets [ currindex ] ; patternentry ucompactintarray ; import sun . util . oldstart ;
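The generation loop is essentially the same as in the previous posts. For reference, here is a minimal sketch; it assumes the sample() helper and the token_indices / indices_token mappings from the earlier posts, and the function name generate and its default arguments are my own assumptions:

def generate(model, seed_tokens, length=50, temperature=1.0):
    generated = list(seed_tokens)                    # seed_tokens must hold NUM_INPUT_TOKENS tokens
    for _ in range(length):
        x = np.zeros((1, NUM_INPUT_TOKENS, len(uniqueTokens)))
        for t, token in enumerate(generated[-NUM_INPUT_TOKENS:]):
            x[0, t, token_indices[token]] = 1
        preds = model.predict(x, verbose=0)[0]       # shape (NUM_INPUT_TOKENS, vocab): one softmax per timestep
        next_index = sample(preds[-1], temperature)  # only the last timestep predicts the unseen next token
        generated.append(indices_token[next_index])
    return ' '.join(generated)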
4. What’s Next?
In this post, I used a many-to-many network structure to train the model, and the model predicts token sequences. Just for fun, you can also try the one-to-many network. Check out this table to see other network structures. In addition, there are many other parameters we can tune to make the training faster and the AI Programmer better.
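If you want to experiment with the one-to-many structure, here is a minimal sketch in Keras. It is my own illustration, not code from this series, and OUTPUT_LEN (the number of tokens predicted from a single input token) is an assumption:

from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM, RepeatVector, TimeDistributed

OUTPUT_LEN = 10                                            # hypothetical output sequence length
model = Sequential()
model.add(Dense(128, input_shape=(len(uniqueTokens),)))    # encode the single one-hot input token
model.add(RepeatVector(OUTPUT_LEN))                        # feed the same encoding to every output timestep
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(len(uniqueTokens))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')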
Source Code
1) The source code of this post is lstm_ai_coder_tokens_many2many.py, which is located at https://github.com/ryanlr/RNN-AI-Programmer
Thanks for this Post sir.
Thanks for posting a nice tutorial.
I have tried to replace the LSTM with a Bidirectional LSTM in the sequence-to-sequence model in order to compare the performance, but it does not work. Can you please let me know the reason for this? Is it possible or not to achieve next-token prediction for Java language modeling this way?