Choongwoo Han Software Engineer tunz

Generate JavaScript Code with Transformer for Fun

1. Introduction

A programming language is a kind of intermediate language that connects humans and machines. While both machines and humans can understand code written in a programming language, only humans can actually write it. If machines were also able to generate code automatically, we could use that ability for various jobs such as program synthesis or automated software testing. We might be able to turn descriptions into code, or generate test code automatically. Another example I can imagine with this technology is mutating existing code, and I think this could be a first step toward teaching machines how to write code, because mutating code is easier than writing it from scratch.

For those who are not familiar with it, fuzzing is random software testing to find bugs, especially memory corruption bugs in native software written in C/C++. A fuzzer generates random inputs, feeds them to a target program, and checks whether any of the inputs makes the program crash. It sounds very simple, but there are many challenges. Since modern software is very complex, there is a tremendous number of software bugs and an enormous input space. Thus, it is also important to try interesting inputs first within limited time and computing resources.
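
Conceptually, that loop is only a few lines of code. Here is a minimal Node.js sketch of my own (the ./target binary and the 256-byte input size are made-up placeholders, not part of any real setup):

const { execFileSync } = require('child_process');
const { randomBytes } = require('crypto');
const fs = require('fs');

for (let i = 0; i < 1000; ++i) {
  const input = randomBytes(256);          // generate a random input
  fs.writeFileSync('/tmp/input', input);
  try {
    execFileSync('./target', ['/tmp/input'], { stdio: 'ignore' });
  } catch (e) {
    // execFileSync throws when the target exits abnormally; a signal
    // such as SIGSEGV usually means a memory corruption crash.
    if (e.signal) console.log(`crash (${e.signal}) on iteration ${i}`);
  }
}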

For example, let’s assume that we have to fuzz JavaScript engines. How can we generate inputs for JS engines? JS engines take JavaScript code as input. Randomly generated bytes might be able to crash a JS engine, but most of them are rejected with a syntax error, and even when they do cause a crash, it is usually not exploitable. We need to execute deeper code paths, and to do so, we need to generate valid JS code. There has been a lot of research on generating syntactically valid JS code by mixing pre-defined code templates or by mutating abstract syntax trees, so that syntax errors are bypassed and we can focus on interesting bugs. These are indeed effective methods, since they have found many real bugs, but they still have many limitations because they lack semantic knowledge.
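
As a toy illustration of the AST-mutation idea (my own example, not taken from those papers), we can parse a seed test case with esprima, randomly tweak literals in the tree, and regenerate code with escodegen so that the result is always syntactically valid:

const esprima = require('esprima');
const escodegen = require('escodegen');

// Parse a seed test case into an AST.
const ast = esprima.parseScript('let arr = new Array(123); arr[0] = 1;');

// Randomly replace numeric literals while keeping the tree well-formed.
(function mutate(node) {
  if (!node || typeof node !== 'object') return;
  if (node.type === 'Literal' && typeof node.value === 'number') {
    node.value = Math.floor(Math.random() * 1000);
    node.raw = String(node.value);
  }
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach(mutate);
    else if (child && typeof child === 'object') mutate(child);
  }
})(ast);

console.log(escodegen.generate(ast)); // still valid JS, but only syntactically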

I think we will be able to overcome these limitations with deep learning and reinforcement learning techniques someday. Software testing is an optimization problem. Coverage-guided fuzzers like afl, which are essentially hill climbing and genetic algorithms driven by coverage feedback, have already shown good performance. So I expect that intuition learned by deep learning can improve fuzzing by pruning uninteresting parts of the input space, much like what AlphaGo did. But there is a long way to go, and there are many challenges to solve before we get there.

First, a deep learning model has to understand the structure of JavaScript code. I think the first time I tried deep learning was in 2015 or 2016, when I read Andrej Karpathy’s blog post. I was skeptical at the time because the code generated by char-rnn had many mistakes. The experimental results were very interesting, but char-rnn did not seem to understand code structure and semantics. Its output merely looked like code, and it was hard to even get it to compile, because compilers do not tolerate a single syntactic mistake.

A few years have passed since then, so I decided to check how much deep learning has improved.

2. Overview of Transformer

Transformer is a deep learning model introduced by Attention Is All You Need in 2017. Recently, it has been actively studied in various fields, especially in natural language processing (NLP), and it outperforms existing state-of-the-art models. Transformer improved training performance significantly with self-attention. While earlier techniques such as RNNs feed the input and target sequences to the encoder and decoder one token at a time, Transformer makes it possible to feed the whole input and target sequences at once during training.
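
For reference, the core building block is the scaled dot-product attention from the paper, where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the keys:

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V \]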

[transformer model image]

I don’t want to explain everything about Transformer in this post, so let’s skip the details for now. You can find an implementation in the tensor2tensor GitHub repository, and I think this post describes Transformer well with illustrations.

Since I’m not an expert in deep learning, I can’t judge exactly how great it is, but I have heard the term “Transformer” and about its good performance so often that it inspired me to give it a try.

3. Experiments

Let’s see what happens when we train a JS language model with Transformer. I designed two simple types of experiments: 1) predicting the next statement when a statement is given, and 2) filling in a blank statement when multiple statements are given.

For the experiments, I used the JS test cases in V8, and I used esprima and escodegen for preprocessing. I trained on my own machine with a GeForce 1080 Ti.

Predict the next statement

I generated a dataset for this experiment by pairing consecutive statements from every BlockStatement and Program AST node, and trained a model at the character level. Even though I knew that char-level training is quite limited, I thought char-level should be our final goal, since we cannot generate every possible test case using only the tokens that already exist in the V8 test cases. I also wanted to see how well the model understands concepts like identifiers. The model overfitted if I ran the training for a long time, so I stopped early after one day. I used the transformer_small hparams setting.
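
The extraction step roughly looked like the following sketch (a reconstruction of the idea, not the exact script I used):

const esprima = require('esprima');
const escodegen = require('escodegen');

// Collect (statement, next statement) pairs from every Program and
// BlockStatement node in a source file.
function extractPairs(source) {
  const pairs = [];
  (function visit(node) {
    if (!node || typeof node !== 'object') return;
    if (node.type === 'Program' || node.type === 'BlockStatement') {
      for (let i = 0; i + 1 < node.body.length; ++i) {
        pairs.push({
          input: escodegen.generate(node.body[i]),      // given statement
          target: escodegen.generate(node.body[i + 1]), // statement to predict
        });
      }
    }
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) child.forEach(visit);
      else if (child && typeof child === 'object') visit(child);
    }
  })(esprima.parseScript(source));
  return pairs;
}

console.log(extractPairs('function f(v) { let x = v + 1; return x; }'));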

[training plot]

These are some interesting results from the trained model. Target is the input fed to the model, and Generated is the output generated by the model. Each inference took about 1 second on a CPU.

The trained model caught on that some_fn is a callable identifier. Note that some_fn never appears in the training dataset.

let some_fn = function(v) {return v;} // Target
some_fn(); // Generated

It also caught literal numbers and incremented them. But in the second example it failed to generate a correct statement, because it re-declared an identifier that was already in use.

fn(some_var, 456); // Target
fn(some_var, 457); // Generated
let some_var = 123; // Target
let some_var = 124; // Generated

It failed to use the identifiers vvv and u8arr consistently, but it caught that they are array types.

let vvv = new Array(123); // Target
assertEquals(123, v[0]); // Generated
let u8arr = new Uint8Array(arrbuf); // Target

// Generated
for (let i = 0; i < arr.length; ++i) {
    u8arr[i] = 0;
}

Predict a statement in the middle of multiple statements

4. Limitations

  • Slow inference time: each inference took about a second, which is too slow for fuzzing, where we want to try as many inputs as possible.
  • Limited sequence length.
  • Char-level vs. word-level trade-off: a char-level model makes inference even slower, while with a word-level model, how can we generate unknown strings or regular expressions that never appear in the training tokens?

5. Conclusion

References

[1] Attention Is All You Need
[2] Neural machine translation by jointly learning to align and translate
[3] RNN TODO
[4] tensor2tensor
[5] jsfunfuzz
[6] Fuzzing with Code Fragments
[7]
[8]
[9]
[10] Description2Code
[11] Fuzzing, Wikipedia
[12] afl
[13] Hill climbing, Wikipedia
[14] Genetic algorithm, Wikipedia
[15] AlphaGo
[16] RNN, Wikipedia
[17] The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy (http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

Further Reading

  • Universal Transformers
  • Constituency Parsing with a Self-Attentive Encoder (https://arxiv.org/pdf/1805.01052.pdf)
