Translating Natural Language to Code
Update (2 Apr 2013): Happy April Fools’ Day!
We’ve all been there. A long meeting where software engineers have to make an architecture decision. It’s been dragging on for hours; no one seems to agree on anything. The discussions become heated as engineers visualize the suggested solutions with drawings and code samples on a whiteboard.
This happened to us last November while brainstorming about an important new feature in Transifex. Everyone was on a different page and we couldn’t reach an agreement for hours. By the time we left the office, it was nearly midnight.
How many times has this happened to you? You have the perfect idea in your head — it seems so simple to you. But when you explain it to others, they just can’t see the simplicity of the code you have in mind. They insist that it will be hard to maintain and won’t scale.
We decided to fix this problem.
During a hackfest in January, somewhere up in the Greek mountains, we rolled up our sleeves and got to work.
We built a natural language processing engine that translates English text to code.
Product Managers, rejoice: you no longer need to convince developers to agree with your feature request. Developers, throw a party: forget about working on those sales and marketing features you don’t like.
TurkeyCode (TC) is computer-aided natural-language programming. TC uses Transifex libraries to read English text and produce programming code. We started with English as a source language with a limited vocabulary and Python as the target language.
Here is an example of how text input is translated to Python code.
TurkeyCode is a feature which reads text and translates it to code.
    def __main__()
    class TurkeyCode(feature):
        def read(text):
            pass
        def translate(text):
            return code
    class text:
        pass
Whenever it reads text, TurkeyCode remembers the length of the text for future use.
To translate, TurkeyCode calls nltk to tokenize the text.
    import nltk
    def __main__()
    class TurkeyCode(feature):
        length = 0
        def read(text):
            self.length = len(text)
        def translate(text):
            code = nltk.tokenize(text)
            return code
    class text:
        pass
Under the hood, TurkeyCode uses the nltk library to classify the sentences and deconstruct them to tokens. It then uses PLY to map the tokenized text to a Python construct. Type text as you normally would and a Python implementation comes out.
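The post does not show TurkeyCode's internals, so what follows is only a rough sketch of the tokenize-then-map idea described above, not the actual implementation. To stay self-contained it uses the standard library's `re` module as a stand-in for both nltk (tokenizing) and PLY (grammar matching), and it handles just one sentence shape. The function names `tokenize`, `split_clauses`, and `sentence_to_class` are hypothetical and exist only for this illustration.

```python
import re


def tokenize(sentence):
    """Split a sentence into word tokens (assumed stand-in for an nltk tokenizer)."""
    return re.findall(r"[A-Za-z_]+", sentence)


def split_clauses(tokens):
    """Split a token list into clauses wherever the word 'and' appears."""
    clauses, current = [], []
    for token in tokens:
        if token == "and":
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses


def sentence_to_class(sentence):
    """Map '<Name> is a <base> which <verb>s <noun> and ...' to a class stub.

    Assumed stand-in for a PLY grammar: only this one sentence shape is supported.
    """
    tokens = tokenize(sentence)
    if len(tokens) < 6 or tokens[1:3] != ["is", "a"] or tokens[4] != "which":
        raise ValueError("sentence does not match the supported pattern")
    name, base = tokens[0], tokens[3]
    lines = [f"class {name}({base}):"]
    argument = "value"  # fallback name if the first clause has no noun
    for clause in split_clauses(tokens[5:]):
        verb = clause[0]
        # Naive stemming: "reads" -> "read", "translates" -> "translate"
        method = verb[:-1] if verb.endswith("s") else verb
        # "it" refers back to the noun of the previous clause
        if len(clause) > 1 and clause[1] != "it":
            argument = clause[1]
        lines.append(f"    def {method}(self, {argument}):")
        lines.append("        pass")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sentence_to_class(
        "TurkeyCode is a feature which reads text and translates it to code."
    ))
```

Run on the first example sentence from this post, the sketch emits a `TurkeyCode(feature)` class stub with empty `read` and `translate` methods. Any sentence outside the single supported pattern raises a `ValueError`.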
Combined with the computer-aided tool for translating between programming languages that we announced 12 months ago, TurkeyCode lets you export this Python implementation to other programming languages such as C, Perl, or PHP.
TurkeyCode is a brand new technology still in early development. Our roadmap for the next 6 months includes adding support for TC to our API so the prototype solution can be saved in your repository on GitHub, and supporting Spanish and Russian as source languages.
One limitation of the platform is that it currently requires a keyboard. Quite a few meetings in tech companies have a no-devices policy, so TurkeyCode can’t be used there yet. We plan to add a speech-to-text system to the mix: you record every meeting and a daemon automatically converts what’s said into text, which is then turned into code. Within minutes, you have a prototype solution in Python.
Linus Torvalds said, “Talk is cheap. Show me the code.” This is the future of coding. We’ve been using TurkeyCode internally for the past two months and it has already saved us dozens of hours of prototyping.
Next Tuesday, we will be launching the beta for select Transifex users. If you’re interested, please tweet this post using the button at the bottom of the page and we’ll add you to our list.