Earl Barr's research portrait


When Earl Barr is in a provocative mood, he likes to suggest that he's a heretic. For one thing, his work crosses several subdisciplines within computer science: cyber security, programming languages, and software engineering. For another, when he really gets going he'll say things like, "The difference between a vulnerability and a bug is the existence of an exploit." As the mood passes he qualifies that: "It has 'truthiness', but it's not true."

This all makes more sense when you look at Barr's atypical background: his undergraduate degree is in literature and philosophy. With that base, it's notable how much of his best-known work focuses on language in one form or another. One recent piece of work, for example, profiled in Wired, pioneers the automated transplanting of code from one piece of software to another. The tool Barr devised with Mark Harman, Bill Langdon, Alex Marginean, Justyna Petke, and Yue Jia can isolate the code of a useful feature in one program, find the right place to put it in another program, and insert it with minimal human involvement.

Barr earned his literature and philosophy degree at the University of the Pacific in his home town of Stockton, California, the largest port for agricultural produce coming out of California's Central Valley. He only became interested in programming during his senior year, when he decided he ought to learn something about computers. Having studied logic in philosophy, he says, he found it "easy".

Having graduated second in his class, he says, "I thought I would get a great job." The black-turtleneck-wearing, poetry-writing aspiring screenwriter found himself stacking computer terminals and pulling cables in the California State Assembly. Each day there ended with a time-consuming struggle to enter the day's tasks into a particular piece of poorly designed software.

"So I hacked the system," he says. "I wrote an exploit, violated its security, and reformatted the interface so everything you needed to do was on a single page, and I turned a 40-minute task into a five-minute task." When co-workers saw he got to leave early and wanted the same interface, he obliged. Soon, fearing he'd be fired, he was called to a meeting with a group of DEC consultants who had spent a year collecting substantial salaries under assignment to solve the same problem. Instead, his manager fired half the consultants and told Barr, "You're a programmer now."

He learned on the job for a while. But wanderings that took him to Africa, France, and Wall Street taught him that lacking a relevant degree made it hard to get a job in computer science. Back in California, he completed undergraduate coursework at Sacramento State University, then went on to a master's and, in 2009, a PhD. He joined UCL as a senior lecturer in 2012; he is a member of the Software Systems Engineering Group and the Centre for Research on Evolution, Search and Testing (CREST).

One of his early papers, 2007's "ConceptDoppler: A Weather Tracker for Internet Censorship", written with Jedidiah R. Crandall, Daniel Zinn, Michael Byrd, and Rich East, studied the workings of the so-called "Great Firewall of China". The group discovered that China's internet censorship exploits a feature of TCP designed to aid fault tolerance and recovery: the firewall injects reset (RST) packets that shut down both sides of a connection. Because this is a built-in feature of the protocol, it's not easy to design a bypass mechanism - but it can be used to probe the system and build a list of what is being censored from the responses. The result, ConceptDoppler, an architecture for maintaining that list over time, derived its efficiency from using latent semantic analysis to cluster filtered keywords. Among the 122 censored keywords the project uncovered were many they expected, such as "falun gong", but also some surprises, such as (in Chinese characters) "mein kampf" and "international geological scientific federation".

As part of his postdoc at UC Davis, Barr produced some of his most significant work to date. "On the Naturalness of Software", with Abram Hindle, Zhendong Su, Mark Gabel, and Premkumar Devanbu, applied natural language processing techniques to source code itself (not the natural language fragments within the source code). The results sound simple at first: code is much less surprising than natural language. Among other properties, code has much longer-range dependencies than natural language does: a variable defined at the very beginning of a program millions of instructions long may only be used much later, something that's rarely true of natural language.
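The paper's measure of "surprise" is the cross-entropy of token streams under an n-gram language model: the more predictable each token is given its predecessors, the lower the entropy. A minimal sketch of the idea, assuming a toy whitespace tokeniser, a Laplace-smoothed bigram model, and made-up token streams (not the paper's corpora or its higher-order models):

```python
import math
from collections import Counter

def bigram_cross_entropy(train_tokens, test_tokens):
    """Per-token cross-entropy (bits) under a Laplace-smoothed bigram model."""
    vocab = set(train_tokens) | set(test_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    bits = 0.0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        bits -= math.log2(p)
    return bits / (len(test_tokens) - 1)

# A repetitive "code-like" stream versus a more varied "prose-like" one.
code = "for i in range ( n ) : total = total + i".split() * 20
prose = ("the quick brown fox jumps over the lazy dog while a small "
         "bird watches from a tall oak tree nearby").split() * 5

mid = len(code) // 2
ce_code = bigram_cross_entropy(code[:mid], code[mid:])
mid = len(prose) // 2
ce_prose = bigram_cross_entropy(prose[:mid], prose[mid:])
print(f"code: {ce_code:.2f} bits/token, prose: {ce_prose:.2f} bits/token")
```

On real corpora the paper trains n-gram models over millions of lines, but the same inequality - code scoring markedly lower entropy than English text - is what "naturalness" refers to.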

"Many people think we just confirmed conventional wisdom, but the claim is not so obvious as all that because of the existence of these long-range anaphora - noun references that need to be resolved in natural language but tend to be close together," he explains.

Source code, he says, is also very different in its rate of production of neologisms - usages that haven't been seen before. The researchers found that in natural language - shown by scanning such large natural language collections as the Gutenberg Project database, the Brown corpus, and Canadian Hansard - the rate of production of neologisms climbs very steeply at first but very quickly levels off, so they become rare. "That's not true when you look at code, because programmers are always naming things. You could say that coding is more primordial than natural language in the sense that programmers are inventing a language to discuss some problem." In code, a graph of the rate of neologism production nearly follows a 45-degree x/y line.
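That contrast shows up in a simple vocabulary-growth curve: count how many distinct token types have appeared after each token of a stream. A sketch with simulated streams (hypothetical data, not the Gutenberg, Brown, or Hansard corpora):

```python
import random

def vocab_growth(tokens):
    """Cumulative count of distinct token types after each token."""
    seen, curve = set(), []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

rng = random.Random(0)
# "Natural language": 5,000 draws from a fixed 500-word vocabulary.
natural = [f"w{rng.randrange(500)}" for _ in range(5000)]
# "Code": mostly a small keyword set, but every 10th token is a
# freshly invented identifier - programmers are always naming things.
source = [f"id{i}" if i % 10 == 0 else f"kw{rng.randrange(20)}"
          for i in range(5000)]

nat_curve, src_curve = vocab_growth(natural), vocab_growth(source)
# Growth in the second half of each stream: near zero for the
# "natural language" stream, still climbing steadily for the "code" one.
print(nat_curve[-1] - nat_curve[2499], src_curve[-1] - src_curve[2499])
```

The first curve flattens once the fixed vocabulary is exhausted; the second keeps rising at a near-constant rate, the 45-degree behaviour described above.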

A follow-up paper, "Learning Natural Coding Conventions", written with Miltos Allamanis, Christian Bird, and Charles Sutton, won an ACM SIGSOFT Distinguished Paper award in 2014. It studied the conventions that developers on large projects establish for formatting code and naming elements such as identifiers, in order to make the code easier for later developers to read and understand. These conventions can be hard to infer for a developer working on a small piece of a large, distributed project. The resulting tool, NATURALIZE, learns the style of a codebase and helps programmers patching and extending the software to follow its existing conventions.
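NATURALIZE itself scores candidate names and formatting choices with an n-gram language model trained on the codebase; the following is a much simpler, hypothetical illustration of the underlying idea - learn the dominant convention from existing code, then flag deviations. The function names and example identifiers here are invented, not taken from the tool:

```python
import re
from collections import Counter

def classify(name):
    """Crude identifier-style classifier."""
    if re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)+", name):
        return "snake_case"
    if re.fullmatch(r"[a-z]+([A-Z][a-z0-9]*)+", name):
        return "camelCase"
    return "other"

def check_convention(existing_names, new_name):
    """Warn when new_name deviates from the codebase's dominant style."""
    dominant, _ = Counter(map(classify, existing_names)).most_common(1)[0]
    style = classify(new_name)
    if style not in (dominant, "other"):
        return f"'{new_name}' is {style}; this codebase prefers {dominant}"
    return None

existing = ["read_file", "parse_args", "max_retries", "openFile"]
print(check_convention(existing, "getUserName"))   # flagged
print(check_convention(existing, "get_user_name")) # accepted (None)
```

The real tool goes well beyond this, suggesting context-appropriate replacement names rather than just detecting style mismatches, but the principle - the codebase itself defines the convention - is the same.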

"On the Naturalness of Software" has attracted dozens of follow-up studies by others, and will appear as a research highlight in Communications of the ACM. "It seems like it has opened the door to an exciting new area of research in software engineering that exploits the statistical properties of code to help developers write better, more secure code more quickly - and therefore more cheaply."

This page was last modified on 18 Mar 2016.

Dr Earl Barr

6.06a, Malet Place Engineering

+44 (0)20 7679 3570

e.barr [at] cs.ucl.ac.uk