
Code generation systems like DeepMind’s AlphaCode, Amazon’s CodeWhisperer, and OpenAI’s Codex, which powers GitHub’s Copilot service, offer a tantalizing glimpse of what’s possible with AI in computer programming today. But so far, only a handful of these AI systems have been made freely available to the public and open-sourced, reflecting the commercial incentives of the companies building them.
In a bid to change that, AI startup Hugging Face and ServiceNow Research, the R&D division of ServiceNow, today launched BigCode, a new project that aims to develop “state-of-the-art” AI systems for code in an “open and accountable” way. The goal is to eventually release a dataset large enough to train a code-generating system. That dataset will then be used to create a prototype on ServiceNow’s in-house graphics card cluster: a 15-billion-parameter model, larger than Codex (12 billion parameters) but smaller than AlphaCode (~41.4 billion parameters). In machine learning, parameters are the parts of an AI system learned from historical training data, and they essentially define the system’s skill on a problem such as code generation.
Inspired by Hugging Face’s BigScience efforts to open up highly sophisticated text-generating systems, BigCode will be open to anyone with a professional background in AI research who can commit time to the project, organizers say. The application form was put online this afternoon.
“In general, we expect candidates to be affiliated with a research organization (academic or industrial) and to work on the technical/ethical/legal aspects of [large language models] for coding applications,” ServiceNow wrote in a blog post. “Once the [code-generating system] is trained, we will evaluate its capabilities… We will strive to make the evaluation easier and broader so that we can learn more about the [system’s] capabilities.”
By collaboratively developing a code-generating system that will be open-sourced under a license allowing developers to reuse it subject to certain terms and conditions, BigCode seeks to address some of the controversies that have arisen around AI-powered code generation, particularly regarding fair use. The nonprofit Software Freedom Conservancy, among others, has criticized GitHub and OpenAI for using public source code, not all of which is permissively licensed, to train and monetize Codex. Codex is available through OpenAI’s paid API, while GitHub recently began charging for access to Copilot. For their part, GitHub and OpenAI maintain that Codex and Copilot do not violate any license terms.
The organizers of BigCode say they will work to ensure that only files from repositories with permissive licenses enter the aforementioned training dataset. Along the way, they say, they will work to establish “responsible” AI practices for training and sharing code-generating systems of all types, seeking input from relevant stakeholders before making policy decisions.
ServiceNow and Hugging Face did not provide a timeline for when the project would be completed. But they expect it to explore several forms of code generation over the next few months, including systems that auto-complete and synthesize code from snippets and natural-language descriptions, and that work across a wide range of domains, tasks, and programming languages.
Assuming the ethical, technical, and legal issues are ever resolved, AI-powered coding tools could significantly reduce development costs while allowing coders to focus on more creative tasks. According to a University of Cambridge study, at least half of developers’ effort goes into debugging rather than active programming, costing the software industry around $312 billion a year.