Open source under the microscope

University researcher Walt Scacchi leads a team that investigates the inner workings of open-source development, including the growing number of corporate-sponsored projects.

Paul Festa Staff Writer, CNET News.com
Open-source developers have long done their work in the public eye. Now, they're doing it under an academic microscope.

Walt Scacchi, a senior research scientist at the University of California at Irvine's Institute for Software Research, has been looking at open-source projects from an analytical perspective, studying the open-source model in an ongoing, 10-year project that draws some comforting conclusions for open-source sponsors and developers.

Scacchi and fellow researchers have found a significant failure rate among open-source projects. But among those that get off the ground, research has shown not only that the open-source approach can yield better software more quickly and for less money than traditional methods but also that volunteering for an open-source project can be an effective way to get a job.

Often, Scacchi's work is as much sociological as technical, as he and colleagues examine phenomena like "community building" and cultural institutions alongside drier subjects like code and project design.

And academia's work on open source is more than academic. Three projects by Scacchi and colleagues at UC, Santa Clara University and the University of Illinois will use the data to design new development tools for big, multiorganization projects.

Scacchi and colleagues are at work on four different research projects. Their first National Science Foundation grants came through in the fall of 2000, following a few years of unfunded research. Current funding will bring the project through 2006, and Scacchi estimates that it will be at least a 10-year research investment. He spoke to CNET News.com about his work from his office in Irvine.

What exactly is your research trying to determine?
In general, we're trying to understand how free and open-source software development works in practice. Are the processes the same that are taught in engineering classes, the guidelines that we teach in academia? Or are they doing something else? If so, is it a poor version of software engineering, bumbling along in such a way that they wouldn't do it if they knew better?

We've looked at free and open-source projects, multiple projects in multiple communities. We're looking not only at the popular areas like Web or Web infrastructure software--Mosaic and Apache are two examples--but also at open-source practices in the computer game community, in the world of astrophysics and deep-space imaging, and in academic-software design. By looking at multiple projects across these different arenas, what we see is something different from what's advocated in the principles of software engineering.

What are some of the differences you've found, apart from the obvious ones?
For example, in software engineering, there's a widespread view that it's necessary to elicit and capture the requirement specifications of the system to be developed so that once implemented, it's possible to pose questions as to what was implemented, compared with what was specified.

We do not see or observe or find in open-source projects any online documents that software engineers would identify as a software requirements specification. That poses the question: What problem are they solving, if they haven't written down the problem? While it's true that there's no requirements specification, what there is instead is what we've identified as a variety of software informalisms.

What do you mean by "informalism"?
That word is chosen to help compare to the practice advocated in software engineering, in which one creates a formal systems specification or design that might be delivered to the customer. Informalisms are such things as information posted on a Web page, a threaded e-mail discussion or a set of comments in source code in a project repository. It may be a set of how-tos or FAQs on how to get things accomplished. Each is a carrier of fragments of what the requirements for the system are going to be.

If they're put together in such a haphazard way, can they really be considered requirements?
Yes and no. Clearly, they're distributed, but in order for people to contribute to the project, those people need to understand where those requirements are and how they relate to each other and how to pull them together. Part of how the community works is that each of the participants discusses what the system should do in whatever informalism they feel is the most appropriate to them.

Once the requirements are figured out, how are systems designed in open source?
We've begun to codify the practice we observe with the label "continuous design." That would mean, much like requirements, that there is no unique baseline design, necessarily. Instead, there is today's understanding of the design, which may be different from yesterday's or tomorrow's. It's not a bounded activity with a fixed and targeted deliverable, but an ongoing activity, and the system design is represented across the web of these informalisms. The key point is the evolving nature of it.

What's the relationship between your design informalisms and continuous design?
Informalisms refer to artifacts, to the medium. Continuous design characterizes the practice or process: what produces or consumes the artifacts.

What about management?
There's a self-management process we're calling "virtual project management." As the participants start to make choices, to create functionality in certain ways, using certain tools with certain architectural tendencies, they constrain how subsequent choices can be made and how the system can be expanded or not. I look at the project and say, "Here's something I can do," and I become a virtual owner or a designated leader who has a certain amount to say in an area. From an organizational standpoint, this looks less like a hierarchical organization than a meritocracy. People ascend to positions of authority based on accomplishment and expertise.

And yet, there is a hierarchy, isn't there? Projects have owners.
The idea of a meritocracy is not independent of hierarchies. These hierarchies tend to be not as tall as others, and broader or wider. There might be a group of elders or a single individual providing the vision. There is a sort of layering going on here, but the layers are permeable.

Have you been looking at just the sort of grassroots open-source projects? Or have you also been looking at the more recent corporate projects?
Yes, we have been looking at the modern-day version of these corporate-sponsored open-source projects. An example would be NetBeans at Sun. Another is Eclipse at IBM, and a third is the Gelato Federation sponsored by Hewlett-Packard. There's a growing number of these large corporate-sponsored open-source projects, meaning that the corporation is assigning its salaried employees to work full- or part-time on the project. They may be either trying to put together what the volunteer community is doing or addressing the parts of the system no volunteers have stepped forward to do.

Open-source projects also serve as venues for recruiting, looking at the volunteers for potential employees. Companies that get involved in sponsoring a project can find out who the good people in the community are, the ones who have the natural talent or the track record or experience, and whom they would be unlikely to find through traditional recruiting means. These people might be in geographic locations that are inconvenient, but they are really capable and have deep expertise, so companies look at whether there is a new kind of employment relationship that might be able to engage them to make the voluntary contributions and engage in work for pay. And those people tend to get higher-than-average pay. People who are typically among the core contributors--people near the center of the project--are the ones who have this higher level of participation; their work products are publicly available for others to individually evaluate, and companies find that that's an extremely important resource.

So what does your research say about the effectiveness of open-source development?
One thing we find with respect to participation is that in a couple of other surveys, 60 percent of open-source software developers who show up as core contributors tend to be contributors to two to 10 other projects. Once you've established a reputation of expertise in a certain area, you can take that to another project, or conversely, people seek out your expertise, because you know how to do certain kinds of things. The overall dynamic that starts to emerge is that there's a social mechanism for the creation of critical mass that lets these projects coalesce and come together, so systems can grow and evolve at rates that far exceed what's predicted by good software practice. Software engineering predicts that projects grow by the inverse square law, meaning that initial growth is fast. It then slows down, and then, with a project shift, you get steady growth.

But in the more successful open-source projects, you get a hockey stick (curved line) on your graph--a longer period of slow growth, then critical mass starts to kick in, and the growth curve starts to shoot up in a greater-than-linear growth rate.
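
As a rough illustration of the two growth shapes described above, the sketch below compares them numerically. It is a toy under stated assumptions, not Scacchi's model: the interview gives no formula, so `inverse_square_growth` assumes each release adds an amount inversely proportional to the square of the current size, and `hockey_stick_growth` assumes each release adds a little more than the one before. The function names, parameters and starting values are all hypothetical.

```python
def inverse_square_growth(initial_size, effort, releases):
    """Return system size per release, assuming each release adds
    effort / size**2 (an assumed reading of 'inverse square law')."""
    sizes = [float(initial_size)]
    for _ in range(releases):
        current = sizes[-1]
        sizes.append(current + effort / current ** 2)
    return sizes


def hockey_stick_growth(initial_size, rate, releases):
    """Return system size per release, assuming the amount added grows
    with the release number (an illustrative greater-than-linear curve)."""
    sizes = [float(initial_size)]
    for release in range(1, releases + 1):
        sizes.append(sizes[-1] + rate * release)
    return sizes


def increments(sizes):
    """Return how much was added between each pair of consecutive releases."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    # Hypothetical starting values chosen only to make the shapes visible.
    predicted = inverse_square_growth(initial_size=10, effort=1000, releases=8)
    observed = hockey_stick_growth(initial_size=10, rate=2, releases=8)
    print("inverse-square increments:", [round(x, 2) for x in increments(predicted)])
    print("hockey-stick increments:  ", [round(x, 2) for x in increments(observed)])
```

In the first curve, the amount added per release shrinks as the system gets bigger, which matches the fast-then-slowing growth Scacchi attributes to software engineering's prediction. In the second, the amount added keeps rising, which is the greater-than-linear shape he describes for the more successful open-source projects.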

So what, exactly, is happening to spur that faster growth you're seeing in open source? What's an example?
Let's say you're a master of UI (user interface) technology, so you hook up to another project and can import or reuse the code and the expertise that's been acquired so far. If our projects form that symbiosis, they can merge with a third, so this starts to account for why you see that substantial growth. This is a manifestation of software reuse that is different from what's being advocated by the software engineering community, which says everything's in a library that everyone dips into in order to take what they need, and then it goes away. Here you say, "I create something you want, so I make my work in both contexts and together we have a new context, and as we build this social network, what we're doing is bringing software expertise and source code with us so that in comparatively short amounts of time we can have large numbers of people create a large system without the coordination or management of a central corporate authority or project manager."

What else are you looking at in your research?
One thing is the role of free and open-source public licenses--things like the GPL. We're not going to address legal issues, but what the licenses do in practice is reinforce and institutionalize a set of beliefs, values and norms for how free or open-source software should be developed. It's a statement of affiliation, of how to build software, of the reasons why to build software. Here open-source licenses not only serve community property rights but also act as a way of declaring affiliation with this broader social movement. Open source is becoming a global social movement, so it can grow beyond the boundaries of software development.

Where are you seeing open-source principles adopted beyond software development?
There's an open-source community in architecture: people working in developed countries who will contribute their designs to developing or emerging countries, where hiring an architect to do something is prohibitively expensive. There's open-source education--

Like at MIT.
That's at the college level, but also in grade schools and high schools globally. People in the United States and Europe are contributing content for math and science classes for their own countries and developing countries, where purchasing textbooks is prohibitively expensive. In the visual-arts community, there's a movement to explore what it means to do works of art for sharing, or building upon works of art of other people. People are breaking away from the tradition of the individual artist, saying there's another way to build upon the work of others.

And in the area of government, a number of European and Third World countries are looking to adopt open-source systems for reasons of cost, whether perceived or actual. But at the same time that they bring in the open-source systems, they also embrace the ideology of openness, which in turn may be a revitalization of what it means to be an open, democratic nation or government. So the process becomes open source so that citizens can better understand how their governments work, and whether a corporate provider of information technology is serving its own interest in selling systems to the government or helping the people.