The Pervasive Data Center

November 6, 2009 6:00 AM PST

Intel's James Reinders on parallelism - Part 2

by Gordon Haff

Intel's James Reinders is an expert on parallelism; his most recent book covered the C++ extensions for parallelism provided by Intel Threaded Building Blocks. He's also the Director of Marketing and Business for the company's Software Development Products. In Part 1 of our discussion at the Intel Developers Forum in September we talked about how to think about performance in a parallel programming environment, why such environments give developers headaches, and what can be done about it.

Here, in Part 2, we move on to cloud computing, functional and dynamic languages, and what needs to happen with computer science education.

Few wide-ranging conversations these days would be complete without at least a nod to cloud computing, which Reinders views as very much connected to the matter of parallel programming.

Cloud computing is parallel programming. You're solving the same problem. In fact, someone that's good at decomposing a program to run in parallel on a multicore or on a supercomputer... the same thought process is necessary to decompose a problem in cloud computing. What's different in cloud computing is that the cost of a connection or a communication between two different clouds is so high. You really need to get it right. It works best when a little message is sent, does an enormous amount of computing, and gets a little message back.

Data parallelism tends to be very fine-grained.

Task parallelism like we see with Cilk and Threaded Building Blocks is a little bit more coarse.

Cloud computing has to be very very coarse-grained parallelism.

But there's something common about how you have to think about it.

The tools that will let people do cloud computing, express a problem in cloud computing, may eventually just map onto a multicore.

The granularity that Reinders discusses refers to how small a chunk of computing can be, given the cost and latency of communications. Within a single processor, communications bandwidth is high and latencies low, so software can afford to perform a relatively small task and then synchronize the results. (Although moving large amounts of data can still be relatively "expensive" which is why data parallelism can be finer-grained than task parallelism; see Part 1 for further background on data parallelism.)

By contrast, external communication networks have limited bandwidth and are relatively slow--roughly four or five orders of magnitude slower than communications within a system. Therefore, tasks have to be parceled out in relatively large chunks that, ideally, don't have to be packaged up with a significant amount of local data.
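To put rough numbers on the granularity argument above, here is a minimal sketch. The latency figures are illustrative assumptions of mine (about 100 nanoseconds to synchronize within a system, about 1 millisecond for a network round trip, which is four orders of magnitude apart); they are not measurements from Reinders or Intel. The function names are hypothetical as well.

```python
# Assumed, illustrative latencies (not measured values).
ON_CHIP_SYNC_S = 100e-9      # ~100 ns to synchronize inside one system
NETWORK_ROUND_TRIP_S = 1e-3  # ~1 ms to exchange messages over a network


def efficiency(compute_s: float, comm_s: float) -> float:
    """Fraction of elapsed time spent on useful work for one chunk."""
    return compute_s / (compute_s + comm_s)


def min_chunk_for(target: float, comm_s: float) -> float:
    """Smallest chunk of compute time that reaches the target efficiency.

    Solves compute / (compute + comm) = target for compute.
    """
    return target * comm_s / (1.0 - target)


if __name__ == "__main__":
    chunk = 10e-6  # a 10-microsecond task
    print("on-chip efficiency:", efficiency(chunk, ON_CHIP_SYNC_S))
    print("network efficiency:", efficiency(chunk, NETWORK_ROUND_TRIP_S))
    print("chunk for 90% on-chip:", min_chunk_for(0.9, ON_CHIP_SYNC_S))
    print("chunk for 90% network:", min_chunk_for(0.9, NETWORK_ROUND_TRIP_S))
```

Under these assumptions, a task that is perfectly reasonable to hand to another core spends almost all of its time waiting when it is shipped across a network, which is why the chunks have to grow by roughly the same factor as the communication cost.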

Next up was education. Here, Reinders' basic message was to focus on the theory before diving into the implementation details. I suspect that this highlights one of the key challenges: Parallel programming tends to require a solid grasp of programming theory and doesn't lend itself particularly well to just "hacking around" in the absence of that grounding.

I've been doing a lot in the area of teaching parallelism. What a lot of people think of right away is teach them locks, teach them mutexes [algorithms to prevent the simultaneous use of a common resource], teach about how to create a thread, destroy a thread. That's all wrong. You want to be talking at a higher level. How do you decompose an algorithm? What is synchronization in general? Why does it exist?

Things I would hope undergraduates would learn are parsing theory, DAG representations [a tool used to represent common subexpressions in an optimizing compiler], database schemas, data structures, algorithms. All these are high level, not things like [the programming language] Java. Parallel programming's like that too. You get hands-on touching the synchronization method or whatever but you want to teach the higher level key concepts.

Some people it's going to be more in-tune with their thinking but you try and teach it to everyone.

Given that most of today's languages weren't expressly designed for parallel programming, discussions about parallelism often turn to new programming languages. This means functional languages most of all but can also involve dynamic or scripting languages which generally handle more low-level details under the covers than do Java or C++.

Functional languages don't lend themselves to easy, or easily comprehensible, description. A common shorthand is that "Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands." But that probably doesn't help much if you don't already know what it is. As for Wikipedia's entry, Tim Bray--no programming slouch--called it fairly impenetrable. (Perhaps you begin to see the problem.)
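A small example may help more than the shorthand does. The sketch below is in Python, which is not a functional language, so it only illustrates the style; the function names are hypothetical. The first version executes commands that repeatedly change a variable, while the second is a single expression that is evaluated to produce the answer.

```python
from functools import reduce


def sum_of_squares_imperative(numbers):
    """Command style: update a running total step by step."""
    total = 0
    for n in numbers:
        total += n * n  # each pass mutates 'total'
    return total


def sum_of_squares_functional(numbers):
    """Expression style: fold the values into a result, with no reassignment."""
    return reduce(lambda acc, n: acc + n * n, numbers, 0)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(sum_of_squares_imperative(data))
    print(sum_of_squares_functional(data))
```

The second form has no variable that changes over time, which is part of why this style comes up in discussions of parallelism: there is less shared, mutable state for concurrent pieces of a program to fight over.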

A couple of things I'm interested in functionals for. We don't wake up one day and everyone uses. It's sequential semantics again and sequential semantics appeal to people and functional languages don't have them. But some people eat them up.

And they solve amazing problems. You can code things up in them that are much easier to understand than if they are written in a traditional language although they can be cryptic or terse to a lot of programmers.

Erlang [a functional language] has gotten a bit more and more usage. Maybe it is creeping in. It's not going to take over the world overnight but it seems like the one that might stay around. May be talking about it 20 years from now and saying, yeah, Erlang's been around for 25 years. It might be accepted as a language. It may have legs.

But even Java. [Unlike Erlang,] It appealed to people who programmed in C and C++; it didn't challenge them to think differently. And because of the strict typing and stuff it helps [the enterprise developer] to deploy certain types of apps.

Python [a dynamic language] is interesting. It is so popular with a lot of scientists. It's on my short list of things, where if we can figure out where to partner or extend some of the things we're doing, Python's on my short list of languages that we want to help with parallelism. Maybe some of our Ct technology would apply there. We'll see if other people agree with us. Think the concepts we're talking about are pretty portable. 

We concluded our discussion with hardware. Are there opportunities at the hardware and firmware level with memory subsystems or with specific technologies such as transactional memory? Sun Microsystems was very interested in transactional memory in the context of its now-canceled "Rock" microprocessor. The basic concept behind transactional memory is to provide an alternative to lock-based synchronization by handling concurrency problems as they occur at a low level rather than having the programmer protect against them all the time.

The best solutions tend to not be silver bullets so much as incremental. Nehalem [Intel's latest microprocessor generation] in a way probably helped us more than anything in recent memory because we moved to the QuickPath interconnect and moved bandwidths up and latencies down. Larrabee [a many-core Intel microprocessor still under development] may pave the way with some innovations in interconnects. I think there may be some refinements needed. Interconnecting the processors is a classic supercomputer issue.

Transactional memory has slammed up against a very tough reality which is that hardware always wants to be finite; software solutions want to be infinite. Think there's something there. I think the people looking at transactional memory have started to make observations about locks that may end up being useful. It's funny. The mission of transactional memory is to get rid of locks but the more they looked at it the more they understood about how locks behave. There might actually be possibilities to make locks behave better in hardware.

Can we do the hardware a little differently? Not the sexiest thing in the world. But as we move from single-threaded to multi-threaded what complications are we creating [that the hardware can help with]?
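For readers who haven't run into transactional memory, the basic idea described earlier (do the work optimistically, then detect a conflict when it occurs and retry) can be imitated in software. The sketch below is only an analogy under my own assumptions: the class and function names are hypothetical, and a small internal lock stands in for the atomic commit that real transactional hardware would provide. It is not how Rock or any Intel processor works.

```python
import threading


class VersionedCell:
    """A value plus a version number that changes on every commit."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        # Stand-in for the hardware's atomic commit step (an assumption).
        self._commit_lock = threading.Lock()

    def read(self):
        with self._commit_lock:
            return self._value, self._version

    def try_commit(self, expected_version, new_value):
        """Commit only if nobody else committed since we read."""
        with self._commit_lock:
            if self._version != expected_version:
                return False  # conflict detected; caller retries
            self._value = new_value
            self._version += 1
            return True


def atomically(cell, update):
    """Run 'update' as a transaction: retry until it commits cleanly."""
    while True:
        value, version = cell.read()
        if cell.try_commit(version, update(value)):
            return


def demo(n_threads=4, n_iter=1000):
    cell = VersionedCell(0)

    def work():
        for _ in range(n_iter):
            atomically(cell, lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()[0]


if __name__ == "__main__":
    print(demo())
```

The programmer writes the update without holding a lock around it; conflicts are handled after the fact by retrying. Reinders' point about finiteness shows up even here: the sketch tracks a single value, while real programs want transactions of arbitrary size.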

Even if you don't subscribe to the more extreme views of programming and software being in a crisis because of the move to multi-core, we're clearly in a transition. New tools are needed and programmers will have to adapt as well, to at least some degree.

November 5, 2009 6:00 AM PST

Intel's James Reinders on parallelism: Part 1

by Gordon Haff

Multicore processors are here to stay and the number of cores that we'll see packed onto a single chip is only going to increase. That's because Moore's Law is only indirectly about performance; it's directly about increasing the number of transistors. And, for a variety of reasons, turning those transistors into performance today largely depends on cranking up the core count.

There's a downside to this approach though. Programs that consist of a single thread of instructions can only run on a single core. This in turn means that they're not going to get much faster no matter how many cores a chip adds. Running faster means going multi-threaded--splitting up the task and working on the different pieces in parallel. The problem is that programming multi-threaded applications introduces complications that don't exist with single-threading.

These complications and ways to overcome them were the topic of my conversation with James Reinders at the Intel Developers Forum in September. Reinders is the director of marketing and business for Intel's Software Development Products. He's an expert on parallelism and his most recent book covered the C++ extensions for parallelism provided by Intel Threaded Building Blocks.

In part 1 of this discussion we talked about how to think about performance in a parallel programming environment, why such environments give developers headaches, and what can be done about it.

Reinders began by noting that developers fall into roughly two groups when it comes to parallel programming: those who are still concerned about ultimate performance even in a parallel world and those who are just looking for a way to deal with it at all.

The challenge is understanding what we're trying to introduce, how to use parallelism, but with programmer efficiency. Because programmers don't need yet another thing to worry about. There's plenty of those out there.

And we need to be a little more relaxed about the performance. The people who start asking me about efficiency in every last cycle used and such--I characterize them as people we need to talk to more about our high-performance computing-oriented tools that give you full control. And other people are "I don't even know how to approach parallelism." I think there is a different set of ways to talk about the problem.

The problem with this second group comes down to the fact that most programmers are used to dealing with something called "sequential semantics." A detailed description of programming semantics is a complex computer science topic but, at a high level, sequential semantics means more or less what it sounds like: instructions follow one after another and execute in the order in which they are written.

If you store the number "1" in variable A, then store the number "2" in variable B, and then add them together in a third instruction, you can be confident that the answer will be "3." It won't depend on timing vagaries that might have caused the addition to happen before the stores. Most people start out programming sequentially using languages designed for that purpose.
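The three-instruction example above is small enough to enumerate. The sketch below simulates every order in which the two stores and the addition could run and collects the answers. It is a toy model, not real threading: the operation names are hypothetical, and it assumes that a variable that hasn't been stored yet reads as 0.

```python
from itertools import permutations

# The order in which the program is written.
PROGRAM_ORDER = ("store_a", "store_b", "add")


def run(order):
    """Execute the three operations in the given order and return the sum."""
    # Assumption: variables read as 0 until they are stored.
    state = {"A": 0, "B": 0, "C": None}
    ops = {
        "store_a": lambda: state.update(A=1),
        "store_b": lambda: state.update(B=2),
        "add": lambda: state.update(C=state["A"] + state["B"]),
    }
    for name in order:
        ops[name]()
    return state["C"]


def possible_results():
    """Answers that can appear if the three operations may run in any order."""
    return {run(order) for order in permutations(PROGRAM_ORDER)}


if __name__ == "__main__":
    print("sequential semantics:", run(PROGRAM_ORDER))
    print("any ordering:", sorted(possible_results()))
```

With sequential semantics there is exactly one answer. Once the order is no longer guaranteed, the same three instructions can produce several, and which one you get depends on timing.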

Parallel programming, on the other hand, introduces concepts like data races (the answer is dependent on the timing of other events) and deadlocks (in which two threads are each waiting for the other to complete so that neither ever does). Here's Reinders:

If you've ever managed and got a bunch of people working on a project together, one of the headaches you get is coordinating with each other. What did Fred say to Sally? They're doing things out of order or whatever. Parallel programming can give you that same sort of headache.

The programming terminology you'll hear the compiler people use is "sequential semantics." One of the interesting areas is what can we do if we ensure sequential semantics. We recently acquired a team in Massachusetts who were working for a company called Cilk Arts.

Our hope is that Cilk can do a subset of what Threaded Building Blocks [TBB] can but preserve sequential semantics. We think we can do sequential semantics, do a subset of what TBB does, since we're introducing keywords into the compiler--that has some disadvantages because it's not as portable--but we think we might be able to magically give you sequential semantics and not give up performance. That's a big if.

Now why would we invest in that?

Because there are a lot of programmers who have been getting along just fine with sequential programming. But when you tell them to add this or that for parallelism, a big thing that trips them up is that you no longer obey sequential semantics; you have more than one thing running around and you get data races, deadlocks, and it doesn't feel comfortable.

Now some people will argue that you need to do these things to get good performance. We have the feeling that in some cases you don't need to take that big of a leap to get pretty good performance.

And no one's going to criticize your app on a quad core for being only 70 percent efficient.
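To make the data races mentioned above concrete, here is a minimal sketch using Python's standard `threading` module. The `Counter` class and `run_threads` helper are hypothetical names of mine, not anything from TBB or Cilk. Several threads bump a shared counter: the unsafe version reads and then writes in two separate steps, so updates can be lost depending on timing, while the safe version wraps the update in a lock.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value      # read
        self.value = current + 1  # write; another thread may have written in between

    def safe_increment(self):
        with self._lock:          # only one thread at a time updates the value
            self.value += 1


def run_threads(method_name, n_threads=4, n_iter=10000):
    """Call the named increment method from several threads; return the total."""
    counter = Counter()

    def work():
        step = getattr(counter, method_name)
        for _ in range(n_iter):
            step()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print("with lock:   ", run_threads("safe_increment"))
    print("without lock:", run_threads("unsafe_increment"))
```

The locked version always counts every increment. The unlocked version may or may not come up short on any given run, which is exactly what makes this class of bug so uncomfortable: the program can look correct for a long time before the timing goes against you.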

From there we moved on to data parallelism which focuses on distributing data across processing elements. It contrasts with the task parallelism that we commonly associate with the term parallel programming. Pervasive DataRush is one commercial product based on a data parallelism model. APL, the language with the strange symbols (for those with long memories), is often considered the first data parallel language. There have been a variety of others, often extensions to more conventional languages like C and FORTRAN, but none were widely used.

The other thing we're looking at is data parallelism. And that's where we acquired the RapidMind team and combined them with our Ct [C for Throughput Computing] team.

Data parallelism just takes it one step further. Data parallelism is all about the parallelism in the data. So you're talking about the data when you program.

And once you start talking about the data, the tools underneath can move the data around. Leaving the data management up to the programmer [as with Cilk and TBB] turns out to be a terrific headache. This applies equally to a cluster where they don't share memory or a GPU and a CPU in the same system.

But a language like RapidMind or Ct can address that problem. And CUDA and OpenCL can too [frameworks primarily oriented towards heterogeneous processing that uses graphics cores for computing tasks] but RapidMind and Ct are at a much higher level of abstraction which means that we're betting on the idea that we can attract more developers and give up some efficiency.
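The distinction between data parallelism and task parallelism is easier to see side by side. The sketch below uses Python's standard `concurrent.futures` module purely to show the shape of each approach; it is not RapidMind, Ct, Cilk, or TBB, and the function names are hypothetical. Threads in standard Python also won't speed up this kind of arithmetic, so read it for structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Cut a list into roughly equal consecutive chunks."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def data_parallel_square(data, n_workers=4):
    """Data parallelism: the SAME operation applied to different pieces of data."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = pool.map(lambda chunk: [x * x for x in chunk], chunks)
        return [y for chunk in results for y in chunk]


def task_parallel_summary(data):
    """Task parallelism: DIFFERENT operations running at the same time."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "min": pool.submit(min, data),
            "max": pool.submit(max, data),
            "sum": pool.submit(sum, data),
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    numbers = list(range(10))
    print(data_parallel_square(numbers))
    print(task_parallel_summary(numbers))
```

In the data-parallel version the programmer describes what happens to the data and something else decides how to carve it up, which is the property Reinders highlights: once the program is expressed in terms of the data, the tools underneath are free to move it around.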

Part 2 of our conversation will cover cloud computing, functional and dynamic languages, and what needs to happen with respect to programmer education.

April 8, 2009 7:43 AM PDT

Pervasive takes on multicore programming

by Gordon Haff

Writing software that can simultaneously make use of multiple processors can be hard. Yet the advent of multicore processors--four cores per chip is now common--means that more and more software needs to do just that.

With processor performance gains now coming increasingly from the ability to handle more execution threads, rather than from handling individual ones faster, multithreaded programming, in one form or another, is pretty much the only path to writing faster software going forward.

Pervasive DataRush architecture.

(Credit: Pervasive Software)

Researchers and developers are tackling this issue from a lot of different angles, including new languages and a greater focus on multithreaded programming in computer science curricula. However, perhaps the most promising general direction is toward what you might call multicore virtualization--the abstraction of parallel complexities by carefully crafted algorithms and run times that handle most of the heavy lifting. (MapReduce and gaming engines are examples of the sort of thing I'm talking about.)
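As a rough illustration of what it looks like when a run time does the heavy lifting, here is a MapReduce-style word count written with Python's standard library. It is a sketch of the general pattern only, not Google's MapReduce or Pervasive's DataRush, and the function names are hypothetical. The application code supplies a map step and a reduce step; creating threads and handing out work is left to the executor.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(line):
    """Map step: count the words in a single line."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(lines, max_workers=4):
    # The executor decides which thread handles which line.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_phase, lines))
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(text))
```

Nothing in the application code mentions locks or thread creation. That is the appeal of this direction: the parallel machinery lives in a carefully written layer underneath, and most programmers never touch it.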

The latest announcement in this vein comes from Pervasive Software, whose DataRush product is now generally available. The company describes DataRush thusly:

At the heart of Pervasive DataRush is a powerful, massively parallel data-processing engine that enables fast, efficient, deep analysis and searching of large data stores. The platform integrates breakthrough technology to resolve well-known parallel-programming challenges associated with writing software for multicore processors: built-in features automatically handle issues such as locking, threading, and deadlock.

From a technical perspective, DataRush is a library and processing engine written in Java--which makes it portable to different operating systems because it runs in a Java Virtual Machine, as opposed to directly on the operating system. It's not specific to any one application, and it has operators and application programming interfaces (APIs) that can be exploited for a variety of parallel applications.

That said, Pervasive has focused and optimized DataRush around analytic tasks that typically require lots of parallel processing. A typical application is the sort of near-real-time data crunching that credit card companies do to detect and counter fraud.

From a business perspective, Pervasive's strategy with DataRush is to move up the software stack to higher-level solutions. However, these will still be in the form of enabling applications to handle certain types of tasks rather than actually getting into the application space themselves.

Pervasive Software is a well-established software company that has been listed on Nasdaq since 1997 and counts thousands of users as its customers. Along with a set of data integration products, it sells a database that is popular with many "low IT" organizations. Now it's adding DataRush to tackle a problem that didn't even exist for the mainstream users that Pervasive serviced 10 years ago.

September 26, 2008 10:43 AM PDT

Five riffs on EmTech08

by Gordon Haff

I spent the past couple of days attending Technology Review's EmTech08 conference at MIT. Lots of interesting speakers and ideas, some in areas of tech that I follow day-to-day (such as cloud computing) and others that I follow more in the vein of an interested observer (alternative fuels, open voting systems). In many respects, it's a refreshing change of pace from the events I commonly attend that tend to be more focused on today's immediate IT concerns.

EmTech08 gave me lots to mull--and I'll roll that mulling into more in-depth pieces down the road. For today, though, I'm just going to expand a bit on a few statements and thoughts I ran across in the course of the two days that particularly caught my attention.

The state of the market for tools in parallel computing is abysmal. (Marc Snir, University of Illinois)

There seems to be a default assumption around the IT industry today that, as processors evolve to more cores and more heterogeneous processing (and as computing architectures get more distributed), the software will evolve apace. Changes will be required, of course, but nothing to really worry about. I'm not so sure. Even if one discounts the most dramatic doomsayers, lots of researchers and IT executives see serious gaps in both tools and training to deal with highly threaded processing. Consider that several of the panelists in the Parallel Programming session spoke warmly of the potential for functional languages. Yet it's very early days for the likes of Haskell and Fortress and, in general, language development and adoption is a very long process.

People choose killer apps. (Craig Mundie, Microsoft)

Craig Mundie spoke at length about many of the characteristics that he saw such an application as having. He used terms like context- and location-aware, immersive, and personal. The general theme was "client + cloud," the idea being that this level of realism and immersion requires huge processing power and function at the client even if the data and orchestration take place in the network someplace. He didn't really name a particular next-generation application, however. I'm also not sure that killer "app" is really the right term. The original PC had a true killer app--the spreadsheet. But subsequent generations have really had interaction models--first the GUI and then the browser. The next generation will probably be something similar, exposing an even more varied set of applications but in richer ways. (CNET News' Dan Farber has an in-depth post on Mundie's keynote.)

Ephemerality of the Web is something that needs to be addressed. (Web 2.0/3.0 panel)

One of the ironies of the digitized world is that it's a potential enabler for an unprecedented level of preservation but, in practice, it often ends up opening the door to huge amounts of content vanishing in an instant. At the consumer level, photos offer an illustrative example. The combination of external hard drives and online services can far better protect digital photos from mishaps than is possible with negatives, slides, and prints. But, in practice, most consumers don't have good backup systems and can easily lose everything with the crash of a hard disk. Long-term preservation is an even bigger problem, both online and off. What happens as companies are purchased or go out of business to the content that they create or host?

Web technologies are the best platform for mobile development. (Kevin Lynch, Adobe Systems)

The iPhone hype (or, indeed, the babel that surrounds Apple in general) can be wearing. However, one thing that the iPhone has really accomplished is to crystallize the notion that the browser is a viable interface for mobile phones--at least high-end mobile phones. Kevin's contention that Adobe's AIR runtime is also necessarily part of the mix is more debatable, but the general concept that mobile applications will tend to center around the same Web technologies that are used on "PCs" seems sound. (I tend to think that the mobile device will be a smartphone rather than a separate "Mobile Internet Device" (MID) but that's a separate debate that centers more around form factors and networks than programming models.)

Many energy solutions are not relevant at the scale that matters. (Vinod Khosla)

Finally, a sizable chunk of the conference was devoted to "green" and energy. That's unsurprising, given that it's a hot (if also overhyped) topic in IT and elsewhere. It's also an area that lends itself to transformative innovation--which fits well with the general focus of EmTech. And that is what venture capitalist Vinod Khosla, a co-founder of Sun Microsystems, is really speaking to with that statement. It's not that incremental changes aren't desirable. They are. Indeed, a lot of power efficiency work in technologies such as microprocessors is a sort of whack-a-mole game of accumulating small wins. However, from a macro and policy perspective, big wins don't come from the niches. They come from making substantial impacts on substantial use cases.


About The Pervasive Data Center

This blog takes a deep (and often skeptical) look at trends big and small in the world of enterprise servers, data centers, and "Yotta-scale" computing. This means also taking into account the myriad of software, networks, and devices that are driving change in (or being driven by) these back-end systems. Stories posted to this blog may also appear on Illuminata's site.

Gordon Haff is a principal IT adviser for Illuminata of Nashua, N.H. Before becoming an IT industry analyst, Gordon held a variety of product-marketing positions at Data General, spanning more than a decade. He's programmed for DOS, Windows, and Linux; builds his own PCs; and holds engineering degrees from MIT and Dartmouth, with an MBA from Cornell. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.
