Intel gets to the core of Cloud infrastructures


 

The job title ‘Director of Marketing and Business for Software Developer Products at Intel’ may not at first sound relevant to the future development of Cloud Computing, but in practice it means that James Reinders is the company’s key evangelist for parallel computing at a time when parallel technologies and the cloud are about to come much closer together.

Every server being sold now is based on multicore processors, with four cores becoming the norm. These are the basis for parallel computing in supercomputing, and the technologies are moving down fast into commercial areas, with the needs of cloud infrastructures being a prime target. Reinders took time out at the recent Intel Software Developer Conference to talk to Business Cloud 9 about that growing relationship.
 
Martin Banks (MB): It seems to me that the Cloud and parallelism are a marriage made in heaven – maybe not consummated yet but….
 
James Reinders (JR): Yes, in several different ways. Obviously our push on multicore processors gives more and more density and power efficiency, which helps you implement a Cloud. It is amazing how much power datacentres consume now, and multicore processors really help. That reality of Cloud Computing will continue because of multicore processors. The other thing is that when you are using a cloud you start to get some of the same parallel programming challenges, such as deadlocks and race conditions. You need to understand those issues and deal with them yourself when designing applications in the Cloud, because as yet there aren’t any tools that deal with them.
 
MB: Is that because those tools don’t exist or they’re impossible to create?
 
JR: They don’t happen to exist yet, and one of the challenges will be the emergence of standards. The Cloud has a lot of variety in it and it is harder to develop things when there is a lot of variety. But you’ve certainly seen a lot of consolidation and standardisation in the Cloud – standardisation of stacks and things like that – so eventually you’ll see tools emerge. It’s not my area of expertise but it does strike me that a lot of the topics people talk about in the Cloud and parallelism are similar.
 
MB: Does that suggest that you see the same programming problems emerging in the Cloud that have emerged in parallelism?
 
JR: Absolutely. And they’ve been there for a while. If an airline has only two seats left and three people try to buy one the system needs to ensure only two can do it. The airlines are still not very good at that. There are techniques in parallelism developed to deal with that which are only now being looked at for solving the Cloud issue. They are now looking at transactional memory as a research topic.
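
The airline example is a classic check-then-act race: each buyer checks that a seat is free and then takes it, and without coordination two buyers can both pass the check for the last seat. A minimal Python sketch follows, assuming a hypothetical SeatInventory class (the names are illustrative and do not come from any airline system or Intel tool). It uses a plain lock rather than the transactional memory mentioned above, so that the check and the decrement happen as one step.

```python
import threading


class SeatInventory:
    """Hypothetical seat counter; all names here are illustrative."""

    def __init__(self, seats_left):
        self._seats_left = seats_left
        self._lock = threading.Lock()

    def try_book(self):
        # The check and the decrement run under one lock, so two
        # buyers can never both see the last seat as available.
        with self._lock:
            if self._seats_left > 0:
                self._seats_left -= 1
                return True
            return False

    @property
    def seats_left(self):
        with self._lock:
            return self._seats_left


def simulate(buyers=3, seats=2):
    """Run `buyers` concurrent booking attempts against `seats` seats."""
    inventory = SeatInventory(seats)
    results = []
    results_lock = threading.Lock()

    def buy():
        ok = inventory.try_book()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, inventory.seats_left


if __name__ == "__main__":
    outcomes, remaining = simulate(buyers=3, seats=2)
    print(sorted(outcomes), remaining)
```

With three buyers and two seats, exactly two bookings succeed whichever order the threads run in. A cloud deployment would need the same guarantee across machines, which a single in-process lock cannot provide.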
 
MB: So can the tools in Intel’s Parallel Studio that are used to trap deadlocks and race conditions be applied to Cloud environments?
 
JR: The tools themselves, no, but the ideas that inspired them, yes. Some of the tools we’ve used for MPI analysis inspired some of what we do in Parallel Studio, and now we have developed that further. I don’t see us having any plans as yet to take those lessons back and develop tools for the Cloud, but it’s logical to think that we are solving problems that can be used to inspire tools for the Cloud in the future. The way that we’re detecting deadlock and race conditions in Parallel Studio could be used to develop a tool for the Cloud. I don’t know of anyone doing that, but as we understand the problems better in one space it usually becomes easier to apply that understanding to another.
 
MB: Do you see that being work that is appropriate for your group in Intel?
 
JR: Not right now. It would be out of our core competence right now, and it’s less clear how Intel’s role in that would play out. We develop tools that are close to the silicon, so it is easier for people to understand why Intel is doing it. Also we don’t see tools as the major barrier in Cloud Computing. We do see some barriers, such as standardisation of the software. So we’re working with both Microsoft and the Linux community. The Linux community is evolving standards where Intel has had some involvement. Security is also a big challenge, so we’re working with the hardware and software communities to see how we can build more confidence in the security of platforms that implement Clouds.
 
MB: When it comes to security it seems that whatever technologies a grown man can come up with, a 12-year-old can crack in 30 seconds, and if you put too much security into the silicon you fix it there, which could make it a problem. At what level are you talking about security on the silicon?
 
JR: We can go quite far. We’ve already got random number generators in chipsets that are used by advanced cryptography. That’s not really the cloud but we are eventually going there. It is a huge challenge when you have a server in the cloud that’s sharing data that should be secure. There are a lot of things that software and hardware can do together there. Encryption is one thing. You can develop encryption that a 12-year-old won’t crack so long as you maintain security over the keys. It is usually the keys that are the weak point. It’s a software/hardware thing that has to be worked out in policies. Security is going to be the big issue with the Cloud. Even with the limited cloud usage today it does not seem long between reports of some major problem. But I think it is the way the world would favour going if we can solve these problems.
 
MB: During your presentation you were talking about Ct, which looks like a good advance for developing cloud services, in that it could be used to develop the code that runs in a parallel environment which could then be used to build the tools that build cloud services. You were also saying that it allows the code to be run on different platforms, which itself maps onto Cloud infrastructures where services will need to run on virtual servers set up for the purpose.
 
JR: You’re right. That’s a really interesting observation. You have to suspend your disbelief a bit about bandwidth, which is always a challenge with the Cloud, but the key to Ct is that you have to describe the data in ways that allow people to move it around. Normally, if you do something like an array there are pointers to it, and the pointers will be everywhere. But if you move the data to a different machine the pointers may access the data when it’s not the current copy. So Ct allows you to create arrays, but you’re not allowed to take their address, because the data may move. Normally we’re thinking of tightly coupled environments like a GPU because the data’s local but you’re right, you can carry that to the logical conclusion. We could take the data and send it up to the Cloud somewhere. Uploading it to an Amazon server and doing more work on it? I guess.
 
MB: So Ct is potentially a tool for the Cloud?
 
JR: It’s the right type of abstraction. It’s not how we expect it to be used at this point but as the bandwidth to the Cloud gets greater the potential for something like Ct to upload problems to the Cloud is there. I hadn’t thought of it that way but it’s there.
 
MB: So have you addressed the issue of what type of tools, at the silicon level, the Cloud service providers might need?
 
JR: Everybody doing services in the cloud benefits from the same tools – all of us do to start with: optimising compilers, libraries etc, because their computers are running software. That can be pretty straightforward. I’ll give you an example. Facebook has written an accelerator for Python. They have written everything in Python, but that is an interpreted language, so they wrote a specialty version that compiles. That is now an open source project and Facebook is doing the optimisation, which has given them double the performance of the compiler. I don’t know how many servers Facebook has but the optimisation means they can halve that number. Traditional optimisations are pretty critical – I have worked with Google to optimise their code and I think their major concern is that the tools needed to optimise communications don’t really exist.
 
MB: And Google’s servers are very low spec so the code optimisation approach really plays to the strength of that type of Cloud infrastructure. In fact one of your colleagues (Uli Dumschat, of Intel’s Software and Solutions Group) was pondering the idea of servers based on the Atom processor (which is widely used in netbooks, smartphones and embedded systems).
 
JR: A lot of people have pondered that. It becomes a pretty simple equation about density and performance, if you can design a board with Atoms on it and get better density and lower power. The power consumption plays in with the density. If you have lower power you can pack the processors in with greater density.
 
MB: You did talk about Larrabee (Intel’s pitch at the Graphics Processor Unit market), and you indicated it had missed its market window.
 
JR: We decided we are not going to sell it, but we are making systems with Larrabee in them and doing software development on it. We hope to announce later this year what productisation of the Larrabee architecture will eventually exist. But it was a very specific chip for graphics applications. We had some delays and by the time it was ready the opportunity had passed. High-end graphics cards don’t cost very much and we didn’t want to be number three or four in the market. We do believe that graphics processors will be like our Larrabee design. So we decided to work on the software and the next generation design, and figure out what market window we can hit.
 
MB: But graphics processors are being targeted as a processor for business applications in the Cloud.
 
JR: We think they will eventually, but that is complicated.
 
MB: Atom based?
 
JR: Something I learned a long time ago is that if you can do something in parallel and can scale it then you don’t care how many cores it runs on. So what has happened with our processor design over time is that we have kept making them bigger to add exotic things like out-of-order execution, pre-fetching etc. We were able to increase the absolute performance of the processor but the performance per unit of die area gets lower. So if you’re building a parallel machine, having a lot of smaller processors may make more sense. If I have a processor that is twice the die area but only performs 50% faster then I could build two processors and get two units of performance, whereas I’d get one and a half units of performance with the larger processor.
 
This thought has been around in computing for some time. Take Danny Hillis’ PhD thesis, which led to Thinking Machines in the 1980s. He believed you could build a massively parallel machine out of bit-slice processors, but it didn’t work out because the cores were too simple and they couldn’t do enough work relative to the communication cost. So then what they did is they added floating point chips, which started to balance the system. But if you took 50 Nehalems (Intel’s latest x86 architecture processor) and strung them together for graphics processing, that would not be as efficient as 50 Pentiums. Would Atom be the right strike? I think it would be an excellent choice.
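
The die-area arithmetic in this answer (a core that takes twice the area but is only 50% faster) can be checked with a short sketch. It assumes the workload scales perfectly across cores, which is an idealisation, and the function and parameter names are hypothetical rather than anything from Intel.

```python
def aggregate_performance(core_perf, core_area, total_area):
    """Total performance of identical cores that fit in total_area.

    Assumes perfect parallel scaling: every core contributes its
    full performance. Real workloads will usually fall short of this.
    """
    cores = total_area // core_area
    return cores * core_perf


# Same silicon budget (2 area units) spent two different ways.
two_small_cores = aggregate_performance(core_perf=1.0, core_area=1, total_area=2)
one_big_core = aggregate_performance(core_perf=1.5, core_area=2, total_area=2)

print(two_small_cores, one_big_core)
```

Under these assumptions the two small cores deliver 2.0 units against 1.5 for the single large core. The advantage shrinks as the serial fraction of the workload grows, which is the case Reinders reserves the larger cores for.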
 
MB: When it comes to the hybrid approach (which is now seen as a favourite future architecture for multicore processors) I assume Intel must be thinking something like three or four Nehalems and a bunch of Atoms as the obvious approach.
 
JR: Yes. Maybe a 40-core processor with eight Nehalems and 32 Atoms; that makes more sense than 40 Nehalems. Once you believe in using multiple cores you are talking about a scalable parallel algorithm, so you might as well have part of the design be able to handle a very scalable application, and for the less scalable applications use the Nehalem.
 
MB: Would the Nehalem be able to act as the on-chip I/O management/work scheduling system? For example, the thought occurs that the ideal processor would have four Nehalems - two for input and output management and two for specialised applications, plus a number of Atom processors. Does that map onto your thinking? Is four the right number?
 
JR: I don’t know, but you are asking the right question. That’s what you think about when you design a balanced machine. When I was involved in designing the first TeraFLOPS machine our nodes had two P6 processors on them. One handled all the message passing to the other nodes, and the other did all the computation. It was very efficient. And we’ll just collapse that down to work on a single chip at some point, though I don’t know when that will be.
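
The split described here, one processor for message passing and one for computation, can be sketched with two threads and a queue. This only illustrates the division of roles: the function name is hypothetical, the squaring step stands in for real work, and a list stands in for the network. Python threads also do not run computation in true parallel, so the sketch shows the structure rather than the performance.

```python
import queue
import threading


def run_node(work_items):
    """One thread computes; a second handles all outbound messages."""
    outbox = queue.Queue()
    sent = []  # stands in for messages delivered to other nodes

    def compute():
        for x in work_items:
            outbox.put(x * x)  # placeholder computation
        outbox.put(None)       # sentinel: no more results

    def communicate():
        while True:
            msg = outbox.get()
            if msg is None:
                break
            sent.append(msg)   # a real node would send over the network here

    workers = [threading.Thread(target=compute),
               threading.Thread(target=communicate)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sent


if __name__ == "__main__":
    print(run_node([1, 2, 3]))
```

The compute thread never blocks on delivery; it hands each result to the queue and moves on, which is the property that made the two-processor node efficient.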
 
MB: So not this week then?
 
JR: Probably not.
 
 

 

 

 
