Core Microarchitecture

This post by Benson_Intel for Intel on Mon Apr 09, '07 11:24 AM
For the next 2 weeks Benson, Brett and Jeff will be on-line to discuss the Core microarchitecture. As you all know, our latest dual and quad core microprocessors are based on the Core microarchitecture and are a vast improvement over the previous generation of NetBurst designs. Brett, Jeff and I have spent the previous 8+ years in Intel's Desktop Microprocessor division working with customers and Intel's design team to define processor features, specifications and board designs. We are looking forward to discussing a wide range of hardware features here: low power states for Energy Star, maximum and typical power dissipation levels, cache designs, temperature measurement and thermal monitoring, dual and quad core performance, front side bus vs. integrated memory controller, and pipelines, to name a few. You may have also read some of the recent articles about our upcoming processors built on the 45 nanometer process. We can answer questions on those products too. (If you happen to be in Beijing, China next week you can attend IDF and ask one of our Principal Engineers all about the 45nm Core 2 processors.)

Please keep in mind that we are hardware experts, not software, so questions on virtualization, security and multi-threading optimizations are outside of our realm.

The three of us are located in Oregon, on the West coast of the USA, and will be responding to your questions and comments on a daily basis.

If for some reason you have no questions for us, I'd be interested in your response to a couple of my own:

  • What is more important, a processor having particular architecture features or a processor that has the best performance?
  • How do you use information displayed by some hardware monitoring programs such as processor temperatures or voltages?

Related Stories

Desktops & Laptops (15 comments)
Hi all, my name is Alan and I would like to introduce John and John (no relation). It is our turn in the barrel this week as we dovetail from the recent discussion around Core Microarchitecture and focus on enterprise laptops and desktops. We work in Intel's IT department and want to spend the week relating our experiences, direction and challenges in managing a large and diverse client environment.

We plan on covering relevant issues around business clients with topics ranging from purchasing considerations, refresh, feature selection all the way through manageability, security, virtualization and everything in between. We'll talk about the good, the bad and even the ugly when it comes to business clients.

Our plan is to put out some topics that we hope will have interest to the Slashdot community and provide insight into technology elements and business processes that drive decisions. All three of us are based in the United States and will be posting and responding as frequently as we can. Once the week ends we'll be passing the torch to some of our coworkers who will drive discussion on next week's topic, mobility and wireless.
Intel in the News
Most of this week's big stories about Intel are coming out of the Intel Developer Forum in Beijing. And there are a lot of them, too.

First, forget Dual Core. Or even Quad Core. Here's a PC Pro story headlined Intel finally demos 80-core processor. There has been talk about this ultra-mighty-processor in the past, but this is its first public unveiling. No, it won't run Windows (or any known desktop Linux distro), but still: 6 GHz, 2 teraflop performance. That'd be nice to have around the shop purely for bragging rights, wouldn't it? Surely we can come up with *some* kind of "practical" use for the thing that'll get our bosses to spring for one, right? Worth a try....
  • Your questions

    (Score:3, Interesting)
    by suv4x4 (956391) on Monday April 09, @05:42PM (#18668315)
    Hi, it's a pleasure to have you guys around to discuss the future of computing together :)

    To answer your questions from my perspective (I'm, of course, no one special, just an average programmer):

    * What is more important, a processor having particular architecture features or a processor that has the best performance?

    We're long past the phase of coding large pieces of software in assembler, and have moved to higher-level programming platforms. As such, and with all existing software built for the existing architecture, it benefits us more to improve the performance of the existing features than to constantly introduce new SIMD extensions and so on.

    The need for new features arises when you can't squeeze more performance out of the existing architecture. Isn't this, after all, why NetBurst died? It's not as if programmers worldwide suddenly woke up thinking "damn it, I hate serial programming, instead I want to deal with thread concurrency issues, deadlocks, race conditions and all that fun! Give us multicore!"

    But as multicore is the best way to scale up, so be it. Same for SSE: it's NOT fun having to implement more and more branches in our code (does the CPU have SSE3? then SSE3 code, else if it has SSE2, SSE2 code, else if it has 3DNow!, then 3DNow! code etc. etc.) to take advantage of new CPUs.
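The branch chain described above can be sketched roughly as follows. This is only an illustration in Python: the kernel functions and the flag names are hypothetical placeholders, and in real code each branch would be a separately written and tuned routine, which is exactly the maintenance burden being complained about.

```python
# Hypothetical stand-ins for hand-tuned routines; each one would normally be
# a separate implementation written against a different instruction set.
def sum_sse3(values):
    return sum(values)

def sum_sse2(values):
    return sum(values)

def sum_3dnow(values):
    return sum(values)

def sum_scalar(values):
    total = 0
    for v in values:
        total += v
    return total

# Most preferred first, mirroring the if / else-if chain in the comment.
# The flag strings are assumptions, not any particular OS's naming.
DISPATCH_ORDER = [
    ("sse3", sum_sse3),
    ("sse2", sum_sse2),
    ("3dnow", sum_3dnow),
]

def select_kernel(cpu_flags):
    """Return the best routine supported by the given set of CPU flags."""
    for flag, kernel in DISPATCH_ORDER:
        if flag in cpu_flags:
            return kernel
    return sum_scalar  # portable fallback when no extension is available

kernel = select_kernel({"sse2", "3dnow"})
print(kernel.__name__, kernel([1, 2, 3]))
```

Every new extension adds one more entry to the table and one more routine to test, even though the selection itself only runs once at startup.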

    This is why Itanium was such a dangerous step, and ultimately failed. It may have had an incredible architecture, but for programmers worldwide it meant dropping everything we had worked long years to build, porting more or less from scratch, and dealing with the quirks of a totally new CPU architecture.

    * How do you use information displayed by some hardware monitoring programs such as processor temperatures or voltages?

    Well, if they are really off target, I call tech support :D
  • by Tragek (772040) on Monday April 09, @07:39PM (#18669347)
    (Last Journal: Saturday March 31, @02:05PM)
    What is your take on power consumption vs. specialization? Do you see any benefits in an architecture like Niagara/Niagara 2, a specialized architecture that does excellently on certain types of tasks but poorly on others, yet has good power consumption? (How it compares to the Core architecture, I have no clue. I'm interested in processor architectures, but I'm certainly no expert.) Or do you think it would be better to keep going with generalized processors, but try to keep their power consumption down?
  • by Joe The Dragon (967727) on Monday April 09, @08:46PM (#18669797)
    What is better
    Also, why did you make the quad-core CPU use the same FSB for the cores to talk to each other as the CPUs need to use to get to RAM and the chipset? Why not have a bus that is used just for the dual-core to dual-core traffic?
  • by fred fleenblat (463628) on Monday April 09, @09:00PM (#18669891)
    x86 is already just an emulation layer in prefetch. How about adding support for some other popular instruction sets like ARM, Alpha, PA-RISC, SPARC, or PPC?

    It would sure shake things up.
  • by Extide (1002782) on Monday April 09, @09:16PM (#18669989)
    So to answer the poster's questions,
    1. For the most part, speed. That is not to say SSE* is worthless, as it's not, but, as someone above posted, it makes the programmer's life more difficult having different code paths, etc. -- Personally I run a Core 2 Duo box @ 3.4GHz with a 15K SCSI drive -- going back to the stock 2.4GHz of the E6600 feels slow, and you can definitely tell when using the PC, at least I can. I am somewhat picky about that aspect of my PC builds, but with this setup I am quite pleased really.

    2. Hardware monitoring beyond temps and fan speeds is only really useful when initially setting up a system and seeing how fast you can overclock it, at least in a desktop environment. BUT having as many details as possible, and high accuracy (like 0.01 V or so, better would be nice), really helps in this area. (Now I am talking about all kinds of voltages here, not just Vcore, so that probably has more to do with the BIOS & motherboard manufacturers.)

    So I know this is going to be a hot topic, but can you please explain as much as you can about the new CSI (?) interconnect, which works similarly to HyperTransport and is supposed to debut on the chip revision after Penryn (2nd gen 45nm)?

    Anyways I really think you guys did an amazing job with the Core 2 microarchitecture, I mean it was better than I thought it was going to be. (Too bad the P4 didn't have the same story but oh well ;) )
  • Progress

    (Score:2, Insightful)
    by Duncan3 (10537) on Tuesday April 10, @01:00AM (#18671645)
    (http://www.mithral.com/~beberg/)
    Now that chips are 10-100x more powerful than anyone in my family needs except myself, where are the ultra low power PCs? The Mac Minis I have most of my relatives hooked up with are great, but there is still a very long way to go. There is no reason a family PC should use more power than a 6W night light. You have some 10W Core 2 Duo parts, but that's for laptops only.

    ATI and nVidia are launching ~250 W cards, which is giving people pause, because if you're in the room with one you'll suffer from heat exhaustion. Intel isn't doing much better for the desktop, frankly.

    These beasts have got to stop. Integrate a graphics core and some RAM, make it under 10W and call it a day.
    • Re:Progress by dreamchaser (Score:2) Tuesday April 10, @09:17AM
    • Re:Progress by jimstapleton (Score:2) Tuesday April 10, @01:06PM
    • Re:Progress by Duncan3 (Score:2) Wednesday April 11, @05:28PM
    • Amen to that by Krishnoid (Score:3) Friday April 13, @05:31PM
    • Re:Progress by sarahbau (Score:2) Sunday April 15, @03:18PM
  • Architecture features are really interesting only as they affect performance; I don't care whether the three-dimensional FFTs, provided by a library, in the core of the crystallography software I work on run fast because the library runs them on multiple cores, or because it is taking advantage of SIMD instructions for accelerating FFT butterfly operations, or because the processor noticed that it had lots of work scheduled and boosted the clock speed briefly, or indeed because it's connected to a supercomputer in Barcelona and runs them there.

    That said, it's an interesting brain-exercise to write code which fully exploits SIMD instructions, so from that point of view I'd quite like exotic instructions; but this is code I'm writing for myself rather than for distribution, and we still have to distribute code that runs on Athlon chips without SSE2.
  • Displays of CPU speed are really only relevant to improve my confidence that the speed-on-demand governor software in Linux is working correctly; you lose confidence when seeing that both cores are 100% loaded with non-niced processes and the displayed speed is still 1.0GHz.
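A rough sketch of that sanity check, for anyone who would rather script it than watch a display. The sysfs file names are the ones the Linux cpufreq subsystem normally exposes (values in kHz); the load threshold and the function names are my own assumptions, and how you obtain the load figure is left out.

```python
# Directory normally exposed by Linux cpufreq for the first core.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"

def read_khz(filename, base=CPUFREQ_DIR):
    """Read one cpufreq value (in kHz), e.g. 'scaling_cur_freq'."""
    with open(f"{base}/{filename}") as f:
        return int(f.read().strip())

def governor_looks_stuck(load_percent, cur_khz, min_khz, load_threshold=95.0):
    """True when the core is essentially fully loaded yet still at its
    lowest frequency, which is the situation that destroys confidence."""
    return load_percent >= load_threshold and cur_khz <= min_khz

# Example with made-up readings: fully loaded but still at 1.0 GHz.
print(governor_looks_stuck(100.0, cur_khz=1_000_000, min_khz=1_000_000))
```

On a real machine the two frequencies would come from `read_khz("scaling_cur_freq")` and `read_khz("cpuinfo_min_freq")`.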

    Temperature, again, is really only useful to improve my confidence that the people I bought my computer from assembled it correctly, and that the ventilation system hasn't broken; if the Intel boxed cooler isn't capable of keeping the processor it came with adequately cooled, the air-flow in the box is probably completely screwed up, and it's likely that I ought to cycle down and get a new fan to replace the one that has doubtless died. The program that came with my D850GB board in 2001, which displayed a window and sounded alarms when temperatures went out of spec, was ideal for that; no need for real-time displaying of the temperature.
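The "stay quiet unless something is out of spec" behaviour is simple enough to sketch. The limit and the sample readings below are invented for illustration; a real tool would take the limit from the processor's datasheet and the readings from the board's sensors.

```python
def out_of_spec(readings_c, limit_c):
    """Return (sample index, temperature) for every reading above the limit."""
    return [(i, t) for i, t in enumerate(readings_c) if t > limit_c]

def monitor(readings_c, limit_c, alarm=print):
    """Call alarm() only for out-of-spec samples; in-spec samples are silent."""
    for i, t in out_of_spec(readings_c, limit_c):
        alarm(f"sample {i}: {t:.1f} C exceeds limit {limit_c:.1f} C")

# Hypothetical readings and limit: only the third sample raises an alarm.
monitor([45.0, 52.5, 71.0, 49.0], limit_c=70.0)
```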
  • My thoughts

    (Score:2)
    by jimstapleton (999106) on Tuesday April 10, @12:58PM (#18677825)
    (Last Journal: Tuesday February 06, @10:13AM)
    Well, first I'll say this:
    I was a die-hard AMD fan until the Core2 CPUs came out, though the PentiumM/Core series did make me waver a bit beforehand. Good job.

    That being said - Features on a CPU are only relevant in that they improve performance, and when being named, they provide a way to ascertain in what areas performance boosts are available.

    With that, the features I'd like to see are
    (1) Someone else mentioned alternative instruction set emulation, that would be nice. Especially with multi-core. It could be useful to dynamically switch one of the cores to "ARM", "SPARC" or maybe even "Itanium" mode for testing and debugging. At least as far as the opcodes go. I suspect the rest of the system would have to be emulated in software and the other CPUs would have to act as information gateways between the emulating CPUs due to system hardware design differences.

    (2) Up to this point, AMD has been somewhat belatedly adding Intel instruction sets (SSE[x], MMX, etc.) to their chips. Intel has not been adding the 3DNow instruction sets to their chips. In comparisons, I've seen several apps where, if it didn't use 3DNow, Intel had the lead, but when 3DNow was added AMD pushed well ahead. So, I'd really like to see 3DNow on an Intel chip if they can get AMD to license it.

    (3) Last but most definitely not least - a continuation of the efforts to reduce CPU heat production.
  • by Anonymous Coward on Tuesday April 10, @03:39PM (#18680531)
    Why has there been a transition from clockspeed increase and architecture feature addition to multicore?

    I mean, where are the 7GHz processors? By now you'd think we wouldn't have an effective speed cap around the 3.5GHz mark [the fastest I can usually find them]

    Seeing as the x86 instruction set is an emulation in prefetch, why not give us access to the ACTUAL instruction set? I'd gladly program that instead.

    Why is there so much fuss from everyone about "low-power" desktop computers? It's not on a battery, it really does not matter if the computer takes 5 watts or 500 watts.
  • PowerPC features

    (Score:2)
    by gnasher719 (869701) on Tuesday April 10, @07:41PM (#18683673)
    Are there any plans to extend the processors and instruction set to 32 general purpose and 32 vector registers?

    (This could be done in a backward compatible way. Agree with AMD that having two REX prefixes in an instruction is illegal, instead of just ignoring any REX prefix but the last. Then in a future processor two REX prefixes could be used to specify 32 registers of each kind instead of 16).
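To make the proposal concrete: a REX prefix is a byte in the range 0x40 to 0x4F whose low four bits are W, R, X and B, and today REX.R supplies the fourth bit of a register number whose low three bits come from the ModRM byte. The sketch below decodes that, then adds a hypothetical fifth bit of the kind a second prefix could carry. The `extra_r` parameter is purely an illustration of the idea above, not an existing encoding.

```python
def is_rex(byte):
    """REX prefixes occupy 0x40..0x4F in 64-bit mode."""
    return 0x40 <= byte <= 0x4F

def decode_rex(byte):
    """Split a REX prefix (0100WRXB) into its four flag bits."""
    if not is_rex(byte):
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModRM r/m or SIB base
    }

def reg_index(modrm_reg, rex_r, extra_r=0):
    """Combine the 3-bit ModRM reg field with REX.R (4 bits, 16 registers).
    extra_r is a HYPOTHETICAL fifth bit from a second prefix (32 registers)."""
    return (extra_r << 4) | (rex_r << 3) | (modrm_reg & 0b111)

print(decode_rex(0x4C))          # W and R set
print(reg_index(0b111, 1))       # highest register reachable today
print(reg_index(0b111, 1, 1))    # highest register under the proposal
```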
  • by gnasher719 (869701) on Tuesday April 10, @07:48PM (#18683759)
    There are two features in the PowerPC instruction set that give it a big advantage in the vector unit: The vector permute (vperm) instruction which can combine arbitrary bytes from two vectors by using a permutation vector, and the vector select (vsel) instruction which combines two vectors using a mask vector. Would there be any chance of adding these instructions to the Intel vector unit?
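For readers who haven't used them, here is a byte-level model of what the two instructions compute, written in plain Python rather than as real vector code. The function names simply borrow the mnemonics; this is a sketch of the semantics as I understand them, operating on 16-byte vectors.

```python
def vperm(a, b, control):
    """Permute: the low 5 bits of each control byte pick one byte out of
    the 32-byte concatenation of a and b."""
    table = bytes(a) + bytes(b)
    return bytes(table[c & 0x1F] for c in control)

def vsel(a, b, mask):
    """Bitwise select: where a mask bit is 0 take the bit from a,
    where it is 1 take the bit from b."""
    return bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask))

a = bytes(range(16))        # 0x00 .. 0x0F
b = bytes(range(16, 32))    # 0x10 .. 0x1F

# Interleave the first eight bytes of a and b with a single permute.
interleave = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))
print(vperm(a, b, interleave).hex())

# Take the first half from b and the second half from a with a single select.
mask = b"\xff" * 8 + b"\x00" * 8
print(vsel(a, b, mask).hex())
```

One permute replaces what would otherwise be a sequence of shuffles, unpacks and shifts, which is where the advantage comes from.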
  • by gnasher719 (869701) on Tuesday April 10, @07:53PM (#18683805)
    The third PowerPC feature that Intel processors don't have is fused multiply-add. This is the only thing that keeps PowerPC just about ahead in floating-point performance; it is also really helpful for implementing quad-precision floating-point arithmetic. Can you see any chance of adding this to Intel CPUs?
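The reason it matters for extended precision: a fused multiply-add rounds once, so it can recover the rounding error of a product exactly. The sketch below emulates that single rounding with exact rationals in Python (it does not use a hardware instruction), and `two_product` is the usual building block for double-double style arithmetic. The function names are mine.

```python
from fractions import Fraction

def fma_exact(a, b, c):
    """Emulate fused multiply-add: form a*b + c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_product(a, b):
    """Return (p, e) where p is the rounded product and p + e equals a*b
    exactly; with a real FMA the error term costs a single instruction."""
    p = a * b
    e = fma_exact(a, b, -p)
    return p, e

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30

unfused = a * b - 1.0           # product is rounded to 1.0 before the subtract
fused = fma_exact(a, b, -1.0)   # exact product 1 - 2**-60 survives

print(unfused, fused)
print(two_product(a, b))
```

Without the fused operation, the separate multiply throws away the low bits before the add ever sees them.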
  • The Core 2 Duo took a step in the right direction over the Pentium 4. The second core improved the responsiveness of Windows and Linux (especially under a heavy multiprocess load). It sucks less power from my power company. Less noise from the fans would also be nice, but I can't tell based on the other computers next to the machine with the Core 2 Duo. All those fans can be deafening in a lab or annoying in a quiet home theater setting.

    Not all programs are compute intensive. A fair amount are bound to memory bandwidth and memory latency. Those new hybrid hard drives with flash memory look promising to reduce the bandwidth problems between the hard drive and RAM, but I haven't seen much innovation on improving the speed between RAM and the CPU. L1/L2 caches only do so much, and the CPU speeds seem to have far outpaced the RAM performance. More cores means more of a load on the memory.

    I'm interested in what kind of memory bandwidth improvements are on the horizon. Is the memory controller going to be integrated on the next CPU, like AMD Athlon X2? Will it use XDR memory, like the Cell processor? What kind of memory improvements will we see in the future?

    BTW a lot of people don't monitor their CPU voltage or temperature. The room thermostat, computer fans and heat sinks take care of the temperature, and the UPS takes care of the voltage. As long as the computer is doing its work, I can continue playing *cough* working on important things.
  • Memory mechanisms.

    (Score:2)
    by Goalie_Ca (584234) on Wednesday April 11, @07:59PM (#18696747)
    (http://www.sfu.ca/~rdickie)
    Assuming we can start to create smart mini-schedulers that can schedule jobs across all of these CPUs, what memory bottlenecks should we expect to see? Knowing that there are different memory architectures, each with different tradeoffs, what are we likely to see? NUMA, etc.?
  • My Answers

    (Score:2)
    by trisweb (690296) on Thursday April 12, @04:17PM (#18707785)
    (Last Journal: Thursday August 07, @03:46AM)
    Hi, thanks for the honest article, much better on the 2nd try ;-) I'll do my best to give good answers...

      * What is more important, a processor having particular architecture features or a processor that has the best performance?

    I personally believe that performance only becomes a factor if I actually notice a slowdown because of it. I don't think I have with any modern processor. I think performance and architecture features would go hand-in-hand though; if you're adding features for reasons other than performance in specific situations, then I'd like to know why... you probably mean "general performance" as in ramping up the clock speed vs "specific performance" as in focusing on optimizing the architecture to get the most out of it. I'm most interested in balance; I want a processor to be smartly designed, to include all the architecture features that I want (mainly 64 bit and reasonably optimized for video encoding and graphics processing) while having a large cache and to do it all at the fastest clock speed possible for the process and the features. I don't want to tell you where that balance is, I want you to figure it out for me. That's your job :-) But it shouldn't just be about raw performance, nor should it be about including every feature you can think of. Balance.

      * How do you use information displayed by some hardware monitoring programs such as processor temperatures or voltages?

    This is a harder question. I run AMD currently as I've said before, and I don't feel I ever need to look at the temp or voltages. The X2's run fairly cool (just try to find a heatsink/Fan for the AM2 socket... no one makes them!) so it's never something I've had to worry about. And I don't overclock, if I did it'd be very conservative.

    So the answer is that I don't use that information. I look at it once to make sure it's under, oh, say 50 deg. C, and I never want to see it again. My criteria is that my processor a) keeps running stably, and b) doesn't catch fire. Anything outside that I don't even want to think about.

    Thanks for your interest in the Slashdot community... keep on being honest with us ;-)
  • Question

    (Score:1)
    by laplace_man (856560) on Friday April 13, @05:04AM (#18715659)
    (http://pg302.sourceforge.net/)
    What is more important, a processor having particular architecture features or a processor that has the best performance?
    Well, it depends. It is quite obvious to me that there are now two different camps: low-consumption processors made for laptops and mobile devices, and multi-core processors for servers and gaming. The time has come for this market to fragment again. In a way it already has, with the wide use of the ARM architecture, but I expect further fragmentation of the desktop market in the near future. There is no need for a dual-core processor for simple word processing, right? I don't think there is a need to add more instruction sets to processors (even though that could be fun for programmers). I believe in cross-compiling at a higher level; that's why I stick with GCC, which supports a wide range of microprocessors. So if adding a new instruction set would hurt performance or power consumption, don't do it! Unfortunately the desktop market is driven by the "FORCE" of Microsoft, which doesn't take cross-compiling very seriously and just creates demand for performance boosts in certain areas. That eventually means new instruction sets and more backward-compatibility junk in the microprocessor, higher power consumption, and lower performance (nothing new, right?).

    How do you use information displayed by some hardware monitoring programs such as processor temperatures or voltages?
    Unfortunately, programmers don't want to do anything on this subject as long as their program works. I believe there will be some big developments in this area, tying process importance (such as the nice value on Unix machines) to frequency scaling. Smart scheduling and CPU voltage regulation should happen in the OS in the not very distant future and should become some kind of standard. Not only do we need information on voltages; it is even more important to get current consumption, for a start on large CPUs. Taking this further, I think the OS should give the user an easy way to change the importance of a given process, for example from the top of its window. The thing is, this is the OS's task and not the regular programmer's. The problem here, of course, is standardization.
  • by Royce3 (1087871) on Friday April 13, @07:56AM (#18716555)
    Being on the cutting edge of real-time ray-tracing development, I'd have to say the most effective thing would be a feature that ray-tracing hardware developers and GPU developers already have: a way to work around cache misses. What I've read about ray-tracing hardware and GPUs is that if there's a cache miss, the hardware puts that "thread" into a frozen state and works on something else until the cache is loaded. I realize that hardware thread support would probably not fit within the scope of x86, but perhaps some instructions could be added that would allow you to asynchronously load/save data into/from the cache, and query the state of that request. This way you could work on something else while waiting for the request to finish. In real-time ray tracing the problem dataset is too big and too unpredictable to serially optimize, so these kinds of tools would allow for a vast leap forward in performance. Thanks!
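A toy model of why that helps, with everything in it assumed for illustration: each "thread" is a list of operations, "c" is one cycle of compute, "m" is a load that misses, and a miss takes `miss_latency` cycles to return. One function stalls the core on every miss; the other freezes the missing thread and runs whichever thread is ready. This is not a model of any real CPU or GPU.

```python
def cycles_stalling(threads, miss_latency):
    """Core waits out every miss: each miss occupies it for the full latency."""
    return sum(miss_latency if op == "m" else 1 for t in threads for op in t)

def cycles_switching(threads, miss_latency):
    """Issuing a miss costs one cycle, then the thread is frozen until its
    data arrives while the core works on another ready thread."""
    pos = [0] * len(threads)        # next operation index per thread
    ready_at = [0] * len(threads)   # cycle at which each thread may run again
    now = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        unfinished = [i for i, t in enumerate(threads) if pos[i] < len(t)]
        runnable = [i for i in unfinished if ready_at[i] <= now]
        if not runnable:
            # Every remaining thread is frozen: idle until the first wakes up.
            now = min(ready_at[i] for i in unfinished)
            continue
        i = runnable[0]
        op = threads[i][pos[i]]
        pos[i] += 1
        if op == "m":
            ready_at[i] = now + miss_latency
        now += 1
    # A thread whose last operation was a miss still has data in flight.
    return max([now] + ready_at)

rays = [["m", "c"], ["m", "c"]]
print(cycles_stalling(rays, 4), cycles_switching(rays, 4))
```

With a single thread the two models take the same time, which is the point: the gain only appears when there is other work to overlap with the miss.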
  • end user

    (Score:2)
    by fontkick (788075) on Friday April 13, @09:42AM (#18717547)
    As an end user who builds my own PCs, all I want is a good value. It's a cliche, but it's the best way to describe it. For me that means a high-end processor should cost under $400, it should be fast, and it should not require a 40dB cooling fan. In a laptop, battery life is more important than computing power. How this is implemented is of no concern to me. The Core2Duo is as close to perfection as I've seen - it's amazingly fast at end user computing tasks (RARing, encoding, multitasking, Photoshop, gaming, etc). My system is very quiet, and it doesn't mimic a space heater. Intel has done a great job on the C2D.
    • Re:end user by Jeff_Intel (Score:1) Friday April 13, @11:37AM
      • Re:end user by dreamchaser (Score:2) Saturday April 14, @11:55AM
  • by jd (1658) <imipak.yahoo@com> on Saturday April 14, @02:26PM (#18733323)
    (http://slashdot.org/ | Last Journal: Wednesday March 07, @03:14AM)
    There are more processor designs than there are CPUs currently under development, precisely because nobody really knows how to balance all of the hardware considerations. A "pure" RISC CPU is generally faster for the same clock speed for integer arithmetic and other very basic operations than a hybrid design such as the modern x86. On the other hand, nobody has been able to build "pure" RISC systems capable of efficiently handling advanced floating-point arithmetic.

    (IIRC, for very basic opcodes, a truly pure RISC chip is still generally around four times faster than a RISC/CISC hybrid, clock-cycle for clock-cycle.)

    Parallelisms within the chip architecture - multi-core, SMP, or something else along those lines - have also been an area filled with frantic activity. The best you can scale SMP is about 16-way. The best anyone actually scaled an Inmos T400 (and still got performance benefits) was 1024-way. You'd need a 16-way SMP configuration of Intel's 80-core wafers to scale better, core-for-core.

    Then, there's the issue of what you actually want in the CPU at all. Where a function can be provided better using processor-in-memory architectures, why bother optimizing the CPU implementation of that function? The whole reason for the hybrid design was to avoid over-optimizing the wrong stuff. If you extrapolate that to what can be offloaded, doesn't it make more sense to de-optimize the offloadable and use the real-estate gained to improve performance in areas that the CPU absolutely has to do well in?

    I guess that last bit sums up my question for Intel's gurus and genius extraordinaires - offloading (whether with PIM or some other method) requires something to offload to. Those things don't generally exist because there's never been anything to offload onto them from. Other chicken-and-egg-and-trex problems exist, for much the same reason. How do you optimize your core microarchitecture from a technologically-correct standpoint, when the other technology required for you to do so may never exist until you have done so?

    (All other optimization problems reduce to this, because all other optimization problems will have "better" technical solutions than the ones you can directly get to. Just ask de Bono.)

  • by fred fleenblat (463628) on Monday April 16, @02:57PM (#18753963)
    Here's a little change of topic. I like to think I'm pretty smart most of the time, but it took several weeks for me to grok Core, Core 2, Core 2 Duo (plus an Extreme in there sometimes).

    At first I thought core meant "core" like number of processing cores. Then it seemed like 2 meant how many cores there were. But that's completely wrong on both counts. It's the Duo. But four cores makes quad instead of Quattro? I guess Audi would have sued you. I suppose you better be careful when you get to 8 cores or you might step on a spiderman 2 character, sigh.

    On top of all that weirdness, Sony uses nearly the same naming convention for their memory sticks.

    I'm not offering a better idea, I really don't know. I just remember it being really clear that Pentium 1 through Pentium 4 offered improved performance and clock speeds with increasing number. Maybe Pentium 5 seemed silly to somebody in the marketing/branding department but it would have been less confusing. I liked the "m" suffix for mobile, and hey an "s" for server would have been nice.

    Fair's fair: a certain three letter competitor's naming choices are not much better.
  • So in products like the Mac Pro with 8 cores and two chips how is cache coherency maintained? Is it a write-invalidate scheme or something else?

    In particular does it cost any extra FSB bandwidth to maintain cache coherency or is this somehow accomplished by simply listening to the reads and writes from the other chip?
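For anyone unsure what "write-invalidate" means in the question, here is a generic textbook-style sketch with three line states (modified, shared, invalid) and a shared bus that every cache snoops. It is emphatically not a description of what Intel's chips do; the class names and the one-value-per-line simplification are my own assumptions. It just shows where bus transactions arise.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> (state, value); "M" or "S"; absent = invalid

    def state_of(self, addr):
        return self.lines[addr][0] if addr in self.lines else "I"

class Bus:
    """Shared bus: every miss or upgrade is one transaction that all other
    caches observe (snoop) and react to."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.transactions = 0

    def attach(self, cache):
        self.caches.append(cache)
        return cache

    def read(self, cache, addr):
        if cache.state_of(addr) != "I":
            return cache.lines[addr][1]          # hit: no bus traffic
        self.transactions += 1
        for other in self.caches:
            if other is not cache and other.state_of(addr) == "M":
                # Snoop hit on a dirty line: write it back, downgrade to shared.
                self.memory[addr] = other.lines[addr][1]
                other.lines[addr] = ("S", self.memory[addr])
        cache.lines[addr] = ("S", self.memory[addr])
        return self.memory[addr]

    def write(self, cache, addr, value):
        if cache.state_of(addr) != "M":
            self.transactions += 1               # broadcast the invalidate
            for other in self.caches:
                if other is not cache:
                    # The whole (one-value) line is overwritten in this toy,
                    # so a dirty copy elsewhere can simply be dropped.
                    other.lines.pop(addr, None)
        cache.lines[addr] = ("M", value)

bus = Bus(memory={0x40: 1})
chip0, chip1 = bus.attach(Cache("chip0")), bus.attach(Cache("chip1"))
bus.read(chip0, 0x40)
bus.read(chip1, 0x40)
bus.write(chip0, 0x40, 7)    # invalidates chip1's copy: one bus transaction
bus.write(chip0, 0x40, 8)    # already modified locally: no bus traffic
print(bus.read(chip1, 0x40), bus.transactions)
```

In this model the invalidate on the first write is itself bus traffic, while repeated writes to a line already held as modified are free until another chip asks for it.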

    Thanks
  • VT-D Technology

    (Score:1)
    by RodM (1089419) on Tuesday April 17, @05:33AM (#18764871)
    Hi, just a quick question... can you give me any indication as to when VT-d technology will appear in one of the Intel processors? Thank you
  • How about both

    (Score:2)
    by Colin Smith (2679) on Tuesday April 17, @06:13AM (#18765045)

    What is more important, a processor having particular architecture features or a processor that has the best performance?
    Um coming from a corporate user. I think we're down to performance per watt now. We're no longer buying massive individual systems with big fast CPUs because it's much cheaper in most cases to cluster a bunch of small systems with slower and cheaper CPUs. The issue becomes power and heat per unit density.

    There would have to be some massive jump in performance to switch back to big iron with fast/expensive CPUs.

    On this. Are we likely to see any reconfigurable CPUs [wikipedia.org] from Intel in the near future? Wouldn't that be an architectural feature which improves performance?

    How do you use information displayed by some hardware monitoring programs such as processor temperatures or voltages?
    We don't. Don't care at that level of detail. It's the overall system. Typically the disks are a bigger problem for power consumption/heat dissipation. Don't get me wrong, we still want low power/high performance CPUs but it simply isn't worthwhile to monitor it on a per CPU basis.
     
  • by Benson_Intel (1084359) on Tuesday April 17, @05:01PM (#18773229)
    (Last Journal: Monday April 09, @11:16AM)
    There have been a number of discussions and questions on processor instructions, so I thought I would provide a link to the new SSE4 instructions that will be available in the upcoming 45nm Core processors. http://softwarecommunity.intel.com/articles/eng/1193.htm [intel.com]
  • by Krishnoid (984597) on Thursday April 19, @04:10AM (#18794837)
    I think we've moved past performance issues for applications in the classical era of computing. Since 1990, I think the shape of computing has changed in the productivity realm (and standardized somewhat) to emphasize a different application space. My best guesses would be to start at the top and work your way backward to see what features would best support today's applications rather than yesterday's computationally intensive scientific applications, gaming, and office suite applications. I think it's fair to say that those applications have as much performance as they can really benefit from. My stream of consciousness on this:
    • Browsers
      • Basic performance -- if all parts of a page are cached in RAM, reload the page and work backward to imagine what features would make it display faster or simplify common browser features. Try this with http://www.intel.com/ [intel.com] or your favorite other sites.
      • Basic model -- Markup-based (e.g., word-processor-style) page layout at realtime speed
      • Note that all parenthesized items are just ideas -- not sanity-checked for actual performance contributions or feasibility or even applicability
      • Performance
        • Page parsing (hardware support to accelerate sgml/html parsing into DOM tree)
        • Page layout (2d bin-packing (e.g.,) algorithm support)
        • Page rendering calculations (hardware support for element x/y size determination and layout)
        • Actual page rendering (DMA to layout images, better 2d graphics support)?
        • Javascript acceleration (general dynamic language improvement)
      • Quality
        • Proper typesetting (realtime TeX-style text layout) for ease of reading
        • Font antialiasing (hardware support for TrueType/outline font rendering algorithms)
    • Scripting languages
      • Dynamic languages are becoming more common now that programmer time and code design/maintenance is costlier than cpu time. It would be great to have features that support it.
      • Again, items not sanity checked, just general ideas
      • Performance possibilities
        • Dynamic languages poke around their symbol tables a bit, while compiled languages don't. Perhaps this is a performance issue?
        • Dynamic data structure support (dynamic arrays, dictionaries, implicitly-typed scalars)
        • Compiled code is parsed through a BNF grammar once, dynamic code every time it's run. How about instructions/hardware support for lexing/tokenizing/BNF-based parsing?
        • Languages as old as LISP could define new code on the fly -- hardware support for dynamically generated code, and/or passing around closures?
        • Memory gets de/allocated/copied much more frequently and in smaller chunks -- can there be better hardware support for that?
        • Awareness (at some rudimentary level) of objects and method calls and appropriate hardware support (e.g., Perl 6's upcoming 'Parrot' pseudo-assembler).
        • Whatever the hell it is that functional programming languages do -- they can be much safer and more reliable (perhaps making them a path for programming in the future), but are slower. Perhaps their execution patterns can get some hardware support features?
      • Ease-of-use
        • Compiled code is parsed through a BNF grammar once, dynamic code every time it's run. How about instructions/hardware support for common lexing/tokenizing/BNF-based parsing code patterns?
        • Regular expression/state machine features?
    • Graphical desktop support
      • Better 2d X-server support for many tabs x windows x desktops -- maybe improved rectangle overlap calculations for faster window exposure/damage calculations?
      • X-server-on-a-chip, or some common X server administrative/graphics functions on a chip (not a CPU feature, but maybe an Intel graphics feature)?
      • Just fire up your Linux desktop and see what window ops are a bit sluggish.
      • A
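The rectangle-overlap idea from the graphical desktop bullets above can be sketched in software. This is only an illustration of the kind of calculation a window system does when part of the screen is exposed, not how any particular X server implements it; the `Rect` type and the `intersect` and `damaged_regions` helpers are hypothetical names made up for this example.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle: top-left corner plus width and height (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping area of two rectangles, or None if they do not overlap."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.w, b.x + b.w)
    bottom = min(a.y + a.h, b.y + b.h)
    if right <= left or bottom <= top:
        return None  # touching edges or disjoint: nothing to redraw
    return Rect(left, top, right - left, bottom - top)


def damaged_regions(exposed: Rect, windows: List[Rect]) -> List[Rect]:
    """For a newly exposed area, list the part of each window that needs repainting."""
    regions = []
    for window in windows:
        overlap = intersect(exposed, window)
        if overlap is not None:
            regions.append(overlap)
    return regions
```

With many tabs, windows, and desktops, this test runs once per window per exposure event, which is the cost the bullet above wonders about accelerating.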
  • by Paracelcus (151056) on Thursday April 19, @05:21PM (#18804451)
    Gee whiz ain't that fast, OK, now what?
    Will it unlock the secrets of the universe?
    Will it cure cancer?
    Will it get an old fart a job?
    Uhhh, will it make my mail server run faster, will it break absurdly strong encryption so the gummermint can spy on us better?
    Will great aunt Tilly be able to remember how to get to her email account without calling her grandson?
    Will the airlines run on time?
    160 cores, 12GHz, 6TF's, but gas is still $4.00 per gallon.
  • by HW_Hack (1031622) on Friday April 20, @12:22AM (#18808515)
    The future is obviously multi-core, and with virtualization, most likely multiple concurrent OS sessions. At some point (in the number of cores or workloads), contention for the FSB will begin to limit overall computational throughput. You can witness this on a typical dual-core PC, where CPU-1 is busy but has to drop down to near 0% when CPU-2 needs to start a job, flush the cache, etc. Now magnify this a hundredfold or more as we run more programs, processes, and OSes concurrently.

    I see that Intel's near-future offerings will have 6MB caches ... going to a large cache will help reduce cache misses and contention for the FSB, but it's not a true solution. It's an extra buffer, but it does not address the root problem of a "single path" into the memory controller.

    Of course, we can't have a chip with thousands of pins as we duplicate traditional parallel data and address buses. One obvious choice is to use PCI-Express technology (high-speed serial lines) to crunch those 64 data bus lines into maybe 16 or 32 lines. Another solution is to move at least part of the memory controller into the CPU core and do this multi-bus work within the CPU, again most likely using high-speed serial lines.

  • I work on a number of C++ programs, which mostly use only integer arithmetic. I do not really want to learn MMX/SSE/SSE2/SSE3/etc. myself; I want the libraries and compilers I use to make the best use of those features.
  • You do realise that current generation Intel CPUs are cooler than the equivalent AMD offerings, right? Oh, of course you do... after all, you're just trolling without anything pertinent to add.

    I suppose you make half of a good point with regard to parallelization. You *can* do it; it's just tough, and it only addresses a subset of the potential computation problems out there. Current tools are lacking, though, and I would love to see Intel step up to the plate with SMP libraries/toolsets.
  • by Brett_Intel (1083709) on Tuesday April 10, @06:53PM (#18683243)
    For the processor, the temperature data can be read via MSR 19C. This is an offset from the maximum operating temperature, not an actual temperature. The specifics are documented in the Software Developer's Manuals. For voltage, it depends on the motherboard designer's implementation, as the processor itself doesn't have any mechanism for voltage monitoring.
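To make the "offset, not an actual temperature" point concrete, here is a minimal decoding sketch. It assumes the raw 64-bit value of MSR 19C has already been read (how to read it is OS-specific and not shown), and it assumes the bit layout as commonly described for this register: the offset in bits 22:16 and a "reading valid" flag in bit 31. Check those positions against the Software Developer's Manuals before relying on them. The function names and the `tj_max_c` parameter are hypothetical; the maximum operating temperature differs by part and is not stated in this thread.

```python
# Assumed field positions within MSR 19C; verify against the
# Software Developer's Manuals for the processor in question.
DIGITAL_READOUT_SHIFT = 16   # offset below max temperature, bits 22:16
DIGITAL_READOUT_MASK = 0x7F  # 7-bit field
READING_VALID_BIT = 31       # set when the readout is valid


def degrees_below_max(raw):
    """Return how many degrees C the core is below its maximum, or None if invalid."""
    if not (raw >> READING_VALID_BIT) & 1:
        return None
    return (raw >> DIGITAL_READOUT_SHIFT) & DIGITAL_READOUT_MASK


def estimated_temperature(raw, tj_max_c):
    """Convert the offset to an absolute estimate, given an assumed maximum (tj_max_c)."""
    offset = degrees_below_max(raw)
    if offset is None:
        return None
    return tj_max_c - offset
```

A monitoring program that shows an absolute temperature is therefore combining this offset with its own assumption about the maximum, which is one reason different tools can disagree.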
  • Re:Three Areas

    (Score:1)
    by Brett_Intel (1083709) on Friday April 13, @01:29PM (#18720847)
    Your suggestion to reintroduce SMT is a good one. We looked at SMT on the Core uArch and found that the performance was not worth the power and transistor cost. However, with the new Nehalem architecture we will once again have SMT. This is a good example of how architectural features don't always make sense on all products. Focus on power consumption and acoustics will continue. In the consumer space, there isn't necessarily a broad awareness of or desire for lower power, though this is beginning to change, especially as the PC moves into the form factors you mention and into the living room. The new Energy Star program should also help drive the industry toward more power efficiency.