
Dev.SN ♥ developers

posted by LaminatorX on Monday February 24 2014, @12:30PM
from the Can-I-get-some-dips-with-that? dept.

Rashek writes:

"Intel and Qualcomm just announced their roadmaps for mobile systems-on-chip (SoCs) at this year's Mobile World Congress.

Intel presented performance numbers for its Merrifield SoC, a dual-core Silvermont-based SoC that is effectively the phone version of Bay Trail, using some carefully chosen benchmarks that compared it to Apple's A7 SoC and Qualcomm's Snapdragon 800 series. Meanwhile, Qualcomm revealed future 64-bit Snapdragons for its mid-tier Snapdragon series. The Snapdragon 610 and 615 will arrive in Android smartphones in Q4 of this year and are four- and eight-core implementations of ARM's Cortex-A53."

  • (Score: 3, Informative) by mth (2848) on Monday February 24 2014, @02:14PM (#6013)

    The 64-bit ARM instruction set drops some features that were in the 32-bit instruction set to simplify its implementation in hardware. That could lead to lower power consumption, although I don't know whether it does in practice.

    I'm also not sure how much the SoC contributes to the phone's overall power use: how often applications access the radio might matter more. The screen is generally also a major power consumer on mobile devices, but on a phone it is off most of the time.

  • (Score: 2) by TheRaven (270) on Tuesday February 25 2014, @05:54AM (#6517)

    Note that ARMv8 requires both AArch32 and AArch64 to be implemented, so the extra complexity still has to exist, although it may be left unpowered. I wouldn't be too surprised if the second generation of ARMv8 chips simply included a pure-AArch64 core plus a Cortex-A7 or A15, with the ability to switch between them. Unlike ARM and Thumb code, which can be interworked within one program, AArch32 and AArch64 code cannot be mixed in the same program. Moving between them needs a full context switch, so you may as well put a completely separate 32-bit core on the die and only power it when you're running 32-bit code.

    The two big simplifications in AArch64 are moving the PC out of the general-purpose register (GPR) space and removing the load/store multiple instructions. The former makes branch prediction easier, because only dedicated branch instructions can change control flow. In AArch32, you can do fun things like implement vtable-based branches with a load instruction that has pc as its destination. This means you need to decode operands early in the pipeline, so that the instruction reaches the branch predictor early enough to be useful. It's also annoying for simple jumps (e.g. adding 32 to the program counter), because you effectively have to execute the instruction early, yet you can't use the normal forwarding paths on an out-of-order architecture. With explicit branches, the forwarding paths can be simplified a lot.
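    The point about where branches can come from can be illustrated with a toy classifier. This is only a sketch under stated assumptions: instructions are modelled as a hypothetical `Insn` record holding a mnemonic and the registers it writes (not real binary encodings), and the branch mnemonic sets are simplified, non-exhaustive subsets. It shows the shape of the problem: in AArch32 the decoder has to inspect destination operands, while in AArch64 the opcode alone is enough.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified instruction record: a mnemonic plus the
# registers it writes. Real front ends work on binary encodings instead.
@dataclass(frozen=True)
class Insn:
    mnemonic: str
    dests: tuple[str, ...] = ()

# Assumed subsets of explicit branch mnemonics (not exhaustive).
AARCH32_BRANCHES = {"b", "bl", "bx", "blx"}
AARCH64_BRANCHES = {"b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz"}
AARCH32_PC_NAMES = {"pc", "r15"}


def may_branch_aarch32(insn: Insn) -> bool:
    # pc is r15 in the AArch32 register file, so any instruction that writes
    # it (ldr, add, ldm, ...) redirects control flow. The decoder has to look
    # at destination operands, not just the opcode.
    return insn.mnemonic in AARCH32_BRANCHES or any(
        d in AARCH32_PC_NAMES for d in insn.dests
    )


def may_branch_aarch64(insn: Insn) -> bool:
    # pc is not a general-purpose register in AArch64, so the opcode alone
    # decides. Conditional branches are spelled b.eq, b.ne, and so on.
    m = insn.mnemonic
    return m in AARCH64_BRANCHES or m.startswith("b.")


if __name__ == "__main__":
    # AArch32 vtable dispatch: ldr pc, [r3, #8]
    print(may_branch_aarch32(Insn("ldr", ("pc",))))   # True
    # AArch32 relative jump: add pc, pc, #32
    print(may_branch_aarch32(Insn("add", ("pc",))))   # True
    # AArch64 equivalent of the dispatch: ldr x16, [x0, #8] then blr x16
    print(may_branch_aarch64(Insn("ldr", ("x16",))))  # False
    print(may_branch_aarch64(Insn("blr")))            # True
```

    In the AArch64 sequence only the `blr` is flagged, so the predictor can be told about it from the opcode alone, whereas the AArch32 `ldr pc` case can only be recognised once the destination register has been decoded.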

    The latter simplification matters because store multiple stores between 1 and 16 registers and also updates one register (the base register, when writeback is used). This means it has to be a multi-cycle instruction with a variable execution length. That was fine on early ARM chips with very simple pipelines, because they just ran a little loop and then continued: you stall the pipeline, but when the pipeline is only three stages long and you can still start fetching and decoding the next two instructions, it doesn't actually hurt performance. It's also more or less okay on an out-of-order architecture (although a bit painful, because you might need to forward the entire register set, which makes for some very wide paths). It's really horrible on low-power chips (e.g. the A7): the simple, low-power implementation stalls the pipeline (hurting performance, because you have to stall everything from register read to writeback), and a more complex one burns power. The AArch64 equivalents are store-pair and load-pair, which cover the normal case (stack spills and reloads) but can be implemented with predictable latency and simple forwarding paths, especially since the width of a pair (two 64-bit registers, 128 bits) is the same as a NEON load/store, so you already need channels to and from the L1 cache that are wide enough for a single-cycle-latency implementation.
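    To make the latency argument concrete, here is a rough Python sketch under a deliberately naive, hypothetical cost model: one register moved per cycle, which is an assumption for illustration and not how any particular core behaves. It uses the fact that an AArch32 STM/LDM encodes its register list as a 16-bit mask, so its cost depends on the operand value, and then splits the same kind of save into fixed-size AArch64 STP pairs. The helper names `stm_transfer_cycles` and `plan_aarch64_saves` are made up for this example.

```python
# Hypothetical cost model, for illustration only: assume a simple in-order
# core moves one register per cycle. Real cores differ.

def stm_transfer_cycles(reglist: int) -> int:
    """Cycles an AArch32 STM/LDM would need under the one-register-per-cycle
    assumption. The register list is a 16-bit mask: bit n set means rn is
    transferred, so the cost depends on the operand value."""
    if not 0 < reglist < (1 << 16):
        raise ValueError("register list must be a non-empty 16-bit mask")
    return bin(reglist).count("1")


def plan_aarch64_saves(regs: list[str]) -> list[str]:
    """Split a register save into fixed-size STP pairs plus at most one STR.
    Assumes the stack space is already allocated, with 8-byte slots from sp."""
    ops = []
    for i in range(0, len(regs) - 1, 2):
        ops.append(f"stp {regs[i]}, {regs[i + 1]}, [sp, #{8 * i}]")
    if len(regs) % 2:
        ops.append(f"str {regs[-1]}, [sp, #{8 * (len(regs) - 1)}]")
    return ops


if __name__ == "__main__":
    # push {r4-r7, lr} is an STM with mask bits 4-7 and 14 set.
    print(stm_transfer_cycles(0b0100_0000_1111_0000))  # 5
    for op in plan_aarch64_saves(["x19", "x20", "x21", "x22", "x30"]):
        print(op)
```

    Under this toy model the STM cost can be anything from 1 to 16 cycles depending on its operand, whereas every STP moves exactly two 64-bit registers, so each step has the same cost and a compiler simply emits more of them for larger saves.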

    --
    sudo mod me up