• QuaternionsRock@lemmy.world
    8 months ago

    This article fails to mention the single biggest differentiator between x86 and ARM: their memory models. Considering the sheer amount of everyday software that is going multithreaded, this is a huge issue, and the reason why ARM drastically outperforms x86 running software like modern web browsers.

    • pycorax@lemmy.world
      8 months ago

      Do you mind elaborating on what it is about the difference in their memory models that matters?

      • QuaternionsRock@lemmy.world
        7 months ago

        Here is a great article on the topic. Basically, x86 spends a comparatively enormous amount of energy ensuring that its strong memory guarantees are not violated, even in cases where such violations would not affect program behavior. As it turns out, the majority of modern multithreaded programs rely on these guarantees only occasionally, so a weaker model like ARM’s, which provides them through special (expensive) instructions only where they are needed, still comes out ahead on performance and efficiency in the long run.
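
        To make the "only occasionally rely on these guarantees" point concrete, here is a minimal C++ sketch of the classic message-passing pattern. The function name `run_message_passing` and the value 42 are made up for illustration. The ordering is requested explicitly with acquire/release; on x86, ordinary loads and stores are already ordered this strongly, so compilers typically emit plain moves here, while on ARM the compiler typically has to emit dedicated ordering instructions for just these two accesses and can leave everything else unordered.

        ```cpp
        #include <atomic>
        #include <cassert>
        #include <thread>

        // Hypothetical helper for illustration: one thread publishes a value,
        // another waits for it. Returns what the consumer observed.
        int run_message_passing() {
            int payload = 0;                  // plain (non-atomic) data
            std::atomic<bool> ready{false};   // flag that publishes the data
            int observed = -1;

            std::thread producer([&] {
                payload = 42;  // 1. write the data
                // 2. Release store: the write to payload becomes visible to any
                //    thread whose acquire load reads this 'true'.
                ready.store(true, std::memory_order_release);
            });

            std::thread consumer([&] {
                // 3. Acquire load: spin until the flag is set.
                while (!ready.load(std::memory_order_acquire)) {
                    std::this_thread::yield();
                }
                // 4. The acquire/release pair guarantees this reads 42.
                observed = payload;
            });

            producer.join();
            consumer.join();
            return observed;
        }
        ```

        Calling `run_message_passing()` returns 42 on any conforming implementation. If both accesses to `ready` used `std::memory_order_relaxed` instead, the program would have a data race on `payload`, and weakly ordered hardware would be allowed to show the consumer a stale value.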

        For additional context, the special sauce behind Apple’s Rosetta 2 is that the M family of SoCs actually implements an x86-style (TSO) memory ordering mode that is selectively enabled when executing dynamically translated multithreaded x86 programs.

        • pycorax@lemmy.world
          7 months ago

          Thanks for the links, they’re really informative. That said, it doesn’t seem entirely certain that the extra work done by the x86 arch incurs a comparatively huge difference in energy consumption. Granted, that isn’t really the point of the article. I would love to hear from someone who’s better versed in CPU design on the impact of x86’s memory model. The paper is more interesting with regard to performance, but I don’t find it very conclusive since it’s comparing ARM vs TSO on an ARM processor. It does link this paper, which seems more relevant to our discussion, but it’s a shame that it’s paywalled.

      • sunbeam60@lemmy.one
        7 months ago

        On the x86 architecture, main RAM is used by the CPU, and the GPU pays a huge penalty when accessing it. The GPU therefore has its own onboard graphics memory.

        On ARM this is unified, so the GPU and CPU can both access the same memory at the same penalty. This means a huge class of embarrassingly parallel problems can be solved more quickly on this architecture.

        • pycorax@lemmy.world
          7 months ago

          Do x86 CPUs with iGPUs not already use unified memory? I’m not exactly sure what you mean, but are you referring to the overhead of copying data from CPU to GPU memory on discrete graphics cards when performing GPU calculations?

          • sunbeam60@lemmy.one
            7 months ago

            Yes, it is unified, but extremely slow compared to an ARM architecture’s unified memory, as the GPU sort of acts as if it were discrete.

            • pycorax@lemmy.world
              7 months ago

              Do you have any sources for this? I can’t seem to find anything specific describing the behaviour. It’s quite surprising to me, since the Xbox and PS5 use unified memory on x86-64, and it would be strange if it were extremely slow for such a use case.

              • sunbeam60@lemmy.one
                7 months ago

                It’s been a while since I’ve coded on the Xbox, but at least on the 360, the memory wasn’t really unified as such. You had 10 MB of EDRAM that formed your render target, and then there were specialised functions to copy the EDRAM output to DRAM. So it was still separated, and while you could create buffers in main memory and access them in the shaders, that came at some penalty.

                It’s not that unified memory can’t be created, but it’s not the architecture of a PC, where peripheral cards communicate over the PCI bus, with great penalties to touch RAM.

                • pycorax@lemmy.world
                  7 months ago

                  Well, for the current-generation consoles, they’re both x86-64 CPUs with only a single pool of GDDR6 memory shared across the CPU and GPU, so I’m not sure you’d have such a penalty anymore.

                  > It’s not that unified memory can’t be created, but it’s not the architecture of a PC, where peripheral cards communicate over the PCI bus, with great penalties to touch RAM.

                  Are there any tests showing the difference in memory access of x86-64 CPUs with iGPUs compared to ARM chips?