• 5 Posts
  • 147 Comments
Joined 1 year ago
Cake day: August 8th, 2023

  • The PC I’m using as a little NAS usually draws around 75 watts. My Jellyfin and general home server draws about 50 watts at idle but can jump up to 150 watts. Most of the components are very old. I know I could get the power usage down significantly with newer components, but I’m not sure the electricity savings outweigh the cost of sending the old parts to the landfill and creating demand for more new components to be manufactured. (Rough cost math below.)
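
    To put that in dollars, here is a quick sketch. The electricity price and uptime are assumptions of mine, and the 20 W figure is a hypothetical newer low-power build, not a measurement:

```python
# Rough annual electricity cost of an always-on machine.
# Assumes $0.15/kWh and 24/7 uptime (both assumptions; adjust for your rates).

def annual_cost(watts: float, price_per_kwh: float = 0.15) -> float:
    kwh_per_year = watts / 1000 * 24 * 365
    return kwh_per_year * price_per_kwh

for w in (75, 50, 20):  # old NAS, idle server, hypothetical newer low-power box
    print(f"{w:>3} W ≈ ${annual_cost(w):.0f}/year")  # ~$99, ~$66, ~$26
```

    At those assumed rates, swapping a 75 W box for a 20 W one saves roughly $70 a year, which is the number to weigh against the cost (and embodied waste) of new hardware.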




  • Last time I looked it up and calculated it, these large models are trained on something like only 7x as many tokens as they have parameters. If you think of it like compression, a 7:1 ratio for lossless text compression is perfectly possible (rough arithmetic below).

    I think the models can still output a lot of stuff verbatim if you try to get them to; you just hit the guardrails they put in place. It seems to work fine for public domain stuff, e.g. “Give me the first 50 lines of Romeo and Juliet.” (albeit with a TOS warning, lol). “Give me the first few paragraphs of Dune.” seems to hit a guardrail, or maybe that behavior was just trained away through reinforcement learning.

    A preprint released recently detailed how to get around that RL training by controlling the first few tokens of a model’s output, showing the “unsafe” data is still in there.
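
    To put the first point in rough numbers, here is a back-of-envelope sketch. All the concrete figures are assumptions of mine (a 70B-parameter model, ~7 training tokens per parameter, ~4 bytes of UTF-8 text per token, fp16 weights), not numbers from any specific model card:

```python
# Back-of-envelope: how much raw training text per byte of model weights?
params = 70e9                  # assumed parameter count
tokens = 7 * params            # ~490B tokens at ~7 tokens per parameter
text_bytes = tokens * 4        # ~4 bytes of UTF-8 text per token
weight_bytes = params * 2      # fp16 weights, 2 bytes per parameter

print(f"text:    {text_bytes / 1e12:.2f} TB")         # ~1.96 TB
print(f"weights: {weight_bytes / 1e9:.0f} GB")        # ~140 GB
print(f"ratio:   {text_bytes / weight_bytes:.0f}:1")  # ~14:1
```

    In bytes that works out to roughly 14:1, the same order of magnitude as the best lossless text compressors manage on English prose, so the arithmetic alone doesn’t rule out a lot of verbatim recall.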





  • I use GPT (4o, premium) a lot, and yes, I still sometimes experience source hallucinations. It will also sometimes hallucinate things that aren’t in the source at all. I get better results when I tell it not to browse; the large context from processing web pages seems to hurt its “performance.” I would never trust gen AI for a recipe. I usually just use Kagi to search for recipes, with it set to promote results from recipe sites I like.





  • Hmm. I just assumed the 14B was distilled from the 72B, because that’s what I thought Llama was doing, and that would just make sense. On further research, it’s not clear whether Llama used the traditional teacher-student method or just trained the smaller models on synthetic data generated by a larger model. I suppose training smaller models on a larger amount of data generated by bigger models is similar, though (rough sketch of the difference below). It does seem like Qwen was also trained on synthetic data, because it sometimes thinks it’s Claude, lol.

    Thanks for the tip on Medius. Just tried it out, and it does seem better than Qwen 14B.
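
    On the distillation point above, here is a minimal sketch of the two approaches, assuming PyTorch. It only illustrates the difference; it does not reflect what Llama or Qwen actually did:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Traditional teacher-student distillation: the student matches the
    teacher's full probability distribution over the vocabulary (soft targets)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

def synthetic_data_loss(student_logits, teacher_generated_ids):
    """Training on synthetic data: the teacher only supplies the text it
    generated; the student gets ordinary hard next-token targets."""
    return F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        teacher_generated_ids.view(-1),
    )
```

    The first needs the teacher’s logits at training time; the second only needs text the teacher generated, which is presumably why it’s the easier route at scale.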






  • Yeah, I think this could be the end of free and fair elections in the U.S., and there’s no coming back from that without a revolution. Don’t get me wrong, I don’t think most of us will be directly killed by this change; our lives will just be shittier. It’ll be like living in Russia. Given how utterly incompetent the administration is looking, and the things they say they’re going to do (mass deportation of a significant part of our workforce, blanket tariffs, gutting social safety nets), we may speed-run an economic and societal collapse. That could sow the seeds of a horrible and bloody revolution.

    Or, maybe I’m wrong and the important institutions will somehow hold against a christo-fascist party controlling all branches of the federal government and a president with immunity. If there still are free and fair elections, then Congress could block a lot of things in 2026 and start repairing some of the damage in 2028.

    Still, it does not bode well that the U.S. elected these people in the first place, and at best, the U.S. will slowly crumble for decades.