𞋴𝛂𝛋𝛆

  • Actually look at the way Discord works on your network: all the raw IP addresses and connections with no clear ownership or human-readable names, with dozens of changing connections required to get any of it to work. Then go try to ask questions about what is going on and who you are connecting to. Discover that none of it is documented or described anywhere. Then realize that this means no one running Discord is doing so on a fully audited and logged host. You simply cannot be, without a great deal of effort. I made it to the 6th layer of whitelisted raw IP addresses, and still nothing worked while trying to connect to Discord from a fully logged and documented network. I am simply unwilling to write a script to annotate that many connections so that all of my logs make sense. I seriously doubt anyone on Discord is doing so, and they certainly lack any understanding of what they are connecting to, why, or over which protocols. So the Discord user is telling me: “my opsec and privacy awareness is as nonexistent as a pig in a herd running off a cliff, and my system should be assumed compromised, with no idea of what might be connected.” That everyone else is doing it is a garbage excuse. That no one appears to have gotten hurt has tissue-thin merit, but it also reveals that the user runs blind in herds while hoping for the best. Such information implies a lot about a person: their depth, accountability, and ethics, in certain scopes.
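
    For what it is worth, the kind of annotation script I mean would look something like this. A minimal sketch, assuming a plain-text log with one raw IP per line on stdin; it only adds reverse-DNS names, which is the bare minimum to make such logs readable:

    ```python
    # Hypothetical sketch: annotate raw IPs from a connection log with
    # reverse-DNS names so the log is at least human readable.
    # Assumes a plain-text log with one IP address per line on stdin.
    import socket
    import sys

    def annotate(ip: str) -> str:
        try:
            host, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except (socket.herror, socket.gaierror):
            host = "UNRESOLVED"  # many such addresses resolve to nothing
        return f"{ip}\t{host}"

    if __name__ == "__main__":
        for line in sys.stdin:
            ip = line.strip()
            if ip:
                print(annotate(ip))
    ```

    Even this only helps when reverse DNS exists at all; in my experience many of the addresses in question resolve to nothing meaningful, which is rather the point.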


  • 𞋴𝛂𝛋𝛆@lemmy.world to memes@lemmy.world · My formal wear · 4 days ago

    Am I the only person who really hates randomly seeing an event depicted where I know people are suffering and dying in the image? Maybe it is my decade and counting of suffering from physical disability that makes me particularly empathetic to human suffering. Maybe it was because I was sitting in geometry class when the announcement came in, class stopped, and we all turned on the TV two minutes before the second plane struck. I watched this in real time, and all the events as they unfolded. There will never be a day when I forget seeing that happen. The jumpers that followed bothered me most at the time.

    Knowing myself, and how much I care about strangers, if I had been there, I would have died while trying to help people. I often imagine what it might have been like as the collapse happened, or the experience within the plane. I have a vivid imagination for such abstractions: the heat, the sounds, the way materials fail and collapse, the view out of the window, the spectrum of how others react, how I quietly endure whatever I must in the moment, often more aware of the events unfolding around me, only to watch others as they suddenly realize what is happening moments later.

    That is what I see in this image. I don’t like seeing it on Lemmy in any unnecessary context. I don’t see the politics. I see the people inside. I see you, the person reading this right now, wherever you are. You, in a moment of innocent vulnerability, going about your day, half tuned out of the real world that surrounds you, and I care about you, both as a fellow human in the real space around you and as a digital neighbor here. To me, it is like we are both in that building, on one of the floors struck and burning, and I am doing everything I can to help you escape, refusing to leave until you are freed. I don’t know you or your name. If the shoe were on the other foot, I could easily write myself off as a loss, insisting that I am fine and that you need to escape while leaving me behind. That hypothetical does not bother me at all. It is only the idea of you, the real you, seriously, the you looking at this screen right now, that bothers me to think it is you stuck in there. I did not leave you. We didn’t die alone, you and I. Call me crazy, but I care, and I can’t help it. It is who I am.



  • I haven’t looked into the issue of PCIe lanes and the GPU.

    I don’t think it should matter much with a smaller PCIe bus, in theory, if I understand correctly (unlikely). The only time a lot of data is transferred is when the model layers are initially loaded. With Oobabooga, when I load a model, most of the time my desktop RAM monitor widget does not even have time to refresh and tell me how much memory was used on the CPU side. What is loaded in the GPU is around 90% static. I have a script that monitors this so that I can tune the maximum number of layers; I leave overhead room for the context to build up over time, but there are no major changes happening aside from the initial loading. One just sets the number of layers to offload to the GPU and loads the model. However many seconds that takes is an irrelevant startup delay that only happens once when initiating the server.
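
    The monitoring script is nothing fancy. A rough sketch of the idea, assuming an NVIDIA GPU with nvidia-smi on the PATH: poll VRAM usage while the model loads and watch the headroom.

    ```python
    # Rough sketch: poll VRAM usage while loading a model so the
    # maximum number of offloaded layers can be tuned with headroom
    # left for context growth. Assumes nvidia-smi is on the PATH.
    import subprocess
    import time

    def vram_used_mib() -> int:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        return int(out.strip().splitlines()[0])  # first GPU only

    if __name__ == "__main__":
        while True:
            print(f"VRAM used: {vram_used_mib()} MiB")
            time.sleep(1)
    ```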

    So assuming the kernel modules and hardware support the narrower bandwidth, it should work… I think. There are laptops with options for an external GPU over Thunderbolt too, which tunnels PCIe, so I don’t think the PCIe bus is too baked in.


  • Anything under 16 GB of VRAM is a no-go. Your number of CPU cores is important. Use Oobabooga Textgen for an advanced llama.cpp setup that splits between the CPU and GPU. You’ll need at least 64 GB of RAM, or be willing to offload layers to the NVMe with DeepSpeed. I can run up to a 72b model with 4-bit quantization in GGUF on a 12700 laptop with a mobile 3080Ti, which has 16 GB of VRAM (mobile is like that).
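
    The split itself is one parameter. A hedged sketch of roughly what it looks like through llama-cpp-python, the library behind llama.cpp loaders like Oobabooga’s; the model path and layer count are placeholders to tune against your VRAM:

    ```python
    # Hedged example of splitting a GGUF model between CPU and GPU with
    # llama-cpp-python. Model path and n_gpu_layers are placeholders;
    # layers that do not fit in VRAM simply stay on the CPU side.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-72b-q4_k_m.gguf",  # hypothetical file
        n_gpu_layers=40,  # number of layers offloaded to the GPU
        n_ctx=4096,       # context window; larger costs more memory
        n_threads=8,      # physical cores matter more than logical ones
    )

    out = llm("Q: Why offload layers? A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```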

    I prefer to run an 8×7b mixture-of-experts model because only 2 of the 8 experts are ever running at the same time. I am running that as a 4-bit quantized GGUF, and it takes 56 GB total to load. Once loaded, it is about like a 13b model for speed but has ~90% of the capabilities of a 70b. The streaming speed is faster than my fastest reading pace.

    A 70b model streams at my slowest tenable reading pace.
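
    For a rough sense of what reading pace means in model terms, here is the back-of-envelope conversion I have in mind; the words-per-token ratio is a common rule of thumb, not a measurement:

    ```python
    # Back-of-envelope: convert a human reading pace into the token
    # throughput a model needs to stream at. Assumes ~0.75 words per
    # token, a common rule of thumb for English text.
    words_per_minute = 250  # a typical adult reading pace
    words_per_second = words_per_minute / 60
    tokens_per_second = words_per_second / 0.75

    print(f"{tokens_per_second:.1f} tokens/s")  # ~5.6 tokens/s
    ```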

    Both of these options are vastly more capable than any of the smaller model sizes, even if you screw around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB of VRAM it would be different; I cannot say. If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores with the most advanced AVX instructions. Generally, anything with efficiency cores loses the advanced AVX instructions, and because the CPU schedulers in kernels are usually unable to handle this asymmetry, consumer junk has poor AVX support. It is quite likely that all the problems Intel has had in recent years have been due to how they tried to block consumer stuff from accessing the advanced P-core instructions, which were only blocked in microcode. Using them requires disabling the E-cores or setting up CPU-set isolation in Linux or the BSDs, as sketched below.
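
    One way to do that isolation from inside a process on Linux, as a minimal sketch; which core IDs are P-cores varies by chip, so the set below is a placeholder (check lscpu for your topology):

    ```python
    # Sketch: pin the current process to P-cores only on Linux, so the
    # scheduler never lands inference threads on E-cores. The core ID
    # set is a placeholder; verify your topology with lscpu first.
    import os

    P_CORES = {0, 1, 2, 3, 4, 5, 6, 7}  # hypothetical P-core IDs
    os.sched_setaffinity(0, P_CORES)    # 0 means the current process
    print("running on cores:", sorted(os.sched_getaffinity(0)))
    ```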

    You need good Linux support even if you run Windows. Most good and advanced stuff with AI will be done in WSL if you haven’t ditched Windows for whatever reason. Use https://linux-hardware.org/ to check support for devices.

    The reason I mentioned avoiding consumer E-cores is that articles have been popping up lately about all-P-core hardware.

    The main constraint for the CPU is the L2 to L1 cache bus width. Researching this deeply may be beneficial.

    Splitting the load between multiple GPUs may be an option too. As of a year ago, the cheapest option for getting a 16 GB GPU in a machine was a second-hand 12th-gen Intel laptop with a 3080Ti, by a considerable margin when all of it is added up. It is noisy, gets hot, and I hate it at times, wishing I had gotten a server-like setup for AI, but I have something, and that is what matters.
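
    If you do go multi-GPU, llama-cpp-python exposes a tensor_split option for dividing one model across cards. A hedged sketch; the ratios and path are placeholders:

    ```python
    # Hedged sketch: split one model across two GPUs with the
    # tensor_split option in llama-cpp-python. Ratios are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-70b-q4_k_m.gguf",  # hypothetical file
        n_gpu_layers=-1,          # -1 offloads every layer to the GPUs
        tensor_split=[0.5, 0.5],  # fraction of the model per GPU
    )
    ```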

  • Need max AVX instructions. Anything with P/E cores is junk; only enterprise P-cores have the max AVX instructions. When P and E cores are mixed, the advanced AVX is disabled in microcode because the CPU scheduler is unable to determine whether a process thread contains an AVX instruction, and there is no asymmetric scheduler that handles this. Prior to early 12th-gen Intel, the enterprise P-core microcode could allegedly run if swapped manually. This was “fused off” to prevent it, probably because Linux could easily be adapted to asymmetric scheduling but Windows probably would not be. The whole reason W11 had to be made was the E-cores and the way the scheduler and spin-up of idle cores works, at least according to someone working on the CPU scheduler at Linux Plumbers around 2020. There are already asymmetric schedulers in Android ARM.
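
    You can see the practical effect directly. A quick check of which AVX variants the CPU actually advertises on Linux; on mixed P/E-core parts you will typically not see avx512f:

    ```python
    # Read the CPU feature flags the kernel reports on Linux and check
    # for the AVX variants that matter for tensor workloads.
    with open("/proc/cpuinfo") as f:
        flags = set()
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":")[1].split())
                break  # the first core's flag line is enough here

    for feature in ("avx", "avx2", "avx512f"):
        print(f"{feature:8s} {'yes' if feature in flags else 'no'}")
    ```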

    Anyway, I think it was on Gamers Nexus in the last week or two that Intel was doing some all-P-core consumer stuff. I’d look at that. According to Chips and Cheese, the primary CPU bottleneck for tensors is the bus width and clock management between the L2 and L1 caches.

    I do alright with my laptop but haven’t tried the R1 stuff yet. The 70b Llama 2 stuff that I ran was untenable on a 12700 with CPU only. It is a little slower than my reading pace when split with a 16 GB GPU, and that was running a 4-bit quantized version.

  • You generally want to use a Trusted Platform Module (TPM) chip like what is on most current computers and Pixel phones. The thing to understand about TPM chips is that they have a set of unique internal keys that cannot be accessed at all. These keys are hashed against to derive other keys. The inaccessibility of this unique keyset is the critical factor. If you store keys in any regular memory, you are taking a chance.
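
    As a software analogy of that derive-from-a-root pattern, here is a minimal sketch using HKDF from the pyca/cryptography package; a real TPM does the equivalent inside the chip and never exposes the root, whereas here the root sits in RAM, which is exactly the weakness:

    ```python
    # Software analogy of TPM key derivation: working keys are derived
    # from a root secret that never leaves the device. Here the "root"
    # is just a variable in RAM, which is exactly the weakness a real
    # TPM avoids. Requires the pyca/cryptography package.
    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    root_secret = os.urandom(32)  # a TPM's equivalent is fused in silicon

    derived_key = HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=b"disk-encryption-key-v1",  # context label for this key
    ).derive(root_secret)

    print(derived_key.hex())
    ```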

    Maybe check out Joe Grand’s YT stuff. He has posted about hacking legit keys to recover large amounts of crypto. Joe is behind the JTAGulator, if you have ever seen that one, and was a famous child hacker going by “Kingpin.”

    I recall reading somewhere about a software implementation of TPM for secure boot, but I didn’t look into it very deeply and do not recall where I read about it. Probably on Gentoo, Arch, or maybe in the book Beyond BIOS (terrible).

    Andrew Huang used to have stuff up on YT that would be relevant to the real security of such a device, but you usually need to know where he wrote articles to find links because most of his stuff isn’t publicly listed on YT. He has also removed a good bit over the years when certain exploits are unfixable, like accessing the 8051 microcontroller built into most SD cards and running transparently. Andrew is the author of Hacking the Xbox, which involved basically a man-in-the-middle attack on the console’s high-speed HyperTransport bus.

    It would be a ton of work to try to reverse engineer what you have created and implemented in such a device. Unless you’re storing millions, it is probably not something anyone is going to mess with.