• 1 Post
  • 37 Comments
Joined 1 year ago
Cake day: June 15th, 2023










  • To actually read how they did it, here is their model page: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k

    Approach:

    • meta-llama/Meta-Llama-3-8B-Instruct as the base
    • NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by empirical RoPE theta optimization
    • Progressive training on increasing context lengths, similar to Large World Model [2] (See details below)

    Infra

    We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 1048k tokens on Crusoe Energy high performance L40S cluster.

    Notably, we layered parallelism on top of Ring Attention with a custom network topology to better leverage large GPU clusters in the face of network bottlenecks from passing many KV blocks between devices. This gave us a 33x speedup in model training (compare 524k and 1048k to 65k and 262k in the table below).

    Data

    For training data, we generate long contexts by augmenting SlimPajama. We also fine-tune on a chat dataset based on UltraChat [4], following a similar recipe for data augmentation to [2].
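
    To make the "NTK-aware interpolation" bullet above a bit more concrete, here is a minimal sketch of the commonly cited NTK-aware formula for rescaling the RoPE base. It is not taken from the model card. The base theta of 500,000, the head dimension of 128, and the 8,192-token original context are assumptions about Llama 3 8B, and the model card says the final theta values were tuned empirically afterwards, so this would only give a starting point.

```python
import math

def ntk_scaled_theta(base_theta: float, scale: float, head_dim: int) -> float:
    """NTK-aware RoPE base for a context window extended by a factor `scale`."""
    return base_theta * scale ** (head_dim / (head_dim - 2))

def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Per-pair rotation frequencies theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Assumed values for illustration only (not stated in the model card).
BASE_THETA = 500_000.0      # assumed original RoPE theta
HEAD_DIM = 128              # assumed attention head dimension
SCALE = 1_048_576 / 8_192   # assumed original context of 8,192 tokens -> 128x

new_theta = ntk_scaled_theta(BASE_THETA, SCALE, HEAD_DIM)
old = rope_inv_freq(BASE_THETA, HEAD_DIM)
new = rope_inv_freq(new_theta, HEAD_DIM)

print(f"initial theta guess: {new_theta:.3e}")
print(f"fastest frequency ratio: {new[0] / old[0]:.3f}")
print(f"slowest frequency ratio: {new[-1] / old[-1]:.5f}")
```

    The point of the formula is that the fastest-rotating dimension is left untouched while the slowest one is stretched by exactly the scale factor, so short-range position information survives the extension.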







  • I use it almost daily.

    It does produce good code. It does not reliably produce good code. I am a programmer, and it makes my job 10x faster; I usually just have to fix a few bugs in the code it generates. Over time, I learned what it is good at (UI code, converting things, boilerplate) and what it struggles with (anything involving newer tech, algorithmic understanding, etc.)

    I often refer to it as my intern: It acts like an academically trained, not particularly competent, but very motivated, fast typing intern.

    But then I am also working in the field. Prompting it correctly is too often dismissed as a skill (I used to dismiss it too). It needs more understanding than people give it credit for.

    I think that, like much IT tech, it will gradually go from being a dev tool to an everyday tool.

    All the pieces of the puzzle to be able to control a computer by voice using only natural language are there. You don’t realize how big it is. Companies haven’t assembled it yet because it is actually harder to monetize than to code. I think Apple is probably in the best position for it. Microsoft is going to attempt it and fail as usual, and Google will probably make a half-assed attempt at it. I’ll personally go for the open source version of it.




  • keepthepace@slrpnk.net to AI@lemmy.ml · AI on AMD · 9 months ago

    Can’t wait! But really, this type of thing is what makes it hard for me to cheer for AMD:

    For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product. But the good news was that there was a clause in case of this eventuality: Janik could open-source the work if/when the contract ended.

    I wish we had a champion of openness, but in that respect AMD just looks like a worse version of Nvidia. Hell, even Intel has been a better player!


  • keepthepace@slrpnk.net to AI@lemmy.ml · AI on AMD · 9 months ago

    That’s the opposite of the feedback I got. AMD claims to support all of the transformers library but many people report this to be a lie.

    I have no love for companies that establish de facto monopolies, but that is indeed what Nvidia has right now. Everything is built on CUDA, and AMD has a lot of catching up to do.

    I have the impression that Apple chips support more things than AMD does.

    There are some people making things work on AMD, and I cheer for them, but let’s not pretend it is as easy as with Nvidia. Most packages depend on CUDA for GPU acceleration.