• 1 Post
  • 17 Comments
Joined 1 year ago
Cake day: September 14th, 2023








  • If you’re using an LLM, you should constrain the output with a grammar to a format like JSON, JSONL, or CSV so you can load it into scripts and validate that the generated data matches the source data. Though at that point you might as well just parse the raw data and do it yourself. If I were you, I’d honestly use something like pandas/polars or even Excel to get it done reliably, without people bashing you for using the forbidden technology even when you can 100% confirm that the data is real and not hallucinated.

    I also wouldn’t use any cloud LLM solution like OpenAI, Gemini, Grok, etc., since those can change, are really hard to validate, and give you little to no control over the model. I’d recommend a local solution instead: run an open-weight model like Mistral Nemo 2407 Instruct with llama.cpp or vLLM, since the entire setup will not change unless you manually go in and change something. We use a custom finetuned version of Mixtral 8x7B Instruct at work in a research setting and it works very well for our purposes (translation and summarization), despite what critics think.

    Tl;dr: Use pandas/polars if you want 100% reliable results (human error not accounted for). LLMs take a lot of work to get reliable output from.

    Edit: There’s a lot of misunderstanding about LLMs. You’re not supposed to use the bare LLM for any tasks except extremely basic ones that could be done better by hand. You need to build a system around it for your specific purpose. Using a raw LLM without a Retrieval Augmented Generation (RAG) system and complaining about hallucinations is like using the bare-ass Linux kernel and complaining that you can’t use it as a desktop OS. Of course an LLM will hallucinate like crazy if you give it no data. If someone told you to write a 3 page paper on the eating habits of 14th century monarchs in Europe and locked you in a room with absolutely nothing except 3 pages of paper and a pencil, you’d probably write something not completely accurate. However, if you had access to the internet and a few databases, you could write something really good and accurate. LLMs are exceptionally good at summarization and translation, but you have to give them data to work with first.
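To make the "validate that the generated data matches the source data" step concrete, here is a minimal sketch using only the Python standard library. It assumes the LLM was asked to emit JSONL with one object per source row and that every row has a unique `id` field; the function name `validate_llm_rows` and the field names are hypothetical and only for illustration.

```python
import json


def validate_llm_rows(source_rows, llm_jsonl, key="id"):
    """Compare JSONL text produced by an LLM against trusted source rows.

    Returns a list of human-readable problems; an empty list means every
    source row came back unchanged and nothing extra was invented.
    """
    problems = []
    source_by_key = {row[key]: row for row in source_rows}
    seen = set()

    for line_no, line in enumerate(llm_jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or not isinstance(record.get(key), (str, int)):
            problems.append(f"line {line_no}: missing or unusable {key!r} field")
            continue

        record_key = record[key]
        if record_key not in source_by_key:
            # A record with no counterpart in the source is a likely hallucination.
            problems.append(f"line {line_no}: {key}={record_key!r} not in source")
            continue

        seen.add(record_key)
        for field, expected in source_by_key[record_key].items():
            if record.get(field) != expected:
                problems.append(
                    f"line {line_no}: {field!r} is {record.get(field)!r}, "
                    f"source has {expected!r}"
                )

    # Source rows the model silently dropped.
    for missing in sorted(source_by_key.keys() - seen, key=repr):
        problems.append(f"{key}={missing!r} missing from LLM output")
    return problems
```

Calling `validate_llm_rows(source_rows, model_output)` and refusing to use the output unless the returned list is empty gives you a hard check instead of trust. It also shows why parsing the raw data yourself is often simpler: the validator already has to read the source.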



  • In small datasets, the speed difference between sorting algorithms is minimal, but once you get to large datasets with hundreds of thousands to millions of entries, the choice makes quite a difference. For example, say you’re a large bank with millions of clients and you want a list of the people with the most money in an account. Depending on the sorting algorithm used, the processing time could range from seconds to days. That’s also only one operation; there’s so much other useful information that could be derived from a database like that using sorting.
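As a rough illustration of the bank example, the sketch below contrasts a quadratic insertion sort with Python's standard-library `heapq.nlargest` for picking the largest balances. The account data, `insertion_sort_desc`, and `top_accounts` are made up for this example and are not from any real system.

```python
import heapq
import random


def insertion_sort_desc(values):
    """Quadratic-time sort, largest first. Fine for tiny inputs only."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift smaller items right until current's slot is found.
        while j >= 0 and result[j] < current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def top_accounts(balances, n):
    """Return the n (account, balance) pairs with the largest balances."""
    return heapq.nlargest(n, balances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: 2,000 accounts with random balances.
    balances = {f"acct{i}": random.randint(0, 1_000_000) for i in range(2_000)}
    print(top_accounts(balances, 5))
    print(insertion_sort_desc(balances.values())[:5])
```

Both approaches give the same top balances, but the work grows very differently: for a million entries, a quadratic sort needs on the order of 10^12 steps, while an n·log2(n) sort needs roughly 2×10^7.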



  • ralakus@lemmy.world to 196@lemmy.blahaj.zone · Rule · 39 upvotes · 2 months ago

    For anyone curious, I couldn’t find exact statistics, but hearing aids in the US cost between $2000 and $8000 per pair, with the average cost sitting around $5000 to $6000 per pair.

    Insurance coverage varies by insurance provider and by state. It looks like many people end up paying the maximum required by law before insurance takes over, which is roughly $1000 to $3000 depending on the state.

    Not only is a single purchase expensive, but you usually have to replace them every 3 to 5 years.