• 7 Posts
  • 108 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • inspxtr@lemmy.world to Selfhosted@lemmy.world · 2024 Self-Host User Survey Results
    20 days ago

    Wonder how the survey was sent out and whether that affected sampling.

    Regardless, with ~3–4k responses, that’s disappointing, if not concerning.

    I only have an anecdotal sense of Lemmy’s demographics. Do you have a source on Lemmy’s gender diversity?

    Anyway, what do you think are the underlying issues? And what would be some suggestions to the community to address them?

  • If you’ve never worked before, this can be considered a practice run for when you do.

    Like one of the other commenters said, assume everything is accessible by Google and/or your university (and later, your boss, company, organization, …).

    And not just you, but also the people who interact with you through it. That means you may be able to put up defenses, but if they don’t (and they most likely won’t), the data you exchange with them will likely be accessible as well.

    So here are some potential suggestions to minimize private-data access by Google/university while still being able to work with others (adjust things depending on your threat model of course):

    • use Google Workspace services only for collaboration and for official business communication
    • don’t link services that may be personal, such as Google Maps, YouTube, Search history, your browser, …
    • if more sensitive things need to be shared with other people, use more private/encrypted solutions that you like or that the university suggests. Use the latter if it’s still “business”-related, e.g. communicating about medical research data containing PII
    • if there are communications that need sensitive information (e.g. HR documents, tax documents), ask them (a) if you can bring the sensitive documents to them in person, (b) if the university has an encrypted solution, or (c) if you can use your own encrypted solution (e.g. put files on Proton Drive and give them the folder password in person)
    • go through all Google privacy and security settings every 6 months or so, and turn off what you don’t need (there are usually a bunch of guides for that). Note: every 6 months because there may be new stuff that they add
    • turn off all the AI integrated features (sometimes called smart features) in Google services like Mail, GDoc, …
    • avoid using GDrive for storage of personal files - if you need to, try to encrypt them before uploading
    • you may find there are other people like you; if you work with them, ask whether they are comfortable with alternatives or have any suggestions. However, this is usually rare in most fields, so keep your expectations low
    • use Multi-Account Containers in Firefox to keep everything related to your university account in one container. Don’t use Google Chrome; if you must use a Chromium-based browser, there are degoogled forks you can try
    • use uBlock Origin and block unnecessary Google services (you’ll have to experiment with this a lot)
    • avoid clicking links in emails when possible; instead, copy them (select the text, or right-click → copy link) and paste them into the browser. This is an unfounded suspicion, but Google may track which links you click on
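    The “encrypt before uploading” step above can be sketched in a few lines. This is just a minimal sketch assuming the third-party `cryptography` package; any vetted tool (gpg, age, rclone crypt) works just as well, and the file contents here are hypothetical:

```python
# Minimal sketch, assuming `pip install cryptography` (third-party package).
from cryptography.fernet import Fernet

# Generate a key once and store it somewhere Google never sees
# (password manager, local disk), NOT alongside the uploaded file.
key = Fernet.generate_key()
f = Fernet(key)

plaintext = b"sample personal data"   # hypothetical file contents
ciphertext = f.encrypt(plaintext)     # upload THIS to GDrive, not the plaintext

# Later, after downloading, decrypt locally with the same key:
recovered = f.decrypt(ciphertext)
assert recovered == plaintext
```

    The point is that only the ciphertext ever touches GDrive; the key stays local.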

  • yeah I guess the formatting and the verbosity seem a bit annoying? Wonder what alternative solutions could better engage people from Mastodon, which is what this bot is trying to address.

    edit: just to be clear, I’m not affiliated with the bot or its creator. This is just my observation from multiple posts I see this bot comments on.

  • Thanks for the suggestions! I’m actually also looking into LlamaIndex for more conceptual comparison, though I haven’t gotten to building an app yet.

    Any general suggestions for a locally hosted LLM with LlamaIndex, by the way? I’m also running into some issues with hallucination. I’m using Ollama with llama2-13b and the bge-large-en-v1.5 embedding model.

    Anyway, aside from conceptual comparison, I’m also looking for more literal comparison. AFAIK, the choice of embedding model affects how similarity is defined. Most current LLM embedding models are fairly abstract, so similarity is conceptual: “I have 3 large dogs” and “There are three canines that I own” will probably score as very similar. Do you know which embedding model I should choose for a more literal comparison?
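    For a purely literal comparison you may not need an embedding model at all. A surface-level scorer, e.g. Python’s stdlib `SequenceMatcher`, behaves the opposite way from a semantic embedding (this is a stdlib sketch, not LlamaIndex API; the example sentences are the ones above):

```python
from difflib import SequenceMatcher

def literal_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; rewards shared surface
    text, not shared meaning."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

a = "I have 3 large dogs"
b = "There are three canines that I own"
c = "I have 3 large dogs at home"

# Nearly identical wording scores high...
print(literal_similarity(a, c))
# ...while the paraphrase scores much lower, even though an embedding
# model would place a and b close together conceptually.
print(literal_similarity(a, b))
```

    A hybrid score (weighted mix of literal ratio and embedding cosine similarity) is one way to tune between the two extremes.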

    That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can iteratively build up from similar sentences to find similar paragraphs. I can take a stab at coding it up, but was wondering if there are similar frameworks out there already that I can model after.
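    The sentence-to-paragraph buildup could be sketched roughly like this (stdlib only; `sentence_sim` is a stand-in you would swap for embedding cosine similarity, and the splitter is deliberately naive):

```python
import re
from difflib import SequenceMatcher

def split_sentences(text: str) -> list[str]:
    # Naive splitter; a real pipeline would use a proper sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_sim(a: str, b: str) -> float:
    # Stand-in scorer; replace with embedding cosine similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def paragraph_similarity(p1: str, p2: str) -> float:
    """Aggregate sentence-level matches into a paragraph score:
    for each sentence in p1, take its best match in p2, then average."""
    s1, s2 = split_sentences(p1), split_sentences(p2)
    if not s1 or not s2:
        return 0.0
    best = [max(sentence_sim(a, b) for b in s2) for a in s1]
    return sum(best) / len(best)

p1 = "The server restarts nightly. Logs rotate every week."
p2 = "Logs rotate every week. The server restarts nightly."
print(paragraph_similarity(p1, p2))  # → 1.0 (order-insensitive)
```

    Because each sentence keeps only its best match, the score is insensitive to sentence order and to length mismatch, which is one way around the length issue.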