enumerator4829

joined 5 months ago
[–] enumerator4829@sh.itjust.works 16 points 3 weeks ago (2 children)

VSCode is just Emacs with a weirder Lisp. (/s)

(You can tear my Emacs from my cold dead hands)

I also hate that warning, but it’s basically ”Can’t fit your text, with the font and properties you specified, into the box you specified without making it look like ass”

The easiest way to preserve the formatting is to reword the text. Then again, it would be nice if it didn’t happen all the time in normal paragraphs as soon as I use a word with more than 10 characters…

[–] enumerator4829@sh.itjust.works 2 points 1 month ago (2 children)

Reword your text to fit.

[–] enumerator4829@sh.itjust.works 10 points 2 months ago (2 children)

Biomedical AI literally won the Nobel prize last year. But LLMs won’t help at all.

Tangentially related, any biomedical outfit that hasn’t bought a shitton of GPUs to run AlphaFold on is probably mismanaging money.

[–] enumerator4829@sh.itjust.works 1 points 3 months ago (1 children)

You mean a transparency log? Just sign and publish. Or if it’s confidential, have a timestamp authority sign it, but what’s the point of a confidential blockchain? Sure, we can have a string of hashes chained together à la git, but that’s just an implementation detail. Where does the trust come from, who does the audit? That’s the interesting part.
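For reference, the ”string of hashes à la git” really is the trivial part. A toy sketch in Python, standard library only (everything here is illustrative, not a recommendation):

```python
import hashlib
import json

def append_entry(chain, payload):
    """Append an entry whose hash covers the payload and the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True).encode()
    chain.append({"prev": prev, "payload": payload,
                  "hash": hashlib.sha256(body).hexdigest()})

def verify(chain):
    """Recompute every link. This only proves internal consistency,
    not who wrote the entries or when."""
    prev = "0" * 64
    for entry in chain:
        body = json.dumps({"prev": prev, "payload": entry["payload"]}, sort_keys=True).encode()
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(body).hexdigest():
            return False
        prev = entry["hash"]
    return True

chain = []
append_entry(chain, "first record")
append_entry(chain, "second record")
assert verify(chain)
```

A chain that passes verify() proves nothing by itself, since anyone can regenerate an equally valid one from scratch. The interesting part is getting a signature or timestamp on the head from somebody you already trust.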

[–] enumerator4829@sh.itjust.works 2 points 3 months ago (3 children)

If your blockchain isn’t distributed, it doesn’t need to be a blockchain, because then you already have trust established.

[–] enumerator4829@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago)

You assume a uniform distribution. I’m guessing it’s not. The question isn’t ”Does the model contain compressed representations of all works it was trained on”. Retaining enough information about any single image is enough to be a copyright issue.

Besides, the situation isn’t as obviously flawed with image models as it is with LLMs. LLMs are just broken in this regard, because retaining only a handful of bytes is enough to violate copyright.

I think there will be a ”find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.

Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI were a European company training on material from Disney.

[–] enumerator4829@sh.itjust.works 3 points 3 months ago

Document databases are the future /s

[–] enumerator4829@sh.itjust.works 5 points 3 months ago

Or you know, trusted timestamps and cryptographic signatures via normal PKI. A Merkle tree isn’t worth shit legally if you can’t verify it against a trust outside of the tree.

All of the blockchain bullshit misses that part: you can create a cryptographic representation of money or contracts, but you can’t actually enforce, verify or trust anything in the real world without intermediaries. On the other hand, I can trust a certificate from a CA because there are verifiable, actual real-world consequences for someone if that CA breaks legal agreements.

I’ll use a folder of actual papers, signed using a pen. Have some witnesses, make sure they have a legal stake and consequences, and you are golden.
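To make the ”trust outside the tree” point concrete, here’s a rough Python sketch (hypothetical helper names, standard library only). Computing a Merkle root over your documents is easy, and so is computing a different but equally well-formed one, which is exactly why the root only matters once someone with real-world accountability signs or timestamps it:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then pairwise-hash upward until one root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

docs = [b"contract.pdf bytes", b"appendix.pdf bytes", b"meeting-notes.txt bytes"]
print(merkle_root(docs).hex())
# On its own this hex string is legally meaningless; it only becomes evidence
# once a CA-backed signature or an RFC 3161 timestamp binds it to a party and a time.
```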

[–] enumerator4829@sh.itjust.works 6 points 3 months ago (5 children)

Distributed blockchains are useful when all of the below are fulfilled:

  • Need for distributed ledger
  • Peers are adversarial w.r.t. contents of transactions in the ledger
  • Enough peers exist so that no group can become a majority and thus assume control
  • No trusted central authority exists

Here, we have a single peer creating entries in a ledger. We can get away with a copy of the ledger and one or more trusted timestamping authorities.
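A rough sketch of that setup in Python, with the timestamping authority faked via an HMAC key as a stand-in for a real RFC 3161 service or PKI signature (all names and keys are made up for illustration):

```python
import hashlib
import hmac
import json
import time

TSA_KEY = b"key held only by the timestamping authority"  # stand-in for a real signing key

def tsa_timestamp(digest: str) -> dict:
    """What a trusted timestamping authority does: bind a hash to a time and sign it."""
    token = {"digest": digest, "time": int(time.time())}
    msg = json.dumps(token, sort_keys=True).encode()
    return {**token, "mac": hmac.new(TSA_KEY, msg, hashlib.sha256).hexdigest()}

def tsa_verify(token: dict) -> bool:
    msg = json.dumps({"digest": token["digest"], "time": token["time"]}, sort_keys=True).encode()
    expected = hmac.new(TSA_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["mac"])

# The single writer appends entries and periodically has the current head timestamped.
ledger = ["entry 1", "entry 2", "entry 3"]
head = hashlib.sha256("\n".join(ledger).encode()).hexdigest()
token = tsa_timestamp(head)
assert tsa_verify(token)  # later: evidence the ledger existed in this state at that time
```

Keep a copy of the ledger plus the tokens and you get the auditability without any consensus machinery.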

[–] enumerator4829@sh.itjust.works 7 points 3 months ago (2 children)

There is an argument that training actually is a type of (lossy) compression. You can even build (bad) language models by using standard compression algorithms to ”train”.
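A toy illustration of that claim in Python, using zlib as the ”trained” model: score candidate continuations by how many extra bytes they cost to compress after the training text (corpus and candidates are made up; purely a sketch):

```python
import zlib

# The "training data" the compressor gets to see.
corpus = ("the cat sat on the mat. the cat chased the mouse. "
          "the dog chased the cat. ") * 20

def extra_bytes(context: str, continuation: str) -> int:
    """Extra compressed bytes needed to append the continuation to the context."""
    base = len(zlib.compress(context.encode(), 9))
    full = len(zlib.compress((context + continuation).encode(), 9))
    return full - base

context = corpus + "the dog chased the"
candidates = ["cat.", "mouse.", "stock market."]
# Whatever compresses best against the corpus is treated as the "most likely" continuation.
print(min(candidates, key=lambda c: extra_bytes(context, " " + c)))
```

Do the same thing per symbol with a proper probability model (PPM plus arithmetic coding, say) and you effectively have a small language model, which is the sense in which training and compression blur into each other.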

By that argument, any model contains lossy and unstructured copies of all the data it was trained on. If you download a low-quality 480p h264-encoded rip of a Ghibli Blu-ray, that’s not legal, despite the fact that you aren’t downloading the same bits that were on the disc.

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The act of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.

[–] enumerator4829@sh.itjust.works 1 points 3 months ago (1 children)

What? Just base64 encrypt it before you store it in the git hub
