Great points; I don't think we have the vocabulary let alone the conceptual fluency to fully understand this yet.
I agree that a human writer consuming large amounts of Hemingway isn't that different from an AI trained on him. But I think that only covers half the question. The reading/training isn't what violates the copyright -- it's the spitting out of what's read into the marketplace. The Hemingway human who turns around and writes The Sun Keeps Rising or what-have-you is adding something ineffable to the mix that a machine by definition can't. Call it originality, experience, creativity, soul -- even a mediocre human writer is bringing an element that transforms the work from a simple synthesis. Ask a human on a desert island who's never read a book in their life to tell you a story and you'll get _something_, however wobbly. Do that with a machine that's never read anything and you'll get computer circuits and sand.
That's why I think in the end LLMs are closer to Napster than a human -- more elaborate and disguised than a traditional copy, sure. But fundamentally borrowed and not original. And thus an infringement of that which is.
Idk, I think the problem with copyright and AI is deeper and more complicated than most of the current conversation recognizes. Your point about whether or not something that produces content that cannot be copyrighted can simultaneously somehow violate copyright hints at the issues. Napster ultimately was just making copies. But that's not how LLMs work -- they're not really copying anything, just "learning" how words are associated across a trillion parameters. If I read dozens of novels by Stephen King and Hemingway and then write my own book, people might notice that the writing style is in the tradition of those writers -- but unless I copy them word for word, it's not copyright infringement -- if it were, art (which is always in conversation with what came before) wouldn't be possible. Certainly it would make no sense to say I'm violating copyright just by *reading* (training myself on) their works, would it?
On the other hand, LLMs aren't people, and I think intuitively we have a justifiable aversion to the idea that a supercomputer could "imitate" the writing style of Margaret Atwood and churn out a hundred books in her "style" over the weekend. But I'm not sure it's "stealing" in the same way copyright infringement is stealing.
Ultimately the whole concept of copyright itself is a human creation that would probably never have arisen without the invention of mechanical means of reproduction. If we want to address the problem we intuit here, I think we're going to have to think harder about what it is we're really objecting to and develop a legal framework that addresses that.