Nice post! You mention an edit.py script, which is not the script in the gist (that one is called gpt-proofread.py). Would you mind sharing that too?
It's the same script. I just renamed it for the gist so that the name is a bit more telling about what the script does.
This is great. However, your book’s content is now in the OpenAI database, corpus, training set, or whatever you want to call it, even before being published. Aren’t you concerned about that? You have no copyright protection yet.
For the API, they don't store the data for training the models: https://openai.com/policies/api-data-usage-policies
That's only for ChatGPT, but you can also opt out.
I thought you could only opt out for GPT-4. Do they charge for using the GPT-4 API?
While searching for like-minded developers, I came across your post. My proofreading workflows mainly consist of emails, Markdown (GitHub Pages), and some company-internal tools. GPT-3.5 gets it right almost every time, but having a diff and selectively applying changes is crucial for me. I have approached this problem with a small open-source desktop app that requires an OpenAI API token. It is still in the experimental stage in terms of functionality and design, but it already operates quite reliably at its core:
https://www.aibtra.dev
That looks really interesting. Thanks for building and sharing.
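For anyone curious what "having a diff and selectively applying changes" could look like in a script, here is a minimal sketch using Python's standard difflib. It is not how the app above works internally (I have not checked); the function names `diff_hunks` and `apply_selected` are made up for illustration, and it assumes both texts are already split into lists of lines.

```python
import difflib


def diff_hunks(original, proofread):
    """Return the opcodes for every span where the two line lists differ."""
    matcher = difflib.SequenceMatcher(a=original, b=proofread, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def apply_selected(original, proofread, accepted):
    """Rebuild the text, taking the proofread version only for accepted hunks.

    `accepted` is a set of hunk indices, numbered in the same order as
    the list returned by diff_hunks().
    """
    matcher = difflib.SequenceMatcher(a=original, b=proofread, autojunk=False)
    result = []
    hunk_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(original[i1:i2])
            continue
        if hunk_index in accepted:
            result.extend(proofread[j1:j2])  # take the suggested change
        else:
            result.extend(original[i1:i2])   # keep the original wording
        hunk_index += 1
    return result
```

You would show each hunk from `diff_hunks` to the user, collect the indices they approve, and pass those to `apply_selected`. Diffing whole lines is the simplest choice; a word-level diff would give finer control at the cost of more bookkeeping.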
I don't have access to the gpt-4 model in the API, so I tried gpt-3.5-turbo, and it simply summarises the first lines of text and removes the rest. I have only a few lines for testing in one document, and the lines of text are separated by blank lines. I use plain text, no markdown, but that shouldn't be an issue.
Maybe I'll have to wait for gpt-4 in this case.
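One thing that might be worth trying in the meantime is sending each paragraph separately instead of the whole document, so the model has less room to summarise. Whether that actually helps with gpt-3.5-turbo is an assumption on my part, not something tested here. A minimal sketch, where `proofread_fn` is a hypothetical placeholder for whatever function sends one paragraph to the model and returns the corrected text:

```python
import re


def split_paragraphs(text):
    """Split plain text into paragraphs separated by one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


def proofread_by_paragraph(text, proofread_fn):
    """Run proofread_fn on each paragraph and rejoin with single blank lines.

    proofread_fn is supplied by the caller; it takes one paragraph (str)
    and returns the corrected paragraph (str).
    """
    paragraphs = split_paragraphs(text)
    return "\n\n".join(proofread_fn(p).strip() for p in paragraphs)
```

Because every paragraph goes in and comes out on its own, a paragraph can no longer be silently dropped from the result, and it is easy to spot one that came back much shorter than it went in.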