update the download URLs to azure CDN

2025-08-22 01:51:41 +00:00 · 2021-02-17 19:56:08 -05:00
commit 2c102400c7
parent d6f4e2956b
3 changed files with 4 additions and 4 deletions
--- a/detector/README.md
+++ b/detector/README.md
@@ -12,13 +12,13 @@ For motivations and discussions regarding the release of this detector model, pl
 Download the weights for the fine-tuned `roberta-base` model (478 MB):

 ```bash
-wget https://storage.googleapis.com/gpt-2/detector-models/v1/detector-base.pt
+wget https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-base.pt
 ```

 or `roberta-large` model (1.5 GB):

 ```bash
-wget https://storage.googleapis.com/gpt-2/detector-models/v1/detector-large.pt
+wget https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-large.pt
 ```

 These RoBERTa-based models are fine-tuned with a mixture of temperature-1 and nucleus sampling outputs,
--- a/detector/download.py
+++ b/detector/download.py
@@ -30,7 +30,7 @@ def download(*datasets, data_dir='data'):
            if os.path.isfile(output_file):
                continue

-            r = requests.get("https://storage.googleapis.com/gpt-2/output-dataset/v1/" + filename, stream=True)
+            r = requests.get("https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/" + filename, stream=True)

            with open(output_file, 'wb') as f:
                file_size = int(r.headers["content-length"])
--- a/download_dataset.py
+++ b/download_dataset.py
@@ -17,7 +17,7 @@ for ds in [
 ]:
    for split in ['train', 'valid', 'test']:
        filename = ds + "." + split + '.jsonl'
-        r = requests.get("https://openaipublic.blob.core.windows.net/gpt-2/output-dataset/v1/" + filename, stream=True)
+        r = requests.get("https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/" + filename, stream=True)

        with open(os.path.join(subdir, filename), 'wb') as f:
            file_size = int(r.headers["content-length"])
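
Both scripts touched by this commit build the same URL pattern (base URL + `<dataset>.<split>.jsonl`) and hard-code the base URL separately, which is why two files had to change. As a rough sketch only, not code from this repository, the base URL could live in one constant so a future host change is a one-line edit. The names `BASE_URL`, `dataset_urls`, and `download_all` are hypothetical, and this version uses the standard library's `urllib.request` in place of `requests`, so it has no progress bar and does not read the `content-length` header.

```python
import os
import shutil
import urllib.request

# Base URL copied from the diff above; every other name here is illustrative.
BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"
SPLITS = ("train", "valid", "test")


def dataset_urls(datasets, base_url=BASE_URL, splits=SPLITS):
    """Return (filename, url) pairs for every dataset/split combination."""
    pairs = []
    for ds in datasets:
        for split in splits:
            filename = f"{ds}.{split}.jsonl"
            pairs.append((filename, base_url + filename))
    return pairs


def download_all(datasets, data_dir="data"):
    """Download each file once, skipping files that already exist."""
    os.makedirs(data_dir, exist_ok=True)
    for filename, url in dataset_urls(datasets):
        output_file = os.path.join(data_dir, filename)
        if os.path.isfile(output_file):
            continue
        # Write to a temporary name first so an interrupted download
        # is not mistaken for a complete file on the next run.
        tmp_file = output_file + ".part"
        with urllib.request.urlopen(url) as response, open(tmp_file, "wb") as f:
            shutil.copyfileobj(response, f)
        os.replace(tmp_file, output_file)
```

Calling `dataset_urls(["some-dataset"])` (a placeholder dataset name) returns the three split URLs without touching the network, which makes the URL construction easy to check before running `download_all`.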