Commit 795dc75 · Parent: 81885f7
static/tabs.html (+20 -19) CHANGED
@@ -93,7 +93,8 @@ a:visited {
   <p>
   <b>Dataset Streaming</b>
   Usually data is stored on disk and needs to be fully or partially loaded into CPU memory to be used for training.
-  Large datasets used for pre-training measure in <a href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a>
+  Large datasets used for pre-training measure in <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a>
+  or even <a target="_blank" rel="noopener noreferrer" href="https://laion.ai/laion-400-open-dataset/">terabytes</a>.
   This can pose a significant problem, as most desktops and cheap cloud instances simply do not have that much space.
   Furthermore, downloading the dataset over the internet would take hours before one can even begin training.
   <!--Changing the dataset means downloading a new dataset in full and using additional disk space.-->
@@ -106,7 +107,7 @@ a:visited {
   </p>
   <center>
   Here's a tutorial for using these techniques:<br>
-  <a href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
+  <a target="_blank" rel="noopener noreferrer" href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" width=360px>
   </a>
   </center>
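
The linked Colab tutorial covers this in full; as a quick, hedged illustration of the idea, the sketch below streams a public corpus with the Hugging Face `datasets` library instead of downloading it (the dataset name is only a placeholder, not the one used in this project):

    # Minimal dataset-streaming sketch: with streaming=True, examples are fetched
    # lazily over HTTP, so the multi-hundred-GB corpus never has to fit on disk.
    from datasets import load_dataset

    stream = load_dataset("c4", "en", split="train", streaming=True)  # placeholder corpus

    # Shuffle with a bounded buffer and peek at a few examples.
    for example in stream.shuffle(seed=42, buffer_size=10_000).take(3):
        print(example["text"][:80])
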
@@ -159,7 +160,7 @@ a:visited {
   <li>
   <p>
   Another defense is replacing the naive averaging of the peers' gradients with an <b>aggregation technique robust to outliers</b>.
-  <a href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
+  <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
   suggested such a technique (named CenteredClip) and proved that it does not significantly affect the model's convergence.
   </p>
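
For intuition (this is a paraphrase, not the authors' exact algorithm), CenteredClip re-estimates the average iteratively while clipping each peer's deviation from the current estimate, so a handful of malicious gradients cannot pull the result arbitrarily far. A rough PyTorch sketch with an arbitrary clipping radius `tau`:

    import torch

    def centered_clip(grads: torch.Tensor, tau: float = 1.0, iters: int = 5) -> torch.Tensor:
        """Rough sketch of CenteredClip-style robust averaging.

        grads: tensor of shape (num_peers, dim), one flattened gradient per peer.
        Each iteration moves the estimate v toward the peers' gradients, but every
        peer's pull is clipped to norm `tau`, limiting the influence of outliers.
        """
        v = grads.mean(dim=0)  # start from the naive average
        for _ in range(iters):
            diffs = grads - v                                     # (num_peers, dim)
            norms = diffs.norm(dim=1, keepdim=True).clamp(min=1e-12)
            scale = torch.clamp(tau / norms, max=1.0)             # per-peer clip factor
            v = v + (diffs * scale).mean(dim=0)
        return v
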
@@ -172,7 +173,7 @@ a:visited {
   </p>

   <p>
-  Recently, <a href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
+  Recently, <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
   proposed a robust aggregation protocol for decentralized systems that does not require this assumption.
   This protocol uses CenteredClip as a subroutine but is able to detect and ban participants who performed it incorrectly.
   </p>
@@ -182,54 +183,54 @@ a:visited {
   <div role="tabpanel" class="tab-pane" id="tab3">
   <p>In this section, we provide a roadmap for you to run the collaborative training yourself.</p>
   <p>
-  <b>Got confused?</b> Feel free to ask any questions at our <a href="https://discord.gg/uGugx9zYvN">Discord</a>!
+  <b>Got confused?</b> Feel free to ask any questions at our <a target="_blank" rel="noopener noreferrer" href="https://discord.gg/uGugx9zYvN">Discord</a>!
   </p>
   <ol>
   <li>
   Set up dataset streaming:
   <ul>
   <li>
-  <a href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
-  in a streaming-friendly format (<a href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
+  <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
+  in a streaming-friendly format (<a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
   </li>
   <li>Set up dataset streaming (see the "Efficient Training" section).</li>
   </ul>
   </li>
   <li>
-  Write the code for training peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
+  Write the code for training peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
   <ul>
   <li>Implement your model, set up dataset streaming, and write the training loop.</li>
   <li>
   Get familiar with the hivemind library
-  (e.g., via the <a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
+  (e.g., via the <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
   </li>
   <li>
   In the training loop, wrap your PyTorch optimizer with
-  <a href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
-  (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
+  <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
+  (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
   </li>
   </ul>
   </li>
   <li>
-  <b>(optional)</b> Write the code for auxiliary peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
+  <b>(optional)</b> Write the code for auxiliary peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
   <ul>
   <li>
   Auxiliary peers are a special kind of peer responsible for
-  logging loss and other metrics (e.g., to <a href="https://wandb.ai/">Weights & Biases</a>)
-  and uploading model checkpoints (e.g., to <a href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
+  logging loss and other metrics (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://wandb.ai/">Weights & Biases</a>)
+  and uploading model checkpoints (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
   </li>
   <li>
   Such peers don't need to calculate gradients and may be run on cheap machines without GPUs.
   </li>
   <li>
   They can serve as a convenient entry point to
-  <a href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
+  <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
   (i.e., their address can be specified as <code>initial_peers</code>).
   </li>
   <li>
   It is useful to fix their address by providing <code>host_maddrs</code> and <code>identity_path</code>
   arguments to <code>hivemind.DHT</code>
-  (these are forwarded to the underlying <a href="https://libp2p.io/">libp2p</a> daemon).
+  (these are forwarded to the underlying <a target="_blank" rel="noopener noreferrer" href="https://libp2p.io/">libp2p</a> daemon).
   </li>
   </ul>
   </li>
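
To make steps 2 and 3 of the roadmap above concrete, here is a minimal, hedged sketch of a training peer that wraps a plain PyTorch optimizer with hivemind.Optimizer, following the hivemind quickstart linked in the list; the run_id, batch sizes, bootstrap multiaddr, and the train_dataloader/compute_loss helpers are placeholders, and argument names may differ between hivemind versions:

    import torch
    import hivemind

    model = torch.nn.Linear(512, 2)                        # stand-in for your real model
    base_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Join the swarm via a known peer (e.g., an auxiliary peer's fixed address).
    dht = hivemind.DHT(
        initial_peers=["/ip4/203.0.113.1/tcp/31337/p2p/QmExamplePeerID"],  # placeholder multiaddr
        start=True,
    )

    opt = hivemind.Optimizer(
        dht=dht,
        run_id="my_collaborative_run",   # peers sharing this run_id train together
        optimizer=base_opt,
        batch_size_per_step=32,          # samples this peer contributes per step
        target_batch_size=16384,         # global batch size accumulated across all peers
        use_local_updates=True,
        verbose=True,
    )

    for batch in train_dataloader:       # assumes a streaming DataLoader from step 1
        loss = compute_loss(model, batch)  # hypothetical helper
        loss.backward()
        opt.step()
        opt.zero_grad()

As the last roadmap item notes, an auxiliary peer can pin its address by passing host_maddrs (e.g., a fixed TCP port) and identity_path (a persisted key file) to hivemind.DHT; the resulting multiaddr is what other peers then list in initial_peers.
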
@@ -241,10 +242,10 @@ a:visited {
   People may run them online and/or download and run them on their own hardware.
   </li>
   <li>
-  <a href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
+  <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
   with all resources related to the training
   (dataset, model, inference demo, links to a dashboard with loss and other metrics, etc.).
-  Look at <a href="https://huggingface.co/training-transformers-together">ours</a> as an example.
+  Look at <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/training-transformers-together">ours</a> as an example.
   </li>
   <li>
   Set up an authentication system (see the "Security" section).
@@ -255,7 +256,7 @@ a:visited {
   ban accounts who behave maliciously.
   </li>
   <li>
-  Set up an inference demo for your model (e.g., using <a href="https://huggingface.co/spaces">Spaces</a>) or
+  Set up an inference demo for your model (e.g., using <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/spaces">Spaces</a>) or
   a script that periodically uploads the inference results to show the training progress.
   </li>
   </ul>
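
The last roadmap item mentions an inference demo on Spaces; as one possible (hypothetical) shape for such a demo, here is a small Gradio app that loads a checkpoint from the Hub and generates text (the model name and generation settings are placeholders, not this project's actual demo):

    import gradio as gr
    from transformers import pipeline

    # Placeholder checkpoint; point this at the checkpoint your auxiliary peers upload to the Hub.
    generator = pipeline("text-generation", model="gpt2")

    def generate(prompt: str) -> str:
        return generator(prompt, max_new_tokens=50)[0]["generated_text"]

    demo = gr.Interface(fn=generate, inputs="text", outputs="text", title="Training progress demo")
    demo.launch()
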