What is Dataset Factory (datafact)
datafact is a dataset bundling framework that helps you build, package, and distribute datasets.
Installation
To install datafact:
pip install -U datafact
Guide
Choose a template
To see the list of available templates:
datafact templates
datafact has the following templates:
- hello-world: suitable for basic dataset projects.
- synthetic: suitable for creating datasets with synthetic data (via mkb).
- media: suitable for dataset projects with media files (image, audio, video data).
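As a rough way to read the list above, the choice can be written as a small decision function. This is a hypothetical helper for illustration only: `choose_template` and its flags are not part of datafact, and checking for media before synthetic data is an assumption, not a rule from datafact itself.

```python
# Hypothetical helper (not part of datafact): map a project's characteristics
# to one of the three template names listed above.
def choose_template(has_media: bool = False, synthetic: bool = False) -> str:
    """Return the name of a datafact template suited to the project."""
    if has_media:
        return "media"        # image, audio, or video files
    if synthetic:
        return "synthetic"    # data generated via mkb
    return "hello-world"      # basic dataset projects


print(choose_template())                 # hello-world
print(choose_template(synthetic=True))   # synthetic
print(choose_template(has_media=True))   # media
```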
For more complicated examples, see More examples below.
Example: Build and publish the hello/world dataset
Create a new project
datafact new hello/world
Enter the project folder
cd hello/world
Build the dataset
python project.py build
Preview the build (Optional)
python project.py preview show
Publish the dataset
# publish to local dataset.sh repo
python project.py publish
View it in the dataset.sh web UI
dataset.sh gui hello/world
Upload it to remote (Optional)
dataset.sh remote -p default upload -s hello/world -t latest hello/world
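If you repeat these steps often, they can be scripted. The sketch below is a minimal example that only chains the exact commands shown in this guide through `subprocess`. It skips the optional preview step and the interactive `dataset.sh gui` step, and it assumes that both `hello/world` arguments of the upload command are the dataset name. The helper names `workflow_commands` and `run_workflow` are hypothetical, not part of datafact or dataset.sh.

```python
import subprocess
import sys
from pathlib import Path


def workflow_commands(name: str = "hello/world", upload: bool = False):
    """Return (working_dir, argv) pairs mirroring the guide's steps.

    Hypothetical helper: the commands are copied from the guide; treating
    both upload arguments as the dataset name is an assumption.
    """
    project_dir = Path(name)  # `datafact new hello/world` creates hello/world/
    # sys.executable stands in for `python`, so the current interpreter is used.
    steps = [
        (Path("."), ["datafact", "new", name]),
        (project_dir, [sys.executable, "project.py", "build"]),
        (project_dir, [sys.executable, "project.py", "publish"]),  # local dataset.sh repo
    ]
    if upload:
        # Optional upload, using the profile and tag from the guide.
        steps.append((project_dir, ["dataset.sh", "remote", "-p", "default", "upload",
                                    "-s", name, "-t", "latest", name]))
    return steps


def run_workflow(name: str = "hello/world", upload: bool = False) -> None:
    """Run each step in order and stop at the first failing command."""
    for cwd, argv in workflow_commands(name, upload):
        print(f"[{cwd}] $ {' '.join(argv)}")
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_workflow(upload=True)` from the parent folder would create, build, publish, and upload the project. Keeping the command list separate from the runner makes it easy to inspect the commands before executing anything.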
More examples
- Tutorial: media datasets. In this guide, you will learn how to bundle a dataset with media files.
- Tutorial: synthetic data datasets. In this guide, you will learn how to create and bundle a dataset with synthetic data.