
Getting Started

dataset.sh is a dataset manager designed to simplify the process of installing, managing, and publishing datasets. We hope to make working with datasets as straightforward as using package managers like npm or pip for programming libraries.

Install

To get started, you can install dataset.sh via pip:

pip install dataset.sh -U
dataset.sh --help
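To check the installation from Python instead of a shell (for example in a setup script), the sketch below uses only the standard library. It assumes the pip distribution is named dataset.sh, as in the install command above, and that the dataset.sh command ends up on your PATH. The helper name check_install is hypothetical and is not part of dataset.sh.

```python
import shutil
import subprocess
from importlib import metadata


def check_install(dist_name: str = "dataset.sh", cli_name: str = "dataset.sh") -> dict:
    """Report whether the dataset.sh package and its CLI appear to be installed.

    Hypothetical helper: the distribution and command names are taken from the
    install instructions and may need adjusting for your environment.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this interpreter

    cli_path = shutil.which(cli_name)  # None if the command is not on PATH
    help_ok = False
    if cli_path is not None:
        # Same as typing `dataset.sh --help` in a terminal.
        result = subprocess.run([cli_path, "--help"], capture_output=True, text=True)
        help_ok = result.returncode == 0

    return {"version": version, "cli_path": cli_path, "help_ok": help_ok}


if __name__ == "__main__":
    print(check_install())
```

Running the file prints a dictionary with the installed version (or None), the resolved command path (or None), and whether dataset.sh --help exited successfully.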

Not interested in publishing, and just want to see what you can do with datasets? Jump to How to import datasets.

How to create and publish datasets

The tutorials below show how to build and publish datasets.

Examples

✅ hello/world
a dataset.sh tutorial

Tutorial: hello-world

In this guide, you will learn how to create a simple dataset.

✅ media datasets
a dataset.sh tutorial

Tutorial: media datasets

In this guide, you will learn how to bundle a dataset with media files.

✅ synthetic data datasets
a dataset.sh tutorial

Tutorial: synthetic data datasets

In this guide, you will learn how to create and bundle a dataset of synthetic data.


How to import datasets

You can load a dataset's content by following the instructions in our dataset browser web UI.

To open the GUI from the command line:
dataset.sh gui hello/world
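To open the browser from a Python script or notebook instead of a terminal, you could shell out to the same command. The sketch below assumes the dataset.sh CLI is on your PATH. The names gui_command and open_gui are hypothetical helpers, and the namespace/name check only mirrors the shape of the example name hello/world; it is not a documented naming rule.

```python
import re
import shutil
import subprocess

# Assumed shape of a dataset name, inferred only from the `hello/world` example.
_NAME_RE = re.compile(r"^[^/\s]+/[^/\s]+$")


def gui_command(dataset_name: str) -> list[str]:
    """Build the argument list for `dataset.sh gui <dataset_name>`."""
    if not _NAME_RE.match(dataset_name):
        raise ValueError(f"expected a name like 'hello/world', got {dataset_name!r}")
    return ["dataset.sh", "gui", dataset_name]


def open_gui(dataset_name: str) -> subprocess.Popen:
    """Start the GUI as a background process so the calling script can continue."""
    if shutil.which("dataset.sh") is None:
        raise FileNotFoundError(
            "dataset.sh CLI not found on PATH; install it with `pip install dataset.sh -U`"
        )
    return subprocess.Popen(gui_command(dataset_name))
```

For example, open_gui("hello/world") starts the same process as typing dataset.sh gui hello/world in a terminal; call .wait() on the returned process if you want the script to block until it exits.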