Guide For Dataset Creators
What do you need to know?
- General best practices
- (Optional) How to use easytype to provide type annotations
- How to create a dataset bundle using one of the following:
  - dataset.sh API: Tutorial
  - datafact: Tutorial
- Creating synthetic datasets? You may also want to learn mkb
- How to verify dataset quality (how well does the data capture your intent?)
- How to set up a dataset.sh account and upload to our public dataset repository server
What datasets can I build?
- create synthetic datasets using a generative model
- convert old datasets
- crowdsource annotations or annotate the data yourself
- convert from a knowledge graph (e.g. Wikidata)
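As an illustration of the "convert old datasets" option, here is a minimal sketch that turns a legacy CSV file into typed records and serializes them as JSON Lines. It uses only the Python standard library and does not call the dataset.sh or datafact APIs; the `Review` type, its fields, and the sample data are hypothetical and exist only for this example.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict


@dataclass
class Review:
    # Hypothetical record type, used only for illustration.
    text: str
    rating: int


def csv_to_records(csv_text: str) -> list[Review]:
    """Parse CSV text with `text` and `rating` columns into typed records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Review(text=row["text"], rating=int(row["rating"])) for row in reader]


def records_to_jsonl(records: list[Review]) -> str:
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


# Made-up sample standing in for an old dataset file.
legacy_csv = "text,rating\nGreat product,5\nNot for me,2\n"
records = csv_to_records(legacy_csv)
jsonl = records_to_jsonl(records)
print(jsonl)
```

Converting to an explicit record type first makes malformed rows fail early (for example, a non-numeric rating), before the data is packaged with whichever bundling tool you choose.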
Want to build applications with data?
You can build:
- Open-source ML models using our datasets
- Web or mobile apps that display, navigate, or visualize data so people can learn from it
- Videos that visualize data
- Informative data visualization graphics
Want to contribute to the dataset.sh core software?
You will want to:
- understand the internal data structure
- understand the communication protocol
Looking for Ideas?
If you want to contribute but don't know where to start, consider the following:
- create synthetic datasets using a generative model
- convert old datasets
- crowdsource annotations or annotate the data yourself
- convert from a knowledge graph (e.g. Wikidata)
For applications:
- consider building and open-sourcing an ML model using our datasets
- create data visualizations
- use the data in your app