Template: media
Create Project
First, create a new project using the following command:
datafact new datafact-tutorial/media -t media
Now go to the project folder datafact-tutorial/media:
cd datafact-tutorial/media
ls
README.md bin_files.py data.py datafact.json project.py type.py
What to modify
You should modify the following files:
data.py
type.py
bin_files.py
README.md
In data.py, implement a function called create_data_dict
that produces the content of your dataset.
from dataset_sh.constants import DEFAULT_COLLECTION_NAME

colors = [
    ('red', (255, 0, 0)),
    ('green', (0, 255, 0)),
    ('blue', (0, 0, 255)),
    ('yellow', (255, 255, 0)),
    ('orange', (255, 165, 0)),
    ('purple', (128, 0, 128)),
    ('cyan', (0, 255, 255)),
    ('magenta', (255, 0, 255)),
    ('white', (255, 255, 255)),
    ('black', (0, 0, 0)),
    ('gray', (128, 128, 128)),
    ('brown', (165, 42, 42)),
    ('pink', (255, 192, 203)),
    ('violet', (238, 130, 238)),
    ('gold', (255, 215, 0)),
    ('silver', (192, 192, 192)),
    ('maroon', (128, 0, 0)),
    ('olive', (128, 128, 0)),
    ('teal', (0, 128, 128)),
    ('navy', (0, 0, 128)),
    ('lime', (0, 255, 0)),
]


def create_main_collection():
    return [
        {
            'color': color,
            'rgb': rgb,
            'image': f'{color}.png',
        } for color, rgb in colors
    ]


def create_data_dict():
    return {
        DEFAULT_COLLECTION_NAME: create_main_collection()
    }
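Each record's image value is a file name, and in this template it lines up with the names that bin_files.py yields. The sketch below is one way to check that the two stay in sync. The helper find_unmatched_images and the sample data are hypothetical and not part of datafact; the name-matching relationship is inferred from the template code rather than from documented behavior.

```python
def find_unmatched_images(records, binary_names, image_key='image'):
    """Compare image names referenced by records with the available file names.

    Returns (missing, unused): names referenced but not provided,
    and names provided but never referenced.
    """
    referenced = {record[image_key] for record in records}
    available = set(binary_names)
    return sorted(referenced - available), sorted(available - referenced)


# Stand-in data shaped like the template's records and binary file names.
records = [
    {'color': 'red', 'rgb': (255, 0, 0), 'image': 'red.png'},
    {'color': 'teal', 'rgb': (0, 128, 128), 'image': 'teal.png'},
]
binary_names = ['red.png', 'navy.png']

missing, unused = find_unmatched_images(records, binary_names)
print(missing)  # ['teal.png']
print(unused)   # ['navy.png']
```

Inside the project, the same helper could presumably be called with create_main_collection() and the names collected from iter_binary_files().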
In bin_files.py, implement a generator function iter_binary_files
that yields a (name, content) pair for each binary file in your dataset.
import struct
import zlib

from data import colors

has_binary_files = True


def create_fake_png(width=128, height=128, color=(255, 0, 0)):
    """
    Create a fake PNG image with the specified width, height, and color.

    Args:
        width (int): Width of the image in pixels.
        height (int): Height of the image in pixels.
        color (tuple): RGB color as a tuple (R, G, B), values between 0-255.

    Returns:
        bytes: The binary data of the PNG file.
    """
    # PNG file signature
    png_signature = b'\x89PNG\r\n\x1a\n'

    # IHDR chunk
    bit_depth = 8  # Bits per channel
    color_type = 2  # Color type: truecolor (RGB)
    compression = 0  # Compression method 0 (deflate), the only one PNG defines
    filter_method = 0  # Filter method 0 (adaptive), the only one PNG defines
    interlace = 0  # No interlace
    ihdr_data = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, compression, filter_method, interlace)
    ihdr_chunk = b'IHDR' + ihdr_data
    ihdr_crc = struct.pack(">I", zlib.crc32(ihdr_chunk) & 0xFFFFFFFF)
    ihdr = struct.pack(">I", len(ihdr_data)) + ihdr_chunk + ihdr_crc

    # IDAT chunk (pixel data)
    # Each row starts with filter type 0 (None), followed by `width` pixels of the given color
    pixel_data = b''.join(
        b'\x00' + bytes(color) * width for _ in range(height)
    )
    compressed_data = zlib.compress(pixel_data)
    idat_chunk = b'IDAT' + compressed_data
    idat_crc = struct.pack(">I", zlib.crc32(idat_chunk) & 0xFFFFFFFF)
    idat = struct.pack(">I", len(compressed_data)) + idat_chunk + idat_crc

    # IEND chunk (empty payload)
    iend_chunk = b'IEND'
    iend_crc = struct.pack(">I", zlib.crc32(iend_chunk) & 0xFFFFFFFF)
    iend = struct.pack(">I", 0) + iend_chunk + iend_crc

    # Combine all chunks into the final PNG file
    return png_signature + ihdr + idat + iend


def iter_binary_files():
    # Yield one (file name, file content) pair per color
    for color_name, color_value in colors:
        name = f'{color_name}.png'
        content = create_fake_png(color=color_value)
        yield name, content
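Since the PNG bytes above are assembled by hand, it can be useful to confirm that the result is structurally valid. The following is a minimal sketch that walks the chunks, verifies each CRC, and checks that the decompressed pixel data is a single solid color. The helpers iter_png_chunks and check_solid_rgb_png are hypothetical names, and the check assumes an 8-bit RGB, non-interlaced image whose rows all use filter type 0, which is what create_fake_png writes.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def iter_png_chunks(data):
    """Yield (chunk_type, payload) pairs, verifying the signature and each CRC."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError('missing PNG signature')
    offset = 8
    while offset < len(data):
        # Chunk layout: 4-byte length, 4-byte type, payload, 4-byte CRC
        (length,) = struct.unpack('>I', data[offset:offset + 4])
        chunk_type = data[offset + 4:offset + 8]
        payload = data[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack('>I', data[offset + 8 + length:offset + 12 + length])
        # The CRC covers the chunk type and payload, not the length field
        if zlib.crc32(chunk_type + payload) & 0xFFFFFFFF != crc:
            raise ValueError(f'bad CRC in {chunk_type!r} chunk')
        yield chunk_type, payload
        offset += 12 + length


def check_solid_rgb_png(data, color):
    """Return (width, height) if every pixel of the PNG equals `color`."""
    chunks = list(iter_png_chunks(data))
    if not chunks or chunks[0][0] != b'IHDR' or chunks[-1][0] != b'IEND':
        raise ValueError('PNG must start with IHDR and end with IEND')
    width, height, bit_depth, color_type, _, _, interlace = struct.unpack(
        '>IIBBBBB', chunks[0][1]
    )
    if (bit_depth, color_type, interlace) != (8, 2, 0):
        raise ValueError('only 8-bit RGB, non-interlaced images are handled')
    # Concatenate all IDAT payloads before decompressing
    raw = zlib.decompress(b''.join(p for t, p in chunks if t == b'IDAT'))
    # Assumes every row uses filter type 0 (None)
    expected_row = b'\x00' + bytes(color) * width
    if raw != expected_row * height:
        raise ValueError('pixel data does not match the expected color')
    return width, height
```

With the template's code available, check_solid_rgb_png(create_fake_png(color=(255, 0, 0)), (255, 0, 0)) would be expected to return (128, 128).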
In type.py, provide your type annotations in the data_types
dictionary.
from easytype import TypeBuilder
from dataset_sh.constants import DEFAULT_COLLECTION_NAME

# Field names mirror the keys of the records produced in data.py
ColorImage = TypeBuilder.create(
    'ColorImage',
    color=str,
    rgb=tuple[int, int, int],
    image=str,
)

data_types = {
    DEFAULT_COLLECTION_NAME: ColorImage
}
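The type annotation is only useful if its field names match the keys that data.py actually produces. As a rough illustration, the sketch below compares a record against a plain dictionary of expected fields. It deliberately uses ordinary Python rather than easytype's API; find_field_mismatches and EXPECTED_FIELDS are hypothetical names introduced for this example.

```python
def find_field_mismatches(record, expected_fields):
    """Return a list of human-readable problems; an empty list means the record fits."""
    problems = []
    for name in sorted(set(expected_fields) - set(record)):
        problems.append(f'missing field: {name}')
    for name in sorted(set(record) - set(expected_fields)):
        problems.append(f'unexpected field: {name}')
    for name, expected_type in expected_fields.items():
        if name in record and not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f'{name}: expected {expected_type.__name__}, got {actual}')
    return problems


# Expected shape of one record, written with plain Python types (an assumption
# for illustration; the real annotation lives in type.py).
EXPECTED_FIELDS = {'color': str, 'rgb': tuple, 'image': str}

good = {'color': 'red', 'rgb': (255, 0, 0), 'image': 'red.png'}
renamed = {'color': 'red', 'rgb': (255, 0, 0), 'pic': 'red.png'}

print(find_field_mismatches(good, EXPECTED_FIELDS))     # []
print(find_field_mismatches(renamed, EXPECTED_FIELDS))  # ['missing field: image', 'unexpected field: pic']
```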
In README.md, provide the readme document of your dataset.
# datafact template: media
This dataset is built from the datafact template `media`.
It contains a collection of pure-color images.
# Create and publish dataset
This folder contains a project created by datafact (DATAset FACTory).
You can use this project to build and publish your dataset to [dataset.sh](https://dataset.sh).
## How to modify this project
You may want to modify the following files:
* data.py
Implement a function called `create_data_dict` that produces the content of your dataset.
* type.py
Provide your type annotation in the `data_types` dictionary in this file.
* bin_files.py
Implement a generator function `iter_binary_files` that yields the binary files of your dataset.
* README.md
Provide the readme document of your dataset in this file.
## Build Dataset
To build your dataset, use the command:
```shell
python project.py build
```
## Publish Dataset
To publish your dataset, use the command:
```shell
python project.py publish
```
You can also use the experimental auto-doc feature to generate documentation from the type annotations stored in type.py:
datafact experimental auto-doc
Build and publish
To build your dataset:
python project.py build
To preview your build before publishing:
python project.py preview show
To publish your build:
python project.py publish