This article uses two examples to deepen the understanding of TFRecord data: the object detection algorithm SSD and the semantic segmentation algorithm deeplab. It is intended only as study notes for later reference.

Introduction to TFRecord

A TFRecords file represents a sequence of strings (it is a binary file). The format is not random-access, so it is suitable for streaming large amounts of data, but not suitable when fast sharding or other non-sequential access is required.

A TFRecords file contains a sequence of strings with CRC32C (32-bit CRC using the Castagnoli polynomial) hashes. Each record has the following format:

uint64 length
uint32 masked_crc32_of_length
byte data[length]
uint32 masked_crc32_of_data

These records are concatenated together to produce the file. CRCs are described here, and the masked form of a CRC is:

masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
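
A minimal sketch of writing one record in this format, assuming the third-party crc32c package (pip install crc32c) for the CRC32C checksum; this is only an illustration, not TensorFlow's own implementation:

import struct

import crc32c  # assumed third-party package: pip install crc32c


def masked_crc(data):
    # CRC32C of the bytes, rotated right by 15 bits, plus the mask constant (mod 2^32).
    crc = crc32c.crc32c(data) & 0xFFFFFFFF
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f, data):
    # uint64 length (little-endian), followed by its masked CRC32C.
    length_bytes = struct.pack('<Q', len(data))
    f.write(length_bytes)
    f.write(struct.pack('<I', masked_crc(length_bytes)))
    # The payload itself, followed by its masked CRC32C.
    f.write(data)
    f.write(struct.pack('<I', masked_crc(data)))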

Using TFRecord in SSD

SSD is an object detection method. The code used in this section mainly comes from SSD-Tensorflow. This section covers an introduction to the dataset, writing TFRecord data, and reading TFRecord data.

Dataset Introduction

This section uses the VOC 2007 dataset as the example. For how to download it, refer to the instructions in py-faster-rcnn. The file structure of VOC2007 is shown below:

|---VOC2007
    |---Annotations
        |---000001.xml ~ 009963.xml
    |---ImageSets
        |---Layout
            |---test.txt
            |---train.txt
            |---trainval.txt
            |---val.txt
        |---Main (each class has its own copy of the files below)
            |---test.txt...
            |---train.txt...
            |---trainval.txt...
            |---val.txt...
        |---Segmentation
            |---test.txt
            |---train.txt
            |---trainval.txt
            |---val.txt
    |---JPEGImages
        |---000001.jpg ~ 009963.jpg
    |---SegmentationClass
        |---*.png (632 segmentation images in total)
    |---SegmentationObject
        |---*.png (632 segmentation images in total)

Annotations holds the annotations for object detection, while SegmentationClass and SegmentationObject hold the annotations for semantic segmentation.

The content of 000001.xml in Annotations is shown below:

<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>

The dataset itself is not described further here; please refer to the official website.

Writing TFRecord Data

The structure built for a TFRecord is roughly as follows:

                                                               |--> tf.train.Feature
                                                               |--> tf.train.Feature
tf.train.Example(features=) -> tf.train.Features(feature={}) --|    ...
                                                               |--> tf.train.Feature
                                                               |--> tf.train.Feature

                    /  tf.train.BytesList
tf.train.Feature --<   tf.train.Int64List
                    \  tf.train.FloatList

In words, the structure is as follows. The outermost layer is a tf.train.Example(features=), whose features argument takes a tf.train.Features(feature={}). tf.train.Features in turn takes a dictionary for its feature argument, where each key is the name of the stored data sequence and each value is one of tf.train.BytesList, tf.train.Int64List, or tf.train.FloatList. These three types correspond to lists of string, integer, and float data respectively. All three share a common value parameter, which takes a list of data of the corresponding type.
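
A minimal, self-contained sketch of this nesting (the feature names and values here are made up for illustration):

import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    # Int64List holds integer data.
    'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[500])),
    # FloatList holds float data, e.g. normalized box coordinates.
    'bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[0.136])),
    # BytesList holds byte-string data.
    'label/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'dog'])),
}))
print(example)  # prints a human-readable dump of the Example proto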

Main Code

First, open the file, much as you would open a txt file for writing:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L210
with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:

The following code defines the feature helper functions for the three types:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/dataset_utils.py#L30
def int64_feature(value):
    """Wrapper for inserting int64 features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def float_feature(value):
    """Wrapper for inserting float features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def bytes_feature(value):
    """Wrapper for inserting bytes features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
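
Given these wrappers, both scalars and lists pass through uniformly, roughly as follows:

int64_feature(500)            # scalar is wrapped: Int64List(value=[500])
int64_feature([500, 353, 3])  # a list such as `shape` is stored as-is
float_feature([0.48, 0.136])  # normalized box coordinates
bytes_feature(b'JPEG')        # byte strings, e.g. the image format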

The following function reads the image and parses the corresponding xml file to extract the annotations:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L70
def _process_image(directory, name):
    """Process an image and its annotation file.

    Args:
      directory: string, root path of the VOC dataset.
      name: string, image id, e.g. '000001'.
    Returns:
      image_data: string, JPEG encoding of RGB image.
      shape: list of 3 integers, [height, width, channels].
      bboxes, labels, labels_text, difficult, truncated: per-object annotations.
    """
    # Read the image file in binary mode (JPEG data is not text).
    filename = directory + DIRECTORY_IMAGES + name + '.jpg'
    image_data = tf.gfile.FastGFile(filename, 'rb').read()

    # Read the XML annotation file.
    filename = os.path.join(directory, DIRECTORY_ANNOTATIONS, name + '.xml')
    tree = ET.parse(filename)
    root = tree.getroot()

    # Image shape.
    size = root.find('size')
    shape = [int(size.find('height').text),
             int(size.find('width').text),
             int(size.find('depth').text)]
    # Find annotations.
    bboxes = []
    labels = []
    labels_text = []
    difficult = []
    truncated = []
    for obj in root.findall('object'):
        label = obj.find('name').text
        labels.append(int(VOC_LABELS[label][0]))
        labels_text.append(label.encode('ascii'))

        # Compare against None explicitly: an ElementTree Element with no
        # children is falsy, so `if obj.find('difficult'):` would always
        # take the else branch.
        if obj.find('difficult') is not None:
            difficult.append(int(obj.find('difficult').text))
        else:
            difficult.append(0)
        if obj.find('truncated') is not None:
            truncated.append(int(obj.find('truncated').text))
        else:
            truncated.append(0)

        # Boxes are normalized to [0, 1] and stored in (ymin, xmin, ymax, xmax) order.
        bbox = obj.find('bndbox')
        bboxes.append((float(bbox.find('ymin').text) / shape[0],
                       float(bbox.find('xmin').text) / shape[1],
                       float(bbox.find('ymax').text) / shape[0],
                       float(bbox.find('xmax').text) / shape[1]
                       ))
    return image_data, shape, bboxes, labels, labels_text, difficult, truncated
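
As a concrete example, for the dog box in 000001.xml above, the image is 353 wide and 500 high, so the stored box is (240/500, 48/353, 371/500, 195/353) ≈ (0.480, 0.136, 0.742, 0.552) in (ymin, xmin, ymax, xmax) order.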

The following function builds the tf.train.Example:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L124
def _convert_to_example(image_data, labels, labels_text, bboxes, shape,
                        difficult, truncated):
    """Build an Example proto for an image example.

    Args:
      image_data: string, JPEG encoding of RGB image;
      labels: list of integers, identifier for the ground truth;
      labels_text: list of strings, human-readable labels;
      bboxes: list of bounding boxes; each box is a list of floats
        specifying [ymin, xmin, ymax, xmax], normalized to [0, 1];
      shape: 3 integers, image shape in pixels.
    Returns:
      Example proto
    """
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    for b in bboxes:
        assert len(b) == 4
        # pylint: disable=expression-not-assigned
        [l.append(point) for l, point in zip([ymin, xmin, ymax, xmax], b)]
        # pylint: enable=expression-not-assigned

    image_format = b'JPEG'
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': int64_feature(shape[0]),
        'image/width': int64_feature(shape[1]),
        'image/channels': int64_feature(shape[2]),
        'image/shape': int64_feature(shape),
        'image/object/bbox/xmin': float_feature(xmin),
        'image/object/bbox/xmax': float_feature(xmax),
        'image/object/bbox/ymin': float_feature(ymin),
        'image/object/bbox/ymax': float_feature(ymax),
        'image/object/bbox/label': int64_feature(labels),
        'image/object/bbox/label_text': bytes_feature(labels_text),
        'image/object/bbox/difficult': int64_feature(difficult),
        'image/object/bbox/truncated': int64_feature(truncated),
        'image/format': bytes_feature(image_format),
        'image/encoded': bytes_feature(image_data)}))
    return example

To make this easier to follow, I inline the corresponding feature functions and keep only the first two features, so that the code matches the structure introduced above:

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[shape[0]])),
    'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[shape[1]]))
}))

Finally, before writing to the TFRecord file, call the SerializeToString method on the Example:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L180
tfrecord_writer.write(example.SerializeToString())
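
To sanity-check what was written, here is a quick sketch (TF 1.x API) that iterates over the raw records and parses the first one back into an Example proto; tf_filename is the file written above:

import tensorflow as tf

for record in tf.python_io.tf_record_iterator(tf_filename):
    example = tf.train.Example()
    example.ParseFromString(record)
    # Pull a couple of features back out of the proto.
    height = example.features.feature['image/height'].int64_list.value[0]
    width = example.features.feature['image/width'].int64_list.value[0]
    print('height = %d, width = %d' % (height, width))
    break  # only inspect the first record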

Reading TFRecord Data

The SSD code mainly uses the slim module to help read TFRecords. The main code is as follows:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_common.py#L49
reader = tf.TFRecordReader
# Features in Pascal VOC TFRecords.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature([1], tf.int64),
    'image/width': tf.FixedLenFeature([1], tf.int64),
    'image/channels': tf.FixedLenFeature([1], tf.int64),
    'image/shape': tf.FixedLenFeature([3], tf.int64),
    'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/difficult': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/truncated': tf.VarLenFeature(dtype=tf.int64),
}
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'shape': slim.tfexample_decoder.Tensor('image/shape'),
    'object/bbox': slim.tfexample_decoder.BoundingBox(
        ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
    'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label'),
    'object/difficult': slim.tfexample_decoder.Tensor('image/object/bbox/difficult'),
    'object/truncated': slim.tfexample_decoder.Tensor('image/object/bbox/truncated'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=reader,
    decoder=decoder,
    num_samples=split_to_sizes[split_name],
    items_to_descriptions=items_to_descriptions,
    num_classes=num_classes,
    labels_to_names=labels_to_names)

provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size,
    shuffle=True)
# Get for SSD network: image, labels, bboxes.
[image, shape, glabels, gbboxes] = provider.get(['image', 'shape', 'object/label', 'object/bbox'])
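
The tensors returned by provider.get are still graph nodes fed by queues, so pulling an actual sample requires a session with queue runners. A minimal sketch under the names above (TF 1.x):

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    img, shp, lbls, boxes = sess.run([image, shape, glabels, gbboxes])
    print(shp, lbls, boxes.shape)  # e.g. [500 353 3], per-object labels, (N, 4)
    coord.request_stop()
    coord.join(threads)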

The definition of TFExampleDecoder is as follows:

# https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py#L455
class TFExampleDecoder(data_decoder.DataDecoder):
    """A decoder for TensorFlow Examples.

    Decoding Example proto buffers is comprised of two stages: (1) Example parsing
    and (2) tensor manipulation.

    In the first stage, the tf.parse_example function is called with a list of
    FixedLenFeatures and SparseLenFeatures. These instances tell TF how to parse
    the example. The output of this stage is a set of tensors.

    In the second stage, the resulting tensors are manipulated to provide the
    requested 'item' tensors.

    To perform this decoding operation, an ExampleDecoder is given a list of
    ItemHandlers. Each ItemHandler indicates the set of features for stage 1 and
    contains the instructions for post_processing its tensors for stage 2.
    """

    def __init__(self, keys_to_features, items_to_handlers):
        """Constructs the decoder.

        Args:
          keys_to_features: a dictionary from TF-Example keys to either
            tf.VarLenFeature or tf.FixedLenFeature instances. See tensorflow's
            parsing_ops.py.
          items_to_handlers: a dictionary from items (strings) to ItemHandler
            instances. Note that the ItemHandler's are provided the keys that they
            use to return the final item Tensors.
        """
        self._keys_to_features = keys_to_features
        self._items_to_handlers = items_to_handlers

    def list_items(self):
        """See base class."""
        return list(self._items_to_handlers.keys())

    def decode(self, serialized_example, items=None):
        """Decodes the given serialized TF-example.

        Args:
          serialized_example: a serialized TF-example tensor.
          items: the list of items to decode. These must be a subset of the item
            keys in self._items_to_handlers. If `items` is left as None, then all
            of the items in self._items_to_handlers are decoded.
        Returns:
          the decoded items, a list of tensor.
        """
        example = parsing_ops.parse_single_example(serialized_example,
                                                   self._keys_to_features)

        # Reshape non-sparse elements just once, adding the reshape ops in
        # deterministic order.
        for k in sorted(self._keys_to_features):
            v = self._keys_to_features[k]
            if isinstance(v, parsing_ops.FixedLenFeature):
                example[k] = array_ops.reshape(example[k], v.shape)

        if not items:
            items = self._items_to_handlers.keys()

        outputs = []
        for item in items:
            handler = self._items_to_handlers[item]
            keys_to_tensors = {key: example[key] for key in handler.keys}
            outputs.append(handler.tensors_to_item(keys_to_tensors))
        return outputs

In the decode method, the line example = parsing_ops.parse_single_example(serialized_example, self._keys_to_features) is what actually parses the TFRecord data. decode is called mainly from slim.dataset_data_provider.DatasetDataProvider; see here for details.
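
For comparison, here is a sketch of decoding a subset of the same keys without slim, calling tf.parse_single_example directly (TF 1.x):

import tensorflow as tf

def parse_fn(serialized_example):
    features = tf.parse_single_example(serialized_example, {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/shape': tf.FixedLenFeature([3], tf.int64),
        'image/object/bbox/label': tf.VarLenFeature(tf.int64),
    })
    # VarLenFeature yields a SparseTensor; densify it for convenience.
    labels = tf.sparse_tensor_to_dense(features['image/object/bbox/label'])
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    return image, features['image/shape'], labels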

Using TFRecord in deeplab

Writing TFRecord

The main code is as follows:

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/build_data.py#L105
def _int64_list_feature(values):
    """Returns a TF-Feature of int64_list.

    Args:
      values: A scalar or list of values.
    Returns:
      A TF-Feature.
    """
    if not isinstance(values, collections.Iterable):
        values = [values]

    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_list_feature(values):
    """Returns a TF-Feature of bytes.

    Args:
      values: A string.
    Returns:
      A TF-Feature.
    """
    def norm2bytes(value):
        return value.encode() if isinstance(value, str) and six.PY3 else value

    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))


def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
    """Converts one image/segmentation pair to tf example.

    Args:
      image_data: string of image data.
      filename: image filename.
      height: image height.
      width: image width.
      seg_data: string of semantic segmentation data.
    Returns:
      tf example of one image/segmentation pair.
    """
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_list_feature(image_data),
        'image/filename': _bytes_list_feature(filename),
        'image/format': _bytes_list_feature(
            _IMAGE_FORMAT_MAP[FLAGS.image_format]),
        'image/height': _int64_list_feature(height),
        'image/width': _int64_list_feature(width),
        'image/channels': _int64_list_feature(3),
        'image/segmentation/class/encoded': (
            _bytes_list_feature(seg_data)),
        'image/segmentation/class/format': _bytes_list_feature(
            FLAGS.label_format),
    }))

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/build_voc2012_data.py#L84
def _convert_dataset(dataset_split):
    """Converts the specified dataset split to TFRecord format.

    Args:
      dataset_split: The dataset split (e.g., train, test).
    Raises:
      RuntimeError: If loaded image and label have different shape.
    """
    dataset = os.path.basename(dataset_split)[:-4]
    sys.stdout.write('Processing ' + dataset)
    filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
    num_images = len(filenames)
    num_per_shard = int(math.ceil(num_images / float(_NUM_SHARDS)))

    image_reader = build_data.ImageReader('jpeg', channels=3)
    label_reader = build_data.ImageReader('png', channels=1)

    for shard_id in range(_NUM_SHARDS):
        output_filename = os.path.join(
            FLAGS.output_dir,
            '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
        with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
            start_idx = shard_id * num_per_shard
            end_idx = min((shard_id + 1) * num_per_shard, num_images)
            for i in range(start_idx, end_idx):
                sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
                    i + 1, len(filenames), shard_id))
                sys.stdout.flush()
                # Read the image.
                image_filename = os.path.join(
                    FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
                image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
                height, width = image_reader.read_image_dims(image_data)
                # Read the semantic segmentation annotation.
                seg_filename = os.path.join(
                    FLAGS.semantic_segmentation_folder,
                    filenames[i] + '.' + FLAGS.label_format)
                seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
                seg_height, seg_width = label_reader.read_image_dims(seg_data)
                if height != seg_height or width != seg_width:
                    raise RuntimeError('Shape mismatched between image and label.')
                # Convert to tf example. The key step is the following two lines.
                example = build_data.image_seg_to_tfexample(
                    image_data, filenames[i], height, width, seg_data)
                tfrecord_writer.write(example.SerializeToString())
        sys.stdout.write('\n')
        sys.stdout.flush()
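
For a sense of the sharding: at the time of writing, _NUM_SHARDS is 4 in this script, so with the 1,464 images of the VOC 2012 train split, num_per_shard = ceil(1464 / 4) = 366, and the output files are named like train-00000-of-00004.tfrecord.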

Reading TFRecord

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/segmentation_dataset.py#L156
# Specify how the TF-Examples are decoded.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/filename': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature(
        (), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/width': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/segmentation/class/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/segmentation/class/format': tf.FixedLenFeature(
        (), tf.string, default_value='png'),
}
items_to_handlers = {
    'image': tfexample_decoder.Image(
        image_key='image/encoded',
        format_key='image/format',
        channels=3),
    'image_name': tfexample_decoder.Tensor('image/filename'),
    'height': tfexample_decoder.Tensor('image/height'),
    'width': tfexample_decoder.Tensor('image/width'),
    'labels_class': tfexample_decoder.Image(
        image_key='image/segmentation/class/encoded',
        format_key='image/segmentation/class/format',
        channels=1),
}

decoder = tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

dataset = dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=splits_to_sizes[split_name],
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    ignore_label=ignore_label,
    num_classes=num_classes,
    name=dataset_name,
    multi_label=True)

The dataset is then consumed via input_generator:

# https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L253
samples = input_generator.get(
    dataset,
    FLAGS.train_crop_size,
    clone_batch_size,
    min_resize_value=FLAGS.min_resize_value,
    max_resize_value=FLAGS.max_resize_value,
    resize_factor=FLAGS.resize_factor,
    min_scale_factor=FLAGS.min_scale_factor,
    max_scale_factor=FLAGS.max_scale_factor,
    scale_factor_step_size=FLAGS.scale_factor_step_size,
    dataset_split=FLAGS.train_split,
    is_training=True,
    model_variant=FLAGS.model_variant)
inputs_queue = prefetch_queue.prefetch_queue(
    samples, capacity=128 * config.num_clones)

Inside input_generator's get method, the dataset is read through dataset_data_provider.DatasetDataProvider, just as described for VOC 2007 above.

# https://github.com/tensorflow/models/blob/master/research/deeplab/utils/input_generator.py
data_provider = dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=num_readers,
    num_epochs=None if is_training else 1,
    shuffle=is_training)
image, label, image_name, height, width = _get_data(data_provider, dataset_split)