This article uses two examples to deepen the understanding of TFRecord data: the object detection algorithm SSD and the semantic segmentation algorithm deeplab. It is intended only as study notes for later reference.

Introduction to TFRecord

A TFRecords file represents a sequence of strings (it is a binary file). The format is not random-access, so it is suitable for streaming large amounts of data, but not suitable when fast sharding or other non-sequential access is required.

A TFRecords file contains a sequence of strings with CRC32C (32-bit CRC using the Castagnoli polynomial) hashes. Each record has the following format:

uint64 length
uint32 masked_crc32_of_length
byte data[length]
uint32 masked_crc32_of_data

These records are concatenated together to produce the file. CRCs are described here, and the masked form of a CRC is:

masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
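
A minimal sketch of writing one record in this format, assuming the third-party crc32c package (pip install crc32c) for the CRC32C checksum; this is only an illustration, not TensorFlow's own implementation:

import struct

import crc32c  # assumed third-party package: pip install crc32c


def masked_crc(data):
    # CRC32C of the bytes, rotated right by 15 bits, plus the mask constant (mod 2^32).
    crc = crc32c.crc32c(data) & 0xFFFFFFFF
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f, data):
    # uint64 length (little-endian), followed by its masked CRC32C.
    length_bytes = struct.pack('<Q', len(data))
    f.write(length_bytes)
    f.write(struct.pack('<I', masked_crc(length_bytes)))
    # The payload itself, followed by its masked CRC32C.
    f.write(data)
    f.write(struct.pack('<I', masked_crc(data)))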

Using TFRecord in SSD

SSD is an object detection method. The code used in this section mainly comes from SSD-Tensorflow. This section covers an introduction to the dataset, writing TFRecord data, and reading TFRecord data.

Dataset Introduction

This section uses the VOC 2007 dataset as the example. For how to download it, refer to the instructions in py-faster-rcnn. The file structure of VOC2007 is shown below:

|---VOC2007
    |---Annotations
        |---000001.xml ~ 009963.xml
    |---ImageSets
        |---Layout
            |---test.txt
            |---train.txt
            |---trainval.txt
            |---val.txt
        |---Main (each class has its own copy of the files below)
            |---test.txt...
            |---train.txt...
            |---trainval.txt...
            |---val.txt...
        |---Segmentation
            |---test.txt
            |---train.txt
            |---trainval.txt
            |---val.txt
    |---JPEGImages
        |---000001.jpg ~ 009963.jpg
    |---SegmentationClass
        |---*.png (632 segmentation images in total)
    |---SegmentationObject
        |---*.png (632 segmentation images in total)

Annotations holds the annotations for object detection, while SegmentationClass and SegmentationObject hold the annotations for semantic segmentation.

The content of 000001.xml in Annotations is shown below:

<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>

The dataset itself is not described further here; please refer to the official website.

Writing TFRecord Data

The structure built for a TFRecord is roughly as follows:

                                                               |--> tf.train.Feature
                                                               |--> tf.train.Feature
tf.train.Example(features=) -> tf.train.Features(feature={}) --|    ...
                                                               |--> tf.train.Feature
                                                               |--> tf.train.Feature

                    /  tf.train.BytesList
tf.train.Feature --<   tf.train.Int64List
                    \  tf.train.FloatList

In words, the structure is as follows. The outermost layer is a tf.train.Example(features=), whose features argument takes a tf.train.Features(feature={}). tf.train.Features in turn takes a dictionary for its feature argument, where each key is the name of the stored data sequence and each value is one of tf.train.BytesList, tf.train.Int64List, or tf.train.FloatList. These three types correspond to lists of string, integer, and float data respectively. All three share a common value parameter, which takes a list of data of the corresponding type.
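
A minimal, self-contained sketch of this nesting (the feature names and values here are made up for illustration):

import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    # Int64List holds integer data.
    'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[500])),
    # FloatList holds float data, e.g. normalized box coordinates.
    'bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[0.136])),
    # BytesList holds byte-string data.
    'label/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'dog'])),
}))
print(example)  # prints a human-readable dump of the Example proto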

Main Code

First, open the file, much as you would open a txt file for writing:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L210
with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:

The following code defines the feature helper functions for the three types:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/dataset_utils.py#L30
def int64_feature(value):
    """Wrapper for inserting int64 features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def float_feature(value):
    """Wrapper for inserting float features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def bytes_feature(value):
    """Wrapper for inserting bytes features into Example proto.
    """
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
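
Given these wrappers, both scalars and lists pass through uniformly, roughly as follows:

int64_feature(500)            # scalar is wrapped: Int64List(value=[500])
int64_feature([500, 353, 3])  # a list such as `shape` is stored as-is
float_feature([0.48, 0.136])  # normalized box coordinates
bytes_feature(b'JPEG')        # byte strings, e.g. the image format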

The following function reads the image and parses the corresponding xml file to extract the annotations:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L70
def _process_image(directory, name):
    """Process an image and its annotation file.

    Args:
      directory: string, root path of the VOC dataset.
      name: string, image id, e.g. '000001'.
    Returns:
      image_data: string, JPEG encoding of RGB image.
      shape: list of 3 integers, [height, width, channels].
      bboxes, labels, labels_text, difficult, truncated: per-object annotations.
    """
    # Read the image file in binary mode (JPEG data is not text).
    filename = directory + DIRECTORY_IMAGES + name + '.jpg'
    image_data = tf.gfile.FastGFile(filename, 'rb').read()

    # Read the XML annotation file.
    filename = os.path.join(directory, DIRECTORY_ANNOTATIONS, name + '.xml')
    tree = ET.parse(filename)
    root = tree.getroot()

    # Image shape.
    size = root.find('size')
    shape = [int(size.find('height').text),
             int(size.find('width').text),
             int(size.find('depth').text)]
    # Find annotations.
    bboxes = []
    labels = []
    labels_text = []
    difficult = []
    truncated = []
    for obj in root.findall('object'):
        label = obj.find('name').text
        labels.append(int(VOC_LABELS[label][0]))
        labels_text.append(label.encode('ascii'))

        # Compare against None explicitly: an ElementTree Element with no
        # children is falsy, so `if obj.find('difficult'):` would always
        # take the else branch.
        if obj.find('difficult') is not None:
            difficult.append(int(obj.find('difficult').text))
        else:
            difficult.append(0)
        if obj.find('truncated') is not None:
            truncated.append(int(obj.find('truncated').text))
        else:
            truncated.append(0)

        # Boxes are normalized to [0, 1] and stored in (ymin, xmin, ymax, xmax) order.
        bbox = obj.find('bndbox')
        bboxes.append((float(bbox.find('ymin').text) / shape[0],
                       float(bbox.find('xmin').text) / shape[1],
                       float(bbox.find('ymax').text) / shape[0],
                       float(bbox.find('xmax').text) / shape[1]
                       ))
    return image_data, shape, bboxes, labels, labels_text, difficult, truncated
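
As a concrete example, for the dog box in 000001.xml above, the image is 353 wide and 500 high, so the stored box is (240/500, 48/353, 371/500, 195/353) ≈ (0.480, 0.136, 0.742, 0.552) in (ymin, xmin, ymax, xmax) order.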

The following function builds the tf.train.Example:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L124
def _convert_to_example(image_data, labels, labels_text, bboxes, shape,
                        difficult, truncated):
    """Build an Example proto for an image example.

    Args:
      image_data: string, JPEG encoding of RGB image;
      labels: list of integers, identifier for the ground truth;
      labels_text: list of strings, human-readable labels;
      bboxes: list of bounding boxes; each box is a list of floats
        specifying [ymin, xmin, ymax, xmax], normalized to [0, 1];
      shape: 3 integers, image shape in pixels.
    Returns:
      Example proto
    """
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    for b in bboxes:
        assert len(b) == 4
        # pylint: disable=expression-not-assigned
        [l.append(point) for l, point in zip([ymin, xmin, ymax, xmax], b)]
        # pylint: enable=expression-not-assigned

    image_format = b'JPEG'
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': int64_feature(shape[0]),
        'image/width': int64_feature(shape[1]),
        'image/channels': int64_feature(shape[2]),
        'image/shape': int64_feature(shape),
        'image/object/bbox/xmin': float_feature(xmin),
        'image/object/bbox/xmax': float_feature(xmax),
        'image/object/bbox/ymin': float_feature(ymin),
        'image/object/bbox/ymax': float_feature(ymax),
        'image/object/bbox/label': int64_feature(labels),
        'image/object/bbox/label_text': bytes_feature(labels_text),
        'image/object/bbox/difficult': int64_feature(difficult),
        'image/object/bbox/truncated': int64_feature(truncated),
        'image/format': bytes_feature(image_format),
        'image/encoded': bytes_feature(image_data)}))
    return example

To make this easier to follow, I inline the corresponding feature functions and keep only the first two features, so that the code matches the structure introduced above:

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[shape[0]])),
    'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[shape[1]]))
}))

Finally, before writing to the TFRecord file, call the SerializeToString method on the Example:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py#L180
tfrecord_writer.write(example.SerializeToString())
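
To sanity-check what was written, here is a quick sketch (TF 1.x API) that iterates over the raw records and parses the first one back into an Example proto; tf_filename is the file written above:

import tensorflow as tf

for record in tf.python_io.tf_record_iterator(tf_filename):
    example = tf.train.Example()
    example.ParseFromString(record)
    # Pull a couple of features back out of the proto.
    height = example.features.feature['image/height'].int64_list.value[0]
    width = example.features.feature['image/width'].int64_list.value[0]
    print('height = %d, width = %d' % (height, width))
    break  # only inspect the first record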

Reading TFRecord Data

The SSD code mainly uses the slim module to help read TFRecords. The main code is as follows:

# https://github.com/busyboxs/SSD-Tensorflow/blob/master/datasets/pascalvoc_common.py#L49
reader = tf.TFRecordReader
# Features in Pascal VOC TFRecords.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature([1], tf.int64),
    'image/width': tf.FixedLenFeature([1], tf.int64),
    'image/channels': tf.FixedLenFeature([1], tf.int64),
    'image/shape': tf.FixedLenFeature([3], tf.int64),
    'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
    'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/difficult': tf.VarLenFeature(dtype=tf.int64),
    'image/object/bbox/truncated': tf.VarLenFeature(dtype=tf.int64),
}
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'shape': slim.tfexample_decoder.Tensor('image/shape'),
    'object/bbox': slim.tfexample_decoder.BoundingBox(
        ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
    'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label'),
    'object/difficult': slim.tfexample_decoder.Tensor('image/object/bbox/difficult'),
    'object/truncated': slim.tfexample_decoder.Tensor('image/object/bbox/truncated'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=reader,
    decoder=decoder,
    num_samples=split_to_sizes[split_name],
    items_to_descriptions=items_to_descriptions,
    num_classes=num_classes,
    labels_to_names=labels_to_names)

provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size,
    shuffle=True)
# Get for SSD network: image, labels, bboxes.
[image, shape, glabels, gbboxes] = provider.get(['image', 'shape', 'object/label', 'object/bbox'])
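
The tensors returned by provider.get are still graph nodes fed by queues, so pulling an actual sample requires a session with queue runners. A minimal sketch under the names above (TF 1.x):

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    img, shp, lbls, boxes = sess.run([image, shape, glabels, gbboxes])
    print(shp, lbls, boxes.shape)  # e.g. [500 353 3], per-object labels, (N, 4)
    coord.request_stop()
    coord.join(threads)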

The definition of TFExampleDecoder is as follows:

# https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py#L455
class TFExampleDecoder(data_decoder.DataDecoder):
    """A decoder for TensorFlow Examples.

    Decoding Example proto buffers is comprised of two stages: (1) Example parsing
    and (2) tensor manipulation.

    In the first stage, the tf.parse_example function is called with a list of
    FixedLenFeatures and SparseLenFeatures. These instances tell TF how to parse
    the example. The output of this stage is a set of tensors.

    In the second stage, the resulting tensors are manipulated to provide the
    requested 'item' tensors.

    To perform this decoding operation, an ExampleDecoder is given a list of
    ItemHandlers. Each ItemHandler indicates the set of features for stage 1 and
    contains the instructions for post_processing its tensors for stage 2.
    """

    def __init__(self, keys_to_features, items_to_handlers):
        """Constructs the decoder.

        Args:
          keys_to_features: a dictionary from TF-Example keys to either
            tf.VarLenFeature or tf.FixedLenFeature instances. See tensorflow's
            parsing_ops.py.
          items_to_handlers: a dictionary from items (strings) to ItemHandler
            instances. Note that the ItemHandler's are provided the keys that they
            use to return the final item Tensors.
        """
        self._keys_to_features = keys_to_features
        self._items_to_handlers = items_to_handlers

    def list_items(self):
        """See base class."""
        return list(self._items_to_handlers.keys())

    def decode(self, serialized_example, items=None):
        """Decodes the given serialized TF-example.

        Args:
          serialized_example: a serialized TF-example tensor.
          items: the list of items to decode. These must be a subset of the item
            keys in self._items_to_handlers. If `items` is left as None, then all
            of the items in self._items_to_handlers are decoded.
        Returns:
          the decoded items, a list of tensor.
        """
        example = parsing_ops.parse_single_example(serialized_example,
                                                   self._keys_to_features)

        # Reshape non-sparse elements just once, adding the reshape ops in
        # deterministic order.
        for k in sorted(self._keys_to_features):
            v = self._keys_to_features[k]
            if isinstance(v, parsing_ops.FixedLenFeature):
                example[k] = array_ops.reshape(example[k], v.shape)

        if not items:
            items = self._items_to_handlers.keys()

        outputs = []
        for item in items:
            handler = self._items_to_handlers[item]
            keys_to_tensors = {key: example[key] for key in handler.keys}
            outputs.append(handler.tensors_to_item(keys_to_tensors))
        return outputs

In the decode method, the line example = parsing_ops.parse_single_example(serialized_example, self._keys_to_features) is what actually parses the TFRecord data. decode is called mainly from slim.dataset_data_provider.DatasetDataProvider; see here for details.
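
For comparison, here is a sketch of decoding a subset of the same keys without slim, calling tf.parse_single_example directly (TF 1.x):

import tensorflow as tf

def parse_fn(serialized_example):
    features = tf.parse_single_example(serialized_example, {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/shape': tf.FixedLenFeature([3], tf.int64),
        'image/object/bbox/label': tf.VarLenFeature(tf.int64),
    })
    # VarLenFeature yields a SparseTensor; densify it for convenience.
    labels = tf.sparse_tensor_to_dense(features['image/object/bbox/label'])
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    return image, features['image/shape'], labels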

Using TFRecord in deeplab

Writing TFRecord

The main code is as follows:

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/build_data.py#L105
def _int64_list_feature(values):
    """Returns a TF-Feature of int64_list.

    Args:
      values: A scalar or list of values.
    Returns:
      A TF-Feature.
    """
    if not isinstance(values, collections.Iterable):
        values = [values]

    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_list_feature(values):
    """Returns a TF-Feature of bytes.

    Args:
      values: A string.
    Returns:
      A TF-Feature.
    """
    def norm2bytes(value):
        return value.encode() if isinstance(value, str) and six.PY3 else value

    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))


def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
    """Converts one image/segmentation pair to tf example.

    Args:
      image_data: string of image data.
      filename: image filename.
      height: image height.
      width: image width.
      seg_data: string of semantic segmentation data.
    Returns:
      tf example of one image/segmentation pair.
    """
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_list_feature(image_data),
        'image/filename': _bytes_list_feature(filename),
        'image/format': _bytes_list_feature(
            _IMAGE_FORMAT_MAP[FLAGS.image_format]),
        'image/height': _int64_list_feature(height),
        'image/width': _int64_list_feature(width),
        'image/channels': _int64_list_feature(3),
        'image/segmentation/class/encoded': (
            _bytes_list_feature(seg_data)),
        'image/segmentation/class/format': _bytes_list_feature(
            FLAGS.label_format),
    }))

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/build_voc2012_data.py#L84
def _convert_dataset(dataset_split):
    """Converts the specified dataset split to TFRecord format.

    Args:
      dataset_split: The dataset split (e.g., train, test).
    Raises:
      RuntimeError: If loaded image and label have different shape.
    """
    dataset = os.path.basename(dataset_split)[:-4]
    sys.stdout.write('Processing ' + dataset)
    filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
    num_images = len(filenames)
    num_per_shard = int(math.ceil(num_images / float(_NUM_SHARDS)))

    image_reader = build_data.ImageReader('jpeg', channels=3)
    label_reader = build_data.ImageReader('png', channels=1)

    for shard_id in range(_NUM_SHARDS):
        output_filename = os.path.join(
            FLAGS.output_dir,
            '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
        with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
            start_idx = shard_id * num_per_shard
            end_idx = min((shard_id + 1) * num_per_shard, num_images)
            for i in range(start_idx, end_idx):
                sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
                    i + 1, len(filenames), shard_id))
                sys.stdout.flush()
                # Read the image.
                image_filename = os.path.join(
                    FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
                image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
                height, width = image_reader.read_image_dims(image_data)
                # Read the semantic segmentation annotation.
                seg_filename = os.path.join(
                    FLAGS.semantic_segmentation_folder,
                    filenames[i] + '.' + FLAGS.label_format)
                seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
                seg_height, seg_width = label_reader.read_image_dims(seg_data)
                if height != seg_height or width != seg_width:
                    raise RuntimeError('Shape mismatched between image and label.')
                # Convert to tf example. The key step is the following two lines.
                example = build_data.image_seg_to_tfexample(
                    image_data, filenames[i], height, width, seg_data)
                tfrecord_writer.write(example.SerializeToString())
        sys.stdout.write('\n')
        sys.stdout.flush()
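
For a sense of the sharding: at the time of writing, _NUM_SHARDS is 4 in this script, so with the 1,464 images of the VOC 2012 train split, num_per_shard = ceil(1464 / 4) = 366, and the output files are named like train-00000-of-00004.tfrecord.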

Reading TFRecord

# https://github.com/tensorflow/models/blob/master/research/deeplab/datasets/segmentation_dataset.py#L156
# Specify how the TF-Examples are decoded.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/filename': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature(
        (), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/width': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/segmentation/class/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/segmentation/class/format': tf.FixedLenFeature(
        (), tf.string, default_value='png'),
}
items_to_handlers = {
    'image': tfexample_decoder.Image(
        image_key='image/encoded',
        format_key='image/format',
        channels=3),
    'image_name': tfexample_decoder.Tensor('image/filename'),
    'height': tfexample_decoder.Tensor('image/height'),
    'width': tfexample_decoder.Tensor('image/width'),
    'labels_class': tfexample_decoder.Image(
        image_key='image/segmentation/class/encoded',
        format_key='image/segmentation/class/format',
        channels=1),
}

decoder = tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

dataset = dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=splits_to_sizes[split_name],
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    ignore_label=ignore_label,
    num_classes=num_classes,
    name=dataset_name,
    multi_label=True)

The dataset is then consumed via input_generator:

# https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L253
samples = input_generator.get(
    dataset,
    FLAGS.train_crop_size,
    clone_batch_size,
    min_resize_value=FLAGS.min_resize_value,
    max_resize_value=FLAGS.max_resize_value,
    resize_factor=FLAGS.resize_factor,
    min_scale_factor=FLAGS.min_scale_factor,
    max_scale_factor=FLAGS.max_scale_factor,
    scale_factor_step_size=FLAGS.scale_factor_step_size,
    dataset_split=FLAGS.train_split,
    is_training=True,
    model_variant=FLAGS.model_variant)
inputs_queue = prefetch_queue.prefetch_queue(
    samples, capacity=128 * config.num_clones)

Inside input_generator's get method, the dataset is read through dataset_data_provider.DatasetDataProvider, just as described for VOC 2007 above.

# https://github.com/tensorflow/models/blob/master/research/deeplab/utils/input_generator.py
data_provider = dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=num_readers,
    num_epochs=None if is_training else 1,
    shuffle=is_training)
image, label, image_name, height, width = _get_data(data_provider, dataset_split)