TensorFlow1.0 - C2 Guide - 2 Est


Author: 左心Chris | Published: 2019-08-13 21:12

1 Premade Estimators

Three building blocks: input functions that return a Dataset, feature columns, and Estimators.

1.1 Prerequisites and getting the sample code and programming stack

pip install pandas
git clone https://github.com/tensorflow/models
cd models/samples/core/get_started/
python premade_estimator.py

programming stack


1.2 Sample of classifying irises: an overview


1.3 Overview of programming with Estimators

  • Create one or more input functions.
  • Define the model's feature columns.
  • Instantiate an Estimator, specifying the feature columns and various hyperparameters.
  • Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

1.4 Create input functions and define the feature columns and instantiate an estimator and train, evaluate, predict

1.4.1 Input function

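An input function returns a tf.data.Dataset that yields (features, label) pairs: features is a dict mapping feature names to values, and label is the matching label.

import tensorflow as tf

def train_input_fn(features, labels, batch_size):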
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)

1.4.2 Feature columns

# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

1.4.3 Instantiate an estimator

# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
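
1.4.4 Train, evaluate, predict

# Train the model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

# Generate predictions for three unlabeled examples
# (sample values as in the tutorial's premade_estimator.py).
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))

template = '\nPrediction is "{}" ({:.1f}%), expected "{}"'

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))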

2 Checkpoints

Estimators save and restore checkpoints automatically.

2.1 Saving

  • Estimators automatically write checkpoints and event files to model_dir:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris')

If model_dir is not set, the files are saved to a temporary directory.

  • Checkpointing frequency (defaults)
    Writes a checkpoint every 10 minutes.
    Writes a checkpoint when training starts and when it completes.
    Retains only the 5 most recent checkpoints.
    Use tf.estimator.RunConfig() to adjust these defaults.
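
The "retain only the 5 most recent" rule can be pictured with a small plain-Python sketch. This is not TensorFlow code: the CheckpointLog class and its method names are hypothetical, and it models only the retention rule, not the timing or the files on disk.

```python
from collections import deque


class CheckpointLog:
    """Toy model of 'retain only the N most recent checkpoints' (hypothetical)."""

    def __init__(self, keep_max=5):
        # A deque with maxlen silently drops the oldest entry when full.
        self._steps = deque(maxlen=keep_max)

    def save(self, step):
        self._steps.append(step)

    def retained(self):
        return list(self._steps)

    def latest(self):
        # The checkpoint a later train/evaluate/predict call would restore.
        return self._steps[-1] if self._steps else None


log = CheckpointLog(keep_max=5)
for step in [1, 200, 400, 600, 800, 1000, 1200]:
    log.save(step)

print(log.retained())  # [400, 600, 800, 1000, 1200]
print(log.latest())    # 1200
```

In real code the corresponding settings are the keep_checkpoint_max and save_checkpoints_secs arguments of tf.estimator.RunConfig, which is passed to the Estimator through its config argument.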

2.2 Restoring

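The first call to train saves a checkpoint.
Every later call to train, evaluate, or predict:

  • rebuilds the graph
  • initializes the new model's parameters from the most recent checkpoint

Note that the model and the checkpoint must match. For example, changing hidden_units after training makes the existing checkpoint incompatible.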

3 Feature Columns

Nine feature column functions, in two categories (categorical columns and dense columns).

3.1 Input to a DNN

A DNN operates on numbers: numerical features are fed in directly, and categorical features are converted to one-hot vectors first.

3.2 Feature Columns

Numeric column

A feature that is already numeric and is used as-is.

Bucketized column

Splits a numeric value into ranges (buckets) and represents the matching bucket as a one-hot vector.
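
A minimal plain-Python sketch of the idea, not TensorFlow's implementation. The bucketize helper and the year boundaries are hypothetical; the convention that each bucket includes its left boundary and excludes its right one follows the documented behavior of bucketized_column.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return a one-hot list for the bucket that `value` falls into.

    N sorted boundaries produce N + 1 buckets; each bucket includes
    its left boundary and excludes its right one.
    """
    index = bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


year_boundaries = [1960, 1980, 2000]  # hypothetical boundaries

print(bucketize(1955, year_boundaries))  # [1, 0, 0, 0]
print(bucketize(1980, year_boundaries))  # [0, 0, 1, 0]
print(bucketize(2015, year_boundaries))  # [0, 0, 0, 1]
```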

Categorical identity columns

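Treats each integer value as its own category and one-hot encodes it. Values outside the range [0, num_buckets) are replaced by default_value; a default value must be configured, otherwise out-of-range input causes a failure. See https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_identity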
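
The behavior can be sketched in plain Python as follows. The identity_one_hot helper is hypothetical and simplified: it raises for every out-of-range value when no default is given, which may differ in detail from how TensorFlow treats negative values.

```python
def identity_one_hot(value, num_buckets, default_value=None):
    """One-hot encode an integer that is already a category id."""
    if not 0 <= value < num_buckets:
        if default_value is None:
            # Simplification: fail for any out-of-range input.
            raise ValueError(f"{value} is outside [0, {num_buckets})")
        value = default_value
    one_hot = [0] * num_buckets
    one_hot[value] = 1
    return one_hot


print(identity_one_hot(2, num_buckets=4))                   # [0, 0, 1, 0]
print(identity_one_hot(7, num_buckets=4, default_value=0))  # [1, 0, 0, 0]
```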

Categorical vocabulary column

Two variants: one reads the vocabulary from a list, the other reads it from a file.

# Given input "feature_name_from_input_fn" which is a string,
# create a categorical feature by mapping the input to one of
# the elements in the vocabulary list.
vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_list(
        key=feature_name_from_input_fn,
        vocabulary_list=["kitchenware", "electronics", "sports"]))

# Given input "feature_name_from_input_fn" which is a string,
# create a categorical feature to our model by mapping the input to one of
# the elements in the vocabulary file.
vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_file(
        key=feature_name_from_input_fn,
        vocabulary_file="product_class.txt",
        vocabulary_size=3))

Hashed column

hashed_feature_column = (
    tf.feature_column.categorical_column_with_hash_bucket(
        key="some_feature",
        hash_bucket_size=100))  # The number of hash buckets

Crossed column

You can create a feature cross from feature names (keys of the dict returned by input_fn) or from any categorical column except categorical_column_with_hash_bucket, because crossed_column already hashes its input.

def make_dataset(latitude, longitude, labels):
    assert latitude.shape == longitude.shape == labels.shape

    features = {'latitude': latitude.flatten(),
                'longitude': longitude.flatten()}
    labels=labels.flatten()

    return tf.data.Dataset.from_tensor_slices((features, labels))


# Bucketize the latitude and longitude using the `edges`
latitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('latitude'),
    list(atlanta.latitude.edges))

longitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('longitude'),
    list(atlanta.longitude.edges))

# Cross the bucketized columns, using 5000 hash bins.
crossed_lat_lon_fc = tf.feature_column.crossed_column(
    [latitude_bucket_fc, longitude_bucket_fc], 5000)

fc = [
    latitude_bucket_fc,
    longitude_bucket_fc,
    crossed_lat_lon_fc]

# Build and train the Estimator.
est = tf.estimator.LinearRegressor(fc, ...)

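The feature column assigns an example to an index by running a hash function on the tuple of inputs, followed by a modulo operation with hash_bucket_size. In the example above hash_bucket_size is 5000, so each (latitude bucket, longitude bucket) tuple is hashed to an index in [0, 5000).
Hash collisions are possible. For that reason, when creating feature crosses, you typically still should include the original (uncrossed) features in your model (as in the preceding code snippet): the independent features help the model tell apart examples that collide in the crossed feature.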
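
A rough plain-Python sketch of "hash the tuple, then take the modulo". The crossed_index helper is hypothetical, and MD5 is only a stand-in for whatever hash TensorFlow uses internally; it is chosen here because Python's built-in hash() for strings changes between runs.

```python
import hashlib


def crossed_index(values, hash_bucket_size):
    """Map a tuple of feature values to an index in [0, hash_bucket_size)."""
    key = "_X_".join(str(v) for v in values).encode("utf-8")
    digest = hashlib.md5(key).digest()  # stand-in hash, an assumption
    return int.from_bytes(digest[:8], "big") % hash_bucket_size


# Hypothetical (latitude bucket, longitude bucket) pairs: 100 x 100 cells.
cells = [(lat, lon) for lat in range(100) for lon in range(100)]
indices = [crossed_index(cell, 5000) for cell in cells]

print(len(cells))         # number of distinct crosses
print(len(set(indices)))  # number of distinct indices actually used
```

With 10000 distinct crosses and only 5000 buckets, several cells necessarily share an index, which is the collision case the uncrossed columns help with.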

3.3 Indicator and embedding columns

Indicator
categorical_column = ... # Create any type of categorical column.

# Represent the categorical column as an indicator column.
indicator_column = tf.feature_column.indicator_column(categorical_column)
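
Two ways to feed a categorical column to a DNN:
  • Use an indicator column directly. With many categories, the one-hot vector becomes very large.
  • Use an embedding column, which replaces the one-hot vector with a row from a lookup table.

Embedding column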

A current rule of thumb: embedding_dimensions = number_of_categories**0.25
For example, with 81 categories, 81**0.25 = 3, so the embedding size is set to 3.
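
The rule of thumb as a small helper. The function name is hypothetical, and rounding to the nearest integer with a minimum of 1 is an assumption added here, since the rule itself does not say how to handle non-integer results.

```python
def embedding_dimension(number_of_categories):
    """Fourth root of the category count, rounded (rounding is an assumption)."""
    return max(1, round(number_of_categories ** 0.25))


for n in (81, 1000, 1000000):
    print(n, embedding_dimension(n))
```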

categorical_column = ... # Create any categorical column

# Represent the categorical column as an embedding column.
# This means creating an embedding vector lookup table with one element for each category.
embedding_column = tf.feature_column.embedding_column(
    categorical_column=categorical_column,
    dimension=embedding_dimensions)
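
To make the "lookup table" idea concrete, here is a plain-Python sketch with hypothetical helper names. It shows that picking row i of the table gives the same vector as multiplying the one-hot vector for category i by the table. The table values are random here; in a real model they are learned during training.

```python
import random


def make_embedding_table(number_of_categories, dimension, seed=0):
    """One row of `dimension` floats per category, randomly initialized."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dimension)]
            for _ in range(number_of_categories)]


def lookup(table, category_id):
    """Embedding lookup: pick the row for this category."""
    return table[category_id]


def one_hot_times_table(table, category_id):
    """The same vector via an explicit one-hot vector times the table."""
    one_hot = [1.0 if i == category_id else 0.0 for i in range(len(table))]
    dimension = len(table[0])
    return [sum(one_hot[i] * table[i][d] for i in range(len(table)))
            for d in range(dimension)]


table = make_embedding_table(number_of_categories=81, dimension=3)

print(lookup(table, 42))
print(lookup(table, 42) == one_hot_times_table(table, 42))  # True
```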

3.4 Passing feature columns to Estimators

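Not all Estimators accept all types of feature columns. LinearClassifier and LinearRegressor accept every type; DNNClassifier and DNNRegressor accept only dense columns, so a categorical column must first be wrapped in an indicator_column or embedding_column.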

4 Datasets for Estimators

Input methods based on the Dataset API

Basic input

Reading a CSV File

5 Creating Custom Estimators

Pre-made vs. custom

Process

Write a model function
Define the model
Implement training, evaluation, and prediction
The custom Estimator
TensorBoard
