1 Premade Estimators
Key pieces: input functions (which return a Dataset), feature columns, and Estimators.
1.1 Prerequisites and getting the sample code and programming stack
pip install pandas
git clone https://github.com/tensorflow/models
cd models/samples/core/get_started/
python premade_estimator.py
programming stack

1.2 Sample of classifying irises: an overview


1.3 Overview of programming with Estimators
- Create one or more input functions.
- Define the model's feature columns.
- Instantiate an Estimator, specifying the feature columns and various hyperparameters.
- Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.
1.4 Create input functions and define the feature columns and instantiate an estimator and train, evaluate, predict
1.4.1 Input function
An input function returns a Dataset that yields (features, labels) pairs.
import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)
1.4.2 Feature columns
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
1.4.3 Instantiate an estimator
# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
1.4.4 Train evaluate predict
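# Train the model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)
# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
# Generate predictions from the model.
# `predict_x` (a dict of unlabeled feature values) and `expected` (the
# expected species names) are defined earlier in premade_estimator.py.
predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))
template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))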
2 Checkpoints
Estimators save and restore checkpoints automatically.
2.1 Saving
- Checkpoints and event files are written automatically
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris')
If model_dir is not set, they are saved into a temporary directory.
- Checkpointing frequency (defaults)
Write a checkpoint every 10 minutes.
Write a checkpoint when training starts and when it completes.
Retain only the 5 most recent checkpoints.
Use tf.estimator.RunConfig() to adjust these defaults.
2.2 Restoring
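The first call to train saves a checkpoint.
Every later call to train, evaluate, or predict:
- Rebuilds the graph
- Initializes the new model's parameters from the most recent checkpoint
- Note: the model and the checkpoint must be compatible, otherwise restoring fails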
3 Feature Columns
Nine feature column functions, in two categories (categorical columns and dense columns).
3.1 Input to a DNN
A DNN takes numerical features directly; categorical features must first be converted to numbers (for example, one-hot vectors).
3.2 Feature Columns

Numeric column
Uses the numeric feature value directly.
Bucketized column
Splits a numeric value into ranges (buckets) and encodes the bucket as a one-hot vector.
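To make the bucketing concrete, here is a minimal pure-Python sketch; it is not TensorFlow code. The helper name `bucketize_one_hot` and the year boundaries are hypothetical. The sketch assumes a value equal to a boundary falls into the upper bucket, so n boundaries produce n + 1 buckets.

```python
from bisect import bisect_right


def bucketize_one_hot(value, boundaries):
    """Return a one-hot list marking the bucket that `value` falls into.

    Hypothetical helper for illustration only. `boundaries` must be sorted
    in ascending order; n boundaries produce n + 1 buckets.
    """
    # bisect_right counts the boundaries that are <= value, which is the
    # bucket index when each bucket includes its lower boundary.
    index = bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


year_boundaries = [1960, 1980, 2000]  # assumed example boundaries
print(bucketize_one_hot(1975, year_boundaries))
```

With three boundaries the vector has four slots: before 1960, 1960 to 1979, 1980 to 1999, and 2000 onward.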
Categorical identity columns
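Maps each integer value to its own one-hot category. Out-of-range values are replaced by the default value (a default value must be configured, otherwise out-of-range input fails). https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_identity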
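The identity mapping and its default-value behavior can be sketched in plain Python. This is an illustration of the idea described above, not the TensorFlow implementation; the helper name `identity_one_hot` is hypothetical, and its parameters merely mirror the `num_buckets` and `default_value` arguments of the real column.

```python
def identity_one_hot(value, num_buckets, default_value=None):
    """One-hot encode an integer id expected to lie in [0, num_buckets).

    Hypothetical helper for illustration only. Out-of-range ids are
    replaced by `default_value`; without a default they raise an error.
    """
    if not 0 <= value < num_buckets:
        if default_value is None:
            raise ValueError(
                "id %d is outside [0, %d) and no default_value is set"
                % (value, num_buckets))
        value = default_value
    one_hot = [0] * num_buckets
    one_hot[value] = 1
    return one_hot


print(identity_one_hot(2, num_buckets=4))
print(identity_one_hot(7, num_buckets=4, default_value=0))
```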
Categorical vocabulary column
Two variants: one reads the vocabulary from a list, the other reads it from a file.
# Given input "feature_name_from_input_fn" which is a string,
# create a categorical feature by mapping the input to one of
# the elements in the vocabulary list.
vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_list(
        key=feature_name_from_input_fn,
        vocabulary_list=["kitchenware", "electronics", "sports"]))
# Given input "feature_name_from_input_fn" which is a string,
# create a categorical feature to our model by mapping the input to one of
# the elements in the vocabulary file
vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_file(
        key=feature_name_from_input_fn,
        vocabulary_file="product_class.txt",
        vocabulary_size=3))
Hashed column

hashed_feature_column = (
    tf.feature_column.categorical_column_with_hash_bucket(
        key="some_feature",
        hash_bucket_size=100))  # The number of categories
Crossed column
You can create a feature cross from feature names (the keys of the dict returned by input_fn) or from any categorical column except categorical_column_with_hash_bucket.
def make_dataset(latitude, longitude, labels):
    assert latitude.shape == longitude.shape == labels.shape
    features = {'latitude': latitude.flatten(),
                'longitude': longitude.flatten()}
    labels = labels.flatten()
    return tf.data.Dataset.from_tensor_slices((features, labels))

# Bucketize the latitude and longitude using the `edges`
latitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('latitude'),
    list(atlanta.latitude.edges))
longitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('longitude'),
    list(atlanta.longitude.edges))
# Cross the bucketized columns, using 5000 hash bins.
crossed_lat_lon_fc = tf.feature_column.crossed_column(
    [latitude_bucket_fc, longitude_bucket_fc], 5000)
fc = [
    latitude_bucket_fc,
    longitude_bucket_fc,
    crossed_lat_lon_fc]
# Build and train the Estimator.
est = tf.estimator.LinearRegressor(fc, ...)
The feature column assigns an example to an index by running a hash function on the tuple of inputs, followed by a modulo operation with hash_bucket_size (5000 in the example above, so each (latitude, longitude) bucket tuple is hashed to an index).
Hash collisions are possible, so when creating feature crosses you typically should still include the original (uncrossed) features in your model (as in the preceding code snippet); they help the model distinguish examples whose crosses collide.
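The hash-then-modulo step can be sketched in plain Python. This is only an illustration under stated assumptions: TensorFlow uses its own internal hash function, and here SHA-256 from `hashlib` stands in for it. The helper name `crossed_bucket_index`, the separator string, and the bucket ids are all hypothetical.

```python
import hashlib


def crossed_bucket_index(values, hash_bucket_size):
    """Map a tuple of feature values to an index in [0, hash_bucket_size).

    Hypothetical helper for illustration only: SHA-256 is a stand-in for
    the hash function TensorFlow actually uses.
    """
    # Serialize the tuple into one byte string so it can be hashed.
    key = "_X_".join(str(v) for v in values).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the digest as an integer, then take it modulo the bucket count.
    return int(digest, 16) % hash_bucket_size


lat_bucket, lon_bucket = 12, 47  # assumed bucket ids for one example
index = crossed_bucket_index((lat_bucket, lon_bucket), 5000)
print(index)
```

The same tuple always lands in the same bucket, while distinct tuples can share a bucket whenever there are more possible tuples than buckets, which is why collisions cannot be ruled out.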
3.3 Indicator and embedding columns
Indicator
categorical_column = ... # Create any type of categorical column.
# Represent the categorical column as an indicator column.
indicator_column = tf.feature_column.indicator_column(categorical_column)

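Two approaches:
- Use an indicator column directly; with many categories, however, the indicator vector becomes very large
- Use an embedding column, which replaces the indicator vector with a row from a lookup table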
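The relationship between the two approaches can be shown with a small pure-Python sketch: looking up row i of an embedding table gives the same vector as multiplying the one-hot (indicator) vector for i by that table. The function names and table values below are hypothetical; in a real model the table entries are learned during training.

```python
def one_hot(index, num_categories):
    """Indicator representation: all zeros except a 1.0 at `index`."""
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


def embed_by_matmul(index, table):
    """Multiply the one-hot row vector by the embedding table."""
    hot = one_hot(index, len(table))
    dim = len(table[0])
    return [sum(hot[r] * table[r][c] for r in range(len(table)))
            for c in range(dim)]


def embed_by_lookup(index, table):
    """Read the embedding row directly, without building a one-hot vector."""
    return list(table[index])


# Assumed table: 4 categories, 2-dimensional embeddings.
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
print(embed_by_matmul(2, table))
print(embed_by_lookup(2, table))
```

The lookup form never materializes the long indicator vector, which is what makes it practical when the number of categories is large.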
Embedding column
Current rule of thumb: embedding_dimensions = number_of_categories**0.25
For example, 81**0.25 = 3, so with 81 categories set the embedding size to 3.
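A minimal sketch of the rule of thumb as a helper function. The function name is hypothetical, and rounding to the nearest integer with a floor of 1 is an assumption added here, since the rule itself yields a fractional value for most category counts.

```python
def embedding_dimension(number_of_categories):
    """Fourth root of the category count, per the rule of thumb.

    Rounding to the nearest integer (minimum 1) is an assumption made for
    this sketch; the rule itself does not say how to round.
    """
    return max(1, round(number_of_categories ** 0.25))


print(embedding_dimension(81))
```

Treat the result as a starting point to tune rather than a fixed answer.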
categorical_column = ...  # Create any categorical column
# Represent the categorical column as an embedding column.
# This means creating an embedding vector lookup table with one element for each category.
embedding_column = tf.feature_column.embedding_column(
    categorical_column=categorical_column,
    dimension=embedding_dimensions)
3.4 Passing feature columns to Estimators
Not all Estimators accept every type of feature column.
4 Datasets for Estimators
Input methods using the Dataset API
Basic input
Reading a CSV File
5 Creating Custom Estimators
Pre-made vs. custom
Process
Write a model function
Define the model
Implement training, evaluation, and prediction
The custom Estimator
TensorBoard