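When the coarse-ranking stage sorts candidates on several dimensions at once, each dimension usually needs its own weight. Without a principled method, the common practice is to pick a value based on business experience and then keep adjusting it through A/B tests. This post tries a different approach: train an LR model (linear regression here) and read the weight of each dimension off the fitted coefficients.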
1. Data preprocessing
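import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

df = pd.read_csv("my_test.csv")
# Downsample negatives (f_act == 0) to 1% and keep every positive (f_act == 1)
df0 = df[df['f_act']==0].sample(frac=0.01)
df1 = df[df['f_act']==1]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_sample = pd.concat([df1, df0])
# Fill missing feature values with 0
df_sample = df_sample.fillna(0)
df_sample.columns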
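To see what the downsampling step does to the label balance, here is a minimal sketch on synthetic data. The `toy` DataFrame and its roughly 1% positive rate are assumptions made up for illustration, not the contents of `my_test.csv`; the fixed `random_state` is also an addition so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for my_test.csv (hypothetical): 10,000 rows, about 1% positives.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "product_price": rng.uniform(1, 100, size=10_000),
    "f_act": (rng.random(10_000) < 0.01).astype(int),
})

# Keep 1% of the negatives and all of the positives, as in the step above.
negatives = toy[toy["f_act"] == 0].sample(frac=0.01, random_state=0)
positives = toy[toy["f_act"] == 1]
toy_sample = pd.concat([positives, negatives]).fillna(0)

# Compare the label distribution before and after downsampling.
print(toy["f_act"].value_counts().to_dict())
print(toy_sample["f_act"].value_counts().to_dict())
```

After downsampling, positives and negatives are of similar size, which is the point of keeping only a small fraction of the negatives.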
2. Model training
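# Features: the ranking dimensions whose weights we want to learn
X = df_sample[
["product_total_pv_3d","product_valid_work_users","product_total_pv_7d","product_price"]]
# Target: kept as a one-column DataFrame, so regr.coef_ has shape (1, n_features)
y = df_sample[["f_act"]]
# Hold out 10% of the sample for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
# Ordinary least-squares linear regression
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)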
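One thing to keep in mind when reading the coefficients as weights: page-view counts and prices live on very different scales, so the raw coefficients are not directly comparable. The sketch below is an optional variation, not part of the original workflow. It uses scikit-learn's `StandardScaler` in a pipeline on synthetic data (the `pv` and `price` arrays are hypothetical) to show how standardizing the features first puts the coefficients on a common scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales (hypothetical data).
pv = rng.uniform(0, 10_000, size=500)
price = rng.uniform(0, 10, size=500)
X_toy = np.column_stack([pv, price])
# Both features contribute about equally to the target by construction.
y_toy = 0.0001 * pv + 0.1 * price + rng.normal(0, 0.01, size=500)

# Fit once on raw features and once on standardized features.
raw = LinearRegression().fit(X_toy, y_toy)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_toy, y_toy)
scaled_coef = scaled.named_steps["linearregression"].coef_

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled_coef)
```

On raw features the two coefficients differ by about three orders of magnitude even though both features matter equally; after standardization they come out close to each other.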
3. Evaluation
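predictions = regr.predict(X_test)
# Squared error per test row; y_test is a DataFrame, so the result keeps the "f_act" column
squared_deviation = np.power(y_test - predictions, 2)
# Select the column first, then average, to get a plain scalar MSE
print("MSE:{}".format(squared_deviation["f_act"].mean()))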
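The same metric is available as `sklearn.metrics.mean_squared_error`. As a quick sanity check, the sketch below compares it with the manual calculation on a tiny hand-made example; the `y_true` and `y_pred` values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Tiny hand-made example (hypothetical labels and predictions).
y_true = np.array([0, 1, 0, 1])
y_pred = np.array([0.1, 0.8, 0.3, 0.6])

# Manual MSE: mean of the squared differences.
manual_mse = np.mean(np.power(y_true - y_pred, 2))
# Library MSE: should agree with the manual value.
library_mse = mean_squared_error(y_true, y_pred)

print("MSE:{}".format(library_mse))
```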
4. Print the weights
print("Weights: w1-w4")
print(list(zip(X_train.columns.values,regr.coef_[0])))
#print("Bias: b")
#print(regr.intercept_)
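Once the weights are available, a coarse-ranking score is just the weighted sum of the dimensions plus the bias. The following is a minimal sketch on synthetic data, assuming four feature dimensions; `true_w`, `X_toy`, and `candidates` are hypothetical and only serve to show that the manual weighted sum matches `predict` and can be used to order candidates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic training data with 4 hypothetical feature dimensions.
X_toy = rng.random((200, 4))
true_w = np.array([0.5, 0.2, 0.2, 0.1])
y_toy = X_toy @ true_w + rng.normal(0, 0.01, 200)

# y_toy is 1-D here, so model.coef_ has shape (4,).
model = LinearRegression().fit(X_toy, y_toy)

# Score 5 new candidates as a weighted sum of their features plus the bias.
candidates = rng.random((5, 4))
scores = candidates @ model.coef_ + model.intercept_

# Candidate indices ordered from highest to lowest score.
ranking = np.argsort(-scores)
print(ranking)
```

Because the score is a plain dot product, the learned weights can be copied into a ranking service without shipping the model itself.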










