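0 Introduction
High-volume, full-sample, multi-period metro smart card data are an important source for studying urban transport and travel behavior. Before any such study, work and home locations have to be identified from the raw card records. The author reviewed the extraction methods in earlier literature (Long & Thill, 2015; Zhou & Long, 2014; Gao et al., 2018; Lee & Hickman, 2014) and distilled them. Using one day (1 d) of card data as an example, this post gives the extraction logic and the corresponding MATLAB code.
If you have any questions, feel free to contact the author. If you use the code in this post, you do not need to notify the author, but please cite it properly.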
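1 Methods
1.1 Data format
Smart card data are hard to obtain, but the formats are all much alike. The data used here contain five basic fields: card ID, boarding station, boarding time, alighting station and alighting time. In MATLAB these are stored as categorical, double, datetime, double and datetime, respectively. Note that the code in Section 1.3 indexes columns by position and treats columns 2 and 4 as times and columns 3 and 5 as stations, so the table it runs on must be ordered card ID, boarding time, boarding station, alighting time, alighting station.
The figure below is a screenshot of a small part of the data. The data cannot be shared, so please do not contact the author to request them; the screenshot is only meant to help in reading the code.
Data sample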
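As a stand-in for the screenshot, here is a minimal sketch of a table with the five fields. All values and the variable names (`TimeIn`, `StationIn`, `TimeOut`, `StationOut`) are invented for illustration; only `CardData` and `CardID` appear in the original code. The column order is an assumption chosen to match the positional indexing used in Section 1.3.

```matlab
% Hypothetical sample records (invented values, for illustration only).
CardID     = categorical({'A001'; 'A001'; 'B002'});
TimeIn     = datetime(2015, 3, 2, [8; 18; 9],  [5; 10; 30], 0);  % boarding time
StationIn  = [101; 205; 330];                                    % boarding station ID
TimeOut    = datetime(2015, 3, 2, [8; 18; 10], [40; 45; 0], 0);  % alighting time
StationOut = [205; 101; 412];                                    % alighting station ID

% Column order assumed by the positional indexing in Section 1.3:
% 1 card ID, 2 boarding time, 3 boarding station, 4 alighting time, 5 alighting station.
CardData = table(CardID, TimeIn, StationIn, TimeOut, StationOut);
disp(CardData)
```

With this layout, `CardData{1,3}` is the boarding station of the first record and `CardData{1,4}` is its alighting time.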
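1.2 Identification rules
Daily job-home identification
- An individual must have at least two card records in a day; only then can the individual form a basic "home - workplace - home/other" travel chain;
- For two consecutive records of the same individual, if the alighting station of the earlier record is identical to the boarding station of the later record, the individual is considered to have stayed at that station, recorded as a stay of duration t at station s, written (s, t);
- The station with the longest stay of the day (max(t)) is identified as the workplace, provided that stay lasts at least 6 hours (t >= 6 h). Such individuals are called detectable individuals; an individual with no station meeting this condition is recorded as an undetectable individual;
- For each detectable individual, the first boarding station of the day is recorded as the home location, because travel surveys show that for more than 99.5% of individuals the first station coincides with the home station.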
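The rules above can be traced by hand on a single rider. The sketch below is only an illustration under stated assumptions: the three trips are invented, the variable names other than `S_Job` and `S_Home` are hypothetical, and the trips are assumed to be sorted by boarding time already.

```matlab
% Three hypothetical trips of one rider, in chronological order.
TimeIn     = datetime(2015, 3, 2, [8; 18; 19], [0; 0; 30], 0);
StationIn  = [101; 205; 101];
TimeOut    = datetime(2015, 3, 2, [8; 18; 20], [35; 40; 0], 0);
StationOut = [205; 101; 330];

% Rule 2: a stay exists where trip k ends at the station where trip k+1 starts.
isStay   = StationOut(1:end-1) == StationIn(2:end);
stayTime = TimeIn(2:end) - TimeOut(1:end-1);   % time between alighting and next boarding
stayTime(~isStay) = duration(0, 0, 0);         % pairs that are not stays count as zero

% Rule 3: the longest stay marks the workplace if it lasts at least 6 hours.
[longestStay, k] = max(stayTime);
if longestStay >= hours(6)
    S_Job  = StationOut(k);   % station of the longest stay
    S_Home = StationIn(1);    % Rule 4: first boarding station of the day
else
    S_Job  = 0;               % 0 marks an undetectable individual
    S_Home = 0;
end
fprintf('Job station %d, home station %d, stay %s\n', S_Job, S_Home, char(longestStay));
```

Here the rider stays at station 205 from 08:35 to 18:00 and at station 101 from 18:40 to 19:30, so station 205 is taken as the workplace and station 101 as home.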
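1.3 Implementation
The code follows the identification rules closely, but it uses a number of specific tricks whose main purpose is to make the program run faster. For instance, extracting job and home locations is a very complex and time-consuming computation, so the set of candidates should be narrowed step by step, and individuals who cannot possibly be identified (for example, those with only one trip) should be dropped early.
Below is the daily job-home identification code. As a rough indication of performance: on about 3 million card records for one day, an ordinary desktop PC (4th-generation Intel Core i5, 16 GB RAM, all-SSD storage) extracts the job and home stations of about 80,000 individuals within roughly 15 hours. Extraction over longer time spans therefore calls for optimizing the code; the author will describe specific optimization methods in a later technical post.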
%% Extract the unique individuals (the full set of candidates for identification)
% CardData must already be in the workspace. x (year index) and y (day index)
% are not defined in this excerpt and are presumably set by an outer script.
% Extract all unique users and count their metro rides.
CardData.CardID = categorical(CardData.CardID); % convert the ID to categorical so that countcats can be used
TbUser_All = table(unique(CardData.CardID)); % table of all metro riders in the day
TbUser_All.Properties.VariableNames{1} = 'CardID';
TbUser_All.Count = countcats(CardData.CardID); % number of rides per rider
TbUser = TbUser_All(TbUser_All.Count >= 2,:); % a rider needs at least 2 rides in a day to form a travel chain
% Display some results
clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'. ','There are ',num2str(height(TbUser_All)),' unique users. ','Users with at least 2 rides: ',num2str(height(TbUser)),'.##'];
disp(Text) % print the day, the number of unique users and the number of users with at least 2 rides
clear TbUser_All
clear Text
%% Generate the job station of all detectable individuals and drop the undetectable ones (result: TbUser_J)
% First, compute every rider's candidate job station and stay duration (TbUser then has 4 columns).
% Column positions in CardData: 2 boarding time, 3 boarding station,
% 4 alighting time, 5 alighting station. Each rider's records are assumed
% to be in chronological order.
clear i;
clear ID;
for i = 1:height(TbUser) % loop over every candidate rider
    clear TravelRecord
    ID = TbUser{i,1};
    if TbUser{i,2} == 2 % users with exactly 2 records (more than 90% of users) are the simple case
        TravelRecord = CardData(CardData.CardID == ID,:); % all records of this rider
        clear T_Work; % candidate duration of work
        clear S_Work; % candidate station of work
        T_Work = duration(00,00,00); % start from 00:00:00
        if TravelRecord{1,5} == TravelRecord{2,3} % alighting station of trip 1 == boarding station of trip 2
            T_Work = TravelRecord{2,2}-TravelRecord{1,4}; % boarding time of trip 2 minus alighting time of trip 1
            S_Work = TravelRecord{1,5};
        else
            S_Work = 0; % station ID 0 means no detectable result
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
    elseif TbUser{i,2} >= 3 % users with 3 or more records
        TravelRecord = CardData(CardData.CardID == ID,:); % all records of this rider
        clear T_Work;
        clear T_Work0;
        clear S_Work;
        clear m;
        T_Work = duration(00,00,00);
        S_Work = 0;
        for m = 1:(TbUser{i,2}-1) % find the station with the longest stay and its duration
            if TravelRecord{m,5} == TravelRecord{(m+1),3}
                T_Work0 = TravelRecord{(m+1),2}-TravelRecord{m,4}; % the staying time
                if T_Work0 >= T_Work
                    T_Work = T_Work0;
                    S_Work = TravelRecord{m,5};
                end
                clear T_Work0
            end
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
    end
end
clear i;
clear ID;
clear m;
TbUser.Properties.VariableNames{3} = 'S_Job';
TbUser.Properties.VariableNames{4} = 'T_Job';
% At this point TbUser holds the candidate work station and work duration of every rider.
T = duration(06,00,00);
TbUser_J = TbUser(TbUser.T_Job >= T,:); % keep riders whose longest stay is at least 6 hours (t >= 6 h in Section 1.2)
clear T;
% Print the results
clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'. ','There are ',num2str(height(TbUser_J)),' job-detectable users.##'];
disp(Text) % print the day and the number of job-detectable users
clear TbUser
clear Text
%% Generate the home station of all detectable individuals (result: TbUser_JH)
clear i;
clear ID;
clear S_Home
for i = 1:height(TbUser_J)
    ID = TbUser_J{i,1};
    clear TravelRecord;
    TravelRecord = CardData(CardData.CardID == ID,:); % all records of this rider
    S_Home = TravelRecord{1,3}; % boarding station of the first record of the day
    TbUser_J{i,5} = S_Home;
end
clear i;
clear ID;
clear S_Home;
clear TravelRecord;
TbUser_J.Properties.VariableNames{5} = 'S_Home';
TbUser_JH = TbUser_J;
clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'. The job-home location has been extracted successfully!##'];
disp(Text)
clear Text
clear TbUser_J;
clear x;
clear y;
clear CardData;
toc % end of code; toc reports the time elapsed since a matching tic at the start of the script and can be removed if timing is not needed
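The per-rider loops above filter the whole of CardData once for every rider, which is likely where most of the reported run time goes. The sketch below shows one possible way to express the same rules without a loop. It is not the optimization the author plans to publish, only an illustration under stated assumptions: the sample records are invented, the column names `TimeIn`, `StationIn`, `TimeOut` and `StationOut` are hypothetical, and ties between equally long stays resolve to the earliest stay rather than the latest.

```matlab
% Hypothetical records for four riders, deliberately unsorted (invented values).
CardID     = categorical({'A';'C';'A';'B';'D';'A';'C';'D'});
TimeIn     = datetime(2015, 3, 2, [8;9;18;9;10;19;17;12],  [0;30;0;0;0;30;0;0],     0);
StationIn  = [101;330;205;101;101;101;500;205];
TimeOut    = datetime(2015, 3, 2, [8;10;18;9;10;20;17;12], [35;0;40;20;30;0;30;30], 0);
StationOut = [205;412;101;330;205;330;330;101];
CardData   = table(CardID, TimeIn, StationIn, TimeOut, StationOut);

% Sort once so that each rider's trips are adjacent and chronological.
CardData = sortrows(CardData, {'CardID','TimeIn'});
n = height(CardData);

% Compare every record with the next one: same rider and matching stations means a stay.
sameCard    = CardData.CardID(1:n-1) == CardData.CardID(2:n);
sameStation = CardData.StationOut(1:n-1) == CardData.StationIn(2:n);
idx         = find(sameCard & sameStation);
stayTime    = CardData.TimeIn(2:n) - CardData.TimeOut(1:n-1);
Stays = table(CardData.CardID(idx), CardData.StationOut(idx), stayTime(idx), ...
    'VariableNames', {'CardID','S_Job','T_Job'});

% Keep the longest stay per rider, then apply the 6-hour threshold.
Stays = sortrows(Stays, {'CardID','T_Job'}, {'ascend','descend'});
[~, firstStay] = unique(Stays.CardID);     % first occurrence = longest stay per rider
TbUser_J = Stays(firstStay, :);
TbUser_J = TbUser_J(TbUser_J.T_Job >= hours(6), :);

% Home station: boarding station of each rider's first record of the day.
[cards, firstRow] = unique(CardData.CardID);
[~, loc] = ismember(TbUser_J.CardID, cards);
TbUser_J.S_Home = CardData.StationIn(firstRow(loc));
disp(TbUser_J)
```

In this sample only rider A is detectable: B has a single trip, C never re-enters at the station where the previous trip ended, and D's only stay lasts 1.5 hours.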
References
> Long, Y., & Thill, J. C. (2015). Combining smart card data and household travel survey to analyze jobs-housing relationships in Beijing. Computers, Environment and Urban Systems, 53, 19–35.
https://doi.org/10.1016/j.compenvurbsys.2015.02.005
*This paper describes in detail the identification rules adopted in this post and gives a worked example using Beijing data*
> Zhou, J., & Long, Y. (2014). Jobs-housing balance of bus commuters in Beijing: Exploration with large-scale synthesized smart card data. Transportation Research Record: Journal of the Transportation Research Board, 2418, 1–10.
https://doi.org/10.3141/2418-01
*The rules and data are the same as in Long & Thill (2015)*
> Gao, Q.-L., Li, Q.-Q., Yue, Y., Zhuang, Y., Chen, Z.-P., & Kong, H. (2018). Exploring changes in the spatial distribution of the low-to-moderate income group using transit smart card data. Computers, Environment and Urban Systems, 72, 68–77.
https://doi.org/10.1016/j.compenvurbsys.2018.02.006
*The identification rules used in this paper are rather unusual and were not adopted here*
> Lee, S. G., & Hickman, M. (2014). Trip purpose inference using automated fare collection data. Public Transport, 6(1–2), 1–20.
https://doi.org/10.1007/s12469-013-0077-5










