【技术】用地铁刷卡数据识别个体职住

作者: pugkingup | 来源:发表于2019-11-10 21:12 被阅读0次

【技术】用地铁刷卡数据识别个体职住
广东人脸识别测温终端一体机厂家门禁考勤机替代指纹识别门禁机
基于职住表的街道平均通勤距离分析
计算机毕业设计Hadoop+Spark交通大数据地铁客流量分析
进入广州、深圳地铁！全态识别测试效果令人瞩目
刷脸进出校园，智能授课，安徽拉开智慧校园建设新篇章
互联快讯北京可手机刷卡进地铁 AI击败DotA2顶级玩家
基于手机信令的大数据分析教程（番外一）——GIS中基于空间连接汇
基于手机信令的大数据分析教程（十）——数据库中计算每个街道的平均
这样操作，可以快速录入数据

0前言

大容量、全样本和多时间维度的地铁刷卡数据是研究城市交通和出行行为的重要样本。在研究之前，应该确定基于刷卡原始数据，界定工作-居住地点。笔者参考过往的文献中的提取方法(Long, 2015; Zhou, 2014; Gao, 2018; Lee, 2014)，并对相应方法进行提炼。以一日（1 d）刷卡数据为例，笔者给出了相应的提取逻辑，同时完成了相应MATLAB代码。
如果您有任何疑问，欢迎联系本文作者。如果你使用了本文中的代码，不必告知作者，但请正确给出引用。

1方法

1.1数据格式

刷卡数据不易获取，但格式都大同小异。笔者使用的刷卡数据包括5个最基本的字段：卡号、上车站、上车时间、下车站、下车时间。在MATLAB中，笔者把这些数据分别整理成了Category、Double、Datetime、Double和Datetime格式。
下图是小部分数据的截图，笔者的数据不可以共享，请不要联系笔者获取数据。截图仅为方便理解代码之用。

数据示例

1.2识别规则

每日职住识别

一个个体至少在一天内拥有两条刷卡记录，这样该个体才能构成基本的“家-工作地-家/其他”的出行链；
对于一个个体的连续两条刷卡记录，若前一条的出站站点和后一条的入站站点完全一致，则识别该个体在该站点停留，记为在s站点停留t时间（s，t）；
将个体在一日内最长（max(t)）停留，且超过6小时（t>=6h）的站点，识别为工作地，这些个体称为可识别个体（detectable individual）若个体无满足相应条件的站点，则记为不可识别的个体（undetectable individual）；
将这些可识别个体（detectable individual）当天上车的第一个站点记为居住地，因为交通调查显示超过99.5%个体的第一站点与居住站点一致。

1.3编程实现

在具体的编程实现上，代码的编写与识别规则的描述基本一致，但是有许多具体的技巧，这些技巧的目的主要是为了让程序运行的更加快速。譬如，要注意到提取职住地是极为复杂-耗时的运算，应该一步一步缩小提取的对象，对于不可能提取的个体提前剔除（例如仅1次出行的）。
下面是每日职住的识别代码。对于该代码的表现，笔者给出初步的数据：在约300w条的一日刷卡记录下，笔者采用一般的个人台式电脑（4代i5，16gb内存，全固态硬盘）运行，大约可以在15个小时内提取出8w个微观个体的职住站点。因此，如果在更大的时间维度上进行提取，应该对代码进行相应的优化，具体的优化方法笔者将在后续的技术文章中给出。

%% Extract the unique individuals （提取个体，即准备识别对象的全集）
% Extract all unique users and their metro ride/rides.
CardData.CardID = categorical(CardData.CardID); % Transform the ID into catogorical format for the use of countcats function.
TbUser_All = table(unique(CardData.CardID)); % extract a table containing all metro riders in a day
TbUser_All.Properties.VariableNames{1} = 'CardID';
TbUser_All.Count = countcats(CardData.CardID);
TbUser = TbUser_All(TbUser_All.Count>= 2,:);% a metro-rider should take at least 2 rides in a day to extract a travel-chain.

% Display some results
clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'.','There are ',num2str(height(TbUser_All)),' unique users. ','Users take more than 2 rides count for ',num2str(height(TbUser)),'.##'];
disp(Text) %print the day, unique users and users that took more than 2 rides.
clear TbUser_All
clear Text

%% Generate the job station of all detectable individuals, meanwhile, delete the undetectable individuals from the table(TbUser_J) （在全集中提取可以被识别就业地的地铁乘客）
% First,calculate all unique riders' possible job station and duration. (TbUser 4 rows)
clear i;
clear ID;
for i = 1:5000
clear TravelRecord
    ID = TbUser{i,1};
    if TbUser{i,2} == 2 %for the users with only 2 records (which acccount for more than 90%), things can be more simple.
        TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
        
        clear T_Work; % the possible duration Time of Work
        clear S_Work; % the possible Station of Work
        T_Work = duration(00,00,00); %set the duration to 00:00:00
        if TravelRecord{1,5} == TravelRecord{2,3} %s1 in trip2 == s2 in trip1
            T_Work = TravelRecord{2,2}-TravelRecord{1,4};
            S_Work = TravelRecord{1,5};
        else
            S_Work = 0;% for station ID == 0, it represents no detectable results.
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
        
    elseif TbUser{i,2} >= 3 % when TbUser.Count > 2
        TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
        
        clear T_Work;
        clear T_Work0;
        clear S_Work;
        clear m;
        T_Work = duration(00,00,00);
        S_Work = 0;
        
        for m = 1:(TbUser{i,2}-1) % This loop is to find the longest staying station and calculate the duration
            if TravelRecord{m,5} == TravelRecord{(m+1),3}
                T_Work0 = TravelRecord{(m+1),2}-TravelRecord{m,4};%the staying time
                
                if T_Work0 >= T_Work
                    T_Work = T_Work0;
                    S_Work = TravelRecord{m,5};
                end
                clear T_Work0
            end
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
    end
end
clear i;
clear ID;
clear m;
TbUser.Properties.VariableNames{3} = 'S_Job';
TbUser.Properties.VariableNames{4} = 'T_Job';

% In this part, the TbUser has been calculated the possbile work station and work duration of all unique users.
T = duration(06,00,00);
TbUser_J = TbUser(TbUser.T_Job>T,:);%extract the unique records that takes a job duration more than 6 hours.
clear T;

% Print the results
clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'.','There are ',num2str(height(TbUser_J)),' job-detetable users.##'];
disp(Text) %print the day, unique users and users that took more than 2 rides.
clear TbUser
clear Text

%% Generate the home station of all detectable individuals, meanwhile, delete the undetectable ones (TbUser_JH)（提取他们的居住地）
clear i;
clear ID;
clear S_Home
for i = 1:height(TbUser_J)
    ID = TbUser_J{i,1};
    clear TravelRecord;
    TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
    S_Home = TravelRecord{1,3};
    
    TbUser_J{i,5} = S_Home;
end
clear i;
clear ID;
clear S_Home;
clear TravelRecord;

TbUser_J.Properties.VariableNames{5} = 'S_Home';
TbUser_JH = TbUser_J;


clear Text
Text = ['##The day is the day ',num2str(y),' in year ',num2str(x+2014),'. The job-home location has been extracted successfully!##'];
disp(Text)
clear Text

clear TbUser_J;
clear x;
clear y;
clear CardData;

toc（代码结束，tic-toc用于计算时间，不需要可以删去）

资料

>  Long, Y., & Thill, J. C. (2015). Combining smart card data and household travel survey to analyze jobs-housing relationships in Beijing. Computers, Environment and Urban Systems, 53, 19–35. 
https://doi.org/10.1016/j.compenvurbsys.2015.02.005
*这篇文献详细介绍了本文采用的识别规则，并使用北京的数据给出了实例*
> Zhou, J., & Long, Y. (2014). Jobs-housing balance of bus commuters in Beijing: Exploration with large-scale synthesized smart card data. Transportation Research Record: Journal of the Transportation Research Board, 2418, 1–10. 
https://doi.org/10.3141/2418-01
*规则与数据均与Long（2015）的文章一致*
> Gao, Q.-L., Li, Q.-Q., Yue, Y., Zhuang, Y., Chen, Z.-P., & Kong, H. (2018). Exploring changes in the spatial distribution of the low-to-moderate income group using transit smart card data. Computers, Environment and Urban Systems, 72(July 2017), 68–77. 
https://doi.org/10.1016/j.compenvurbsys.2018.02.006
*这篇文章中采用的识别规则比较特别，笔者未采用*
> Lee, S. G., & Hickman, M. (2014). Trip purpose inference using automated fare collection data. Public Transport, 6(1–2), 1–20. 
https://doi.org/10.1007/s12469-013-0077-5

【技术】用地铁刷卡数据识别个体职住
0前言大容量、全样本和多时间维度的地铁刷卡数据是研究城市交通和出行行为的重要样本。在研究之前，应该确定基于刷卡原...
广东人脸识别测温终端一体机厂家门禁考勤机替代指纹识别门禁机
现在大多数企业的考勤方式，被指纹识别技术和打卡技术占据着，传统的刷卡识别更是处于“半退休”的状态。随着人脸识别技术...
基于职住表的街道平均通勤距离分析
基于职住表的街道平均通勤距离分析本教程基于西安职住表数据进行分析，前期数据导入，职住数据处理，职住OD分析在“L...
计算机毕业设计Hadoop+Spark交通大数据地铁客流量分析
简介该项目主要分析深圳通刷卡数据，通过大数据技术来研究深圳地铁客运能力及探索优化服务的方向主要讲解Flink流...
进入广州、深圳地铁！全态识别测试效果令人瞩目
近日，全态识别又向业界带来新动态：全态识别技术已进入广州地铁、深圳地铁。在测试现场，它的表现更令人瞩目！深圳地铁...
刷脸进出校园，智能授课，安徽拉开智慧校园建设新篇章
安徽“智慧校园”建设成果突出大数据，数字终端及大数据技术、人脸识别技术遍布校园校门口人脸识别技术，快速准确扫描...
互联快讯北京可手机刷卡进地铁 AI击败DotA2顶级玩家
手机刷卡进地铁北京地铁全线支持手机刷卡乘车 8月14日起北京地铁将全线支持手机刷卡乘车。手机一卡通具有不用缴纳押...
基于手机信令的大数据分析教程（番外一）——GIS中基于空间连接汇
1、将职住数据连接到基站属性表中 GIS中连接数据库，在下拉中找到我们之前每个基站职、住数据的表jizhan_h和...
基于手机信令的大数据分析教程（十）——数据库中计算每个街道的平均
本节重点：职住数据与地理位置和坐标信息连接，计算基站点间直线距离，然后求每个街道的平均通勤距离一、职住数据与地理...
这样操作，可以快速录入数据
1、快速录入数据的方法 1.1用OCR（光学字符识别）技术快速导入数据＊通过扫描光学输入方式将各种票据、报刊、书...