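Collection (scraping) in PHPCMS works much like it does in other collection systems: it first fetches a list of items from a list page, then batch-collects the content of every item in that list. When the source has no list page (WeChat or Toutiao, for example), or when you only want to collect one specific article rather than everything, this approach falls short.
Other users have shared single-page collection features, but either their rules cannot be customized or they depend on a third-party API. In this tutorial you write your own collection rules in the collection manager and collect on your own, without relying on any third party.
Watch the video first: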
[qq-video vids='e33161a25mp']
Make the following changes.
Step 1:
Open
/phpcms/modules/collection/node.php
and replace the public_test_content method with the code below:
// Test article content collection
public function public_test_content() {
    $url = isset($_GET['url']) ? urldecode($_GET['url']) : exit('0');
    $nodeid = isset($_GET['nodeid']) ? intval($_GET['nodeid']) : showmessage(L('illegal_parameters'), HTTP_REFERER);
    pc_base::load_app_class('collection', '', 0);
    if ($data = $this->db->get_one(array('nodeid'=>$nodeid))) {
        $contents = collection::get_content($url, $data);
        // Load all processing functions
        $funcs_file_list = glob(dirname(__FILE__).DIRECTORY_SEPARATOR.'spider_funs'.DIRECTORY_SEPARATOR.'*.php');
        foreach ($funcs_file_list as $v) {
            include $v;
        }
        // Run the test here
        foreach ($contents as $_key=>$_content) {
            // Extract the image URLs found in the article body
            if($_key=='content') $contents['spider_image']=spider_images(new_stripslashes($_content));
            // Empty fields are returned as an empty string
            if(trim($_content)=='') $contents[$_key] = "";//'◆◆◆◆◆◆◆◆◆◆'.$_key.' empty◆◆◆◆◆◆◆◆◆◆';
        }
        if(isset($_GET['jsoncallback'])){
            // json_encode() expects UTF-8, so convert when the site charset is GBK
            if (pc_base::load_config('system', 'charset') == 'gbk') {
                $contents = array_iconv($contents, 'utf-8', 'gbk');
            }
            // JSONP response: callback({"items": {...}})
            echo safe_replace($_GET['jsoncallback'])."({\"items\":".json_encode($contents)."})";
        }else{
            print_r($contents);
        }
    } else {
        showmessage(L('notfound'));
    }
}
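To make the request and response shape of this method easier to picture, here is a minimal sketch. The helper names build_test_content_url and wrap_jsonp, the example.com and example.org addresses, and the node ID 3 are all hypothetical; the m/c/a parameters follow the usual PHPCMS routing convention, and the sketch assumes the request is sent from a logged-in admin session.

```php
<?php
// Hypothetical helper (not part of PHPCMS): builds the URL that invokes
// public_test_content() for one article and one collection rule.
function build_test_content_url($adminBase, $nodeId, $targetUrl, $callback = '')
{
    $params = array(
        'm'      => 'collection',          // module
        'c'      => 'node',                // controller (node.php)
        'a'      => 'public_test_content', // action
        'nodeid' => (int) $nodeId,
        'url'    => $targetUrl,            // percent-encoded by http_build_query()
    );
    if ($callback !== '') {
        $params['jsoncallback'] = $callback; // switches the output to JSONP
    }
    return $adminBase . '?' . http_build_query($params, '', '&');
}

// Mirrors the JSONP envelope echoed by the method, minus safe_replace().
function wrap_jsonp($callback, array $contents)
{
    return $callback . '({"items":' . json_encode($contents) . '})';
}

echo build_test_content_url('http://example.com/index.php', 3, 'https://example.org/post/1.html', 'cb'), "\n";
echo wrap_jsonp('cb', array('title' => 'Hello')), "\n";
```

Without the jsoncallback parameter the method falls back to print_r(), which is convenient when checking a rule by hand in the browser.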
Then add the following two methods directly below it:
// Single-page collection screen: lists this site's collection rules in a drop-down
public function public_spider(){
    $nodelist = $this->db->select(array('siteid'=>$this->siteid),'nodeid,name','','nodeid DESC');
    $buttons = $this->select2arr($nodelist, '', 'id=\'nodeid\'', 'Select a rule');
    include $this->admin_tpl('node_spider');
}
// Build a <select> element from the rule list; returns false when the list is empty
private static function select2arr($array = array(), $id = 0, $str = '', $default_option = '') {
    $string = '<select '.$str.'>';
    $default_selected = (empty($id) && $default_option) ? 'selected' : '';
    if($default_option) $string .= "<option value='' $default_selected>$default_option</option>";
    if(!is_array($array) || count($array)== 0) return false;
    foreach($array as $key=>$vs) {
        //$selected = $id==$key ? 'selected' : '';
        $string .= '<option value="'.$vs['nodeid'].'" >'.$vs['name'].'</option>';
    }
    $string .= '</select>';
    return $string;
}
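The drop-down builder can be tried outside PHPCMS. The sketch below is a standalone variant, not the method above: the function name nodes_to_select and the sample rules are made up, and it additionally escapes the rule names with htmlspecialchars(), which the original does not do.

```php
<?php
// Hypothetical standalone variant of select2arr() for experimenting.
// Each row is expected to carry 'nodeid' and 'name', as selected in public_spider().
function nodes_to_select(array $nodes, $attrs = '', $defaultOption = '')
{
    if (count($nodes) === 0) {
        return false; // same convention as the original: no rules, no markup
    }
    $html = '<select ' . $attrs . '>';
    if ($defaultOption !== '') {
        $html .= "<option value='' selected>" . htmlspecialchars($defaultOption) . '</option>';
    }
    foreach ($nodes as $node) {
        // Added in this sketch: cast the ID and escape the name before output.
        $html .= '<option value="' . (int) $node['nodeid'] . '">'
               . htmlspecialchars($node['name']) . '</option>';
    }
    return $html . '</select>';
}

$nodes = array(
    array('nodeid' => 2, 'name' => 'WeChat'),
    array('nodeid' => 1, 'name' => 'A & B'),
);
echo nodes_to_select($nodes, "id='nodeid'", 'Select a rule'), "\n";
```

The id='nodeid' attribute matters because the page script presumably reads the chosen rule from that element before calling public_test_content.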
/phpcms/modules/collection/classes/spider_photos.php
Add the following function to this file:
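// Collect every image URL found in $str into $array[$field.'_url']
function spider_images($str) {
    // $GLOBALS['field'] may not be set when called from the test method; fall back to ''
    $field = isset($GLOBALS['field']) ? $GLOBALS['field'] : '';
    $array = array();
    if(empty($str)) return $array;
    $array[$field.'_url'] = array();
    // Match image URLs that do not contain "thumb"
    preg_match_all('/(?:(http:|https:|rtsp:))((?!thumb)\S)*?(?:\.jpg|\.jpeg|\.png|\.bmp|\.gif)/i', $str, $out);
    if (isset($out[0])) foreach ($out[0] as $v) {
        $array[$field.'_url'][] = $v;
    }
    return $array;
}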
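To see what the regular expression keeps and what it skips, here is a small standalone sketch. The function name extract_image_urls and the sample HTML are made up for illustration; only the pattern is taken from spider_images.

```php
<?php
// Hypothetical helper: same pattern as spider_images(), but returns a flat list.
function extract_image_urls($html)
{
    $pattern = '/(?:(http:|https:|rtsp:))((?!thumb)\S)*?(?:\.jpg|\.jpeg|\.png|\.bmp|\.gif)/i';
    preg_match_all($pattern, $html, $out);
    return $out[0]; // index 0 holds the full matches
}

$html = '<p><img src="https://example.com/a/photo.jpg"> '
      . '<img src="http://example.com/thumb/small.png"> '
      . '<img src="https://example.com/b/pic.GIF"></p>';

print_r(extract_image_urls($html));
// The /thumb/ URL is skipped; photo.jpg and pic.GIF are returned.
```

The negative lookahead (?!thumb) is checked before every character, so a URL with "thumb" anywhere before its extension is dropped. Because the quantifier is lazy, the match ends at the first image extension it finds, so anything after it, such as a query string, is not captured.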
Step 2:
If this helped you, please give it a like. If you need help, leave a comment or add me on QQ: 6045564